MAP

MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining[1]

The authors are Yunze Liu and Li Yi, from Tsinghua University's Institute for Interdisciplinary Information Sciences (IIIS), Shanghai AI Lab, and the Shanghai Qi Zhi Institute. Citation [1]: Liu, Yunze and Li Yi. "MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining." arXiv abs/2410.00871 (2024).

Time

  • 2025.Mar

Key Words

  • Masked Autoregressive Pretraining

Summary

  1. Hybrid Mamba-Transformer networks have recently attracted considerable attention, as they combine the scalability of Transformers with Mamba's long-context modeling and computational efficiency. However, how to effectively pretrain such hybrid networks remains an open question. Existing methods such as MAE or autoregressive pretraining mainly target a single type of network architecture; in contrast, a pretraining strategy for a hybrid Mamba-Transformer backbone must be effective for both components. To this end, the authors propose Masked Autoregressive Pretraining (MAP), a unified paradigm that improves the performance of both the Mamba and the Transformer modules (a rough sketch of the idea follows below).
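The following is a minimal, illustrative PyTorch sketch of the masked autoregressive idea as I read it: mask most patch tokens, encode only the visible ones with the backbone, and reconstruct the masked patches autoregressively with a lightweight causal decoder. The class name `MaskedAutoregressivePretrainer`, the default hyperparameters, and the teacher-forced decoder design are assumptions made for illustration, not the paper's released implementation.

```python
import torch
import torch.nn as nn


class MaskedAutoregressivePretrainer(nn.Module):
    """Illustrative masked autoregressive pretraining wrapper (not the paper's code).

    `encoder` stands in for the hybrid Mamba-Transformer backbone and only sees
    the visible (unmasked) patch tokens; a small Transformer decoder then
    predicts the masked patches one by one under a causal mask (teacher forcing).
    """

    def __init__(self, encoder, embed_dim=256, patch_dim=768, num_patches=196,
                 mask_ratio=0.75, decoder_layers=4, decoder_heads=8):
        super().__init__()
        self.encoder = encoder
        self.mask_ratio = mask_ratio
        self.pos_embed = nn.Parameter(torch.randn(1, num_patches, embed_dim) * 0.02)
        self.start_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.patch_embed = nn.Linear(patch_dim, embed_dim)   # embeds target patches for the decoder
        layer = nn.TransformerDecoderLayer(embed_dim, decoder_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, decoder_layers)
        self.head = nn.Linear(embed_dim, patch_dim)          # predicts raw patch pixels

    def random_mask(self, x):
        """Split patch tokens into a kept (visible) subset and masked indices."""
        B, N, D = x.shape
        len_keep = int(N * (1 - self.mask_ratio))
        ids_shuffle = torch.rand(B, N, device=x.device).argsort(dim=1)
        ids_keep, ids_mask = ids_shuffle[:, :len_keep], ids_shuffle[:, len_keep:]
        x_keep = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
        return x_keep, ids_mask

    def forward(self, patch_tokens, patch_pixels):
        """patch_tokens: (B, N, embed_dim) embedded patches; patch_pixels: (B, N, patch_dim) targets."""
        B, N, D = patch_tokens.shape
        x = patch_tokens + self.pos_embed
        x_keep, ids_mask = self.random_mask(x)
        memory = self.encoder(x_keep)                        # backbone sees visible tokens only

        # Gather ground-truth pixels and positional embeddings of the masked patches.
        M = ids_mask.shape[1]
        target = torch.gather(patch_pixels, 1,
                              ids_mask.unsqueeze(-1).expand(-1, -1, patch_pixels.size(-1)))
        pos_mask = torch.gather(self.pos_embed.expand(B, -1, -1), 1,
                                ids_mask.unsqueeze(-1).expand(-1, -1, D))

        # Teacher-forced autoregressive decoding: the input at step t is the embedded
        # ground-truth patch t-1 (shifted right), and a causal mask hides steps >= t.
        tgt_in = self.patch_embed(target)
        tgt_in = torch.cat([self.start_token.expand(B, 1, -1), tgt_in[:, :-1]], dim=1) + pos_mask
        causal = torch.triu(torch.ones(M, M, device=x.device, dtype=torch.bool), diagonal=1)
        dec = self.decoder(tgt_in, memory, tgt_mask=causal)

        pred = self.head(dec)                                # reconstruct masked patch pixels
        return nn.functional.mse_loss(pred, target)
```

For a quick smoke test, `encoder` can be `nn.Identity()` with `patch_tokens = torch.randn(2, 196, 256)` and `patch_pixels = torch.randn(2, 196, 768)`; in the paper's setting the encoder would be the hybrid Mamba-Transformer backbone being pretrained.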

Framework \(Fig.1^{[1]}\)

Comparison \(Fig.2^{[1]}\)