c-jepa
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning[1]
The authors are Shentong Mo and Shengbang Tong, from CMU and NYU. Citation [1]: Mo, Shentong and Shengbang Tong. "Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning." ArXiv abs/2410.19560 (2024).
Time
- 2024.Oct
Key Words
- entire collapse; mean of patch representations
Summary
- In recent unsupervised visual representation learning, the Joint-Embedding Predictive Architecture (JEPA) extracts visual features from unlabeled imagery through an innovative masking strategy. Despite its success, two key limitations remain: the EMA used in I-JEPA cannot effectively prevent entire collapse of the model's feature representations, and its prediction falls short in accurately learning the mean of patch representations. This paper introduces a new framework, C-JEPA (Contrastive-JEPA), which integrates the Image-based Joint-Embedding Predictive Architecture with the Variance-Invariance-Covariance Regularization (VICReg) strategy. The combination efficiently learns variance/covariance to prevent entire collapse and enforces invariance of the mean across augmented views, thereby overcoming these limitations.
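The VICReg regularizer referenced above can be sketched as three terms on the embeddings of two augmented views: an invariance term (views should match), a variance hinge (per-dimension std pushed toward 1, so features cannot collapse to a constant), and a covariance penalty (off-diagonal entries suppressed, decorrelating dimensions). The sketch below is a minimal NumPy illustration with the weights from the original VICReg paper; the exact loss weights and where the regularizer attaches inside C-JEPA are assumptions, not taken from the paper.

```python
import numpy as np

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg-style regularization on two (N, D) batches of view embeddings.

    Weights follow the original VICReg paper; C-JEPA's settings may differ.
    """
    n, d = z_a.shape
    # Invariance: mean-squared distance between the two views.
    inv = np.mean((z_a - z_b) ** 2)

    # Variance: hinge on per-dimension std, keeping it near 1 so the
    # embeddings cannot collapse to a constant vector.
    def var_term(z):
        std = np.sqrt(z.var(axis=0) + eps)
        return np.mean(np.maximum(0.0, 1.0 - std))

    # Covariance: penalize off-diagonal covariance entries to
    # decorrelate dimensions and avoid informational collapse.
    def cov_term(z):
        zc = z - z.mean(axis=0)
        cov = (zc.T @ zc) / (n - 1)
        off_diag = cov - np.diag(np.diag(cov))
        return np.sum(off_diag ** 2) / d

    return (sim_w * inv
            + var_w * (var_term(z_a) + var_term(z_b))
            + cov_w * (cov_term(z_a) + cov_term(z_b)))
```

Note how a fully collapsed batch (every row identical) is heavily penalized by the variance hinge even though its invariance and covariance terms are zero — this is exactly the failure mode the summary says EMA alone does not prevent.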