PolaFormer

PolaFormer: Polarity-Aware Linear Attention for Vision Transformers[1]

The authors are Weikang Meng, Yadan Luo, et al. from HIT Shenzhen, PCL, and UQ. Citation [1]: Meng, Weikang et al. "PolaFormer: Polarity-aware Linear Attention for Vision Transformers." ArXiv abs/2501.15061 (2025): n. pag.

Time

  • 2025.Mar

### Key Words

Summary

  1. Linear attention is a promising alternative to softmax-based attention: it uses kernelized feature maps to reduce complexity from quadratic to linear in sequence length. However, the non-negative constraint on the feature maps and the relaxed exponential function used in the approximation cause significant information loss relative to the original query-key dot products, yielding less discriminative attention maps with higher entropy. To recover the interactions lost by discarding negative values in query-key pairs, the authors propose a polarity-aware linear attention mechanism that explicitly models same-signed and opposite-signed query-key interactions, ensuring comprehensive coverage of the relational information. In addition, to restore the spiky properties of attention maps, the authors provide a theoretical analysis proving the existence of a class of element-wise functions (with positive first and second derivatives) that can reduce the entropy of the attention distribution. For simplicity, and to capture the distinct contribution of each dimension, they adopt a learnable power function for rescaling, allowing strong and weak attention signals to be effectively separated.
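The polarity split and power rescaling described above can be illustrated with a minimal NumPy sketch. This is an assumption-laden simplification, not the paper's implementation: a single head, a fixed exponent `p` in place of the learnable power function, a fixed scalar mix of the two polarity streams (the paper learns per-dimension mixing), and the n×n score matrix is materialized explicitly for readability rather than factored into the linear-time form `Q'(K'ᵀV)`.

```python
import numpy as np

def pola_linear_attention(Q, K, V, p=3.0):
    """Sketch of polarity-aware linear attention (single head).

    Q, K: (n, d) queries/keys; V: (n, d_v) values.
    p: exponent of the power function (learnable in the paper, fixed here).
    """
    # Split queries and keys by polarity so negative components still
    # contribute to the similarity instead of being clipped away.
    Qp, Qn = np.maximum(Q, 0.0), np.maximum(-Q, 0.0)
    Kp, Kn = np.maximum(K, 0.0), np.maximum(-K, 0.0)

    # Element-wise power (positive 1st and 2nd derivatives) sharpens the
    # score distribution, lowering the entropy of the attention map.
    Qp, Qn, Kp, Kn = (x ** p for x in (Qp, Qn, Kp, Kn))

    # Same-signed interactions: the positive contributions to q·k ...
    same = Qp @ Kp.T + Qn @ Kn.T
    # ... and opposite-signed interactions: the negative contributions.
    oppo = Qp @ Kn.T + Qn @ Kp.T

    # Non-negative mixing of the two streams (0.5 is an arbitrary choice
    # standing in for the paper's learned combination).
    scores = same + 0.5 * oppo

    # Linear-attention style normalization: each row sums to one
    # without a softmax over the full score matrix.
    attn = scores / (scores.sum(axis=-1, keepdims=True) + 1e-6)
    return attn @ V

# Usage: random queries/keys/values for a toy sequence of length 5.
rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 6))
out = pola_linear_attention(Q, K, V)  # shape (5, 6)
```

Because every term in `same` and `oppo` is a product of non-negative feature maps, the factored form `(Qp @ (Kp.T @ V)) + ...` is valid and gives the O(n) complexity the method targets; the explicit matrix above is only for clarity.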