Denoising Diffusion Autoencoders are Unified Self-supervised Learners[1]

The authors are Weilai Xiang, Hongyu Yang, Di Huang, and Yunhong Wang from Beihang University. Paper citation [1]: Xiang, Weilai et al. “Denoising Diffusion Autoencoders are Unified Self-supervised Learners.” 2023 IEEE/CVF International Conference on Computer Vision (ICCV) (2023): 15756-15766.

Time

  • 2023.Mar

Key Words

  • generative (translation, ...) and discriminative (classification, recognition) tasks
  • generative pre-training and denoising autoencoding
  • DDAE as generative models and competitive recognition models (see the linear-probing sketch below)
  • extend generative models for discriminative purposes
  • linearly separable features learned in an unsupervised manner
  • latent space vs. pixel space
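The discriminative use of a diffusion model here boils down to linear probing on intermediate denoiser activations. Below is a minimal sketch under assumptions: a frozen pretrained denoising network whose intermediate activations are exposed via forward hooks; `unet`, the `"mid"` key, and the 512-d feature size are hypothetical placeholders, not the paper's actual interface.

```python
import math
import torch
import torch.nn as nn

def ddae_features(unet, feats, x, t, alpha_bar_t):
    """Noise clean images to one fixed timestep t, run the frozen denoiser,
    and pool an intermediate activation as the representation. `unet` and
    the hook-populated dict `feats` are assumed interfaces, not a real API."""
    eps = torch.randn_like(x)
    # q(x_t | x_0): variance-preserving forward diffusion, alpha_bar_t a float
    x_t = math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * eps
    with torch.no_grad():
        unet(x_t, t)                      # forward pass fills `feats` via hooks
    return feats["mid"].mean(dim=(2, 3))  # global average pool -> (B, C)

# Only a linear head is trained on the frozen features (linear probing):
probe = nn.Linear(512, 1000)  # feature dim and class count are placeholders
```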
Read more »

Auto-Encoding Variational Bayes[1]

The two authors are Diederik P. Kingma and Max Welling from the Machine Learning Group, Universiteit van Amsterdam. Paper citation: Kingma, Diederik P. and Max Welling. “Auto-Encoding Variational Bayes.” CoRR abs/1312.6114 (2013): n. pag.

Time

  • 2013.Dec

Key Words

  • reparameterization of the variational lower bound (see the sketch below)
  • lower bound estimator
  • continuous latent variables with intractable posteriors
  • i.i.d. dataset with latent variables per datapoint

Problem addressed

  1. how can we perform efficient approximate inference and learning with directed probabilistic models whose continuous latent variables or parameters have intractable posterior distributions?
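The paper's answer is the reparameterized lower-bound estimator: sampling z ~ q(z|x) is rewritten as a deterministic, differentiable function of the variational parameters plus independent noise, so the lower bound can be optimized with plain stochastic gradients. A minimal sketch for the Gaussian-encoder / Bernoulli-decoder case (the closed-form Gaussian KL is the case worked out in the paper's appendix):

```python
import torch
import torch.nn.functional as F

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I): the randomness is moved into
    # eps, so gradients w.r.t. (mu, logvar) flow through the sample.
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def elbo(x, recon_logits, mu, logvar):
    # E_q[log p(x|z)] - KL(q(z|x) || p(z)), KL in closed form for Gaussians.
    log_px = -F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return log_px - kl  # maximize this (or minimize its negative)
```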
Read more »

Extracting and Composing Robust Features with Denoising Autoencoders[1]

This paper was published in 2008 by a team from the Université de Montréal; the authors are Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. Paper citation: Vincent, Pascal et al. “Extracting and composing robust features with denoising autoencoders.” International Conference on Machine Learning (2008).

Time

  • 2008.Feb

Summary

  1. The difficulty of learning deep generative or discriminative models can be overcome by an initial unsupervised learning step that maps the input to useful intermediate representations. The authors propose a new way of learning representations in an unsupervised manner, based on making the learned representations robust to partial corruption of the input pattern (see the sketch below).

  2. Each layer produces a representation of the input pattern that is more abstract than the previous layer's, since it is obtained by composing more operations.
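A minimal sketch of this training criterion: corrupt the input by zeroing a random fraction of its components (the paper's masking noise), then train the autoencoder to reconstruct the clean input from the corrupted one. Layer sizes and the corruption rate below are placeholders.

```python
import torch
import torch.nn as nn

def corrupt(x, p=0.25):
    # Partial corruption: each component is independently forced to 0
    # with probability p; the clean x stays the reconstruction target.
    mask = (torch.rand_like(x) > p).float()
    return x * mask

encoder = nn.Sequential(nn.Linear(784, 256), nn.Sigmoid())
decoder = nn.Linear(256, 784)  # produces logits for a cross-entropy loss

def dae_loss(x):
    x_tilde = corrupt(x)                     # learn x_tilde -> x, not x -> x
    recon_logits = decoder(encoder(x_tilde))
    return nn.functional.binary_cross_entropy_with_logits(recon_logits, x)
```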

Read more »

Emerging Properties in Self-Supervised Vision Transformers[1]

The authors are a team from FAIR, Inria, and Sorbonne University. Paper citation [1]: Caron, Mathilde et al. “Emerging Properties in Self-Supervised Vision Transformers.” 2021 IEEE/CVF International Conference on Computer Vision (ICCV) (2021): 9630-9640.

Time

  • 2021.Apr

Motivation

  1. Is the success of Transformers in vision due to the supervision used in pretraining? A major factor behind Transformers' success in NLP was self-supervised pre-training.
  2. The authors therefore study self-supervised pre-training of ViT features.

Key Words

  • Self-supervised ViT features
  • self-distillation with no labels (DINO)

Summary

  1. Self-supervised pre-training gives ViTs properties that do not emerge in supervised ViTs:
    • The features explicitly contain the scene layout and, in particular, object boundaries; this information is mostly found in the self-attention modules of the last block.
    • A self-supervised ViT achieves 78.3% top-1 accuracy on ImageNet with a basic k-NN classifier, without any finetuning, linear classifier, or data augmentation.
  2. This strong k-NN performance only emerges when combined with a momentum encoder and multi-crop augmentation. Using smaller patches with ViTs improves the quality of the resulting features. (A sketch of the DINO loss follows.)
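A minimal sketch of the DINO objective: the student is trained to match the teacher's centered, sharpened output distribution, and the teacher is an exponential moving average (the momentum encoder) of the student. The temperatures follow the paper's recipe; the rest is schematic.

```python
import torch
import torch.nn.functional as F

def dino_loss(student_out, teacher_out, center, t_s=0.1, t_t=0.04):
    # Cross-entropy between the teacher's distribution (centered + sharpened,
    # no gradient) and the student's, over the projection-head outputs.
    t = F.softmax((teacher_out - center) / t_t, dim=-1).detach()
    s = F.log_softmax(student_out / t_s, dim=-1)
    return -(t * s).sum(dim=-1).mean()

@torch.no_grad()
def ema_update(teacher, student, m=0.996):
    # The teacher never receives gradients; it tracks the student by EMA.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(m).add_(p_s, alpha=1 - m)
```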
Read more »

Denoising Diffusion Probabilistic Models[1]

The authors are Jonathan Ho, Ajay Jain, and Pieter Abbeel from Berkeley. Paper citation [1]: Ho, Jonathan et al. “Denoising Diffusion Probabilistic Models.” ArXiv abs/2006.11239 (2020): n. pag.

Time

  • 2020.Dec

Key Words

  • Diffusion Model

Summary

  1. The authors present high-quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by non-equilibrium thermodynamics. The best results are obtained by training on a weighted variational bound, designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics. The models also naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. (A sketch of the simplified training objective follows.)
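A minimal sketch of the simplified objective (the paper's L_simple): diffuse x_0 to a random timestep with q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I) and regress the added noise with an MSE loss. `eps_model` stands in for the paper's U-Net.

```python
import torch
import torch.nn.functional as F

def ddpm_loss(eps_model, x0, alpha_bar):
    """One training step. `alpha_bar` is the precomputed cumulative product
    of (1 - beta_t) over the noise schedule, a 1-D tensor of length T."""
    b = x0.shape[0]
    t = torch.randint(0, alpha_bar.numel(), (b,), device=x0.device)
    abar = alpha_bar[t].view(b, 1, 1, 1)
    eps = torch.randn_like(x0)
    x_t = abar.sqrt() * x0 + (1 - abar).sqrt() * eps  # sample from q(x_t | x_0)
    return F.mse_loss(eps_model(x_t, t), eps)         # predict the noise
```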
Read more »

Return of Unconditional Generation: A Self-supervised Representation Generation Method[1]

The authors are Tianhong Li, Dina Katabi, and Kaiming He from MIT CSAIL. Paper citation: Li, Tianhong et al. “Return of Unconditional Generation: A Self-supervised Representation Generation Method.” (2023).

The point is to understand the essential semantic information of object images, rather than stopping at image patterns and surface features, so as to improve generalization: understanding and learning feature representations from small to large, from the subtle to the broad, from local to global.

Key Words

  • unconditional generation with unlabeled data.
  • self-supervised encoder: MoCov3 ViT-B
  • Representation Generation: RDM 12-block, 1536-hid-dim for 100 epochs
  • Image generation: MAGE-B for 200 epochs
  • Representation-Conditioned Generation(RCG)
  • generate semantic representations in the representation space

Summary

  1. Generative models have long been developed as unsupervised methods, with landmark works such as GANs, VAEs, and diffusion models. These foundational methods focus on the probability distribution of the data and do not depend on the availability of human annotations. The problem is often categorized as unconditional generation, which pursues learning complex data distributions from large amounts of unlabeled data. Closing the gap between conditional and unconditional generation is a valuable problem, and unleashing the power of large-scale unlabeled data is a necessary step toward it. (A sketch of the RCG pipeline follows.)
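A sketch of how RCG decomposes unconditional generation at sampling time, per the components listed above; `rdm` and `pixel_generator` are hypothetical handles, not the real interfaces of the released models.

```python
import torch

@torch.no_grad()
def rcg_sample(rdm, pixel_generator, n):
    # 1) The representation diffusion model (RDM) unconditionally samples
    #    semantic representations in the frozen encoder's (MoCo v3) space.
    rep = rdm.sample(num_samples=n)
    # 2) The pixel generator (MAGE-B in the paper) is conditioned on these
    #    self-generated representations instead of on human-provided labels.
    return pixel_generator.generate(condition=rep)
```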

Read more »

Deconstructing Denoising Diffusion Models for Self-supervised Learning[1]

The authors are Xinlei Chen, Zhuang Liu, Saining Xie, and Kaiming He, from FAIR and NYU. Paper citation: Chen, Xinlei et al. “Deconstructing Denoising Diffusion Models for Self-Supervised Learning.” ArXiv abs/2401.14404 (2024): n. pag.

Key Words

  • Denoising Diffusion Models
  • Denoising Autoencoder
  • low-dimensional latent space

Summary

  1. Denoising is at the core of today's generative models, such as DDMs. These generative models work very well and appear to learn representations of visual content. Two questions arise:
    • Current studies of the representation ability of DDMs use off-the-shelf pretrained DDMs, which were originally built for generation, and evaluate their representations on recognition tasks;
    • It is unclear whether the representation ability comes from the denoising-driven process or from the diffusion-driven process.
  2. The idea of the paper: deconstruct a DDM, turning it step by step into a classical DAE, and examine each of its aspects along the way. The main component identified is the tokenizer, which creates a low-dimensional latent space; the role of using multiple levels of noise is analogous to a form of data augmentation (see the sketch below).
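A minimal sketch of these two findings, under assumed shapes: a patch-wise PCA projection serves as the low-dimensional tokenizer, and per-sample random noise scales act like data augmentation on the latents.

```python
import torch

def pca_tokenize(patches, V, d=16):
    # Low-dimensional tokenizer: project flattened D-dim patches onto the
    # top-d PCA directions. V is a (D, D) orthonormal basis fit offline.
    return patches @ V[:, :d]

def add_multi_level_noise(z, sigma_max=1.0):
    # Multiple noise levels ~ a form of data augmentation: every latent in
    # the batch is corrupted with its own randomly drawn noise scale.
    sigma = torch.rand(z.shape[0], 1, device=z.device) * sigma_max
    return z + sigma * torch.randn_like(z)
```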
Read more »

Is there a tool that can visualize the execution and call relationships of functions in code? When reading large projects, the function calls look messy, and the connections are hard to remember and untangle.

Dual view, 3D trajectory
