CLIP
Learning Transferable Visual Models From Natural Language Supervision[1]
The authors are Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever from OpenAI. Citation [1]: Radford, Alec et al. "Learning Transferable Visual Models From Natural Language Supervision." International Conference on Machine Learning (2021).
Time
- Feb. 2021
Key Words
- image-text pairs
- CLIP: Contrastive Language-Image Pre-training
- Learning from natural language supervision
- learns to perform a wide set of tasks during pre-training, including OCR, geo-localization, action recognition, and more
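
The contrastive pre-training objective behind CLIP can be sketched as a symmetric cross-entropy loss over cosine similarities between matched image and text embeddings in a batch. Below is a minimal NumPy sketch of that idea; the function name, temperature value, and embedding shapes are illustrative assumptions, not the paper's exact implementation (which trains a learned temperature and uses much larger batches).

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE-style) loss over a batch of pairs.

    image_emb, text_emb: (n, d) arrays; row i of each is a matched
    image-text pair. Temperature 0.07 is an illustrative choice.
    """
    # L2-normalize so the dot product is a cosine similarity
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    # Pairwise similarity logits, scaled by temperature; shape (n, n)
    logits = image_emb @ text_emb.T / temperature

    # Correct pairings lie on the diagonal
    n = logits.shape[0]
    labels = np.arange(n)

    def cross_entropy(l, y):
        # Numerically stable log-softmax cross-entropy per row
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions
    loss_i = cross_entropy(logits, labels)
    loss_t = cross_entropy(logits.T, labels)
    return (loss_i + loss_t) / 2
```

Perfectly aligned embeddings drive the loss toward zero, while mismatched pairs are penalized in both retrieval directions, which is what makes the learned representations transfer to zero-shot classification.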