Masked vision and language modeling for multi-modal representation learning
Published in arXiv preprint arXiv:2208.02131, 2022
Recommended citation: Gukyeong Kwon and Zhaowei Cai and Avinash Ravichandran and Erhan Bas and Rahul Bhotika and Stefano Soatto (2022). "Masked vision and language modeling for multi-modal representation learning" arXiv preprint arXiv:2208.02131. https://arxiv.org/abs/2208.02131
Citations: 82