Masked vision and language modeling for multi-modal representation learning

Published in arXiv preprint arXiv:2208.02131, 2022

Recommended citation: Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, and Stefano Soatto (2022). "Masked vision and language modeling for multi-modal representation learning." arXiv preprint arXiv:2208.02131. https://arxiv.org/abs/2208.02131


Citations: 82
