Lumos: Empowering Multimodal LLMs with Scene Text Recognition
Ashish Shenoy, Abhay Harpale, Ankit Ramchandani, Anuj Kumar, Debojeet Chatterjee, Di Xu, Luna Dong, Mohsen Moslehpour, Pierce Chuang, Shicong Zhao, Srihari Jayakumar, Vikas Bhardwaj, Yichao Lu