I am a research assistant at Nanyang Technological University, Singapore, working with Dr. Albert Li on vision-language models. I hold a bachelor's degree in Electronics Engineering from BITS Pilani, India.
Previously, I was advised by Dr. Donglai Wei at Boston College, where I worked on multimodal AI, and by Dr. Rohit Babbar at Aalto University, where I worked on extreme classification for search and recommendation.
I have also worked at the Harvard Visual Computing Group on neuron instance segmentation, advised by Dr. Hanspeter Pfister.
Email / CV / GitHub / LinkedIn
Presently, I am exploring large language models with the aim of building an intuition for their behaviour. My long-term goal is to develop multimodal conversational search and recommendation systems. My research focuses on improving quantitative performance while keeping computational costs low, making models more accessible to the research community.
We hypothesise that machine translation can be improved by introducing a visual component. To this end, we design CLIPTrans, a new architecture that combines the multimodal CLIP with the multilingual mBART. We demonstrate significant improvements over the previous multimodal machine translation (MMT) state of the art (SOTA), especially on low-resource languages.
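For a flavour of the idea, here is a minimal sketch (not the paper's code) of one common way to bridge two pretrained models: a small mapping network turns a CLIP embedding into a short sequence of mBART-sized vectors that can be prepended to the translation model's inputs. All dimensions, module names, and the prefix length below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Maps one CLIP embedding to a short sequence of mBART-sized vectors."""
    def __init__(self, clip_dim=512, mbart_dim=1024, prefix_len=10):
        super().__init__()
        self.prefix_len = prefix_len
        self.mbart_dim = mbart_dim
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, mbart_dim * prefix_len),
            nn.GELU(),
        )

    def forward(self, clip_emb):            # (B, clip_dim)
        prefix = self.proj(clip_emb)        # (B, mbart_dim * prefix_len)
        return prefix.view(-1, self.prefix_len, self.mbart_dim)

mapper = MappingNetwork()
image_emb = torch.randn(4, 512)             # stand-in for a CLIP image embedding
prefix = mapper(image_emb)                  # (4, 10, 1024): prepend to mBART inputs
```

Keeping the bridge this small means both pretrained backbones can stay frozen while only the mapping network is trained.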
We develop InceptionXML, a lightweight convolutional encoder, within SyncXML, a dynamic negative-sampling framework for short-text extreme classification. Together they surpass the previous SOTA on a range of accuracy and parameter-efficiency metrics.
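To illustrate why negative sampling matters at this scale, here is a generic sketch of hard-negative shortlisting (a common technique in extreme classification, not necessarily SyncXML's exact scheme): rather than computing a loss over the full label set, each step trains only on the ground-truth labels plus the highest-scoring negatives. The function name and shortlist size are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def shortlisted_ova_loss(logits, positives, num_negatives=50):
    """logits: (B, L) scores over the full label set;
    positives: (B, L) binary ground-truth label matrix."""
    pos_mask = positives.bool()
    # Hardest negatives = highest-scoring labels that are not ground truth.
    neg_scores = logits.masked_fill(pos_mask, float("-inf"))
    hard_negs = neg_scores.topk(num_negatives, dim=1).values   # (B, k)
    # One-vs-all BCE restricted to positives + shortlisted negatives.
    pos_term = F.softplus(-logits[pos_mask]).sum()   # -log sigmoid(s+)
    neg_term = F.softplus(hard_negs).sum()           # -log(1 - sigmoid(s-))
    return (pos_term + neg_term) / logits.size(0)

scores = torch.randn(8, 10_000)                      # toy label space of 10k
labels = (torch.rand(8, 10_000) < 0.001).float()     # sparse ground truth
loss = shortlisted_ova_loss(scores, labels)
```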
We take a data-centric approach to short-text extreme classification, proposing two data augmentation methods, LabelMix and Gandalf, derived from label-to-label correlations in the training set. We demonstrate their effect on earlier architectures and advance the SOTA by instilling effective inductive biases that previous models lacked.
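As a rough illustration of mining label-to-label correlations (a hedged sketch of the general idea, not the actual LabelMix/Gandalf code), one can normalise the label co-occurrence matrix of the training set and keep each label's strongest correlates as a soft target vector; the function name and top-k cutoff below are assumptions.

```python
import numpy as np

def label_correlation_targets(Y, topk=5):
    """Y: (N, L) binary instance-label matrix from the training set.
    Returns (L, L): row l holds soft targets over labels correlated with l."""
    co = Y.T @ Y                              # (L, L) label co-occurrence counts
    freq = np.maximum(np.diag(co), 1)
    corr = co / freq[:, None]                 # roughly P(label j | label l)
    np.fill_diagonal(corr, 1.0)               # a label always implies itself
    soft = np.zeros_like(corr, dtype=float)
    idx = np.argsort(-corr, axis=1)[:, :topk] # keep top-k correlates per label
    rows = np.arange(corr.shape[0])[:, None]
    soft[rows, idx] = corr[rows, idx]
    return soft

Y = (np.random.rand(1000, 200) < 0.02).astype(float)
targets = label_correlation_targets(Y)        # each row can serve as a soft label
```

Each row of such a matrix can then back an augmented training instance, injecting correlation structure the encoder would not otherwise see from short texts alone.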