I am a master's student at the University of California, Los Angeles, working with Prof. Aditya Grover on diffusion-based language models. Previously, I worked as a Research Assistant at Nanyang Technological University, Singapore, with Prof. Albert Li on AI planning with (large) language models. I completed my bachelor's in Electronics Engineering at BITS Pilani, India.
I did my undergraduate thesis with Prof. Donglai Wei at Boston College, in collaboration with Harvard VCG, working on multimodal learning. I have also been advised by Prof. Rohit Babbar at Aalto University, working on extreme multilabel classification for retrieval problems.
Email / CV / GitHub / Twitter / Google Scholar / LinkedIn
I am primarily interested in two streams of research: (i) AI planning and sequential decision making, and (ii) retrieval methods for search and recommendation. I find many synergies between these fields, especially given the emergent agentic behaviour of large language models. My eventual goal is to develop foundation models for planning, which is also the focus of my thesis.
We study an important component of LM-based tree-search algorithms, the heuristic, by disentangling the search process from heuristic learning. We then develop a mathematical model for selecting training data suited to both the search algorithm and the learned heuristic, achieving significant speed-ups in finding solutions to classical planning problems.
We propose a “Pick-Some-Labels” reduction for multilabel classification, a relaxation of the conventional “Pick-All-Labels” reduction. Coupling this with supervised contrastive learning, we develop UniDEC, a framework that concurrently trains a dual encoder and a classifier. UniDEC achieves state-of-the-art performance on a single GPU, rivalling baselines that require 8-16 GPUs.
We take a data-centric approach to short-text extreme classification and propose two data augmentation methods, LabelMix and Gandalf, derived from label-to-label correlations in the training set. We demonstrate their effect on prior architectures and advance the SOTA by imbuing effective inductive biases that were missing from previous models.
We hypothesise that machine translation can be improved by introducing a visual component. To this end, we design CLIPTrans, a new architecture combining the multimodal CLIP with the multilingual mBART. We demonstrate significant improvements over the previous multimodal machine translation (MMT) SOTA, especially on low-resource languages.
We develop a lightweight convolutional encoder, InceptionXML, within a dynamic negative-sampling framework, SyncXML, for short-text extreme classification. Together, they surpass the previous SOTA on a multitude of performance and parameter-efficiency metrics.