CS PhD student at USC's Institute for Creative Technologies, working with Prof. Mohammad Soleymani at the Intelligent Human Perception Lab. Bronze Medallist from IIT Roorkee (2021).
I build and improve multimodal LLMs (audio/video/omni), specifically using post-training techniques like preference optimization to give models better social and emotion understanding. I also work on video generation for social behaviors.
| Paper | Venue |
|---|---|
| MoD-DPO: Mitigating cross-modal hallucinations in Omni LLMs | CVPR 2026, Denver |
| AVERE: Audiovisual emotion reasoning with preference optimization | ICLR 2026, Rio |
| Face-LLaVA: Facial expression understanding via instruction tuning | WACV 2026, Tucson |
| DiTaiListener: Controllable listener video generation | ICCV 2025, Hawai'i |
Multimodal LLMs · Post-training & RLHF · Emotion Understanding · Social AI · Video Generation · Audio/Visual Reasoning
Currently looking for Research/Applied Scientist internships on multimodal LLMs and video generation; feel free to reach out!

