
Organizations

@mbzuai-oryx

hanoonaR/README.md

Hi there 👋

I am a Ph.D. student in Computer Vision at MBZUAI. My research explores multi-modal understanding from vision and language to build scalable, general-purpose vision systems that learn continually and generalize to various domains and downstream tasks using an open vocabulary.

Pinned

  1. mbzuai-oryx/groundingLMM Public

    [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.

    Python · 951 stars · 54 forks

  2. mbzuai-oryx/LLaVA-pp Public

    🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

    Python · 845 stars · 61 forks

  3. mbzuai-oryx/Video-ChatGPT Public

    [ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted fo…

    Python · 1.5k stars · 130 forks

  4. object-centric-ovd Public

    [NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection".

    Jupyter Notebook · 296 stars · 21 forks

  5. muzairkhattak/multimodal-prompt-learning Public

    [CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".

    Python · 812 stars · 66 forks

  6. muzairkhattak/ViFi-CLIP Public

    [CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".

    Python · 306 stars · 24 forks