Starred repositories

Official PyTorch implementation of "XHand: Real-time Expressive Hand Avatar"

Python 19 Updated Jul 31, 2024

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 6,603 328 Updated Aug 1, 2024

The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".

Python 114 4 Updated Jul 26, 2024

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

Python 11,133 2,320 Updated Aug 1, 2024

When do we not need larger vision models?

Python 278 9 Updated Jul 12, 2024

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python 859 49 Updated Jul 14, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Python 595 37 Updated Jul 29, 2024

Code release for PianoMotion10M

Python 44 2 Updated Jun 15, 2024

4M: Massively Multimodal Masked Modeling

Python 1,463 85 Updated Jul 17, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

205 4 Updated Jun 16, 2024

Oriented object detection on STAR dataset.

Python 35 1 Updated Jul 9, 2024

Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"

Python 731 39 Updated Jul 31, 2024
127 Updated Dec 22, 2023

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Python 2,672 238 Updated Aug 1, 2024

Efficient Multimodal Large Language Models: A Survey

198 7 Updated May 31, 2024

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Python 1,177 72 Updated Jul 16, 2024

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Python 8,147 573 Updated Aug 1, 2024

The official implementation of "Label-efficient Semantic Scene Completion with Scribble Annotations" (IJCAI 2024)

Python 11 1 Updated Jul 27, 2024

On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)

Python 380 27 Updated Jul 27, 2024

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 1,080 82 Updated Jul 26, 2024

Document Artificial Intelligence

102 2 Updated Aug 1, 2024

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Python 266 17 Updated Jul 19, 2024
Python 1,414 78 Updated Jul 29, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,179 120 Updated Jun 26, 2024

A native PyTorch Library for large model training

Python 1,387 125 Updated Aug 1, 2024

Open weights LLM from Google DeepMind.

Python 2,287 282 Updated Jul 30, 2024

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Python 772 54 Updated Jul 10, 2024

The official Meta Llama 3 GitHub site

Python 25,074 2,753 Updated Jul 31, 2024