- Zhejiang University
- Hangzhou
- https://cslwt.github.io
Starred repositories
Official pytorch implementation of "XHand: Real-time Expressive Hand Avatar"
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM".
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
When do we not need larger vision models?
Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Oriented object detection on the STAR dataset.
Official implementation of "Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling"
ms-swift: Use PEFT or full-parameter training to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
Efficient Multimodal Large Language Models: A Survey
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
The official implementation of "Label-efficient Semantic Scene Completion with Scribble Annotations" (IJCAI 2024)
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
A native PyTorch Library for large model training
Open weights LLM from Google DeepMind.
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)