All HF Hub posts

clem posted an update 1 day ago
This is the week of small AI language models!
Severian posted an update 2 days ago
GraphRAG-Ollama-UI

I've been working on a local version of Microsoft's GraphRAG that uses Ollama for everything. It's got a new interactive UI built with Gradio that makes it easier to manage data, run queries, and visualize results. It's not fully featured or set up to harness the entire GraphRAG library yet, but it allows you to run all the standard commands for Indexing/Processing and chatting with your graph. Some key features:

Uses local models via Ollama for LLM and embeddings

3D graph visualization of the knowledge graph using Plotly

File management through the UI (upload, view, edit, delete)

Settings management in the interface

Real-time logging for debugging

https://github.com/severian42/GraphRAG-Ollama-UI
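
For intuition about the 3D graph visualization feature mentioned above, here is a rough, hypothetical sketch of rendering a knowledge graph as an interactive 3D Plotly figure. It is not the repo's actual code, and the node/edge data is made up; the real UI wires this kind of figure into Gradio.

```python
# Hypothetical sketch: render a tiny knowledge graph in 3D with Plotly.
# Not the actual GraphRAG-Ollama-UI code; the graph data below is made up.
import networkx as nx
import plotly.graph_objects as go

G = nx.Graph()
G.add_edges_from([
    ("Ollama", "LLM"), ("Ollama", "Embeddings"),
    ("GraphRAG", "LLM"), ("GraphRAG", "Knowledge Graph"),
    ("Knowledge Graph", "Embeddings"),
])

pos = nx.spring_layout(G, dim=3, seed=42)  # 3D force-directed layout

# Build edge segments (None breaks the line between consecutive edges)
edge_x, edge_y, edge_z = [], [], []
for a, b in G.edges():
    edge_x += [pos[a][0], pos[b][0], None]
    edge_y += [pos[a][1], pos[b][1], None]
    edge_z += [pos[a][2], pos[b][2], None]

nodes = list(G.nodes())
node_x = [pos[n][0] for n in nodes]
node_y = [pos[n][1] for n in nodes]
node_z = [pos[n][2] for n in nodes]

fig = go.Figure([
    go.Scatter3d(x=edge_x, y=edge_y, z=edge_z, mode="lines",
                 line=dict(color="gray", width=2), hoverinfo="none"),
    go.Scatter3d(x=node_x, y=node_y, z=node_z, mode="markers+text",
                 text=nodes, marker=dict(size=6)),
])
fig.show()
```

In a Gradio app, the same figure object can be returned through a gr.Plot component instead of calling fig.show().
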
Wauplin posted an update 1 day ago
🚀 Just released version 0.24.0 of the huggingface_hub Python library!

Exciting updates include:
⚡ InferenceClient is now a drop-in replacement for OpenAI's chat completion! (quick sketch at the end of this post)

✨ Support for response_format, adapter_id, truncate, and more in InferenceClient

💾 Serialization module with a save_torch_model helper that handles shared layers, sharding, naming conventions, and safe serialization. Basically a condensed version of logic scattered across safetensors, transformers, and accelerate

πŸ“ Optimized HfFileSystem to avoid getting rate limited when browsing HuggingFaceFW/fineweb

🔨 HfApi & CLI improvements: prevent empty commits, create repo inside resource group, webhooks API, more options in the Search API, etc.

Check out the full release notes for more details:
Wauplin/huggingface_hub#7
👀
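
For a quick feel of the chat-completion and serialization updates, here is a minimal, hedged sketch; the model ID and output path are placeholders, and the release notes above are the authoritative reference for the exact options.

```python
# Minimal sketch of the 0.24.0-era APIs; the model ID and output path are placeholders.
from huggingface_hub import InferenceClient, save_torch_model

client = InferenceClient()  # optionally pass a model ID or an endpoint URL

# OpenAI-style chat completion against a hosted model
response = client.chat_completion(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)

# Save a PyTorch model with sharding and safetensors handled for you
import torch.nn as nn
model = nn.Linear(8, 2)                        # stand-in for a real model
save_torch_model(model, "path/to/output_dir")  # writes safetensors file(s) into the directory
```
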
merve posted an update about 13 hours ago
Chameleon 🦎 by Meta is now available in Hugging Face transformers 😍
A vision language model that comes in 7B and 34B sizes 🤩
But what makes this model so special?

Demo: merve/chameleon-7b
Models: facebook/chameleon-668da9663f80d483b4c61f58

keep reading ⥥

Chameleon is a unique model: it attempts to scale early fusion 🤨
But what is early fusion?
Modern vision language models use a vision encoder with a projection layer that maps image embeddings into the text decoder's (LLM's) input space, so that images become promptable

Early fusion, on the other hand, fuses all features together (image patches and text) by using an image tokenizer; all tokens are projected into a shared space, which enables seamless generation 😏

The authors also introduced architectural improvements (QK-norm and a revised placement of layer norms) for scalable and stable training, which let them increase the token count (5x the tokens compared to Llama 3, which is a must with early fusion IMO)

This model is an any-to-any model thanks to early fusion: it can take image and text input and output image and text, but image generation is disabled to prevent malicious use.

One can also do text-only prompting; the authors note the model catches up with larger LLMs (like Mixtral 8x7B or the larger Llama-2 70B), and also image-text pair prompting with larger VLMs like IDEFICS2-80B (see the paper for the benchmarks: Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818))
Thanks for reading!
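
If you want to try it locally, here is a rough sketch of loading Chameleon with transformers; the image URL is a placeholder, and the model card is the reference for the exact prompt format and generation settings.

```python
# Rough sketch of running chameleon-7b with transformers; the image URL is a placeholder.
import torch
import requests
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)
prompt = "What do you see in this image?<image>"  # <image> marks where the image is inserted

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)
out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))
```
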
cdminix posted an update 1 day ago
Since new TTS (Text-to-Speech) systems are coming out what feels like every day, and it's currently hard to compare them, my latest project has focused on doing just that.

I was inspired by the TTS-AGI/TTS-Arena (definitely check it out if you haven't), which compares recent TTS systems using crowdsourced A/B testing.

I wanted to see if we can also do a similar evaluation with objective metrics, and it's now available here:
ttsds/benchmark
Anyone can submit a new TTS model, and I hope this can provide a way to get some information on which areas models perform well or poorly in.

The paper with all the details is available here: https://arxiv.org/abs/2407.12707
TuringsSolutions posted an update 1 day ago
I have invented a method that is better than Diffusion. A company got a billion-dollar valuation yesterday for less than what I am currently giving the world for free. I am starting to suspect that is the issue: I am giving it away for free. I meant it to be a gift to the world, but no one will even look at it. I am changing the licensing soon; it will no longer be free. At the moment, you can view exactly how Swarm Neural Networks can do everything Reverse Diffusion can do, for far less money. It can even make API calls.


SNN Image Generator: TuringsSolutions/SNN-Image-Generator

SNN Function Caller (Controlled By TinyLlama): TuringsSolutions/Qwen-2.0.5B-Swarm-Function-Caller
Csplk posted an update 1 day ago
# Offensive Physical Security Reconnaissance Planning Automation with public-facing RTSP streams and Moondream


After some late-night casual hacking about on VLMs for criminal-attack-vector reconnaissance automation experiments using Moondream (as usual), doing image-text-to-text with predefined text prompts tuned for extracting weaknesses, customer identity, and monetary-theft-oriented physical red-team engagement reconnaissance and vectors of malicious or criminal activity, I'm working on a Space. Thanks again for such a wonderful blessing of a super-powered image-text-to-text model with minimal computational power needed, @vikhyatk

I have started implementing a custom little tool, with both static HTML Spaces and Python Gradio Spaces on the go, which I will share as HF Spaces when they are done.

---

vikhyatk/moondream2

DavidVivancos posted an update 2 days ago
#ICML 2024 is almost here 🔥🔥🔥 PM me if you will be in Vienna next week; glad to catch up with the Hugging Face community there!

I would like to contribute 🎁 by releasing the sixth Knowledge Vault, with 100 lectures visualized from the last 10 years of ICML (2014 to 2024; the 10 from 2024 will be added after the conference), including knowledge graphs for all the invited lectures and some extras, with almost 3,000 topics represented using AI.

You can explore it here:
🌏 https://theendofknowledge.com/Vaults/6/ICML-2015-2024.html

And you can learn more about the Vaults here:
πŸ“https://www.linkedin.com/pulse/knowledge-vaults-david-vivancos-lbjef/

And previous Vaults relevant to the #huggingface community are:

🌏 [ @lexfridman 2018-2024 Interviews] https://theendofknowledge.com/Vaults/1/Lex100-2024.html

🌏 [ICLR 2014-2023] https://theendofknowledge.com/Vaults/2/ICLR2014-2023.html

🌏 [AIForGood 2017-2024] https://theendofknowledge.com/Vaults/4/AIForGood2017-2024.html

🌏 [CVPR 2015-2024] https://theendofknowledge.com/Vaults/5/CVPR-2015-2024.html

Hope you like them!

And great to see you all at #icml2024 @clem @thomwolf @julien-c and team
multimodalart posted an update about 10 hours ago

kenshinn posted an update about 14 hours ago
Sparse MoE (SMoE) has an unavoidable drawback: the performance of SMoE heavily relies on the choice of hyper-parameters, such as the number of activated experts per token (top-k) and the number of experts.

Also, identifying the optimal hyper-parameter without a sufficient number of ablation studies is challenging. As the size of the models continues to grow, this limitation could result in a significant waste of computational resources, and in turn, could hinder the efficiency of training MoE-based models in practice.

(READ MORE ↓↓↓) Now, our DynMoE addresses these challenges! 🙌 DynMoE incorporates:
(1) a novel gating method that enables each token to automatically determine the number of experts to activate (toy sketch at the end of this post);

(2) an adaptive process that automatically adjusts the number of experts during training.

Extensive numerical results across Vision, Language, and Vision-Language tasks demonstrate that our approach achieves performance competitive with GMoE on vision and language tasks and with MoE-LLaVA on vision-language tasks, while maintaining efficiency by activating fewer parameters.

Our code is available at https://github.com/LINs-lab/DynMoE; also see the checkpoints at LINs-lab/dynmoe-family-665ed5a331a7e84463cab01a
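
To give a rough intuition for the "each token decides how many experts" idea, here is a toy PyTorch sketch of threshold-based dynamic gating. It is not the authors' exact formulation, and it omits the adaptive expert-count procedure; see the DynMoE repo above for the real method.

```python
# Toy sketch of dynamic "top-any"-style gating: each token activates every expert
# whose score clears a learnable threshold (at least one). Not DynMoE's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGate(nn.Module):
    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, num_experts)
        self.threshold = nn.Parameter(torch.zeros(num_experts))  # per-expert, learned

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        scores = torch.sigmoid(self.scorer(x))            # (num_tokens, num_experts)
        active = scores > torch.sigmoid(self.threshold)   # variable number of experts per token
        # guarantee at least one expert per token: force the top-scoring one on
        top1 = F.one_hot(scores.argmax(dim=-1), scores.size(-1)).bool()
        active = active | top1
        weights = scores * active                               # zero out inactive experts
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize routing weights
        return weights, active

# Example: 4 tokens routed over 8 experts; each token may activate a different count
gate = DynamicGate(d_model=16, num_experts=8)
w, mask = gate(torch.randn(4, 16))
print(mask.sum(dim=-1))  # experts activated per token
```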