All HF Hub posts

clem posted an update 1 day ago
This is the week of small AI language models!
Severian posted an update 2 days ago
GraphRAG-Ollama-UI

I've been working on a local version of Microsoft's GraphRAG that uses Ollama for everything. It's got a new interactive UI built with Gradio that makes it easier to manage data, run queries, and visualize results. It's not fully featured or set up to harness the entire GraphRAG library yet, but it allows you to run all the standard commands for Indexing/Processing and chatting with your graph. Some key features:

Uses local models via Ollama for LLM and embeddings

3D graph visualization of the knowledge graph using Plotly

File management through the UI (upload, view, edit, delete)

Settings management in the interface

Real-time logging for debugging

https://github.com/severian42/GraphRAG-Ollama-UI
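
For intuition about the 3D graph visualization feature mentioned above, here is a rough, hypothetical sketch of rendering a knowledge graph as an interactive 3D Plotly figure. It is not the repo's actual code, and the node/edge data is made up; the real UI wires this kind of figure into Gradio.

```python
# Hypothetical sketch: render a tiny knowledge graph in 3D with Plotly.
# Not the actual GraphRAG-Ollama-UI code; the graph data below is made up.
import networkx as nx
import plotly.graph_objects as go

G = nx.Graph()
G.add_edges_from([
    ("Ollama", "LLM"), ("Ollama", "Embeddings"),
    ("GraphRAG", "LLM"), ("GraphRAG", "Knowledge Graph"),
    ("Knowledge Graph", "Embeddings"),
])

pos = nx.spring_layout(G, dim=3, seed=42)  # 3D force-directed layout

# Build edge segments (None breaks the line between consecutive edges)
edge_x, edge_y, edge_z = [], [], []
for a, b in G.edges():
    edge_x += [pos[a][0], pos[b][0], None]
    edge_y += [pos[a][1], pos[b][1], None]
    edge_z += [pos[a][2], pos[b][2], None]

nodes = list(G.nodes())
node_x = [pos[n][0] for n in nodes]
node_y = [pos[n][1] for n in nodes]
node_z = [pos[n][2] for n in nodes]

fig = go.Figure([
    go.Scatter3d(x=edge_x, y=edge_y, z=edge_z, mode="lines",
                 line=dict(color="gray", width=2), hoverinfo="none"),
    go.Scatter3d(x=node_x, y=node_y, z=node_z, mode="markers+text",
                 text=nodes, marker=dict(size=6)),
])
fig.show()
```

In a Gradio app, the same figure object can be returned through a gr.Plot component instead of calling fig.show().
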
Wauplin posted an update 1 day ago
🚀 Just released version 0.24.0 of the huggingface_hub Python library!

Exciting updates include:
⚡ InferenceClient is now a drop-in replacement for OpenAI's chat completion! (quick sketch at the end of this post)

✨ Support for response_format, adapter_id, truncate, and more in InferenceClient

💾 Serialization module with a save_torch_model helper that handles shared layers, sharding, naming conventions, and safe serialization. Basically a condensed version of logic scattered across safetensors, transformers, and accelerate

πŸ“ Optimized HfFileSystem to avoid getting rate limited when browsing HuggingFaceFW/fineweb

🔨 HfApi & CLI improvements: prevent empty commits, create repo inside resource group, webhooks API, more options in the Search API, etc.

Check out the full release notes for more details:
Wauplin/huggingface_hub#7
👀
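
For a quick feel of the chat-completion and serialization updates, here is a minimal, hedged sketch; the model ID and output path are placeholders, and the release notes above are the authoritative reference for the exact options.

```python
# Minimal sketch of the 0.24.0-era APIs; the model ID and output path are placeholders.
from huggingface_hub import InferenceClient, save_torch_model

client = InferenceClient()  # optionally pass a model ID or an endpoint URL

# OpenAI-style chat completion against a hosted model
response = client.chat_completion(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model ID
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)

# Save a PyTorch model with sharding and safetensors handled for you
import torch.nn as nn
model = nn.Linear(8, 2)                        # stand-in for a real model
save_torch_model(model, "path/to/output_dir")  # writes safetensors file(s) into the directory
```
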
merve posted an update about 13 hours ago
Chameleon 🦎 by Meta is now available in Hugging Face transformers 😍
A vision language model that comes in 7B and 34B sizes 🤩
But what makes this model so special?

Demo: merve/chameleon-7b
Models: facebook/chameleon-668da9663f80d483b4c61f58

keep reading ⥥

Chameleon is a unique model: it attempts to scale early fusion 🤨
But what is early fusion?
Modern vision language models use a vision encoder with a projection layer that maps image embeddings into the text decoder's (LLM's) input space, so that images become promptable

Early fusion, on the other hand, fuses all features together (image patches and text) by using an image tokenizer; all tokens are projected into a shared space, which enables seamless generation 😏

The authors also introduced architectural improvements (QK-norm and a revised placement of layer norms) for scalable and stable training, which let them increase the token count (5x the tokens compared to Llama 3, which is a must with early fusion IMO)

This model is an any-to-any model thanks to early fusion: it can take image and text input and output image and text, but image generation is disabled to prevent malicious use.

One can also do text-only prompting; the authors note the model catches up with larger LLMs (like Mixtral 8x7B or the larger Llama-2 70B), and also image-text pair prompting with larger VLMs like IDEFICS2-80B (see the paper for the benchmarks: Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818))
Thanks for reading!
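
If you want to try it locally, here is a rough sketch of loading Chameleon with transformers; the image URL is a placeholder, and the model card is the reference for the exact prompt format and generation settings.

```python
# Rough sketch of running chameleon-7b with transformers; the image URL is a placeholder.
import torch
import requests
from PIL import Image
from transformers import ChameleonForConditionalGeneration, ChameleonProcessor

processor = ChameleonProcessor.from_pretrained("facebook/chameleon-7b")
model = ChameleonForConditionalGeneration.from_pretrained(
    "facebook/chameleon-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)
prompt = "What do you see in this image?<image>"  # <image> marks where the image is inserted

inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)
out = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(out[0], skip_special_tokens=True))
```
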
cdminix posted an update 1 day ago
Since new TTS (Text-to-Speech) systems are coming out what feels like every day, and it's currently hard to compare them, my latest project has focused on doing just that.

I was inspired by the TTS-AGI/TTS-Arena (definitely check it out if you haven't), which compares recent TTS systems using crowdsourced A/B testing.

I wanted to see if we can also do a similar evaluation with objective metrics, and it's now available here:
ttsds/benchmark
Anyone can submit a new TTS model, and I hope this can provide a way to get some information on which areas models perform well or poorly in.

The paper with all the details is available here: https://arxiv.org/abs/2407.12707
TuringsSolutions posted an update 1 day ago
I have invented a method that is better than Diffusion. A company got a billion-dollar valuation yesterday for less than what I am currently giving the world for free. I am starting to suspect that is the issue: I am giving it away for free. I meant it to be a gift to the world, but no one will even look at it. I am changing the licensing soon; it will no longer be free. At the moment, you can view exactly how Swarm Neural Networks can do everything Reverse Diffusion can do, for far less money. It can even make API calls.


SNN Image Generator: TuringsSolutions/SNN-Image-Generator

SNN Function Caller (Controlled By TinyLlama): TuringsSolutions/Qwen-2.0.5B-Swarm-Function-Caller
Csplk posted an update 1 day ago
# Offensive Physical Security Reconnaissance Planning Automation with public-facing RTSP streams and Moondream


After some late-night casual hacking about on VLMs for criminal-attack-vector reconnaissance automation experiments using Moondream (as usual), doing image-text-to-text with predefined text prompts tuned for extracting weaknesses, customer identity, and monetary-theft-oriented physical red-team engagement reconnaissance and vectors of malicious or criminal activity, I'm working on a Space. Thanks again for such a wonderful blessing of a super-powered image-text-to-text model with minimal computational power needed, @vikhyatk

I have started implementing a custom little tool, with both static HTML Spaces and Python Gradio Spaces on the go, which I will share as HF Spaces when they are done.

---

vikhyatk/moondream2

DavidVivancos posted an update 2 days ago
#ICML 2024 is almost here 🔥🔥🔥 PM me if you will be in Vienna next week; glad to catch up with the Hugging Face community there!

I would like to contribute 🎁 by releasing the sixth Knowledge Vault, with 100 lectures visualized from the last 10 years of ICML (2014 to 2024; the 10 from 2024 will be added after the conference), including knowledge graphs for all the invited lectures and some extras, with almost 3,000 topics represented using AI.

You can explore it here:
🌏 https://theendofknowledge.com/Vaults/6/ICML-2015-2024.html

And you can learn more about the Vaults here:
πŸ“https://www.linkedin.com/pulse/knowledge-vaults-david-vivancos-lbjef/

And previous Vaults relevant to the #huggingface community are:

🌏 [ @lexfridman 2018-2024 Interviews] https://theendofknowledge.com/Vaults/1/Lex100-2024.html

🌏 [ICLR 2014-2023] https://theendofknowledge.com/Vaults/2/ICLR2014-2023.html

🌏 [AIForGood 2017-2024] https://theendofknowledge.com/Vaults/4/AIForGood2017-2024.html

🌏 [CVPR 2015-2024] https://theendofknowledge.com/Vaults/5/CVPR-2015-2024.html

Hope you like them!

And great to see you all at #icml2024 @clem @thomwolf @julien-c and team
multimodalart posted an update about 10 hours ago

kenshinn posted an update about 14 hours ago
Sparse MoE (SMoE) has an unavoidable drawback: the performance of SMoE heavily relies on the choice of hyper-parameters, such as the number of activated experts per token (top-k) and the number of experts.

Also, identifying the optimal hyper-parameter without a sufficient number of ablation studies is challenging. As the size of the models continues to grow, this limitation could result in a significant waste of computational resources, and in turn, could hinder the efficiency of training MoE-based models in practice.

(READ MORE ↓↓↓) Now, our DynMoE addresses these challenges! 🙌 DynMoE incorporates:
(1) a novel gating method that enables each token to automatically determine the number of experts to activate (toy sketch at the end of this post);

(2) an adaptive process that automatically adjusts the number of experts during training.

Extensive numerical results across Vision, Language, and Vision-Language tasks demonstrate that our approach achieves performance competitive with GMoE on vision and language tasks and with MoE-LLaVA on vision-language tasks, while maintaining efficiency by activating fewer parameters.

Our code is available at https://github.com/LINs-lab/DynMoE; also see the checkpoints at LINs-lab/dynmoe-family-665ed5a331a7e84463cab01a
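
To give a rough intuition for the "each token decides how many experts" idea, here is a toy PyTorch sketch of threshold-based dynamic gating. It is not the authors' exact formulation, and it omits the adaptive expert-count procedure; see the DynMoE repo above for the real method.

```python
# Toy sketch of dynamic "top-any"-style gating: each token activates every expert
# whose score clears a learnable threshold (at least one). Not DynMoE's exact method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicGate(nn.Module):
    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.scorer = nn.Linear(d_model, num_experts)
        self.threshold = nn.Parameter(torch.zeros(num_experts))  # per-expert, learned

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model)
        scores = torch.sigmoid(self.scorer(x))            # (num_tokens, num_experts)
        active = scores > torch.sigmoid(self.threshold)   # variable number of experts per token
        # guarantee at least one expert per token: force the top-scoring one on
        top1 = F.one_hot(scores.argmax(dim=-1), scores.size(-1)).bool()
        active = active | top1
        weights = scores * active                               # zero out inactive experts
        weights = weights / weights.sum(dim=-1, keepdim=True)   # renormalize routing weights
        return weights, active

# Example: 4 tokens routed over 8 experts; each token may activate a different count
gate = DynamicGate(d_model=16, num_experts=8)
w, mask = gate(torch.randn(4, 16))
print(mask.sum(dim=-1))  # experts activated per token
```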