All HF Hub posts
Post
2254
GraphRAG-Ollama-UI
I've been working on a local version of Microsoft's GraphRAG that uses Ollama for everything. It's got a new interactive UI built with Gradio that makes it easier to manage data, run queries, and visualize results. It's not fully featured or set up to harness the entire GraphRAG library yet but it allows you to run all the standard commands for Indexing/Processing and chatting with your graph. Some key features:
Uses local models via Ollama for LLM and embeddings
3D graph visualization of the knowledge graph using Plotly
File management through the UI (upload, view, edit, delete)
Settings management in the interface
Real-time logging for debugging
https://github.com/severian42/GraphRAG-Ollama-UI
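For anyone curious what "uses Ollama for everything" looks like under the hood, here is a hedged, stdlib-only sketch (not the project's actual code) of talking to a local Ollama server, which serves a simple HTTP API on localhost:11434. The endpoint path and payload shape follow Ollama's documented /api/chat interface; the model name is just an example.

```python
# Minimal sketch of calling a locally running Ollama server (assumption:
# `ollama serve` is running on the default port). Not the repo's actual code.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def build_chat_payload(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a stream
    }

def ask_ollama(model: str, prompt: str) -> str:
    """Send one chat turn to the local Ollama server and return the reply."""
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/chat",
        data=json.dumps(build_chat_payload(model, prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]

# Requires a running Ollama server, e.g.:
# print(ask_ollama("mistral", "Summarize my knowledge graph."))
```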
Post
1287
Just released version 0.24.0 of the huggingface_hub Python library!
Exciting updates include:
- InferenceClient is now a drop-in replacement for OpenAI's chat completion!
- Support for response_format, adapter_id, truncate, and more in InferenceClient
- Serialization module with a save_torch_model helper that handles shared layers, sharding, naming conventions, and safe serialization. Basically a condensed version of logic scattered across safetensors, transformers, and accelerate
- Optimized HfFileSystem to avoid getting rate-limited when browsing HuggingFaceFW/fineweb
- HfApi & CLI improvements: prevent empty commits, create repos inside a resource group, webhooks API, more options in the Search API, etc.
Check out the full release notes for more details:
Wauplin/huggingface_hub#7
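To illustrate the "drop-in replacement" point: with huggingface_hub >= 0.24, InferenceClient mirrors the OpenAI client's chat-completion surface, so OpenAI-style code ports over almost unchanged. A hedged sketch (the model name is just an example, and a network connection plus possibly an HF token are needed to actually run it):

```python
# Sketch of the OpenAI-compatible interface on InferenceClient.
# Assumption: huggingface_hub >= 0.24 is installed; model name is illustrative.
def chat(prompt: str, model: str = "meta-llama/Meta-Llama-3-8B-Instruct") -> str:
    from huggingface_hub import InferenceClient

    client = InferenceClient(model)
    # Same call shape as openai's client.chat.completions.create(...)
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    return response.choices[0].message.content

# Needs network access (and possibly a token), so not executed here:
# print(chat("What is new in huggingface_hub 0.24?"))
```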
Post
515
Chameleon by Meta is now available in Hugging Face transformers!
A vision language model that comes in 7B and 34B sizes
But what makes this model so special?
Demo: merve/chameleon-7b
Models: facebook/chameleon-668da9663f80d483b4c61f58
keep reading ↓
Chameleon is a unique model: it attempts to scale early fusion
But what is early fusion?
Modern vision language models use a vision encoder with a projection layer to project image embeddings so they can be prompted to a text decoder (LLM)
Early fusion, on the other hand, fuses all features together (image patches and text): an image tokenizer maps patches to discrete tokens, and all tokens are projected into a shared space, which enables seamless generation
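The shared-token-space idea can be shown with a toy illustration (this is not Chameleon's actual tokenizer, and the vocabularies below are invented): image patches are mapped to discrete codebook ids that live in the same id space as text tokens, so the decoder sees one flat mixed-modal sequence.

```python
# Toy early-fusion tokenization: image codebook ids are offset past the text
# vocabulary so both modalities share one flat token id space.
TEXT_VOCAB_SIZE = 4       # toy text vocabulary uses ids 0..3
IMAGE_CODEBOOK_SIZE = 3   # toy image codebook maps to ids 4..6

def tokenize_text(words, vocab={"a": 0, "cat": 1, "on": 2, "mat": 3}):
    return [vocab[w] for w in words]

def tokenize_image(patch_codes):
    # Offset image ids so they never collide with text ids.
    return [TEXT_VOCAB_SIZE + c for c in patch_codes]

def early_fusion_sequence(patch_codes, words):
    # One shared sequence; a single decoder attends over both modalities.
    return tokenize_image(patch_codes) + tokenize_text(words)

seq = early_fusion_sequence([2, 0, 1], ["a", "cat"])
# seq -> [6, 4, 5, 0, 1]: image tokens and text tokens in one id space
```

A late-fusion model would instead keep image features in a separate embedding space and inject them through a projection layer; here there is no separate space at all.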
The authors also introduced architectural improvements (QK-norm and revised placement of layer norms) for scalable and stable training, which let them increase the token count (5x tokens compared to Llama 3, which is a must with early fusion IMO)
Thanks to early fusion, this model is any-to-any: it can take image and text input and output image and text, but image generation is disabled to prevent malicious use.
One can also do text-only prompting; the authors note the model catches up with larger LLMs (like Mixtral 8x7B or the larger Llama-2 70B), as well as image-pair prompting against larger VLMs like IDEFICS2-80B (see the paper, Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818), for the benchmarks)
Thanks for reading!
Post
853
Since new TTS (Text-to-Speech) systems are coming out what feels like every day, and it's currently hard to compare them, my latest project has focused on doing just that.
I was inspired by the TTS-AGI/TTS-Arena (definitely check it out if you haven't), which compares recent TTS systems using crowdsourced A/B testing.
I wanted to see if we can also do a similar evaluation with objective metrics, and it's now available here:
ttsds/benchmark
Anyone can submit a new TTS model, and I hope this can provide a way to get some information on which areas models perform well or poorly in.
The paper with all the details is available here: https://arxiv.org/abs/2407.12707
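For context on the crowdsourced side being complemented here: arena-style A/B voting is typically aggregated into a leaderboard with Elo-style ratings. A minimal sketch of one Elo update (illustrative only; this is not the TTS-Arena or TTSDS codebase):

```python
# One Elo rating update after a single A/B vote between two TTS systems.
def elo_update(rating_a, rating_b, a_won, k=32):
    # Expected score of A given the current rating gap.
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1 - score_a) - (1 - expected_a))
    return new_a, new_b

a, b = elo_update(1000, 1000, a_won=True)
# Equal ratings: the winner gains k/2 = 16 points and the loser drops 16.
```

Objective metrics, by contrast, score each system directly from its audio, which is what the benchmark above explores.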
TuringsSolutions
posted an update
1 day ago
Post
1012
I have invented a method that is better than Diffusion. A company got a billion-dollar valuation yesterday for less than what I am currently giving the world for free. I am starting to suspect that is the issue: I am giving it away for free. I meant it to be a gift to the world, but no one will even look at it. I am changing the licensing soon; it will no longer be free. At the moment, you can view exactly how Swarm Neural Networks can do everything Reverse Diffusion can do, for far less money. It can even make API calls.
SNN Image Generator: TuringsSolutions/SNN-Image-Generator
SNN Function Caller (Controlled By TinyLlama): TuringsSolutions/Qwen-2.0.5B-Swarm-Function-Caller
Post
945
# Offensive Physical Security Reconnaissance Planning Automation with public facing RTSP streams and Moondream
After some late-night casual hacking on VLMs, I've been experimenting with automating physical red-team reconnaissance using Moondream (as usual): an image-text-to-text pipeline over public-facing RTSP streams, with predefined text prompts tuned to surface weaknesses, customer-identity exposure, and monetary-theft attack vectors. Working on a space. Thanks again for such a wonderful blessing of a super-powerful image-text-to-text model with minimal computational power needed @vikhyatk
I have started implementing a custom little tool, with both static HTML Spaces and Python Gradio Spaces on the go, which I shall share as HF Spaces when they're done.
---
vikhyatk/moondream2
DavidVivancos
posted an update
2 days ago
Post
1268
#ICML 2024 is almost here! PM me if you will be in Vienna next week; glad to catch up with the Hugging Face community there!
I would like to contribute by releasing the sixth Knowledge Vault: 100 lectures visualized from the last 10 years of ICML, 2014 to 2024 (the 10 from 2024 will be added after the conference), including knowledge graphs for all the invited lectures and some extras, with almost 3000 topics represented using AI.
You can explore it here:
https://theendofknowledge.com/Vaults/6/ICML-2015-2024.html
And you can learn more about the Vaults here:
https://www.linkedin.com/pulse/knowledge-vaults-david-vivancos-lbjef/
And previous Vaults relevant to the #huggingface community are:
[@lexfridman 2018-2024 Interviews] https://theendofknowledge.com/Vaults/1/Lex100-2024.html
[ICLR 2014-2023] https://theendofknowledge.com/Vaults/2/ICLR2014-2023.html
[AIForGood 2017-2024] https://theendofknowledge.com/Vaults/4/AIForGood2017-2024.html
[CVPR 2015-2024] https://theendofknowledge.com/Vaults/5/CVPR-2015-2024.html
Hope you like them!
And great to see you all at #icml2024 @clem @thomwolf @julien-c and team
multimodalart
posted an update
about 10 hours ago
Post
265
New feature!
Image models and LoRAs now have little previews
If you don't know where to start to find them, I invite you to browse cool LoRAs in the profiles of some amazing fine-tuners: @artificialguybr, @alvdansen, @DoctorDiffusion, @e-n-v-y, @KappaNeuro, @ostris
Post
409
Sparse MoE (SMoE) has an unavoidable drawback: its performance heavily relies on the choice of hyperparameters, such as the number of activated experts per token (top-k) and the number of experts.
Also, identifying the optimal hyperparameters without a sufficient number of ablation studies is challenging. As models continue to grow, this limitation could waste significant computational resources and, in turn, hinder the efficiency of training MoE-based models in practice.
(READ MORE ↓) Now, our DynMoE addresses these challenges! DynMoE incorporates:
(1) a novel gating method that enables each token to automatically determine the number of experts to activate;
(2) an adaptive process that automatically adjusts the number of experts during training.
Extensive numerical results across vision, language, and vision-language tasks demonstrate that our approach achieves competitive performance compared to GMoE for vision and language tasks and MoE-LLaVA for vision-language tasks, while maintaining efficiency by activating fewer parameters.
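The per-token gating idea can be sketched in a few lines. This is an illustrative "top-any" toy in the spirit of point (1), not the paper's exact formulation: instead of a fixed top-k, each token activates every expert whose gate score clears a threshold, so the number of active experts varies per token.

```python
# Toy threshold-based gating: each token picks however many experts score
# above the threshold (variable k), falling back to the best single expert.
def dynamic_gate(scores_per_token, threshold=0.5):
    """scores_per_token: per-token lists of gate scores, one score per expert."""
    routing = []
    for scores in scores_per_token:
        active = [i for i, s in enumerate(scores) if s >= threshold]
        if not active:  # always route to at least the highest-scoring expert
            active = [max(range(len(scores)), key=scores.__getitem__)]
        routing.append(active)
    return routing

# Token 0 activates two experts, token 1 one, token 2 falls back to argmax.
routing = dynamic_gate([[0.9, 0.7, 0.1], [0.2, 0.8, 0.3], [0.4, 0.2, 0.3]])
# routing -> [[0, 1], [1], [0]]
```

The fixed-top-k alternative would always return lists of the same length regardless of how the scores are distributed, which is exactly the hyperparameter sensitivity described above.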
Our code is available at https://github.com/LINs-lab/DynMoE, also see the checkpoints at LINs-lab/dynmoe-family-665ed5a331a7e84463cab01a