Neural Magic

Software Development

Somerville, Massachusetts · 15,555 followers

High-performance inference serving solutions for deploying leading open-source LLMs. #SoftwareDeliveredAI

About us

Together with our community, we engineer sparse LLM, CV, and NLP models that are more efficient and performant in production. Why does this matter? Sparse models are more flexible and can achieve unrivaled latency and throughput performance on your private CPU and GPU infrastructure. Check us out on GitHub and join the Neural Magic Slack Community to get started with software-delivered AI.

Website
http://neuralmagic.com/
Industry
Software Development
Company size
51-200 employees
Headquarters
Somerville, Massachusetts
Type
Privately Held
Founded
2018
Specialties
machine learning, deep learning, and artificial intelligence

Locations

  • Primary

    55 Davis Sq

    Floor 3

    Somerville, Massachusetts 02144, US

Updates

  • Neural Magic

    FP8 quantization is now available in vLLM - check it out! Quantized inference is one of the best ways to reduce the costs of LLM deployments.

    Anyscale

    We've recently contributed FP8 support to vLLM in collaboration with Neural Magic. With this feature, you can see up to a 1.8x reduction in inter-token latency with >99% accuracy preservation. A common concern with FP8 is whether users will experience accuracy degradation. To address this, Neural Magic has produced many checkpoints for key models with >99% accuracy preservation across a wide range of benchmarks (https://lnkd.in/gTimN5dZ), including:
    - Llama3-70b
    - Mixtral 8x7b
    - Llama3-8b
    You can easily try this out in vLLM and read more about the feature here: https://lnkd.in/gzKJqerB
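
    As a minimal sketch of trying one of these models, assuming a recent vLLM build with FP8 support; the checkpoint id below is an assumption standing in for any FP8 checkpoint from the collection linked above:

    ```python
    # Sketch: running an FP8-quantized checkpoint with vLLM's offline API.
    # The model id is an assumption -- substitute any FP8 checkpoint from
    # the collection linked above.
    from vllm import LLM, SamplingParams

    llm = LLM(model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8")
    params = SamplingParams(temperature=0.7, max_tokens=128)

    outputs = llm.generate(["Explain FP8 quantization in one paragraph."], params)
    print(outputs[0].outputs[0].text)
    ```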

  • Neural Magic

    Are you looking to optimize your #LLM inference for higher performance and lower costs? Tune in to hear Eldar Kurtić, our Sr. ML Researcher, break down how quantization can optimize LLM inference and reduce memory footprint without compromising model accuracy.

    Eldar Kurtić

    Machine Learning

    The second episode of the "Efficient Inference through Sparsity and Quantization" podcast series is out now. In the first episode, I talked about how sparsity can enhance the performance and efficiency of machine learning models, leading to significant cost reductions on both CPUs and GPUs. In this newly released episode, we dive deep into quantization techniques. Discover how quantization can further optimize model inference and reduce memory footprint without compromising accuracy. Listen to the second episode here: https://lnkd.in/duK8ijTC
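
    To make the memory argument concrete, here is a rough back-of-envelope calculation (my own illustrative numbers, not figures from the episode) of weight memory for an 8B-parameter model at different precisions:

    ```python
    # Rough weight-memory footprint for an 8B-parameter model at different
    # precisions. Illustrative only: ignores activations, KV cache, and overhead.
    PARAMS = 8e9

    for name, bytes_per_weight in [("FP16", 2.0), ("FP8/INT8", 1.0), ("INT4", 0.5)]:
        gib = PARAMS * bytes_per_weight / 2**30
        print(f"{name:>8}: ~{gib:.1f} GiB of weights")

    # FP16    : ~14.9 GiB
    # FP8/INT8: ~7.5 GiB
    # INT4    : ~3.7 GiB
    ```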

    57. Eldar Kurtic - Efficient Inference through sparsity and quantization - Part 2/2

    https://spotify.com

  • Neural Magic

    The ecosystem of open-source LLMs has exploded over the past year, with a new model topping the leaderboard almost every week. Enterprises can now deploy state-of-the-art, open-source LLMs like Llama 3 securely on their infrastructure of choice, fine-tuned with their data for domain-specific use cases, at a significantly lower cost than proprietary APIs.

    vLLM has emerged as the most popular inference server for deploying open-source LLMs, with leading performance, ease of use, broad model support, and heterogeneous hardware backends. Neural Magic is a leading contributor to the vLLM project and offers nm-vllm, an enterprise-ready vLLM distribution. nm-vllm includes stable builds of vLLM with long-term support, tools for optimizing LLMs for inference with techniques like quantization and sparsity, reference architectures for scalable deployments with Kubernetes, integration with telemetry and key monitoring systems, and more.

    Join us on July 11, 2024, at 2:00 PM EST (11:00 AM PST) to learn why vLLM is the leading open-source inference server and how Neural Magic works with enterprises to build and scale vLLM-based model services.
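
    As a taste of how vLLM-based model services are typically consumed, here is a minimal sketch of querying a vLLM server through its OpenAI-compatible API; the localhost endpoint and model name are assumptions, and the server is assumed to be started separately:

    ```python
    # Sketch: querying a running vLLM server via its OpenAI-compatible API.
    # Assumes a server is already up on localhost:8000 serving the model
    # named below (both are assumptions for illustration).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        messages=[{"role": "user", "content": "Why deploy open-source LLMs on your own infrastructure?"}],
        max_tokens=128,
    )
    print(response.choices[0].message.content)
    ```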

  • Neural Magic reposted this

    Mark Kurtz

    CTO @ Neural Magic

    🚨 New blog posted! We've published a comprehensive blog at Neural Magic on deploying Llama 3 8B with vLLM. The blog showcases an inexpensive, end-to-end open-source solution for large language models (LLMs), enabling cost-effective, high-performance AI.

    🔍 Key Takeaways:
    - Superior Accuracy: Llama 3 8B outperforms larger models on real-world use cases, scoring on average 28% better than Llama 2 70B.
    - Cost Efficiency: You can achieve savings of up to 16X by running the smaller, more accurate model on a single A10 GPU, with faster performance than the dual-A100 baseline required for larger models (a minimal sketch of this single-GPU setup follows the figure captions below).
    - Seamless Deployment: Integrate Llama 3 8B with vLLM effortlessly for rapid AI application enhancements.

    To dive in further, the link is in the comments! #LLMs #vLLM #AI #MachineLearning #Innovation #OpenSource

    • Llama 3 8B compared with Llama 2 models across various use case evaluations, including Chat, Code Generation, Summarization, and Retrieval Augmented Generation. (*CodeLlama models were used instead of Llama 2 due to the Llama 2 models' poor baseline performance on code generation tasks.)
    • Llama 3 8B compared to Llama 2 70B for deploying customer support use cases at various deployment sizes.
    • Llama 3 8B compared with Llama 2 70B for deploying summarization use cases at various deployment sizes.
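
    A minimal sketch of the single-A10 deployment described in the takeaways above, using vLLM's offline API; the model id is the public Llama 3 8B Instruct checkpoint, and the context-length cap is an illustrative assumption, not a setting from the blog:

    ```python
    # Sketch: offline inference with Llama 3 8B on a single A10 (24 GB) via vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(
        model="meta-llama/Meta-Llama-3-8B-Instruct",
        dtype="float16",
        max_model_len=4096,           # assumption: cap context to leave KV-cache headroom
        gpu_memory_utilization=0.90,
    )
    params = SamplingParams(temperature=0.2, max_tokens=256)

    result = llm.generate(["Draft a short reply to a customer asking about a refund."], params)
    print(result[0].outputs[0].text)
    ```
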
  • Neural Magic

    Neural Magic's CEO, Brian Stevens, recently spent some time with host Heather Haskin from The Catalyst by Softchoice podcast to talk about the intersection of AI, open source, and the future of responsible development. Listen in on "The case for open-source AI" to learn more about the vital role of open-source models and why the democratization of AI is important for the success of today's enterprise. https://lnkd.in/eumGUGBH

    The Catalyst by Softchoice

    link.chtbl.com

  • Neural Magic

    Optimizing your AI models with techniques like sparsity and quantization increases production performance while decreasing your total infrastructure spend. Eldar Kurtić, our expert in AI model optimization, shares more details in this podcast. Check it out 👇

    Eldar Kurtić

    Machine Learning

    I was recently invited to share my insights on "Efficient Inference through Sparsity and Quantization" in a two-part podcast series. In the first episode, we dive into how sparsity can improve the performance and efficiency of machine learning models, reducing deployment costs on both CPUs and GPUs. The next episode, which will focus on quantization, is coming soon. Listen to the first episode here: https://lnkd.in/dnaCzzsm
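
    As a toy illustration of the sparsity idea discussed in the episode (my own example, not material from the podcast), unstructured magnitude pruning simply zeroes out the smallest-magnitude weights:

    ```python
    # Toy example: unstructured magnitude pruning of a random weight matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=(512, 512)).astype(np.float32)

    target_sparsity = 0.5  # fraction of weights to zero out
    threshold = np.quantile(np.abs(w), target_sparsity)
    w_sparse = np.where(np.abs(w) >= threshold, w, 0.0)

    print(f"actual sparsity: {np.mean(w_sparse == 0):.2%}")  # ~50.00%
    ```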

    56. Eldar Kurtic - Efficient Inference through sparsity and quantization - Part 1/2

    https://spotify.com

  • Neural Magic

    Are you using, or considering, vLLM for your LLM inference serving? Join us this Wednesday to ask your questions and learn more about accelerated #LLM inference with vLLM and Neural Magic. vLLM project maintainer Simon Mo and committer Michael Goin will spend one hour with the community, answering questions and sharing recent project updates. Register and ask your questions here: https://lnkd.in/euF8m73q If you can't make it this Wednesday, you can register for the June 20th session.

  • Neural Magic

    We were at Nutanix #NEXTConf in Barcelona last week and it was such an exciting event! From the main stage announcements to the enthusiastic conversations with attendees at the AI Pavilion, congrats to Nutanix for creating a dynamic environment where innovation thrived. Thank you to the Nutanix team, including Gregory Lehrer, Gali Ross-Hasson, Tarkan Maner, Luke Congdon, Wolfgang Huse, and others for partnering with Neural Magic. We appreciate the opportunity to be a part of the Nutanix #AI strategy. 🙏
