I built my first language model, called MiniMistral! ... and it's really useless, because I'm GPU-poor. But here's what I learned:

- Building the model itself is quite easy. The people at Mistral AI, for example, are cool enough to share their model code (at least for the 7B model) here: https://lnkd.in/eHe7qD6f
- Training models, on the other hand, is crazy difficult. For the smallest Llama model alone, Meta AI used 184,320 GPU hours, spread over 2,000 GPUs. Those are insane numbers (and probably the reason the Nvidia stock goes brrr). I settled for a measly 8M parameters for my model.
- Even if you have a couple thousand Nvidia GPUs lying around, you need data to get started. Like, a lot of really good data. There are some openly available, high-quality datasets like Cosmopedia https://lnkd.in/eJAY2MHK or fineweb https://lnkd.in/eeERudh9, which are massive and could actually be used for training a small LLM.

You can check out my MiniMistral here (please don't): https://lnkd.in/emcXZA_G
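For a feel of how small 8M parameters really is, here is a minimal sketch of a model at roughly that scale, using the Hugging Face transformers and datasets libraries. The hyperparameters are illustrative guesses, not the actual MiniMistral configuration, and the fineweb repo ID is the one published on Hugging Face:

```python
# Rough sketch: an ~8M-parameter Mistral-style model via Hugging Face
# transformers. All hyperparameters are illustrative guesses, NOT the
# actual MiniMistral configuration.
from datasets import load_dataset
from transformers import MistralConfig, MistralForCausalLM

config = MistralConfig(
    vocab_size=8_000,              # tiny tokenizer vocabulary
    hidden_size=192,               # residual stream width
    intermediate_size=768,         # MLP hidden width
    num_hidden_layers=8,           # transformer blocks
    num_attention_heads=6,
    num_key_value_heads=2,         # grouped-query attention, as in Mistral 7B
    max_position_embeddings=2_048,
    sliding_window=1_024,          # Mistral-style sliding-window attention
)
model = MistralForCausalLM(config)
print(f"{model.num_parameters() / 1e6:.1f}M parameters")

# Stream fineweb instead of downloading all of it (it is tens of TB):
fineweb = load_dataset("HuggingFaceFW/fineweb", split="train", streaming=True)
print(next(iter(fineweb))["text"][:200])
```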
-
AI - this time is different. Models are big and becoming effective. Generative AI has proven that there is a use case.

GPUs are like feeding 500 people in a banquet hall at a wedding: people get fed, but sometimes you need to wait a long time to get served. New AI architectures are like a 500-seat restaurant with 100 tables of 5: you can seat your diners faster, but the process is more chaotic. And if your compiler sucks, it makes no difference. You need to provide the abstraction all the way up to PyTorch; there are few people who can code at the lower levels.

Training is key, but the world will move to inference on the edge, so processor power per watt will become even more important. CPU + AI acceleration will move from brute force to a dance between power, cost, performance, licence fees, and supply chain availability for each and every use case.

Model refinement will be a function of calculation precision, with FP16 giving the best overall accuracy but a push toward lower precision, maybe even INT4 and INT2. Who knows.

ASICs will become heterogeneous in IP and node thanks to backend chiplet processes. OSATs will become more important, front-end guys will get into backend processes, and the lines will blur. Hyperscalers will try to make their own dedicated hardware. Automotive companies will try to control their own AI stack.

Moore's law is not a law; it's a whole bunch of engineers killing themselves to make chips faster and more powerful. It's Moore's challenge, not a law. Chips don't become more powerful by themselves. AI, however, gets smarter as the parameters increase: you just make the models bigger, and the LLM develops a sense of humour.

I think people need to start realising that the war for AI is fought at the chip level as well as at the application layer. Who will win?

AI acceleration: NVIDIA, AMD, INTEL (Habana), Groq, Cerebras Systems, SambaNova, Graphcore, Tenstorrent, d-Matrix
CPU: INTEL, AMD, ARM, RISC-V

And we find more companies popping up every other day.

Conclusion: it is not a given that NVIDIA will win.
-
📢 NVIDIA Unveils Nemotron-4 340B for Synthetic Data Generation

GenAI companies face a significant challenge: they are running out of data to train their future models. The solution? 🧰 Generate high-quality synthetic training data, ideally using an open model.

NVIDIA steps in with Nemotron-4 340B, which includes base, instruct, and reward models, all optimized for NVIDIA NeMo and TensorRT-LLM.

💡 With this advancement, NVIDIA has the potential to enhance custom language model development across industries such as healthcare, finance, and manufacturing, further amplifying its substantial impact on the GenAI world.

➡️ Synthetic data boosts training data quality & model performance
➡️ Open model access fosters AI innovation
➡️ Optimized for efficient training & inference

Check out the full details: https://lnkd.in/eEDBETd9
-
Building AI Training Beasts Is As Important As Optimizing AI Inference #AIInference

🤝 Follow us on Discord 🔜: https://lnkd.in/gt823Zd3

❇️ Summary: Software always lags behind hardware in terms of performance optimization, but AI systems fed with full system specifications could potentially adapt application and system software before the hardware is even built. That capability is not yet fully realized, however, and it still takes time and effort from people to optimize software for new hardware. In the AI sector, software optimizations have contributed to performance improvements as significantly as hardware advancements have. Nvidia's GPU accelerators, for instance, have seen substantial performance boosts through software optimizations well after the launch of new hardware generations. Nvidia recently announced algorithm tunings, such as TensorRT-LLM, that enhance inference performance for AI models; these software tweaks have delivered performance improvements nearing 8X for certain workloads.

Hashtags: #chatGPT #AIInferenceMatters #EfficientAIComputing
-
💡Nvidia Unveils New AI Chip Configuration with 141GB of Memory🚀

Nvidia just announced an all-new AI chip configuration boasting 141GB of memory. Welcome to the world, GH200: based on the efficient Hopper architecture, it's a true game-changer for large language models (LLMs).👏

LLMs, the AI models used for text generation, language translation, and insightful Q&A, are all the rage now. But guess what? They demand considerable memory to train and run, and that's where GH200 steps in.🔍

Sporting 141GB of memory, a number unmatched by any existing AI chip, GH200 can train and run larger, more complicated LLMs with ease.💪

Moreover, Nvidia has made GH200 more energy-efficient than its predecessors, hence more cost-effective for LLM training and inference. Isn't that a smart move?🌍

The advent of GH200 marks a significant leap in AI, enabling experts to experiment with larger, more intricate LLMs that will undeniably fuel inventive future AI applications.🔬
-
NVIDIA's Open Synthetic Data Generation for LLMs

🔍 Importance of High-Quality Training Data: NVIDIA emphasizes that high-quality training data is crucial for the performance of large language models (LLMs). However, acquiring large-scale, diverse, high-quality datasets can be both expensive and challenging.

💡 Open Nemotron-4 340B Model: To address this, NVIDIA has released the Nemotron-4 340B model, an open, scalable solution that helps developers generate synthetic data for training LLMs, providing a free resource to overcome data acquisition barriers.

⚙️ Model Optimization & Customization: NVIDIA NeMo and NVIDIA TensorRT-LLM offer tools for optimizing the training and inference of LLMs. These tools enable developers to customize and fine-tune models to meet specific use cases and industry requirements.

🔒 Model Security Assessment: The Nemotron-4 340B models have undergone rigorous security assessments to ensure reliability and safety in commercial applications. NVIDIA encourages users to thoroughly evaluate the generated data to ensure its suitability, safety, and accuracy.

🏢 Enterprise AI Solutions: Through NVIDIA AI Enterprise, the company provides enterprise-grade AI solutions that support efficient LLM operations. Additionally, NVIDIA collaborates closely with HPE to drive AI transformation in businesses.

#AI #NVIDIA #LLM #SyntheticData #ModelOptimization #EnterpriseAI #DataSecurity #Nemotron4 #NeMo #TensorRT
-
NVIDIA RELEASED OPEN SYNTHETIC DATA GENERATION PIPELINE FOR TRAINING LLMS

NVIDIA announced Nemotron-4 340B, a family of open models developers can use to generate synthetic data for training LLMs for commercial applications across healthcare, finance, manufacturing, retail, and other industries.

Robust datasets with high-quality training data are prohibitively expensive and difficult to access. Synthetic data mimics the characteristics of real-world data. Through a uniquely permissive open model license, Nemotron-4 340B gives developers a free, scalable way to generate synthetic data that can help build powerful LLMs.

The Nemotron-4 340B family includes base, instruct, and reward models that form a pipeline to generate synthetic data for training and refining LLMs. The models are optimized with NVIDIA NeMo, an open-source framework for end-to-end model training, including data curation, customization, and evaluation. They're also optimized for inference with the open-source NVIDIA TensorRT-LLM library.

Nemotron-4 340B can be downloaded from Hugging Face: https://lnkd.in/dc6Xr7q9

/ FROM OUR AI CTO Miguel Amigot II
-
This week I got to play first-hand with the latest hardware required for larger AI models, and I think not many people have a good feel for the sheer scale of the challenges that come with these new, larger large language models.

Q8_0 Llama_Chat sizes:
- Llama 7B (least accurate, only usable as a fine-tuned model): 7.16 GB
- Llama 13B (medium accuracy, suitable for mid-range hardware): 13.83 GB
- Llama 70B (most accurate, suitable for a general-purpose chatbot): 73.29 GB

To run these models, the entire model needs to be loaded into the RAM or vRAM (for GPUs) of the hardware. With typical fp16 precision (without quantization) you'd need roughly twice that, around ~130 GB for a 70B model.

The largest commercially available GPU (the NVIDIA A100, costing upwards of $4k/month to rent) only has 80GB of vRAM, so you already need at least 2 to serve a single request at a time 🤯 OpenAI's GPT models are even bigger than 70B, so just imagine the cost of running this!

Potential alternative? Imagine CPUs with 400+ cores and up to 4 TB of RAM... stay tuned, we are already working on the next innovation 💜
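For anyone who wants to sanity-check these numbers, here is a back-of-the-envelope sketch. It counts weights only; the KV cache and activations push the real footprint higher:

```python
# Weights-only memory estimate for running a model at a given precision,
# and the number of 80 GB GPUs that implies. Q8_0 actually stores a bit
# more than 1 byte per parameter (block scales), hence the slightly
# larger real-world file sizes quoted above.
import math

def weights_gb(params_billion: float, bytes_per_param: float) -> float:
    return params_billion * 1e9 * bytes_per_param / 1e9

for name, params in [("Llama 7B", 7), ("Llama 13B", 13), ("Llama 70B", 70)]:
    fp16 = weights_gb(params, 2.0)   # 2 bytes per parameter at fp16
    q8 = weights_gb(params, 1.0)     # ~1 byte per parameter at 8-bit
    gpus = math.ceil(fp16 / 80)      # 80 GB of vRAM per A100
    print(f"{name}: fp16 ≈ {fp16:.0f} GB, Q8 ≈ {q8:.0f} GB, "
          f"≥ {gpus} × A100-80GB at fp16")
```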
-
As I began to train an AI model to classify human behavior, I realized that, given the large volume of training data involved in this project, it would have been impossible to do without NVIDIA GPUs.

Training an artificial intelligence model is computationally expensive, and without NVIDIA GPUs being comparatively cheap and readily available as chips in machines, it would have been impossible to achieve the level of advancement that we now have with AI.
-
NVIDIA AI has launched Nemotron-4 340B! This family of models creates an open synthetic data generation pipeline for training LLMs. It can be used in various industries like healthcare, finance, and retail.

This family of models includes:
▪️ The Nemotron-4 340B Instruct model, which generates synthetic data mirroring real-world traits, enhancing a custom LLM's quality and robustness.
▪️ The Nemotron-4 340B Reward model, which further refines this data by grading responses on helpfulness, correctness, coherence, complexity, and verbosity; it ranks first on the Hugging Face RewardBench leaderboard.
▪️ The Nemotron-4 340B Base model, which researchers can customize with their own data and the HelpSteer2 dataset.

Optimized for use with NVIDIA NeMo and TensorRT-LLM, these models offer a free, scalable solution for developing high-quality LLMs. Nemotron-4 340B is available for download on Hugging Face, with additional access coming soon via ai.nvidia.com.

Download: https://lnkd.in/e_-paqBy
Announcement: https://lnkd.in/erp-t93G
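To make the pipeline concrete, here is a hypothetical sketch of the generate-grade-filter loop described above. Both model calls are stubbed out (the real models are served via NeMo / TensorRT-LLM), the threshold is arbitrary, and none of this reflects NVIDIA's actual API:

```python
# Hypothetical sketch of a Nemotron-style synthetic data pipeline: an
# instruct model drafts responses, a reward model grades them on five
# attributes, and only high-scoring pairs are kept. The model calls are
# stubbed out here; this is NOT NVIDIA's actual API.
from typing import Callable

ATTRIBUTES = ["helpfulness", "correctness", "coherence", "complexity", "verbosity"]

def filter_synthetic_data(
    prompts: list[str],
    generate: Callable[[str], str],       # stand-in for the Instruct model
    score: Callable[[str, str], dict],    # stand-in for the Reward model
    min_helpfulness: float = 3.5,         # arbitrary threshold for illustration
) -> list[dict]:
    kept = []
    for prompt in prompts:
        response = generate(prompt)
        scores = score(prompt, response)  # one value per attribute
        if scores["helpfulness"] >= min_helpfulness:
            kept.append({"prompt": prompt, "response": response, **scores})
    return kept

# Dummy stand-ins so the sketch runs end to end:
demo = filter_synthetic_data(
    ["Explain grouped-query attention in one sentence."],
    generate=lambda p: "GQA shares key/value heads across query heads.",
    score=lambda p, r: dict.fromkeys(ATTRIBUTES, 4.0),
)
print(demo)
```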
-
When I started with machine learning (AI), the models were small, the datasets were limited, and, compared to today, computers were also much smaller.

Recent breakthroughs in this field can be attributed to improved algorithms, including the invention of ReLUs, CNNs, LSTMs, and Transformers. Additionally, the significant increase in training dataset sizes and computational resources has played a crucial role. In the past, I could train most models on a personal computer and, later, on workstations with dedicated GPUs. Nowadays, training large models from scratch demands a substantial amount of computational resources, which translates to significant cost.

To provide a concrete example, training a language model as massive as Llama 2 70B takes approximately 1,720,320 GPU hours on Nvidia A100-80GB GPUs. In other words, training this behemoth on 5,000 GPUs takes around 14 days. If you were to rent A100 80GB GPUs at a rate of 1.5 Euros per hour, the cost would amount to approximately 2.6 million Euros.

This poses a challenge for smaller companies and universities that struggle to compete with larger companies that have access to extensive computational resources.

#machinelearning #ai #aicosts #mlops
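The arithmetic behind those numbers, as a small sketch (the 1.5 EUR/hour rental rate is the assumption from the post, not a quoted price):

```python
# Back-of-the-envelope reproduction of the Llama 2 70B training cost
# estimate above. The GPU-hour figure is Meta's reported number; the
# rental price is an assumption for illustration.
gpu_hours = 1_720_320          # A100-80GB GPU hours for Llama 2 70B
n_gpus = 5_000                 # hypothetical cluster size
eur_per_gpu_hour = 1.5         # assumed rental rate

wall_clock_days = gpu_hours / n_gpus / 24
total_cost_eur = gpu_hours * eur_per_gpu_hour

print(f"~{wall_clock_days:.1f} days of wall-clock time on {n_gpus} GPUs")  # ~14.3 days
print(f"~{total_cost_eur / 1e6:.1f} million EUR in rental costs")          # ~2.6 million
```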