Leonard Püttmann’s Post

Leonard Püttmann

solution architect @ Kern AI

I built my first language model of my own, called MiniMistral! ... and it's really useless, because I'm GPU-poor. But here's what I learned:

- Building the model itself is quite easy. The people at Mistral AI, for example, are cool enough to share their model code (at least for the 7B model) here: https://lnkd.in/eHe7qD6f

- Training models, on the other hand, is crazy difficult. For the smallest Llama model alone, Meta AI used 184,320 GPU hours, spread across 2,000 GPUs; that works out to roughly four days of the whole cluster running non-stop. Those are insane numbers (and probably the reason the Nvidia stock goes brrr). I settled for a measly 8M parameters for my model; there's a back-of-the-envelope sketch of a model that size below.

- Even if you have a couple thousand Nvidia GPUs lying around, you still need data to get started. Like, a lot of really good data. There are some openly available, high-quality datasets, such as Cosmopedia https://lnkd.in/eJAY2MHK or FineWeb https://lnkd.in/eeERudh9, which are massive and could actually be used to train a small LLM. See the streaming sketch below.

You can check out my MiniMistral here (please don't): https://lnkd.in/emcXZA_G
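For anyone curious what a model in that parameter range looks like, here is a minimal sketch in PyTorch. To be clear: this is not the MiniMistral code, and it's a plain transformer rather than Mistral's real architecture (which adds rotary embeddings, SwiGLU, grouped-query and sliding-window attention). The hyperparameters are illustrative guesses chosen to land near the 8M mark:

import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """A plain decoder-only language model, sized to roughly 8-9M parameters."""
    def __init__(self, vocab_size=16_384, d_model=256, n_heads=4,
                 n_layers=6, d_ff=1_024, max_len=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)  # learned positions (Mistral uses RoPE instead)
        block = nn.TransformerEncoderLayer(d_model, n_heads, d_ff,
                                           activation="gelu",
                                           batch_first=True, norm_first=True)
        self.blocks = nn.TransformerEncoder(block, n_layers)
        self.norm = nn.LayerNorm(d_model)
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        self.lm_head.weight = self.embed.weight  # weight tying saves ~4M parameters

    def forward(self, ids):
        seq_len = ids.size(1)
        x = self.embed(ids) + self.pos(torch.arange(seq_len, device=ids.device))
        # Causal mask so each token only attends to earlier tokens.
        mask = torch.triu(torch.full((seq_len, seq_len), float("-inf"),
                                     device=ids.device), diagonal=1)
        return self.lm_head(self.norm(self.blocks(x, mask=mask)))

model = TinyLM()
print(f"{sum(p.numel() for p in model.parameters()):,} parameters")  # ~9M with this config

A quick smoke test like model(torch.randint(0, 16_384, (1, 128))) returns logits of shape (1, 128, 16384), so the sketch runs end to end; most of the budget sits in the tied token embedding, which is why weight tying matters so much at this scale.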
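And since downloading a multi-terabyte corpus isn't realistic for most of us either, streaming helps. Below is a sketch using the Hugging Face datasets library; the dataset IDs and the "sample-10BT" config name are what the Hub lists at the time of writing, so treat them as assumptions and double-check before relying on them:

from datasets import load_dataset

# Stream FineWeb instead of downloading it; "sample-10BT" is a ~10B-token
# subset, which is plenty for an 8M-parameter model.
fineweb = load_dataset("HuggingFaceFW/fineweb", name="sample-10BT",
                       split="train", streaming=True)

# Cosmopedia works the same way, e.g.:
# cosmo = load_dataset("HuggingFaceTB/cosmopedia", "stories",
#                      split="train", streaming=True)

for i, example in enumerate(fineweb):
    print(example["text"][:200])  # each record is a cleaned web document
    if i == 2:
        break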

Aditya Advani

Live Free & Fly w Gen AI

1w

Kudos. Goals!


