Sravan Bodapati's Post


Principal Scientist and Sr. Manager of Applied Science - Amazon AGI Foundation Models

AI at Meta has released its most capable openly available models yet: LLAMA3, in 8B and 70B sizes with an 8K context length! They highlight improvements in four areas as the key differentiators: a) model architecture, b) pretraining data, c) scaling up pretraining, and d) instruction fine-tuning.

💥 Pretraining & Scaling up:
✈️ A 128K-token vocabulary and grouped-query attention (GQA) in both the 8B and 70B models, with an 8K native context length (a minimal GQA sketch follows the post)
✈️ Pretrained on over 15T tokens, with 4x more code than LLAMA2
✈️ Multilingual: over 5% of the pretraining data is high-quality non-English text covering 30+ languages, though don't expect English-level performance in those languages
✈️ LLAMA2 was used to generate the training data for the text-quality classifiers that filter LLAMA3's pretraining corpus, in the spirit of Self-Instruct (a hedged reconstruction is sketched below)
✈️ Both the 8B and 70B keep improving log-linearly even after training on two orders of magnitude more data than the compute-optimal (Chinchilla) amount; performance was still climbing at 15T tokens
✈️ Training ran on two custom-built 24K-GPU clusters with an effective training time of over 95%
✈️ Overall, training was 3x more efficient than for LLAMA2

💥 Instruction Fine-tuning:
✈️ A combination of SFT, rejection sampling, PPO, and DPO (the standard DPO loss is sketched after the post)
✈️ The quality of the prompts used in SFT and of the preference rankings used in PPO and DPO has an outsized impact on model performance
✈️ Careful data curation and multiple rounds of QA on the human annotations
✈️ Learning from preference rankings greatly improved performance on coding and reasoning tasks

💥 LLAMA3 Guardrails:
✈️ Updated and new safety tools: Llama Guard 2 and CyberSecEval 2
✈️ CodeShield: an inference-time guardrail that filters insecure code suggestions
✈️ The instruction-fine-tuned model was red-teamed for safety by generating adversarial prompts that try to elicit problematic responses
✈️ Llama Guard is foundational for prompt and response safety and can easily be fine-tuned to a new taxonomy

💥 Inference & Future:
✈️ Despite LLAMA3 8B having about 1B more parameters than Llama 2 7B, the more efficient tokenizer and GQA keep inference efficiency on par with Llama 2 7B (a quick token-count comparison follows the post)
✈️ 400B+-parameter, multilingual, multimodal, longer-context LLAMA3 models are still in training; LLAMA3 will be available on AWS soon. Try it out!

#generativeAI #llm #llama3 #aws #bedrock #sota
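Since the post credits GQA for keeping inference cheap, here is a minimal PyTorch sketch of the idea: several query heads share each key/value head, so the KV cache shrinks by the ratio of query heads to KV heads. The head counts below (8 query, 2 KV) are illustrative, not Llama 3's actual configuration.

import torch
import torch.nn.functional as F

def gqa(q, k, v):
    # q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)
    groups = q.shape[1] // k.shape[1]        # query heads served by each KV head
    k = k.repeat_interleave(groups, dim=1)   # broadcast each KV head to its query group
    v = v.repeat_interleave(groups, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

b, s, d = 1, 16, 64
q = torch.randn(b, 8, s, d)   # 8 query heads
k = torch.randn(b, 2, s, d)   # only 2 KV heads -> 4x smaller KV cache
v = torch.randn(b, 2, s, d)
out = gqa(q, k, v)            # (1, 8, 16, 64)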
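On the text-quality classifiers: Meta only says that Llama 2 generated the training data for them, so the pipeline below is a hedged reconstruction, not Meta's code. llama2_generate is a hypothetical callable wrapping a Llama 2 endpoint, and a simple TF-IDF + logistic-regression classifier stands in for whatever Meta actually trained.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

PROMPT = "Rate this document's quality for LLM pretraining. Answer GOOD or BAD.\n\n{doc}"

def llama2_label(doc, llama2_generate):
    # llama2_generate: hypothetical callable wrapping a Llama 2 endpoint
    answer = llama2_generate(PROMPT.format(doc=doc[:2000]))
    return 1 if "GOOD" in answer.upper() else 0

def train_quality_filter(docs, llama2_generate):
    labels = [llama2_label(d, llama2_generate) for d in docs]  # LLM-generated supervision
    clf = make_pipeline(TfidfVectorizer(max_features=50_000),
                        LogisticRegression(max_iter=1000))
    clf.fit(docs, labels)  # the cheap classifier then filters the full corpus
    return clf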
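For the SFT/PPO/DPO bullet, this is the standard DPO objective from Rafailov et al. (2023), not Meta's unpublished training code: increase the policy's log-probability margin between chosen and rejected responses relative to a frozen reference model's margin.

import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Each argument: summed log-prob of a response under the policy / frozen reference.
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    # Maximize sigmoid(beta * margin gap); beta controls the implicit KL penalty.
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),
                torch.tensor([-13.0]), torch.tensor([-14.2]))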
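On tokenizer efficiency: Llama 3's 128K vocabulary encodes the same text in fewer tokens than Llama 2's 32K vocabulary, which is what offsets the extra ~1B parameters at inference. A quick way to check, assuming you have been granted access to both gated Hugging Face repos:

from transformers import AutoTokenizer

text = "Grouped-query attention keeps the KV cache small at inference time."
for model_id in ("meta-llama/Llama-2-7b-hf", "meta-llama/Meta-Llama-3-8B"):
    tok = AutoTokenizer.from_pretrained(model_id)
    print(model_id, len(tok(text)["input_ids"]))  # Llama 3 should need fewer tokens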

Introducing Meta Llama 3: The most capable openly available LLM to date

ai.meta.com
