AI at Meta has released the most powerful open-source models yet as of today: LLAMA3 - 8B and 70B models at 8k context length! They highlight improvements in each of the following aspects as the key differentiators: a) Model architecture b) Pretraining data c) Scaling up pretraining d) Instruction fine-tuning
🔥 Pretraining & Scaling up:
✔️ Vocab of 128k tokens & GQA across both 8B and 70B, 8k native context length
✔️ Pretrained on over 15T tokens, with 4 times more code than LLAMA2
✔️ Multilingual: over 5% of the pretraining data is high-quality non-English data covering over 30 languages - don't expect the same performance as in English
✔️ LLAMA2 generates training data (as in Self-Instruct) for the text classifiers that filter the data powering LLAMA3
✔️ Both 8B and 70B keep improving even when trained on 2 orders of magnitude more data, i.e., log-linear improvement out to 15T tokens of training
✔️ Training runs on 2 custom-built 24k-GPU clusters with an overall training time efficiency of 95%
✔️ Overall, 3 times more efficient to train than LLAMA2
🔥 Instruction Finetuning:
✔️ Combination of SFT + PPO + DPO + rejection sampling
✔️ The quality of prompts used in SFT and of preference rankings used in DPO and PPO has an outsized impact on the performance of the aligned model
✔️ Careful data curation and multiple rounds of QA on annotations from human annotators
✔️ Training on preference rankings improved model performance greatly on coding and reasoning tasks
🔥 LLAMA3 safety tools:
✔️ Updated and new safety tools: Llama Guard 2 & CyberSecEval 2
✔️ CodeShield - an inference-time guardrail for filtering insecure code
✔️ The instruction-fine-tuned model was red-teamed for safety by generating adversarial prompts that try to elicit problematic responses.
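The GQA mentioned above (grouped-query attention) shrinks the KV cache by letting one key/value head serve a whole group of query heads. A minimal NumPy sketch of the mechanism, with toy head counts and random inputs (not Llama 3's actual configuration):

```python
import numpy as np

def gqa_attention(q, k, v):
    """Grouped-query attention: each KV head is shared by a group of query heads.
    q: (n_q_heads, seq, d); k, v: (n_kv_heads, seq, d)."""
    n_q_heads, n_kv_heads = q.shape[0], k.shape[0]
    group = n_q_heads // n_kv_heads
    # Repeat each KV head so it lines up with its group of query heads
    k = np.repeat(k, group, axis=0)               # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (n_q_heads, seq, seq)
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)       # row-wise softmax
    return weights @ v                              # (n_q_heads, seq, d)

# Toy sizes: 8 query heads share 2 KV heads, so the KV cache is 4x smaller
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16, 64))
k = rng.standard_normal((2, 16, 64))
v = rng.standard_normal((2, 16, 64))
out = gqa_attention(q, k, v)
print(out.shape)  # (8, 16, 64)
```

Setting `n_kv_heads == n_q_heads` recovers standard multi-head attention; `n_kv_heads == 1` recovers multi-query attention, with GQA as the middle ground.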
✔️ LlamaGuard - foundational for prompt and response safety; can be easily fine-tuned to create a new taxonomy
🔥 Inference & Future:
✔️ Despite LLAMA3 8B having ~1B more params than LLAMA2 7B, the improved tokenizer efficiency and GQA keep inference efficiency on par with Llama 2 7B.
✔️ 400B+ parameter, multilingual, multimodal, and longer-context-window LLAMA3 models will be available on AWS soon. Try it out! #generativeAI #llm #llama3 #aws #bedrock #sota
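The post credits preference training (DPO in particular) for big gains on coding and reasoning. The per-pair DPO objective is compact enough to sketch in pure Python; the log-probabilities below are made-up numbers for illustration, not outputs of a real model:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.
    Inputs are summed log-probs of the chosen/rejected responses under the
    policy being trained (pi_*) and a frozen reference model (ref_*)."""
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the chosen response over the rejected one.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # -log(sigmoid(margin)): small when the policy ranks the pair correctly
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Hypothetical log-probs where the policy has learned the right preference
loss = dpo_loss(pi_chosen=-12.0, pi_rejected=-20.0,
                ref_chosen=-14.0, ref_rejected=-15.0, beta=0.1)
print(round(loss, 4))
```

When the policy matches the reference exactly, the margin is zero and the loss is log 2; driving the margin up pushes the loss toward zero.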
Sravan Bodapati's Post
More Relevant Posts
-
Microsoft has released Phi-3 Vision, a lightweight, open-source multimodal model built upon synthetic data and filtered publicly available websites. It's part of the Phi-3 model family, with 4.2 billion parameters, containing an image encoder, connector, projector, and the Phi-3 Mini language model. The model is designed for general-purpose AI systems and for applications in memory- or compute-constrained environments and latency-bound scenarios, and has been trained on 500 billion vision and text tokens.
🔹 Phi-3 Vision is a lightweight, state-of-the-art open multimodal model built upon synthetic data and filtered publicly available websites.
🔹 The model belongs to the Phi-3 model family and has a 128k-token context length, supporting visual and text input.
🔹 It underwent rigorous enhancement processes, including Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).
🔹 The model has been trained on an offline text dataset with a cutoff date of March 15, 2024, and is an open-weight release.
🔹 It requires Python 3.11 and can run even with 16 GB of VRAM.
🔹 The prerequisites include the Transformers library (built from source), Flash Attention, NumPy, Pillow, Torch, and Torchvision.
🔹 The model is designed for general-purpose AI systems and applications in memory- or compute-constrained environments.
🔹 It has been trained on 500 billion vision and text tokens.
🔹 It is useful for Optical Character Recognition (OCR), chart and table understanding, and more.
🔹 The model takes image input alongside text, and the best prompt template is the chat format.
🔹 The training data was generated between February and April this year.
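Since the post recommends the chat-format prompt template, here is a minimal sketch of building that prompt string by hand. The `<|user|>`/`<|image_1|>`/`<|end|>`/`<|assistant|>` tokens follow the published Phi-3 Vision chat format as I understand it; treat the exact template as an assumption and prefer the model processor's own chat templating in practice:

```python
def phi3_vision_prompt(question: str, n_images: int = 1) -> str:
    """Build a Phi-3 Vision chat-format prompt with numbered image placeholders."""
    # Each attached image is referenced in order as <|image_1|>, <|image_2|>, ...
    image_tags = "\n".join(f"<|image_{i}|>" for i in range(1, n_images + 1))
    return f"<|user|>\n{image_tags}\n{question}<|end|>\n<|assistant|>\n"

prompt = phi3_vision_prompt("What does the chart in this image show?")
print(prompt)
```

The string would then be passed, together with the PIL image(s), to the model's processor before generation.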
-
Looking to advance your Large Language Models with RAG? With KDB.AI and LlamaIndex, we're enabling developers to create cutting-edge AI applications, complete with temporal filters for enhanced semantic search and content summarization. With simplified ingestion, indexing and model integration, developers can now quickly build services such as: - Document Q&A - Data Augmented Chatbots - Knowledge Agents - Structured Analytics - Content Generation Read our latest blog: https://lnkd.in/efnmA2jR Then get started with our latest Advanced RAG sample: https://lnkd.in/ewAiJCUT Laurie Voss Jerry Liu Ashok Reddy LlamaIndex #DevOps #Developers #GenAI #RAG #LLMs #Analytics #AI
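The temporal filters mentioned above amount to restricting a semantic search to vectors whose metadata falls inside a time window before ranking by similarity. A toy, library-free sketch of the idea (the actual KDB.AI/LlamaIndex APIs differ; all names and documents here are illustrative):

```python
from datetime import datetime

# Tiny in-memory "vector store": (embedding, metadata) pairs
docs = [
    ([1.0, 0.0], {"text": "Q1 earnings summary", "ts": datetime(2024, 1, 15)}),
    ([0.9, 0.1], {"text": "Q3 earnings summary", "ts": datetime(2024, 7, 2)}),
    ([0.0, 1.0], {"text": "office party photos", "ts": datetime(2024, 7, 3)}),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def search(query_vec, start, end, k=1):
    """Apply the temporal filter first, then rank survivors by similarity."""
    in_window = [(vec, meta) for vec, meta in docs if start <= meta["ts"] <= end]
    ranked = sorted(in_window, key=lambda d: cosine(query_vec, d[0]), reverse=True)
    return [meta["text"] for _, meta in ranked[:k]]

# Only documents from H2 2024 are eligible, so Q1 is excluded before ranking
print(search([1.0, 0.0], datetime(2024, 6, 1), datetime(2024, 12, 31)))
# → ['Q3 earnings summary']
```

Pre-filtering on metadata like this is what keeps "summarize last quarter's documents" queries from retrieving semantically similar but stale content.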
Build RAG-Enabled Applications with LlamaIndex and KDB.AI | KX
https://kx.com
-
A good layman's explanation of what vectors are!
🚀 Here's the second in our Behind the Scenes series about building an AI product. This one https://lnkd.in/eiF4nDGX is on vector stores and how they are transformative - particularly for what we are doing with Harriet. There's lots of techy resources out there on this subject, so rather than repeating that, this is for business leaders and key decision makers who need a good understanding of this technology, and what it means for your business, without being in the weeds. For context: Harriet is an AI people ops assistant who lives in Slack and is trained on all of your internal documents. She uses vector databases to streamline your data, which then enables Harriet's AI brain to enforce business processes and policies across your organization. Quick summary of what I cover:
✨ What vector stores are in layman's terms.
✨ Their transformative potential in the business landscape.
✨ The synergy between vector stores and LLMs.
Keep an eye out for Part II, where I'll break down the challenges and real-world applications of vector stores. Any and all thoughts - I would love to hear them. Let me know what you think. #AI #VectorStores #LLMs #CTO #CPO #TechSimplified
Vector stores part I: a non-technical introduction | Harriet
hrharriet.com
-
"Developers who have jumped in to try out Google Gemini for free should know their data might be used to train its generative artificial intelligence (AI) models, including those that power Google AI Studio and Gemini Pro." #data #dataprivacy #trainingdata #aimodels #ai #llm #chat #chatbots #generativeai #api #multimodal #multimodalai #genai #aiagents
What developers trying out Google Gemini should know about their data
zdnet.com
-
Intro to vector stores (in layman's terms) and super useful insights from our real-world experiences 🤩 Worth checking it out if you're interested in this space. #pinecone #vectordatabase #llms #ai
-
🚀 Large Language Models (LLMs) at the Enterprise Crossroads: Insights from Arize's Survey 🚀 The landscape of artificial intelligence is witnessing a pivotal moment with the accelerating adoption of Large Language Models (LLMs) in enterprises. A survey conducted by Arize in September 2023, involving over 350 AI engineers, data scientists, developers, and technical business executives, sheds light on this transformative phase. The findings reveal a significant shift towards embracing LLM-powered applications, underscoring growing confidence in their enterprise readiness and deployment capabilities. Major takeaways from the survey:
🔹 Rapid Adoption: 61.7% of developers and ML teams now plan to have, or already have, an LLM app in production within a year, a notable increase from 51.7% in April.
🔹 Diverse LLM Ecosystem: While OpenAI remains a dominant player, alternatives like Meta's Llama 2, Google PaLM 2, Databricks (Dolly), and MosaicML are gaining traction, indicating a vibrant and competitive landscape.
🔹 Evolving Barriers: Concerns are shifting from data privacy and the need for a business case to more nuanced challenges like on-prem requirements and the accuracy of responses, highlighting the maturation of LLM adoption considerations.
🔹 Regulatory Perspectives: A significant portion of technical teams prefer to delay new AI regulations or better enforce existing ones, suggesting a cautious approach to regulatory intervention.
🔹 Preference for Third-Party APIs: Most teams favor using a third-party public API for LLM integration, followed by proprietary fine-tuned models, reflecting a pragmatic approach to leveraging LLM capabilities.
Implications for the future of corporate AI strategy:
🔹 Strategic Experimentation: The survey underscores the importance of creating sandbox environments for LLM experimentation, allowing enterprises to tailor AI solutions to their specific needs.
🔹 LLM Observability and Governance: As adoption deepens, the focus on LLM observability and governance tools becomes crucial, ensuring responsible and efficient deployment.
🔹 Innovation and Competitive Edge: The growing adoption of LLMs signals a broader shift towards AI-driven innovation, offering enterprises new avenues to enhance productivity, creativity, and customer engagement.
💬 Your thoughts? How is your organization navigating the adoption of LLMs? What strategies are you employing to overcome the hurdles and maximize the benefits of this powerful AI technology?
🔗 https://lnkd.in/dWwwBaC9 #AI #LLMs #EnterpriseTech #Innovation #ArtificialIntelligence
Survey: Large Language Model Adoption Reaches Tipping Point
arize.com
-
🚀 Co-Founder David Buxton shares some helpful insights on vector stores and why they are so crucial when building with LLMs. Business leaders, check it out: https://lnkd.in/eiF4nDGX #vectorstores #AI #LLM
-
Recently, a significant shift has occurred in the landscape of large language models (LLMs), marking a decisive turn towards a business-to-business (B2B) approach. Just a few weeks ago, Snowflake unveiled Arctic, their groundbreaking model with 17 billion active parameters, tailored specifically for enterprise applications. This model stands out not only for its impressive technical capabilities in SQL generation, code completion, and logical operations but also for its bold claim to be "The Best LLM for Enterprise AI." This development resonates deeply with me, reflecting a broader trend where companies like Databricks, with their Mosaic initiative, are emphasizing attributes like training and inference efficiency. As someone deeply immersed in this field, it's fascinating to see specialized models like Arctic and DBRX shift away from the more generalized approach of predecessors like GPT-4. The focus is now distinctly on meeting the nuanced needs of data-driven businesses, where practical tools like code automation are valued over vast general knowledge. Moreover, the move towards smaller, more cost-effective models - such as Meta's Llama 3 8B - suggests a strategic pivot to accessibility and efficiency. This evolution excites me, as it promises a richer diversity of models, each offering tailored capabilities that cater to specific domains of expertise. It's a thrilling time for us - founders, developers, and users - as we stand at the cusp of further innovations that promise to redefine the boundaries of what LLMs can achieve. Thank you Tomasz Tunguz for helping me understand this new development.
Principal Scientist and Sr. Manager of Applied Science - Amazon AGI Foundation Models
Medium: https://medium.com/@sravanbabubodapati/llama3-highlights-4725f849e3a4