Navigating Between Commercial Ownership and Collaborative Openness
This presentation explores the evolution of generative AI, tracing the trajectories of models such as GPT-4, and examines the dynamics between commercial interests and the ethics of open collaboration. We offer an in-depth analysis of how open different language models are, assessing each of their components, and consider how the (de)centralization of computing power and technology could shape the future of AI research and development. We also look at concrete examples such as LLaMA and its descendants, as well as other open and collaborative projects, which illustrate the diversity and creativity of the field while navigating the complex waters of intellectual property and licensing.
AI_dev Europe 2024 - From OpenAI to Opensource AI
1. Payments to grow your world
Navigating between Commercial Ownership and Collaborative Openness
Raphaël Semeteys
Head of DevRel
Open Source Expert
Senior Architect at Worldline
19 June 2024
Paris, France
From OpenAI to Open Source AI
2. We design payments technology that powers the growth of millions of businesses around the world.
7000+ engineers in over 40 countries
Managing 43+ billion transactions per year
€250M spent in R&D every year
Handling 150+ payment methods
3. The early days of LLMs
From rule-based and simpler statistical models to LLMs
2010s: Word embeddings (Word2Vec, GloVe)
2017-2018: "Attention Is All You Need", Transformers
2020s: GenAI, ChatGPT, responsibility concerns
Tomorrow?: Small Language Models; mobile, agents & LAMs
4. GenAI is having its Linux Moment
• Just like open source and the Internet, but much faster!
• Dynamics between collaborative openness and commercial ownership
• Need for clarity on licenses
Labs & Universities, Individuals, Enterprises, Commodities
5. Defining Openness of a Model
Components: Pre-training Dataset, Fine-tuning Dataset, Reward Model, Model, Data Processing Code
6. Defining Openness of a Model
Each component (Model weights, Pre-training Dataset, Fine-tuning Dataset, Reward model, Data Processing Code) is scored on the following scale:
Score | Level | Description
0 | Closed | No access to any public information, data or asset
1 | Published research only | Research paper(s) published, but no further information, data or asset
2 | Restricted access | Access to the asset is possible only with a special agreement (commercial, research…)
3 | Open with limitations | Access and reuse of the asset are possible, but with certain limitations on usage
4 | Totally open | Access and reuse of the asset are possible without restriction on usage (e.g. open source license)
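To make the scale concrete, here is a minimal Python sketch that encodes the five levels and records a score per component. The component key names and the `openness_profile` and `describe` helpers are hypothetical, chosen for illustration; the usage line takes the GPT-1 & 2 scores from the OpenAI slide and assumes its "Dataset" and "Code" rows map to the pre-training dataset and the data processing code.

```python
from enum import IntEnum


class Openness(IntEnum):
    """Openness levels from the scoring table (0 = closed, 4 = totally open)."""

    CLOSED = 0
    PUBLISHED_RESEARCH_ONLY = 1
    RESTRICTED_ACCESS = 2
    OPEN_WITH_LIMITATIONS = 3
    TOTALLY_OPEN = 4


# The five components assessed by the framework (hypothetical key names).
COMPONENTS = (
    "model_weights",
    "pretraining_dataset",
    "finetuning_dataset",
    "reward_model",
    "data_processing_code",
)


def openness_profile(**scores: int) -> dict[str, Openness]:
    """Validate component names and convert raw 0-4 scores to Openness levels.

    Unknown component names raise ValueError; so do scores outside 0-4,
    because e.g. Openness(5) is not a valid enum value.
    """
    profile: dict[str, Openness] = {}
    for component, score in scores.items():
        if component not in COMPONENTS:
            raise ValueError(f"unknown component: {component}")
        profile[component] = Openness(score)
    return profile


def describe(profile: dict[str, Openness]) -> list[str]:
    """Render each component as 'name: score (level)'."""
    return [
        f"{component}: {level.value} ({level.name.replace('_', ' ').lower()})"
        for component, level in profile.items()
    ]


# GPT-1 & 2 as scored on the OpenAI slide; mapping "Dataset" to the
# pre-training dataset and "Code" to data processing code is an assumption.
gpt2 = openness_profile(model_weights=4, pretraining_dataset=1, data_processing_code=1)
print("\n".join(describe(gpt2)))
```

Components that are not scored are left out of the profile instead of being defaulted to 0, so a missing score stays visibly unknown rather than being mistaken for "Closed".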
7. Market-Leading Player: OpenAI
Deviation from original vision of research transparency & openness
Non/For-profit (US)
GPT-1 & 2: Model 4 (totally open), Dataset 1 (published research only), Code 1 (published research only)
→ GPT-3.x & 4.x/o, ChatGPT: 0 (closed), research paper only
8. Market-Leading Player: OpenAI
No training of other commercial LLMs (GPT-3.x & 4.x/o):
You may not: […] Use Output to develop models that compete with OpenAI.
9. Market-Leading Player: Google
Transition from open research to a pragmatic approach
Enterprise (US)
Earlier models: Model 4 (totally open), Dataset 2 (restricted access), Code 4 (totally open)
→ Later models: Model 1 (published research only), Dataset 1 (published research only), Code 0 (closed)
→ Latest models: Model 3 (open with limitations), Dataset 1 (published research only), Code 4 (toolchain available)
10. Market-Leading Player: Google
You may not use nor allow others to use Gemma or Model Derivatives to: [illegal activities, unlicensed practice of professions, abuse, security bypass, promotion of hatred, violence, monitoring people without consent, misinformation/defamation, automated decisions concerning human rights and well-being, etc.]
Responsible AI restrictions contradict the Open Source Definition, which forbids discrimination against fields of endeavor
11. Other Big Players
Catching up and making their mark in the GenAI Gold Rush
Partner for Infrastructure (inference and training)
Create their own (open) models
12. Market-Leading Player: Meta
Journey to openness
Enterprise (US)
RoBERTa: Model 4 (totally open), Dataset 3 (open with limitations), Code 4 (totally open)
→ Later models: Model 3 (open with limitations), Dataset 1 (published research only), Code 1 (published research only)
13. Market-Leading Player: Meta
Restriction on usage: a separate license is required for platforms with 700+ million users
Additional Commercial Terms. If, on the Llama 2 version release date, the monthly active users of the products or services made available by or for Licensee, or Licensee’s affiliates, is greater than 700 million monthly active users in the preceding calendar month, you must request a license from Meta, which Meta may grant to you in its sole discretion, and you are not authorized to exercise any of the rights under this Agreement unless or until Meta otherwise expressly grants you such rights.
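The threshold in this clause boils down to a single comparison. The Python sketch below encodes one simplified reading of it; the function name and the example product names are hypothetical, and summing MAU across the licensee's products and affiliates is an assumption, since the clause does not say how users shared between products are counted.

```python
# Threshold from the Llama 2 "Additional Commercial Terms" clause.
LLAMA2_MAU_THRESHOLD = 700_000_000


def requires_meta_license(mau_by_product: dict[str, int]) -> bool:
    """Return True if a separate license must be requested from Meta.

    Simplified reading (assumption): add up the monthly active users of every
    product or service offered by or for the licensee and its affiliates in the
    calendar month preceding the Llama 2 release date, then compare with the
    threshold. The clause says "greater than", so exactly 700 million does not
    trigger it.
    """
    if any(count < 0 for count in mau_by_product.values()):
        raise ValueError("MAU counts cannot be negative")
    return sum(mau_by_product.values()) > LLAMA2_MAU_THRESHOLD


print(requires_meta_license({"chat_app": 650_000_000, "search": 40_000_000}))  # False (690M in total)
print(requires_meta_license({"social_network": 2_000_000_000}))  # True
```

The strict `>` mirrors the wording "greater than"; a different reading of how affiliates' users are combined would only change how the dictionary is filled in, not the comparison itself.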
14. Market-Leading Player: Meta
LLaMA 3 now more restrictive on redistribution and reuse
Redistribution and Use. If you distribute or make available the Llama Materials (or any
derivative works thereof), or a product or service that uses any of them, including
another AI model, you shall (A) provide a copy of this Agreement with any such Llama
Materials; and (B) prominently display “Built with Meta Llama 3” on a related website,
user interface, blogpost, about page, or product documentation. If you use the Llama
Materials to create, train, fine tune, or otherwise improve an AI model, which is
distributed or made available, you shall also include “Llama 3” at the beginning of any
such AI model name.
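Two of these obligations are mechanical enough to check automatically. The sketch below, written under a deliberately literal reading of the quoted text and assuming you are already distributing Llama Materials or a product that uses them, flags a derived model whose name does not begin with "Llama 3" and a public page that lacks the "Built with Meta Llama 3" notice. The function name and parameters are hypothetical; the check cannot judge whether the notice is displayed "prominently", and it ignores the obligation to ship a copy of the Agreement.

```python
# Hypothetical checker for two obligations quoted from the Llama 3 license.
# Literal, simplified reading for illustration; the license text is authoritative.

REQUIRED_PREFIX = "Llama 3"                  # start of any distributed derived model name
REQUIRED_NOTICE = "Built with Meta Llama 3"  # attribution to display


def attribution_issues(model_name: str, public_page_text: str,
                       distributes_derived_model: bool) -> list[str]:
    """Return human-readable problems found (an empty list means none found)."""
    issues = []
    # Naming rule only applies when a model created or improved with
    # Llama Materials is itself distributed or made available.
    if distributes_derived_model and not model_name.startswith(REQUIRED_PREFIX):
        issues.append(f"model name should begin with {REQUIRED_PREFIX!r}")
    # Plain substring test: it cannot tell whether the notice is "prominent".
    if REQUIRED_NOTICE not in public_page_text:
        issues.append(f"missing notice {REQUIRED_NOTICE!r}")
    return issues


if __name__ == "__main__":
    print(attribution_issues("MedChat", "About us", True))
```

Because the check is a literal string comparison, a spelling variant such as "Llama-3-MedChat" would be flagged.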
15. Llama 2 offspring: Alpaca and Vicuna
Models fine-tuned from Llama 2 by universities
Research (US)
Component | Score | Level description
Model | 3 | Open with limitations
Pre-training dataset | 1 | Published research only
Fine-tuning dataset | 2 | Research use only
Code | 4 | Under Apache 2.0 license
Restrictions inherited from both Llama 2 and OpenAI (ShareGPT)
16. Collaborative foundational LLMs
Organization type | Non-profit (US) | Research (UAE) | Research (EU) | Research (US) | Enterprise (FR)
Component | EleutherAI GPT-J | Falcon | BLOOM | OpenLLaMA | Mistral
Model | 4 Access and reuse without restriction | 3 Open with limitations | 3 OpenRAIL license | 4 Access and reuse without restriction | 4 Access and reuse without restriction
Dataset | 3 Open with limitations | 4 Access and reuse without restriction | 3 Open with limitations | 4 Access and reuse without restriction | 0 No public information or access
Code | 4 Completely open | 1 General instructions | 4 Completely open | 1 Just examples | 4 Completely open
Dataset fuzziness: please refer to the specific license depending on the subset you use
Notion of responsible usage
17. Collaborative foundational LLMs
Modified open source licenses
This license is, in part, based on the Apache License Version 2.0, with a
series of modifications. The contribution of the Apache License 2.0 to
the framing of this document is acknowledged. Please read this license
carefully, as it is different to other ‘open access’ licenses you may have
encountered previously. Use of Falcon180B for hosted services may
require a separate license.
18. Mistral AI’s French sauce
Navigating both open and closed waters
Just like with Open Source, the rise of Community vs. Enterprise
Mix of AI Models
• Mixture-of-Experts (SMoE): Mixtral 8x7B, 8x22B
• Foundational and fine-tuned models
Mix of Business Models & Licenses
• “Open Source” models, mistral-finetune SDK
• Commercial: optimized Small, Large & Embed Models
• Sustainable openness: new non-production license for Codestral
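"SMoE" stands for sparse mixture-of-experts: in Mixtral, each layer holds 8 expert feed-forward blocks and a router sends each token to only 2 of them, so only a fraction of the parameters is used per token. The pure-Python sketch below shows top-2 routing for a single token. The toy experts, and passing router logits in directly instead of computing them from a learned gating matrix, are simplifying assumptions; this is not Mistral AI's implementation.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def smoe_layer(x: Vector, router_logits: Sequence[float],
               experts: Sequence[Callable[[Vector], Vector]], k: int = 2) -> Vector:
    """Sparse MoE for one token: evaluate only the top-k experts and mix them."""
    # Pick the k experts with the highest router scores.
    top = sorted(range(len(experts)), key=lambda i: router_logits[i],
                 reverse=True)[:k]
    # Renormalize the gate weights over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # 8 toy "experts" (as in Mixtral 8x7B); expert i just scales its input by i + 1.
    toy_experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
    logits = [0.0, 0.1, 2.0, 0.0, 0.3, 1.5, 0.0, 0.2]
    print(smoe_layer([1.0, 2.0], logits, toy_experts))
```

Only experts 2 and 5 run in the demo; the other six contribute nothing and cost nothing for that token, which is what "sparse" refers to.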
19. Mistral AI’s French sauce
Just like with Open Source, revisiting openness in the cloud era
Mistral AI Non-Production License (MNPL), section 3.2: Usage Limitation
- You shall only use the Mistral Models and Derivatives (whether or not created
by Mistral AI) for testing, research, Personal, or evaluation purposes in Non-
Production Environments;
- Subject to the foregoing, You shall not supply the Mistral Models or
Derivatives in the course of a commercial activity, whether in return for
payment or free of charge, in any medium or form, including but not limited to
through a hosted or managed service (e.g. SaaS, cloud instances, etc.), or
behind a software layer.
20. Collaborative fine-tuned LLMs
Impact of foundational models or pre-training datasets
Organization type | Enterprise (US) | Enterprise (US) | Enterprise (US) | Consortium (UAE/US) | Research (US)
Component | Dolly | BLOOMChat | Zephyr | LLM360 | OLMo-Instruct
Model | 4 Based on GPT-J | 3 Based on BLOOM | 4 Based on Mistral | 4 Open source | 4 Open source
Pre-training dataset | 3 Based on GPT-J | 3 Based on BLOOM | 0 Based on Mistral | 4 RedPajama, Falcon, StarCoder | 3 Dolma (ImpACT MR)
Fine-tuning dataset | 4 Access and reuse without restriction | 4 Dolly and LAION | 2 Research use only (OpenAI) | 2 Research use only (OpenAI) | 3 Tülu 2 (ImpACT LR)
Reward model | 0 No public information available | 0 No public information available | 3 Paper and code examples | 0 No public information available | 4 UltraFeedback (MIT)
Code | 4 Open source | 3 OpenRAIL | 3 Example code available | 4 Open source | 4 Open source
21. Collaborative fine-tuned LLMs
AI2 ImpACT Licenses - Restrictions
[…] a. military weapons purposes […]
b. purposes of military surveillance […]
c. purposes of generating or disseminating information or content […] without
expressly and intelligibly disclaiming that the text is machine generated;
d. purposes of ‘real time’ remote biometric processing […]
e. fully automated decision-making without a human in the loop […] as spreading
misinformation […]
f. purposes of the predictive administration of justice, law enforcement, immigration,
or asylum processes, such as predicting an individual will commit fraud/crime
Responsible AI restrictions contradict the Open Source Definition, which forbids discrimination against fields of endeavor
22. Other aspects of GenAI’s Linux Moment
Democratize and Decentralize (re)use and innovation
• Collaborative Tools & Ecosystems: Notebooks, Communities, New Business Models
• Hardware Optimization: AI Chips, Quantization, Decentralization
• Open Source Tools & Frameworks: Do One Thing Well, Interoperable Standards, Beyond Python
23. Key takeaways
• Closed APIs → Open Weights → Free AI (as in freedom)
• Datasets and upstream transitivity
• Competitive clauses
• Responsible AI restrictions
• Open Research → Competitive Market → Coopetitive Ecosystem
• Openness fosters reuse and collaboration
• Collaboration brings commoditization and innovation
Just like Open Source!