Many of us will be at ICML'24 in Vienna and look forward to connecting with you! Among the large number of papers Amazon is presenting at ICML (https://lnkd.in/gAQXFrqA), I want to highlight a few from my group:

Fewer Truncations Improve Language Modeling — https://lnkd.in/gbDR3F2Q
Quick Summary: Introduces Best-fit Packing, a method that reduces document truncation when assembling language-model training sequences, preserving data integrity and improving performance.

Collage: Light-Weight Low-Precision Strategy for LLM Training — https://lnkd.in/ggHi6Th6
Quick Summary: Presents Collage, which uses multi-component float representations for low-precision computation, achieving accuracy comparable to higher-precision training while significantly reducing memory usage and computational cost.

Bifurcated Attention for Single-Context Large-Batch Sampling — https://lnkd.in/g7YkTqzt
Quick Summary: Introduces bifurcated attention, which optimizes memory IO during incremental decoding at high batch sizes and long contexts, achieving significant latency reductions without increasing computational load — a boost for real-time applications such as parallel answer generation and ranking.

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models — https://lnkd.in/gpey_tUY
Quick Summary: Proposes MeZO-SVRG (Memory-Efficient Zeroth-Order Stochastic Variance-Reduced Gradient), which improves stability and convergence when fine-tuning language models while reducing memory and computational costs.

Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation — https://lnkd.in/g_D_gCge
Quick Summary: Presents a method for evaluating the accuracy of retrieval-augmented generation (RAG) systems by automatically generating task-specific exams.

Repoformer: Selective Retrieval for Repository-Level Code Completion — https://lnkd.in/gneyRdF7
Quick Summary: A selective retrieval-augmented generation framework that significantly improves repository-level code completion performance and efficiency by retrieving context only when it is beneficial.

Explaining Probabilistic Models with Distributional Values — https://lnkd.in/gaEAmKgr
Quick Summary: Critiques game-theoretic explainable-ML methods like SHAP for their misalignment with the desired explanation targets and proposes a new framework based on distributional values.

--

We work on Amazon Q Developer, your assistant for the entire software development lifecycle (SDLC). If this sounds interesting, feel free to drop by the Amazon booth or reach out to me on LinkedIn and we can set up a time to chat.

#amazonscience #genai #ml #ai #icml24 #deeplearning #aws
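The core idea behind the Best-fit Packing summary above can be sketched in a few lines. This is a simplified, hedged illustration (classic best-fit bin packing, not the paper's exact algorithm), assuming every document already fits within the context length:

```python
# Hedged sketch of the idea behind Best-fit Packing (simplified, not the
# paper's exact algorithm): instead of concatenating documents and
# truncating at the context boundary, place each document whole into the
# training sequence with the least remaining room that still fits it.
def best_fit_pack(doc_lengths, context_len):
    """Greedy best-fit packing of document lengths into training sequences.

    Assumes each length <= context_len (the paper handles longer docs
    separately by splitting only those).
    """
    bins = []        # remaining free capacity of each sequence
    assignment = []  # which sequence each document landed in
    for length in doc_lengths:
        # Pick the fullest sequence that can still hold this document.
        best = min(
            (i for i, free in enumerate(bins) if free >= length),
            key=lambda i: bins[i],
            default=None,
        )
        if best is None:
            bins.append(context_len - length)
            assignment.append(len(bins) - 1)
        else:
            bins[best] -= length
            assignment.append(best)
    return assignment, bins

# Documents of 600/300/500/100 tokens packed into 1024-token sequences:
assignment, free = best_fit_pack([600, 300, 500, 100], 1024)
print(assignment)  # sequence index per document
```

No document is cut mid-way: each ends up whole in some sequence, which is the data-integrity property the paper is after.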
Anoop Deoras’ Post
More Relevant Posts
Exciting News in Database Technology! MIT researchers have introduced GenSQL, a groundbreaking AI-powered tool that simplifies complex database queries. By integrating probabilistic generative models with SQL, GenSQL lets users pose complex statistical questions over tabular data and receive accurate, personalized insights without needing deep expertise in probabilistic modeling. GenSQL's potential applications range from auditing data to generating synthetic data for sensitive information. Kudos to the MIT team for this remarkable advancement in making data more accessible and useful! #AI #DatabaseTechnology #Innovation #MIT #GenSQL #SQL #DataEdge
MIT's GenSQL Bridges AI And SQL For Easy Database Queries
greenbot.com
Hello Everyone! 👋 🌟 Exciting News from MIT: Introducing GenSQL! (Simplifying Complex Data Analysis with AI) 🌟

Imagine being able to analyze complex data without needing deep technical expertise. That's exactly what GenSQL, a new AI tool from MIT, offers! 🎉 GenSQL, a generative AI system for databases, is designed to make complex statistical analysis of tabular data easy and accessible.

➡ What's New?
The researchers noticed that SQL didn't provide an effective way to incorporate probabilistic AI models, while approaches that use probabilistic models for inference didn't support complex database queries. They built GenSQL to fill this gap, enabling someone to query both a dataset and a probabilistic model using a straightforward yet powerful formal programming language.

Here's why it's a game-changer:
🔹 User-Friendly: No need to be a data scientist! GenSQL allows users to perform tasks like prediction, anomaly detection, and synthetic data generation with just a few keystrokes.
🔹 Advanced Modeling: It integrates datasets with probabilistic models, which handle uncertainty and adapt to new data, providing more accurate and reliable results.
🔹 Real-World Application: Think about a patient with a history of high blood pressure. GenSQL could flag an unusually low reading that might be normal for other patients but is significant for this one. Personalized data analysis at its best!
🔹 Efficiency: GenSQL is not just powerful; it's fast! It executes queries in milliseconds, significantly outperforming other AI methods.
🔹 Future Potential: The MIT team is working to make GenSQL even more intuitive, aiming to allow natural language queries. Imagine having a ChatGPT-like AI expert that can interact with any database effortlessly!

Challenges Addressed:
➡ Accessibility: Simplifies complex data analysis for non-technical users.
➡ Privacy: Generates synthetic data to protect sensitive information.
➡ Efficiency: Speeds up and improves the accuracy of data retrieval and analysis.

➡ Why It Matters:
GenSQL is set to revolutionize how we interact with data, making it more accessible, efficient, and insightful. Whether in healthcare, finance, or any other data-driven field, GenSQL's potential is immense.

➡ Conclusion:
GenSQL represents a significant leap in database technology, making complex data analysis more accessible and efficient. Funded by DARPA, Google, and the Siegel Family Foundation, it promises to reshape how we interact with data across various fields.

For more info visit: https://lnkd.in/g4jp_Kjq

And feel free to share your thoughts or experiences in the comments below! 😊👇

#AI #DataScience #Innovation #MIT #TechNews #GenSQL #DataAnalysis #FutureTech #MachineLearning #LinkedinPost #Technology
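The personalized anomaly-detection example (a blood-pressure reading that is normal in general but unusual for one patient) can be illustrated in plain Python. This is a toy sketch of the idea only — not GenSQL's syntax or its probabilistic models — and all patient names and numbers are hypothetical:

```python
# Toy illustration (not GenSQL): flag a reading that is unusual *for a
# specific patient* by scoring it against that patient's own history,
# rather than against the whole population.
from statistics import mean, stdev

def zscore(value, history):
    """How many standard deviations `value` sits from this patient's baseline."""
    return (value - mean(history)) / stdev(history)

# Hypothetical systolic blood-pressure histories (mmHg).
history = {
    "patient_a": [150, 155, 148, 152, 149],  # consistently high
    "patient_b": [112, 108, 115, 110, 111],  # consistently normal
}

reading = 110
# 110 mmHg is ordinary in general, but a large |z| against patient_a's
# personal baseline makes it stand out for that patient alone.
for pid, hist in history.items():
    z = zscore(reading, hist)
    flag = "ANOMALY" if abs(z) > 3 else "ok"
    print(pid, round(z, 1), flag)
```

A probabilistic-programming system like GenSQL generalizes this far beyond a per-patient Gaussian, but the "same value, different verdict per row" behavior is the point.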
I've seen a number of new companies working to apply LLMs to the data engineering and data query pipeline problem. These firms use LLMs to generate the transforms needed to take API or log-file data and place it in the data warehouse. Other firms take natural-language text and generate SQL queries to extract the requested information. All very interesting, and consistent with sustaining innovation. However, I've been wondering if there isn't a significant disruption coming. There is an underlying assumption that SQL is required as an intermediate data store to answer queries. The job to be done, though, is to ask a question and get results. Can LLMs act directly on the log-file and API data? If this becomes possible, it will radically change the way the database world works - at least for AI / data science queries. This U. Chicago paper explores the approach: https://lnkd.in/eFSyBvug
llm_db_vision_vldb23-11.pdf
raulcastrofernandez.com
#learningeveryday Today I learnt about vector databases and THE ROLE OF VECTOR DATABASES IN LARGE LANGUAGE MODELS (LLMs). Here are some key insights:

✔ WHAT ARE VECTOR DATABASES? ✔
Imagine you're in a huge library full of books. In a traditional library (like a traditional database), books are organized in a simple way, maybe alphabetically. If you're looking for a book on a specific topic, you might have to walk through many aisles, checking each book one by one. This is similar to how traditional databases manage data - straightforward, but not always efficient, especially for complex queries.

Now think of a vector database as a high-tech library with a smart system: when you look for a book on a specific topic, the system instantly finds all the books related to your query and brings them to you. This is possible because the books (or data, in our case) are stored not just by simple categories, but in a way that reflects their content and their relationships to each other.

Instead of storing data in rows and columns, vector databases store data as vectors - lists of numbers that represent complex data, like the words in a language model. When you ask such a database a question, it uses techniques like approximate nearest neighbor (ANN) search. This is like telling the system, "Find me books similar to this one," and it quickly retrieves the most relevant books based on their content, not just their titles. So, in the world of AI and large language models, vector databases are our high-tech libraries: they efficiently store and manage the complex, high-dimensional data these models use, allowing rapid retrieval of information, much like finding the perfect book in our futuristic library.

✔ ROLE OF VECTOR DATABASES IN LLMs ✔
1. ⭕ Storing High-Dimensional Data: Vector databases are built specifically to handle vast amounts of high-dimensional data. They store data in a format that aligns with the vector-based nature of LLMs, enabling more efficient storage and quicker access.
2. ⭕ Speeding Up Data Retrieval: Vector databases use advanced techniques like approximate nearest neighbor (ANN) search, which drastically speeds up finding the most relevant data vectors. When an LLM needs to access specific pieces of information, it can do so much faster, leading to better performance overall.
3. ⭕ Semantic Search: Semantic search is about understanding the intent and contextual meaning of a query. Vector databases improve this by efficiently organizing and retrieving data that closely matches the semantic context of the query, leading to more accurate and contextually relevant search results.

✔ CONCLUSION ✔
Vector databases' ability to efficiently store, retrieve, and manage high-dimensional data makes them a fitting choice for the demanding requirements of these advanced AI models.
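The "find me books similar to this one" step above boils down to similarity search over vectors. Here is a minimal sketch with brute-force cosine similarity on made-up 3-dimensional "embeddings" (real vector databases use ANN indexes to avoid scanning everything, and real embeddings have hundreds of dimensions):

```python
# Minimal sketch of vector search: items are embedded as vectors, and the
# most similar ones are retrieved by cosine similarity. Real vector
# databases replace this brute-force scan with an approximate
# nearest-neighbor (ANN) index to stay fast at scale.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: nearby vectors represent related content.
library = {
    "intro_to_sql":     [0.9, 0.1, 0.0],
    "advanced_sql":     [0.8, 0.2, 0.1],
    "gardening_basics": [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # embedding of "a book about SQL"
ranked = sorted(library, key=lambda k: cosine(query, library[k]), reverse=True)
print(ranked)  # SQL books rank ahead of the gardening one
```

The ranking comes from content geometry, not titles — exactly the "high-tech library" behavior described above.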
Aspiring Data Analyst & Scientist | MS Excel | Python | Machine Learning | Predictive Analytics | BA In Economics | KC College Mumbai
Unlocking 📊 Data Patterns : The Role Of 𝗢𝗻𝗲 𝗛𝗼𝘁 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴 𝗧𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲 In Data Science.

I Recommend Seeing The 𝗣𝗿𝗲𝘃𝗶𝗼𝘂𝘀 𝗧𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲 First So You Can Understand This One Much Better 👉 https://shorturl.at/6HUWU

Previously, We Learnt About The 𝗟𝗮𝗯𝗲𝗹 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴 𝗧𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲. Label Encoding 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺𝘀 String ( Textual ) Data Into Numerical Data. This Is Very Important Because 𝗠𝗟 ( Machine Learning ) 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺𝘀 Cannot Understand Textual Data. They Can Only Understand , Process And Interpret 𝗡𝘂𝗺𝗲𝗿𝗶𝗰𝗮𝗹 𝗗𝗮𝘁𝗮. Hence We Use 𝗦𝘂𝗰𝗵 𝗧𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲𝘀 To Turn Textual Data Into Numerical Format. Today 𝗟𝗲𝘁𝘀 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱 The One Hot Encoding Technique.

🔶 𝗪𝗵𝗮𝘁 𝗜𝘀 𝗢𝗻𝗲 𝗛𝗼𝘁 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴 ?
🔹 One hot encoding transforms categorical data into a binary format ( 0 and 1 ) for machine learning algorithms.
🔹 It is used in machine learning to handle categorical variables without assuming any order or hierarchy among them.
🔹 Each category becomes a separate column where 1 indicates the presence and 0 indicates the absence of that category.

🔶 𝗜𝗺𝗽𝗼𝗿𝘁𝗮𝗻𝗰𝗲 𝗢𝗳 𝗢𝗻𝗲 𝗛𝗼𝘁 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴 :-
🔹 Essential for algorithms that need numerical input , like neural networks and linear regression models.
🔹 Helps capture patterns and relationships in categorical data that are otherwise challenging for algorithms to interpret directly.

🔶 𝗗𝗶𝗳𝗳𝗲𝗿𝗲𝗻𝗰𝗲 𝗕𝗲𝘁𝘄𝗲𝗲𝗻 𝗟𝗮𝗯𝗲𝗹 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴 𝗔𝗻𝗱 𝗢𝗻𝗲 𝗛𝗼𝘁 𝗘𝗻𝗰𝗼𝗱𝗶𝗻𝗴 :-
🔹 Label encoding converts categories into numerical labels ( 0, 1, 2 ) based on their order, useful for ordinal data like "low" , "medium" , "high" , whereas
🔹 One hot encoding transforms categories into binary columns ( [1, 0, 0] , [0, 1, 0] ), ideal for categorical data without a natural order, ensuring each category is represented independently in machine learning models.

For Better Understanding , You Can 𝗥𝗲𝗳𝗲𝗿 𝗧𝗵𝗲 𝗜𝗺𝗮𝗴𝗲 Which I Have Attached. This Was 𝗔𝗹𝗹 𝗔𝗯𝗼𝘂𝘁 The One Hot Encoding Technique. Really Very Useful , As 𝗜 𝗛𝗮𝘃𝗲 𝗣𝗿𝗮𝗰𝘁𝗶𝗰𝗮𝗹𝗹𝘆 𝗨𝘀𝗲𝗱 This Technique To 𝗧𝗿𝗮𝗻𝘀𝗳𝗼𝗿𝗺 Textual Data Into Numerical Form , Where Every Unique Value Gets Its Own Column Indicated By ( 0 And 1 ) Showcasing 𝗔𝗯𝘀𝗲𝗻𝗰𝗲 𝗢𝗿 𝗣𝗿𝗲𝘀𝗲𝗻𝗰𝗲. With The Transformed Data 𝗜 𝗧𝗿𝗮𝗶𝗻𝗲𝗱 𝗧𝗵𝗲 𝗠𝗟 𝗔𝗹𝗴𝗼𝗿𝗶𝘁𝗵𝗺 And 𝗠𝗮𝗱𝗲 𝗔 𝗠𝗼𝗱𝗲𝗹 Which 𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝘀 From Past Data As Well As New Data.

🔶 𝗜 𝗣𝗼𝘀𝘁 𝗢𝗻 𝗟𝗶𝗻𝗸𝗲𝗱𝗜𝗻 𝗔 𝗟𝗼𝘁 𝗢𝗳 𝗜𝗻𝘀𝗶𝗴𝗵𝘁𝘀 💡 𝗔𝗯𝗼𝘂𝘁 :-
🔹 Data Science
🔹 Data Analytics
🔹 Business

So 𝗜𝗳 𝗬𝗼𝘂 𝗗𝗼 𝗡𝗼𝘁 𝗪𝗮𝗻𝘁 𝗧𝗼 𝗠𝗶𝘀𝘀 Further Posts On These Domains And Want To Learn From My Experiences , You Can 𝗙𝗼𝗹𝗹𝗼𝘄 Shubham Parihar. PEACE.

#datascience #datascientist #dataanalyst #onehotencoding #python
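One-hot encoding as described above fits in a few lines of plain Python (libraries like pandas `get_dummies` or scikit-learn's `OneHotEncoder` do this in practice; the color values below are just illustrative):

```python
# One-hot encoding from scratch: each unique category gets its own binary
# column; 1 marks presence, 0 marks absence — no order is implied.
def one_hot_encode(values):
    # Sort the unique categories so column order is deterministic.
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    # Each input value becomes a binary row with a single 1.
    rows = [[1 if index[v] == j else 0 for j in range(len(categories))]
            for v in values]
    return categories, rows

cats, encoded = one_hot_encode(["red", "green", "blue", "green"])
print(cats)     # column order
print(encoded)  # one binary row per input value
```

Note how "red" and "blue" end up equally distant from each other — unlike label encoding, no spurious ordering sneaks into the model.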
🔍 𝐀𝐫𝐞 𝐲𝐨𝐮 𝐰𝐨𝐧𝐝𝐞𝐫𝐢𝐧𝐠 𝐚𝐛𝐨𝐮𝐭 𝐭𝐡𝐞 𝐜𝐚𝐩𝐭𝐢𝐯𝐚𝐭𝐢𝐧𝐠 𝐫𝐞𝐚𝐥𝐦𝐬 𝐨𝐟 𝐃𝐚𝐭𝐚 𝐒𝐜𝐢𝐞𝐧𝐜𝐞 𝐚𝐧𝐝 𝐀𝐫𝐭𝐢𝐟𝐢𝐜𝐢𝐚𝐥 𝐈𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞, 𝐚𝐧𝐝 𝐡𝐨𝐰 𝐭𝐡𝐞𝐲'𝐫𝐞 𝐭𝐫𝐚𝐧𝐬𝐟𝐨𝐫𝐦𝐢𝐧𝐠 𝐨𝐮𝐫 𝐰𝐨𝐫𝐥𝐝? 🌐 Let's embark on an enlightening journey to unravel these tech marvels! 🚀

📊 𝐖𝐡𝐚𝐭 𝐞𝐱𝐚𝐜𝐭𝐥𝐲 𝐬𝐞𝐭𝐬 𝐃𝐚𝐭𝐚 𝐒𝐜𝐢𝐞𝐧𝐜𝐞 𝐚𝐧𝐝 𝐀𝐫𝐭𝐢𝐟𝐢𝐜𝐢𝐚𝐥 𝐈𝐧𝐭𝐞𝐥𝐥𝐢𝐠𝐞𝐧𝐜𝐞 𝐚𝐩𝐚𝐫𝐭?
💡 Data Science 📊 focuses on extracting valuable insights from structured and unstructured data using techniques like machine learning and data visualization. Artificial Intelligence 🤖, on the other hand, is all about creating machines that can perform tasks requiring human intelligence, such as decision-making and language translation.

📜 𝐖𝐡𝐚𝐭'𝐬 𝐭𝐡𝐞 𝐡𝐢𝐬𝐭𝐨𝐫𝐢𝐜𝐚𝐥 𝐜𝐨𝐧𝐭𝐞𝐱𝐭 𝐛𝐞𝐡𝐢𝐧𝐝 𝐃𝐚𝐭𝐚 𝐒𝐜𝐢𝐞𝐧𝐜𝐞 𝐚𝐧𝐝 𝐀𝐈?
💡 Data Science has a rich history dating back to visionaries like John W. Tukey and Peter Naur. 🕰️ Tukey highlighted the potential of data analysis in 1962, while Naur contributed significantly to defining Data Science as a discipline. Over the years, Data Science evolved with the growth of big data, giving rise to tools like Hadoop and techniques like deep learning. 🧠 The concept of AI, meanwhile, has roots in the mid-20th century, with key milestones including Alan Turing's work and the coining of the term 'Artificial Intelligence' at the 1956 Dartmouth Conference. AI's evolution saw "winters" of reduced funding followed by breakthroughs, and it's now integrated into many aspects of our lives.

🔄 𝐇𝐨𝐰 𝐝𝐨 𝐃𝐚𝐭𝐚 𝐒𝐜𝐢𝐞𝐧𝐜𝐞 𝐚𝐧𝐝 𝐀𝐈 𝐜𝐨𝐥𝐥𝐚𝐛𝐨𝐫𝐚𝐭𝐞 𝐚𝐧𝐝 𝐛𝐞𝐧𝐞𝐟𝐢𝐭 𝐞𝐚𝐜𝐡 𝐨𝐭𝐡𝐞𝐫?
💡 Data Science provides the data insights and predictive models that fuel AI algorithms, while AI brings advanced machine learning techniques like deep learning and NLP to Data Science. Their collaboration is evident in industries like healthcare, finance, marketing, and supply chain management, driving innovation and efficiency. Together, they shape our tech-driven future!
🌟 🚀 So, whether you're a tech enthusiast, a budding data scientist, or simply curious about the forces shaping our world, this article is your gateway to understanding the magic behind Data Science and Artificial Intelligence. Dive in and explore the future!🚀 𝐋𝐞𝐚𝐫𝐧 𝐦𝐨𝐫𝐞: https://lnkd.in/gQzd-uRp #blockchaincouncil #machinelearning #USA #UK #ml #AI #datascience
What is the Difference Between Data Science and Artificial Intelligence?
https://www.blockchain-council.org
AI Enthusiast -- Passionate about building products leveraging AI. Recently delving into the wonders of Quantum Computing for Machine Learning (QML).
As AI models (i.e., large language models) grow exponentially in size, the need to examine the biases a model has learnt rises once it is put into production for a specific commercial use or an organization's internal consumption. Most deep learning engineers are well aware of tools like SHAP. Here is an older, less well-known ML-explainability library: Shapash. Check it out: https://lnkd.in/gu-UZJwN
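For intuition on what SHAP-style tools (Shapash included) report, here is what they compute in miniature: exact Shapley values for a tiny two-feature model, by averaging each feature's marginal contribution over all feature orderings. This is an illustrative sketch only — the "credit score" model and its numbers are hypothetical, and this is not the Shapash API:

```python
# Exact Shapley values for a toy model: average each feature's marginal
# contribution to the output over every ordering in which features could
# be "revealed" to the model. (Illustrative only -- not the Shapash API.)
from itertools import permutations

def shapley(names, value):
    """`value` maps a frozenset of present feature names to the model output."""
    contrib = {n: 0.0 for n in names}
    orders = list(permutations(names))
    for order in orders:
        present = set()
        for n in order:
            before = value(frozenset(present))
            present.add(n)
            contrib[n] += value(frozenset(present)) - before
    return {n: c / len(orders) for n, c in contrib.items()}

# Hypothetical credit model: base score 50, +30 for income, +10 for age,
# and +10 extra only when both are present (an interaction term).
def score(present):
    s = 50.0
    if "income" in present: s += 30
    if "age" in present: s += 10
    if {"income", "age"} <= present: s += 10
    return s

print(shapley(["income", "age"], score))
```

The attributions sum to the gap between the full prediction and the baseline — the efficiency property that makes Shapley-based explanations attractive.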
Shapash: Making Machine Learning Models Understandable - KDnuggets
kdnuggets.com
Machine Learning Engineer (Nedgia)|Azure Data Scientist|Power Engineer|Mathematician|MS Big Data in Sports
One of the most challenging tasks for a data scientist is data cleansing and wrangling. Various studies suggest that these tasks could account for up to 70% of the total time spent on implementing a machine learning Proof of Concept (POC). This article explores how the power of Large Language Models (LLMs) such as GPT-4, Mistral, etc., can simplify the preparatory tasks of data cleansing and wrangling, making it easier to build algorithms. https://lnkd.in/djU9QUKn
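To make the data-quality theme concrete, here is one classic automated check sketched in plain Python — a mixed-type column detector. This is an illustration of the problem space, not the article's implementation, and the sample records are hypothetical:

```python
# One simple automated data-quality check: flag columns whose values mix
# types (e.g. numbers stored alongside strings), a frequent wrangling
# headache to resolve before any model building.
def mixed_type_columns(rows):
    """`rows` is a list of dicts (records). Returns columns with >1 value type."""
    types = {}
    for row in rows:
        for col, val in row.items():
            if val is not None:  # missing values don't count as a type
                types.setdefault(col, set()).add(type(val).__name__)
    return sorted(col for col, t in types.items() if len(t) > 1)

records = [
    {"age": 34, "city": "Madrid"},
    {"age": "unknown", "city": "Vigo"},  # "unknown" pollutes a numeric column
    {"age": 29, "city": None},
]
print(mixed_type_columns(records))  # ['age']
```

An LLM's role, as the article discusses, is to go beyond such hard-coded rules — spotting issues like this and proposing fixes from context.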
Automated Detection of Data Quality Issues
towardsdatascience.com
“Stanford researchers introduce an approach to grounding conversational agents in hybrid data sources, utilizing both structured data queries and free-text retrieval techniques…In real-life restaurant experiments, SUQL demonstrates 93.8% and 90.3% turn accuracy in single-turn and conversational queries respectively, surpassing linearization-based methods by up to 36.8% and 26.9%.” #AI #chatagents #LLM #data #sql
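The hybrid structured/unstructured idea behind SUQL can be illustrated with a toy in plain Python — an exact-match filter (the SQL part) combined with retrieval over free-text reviews. This is not SUQL's actual syntax, and the restaurant data is made up; a real system would use an LLM or dense retriever for the free-text side:

```python
# Toy version of the hybrid idea behind SUQL (not its actual syntax):
# answer a query by combining a structured filter (cuisine == "italian")
# with retrieval over unstructured review text.
restaurants = [
    {"name": "Trattoria Roma", "cuisine": "italian",
     "reviews": "cozy spot, amazing handmade pasta and tiramisu"},
    {"name": "Sushi Ten", "cuisine": "japanese",
     "reviews": "fresh fish, great omakase"},
    {"name": "Luigi's", "cuisine": "italian",
     "reviews": "decent pizza but the pasta was overcooked"},
]

def hybrid_query(rows, cuisine, keywords):
    # Structured part: an exact-match filter, as SQL would do.
    candidates = [r for r in rows if r["cuisine"] == cuisine]
    # Unstructured part: rank candidates by keyword overlap with the
    # free-text reviews (a stand-in for LLM-based retrieval).
    def relevance(r):
        return sum(k in r["reviews"] for k in keywords)
    return max(candidates, key=relevance)["name"]

print(hybrid_query(restaurants, "italian", ["amazing", "pasta"]))
```

The point SUQL formalizes is that neither half suffices alone: the filter alone can't rank by review content, and text retrieval alone can't guarantee the cuisine constraint.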
Researchers at Stanford Introduce SUQL: A Formal Query Language for Integrating Structured and Unstructured Data
https://www.marktechpost.com
Transforming natural language into SQL queries? It's not just a dream anymore. Artificial intelligence continues to push the boundaries of what we thought was possible. This groundbreaking approach to transforming natural language into actionable SQL queries has the potential to revolutionize the way we interact with databases. "Imagine the time and effort saved when we no longer have to manually write complex SQL queries. With the power of GPT models, we can now communicate with databases using everyday language." Whether you're a data scientist, analyst, or database administrator, this article will make you rethink your approach to querying data. Share your thoughts on this game-changing technology in the comments. Are you excited about the potential it holds? Let's discuss! https://lnkd.in/dKJvR-Vv
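The common prompting pattern behind such NL-to-SQL tools is simple to sketch: pack the table schema and the user's question into one prompt for a GPT-style model. The sketch below only assembles the prompt text (the model call itself is omitted), and the schema and question are hypothetical:

```python
# Sketch of the NL-to-SQL prompting pattern: give the model the schema as
# grounding context, the user's question, and an instruction to return
# only SQL. The LLM call itself is intentionally left out.
def build_prompt(schema, question):
    return (
        "You are an assistant that writes SQL.\n"
        f"Given this schema:\n{schema}\n"
        f"Write a single SQL query answering: {question}\n"
        "Return only the SQL."
    )

schema = "CREATE TABLE orders (id INT, customer TEXT, total REAL, placed_at DATE);"
prompt = build_prompt(schema, "What was the total revenue in June 2024?")
print(prompt)
```

Including the schema is what keeps the generated SQL grounded in real column names instead of guesses — the core trick the article builds on.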
Leveraging GPT Models to Transform Natural Language to SQL Queries - KDnuggets
kdnuggets.com