Mapping AI Models 🗺
Anthropic


  • New research from Anthropic (the maker of the AI chatbot Claude) offers a detailed look inside a modern large language model.
  • Mechanistic interpretability, a subfield of AI research, aims to understand how these models work by examining their internal mechanisms.
  • For the first time at this scale, Anthropic made significant strides in interpreting an AI model, specifically Claude 3 Sonnet, using a technique called "dictionary learning" (a rough sketch of the idea follows this list).
  • Finding patterns: the team identified approximately 10 million patterns, or "features," each representing a different concept within the model.
  • Examples of features:
  • San Francisco feature: activates when the conversation involves San Francisco.
  • Scientific terms: features activate for topics like immunology or chemical elements like lithium.
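
The post doesn't spell out how dictionary learning works, so here is a minimal sketch of the core idea: a sparse autoencoder trained to rewrite a model's internal activations as sparse combinations of many candidate "features." The class name, dimensions, and the l1_coeff penalty are illustrative assumptions, not Anthropic's actual architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy dictionary-learning model: rewrite an LLM's internal activations
    as sparse combinations of many candidate "features".
    Dimensions and names here are illustrative, not Anthropic's setup."""

    def __init__(self, d_model: int = 512, n_features: int = 4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, n_features)  # activations -> feature strengths
        self.decoder = nn.Linear(n_features, d_model)  # feature strengths -> reconstruction

    def forward(self, activations: torch.Tensor):
        # ReLU keeps feature activations non-negative; with the L1 penalty
        # below, only a few features fire for any given input.
        features = torch.relu(self.encoder(activations))
        reconstruction = self.decoder(features)
        return features, reconstruction

def sae_loss(activations, reconstruction, features, l1_coeff: float = 1e-3):
    mse = (reconstruction - activations).pow(2).mean()   # explain the original activations
    sparsity = features.abs().sum(dim=-1).mean()         # keep feature activations sparse
    return mse + l1_coeff * sparsity
```

The reconstruction term forces the dictionary to capture what the model actually computes, while the sparsity term pushes each concept onto a small number of features, which is what makes them human-inspectable.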

Model output changes when these features are activated

  • When these features are triggered, or deliberately amplified, the model's output changes accordingly (see the steering sketch below).
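
To make "output changes when a feature is activated" concrete, here is a rough sketch of feature steering using the toy autoencoder above: clamp one learned feature to a high value and decode back into activation space. The function name, feature index, and strength value are hypothetical illustrations, not values from Anthropic's research.

```python
import torch

def steer_with_feature(activations: torch.Tensor, sae: "SparseAutoencoder",
                       feature_idx: int, strength: float = 10.0) -> torch.Tensor:
    """Clamp one learned feature to a fixed strength and decode back to
    activation space. feature_idx and strength are hypothetical values."""
    features, _ = sae(activations)
    features = features.clone()
    features[..., feature_idx] = strength   # force the chosen feature "on"
    steered = sae.decoder(features)         # map back to the model's activation space
    # In a full pipeline, `steered` would replace the layer's activations
    # before the forward pass continues, shifting what the model writes about.
    return steered
```

Setting the strength to zero would suppress a feature instead of amplifying it; both directions change what the model goes on to generate.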

The work has really just begun. The features we found represent a small subset of all the concepts learned by the model during training.

This is the first step in understanding these models: tracing LLMs from training data to final output.

Anthony Batt

Digital Product Designer, Entrepreneur


I've been fascinated by the work of the Anthropic team, specifically their focus on introspection. I avoid using the term "Mechanistic Interpretability," as it tends to confuse people. Instead, I explain that the creators of LLMs largely don't understand how the neural networks function, but they do have some insights. They are developing tools to observe how an LLM connects information and generates a response, similar to an MRI machine for an LLM. While people often find this intriguing and ask further complex questions, I always attempt to provide simple answers. It's exciting to see Anthropic making progress in this area.
