Publisher: FAR.AI
Author: Neel Nanda 2025-06-24

Mechanistic Interpretability: A Whirlwind Tour

Neel Nanda presents a tour of mechanistic interpretability, arguing that machine learning models develop human-comprehensible algorithms even without explicit guidance. He explains how techniques like sparse autoencoders help uncover hidden model str...

Tags: mechanistic interpretability

Publisher: Lex Fridman
Author: Chris Olah 2025-06-24

Mechanistic Interpretability explained

In this discussion, Chris Olah explains mechanistic interpretability, a field focused on understanding the algorithms inside neural networks, which are “grown” rather than programmed directly. He walks through how features and circuits emerg...

Tags: artificial intelligence, mechanistic interpretability

Publisher: Welch Labs
Author: No Author 2025-06-05

The Dark Matter of AI [Mechanistic Interpretability]

This video explores how researchers use mechanistic interpretability—especially sparse autoencoders—to uncover hidden, human‐understandable features in large language models. It highlights the challenges of pinning down internal model behavio...

Tags: mechanistic interpretability
