  • Mechanistic Interpretability: A Whirlwind Tour
    Publisher: FAR.AI
    2025-06-24

    Neel Nanda presents a tour of mechanistic interpretability, arguing that machine learning models develop human-comprehensible algorithms even without explicit guidance. He explains how techniques like sparse autoencoders help uncover hidden model str...

    Tags: mechanistic interpretability
  • Mechanistic Interpretability explained
    Publisher: Lex Fridman
    2025-06-24

    In this discussion, Chris Olah explains mechanistic interpretability, a field focused on understanding the algorithms inside neural networks, which are “grown” rather than programmed directly. He walks through how features and circuits emerg...

    Tags: artificial intelligence, mechanistic interpretability
  • The Dark Matter of AI [Mechanistic Interpretability]
    Publisher: Welch Labs
    2025-06-05

    This video explores how researchers use mechanistic interpretability—especially sparse autoencoders—to uncover hidden, human‐understandable features in large language models. It highlights the challenges of pinning down internal model behavio...

    Tags: mechanistic interpretability