Aspiring AI Interpretability Researcher
Luis Miguel Montoya
Understand to Align
Exploring the journey from physics to AI interpretability research.
Journey Sep 2025
August Recap 2025
ARENA, a MATS mini‑sprint, the first weeks of OMSCS, and what I’m doing next.
Project Sep 2025
Towards Disentangling Latent Content and Behavioral Inhibition in Taboo Language Models
Project log for my MATS 9.0 application work probing how latent content and behavioral inhibition interact inside Taboo-fine-tuned Gemma models.
Literature Sep 2025
A Mathematical Framework for Transformer Circuits
Anthropic’s foundational write-up that formalizes transformers as compositions of interpretable circuits and shows how to reason about them in small attention-only models.
Paper