Projects

A collection of research projects, tools, and experiments in AI safety, interpretability, and machine learning.

Towards Disentangling Latent Content and Behavioral Inhibition in Taboo Language Models

Project log for my MATS 9.0 application work, which probes how latent content and behavioral inhibition interact inside Taboo-fine-tuned Gemma models.

Tech Stack