Towards Disentangling Latent Content and Behavioral Inhibition in Taboo Language Models
In Progress
Project log for my MATS 9.0 application work, probing how latent content and behavioral inhibition interact inside Taboo-fine-tuned Gemma models.
Tech Stack
Python, PyTorch, TransformerLens, SAELens
mechanistic-interpretability mats-9 taboo-models gemma