I focused August on strengthening theory, reproducing a paper for my MATS application, starting OMSCS, and building community momentum through the ENAIS AI Safety Collab.
Check-in
In July, I decided I wanted to transition careers into AI safety research. Even though I’ve been working as a Data Scientist for the past two years, I lack the research skills and experience required. To close that gap, I designed a roadmap based on several popular resources: Neel Nanda’s intro to mechanistic interpretability blog posts and 80,000 Hours’ posts on AI safety careers (by the way, Neel just released a new version of his post!).
So, by the end of July, I had a plan and a clear road ahead of me.
Why I started with theory
I opened the month by going back to first principles. I read A Mathematical Framework for Transformer Circuits (Anthropic), leaned on Neel Nanda’s glossary, and watched talks from Andrej Karpathy, Neel Nanda, and Chris Olah on YouTube. August’s goal was to build a minimal mental model so I could read papers, join conversations, and decide whether mechanistic interpretability is the path I want to commit to. Spoiler: it still is.
ARENA progress
After a few days of reading and watching introductory talks, I shifted to the ARENA curriculum. Working through Chapter 0 and most of Chapter 1 gave me the first practical footing I needed: I learned how to rent GPUs without fear, rebuilt GPT‑2 from scratch, and started thinking like the interpretability researchers I have been following. That rhythm made it easier to say yes when Neel announced the next MATS stream; the application felt less like an abstract goal and more like a natural continuation of the exercises I was already doing.
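To give a flavour of what that practical footing looks like, here is a minimal sanity check of my own (an illustration, not an actual ARENA exercise), assuming `transformer_lens` is installed:

```python
# Minimal sketch: load GPT-2 with TransformerLens and confirm the model
# predicts a sensible next token for a simple factual prompt.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # the small 124M model

prompt = "The Eiffel Tower is located in the city of"
logits = model(prompt, return_type="logits")  # shape [batch, seq, d_vocab]
next_token = logits[0, -1].argmax()
print(model.to_string(next_token))  # expect something like " Paris"
```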
MATS 9.0 application sprint
Mid-month I scoped the actual MATS application work, captured in Towards Disentangling Latent Content and Behavioral Inhibition in Taboo Language Models. Neel’s application guide pointed me to a recent latent-knowledge paper that fine-tunes Gemma‑2 models on the Taboo game, and I poured roughly eighteen hours into reproducing it across a single week. The first runs were rough: logit-lens outputs refused to match the paper, SAE reconstructions drifted, and I kept finding pipeline differences that made their setup more disciplined than mine. After a few cycles of rereading and rerunning everything from scratch, the logit-lens reproduction finally aligned and the SAE latent activations started behaving.

With that foundation I re-created their token-forcing diagnostics, then pushed further by ablating the top latents and injecting both targeted and random noise. Watching the secret representation disappear while the inhibition behaviour lingered convinced me that content and behavioural restraint live in partially separable subspaces. TransformerLens, SAELens, and nnsight were my daily companions throughout, and more importantly the whole sprint proved that I can grind through a broken reproduction, hunt down the gap, and get back to green when it matters.
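If the logit lens is new to you, here is a minimal sketch of the idea using TransformerLens; I’m using GPT-2 as a stand-in model with generic hook names, not the paper’s exact pipeline or the fine-tuned Gemma‑2 checkpoints:

```python
# Simplified logit lens: project each layer's residual stream through the
# final LayerNorm and unembedding to see what the model "believes" at
# that depth. The real reproduction followed the paper's setup instead.
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2")  # stand-in model
tokens = model.to_tokens("The secret word is")
_, cache = model.run_with_cache(tokens)

for layer in range(model.cfg.n_layers):
    resid = cache["resid_post", layer][0, -1]  # last position, [d_model]
    logits = model.unembed(model.ln_final(resid[None, None]))
    top = logits[0, -1].argmax()
    print(layer, repr(model.to_string(top)))
```

The ablation experiments are conceptually the reverse: instead of reading the residual stream, you zero out (or add noise to) the top SAE latents at a hook point and watch how the behaviour changes downstream.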
OMSCS kickoff (Intro to Research + RL)
I also kicked off OMSCS at Georgia Tech, taking Introduction to Research and Reinforcement Learning for the fall. Introduction to Research surprised me in the best way: we read three to five papers a week and I steered that reading toward AI safety so that every assignment doubled as career prep. Nine papers in, I now have short syntheses for each and a sharper sense of the research questions I want to propose at the end of the term. Reinforcement Learning has been a return to fundamentals. We spent the opening weeks deep in Markov decision processes, and I worked through nearly all the lesson content and the first quiz early so I could focus on the project. That project requires me to implement multiple RL agents, compare them rigorously, and write the whole thing up as if it were going to the class conference. Having academic deadlines in parallel with the MATS application added pressure but also anchored my study schedule.
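As a flavour of those MDP fundamentals, here is a tiny value-iteration sketch over a made-up MDP; the states, dynamics, and rewards below are random placeholders of my own, not the course project:

```python
# Value iteration on a toy MDP: repeatedly apply the Bellman optimality
# backup until the state values converge, then read off a greedy policy.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] -> next-state dist
R = rng.normal(size=(n_states, n_actions))                        # expected reward R[s, a]

V = np.zeros(n_states)
for _ in range(200):
    Q = R + gamma * P @ V      # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
    V_new = Q.max(axis=1)      # act greedily at every state
    if np.abs(V_new - V).max() < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=1)      # greedy policy w.r.t. the converged values
print(V, policy)
```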
Community: AI Safety Collab (ENAIS, Europe)
Finally, I joined the AI Safety Collab facilitated by ENAIS in Europe. The program runs weekly sessions for eight weeks, following the ATLAS curriculum, and it feels like a lightweight accountability group for people trying to take alignment seriously. Listening to others share their motivations and worries has a grounding effect, and spending that time away from code or papers helps me keep perspective on why this field matters in the first place.
Roundup
What made August hard was the constant context switching. Reproductions failed late at night, OMSCS deadlines stacked up, and my regular life did not pause to make room for any of it. The saving grace was sticking to small loops: read the relevant code instead of guessing, run an experiment, visualize the outcome, then decide what to tweak. When I got stuck, I used AI tools as sounding boards for my own mental models, and I protected a nightly block of time for a single measurable outcome so I would at least end the day with something finished. By the end of the month the scoreboard read: ARENA prerequisites plus most of Chapter 1, a complete logit-lens and SAE reproduction with token forcing and ablations for the MATS application, and nine research papers summarized for OMSCS.
September is about turning that momentum into something more structured. I am waiting for the MATS exploration phase announcement but still pushing the Taboo experiments forward so I have fresh results either way. The reinforcement learning project needs clean baselines and a tighter experiment grid, and Introduction to Research now demands a concrete proposal backed by steady weekly write-ups. I also want to publish one deeper technical post so the monthly cadence stays alive. If you are in a similar transition—learning fast, breaking things, fixing them—I hope this recap helps you keep going too.