Topic Page · MAIA Fellowship

MIT AI Safety Fundamental Notes

My notes and takeaways from the MAIA Fellowship, covering recent papers on AI alignment, security, and evaluations.

Note · MAIA Fellowship · Jun 2026

AI Governance and Liability

Tort law, compute governance, export controls, China, institutional accountability, and the regulatory toolbox for governing frontier AI.

Note · MAIA Fellowship · Jun 2026

Control and Scalable Oversight

AI control, resampling, monitoring, weak-to-strong generalization, debate, and oversight strategies for systems humans cannot fully inspect unaided.

Note · MAIA Fellowship · Jun 2026

Inner Alignment

Deception, reward tampering, mesa-optimization, goal misgeneralization, and why learned objectives may diverge from training objectives.

Note · MAIA Fellowship · Jun 2026

Interpretability and Evals

Attribution graphs, linear probes, capability evaluations, propensity evaluations, and alignment auditing as tools for making model behavior legible.

Note · MAIA Fellowship · Jun 2026

Outer Alignment

Reward misspecification, specification gaming, RLHF, and the gap between intended objectives and operationalized training signals.

Note · MAIA Fellowship · Jun 2026

Threat Models

Instrumental convergence, power-seeking, bioterrorism, cyberwarfare, and gradual disempowerment as different ways AI systems could create risk.

Note · MAIA Fellowship · Jun 2026

Transformative AI and Current Trajectory

Scaling drivers, capability trends, and time-horizon forecasts as evidence on whether AGI-like systems may arrive sooner than institutions expect.
