News from Mars

May 22, 2025

ARES: Open-Source Infrastructure for Online RL on Coding Agents

We're releasing ARES, our internal framework for training coding agents with online reinforcement learning. ARES enables real-time feedback loops that dramatically outperform static supervised approaches, and it scales with compute.

8 min read

Apr 14, 2025

Mechanistic Interpretability

K-Steering: Targeted Representation Intervention via Activation Space

How do you change what a model does without breaking everything else? K-Steering offers a surgical approach to representation-level intervention — identifying and modifying key feature directions to achieve precise behavioral control without catastrophic forgetting.

12 min read

Mar 3, 2025

Interpretability

Beyond Static Mechanistic Interpretability: Agentic Long-Horizon Tasks as the Next Frontier

Static circuit analysis reveals important structures in neural networks. But understanding deployed AI agents requires thinking about long-horizon, agentic behavior. We argue that the next phase of mechanistic interpretability must go dynamic.

10 min read

Feb 19, 2025

Evaluation

Code Review Bench v0: A Rigorous Benchmark for LLM Code Review

We introduce Code Review Bench, a benchmark for evaluating how well large language models perform code review. We measure correctness, depth of analysis, actionability of feedback, and alignment with expert human reviewers.

7 min read

Jan 30, 2025

Community

The Interpretability Prize: Part II — What We Learned from 500+ Submissions

Our second interpretability challenge attracted over 500 submissions from 47 countries. This post reviews the winning approaches, what they reveal about model internals, and what surprised us most about the field's current state.

6 min read

Jan 8, 2025

Perspective

Why the End of Science Would Begin With AI We Don't Understand

Nobel Prizes are now being awarded for black-box models predicting protein structures. This is remarkable, and terrifying. If neural networks become the future of science while remaining opaque, we will have traded explanation for prediction.

9 min read

Dec 12, 2024

Research

Feature Geometry in Large Language Models: What the Topology of Representations Tells Us

We examine the geometric structure of learned representations in frontier LLMs. Using tools from algebraic topology and information geometry, we identify persistent patterns that constrain how meaning is encoded across model scale.

14 min read

Nov 28, 2024

Community

Launching the Interpretability Prize: A Call for the Field

We're putting up $50,000 for the best mechanistic interpretability result of 2024. This post explains our motivation, the prize criteria, and why we think open challenges are essential for accelerating progress in this space.

5 min read