Subscribe for free access to Data Points!
Anthropic researchers used new interpretability techniques modeled on laboratory biology to examine how Claude processes information internally. By conducting experiments modifying Claude’s internal states, the team discovered that Claude plans ahead when writing poetry, uses parallel processing paths for mental math, and operates in a shared conceptual space across different languages. Although these methods only capture part of the total computations happening inside LLMs, these findings could help researchers better understand how AI systems work and could lead to more reliable and transparent models. (Anthropic)