DeepMind CEO Demis Hassabis Sets Einstein Test as New AGI Benchmark
The Einstein test is a concrete illustration of Hassabis’s view that true AGI must be able to create new scientific paradigms, not merely solve problems within existing frameworks. According to reports, the test would involve training a large language model on all human knowledge up to a chosen cutoff—1901 or 1911—and then prompting it to produce the equations and conceptual leap that led to general relativity in 1915. The system would have to generate the insight independently, rather than retrieve or re‑explain Einstein’s published work.
Hassabis has repeatedly said that DeepMind’s most celebrated achievements do not meet this bar. The company’s AlphaFold model, whose protein‑structure predictions earned Hassabis and his colleague John Jumper a share of the 2024 Nobel Prize in Chemistry, operates within a well‑defined problem space with known rules. In the same vein, solving Erdős problems, which are hard open questions in mathematics, does not demonstrate the ability to invent a new paradigm. The distinction, Hassabis says, is between applying existing knowledge to difficult tasks and generating novel scientific concepts.
In early 2025 Hassabis estimated that AGI would be “probably three to five years away.” By 2026 he revised that estimate to around 2030, give or take one year. The revised timeline is based on the difficulty of the Einstein test and on the current pace of progress in foundational AI research. The CEO has also warned that the “jagged intelligence” of today’s models must be smoothed before AGI arrives.
The Einstein test has implications beyond DeepMind. Other AI labs use more permissive definitions of AGI. OpenAI, for example, has historically tied the term to economic output, describing AGI in its charter as highly autonomous systems that outperform humans at most economically valuable work. That definition sets a far lower bar than the creative leap required by Hassabis’s benchmark. Anthropic, Meta, and other competitors have not publicly committed to the Einstein test.
DeepMind’s history of breakthroughs provides context for the new benchmark. The lab was founded in 2010, acquired by Google in 2014, and merged with Google Brain in 2023 to become Google DeepMind. Its early successes include AlphaGo, which defeated world champion Lee Sedol in 2016, and AlphaZero, which mastered chess, shogi, and Go through self‑play. More recent systems include AlphaFold, which delivered state‑of‑the‑art protein‑structure predictions, and AlphaTensor, which discovered new matrix‑multiplication algorithms.
The Einstein test also highlights the broader debate over AGI’s definition. Some researchers argue that the ability to solve hard problems within known domains is sufficient evidence of general intelligence, while others, like Hassabis, insist that the capacity to invent new scientific theories is the true test. Restricting the training data to knowledge from before the chosen cutoff (1901 or 1911) ensures that the AI cannot simply copy existing explanations; it must generate the conceptual framework that Einstein developed.
No AI system to date has passed the Einstein test. According to public statements, current large language models can regurgitate Einstein’s equations or explain them after being prompted with the relevant text, but they cannot produce the original derivation from first principles using only early‑20th‑century data. The test therefore remains an aspirational goal.
Looking ahead, DeepMind continues to invest in foundational research and safety studies. The company’s 145‑page safety paper, released in 2025, outlines potential risks and mitigation strategies for AGI. While the Einstein test sets a high bar, it also provides a clear, measurable target for the AI community.
In summary, Demis Hassabis has defined a new, stringent benchmark for AGI that requires an AI to independently derive general relativity from knowledge available before a 1901 or 1911 cutoff. No existing model satisfies this criterion, and Hassabis’s own timeline places the arrival of true AGI around 2030. The Einstein test has sparked discussion about how to measure general intelligence and may shape future research priorities across the industry.