DeepMind CEO Demis Hassabis Sets Einstein Test as New AGI Benchmark

June 14, 2026 By Blab.com AI Team

In a February 2026 panel discussion, DeepMind chief executive Demis Hassabis announced a new benchmark for artificial general intelligence (AGI) that he calls the “Einstein test.” The test asks an AI system to derive Einstein’s general theory of relativity using only the scientific knowledge available before 1911, a task that no current model can complete.

The Einstein test is a concrete illustration of Hassabis’s view that true AGI must be able to create new scientific paradigms, not merely solve problems within existing frameworks. According to reports, the test would involve training a large language model on all human knowledge up to a chosen cutoff—1901 or 1911—and then prompting it to produce the equations and conceptual leap that led to general relativity in 1915. The system would have to generate the insight independently, rather than retrieve or re‑explain Einstein’s published work.
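The knowledge cutoff described above can be pictured as a simple date filter over a training corpus. The sketch below is a minimal illustration, not DeepMind's method: the `Document` type, its `published` field, and the two sample entries are hypothetical, and it assumes every text carries a reliable publication date, which real corpora often lack.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Document:
    # Hypothetical record type: a title plus an assumed publication date.
    title: str
    published: date


def filter_by_cutoff(corpus: list[Document], cutoff: date) -> list[Document]:
    """Keep only documents published strictly before the cutoff date."""
    return [doc for doc in corpus if doc.published < cutoff]


# Illustrative placeholder entries, not real bibliographic data.
corpus = [
    Document("pre-cutoff physics paper", date(1905, 1, 1)),
    Document("post-cutoff physics paper", date(1916, 1, 1)),
]

# Apply the 1911 cutoff mentioned in the article.
training_set = filter_by_cutoff(corpus, cutoff=date(1911, 1, 1))
print([doc.title for doc in training_set])
```

Anything dated on or after the cutoff is excluded, so a model trained on `training_set` would never see the later paper. In practice the harder problem is leakage: later texts that quote or summarize post-cutoff ideas under an earlier date.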

Hassabis has repeatedly said that DeepMind’s most celebrated achievements do not meet this bar. The company’s AlphaFold model, which earned Hassabis and his colleague John Jumper a share of the 2024 Nobel Prize in Chemistry for protein structure prediction, operates within a well‑defined problem space with known rules. In the same vein, solving the Erdős problems, a collection of hard open questions in mathematics, does not demonstrate the ability to invent a new paradigm. The distinction, the CEO says, is between applying existing knowledge to difficult tasks and generating novel scientific concepts.

In early 2025 Hassabis estimated that AGI would be “probably three to five years away.” By 2026 he revised that estimate to around 2030, give or take one year. The revised timeline is based on the difficulty of the Einstein test and on the current pace of progress in foundational AI research. The CEO has also warned that the “jagged intelligence” of today’s models must be smoothed before AGI arrives.

The Einstein test has implications beyond DeepMind. Other AI labs use more permissive definitions of AGI. OpenAI, for example, has historically tied the term to economic output, describing AGI as highly autonomous systems that outperform humans at most economically valuable work. That definition sets a far lower bar than the creative leap required by Hassabis’s benchmark. Anthropic, Meta, and other competitors have not publicly committed to the Einstein test.

DeepMind’s history of breakthroughs provides context for the new benchmark. The lab was founded in 2010, acquired by Google in 2014, and merged with Google Brain in 2023 to become Google DeepMind. Its early successes include AlphaGo, which defeated world champion Lee Sedol in 2016, and AlphaZero, which mastered chess, shogi, and Go through self‑play. More recent achievements include AlphaFold, which achieved state‑of‑the‑art protein‑folding predictions, and AlphaTensor, a system that discovered new matrix‑multiplication algorithms.

The Einstein test also highlights the broader debate over AGI’s definition. Some researchers argue that the ability to solve hard problems within known domains is sufficient evidence of general intelligence, while others, like Hassabis, insist that the capacity to invent new scientific theories is the true test. The test’s focus on pre‑1911 knowledge ensures that the AI cannot simply copy existing explanations; it must generate the conceptual framework that Einstein used.

No AI system to date has passed the Einstein test. According to public statements, current large language models can regurgitate Einstein’s equations or explain them after being prompted with the relevant text, but they cannot produce the original derivation from first principles using only early‑20th‑century data. The test therefore remains an aspirational goal.

Looking ahead, DeepMind continues to invest in foundational research and safety studies. The company’s 145‑page safety paper, released in 2025, outlines potential risks and mitigation strategies for AGI. While the Einstein test sets a high bar, it also provides a clear, measurable target for the AI community.

In summary, Demis Hassabis has defined a new, stringent benchmark for AGI that requires an AI to independently derive general relativity from pre‑1911 knowledge. No existing model satisfies this criterion, and Hassabis’s own estimate places the arrival of true AGI around 2030. The Einstein test has sparked discussion about how to measure general intelligence and may shape future research priorities across the industry.
