Commentary

The Future of AI in Science in 2026: Predictions from top researchers and CEOs at Google, OpenAI, Anthropic & More

April 21, 2026

Six of the most influential people in AI are looking at the same technology and reaching very different conclusions. Some see a tool. Some see a collaborator. Some see a risk.

Let's unpack.

The headline prediction: AI starts making small discoveries

Sam Altman says 2026 will likely bring systems able to "figure out novel insights." OpenAI says AI may make "very small discoveries" in 2026 and more significant ones from 2028 onward (OpenAI 2025).
Dario Amodei says powerful AI could compress "50 to 100 years of biological progress" into "5 to 10 years" (Amodei 2024).
Demis Hassabis expects a "new golden era of discovery" within 10 to 15 years.
Yoshua Bengio wants a safer "Scientist AI" that explains the world without acting as an autonomous agent (Bengio et al. 2025).
Yann LeCun and Fei-Fei Li argue that language models alone are not enough; AI will need world models that understand physical reality (LeCun 2022).

In The Gentle Singularity, Altman wrote that 2025 brought agents that can do real cognitive work, that 2026 will likely bring systems able to "figure out novel insights," and that 2027 may bring robots that can act in the physical world. A model that summarizes known literature is useful. A model that finds a new proof step, proposes a mechanism, or suggests a wet-lab experiment that works is something different. It starts to participate in knowledge production.

OpenAI's GPT-5 science case studies report work across mathematics, physics, biology, computer science, astronomy, and materials science. GPT-5 helped researchers synthesize known results, search literature, accelerate computations, and generate new proof ideas. But OpenAI also says the strongest results came from human-AI teams, with scientists defining the questions, choosing methods, critiquing ideas, and validating results (OpenAI 2025).

OpenAI: from assistant to scientific collaborator

OpenAI says its OpenAI for Science effort aims to help researchers "explore more ideas, test hypotheses faster, and uncover insights" by combining frontier models with tools, workflows, and collaborations. The company is clear that specialized scientific tools remain essential: simulation engines, protein databases, computer algebra systems, and domain-specific infrastructure still matter.

In biology, OpenAI reports that GPT-5, working from an unpublished chart, identified a likely mechanism behind a puzzling immune-cell result and suggested an experiment that later supported the mechanism. In mathematics, it helped researchers complete a proof related to a problem originally proposed by Paul Erdős. In algorithms and optimization, it found a clear example showing that a common decision-making method can fail (OpenAI 2025).

OpenAI says GPT-5 is not autonomous, can hallucinate citations, mechanisms, or proofs, and can miss domain-specific subtleties. For scientists, that is the central constraint, not a footnote.

In 2026, OpenAI is also moving from general models toward domain-specific scientific systems. GPT-Rosalind, introduced in April 2026, is a life-sciences model built for biology, drug discovery, and translational medicine, with improved tool use across chemistry, protein engineering, and genomics (OpenAI 2025).

Anthropic: biology could move 10 times faster, but the lab still matters

In Machines of Loving Grace, Amodei argues that powerful AI could accelerate biological discovery at least 10-fold, producing "50 to 100 years of biological progress" in "5 to 10 years" (Amodei 2024). He also adds a constraint: biology has irreducible delays. Experiments, animal studies, hardware, and facilities cannot always be compressed into software time.

Amodei is not saying intelligence is magic. He is saying that many bottlenecks in biology are cognitive: reading literature, forming hypotheses, designing experiments, interpreting data, debugging protocols, and coordinating complex work. AI can speed those up. But wet-lab cycles, clinical trials, manufacturing, and regulation still impose real latency.

Anthropic connects Claude to scientific platforms including ClinicalTrials.gov, ToolUniverse, bioRxiv, medRxiv, Open Targets, ChEMBL, Benchling, 10x Genomics, PubMed, and Wiley Scholar Gateway, and adds skills for scientific problem selection, bioinformatics, instrument-data conversion, and clinical-trial protocol drafting (Anthropic 2026).

In The Adolescence of Technology, Amodei describes powerful AI as potentially arriving within one to two years and uses the metaphor of a "country of geniuses in a datacenter." He is especially concerned about biology, because advanced systems could lower barriers to dangerous biological work if safeguards fail.

Google DeepMind: science is the test of real intelligence

TIME reported that Hassabis expects AGI within five to ten years, but his definition is not only economic. For Hassabis, a true AGI would not merely automate existing tasks. It would help generate new explanations for the universe. His test is scientific: could a system discover something like general relativity from the information Einstein had, or formulate an entirely new mathematical hypothesis?

AlphaFold is still the strongest public example of AI changing a scientific field. Google DeepMind says AlphaFold solved the protein structure prediction problem, was recognized with the 2024 Nobel Prize in Chemistry, and has been used by more than three million researchers in over 190 countries (Google 2025). AlphaFold matters because it was not a demo. It became scientific infrastructure.

Google DeepMind now wants to generalize that pattern. Its AI co-scientist, built on Gemini 2.0, is designed to help scientists generate novel hypotheses and research proposals, going beyond literature review to produce original, testable hypotheses tailored to a research goal (Google Research 2025).

Google reports that the AI co-scientist proposed repurposing candidates for acute myeloid leukemia validated in vitro, suggested targets for liver fibrosis, and independently proposed a mechanism related to antimicrobial resistance that matched unpublished experimental findings. Google's framing remains careful: the AI co-scientist is an assistive system, not a replacement for researchers (Google Research 2025).

In 2026, DeepMind also described Gemini Deep Think as moving from Olympiad-style problems toward professional research problems in mathematics, physics, computer science, and engineering, under expert direction.

Bengio's alternative: non-agentic Scientist AI

Bengio and colleagues have proposed "Scientist AI": a system designed to understand and explain the world from observations rather than act autonomously in pursuit of goals. The key difference is agency. Instead of building systems that plan, act, and pursue objectives, Bengio's model emphasizes world models, question-answering, explanation, uncertainty, and oversight (Bengio et al. 2025).

Many AI-scientist projects aim to close the research loop: read, hypothesize, design, act, test, revise, repeat. Bengio's concern is that the same autonomy that makes such systems powerful also makes them dangerous. A non-agentic Scientist AI could still help researchers generate hypotheses, evaluate evidence, model uncertainty, and design experiments, without giving the system broad freedom to act in the world.

That may be slower. It may also be safer.

For scientists, Bengio's view is a reminder that the word "scientist" should not be used casually. A system that produces plausible hypotheses is not the same as a system that deserves autonomy.

LeCun and Fei-Fei Li: language is not enough

LeCun has argued that current language models are useful but are "not the path to human-level intelligence." AI systems need a "model of the world" to reason, plan, and predict the consequences of actions (LeCun 2022).

This matters for science because science is not just language. It involves instruments, samples, molecules, organisms, physical systems, measurement error, causality, confounding, failed assays, and real-world constraints. A model can be fluent in scientific prose and still be weak at experimental judgment.

Fei-Fei Li makes a related argument from the perspective of spatial intelligence. Building spatially intelligent AI requires "something even more ambitious than LLMs": world models able to understand, reason, generate, and interact with physically and geometrically complex worlds. She connects this directly to accelerating discovery in materials science and medicine.

The race to build an AI scientist

Two ideas are worth separating. AI in science means AI tools used inside scientific work: literature search, code, statistics, molecular modeling, protocol drafting, data analysis, and hypothesis generation. An AI scientist is a stronger claim. It implies a system that can perform much of the research loop itself: identify a question, read the literature, generate hypotheses, design experiments, use tools, interpret results, update beliefs, and communicate findings.

FutureHouse describes its mission as automating scientific discovery. Its Robin system identified ripasudil, a glaucoma drug, as a potential therapeutic candidate for dry age-related macular degeneration. Human researchers ran the physical experiments, but FutureHouse says the hypotheses, analyses, and main figures were generated by Robin (FutureHouse 2025).

Sakana AI's AI Scientist automates idea generation, literature search, experiment planning, iteration, figure generation, manuscript writing, and reviewing in machine-learning research. Nature reported in early 2026 that the system had undergone peer review (Lu et al. 2026). Sakana lists failure modes including incorrect implementations, unfair comparisons, critical errors, and evaluation struggles.

Where the experts agree

All the major efforts are building for discovery, not just communication. The useful systems connect to databases, code, simulation, lab systems, protein models, chemistry tools, clinical-trial registries, instrument data, and domain-specific workflows.

Oversight is where they agree most clearly. OpenAI says GPT-5 is most useful in human-AI teams. Google describes its AI co-scientist as assistive. FutureHouse still relies on human researchers to run physical experiments. None of them claim the human is optional.

What nobody has figured out is measurement. OpenAI's FrontierScience benchmark attempts to measure expert-level scientific reasoning in physics, chemistry, and biology, but OpenAI itself notes that the benchmark remains upstream of real discovery (OpenAI 2025). A field without good benchmarks cannot tell progress from noise.

Where the experts disagree

Timelines are the most visible disagreement. OpenAI's public forecast is cautious: very small discoveries in 2026, more significant ones from 2028 onward. Altman's language is broader: novel insights in 2026. Amodei's biology forecast is more aggressive: 50 to 100 years of progress compressed into 5 to 10 years. Hassabis looks further out, expecting a new golden era of discovery within 10 to 15 years.

Autonomy is the deeper disagreement. Google, OpenAI, Anthropic, FutureHouse, and Sakana are all building more agentic systems. Bengio argues that science may be better served by systems that explain, predict, and quantify uncertainty without pursuing goals in the world.

Architecture is the third divide. OpenAI and Anthropic are betting on frontier models plus tools and workflows. Google DeepMind combines foundation models with specialized scientific systems, including AlphaFold and the AI co-scientist. LeCun and Fei-Fei Li argue the next step requires world models, not just larger language models.

Then there is trust. Some systems are already useful enough to change scientific workflows, but usefulness is not reliability. The 2026 International AI Safety Report notes that AI agents can do useful work, including research and software engineering, while remaining unreliable on complex, long-horizon tasks. Failures in high-stakes scientific settings are a qualitatively different problem from failures in consumer apps.

The evidence supports acceleration. It does not yet support the bolder claims.

What scientists should watch in 2026

The signals worth watching are not demos. They are AI-generated hypotheses tested prospectively in wet labs, mathematical results that survive expert review, and honest reporting of failures alongside successes. Disclosure of what the AI did versus what humans did will matter as much as the results. So will governance: biology, chemistry, and autonomous laboratories are the domains where agent failures carry the most serious consequences.

An AI model can generate more hypotheses than a lab can test. That is useful. It is also risky if validation does not improve alongside generation.

In 2026, the central bottleneck may shift from idea generation to triage.
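
To make the triage problem concrete, here is a minimal sketch, assuming each AI-generated hypothesis can be given a rough probability of being true, a relative payoff if confirmed, and a lab cost. The Hypothesis fields, the scoring rule, and the example numbers are all hypothetical illustrations; none of them comes from the systems or reports discussed in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A candidate hypothesis with illustrative, made-up attributes."""
    name: str
    prior: float   # rough estimated probability the hypothesis holds (0 to 1)
    payoff: float  # relative value if confirmed (arbitrary units)
    cost: float    # lab cost to test, e.g. bench-days (assumed positive)


def triage(hypotheses, budget):
    """Greedily pick hypotheses by expected value per unit cost within a budget.

    This is a simple heuristic, not an optimal solver: it ranks candidates by
    prior * payoff / cost and takes each one that still fits the budget.
    """
    ranked = sorted(
        hypotheses,
        key=lambda h: h.prior * h.payoff / h.cost,
        reverse=True,
    )
    selected = []
    remaining = budget
    for h in ranked:
        if h.cost <= remaining:
            selected.append(h)
            remaining -= h.cost
    return selected


# Hypothetical candidates: more ideas than the lab can afford to test.
candidates = [
    Hypothesis("A", prior=0.2, payoff=10.0, cost=5.0),
    Hypothesis("B", prior=0.5, payoff=4.0, cost=2.0),
    Hypothesis("C", prior=0.1, payoff=30.0, cost=10.0),
    Hypothesis("D", prior=0.6, payoff=1.0, cost=1.0),
]

chosen = triage(candidates, budget=8.0)
print([h.name for h in chosen])
```

With a budget of 8 bench-days, the sketch selects B, D, and A and leaves the expensive long shot C untested. The hard part in practice is the input: the ranking is only as good as the probability and payoff estimates, which is why validation has to improve alongside generation.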

Limitations

This article relies on public statements, company reports, interviews, and scientific announcements. Many sources come from organizations building the systems discussed, so their claims should be read as technically informative but institutionally interested.

Definitions also vary. "Discovery," "AI scientist," "agent," "AGI," "world model," and "scientific collaborator" mean different things across OpenAI, Google DeepMind, Anthropic, FutureHouse, Sakana AI, Bengio's work, and LeCun's research program.

Conclusion: AI is a scientific instrument, not yet a scientist

AI is not yet a scientist in the full sense, but it is already something more than a writing assistant.

The most credible near-term model is human-AI science. Scientists will still define the question, judge the method, understand the system, and validate the result. AI will expand the search space, speed up routine reasoning, generate candidate mechanisms, and connect literatures that no individual scientist could read exhaustively.

That is the clearest signal of 2026. Not that AI will replace scientists, but that science has become the proving ground.

AI disclosure

ChatGPT and Claude were used in drafting and editing this article.