AI Reliability Startup Probably Secures $9 Million Seed Round to Tackle LLM Hallucinations

June 23, 2026 By Blab.com AI Team

Probably, a new AI reliability company, announced a $9 million seed investment from venture firm Andreessen Horowitz on June 23, 2026. The funding is aimed at catching the hallucinations and factual errors that large language models (LLMs) produce before those errors reach end users.

Probably, founded by Peter Elias, aims to bring to generative AI the level of accuracy that deterministic software systems routinely achieve, which the company puts at around 99.99%. Its first product is a data‑science tool that extracts answers from complex datasets. Each answer is accompanied by a citation and a full audit trail that records the steps taken to reach the result.

The core technical contribution is what Elias calls a “validator harness.” In this architecture, the LLM’s initial output is immediately checked by a deterministic system that rejects any result that does not match the underlying data. The model is then trained in tandem with this validator so that it learns to produce answers quickly while staying consistent with the dataset.
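Probably has not published implementation details, so the following is only a minimal sketch of the general propose-then-validate pattern described above, not the company's actual design. The dataset, the `fake_model` stand-in for an LLM, the `Answer` type, and the retry limit are all hypothetical, and the sketch leaves out the joint training step entirely.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical dataset: the ground truth that the validator checks against.
SALES = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 30},
]


@dataclass
class Answer:
    value: int
    citation: str  # describes how the value can be derived from the data


def deterministic_total(region: str) -> int:
    """Recompute the answer directly from the data, with no model involved."""
    return sum(row["revenue"] for row in SALES if row["region"] == region)


def validate(region: str, candidate: int) -> bool:
    """Accept a candidate only if it exactly matches the recomputed value."""
    return candidate == deterministic_total(region)


def answer_with_harness(
    region: str,
    propose: Callable[[str, int], int],
    max_attempts: int = 3,
) -> Optional[Answer]:
    """Ask the model for a candidate and release it only if validation passes."""
    for attempt in range(max_attempts):
        candidate = propose(region, attempt)
        if validate(region, candidate):
            return Answer(candidate, f"sum(revenue) where region == {region!r}")
    return None  # nothing validated, so nothing reaches the user


def fake_model(region: str, attempt: int) -> int:
    """Stand-in for an LLM: hallucinates on the first try, then answers 150."""
    return 999 if attempt == 0 else 150


result = answer_with_harness("north", fake_model)
```

In this toy setup the first proposal is rejected and the second is accepted, so the caller only ever sees a value that agrees with the data. A real harness would face the harder problem of building a deterministic check for questions that cannot be recomputed this directly.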

Because the harness narrows the range of acceptable answers so precisely, the company says the system can rely on LLMs that are significantly smaller than the most advanced models available today. It estimates that it can operate with models that are four capability classes below the leading offerings. Smaller models can run on local hardware rather than on data‑center infrastructure, which reduces token‑processing costs and improves latency.

Elias argues that major AI laboratories have limited incentive to address hallucinations at this level because, in his account, their revenue often depends on the number of corrections and retries that a model requires. Probably’s architecture, he says, reverses that logic: the system’s design makes hallucinations costly for the user, creating a stronger business case for precision.

While the current product targets data‑science workflows, the same engine could be adapted to other precision‑sensitive domains. Elias has mentioned accounting, medical services, and any field where factual accuracy is critical.

Andreessen Horowitz’s participation follows a broader trend in 2026 of venture capital focusing on startups that can turn model capability into defensible distribution or compute advantage. The firm, which has a $90 billion asset base, has been allocating significant capital to AI startups that offer tangible reliability improvements.

The validator harness concept aligns with ongoing research into LLM evaluation frameworks. Open‑source projects such as EleutherAI’s LM Evaluation Harness provide a way to run benchmarks of model performance offline, but they do not offer the real‑time validation of individual answers that Probably says its system delivers.

In addition to the technical benefits, the deterministic nature of the validator harness offers security advantages. Deterministic systems can be audited and reproduced, providing a chain of trust that is harder to tamper with than opaque neural‑network outputs.
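One common way to make a record of steps auditable and tamper-evident is a hash-chained log, in which each entry commits to the one before it. The sketch below illustrates that general technique using Python's standard `hashlib` and `json` modules. It is an assumption for illustration only: the article does not say how Probably's audit trail works, and the function names and step strings are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def entry_hash(step: str, prev_hash: str) -> str:
    """Hash a step together with the previous entry's hash."""
    payload = json.dumps({"step": step, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: list, step: str) -> None:
    """Append a step, linking it to the hash of the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"step": step, "prev": prev_hash, "hash": entry_hash(step, prev_hash)})


def verify_log(log: list) -> bool:
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["step"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


audit_log: list = []
append_entry(audit_log, "load table: sales")
append_entry(audit_log, "filter: region == 'north'")
append_entry(audit_log, "aggregate: sum(revenue)")
intact = verify_log(audit_log)

# Rewriting a recorded step after the fact is detected on re-verification.
audit_log[1]["step"] = "filter: region == 'south'"
tampered_ok = verify_log(audit_log)
```

Because the same steps always produce the same hashes, anyone holding the log can reproduce the check independently, which is the property the paragraph above attributes to deterministic systems.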

The seed round will support the development of the data‑science tool, expansion of the validator harness to other domains, and the hiring of additional engineers and data scientists. The company has not yet announced a public beta or pricing model.

As the AI industry continues to grapple with hallucinations, Probably’s approach represents a concrete attempt to embed reliability into the core of LLM deployment. Whether the validator harness can achieve the promised accuracy and cost benefits remains to be seen, but the investment signals growing confidence that reliability can be engineered into generative AI.

The next milestones for Probably include a public demonstration of the data‑science tool, a roadmap for local hardware deployment, and potential partnerships with enterprises that require audit‑ready AI outputs.

In summary, Probably’s seed funding from Andreessen Horowitz underscores a shift toward reliability‑focused AI solutions. The company’s validator harness offers a deterministic check that could enable smaller, cheaper, and more accurate LLMs, potentially transforming how businesses use generative AI in high‑stakes contexts.
