AI Reliability Startup Probably Secures $9 Million Seed Round to Tackle LLM Hallucinations
Probably, founded by Peter Elias, aims to bring to generative AI the level of accuracy that deterministic software systems routinely achieve, around 99.99%. The company’s first product is a data‑science tool that extracts answers from complex datasets. Each answer comes with a citation and a full audit trail recording the steps taken to reach the result.
The core technical contribution is what Elias calls a “validator harness.” In this architecture, the LLM’s initial output is immediately checked against a deterministic system that rejects any result that does not match the underlying data. The model is then trained in tandem with this validator so that it learns to produce outputs that are both fast and consistent with the dataset.
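To make the idea concrete, here is a minimal sketch of a validator loop in Python. It is an illustration only, not Probably's implementation: the dataset `ROWS`, the `Proposal` and `Answer` types, the `stub_model` function, and the retry limit are all hypothetical, and the sketch covers only the rejection step, not the joint training of model and validator described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical toy dataset standing in for the "underlying data".
ROWS = [
    {"region": "north", "revenue": 120},
    {"region": "south", "revenue": 80},
    {"region": "north", "revenue": 50},
]


@dataclass
class Proposal:
    """What the (assumed) model returns: the filter it used and the number it claims."""
    region: str
    claimed_total: int


@dataclass
class Answer:
    total: int
    citation: List[int]      # indices of the rows that support the answer
    audit_trail: List[str]   # one entry per attempt, accepted or rejected


def validate(proposal: Proposal) -> Optional[List[int]]:
    """Re-run the proposed filter deterministically.

    Returns the supporting row indices if the claimed total matches the data,
    otherwise None.
    """
    cited = [i for i, row in enumerate(ROWS) if row["region"] == proposal.region]
    actual = sum(ROWS[i]["revenue"] for i in cited)
    if cited and actual == proposal.claimed_total:
        return cited
    return None


def answer_with_harness(
    model: Callable[[str, int], Proposal],
    question: str,
    max_attempts: int = 3,
) -> Optional[Answer]:
    """Ask the model, reject any proposal the validator cannot confirm, and retry."""
    trail: List[str] = []
    for attempt in range(1, max_attempts + 1):
        proposal = model(question, attempt)
        cited = validate(proposal)
        if cited is None:
            trail.append(f"attempt {attempt}: rejected claimed_total={proposal.claimed_total}")
            continue
        trail.append(f"attempt {attempt}: accepted claimed_total={proposal.claimed_total}")
        return Answer(proposal.claimed_total, cited, trail)
    return None  # nothing verifiable was produced, so no answer is returned


def stub_model(question: str, attempt: int) -> Proposal:
    """Stand-in for an LLM: wrong on the first try, correct afterwards."""
    return Proposal("north", 999 if attempt == 1 else 170)
```

In this sketch, a wrong first proposal is rejected and logged, the second is accepted with its supporting rows, and a model that never matches the data yields no answer at all rather than an unverified one.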
Because the harness narrows the range of acceptable answers so precisely, the system can rely on LLMs that are significantly smaller than the most advanced models available today. The company estimates that it can operate with models that are four capability classes below the leading offerings. Smaller models can run on local hardware rather than on data‑center infrastructure, which reduces token‑processing costs and improves latency.
Elias has argued that major AI laboratories have limited incentive to address hallucinations at this level because, in his view, their revenue often depends on the number of corrections and retries that a model requires. Probably, he says, reverses that logic: it targets work where a hallucination is costly for the user, which creates a stronger business case for precision.
While the current product targets data‑science workflows, the same engine could be adapted to other precision‑sensitive domains. Elias has mentioned accounting, medical services, and any field where factual accuracy is critical.
Andreessen Horowitz’s participation in the round follows a broader trend in 2026 of venture capital focusing on startups that can turn model capability into defensible distribution or compute advantage. The firm, which has a $90 billion asset base, has been allocating significant capital to AI startups that offer tangible reliability improvements.
The validator harness concept aligns with ongoing research into LLM evaluation frameworks. Open‑source projects such as EleutherAI’s LM Evaluation Harness provide a framework for benchmarking model performance offline, but they do not offer the real‑time, per‑answer validation that Probably says its system delivers.
In addition to the technical benefits, the deterministic nature of the validator harness offers security advantages. Deterministic systems can be audited and reproduced, providing a chain of trust that is harder to tamper with than opaque neural‑network outputs.
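One common way to make an audit trail tamper-evident is to chain each recorded step to the previous one with a cryptographic hash. The Python sketch below illustrates that general technique; it is an assumption about how such a trail could be built, not a description of Probably's system, and the function names and step strings are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # fixed starting value for the chain


def _digest(prev: str, step: str) -> str:
    """SHA-256 over the previous digest and the current step, serialized deterministically."""
    payload = json.dumps({"prev": prev, "step": step}, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_log(steps: List[str]) -> List[Dict[str, str]]:
    """Record each step together with a digest that also covers everything before it."""
    log = []
    prev = GENESIS
    for step in steps:
        prev = _digest(prev, step)
        log.append({"step": step, "digest": prev})
    return log


def verify_log(log: List[Dict[str, str]]) -> bool:
    """Recompute the chain; any edited, removed, or reordered step breaks it."""
    prev = GENESIS
    for entry in log:
        if _digest(prev, entry["step"]) != entry["digest"]:
            return False
        prev = entry["digest"]
    return True
```

Because every function here is deterministic, rebuilding the log from the same steps gives identical digests, which is what allows a third party to reproduce and check a recorded trail.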
The seed round will support the development of the data‑science tool, expansion of the validator harness to other domains, and the hiring of additional engineers and data scientists. The company has not yet announced a public beta or pricing model.
As the AI industry continues to grapple with hallucinations, Probably’s approach represents a concrete attempt to embed reliability into the core of LLM deployment. Whether the validator harness can achieve the promised accuracy and cost benefits remains to be seen, but the investment signals growing confidence that reliability can be engineered into generative AI.
The next milestones for Probably include a public demonstration of the data‑science tool, a roadmap for local hardware deployment, and potential partnerships with enterprises that require audit‑ready AI outputs.
In summary, Probably’s seed funding from Andreessen Horowitz underscores a shift toward reliability‑focused AI solutions. The company’s validator harness offers a deterministic check that could enable smaller, cheaper, and more accurate LLMs, potentially transforming how businesses use generative AI in high‑stakes contexts.