AI Hallucinations in Investing: 3 Examples That Cost Investors Money

published on 09 March 2026

Why verification is crucial for AI-powered investing

AI is the most powerful capability to ever hit investing. It is also the most confident liar.

At Davos 2026, BlackRock CEO Larry Fink told Microsoft CEO Satya Nadella:

"At our firm, things that would take 12 hours of compute now take minutes. For us processing $14 trillion of other people's money with hundreds of thousands of different mandates, we can do that instantaneously ... If it wasn't for the technology and AI today, we would not be able to function to the scale that we are operating."

Larry Fink, CEO of BlackRock, at Davos 2026

Here is the part nobody wants to talk about: The AI powering such breakthroughs also makes things up. Not occasionally. Routinely. And it does so with complete confidence.

OpenAI's own data proves it. In April 2025, their latest o3 and o4-mini models hallucinated 33% and 48% of the time on factual queries: More than double the rates of their earlier models. OpenAI's explanation? Per their technical report: "more research is needed."

UBS Asset Management, which oversees $1.8 trillion, is blunt about it: Investing demands predictive accuracy and robustness that generative AI does not deliver.

The firms managing the largest portfolios on the planet are telling you: Generative AI is not ready to be trusted with your investment decisions.

Hallucinations are not a research curiosity. They are a massive portfolio risk.

If you are using generative AI to power your investment decisions without verifying its output, you are not innovating. You are gambling on advice from a very articulate Yes Man who fabricates numbers.

What Is AI Hallucination?

AI hallucination is when a large language model generates output that sounds correct but is not. The numbers look real. The citations look legitimate. The analysis reads like it was written by a senior analyst. However, the data is fabricated.

Hallucination is not a malfunction. It is how LLMs are designed.

LLMs do not retrieve facts. They predict the next most probable word in a sequence.

When you ask an LLM for Apple's Q3 free cash flow, the model does not look up the data. Instead, the model generates text that statistically resembles a correct answer, based on patterns in the model's training data.

Sometimes the model's prediction is right. Sometimes the prediction is convincingly wrong. The model does not know the difference and, critically, the model does not tell you which one you are getting.
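
To make the distinction concrete, here is a deliberately toy Python sketch; every name and figure in it is hypothetical. A deterministic lookup returns whatever the source table holds and fails loudly when the data is missing, while the stand-in "model" samples from a menu of plausible-looking figures it has no way to check. Both answers come back equally clean and confident.

```python
import random

# Hypothetical source table standing in for the actual filing data.
FILINGS = {("AAPL", "Q3-2024", "free_cash_flow_bn"): 26.7}   # placeholder figure

def retrieve(key):
    """Deterministic lookup: returns the filed figure or fails loudly."""
    return FILINGS[key]

def generate(key):
    """Toy stand-in for an LLM: samples a plausible-looking figure.
    It never consults FILINGS and has no notion of being wrong."""
    plausible = [24.1, 25.8, 26.7, 28.3, 29.5]   # all look credible
    return random.choice(plausible)

key = ("AAPL", "Q3-2024", "free_cash_flow_bn")
print(retrieve(key))   # always the filed figure
print(generate(key))   # sometimes matches it, often not, always confident
```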

Model hallucinations are a problem everywhere. In investing, hallucinations are a landmine.

Financial decisions demand precision. Not "roughly correct." Not "directionally right." Exactness.

A hallucinated earnings-per-share number is not a rounding error. It is a broken input feeding a live trade.

A fabricated debt-to-equity ratio does not just look wrong in a report. It reprices risk in the wrong direction.

With a hallucinated recipe or travel itinerary, the stakes are low. In financial markets, the feedback loop is measured in dollars lost.

The core issue: LLMs write text that looks like knowledge. Investing requires text that is knowledge. Confusing the two is lethal to your portfolio.

Example 1: Fabricated Financial Metrics

Ask a large language model for a company's earnings per share, free cash flow or debt-to-equity ratio. You will get an answer. It will be formatted correctly. It might be in the right ballpark. However, there is a meaningful chance the number is completely fabricated.

Such fabrication is not a theoretical risk. A study that included researchers from Cornell University benchmarked over a dozen leading AI models and found that Finance questions produced among the highest hallucination rates of any category tested. Not Literature. Not Travel Recommendations. Finance: The domain where wrong numbers have a direct line to dollar-denominated consequences.

Here is what hallucination looks like in practice. You ask an LLM: "What was NVIDIA's Q3 2024 earnings per share?" The model returns a precise figure. It might even include the reporting date and a comparison to analyst estimates. The presentation is indistinguishable from a real analyst note.

However, the model did not retrieve that number from NVIDIA's 10-Q. The model predicted the most statistically probable sequence of characters that resembles a correct answer.

If the prediction is wrong, there is no flag, no warning, no disclaimer. You get the same confident delivery whether the number is real or fabricated.

Now consider what happens downstream. A fabricated EPS number feeds into a discounted cash flow model. That model produces a target price. That target price informs a position. Every step looks rigorous. The spreadsheet is clean. The logic is sound. However, the foundation is fiction and the portfolio takes a hit.
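
A toy calculation makes the propagation visible. The sketch below is not a valuation model anyone should use: the growth rate, exit multiple and discount rate are placeholder assumptions chosen only to show how far a single fabricated input moves the output.

```python
def target_price(eps, growth=0.12, years=5, exit_multiple=22, discount=0.09):
    """Toy target price: grow EPS, apply an exit multiple, discount back.
    Every parameter is a placeholder assumption, not a recommendation."""
    terminal_eps = eps * (1 + growth) ** years
    terminal_price = terminal_eps * exit_multiple
    return terminal_price / (1 + discount) ** years

real_eps = 4.10          # figure as reported in the filing (hypothetical)
hallucinated_eps = 4.80  # plausible-looking fabrication

print(round(target_price(real_eps), 2))          # ~103.31
print(round(target_price(hallucinated_eps), 2))  # ~120.95
# Same model, same logic, same clean spreadsheet: roughly a 17% gap in the
# target price, traceable to one fabricated input.
```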

The insidious danger of fabricated financial metrics: The output does not look like a mistake. It looks like research.

Example 2: Wrong or Nonexistent Tickers

Ticker confusion is not a new problem on Wall Street.

In 2020, investors piled into Zoom Technologies (Ticker: ZOOM), a tiny Chinese parts manufacturer, thinking they were buying Zoom Video Communications (Ticker: ZM), the video-conferencing platform that was surging during the pandemic. The SEC had to suspend trading in Zoom Technologies to stop the bleeding. That confusion was driven by humans misreading a ticker symbol.

Now give that same confusion an engine that operates at machine speed with zero self-doubt.

Ask a large language model to identify the ticker for a company and you will get an answer. It will be formatted like a real symbol. It might even include the exchange. However, the model is not querying a securities database. The model is predicting the most statistically probable sequence of characters that looks like a correct ticker.

That prediction can go wrong in several ways. The model might return a ticker that was reassigned years ago. It might associate a company with a symbol that belongs to a different company. The model might generate a plausible-sounding ticker that does not even exist. In each case, the model generates output with high confidence, irrespective of whether the response is correct.

This problem is much more common than many investors realize. Ticker symbols are short and frequently reused across exchanges. Companies merge, delist and rebrand. New listings inherit old symbols. The mapping between a company name and its current ticker is a living dataset that changes constantly. LLMs trained on stale snapshots cannot track those changes. The models predict from patterns that may be months or years out of date.

Now consider what happens when a hallucinated ticker enters an investment workflow. A portfolio manager asks an AI assistant to pull data on a specific company. The model returns the wrong symbol. The system dutifully retrieves price history, volume data and fundamentals for a completely different company. The analysis runs. The numbers look real. The results look clean. However, they belong to the wrong company.

The danger with wrong tickers is that the mistake can be invisible. A fabricated earnings number might look suspicious if it deviates from consensus. A wrong ticker produces real data for a real company. Nothing looks off until you are on the wrong side of the trade.

Example 3: Confidently Wrong Investment Thesis

A fabricated earnings number is a single broken input. A wrong ticker is a single bad lookup. A hallucinated investment thesis is far more dangerous: An entire narrative constructed from fabricated data and packaged to look like professional research.

Ask a large language model to build a bull case for a mid-cap software company. The model will return a multi-paragraph thesis with revenue growth trends, margin expansion catalysts, competitive advantages and a price target. The output reads like it came from a senior equity research analyst.

However, the model did not pull any of the data from SEC filings, earnings transcripts or sell-side research. The revenue growth rate might be invented. The competitive moat might be exaggerated or nonexistent. The management quote might never have been said. The addressable market figure might be a fabrication dressed up in a credible-looking number.

The danger here is not just a single wrong data point. The danger is the compounding effect of several wrong data points.

Each fabricated data point reinforces the others. The invented revenue acceleration supports the margin expansion story. The margin expansion story justifies the price target. The price target validates the position size. The thesis does not fall apart under casual review because every piece of the argument supports every other piece. The thesis only falls apart when you check the inputs against the source data.

A survey by Intuit Credit Karma quantifies the damage. More than half of the respondents who acted on financial guidance from AI reported making a poor financial decision as a result. These are not reckless gamblers. These are people who received analysis that looked credible, acted on it and paid for it.

Here is how the damage compounds in practice. An investor asks an AI assistant to evaluate a position. The model returns a compelling bull case. The investor builds conviction and sizes a position. When the stock drops, the investor holds because the "research" gave them confidence in the long-term thesis. The hallucinated thesis does not just inform a single trade. It creates emotional commitment to a position built on fiction.

A fabricated earnings number can be caught with a single cross-reference against the 10-Q. A wrong ticker can be caught with a simple lookup. A hallucinated investment thesis requires you to independently verify every claim, every number and every assumption in the narrative. Most investors will not do that. The entire value proposition of using AI for research is speed. Nobody saves time by re-doing the entire analysis from scratch.

The compounding problem: The better the model writes, the harder it is to detect hallucination. And LLMs are exceptional writers.

Why Hallucination Keeps Happening

Hallucination is not a bug that the next model release will fix. It is a structural limitation of how large language models work.

  1. The model does not know anything: An LLM does not store facts. It stores statistical relationships between words. When it generates a free cash flow figure, the model is not recalling a number. It is predicting the sequence of characters most likely to follow your question. The model has no mechanism to distinguish a correct answer from a plausible fabrication.
  2. The model has no access to real-time data: Markets move every second. Companies report every quarter. Tickers get reassigned. An LLM trained on a stale snapshot of the world fills its gaps the only way it knows how: By predicting what a correct answer would look like. In investing, "looks like" and "is" are separated by real money.
  3. LLMs are powerful generalists: They are poor specialists. Investing requires exact figures from exact filings for exact reporting periods. A model trained on the entire web does not have the specialized context to reliably deliver that precision. However, the model will still give you an authoritative answer.
  4. There is no accountability loop: A human analyst's research goes through review. A portfolio manager challenges assumptions. An LLM has no such check. The model delivers output with high confidence regardless of whether the output is accurate, outdated or fabricated.
  5. There is no verification layer: Even if a model augments its output with data from an external source, there is no deterministic system confirming the output matches the source. No automated cross-reference against filings. No mathematical proof that the answer is consistent with the source data. The model produces output. The output goes directly to the user. Everything in between is probability. Nothing in between is proof.

These five limitations are not independent. They compound. The question is not whether your AI will hallucinate. The question is whether you will catch it when it does.

What Smart Investors Do Instead

The answer is not to avoid AI. The firms generating the highest risk-adjusted returns are already using it. Man Group is deploying AI-generated signals in live trading. BlackRock is processing trillions with AI-powered systems. The edge does not go to investors who reject AI. The edge goes to investors who verify it.

The principle is straightforward: Never trust a single LLM output for an investment decision. Every AI-generated claim must be cross-referenced against authoritative source data before it touches a portfolio. Every ticker must be validated against a live securities database. Every financial metric must be traced back to the filing it claims to come from.
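
Here is a minimal sketch of what those checks can look like in code. The securities master, the filed figure, the tolerance and the function names are all illustrative assumptions; a production system would query a live securities database and the actual filing rather than hard-coded dictionaries.

```python
# Minimal verification sketch: every AI-generated claim is checked against
# reference data before it is allowed into the workflow. All data below is
# illustrative, not real filing data.

SECURITIES_MASTER = {"NVDA": "NVIDIA Corporation",
                     "ZM": "Zoom Video Communications"}
FILED_METRICS = {("NVDA", "Q3-2024", "eps"): 0.78}   # placeholder figure

def validate_ticker(ticker: str, company_name: str) -> bool:
    """Reject tickers that do not exist or do not map to the named company."""
    return SECURITIES_MASTER.get(ticker) == company_name

def verify_metric(ticker: str, period: str, metric: str,
                  claimed: float, tolerance: float = 0.005) -> bool:
    """Reject metrics that cannot be traced to the filed figure."""
    filed = FILED_METRICS.get((ticker, period, metric))
    if filed is None:
        return False          # no source data: treat the claim as unverified
    return abs(claimed - filed) <= tolerance

# An AI-generated claim only passes if both checks succeed.
claim = {"ticker": "NVDA", "company": "NVIDIA Corporation",
         "period": "Q3-2024", "metric": "eps", "value": 0.81}

ok = (validate_ticker(claim["ticker"], claim["company"])
      and verify_metric(claim["ticker"], claim["period"],
                        claim["metric"], claim["value"]))
print("accepted" if ok else "rejected")   # rejected: 0.81 does not match 0.78
```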

Verifying generative AI sounds like common sense. The challenge is doing so at scale. The entire value proposition of AI-powered research is speed. Manually re-verifying every output defeats the purpose. The investors who solve this problem do not verify by hand. They verify with systems.

Build Verification into the Architecture

The most effective approaches build verification directly into the AI system. Not as an afterthought. As the foundation.

  • Ground the model in real data: Retrieval-Augmented Generation (RAG) retrieves information from authoritative sources, such as SEC filings, earnings transcripts and pricing databases, before the model generates a response. The model writes from evidence rather than pattern-matching alone. If the model cannot cite its source, the output should be treated as unverified. A minimal sketch of this grounding step follows this list.
  • Decompose the workflow: Agentic AI breaks the research pipeline into specialized sub-agents coordinated by a supervisor. One agent retrieves. Another analyzes. A separate agent verifies output against source material. No single agent operates unchecked. The architecture mirrors how the best investment teams actually work: Analysts research. Portfolio managers challenge. Risk managers verify.
  • Govern the output: Guardrail models, prompt engineering, temperature controls and domain-specific fine-tuning further reduce variability and improve accuracy for investment workflows.
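
Below is a minimal sketch of the grounding step described in the first bullet. The snippet store, the keyword retriever and the figures are illustrative assumptions, not real filing data or any vendor's API; a production system would retrieve from a vector index over filings and pass the constrained prompt to an actual model client. The sketch stops at prompt construction on purpose: the point is that generation only proceeds when a citable passage exists.

```python
# Minimal RAG-style grounding sketch. Snippets, retriever and figures are
# illustrative placeholders, not real filing data or a vendor API.

FILING_SNIPPETS = {
    "10q-q3-eps": "Diluted earnings per share were $0.78 for the quarter.",
    "10q-q3-rev": "Revenue was $35.1 billion for the quarter.",
}

STOPWORDS = {"the", "a", "an", "for", "were", "was", "what", "is", "of"}

def tokens(text: str) -> set:
    """Lowercase word set with edge punctuation and stopwords stripped."""
    return {w.strip("?.$,").lower() for w in text.split()} - STOPWORDS

def retrieve(question: str, k: int = 1) -> dict:
    """Toy retriever: rank snippets by keyword overlap with the question."""
    q = tokens(question)
    scored = {sid: len(q & tokens(text)) for sid, text in FILING_SNIPPETS.items()}
    top = sorted(scored, key=scored.get, reverse=True)[:k]
    return {sid: FILING_SNIPPETS[sid] for sid in top if scored[sid] > 0}

def build_grounded_prompt(question: str) -> str:
    """Refuse when nothing is retrieved; otherwise constrain generation to
    the retrieved passages and require a citation for every figure."""
    sources = retrieve(question)
    if not sources:
        return "UNVERIFIED: no supporting passage retrieved; do not answer."
    context = "\n".join(f"[{sid}] {text}" for sid, text in sources.items())
    return ("Answer using ONLY the passages below and cite the bracketed "
            f"source id for every figure.\n{context}\nQuestion: {question}")

print(build_grounded_prompt("What were diluted earnings per share?"))
print(build_grounded_prompt("What is the CEO's favorite color?"))  # refused
```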

Each of these solutions reduces hallucination risk. None eliminates it. RAG grounds the model, but the model can still misrepresent what it retrieves. Agentic decomposition adds oversight, but every agent in the pipeline is still a probabilistic system. Guardrails catch known failure modes, but not every failure mode. Every solution above operates in the probabilistic domain. Probability is not foolproof.

The Verification Frontier: Neuro-Symbolic AI

The emerging frontier for generative AI verification is fundamentally different. Instead of making probabilistic systems less wrong, the approach combines them with deterministic systems that are provably right.

Neuro-Symbolic AI fuses the flexibility of large language models with the mathematical rigor of automated reasoning. The LLM generates output. The automated reasoning engine translates that output into formal logic and calculates whether the output is consistent with the source data. If there is a discrepancy, the system catches it, flags it and directs the model to try again. The verification is not probabilistic. It is mathematical.
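
Here is a minimal sketch of what that deterministic check can look like, using the open-source Z3 solver as the reasoning engine. The choice of Z3 and every figure below are assumptions for illustration, not a description of any particular firm's system; a real pipeline would extract the facts from the filing and the claim from the model's output automatically.

```python
# Minimal automated-reasoning sketch using the Z3 SMT solver
# (pip install z3-solver). All figures are placeholders, not real filing data.
from z3 import Real, Solver, Q, unsat

net_income = Real("net_income")   # from the filing, in millions
shares = Real("shares")           # diluted share count, in millions
eps = Real("eps")

s = Solver()
# Facts extracted from the authoritative source (placeholder values)
s.add(net_income == 19500, shares == 25000)
# Definition relating the claimed metric to the source facts
s.add(eps == net_income / shares)
# Assert the NEGATION of the LLM's claim ("diluted EPS was 0.78")
s.add(eps != Q(78, 100))

# unsat: the claim cannot be false given the source facts, so it is proven.
# sat:   the solver found an inconsistency; flag the output and regenerate.
print("verified" if s.check() == unsat else "inconsistent with source data")
```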

Automated reasoning is not new. Microchip manufacturers have used it for decades to prove correctness of hardware designs before fabrication. NASA used it to control the Mars rovers in 2004. What is new is applying that rigor to AI output.

The hallucination problem will not be solved by building better LLMs. It will be solved by building systems that verify LLM output with mathematical certainty. The investors that integrate automated verification into their AI-powered investment workflows will operate with a structural advantage.

Conclusion

AI is the most powerful capability investors have ever had access to. No technology in the history of capital markets has compressed the distance between question and insight this fast.

However, power without verification is sophisticated gambling.

Every example in this post follows the same pattern: The AI delivered output that looked like research, the investor treated it as research and the portfolio paid the price. The failure was not the AI. The failure was the absence of a system to catch the AI when it was wrong.

Automated verification is the missing layer in most generative AI systems today. Not better prompts. Not bigger models. Not more data. A deterministic check that confirms whether the output is true before it reaches a decision.

The investors that will win the next decade of investing are not the ones with the best AI. They are the ones with the best verification architecture around their AI. The model is the engine. Verification is the steering.

Use your AI to generate. Never trust your AI to be right.