Stop Using AI to Build Your Financial Model. Use It to Challenge It.
Every FP&A vendor sells AI as a model builder. It's the wrong half of the workflow. The leverage is in the role swap: the CFO keeps the pen, the agent backtests the assumptions, and the board gets a memo grounded in your numbers, not industry averages.
The board pack that aged in three months
Picture an August board meeting. You walk in with a scenario model your FP&A lead built with an AI tool over a weekend. The output is clean: three cases laid out with decent commentary. The board nods.
Then November rolls around. You run the same prompt against the same data, refreshed with the latest inputs. The model spits back materially different assumptions for the base case. Your COGS curve has moved. The Q3 softness you flagged in August is now buried inside a generic seasonality factor the tool re-inferred from industry averages.
Nobody at that board meeting wants to hear "the AI ran it differently this time." That's the moment trust collapses. Not simply because the AI got it wrong, but because it got it wrong confidently, and differently each time.
The prevailing wisdom, and why every vendor sells it
Open any FP&A vendor blog and the pitch is identical. AI builds your scenario model for you. Workday Adaptive promises an "instant baseline plan." Drivetrain and Tropic publish prompts that generate revenue and headcount models from scratch. LivePlan walks you through using ChatGPT to forecast cash.
It sounds great in a demo. It fails at the board level for three reasons that don't show up until you've shipped a few cycles.
One: generic models don't know your business. Off-the-shelf AI amplifies whatever is in your data and fills the gaps with industry averages. It doesn't know that one customer is 14% of your revenue, or that your top five customers account for 28%. It doesn't know Q3 always softens because two of your biggest accounts go on a summer freeze.
Two: the same prompt produces different models. CFO Connect's 2026 modeling recap put it plainly: the same AI may produce three materially different outputs when asked to build the same model three times. That's fatal for serial board reporting. Your November pack cannot quietly disagree with your August one.
Three: the assumptions are where forecasts live and die. Indinero's team said it out loud: ChatGPT can't build a fully linked model, and if you rely on it without well-informed adjustments, the downside outweighs the savings. SumProduct ran the experiment of letting Claude do the modeling work and reached the same conclusion: the output still needs an experienced modeler to be usable at all.
You end up doing the work twice: once correcting the AI's assumptions, and once building what you would have built anyway.
The role swap that actually works
The operator reality is simpler. The human keeps authorship of the model. The agent doesn't try to write it. It tries to break it.
You feed the agent 8 to 12 quarters of actuals. For each historical forecast cycle, it reconstructs what your model would have predicted for the next quarter using only the data available at that time. Then it compares those predictions to what actually happened.
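The replay loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor's implementation: `walk_forward` and `seasonal_naive_with_growth` are hypothetical names, "your model" is stood in for by a simple same-quarter-last-year forecast, and the revenue series is made up. The property that matters is that the forecast for quarter t only ever sees actuals from before t.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: a "model" takes the actuals known so far and
# returns a forecast for the next quarter.
Model = Callable[[Sequence[float]], float]


def walk_forward(
    actuals: Sequence[float], model: Model, min_history: int = 4
) -> List[Tuple[int, float, float]]:
    """Replay each past cycle using only the data available at that time.

    For each quarter t >= min_history, the model sees actuals[:t] and
    predicts actuals[t], so nothing from quarter t onward leaks in.
    Returns (quarter_index, forecast, actual) tuples.
    """
    results = []
    for t in range(min_history, len(actuals)):
        history = actuals[:t]  # point-in-time view of the data
        results.append((t, model(history), actuals[t]))
    return results


def seasonal_naive_with_growth(history: Sequence[float]) -> float:
    """Illustrative stand-in model: same quarter last year, scaled by the
    latest year-over-year growth rate (1.0 if a full year isn't available)."""
    same_quarter_last_year = history[-4]
    growth = history[-1] / history[-5] if len(history) >= 5 else 1.0
    return same_quarter_last_year * growth


if __name__ == "__main__":
    revenue = [100, 104, 92, 110, 108, 112, 99, 118, 115, 120, 104, 126]  # made-up, $k
    for t, forecast, actual in walk_forward(revenue, seasonal_naive_with_growth):
        print(f"quarter {t + 1}: forecast {forecast:.1f}, actual {actual}, "
              f"error {(forecast - actual) / actual:+.1%}")
```

Passing the model in as a function keeps the replay loop independent of whatever the production model actually is; in a real setup the stand-in would be replaced by a re-run of the historical model snapshot for each cycle.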
The deliverable isn't a new model. It's a memo. "Here's where your model has been wrong, and by how much." Revenue forecast off by 6% on average, with the miss concentrated in Q3. COGS overstated in every quarter following a price renegotiation. Churn assumption optimistic by 200bps for two cycles running.
That memo is what the board actually wants. It's grounded in your own numbers, not averages scraped from someone else's industry. And it gives the CFO something to do: defend the assumption, change it, or flag the blind spot. Which is the actual job.
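Turning backtest output into memo lines like "off by 6% on average, concentrated in Q3" is mostly bookkeeping. A minimal sketch, assuming (quarter_index, forecast, actual) tuples from a point-in-time backtest and assuming you know which fiscal quarter index 0 falls in; `summarize_misses` is a hypothetical helper, not part of any product.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def summarize_misses(
    results: List[Tuple[int, float, float]], first_fiscal_quarter: int = 1
) -> Tuple[float, Dict[int, float], int]:
    """Return (overall average miss, average miss per fiscal quarter, worst quarter).

    Misses are absolute percentage errors, |forecast - actual| / |actual|.
    first_fiscal_quarter says which fiscal quarter (1-4) period index 0 falls in.
    """
    by_quarter: Dict[int, List[float]] = defaultdict(list)
    all_misses = []
    for t, forecast, actual in results:
        miss = abs(forecast - actual) / abs(actual)
        fiscal_q = (first_fiscal_quarter - 1 + t) % 4 + 1  # map index to Q1..Q4
        by_quarter[fiscal_q].append(miss)
        all_misses.append(miss)
    per_quarter = {q: mean(m) for q, m in sorted(by_quarter.items())}
    worst = max(per_quarter, key=per_quarter.get)  # where the miss concentrates
    return mean(all_misses), per_quarter, worst
```

The per-quarter breakdown is what turns an average error into a finding the CFO can act on: a 6% average that is really a 15% Q3 problem points at a specific assumption.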
This isn't theoretical. CFO Connect's same recap describes a CFO using AI to "game plan": asking it to challenge her assumptions, poke holes in her arguments, and identify blind spots. Teradata's agentic FP&A pattern does the same thing for variance: it decomposes variance across price, mix, cost, and discounting, cites source tables, and calculates confidence scores. Carl Seidman, one of the more credible practitioner voices in this space, teaches AI for assumption generation and management commentary, not model construction.
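The source doesn't spell out how that decomposition is computed, so as an illustration only, here is the textbook price-volume-mix split of a revenue variance. It's a sketch: the `Line` record and the product names are hypothetical, and the cost and discounting effects mentioned above would need their own terms. By construction, price + volume + mix equals the total revenue variance.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Line:
    """One product line's plan vs. actual (hypothetical record)."""
    budget_qty: float
    budget_price: float
    actual_qty: float
    actual_price: float


def price_volume_mix(lines: Dict[str, Line]) -> Dict[str, float]:
    """Split actual-minus-budget revenue into price, volume and mix effects."""
    items = list(lines.values())
    budget_units = sum(x.budget_qty for x in items)
    actual_units = sum(x.actual_qty for x in items)
    budget_rev = sum(x.budget_qty * x.budget_price for x in items)
    actual_rev = sum(x.actual_qty * x.actual_price for x in items)
    avg_budget_price = budget_rev / budget_units

    # Price: what the price changes were worth on the units actually sold.
    price = sum((x.actual_price - x.budget_price) * x.actual_qty for x in items)
    # Volume: change in total units, valued at the budget average price.
    volume = (actual_units - budget_units) * avg_budget_price
    # Mix: shift between lines at budget prices, holding total units fixed.
    mix = sum(x.budget_price * x.actual_qty for x in items) - actual_units * avg_budget_price

    return {"price": price, "volume": volume, "mix": mix, "total": actual_rev - budget_rev}


if __name__ == "__main__":
    plan_vs_actual = {
        "core": Line(budget_qty=100, budget_price=10, actual_qty=150, actual_price=11),
        "premium": Line(budget_qty=100, budget_price=20, actual_qty=50, actual_price=20),
    }
    print(price_volume_mix(plan_vs_actual))
```

In the made-up example, the team sold more of the cheaper line at a slightly higher price: +150 from price, nothing from volume (total units are flat), and -500 from mix, which together explain the -350 gap.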
The role swap is already happening. Vendors are just slower to sell it because "we challenge your model" is harder to demo than "we built you a model in thirty seconds."
Why governance actually likes this better
There's a name for this pattern in model risk management, and it predates generative AI by decades. It's called a challenger model: you pit an alternative against your production model using out-of-sample testing and backtesting, and see where it would have done better.
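A champion-challenger backtest reduces to running two models through the same out-of-sample replay and comparing their errors. A minimal sketch: `backtest_errors` and `champion_vs_challenger` are hypothetical names, the error metric is absolute percentage error, and the two toy models (carry last quarter forward vs. same quarter last year) are placeholders for a production model and its challenger.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Model = Callable[[Sequence[float]], float]


def backtest_errors(actuals: Sequence[float], model: Model, min_history: int = 4) -> List[float]:
    """Out-of-sample absolute percentage errors of one-step-ahead forecasts."""
    errors = []
    for t in range(min_history, len(actuals)):
        forecast = model(actuals[:t])  # only data available before quarter t
        errors.append(abs(forecast - actuals[t]) / abs(actuals[t]))
    return errors


def champion_vs_challenger(actuals: Sequence[float], production: Model,
                           challenger: Model, min_history: int = 4) -> Dict[str, object]:
    prod = backtest_errors(actuals, production, min_history)
    chal = backtest_errors(actuals, challenger, min_history)
    better_at = [min_history + i for i, (p, c) in enumerate(zip(prod, chal)) if c < p]
    return {
        "production_mape": mean(prod),
        "challenger_mape": mean(chal),
        "challenger_better_at": better_at,  # quarter indices where the challenger won
    }


def last_quarter(history: Sequence[float]) -> float:
    """Toy production model: carry the latest quarter forward."""
    return history[-1]


def same_quarter_last_year(history: Sequence[float]) -> float:
    """Toy challenger model: repeat the same quarter from a year earlier."""
    return history[-4]
```

On a seasonal series, the same-quarter-last-year challenger beats the carry-forward model precisely in the quarters where the seasonal dip starts and ends. That "where would the alternative have done better" list is the raw material for the critique, not a verdict on which model to ship.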
Bank regulators wrote the playbook: the Federal Reserve's SR 11-7 guidance on model risk management treats benchmarking against alternative models and backtesting as core parts of model validation. The agent fits inside that playbook cleanly because the human is still the author and the approver. The agent drafts the critique. The CFO signs off on what to change. For high-risk workflows like board materials, AI should be treated as an assistant, not an approver. It can draft the variance explanation. It can't certify it.
That isn't a constraint. It's the unlock. It's why this pattern survives an audit and the "AI built our forecast" pattern doesn't.
It also lines up with the actual data on what's working. Centage's read of CFO AI adoption shows the ROI is concentrated in variance analysis, anomaly detection, and baseline forecast review. Not ground-up scenario building. Only 14% of CFOs report measurable ROI from AI investments to date, and Gartner expects more than 40% of agentic AI projects to be cancelled by 2027. The projects most likely to survive that cull are the ones with a clear seam where the human signs off.
What this looks like in practice
The setup is unglamorous. Roughly three weeks of work to do it properly.
- Week one: connect the data. The GL, the CRM, the operational system that drives your bookings. The agent reads from the source. It does not write back.
- Week two: ingest 8 to 12 quarters of historical actuals and historical forecast snapshots. Reconstruct what the model said, when it said it, and what happened next.
- Week three: define the backtest. Where are you measuring error? Quarterly revenue MAPE (mean absolute percentage error)? Gross margin variance? Customer concentration risk? Pick the two or three KPIs your board actually argues about.
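For the first two KPIs in that list, the arithmetic is simple. A minimal sketch, assuming the standard MAPE definition (the average of |actual - forecast| / |actual|) and expressing gross margin variance as actual minus forecast gross margin in percentage points; the function names are hypothetical, and customer concentration risk would need its own measure.

```python
from typing import Sequence


def mape(forecasts: Sequence[float], actuals: Sequence[float]) -> float:
    """Mean absolute percentage error, as a fraction (0.06 means 6%)."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("need equal-length, non-empty series")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual is zero")
    return sum(abs(a - f) / abs(a) for f, a in zip(forecasts, actuals)) / len(actuals)


def gross_margin_variance_pp(forecast_rev: float, forecast_cogs: float,
                             actual_rev: float, actual_cogs: float) -> float:
    """Actual minus forecast gross margin, in percentage points."""
    forecast_gm = (forecast_rev - forecast_cogs) / forecast_rev
    actual_gm = (actual_rev - actual_cogs) / actual_rev
    return (actual_gm - forecast_gm) * 100
```

MAPE is easy to explain to a board but blows up when actuals are near zero, which is why this sketch refuses zero actuals; for small line items, an absolute-dollar error may read better.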
After that, every iteration is hours, not weeks. The memo gets sharper each quarter because the agent is now backtesting your refinements too.
The human keeps the pen. The agent keeps the receipts.
The better mental model
Stop asking AI to be a junior FP&A analyst that builds models. Start asking it to be the senior reviewer that picks them apart.
The first job has been tried broadly for two years, and it has produced measurable ROI for only 14% of CFOs and a 97% insistence on human oversight. The second job, challenger and backtester and blind-spot finder, is the one that compounds. You get a board pack that argues with itself before the board does. That's what senior finance leaders actually pay for.
If you're a CFO at a mid-market firm and the last AI-built model you saw made you uneasy, you already know the answer. The fix isn't a better builder. It's a sharper critic.
Where to start
If you want a concrete first cut at this, the AI Readiness Audit is the two-week engagement we use to map a finance team's actuals, identify the two or three KPIs that matter most, and scope the backtest. You walk out with the "where the model has been wrong" memo on your own data, not someone else's.
Frequently Asked Questions
How many quarters of actuals do I actually need for this to work?
Eight to twelve quarters is the working range. Fewer than eight and the backtest can't separate signal from a single bad cycle. More than twelve and you start including data from a different version of the business: different customer mix, different pricing model, different cost structure. For most mid-market firms, two to three years of clean actuals is enough.
Can't I just do this with ChatGPT myself?
You can run a one-off critique in a chat window. What you can't do is run it consistently against connected source data every quarter with an audit trail. The value isn't the first memo. It's the tenth one, after the agent has seen your patterns and your corrections. Generic chat tools don't retain that context, and they don't sit on top of your GL.
Doesn't this still need a fully linked three-statement model underneath?
Yes. The agent doesn't build the model. It assumes you have one and stress-tests it. If your model is broken or your chart of accounts is inconsistent across business units, the backtest will surface that fast, which is itself useful. It just means the first deliverable might be data cleanup, with the critique memo coming after.
How is this different from a standard variance analysis?
Variance analysis tells you the gap between forecast and actual for one period. A backtest tells you the systematic pattern across many periods: which assumption you keep getting wrong, in which direction, and by how much. Variance is a snapshot. The backtest is the trend the snapshots are sitting on.
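The distinction can be made concrete: a single-period variance is one signed gap, while a systematic pattern shows up as a consistent sign across periods. A minimal sketch using mean signed percentage error (forecast bias) plus the share of periods missed in the same direction; the 75% threshold and the function names are illustrative assumptions, not a standard.

```python
from statistics import mean
from typing import Dict, Sequence


def period_variance(forecast: float, actual: float) -> float:
    """One-period snapshot: signed gap as a share of actual (+ means over-forecast)."""
    return (forecast - actual) / actual


def systematic_pattern(forecasts: Sequence[float], actuals: Sequence[float],
                       min_share: float = 0.75) -> Dict[str, object]:
    """Across many periods: which way the misses lean, and how consistently."""
    errors = [period_variance(f, a) for f, a in zip(forecasts, actuals)]
    bias = mean(errors)  # mean signed error; random misses tend to cancel out
    sign = (bias > 0) - (bias < 0)
    same_direction = sum(1 for e in errors if sign != 0 and ((e > 0) - (e < 0)) == sign)
    consistency = same_direction / len(errors)
    return {
        "mean_signed_error": bias,
        "direction": {1: "over-forecast", -1: "under-forecast", 0: "none"}[sign],
        "consistency": consistency,
        "systematic": consistency >= min_share,  # hypothetical threshold
    }
```

A line that misses high one quarter and low the next comes back with a near-zero bias and low consistency, which is noise. A line that misses the same way almost every cycle is the blind spot worth a paragraph in the memo.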
What's the risk if the agent's critique is itself wrong?
The agent doesn't change the model. It drafts the critique with citations back to the source rows. The CFO reads the memo, agrees or disagrees with each finding, and decides what to update. The seam where the human signs off is the governance layer, which is why the challenger pattern regulators describe in model risk guidance is built this way.
Sources
Cited inline above:
- SumProduct — Building a Financial Model with AI: What Really Happens When You Let Claude Do the Work
- Workday — AI for FP&A (Adaptive Planning)
- CFO Connect — AI in Financial Modeling 2026: Opportunity, Risk, Skills
- Indinero — Can ChatGPT Build a Financial Forecast in 2025?
- FP&A Trends — Agentic AI Projects Fail by 2027: How FP&A Succeeds
Additional sources consulted for this piece:
- Drivetrain — ChatGPT for Finance
- Tropic — ChatGPT Prompts for Financial Planning
- LivePlan — Create a Financial Forecast with ChatGPT
- Cube Software — AI for FP&A
- Teradata — AI Agents in Financial Analysis
- Carl Seidman, Seidman Financial — Excel for Finance
- House Blend — AI Agents in Finance: A CFO Guide (2026)
- Centage — How CFOs Are Using AI for Corporate FP&A