Educational · AI for Excel

How AI reads and analyses your Excel files — explained simply.

When you upload an Excel file to DataHub Pro, what actually happens? This page explains, without jargon, how DataHub Pro's AI reads your spreadsheet, builds an understanding of your data, and answers questions about it in plain English. It also covers what happens to your data, how the AI avoids making up numbers, and what the AI genuinely can't do.

Updated 23 May 2026

No credit card   Files processed securely   Not used to train AI models
2 layers: parser then AI, never raw LLM. Purpose-built Excel parser, tool-use AI on top.
Auditable: every answer cites its source. No hallucinated numbers, ever.
UK/EU hosted: data stays in UK/EU by default. GDPR-compliant, signed DPA available.

What happens when you upload an Excel file

The process has two distinct stages: parsing and analysis. They're separate by design, because AI Excel analysis is only reliable when the AI never reads raw file bytes. Instead, it works with a structured data model that the parser has already validated and typed.

1. File ingestion and structure detection

DataHub Pro's parser reads the .xlsx file structure — it's a ZIP archive of XML files — and extracts cell values, formats, formulas, shared strings, and sheet names. This is not a conversion to text; it's a structured read of the file's native format. For .xls (old binary format) and .csv files, separate parsers handle the respective formats.
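To make the "ZIP archive of XML files" point concrete, here is a minimal sketch using only Python's standard library. It is not DataHub Pro's parser. It builds a toy archive containing just two of the parts a real .xlsx holds (a real workbook has more, such as `[Content_Types].xml`), then reads cell values back, resolving shared-string indexes. The function names `build_toy_xlsx` and `read_cells` are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by SpreadsheetML parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def build_toy_xlsx() -> bytes:
    """Build a tiny ZIP holding two XML parts (illustrative, not a complete workbook)."""
    shared = (
        '<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        "<si><t>Region</t></si><si><t>North</t></si></sst>"
    )
    sheet = (
        '<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">'
        "<sheetData>"
        '<row r="1"><c r="A1" t="s"><v>0</v></c></row>'
        '<row r="2"><c r="A2" t="s"><v>1</v></c><c r="B2"><v>1250.5</v></c></row>'
        "</sheetData></worksheet>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/sharedStrings.xml", shared)
        zf.writestr("xl/worksheets/sheet1.xml", sheet)
    return buf.getvalue()


def read_cells(xlsx_bytes: bytes) -> dict:
    """Return {cell reference: value}, looking up text cells in the shared-strings table."""
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        sst = ET.fromstring(zf.read("xl/sharedStrings.xml"))
        strings = [si.find("m:t", NS).text for si in sst.findall("m:si", NS)]
        sheet = ET.fromstring(zf.read("xl/worksheets/sheet1.xml"))

    cells = {}
    for c in sheet.iterfind(".//m:c", NS):
        raw = c.find("m:v", NS).text
        # t="s" means the stored value is an index into the shared strings.
        cells[c.get("r")] = strings[int(raw)] if c.get("t") == "s" else float(raw)
    return cells


cells = read_cells(build_toy_xlsx())
```

Text cells store an index rather than the text itself, which is why a structured read needs the shared-strings part as well as the sheet part.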

2. Header detection and column typing

The parser identifies the header row — which may not be row 1 if there are title rows, company logos, or blank rows above the data. It detects column data types: numeric, currency, percentage, date (including UK day-first formats like DD/MM/YYYY), time, boolean, and categorical text. Cells that look numeric but are stored as text (a common export artifact) are coerced to their numeric values.
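As a rough illustration of this step, the sketch below uses pandas on a small made-up sheet with a title row and a blank row above the data. The header rule in `find_header_row` (first row where every cell is non-empty text) is an assumption chosen for the example, not DataHub Pro's actual heuristic.

```python
import pandas as pd

# Made-up sheet: a title row and a blank row sit above the real header.
raw = pd.DataFrame([
    ["Monthly sales report", None, None],
    [None, None, None],
    ["Date", "Region", "Revenue"],
    ["03/01/2026", "North", "1200"],
    ["14/01/2026", "South", "950.50"],
])


def find_header_row(df: pd.DataFrame) -> int:
    """Assumed heuristic: the first row in which every cell is a non-empty string."""
    for i, row in df.iterrows():
        if all(isinstance(v, str) and v.strip() for v in row):
            return i
    raise ValueError("no header row found")


h = find_header_row(raw)
table = raw.iloc[h + 1:].reset_index(drop=True)
table.columns = list(raw.iloc[h])

# Numbers stored as text are coerced to real numeric values.
table["Revenue"] = pd.to_numeric(table["Revenue"])

# UK day-first dates: 03/01/2026 is 3 January, not 1 March.
table["Date"] = pd.to_datetime(table["Date"], format="%d/%m/%Y")
```

An explicit `format` keeps the date parsing unambiguous. When the format is not known in advance, `pd.to_datetime(..., dayfirst=True)` is the looser alternative.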

3. Semantic column understanding

With column names and types established, DataHub Pro builds semantic labels for each column. A column named "Rev" with currency values is understood to represent revenue. A column named "Date" with date values is understood to be a time dimension. This semantic layer is how the platform knows to run a Holt-Winters forecast on time-series columns and an RFM analysis on customer-transaction columns — without you asking explicitly.
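One simple way to picture a semantic layer is a lookup from column name and detected type to a label, followed by a rule that maps label combinations to analyses. The sketch below is hypothetical: the keyword sets, the label names, and the functions `semantic_label` and `suggest_analyses` are invented for illustration, and a production system would likely use something richer than keyword matching.

```python
def semantic_label(name: str, dtype: str) -> str:
    """Map a column name plus its detected type to a semantic label (toy rules)."""
    key = name.strip().lower()
    if dtype == "date":
        return "time_dimension"
    if dtype == "currency" and key in {"rev", "revenue", "sales"}:
        return "revenue"
    if dtype == "categorical" and key in {"customer", "customer id", "client"}:
        return "customer_key"
    return "unlabelled"


def suggest_analyses(labels: set) -> list:
    """Pick analyses whose required labels are all present."""
    suggestions = []
    if {"time_dimension", "revenue"} <= labels:
        suggestions.append("forecast")
    if {"customer_key", "time_dimension", "revenue"} <= labels:
        suggestions.append("rfm")
    return suggestions


labels = {semantic_label("Rev", "currency"), semantic_label("Date", "date")}
analyses = suggest_analyses(labels)
```

With only a revenue column and a date column, the toy rules suggest a forecast. Adding a customer column would also qualify the file for RFM.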

4. Data model construction

The parsed data is loaded into an in-memory pandas DataFrame — a tabular data structure optimised for fast analytical operations. This is the data model the AI works with. The AI does not see your raw file. It sees a validated, typed table with column names, data types, and summary statistics (row count, min, max, mean, null counts) for each column.
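The summary the AI sees can be approximated as a small profile of the DataFrame. This sketch computes the statistics named above for a made-up table. The `profile` function and the shape of its output are assumptions for illustration.

```python
import pandas as pd

# Made-up data with one missing value in each column.
df = pd.DataFrame({
    "Region": ["North", "South", "North", None],
    "Revenue": [1200.0, 950.5, None, 400.0],
})


def profile(frame: pd.DataFrame) -> dict:
    """Summarise a table: row count, plus type, nulls, and numeric stats per column."""
    summary = {"row_count": len(frame), "columns": {}}
    for col in frame.columns:
        s = frame[col]
        info = {"dtype": str(s.dtype), "nulls": int(s.isna().sum())}
        if pd.api.types.is_numeric_dtype(s):
            # min, max and mean skip missing values by default.
            info.update(min=float(s.min()), max=float(s.max()), mean=float(s.mean()))
        summary["columns"][col] = info
    return summary


p = profile(df)
```

A profile like this is compact enough to describe a large table in a few lines, which is what lets a planner reason about columns without seeing every row.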

5. Automatic analysis at upload

Once the data model is ready, DataHub Pro runs AutoInsights: it identifies the most analytically significant columns, computes KPIs (totals, averages, growth rates, top/bottom N), writes 3–5 narrative insight statements, and flags any data-quality concerns. This happens automatically, in seconds, without you asking a question.
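The KPIs listed here are ordinary pandas aggregations. The sketch below computes a few of them on made-up data and fills a one-line narrative from the results. It illustrates the kind of calculation involved, not the AutoInsights implementation, and the `kpis` keys are invented names.

```python
import pandas as pd

# Made-up sales data covering two months and two regions.
sales = pd.DataFrame({
    "Month": ["2026-01", "2026-01", "2026-02", "2026-02"],
    "Region": ["North", "South", "North", "South"],
    "Revenue": [1000.0, 500.0, 1200.0, 600.0],
})

monthly = sales.groupby("Month")["Revenue"].sum()
by_region = sales.groupby("Region")["Revenue"].sum()

kpis = {
    "total": float(sales["Revenue"].sum()),
    "average": float(sales["Revenue"].mean()),
    # Month-on-month growth of the latest month versus the one before it.
    "growth": float(monthly.pct_change().iloc[-1]),
    "top_region": by_region.idxmax(),
}

# The narrative is filled from computed values, never typed in by hand.
insight = (
    f"Revenue grew {kpis['growth']:.0%} month on month; "
    f"{kpis['top_region']} is the top region."
)
```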

6. Tool-use AI for questions

When you ask a question — "what were total sales by region last quarter?" — the AI plans a pandas operation: group by Region, filter Date to Q1 2026, sum Revenue. It executes that operation against your actual data and returns the result. The AI's role is planning and interpreting; the calculation is done by deterministic code. This is why every answer comes with a trace showing exactly which columns and filters were used.
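The pandas operation described above might look like the sketch below, run on made-up data with the column names from the example (Date, Region, Revenue). The `trace` dictionary is a hypothetical shape for the audit trail. The real trace format is not specified on this page.

```python
import pandas as pd

# Made-up data: three rows fall in Q1 2026, one falls in Q2.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2026-01-15", "2026-02-20", "2026-03-05", "2026-04-02"]),
    "Region": ["North", "South", "North", "South"],
    "Revenue": [100.0, 200.0, 50.0, 999.0],
})

# Filter Date to Q1 2026 (both ends inclusive), group by Region, sum Revenue.
start, end = pd.Timestamp("2026-01-01"), pd.Timestamp("2026-03-31")
in_q1 = df["Date"].between(start, end)
result = df.loc[in_q1].groupby("Region")["Revenue"].sum()

# Record which columns and filters produced the answer.
trace = {
    "filter": "Date between 2026-01-01 and 2026-03-31",
    "group_by": ["Region"],
    "aggregate": "sum(Revenue)",
    "rows_used": int(in_q1.sum()),
}
```

The April row is excluded by the filter, so it does not affect the South total, and `rows_used` makes that exclusion visible to anyone checking the answer.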

Why DataHub Pro's AI doesn't make up numbers

This is the most important thing to understand about how AI Excel file analysis works, or should work. Many AI tools pass your spreadsheet data into an LLM's context window and ask it to answer questions. The LLM generates a response that sounds authoritative, but it comes from a language model trained to predict text, not from a calculator.

The problem: language models are optimised to produce plausible-sounding text, not arithmetically correct answers. When the model is asked "what's the total for column D?" it generates the most plausible-looking number based on the context, not the actual sum. For simple sums with small numbers, it's usually right. For aggregations with filters, edge cases, or large numbers, errors become much more likely: hallucination rates on numerical tasks are often quoted at 5–15%, even for the best enterprise-grade models.

DataHub Pro uses a different architecture — tool-use AI. The language model's job is to plan: it decides that answering "total sales by region" requires a group-by operation with a sum aggregation on the Revenue column, filtered by Date. That plan is expressed as code. The code is executed against your actual data. The result is returned. The LLM interprets the result in plain English. At no point does the LLM guess a number.
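The plan-then-execute split can be sketched as follows. The model's output is treated as a small structured plan, the plan is validated against the real columns and an allowlist of operations, and only then does deterministic code compute the numbers. The plan format, `ALLOWED_AGGS`, and `execute_plan` are assumptions made for this example, not DataHub Pro's internal API.

```python
import pandas as pd

# Only these aggregations may be requested by a plan (assumed allowlist).
ALLOWED_AGGS = {"sum", "mean", "count", "min", "max"}


def execute_plan(df: pd.DataFrame, plan: dict) -> dict:
    """Validate a plan, then run it with pandas. The plan itself carries no numbers."""
    unknown = {plan["group_by"], plan["value"]} - set(df.columns)
    if unknown:
        raise ValueError(f"plan references unknown columns: {sorted(unknown)}")
    if plan["agg"] not in ALLOWED_AGGS:
        raise ValueError(f"aggregation not allowed: {plan['agg']}")
    return df.groupby(plan["group_by"])[plan["value"]].agg(plan["agg"]).to_dict()


df = pd.DataFrame({
    "Region": ["North", "South", "North"],
    "Revenue": [100.0, 200.0, 50.0],
})

# What a language model might emit for "total sales by region" (hypothetical).
plan = {"group_by": "Region", "value": "Revenue", "agg": "sum"}
totals = execute_plan(df, plan)

# The plain-English answer is filled in from the computed result.
answer = "; ".join(f"{region}: {value:,.2f}" for region, value in totals.items())
```

Because the plan names columns and operations rather than values, a wrong plan fails validation or produces a traceable result. It cannot silently introduce a number that was never computed.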

This is the same auditable-tool-use pattern that institutional finance has standardised on for AI analytics — built into DataHub Pro by a founder who built risk models at J.P. Morgan, where a hallucinated number is a regulatory incident.

How your data is kept secure

The most common question about AI Excel analysis is: "Is my data safe?" Here's exactly what happens to your file.

🔒
Encrypted in transit and at rest

TLS 1.3 in transit. AES-256 at rest. Same standards used by institutional banks.

🌍
UK/EU data residency by default

Files stored in UK/EU regions. AI requests routed through EU-region LLM endpoints where available.

🚫
Not used to train AI models

DataHub Pro's LLM providers operate in API business mode. Customer data is contractually excluded from training pipelines.

📄
GDPR-compliant with signed DPA

A Data Processing Agreement is available on request. Data deletion on request. Retention schedule applies to inactive accounts.

Files are not permanently stored. They are processed, the data model is built, and the file is retained only as long as needed for your active session and any analyses you've saved. You can delete your uploaded files at any time from your account.

What AI Excel analysis can and can't do

Honest about limitations

AI Excel analysis works best for data-oriented spreadsheets — files where rows are records and columns are attributes. It works less well for:

  • Complex financial models: If your file is a DCF, LBO, or amortisation model where cells reference other cells in chains, the AI analyses the computed values in those cells — not the formula logic. The model structure itself isn't something the AI navigates.
  • Very wide, denormalised data: Files with 200+ columns often have redundant or derived columns that confuse automatic column typing. The AI handles these files but the AutoInsights may pick less relevant metrics. Use Ask Your Data for targeted questions instead.
  • Data that requires business context: The AI doesn't know your business. It can detect an anomaly — a revenue spike on a specific date — but it can't know that the spike was caused by a one-off client invoice. You provide the business context; the AI provides the quantitative pattern recognition.
  • Real-time analysis: DataHub Pro analyses files you upload — it doesn't connect to live databases or stream real-time data. For truly live dashboards driven by a transactional database, a warehouse-connected BI tool is the right fit.
  • Very large files: Free tier is capped at 100,000 rows and 50 MB. Pro tier supports 2,000,000 rows and 200 MB. Files beyond these limits need to be pre-filtered before upload.
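A quick pre-upload check against the limits quoted above could look like this sketch. The tier limits come from this page. The function name `fits_tier` and the choice to treat the caps as inclusive are assumptions.

```python
# (max rows, max size in MB) per tier, as quoted on this page.
LIMITS = {
    "free": (100_000, 50),
    "pro": (2_000_000, 200),
}


def fits_tier(rows: int, size_mb: float, tier: str) -> bool:
    """Return True if a file is within the tier's limits (caps assumed inclusive)."""
    max_rows, max_mb = LIMITS[tier]
    return rows <= max_rows and size_mb <= max_mb


# A 250,000-row, 60 MB export is too big for the free tier but fits Pro.
free_ok = fits_tier(250_000, 60, "free")
pro_ok = fits_tier(250_000, 60, "pro")
```

A file that fails both checks needs pre-filtering, for example by date range or by dropping unused columns, before it is uploaded.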

FAQs

How does AI read an Excel file?

When you upload an Excel file to DataHub Pro, the platform first parses the raw .xlsx structure, reading cell values, formats, data types, and sheet names using a purpose-built Excel parser. It then detects the header row, identifies column data types, handles merged cells and formula values, and builds an in-memory data model. This parsed data model is what the AI works with: the AI never reads your file as raw bytes, and the entire file is never passed into an LLM context window.

Can AI analyse an Excel spreadsheet accurately?

Yes, when the AI's analysis is grounded in deterministic computation rather than language model inference. DataHub Pro uses a tool-use approach: the AI plans analyses as pandas operations (sums, filters, groupbys, statistical tests), executes those operations against your actual data, and then interprets the results in plain English. This means every number in the AI's output is the result of a real calculation — not an estimate or a generative response.

Is my Excel data safe when I upload it?

DataHub Pro applies several layers of protection: files are encrypted in transit (TLS 1.3) and at rest (AES-256), stored in UK/EU regions by default, and not used to train AI models. DataHub Pro's LLM providers operate in API business mode, which contractually excludes customer data from training pipelines. Files are deleted on request, and a signed Data Processing Agreement (DPA) is available for GDPR compliance.

What are the limitations of AI Excel analysis?

AI Excel analysis works best for data-oriented spreadsheets. Limitations include: (1) deeply linked financial models — the AI analyses computed values, not formula chains; (2) very large files — free tier is capped at 100k rows/50 MB; (3) real-time data — DataHub Pro analyses uploaded files, not live database connections; (4) business context — the AI detects patterns but can't know the business reasons behind them. For the majority of Excel use cases — monthly reports, client data, exports from CRM/ERP/ecommerce — AI Excel analysis works well.

Upload your Excel file now and see how AI reads it.

The free tier handles up to 50 MB and 100,000 rows. Upload any real file: DataHub Pro parses it, builds the analysis, and within seconds shows you the data model it constructed.

