Regression Analysis in Excel — Linear, Multiple & How to Read the Output (2026)

Regression answers the question every analyst eventually asks: how does one thing drive another, and can I predict it? In 40 minutes you’ll run regression in Excel three ways — a scatter trendline, live functions (SLOPE, INTERCEPT, RSQ, FORECAST.LINEAR), and the full Data Analysis ToolPak report — plus multiple regression with LINEST and how to read p-values and R-squared without a statistics degree. Interactive demo included.

Dr Waqas Rafique Dr Waqas Rafique · Founder & CTO · PhD, Statistical Machine Learning
· About

TL;DR

Fastest: scatter chart → right-click points → Add Trendline → Linear → tick Display Equation + R². Live cells: =SLOPE(Y,X), =INTERCEPT(Y,X), =RSQ(Y,X), predict with =FORECAST.LINEAR(newX,Y,X). Full stats: enable the Analysis ToolPak → Data → Data Analysis → Regression (gives p-values, CIs). Multiple predictors: =LINEST(Y,Xcols,TRUE,TRUE). A p-value < 0.05 = significant.

Contents

  1. What is Regression Analysis?
  2. Before You Start
  3. Step 1 — Scatter Chart
  4. Step 2 — Linear Trendline
  5. Step 3 — Slope & Intercept
  6. Step 4 — R-squared
  7. Step 5 — Forecasting
  8. Step 6 — The ToolPak
  9. Step 7 — p-values
  10. Step 8 — Multiple Regression
  11. Try It: Live Regression
  12. A Worked Example
  13. Common Errors & Fixes
  14. FAQ

What is Regression Analysis?

Regression analysis is a statistical method for modelling the relationship between a dependent variable (the thing you want to explain or predict, called Y) and one or more independent variables (the things you think drive it, called X). At its simplest, it fits a straight line through a cloud of points so that the line is as close as possible to all of them — the “line of best fit” — and gives you an equation you can use to predict Y from X.

That line is defined by two numbers: a slope (how much Y changes for each one-unit increase in X) and an intercept (the value of Y when X is zero). If advertising spend is X and sales is Y, a slope of 3 means every extra £1 of ad spend is associated with £3 more sales, on average. Excel finds this line using ordinary least squares — it picks the slope and intercept that minimise the total squared vertical distance between the line and the actual points.

Two questions always follow the line. First, how good is the fit? — answered by R-squared, the share of Y’s variation the line explains. Second, is the relationship real or just noise? — answered by the p-value, which tells you how likely you’d see this relationship by chance if there were truly none. A confident regression result needs both: a meaningful slope, a decent R-squared, and a low p-value. This guide builds all three, from a quick chart trendline up to a full multi-variable model.

One caution worth internalising before you start: regression measures association, not causation. A strong fit between ice-cream sales and drownings does not mean one causes the other (both rise with hot weather). Regression is a powerful tool for prediction and for quantifying relationships, but the causal story always comes from your understanding of the domain, not from the R-squared.

Before You Start

You need paired numeric data

Regression needs at least one X column and one Y column of numbers, with each row a matched observation. For reliable results aim for a reasonable sample — a dozen points can hint at a relationship, but a few dozen or more makes the slope and significance trustworthy.

Plot before you fit

Always eyeball a scatter chart first. Regression will happily fit a straight line to curved, clustered, or outlier-ridden data and report a number — but that number will mislead. Looking at the cloud of points tells you whether a straight line is even the right model.

Mind the argument order

Excel’s SLOPE, INTERCEPT, and RSQ all take the Y range first, X range second — the opposite of how you read “X then Y” on a chart. Getting this backwards is the single most common regression mistake in Excel and produces a plausible-looking but wrong slope.

💡 Pro tip: Keep X and Y in adjacent columns with X on the left. Excel’s scatter chart assumes the left column is X and the right is Y, so adjacency saves you from mislabelled axes.
1
2
3
4
5
6
7
8

1.Step 1 — Plot a Scatter Chart

Select your X and Y columns (X on the left) and go to Insert → Charts → Scatter (the first, plain option — not a line chart). You now see your data as a cloud of points. Before fitting anything, read the cloud: do the points trend up or down together? Is the pattern roughly straight, or curved? Are there obvious outliers sitting far from the rest? This visual check is not optional — it tells you whether linear regression is appropriate at all.

2.Step 2 — Add a Linear Trendline

Right-click any data point → Add Trendline. In the Format Trendline pane, choose Linear, then scroll down and tick Display Equation on chart and Display R-squared value on chart. Excel draws the line of best fit and prints something like y = 2.84x + 15.3 and R² = 0.78 right on the chart.

That one action is a complete simple regression: the equation gives you the slope (2.84) and intercept (15.3), and R² (0.78) tells you the line explains 78% of the variation in Y. For a quick look, this is often all you need.

3.Step 3 — Slope and Intercept as Live Cells

The chart equation is static text. To get the coefficients as live values you can use in other formulas, use the worksheet functions (remember: Y first, X second):

=SLOPE(B2:B50,A2:A50)
=INTERCEPT(B2:B50,A2:A50)

Now you can predict Y for any X yourself with =slope*X + intercept, and the prediction updates automatically as your data changes — ideal for a live model or dashboard where the chart equation cannot reach.

4.Step 4 — Measure Fit with R-squared

R-squared is the headline measure of how well the line fits. Get it in a cell with:

=RSQ(B2:B50,A2:A50)

It returns a number between 0 and 1: 0.78 means 78% of the variation in Y is explained by X. What counts as “good” depends entirely on the field — in physics you might expect 0.99, in human behaviour 0.3 can be meaningful. And resist R-squared worship: a model can have a high R² and still be useless if it violates regression’s assumptions or is fit to a tiny sample. Read it alongside the scatter plot and the p-values, never alone.

FunctionReturns
SLOPE(Y,X)The slope (rise per unit of X)
INTERCEPT(Y,X)Y when X = 0
RSQ(Y,X)R-squared (0–1, fit quality)
CORREL(Y,X)Correlation coefficient (−1 to 1)
FORECAST.LINEAR(x,Y,X)Predicted Y for a new X

5.Step 5 — Forecast with FORECAST.LINEAR

To predict Y for a value of X without assembling the equation yourself, use:

=FORECAST.LINEAR(E2,B2:B50,A2:A50)

where E2 holds the new X value. Excel fits the regression internally and returns the predicted Y in one step. Put your scenario X values in a column and drag this down to get a prediction for each. Remember the cardinal rule of forecasting: predicting within the range of your data (interpolation) is reasonable; predicting far beyond it (extrapolation) assumes the straight-line relationship continues, which it often does not.

6.Step 6 — The Data Analysis ToolPak Regression

For the full statistical picture, use the ToolPak. Enable it once via File → Options → Add-ins → Manage: Excel Add-ins → Go → tick Analysis ToolPak. A Data Analysis button appears on the Data tab. Click it → Regression, set the Y (Input Y Range) and X (Input X Range), tick Labels if you included headers, and choose an output location.

Excel produces a labelled report with everything: the coefficients (slope and intercept), their standard errors, t-statistics, p-values, the 95% confidence intervals, R-squared and Adjusted R-squared, and an ANOVA table with the overall F-test. This is the output you put in front of a decision-maker or a write-up. Its one limitation: it is a static snapshot — it does not recalculate when the data changes, so re-run it after any data update.

7.Step 7 — Interpreting p-values and Significance

Each coefficient in the ToolPak output has a p-value. It answers: if this variable truly had no relationship with Y, how likely is it that we’d see a slope this large by pure chance? The convention is a threshold of 0.05: below it, the relationship is “statistically significant” and the variable likely belongs; above it, the variable may just be noise and a candidate to drop.

Treat 0.05 as a guideline, not gospel. A tiny p-value on a trivially small slope tells you the effect is real but unimportant; a borderline p-value on a large, sensible effect with a small sample may still matter. Read the p-value, the size of the coefficient, the confidence interval, and your domain knowledge together — significance and importance are not the same thing.

⚠ Watch out: Adding more predictors always raises plain R-squared, even useless ones. Use Adjusted R-squared to compare models with different numbers of variables — it penalises pointless predictors.

8.Step 8 — Multiple Regression with LINEST

Real outcomes have several drivers. Multiple regression predicts Y from many X columns at once. The worksheet way is the LINEST array function, with all predictor columns side by side as the X range:

=LINEST(B2:B50,C2:E50,TRUE,TRUE)

The third argument TRUE calculates the intercept normally; the fourth TRUE returns the full set of regression statistics. In Microsoft 365 it spills a block of coefficients and stats automatically; in older Excel, select a wide-enough output range and press Ctrl+Shift+Enter. The coefficients come back in reverse column order (rightmost X first), which trips everyone up the first time — the ToolPak’s labelled Regression output is friendlier if you find LINEST’s layout confusing.

With multiple predictors, watch for multicollinearity — when two X variables are themselves highly correlated (e.g. “ad spend” and “impressions”), the model struggles to separate their effects and the individual coefficients become unstable. If two predictors tell the same story, keep one.

Try It: Live Regression Demo

Below is a scatter of sample data with its least-squares line drawn through it — the same fit Excel computes with SLOPE and INTERCEPT. Drag the noise slider to scatter the points more or less, and watch R-squared fall as the cloud spreads and the line explains less of the variation. Hit New sample to regenerate the data. The equation and R² update live, exactly like a chart trendline.

Least-Squares Regression Visualiser

More noise → lower R². The pink line is the same line of best fit Excel draws.

35
Slope
Intercept

A Worked Example: Ad Spend vs Sales

Suppose you have twelve months of monthly advertising spend (X) and sales (Y). You run =SLOPE(Sales,Spend) and get 4.2, =INTERCEPT(Sales,Spend) gives 18,000, and =RSQ(Sales,Spend) returns 0.81. The model reads: Sales = 4.2 × Spend + 18,000, and it explains 81% of the month-to-month variation in sales.

Interpret each number in plain business terms. The slope of 4.2 says every extra £1 of ad spend is associated with £4.20 of additional sales — a healthy return, though remember it is association, not proof of causation. The intercept of 18,000 is the modelled baseline sales with zero advertising, which is plausible if you have organic demand. The R² of 0.81 is strong for marketing data, where 0.5–0.7 is common, suggesting advertising is a major but not sole driver. Running the same data through the ToolPak would add a p-value — if it came back at, say, 0.003, you would conclude the relationship is highly unlikely to be chance and confidently use the equation to budget next month’s spend. To forecast sales for a planned £25,000 spend: =FORECAST.LINEAR(25000,Sales,Spend) returns roughly £123,000. Crucially, £25,000 should sit within your historical spend range — predicting sales for a £200,000 spend you have never tried would be extrapolation, and the linear relationship may not hold at that scale.

Common Errors & Fixes

The slope looks wrong / inverted

You swapped the arguments. SLOPE, INTERCEPT, and RSQ take Y first, then X. Re-check the order — this is the #1 regression error in Excel.

“Data Analysis” button is missing

The Analysis ToolPak is not enabled. File → Options → Add-ins → Manage: Excel Add-ins → Go → tick Analysis ToolPak. On Mac it is under Tools → Excel Add-ins.

LINEST returns one number, not a block

In older Excel you must select the full output range and press Ctrl+Shift+Enter to enter it as an array. In Microsoft 365 it spills automatically — make sure the cells to the right and below are empty.

High R-squared but useless predictions

You may be overfitting (too many predictors for the sample), extrapolating beyond your data, or fitting a line to a curved relationship. Plot the residuals; if they show a pattern, a straight line is the wrong model.

Regression & Forecasting, Automatically

DataHub Pro runs regression, trend, and forecasting on your spreadsheet in seconds — coefficients, R-squared, and predictions in a shareable dashboard, no ToolPak or array formulas required. Upload your data and see the fit instantly.

Try DataHub Pro free →

Frequently Asked Questions

How do I run a regression analysis in Excel?
The quickest way is a scatter chart with a trendline: select your X and Y columns, Insert → Scatter, then right-click the points → Add Trendline → Linear, and tick 'Display Equation' and 'Display R-squared'. For coefficients as live cell values use SLOPE, INTERCEPT, and RSQ. For a full statistical report with p-values and confidence intervals, enable the Data Analysis ToolPak and use Data → Data Analysis → Regression. For multiple predictors, use the LINEST array function.
What is the formula for linear regression in Excel?
A simple linear regression fits the line Y = slope × X + intercept. In Excel, =SLOPE(Y_range, X_range) gives the slope, =INTERCEPT(Y_range, X_range) gives the intercept, and you predict with =SLOPE*X + INTERCEPT or more directly with =FORECAST.LINEAR(new_x, Y_range, X_range). Note that SLOPE and INTERCEPT take the Y (dependent) range first and the X (independent) range second, the reverse of how most people read a chart.
What does R-squared mean in regression?
R-squared (the coefficient of determination) is the proportion of variation in the dependent variable Y that is explained by the independent variable(s) X. It ranges from 0 to 1: an R-squared of 0.85 means 85% of the variation in Y is explained by the model, leaving 15% to other factors and noise. Use =RSQ(Y_range, X_range). Higher is generally better, but a high R-squared does not prove causation, and what counts as 'good' depends heavily on the field.
How do I enable the Data Analysis ToolPak in Excel?
Go to File → Options → Add-ins. At the bottom, set Manage to 'Excel Add-ins' and click Go. Tick 'Analysis ToolPak' and click OK. A 'Data Analysis' button then appears on the far right of the Data tab. From there, Data Analysis → Regression gives you the full regression output including coefficients, standard errors, t-stats, p-values, R-squared, and confidence intervals. The ToolPak ships with Excel but is off by default.
How do I do multiple regression in Excel?
Multiple regression predicts Y from several X variables. Use the LINEST array function: =LINEST(Y_range, X_range, TRUE, TRUE), where X_range covers all predictor columns side by side. In Microsoft 365 it spills a block of coefficients and statistics automatically; in older Excel, select the output range and press Ctrl+Shift+Enter. Alternatively, the Data Analysis ToolPak's Regression tool accepts a multi-column X range and produces a fuller, labelled report.
What is a good p-value in regression?
A p-value below 0.05 is the conventional threshold for statistical significance — it suggests there is less than a 5% chance the relationship you see is due to random noise, so the variable likely belongs in the model. A p-value above 0.05 suggests the predictor may not be meaningfully related to Y and could be dropped. Treat 0.05 as a guideline, not a law: consider the size of the effect, the sample size, and domain knowledge alongside the p-value.
What is the difference between SLOPE/INTERCEPT and the Regression ToolPak?
SLOPE, INTERCEPT, RSQ, and FORECAST.LINEAR are live worksheet functions: they recalculate as your data changes and are perfect for a dashboard or an ongoing model. The Data Analysis ToolPak's Regression tool produces a one-time static report that is far richer — standard errors, t-stats, p-values, confidence intervals, and residuals — but it does not update when the data changes. Use the functions for live models and the ToolPak when you need full statistical detail.
Can Excel do non-linear regression?
Yes, to a degree. The trendline options on a scatter chart include polynomial, exponential, logarithmic, and power fits, each with its own equation and R-squared. For a polynomial or exponential model in cells, you can transform the data (for example, take the log of Y for an exponential relationship) and then run a linear regression on the transformed values. For genuinely complex non-linear models, Excel's Solver add-in can fit custom equations by minimising squared error.

Related Tutorials