Lines, y=mx+b, and the regression formula
The equation y = mx + b is the equation for a line. The slope (m) of the line indicates how many units y changes (and in what direction) for a one-unit change in x. A positive slope means that if x increases by one, y increases. A negative slope means that if x increases by one, y decreases.
The intercept (b) indicates the value of y when the value of x is zero.
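As a quick illustration, here is a minimal Python sketch of the line equation; the slope and intercept values are made up for the example:

```python
# A minimal sketch of y = mx + b: predicting y from x with a known slope and intercept.
def line(x, m, b):
    """Return the y value on the line with slope m and intercept b."""
    return m * x + b

# With slope 2 and intercept 5, a one-unit increase in x raises y by 2 units.
print(line(0, m=2, b=5))  # 5 (the intercept: the value of y when x is 0)
print(line(1, m=2, b=5))  # 7
print(line(2, m=2, b=5))  # 9
```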
A regression line is a line that shows the trend of the relationship between an x (cause; independent) variable and a y (outcome; dependent) variable. It has a slope and an intercept, just like any other line. The line is fit to the points in a dataset so as to minimize error in the prediction of y. This means that the y-axis has priority in creating a regression (prediction) line: the purpose of the line is to predict y with as much accuracy as possible, given the data. Because the regression line is based on real data, there is an error term in the equation of a regression line to account for the difference between the line of "best fit" (the prediction line or regression line) and the actual data points on which that line was based. Thus, a bivariate (two-variable) regression equation might look like this (note that by convention the intercept and the slope term switch places; this has no impact on the math):
y = intercept + mx + error
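To make this concrete, here is a short Python sketch (using NumPy, with made-up data) that fits a bivariate regression line by least squares and reports the slope, the intercept, and the error term:

```python
import numpy as np

# Hypothetical example data: x is the independent variable, y the dependent variable.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.polyfit with degree 1 fits the least-squares line, minimizing squared error in y.
m, intercept = np.polyfit(x, y, 1)

predicted = intercept + m * x   # the regression line: y = intercept + mx
error = y - predicted           # the error term: actual y minus predicted y

print(f"slope = {m:.3f}, intercept = {intercept:.3f}")
print("errors (residuals):", np.round(error, 3))
```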
To create a regression line, you need exactly two variables (x and y). However, we often want to create models that involve more than two variables. In these cases, the regression equation gets more sophisticated. Adjusting for all of the variables in the model, while still minimizing error only in the y direction, the fit gives each variable its own partial slope. The values of these slopes are known as partial slope coefficients, estimates, beta values, or a variety of other names. Thus, a regression equation might look like this:
y = intercept + m1x1 + m2x2 + m3x3 + m4x4 + error
In this equation, each independent (cause) variable has its own slope describing the relationship between that variable and the dependent (outcome) variable. The entire equation has one intercept (the value of y when all of the independent variables equal zero) and one error term (the error in the y direction between the predicted value of y and the actual, real-life value of y).
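Here is a minimal Python sketch (using NumPy, with simulated data) of the same idea with four independent variables; each column of X gets its own partial slope coefficient, and all of them are estimated jointly:

```python
import numpy as np

# Simulated data: 50 observations of four independent variables x1..x4.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))                   # columns are x1, x2, x3, x4
true_betas = np.array([1.5, -2.0, 0.5, 3.0])   # partial slopes used to simulate y
y = 4.0 + X @ true_betas + rng.normal(scale=0.1, size=50)  # intercept 4 plus noise

# Prepend a column of ones so the first fitted coefficient is the intercept.
A = np.column_stack([np.ones(len(y)), X])

# Least squares finds the intercept and partial slope coefficients that
# minimize squared error in the y direction, adjusting for all variables at once.
coefs, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, partial_slopes = coefs[0], coefs[1:]

print(f"intercept = {intercept:.2f}")                  # close to 4.0
print("partial slopes:", np.round(partial_slopes, 2))  # close to the true betas
```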