Least Squares Regression Solved with Matrix Algebra
This is an explanation of Least Squares Regression solved using matrix algebra.
Some sample data
Let’s start with a simple set of data for promotions and revenue
y = revenue
x = the number of promotions
The data shows that when the number of promotions is high, revenue is high, and when the number of promotions is low, revenue is low.
x <- c(6,7,9,10,11,12,14,18,26) # number of promotions
y <- c(170,221,264,308,444,540,388,677,711) # revenue
Least squares regression equation
For points $(x_1,y_1), (x_2,y_2), \ldots, (x_n,y_n)$, the least squares regression line is:
$f(x)=\beta_0 + \beta_1x$
The errors (also referred to as residuals) are the differences between the actual values of y and the regression estimates of y. They are
$e_i=y_i-f(x_i)$
The $\beta$ coefficients are those values which minimize the sum of squared errors using the regression function f(x)
$(y_1-f(x_1))^2 + (y_2-f(x_2))^2 + \ldots + (y_n-f(x_n))^2$
Goal: minimize the sum of squared errors (SSE), also called the sum of squared residuals, where
$SSE=\sum_{i=1}^{n}{e_i}^2$
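For example, here is a minimal R sketch of the quantity being minimized, using the sample data above and two made-up trial coefficients (b0 and b1 are illustration values only, not the least squares solution):
b0 <- 50; b1 <- 30                       # arbitrary trial coefficients (illustration only)
SSE_trial <- sum((y - (b0 + b1 * x))^2)  # sum of squared errors for this candidate line
SSE_trial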
We can set this up as a system of equations and then solve using matrices
$y_1 = (\beta_0 + \beta_1 x_1) + e_1$
$y_2 = (\beta_0 + \beta_1 x_2) + e_2$
…
$y_n = (\beta_0 + \beta_1 x_n) + e_n$
Single variable regression represented in matrix form
We can represent our x and y data in matrix form
\(Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \\ \end{bmatrix}\) This is an N x 1 matrix
\(X = \begin{bmatrix} 1 & x_1\\ 1 & x_2\\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}\) This is an N x 2 matrix. The column of 1s is for the intercept.
Note: if this were a multivariate problem with k predictor variables (x’s), X would be an N x (1 + k) matrix
\(B = \begin{bmatrix}
b_0\\
b_1
\end{bmatrix}\) This is a 2 x 1 matrix
Note: if this were a multivariate problem with k predictor variables (x’s), B would be a (1 + k) x 1 matrix
\(\epsilon = \begin{bmatrix} e_1\\ e_2\\ \vdots\\ e_n \end{bmatrix}\) This is an N x 1 matrix
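As a quick aside, here is a minimal R sketch of how these matrices can be built from the x and y vectors defined earlier (the document constructs them with matrix() later; cbind() is an equivalent alternative):
Y <- as.matrix(y)  # N x 1 matrix of responses
X <- cbind(1, x)   # N x 2 matrix: a column of 1s for the intercept, then x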
Matrix form of regression equation
$Y = XB + \epsilon$
Regression assumes the errors have an expected value of zero (some are positive, some are negative, and on average they cancel out), so for the purpose of solving for the coefficients we drop $\epsilon$
$Y = XB$
Solve for B
$Y = XB$
First we need a square, invertible matrix multiplying B. We can get one by pre-multiplying both sides by the transpose $X^T$, which makes $X^T X$ a 2 x 2 matrix
$X^T Y = (X^T X) B$
Next, to isolate B we “divide” both sides by $(X^T X)$ - this is the same as multiplying both sides by $(X^T X)^{-1}$
$(X^T X)^{-1} (X^T Y) = (X^T X)^{-1} (X^T X) B$
Next, noting that a matrix multiplied by its inverse is the identity matrix, we simplify the equation to
$(X^T X)^{-1} (X^T Y) = IB$
Finally, any matrix multiplied by the identity matrix is itself, so $IB = B$. Therefore this simplifies to
$B = (X^T X)^{-1} (X^T Y)$
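In R, this closed-form solution can be computed in one line (a minimal sketch, assuming Y and X have been built from the data as described above); the same calculation is carried out step by step later in this document:
B <- solve(t(X) %*% X) %*% t(X) %*% Y  # B = (X^T X)^{-1} (X^T Y)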
An alternative explanation
$SSE = \sum_{i=1}^{n}{e_i}^2$ is the formula we are trying to minimize. We want the sum of these squared errors to be as small as possible.
To convert this formula to matrix notation we can take the vector of errors and pre-multiply it by its transpose.
$SSE = E^T E$
E is a column vector of the residuals, and $E^T$ is the transpose of $E$.
The expression $E^T E$ is the matrix product of $E^T$ and $E$, which results in a single value (scalar) that represents the sum of the squared errors.
We can confirm that this gives us the sum of squared errors. Let’s assume the errors are (4, 6, 3) from a given regression equation with $\beta$. The $SSE$ expression is then the row vector of errors (the transpose) multiplied by the column vector of errors
\(E^T = \begin{bmatrix} 4 & 6 & 3 \end{bmatrix}\) Row vector
multiplied by a column vector
\(E = \begin{bmatrix} 4 \\ 6 \\ 3 \\ \end{bmatrix}\) Column vector
Resulting in $(4 \times 4) + (6 \times 6) + (3 \times 3) = 4^2 + 6^2 + 3^2$. This is what we want: the sum of the squared errors (SSE)
\(SSE = 16 + 36 + 9 = 61\).
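We can check this small example in R (a quick sketch using the made-up errors above):
E <- matrix(c(4, 6, 3))  # 3 x 1 column vector of example errors
t(E) %*% E               # 1 x 1 matrix: 16 + 36 + 9 = 61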
Remember that the error vector $E$ (the $\epsilon$ from above) is equivalent to $Y - X\beta$. Thus
$E^T E = (Y - X\beta)^T (Y - X\beta)$
Next, multiply out.
$(Y^T - (X\beta)^T)(Y - X\beta)$
$Y^T Y - (X\beta)^T Y - Y^T (X\beta) + (X\beta)^T X\beta$
The two middle terms are both scalars and are transposes of each other, so they are equal and can be combined, giving
$Y^T Y - 2\beta^T X^T Y + \beta^T X^T X\beta$
We take the derivative with respect to $\beta$ and set it to zero.
$-2X^T Y + 2X^T X\beta = 0$
$-X^T Y + X^T X\beta = 0$
$X^T X\beta = X^T Y$
Solve for $\beta$
Isolate $\beta$ by “dividing” both sides by $(X^T X)$. Note, this is the same as multiplying both sides by $(X^T X)^{-1}$
$(X^T X)^{-1}(X^T X)\beta = (X^T X)^{-1}(X^T Y)$
$\beta = (X^T X)^{-1}(X^T Y)$
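As a practical aside, explicitly forming and inverting $X^T X$ is not the most numerically stable approach; a common alternative in R (a sketch, assuming X and Y as defined in this document) solves the normal equations directly:
beta <- solve(crossprod(X), crossprod(X, Y))  # solves (X^T X) beta = X^T Y without explicitly forming the inverse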
Find $\beta$ coefficients using matrix algebra from our sample data
$\beta = (X^T X)^{-1} (X^T Y)$ is our regression equation
Recall our sample data for promotions and revenue
y = revenue
x = the number of promotions
x | y
--- | ---
6 | 170
7 | 221
9 | 264
10 | 308
11 | 444
12 | 540
14 | 388
18 | 677
26 | 711
Let’s convert the vectors for x and y into matrix Y and matrix X
Y <- as.matrix(y);Y
X <- matrix(c(rep(1, length(x)), c(x)), nrow = 9, ncol = 2);X
Y:
170
221
264
308
444
540
388
677
711
X:
1   6
1   7
1   9
1  10
1  11
1  12
1  14
1  18
1  26
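As a quick optional check, the dimensions of the matrices we just built should match the N x 1 and N x 2 shapes described earlier:
dim(Y)  # 9 1
dim(X)  # 9 2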
Step 1: Calculate $(X^T X)^{-1}$
From $\beta = (X^T X)^{-1} (X^T Y)$ calculate just the $(X^T X)^{-1}$ part
options(digits=2)
cat("The X matrix")
X
cat("The X transpose")
XTrans <- t(X); XTrans # transpose of X
cat("Product of X transpose and X")
XTransX <- XTrans %*% X; XTransX # product of X transpose and X
cat("Inverse of the product of X transpose and X")
XTransXInv <- solve(XTransX); XTransXInv # inverse of X transpose times X
The X matrix
1   6
1   7
1   9
1  10
1  11
1  12
1  14
1  18
1  26
The X transpose
1  1  1   1   1   1   1   1   1
6  7  9  10  11  12  14  18  26
Product of X transpose and X
  9   113
113  1727
Inverse of the product of X transpose and X
 0.623  -0.0407
-0.041   0.0032
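As an optional check that the inverse is correct, multiplying it by the original product should return (approximately) the 2 x 2 identity matrix:
round(XTransXInv %*% XTransX, 6)  # should be the 2 x 2 identity matrix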
Step 2: Calculate $(X^T Y)$
From $\beta = (X^T X)^{-1} (X^T Y)$ calculate just the $(X^T Y)$ part
XTransY <- XTrans %*% Y; XTransY
3723
55491
Step 3: Summarize: find the $\beta$ coefficients from $\beta = (X^T X)^{-1} (X^T Y)$
Multiply the result of Step 1 by the result of Step 2.
options(digits=3)
B <- XTransXInv %*% XTransY;
The $\beta$ coefficients are
B
57.4
28.4
Check that $\beta$ coefficients from matrix form are the same as from R linear regression model
cat("The coefficients using R's linear regression model are")
lm <- lm(y~x)
print(lm$coefficients,digits=3)
cat("The coefficients we calculated previously with matrix algebra are the same")
B
The coefficients using R's linear regression model are
(Intercept)           x
       57.4        28.4
The coefficients we calculated previously with matrix algebra are the same
57.4
28.4
Calculate estimates for Y from regression model
fx <- B[1, 1] + (x * B[2, 1])
cat(round(fx))
228 256 313 341 370 398 455 568 795
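These fitted values should agree (up to rounding) with the ones produced by the lm fit above; a quick check:
fitted(lm)  # fitted values from R's lm model, should match fx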
Calculate errors and SSE
Ei <- y - fx
cat("The residuals are ")
cat(Ei)
The residuals are -57.6 -35 -48.8 -33.1 74.5 142 -66.7 109 -84.2
SSE <- t(Ei) %*% Ei
cat("The SSE (sum of squared errors) is ")
cat(SSE)
The SSE (sum of squared errors) is 57139
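As a final check, the same SSE can be obtained from the lm fit above and should agree with the matrix calculation up to rounding:
sum(resid(lm)^2)  # sum of squared residuals from the lm fit
deviance(lm)      # the same quantity, reported by deviance() for a linear model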