More with Least Squares
One very cool thing about our formula for the coefficients of the least squares regression line, \(\mathtt{\left(X^{T}X\right)^{-1}X^{T}y}\), is that it is the same whether we have one independent variable (univariate) or many independent variables (multivariate).

Consider these data, showing the selling prices of some grandfather clocks at auction. The first scatter plot shows the age of the clock in years on the x-axis (100 to 200), and the second shows the number of bidders on the x-axis (0 to 20). Price (in pounds or dollars) is on the y-axis of each plot (500 to 2500).

Age (years)   Bidders   Price ($)
127           13        1235
115           12        1080
127           7         845
150           9         1522
156           6         1047
182           11        1979
156           12        1822
132           10        1253
137           9         1297
113           9         946
137           15        1713
117           11        1024
137           8         1147
153           6         1092
117           13        1152
126           10        1336
170           14        2131
182           8         1550
162           11        1884
184           10        2041
143           6         854
159           9         1483
108           14        1055
175           8         1545
108           6         729
179           9         1792
111           15        1175
187           8         1593
111           7         785
115           7         744
194           5         1356
168           7         1262

You can see in the notebook below that the first regression line, for the price of a clock as a function of its age, is approximately \(\mathtt{10.5x-192}\). The second regression line, for the price of a clock as a function of the number of bidders at auction, is approximately \(\mathtt{55x+806}\). As mentioned above, each of these univariate least squares regression lines can be calculated with the formula \(\mathtt{\left(X^{T}X\right)^{-1}X^{T}y}\).

Combining age and number of bidders, we can calculate a multivariate least squares regression equation using the same formula. The only change is that \(\mathtt{X}\) gains a column: it now has a column of ones for the intercept, a column of ages, and a column of bidder counts. The result is of course no longer a line. With two input variables, as we have here, our line becomes a plane. Our final regression equation becomes \(\mathtt{12.74x_{1}+85.82x_{2}-1336.72}\), with \(\mathtt{x_1}\) representing the age of a clock and \(\mathtt{x_2}\) representing the number of bidders.
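To make the "same formula" claim concrete, here is a minimal sketch in plain Python that fits all three models with one function. It is not the notebook from the post: the names `CLOCKS`, `matmul`, `solve`, and `least_squares` are made up for illustration, and the sketch assumes that \(\mathtt{X}\) carries a leading column of ones for the intercept. Rather than inverting \(\mathtt{X^{T}X}\) explicitly, it solves the equivalent system \(\mathtt{\left(X^{T}X\right)b = X^{T}y}\) by elimination, which gives the same coefficients.

```python
# Clock data from the table above: (age in years, bidders, price).
CLOCKS = [
    (127, 13, 1235), (115, 12, 1080), (127, 7, 845), (150, 9, 1522),
    (156, 6, 1047), (182, 11, 1979), (156, 12, 1822), (132, 10, 1253),
    (137, 9, 1297), (113, 9, 946), (137, 15, 1713), (117, 11, 1024),
    (137, 8, 1147), (153, 6, 1092), (117, 13, 1152), (126, 10, 1336),
    (170, 14, 2131), (182, 8, 1550), (162, 11, 1884), (184, 10, 2041),
    (143, 6, 854), (159, 9, 1483), (108, 14, 1055), (175, 8, 1545),
    (108, 6, 729), (179, 9, 1792), (111, 15, 1175), (187, 8, 1593),
    (111, 7, 785), (115, 7, 744), (194, 5, 1356), (168, 7, 1262),
]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(columns, y):
    """Return [intercept, slope_1, ...] from (X^T X) b = X^T y.

    `columns` is a list of input columns; a column of ones is prepended
    (an assumption of this sketch) so the first coefficient is the intercept.
    """
    X = [[1.0] + [float(v) for v in row] for row in zip(*columns)]
    Xt = [list(col) for col in zip(*X)]
    XtX = matmul(Xt, X)
    Xty = [sum(x * v for x, v in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


ages = [row[0] for row in CLOCKS]
bidders = [row[1] for row in CLOCKS]
prices = [row[2] for row in CLOCKS]

# The same function handles one input column or several.
by_age = least_squares([ages], prices)
by_bidders = least_squares([bidders], prices)
by_both = least_squares([ages, bidders], prices)

print("price ~ age:          ", [round(c, 2) for c in by_age])
print("price ~ bidders:      ", [round(c, 2) for c in by_bidders])
print("price ~ age + bidders:", [round(c, 2) for c in by_both])
```

Each printed list starts with the intercept, so the order is reversed relative to the equations quoted above, where the constant term comes last. In practice a numerical library such as NumPy would usually do the matrix work; the hand-rolled helpers here only keep the sketch dependency-free.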
Josh Fisher