% Example of simple and multiple linear regression % AOS575: 9/25/2006 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Start by generating some pseudodata that contains a trend and some % random variations. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% ntim = 30; % Trend: trend = 0.2*[1:ntim]'; % Random vector (gaussian) - Let's assume this is a signal x1 = randn(ntim,1); % Another random vector that's correlated with x1. Again, we're % assuming this is a signal. x2 = 0.8*x1 + 0.6*randn(ntim,1); % OK, generate a time series that's just the trend + x1 + x2 + some % truly random variation (that we're assuming is not a signal). y = trend + x1 + x2 + 0.5*randn(ntim, 1); % Plot the data: Start with y, x1, and x2 figure(1); orient tall wysiwyg subplot(2,1,1); plot(1:ntim, y, 'k', 1:ntim, x1, 'b', 1:ntim, x2, 'r'); % How do x1 and x2 relate to each other? Plot a scatterplot between x1 % and x2. This should be an elongated elipse oriented at 45deg. subplot(2,2,3); plot(x1, x2, '.k'); axis square axis([-3 3 -3 3]); % Do the same for y vs. x1. This should also be an elongated ellipse. subplot(2,2,4); plot(x1, y, '.k'); axis square axis([-3 3 -3 10]); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Calculate regression coefficients between different time series %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % To start with, we're going to introduce three methods for calculating % the regression coefficients. We'll use the linear trend as an % example. Remember that for a linear trend, the independent variable % is just an indexed variable (like time). % Linear trend x = [1:ntim]'; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Method 1: Calculate a0 and a1 by hand. % Remove mean to avoid a really messy next line: xtem = x-mean(x); % Now, calculate the slope coefficient. Note that the covariance and % variance can be calculated via a dot product. Also note that due to % our small sample size, the slope parameter may not equal, but should % be close to, 0.2 (see above equation). a1 = (y - mean(y))'*xtem / (xtem'*xtem) % And, the intercept parameter: a0 = mean(y) - a1*mean(x) % Now, get the predicted trend line: yhat = a0 + a1*x; % Plot this with the actual data: subplot(2,1,1); plot(x, y, 'k', x, yhat, 'b'); % To get the correlation between the trend and the actual data, do the % following: r = (x-mean(x))'*(y-mean(y)) / (std(x)*std(y)*(ntim-1)) % Or, download 'corr.m' from the course website r = corr(x, y) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Method 2: Use linear algebra % For this method, we'll define 'x' as a matrix containing our predictor % variables as columns. To get the intercept, we also need to add a % column of ones in the first column: x = [ones(ntim, 1) [1:ntim]']; a = inv(x'*x) * (x'*y) yhat = x*a; % Check to make sure that a0 and a1 are identical to those you % calculated in Method 1, above. a0 = a(1) a1 = a(2) %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Method 3: redo this using MATLAB shorthand for matrix algebra: % if Ax = b, then x = A \ b . Type "help slash" if you'd like. % Again, note that the results are identical. x = [ones(ntim, 1) [1:ntim]']; a = (x'*x) \ (x'*y) yhat = x*a; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Multiple linear regression: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Note that methods 2 and 3 above (nearly identical) can be easily % generalized for multiple regression: x = [ones(ntim, 1) [1:ntim]' x1]; a = (x'*x) \ (x'*y); yhat = x*a; % Calculate the correlation between the predictand and the actual data, % 'y'. corr(yhat, y) % What happens if we add x2 to the predictors? First, let's look at the % raw correlations between the three variables in question: corr(y, x1) corr(y, x2) corr(x1, x2) x = [ones(ntim, 1) [1:ntim]' x1 x2]; a = (x'*x) \ (x'*y); yhat = x*a; % Calculate the correlation corr(yhat, y) % Note that x2 doesn't do nearly as much, as it's pretty redundant with % x1.