%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Regression example 1 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% clear % clear the work space % Start by loading data cd ~/matlab/Data load Regression_example.mat % Now, we can see what's in this data set by typing: whos % Note that the variable name and size are shown. With the exception of % "fulldat", these variables are column vectors with 856 points. % "fulldat" is just a matrix of all 9 column vectors. % Let's start by plotting the raw data % Open a figure window in "tall" orientation figure_tall(1); clf; % Define an axes on the figure window subplot(4,1,1); % Plot ozone data (use handles) plot(yr, o3, '-k'); % Set axis properties, and turn grid on axis([yr(1) yr(end) 0 200]); grid on % Label axes xlabel('Year'); ylabel('[Ozone] (ppb)'); % Insert title, and change the font size using "handles" (this just % means that we assign a label to the graphics object - in this case, % t2, and change its properties by referring to that label. We could % have done the same for xlabel, ylable, plot, etc.) t2 = title('\bfOzone concentration in Houston during summer, 1984'); set(t2, 'fontsize', 12); % Now, plot scatter plots of the remaining variables, using points % rather than a line % O3 vs. WSPD subplot(4,2,3); plot(wspd, o3, '.k'); axis square axis([1 7 0 200]); grid on; xlabel('WSPD (m s^-^1)'); ylabel('[O3] (ppb)'); % O3 vs. OPCOV subplot(4,2,4); plot(opcov, o3, '.k'); axis square axis([0 100 0 200]); grid on; xlabel('OPCOV (%)'); ylabel('[O3] (ppb)'); % Redefine tics on the x-axis set(gca, 'XTick', 0:25:100); % O3 vs. RAD subplot(4,2,5); plot(rad, o3, '.k'); axis square axis([0 400 0 200]); grid on; xlabel('RAD (W m^-^2)'); ylabel('[O3] (ppb)'); %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % OK! Now that we can see what's going on here, let's compute % regression coefficients to predict ozone using some other variables. % Let's start with individual predictions. Note that there are a number % of ways to do this. We'll go through all three methods, starting with % a linear trend, then just use the shorthand method. % Let's start with a linear trend. Note, a1 is just the covariance % between o3 and yr, divided by the variance of yr yr2 = yr - mean(yr); % This is just temporary - to avoid a really % messy next line a1 = (o3 - mean(o3))'*yr2/(yr2'*yr2); % Slope (deg / yr) a0 = mean(o3) - a1*mean(yr); % Intercept % Check out the coefficients (you can do this by typing the variables in % without a ';', and pressing 'return': a0 a1 % Now, we can compute yhat by: yhat = a0 + a1*yr; % Now, do the same, using linear algebra. Note, the result is % identical. y = o3; x = [ones(size(yr)) yr]; a = inv(x'*x) * (x'*y); a0 = a(1) a1 = a(2) % Finally, redo this using MATLAB shorthand for matrix algebra: % if Ax = b, then x = A \ b . Type "help slash" if you'd like. % Again, note that the results are identical. y = o3; x = [ones(size(yr)) yr]; a = (x'*x) \ (x'*y); a0 = a(1) a1 = a(2) % For the above two examples, we can calculate yhat by: yhat = x*a; % Now, let's plot this on the graph above. Note that we need to "hold" % the plot so we don't erase what's already there. subplot(4,1,1); hold on plot(yr, yhat, '-r'); hold off % Let's do the same for the next three. ntim = length(o3); % WSPD x = [ones(ntim, 1) wspd]; a = (x'*x) \ (x'*y); yhat = x*a; subplot(4,2,3); hold on; plot(wspd, yhat, '-r'); hold off; % OPCOV x = [ones(ntim, 1) opcov]; a = (x'*x) \ (x'*y); yhat = x*a; subplot(4,2,4); hold on; plot(opcov, yhat, '-r'); hold off; % RAD x = [ones(ntim, 1) rad]; a = (x'*x) \ (x'*y); yhat = x*a; subplot(4,2,5); hold on; plot(rad, yhat, '-r'); hold off; %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Great! Now, let's take a look at some correlation coefficients, and % compute yhat using multiple linear regression. % Start with computing correlation coefficients individually. Again, % let's concatenate all the predictors together to save space. Note, % however, that we're calculating univarite correlation coefficients! y = standardize(o3); x = standardize([wspd opcov rad]); r1 = y'*x/(ntim-1) % We could also use the canned "corr" function, for correlations of a % vector with another vector: corr(o3, wspd) % Hmmm... based on the scatter plot for WSPD, a linear regression % doesn't look optimal. Furthermore, we might expect that turbulent % mixing is not directly proportional to WSPD. Let's try WSPD^-1: x = standardize([1./wspd opcov rad]); r1 = y'*x/(ntim-1); % Now, let's predict o3 using wspd AND rad: y = o3; x = [ones(ntim,1) 1./wspd rad]; a = (x'*x) \ (x'*y); yhat = x*a; % Let's look at variance explaned: corr(yhat, y)^2 % Check this against R2: r1y = corr(o3, 1./wspd); r2y = corr(o3, rad); r12 = corr(1./wspd, rad); R2 = (r1y^2 + r2y^2 - 2*r1y*r2y*r12) / (1-r12^2) % What happens if we add opcov? y = o3; x = [ones(ntim,1) 1./wspd rad opcov]; a = (x'*x) \ (x'*y); yhat = x*a; % Let's look at variance explaned: corr(yhat, y)^2 % Why doesn't this add much? % Finally, lets plot the predicted curve vs. o3: subplot(4,2,6); plot(yhat, o3, '.k'); axis square axis([0 200 0 200]); grid on; xlabel('[O3] predicted (ppb)'); ylabel('[O3] (ppb)'); ll = line([0 200], [0 200]); set(ll, 'color', 'r'); % Last, let's plot the time series: subplot(4,1,4); plot(yr, o3, 'k', yr, yhat, 'r'); axis([yr(1) yr(end) 0 200]); grid on xlabel('Year'); ylabel('[Ozone] (ppb)'); t2 = title(... '\bfActual and Predicted Ozone concentration in Houston during summer, 1984'); set(t2, 'fontsize', 12); % Add regression equation t1 = text(1984.37, 160, ... '[O_3]^* = -67 + 220/WSPD + 0.25*RAD + 0.23*OPCOV'); set(t1, 'fontsize', 10, 'color', 'r'); % If you want to print this, I suggest printing it to a file, then % converting it to PostScript, or something. %cd ~/matlab/Figs %print -depsc2 -painters ozone_regressions.eps % To convert to PDF, type: %!ps2pdf ozone_regressions.eps % If you want to print it, open the PDF file, or print the .eps file % directly from a terminal window.