/************************************************************************
RSQDELTA
DISCLAIMER:
THIS INFORMATION IS PROVIDED BY SAS INSTITUTE INC. AS A SERVICE TO
ITS USERS. IT IS PROVIDED "AS IS". THERE ARE NO WARRANTIES,
EXPRESSED OR IMPLIED, AS TO MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE REGARDING THE ACCURACY OF THE MATERIALS OR CODE
CONTAINED HEREIN.
TITLE: Compute R-square change and F-statistics in regression
PURPOSE:
In a regression analysis, compute the change in R-square and the
associated F statistics and p-values as variables are added to the
model.
REQUIRES:
Base SAS and SAS/STAT Software.
USAGE:
Assign the name of your input data set to the macro parameter
DATA=. Assign the name of your dependent variable to the macro
parameter YVAR=. Assign the names of your independent variables
(in the order in which you want them included in the model) to the
macro parameter XVAR=.
PRINTED OUTPUT:
The example (following the macro definition below) creates the
following output:
Change in R-square & F statistics as variables are added
to the regression model
OBS    _MODEL_      _RSQ_     RSQDELTA        F        PROB
  1    INTERCEP    0.00000     .          2451.73    0.00000
  2    RUNTIME     0.74338    0.74338       84.01    0.00000
  3    AGE         0.76425    0.02087        2.48    0.12666
  4    RUNPULSE    0.81109    0.04685        6.70    0.01537
  5    MAXPULSE    0.83682    0.02572        4.10    0.05330
  6    WEIGHT      0.84800    0.01118        1.84    0.18714
DETAILS:
PROC REG displays the squared semipartial correlations (the change
in R-square as each variable is added to the model) if you specify
the SCORR1 option on the MODEL statement. However, PROC REG does not
display the partial F statistics and p-values, and none of this
information is written to an output data set.
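For example, a call along the following lines (a sketch that borrows
the data set and variable names from the example at the end of this
file) displays the SCORR1 statistics in the PROC REG listing:
   proc reg data=fitness;
      model oxy = runtime age runpulse maxpulse weight / scorr1;
   run;
   quit;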
PROC REG includes the R-square statistic in the OUTEST= data set when
the ADJRSQ option is specified on the MODEL statement. In the
OUTSTAT= data set, PROC GLM includes the sequential sums of squares
(SS1), which provide the F statistic and associated p-value.
However, PROC GLM does not include the R-square statistic in an
output data set.
By combining the information from PROC REG's OUTEST= data set and
PROC GLM's OUTSTAT= data set with some DATA step programming, this
program computes the change in R-square as well as the F statistics
and p-values as variables are added to a model. The F statistics
and p-values in the final table are partial F statistics from the
general linear model testing approach, defined as follows:
             SSE(R) - SSE(F)
            -----------------
              df(R) - df(F)
   F_C  =  -------------------
                 SSE(F)
                --------
                 df(F)
where SSE is the Error Sum of Squares and df is the error degrees
of freedom for the full (F) or reduced (R) model.
The rejection region is defined as
   F_C > F(alpha, df(R)-df(F), df(F))
Note that the F_C statistic is different from the overall F test,
which tests whether or not there is a regression relationship between
the dependent variable and the set of independent variables. Most
regression textbooks provide a discussion of tests about the
regression coefficients.
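As an illustrative check (not part of the original program), the
partial F statistic defined above can be recovered from the printed
R-square values, because SSE = SST*(1 - R-square) for every model fit
to the same data:
   F = [ (RSQ(F) - RSQ(R)) / (df(R) - df(F)) ] / [ (1 - RSQ(F)) / df(F) ]
Assuming the example data set has n = 31 observations (an assumption;
n is not stated in this file), adding AGE to the model that already
contains RUNTIME gives df(R) = 29 and df(F) = 28, so
   F = (0.76425 - 0.74338) / ((1 - 0.76425) / 28)
     = 0.02087 / 0.0084196
     = 2.48 (approximately)
which agrees with the AGE row of the PRINTED OUTPUT table.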
REFERENCES:
Myers (1986), Classical and Modern Regression with Applications.
Neter, Wasserman, Kutner (1989), Applied Linear Regression Models.
Rawlings (1988), Applied Regression Analysis.
************************************************************************/
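%macro rsqdelta(data=,yvar=,xvar=);
%local count b z c i; /* keep working variables out of the caller's scope */
/* Step 1: PROC REG fits the intercept-only model, then one labeled model
   per added regressor; OUTEST= collects _RSQ_ for each model. */
%let count=1;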
%let b=%str(&xvar);
%let z=%str( );
%let c=%str( );
proc reg data= &data outest=est;
INTERCEP:model &yvar=&z/adjrsq;
%do %until( "&c" = "" );
%let c= %scan(&b,&count,%str( ));
%let z=%str(&z &c);
%let count=%eval(&count+1);
%if "&c" ^= "" %then %do;
&c:model &yvar=&z/adjrsq;
%end;
%end;
run;
quit;
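/* Step 2: refit each nested model with PROC GLM; OUTSTAT= holds the
   sequential (SS1) F statistic and p-value for each model. */
%let count=1;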
%let b=%str(&xvar);
%let z=%str( );
%let c=%str( );
%do %until( "&c" = "" );
%let z=%str(&z &c);
%let c= %scan(&b,&count,%str( ));
proc glm data=&data outstat=out&count noprint;
model &yvar=&z/ss1;
run;
%let count=%eval(&count+1);
%end;
/* Keep only the last row of each OUT data set, taken here as the
   sequential (SS1) test for the most recently added term. */
%do i=1 %to %eval(&count-1);
data out&i;
set out&i end=last;
if last;
keep f prob;
run;
%end;
data g;
retain prevrsq .;
set est;
if _n_=1 then rsqdelta=.;
else rsqdelta =_rsq_ - prevrsq;
prevrsq=_rsq_;
keep _model_ _rsq_ rsqdelta;
run;
/* Stack the one-row OUT data sets in the order the models were fit */
data table;
set
%do i=1 %to %eval(&count-1);
out&i
%end;
;
run;
data final;
merge g table;
run;
proc print data=final;
title1 "Change in R-square & F statistics as variables are added";
title2 "to the regression model";
run;
%mend;
/* An example */
options ls=72;
*include data(fitness);
/* The macro parameter xvar= contains the regressor
variables in the order that you desire */
/*
%rsqdelta(data=fitness, yvar=oxy, xvar=runtime age runpulse maxpulse weight);
*/
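/* A minimal, self-contained sketch of a call for readers who do not
   have the FITNESS data. The data set TOY and its variables Y, X1,
   and X2 are made up purely for illustration and are not part of the
   original example. As with the call above, remove the surrounding
   comment delimiters to run it after the macro has been compiled. */
/*
data toy;
input y x1 x2;
datalines;
10.1 1 5
12.3 2 3
13.9 3 6
16.2 4 2
18.1 5 7
19.8 6 4
22.4 7 8
23.7 8 1
;
run;
* X1 enters the model first, then X2;
%rsqdelta(data=toy, yvar=y, xvar=x1 x2);
*/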