boxglm: Power transformations by Box-Cox method for GLMs

SAS Macro Programs: boxglm

Version: 1.1 (10 Dec 1991)
Michael Friendly
York University



The boxglm macro finds power transformations of the response variable in a general linear model by the Box-Cox method, with graphic display of the maximum likelihood solution (RMSE plot), F-values for model effects (EFFECT plot), and the influence of observations on choice of power (INFL plot). The program produces printer plots by default, and can optionally produce high-resolution versions of any of these plots.
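The following sketch shows the criterion behind the RMSE plot. It assumes that boxglm uses the standard Box-Cox transformation normalized by the geometric mean; the macro's internals are not shown here. In the formulas, $\dot{y}$ is the geometric mean of the (positive) response and $\mathrm{RSS}(\lambda)$ is the residual sum of squares from fitting the model to $z^{(\lambda)}$. The symbol $n$ is the number of observations and $p$ is the number of model parameters.

```latex
\[
z^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda\,\dot{y}^{\,\lambda-1}}, & \lambda \neq 0,\\[1.5ex]
  \dot{y}\,\log y, & \lambda = 0,
\end{cases}
\qquad
\dot{y} = \Bigl(\prod_{i=1}^{n} y_i\Bigr)^{1/n}
\]
\[
L_{\max}(\lambda) = -\frac{n}{2}\,\log\frac{\mathrm{RSS}(\lambda)}{n} + \text{const},
\qquad
\mathrm{RMSE}(\lambda) = \sqrt{\frac{\mathrm{RSS}(\lambda)}{n-p}}
\]
```

$\mathrm{RMSE}(\lambda)$ increases monotonically with $\mathrm{RSS}(\lambda)$, so the power that minimizes RMSE is the maximum likelihood estimate $\hat{\lambda}$. The scaling by $\dot{y}$ keeps the RMSE values on a comparable scale across powers.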


The program transforms the response to each power from the LOPOWER= value to the HIPOWER= value and fits a linear model with PROC GLM for each. It extracts the results to an output dataset from which the plots are drawn.
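This sketch assumes that the NPOWER= values are equally spaced from LOPOWER= to HIPOWER=, with both endpoints included. The macro's exact grid is not documented here. Under that assumption, the powers tried would be:

```latex
\[
\lambda_k = \mathrm{LOPOWER} + (k-1)\,\frac{\mathrm{HIPOWER} - \mathrm{LOPOWER}}{\mathrm{NPOWER} - 1},
\qquad k = 1, \dots, \mathrm{NPOWER}
\]
```

As an illustration, take the hypothetical settings LOPOWER=-2 and HIPOWER=2. The NPOWER=17 used in the poisons example would then give a step of $(2 - (-2))/(17 - 1) = 0.25$.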

The influence plot also implements Atkinson's score test for the power transformation. This test provides an alternative estimate of the power: power = 1 - slope of the fitted line in the partial regression plot for a constructed variable.
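Atkinson's constructed-variable idea can be written out as below. This follows his standard construction; how the macro computes it internally is assumed, not documented. Here $\dot{y}$ is the geometric mean of the response and $X$ is the model matrix.

```latex
\[
w_i = y_i\Bigl(\log\frac{y_i}{\dot{y}} - 1\Bigr)
\]
\[
y = X\beta + \gamma\,w + \varepsilon,
\qquad
\hat{\lambda} \approx 1 - \hat{\gamma}
\]
```

Up to a constant absorbed by the intercept, $w$ is the derivative of the normalized Box-Cox transform with respect to $\lambda$ at $\lambda = 1$. A first-order expansion about $\lambda = 1$ therefore gives $y \approx X\beta' + (1-\lambda)\,w$. The fitted $\hat{\gamma}$ is the slope of the line in the partial regression (added-variable) plot of $y$ against $w$. Its t-statistic gives an approximate test of $H_0\colon \lambda = 1$.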


boxglm is a macro program. Values must be supplied for the RESP= and MODEL= parameters.

The arguments may be listed within parentheses in any order, separated by commas. For example:

   %boxglm(resp=responsevariable, model=predictors, ...)


RESP=      The name of the response variable for analysis.
MODEL=     The independent variables in the model, i.e., the terms on the right side of the = sign in the MODEL statement for PROC GLM. The MODEL= argument should be spelled out completely (e.g., A B A*B) rather than using abbreviated 'bar' notation (A|B).
CLASS=     Specifies which MODEL= variables are classification factors rather than continuous variables.
The name of the data set holding the response and predictor variables. (Default: most recently created)
The name of an ID variable for observations.
The name of an output dataset to contain the transformed response. This dataset contains all original variables, with the transformed response replacing the original variable.
The name of the output data set containing _RMSE_ and t-values for each effect in the model, with one observation for each power value tried.
PPLOT=     Which printer plots should be produced? One or more of RMSE, EFFECT, and INFL, or NONE. (Default: RMSE EFFECT INFL)
GPLOT=     Which high-resolution (PROC GPLOT) plots should be produced? One or more of RMSE, EFFECT, and INFL, or NONE.
LOPOWER=   Lowest power value to try.
HIPOWER=   Highest power value to try.
NPOWER=    Number of power values in the interval LOPOWER to HIPOWER.
CONF=      Confidence coefficient for the confidence interval for the power.
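This sketch assumes that CONF= defines the usual likelihood-ratio interval for the power; the macro's exact computation is not shown here. Under the normalized Box-Cox model, the profile log-likelihood is $-(n/2)\log \mathrm{RSS}(\lambda)$ plus a constant. Since $\mathrm{RMSE}^2$ is proportional to RSS, the interval then contains every power whose RMSE is close enough to the minimum. In the formula, $C$ is the CONF= value and $\chi^2_1(C)$ is the $C$ quantile of the chi-square distribution with 1 degree of freedom. The symbol $n$ is the number of observations.

```latex
\[
\Bigl\{\lambda : \mathrm{RMSE}(\lambda) \le \mathrm{RMSE}(\hat{\lambda})\,
  \exp\!\Bigl(\frac{\chi^2_1(C)}{2n}\Bigr)\Bigr\}
\]
```

The poisons example has $n = 48$ observations and uses CONF=.99, with $\chi^2_1(.99) \approx 6.635$. The bound factor is then $\exp(6.635/96) \approx 1.072$. Any power whose RMSE is within about 7% of the minimum falls inside the interval.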


The example finds power transformations for the SURVIVAL variable in a dataset relating survival time to type of poison and antidote used.
data poisons;
   input antidote $ poison @;
   label survival='Survival time';
   length id $4;
   /* each data line holds 4 replicate survival times */
   do rep = 1 to 4;
      id = trim(antidote)||trim(put(poison,1.))||'-'||put(rep,1.);
      input survival @;
      output;   /* one observation per replicate */
      end;
cards;
A 1  .31  .45  .46  .43
A 2  .36  .29  .40  .23
A 3  .22  .21  .18  .23
B 1  .82 1.10  .88  .72
B 2  .92  .61  .49 1.24
B 3  .30  .37  .38  .29
C 1  .43  .45  .63  .76
C 2  .44  .35  .31  .40
C 3  .23  .25  .24  .22
D 1  .45  .71  .66  .62
D 2  .56 1.02  .71  .38
D 3  .30  .36  .31  .33
;
/* assumes a fileref MACROS pointing to the macro library */
%include macros(boxglm);
/* the input dataset defaults to POISONS, the most recently created */
%boxglm(resp=survival,
        model=antidote poison,
        class=antidote poison,
        gplot=RMSE EFFECT INFL,
        npower=17, conf=.99);
The plot of RMSE vs. lambda (power) indicates that power = -1/2 (i.e., 1/sqrt(SURVIVAL)) is the maximum likelihood estimate. However, power = -1 (i.e., 1/SURVIVAL, interpretable as a rate of dying) also lies within the confidence interval.

The EFFECT plot indicates that the significance of the partial F-tests is unaffected by the choice of power. The influence plot indicates that a few observations have large leverage, but none is influential in determining the choice of power.

See also

boxcox Power transformations by Box-Cox method
boxtid Power transformations by Box-Tidwell method
outlier Robust multivariate outlier detection
resline Resistant line for bivariate data
stars Diagnostic plots for transformations to symmetry