SAS Macro Programs: boxcox
Version: 1.4 (05 Sep 2006)
Power transformations by Box-Cox method
The boxcox macro finds maximum likelihood power
(or folded power) transformations of
the response variable in a regression model by the Box-Cox method.
The program provides
graphic displays of the maximum likelihood solution, t-values
for model effects, and the influence of observations on choice of
power. The program can produce
printer plots or high-resolution versions of any of these plots.
The optimal transformation of the response variable is returned
in an output dataset.
For a positive response variable, y > 0, the family of monotone
power transformations with power p is
\[
y^{(p)} =
\begin{cases}
(y^{p} - 1) / p, & p \neq 0 \\
\log(y), & p = 0
\end{cases}
\]
If the response contains 0 or negative values, use the ADD= parameter
to assure that y+&ADD is strictly positive.
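As a rough illustration of the power family and of the role of the ADD= constant, the DATA step below applies a single power to a small made-up response. It is a sketch only and is not the macro's internal code; the dataset DEMO, the variables Y, YY and TY, and the macro variables P and ADD are hypothetical names chosen for this example.

```sas
/* Illustrative sketch only, not the macro's internal code.       */
/* The dataset DEMO and the variables Y, YY, TY are hypothetical. */
%let p   = 0.5;   /* power to apply                               */
%let add = 0;     /* constant chosen so that y + add is positive  */

data demo;
   input y @@;
   yy = y + &add;                       /* shifted response       */
   if yy > 0 then do;
      if abs(&p) > 1e-8 then ty = (yy**(&p) - 1) / (&p);
      else ty = log(yy);                /* limiting case, p = 0   */
   end;
   datalines;
1 4 9 16
;
run;

proc print data=demo;
run;
```

Observations for which the shifted response is not positive are left with a missing TY, which makes a poorly chosen shift easy to spot.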
If the response variable is bounded on a closed interval, [0, b],
the FOLD= parameter may be used to obtain analogous folded-power
transformations. For example, use FOLD=100 when the response
variable is a percentage on the interval [0, 100].
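The text does not give the folded-power formula. One common form for a response on [0, b], shown here only as an assumption (the macro's exact scaling may differ), is
\[
f(y; p) =
\begin{cases}
\left( y^{p} - (b - y)^{p} \right) / p, & p \neq 0 \\
\log\left( y / (b - y) \right), & p = 0
\end{cases}
\]
With b = 1 and p = 0 this reduces to the logit of a proportion.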
The program transforms the response to each power on a grid from
the LOPOWER= value to the HIPOWER= value and fits a regression
model for each, extracting values to an output dataset from
which the plots are drawn.
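The grid search described above can be sketched in a few steps. This is a minimal illustration under stated assumptions, not the macro's own code: SASHELP.CARS stands in for the auto data, the names GRID, EST and GM are hypothetical, and the geometric-mean scaling (the usual Box-Cox device for making the RMSE comparable across powers) is assumed rather than taken from the text.

```sas
/* Sketch only. SASHELP.CARS stands in for the auto data; the   */
/* dataset names GRID and EST and the macro variable GM are     */
/* hypothetical.                                                */

/* Geometric mean of the response, used to put every power on   */
/* a comparable scale.                                          */
proc sql noprint;
   select exp(mean(log(MPG_City))) format=best16. into :gm
   from sashelp.cars;
quit;

/* One copy of the data per power value. */
data grid;
   set sashelp.cars(keep=MPG_City Weight);
   do power = -2 to 2 by 0.5;
      if abs(power) > 1e-8 then
         ty = (MPG_City**power - 1) / (power * (&gm)**(power - 1));
      else
         ty = (&gm) * log(MPG_City);
      output;
   end;
run;

proc sort data=grid;
   by power;
run;

/* Fit the same model at each power and keep the RMSE. */
proc reg data=grid outest=est noprint;
   by power;
   model ty = Weight;
run;
quit;

proc print data=est;
   var power _RMSE_;
run;
```

The power with the smallest _RMSE_ in EST plays the role of the maximum likelihood estimate on this coarse grid; a finer grid only requires a smaller BY increment in the DO loop.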
The influence plot also implements a score test for the power transformation
due to Atkinson, which provides an alternative estimate of the
power transformation, based on power = 1 - slope of the fitted line
in the partial regression plot for a constructed variable.
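For orientation, the constructed variable usually associated with Atkinson's score test for the hypothesis of no transformation (power = 1) is sketched below. This is the standard textbook form, assumed here; the macro may use a version that differs by an additive constant.
\[
w = y \left( \log\left( y / \dot{y} \right) - 1 \right), \qquad
\hat{p} \approx 1 - \hat{\gamma}
\]
Here \dot{y} denotes the geometric mean of y, and \hat{\gamma} is the coefficient of w when y is regressed on the model predictors together with w, which is the slope of the fitted line in the partial regression plot for w.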
boxcox is a macro program. A value must be supplied for the
The arguments may be listed within parentheses in any order, separated
by commas. For example:
%boxcox(resp=responsevariable, model=predictors, ..., )
- The name of the response variable for
- A blank-separated list of the independent variables in the
regression, i.e., the terms on the right side
of the = sign in the MODEL statement for PROC
REG. The MODEL= terms may be empty to obtain a
transformation of a response on its own.
- The name of the data set holding the
response and predictor variables. (Default:
most recently created)
- The name of an ID variable for observations,
used in labeling the influence plot. (Default: ID=_N_)
- Upper bound for the response variable. If
FOLD>0 is specified, folded power transformations are computed.
E.g., for a response which is a proportion, specify FOLD=1;
for a percentage, specify FOLD=100.
- The name of an output dataset to contain
the transformed response. This dataset
contains all original variables, with the
transformed response replacing the original
- The name of the output data set containing
_RMSE_, and t-values for each effect in the
model, with one observation for each power
- PPLOT=RMSE EFFECT INFL
- Which printer plots should be produced?
One or more of RMSE, EFFECT, and INFL, or NONE.
- GPLOT=RMSE EFFECT INFL
- Which high-resolution (PROC GPLOT) plots
should be produced? One or more of RMSE,
EFFECT, and INFL, or NONE.
- Low value for the power (LOPOWER=)
- High value for the power (HIPOWER=)
- Number of power values in the interval
LOPOWER to HIPOWER
- Confidence coefficient for the confidence
interval for the power (CONF=)
The example finds power transformations for the MPG variable
in the auto dataset, using Weight, Displacement and Gear Ratio
as predictors.
%include macros(boxcox); * or, store in autocall library;
%boxcox(data=auto, resp=MPG,
model=Weight Displa Gratio,
gplot=RMSE EFFECT INFL,
lopower=-2.5, hipower=2.5, conf=.99);
The plot of RMSE vs. lambda (power) indicates a power of -0.5, i.e., 1 / sqrt(MPG),
as the maximum likelihood estimate, but a power of -1, i.e., 1 / MPG (gallons per mile),
is within the confidence interval.
The EFFECT plot indicates that the significance of the partial t-tests is unaffected by the choice of power.
The influence plot indicates that the VW Diesel has large leverage,
but is not influential in determining the choice of power.
boxglm Power transformations by Box-Cox method for GLM
boxtid Power transformations by Box-Tidwell method
nqplot Normal QQ plot
outlier Robust multivariate outlier detection
resline Resistant line for bivariate data
stars Diagnostic plots for transformations to symmetry