cpplot: C(p) plots for model selection

SAS Macro Programs: cpplot

Version: 1.5 (26 May 2003)
Michael Friendly
York University

The cpplot macro (cpplot.sas)

Plots of Mallows' C(p) and related statistics for model selection

The CPPLOT macro produces plots of Mallows' C(p) and related statistics for model selection in linear models. In a graph of C(p) vs. p, good models are those for which C(p) <= p. The program can also plot equivalent statistics for which the reference line for good models is horizontal. By default, it produces a high-resolution plot of C(p). Optionally, it can produce printer plots of any of these statistics; these are useful only with SAS Version 6.07 or later, where PROC PLOT can label points with character strings.
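For reference, the standard textbook definition of Mallows' C(p) shows why C(p) <= p marks a good model and why an equivalent F statistic has a horizontal reference line. The notation is an assumption of this sketch, not taken from the macro: n observations, m candidate predictors, p parameters in the subset model (including the intercept), and s^2 the error mean square of the full model. The source does not say that the macro's F option is computed in exactly this way.

```latex
\begin{align*}
C_p &= \frac{SSE_p}{s^2} - (n - 2p),
  & s^2 &= \frac{SSE_{\text{full}}}{n - m - 1} \\
F_p &= \frac{(SSE_p - SSE_{\text{full}})/(m + 1 - p)}{s^2},
  & p &< m + 1 \\
C_p &= p + (m + 1 - p)\,(F_p - 1)
\end{align*}
```

Here F_p is the usual F statistic for testing the predictors omitted from the subset model. The last line follows by substituting SSE_p / s^2 = (m + 1 - p) F_p + (n - m - 1) into the definition. Under these definitions C(p) <= p holds exactly when F_p <= 1, and the difference C(p) - p is compared against a horizontal line at 0.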


The macro uses PROC REG with / SELECTION=RSQUARE on the MODEL statement, and extracts the values to be plotted using the OUTEST= option.
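A minimal sketch of that approach follows; it is not the macro's actual code. The data set and variable names (fuel, tax, drivers, road, inc, pop) are borrowed from the example later on this page, and the output data set names models and cpstats are hypothetical. With SELECTION=RSQUARE and the CP option, the OUTEST= data set should hold one observation per subset model, with the number of parameters in _P_ and the C(p) value in _CP_.

```sas
* Fit all subsets and save one observation per model, with no printed output;
proc reg data=fuel outest=models noprint;
   model fuel = tax drivers road inc pop / selection=rsquare cp;
run;
quit;

* Derive a statistic whose reference line is horizontal at zero;
data cpstats;
   set models;
   cd = _cp_ - _p_;
run;

* List the number of regressors, parameters, R-square, C(p), and the difference;
proc print data=cpstats;
   var _in_ _p_ _rsq_ _cp_ cd;
run;
```

Models with cd <= 0 in this listing are those that fall on or below the line C(p) = p in a plot of C(p) against p.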

cpplot uses the label macro to label points in these plots.


cpplot is a macro program. Values must be supplied for the YVAR= and XVAR= parameters.

The arguments may be listed within parentheses in any order, separated by commas. For example:

   %cpplot(YVAR=dependentvar, XVAR=predictorvars, ..., )


YVAR=
The name of the dependent variable.
XVAR=
A list of potential independent variables in the model.
DATA=
The name of the input data set.
PLOTCHAR=1 2 3 4 5 6 7 6 8 9 0
Symbols used to identify the independent variables included in any model in the plot. Usually one would specify the first character of the name of each independent variable, in the order listed in XVAR. The PLOTCHAR list is parsed as blank-delimited words, so each symbol may consist of more than one character. However, blanks are removed from the symbol used to identify a particular model.
OPTIONS=
Other options for the MODEL statement, e.g., OPTIONS=AIC to print AIC values.
GPLOT=
High-resolution (PROC GPLOT) plots: a list of one or more of CP CD F PROBF, separated by blanks.
PLOT=
Printer plots: a list of one or more of CP CD F PROBF, separated by blanks.
CPMAX=
Maximum value of C(p) plotted. Since values of C(p) can be extremely high for unreasonable models, use this parameter to restrict the plot to the more interesting range of models for which C(p) <= CPMAX. Any models with greater values of C(p) are shown off-scale, labelled in red.
FMAX=
Maximum value of F plotted.
NAME=
Name for the graphic catalog entry.
GOUT=
Name of the graphics catalog in which the plot is to be stored. Default: WORK.GSEG.
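
As an illustration of these parameters, a call might combine multi-character PLOTCHAR symbols with an extra MODEL statement option. This is a hypothetical call, not taken from the original documentation: the variable names are borrowed from the fuel example on this page, the symbols Tx Dr Rd In Po are made up, and DATA= is the assumed name of the input data set parameter.

```sas
%cpplot(data=fuel, yvar=fuel,
      xvar=tax drivers road inc pop,
      plotchar=Tx Dr Rd In Po,
      options=AIC,
      cpmax=20 );
```

Given the description of PLOTCHAR above, the model containing tax, road, and pop would presumably be identified in the plot as TxRdPo.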


The example produces C(p) and F plots of models predicting FUEL consumption from all subsets of the predictor variables TAX, DRIVERS, ROAD, INCome, and POPulation.
%include data(fuel) ;
%include macros(cpplot);       * or, store in autocall library;
* assumes the included file creates a data set named FUEL;
%cpplot(data=fuel, yvar=fuel,
      xvar=tax drivers road inc pop,
      gplot=CP F, plotchar=T D R I P,
      cpmax=20, fmax=20 );

See also

boxcox Power transformations by Box-Cox method
label Create Annotate dataset to label observations
inflplot Influence plot for regression models
partial Partial regression residual plots