inflglim: Influence plots for generalized linear models
The INFLGLIM macro produces various influence plots for a generalized
linear model fit by PROC GENMOD. Each of these is a bubble plot of one
diagnostic measure (specified by the GY= parameter) against another
(GX=), with the bubble size proportional to a measure of influence
(usually, BUBBLE=COOKD). One plot is produced for each combination
of the GY= and GX= parameters.
The macro normally takes an input data set of raw data and fits the
GLM specified by the RESP= and MODEL= parameters, using an error
distribution given by the DIST= parameter. It then obtains the
OBSTATS and PARMEST data sets from the fit and uses these to compute
some additional influence diagnostics (HAT, COOKD, DIFCHI, DIFDEV,
SERES), any of which may be used as the GY= and GX= variables.
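To make the additional diagnostics concrete, the following DATA step is a minimal sketch of how four of them could be derived from an OBSTATS data set using the standard GLM influence formulas. It is not taken from the macro's source, and the macro's own computations may differ. The data set names `obstats` and `diag` and the macro variable `nparm` are assumptions made for illustration. The variable names HESSWGT, STD, RESDEV and STRESCHI are those PROC GENMOD uses for its observation statistics.

```sas
/* Sketch only: textbook GLM influence formulas, not the macro's code.  */
/* Assumption: nparm is the number of estimated parameters; the value   */
/* 18 corresponds to the Berkeley loglinear model used in the examples. */
%let nparm = 18;

data diag;
   set obstats;   /* assumption: OBSTATS data set saved from PROC GENMOD */
   hat    = hesswgt * std**2;                          /* leverage (hat value)         */
   difchi = streschi**2;                               /* approx. change in chi-square */
   difdev = resdev**2 + hat * streschi**2;             /* approx. change in deviance   */
   cookd  = hat * streschi**2 / (&nparm * (1 - hat));  /* Cook's distance analogue     */
run;
```

The DIFCHI and DIFDEV expressions assume a scale parameter fixed at 1, as in Poisson and binomial models.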
Alternatively, if you have fit a model with PROC GENMOD and saved
the OBSTATS and PARMEST data sets, you may specify these with the
OBSTATS= and PARMEST= parameters. The same additional diagnostics
are calculated and plotted.
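As an illustration of this second route, the sketch below fits the model directly and saves the two tables with an ODS OUTPUT statement, then passes them to the macro. The output data set names `obstats` and `parmest` are arbitrary choices. It is an assumption that the macro accepts tables saved this way: older SAS releases used the MAKE statement instead, and the variable names the macro expects may depend on its version.

```sas
/* Fit the loglinear model and save the observation statistics */
/* and parameter estimates as data sets.                       */
proc genmod data=berkeley;
   class dept gender admit;
   model freq = dept|gender dept|admit / dist=poisson obstats;
   ods output ObStats=obstats ParameterEstimates=parmest;
run;

/* Plot from the saved tables instead of refitting the model. */
%inflglim(obstats=obstats, parmest=parmest,
   resp=freq, model=dept|gender dept|admit,
   dist=poisson, gx=hat, gy=streschi);
```

RESP= and MODEL= are still supplied here because the macro lists them as required.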
The INFLGLIM macro is called with keyword parameters. The MODEL=
and RESP= parameters are required, and you must supply the DIST=
parameter for any model with non-normal errors.
The arguments may be listed within parentheses in any order, separated
by commas. For example:
%inflglim(data=berkeley, class=dept gender admit,
   resp=freq, model=dept|gender dept|admit,
   dist=poisson, id=cell,
   gx=hat, gy=streschi);
Name of input (raw data) data set. [Default: DATA=_LAST_]
The name of the response variable. For a loglinear model, this
is usually the frequency or cell count variable when the
data are in grouped form (specify DIST=POISSON in this case).
Gives the model specification. You may use the '|' and '@' symbols to specify the model.
Specifies the names of any class variables used in the model.
The name of the PROC GENMOD error distribution. If you
don't specify the error distribution, PROC GENMOD uses
DIST=NORMAL.
The name of the link function. The default is the canonical
link function for the error distribution given by the
DIST= parameter.
The name(s) of any offset variables in the model.
Other options on the MODEL statement (e.g., MOPT=NOINT
to fit a model without an intercept).
The name of a frequency variable, when the data are in grouped form.
The name of an observation weight (SCWGT) variable, used, for example, to specify structural zeros in a loglin model.
Gives the name of a character observation ID variable
which is used to label influential observations in the
plots. Usually you will want to construct a character
variable which combines the CLASS= variables into a
compact cell identifier.
The names of variables in the OBSTATS data set used as
ordinates in the plot(s). One plot is produced for
each combination of the words in GY= with the words in GX=.
[Default: GY=DIFCHI STRESCHI]
Abscissa(s) for plot, usually PRED or HAT. [Default: GX=HAT]
Name of output data set, containing the observation
statistics. [Default: OUT=COOKD]
Specifies the name of the OBSTATS data set (containing residuals and other observation statistics) for a model already fitted.
Specifies the name of the PARMEST data set (containing parameter estimates) for a model already fitted.
Gives the name of the variable to which the bubble size is
proportional. [Default: BUBBLE=COOKD]
Determines which observations, if any, are labeled in the
plots. If LABEL=NONE, no observations are labeled; if
LABEL=ALL, all are labeled; if LABEL=INFL, only possibly
influential points are labeled, as determined by the
INFL= parameter. [Default: LABEL=INFL]
Specifies the criterion used to determine which observations
are influential (when used with LABEL=INFL).
[Default: INFL=%STR(DIFCHI > 4 OR HAT > &HCRIT OR &BUBBLE > 1)]
Observation label size. [Default: LSIZE=1.5] The height of
other text (e.g., axis labels) is controlled by the HTEXT=
goption.
Observation label color. [Default: LCOLOR=BLACK]
Observation label position, relative to the point.
[Default: LPOS=5]
Font used for observation labels.
Bubble size scale factor. [Default: BSIZE=10]
Specifies whether the bubble size is proportional to AREA
or RADIUS. [Default: BSCALE=AREA]
The color of the bubble symbol. [Default: BCOLOR=RED]
Bubble fill style. Options are BFILL=SOLID | GRADIENT, where the
latter uses a gradient version of BCOLOR.
Color of reference lines. Reference
lines are drawn at nominally 'large' values for HAT values,
standardized residuals, and change in chi square values.
[Default: REFCOL=BLACK]
Line style for reference lines. Use REFLIN=0 to suppress
these reference lines. [Default: REFLIN=33]
Name of the graph in the graphics catalog. [Default: NAME=INFLGLIM]
Name of the graphics catalog.
%include vcd(inflglim);     *-- or include in an autocall library;
%include data(berkeley);
%inflglim(data=berkeley, class=dept gender admit,
   resp=freq, model=dept|gender dept|admit,
   dist=poisson, id=cell,
   gx=hat, gy=streschi);
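A slightly fuller call might combine several of the parameters described above. The sketch below first builds a character ID variable from the class variables, then requests plots for two ordinates and two abscissas (four plots in all) and labels points using a custom INFL= criterion. The data set name `berkeley2`, the variable length, and the thresholds 0.5 and 2 are illustrative assumptions, not recommended values. It is also assumed that the INFL= expression may use any expression valid in a DATA step condition.

```sas
/* Assumption: berkeley contains dept, gender, admit and freq.     */
/* CATX joins the class values into one label such as 1-M-1; it    */
/* uses the unformatted values of the variables.                   */
data berkeley2;
   set berkeley;
   length cell $ 20;
   cell = catx('-', dept, gender, admit);
run;

/* Two GY= words by two GX= words give four plots. */
%inflglim(data=berkeley2, class=dept gender admit,
   resp=freq, model=dept|gender dept|admit,
   dist=poisson, id=cell,
   gy=difchi streschi, gx=hat pred,
   label=infl,
   infl=%str(cookd > 0.5 or abs(streschi) > 2));
```

Wrapping the criterion in %STR keeps the comparison operators from being interpreted while the macro call is parsed, as in the documented default.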