INFLGLIM: Influence plots for generalized linear models
The INFLGLIM macro produces various influence plots for a generalized
linear model fit by PROC GENMOD. Each of these is a bubble plot of one
diagnostic measure (specified by the GY= parameter) against another
(GX=), with the bubble size proportional to a measure of influence
(usually BUBBLE=COOKD). One plot is produced for each combination
of the GY= and GX= variables.
The macro normally takes an input data set of raw data and fits the
GLM specified by the RESP= and MODEL= parameters, using an error
distribution given by the DIST= parameter. It fits the model,
obtains the OBSTATS and PARMEST data sets, and uses these to compute
some additional influence diagnostics (HAT, COOKD, DIFCHI, DIFDEV,
SERES), any of which may be used as the GY= and GX= variables.
Alternatively, if you have fit a model with PROC GENMOD and saved
the OBSTATS and PARMEST data sets, you may specify these with the
OBSTATS= and PARMEST= parameters. The same additional diagnostics
are calculated and plotted.
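As a rough sketch of this second route, the model from the example below could be fit directly and its results handed to the macro. This assumes a SAS release where PROC GENMOD tables are saved with ODS OUTPUT (ObStats and ParameterEstimates are the PROC GENMOD table names); the output data set names are illustrative, and passing DATA= and ID= alongside the saved data sets is an assumption about how the macro picks up the ID variable, not something this page states.

```sas
*-- Fit the model directly and save the observation statistics
    and parameter estimates (output data set names are illustrative);
proc genmod data=berkeley;
   class dept gender admit;
   model freq = dept|gender dept|admit / dist=poisson obstats residuals;
   ods output ObStats=obstats ParameterEstimates=parmest;
run;

*-- Pass the saved data sets to the macro instead of refitting
    (assumption: DATA= and ID= are still supplied for labeling);
%inflglim(data=berkeley,
          obstats=obstats,
          parmest=parmest,
          id=cell,
          gx=hat,
          gy=streschi);
```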
The INFLGLIM macro is called with keyword parameters. The
RESP= and MODEL= parameters are required, and you must supply the
DIST= parameter for any model with non-normal errors.
The arguments may be listed within parentheses in any order, separated
by commas. For example:
%inflglim(data=berkeley,
          class=dept gender admit,
          resp=freq,
          model=dept|gender dept|admit,
          dist=poisson,
          id=cell,
          gx=hat,
          gy=streschi);
Name of input (raw data) data set. [Default:
The name of the response variable. For a loglinear model, this
is usually the frequency or cell count variable when the
data are in grouped form (specify
DIST=POISSON in this case).
Gives the model specification. You may use the '|' and '@' symbols to specify the model.
Specifies the names of any class variables used in the model.
The name of the PROC GENMOD error distribution. If you
don't specify the error distribution, PROC GENMOD uses
the normal distribution (DIST=NORMAL).
The name of the link function. The default is the canonical
link function for the error distribution given by the
DIST= parameter.
The name(s) of any offset variables in the model.
Other options on the MODEL statement (e.g.,
NOINT to fit a model without an intercept).
The name of a frequency variable, when the data are in grouped form.
The name of an observation weight (SCWGT) variable, used, for example, to specify structural zeros in a loglinear model.
Gives the name of a character observation ID variable
which is used to label influential observations in the
plots. Usually you will want to construct a character
variable which combines the
CLASS= variables into a
compact cell identifier.
The names of variables in the OBSTATS data set used as
ordinates in the plot(s). One plot is produced for
each combination of the words in GY by the words in GX.
Abscissa(s) for plot, usually PRED or HAT. [Default:
Name of output data set, containing the observation
Specifies the name of the OBSTATS data set (containing residuals and other observation statistics) for a model already fitted.
Specifies the name of the PARMEST data set (containing parameter estimates) for a model already fitted.
Gives the name of the variable to which the bubble size is
proportional.
Determines which observations, if any, are labeled in the
plots. If LABEL=NONE, no observations are labeled; if
LABEL=ALL, all are labeled; if
LABEL=INFL, only possibly
influential points are labeled, as determined by the
INFL= parameter. [Default:
Specifies the criterion used to determine which observations
are influential (when used with LABEL=INFL). [Default:
INFL=%STR(DIFCHI > 4 OR HAT > &HCRIT OR &BUBBLE > 1)]
Observation label size. [Default:
LSIZE=1.5]. The height of
other text (e.g., axis labels) is controlled by the
Observation label color. [Default:
Observation label position, relative to the point.
Font used for observation labels.
Bubble size scale factor. [Default:
Specifies whether the bubble size is proportional to AREA
or RADIUS. [Default:
The color of the bubble symbol. [Default:
Bubble fill? Options are
BFILL=SOLID | GRADIENT, where the
latter uses a gradient version of BCOLOR.
Color of reference lines. Reference
lines are drawn at nominally 'large' values for HAT values,
standardized residuals, and change in chi square values.
Line style for reference lines. Use
REFLIN=0 to suppress
these reference lines. [Default:
Name of the graph in the graphic catalog [Default:
Name of the graphics catalog.
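The ID= description above suggests building a compact cell identifier from the class variables. A minimal sketch of one way to do that follows. It assumes that dept, gender and admit are character variables in the berkeley data set (if they are numeric with formats, PUT() would be needed first), and the variable name cell is simply the one used in the example call.

```sas
*-- Hypothetical sketch: assumes dept, gender and admit are
    character variables in the berkeley data set;
data berkeley;
   set berkeley;
   length cell $ 12;
   *-- Concatenate the first character of each factor level;
   cell = cats(substr(dept, 1, 1),
               substr(gender, 1, 1),
               substr(admit, 1, 1));
run;
```

The resulting variable can then be passed as ID=CELL so that labeled points show which cell of the table they represent.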
%include vcd(inflglim);     *-- or include in an autocall library;
%include data(berkeley);
%inflglim(data=berkeley,
          class=dept gender admit,
          resp=freq,
          model=dept|gender dept|admit,
          dist=poisson,
          id=cell,
          gx=hat,
          gy=streschi);
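A variation on this call, offered as a sketch and not as a recommended analysis, shows how several words in GY= and GX= combine and how the labeling criterion can be overridden. The cutoffs in INFL= are illustrative values only; HAT, PRED, DIFCHI and COOKD are the diagnostic names listed earlier on this page.

```sas
*-- Two abscissas by two ordinates gives four plots:
    STRESCHI and DIFCHI, each against HAT and PRED;
*-- Only points meeting the INFL= criterion are labeled
    (the cutoffs here are illustrative, not recommendations);
%inflglim(data=berkeley,
          class=dept gender admit,
          resp=freq,
          model=dept|gender dept|admit,
          dist=poisson,
          id=cell,
          gx=hat pred,
          gy=streschi difchi,
          label=infl,
          infl=%str(difchi > 4 or cookd > 1));
```

Wrapping the criterion in %STR() keeps the comparison operators from being interpreted by the macro processor before the expression reaches the macro.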