Visualizing Categorical Data: goodfit

Version: 1.4 (7 May 2002)
Michael Friendly
York University

The goodfit macro (get goodfit.sas)

Goodness of fit tests for discrete distributions

The GOODFIT macro carries out chi-square goodness-of-fit tests for discrete distributions: the uniform, binomial, Poisson, negative binomial, geometric, and logarithmic series distributions, as well as any discrete (multinomial) distribution whose probabilities you specify. It computes both the Pearson chi-square and the likelihood-ratio chi-square statistics.
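For reference, the two statistics have the following standard textbook forms. The notation is introduced here for illustration and is not taken from the macro source: $n_k$ is the observed frequency of the value $k$, $N = \sum_k n_k$ is the total count, $\hat{p}_k$ is the fitted probability of $k$ under the chosen distribution, and $\hat{m}_k$ is the fitted frequency.

```latex
\hat{m}_k = N \hat{p}_k ,
\qquad
X^2 = \sum_k \frac{(n_k - \hat{m}_k)^2}{\hat{m}_k} ,
\qquad
G^2 = 2 \sum_k n_k \log\!\left( \frac{n_k}{\hat{m}_k} \right)
```

Here $X^2$ is the Pearson chi-square and $G^2$ is the likelihood-ratio chi-square. Terms with $n_k = 0$ contribute zero to $G^2$.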

The data may consist of either individual observations on a single variable or a grouped frequency distribution.

The parameter(s) of the distribution may be specified as constants or estimated from the data.

Usage

The GOODFIT macro is called with keyword parameters. The arguments may be listed within parentheses in any order, separated by commas. For example:

```
  %goodfit(var=k, freq=freq, dist=binomial);
```

You must specify the analysis variable with the VAR= parameter and the distribution to be fit with the DIST= parameter. All other parameters are optional.
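As a minimal sketch of the two input forms, the following uses von Bortkiewicz's classic horse-kick data (deaths per corps-year in the Prussian army, 200 corps-years). The data set names `horskick` and `horsraw` and the variable names `deaths` and `corpsyrs` are chosen here for illustration. The sketch assumes the GOODFIT macro has already been made available to the SAS session.

```sas
*-- Grouped form with one row per count value and its frequency;
data horskick;
   input deaths corpsyrs;
   label deaths   = 'Number of deaths'
         corpsyrs = 'Number of corps-years';
datalines;
0 109
1 65
2 22
3 3
4 1
;
run;

*-- Ungrouped form with one row per corps-year, expanded from the grouped data;
data horsraw;
   set horskick;
   do i = 1 to corpsyrs;
      output;
   end;
   keep deaths;
run;

*-- Grouped data needs FREQ= to name the frequency variable;
%goodfit(data=horskick, var=deaths, freq=corpsyrs, dist=poisson);

*-- Ungrouped data omits FREQ= and the macro tabulates the frequencies;
%goodfit(data=horsraw, var=deaths, dist=poisson);
```

Both calls describe the same frequency distribution, so they should lead to the same fit. Because neither call sets OUT= or OUTSTAT=, the second call overwrites the default output data sets written by the first.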

Parameters

DATA=
Specifies the name of the input data set to be analyzed. [Default: `DATA=_LAST_`]
VAR=
Specifies the name of the variable to be analyzed, the basic count variable.
FREQ=
Specifies the name of a frequency variable for a grouped data set. If no FREQ= variable is specified, the program assumes the data set is ungrouped, and calculates frequencies using PROC FREQ. In this case you can specify a SAS format with the FORMAT= parameter to control the way the observations are grouped.
DIST=
Specifies the name of the discrete distribution to be fit. The allowable values are: UNIFORM, DISCRETE, BINOMIAL, POISSON, NEGBIN, GEOMETRIC, LOGSERIES.
PARM=
Specifies the value(s) of the parameter(s) for the distribution being fit. If PARM= is not specified, the parameter(s) are estimated by maximum likelihood or by the method of moments.
SUMAT=
For a distribution where the frequencies for values of the VAR= variable >= k have been lumped into a single category, specifying `SUMAT=k` causes the macro to sum the probabilities and fitted frequencies for all values >= k. [Default: `SUMAT=10000`]
FORMAT=
Specifies the name of a SAS format used to group the observations when no FREQ= variable has been specified.
OUT=
Name of the output data set containing the grouped frequency distribution, estimated fitted frequencies (EXP) and the values of the Pearson (CHI) and deviance (DEV) residuals. [Default: `OUT=FIT`]
OUTSTAT=
Name of the output data set containing goodness-of-fit statistics. [Default: `OUTSTAT=STATS`]
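The optional parameters can be combined as in the following sketch, which again uses the classic horse-kick data (200 Prussian army corps-years) with illustrative data set and variable names. Two points are assumptions rather than statements from this page: that PARM= takes the Poisson mean as a single number, and that the last observed category (4 deaths) should be treated as "4 or more". The macro is assumed to be already available in the session.

```sas
data horskick;
   input deaths corpsyrs;
datalines;
0 109
1 65
2 22
3 3
4 1
;
run;

*-- PARM=0.61 fixes the parameter at the sample mean 122/200 of these data;
*-- SUMAT=4 sums the probabilities and fitted frequencies for all values >= 4;
%goodfit(data=horskick, var=deaths, freq=corpsyrs, dist=poisson,
         parm=0.61, sumat=4, out=fit, outstat=stats);

*-- Fitted frequencies (EXP) with Pearson (CHI) and deviance (DEV) residuals;
proc print data=fit;
run;

*-- Goodness-of-fit statistics;
proc print data=stats;
run;
```

OUT=FIT and OUTSTAT=STATS are the documented defaults and are spelled out here only to show where the results are written. Omitting PARM= would instead have the macro estimate the parameter from the data.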

Example

```
%include vcd(goodfit);        *-- or include in an autocall library;

%goodfit(var=k, freq=freq, dist=binomial);   *-- VAR= and DIST= are required;
```