Michael Friendly

- Logistic Regression Model
- Predicted probabilities
- Fitting Logistic Regression Models
- Plotting results from PROC LOGISTIC
- Quantitative predictors
- Models with interaction
- Ordinal Response: Proportional Odds Model
- Plotting results from PROC LOGISTIC
- Polytomous Response: Nested Dichotomies
- Influence statistics and diagnostic plots

------------------------

(2) Some material in this section borrows from Koch & Stokes (1991).

------------------------

Consider the data below on the effects of a particular treatment for arthritis. The response is improvement; sex and treatment are the explanatory variables.

Improvement Sex Treatment None Some/Marked Total F Active 6 21 27 F Placebo 19 13 32 M Active 7 7 14 M Placebo 10 1 11

Let * pi sub hi * be the probability that a patient of sex
* h * who receives treatment * i * will show some or marked
improvement. It is convenient to model this probability in terms of
the * log odds * of improvement, called the

(11)

Number Observed Odds Observed SEX Treatmt Better Total P{better} Better Logit Female Active 21 27 0.7778 3.50 1.2529 Female Placebo 13 32 0.4062 0.68 -0.3797 Male Active 7 14 0.5000 1.00 0.0000 Male Placebo 1 11 0.0909 0.10 -2.3026

The * logistic regression model * fits the log odds
by a linear function of the explanatory variables (as is multiple
regression).

(12)For example, a simple model might assume additive ("main") effects for sex and treatment on the log odds of improvement.

(13)where

*x sub 1*and*x sub 2*are dummy variables representing sex and treatment, respectively:x sub 1 = left "{" lpile { { 0 ' ' , ' ' roman ' if male' } above { 1 ' ' , ' ' roman ' if female' } } right ""

x sub 2 = left "{" lpile { { 0 ' ' , ' ' roman ' if placebo' } above { 1 ' ' , ' ' roman ' if active' } } right ""

*alpha*is the log odds of improvement for males receiving placebo.*beta sub 1*is the increment in log odds for being female. Therefore,*e sup < beta sub 1 >*gives the odds of improvement for females relative to males.*beta sub 2*is the increment in log odds for being in the active treatment group.*e sup < beta sub 2 >*gives the odds of improvement for the active treatment group relative to placebo.

(14)For the example, when

Sex Treatment Odds Improved Pr{Improved} Female ActiveThe formulas for predicted probabilities may look formidable, but the numerical results are easy to interpret.e sup {alpha+beta_1+beta_2}e sup {alpha+beta_1+beta_2 }/(1+e sup {alpha+beta_1+beta_2})Female Placeboe sup {alpha + beta_1}e sup {alpha + beta_1}/(1 + e sup {alpha + beta_1})Male Activee sup {alpha + beta_2}e sup {alpha + beta_2}/(1 + e sup {alpha + beta_2})Male Placeboe sup {alpha}e sup {alpha}/(1 + e sup {alpha})

The following DATA step creates a data set in frequency form
named` arthrit`. The dummy variables` _SEX_` and `
_TREAT_` corresponding to * x sub 1 * and * x sub 2 *
are created, as is the dichotomous response variable, `
better`.

The first logistic regression model includes effects for sex and
treatment, specified by the dummy variables on the ` MODEL`
statement. Note that by default, PROC LOGISTIC orders the response
values in * increasing* order, and sets up the model so that it
is predicting the probability of the * smallest* ordered value,
Pr{better=0}, which means it would be modelling the probability of No
improvement. The DESCENDING option (available with Version 6.08)
reverses this order, so that predicted results will be for
Pr{better=1).(3)

------------------------

(3) In earlier releases, a format is used to order
the values of ` better` so that the first value
corresponds to Improved.

------------------------

data arthrits; input sex $ trtment $ improve $ count; _treat_ = (trtment='Active'); _sex_ = (sex='F'); better = (improve='some'); cards; F Active none 6 M Active none 7 F Active some 21 M Active some 7 F Placebo none 19 M Placebo none 10 F Placebo some 13 M Placebo some 1 ; proc logistic data=arthrits descending; freq count; model better = _sex_ _treat_ / scale=none aggregate;The options

+-------------------------------------------------------------------+ | | | Deviance and Pearson Goodness-of-Fit Statistics | | Pr > | | Criterion DF Value Value/DF Chi-Square | | | | Deviance 1 0.2776 0.2776 0.5983 | | Pearson 1 0.2637 0.2637 0.6076 | | | | Number of unique profiles: 4 | | | | Testing Global Null Hypothesis: BETA=0 | | | | Intercept | | Intercept and | | Criterion Only Covariates Chi-Square for Covariates | | | | AIC 118.449 104.222 . | | SC 118.528 104.460 . | | -2 LOG L 116.449 98.222 18.227 with 2 DF (p=0.0001)| | Score . . 16.797 with 2 DF (p=0.0002)| | | +-------------------------------------------------------------------+The Chi-square tests for BETA=0 above test the joint effect of sex and treatment. Individual effects in the model are tested by Wald

+--------------------------------------------------------------------------+ | | | Analysis of Maximum Likelihood Estimates | | | | Parameter Standard Wald Pr > Standardized Odds | Variable DF Estimate Error Chi-Square Chi-Square Estimate Ratio | | | INTERCPT 1 -1.9037 0.5982 10.1286 0.0015 . . | | _SEX_ 1 1.4687 0.5756 6.5092 0.0107 0.372433 4.343 | _TREAT_ 1 1.7817 0.5188 11.7961 0.0006 0.493956 5.940 | | +--------------------------------------------------------------------------+

roman 'logit' ( pi sub hi ) = -1.90 + 1.47_sex_ + 1.78 _treat_ (15)is most easily interpreted by considering the odds ratios corresponding to the parameters:

- 1.47 is the increment to log odds of a better outcome for
females; the odds ratio
*e sup 1.47 = 4.34*indicates that females are 4.3 times as likely to achieve a better outcome than males. - 1.78 is the increment to log odds for the treatment group; the
odds ratio
*e sup 1.78 = 5.94*indicates that the treated group is nearly 6 times as likely to achieve a better outcome than the placebo group.

proc logistic data=arthrit; freq count; model better = _sex_ _treat_ ;Theoutput out=results p=predict l=lower u=upper xbeta=logit;

The output data set ` RESULTS` contains the following
variables. There are two observations for each group (for none and
some improvement). The ` PREDICT` variable gives the
predicted probability of an improved outcome according to model (15),
using the inverse transformation (14) of logit to probability.

+-------------------------------------------------------------------+ | | | SEX TRTMENT IMPROVE COUNT PREDICT LOWER UPPER LOGIT | | | | F Active none 6 0.794 0.620 0.900 1.347 | | M Active none 7 0.470 0.257 0.694 -0.122 | | F Active some 21 0.794 0.620 0.900 1.347 | | M Active some 7 0.470 0.257 0.694 -0.122 | | F Placebo none 19 0.393 0.248 0.560 -0.435 | | M Placebo none 10 0.130 0.044 0.325 -1.904 | | F Placebo some 13 0.393 0.248 0.560 -0.435 | | M Placebo some 1 0.130 0.044 0.325 -1.904 | | | +-------------------------------------------------------------------+

To plot the predicted probabilities of improvement and confidence
limits from the` RESULTS` data set, we select the observations
for` improve='some'`. A plot can be done as a bar chart with
PROC GCHART or as a line graph with PROC PLOT. Confidence limits can
be added to either with the SAS/GRAPH Annotate facility. The
statements below show how a grouped horizontal bar chart is
constructed.

data results; set results; if improve='some'; label predict='Prob. Improved'; data limits; set results; xsys='2'; ysys='2'; midpoint=trtment; group=sex; x = lower; function='MOVE '; output; text='|'; function='LABEL '; output; x = upper; function='DRAW '; output; text='|'; function='LABEL '; output; proc gchart data=results; hbar trtment / sumvar=predict group=sex gspace=3 anno=limits raxis=axis1 maxis=axis2 gaxis=axis3; axis1 order=(0 to 1 by .2) label=(h=1.5 f=duplex) value=(h=1.3 f=duplex); axis2 label=(h=1.4 f=duplex 'Treat') value=(h=1.2 f=duplex); axis3 label=(h=1.4 f=duplex) value=(h=1.2 f=duplex); pattern1 v=empty;

Figure 25: Predicted probabilities of improvement

- the data are usually entered in case form, with one observation per person.
- the statistics for goodness-of-fit are computed differently.
- the output data set for plotting now contains one observation per person.

The DATA step below creates a SAS data set named `
arthrit`, in case form. It is equivalent to the earlier example,
but contains the age of each person in the sample. Programming
statements are used to create dummy variables` _sex_` and`
_treat_` as before. Variables for testing for interactions among
sex, treatment and age are also created. A preliminary analysis
(described later) is used to test whether any of these variables
interact. That test shows that all interactions can safely be
ignored in this example. That test is one suitable test for goodness
of fit of the main effects model.

data arthrit; length treat $7. sex $6. ; input id treat $ sex $ age improve @@ ; better = (improve > 0); /* Dichotomous response */ _treat_ = (treat ='Treated') ; /* Dummy var for treatment */ _sex_ = (sex = 'Female'); /* and sex */ agesex = age*_sex_ ; /* Dummy var for testing */ agetrt = age*_treat_; /* interactions */ sextrt = _sex_*_treat_; age2 = age*age ; cards ; 57 Treated Male 27 1 9 Placebo Male 37 0 46 Treated Male 29 0 14 Placebo Male 44 0 77 Treated Male 30 0 73 Placebo Male 50 0 ...(observations omitted)56 Treated Female 69 1 42 Placebo Female 66 0 43 Treated Female 70 1 15 Placebo Female 66 1 71 Placebo Female 68 1 1 Placebo Female 74 2

The next logistic model includes (main) effects for age, sex, and
treatment. The ` lackfit` option requests a lack-of-fit test
proposed by Hosmer and Lemeshow (1989). This test divides subjects
into deciles based on predicted probabilities, then computes a *
chi² * from observed and expected frequencies. In this
example, we plot both a confidence interval for Pr{Improved} and *
roman 'logit' ± roman 's.e.' *. To make these intervals
roughly comparable, we choose * alpha = .33 * to give a 67%
confidence interval.

title2 h=1.3 f=duplex 'Estimated Effects of Age, Treatment and Sex'; proc logistic data=arthrit; format better outcome.; model better = _sex_ _treat_ age / lackfit; output out=results p=predict l=lower u=upper xbeta=logit stdxbeta=selogit / alpha=.33;The results include:

+-------------------------------------------------------------------+ | | | Testing Global Null Hypothesis: BETA=0 | | Intercept | | Intercept and | | Criterion Only Covariates Chi-Square for Covariates | | | | AIC 118.449 100.063 . | | SC 120.880 109.786 . | | -2 LOG L 116.449 92.063 24.386 with 3 DF (p=0.0001) | Score . . 22.005 with 3 DF (p=0.0001) | | +-------------------------------------------------------------------+

+-------------------------------------------------------------------+ | | | Analysis of Maximum Likelihood Estimates | | Parameter Standard Wald Pr > Standardized Odds | Variable DF Estimate Error Chi-Square Chi-Square Estimate | Ratio | INTERCPT 1 -4.5033 1.3074 11.8649 0.0006 .| . | _SEX_ 1 1.4878 0.5948 6.2576 0.0124 0.377296| 4.427 | _TREAT_ 1 1.7598 0.5365 10.7596 0.0010 0.487891| 5.811 | AGE 1 0.0487 0.0207 5.5655 0.0183 0.343176| 1.050 | | +-------------------------------------------------------------------+

+-------------------------------------------------------------------+ | | | Hosmer and Lemeshow Goodness-of-Fit Test | | BETTER = not | | BETTER = improved improved | | -------------------- --------------------| | Group Total Observed Expected Observed Expected| | 1 9 0 1.09 9 7.91| | 2 8 3 1.48 5 6.52| | 3 8 2 2.00 6 6.00| | 4 8 2 3.04 6 4.96| | 5 8 5 3.64 3 4.36| | 6 9 5 4.82 4 4.18| | 7 8 4 4.78 4 3.22| | 8 8 5 5.77 3 2.23| | 9 8 7 6.62 1 1.38| | 10 10 9 8.77 1 1.23| | Goodness-of-fit Statistic = 5.5549 with 8 DF (p=0.6970) | | | +-------------------------------------------------------------------+

Plots are constructed from the data set RESULTS. The first few observations are shown below.

+---------------------------------------------------------------------------+ | | | ID TREAT AGE SEX IMPROVE PREDICT LOWER UPPER LOGIT SELOGIT | | | | 57 Treated 27 Male 1 0.194 0.103 0.334 -1.427 0.758 | | 9 Placebo 37 Male 0 0.063 0.032 0.120 -2.700 0.725 | | 46 Treated 29 Male 0 0.209 0.115 0.350 -1.330 0.728 | | 14 Placebo 44 Male 0 0.086 0.047 0.152 -2.358 0.658 | | ... | | | +---------------------------------------------------------------------------+

Predicted probabilities and confidence limits are contained in
the variables` PREDICT`,` UPPER`, and` LOWER`.
To show the effects of sex, treatment, and age on Pr{Improvement},
separate plots are drawn for each sex, using the statement ` BY
SEX;` in the PROC GPLOT step. Most of the work consists of
drawing the confidence limits with the Annotate facility. Plots of
the predicted logits can be made in a similar way using the variables
` LOGIT` and ` SELOGIT`.

proc sort data=results; by sex treat age; data bars; set results; by sex treat; length text $8; xsys='2'; ysys='2'; if treat='Placebo' then color='BLACK'; else color='RED'; x = age; line=33; y = upper; function='MOVE '; output; text='-'; function='LABEL '; output; y = lower; function='DRAW '; output; text='-'; function='LABEL '; output; if last.treat then do; y = predict; x = age+1; position='6'; size=1.4; text = treat; function='LABEL'; output; end; if first.sex then do; ysys ='1'; y=90; xsys ='1'; x=10; size=1.4; text = sex; function='LABEL'; output; end;

goptions hby=0; proc gplot; plot predict * age = treat / vaxis=axis1 haxis=axis2 nolegend anno=bars; by sex; axis1 label=(h=1.3 a=90 f=duplex 'Prob. Improvement (67% CI)') value=(h=1.2) order=(0 to 1 by .2); axis2 label=(h=1.3 f=duplex) value=(h=1.2) order=(20 to 80 by 10) offset=(2,5); symbol1 v=+ h=1.4 i=join l=3 c=black; symbol2 v=$ h=1.4 i=join l=1 c=red;

One compromise is to plot the results on the logit scale, and add
a second scale showing corresponding probability values. In SAS,
this can be done with an Annotate data set. The example below is
designed for a plot with logit values ranging from -5 to +2. For
each of a set of simple probability values (` prob`), the
probability is labeled on an axis at the right of the figure
corresponding to the logit value at the left.

data pscale; xsys = '1'; * percent values for x; ysys = '2'; * data values for y; do prob = .01,.05,.1,.25,.5,.75,.9; logit = log( prob / (1-prob) ); * convert to logit; y = logit; x = 99 ; function='MOVE '; output; * tick marks; x = 100; function='DRAW '; output; text = put(prob,4.3); position='6'; * values; style='DUPLEX'; function='LABEL '; output; end; ysys='1'; x=100.1; y=99.5; * axis label; text='Prob'; output;In a PROC GPLOT step, the Annotate data set

proc gplot data=results; plot logit * x / annotate=pscale ... ;

Figure 26: Quantitative predictor: Linear and Logit regression on age. The curves show predicted probabilities of improvement and 95% confidence bands (ignoring sex and treatment). The points show the observations. Except in the extremes, the linear and logistic models give similar predicted values.

Figure 27: Logit regression on age and
treatment

Figure 28: Estimated effects of age,
treatment and sex. The effects are simpler (linear and additive) on
the logit scale, but more easily interpreted in terms of
probabilities. One solution is to plot on the logit scale, and
provide a separate (nonlinear) scale of probabilities.

In fact, since the predicted probabilities and logits are
calculated by the procedure and output to the data set`
RESULTS`, the results plotted depend purely on the `
MODEL` statement. The plotting steps remain the same, assuming
you want to make separate plots for males and females of the age by
treatment effects.

roman < logit ( p ) = - 4.5033 + 1.4878 % font mono _sex_ + 1.7598 % font mono _treat_ + 0.0487 % font mono age >With this method, each independent variable is manipulated over its range. The response can be graphed on the probability scale by transforming the logit:

data fitted; do _sex_ = 0 to 1; do _treat_ = 0 to 1; do age = 25 to 75 by 5; logit= -4.5033 + 1.4878*_sex_ + 1.7598*_treat_ + 0.0487*age; prob = exp(logit) / (1 + exp(logit)); output; end; end; end;Fox (1987) explains how this method may be used to construct adjusted effects plots for particular interactions, adjusting for other variables not represented in the plot.

The interaction effects were defined in the data step`
arthrit` as the dummy variables,` agesex`,`
agetrt`, and` sextrt`. The variable ` age2 =
age**2` can be used to test whether the relationship between age
and logit(better) is quadratic rather than linear.

The PROC LOGISTIC step below requests a forward-selection procedure. Setting START=3 requests that the model building begin with the first three variables (the main effects) listed in the model statement. SLENTRY=1 (significance level to enter) forces all variables to enter the model eventually.

proc logistic data=arthrit; format better outcome.; model better = _sex_ _treat_ age /* main effects */ agesex agetrt sextrt /* interactions */ age2 /* quadratic age */ / selection=forward start=3 slentry=1;The variables included in each model for the selection procedure are listed in a note at the beginning of each set of results:

+-------------------------------------------------------------------+ | | | Step 0. The following variables were entered: | | INTERCPT _SEX_ _TREAT_ AGE | | | +-------------------------------------------------------------------+Results for this step are identical to those of the main effects model given earlier. Near the end of this step, the residual

+-------------------------------------------------------------------+ | | | Residual Chi-Square = 4.0268 with 4 DF (p=0.4024) | | | +-------------------------------------------------------------------+Other tests printed show none of the interaction terms is significant individually.

Improvement Sex Treatment None Some Marked Total F Active 6 5 16 27 F Placebo 19 7 6 32 M Active 7 2 5 14 M Placebo 10 0 1 11One way to model these data is to consider two logits for the dichotomies between adjacent categories:

roman 'logit' % ( theta sub hi1 ) = log < pi sub hi1 > over < pi sub hi2 + pi sub hi3 > = roman < 'logit ( None vs. [Some or Marked] )' >

roman 'logit' % ( theta sub hi2 ) = log < pi sub hi1 + pi sub hi2 > over < pi sub hi3 > = roman < 'logit ( [None or Some] vs. Marked )' >Consider a logistic regression model for each logit:

roman logit ( pi sub hi1 ) = alpha sub 1 + x prime sub hi % beta sub 1 (16)

roman logit ( pi sub hi2 ) = alpha sub 2 + x prime sub hi % beta sub 2 (17)The proportional odds assumption is that

For the arthritis example, with additive effects for sex and treatment on the both log odds,

roman logit % ( pi sub hi1 ) = alpha sub 1 + beta sub 1 % x sub 1 + beta sub 2 % x sub 2 (18)

roman logit % ( pi sub hi2 ) = alpha sub 2 + beta sub 1 % x sub 1 + beta sub 2 % x sub 2 (19)where:

*x sub 1*and*x sub 2*are dummy variables representing sex and treatment.*alpha sub 1*is the log odds of no improvement (vs. some or marked) for males receiving the placebo.*alpha sub 2*is the log odds of no improvement or some improvement (vs. marked) for males receiving the placebo.*beta sub 1*is the increment to*both*log odds for being female. Therefore,*e sup < beta sub 1 >*gives the odds of improvement for females relative to males.*beta sub 2*is the increment to both log odds for being in the active treatment group.*e sup < beta sub 2 >*gives the odds of improvement for the active treatment group relative to placebo.

Figure 29: Proportional odds model. The model assumes that the regression functions for different response categories are parallel on the logit scale.

- The validity of the analysis depends on the correctness of the proportional odds assumption. A test of this assumption appears in the output from PROC LOGISTIC.
- The results from PROC LOGISTIC are cast in terms of predicted
probabilities and fitted logits for response
*less than*each of the cutpoints. To plot Pr{Improve}, we must reverse the sense of the probabilities and logits.

*-- Proportional Odds Model: Effects of treat, sex and age; proc logistic data=arthrit nosimple; model improve = _sex_ _treat_ age ; output out=results p=predict l=lower u=upper xbeta=logit stdxbeta=selogit / alpha=.33;The response profile displays the ordering of the outcome variable. Note that logits are formed from top to bottom, i.e., none vs. some or marked, none or some vs. marked. The output also shows the proportional odds assumption is reasonable.

+-------------------------------------------------------------------+ | Response Profile | | Ordered | | Value IMPROVE Count | | | | 1 0 42 | | 2 1 14 | | 3 2 28 | +-------------------------------------------------------------------+

+-------------------------------------------------------------------+ | Score Test for the Proportional Odds Assumption | | Chi-Square = 2.4917 with 3 DF (p=0.4768) | +-------------------------------------------------------------------+The parameter estimates relate to the odds of a poorer response (they are all negative):

+---------------------------------------------------------------------+ | | | Analysis of Maximum Likelihood Estimates | | | | Parameter Standard Wald Pr > Standardized | Variable Estimate Error Chi-Square Chi-Square Estimate | | | | INTERCP1 3.7837 1.1530 10.7683 0.0010 . | | INTERCP2 4.6827 1.1949 15.3569 0.0001 . | | _SEX_ -1.2517 0.5321 5.5343 0.0186 -0.317412 | | _TREAT_ -1.7453 0.4772 13.3770 0.0003 -0.483871 | | AGE -0.0382 0.0185 4.2358 0.0396 -0.268666 | | | +---------------------------------------------------------------------+The output data set

data results; set results; treatl = treat||put(_level_,1.0); if _level_=0 then better = (improve > 0); else better = (improve > 1); *-- Change direction of probabilities & logit; predict = 1 - predict; lower = 1 - lower; upper = 1 - upper; logit = -logit;The first few observations in the data set

+-------------------------------------------------------------------+ | | | ID TREAT SEX IMPROVE _LEVEL_ PREDICT LOWER UPPER LOGIT | | | | 57 Treated Male 1 0 0.267 0.417 0.157 -1.008 | | 57 Treated Male 1 1 0.129 0.229 0.069 -1.907 | | 9 Placebo Male 0 0 0.085 0.149 0.048 -2.372 | | 9 Placebo Male 0 1 0.037 0.069 0.019 -3.271 | | 46 Treated Male 0 0 0.283 0.429 0.171 -0.932 | | 46 Treated Male 0 1 0.138 0.238 0.076 -1.831 | | ... | | | +-------------------------------------------------------------------+As in the earlier example, an Annotate data set is used to add more descriptive labels and confidence intervals to the plots. (This adds somewhat more work, but I prefer the plots labeled this way, rather than with legends at the bottom.

proc sort data=results; by sex treatl age; data bars; set results; by sex treatl; length text $8; xsys='2'; ysys='2'; if treat='Placebo' then color='BLACK'; else color='RED'; x = age; line=33; *-- plot confidence limits ; y = upper; function='MOVE '; output; text='-'; function='LABEL '; output; y = lower; function='DRAW '; output; text='-'; function='LABEL '; output; if last.treatl then do; y = predict; x = age+1; position='C'; size=1.4; text = treat; function='LABEL'; output; position='F'; if _level_ = 0 then text='> None'; else text='> Some';; output; end; if first.sex then do; ysys ='1'; y=90; xsys ='1'; x=10; size=1.5; text = sex; function='LABEL'; output; end;

goptions hby=0; proc gplot; plot predict * age = treatl / vaxis=axis1 haxis=axis2 nolegend anno=bars ; by sex; axis1 label=(h=1.4 a=90 f=duplex 'Prob. Improvement (67% CI)') value=(h=1.2) order=(0 to 1 by .2); axis2 label=(h=1.4 f=duplex) value=(h=1.2) order=(20 to 80 by 10) offset=(2,5); symbol1 v=+ h=1.4 i=join l=3 c=black; symbol2 v=+ h=1.4 i=join l=3 c=black; symbol3 v=$ h=1.4 i=join l=1 c=red; symbol4 v=$ h=1.4 i=join l=1 c=red;

Figure 30: Predicted probabilities for the proportional odds model

The proportional odds model is attractive if the response is
ordinal, * and* the proportional odds assumption holds.
However, if the response is purely nominal (e.g., vote Tory, Liberal,
Reform, NDP), or if the proportional odds assumption is untenable,
one particularly simple strategy is to fit separate models to a set
of * m - 1 * dichotomies derived from the polytomous
response.(4)

- Each dichotomy can be fit using the familiar binary-response logistic model, and,
- the
*m - 1*models will be statistically independent (so that likelihood-ratio*G sup 2*statistics will be additive)*if*the dichotomies are chosen as**nested**

Nested dichotomies are successive binary partitions of the response categories into nested sets. For example, the response categories {1,2,3,4} could be divided first as {1,2} vs. {3,4}. Then these two dichotomies could be divided as {1} vs. {2}, and {3} vs. {4}. Alternatively, these response categories could be divided as {1} vs. {2,3,4}, then {2} vs {3,4}, and finally {3} vs. {4}. (For levels of a factor in an ANOVA design, nested dichotomies correspond to orthogonal contrasts.)

+--------++--------+ +---++-------------+ | 1 2 || 3 4 | | 1 || 2 3 4 | +--------++--------+ +---++-------------+ | | | v v v +----++----+ +----++----+ +----++-------+ | 1 || 2 | | 3 || 4 | | 2 || 3 4 | +----++----+ +----++----+ +----++-------+ | v +---++---+ | 3 || 4 | +---++---+

- Working vs. NotWorking
- Working fulltime vs working parttime.

------------------------

(4) An alternative strategy is to choose one response category (the last, say) as the "base category", and model the

------------------------

The data are read in as follows shown below. The 3-level variable

proc format; value labour /* labour-force participation */ 1 ='working full-time' 2 ='working part-time' 3 ='not working'; value kids /* presence of children in the household */ 0 ='Children absent' 1 ='Children present'; data wlfpart; input case labour husinc children region; working = labour < 3; if working then fulltime = (labour = 1); cards; 1 3 15 1 3 2 3 13 1 3 3 3 45 1 3 4 3 23 1 3 5 3 19 1 3 6 3 7 1 3 7 3 15 1 3 8 1 7 1 3 9 3 15 1 3 ... more data lines ...An initial analysis attempts to fit the proportional odds model, with the 3-level

proc logistic data=wlfpart nosimple;However, the proportional odds assumption is rejected by the score test:model labour = husinc children ;title2 'Proportional Odds Model for Fulltime/Parttime/NotWorking';

+-------------------------------------------------------------------+ | | | Score Test for the Proportional Odds Assumption | | | | Chi-Square = 18.5641 with 2 DF (p=0.0001) | | | +-------------------------------------------------------------------+Hence, we fit models for each of the

proc logistic data=wlfpart nosimple descending;Themodel working = husinc children ;output out=resultw p=predict xbeta=logit; title2 'Nested Dichotomies'; run; proc logistic data=wlfpart nosimple descending;model fulltime = husinc children ;output out=resultf p=predict xbeta=logit;

+-------------------------------------------------------------------+ | | | Response Profile | | Ordered | | Value WORKING Count | | | | 1 1 108 | | 2 0 155 | | | | Testing Global Null Hypothesis: BETA=0 | | Intercept | | Intercept and | | Criterion Only Covariates Chi-Square for Covariates | | | | AIC 358.151 325.733 . | | SC 361.723 336.449 . | | -2 LOG L 356.151 319.733 36.418 with 2 DF (p=0.0001)| | Score . . 35.713 with 2 DF (p=0.0001)| | | +-------------------------------------------------------------------+

+-------------------------------------------------------------------+ | | | Analysis of Maximum Likelihood Estimates | | | | Parameter Standard Wald Pr > Standardized Odds | Variable DF Estimate Error Chi-Square Chi-Square Estimate | Ratio | | | INTERCPT 1 1.3358 0.3838 12.1165 0.0005 .| . | HUSINC 1 -0.0423 0.0198 4.5751 0.0324 -0.168541| 0.959 | CHILDREN 1 -1.5756 0.2923 29.0651 0.0001 -0.398992| 0.207 | | +-------------------------------------------------------------------+The output for the fulltime vs. parttime dichotomy is shown below. Note that nonworking women are excluded in this analysis.

+-------------------------------------------------------------------+ | | | Response Profile | | Ordered | | Value FULLTIME Count | | | | 1 1 66 | | 2 0 42 | | | | WARNING: 155 observation(s) were deleted due to missing values for | the response or explanatory variables. | | | | Testing Global Null Hypothesis: BETA=0 | | Intercept | | Intercept and | | Criterion Only Covariates Chi-Square for Covariates | | | | AIC 146.342 110.495 . | | SC 149.024 118.541 . | | -2 LOG L 144.342 104.495 39.847 with 2 DF (p=0.0001) | Score . . 35.150 with 2 DF (p=0.0001) | | +-------------------------------------------------------------------+

+-------------------------------------------------------------------+ | | | Analysis of Maximum Likelihood Estimates | | | | Parameter Standard Wald Pr > Standardized Odds | Variable DF Estimate Error Chi-Square Chi-Square Estimate | Ratio | | | INTERCPT 1 3.4778 0.7671 20.5537 0.0001 .| . | HUSINC 1 -0.1073 0.0392 7.5063 0.0061 -0.424867| 0.898 | CHILDREN 1 -2.6515 0.5411 24.0135 0.0001 -0.734194| 0.071 | | +-------------------------------------------------------------------+Thus, the full 3-category response has been fitted by two models,

log left ( < Pr ( roman 'working' ) > over < Pr ( roman 'not working' ) > right ) = mark 1.336 - 0.042 % roman 'H$' - 1.576 % roman 'kids'

log left ( < Pr ( roman 'fulltime' ) > over < Pr ( roman 'parttime' ) > right ) = lineup 3.478 - 0.107 % roman 'H$' - 2.652 % roman 'kids'

Note that the second equation gives the predicted log odds for fulltime vs. parttime work

Moreover, we can add the likelihood ratio or Wald tests across
the two models, so the overall test of the hypothesis that neither
husband's income nor presence of children predicts working status
(the 3-level response) has a * G sup 2 = 36.42 + 39.85 =
66.27 * on 2+2=4 df (p < .0001). Similarly, the hypothesis
that husband's income does not predict working status has a Wald-test
* G sup 2 = 4.58 + 7.51 = 12.09 * on 2 df ( p < .001).

Comparison of the regression coefficients in the two sub-models (in relation to the size of their standard errors) indicates why the proportional odds model was not tenable. The proportional odds model requires that the coefficients for husband's income and children in the two models are equal. We can see that both variables have a greater effect on the odds of fulltime vs. parttime work than on the odds of working vs. not working.

As usual, these effects can be seen and interpreted more easily in a graph. The odds of working outside the home decrease as husband's income increase and when there are children present. However, among working women, the odds of fulltime vs. parttime work decrease at a faster rate with husband's income; women with children are less likely to work fulltime.

To construct this graph, first join the separate results datasets into one.

*-- Merge the results datasets to create one plot; data both; set resultw(in=inw) resultf(in=inf); if inw then do; if children=1 then event='Working, with Children '; else event='Working, no Children '; end; else do; if children=1 then event='Fulltime, with Children '; else event='Fulltime, no Children '; end;Then, we can plot the log odds (or predicted probability) against husband's income, using

proc gplot data=both; plot logit * husinc = event / anno=lbl nolegend frame vaxis=axis1; axis1 label=(a=90 'Log Odds') order=(-5 to 4); title2 'Working vs Not Working and Fulltime vs. Parttime'; symbol1 v=dot h=1.5 i=join l=3 c=red; symbol2 v=dot h=1.5 i=join l=1 c=black; symbol3 v=circle h=1.5 i=join l=3 c=red; symbol4 v=circle h=1.5 i=join l=1 c=black;

Figure 31: Log odds of working vs. not working, and of fulltime work vs. parttime work

- leverage
- Measures the potential impact of an individual case
on the results, which is directly proportional to how far an
individual case is from the centroid in the space of the
predictors. Leverage is computed as the diagonal elements,
*h sub ii*, of the "Hat" matrix,*bold H*,bold H = bold X star ( bold X star prime bold X star ) sup -1 bold X star prime

where*bold X star = bold V sup 1/2 bold X*, and*bold V = diag { P Hat ( 1 - P Hat ) }*. As in OLS, leverage values are between 0 and 1, and a leverage value,*h sub ii > 2 k / n*is considered "large";*k*= number of predictors,*n*= number of cases. - residuals
- Pearson and deviance residuals are useful in identifying observations that are not explained well by the model. They are the (signed) square roots of the contribution that each case makes to the overall Pearson and deviance goodness-of-fit statistics.
- influence measures
- Measure the effect that deleting an
observation has on the regression parameters or the
goodness-of-fit statistics. (The terminology for these
measures is not well-established, so I use the terms from the
SAS/STAT User's Guide.)
- C
- CBAR
- Analogs of the Cook's D statistic in OLS,
these are two standardized measures of the
approximate change in all regression coefficients
when the
*i*-th case is deleted. - DIFCHISQ
- DIFDEV
- Approximate measures of the amount
*Delta chi²*by which the Pearson*chi²*(DIFCHISQ) or the likelihood-ratio deviance*chi²*(DIFDEV) would decrease if the*i*-th case were deleted. Values > 4 for either indicate "significant" change (since these are 1 df*chi²*or squared Normal values). - DFBETAij
- Measure the change in the logistic regression
coefficient
*b sub j*for predictor*x sub j*when observation*i*is deleted.

proc logistic data=arthrit ; model better = _sex_ _treat_ _age_ / influence;This produces a great deal of output, including the following:

The LOGISTIC Procedure Regression Diagnostics Deviance Residual Hat Matrix Diagonal Case (1 unit = 0.26) (1 unit = 0.01) Number Value -8 -4 0 2 4 6 8 Value 0 2 4 6 8 12 16 1 1.812 | * | | 0.089 | * | 2 0.360 | |* | 0.031 | * | 3 0.685 | | * | 0.087 | * | 4 0.425 | | * | 0.034 | * | 5 0.700 | | * | 0.086 | * | 6 0.488 | | * | 0.038 | * | 7 1.703 | * | | 0.084 | * | 8 0.499 | | * | 0.039 | * | 9 1.396 | * | | 0.066 | * | 10 0.511 | | * | 0.040 | * | 11 1.142 | * | | 0.064 | * | 12 0.523 | | * | 0.041 | * | 13 1.234 | | * | 0.065 | * | 14 0.599 | | * | 0.051 | * | 15 1.121 | * | | 0.065 | * | 16 0.599 | | * | 0.051 | * | 17 1.319 | | * | 0.069 | * | 18 0.640 | | * | 0.058 | * | 19 1.319 | | * | 0.069 | * | 20 0.640 | | * | 0.058 | * | 21 1.340 | | * | 0.070 | * | 22 1.814 | * | | 0.061 | * | 23 1.022 | * | | 0.070 | * | 24 0.529 | | * | 0.060 | * | 25 1.449 | | * | 0.078 | * | 26 0.619 | | * | 0.053 | * | 27 0.909 | * | | 0.080 | * | 28 0.619 | | * | 0.053 | * | 29 1.120 | | * | 0.141 | *| 30 1.846 | * | | 0.052 | * | 31 1.309 | | * | 0.092 | * | 32 0.647 | | * | 0.050 | * | 33 0.955 | * | | 0.070 | * | 34 1.803 | * | | 0.049 | * |

Such plots are produced by the INFLOGIS program. For example, these lines produce plots of DIFCHISQ against both the leverage (HAT) and predicted probability (PRED) using bubbles whose area is proportional to C:

title 'Arthritis treatment data'; title2 'Bubble size: Influence on Coefficients (C)'; goptions htext=1.6 ftext=duplex; %include inflogis; %include arthrit; %inflogis(data=arthrit, y=better, /* response */ x=_sex_ _treat_ age, /* predictors */ id=id, gy=DIFCHISQ, /* graph ordinate */ gx=PRED HAT, /* graph abscissas */ lcolor=RED, bsize=14 ); run;The printed output from the INFLOGIS program includes a table identifying any observation of high leverage or influence. For example, case 29 is of high leverage, because she is unusual in terms of the predictors: a young woman given treatment; however, she is not influential in the fitted model. Case 77, is not of high leverage, but is poorly predicted by the model and has large contributions to

CASE BETTER _SEX_ _TREAT_ AGE HAT DIFCHISQ DIFDEV C 1 1 0 1 27 .09 4.5781 3.6953 0.4510 22 1 0 0 63 .06 4.4603 3.5649 0.2898 29 0 1 1 23 .14 1.0183 1.4005 0.1679 30 1 1 0 31 .05 4.7485 3.6573 0.2611 34 1 1 0 33 .05 4.2955 3.4644 0.2236 55 0 1 1 58 .03 4.9697 3.6759 0.1602 77 0 1 1 69 .03 8.4977 4.7122 0.2758

Figure 32: Influence plot for arthritis data. Cases with DIFCHISQ > 4 or leverage

Figure 33: Changes in chi-square vs. predicted probability. The plot shows that most of the influential observations are those with very high or low predicted probabilities. The systematic pattern shown is inherent in the discrete nature of logistic regression.

[Previous] [Next] [Up] [Top]