Michael Friendly

However, more powerful analyses are often available:

- When either the row or column variables are *ordinal*, tests which take the order into account are more specific and often have greater statistical power.
- When additional classification variables exist, it may be important to control for these variables, or to determine if the association between the row and column variables is the same (*homogeneous*) across the levels of the control variables.

Consider the data below, which compare a treatment for rheumatoid arthritis to a placebo (Koch & Edwards, 1988). The outcome reflects whether individuals showed no improvement, some improvement, or marked improvement.

                       |         Outcome          |
    ---------+---------+--------------------------+
    Treatment| Sex     |None    |Some    |Marked  |  Total
    ---------+---------+--------+--------+--------+
    Active   | Female  |    6   |    5   |   16   |    27
             | Male    |    7   |    2   |    5   |    14
    ---------+---------+--------+--------+--------+
    Placebo  | Female  |   19   |    7   |    6   |    32
             | Male    |   10   |    0   |    1   |    11
    ---------+---------+--------+--------+--------+
    Total                  42       14       28        84

Here, the outcome variable is an ordinal one, and it is probably important to determine if the relation between treatment and outcome is the same for males and females.

    title 'Arthritis Treatment: PROC FREQ Analysis';
    data arth;
       input sex $ treat $ @;
       /* the first value is padded to 6 characters so that Marked is not truncated */
       do improve = 'None  ', 'Some', 'Marked';
          input count @;
          output;
          end;
    cards;
    Female  Active    6   5  16
    Female  Placebo  19   7   6
    Male    Active    7   2   5
    Male    Placebo  10   0   1
    ;
    *-- Ignoring sex;
    proc freq order=data;
       weight count;
       tables treat * improve / cmh chisq nocol nopercent;
       run;

- TREAT and IMPROVE are both character variables, which PROC FREQ orders alphabetically (i.e., 'Marked', 'None', 'Some') by default. Because I want to treat the IMPROVE variable as ordinal, I used `order=data` on the `proc freq` statement to have the levels of IMPROVE ordered by their order of appearance in the dataset.
- The `chisq` option gives the usual *chi²* tests (Pearson, Fisher's, etc.). The `cmh` option requests the Cochran-Mantel-Haenszel tests for ordinal variables.

    TABLE OF TREAT BY IMPROVE

    TREAT     IMPROVE

    Frequency|
    Row Pct  |None    |Some    |Marked  |  Total
    ---------+--------+--------+--------+
    Active   |     13 |      7 |     21 |     41
             |  31.71 |  17.07 |  51.22 |
    ---------+--------+--------+--------+
    Placebo  |     29 |      7 |      7 |     43
             |  67.44 |  16.28 |  16.28 |
    ---------+--------+--------+--------+
    Total          42       14       28       84

The results for the `chisq` option are shown below.

    STATISTICS FOR TABLE OF TREAT BY IMPROVE

    Statistic                     DF     Value        Prob
    ------------------------------------------------------
    Chi-Square                     2    13.055       0.001
    Likelihood Ratio Chi-Square    2    13.530       0.001
    Mantel-Haenszel Chi-Square     1    12.859       0.000
    Phi Coefficient                      0.394
    Contingency Coefficient              0.367
    Cramer's V                           0.394
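As a cross-check outside SAS, the Pearson and likelihood ratio statistics can be recomputed from the collapsed treatment-by-improvement counts with the textbook formulas. The sketch below is in Python rather than SAS and is not part of the original analysis; the function name `chisq_tests` is only illustrative. It should give values that agree with the listing to three decimals.

```python
import math

# Treatment-by-improvement counts, collapsed over sex.
# Rows: Active, Placebo; columns: None, Some, Marked.
table = [[13, 7, 21],
         [29, 7, 7]]

def chisq_tests(table):
    """Return (Pearson chi-square, likelihood-ratio chi-square, df)."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    pearson = 0.0
    lr = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n      # expected count under independence
            pearson += (obs - exp) ** 2 / exp
            if obs > 0:                            # a zero count contributes nothing
                lr += 2 * obs * math.log(obs / exp)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    return pearson, lr, df

pearson, lr, df = chisq_tests(table)
n = sum(map(sum, table))
# Cramer's V = sqrt(X2 / (N * (min(r, c) - 1)))
cramers_v = math.sqrt(pearson / (n * (min(len(table), len(table[0])) - 1)))
print(f"Pearson = {pearson:.3f}, LR = {lr:.3f}, df = {df}, V = {cramers_v:.3f}")
```

Both statistics ignore the ordering of the outcome categories, which is why they have *(r - 1) (c - 1)* = 2 df here.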

For the arthritis data, the Cochran-Mantel-Haenszel tests (`cmh` option) give the following output.

    SUMMARY STATISTICS FOR TREAT BY IMPROVE

    Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

    Statistic    Alternative Hypothesis    DF       Value      Prob
    --------------------------------------------------------------
        1        Nonzero Correlation        1      12.859     0.000
        2        Row Mean Scores Differ     1      12.859     0.000
        3        General Association        2      12.900     0.002

The three types of tests differ in the types of departure from independence they are sensitive to:

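- **General Association**: When the row and column variables are both nominal (unordered), the only alternative hypothesis of interest is that there is *some* association between the row and column variables. The CMH test statistic is similar to the (Pearson) Chi-Square and Likelihood Ratio Chi-Square in the Statistics table; all have *(r - 1) (c - 1)* df.
- **Mean score differences**: If the column variable is ordinal, assigning scores to the column variable produces a mean score for each row. The association can then be tested by asking whether these means differ over the rows of the table, with *r - 1* df. This is analogous to the Kruskal-Wallis non-parametric test (ANOVA based on rank scores).
- **Linear association**: When **both** row and column variables are ordinal, we could assign scores to both variables and compute the correlation *r*. The Mantel-Haenszel *chi²* is equal to *(N - 1) r²*, where N is the total sample size. The test is most sensitive to a pattern where the row mean score changes linearly over the rows.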

- Different kinds of scores can be assigned using the `scores=` option on the `tables` statement, but only the relative spacing of the scores is important.
- When only one variable is ordinal, make it the **last** one on the `tables` statement, because PROC FREQ only computes means across the column variable.
- When there are only r=2 rows (as here), the correlation and row means tests are equivalent.
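The relation *(N - 1) r²* and the remark about score spacing can be checked numerically. The following Python sketch (an illustration, not part of the SAS program) assumes the default table scores are the integers 1, 2, ... for rows and columns; `correlation_chisq` is a hypothetical helper name.

```python
table = [[13, 7, 21],   # Active:  None, Some, Marked
         [29, 7, 7]]    # Placebo: None, Some, Marked

def correlation_chisq(table, row_scores, col_scores):
    """Return (N - 1) * r**2, with r the Pearson correlation of the scores."""
    n = sum(map(sum, table))
    xbar = sum(row_scores[i] * sum(row) for i, row in enumerate(table)) / n
    ybar = sum(col_scores[j] * sum(col) for j, col in enumerate(zip(*table))) / n
    sxx = sxy = syy = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            dx = row_scores[i] - xbar
            dy = col_scores[j] - ybar
            sxx += count * dx * dx
            syy += count * dy * dy
            sxy += count * dx * dy
    return (n - 1) * sxy ** 2 / (sxx * syy)

# Integer scores, as assumed for the default table scores
q_integer = correlation_chisq(table, [1, 2], [1, 2, 3])
# Shifted and rescaled scores with the same relative spacing
q_rescaled = correlation_chisq(table, [0, 1], [10, 20, 30])
# Scores with different relative spacing
q_unequal = correlation_chisq(table, [1, 2], [1, 2, 5])

print(f"{q_integer:.3f} {q_rescaled:.3f} {q_unequal:.3f}")
```

The first two values coincide because a linear change of scores leaves the correlation unchanged. The third differs, since it puts 'Marked' further from 'Some' than 'Some' is from 'None'.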

In the table below, the row and column variables are clearly associated, yet every row has the same mean score:

            |  b1   |  b2   |  b3   |  b4   |  b5   |  Total   Mean
    --------+-------+-------+-------+-------+-------+
      a1    |    0  |   15  |   25  |   15  |    0  |    55     3.0
      a2    |    5  |   20  |    5  |   20  |    5  |    55     3.0
      a3    |   20  |    5  |    5  |    5  |   20  |    55     3.0
    --------+-------+-------+-------+-------+-------+
    Total       25      40      35      40      25     165

This is reflected in the PROC FREQ output:

    Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

    Statistic    Alternative Hypothesis    DF       Value      Prob
    --------------------------------------------------------------
        1        Nonzero Correlation        1       0.000     1.000
        2        Row Mean Scores Differ     2       0.000     1.000
        3        General Association        8      91.797     0.000

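In the following table, by contrast, the row means decrease steadily from a1 to a4:

            |  b1   |  b2   |  b3   |  b4   |  b5   |  Total   Mean
    --------+-------+-------+-------+-------+-------+
      a1    |    2  |    5  |    8  |    8  |    8  |    31     3.48
      a2    |    2  |    8  |    8  |    8  |    5  |    31     3.19
      a3    |    5  |    8  |    8  |    8  |    2  |    31     2.81
      a4    |    8  |    8  |    8  |    5  |    2  |    31     2.52
    --------+-------+-------+-------+-------+-------+
    Total       17      29      32      29      17     124

Note that the *chi²* values for the nonzero correlation and row mean scores tests are nearly equal, but the correlation test is more highly significant because it uses only 1 df.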

    Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

    Statistic    Alternative Hypothesis    DF       Value      Prob
    --------------------------------------------------------------
        1        Nonzero Correlation        1      10.639     0.001
        2        Row Mean Scores Differ     3      10.676     0.014
        3        General Association       12      13.400     0.341

The differences in sensitivity and power among these tests are analogous to the difference between general ANOVA tests and tests for linear trend in experimental designs with quantitative factors.
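For a single (unstratified) table, all three CMH statistics have simple closed forms, so the two contrived examples can be recomputed directly. The Python sketch below is an illustration under stated assumptions, not PROC FREQ's own code: it uses integer scores 1, 2, ... for rows and columns, and the standard one-table formulas (correlation: *(N - 1) r²*; row means: *(N - 1)* times the between-row share of the score variation; general association: *(N - 1) / N* times the Pearson chi-square). The name `cmh_single_table` is hypothetical.

```python
def cmh_single_table(table):
    """CMH statistics for one r x c table with integer scores 1, 2, 3, ...

    Returns (correlation, row_mean_scores, general_association).
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x = range(1, len(row_tot) + 1)          # row scores
    y = range(1, len(col_tot) + 1)          # column scores
    xbar = sum(xi * t for xi, t in zip(x, row_tot)) / n
    ybar = sum(yj * t for yj, t in zip(y, col_tot)) / n
    sxx = sum(t * (xi - xbar) ** 2 for xi, t in zip(x, row_tot))
    syy = sum(t * (yj - ybar) ** 2 for yj, t in zip(y, col_tot))
    # mean column score within each row
    row_means = [sum(yj * nij for yj, nij in zip(y, row)) / t
                 for row, t in zip(table, row_tot)]
    sxy = sum(t * (xi - xbar) * (m - ybar)
              for xi, t, m in zip(x, row_tot, row_means))
    correlation = (n - 1) * sxy ** 2 / (sxx * syy)
    between = sum(t * (m - ybar) ** 2 for t, m in zip(row_tot, row_means))
    row_mean_scores = (n - 1) * between / syy
    pearson = sum((nij - ri * cj / n) ** 2 / (ri * cj / n)
                  for row, ri in zip(table, row_tot)
                  for nij, cj in zip(row, col_tot))
    general = (n - 1) / n * pearson
    return correlation, row_mean_scores, general

general_assoc = [[0, 15, 25, 15, 0],
                 [5, 20, 5, 20, 5],
                 [20, 5, 5, 5, 20]]
linear_assoc = [[2, 5, 8, 8, 8],
                [2, 8, 8, 8, 5],
                [5, 8, 8, 8, 2],
                [8, 8, 8, 5, 2]]

general_stats = cmh_single_table(general_assoc)
linear_stats = cmh_single_table(linear_assoc)
print([round(v, 3) for v in general_stats])
print([round(v, 3) for v in linear_stats])
```

In the first table the equal row means force the two ordinal statistics to zero, while the second table concentrates almost all of its association in the single linear degree of freedom.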

Figure 5: General association (sieve diagram)

Figure 6: Linear association (sieve diagram)

A stratified analysis

- controls for the effects of one or more background variables. This is similar to the use of a blocking variable in an ANOVA design.
- is obtained by including more than two variables in the `tables` statement. List the stratification variables **first**. To examine the association between TREAT and IMPROVE, controlling for both SEX and AGE (if available), use `tables age * sex * treat * improve;`

    *-- Stratified analysis, controlling for sex;
    proc freq order=data;
       weight count;
       tables sex * treat * improve / cmh chisq nocol nopercent;
       run;

PROC FREQ gives a separate table for each level of the stratification variables, plus overall (partial) tests controlling for the stratification variables.

    TABLE 1 OF TREAT BY IMPROVE
    CONTROLLING FOR SEX=Female

    TREAT     IMPROVE

    Frequency|
    Row Pct  |None    |Some    |Marked  |  Total
    ---------+--------+--------+--------+
    Active   |      6 |      5 |     16 |     27
             |  22.22 |  18.52 |  59.26 |
    ---------+--------+--------+--------+
    Placebo  |     19 |      7 |      6 |     32
             |  59.38 |  21.88 |  18.75 |
    ---------+--------+--------+--------+
    Total          25       12       22       59

    STATISTICS FOR TABLE 1 OF TREAT BY IMPROVE
    CONTROLLING FOR SEX=Female

    Statistic                     DF     Value        Prob
    ------------------------------------------------------
    Chi-Square                     2    11.296       0.004
    Likelihood Ratio Chi-Square    2    11.731       0.003
    Mantel-Haenszel Chi-Square     1    10.935       0.001
    Phi Coefficient                      0.438
    Contingency Coefficient              0.401
    Cramer's V                           0.438

Note that the association between treatment and outcome is quite strong for females. In contrast, the results for males (below) show a non-significant association, even by the Mantel-Haenszel test; but note that there are too few males for the general association *chi²* tests to be reliable (see the warning in the output).

    TABLE 2 OF TREAT BY IMPROVE
    CONTROLLING FOR SEX=Male

    TREAT     IMPROVE

    Frequency|
    Row Pct  |None    |Some    |Marked  |  Total
    ---------+--------+--------+--------+
    Active   |      7 |      2 |      5 |     14
             |  50.00 |  14.29 |  35.71 |
    ---------+--------+--------+--------+
    Placebo  |     10 |      0 |      1 |     11
             |  90.91 |   0.00 |   9.09 |
    ---------+--------+--------+--------+
    Total          17        2        6       25

    STATISTICS FOR TABLE 2 OF TREAT BY IMPROVE
    CONTROLLING FOR SEX=Male

    Statistic                     DF     Value        Prob
    ------------------------------------------------------
    Chi-Square                     2     4.907       0.086
    Likelihood Ratio Chi-Square    2     5.855       0.054
    Mantel-Haenszel Chi-Square     1     3.713       0.054
    Phi Coefficient                      0.443
    Contingency Coefficient              0.405
    Cramer's V                           0.443

    WARNING: 67% of the cells have expected counts less
             than 5. Chi-Square may not be a valid test.

The individual tables are followed by the (overall) partial tests of association controlling for sex. Unlike the tests for each stratum, these tests do not require a large sample size in the individual strata, only a large total sample size. They do assume, however, that the association between treatment and outcome is the same in each stratum.

    SUMMARY STATISTICS FOR TREAT BY IMPROVE
    CONTROLLING FOR SEX

    Cochran-Mantel-Haenszel Statistics (Based on Table Scores)

    Statistic    Alternative Hypothesis    DF       Value      Prob
    --------------------------------------------------------------
        1        Nonzero Correlation        1      14.632     0.000
        2        Row Mean Scores Differ     1      14.632     0.000
        3        General Association        2      14.632     0.001
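The stratified correlation statistic pools information across strata: within each stratum it takes the centered cross-product of the scores and its variance under independence, sums both over strata, and forms a single 1 df chi-square. The Python sketch below illustrates that calculation for the two sex strata, assuming integer scores for treatment and improvement; `score_sums` is a hypothetical helper, and this is a sketch of the standard formula rather than PROC FREQ's implementation.

```python
# Treatment-by-improvement counts within each sex
# (rows: Active, Placebo; columns: None, Some, Marked).
strata = {
    "Female": [[6, 5, 16], [19, 7, 6]],
    "Male":   [[7, 2, 5], [10, 0, 1]],
}

def score_sums(table):
    """Return (S_xy, Var(S_xy) under independence) using integer scores."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x = range(1, len(row_tot) + 1)
    y = range(1, len(col_tot) + 1)
    xbar = sum(xi * t for xi, t in zip(x, row_tot)) / n
    ybar = sum(yj * t for yj, t in zip(y, col_tot)) / n
    sxx = sum(t * (xi - xbar) ** 2 for xi, t in zip(x, row_tot))
    syy = sum(t * (yj - ybar) ** 2 for yj, t in zip(y, col_tot))
    sxy = sum(nij * (xi - xbar) * (yj - ybar)
              for xi, row in zip(x, table)
              for yj, nij in zip(y, row))
    return sxy, sxx * syy / (n - 1)

per_stratum = {}
total_sxy = 0.0
total_var = 0.0
for sex, table in strata.items():
    sxy, var = score_sums(table)
    per_stratum[sex] = sxy ** 2 / var   # Mantel-Haenszel chi-square in this stratum
    total_sxy += sxy
    total_var += var

# Pooled (partial) test of nonzero correlation, controlling for sex
cmh_correlation = total_sxy ** 2 / total_var
print({k: round(v, 3) for k, v in per_stratum.items()}, round(cmh_correlation, 3))
```

The per-stratum values should match the Mantel-Haenszel Chi-Square lines in the two tables above. The pooled value is larger than either because the association has the same direction in both strata.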

For the arthritis data, homogeneity means that there is no three-way sex * treatment * outcome association. This hypothesis can be stated as the loglinear model,

    [SexTreat] [SexOutcome] [TreatOutcome],

which allows associations between sex and treatment (e.g., more males get the Active treatment) and between sex and outcome (e.g., females are more likely to show marked improvement). In the PROC CATMOD step below, the `loglin` statement specifies this model as `sex|treat|improve@2`, which means all terms up to two-way associations.

    title2 'Test homogeneity of treat*improve association';
    data arth;
       set arth;
       if count=0 then count=1E-20;
    proc catmod order=data;
       weight count;
       model sex * treat * improve = _response_ /
             ml noiter noresponse nodesign nogls ;
       loglin sex|treat|improve@2 / title='No 3-way association';
    run;
       loglin sex treat|improve   / title='No Sex Associations';
    run;

(Frequencies of zero can be regarded as either "structural zeros", cells which could not occur, or as "sampling zeros", cells which simply did not occur. PROC CATMOD treats zero frequencies as "structural zeros", which means that cells with a zero count are excluded from the analysis. The DATA step above replaces the zero count with a very small positive number so that the cell is kept as a sampling zero.)

In the output from PROC CATMOD, the likelihood ratio *chi²* (the badness-of-fit for the No 3-Way model) is the test for homogeneity across sex. This is clearly non-significant, so the treatment-outcome association can be considered to be the same for men and women.

    Test homogeneity of treat*improve association
    No 3-way association
    MAXIMUM-LIKELIHOOD ANALYSIS-OF-VARIANCE TABLE

    Source                   DF   Chi-Square      Prob
    --------------------------------------------------
    SEX                       1        14.13    0.0002
    TREAT                     1         1.32    0.2512
    SEX*TREAT                 1         2.93    0.0871
    IMPROVE                   2        13.61    0.0011
    SEX*IMPROVE               2         6.51    0.0386
    TREAT*IMPROVE             2        13.36    0.0013

    LIKELIHOOD RATIO          2         1.70    0.4267

Note that the associations of sex with treatment and sex with outcome are both small and of borderline significance, which suggests a stronger form of homogeneity, the log-linear model [Sex] [TreatOutcome], which says the only association is that between treatment and outcome. This model is tested by the second `loglin` statement in the PROC CATMOD step above.

    No Sex Associations
    MAXIMUM-LIKELIHOOD ANALYSIS-OF-VARIANCE TABLE

    Source                   DF   Chi-Square      Prob
    --------------------------------------------------
    SEX                       1        12.95    0.0003
    TREAT                     1         0.15    0.6991
    IMPROVE                   2        10.99    0.0041
    TREAT*IMPROVE             2        12.00    0.0025

    LIKELIHOOD RATIO          5         9.81    0.0809
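The two LIKELIHOOD RATIO lines are goodness-of-fit statistics that compare the observed counts with the fitted counts under each model. As a rough cross-check, the Python sketch below (an illustration, not the method PROC CATMOD uses internally) fits the [Sex] [TreatOutcome] model in closed form and the no-three-way model by iterative proportional fitting, then computes the likelihood ratio statistic for each. Leaving the zero cell as a sampling zero is an assumption of the sketch; all variable names are hypothetical.

```python
import math

# counts[sex][treat][improve]
# sex: Female, Male; treat: Active, Placebo; improve: None, Some, Marked
counts = [[[6, 5, 16], [19, 7, 6]],
          [[7, 2, 5], [10, 0, 1]]]
S, T, I = 2, 2, 3
N = sum(counts[s][t][i] for s in range(S) for t in range(T) for i in range(I))

def g2(fitted):
    """Likelihood-ratio chi-square; a zero observed count contributes nothing."""
    return 2 * sum(counts[s][t][i] * math.log(counts[s][t][i] / fitted[s][t][i])
                   for s in range(S) for t in range(T) for i in range(I)
                   if counts[s][t][i] > 0)

# Model [Sex] [TreatOutcome]: fitted = n(sex) * n(treat, improve) / N
sex_tot = [sum(map(sum, counts[s])) for s in range(S)]
ti_tot = [[sum(counts[s][t][i] for s in range(S)) for i in range(I)]
          for t in range(T)]
fit_no_sex = [[[sex_tot[s] * ti_tot[t][i] / N for i in range(I)]
               for t in range(T)] for s in range(S)]

# Model [SexTreat] [SexOutcome] [TreatOutcome]: iterative proportional fitting,
# rescaling the fitted counts to each observed two-way margin in turn
fit = [[[1.0] * I for _ in range(T)] for _ in range(S)]
for _ in range(500):
    for s in range(S):                     # sex x treat margin
        for t in range(T):
            f = sum(counts[s][t]) / sum(fit[s][t])
            fit[s][t] = [m * f for m in fit[s][t]]
    for s in range(S):                     # sex x improve margin
        for i in range(I):
            f = (sum(counts[s][t][i] for t in range(T))
                 / sum(fit[s][t][i] for t in range(T)))
            for t in range(T):
                fit[s][t][i] *= f
    for t in range(T):                     # treat x improve margin
        for i in range(I):
            f = ti_tot[t][i] / sum(fit[s][t][i] for s in range(S))
            for s in range(S):
                fit[s][t][i] *= f

g2_no_sex = g2(fit_no_sex)     # 5 df
g2_no_3way = g2(fit)           # 2 df
print(f"No sex associations: {g2_no_sex:.2f}; no 3-way association: {g2_no_3way:.2f}")
```

The printed values can be compared with the LIKELIHOOD RATIO lines in the two CATMOD tables. The difference between them gives a 3 df test of the two sex associations taken together.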
