Michael Friendly

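For a two-way table, the scores for the row categories, $x_{im}$, and for the column categories, $y_{jm}$, on dimension $m = 1, \ldots, M$ are derived from a singular value decomposition of the residuals from independence, expressed as $d_{ij} / \sqrt{n}$. The decomposition is chosen to account for the largest proportion of the $\chi^2$ in a small number of dimensions. It may be expressed as

$$\frac{d_{ij}}{\sqrt{n}} = \sum_{m=1}^{M} \lambda_m \, x_{im} \, y_{jm} \qquad (10)$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_M$ are the singular values and $M = \min(I-1, J-1)$. The proportion of the $\chi^2$ accounted for by the first $d$ dimensions is

$$n \sum_{m=1}^{d} \lambda_m^2 \Big/ \chi^2$$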

Thus, correspondence analysis is designed to show how the data
deviate from expectation when the row and column variables are
independent, as in the association plot and mosaic display. However,
the association plot and mosaic display depict every *cell* in
the table, and for large tables it may be difficult to see patterns.
Correspondence analysis shows only row and column *categories*
in the two (or three) dimensions which account for the greatest
proportion of deviation from independence.
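The decomposition in equation (10) can be checked numerically. The following is a minimal Python/NumPy sketch, not the CORRESP procedure itself; the function names `ca_decomposition` and `proportion_explained` are hypothetical. It assumes, as in the association plot, that $d_{ij} = (f_{ij} - e_{ij})/\sqrt{e_{ij}}$ is the Pearson residual, and it uses the hair-color and eye-color counts from the example that follows.

```python
import numpy as np

def ca_decomposition(table):
    """Return the nonzero singular values of d_ij / sqrt(n), the Pearson chi-square, and n."""
    f = np.asarray(table, dtype=float)
    n = f.sum()
    e = np.outer(f.sum(axis=1), f.sum(axis=0)) / n   # expected counts under independence
    d = (f - e) / np.sqrt(e)                         # Pearson residuals d_ij
    chi2 = float((d ** 2).sum())
    # Singular values come back sorted in decreasing order
    lam = np.linalg.svd(d / np.sqrt(n), compute_uv=False)
    M = min(f.shape) - 1                             # M = min(I - 1, J - 1)
    return lam[:M], chi2, n

def proportion_explained(lam, chi2, n, d):
    """Proportion of chi-square accounted for by the first d dimensions."""
    return n * float((lam[:d] ** 2).sum()) / chi2

# Rows: eye color (Brown, Blue, Hazel, Green); columns: hair color (BLACK, BROWN, RED, BLOND)
hair_eye = [[68, 119, 26, 7],
            [20, 84, 17, 94],
            [15, 54, 14, 10],
            [5, 29, 14, 16]]

lam, chi2, n = ca_decomposition(hair_eye)
print(np.round(lam, 5))
print(round(chi2, 2))
print(round(proportion_explained(lam, chi2, n, 2), 4))
```

Because all $M$ nonzero singular values are kept, $n \sum_m \lambda_m^2$ reproduces the full $\chi^2$; truncating at $d = 2$ gives the proportion reported in the "Inertia and Chi-Square Decomposition" output.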

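The CORRESP procedure accepts input data in either of two forms:

- a two-way contingency table
- raw category responses on two or more classification variables

In the example below, the hair-color by eye-color contingency table is read directly: each observation is one eye color, and the variables BLACK, BROWN, RED, and BLOND hold the counts for the four hair colors.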

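    /* Hair color (columns) by eye color (rows) contingency table */
    data colors;
       input BLACK BROWN RED BLOND EYE $;
       cards;
    68 119 26  7 Brown
    20  84 17 94 Blue
    15  54 14 10 Hazel
     5  29 14 16 Green
    ;
    run;

    /* Correspondence analysis; OUT= saves the row and column coordinates */
    proc corresp data=colors out=coord short;
       var black brown red blond;   /* columns of the table */
       id eye;                      /* labels for the rows  */
    run;

    proc print data=coord;
    run;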

The printed output from the CORRESP procedure is shown below.
The section labeled "Inertia and Chi-Square Decomposition" indicates
that over 98% (89.37% + 9.51% = 98.88%) of the $\chi^2$ for association
is accounted for by two dimensions, with most of that attributed to the
first dimension.

                     The Correspondence Analysis Procedure

                      Inertia and Chi-Square Decomposition

     Singular  Principal     Chi-
       Values   Inertias   Squares  Percents    18   36   54   72   90
                                            ----+----+----+----+----+---
      0.45692    0.20877   123.593    89.37%  *************************
      0.14909    0.02223    13.158     9.51%  ***
      0.05097    0.00260     1.538     1.11%
                 -------   -------
                 0.23360    138.29   (Degrees of Freedom = 9)

                            Row Coordinates

                              Dim1        Dim2

              Brown       -.492158    -.088322
              Blue        0.547414    -.082954
              Hazel       -.212597    0.167391
              Green       0.161753    0.339040

                           Column Coordinates

                              Dim1        Dim2

              BLACK       -.504562    -.214820
              BROWN       -.148253    0.032666
              RED         -.129523    0.319642
              BLOND       0.835348    -.069579

The singular values, $\lambda_m$, are those of equation (10); the principal inertias are their squares, $\lambda_m^2$, and the chi-square values are $n \lambda_m^2$ (here $n = 592$), so that the total inertia, 0.23360, equals $\chi^2 / n = 138.29 / 592$.

A plot of the row and column points can be constructed from the OUT= data set COORD requested in the PROC CORRESP step. The variables of interest in this example are shown below. Note that row and column points are distinguished by the variable _TYPE_ (OBS for rows, VAR for columns).

     OBS    _TYPE_     EYE        DIM1        DIM2

       1    INERTIA                 .           .
       2    OBS        Brown    -0.49216    -0.08832
       3    OBS        Blue      0.54741    -0.08295
       4    OBS        Hazel    -0.21260     0.16739
       5    OBS        Green     0.16175     0.33904
       6    VAR        BLACK    -0.50456    -0.21482
       7    VAR        BROWN    -0.14825     0.03267
       8    VAR        RED      -0.12952     0.31964
       9    VAR        BLOND     0.83535    -0.06958

The interpretation of the correspondence analysis results is facilitated by a plot of the row and column points, produced here with PROC PLOT:

    proc plot data=coord vtoh=2;
       /* plot DIM2 against DIM1 with symbol '*', labeling each point by EYE */
       plot dim2 * dim1 = '*' $ eye / box haxis=by .1 vaxis=by .1;
    run;

[Printer plot of DIM2*DIM1$EYE, symbol used is '*': DIM2 on the vertical axis (-0.3 to 0.4), DIM1 on the horizontal axis (-0.5 to 0.9), each point labeled by EYE]

A labeled high-resolution display of the correspondence analysis solution (Figure 21) is constructed with PROC GPLOT, using a DATA step to produce an Annotate data set, LABEL, from the COORD data set. In the PROC GPLOT step, the axes are equated by the AXIS statements: AXIS1 specifies a length and a range that are both twice those in the AXIS2 statement, so that the ratio of data units to plot units is the same in both dimensions.

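    /* Annotate data set: one text label per row or column point */
    data label;
       set coord;
       where _type_ ne 'INERTIA';   /* skip the observation holding the inertia */
       xsys = '2';  ysys = '2';     /* x and y are in data coordinates */
       x = dim1;    y = dim2;
       text = eye;
       size = 1.3;
       function = 'LABEL';
       if _type_ = 'VAR' then color = 'RED ';   /* column points */
       else color = 'BLUE';                     /* row points    */
    run;

    /* Equated axes: 6 in for 2 data units, 3 in for 1 data unit */
    axis1 length=6 in order=(-1. to 1. by .5) label=(h=1.5 'Dimension 1');
    axis2 length=3 in order=(-.5 to .5 by .5) label=(h=1.5 a=90 r=0 'Dimension 2');
    symbol v=none;

    proc gplot data=coord;
       plot dim2 * dim1 / anno=label frame
                          href=0 vref=0 lvref=3 lhref=3
                          vaxis=axis2 haxis=axis1 vminor=1 hminor=1;
    run;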
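The equal-scaling argument is simple arithmetic on the AXIS1 and AXIS2 settings. The Python sketch below (the helper name `data_units_per_inch` is hypothetical, and it ignores any axis offsets SAS/GRAPH may add) computes how many data units one inch of each axis spans.

```python
def data_units_per_inch(lo, hi, length_in):
    """Data units spanned by one inch of an axis with ORDER=(lo TO hi) and LENGTH=length_in IN."""
    return (hi - lo) / length_in

h = data_units_per_inch(-1.0, 1.0, 6.0)   # AXIS1: horizontal, Dimension 1
v = data_units_per_inch(-0.5, 0.5, 3.0)   # AXIS2: vertical, Dimension 2
print(h, v, h == v)
```

Both axes give one third of a data unit per inch, so a given distance between points has the same physical length whichever dimension it lies along.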

Figure 21: Correspondence analysis solution for Hair color, Eye color data
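The row and column coordinates printed by PROC CORRESP can be reproduced, up to the sign of each dimension, from the singular value decomposition. This NumPy sketch assumes the printed values are principal coordinates (singular vectors scaled by $\lambda_m$ and divided by the square root of the row or column mass); that assumption is consistent with the output above, since the mass-weighted sum of squares of each column of coordinates equals the corresponding principal inertia. The function name is hypothetical, and NumPy's SVD may return either sign for a dimension.

```python
import numpy as np

def ca_principal_coordinates(table, ndim=2):
    """Principal coordinates of rows and columns on the first ndim dimensions."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                                      # correspondence matrix
    r = p.sum(axis=1)                                    # row masses
    c = p.sum(axis=0)                                    # column masses
    s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))   # equals d_ij / sqrt(n)
    u, lam, vt = np.linalg.svd(s, full_matrices=False)
    rows = u[:, :ndim] * lam[:ndim] / np.sqrt(r)[:, None]
    cols = vt.T[:, :ndim] * lam[:ndim] / np.sqrt(c)[:, None]
    return rows, cols, lam[:ndim]

# Rows: eye color (Brown, Blue, Hazel, Green); columns: hair color (BLACK, BROWN, RED, BLOND)
hair_eye = [[68, 119, 26, 7],
            [20, 84, 17, 94],
            [15, 54, 14, 10],
            [5, 29, 14, 16]]

rows, cols, lam = ca_principal_coordinates(hair_eye)
print(np.round(rows, 4))
print(np.round(cols, 4))
```

Flipping the sign of a dimension for rows and columns together leaves every distance and direction in the plot unchanged, so a sign difference from the SAS output does not affect the interpretation.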

A three-way table can be analyzed in the same way by reshaping it as a
two-way table of size $(I \times J) \times K$, whose $I \times J$ rows
represent the joint combinations of variables A and B. For such a
table, the correspondence analysis solution analyzes residuals from the
log-linear model [AB] [C]. The expected frequencies under independence
for this table are

$$e_{[ij]k} = \frac{f_{[ij]+} \, f_{[+]k}}{n} = \frac{f_{ij+} \, f_{++k}}{n} ,$$

which are the ML estimates of expected frequencies for the log-linear model [AB] [C].

The table below gives the frequencies of suicide classified by sex, age group, and method of suicide.

     Sex  Age     POISON   GAS   HANG  DROWN   GUN  JUMP

     M    10-20     1160   335   1524     67   512   189
     M    25-35     2823   883   2751    213   852   366
     M    40-50     2465   625   3936    247   875   244
     M    55-65     1531   201   3581    207   477   273
     M    70-90      938    45   2948    212   229   268

     F    10-20      921    40    212     30    25   131
     F    25-35     1672   113    575    139    64   276
     F    40-50     2224    91   1481    354    52   327
     F    55-65     2283    45   2014    679    29   388
     F    70-90     1548    29   1355    501     3   383
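The expected-frequency formula for [AB] [C] can be checked on these counts. In this NumPy sketch (the array layout and names are our own), the counts are held as a sex by age by method array; the expected values computed from the [AB] and [C] margins coincide with those from independence in the stacked $10 \times 6$ table, and they reproduce both fitted margins, as ML estimates for [AB] [C] must.

```python
import numpy as np

# Suicide counts: sex (M, F) x age group x method (POISON, GAS, HANG, DROWN, GUN, JUMP)
f = np.array([
    [[1160, 335, 1524,  67, 512, 189],
     [2823, 883, 2751, 213, 852, 366],
     [2465, 625, 3936, 247, 875, 244],
     [1531, 201, 3581, 207, 477, 273],
     [ 938,  45, 2948, 212, 229, 268]],
    [[ 921,  40,  212,  30,  25, 131],
     [1672, 113,  575, 139,  64, 276],
     [2224,  91, 1481, 354,  52, 327],
     [2283,  45, 2014, 679,  29, 388],
     [1548,  29, 1355, 501,   3, 383]],
], dtype=float)

n = f.sum()
f_ab = f.sum(axis=2)                              # f_{ij+}: sex-by-age margin
f_c = f.sum(axis=(0, 1))                          # f_{++k}: method margin
e = f_ab[:, :, None] * f_c[None, None, :] / n     # e_{[ij]k} under [AB][C]

# The same values from independence in the stacked (I*J) x K table
stacked = f.reshape(-1, f.shape[2])
e_stacked = np.outer(stacked.sum(axis=1), stacked.sum(axis=0)) / n
print(int(n), np.allclose(e.reshape(-1, f.shape[2]), e_stacked))
```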

The table below shows the results of fitting all possible hierarchical
log-linear models to the suicide data. None of these models fits the
data acceptably. Given the enormous sample size ($n = 48{,}961$),
however, even relatively small departures from the expected frequencies
under any model would appear significant.
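To see why sample size matters: for fixed cell proportions, the Pearson $\chi^2$ grows in direct proportion to $n$ while the degrees of freedom stay the same. The sketch below uses a hypothetical $2 \times 2$ table with a weak association (26% versus 24%), not the suicide data; the 5% critical value of $\chi^2$ on 1 df is 3.84.

```python
import numpy as np

def pearson_chi2(table):
    """Pearson chi-square for independence in a two-way table."""
    f = np.asarray(table, dtype=float)
    e = np.outer(f.sum(axis=1), f.sum(axis=0)) / f.sum()
    return float(((f - e) ** 2 / e).sum())

# Hypothetical 2 x 2 table with a weak association
weak = np.array([[26, 24],
                 [24, 26]])

for k in (1, 10, 100, 480):
    table = weak * k
    print(f"n = {table.sum():6d}  chi-square = {pearson_chi2(table):6.2f}")
```

The same 2-point difference in proportions is far from significant at $n = 100$ but highly significant at $n = 48{,}000$.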

     Model             df    L.R. G²   Pearson χ²

     [M] [A] [S]       49   10119.60      9908.24
     [M] [AS]          45    8632.0       8371.3
     [A] [MS]          44    4719.0       4387.7
     [S] [MA]          29    7029.2       6485.5
     [MS] [AS]         40    3231.5       3030.5
     [MA] [AS]         25    5541.6       5135.0
     [MA] [MS]         24    1628.6       1592.4
     [MA] [MS] [AS]    20     242.0        237.0

Correspondence analysis applied to the [AS] by [M] table helps to
show the nature of the association between method of suicide and the
joint age-sex combinations, and decomposes the $\chi^2 = 8371$
for the log-linear model [AS] [M]. To carry out the analysis
with the data as shown above, the variables `age` and
`sex` are combined into a single variable `sexage`.
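Stacking sex and age into one row variable and testing independence in the resulting $10 \times 6$ table reproduces the fit of [AS] [M]. This NumPy sketch builds hypothetical `sexage` labels by string concatenation (the actual SAS variable may be coded differently) and computes the Pearson $\chi^2$, the likelihood-ratio $G^2$, and the degrees of freedom; they should match the [M] [AS] row of the model table.

```python
import numpy as np

methods = ["POISON", "GAS", "HANG", "DROWN", "GUN", "JUMP"]
sex = ["M"] * 5 + ["F"] * 5
age = ["10-20", "25-35", "40-50", "55-65", "70-90"] * 2
sexage = [s + a for s, a in zip(sex, age)]   # hypothetical labels, e.g. "M10-20"

counts = np.array([
    [1160, 335, 1524,  67, 512, 189],
    [2823, 883, 2751, 213, 852, 366],
    [2465, 625, 3936, 247, 875, 244],
    [1531, 201, 3581, 207, 477, 273],
    [ 938,  45, 2948, 212, 229, 268],
    [ 921,  40,  212,  30,  25, 131],
    [1672, 113,  575, 139,  64, 276],
    [2224,  91, 1481, 354,  52, 327],
    [2283,  45, 2014, 679,  29, 388],
    [1548,  29, 1355, 501,   3, 383],
], dtype=float)

n = counts.sum()
# Independence of rows (sex-age) and columns (method) is the model [AS][M]
e = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n
pearson = float(((counts - e) ** 2 / e).sum())
g2 = float(2 * (counts * np.log(counts / e)).sum())   # all counts are positive
df = (counts.shape[0] - 1) * (counts.shape[1] - 1)
print(int(n), round(pearson, 2), round(g2, 1), df)
```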

    /* SEXAGE combines SEX and AGE into a single row identifier */
    proc corresp data=suicide;
       var poison gas hang drown gun jump;
       id sexage;
    run;

The results show that over 93% (60.41% + 32.95% = 93.36%) of the association can be represented well in two dimensions.

                      Inertia and Chi-Square Decomposition

     Singular  Principal     Chi-
       Values   Inertias   Squares  Percents    12   24   36   48   60
                                            ----+----+----+----+----+---
      0.32138    0.10328   5056.91    60.41%  *************************
      0.23736    0.05634   2758.41    32.95%  **************
      0.09378    0.00879    430.55     5.14%  **
      0.04171    0.00174     85.17     1.02%
      0.02867    0.00082     40.24     0.48%
                 -------   -------
                 0.17098   8371.28   (Degrees of Freedom = 45)

The plot of the scores for the rows (sex-age combinations) and columns (methods) shows residuals from the log-linear model [AS] [M]. Thus, it shows the two-way associations of sex with method and of age with method, together with the three-way association of sex, age, and method; the association between sex and age is fitted by the [AS] term and does not appear in the plot.

Dimension 1 in the plot separates males and females, indicating a strong difference between the suicide profiles of the two sexes. The second dimension is mostly ordered by age, with younger groups at the top and older groups at the bottom. Note also that the positions of the age groups are approximately parallel for the two sexes. Such a pattern indicates that sex and age do not interact in this analysis: the shift in suicide profile from younger to older groups is much the same for males and females. The relation between the age-sex groups and the methods of suicide can be interpreted in terms of similar distance and direction from the origin, which represents the marginal row and column profiles. Young males are more likely to commit suicide by gas or a gun and older males by hanging, while young females are more likely to ingest a toxic agent and older females to jump or drown.
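The percentages in the decomposition for the suicide data, and the row points shown in the plot, can be recomputed with a short NumPy sketch. It is not PROC CORRESP: the labels are hypothetical, and the coordinates are assumed to be principal coordinates whose signs may be flipped relative to the SAS plot.

```python
import numpy as np

labels = ["M10-20", "M25-35", "M40-50", "M55-65", "M70-90",
          "F10-20", "F25-35", "F40-50", "F55-65", "F70-90"]
counts = np.array([
    [1160, 335, 1524,  67, 512, 189],
    [2823, 883, 2751, 213, 852, 366],
    [2465, 625, 3936, 247, 875, 244],
    [1531, 201, 3581, 207, 477, 273],
    [ 938,  45, 2948, 212, 229, 268],
    [ 921,  40,  212,  30,  25, 131],
    [1672, 113,  575, 139,  64, 276],
    [2224,  91, 1481, 354,  52, 327],
    [2283,  45, 2014, 679,  29, 388],
    [1548,  29, 1355, 501,   3, 383],
], dtype=float)

n = counts.sum()
e = np.outer(counts.sum(axis=1), counts.sum(axis=0)) / n
# SVD of d_ij / sqrt(n) = (f_ij - e_ij) / sqrt(n * e_ij)
u, lam, vt = np.linalg.svd((counts - e) / np.sqrt(n * e), full_matrices=False)
lam = lam[:5]                                  # M = min(10 - 1, 6 - 1) = 5
pct = 100 * lam ** 2 / (lam ** 2).sum()        # percent of chi-square per dimension
print(np.round(pct, 2))

# Principal coordinates of the sex-age rows on the first two dimensions
r = counts.sum(axis=1) / n                     # row masses
rows = u[:, :2] * lam[:2] / np.sqrt(r)[:, None]
for lab, (d1, d2) in zip(labels, rows):
    print(f"{lab:7s} {d1:8.4f} {d2:8.4f}")
```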

Figure 22: Two-dimensional correspondence
analysis solution for the [SA] [M] multiple table

Figure 23: Mosaic display for sex and
age. The frequency of suicide shows opposite trends with age for
males and females.

Figure 24: Mosaic display showing
deviations from model [SA] [M]. The methods have been reordered
according to their positions on Dimension 1 of the correspondence
analysis solution for the [SA] [M] table.
