Graphical Methods for Categorical Data
Michael Friendly

# Framework for Categorical Data Analysis

Methods of analysis of categorical data fall into two categories (after Koch & Stokes, 1991):
• Non-parametric, randomization-based methods
  • Make minimal assumptions
  • Useful for hypothesis testing
  • SAS: PROC FREQ
    • Pearson chi-square
    • Fisher's exact test (for small expected frequencies)
    • Mantel-Haenszel tests (ordered categories: test for linear association)
• Model-based methods
  • Must assume a random sample (possibly stratified)
  • Useful for estimation purposes
  • Greater flexibility; fitting specialized models (e.g., symmetry)
  • More suitable for multi-way tables
  • SAS: PROC LOGISTIC, PROC CATMOD, PROC GENMOD, PROC INSIGHT (Fit YX)
    • Estimate standard errors and covariances of model parameters
    • Confidence intervals for parameters and predicted Pr{response}
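The three tests listed under PROC FREQ can be sketched directly from their textbook definitions. The Python below is a minimal illustration, not SAS output: the helper names (`pearson_chisq`, `fisher_exact_2x2`, `mh_linear_association`) are hypothetical, the Mantel-Haenszel statistic uses the common form M² = (n − 1)r² with integer scores 1, 2, … by default, and a p-value is computed only for that 1-df statistic.

```python
import math


def expected_counts(table):
    """Expected frequencies under independence: row total * column total / n."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[ri * cj / n for cj in cols] for ri in rows]


def pearson_chisq(table):
    """Pearson X^2 = sum of (observed - expected)^2 / expected, on (I-1)(J-1) df."""
    exp = expected_counts(table)
    x2 = sum((o - e) ** 2 / e
             for orow, erow in zip(table, exp)
             for o, e in zip(orow, erow))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, df


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table, margins held fixed."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # hypergeometric probability that the upper-left cell equals x
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # sum over all tables with these margins that are no more probable than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


def mh_linear_association(table, row_scores=None, col_scores=None):
    """Mantel-Haenszel chi-square for linear association, M^2 = (n - 1) r^2 on 1 df.

    r is the count-weighted Pearson correlation between row and column scores.
    """
    I, J = len(table), len(table[0])
    u = row_scores or list(range(1, I + 1))
    v = col_scores or list(range(1, J + 1))
    n = sum(map(sum, table))
    cells = [(u[i], v[j], table[i][j]) for i in range(I) for j in range(J)]
    mu = sum(x * w for x, _, w in cells) / n
    mv = sum(y * w for _, y, w in cells) / n
    sxy = sum((x - mu) * (y - mv) * w for x, y, w in cells)
    sxx = sum((x - mu) ** 2 * w for x, _, w in cells)
    syy = sum((y - mv) ** 2 * w for _, y, w in cells)
    r = sxy / math.sqrt(sxx * syy)
    m2 = (n - 1) * r ** 2
    # upper-tail probability of a chi-square variable with 1 df
    p = math.erfc(math.sqrt(m2 / 2))
    return m2, p
```

For the small illustrative table `[[3, 1], [1, 3]]` these give X² = 2.0 on 1 df, a two-sided Fisher p-value of 34/70 (about 0.486), and M² = 1.75. With an expected frequency of 2 in every cell, this is the small-sample situation the exact test is meant for.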

## Graphical methods for categorical data

Getting information from a table is like extracting sunlight from a cucumber. Farquhar & Farquhar (1891).
You can see a lot, just by looking. Yogi Berra.
Graphical methods for quantitative data are well developed. From the basic display of data in a scatterplot, to diagnostic methods for assessing assumptions and finding transformations, to the final presentation of results, graphical techniques are commonplace adjuncts to most methods of statistical analysis. Graphical methods for categorical data are still in their infancy: there are few of them, and they are not widely used. Asking why this is so provokes several thoughts:
• Exploratory methods
Like the non-parametric statistical methods, many of the graphical methods described here make minimal assumptions about the data. Their goal is to help the viewer see the data, detect patterns, and suggest hypotheses.
• Graphic metaphor?
The basic metaphor for displaying quantitative data is magnitude ~ position along an axis. Categorical data consist of counts of observations in discrete categories. Some of the methods described here (e.g., the sieve diagram and the mosaic display) suggest the metaphor
count ~ area
• Generalizations?
The scatterplot is a basic tool for viewing raw (quantitative) data. It generalizes readily to three or more variables in the form of the scatterplot matrix, a matrix of pairwise scatterplots. The mosaic display is a simple graphic method for looking at cross-classified data that generalizes to tables of three or more dimensions. Are there others?
• Analogies?
Model-based methods for analyzing categorical data, such as logistic regression and log-linear models, are discrete analogs of methods of regression and analysis of variance for quantitative data. We can adapt some of the familiar graphical methods to categorical data.
• Presentation plots for model-based methods
Results of model-based analyses are almost invariably presented in tables of estimated frequencies, parameter estimates, log-linear model effects, and so forth. Effect displays, which plot estimated response probabilities or log odds, provide a useful alternative.
• Practical power = Statistical power x Probability of Use
Statistical and graphical methods are of practical value to the extent that they are available and easy to use. Statistical methods for categorical data analysis have (nearly) reached that point. Graphical methods still have a long way to go. One aim of this workshop is to show what can now be done, with some examples of how to do it.
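The count ~ area metaphor, and the way the mosaic display extends to multi-way tables, can be sketched as recursive splitting of a unit square: the first variable divides the width in proportion to its marginal counts, the next divides each strip in proportion to the conditional counts, and so on, alternating direction. Each tile's area then equals its cell count divided by n. This is a simplified sketch assuming a table stored as nested Python lists; `mosaic_tiles` is a hypothetical name, and the sketch omits the spacing between tiles and any shading that a finished mosaic display would add.

```python
def total(t):
    """Sum of all counts in a (possibly nested) list table."""
    return sum(total(x) for x in t) if isinstance(t, list) else t


def mosaic_tiles(table, x=0.0, y=0.0, w=1.0, h=1.0, depth=0, path=()):
    """Recursively split a rectangle, alternating vertical and horizontal cuts.

    Returns a list of (cell_index_path, x, y, width, height), one per cell.
    Because each cut multiplies by a conditional proportion, the products
    telescope and every tile's area equals cell count / grand total.
    """
    if not isinstance(table, list):
        return [(path, x, y, w, h)]
    n = total(table)
    tiles = []
    offset = 0.0
    for k, sub in enumerate(table):
        share = total(sub) / n if n else 0.0
        if depth % 2 == 0:
            # even depth: vertical cuts divide the width
            tiles += mosaic_tiles(sub, x + offset * w, y, share * w, h,
                                  depth + 1, path + (k,))
        else:
            # odd depth: horizontal cuts divide the height
            tiles += mosaic_tiles(sub, x, y + offset * h, w, share * h,
                                  depth + 1, path + (k,))
        offset += share
    return tiles


if __name__ == "__main__":
    # Made-up 2 x 2 x 2 table of counts, for illustration only
    t = [[[10, 20], [30, 40]], [[5, 15], [25, 55]]]
    for path, x0, y0, w0, h0 in mosaic_tiles(t):
        print(path, round(x0, 3), round(y0, 3), round(w0, 3), round(h0, 3))
```

The same function handles a two-way table (nested one level) or a table of any depth, which is the sense in which the display generalizes beyond two-way tables.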
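The analogy between the discrete and quantitative models can be written out in the usual notation (symbols chosen here for illustration): the saturated log-linear model for an I × J table has the same form as a two-way ANOVA with interaction, dropping the interaction term gives the model of independence, and logistic regression replaces the mean of a quantitative response with the log odds of a binary one.

```latex
% Two-way ANOVA for a quantitative response y in cell (i, j)
E(y_{ijk}) = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}

% Saturated log-linear model for expected frequencies m_{ij}
\log m_{ij} = \mu + \lambda_i^{A} + \lambda_j^{B} + \lambda_{ij}^{AB}

% Independence model: the interaction term is dropped
\log m_{ij} = \mu + \lambda_i^{A} + \lambda_j^{B}

% Simple regression vs. logistic regression, \pi = \Pr\{\text{response}\}
E(y) = \alpha + \beta x
\qquad\text{vs.}\qquad
\log \frac{\pi}{1 - \pi} = \alpha + \beta x
```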
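An effect display needs only the fitted coefficients and their covariance matrix: compute the predicted log odds over a range of predictor values, attach Wald standard errors, and transform to probabilities. The sketch below assumes grouped binomial data and a single predictor, fits the model by Newton-Raphson rather than calling any SAS procedure, and uses hypothetical helper names and made-up counts. Plotting the resulting rows (probability or log odds against x, with the interval as a band) gives the display.

```python
import math


def expit(t):
    """Inverse logit: log odds -> probability."""
    return 1.0 / (1.0 + math.exp(-t))


def score_and_info(a, b, x, y, n):
    """Score vector (u0, u1) and information entries (i00, i01, i11) at (a, b)."""
    u0 = u1 = i00 = i01 = i11 = 0.0
    for xk, yk, nk in zip(x, y, n):
        p = expit(a + b * xk)
        w = nk * p * (1.0 - p)  # binomial variance weight
        u0 += yk - nk * p
        u1 += xk * (yk - nk * p)
        i00 += w
        i01 += w * xk
        i11 += w * xk * xk
    return u0, u1, i00, i01, i11


def fit_logistic(x, y, n, tol=1e-10, max_iter=50):
    """Fit logit(pi) = a + b*x to y successes out of n trials by Newton-Raphson.

    Returns (a, b, cov), where cov is the inverse information matrix.
    """
    a = b = 0.0
    for _ in range(max_iter):
        u0, u1, i00, i01, i11 = score_and_info(a, b, x, y, n)
        det = i00 * i11 - i01 * i01
        # Newton step (da, db) = I^{-1} U for the 2x2 information matrix
        da = (i11 * u0 - i01 * u1) / det
        db = (i00 * u1 - i01 * u0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < tol:
            break
    # covariance from the information evaluated at the final estimates
    _, _, i00, i01, i11 = score_and_info(a, b, x, y, n)
    det = i00 * i11 - i01 * i01
    cov = [[i11 / det, -i01 / det], [-i01 / det, i00 / det]]
    return a, b, cov


def effect_table(a, b, cov, xs, z=1.96):
    """Rows of (x, log odds, Pr{response}, lower, upper) with Wald limits."""
    rows = []
    for xv in xs:
        eta = a + b * xv
        # Var(a + b x) = Var(a) + 2 x Cov(a, b) + x^2 Var(b)
        se = math.sqrt(cov[0][0] + 2 * xv * cov[0][1] + xv * xv * cov[1][1])
        rows.append((xv, eta, expit(eta), expit(eta - z * se), expit(eta + z * se)))
    return rows


if __name__ == "__main__":
    # Made-up grouped data: 10 of 20 respond at x = 0, 15 of 20 at x = 1
    a, b, cov = fit_logistic([0, 1], [10, 15], [20, 20])
    for row in effect_table(a, b, cov, [0, 0.5, 1]):
        print("x=%.1f  logit=%.3f  p=%.3f  (%.3f, %.3f)" % row)
```

With only two groups the model is saturated, so the fitted probabilities reproduce the observed proportions 10/20 and 15/20 (a = 0, b = log 3). Computing the limits on the logit scale and then transforming keeps them inside (0, 1).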

© 1995 Michael Friendly
Email: <friendly@yorku.ca>