
Visualizing Categorical Data: biplot

Version: 2.0 (26 Sep 2006)
Michael Friendly
York University

The biplot macro (download: biplot.sas)

Construct a biplot of observations and variables

The BIPLOT macro produces generalized biplot displays for multivariate data, and for two-way and multi-way tables of either quantitative or frequency data. It also produces labeled plots of the row and column points in 2 dimensions, with a variety of graphic options, and the facility to equate the axes automatically.
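The biplot idea can be stated as a low-rank factorization. The formulation below is the standard (Gabriel) biplot, given as background and assumed rather than taken from the macro source. The (transformed, standardized) n × p data matrix X is approximated by the product of a matrix G with one row per observation and a matrix H with one row per variable, both obtained from the singular value decomposition of X:

```latex
X \approx G H^{\mathsf{T}}, \qquad
X = U \Lambda V^{\mathsf{T}}, \qquad
G = U_{2} \Lambda_{2}^{\alpha}, \quad H = V_{2} \Lambda_{2}^{1-\alpha}, \qquad 0 \le \alpha \le 1
```

Here U_2, V_2 and Λ_2 keep the first two singular vectors and values, so x_ij ≈ g_i'h_j: an observation's value on a variable is read from the projection of its point onto that variable's vector. The exponent α decides how the singular values are shared; α = 1 corresponds to the JK type, α = 0 to the GH type, and α = 1/2 to the symmetric (SYM) type.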

Input data

The macro takes input in one of two forms:

(a) A data set in table form, where the columns are separate variables and the rows are separate observations (identified by a row ID variable). In this arrangement, use the VAR= argument to specify the list of variables and the ID= argument to specify an additional variable whose values are labels for the rows.

For example, suppose a SAS dataset contains reaction times to 4 topics in 3 experimental tasks:

    TASK   TOPIC1   TOPIC2   TOPIC3   TOPIC4
    Easy     2.43     3.12     3.68     4.04
    Medium   3.41     3.91     4.07     5.10
    Hard     4.21     4.65     5.87     5.69

For this arrangement, the macro would be invoked as follows:

  %biplot(var=topic1-topic4, id=task);
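To make the table-form computation concrete, here is a minimal Python/NumPy sketch of what a default call computes, assuming the macro centers each column (STD=MEAN), takes a singular value decomposition, and splits the singular values evenly between observations and variables (FACTYPE=SYM). The function name sym_biplot is hypothetical, and, as with any SVD, the signs of the dimensions are arbitrary.

```python
import numpy as np

# Reaction times from the table above: rows are tasks, columns are topics.
tasks = ["Easy", "Medium", "Hard"]
X = np.array([
    [2.43, 3.12, 3.68, 4.04],
    [3.41, 3.91, 4.07, 5.10],
    [4.21, 4.65, 5.87, 5.69],
])


def sym_biplot(X, dim=2):
    """Hypothetical sketch: column-center X (assumed STD=MEAN), then split
    the singular values evenly between rows and columns (assumed FACTYPE=SYM)."""
    Xc = X - X.mean(axis=0)                      # subtract each column mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    root = np.sqrt(s[:dim])
    G = U[:, :dim] * root                        # one point per observation (row)
    H = Vt[:dim].T * root                        # one vector per variable (column)
    return Xc, G, H


Xc, G, H = sym_biplot(X)
for name, (d1, d2) in zip(tasks, G):
    print(f"{name:<7}{d1:8.3f}{d2:8.3f}")
```

Because this table has only three rows, its column-centered version has rank at most 2, so the two-dimensional product G H' reproduces the centered data exactly; with more observations it is a least-squares approximation.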

(b) A contingency table in frequency form (e.g., the output from PROC FREQ), or multi-way data in the univariate format used as input to PROC GLM. In this case, there will be two or more factor (class) variables and one response variable, with one observation per cell. For this form, you must use the VAR= argument to specify the two (or more) factor (class) variables, and give the name of the response variable as the RESPONSE= parameter. Do not specify an ID= variable for this form.

For contingency table data, the response will be the cell frequency, and you will usually use the POWER=0 parameter to perform an analysis of the log frequency.
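The POWER= transform can be sketched as follows. This assumes POWER=0 means the natural log (as stated above) and that any other value is a plain power y**p; whether the macro uses exactly this form, and how it treats zero cells, is not stated here, so this sketch simply rejects zero counts. The function name is hypothetical.

```python
import math


def power_transform(values, power=1):
    """Hypothetical sketch of POWER=: natural log when power == 0,
    otherwise the plain power y**power (an assumption about the macro)."""
    if power == 0:
        if any(v <= 0 for v in values):
            # log(0) is undefined; this sketch rejects zero cells instead of
            # guessing how the macro treats them.
            raise ValueError("POWER=0 needs strictly positive values")
        return [math.log(v) for v in values]
    return [v ** power for v in values]


# First row (H0) of the soccer frequencies used in the Example section.
counts = [27, 29, 10, 8, 2]
print([round(y, 3) for y in power_transform(counts, power=0)])
```

Note that the soccer table in the Example section contains one zero cell (row H4, column a4), which this sketch would refuse to log-transform.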

In this format, the same data would have 12 observations, like this:

     TASK  TOPIC    RT
     Easy    1     2.43
     Easy    2     3.12
     Easy    3     3.68
     ...
     Hard    4     5.69

For this arrangement, the macro would be invoked as follows:

  %biplot(var=topic task, response=RT);

In this arrangement, the order of the VAR= variables does not matter. The columns of the two-way table are determined by the variable which varies most rapidly in the input dataset (topic, in the example).
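The rule that the most rapidly varying variable defines the columns can be illustrated with a small pivot. This is a hypothetical helper for exactly two factors, not the macro's code; it counts how often each factor changes value between consecutive records and uses the most frequently changing one as the column factor, so the order of the factor names does not matter.

```python
def long_to_wide(records, factors, response):
    """Pivot one-record-per-cell (GLM form) data into a two-way table.

    Hypothetical helper for exactly two factors: the column factor is the one
    whose value changes most often between consecutive records.
    """
    if len(factors) != 2:
        raise ValueError("this sketch handles exactly two factors")

    def n_changes(f):
        return sum(a[f] != b[f] for a, b in zip(records, records[1:]))

    col_f = max(factors, key=n_changes)          # most rapidly varying factor
    row_f = factors[0] if factors[1] == col_f else factors[1]

    row_keys, col_keys, cells = [], [], {}
    for r in records:
        if r[row_f] not in row_keys:
            row_keys.append(r[row_f])
        if r[col_f] not in col_keys:
            col_keys.append(r[col_f])
        cells[r[row_f], r[col_f]] = r[response]
    table = [[cells[rk, ck] for ck in col_keys] for rk in row_keys]
    return row_f, col_f, row_keys, col_keys, table


# The reaction-time data in GLM form: 12 records, topic varying fastest.
rt = [[2.43, 3.12, 3.68, 4.04], [3.41, 3.91, 4.07, 5.10], [4.21, 4.65, 5.87, 5.69]]
records = [{"task": t, "topic": j + 1, "RT": rt[i][j]}
           for i, t in enumerate(["Easy", "Medium", "Hard"]) for j in range(4)]

row_f, col_f, rows, cols, table = long_to_wide(records, ["topic", "task"], "RT")
print(row_f, col_f)   # task topic
```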

Usage

The BIPLOT macro is defined with keyword parameters. The VAR= parameter must be specified, together with either one ID= variable or one RESPONSE= variable.

The arguments may be listed within parentheses in any order, separated by commas. For example:

  %biplot(data=soccer, var=_num_, id=home);

The plot may be re-drawn or customized using the output OUT= data set of coordinates and the ANNO= Annotate data set.

The graphical representation of biplots requires that the axes in the plot be equated, so that equal distances on the ordinate and abscissa represent equal data units (to preserve distances and angles in the plot). A '+', whose vertical and horizontal arms should be equal in length, is drawn at the origin to show whether this has been achieved.

If you do not specify the HAXIS= and VAXIS= parameters, the EQUATE macro is called to generate AXIS statements that equate the axes. In this case, the INC=, XEXTRA=, and YEXTRA= parameters may be used to control the details of the generated AXIS statements.
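The principle behind equated axes can be sketched independently of the EQUATE macro; this is not EQUATE's algorithm. The sketch assumes that axis limits are rounded outward to multiples of the tick increment (as INC= suggests), padded with extra ticks (as XEXTRA= and YEXTRA= suggest), and then drawn at one common physical scale, so that a data unit has the same length on both axes. The function names and the available plot size (20 × 15 units) are hypothetical.

```python
import math


def axis_limits(lo, hi, inc, extra=(0, 0)):
    """Round [lo, hi] outward to multiples of inc, then add extra ticks
    at the low and high ends (hypothetical analogue of INC=/XEXTRA=)."""
    start = (math.floor(lo / inc) - extra[0]) * inc
    end = (math.ceil(hi / inc) + extra[1]) * inc
    return start, end


def equated_lengths(xlim, ylim, max_width, max_height):
    """Axis lengths using one common physical scale for both axes."""
    xspan, yspan = xlim[1] - xlim[0], ylim[1] - ylim[0]
    per_unit = min(max_width / xspan, max_height / yspan)   # length per data unit
    return xspan * per_unit, yspan * per_unit


xlim = axis_limits(-1.3, 2.1, 0.5, extra=(1, 1))
ylim = axis_limits(-0.8, 0.9, 0.5)
print(xlim, ylim, equated_lengths(xlim, ylim, 20.0, 15.0))
```

Taking the minimum of the two candidate scales keeps both axes inside the plot area while leaving one of them shorter than the space available, which is the price of equal units.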

By default, the macro produces and plots a two-dimensional solution.

Parameters

DATA=
Specifies the name of the input data set to be analyzed. [Default: DATA=_LAST_]
VAR=
Specifies the names of the column variables when the data are in table form, or the names of the factor variables when the data are in frequency form or GLM form. [Default: VAR=_NUM_]
ID=
Observation ID variable when the data are in table form.
RESPONSE=
Name of the response variable when the data are in frequency or GLM form.
DIM=
Specifies the number of dimensions of the biplot solution. Only two dimensions are plotted by the PPLOT and GPLOT options, however. [Default: DIM=2]
FACTYPE=
Biplot factor type: GH, SYM, JK or COV [Default: FACTYPE=SYM]
VARDEF=
Variance definition (divisor) for FACTYPE=COV: DF | N [Default: VARDEF=DF]
SCALE=
Scale factor for variable vectors [Default: SCALE=1]
POWER=
Power transform of the response; POWER=0 specifies a log transform. [Default: POWER=1]
OUT=
Specifies the name of the output data set of coordinates. [Default: OUT=BIPLOT]
ANNO=
Specifies the name of the annotate data set of labels produced by the macro. [Default: ANNO=BIANNO]
STD=
How to standardize columns: NONE | MEAN | STD [Default: STD=MEAN]
COLORS=
Colors for OBS and VARS [Default: COLORS=BLUE RED]
SYMBOLS=
Symbols for OBS and VARS [Default: SYMBOLS=NONE NONE]
INTERP=
Markers/interpolation for OBS and VARS. [Default: INTERP=NONE VEC]
LINES=
Lines for OBS and VARS interpolation [Default: LINES=33 20]
PPLOT=
Produce a printer plot? [Default: PPLOT=NO]
VTOH=
The vertical to horizontal aspect ratio (height of one character divided by the width of one character) of the printer device, used to equate axes for a printer plot, when PPLOT=YES. [Default: VTOH=2]
GPLOT=
Produce a graphics plot? [Default: GPLOT=YES]
PLOTREQ=
The dimensions to be plotted [Default: PLOTREQ=DIM2*DIM1]
HAXIS=
The name of an AXIS statement for the horizontal axis. If both HAXIS= and VAXIS= are omitted, the macro calls the EQUATE macro to define suitable axis statements. This creates the axis statements AXIS98 and AXIS99, whether or not a graph is produced.
VAXIS=
The name of an AXIS statement for the vertical axis.
INC=
The length of X and Y axis tick increments, in data units (for the EQUATE macro). Ignored if HAXIS= and VAXIS= are specified. [Default: INC=0.5 0.5]
XEXTRA=
Number of extra X axis tick marks at the left and right. Use to allow extra space for labels. [Default: XEXTRA=0 0]
YEXTRA=
Number of extra Y axis tick marks at the bottom and top. [Default: YEXTRA=0 0]
M0=
Length of origin marker, in data units. [Default: M0=0.5]
DIMLAB=
Prefix for dimension labels [Default: DIMLAB=Dimension]
NAME=
Name of the graphics catalog entry [Default: NAME=BIPLOT]
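FACTYPE=, VARDEF=, and SCALE= together decide how the fitted matrix is split into observation points and variable vectors. The sketch below uses the conventional biplot definitions (JK keeps the observation geometry, GH the variable geometry, SYM splits the difference, COV rescales GH so that variable vectors reflect covariances). These are assumed to match the macro's options; the function name, and the assumption that SCALE= simply multiplies the variable vectors, are hypothetical.

```python
import numpy as np


def biplot_factors(X, factype="SYM", vardef="DF", dim=2, scale=1.0):
    """Observation points G and variable vectors H for column-centered X.

    Sketch of the standard GH/JK/SYM/COV factorizations; the SCALE= handling
    (multiplying H) is an assumption, not taken from the macro source.
    """
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    U, s, V = U[:, :dim], s[:dim], Vt[:dim].T
    if factype == "GH":        # G = U, H = V*Lambda: H H' approximates Xc'Xc
        G, H = U, V * s
    elif factype == "JK":      # G = U*Lambda, H = V: G G' approximates Xc Xc'
        G, H = U * s, V
    elif factype == "SYM":     # Lambda split evenly between G and H
        G, H = U * np.sqrt(s), V * np.sqrt(s)
    elif factype == "COV":     # H H' approximates the covariance matrix
        d = {"DF": n - 1, "N": n}[vardef]
        G, H = U * np.sqrt(d), V * s / np.sqrt(d)
    else:
        raise ValueError(f"unknown FACTYPE: {factype}")
    return G, H * scale


X = np.array([[2.43, 3.12, 3.68, 4.04],
              [3.41, 3.91, 4.07, 5.10],
              [4.21, 4.65, 5.87, 5.69]])
for ft in ("GH", "JK", "SYM", "COV"):
    G, H = biplot_factors(X, factype=ft)
    print(ft, np.round(G, 3).tolist())
```

With SCALE=1, every type gives the same product G H' (here the exact centered data, since three rows leave at most two nonzero dimensions); the types differ only in how that product is distributed between the observation points and the variable vectors.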

Dependencies

BIPLOT requires the following macros:

  equate.sas    Create AXIS statements for a GPLOT with equated axes 

Example

This analysis produces a biplot of log frequency (POWER=0) to examine whether the numbers of goals scored by the home and away teams are independent.

  %include vcd(biplot);        *-- or include in an autocall library;
  title 'UK Soccer scores: Biplot';
  data soccer;
     input home $ a0-a4;
  cards;
  H0   27 29 10  8  2
  H1   59 53 14 12  4
  H2   28 32 14 12  4
  H3   19 14  7  4  1
  H4    7  8 10  2  0
  ;

  %biplot(data=soccer, var=_num_, id=home,
     std=none, power=0,
     out=biplot, anno=bianno,
     symbols=circle dot, interp=none, inc=1);

Steps to draw the lines are omitted here.

See also

corresp Correspondence analysis of contingency tables
equate Create AXIS statements for a GPLOT with equated axes