SAS Macro Programs for Statistical Graphics: BIPLOT

Version: 1.9 (17 Dec 2003)
Michael Friendly
York University



BIPLOT macro (download: biplot.sas)

The BIPLOT macro uses PROC IML to carry out the calculations for the biplot display described in "Section 8.7". The program produces

Usage

The original version of this macro required that the columns of the data table be stored as a set of variables in the input dataset. In this arrangement, use the VAR= argument to specify the list of variables and the ID= argument to name an additional variable whose values are labels for the rows.

Assume that reaction times to 4 topics in 3 experimental tasks are stored in a SAS dataset like this:

     TASK     TOPIC1   TOPIC2   TOPIC3   TOPIC4
     Easy       2.43     3.12     3.68     4.04
     Medium     3.41     3.91     4.07     5.10
     Hard       4.21     4.65     5.87     5.69

For this arrangement, the macro would be invoked as follows:

   %biplot(var=topic1-topic4, id=task);
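
As a minimal sketch, the table-form dataset above could be created with a DATA step and passed to the macro explicitly. The dataset name RTIMES is an assumption made for illustration; it does not come from the macro or its documentation.

```sas
/* Table form: one observation per task, one variable per topic. */
/* RTIMES is a hypothetical dataset name used for illustration.  */
data rtimes;
   input task $ topic1-topic4;
   datalines;
Easy     2.43  3.12  3.68  4.04
Medium   3.41  3.91  4.07  5.10
Hard     4.21  4.65  5.87  5.69
;
run;

/* VAR= lists the column variables; ID= names the row-label variable. */
%biplot(data=rtimes, var=topic1-topic4, id=task);
```

Naming the dataset with DATA= is optional here, because DATA= defaults to the most recently created dataset.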

The present version also allows the dataset to contain all response values in a single variable, with two (or more) additional variables to specify the row and column class variables, as is done with PROC GLM in the univariate (non-repeated measures) format. In this case, do NOT specify an ID= variable. Use the VAR= argument to specify the row and column class variables, and give the name of the response variable as RESPONSE=.

The same data in this format would have 12 observations, and look like:

     TASK    TOPIC     RT
     Easy      1      2.43
     Easy      2      3.12
     Easy      3      3.68
     ...
     Hard      4      5.69

For this arrangement, the macro would be invoked as follows:

   %biplot(var=topic task, response=RT);

In this arrangement, the order of the VAR= variables does not matter. The columns of the two-way table are determined by the variable that varies most rapidly in the input dataset (TOPIC, in the example).
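
The GLM-form layout can be sketched in the same way. The DATA step below reads one line per task and writes one observation per task and topic, so that TOPIC varies most rapidly and therefore defines the columns of the two-way table. The dataset name RTLONG is hypothetical.

```sas
/* GLM form: one observation per (task, topic) cell.             */
/* RTLONG is a hypothetical dataset name used for illustration. */
data rtlong;
   input task $ @;            /* read the task label, hold the line   */
   do topic = 1 to 4;         /* TOPIC varies fastest within a task   */
      input rt @;             /* read the next reaction time          */
      output;                 /* write one observation per cell       */
   end;
   datalines;
Easy     2.43  3.12  3.68  4.04
Medium   3.41  3.91  4.07  5.10
Hard     4.21  4.65  5.87  5.69
;
run;

/* No ID= here: VAR= names the class variables, RESPONSE= the values. */
%biplot(data=rtlong, var=topic task, response=rt);
```

The resulting dataset has 12 observations, matching the listing above.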

Parameters

DATA=_LAST_
Name of the input data set for the biplot.
VAR=_NUM_
Variables for the biplot when the data is in table form, or the list of factor variables when the data is in GLM form. The list of variables may use any of the SAS abbreviated forms for variable lists (e.g., X1-Xn).
ID=
Name of a character variable used to label the rows (observations) in the biplot display. (Only specify an ID= variable when the data is in table form.)
RESPONSE=
Name of response variable (GLM input form)
DIM=2
Number of biplot dimensions. (Only two-dimensional plots are produced if DIM>2.)
FACTYPE=SYM
Biplot factor type: GH, SYM, JK, or COV. FACTYPE=COV gives the GH scaling, with observation vectors multiplied by sqrt(N-1), and variable vectors divided by the same factor.
SCALE=1
Scale factor for variable vectors. The coordinates for the variables are multiplied by this value. Setting SCALE=0 causes the macro to compute the scale factor to equate the maximum distance from the origin of the variable and observation markers.
POWER=1
Power to which the data values are transformed (POWER=0 means log(y)).
OUT=BIPLOT
Output data set containing biplot coordinates.
ANNO=BIANNO
Output data set containing Annotate labels.
STD=MEAN
Specifies how to standardize the data matrix before the singular value decomposition is computed. If STD=NONE, only the grand mean is subtracted from each value in the data matrix. This option is typically used when row and column means are to be represented in the plot, as in the diagnosis of two-way tables ("Section 7.6.3"). If STD=MEAN, the mean of each column is subtracted. This is the default, and assumes that the variables are measured on commensurable scales. If STD=STD, the column means are subtracted and each column is standardized to unit variance.
COLORS=BLUE RED
Colors used for OBS and VARS.
SYMBOLS=NONE NONE
Symbols used for OBS and VARS. Because the points are usually labeled, symbols are often superfluous.
INTERP=NONE VEC
Interpolation option used for OBS and VARS. In addition to the standard interpolation options provided by the SYMBOL statement, the BIPLOT macro also understands the option VEC to mean a vector from the origin to the row or column point. [Default: INTERP=NONE VEC]
LINES=33 20
Line styles used for OBS and VARS interpolation options.
PLOTREQ=DIM2 * DIM1
Specifies the dimensions to be plotted.
GPLOT=YES
Produce a GPLOT plot? If GPLOT=YES, the two dimensions specified in PLOTREQ= are plotted.
PPLOT=NO
Produce printer plot? If PPLOT=YES, the two dimensions specified in PLOTREQ= are plotted.
HAXIS=
The name of an AXIS statement for the horizontal axis. If neither HAXIS= nor VAXIS= is specified, the program calls the EQUATE macro to produce AXIS statements in which the axes are equated. This creates the axis statements AXIS98 and AXIS99, whether or not a graph is produced. In this case, you should examine the values used for the INC=, XEXTRA=, and YEXTRA= parameters.
VAXIS=
The name of an AXIS statement for the vertical axis.
VTOH=2
The vertical to horizontal aspect ratio (height of one character divided by the width of one character) of the printer device, used to equate axes for a printer plot, when PPLOT=YES. [Default: VTOH=2]
INC=0.5 0.5
X, Y axis tick increments (for the EQUATE macro). Ignored if HAXIS= and VAXIS= are specified. [Default: INC=0.5 0.5]
XEXTRA=0 0
The number of extra X axis tick marks at left and right. Use to allow extra space for labels. [Default: XEXTRA=0 0]
YEXTRA=0 0
The number of extra Y axis tick marks [Default: YEXTRA=0 0]
M0=0.5
Length of origin marker, in data units. If the axes have been properly equated, the lengths of the horizontal and vertical segments should be equal. [Default: M0=0.05]
DIMLAB=
Prefix for dimension labels [Default: DIMLAB=Dimension when DIM=2, otherwise, DIMLAB=Dim]
NAME=
Name of the graphics catalog entry [Default: NAME=biplot]
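
As an illustration of how several of these parameters combine, the call below is a sketch that assumes a table-form dataset named RTIMES with variables TASK and TOPIC1-TOPIC4 (hypothetical names, as are the output dataset names). It uses only parameters documented in the list above.

```sas
/* Hypothetical input: RTIMES, with TASK and TOPIC1-TOPIC4.        */
%biplot(data=rtimes,
        var=topic1-topic4,
        id=task,
        std=STD,           /* subtract column means, scale to unit variance */
        factype=GH,        /* GH factor type                                */
        scale=0,           /* let the macro compute the vector scale factor */
        xextra=1 1,        /* one extra X tick at each side for labels      */
        out=rtcoord,       /* biplot coordinates (hypothetical name)        */
        anno=rtlabels);    /* Annotate labels (hypothetical name)           */
```

Because neither HAXIS= nor VAXIS= is given, the macro would call EQUATE to equate the axes, so XEXTRA= takes effect.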

The OUT= data set

The results from the analysis are saved in the OUT= data set. This data set contains two character variables (_TYPE_ and _NAME_) which identify the observations and numeric variables (DIM1, DIM2, ...) which give the coordinates of each point.

The value of the _TYPE_ variable is 'OBS' for the observations that contain the coordinates for the rows of the data set, and is 'VAR' for the observations that contain the coordinates for the columns. The _NAME_ variable contains the value of ID= variable for the row observations and the variable name for the column observations in the output data set.
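
A short sketch of how the OUT= data set might be inspected after the macro has run. It assumes the default name OUT=BIPLOT and DIM=2; the dataset names OBSCOORD and VARCOORD are hypothetical.

```sas
/* List the coordinates, identified by _TYPE_ and _NAME_. */
proc print data=biplot;
   id _type_ _name_;
   var dim1 dim2;
run;

/* Split row (OBS) and column (VAR) points into separate datasets. */
data obscoord varcoord;
   set biplot;
   if _type_ = 'OBS' then output obscoord;
   else if _type_ = 'VAR' then output varcoord;
run;
```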

GOPTIONS

The height and font used for point labels may be set using the GOPTIONS statement (HTEXT= and FTEXT=) before calling the macro.
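
For instance, the following sketch sets the label height and font before the macro call. The font name SWISS and the dataset RTIMES (with TASK and TOPIC1-TOPIC4) are assumptions for illustration; substitute a font available on your graphics device.

```sas
/* Label text height and font; SWISS is an assumed font choice. */
goptions htext=1.5 ftext=swiss;

/* Hypothetical table-form dataset RTIMES. */
%biplot(data=rtimes, var=topic1-topic4, id=task);
```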

Missing data

The program makes no provision for missing values on any of the variables to be analyzed.
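
One possible workaround, sketched here under the assumption of a table-form dataset RTIMES with analysis variables TOPIC1-TOPIC4 (hypothetical names), is to drop incomplete observations in a DATA step before calling the macro.

```sas
/* Keep only observations with no missing analysis variables. */
data complete;
   set rtimes;
   if nmiss(of topic1-topic4) = 0;
run;

%biplot(data=complete, var=topic1-topic4, id=task);
```

Listwise deletion of this kind changes the set of rows shown in the display, so it is worth checking how many observations are removed.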

Example

%include data(AUTO) ;
*include macros(biplot);           /* or store in autocall library */
title h=1.6 'Biplot of Automobiles data';
data auto;
   set auto;
   if rep77 ^= . and rep78 ^= .;      /* delete missing data            */
   model = origin || scan(model,1);   /* origin + first word of model   */
run;
goptions htext=1.5;                   /* set label text height          */
%biplot( data= auto,
         var = gratio  turn  rep77  rep78  price    mpg
               hroom rseat  trunk weight length displa,
         id=model,  scale=.8,
         factype=SYM, std = STD );