heplot: Plot Hypothesis and Error matrices for one MLM effect

SAS Macro Programs: heplot

Version: 1.6-4 (24 Aug 2007)
Michael Friendly
York University

The heplot macro (get heplot.sas)

Plot Hypothesis and Error matrices for a bivariate MANOVA effect

The HEPLOT macro plots the covariance ellipses for a hypothesized (H) effect and for error (E) for two variables from a MANOVA. The plot shows how the group means differ on the two variables jointly, relative to the within-group variation. The test statistics for any MANOVA essentially measure how 'large' the variation in H is relative to the variation in E, and in how many dimensions. The HE plot gives a two-dimensional visualization of the answer to this question. An alternative two-dimensional view is provided by the CANPLOT macro, which shows the data, variables, and within-group ellipses projected into the space of the two largest canonical variables, that is, the linear combinations of the responses for which the group differences are largest.
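The idea of H being 'large' relative to E can be made concrete with the standard MANOVA test statistics, all of which are functions of the eigenvalues of HE^-1. The notation below (p responses, df_h hypothesis degrees of freedom, eigenvalues lambda_i) is introduced here for illustration and does not come from the macro itself; some texts also define Roy's statistic as lambda_1/(1+lambda_1) rather than lambda_1.

```latex
% \lambda_i: ordered nonzero eigenvalues of H E^{-1}
% p: number of responses, df_h: degrees of freedom for the hypothesis
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_s > 0,
\qquad s = \min(p,\, df_h)

\Lambda_{\text{Wilks}} = \frac{|E|}{|H + E|} = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i}
\qquad
V_{\text{Pillai}} = \operatorname{tr}\!\left[ H (H+E)^{-1} \right] = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i}

T_{\text{Hotelling-Lawley}} = \operatorname{tr}\!\left[ H E^{-1} \right] = \sum_{i=1}^{s} \lambda_i
\qquad
\theta_{\text{Roy}} = \lambda_1
```

The number of nonzero eigenvalues, s, is the number of dimensions in which the group means differ. With two responses an HE plot can show at most s = 2 of them.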

Typically, you perform a MANOVA with PROC GLM and save the output statistics, including the H and E matrices, using the OUTSTAT= option. This dataset must be supplied to the macro as the value of the STAT= parameter. If you also supply the raw data for the analysis via the DATA= parameter, the means for the levels of the EFFECT= parameter are also shown on the plot.

Various kinds of plots are possible, determined by the M1= and M2= parameters. The default is M1=H and M2=E. If you specify M2=I (identity matrix), then the H and E matrices are transformed to H* = eHe and E* = eEe = I, where e = E^(-1/2). The errors become uncorrelated, and the size of H* can be judged more simply in relation to a circular E* = I. For multi-factor designs, it is sometimes useful to specify M1=H+E, so that each factor can be examined in relation to the within-cell variation.
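The M2=I transformation can be sketched numerically in PROC IML. This is only an illustration: the H and E values below are made up, the names Einvhalf, Hstar, and Estar are hypothetical, and the macro's own IML code may compute the transformation differently.

```sas
proc iml;
   /* Made-up 2x2 SSCP matrices, for illustration only */
   H = {10 6, 6 8};
   E = {20 5, 5 12};

   /* Symmetric inverse square root of E from its eigendecomposition */
   call eigen(evals, evecs, E);
   Einvhalf = evecs * diag(1/sqrt(evals)) * evecs`;

   Hstar = Einvhalf * H * Einvhalf;   /* H* = E^(-1/2) H E^(-1/2)            */
   Estar = Einvhalf * E * Einvhalf;   /* E* is the identity, up to rounding */
   print Hstar, Estar;
quit;
```

Because E* is the identity, its ellipse is a circle, and any elongation of the H* ellipse reflects group differences measured in units of error variation.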



Usage

The HEPLOT macro is defined with keyword parameters. The STAT= parameter and either the VAR= or the X= and Y= parameters are required. The arguments may be listed within parentheses in any order, separated by commas. For example:

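  proc glm data=dataset outstat=stats;
     class A B;                      *-- A and B are classification factors;
     model y1 y2  = A B A*B / ss3;
     manova h=A B A*B;               *-- multivariate tests for each effect;
  run; quit;
  %heplot(data=dataset, stat=stats, var=y1 y2, effect=A );
  %heplot(data=dataset, stat=stats, var=y1 y2, effect=A*B );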


Parameters

STAT=  Name of the OUTSTAT= dataset from PROC GLM containing the SSCP matrices for model effects and ERROR, as indicated by the _SOURCE_ variable.
DATA=  Name of the input raw dataset (used to compute the means).
X=  Name of the horizontal variable for the plot.
Y=  Name of the vertical variable for the plot.
VAR=  Names of two response variables: x y. Use VAR= instead of specifying X= and Y= separately.
EFFECT=  Name of the MODEL effect to be displayed for the H matrix. This must be one of the terms on the right-hand side of the MODEL statement used in the PROC GLM or PROC REG step, written exactly as the effect is labeled in the _SOURCE_ variable of the STAT= dataset.
CLASS=  Names of class variable(s), used to find the group means displayed in the plot. The default is the value specified for EFFECT=, except that '*' characters are changed to spaces.
EFFLAB=  Label (up to 16 characters) for the H effect, annotated near the max/min corner of the H ellipse. [Default: EFFLAB=&EFFECT]
EFFLOC=  Location for the effect label: MAX (above) or MIN (below). [Default: EFFLOC=MAX]
MPLOT=  Matrices to plot. [Default: MPLOT=1 2]
GPFMT=  Format for levels of the group/effect variable, used in labeling group means.
ALPHA=  Non-coverage proportion for the ellipses. [Default: ALPHA=0.32]
PVALUE=  Coverage proportion, 1-alpha. [Default: PVALUE=0.68]
SS=  Type of SS to extract from the STAT= dataset. The possibilities are SS1-SS4, or CONTRAST (but the SSn option on the MODEL statement in PROC GLM limits the types of SSCP matrices produced). This is the value of the _TYPE_ variable in the STAT= dataset. [Default: SS=SS3]
WHERE=  A condition used to subset both the STAT= and DATA= datasets.
ANNO=  Name of an input annotate dataset, used to add additional information to the plot of y * x.
ADD=  Specify ADD=CANVEC to add canonical vectors to the plot. The PROC GLM step must have included the CANONICAL option on the MANOVA statement.
M1=  First matrix: either H or H+E. [Default: M1=H]
M2=  Second matrix: either E or I. [Default: M2=E]
SCALE=  Scale factors for M1 and M2. This can be a pair of numeric values or expressions using any of the scalar values calculated in the PROC IML step. The default scaling [SCALE=1 1] results in a plot of E/dfe and H/dfe, where the size and orientation of E shows error variation on the data scale, and H is scaled conformably, allowing the group means to be shown on the same scale. The natural scaling of H and E as generalized mean squares would be H/dfh and E/dfe, which is obtained using SCALE=dfe/dfh 1. Equivalently, the E matrix can be shrunk by the same factor by specifying SCALE=1 dfh/dfe.
VAXIS=  Name of an AXIS statement for the y variable.
HAXIS=  Name of an AXIS statement for the x variable.
LEGEND=  Name of a LEGEND statement. If not specified, a legend for the M1 and M2 matrices is drawn beneath the plot. Use LEGEND=NONE to suppress the legend.
COLORS=  Colors for the H and E ellipses. [Default: COLORS=BLACK RED]
LINES=  Line styles for the H and E ellipses. [Default: LINES=1 21]
WIDTH=  Line widths for the H and E ellipses. [Default: WIDTH=3 2]
HTEXT=  Height of text in the plot. If not specified, the global graphics option HTEXT controls this.
OUT=  Name of the output dataset. [Default: OUT=OUT]
NAME=  Name of the graphic catalog entry. [Default: NAME=HEPLOT]
GOUT=  Name of the graphic catalog. [Default: GOUT=GSEG]
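A few of the optional parameters can be combined as in the following sketch. It assumes the heplot macro is already available (for example through an autocall library) and reuses the placeholder names dataset, stats, y1, y2, A, and B from the usage example; it is not taken from the macro's own documentation.

```sas
/* Sketch only: dataset, y1, y2, A, and B are placeholder names */
proc glm data=dataset outstat=stats;
   class A B;
   model y1 y2 = A B A*B / ss3;
   manova h=A / canonical;        /* CANONICAL is required for ADD=CANVEC */
run; quit;

/* H and E as generalized mean squares: H/dfh and E/dfe */
%heplot(data=dataset, stat=stats, var=y1 y2, effect=A, scale=dfe/dfh 1);

/* Standardize so that the error ellipse plots as a circle */
%heplot(data=dataset, stat=stats, var=y1 y2, effect=A, m2=I);

/* Non-coverage 0.05 (95% coverage), with canonical vectors added */
%heplot(data=dataset, stat=stats, var=y1 y2, effect=A, alpha=0.05, add=canvec);
```

Each call draws a separate plot from the same STAT= dataset, so the PROC GLM step only needs to run once.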


Example

Carry out a one-way MANOVA on the Iris data, getting an OUTSTAT= dataset:
%include macros(heplot);        *-- or include in an autocall library;
%include data(iris);
proc glm data=iris outstat=stats noprint;
   class species;
   model SepalLen SepalWid PetalLen PetalWid = species / nouni ss3;
   manova h=species;
run; quit;
Produce two HE plots: one of H and E, and the other of (H+E) and E, for the effect of species. The VAXIS=AXIS1 parameter ensures that both plots have the same vertical axis scaling.
axis1 label=(a=90) order=(40 to 80 by 10);
legend1 position=(bottom center inside) offset=(0,1) mode=share frame;
%heplot(data=iris, stat=stats, var=PetalLen SepalLen, effect=species,
	vaxis=axis1, legend=legend1, hsym=1.6);

%heplot(data=iris,stat=stats, var=PetalLen SepalLen, effect=species,
	vaxis=axis1,m1=H+E, legend=legend1, hsym=1.6);

%panels(rows=1, cols=2);

Canonical HE plots

This example displays the H and E matrices for the two canonical dimensions that best discriminate among the species.
*-- use canplot for side effect of getting Can scores and annotate dataset;
  var=SepalLen SepalWid PetalLen PetalWid,

*-- remove species circles and means from annotate data set;
data _danno_;
	set _danno_;
	where comment not in ('MEAN', 'CIRCLE');
run;
*-- Get H and E matrices for canonical scores;
proc glm data=_dscore_ outstat=stats;
	class species;
	model can1 can2  = species / nouni ss3;
	manova h=species;
run; quit;

axis1 length=2.6 IN order=(-4 to 4 by 2) label=(a=90);
axis2 length=6.5 IN order=(-10 to 10 by 2);
legend1 position=(bottom center inside) offset=(0,1) mode=share frame;

%heplot(data=_dscore_, stat=stats, x=Can1, y=Can2
	,haxis=axis2, vaxis=axis1, legend=none, hsym=1.6);

It may be seen that the species mean-variation is essentially one-dimensional.

See also

biplot Generalized biplot of observations and variables
canplot Canonical discriminant structure plot
ellipses Plot bivariate data ellipses
hemat HE plots for all pairs of response variables
hemreg Extract H and E matrices for multivariate regression
meanplot Plot means for factorial designs
mpower Retrospective power analysis for multivariate GLMs
multisummary Calculate summary statistics for multiple variables
panels Display a set of plots in rectangular panels

heplots package for R Comprehensive R package providing many enhancements
candisc package for R Extensions for generalized canonical discriminant analysis