Macro for lag sequential analysis
Visualizing Categorical Data: lags
$Version: 1.2 (2 Feb 2001)
Given a variable containing event codes (char or numeric), the LAGS macro
creates (1) a dataset containing n+1 lagged variables, _lag0 - _lagN (_lag0 is
just a copy of the input event variable), and, optionally, (2) an (n+1)-way
contingency table containing frequencies of all combinations of events at
lag0 -- lagN.
Either or both of these datasets may be used for subsequent analysis of
sequential dependencies. One or more BY= variables may be specified, in which
case separate lags and frequencies are produced for each value of the BY
variables.
A WEIGHT= variable may also be specified, giving frequencies weighted by that
variable. For example, using the duration of an event as a weight gives
'frequencies' which represent the total number of time units for state
sequential data.

One event variable must be specified with the VAR= option. All other options
have default values. If one or more BY= variables are specified, lags and
frequencies are calculated separately for each combination of values of the
BY= variables.
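
To make the lagging concrete, here is a minimal sketch of a lag-1 computation that restarts within each BY group. It is not the macro's own source, only an illustration under stated assumptions: the input is the example dataset CODES (variables SEQ and CODE) used in this documentation, CODE is a character variable, and LAGS_SKETCH is a hypothetical output name.

```sas
/* Sketch only: lag-1 of CODE, restarted for each value of SEQ.        */
/* LAGS_SKETCH is a hypothetical dataset name chosen for illustration. */
data lags_sketch;
   set codes;
   by seq notsorted;                /* provides first.seq for each subject  */
   _lag0 = code;                    /* copy of the event variable           */
   _lag1 = lag(code);               /* LAG is called on every observation   */
   if first.seq then _lag1 = ' ';   /* no previous event at start of group  */
                                    /* (use . instead of ' ' if numeric)    */
run;
```

The LAG function keeps its queue across BY groups, which is why it is called unconditionally and the result is blanked on the first observation of each group. For lag k greater than 1, LAG2, LAG3, and so on could be used in the same way, blanking the first k observations of each group.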
The arguments may be listed within parentheses in any order, separated by
commas. For example:
%lags(data=codes, var=event, nlag=2)
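
As a further sketch, the keyword style extends to the BY= and WEIGHT= options described above. In the second call the names EVENTS, SUBJECT, DURATION and WFREQ are hypothetical and serve only to illustrate the syntax.

```sas
/* Same call as above with the keyword arguments in a different order */
%lags(nlag=2, var=event, data=codes);

/* Hypothetical names: EVENTS, SUBJECT, DURATION, WFREQ.               */
/* Lags restart for each subject; frequencies are weighted by duration. */
%lags(data=events, var=event, by=subject, weight=duration, outfreq=wfreq);
```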
DATA= names the SAS dataset to be lagged. If DATA= is not specified, the most
recently created data set is used.

VAR= names the event variable to be lagged. The variable may be either
character or numeric.

BY= names one or more BY variables. Lags will be restarted for each level
of the BY variable(s). The BY variables may be character or numeric.

WEIGHT= specifies a numeric variable whose value represents the
frequency or weight of an observation. The weight values
must be non-negative, but need not be integers.

An optional format for the event VAR= variable. If the codes are numeric, and
a format specifying what each number means is used (e.g., 1='Active'
2='Passive'), the output lag variables will be given the character values.

NLAG= gives the number of lags to compute. Default = 1.

Name of the output dataset containing the lagged variables. This dataset
contains the original variables plus the lagged variables, named according
to the PREFIX= option.

PREFIX= gives the prefix for the names of the created lag variables. The
default is PREFIX=_LAG, so the variables created are named _LAG1, _LAG2, ...,
up to _LAG&nlag. For convenience, a copy of the event variable is created as
_LAG0.

FREQOPT= gives options for the TABLES statement used in PROC FREQ for the
frequencies of each of lag1-lagN vs lag0 (the event variable). The default is
FREQOPT= NOROW NOCOL NOPERCENT CHISQ.

Arguments pertaining to the n-way frequency table:

OUTFREQ= names the output dataset containing the n-way frequency table. The
table is not produced if this argument is not specified.

An option taking the value NO, or ALL, specifies whether the n-way frequency
table is to be made 'complete', by filling in 0 frequencies for lag
combinations which do not occur in the data.

Assume that a series of 16 events has been coded with the 3 codes a, b, c for
each of 2 subjects, as follows:
Sub1: c a a b a c a c b b a b a a b c
Sub2: c c b b a c a c c a c b c b c c
and these have been entered as the 2 variables SEQ (subject) and CODE in
the dataset CODES:
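
One way these sequences might be entered is sketched below. It assumes list input with one data line per subject; the loop counter I and the use of DATALINES are choices made for this illustration.

```sas
/* Sketch: read one subject per line, writing one observation per event */
data codes;
   input seq $ @;            /* subject id; trailing @ holds the line   */
   do i = 1 to 16;
      input code $ @;        /* next event code on the same line        */
      output;                /* one observation per event               */
   end;
   drop i;
datalines;
1 c a a b a c a c b b a b a a b c
2 c c b b a c a c c a c b c b c c
;
```

This yields 32 observations, 16 per subject, in the original event order.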
Then the macro call:
%lags(data=codes, var=code, by=seq, outfreq=freq);
produces the lags dataset _lags_ for NLAG=1, which looks like this (only the
first five observations for each subject are shown):
SEQ    CODE    _LAG0    _LAG1
 1      c       c
        a       a        c
        a       a        a
        b       b        a
        a       a        b
 2      c       c
        c       c        c
        b       b        c
        b       b        b
        a       a        b
The output 2-way frequency table (outfreq=freq) looks like this:
SEQ    _LAG0    _LAG1    COUNT
 1      a        a         2
        b        a         3
        c        a         2
        a        b         3
        b        b         1
        c        b         1
        a        c         2
        b        c         1
        c        c         0
 2      a        a         0
        b        a         0
        c        a         3
        a        b         1
        b        b         1
        c        b         2
        a        c         2
        b        c         3
        c        c         3
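
As an illustration of a subsequent analysis, the frequency dataset could be passed back to PROC FREQ with COUNT as the weight. This is only a sketch: it assumes the dataset FREQ and the variable names shown in the table above, and it reuses the TABLES options given as the FREQOPT= default.

```sas
/* Sketch: lag1 x lag0 table for each subject from the OUTFREQ= dataset */
proc freq data=freq;
   by seq;                   /* FREQ is already ordered by SEQ          */
   weight count;             /* COUNT holds the cell frequencies        */
   tables _lag1 * _lag0 / norow nocol nopercent chisq;
run;
```

With only 15 transitions per subject, the chi-square approximation is rough at best, so the test results here should be read as a demonstration of the syntax rather than as evidence about sequential dependence.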