lags: Macro for lag sequential analysis

Visualizing Categorical Data: lags

Version: 1.2 (2 Feb 2001)
Michael Friendly
York University

The lags macro (lags.sas)

Macro for lag sequential analysis

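Given a variable containing event codes (character or numeric), the LAGS macro creates two datasets: one containing the original variables plus the lagged variables (see OUTLAG=) and, optionally, one containing the n-way frequency table of the lag combinations (see OUTFREQ=).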

Either or both of these datasets may be used for subsequent analysis of sequential dependencies. One or more BY= variables may be specified, in which case separate lags and frequencies are produced for each value of the BY variables. A WEIGHT= variable may also be specified, giving frequencies weighted by that variable. For example, using the duration of an event as a weight gives 'frequencies' which represent the total number of time units for state sequential data.

Usage

One event variable must be specified with the VAR= option. All other options have default values. If one or more BY= variables are specified, lags and frequencies are calculated separately for each combination of values of the BY= variable(s).

The arguments may be listed within parentheses in any order, separated by commas. For example:

  %lags(data=codes, var=event, nlag=2)

Parameters

DATA=
The name of the SAS dataset to be lagged. If DATA= is not specified, the most recently created data set is used.
VAR=
The name of the event variable to be lagged. The variable may be either character or numeric.
BY=
The name of one or more BY variables. Lags will be restarted for each level of the BY variable(s). The BY variables may be character or numeric.
WEIGHT=
Specifies a numeric variable whose value represents the frequency or weight of an observation. The weight values must be non-negative, but need not be integers.
VARFMT=
An optional format for the event VAR= variable. If the codes are numeric, and a format specifying what each number means is used (e.g., 1='Active' 2='Passive'), the output lag variables will be given the character values.
NLAG=
Number of lags to compute. Default = 1.
OUTLAG=
Name of the output dataset containing the lagged variables. This dataset contains the original variables plus the lagged variables, named according to the PREFIX= option.
PREFIX=
Prefix for the name of the created lag variables. The default is PREFIX=_LAG, so the variables created are named _LAG1, _LAG2, ..., up to _LAG&nlag. For convenience, a copy of the event variable is created as _LAG0.
FREQOPT=
Options for the TABLES statement used in PROC FREQ for the frequencies of each of lag1-lagN vs lag0 (the event variable). The default is FREQOPT= NOROW NOCOL NOPERCENT CHISQ.

Arguments pertaining to the n-way frequency table:

OUTFREQ=
Name of the output dataset containing the n-way frequency table. The table is not produced if this argument is not specified.
COMPLETE=
NO or ALL. Specifies whether the n-way frequency table is to be made 'complete' by filling in 0 frequencies for lag combinations that do not occur in the data.
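
As an illustration of how these arguments combine, the call below is a sketch for state sequential data in which each observation records a state and how long it lasted. The dataset name STATES and the variables SUBJECT, STATE and DURATION are hypothetical; only the macro parameters themselves come from the descriptions above.

```sas
/* Hypothetical input: one observation per state, with its duration.  */
/* STATES, SUBJECT, STATE and DURATION are illustrative names only.   */
/* by=      restarts the lags for each subject                        */
/* weight=  weights the frequencies by the time spent in each state   */
/* nlag=2   creates _LAG1 and _LAG2 (plus the copy _LAG0)             */
/* outlag=  names the dataset holding the lag variables               */
/* outfreq= names the dataset holding the n-way frequency table       */
%lags(data=states, var=state, by=subject, weight=duration,
      nlag=2, outlag=lagged, outfreq=wfreq);
```

With WEIGHT=DURATION, the 'frequencies' in the output table would be totals of time units rather than counts of events.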

Example

Assume that a series of 16 events has been coded with the 3 codes a, b, c for each of 2 subjects, as follows:
 Sub1:   c   a   a   b   a   c   a   c   b   b   a   b   a   a   b   c
 Sub2:   c   c   b   b   a   c   a   c   c   a   c   b   c   b   c   c

and these have been entered as the 2 variables SEQ (subject) and CODE in the dataset CODES:

        SEQ    CODE

        1      c
        1      a
        1      a
        1      b
        ....
        2      c
        2      c
        2      b
        2      b
        ....
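
One way to enter these data is sketched below. The DATA step is an assumption about how the CODES dataset might be built (the original does not show it); it reads the subject number and then the 16 codes from each data line, writing one observation per event.

```sas
/* Sketch: build the CODES dataset with one observation per event. */
data codes;
   input seq @;             /* read the subject number, hold the line */
   do i = 1 to 16;          /* 16 events per subject                  */
      input code $ @;       /* read the next event code               */
      output;               /* one observation per event              */
   end;
   drop i;
datalines;
1  c a a b a c a c b b a b a a b c
2  c c b b a c a c c a c b c b c c
;
run;
```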

Then the macro call:

   %lags(data=codes, var=code, by=seq, outfreq=freq);

produces the lags dataset _lags_ (for NLAG=1), which looks like this:

  SEQ    CODE    _LAG0    _LAG1

   1      c        c         
          a        a        c
          a        a        a
          b        b        a
          a        a        b
          ....

   2      c        c         
          c        c        c
          b        b        c
          b        b        b
          a        a        b
           ....
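
To make the structure of this dataset concrete, the following DATA step is a minimal sketch of how a single lag with BY-group restarts can be computed by hand. It is not the macro's own code, and the output name LAG_SKETCH is hypothetical; it assumes CODES is already ordered by SEQ, as in the listing above.

```sas
/* Sketch only: lag-1 of CODE, restarted for each subject. */
data lag_sketch;
   set codes;
   by seq;                          /* requires CODES ordered by SEQ    */
   _lag0 = code;                    /* copy of the event variable       */
   _lag1 = lag(code);               /* call LAG on every observation    */
   if first.seq then _lag1 = ' ';   /* no previous event in a new group */
run;

/* Cross-tabulate the lag-1 code against the current code by subject. */
proc freq data=lag_sketch;
   by seq;
   tables _lag1 * _lag0 / norow nocol nopercent chisq;
run;
```

The LAG function is called unconditionally and the result is blanked afterwards, because LAG keeps its own queue of values and returns the wrong value if it runs only on some observations.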

The output 2-way frequency table (outfreq=freq) looks like this:

  SEQ    _LAG0    _LAG1    COUNT

   1       a        a        2
           b        a        3
           c        a        2
           a        b        3
           b        b        1
           c        b        1
           a        c        2
           b        c        1
           c        c        0

   2       a        a        0
           b        a        0
           c        a        3
           a        b        1
           b        b        1
           c        b        2
           a        c        2
           b        c        3
           c        c        3
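
As one possible follow-up analysis, the frequency dataset can be passed to PROC FREQ with COUNT as a weight to test the sequential dependence of each event on the preceding one, separately for each subject. This is a sketch, not part of the macro; it assumes the dataset FREQ is ordered by SEQ as shown above.

```sas
/* Sketch: test of independence between lag-1 and current event. */
proc freq data=freq;
   by seq;                 /* one table per subject                */
   weight count;           /* cell frequencies from the OUTFREQ table */
   tables _lag1 * _lag0 / norow nocol nopercent chisq;
run;
```

By default PROC FREQ drops observations with a zero weight, so cells such as c following c for subject 1 do not appear in the printed table unless the ZEROS option is added to the WEIGHT statement.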

See also

meanplot
panels
scatmat
stat2dat