sort: Generalized dataset sorting by format or statistic

Visualizing Categorical Data: sort

$Version: 1.1 (19 Nov 1998)
Michael Friendly
York University

The sort macro

Generalized dataset sorting by format or statistic

The SORT macro generalizes the idea of sorting the observations in a dataset to include:

sorting according to the values of a user-specified format. With appropriate user-defined formats, this may be used to arrange the observations in a dataset in any desired order.
reordering according to the values of a summary statistic computed on the values in each of several groups, for example, the mean or median of an analysis variable. Any statistic computed by PROC UNIVARIATE may be used.



You must specify one or more BY= variables. To sort by the value of a statistic, name the statistic with the BYSTAT= parameter, and specify the analysis variable with VAR=. To sort by formatted values, specify the variable names and associated formats with BYFMT=.

If neither the BYSTAT= nor the BYFMT= parameter is specified, an ordinary sort is performed.
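
To make the BYSTAT= idea concrete, here is a minimal sketch of a statistic-based sort written as plain SAS steps. It is not the macro's own code, only an illustration of the technique. The dataset names (people, stats, people2, sorted), the variables, and the helper variable _stat_ are all assumptions made for this example.

```sas
*-- Hypothetical input with one grouping variable and one analysis variable;
data people;
   input sex $ income;
   datalines;
F 30
M 20
F 50
M 10
;
run;

*-- BY-group processing requires the data to be sorted by the BY variable;
proc sort data=people;
   by sex;
run;

*-- Compute the statistic (here the mean) once for each BY group;
proc univariate data=people noprint;
   by sex;
   var income;
   output out=stats mean=_stat_;
run;

*-- Attach the group statistic to every observation in its group;
data people2;
   merge people stats;
   by sex;
run;

*-- Reorder the groups by the statistic and drop the helper variable;
proc sort data=people2 out=sorted(drop=_stat_);
   by _stat_ sex;
run;
```

With these four observations the group means are 40 for F and 15 for M, so the M rows come first in the sorted dataset. A call such as %sort(data=people, by=sex, bystat=mean, var=income); is meant to produce this kind of ordering in a single step.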

The sort macro is called with keyword parameters. The arguments may be listed within parentheses in any order, separated by commas. For example:

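  *-- Sort the age-sex groups by their mean income;
  %sort(by=age sex, bystat=mean, var=income);

  *-- Sort by formatted values of age, then by sex in descending order;
  proc format;
     value age   0='Child' 1='Adult';
  run;
  %sort(by=age descending sex,  byfmt=age:age.);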
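
Formatted-value sorting can be pictured as sorting on the result of applying the format to each BY= value. The sketch below shows that technique in plain SAS steps. It is an illustration under assumed names, not the macro's implementation: the shirts dataset, the $sizeord format, and the helper variable _key_ are all hypothetical.

```sas
*-- A format whose labels encode the desired order of the raw values;
proc format;
   value $sizeord 'Small'='1' 'Medium'='2' 'Large'='3';
run;

*-- Hypothetical data in arbitrary order;
data shirts;
   input size :$6. price;
   datalines;
Large 12
Small 8
Medium 10
;
run;

*-- Build a sort key from the formatted value of the BY variable;
data shirts2;
   set shirts;
   length _key_ $ 8;
   _key_ = put(size, $sizeord.);
run;

*-- Sort on the key and drop it from the result;
proc sort data=shirts2 out=sorted(drop=_key_);
   by _key_;
run;
```

The sorted dataset lists Small, Medium, Large, an order that an ordinary alphabetical sort on size cannot give. Following the VAR:FMT syntax, the corresponding macro call would presumably be %sort(data=shirts, by=size, byfmt=size:$sizeord.);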


DATA=    Name of the input dataset to be sorted. The default is the most recently created dataset.
VAR=     Name of the analysis variable used to determine the sorted order in BYSTAT sorting.
OUT=     Name of the output dataset. If not specified, the output dataset replaces the input dataset.
BY=      Names of one or more classification (factor, grouping) variables to be used in sorting. The BY= argument may contain the keyword DESCENDING before a variable name for ordinary or formatted-value sorting. For BYSTAT sorting, use ORDER=DESCENDING. The BY= variables may be character or numeric.
BYFMT=   A list of one or more terms of the form VAR:FMT or VAR=FMT, where VAR is one of the BY= variables and FMT is a SAS format. Do not specify BYSTAT= when sorting by formatted values.
BYSTAT=  Name of the statistic, calculated for the VAR= variable for each level of the BY= variables. BYSTAT= may be the name of any statistic computed by PROC UNIVARIATE.
FREQ=    For BYSTAT sorting, specify the name of a frequency variable if the input data consists of grouped frequency counts.
ORDER=   Specify ORDER=DESCENDING to sort in descending order when sorting by a BYSTAT= statistic. In this case the ORDER= parameter applies to all BY= variables.
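
As a hedged illustration of how several of these parameters might be combined, the call below sorts by a group median in descending order and writes the result to a new dataset. The names salary, salary2, Faculty, income, and count are hypothetical, and the spelling OUT= for the output dataset parameter is an assumption based on the description above. The macro itself must already be available, for example through an autocall library.

```sas
*-- Keep the input dataset intact and list the highest median first;
*-- All dataset and variable names here are hypothetical;
%sort(data=salary, out=salary2, by=Faculty,
      bystat=median, var=income, freq=count, order=descending);
```

Because ORDER=DESCENDING is used with BYSTAT=, the keyword DESCENDING is not placed in the BY= list.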


Given a frequency table of Faculty by Income, sort the faculties so they are arranged by mean income:
  %include macros(sort);        *-- or include in an autocall library;
  %sort(data=salary, by=Faculty, bystat=mean, var=income, freq=count);
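
To try the call above, a small grouped frequency table is needed. The data step below is a made-up example, not data from the original source. It assumes a character variable Faculty, a numeric income value, and a count giving the number of people at that income.

```sas
*-- Hypothetical grouped frequency table with one row per Faculty-income cell;
data salary;
   input Faculty $ income count;
   datalines;
Science  50000   8
Science  70000  10
Arts     40000  12
Arts     60000   5
;
run;
```

If the count variable is used as a frequency weight as described for FREQ=, the mean income is about 45,882 for Arts and about 61,111 for Science, so the Arts rows should come first after sorting.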

See also

table: Construct a grouped frequency table, with recoding