RuG/L04 - Manuals

perform hierarchical clustering on a difference matrix

cluster -sl|-cl|-ga|-wa|-uc|-wc|-wm [-b] [-c] [-m int|int-int|int-int+int] [-N float] [-o filename] [-r int] [-s int] [-u] difference-matrix-file

Clustering algorithm:

-sl: Single Link (Nearest Neighbor), also: -n
-cl: Complete Link (Furthest Neighbor)
-ga: Group Average (UPGMA: Unweighted Pair Group Method using Arithmetic averages)
-wa: Weighted Average (WPGMA: Weighted Pair Group Method using Arithmetic averages)
-uc: Unweighted Centroid (Centroid, UPGMC: Unweighted Pair Group Method using Centroids)
-wc: Weighted Centroid (Median, WPGMC: Weighted Pair Group Method using Centroids)
-wm: Ward's Method (Minimum Variance), also: -w
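All seven methods can be written as special cases of the Lance-Williams update formula. When clusters i and j are merged, it computes the difference between the new cluster and every other cluster k from differences that are already known: d(k, i+j) = a_i d(k,i) + a_j d(k,j) + b d(i,j) + g |d(k,i) - d(k,j)|. The coefficients depend on the method and on the cluster sizes n_i, n_j, n_k. The following is a naive O(n^3) Python sketch of that scheme. It is not the code of cluster itself, and the function name agglomerate and its merge-list format are hypothetical. Centroid, median and Ward's method are normally defined for squared Euclidean distances. This sketch applies them to the matrix as given, and the manual does not say whether cluster transforms its input for these methods.

```python
from itertools import combinations

# Lance-Williams coefficients (a_i, a_j, b, g) per method, as functions of the
# sizes ni, nj of the two clusters being merged and nk of a third cluster.
LW = {
    "sl": lambda ni, nj, nk: (0.5, 0.5, 0.0, -0.5),
    "cl": lambda ni, nj, nk: (0.5, 0.5, 0.0, 0.5),
    "ga": lambda ni, nj, nk: (ni / (ni + nj), nj / (ni + nj), 0.0, 0.0),
    "wa": lambda ni, nj, nk: (0.5, 0.5, 0.0, 0.0),
    "uc": lambda ni, nj, nk: (ni / (ni + nj), nj / (ni + nj),
                              -ni * nj / (ni + nj) ** 2, 0.0),
    "wc": lambda ni, nj, nk: (0.5, 0.5, -0.25, 0.0),
    "wm": lambda ni, nj, nk: ((ni + nk) / (ni + nj + nk),
                              (nj + nk) / (ni + nj + nk),
                              -nk / (ni + nj + nk), 0.0),
}


def agglomerate(d, method):
    """Hypothetical sketch: cluster a symmetric difference matrix d.

    Returns merges as (a, b, height, new_id) tuples. Items are 0..n-1 and
    each merge creates a new cluster with id n, n+1, ...
    """
    n = len(d)
    # Differences between active clusters, keyed by an unordered id pair.
    dist = {frozenset((i, j)): d[i][j] for i, j in combinations(range(n), 2)}
    size = {i: 1 for i in range(n)}
    merges = []
    next_id = n
    while len(size) > 1:
        pair = min(dist, key=dist.get)  # closest pair of active clusters
        i, j = sorted(pair)
        h = dist.pop(pair)
        ni, nj = size.pop(i), size.pop(j)
        # Update the difference between the new cluster and every other one.
        for k, nk in size.items():
            ai, aj, b, g = LW[method](ni, nj, nk)
            dki = dist.pop(frozenset((k, i)))
            dkj = dist.pop(frozenset((k, j)))
            dist[frozenset((k, next_id))] = (ai * dki + aj * dkj + b * h
                                             + g * abs(dki - dkj))
        size[next_id] = ni + nj
        merges.append((i, j, h, next_id))
        next_id += 1
    return merges
```

For the matrix [[0, 2, 6, 10], [2, 0, 5, 9], [6, 5, 0, 4], [10, 9, 4, 0]], single link joins the final two clusters at height 5, complete link at 10, and group average at 7.5, the mean of the four cross-group differences.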

Other options:

-b

Binary difference output, instead of a cluster file. The result is a difference matrix file.

-c

Cophenetic difference output, instead of a cluster file. The result is a difference matrix file.
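By the usual definition, the cophenetic difference between two items is the height at which they first end up in the same cluster. The following Python sketch derives such a matrix from a list of merges. The function name and the (a, b, height, new_id) merge format are hypothetical. The sketch does not model how cluster applies the -m limit or combines several runs.

```python
def cophenetic(n, merges):
    """Hypothetical sketch: n x n cophenetic difference matrix.

    merges: (a, b, height, new_id) tuples in merge order, where ids 0..n-1
    are the original items and each merge creates cluster new_id.
    """
    members = {i: [i] for i in range(n)}
    c = [[0.0] * n for _ in range(n)]
    for a, b, height, new_id in merges:
        # Every item in a meets every item in b for the first time here.
        for x in members[a]:
            for y in members[b]:
                c[x][y] = c[y][x] = height
        members[new_id] = members.pop(a) + members.pop(b)
    return c
```

For four items with merges (0, 1, 2, 4), (2, 3, 4, 5) and (4, 5, 5, 6), items 0 and 1 get difference 2, items 2 and 3 get 4, and every pair across the two groups gets 5.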

-m int

-m int-int

-m int-int+int

Maximum number of clusters for binary or cophenetic output. The limit applies to each run separately (see option -r), so the output may well contain more clusters than this number.
You can define ranges of numbers, for instance:

    -m 2-8
    -m 2-11+3

The first example selects all values from 2 to 8 inclusive, the second selects the values 2 5 8 11.
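The range syntax can be expanded as in the following Python sketch. It covers only the syntax documented above, and the function name parse_m is hypothetical. For a reversed range such as 8-2 it simply returns an empty list, which may not match what cluster does.

```python
import re


def parse_m(spec):
    """Expand an -m value ("int", "int-int" or "int-int+int") to a list."""
    m = re.fullmatch(r"(\d+)(?:-(\d+)(?:\+(\d+))?)?", spec)
    if m is None:
        raise ValueError(f"bad -m value: {spec!r}")
    first = int(m.group(1))
    last = int(m.group(2)) if m.group(2) else first  # single value: int
    step = int(m.group(3)) if m.group(3) else 1      # default step is 1
    if step < 1:
        raise ValueError("step must be positive")
    return list(range(first, last + 1, step))        # upper bound inclusive
```

Here parse_m("2-8") gives [2, 3, 4, 5, 6, 7, 8] and parse_m("2-11+3") gives [2, 5, 8, 11].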

-N float

Noise. Before clustering, all values are increased by a random value between zero and sd times the specified value, where sd is the standard deviation of all the original values. This option can be used more than once if -b or -c is used as well.
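The following is a minimal Python sketch of the noise step. The manual does not specify these details, so the sketch makes three assumptions. The noise is drawn uniformly. It is added once per item pair, which keeps the matrix symmetric. The value sd is the population standard deviation of the off-diagonal values. The function name add_noise is hypothetical, and passing a seeded random.Random plays the role of option -s.

```python
import random
import statistics


def add_noise(d, factor, rng):
    """Hypothetical sketch: copy of symmetric matrix d with added noise.

    Each difference d[i][j] (i < j) is increased by a uniform random value
    in [0, sd * factor]. As an assumption, sd is the population standard
    deviation of the original off-diagonal values.
    """
    n = len(d)
    values = [d[i][j] for i in range(n) for j in range(i + 1, n)]
    sd = statistics.pstdev(values)
    out = [row[:] for row in d]
    for i in range(n):
        for j in range(i + 1, n):
            # One draw per pair, mirrored so the matrix stays symmetric.
            out[i][j] = out[j][i] = d[i][j] + rng.uniform(0.0, sd * factor)
    return out
```

With a fixed seed, for example add_noise(d, 0.5, random.Random(1)), repeated calls give the same noisy matrix.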

-o filename

Output file

-r int

Number of runs. Only useful if -N is used together with -b or -c.

-s int

Seed for random number generator.

-u

Unsorted

This program performs hierarchical clustering on a difference matrix file and produces a hierarchical cluster definition file, unless option -b or -c is used. The cluster file can then be processed further with den to create an image of a dendrogram, or with clgroup to produce a partitioning of the data. With option -b or -c, the result is a difference matrix file that can be used to create a differentiated cluster map with mapdiff.

The program will abort if a file _CANCEL_.L04 exists in the current directory, or if it is created while the program is running. This is useful for stopping long-running calculations from a GUI, such as pyL04.
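Because cancellation works through a plain file, any front end can trigger it. The following is a minimal Python sketch of what a GUI might do. The helper names are hypothetical, and directory must be the working directory in which cluster was started. The program aborts whenever the file exists, so the front end should remove it again before launching the next run.

```python
import os

CANCEL_FILE = "_CANCEL_.L04"


def request_cancel(directory="."):
    """Hypothetical helper: ask a running cluster process to stop."""
    with open(os.path.join(directory, CANCEL_FILE), "w"):
        pass  # an empty file is enough; only its existence matters


def clear_cancel(directory="."):
    """Remove the cancel file so the next run does not abort at once."""
    path = os.path.join(directory, CANCEL_FILE)
    if os.path.exists(path):
        os.remove(path)


def cancel_requested(directory="."):
    """True if the cancel file is present in directory."""
    return os.path.exists(os.path.join(directory, CANCEL_FILE))
```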

The clustering algorithms implemented in this program are described in:

Anil K. Jain and Richard C. Dubes.
Algorithms for Clustering Data.
Prentice Hall, Englewood Cliffs, NJ, 1988.
