RuG/L04

cluster

Description

perform hierarchical clustering on a difference matrix

Synopsis

cluster -sl|-cl|-ga|-wa|-uc|-wc|-wm [-b] [-c] [-m int|int-int|int-int+int] [-N float] [-o filename] [-r int] [-s int] [-u] difference-matrix-file

Options

Clustering algorithm:
-sl
Single Link (Nearest Neighbor), also: -n
-cl
Complete Link (Furthest Neighbor)
-ga
Group Average (UPGMA: Unweighted Pair Group Method using Arithmetic averages)
-wa
Weighted Average (WPGMA: Weighted Pair Group Method using Arithmetic averages)
-uc
Unweighted Centroid (Centroid, UPGMC: Unweighted Pair Group Method using Centroids)
-wc
Weighted Centroid (Median, WPGMC: Weighted Pair Group Method using Centroids)
-wm
Ward's Method (Minimum Variance), also: -w
Other options:
-b
Binary difference output, instead of a cluster file. The result is a difference matrix file.
-c
Cophenetic difference output, instead of a cluster file. The result is a difference matrix file.
-m int
-m int-int
-m int-int+int
Maximum number of clusters for binary or cophenetic output. The maximum applies to each run separately (option -r), so the combined result may well contain more clusters than this number.
You can define ranges of numbers, for instance:
    -m 2-8
    -m 2-11+3
The first example selects all values from 2 to 8 inclusive; the second selects the values 2, 5, 8 and 11.
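The range syntax can be illustrated with a small Python sketch. The function name expand_range is hypothetical and this is not the program's own parser; it only reproduces the expansion described above, assuming the number after the plus sign is a positive step.

```python
def expand_range(spec):
    """Expand 'int', 'int-int' or 'int-int+int' into a list of integers.

    Hypothetical helper that mirrors the -m syntax described in this manual.
    """
    bounds, step = spec, 1
    if "+" in bounds:
        # 'first-last+step': the part after '+' is the step size
        bounds, step_text = bounds.split("+", 1)
        step = int(step_text)
    if "-" in bounds:
        first_text, last_text = bounds.split("-", 1)
        first, last = int(first_text), int(last_text)
    else:
        # a single number selects just that value
        first = last = int(bounds)
    if step < 1 or last < first:
        raise ValueError("invalid range: " + spec)
    # the upper bound is inclusive
    return list(range(first, last + 1, step))


print(expand_range("2-8"))
print(expand_range("2-11+3"))
```

The two calls correspond to the two examples above.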
-N float
Noise. Before clustering, each value is increased by a random amount between zero and sd times the specified value, where sd is the standard deviation of all the original values. This option can be given more than once if -b or -c is used as well.
-o filename
Output file
-r int
Number of runs. Only useful if -N is used together with -b or -c.
-s int
Seed for random number generator.
-u
Unsorted
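
The noise step of option -N can be sketched in Python as follows. This is an illustration under assumptions, not the program's code: the manual does not say which random distribution is used or whether sd is the population or the sample standard deviation, so the sketch assumes a uniform distribution and the population standard deviation. The name add_noise is hypothetical, and the seeded generator stands in for option -s.

```python
import random
import statistics


def add_noise(values, noise, rng):
    """Return a copy of values, each increased by a random amount in [0, sd * noise].

    Assumptions (not stated in the manual): uniform distribution,
    population standard deviation.
    """
    sd = statistics.pstdev(values)  # sd of all the original values
    upper = sd * noise
    return [v + rng.uniform(0.0, upper) for v in values]


differences = [1.0, 2.0, 3.0, 4.0]
rng = random.Random(42)  # fixed seed, comparable in spirit to option -s
print(add_noise(differences, 0.5, rng))
```

Because the noise is never negative, every value stays the same or grows. Repeating the clustering over several noisy copies of the same matrix is what makes option -r meaningful.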

Purpose

This program performs hierarchical clustering on a difference matrix file and produces a hierarchical cluster definition file, unless option -b or -c is used. The cluster file can then be processed further with den to create an image of a dendrogram, or with clgroup to produce a partitioning of the data. With option -b or -c, the result is a difference matrix file that can be used to create a differentiated cluster map with mapdiff.

Cancelling

The program will abort if a file named _CANCEL_.L04 exists in the current directory, or if it is created while the program is running. This is useful for stopping long-running calculations from a GUI, such as pyL04.
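
A front end could use this mechanism roughly as in the following Python sketch. The function names are hypothetical and the code is not taken from pyL04; only the file name _CANCEL_.L04 comes from this manual. The sketch also assumes that the front end removes the file again before the next run, since an existing file makes the program abort at once.

```python
import os

CANCEL_FILE = "_CANCEL_.L04"  # file name given in this manual


def request_cancel(directory="."):
    """Create the cancel file in the directory where cluster is running."""
    with open(os.path.join(directory, CANCEL_FILE), "w"):
        pass  # an empty file is enough; only its existence matters


def cancel_requested(directory="."):
    """Return True if the cancel file exists in the directory."""
    return os.path.exists(os.path.join(directory, CANCEL_FILE))


def clear_cancel(directory="."):
    """Remove a leftover cancel file so that a new run is not aborted."""
    path = os.path.join(directory, CANCEL_FILE)
    if os.path.exists(path):
        os.remove(path)
```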

Reference

The clustering algorithms implemented in this program are described in:
Anil K. Jain and Richard C. Dubes.
Algorithms for Clustering Data.
Prentice Hall, Englewood Cliffs, NJ, 1988.
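
The seven methods differ only in how the difference between a newly merged cluster and each remaining cluster is computed. A common textbook formulation is the Lance-Williams update, sketched below in Python. This is an illustration and not necessarily how this program is implemented; the function name and the use of the option names as method keys are assumptions. In the usual formulation, the centroid, median and Ward updates are defined on squared Euclidean distances.

```python
def lance_williams(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Difference between cluster k and the cluster formed by merging i and j.

    d_ki, d_kj, d_ij: differences between the clusters before the merge.
    n_i, n_j, n_k:    number of items in each cluster.
    """
    n_ij = n_i + n_j
    if method == "sl":    # single link
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, -0.5
    elif method == "cl":  # complete link
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.5
    elif method == "ga":  # group average (UPGMA)
        a_i, a_j, beta, gamma = n_i / n_ij, n_j / n_ij, 0.0, 0.0
    elif method == "wa":  # weighted average (WPGMA)
        a_i, a_j, beta, gamma = 0.5, 0.5, 0.0, 0.0
    elif method == "uc":  # unweighted centroid (UPGMC)
        a_i, a_j = n_i / n_ij, n_j / n_ij
        beta, gamma = -(n_i * n_j) / (n_ij * n_ij), 0.0
    elif method == "wc":  # weighted centroid (median, WPGMC)
        a_i, a_j, beta, gamma = 0.5, 0.5, -0.25, 0.0
    elif method == "wm":  # Ward's method
        total = n_ij + n_k
        a_i, a_j = (n_i + n_k) / total, (n_j + n_k) / total
        beta, gamma = -n_k / total, 0.0
    else:
        raise ValueError("unknown method: " + method)
    return a_i * d_ki + a_j * d_kj + beta * d_ij + gamma * abs(d_ki - d_kj)
```

With single link the update reduces to the smaller of the two differences, and with complete link to the larger, which matches the names Nearest Neighbor and Furthest Neighbor.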