Clustering with noise
Clustering is susceptible to noise, and we can exploit this fact. We can
deliberately add noise before clustering, repeat this many times, and see what
the effect is. A strong cluster border will not be affected as easily
as a weak cluster border.
There are several choices to make in this procedure. After testing, these seem
to work well in most cases:
- a noise level of 0.5 times the standard deviation of the differences
- using both Group Average clustering and Weighted Average clustering, combining the results
- repeating the clustering 50 times
- using the average cophenetic differences as the result
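The cophenetic difference of two items is the difference between the two
clusters they were part of at the point where these two clusters were joined
into a single cluster containing both items. For example, if items A and B are
joined at a difference of 2, and the resulting cluster is then joined with
item C at a difference of 7, the cophenetic difference of A and B is 2, and
that of A and C (and of B and C) is 7. A dendrogram shows these
cophenetic differences on the x-axis. See image at middle right.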
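The procedure described above can be sketched in a few lines of Python. This
is only an illustration under stated assumptions, not the implementation
behind the maps: the function names (cophenetic, add_noise, noisy_clustering)
are made up for this example, the noise is assumed to be drawn uniformly
between 0 and the noise level, the standard deviation is taken as the
population standard deviation, fresh noise is drawn for every clustering run,
and the two clustering methods are combined by taking a plain mean.

```python
import random
import statistics


def cophenetic(dist, weighted=False):
    """Cluster a symmetric difference matrix; return the cophenetic matrix.

    weighted=False gives Group Average, weighted=True gives Weighted Average.
    """
    n = len(dist)
    members = {i: [i] for i in range(n)}  # active cluster id -> its items
    # Differences between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    coph = [[0.0] * n for _ in range(n)]
    next_id = n
    while len(members) > 1:
        (a, b), height = min(d.items(), key=lambda kv: kv[1])
        # Every pair of items split across a and b is joined at this height.
        for i in members[a]:
            for j in members[b]:
                coph[i][j] = coph[j][i] = height
        na, nb = len(members[a]), len(members[b])
        for c in members:
            if c in (a, b):
                continue
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            if weighted:
                new = (dac + dbc) / 2
            else:
                new = (na * dac + nb * dbc) / (na + nb)
            d[(c, next_id)] = new
        del d[(a, b)]
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return coph


def add_noise(dist, level, rng):
    """Return a copy of dist with uniform noise in [0, level] (assumption)."""
    n = len(dist)
    noisy = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            noisy[i][j] = noisy[j][i] = dist[i][j] + rng.uniform(0, level)
    return noisy


def noisy_clustering(dist, noise=0.5, runs=50, seed=0):
    """Average cophenetic differences over noisy runs of both methods."""
    n = len(dist)
    diffs = [dist[i][j] for i in range(n) for j in range(i + 1, n)]
    level = noise * statistics.pstdev(diffs)
    rng = random.Random(seed)
    total = [[0.0] * n for _ in range(n)]
    for _ in range(runs):
        for weighted in (False, True):
            coph = cophenetic(add_noise(dist, level, rng), weighted)
            for i in range(n):
                for j in range(n):
                    total[i][j] += coph[i][j]
    count = 2 * runs
    return [[value / count for value in row] for row in total]


# Made-up differences between four items: 0 and 1 are close, as are 2 and 3.
dist = [
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
]
result = noisy_clustering(dist)
```

In the result, a pair of items separated by a strong cluster border keeps a
large average cophenetic difference across the runs, while a pair separated
by a weak border ends up with a value somewhere in between, because the noise
sometimes places the two items in the same cluster early and sometimes late.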
Limitations
Though this is probably the most honest map you can get, it is not the
clearest one. You may want to extend this procedure with multidimensional
scaling (MDS).