Update 2001/11/16: Recompiled koh.exe for Windows allows for processing of much larger data files.
Contents:
Note: Images on this webpage use grey-scales to convey
information. If you don't see 16 shades of grey in the image to the left
some information will be lost.
Teuvo Kohonen.The extensions were first described in:
Self-Organization and Associative Memory.
Springer-Verlag, Berlin, 3rd edition, 1989.
Peter Kleiweg.
Neurale netwerken: Een inleidende cursus met practica voor de studie Alfa-Informatica.
Syllabus, section 5.7: Analyse van complexe data.
Master's thesis, Rijksuniversiteit Groningen, 1996.
With software.
The result of the training is that a pattern of organization emerges in the map. Different units learn to respond to different vectors in the input set, and units closer together will tend to respond to input vectors that resemble each other.
When the training is finished, the set of input vectors is applied to the map once more, marking for each input vector the unit that responds the strongest (is most similar) to that input vector.
To demonstrate this algorithm, Kohonen used the set of 32 vectors reproduced in the table below. The vectors have dimension 5, and are labeled A-Z,1-6.
A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | 1 | 2 | 3 | 4 | 5 | 6 |
1 0 0 0 0 |
2 0 0 0 0 |
3 0 0 0 0 |
4 0 0 0 0 |
5 0 0 0 0 |
3 1 0 0 0 |
3 2 0 0 0 |
3 3 0 0 0 |
3 4 0 0 0 |
3 5 0 0 0 |
3 3 1 0 0 |
3 3 2 0 0 |
3 3 3 0 0 |
3 3 4 0 0 |
3 3 5 0 0 |
3 3 6 0 0 |
3 3 7 0 0 |
3 3 8 0 0 |
3 3 3 1 0 |
3 3 3 2 0 |
3 3 3 3 0 |
3 3 3 4 0 |
3 3 6 1 0 |
3 3 6 2 0 |
3 3 6 3 0 |
3 3 6 4 0 |
3 3 6 2 1 |
3 3 6 2 2 |
3 3 6 2 3 |
3 3 6 2 4 |
3 3 6 2 5 |
3 3 6 2 6 |
There is order in this set of vectors that can be displayed in a minimal spanning tree, obtained by linking all the vectors together, using the smallest possible square difference between linked vectors:
Using Kohonen's algorithm on this set of vectors generates the map below:
However, this information is available. All you have to do is make it visible. This can be done by calculating the square difference between neighboring units on the trained map, and use this value to color the edged separating the units. This is done in the map below, where dark lines indicate strong difference, and light lines indicate strong resemblance.
Additional information can be added. You can create a minimal spanning tree, and overlay this on the map, as is done on the map below.
Kohonen's example set of vectors is highly artificial. As a more natural example, I created a set of vectors from statistical data about occurrences of combinations of characters in Dutch words. The result is a set of 24 vectors of dimension 58. The characters q, x and y are omitted for their rarity in Dutch. The combination ij is considered to be one character, a vowel.
Applying nearest neighbor vector clustering, using my clustering software results in an image of a dendrogram, as displayed below. Note that creating a dendrogram for Kohonen's example set of vectors is meaningless because all pairs of nearest neighbors have an identical square difference. By the way, the algorithm to obtain nearest neighbor vector clustering is identical to the algorithm used to obtain the minimal spanning trees.
The result of creating a Kohonen map, with all the bells and whistles, from this set of vectors is shown below. One more additional type of information is made visible here. The lines of the minimal spanning tree indicate the difference between linked vectors. Closed lines indicate strong resemblance (e.g. l, r), open lines indicate strong difference (e.g. h, j).
Here is a large example. A spanning tree is omitted.
The usage of koh is fairly simple. Just run the program without any parameters for a list of possible options. For the syntax of the input file, refer to one of the examples.
You may want to modify the PostScript file produced by the program. All relevant options are at the top of the file. DX and DY define width and height of one unit. The flags SHOW_GRAY and SHOW_LINKS define whether or not to use the extensions. ANGLE controls the angle of curved links. If you plan to include the PostScript image into another document, remember to adjust the values of the BoundingBox, if you have changed the size of the image.
Sometimes it may be necessary to adjust the positioning of the labels on the map as produces by the program. For instance, two labels may be unreadable because they are mapped to the same unit. The commands to draw the labels are at the end of the PostScript file. When two short labels are mapped to the same unit you could delete one line in the PostScript file, and combine the labels on the other line. For instance, the labels (A) and (B) could become (A, B). Longer labels could be placed above each other. The command for drawing the labels uses three arguments, X and Y coordinate and the label itself. The coordinates are given as integer values, but they need not be. When two labels overlap, you can add .15 to the Y coordinate (second argument) of one label, and subtract .15 from the Y coordinate of the other label.
The usage of kohview is as simple as possible. Just run the program with as an argument the name of one of the files produces by koh, but without the extension.
What was stated above about the positioning of labels in the PostScript file applies to the output of kohview as well. In this case you need to make similar adjustments to the file with the extension .top as produced by the koh program.