RuG/L04

Manuals

linc

Description

calculate the Local Incoherence of a difference matrix, a numerical validation method

Synopsis

linc -D [-n float] [-o | -r | -w] [-l] diff-table-file distance-table-file

linc -L [-n float] [-o | -r | -w] [-l] diff-table-file coordinate-file

linc -a float [-n float] [-o | -r | -w] [-l] diff-table-file coordinate-file

Options

-D
Geographic distances are given in the distance table file
-L
Coordinates are in longitude/latitude
-a float
Coordinates are in x/y, with the given aspect ratio (>= 1)
-l
Long output
-n float
Add noise
-o
Calculate value for optimal solution
-r
Calculate value for a random `solution'
-w
Calculate value for worst solution

Purpose

The Local Incoherence is a numerical score assigned to the results of the leven program.

Comparing this score across several runs of the program with different parameters or datasets, but on the same set of locations, gives you an idea of which results are more reliable. A lower value for Local Incoherence means the results are better.

Comparing the Local Incoherence between different sets of locations is meaningless.

The idea behind Local Incoherence is that on average, locations that are close should be less different than locations farther apart (coherence), but any correlation between geographical distance and dissimilarity is lost over larger geographical distances (hence: local).

Type of coordinates

To determine which locations are closer, the program must be told how to interpret the coordinates.

Use option -L if the coordinates are given in longitude and latitude.

Use option -a if the coordinates are given in some user-defined rectangular grid, and specify the aspect ratio between X- and Y-coordinates. This must be a value of 1.0 or greater, meaning a distance of 1X can represent a smaller geographical distance than a distance of 1Y.
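
As an illustration of the two coordinate interpretations, here is a minimal Python sketch. It is not taken from linc itself. It assumes that longitude/latitude pairs are turned into great-circle distances with the haversine formula, and that the aspect ratio shrinks X-differences by that factor before a Euclidean distance is taken. The function names, the Earth radius constant and both formulas are assumptions made for this example.

```python
import math

# Assumed mean Earth radius in kilometres (illustration only).
EARTH_RADIUS_KM = 6371.0


def lonlat_distance(a, b):
    """Great-circle distance between two (longitude, latitude) pairs in degrees.

    Uses the haversine formula; whether linc computes distances this way
    is an assumption.
    """
    lon1, lat1 = (math.radians(v) for v in a)
    lon2, lat2 = (math.radians(v) for v in b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def grid_distance(a, b, aspect):
    """Distance between two (x, y) points on a rectangular grid.

    Assumption: with aspect >= 1, one unit of X counts as 1/aspect units
    of Y, so X-differences are divided by the aspect ratio.
    """
    if aspect < 1.0:
        raise ValueError("aspect ratio must be 1.0 or greater")
    dx = (b[0] - a[0]) / aspect
    dy = b[1] - a[1]
    return math.hypot(dx, dy)
```

With an aspect ratio of 1.0 the grid distance is the ordinary Euclidean distance. A larger ratio makes two locations that differ only in X count as closer together.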

Geographic distances

Instead of using geographic coordinates, you can use option -D and supply a file with geographic distances. These don't have to be distances "as the crow flies"; they can be, for instance, travelling times, or distances that take into account geographic obstacles that "increase" the geographic distance.

Note that the format of the file with geographic distances is the same as that of the file with differences you want to test for Local Incoherence, so make sure you don't accidentally swap the file name arguments.

Definition

The Local Incoherence, or I_L, is defined as follows:

[formula]

n
Number of locations.
d_ij
The geographical distance between locations i and j. These distances are ordered such that the corresponding dialectological difference between locations i and 1 is the smallest, between locations i and 2 the second smallest, etc.
d̂_ij
Like d_ij, but distances are ordered according to geographical distance instead of according to dialectological difference. In other words: Â_i is the optimal solution for A_i.
k
The upper limit for the number of dialectological neighbourhood locations used. The program uses k=8.
p
A parameter. The program uses 0.5.

Whenever there is a series of identical dialectological differences, the geographic distances are averaged for those pairs of locations.
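
The two orderings of geographic distances described above, and the averaging over ties, can be sketched in Python as follows. This is an illustration for a single location i, not the code linc uses. The function name and the list-based input are hypothetical, and it is an assumption that ties are averaged before the list is cut off at k. The sketch does not compute the Local Incoherence score itself.

```python
from itertools import groupby


def ordered_geo_distances(diffs, geo, k=8):
    """Return the two distance lists for one location i.

    diffs[j] and geo[j] are the dialectological difference and the
    geographic distance from location i to some other location j.
    The first list holds geographic distances ordered by dialectological
    difference, with ties in the difference replaced by their mean
    geographic distance; the second holds geographic distances in
    ascending order. Both are limited to the first k entries.
    """
    # Sort the pairs by dialectological difference only.
    pairs = sorted(zip(diffs, geo), key=lambda pair: pair[0])

    by_difference = []
    for _, run in groupby(pairs, key=lambda pair: pair[0]):
        distances = [g for _, g in run]
        mean = sum(distances) / len(distances)
        # Every member of a run of identical differences gets the mean.
        by_difference.extend([mean] * len(distances))

    by_geography = sorted(geo)
    return by_difference[:k], by_geography[:k]


# Hypothetical values: locations 0 and 2 have the same difference (0.2),
# so their geographic distances 30 and 50 are averaged to 40.
d, d_hat = ordered_geo_distances([0.2, 0.1, 0.2, 0.5], [30.0, 10.0, 50.0, 20.0])
print(d)      # [10.0, 40.0, 40.0, 20.0]
print(d_hat)  # [10.0, 20.0, 30.0, 50.0]
```

When the nearest locations geographically are also the most similar dialectologically, the two lists coincide, which corresponds to the optimal solution.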