INPUT:
A table with columns of words, like this:

    LOC1    LOC2    LOC3
    a1      a2      a3
    b1      b2      b3
    c1              c3

There can be as many input columns and rows as you like.
Some cells can be empty (in this example, there is no c2).

OUTPUT:
A table like this:

    LOC1  LOC2  LOC3  LOC1__LOC2  LOC1__LOC3  LOC2__LOC3
    a1    a2    a3    L(a1,a2)    L(a1,a3)    L(a2,a3)
    b1    b2    b3    L(b1,b2)    L(b1,b3)    L(b2,b3)
    c1          c3    NA          L(c1,c3)    NA

    L(a1,a2) = Levenshtein distance between strings a1 and a2
    NA       = not available

Some of the extra columns may be missing, depending on the options you have set.

================================================================
DETAILS

The input file must be a tab-delimited file with raw, unquoted strings:

- Row items (empty items too) are separated by a single tab.
- Any quotes or slashes are considered to be just part of the string.

See example file:

    sample.data

The output file is also a tab-delimited file with raw, unquoted strings.

================================================================
PROGRAM

The program that does all the work is:

    columns.py

You can call the program directly from the command line,
passing settings as command line options.

Or you can use the GUI:

    columnsGUI.py

To test the program, use as input file:

    sample.data

If you want to try features, select the feature definition file:

    features-example.txt

================================================================
IMPORTING RESULTS

When you import the results into another program, such as a spreadsheet,
there are two things to pay attention to.

- You need to set the import format to tab-delimited,
  for instance in Excel, select:
      Delimited, Delimiters: Tab (and no others).
- You need to import the text columns as raw text
  (quotes and backslashes are just part of the text),
  for instance in Excel, select:
      Text Qualifier: {none}

There is an example Excel worksheet with two tabs.
One tab is an import of the results on 'sample.data' without features.
The other tab is an import of the results on 'sample.data' with features from 'features-example.txt'.
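
The input and output formats described in this file can be illustrated with a short Python sketch. This is not the code of columns.py: the function names levenshtein and add_distance_columns are made up for the example, options and features are ignored, and it is assumed that a pair gets NA whenever either of its two cells is empty.

```python
from itertools import combinations


def levenshtein(s, t):
    """Return the Levenshtein (edit) distance between strings s and t."""
    # prev[j] = distance between the processed prefix of s and t[:j]
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def add_distance_columns(lines):
    """Turn raw tab-delimited lines (header first) into output lines.

    Hypothetical helper: cells are split on single tabs only, so quotes
    and slashes stay part of the strings, as the DETAILS section requires.
    """
    rows = [line.rstrip("\r\n").split("\t") for line in lines]
    header, body = rows[0], rows[1:]
    # All column pairs in order: (0,1), (0,2), (1,2), ...
    pairs = list(combinations(range(len(header)), 2))
    out = [header + [f"{header[i]}__{header[j]}" for i, j in pairs]]
    for row in body:
        # Pad short rows so every column index is valid.
        row = row + [""] * (len(header) - len(row))
        extra = []
        for i, j in pairs:
            if row[i] and row[j]:
                extra.append(str(levenshtein(row[i], row[j])))
            else:
                extra.append("NA")  # assumption: empty cell gives NA
        out.append(row + extra)
    return ["\t".join(r) for r in out]
```

For example, passing the lines of a file opened with open("sample.data", encoding="utf-8") to add_distance_columns would give one output line per input line, with one extra column for each pair of input columns.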