# written by Bart Alewijnse
# Vieregge-Cucchiarini modified by W. Heeringa (X-SAMPA)
# configuration file written in Jan 2005
#
# A somewhat modified version of the feature system by Vieregge and
# Cucchiarini. Most of the modifications (by Wilbert Heeringa) are in
# the addition of a number of features.
#

#########################################################################
DEFINES
########

#default stuff.
VERSION 2
TOP 65535
METHOD SUM
TOKENSTRING RAW
START 0

# The following is actually only half-arbitrary
#
# 8 seems to be about the maximum usual difference between tokens
# in practice (Taking the distance to the indels vector into
# account; 5 is enough for most others)
#
# However, because the output costs are clamped when they are
# converted into the features distance table (int range 0 to 65535,
# which is equivalent to 0..1 for the leven executable) it's
# possible to distort the costs: eg. when delete-and-insert becomes
# relatively cheaper because the would-be-much-larger indel was clamped
# while the subst wasn't.
# Therefore I'm taking a value a little higher, especially since further
# alterations could make the distance go to ten without too much trouble.
# Note that the leven distance only really has comparison value
# relative to other values under the same model anyhow.
SUBSTMAX 10.0

#the INDEL value is irrelevant here, as the use of the vector in the
#INDELS section overrides it

#########################################################################
FEATURES
########

# All the 0's are default distances.
#
# The point is that the characters that can be both vowel and consonant
# should be comparable to characters that are only vowel or consonant ONLY
# by the features of the type in which they match (eg. vowel features in
# a vowel-vowel and vowel-both comparison).
#
# By default in VERSION 2, the distance between a feature that is undefined
# in one token and defined in the other is its default difference times its
# weight.
# By making the default difference zero, this means if a feature
# is defined in one token only, it does not add anything to the distance.
#
# This in turn means consonants and vowels can leave only their own
# features defined and 'both' can have both sections defined, and whatever
# isn't defined in one of the tokens in a token-token comparison will
# contribute 0 (0*weight) to the distance.
#
# It's a bit convoluted, but it seems like the only way to allow the 'both'
# thing.
#

# 'type' (as in vowel/consonant/both) exists purely to disallow a
# vowel-consonant subst by making it extremely costly, forcing leven to fall
# back on using an insert and delete instead.
B 1.0 40.0 type

# While used as bitmaps, some features can take fractional values.
# It's possible to solve that with weights, but rather dirty.
N 0 1.0 adv1
N 0 1.0 adv2
N 0 1.0 adv3
N 0 1.0 adv4
N 0 1.0 high1
N 0 1.0 high2
N 0 1.0 high3
N 0 1.0 long1
N 0 1.0 long2
N 0 1.0 long3
N 0 1.0 round
N 0 1.0 nasal
N 0 1.0 diph
N 0 1.0 breath
N 0 1.0 creak
N 0 1.0 tone1
N 0 1.0 tome2
N 0 1.0 circ

N 0 1.0 place1
N 0 1.0 place2
N 0 1.0 place3
N 0 1.0 place4
N 0 1.0 voice
N 0 1.0 nas
N 0 1.0 stop
N 0 1.0 glide
N 0 1.0 lat
N 0 1.0 fric
N 0 1.0 trill
N 0 1.0 high
N 0 1.0 distr
N 0 1.0 syll
N 0 1.0 apic

#These two are purely for the indels. They don't have any other effect,
# because they're either equal or swamped in value by 'type'.
# Because I'm lazy, these do work based on the nondefined-defined default
# distance, so that I only have to set them in the INDELS vector.
D 1 1 vInDel
D 1 1 cInDel
# In fact, their naming is almost redundant; the only detail is that it's
# in the indel vector, nowhere else, and has a default distance of 1.

#########################################################################
TEMPLATES
########
#
# vzero and czero set the respective section to non-undefined when it is clear
# that a token is a vowel (and/)or a consonant.
# (This avoids the default distance from being applicable for the same
# section - i.e. makes sure the distance comes purely from defined value
# distances [note that in the TOKENS section templates are only called to
# set to non-zero] )
#
T vzero
F adv1 = 0
F adv2 = 0
F adv3 = 0
F adv4 = 0
F high1 = 0
F high2 = 0
F high3 = 0
F long1 = 0
F long2 = 0
F long3 = 0
F round = 0
F nasal = 0
F diph = 0
F breath = 0
F creak = 0
F tone1 = 0
F tome2 = 0
F circ = 0

T czero
F place1 = 0
F place2 = 0
F place3 = 0
F place4 = 0
F voice = 0
F nas = 0
F stop = 0
F glide = 0
F lat = 0
F fric = 0
F trill = 0
F high = 0
F distr = 0
F syll = 0
F apic = 0

#note that type is a bitmap. Matching of 3 with 2, and 3 with 1 is inherent.
T vowel
F type = 1
T consonant
F type = 2
#This is from back when a character (i, j, u and w) could be seen as both a
# vowel and consonant, which had a problem.
# It is still necessary for the comparison to the null vector in INDELS.
T both
F type = 3

T adv1
F adv1 = 1
T adv2
F adv2 = 1
T adv3
F adv3 = 1
T adv4
F adv4 = 1
T high1
F high1 = 1
T high1half
F high1 = 0.5
T high2
F high2 = 1
T high2half
F high2 = 0.5
T high3
F high3 = 1
T high3half
F high3 = 0.5
T long1
F long1 = 1
T long2
F long2 = 1
T round
F round = 1
T roundhalf
F round = 0.5
T nasal
F nasal = 1
T diph
F diph = 1
T breath
F breath = 1
T creak
F creak = 1
T tone1
F tone1 = 1
T tome2
F tome2 = 1
T circ
F circ = 1

T place1
F place1 = 1
T place1half
F place1 = 0.5
T place2
F place2 = 1
T place2half
F place2 = 0.5
T place3
F place3 = 1
T place4
F place4 = 1
T place4half
F place4 = 0.5
T voice
F voice = 1
T nas
F nas = 1
T stop
F stop = 1
T glide
F glide = 1
T lat
F lat = 1
T fric
F fric = 1
T trill
F trill = 1
T high
F high = 1
T distr
F distr = 1
T syll
F syll = 1
T apic
F apic = 1

#########################################################################
INDELS
########
#
# Makes indels comparable with a neutral feature-vector that more or less
# represents silent and neutral.
# Basically works quite like a 'both' token in how comparison is done.
#
T both czero vzero adv1 adv2 high1 high2half roundhalf place1 place2 place3 place4
F vInDel = 1
F cInDel = 1
# These last two are here to give an inherent base cost for being a vowel
# or a consonant

#########################################################################
TOKENS
########

H i
T vowel vzero high1 high2 high3
#under both also had: czero place1 place2 voice glide high
H y
T vowel vzero high1 high2 high3 round
H 1
T vowel vzero adv1 adv2 adv3 high2 high3 long1
H }
T vowel vzero adv1 adv2 high1 high2 high3 round
H M
T vowel vzero adv1 adv2 adv3 adv4 high1 high2 high3
H u
T vowel vzero adv1 adv2 adv3 adv4 high1 high2 high3 round voice glide distr
H I
T vowel vzero high1 high2 high3half
H Y
T vowel vzero high1 high2 high3half round
H U
T vowel vzero adv1 adv2 adv3 adv4 high1 high2 high3half round
H e
T vowel vzero high1 high2
H 2
T vowel vzero high1 high2 round
H @\
T vowel vzero adv1 adv2 high1 high2
H 8
T vowel vzero adv1 adv2 high1 high2 round
H 7
T vowel vzero adv1 adv2 adv3 adv4 high1 high2
H o
T vowel vzero adv1 adv2 adv3 adv4 high1 high2 round
H @
T vowel vzero adv1 adv2 high1 high2half roundhalf
H E
T vowel vzero high1
H 9
T vowel vzero high1 round
H 3
T vowel vzero adv1 adv2 high1
H 3\
T vowel vzero adv1 adv2 high1 round
H V
T vowel vzero adv1 adv2 adv3 adv4 high1
H O
T vowel vzero adv1 adv2 adv3 adv4 high1 round
H {
T vowel vzero high1half
H 6
T vowel vzero adv1 adv2 high1half
H a
T vowel vzero
H &
T vowel vzero round
H A
T vowel vzero adv1 adv2 adv3 adv4
H Q
T vowel vzero adv1 adv2 adv3 adv4 round

H p
T consonant czero stop distr
H b
T consonant czero voice stop distr
H t
T consonant czero place1 stop
H d
T consonant czero place1 voice stop
H t`
T consonant czero place1 place2half stop
H d`
T consonant czero place1 place2half voice stop
H c
T consonant czero place1 place2 stop high distr
H J\
T consonant czero place1 place2 voice stop high distr
H k
T consonant czero place1 place2 place3 stop high
H g
T consonant czero place1 place2 place3 voice stop high
H q
T consonant czero place1 place2 place3 stop
H G\
T consonant czero place1 place2 place3 voice stop
H ?
T consonant czero place1 place2 place3 place4 stop

H m
T consonant czero voice nas distr
H F
T consonant czero voice nas
H n
T consonant czero place1 voice nas
H n`
T consonant czero place1 place2half voice nas
H J
T consonant czero place1 place2 voice nas high distr
H N
T consonant czero place1 place2 place3 voice nas high
H N\
T consonant czero place1 place2 place3 voice nas

H B\
T consonant czero voice trill distr
H r
T consonant czero place1 voice trill
H R\
T consonant czero place1 place2 place3 voice trill
H 4
T consonant czero place1 voice
H r`
T consonant czero place1 place2half voice

H p\
T consonant czero fric distr
H B
T consonant czero voice fric distr
H f
T consonant czero fric
H v
T consonant czero voice fric
H T
T consonant czero place1half fric
H D
T consonant czero place1half voice fric
H s
T consonant czero place1 fric
H z
T consonant czero place1 voice fric
H S
T consonant czero place1 fric high distr
H Z
T consonant czero place1 voice fric high distr
H s`
T consonant czero place1 place2half fric
H z`
T consonant czero place1 place2half voice fric
H C
T consonant czero place1 place2 fric high distr
H j\
T consonant czero place1 place2 voice fric high distr
H x
T consonant czero place1 place2 place3 fric high
H G
T consonant czero place1 place2 place3 voice fric high
H X
T consonant czero place1 place2 place3 fric
H R
T consonant czero place1 place2 place3 voice fric
H X\
T consonant czero place1 place2 place3 place4half fric
H ?\
T consonant czero place1 place2 place3 place4half voice fric
H h
T consonant czero place1 place2 place3 place4 fric
H h\
T consonant czero place1 place2 place3 place4 voice fric
H K
T consonant czero place1 lat fric
H K\
T consonant czero place1 voice lat fric

H w
T consonant czero round voice glide distr
# under both also
# vzero adv1 adv2 adv3 adv4 high1 high2 high3
H P
T consonant czero voice glide
H v\
T consonant czero voice glide
H r\
T consonant czero place1 voice glide
H r\`
T consonant czero place1 place2half voice glide
H j
T consonant czero place1 place2 voice glide high
#under both, also had czero high1 high2 high3 place1 place2 voice glide high
H M\
T consonant czero place1 place2 place3 voice glide high

H l
T consonant czero place1 voice lat
H l`
T consonant czero place1 place2half voice lat
H L
T consonant czero place1 place2 voice lat high distr
H L\
T consonant czero place1 place2 place3 voice lat high

#This is a hack to introduce a certain 65535 indel, so that
# neither features nor leven will rescale. This is simply
# a string that will never exist as real input and has a
# large distance with respect to everything.
H #**
F type = 4

# Length modifiers. This is a slightly odd distance system,
# summarized in this distance matrix:
#       es  s   hl
#   s   1
#   hl  1   1
#   l   3   2   1

#extra short
M _X
F long1 = 1

#normal length, aka 'short': stays zero

#half-long
M :\
F long2 = 1

#long
M :
F long2 = 1
F long3 = 1

M ~
F nasal = 1

MI _L
MI _H
MI _=
MI _~
MI _t
MI _q

HI H