# written by Bart Alewijnse
# Consonant-vowel opposition with simple diacritics (X-SAMPA)
# configuration file written in April 2005 for Charlotte Gooskens
#
#
# Tokens are either consonant or vowel:
#   identical token            -> 0    (inherently)
#
#   vowel, other vowel         -> 0.5
#   consonant, other consonant -> 0.5
#
#   consonant, vowel           -> 1.0
#   vowel, consonant           -> 1.0
#
#
# Diacritics are divided into 'retroflex', '~' (nasal), 'length' and 'other'.
#   Retroflex ("`") and ~ ("~") cost 0.25
#     (technically 0.25 difference per use, but it can probably be assumed
#      they are never used to modify a single phoneme twice)
#   Length is handled so that : and :\ are identical (0.25 to unaltered length)
#     _X is also 0.25 to unaltered length, and 0.5 to : and to :\
#   'Other' is treated as a union (in a bitmask) of all other diacritics;
#     if the sets of these diacritics *differ*, that adds 0.25.
#
#
# ? is special: in substitution it counts as a regular consonant, but
#   insertion and deletion are cheaper (0.5; for other characters it is 1.0)
# Some advanced symbols are ignored (see the HI and MI lines)
#
# This model was tweaked to allow for some structural transcription errors in
# the data it was designed for; it recognizes some modifiers that do not exist
# as the ones they were meant as.
#

##### DEFINES #####
VERSION 2
TOP 65535
METHOD SUM
TOKENSTRING RAW
START 0
SUBSTMAX 1
#INDEL 1

##### FEATURES #####
#type defdif weight label
B 0 0.5 type
D 0 0.5 idinherent
#for indels. Only defined on the neutral indels vector, so it always applies; it's the minimum cost.
B 1 0.25 always
B 1 0.25 nonletter
N 0 1 length
N 0 1 retroflex
N 0 1 ~
# The rest of the diacritics are treated like:
# 'when there is a different set of diacritics, there is a +0.25 cost in the comparison'
B 1 0.25 diac
#bit of a hack, this.
N 0 0.001 little

##### TEMPLATES #####
T vowel
F type = 1
F length = 0
F retroflex = 0
F ~ = 0
F diac = 1

T consonant
F type = 2
F length = 0
F retroflex = 0
F ~ = 0
F diac = 1

##### INDELS #####
#1, except for ?.
F always = 1
F idinherent = 0
F nonletter = 1
F diac = 0

##### TOKENS #####
H i
T vowel
F idinherent = 1
H y
T vowel
F idinherent = 2
H 1
T vowel
F idinherent = 3
H }
T vowel
F idinherent = 4
H M
T vowel
F idinherent = 5
H u
T vowel
F idinherent = 6
H I
T vowel
F idinherent = 7
H Y
T vowel
F idinherent = 8
H U
T vowel
F idinherent = 9
H e
T vowel
F idinherent = 10
H 2
T vowel
F idinherent = 11
H @\
T vowel
F idinherent = 12
H 8
T vowel
F idinherent = 13
H 7
T vowel
F idinherent = 14
H o
T vowel
F idinherent = 15
H @
T vowel
F idinherent = 16
H E
T vowel
F idinherent = 17
H 9
T vowel
F idinherent = 18
H 3
T vowel
F idinherent = 19
H 3\
T vowel
F idinherent = 20
H V
T vowel
F idinherent = 21
H O
T vowel
F idinherent = 22
H {
T vowel
F idinherent = 23
H 6
T vowel
F idinherent = 24
H a
T vowel
F idinherent = 25
H &
T vowel
F idinherent = 26
H A
T vowel
F idinherent = 27
H Q
T vowel
F idinherent = 28

H p
T consonant
F idinherent = 29
H b
T consonant
F idinherent = 30
H t
T consonant
F idinherent = 31
H d
T consonant
F idinherent = 32
H c
T consonant
F idinherent = 35
H J\
T consonant
F idinherent = 36
H k
T consonant
F idinherent = 37
H g
T consonant
F idinherent = 38
H q
T consonant
F idinherent = 39
H G\
T consonant
F idinherent = 40
H m
T consonant
F idinherent = 42
H F
T consonant
F idinherent = 43
H n
T consonant
F idinherent = 44
H J
T consonant
F idinherent = 46
H N
T consonant
F idinherent = 47
H N\
T consonant
F idinherent = 48
H B\
T consonant
F idinherent = 49
H r
T consonant
F idinherent = 50
H R\
T consonant
F idinherent = 51
H 4
T consonant
F idinherent = 52
H p\
T consonant
F idinherent = 54
H B
T consonant
F idinherent = 55
H f
T consonant
F idinherent = 56
H v
T consonant
F idinherent = 57
H T
T consonant
F idinherent = 58
H D
T consonant
F idinherent = 59
H s
T consonant
F idinherent = 60
H z
T consonant
F idinherent = 61
H S
T consonant
F idinherent = 62
H Z
T consonant
F idinherent = 63
H C
T consonant
F idinherent = 66
H j\
T consonant
F idinherent = 67
H x
T consonant
F idinherent = 68
H G
T consonant
F idinherent = 69
H X
T consonant
F idinherent = 70
H R
T consonant
F idinherent = 71
H X\
T consonant
F idinherent = 72
H ?\
T consonant
F idinherent = 73
H h
T consonant
F idinherent = 74
H h\
T consonant
F idinherent = 75
H K
T consonant
F idinherent = 76
H K\
T consonant
F idinherent = 77
H w
T consonant
F idinherent = 78
H P
T consonant
F idinherent = 79
H v\
T consonant
F idinherent = 79
H r\
T consonant
F idinherent = 81
H j
T consonant
F idinherent = 83
H M\
T consonant
F idinherent = 84
H l
T consonant
F idinherent = 85
H L
T consonant
F idinherent = 87
H L\
T consonant
F idinherent = 88

#Listed as 'Other Symbols' on the IPA chart
H l\
T consonant
F idinherent = 90
H x\
T consonant
F idinherent = 91
H H
T consonant
F idinherent = 92

#A hack to have a very-little-cost (essentially none through the int encoding) word,
#to work around the fact that leven won't accept empty words.
H $$
F type = 3
F little = 0.01

# This is a hack to introduce a certain 65535 indel, so that
# neither features nor leven will rescale. This is simply
# a string that will never exist as real input and has a
# large distance with respect to everything.
H #**
F type = 4

#long
M :
F length + 0.25
#lengths aren't handled particularly smartly
M :\
F length + 0.25
#extra short
M _X
F length - 0.25

M `
F retroflex + 0.25
M ~
F ~ + 0.25

# diacritics (see also FEATURES section) use a D (equal/nonequal judgement)
# in an integer-encoding way. This assumes the same diacritic isn't used
# twice on the same letter.
M _k
F diac - 1
F diac + 2
M _h
F diac - 1
F diac + 4
#transcription error, apparently
M _i
F diac - 1
F diac + 4
M _v
F diac - 1
F diac + 8
#transcription error, apparently
M _P
F diac - 1
F diac + 8
M _O
F diac - 1
F diac + 16
M _0
F diac - 1
F diac + 16
M _G
F diac - 1
F diac + 32
M _j
F diac - 1
F diac + 64
M _-
F diac - 1
F diac + 128
M _w
F diac - 1
F diac + 256
#transcription error, apparently
M _W
F diac - 1
F diac + 256
M _?
F diac - 1
F diac + 512

H ?
F nonletter = 1
F type = 2

#tiebar - ignore and see letters as two (for now?)
HI __
HI %
HI ""
HI "
MI _=
MI =
MI _^
MI _@
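
# Worked examples. This is only a sketch read off the cost rules stated in the
# header comment at the top of this file; the values are not output from an
# actual run of the tools, and the token pairs are chosen purely for
# illustration.
#
#   a   vs  a     -> 0      identical token
#   a   vs  e     -> 0.5    vowel, other vowel
#   p   vs  t     -> 0.5    consonant, other consonant
#   a   vs  t     -> 1.0    vowel, consonant
#   a   vs  a:    -> 0.25   long vs unaltered length
#   a:  vs  a:\   -> 0      : and :\ are treated as identical
#   a:  vs  a_X   -> 0.5    long vs extra short
#   a   vs  a~    -> 0.25   nasal diacritic on one side only
#   insert/delete ?            -> 0.5
#   insert/delete other token  -> 1.0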