Words
Introduction to Linguistics for Computational Linguists
Word Structure
- Morphology - Study of Word Structure
- Morpheme - minimal unit of meaning
- Free vs. Bound Morphemes
- Stem vs. Affix
- Inflection vs. Derivation
- Formal Operations, Structure
- Compounding
- Lexicon, Size
Complex Words
- Wiedervereinigungen
- Wiedervereinigung - en
- Frau-en, Kirche-n, Schule-n, Muskel-n
- wiedervereinig - ung
- Regier-ung, Sitz-ung, Vorles-ung, Reif-ung, Üb-ung
- wieder - vereinig-
- wieder-sehen, wieder-holen, wieder-schreiben, wieder-taufen
- ver - einig-
- ver-dunkel(n), ver-deutlich(en), ver-grösser(n), ver-allgemeiner(n)
- ein -ig
- witz-ig, körn-ig, salz-ig, farb-ig, wicht-ig
- ein
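The step-by-step peeling above can be sketched in code. This is a minimal illustration, not a general analyser: the `STEPS` list is a hypothetical rule order copied from this one example, and the function name `peel` is an assumption.

```python
# Hypothetical rule order, copied from the slide's analysis of one word.
STEPS = [
    ("suffix", "en"),
    ("suffix", "ung"),
    ("prefix", "wieder"),
    ("prefix", "ver"),
    ("suffix", "ig"),
]

def peel(word):
    """Remove affixes in the order given by STEPS; return (stem, removed affixes)."""
    remainder = word.lower()
    removed = []
    for kind, affix in STEPS:
        if kind == "suffix" and remainder.endswith(affix):
            remainder = remainder[: -len(affix)]
            removed.append("-" + affix)
        elif kind == "prefix" and remainder.startswith(affix):
            remainder = remainder[len(affix):]
            removed.append(affix + "-")
    return remainder, removed

stem, removed = peel("Wiedervereinigungen")
print(stem, removed)
```

A real analyser would need a lexicon to decide which candidate affixes are genuine; plain string matching like this would also wrongly strip "-en" from any word that merely ends in those letters.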
Morphemes
- Morpheme - minimal meaningful unit
- Wiedervereinigungen
- wieder- ver- ein -ig -ung -en
- some meanings very abstract
- wie - der ?
- Both meaningful, but not parts of wieder-
- We look for a regular process of combination, supported by other examples of the putative process
Free vs. Bound
- Free morphemes can stand alone, bound morphemes appear only as parts of words
- Test: can occur on its own somewhere in a sentence, not necessarily as a one-word answer to a question
- Morphemes in Wiedervereinigungen
- ver-, -ig, -ung, & -en are bound
- ein is free
- wieder- thought of as bound, not the same as independent use in ‘Morgen wieder?’
Bound Morphemes
- May be “basic”
- -kunft in Auskunft, Herkunft, Zukunft, Einkünfte
- No longer used independently (freely)
- Sometimes have little meaning (but always meaningful, like all morphemes)
- “Fügungs” /-s/
- Regierungspartei, Prüfungsangst
- no word *Regierungs or *Prüfungs
Design Problem
- 10^4 to 10^5 words in open-ended vocabulary
- Phonological system creates forms for symbols (morphemes) to which meaning is attached.
- Symbols manipulated in syntax
- explosion in expressiveness (later lectures)
- But why is the intermediate level (word) needed?
Duality of Patterning
- Hockett noted that all languages are “dual” in how they’re structured
- Sentences, phrases are combinations of words
- Words are combinations of sounds (phonemes)
- But there’s also a third level, morphemes, intermediate between words and sounds.
Phrases (& Words), Morphemes & Phonemes
some phonemes omitted (to simplify)
Stem vs. Affix
- Affixes are added to stems (aka roots or bases)
- Wiedervereinigungen
- affixes: wieder- ver- -ig -ung -en; stem: ein
- Affixes always bound, stems normally free
- Affixes before stem -- prefix, after -- suffix, around -- circumfix
- prefixes: wieder- ver-
- suffixes: -ig -ung -en
- circumfix: ge- plan- -t
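The three attachment positions can be illustrated with a small helper. This is a sketch under assumptions: the function name `attach` and the notation `"ge-...-t"` for a circumfix are invented here for illustration, and the function only concatenates strings.

```python
def attach(stem, affix):
    """Attach an affix written as 'wieder-' (prefix), '-ung' (suffix) or 'ge-...-t' (circumfix)."""
    if "..." in affix:
        # Circumfix: one part goes before the stem, the other after it.
        before, after = affix.split("...")
        return before.rstrip("-") + stem + after.lstrip("-")
    if affix.endswith("-"):
        # Prefix: goes before the stem.
        return affix[:-1] + stem
    if affix.startswith("-"):
        # Suffix: goes after the stem.
        return stem + affix[1:]
    raise ValueError(f"cannot tell where {affix!r} attaches")

print(attach("plan", "ge-...-t"))
print(attach("sehen", "wieder-"))
print(attach("regier", "-ung"))
```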
Morpheme Variants
- German present, 3rd-person, singular <-t>
- Two allomorphs /-t/ and /-ət/
- latter after stems ending in /d/ or /t/
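The conditioning of the two allomorphs can be written as a one-line rule. The sketch below works on spelling as a stand-in for phonology, which is an assumption, and it ignores stem-vowel changes in strong verbs; the function name `third_singular` is hypothetical.

```python
def third_singular(stem):
    """Return (written form, allomorph) for the present 3rd-person singular."""
    # Stems ending in d or t take the schwa variant: arbeit- -> arbeitet.
    if stem.endswith(("d", "t")):
        return stem + "et", "/-ət/"
    # All other stems take plain /-t/: sag- -> sagt.
    return stem + "t", "/-t/"

for stem in ["sag", "lach", "arbeit", "red"]:
    print(stem, third_singular(stem))
```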
Inflections
- Inflectional Morphology varies the form of a word
- no new word, just a variant of an existing one
- les + <2nd-pl -t> = lest ‘[You] read’
- Derivational Morphology creates new words from old
- more than just a variant
- les + <-bar> = lesbar ‘readable’
“Rich” Inflectional System
- German is inflectionally rich compared to English, Dutch, Frisian, Scandinavian
- Nouns distinguish:
- two numbers (singular and plural)
- (in traces) four cases (nominative, genitive, dative, accusative)
- Adjectives, determiners have all noun distinctions plus:
- three genders (masculine, feminine & neuter)
- two declensions (strong and weak)
- ein grünes Haus vs. das grüne Haus
German Inflection, cont.
- Verbs distinguish two tenses, two moods, three persons, and two numbers
- plus three nonfinite forms (two participles, one infinitive)
- Lots of overlap in form, however
- laufen can be 1st/3rd person plural or infinitive
- Hand can be any singular case
Example Paradigm
Paradigm - collection of inflectionally related forms
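A paradigm can be modelled as a mapping from feature bundles to forms. The sketch below first counts the verbal cells implied by the distinctions listed earlier, then groups a partial paradigm of laufen by form to show the overlap. The feature labels (for example "subjunctive") and all names are assumptions; the slides give only the counts.

```python
from collections import defaultdict
from itertools import product

# Assumed labels for the two tenses, two moods, three persons, two numbers.
TENSES = ["present", "past"]
MOODS = ["indicative", "subjunctive"]
PERSONS = [1, 2, 3]
NUMBERS = ["singular", "plural"]
NONFINITE = ["infinitive", "present participle", "past participle"]

finite_cells = list(product(TENSES, MOODS, PERSONS, NUMBERS))
cell_count = len(finite_cells) + len(NONFINITE)

# Partial paradigm of laufen: present indicative plus the infinitive.
LAUFEN = {
    ("present", "indicative", 1, "singular"): "laufe",
    ("present", "indicative", 2, "singular"): "läufst",
    ("present", "indicative", 3, "singular"): "läuft",
    ("present", "indicative", 1, "plural"): "laufen",
    ("present", "indicative", 2, "plural"): "lauft",
    ("present", "indicative", 3, "plural"): "laufen",
    ("infinitive",): "laufen",
}

def cells_by_form(paradigm):
    """Group paradigm cells by the form that fills them."""
    grouped = defaultdict(list)
    for cell, form in paradigm.items():
        grouped[form].append(cell)
    return dict(grouped)

print(cell_count)                      # 24 finite cells + 3 nonfinite = 27
print(cells_by_form(LAUFEN)["laufen"]) # three cells share one form
```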
Richer Systems
- Finnish has over twenty nominal cases
- Latin has four tenses plus inflectionally marked passive
- Possibly thousands of inflected forms per word
Inflection vs. Derivation
- Derivation creates new words, often of different syntactic category
- A/V les - + <bar>
- V/V <wieder> + vereinig-
- N/V regier- + <ung>
- A/N Mann + <lich> (männlich)
- A/N Witz + <ig>
- V/A <ver> + deutlich-
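The category notation above (result/source, e.g. A/V = adjective from verb) can be read as a small rule table. The following sketch encodes exactly the six rules listed; the `Rule` class and `derive` function are hypothetical names, and the code ignores capitalisation and umlaut (so it would not produce männlich from Mann).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    affix: str
    kind: str    # "prefix" or "suffix"
    source: str  # category of the input stem
    result: str  # category of the derived word

# The six rules from the slide, read as result/source.
RULES = [
    Rule("bar", "suffix", "V", "A"),
    Rule("wieder", "prefix", "V", "V"),
    Rule("ung", "suffix", "V", "N"),
    Rule("lich", "suffix", "N", "A"),
    Rule("ig", "suffix", "N", "A"),
    Rule("ver", "prefix", "A", "V"),
]

def derive(stem, category, affix):
    """Apply the first rule matching the affix and the stem's category."""
    for rule in RULES:
        if rule.affix == affix and rule.source == category:
            form = affix + stem if rule.kind == "prefix" else stem + affix
            return form, rule.result
    raise ValueError(f"no rule attaches {affix!r} to category {category}")

print(derive("les", "V", "bar"))
print(derive("deutlich", "A", "ver"))
```

Requiring the source category to match is what blocks, for instance, attaching -bar to a noun stem in this toy table.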
Derivational Results
--Natural hierarchical structure
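As a rough illustration of such a hierarchy, the analysis of Wiedervereinigungen from the earlier slides can be written as a nested structure. The choice of this word and the tuple encoding are assumptions made here; the original figure may have used a different example.

```python
# Each pair groups two constituents; strings are morphemes.
TREE = ((("wieder-", ("ver-", ("ein", "-ig"))), "-ung"), "-en")

def brackets(node):
    """Render the tree as a labelled-free bracketing."""
    if isinstance(node, str):
        return node
    return "[" + " ".join(brackets(child) for child in node) + "]"

def leaves(node):
    """Return the morphemes in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

print(brackets(TREE))
print(leaves(TREE))
```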
Alternative Groupings
--notice dual role of <un->
Inflection vs. Derivation
- Inflection:
- New variants of existing words
- Required by syntax, e.g., 3rd-sg subject
- No change in part of speech
- Derivation:
- No interaction with syntax
- May change part of speech, e.g., A/N
New Variants
- Inflectional variants “count as the same” in grammatical rules
- Ich schreibe an Gabi, und Du schreibst an Karin
- Ich schreibe an Gabi, und Du ∅ an Karin
- Derivational results are always new words
- Es ist nicht deutlich, aber man kann ’s verdeutlichen
- *Es ist nicht deutlich, aber man kann ’s (ver- ∅)
Productivity
- Some inflectional variants are missing, but few in comparison to derivation
- No perfect participle for scheinen used as auxiliary
- Sie schien zu laufen
- *Sie hat zu laufen geschienen/scheinen
- Lots of “gaps” in derivation
- <ver> + Adj (verdeutlichen, etc.) does not apply to most adjectives (hübsch, viereckig, krank, mutig, …)
Regularity
- Inflection can change meaning, but regularly
- past tense, plural change meanings
- Derivational meanings often “shift”
- <wieder-> + V ‘do V again’, but wiederholen?
- V + <-ung> ‘event, result of V-ing’ but Sitzung?
- The inflection/derivation distinction is still controversial
Formal Operations
- Not limited to attachment of affixes
- Umlaut in plural, derivation
- Umlaut changes [back] → [front], retaining rounding
- u/ü [u/y], o/ö [o/ø], a/ä [a/e]
- [back, closed, round] (u) → [front, closed, round] (y)
- Buch/Bücher; Loch/Löcher; Angst/ängstlich
- Historically vowel harmony with following [i]
- Ablaut vowel alternation
- laufen, liefen; helfen, half, geholfen; singen, sangen, gesungen
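Umlaut as a formal operation can be approximated on spelling. The sketch below fronts the last back vowel of a stem; the function name `umlaut` and the treatment of the digraph "au" (written "äu") are assumptions added here, and the code does not know which words actually undergo umlaut.

```python
UMLAUT = {"a": "ä", "o": "ö", "u": "ü"}

def umlaut(stem):
    """Front the last back vowel of the stem (spelling-based approximation)."""
    for i in range(len(stem) - 1, -1, -1):
        ch = stem[i]
        if ch.lower() in UMLAUT:
            # In the digraph "au" the umlaut is written on the a: Haus -> Häus-
            if ch.lower() == "u" and i > 0 and stem[i - 1].lower() == "a":
                i -= 1
                ch = stem[i]
            fronted = UMLAUT[ch.lower()]
            if ch.isupper():
                fronted = fronted.upper()
            return stem[:i] + fronted + stem[i + 1:]
    return stem  # no back vowel to front

print(umlaut("Buch") + "er")
print(umlaut("Loch") + "er")
print(umlaut("Angst").lower() + "lich")
```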
Other Formal Means
--Semitic consonantal roots are filled variously in morphology
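Root-and-pattern morphology can be sketched as filling consonant slots in a template. The Arabic root k-t-b and the three patterns below are a standard textbook illustration supplied here as an assumption (they are not from the slides), in a simplified transliteration; the function name `interdigitate` is hypothetical.

```python
def interdigitate(root, pattern):
    """Fill each 'C' slot of the pattern with the next root consonant."""
    if pattern.count("C") != len(root):
        raise ValueError("pattern and root have different numbers of consonants")
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)

# Assumed example: root k-t-b, associated with writing.
for pattern in ["CaCaCa", "CiCaaC", "CaaCiC"]:
    print(pattern, interdigitate("ktb", pattern))
```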
Borderline Cases
- Inflected Prepositions
- German zum/zur/beim/ins/fürs/ans
- French du/des/au/aux -- de le, de les, à le, à les
- If analyzed as one word, they would be in an otherwise non-existing category
- (zu (der Schule)) (bei (dem Feld))
- Aka “portmanteau words” -- two-in-one
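One computational treatment of such forms is to expand them into preposition plus determiner before syntactic analysis, which matches the bracketing (zu (der Schule)) above. The following is a minimal sketch covering only the six German forms listed; the names `CONTRACTIONS` and `expand` are hypothetical.

```python
# Preposition + determiner pairs for the six German forms on the slide.
CONTRACTIONS = {
    "zum": ("zu", "dem"),
    "zur": ("zu", "der"),
    "beim": ("bei", "dem"),
    "ins": ("in", "das"),
    "fürs": ("für", "das"),
    "ans": ("an", "das"),
}

def expand(tokens):
    """Replace each portmanteau token by its two component words."""
    expanded = []
    for token in tokens:
        expanded.extend(CONTRACTIONS.get(token.lower(), (token,)))
    return expanded

print(expand(["zur", "Schule"]))
print(expand(["beim", "Feld"]))
```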
Separable Prefixes
- Genuine prefix or two words?
- Mir kam einer entgegen / ist entgegengekommen
- Occasionally irregular semantics, missing productivity
- vor-schreiben, nach-tragen, wieder-holen, vor-lesen
- mitkriegen ? mitbekommen, kriegen = bekommen
Clitics
- Affixes or words?
- Affixes are parts of words, cannot stand alone (bound morphemes), words are free
- Pronouns are words:
- Ich kann ihm und will ihm helfen
- Ich kann und will ihm helfen
- -st is an affix
- Du kannst und willst Tom helfen
- *Du kann- und willst Tom helfen
Clitics
- English auxiliaries are simple clitics
- Mary and I’ll leave soon
- Tom and his wife’ve left already
- French clitics are affix-like
- Marie l’a lu et l’a compris
Clitics
- Affixes or words?
- Ich kann’s und will’s nicht begreifen
- Ich kann und will’s nicht begreifen
- Infinitival zu
- Er versprach, zu helfen und zu arbeiten
- *Er versprach, zu helfen und arbeiten
English Possessive
- No normal affix
- The king of England’s hat
- Seems to apply to phrases, not words
- “Special” forms (e.g., ∅) after plural, etc.
- The boys’ hats [ðə.bɔɪz.hæts] *[ðə.bɔɪz.əz.hæts]
- His kid’s toy is here, hers’ is over there.
Compounding
- Combining two free words to make a third
- Donaudampfschifffahrtsgesellschaft
- Increases number of words enormously
- Only unpredictable compounds in dictionaries
- Introduces further questions of associativity:
- Softwarevirusprüfung: [[Software virus] prüfung] vs. [Software [virus prüfung]]
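The associativity question can be made concrete by enumerating every binary grouping of a compound's parts. This is a sketch; the function name `bracketings` is hypothetical, and the segmentation of the compound into parts is assumed to be given.

```python
def bracketings(parts):
    """Return every binary bracketing of a sequence of compound parts."""
    if len(parts) == 1:
        return [parts[0]]
    results = []
    # Try every position at which the sequence can be split in two.
    for split in range(1, len(parts)):
        for left in bracketings(parts[:split]):
            for right in bracketings(parts[split:]):
                results.append(f"[{left} {right}]")
    return results

for grouping in bracketings(["Software", "virus", "prüfung"]):
    print(grouping)

long_compound = ["Donau", "dampf", "schiff", "fahrts", "gesellschaft"]
print(len(bracketings(long_compound)))
```

Three parts yield two groupings, and the count grows quickly with length (the Catalan numbers), so a parser needs lexical or semantic evidence to choose among them.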
How Many Words are there?
- Size of dictionaries (Miller, p.135)
Computational Morphology
- Lexicography (collecting words) -- entirely computational
- Models of affixation as finite automata
- Lexical structure as inheritance networks
- lexical properties inherited (as in O-O design)
- morphological, syntactic properties
- semantic (hyponym, synonym): WordNet
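To illustrate affixation modelled as a finite automaton, here is a toy acceptor over morpheme sequences built from the morphemes of Wiedervereinigungen. The states, transitions and accepting set are assumptions chosen for this sketch; it checks only affix order and is not a claim about any particular system.

```python
# (state, morpheme) -> next state. State 0 is the start state.
TRANSITIONS = {
    (0, "wieder-"): 1,
    (0, "ver-"): 2,
    (1, "ver-"): 2,
    (0, "ein"): 3,
    (2, "ein"): 3,
    (3, "-ig"): 4,
    (4, "-ung"): 5,
    (5, "-en"): 6,
}
ACCEPTING = {5, 6}  # noun in -ung, with or without plural -en

def accepts(morphemes):
    """Run the automaton over a morpheme sequence."""
    state = 0
    for morpheme in morphemes:
        state = TRANSITIONS.get((state, morpheme))
        if state is None:
            return False  # no transition: the order is not licensed
    return state in ACCEPTING

print(accepts(["wieder-", "ver-", "ein", "-ig", "-ung", "-en"]))
print(accepts(["ein", "-ung", "-ig"]))
```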
Word Structure
- Morphology - Study of Word Structure
- Morpheme - minimal unit of meaning
- Free vs. Bound Morphemes
- Stem vs. Affix
- Prefix/Suffix/Circumfix; Allomorphy
- Inflection vs. Derivation: word forms vs. novelty
- Formal operations varied
- Compounding creates words
- Lexicons