Exercise 1

General remarks

Exercise

Produce an FSA macro-file containing macros. A simple example is syllable.pl. You can take this as a starting point and replace the definitions with something more adequate...

Macros and auxiliary files

Macros can be loaded by starting fsa (fsa tkconsol=on -tk), opening the File menu, and choosing LoadAux or Reconsult Aux. Select the corresponding filename (my_macros.pl) in the resulting box. After that you can use the macros you have just defined as if they were regular expressions. Thus, if a macro 'vowel' is defined, you can type 'vowel' in the Regexp line. It is then expanded to the regular expression given in the definition of the macro.

TESTING

After making the definitions, and checking them in fsa, you can test your work in two ways. This requires the following files:

Test 1: Recognize `foreign' words.

The file 'monosyll' consists of a list of 5890 words of the form
[consonant*, vowel+, consonant*].  The Unix command

make not_accepted

produces a file `not_accepted' which contains all words not recognized by 'syllable'. This list should only contain words which consist of more than a single syllable (aaien, beiaard,...) and non-native words (back, blues,...).
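To get a feel for what Test 1 checks, here is a minimal Python sketch of the same idea outside fsa. It is not the course's Makefile or the fsa machinery: the vowel and consonant inventories, the names SHAPE and STRICTER, and the function not_accepted are all assumptions made for illustration. SHAPE mirrors the loose form [consonant*, vowel+, consonant*]; STRICTER adds hypothetical length limits, roughly the kind of restriction a better 'syllable' definition might impose.

```python
import re

# Assumption: a plain five-letter vowel inventory, with 'y' treated as a consonant.
# A realistic definition for the exercise would be richer than this.
VOWEL = "[aeiou]"
CONSONANT = "[b-df-hj-np-tv-z]"

# Python counterpart of the loose shape [consonant*, vowel+, consonant*].
SHAPE = re.compile(f"{CONSONANT}*{VOWEL}+{CONSONANT}*")

# Hypothetical stricter variant: at most 3 onset consonants,
# 1 or 2 vowel letters, and at most 4 coda consonants.
STRICTER = re.compile(f"{CONSONANT}{{0,3}}{VOWEL}{{1,2}}{CONSONANT}{{0,4}}")


def not_accepted(words, pattern=SHAPE):
    """Return the words that the pattern does not match in full."""
    return [w for w in words if pattern.fullmatch(w) is None]


if __name__ == "__main__":
    sample = ["kat", "aaien", "back", "tafel"]
    print(not_accepted(sample))            # words rejected by the loose shape
    print(not_accepted(sample, STRICTER))  # words rejected by the stricter variant
```

Note that the loose shape accepts both 'aaien' and 'back', while the stricter variant rejects 'aaien' but still lets 'back' through. Ruling out non-native words needs restrictions on which consonant and vowel combinations are allowed, not just on their length.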

Test 2: Hyphenating simple words

The file dol.mono.stem contains a list of 12628 mono-morphemic (non-compound) words. The command

make hyphen_errors

produces a file hyphen_errors which contains all wrongly hyphenated words (1st column: your hyphenation, 2nd column: the correct pattern), and gives the percentage of correctly hyphenated words.
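The bookkeeping behind Test 2 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name hyphenation_report, the use of '-' as the hyphenation mark, and the sample words are not taken from the course files, and the actual Makefile may compute its figures differently.

```python
def hyphenation_report(predicted, gold):
    """Pair each wrong hyphenation with the correct one and compute accuracy.

    predicted and gold are equally long lists of hyphenated words,
    e.g. 'ta-fel'. Returns (errors, percentage_correct), where errors
    is a list of (predicted, correct) pairs, like the two columns
    described for hyphen_errors.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("nothing to evaluate")
    errors = [(p, g) for p, g in zip(predicted, gold) if p != g]
    percentage_correct = 100.0 * (len(gold) - len(errors)) / len(gold)
    return errors, percentage_correct


if __name__ == "__main__":
    # Hypothetical sample data, not taken from dol.mono.stem.
    gold = ["ta-fel", "ap-pel", "ven-ster", "wa-ter"]
    predicted = ["ta-fel", "ap-pel", "vens-ter", "wa-ter"]
    errors, pct = hyphenation_report(predicted, gold)
    for wrong, right in errors:
        print(wrong, right)
    print(f"{pct:.1f}% correct")
```

Looking at the error pairs side by side is usually the quickest way to see what kind of mistake a definition makes, for example consistently putting too many consonants in the coda.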

NOTA BENE

  1. Check your definitions before testing, for instance by loading them in fsa and trying out some examples.
  2. The Unix command make produces files as defined in a 'Makefile'. Rerunning a test sometimes leads to the message 'File up to date'. In those cases, just remove File and run make again. If you want to start all over, do 'make clean': this removes all files made by make.

Reporting Results

  1. Mail the file syllable.pl and a brief report to your lab assistant.
  2. In the report you should give the results of test 1 (how many words are not recognized, which kind of words are not recognized?) and of test 2 (how many mistakes? what kind of mistakes?)
  3. Send your results to m.b.villada@let.rug.nl

Deadline: Thursday, April 17

Good luck!

Gosse.

P.S. A first try gave 22% unaccepted words for test 1, and 10% errors for test 2...