Work meeting with GvN.
Tasks:
1.1 Adapt all programs (earley, alpeur, ape, ...) to work directly with word groups, without inserting a rule. Not done, but see below.
1.2 Test the new grammar. Done.
- with/without POS nodes.
- known/unknown data.
1.3 Evaluation (eval): also count fail/unknown. Done.
1.4 Test an even newer grammar.
Re 1.1
The problem was that parsing with the word categories supplied by Alpino did not work for "words" that consist of several words. Extra rules would have to be added for every sentence. The solution would be to make the parser able to always work directly with such words containing spaces.
This turns out not to be easy. Several parts of the parser would become more complicated.
On the other hand, a small modification turned out to be enough to use Alpino's word categories for word groups after all, without extra rules.
Bug in the program: reading in the rule probabilities went wrong. So all old measurements are incorrect.
Re 1.2
1000 known sentences on 9 parts. (Sentences from the same nine parts that the grammar was derived from.)
Plain
          Without POS nodes                       With POS nodes
time      4h24 (vingolf)                          9h30 (vingolf)
memory    3.4 GB                                  5.2 GB

          Precision  Recall   Crossing brackets   Precision  Recall   Crossing brackets
Min.      0.3793     0.3906   0.00000             0.1892     0.2000   0.00000
1st Qu.   0.8641     0.8667   0.00000             0.8384     0.8203   0.00000
Median    0.9756     0.9762   0.00000             0.9365     0.9129   0.00000
Mean      0.9183     0.9182   0.01737             0.8946     0.8723   0.01795
3rd Qu.   1.0000     1.0000   0.02217             1.0000     0.9714   0.02117
Max.      1.0000     1.0000   0.37500             1.0000     1.0000   0.36842
POS by Alpino, with POS nodes
time      2h37 (vingolf)
memory    2.7 GB

OK only
          Precision  Recall   Crossing brackets
Min.      0.1818     0.1048   0.00000
1st Qu.   0.7994     0.7787   0.00000
Median    0.9202     0.8984   0.00000
Mean      0.8761     0.8523   0.02058
3rd Qu.   1.0000     0.9688   0.02827
Max.      1.0000     1.0000   0.30769

OK + FAIL + UNKNOWN
          Precision  Recall   Crossing brackets
Min.      0.0000     0.0000   0.00000
1st Qu.   0.7880     0.7682   0.00000
Median    0.9175     0.8957   0.00000
Mean      0.8621     0.8387   0.03625
3rd Qu.   1.0000     0.9681   0.03137
Max.      1.0000     1.0000   1.00000

Fail: 1.6%
1000 unknown sentences on 9 parts. (Sentences from a different part than the ones the grammar was derived from.)
Plain
          Without POS nodes                       With POS nodes
time      1h33 (vingolf)                          3h11 (vingolf)
memory    3.3 GB                                  5.5 GB

OK only
          Precision  Recall   Crossing brackets   Precision  Recall   Crossing brackets
Min.      0.2300     0.2593   0.00000             0.3000     0.3846   0.00000
1st Qu.   0.5758     0.6316   0.00000             0.7059     0.7024   0.00000
Median    0.8148     0.8333   0.00000             0.8415     0.8318   0.00000
Mean      0.7780     0.7940   0.03776             0.8291     0.8201   0.02276
3rd Qu.   1.0000     1.0000   0.06000             1.0000     0.9667   0.03890
Max.      1.0000     1.0000   0.28767             1.0000     1.0000   0.20000

OK + FAIL + UNKNOWN
          Precision  Recall   Crossing brackets   Precision  Recall   Crossing brackets
Min.      0.0000     0.0000   0.00000             0.0000     0.0000   0.0000
1st Qu.   0.0000     0.0000   0.03774             0.0000     0.0000   0.0000
Median    0.0000     0.0000   1.00000             0.0000     0.0000   1.0000
Mean      0.2871     0.2930   0.64493             0.3938     0.3896   0.5358
3rd Qu.   0.6667     0.6864   1.00000             0.8262     0.8209   1.0000
Max.      1.0000     1.0000   1.00000             1.0000     1.0000   1.0000

Fail      14.7%                                   4.1%
Unknown   48.4%                                   48.4%
Processing with guessing according to method 2. This uses a lot of memory, so it was run on zardoz.
          Without POS nodes                       With POS nodes
time      17h24 (zardoz)                          zardoz crashed after 35 hours and 844 sentences
memory    23.8 GB                                 40.5 GB

OK only
          Precision  Recall   Crossing brackets   Precision  Recall   Crossing brackets
Min.      0.07792    0.08163  0.00000             0.08163    0.1111   0.00000
1st Qu.   0.47214    0.46529  0.00000             0.63793    0.6136   0.00000
Median    0.66045    0.64286  0.04348             0.77551    0.7374   0.01242
Mean      0.66392    0.64909  0.07298             0.76721    0.7273   0.03714
3rd Qu.   0.88091    0.83580  0.12245             0.93443    0.8772   0.05769
Max.      1.00000    1.00000  0.45161             1.00000    1.0000   0.35897

OK + FAIL + UNKNOWN
          Precision  Recall   Crossing brackets   Precision  Recall   Crossing brackets
Min.      0.0000     0.0000   0.00000             0.0000     0.0000   0.00000
1st Qu.   0.4211     0.4217   0.00000             0.6374     0.6115   0.00000
Median    0.6250     0.6140   0.05556             0.7754     0.7367   0.01274
Mean      0.6055     0.5920   0.15456             0.7645     0.7247   0.04056
3rd Qu.   0.8381     0.8110   0.16141             0.9343     0.8770   0.05776
Max.      1.0000     1.0000   1.00000             1.0000     1.0000   1.00000

Fail      8.8%                                    0.4%
POS by Alpino, with POS nodes
time      2h27 (vingolf)
memory    3.1 GB

OK only
          Precision  Recall   Crossing brackets
Min.      0.1771     0.06195  0.00000
1st Qu.   0.7073     0.69129  0.00000
Median    0.8222     0.80833  0.00000
Mean      0.8161     0.79812  0.02805
3rd Qu.   1.0000     0.95321  0.04662
Max.      1.0000     1.00000  0.37037

OK + FAIL + UNKNOWN
          Precision  Recall   Crossing brackets
Min.      0.0000     0.0000   0.00000
1st Qu.   0.6610     0.6499   0.00000
Median    0.8043     0.7888   0.01058
Mean      0.7434     0.7271   0.11456
3rd Qu.   0.9829     0.9417   0.06250
Max.      1.0000     1.0000   1.00000

Fail: 8.9%
Re 1.3
If there is no parse, I use these values:
Precision: 0 (actually 0/0)
Recall: 0 (actually 0/x)
Crossing brackets: 1 (actually 0/0)
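
The fallback values above can be sketched in code. This is only an illustration and not the actual eval program: the bracket representation (label, start, end), the function names, and the choice to count crossing brackets as a fraction of the brackets in the proposed parse are assumptions. That last choice is inferred from the "0/0" noted for an empty parse.

```python
from typing import Optional, Set, Tuple

# Hypothetical representation: (label, start, end), with end exclusive.
Bracket = Tuple[str, int, int]


def crosses(a: Bracket, b: Bracket) -> bool:
    """True if the two spans overlap without one containing the other."""
    _, s1, e1 = a
    _, s2, e2 = b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def evaluate(gold: Set[Bracket],
             parsed: Optional[Set[Bracket]]) -> Tuple[float, float, float]:
    """Return (precision, recall, crossing-bracket rate) for one sentence."""
    if not parsed:
        # No parse (fail/unknown): use the fixed values from the notes
        # instead of dividing by zero.
        return 0.0, 0.0, 1.0
    matched = len(gold & parsed)
    crossing = sum(1 for p in parsed if any(crosses(p, g) for g in gold))
    precision = matched / len(parsed)
    recall = matched / len(gold) if gold else 0.0
    return precision, recall, crossing / len(parsed)


gold = {("S", 0, 4), ("NP", 0, 2), ("VP", 2, 4)}
parsed = {("S", 0, 4), ("NP", 0, 2), ("VP", 1, 4)}
print(evaluate(gold, parsed))  # two of three brackets match, one crosses
print(evaluate(gold, None))    # no parse: fallback values
```

With these fallbacks every sentence contributes a row to the "OK + FAIL + UNKNOWN" summaries, which is why their minimum precision and recall are 0 and their maximum crossing-bracket value is 1.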
Re 1.4
New data
1000 known sentences on 9 parts.
Plain
          Without POS nodes                       With POS nodes
time      3h39 (zardoz)                           7h57 (zardoz)
memory    3.4 GB                                  5.4 GB

          Precision  Recall   Crossing brackets   Precision  Recall   Crossing brackets
Min.      0.3793     0.4355   0.00000             0.1892     0.2000   0.00000
1st Qu.   0.8627     0.8648   0.00000             0.8417     0.8241   0.00000
Median    0.9762     0.9767   0.00000             0.9375     0.9148   0.00000
Mean      0.9187     0.9187   0.01715             0.8968     0.8745   0.01733
3rd Qu.   1.0000     1.0000   0.02180             1.0000     0.9717   0.02020
Max.      1.0000     1.0000   0.37500             1.0000     1.0000   0.36842
POS by Alpino, with POS nodes
time      2h08 (zardoz)
memory    2.7 GB

OK only
          Precision  Recall   Crossing brackets
Min.      0.1818     0.1048   0.0000
1st Qu.   0.8042     0.7828   0.0000
Median    0.9235     0.9000   0.0000
Mean      0.8787     0.8547   0.0200
3rd Qu.   1.0000     0.9706   0.0274
Max.      1.0000     1.0000   0.3077

OK + FAIL + UNKNOWN
          Precision  Recall   Crossing brackets
Min.      0.0000     0.0000   0.00000
1st Qu.   0.7970     0.7714   0.00000
Median    0.9210     0.8989   0.00000
Mean      0.8664     0.8428   0.03372
3rd Qu.   1.0000     0.9693   0.02952
Max.      1.0000     1.0000   1.00000

Fail: 1.4%
1000 unknown sentences on 9 parts.
Plain
          Without POS nodes                       With POS nodes
time      1h19 (zardoz)                           12h08 (zardoz)
memory    3.3 GB                                  122.5 GB (WHY SO MUCH???)

OK only
          Precision  Recall   Crossing brackets   Precision  Recall   Crossing brackets
Min.      0.1224     0.2593   0.00000             0.3000     0.4286   0.00000
1st Qu.   0.5914     0.6316   0.00000             0.7070     0.7036   0.00000
Median    0.8182     0.8333   0.00000             0.8462     0.8354   0.00000
Mean      0.7799     0.7958   0.03778             0.8306     0.8219   0.02252
3rd Qu.   1.0000     1.0000   0.06136             1.0000     0.9673   0.03803
Max.      1.0000     1.0000   0.28767             1.0000     1.0000   0.20000

OK + FAIL + UNKNOWN
          Precision  Recall   Crossing brackets   Precision  Recall   Crossing brackets
Min.      0.0000     0.0000   0.00000             0.0000     0.0000   0.0000
1st Qu.   0.0000     0.0000   0.03774             0.0000     0.0000   0.0000
Median    0.0000     0.0000   1.00000             0.0000     0.0000   1.0000
Mean      0.2870     0.2928   0.64590             0.3945     0.3904   0.5357
3rd Qu.   0.6667     0.6853   1.00000             0.8269     0.8210   1.0000
Max.      1.0000     1.0000   1.00000             1.0000     1.0000   1.0000

Fail      14.8%                                   4.1%
Unknown   48.4%                                   48.4%
Processing with guessing according to method 2.
          Without POS nodes                       With POS nodes
time      (zardoz)                                (zardoz)
memory
POS by Alpino, with POS nodes
time      2h09 (zardoz)
memory    2.6 GB

OK only
          Precision  Recall   Crossing brackets
Min.      0.1771     0.06195  0.00000
1st Qu.   0.7072     0.69426  0.00000
Median    0.8242     0.80851  0.00000
Mean      0.8169     0.79917  0.02772
3rd Qu.   1.0000     0.95276  0.04545
Max.      1.0000     1.00000  0.37037

OK + FAIL + UNKNOWN
          Precision  Recall   Crossing brackets
Min.      0.0000     0.0000   0.00000
1st Qu.   0.6610     0.6500   0.00000
Median    0.8070     0.7914   0.01015
Mean      0.7450     0.7288   0.11328
3rd Qu.   0.9822     0.9434   0.06178
Max.      1.0000     1.0000   1.00000

Fail: 8.8%