
Work meeting with GvN.

Tasks:

  1. EarleyParser

    1. Adapt all programs (earley, alpeur, ape, ...) to work directly with multi-word units, without inserting a rule. Not done, but see below.

    2. Test the new grammar. Done.

      • with/without POS nodes.
      • known/unknown data.
    3. Evaluation (eval): also count fail/unknown. Done.

    4. Test an even newer grammar.


Re 1.1

The problem was: parsing with the word categories supplied by Alpino did not work for "words" that consist of several words. Extra rules would have to be added for every sentence. The solution would be to make the parser able to always work directly with such words containing spaces.

This turns out not to be easy. Several parts of the parser would become more complicated.

On the other hand, it proved possible, with a small change, to work with Alpino's word categories for multi-word units after all, without extra rules.
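
The notes do not say what the small change was. The sketch below only illustrates the general idea of handing a multi-word unit to the parser as a single terminal that contains a space. The function name, the span format and the category names are hypothetical and are not Alpino's actual output.

```python
def merge_multiword_units(words, spans):
    """Collapse every tagged span into one (token, category) pair.

    `spans` is a list of (start, end, category) with `end` exclusive.
    This input format is an assumption made for illustration only.
    """
    tokens = []
    for start, end, category in sorted(spans):
        # Join the words of the span with a space, so that the whole
        # unit reaches the parser as one terminal with one category.
        tokens.append((" ".join(words[start:end]), category))
    return tokens


words = ["Hij", "woont", "in", "Den", "Haag"]
# Hypothetical categories; "Den Haag" is tagged as one unit of two words.
spans = [(0, 1, "pron"), (1, 2, "verb"), (2, 3, "prep"), (3, 5, "name")]
tokens = merge_multiword_units(words, spans)
```

With input like this, no extra grammar rule is needed for "Den Haag": the parser sees four terminals, the last of which happens to contain a space.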


Bug in the program: reading the rule probabilities went wrong. So all old measurements are incorrect.
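
A sanity check at load time might catch this kind of bug earlier. The sketch below assumes a probabilistic context-free grammar in which the probabilities of all rules with the same left-hand side sum to 1, and a hypothetical file format of one rule per line with the probability after a tab. Neither assumption comes from these notes.

```python
from collections import defaultdict


def read_rules(lines):
    """Parse lines of the hypothetical form 'LHS -> RHS ...<TAB>probability'."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rule, prob = line.rsplit("\t", 1)
        lhs, rhs = rule.split("->", 1)
        rules.append((lhs.strip(), tuple(rhs.split()), float(prob)))
    return rules


def check_probabilities(rules, tolerance=1e-6):
    """Return the left-hand sides whose rule probabilities do not sum to 1."""
    totals = defaultdict(float)
    for lhs, _rhs, prob in rules:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range for {lhs}: {prob}")
        totals[lhs] += prob
    return {lhs: t for lhs, t in totals.items() if abs(t - 1.0) > tolerance}


rules = read_rules(["S -> NP VP\t1.0", "NP -> det n\t0.6", "NP -> n\t0.3"])
suspicious = check_probabilities(rules)  # only NP is reported here
```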

Re 1.2

1000 known sentences on 9 parts. (Sentences from the same nine parts that the grammar comes from.)

Plain

Without POS nodes                                         With POS nodes

time: 4h24 (vingolf)                                      time: 9h30 (vingolf)
memory: 3.4 GB                                            memory: 5.2 GB

   Precision          Recall       Crossing brackets         Precision          Recall       Crossing brackets  
 Min.   :0.3793   Min.   :0.3906   Min.   :0.00000         Min.   :0.1892   Min.   :0.2000   Min.   :0.00000    
 1st Qu.:0.8641   1st Qu.:0.8667   1st Qu.:0.00000         1st Qu.:0.8384   1st Qu.:0.8203   1st Qu.:0.00000    
 Median :0.9756   Median :0.9762   Median :0.00000         Median :0.9365   Median :0.9129   Median :0.00000    
 Mean   :0.9183   Mean   :0.9182   Mean   :0.01737         Mean   :0.8946   Mean   :0.8723   Mean   :0.01795    
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.02217         3rd Qu.:1.0000   3rd Qu.:0.9714   3rd Qu.:0.02117    
 Max.   :1.0000   Max.   :1.0000   Max.   :0.37500         Max.   :1.0000   Max.   :1.0000   Max.   :0.36842    

POS by Alpino

                                                          With POS nodes

                                                          time: 2h37 (vingolf)
                                                          memory: 2.7 GB

                                                          OK only

                                                             Precision          Recall       Crossing brackets
                                                           Min.   :0.1818   Min.   :0.1048   Min.   :0.00000
                                                           1st Qu.:0.7994   1st Qu.:0.7787   1st Qu.:0.00000
                                                           Median :0.9202   Median :0.8984   Median :0.00000
                                                           Mean   :0.8761   Mean   :0.8523   Mean   :0.02058
                                                           3rd Qu.:1.0000   3rd Qu.:0.9688   3rd Qu.:0.02827
                                                           Max.   :1.0000   Max.   :1.0000   Max.   :0.30769

                                                          OK + FAIL + UNKNOWN

                                                             Precision          Recall       Crossing brackets
                                                           Min.   :0.0000   Min.   :0.0000   Min.   :0.00000
                                                           1st Qu.:0.7880   1st Qu.:0.7682   1st Qu.:0.00000
                                                           Median :0.9175   Median :0.8957   Median :0.00000
                                                           Mean   :0.8621   Mean   :0.8387   Mean   :0.03625
                                                           3rd Qu.:1.0000   3rd Qu.:0.9681   3rd Qu.:0.03137
                                                           Max.   :1.0000   Max.   :1.0000   Max.   :1.00000

                                                          Fail:     1.6%

1000 unknown sentences on 9 parts. (Sentences from a different part than the ones the grammar comes from.)

Plain

Without POS nodes                                         With POS nodes

time: 1h33 (vingolf)                                      time: 3h11 (vingolf)
memory: 3.3 GB                                            memory: 5.5 GB

OK only                                                   OK only

   Precision          Recall       Crossing brackets         Precision          Recall       Crossing brackets
 Min.   :0.2300   Min.   :0.2593   Min.   :0.00000         Min.   :0.3000   Min.   :0.3846   Min.   :0.00000
 1st Qu.:0.5758   1st Qu.:0.6316   1st Qu.:0.00000         1st Qu.:0.7059   1st Qu.:0.7024   1st Qu.:0.00000
 Median :0.8148   Median :0.8333   Median :0.00000         Median :0.8415   Median :0.8318   Median :0.00000
 Mean   :0.7780   Mean   :0.7940   Mean   :0.03776         Mean   :0.8291   Mean   :0.8201   Mean   :0.02276
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.06000         3rd Qu.:1.0000   3rd Qu.:0.9667   3rd Qu.:0.03890
 Max.   :1.0000   Max.   :1.0000   Max.   :0.28767         Max.   :1.0000   Max.   :1.0000   Max.   :0.20000

OK + FAIL + UNKNOWN                                       OK + FAIL + UNKNOWN

   Precision          Recall       Crossing brackets         Precision          Recall       Crossing brackets
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000         Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.03774         1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
 Median :0.0000   Median :0.0000   Median :1.00000         Median :0.0000   Median :0.0000   Median :1.0000
 Mean   :0.2871   Mean   :0.2930   Mean   :0.64493         Mean   :0.3938   Mean   :0.3896   Mean   :0.5358
 3rd Qu.:0.6667   3rd Qu.:0.6864   3rd Qu.:1.00000         3rd Qu.:0.8262   3rd Qu.:0.8209   3rd Qu.:1.0000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000         Max.   :1.0000   Max.   :1.0000   Max.   :1.0000

Fail:    14.7%                                            Fail:     4.1%
Unknown: 48.4%                                            Unknown: 48.4%

Processing with guessing according to method 2. This uses a lot of memory, so it was run on zardoz.

Without POS nodes                                         With POS nodes

time: 17h24 (zardoz)                                      time: zardoz crashed after 35 hours and 844 sentences
memory: 23.8 GB                                           memory: 40.5 GB

OK only                                                   OK only

   Precision           Recall        Crossing brackets       Precision           Recall       Crossing brackets
 Min.   :0.07792   Min.   :0.08163   Min.   :0.00000       Min.   :0.08163   Min.   :0.1111   Min.   :0.00000
 1st Qu.:0.47214   1st Qu.:0.46529   1st Qu.:0.00000       1st Qu.:0.63793   1st Qu.:0.6136   1st Qu.:0.00000
 Median :0.66045   Median :0.64286   Median :0.04348       Median :0.77551   Median :0.7374   Median :0.01242
 Mean   :0.66392   Mean   :0.64909   Mean   :0.07298       Mean   :0.76721   Mean   :0.7273   Mean   :0.03714
 3rd Qu.:0.88091   3rd Qu.:0.83580   3rd Qu.:0.12245       3rd Qu.:0.93443   3rd Qu.:0.8772   3rd Qu.:0.05769
 Max.   :1.00000   Max.   :1.00000   Max.   :0.45161       Max.   :1.00000   Max.   :1.0000   Max.   :0.35897

OK + FAIL + UNKNOWN                                       OK + FAIL + UNKNOWN

   Precision          Recall       Crossing brackets         Precision          Recall       Crossing brackets
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000         Min.   :0.0000   Min.   :0.0000   Min.   :0.00000
 1st Qu.:0.4211   1st Qu.:0.4217   1st Qu.:0.00000         1st Qu.:0.6374   1st Qu.:0.6115   1st Qu.:0.00000
 Median :0.6250   Median :0.6140   Median :0.05556         Median :0.7754   Median :0.7367   Median :0.01274
 Mean   :0.6055   Mean   :0.5920   Mean   :0.15456         Mean   :0.7645   Mean   :0.7247   Mean   :0.04056
 3rd Qu.:0.8381   3rd Qu.:0.8110   3rd Qu.:0.16141         3rd Qu.:0.9343   3rd Qu.:0.8770   3rd Qu.:0.05776
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000         Max.   :1.0000   Max.   :1.0000   Max.   :1.00000

Fail:     8.8%                                            Fail:     0.4%

POS by Alpino

                                                          With POS nodes

                                                          time: 2h27 (vingolf)
                                                          memory: 3.1 GB

                                                          OK only

                                                             Precision          Recall        Crossing brackets
                                                           Min.   :0.1771   Min.   :0.06195   Min.   :0.00000
                                                           1st Qu.:0.7073   1st Qu.:0.69129   1st Qu.:0.00000
                                                           Median :0.8222   Median :0.80833   Median :0.00000
                                                           Mean   :0.8161   Mean   :0.79812   Mean   :0.02805
                                                           3rd Qu.:1.0000   3rd Qu.:0.95321   3rd Qu.:0.04662
                                                           Max.   :1.0000   Max.   :1.00000   Max.   :0.37037

                                                          OK + FAIL + UNKNOWN

                                                             Precision          Recall       Crossing brackets
                                                           Min.   :0.0000   Min.   :0.0000   Min.   :0.00000
                                                           1st Qu.:0.6610   1st Qu.:0.6499   1st Qu.:0.00000
                                                           Median :0.8043   Median :0.7888   Median :0.01058
                                                           Mean   :0.7434   Mean   :0.7271   Mean   :0.11456
                                                           3rd Qu.:0.9829   3rd Qu.:0.9417   3rd Qu.:0.06250
                                                           Max.   :1.0000   Max.   :1.0000   Max.   :1.00000

                                                          Fail:     8.9%


Re 1.3

If there is no parse, I use these values:
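
The values themselves are not listed on this page. The sketch below shows one way per-sentence scores could be computed with a fallback for sentences without a parse. The fallback of precision 0, recall 0 and crossing brackets 1 is an assumption; it is at least consistent with the Min. of 0.0000 and the Max. of 1.00000 in the "OK + FAIL + UNKNOWN" tables. Treating crossing brackets as the fraction of parsed brackets that cross a gold bracket is also an assumption, as are the function names and the bracket format.

```python
def crosses(a, b):
    """True if spans a and b overlap without one containing the other."""
    (s1, e1), (s2, e2) = a, b
    return s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1


def score(gold, parsed, fallback=(0.0, 0.0, 1.0)):
    """Return (precision, recall, crossing) for one sentence.

    `gold` and `parsed` are sets of (label, start, end) brackets.
    `parsed` is None or empty when there is no parse (FAIL or UNKNOWN);
    in that case the assumed fallback values are returned.
    """
    if not gold:
        raise ValueError("gold bracket set must not be empty")
    if not parsed:
        return fallback
    correct = len(gold & parsed)
    gold_spans = {(s, e) for _label, s, e in gold}
    # Count the parsed brackets that cross at least one gold bracket.
    crossing = sum(
        any(crosses((s, e), g) for g in gold_spans) for _label, s, e in parsed
    )
    return correct / len(parsed), correct / len(gold), crossing / len(parsed)


gold = {("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)}
parsed = {("S", 0, 5), ("NP", 0, 3), ("VP", 3, 5)}
precision, recall, crossing = score(gold, parsed)
no_parse = score(gold, None)
```

In the "OK only" rows such sentences would be left out; in the "OK + FAIL + UNKNOWN" rows they would be counted with the fallback values, which pulls the mean precision and recall down and the mean crossing brackets up.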


Re 1.4

New data

1000 known sentences on 9 parts.

Plain

Without POS nodes                                         With POS nodes

time: 3h39 (zardoz)                                       time: 7h57 (zardoz)
memory: 3.4 GB                                            memory: 5.4 GB

   Precision          Recall       Crossing brackets         Precision          Recall       Crossing brackets
 Min.   :0.3793   Min.   :0.4355   Min.   :0.00000          Min.   :0.1892   Min.   :0.2000   Min.   :0.00000
 1st Qu.:0.8627   1st Qu.:0.8648   1st Qu.:0.00000          1st Qu.:0.8417   1st Qu.:0.8241   1st Qu.:0.00000
 Median :0.9762   Median :0.9767   Median :0.00000          Median :0.9375   Median :0.9148   Median :0.00000
 Mean   :0.9187   Mean   :0.9187   Mean   :0.01715          Mean   :0.8968   Mean   :0.8745   Mean   :0.01733
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.02180          3rd Qu.:1.0000   3rd Qu.:0.9717   3rd Qu.:0.02020
 Max.   :1.0000   Max.   :1.0000   Max.   :0.37500          Max.   :1.0000   Max.   :1.0000   Max.   :0.36842

POS by Alpino

                                                          With POS nodes

                                                          time: 2h08 (zardoz)
                                                          memory: 2.7 GB

                                                          OK only

                                                             Precision          Recall       Crossing brackets
                                                           Min.   :0.1818   Min.   :0.1048   Min.   :0.0000   
                                                           1st Qu.:0.8042   1st Qu.:0.7828   1st Qu.:0.0000   
                                                           Median :0.9235   Median :0.9000   Median :0.0000   
                                                           Mean   :0.8787   Mean   :0.8547   Mean   :0.0200   
                                                           3rd Qu.:1.0000   3rd Qu.:0.9706   3rd Qu.:0.0274   
                                                           Max.   :1.0000   Max.   :1.0000   Max.   :0.3077   

                                                          OK + FAIL + UNKNOWN

                                                             Precision          Recall       Crossing brackets
                                                           Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
                                                           1st Qu.:0.7970   1st Qu.:0.7714   1st Qu.:0.00000  
                                                           Median :0.9210   Median :0.8989   Median :0.00000  
                                                           Mean   :0.8664   Mean   :0.8428   Mean   :0.03372  
                                                           3rd Qu.:1.0000   3rd Qu.:0.9693   3rd Qu.:0.02952  
                                                           Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  

                                                          Fail:     1.4%

1000 unknown sentences on 9 parts.

Plain

Without POS nodes                                         With POS nodes

time: 1h19 (zardoz)                                       time: 12h08 (zardoz)
memory: 3.3 GB                                            memory: 122.5 GB      WHY SO MUCH ???

OK only                                                   OK only

   Precision          Recall       Crossing brackets         Precision          Recall       Crossing brackets
 Min.   :0.1224   Min.   :0.2593   Min.   :0.00000         Min.   :0.3000   Min.   :0.4286   Min.   :0.00000
 1st Qu.:0.5914   1st Qu.:0.6316   1st Qu.:0.00000         1st Qu.:0.7070   1st Qu.:0.7036   1st Qu.:0.00000
 Median :0.8182   Median :0.8333   Median :0.00000         Median :0.8462   Median :0.8354   Median :0.00000
 Mean   :0.7799   Mean   :0.7958   Mean   :0.03778         Mean   :0.8306   Mean   :0.8219   Mean   :0.02252
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:0.06136         3rd Qu.:1.0000   3rd Qu.:0.9673   3rd Qu.:0.03803
 Max.   :1.0000   Max.   :1.0000   Max.   :0.28767         Max.   :1.0000   Max.   :1.0000   Max.   :0.20000

OK + FAIL + UNKNOWN                                       OK + FAIL + UNKNOWN

   Precision          Recall       Crossing brackets         Precision          Recall       Crossing brackets
 Min.   :0.0000   Min.   :0.0000   Min.   :0.00000         Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.03774         1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
 Median :0.0000   Median :0.0000   Median :1.00000         Median :0.0000   Median :0.0000   Median :1.0000
 Mean   :0.2870   Mean   :0.2928   Mean   :0.64590         Mean   :0.3945   Mean   :0.3904   Mean   :0.5357
 3rd Qu.:0.6667   3rd Qu.:0.6853   3rd Qu.:1.00000         3rd Qu.:0.8269   3rd Qu.:0.8210   3rd Qu.:1.0000
 Max.   :1.0000   Max.   :1.0000   Max.   :1.00000         Max.   :1.0000   Max.   :1.0000   Max.   :1.0000

Fail:    14.8%                                            Fail:     4.1%
Unknown: 48.4%                                            Unknown: 48.4%

Processing with guessing according to method 2.

Without POS nodes                                         With POS nodes

time: (zardoz)                                            time:  (zardoz)
memory:                                                   memory:

POS by Alpino

                                                          With POS nodes

                                                          time: 2h09 (zardoz)
                                                          memory: 2.6 GB

                                                          OK only

                                                             Precision          Recall        Crossing brackets
                                                           Min.   :0.1771   Min.   :0.06195   Min.   :0.00000  
                                                           1st Qu.:0.7072   1st Qu.:0.69426   1st Qu.:0.00000  
                                                           Median :0.8242   Median :0.80851   Median :0.00000  
                                                           Mean   :0.8169   Mean   :0.79917   Mean   :0.02772  
                                                           3rd Qu.:1.0000   3rd Qu.:0.95276   3rd Qu.:0.04545  
                                                           Max.   :1.0000   Max.   :1.00000   Max.   :0.37037  

                                                          OK + FAIL + UNKNOWN

                                                             Precision          Recall       Crossing brackets
                                                           Min.   :0.0000   Min.   :0.0000   Min.   :0.00000  
                                                           1st Qu.:0.6610   1st Qu.:0.6500   1st Qu.:0.00000  
                                                           Median :0.8070   Median :0.7914   Median :0.01015  
                                                           Mean   :0.7450   Mean   :0.7288   Mean   :0.11328  
                                                           3rd Qu.:0.9822   3rd Qu.:0.9434   3rd Qu.:0.06178  
                                                           Max.   :1.0000   Max.   :1.0000   Max.   :1.00000  
                                                           
                                                          Fail:     8.8%


CategoryParsing