Work meeting with GvN.

Tasks:

  1. EarleyParser

    1. Display programs
      • Crash when opening a window with the same ID. Done.

      • Search for difference categories with/without attributes. Done.

      • Search: top, bottom, both. Done.

      • Mouse wheel. Done.

    2. Parsing with POS tags from Alpino.
      • Probability for a word group of length n: 0.1^(n-1). Done, see below.

  2. InformationRetrieval

    • Look through the first nine chapters of the book. Done.
      Exercise material added on InformationRetrieval.

      • The material in chapters 1 through 3 lends itself well to exercises with MongoDB, but not at the very lowest level described in the book.
      • Chapters 4 and 5 describe techniques over which a MongoDB user has no control.
      • The content of chapters 6 through 8 could also be practiced with MongoDB.
      • As for chapter 9, at first glance it seems hard to practice in a short time. MongoDB could be used, though.
    • MongoDB
      • mongod does not work on karora: for unclear reasons it has no permission to create files in the home directory. It does work on zardoz. What about the machine used by the students?


Work for WvdM:

  1. Adapt the registration system and database to the new group structure of CLCG. Done.


Re 1.2: Probability for a word group of length n

Tested on a grammar built from 9 parts.

Tested on 374 sentences for which the POS tagger offered word groups as an option.

New errors appear even at very low values. Sometimes things change that seem to have nothing to do with word groups.

Some old errors disappear only at very high values; some do not disappear at all.
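
The weighting that was varied can be sketched as follows. This is only an illustration, not the parser's code: the function names and the `factor` parameter are hypothetical, and the logarithm is assumed to be base 10, so that factor 1 matches the 0.1^(n-1) from task 1.2 and factor 0 means no penalty.

```python
def word_group_cost(length: int, factor: float = 1.0) -> float:
    """Return -log10(p) for a word group of `length` words.

    Assumption: base-10 logarithm, so factor 1 gives p = 0.1^(length - 1).
    """
    if length < 1:
        raise ValueError("a word group has at least one word")
    return (length - 1) * factor


def word_group_probability(length: int, factor: float = 1.0) -> float:
    """Return p itself, computed from the cost above."""
    return 10.0 ** -word_group_cost(length, factor)


if __name__ == "__main__":
    # Costs for a three-word group at the factors used in the runs.
    for factor in (0, 1, 5, 10, 50, 100):
        print(factor, word_group_cost(3, factor))
```

Working with the cost rather than with p keeps the numbers manageable: at factor 100 a five-word group has cost 400, and a probability of 10^-400 is too small to represent as an ordinary floating-point number.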

cd /net/aistaff/kleiweg/Earley/2013-08-14

# -log(p) = len - 1
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_001.parse

# -log(p) = (len - 1) * 5
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_005.parse

# -log(p) = (len - 1) * 10
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_010.parse

# -log(p) = (len - 1) * 50
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_050.parse

# -log(p) = (len - 1) * 100
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_100.parse

0:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4355   1st Qu.:0.43070   1st Qu.:0.5897   1st Qu.:0.57468   1st Qu.:0.00000  
 Median :0.5455   Median :0.54258   Median :0.6859   Median :0.68860   Median :0.03226  
 Mean   :0.5446   Mean   :0.53697   Mean   :0.6628   Mean   :0.64832   Mean   :0.05365  
 3rd Qu.:0.6595   3rd Qu.:0.66549   3rd Qu.:0.7500   3rd Qu.:0.75361   3rd Qu.:0.08333  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

1:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4355   1st Qu.:0.42807   1st Qu.:0.5897   1st Qu.:0.57310   1st Qu.:0.00000  
 Median :0.5455   Median :0.54258   Median :0.6850   Median :0.68819   Median :0.03226  
 Mean   :0.5447   Mean   :0.53743   Mean   :0.6624   Mean   :0.64835   Mean   :0.05369  
 3rd Qu.:0.6595   3rd Qu.:0.66667   3rd Qu.:0.7500   3rd Qu.:0.75373   3rd Qu.:0.08333  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

5:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4377   1st Qu.:0.43591   1st Qu.:0.5845   1st Qu.:0.58380   1st Qu.:0.00000  
 Median :0.5455   Median :0.54505   Median :0.6812   Median :0.68860   Median :0.03525  
 Mean   :0.5449   Mean   :0.54319   Mean   :0.6580   Mean   :0.65348   Mean   :0.05429  
 3rd Qu.:0.6527   3rd Qu.:0.66549   3rd Qu.:0.7477   3rd Qu.:0.75422   3rd Qu.:0.08333  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

10:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4318   1st Qu.:0.43732   1st Qu.:0.5786   1st Qu.:0.58333   1st Qu.:0.00000  
 Median :0.5413   Median :0.54505   Median :0.6788   Median :0.68750   Median :0.03604  
 Mean   :0.5417   Mean   :0.54253   Mean   :0.6538   Mean   :0.65218   Mean   :0.05455  
 3rd Qu.:0.6497   3rd Qu.:0.66334   3rd Qu.:0.7462   3rd Qu.:0.75325   3rd Qu.:0.08466  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

50:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4271   1st Qu.:0.43548   1st Qu.:0.5762   1st Qu.:0.58046   1st Qu.:0.00000  
 Median :0.5377   Median :0.54387   Median :0.6802   Median :0.68586   Median :0.03604  
 Mean   :0.5398   Mean   :0.54097   Mean   :0.6524   Mean   :0.65121   Mean   :0.05501  
 3rd Qu.:0.6497   3rd Qu.:0.66334   3rd Qu.:0.7460   3rd Qu.:0.75361   3rd Qu.:0.08511  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

100:

   Precision+        Recall+          Precision-        Recall-       Crossing brackets
 Min.   :0.1957   Min.   :0.05455   Min.   :0.2222   Min.   :0.1636   Min.   :0.00000  
 1st Qu.:0.4297   1st Qu.:0.43550   1st Qu.:0.5738   1st Qu.:0.5816   1st Qu.:0.00000  
 Median :0.5400   Median :0.54505   Median :0.6795   Median :0.6870   Median :0.03822  
 Mean   :0.5415   Mean   :0.54388   Mean   :0.6525   Mean   :0.6541   Mean   :0.05544  
 3rd Qu.:0.6497   3rd Qu.:0.66334   3rd Qu.:0.7460   3rd Qu.:0.7542   3rd Qu.:0.08636  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.8889   Max.   :0.34483  

With a minimum of four words

cd /net/aistaff/kleiweg/Earley/2013-08-14

# -log(p) = len - 1
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_001_4.parse

# -log(p) = (len - 1) * 5
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_005_4.parse

# -log(p) = (len - 1) * 10
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_010_4.parse

# -log(p) = (len - 1) * 50
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_050_4.parse

# -log(p) = (len - 1) * 100
../pairsview clef_part0001_multi_000.parse clef_part0001_multi_100_4.parse

0:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4355   1st Qu.:0.43070   1st Qu.:0.5897   1st Qu.:0.57468   1st Qu.:0.00000  
 Median :0.5455   Median :0.54258   Median :0.6859   Median :0.68860   Median :0.03226  
 Mean   :0.5446   Mean   :0.53697   Mean   :0.6628   Mean   :0.64832   Mean   :0.05365  
 3rd Qu.:0.6595   3rd Qu.:0.66549   3rd Qu.:0.7500   3rd Qu.:0.75361   3rd Qu.:0.08333  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

1:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4355   1st Qu.:0.43070   1st Qu.:0.5897   1st Qu.:0.57468   1st Qu.:0.00000  
 Median :0.5455   Median :0.54258   Median :0.6850   Median :0.68819   Median :0.03226  
 Mean   :0.5447   Mean   :0.53723   Mean   :0.6628   Mean   :0.64841   Mean   :0.05370  
 3rd Qu.:0.6595   3rd Qu.:0.66549   3rd Qu.:0.7500   3rd Qu.:0.75401   3rd Qu.:0.08333  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

5:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4383   1st Qu.:0.43898   1st Qu.:0.5862   1st Qu.:0.58333   1st Qu.:0.00000  
 Median :0.5515   Median :0.54545   Median :0.6868   Median :0.68913   Median :0.03571  
 Mean   :0.5491   Mean   :0.54437   Mean   :0.6629   Mean   :0.65471   Mean   :0.05434  
 3rd Qu.:0.6615   3rd Qu.:0.66667   3rd Qu.:0.7500   3rd Qu.:0.75532   3rd Qu.:0.08451  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

10:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4383   1st Qu.:0.44094   1st Qu.:0.5858   1st Qu.:0.58380   1st Qu.:0.00000  
 Median :0.5515   Median :0.54545   Median :0.6868   Median :0.68913   Median :0.03571  
 Mean   :0.5492   Mean   :0.54477   Mean   :0.6628   Mean   :0.65506   Mean   :0.05437  
 3rd Qu.:0.6615   3rd Qu.:0.66667   3rd Qu.:0.7500   3rd Qu.:0.75532   3rd Qu.:0.08451  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

50:

   Precision+        Recall+          Precision-        Recall-        Crossing brackets
 Min.   :0.1667   Min.   :0.03774   Min.   :0.2222   Min.   :0.07547   Min.   :0.00000  
 1st Qu.:0.4383   1st Qu.:0.44094   1st Qu.:0.5858   1st Qu.:0.58380   1st Qu.:0.00000  
 Median :0.5515   Median :0.54545   Median :0.6868   Median :0.68913   Median :0.03571  
 Mean   :0.5492   Mean   :0.54480   Mean   :0.6628   Mean   :0.65509   Mean   :0.05434  
 3rd Qu.:0.6615   3rd Qu.:0.66667   3rd Qu.:0.7500   3rd Qu.:0.75532   3rd Qu.:0.08451  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.88889   Max.   :0.34483  

100:

   Precision+        Recall+          Precision-        Recall-       Crossing brackets
 Min.   :0.1957   Min.   :0.05455   Min.   :0.2222   Min.   :0.1636   Min.   :0.00000  
 1st Qu.:0.4387   1st Qu.:0.44094   1st Qu.:0.5831   1st Qu.:0.5838   1st Qu.:0.00000  
 Median :0.5515   Median :0.54545   Median :0.6868   Median :0.6891   Median :0.03693  
 Mean   :0.5503   Mean   :0.54707   Mean   :0.6624   Mean   :0.6574   Mean   :0.05543  
 3rd Qu.:0.6615   3rd Qu.:0.66667   3rd Qu.:0.7500   3rd Qu.:0.7553   3rd Qu.:0.08571  
 Max.   :0.8889   Max.   :0.88889   Max.   :0.8889   Max.   :0.8889   Max.   :0.34483  
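
To make the trend easier to read, the Mean rows for Precision+ can be put side by side. A minimal sketch: the numbers are copied from the tables above, while the variable and function names are only illustrative.

```python
# Mean Precision+ per factor, copied from the "Mean" rows of the tables.
FACTORS = [0, 1, 5, 10, 50, 100]
MEAN_PRECISION_PLUS = {
    "no minimum": [0.5446, 0.5447, 0.5449, 0.5417, 0.5398, 0.5415],
    "minimum four words": [0.5446, 0.5447, 0.5491, 0.5492, 0.5492, 0.5503],
}


def deltas_from_baseline(values):
    """Difference of each value from the first (factor 0) value."""
    base = values[0]
    return [round(v - base, 4) for v in values]


if __name__ == "__main__":
    for setting, values in MEAN_PRECISION_PLUS.items():
        print(setting)
        for factor, delta in zip(FACTORS, deltas_from_baseline(values)):
            print(f"  factor {factor:>3}: {delta:+.4f}")
```

For these numbers the mean Precision+ moves by less than 0.006 in either direction: slightly down from factor 10 onward without a minimum, slightly up from factor 5 onward with a minimum of four words.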


CategoryParsing CategoryInformationRetrieval CategoryClcg