Tasks:
/net/aistaff/kleiweg/namen → telling.out
/net/corpora/LassyLarge
/net/shared/vannoord/LassyLargeRestricted/TWNC
/net/corpora/paqu/wablief
//NP in depheads-in.go and enhanced-in.go
fixMisplacedHeadsInCoordination()
score2 and score3
And further, maybe…
$ find /net/repositories/git-svn/Alpino -size +20M | xargs ls -lh | fgrep -v /.git/
-rw-rw-r-- 1 p141988 aistaff  29M jun 30  2016 /net/repositories/git-svn/Alpino/Generation/fluency/bigrams.tpl
-rw-rw-r-- 1 p141988 aistaff 487M jun 30  2016 /net/repositories/git-svn/Alpino/Generation/fluency/trigrams.tpl
-rw-r--r-- 1 p141988 aistaff 156M aug 30  2017 /net/repositories/git-svn/Alpino/Grammar/corpus_frequency_features.fsa
-rw-r--r-- 1 p141988 aistaff  63M sep 11 14:17 /net/repositories/git-svn/Alpino/Tokenization/libtok.c
-rw-r--r-- 1 p141988 aistaff  38M sep 11 14:17 /net/repositories/git-svn/Alpino/Tokenization/libtok_no_breaks.c
Macros for recognizing names:
naam1 = """(
    (@ntype="eigen" and @pos="name")
    or
    (@cat="mwu" and node[@spectype="deeleigen"] and not(../node[@rel="det"]))
)"""
naam2 = """(
    (@neclass="PER" and not(@rel="mwp"))
    or
    (@cat="mwu" and node[@neclass="PER"])
)"""
naam2 seems to work better than naam1, but it relies on @neclass and cannot be applied to every corpus.
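To make the macro mechanism concrete, here is a minimal Python sketch of how `name = """…"""` definitions and `%name%` references could be expanded into a plain XPath expression. It is an illustration only, not PaQu's own implementation; the function names `parse_macros` and `expand` are hypothetical, and the sketch assumes that macro names consist of word characters and that a macro may refer to macros defined later in the text.

```python
import re

# Matches definitions of the form: name = """ body """
DEF_RE = re.compile(r'(\w+)\s*=\s*"""(.*?)"""', re.DOTALL)
# Matches macro references of the form %name%
REF_RE = re.compile(r'%(\w+)%')


def parse_macros(text):
    """Return a dict mapping macro names to their unexpanded bodies."""
    return {name: body.strip() for name, body in DEF_RE.findall(text)}


def expand(query, macros, max_depth=20):
    """Replace %name% references repeatedly until none are left.

    An undefined macro name raises KeyError; nesting deeper than
    max_depth (for example a cyclic definition) raises ValueError.
    """
    for _ in range(max_depth):
        expanded = REF_RE.sub(lambda m: macros[m.group(1)], query)
        if expanded == query:
            return expanded
        query = expanded
    raise ValueError("macro nesting too deep (possible cycle)")


# The naam2 macro from the notes, in the same definition syntax.
MACROS = '''
naam2 = """( (@neclass="PER" and not(@rel="mwp"))
          or (@cat="mwu" and node[@neclass="PER"]) )"""
'''

if __name__ == "__main__":
    macros = parse_macros(MACROS)
    print(expand('//node[%naam2%]', macros))
```

Expanding repeatedly, rather than in a single pass, is what lets one macro be built from others.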
Searching for something like Chris en zijn/haar… ("Chris and his/her…"):
//node[@cat="conj"]/node[
    %naam2% and
    number(@begin) < ../node/node[
        @pt="vnw" and @rel="det" and @lemma=("zijn","haar")
    ]/number(@begin)
]
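The logic of this query can also be written out by hand, which helps when checking what it does and does not match. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a small XPath subset, so the conditions are coded explicitly. The XML fragment is a hypothetical, hand-written example in Alpino style, not taken from a corpus, and the function names are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment for "Chris en zijn zus";
# only the attributes that the query looks at are included.
SAMPLE = """
<alpino_ds>
  <node cat="top" begin="0" end="4">
    <node cat="conj" rel="su" begin="0" end="4">
      <node rel="cnj" pt="n" neclass="PER" word="Chris" begin="0" end="1"/>
      <node rel="crd" pt="vg" word="en" begin="1" end="2"/>
      <node rel="cnj" cat="np" begin="2" end="4">
        <node rel="det" pt="vnw" lemma="zijn" word="zijn" begin="2" end="3"/>
        <node rel="hd" pt="n" lemma="zus" word="zus" begin="3" end="4"/>
      </node>
    </node>
  </node>
</alpino_ds>
"""


def is_naam2(node):
    """Mirror of the naam2 macro."""
    if node.get("neclass") == "PER" and node.get("rel") != "mwp":
        return True
    return node.get("cat") == "mwu" and any(
        child.get("neclass") == "PER" for child in node.findall("node"))


def is_possessive_det(node):
    """A pronoun used as determiner with lemma zijn or haar."""
    return (node.get("pt") == "vnw" and node.get("rel") == "det"
            and node.get("lemma") in ("zijn", "haar"))


def names_before_possessive(root):
    """Return name conjuncts that start before a zijn/haar determiner
    found among the grandchildren of the same conj node."""
    hits = []
    for conj in root.iter("node"):
        if conj.get("cat") != "conj":
            continue
        # Corresponds to ../node/node[...]/number(@begin) in the query.
        det_begins = [int(det.get("begin"))
                      for sibling in conj.findall("node")
                      for det in sibling.findall("node")
                      if is_possessive_det(det)]
        for child in conj.findall("node"):
            if is_naam2(child) and any(
                    int(child.get("begin")) < b for b in det_begins):
                hits.append(child)
    return hits


if __name__ == "__main__":
    for hit in names_before_possessive(ET.fromstring(SAMPLE)):
        print(hit.get("word"))
```

As in the XPath version, the comparison succeeds as soon as any matching determiner starts after the name, and the determiner must sit exactly one level below a conjunct.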
Searching for something like Chris pakt zijn/haar… ("Chris takes his/her…"):
GJ_i = """ number(@index) """

GJ_name_per = """
( %GJ_single_name_per%
  or
  %GJ_multi_name_per%
) """

GJ_single_name_per = """
( ( @ntype = 'eigen'
    or
    @postag='SPEC(deeleigen)'
  )
  and
  not(@neclass and not(@neclass="PER"))
) """

GJ_multi_name_per = """
( @cat='mwu'
  and
  node[@rel='mwp'
       and
       %GJ_single_name_per%
      ]
) """

GJ_name_subject = """
( ( @rel="su" and %GJ_name_per% )
  or
  ( @rel="hd" and %GJ_name_per% and ../@rel="su")
) """

GJ_direct_haar_object = """
( ../node[@rel="obj1" and node[@rel="det" and @lemma="haar"]]
)
"""

GJ_coindex_haar_object = """
( %GJ_i% = //node[@rel="su"
                  and %GJ_direct_haar_object%
                 ]/%GJ_i%
) """

GJ_direct_zijn_object = """
( ../node[@rel="obj1" and node[@rel="det" and @lemma="zijn"]]
)
"""

GJ_coindex_zijn_object = """
( %GJ_i% = //node[@rel="su"
                  and %GJ_direct_zijn_object%
                 ]/%GJ_i%
) """

GJ_haar_object = """
( %GJ_direct_haar_object%
  or
  %GJ_coindex_haar_object%
) """

GJ_zijn_object = """
( %GJ_direct_zijn_object%
  or
  %GJ_coindex_zijn_object%
) """

GJ_fem_subject = """
//node[ %GJ_name_subject%
        and
        %GJ_haar_object%
      ] """

GJ_masc_subject = """
//node[ %GJ_name_subject%
        and
        %GJ_zijn_object%
      ] """

%GJ_fem_subject%
%GJ_masc_subject%
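The haar and zijn macros above differ only in the lemma they test. As a possible way to keep the two sets in sync, the following Python sketch generates the three object macros for a given lemma from one template. The template text is copied from the definitions above; the function name `possessive_macros` is hypothetical, and generating macros this way is a suggestion rather than something the notes themselves do.

```python
# One template for the three object macros; {lemma} is filled in per pronoun.
TEMPLATE = '''GJ_direct_{lemma}_object = """
( ../node[@rel="obj1" and node[@rel="det" and @lemma="{lemma}"]]
)
"""
GJ_coindex_{lemma}_object = """
( %GJ_i% = //node[@rel="su"
and %GJ_direct_{lemma}_object%
]/%GJ_i%
) """
GJ_{lemma}_object = """
( %GJ_direct_{lemma}_object%
or
%GJ_coindex_{lemma}_object%
) """
'''


def possessive_macros(lemma):
    """Return the macro definitions for one possessive lemma."""
    return TEMPLATE.format(lemma=lemma)


if __name__ == "__main__":
    for lemma in ("haar", "zijn"):
        print(possessive_macros(lemma))
```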