dt_search

Usage: ./dt_search [Options] Query Files

Search Files for trees matching Query.
Query uses XPath syntax, see http://www.w3.org/TR/xpath

Options:
  -l Label : print stats for values of Label in matching nodes
  -s       : show sentences with matching phrases
  -q       : do not show file names of matching files
  -h       : print this message

Examples:

dt_search -l rel '//node[@cat="pp"]' cgn_exs/*.xml
....
     54 mod
     25 pc
     13 ld
      7 top
      3 predc
      3 obj2
      2 cnj
      1 sat
      1 predm
      1 obj1
      1 dp
    111 rel values total

dt_search -l rel '//node[@cat="pp" and ./node[@word="als"]]' cgn_exs/*.xml
      2 mod
      1 predm
      3 rel values total

dt_search -s '//node[ ./node[@rel="su"]/@index = ./node[@rel="vc"]/node[@rel="obj1"]/@index ]' cgn_exs/*.xml

(all passives, i.e. phrases whose su daughter has the same index as the obj1 daughter of their vc daughter)

...
cgn_exs/109.xml [met Haider in de regering zijn we verloren]
cgn_exs/110.xml [met de helft van de ploeg dronken zijn we verloren]
cgn_exs/146.xml het interventiebureau houdt bij de verkoop van het magere-melkpoeder rekening met de datum waarop [het produkt is ingeslagen] en slaat van de totale beschikbare hoeveelheid of , in voorkomend geval , van de in de door de marktdeelnemer aangewezen opslagplaats of opslagplaatsen beschikbare hoeveelheid telkens eerst de oudste produkten uit .
cgn_exs/179.xml [de dagelijkse activiteit van een auteur , stukjes schrijven , wordt niet altijd gewaardeerd] .
cgn_exs/180.xml [die man daar , tot voor kort gymnasiumleraar hier ter stede , is onlangs tot hoogleraar benoemd] .
cgn_exs/190.xml [het kleine kinderen snoep geven wordt afgeraden]
cgn_exs/5.xml [bij een huwelijk was het vroeger gemakkelijk gezegd] : tot de dood hon ons scheidt hè .
cgn_exs/68.xml [Wim echter was allang ingevroren] .

Further examples that may come in handy:

pp's whose obj1 is an advp that ends before the head begins:

./dt_search -s '//node[@cat="pp" and ./node[@rel="obj1" and @cat="advp"]/@end < ./node[@rel="hd"]/@begin]' */*.xml

Sometimes it is useful to get only the matched portion.
In that case, use something like:

./dt_search -s '//node[@cat="detp" and ./node[@cat="advp"]]' */*.xml |\
    sed -e 's/.*\[\([^]]*\)\].*/\1/'

Which rels occur as sisters of 'tag'?

./dt_search -l rel '//node[ ../node[ @rel="tag"]]' */*.xml

(expected: nucl and tag)

Similarly for:
  nucl  (expected: nucl, tag, sat)
  sat   (expected: nucl, sat)
  dp    (expected: dp)
  body  (expected: body, cmp, rhd, whd)
  rhd   (expected: body, rhd)
  whd   (expected: body, whd)
  top   (expected: top)

To find the unexpected cases, e.g. a pobj1 relation as sister of dp:

./dt_search -s '//node[ @rel="pobj1" and ../node[ @rel="dp"]]' */*.xml

Find np's without a head:

./dtv '//node[@cat="np" and ./node and not(./node[@rel="hd"])]' */*.xml

Likewise for advp, ap, pp, ti, smain, ssub, sv1 and detp.

Find oti's without a complementizer (cmp):

./dtv '//node[@cat="oti" and ./node and not(./node[@rel="cmp"])]' */*.xml

Likewise for cp.

Show all headless nodes:

./dt_search -l rel '//node[ ./node and not(./node[@rel="hd"])]' */*.xml

Which inf nodes have no head?

./dtv '//node[ @cat="inf" and ./node and not(./node[@rel="hd"])]' */*.xml

Which daughters of whq nodes are neither whd nor body?

./dtv '//node[ ../@cat="whq" and not(@rel="whd") and not(@rel="body")]' */*.xml

What kinds of daughters do oti's have?

./dt_search -l rel '//node[ ../@cat="oti" ]' */*.xml

What kinds of daughters do conjunctions have? (Similarly for 'du'.)

./dt_search2 -l rel '//node[ ../@cat="conj" ]'

Display all unexpected du daughters:

./dtv2 '//node[ ../@cat="du" and not(@rel="dp") and not(@rel="sat") and not(@rel="nucl") and not(@rel="tag")]'

Check whether a rel occurs more than once (hd, hdf, rhd, whd, obj1, obj2, pobj1 ...):

./dtv2 '//node[count( ./node[@rel="hd"]) > 1]'

## which heads occur for which categories?
xsltproc stylesheets/child-info.xsl */*.xml | grep childrel=hd | awk '{ print $1,$4 }' | sort | uniq -c | sort -nr | less

## which rels occur for which categories?
xsltproc stylesheets/child-info.xsl */*.xml | awk '{ print $1,$3 }' | sort | uniq -c | sort -nr | less

## which cats occur for which rel?
xsltproc stylesheets/child-info.xsl */*.xml | awk '{ print $1,$2 }' | sort | uniq -c | sort -nr | less

## or:
xsltproc stylesheets/child-info.xsl */*.xml | awk '{ print $3,$4 }' | sort | uniq -c | sort -nr | less
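The sed filter in the matched-portion example depends on `dt_search -s` putting the matching phrase in square brackets. It keeps the text between the last '[' on a line and the following ']'. Assuming each match is printed on a single line, as in the example output above, the substitution can be tried without running dt_search. The function name `matched_portion` is hypothetical:

```shell
#!/bin/sh
# matched_portion  (hypothetical helper, not part of the tools)
# Print only the bracketed (matched) part of each input line.
# Assumes one [ ... ] match per line; lines without brackets pass through unchanged.
matched_portion() {
  sed -e 's/.*\[\([^]]*\)\].*/\1/'
}

# Try it on two lines copied from the -s example output above.
printf '%s\n' \
  'cgn_exs/68.xml [Wim echter was allang ingevroren] .' \
  'cgn_exs/109.xml [met Haider in de regering zijn we verloren]' |
  matched_portion
```

With the function defined, the dt_search pipeline can end in `| matched_portion` instead of the inline sed call.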
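The missing-head and missing-complementizer checks use the same query shape for several categories. A small loop can generate those queries. The function `missing_daughter` and the `DTV` override below are hypothetical additions, not part of the tools. The sketch assumes `./dtv` takes a query followed by files, as in the examples above. Setting `DTV=echo` gives a dry run that only prints the commands:

```shell
#!/bin/sh
# missing_daughter REL CAT...  (hypothetical helper, not part of the tools)
# For each CAT, show nodes of that category that have daughters
# but no daughter with relation REL.
# DTV defaults to ./dtv; set DTV=echo to print the commands instead.
missing_daughter() {
  rel=$1
  shift
  for c in "$@"; do
    "${DTV:-./dtv}" "//node[@cat=\"$c\" and ./node and not(./node[@rel=\"$rel\"])]" */*.xml
  done
}
```

For example, `missing_daughter hd np advp ap pp ti smain ssub sv1 detp` covers the missing-head checks, and `missing_daughter cmp oti cp` covers the complementizer checks.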
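The check for relations that occur more than once lists several rels but shows the query only for hd. The loop below builds one query per relation given on the command line. It assumes `./dtv2` takes only a query, as in that example. The function name `repeated_rel` and the `DTV2` override are hypothetical:

```shell
#!/bin/sh
# repeated_rel REL...  (hypothetical helper, not part of the tools)
# For each REL, look for nodes with more than one daughter bearing that relation.
# DTV2 defaults to ./dtv2; set DTV2=echo to print the queries instead.
repeated_rel() {
  for r in "$@"; do
    "${DTV2:-./dtv2}" "//node[count(./node[@rel=\"$r\"]) > 1]"
  done
}
```

For example: `repeated_rel hd hdf rhd whd obj1 obj2 pobj1`. Any further relations can be added to the argument list.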