Word frequency lexicon for TASX corpora
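The following XQuery program creates a simple word frequency lexicon from a TASX corpus. For every distinct word on the word layer it reports the absolute frequency (the number of occurrences) and the relative frequency (that number divided by the total number of word events on the layer).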
<frequencylexicon>{
  (: "filename.xml" is a placeholder for the TASX corpus file :)
  let $file := doc("filename.xml")
  (: find the metadata entry that marks a layer as the word layer :)
  for $layermeta in $file//session/layer/meta/desc
  where $layermeta/name = "layer name" and $layermeta/val = "words"
  return
    (: all events of that layer, i.e. the word tokens :)
    let $events := $layermeta/../../event
    for $word in distinct-values($events)
    order by $word
    return
      <lexentry>
        <word>{$word}</word>
        <absfrequency>{count($events[. = $word])}</absfrequency>
        <relfrequency>{count($events[. = $word]) div count($events)}</relfrequency>
      </lexentry>
}</frequencylexicon>
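As a cross-check on what the query computes, the same counting can be sketched in Python rather than XQuery. This is only an illustration under stated assumptions: the miniature corpus `SAMPLE` is made up, and only the element names that appear in the query's path expressions (`session`, `layer`, `meta`, `desc`, `name`, `val`, `event`) are taken from the text. The function name `frequency_lexicon` and the case-insensitive sort are likewise choices made for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature corpus. Only the element names come from the
# query's path expressions; the content is invented for illustration.
SAMPLE = """
<session>
  <layer>
    <meta><desc><name>layer name</name><val>words</val></desc></meta>
    <event>aber</event>
    <event>ab</event>
    <event>aber</event>
    <event>Abend</event>
  </layer>
  <layer>
    <meta><desc><name>layer name</name><val>phones</val></desc></meta>
    <event>a</event>
  </layer>
</session>
"""


def frequency_lexicon(root):
    """Return (word, absolute frequency, relative frequency) per word layer."""
    entries = []
    for session in root.iter("session"):
        for layer in session.findall("layer"):
            # Same condition as the query's where clause.
            is_word_layer = any(
                desc.findtext("name") == "layer name"
                and desc.findtext("val") == "words"
                for desc in layer.findall("meta/desc")
            )
            if not is_word_layer:
                continue
            words = [event.text or "" for event in layer.findall("event")]
            counts = Counter(words)
            total = len(words)
            # Assumption: sort case-insensitively; the collation used by
            # the XQuery processor is not stated in the text.
            for word in sorted(counts, key=lambda w: (w.casefold(), w)):
                entries.append((word, counts[word], counts[word] / total))
    return entries


lexicon = frequency_lexicon(ET.fromstring(SAMPLE))
for word, absolute, relative in lexicon:
    print(word, absolute, relative)
```

In this sample the word layer holds four events, so "aber" (two occurrences) gets a relative frequency of 0.5 and the other two words get 0.25 each; the second layer is skipped because its `val` is not "words".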
The following is an excerpt from the lexicon produced by the XQuery program:
<lexentry>
<word>ab</word>
<absfrequency>2</absfrequency>
<relfrequency>0.001845018450184502</relfrequency>
</lexentry>
<lexentry>
<word>Abend</word>
<absfrequency>1</absfrequency>
<relfrequency>0.000922509225092251</relfrequency>
</lexentry>
<lexentry>
<word>Abends</word>
<absfrequency>1</absfrequency>
<relfrequency>0.000922509225092251</relfrequency>
</lexentry>
<lexentry>
<word>aber</word>
<absfrequency>7</absfrequency>
<relfrequency>0.006457564575645756</relfrequency>
</lexentry>
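Because the relative frequency is the absolute frequency divided by the number of word events, each entry in the excerpt implies a corpus size. The short Python check below (Python rather than XQuery, purely for the arithmetic) recovers that size from the four entries shown. The figure it yields is an inference from the printed values, not a number stated in the text, and the helper name `implied_total` is hypothetical.

```python
# (word, absolute frequency, relative frequency) copied from the excerpt.
EXCERPT = [
    ("ab", 2, 0.001845018450184502),
    ("Abend", 1, 0.000922509225092251),
    ("Abends", 1, 0.000922509225092251),
    ("aber", 7, 0.006457564575645756),
]


def implied_total(absolute, relative):
    """Number of word events implied by one lexicon entry."""
    # relative = absolute / total, hence total = absolute / relative.
    return round(absolute / relative)


# Every entry should imply the same total if the excerpt is consistent.
totals = {implied_total(absolute, relative) for _, absolute, relative in EXCERPT}
print(totals)  # prints {1084}
```

All four entries agree on 1084 word events, which is consistent with a single shared denominator in the query's `relfrequency` expression.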