Word frequency lexicon for TASX corpora
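The following XQuery program creates a simple word frequency lexicon from a TASX corpus. It selects the layer whose metadata identifies it as the word layer, then lists each distinct word in that layer together with its absolute and relative frequency.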
<frequencylexicon>{
  (: Load the TASX corpus file. :)
  let $file := doc("filename.xml")
  (: Select the metadata entry "layer name" whose value is "words". :)
  for $layermeta in $file//session/layer/meta/desc
  where $layermeta/name="layer name" and $layermeta/val="words"
  return
    (: $layermeta/../.. is the enclosing layer; its events hold the words. :)
    for $word in distinct-values($layermeta/../../event)
    order by $word
    return
      <lexentry>
        <word> {$word} </word>
        <absfrequency> {count(
          for $frequencywords in $layermeta/../../event
          where $frequencywords=$word
          return $word)
        } </absfrequency>
        <relfrequency> {count(
          for $frequencywords in $layermeta/../../event
          where $frequencywords=$word
          return $word
        ) div count(
          for $frequencywords in $layermeta/../../event
          return $word)
        } </relfrequency>
      </lexentry>
} </frequencylexicon>
The following is a part of the resulting lexicon:
<lexentry>
  <word>ab</word>
  <absfrequency>2</absfrequency>
  <relfrequency>0.001845018450184502</relfrequency>
</lexentry>
<lexentry>
  <word>Abend</word>
  <absfrequency>1</absfrequency>
  <relfrequency>0.000922509225092251</relfrequency>
</lexentry>
<lexentry>
  <word>Abends</word>
  <absfrequency>1</absfrequency>
  <relfrequency>0.000922509225092251</relfrequency>
</lexentry>
<lexentry>
  <word>aber</word>
  <absfrequency>7</absfrequency>
  <relfrequency>0.006457564575645756</relfrequency>
</lexentry>
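The query above recounts the events of the layer for every distinct word. As a possible alternative, the sketch below computes the total once and lets a "group by" clause collect the occurrences of each word. It assumes a processor that supports XQuery 3.0 or later; the variable names $events and $total are introduced here for illustration and are not part of the original program. Unlike the original, it merges the events of all matching layers into one lexicon if more than one layer is marked as "words".

```xquery
xquery version "3.0";

<frequencylexicon>{
  (: Assumed: same file name and layer metadata as in the query above. :)
  let $events := doc("filename.xml")//session/layer
                   [meta/desc[name = "layer name" and val = "words"]]/event
  (: Total number of word events, computed once. :)
  let $total := count($events)
  return
    (: The nested FLWOR keeps $total out of the grouped tuple stream;
       after "group by", $event holds all events that share one $word. :)
    for $event in $events
    group by $word := string($event)
    order by $word
    return
      <lexentry>
        <word>{$word}</word>
        <absfrequency>{count($event)}</absfrequency>
        <relfrequency>{count($event) div $total}</relfrequency>
      </lexentry>
}</frequencylexicon>
```

In the excerpt above, every relative frequency equals the absolute frequency divided by 1084 (for example, 2 div 1084 is approximately 0.001845), so $total would presumably be 1084 for that corpus. The order of the entries depends on the default collation of the processor, as it does in the original query.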