Word frequency lexicon for TASX corpora

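The following XQuery program creates a simple word frequency lexicon from a TASX corpus. It selects the layer whose "layer name" metadata entry has the value "words" and outputs, for each distinct word in that layer, its absolute and relative frequency.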

<frequencylexicon>{
(: Replace "filename.xml" with the path to the TASX file. :)
let $file := doc("filename.xml")

(: Select the layer whose metadata entry "layer name" has the value "words". :)
for $layermeta in $file//session/layer/meta/desc
where $layermeta/name="layer name" and $layermeta/val="words"
return
  (: All events (word tokens) of the selected layer. :)
  let $events := $layermeta/../../event
  for $word in distinct-values($events)
  order by $word
  return
    <lexentry>
      <word>
        {$word}
      </word>
      <absfrequency>
        {(: number of events equal to this word :)
         count($events[. = $word])}
      </absfrequency>
      <relfrequency>
        {(: absolute frequency divided by the total number of events :)
         count($events[. = $word]) div count($events)}
      </relfrequency>
    </lexentry>
}
</frequencylexicon>
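
To try the query without a corpus file, the same computation can be run on an inline fragment. The following is only a sketch: the element names are taken from the paths used in the query above, but the fragment itself, the root element name tasx, and the four sample words are invented for illustration, and any attributes that real TASX events may carry are left out. It also selects the events with a predicate on the layer instead of stepping back up with "../..", which is an alternative formulation rather than the original approach.

```xquery
(: Hypothetical miniature corpus; structure inferred from the query paths,
   content invented for illustration. :)
let $file :=
  <tasx>
    <session>
      <layer>
        <meta>
          <desc><name>layer name</name><val>words</val></desc>
        </meta>
        <event>aber</event>
        <event>ab</event>
        <event>aber</event>
        <event>Abend</event>
      </layer>
    </session>
  </tasx>

(: All events of the layer whose "layer name" entry has the value "words". :)
let $events :=
  $file//session/layer[meta/desc[name = "layer name" and val = "words"]]/event

return
  <frequencylexicon>{
    for $word in distinct-values($events)
    order by $word
    return
      <lexentry>
        <word>{$word}</word>
        <absfrequency>{count($events[. = $word])}</absfrequency>
        <relfrequency>{count($events[. = $word]) div count($events)}</relfrequency>
      </lexentry>
  }</frequencylexicon>
```

On this fragment the entry for "aber" has an absfrequency of 2 and a relfrequency of 0.5, and the other two words each have 1 and 0.25. The order of the entries depends on the default collation of the XQuery processor.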

The following is an excerpt from the lexicon that the query at the top of this section produces:

<lexentry>
   <word>ab</word>
   <absfrequency>2</absfrequency>
   <relfrequency>0.001845018450184502</relfrequency>
</lexentry>
<lexentry>
   <word>Abend</word>
   <absfrequency>1</absfrequency>
   <relfrequency>0.000922509225092251</relfrequency>
</lexentry>
<lexentry>
   <word>Abends</word>
   <absfrequency>1</absfrequency>
   <relfrequency>0.000922509225092251</relfrequency>
</lexentry>
<lexentry>
   <word>aber</word>
   <absfrequency>7</absfrequency>
   <relfrequency>0.006457564575645756</relfrequency>
</lexentry>
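
As a cross-check, each relative frequency in this excerpt is the absolute frequency divided by the total number of events N in the selected layer. The excerpt does not state N. The values shown are consistent with N = 1084, which is inferred here from the numbers and not taken from the corpus itself:

```latex
\[
\mathit{relfrequency}(w) = \frac{\mathit{absfrequency}(w)}{N}, \qquad
\frac{1}{1084} \approx 0.000922509, \quad
\frac{2}{1084} \approx 0.001845018, \quad
\frac{7}{1084} \approx 0.006457565
\]
```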

Thorsten Trippel 2006-11-18