This zip file contains two files:

  kale-p-*.csv
  kale-q-*.csv

where * is "u" if the file is utf-8 encoded, or "w" if it is 
shift-jis encoded.  The xxx-u.csv files are packaged together 
in file kale-u.zip, and the xxx-w.csv files in kale-w.zip.
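
As an illustration of the naming scheme above, here is a minimal Python
sketch that picks a text encoding from the file name.  The function
name encoding_for is hypothetical, and the choice of Python's
"shift_jis" codec for the "w" files is an assumption; if a file fails
to decode, "cp932" may be worth trying instead.

```python
import re

def encoding_for(filename):
    """Return the Python codec name implied by the -u / -w file suffix."""
    m = re.search(r"-([uw])\.csv$", filename)
    if m is None:
        raise ValueError("unrecognized file name: " + filename)
    # "u" = utf-8, "w" = shift-jis, per the description above.
    return {"u": "utf-8", "w": "shift_jis"}[m.group(1)]
```

For example, encoding_for("kale-p-u.csv") gives "utf-8", which can be
passed as the encoding argument of open().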

The zip files are available at http://www.edrdg.org/~smg/.

The csv files combine the Google page count data generated 
for JMdict reading and kanji text strings by Kale Stutzman(*) 
with "P" tag information for the strings extracted from JMdict.
This allows comparison of these two frequency-of-use metrics.
All JMdict data is from the 2007-01-14 version of 
ftp://ftp.cc.monash.edu.au/pub/nihongo/JMdict.gz

kale-p-*.csv 
  Gives all the text strings and page counts in Kale's data,
  matches them to JMdict entries, and indicates whether or not
  the string is marked P in the entry.
  See below for detailed description of format.

kale-q-*.csv
  For all reading and kanji text strings in JMdict that are
  marked with a P tag in edict/wwwjdict, gives the Google
  page counts from Kale's data, with a visual flag if the
  page count value is less than 1M.
  See below for detailed description of format.

(*) Generated around 2007-01-14 by Kale Stutzman.  See email
to the edict-JMdict mailing list at:
  http://tech.groups.yahoo.com/group/edict-JMdict/message/1076
Kale's data file is (at time of writing) at:
  http://www.samuraifight.com/edict-gfreq.txt

General Notes:
I have not checked these results much yet.  There may be
goofs, possibly major.

The files were created using queries on the experimental 
JMdict database which was loaded from the full JMdict file 
rather than the English-only version.  Hence the glosses 
are rather long and contain non-English characters.

These files may be updated in the future based on feedback
from the edict-JMdict mailing list.

============================================
kale-p-*.csv
============================================
The data is arranged in CSV (comma separated value) format as 
a table with 7 columns: 

  txt -- This is a text string from Kale's file.  The original 
        file had multiple search results on a single line, and 
        many search strings occurred multiple times in the file.
        I coalesced all occurrences of the same search string 
        into a single one by averaging the hit counts for all 
        occurrences of the same string.

  hits -- Hit count for "txt".  As mentioned above, if there 
        were multiple occurrences of the text in Kale's file,
        this value is the average of all of them.

  seq -- The sequence number of the JMdict entry with a reading
        or kanji that matched "txt".

  case -- Is "P" if the reading or kanji that matches "txt" 
        would have a P tag in edict/wwwjdict.  Blank otherwise.

  kanji, 
  rdng, 
  gloss -- These columns give a summary of the JMdict entry.
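
The averaging described for "txt" and "hits" can be sketched in Python
as follows.  This is only an illustration, not the query actually used:
the function name coalesce is hypothetical, the input is assumed to be
already parsed into (txt, hits) pairs, and whether the published
averages were rounded is not stated here.

```python
from collections import defaultdict

def coalesce(pairs):
    """Average the hit counts of repeated search strings.

    pairs: iterable of (txt, hits) tuples, possibly with repeated txt.
    Returns a dict mapping each distinct txt to its mean hit count.
    """
    totals = defaultdict(lambda: [0, 0])  # txt -> [sum of hits, occurrences]
    for txt, hits in pairs:
        totals[txt][0] += hits
        totals[txt][1] += 1
    return {txt: total / n for txt, (total, n) in totals.items()}
```

With the made-up input [("a", 10), ("b", 4), ("a", 20)] this returns
{"a": 15.0, "b": 4.0}.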

Notes:
Because a "txt" string may match readings or kanji in multiple
JMdict entries, it will be repeated on a separate line for each 
JMdict entry matched.

Records are ordered by descending hit counts.  If you can
get the data into Excel or similar, you can re-sort however
you wish.
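
If a spreadsheet is not handy, the re-sorting can also be done with
Python's csv module.  The sketch below makes two assumptions that
should be checked against the actual file: that there is no header
row, and that the columns appear in the order listed above.  The
names P_COLUMNS, read_rows and sort_by_seq are hypothetical.

```python
import csv

# Column order as listed in this section (assumed to match the file).
P_COLUMNS = ["txt", "hits", "seq", "case", "kanji", "rdng", "gloss"]

def read_rows(stream):
    """Read kale-p style rows from an open text stream into dicts."""
    return [dict(zip(P_COLUMNS, row)) for row in csv.reader(stream)]

def sort_by_seq(rows):
    """Order rows by entry seq number, then by descending hit count."""
    return sorted(rows, key=lambda r: (int(r["seq"]), -float(r["hits"])))
```

Typical use would be something like:
  with open("kale-p-u.csv", encoding="utf-8", newline="") as f:
      rows = sort_by_seq(read_rows(f))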

============================================
kale-q-*.csv
============================================
The data is arranged in CSV (comma separated value) format
as a table with 7 columns: 

  txt -- A P-marked reading or kanji text string from JMdict.

  hits -- Google page count for this string from Kale's data.

  low -- Contains "*" if "hits" is < 1M, blank otherwise.

  seq -- The sequence number of the JMdict entry that contains 
        the reading or kanji in "txt".

  kanji, 
  rdng, 
  gloss -- These columns give a summary of the JMdict entry.

Records are ordered by entry seq number.
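
The rule behind the "low" column is simple enough to state in code.
This Python sketch assumes "1M" means exactly 1,000,000 and that a hit
count is available for the string; how strings missing from Kale's
data are handled is not covered.  The names LOW_THRESHOLD and low_flag
are hypothetical.

```python
LOW_THRESHOLD = 1_000_000  # "1M", taken here to mean one million

def low_flag(hits):
    """Return "*" when the page count is below the threshold, else ""."""
    return "*" if hits < LOW_THRESHOLD else ""
```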


============================================
Stuart McGraw, 2007-01-16
Revised 2007-01-17, minor edits, spelling and typo corrections.