The JMdictDB Project

This page contains information about the development of a PostgreSQL database to support Jim Breen's Japanese-English dictionary projects, including JMdict, JMnedict, Kanjidic2, WWWJDIC and others. Jim runs these projects under the auspices of the Electronic Dictionary Research and Development Group (EDRDG).

The goals of this project (in priority order) are:

  1. To create a database to serve as a master repository for the information in the JMdict, EDICT, JMnedict, Examples, Kanjidic and other related files distributed by Jim Breen and the EDRDG.
  2. To provide a web-based system for the submission, review, and approval of corrections and new entries to these data.
  3. To provide freely available software to others who want to use or build upon "JMdict in a database".
  4. To provide an open-source replacement for the principal author's Microsoft Access-based JMdict database. :-)

Discussion of this project takes place on the edict-jmdict@yahoo.com mailing list (http://groups.yahoo.com/group/edict-jmdict/).  Jim Breen maintains a web page describing the JMdict project's use of JMdictDB at http://www.edrdg.org/wiki/index.php/JMdictDB_Project. There is also some older information at http://www.csse.monash.edu.au/~jwb/edictredev/.

The project code is still under active development, and no promises are made regarding stability or backward compatibility. However, it is currently in use as the primary repository for the JMdict project dictionary data, and the web interface is in use for submitting new entries and corrections to existing entries in WWWJDIC.

All the code developed for this project is GPL'd and maintained in a publicly accessible Mercurial repository (links below). Additional help is welcome; please post to the edict-jmdict mailing list, or email the current principal developer at the address at the bottom of this page.

The code currently consists of scripts to create a PostgreSQL database and load JMdict (and related data such as the JMnedict "Japanese names" file or the Tatoeba "examples" file) into it, some maintenance and other command line tools, and a set of CGI scripts that allow the database to be viewed and updated from a web browser. The code was originally written in Perl but was migrated entirely to Python in May 2008. The code is developed and tested under Ubuntu Linux and Fedora 15 (both with the Apache web server), and Microsoft Windows XP (with the IIS web server). More information on prerequisites is in the README.txt file.

News

New 2018-06-10: conj.py is a standalone Python program that uses the conjugation tables developed for the JMdictDB project to demonstrate how simple a table-based Japanese word conjugator can be when using this approach. It has been moved out of JMdictDB to a separate, independent (git) project. Current URLs:

New 2018-04-26: The JMdictDB web interface now provides a page that lets users change their saved settings (userid, name, email address, password). It is accessed by clicking one's user name after logging in. An "administrator" user privilege level has been added with which any user's settings can be changed.

New 2018-04-01: The templating system used by the JMdictDB web interface was changed from SimpleTAL, an implementation of the TAL/TALES attribute language used in Zope, to Jinja2, a somewhat more readable and easier to use template language.

New 2015-08-19: The "Submissions" link at the top of each JMdictDB page now shows an index page with links to updates made "Today", "Yesterday", each day of this year grouped by month, and previous years. The appearance is very close to the databaseupdates.html page that Jim Breen was maintaining. Formerly, the Submissions link showed a page of the entries that were updated "today" only.

New 2015-05-05: The contact email of the principal developer of JMdictDB has changed; the various JMdictDB web pages and forms have been updated with the new address in the footer.

Try it!

The links below give access to the online test version of JMdictDB. (Note that these links are to the web pages provided in the JMdictDB source code. The pages linked to from WWWJDIC are very similar but have been tweaked to the needs of WWWJDIC.)

  Find and edit existing entries: search / advanced search
  Add a new entry
  Editing quick overview or full help

Please feel free to try these out, including adding any real or junk entries you want, but be aware that all changes will be thrown away periodically and will NOT go into the real JMdict.

Code and Documentation

jmdictdb -- Browsable (read-only) access to the JMdictDB code Mercurial repository.
Issue tracker -- Issue tracker for the JMdictDB project software.
tip.tar.gz -- Download source code, latest development version (gzipped tar file).
README.txt -- The README file, includes install prerequisites and instructions (2010-03-10)
schema.html, schema.pdf -- Comprehensive description of the database schema (2008-11-12).
schema.png -- Diagram of the database schema (200KB, 2008-11-12).
T021.tar.gz -- Source code for last version implemented in Perl, obsolete, 2008-05-03 (gzipped tar file).


Related files, but not part of JMdictDB...


The following HTML pages list all JMdict entries that share a common kanji or reading text with at least one other entry. The entries are sorted by the text, making it relatively easy to identify entries that are very similar and possibly should be merged.
This data is based on the 2007-01-14 version of JMdict.
Shared kanji (800KB)
Shared readings (10MB)
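
For readers who want to produce a similar listing from their own data, the grouping described above can be sketched in a few lines of Python. This is only an illustration and not the code that generated the pages: the function name shared_texts, the shape of the input mapping, and the entry numbers in the sample are all assumptions made for the example.

```python
from collections import defaultdict

def shared_texts(entries):
    """Find texts (kanji or readings) used by more than one entry.

    `entries` is a hypothetical mapping of entry id -> list of texts.
    Returns a list of (text, [entry ids]) pairs sorted by text.
    """
    by_text = defaultdict(set)
    for seq, texts in entries.items():
        for text in texts:
            by_text[text].add(seq)
    # Keep only texts that occur in at least two different entries.
    return sorted(
        (text, sorted(ids)) for text, ids in by_text.items() if len(ids) > 1
    )

# Illustrative sample only; the entry ids are made up.
sample = {
    1001: ["明日"],
    1002: ["明日", "明くる日"],
    1003: ["今日"],
}

for text, ids in shared_texts(sample):
    print(text, ids)
```

Sorting by the text places entries that share a kanji or reading next to each other, which is what makes near-duplicates easy to spot by eye.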

Matchup of Kale Stutzman's 2007-01-14 Google hit counts and corresponding JMdict entries (Kale's email):
README.txt (also included in the .zip files)
kale-u.zip UTF-8 encoded files
kale-w.zip SJIS (Windows) encoded files

Kale Stutzman's original data file in alternate encodings:
edict-gfreq.euc EUC-JP encoded
edict-gfreq.utf UTF-8 encoded


Please send questions or comments about these pages or the JMdictDB project in general to Stuart McGraw (jmdictdb@mtneva.com / http://mtneva.com)