Edict Overview

From EDRDG Wiki
Revision as of 23:35, 18 January 2013 by JimBreen

The EDICT Dictionary File

Welcome to the Home Page of the EDICT file within the JMdict/EDICT Project; founded, coordinated and more-or-less single-handedly run by Jim Breen (hereafter "I" or "me"). This page is intended as an overview of the file, with links to more detail elsewhere.

Background

Way back in 1991 I began to experiment with handling Japanese text in computer files, and decided to try writing a dictionary search program in Turbo C under DOS, which used a simple dictionary file contained in the MOKE (Mark's Own Kanji Editor) package. To make this program more useful, I began to expand the file itself. One thing led to another, until I ended up running a fairly major project which has taken over a large portion of my life. I must acknowledge that the EDICT project has depended on many people who have provided material and editorial assistance. A significant proportion of the compilation process has been carried out using electronic mail and file transfers, and indeed the project would never have occurred without the services provided by the Internet.

What is EDICT?

EDICT is a Japanese-English Dictionary file. For the full details, see the full documentation, or the old documentation.

It is a plain-text file in EUC-JP encoding, with its own format (which has become known as "EDICT format"). Originally it was compiled and edited in this format, but since 1999 it has been generated as a legacy file from an expanded database, along with the related JMdict (Japanese-Multilingual Dictionary) project. JMdict is an expanded file containing French, German, Russian, etc. translations, and is in XML format with UTF-8 encoding.
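As a rough illustration of the EDICT format, the Python sketch below splits one decoded line into its headword, reading and glosses. It assumes the commonly documented legacy layout `KANJI [READING] /gloss/gloss/`, with kana-only entries omitting the bracketed reading. The names `parse_edict_line` and `EdictEntry` are hypothetical, and the sketch does not cover every detail of the format (for example, the file's header line).

```python
import re
from typing import NamedTuple, Optional

# Assumed layout of one legacy EDICT line (an assumption for this sketch):
#   KANJI [READING] /gloss 1/gloss 2/
# Kana-only entries omit the bracketed reading:
#   READING /gloss 1/
EDICT_LINE = re.compile(r"^(\S+)(?: \[([^\]]+)\])? /(.*)/$")


class EdictEntry(NamedTuple):
    headword: str
    reading: Optional[str]  # None for kana-only entries
    glosses: list[str]


def parse_edict_line(line: str) -> EdictEntry:
    """Split one decoded EDICT line into headword, reading and glosses."""
    match = EDICT_LINE.match(line.rstrip("\n"))
    if match is None:
        raise ValueError(f"not an EDICT-format line: {line!r}")
    headword, reading, body = match.groups()
    # Glosses are separated by '/'; drop any empty fields.
    glosses = [g for g in body.split("/") if g]
    return EdictEntry(headword, reading, glosses)
```

Raw bytes read from the file would first be decoded from EUC-JP, e.g. `raw.decode("euc_jp")`, before being passed to the parser.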

There are now two EDICT versions:

  1. the plain EDICT file. This is the original format, where there is only one kanji form and one reading per entry. I regard this as a legacy format, and only provide it for older applications. PLEASE do not use this format for new applications, as I would like to withdraw it one day.
  2. the enhanced "EDICT2" format. This can have multiple kanji forms and readings in an entry, and also includes other information such as cross-references and restrictions; it also uses the supplementary kanji from the JIS X 0212 standard. It has almost all the information in the full JMdict format. This form should be used for all new applications.

The EDICT2 file currently has about 160,000 entries, while the legacy EDICT file has nearly 200,000 entries (many of them near-duplicates, since every combination of kanji form and reading generates a distinct entry).
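The gap between the two counts follows from that expansion. A minimal Python sketch of it, assuming the EDICT2 headword fields separate alternative forms with `;`, pairs every kanji form with every reading. It deliberately ignores per-form tags such as (P) and any reading restrictions, so it may over-generate compared with the real legacy file; `legacy_pairs` is a hypothetical helper, not part of any EDICT tooling.

```python
from itertools import product


def legacy_pairs(kanji_field: str, reading_field: str) -> list[tuple[str, str]]:
    """Expand one EDICT2 headword into legacy-EDICT (kanji, reading) pairs.

    Simplification (assumption): tags and reading restrictions are not
    handled, so every kanji form is paired with every reading.
    """
    kanji_forms = kanji_field.split(";")
    readings = reading_field.split(";")
    # itertools.product yields the full cross product, kanji form varying slowest.
    return list(product(kanji_forms, readings))
```

For example, an entry with two kanji forms and two readings yields four legacy lines, where EDICT2 needs only one.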

A short overview of the EDICT project as parallel English/Japanese text is available.

Download

You can download the files in various formats: edict2.gz, edict.gz and edict.zip. These are all on the Monash ftp site.
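Because the .gz files are gzip-compressed EUC-JP text, Python's standard library can read them directly without unpacking first. A minimal sketch, assuming a local copy of a file such as edict2.gz; `read_edict_lines` is a hypothetical helper name:

```python
import gzip
from typing import Iterator


def read_edict_lines(path: str) -> Iterator[str]:
    """Yield decoded lines from a gzip-compressed EDICT or EDICT2 file.

    Assumes the file is EUC-JP encoded, as described on this page.
    """
    # Text mode ("rt") decompresses and decodes on the fly.
    with gzip.open(path, "rt", encoding="euc_jp") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Lines are read lazily, so the whole file never needs to be held in memory; each one can then be handed to a line parser.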

You can also use EDICT2 online via my WWWJDIC server.

Is it Public Domain?

EDICT can be freely used provided satisfactory acknowledgement is made in any software product, server, etc. that uses it. There are a few other conditions relating to distributing copies of EDICT with or without modification. Copyright is vested in the EDRG (Electronic Dictionary Research Group). You can see the specific licence statement at the Group's site.

Other Dictionary Files

A number of other dictionary files have been compiled by me and others as adjuncts to or spin-offs from the EDICT file. I list the major ones below. Another summary can be found in the documentation of my WWWJDIC server.

  • the KANJIDIC kanji information file. (overview) (download) (documentation) This file has an entry for each of the 6,355 kanji in the JIS X 0208-1990 standard. (The KANJIDIC file was cited as a reference in the New Nelson character dictionary published in 1997.)
  • A second file, KANJD212, which covers the 5,801 kanji in the JIS X 0212-1990 standard, has been assembled and was released early in 1996. (documentation) (download)
  • the ENAMDICT/JMnedict files of proper names. These now have over 720,000 names. Downloads: (enamdict.gz) (JMnedict.xml.gz) or see the documentation.
  • the COMPDIC file of computing and (tele)communications terminology. Has over 12,000 entries. (documentation) (download)
  • the EDICLSD3 (Japanese-English Life Science dictionary), which is the EDICT-format version of a major file produced at Kyoto University by a project group coordinated by Professor Shuji Kaneko.

Software for using the EDICT files

  1. WWW

There are two main WWW options:

    • my own WWWJDIC server, which has a number of mirrors in Canada, Japan, the US, etc. (Please note that the WWWJDIC program is not available for download. There is no PC version.)
    • Jeffrey Friedl's server at sites in Canada, the USA, etc.
There are many other WWW-based methods, and a larger list can be found on my online dictionaries page.

A very useful site is Rikai, which massages WWW pages, placing popup translations from EDICT behind the Japanese text. There is also a Rikai-based Mozilla plugin that achieves the same without going through the server; it needs Firefox 0.8.

  2. Windows

While I do not have a lot of direct experience (I don't use Windows much), the following appear to be the options:

    • use the JquickTrans program, also available from the Monash ftp site. Despite its name, it is a dictionary client.
    • use the old WinJDic program, also available from the Monash ftp site. It has the limitation of not being able to handle more than one dictionary file.
    • use the JWPce freeware wordprocessor, also available from the Monash ftp site. It has a good built-in dictionary function. The author, Glenn Rosenthal, has promised a stand-alone dictionary version soon. The older JWP wordprocessor, written by Stephen Chung, is also popular.
    • Another word processor that uses the EDICT file is NJSTAR. NJSTAR comes with an early copy of EDICT; to use a more recent copy, you'll need to create special index files. I think the utilities for this are in the DOS archive of NJSTAR, but I cannot confirm this.
    • the Roboword program from Technocraft.
    • use the DOS JDIC mentioned below.

All of the above work with just "English" versions of Windows.

  3. WindowsCE/Windows Mobile

  4. Unix/Linux (X-Windows)
    • My own xjdic (V2.4) which is available from the Monash ftp site. It needs to run in a kterm window, and has been used successfully on virtually every type of Unix & Linux system.
    • the Gjiten package, which is very nice, and has its own flexible GUI. (Gnome)
    • the new gWaei, which aims to be a "dictionary program for the Gnome desktop with support for regular expressions."
    • the fast and light-weight Kiten (KDE).
    • for Emacs/XEmacs users there is edict.el. I don't know much about it, but I think it is included in the XEmacs rpm. On the Monash ftp site I have some information and the latest release.
  5. Smartphones

For Android phones there are two main options:

    1. the AEdict app. This uses copies of the dictionary files downloaded to the phone, so it works well offline. A weakness is that it uses the old legacy format for EDICT.
    2. the WWWJDIC for Android app. This uses the WWWJDIC server via its API, and needs network access. It has the advantage of always being up-to-date as the dictionary is expanded and corrected.

For Apple iPhones two options are:

    1. the Kotoba app, which is very highly regarded. It is similar to AEdict, but uses the JMdict data and hence provides all the information from the dictionary.
    2. the newer EDICT with Grammar app. This uses the common words subset in the old EDICT format.

WWWJDIC itself has a simple mobile phone interface, which I developed for Japanese keitai many years ago.

  6. Macintosh

Mac users have a number of options if they have Japanese support in their OS (I think the support is standard for later versions):

  7. DOS

The two main programs for DOS are:

    • the JDIC program which provides a Japanese-English and English-Japanese dictionary function.
    • the JREADER text reader, which includes an integrated dictionary look-up function.
  8. Others

There are also programs for Amiga, BeOS, Palm Pilots, etc. Most can be obtained from the Monash ftp site. There is also a Jabber bot that does local EDICT lookups.

Romaji?

None of the files in the EDICT project use romanized Japanese. I get many requests for a romaji version of EDICT; however, as I do not like romaji and do not want to encourage its use, I will not be producing romaji versions. A romaji version dating from 1997 was once on the Monash ftp site. It was prepared for a blind person who was using a non-Japanese Braille interface. I (foolishly) placed it on my ftp site, and it caused a lot of problems because it was not kept in step with the main file. That file has now been withdrawn, and I am asking all sites carrying copies to withdraw it.

Publications

If you like, you can collect some papers I have written about the project:

Other useful links can be found on my Japanese Page.