Internally, Kirrkirr, as a Java program, uses Unicode, and so, in theory, you should be able to work with characters from any language that is defined in Unicode. In practice, things can be messier than that.
Kirrkirr writes intermediate files in UTF-8, which should work with any Unicode characters. Your dictionary need not be in UTF-8, but it should correctly identify its character set in the XML header.
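To check which encoding a dictionary file actually declares, you can ask the StAX parser that ships with the JDK (javax.xml.stream). The sketch below is not part of Kirrkirr; the class and method names are hypothetical. It reads the encoding named in the XML header. It also shows that a Latin-1 file decodes correctly when its header says so. Without the declaration, the parser would assume UTF-8, and the byte for "é" would typically be rejected as malformed.

```java
import java.io.ByteArrayInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic helper, not part of Kirrkirr.
public class DeclaredEncoding {

    private static XMLStreamReader open(byte[] xml) throws XMLStreamException {
        return XMLInputFactory.newInstance().createXMLStreamReader(new ByteArrayInputStream(xml));
    }

    /** The encoding named in the XML declaration, or null if none is declared. */
    public static String declaredEncoding(byte[] xml) {
        try {
            XMLStreamReader r = open(xml);
            try {
                return r.getCharacterEncodingScheme();
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    /** Text of the first element, decoded by the parser according to the declaration. */
    public static String firstElementText(byte[] xml) {
        try {
            XMLStreamReader r = open(xml);
            try {
                while (r.hasNext()) {
                    if (r.next() == XMLStreamConstants.START_ELEMENT) {
                        return r.getElementText();
                    }
                }
                return null;
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }
}
```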
Displaying characters is where most problems arise in practice. Java needs to be able to find a font that can display your characters. Depending on what fonts are available on your system, and whether Java can find them (with or without hints from Kirrkirr), you may or may not see the correct characters. If Java cannot find a font with suitable glyphs, you typically see rectangular boxes instead.
Kirrkirr doesn't ship with fonts for miscellaneous alphabets, so something appropriate needs to be installed on your system already. If you have a recent operating system (e.g., Windows XP or Mac OS X), you already have fonts that can display most of the major character sets (e.g., Arabic and Chinese), but probably not Inuktitut syllabics.
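When you see boxes, it can help to find out which Unicode blocks your data uses and which installed fonts claim to cover them. The hypothetical helper below uses the standard Character.UnicodeBlock and java.awt.Font.canDisplayUpTo APIs. It is a diagnostic sketch, not a Kirrkirr feature. Its default sample string is the Inuktitut word "Inuktitut" written in syllabics.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical diagnostic helper, not part of Kirrkirr.
public class FontCoverage {

    /** Unicode blocks used by the text, in order of first appearance. */
    public static Set<Character.UnicodeBlock> blocksUsed(String text) {
        Set<Character.UnicodeBlock> blocks = new LinkedHashSet<>();
        text.codePoints().forEach(cp -> {
            Character.UnicodeBlock b = Character.UnicodeBlock.of(cp);
            if (b != null) {
                blocks.add(b);
            }
        });
        return blocks;
    }

    /** Installed font families that report they can display every character of the text. */
    public static List<String> fontsThatCanDisplay(String text) {
        List<String> result = new ArrayList<>();
        String[] families = GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames();
        for (String family : families) {
            Font f = new Font(family, Font.PLAIN, 12);
            // canDisplayUpTo returns -1 when the font can display the whole string.
            if (f.canDisplayUpTo(text) == -1) {
                result.add(family);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = args.length > 0 ? args[0] : "\u1403\u14C4\u1483\u144E\u1450\u1466";
        System.out.println("Blocks: " + blocksUsed(sample));
        System.out.println("Fonts:  " + fontsThatCanDisplay(sample));
    }
}
```

If the font list comes back empty for your data, that is consistent with the rectangular boxes described above. You would then need to install a suitable font.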
If your fonts are in the standard font location for your system, Java should be able to find them, but on some systems some fiddling with font.properties may be needed.
Java may well not use those fonts automatically if you are running in an English locale with the default font.properties. There are four ways that you can attempt to fix this:
kirrkirr.wordFont - This font is used to display words in lists, networks, etc.
kirrkirr.textFont - This font is used to display running text.
kirrkirr.interfaceFont - This font is used for some interface elements, where the font is explicitly specified in Kirrkirr code. However, many other interface elements just use whatever font is provided as the default by your implementation's LookAndFeel.
These properties are set in the kirrkirr.properties file in the directory in which Kirrkirr was installed (in a future version we'll probably make them settable from a Preferences tab). Their value should be the name of a font on your system, for example:
kirrkirr.wordFont=Arial Unicode MS
font.properties.locale file.
Edit the font.properties file (which you find in the jre/lib directory) to tell Java how to use fonts on your system to display certain Unicode character ranges. Doing this correctly is quite tricky; see Sun's documentation.
Getting everything right on a pre-JDK 1.5 system may require doing 2 and (3 or 4). In particular, at present, I believe the HtmlPanel will not display characters correctly unless they can be found at the system level by either method 3 or 4. Nevertheless, it can be done. For example, here is Kirrkirr displaying a tiny demonstration Japanese dictionary.
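The kirrkirr.wordFont, kirrkirr.textFont and kirrkirr.interfaceFont settings are ordinary Java properties. The sketch below shows one way a program could read them with java.util.Properties and fall back to Java's logical Dialog font when a key is missing. Kirrkirr's own loading code may differ. The class name and the fallback choice are assumptions.

```java
import java.awt.Font;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch of reading the kirrkirr.*Font settings; not Kirrkirr's actual code.
public class KirrkirrFonts {
    public static final String WORD = "kirrkirr.wordFont";
    public static final String TEXT = "kirrkirr.textFont";
    public static final String INTERFACE = "kirrkirr.interfaceFont";
    // Assumed fallback when a property is absent: the logical "Dialog" font.
    static final String FALLBACK = Font.DIALOG;

    public static Properties load(Reader in) {
        Properties p = new Properties();
        try {
            p.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p;
    }

    /** Reads a properties file using ISO-8859-1, the traditional encoding for .properties files. */
    public static Properties load(Path file) {
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            return load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The configured font name for a key, or the fallback if it is missing or blank. */
    public static String fontName(Properties p, String key) {
        String name = p.getProperty(key);
        return (name == null || name.trim().isEmpty()) ? FALLBACK : name.trim();
    }

    public static Font font(Properties p, String key, int size) {
        return new Font(fontName(p, key), Font.PLAIN, size);
    }
}
```

Properties.load(InputStream) reads ISO-8859-1. So if a font name contains non-ASCII characters, it is safest to write them as \uXXXX escapes in the file.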
Kirrkirr fully supports localization of its interface. However, currently very little localization data is available. In fact, all that is available is an Australian English localization (which is used by default), and a very partial Warlpiri localization. Additional localizations can be defined by providing appropriate lang_langcode_country.properties files.
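The lang_langcode_country.properties naming follows the pattern that Java's ResourceBundle uses for property-file bundles. Assuming Kirrkirr looks files up in that standard way (an assumption, not confirmed here), the sketch below lists the file names Java would try for a given locale, most specific first. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

// Hypothetical helper showing standard ResourceBundle file-name resolution.
public class LocalizationFiles {
    private static final ResourceBundle.Control CONTROL =
        ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);

    /** Candidate .properties file names for the locale, most specific first. */
    public static List<String> candidateFiles(String baseName, Locale locale) {
        List<String> files = new ArrayList<>();
        for (Locale candidate : CONTROL.getCandidateLocales(baseName, locale)) {
            String bundleName = CONTROL.toBundleName(baseName, candidate);
            files.add(CONTROL.toResourceName(bundleName, "properties"));
        }
        return files;
    }

    public static void main(String[] args) {
        String tag = args.length > 0 ? args[0] : "en-AU";
        System.out.println(candidateFiles("lang", Locale.forLanguageTag(tag)));
    }
}
```

For Australian English this yields lang_en_AU.properties, then lang_en.properties, then lang.properties. Strings missing from a more specific file would fall back to the less specific ones.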
The picture that Kirrkirr displays at startup is loaded from icons/splash.jpg in the installation directory. For reasons of startup speed, this path is hardcoded and cannot be changed. However, you can replace the picture stored under that filename with any suitably sized JPEG file.
Kirrkirr has some known issues, which we hope to address in future versions. Among others these include:
Many dictionaries contain subentries: headwords for derived forms, combinations of a noun and a light verb, and so on, are placed under the headword for the main entry, together with information about the subheadword. This organization doesn't work very well with Kirrkirr. While one can display a dictionary in this form (with a suitable XSL file), only the main headwords will be used in the Network display, when searching by headword, etc.
This is largely by design: our usability testing with paper dictionaries showed that people were generally confused by subentries and often did not interpret them properly. Further, we were interested in network representations of the lexicon, and this led to the idea that one is better off regarding all words as headwords and turning the subordination relationship into a pair of links (main entry for subentry, and subentries of main entry), which parallel other link types like synonym, antonym or hyponym. The included MiniWrl dictionary gives an example of this. (Note that the entries for main headwords should also include information about what their subentries are. This is necessary for subentries to be displayed in the Network pane when a main entry is clicked on: the program finds an entry's links only by looking within that entry.)
If an XML dictionary has subentries in the traditional way, it should be fairly straightforward to automatically convert it, by promoting the subentries into the list of main headwords, perhaps marking them with an attribute, and simultaneously putting in links from the subentry to the main entry and from the main entry to the subentry. We hope to someday write a general utility that will do this, but don't have one at present.
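The conversion described above can be sketched with the standard DOM API. Every element name here is a hypothetical placeholder: DICT, ENTRY, SUBENTRY, HW, MAINENTRY, SUBENTRYLINK and the subentry="yes" marker. A real dictionary's DTD will use different names. The sketch handles only one level of nesting and is not the general utility mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical sketch: promote subentries to headwords and link them both ways.
public class PromoteSubentries {
    // Placeholder element names; substitute those used by your dictionary.
    static final String ENTRY = "ENTRY", SUBENTRY = "SUBENTRY", HEADWORD = "HW";
    static final String MAIN_LINK = "MAINENTRY", SUB_LINK = "SUBENTRYLINK";

    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse dictionary", e);
        }
    }

    /** Moves each top-level entry's SUBENTRY children up to the top level, adding links both ways. */
    public static void promote(Document doc) {
        Element root = doc.getDocumentElement();
        for (Element main : childElements(root, ENTRY)) {   // snapshot, so later inserts are not revisited
            String mainHw = headword(main);
            Node insertPoint = main.getNextSibling();        // promoted entries go right after main, in order
            for (Element sub : childElements(main, SUBENTRY)) {
                String subHw = headword(sub);
                main.removeChild(sub);
                Element promoted = doc.createElement(ENTRY);
                NamedNodeMap attrs = sub.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr a = (Attr) attrs.item(i);
                    promoted.setAttribute(a.getName(), a.getValue());
                }
                promoted.setAttribute("subentry", "yes");    // mark it as a former subentry
                while (sub.getFirstChild() != null) {
                    promoted.appendChild(sub.getFirstChild()); // appendChild moves the node
                }
                appendText(doc, promoted, MAIN_LINK, mainHw);  // link: subentry -> main entry
                appendText(doc, main, SUB_LINK, subHw);        // link: main entry -> subentry
                root.insertBefore(promoted, insertPoint);      // null insertPoint means append
            }
        }
    }

    public static List<Element> childElements(Element parent, String name) {
        List<Element> out = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                out.add((Element) n);
            }
        }
        return out;
    }

    static String headword(Element e) {
        List<Element> hws = childElements(e, HEADWORD);
        return hws.isEmpty() ? "" : hws.get(0).getTextContent().trim();
    }

    static void appendText(Document doc, Element parent, String name, String text) {
        Element link = doc.createElement(name);
        link.setTextContent(text);
        parent.appendChild(link);
    }
}
```

The main entry gets a link to each of its subentries. This matches the note above that the program only looks within an entry for its links.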
http://nlp.stanford.edu/kirrkirr/dictionaries/other.html