Building an XSLT stylesheet

Basics

To use a dictionary with Kirrkirr, you must provide at least one XSLT stylesheet file that can render dictionary entries in HTML (the format used by web browsers and for Kirrkirr entry text display). Our dictionaries usually provide several, so you can view dictionary entries in different ways. Most (recent) books on XML contain some coverage of XSLT, and there are several books just on XSLT (if you're looking to buy one, a good one is Michael Kay's XSLT Programmer's Reference). There are also some introductions to XSLT on the web (such as w3schools). XSLT specifies rules for transducing one tree structure into another. It's actually a very linguist-friendly notion: transformational grammar might have turned out better if XSLT-style tree transductions had been the norm in the 1960s.

The XML file passed to your XSLT stylesheet is basically just a fragment of your dictionary (recall the assumption that the dictionary must be represented in XML as a list of words). That is, the XML file will have the complete structure of the dictionary, but it will be a dictionary with just one word left in it in L1 mode, or perhaps a few entries if in L2 mode or if exporting dictionary data to HTML. So, the XSLT file should be built to work with your dictionary structure (indeed, it should work if run on your entire dictionary).
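As a concrete check of that last point, the sketch below runs one stylesheet over a single-entry fragment and over a two-entry file with the same structure. It uses the XSLT 1.0 processor built into the Java platform (`javax.xml.transform`), not Kirrkirr itself. The element names DICTIONARY, ENTRY, HW and GL are hypothetical stand-ins for your own dictionary's structure.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class FragmentDemo {
    // Minimal stylesheet; DICTIONARY, ENTRY, HW and GL are hypothetical names.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<HTML><BODY><xsl:apply-templates select='DICTIONARY/ENTRY'/></BODY></HTML>"
        + "</xsl:template>"
        + "<xsl:template match='ENTRY'>"
        + "<P><B><xsl:value-of select='HW'/></B>: <xsl:value-of select='GL'/></P>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies a stylesheet (given as a string) to an XML document (given as a string).
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String entry(String headword, String gloss) {
        return "<ENTRY><HW>" + headword + "</HW><GL>" + gloss + "</GL></ENTRY>";
    }

    // Every input keeps the full dictionary structure, however many entries remain.
    static String dictionary(String... entries) {
        return "<DICTIONARY>" + String.join("", entries) + "</DICTIONARY>";
    }

    public static void main(String[] args) {
        // One entry, as when a single word is displayed...
        System.out.println(transform(XSL, dictionary(entry("xyzzy", "a magic word"))));
        // ...and several entries: the same stylesheet handles both.
        System.out.println(transform(XSL, dictionary(
            entry("xyzzy", "a magic word"), entry("plugh", "another magic word"))));
    }
}
```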

Getting links to other words working inside Kirrkirr

The main Kirrkirr-specific thing you need to know is how to get links that work between dictionary entries, so that you can click on a word in one entry (such as a word listed as a synonym) and move to that entry. Essentially, you build up an "A HREF" element just as when writing an HTML page, but a particular encoding is used for the word. When the hyperlink is clicked, Kirrkirr intercepts the link request, creates the necessary HTML file, and updates the other panels to display the requested word. The file to request for a word, say xyzzy, is @xyzzy@uniquifier.html. If there is no uniquifier (i.e., there are no homophones to deal with), it is just @xyzzy@.html. (Again, the xyzzy part should be in UTF-8 encoding.) Note that such links do not work outside a running Kirrkirr, because HTML files for individual entries do not exist permanently on disk; dealing with this for exported HTML files is discussed below.
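The naming scheme is simple enough to state in a few lines of code. The following Java sketch is hypothetical (the class and method names are ours, not part of Kirrkirr): `href` builds the file name to link to, and `parse` shows how a link handler could split such a name back into the word and its uniquifier, assuming the uniquifier itself never contains `@`.

```java
final class EntryLinks {
    // Builds the link target for a word; uniquifier may be null when
    // there are no homophones to distinguish.
    static String href(String word, String uniquifier) {
        return "@" + word + "@" + (uniquifier == null ? "" : uniquifier) + ".html";
    }

    // Splits "@word@uniquifier.html" into {word, uniquifier}; the uniquifier
    // is "" for "@word@.html". It splits at the LAST '@', so a '@' inside the
    // word is tolerated but one inside the uniquifier is not (an assumption).
    static String[] parse(String href) {
        final String suffix = ".html";
        if (!href.startsWith("@") || !href.endsWith(suffix)) {
            throw new IllegalArgumentException("not an entry link: " + href);
        }
        String body = href.substring(1, href.length() - suffix.length());
        int at = body.lastIndexOf('@');
        if (at < 0) {
            throw new IllegalArgumentException("missing second '@': " + href);
        }
        return new String[] { body.substring(0, at), body.substring(at + 1) };
    }

    public static void main(String[] args) {
        System.out.println(href("xyzzy", "2"));         // @xyzzy@2.html
        System.out.println(href("xyzzy", null));        // @xyzzy@.html
        String[] parts = parse("@xyzzy@2.html");
        System.out.println(parts[0] + " " + parts[1]);  // xyzzy 2
    }
}
```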

To be very concrete, if links to other words are in a LINKTO element in your XML, and the uniquifier is stored in an HNUM attribute (when needed), then the following XSLT will produce working links in the HTML (within Kirrkirr).

<xsl:template match="LINKTO">
  <xsl:choose>
    <xsl:when test='@HNUM'>
      <A>
        <xsl:attribute name="HREF">@<xsl:value-of
            select="."/>@<xsl:value-of select="@HNUM"/>.html</xsl:attribute>
        <xsl:apply-templates/>
      </A>
    </xsl:when>
    <xsl:otherwise>
      <A>
        <xsl:attribute name="HREF">@<xsl:value-of
            select="."/>@.html</xsl:attribute>
        <xsl:apply-templates/>
      </A>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
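Because an absent attribute such as HNUM evaluates to the empty string in XPath, the two branches can also be collapsed into a single attribute value template, where each `{...}` inserts the value of an XPath expression into a literal attribute. The sketch below is our simplification, not the stylesheet shipped with Kirrkirr; it runs that template through the JDK's built-in XSLT 1.0 processor on made-up LINKTO elements. The `xml` output method is used only so the result is easy to compare.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class LinkToDemo {
    // One template covers both cases: {@HNUM} expands to "" when HNUM is absent.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'><xsl:apply-templates select='//LINKTO'/></xsl:template>"
        + "<xsl:template match='LINKTO'>"
        + "<A HREF='@{.}@{@HNUM}.html'><xsl:apply-templates/></A>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical entry with one homophone-numbered link and one plain link.
        String xml = "<ENTRY><LINKTO HNUM='2'>xyzzy</LINKTO><LINKTO>plugh</LINKTO></ENTRY>";
        System.out.println(transform(XSL, xml));
        // <A HREF="@xyzzy@2.html">xyzzy</A><A HREF="@plugh@.html">plugh</A>
    }
}
```

The longer xsl:choose form does the same job and may be easier to extend if the two cases later need different markup.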

Advanced features

As an advanced feature, the XML files passed to the XSLT stylesheet contain a couple of extra elements, DIR and GLOSS, which you can regard as parameters that may allow you to produce better HTML output. They are added as children of the root element of your dictionary. (Note that this means the XML files passed to the XSLT file will not strictly satisfy any DTD that you may have defined for the dictionary.) The XSLT file can use this information to provide links to pictures and sounds that work within HTML.

If your XSLT file descends the hierarchy of XML elements in the most straightforward way (by implicitly or explicitly having the top-level node do <xsl:apply-templates/>), then you will probably want to suppress printing of the contents of the DIR and GLOSS elements in your HTML output, since when you simply descend the tree, the text content of every element is printed by default. You can do this with an empty template whose pattern is anchored at the root element, so that it does not also suppress any DIR or GLOSS elements that your entries themselves might use (this assumes that /DICTIONARY/ENTRY is your DICTIONARY_ENTRY_XPATH):

<xsl:template match="/DICTIONARY/DIR|/DICTIONARY/GLOSS">
</xsl:template>
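To see what such a template does, the sketch below compares three variants of a stylesheet that simply descends the tree: with no suppression template, with the root-anchored pattern /DICTIONARY/DIR|/DICTIONARY/GLOSS, and with a bare DIR|GLOSS pattern. The input is made up: the placeholder text inside the root-level DIR and GLOSS elements stands in for whatever Kirrkirr actually puts there, and the entry's own GLOSS element is a hypothetical entry field that happens to share the name. It uses the JDK's built-in XSLT 1.0 processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SuppressDemo {
    // Hypothetical input: root-level DIR and GLOSS with placeholder contents,
    // plus one entry that has its own GLOSS field.
    static final String XML =
        "<DICTIONARY><DIR>dir-placeholder</DIR><GLOSS>gloss-placeholder</GLOSS>"
        + "<ENTRY><HW>xyzzy</HW><GLOSS>a magic word</GLOSS></ENTRY></DICTIONARY>";

    static final String ANCHORED = "<xsl:template match='/DICTIONARY/DIR|/DICTIONARY/GLOSS'/>";
    static final String UNANCHORED = "<xsl:template match='DIR|GLOSS'/>";

    // A stylesheet that just descends the tree (the built-in rules copy text),
    // plus whatever extra template is passed in.
    static String stylesheet(String extraTemplate) {
        return "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='html' indent='no'/>"
            + "<xsl:template match='/'><HTML><BODY><xsl:apply-templates/></BODY></HTML></xsl:template>"
            + "<xsl:template match='ENTRY'><P><xsl:apply-templates/></P></xsl:template>"
            + extraTemplate
            + "</xsl:stylesheet>";
    }

    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(stylesheet(""), XML));         // placeholders leak into the page
        System.out.println(transform(stylesheet(ANCHORED), XML));   // placeholders gone, entry gloss kept
        System.out.println(transform(stylesheet(UNANCHORED), XML)); // entry gloss lost as well
    }
}
```

The bare pattern matches a GLOSS element anywhere in the document, which is why anchoring at the root is the safer choice if your entries use either name.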

Alternatively, you can avoid having them printed by having the top-level template apply templates only to your entry nodes, like this:

<xsl:template match="/">
  <HTML>
    <BODY>
      <xsl:apply-templates select="DICTIONARY/ENTRY"/>
    </BODY>
  </HTML>
</xsl:template>
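A quick way to confirm this alternative is the sketch below, which applies such a top-level template to a made-up input containing root-level DIR and GLOSS elements with placeholder contents. No suppression template is needed, because DIR and GLOSS are never selected; they remain in the input, so other templates can still read them through an absolute path such as /DICTIONARY/DIR. It uses the JDK's built-in XSLT 1.0 processor, and the HW and entry-level GLOSS element names are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class SelectEntriesDemo {
    // Hypothetical input with placeholder root-level DIR and GLOSS contents.
    static final String XML =
        "<DICTIONARY><DIR>dir-placeholder</DIR><GLOSS>gloss-placeholder</GLOSS>"
        + "<ENTRY><HW>xyzzy</HW><GLOSS>a magic word</GLOSS></ENTRY></DICTIONARY>";

    // The top-level template selects only DICTIONARY/ENTRY, so nothing else is printed.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/'>"
        + "<HTML><BODY><xsl:apply-templates select='DICTIONARY/ENTRY'/></BODY></HTML>"
        + "</xsl:template>"
        + "<xsl:template match='ENTRY'>"
        + "<P><B><xsl:value-of select='HW'/></B>: <xsl:value-of select='GLOSS'/></P>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XML));
    }
}
```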

Examples

Here is a simple XSLT file that works with the tinydict.xml file introduced earlier: tinydict.xsl (it should load in your web browser). This next file contains a subset of tinydict.xml with an instruction to render it using tinydict.xsl, so if you are using a modern web browser, you should see a dictionary entry formatted by the XSLT: tinyrend.xml.

For more complex and complete examples, look at some of the XSLT stylesheets that come with the Kirrkirr download. The MiniWrl ones are particularly complex: they're not a good place to start, but they do illustrate many of the things that you can do in XSLT.

It can take quite a while to get an XSLT file both syntactically correct and doing what you want. Again, a web browser is commonly the best way to test an XSLT file, but you need to make sure that it is a modern browser that supports standard XSLT 1.0 (good candidates are Internet Explorer 6+, Mozilla, or Netscape 7+). Older versions of IE supported a very different, non-standard dialect of XSLT. Kirrkirr uses Xalan as its XSLT processor, but your stylesheets should need nothing beyond standard XSLT 1.0. To have the web browser render an XML file, put a line near the top of it (after the XML declaration, if there is one) saying which XSL file to use, like this:

<?xml-stylesheet type="text/xsl" href="tinydictxsl.xml"?>

(Note: we usually use a .xsl extension for our XSLT files, but web servers may not recognize this extension as XML, so it may be safer to test using a .xml extension, as here; this is certainly the case for Mozilla.)
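If you want to reproduce the browser's stylesheet lookup without a browser, the JDK's `TransformerFactory.getAssociatedStylesheet` reads the xml-stylesheet processing instruction and resolves its href relative to the XML file. The sketch below writes a tiny made-up XML file and stylesheet (the latter named tinydictxsl.xml, as above) to a temporary directory and renders it; the file contents are illustrative, not the real tinydict files. It uses the JDK's built-in processor rather than the Xalan bundled with Kirrkirr, so treat it as a check of standard XSLT 1.0 behaviour only.

```java
import java.io.File;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class PiRenderDemo {
    // Renders an XML file with the stylesheet named in its xml-stylesheet
    // processing instruction, much as a browser would.
    static String renderWithAssociatedStylesheet(File xmlFile) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // The relative href is resolved against the XML file's location.
            Source xsl = factory.getAssociatedStylesheet(new StreamSource(xmlFile), null, null, null);
            if (xsl == null) {
                throw new IllegalArgumentException("no usable xml-stylesheet instruction in " + xmlFile);
            }
            Transformer t = factory.newTransformer(xsl);
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(xmlFile), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    // Writes a made-up XML file and stylesheet into a fresh temporary
    // directory and returns the rendered result.
    static String demo() {
        try {
            Path dir = Files.createTempDirectory("xslt-test");
            String xsl = "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='html' indent='no'/>"
                + "<xsl:template match='ENTRY'><P><xsl:value-of select='HW'/></P></xsl:template>"
                + "</xsl:stylesheet>";
            String xml = "<?xml version='1.0'?>\n"
                + "<?xml-stylesheet type=\"text/xsl\" href=\"tinydictxsl.xml\"?>\n"
                + "<DICTIONARY><ENTRY><HW>xyzzy</HW></ENTRY></DICTIONARY>";
            Files.write(dir.resolve("tinydictxsl.xml"), xsl.getBytes(StandardCharsets.UTF_8));
            Path xmlPath = dir.resolve("entry.xml");
            Files.write(xmlPath, xml.getBytes(StandardCharsets.UTF_8));
            return renderWithAssociatedStylesheet(xmlPath.toFile());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```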

You can also test XSL transformations from within Kirrkirr. Use the Tools | XSL transformer menu option, and browse to select the XML input file, the XSL file, and the output file. You will then need to load the output file in a web browser to examine it.
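The same three-file operation can also be scripted if you prefer to test outside Kirrkirr. The sketch below is a hypothetical stand-in for the Tools | XSL transformer dialog, not its actual implementation: it takes the XML input, XSL file, and output file as command-line arguments and applies the transformation with the JDK's built-in XSLT 1.0 processor.

```java
import java.io.File;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XslTransformTool {
    // Applies xslFile to xmlFile and writes the result to outFile.
    static void transformFiles(File xmlFile, File xslFile, File outFile) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(xslFile));
            t.transform(new StreamSource(xmlFile), new StreamResult(outFile));
        } catch (TransformerException e) {
            // Stylesheet errors (for example, malformed XSLT) surface here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java XslTransformTool input.xml style.xsl output.html");
            System.exit(2);
        }
        transformFiles(new File(args[0]), new File(args[1]), new File(args[2]));
    }
}
```

As with the menu option, you then open the output file in a web browser to inspect it.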

Once you have one or more XSLT files that you are at least moderately happy with, you need to tell Kirrkirr about them. This is one part of defining a DictionaryInfo file for your dictionary, which we turn to next.

Proceed to Building a DictionaryInfo XML file


http://nlp.stanford.edu/kirrkirr/dictionaries
Christopher Manning -- <manning@cs.stanford.edu> -- Last modified: Wed Mar 17 15:51:49 PST 2004