The lexical database
The original materials are stored in an ad hoc markup format built on backslash codes, with some (rather odd) nesting of structural tags
These were converted to XML using an error-correcting, stack-based parser (written in Perl).
- The inconsistency and flexibility of dictionary entries actually made this a surprisingly difficult task.
- The parser nonetheless tries to enforce data integrity
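The conversion step can be sketched in Python. This is not a port of the original Perl parser, only an illustration of the stack-based, error-correcting idea. Assumptions: the marker names `\lx`, `\ps`, `\sn`, `\ge`, `\xv`, `\xe` and the nesting rules in `PARENT` are hypothetical; each field starts on its own line; an unmarked line continues the previous field. The stack holds the structures currently open. A new field closes anything deeper than its required parent, and if that parent was never opened, the parser inserts an implicit one and logs a warning.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical schema (illustrative marker names, not the project's own):
# each marker maps to the marker it must be nested under.
PARENT = {
    "lx": "lexicon",  # headword: starts a new entry
    "ps": "lx",       # part of speech
    "sn": "lx",       # sense number: opens a sense
    "ge": "sn",       # English gloss
    "xv": "sn",       # vernacular example
    "xe": "sn",       # translation of the example
}
CONTAINERS = {"lx", "sn"}  # markers that other fields nest inside
LINE_RE = re.compile(r"^\\(\w+)\s*(.*)$")


def open_path(stack, marker, lineno, warnings):
    """Make `marker` the top of the stack, inventing it if it is missing."""
    if any(m == marker for m, _ in stack):
        while stack[-1][0] != marker:
            stack.pop()  # close deeper structures
        return stack[-1][1]
    # Error correction: the required ancestor was never opened, so open an
    # implicit one under its own parent (recursing up towards the root).
    grand = open_path(stack, PARENT[marker], lineno, warnings)
    elem = ET.SubElement(grand, marker, implicit="yes")
    stack.append((marker, elem))
    warnings.append(f"line {lineno}: inserted missing \\{marker}")
    return elem


def convert(text):
    """Convert backslash-coded text to an XML tree; return (root, warnings)."""
    root = ET.Element("lexicon")
    stack = [("lexicon", root)]  # (marker, element) pairs, innermost last
    warnings, last = [], None
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m is None:
            # Unmarked line: treat it as a continuation of the previous field.
            if last is None:
                warnings.append(f"line {lineno}: text outside any field")
            else:
                last.text = f"{last.text or ''} {line}".strip()
            continue
        marker, value = m.groups()
        if marker not in PARENT:
            # Unknown code: keep the data, attached where it appeared.
            warnings.append(f"line {lineno}: unknown marker \\{marker}")
            last = ET.SubElement(stack[-1][1], "field", marker=marker)
            last.text = value
            continue
        parent = open_path(stack, PARENT[marker], lineno, warnings)
        last = ET.SubElement(parent, marker)
        last.text = value
        if marker in CONTAINERS:
            stack.append((marker, last))
    return root, warnings
```

Recording each correction as a warning, instead of silently repairing it, leaves a trail an editor can check against the source files; a stricter variant could reject such records outright.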
Using XML gives the lexical data a clear structure and makes many (free) tools available
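As one example of such free tooling, the sketch below queries a small XML fragment with Python's standard-library `xml.etree.ElementTree`, including its limited XPath support. The element names (`lexicon`, `lx`, `ps`, `sn`, `ge`) and the sample entries are hypothetical; the real database's schema may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: element names and entries are illustrative only.
SAMPLE = """<lexicon>
  <lx>kela<ps>n</ps><sn>1<ge>fish</ge></sn><sn>2<ge>bait</ge></sn></lx>
  <lx>mura<ps>v</ps><sn>1<ge>run</ge></sn></lx>
</lexicon>"""


def headword(entry):
    """The headword is the text directly inside <lx>, before any child."""
    return (entry.text or "").strip()


def glosses(root, word):
    """Return every English gloss for the given headword."""
    return [ge.text for lx in root.findall("lx") if headword(lx) == word
            for ge in lx.findall("sn/ge")]


def words_with_pos(root, pos):
    """Select entries whose <ps> child has exactly the given text."""
    return [headword(lx) for lx in root.findall(f"lx[ps='{pos}']")]


root = ET.fromstring(SAMPLE)
print(glosses(root, "kela"))
print(words_with_pos(root, "v"))
```

The same files could equally be validated, transformed, or queried with other off-the-shelf XML tools; nothing here depends on Python specifically.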