This is a Python library for several natural language processing tasks:
  - calling Eugene Charniak and Mark Johnson's reranking parser from Python
  - reading and writing n-best lists
  - reading in trees (with the actual tree internals accessible via PyInputTree)
  - running the evaluators evalb and sparseval and extracting results from
    their output
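To give a feel for what "reading in trees" involves, here is a minimal, stand-alone reader for Penn Treebank-style bracketed trees. It is only an illustration: it does not use PyInputTree or Trees.py, and the names `Tree`, `read_tree`, and `tree_yield` are hypothetical. It assumes the usual `(LABEL child ...)` format, where parentheses appear only as brackets (literal brackets in words are escaped, e.g. as `-LRB-`).

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical illustration only: this is NOT the PyInputTree or Trees.py API.

@dataclass
class Tree:
    label: str
    children: List[Union["Tree", str]] = field(default_factory=list)

# Tokens are "(", ")", or any run of characters that are neither brackets nor whitespace.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")

def read_tree(text: str) -> Tree:
    """Parse one bracketed tree such as '(S (NP (DT the) (NN dog)))'."""
    tokens = TOKEN_RE.findall(text)
    pos = 0

    def parse() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        pos += 1
        # Treebank files often wrap trees in an unlabeled root: '( (S ...) )'.
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        node = Tree(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(parse())  # a subtree
            else:
                node.children.append(tokens[pos])  # a terminal (word)
                pos += 1
        pos += 1  # consume the closing ')'
        return node

    tree = parse()
    if pos != len(tokens):
        raise ValueError("trailing tokens after tree")
    return tree

def tree_yield(tree: Tree) -> List[str]:
    """Return the words at the leaves, left to right."""
    words: List[str] = []
    for child in tree.children:
        if isinstance(child, Tree):
            words.extend(tree_yield(child))
        else:
            words.append(child)
    return words

t = read_tree("(S1 (S (NP (DT The) (NN dog)) (VP (VBZ barks)) (. .)))")
print(t.label)        # S1
print(tree_yield(t))  # ['The', 'dog', 'barks', '.']
```

The recursive-descent design keeps the reader short. An unbalanced tree surfaces as an `IndexError` when the tokens run out, which is acceptable for a sketch but worth wrapping in real code.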
Note that the reranking parser interface is subshell-based rather than SWIG-based, as it ideally should be. I haven't experimented much with Patrick Ye's version, but depending on what you are looking for, it may be a better fit. If you are hoping to treat the parser as a black box, this is the library you're looking for.
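A subshell-based interface boils down to starting the parser as a child process, writing sentences to its standard input, and reading parses from its standard output. The sketch below shows that pattern with the standard library only. It is not this library's code: `sgmlize` and `parse_via_subshell` are hypothetical names, the `command` argument stands in for whatever parser invocation you actually use (no real parser flags are shown), and the `<s> ... </s>` wrapping is an assumption about the semi-SGML input the Charniak parser reads, with one pre-tokenized sentence per line.

```python
import subprocess
import sys
from typing import List, Sequence

def sgmlize(sentences: Sequence[str]) -> str:
    """Wrap each pre-tokenized sentence in <s> ... </s>, one per line.

    Assumption: this mirrors the semi-SGML input format of the Charniak parser.
    """
    return "".join(f"<s> {s.strip()} </s>\n" for s in sentences)

def parse_via_subshell(command: List[str], sentences: Sequence[str]) -> str:
    """Hypothetical helper: run `command` as a child process, feed it the
    SGML-ized sentences on stdin and return whatever it prints on stdout."""
    result = subprocess.run(
        command,
        input=sgmlize(sentences),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the child exits non-zero
    )
    return result.stdout

if __name__ == "__main__":
    # Stand-in "parser": a Python one-liner that echoes its input back, used
    # only so this sketch runs without the real parser installed.
    echo = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
    print(parse_via_subshell(echo, ["The dog barks ."]), end="")
    # prints: <s> The dog barks . </s>
```

The trade-off is the one mentioned above: a subprocess treats the parser as a black box and needs no compiled bindings, but every call pays process start-up and text serialization costs that a SWIG binding would avoid.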
Contents
Parsing/ directory: Python libraries
  - ECData.py: partial Pythonic interface to the data directories used by
    the parser
  - ECParser.py: run and train the Charniak and Johnson reranking parser
    (training is supported only for the first-stage parser, not the reranker)
  - Evaluation.py: abstract evaluation class and tools
  - Trees.py: read trees and n-best lists
  - evalb.py: evaluate trees using evalb
  - sparseval.py: evaluate trees using sparseval (less complete)
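Extracting results from evalb output mostly means picking out its summary lines. The following is a minimal sketch, not the interface of evalb.py or Evaluation.py; `read_evalb_summary` is a hypothetical name. It assumes the summary lines look like `Bracketing FMeasure = 89.96` and are grouped under section markers such as `-- All --` and `-- len<=40 --`, and it keeps the scores of each section separate rather than relying on their order.

```python
import re
from typing import Dict, Optional

# Hypothetical sketch, not the evalb.py API. Assumed evalb summary layout:
#   -- All --
#   Bracketing Recall         =  89.86
#   Bracketing Precision      =  90.06
#   Bracketing FMeasure       =  89.96
SECTION_RE = re.compile(r"^\s*--\s+(.+?)\s+--\s*$")
FIELD_RE = re.compile(
    r"^\s*(Bracketing (?:Recall|Precision|FMeasure))\s*=\s*([-+]?\d+(?:\.\d+)?)"
)

def read_evalb_summary(text: str) -> Dict[str, Dict[str, float]]:
    """Map each summary section name (e.g. 'All') to its bracketing scores."""
    summary: Dict[str, Dict[str, float]] = {}
    section: Optional[str] = None
    for line in text.splitlines():
        m = SECTION_RE.match(line)
        if m:
            section = m.group(1)
            summary.setdefault(section, {})
            continue
        m = FIELD_RE.match(line)
        if m and section is not None:
            summary[section][m.group(1)] = float(m.group(2))
    return summary
```

You would call it on the full contents of an evalb output file and then look up, for example, `summary["All"]["Bracketing FMeasure"]`. Fields other than the three bracketing scores are ignored, and lines outside a section (such as the per-sentence table) are skipped.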
bin/ directory: various command-line tools of varying usefulness. Many of
these were originally just tests of ECParser and the other libraries.
  - aligntrees.py: align portions of test trees with full gold trees (used
    when you parse a file in contiguous slices and want to recover the gold
    parses for each slice)
  - crossingbracketsstats.py: print the crossing-bracket statistics from an
    evalb file
  - drawparallel.py: draw trees in parallel (for example, to view test and
    gold trees side by side). Requires NLTK-Lite.
  - evalbsummary.py: print a summary of the results in evalb output files
  - fetchfscores.py: convert evalb files into a simpler form that is easier
    to parse in R
  - graphlengths.py: make histograms of sentence lengths for various files
  - onebestselector: select the 1-best parse from an n-best list
  - sgmlize: convert raw text into semi-SGML format (the input format of the
    Charniak parser)
  - treeconverter: extract yields from trees
  - treeoneliner: remove insignificant whitespace from trees so that each
    tree is on one line (essentially the opposite of pretty-printing)
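The idea behind treeoneliner can be illustrated with a short stand-alone sketch (not the tool's actual code): scan the input while tracking bracket depth, and each time the depth returns to zero, emit the collected tree with every whitespace run collapsed to a single space. It assumes `(` and `)` only ever appear as tree brackets, as in Penn Treebank files; `one_line_trees` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator, List

WS_RE = re.compile(r"\s+")

def one_line_trees(lines: Iterable[str]) -> Iterator[str]:
    """Yield each bracketed tree found in `lines` as a single line.

    Hypothetical sketch; assumes '(' and ')' occur only as tree brackets.
    """
    depth = 0
    buf: List[str] = []
    for line in lines:
        for ch in line:
            if depth == 0 and ch != "(":
                if not ch.isspace():
                    raise ValueError(f"unexpected {ch!r} outside a tree")
                continue  # whitespace between trees is dropped
            buf.append(ch)
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    # Collapse newlines and indentation into single spaces.
                    yield WS_RE.sub(" ", "".join(buf))
                    buf = []
    if depth != 0:
        raise ValueError("input ended inside an unclosed tree")

pretty = """(S1 (S (NP (DT The)
                (NN dog))
            (VP (VBZ barks))))
(S1 (FRAG (NN Hi)))
"""
for tree in one_line_trees(pretty.splitlines(keepends=True)):
    print(tree)
# (S1 (S (NP (DT The) (NN dog)) (VP (VBZ barks))))
# (S1 (FRAG (NN Hi)))
```

Because it consumes any iterable of strings, an open file object can be passed directly. Tracking depth, rather than splitting on blank lines, means it works whether or not the pretty-printed trees are separated by blank lines.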
After those are installed, standard distutils should work fine:
shell> python setup.py install
Please email me if you encounter any problems. Note that Parsing.ECParser has some hard-coded defaults for the locations of the parser binary and data directory, which you may wish to change (this is optional, since you can also pass these locations as parameters).