Earleyx parser

The code is now available on GitHub.

Description:
The Earleyx parser originated from Roger Levy's prefix parser but has evolved significantly. Earleyx can generate Viterbi parses and estimate rule probabilities (Expectation-Maximization and Variational Bayes). The parser also implements the scaling approach described in my TACL'13 paper, which speeds up parsing and makes it possible to parse long sentences (with restricted grammars).
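
The idea behind scaling can be illustrated with a small, self-contained sketch. This is not the Earleyx implementation: the class name ScalingSketch, its methods, and the use of one fixed scaling constant are assumptions made only for illustration. It shows why a plain product of many probabilities underflows, and how rescaling each factor keeps the running value representable, so that logarithms are needed only once at the end rather than at every step.

```java
import java.util.Arrays;

/** Hypothetical sketch contrasting log-space and scaled arithmetic; not Earleyx code. */
final class ScalingSketch {

    /** Plain product: underflows to 0.0 once the true value is too small for a double. */
    static double naiveProduct(double[] probs) {
        double product = 1.0;
        for (double p : probs) {
            product *= p;
        }
        return product;
    }

    /** Log-space alternative: one Math.log call per factor. */
    static double logProbability(double[] probs) {
        double sum = 0.0;
        for (double p : probs) {
            sum += Math.log(p);
        }
        return sum;
    }

    /** Scaled product: every factor is multiplied by the same constant (an assumption of this sketch). */
    static double scaledProduct(double[] probs, double scale) {
        double product = 1.0;
        for (double p : probs) {
            product *= p * scale;
        }
        return product;
    }

    /** Undo the scaling once at the end: log P = log(scaled) - n * log(scale). */
    static double unscaleToLog(double scaled, int n, double scale) {
        return Math.log(scaled) - n * Math.log(scale);
    }

    public static void main(String[] args) {
        double[] probs = new double[2000];
        Arrays.fill(probs, 0.01);   // 2000 factors of 0.01: true value is 1e-4000
        double scale = 100.0;       // chosen so that each scaled factor is close to 1

        System.out.println("naive product:      " + naiveProduct(probs)); // underflows to 0.0
        double scaled = scaledProduct(probs, scale);
        System.out.println("log P via scaling:  " + unscaleToLog(scaled, probs.length, scale));
        System.out.println("log P via log-space: " + logProbability(probs));
    }
}
```

The scaling constant matters: a poorly chosen one lets the scaled product underflow or overflow just like the naive one. See the TACL'13 paper for how the scaling factors are actually chosen in the parser.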

Features:
(a) Code was restructured and rewritten to follow the flow of Stolcke's algorithm (see the method parse() in parser.EarleyParser).
(b) Scaling approach to parse long sentences (see my TACL'13 paper). With scaling, no log operations are required (see the usage of util.Operator/ProbOperator/LogProbOperator).
(c) Rule probability estimation: inside-outside algorithm in the prefix parser context as described in Stolcke's paper. Expectation-Maximization and Variational Bayes are implemented (see induction.InsideOutside).
(d) Handling of dense and sparse grammars (arrays vs lists, see parser.EarleyParserDense/EarleyParserSparse).
(e) Compute closure matrices efficiently in a way that avoids inverting large matrices as described in Stolcke's paper (see base.ClosureMatrix).
(f) Handle grammars with high fan-out (see Util.TrieSurprisal).
(g) Map strings to integers for speed.
(h) Smoothing of rule probabilities for unknown words (see parser.SmoothLexicon).
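
As an illustration of feature (g), here is a minimal sketch of a bidirectional string/integer index. The class name SymbolIndex and its methods are hypothetical and are not taken from the Earleyx code base. The point is only that, once each grammar symbol has a small integer id, rules and chart entries can be stored and compared as ints rather than as strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bidirectional index between symbols and integer ids; not Earleyx code. */
final class SymbolIndex {
    private final Map<String, Integer> idOf = new HashMap<>();
    private final List<String> symbolOf = new ArrayList<>();

    /** Returns the id of the symbol, assigning the next free id the first time it is seen. */
    int index(String symbol) {
        Integer id = idOf.get(symbol);
        if (id == null) {
            id = symbolOf.size();
            idOf.put(symbol, id);
            symbolOf.add(symbol);
        }
        return id;
    }

    /** Maps an id back to its symbol, e.g. when printing a parse. */
    String symbol(int id) {
        return symbolOf.get(id);
    }

    /** Number of distinct symbols indexed so far. */
    int size() {
        return symbolOf.size();
    }

    public static void main(String[] args) {
        SymbolIndex symbols = new SymbolIndex();
        int s = symbols.index("S");
        int np = symbols.index("NP");
        // Ids are dense and start at 0, so they can index directly into arrays.
        System.out.println(s + " " + np + " " + symbols.symbol(np));
    }
}
```

Because the ids are dense, they can serve directly as array indices, which fits naturally with array-based storage for dense grammars.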

References:
(a) Andreas Stolcke. 1995. An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities. Computational Linguistics 21(2), 165-201.
(b) Roger Levy's prefix parser. http://idiom.ucsd.edu/~rlevy/prefixprobabilityparser.html
(c) Roger Levy. 2008. Expectation-based syntactic comprehension. Cognition 106(3), 1126-1177.

Citation:
@article{Luong-etal:tacl13:social,
  Title = {Parsing entire discourses as very long strings: {C}apturing topic continuity in grounded language learning},
  Author = {Luong, Minh-Thang and Frank, Michael C. and Johnson, Mark},
  Journal = {Transactions of the Association for Computational Linguistics},
  Volume = {1},
  Number = {3},
  Pages = {315--323},
  Year = {2013}
}