Applies an AbstractEval to a list of trees to pick the best tree
by F1 measure, then uses a second AbstractEval to tally
statistics for the best tree chosen. This is useful for
experiments to see how much the parser could improve if you were
able to correctly order the top N trees.
The comparisonEval will not accumulate any useful statistics, because
it is run against all top N trees for each parse. The countingEval
is the meaningful AbstractEval, because it is tallied only once per parse.
One example of this is the pcfgTopK eval, which looks for the best
LP/LR of constituents in the top K trees.
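The two-evaluator pattern can be illustrated with a small, self-contained sketch. This is not the CoreNLP implementation. It assumes that each tree has already been reduced to a set of labeled constituent spans such as "NP:0-2". All names here (BestOfTopKSketch, CountingEval, f1, evaluateBest) are hypothetical. A stateless F1 function plays the role of comparisonEval, so the statistics-pollution issue described above does not arise. CountingEval plays the role of countingEval and is updated only once per sentence, with the chosen tree.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: trees are reduced to sets of labeled spans like "NP:0-2". */
public class BestOfTopKSketch {

    /** Hypothetical counting evaluator: accumulates LP/LR counts once per sentence. */
    static class CountingEval {
        int matched = 0, guessed = 0, gold = 0;

        void evaluate(Set<String> guess, Set<String> goldSet) {
            matched += intersectionSize(guess, goldSet);
            guessed += guess.size();
            gold += goldSet.size();
        }

        double precision() { return guessed == 0 ? 0.0 : (double) matched / guessed; }
        double recall() { return gold == 0 ? 0.0 : (double) matched / gold; }
    }

    static int intersectionSize(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    /** Stateless F1 of one candidate against gold; used only to compare candidates. */
    static double f1(Set<String> guess, Set<String> goldSet) {
        int m = intersectionSize(guess, goldSet);
        if (m == 0) return 0.0;
        double p = (double) m / guess.size();
        double r = (double) m / goldSet.size();
        return 2 * p * r / (p + r);
    }

    /** Picks the highest-F1 candidate from the top K, then tallies only that one. */
    static Set<String> evaluateBest(List<Set<String>> topK, Set<String> goldSet,
                                    CountingEval counting) {
        Set<String> best = null;
        double bestF1 = -1.0;
        for (Set<String> candidate : topK) {
            double score = f1(candidate, goldSet);
            if (score > bestF1) { // strict '>': ties keep the higher-ranked tree
                bestF1 = score;
                best = candidate;
            }
        }
        if (best != null) counting.evaluate(best, goldSet);
        return best;
    }

    public static void main(String[] args) {
        Set<String> gold = Set.of("S:0-3", "NP:0-1", "VP:1-3");
        List<Set<String>> topK = List.of(
            Set.of("S:0-3", "NP:0-2"),           // parser's 1-best: F1 = 0.4
            Set.of("S:0-3", "NP:0-1", "VP:1-3")  // 2nd-ranked: exact match
        );
        CountingEval counting = new CountingEval();
        evaluateBest(topK, gold, counting);
        // Oracle reordering recovers the exact match, so LP and LR are both 1.0
        System.out.println(counting.precision() + " " + counting.recall());
    }
}
```

In this example the parser ranked a worse tree first. Choosing among the top K by F1 shows the upper bound that a perfect reranker could reach, which is the kind of experiment the class is intended for.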