JavaNLP meeting notes for 11/21/02

Today, for the first time (this fall if not ever), we have 100% of the JavaNLP packages documented, at least minimally. Thanks to everyone for your contribution to this important milestone! This doesn't mean we're done documenting, but it does mean we're making clear progress improving the quality and usefulness of the repository. Let's keep up the great work!

Next week is Thanksgiving, so there will be no JavaNLP meeting. Our next meeting (final meeting of the quarter?!) will be December 5th. Progress emails will be due Tuesday, December 3rd (i.e. no need to send reports for next Tuesday).

Summary of this week's progress:

* cvsweb is now up and running (nlp.stanford.edu/cvsweb)
  - it's password protected; see me if you didn't get the login/passwd at the meeting today (or talk to someone who was there)
  - particularly useful for viewing/comparing old versions of files in CVS
* The overhaul of the Document classes is now complete and the process API is now using Documents instead of collections.
  - For most purposes, use dbm.BasicDocument or make your own subclass (see the javadocs).
  - Processors (in the process package) take in one doc and spit out another (new) doc
  - implementers use doc.blankDocument() to make a new document with the same meta-data as the old one
  - Documents are normally lists of Words, but don't have to be (can be sentences, trees, etc).
  - Make sure your process classes state what they expect to be in the input document
* internal classifier API is coming along
  - now have implementations of logistic regression and perceptron
  - waiting on external classification API and serialization/saving mechanism for classifiers
* HMM IE code can now train and test multi-field HMMs
* There's now an implementation of Fisher's exact test (thanks Chris!) in the math package

Tasks for next week(s):

Dan:
- external classification API (in classify package)
- make sure Datum/DataCollection is adequate (talk to js/sep)
- external wrappers for each internal classifier
- standard load/save mechanism for trained classifiers
- NaiveBayesClassifier implementation

Kristina:
- move classification.internal package to classify.internal package
- make sure permissions don't get messed up
- move current classify stuff to classify.old
- SVM classifier implementation

Sep:
- fix Stemmer to work with new process/document API
- fix up PTBTokenizer to work with nbsp chars
- make sure you coordinate with Tim Grow, who wrote the PTB tokenizer

Joris:
- integrate pnp classifier into pcfg ie code
- continue cleaning up / improving pcfg ie code

Huy:
- HMM experiments

Cindi:
- check in fix to treebank tokenizer
- learn about process / document API in anticipation of doing framenet stuff

Chris:
- make Label/Word immutable (no set methods)

Roger:
- POSTagMapper to convert between BNC and PTB tags

Joseph:
- set up Tomcat and Ant
- put up web page of JavaNLP member contact info
- HMM experiments

Thanks, see you in two weeks!
js
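
Appendix (illustrative only): for anyone who hasn't used the new process/document API yet, the pattern in the notes (a processor takes one document in and returns a new one, built from a blank copy that keeps the meta-data) might look roughly like the sketch below. Everything named here is a made-up stand-in: SketchDocument, SketchProcessor, LowerCaseProcessor and the title field are assumptions for illustration, not the actual JavaNLP classes or signatures, and the sketch uses generics for brevity. See the javadocs for the real thing.

```java
import java.util.ArrayList;
import java.util.Locale;

// Hypothetical stand-in for a document: a list of elements plus meta-data.
// NOT the real dbm.BasicDocument; the "title" field is an assumed example.
class SketchDocument<T> extends ArrayList<T> {
    private final String title;

    SketchDocument(String title) {
        this.title = title;
    }

    String title() {
        return title;
    }

    // Plays the role the notes give to doc.blankDocument():
    // same meta-data as this document, but no contents.
    <U> SketchDocument<U> blankDocument() {
        return new SketchDocument<>(title);
    }
}

// Hypothetical processor contract: one document in, a new document out.
interface SketchProcessor<I, O> {
    SketchDocument<O> process(SketchDocument<I> in);
}

// Expects: an input document whose elements are word strings.
// Produces: a new document with the same meta-data and lowercased words.
class LowerCaseProcessor implements SketchProcessor<String, String> {
    @Override
    public SketchDocument<String> process(SketchDocument<String> in) {
        SketchDocument<String> out = in.blankDocument();
        for (String word : in) {
            out.add(word.toLowerCase(Locale.ROOT));
        }
        return out; // the input document is left untouched
    }
}

class ProcessorSketch {
    public static void main(String[] args) {
        SketchDocument<String> doc = new SketchDocument<>("notes");
        doc.add("JavaNLP");
        doc.add("Meeting");

        SketchDocument<String> result = new LowerCaseProcessor().process(doc);
        System.out.println(result);         // lowercased copy
        System.out.println(result.title()); // meta-data carried over
        System.out.println(doc);            // original unchanged
    }
}
```

Note the comment on LowerCaseProcessor stating what it expects in the input document, per the last point in the Document bullet list. Returning a fresh document instead of mutating the input keeps processors safe to chain.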