JavaNLP meeting notes for 10/24/2002

We had a great meeting today--efficient and productive. Thanks to everyone who came (and especially to those of you who got your tasks from last week done! ;)). Here's a summary of what we agreed to:

- Our first project focus is the classify package--Dan and Kristina will finalize the API and start implementing it by next week.
  - It will consist of an "inside API" for making different classification engines interchangeable, and an "outside API" for classifying Documents, Datums, and the like.
  - The "Classifier" class will only be responsible for "classification", not training or testing per se. Other classes will take in training data and spit out a Classifier which can be serialized for later use.

- EVERYONE (!!) will make javadocs (particularly package docs) the top priority and get them fixed up ASAP. Specifically, the following people agreed to handle the following packages:
  - Chris: annotation, ie.desc, ie.merge, ie.test, io, tagger
  - Sep: classify, cluster, database, linalg, math, process, wsd
  - Dan: fsm, lexgram, optimization, parser.lexparser, stats, util
  - Kristina: maxent.iis, mt

To make a package comment, create a file called "package.html" in the dir with the java files (e.g. src/edu/stanford/nlp/annotation) and do "cvs add package.html" to put it in the repository. The file should start and end with opening and closing HTML tags, respectively, and the first sentence will be what shows up on the overview page. The goal is to provide (1) a high-level overview of what this package does and who should care, and (2) some specific pointers to which classes to look at and use for general tasks. Look at existing package comments or email me if you're confused about what should go there.
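
To make the classify split above a bit more concrete, here is a rough sketch of one way it could look. Every name in it (ClassifySketch, Classifier.classOf, Datum, CountClassifier, train, roundTrip) is made up for illustration and is not the API Dan and Kristina will settle on; the counting "engine" is just a toy stand-in, and the code uses current Java syntax. The point is only the shape: the Classifier classifies and nothing else, a separate training method takes data and hands back a Classifier, and the result survives serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClassifySketch {

    /** Hypothetical interface: a Classifier only classifies; it never trains or tests. */
    public interface Classifier extends Serializable {
        String classOf(List<String> features);
    }

    /** Hypothetical labeled training example. */
    public static class Datum {
        final String label;
        final List<String> features;

        public Datum(String label, String... features) {
            this.label = label;
            this.features = Arrays.asList(features);
        }
    }

    /** Toy engine (an assumption, not a real model): sums feature/label co-occurrence counts. */
    static class CountClassifier implements Classifier {
        private static final long serialVersionUID = 1L;
        private final HashMap<String, HashMap<String, Integer>> counts; // feature -> label -> count
        private final String defaultLabel;

        CountClassifier(HashMap<String, HashMap<String, Integer>> counts, String defaultLabel) {
            this.counts = counts;
            this.defaultLabel = defaultLabel;
        }

        @Override
        public String classOf(List<String> features) {
            Map<String, Integer> scores = new HashMap<>();
            for (String feature : features) {
                Map<String, Integer> byLabel = counts.get(feature);
                if (byLabel == null) continue; // unseen feature contributes nothing
                for (Map.Entry<String, Integer> e : byLabel.entrySet()) {
                    scores.merge(e.getKey(), e.getValue(), Integer::sum);
                }
            }
            String best = defaultLabel; // used when no feature was seen in training
            int bestScore = 0;
            for (Map.Entry<String, Integer> e : scores.entrySet()) {
                if (e.getValue() > bestScore) {
                    best = e.getKey();
                    bestScore = e.getValue();
                }
            }
            return best;
        }
    }

    /** Training lives outside the Classifier: data in, Classifier out. */
    public static Classifier train(List<Datum> data) {
        HashMap<String, HashMap<String, Integer>> counts = new HashMap<>();
        for (Datum d : data) {
            for (String feature : d.features) {
                counts.computeIfAbsent(feature, k -> new HashMap<>())
                      .merge(d.label, 1, Integer::sum);
            }
        }
        String defaultLabel = data.isEmpty() ? null : data.get(0).label;
        return new CountClassifier(counts, defaultLabel);
    }

    /** Serializes a Classifier to bytes and reads it back, as "save for later use" would. */
    public static Classifier roundTrip(Classifier classifier) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(classifier);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Classifier) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Classifier trained = train(Arrays.asList(
            new Datum("sports", "goal", "match"),
            new Datum("sports", "goal", "team"),
            new Datum("politics", "vote", "election")));
        Classifier restored = roundTrip(trained);
        System.out.println(restored.classOf(Arrays.asList("goal", "vote")));
    }
}
```

Swapping in a different engine would then mean writing another class that implements the same interface, with callers left alone.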

Here are the specific tasks people agreed to for this week (in addition to javadocs!!):

Dan:
- classify API and initial implementation
- connect javanlp web page into nlp web site

Sep:
- make documents have a single "meta data" field
  - different subclasses put different fields in there
  - easy to say "gimme a new Doc with same meta data as this old doc"
- Document should have a method for copying metadata
- hook this up with processor stuff so it gives you a new doc instead of messing with what you pass in

Kristina:
- move maxent.tagger package to tagger.maxent
- classify API and initial implementation
- look at unifying general tagger API with tagger.maxent implementation

Joris:
- help get Tim's code into cvs and looking good / usable

Huy:
- test IE structure code to make sure it does something sensible
- more on structure learning
- look at multi-level HMMs

Joseph:
- fix pnp permissions [done]
- start char-based IE using pnp
- discriminative training for IE HMMs

Roger:
- fix project on tree manipulation and start working on it

Chris:
- fixes in trees API (not quite sure what you said specifically :-[)

And to reiterate--everyone do javadocs first! It won't take long, and it will make a big difference (i.e. people might actually consider using the code you spent so long writing! :)).

That's all, see you next week, same place and time, and as always, email early and often with questions or problems.

Thanks,
js