javaNLP meeting notes 6/21

  /------------\
-=| Attendance |=-
  \------------/

- Huy
- Jenny
- Teg
- Bill
- Kristina
- Mona (Welcome!)

  /---------------\
-=| Announcements |=-
  \---------------/

- Jenny is the new javaNLP czar [bow down before me]
  - However, I will be gone from this Thursday until July 8, so Chris will be the substitute czar.
- New time: Tuesdays 12:15-1:15
  - EVERY week, not every other week
  - Same location, I guess, though I think someone has to book the room.
- Mona is joining the group
  - Welcome Mona!
- Galen has been volunteered to be the food person for the summer.
  - If you have a problem with this, Galen, please speak up sooner rather than later, and don't hate us for volunteering you in your absence.
  - Because Galen is away and we aren't quite sure when he's returning, Bill has agreed to bring food next week.
- Roger claims to know the solution to the permissions issues when adding packages to CVS, so hopefully we can get the specifics on that soon.

  /------------------\
-=| Meeting Requests |=-
  \------------------/

- Kristina would like to discuss the classify package, since its current guardian will soon be leaving us, and because we should discuss how to build classifiers that take features with arbitrary values. It was noted that Jenny is somewhat familiar with classify and could possibly take over, as could Kristina.
- Teg commented that during training the parser currently loads all the trees into memory and then works on them, instead of processing one tree at a time. This makes it require much more memory and should be fixed.
- JUnit tutorial by Teg
- javacc tutorial by Galen (I must confess that I thought this had already happened and that I missed it, so is this something that should be scheduled?)
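A rough sketch of the one-tree-at-a-time training Teg suggested, in plain Java with no javaNLP classes. It assumes (hypothetically) that trees are stored one per line as bracketed strings; TreeStream and forEachTree are made-up names, not existing parser APIs. The point is that the training loop only ever holds the current tree, rather than the whole treebank.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: streams bracketed trees (assumed one per line) from a
 * Reader so a training loop keeps only the current tree in memory.
 * Single-pass: every iterator shares the same underlying reader.
 */
public class TreeStream implements Iterable<String> {
    private final BufferedReader in;

    public TreeStream(Reader reader) {
        this.in = new BufferedReader(reader);
    }

    @Override
    public Iterator<String> iterator() {
        return new Iterator<String>() {
            private String next = advance();

            // Skip blank lines; return the next tree, or null at end of input.
            private String advance() {
                try {
                    String line;
                    while ((line = in.readLine()) != null) {
                        if (!line.trim().isEmpty()) {
                            return line.trim();
                        }
                    }
                    return null;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return next != null;
            }

            @Override
            public String next() {
                if (next == null) {
                    throw new NoSuchElementException();
                }
                String current = next;
                next = advance();
                return current;
            }
        };
    }

    /** Hands each tree to the (hypothetical) training step in turn; returns the count. */
    public static int forEachTree(Reader reader, Consumer<String> trainStep) {
        int count = 0;
        for (String tree : new TreeStream(reader)) {
            trainStep.accept(tree);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String data = "(S (NP I) (VP ran))\n\n(S (NP you) (VP sat))\n";
        int n = forEachTree(new StringReader(data), t -> System.out.println("training on " + t));
        System.out.println(n + " trees");
    }
}
```

Real treebank files usually spread a tree over several lines, so an actual version would need a reader that tracks bracket depth; the iterator structure stays the same.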
  /-----------\
-=| Next Week |=-
  \-----------/

- JFlex tutorial by Roger
- Discussion about classify led by Kristina

  /-----------------\
-=| Tasks Completed |=-
  \-----------------/

- Kristina:
  - made a script for distributing the tagger
  - deleted junk from JavaNLP MaxentTagger main()
  - cleaned up the contents of the package.html file
  - added a line of Javadoc for each public class
- Roger:
  - did Chinese Treebank stuff
- Teg:
  - deleted /u/nlp/lib/*.jar
  - the text file format for the parser seems to work
- Bill:
  - worked on code for identifying semantic roles; not yet checked in, but will be soon

  /----------------------------\
-=| Tasks That Need Completion |=-
  \----------------------------/

- Teg:
  - Improve the test suite for tokenization. Have it check that one gets sentences of the same length from different data formats. Correct any errors with newline-separated input, etc. (I think there may still be some.)
- Kristina:
  - document which other maxent classifiers exist and how to invoke them (this may be done; I'm not sure)
  - unify interfaces with the classify stuff
- Jenny:
  - classifier that takes a Counter, not just a set, as feature input
- Dan:
  - Classify package:
    - document and improve the classify package
    - have it read/write files (in SVMlight format?)
    - merge the various logistic classifier factories into one class
- Roger/Galen:
  - Tregex:
    - documentation of tregex patterns
    - implement substitute?
    - be able to use "basicCategory" easily
- General:
  - All public classes should at least have a (useful!) one-line Javadoc class comment.
  - I think a good clean-up focus (for Jenny/Teg/Kristina) could be to provide a useful, general facility for common plain-text/web-document preprocessing (roughly what the parser provides, dragged out of the main method of the lexicalized parser) so that it can be used in all of NER, lexparser, and the POS tagger.
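For Jenny's Counter-valued features and Dan's SVMlight read/write item, here is a sketch of what the file round trip could look like. It uses a plain Map<String, Double> as a stand-in for a Counter (no javaNLP Counter API is assumed), and SvmLightFormat, toLine, featuresOf, and labelOf are hypothetical names. Each line is the label followed by index:value pairs in increasing index order, which is what SVMlight's input format expects; feature names get 1-based indices the first time they are seen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: converts real-valued feature maps (a stand-in for a
 * Counter) to and from SVMlight-style lines "label idx:value idx:value ...".
 */
public class SvmLightFormat {
    // Feature name -> 1-based index (SVMlight feature numbers start at 1).
    private final Map<String, Integer> featureIndex = new HashMap<>();
    private final List<String> featureNames = new ArrayList<>();

    private int indexOf(String feature) {
        Integer idx = featureIndex.get(feature);
        if (idx == null) {
            featureNames.add(feature);
            idx = featureNames.size();
            featureIndex.put(feature, idx);
        }
        return idx;
    }

    /** Writes one example; a TreeMap keeps the indices in increasing order. */
    public String toLine(String label, Map<String, Double> features) {
        TreeMap<Integer, Double> byIndex = new TreeMap<>();
        for (Map.Entry<String, Double> e : features.entrySet()) {
            byIndex.put(indexOf(e.getKey()), e.getValue());
        }
        StringBuilder sb = new StringBuilder(label);
        for (Map.Entry<Integer, Double> e : byIndex.entrySet()) {
            sb.append(' ').append(e.getKey()).append(':').append(e.getValue());
        }
        return sb.toString();
    }

    /**
     * Reads a line back into name -> value, ignoring a trailing "# comment".
     * Only indices assigned by this instance can be mapped back to names.
     */
    public Map<String, Double> featuresOf(String line) {
        int hash = line.indexOf('#');
        String body = (hash >= 0 ? line.substring(0, hash) : line).trim();
        String[] parts = body.split("\\s+");
        Map<String, Double> features = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) { // parts[0] is the label
            int colon = parts[i].indexOf(':');
            int idx = Integer.parseInt(parts[i].substring(0, colon));
            double value = Double.parseDouble(parts[i].substring(colon + 1));
            features.put(featureNames.get(idx - 1), value);
        }
        return features;
    }

    /** The label is the first whitespace-separated token. */
    public static String labelOf(String line) {
        return line.trim().split("\\s+")[0];
    }

    public static void main(String[] args) {
        SvmLightFormat fmt = new SvmLightFormat();
        Map<String, Double> features = new LinkedHashMap<>();
        features.put("word=dog", 1.0);
        features.put("length", 3.0);
        String line = fmt.toLine("1", features);
        System.out.println(line);                 // 1 1:1.0 2:3.0
        System.out.println(fmt.featuresOf(line)); // {word=dog=1.0, length=3.0}
    }
}
```

Keeping the name-to-index map inside one object means the same instance has to be used for writing training data and reading it back (or the index would need to be saved alongside the file).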