Stanford Chinese Bilingual NER Software Instructions


This page gives instructions for running the bilingual NER experiments described in the papers listed on the Chinese NER page.
To carry out the up-training experiments, use the bilingual NER models described below to label unannotated bitext, then include the tagged output as additional training data when retraining the CRF NER tagger.

(NAACL 2013) Bilingual NER with Integer Linear Programming

  1. Download the latest Stanford CoreNLP packages, and set the environment variable $JAVANLP_HOME to point to the javanlp directory.
  2. Obtain the OntoNotes 4.0 Chinese-English portion of the parallel NER data (LDC2011T03), and separate it into train, dev, and test portions based on the descriptions in Section 4 of the paper.
  3. Run the Berkeley Aligner to produce word alignments for the OntoNotes bitext; make sure to set -writePosteriors to true to obtain alignment posterior probabilities. Store the resulting alignment file for the test portion as test.align.
  4. Prepare the train, dev, and test data in CoNLL format, where each line contains a word and its NER tag separated by a TAB, e.g.,
    A O
    European I-LOC
    official O
    in O
    the O
    Egyptian I-LOC
    capital I-LOC
  5. Train a baseline English CRF NER model using property file en.prop, and a Chinese CRF NER model using property file cn.prop. Name the resulting English model en.ser.gz and the Chinese model cn.ser.gz.
  6. Download lp_solve_5.5, and set the environment variable $LP_HOME to point to the lp_solve_5.5 directory.
  7. Generate the zero-order model posteriors from the baseline CRFs using the following commands:
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -testFile en.test -loadClassifier en.ser.gz -printProbs > en.test.probs
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -testFile cn.test -loadClassifier cn.ser.gz -printProbs > cn.test.probs
  8. Run the following script:
    export PYTHONPATH=$PYTHONPATH:$LP_HOME/extra/Python/build/lib.linux-x86_64-2.6/
    python cn.test.probs en.test.probs test.align autostat.penalty > cn.test.out 2> en.test.out 
  9. Evaluate cn.test.out using conlleval
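
The CoNLL format required in step 4 can be produced with a short conversion script. Below is a minimal sketch, assuming the tokenized sentences and their NER tags are already available as (word, tag) pairs; the function name to_conll is purely illustrative:

```python
def to_conll(sentences):
    """Render sentences in CoNLL format: one 'word<TAB>tag' pair per line,
    with a blank line separating consecutive sentences."""
    blocks = []
    for sent in sentences:
        blocks.append("\n".join(f"{word}\t{tag}" for word, tag in sent))
    return "\n\n".join(blocks) + "\n"

# Example matching the snippet in step 4:
sent = [("A", "O"), ("European", "I-LOC"), ("official", "O"),
        ("in", "O"), ("the", "O"), ("Egyptian", "I-LOC"), ("capital", "I-LOC")]
print(to_conll([sent]))
```

The same format is expected for all three splits (train, dev, and test).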

(AAAI 2013) Bilingual NER using Gibbs Sampling

  1. Follow steps 1-5 of (NAACL 2013)
  2. Download and install the javanlp/more package.
  3. Perform Gibbs sampling based decoding using the following command:
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/*:$JAVANLP_HOME/projects/more/classes -prop gibbs.prop

    (NOTE: for BIO tagging, use autostat.penalty)
  4. Evaluate cn.test.out using conlleval
  5. The document-level global consistency model is applicable only when the test set contains aligned documents (rather than individual sentences). To use it, uncomment the last 4 lines (below the comment) of gibbs.prop.
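
The word alignments produced in step 3 of (NAACL 2013) and reused by the bilingual decoders here are commonly stored one sentence pair per line in Pharaoh-style "i-j" format; the exact file written by the Berkeley Aligner with -writePosteriors may additionally carry posterior scores, so treat the following as a sketch under the plain-pairs assumption:

```python
def parse_alignment_line(line):
    """Parse one Pharaoh-style alignment line, e.g. '0-0 1-2 3-1',
    into a list of (source_index, target_index) pairs."""
    pairs = []
    for token in line.split():
        i, j = token.split("-")
        pairs.append((int(i), int(j)))
    return pairs

# One line per sentence pair in test.align (assumed layout):
print(parse_alignment_line("0-0 1-2 3-1"))  # [(0, 0), (1, 2), (3, 1)]
```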

(ACL 2013) Joint Bilingual NER and Word Alignment using Dual Decomposition

  1. Follow the same steps 1-4 as (AAAI 2013), except use dualdecomp.prop instead of gibbs.prop.

(TACL 2013) Cross-lingual Expectation Projection and Regularization

  1. For the minimally-supervised evaluation, follow steps 1-2 of (AAAI 2013) (you can skip training the baseline Chinese CRF model), and run the following commands:
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/*:$JAVANLP_HOME/projects/more/classes -prop cl-proj-unsup.prop
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -testFile cn.test -loadClassifier cn.bilingual.ser.gz > cn.test.out
  2. For the semi-supervised experiment (same evaluation setting as the up-training case in (AAAI 2013)), follow steps 1-2 of (AAAI 2013), and run the following commands:
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/*:$JAVANLP_HOME/projects/more/classes -prop cl-proj-semisup.prop
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -testFile cn.test -loadClassifier cn.bilingual.ser.gz > cn.test.out
  3. Evaluate cn.test.out using conlleval
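
conlleval, used for evaluation throughout, scores entity-level precision, recall, and F1 by exact span match. The sketch below approximates that scoring for IOB-tagged output such as cn.test.out; it is illustrative only and not a substitute for the official conlleval script:

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from an IOB tag sequence.
    A span continues while the tag keeps the same type and is not B-."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the final span
        prefix, _, t = tag.partition("-")
        if start is not None and (prefix in ("O", "B") or t != etype):
            spans.append((etype, start, i))
            start, etype = None, None
        if prefix in ("B", "I") and start is None:
            start, etype = i, t
    return spans

def entity_f1(gold_tags, pred_tags):
    """Entity-level F1 by exact span match, as conlleval reports."""
    gold = set(extract_entities(gold_tags))
    pred = set(extract_entities(pred_tags))
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

For reported numbers, run the official conlleval script on cn.test.out; this sketch only shows what the metric measures.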