Stanford Chinese Bilingual NER Software Instructions


This page gives instructions for running the bilingual NER experiments described in the papers listed on the Chinese NER page.
To carry out the up-training experiments, use the bilingual NER models described below to label unannotated bitext, then include the tagged output as additional training data when retraining the CRF NER tagger.

(NAACL 2013) Bilingual NER with Integer Linear Programming

  1. Download the latest Stanford CoreNLP packages, and set the environment variable $JAVANLP_HOME to point to the javanlp directory.
  2. Obtain the OntoNotes 4.0 Chinese-English portion of the parallel NER data (LDC2011T03), and separate it into train, dev, and test portions based on the descriptions in Section 4 of the paper.
  3. Run the Berkeley Aligner to produce word alignments for the OntoNotes bitext; make sure to set -writePosteriors to true to obtain alignment posterior probabilities. Store the resulting alignment file for the test portion as test.align.
  4. Prepare the train, dev, and test data in CoNLL format, where each line contains a word and its NER tag separated by a TAB, e.g.,
    A O
    European I-LOC
    official O
    in O
    the O
    Egyptian I-LOC
    capital I-LOC
  5. Train a baseline English CRF NER model using property file en.prop, and a Chinese CRF NER model using property file cn.prop. Name the resulting English model en.ser.gz and the Chinese model cn.ser.gz.
  6. Download lp_solve_5.5, and set the environment variable $LP_HOME to point to the lp_solve_5.5 directory.
  7. Generate the zero-order model posteriors from the baseline CRFs using the following commands:
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -testFile en.test -loadClassifier en.ser.gz -printProbs > en.test.probs
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -testFile cn.test -loadClassifier cn.ser.gz -printProbs > cn.test.probs
  8. Run the following script:
    export PYTHONPATH=$PYTHONPATH:$LP_HOME/extra/Python/build/lib.linux-x86_64-2.6/
    python cn.test.probs en.test.probs test.align autostat.penalty > cn.test.out 2> en.test.out 
  9. Evaluate cn.test.out using conlleval
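
The CoNLL format required in step 4 can be produced with a short conversion script. Below is a minimal sketch, assuming the tokenized sentences and their NER tags are already available as (word, tag) pairs; the function name to_conll is purely illustrative:

```python
def to_conll(sentences):
    """Render sentences in CoNLL format: one 'word<TAB>tag' pair per line,
    with a blank line separating consecutive sentences."""
    blocks = []
    for sent in sentences:
        blocks.append("\n".join(f"{word}\t{tag}" for word, tag in sent))
    return "\n\n".join(blocks) + "\n"

# Example matching the snippet in step 4:
sent = [("A", "O"), ("European", "I-LOC"), ("official", "O"),
        ("in", "O"), ("the", "O"), ("Egyptian", "I-LOC"), ("capital", "I-LOC")]
print(to_conll([sent]))
```

The same format is expected for all three splits (train, dev, and test).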

(AAAI 2013) Bilingual NER using Gibbs Sampling

  1. Follow steps 1-5 of (NAACL 2013)
  2. Download and install the javanlp/more package.
  3. Perform Gibbs sampling based decoding using the following command:
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/*:$JAVANLP_HOME/projects/more/classes -prop gibbs.prop

    (NOTE: for BIO tagging, use autostat.penalty)
  4. Evaluate cn.test.out using conlleval
  5. The document-level global consistency model is applicable only when the test set contains aligned documents (rather than individual sentences). To use it, uncomment the last 4 lines (below the comment) of gibbs.prop.
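
The word alignments produced in step 3 of (NAACL 2013) and reused by the bilingual decoders here are commonly stored one sentence pair per line in Pharaoh-style "i-j" format; the exact file written by the Berkeley Aligner with -writePosteriors may additionally carry posterior scores, so treat the following as a sketch under the plain-pairs assumption:

```python
def parse_alignment_line(line):
    """Parse one Pharaoh-style alignment line, e.g. '0-0 1-2 3-1',
    into a list of (source_index, target_index) pairs."""
    pairs = []
    for token in line.split():
        i, j = token.split("-")
        pairs.append((int(i), int(j)))
    return pairs

# One line per sentence pair in test.align (assumed layout):
print(parse_alignment_line("0-0 1-2 3-1"))  # [(0, 0), (1, 2), (3, 1)]
```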

(ACL 2013) Joint Bilingual NER and Word Alignment using Dual Decomposition

  1. Follow the same steps 1-4 as (AAAI 2013), except use dualdecomp.prop instead of gibbs.prop.

(TACL 2013) Cross-lingual Expectation Projection and Regularization

  1. For the minimally-supervised evaluation, follow steps 1-2 of (AAAI 2013) (you can skip training the baseline Chinese CRF model), and run the following commands:
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/*:$JAVANLP_HOME/projects/more/classes -prop cl-proj-unsup.prop
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -testFile cn.test -loadClassifier cn.bilingual.ser.gz > cn.test.out
  2. For the semi-supervised experiment (same evaluation setting as the up-training case in (AAAI 2013)), follow steps 1-2 of (AAAI 2013), and run the following commands:
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/*:$JAVANLP_HOME/projects/more/classes -prop cl-proj-semisup.prop
    java -cp $JAVANLP_HOME/projects/core/classes:$JAVANLP_HOME/projects/core/lib/* edu.stanford.nlp.ie.crf.CRFClassifier -testFile cn.test -loadClassifier cn.bilingual.ser.gz > cn.test.out
  3. Evaluate cn.test.out using conlleval
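
conlleval, used for evaluation throughout, scores entity-level precision, recall, and F1 by exact span match. The sketch below approximates that scoring for IOB-tagged output such as cn.test.out; it is illustrative only and not a substitute for the official conlleval script:

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from an IOB tag sequence.
    A span continues while the tag keeps the same type and is not B-."""
    spans, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the final span
        prefix, _, t = tag.partition("-")
        if start is not None and (prefix in ("O", "B") or t != etype):
            spans.append((etype, start, i))
            start, etype = None, None
        if prefix in ("B", "I") and start is None:
            start, etype = i, t
    return spans

def entity_f1(gold_tags, pred_tags):
    """Entity-level F1 by exact span match, as conlleval reports."""
    gold = set(extract_entities(gold_tags))
    pred = set(extract_entities(pred_tags))
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

For reported numbers, run the official conlleval script on cn.test.out; this sketch only shows what the metric measures.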