Weight Learning Results

All numbers are accuracy / CWS (confidence-weighted score).
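For readers unfamiliar with the two metrics, the following is a minimal sketch of how they could be computed. It assumes that CWS here means the confidence-weighted score used in the PASCAL RTE challenge (rank the judgments by decreasing confidence, then average the running precision at each rank). The function names and the input layout are illustrative only and are not taken from the code that produced these results.

```python
def accuracy(correct):
    """Fraction of correct judgments; `correct` is a list of booleans."""
    return sum(correct) / len(correct)


def cws(confidences, correct):
    """Confidence-weighted score (assumed PASCAL RTE definition).

    Judgments are ranked by decreasing confidence; the score is the
    average, over all ranks i, of (number correct in the top i) / i.
    """
    ranked = sorted(zip(confidences, correct), key=lambda p: p[0], reverse=True)
    hits = 0
    total = 0.0
    for i, (_, ok) in enumerate(ranked, start=1):
        hits += bool(ok)
        total += hits / i
    return total / len(ranked)


if __name__ == "__main__":
    conf = [0.9, 0.8, 0.1]        # hypothetical confidences
    ok = [True, False, True]      # whether each judgment was correct
    print(round(accuracy(ok), 3), "/", round(cws(conf, ok), 3))
```

Under this definition CWS rewards a system for being correct on the pairs it is most confident about, which is why it can differ noticeably from accuracy on the same output.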

BASIC: Threshold was not optimized on the test data.
OPT: Threshold was optimized on the test data.
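The difference between BASIC and OPT can be illustrated with a short sketch. It assumes that each pair receives a real-valued score and is judged a positive when the score reaches a threshold, and that the BASIC threshold is tuned on the training data; the page only states that BASIC does not use the test data, so that second assumption is a guess. All names below are hypothetical.

```python
def accuracy_at(threshold, scores, gold):
    """Accuracy when predicting True for every score >= threshold."""
    preds = [s >= threshold for s in scores]
    return sum(p == g for p, g in zip(preds, gold)) / len(gold)


def best_threshold(scores, gold):
    """Threshold with the highest accuracy on the given data.

    Candidates are the observed scores plus one value above the
    maximum (which predicts False for everything).
    """
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    return max(candidates, key=lambda t: accuracy_at(t, scores, gold))


def evaluate(train_scores, train_gold, test_scores, test_gold):
    # BASIC (assumption): threshold chosen without looking at the test data.
    basic_t = best_threshold(train_scores, train_gold)
    # OPT: threshold chosen on the test data itself, i.e. an oracle upper bound.
    opt_t = best_threshold(test_scores, test_gold)
    return {
        "BASIC": accuracy_at(basic_t, test_scores, test_gold),
        "OPT": accuracy_at(opt_t, test_scores, test_gold),
    }
```

Because OPT picks its threshold from the same candidate set using the test labels, its accuracy is an optimistic upper bound for a given set of scores and should be read as such rather than as a realistic estimate.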

The results were generated using the alignments in the files /u/nlp/rte/data/byformat/align/simple/*.xml

Experimental settings (table columns):

(1) Train on all other sources, test on this source. (OPT uses one threshold per source.)
(2) Put all sources together, do random 10-fold CV. (OPT uses one threshold per fold.)
(3) 15-fold CV within each source only. (OPT uses one threshold for the entire source, not per fold.)
(4) Train and test on each source only.

Source        (1) BASIC      (1) OPT        (2) BASIC      (2) OPT        (3) BASIC      (3) OPT        (4) BASIC      (4) OPT
ALL           0.582 / 0.616  0.648 / 0.676  0.59 / 0.598   0.642 / 0.669  --             --             --             --
ATM Dev       0.472 / 0.66   0.527 / 0.68   0.416 / 0.425  0.5 / 0.512    0.5 / 0.281    0.583 / 0.372  0.722 / 0.759  0.805 / 0.823
Brandeis Dev  0.54 / 0.617   0.675 / 0.722  0.594 / 0.626  0.675 / 0.668  0.459 / 0.483  0.594 / 0.449  0.81 / 0.906   0.864 / 0.93
Cycorp Dev    0.611 / 0.595  0.611 / 0.595  0.444 / 0.393  0.666 / 0.704  0.444 / 0.544  0.527 / 0.531  0.861 / 0.879  0.861 / 0.879
LCC-H Dev     0.6 / 0.578    0.657 / 0.712  0.657 / 0.76   0.6 / 0.666    0.4 / 0.434    0.571 / 0.618  0.885 / 0.916  0.885 / 0.916
LCC-M Dev     0.5 / 0.525    0.633 / 0.578  0.566 / 0.472  0.666 / 0.628  0.533 / 0.54   0.6 / 0.551    0.933 / 0.929  0.966 / 0.948
MIT Dev       0.5 / 0.532    0.7 / 0.69     0.466 / 0.597  0.433 / 0.58   0.5 / 0.384    0.6 / 0.641    0.933 / 0.925  0.966 / 0.961
PARC Dev      0.657 / 0.642  0.671 / 0.67   0.789 / 0.838  0.828 / 0.9    0.684 / 0.617  0.697 / 0.632  0.973 / 0.965  0.973 / 0.965
Stanford Dev  0.566 / 0.544  0.633 / 0.635  0.533 / 0.507  0.5 / 0.476    0.266 / 0.261  0.566 / 0.681  0.733 / 0.745  0.766 / 0.765
UTD-ICSI Dev  0.675 / 0.778  0.702 / 0.787  0.594 / 0.706  0.648 / 0.687  0.351 / 0.247  0.567 / 0.467  0.756 / 0.882  0.81 / 0.89
Pascal Dev1
Pascal Dev2
Pascal Test
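
The "train on all other sources, test on this source" setting in the table amounts to leave-one-source-out evaluation. A minimal sketch of how such splits could be generated is given below; the dictionary layout and the function name are hypothetical and are not taken from the original experiment code.

```python
def leave_one_source_out(examples_by_source):
    """Yield (held_out_source, train_examples, test_examples) triples.

    `examples_by_source` maps a source name (e.g. "PARC Dev") to the
    list of examples from that source. For each source, the training
    set is the concatenation of every other source's examples.
    """
    for held_out, test in examples_by_source.items():
        train = [
            ex
            for src, exs in examples_by_source.items()
            if src != held_out
            for ex in exs
        ]
        yield held_out, train, list(test)


if __name__ == "__main__":
    # Toy data: integers stand in for problem instances.
    data = {"A": [1, 2], "B": [3], "C": [4, 5]}
    for source, train, test in leave_one_source_out(data):
        print(source, "train:", train, "test:", test)
```

The within-source settings differ only in how the training pool is drawn: they restrict both training and test examples to a single source instead of holding one source out.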