Better Word Representations with Recursive Neural Networks for Morphology

Thanks for reading our paper and visiting this project page! If you have any questions, feel free to email us.

Dataset:

The Stanford Rare Word (RW) Similarity Dataset can now be downloaded here.

Morphologically-trained word vectors:

Based on Huang et al. (2012)'s embeddings (HSMN+csmRNN): [ embeddings (text) ] [ words (text) ] [ parameters (mat) ].

Based on Collobert et al. (2011)'s embeddings (CW+csmRNN): [ embeddings (text) ] [ words (text) ] [ parameters (mat) ].
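The two text files can be paired into a lookup table in a few lines. This is a minimal sketch, not a loader shipped with the release. It assumes a plain-text layout: the embeddings file holds one whitespace-separated vector per line, line i of that file belongs to the word on line i of the words file, and both files are UTF-8. The function name load_embeddings is hypothetical.

```python
def load_embeddings(embeddings_path, words_path):
    """Map each word to its vector (hypothetical helper).

    Assumes line i of the words file pairs with line i of the embeddings
    file, and that each embeddings line is whitespace-separated floats.
    """
    with open(words_path, encoding="utf-8") as f:
        words = [line.strip() for line in f if line.strip()]
    with open(embeddings_path, encoding="utf-8") as f:
        rows = [line.split() for line in f if line.strip()]
    if len(words) != len(rows):
        raise ValueError(
            f"{len(words)} words but {len(rows)} vectors; files are misaligned"
        )
    # Convert each row of strings into a list of floats.
    return {word: [float(x) for x in row] for word, row in zip(words, rows)}
```

Checking the length up front catches the most likely failure, a words file and an embeddings file that belong to different models.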

Note:

Each model file is in MATLAB format and contains the following variables, as described in the paper: W, v, b, Wm, bm, and mWe. It also contains a variable, tagMorphemes, that lists all morphemes in the form morpheme/TAG (e.g., un/PRE, fortunate/STM, ly/SUF); each morpheme corresponds to a column vector in mWe.
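The variables above can be used to build a word vector from its morphemes. This is a minimal sketch under stated assumptions, not the released code. It assumes a composition step of the form tanh(Wm [stem; affix] + bm). It also assumes the .mat file has already been loaded (for example with scipy.io.loadmat) and tagMorphemes converted to a Python list of strings aligned with the columns of mWe. The caller supplies the order in which affixes attach. It uses only Wm, bm and mWe and ignores the context-sensitive part of the model. All function names are hypothetical.

```python
import numpy as np


def morpheme_vector(mWe, tag_morphemes, morpheme):
    """Return the column of mWe for a tagged morpheme such as 'un/PRE'."""
    return mWe[:, tag_morphemes.index(morpheme)]


def compose(Wm, bm, stem_vec, affix_vec):
    """One composition step, assumed to be tanh(Wm @ [stem; affix] + bm)."""
    # bm may load from MATLAB as a d-by-1 column; flatten it so it adds
    # element-wise instead of broadcasting to a d-by-d matrix.
    bias = np.asarray(bm).ravel()
    return np.tanh(Wm @ np.concatenate([stem_vec, affix_vec]) + bias)


def word_vector(Wm, bm, mWe, tag_morphemes, stem, affixes):
    """Attach affixes to the stem one at a time, in the order given.

    For example, affixes=['un/PRE', 'ly/SUF'] builds ((un fortunate) ly).
    """
    vec = morpheme_vector(mWe, tag_morphemes, stem)
    for affix in affixes:
        affix_vec = morpheme_vector(mWe, tag_morphemes, affix)
        vec = compose(Wm, bm, vec, affix_vec)
    return vec
```

For large vocabularies, a dict from morpheme to column index is faster than repeated list.index lookups.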

Word similarity results (Spearman's rank correlation, ρ × 100):

Model          WS353   MC      RG      SCWS*   RW
HSMN+csmRNN    64.70   71.73   65.42   44.10   22.55
CW+csmRNN      58.49   60.84   61.19   49.31   32.06
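Scores like those above come from comparing model similarities with human ratings on word pairs. This is a minimal sketch assuming cosine similarity between word vectors and Spearman's rank correlation (average ranks for ties) scaled by 100. The function names are hypothetical. It simply skips pairs containing words with no vector, which is not necessarily how the reported numbers were computed.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def evaluate(vectors, pairs):
    """pairs: (word1, word2, human_score) triples; returns rho * 100."""
    model, human = [], []
    for w1, w2, score in pairs:
        if w1 in vectors and w2 in vectors:  # skip out-of-vocabulary pairs
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return 100 * spearman(model, human)
```

In practice scipy.stats.spearmanr gives the same correlation. The hand-written version just shows the tie handling explicitly.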

Citation: