edu.stanford.nlp.tagger.maxent
Class Extractor

java.lang.Object
  extended by edu.stanford.nlp.tagger.maxent.Extractor
All Implemented Interfaces:
Serializable
Direct Known Subclasses:
DictionaryExtractor, ExtractorDistsim, ExtractorDistsim.ExtractorDistsimConjunction

public class Extractor
extends Object
implements Serializable

This class serves as the base class for classes which extract relevant information from a history to give it to the features. Every feature has one or more associated extractors. GlobalHolder keeps all the extractors; two histories are considered equal if all extractors return equal values for them.

The main functionality of an Extractor is provided by the method extract, which takes a History as an argument. The Extractor looks at the history and takes out something important for the features, e.g. specific words and tags at specific positions, or some function of the History. The histories are effectively vectors of values, with each dimension being the output of some extractor.

When creating a new Extractor subclass, make sure to override the setGlobalHolder method if you need information from the tagger. The best policy is to declare any such data you take from the tagger as "transient", especially if it is a large object such as the dictionary.

New extractors are created in either ExtractorFrames or ExtractorFramesRare; those are the places to consider adding your new extractor.

Note that some extractors can be reused across multiple taggers, but many cannot. For example, any extractor that uses information from the tagger, such as its dictionary, cannot be reused. For the moment, some of the extractors in ExtractorFrames and ExtractorFramesRare are static; all of those are currently reusable, but if you change them in any way that makes them not reusable, make sure to change the way they are constructed as well.
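The idea that a history is a vector of extractor outputs, and that two histories are equal when every extractor agrees on them, can be illustrated with a small self-contained sketch. Nothing here is the library's own code: ToyHistory, HistoryVectors, and the "NA" boundary value are hypothetical stand-ins invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a history: the sentence, the tags assigned so far,
// and the index of the word currently being tagged.
final class ToyHistory {
    final String[] words;
    final String[] tags;
    final int current;

    ToyHistory(String[] words, String[] tags, int current) {
        this.words = words;
        this.tags = tags;
        this.current = current;
    }
}

final class HistoryVectors {
    // Two toy extractors: the current word, and the tag of the previous word.
    // "NA" is an assumed placeholder for positions before the sentence start.
    static List<Function<ToyHistory, String>> demoExtractors() {
        List<Function<ToyHistory, String>> extractors = new ArrayList<>();
        extractors.add(h -> h.words[h.current]);
        extractors.add(h -> h.current > 0 ? h.tags[h.current - 1] : "NA");
        return extractors;
    }

    // Apply every extractor in order; the resulting list is the history's "vector".
    static List<String> vectorOf(ToyHistory h, List<Function<ToyHistory, String>> extractors) {
        List<String> values = new ArrayList<>();
        for (Function<ToyHistory, String> extractor : extractors) {
            values.add(extractor.apply(h));
        }
        return values;
    }

    // Two histories count as equal when all extractors return equal values for them.
    static boolean sameHistory(ToyHistory a, ToyHistory b,
                               List<Function<ToyHistory, String>> extractors) {
        return vectorOf(a, extractors).equals(vectorOf(b, extractors));
    }
}
```

Under these toy extractors, two histories with different sentences still compare equal as long as the current word and the previous tag match, which is the sense of equality described above.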

Author:
Kristina Toutanova
See Also:
Serialized Form

Constructor Summary
  Extractor()
           
protected Extractor(int position, boolean isTag)
          This constructor creates an extractor which extracts either the tag or the word from position position in the history.
 
Method Summary
 boolean isDynamic()
           
 boolean isLocal()
           
 int leftContext()
           
 boolean precondition(String tag)
          This evaluates any precondition for a feature being applicable based on a certain tag.
 int rightContext()
           
protected  void setGlobalHolder(MaxentTagger tagger)
          Subclasses should override this method and keep only the data they want about the tagger.
 String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Extractor

public Extractor()

Extractor

protected Extractor(int position,
                    boolean isTag)
This constructor creates an extractor which extracts either the tag or the word from position position in the history.

Parameters:
position - The position of the thing to be extracted. This is relative to the current word. For example, position 0 will be the current word, -1 will be the word before, +1 will be the word after, etc.
isTag - If true this means that the POS tag is extracted from position, otherwise the word is extracted.
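As a rough illustration of how the two parameters combine, the sketch below resolves a relative position against a current index and returns either the word or the tag found there. PositionSketch, its extract signature, and the "NA" value returned outside the sentence are assumptions made for this example; the page does not describe how the library handles sentence boundaries.

```java
// Hypothetical helper, not part of the library: resolves a relative position
// against the current word and returns the word or the tag at that offset.
final class PositionSketch {
    // Assumed placeholder for offsets that fall outside the sentence.
    static final String BOUNDARY = "NA";

    static String extract(String[] words, String[] tags,
                          int current, int position, boolean isTag) {
        int index = current + position;   // position is relative to the current word
        if (index < 0 || index >= words.length) {
            return BOUNDARY;
        }
        return isTag ? tags[index] : words[index];
    }
}
```

With the current word at index 1 of "the dog barks", position 0 with isTag false selects "dog", position -1 selects "the", and position +1 with isTag true selects the tag of "barks".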
Method Detail

setGlobalHolder

protected void setGlobalHolder(MaxentTagger tagger)
Subclasses should override this method and keep only the data they want about the tagger. Note that such data should also be declared "transient" if it is already available in the tagger. This is because, when we save the tagger to disk, we do so by writing out objects, and there is no need to write the same object more than once. setGlobalHolder will be called both after construction when building a new tagger and when loading existing taggers from disk, so the same data will be available then as well.
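The effect of the "transient" advice can be seen in a minimal, self-contained sketch using standard Java serialization. ToyTagger, ToyExtractor, and TransientSketch are hypothetical stand-ins rather than the library's classes; the sketch only shows that a transient field is skipped on write, comes back as null on read, and is restored by calling a setGlobalHolder-style method again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the tagger and its large shared dictionary.
final class ToyTagger {
    final Map<String, Integer> dictionary = new HashMap<>();
}

// Hypothetical stand-in for an extractor subclass.
class ToyExtractor implements Serializable {
    private static final long serialVersionUID = 1L;

    final int position;                        // small state, written with the extractor
    transient Map<String, Integer> dictionary; // owned by the tagger, never written here

    ToyExtractor(int position) {
        this.position = position;
    }

    // Keep only the data this extractor needs from the tagger.
    void setGlobalHolder(ToyTagger tagger) {
        this.dictionary = tagger.dictionary;
    }
}

final class TransientSketch {
    // Serialize the extractor to memory and read it back.
    static ToyExtractor roundTrip(ToyExtractor extractor) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(extractor);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ToyExtractor) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```

After the round trip the dictionary field is null until setGlobalHolder is called again, which mirrors the described behaviour of calling setGlobalHolder when a tagger is loaded from disk.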


precondition

public boolean precondition(String tag)
This evaluates any precondition for a feature being applicable based on a certain tag. It returns true if the feature is applicable. By default an Extractor is applicable everywhere, but some subclasses limit application.

Parameters:
tag - The possible tag that the feature will be generated for
Returns:
Whether the feature extractor is applicable (true) or not (false)
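A subclass that limits application might look roughly like the sketch below. ToyExtractorBase and VerbOnlyExtractor are hypothetical classes written for this example, and the "VB" prefix rule is an arbitrary assumption; only the default of returning true for every tag comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class mirroring the described default: applicable everywhere.
class ToyExtractorBase {
    boolean precondition(String tag) {
        return true;
    }
}

// Hypothetical subclass that only applies when the candidate tag is a verb tag.
class VerbOnlyExtractor extends ToyExtractorBase {
    @Override
    boolean precondition(String tag) {
        return tag.startsWith("VB");
    }
}

final class PreconditionSketch {
    // Keep only the candidate tags for which the extractor's feature is applicable.
    static List<String> applicableTags(ToyExtractorBase extractor, List<String> candidates) {
        List<String> kept = new ArrayList<>();
        for (String tag : candidates) {
            if (extractor.precondition(tag)) {
                kept.add(tag);
            }
        }
        return kept;
    }
}
```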

leftContext

public int leftContext()
Returns:
the number of positions to the left that the extractor looks at (tags only, because words are fixed)

rightContext

public int rightContext()
Returns:
the number of positions to the right that the extractor looks at (tags only, because words are fixed)

isDynamic

public boolean isDynamic()
Returns:
true if the extractor is a function of POS tags; if it returns false, features are pre-computed.

isLocal

public boolean isLocal()
Returns:
true if the extractor is not a function of POS tags and depends only on the current word.
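For an extractor defined only by a relative position and an isTag flag, the four query methods could plausibly be derived as in the sketch below. This is one reading of the return-value descriptions above, not the library's confirmed implementation, and WindowSketch is a hypothetical class.

```java
// Hypothetical class: derives context and locality answers from the two
// constructor parameters, following the return-value descriptions above.
final class WindowSketch {
    private final int position;
    private final boolean isTag;

    WindowSketch(int position, boolean isTag) {
        this.position = position;
        this.isTag = isTag;
    }

    // Only tag positions count as context, because words are fixed.
    int leftContext() {
        return (isTag && position < 0) ? -position : 0;
    }

    int rightContext() {
        return (isTag && position > 0) ? position : 0;
    }

    // A function of POS tags, so its value cannot be pre-computed.
    boolean isDynamic() {
        return isTag;
    }

    // Not a function of POS tags, and depends only on the current word.
    boolean isLocal() {
        return !isTag && position == 0;
    }
}
```

Under this reading, an extractor for the tag two positions back reports a left context of 2 and is dynamic, while an extractor for the current word reports no context and is local.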

toString

public String toString()
Overrides:
toString in class Object


Stanford NLP Group