ChineseDocumentToSentenceProcessor (Stanford CoreNLP API)

java.lang.Object
- edu.stanford.nlp.process.ChineseDocumentToSentenceProcessor

All Implemented Interfaces:

Serializable
```
public class ChineseDocumentToSentenceProcessor
extends Object
implements Serializable
```
Convert a Chinese Document into a List of sentence Strings.

Author:

Pi-Chuan Chang

See Also:

Serialized Form

Constructor Summary

Constructors
Constructor and Description
`ChineseDocumentToSentenceProcessor()`
`ChineseDocumentToSentenceProcessor(String normalizationTableFile)`

Method Summary

Modifier and Type	Method and Description
`static List<String>`	`fromHTML(String inputString)` Strip off HTML tags before processing.
`static List<String>`	`fromPlainText(String contentString)`
`static List<String>`	`fromPlainText(String contentString, boolean segmented)`
`static void`	`main(String[] args)` usage: java ChineseDocumentToSentenceProcessor [-segmentIBM] -file filename [-encoding encoding]
`String`	`normalization(String in)` Legacy method that should fall out of use; callers should use ChineseUtils directly (CDM, June 2006).

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - ChineseDocumentToSentenceProcessor
```
public ChineseDocumentToSentenceProcessor()
```
  - ChineseDocumentToSentenceProcessor
```
public ChineseDocumentToSentenceProcessor(String normalizationTableFile)
```
    Parameters:
    
    normalizationTableFile - A file listing character pairs for normalization. Currently the normalization table must be in UTF-8. If this parameter is null, the default normalization of the zero-argument constructor is used.
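The idea of a UTF-8 table of character pairs can be sketched as follows. This is an illustration only, not the class's actual loader: the file layout (one pair per line, source character and replacement separated by whitespace) is an assumption, since the page above does not specify it, and the names `NormalizationTableSketch`, `load`, `parse`, and `normalize` are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of CoreNLP.
class NormalizationTableSketch {

    // Reads the table as UTF-8, matching the encoding requirement stated above.
    static Map<Character, Character> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    // ASSUMED format: "<from> <to>" per line; lines that do not fit are skipped.
    static Map<Character, Character> parse(List<String> lines) {
        Map<Character, Character> table = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2 && parts[0].length() == 1 && parts[1].length() == 1) {
                table.put(parts[0].charAt(0), parts[1].charAt(0));
            }
        }
        return table;
    }

    // Replaces every character that has a table entry; others pass through unchanged.
    static String normalize(String in, Map<Character, Character> table) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            char mapped = table.getOrDefault(c, c);
            out.append(mapped);
        }
        return out.toString();
    }
}
```

A table of this kind is typically used to fold full-width letters and digits to their ASCII forms before further processing.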
- Method Detail
  - normalization
```
public String normalization(String in)
```
    Legacy method that should fall out of use; callers should use ChineseUtils directly (CDM, June 2006).
  - main
```
public static void main(String[] args)
                 throws IOException
```
    usage: java ChineseDocumentToSentenceProcessor [-segmentIBM] -file filename [-encoding encoding]
    The -segmentIBM option is for IBM GALE-specific splitting of an XML element into sentences.
    
    Throws:
    
    IOException
  - fromHTML
```
public static List<String> fromHTML(String inputString)
                             throws IOException
```
    Strip off HTML tags before processing. Only the simplest tag stripping is implemented.
    
    Parameters:
    
    inputString - Chinese document text which contains HTML tags
    
    Returns:
    
    a List of sentence strings
    
    Throws:
    
    IOException
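To give a sense of what "the simplest tag stripping" can mean, here is a minimal sketch that deletes anything between `<` and `>` with a regular expression. It is an assumption about the general approach, not the method's actual code, and `SimpleTagStripper` is a hypothetical name.

```java
// Hypothetical helper, not part of CoreNLP.
class SimpleTagStripper {

    // Removes every "<...>" run. Entities such as &amp; are left untouched,
    // and text inside <script> or <style> elements is NOT removed.
    static String stripTags(String html) {
        return html.replaceAll("<[^>]*>", "");
    }
}
```

Stripping at this level is adequate for lightly marked-up text. Pages with scripts, comments, or character entities would need a real HTML parser before sentence splitting.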
  - fromPlainText
```
public static List<String> fromPlainText(String contentString)
                                  throws IOException
```
    Parameters:
    
    contentString - Chinese document text
    
    Returns:
    
    a List of sentence strings
    
    Throws:
    
    IOException
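For readers unfamiliar with the task, the sketch below shows one naive way to turn Chinese text into a list of sentence strings by cutting after sentence-final punctuation. It does not reproduce `fromPlainText`: the terminator set (。！？) is an assumption, and `NaiveChineseSentenceSplitter` is a hypothetical class that depends only on the JDK.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not part of CoreNLP.
class NaiveChineseSentenceSplitter {

    // ASSUMED sentence-final marks; the real processor may use a different set.
    private static final String TERMINATORS = "。！？";

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            current.append(c);
            // Close the sentence right after a terminator, keeping the mark.
            if (TERMINATORS.indexOf(c) >= 0) {
                flush(sentences, current);
            }
        }
        // Keep trailing text that has no final punctuation.
        flush(sentences, current);
        return sentences;
    }

    private static void flush(List<String> out, StringBuilder buf) {
        String s = buf.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
        buf.setLength(0);
    }

    public static void main(String[] args) {
        for (String sentence : split("今天天气很好。你去哪里？我们走吧！")) {
            System.out.println(sentence);
        }
    }
}
```

A splitter this simple ignores closing quotation marks that follow a terminator and treats every terminator as a boundary, which is one reason to prefer the library method for real documents.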
  - fromPlainText
```
public static List<String> fromPlainText(String contentString,
                                         boolean segmented)
                                  throws IOException
```
    Throws:
    
    IOException
