edu.stanford.nlp.util
Class XMLUtils

java.lang.Object
  extended by edu.stanford.nlp.util.XMLUtils

public class XMLUtils
extends java.lang.Object


Nested Class Summary
static class XMLUtils.XMLTag
           
 
Field Summary
static java.util.Set breakingTags
          Block-level HTML tags that are rendered with surrounding line breaks.
 
Constructor Summary
XMLUtils()
           
 
Method Summary
static java.lang.String escapeStringForXML(java.lang.String s)
          Returns a String in which all of the special characters of XML have been escaped.
static java.lang.String escapeTextAroundXMLTags(java.lang.String s)
           
static boolean isBreaking(java.lang.String tag)
           
static boolean isBreaking(XMLUtils.XMLTag tag)
           
static void main(java.lang.String[] args)
           
static XMLUtils.XMLTag parseTag(java.lang.String tagString)
           
static XMLUtils.XMLTag readAndParseTag(java.io.Reader r)
           
static java.lang.String readTag(java.io.Reader r)
          Reads all text of the XML tag and returns it as a String.
static java.lang.String readUntilTag(java.io.Reader r)
          Reads all text up to the next XML tag and returns it as a String.
static java.lang.String stripTags(java.io.Reader r, java.util.List mapBack, boolean markLineBreaks)
           
static java.lang.String unescapeStringForXML(java.lang.String s)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

breakingTags

public static final java.util.Set breakingTags
Block-level HTML tags that are rendered with surrounding line breaks.
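
As a rough illustration of how a set of breaking tags can back a check like isBreaking(String), here is a minimal sketch. The class BreakingTagSketch, the sample tag names, and the case-insensitive comparison are all assumptions made for the example; this page does not list the actual contents of breakingTags or specify how isBreaking compares names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class BreakingTagSketch {
    // Hypothetical sample of block-level tags; NOT the actual contents of XMLUtils.breakingTags.
    static final Set<String> BREAKING =
            new HashSet<>(Arrays.asList("p", "div", "br", "li", "h1", "table"));

    /** Returns true if the tag name is in the sample set (case-insensitive by assumption). */
    static boolean isBreaking(String tagName) {
        return BREAKING.contains(tagName.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(isBreaking("DIV"));  // true
        System.out.println(isBreaking("span")); // false
    }
}
```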

Constructor Detail

XMLUtils

public XMLUtils()
Method Detail

stripTags

public static java.lang.String stripTags(java.io.Reader r,
                                         java.util.List mapBack,
                                         boolean markLineBreaks)
Parameters:
r - the reader to read the XML/HTML from
mapBack - a List of Integers mapping positions in the result buffer to positions in the original Reader; the list is cleared on receipt
Returns:
the String containing the resulting text
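
The role of mapBack may be easier to see in a small, self-contained sketch. The class StripTagsSketch below is hypothetical and is not the library's implementation: it simply drops everything between '<' and '>' and records, for each character kept, its offset in the original input. It ignores markLineBreaks, whose exact effect is not described on this page.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class StripTagsSketch {
    /** Removes tag text and fills mapBack with the original offset of each kept character. */
    static String strip(Reader r, List<Integer> mapBack) throws IOException {
        mapBack.clear(); // mirrors the documented "cleared on receipt" behavior
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        int pos = 0; // offset of the current character in the original input
        int c;
        while ((c = r.read()) != -1) {
            if (c == '<') {
                inTag = true;
            } else if (c == '>' && inTag) {
                inTag = false;
            } else if (!inTag) {
                out.append((char) c);
                mapBack.add(pos);
            }
            pos++;
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        List<Integer> mapBack = new ArrayList<>();
        String text = strip(new StringReader("<p>Hi <b>you</b></p>"), mapBack);
        System.out.println(text);    // Hi you
        System.out.println(mapBack); // [3, 4, 5, 9, 10, 11]
    }
}
```

With such a mapping, the character at index i of the stripped text can be traced back to offset mapBack.get(i) in the original markup.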

isBreaking

public static boolean isBreaking(java.lang.String tag)

isBreaking

public static boolean isBreaking(XMLUtils.XMLTag tag)

readUntilTag

public static java.lang.String readUntilTag(java.io.Reader r)
                                     throws java.io.IOException
Reads all text up to the next XML tag and returns it as a String.

Returns:
the String of the text read, which may be empty.
Throws:
java.io.IOException

readAndParseTag

public static XMLUtils.XMLTag readAndParseTag(java.io.Reader r)
                                       throws java.lang.Exception
Returns:
the new XMLTag object, or null if one could not be created
Throws:
java.lang.Exception

unescapeStringForXML

public static java.lang.String unescapeStringForXML(java.lang.String s)

escapeStringForXML

public static java.lang.String escapeStringForXML(java.lang.String s)
Returns a String in which all of the special characters of XML have been escaped. The resulting String can be used as text in well-formed XML.

Parameters:
s - the String to escape
Returns:
the escaped String

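To make the idea of escaping concrete, here is a minimal sketch. It assumes the "special characters" are the five that have predefined entities in XML (&, <, >, ", '); this page does not say exactly which characters escapeStringForXML handles, and the class XmlEscapeSketch is hypothetical rather than the library's code.

```java
final class XmlEscapeSketch {
    /** Replaces the five characters that have predefined XML entities. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: a &lt; b &amp;&amp; c &gt; &quot;d&quot;
        System.out.println(escape("a < b && c > \"d\""));
    }
}
```
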
escapeTextAroundXMLTags

public static java.lang.String escapeTextAroundXMLTags(java.lang.String s)

readTag

public static java.lang.String readTag(java.io.Reader r)
                                throws java.io.IOException
Reads all text of the XML tag and returns it as a String. Assumes that a '<' character has already been read.

Parameters:
r - the Reader to read the tag from
Returns:
the String representing the tag, or null if one couldn't be read
Throws:
java.io.IOException
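
The precondition that '<' has already been read suggests that readUntilTag and readTag are meant to be called alternately on the same Reader. The sketch below illustrates that calling pattern with two hypothetical helpers, readTextUntilTag and readRestOfTag; they are not the library's methods. Whether the real readTag includes the angle brackets in its result is not stated here, so returning them is an assumption, and the sketch does not handle '>' inside quoted attribute values.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class TagReadSketch {
    /** Reads up to the next '<' (consumed but not returned) or to end of input. */
    static String readTextUntilTag(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1 && c != '<') {
            sb.append((char) c);
        }
        return sb.toString(); // may be empty
    }

    /** Assumes '<' was already consumed; reads through '>' or returns null at end of input. */
    static String readRestOfTag(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder("<");
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
            if (c == '>') {
                return sb.toString();
            }
        }
        return null; // input ended before the tag was closed
    }

    public static void main(String[] args) throws IOException {
        Reader r = new StringReader("Hello <b>world</b>");
        System.out.println(readTextUntilTag(r)); // "Hello " (with trailing space)
        System.out.println(readRestOfTag(r));    // <b>
        System.out.println(readTextUntilTag(r)); // world
        System.out.println(readRestOfTag(r));    // </b>
    }
}
```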

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Throws:
java.lang.Exception

parseTag

public static XMLUtils.XMLTag parseTag(java.lang.String tagString)
                                throws java.lang.Exception
Throws:
java.lang.Exception