org.opencms.util
Class CmsHtmlExtractor

java.lang.Object
  extended by org.opencms.util.CmsHtmlExtractor

public final class CmsHtmlExtractor
extends java.lang.Object

Extracts plain text from HTML.

Since:
6.0.0
Version:
$Revision: 1.15 $
Author:
Alexander Kandzior

Method Summary
static java.lang.String extractText(java.io.InputStream in, java.lang.String encoding)
          Extracts the text from an HTML page read from an input stream.
static java.lang.String extractText(java.lang.String content, java.lang.String encoding)
          Extracts the text from an HTML page given as a String.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

extractText

public static java.lang.String extractText(java.io.InputStream in,
                                           java.lang.String encoding)
                                    throws org.htmlparser.util.ParserException,
                                           java.io.UnsupportedEncodingException
Extracts the text from an HTML page read from an input stream.

Parameters:
in - the input stream supplying the HTML content
encoding - the encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if parsing the HTML fails
java.io.UnsupportedEncodingException - if the given encoding is not supported
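
Example (a minimal sketch, not taken from the OpenCms sources): one way to prepare the arguments for this overload when the markup is already in memory. The class name ExtractFromStreamSketch and the helper toStream are hypothetical names introduced for illustration. The call to extractText is left as a comment so that the sketch compiles without OpenCms and HTMLParser on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;

class ExtractFromStreamSketch {

    /**
     * Hypothetical helper: encodes the markup with the named encoding
     * and wraps the resulting bytes in an in-memory stream.
     */
    static InputStream toStream(String html, String encoding)
            throws UnsupportedEncodingException {
        // String.getBytes(String) throws UnsupportedEncodingException
        // when the JVM does not know the named encoding.
        return new ByteArrayInputStream(html.getBytes(encoding));
    }

    public static void main(String[] args) throws Exception {
        String encoding = "UTF-8";
        InputStream in = toStream("<html><body><p>Hello</p></body></html>", encoding);
        try {
            // With OpenCms and HTMLParser available, the call would be:
            // String text = org.opencms.util.CmsHtmlExtractor.extractText(in, encoding);
            System.out.println(in.available() + " bytes ready for extraction");
        } finally {
            in.close();
        }
    }
}
```

The encoding name passed to extractText should match the encoding actually used to produce the bytes in the stream, which is why the sketch passes the same variable to both.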

extractText

public static java.lang.String extractText(java.lang.String content,
                                           java.lang.String encoding)
                                    throws org.htmlparser.util.ParserException,
                                           java.io.UnsupportedEncodingException
Extracts the text from an HTML page given as a String.

Parameters:
content - the HTML content
encoding - the encoding of the content
Returns:
the extracted text from the page
Throws:
org.htmlparser.util.ParserException - if parsing the HTML fails
java.io.UnsupportedEncodingException - if the given encoding is not supported
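
Example (a hedged sketch using only the JDK): because an unsupported encoding is reported through a checked exception, a caller may prefer to validate the encoding name before calling extractText. The class EncodingCheckSketch and the method isUsableEncoding are hypothetical names. Whether extractText resolves encoding names in exactly the same way as java.nio.charset.Charset is an assumption, so the try/catch around the real call is still required.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class EncodingCheckSketch {

    /**
     * Hypothetical pre-check: returns true only if the JVM recognizes
     * the given encoding name.
     */
    static boolean isUsableEncoding(String encoding) {
        if (encoding == null) {
            return false;
        }
        try {
            return Charset.isSupported(encoding);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are not legal in a charset name.
            return false;
        }
    }

    public static void main(String[] args) {
        String content = "<html><body><h1>Title</h1><p>Body text</p></body></html>";
        String encoding = "UTF-8";
        if (isUsableEncoding(encoding)) {
            // With OpenCms and HTMLParser available, the call would be:
            // String text = org.opencms.util.CmsHtmlExtractor.extractText(content, encoding);
            System.out.println("Encoding accepted: " + encoding);
        } else {
            System.out.println("Encoding rejected: " + encoding);
        }
    }
}
```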