org.opencms.util
Interface I_CmsHtmlNodeVisitor

All Known Implementing Classes:
CmsHtml2TextConverter, CmsHtmlDecorator, CmsHtmlParser, CmsLinkProcessor

public interface I_CmsHtmlNodeVisitor

Interface that combines a visitor of HTML documents with the hook to start the parser / lexer that triggers the visit.

Since:
6.1.3
Version:
$Revision: 1.9 $
Author:
Alexander Kandzior

Method Summary
 java.lang.String getConfiguration()
          Returns the configuration String of this visitor, or the empty String if none was provided before.
 java.lang.String getResult()
          Returns the text extraction result.
 java.lang.String process(java.lang.String html, java.lang.String encoding)
          Extracts the text from the given html content, assuming the given html encoding.
 void setConfiguration(java.lang.String configuration)
          Sets a configuration String for this visitor.
 void setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTags)
          Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.
 void visitEndTag(org.htmlparser.Tag tag)
          Visitor method (callback) invoked when a closing Tag is encountered.
 void visitRemarkNode(org.htmlparser.Remark remark)
          Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.
 void visitStringNode(org.htmlparser.Text text)
          Visitor method (callback) invoked when a text node is encountered.
 void visitTag(org.htmlparser.Tag tag)
          Visitor method (callback) invoked when a starting Tag is encountered.
 

Method Detail

getConfiguration

java.lang.String getConfiguration()
Returns the configuration String of this visitor, or the empty String if none was provided before.

Returns:
the configuration String of this visitor - by this contract never null, but an empty String if not provided.
See Also:
setConfiguration(String)
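
The "never null" contract above can be met by initializing the stored value with the empty String. The following is only a minimal sketch: the class ConfigurationHolder is hypothetical and not part of OpenCms, and mapping a null argument to the empty String is an assumption of this sketch, not documented behavior.

```java
/**
 * Hypothetical helper (not part of OpenCms) that stores a visitor
 * configuration and honors the "never null" contract of getConfiguration().
 */
class ConfigurationHolder {

    // Start with the empty String so the getter is safe before any setter call.
    private String configuration = "";

    /** Stores the configuration; null is mapped to "" (an assumption of this sketch). */
    void setConfiguration(String configuration) {
        this.configuration = (configuration == null) ? "" : configuration;
    }

    /** Returns the stored configuration, never null. */
    String getConfiguration() {
        return configuration;
    }
}
```

An implementing class could delegate its getConfiguration() and setConfiguration(String) methods to such a field, so callers never need a null check.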

getResult

java.lang.String getResult()
Returns the text extraction result.

Returns:
the text extraction result

process

java.lang.String process(java.lang.String html,
                         java.lang.String encoding)
                         throws org.htmlparser.util.ParserException
Extracts the text from the given html content, assuming the given html encoding.

Parameters:
html - the content to extract the plain text from
encoding - the encoding to use
Returns:
the text extracted from the given html content
Throws:
org.htmlparser.util.ParserException - if parsing the given html content fails

setConfiguration

void setConfiguration(java.lang.String configuration)
Sets a configuration String for this visitor.

This will most likely be done with data from an xsd, custom jsp tag, ...

Parameters:
configuration - the configuration of this visitor to set.

setNoAutoCloseTags

void setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTags)
Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags.

This must be called before process(String, String) is invoked in order to take effect.

Parameters:
noAutoCloseTags - the list of upper case tag names for which parsing / visiting should not correct missing closing tags.
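
Because the tag names are expected in upper case, a caller may want to normalize them before passing the list on. The following is a minimal sketch only: the helper class NoAutoCloseTagNames is hypothetical and not part of OpenCms, and trimming surrounding whitespace is an extra assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of OpenCms) that prepares tag names for setNoAutoCloseTags. */
class NoAutoCloseTagNames {

    /** Returns a new list with every tag name trimmed and converted to upper case. */
    static List<String> toUpperCase(List<String> tagNames) {
        List<String> result = new ArrayList<String>(tagNames.size());
        for (String name : tagNames) {
            // Locale.ROOT keeps the conversion independent of the default locale.
            result.add(name.trim().toUpperCase(Locale.ROOT));
        }
        return result;
    }
}
```

The normalized list would then be handed to setNoAutoCloseTags(List) before process(String, String) is called on the same visitor.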

visitEndTag

void visitEndTag(org.htmlparser.Tag tag)
Visitor method (callback) invoked when a closing Tag is encountered.

Parameters:
tag - the tag that is ended.
See Also:
NodeVisitor.visitEndTag(org.htmlparser.Tag)

visitRemarkNode

void visitRemarkNode(org.htmlparser.Remark remark)
Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered.

Parameters:
remark - the remark Tag to visit.
See Also:
NodeVisitor.visitRemarkNode(org.htmlparser.Remark)

visitStringNode

void visitStringNode(org.htmlparser.Text text)
Visitor method (callback) invoked when a text node is encountered.

Parameters:
text - the text that is visited.
See Also:
NodeVisitor.visitStringNode(org.htmlparser.Text)

visitTag

void visitTag(org.htmlparser.Tag tag)
Visitor method (callback) invoked when a starting Tag is encountered.

Parameters:
tag - the tag that is visited.
See Also:
NodeVisitor.visitTag(org.htmlparser.Tag)
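
The interplay of the four callbacks with getResult() can be illustrated with a toy model. This is a sketch under stated assumptions, not the OpenCms implementation: the class TextCollectingVisitor is hypothetical, plain Strings stand in for org.htmlparser.Tag, Text and Remark, and the rule that a closing P tag adds a line break is chosen only for illustration.

```java
/**
 * Toy model of the callback flow (not the OpenCms implementation).
 * Plain Strings stand in for org.htmlparser.Tag, Text and Remark.
 */
class TextCollectingVisitor {

    private final StringBuilder result = new StringBuilder();

    /** Start tag callback: this sketch ignores markup. */
    void visitTag(String tagName) {
        // nothing to collect for a start tag
    }

    /** End tag callback: a closing P tag ends the current line. */
    void visitEndTag(String tagName) {
        if ("P".equals(tagName)) {
            result.append('\n');
        }
    }

    /** Text node callback: collects the visible text. */
    void visitStringNode(String text) {
        result.append(text);
    }

    /** Remark callback: HTML comments are dropped from the extracted text. */
    void visitRemarkNode(String remark) {
        // intentionally empty
    }

    /** Returns everything collected so far. */
    String getResult() {
        return result.toString();
    }
}
```

In this model a parser would invoke the callbacks in document order, and the caller would read the accumulated text from getResult() after the last callback.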