java.lang.Object
org.htmlparser.visitors.NodeVisitor
org.opencms.util.CmsHtmlParser
public class CmsHtmlParser
Base utility class for OpenCms NodeVisitor implementations, which provides some often used utility functions.
This base implementation is only a "pass through" class: the content is parsed, but the generated result is identical to the input.
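The "pass through" idea can be illustrated without the OpenCms or HTML Parser libraries. The following is a minimal sketch, not the OpenCms source: the class name `EchoVisitorSketch` is hypothetical, and the callbacks take plain `String` arguments instead of the `org.htmlparser` node types. It assumes that each callback appends the visited content to the result buffer only while echo mode is on.

```java
// Hypothetical, self-contained sketch; it does not use or reproduce the OpenCms classes.
final class EchoVisitorSketch {

    // Plays the role of the documented m_echo flag.
    private final boolean echo;

    // Plays the role of the documented m_result buffer.
    private final StringBuilder result = new StringBuilder();

    EchoVisitorSketch(boolean echo) {
        this.echo = echo;
    }

    // Callback for an opening tag; the raw tag HTML is passed as a String here.
    void visitTag(String tagHtml) {
        append(tagHtml);
    }

    // Callback for a closing tag.
    void visitEndTag(String tagHtml) {
        append(tagHtml);
    }

    // Callback for a text node.
    void visitStringNode(String text) {
        append(text);
    }

    // Callback for an HTML comment.
    void visitRemarkNode(String remarkHtml) {
        append(remarkHtml);
    }

    // Writes to the buffer only when echo mode is on (assumed behavior).
    private void append(String content) {
        if (echo) {
            result.append(content);
        }
    }

    String getResult() {
        return result.toString();
    }

    public static void main(String[] args) {
        EchoVisitorSketch visitor = new EchoVisitorSketch(true);
        visitor.visitTag("<p>");
        visitor.visitStringNode("Hello ");
        visitor.visitRemarkNode("<!-- note -->");
        visitor.visitEndTag("</p>");
        System.out.println(visitor.getResult()); // prints <p>Hello <!-- note --></p>
    }
}
```

A subclass that wants to filter or transform content would, under the same assumption, override individual callbacks and decide per node what reaches the buffer.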
Field Summary | |
---|---|
protected boolean |
m_echo
Indicates if "echo" mode is on, that is all content is written to the result by default. |
protected java.util.List<java.lang.String> |
m_noAutoCloseTags
List of upper case tag name strings of tags that should not be auto-corrected if closing tags are missing. |
protected java.lang.StringBuffer |
m_result
The buffer to write the output to. |
protected static java.lang.String[] |
TAG_ARRAY
The array of supported tag names. |
protected static java.util.List<java.lang.String> |
TAG_LIST
The list of supported tag names. |
Constructor Summary | |
---|---|
CmsHtmlParser()
Creates a new instance of the html converter with echo mode set to false. |
|
CmsHtmlParser(boolean echo)
Creates a new instance of the html converter. |
Method Summary | |
---|---|
protected java.lang.String |
collapse(java.lang.String string)
Collapse HTML whitespace in the given String. |
protected org.htmlparser.PrototypicalNodeFactory |
configureNoAutoCorrectionTags()
Internally degrades Composite tags that do have children in the DOM tree to simple single tags. |
java.lang.String |
getConfiguration()
Returns the configuration String of this visitor, or the empty String if it was not provided before. |
java.util.List<java.lang.String> |
getNoAutoCloseTags()
Returns a list of upper case tag names for which parsing / visiting will not correct missing closing tags. |
java.lang.String |
getResult()
Returns the text extraction result. |
java.lang.String |
getTagHtml(org.htmlparser.Tag tag)
Returns the HTML for the given tag itself (not the tag content). |
java.lang.String |
process(java.lang.String html,
java.lang.String encoding)
Extracts the text from the given html content, assuming the given html encoding. |
void |
setConfiguration(java.lang.String configuration)
Sets a configuration String for this visitor. |
void |
setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList)
Sets a list of upper case tag names for which parsing / visiting should not correct missing closing tags. |
void |
visitEndTag(org.htmlparser.Tag tag)
Visitor method (callback) invoked when a closing Tag is encountered. |
void |
visitRemarkNode(org.htmlparser.Remark remark)
Visitor method (callback) invoked when a remark Tag (HTML comment) is encountered. |
void |
visitStringNode(org.htmlparser.Text text)
Visitor method (callback) invoked when a text (string) node is encountered. |
void |
visitTag(org.htmlparser.Tag tag)
Visitor method (callback) invoked when a starting Tag is encountered. |
Methods inherited from class org.htmlparser.visitors.NodeVisitor |
---|
beginParsing, finishedParsing, shouldRecurseChildren, shouldRecurseSelf |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected java.util.List<java.lang.String> m_noAutoCloseTags
protected static final java.lang.String[] TAG_ARRAY
protected static final java.util.List<java.lang.String> TAG_LIST
protected boolean m_echo
protected java.lang.StringBuffer m_result
Constructor Detail |
---|
public CmsHtmlParser()
Creates a new instance of the html converter with echo mode set to false.
public CmsHtmlParser(boolean echo)
echo
- indicates if "echo" mode is on, that is all content is written to the result
Method Detail |
---|
protected org.htmlparser.PrototypicalNodeFactory configureNoAutoCorrectionTags()
setNoAutoCloseTags(List)
public java.lang.String getConfiguration()
I_CmsHtmlNodeVisitor
getConfiguration
in interface I_CmsHtmlNodeVisitor
I_CmsHtmlNodeVisitor.getConfiguration()
public java.lang.String getResult()
I_CmsHtmlNodeVisitor
getResult
in interface I_CmsHtmlNodeVisitor
I_CmsHtmlNodeVisitor.getResult()
public java.lang.String getTagHtml(org.htmlparser.Tag tag)
tag
- the tag to create the HTML for
public java.lang.String process(java.lang.String html, java.lang.String encoding) throws org.htmlparser.util.ParserException
I_CmsHtmlNodeVisitor
process
in interface I_CmsHtmlNodeVisitor
html
- the content to extract the plain text from
encoding
- the encoding to use
org.htmlparser.util.ParserException
- if something goes wrong
I_CmsHtmlNodeVisitor.process(java.lang.String, java.lang.String)
public void setConfiguration(java.lang.String configuration)
I_CmsHtmlNodeVisitor
This will most likely be done with data from an xsd, custom jsp tag, ...
setConfiguration
in interface I_CmsHtmlNodeVisitor
configuration
- the configuration of this visitor to set.
I_CmsHtmlNodeVisitor.setConfiguration(java.lang.String)
public void visitEndTag(org.htmlparser.Tag tag)
I_CmsHtmlNodeVisitor
visitEndTag
in interface I_CmsHtmlNodeVisitor
visitEndTag
in class org.htmlparser.visitors.NodeVisitor
tag
- the tag that is ended.
I_CmsHtmlNodeVisitor.visitEndTag(org.htmlparser.Tag)
public void visitRemarkNode(org.htmlparser.Remark remark)
I_CmsHtmlNodeVisitor
visitRemarkNode
in interface I_CmsHtmlNodeVisitor
visitRemarkNode
in class org.htmlparser.visitors.NodeVisitor
remark
- the remark Tag to visit.
I_CmsHtmlNodeVisitor.visitRemarkNode(org.htmlparser.Remark)
public void visitStringNode(org.htmlparser.Text text)
I_CmsHtmlNodeVisitor
visitStringNode
in interface I_CmsHtmlNodeVisitor
visitStringNode
in class org.htmlparser.visitors.NodeVisitor
text
- the text that is visited.
I_CmsHtmlNodeVisitor.visitStringNode(org.htmlparser.Text)
public void visitTag(org.htmlparser.Tag tag)
I_CmsHtmlNodeVisitor
visitTag
in interface I_CmsHtmlNodeVisitor
visitTag
in class org.htmlparser.visitors.NodeVisitor
tag
- the tag that is visited.
I_CmsHtmlNodeVisitor.visitTag(org.htmlparser.Tag)
protected java.lang.String collapse(java.lang.String string)
string
- the string to collapse
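Collapsing HTML whitespace generally means treating a run of whitespace characters as a single space. The following is a rough sketch of that idea, not the OpenCms implementation: the class name `CollapseSketch` is hypothetical, the set of characters treated as whitespace (space, tab, line feed, carriage return, form feed) is an assumption, and leading or trailing whitespace is reduced to one space rather than trimmed, since this page does not say which behavior applies.

```java
// Hypothetical sketch of whitespace collapsing; not the OpenCms implementation.
final class CollapseSketch {

    // Replaces each run of space, tab, LF, CR or FF characters with a single space.
    // The character set is an assumption made for this example.
    static String collapse(String input) {
        return input.replaceAll("[ \\t\\n\\r\\f]+", " ");
    }

    public static void main(String[] args) {
        System.out.println(collapse("Hello   \n\t world")); // prints Hello world
    }
}
```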
public java.util.List<java.lang.String> getNoAutoCloseTags()
public void setNoAutoCloseTags(java.util.List<java.lang.String> noAutoCloseTagList)
setNoAutoCloseTags
in interface I_CmsHtmlNodeVisitor
noAutoCloseTagList
- a list of upper case tag names for which parsing / visiting
should not correct missing closing tags to set.
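Because the list is documented as holding upper case tag names, callers that collect names from configuration may want to normalize them before passing them in. The following is a small sketch of such a normalization and lookup. The class `NoAutoCloseSketch` and both helper methods are hypothetical and are not part of OpenCms; the sketch assumes only that matching is done against upper case names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helpers; not part of the OpenCms API.
final class NoAutoCloseSketch {

    // Returns a new list with every tag name trimmed and converted to upper case.
    static List<String> toUpperCaseNames(List<String> tagNames) {
        List<String> result = new ArrayList<>();
        for (String name : tagNames) {
            result.add(name.trim().toUpperCase(Locale.ROOT));
        }
        return result;
    }

    // Checks whether a tag name, in any letter case, is in the upper case list.
    static boolean isNoAutoClose(List<String> upperCaseNames, String tagName) {
        return upperCaseNames.contains(tagName.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        List<String> names = toUpperCaseNames(Arrays.asList("div", " Span "));
        System.out.println(names);                        // prints [DIV, SPAN]
        System.out.println(isNoAutoClose(names, "span")); // prints true
        System.out.println(isNoAutoClose(names, "p"));    // prints false
    }
}
```

A list prepared this way could then be handed to `setNoAutoCloseTags(List)`. `Locale.ROOT` is used so the conversion does not depend on the default locale of the JVM.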