public final class HTMLParser extends Object
Uses the NekoHTML HTMLConfiguration to parse HTML into an HtmlUnit-specific DOM (HU-DOM) tree.
Note that the parser currently does not handle CDATA or comment sections; these do not appear in the resulting DOM tree.
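As a rough illustration of how a string of HTML might be turned into an HU-DOM tree, the sketch below wraps the markup in a `StringWebResponse` and passes it to `HTMLParser.parse` together with the current window of a `WebClient`. The class name `ParseStringExample`, the helper `parseHtml`, and the `http://example.com/` base URL are hypothetical and exist only for this example; it assumes an HtmlUnit 2.x release contemporary with this page is on the classpath.

```java
import java.net.URL;

import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HTMLParser;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ParseStringExample {

    // Hypothetical helper: wraps raw markup in a StringWebResponse and
    // parses it into the client's current window.
    static HtmlPage parseHtml(WebClient client, String html, String baseUrl) throws Exception {
        StringWebResponse response = new StringWebResponse(html, new URL(baseUrl));
        return HTMLParser.parse(response, client.getCurrentWindow());
    }

    public static void main(String[] args) throws Exception {
        WebClient client = new WebClient();
        try {
            String html = "<html><head><title>Demo</title></head>"
                    + "<body><p>Hello</p></body></html>";
            HtmlPage page = parseHtml(client, html, "http://example.com/");
            // The parsed page is an ordinary HtmlPage and can be queried as usual.
            System.out.println(page.getTitleText());
        } finally {
            // Release the windows held by the client (HtmlUnit 2.x API of this era).
            client.closeAllWindows();
        }
    }
}
```

Because `parse` is documented to return null when the `<HTML>` tag is missing, callers working with untrusted markup may want to check the result before using it.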
| Modifier and Type | Method and Description |
|---|---|
| static IElementFactory | getFactory(String tagName) |
| static boolean | getIgnoreOutsideContent(): Get the state of the flag to ignore content outside the BODY and HTML tags. |
| static HtmlPage | parse(WebResponse webResponse, WebWindow webWindow): Parse the HTML content from the given WebResponse into an object tree representation. |
| static void | parseFragment(DomNode parent, String source): Parses the HTML content from the given string into an object tree representation. |
| static void | setIgnoreOutsideContent(boolean ignoreOutsideContent): Set the flag to control validation of the HTML content that is outside of the BODY and HTML tags. |
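The static methods summarized above could be combined roughly as follows. This is a sketch under stated assumptions, not a reference usage: `FragmentExample` and `parseWithFlag` are hypothetical names, the base URL is a placeholder, and the exact effect of the ignore-outside-content flag on a given document is not shown here. Since the flag is static, and therefore shared by every caller in the JVM, the sketch saves the previous value and restores it afterwards.

```java
import java.net.URL;

import com.gargoylesoftware.htmlunit.StringWebResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HTMLParser;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class FragmentExample {

    // Hypothetical helper: parses a page with the static flag temporarily
    // set to the requested value, then restores the previous setting.
    static HtmlPage parseWithFlag(WebClient client, String html, boolean ignoreOutside)
            throws Exception {
        boolean previous = HTMLParser.getIgnoreOutsideContent();
        HTMLParser.setIgnoreOutsideContent(ignoreOutside);
        try {
            StringWebResponse response =
                    new StringWebResponse(html, new URL("http://example.com/"));
            return HTMLParser.parse(response, client.getCurrentWindow());
        } finally {
            HTMLParser.setIgnoreOutsideContent(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        WebClient client = new WebClient();
        try {
            HtmlPage page = parseWithFlag(
                    client, "<html><body><div>existing</div></body></html>", true);
            // parseFragment adds the parsed nodes under the given parent;
            // it may throw SAXException or IOException.
            HTMLParser.parseFragment(page.getBody(), "<p id='added'>Added</p>");
            System.out.println(page.getBody().asXml());
        } finally {
            client.closeAllWindows();
        }
    }
}
```

Restoring the flag in a `finally` block keeps one caller's setting from leaking into unrelated parsing done elsewhere in the same process.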
public static void setIgnoreOutsideContent(boolean ignoreOutsideContent)
Parameters:
ignoreOutsideContent - boolean flag to set

public static boolean getIgnoreOutsideContent()

public static IElementFactory getFactory(String tagName)
Parameters:
tagName - an HTML element tag name

public static void parseFragment(DomNode parent, String source) throws SAXException, IOException
Parameters:
parent - the parent for the new nodes
source - the (X)HTML to be parsed
Throws:
SAXException - if a SAX error occurs
IOException - if an IO error occurs

public static HtmlPage parse(WebResponse webResponse, WebWindow webWindow) throws IOException
Parameters:
webResponse - the response data
webWindow - the web window into which the page is to be loaded
Returns:
null if the <HTML> tag is missing
Throws:
IOException - if an IO error occurs

Copyright © 2002-2012 Gargoyle Software Inc. All Rights Reserved.