Either use direct subclasses of the appropriate node and set them on the
PrototypicalNodeFactory,
or use a dynamic proxy implementing the required node type interface.
The former avoids wrapping and delegation altogether, while the latter
handles the wrapping and delegation without this class.
The following example uses a dynamic proxy to achieve the same effect as using decorators to wrap Text nodes:

    import java.lang.reflect.InvocationHandler;
    import java.lang.reflect.InvocationTargetException;
    import java.lang.reflect.Method;
    import java.lang.reflect.Proxy;

    import org.htmlparser.Parser;
    import org.htmlparser.PrototypicalNodeFactory;
    import org.htmlparser.Text;
    import org.htmlparser.nodes.TextNode;
    import org.htmlparser.util.ParserException;

    public class TextProxy
        implements
            InvocationHandler
    {
        /** The wrapped object that all calls are delegated to. */
        protected Object mObject;

        /** Wrap the object in a proxy implementing the interfaces declared by its class. */
        public static Object newInstance (Object object)
        {
            Class<?> cls;

            cls = object.getClass ();
            return (Proxy.newProxyInstance (
                cls.getClassLoader (),
                cls.getInterfaces (),
                new TextProxy (object)));
        }

        private TextProxy (Object object)
        {
            mObject = object;
        }

        public Object invoke (Object proxy, Method m, Object[] args)
            throws Throwable
        {
            Object result;
            String name;

            try
            {
                // delegate the call to the wrapped object
                result = m.invoke (mObject, args);
                name = m.getName ();
                if (name.equals ("clone"))
                    result = newInstance (result); // wrap the cloned object
                else if (name.equals ("doSemanticAction")) // or other methods
                    System.out.println (mObject); // custom processing of the TextNode
            }
            catch (InvocationTargetException e)
            {
                // rethrow the exception raised by the wrapped method itself
                throw e.getTargetException ();
            }
            catch (Exception e)
            {
                throw new RuntimeException ("unexpected invocation exception: " +
                    e.getMessage ());
            }

            return (result);
        }

        public static void main (String[] args)
            throws
                ParserException
        {
            // create the wrapped text node and set it as the prototype
            Text text = (Text) TextProxy.newInstance (new TextNode (null, 0, 0));
            PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
            factory.setTextPrototype (text);
            // perform the parse
            Parser parser = new Parser (args[0]);
            parser.setNodeFactory (factory);
            parser.parse (null);
        }
    }

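The example above covers only the dynamic proxy alternative. As a rough illustration of the other alternative (direct subclassing with a prototype), the following is a minimal sketch that uses stand-in types only. `SimpleText`, `LoggingText` and `PrototypeFactory` are hypothetical names invented for this sketch and are not part of the HTML Parser API; with the real library one would presumably subclass the node class and register an instance of the subclass on the PrototypicalNodeFactory instead.

```java
import java.util.ArrayList;
import java.util.List;

public class SubclassPrototypeSketch
{
    /** Hypothetical stand-in for a text node class. */
    static class SimpleText implements Cloneable
    {
        protected String mText = "";

        public void setText (String text) { mText = text; }

        public String getText () { return (mText); }

        /** Hook invoked once the node is complete; does nothing by default. */
        public void doSemanticAction () { }

        public Object clone () throws CloneNotSupportedException
        {
            return (super.clone ()); // preserves the runtime class of the prototype
        }
    }

    /** Direct subclass: the custom behaviour is an override, so no wrapper is needed. */
    static class LoggingText extends SimpleText
    {
        static final List<String> SEEN = new ArrayList<String> ();

        public void doSemanticAction ()
        {
            SEEN.add (getText ()); // custom processing on the node itself
        }
    }

    /** Hypothetical stand-in for a factory that creates nodes by cloning a prototype. */
    static class PrototypeFactory
    {
        private SimpleText mPrototype = new SimpleText ();

        public void setTextPrototype (SimpleText prototype) { mPrototype = prototype; }

        public SimpleText createText (String text)
        {
            try
            {
                SimpleText node = (SimpleText) mPrototype.clone ();
                node.setText (text);
                return (node);
            }
            catch (CloneNotSupportedException e)
            {
                throw new IllegalStateException ("prototype is not cloneable", e);
            }
        }
    }

    public static void main (String[] args)
    {
        PrototypeFactory factory = new PrototypeFactory ();
        factory.setTextPrototype (new LoggingText ());
        // every node created by the factory is now a LoggingText
        SimpleText node = factory.createText ("hello");
        node.doSemanticAction ();
        System.out.println (LoggingText.SEEN);
    }
}
```

Because the clone keeps the subclass type, the overridden method is called directly on each created node, with no delegate object in between.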
public abstract class AbstractNodeDecorator extends java.lang.Object implements Text

| Modifier | Constructor and Description |
|---|---|
| protected | AbstractNodeDecorator(Text delegate) Deprecated. |

| Modifier and Type | Method and Description |
|---|---|
| void | accept(NodeVisitor visitor) Deprecated. Apply the visitor to this node. |
| java.lang.Object | clone() Deprecated. Clone this object. |
| void | collectInto(NodeList list, NodeFilter filter) Deprecated. Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria. |
| void | doSemanticAction() Deprecated. Perform the meaning of this tag. |
| boolean | equals(java.lang.Object arg0) Deprecated. |
| NodeList | getChildren() Deprecated. Get the children of this node. |
| int | getEndPosition() Deprecated. Gets the ending position of the node. |
| Page | getPage() Deprecated. Get the page this node came from. |
| Node | getParent() Deprecated. Get the parent of this node. |
| int | getStartPosition() Deprecated. Gets the starting position of the node. |
| java.lang.String | getText() Deprecated. Accesses the textual contents of the node. |
| void | setChildren(NodeList children) Deprecated. Set the children of this node. |
| void | setEndPosition(int position) Deprecated. Sets the ending position of the node. |
| void | setPage(Page page) Deprecated. Set the page this node came from. |
| void | setParent(Node node) Deprecated. Sets the parent of this node. |
| void | setStartPosition(int position) Deprecated. Sets the starting position of the node. |
| void | setText(java.lang.String text) Deprecated. Sets the contents of the node. |
| java.lang.String | toHtml() Deprecated. Return the HTML for this node. |
| java.lang.String | toPlainTextString() Deprecated. A string representation of the node. |
| java.lang.String | toString() Deprecated. Return the string representation of the node. |

protected Text delegate
protected AbstractNodeDecorator(Text delegate)
public java.lang.Object clone()
throws java.lang.CloneNotSupportedException
public void accept(NodeVisitor visitor)
Apply the visitor to this node.

public void collectInto(NodeList list, NodeFilter filter)
Collect this node and its child nodes into a list, provided the node
satisfies the filtering criteria.
This mechanism allows powerful filtering code to be written very
easily, without having to collect embedded tags separately.
For example, when trying to get all the links on a page, it is not possible to
get them from the top level alone, as many tags (such as form tags) can contain
links embedded in them. We could get the links out by checking whether the
current node is a CompositeTag and going
through its children; this method provides a convenient way to do
that.
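To make the recursion concrete, here is a minimal sketch of the idea using stand-in types. `SimpleNode` and `Filter` are hypothetical names invented for this sketch, not the library's Node and NodeFilter, and the library's own implementation may differ in detail. It only illustrates how a single call on a top-level node can reach matches at any depth.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectIntoSketch
{
    /** Hypothetical stand-in for a node filter. */
    interface Filter
    {
        boolean accept (SimpleNode node);
    }

    /** Hypothetical stand-in for a node that may contain child nodes. */
    static class SimpleNode
    {
        final String mName;
        final List<SimpleNode> mChildren = new ArrayList<SimpleNode> ();

        SimpleNode (String name, SimpleNode... children)
        {
            mName = name;
            for (SimpleNode child : children)
                mChildren.add (child);
        }

        /** Add this node if it passes the filter, then recurse into the children. */
        void collectInto (List<SimpleNode> list, Filter filter)
        {
            if (filter.accept (this))
                list.add (this);
            for (SimpleNode child : mChildren)
                child.collectInto (list, filter);
        }
    }

    /** Collect every node named "A" at or below the given root, at any depth. */
    static List<SimpleNode> links (SimpleNode root)
    {
        List<SimpleNode> list = new ArrayList<SimpleNode> ();
        root.collectInto (list, new Filter ()
        {
            public boolean accept (SimpleNode node)
            {
                return (node.mName.equals ("A"));
            }
        });
        return (list);
    }

    public static void main (String[] args)
    {
        // one link at the top level, another nested inside a FORM and a P
        SimpleNode root = new SimpleNode ("BODY",
            new SimpleNode ("A"),
            new SimpleNode ("FORM", new SimpleNode ("P", new SimpleNode ("A"))));
        System.out.println (links (root).size ()); // prints 2
    }
}
```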
Using collectInto(), programs get a lot shorter. Now, the code to extract all links from a page would look like:

    NodeList list = new NodeList ();
    NodeFilter filter = new TagNameFilter ("A");
    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
        e.nextNode ().collectInto (list, filter);

Thus, list will hold all the link nodes, irrespective of how
deep the links are embedded.
Another way to accomplish the same objective is:

    NodeList list = new NodeList ();
    NodeFilter filter = new TagClassFilter (LinkTag.class);
    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
        e.nextNode ().collectInto (list, filter);

This is slightly less specific because the LinkTag class may be
registered for more than one node name, e.g. <LINK> tags too.
Specified by: collectInto in interface Node
Parameters:
list - The list to collect nodes into.
filter - The criteria to use when deciding if a node should be added to the list.

public int getStartPosition()
Gets the starting position of the node.
Specified by: getStartPosition in interface Node

public void setStartPosition(int position)
Sets the starting position of the node.
Specified by: setStartPosition in interface Node
Parameters:
position - The new start position.

public int getEndPosition()
Gets the ending position of the node.
Specified by: getEndPosition in interface Node

public void setEndPosition(int position)
Sets the ending position of the node.
Specified by: setEndPosition in interface Node
Parameters:
position - The new end position.

public Page getPage()
Get the page this node came from.

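public void setPage(Page page)
Set the page this node came from.

public boolean equals(java.lang.Object arg0)
Overrides: equals in class java.lang.Object

public Node getParent()
Get the parent of this node.
This will always return null when parsing with the Lexer.
Currently, the object returned from this method can be safely cast to a
CompositeTag, but this behaviour should not
be expected in the future.

public java.lang.String getText()
Accesses the textual contents of the node.

public void setParent(Node node)
Sets the parent of this node.

public NodeList getChildren()
Get the children of this node.
Specified by: getChildren in interface Node
Returns: The list of children of this node if it has been set, null otherwise.

public void setChildren(NodeList children)
Set the children of this node.
Specified by: setChildren in interface Node
Parameters:
children - The new list of children this node contains.

public void setText(java.lang.String text)
Sets the contents of the node.

public java.lang.String toHtml()
Return the HTML for this node.

public java.lang.String toPlainTextString()
A string representation of the node.
For example, to print the plain text of every node:
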
    for (NodeIterator e = parser.elements (); e.hasMoreNodes ();)
        // or do whatever processing you wish with the plain text string
        System.out.println (e.nextNode ().toPlainTextString ());

Specified by: toPlainTextString in interface Node

public java.lang.String toString()
Return the string representation of the node.
This is intended for debugging, for example with System.out.println (node); or within a debugging environment.

public void doSemanticAction()
throws ParserException
Perform the meaning of this tag.
See also Node.getChildren().
Specified by: doSemanticAction in interface Node
Throws:
ParserException - If a problem is encountered performing the
semantic action.

HTML Parser is an open source library released under LGPL.