A more efficient way to affect all string nodes is to replace
the Text node prototype in the PrototypicalNodeFactory with a
custom TextNode that performs the required operation.
For example, if you were using:
    StringNodeFactory factory = new StringNodeFactory();
    factory.setDecode(true);
to decode all text issued from
Text.toPlainTextString(),
you would instead create a subclass of TextNode
and set it as the prototype for text node generation:
    PrototypicalNodeFactory factory = new PrototypicalNodeFactory ();
    factory.setTextPrototype (new TextNode () {
        public String toPlainTextString()
        {
            return (org.htmlparser.util.Translate.decode (super.toPlainTextString ()));
        }
    });
Similar constructs apply to removing escapes and converting non-breaking
spaces, which were the examples previously provided.
Using a subclass avoids the wrapping and delegation inherent in the decorator pattern, with subsequent improvements in processing speed and memory usage.
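The escape-removing and non-breaking-space cases mentioned above could be handled by the same kind of toPlainTextString() override. The following is a minimal sketch of the string transformations only, written against the standard library so that it runs without HTML Parser on the classpath. The class TextCleanup and its methods are hypothetical names introduced for illustration, not part of the library, and the exact set of characters the deprecated factory removed is an assumption here (newline, carriage return, and tab).

```java
// Hypothetical helper, NOT part of HTML Parser: plain-string versions of the
// two transformations a custom TextNode subclass might apply.
final class TextCleanup {
    private TextCleanup() {
        // static helpers only
    }

    /** Drops newline, carriage return, and tab characters (assumed set). */
    static String removeEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\n' && c != '\r' && c != '\t') {
                out.append(c);
            }
        }
        return out.toString();
    }

    /** Replaces each non-breaking space (U+00A0) with an ordinary space. */
    static String convertNonBreakingSpaces(String text) {
        return text.replace('\u00a0', ' ');
    }

    public static void main(String[] args) {
        String raw = "Hello,\u00a0world\n\tagain";
        System.out.println(convertNonBreakingSpaces(removeEscapes(raw)));
    }
}
```

Inside a TextNode subclass, toPlainTextString() would then return something like TextCleanup.convertNonBreakingSpaces (super.toPlainTextString ()), in the same way the decoding example wraps Translate.decode.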
public class StringNodeFactory extends PrototypicalNodeFactory implements java.io.Serializable
| Modifier and Type | Field and Description |
|---|---|
| protected boolean | mConvertNonBreakingSpaces Deprecated. Flag to tell the parser to convert non-breaking spaces (from &nbsp; to a space " "). |
| protected boolean | mDecode Deprecated. Flag to tell the parser to decode strings returned by StringNode's toPlainTextString. |
| protected boolean | mRemoveEscapes Deprecated. Flag to tell the parser to remove escape characters, like \n and \t, returned by StringNode's toPlainTextString. |
Fields inherited from class PrototypicalNodeFactory: mBlastocyst, mRemark, mTag, mText

| Constructor and Description |
|---|
| StringNodeFactory() Deprecated. |
| Modifier and Type | Method and Description |
|---|---|
| Text | createStringNode(Page page, int start, int end) Deprecated. Create a new string node. |
| boolean | getConvertNonBreakingSpaces() Deprecated. Get the non-breaking space replacing state. |
| boolean | getDecode() Deprecated. Get the decoding state. |
| boolean | getRemoveEscapes() Deprecated. Get the escape removing state. |
| void | setConvertNonBreakingSpaces(boolean convert) Deprecated. Set the non-breaking space replacing state. |
| void | setDecode(boolean decode) Deprecated. Set the decoding state. |
| void | setRemoveEscapes(boolean remove) Deprecated. Set the escape removing state. |
Methods inherited from class PrototypicalNodeFactory: clear, createRemarkNode, createTagNode, get, getRemarkPrototype, getTagNames, getTagPrototype, getTextPrototype, put, registerTag, registerTags, remove, setRemarkPrototype, setTagPrototype, setTextPrototype, unregisterTag

protected boolean mDecode
protected boolean mRemoveEscapes
protected boolean mConvertNonBreakingSpaces

public Text createStringNode(Page page, int start, int end)
Specified by: createStringNode in interface NodeFactory
Overrides: createStringNode in class PrototypicalNodeFactory
Parameters:
page - The page the node is on.
start - The beginning position of the string.
end - The ending position of the string.

public void setDecode(boolean decode)
Parameters:
decode - If true, string nodes decode text using Translate.decode(java.lang.String).

public boolean getDecode()
Returns: true if string nodes decode text.

public void setRemoveEscapes(boolean remove)
Parameters:
remove - If true, string nodes remove escape characters.

public boolean getRemoveEscapes()

public void setConvertNonBreakingSpaces(boolean convert)
Parameters:
convert - If true, string nodes replace &nbsp; characters with spaces.

public boolean getConvertNonBreakingSpaces()
HTML Parser is an open source library released under LGPL.