Package org.apache.pdfbox.pdfparser
Class BruteForceParser
- java.lang.Object
  - org.apache.pdfbox.pdfparser.BaseParser
    - org.apache.pdfbox.pdfparser.COSParser
      - org.apache.pdfbox.pdfparser.BruteForceParser
- All Implemented Interfaces:
ICOSParser
public class BruteForceParser extends COSParser
Brute force parser, used as a last resort when a malformed PDF can't be read.
Field Summary
Fields (modifier and type, field, description)
- private java.util.Map<COSObjectKey,java.lang.Long> bfSearchCOSObjectKeyOffsets - Contains all found objects of a brute force search.
- private boolean bfSearchTriggered
- private static char[] EOF_MARKER - EOF-marker.
- private static org.apache.commons.logging.Log LOG
- private static long MINIMUM_SEARCH_OFFSET
- private static char[] OBJ_MARKER - obj-marker.
- private static char[] OBJ_STREAM - ObjStream-marker.
- private static char[] TRAILER_MARKER - trailer-marker.
- private static char[] XREF_STREAM
- private static char[] XREF_TABLE
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
fileLen, initialParseDone, securityHandler, SYSPROP_EOFLOOKUPRANGE, xrefTrailerResolver
-
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, MAX_LENGTH_LONG, N, O, R, S, source, STREAM_STRING, T
Constructor Summary
Constructors
- BruteForceParser(RandomAccessRead source, COSDocument document) - Constructor.
Method Summary
Methods (all instance methods, all concrete)
- private long bfSearchForLastEOFMarker() - Brute force search for the last EOF marker.
- private void bfSearchForObjects() - Brute force search for every object in the PDF.
- private java.util.Map<java.lang.Long,COSObjectKey> bfSearchForObjStreamOffsets() - Search for all offsets of object streams within the given PDF.
- protected void bfSearchForObjStreams(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler) - Brute force search for all object streams of a PDF.
- private boolean bfSearchForTrailer(COSDictionary trailer) - Brute force search for all trailer markers.
- protected long bfSearchForXRef(long xrefOffset) - Search for the offset of the given xref table/stream among those found by a brute force search.
- private java.util.List<java.lang.Long> bfSearchForXRefStreams() - Brute force search for all /XRef entries (streams).
- private java.util.List<java.lang.Long> bfSearchForXRefTables() - Brute force search for all xref entries (tables).
- public boolean bfSearchTriggered() - Indicates whether the brute force search for objects was triggered.
- private COSObject compareCOSObjects(COSObject newObject, java.lang.Long newOffset, COSObject currentObject)
- private long findString(char[] string) - Search for the given string.
- protected java.util.Map<COSObjectKey,java.lang.Long> getBFCOSObjectOffsets() - Returns all objects found by a brute force search.
- private boolean isCatalog(COSDictionary dictionary) - Tell if the dictionary is a PDF or FDF catalog.
- private boolean isInfo(COSDictionary dictionary) - Tell if the dictionary is an info dictionary.
- protected COSDictionary rebuildTrailer(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler) - Rebuild the trailer dictionary if startxref can't be found.
- private boolean searchForTrailerItems(COSDictionary trailer) - Search for the different parts of the trailer dictionary.
- private long searchNearestValue(java.util.List<java.lang.Long> values, long offset)
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, createRandomAccessReadView, dereferenceCOSObject, getAccessPermission, getEncryption, isLenient, isString, lastIndexOf, parseCOSStream, parseFDFHeader, parseObjectDynamically, parseObjectStreamObject, parsePDFHeader, parseXrefTable, prepareDecryption, resetTrailerResolver, retrieveTrailer, setEOFLookupRange, setLenient
-
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
getObjectKey, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOF, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipLinebreak, skipSpaces, skipWhiteSpaces
Field Detail
-
XREF_TABLE
private static final char[] XREF_TABLE
-
XREF_STREAM
private static final char[] XREF_STREAM
-
MINIMUM_SEARCH_OFFSET
private static final long MINIMUM_SEARCH_OFFSET
- See Also:
- Constant Field Values
-
EOF_MARKER
private static final char[] EOF_MARKER
Marker for the end-of-file comment (%%EOF).
-
OBJ_MARKER
private static final char[] OBJ_MARKER
Marker for the obj keyword that opens an indirect object.
-
TRAILER_MARKER
private static final char[] TRAILER_MARKER
Marker for the trailer keyword.
-
OBJ_STREAM
private static final char[] OBJ_STREAM
Marker for the /ObjStm type of an object stream.
-
LOG
private static final org.apache.commons.logging.Log LOG
-
bfSearchCOSObjectKeyOffsets
private final java.util.Map<COSObjectKey,java.lang.Long> bfSearchCOSObjectKeyOffsets
Contains all objects found by a brute force search, mapped to their offsets.
-
bfSearchTriggered
private boolean bfSearchTriggered
Constructor Detail
-
BruteForceParser
public BruteForceParser(RandomAccessRead source, COSDocument document) throws java.io.IOException
Constructor. Triggers a brute force search for all objects of the document.
- Parameters:
  source - input representing the PDF
  document - the corresponding COS document
- Throws:
  java.io.IOException - if the source data could not be read
Method Detail
-
bfSearchTriggered
public boolean bfSearchTriggered()
Indicates whether the brute force search for objects was triggered.
- Returns:
  true if the search was triggered
-
getBFCOSObjectOffsets
protected java.util.Map<COSObjectKey,java.lang.Long> getBFCOSObjectOffsets() throws java.io.IOException
Returns all objects found by a brute force search.
- Returns:
  a map containing all objects found by a brute force search
- Throws:
  java.io.IOException - if something went wrong
-
bfSearchForObjects
private void bfSearchForObjects() throws java.io.IOException
Brute force search for every object in the PDF.
- Throws:
  java.io.IOException - if something went wrong
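The idea behind a brute force object search can be sketched as a linear scan for the obj keyword, reading the generation and object numbers backwards from each hit. This is a minimal illustration under stated assumptions, not PDFBox's implementation: the class ObjectScanSketch and its findObjects method are hypothetical, the whole file is assumed to be in memory, keys are plain "objNum genNum" strings instead of COSObjectKey, and a later header simply overwrites an earlier one with the same key (PDFBox resolves duplicates with compareCOSObjects, whose rules are not documented here).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a brute force scan for "objNum genNum obj" headers. */
final class ObjectScanSketch {

    /** Maps "objNum genNum" to the byte offset where the object header starts. */
    static Map<String, Long> findObjects(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        for (int pos = text.indexOf("obj"); pos >= 0; pos = text.indexOf("obj", pos + 3)) {
            // "obj" must be preceded by whitespace; this also rejects the tail of "endobj".
            if (pos == 0 || !isWhitespace(text.charAt(pos - 1))) {
                continue;
            }
            int genEnd = skipWhitespaceBackwards(text, pos - 1);
            int genStart = digitRunStart(text, genEnd);
            if (genStart > genEnd || genEnd - genStart >= 5 || genStart == 0
                    || !isWhitespace(text.charAt(genStart - 1))) {
                continue; // no plausible generation number
            }
            int numEnd = skipWhitespaceBackwards(text, genStart - 1);
            int numStart = digitRunStart(text, numEnd);
            if (numStart > numEnd || numEnd - numStart >= 10) {
                continue; // no plausible object number
            }
            String key = Long.parseLong(text.substring(numStart, numEnd + 1)) + " "
                    + Integer.parseInt(text.substring(genStart, genEnd + 1));
            offsets.put(key, (long) numStart); // a later duplicate overwrites an earlier one
        }
        return offsets;
    }

    /** Index of the last non-whitespace char at or before i, or -1 if there is none. */
    private static int skipWhitespaceBackwards(String s, int i) {
        while (i >= 0 && isWhitespace(s.charAt(i))) {
            i--;
        }
        return i;
    }

    /** Start index of the ASCII digit run ending at end; returns end + 1 if s[end] is not a digit. */
    private static int digitRunStart(String s, int end) {
        int i = end;
        while (i >= 0 && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i--;
        }
        return i + 1;
    }

    /** PDF whitespace: NUL, TAB, LF, FF, CR and SPACE. */
    private static boolean isWhitespace(char c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' ';
    }
}
```

Requiring whitespace before obj and before the generation number keeps the scan cheap while filtering out most false hits such as endobj.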
-
bfSearchForXRef
protected long bfSearchForXRef(long xrefOffset) throws java.io.IOException
Search for the offset of the given xref table/stream among those found by a brute force search.
- Parameters:
  xrefOffset - the given offset to be searched for
- Returns:
  the offset of the xref entry
- Throws:
  java.io.IOException - if something went wrong
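Choosing the brute force candidate that lies closest to the offset recorded in the file can be sketched as below. This is an assumption about how the match is picked, suggested by the private searchNearestValue(List<Long>, long) helper; the class NearestOffsetSketch and the nearestXRef method that combines table and stream candidates are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: choose the brute force candidate closest to a recorded xref offset. */
final class NearestOffsetSketch {

    /** Returns the value closest to offset, or -1 if values is empty; ties keep the earlier entry. */
    static long searchNearestValue(List<Long> values, long offset) {
        long best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (long candidate : values) {
            long distance = Math.abs(offset - candidate);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = candidate;
            }
        }
        return best;
    }

    /** Combines xref table and xref stream candidates, preferring the table on a tie. */
    static long nearestXRef(List<Long> tables, List<Long> streams, long xrefOffset) {
        long table = searchNearestValue(tables, xrefOffset);
        long stream = searchNearestValue(streams, xrefOffset);
        if (table < 0) {
            return stream;
        }
        if (stream < 0) {
            return table;
        }
        return Math.abs(xrefOffset - table) <= Math.abs(xrefOffset - stream) ? table : stream;
    }
}
```

For example, with a table found at 1000, a stream found at 1200 and a recorded offset of 1150, nearestXRef picks 1200.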
-
searchNearestValue
private long searchNearestValue(java.util.List<java.lang.Long> values, long offset)
-
bfSearchForObjStreams
protected void bfSearchForObjStreams(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler) throws java.io.IOException
Brute force search for all object streams of a PDF.
- Parameters:
  trailerResolver - the trailer resolver of the document
  securityHandler - security handler to be used to decrypt encrypted documents
- Throws:
  java.io.IOException - if something went wrong
-
bfSearchForTrailer
private boolean bfSearchForTrailer(COSDictionary trailer) throws java.io.IOException
Brute force search for all trailer markers.
- Parameters:
  trailer - dictionary to be used as trailer dictionary
- Throws:
  java.io.IOException - if something went wrong
-
searchForTrailerItems
private boolean searchForTrailerItems(COSDictionary trailer) throws java.io.IOException
Search for the different parts of the trailer dictionary.
- Parameters:
  trailer - dictionary to be used as trailer dictionary
- Returns:
  true if the root was found, false if not
- Throws:
  java.io.IOException - if something went wrong
-
compareCOSObjects
private COSObject compareCOSObjects(COSObject newObject, java.lang.Long newOffset, COSObject currentObject)
-
bfSearchForLastEOFMarker
private long bfSearchForLastEOFMarker() throws java.io.IOException
Brute force search for the last EOF marker.
- Throws:
  java.io.IOException - if something went wrong
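Each incremental update appends its own %%EOF, so the last marker in the file is the one that closes the newest revision. A backwards scan finds it without reading the rest of the file forwards. The sketch below is hypothetical: it works on an in-memory byte array rather than PDFBox's RandomAccessRead, and it returns the start offset of the marker, which may differ from what the real method reports.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a backwards search for the last %%EOF marker. */
final class LastEofSketch {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Start offset of the last "%%EOF" in data, or -1 if the marker does not occur. */
    static long lastEofMarker(byte[] data) {
        // Scanning from the end means the first match is the last marker in the file.
        for (int start = data.length - EOF_MARKER.length; start >= 0; start--) {
            if (matchesAt(data, start)) {
                return start;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int start) {
        for (int i = 0; i < EOF_MARKER.length; i++) {
            if (data[start + i] != EOF_MARKER[i]) {
                return false;
            }
        }
        return true;
    }
}
```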
-
bfSearchForObjStreamOffsets
private java.util.Map<java.lang.Long,COSObjectKey> bfSearchForObjStreamOffsets() throws java.io.IOException
Search for all offsets of object streams within the given PDF.
- Returns:
  a map of all offsets for object streams
- Throws:
  java.io.IOException - if something went wrong
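If the object header offsets are already known from an object scan, one way to map each object stream back to its header is to look up the nearest header before every /ObjStm occurrence with TreeMap.floorEntry. Whether PDFBox does it this way is not stated here; ObjStmSketch, objStreamOffsets and the "objNum genNum" string keys are hypothetical stand-ins for the real method and COSObjectKey.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: locate object streams by their /ObjStm type entry. */
final class ObjStmSketch {

    /**
     * For every "/ObjStm" occurrence, returns the object header that starts closest before it.
     * headerOffsets maps header offset to "objNum genNum", e.g. from a prior object scan.
     */
    static Map<Long, String> objStreamOffsets(byte[] pdf, TreeMap<Long, String> headerOffsets) {
        String text = new String(pdf, StandardCharsets.ISO_8859_1); // indices equal byte offsets
        Map<Long, String> result = new TreeMap<>();
        for (int pos = text.indexOf("/ObjStm"); pos >= 0; pos = text.indexOf("/ObjStm", pos + 1)) {
            Map.Entry<Long, String> owner = headerOffsets.floorEntry((long) pos);
            if (owner != null) {
                result.put(owner.getKey(), owner.getValue());
            }
        }
        return result;
    }
}
```

The result is keyed by offset, matching the Map<Long, COSObjectKey> direction of bfSearchForObjStreamOffsets.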
-
bfSearchForXRefTables
private java.util.List<java.lang.Long> bfSearchForXRefTables() throws java.io.IOException
Brute force search for all xref entries (tables).
- Throws:
  java.io.IOException - if something went wrong
-
bfSearchForXRefStreams
private java.util.List<java.lang.Long> bfSearchForXRefStreams() throws java.io.IOException
Brute force search for all /XRef entries (streams).
- Throws:
  java.io.IOException - if something went wrong
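A simplified scan for both kinds of cross-reference data could look as follows. It looks for the xref keyword of a classic table, skipping the one inside startxref, and for the /XRef name used as the /Type of a cross-reference stream, skipping longer names such as /XRefStm. This is a hypothetical sketch (XrefScanSketch is not a PDFBox class): it works on text already decoded as ISO-8859-1 and reports the position of the marker itself, whereas the real methods may report the start of the enclosing object instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of scanning for xref tables and xref streams. */
final class XrefScanSketch {

    /** Offsets of "xref" keywords, excluding the one that is part of "startxref". */
    static List<Long> xrefTables(String text) {
        List<Long> result = new ArrayList<>();
        for (int pos = text.indexOf("xref"); pos >= 0; pos = text.indexOf("xref", pos + 1)) {
            boolean partOfStartxref = pos >= 5 && text.startsWith("start", pos - 5);
            if (!partOfStartxref) {
                result.add((long) pos);
            }
        }
        return result;
    }

    /** Offsets of the name "/XRef" when it is a complete name (so "/XRefStm" is skipped). */
    static List<Long> xrefStreams(String text) {
        List<Long> result = new ArrayList<>();
        for (int pos = text.indexOf("/XRef"); pos >= 0; pos = text.indexOf("/XRef", pos + 1)) {
            int after = pos + "/XRef".length();
            if (after == text.length() || endsName(text.charAt(after))) {
                result.add((long) pos);
            }
        }
        return result;
    }

    /** True for PDF whitespace and delimiter characters, which terminate a name. */
    private static boolean endsName(char c) {
        return c == 0 || c == '\t' || c == '\n' || c == '\f' || c == '\r' || c == ' '
                || "()<>[]{}/%".indexOf(c) >= 0;
    }
}
```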
-
isInfo
private boolean isInfo(COSDictionary dictionary)
Tell if the dictionary is an info dictionary.
- Parameters:
  dictionary - the dictionary to be checked
- Returns:
  true if the given dictionary is an info dictionary
-
isCatalog
private boolean isCatalog(COSDictionary dictionary)
Tell if the dictionary is a PDF or FDF catalog.
- Parameters:
  dictionary - the dictionary to be checked
- Returns:
  true if the given dictionary is a catalog (root) dictionary
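A minimal sketch of the two checks, with a dictionary modelled as a Map from key names (without the leading slash) to values. The catalog test (/Type /Catalog, or an /FDF entry, which every FDF catalog carries) follows from the description above. The info test is an assumed heuristic, not a statement of PDFBox's exact rules: it looks for typical document information keys and rejects dictionaries with /Parent, /A or /Dest, since outline items also carry a /Title. The class name is hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of catalog and info dictionary detection. */
final class DictionaryHeuristicsSketch {

    // Entries that commonly appear in a document information dictionary.
    private static final List<String> INFO_KEYS = List.of(
            "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate");

    /** A PDF catalog has /Type /Catalog; an FDF catalog carries an /FDF entry. */
    static boolean isCatalog(Map<String, Object> dictionary) {
        return "Catalog".equals(dictionary.get("Type")) || dictionary.containsKey("FDF");
    }

    /** Assumed heuristic: at least one info key, and no keys typical of outline items or actions. */
    static boolean isInfo(Map<String, Object> dictionary) {
        if (dictionary.containsKey("Parent") || dictionary.containsKey("A") || dictionary.containsKey("Dest")) {
            return false;
        }
        for (String key : INFO_KEYS) {
            if (dictionary.containsKey(key)) {
                return true;
            }
        }
        return false;
    }
}
```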
-
findString
private long findString(char[] string) throws java.io.IOException
Search for the given string, starting at the current position. Returns the start position of the next occurrence, or -1 if there is no further occurrence. Afterwards, the current position is either just past the end of the found string or at the end of the input.
- Parameters:
  string - the string to be searched for
- Returns:
  the start position of the found string, or -1 if it was not found
- Throws:
  java.io.IOException - if something went wrong
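The documented contract of findString (search from the current position, return the start offset or -1, and leave the position just past the match or at the end of the input) can be mirrored over an in-memory byte array. This naive O(n*m) sketch is hypothetical: PDFBox reads from its RandomAccessRead source instead, and the StringSearchSketch class and its position() accessor are for illustration only.

```java
/** Hypothetical sketch mirroring the documented findString contract over a byte array. */
final class StringSearchSketch {
    private final byte[] data;
    private int position; // stands in for the parser's current read position

    StringSearchSketch(byte[] data) {
        this.data = data;
    }

    int position() {
        return position;
    }

    /** Returns the start offset of the next occurrence of string, or -1 if there is none. */
    long findString(char[] string) {
        for (int start = position; start + string.length <= data.length; start++) {
            if (matchesAt(start, string)) {
                position = start + string.length; // just past the match
                return start;
            }
        }
        position = data.length; // no further occurrence: the position ends at the end of the input
        return -1;
    }

    private boolean matchesAt(int start, char[] string) {
        for (int i = 0; i < string.length; i++) {
            // Compare bytes as unsigned values against the (Latin-1) chars of the pattern.
            if ((data[start + i] & 0xFF) != string[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Calling findString repeatedly walks through every occurrence in order, because each successful call leaves the position right after the previous match.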
-
rebuildTrailer
protected COSDictionary rebuildTrailer(XrefTrailerResolver trailerResolver, SecurityHandler<? extends ProtectionPolicy> securityHandler) throws java.io.IOException
Rebuild the trailer dictionary if startxref can't be found.
- Parameters:
  trailerResolver - the trailer resolver of the document
  securityHandler - security handler to be used to decrypt encrypted documents
- Returns:
  the rebuilt trailer dictionary
- Throws:
  java.io.IOException - if something went wrong