Package org.apache.pdfbox.pdfparser
Class COSParser
- java.lang.Object
  - org.apache.pdfbox.pdfparser.BaseParser
    - org.apache.pdfbox.pdfparser.COSParser
public class COSParser extends BaseParser
PDF parser that first reads the startxref marker and the xref tables in order to know which objects are valid, and then parses only these objects. PDFParser.parse() or FDFParser.parse() must be called first, before page objects can be retrieved, e.g. via PDFParser.getPDDocument(). This class is a much enhanced version of QuickParser presented in PDFBOX-1104 by Jeremy Villalobos.
-
Field Summary
Fields
Modifier and Type | Field | Description
private AccessPermission | accessPermission |
private java.util.Map<COSObjectKey,java.lang.Long> | bfSearchCOSObjectKeyOffsets | Contains all found objects of a brute force search.
private java.util.List<java.lang.Long> | bfSearchXRefStreamsOffsets |
private java.util.List<java.lang.Long> | bfSearchXRefTablesOffsets |
private static int | DEFAULT_TRAIL_BYTECOUNT | How many trailing bytes to read for EOF marker.
private PDEncryption | encryption |
private static byte[] | ENDOBJ |
private static byte[] | ENDSTREAM |
protected static char[] | EOF_MARKER | EOF-marker.
private static java.lang.String | FDF_DEFAULT_VERSION |
private static java.lang.String | FDF_HEADER |
protected long | fileLen | File length.
protected boolean | initialParseDone |
private boolean | isLenient | Whether the parser uses its auto-healing capability.
private java.lang.String | keyAlias |
private java.io.InputStream | keyStoreInputStream |
private java.lang.Long | lastEOFMarker |
private static org.apache.commons.logging.Log | LOG |
private static long | MINIMUM_SEARCH_OFFSET |
protected static char[] | OBJ_MARKER | obj-marker.
private static char[] | OBJ_STREAM | ObjStream-marker.
private java.lang.String | password |
private static java.lang.String | PDF_DEFAULT_VERSION |
private static java.lang.String | PDF_HEADER |
private int | readTrailBytes | How many trailing bytes to read for EOF marker.
protected SecurityHandler | securityHandler | The security handler.
protected RandomAccessRead | source |
private static char[] | STARTXREF |
private byte[] | streamCopyBuf |
private static int | STREAMCOPYBUFLEN |
private byte[] | strmBuf |
private static int | STRMBUFLEN |
static java.lang.String | SYSPROP_EOFLOOKUPRANGE | The range within which the %%EOF marker will be searched.
static java.lang.String | SYSPROP_PARSEMINIMAL | Only parse the PDF file minimally, allowing access to basic information.
static java.lang.String | TMP_FILE_PREFIX | The prefix for the temp file being used.
private static char[] | TRAILER_MARKER | trailer-marker.
private long | trailerOffset |
private boolean | trailerWasRebuild |
private static int | X |
private static char[] | XREF_STREAM |
private static char[] | XREF_TABLE |
protected XrefTrailerResolver | xrefTrailerResolver | Collects all Xref/trailer objects and resolves them into a single object using the startxref reference.
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, MAX_LENGTH_LONG, N, O, R, S, seqSource, STREAM_STRING, T
-
Constructor Summary
Constructors
Constructor | Description
COSParser(RandomAccessRead source) | Default constructor.
COSParser(RandomAccessRead source, java.lang.String password, java.io.InputStream keyStore, java.lang.String keyAlias) | Constructor for encrypted pdfs.
-
Method Summary
Modifier and Type | Method | Description
private void | addExcludedToList(COSName[] excludeObjects, COSDictionary dict, java.util.Set<java.lang.Long> parsedObjects) |
private void | addNewToList(java.util.Queue<COSBase> toBeParsedList, java.util.Collection<COSBase> newObjects, java.util.Set<java.lang.Long> addedObjects) | Adds all from newObjects to toBeParsedList if it is not a COSObject or we didn't add this COSObject already (checked via addedObjects).
private void | addNewToList(java.util.Queue<COSBase> toBeParsedList, COSBase newObject, java.util.Set<java.lang.Long> addedObjects) | Adds newObject to toBeParsedList if it is not a COSObject or we didn't add this COSObject already (checked via addedObjects).
private void | bfSearchForLastEOFMarker() | Brute force search for the last EOF marker.
private void | bfSearchForObjects() | Brute force search for every object in the pdf.
private void | bfSearchForObjStreams() | Brute force search for all object streams.
private boolean | bfSearchForTrailer(COSDictionary trailer) | Brute force search for all trailer markers.
private long | bfSearchForXRef(long xrefOffset, boolean streamsOnly) | Search for the offset of the given xref table/stream among those found by a brute force search.
private void | bfSearchForXRefStreams() | Brute force search for all /XRef entries (streams).
private void | bfSearchForXRefTables() | Brute force search for all xref entries (tables).
private long | calculateXRefFixedOffset(long objectOffset, boolean streamsOnly) | Try to find a fixed offset for the given xref table/stream.
protected void | checkPages(COSDictionary root) | Check if all entries of the pages dictionary are present.
private int | checkPagesDictionary(COSDictionary pagesDict, java.util.Set<COSObject> set) |
private long | checkXRefOffset(long startXRefOffset) | Check if the cross reference table/stream can be found at the current offset.
private void | checkXrefOffsets() | Check the XRef table by dereferencing all objects and fixing the offset if necessary.
private boolean | checkXRefStreamOffset(long startXRefOffset) | Check if the cross reference stream can be found at the current offset.
private COSObject | compareCOSObjects(COSObject newObject, java.lang.Long newOffset, COSObject currentObject, java.lang.Long currentOffset) |
private COSObjectKey | findObjectKey(COSObjectKey objectKey, long offset, java.util.Map<COSObjectKey,java.lang.Long> xrefOffset) | Check if the given object can be found at the given offset.
AccessPermission | getAccessPermission() | This will get the AccessPermission.
COSDocument | getDocument() | This will get the document that was parsed.
PDEncryption | getEncryption() | This will get the encryption dictionary.
private COSNumber | getLength(COSBase lengthBaseObj, COSName streamType) | Returns length value referred to or defined in given object.
private long | getObjectId(COSObject obj) | Creates a unique object id using object number and object generation number.
protected long | getStartxrefOffset() | Looks for and parses startxref.
protected boolean | isCatalog(COSDictionary dictionary) | Tell if the dictionary is a PDF catalog.
private boolean | isInfo(COSDictionary dictionary) | Tell if the dictionary is an info dictionary.
boolean | isLenient() | Return true if parser is lenient.
private boolean | isString(byte[] string) | Checks if the given string can be found at the current offset.
private boolean | isString(char[] string) | Checks if the given string can be found at the current offset.
protected int | lastIndexOf(char[] pattern, byte[] buf, int endOff) | Searches the last appearance of pattern within buffer.
protected COSStream | parseCOSStream(COSDictionary dic) | This will read a COSStream from the input stream using the length attribute within the dictionary.
private void | parseDictionaryRecursive(COSObject dictionaryObject) | Resolves all not already parsed objects of a dictionary recursively.
protected void | parseDictObjects(COSDictionary dict, COSName... excludeObjects) | Will parse every object necessary to load a single page from the pdf document.
protected boolean | parseFDFHeader() | Parse the header of a fdf.
private void | parseFileObject(java.lang.Long offsetOrObjstmObNr, COSObjectKey objKey, COSObject pdfObject) |
private boolean | parseHeader(java.lang.String headerMarker, java.lang.String defaultVersion) |
protected COSBase | parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) | This will parse the next object from the stream and add it to the local state.
protected COSBase | parseObjectDynamically(COSObject obj, boolean requireExistingNotCompressedObj) | This will parse the next object from the stream and add it to the local state.
private void | parseObjectStream(int objstmObjNr) |
protected boolean | parsePDFHeader() | Parse the header of a pdf.
private long | parseStartXref() | This will parse the startxref section from the stream.
private boolean | parseTrailer() | This will parse the trailer from the stream and add it to the state.
protected COSBase | parseTrailerValuesDynamically(COSDictionary trailer) | Parse the values of the trailer dictionary and return the root object.
protected COSDictionary | parseXref(long startXRefOffset) | Parses cross reference tables.
private long | parseXrefObjStream(long objByteOffset, boolean isStandalone) | Parses an xref object stream starting with indirect object id.
private void | parseXrefStream(COSStream stream, long objByteOffset, boolean isStandalone) | Fills XRefTrailerResolver with data of given stream.
protected boolean | parseXrefTable(long startByteOffset) | This will parse the xref table from the stream and add it to the state. The XrefTable contents are ignored.
private void | prepareDecryption() | Prepare for decryption.
private void | readUntilEndStream(java.io.OutputStream out) | This method will read through the current stream object until we find the keyword "endstream", meaning we're at the end of this object.
private void | readValidStream(java.io.OutputStream out, COSNumber streamLengthObj) |
protected COSDictionary | rebuildTrailer() | Rebuild the trailer dictionary if startxref can't be found.
private COSDictionary | retrieveCOSDictionary(COSObject object) |
private COSDictionary | retrieveCOSDictionary(COSObjectKey key, long offset) |
protected COSDictionary | retrieveTrailer() | Read the trailer information and provide a COSDictionary containing the trailer information.
private boolean | searchForTrailerItems(COSDictionary trailer) | Search for the different parts of the trailer dictionary.
private long | searchNearestValue(java.util.List<java.lang.Long> values, long offset) |
void | setEOFLookupRange(int byteCount) | Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker.
void | setLenient(boolean lenient) | Change the parser leniency flag.
private boolean | validateStreamLength(long streamLength) |
private boolean | validateXrefOffsets(java.util.Map<COSObjectKey,java.lang.Long> xrefOffset) |
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseBoolean, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipSpaces, skipWhiteSpaces
-
Field Detail
-
PDF_HEADER
private static final java.lang.String PDF_HEADER
- See Also:
- Constant Field Values
-
FDF_HEADER
private static final java.lang.String FDF_HEADER
- See Also:
- Constant Field Values
-
PDF_DEFAULT_VERSION
private static final java.lang.String PDF_DEFAULT_VERSION
- See Also:
- Constant Field Values
-
FDF_DEFAULT_VERSION
private static final java.lang.String FDF_DEFAULT_VERSION
- See Also:
- Constant Field Values
-
XREF_TABLE
private static final char[] XREF_TABLE
-
XREF_STREAM
private static final char[] XREF_STREAM
-
STARTXREF
private static final char[] STARTXREF
-
ENDSTREAM
private static final byte[] ENDSTREAM
-
ENDOBJ
private static final byte[] ENDOBJ
-
MINIMUM_SEARCH_OFFSET
private static final long MINIMUM_SEARCH_OFFSET
- See Also:
- Constant Field Values
-
X
private static final int X
- See Also:
- Constant Field Values
-
STRMBUFLEN
private static final int STRMBUFLEN
- See Also:
- Constant Field Values
-
strmBuf
private final byte[] strmBuf
-
source
protected final RandomAccessRead source
-
accessPermission
private AccessPermission accessPermission
-
keyStoreInputStream
private java.io.InputStream keyStoreInputStream
-
password
private java.lang.String password
-
keyAlias
private java.lang.String keyAlias
-
SYSPROP_PARSEMINIMAL
public static final java.lang.String SYSPROP_PARSEMINIMAL
Only parse the PDF file minimally, allowing access to basic information.
- See Also:
- Constant Field Values
-
SYSPROP_EOFLOOKUPRANGE
public static final java.lang.String SYSPROP_EOFLOOKUPRANGE
The range within which the %%EOF marker will be searched. Useful if there are additional characters after %%EOF within the PDF.
- See Also:
- Constant Field Values
-
DEFAULT_TRAIL_BYTECOUNT
private static final int DEFAULT_TRAIL_BYTECOUNT
How many trailing bytes to read for EOF marker.
- See Also:
- Constant Field Values
-
EOF_MARKER
protected static final char[] EOF_MARKER
EOF-marker.
-
OBJ_MARKER
protected static final char[] OBJ_MARKER
obj-marker.
-
TRAILER_MARKER
private static final char[] TRAILER_MARKER
trailer-marker.
-
OBJ_STREAM
private static final char[] OBJ_STREAM
ObjStream-marker.
-
trailerOffset
private long trailerOffset
-
fileLen
protected long fileLen
file length.
-
isLenient
private boolean isLenient
Whether the parser uses its auto-healing capability.
-
initialParseDone
protected boolean initialParseDone
-
trailerWasRebuild
private boolean trailerWasRebuild
-
bfSearchCOSObjectKeyOffsets
private java.util.Map<COSObjectKey,java.lang.Long> bfSearchCOSObjectKeyOffsets
Contains all found objects of a brute force search.
-
lastEOFMarker
private java.lang.Long lastEOFMarker
-
bfSearchXRefTablesOffsets
private java.util.List<java.lang.Long> bfSearchXRefTablesOffsets
-
bfSearchXRefStreamsOffsets
private java.util.List<java.lang.Long> bfSearchXRefStreamsOffsets
-
encryption
private PDEncryption encryption
-
securityHandler
protected SecurityHandler securityHandler
The security handler.
-
readTrailBytes
private int readTrailBytes
How many trailing bytes to read for EOF marker.
-
LOG
private static final org.apache.commons.logging.Log LOG
-
xrefTrailerResolver
protected XrefTrailerResolver xrefTrailerResolver
Collects all Xref/trailer objects and resolves them into a single object using the startxref reference.
-
TMP_FILE_PREFIX
public static final java.lang.String TMP_FILE_PREFIX
The prefix for the temp file being used.
- See Also:
- Constant Field Values
-
STREAMCOPYBUFLEN
private static final int STREAMCOPYBUFLEN
- See Also:
- Constant Field Values
-
streamCopyBuf
private final byte[] streamCopyBuf
-
Constructor Detail
-
COSParser
public COSParser(RandomAccessRead source)
Default constructor.
- Parameters:
source- input representing the pdf.
-
COSParser
public COSParser(RandomAccessRead source, java.lang.String password, java.io.InputStream keyStore, java.lang.String keyAlias)
Constructor for encrypted pdfs.
- Parameters:
source- input representing the pdf.
password- password to be used for decryption.
keyStore- key store to be used for decryption when using public key security
keyAlias- alias to be used for decryption when using public key security
-
Method Detail
-
setEOFLookupRange
public void setEOFLookupRange(int byteCount)
Sets how many trailing bytes of the PDF file are searched for the EOF marker and the 'startxref' marker. If not set, the default value DEFAULT_TRAIL_BYTECOUNT is used. The new value is checked to be at least 16. However, for practical use cases this value should not be lower than 1000; even 2000 was found not to be enough in some cases where trailing garbage, such as HTML snippets, followed the EOF marker.
If the system property SYSPROP_EOFLOOKUPRANGE is defined, this value is set on initialization but can be overwritten later.
- Parameters:
byteCount- number of trailing bytes
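The system-property override described above can be sketched in plain Java. This is not PDFBox code: the property name passed to resolve is a hypothetical placeholder for the value of SYSPROP_EOFLOOKUPRANGE (listed under Constant Field Values), and fallback stands in for DEFAULT_TRAIL_BYTECOUNT.

```java
/**
 * Sketch only: resolves a trailing-byte lookup range from a system property.
 * The property name and the fallback are supplied by the caller; nothing here is PDFBox API.
 */
final class EofLookupRangeSketch {
    /** Smallest accepted range, mirroring the "at least 16" check described above. */
    static final int MIN_RANGE = 16;

    /** Returns the property value if it parses and is >= MIN_RANGE, otherwise the fallback. */
    static int resolve(String propertyName, int fallback) {
        String raw = System.getProperty(propertyName);
        if (raw == null) {
            return fallback; // property not set: keep the default
        }
        try {
            int value = Integer.parseInt(raw.trim());
            return value >= MIN_RANGE ? value : fallback; // too small: ignore it
        } catch (NumberFormatException e) {
            return fallback; // not a number: ignore it
        }
    }
}
```

For example, `EofLookupRangeSketch.resolve("example.eofLookupRange", 2048)` returns 2048 unless the (hypothetical) property holds a number of at least 16; 2048 is an arbitrary example value here.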
-
retrieveTrailer
protected COSDictionary retrieveTrailer() throws java.io.IOException
Read the trailer information and provide a COSDictionary containing the trailer information.
- Returns:
- a COSDictionary containing the trailer information
- Throws:
java.io.IOException- if something went wrong
-
parseXref
protected COSDictionary parseXref(long startXRefOffset) throws java.io.IOException
Parses cross reference tables.
- Parameters:
startXRefOffset- start offset of the first table
- Returns:
- the trailer dictionary
- Throws:
java.io.IOException- if something went wrong
-
parseXrefObjStream
private long parseXrefObjStream(long objByteOffset, boolean isStandalone) throws java.io.IOException
Parses an xref object stream starting with indirect object id.
- Returns:
- value of PREV item in dictionary or -1 if no such item exists
- Throws:
java.io.IOException
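parseXref and parseXrefObjStream walk a chain of cross reference sections: each section's trailer or xref stream dictionary may name the previous section through /Prev. A minimal sketch of that walk, assuming the /Prev values have already been read into a map (a hypothetical input, not a PDFBox structure), with a guard so that a malformed file whose /Prev entries form a loop cannot keep the traversal going forever:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PrevChainSketch {
    /**
     * prevByOffset maps each xref section's byte offset to its /Prev value (-1 when there is none).
     * Returns the sections in visiting order, newest first, stopping at an unknown offset or a loop.
     */
    static List<Long> sectionOrder(Map<Long, Long> prevByOffset, long startxref) {
        List<Long> order = new ArrayList<>();
        Set<Long> seen = new HashSet<>();
        long offset = startxref;
        // seen.add returns false for an offset already visited, which breaks a /Prev cycle
        while (offset >= 0 && prevByOffset.containsKey(offset) && seen.add(offset)) {
            order.add(offset);
            offset = prevByOffset.get(offset);
        }
        return order;
    }
}
```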
-
getStartxrefOffset
protected final long getStartxrefOffset() throws java.io.IOException
Looks for and parses startxref. We first look for the last '%%EOF' marker within the last DEFAULT_TRAIL_BYTECOUNT bytes (or the range set via setEOFLookupRange(int)) and then go back to find startxref.
- Returns:
- the offset of StartXref
- Throws:
java.io.IOException- If something went wrong.
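The lookup described above (last '%%EOF' inside the trailing window, then back to 'startxref') can be sketched over an in-memory byte array. This is a simplified illustration, not the parser's implementation: it assumes the whole file is available as a byte[], and it simply gives up with -1 when a marker is missing instead of falling back to a brute force search.

```java
import java.nio.charset.StandardCharsets;

final class StartXrefSketch {
    /**
     * Looks for the last "%%EOF" within the final lookupRange bytes, then for the last
     * "startxref" before it, and parses the number that follows. Returns -1 on failure.
     */
    static long startxrefOffset(byte[] file, int lookupRange) {
        int tailStart = Math.max(0, file.length - lookupRange);
        // ISO-8859-1 decodes one byte to one char, so string indices and byte positions line up
        String tail = new String(file, tailStart, file.length - tailStart, StandardCharsets.ISO_8859_1);
        int eof = tail.lastIndexOf("%%EOF");
        if (eof < 0) {
            return -1; // simplification: no fallback when the EOF marker is missing
        }
        int keyword = tail.lastIndexOf("startxref", eof);
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        while (i < eof && Character.isWhitespace(tail.charAt(i))) {
            i++; // skip the EOL between keyword and number
        }
        int digitsStart = i;
        while (i < eof && tail.charAt(i) >= '0' && tail.charAt(i) <= '9') {
            i++;
        }
        return i > digitsStart ? Long.parseLong(tail.substring(digitsStart, i)) : -1;
    }
}
```

Trailing garbage after %%EOF (the HTML case mentioned under setEOFLookupRange) is harmless as long as the window is large enough to still contain the marker.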
-
lastIndexOf
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Searches the last appearance of pattern within buffer. The lookup starts before endOff and goes back until 0.
- Parameters:
pattern- pattern to search for
buf- buffer to search pattern in
endOff- offset (exclusive) where lookup starts at
- Returns:
- start offset of pattern within buffer or -1 if pattern could not be found
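A straightforward version with the same parameter list, written as a plain backward scan. It is an illustrative sketch and makes no claim to match the optimized loop in COSParser; it assumes ASCII patterns and endOff <= buf.length, and only reports matches that end before endOff.

```java
final class LastIndexOfSketch {
    /**
     * Returns the start offset of the last occurrence of pattern that lies entirely
     * before endOff (exclusive) in buf, or -1 if there is none.
     */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        // right-most start position whose match would still end before endOff
        for (int start = endOff - pattern.length; start >= 0; start--) {
            int i = 0;
            while (i < pattern.length && buf[start + i] == (byte) pattern[i]) {
                i++;
            }
            if (i == pattern.length) {
                return start; // every character matched
            }
        }
        return -1;
    }
}
```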
-
isLenient
public boolean isLenient()
Return true if parser is lenient, meaning the auto-healing capacity of the parser is used.
- Returns:
- true if parser is lenient
-
setLenient
public void setLenient(boolean lenient)
Change the parser leniency flag. This method can only be called before the parsing of the file.
- Parameters:
lenient- try to handle malformed PDFs.
-
getObjectId
private long getObjectId(COSObject obj)
Creates a unique object id using object number and object generation number (requires object number < 2^31).
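One way to build such an id is to pack both numbers into a long. The sketch below assumes the object number goes into the upper 32 bits and the generation number into the lower 32 bits; that layout is an assumption consistent with the "< 2^31" requirement (the shifted value must stay positive), not a quote of PDFBox internals.

```java
final class ObjectIdSketch {
    /**
     * Packs objNr into the upper and genNr into the lower 32 bits of a long.
     * Object numbers must be below 2^31 so the shifted value stays positive.
     */
    static long objectId(long objNr, int genNr) {
        if (objNr < 0 || objNr >= (1L << 31)) {
            throw new IllegalArgumentException("object number out of range: " + objNr);
        }
        return (objNr << 32) | (genNr & 0xFFFFFFFFL);
    }
}
```

An id like this is a cheap key for the Set<java.lang.Long> parameters used by addNewToList and addExcludedToList.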
-
addNewToList
private void addNewToList(java.util.Queue<COSBase> toBeParsedList, java.util.Collection<COSBase> newObjects, java.util.Set<java.lang.Long> addedObjects)
Adds all from newObjects to toBeParsedList if it is not a COSObject or we didn't add this COSObject already (checked via addedObjects).
-
addNewToList
private void addNewToList(java.util.Queue<COSBase> toBeParsedList, COSBase newObject, java.util.Set<java.lang.Long> addedObjects)
Adds newObject to toBeParsedList if it is not a COSObject or we didn't add this COSObject already (checked via addedObjects). Simple objects are not added because nothing is done with them when toBeParsedList is processed.
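The two addNewToList rules above can be sketched with small stand-in types. Node, Ref, Container and Simple are hypothetical substitutes for COSBase, COSObject, dictionaries/arrays and simple values; they are not PDFBox classes, and the id packing is the same assumed layout as for getObjectId.

```java
import java.util.Collection;
import java.util.Queue;
import java.util.Set;

final class ParseQueueSketch {
    /** Hypothetical stand-in for COSBase. */
    interface Node {}

    /** Stands in for an indirect reference (COSObject). */
    static final class Ref implements Node {
        final long objNr;
        final int genNr;

        Ref(long objNr, int genNr) {
            this.objNr = objNr;
            this.genNr = genNr;
        }
    }

    /** Stands in for a dictionary or array, which may contain further references. */
    static final class Container implements Node {}

    /** Stands in for a number, name, string or other simple value. */
    static final class Simple implements Node {}

    static void addNewToList(Queue<Node> toBeParsed, Node newObject, Set<Long> addedObjects) {
        if (newObject instanceof Ref) {
            Ref ref = (Ref) newObject;
            long id = (ref.objNr << 32) | (ref.genNr & 0xFFFFFFFFL);
            if (addedObjects.add(id)) { // false when this reference was queued before
                toBeParsed.add(ref);
            }
        } else if (newObject instanceof Container) {
            toBeParsed.add(newObject);
        }
        // Simple values are skipped: nothing is done with them when the queue is processed.
    }

    static void addNewToList(Queue<Node> toBeParsed, Collection<? extends Node> newObjects, Set<Long> addedObjects) {
        for (Node n : newObjects) {
            addNewToList(toBeParsed, n, addedObjects);
        }
    }
}
```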
-
parseDictObjects
protected void parseDictObjects(COSDictionary dict, COSName... excludeObjects) throws java.io.IOException
Will parse every object necessary to load a single page from the pdf document. We try our best to order objects according to offset in file before reading to minimize seek operations.
- Parameters:
dict- the COSObject from the parent pages.
excludeObjects- dictionary object reference entries with these names will not be parsed
- Throws:
java.io.IOException- if something went wrong
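Ordering reads by file offset, as described above, amounts to sorting the pending objects by their xref offsets before reading them. A minimal sketch, assuming object keys are represented as plain strings such as "12 0" (a hypothetical stand-in for COSObjectKey):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class OffsetOrderSketch {
    /** Returns the keys ordered by ascending file offset, so objects are read front to back. */
    static List<String> readOrder(Map<String, Long> offsetByKey) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(offsetByKey.entrySet());
        entries.sort(Map.Entry.comparingByValue()); // ascending offset
        List<String> order = new ArrayList<>();
        for (Map.Entry<String, Long> e : entries) {
            order.add(e.getKey());
        }
        return order;
    }
}
```

Reading in ascending offset order keeps the underlying RandomAccessRead moving forward instead of jumping back and forth.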
-
addExcludedToList
private void addExcludedToList(COSName[] excludeObjects, COSDictionary dict, java.util.Set<java.lang.Long> parsedObjects)
-
parseObjectDynamically
protected final COSBase parseObjectDynamically(COSObject obj, boolean requireExistingNotCompressedObj) throws java.io.IOException
This will parse the next object from the stream and add it to the local state.
- Parameters:
obj- object to be parsed (we only take object number and generation number for lookup start offset)
requireExistingNotCompressedObj- if true, the object to be parsed must not be contained within a compressed stream
- Returns:
- the parsed object (which is also added to document object)
- Throws:
java.io.IOException- If an IO error occurs.
-
parseObjectDynamically
protected COSBase parseObjectDynamically(long objNr, int objGenNr, boolean requireExistingNotCompressedObj) throws java.io.IOException
This will parse the next object from the stream and add it to the local state. It's reduced to parsing an indirect object.
- Parameters:
objNr- object number of object to be parsed
objGenNr- object generation number of object to be parsed
requireExistingNotCompressedObj- if true, the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within an object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
- Returns:
- the parsed object (which is also added to document object)
- Throws:
java.io.IOException- If an IO error occurs.
-
parseFileObject
private void parseFileObject(java.lang.Long offsetOrObjstmObNr, COSObjectKey objKey, COSObject pdfObject) throws java.io.IOException
- Throws:
java.io.IOException
-
parseObjectStream
private void parseObjectStream(int objstmObjNr) throws java.io.IOException
- Throws:
java.io.IOException
-
getLength
private COSNumber getLength(COSBase lengthBaseObj, COSName streamType) throws java.io.IOException
Returns length value referred to or defined in given object.
- Throws:
java.io.IOException
-
parseCOSStream
protected COSStream parseCOSStream(COSDictionary dic) throws java.io.IOException
This will read a COSStream from the input stream using the length attribute within the dictionary. If the length attribute is an indirect reference, it is first resolved to get the stream length. This means we copy stream data without testing for 'endstream' or 'endobj', and thus it is no problem if these keywords occur within the stream. We require 'endstream' to be found after stream data is read.
- Parameters:
dic- dictionary that goes with this stream.
- Returns:
- parsed pdf stream.
- Throws:
java.io.IOException- if an error occurred reading the stream, like problems with reading length attribute, stream does not end with 'endstream' after data read, stream too short etc.
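The length-based copy described above can be sketched over a byte array: exactly /Length bytes are taken without inspecting them, and only afterwards is 'endstream' required. This is a simplified sketch, not PDFBox's implementation; it assumes the /Length value is already resolved and that dataStart points just past the EOL following the 'stream' keyword, and it throws unchecked exceptions where the real method throws IOException.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StreamCopySketch {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** Copies exactly length bytes from dataStart, then requires "endstream" after optional whitespace. */
    static byte[] readStream(byte[] file, int dataStart, int length) {
        if (length < 0 || dataStart + length > file.length) {
            throw new IllegalArgumentException("stream too short for /Length " + length);
        }
        // stream data is copied blindly, so "endstream" inside the data is harmless
        byte[] data = Arrays.copyOfRange(file, dataStart, dataStart + length);
        int pos = dataStart + length;
        // an EOL (or stray spaces) may precede the keyword
        while (pos < file.length && (file[pos] == '\r' || file[pos] == '\n' || file[pos] == ' ')) {
            pos++;
        }
        boolean found = pos + ENDSTREAM.length <= file.length
                && Arrays.equals(Arrays.copyOfRange(file, pos, pos + ENDSTREAM.length), ENDSTREAM);
        if (!found) {
            throw new IllegalStateException("'endstream' not found after stream data");
        }
        return data;
    }
}
```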
-
readUntilEndStream
private void readUntilEndStream(java.io.OutputStream out) throws java.io.IOException
This method will read through the current stream object until we find the keyword "endstream", meaning we're at the end of this object. Some pdf files, however, forget to write some endstream tags and just close off objects with an "endobj" tag, so we have to handle this case as well. This method is optimized using buffered IO and a reduced number of byte compare operations.
- Parameters:
out- stream we write out to.
- Throws:
java.io.IOException- if something went wrong
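The keyword-based fallback can be sketched as "whichever of endstream or endobj comes first ends the data". This unoptimized version decodes the whole array to a string and is only an illustration of the rule, not the buffered implementation described above.

```java
import java.nio.charset.StandardCharsets;

final class EndKeywordScanSketch {
    /**
     * Returns the offset of whichever comes first after dataStart: "endstream" or "endobj"
     * (the latter covers files that omit "endstream"); -1 if neither occurs.
     */
    static int dataEnd(byte[] file, int dataStart) {
        // ISO-8859-1 decodes one byte to one char, so string indices equal byte offsets
        String s = new String(file, StandardCharsets.ISO_8859_1);
        int endStream = s.indexOf("endstream", dataStart);
        int endObj = s.indexOf("endobj", dataStart);
        if (endStream < 0) {
            return endObj;
        }
        if (endObj < 0) {
            return endStream;
        }
        return Math.min(endStream, endObj);
    }
}
```

Unlike the length-based copy in parseCOSStream, a scan like this would cut the data short if the stream content itself contained one of the keywords, which is why it only serves as a fallback.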
-
readValidStream
private void readValidStream(java.io.OutputStream out, COSNumber streamLengthObj) throws java.io.IOException
- Throws:
java.io.IOException
-
validateStreamLength
private boolean validateStreamLength(long streamLength) throws java.io.IOException
- Throws:
java.io.IOException
-
checkXRefOffset
private long checkXRefOffset(long startXRefOffset) throws java.io.IOException
Check if the cross reference table/stream can be found at the current offset.
- Parameters:
startXRefOffset-
- Returns:
- the revised offset
- Throws:
java.io.IOException
-
checkXRefStreamOffset
private boolean checkXRefStreamOffset(long startXRefOffset) throws java.io.IOException
Check if the cross reference stream can be found at the current offset.
- Parameters:
startXRefOffset- the expected start offset of the XRef stream
- Returns:
- the revised offset
- Throws:
java.io.IOException- if something went wrong
-
calculateXRefFixedOffset
private long calculateXRefFixedOffset(long objectOffset, boolean streamsOnly) throws java.io.IOException
Try to find a fixed offset for the given xref table/stream.
- Parameters:
objectOffset- the given offset where to look at
streamsOnly- search for xref streams only
- Returns:
- the fixed offset
- Throws:
java.io.IOException- if something went wrong
-
validateXrefOffsets
private boolean validateXrefOffsets(java.util.Map<COSObjectKey,java.lang.Long> xrefOffset) throws java.io.IOException
- Throws:
java.io.IOException
-
checkXrefOffsets
private void checkXrefOffsets() throws java.io.IOException
Check the XRef table by dereferencing all objects and fixing the offset if necessary.
- Throws:
java.io.IOException- if something went wrong.
-
findObjectKey
private COSObjectKey findObjectKey(COSObjectKey objectKey, long offset, java.util.Map<COSObjectKey,java.lang.Long> xrefOffset) throws java.io.IOException
Check if the given object can be found at the given offset. Returns the provided object key if everything is ok. If the generation number differs, it will be fixed and a new object key is returned.
- Parameters:
objectKey- the key of object we are looking for
offset- the offset where to look
xrefOffset- a map with all known xref entries
- Returns:
- returns the found/fixed object key
- Throws:
java.io.IOException- if something went wrong
-
bfSearchForObjects
private void bfSearchForObjects() throws java.io.IOException
Brute force search for every object in the pdf.
- Throws:
java.io.IOException- if something went wrong
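A brute force object search boils down to scanning the whole file for "objNr genNr obj" headers and recording where each one starts. The sketch below uses a regular expression over an in-memory copy of the file; the string keys are a hypothetical stand-in for COSObjectKey, and keeping the last occurrence of a duplicated key is an assumption based on incremental updates appending newer definitions later in the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BruteForceObjectScanSketch {
    // "<object number> <generation number> obj" as a standalone token sequence
    private static final Pattern OBJ_HEADER = Pattern.compile("\\b(\\d+)\\s+(\\d+)\\s+obj\\b");

    /** Maps "objNr genNr" to the byte offset of its header; a repeated key keeps its last offset. */
    static Map<String, Long> scan(byte[] file) {
        // ISO-8859-1 decodes one byte to one char, so match positions are byte offsets
        String s = new String(file, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        Matcher m = OBJ_HEADER.matcher(s);
        while (m.find()) {
            offsets.put(m.group(1) + " " + m.group(2), (long) m.start());
        }
        return offsets;
    }
}
```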
-
bfSearchForXRef
private long bfSearchForXRef(long xrefOffset, boolean streamsOnly) throws java.io.IOException
Search for the offset of the given xref table/stream among those found by a brute force search.
- Parameters:
streamsOnly- search for xref streams only
- Returns:
- the offset of the xref entry
- Throws:
java.io.IOException- if something went wrong
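Picking the brute-force hit that best matches a possibly wrong xref offset can be done with a nearest-value search, which is presumably the role of the private helper searchNearestValue(List<Long>, long) listed in the method summary. This sketch assumes "nearest" means the smallest absolute distance, with the earlier element winning a tie:

```java
import java.util.List;

final class NearestValueSketch {
    /** Returns the element of values closest to offset (first one wins on ties), or -1 if values is empty. */
    static long searchNearestValue(List<Long> values, long offset) {
        long best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (long value : values) {
            long distance = Math.abs(offset - value);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = value;
            }
        }
        return best;
    }
}
```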
-
searchNearestValue
private long searchNearestValue(java.util.List<java.lang.Long> values, long offset)
-
bfSearchForTrailer
private boolean bfSearchForTrailer(COSDictionary trailer) throws java.io.IOException
Brute force search for all trailer markers.
- Throws:
java.io.IOException- if something went wrong
-
bfSearchForLastEOFMarker
private void bfSearchForLastEOFMarker() throws java.io.IOException
Brute force search for the last EOF marker.
- Throws:
java.io.IOException- if something went wrong
-
bfSearchForObjStreams
private void bfSearchForObjStreams() throws java.io.IOException
Brute force search for all object streams.
- Throws:
java.io.IOException- if something went wrong
-
bfSearchForXRefTables
private void bfSearchForXRefTables() throws java.io.IOException
Brute force search for all xref entries (tables).
- Throws:
java.io.IOException- if something went wrong
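A brute force search for xref tables has to skip the "xref" that is merely the tail of "startxref". A minimal sketch of that filter over an in-memory copy of the file (an illustration, not the parser's code); since the match is case-sensitive it does not pick up the /XRef type name used by xref streams.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class XrefKeywordScanSketch {
    /** Returns the offsets of every "xref" keyword that is not the tail of "startxref". */
    static List<Long> xrefTableOffsets(byte[] file) {
        String s = new String(file, StandardCharsets.ISO_8859_1);
        List<Long> offsets = new ArrayList<>();
        for (int i = s.indexOf("xref"); i >= 0; i = s.indexOf("xref", i + 1)) {
            boolean partOfStartxref = i >= 5 && s.startsWith("start", i - 5);
            if (!partOfStartxref) {
                offsets.add((long) i);
            }
        }
        return offsets;
    }
}
```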
-
bfSearchForXRefStreams
private void bfSearchForXRefStreams() throws java.io.IOException
Brute force search for all /XRef entries (streams).
- Throws:
java.io.IOException- if something went wrong
-
rebuildTrailer
protected final COSDictionary rebuildTrailer() throws java.io.IOException
Rebuild the trailer dictionary if startxref can't be found.
- Returns:
- the rebuilt trailer dictionary
- Throws:
java.io.IOException- if something went wrong
-
searchForTrailerItems
private boolean searchForTrailerItems(COSDictionary trailer) throws java.io.IOException
Search for the different parts of the trailer dictionary.
- Parameters:
trailer-
- Returns:
- true if the root was found, false if not.
- Throws:
java.io.IOException
-
compareCOSObjects
private COSObject compareCOSObjects(COSObject newObject, java.lang.Long newOffset, COSObject currentObject, java.lang.Long currentOffset)
-
retrieveCOSDictionary
private COSDictionary retrieveCOSDictionary(COSObject object) throws java.io.IOException
- Throws:
java.io.IOException
-
retrieveCOSDictionary
private COSDictionary retrieveCOSDictionary(COSObjectKey key, long offset) throws java.io.IOException
- Throws:
java.io.IOException
-
checkPages
protected void checkPages(COSDictionary root)
Check if all entries of the pages dictionary are present. Those which can't be dereferenced are removed.
- Parameters:
root- the root dictionary of the pdf
-
checkPagesDictionary
private int checkPagesDictionary(COSDictionary pagesDict, java.util.Set<COSObject> set)
-
isCatalog
protected boolean isCatalog(COSDictionary dictionary)
Tell if the dictionary is a PDF catalog. Override this for an FDF catalog.
- Parameters:
dictionary-
- Returns:
- true if the given dictionary is a root dictionary
-
isInfo
private boolean isInfo(COSDictionary dictionary)
Tell if the dictionary is an info dictionary.
- Parameters:
dictionary-
- Returns:
- true if the given dictionary is an info dictionary
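When the trailer has to be rebuilt, candidate dictionaries are classified as catalog or info dictionary. The sketch below models dictionaries as Map<String, Object> with names written without the leading slash (a hypothetical representation). The catalog test checks /Type /Catalog; the info test is an assumed heuristic, not PDFBox's exact rule: it looks for keys the PDF specification defines for the document information dictionary, and rejects dictionaries that look like outline items or links, which can also carry /Title.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class TrailerItemSketch {
    // Entries defined for the document information dictionary in the PDF specification
    private static final List<String> INFO_KEYS = Arrays.asList(
            "Title", "Author", "Subject", "Keywords", "Creator", "Producer", "CreationDate", "ModDate");

    /** A dictionary is taken to be the catalog if its /Type is /Catalog. */
    static boolean isCatalog(Map<String, Object> dict) {
        return "Catalog".equals(dict.get("Type"));
    }

    /** Assumed heuristic: has at least one info key and does not look like an outline item or link. */
    static boolean isInfo(Map<String, Object> dict) {
        if (dict.containsKey("Parent") || dict.containsKey("A") || dict.containsKey("Dest")) {
            return false; // outline items carry /Title too
        }
        for (String key : INFO_KEYS) {
            if (dict.containsKey(key)) {
                return true;
            }
        }
        return false;
    }
}
```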
-
parseStartXref
private long parseStartXref() throws java.io.IOException
This will parse the startxref section from the stream. The startxref value is ignored.
- Returns:
- the startxref value or -1 on parsing error
- Throws:
java.io.IOException- If an IO error occurs.
-
isString
private boolean isString(byte[] string) throws java.io.IOException
Checks if the given string can be found at the current offset.
- Parameters:
string- the bytes of the string to look for
- Returns:
- true if the bytes are in place, false if not
- Throws:
java.io.IOException- if something went wrong
-
isString
private boolean isString(char[] string) throws java.io.IOException
Checks if the given string can be found at the current offset.
- Parameters:
string- the bytes of the string to look for
- Returns:
- true if the bytes are in place, false if not
- Throws:
java.io.IOException- if something went wrong
-
parseTrailer
private boolean parseTrailer() throws java.io.IOException
This will parse the trailer from the stream and add it to the state.
- Returns:
- false on parsing error
- Throws:
java.io.IOException- If an IO error occurs.
-
parsePDFHeader
protected boolean parsePDFHeader() throws java.io.IOException
Parse the header of a pdf.
- Returns:
- true if a PDF header was found
- Throws:
java.io.IOException- if something went wrong
-
parseFDFHeader
protected boolean parseFDFHeader() throws java.io.IOException
Parse the header of a fdf.
- Returns:
- true if a FDF header was found
- Throws:
java.io.IOException- if something went wrong
-
parseHeader
private boolean parseHeader(java.lang.String headerMarker, java.lang.String defaultVersion) throws java.io.IOException
- Throws:
java.io.IOException
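parsePDFHeader and parseFDFHeader both delegate to parseHeader(headerMarker, defaultVersion). A sketch of what such a shared routine could do: find the marker (tolerating junk before it), read an "x.y" version after it, and fall back to the default version if the version is malformed. The "%PDF-"/"%FDF-" markers and the "1.4"/"1.0" defaults in the usage below are example arguments; the actual constant values are listed under Constant Field Values.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeaderSketch {
    // a version such as "1.7" directly after the header marker
    private static final Pattern VERSION = Pattern.compile("\\d\\.\\d");

    /**
     * Returns the version that follows headerMarker, defaultVersion if the marker is present
     * but no well-formed version follows it, or null if the marker is absent.
     */
    static String headerVersion(String head, String headerMarker, String defaultVersion) {
        int pos = head.indexOf(headerMarker); // tolerate junk before the marker
        if (pos < 0) {
            return null;
        }
        Matcher m = VERSION.matcher(head);
        m.region(pos + headerMarker.length(), head.length());
        return m.lookingAt() ? m.group() : defaultVersion;
    }
}
```

For example, `HeaderSketch.headerVersion("%PDF-1.7\n", "%PDF-", "1.4")` yields "1.7", while a header like "%PDF-x.y" falls back to "1.4".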
-
parseXrefTable
protected boolean parseXrefTable(long startByteOffset) throws java.io.IOException
This will parse the xref table from the stream and add it to the state. The XrefTable contents are ignored.
- Parameters:
startByteOffset- the offset to start at
- Returns:
- false on parsing error
- Throws:
java.io.IOException- If an IO error occurs.
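A classic xref table consists of the "xref" keyword followed by subsections, each introduced by "firstObjNr count" and followed by count fixed-width entries "offset generation n|f". A minimal sketch of reading such a section from a string, assuming well-formed input (a lenient parser has to cope with far worse) and ignoring generation numbers for brevity:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class XrefTableSketch {
    /** Returns objNr -> byte offset for in-use ('n') entries; stops at the "trailer" keyword. */
    static Map<Integer, Long> parse(String section) {
        String[] lines = section.split("\\r\\n|\\r|\\n");
        if (!lines[0].trim().equals("xref")) {
            throw new IllegalArgumentException("missing xref keyword");
        }
        Map<Integer, Long> offsets = new LinkedHashMap<>();
        int i = 1;
        while (i < lines.length && !lines[i].trim().startsWith("trailer")) {
            String[] header = lines[i].trim().split("\\s+"); // "firstObjNr count"
            int first = Integer.parseInt(header[0]);
            int count = Integer.parseInt(header[1]);
            i++;
            for (int k = 0; k < count; k++, i++) {
                String[] entry = lines[i].trim().split("\\s+");
                if (entry[2].equals("n")) { // 'f' marks a free entry
                    offsets.put(first + k, Long.parseLong(entry[0]));
                }
            }
        }
        return offsets;
    }
}
```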
-
parseXrefStream
private void parseXrefStream(COSStream stream, long objByteOffset, boolean isStandalone) throws java.io.IOException
Fills XRefTrailerResolver with data of given stream. Stream must be of type XRef.
- Parameters:
stream- the stream to be read
objByteOffset- the offset to start at
isStandalone- should be set to true if the stream is not part of a hybrid xref table
- Throws:
java.io.IOException- if there is an error parsing the stream
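An XRef stream stores its entries as fixed-width binary rows whose field widths come from the /W array. In the PDF specification, type 0 marks a free entry, type 1 an object at a byte offset, and type 2 an object stored inside an object stream (field 2 is the object stream number, field 3 the index). A sketch that decodes already-decompressed data, assuming a single subsection starting at first (real streams may list several via /Index) and returning readable descriptions purely for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class XrefStreamSketch {
    /** Decodes rows of widths w[0], w[1], w[2]; returns objNr -> description of its entry. */
    static Map<Integer, String> decode(byte[] data, int[] w, int first) {
        int rowLen = w[0] + w[1] + w[2];
        Map<Integer, String> entries = new LinkedHashMap<>();
        for (int pos = 0, objNr = first; pos + rowLen <= data.length; pos += rowLen, objNr++) {
            long type = w[0] == 0 ? 1 : field(data, pos, w[0]); // zero-width type field defaults to 1
            long f2 = field(data, pos + w[0], w[1]);
            long f3 = field(data, pos + w[0] + w[1], w[2]);
            if (type == 0) {
                entries.put(objNr, "free");
            } else if (type == 1) {
                entries.put(objNr, "offset " + f2); // f3 is the generation number
            } else if (type == 2) {
                entries.put(objNr, "in objstm " + f2 + " index " + f3);
            }
            // other types are skipped in this sketch
        }
        return entries;
    }

    /** Reads an unsigned big-endian integer of the given width in bytes. */
    private static long field(byte[] data, int pos, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[pos + i] & 0xFF);
        }
        return value;
    }
}
```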
-
getDocument
public COSDocument getDocument() throws java.io.IOException
This will get the document that was parsed. The document must be parsed before this is called. When you are done with this document you must call close() on it to release resources.
- Returns:
- The document that was parsed.
- Throws:
java.io.IOException- If there is an error getting the document.
-
getEncryption
public PDEncryption getEncryption() throws java.io.IOException
This will get the encryption dictionary. The document must be parsed before this is called.
- Returns:
- The encryption dictionary of the document that was parsed.
- Throws:
java.io.IOException- If there is an error getting the document.
-
getAccessPermission
public AccessPermission getAccessPermission() throws java.io.IOException
This will get the AccessPermission. The document must be parsed before this is called.
- Returns:
- The access permission of document that was parsed.
- Throws:
java.io.IOException- If there is an error getting the document.
-
parseTrailerValuesDynamically
protected COSBase parseTrailerValuesDynamically(COSDictionary trailer) throws java.io.IOException
Parse the values of the trailer dictionary and return the root object.
- Parameters:
trailer- The trailer dictionary.
- Returns:
- The parsed root object.
- Throws:
java.io.IOException- If an IO error occurs or if the root object is missing in the trailer dictionary.
-
prepareDecryption
private void prepareDecryption() throws java.io.IOException
Prepare for decryption.
- Throws:
InvalidPasswordException- If the password is incorrect.
java.io.IOException- if something went wrong
-
parseDictionaryRecursive
private void parseDictionaryRecursive(COSObject dictionaryObject) throws java.io.IOException
Resolves all not already parsed objects of a dictionary recursively.
- Parameters:
dictionaryObject- dictionary to be parsed
- Throws:
java.io.IOException- if something went wrong
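Resolving everything a dictionary refers to is a graph traversal, and PDF object graphs routinely contain cycles (a page points to its /Parent, which lists the page in /Kids). The sketch below shows the traversal with a visited set over a hypothetical map from object number to referenced object numbers. It is written iteratively with an explicit stack rather than recursively, so it illustrates the idea without claiming to mirror the method's structure.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReferenceClosureSketch {
    /** Returns every object reachable from root; each object is visited once, so cycles terminate. */
    static Set<Integer> reachable(Map<Integer, List<Integer>> refs, int root) {
        Set<Integer> visited = new LinkedHashSet<>();
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            int objNr = pending.pop();
            if (!visited.add(objNr)) {
                continue; // already resolved
            }
            for (int child : refs.getOrDefault(objNr, Collections.<Integer>emptyList())) {
                pending.push(child);
            }
        }
        return visited;
    }
}
```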
-