Package org.apache.pdfbox.pdfparser
Class PDFParser
- java.lang.Object
  - org.apache.pdfbox.pdfparser.BaseParser
    - org.apache.pdfbox.pdfparser.COSParser
      - org.apache.pdfbox.pdfparser.PDFParser
- All Implemented Interfaces:
ICOSParser
- Direct Known Subclasses:
PreflightParser
public class PDFParser extends COSParser
-
Field Summary
Fields:
- private static org.apache.commons.logging.Log LOG
-
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, SYSPROP_EOFLOOKUPRANGE, xrefTrailerResolver
-
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, MAX_LENGTH_LONG, N, O, R, S, source, STREAM_STRING, T
-
Constructor Summary
Constructors:
- PDFParser(RandomAccessRead source): Constructor.
- PDFParser(RandomAccessRead source, java.lang.String decryptionPassword): Constructor.
- PDFParser(RandomAccessRead source, java.lang.String decryptionPassword, java.io.InputStream keyStore, java.lang.String alias): Constructor.
- PDFParser(RandomAccessRead source, java.lang.String decryptionPassword, java.io.InputStream keyStore, java.lang.String alias, RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction): Constructor.
-
Method Summary
Methods:
- protected PDDocument createDocument(): Create the resulting document.
- protected void initialParse(): The initial parse first parses only the trailer, the startxref pointer, and all xref tables, so that a pointer (offset) to each of the PDF's objects is available.
- static PDDocument load(java.io.File file): Deprecated. Use Loader.loadPDF(File) instead.
- static PDDocument load(java.io.File file, java.lang.String password): Deprecated. Use Loader.loadPDF(File, String) instead.
- PDDocument parse(): This will parse the stream and populate the PDDocument object.
- PDDocument parse(boolean lenient): This will parse the stream and populate the PDDocument object.
-
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, createRandomAccessReadView, dereferenceCOSObject, getAccessPermission, getEncryption, isLenient, isString, lastIndexOf, parseCOSStream, parseFDFHeader, parseObjectDynamically, parseObjectStreamObject, parsePDFHeader, parseXrefTable, prepareDecryption, resetTrailerResolver, retrieveTrailer, setEOFLookupRange, setLenient
-
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
getObjectKey, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOF, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseCOSArray, parseCOSDictionary, parseCOSName, parseCOSString, parseDirObject, readExpectedChar, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipLinebreak, skipSpaces, skipWhiteSpaces
-
Constructor Detail
-
PDFParser
public PDFParser(RandomAccessRead source) throws java.io.IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - source representing the pdf.
- Throws:
java.io.IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, java.lang.String decryptionPassword) throws java.io.IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - input representing the pdf.
decryptionPassword - password to be used for decryption.
- Throws:
java.io.IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, java.lang.String decryptionPassword, java.io.InputStream keyStore, java.lang.String alias) throws java.io.IOException
Constructor. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
source - input representing the pdf.
decryptionPassword - password to be used for decryption.
keyStore - key store to be used for decryption when using public key security
alias - alias to be used for decryption when using public key security
- Throws:
java.io.IOException - If something went wrong.
-
PDFParser
public PDFParser(RandomAccessRead source, java.lang.String decryptionPassword, java.io.InputStream keyStore, java.lang.String alias, RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) throws java.io.IOException
Constructor.
- Parameters:
source - input representing the pdf.
decryptionPassword - password to be used for decryption.
keyStore - key store to be used for decryption when using public key security
alias - alias to be used for decryption when using public key security
streamCacheCreateFunction - a function to create an instance of the stream cache
- Throws:
java.io.IOException - If something went wrong.
-
Method Detail
-
initialParse
protected void initialParse() throws java.io.IOException
The initial parse first parses only the trailer, the startxref pointer, and all xref tables, so that a pointer (offset) to each of the PDF's objects is available. It can handle linearized PDFs, which have an xref at the end pointing to an xref at the beginning of the file. Finally, the root object is parsed.
- Throws:
InvalidPasswordException - If the password is incorrect.
java.io.IOException - If something went wrong.
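As a rough illustration of the first step described above, the sketch below finds the last startxref keyword in a file's bytes and reads the offset that follows it, which is where the cross-reference data begins. It uses only the JDK and is not PDFBox's implementation: the class StartXrefSketch and the method findStartXref are hypothetical names, and the sketch ignores everything else the parser does (xref table parsing, linearized files, leniency).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of PDFBox: reads the startxref offset from PDF bytes. */
final class StartXrefSketch {

    private static final String KEYWORD = "startxref";

    /**
     * Returns the number that follows the last "startxref" keyword,
     * or -1 if the keyword or a usable number after it is missing.
     */
    static long findStartXref(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keywordPos = text.lastIndexOf(KEYWORD);
        if (keywordPos < 0) {
            return -1;
        }
        int pos = keywordPos + KEYWORD.length();
        // Skip spaces and line breaks between the keyword and the number.
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < text.length() && text.charAt(pos) >= '0' && text.charAt(pos) <= '9') {
            pos++;
        }
        // Reject a missing number, and digit runs too long to fit safely in a long.
        if (pos == start || pos - start > 18) {
            return -1;
        }
        return Long.parseLong(text.substring(start, pos));
    }

    public static void main(String[] args) {
        byte[] tail = "trailer\n<< /Size 4 /Root 1 0 R >>\nstartxref\n187\n%%EOF\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findStartXref(tail)); // prints 187
    }
}
```

Taking the last occurrence of the keyword matters for files with incremental updates, where each update appends its own startxref section and the final one is the entry point.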
-
parse
public PDDocument parse() throws java.io.IOException
This will parse the stream and populate the PDDocument object. This will close the keystore stream when it is done parsing. Lenient mode is active by default.
- Returns:
- the populated PDDocument
- Throws:
InvalidPasswordException - If the password is incorrect.
java.io.IOException - If there is an error reading from the stream or corrupt data is found.
-
parse
public PDDocument parse(boolean lenient) throws java.io.IOException
This will parse the stream and populate the PDDocument object. This will close the keystore stream when it is done parsing.
- Parameters:
lenient - activate leniency if set to true
- Returns:
- the populated PDDocument
- Throws:
InvalidPasswordException - If the password is incorrect.
java.io.IOException - If there is an error reading from the stream or corrupt data is found.
-
createDocument
protected PDDocument createDocument() throws java.io.IOException
Create the resulting document. May be overridden if the parser uses another class as the document.
- Returns:
- the resulting document
- Throws:
java.io.IOException - if the method is called before parsing the document
-
load
@Deprecated public static PDDocument load(java.io.File file) throws java.io.IOException
Deprecated. Use Loader.loadPDF(File) instead.
Parses a PDF. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
file - file to be loaded
- Returns:
- loaded document
- Throws:
InvalidPasswordException - If the file required a non-empty password.
java.io.IOException - in case of a file reading or parsing error
-
load
@Deprecated public static PDDocument load(java.io.File file, java.lang.String password) throws java.io.IOException
Deprecated. Use Loader.loadPDF(File, String) instead.
Parses a PDF. Unrestricted main memory will be used for buffering PDF streams.
- Parameters:
file - file to be loaded
password - password to be used for decryption
- Returns:
- loaded document
- Throws:
InvalidPasswordException - If the password is incorrect.
java.io.IOException - in case of a file reading or parsing error
-