Class PreflightParser
java.lang.Object
  org.apache.pdfbox.pdfparser.BaseParser
    org.apache.pdfbox.pdfparser.COSParser
      org.apache.pdfbox.pdfparser.PDFParser
        org.apache.pdfbox.preflight.parser.PreflightParser
- All Implemented Interfaces:
ICOSParser
public class PreflightParser extends PDFParser
-
-
Field Summary
Fields
private PreflightConfiguration config
private static java.nio.charset.Charset ENCODING
    Defines a one-byte encoding that has no specific encoding in the UTF-8 charset.
private Format format
private PreflightDocument preflightDocument
private ValidationResult validationResult
-
Fields inherited from class org.apache.pdfbox.pdfparser.COSParser
EOF_MARKER, fileLen, initialParseDone, OBJ_MARKER, securityHandler, SYSPROP_EOFLOOKUPRANGE, xrefTrailerResolver
-
Fields inherited from class org.apache.pdfbox.pdfparser.BaseParser
A, ASCII_CR, ASCII_LF, B, D, DEF, document, E, ENDOBJ_STRING, ENDSTREAM_STRING, J, M, N, O, R, S, source, STREAM_STRING, T
-
-
Constructor Summary
Constructors
PreflightParser(java.io.File file)
    Constructor.
PreflightParser(java.lang.String filename)
    Constructor.
PreflightParser(RandomAccessRead rar)
    Constructor.
-
Method Summary
Methods
private void addValidationError(ValidationResult.ValidationError error)
    Add a validation error to the ValidationResult.
private void checkEndstreamKeyWord(COSDictionary dic, long startOffset)
    'endstream' must be preceded by an EOL.
private void checkPdfHeader()
    Check that the PDF header matches the rules of the PDF/A specification.
private long checkStreamKeyWord()
    'stream' must be followed by <CR><LF> or only <LF>.
protected PDDocument createDocument()
    Create the resulting document.
protected void initialParse()
    The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the pdf's objects.
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
    Searches last appearance of pattern within buffer.
private boolean nextIsEOL()
PDDocument parse()
    This will parse the stream and populate the PDDocument object.
PDDocument parse(Format format)
    Parse the given file and check if it is a conforming file according to the given format.
PDDocument parse(Format format, PreflightConfiguration config)
    Parse the given file and check if it is a conforming file according to the given format.
protected COSArray parseCOSArray()
    This will parse a PDF array object.
protected COSName parseCOSName()
    This will parse a PDF name from the stream.
protected COSStream parseCOSStream(COSDictionary dic)
    Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check the rules on the 'stream' and 'endstream' keywords.
protected COSString parseCOSString()
    Check that the hexadecimal string contains an even number of hexadecimal characters.
protected COSBase parseDirObject()
    Calls BaseParser.parseDirObject() and checks the limit range for Float, Integer and the number of Dictionary entries.
private COSBase parseFileObject(java.lang.Long offsetOrObjstmObNr, COSObjectKey objKey)
protected COSBase parseObjectDynamically(COSObjectKey objKey, boolean requireExistingNotCompressedObj)
    Parse the object for the given object key.
protected boolean parseXrefTable(long startByteOffset)
    Same as COSParser.parseXrefTable(long) with additional controls: EOL mandatory after the 'xref' keyword, cross reference subsection header uses a single white space as separator, and so on.
protected boolean resetTrailerResolver()
    Indicates whether the xref trailer resolver should be reset or not.
static ValidationResult validate(java.io.File file)
    Load and validate the given file.
-
Methods inherited from class org.apache.pdfbox.pdfparser.COSParser
checkPages, createRandomAccessReadView, dereferenceCOSObject, getAccessPermission, getEncryption, isLenient, isString, parseFDFHeader, parseObjectStreamObject, parsePDFHeader, prepareDecryption, retrieveTrailer, setEOFLookupRange, setLenient
-
Methods inherited from class org.apache.pdfbox.pdfparser.BaseParser
getObjectKey, isClosing, isClosing, isDigit, isDigit, isEndOfName, isEOF, isEOL, isEOL, isSpace, isSpace, isWhitespace, isWhitespace, parseCOSDictionary, readExpectedChar, readExpectedString, readGenerationNumber, readInt, readLine, readLong, readObjectNumber, readString, readString, readStringNumber, skipLinebreak, skipSpaces, skipWhiteSpaces
-
-
-
-
Field Detail
-
ENCODING
private static final java.nio.charset.Charset ENCODING
Defines a one-byte encoding that has no specific encoding in the UTF-8 charset. This avoids unexpected errors when the encoding is Cp5816.
-
format
private Format format
-
config
private PreflightConfiguration config
-
preflightDocument
private PreflightDocument preflightDocument
-
validationResult
private ValidationResult validationResult
-
-
Constructor Detail
-
PreflightParser
public PreflightParser(java.io.File file) throws java.io.IOException
Constructor.
- Parameters:
file
- Throws:
java.io.IOException - if there is a reading error.
-
PreflightParser
public PreflightParser(RandomAccessRead rar) throws java.io.IOException
Constructor.
- Parameters:
rar - input source
- Throws:
java.io.IOException - if there is a reading error.
-
PreflightParser
public PreflightParser(java.lang.String filename) throws java.io.IOException
Constructor.
- Parameters:
filename
- Throws:
java.io.IOException - if there is a reading error.
-
-
Method Detail
-
addValidationError
private void addValidationError(ValidationResult.ValidationError error)
Add a validation error to the ValidationResult.
- Parameters:
error - the validation error to be added
-
parse
public PDDocument parse() throws java.io.IOException
Description copied from class: PDFParser
This will parse the stream and populate the PDDocument object. This will close the keystore stream when it is done parsing. Lenient mode is active by default.
- Overrides:
parse in class PDFParser
- Returns:
the populated PDDocument
- Throws:
InvalidPasswordException - If the password is incorrect.
java.io.IOException - If there is an error reading from the stream or corrupt data is found.
-
parse
public PDDocument parse(Format format) throws java.io.IOException
Parse the given file and check if it is a conforming file according to the given format.
- Parameters:
format - format that the document should follow (default Format.PDF_A1B)
- Returns:
the parsed document.
- Throws:
java.io.IOException
-
parse
public PDDocument parse(Format format, PreflightConfiguration config) throws java.io.IOException
Parse the given file and check if it is a conforming file according to the given format.
- Parameters:
format - format that the document should follow (default Format.PDF_A1B)
config - Configuration bean that will be used by the PreflightDocument. If null, the format is used to determine the default configuration.
- Returns:
the parsed document.
- Throws:
java.io.IOException
-
createDocument
protected PDDocument createDocument() throws java.io.IOException
Description copied from class: PDFParser
Create the resulting document. May be overridden if the parser uses another class as document.
- Overrides:
createDocument in class PDFParser
- Returns:
the resulting document
- Throws:
java.io.IOException - if the method is called before parsing the document
-
initialParse
protected void initialParse() throws java.io.IOException
Description copied from class: PDFParser
The initial parse will first parse only the trailer, the xrefstart and all xref tables to have a pointer (offset) to all the pdf's objects. It can handle linearized pdfs, which will have an xref at the end pointing to an xref at the beginning of the file. Last the root object is parsed.
- Overrides:
initialParse in class PDFParser
- Throws:
InvalidPasswordException - If the password is incorrect.
java.io.IOException - If something went wrong.
-
resetTrailerResolver
protected boolean resetTrailerResolver()
Description copied from class: COSParser
Indicates whether the xref trailer resolver should be reset or not. Should be overridden if the xref trailer resolver is needed after the initial parsing.
- Overrides:
resetTrailerResolver in class COSParser
- Returns:
true if the xref trailer resolver should be reset
-
checkPdfHeader
private void checkPdfHeader()
Check that the PDF header matches the rules of the PDF/A specification. The first line (offset 0) must be a comment with the PDF version (version 1.0 does not conform to the PDF/A specification). The second line must be a comment with at least 4 bytes greater than 0x80.
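The two header rules can be sketched with plain JDK classes. This is an illustration only, not the PDFBox implementation: the class name, method name and version pattern are hypothetical. The text above says "greater than 0x80"; the sketch assumes the PDF/A-1 wording (byte values above 127) and therefore also accepts 0x80 itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of PDFBox.
final class PdfHeaderCheckSketch {

    // Assumed pattern: "%PDF-1.n" with n from 1 to 9, so version 1.0 is rejected.
    private static final Pattern FIRST_LINE = Pattern.compile("%PDF-1\\.[1-9]");

    /** Returns true if the first two lines of the given bytes satisfy both header rules. */
    static boolean isHeaderValid(byte[] fileStart) {
        int eol = indexOfEol(fileStart);
        if (eol < 0) {
            return false;
        }
        // Rule 1: the first line, starting at offset 0, is the version comment.
        String firstLine = new String(fileStart, 0, eol, StandardCharsets.ISO_8859_1);
        if (!FIRST_LINE.matcher(firstLine).matches()) {
            return false;
        }
        // Skip the end-of-line marker (CR, LF or CR LF).
        int second = eol + 1;
        if (fileStart[eol] == '\r' && second < fileStart.length && fileStart[second] == '\n') {
            second++;
        }
        // Rule 2: '%' followed by at least four bytes with values above 127.
        if (second + 5 > fileStart.length || fileStart[second] != '%') {
            return false;
        }
        for (int i = second + 1; i <= second + 4; i++) {
            if ((fileStart[i] & 0xFF) < 0x80) {
                return false;
            }
        }
        return true;
    }

    private static int indexOfEol(byte[] buf) {
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == '\r' || buf[i] == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\n%\u00e2\u00e3\u00cf\u00d3\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isHeaderValid(sample));
    }
}
```

Decoding with ISO-8859-1 keeps a one-to-one mapping between bytes and characters, which is why a one-byte charset is convenient for this kind of check.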
-
parseXrefTable
protected boolean parseXrefTable(long startByteOffset) throws java.io.IOException
Same as COSParser.parseXrefTable(long) with additional controls: EOL mandatory after the 'xref' keyword, cross reference subsection header uses a single white space as separator, and so on.
- Overrides:
parseXrefTable in class COSParser
- Parameters:
startByteOffset - the offset to start at
- Returns:
false on parsing error
- Throws:
java.io.IOException - If an IO error occurs.
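The two controls named explicitly above can be sketched as follows. The class and method names are hypothetical, and the sketch works on an in-memory buffer rather than on the parser's source, so it shows the rules and not how PDFBox applies them.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of PDFBox.
final class XrefSyntaxSketch {

    // "<first object number> <entry count>" separated by exactly one space character.
    private static final Pattern SUBSECTION_HEADER = Pattern.compile("\\d+ \\d+");

    /** Returns true if the line is a subsection header with a single space as separator. */
    static boolean isStrictSubsectionHeader(String line) {
        return SUBSECTION_HEADER.matcher(line).matches();
    }

    /** Returns true if "xref" starts at the offset and is immediately followed by CR or LF. */
    static boolean xrefKeywordFollowedByEol(byte[] buf, int offset) {
        byte[] keyword = {'x', 'r', 'e', 'f'};
        if (offset < 0 || offset + keyword.length >= buf.length) {
            return false;
        }
        for (int i = 0; i < keyword.length; i++) {
            if (buf[offset + i] != keyword[i]) {
                return false;
            }
        }
        byte next = buf[offset + keyword.length];
        return next == '\r' || next == '\n';
    }

    public static void main(String[] args) {
        System.out.println(isStrictSubsectionHeader("0 6"));
        System.out.println(isStrictSubsectionHeader("0  6"));
    }
}
```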
-
parseCOSStream
protected COSStream parseCOSStream(COSDictionary dic) throws java.io.IOException
Wraps COSParser.parseCOSStream(org.apache.pdfbox.cos.COSDictionary) to check the rules on the 'stream' and 'endstream' keywords. See checkStreamKeyWord() and checkEndstreamKeyWord(org.apache.pdfbox.cos.COSDictionary, long).
- Overrides:
parseCOSStream in class COSParser
- Parameters:
dic - dictionary that goes with this stream.
- Returns:
parsed pdf stream.
- Throws:
java.io.IOException - if an error occurred reading the stream, like problems with reading length attribute, stream does not end with 'endstream' after data read, stream too short etc.
-
checkStreamKeyWord
private long checkStreamKeyWord() throws java.io.IOException
'stream' must be followed by <CR><LF> or only <LF>.
- Throws:
java.io.IOException
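A minimal sketch of the 'stream' rule on an in-memory buffer. The names are hypothetical, and the return value (offset of the first data byte, or -1 on a violation) is a choice made for this sketch; what the private method actually returns is not documented here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of PDFBox.
final class StreamKeywordSketch {

    /**
     * Returns the offset of the first byte after "stream" and its end-of-line marker,
     * or -1 if the keyword is missing or is not followed by CR LF or by a single LF.
     */
    static int dataStartAfterStream(byte[] buf, int keywordOffset) {
        byte[] keyword = "stream".getBytes(StandardCharsets.US_ASCII);
        if (keywordOffset < 0 || keywordOffset + keyword.length > buf.length) {
            return -1;
        }
        for (int i = 0; i < keyword.length; i++) {
            if (buf[keywordOffset + i] != keyword[i]) {
                return -1;
            }
        }
        int pos = keywordOffset + keyword.length;
        if (pos < buf.length && buf[pos] == '\n') {
            return pos + 1; // single LF
        }
        if (pos + 1 < buf.length && buf[pos] == '\r' && buf[pos + 1] == '\n') {
            return pos + 2; // CR LF
        }
        return -1; // a lone CR, a space or anything else violates the rule
    }

    public static void main(String[] args) {
        byte[] sample = "stream\r\nABC".getBytes(StandardCharsets.US_ASCII);
        System.out.println(dataStartAfterStream(sample, 0));
    }
}
```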
-
checkEndstreamKeyWord
private void checkEndstreamKeyWord(COSDictionary dic, long startOffset) throws java.io.IOException
'endstream' must be preceded by an EOL.
- Throws:
java.io.IOException
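The 'endstream' rule can be illustrated in the same spirit. The sketch below is hypothetical and simply looks at the byte in front of the last 'endstream' keyword in a buffer; treating either CR or LF as an acceptable EOL is an assumption, and the real method also receives the stream dictionary and a start offset, which are not modelled here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of PDFBox.
final class EndstreamKeywordSketch {

    /** Returns true if the last "endstream" in the buffer is directly preceded by CR or LF. */
    static boolean endstreamPrecededByEol(byte[] buf) {
        // ISO-8859-1 maps each byte to one char, so string indices equal byte offsets.
        String text = new String(buf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("endstream");
        if (keyword < 1) {
            return false; // keyword missing, or nothing in front of it
        }
        char previous = text.charAt(keyword - 1);
        return previous == '\r' || previous == '\n';
    }

    public static void main(String[] args) {
        byte[] sample = "stream\nABC\nendstream".getBytes(StandardCharsets.US_ASCII);
        System.out.println(endstreamPrecededByEol(sample));
    }
}
```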
-
nextIsEOL
private boolean nextIsEOL() throws java.io.IOException
- Throws:
java.io.IOException
-
parseCOSArray
protected COSArray parseCOSArray() throws java.io.IOException
Description copied from class: BaseParser
This will parse a PDF array object.
- Overrides:
parseCOSArray in class BaseParser
- Returns:
The parsed PDF array.
- Throws:
java.io.IOException - If there is an error parsing the stream.
-
parseCOSName
protected COSName parseCOSName() throws java.io.IOException
Description copied from class: BaseParser
This will parse a PDF name from the stream.
- Overrides:
parseCOSName in class BaseParser
- Returns:
The parsed PDF name.
- Throws:
java.io.IOException - If there is an error reading from the stream.
-
parseCOSString
protected COSString parseCOSString() throws java.io.IOException
Check that the hexadecimal string contains an even number of hexadecimal characters. Once that is done, reset the offset to the beginning of the string and call BaseParser.parseCOSString().
- Overrides:
parseCOSString in class BaseParser
- Returns:
The parsed PDF string.
- Throws:
java.io.IOException - If there is an error reading from the stream.
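The even-count rule can be sketched on a plain string. The helper below is hypothetical; ignoring white space between digits and rejecting any other character are assumptions of the sketch rather than documented behaviour.

```java
// Hypothetical helper for illustration; not part of PDFBox.
final class HexStringCheckSketch {

    /**
     * Returns true if the text starts with '<', contains only hexadecimal digits
     * (white space is skipped) up to the closing '>', and the digit count is even.
     */
    static boolean isStrictHexString(String text) {
        if (text.isEmpty() || text.charAt(0) != '<') {
            return false;
        }
        int digits = 0;
        for (int i = 1; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '>') {
                return digits % 2 == 0;
            }
            if (isHexDigit(c)) {
                digits++;
            } else if (!Character.isWhitespace(c)) {
                return false; // not a hexadecimal character
            }
        }
        return false; // closing '>' never found
    }

    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    public static void main(String[] args) {
        System.out.println(isStrictHexString("<48656C6C6F>"));
        System.out.println(isStrictHexString("<48656>"));
    }
}
```

A counting pass like this reads ahead without building the string, which matches the description above: after the check, the offset is reset and the regular string parsing runs from the start.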
-
parseDirObject
protected COSBase parseDirObject() throws java.io.IOException
Calls BaseParser.parseDirObject() and checks the limit range for Float, Integer and the number of Dictionary entries.
- Overrides:
parseDirObject in class BaseParser
- Returns:
The parsed object.
- Throws:
java.io.IOException - if there is an error during parsing.
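The kind of range checks meant here can be sketched as below. The concrete limits are assumptions taken from the architectural limits of PDF 1.4 that PDF/A-1 builds on (32-bit integers, reals of about ±32767, at most 4095 dictionary entries); this page does not state the values PDFBox uses, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of PDFBox.
final class ImplementationLimitsSketch {

    // Assumed limits, see the lead-in; not taken from the PDFBox sources.
    static final long MAX_INT = Integer.MAX_VALUE;
    static final long MIN_INT = Integer.MIN_VALUE;
    static final double MAX_REAL = 32767.0;
    static final int MAX_DICT_ENTRIES = 4095;

    static boolean integerInRange(long value) {
        return value >= MIN_INT && value <= MAX_INT;
    }

    static boolean realInRange(double value) {
        return Math.abs(value) <= MAX_REAL;
    }

    static boolean dictionarySizeAllowed(Map<?, ?> dictionary) {
        return dictionary.size() <= MAX_DICT_ENTRIES;
    }

    public static void main(String[] args) {
        Map<String, Integer> dictionary = new HashMap<>();
        dictionary.put("Length", 42);
        System.out.println(integerInRange(2147483648L));
        System.out.println(realInRange(123.5));
        System.out.println(dictionarySizeAllowed(dictionary));
    }
}
```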
-
parseObjectDynamically
protected COSBase parseObjectDynamically(COSObjectKey objKey, boolean requireExistingNotCompressedObj) throws java.io.IOException
Description copied from class: COSParser
Parse the object for the given object key.
- Overrides:
parseObjectDynamically in class COSParser
- Parameters:
objKey - key of object to be parsed
requireExistingNotCompressedObj - if true the object to be parsed must be defined in xref (comment: null objects may be missing from xref) and it must not be a compressed object within object stream (this is used to circumvent being stuck in a loop in a malicious PDF)
- Returns:
the parsed object (which is also added to document object)
- Throws:
java.io.IOException - If an IO error occurs.
-
parseFileObject
private COSBase parseFileObject(java.lang.Long offsetOrObjstmObNr, COSObjectKey objKey) throws java.io.IOException
- Throws:
java.io.IOException
-
lastIndexOf
protected int lastIndexOf(char[] pattern, byte[] buf, int endOff)
Description copied from class: COSParser
Searches for the last appearance of pattern within buffer. The lookup starts before endOff and goes back until 0.
- Overrides:
lastIndexOf in class COSParser
- Parameters:
pattern - pattern to search for
buf - buffer to search pattern in
endOff - offset (exclusive) where lookup starts at
- Returns:
start offset of pattern within buffer or -1 if pattern could not be found
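A straightforward way to implement a backward search with this contract is sketched below. It is a standalone illustration written against the documented parameters, not the PDFBox code; the class name is hypothetical, and the sketch assumes that a match must lie entirely before endOff.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of PDFBox.
final class LastIndexOfSketch {

    /**
     * Returns the start offset of the last occurrence of pattern that ends
     * before endOff (exclusive), or -1 if there is none.
     */
    static int lastIndexOf(char[] pattern, byte[] buf, int endOff) {
        int limit = Math.min(endOff, buf.length);
        // Try candidate start positions from the back towards offset 0.
        for (int start = limit - pattern.length; start >= 0; start--) {
            boolean match = true;
            for (int i = 0; i < pattern.length; i++) {
                if (buf[start + i] != (byte) pattern[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] tail = "%%EOF junk %%EOF\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(lastIndexOf("%%EOF".toCharArray(), tail, tail.length));
    }
}
```

Searching from the end suits markers such as '%%EOF' or 'startxref', which sit near the end of a file and may appear more than once after incremental updates.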
-
validate
public static ValidationResult validate(java.io.File file) throws java.io.IOException
Load and validate the given file. Returns the validation result and closes the PDF that was read.
- Parameters:
file - the file to be read and validated
- Returns:
the validation result
- Throws:
java.io.IOException - in case of a file reading or parsing error
-
-