Class PDFParser

    • Field Detail

      • LOG

        private static final org.apache.commons.logging.Log LOG
    • Constructor Detail

      • PDFParser

        public PDFParser(RandomAccessRead source)
                  throws java.io.IOException
        Constructor. Unrestricted main memory will be used for buffering PDF streams.
        Parameters:
        source - input representing the PDF.
        Throws:
        java.io.IOException - If something went wrong.
      • PDFParser

        public PDFParser(RandomAccessRead source,
                         java.lang.String decryptionPassword)
                  throws java.io.IOException
        Constructor. Unrestricted main memory will be used for buffering PDF streams.
        Parameters:
        source - input representing the PDF.
        decryptionPassword - password to be used for decryption.
        Throws:
        java.io.IOException - If something went wrong.
      • PDFParser

        public PDFParser(RandomAccessRead source,
                         java.lang.String decryptionPassword,
                         java.io.InputStream keyStore,
                         java.lang.String alias)
                  throws java.io.IOException
        Constructor. Unrestricted main memory will be used for buffering PDF streams.
        Parameters:
        source - input representing the PDF.
        decryptionPassword - password to be used for decryption.
        keyStore - key store to be used for decryption when using public key security
        alias - alias to be used for decryption when using public key security
        Throws:
        java.io.IOException - If something went wrong.
      • PDFParser

        public PDFParser(RandomAccessRead source,
                         java.lang.String decryptionPassword,
                         java.io.InputStream keyStore,
                         java.lang.String alias,
                         RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction)
                  throws java.io.IOException
        Constructor.
        Parameters:
        source - input representing the PDF.
        decryptionPassword - password to be used for decryption.
        keyStore - key store to be used for decryption when using public key security
        alias - alias to be used for decryption when using public key security
        streamCacheCreateFunction - a function to create an instance of the stream cache
        Throws:
        java.io.IOException - If something went wrong.
    • Method Detail

      • initialParse

        protected void initialParse()
                             throws java.io.IOException
        The initial parse first reads only the trailer, the startxref pointer and all xref tables, so that a pointer (offset) to each of the PDF's objects is available. It can handle linearized PDFs, which have an xref at the end of the file pointing to an xref at the beginning of the file. Finally, the root object is parsed.
        Throws:
        InvalidPasswordException - If the password is incorrect.
        java.io.IOException - If something went wrong.
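        The first step described above, finding where the cross-reference data starts, can be sketched with the standard library alone. This is a minimal illustration and not PDFBox's implementation: the class StartXrefLocator and its methods are hypothetical names, and the sketch assumes the whole file is already in a byte array. It only reads the offset that follows the last "startxref" keyword; it does not parse the xref tables or the trailer.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of PDFBox): reads the offset stored after the last "startxref". */
final class StartXrefLocator {
    private static final byte[] KEYWORD = "startxref".getBytes(StandardCharsets.ISO_8859_1);

    private StartXrefLocator() {
    }

    /** Returns the offset recorded after the last "startxref" keyword, or -1 if none is found. */
    static long findStartXref(byte[] pdf) {
        // Scan backwards so that the last startxref in the file wins.
        for (int i = pdf.length - KEYWORD.length; i >= 0; i--) {
            if (matchesAt(pdf, i)) {
                return readOffset(pdf, i + KEYWORD.length);
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < KEYWORD.length; k++) {
            if (data[pos + k] != KEYWORD[k]) {
                return false;
            }
        }
        return true;
    }

    private static long readOffset(byte[] data, int pos) {
        // Skip the whitespace (typically a line break) between the keyword and the number.
        while (pos < data.length && isWhitespace(data[pos])) {
            pos++;
        }
        if (pos >= data.length || !isDigit(data[pos])) {
            return -1;
        }
        long value = 0;
        while (pos < data.length && isDigit(data[pos])) {
            value = value * 10 + (data[pos] - '0');
            pos++;
        }
        return value;
    }

    private static boolean isWhitespace(byte b) {
        return b == ' ' || b == '\n' || b == '\r' || b == '\t';
    }

    private static boolean isDigit(byte b) {
        return b >= '0' && b <= '9';
    }
}
```

        Calling StartXrefLocator.findStartXref(bytes) on a file's bytes yields the position at which a parser would begin reading the xref data. Scanning from the end reflects that the relevant startxref entry sits close to the end-of-file marker.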
      • parse

        public PDDocument parse()
                         throws java.io.IOException
        Parses the stream and populates the PDDocument object. The keystore stream is closed once parsing is done. Lenient mode is active by default.
        Returns:
        the populated PDDocument
        Throws:
        InvalidPasswordException - If the password is incorrect.
        java.io.IOException - If there is an error reading from the stream or corrupt data is found.
      • parse

        public PDDocument parse(boolean lenient)
                         throws java.io.IOException
        Parses the stream and populates the PDDocument object. The keystore stream is closed once parsing is done.
        Parameters:
        lenient - activate leniency if set to true
        Returns:
        the populated PDDocument
        Throws:
        InvalidPasswordException - If the password is incorrect.
        java.io.IOException - If there is an error reading from the stream or corrupt data is found.
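        To make the idea of a lenient flag concrete, the following sketch contrasts a strict lookup with a lenient one. It is purely illustrative and assumes nothing about how PDFBox implements leniency: the class LenientXrefFinder, its methods and the fallback strategy (searching the text for the last standalone "xref" keyword) are hypothetical. The sketch works on a String for brevity.

```java
import java.io.IOException;

/** Hypothetical illustration of strict vs. lenient handling; not PDFBox's recovery logic. */
final class LenientXrefFinder {
    private static final String START_XREF = "startxref";

    private LenientXrefFinder() {
    }

    /** Returns the position of the xref table, falling back to a search only when lenient is true. */
    static int locateXref(String pdf, boolean lenient) throws IOException {
        int declared = readDeclaredOffset(pdf);
        if (declared >= 0 && pdf.startsWith("xref", declared)) {
            return declared; // the declared offset is valid in both modes
        }
        if (!lenient) {
            throw new IOException("startxref does not point to an xref table");
        }
        int found = searchForXref(pdf);
        if (found < 0) {
            throw new IOException("no xref table found");
        }
        return found;
    }

    /** Reads the number that follows the last "startxref", or returns -1 if it is absent or unusable. */
    private static int readDeclaredOffset(String pdf) {
        int keyword = pdf.lastIndexOf(START_XREF);
        if (keyword < 0) {
            return -1;
        }
        int pos = keyword + START_XREF.length();
        while (pos < pdf.length() && Character.isWhitespace(pdf.charAt(pos))) {
            pos++;
        }
        int start = pos;
        while (pos < pdf.length() && Character.isDigit(pdf.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            return -1;
        }
        try {
            return Integer.parseInt(pdf.substring(start, pos));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Finds the last "xref" that is not the tail of a "startxref" keyword. */
    private static int searchForXref(String pdf) {
        int idx = pdf.lastIndexOf("xref");
        while (idx >= 0) {
            boolean partOfStartXref = idx >= 5 && pdf.startsWith(START_XREF, idx - 5);
            if (!partOfStartXref) {
                return idx;
            }
            idx = pdf.lastIndexOf("xref", idx - 1);
        }
        return -1;
    }
}
```

        With a wrong startxref value, locateXref(pdf, false) throws an IOException, while locateXref(pdf, true) still returns the position of the table. The trade-off is the usual one: strict mode surfaces corruption early, lenient mode tolerates it at the risk of accepting damaged input.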
      • createDocument

        protected PDDocument createDocument()
                                     throws java.io.IOException
        Creates the resulting document. May be overridden if the parser uses another class as its document.
        Returns:
        the resulting document
        Throws:
        java.io.IOException - if the method is called before parsing the document
      • load

        @Deprecated
        public static PDDocument load(java.io.File file)
                               throws java.io.IOException
        Deprecated.
        Parses a PDF. Unrestricted main memory will be used for buffering PDF streams.
        Parameters:
        file - file to be loaded
        Returns:
        loaded document
        Throws:
        InvalidPasswordException - If the file required a non-empty password.
        java.io.IOException - in case of a file reading or parsing error
      • load

        @Deprecated
        public static PDDocument load(java.io.File file,
                                      java.lang.String password)
                               throws java.io.IOException
        Deprecated.
        Parses a PDF. Unrestricted main memory will be used for buffering PDF streams.
        Parameters:
        file - file to be loaded
        password - password to be used for decryption
        Returns:
        loaded document
        Throws:
        InvalidPasswordException - If the password is incorrect.
        java.io.IOException - in case of a file reading or parsing error