Class BaseParser

    • Field Summary

      Fields 
      Modifier and Type Field Description
      protected static int A  
      private static java.nio.charset.Charset ALTERNATIVE_CHARSET  
      protected static byte ASCII_CR
      ASCII code for carriage return.
      protected static byte ASCII_LF
      ASCII code for line feed.
      private static byte ASCII_NINE  
      private static byte ASCII_SPACE  
      private static byte ASCII_ZERO  
      protected static int B  
      protected static int D  
      static java.lang.String DEF
      This is a string constant that will be used for comparisons.
      protected COSDocument document
      This is the document that will be parsed.
      protected static int E  
      protected static java.lang.String ENDOBJ_STRING
      This is a string constant that will be used for comparisons.
      protected static java.lang.String ENDSTREAM_STRING
      This is a string constant that will be used for comparisons.
      private static char[] FALSE
      This is a string constant that will be used for comparisons.
      private static long GENERATION_NUMBER_THRESHOLD  
      protected static int J  
      private java.util.Map<java.lang.Long,​COSObjectKey> keyCache  
      private static org.apache.commons.logging.Log LOG
      Log instance.
      protected static int M  
      (package private) static int MAX_LENGTH_LONG  
      private static int MAX_RECURSION_DEPTH  
      private static java.lang.String MAX_RECUSRION_MSG  
      protected static int N  
      private static char[] NULL
      This is a string constant that will be used for comparisons.
      protected static int O  
      private static long OBJECT_NUMBER_THRESHOLD  
      protected static int R  
      private int recursionDepth  
      protected static int S  
      protected RandomAccessRead source
      This is the stream that will be read from.
      protected static java.lang.String STREAM_STRING
      This is a string constant that will be used for comparisons.
      protected static int T  
      private static char[] TRUE
      This is a string constant that will be used for comparisons.
      private java.nio.charset.CharsetDecoder utf8Decoder  
    • Method Summary

      Modifier and Type Method Description
      private int checkForEndOfString​(int bracesParameter)
      This is really a bug in the document creator's code, but it caused a crash in PDFBox. The first bug was in this format: /Title ( (5) /Creator, which was patched in one place.
      private java.lang.String decodeBuffer​(java.io.ByteArrayOutputStream buffer)
      Tries to decode the buffer content to a UTF-8 String.
      private COSBase getObjectFromPool​(COSObjectKey key)  
      protected COSObjectKey getObjectKey​(long num, int gen)
      Returns the object key for the given combination of object and generation number.
      protected boolean isClosing()
      This will tell if the next character is a closing bracket (the end of a PDF array).
      protected boolean isClosing​(int c)
      Deprecated.
      This unused method will be removed in 4.0.
      private boolean isCR​(int c)  
      protected boolean isDigit()
      This will tell if the next byte is a digit or not.
      protected static boolean isDigit​(int c)
      This will tell if the given value is a digit or not.
      protected boolean isEndOfName​(int ch)
      Determine if a character terminates a PDF name.
      protected boolean isEOF()
      This will tell if the end of the data is reached.
      protected boolean isEOL()
      This will tell if the next byte to be read is an end of line byte.
      protected boolean isEOL​(int c)
      This will tell if the given value is an end of line byte.
      private static boolean isHexDigit​(char ch)  
      private boolean isLF​(int c)  
      protected boolean isSpace()
      This will tell if the next byte is a space or not.
      protected boolean isSpace​(int c)
      This will tell if the given value is a space or not.
      protected boolean isWhitespace()
      This will tell if the next byte is whitespace or not.
      protected static boolean isWhitespace​(int c)
      This will tell if a character is whitespace or not.
      protected COSArray parseCOSArray()
      This will parse a PDF array object.
      protected COSDictionary parseCOSDictionary​(boolean isDirect)
      This will parse a PDF dictionary.
      private boolean parseCOSDictionaryNameValuePair​(COSDictionary obj)  
      private COSBase parseCOSDictionaryValue()
      This will parse a PDF dictionary value.
      private COSString parseCOSHexString()
      This will parse a PDF hex string with fail-fast semantics, meaning that parsing stops as soon as a disallowed character is found.
      protected COSName parseCOSName()
      This will parse a PDF name from the stream.
      private COSNumber parseCOSNumber()  
      protected COSString parseCOSString()
      This will parse a PDF string.
      protected COSBase parseDirObject()
      This will parse a direct object from the stream.
      protected void readExpectedChar​(char ec)
      Read one char and throw an exception if it is not the expected value.
      protected void readExpectedString​(char[] expectedString, boolean skipSpaces)
      Reads the given pattern from the source.
      protected int readGenerationNumber()
      This will read an integer from the stream and throw an IOException if the value exceeds the maximum object revision (i.e. is bigger than GENERATION_NUMBER_THRESHOLD).
      protected int readInt()
      This will read an integer from the stream.
      protected java.lang.String readLine()
      This will read bytes until the first end of line marker occurs.
      protected long readLong()
      This will read a long from the stream.
      protected long readObjectNumber()
      This will read a long from the stream and throw an IOException if the value is negative or has more than 10 digits (i.e. is bigger than OBJECT_NUMBER_THRESHOLD).
      protected java.lang.String readString()
      This will read the next string from the stream.
      protected java.lang.String readString​(int length)
      Deprecated.
      This unused method will be removed in 4.0.
      protected java.lang.StringBuilder readStringNumber()
      This method is used by readInt() and readLong() to read a token.
      private boolean readUntilEndOfCOSDictionary()
      Keep reading until the end of the dictionary object or the file has been hit, or until a '/' has been found.
      protected boolean skipLinebreak()
      Skip one line break, such as CR, LF or CRLF.
      private boolean skipLinebreak​(int linebreak)
      Skip one line break, such as CR, LF or CRLF.
      protected void skipSpaces()
      This will skip all spaces and comments that are present.
      protected void skipWhiteSpaces()
      Skip the upcoming CRLF or LF which are supposed to follow a stream.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Field Detail

      • LOG

        private static final org.apache.commons.logging.Log LOG
        Log instance.
      • OBJECT_NUMBER_THRESHOLD

        private static final long OBJECT_NUMBER_THRESHOLD
        See Also:
        Constant Field Values
      • GENERATION_NUMBER_THRESHOLD

        private static final long GENERATION_NUMBER_THRESHOLD
        See Also:
        Constant Field Values
      • MAX_LENGTH_LONG

        static final int MAX_LENGTH_LONG
      • ALTERNATIVE_CHARSET

        private static final java.nio.charset.Charset ALTERNATIVE_CHARSET
      • MAX_RECUSRION_MSG

        private static final java.lang.String MAX_RECUSRION_MSG
      • recursionDepth

        private int recursionDepth
      • keyCache

        private final java.util.Map<java.lang.Long,​COSObjectKey> keyCache
      • utf8Decoder

        private final java.nio.charset.CharsetDecoder utf8Decoder
      • DEF

        public static final java.lang.String DEF
        This is a string constant that will be used for comparisons.
        See Also:
        Constant Field Values
      • ENDOBJ_STRING

        protected static final java.lang.String ENDOBJ_STRING
        This is a string constant that will be used for comparisons.
        See Also:
        Constant Field Values
      • ENDSTREAM_STRING

        protected static final java.lang.String ENDSTREAM_STRING
        This is a string constant that will be used for comparisons.
        See Also:
        Constant Field Values
      • STREAM_STRING

        protected static final java.lang.String STREAM_STRING
        This is a string constant that will be used for comparisons.
        See Also:
        Constant Field Values
      • TRUE

        private static final char[] TRUE
        This is a string constant that will be used for comparisons.
      • FALSE

        private static final char[] FALSE
        This is a string constant that will be used for comparisons.
      • NULL

        private static final char[] NULL
        This is a string constant that will be used for comparisons.
      • ASCII_LF

        protected static final byte ASCII_LF
        ASCII code for line feed.
        See Also:
        Constant Field Values
      • ASCII_CR

        protected static final byte ASCII_CR
        ASCII code for carriage return.
        See Also:
        Constant Field Values
      • source

        protected final RandomAccessRead source
        This is the stream that will be read from.
      • document

        protected COSDocument document
        This is the document that will be parsed.
    • Constructor Detail

    • Method Detail

      • isHexDigit

        private static boolean isHexDigit​(char ch)
      • getObjectKey

        protected COSObjectKey getObjectKey​(long num,
                                            int gen)
        Returns the object key for the given combination of object and generation number. The object key from the cross reference table/stream will be reused if available. Otherwise a newly created object will be returned.
        Parameters:
        num - the given object number
        gen - the given generation number
        Returns:
        the COS object key
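        The reuse behaviour described above can be pictured with a small cache lookup. This is only a sketch under stated assumptions: Key is a hypothetical stand-in for COSObjectKey, and the way the object and generation numbers are packed into a single Long is invented for illustration, not taken from this page.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyCacheSketch {
    // Hypothetical stand-in for COSObjectKey.
    static final class Key {
        final long number;
        final int generation;

        Key(long number, int generation) {
            this.number = number;
            this.generation = generation;
        }
    }

    // Keys that were already seen, e.g. while reading a cross reference table.
    private final Map<Long, Key> cache = new HashMap<>();

    // Hypothetical packing of both numbers into one map key.
    private static long pack(long num, int gen) {
        return (num << 16) | (gen & 0xFFFFL);
    }

    void register(Key key) {
        cache.put(pack(key.number, key.generation), key);
    }

    // Returns the registered key if there is one, otherwise a new key.
    Key getKey(long num, int gen) {
        Key known = cache.get(pack(num, gen));
        return known != null ? known : new Key(num, gen);
    }

    public static void main(String[] args) {
        KeyCacheSketch sketch = new KeyCacheSketch();
        Key fromXref = new Key(5, 0);
        sketch.register(fromXref);
        System.out.println(sketch.getKey(5, 0) == fromXref);
    }
}
```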
      • parseCOSDictionaryValue

        private COSBase parseCOSDictionaryValue()
                                         throws java.io.IOException
        This will parse a PDF dictionary value.
        Returns:
        The parsed dictionary value.
        Throws:
        java.io.IOException - If there is an error parsing the dictionary object.
      • getObjectFromPool

        private COSBase getObjectFromPool​(COSObjectKey key)
                                   throws java.io.IOException
        Throws:
        java.io.IOException
      • parseCOSDictionary

        protected COSDictionary parseCOSDictionary​(boolean isDirect)
                                            throws java.io.IOException
        This will parse a PDF dictionary.
        Parameters:
        isDirect - indicates whether the dictionary to be read is a direct object
        Returns:
        The parsed dictionary, never null.
        Throws:
        java.io.IOException - If there is an error reading the stream.
      • readUntilEndOfCOSDictionary

        private boolean readUntilEndOfCOSDictionary()
                                             throws java.io.IOException
        Keep reading until the end of the dictionary object or the file has been hit, or until a '/' has been found.
        Returns:
        true if the end of the object or the file has been reached; false otherwise, meaning that the caller can continue to parse the dictionary at the current position.
        Throws:
        java.io.IOException - if there is a reading error.
      • parseCOSDictionaryNameValuePair

        private boolean parseCOSDictionaryNameValuePair​(COSDictionary obj)
                                                 throws java.io.IOException
        Throws:
        java.io.IOException
      • skipWhiteSpaces

        protected void skipWhiteSpaces()
                                throws java.io.IOException
        Skip the upcoming CRLF or LF which are supposed to follow a stream. Trailing spaces are removed as well.
        Throws:
        java.io.IOException - if something went wrong
      • skipLinebreak

        protected boolean skipLinebreak()
                                 throws java.io.IOException
        Skip one line break, such as CR, LF or CRLF.
        Returns:
        true if a line break was found and removed.
        Throws:
        java.io.IOException - if something went wrong
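        As an illustration of what skipping exactly one CR, LF or CRLF involves, here is a minimal sketch. It uses java.io.PushbackInputStream in place of the parser's RandomAccessRead source, and the class and method names are hypothetical; it is not the PDFBox implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

final class LinebreakSketch {
    static final int CR = 13;
    static final int LF = 10;

    // Consumes one CR, LF or CRLF and reports whether a line break was consumed.
    static boolean skipLinebreak(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c == CR) {
            // A CR may be followed by an LF that belongs to the same line break.
            int next = in.read();
            if (next != LF && next != -1) {
                in.unread(next);
            }
            return true;
        }
        if (c == LF) {
            return true;
        }
        // Not a line break: put the byte back so the caller can read it.
        if (c != -1) {
            in.unread(c);
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {CR, LF, 'A'};
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        System.out.println(skipLinebreak(in));
        System.out.println((char) in.read());
    }
}
```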
      • skipLinebreak

        private boolean skipLinebreak​(int linebreak)
                               throws java.io.IOException
        Skip one line break, such as CR, LF or CRLF.
        Parameters:
        linebreak - the first character to be checked.
        Returns:
        true if a line break was found and removed.
        Throws:
        java.io.IOException - if something went wrong
      • checkForEndOfString

        private int checkForEndOfString​(int bracesParameter)
                                 throws java.io.IOException
        This is really a bug in the document creator's code, but it caused a crash in PDFBox. The first bug was in this format: /Title ( (5) /Creator, which was patched in one place. However, that patch missed the case where the number of opening and closing parentheses isn't balanced. The second bug was in this format: /Title (c:\) /Producer
        Parameters:
        bracesParameter - the number of braces currently open.
        Returns:
        the corrected value of the brace counter
        Throws:
        java.io.IOException
      • parseCOSString

        protected COSString parseCOSString()
                                    throws java.io.IOException
        This will parse a PDF string.
        Returns:
        The parsed PDF string.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • parseCOSHexString

        private COSString parseCOSHexString()
                                     throws java.io.IOException
        This will parse a PDF hex string with fail-fast semantics, meaning that parsing stops as soon as a disallowed character is found. This is necessary in order to detect malformed input and to be able to skip to the start of the next object. The starting '<' is assumed to have been read already.
        Returns:
        The parsed PDF string.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • parseCOSArray

        protected COSArray parseCOSArray()
                                  throws java.io.IOException
        This will parse a PDF array object.
        Returns:
        The parsed PDF array.
        Throws:
        java.io.IOException - If there is an error parsing the stream.
      • isEndOfName

        protected boolean isEndOfName​(int ch)
        Determine if a character terminates a PDF name.
        Parameters:
        ch - The character
        Returns:
        true if the character terminates a PDF name, otherwise false.
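        This page does not list the terminating characters. The sketch below follows the PDF specification, where a name ends at any white-space or delimiter character; the exact set checked by isEndOfName may differ, and the class and method names here are hypothetical.

```java
final class NameEndSketch {
    // Returns true for characters that end a name token according to the
    // PDF specification's white-space and delimiter sets (an assumption,
    // not necessarily the set used by BaseParser).
    static boolean endsName(int ch) {
        switch (ch) {
            // white-space characters: NUL, TAB, LF, FF, CR, SPACE
            case 0: case 9: case 10: case 12: case 13: case 32:
            // delimiter characters
            case '(': case ')': case '<': case '>':
            case '[': case ']': case '{': case '}':
            case '/': case '%':
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(endsName('/'));
        System.out.println(endsName('A'));
    }
}
```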
      • parseCOSName

        protected COSName parseCOSName()
                                throws java.io.IOException
        This will parse a PDF name from the stream.
        Returns:
        The parsed PDF name.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • decodeBuffer

        private java.lang.String decodeBuffer​(java.io.ByteArrayOutputStream buffer)
                                       throws java.io.UnsupportedEncodingException
        Tries to decode the buffer content to a UTF-8 String. If that fails, tries the alternative encoding.
        Parameters:
        buffer - the ByteArrayOutputStream containing the bytes to decode
        Returns:
        the decoded String
        Throws:
        java.io.UnsupportedEncodingException
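        The decode-then-fall-back idea can be sketched with the standard java.nio.charset API. The fallback charset below (ISO-8859-1) is an assumption chosen for illustration, since this page does not state the value of ALTERNATIVE_CHARSET; the class and method names are hypothetical as well.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecodeSketch {
    // Assumed fallback; the real ALTERNATIVE_CHARSET is not shown on this page.
    static final Charset FALLBACK = StandardCharsets.ISO_8859_1;

    static String decode(ByteArrayOutputStream buffer) {
        byte[] bytes = buffer.toByteArray();
        // A strict decoder reports invalid UTF-8 instead of replacing it.
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: decode with the fallback charset instead.
            return new String(bytes, FALLBACK);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] name = "Title".getBytes(StandardCharsets.UTF_8);
        buffer.write(name, 0, name.length);
        System.out.println(decode(buffer));
    }
}
```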
      • parseDirObject

        protected COSBase parseDirObject()
                                  throws java.io.IOException
        This will parse a direct object from the stream.
        Returns:
        The parsed object.
        Throws:
        java.io.IOException - If there is an error during parsing.
      • parseCOSNumber

        private COSNumber parseCOSNumber()
                                  throws java.io.IOException
        Throws:
        java.io.IOException
      • readString

        protected java.lang.String readString()
                                       throws java.io.IOException
        This will read the next string from the stream.
        Returns:
        The string that was read from the stream, never null.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • readExpectedString

        protected final void readExpectedString​(char[] expectedString,
                                                boolean skipSpaces)
                                         throws java.io.IOException
        Reads the given pattern from the source, skipping whitespace at the start and end if requested.
        Parameters:
        expectedString - pattern to be skipped
        skipSpaces - if set to true spaces before and after the string will be skipped
        Throws:
        java.io.IOException - if pattern could not be read
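        A rough sketch of reading an expected pattern follows. It is simplified: it reads from a java.io.PushbackInputStream rather than the parser's source, its whitespace skipping ignores comments, and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class ExpectedStringSketch {
    // Skips white-space bytes and leaves the first other byte unread.
    static void skipWs(PushbackInputStream in) throws IOException {
        int c;
        while ((c = in.read()) != -1) {
            if (c != 0 && c != 9 && c != 10 && c != 12 && c != 13 && c != 32) {
                in.unread(c);
                return;
            }
        }
    }

    // Reads the pattern byte by byte and fails on the first mismatch.
    static void readExpected(PushbackInputStream in, char[] expected, boolean skipSpaces)
            throws IOException {
        if (skipSpaces) {
            skipWs(in);
        }
        for (char e : expected) {
            int c = in.read();
            if (c != e) {
                String found = c == -1 ? "end of input" : "'" + (char) c + "'";
                throw new IOException("Expected '" + e + "' but found " + found);
            }
        }
        if (skipSpaces) {
            skipWs(in);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "  obj \n<<".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        readExpected(in, "obj".toCharArray(), true);
        System.out.println((char) in.read());
    }
}
```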
      • readExpectedChar

        protected void readExpectedChar​(char ec)
                                 throws java.io.IOException
        Read one char and throw an exception if it is not the expected value.
        Parameters:
        ec - the char value that is expected.
        Throws:
        java.io.IOException - if the read char is not the expected value or if an I/O error occurs.
      • readString

        @Deprecated
        protected java.lang.String readString​(int length)
                                       throws java.io.IOException
        Deprecated.
        This unused method will be removed in 4.0.
        This will read the next string from the stream up to a certain length.
        Parameters:
        length - The length to stop reading at.
        Returns:
        The string that was read from the stream, between 0 and length characters long.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • isClosing

        protected boolean isClosing()
                             throws java.io.IOException
        This will tell if the next character is a closing bracket (the end of a PDF array).
        Returns:
        true if the next byte is ']', false otherwise.
        Throws:
        java.io.IOException - If an IO error occurs.
      • isClosing

        @Deprecated
        protected boolean isClosing​(int c)
        Deprecated.
        This unused method will be removed in 4.0.
        This will tell if the given character is a closing bracket (the end of a PDF array).
        Parameters:
        c - The character to check against the closing bracket
        Returns:
        true if the given character is ']', false otherwise.
      • readLine

        protected java.lang.String readLine()
                                     throws java.io.IOException
        This will read bytes until the first end of line marker occurs. NOTE: The EOL marker may consist of 1 (CR or LF) or 2 (CR and LF) bytes, which is an important detail if one wants to unread the line.
        Returns:
        The characters between the current position and the end of the line.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • isEOL

        protected boolean isEOL()
                         throws java.io.IOException
        This will tell if the next byte to be read is an end of line byte.
        Returns:
        true if the next byte is 0x0A or 0x0D.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • isEOF

        protected boolean isEOF()
                         throws java.io.IOException
        This will tell if the end of the data is reached.
        Returns:
        true if the end of the data is reached.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • isEOL

        protected boolean isEOL​(int c)
        This will tell if the given value is an end of line byte.
        Parameters:
        c - The character to check against end of line
        Returns:
        true if the given value is 0x0A or 0x0D.
      • isLF

        private boolean isLF​(int c)
      • isCR

        private boolean isCR​(int c)
      • isWhitespace

        protected boolean isWhitespace()
                                throws java.io.IOException
        This will tell if the next byte is whitespace or not.
        Returns:
        true if the next byte in the stream is a whitespace character.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • isWhitespace

        protected static boolean isWhitespace​(int c)
        This will tell if a character is whitespace or not. These values are specified in table 1 (page 12) of ISO 32000-1:2008.
        Parameters:
        c - The character to check against whitespace
        Returns:
        true if the character is a whitespace character.
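        Table 1 of ISO 32000-1:2008 names six white-space characters: NUL (0), TAB (9), LF (10), FF (12), CR (13) and SPACE (32). A minimal check against that table could look like the sketch below; the class and method names are hypothetical and this is not the PDFBox source.

```java
final class WhitespaceSketch {
    // White-space characters from table 1 of ISO 32000-1:2008.
    static boolean isPdfWhitespace(int c) {
        switch (c) {
            case 0:  // NUL
            case 9:  // horizontal tab
            case 10: // line feed
            case 12: // form feed
            case 13: // carriage return
            case 32: // space
                return true;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isPdfWhitespace(' '));
        System.out.println(isPdfWhitespace('A'));
    }
}
```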
      • isSpace

        protected boolean isSpace()
                           throws java.io.IOException
        This will tell if the next byte is a space or not.
        Returns:
        true if the next byte in the stream is a space character.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • isSpace

        protected boolean isSpace​(int c)
        This will tell if the given value is a space or not.
        Parameters:
        c - The character to check against space
        Returns:
        true if the given value is a space character.
      • isDigit

        protected boolean isDigit()
                           throws java.io.IOException
        This will tell if the next byte is a digit or not.
        Returns:
        true if the next byte in the stream is a digit.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • isDigit

        protected static boolean isDigit​(int c)
        This will tell if the given value is a digit or not.
        Parameters:
        c - The character to be checked
        Returns:
        true if the given value is a digit.
      • skipSpaces

        protected void skipSpaces()
                           throws java.io.IOException
        This will skip all spaces and comments that are present.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • readObjectNumber

        protected long readObjectNumber()
                                 throws java.io.IOException
        This will read a long from the stream and throw an IOException if the value is negative or has more than 10 digits (i.e. is bigger than OBJECT_NUMBER_THRESHOLD).
        Returns:
        the object number being read.
        Throws:
        java.io.IOException - if an I/O error occurs
      • readGenerationNumber

        protected int readGenerationNumber()
                                    throws java.io.IOException
        This will read an integer from the stream and throw an IOException if the value exceeds the maximum object revision (i.e. is bigger than GENERATION_NUMBER_THRESHOLD).
        Returns:
        the generation number being read.
        Throws:
        java.io.IOException - if an I/O error occurs
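        The range checks described for readObjectNumber() and readGenerationNumber() can be sketched as follows. The two limits are assumptions made for this example: 10,000,000,000 matches the "more than 10 digits" wording, and 65,535 is the highest generation number the PDF specification allows. The actual constant values are not shown on this page, and the names below are hypothetical.

```java
import java.io.IOException;

final class NumberRangeSketch {
    // Assumed limits, not copied from the Constant Field Values page.
    static final long OBJECT_NUMBER_LIMIT = 10_000_000_000L;
    static final long GENERATION_NUMBER_LIMIT = 65_535L;

    // Rejects negative values and values with more than 10 digits.
    static long checkObjectNumber(long value) throws IOException {
        if (value < 0 || value >= OBJECT_NUMBER_LIMIT) {
            throw new IOException("Object number '" + value + "' is out of range");
        }
        return value;
    }

    // Rejects negative values and values above the assumed maximum.
    static int checkGenerationNumber(int value) throws IOException {
        if (value < 0 || value > GENERATION_NUMBER_LIMIT) {
            throw new IOException("Generation number '" + value + "' is out of range");
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(checkObjectNumber(12L));
        System.out.println(checkGenerationNumber(0));
    }
}
```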
      • readInt

        protected int readInt()
                       throws java.io.IOException
        This will read an integer from the stream.
        Returns:
        The integer that was read from the stream.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • readLong

        protected long readLong()
                         throws java.io.IOException
        This will read a long from the stream.
        Returns:
        The long that was read from the stream.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • readStringNumber

        protected final java.lang.StringBuilder readStringNumber()
                                                          throws java.io.IOException
        This method is used by readInt() and readLong() to read a token. Any non-digit value is a valid delimiter.
        Returns:
        the token to parse as integer or long by the calling method.
        Throws:
        java.io.IOException - thrown by the source methods.
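        To illustrate the token reading described for readStringNumber(), the sketch below collects digits until the first non-digit byte and leaves that delimiter unread. It uses java.io.PushbackInputStream instead of the parser's source, omits any length limit, and its names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

final class DigitTokenSketch {
    // Collects consecutive ASCII digits; any other byte ends the token.
    static StringBuilder readDigits(PushbackInputStream in) throws IOException {
        StringBuilder token = new StringBuilder();
        int c;
        while ((c = in.read()) >= '0' && c <= '9') {
            token.append((char) c);
        }
        // Leave the delimiter in the stream for the next read.
        if (c != -1) {
            in.unread(c);
        }
        return token;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "123 0 obj".getBytes(StandardCharsets.US_ASCII);
        PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(data));
        // The caller converts the token, as readInt() or readLong() would.
        System.out.println(Long.parseLong(readDigits(in).toString()));
    }
}
```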