Class PDFStreamParser


  • public class PDFStreamParser
    extends BaseParser
    Parses a PDF byte stream and extracts the tokens it contains, such as operands and operators.
    • Field Detail

      • LOG

        private static final org.apache.commons.logging.Log LOG
        Log instance.
      • MAX_BIN_CHAR_TEST_LENGTH

        private static final int MAX_BIN_CHAR_TEST_LENGTH
        See Also:
        Constant Field Values
      • binCharTestArr

        private final byte[] binCharTestArr
      • inlineImageDepth

        private int inlineImageDepth
      • inlineOffset

        private long inlineOffset
    • Constructor Detail

      • PDFStreamParser

        public PDFStreamParser(PDContentStream pdContentstream)
                        throws java.io.IOException
        Constructor.
        Parameters:
        pdContentstream - The content stream to parse.
        Throws:
        java.io.IOException - If there is an error initializing the stream.
      • PDFStreamParser

        public PDFStreamParser(byte[] bytes)
        Constructor.
        Parameters:
        bytes - the bytes to parse.
    • Method Detail

      • parse

        public java.util.List<java.lang.Object> parse()
                                               throws java.io.IOException
        Parses all of the tokens in the stream and closes the stream once parsing is finished.
        Returns:
        All of the tokens in the stream.
        Throws:
        java.io.IOException - If there is an error while parsing the stream.
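PDF content streams use postfix notation, so in a flat token list each operator follows the operands it applies to. The sketch below shows one way a caller might group such a list into instructions. It is not PDFBox code: `OperatorToken` and `Instruction` are hypothetical stand-ins for the library's own token types, and plain `Integer` values stand in for operands, so that the example runs without PDFBox on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TokenGroupingSketch {
    /** Hypothetical stand-in for an operator token. */
    static final class OperatorToken {
        final String name;
        OperatorToken(String name) { this.name = name; }
    }

    /** One operator together with the operands that preceded it. */
    static final class Instruction {
        final String operator;
        final List<Object> operands;
        Instruction(String operator, List<Object> operands) {
            this.operator = operator;
            this.operands = operands;
        }
    }

    /** Groups a flat, postfix-ordered token list into instructions. */
    static List<Instruction> group(List<Object> tokens) {
        List<Instruction> result = new ArrayList<>();
        List<Object> pending = new ArrayList<>();
        for (Object token : tokens) {
            if (token instanceof OperatorToken) {
                // An operator closes the instruction; attach the collected operands.
                result.add(new Instruction(((OperatorToken) token).name, pending));
                pending = new ArrayList<>();
            } else {
                pending.add(token);
            }
        }
        // Trailing operands with no operator are dropped in this sketch.
        return result;
    }

    public static void main(String[] args) {
        List<Object> tokens = Arrays.<Object>asList(
                new OperatorToken("q"),
                1, 0, 0, 1, 50, 50, new OperatorToken("cm"),
                new OperatorToken("Q"));
        for (Instruction instruction : group(tokens)) {
            System.out.println(instruction.operator + " " + instruction.operands);
        }
    }
}
```

Dropping trailing operands is a choice made for brevity; a caller that needs to detect truncated streams would likely report them instead.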
      • parseNextToken

        public java.lang.Object parseNextToken()
                                        throws java.io.IOException
        Parses the next token in the stream.
        Returns:
        The next token in the stream, or null if there are no more tokens.
        Throws:
        java.io.IOException - If an I/O error occurs while parsing the stream.
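Because `parseNextToken()` returns null once the stream is exhausted, a caller can pull tokens one at a time rather than building the whole list up front. The following is a minimal sketch of that loop. `TokenSource` is a hypothetical interface that mirrors only the documented contract (next token, or null at the end), so the example runs without PDFBox.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class PullLoopSketch {
    /** Hypothetical stand-in with the same contract as parseNextToken(). */
    interface TokenSource {
        Object parseNextToken() throws IOException;
    }

    /** Pulls tokens until the source signals the end by returning null. */
    static List<Object> drain(TokenSource source) throws IOException {
        List<Object> tokens = new ArrayList<>();
        Object token;
        while ((token = source.parseNextToken()) != null) {
            tokens.add(token);
        }
        return tokens;
    }

    /** Test helper: serves the given items in order, then null. */
    static TokenSource fromList(List<Object> items) {
        final Iterator<Object> it = items.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) throws IOException {
        List<Object> items = Arrays.<Object>asList("BT", 12, "Tf", "ET");
        System.out.println(drain(fromList(items)));
    }
}
```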
      • hasNoFollowingBinData

        private boolean hasNoFollowingBinData()
                                       throws java.io.IOException
        Looks ahead at a number of bytes and checks that they contain only ASCII characters (no control sequences etc.) and that these characters begin with a sequence of 1-3 non-blank characters between blanks.
        Returns:
        true if next bytes are probably printable ASCII characters starting with a PDF operator, otherwise false
        Throws:
        java.io.IOException
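The check described above can be illustrated with a standalone sketch. This is not the PDFBox implementation: the class and method names are hypothetical, the set of bytes treated as blank (NUL, TAB, LF, CR, space) is an assumption, a leading blank is treated as optional, and a window whose first run is not terminated by a blank is rejected, which is a choice made for this sketch only.

```java
import java.nio.charset.StandardCharsets;

final class BinDataHeuristicSketch {
    /** Blank bytes as assumed by this sketch: NUL, TAB, LF, CR, and space. */
    static boolean isBlank(int c) {
        return c == 0x00 || c == 0x09 || c == 0x0A || c == 0x0D || c == 0x20;
    }

    /**
     * Returns true if the window holds only blanks and printable ASCII, and its
     * first non-blank run is 1 to 3 bytes long and is followed by a blank.
     */
    static boolean looksLikeOperatorText(byte[] window) {
        int start = -1; // index of the first non-blank byte
        int end = -1;   // index of the blank that ends the first run
        for (int i = 0; i < window.length; i++) {
            int c = window[i] & 0xFF; // bytes are signed in Java
            if (c > 0x7E || (c < 0x20 && !isBlank(c))) {
                return false; // control byte, DEL, or non-ASCII: treat as binary
            }
            if (start == -1 && !isBlank(c)) {
                start = i;
            } else if (start != -1 && end == -1 && isBlank(c)) {
                end = i;
            }
        }
        if (start == -1 || end == -1) {
            return false; // no complete first run inside the window
        }
        return end - start <= 3;
    }

    public static void main(String[] args) {
        byte[] text = "\nQ\nq 1 0 0 1 0 0 cm".getBytes(StandardCharsets.US_ASCII);
        byte[] binary = {'Q', ' ', 0x01, (byte) 0xFF};
        System.out.println(looksLikeOperatorText(text));
        System.out.println(looksLikeOperatorText(binary));
    }
}
```

The 1-3 byte limit comes from the description above; the remaining thresholds are illustrative and would need checking against the actual source before being relied on.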
      • readOperator

        private java.lang.String readOperator()
                                       throws java.io.IOException
        Reads an operator from the stream.
        Returns:
        The operator that was read from the stream.
        Throws:
        java.io.IOException - If there is an error reading from the stream.
      • isSpaceOrReturn

        private boolean isSpaceOrReturn(int c)
      • hasNextSpaceOrReturn

        private boolean hasNextSpaceOrReturn()
                                      throws java.io.IOException
        Checks if the next char is a space or a return.
        Returns:
        true if the next char is a space or a return
        Throws:
        java.io.IOException - if something went wrong
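Checking the next character without consuming it amounts to a peek. The sketch below shows one way to do that with the standard `java.io.PushbackInputStream`. It is not taken from the PDFBox source: the class name is hypothetical, and the set of characters tested (space, line feed, carriage return) is an assumption based on the method names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;

final class PeekSketch {
    /** Assumed definition: space, line feed, or carriage return. */
    static boolean isSpaceOrReturn(int c) {
        return c == ' ' || c == '\n' || c == '\r';
    }

    /** Peeks at the next byte and leaves the stream position unchanged. */
    static boolean hasNextSpaceOrReturn(PushbackInputStream in) throws IOException {
        int c = in.read();
        if (c == -1) {
            return false; // end of stream: nothing to peek at
        }
        in.unread(c); // push the byte back so the caller still sees it
        return isSpaceOrReturn(c);
    }

    public static void main(String[] args) throws IOException {
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(new byte[] {' ', 'q'}));
        System.out.println(hasNextSpaceOrReturn(in));
        // The space is still available to the next read.
        System.out.println(in.read() == ' ');
    }
}
```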
      • close

        public void close()
                   throws java.io.IOException
        Close the underlying resource.
        Throws:
        java.io.IOException - if something went wrong