Class BruteForceParser

  • All Implemented Interfaces:
    ICOSParser

    public class BruteForceParser
    extends COSParser
    Brute force parser used as a last resort when a malformed PDF cannot be read.
    • Field Detail

      • XREF_TABLE

        private static final char[] XREF_TABLE
      • XREF_STREAM

        private static final char[] XREF_STREAM
      • EOF_MARKER

        private static final char[] EOF_MARKER
        EOF-marker.
      • OBJ_MARKER

        private static final char[] OBJ_MARKER
        obj-marker.
      • TRAILER_MARKER

        private static final char[] TRAILER_MARKER
        trailer-marker.
      • OBJ_STREAM

        private static final char[] OBJ_STREAM
        ObjStream-marker.
      • LOG

        private static final org.apache.commons.logging.Log LOG
      • bfSearchCOSObjectKeyOffsets

        private final java.util.Map<COSObjectKey,java.lang.Long> bfSearchCOSObjectKeyOffsets
        Contains all found objects of a brute force search.
      • bfSearchTriggered

        private boolean bfSearchTriggered
    • Constructor Detail

      • BruteForceParser

        public BruteForceParser(RandomAccessRead source,
                                COSDocument document)
                         throws java.io.IOException
        Constructor. Triggers a brute force search for all objects of the document.
        Parameters:
        source - input representing the pdf.
        document - the corresponding COS document
        Throws:
        java.io.IOException - if the source data could not be read
    • Method Detail

      • bfSearchTriggered

        public boolean bfSearchTriggered()
        Indicates whether the brute force search for objects was triggered.
        Returns:
        true if the search was triggered
      • getBFCOSObjectOffsets

        protected java.util.Map<COSObjectKey,java.lang.Long> getBFCOSObjectOffsets()
                                                                                  throws java.io.IOException
        Returns all found objects of a brute force search.
        Returns:
        map containing all found objects of a brute force search
        Throws:
        java.io.IOException - if something went wrong
      • bfSearchForObjects

        private void bfSearchForObjects()
                                 throws java.io.IOException
        Brute force search for every object in the pdf.
        Throws:
        java.io.IOException - if something went wrong
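        The page does not say how the object search works internally. As a rough, self-contained sketch of the general idea only (not the PDFBox implementation), the input can be scanned for "<number> <generation> obj" headers and each offset recorded. The class ObjectOffsetScan, the String keys, and the rule that a later definition overwrites an earlier one are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration of a brute force object scan; not PDFBox code. */
final class ObjectOffsetScan {
    // Object number, generation number, then the "obj" keyword.
    private static final Pattern OBJ_HEADER = Pattern.compile("(\\d+)\\s+(\\d+)\\s+obj\\b");

    /**
     * Returns a map from "number generation" (for example "12 0") to the byte
     * offset of the object header. Assumption: if the same key occurs more than
     * once, the last occurrence wins.
     */
    static Map<String, Long> scan(byte[] data) {
        // ISO-8859-1 maps every byte to exactly one char, so match indices are byte offsets.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        Map<String, Long> offsets = new LinkedHashMap<>();
        Matcher matcher = OBJ_HEADER.matcher(text);
        while (matcher.find()) {
            offsets.put(matcher.group(1) + " " + matcher.group(2), (long) matcher.start());
        }
        return offsets;
    }
}
```

        Decoding the whole input into a String keeps the sketch short but is only practical for small files; a real parser would work on a seekable stream.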
      • bfSearchForXRef

        protected long bfSearchForXRef(long xrefOffset)
                                throws java.io.IOException
        Search for the offset of the given xref table/stream among those found by a brute force search.
        Parameters:
        xrefOffset - the given offset to be searched for
        Returns:
        the offset of the xref entry
        Throws:
        java.io.IOException - if something went wrong
      • searchNearestValue

        private long searchNearestValue(java.util.List<java.lang.Long> values,
                                        long offset)
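        This page gives no description for searchNearestValue. Judging only by its name and signature, it plausibly picks the candidate offset closest to a given offset, which would fit how bfSearchForXRef maps a possibly wrong xref offset onto one found by the brute force search. The sketch below shows that idea under those assumptions; the class NearestOffset, the tie-breaking rule, and the -1 result for an empty list are hypothetical.

```java
import java.util.List;

/** Hypothetical illustration of a nearest-offset lookup; not PDFBox code. */
final class NearestOffset {
    /**
     * Returns the element of values with the smallest absolute distance to
     * offset. Assumptions: the first candidate wins on ties, and -1 is
     * returned when the list is empty.
     */
    static long nearest(List<Long> values, long offset) {
        long best = -1;
        long bestDistance = Long.MAX_VALUE;
        for (long value : values) {
            long distance = Math.abs(offset - value);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = value;
            }
        }
        return best;
    }
}
```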
      • bfSearchForObjStreams

        protected void bfSearchForObjStreams(XrefTrailerResolver trailerResolver,
                                             SecurityHandler<? extends ProtectionPolicy> securityHandler)
                                      throws java.io.IOException
        Brute force search for all object streams of a pdf.
        Parameters:
        trailerResolver - the trailer resolver of the document
        securityHandler - security handler to be used to decrypt encrypted documents
        Throws:
        java.io.IOException - if something went wrong
      • bfSearchForTrailer

        private boolean bfSearchForTrailer(COSDictionary trailer)
                                    throws java.io.IOException
        Brute force search for all trailer markers.
        Parameters:
        trailer - dictionary to be used as trailer dictionary
        Throws:
        java.io.IOException - if something went wrong
      • searchForTrailerItems

        private boolean searchForTrailerItems(COSDictionary trailer)
                                       throws java.io.IOException
        Search for the different parts of the trailer dictionary.
        Parameters:
        trailer - dictionary to be used as trailer dictionary
        Returns:
        true if the root was found, false if not.
        Throws:
        java.io.IOException - if something went wrong
      • compareCOSObjects

        private COSObject compareCOSObjects(COSObject newObject,
                                            java.lang.Long newOffset,
                                            COSObject currentObject)
      • bfSearchForLastEOFMarker

        private long bfSearchForLastEOFMarker()
                                       throws java.io.IOException
        Brute force search for the last EOF marker.
        Throws:
        java.io.IOException - if something went wrong
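        As an illustration of what finding the last EOF marker involves, the following self-contained sketch scans a byte array backwards for "%%EOF". It is not the PDFBox implementation: the class LastEofFinder, the backward scan direction, and the -1 result when no marker exists are assumptions, since this page does not document those details.

```java
/** Hypothetical illustration of locating the last "%%EOF"; not PDFBox code. */
final class LastEofFinder {
    private static final byte[] EOF_MARKER = {'%', '%', 'E', 'O', 'F'};

    /**
     * Returns the start offset of the last "%%EOF" in data.
     * Assumption: -1 is returned when the marker does not occur at all.
     */
    static long lastEofOffset(byte[] data) {
        // Walk backwards so the first hit is the last occurrence in the file.
        for (int start = data.length - EOF_MARKER.length; start >= 0; start--) {
            if (matchesAt(data, start)) {
                return start;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int start) {
        for (int i = 0; i < EOF_MARKER.length; i++) {
            if (data[start + i] != EOF_MARKER[i]) {
                return false;
            }
        }
        return true;
    }
}
```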
      • bfSearchForObjStreamOffsets

        private java.util.Map<java.lang.Long,COSObjectKey> bfSearchForObjStreamOffsets()
                                                                                      throws java.io.IOException
        Search for all offsets of object streams within the given pdf.
        Returns:
        a map of all offsets for object streams
        Throws:
        java.io.IOException - if something went wrong
      • bfSearchForXRefTables

        private java.util.List<java.lang.Long> bfSearchForXRefTables()
                                                              throws java.io.IOException
        Brute force search for all xref entries (tables).
        Throws:
        java.io.IOException - if something went wrong
      • bfSearchForXRefStreams

        private java.util.List<java.lang.Long> bfSearchForXRefStreams()
                                                               throws java.io.IOException
        Brute force search for all /XRef entries (streams).
        Throws:
        java.io.IOException - if something went wrong
      • isInfo

        private boolean isInfo(COSDictionary dictionary)
        Tell if the dictionary is an info dictionary.
        Parameters:
        dictionary - the dictionary to be checked
        Returns:
        true if the given dictionary is an info dictionary
      • isCatalog

        private boolean isCatalog(COSDictionary dictionary)
        Tell if the dictionary is a PDF or FDF catalog.
        Parameters:
        dictionary - the dictionary to be checked
        Returns:
        true if the given dictionary is a root dictionary
      • findString

        private long findString(char[] string)
                         throws java.io.IOException
        Searches for the given string, starting at the current position. Returns the start position of the string if it is found, or -1 if there is no further occurrence. After the method returns, the current position is either the end of the found string or the end of the input.
        Parameters:
        string - the string to be searched
        Returns:
        the start position of the found string
        Throws:
        java.io.IOException - if something went wrong
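        The contract described for findString can be sketched over a plain byte array as follows. This is a minimal illustration, not the PDFBox code: the class MarkerScanner and its methods are hypothetical, and the real method reads from the parser's source rather than from an array.

```java
/** Hypothetical illustration of the findString contract; not PDFBox code. */
final class MarkerScanner {
    private final byte[] data;
    private int position; // current read position, starts at 0

    MarkerScanner(byte[] data) {
        this.data = data;
    }

    int position() {
        return position;
    }

    /**
     * Searches forward from the current position. Returns the start index of
     * the marker, or -1 if there is no further occurrence. Afterwards the
     * position is just past the match, or at the end of the input.
     */
    long find(char[] marker) {
        if (marker.length == 0) {
            throw new IllegalArgumentException("marker must not be empty");
        }
        for (int start = position; start + marker.length <= data.length; start++) {
            int matched = 0;
            while (matched < marker.length && (data[start + matched] & 0xFF) == marker[matched]) {
                matched++;
            }
            if (matched == marker.length) {
                position = start + marker.length;
                return start;
            }
        }
        position = data.length;
        return -1;
    }
}
```

        Because each call resumes after the previous match, calling find in a loop until it returns -1 visits every occurrence of the marker once.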
      • rebuildTrailer

        protected COSDictionary rebuildTrailer(XrefTrailerResolver trailerResolver,
                                               SecurityHandler<? extends ProtectionPolicy> securityHandler)
                                        throws java.io.IOException
        Rebuild the trailer dictionary if startxref can't be found.
        Parameters:
        trailerResolver - the trailer resolver of the document
        securityHandler - security handler to be used to decrypt encrypted documents
        Returns:
        the rebuilt trailer dictionary
        Throws:
        java.io.IOException - if something went wrong