Class ExtractText

  • All Implemented Interfaces:
    java.util.concurrent.Callable<java.lang.Integer>

    public final class ExtractText
    extends java.lang.Object
    implements java.util.concurrent.Callable<java.lang.Integer>
    Main program that parses a PDF document and converts it to text.
    • Field Detail

      • LOG

        private static final org.apache.commons.logging.Log LOG
      • SYSOUT

        private final java.io.PrintStream SYSOUT
      • SYSERR

        private final java.io.PrintStream SYSERR
      • alwaysNext

        private boolean alwaysNext
      • toConsole

        private boolean toConsole
      • debug

        private boolean debug
      • encoding

        private java.lang.String encoding
      • endPage

        private int endPage
      • toHTML

        private boolean toHTML
      • toMD

        private boolean toMD
      • ignoreBeads

        private boolean ignoreBeads
      • password

        private java.lang.String password
      • rotationMagic

        private boolean rotationMagic
      • sort

        private boolean sort
      • startPage

        private int startPage
      • infile

        private java.io.File infile
      • outfile

        private java.io.File outfile
      • addFileName

        private boolean addFileName
      • append

        private boolean append
    • Constructor Detail

      • ExtractText

        public ExtractText()
        Creates a new ExtractText instance.
    • Method Detail

      • main

        public static void main(java.lang.String[] args)
        Command-line entry point.
        Parameters:
        args - Command-line arguments; exactly one is expected, a reference to a file.
      • call

        public java.lang.Integer call()
        Starts the text extraction.
        Specified by:
        call in interface java.util.concurrent.Callable<java.lang.Integer>
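        Because call() returns an Integer and declares no checked exception, one plausible reading is that the value serves as a process exit code and that I/O failures are handled inside the method. The sketch below illustrates that pattern only. The class ExitCodeSketch, its simulateFailure flag, and the codes 0 and 1 are illustrative assumptions and are not taken from ExtractText.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a command class; not the PDFBox implementation. */
final class ExitCodeSketch implements Callable<Integer> {
    // Hypothetical switch used only to simulate an I/O failure in this sketch.
    private final boolean simulateFailure;

    ExitCodeSketch(boolean simulateFailure) {
        this.simulateFailure = simulateFailure;
    }

    @Override
    public Integer call() {
        try {
            if (simulateFailure) {
                throw new IOException("simulated read error");
            }
            return 0; // assumed success code
        } catch (IOException e) {
            // The checked exception is handled here, so call() needs no throws clause.
            System.err.println("Error extracting text: " + e.getMessage());
            return 1; // assumed failure code
        }
    }
}
```

        A launcher could then pass the result to System.exit, for example System.exit(new ExitCodeSketch(false).call()).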
      • createOutputWriter

        private java.io.Writer createOutputWriter()
                                           throws java.io.IOException
        Throws:
        java.io.IOException
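        No description is given for createOutputWriter, but the toConsole, outfile, append, and encoding fields suggest how such a factory might choose its target. The following is a minimal sketch under that assumption. OutputWriterSketch, its open method, and the explicit console parameter are hypothetical and are not the PDFBox implementation.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

/** Hypothetical helper; parameter names mirror the fields listed above. */
final class OutputWriterSketch {
    private OutputWriterSketch() {
    }

    /**
     * Returns a Writer for the console stream or for a file.
     * Assumption: toConsole takes precedence over outfile.
     */
    static Writer open(boolean toConsole, File outfile, boolean append,
                       String encoding, OutputStream console) throws IOException {
        if (toConsole) {
            // Closing the returned writer also closes the console stream.
            return new OutputStreamWriter(console, encoding);
        }
        // The second FileOutputStream argument selects append versus overwrite.
        return new OutputStreamWriter(new FileOutputStream(outfile, append), encoding);
    }
}
```

        Passing the console stream as a parameter keeps the sketch testable; a real tool would more likely hold it in a field such as SYSOUT.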
      • extractPages

        private void extractPages(int startPage,
                                  int endPage,
                                  PDFTextStripper stripper,
                                  PDDocument document,
                                  java.io.Writer output,
                                  boolean rotationMagic,
                                  boolean alwaysNext)
                           throws java.io.IOException
        Throws:
        java.io.IOException
      • startProcessing

        private long startProcessing(java.lang.String message)
      • stopProcessing

        private void stopProcessing(java.lang.String message,
                                    long startTime)
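
        The signatures of startProcessing and stopProcessing (one returns a long, the other accepts it as startTime) are consistent with a simple wall-clock timing pair. A minimal sketch of such a pair follows. The class TimingSketch, the formatElapsed helper, the message format, and the idea that output is gated by a debug flag are all assumptions made for illustration.

```java
import java.io.PrintStream;

/** Hypothetical timing helper; not the PDFBox implementation. */
final class TimingSketch {
    private final PrintStream err;
    private final boolean debug;

    TimingSketch(PrintStream err, boolean debug) {
        this.err = err;
        this.debug = debug;
    }

    /** Prints the message when debugging is on and returns the start time in milliseconds. */
    long startProcessing(String message) {
        if (debug) {
            err.println(message);
        }
        return System.currentTimeMillis();
    }

    /** Prints the message followed by the elapsed time when debugging is on. */
    void stopProcessing(String message, long startTime) {
        if (debug) {
            err.println(formatElapsed(message, startTime, System.currentTimeMillis()));
        }
    }

    /** Formats the elapsed time in seconds; kept separate so it can be checked in isolation. */
    static String formatElapsed(String message, long startTime, long stopTime) {
        double seconds = (stopTime - startTime) / 1000.0;
        return message + seconds + " seconds";
    }
}
```

        A caller would typically write long t = timer.startProcessing("Loading PDF"); before the work and timer.stopProcessing("Time for loading: ", t); after it.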