Package org.apache.pdfbox.tools
Class ExtractText
java.lang.Object
    org.apache.pdfbox.tools.ExtractText

All Implemented Interfaces:
    java.util.concurrent.Callable<java.lang.Integer>
public final class ExtractText
extends java.lang.Object
implements java.util.concurrent.Callable<java.lang.Integer>

This is the main program that parses a PDF document and transforms it into text.
Field Summary

Modifier and Type                              Field
private boolean                                addFileName
private boolean                                alwaysNext
private boolean                                append
private boolean                                debug
private java.lang.String                       encoding
private int                                    endPage
private boolean                                ignoreBeads
private java.io.File                           infile
private static org.apache.commons.logging.Log  LOG
private java.io.File                           outfile
private java.lang.String                       password
private boolean                                rotationMagic
private boolean                                sort
private int                                    startPage
private static java.lang.String                STD_ENCODING
private java.io.PrintStream                    SYSERR
private java.io.PrintStream                    SYSOUT
private boolean                                toConsole
private boolean                                toHTML
private boolean                                toMD

Constructor Summary

Constructor      Description
ExtractText()    Constructor.

Method Summary

Modifier and Type             Method
java.lang.Integer             call()
                                  Starts the text extraction.
private java.io.Writer        createOutputWriter()
private void                  extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, java.io.Writer output, boolean rotationMagic, boolean alwaysNext)
(package private) static int  getAngle(TextPosition text)
static void                   main(java.lang.String[] args)
                                  Infamous main method.
private long                  startProcessing(java.lang.String message)
private void                  stopProcessing(java.lang.String message, long startTime)

Field Detail

LOG
private static final org.apache.commons.logging.Log LOG

STD_ENCODING
private static final java.lang.String STD_ENCODING
See Also:
    Constant Field Values

SYSOUT
private final java.io.PrintStream SYSOUT

SYSERR
private final java.io.PrintStream SYSERR

alwaysNext
private boolean alwaysNext

toConsole
private boolean toConsole

debug
private boolean debug

encoding
private java.lang.String encoding

endPage
private int endPage

toHTML
private boolean toHTML

toMD
private boolean toMD

ignoreBeads
private boolean ignoreBeads

password
private java.lang.String password

rotationMagic
private boolean rotationMagic

sort
private boolean sort

startPage
private int startPage

infile
private java.io.File infile

outfile
private java.io.File outfile

addFileName
private boolean addFileName

append
private boolean append

Method Detail

main
public static void main(java.lang.String[] args)
Infamous main method.
Parameters:
    args - Command-line arguments; there should be exactly one, referring to a file.

call
public java.lang.Integer call()
Starts the text extraction.
Specified by:
    call in interface java.util.concurrent.Callable<java.lang.Integer>
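Because ExtractText implements Callable<Integer>, call() hands back an Integer result. A common use of that pattern in command-line tools is to treat the result as the process exit status, as in the hypothetical sketch below. The class name TextCommandSketch, the one-argument check (taken from the main Javadoc above) and the status codes 0 and 1 are assumptions; PDFBox's own argument handling is not shown here.

```java
import java.util.concurrent.Callable;

// Hypothetical command class illustrating the Callable<Integer> pattern;
// this is not the PDFBox implementation.
public class TextCommandSketch implements Callable<Integer> {
    private final String[] args;

    TextCommandSketch(String[] args) {
        this.args = args;
    }

    @Override
    public Integer call() {
        // Assumption: exactly one argument, the input file.
        if (args.length != 1) {
            System.err.println("usage: TextCommandSketch <input-file>");
            return 1; // non-zero status signals a usage error
        }
        // A real implementation would open args[0] and extract its text here.
        return 0; // zero status signals success
    }

    public static void main(String[] args) {
        // The Integer returned by call() becomes the process exit status.
        System.exit(new TextCommandSketch(args).call());
    }
}
```

Returning the status from call() keeps the work testable without terminating the JVM; only main decides to exit.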

createOutputWriter
private java.io.Writer createOutputWriter() throws java.io.IOException
Throws:
    java.io.IOException
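createOutputWriter() takes no parameters, so it presumably reads instance fields such as toConsole, outfile, encoding and append. The sketch below is a hypothetical stand-alone version that receives those values as arguments and wraps either a console stream or a file in a Writer with an explicit character encoding. The parameter list, the class name OutputWriterSketch and the use of OutputStreamWriter are assumptions, not the PDFBox source.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical helper; the parameters mirror the documented fields
// toConsole, outfile, encoding and append.
public class OutputWriterSketch {
    static Writer createOutputWriter(boolean toConsole, PrintStream console,
                                     File outfile, String encoding, boolean append)
            throws IOException {
        // Throws an unchecked exception for an unknown or unsupported name.
        Charset charset = Charset.forName(encoding);
        OutputStream target = toConsole
                ? console                                 // e.g. System.out
                : new FileOutputStream(outfile, append);  // append, or truncate
        // Closing this writer also closes the wrapped stream, so when
        // writing to the console, flush it instead of closing it.
        return new OutputStreamWriter(target, charset);
    }
}
```

Passing the encoding explicitly, instead of relying on the platform default charset, makes the output bytes independent of the machine the tool runs on.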

extractPages
private void extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, java.io.Writer output, boolean rotationMagic, boolean alwaysNext) throws java.io.IOException
Throws:
    java.io.IOException
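extractPages receives a page range and a rotationMagic flag. The sketch below models only those two ideas with plain Java types: it walks a 1-based, inclusive range clamped to the pages that exist and, when rotationMagic is set, groups each page's text by rotation angle and emits each group separately. That reading of rotationMagic is an assumption suggested by the field name and by getAngle; the Fragment type, the pre-computed angles and the ascending-angle order are hypothetical. The real method works on a PDDocument through a PDFTextStripper, and alwaysNext is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, not PDFBox types: each page is a list of text
// fragments whose rotation angle (in degrees) is already known.
public class ExtractPagesSketch {
    static final class Fragment {
        final String text;
        final int angle;

        Fragment(String text, int angle) {
            this.text = text;
            this.angle = angle;
        }
    }

    // Extracts pages startPage..endPage (1-based, inclusive), clamped to the
    // pages that exist. Returns one string per non-empty page, or one per
    // (page, angle) group when rotationMagic is set.
    static List<String> extractPages(List<List<Fragment>> pages, int startPage,
                                     int endPage, boolean rotationMagic) {
        List<String> output = new ArrayList<>();
        int first = Math.max(startPage, 1);
        int last = Math.min(endPage, pages.size());
        for (int p = first; p <= last; p++) {
            // Without rotationMagic every fragment shares group 0, so the
            // page is emitted as one string in its original order.
            Map<Integer, StringBuilder> groups = new TreeMap<>();
            for (Fragment f : pages.get(p - 1)) {
                int key = rotationMagic ? f.angle : 0;
                groups.computeIfAbsent(key, k -> new StringBuilder()).append(f.text);
            }
            // TreeMap iterates its keys (angles) in ascending order.
            for (StringBuilder sb : groups.values()) {
                output.add(sb.toString());
            }
        }
        return output;
    }
}
```

Clamping lets a caller pass a large endPage such as Integer.MAX_VALUE to mean "through the last page"; whether ExtractText uses that convention is not stated in this documentation.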

startProcessing
private long startProcessing(java.lang.String message)

stopProcessing
private void stopProcessing(java.lang.String message, long startTime)
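startProcessing returns a long that stopProcessing later receives as startTime, which suggests a start-timestamp and elapsed-time pair. The hypothetical sketch below implements that pattern. The debug flag gating the output, printing to System.err (the class does keep a SYSERR stream), the use of System.nanoTime and the message format are all assumptions; the elapsedMillis helper is an addition made for illustration and testing.

```java
// Hypothetical timing helpers modelled on the documented signatures
// startProcessing(String) -> long and stopProcessing(String, long).
public class TimingSketch {
    // Assumption: progress messages are printed only in debug mode.
    static boolean debug = true;

    static long startProcessing(String message) {
        if (debug) {
            System.err.println(message);
        }
        // nanoTime is monotonic, which suits elapsed-time measurement.
        return System.nanoTime();
    }

    static void stopProcessing(String message, long startTime) {
        if (debug) {
            System.err.println(message + elapsedMillis(startTime) + " ms");
        }
    }

    // Hypothetical helper: milliseconds elapsed since startTime.
    static long elapsedMillis(long startTime) {
        return (System.nanoTime() - startTime) / 1_000_000;
    }
}
```

System.nanoTime is used rather than System.currentTimeMillis because wall-clock adjustments cannot make the measured interval jump or go negative.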

getAngle
static int getAngle(TextPosition text)
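getAngle(TextPosition text) returns an int, which suggests a text rotation rounded to whole degrees. One common way to obtain such an angle uses the first two entries a and b of a PDF transformation matrix [a b c d e f]: the transformed x axis, which is the text baseline direction, points along (a, b), so its angle is atan2(b, a). The stand-alone helper below takes those two numbers directly. Whether PDFBox's getAngle uses exactly this formula, and which matrix it reads from the TextPosition, is an assumption.

```java
// Hypothetical helper: derives a text rotation angle in whole degrees from
// the entries a and b of a PDF-style matrix [a b c d e f].
public class AngleSketch {
    static int angleDegrees(double a, double b) {
        // atan2(b, a) is the direction of the transformed x axis (the text
        // baseline); uniform scaling does not change it. Result is in
        // (-180, 180], rounded to an int like getAngle's return type.
        return (int) Math.round(Math.toDegrees(Math.atan2(b, a)));
    }
}
```

For example, an unrotated 12-point text matrix (a = 12, b = 0) gives 0, while a matrix rotated a quarter turn counter-clockwise (a = 0, b = 1) gives 90.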