Package org.apache.pdfbox.tools
Class ExtractText
- java.lang.Object
-
- org.apache.pdfbox.tools.ExtractText
-
public final class ExtractText extends java.lang.Object
This is the main program that parses a PDF document and transforms it into text.
-
-
Field Summary
Fields
Modifier and Type   Field   Description
private static java.lang.String ALWAYSNEXT
private static java.lang.String CONSOLE
private static java.lang.String DEBUG
private boolean debugOutput
private static java.lang.String ENCODING
private static java.lang.String END_PAGE
private static java.lang.String HTML
private static java.lang.String IGNORE_BEADS
private static org.apache.commons.logging.Log LOG
private static java.lang.String PASSWORD
private static java.lang.String ROTATION_MAGIC
private static java.lang.String SORT
private static java.lang.String START_PAGE
private static java.lang.String STD_ENCODING
-
Constructor Summary
Constructors
Modifier   Constructor   Description
private ExtractText() - private constructor.
-
Method Summary
Methods
Modifier and Type   Method   Description
private void extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, java.io.Writer output, boolean rotationMagic, boolean alwaysNext)
(package private) static int getAngle(TextPosition text)
static void main(java.lang.String[] args) - Infamous main method.
void startExtraction(java.lang.String[] args) - Starts the text extraction.
private long startProcessing(java.lang.String message)
private void stopProcessing(java.lang.String message, long startTime)
private static void usage() - This will print the usage requirements and exit.
-
-
-
Field Detail
-
LOG
private static final org.apache.commons.logging.Log LOG
-
PASSWORD
private static final java.lang.String PASSWORD
- See Also:
- Constant Field Values
-
ENCODING
private static final java.lang.String ENCODING
- See Also:
- Constant Field Values
-
CONSOLE
private static final java.lang.String CONSOLE
- See Also:
- Constant Field Values
-
START_PAGE
private static final java.lang.String START_PAGE
- See Also:
- Constant Field Values
-
END_PAGE
private static final java.lang.String END_PAGE
- See Also:
- Constant Field Values
-
SORT
private static final java.lang.String SORT
- See Also:
- Constant Field Values
-
IGNORE_BEADS
private static final java.lang.String IGNORE_BEADS
- See Also:
- Constant Field Values
-
DEBUG
private static final java.lang.String DEBUG
- See Also:
- Constant Field Values
-
HTML
private static final java.lang.String HTML
- See Also:
- Constant Field Values
-
ALWAYSNEXT
private static final java.lang.String ALWAYSNEXT
- See Also:
- Constant Field Values
-
ROTATION_MAGIC
private static final java.lang.String ROTATION_MAGIC
- See Also:
- Constant Field Values
-
STD_ENCODING
private static final java.lang.String STD_ENCODING
- See Also:
- Constant Field Values
-
debugOutput
private boolean debugOutput
-
-
Method Detail
-
main
public static void main(java.lang.String[] args) throws java.io.IOException
Infamous main method.
- Parameters:
args - Command-line arguments; there should be one, a reference to a file.
- Throws:
java.io.IOException - if there is an error reading the document or extracting the text.
-
startExtraction
public void startExtraction(java.lang.String[] args) throws java.io.IOException
Starts the text extraction.
- Parameters:
args - the command-line arguments.
- Throws:
java.io.IOException - if there is an error reading the document or extracting the text.
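The option constants listed under Field Summary hint at how the argument array is consumed. The following is only a sketch, not the PDFBox source: it assumes the constants hold the flag strings used by the PDFBox 2.x command-line tool (-password, -encoding, -startPage, -endPage, -sort, -console) and that unrecognized arguments are file names. The class ExtractTextArgsSketch, its fields, and its defaults are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified option holder; not part of PDFBox. */
final class ExtractTextArgsSketch {
    String password = "";
    String encoding = "UTF-8";            // assumed default (cf. STD_ENCODING)
    int startPage = 1;                    // assumed default: first page
    int endPage = Integer.MAX_VALUE;      // assumed default: through the last page
    boolean sort = false;
    boolean toConsole = false;
    // Positional arguments: input file first, then an optional output file.
    final List<String> files = new ArrayList<>();

    static ExtractTextArgsSketch parse(String[] args) {
        ExtractTextArgsSketch o = new ExtractTextArgsSketch();
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-password":  o.password = value(args, ++i); break;
                case "-encoding":  o.encoding = value(args, ++i); break;
                case "-startPage": o.startPage = Integer.parseInt(value(args, ++i)); break;
                case "-endPage":   o.endPage = Integer.parseInt(value(args, ++i)); break;
                case "-sort":      o.sort = true; break;
                case "-console":   o.toConsole = true; break;
                default:           o.files.add(args[i]);
            }
        }
        // The real tool prints its usage text and exits; this sketch throws instead.
        if (o.files.isEmpty()) {
            throw new IllegalArgumentException("no input file given");
        }
        return o;
    }

    /** Returns the value following an option, or fails if the array ends early. */
    private static String value(String[] args, int i) {
        if (i >= args.length) {
            throw new IllegalArgumentException("missing value for " + args[i - 1]);
        }
        return args[i];
    }
}
```

With these assumptions, parsing {"-startPage", "2", "-endPage", "5", "in.pdf"} yields a page range of 2 to 5 and a single input file. Options not shown here (such as the HTML or debug switches) would follow the same pattern.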
-
extractPages
private void extractPages(int startPage, int endPage, PDFTextStripper stripper, PDDocument document, java.io.Writer output, boolean rotationMagic, boolean alwaysNext) throws java.io.IOException
- Throws:
java.io.IOException
-
startProcessing
private long startProcessing(java.lang.String message)
-
stopProcessing
private void stopProcessing(java.lang.String message, long startTime)
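This page documents neither timing helper beyond its signature. Judging from the signatures and the debugOutput field, the pair plausibly brackets a processing step and reports the elapsed time when debug output is enabled. The sketch below illustrates that pattern under those assumptions; TimingSketch is a hypothetical class, and its stopProcessing returns the elapsed seconds (the documented method returns void) so the result can be inspected.

```java
/** Hypothetical stand-in for the two private timing helpers; not the PDFBox source. */
final class TimingSketch {
    private final boolean debugOutput;

    TimingSketch(boolean debugOutput) {
        this.debugOutput = debugOutput;
    }

    /** Optionally logs the message and returns the start timestamp in milliseconds. */
    long startProcessing(String message) {
        if (debugOutput) {
            System.err.println(message);
        }
        return System.currentTimeMillis();
    }

    /** Optionally logs the message with the elapsed time; returns elapsed seconds. */
    float stopProcessing(String message, long startTime) {
        float elapsedSeconds = (System.currentTimeMillis() - startTime) / 1000f;
        if (debugOutput) {
            System.err.println(message + elapsedSeconds + " seconds");
        }
        return elapsedSeconds;
    }
}
```

A caller would store the value returned by startProcessing and pass it back to stopProcessing once the step has finished.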
-
getAngle
static int getAngle(TextPosition text)
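Only the signature of getAngle appears on this page. One common way to obtain a glyph's rotation in whole degrees is to apply atan2 to two components of its 2D text matrix; whether getAngle does exactly this is an assumption. In the sketch below, TextAngleSketch and angleDegrees are hypothetical names, and the matrix components are passed as plain numbers instead of being read from a TextPosition.

```java
/** Hypothetical illustration of deriving a rotation angle from matrix components. */
final class TextAngleSketch {
    /**
     * For a pure rotation by theta, the matrix [scaleX shearY; shearX scaleY]
     * equals [cos sin; -sin cos], so atan2(shearY, scaleY) recovers theta.
     * The result is rounded to the nearest whole degree.
     */
    static int angleDegrees(double shearY, double scaleY) {
        return (int) Math.round(Math.toDegrees(Math.atan2(shearY, scaleY)));
    }
}
```

Under this convention, upright text (shearY = 0, scaleY = 1) gives 0, and text rotated a quarter turn (shearY = 1, scaleY = 0) gives 90.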
-
usage
private static void usage()
This will print the usage requirements and exit.
-
-