Package org.apache.pdfbox.multipdf
Class Splitter
- java.lang.Object
-
- org.apache.pdfbox.multipdf.Splitter
-
public class Splitter extends java.lang.Object
Split a document into several other documents.
-
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
private class Splitter.KCloner - Class to help clone the /K tree.
-
Field Summary
Fields
Modifier and Type, Field, Description
private java.util.Map<COSDictionary,COSDictionary> annotDictMap
private java.util.List<java.util.Map<COSDictionary,COSDictionary>> annotDictMaps
private PDDocument currentDestinationDocument
private int currentPageNumber
private java.util.List<PDDocument> destinationDocuments
private java.util.Map<PDPageDestination,PDPage> destToFixMap
private int endPage
private java.util.Set<java.lang.String> idSet
private static org.apache.commons.logging.Log LOG
private java.util.Map<COSDictionary,COSDictionary> pageDictMap
private java.util.List<java.util.Map<COSDictionary,COSDictionary>> pageDictMaps
private java.util.Set<COSName> roleSet
private PDDocument sourceDocument
private int splitLength
private int startPage
private RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction
private java.util.Map<COSDictionary,COSDictionary> structDictMap
-
Constructor Summary
Constructors
Constructor, Description
Splitter()
-
Method Summary
Methods
Modifier and Type, Method, Description
private void cloneIDTree(PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
private void cloneRoleMap(PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
private void cloneStructureTree(PDDocument destinationDocument) - Clone the structure tree from the source to the current destination document.
private void cloneTreeElement(java.util.Map<java.lang.Integer,COSObjectable> srcNumberTreeAsMap, java.util.Map<java.lang.Integer,COSObjectable> dstNumberTreeAsMap, int sp)
protected PDDocument createNewDocument() - Create a new document to write the split contents to.
private void createNewDocumentIfNecessary() - Helper method for creating new documents at the appropriate pages.
private void fixDestinations(PDDocument destinationDocument) - Replace the page destinations, if the source and destination pages are in the target document.
protected PDDocument getDestinationDocument() - The current destination PDF document.
protected PDDocument getSourceDocument() - The source PDF document.
RandomAccessStreamCache.StreamCacheCreateFunction getStreamCacheCreateFunction()
private void processAnnotations(PDPage imported) - Clone all annotations because of changes possibly made, and because the structure tree is cloned.
protected void processPage(PDPage page) - Interface to start processing a new page.
private void processPages() - Interface method to handle the start of the page processing.
private void processResources(PDResources res, java.util.Map<java.lang.Integer,COSObjectable> srcNumberTreeAsMap, java.util.Map<java.lang.Integer,COSObjectable> dstNumberTreeAsMap, java.util.Set<COSDictionary> visited)
void setEndPage(int end) - This will set the end page.
void setSplitAtPage(int split) - This will tell the splitting algorithm where to split the pages.
void setStartPage(int start) - This will set the start page.
void setStreamCacheCreateFunction(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) - Set the current function to be used to create an instance of stream cache.
java.util.List<PDDocument> split(PDDocument document) - This will take a document and split it into several other documents.
protected boolean splitAtPage(int pageNumber) - Check if it is necessary to create a new document.
-
-
-
Field Detail
-
LOG
private static final org.apache.commons.logging.Log LOG
-
sourceDocument
private PDDocument sourceDocument
-
currentDestinationDocument
private PDDocument currentDestinationDocument
-
splitLength
private int splitLength
-
startPage
private int startPage
-
endPage
private int endPage
-
destinationDocuments
private java.util.List<PDDocument> destinationDocuments
-
pageDictMap
private java.util.Map<COSDictionary,COSDictionary> pageDictMap
-
pageDictMaps
private java.util.List<java.util.Map<COSDictionary,COSDictionary>> pageDictMaps
-
structDictMap
private java.util.Map<COSDictionary,COSDictionary> structDictMap
-
annotDictMaps
private java.util.List<java.util.Map<COSDictionary,COSDictionary>> annotDictMaps
-
annotDictMap
private java.util.Map<COSDictionary,COSDictionary> annotDictMap
-
destToFixMap
private java.util.Map<PDPageDestination,PDPage> destToFixMap
-
idSet
private java.util.Set<java.lang.String> idSet
-
roleSet
private java.util.Set<COSName> roleSet
-
currentPageNumber
private int currentPageNumber
-
streamCacheCreateFunction
private RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction
-
-
Method Detail
-
getStreamCacheCreateFunction
public RandomAccessStreamCache.StreamCacheCreateFunction getStreamCacheCreateFunction()
- Returns:
- the current function to be used to create an instance of stream cache.
-
setStreamCacheCreateFunction
public void setStreamCacheCreateFunction(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction)
Set the current function to be used to create an instance of stream cache.
- Parameters:
streamCacheCreateFunction- the current function to be used to create an instance of stream cache.
-
split
public java.util.List<PDDocument> split(PDDocument document) throws java.io.IOException
This will take a document and split it into several other documents.
- Parameters:
document- The document to split.
- Returns:
- A list of all the split documents. These should all be saved before closing any documents, including the source document. Any further operations should be made after reloading them, to avoid problems due to resource sharing. For the same reason, they should not be saved with encryption.
- Throws:
java.io.IOException- If an I/O error occurs.
-
fixDestinations
private void fixDestinations(PDDocument destinationDocument)
Replace the page destinations, if the source and destination pages are in the target document. This must be called after all pages (and their annotations) are processed.
- Parameters:
destinationDocument-
-
cloneStructureTree
private void cloneStructureTree(PDDocument destinationDocument) throws java.io.IOException
Clone the structure tree from the source to the current destination document. This must be called after all pages are processed.
- Parameters:
destinationDocument-
- Throws:
java.io.IOException
-
cloneIDTree
private void cloneIDTree(PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree) throws java.io.IOException
- Throws:
java.io.IOException
-
cloneRoleMap
private void cloneRoleMap(PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
-
cloneTreeElement
private void cloneTreeElement(java.util.Map<java.lang.Integer,COSObjectable> srcNumberTreeAsMap, java.util.Map<java.lang.Integer,COSObjectable> dstNumberTreeAsMap, int sp)
-
processResources
private void processResources(PDResources res, java.util.Map<java.lang.Integer,COSObjectable> srcNumberTreeAsMap, java.util.Map<java.lang.Integer,COSObjectable> dstNumberTreeAsMap, java.util.Set<COSDictionary> visited) throws java.io.IOException
- Throws:
java.io.IOException
-
setSplitAtPage
public void setSplitAtPage(int split)
This will tell the splitting algorithm where to split the pages. The default is 1, so every page becomes a new document. With a value of 2, each document contains 2 pages. For example, a 5-page source document is split into 3 new documents: 2 documents containing 2 pages each and 1 document containing 1 page.
- Parameters:
split- The number of pages each split document should contain.
- Throws:
java.lang.IllegalArgumentException- if the number of pages is smaller than one.
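The page-count arithmetic described for setSplitAtPage can be checked with a small stand-alone sketch. It does not call PDFBox; the class name SplitSizes and the method chunkSizes are hypothetical names introduced only for illustration, and the code models the documented behaviour rather than the library's internals.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of the documented page-count arithmetic (no PDFBox calls).
final class SplitSizes {
    private SplitSizes() {
    }

    // Returns the page count of each output document when totalPages pages
    // are cut into consecutive chunks of at most splitLength pages.
    static List<Integer> chunkSizes(int totalPages, int splitLength) {
        if (splitLength < 1) {
            // Mirrors the documented contract: values below one are rejected.
            throw new IllegalArgumentException("Number of pages is smaller than one");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = totalPages; remaining > 0; remaining -= splitLength) {
            // The last chunk may be shorter than splitLength.
            sizes.add(Math.min(splitLength, remaining));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(chunkSizes(5, 2)); // prints [2, 2, 1]
    }
}
```

With 5 pages and a split value of 2 the sketch yields three chunks, matching the example in the description.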
-
setStartPage
public void setStartPage(int start)
This will set the start page.
- Parameters:
start- the 1-based start page
- Throws:
java.lang.IllegalArgumentException- if the start page is smaller than one.
-
setEndPage
public void setEndPage(int end)
This will set the end page.
- Parameters:
end- the 1-based end page
- Throws:
java.lang.IllegalArgumentException- if the end page is smaller than one or smaller than the start page.
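The 1-based start and end page settings can be modelled as below. This is a minimal sketch that follows the wording of the descriptions, not the library source: the class PageRange, its default values, and the assumption that the range is inclusive at both ends are all illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone model of a 1-based page range (no PDFBox calls).
final class PageRange {
    private int startPage = 1;               // assumed default: first page
    private int endPage = Integer.MAX_VALUE; // assumed default: no upper bound

    void setStartPage(int start) {
        if (start < 1) {
            throw new IllegalArgumentException("Start page is smaller than one");
        }
        startPage = start;
    }

    void setEndPage(int end) {
        // Both conditions come from the documented Throws clause.
        if (end < 1 || end < startPage) {
            throw new IllegalArgumentException("End page is smaller than one or than the start page");
        }
        endPage = end;
    }

    // Returns the 1-based page numbers of a totalPages-page document that
    // fall inside the range, treating both bounds as inclusive (assumption).
    List<Integer> selectedPages(int totalPages) {
        List<Integer> pages = new ArrayList<>();
        for (int page = 1; page <= totalPages; page++) {
            if (page >= startPage && page <= endPage) {
                pages.add(page);
            }
        }
        return pages;
    }
}
```

In this model, setting the start page to 2 and the end page to 4 selects pages 2, 3 and 4 of a six-page document.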
-
processPages
private void processPages() throws java.io.IOException
Interface method to handle the start of the page processing.
- Throws:
java.io.IOException- If an IO error occurs.
-
createNewDocumentIfNecessary
private void createNewDocumentIfNecessary() throws java.io.IOException
Helper method for creating new documents at the appropriate pages.
- Throws:
java.io.IOException- If there is an error creating the new document.
-
splitAtPage
protected boolean splitAtPage(int pageNumber)
Check if it is necessary to create a new document. By default a split occurs at every page. To split based on more complex logic, override this method. For example:
    protected boolean splitAtPage(int pageNumber) {
        // will split at pages with prime numbers only
        return isPrime(pageNumber);
    }
- Parameters:
pageNumber- the 0-based page number to be checked as splitting page
- Returns:
- true if a new document should be created.
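The effect of a custom split predicate can be explored without PDFBox. The sketch below is a stand-alone model: SplitPredicateDemo, isPrime and group are hypothetical names, and the grouping rule (a new group starts when none exists yet or when the predicate returns true for the 0-based page number) is an assumption about how such a hook would be consumed, not a statement about the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Stand-alone model of predicate-driven splitting (no PDFBox calls).
final class SplitPredicateDemo {
    private SplitPredicateDemo() {
    }

    // Trial-division primality test; sufficient for page numbers.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Groups the 0-based page numbers 0..totalPages-1. A new group is opened
    // when there is no group yet or when splitAtPage accepts the page number.
    static List<List<Integer>> group(int totalPages, IntPredicate splitAtPage) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = null;
        for (int page = 0; page < totalPages; page++) {
            if (current == null || splitAtPage.test(page)) {
                current = new ArrayList<>();
                groups.add(current);
            }
            current.add(page);
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [[0, 1], [2], [3, 4], [5, 6], [7]]
        System.out.println(group(8, SplitPredicateDemo::isPrime));
    }
}
```

Passing the predicate as an IntPredicate keeps the model independent of any class hierarchy; in a real subclass the same test would sit inside the overridden method.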
-
createNewDocument
protected PDDocument createNewDocument() throws java.io.IOException
Create a new document to write the split contents to.
- Returns:
- the newly created PDDocument.
- Throws:
java.io.IOException- If there is a problem creating the new document.
-
processPage
protected void processPage(PDPage page) throws java.io.IOException
Interface to start processing a new page.
- Parameters:
page- The page that is about to get processed.
- Throws:
java.io.IOException- If there is an error creating the new document.
-
processAnnotations
private void processAnnotations(PDPage imported) throws java.io.IOException
Clone all annotations because of changes possibly made, and because the structure tree is cloned.
- Parameters:
imported-
- Throws:
java.io.IOException
-
getSourceDocument
protected final PDDocument getSourceDocument()
The source PDF document.
- Returns:
- the PDF to be split
-
getDestinationDocument
protected final PDDocument getDestinationDocument()
The current destination PDF document.
- Returns:
- the current destination PDF
-
-