Package org.apache.pdfbox.multipdf
Class PDFMergerUtility
- java.lang.Object
-
- org.apache.pdfbox.multipdf.PDFMergerUtility
-
public class PDFMergerUtility extends java.lang.Object
This class takes a list of PDF documents and merges them, saving the result in a new document.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description)
static class PDFMergerUtility.AcroFormMergeMode
The mode to use when merging AcroForm between documents: PDFMergerUtility.AcroFormMergeMode.JOIN_FORM_FIELDS_MODE - fields with the same fully qualified name will be merged into one, with the widget annotations of the merged fields becoming part of the same field.
static class PDFMergerUtility.DocumentMergeMode
The mode to use when merging documents: PDFMergerUtility.DocumentMergeMode.OPTIMIZE_RESOURCES_MODE - optimizes resource handling such as closing documents early.
-
Field Summary
Fields (Modifier and Type, Field, Description)
private PDFMergerUtility.AcroFormMergeMode acroFormMergeMode
private PDDocumentInformation destinationDocumentInformation
private java.lang.String destinationFileName
private PDMetadata destinationMetadata
private java.io.OutputStream destinationStream
private PDFMergerUtility.DocumentMergeMode documentMergeMode
private boolean ignoreAcroFormErrors
private static org.apache.commons.logging.Log LOG - Log instance.
private int nextFieldNum
private java.util.List<java.lang.Object> sources
-
Constructor Summary
Constructors (Constructor, Description)
PDFMergerUtility() - Instantiate a new PDFMergerUtility.
-
Method Summary
Methods (Modifier and Type, Method, Description)
private void acroFormJoinFieldsMode(PDFCloneUtility cloner, PDAcroForm destAcroForm, PDAcroForm srcAcroForm)
private void acroFormLegacyMode(PDFCloneUtility cloner, PDAcroForm destAcroForm, PDAcroForm srcAcroForm)
void addSource(java.io.File source) - Add a source file to the list of files to merge.
void addSource(java.lang.String source) - Add a source file to the list of files to merge.
void addSource(RandomAccessRead source) - Add a source to the list of documents to merge.
void addSources(java.util.List<RandomAccessRead> sourcesList) - Add a list of sources to the list of documents to merge.
void appendDocument(PDDocument destination, PDDocument source) - Append all pages from source to destination.
PDFMergerUtility.AcroFormMergeMode getAcroFormMergeMode() - Get the merge mode to be used for merging AcroForms between documents; see PDFMergerUtility.AcroFormMergeMode.
PDDocumentInformation getDestinationDocumentInformation() - Get the destination document information that is to be set in mergeDocuments(org.apache.pdfbox.io.RandomAccessStreamCache.StreamCacheCreateFunction).
java.lang.String getDestinationFileName() - Get the name of the destination file.
PDMetadata getDestinationMetadata() - Get the destination metadata that is to be set in mergeDocuments(org.apache.pdfbox.io.RandomAccessStreamCache.StreamCacheCreateFunction).
java.io.OutputStream getDestinationStream() - Get the destination OutputStream.
PDFMergerUtility.DocumentMergeMode getDocumentMergeMode() - Get the merge mode to be used for merging documents; see PDFMergerUtility.DocumentMergeMode.
(package private) static java.util.Map<java.lang.String,PDStructureElement> getIDTreeAsMap(PDNameTreeNode<PDStructureElement> idTree)
(package private) static java.util.Map<java.lang.Integer,COSObjectable> getNumberTreeAsMap(PDNumberTreeNode tree)
private boolean hasOnlyDocumentsOrParts(COSArray kLevelOneArray)
private boolean isDynamicXfa(PDAcroForm acroForm) - Test for dynamic XFA content.
boolean isIgnoreAcroFormErrors() - Indicates if acroform errors are ignored or not.
private void legacyMergeDocuments(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, CompressParameters compressParameters) - Merge the list of source documents, saving the result in the destination file.
private void mergeAcroForm(PDFCloneUtility cloner, PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
void mergeDocuments(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) - Merge the list of source documents, saving the result in the destination file.
void mergeDocuments(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, CompressParameters compressParameters) - Merge the list of source documents, saving the result in the destination file.
private void mergeIDTree(PDFCloneUtility cloner, PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
private void mergeInto(COSDictionary src, COSDictionary dst, PDFCloneUtility cloner, java.util.Set<COSName> exclude) - This will add all of the dictionary's keys/values to this dictionary, but only if they are not in an exclusion list and if they don't already exist.
private void mergeKEntries(PDFCloneUtility cloner, PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree)
private void mergeLanguage(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
private void mergeMarkInfo(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
private void mergeOpenAction(PDDocumentCatalog srcCatalog, PDDocumentCatalog dstCatalog, PDFCloneUtility cloner)
private void mergeOutputIntents(PDDocumentCatalog srcCatalog, PDDocumentCatalog destCatalog, PDFCloneUtility cloner)
private void mergeRoleMap(PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree, PDFCloneUtility cloner)
private void mergeViewerPreferences(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog, PDFCloneUtility cloner)
private void optimizedMergeDocuments(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, CompressParameters compressParameters)
void setAcroFormMergeMode(PDFMergerUtility.AcroFormMergeMode theAcroFormMergeMode) - Set the merge mode to be used for merging AcroForms between documents; see PDFMergerUtility.AcroFormMergeMode.
void setDestinationDocumentInformation(PDDocumentInformation info) - Set the destination document information that is to be set in mergeDocuments(org.apache.pdfbox.io.RandomAccessStreamCache.StreamCacheCreateFunction).
void setDestinationFileName(java.lang.String destination) - Set the name of the destination file.
void setDestinationMetadata(PDMetadata meta) - Set the destination metadata that is to be set in mergeDocuments(org.apache.pdfbox.io.RandomAccessStreamCache.StreamCacheCreateFunction).
void setDestinationStream(java.io.OutputStream destStream) - Set the destination OutputStream.
void setDocumentMergeMode(PDFMergerUtility.DocumentMergeMode theDocumentMergeMode) - Set the merge mode to be used for merging documents; see PDFMergerUtility.DocumentMergeMode.
void setIgnoreAcroFormErrors(boolean ignoreAcroFormErrorsValue) - Set to true to ignore acroform errors.
private void updatePageReferences(PDFCloneUtility cloner, java.util.Map<java.lang.Integer,COSObjectable> numberTreeAsMap, java.util.Map<COSDictionary,COSDictionary> objMapping) - Update the Pg and Obj references to the new (merged) page.
private void updatePageReferences(PDFCloneUtility cloner, COSArray parentTreeEntry, java.util.Map<COSDictionary,COSDictionary> objMapping)
private void updatePageReferences(PDFCloneUtility cloner, COSDictionary parentTreeEntry, java.util.Map<COSDictionary,COSDictionary> objMapping) - Update the Pg and Obj references to the new (merged) page.
private void updateParentEntry(COSArray kArray, COSDictionary newParent, COSName newStructureType) - Update the P reference to the new parent dictionary.
private void updateStructParentEntries(PDPage page, int structParentOffset) - Update the StructParents and StructParent values in a PDPage.
-
-
-
Field Detail
-
LOG
private static final org.apache.commons.logging.Log LOG
Log instance.
-
sources
private final java.util.List<java.lang.Object> sources
-
destinationFileName
private java.lang.String destinationFileName
-
destinationStream
private java.io.OutputStream destinationStream
-
ignoreAcroFormErrors
private boolean ignoreAcroFormErrors
-
destinationDocumentInformation
private PDDocumentInformation destinationDocumentInformation
-
destinationMetadata
private PDMetadata destinationMetadata
-
documentMergeMode
private PDFMergerUtility.DocumentMergeMode documentMergeMode
-
acroFormMergeMode
private PDFMergerUtility.AcroFormMergeMode acroFormMergeMode
-
nextFieldNum
private int nextFieldNum
-
-
Method Detail
-
getAcroFormMergeMode
public PDFMergerUtility.AcroFormMergeMode getAcroFormMergeMode()
Get the merge mode to be used for merging AcroForms between documents; see PDFMergerUtility.AcroFormMergeMode.
- Returns:
- the current AcroFormMergeMode
-
setAcroFormMergeMode
public void setAcroFormMergeMode(PDFMergerUtility.AcroFormMergeMode theAcroFormMergeMode)
Set the merge mode to be used for merging AcroForms between documents; see PDFMergerUtility.AcroFormMergeMode.
- Parameters:
theAcroFormMergeMode - AcroFormMergeMode to be used
-
getDocumentMergeMode
public PDFMergerUtility.DocumentMergeMode getDocumentMergeMode()
Get the merge mode to be used for merging documents; see PDFMergerUtility.DocumentMergeMode.
- Returns:
- the current DocumentMergeMode
-
setDocumentMergeMode
public void setDocumentMergeMode(PDFMergerUtility.DocumentMergeMode theDocumentMergeMode)
Set the merge mode to be used for merging documents; see PDFMergerUtility.DocumentMergeMode.
- Parameters:
theDocumentMergeMode - DocumentMergeMode to be used
-
getDestinationFileName
public java.lang.String getDestinationFileName()
Get the name of the destination file.
- Returns:
- the name of the destination file
-
setDestinationFileName
public void setDestinationFileName(java.lang.String destination)
Set the name of the destination file.
- Parameters:
destination - the name of the destination file to set
-
getDestinationStream
public java.io.OutputStream getDestinationStream()
Get the destination OutputStream.
- Returns:
- the destination OutputStream
-
setDestinationStream
public void setDestinationStream(java.io.OutputStream destStream)
Set the destination OutputStream.
- Parameters:
destStream - the destination OutputStream to set
-
getDestinationDocumentInformation
public PDDocumentInformation getDestinationDocumentInformation()
Get the destination document information that is to be set in mergeDocuments(org.apache.pdfbox.io.RandomAccessStreamCache.StreamCacheCreateFunction). The default is null, which means that it is ignored.
- Returns:
- The destination document information.
-
setDestinationDocumentInformation
public void setDestinationDocumentInformation(PDDocumentInformation info)
Set the destination document information that is to be set in mergeDocuments(org.apache.pdfbox.io.RandomAccessStreamCache.StreamCacheCreateFunction). The default is null, which means that it is ignored.
- Parameters:
info - The destination document information.
-
getDestinationMetadata
public PDMetadata getDestinationMetadata()
Get the destination metadata that is to be set in mergeDocuments(org.apache.pdfbox.io.RandomAccessStreamCache.StreamCacheCreateFunction). The default is null, which means that it is ignored.
- Returns:
- The destination metadata.
-
setDestinationMetadata
public void setDestinationMetadata(PDMetadata meta)
Set the destination metadata that is to be set in mergeDocuments(org.apache.pdfbox.io.RandomAccessStreamCache.StreamCacheCreateFunction). The default is null, which means that it is ignored.
- Parameters:
meta - The destination metadata.
-
addSource
public void addSource(java.lang.String source) throws java.io.FileNotFoundException
Add a source file to the list of files to merge.
- Parameters:
source - Full path and file name of source document.
- Throws:
java.io.FileNotFoundException - If the file doesn't exist
-
addSource
public void addSource(java.io.File source) throws java.io.FileNotFoundException
Add a source file to the list of files to merge.
- Parameters:
source - File representing source document
- Throws:
java.io.FileNotFoundException - If the file doesn't exist
-
addSource
public void addSource(RandomAccessRead source)
Add a source to the list of documents to merge.
- Parameters:
source - RandomAccessRead representing source document. To pass an InputStream, wrap it into a RandomAccessReadBuffer.
-
addSources
public void addSources(java.util.List<RandomAccessRead> sourcesList)
Add a list of sources to the list of documents to merge.
- Parameters:
sourcesList - List of RandomAccessRead objects representing source documents
-
mergeDocuments
public void mergeDocuments(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction) throws java.io.IOException
Merge the list of source documents, saving the result in the destination file. The source list is not reset after the merge. To merge one document at a time, it is better to use appendDocument(org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.pdmodel.PDDocument).
- Parameters:
streamCacheCreateFunction - a function to create an instance of a stream cache; if null, unrestricted main memory is used
- Throws:
java.io.IOException - If there is an error saving the document.
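The workflow above can be sketched as follows, assuming PDFBox 3.x is on the classpath. The example builds two blank PDFs in memory, wraps each InputStream in a RandomAccessReadBuffer as suggested for addSource, and merges them into a destination OutputStream. The class MergeSketch and its helpers blankPdf, merge and pageCount are hypothetical names used only for illustration. IOUtils.createMemoryOnlyStreamCache() is assumed here to be the PDFBox 3.x factory for a memory-only stream cache; according to the parameter description, passing null should also result in main memory being used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.io.IOUtils;
import org.apache.pdfbox.io.RandomAccessReadBuffer;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

final class MergeSketch {

    // Hypothetical helper: serialize a PDF with the given number of blank pages.
    static byte[] blankPdf(int pages) throws IOException {
        try (PDDocument doc = new PDDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            for (int i = 0; i < pages; i++) {
                doc.addPage(new PDPage());
            }
            doc.save(out);
            return out.toByteArray();
        }
    }

    // Merge two serialized PDFs and return the merged bytes.
    static byte[] merge(byte[] first, byte[] second) throws IOException {
        ByteArrayOutputStream merged = new ByteArrayOutputStream();
        PDFMergerUtility merger = new PDFMergerUtility();
        try (InputStream in1 = new ByteArrayInputStream(first);
             InputStream in2 = new ByteArrayInputStream(second)) {
            // An InputStream is not accepted directly, so wrap it first.
            merger.addSource(new RandomAccessReadBuffer(in1));
            merger.addSource(new RandomAccessReadBuffer(in2));
            merger.setDestinationStream(merged);
            merger.mergeDocuments(IOUtils.createMemoryOnlyStreamCache());
        }
        return merged.toByteArray();
    }

    // Count the pages of a serialized PDF.
    static int pageCount(byte[] pdf) throws IOException {
        try (PDDocument doc = Loader.loadPDF(pdf)) {
            return doc.getNumberOfPages();
        }
    }
}
```

To write to a file instead, setDestinationFileName can be used in place of setDestinationStream.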
-
mergeDocuments
public void mergeDocuments(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, CompressParameters compressParameters) throws java.io.IOException
Merge the list of source documents, saving the result in the destination file. The source list is not reset after the merge. To merge one document at a time, it is better to use appendDocument(org.apache.pdfbox.pdmodel.PDDocument, org.apache.pdfbox.pdmodel.PDDocument).
- Parameters:
streamCacheCreateFunction - a function to create an instance of a stream cache; if null, unrestricted main memory is used
compressParameters - defines if compressed object streams are enabled
- Throws:
java.io.IOException - If there is an error saving the document.
-
optimizedMergeDocuments
private void optimizedMergeDocuments(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, CompressParameters compressParameters) throws java.io.IOException
- Throws:
java.io.IOException
-
legacyMergeDocuments
private void legacyMergeDocuments(RandomAccessStreamCache.StreamCacheCreateFunction streamCacheCreateFunction, CompressParameters compressParameters) throws java.io.IOException
Merge the list of source documents, saving the result in the destination file.
- Parameters:
streamCacheCreateFunction - a function to create an instance of a stream cache; if null, unrestricted main memory is used
- Throws:
java.io.IOException - If there is an error saving the document.
-
appendDocument
public void appendDocument(PDDocument destination, PDDocument source) throws java.io.IOException
Append all pages from source to destination.
- Parameters:
destination - the document to receive the pages
source - the document originating the new pages
- Throws:
java.io.IOException - If there is an error accessing data from either document.
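A minimal sketch of appending one document to another, assuming PDFBox 3.x is on the classpath. AppendSketch, blankDocument and appendAndCount are hypothetical names for illustration. Both documents are kept open until the destination has been inspected, on the assumption that appended pages may still refer to data held by the source document.

```java
import java.io.IOException;

import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

final class AppendSketch {

    // Hypothetical helper: create an in-memory document with blank pages.
    static PDDocument blankDocument(int pages) {
        PDDocument doc = new PDDocument();
        for (int i = 0; i < pages; i++) {
            doc.addPage(new PDPage());
        }
        return doc;
    }

    // Append all pages of a source document and report the resulting page count.
    static int appendAndCount(int destinationPages, int sourcePages) throws IOException {
        try (PDDocument destination = blankDocument(destinationPages);
             PDDocument source = blankDocument(sourcePages)) {
            PDFMergerUtility merger = new PDFMergerUtility();
            merger.appendDocument(destination, source);
            // A real caller would typically save the destination here,
            // before either document is closed.
            return destination.getNumberOfPages();
        }
    }
}
```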
-
mergeOpenAction
private void mergeOpenAction(PDDocumentCatalog srcCatalog, PDDocumentCatalog dstCatalog, PDFCloneUtility cloner) throws java.io.IOException
- Throws:
java.io.IOException
-
mergeViewerPreferences
private void mergeViewerPreferences(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog, PDFCloneUtility cloner) throws java.io.IOException
- Throws:
java.io.IOException
-
mergeLanguage
private void mergeLanguage(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
-
mergeMarkInfo
private void mergeMarkInfo(PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog)
-
mergeKEntries
private void mergeKEntries(PDFCloneUtility cloner, PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree) throws java.io.IOException
- Throws:
java.io.IOException
-
hasOnlyDocumentsOrParts
private boolean hasOnlyDocumentsOrParts(COSArray kLevelOneArray)
-
updateParentEntry
private void updateParentEntry(COSArray kArray, COSDictionary newParent, COSName newStructureType)
Update the P reference to the new parent dictionary.
- Parameters:
kArray - the kids array
newParent - the new parent
newStructureType - the new structure type in /S or null so it doesn't get replaced
-
mergeIDTree
private void mergeIDTree(PDFCloneUtility cloner, PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree) throws java.io.IOException
- Throws:
java.io.IOException
-
getIDTreeAsMap
static java.util.Map<java.lang.String,PDStructureElement> getIDTreeAsMap(PDNameTreeNode<PDStructureElement> idTree) throws java.io.IOException
- Throws:
java.io.IOException
-
getNumberTreeAsMap
static java.util.Map<java.lang.Integer,COSObjectable> getNumberTreeAsMap(PDNumberTreeNode tree) throws java.io.IOException
- Throws:
java.io.IOException
-
mergeRoleMap
private void mergeRoleMap(PDStructureTreeRoot srcStructTree, PDStructureTreeRoot destStructTree, PDFCloneUtility cloner) throws java.io.IOException
- Throws:
java.io.IOException
-
mergeAcroForm
private void mergeAcroForm(PDFCloneUtility cloner, PDDocumentCatalog destCatalog, PDDocumentCatalog srcCatalog) throws java.io.IOException
- Throws:
java.io.IOException
-
acroFormJoinFieldsMode
private void acroFormJoinFieldsMode(PDFCloneUtility cloner, PDAcroForm destAcroForm, PDAcroForm srcAcroForm) throws java.io.IOException
- Throws:
java.io.IOException
-
acroFormLegacyMode
private void acroFormLegacyMode(PDFCloneUtility cloner, PDAcroForm destAcroForm, PDAcroForm srcAcroForm) throws java.io.IOException
- Throws:
java.io.IOException
-
mergeOutputIntents
private void mergeOutputIntents(PDDocumentCatalog srcCatalog, PDDocumentCatalog destCatalog, PDFCloneUtility cloner) throws java.io.IOException
- Throws:
java.io.IOException
-
isIgnoreAcroFormErrors
public boolean isIgnoreAcroFormErrors()
Indicates if acroform errors are ignored or not.
- Returns:
- true if acroform errors are ignored
-
setIgnoreAcroFormErrors
public void setIgnoreAcroFormErrors(boolean ignoreAcroFormErrorsValue)
Set to true to ignore acroform errors.
- Parameters:
ignoreAcroFormErrorsValue - true if acroform errors should be ignored
-
updatePageReferences
private void updatePageReferences(PDFCloneUtility cloner, java.util.Map<java.lang.Integer,COSObjectable> numberTreeAsMap, java.util.Map<COSDictionary,COSDictionary> objMapping) throws java.io.IOException
Update the Pg and Obj references to the new (merged) page.
- Throws:
java.io.IOException
-
updatePageReferences
private void updatePageReferences(PDFCloneUtility cloner, COSDictionary parentTreeEntry, java.util.Map<COSDictionary,COSDictionary> objMapping) throws java.io.IOException
Update the Pg and Obj references to the new (merged) page.
- Parameters:
parentTreeEntry -
objMapping - mapping between old and new references
- Throws:
java.io.IOException
-
updatePageReferences
private void updatePageReferences(PDFCloneUtility cloner, COSArray parentTreeEntry, java.util.Map<COSDictionary,COSDictionary> objMapping) throws java.io.IOException
- Throws:
java.io.IOException
-
updateStructParentEntries
private void updateStructParentEntries(PDPage page, int structParentOffset) throws java.io.IOException
Update the StructParents and StructParent values in a PDPage.
- Parameters:
page - the new page
structParentOffset - the offset which should be applied
- Throws:
java.io.IOException
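The offset idea can be illustrated without PDFBox. In a tagged PDF, StructParents (on a page) and StructParent (on an annotation) are integer keys into the parent tree of the structure tree, so keys coming from an appended document need to be shifted to avoid colliding with keys already used in the destination. The following is a sketch of that idea only, not PDFBox's implementation; StructParentOffsetSketch, shift and shiftAll are hypothetical names, and treating a negative value as "no entry set" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

final class StructParentOffsetSketch {

    // Shift a single key by the offset.
    // Assumption: a negative key means "not set" and is left unchanged.
    static int shift(int key, int offset) {
        return key >= 0 ? key + offset : key;
    }

    // Shift every key in a list, for example the page key followed by
    // the keys of the annotations on that page.
    static List<Integer> shiftAll(List<Integer> keys, int offset) {
        List<Integer> shifted = new ArrayList<>(keys.size());
        for (int key : keys) {
            shifted.add(shift(key, offset));
        }
        return shifted;
    }
}
```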
-
isDynamicXfa
private boolean isDynamicXfa(PDAcroForm acroForm)
Test for dynamic XFA content.
- Parameters:
acroForm - the AcroForm
- Returns:
- true if there is a dynamic XFA form.
-
mergeInto
private void mergeInto(COSDictionary src, COSDictionary dst, PDFCloneUtility cloner, java.util.Set<COSName> exclude) throws java.io.IOException
This will add all of the source dictionary's keys/values to the destination dictionary, but only if they are not in the exclusion list and do not already exist there. If a key already exists in the destination dictionary, nothing is changed for that key.
- Parameters:
src - The source dictionary to get the keys/values from.
dst - The destination dictionary to merge the keys/values into.
exclude - Names of keys that shall be skipped.
- Throws:
java.io.IOException
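The merge rule described for mergeInto can be sketched with plain java.util collections. This is an illustration of the semantics only: the real method works on COSDictionary objects and uses a PDFCloneUtility, which this sketch leaves out. MergeIntoSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;

final class MergeIntoSketch {

    // Copy entries from src into dst unless the key is excluded
    // or dst already has a value for it.
    static <K, V> void mergeInto(Map<K, V> src, Map<K, V> dst, Set<K> exclude) {
        for (Map.Entry<K, V> entry : src.entrySet()) {
            K key = entry.getKey();
            if (!exclude.contains(key) && !dst.containsKey(key)) {
                dst.put(key, entry.getValue());
            }
        }
    }
}
```

With a source of {Title, Author, Producer}, a destination that already holds Title, and Producer excluded, only Author is copied and the existing Title is kept.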
-
-