Package org.apache.lucene.util
Class OfflineSorter
- java.lang.Object
  - org.apache.lucene.util.OfflineSorter

public class OfflineSorter extends java.lang.Object

On-disk sorting of byte arrays. Each byte array (entry) is composed of the following fields:
- (two bytes) length of the following byte array,
- exactly that many bytes for the sequence to be sorted.
See Also:
- sort(String)
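
The two-field entry layout in the class overview can be illustrated with plain java.io streams. This is a minimal sketch, not Lucene's ByteSequencesWriter or ByteSequencesReader: the class EntryFormatSketch and its methods are hypothetical, the length is treated as an unsigned 16-bit value, and DataOutputStream writes it high byte first, which is not necessarily the byte order Lucene uses on disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not Lucene's API: length-prefixed byte[] entries. */
final class EntryFormatSketch {
  /** Largest payload length that fits in two unsigned bytes. */
  static final int MAX_ENTRY_LENGTH = 0xFFFF;

  /** Writes each entry as a two-byte length followed by exactly that many bytes. */
  static byte[] encode(List<byte[]> entries) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (byte[] entry : entries) {
        if (entry.length > MAX_ENTRY_LENGTH) {
          throw new IllegalArgumentException("entry too long: " + entry.length);
        }
        out.writeShort(entry.length); // low 16 bits, high byte first
        out.write(entry);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Reads entries back until the input is exhausted. */
  static List<byte[]> decode(byte[] data) {
    List<byte[]> entries = new ArrayList<>();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      while (in.available() > 0) {
        int length = in.readUnsignedShort(); // 0..65535
        byte[] entry = new byte[length];
        in.readFully(entry); // EOFException if the payload is truncated
        entries.add(entry);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return entries;
  }
}
```

Encoding the entries {1, 2, 3} and {} yields seven bytes: 00 03 01 02 03 00 00.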

Nested Class Summary
- static class OfflineSorter.BufferSize
  A more descriptive unit for specifying buffer sizes in constructors.
- static class OfflineSorter.ByteSequencesReader
  Utility class to read length-prefixed byte[] entries from an input.
- static class OfflineSorter.ByteSequencesWriter
  Utility class to emit length-prefixed byte[] entries to an output stream for sorting.
- (package private) static class OfflineSorter.FileAndTop
- private class OfflineSorter.MergePartitionsTask
  Merges multiple file-based partitions into a single on-disk partition.
- private static class OfflineSorter.Partition
  Holds one partition of items, either loaded into memory or based on a file.
- class OfflineSorter.SortInfo
  Sort information, mostly for debugging.
- private class OfflineSorter.SortPartitionTask
  Sorts one in-memory partition, writes it to disk, and returns the resulting file-based partition.
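
The Partition and SortPartitionTask descriptions imply a split, sort, and spill pipeline in which each in-memory partition is sorted independently, possibly on an ExecutorService. The sketch below shows only the sort step under that reading. PartitionSketch and SortPartitionTaskSketch are hypothetical names, partitions stay in memory instead of being written to temp files, and plain byte arrays stand in for BytesRef.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

/** Hypothetical partition: its entries plus whether they are already sorted. */
final class PartitionSketch {
  final List<byte[]> entries;
  final boolean sorted;

  PartitionSketch(List<byte[]> entries, boolean sorted) {
    this.entries = entries;
    this.sorted = sorted;
  }
}

/** Hypothetical task: sorts one in-memory partition and returns the sorted result. */
final class SortPartitionTaskSketch implements Callable<PartitionSketch> {
  private final PartitionSketch input;
  private final Comparator<byte[]> comparator;

  SortPartitionTaskSketch(PartitionSketch input, Comparator<byte[]> comparator) {
    this.input = input;
    this.comparator = comparator;
  }

  @Override
  public PartitionSketch call() {
    List<byte[]> copy = new ArrayList<>(input.entries);
    copy.sort(comparator);
    // A real task would write 'copy' to a temp file and return a file-backed partition.
    return new PartitionSketch(copy, true);
  }

  /** Submits one task per partition; futures are returned in submission order. */
  static List<Future<PartitionSketch>> sortAll(
      ExecutorService exec, List<PartitionSketch> partitions, Comparator<byte[]> comparator) {
    List<Future<PartitionSketch>> futures = new ArrayList<>();
    for (PartitionSketch partition : partitions) {
      futures.add(exec.submit(new SortPartitionTaskSketch(partition, comparator)));
    }
    return futures;
  }
}
```

Keeping the futures in submission order lets a later merge step consume partitions in a predictable order even though the sorts finish in any order.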

Field Summary
- static long ABSOLUTE_MIN_SORT_BUFFER_SIZE
  Absolute minimum required buffer size for sorting.
- private java.util.Comparator<BytesRef> comparator
- static java.util.Comparator<BytesRef> DEFAULT_COMPARATOR
  Default comparator: sorts in binary (codepoint) order.
- private Directory dir
- private java.util.concurrent.ExecutorService exec
- static long GB
  Convenience constant for gigabytes.
- static int MAX_TEMPFILES
  Maximum number of temporary files before doing an intermediate merge.
- private int maxTempFiles
- static long MB
  Convenience constant for megabytes.
- static long MIN_BUFFER_SIZE_MB
  Minimum recommended buffer size for sorting.
- private static java.lang.String MIN_BUFFER_SIZE_MSG
- private java.util.concurrent.Semaphore partitionsInRAM
- private OfflineSorter.BufferSize ramBufferSize
- (package private) OfflineSorter.SortInfo sortInfo
- private java.lang.String tempFileNamePrefix
- private int valueLength

Constructor Summary
- OfflineSorter(Directory dir, java.lang.String tempFileNamePrefix)
  Constructor using default settings.
- OfflineSorter(Directory dir, java.lang.String tempFileNamePrefix, java.util.Comparator<BytesRef> comparator)
  Constructor using default settings and a custom comparator.
- OfflineSorter(Directory dir, java.lang.String tempFileNamePrefix, java.util.Comparator<BytesRef> comparator, OfflineSorter.BufferSize ramBufferSize, int maxTempfiles, int valueLength, java.util.concurrent.ExecutorService exec, int maxPartitionsInRAM)
  All-details constructor.

Method Summary
- java.util.Comparator<BytesRef> getComparator()
  Returns the comparator used to sort entries.
- Directory getDirectory()
  Returns the Directory used to create temp files.
- private OfflineSorter.Partition getPartition(java.util.concurrent.Future<OfflineSorter.Partition> future)
- protected OfflineSorter.ByteSequencesReader getReader(ChecksumIndexInput in, java.lang.String name)
  Subclasses can override this to change how byte sequences are read from disk.
- java.lang.String getTempFileNamePrefix()
  Returns the temp file name prefix passed to Directory.createTempOutput(java.lang.String, java.lang.String, org.apache.lucene.store.IOContext) to generate temporary files.
- protected OfflineSorter.ByteSequencesWriter getWriter(IndexOutput out, long itemCount)
  Subclasses can override this to change how byte sequences are written to disk.
- (package private) void mergePartitions(Directory trackingDir, java.util.List<java.util.concurrent.Future<OfflineSorter.Partition>> segments)
  Merge the most recent maxTempFiles partitions into a new partition.
- (package private) OfflineSorter.Partition readPartition(OfflineSorter.ByteSequencesReader reader)
  Read in a single partition of data, setting isExhausted[0] to true if there are no more items.
- java.lang.String sort(java.lang.String inputFileName)
  Sort the input into a new temp file, returning the new file's name.
- private void verifyChecksum(java.lang.Throwable priorException, OfflineSorter.ByteSequencesReader reader)
  Called on exception to check whether the checksum of this source is also corrupt, and to add that result (whether the checksum matched) as a suppressed exception.

Field Detail

MB
public static final long MB
Convenience constant for megabytes.
See Also:
- Constant Field Values

GB
public static final long GB
Convenience constant for gigabytes.
See Also:
- Constant Field Values

MIN_BUFFER_SIZE_MB
public static final long MIN_BUFFER_SIZE_MB
Minimum recommended buffer size for sorting.
See Also:
- Constant Field Values

ABSOLUTE_MIN_SORT_BUFFER_SIZE
public static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE
Absolute minimum required buffer size for sorting.
See Also:
- Constant Field Values
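
The difference between the recommended minimum (MIN_BUFFER_SIZE_MB) and the absolute minimum (ABSOLUTE_MIN_SORT_BUFFER_SIZE) can be sketched as a three-way classification of a requested buffer size. The numeric values below (binary MB and GB, a 32 MB recommended minimum, a 0.5 MB absolute minimum) are assumptions for illustration and should be checked against the Constant Field Values page; BufferSizeCheckSketch is hypothetical and is not OfflineSorter.BufferSize.

```java
/** Hypothetical buffer-size check; all constant values below are assumed, not quoted. */
final class BufferSizeCheckSketch {
  static final long MB = 1024L * 1024L;                     // assumed: binary megabyte
  static final long GB = MB * 1024L;                        // assumed: binary gigabyte
  static final long MIN_BUFFER_SIZE_MB = 32;                // assumed recommended minimum, in MB
  static final long ABSOLUTE_MIN_SORT_BUFFER_SIZE = MB / 2; // assumed hard minimum, in bytes

  enum Verdict { REJECTED, BELOW_RECOMMENDED, OK }

  /** Classifies a RAM buffer size given in bytes. */
  static Verdict check(long bytes) {
    if (bytes < ABSOLUTE_MIN_SORT_BUFFER_SIZE) {
      return Verdict.REJECTED;          // too small to sort at all
    }
    if (bytes < MIN_BUFFER_SIZE_MB * MB) {
      return Verdict.BELOW_RECOMMENDED; // allowed, but yields more partitions and temp files
    }
    return Verdict.OK;
  }
}
```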

MIN_BUFFER_SIZE_MSG
private static final java.lang.String MIN_BUFFER_SIZE_MSG
See Also:
- Constant Field Values

MAX_TEMPFILES
public static final int MAX_TEMPFILES
Maximum number of temporary files before doing an intermediate merge.
See Also:
- Constant Field Values
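
MAX_TEMPFILES caps how many sorted partitions accumulate before an intermediate merge. The simulation below tracks only item counts and assumes that whenever the number of pending runs reaches maxTempFiles, those runs are merged into one; this follows the mergePartitions description but is not taken from Lucene's code. TempFilePolicySketch is hypothetical and ignores the final merge of whatever runs remain at the end.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of the intermediate-merge policy; runs are item counts only. */
final class TempFilePolicySketch {
  final List<Long> runs = new ArrayList<>();
  int merges = 0;
  private final int maxTempFiles;

  TempFilePolicySketch(int maxTempFiles) {
    if (maxTempFiles < 2) {
      throw new IllegalArgumentException("maxTempFiles must be at least 2");
    }
    this.maxTempFiles = maxTempFiles;
  }

  /** Records one newly written run; merges the most recent maxTempFiles runs when the cap is hit. */
  void addRun(long itemCount) {
    runs.add(itemCount);
    if (runs.size() == maxTempFiles) {
      List<Long> recent = runs.subList(runs.size() - maxTempFiles, runs.size());
      long merged = 0;
      for (long count : recent) {
        merged += count;
      }
      recent.clear(); // also removes these runs from 'runs'
      runs.add(merged);
      merges++;
    }
  }
}
```

With maxTempFiles = 3 and seven single-item runs, the sketch performs three intermediate merges and ends with one run of seven items, so the number of open runs never exceeds the cap.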

dir
private final Directory dir

valueLength
private final int valueLength

tempFileNamePrefix
private final java.lang.String tempFileNamePrefix

exec
private final java.util.concurrent.ExecutorService exec

partitionsInRAM
private final java.util.concurrent.Semaphore partitionsInRAM

ramBufferSize
private final OfflineSorter.BufferSize ramBufferSize

sortInfo
OfflineSorter.SortInfo sortInfo

maxTempFiles
private int maxTempFiles

comparator
private final java.util.Comparator<BytesRef> comparator

DEFAULT_COMPARATOR
public static final java.util.Comparator<BytesRef> DEFAULT_COMPARATOR
Default comparator: sorts in binary (codepoint) order.
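
Binary order compares entries byte by byte as unsigned values (0 to 255), and a shorter entry sorts before any longer entry it is a prefix of. For UTF-8 text this agrees with Unicode code point order, which is why it is also called codepoint order. The sketch uses plain byte arrays instead of BytesRef and is not Lucene's implementation; BinaryOrderSketch is a hypothetical name. Since Java 9, java.util.Arrays.compareUnsigned(byte[], byte[]) gives the same ordering.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

/** Hypothetical sketch of binary (unsigned, lexicographic) byte[] order. */
final class BinaryOrderSketch {
  static final Comparator<byte[]> BINARY_ORDER = BinaryOrderSketch::compare;

  static int compare(byte[] a, byte[] b) {
    int common = Math.min(a.length, b.length);
    for (int i = 0; i < common; i++) {
      // Mask to 0..255 so bytes >= 0x80 sort after ASCII instead of before it.
      int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length; // a proper prefix sorts first
  }

  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
```

This order can differ from String.compareTo, which compares UTF-16 code units: U+FFFD sorts before U+1F600 in binary UTF-8 order, but after it under String.compareTo because the supplementary character starts with a surrogate (0xD83D).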

Constructor Detail

OfflineSorter
public OfflineSorter(Directory dir, java.lang.String tempFileNamePrefix) throws java.io.IOException
Constructor using default settings.
Throws:
- java.io.IOException
See Also:
- OfflineSorter.BufferSize.automatic()

OfflineSorter
public OfflineSorter(Directory dir, java.lang.String tempFileNamePrefix, java.util.Comparator<BytesRef> comparator) throws java.io.IOException
Constructor using default settings and a custom comparator.
Throws:
- java.io.IOException
See Also:
- OfflineSorter.BufferSize.automatic()

OfflineSorter
public OfflineSorter(Directory dir, java.lang.String tempFileNamePrefix, java.util.Comparator<BytesRef> comparator, OfflineSorter.BufferSize ramBufferSize, int maxTempfiles, int valueLength, java.util.concurrent.ExecutorService exec, int maxPartitionsInRAM)
All-details constructor. If valueLength is -1 (the default), values may have different lengths; otherwise, every value has exactly valueLength bytes. If you pass a non-null ExecutorService, it is used to run the sorting operations that can run concurrently, and maxPartitionsInRAM is the maximum number of partitions held in memory at the same time. The maximum RAM this class can use while sorting is therefore maxPartitionsInRAM * ramBufferSize.
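
The bound maxPartitionsInRAM * ramBufferSize follows from holding at most maxPartitionsInRAM full buffers at once; for example, 4 partitions with a 64 MB buffer each allow up to 256 MB of buffered data. The partitionsInRAM field is a Semaphore, which suggests a permit-per-partition scheme like the one sketched below. RamBoundSketch and its methods are hypothetical, the sort-and-write step is a caller-supplied Runnable, and this is not Lucene's code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: a semaphore caps how many partitions are held in RAM at once. */
final class RamBoundSketch {
  private final Semaphore partitionsInRAM;
  private final AtomicInteger inRam = new AtomicInteger();
  private final AtomicInteger peak = new AtomicInteger();

  RamBoundSketch(int maxPartitionsInRAM) {
    this.partitionsInRAM = new Semaphore(maxPartitionsInRAM);
  }

  /** Worst case: every permitted partition holds a full buffer at the same time. */
  static long maxRamBytes(int maxPartitionsInRAM, long ramBufferSizeBytes) {
    return maxPartitionsInRAM * ramBufferSizeBytes;
  }

  /** Blocks until a partition slot is free, then sorts and writes the partition in the background. */
  Future<?> submitPartition(ExecutorService exec, Runnable sortAndWrite) throws InterruptedException {
    partitionsInRAM.acquire();
    peak.accumulateAndGet(inRam.incrementAndGet(), Math::max);
    try {
      return exec.submit(() -> {
        try {
          sortAndWrite.run(); // stand-in for sorting the buffer and writing it to disk
        } finally {
          inRam.decrementAndGet();
          partitionsInRAM.release(); // the buffer can now be reused
        }
      });
    } catch (RuntimeException e) { // e.g. RejectedExecutionException: give the slot back
      inRam.decrementAndGet();
      partitionsInRAM.release();
      throw e;
    }
  }

  int peakPartitionsInRam() {
    return peak.get();
  }
}
```

The permit is released only after the partition has been written, so a slow disk throttles the reader instead of letting buffers pile up in memory.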
Method Detail

getTempFileNamePrefix
public java.lang.String getTempFileNamePrefix()
Returns the temp file name prefix passed to Directory.createTempOutput(java.lang.String, java.lang.String, org.apache.lucene.store.IOContext) to generate temporary files.

sort
public java.lang.String sort(java.lang.String inputFileName) throws java.io.IOException
Sort the input into a new temp file, returning the new file's name.
Throws:
- java.io.IOException

verifyChecksum
private void verifyChecksum(java.lang.Throwable priorException, OfflineSorter.ByteSequencesReader reader) throws java.io.IOException
Called on exception to check whether the checksum of this source is also corrupt, and to add that result (whether the checksum matched) as a suppressed exception.
Throws:
- java.io.IOException
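
The verifyChecksum idea is to keep the original failure as the primary exception and attach, as a suppressed exception, whether the data's checksum still matches; that helps distinguish on-disk corruption from other causes. The sketch uses java.util.zip.CRC32 over an in-memory byte array. ChecksumOnErrorSketch, its methods, and its message wording are hypothetical and not Lucene's.

```java
import java.io.IOException;
import java.util.zip.CRC32;

/** Hypothetical sketch: after a failure, record whether the data's checksum still matches. */
final class ChecksumOnErrorSketch {

  static long crc32(byte[] data) {
    CRC32 crc = new CRC32();
    crc.update(data, 0, data.length);
    return crc.getValue();
  }

  /**
   * Adds a suppressed exception describing the checksum result to priorException and
   * returns an IOException for the caller to throw (priorException itself if possible).
   */
  static IOException withChecksumInfo(Throwable priorException, byte[] data, long expectedCrc) {
    long actual = crc32(data);
    String verdict = actual == expectedCrc
        ? "checksum passed (" + Long.toHexString(actual) + "): the failure is probably not data corruption"
        : "checksum failed (expected " + Long.toHexString(expectedCrc)
            + ", actual " + Long.toHexString(actual) + "): the data is likely corrupt";
    priorException.addSuppressed(new IOException(verdict));
    return priorException instanceof IOException
        ? (IOException) priorException
        : new IOException(priorException);
  }
}
```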

mergePartitions
void mergePartitions(Directory trackingDir, java.util.List<java.util.concurrent.Future<OfflineSorter.Partition>> segments) throws java.io.IOException
Merge the most recent maxTempFiles partitions into a new partition.
Throws:
- java.io.IOException
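
Merging sorted partitions is commonly done as a k-way merge: keep the current head of every partition in a priority queue and repeatedly emit the smallest. The sketch below does this over in-memory lists with java.util.PriorityQueue. The package-private FileAndTop nested class hints at a similar pairing of a partition with its current entry, but that is an inference, and KWayMergeSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical k-way merge of already-sorted runs (in-memory stand-ins for partitions). */
final class KWayMergeSketch {

  /** Heap entry: the current head of one run plus the iterator it came from. */
  private static final class Head {
    final byte[] value;
    final Iterator<byte[]> rest;

    Head(byte[] value, Iterator<byte[]> rest) {
      this.value = value;
      this.rest = rest;
    }
  }

  static List<byte[]> merge(List<List<byte[]>> sortedRuns, Comparator<byte[]> cmp) {
    PriorityQueue<Head> heap = new PriorityQueue<>((x, y) -> cmp.compare(x.value, y.value));
    for (List<byte[]> run : sortedRuns) {
      Iterator<byte[]> it = run.iterator();
      if (it.hasNext()) {
        heap.add(new Head(it.next(), it)); // empty runs contribute nothing
      }
    }
    List<byte[]> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Head smallest = heap.poll(); // O(log k) per emitted entry
      out.add(smallest.value);
      if (smallest.rest.hasNext()) {
        heap.add(new Head(smallest.rest.next(), smallest.rest));
      }
    }
    return out;
  }
}
```

Merging n entries from k runs costs O(n log k), so capping the number of runs per merge, as MAX_TEMPFILES does, also keeps the heap small.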

readPartition
OfflineSorter.Partition readPartition(OfflineSorter.ByteSequencesReader reader) throws java.io.IOException, java.lang.InterruptedException
Read in a single partition of data, setting isExhausted[0] to true if there are no more items.
Throws:
- java.io.IOException
- java.lang.InterruptedException
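
Reading a partition amounts to pulling entries until the RAM buffer is full. In the sketch below the budget is approximated by payload bytes only, a partition may overshoot it by one entry, and an empty result signals that the input is exhausted; these are assumptions of the sketch, and ReadPartitionSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: fill partitions from a stream of entries under a byte budget. */
final class ReadPartitionSketch {

  /** Reads entries until at least budgetBytes of payload are held or the source runs dry. */
  static List<byte[]> readPartition(Iterator<byte[]> source, long budgetBytes) {
    List<byte[]> partition = new ArrayList<>();
    long used = 0;
    while (used < budgetBytes && source.hasNext()) {
      byte[] entry = source.next();
      partition.add(entry);
      used += entry.length; // may overshoot the budget by at most one entry
    }
    return partition; // with a positive budget, empty only if the source was exhausted
  }

  /** Splits the whole source into consecutive partitions. */
  static List<List<byte[]>> readAll(Iterator<byte[]> source, long budgetBytes) {
    if (budgetBytes <= 0) {
      throw new IllegalArgumentException("budgetBytes must be positive");
    }
    List<List<byte[]>> partitions = new ArrayList<>();
    while (source.hasNext()) {
      partitions.add(readPartition(source, budgetBytes));
    }
    return partitions;
  }
}
```

Five 3-byte entries with a 6-byte budget split into partitions of 2, 2, and 1 entries.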

getWriter
protected OfflineSorter.ByteSequencesWriter getWriter(IndexOutput out, long itemCount) throws java.io.IOException
Subclasses can override this to change how byte sequences are written to disk.
Throws:
- java.io.IOException

getReader
protected OfflineSorter.ByteSequencesReader getReader(ChecksumIndexInput in, java.lang.String name) throws java.io.IOException
Subclasses can override this to change how byte sequences are read from disk.
Throws:
- java.io.IOException
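
getWriter and getReader are protected hooks: a subclass changes the on-disk encoding while the sorting logic stays the same. The sketch shows the pattern on the writer side only, with a hypothetical base class whose default format is the two-byte length prefix and a hypothetical subclass that, for fixed-length values (compare the valueLength constructor parameter), writes raw bytes without a prefix. None of these types are Lucene's.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

/** Hypothetical base class whose encoding can be swapped by overriding a hook. */
class HookedSorterSketch {

  /** Minimal writer abstraction standing in for a byte-sequence writer. */
  interface EntryWriter {
    void write(byte[] entry) throws IOException;
  }

  /** Hook: the default format is a two-byte length prefix followed by the bytes. */
  protected EntryWriter getWriter(DataOutputStream out) {
    return entry -> {
      out.writeShort(entry.length);
      out.write(entry);
    };
  }

  /** Encodes entries with whatever writer the (sub)class supplies. */
  final byte[] writeAll(List<byte[]> entries) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      EntryWriter writer = getWriter(out);
      for (byte[] entry : entries) {
        writer.write(entry);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }
}

/** Hypothetical subclass: fixed-length values need no length prefix. */
class FixedLengthSorterSketch extends HookedSorterSketch {
  private final int valueLength;

  FixedLengthSorterSketch(int valueLength) {
    this.valueLength = valueLength;
  }

  @Override
  protected EntryWriter getWriter(DataOutputStream out) {
    return entry -> {
      if (entry.length != valueLength) {
        throw new IllegalArgumentException("expected " + valueLength + " bytes, got " + entry.length);
      }
      out.write(entry);
    };
  }
}
```

A matching reader override would have to agree on the same format, which is why such hooks are normally overridden in pairs.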

getComparator
public java.util.Comparator<BytesRef> getComparator()
Returns the comparator used to sort entries.

getPartition
private OfflineSorter.Partition getPartition(java.util.concurrent.Future<OfflineSorter.Partition> future) throws java.io.IOException
Throws:
- java.io.IOException
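
getPartition takes a Future<OfflineSorter.Partition> and declares only IOException, which suggests it waits for a background task and translates that task's failure. The sketch shows one common way to do this. FutureUnwrapSketch is hypothetical, and how OfflineSorter actually handles interruption is not documented here, so converting it into an IOException is an assumption.

```java
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/** Hypothetical helper: wait for a background task and surface its failure as IOException. */
final class FutureUnwrapSketch {

  static <T> T await(Future<T> future) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status for callers
      throw new IOException("interrupted while waiting for a partition", e);
    } catch (ExecutionException e) {
      Throwable cause = e.getCause();
      if (cause instanceof IOException) {
        throw (IOException) cause; // rethrow I/O failures unchanged
      }
      if (cause instanceof RuntimeException) {
        throw (RuntimeException) cause;
      }
      if (cause instanceof Error) {
        throw (Error) cause;
      }
      throw new IOException(cause);
    }
  }
}
```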