Package org.apache.lucene.util.bkd
Class BKDWriter
- java.lang.Object
-
- org.apache.lucene.util.bkd.BKDWriter
-
- All Implemented Interfaces:
java.io.Closeable,java.lang.AutoCloseable
public class BKDWriter extends java.lang.Object implements java.io.CloseableRecursively builds a block KD-tree to assign all incoming points in N-dim space to smaller and smaller N-dim rectangles (cells) until the number of points in a given rectangle is <=config.maxPointsInLeafNode. The tree is partially balanced, which means the leaf nodes will have the requestedconfig.maxPointsInLeafNodeexcept one that might have less. Leaf nodes may straddle the two bottom levels of the binary tree. Values that fall exactly on a cell boundary may be in either cell.The number of dimensions can be 1 to 8, but every byte[] value is fixed length.
This consumes heap during writing: it allocates a
Long[numLeaves], abyte[numLeaves*(1+config.bytesPerDim)]and then uses up to the specifiedmaxMBSortInHeapheap space for writing.NOTE: This can write at most Integer.MAX_VALUE *
config.maxPointsInLeafNode/ config.bytesPerDim total points.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description private static classBKDWriter.BKDMergeQueueprivate static interfaceBKDWriter.BKDTreeLeafNodesflat representation of a kd-treeprivate static classBKDWriter.MergeReaderprivate classBKDWriter.OneDimensionBKDWriter
-
Field Summary
Fields Modifier and Type Field Description static java.lang.StringCODEC_NAME(package private) int[]commonPrefixLengthsprotected BKDConfigconfigBKD tree configurationstatic floatDEFAULT_MAX_MB_SORT_IN_HEAPDefault maximum heap to use, before spilling to (slower) diskprotected FixedBitSetdocsSeenprivate booleanfinishedprivate intmaxDoc(package private) doublemaxMBSortInHeapprotected byte[]maxPackedValueMaximum per-dim values, packedprivate intmaxPointsSortInHeapprotected byte[]minPackedValueMinimum per-dim values, packedprotected longpointCountprivate PointWriterpointWriter(package private) byte[]scratch1(package private) byte[]scratch2(package private) BytesRefscratchBytesRef1(package private) BytesRefscratchBytesRef2(package private) byte[]scratchDiffprivate ByteBuffersDataOutputscratchOutprivate static intSPLITS_BEFORE_EXACT_BOUNDSNumber of splits before we compute the exact bounding box of an inner node.(package private) TrackingDirectoryWrappertempDir(package private) java.lang.StringtempFileNamePrefixprivate IndexOutputtempInputprivate longtotalPointCountAn upper bound on how many points the caller will add (includes deletions)static intVERSION_CURRENTstatic intVERSION_LEAF_STORES_BOUNDSstatic intVERSION_LOW_CARDINALITY_LEAVESstatic intVERSION_META_FILEstatic intVERSION_SELECTIVE_INDEXINGstatic intVERSION_START
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description voidadd(byte[] packedValue, int docID)private intappendBlock(ByteBuffersDataOutput writeBuffer, java.util.List<byte[]> blocks)Appends the current contents of writeBuffer as another block on the growing in-memory fileprivate voidbuild(int leavesOffset, int numLeaves, MutablePointValues reader, int from, int to, IndexOutput out, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds)private voidbuild(int leavesOffset, int numLeaves, BKDRadixSelector.PathSlice points, IndexOutput out, BKDRadixSelector radixSelector, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds)The point writer contains the data that is going to be splitted using radix selection.private voidcheckMaxLeafNodeCount(int numLeaves)voidclose()private voidcomputeCommonPrefixLength(HeapPointWriter heapPointWriter, byte[] commonPrefix, int from, int to)private static BytesRef[]computeMinMax(int count, java.util.function.IntFunction<BytesRef> packedValues, int offset, int length)Return an array that contains the min and max values for the [offset, offset+length] interval of the givenBytesRefs.private voidcomputePackedValueBounds(MutablePointValues values, int from, int to, byte[] minPackedValue, byte[] maxPackedValue, BytesRef scratch)private voidcomputePackedValueBounds(BKDRadixSelector.PathSlice slice, byte[] minPackedValue, byte[] maxPackedValue)java.lang.Runnablefinish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut)Writes the BKD tree to the providedIndexOutputs and returns aRunnablethat writes the index of the tree if at least one point has been added, ornullotherwise.private intgetNumLeftLeafNodes(int numLeaves)private voidinitPointWriter()java.lang.Runnablemerge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, java.util.List<MergeState.DocMap> docMaps, java.util.List<BKDReader> readers)More efficient bulk-add for incomingBKDReaders.private byte[]packIndex(BKDWriter.BKDTreeLeafNodes leafNodes)Packs the two arrays, representing a semi-balanced binary tree, into a compact byte[] structure.private intrecursePackIndex(ByteBuffersDataOutput writeBuffer, BKDWriter.BKDTreeLeafNodes leafNodes, long minBlockFP, java.util.List<byte[]> blocks, byte[] lastSplitValues, boolean[] negativeDeltas, boolean isLeft, int leavesOffset, int numLeaves)lastSplitValues is per-dimension split value previously seen; we use this to prefix-code the split byte[] on each inner nodeprivate static intrunLen(java.util.function.IntFunction<BytesRef> packedValues, int start, int end, int byteOffset)protected intsplit(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)Pick the next dimension to split.private HeapPointWriterswitchToHeap(PointWriter source)Pull a partition back into heap once the point count is low enough while recursing.private static booleanvalueInBounds(BKDConfig config, BytesRef packedValue, byte[] minPackedValue, byte[] maxPackedValue)private static booleanvalueInOrder(BKDConfig config, long ord, int sortedDim, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset, int doc, int lastDoc)private static booleanvaluesInOrderAndBounds(BKDConfig config, int count, int sortedDim, byte[] minPackedValue, byte[] maxPackedValue, java.util.function.IntFunction<BytesRef> values, int[] docs, int docsOffset)private java.lang.ErrorverifyChecksum(java.lang.Throwable priorException, PointWriter writer)Called on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.private static voidverifyParams(double maxMBSortInHeap, long totalPointCount)private voidwriteActualBounds(DataOutput out, int[] commonPrefixLengths, int count, java.util.function.IntFunction<BytesRef> packedValues)private voidwriteCommonPrefixes(DataOutput out, int[] commonPrefixes, byte[] packedValue)java.lang.RunnablewriteField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, java.lang.String fieldName, MutablePointValues reader)Write a field from aMutablePointValues.private java.lang.RunnablewriteField1Dim(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, java.lang.String fieldName, MutablePointValues reader)private java.lang.RunnablewriteFieldNDims(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, java.lang.String fieldName, MutablePointValues values)private voidwriteHighCardinalityLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, java.util.function.IntFunction<BytesRef> packedValues, int compressedByteOffset)private voidwriteIndex(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, int numLeaves, byte[] packedIndex, long dataStartFP)private voidwriteIndex(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, BKDWriter.BKDTreeLeafNodes leafNodes, long dataStartFP)private voidwriteLeafBlockDocs(DataOutput out, int[] docIDs, int start, int count)private voidwriteLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, java.util.function.IntFunction<BytesRef> packedValues, int leafCardinality)private voidwriteLeafBlockPackedValuesRange(DataOutput out, int[] commonPrefixLengths, int start, int end, java.util.function.IntFunction<BytesRef> packedValues)private voidwriteLowCardinalityLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, java.util.function.IntFunction<BytesRef> packedValues)
-
-
-
Field Detail
-
CODEC_NAME
public static final java.lang.String CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
public static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_LEAF_STORES_BOUNDS
public static final int VERSION_LEAF_STORES_BOUNDS
- See Also:
- Constant Field Values
-
VERSION_SELECTIVE_INDEXING
public static final int VERSION_SELECTIVE_INDEXING
- See Also:
- Constant Field Values
-
VERSION_LOW_CARDINALITY_LEAVES
public static final int VERSION_LOW_CARDINALITY_LEAVES
- See Also:
- Constant Field Values
-
VERSION_META_FILE
public static final int VERSION_META_FILE
- See Also:
- Constant Field Values
-
VERSION_CURRENT
public static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
SPLITS_BEFORE_EXACT_BOUNDS
private static final int SPLITS_BEFORE_EXACT_BOUNDS
Number of splits before we compute the exact bounding box of an inner node.- See Also:
- Constant Field Values
-
DEFAULT_MAX_MB_SORT_IN_HEAP
public static final float DEFAULT_MAX_MB_SORT_IN_HEAP
Default maximum heap to use, before spilling to (slower) disk- See Also:
- Constant Field Values
-
config
protected final BKDConfig config
BKD tree configuration
-
tempDir
final TrackingDirectoryWrapper tempDir
-
tempFileNamePrefix
final java.lang.String tempFileNamePrefix
-
maxMBSortInHeap
final double maxMBSortInHeap
-
scratchDiff
final byte[] scratchDiff
-
scratch1
final byte[] scratch1
-
scratch2
final byte[] scratch2
-
scratchBytesRef1
final BytesRef scratchBytesRef1
-
scratchBytesRef2
final BytesRef scratchBytesRef2
-
commonPrefixLengths
final int[] commonPrefixLengths
-
docsSeen
protected final FixedBitSet docsSeen
-
pointWriter
private PointWriter pointWriter
-
finished
private boolean finished
-
tempInput
private IndexOutput tempInput
-
maxPointsSortInHeap
private final int maxPointsSortInHeap
-
minPackedValue
protected final byte[] minPackedValue
Minimum per-dim values, packed
-
maxPackedValue
protected final byte[] maxPackedValue
Maximum per-dim values, packed
-
pointCount
protected long pointCount
-
totalPointCount
private final long totalPointCount
An upper bound on how many points the caller will add (includes deletions)
-
maxDoc
private final int maxDoc
-
scratchOut
private final ByteBuffersDataOutput scratchOut
-
-
Method Detail
-
verifyParams
private static void verifyParams(double maxMBSortInHeap, long totalPointCount)
-
initPointWriter
private void initPointWriter() throws java.io.IOException- Throws:
java.io.IOException
-
add
public void add(byte[] packedValue, int docID) throws java.io.IOException- Throws:
java.io.IOException
-
writeField
public java.lang.Runnable writeField(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, java.lang.String fieldName, MutablePointValues reader) throws java.io.IOException
Write a field from aMutablePointValues. This way of writing points is faster than regular writes withadd(byte[], int)since there is opportunity for reordering points before writing them to disk. This method does not use transient disk in order to reorder points.- Throws:
java.io.IOException
-
computePackedValueBounds
private void computePackedValueBounds(MutablePointValues values, int from, int to, byte[] minPackedValue, byte[] maxPackedValue, BytesRef scratch)
-
writeFieldNDims
private java.lang.Runnable writeFieldNDims(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, java.lang.String fieldName, MutablePointValues values) throws java.io.IOException
- Throws:
java.io.IOException
-
writeField1Dim
private java.lang.Runnable writeField1Dim(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, java.lang.String fieldName, MutablePointValues reader) throws java.io.IOException
- Throws:
java.io.IOException
-
merge
public java.lang.Runnable merge(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut, java.util.List<MergeState.DocMap> docMaps, java.util.List<BKDReader> readers) throws java.io.IOException
More efficient bulk-add for incomingBKDReaders. This does a merge sort of the already sorted values and currently only works when numDims==1. This returns -1 if all documents containing dimensional values were deleted.- Throws:
java.io.IOException
-
getNumLeftLeafNodes
private int getNumLeftLeafNodes(int numLeaves)
-
checkMaxLeafNodeCount
private void checkMaxLeafNodeCount(int numLeaves)
-
finish
public java.lang.Runnable finish(IndexOutput metaOut, IndexOutput indexOut, IndexOutput dataOut) throws java.io.IOException
Writes the BKD tree to the providedIndexOutputs and returns aRunnablethat writes the index of the tree if at least one point has been added, ornullotherwise.- Throws:
java.io.IOException
-
packIndex
private byte[] packIndex(BKDWriter.BKDTreeLeafNodes leafNodes) throws java.io.IOException
Packs the two arrays, representing a semi-balanced binary tree, into a compact byte[] structure.- Throws:
java.io.IOException
-
appendBlock
private int appendBlock(ByteBuffersDataOutput writeBuffer, java.util.List<byte[]> blocks)
Appends the current contents of writeBuffer as another block on the growing in-memory file
-
recursePackIndex
private int recursePackIndex(ByteBuffersDataOutput writeBuffer, BKDWriter.BKDTreeLeafNodes leafNodes, long minBlockFP, java.util.List<byte[]> blocks, byte[] lastSplitValues, boolean[] negativeDeltas, boolean isLeft, int leavesOffset, int numLeaves) throws java.io.IOException
lastSplitValues is per-dimension split value previously seen; we use this to prefix-code the split byte[] on each inner node- Throws:
java.io.IOException
-
writeIndex
private void writeIndex(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, BKDWriter.BKDTreeLeafNodes leafNodes, long dataStartFP) throws java.io.IOException
- Throws:
java.io.IOException
-
writeIndex
private void writeIndex(IndexOutput metaOut, IndexOutput indexOut, int countPerLeaf, int numLeaves, byte[] packedIndex, long dataStartFP) throws java.io.IOException
- Throws:
java.io.IOException
-
writeLeafBlockDocs
private void writeLeafBlockDocs(DataOutput out, int[] docIDs, int start, int count) throws java.io.IOException
- Throws:
java.io.IOException
-
writeLeafBlockPackedValues
private void writeLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, java.util.function.IntFunction<BytesRef> packedValues, int leafCardinality) throws java.io.IOException
- Throws:
java.io.IOException
-
writeLowCardinalityLeafBlockPackedValues
private void writeLowCardinalityLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, java.util.function.IntFunction<BytesRef> packedValues) throws java.io.IOException
- Throws:
java.io.IOException
-
writeHighCardinalityLeafBlockPackedValues
private void writeHighCardinalityLeafBlockPackedValues(DataOutput out, int[] commonPrefixLengths, int count, int sortedDim, java.util.function.IntFunction<BytesRef> packedValues, int compressedByteOffset) throws java.io.IOException
- Throws:
java.io.IOException
-
writeActualBounds
private void writeActualBounds(DataOutput out, int[] commonPrefixLengths, int count, java.util.function.IntFunction<BytesRef> packedValues) throws java.io.IOException
- Throws:
java.io.IOException
-
computeMinMax
private static BytesRef[] computeMinMax(int count, java.util.function.IntFunction<BytesRef> packedValues, int offset, int length)
Return an array that contains the min and max values for the [offset, offset+length] interval of the givenBytesRefs.
-
writeLeafBlockPackedValuesRange
private void writeLeafBlockPackedValuesRange(DataOutput out, int[] commonPrefixLengths, int start, int end, java.util.function.IntFunction<BytesRef> packedValues) throws java.io.IOException
- Throws:
java.io.IOException
-
runLen
private static int runLen(java.util.function.IntFunction<BytesRef> packedValues, int start, int end, int byteOffset)
-
writeCommonPrefixes
private void writeCommonPrefixes(DataOutput out, int[] commonPrefixes, byte[] packedValue) throws java.io.IOException
- Throws:
java.io.IOException
-
close
public void close() throws java.io.IOException- Specified by:
closein interfacejava.lang.AutoCloseable- Specified by:
closein interfacejava.io.Closeable- Throws:
java.io.IOException
-
verifyChecksum
private java.lang.Error verifyChecksum(java.lang.Throwable priorException, PointWriter writer) throws java.io.IOExceptionCalled on exception, to check whether the checksum is also corrupt in this source, and add that information (checksum matched or didn't) as a suppressed exception.- Throws:
java.io.IOException
-
split
protected int split(byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits)Pick the next dimension to split.- Parameters:
minPackedValue- the min values for all dimensionsmaxPackedValue- the max values for all dimensionsparentSplits- how many times each dim has been split on the parent levels- Returns:
- the dimension to split
-
switchToHeap
private HeapPointWriter switchToHeap(PointWriter source) throws java.io.IOException
Pull a partition back into heap once the point count is low enough while recursing.- Throws:
java.io.IOException
-
build
private void build(int leavesOffset, int numLeaves, MutablePointValues reader, int from, int to, IndexOutput out, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds) throws java.io.IOException- Throws:
java.io.IOException
-
computePackedValueBounds
private void computePackedValueBounds(BKDRadixSelector.PathSlice slice, byte[] minPackedValue, byte[] maxPackedValue) throws java.io.IOException
- Throws:
java.io.IOException
-
build
private void build(int leavesOffset, int numLeaves, BKDRadixSelector.PathSlice points, IndexOutput out, BKDRadixSelector radixSelector, byte[] minPackedValue, byte[] maxPackedValue, int[] parentSplits, byte[] splitPackedValues, byte[] splitDimensionValues, long[] leafBlockFPs, int[] spareDocIds) throws java.io.IOExceptionThe point writer contains the data that is going to be splitted using radix selection. /* This method is used when we are merging previously written segments, in the numDims > 1 case.- Throws:
java.io.IOException
-
computeCommonPrefixLength
private void computeCommonPrefixLength(HeapPointWriter heapPointWriter, byte[] commonPrefix, int from, int to)
-
valuesInOrderAndBounds
private static boolean valuesInOrderAndBounds(BKDConfig config, int count, int sortedDim, byte[] minPackedValue, byte[] maxPackedValue, java.util.function.IntFunction<BytesRef> values, int[] docs, int docsOffset)
-
valueInOrder
private static boolean valueInOrder(BKDConfig config, long ord, int sortedDim, byte[] lastPackedValue, byte[] packedValue, int packedValueOffset, int doc, int lastDoc)
-
-