Class IntersectBlockReader
- java.lang.Object
  - org.apache.lucene.index.TermsEnum
    - org.apache.lucene.index.BaseTermsEnum
      - org.apache.lucene.codecs.uniformsplit.BlockReader
        - org.apache.lucene.codecs.uniformsplit.IntersectBlockReader
- All Implemented Interfaces: Accountable, BytesRefIterator
- Direct Known Subclasses: STIntersectBlockReader
public class IntersectBlockReader extends BlockReader
The "intersect" TermsEnum response to UniformSplitTerms.intersect(CompiledAutomaton, BytesRef), intersecting the terms with an automaton.

By design of the UniformSplit block keys, it is less efficient than org.apache.lucene.codecs.blocktree.IntersectTermsEnum for FuzzyQuery (-37%). It is slightly slower for WildcardQuery (-5%) and slightly faster for PrefixQuery (+5%).
-
-
Nested Class Summary
| Modifier and Type | Class | Description |
| --- | --- | --- |
| protected class | IntersectBlockReader.AutomatonNextTermCalculator | This is mostly a copy of AutomatonTermsEnum. |
| protected static class | IntersectBlockReader.BlockIteration | Block iteration order. |
Nested classes/interfaces inherited from class org.apache.lucene.index.TermsEnum
TermsEnum.SeekStatus
-
-
Field Summary
| Modifier and Type | Field | Description |
| --- | --- | --- |
| protected Automaton | automaton | |
| protected IntersectBlockReader.BlockIteration | blockIteration | Block iteration order determined when scanning the terms in the current block. |
| protected BytesRef | commonSuffix | |
| protected boolean | finite | |
| protected int | minTermLength | |
| protected IntersectBlockReader.AutomatonNextTermCalculator | nextStringCalculator | |
| protected int | NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD | Threshold that controls when to attempt to jump to a block away. |
| protected int | numConsecutivelyRejectedTerms | Counter of the number of consecutively rejected terms. |
| protected int | numMatchedBytes | Number of bytes accepted by the automaton when validating the current term. |
| protected ByteRunAutomaton | runAutomaton | |
| protected BytesRef | seekTerm | Set this when our current mode is seeking to this term. |
| protected int[] | states | Automaton states reached when validating the current term, from 0 to numMatchedBytes - 1. |
Fields inherited from class org.apache.lucene.codecs.uniformsplit.BlockReader
blockDecoder, blockFirstLineStart, blockHeader, blockHeaderReader, blockInput, blockLine, blockLineReader, blockReadBuffer, blockStartFP, dictionaryBrowser, dictionaryBrowserSupplier, fieldMetadata, forcedTerm, lineIndexInBlock, postingsReader, scratchBlockBytes, scratchBlockLine, scratchTermState, termState, termStateForced, termStateSerializer, termStatesReadBuffer
-
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
| Modifier | Constructor | Description |
| --- | --- | --- |
| protected | IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm, IndexDictionary.BrowserSupplier dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase postingsReader, FieldMetadata fieldMetadata, BlockDecoder blockDecoder) | |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| protected boolean | endsWithCommonSuffix(byte[] termBytes, int termLength) | Indicates whether the given term ends with the automaton common suffix. |
| protected int | getMinTermLength() | Computes the minimal length of the terms accepted by the automaton. |
| BytesRef | next() | Increments the iteration to the next BytesRef in the iterator. |
| protected boolean | nextBlock() | Opens the next block. |
| protected BytesRef | nextTermInBlockMatching() | Finds the next block line that matches (accepted by the automaton), or null when at end of block. |
| TermsEnum.SeekStatus | seekCeil(BytesRef text) | Seeks to the specified term, if it exists, or to the next (ceiling) term. |
| void | seekExact(long ord) | Not supported. |
| boolean | seekExact(BytesRef text) | Attempts to seek to the exact term, returning true if the term is found. |
| void | seekExact(BytesRef term, TermState state) | Positions this BlockReader without re-seeking the term dictionary. |
| protected boolean | seekFirstBlock() | |
Methods inherited from class org.apache.lucene.codecs.uniformsplit.BlockReader
clearTermState, compareToMiddleAndJump, createBlockHeaderSerializer, createBlockLineSerializer, createDeltaBaseTermStateSerializer, decodeBlockBytesIfNeeded, docFreq, getOrCreateDictionaryBrowser, impacts, initializeBlockReadLazily, initializeHeader, isBeyondLastTerm, isCurrentTerm, newCorruptIndexException, nextTerm, ord, postings, ramBytesUsed, readHeader, readLineInBlock, readTermState, readTermStateIfNotRead, seekInBlock, seekInBlock, term, termState, totalTermFreq
-
Methods inherited from class org.apache.lucene.index.BaseTermsEnum
attributes
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources
-
-
-
-
Field Detail
-
NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD
protected final int NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD
Threshold that controls when to attempt to jump to a block away.

The counter numConsecutivelyRejectedTerms is 0 when entering a block. It is incremented each time a term is rejected by the automaton. When the counter is greater than or equal to this threshold, we compute the next term accepted by the automaton with IntersectBlockReader.AutomatonNextTermCalculator, and we jump to a block away if that next accepted term is greater than the immediate next term in the block.

A low value, for example 1, improves the performance of automatons requiring many jumps, for example FuzzyQuery and most WildcardQuery. A higher value improves the performance of automatons with few or no jumps, for example PrefixQuery. A threshold of 4 seems to be a good balance.
- See Also: Constant Field Values
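The interplay between the counter and the threshold can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the class's implementation: the terms dictionary is modeled as a sorted String array cut into fixed-size blocks (BLOCK_SIZE is hypothetical), the automaton is replaced by a TreeSet of accepted terms whose ceiling() stands in for AutomatonNextTermCalculator, and resetting the counter on an accepted term is inferred from the word "consecutively". All class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

final class RejectedTermsThresholdSketch {
  /** Hypothetical fixed block size; real blocks are not necessarily fixed-size. */
  static final int BLOCK_SIZE = 4;

  /** Matches found, plus the number of terms that had to be examined. */
  static final class Result {
    final List<String> matches = new ArrayList<>();
    int termsExamined;
  }

  static Result intersect(String[] sortedTerms, TreeSet<String> accepted, int threshold) {
    Result result = new Result();
    int rejected = 0;
    int i = 0;
    while (i < sortedTerms.length) {
      if (i % BLOCK_SIZE == 0) {
        rejected = 0; // the counter is 0 when entering a block
      }
      String term = sortedTerms[i];
      result.termsExamined++;
      if (accepted.contains(term)) {
        result.matches.add(term);
        rejected = 0; // assumption: an accepted term ends the "consecutive" run
        i++;
        continue;
      }
      rejected++;
      if (rejected >= threshold) {
        // Stand-in for the next-term calculator: smallest accepted term after this one.
        String nextAccepted = accepted.ceiling(term);
        if (nextAccepted == null) {
          break; // no remaining term can match
        }
        boolean hasNext = i + 1 < sortedTerms.length;
        // Simplification: the "immediate next term" is taken from the flat array.
        if (hasNext && nextAccepted.compareTo(sortedTerms[i + 1]) > 0) {
          int pos = Arrays.binarySearch(sortedTerms, nextAccepted);
          if (pos < 0) {
            pos = -pos - 1; // insertion point when the term is absent
          }
          int targetBlockStart = (pos / BLOCK_SIZE) * BLOCK_SIZE;
          if (targetBlockStart > i) {
            i = targetBlockStart; // jump to a block away
            continue;
          }
        }
      }
      i++;
    }
    return result;
  }

  public static void main(String[] args) {
    String[] terms = new String[40];
    for (int i = 0; i < terms.length; i++) {
      terms[i] = String.format("t%02d", i);
    }
    TreeSet<String> accepted = new TreeSet<>(Arrays.asList("t03", "t30"));
    for (int threshold : new int[] {1, 4, Integer.MAX_VALUE}) {
      Result r = intersect(terms, accepted, threshold);
      System.out.println("threshold=" + threshold + " matches=" + r.matches
          + " examined=" + r.termsExamined);
    }
  }
}
```

In this toy model, a very large threshold degenerates into a full scan, while a small one pays for a next-term computation on almost every rejection; that trade-off is what the field description refers to.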
-
automaton
protected final Automaton automaton
-
runAutomaton
protected final ByteRunAutomaton runAutomaton
-
finite
protected final boolean finite
-
commonSuffix
protected final BytesRef commonSuffix
-
minTermLength
protected final int minTermLength
-
nextStringCalculator
protected final IntersectBlockReader.AutomatonNextTermCalculator nextStringCalculator
-
seekTerm
protected BytesRef seekTerm
Set this when our current mode is seeking to this term. Set to null after.
-
numMatchedBytes
protected int numMatchedBytes
Number of bytes accepted by the automaton when validating the current term.
-
states
protected int[] states
Automaton states reached when validating the current term, from 0 to numMatchedBytes - 1.
-
blockIteration
protected IntersectBlockReader.BlockIteration blockIteration
Block iteration order determined when scanning the terms in the current block.
-
numConsecutivelyRejectedTerms
protected int numConsecutivelyRejectedTerms
Counter of the number of consecutively rejected terms. Depending on NUM_CONSECUTIVELY_REJECTED_TERMS_THRESHOLD, this may trigger a jump to a block away.
-
-
Constructor Detail
-
IntersectBlockReader
protected IntersectBlockReader(CompiledAutomaton compiled, BytesRef startTerm, IndexDictionary.BrowserSupplier dictionaryBrowserSupplier, IndexInput blockInput, PostingsReaderBase postingsReader, FieldMetadata fieldMetadata, BlockDecoder blockDecoder) throws java.io.IOException
- Throws: java.io.IOException
-
-
Method Detail
-
getMinTermLength
protected int getMinTermLength()
Computes the minimal length of the terms accepted by the automaton. This speeds up the term scanning for automatons accepting a finite language.
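One way to think about this minimal length is as a shortest path from the initial state to any accepting state. The sketch below computes it with a breadth-first search over a toy automaton; it is an illustration under assumptions, not the method's actual code. The adjacency-array representation, the convention that state 0 is initial, and the names MinTermLengthSketch and minAcceptedLength are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

final class MinTermLengthSketch {
  /**
   * next[s] lists the states reachable from state s by consuming one byte;
   * accept[s] marks accepting states. State 0 is assumed to be the initial
   * state, and the automaton is assumed to have at least one state.
   * Returns the length of the shortest accepted term, or -1 if none exists.
   */
  static int minAcceptedLength(int[][] next, boolean[] accept) {
    int[] dist = new int[next.length];
    Arrays.fill(dist, -1); // -1 means "not visited yet"
    ArrayDeque<Integer> queue = new ArrayDeque<>();
    dist[0] = 0;
    queue.add(0);
    while (!queue.isEmpty()) {
      int state = queue.poll();
      if (accept[state]) {
        return dist[state]; // BFS dequeues states in order of distance
      }
      for (int target : next[state]) {
        if (dist[target] == -1) {
          dist[target] = dist[state] + 1;
          queue.add(target);
        }
      }
    }
    return -1; // no accepting state is reachable
  }

  public static void main(String[] args) {
    // Toy automaton accepting exactly "ab" and "abcd": 0 -> 1 -> 2 (accept) -> 3 -> 4 (accept).
    int[][] next = {{1}, {2}, {3}, {4}, {}};
    boolean[] accept = {false, false, true, false, true};
    System.out.println(minAcceptedLength(next, accept));
  }
}
```

With such a bound available, any term shorter than it can be rejected by a length comparison alone, without stepping through the automaton.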
-
next
public BytesRef next() throws java.io.IOException
Description copied from interface: BytesRefIterator
Increments the iteration to the next BytesRef in the iterator. Returns the resulting BytesRef or null if the end of the iterator is reached. The returned BytesRef may be re-used across calls to next. After this method returns null, do not call it again: the results are undefined.
- Specified by: next in interface BytesRefIterator
- Overrides: next in class BlockReader
- Returns: the next BytesRef in the iterator or null if the end of the iterator is reached.
- Throws: java.io.IOException - If there is a low-level I/O error.
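The contract above has two practical consequences for callers: stop at the first null, and copy any value that must outlive the next call. The following sketch shows that calling pattern with hypothetical stand-in types (Ref and ReusingIterator are invented for the example and only mimic the buffer reuse described above). With Lucene's own types, BytesRef.deepCopyOf serves the same copying purpose.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ReusedBufferIterationSketch {
  /** Hypothetical stand-in for a byte slice: a buffer plus a valid length. */
  static final class Ref {
    byte[] bytes = new byte[0];
    int length;
  }

  /** Hypothetical iterator that returns the same Ref instance on every call. */
  static final class ReusingIterator {
    private final String[] terms;
    private final Ref shared = new Ref();
    private int index;

    ReusingIterator(String[] terms) {
      this.terms = terms;
    }

    /** Returns the shared Ref filled with the next term, or null at the end. */
    Ref next() {
      if (index >= terms.length) {
        return null;
      }
      byte[] encoded = terms[index++].getBytes(StandardCharsets.UTF_8);
      if (shared.bytes.length < encoded.length) {
        shared.bytes = new byte[encoded.length]; // grow the reused buffer
      }
      System.arraycopy(encoded, 0, shared.bytes, 0, encoded.length);
      shared.length = encoded.length;
      return shared;
    }
  }

  /** Drains the iterator, copying each term before asking for the next one. */
  static List<String> collect(ReusingIterator iterator) {
    List<String> copies = new ArrayList<>();
    // Stop at the first null and never call next() again afterwards.
    for (Ref ref = iterator.next(); ref != null; ref = iterator.next()) {
      // new String(...) copies the valid bytes, so later reuse cannot corrupt it.
      copies.add(new String(ref.bytes, 0, ref.length, StandardCharsets.UTF_8));
    }
    return copies;
  }

  public static void main(String[] args) {
    String[] terms = {"apple", "kiwi", "fig"};
    System.out.println(collect(new ReusingIterator(terms)));
  }
}
```

Storing the returned reference itself, instead of a copy, would leave every stored entry pointing at the same buffer in this model.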
-
seekFirstBlock
protected boolean seekFirstBlock() throws java.io.IOException
- Throws: java.io.IOException
-
nextTermInBlockMatching
protected BytesRef nextTermInBlockMatching() throws java.io.IOException
Finds the next block line that matches (accepted by the automaton), or null when at end of block.
- Returns: The next term in the current block that is accepted by the automaton; or null if none.
- Throws: java.io.IOException
-
endsWithCommonSuffix
protected boolean endsWithCommonSuffix(byte[] termBytes, int termLength)
Indicates whether the given term ends with the automaton common suffix. This makes it possible to quickly skip terms that the automaton would eventually reject.
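A check of this kind amounts to comparing the tail of the term with the suffix bytes. The sketch below is a minimal illustration, not the method's actual code: the suffix is passed as an explicit byte array rather than read from the commonSuffix field, and the class and method names are hypothetical. It assumes that only the first termLength bytes of termBytes are valid, since the array may be a larger reused buffer.

```java
import java.nio.charset.StandardCharsets;

final class CommonSuffixSketch {
  /**
   * Returns true if the first termLength bytes of termBytes end with suffix.
   * Bytes beyond termLength are ignored.
   */
  static boolean endsWith(byte[] termBytes, int termLength, byte[] suffix) {
    if (termLength < suffix.length) {
      return false; // the term is too short to contain the suffix
    }
    int offset = termLength - suffix.length;
    for (int i = 0; i < suffix.length; i++) {
      if (termBytes[offset + i] != suffix[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] suffix = "ing".getBytes(StandardCharsets.UTF_8);
    byte[] reading = "reading".getBytes(StandardCharsets.UTF_8);
    byte[] reader = "reader".getBytes(StandardCharsets.UTF_8);
    System.out.println(endsWith(reading, reading.length, suffix));
    System.out.println(endsWith(reader, reader.length, suffix));
  }
}
```

Such a comparison costs at most one pass over the suffix, which is why it can serve as a cheap filter before running the full automaton on a term.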
-
nextBlock
protected boolean nextBlock() throws java.io.IOException
Opens the next block. Depending on the blockIteration order, it may be the very next block, or a block away that may contain seekTerm.
- Returns: true if the next block is opened; false if there are no more blocks and the iteration is over.
- Throws: java.io.IOException
-
seekExact
public boolean seekExact(BytesRef text)
Description copied from class: TermsEnum
Attempts to seek to the exact term, returning true if the term is found. If this returns false, the enum is unpositioned. For some codecs, seekExact may be substantially faster than TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).
- Overrides: seekExact in class BlockReader
- Returns: true if the term is found; false otherwise, in which case the enum is unpositioned.
-
seekExact
public void seekExact(long ord)
Description copied from class: BlockReader
Not supported.
- Overrides: seekExact in class BlockReader
-
seekExact
public void seekExact(BytesRef term, TermState state)
Description copied from class: BlockReader
Positions this BlockReader without re-seeking the term dictionary.

The block containing the term is not read by this method. It will be read lazily only if needed, for example if BlockReader.next() is called. Calling BlockReader.postings(org.apache.lucene.index.PostingsEnum, int) after this method does require the block to be read.
- Overrides: seekExact in class BlockReader
- Parameters:
  - term - the term the TermState corresponds to
  - state - the TermState
-
seekCeil
public TermsEnum.SeekStatus seekCeil(BytesRef text)
Description copied from class: TermsEnum
Seeks to the specified term, if it exists, or to the next (ceiling) term. Returns SeekStatus to indicate whether exact term was found, a different term was found, or EOF was hit. The target term may be before or after the current term. If this returns SeekStatus.END, the enum is unpositioned.
- Overrides: seekCeil in class BlockReader
-
-