Class FuzzyTermsEnum
- java.lang.Object
- org.apache.lucene.index.TermsEnum
- org.apache.lucene.search.FuzzyTermsEnum
- All Implemented Interfaces:
BytesRefIterator
public final class FuzzyTermsEnum extends TermsEnum
Subclass of TermsEnum for enumerating all terms that are similar to the specified filter term.
Term enumerations are always ordered by BytesRef.compareTo(org.apache.lucene.util.BytesRef). Each term in the enumeration is greater than all that precede it.
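The following sketch illustrates the contract described above: terms within a given edit distance of a filter term, returned in ascending order. It is a brute-force illustration, not how FuzzyTermsEnum works internally (the class uses Levenshtein automata). The class and method names (SimilarTermsSketch, editDistance, similarTerms) are hypothetical, and Java String ordering stands in for BytesRef.compareTo, which agrees with it for ASCII terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Brute-force illustration only; names here are hypothetical, not Lucene API. */
class SimilarTermsSketch {

  /** Classic Levenshtein distance (insert, delete, substitute) using two rolling rows. */
  static int editDistance(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j; // distance from the empty prefix of a
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; // swap rows so prev always holds the last completed row
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Walks a sorted dictionary and keeps the terms within maxEdits of filterTerm. */
  static List<String> similarTerms(SortedSet<String> dictionary, String filterTerm, int maxEdits) {
    List<String> out = new ArrayList<>();
    for (String candidate : dictionary) { // iteration order is ascending
      if (editDistance(candidate, filterTerm) <= maxEdits) {
        out.add(candidate);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    SortedSet<String> dict =
        new TreeSet<>(Arrays.asList("solr", "lucerne", "lucent", "luxury", "lucene"));
    System.out.println(similarTerms(dict, "lucene", 1)); // prints [lucene, lucent, lucerne]
  }
}
```

Because the dictionary is walked in sorted order, every returned term is greater than all that precede it, which mirrors the ordering guarantee stated above.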
-
-
Nested Class Summary
Nested Classes
private static interface FuzzyTermsEnum.AutomatonAttribute - Used for sharing automata between segments. Levenshtein automata are large and expensive to build; we don't want to build them directly on the query because this can blow up caches that use queries as keys; we also don't want to rebuild them for every segment.
private static class FuzzyTermsEnum.AutomatonAttributeImpl
static class FuzzyTermsEnum.FuzzyTermsException - Thrown to indicate that there was an issue creating a fuzzy query for a given term.
-
Nested classes/interfaces inherited from class org.apache.lucene.index.TermsEnum
TermsEnum.SeekStatus
-
-
Field Summary
Fields
private TermsEnum actualEnum
private AttributeSource atts
private CompiledAutomaton[] automata
private BoostAttribute boostAtt
private float bottom
private BytesRef bottomTerm
private MaxNonCompetitiveBoostAttribute maxBoostAtt
private int maxEdits
private BytesRef queuedBottom
private Term term
private int termLength
private Terms terms
-
Constructor Summary
Constructors
FuzzyTermsEnum(Terms terms, Term term, int maxEdits, int prefixLength, boolean transpositions) - Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have at most maxEdits edits.
(package private) FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, int maxEdits, int prefixLength, boolean transpositions) - Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have at most maxEdits edits.
private FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, java.util.function.Supplier<FuzzyAutomatonBuilder> automatonBuilder)
-
Method Summary
Methods
AttributeSource attributes() - Returns the related attributes.
private void bottomChanged(BytesRef lastTerm) - Fired when the max non-competitive boost has changed.
int docFreq() - Returns the number of documents containing the current term.
private TermsEnum getAutomatonEnum(int editDistance, BytesRef lastTerm) - Returns an automata-based enum for matching up to editDistance from lastTerm, if possible.
float getBoost() - Gets the boost of the current term.
ImpactsEnum impacts(int flags) - Returns an ImpactsEnum.
private boolean matches(BytesRef termIn, int k) - Returns true if term is within k edits of the query term.
BytesRef next() - Increments the iteration to the next BytesRef in the iterator.
long ord() - Returns ordinal position for current term.
PostingsEnum postings(PostingsEnum reuse, int flags) - Get PostingsEnum for the current term, with control over whether freqs, positions, offsets or payloads are required.
TermsEnum.SeekStatus seekCeil(BytesRef text) - Seeks to the specified term, if it exists, or to the next (ceiling) term.
void seekExact(long ord) - Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord().
boolean seekExact(BytesRef text) - Attempts to seek to the exact term, returning true if the term is found.
void seekExact(BytesRef term, TermState state) - Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState().
void setMaxNonCompetitiveBoost(float boost) - Sets the maximum non-competitive boost, which may allow switching to a lower max-edit automaton at run time.
BytesRef term() - Returns current term.
TermState termState() - Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.
long totalTermFreq() - Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term).
-
-
-
Field Detail
-
actualEnum
private TermsEnum actualEnum
-
atts
private final AttributeSource atts
-
boostAtt
private final BoostAttribute boostAtt
-
maxBoostAtt
private final MaxNonCompetitiveBoostAttribute maxBoostAtt
-
automata
private final CompiledAutomaton[] automata
-
terms
private final Terms terms
-
termLength
private final int termLength
-
term
private final Term term
-
bottom
private float bottom
-
bottomTerm
private BytesRef bottomTerm
-
queuedBottom
private BytesRef queuedBottom
-
maxEdits
private int maxEdits
-
-
Constructor Detail
-
FuzzyTermsEnum
public FuzzyTermsEnum(Terms terms, Term term, int maxEdits, int prefixLength, boolean transpositions) throws java.io.IOException
Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have at most maxEdits edits.
After calling the constructor the enumeration is already pointing to the first valid term if such a term exists.
- Parameters:
terms - Delivers terms.
term - Pattern term.
maxEdits - Maximum edit distance.
prefixLength - the length of the required common prefix
transpositions - whether transpositions should count as a single edit
- Throws:
java.io.IOException - if there is a low-level IO error
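To make the interaction of maxEdits, prefixLength and transpositions concrete, here is a minimal sketch of an acceptance test with the same three knobs. It assumes the edits are counted on the part of the term after the required prefix and uses the optimal-string-alignment distance when transpositions is true. FuzzyMatchSketch, distance and accepts are hypothetical names, and the sketch compares Java chars rather than whatever unit the real class uses.

```java
/** Illustration of the constructor parameters; not the automaton-based implementation. */
class FuzzyMatchSketch {

  /**
   * Edit distance with insert, delete and substitute; when transpositions is true,
   * swapping two adjacent characters also counts as a single edit.
   */
  static int distance(String a, String b, boolean transpositions) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) {
      d[i][0] = i;
    }
    for (int j = 0; j <= b.length(); j++) {
      d[0][j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        int best = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
        if (transpositions && i > 1 && j > 1
            && a.charAt(i - 1) == b.charAt(j - 2)
            && a.charAt(i - 2) == b.charAt(j - 1)) {
          best = Math.min(best, d[i - 2][j - 2] + 1); // adjacent swap as one edit
        }
        d[i][j] = best;
      }
    }
    return d[a.length()][b.length()];
  }

  /** True if candidate shares the required prefix with term and the rest is within maxEdits. */
  static boolean accepts(
      String candidate, String term, int maxEdits, int prefixLength, boolean transpositions) {
    int p = Math.min(prefixLength, term.length());
    if (!candidate.regionMatches(0, term, 0, p)) {
      return false; // the first p characters must match exactly
    }
    return distance(candidate.substring(p), term.substring(p), transpositions) <= maxEdits;
  }

  public static void main(String[] args) {
    System.out.println(accepts("lucnee", "lucene", 1, 0, true));  // swap counts as one edit
    System.out.println(accepts("lucnee", "lucene", 1, 0, false)); // swap costs two edits
    System.out.println(accepts("bucene", "lucene", 1, 1, true));  // prefix of length 1 differs
  }
}
```

With transpositions disabled, the swapped pair in "lucnee" needs two edits, so it is only accepted once maxEdits is raised to 2.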
-
FuzzyTermsEnum
FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, int maxEdits, int prefixLength, boolean transpositions) throws java.io.IOException
Constructor for enumeration of all terms from the specified reader which share a prefix of length prefixLength with term and which have at most maxEdits edits.
After calling the constructor the enumeration is already pointing to the first valid term if such a term exists.
- Parameters:
terms - Delivers terms.
atts - An AttributeSource used to share automata between segments
term - Pattern term.
maxEdits - Maximum edit distance.
prefixLength - the length of the required common prefix
transpositions - whether transpositions should count as a single edit
- Throws:
java.io.IOException - if there is a low-level IO error
-
FuzzyTermsEnum
private FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, java.util.function.Supplier<FuzzyAutomatonBuilder> automatonBuilder) throws java.io.IOException
- Throws:
java.io.IOException
-
-
Method Detail
-
setMaxNonCompetitiveBoost
public void setMaxNonCompetitiveBoost(float boost)
Sets the maximum non-competitive boost, which may allow switching to a lower max-edit automaton at run time
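As a rough illustration of why a higher non-competitive boost can permit a lower max-edit automaton, the sketch below assumes a scoring model in which a term at k edits from a query term of length n gets a boost of 1 - k/n. That formula, the tie handling, and the names CompetitiveEditsSketch, maxBoostAt and narrow are assumptions made for the example; this page does not state how the boost is computed.

```java
/** Sketch under an assumed boost model; not taken from the FuzzyTermsEnum source. */
class CompetitiveEditsSketch {

  /** Assumed model: best boost reachable by a term at the given edit distance. */
  static float maxBoostAt(int edits, int termLength) {
    return 1.0f - (float) edits / termLength;
  }

  /**
   * Lowers maxEdits while terms at that distance cannot exceed the floor.
   * A term whose boost only ties the floor is treated as non-competitive here.
   */
  static int narrow(int maxEdits, int termLength, float maxNonCompetitiveBoost) {
    while (maxEdits > 0 && maxBoostAt(maxEdits, termLength) <= maxNonCompetitiveBoost) {
      maxEdits--;
    }
    return maxEdits;
  }

  public static void main(String[] args) {
    // Query term of length 4: boosts are 0.5 at two edits and 0.75 at one edit.
    System.out.println(narrow(2, 4, 0.25f)); // prints 2
    System.out.println(narrow(2, 4, 0.6f));  // prints 1
    System.out.println(narrow(2, 4, 0.8f));  // prints 0
  }
}
```

Under this model, once the floor rises past what a 2-edit match could score, only the cheaper 1-edit (or exact) matching is still worth running.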
-
getBoost
public float getBoost()
Gets the boost of the current term
-
getAutomatonEnum
private TermsEnum getAutomatonEnum(int editDistance, BytesRef lastTerm) throws java.io.IOException
Returns an automata-based enum for matching up to editDistance from lastTerm, if possible.
- Throws:
java.io.IOException
-
bottomChanged
private void bottomChanged(BytesRef lastTerm) throws java.io.IOException
Fired when the max non-competitive boost has changed. This is the hook to swap in a smarter actualEnum.
- Throws:
java.io.IOException
-
next
public BytesRef next() throws java.io.IOException
Description copied from interface: BytesRefIterator
Increments the iteration to the next BytesRef in the iterator. Returns the resulting BytesRef or null if the end of the iterator is reached. The returned BytesRef may be re-used across calls to next. After this method returns null, do not call it again: the results are undefined.
- Returns:
the next BytesRef in the iterator or null if the end of the iterator is reached.
- Throws:
java.io.IOException - If there is a low-level I/O error.
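The iteration contract above has two practical consequences: stop at the first null, and copy any value you want to keep, because the returned object may be overwritten by the next call. The sketch below shows both with a hypothetical stand-in iterator (ReusingIterator) that reuses one StringBuilder; it is not Lucene code. With a real BytesRef, BytesRef.deepCopyOf serves the same purpose as the toString() copy here.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrates consuming an iterator that reuses its result object between calls. */
class ReusedBufferSketch {

  /** Hypothetical stand-in: returns the same mutable buffer from every call to next(). */
  static final class ReusingIterator {
    private final String[] source;
    private final StringBuilder shared = new StringBuilder();
    private int pos = 0;

    ReusingIterator(String... source) {
      this.source = source;
    }

    /** Returns the shared buffer holding the next value, or null at the end. */
    StringBuilder next() {
      if (pos == source.length) {
        return null;
      }
      shared.setLength(0); // overwrite the previous value in place
      shared.append(source[pos++]);
      return shared;
    }
  }

  /** Drains the iterator, copying each value before asking for the next one. */
  static List<String> collect(ReusingIterator it) {
    List<String> out = new ArrayList<>();
    // The loop exits on the first null and never calls next() again afterwards.
    for (StringBuilder v = it.next(); v != null; v = it.next()) {
      out.add(v.toString()); // immutable copy taken while the buffer is still valid
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(collect(new ReusingIterator("lucene", "lucent"))); // prints [lucene, lucent]
  }
}
```

Holding on to the returned buffer instead of copying it would leave every saved reference pointing at the most recent value.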
-
matches
private boolean matches(BytesRef termIn, int k)
Returns true if termIn is within k edits of the query term.
-
docFreq
public int docFreq() throws java.io.IOException
Description copied from class: TermsEnum
Returns the number of documents containing the current term. Do not call this when the enum is unpositioned (see TermsEnum.SeekStatus.END).
-
totalTermFreq
public long totalTermFreq() throws java.io.IOException
Description copied from class: TermsEnum
Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term). Note that, like other term measures, this measure does not take deleted documents into account.
- Specified by:
totalTermFreq in class TermsEnum
- Throws:
java.io.IOException
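The difference between docFreq() and totalTermFreq() is easy to see on a toy posting list. In this sketch the map from document id to within-document frequency is a hypothetical stand-in for a term's postings, and TermStatsSketch is not a Lucene class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy illustration of the two statistics for a single term. */
class TermStatsSketch {

  /** Number of documents that contain the term at least once. */
  static int docFreq(Map<Integer, Integer> freqByDoc) {
    return freqByDoc.size();
  }

  /** Sum of the per-document frequencies of the term. */
  static long totalTermFreq(Map<Integer, Integer> freqByDoc) {
    long sum = 0;
    for (int freq : freqByDoc.values()) {
      sum += freq;
    }
    return sum;
  }

  public static void main(String[] args) {
    Map<Integer, Integer> freqByDoc = new LinkedHashMap<>();
    freqByDoc.put(0, 3); // doc 0 contains the term three times
    freqByDoc.put(4, 1);
    freqByDoc.put(7, 2);
    System.out.println(docFreq(freqByDoc));       // prints 3
    System.out.println(totalTermFreq(freqByDoc)); // prints 6
  }
}
```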
-
postings
public PostingsEnum postings(PostingsEnum reuse, int flags) throws java.io.IOException
Description copied from class: TermsEnum
Get PostingsEnum for the current term, with control over whether freqs, positions, offsets or payloads are required. Do not call this when the enum is unpositioned. This method will not return null.
NOTE: the returned iterator may return deleted documents, so deleted documents have to be checked on top of the PostingsEnum.
- Specified by:
postings in class TermsEnum
- Parameters:
reuse - pass a prior PostingsEnum for possible reuse
flags - specifies which optional per-document values you require; see PostingsEnum.FREQS
- Throws:
java.io.IOException
-
impacts
public ImpactsEnum impacts(int flags) throws java.io.IOException
Description copied from class: TermsEnum
Returns an ImpactsEnum.
- Specified by:
impacts in class TermsEnum
- Throws:
java.io.IOException
- See Also:
TermsEnum.postings(PostingsEnum, int)
-
seekExact
public void seekExact(BytesRef term, TermState state) throws java.io.IOException
Description copied from class: TermsEnum
Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState(). Callers should maintain the TermState to use this method. Low-level implementations may position the TermsEnum without re-seeking the term dictionary.
Seeking by TermState should be used only if the state was obtained from the same TermsEnum instance.
NOTE: Using this method with an incompatible TermState might leave this TermsEnum in an undefined state. On a segment level, TermState instances are compatible only if the source and the target TermsEnum operate on the same field. If operating on segment level, TermState instances must not be used across segments.
NOTE: A seek by TermState might not restore the AttributeSource's state. AttributeSource states must be maintained separately if this method is used.
-
termState
public TermState termState() throws java.io.IOException
Description copied from class: TermsEnum
Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.
NOTE: A seek by TermState might not capture the AttributeSource's state. Callers must maintain the AttributeSource states separately.
- Specified by:
termState in class TermsEnum
- Throws:
java.io.IOException
- See Also:
TermState, TermsEnum.seekExact(BytesRef, TermState)
-
ord
public long ord() throws java.io.IOException
Description copied from class: TermsEnum
Returns ordinal position for current term. This is an optional method (the codec may throw UnsupportedOperationException). Do not call this when the enum is unpositioned.
-
attributes
public AttributeSource attributes()
Description copied from class: TermsEnum
Returns the related attributes.
- Specified by:
attributes in class TermsEnum
-
seekExact
public boolean seekExact(BytesRef text) throws java.io.IOException
Description copied from class: TermsEnum
Attempts to seek to the exact term, returning true if the term is found. If this returns false, the enum is unpositioned. For some codecs, seekExact may be substantially faster than TermsEnum.seekCeil(org.apache.lucene.util.BytesRef).
-
seekCeil
public TermsEnum.SeekStatus seekCeil(BytesRef text) throws java.io.IOException
Description copied from class: TermsEnum
Seeks to the specified term, if it exists, or to the next (ceiling) term. Returns SeekStatus to indicate whether exact term was found, a different term was found, or EOF was hit. The target term may be before or after the current term. If this returns SeekStatus.END, the enum is unpositioned.
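The three outcomes of a ceiling seek can be modelled with a sorted set. The sketch below is an analogy built on java.util.NavigableSet, not the Lucene implementation; SeekCeilSketch, Status and Result are hypothetical names whose Status values mirror the FOUND, NOT_FOUND and END cases described above.

```java
import java.util.Arrays;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Models ceiling-seek semantics over a sorted set of terms. */
class SeekCeilSketch {

  enum Status { FOUND, NOT_FOUND, END }

  /** Outcome of a seek: the status and the term landed on (null when unpositioned). */
  static final class Result {
    final Status status;
    final String term;

    Result(Status status, String term) {
      this.status = status;
      this.term = term;
    }
  }

  static Result seekCeil(NavigableSet<String> terms, String target) {
    String ceiling = terms.ceiling(target); // smallest term >= target, or null
    if (ceiling == null) {
      return new Result(Status.END, null); // ran off the end: unpositioned
    }
    Status status = ceiling.equals(target) ? Status.FOUND : Status.NOT_FOUND;
    return new Result(status, ceiling);
  }

  public static void main(String[] args) {
    NavigableSet<String> terms = new TreeSet<>(Arrays.asList("apple", "banana", "cherry"));
    System.out.println(seekCeil(terms, "banana").status);    // prints FOUND
    System.out.println(seekCeil(terms, "blueberry").term);   // prints cherry
    System.out.println(seekCeil(terms, "zebra").status);     // prints END
  }
}
```

Because each seek is independent of the previous position, the target may sort before or after the term most recently landed on.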
-
seekExact
public void seekExact(long ord) throws java.io.IOException
Description copied from class: TermsEnum
Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord(). The target ord may be before or after the current ord, and must be within bounds.
-
-