Class NRTSuggester
- java.lang.Object
  - org.apache.lucene.search.suggest.document.NRTSuggester
- All Implemented Interfaces: Accountable
public final class NRTSuggester extends java.lang.Object implements Accountable
NRTSuggester executes Top N search on a weighted FST specified by a CompletionScorer.
See lookup(CompletionScorer, Bits, TopSuggestDocsCollector) for more implementation details.
FST Format:
- Input: analyzed forms of input terms
- Output: Pair<Long, BytesRef> containing weight, surface form and docID
NOTE:
- Having too many deletions or using a very restrictive filter can make the search inadmissible due to over-pruning of potential paths. See CompletionScorer.accept(int, Bits).
- When matched documents are arbitrarily filtered (CompletionScorer.filtered set to true), it is assumed that the filter will roughly filter out half the number of documents that match the provided automaton.
- Lookup performance will degrade as more accepted completions lead to filtered out documents.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- (package private) static class NRTSuggester.PayLoadProcessor: Helper to encode/decode payload (surface + PAYLOAD_SEP + docID) output
- private static class NRTSuggester.ScoringPathComparator: Compares partial completion paths using CompletionScorer.score(float, float), breaks ties comparing path inputs
-
Field Summary
Fields (Modifier and Type, Field, Description):
- private FST<PairOutputs.Pair<java.lang.Long,BytesRef>> fst: FST: input is the analyzed form, with a null byte between terms and a NRTSuggesterBuilder.END_BYTE to denote the end of the input; weight is a long; surface is the original, unanalyzed form followed by the docID
- private static long MAX_TOP_N_QUEUE_SIZE: Maximum queue depth for TopNSearcher. NOTE: value should be <= Integer.MAX_VALUE
- private int maxAnalyzedPathsPerOutput: Highest number of analyzed paths we saw for any single input surface form.
- private int payloadSep: Separator used between surface form and its docID in the FST output
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
Constructors (Modifier, Constructor, Description):
- private NRTSuggester(FST<PairOutputs.Pair<java.lang.Long,BytesRef>> fst, int maxAnalyzedPathsPerOutput, int payloadSep)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- private static double calculateLiveDocRatio(int numDocs, int maxDocs)
- (package private) static long decode(long output)
- (package private) static long encode(long input)
- java.util.Collection<Accountable> getChildResources(): Returns nested resources of this class.
- private static java.util.Comparator<PairOutputs.Pair<java.lang.Long,BytesRef>> getComparator()
- private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled): Simple heuristics to try to avoid over-pruning potential suggestions by the TopNSearcher.
- static NRTSuggester load(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
- void lookup(CompletionScorer scorer, Bits acceptDocs, TopSuggestDocsCollector collector): Collects at most TopSuggestDocsCollector.getCountToCollect() completions that match the provided CompletionScorer.
- long ramBytesUsed(): Return the memory usage of this object in bytes.
- private static boolean shouldLoadFSTOffHeap(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
-
-
-
Field Detail
-
fst
private final FST<PairOutputs.Pair<java.lang.Long,BytesRef>> fst
FST:
- input is the analyzed form, with a null byte between terms and a NRTSuggesterBuilder.END_BYTE to denote the end of the input
- weight is a long
- surface is the original, unanalyzed form followed by the docID
-
maxAnalyzedPathsPerOutput
private final int maxAnalyzedPathsPerOutput
Highest number of analyzed paths we saw for any single input surface form. This can be > 1 when the index analyzer creates graphs or when multiple surface forms yield the same analyzed form.
-
payloadSep
private final int payloadSep
Separator used between surface form and its docID in the FST output
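The payload layout described above (surface + separator + docID) can be illustrated with a minimal sketch. This is not the NRTSuggester.PayLoadProcessor implementation: the class name PayloadSketch, the separator value 0x1F, and the fixed 4-byte docID encoding are all assumptions made for illustration only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the NRTSuggester.PayLoadProcessor implementation.
final class PayloadSketch {
    // Assumed separator value, chosen only for illustration.
    static final int PAYLOAD_SEP = 0x1F;

    /** Packs surface + separator + docID (docID as a fixed 4-byte int, an assumption). */
    static byte[] make(String surface, int docId, int payloadSep) {
        byte[] s = surface.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(s.length + 1 + Integer.BYTES)
                .put(s)
                .put((byte) payloadSep)
                .putInt(docId)
                .array();
    }

    /** Returns the separator position, or throws if the layout is not as expected. */
    static int sepIndex(byte[] payload, int payloadSep) {
        int idx = payload.length - Integer.BYTES - 1;
        if (idx < 0 || payload[idx] != (byte) payloadSep) {
            throw new IllegalArgumentException("malformed payload");
        }
        return idx;
    }

    /** Extracts the surface form that precedes the separator. */
    static String surface(byte[] payload, int payloadSep) {
        return new String(payload, 0, sepIndex(payload, payloadSep), StandardCharsets.UTF_8);
    }

    /** Extracts the docID that follows the separator. */
    static int docId(byte[] payload, int payloadSep) {
        return ByteBuffer.wrap(payload, sepIndex(payload, payloadSep) + 1, Integer.BYTES).getInt();
    }

    public static void main(String[] args) {
        byte[] p = make("new york", 42, PAYLOAD_SEP);
        System.out.println(surface(p, PAYLOAD_SEP) + " -> doc " + docId(p, PAYLOAD_SEP));
    }
}
```

Keeping the docID in the same output as the surface form lets a completed FST path be mapped back to its document without a second lookup.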
-
MAX_TOP_N_QUEUE_SIZE
private static final long MAX_TOP_N_QUEUE_SIZE
Maximum queue depth for TopNSearcher. NOTE: value should be <= Integer.MAX_VALUE
- See Also: Constant Field Values
-
-
Constructor Detail
-
NRTSuggester
private NRTSuggester(FST<PairOutputs.Pair<java.lang.Long,BytesRef>> fst, int maxAnalyzedPathsPerOutput, int payloadSep)
-
-
Method Detail
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by: ramBytesUsed in interface Accountable
-
getChildResources
public java.util.Collection<Accountable> getChildResources()
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- Specified by: getChildResources in interface Accountable
- See Also: Accountables
-
lookup
public void lookup(CompletionScorer scorer, Bits acceptDocs, TopSuggestDocsCollector collector) throws java.io.IOException
Collects at most TopSuggestDocsCollector.getCountToCollect() completions that match the provided CompletionScorer.
The CompletionScorer.automaton is intersected with the fst. CompletionScorer.weight is used to compute boosts and/or extract context for each matched partial path. A top N search is executed on fst seeded with the matched partial paths. Upon reaching a completed path, CompletionScorer.accept(int, Bits) and CompletionScorer.score(float, float) are used on the document id, index weight and query boost to filter and score the entry, before it is collected via TopSuggestDocsCollector.collect(int, CharSequence, CharSequence, float).
- Throws: java.io.IOException
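The filter-then-score step for completed paths can be sketched in plain Java. This is a simplified illustration, not the Lucene implementation: the LookupSketch class, the Completion type, and the use of weight * boost as a stand-in for CompletionScorer.score(float, float) are assumptions, and the FST traversal itself is replaced by a ready-made list of completed paths.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of filter-then-score collection; not the Lucene implementation.
final class LookupSketch {
    /** A completed path: surface form, owning document, and index-time weight. */
    static final class Completion {
        final String surface;
        final int docId;
        final float weight;

        Completion(String surface, int docId, float weight) {
            this.surface = surface;
            this.docId = docId;
            this.weight = weight;
        }
    }

    // Stand-in for CompletionScorer.score(float, float); the product is an assumption.
    static float score(float weight, float boost) {
        return weight * boost;
    }

    /** Returns up to topN surface forms, best score first, skipping rejected documents. */
    static List<String> collect(List<Completion> completed, BitSet acceptDocs, float boost, int topN) {
        // Higher score first; ties are broken by comparing surface forms.
        Comparator<Completion> best = (a, b) -> {
            int cmp = Float.compare(score(b.weight, boost), score(a.weight, boost));
            return cmp != 0 ? cmp : a.surface.compareTo(b.surface);
        };
        PriorityQueue<Completion> queue = new PriorityQueue<>(best);
        for (Completion c : completed) {
            // Stand-in for CompletionScorer.accept(int, Bits): null means "accept all".
            if (acceptDocs == null || acceptDocs.get(c.docId)) {
                queue.add(c);
            }
        }
        List<String> out = new ArrayList<>();
        while (out.size() < topN && !queue.isEmpty()) {
            out.add(queue.poll().surface);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Completion> completed = new ArrayList<>();
        completed.add(new Completion("apple", 0, 5f));
        completed.add(new Completion("apply", 1, 9f));
        completed.add(new Completion("apt", 2, 7f));
        BitSet live = new BitSet();
        live.set(0);
        live.set(2); // document 1 is treated as deleted
        System.out.println(collect(completed, live, 1f, 2));
    }
}
```

Because rejected entries still occupy the searcher's queue in the real top N search, a bounded queue can run out of candidates before enough accepted completions are found, which is the over-pruning risk described in the class notes.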
-
getComparator
private static java.util.Comparator<PairOutputs.Pair<java.lang.Long,BytesRef>> getComparator()
-
getMaxTopNSearcherQueueSize
private int getMaxTopNSearcherQueueSize(int topN, int numDocs, double liveDocsRatio, boolean filterEnabled)
Simple heuristics to try to avoid over-pruning potential suggestions by the TopNSearcher. Since suggestion entries can be rejected if they belong to a deleted document, the length of the TopNSearcher queue has to be increased by some factor, to account for the filtered out suggestions. This heuristic will try to make the searcher admissible, but the search can still lead to over-pruning.
If a filter is applied, the queue size is increased by half the number of live documents.
The maximum queue size is MAX_TOP_N_QUEUE_SIZE.
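A minimal sketch of a heuristic with this shape follows. The description above states only that the queue grows by "some factor" for deletions, grows by half the live documents when a filter is applied, and is capped. Scaling by maxAnalyzedPathsPerOutput, dividing by the live-document ratio, and passing the cap as a parameter are assumptions of this sketch, and QueueSizeSketch is a hypothetical name.

```java
// Hypothetical sketch of the queue-size heuristic; the scaling steps are assumptions.
final class QueueSizeSketch {
    /**
     * @param topN                      number of suggestions requested
     * @param numDocs                   number of live documents
     * @param liveDocsRatio             live documents / all documents, in (0, 1]
     * @param filterEnabled             whether matched documents are additionally filtered
     * @param maxAnalyzedPathsPerOutput most analyzed paths seen for one surface form
     * @param cap                       upper bound on the queue size, <= Integer.MAX_VALUE
     */
    static int maxQueueSize(int topN, int numDocs, double liveDocsRatio,
                            boolean filterEnabled, int maxAnalyzedPathsPerOutput, long cap) {
        if (liveDocsRatio <= 0.0 || liveDocsRatio > 1.0) {
            throw new IllegalArgumentException("liveDocsRatio must be in (0, 1]");
        }
        // One surface form may be reachable through several analyzed paths.
        long size = (long) topN * maxAnalyzedPathsPerOutput;
        // Fewer live documents means more rejected entries, so widen the queue.
        size = (long) (size / liveDocsRatio);
        if (filterEnabled) {
            // A filter is assumed to reject about half of the matching documents.
            size += numDocs / 2;
        }
        return (int) Math.min(cap, size);
    }

    public static void main(String[] args) {
        System.out.println(maxQueueSize(10, 1000, 0.5, false, 1, 5000L));
        System.out.println(maxQueueSize(10, 1000, 0.5, true, 1, 5000L));
    }
}
```

The cap keeps memory bounded at the cost of admissibility: once the computed size exceeds it, heavily deleted or heavily filtered indexes can still be over-pruned.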
-
calculateLiveDocRatio
private static double calculateLiveDocRatio(int numDocs, int maxDocs)
-
shouldLoadFSTOffHeap
private static boolean shouldLoadFSTOffHeap(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode)
-
load
public static NRTSuggester load(IndexInput input, CompletionPostingsFormat.FSTLoadMode fstLoadMode) throws java.io.IOException
- Throws: java.io.IOException
-
encode
static long encode(long input)
-
decode
static long decode(long output)
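This page gives no description for encode and decode. One plausible reading, stated here purely as an assumption, is that they map a non-negative weight to a cost and back, so that a minimum-cost top N search over the FST returns the highest weights first. The sketch below illustrates that idea; WeightCodecSketch is a hypothetical name and the bounds check is part of the assumption.

```java
// Hypothetical sketch of a weight <-> cost mapping; the scheme itself is an assumption.
final class WeightCodecSketch {
    /** Maps a weight in [0, Integer.MAX_VALUE] to a cost where larger weights cost less. */
    static long encode(long input) {
        if (input < 0 || input > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("cannot encode value: " + input);
        }
        return Integer.MAX_VALUE - input;
    }

    /** Inverse of encode: recovers the original weight from a cost. */
    static long decode(long output) {
        return Integer.MAX_VALUE - output;
    }

    public static void main(String[] args) {
        long cost = encode(7);
        System.out.println(decode(cost));
    }
}
```

Under this mapping, ordering by ascending cost is the same as ordering by descending weight, which is what a shortest-path style search needs.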
-
-