Class AnalyzingInfixSuggester
- java.lang.Object
-
- org.apache.lucene.search.suggest.Lookup
-
- org.apache.lucene.search.suggest.analyzing.AnalyzingInfixSuggester
-
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, Accountable
- Direct Known Subclasses:
BlendedInfixSuggester
public class AnalyzingInfixSuggester extends Lookup implements java.io.Closeable
Analyzes the input text and then suggests matches based on prefix matches to any tokens in the indexed text. It also highlights the tokens that match.
This suggester supports payloads. Matches are sorted only by the suggest weight; a blended score + weight sort is not yet supported. This suggester is therefore best suited to cases where there is a strong a priori ranking of all the suggestions.
This suggester supports contexts, including arbitrary binary terms.
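The matching and ranking behavior described above can be modeled in a few lines of plain Java. This is only a sketch and not the Lucene implementation: the `InfixSuggestSketch` and `Entry` names are hypothetical, the whitespace/lowercase `analyze` method stands in for a real Analyzer, and every query token is treated as a prefix for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Toy model of infix suggestion; NOT the Lucene implementation. */
final class InfixSuggestSketch {

  /** A suggestion with its a priori weight (hypothetical helper type). */
  static final class Entry {
    final String text;
    final long weight;

    Entry(String text, long weight) {
      this.text = text;
      this.weight = weight;
    }
  }

  /** Stand-in for an Analyzer: lowercases and splits on whitespace. */
  static List<String> analyze(String text) {
    String trimmed = text.trim().toLowerCase(Locale.ROOT);
    if (trimmed.isEmpty()) {
      return new ArrayList<>();
    }
    return Arrays.asList(trimmed.split("\\s+"));
  }

  /** Returns up to num texts in which every query token prefixes some token, heaviest first. */
  static List<String> lookup(List<Entry> entries, String key, int num) {
    List<String> queryTokens = analyze(key);
    List<Entry> hits = new ArrayList<>();
    for (Entry e : entries) {
      List<String> tokens = analyze(e.text);
      boolean allMatched = true;
      for (String q : queryTokens) {
        boolean anyToken = false;
        for (String t : tokens) {
          if (t.startsWith(q)) {
            anyToken = true;
            break;
          }
        }
        if (!anyToken) {
          allMatched = false;
          break;
        }
      }
      if (allMatched) {
        hits.add(e);
      }
    }
    // Sort by weight only, mirroring "sorted only by the suggest weight".
    hits.sort(Comparator.comparingLong((Entry e) -> e.weight).reversed());
    List<String> out = new ArrayList<>();
    for (int i = 0; i < hits.size() && i < num; i++) {
      out.add(hits.get(i).text);
    }
    return out;
  }
}
```

Because relevance plays no part in the ordering, a match in the middle of a long suggestion ranks above a match at the start of a short one whenever its weight is higher.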
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.search.suggest.Lookup
Lookup.LookupPriorityQueue, Lookup.LookupResult
-
-
Field Summary
Fields
Modifier and Type, Field: Description
private boolean allTermsRequired
private boolean closeIndexWriterOnBuild
private boolean commitOnBuild
protected static java.lang.String CONTEXTS_FIELD_NAME: Field name used for the indexed context, as a StringField and a SortedSetDVField, for filtering.
static boolean DEFAULT_ALL_TERMS_REQUIRED: Default boolean clause option for multiple terms matching (all terms required).
protected static boolean DEFAULT_CLOSE_INDEXWRITER_ON_BUILD: Default option to close the IndexWriter once the index has been built.
static boolean DEFAULT_HIGHLIGHT: Default highlighting option.
static int DEFAULT_MIN_PREFIX_CHARS: Default minimum number of leading characters before PrefixQuery is used (4).
private Directory dir
protected static java.lang.String EXACT_TEXT_FIELD_NAME: Field name used for the indexed text, as a StringField, for exact lookup.
private boolean highlight
protected Analyzer indexAnalyzer: Analyzer used at index time
(package private) int minPrefixChars
protected Analyzer queryAnalyzer: Analyzer used at search time
protected SearcherManager searcherMgr: IndexSearcher used for lookups.
protected java.lang.Object searcherMgrLock: Used to manage concurrent access to searcherMgr
private static Sort SORT: How we sort the postings and search results.
protected static java.lang.String TEXT_FIELD_NAME: Field name used for the indexed text.
protected static java.lang.String TEXTGRAMS_FIELD_NAME: Edge n-grams for searching short prefixes without a PrefixQuery; controlled by minPrefixChars.
protected IndexWriter writer: Used for ongoing NRT additions/updates.
-
Fields inherited from class org.apache.lucene.search.suggest.Lookup
CHARSEQUENCE_COMPARATOR
-
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
Constructors
Constructor: Description
AnalyzingInfixSuggester(Directory dir, Analyzer analyzer): Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild): Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight): Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight, boolean closeIndexWriterOnBuild): Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists.
-
Method Summary
Methods
Modifier and Type, Method: Description
void add(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload): Adds a new suggestion.
void addContextToQuery(BooleanQuery.Builder query, BytesRef context, BooleanClause.Occur clause): This method is handy as we do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries. However, here may not be its best location.
protected void addNonMatch(java.lang.StringBuilder sb, java.lang.String text): Called while highlighting a single result, to append a non-matching chunk of text from the suggestion to the provided fragments list.
protected void addPrefixMatch(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed, java.lang.String prefixToken): Called while highlighting a single result, to append a matched prefix token to the provided fragments list.
protected void addWholeMatch(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed): Called while highlighting a single result, to append the whole matched token to the provided fragments list.
void build(InputIterator iter): Builds up a new internal Lookup representation based on the given InputIterator.
private Document buildDocument(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload)
void close()
void commit(): Commits all pending changes made to this suggester to disk.
protected java.util.List<Lookup.LookupResult> createResults(IndexSearcher searcher, TopFieldDocs hits, int num, java.lang.CharSequence charSequence, boolean doHighlight, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken): Create the results based on the search hits.
private void ensureOpen()
protected Query finishQuery(BooleanQuery.Builder in, boolean allTermsRequired): Subclass can override this to tweak the Query before searching.
java.util.Collection<Accountable> getChildResources(): Returns nested resources of this class.
long getCount(): Get the number of entries the lookup was built with.
protected Directory getDirectory(java.nio.file.Path path): Subclass can override to choose a specific Directory implementation.
private Analyzer getGramAnalyzer()
protected IndexWriterConfig getIndexWriterConfig(Analyzer indexAnalyzer, IndexWriterConfig.OpenMode openMode): Override this to customize index settings, e.g. which codec to use.
protected Query getLastTokenQuery(java.lang.String token): This is called if the last token isn't ended (e.g. user did not type a space after it).
protected FieldType getTextFieldType(): Subclass can override this method to change the field type of the text field, e.g. to change the index options.
protected java.lang.Object highlight(java.lang.String text, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken): Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
boolean load(DataInput out): Discard current lookup data and load it from a previously saved copy.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, int num, boolean allTermsRequired, boolean doHighlight): Lookup, without any context.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Map<BytesRef,BooleanClause.Occur> contextInfo, int num, boolean allTermsRequired, boolean doHighlight): Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, boolean onlyMorePopular, int num): Look up a key and return possible completions for this key.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight): Lookup, with context but without booleans.
java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, BooleanQuery contextQuery, int num, boolean allTermsRequired, boolean doHighlight): This is an advanced method providing the capability to send down to the suggester any arbitrary Lucene query to be used to filter the result of the suggester.
long ramBytesUsed(): Return the memory usage of this object in bytes.
void refresh(): Reopens the underlying searcher; it's best to "batch up" many additions/updates, and then call refresh once at the end.
boolean store(DataOutput in): Persist the constructed lookup data to a directory.
private BooleanQuery toQuery(java.util.Map<BytesRef,BooleanClause.Occur> contextInfo)
private BooleanQuery toQuery(java.util.Set<BytesRef> contextInfo)
void update(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload): Updates a previous suggestion, matching the exact same text as before.
-
-
-
Field Detail
-
TEXTGRAMS_FIELD_NAME
protected static final java.lang.String TEXTGRAMS_FIELD_NAME
Field name used for the edge n-grams that serve short prefixes without a PrefixQuery; controlled by minPrefixChars.
- See Also:
- Constant Field Values
-
TEXT_FIELD_NAME
protected static final java.lang.String TEXT_FIELD_NAME
Field name used for the indexed text.
- See Also:
- Constant Field Values
-
EXACT_TEXT_FIELD_NAME
protected static final java.lang.String EXACT_TEXT_FIELD_NAME
Field name used for the indexed text, as a StringField, for exact lookup.
- See Also:
- Constant Field Values
-
CONTEXTS_FIELD_NAME
protected static final java.lang.String CONTEXTS_FIELD_NAME
Field name used for the indexed context, as a StringField and a SortedSetDVField, for filtering.
- See Also:
- Constant Field Values
-
queryAnalyzer
protected final Analyzer queryAnalyzer
Analyzer used at search time
-
indexAnalyzer
protected final Analyzer indexAnalyzer
Analyzer used at index time
-
dir
private final Directory dir
-
minPrefixChars
final int minPrefixChars
-
allTermsRequired
private final boolean allTermsRequired
-
highlight
private final boolean highlight
-
commitOnBuild
private final boolean commitOnBuild
-
closeIndexWriterOnBuild
private final boolean closeIndexWriterOnBuild
-
writer
protected IndexWriter writer
Used for ongoing NRT additions/updates.
-
searcherMgr
protected SearcherManager searcherMgr
IndexSearcher used for lookups.
-
searcherMgrLock
protected final java.lang.Object searcherMgrLock
Used to manage concurrent access to searcherMgr
-
DEFAULT_MIN_PREFIX_CHARS
public static final int DEFAULT_MIN_PREFIX_CHARS
Default minimum number of leading characters before PrefixQuery is used (4).
- See Also:
- Constant Field Values
-
DEFAULT_ALL_TERMS_REQUIRED
public static final boolean DEFAULT_ALL_TERMS_REQUIRED
Default boolean clause option for multiple terms matching (all terms required).
- See Also:
- Constant Field Values
-
DEFAULT_HIGHLIGHT
public static final boolean DEFAULT_HIGHLIGHT
Default highlighting option.
- See Also:
- Constant Field Values
-
DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
protected static final boolean DEFAULT_CLOSE_INDEXWRITER_ON_BUILD
Default option to close the IndexWriter once the index has been built.
- See Also:
- Constant Field Values
-
SORT
private static final Sort SORT
How we sort the postings and search results.
-
-
Constructor Detail
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer analyzer) throws java.io.IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Throws:
java.io.IOException
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild) throws java.io.IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild - Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use this pre-built dictionary.
- Throws:
java.io.IOException
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight) throws java.io.IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild - Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use this pre-built dictionary.
allTermsRequired - All terms in the suggest query must be matched.
highlight - Highlight suggest query in suggestions.
- Throws:
java.io.IOException
-
AnalyzingInfixSuggester
public AnalyzingInfixSuggester(Directory dir, Analyzer indexAnalyzer, Analyzer queryAnalyzer, int minPrefixChars, boolean commitOnBuild, boolean allTermsRequired, boolean highlight, boolean closeIndexWriterOnBuild) throws java.io.IOException
Create a new instance, loading from a previously built AnalyzingInfixSuggester directory, if it exists. This directory must be private to the infix suggester (i.e., not an external Lucene index). Note that close() will also close the provided directory.
- Parameters:
minPrefixChars - Minimum number of leading characters before PrefixQuery is used (default 4). Prefixes shorter than this are indexed as character ngrams (increasing index size but making lookups faster).
commitOnBuild - Call commit after the index has finished building. This persists the suggester index to disk, so future instances of this suggester can use this pre-built dictionary.
allTermsRequired - All terms in the suggest query must be matched.
highlight - Highlight suggest query in suggestions.
closeIndexWriterOnBuild - If true, the IndexWriter will be closed after the index has finished building.
- Throws:
java.io.IOException
-
-
Method Detail
-
getIndexWriterConfig
protected IndexWriterConfig getIndexWriterConfig(Analyzer indexAnalyzer, IndexWriterConfig.OpenMode openMode)
Override this to customize index settings, e.g. which codec to use.
-
getDirectory
protected Directory getDirectory(java.nio.file.Path path) throws java.io.IOException
Subclass can override to choose a specific Directory implementation.
- Throws:
java.io.IOException
-
build
public void build(InputIterator iter) throws java.io.IOException
Description copied from class: Lookup
Builds up a new internal Lookup representation based on the given InputIterator. The implementation might re-sort the data internally.
-
commit
public void commit() throws java.io.IOException
Commits all pending changes made to this suggester to disk.
- Throws:
java.io.IOException
- See Also:
IndexWriter.commit()
-
getGramAnalyzer
private Analyzer getGramAnalyzer()
-
ensureOpen
private void ensureOpen() throws java.io.IOException
- Throws:
java.io.IOException
-
add
public void add(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload) throws java.io.IOException
Adds a new suggestion. Be sure to use update(org.apache.lucene.util.BytesRef, java.util.Set<org.apache.lucene.util.BytesRef>, long, org.apache.lucene.util.BytesRef) instead if you want to replace a previous suggestion. After adding or updating a batch of new suggestions, you must call refresh() at the end in order to see the suggestions in lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int).
- Throws:
java.io.IOException
-
update
public void update(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload) throws java.io.IOException
Updates a previous suggestion, matching the exact same text as before. Use this to change the weight or payload of an already added suggestion. If you know this text is not already present you can use add(org.apache.lucene.util.BytesRef, java.util.Set<org.apache.lucene.util.BytesRef>, long, org.apache.lucene.util.BytesRef) instead. After adding or updating a batch of new suggestions, you must call refresh() at the end in order to see the suggestions in lookup(java.lang.CharSequence, java.util.Set<org.apache.lucene.util.BytesRef>, boolean, int).
- Throws:
java.io.IOException
-
buildDocument
private Document buildDocument(BytesRef text, java.util.Set<BytesRef> contexts, long weight, BytesRef payload) throws java.io.IOException
- Throws:
java.io.IOException
-
refresh
public void refresh() throws java.io.IOException
Reopens the underlying searcher; it's best to "batch up" many additions/updates, and then call refresh once at the end.
- Throws:
java.io.IOException
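The add, update, and refresh contract (changes are not visible to lookups until refresh is called) can be pictured with a small plain-Java model. This is an illustrative sketch only: `RefreshSketch` is a hypothetical class that uses two lists in place of Lucene's IndexWriter and SearcherManager, and it says nothing about how Lucene implements near-real-time search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of add-then-refresh visibility; NOT Lucene's IndexWriter/SearcherManager. */
final class RefreshSketch {
  // Everything written so far, whether or not lookups can see it yet.
  private final List<String> written = new ArrayList<>();
  // The snapshot that lookups read; replaced as a whole on refresh.
  private volatile List<String> visible = Collections.emptyList();

  /** Records a suggestion; it stays invisible to lookups until refresh(). */
  void add(String suggestion) {
    written.add(suggestion);
  }

  /** Publishes everything written so far as a new immutable snapshot. */
  void refresh() {
    visible = Collections.unmodifiableList(new ArrayList<>(written));
  }

  /** Returns what a lookup would currently be able to see. */
  List<String> visibleSuggestions() {
    return visible;
  }
}
```

Publishing one snapshot per batch, rather than per added suggestion, is the reason for calling refresh once at the end of a batch.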
-
getTextFieldType
protected FieldType getTextFieldType()
Subclass can override this method to change the field type of the text field, e.g. to change the index options.
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, boolean onlyMorePopular, int num) throws java.io.IOException
Description copied from class: Lookup
Look up a key and return possible completions for this key.
- Specified by:
lookup in class Lookup
- Parameters:
key - lookup key. Depending on the implementation this may be a prefix, misspelling, or even infix.
contexts - contexts to filter the lookup by, or null if all contexts are allowed; if the suggestion contains any of the contexts, it's a match
onlyMorePopular - return only more popular results
num - maximum number of results to return
- Returns:
- a list of possible completions, with their relative weight (e.g. popularity)
- Throws:
java.io.IOException
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, int num, boolean allTermsRequired, boolean doHighlight) throws java.io.IOException
Lookup, without any context.
- Throws:
java.io.IOException
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Set<BytesRef> contexts, int num, boolean allTermsRequired, boolean doHighlight) throws java.io.IOException
Lookup, with context but without booleans. Context booleans default to SHOULD, so each suggestion must have at least one of the contexts.
- Throws:
java.io.IOException
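The SHOULD semantics for a set of contexts amount to an "any of" test. The sketch below models that rule in plain Java with Strings in place of BytesRef; `ContextFilterSketch` is a hypothetical name, and treating an empty set the same as null (no filtering) is an assumption of this sketch.

```java
import java.util.Set;

/** Toy model of the "at least one context" filter; NOT the Lucene implementation. */
final class ContextFilterSketch {

  /**
   * Returns true if no filter is requested, or if the suggestion carries
   * at least one of the requested contexts.
   */
  static boolean matches(Set<String> suggestionContexts, Set<String> requested) {
    // Assumption: null or empty means "all contexts are allowed".
    if (requested == null || requested.isEmpty()) {
      return true;
    }
    for (String context : requested) {
      if (suggestionContexts.contains(context)) {
        return true;
      }
    }
    return false;
  }
}
```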
-
getLastTokenQuery
protected Query getLastTokenQuery(java.lang.String token) throws java.io.IOException
This is called if the last token isn't ended (e.g. user did not type a space after it). Return an appropriate Query clause to add to the BooleanQuery.
- Throws:
java.io.IOException
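How minPrefixChars could steer the handling of an unfinished last token can be sketched as follows. This is a plain-Java model based only on the parameter description (prefixes shorter than minPrefixChars are indexed as character n-grams); `LastTokenSketch`, `edgeGrams`, `strategy`, and the returned labels are hypothetical names, and the exact gram lengths are an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the short-prefix decision; NOT the Lucene implementation. */
final class LastTokenSketch {

  /** Leading n-grams of a token that are shorter than minPrefixChars. */
  static List<String> edgeGrams(String token, int minPrefixChars) {
    List<String> grams = new ArrayList<>();
    for (int len = 1; len < minPrefixChars && len <= token.length(); len++) {
      grams.add(token.substring(0, len));
    }
    return grams;
  }

  /** Names the kind of lookup an unfinished last token would use in this model. */
  static String strategy(String prefix, int minPrefixChars) {
    // Short prefixes were indexed directly as grams, so an exact term match suffices;
    // longer prefixes fall back to a prefix scan over the full tokens.
    return prefix.length() < minPrefixChars ? "exact-gram" : "prefix-scan";
  }
}
```

The trade-off matches the one stated for minPrefixChars: indexing the grams costs index space, but a very short prefix, which would otherwise expand to many terms, becomes a single exact lookup.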
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, java.util.Map<BytesRef,BooleanClause.Occur> contextInfo, int num, boolean allTermsRequired, boolean doHighlight) throws java.io.IOException
Retrieve suggestions, specifying whether all terms must match (allTermsRequired) and whether the hits should be highlighted (doHighlight).
- Throws:
java.io.IOException
-
toQuery
private BooleanQuery toQuery(java.util.Map<BytesRef,BooleanClause.Occur> contextInfo)
-
toQuery
private BooleanQuery toQuery(java.util.Set<BytesRef> contextInfo)
-
addContextToQuery
public void addContextToQuery(BooleanQuery.Builder query, BytesRef context, BooleanClause.Occur clause)
This method is handy as we do not need access to internal fields such as CONTEXTS_FIELD_NAME in order to build queries. However, here may not be its best location.
- Parameters:
query - a BooleanQuery.Builder instance (see BooleanQuery)
context - the context
clause - one of BooleanClause.Occur
-
lookup
public java.util.List<Lookup.LookupResult> lookup(java.lang.CharSequence key, BooleanQuery contextQuery, int num, boolean allTermsRequired, boolean doHighlight) throws java.io.IOException
This is an advanced method providing the capability to send down to the suggester any arbitrary Lucene query to be used to filter the result of the suggester.
- Overrides:
lookup in class Lookup
- Parameters:
key - the keyword being looked for
contextQuery - an arbitrary Lucene query to be used to filter the result of the suggester. addContextToQuery(org.apache.lucene.search.BooleanQuery.Builder, org.apache.lucene.util.BytesRef, org.apache.lucene.search.BooleanClause.Occur) could be used to build this contextQuery.
num - number of items to return
allTermsRequired - whether all searched terms must match
doHighlight - if true, the matching term will be highlighted in the search result
- Returns:
- the result of the suggester
- Throws:
java.io.IOException - if there is an IO exception while reading data from the index
-
createResults
protected java.util.List<Lookup.LookupResult> createResults(IndexSearcher searcher, TopFieldDocs hits, int num, java.lang.CharSequence charSequence, boolean doHighlight, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken) throws java.io.IOException
Create the results based on the search hits. Can be overridden by subclass to add particular behavior (e.g. weight transformation). Note that there is no prefix token (the prefixToken argument will be null) whenever the final token in the incoming request was in fact finished (had trailing characters, such as white-space).
- Throws:
java.io.IOException - If there are problems reading fields from the underlying Lucene index.
-
finishQuery
protected Query finishQuery(BooleanQuery.Builder in, boolean allTermsRequired)
Subclass can override this to tweak the Query before searching.
-
highlight
protected java.lang.Object highlight(java.lang.String text, java.util.Set<java.lang.String> matchedTokens, java.lang.String prefixToken) throws java.io.IOException
Override this method to customize the Object representing a single highlighted suggestion; the result is set on each Lookup.LookupResult.highlightKey member.
- Throws:
java.io.IOException
-
addNonMatch
protected void addNonMatch(java.lang.StringBuilder sb, java.lang.String text)
Called while highlighting a single result, to append a non-matching chunk of text from the suggestion to the provided fragments list.
- Parameters:
sb - The StringBuilder to append to
text - The text chunk to add
-
addWholeMatch
protected void addWholeMatch(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed)
Called while highlighting a single result, to append the whole matched token to the provided fragments list.
- Parameters:
sb - The StringBuilder to append to
surface - The surface form (original) text
analyzed - The analyzed token corresponding to the surface form text
-
addPrefixMatch
protected void addPrefixMatch(java.lang.StringBuilder sb, java.lang.String surface, java.lang.String analyzed, java.lang.String prefixToken)
Called while highlighting a single result, to append a matched prefix token to the provided fragments list.
- Parameters:
sb - The StringBuilder to append to
surface - The fragment of the surface form (indexed during build(org.apache.lucene.search.suggest.InputIterator)), corresponding to this match
analyzed - The analyzed token that matched
prefixToken - The prefix of the token that matched
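The three highlighting hooks cooperate by appending fragments to one StringBuilder. The sketch below shows one way they could fit together in plain Java. It is not the Lucene implementation: `HighlightSketch` and its `highlight` driver are hypothetical, the `<b>` markup is an illustrative choice, words are split on single spaces, and the analyzed form is assumed to be the lowercased surface form of the same length.

```java
import java.util.Locale;

/** Toy model of the highlighting hooks; NOT the Lucene implementation. */
final class HighlightSketch {

  /** Appends text that did not match, unchanged. */
  static void addNonMatch(StringBuilder sb, String text) {
    sb.append(text);
  }

  /** Wraps a fully matched token in bold markup; analyzed is unused in this sketch. */
  static void addWholeMatch(StringBuilder sb, String surface, String analyzed) {
    sb.append("<b>").append(surface).append("</b>");
  }

  /** Bolds only the leading part of the surface form covered by the prefix. */
  static void addPrefixMatch(StringBuilder sb, String surface, String analyzed, String prefixToken) {
    if (prefixToken.length() >= surface.length()) {
      addWholeMatch(sb, surface, analyzed);
      return;
    }
    sb.append("<b>").append(surface, 0, prefixToken.length()).append("</b>");
    sb.append(surface, prefixToken.length(), surface.length());
  }

  /** Hypothetical driver: highlights every word whose lowercased form starts with prefixToken. */
  static String highlight(String text, String prefixToken) {
    StringBuilder sb = new StringBuilder();
    String[] words = text.split(" ");
    for (int i = 0; i < words.length; i++) {
      if (i > 0) {
        addNonMatch(sb, " ");
      }
      String analyzed = words[i].toLowerCase(Locale.ROOT);
      if (!prefixToken.isEmpty() && analyzed.startsWith(prefixToken)) {
        addPrefixMatch(sb, words[i], analyzed, prefixToken);
      } else {
        addNonMatch(sb, words[i]);
      }
    }
    return sb.toString();
  }
}
```

Under these assumptions, `HighlightSketch.highlight("Apache Lucene", "luc")` yields `Apache <b>Luc</b>ene`. A subclass that wants different markup would override the hooks rather than the driver.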
-
store
public boolean store(DataOutput in) throws java.io.IOException
Description copied from class: Lookup
Persist the constructed lookup data to a directory. Optional operation.
- Specified by:
store in class Lookup
- Parameters:
in - DataOutput to write the data to.
- Returns:
- true if successful, false if unsuccessful or not supported.
- Throws:
java.io.IOException - when fatal IO error occurs.
-
load
public boolean load(DataInput out) throws java.io.IOException
Description copied from class: Lookup
Discard current lookup data and load it from a previously saved copy. Optional operation.
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Throws:
java.io.IOException
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
- Specified by:
ramBytesUsed in interface Accountable
-
getChildResources
public java.util.Collection<Accountable> getChildResources()
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- Specified by:
getChildResources in interface Accountable
- See Also:
Accountables
-
-