Class WeightedSpanTermExtractor
- java.lang.Object
-
- org.apache.lucene.search.highlight.WeightedSpanTermExtractor
-
public class WeightedSpanTermExtractor extends java.lang.Object
Class used to extract WeightedSpanTerms from a Query based on whether Terms from the Query are contained in a supplied TokenStream. To support additional queries that are unsupported by default, subclasses can override extract(Query, float, Map) to extract wrapped or delegate queries and extractUnknownQuery(Query, Map) to process custom leaf queries:

    WeightedSpanTermExtractor extractor = new WeightedSpanTermExtractor() {
        @Override
        protected void extract(Query query, float boost, Map<String, WeightedSpanTerm> terms) throws IOException {
            if (query instanceof QueryWrapper) {
                // Unwrap the delegate and extract from it instead.
                extract(((QueryWrapper) query).getQuery(), boost, terms);
            } else {
                super.extract(query, boost, terms);
            }
        }

        @Override
        protected void extractUnknownQuery(Query query, Map<String, WeightedSpanTerm> terms) throws IOException {
            if (query instanceof CustomTermQuery) {
                Term term = ((CustomTermQuery) query).getTerm();
                terms.put(term.field(), new WeightedSpanTerm(1, term.text()));
            }
        }
    };
-
-
Nested Class Summary
(package private) static class WeightedSpanTermExtractor.DelegatingLeafReader
protected static class WeightedSpanTermExtractor.PositionCheckingMap<K>
    This class makes sure that if both position sensitive and insensitive versions of the same term are added, the position insensitive one wins.
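The "position insensitive one wins" rule can be illustrated with a small standalone sketch. It does not use Lucene: SketchTerm and PositionCheckingSketchMap are hypothetical stand-ins invented for this example, and the logic is an assumption about how such a map could enforce the rule, not a copy of the library's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for WeightedSpanTerm: only the fields this sketch needs.
class SketchTerm {
    final String text;
    boolean positionSensitive;

    SketchTerm(String text, boolean positionSensitive) {
        this.text = text;
        this.positionSensitive = positionSensitive;
    }
}

// Hypothetical map: once a term has been stored as position insensitive,
// any later entry for the same key is forced to be position insensitive too.
class PositionCheckingSketchMap<K> extends HashMap<K, SketchTerm> {
    @Override
    public SketchTerm put(K key, SketchTerm value) {
        SketchTerm previous = super.put(key, value);
        if (previous != null && !previous.positionSensitive) {
            value.positionSensitive = false;
        }
        return previous;
    }

    @Override
    public void putAll(Map<? extends K, ? extends SketchTerm> other) {
        // HashMap.putAll does not call put(), so route each entry through it
        // to keep the rule above in force.
        for (Map.Entry<? extends K, ? extends SketchTerm> entry : other.entrySet()) {
            put(entry.getKey(), entry.getValue());
        }
    }
}
```

Whichever order the two versions arrive in, the stored entry ends up position insensitive: a later insensitive entry simply replaces the sensitive one, and a later sensitive entry is downgraded.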
-
Field Summary
private boolean cachedTokenStream
private java.lang.String defaultField
private boolean expandMultiTermQuery
private java.lang.String fieldName
private LeafReader internalReader
private int maxDocCharsToAnalyze
private TokenStream tokenStream
private boolean usePayloads
private boolean wrapToCaching
-
Constructor Summary
WeightedSpanTermExtractor()
WeightedSpanTermExtractor(java.lang.String defaultField)
-
Method Summary
protected void collectSpanQueryFields(SpanQuery spanQuery, java.util.Set<java.lang.String> fieldNames)
protected void extract(Query query, float boost, java.util.Map<java.lang.String,WeightedSpanTerm> terms)
protected void extractUnknownQuery(Query query, java.util.Map<java.lang.String,WeightedSpanTerm> terms)
protected void extractWeightedSpanTerms(java.util.Map<java.lang.String,WeightedSpanTerm> terms, SpanQuery spanQuery, float boost)
protected void extractWeightedTerms(java.util.Map<java.lang.String,WeightedSpanTerm> terms, Query query, float boost)
protected boolean fieldNameComparator(java.lang.String fieldNameToCheck)
    Necessary to implement matches for queries against defaultField
boolean getExpandMultiTermQuery()
protected LeafReaderContext getLeafContext()
TokenStream getTokenStream()
    Returns the tokenStream which may have been wrapped in a CachingTokenFilter.
java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream)
    Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, java.lang.String fieldName)
    Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, java.lang.String fieldName, IndexReader reader)
    Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
boolean isCachedTokenStream()
protected boolean isQueryUnsupported(java.lang.Class<? extends Query> clazz)
boolean isUsePayloads()
protected boolean mustRewriteQuery(SpanQuery spanQuery)
void setExpandMultiTermQuery(boolean expandMultiTermQuery)
protected void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
    A threshold of number of characters to analyze.
void setUsePayloads(boolean usePayloads)
void setWrapIfNotCachingTokenFilter(boolean wrap)
    By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset - if you are already using a different caching TokenStream impl and you don't want it to be wrapped, set this to false.
-
-
-
Field Detail
-
fieldName
private java.lang.String fieldName
-
tokenStream
private TokenStream tokenStream
-
defaultField
private java.lang.String defaultField
-
expandMultiTermQuery
private boolean expandMultiTermQuery
-
cachedTokenStream
private boolean cachedTokenStream
-
wrapToCaching
private boolean wrapToCaching
-
maxDocCharsToAnalyze
private int maxDocCharsToAnalyze
-
usePayloads
private boolean usePayloads
-
internalReader
private LeafReader internalReader
-
-
Method Detail
-
extract
protected void extract(Query query, float boost, java.util.Map<java.lang.String,WeightedSpanTerm> terms) throws java.io.IOException
- Parameters:
query - Query to extract Terms from
terms - Map to place created WeightedSpanTerms in
- Throws:
java.io.IOException - If there is a low-level I/O error
-
isQueryUnsupported
protected boolean isQueryUnsupported(java.lang.Class<? extends Query> clazz)
-
extractUnknownQuery
protected void extractUnknownQuery(Query query, java.util.Map<java.lang.String,WeightedSpanTerm> terms) throws java.io.IOException
- Throws:
java.io.IOException
-
extractWeightedSpanTerms
protected void extractWeightedSpanTerms(java.util.Map<java.lang.String,WeightedSpanTerm> terms, SpanQuery spanQuery, float boost) throws java.io.IOException
- Parameters:
terms - Map to place created WeightedSpanTerms in
spanQuery - SpanQuery to extract Terms from
- Throws:
java.io.IOException - If there is a low-level I/O error
-
extractWeightedTerms
protected void extractWeightedTerms(java.util.Map<java.lang.String,WeightedSpanTerm> terms, Query query, float boost) throws java.io.IOException
- Parameters:
terms - Map to place created WeightedSpanTerms in
query - Query to extract Terms from
- Throws:
java.io.IOException - If there is a low-level I/O error
-
fieldNameComparator
protected boolean fieldNameComparator(java.lang.String fieldNameToCheck)
Necessary to implement matches for queries against defaultField
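The one-line description does not say how the match is decided. A minimal sketch, assuming a field name is accepted when no field restriction is set, when it equals the restricting fieldName, or when it equals defaultField, could look like this. FieldMatchSketch and its matches method are hypothetical names used only for illustration; the actual rule in the library may differ.

```java
// Hypothetical helper illustrating one possible field-matching rule.
class FieldMatchSketch {
    private final String fieldName;    // field restriction, or null for "any field"
    private final String defaultField; // default field, or null if none is configured

    FieldMatchSketch(String fieldName, String defaultField) {
        this.fieldName = fieldName;
        this.defaultField = defaultField;
    }

    // Assumed rule: accept everything when unrestricted, otherwise accept the
    // restricting field itself or the configured default field.
    boolean matches(String fieldNameToCheck) {
        return fieldName == null
            || fieldName.equals(fieldNameToCheck)
            || (defaultField != null && defaultField.equals(fieldNameToCheck));
    }
}
```

Under this assumption, a query term on the default field still contributes to highlighting even when extraction is restricted to a different field.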
-
getLeafContext
protected LeafReaderContext getLeafContext() throws java.io.IOException
- Throws:
java.io.IOException
-
getWeightedSpanTerms
public java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream) throws java.io.IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
- Parameters:
query - that caused hit
tokenStream - of text to be highlighted
- Returns:
Map containing WeightedSpanTerms
- Throws:
java.io.IOException - If there is a low-level I/O error
-
getWeightedSpanTerms
public java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTerms(Query query, float boost, TokenStream tokenStream, java.lang.String fieldName) throws java.io.IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream.
- Parameters:
query - that caused hit
tokenStream - of text to be highlighted
fieldName - restricts Terms used based on field name
- Returns:
Map containing WeightedSpanTerms
- Throws:
java.io.IOException - If there is a low-level I/O error
-
getWeightedSpanTermsWithScores
public java.util.Map<java.lang.String,WeightedSpanTerm> getWeightedSpanTermsWithScores(Query query, float boost, TokenStream tokenStream, java.lang.String fieldName, IndexReader reader) throws java.io.IOException
Creates a Map of WeightedSpanTerms from the given Query and TokenStream. Uses a supplied IndexReader to properly weight terms (for gradient highlighting).
- Parameters:
query - that caused hit
tokenStream - of text to be highlighted
fieldName - restricts Terms used based on field name
reader - to use for scoring
- Returns:
Map of WeightedSpanTerms with quasi tf/idf scores
- Throws:
java.io.IOException - If there is a low-level I/O error
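The page does not define "quasi tf/idf". One plausible reading, assumed here and not confirmed by this page, is that each term's weight is multiplied by a classic inverse document frequency factor, ln(N / (df + 1)) + 1, where N is the number of documents in the reader and df is the number of documents containing the term. The standalone sketch below shows that arithmetic only; IdfWeightSketch and its methods are hypothetical names.

```java
// Hypothetical helper showing how an IDF factor could scale a term weight.
class IdfWeightSketch {
    // Classic-style IDF: rarer terms (small docFreq) get a larger factor.
    static float idf(int totalNumDocs, int docFreq) {
        return (float) (Math.log(totalNumDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Scale a starting weight (for example the query boost) by the IDF factor.
    static float scaledWeight(float weight, int totalNumDocs, int docFreq) {
        return weight * idf(totalNumDocs, docFreq);
    }
}
```

With such a factor, a term found in few documents receives a higher weight than a common one, which is what lets a gradient highlighter render rarer matches more strongly.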
-
collectSpanQueryFields
protected void collectSpanQueryFields(SpanQuery spanQuery, java.util.Set<java.lang.String> fieldNames)
-
mustRewriteQuery
protected boolean mustRewriteQuery(SpanQuery spanQuery)
-
getExpandMultiTermQuery
public boolean getExpandMultiTermQuery()
-
setExpandMultiTermQuery
public void setExpandMultiTermQuery(boolean expandMultiTermQuery)
-
isUsePayloads
public boolean isUsePayloads()
-
setUsePayloads
public void setUsePayloads(boolean usePayloads)
-
isCachedTokenStream
public boolean isCachedTokenStream()
-
getTokenStream
public TokenStream getTokenStream()
Returns the tokenStream, which may have been wrapped in a CachingTokenFilter. The getWeightedSpanTerms* methods set the tokenStream, so do not call this method before calling one of them.
-
setWrapIfNotCachingTokenFilter
public void setWrapIfNotCachingTokenFilter(boolean wrap)
By default, TokenStreams that are not of the type CachingTokenFilter are wrapped in a CachingTokenFilter to ensure an efficient reset. If you are already using a different caching TokenStream implementation and do not want it to be wrapped, set this to false. This setting is ignored when a term vector based TokenStream is supplied, since it can be reset efficiently.
-
setMaxDocCharsToAnalyze
protected final void setMaxDocCharsToAnalyze(int maxDocCharsToAnalyze)
A threshold on the number of characters to analyze. When a TokenStream based on term vectors with offsets and positions is supplied, this setting does not apply.
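How the threshold is applied is not described here. As a rough illustration, one simple way to enforce a character limit is to stop consuming tokens once a token starts at or beyond the limit. The sketch below is standalone and does not use Lucene; OffsetLimitSketch, SketchToken, and the start-offset cutoff are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical token: text plus its character offsets in the document.
class SketchToken {
    final String text;
    final int startOffset;
    final int endOffset;

    SketchToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Hypothetical helper that truncates a token list at a character threshold.
class OffsetLimitSketch {
    // Keeps tokens in order until one starts at or past maxDocCharsToAnalyze.
    static List<SketchToken> limit(List<SketchToken> tokens, int maxDocCharsToAnalyze) {
        List<SketchToken> kept = new ArrayList<>();
        for (SketchToken token : tokens) {
            if (token.startOffset >= maxDocCharsToAnalyze) {
                break; // everything after this point lies beyond the threshold
            }
            kept.add(token);
        }
        return kept;
    }
}
```

A cutoff like this bounds the analysis cost for very long documents, at the price of never highlighting matches that occur after the threshold.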
-
-