Class UAX29URLEmailAnalyzer
- java.lang.Object
  - org.apache.lucene.analysis.Analyzer
    - org.apache.lucene.analysis.StopwordAnalyzerBase
      - org.apache.lucene.analysis.standard.UAX29URLEmailAnalyzer
All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable
public final class UAX29URLEmailAnalyzer extends StopwordAnalyzerBase
Filters UAX29URLEmailTokenizer with LowerCaseFilter and StopFilter, using a list of English stop words.
Since:
3.6.0
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary
Fields (Modifier and Type, Field, Description):
- static int DEFAULT_MAX_TOKEN_LENGTH: Default maximum allowed token length.
- private int maxTokenLength
- static CharArraySet STOP_WORDS_SET: An unmodifiable set containing some common English words that are usually not useful for searching.
Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
stopwords
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Constructor Summary
Constructors (Constructor, Description):
- UAX29URLEmailAnalyzer(): Builds an analyzer with the default stop words (STOP_WORDS_SET).
- UAX29URLEmailAnalyzer(java.io.Reader stopwords): Builds an analyzer with the stop words from the given reader.
- UAX29URLEmailAnalyzer(CharArraySet stopWords): Builds an analyzer with the given stop words.
Method Summary
Methods (Modifier and Type, Method, Description):
- protected Analyzer.TokenStreamComponents createComponents(java.lang.String fieldName): Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
- int getMaxTokenLength()
- protected TokenStream normalize(java.lang.String fieldName, TokenStream in): Wrap the given TokenStream in order to apply normalization filters.
- void setMaxTokenLength(int length): Set the max allowed token length.
Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, initReaderForNormalization, normalize, setVersion, tokenStream, tokenStream
Field Detail
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
Default maximum allowed token length.
See Also:
Constant Field Values
maxTokenLength
private int maxTokenLength
STOP_WORDS_SET
public static final CharArraySet STOP_WORDS_SET
An unmodifiable set containing some common English words that are usually not useful for searching.
Constructor Detail
UAX29URLEmailAnalyzer
public UAX29URLEmailAnalyzer(CharArraySet stopWords)
Builds an analyzer with the given stop words.
Parameters:
stopWords - stop words
UAX29URLEmailAnalyzer
public UAX29URLEmailAnalyzer()
Builds an analyzer with the default stop words (STOP_WORDS_SET).
UAX29URLEmailAnalyzer
public UAX29URLEmailAnalyzer(java.io.Reader stopwords) throws java.io.IOException
Builds an analyzer with the stop words from the given reader.
Parameters:
stopwords - Reader to read stop words from
Throws:
java.io.IOException
See Also:
WordlistLoader.getWordSet(java.io.Reader)
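The Reader-based constructor points to WordlistLoader.getWordSet(java.io.Reader), which is documented to read one word per line and trim surrounding whitespace. The plain-Java sketch below mimics that word-list format without depending on Lucene. The class StopwordReaderSketch and the method readWordList are hypothetical names, and skipping blank lines is an assumption of this sketch, not documented behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for loading a stop word list; not part of Lucene.
class StopwordReaderSketch {

    // Reads one word per line, trims whitespace, and skips blank lines (assumption).
    static Set<String> readWordList(Reader reader) {
        Set<String> words = new LinkedHashSet<>();
        try (BufferedReader lines = new BufferedReader(reader)) {
            String line;
            while ((line = lines.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        } catch (IOException e) {
            // The real constructor declares IOException; the sketch rethrows unchecked.
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        Set<String> stopWords = readWordList(new StringReader("the\n  and \nof\n"));
        System.out.println(stopWords); // prints [the, and, of]
    }
}
```

Because the analyzer applies LowerCaseFilter before StopFilter, entries in such a list would presumably need to be lowercase in order to match.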
Method Detail
setMaxTokenLength
public void setMaxTokenLength(int length)
Set the max allowed token length. Tokens larger than this will be chopped up at this token length and emitted as multiple tokens. If you need to skip such large tokens, you could increase this max length, and then use LengthFilter to remove long tokens. The default is DEFAULT_MAX_TOKEN_LENGTH.
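As a rough illustration of the chopping behavior described for setMaxTokenLength, the sketch below splits an over-long token into consecutive pieces of at most the configured length, and separately shows the "skip long tokens" alternative. It is plain Java and does not call Lucene. MaxTokenLengthSketch, chop, and dropLongerThan are hypothetical names, and splitting on Java char positions is a simplifying assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the documented behavior; not Lucene code.
class MaxTokenLengthSketch {

    // Splits a token into consecutive pieces of at most maxTokenLength chars.
    static List<String> chop(String token, int maxTokenLength) {
        if (maxTokenLength < 1) {
            throw new IllegalArgumentException("maxTokenLength must be >= 1");
        }
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < token.length(); start += maxTokenLength) {
            int end = Math.min(token.length(), start + maxTokenLength);
            pieces.add(token.substring(start, end));
        }
        return pieces;
    }

    // Mimics the alternative of removing long tokens instead of chopping them.
    static List<String> dropLongerThan(List<String> tokens, int maxLength) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.length() <= maxLength) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(chop("abcdefghij", 4)); // prints [abcd, efgh, ij]
        System.out.println(dropLongerThan(Arrays.asList("ok", "abcdefghij"), 4)); // prints [ok]
    }
}
```

With chopping, the fragments of a long token still reach the index; with the filtering alternative, the long token disappears entirely, which is why the text above suggests raising the limit first and then applying LengthFilter.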
getMaxTokenLength
public int getMaxTokenLength()
See Also:
setMaxTokenLength(int)
createComponents
protected Analyzer.TokenStreamComponents createComponents(java.lang.String fieldName)
Description copied from class: Analyzer
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
Specified by:
createComponents in class Analyzer
Parameters:
fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
Returns:
the Analyzer.TokenStreamComponents for this analyzer.
normalize
protected TokenStream normalize(java.lang.String fieldName, TokenStream in)
Description copied from class: Analyzer
Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).