Class HyphenationCompoundWordTokenFilterFactory
- java.lang.Object
  - org.apache.lucene.analysis.util.AbstractAnalysisFactory
    - org.apache.lucene.analysis.util.TokenFilterFactory
      - org.apache.lucene.analysis.compound.HyphenationCompoundWordTokenFilterFactory
- All Implemented Interfaces:
ResourceLoaderAware
public class HyphenationCompoundWordTokenFilterFactory extends TokenFilterFactory implements ResourceLoaderAware
Factory for HyphenationCompoundWordTokenFilter. This factory accepts the following parameters:
- hyphenator (mandatory): path to the FOP XML hyphenation pattern. See http://offo.sourceforge.net/hyphenation/.
- encoding (optional): encoding of the XML hyphenation file. Defaults to UTF-8.
- dictionary (optional): dictionary of words. Defaults to no dictionary.
- minWordSize (optional): minimum length a word must have to be decomposed. Defaults to 5.
- minSubwordSize (optional): minimum length of subwords. Defaults to 2.
- maxSubwordSize (optional): maximum length of subwords. Defaults to 15.
- onlyLongestMatch (optional): if true, adds only the longest matching subword to the stream. Defaults to false.
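The defaults listed above can be summarized in a small plain-Java sketch. This is not the factory's actual source: the class name `HyphenationParamsSketch` and its `fromArgs` helper are hypothetical, and the only facts taken from this page are the parameter names, which one is mandatory, and the documented default values.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder that mirrors the documented parameters and defaults. */
final class HyphenationParamsSketch {
    final String hyphenator;        // mandatory
    final String encoding;          // documented default: UTF-8
    final String dictionary;        // null stands for "no dictionary"
    final int minWordSize;          // documented default: 5
    final int minSubwordSize;       // documented default: 2
    final int maxSubwordSize;       // documented default: 15
    final boolean onlyLongestMatch; // documented default: false

    private HyphenationParamsSketch(Map<String, String> args) {
        hyphenator = args.get("hyphenator");
        if (hyphenator == null) {
            throw new IllegalArgumentException("hyphenator is mandatory");
        }
        encoding = args.getOrDefault("encoding", "UTF-8");
        dictionary = args.get("dictionary");
        minWordSize = Integer.parseInt(args.getOrDefault("minWordSize", "5"));
        minSubwordSize = Integer.parseInt(args.getOrDefault("minSubwordSize", "2"));
        maxSubwordSize = Integer.parseInt(args.getOrDefault("maxSubwordSize", "15"));
        onlyLongestMatch = Boolean.parseBoolean(args.getOrDefault("onlyLongestMatch", "false"));
    }

    /** Reads the documented parameters from a String-to-String argument map. */
    static HyphenationParamsSketch fromArgs(Map<String, String> args) {
        return new HyphenationParamsSketch(args);
    }

    public static void main(String[] unused) {
        Map<String, String> args = new HashMap<>();
        args.put("hyphenator", "hyphenator.xml");
        HyphenationParamsSketch p = fromArgs(args);
        System.out.println(p.encoding + " " + p.minWordSize + " "
                + p.minSubwordSize + " " + p.maxSubwordSize + " " + p.onlyLongestMatch);
    }
}
```

Only `hyphenator` has to be supplied; every other value falls back to the default stated in the list.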
<fieldType name="text_hyphncomp" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.HyphenationCompoundWordTokenFilterFactory"
            hyphenator="hyphenator.xml" encoding="UTF-8" dictionary="dictionary.txt"
            minWordSize="5" minSubwordSize="2" maxSubwordSize="15" onlyLongestMatch="false"/>
  </analyzer>
</fieldType>
- Since:
- 3.1.0
- See Also:
HyphenationCompoundWordTokenFilter
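To make the effect of minWordSize, minSubwordSize, maxSubwordSize and onlyLongestMatch more concrete, here is a simplified, self-contained sketch. It is not Lucene's implementation: it does not load hyphenation patterns, the break points are supplied by the caller as an assumption, and the class `SubwordSketch` and its `candidates` method are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy illustration of the size limits and the longest-match option. */
final class SubwordSketch {

    /**
     * Lists subwords that lie between the given break points.
     * breaks must be sorted ascending; a null dictionary accepts every part.
     */
    static List<String> candidates(String word, int[] breaks, Set<String> dictionary,
                                   int minWordSize, int minSubwordSize,
                                   int maxSubwordSize, boolean onlyLongestMatch) {
        List<String> out = new ArrayList<>();
        if (word.length() < minWordSize) {
            return out; // words shorter than minWordSize are not decomposed
        }
        for (int i = 0; i < breaks.length; i++) {
            String longest = null;
            for (int j = i + 1; j < breaks.length; j++) {
                int len = breaks[j] - breaks[i];
                if (len > maxSubwordSize) break;    // later parts are only longer
                if (len < minSubwordSize) continue; // too short, try a longer part
                String part = word.substring(breaks[i], breaks[j]);
                if (dictionary != null && !dictionary.contains(part)) continue;
                if (!onlyLongestMatch) {
                    out.add(part);
                } else if (longest == null || part.length() > longest.length()) {
                    longest = part; // keep only the longest part per start point
                }
            }
            if (longest != null) out.add(longest);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> dict = Set.of("fuss", "ball", "fussball", "pumpe");
        int[] breaks = {0, 4, 8, 13}; // assumed break points for "fussballpumpe"
        System.out.println(candidates("fussballpumpe", breaks, dict, 5, 2, 15, false));
        System.out.println(candidates("fussballpumpe", breaks, dict, 5, 2, 15, true));
    }
}
```

With onlyLongestMatch set to false the sketch emits every dictionary part it finds; with true it emits at most one part per starting break point.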
Field Summary
Fields
Modifier and Type | Field | Description
private java.lang.String | dictFile |
private CharArraySet | dictionary |
private java.lang.String | encoding |
private java.lang.String | hypFile |
private HyphenationTree | hyphenator |
private int | maxSubwordSize |
private int | minSubwordSize |
private int | minWordSize |
static java.lang.String | NAME | SPI name
private boolean | onlyLongestMatch |
Fields inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
LUCENE_MATCH_VERSION_PARAM, luceneMatchVersion
Constructor Summary
Constructors
Constructor | Description
HyphenationCompoundWordTokenFilterFactory(java.util.Map<java.lang.String,java.lang.String> args) | Creates a new HyphenationCompoundWordTokenFilterFactory
-
Method Summary
All Methods
Modifier and Type | Method | Description
TokenFilter | create(TokenStream input) | Transform the specified input TokenStream
void | inform(ResourceLoader loader) | Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
Methods inherited from class org.apache.lucene.analysis.util.TokenFilterFactory
availableTokenFilters, findSPIName, forName, lookupClass, normalize, reloadTokenFilters
-
Methods inherited from class org.apache.lucene.analysis.util.AbstractAnalysisFactory
get, get, get, get, get, getBoolean, getChar, getClassArg, getFloat, getInt, getLines, getLuceneMatchVersion, getOriginalArgs, getPattern, getSet, getSnowballWordSet, getWordSet, isExplicitLuceneMatchVersion, require, require, require, requireBoolean, requireChar, requireFloat, requireInt, setExplicitLuceneMatchVersion, splitAt, splitFileNames
Field Detail
-
NAME
public static final java.lang.String NAME
SPI name
- See Also:
- Constant Field Values
-
dictionary
private CharArraySet dictionary
-
hyphenator
private HyphenationTree hyphenator
-
dictFile
private final java.lang.String dictFile
-
hypFile
private final java.lang.String hypFile
-
encoding
private final java.lang.String encoding
-
minWordSize
private final int minWordSize
-
minSubwordSize
private final int minSubwordSize
-
maxSubwordSize
private final int maxSubwordSize
-
onlyLongestMatch
private final boolean onlyLongestMatch
Method Detail
-
inform
public void inform(ResourceLoader loader) throws java.io.IOException
Description copied from interface: ResourceLoaderAware
Initializes this component with the provided ResourceLoader (used for loading classes, files, etc).
- Specified by:
inform in interface ResourceLoaderAware
- Throws:
java.io.IOException
-
create
public TokenFilter create(TokenStream input)
Description copied from class: TokenFilterFactory
Transform the specified input TokenStream
- Specified by:
create in class TokenFilterFactory