Class Stemmer
- java.lang.Object
-
- org.apache.lucene.analysis.hunspell.Stemmer
-
final class Stemmer extends java.lang.Object
Stemmer uses the affix rules declared in the Dictionary to generate one or more stems for a word. It conforms to the original hunspell algorithm, including recursive suffix stripping.
-
-
Field Summary
Fields (modifier and type, then field name)
private ByteArrayDataInput affixReader
private Dictionary dictionary
private static int EXACT_CASE
private int formStep
private char[] lowerBuffer
(package private) FST.Arc<IntsRef>[] prefixArcs
(package private) FST.BytesReader[] prefixReaders
private BytesRef scratch
private char[] scratchBuffer
private java.lang.StringBuilder scratchSegment
private java.lang.StringBuilder segment
(package private) FST.Arc<IntsRef>[] suffixArcs
(package private) FST.BytesReader[] suffixReaders
private static int TITLE_CASE
private char[] titleBuffer
private static int UPPER_CASE
-
Constructor Summary
Constructors
Stemmer(Dictionary dictionary)
Constructs a new Stemmer which will use the provided Dictionary to create its stems.
-
Method Summary
Methods (modifier and type, method, then description)
-
(package private) java.util.List<CharsRef> applyAffix(char[] strippedWord, int length, int affix, int prefixFlag, int recursionDepth, boolean prefix, boolean circumfix, boolean caseVariant)
Applies the affix rule to the given word, producing a list of stems if any are found
-
private void caseFoldLower(char[] word, int length)
folds lowercase variant of word (title cased) to lowerBuffer
-
private void caseFoldTitle(char[] word, int length)
folds titlecase variant of word to titleBuffer
-
private int caseOf(char[] word, int length)
returns EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word
-
private boolean checkCondition(int condition, char[] c1, int c1off, int c1len, char[] c2, int c2off, int c2len)
checks condition of the concatenation of two strings
-
private java.util.List<CharsRef> doStem(char[] word, int length, boolean caseVariant)
-
private boolean hasCrossCheckedFlag(char flag, char[] flags, boolean matchEmpty)
Checks if the given flag cross checks with the given array of flags
-
private CharsRef newStem(char[] buffer, int length, IntsRef forms, int formID)
-
java.util.List<CharsRef> stem(char[] word, int length)
Find the stem(s) of the provided word
-
private java.util.List<CharsRef> stem(char[] word, int length, int previous, int prevFlag, int prefixFlag, int recursionDepth, boolean doPrefix, boolean doSuffix, boolean previousWasPrefix, boolean circumfix, boolean caseVariant)
Generates a list of stems for the provided word
-
java.util.List<CharsRef> stem(java.lang.String word)
Find the stem(s) of the provided word.
-
java.util.List<CharsRef> uniqueStems(char[] word, int length)
Find the unique stem(s) of the provided word
-
-
-
Field Detail
-
dictionary
private final Dictionary dictionary
-
scratch
private final BytesRef scratch
-
segment
private final java.lang.StringBuilder segment
-
affixReader
private final ByteArrayDataInput affixReader
-
scratchSegment
private final java.lang.StringBuilder scratchSegment
-
scratchBuffer
private char[] scratchBuffer
-
formStep
private final int formStep
-
lowerBuffer
private char[] lowerBuffer
-
titleBuffer
private char[] titleBuffer
-
EXACT_CASE
private static final int EXACT_CASE
- See Also:
- Constant Field Values
-
TITLE_CASE
private static final int TITLE_CASE
- See Also:
- Constant Field Values
-
UPPER_CASE
private static final int UPPER_CASE
- See Also:
- Constant Field Values
-
prefixReaders
final FST.BytesReader[] prefixReaders
-
suffixReaders
final FST.BytesReader[] suffixReaders
-
-
Constructor Detail
-
Stemmer
public Stemmer(Dictionary dictionary)
Constructs a new Stemmer which will use the provided Dictionary to create its stems.
- Parameters:
dictionary - Dictionary that will be used to create the stems
-
-
Method Detail
-
stem
public java.util.List<CharsRef> stem(java.lang.String word)
Find the stem(s) of the provided word.
- Parameters:
word - Word to find the stems for
- Returns:
- List of stems for the word
-
stem
public java.util.List<CharsRef> stem(char[] word, int length)
Find the stem(s) of the provided word.
- Parameters:
word - Word to find the stems for
- Returns:
- List of stems for the word
-
caseOf
private int caseOf(char[] word, int length)
Returns the EXACT_CASE, TITLE_CASE, or UPPER_CASE type for the word.
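The classification described for caseOf can be sketched as below. This is an illustration only, not the Lucene source: the class name CaseOfSketch, the constant values, and the exact rules (a word starting with a non-uppercase character counts as exact case, and mixed case after the first character also counts as exact case) are assumptions.

```java
// Illustrative sketch only; names, constant values, and rules are assumptions.
final class CaseOfSketch {
    // Hypothetical values; the real ones are listed under "Constant Field Values".
    static final int EXACT_CASE = 0;
    static final int TITLE_CASE = 1;
    static final int UPPER_CASE = 2;

    static int caseOf(char[] word, int length) {
        // Assumption: only words that start with an uppercase letter can be
        // title-case or upper-case; everything else is taken as-is.
        if (length == 0 || !Character.isUpperCase(word[0])) {
            return EXACT_CASE;
        }
        boolean seenUpper = false;
        boolean seenLower = false;
        for (int i = 1; i < length; i++) {
            boolean upper = Character.isUpperCase(word[i]);
            seenUpper |= upper;
            seenLower |= !upper;
        }
        if (!seenLower) {
            return UPPER_CASE; // no non-uppercase character after the first
        }
        if (!seenUpper) {
            return TITLE_CASE; // uppercase first character only
        }
        return EXACT_CASE;     // mixed case, e.g. "McDonald"
    }

    public static void main(String[] args) {
        for (String s : new String[] {"house", "House", "HOUSE", "McDonald"}) {
            System.out.println(s + " -> " + caseOf(s.toCharArray(), s.length()));
        }
    }
}
```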
-
caseFoldTitle
private void caseFoldTitle(char[] word, int length)
Folds the titlecase variant of word to titleBuffer.
-
caseFoldLower
private void caseFoldLower(char[] word, int length)
Folds the lowercase variant of word (title cased) to lowerBuffer.
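A rough sketch of the two case folds follows. It assumes plain Character.toLowerCase folding and returns fresh arrays, whereas the methods documented here write into the reusable titleBuffer and lowerBuffer fields. The class and method names are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; the real methods fill reusable buffers and may
// apply dictionary-specific case folding.
final class CaseFoldSketch {
    // Title-case variant: keep the first character, lowercase the rest.
    static char[] foldTitle(char[] word, int length) {
        char[] out = Arrays.copyOf(word, length);
        for (int i = 1; i < length; i++) {
            out[i] = Character.toLowerCase(out[i]);
        }
        return out;
    }

    // Lowercase variant of a title-cased word: lowercase only the first character.
    static char[] foldLower(char[] word, int length) {
        char[] out = Arrays.copyOf(word, length);
        if (length > 0) {
            out[0] = Character.toLowerCase(out[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        char[] upper = "HOUSE".toCharArray();
        char[] title = foldTitle(upper, upper.length);
        char[] lower = foldLower(title, title.length);
        System.out.println(new String(title) + " " + new String(lower));
    }
}
```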
-
doStem
private java.util.List<CharsRef> doStem(char[] word, int length, boolean caseVariant)
-
uniqueStems
public java.util.List<CharsRef> uniqueStems(char[] word, int length)
Find the unique stem(s) of the provided word.
- Parameters:
word - Word to find the stems for
- Returns:
- List of stems for the word
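The difference between a stem list and a unique stem list can be illustrated with a small de-duplication sketch. It works on String values instead of CharsRef, compares stems case-sensitively, and keeps first-seen order. All of these are assumptions for illustration, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

// Illustrative sketch only; uses String in place of CharsRef.
final class UniqueStemsSketch {
    // Drops repeated stems while preserving the order in which they were found.
    static List<String> unique(List<String> stems) {
        return new ArrayList<>(new LinkedHashSet<>(stems));
    }

    public static void main(String[] args) {
        List<String> stems = Arrays.asList("walk", "walk", "walking");
        System.out.println(unique(stems));
    }
}
```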
-
stem
private java.util.List<CharsRef> stem(char[] word, int length, int previous, int prevFlag, int prefixFlag, int recursionDepth, boolean doPrefix, boolean doSuffix, boolean previousWasPrefix, boolean circumfix, boolean caseVariant) throws java.io.IOException
Generates a list of stems for the provided word.
- Parameters:
word - Word to generate the stems for
previous - previous affix that was removed (so we don't remove the same one twice)
prevFlag - Flag from a previous stemming step that needs to be cross-checked with any affixes in this recursive step
prefixFlag - flag of the innermost removed prefix, so that when removing a suffix, it is also checked against the word
recursionDepth - current recursion depth
doPrefix - true if we should remove prefixes
doSuffix - true if we should remove suffixes
previousWasPrefix - true if the previous removal was a prefix. If we are removing a suffix and it has no continuation requirements, that is fine, but two prefixes (COMPLEXPREFIXES) or two suffixes must have continuation requirements to recurse.
circumfix - true if the previous prefix removal was marked as a circumfix; this means the innermost suffix must also contain the circumfix flag.
caseVariant - true if we are searching for a case variant. If the word has the KEEPCASE flag, it cannot succeed.
- Returns:
- List of stems, or empty list if no stems are found
- Throws:
java.io.IOException
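To make the role of recursionDepth concrete, here is a heavily simplified toy of recursive suffix stripping. It is not the hunspell algorithm: it omits flag cross-checks, prefixes, circumfixes, and case variants, and the SuffixRule type, the lexicon set, and the depth limit of 2 are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy sketch only; omits flags, prefixes, circumfixes, and case handling.
final class RecursiveStripSketch {
    /** Hypothetical rule: remove `append` from the end of a word, then put `strip` back. */
    static final class SuffixRule {
        final String strip;
        final String append;

        SuffixRule(String strip, String append) {
            this.strip = strip;
            this.append = append;
        }
    }

    private final Set<String> lexicon;
    private final List<SuffixRule> rules;
    private final int maxDepth;

    RecursiveStripSketch(Set<String> lexicon, List<SuffixRule> rules, int maxDepth) {
        this.lexicon = lexicon;
        this.rules = rules;
        this.maxDepth = maxDepth;
    }

    List<String> stem(String word) {
        List<String> stems = new ArrayList<>();
        if (lexicon.contains(word)) {
            stems.add(word); // the word itself is a dictionary form
        }
        collect(word, 0, stems);
        return stems;
    }

    // Strips one suffix per level and recurses until maxDepth suffixes are removed.
    private void collect(String word, int depth, List<String> stems) {
        if (depth >= maxDepth) {
            return;
        }
        for (SuffixRule rule : rules) {
            if (word.length() > rule.append.length() && word.endsWith(rule.append)) {
                String candidate =
                    word.substring(0, word.length() - rule.append.length()) + rule.strip;
                if (lexicon.contains(candidate) && !stems.contains(candidate)) {
                    stems.add(candidate);
                }
                collect(candidate, depth + 1, stems);
            }
        }
    }

    static RecursiveStripSketch demo() {
        Set<String> lexicon = new HashSet<>(Arrays.asList("walk", "happy"));
        List<SuffixRule> rules = Arrays.asList(
            new SuffixRule("", "s"),
            new SuffixRule("", "ing"),
            new SuffixRule("y", "iness"));
        return new RecursiveStripSketch(lexicon, rules, 2);
    }

    public static void main(String[] args) {
        RecursiveStripSketch sketch = demo();
        System.out.println(sketch.stem("walkings"));
        System.out.println(sketch.stem("happiness"));
    }
}
```

In this toy, "walkings" needs two stripping steps ("s", then "ing") to reach a lexicon entry, which is why the depth limit has to be at least 2 for it to produce a stem.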
-
checkCondition
private boolean checkCondition(int condition, char[] c1, int c1off, int c1len, char[] c2, int c2off, int c2len)
Checks the condition of the concatenation of two strings.
-
applyAffix
java.util.List<CharsRef> applyAffix(char[] strippedWord, int length, int affix, int prefixFlag, int recursionDepth, boolean prefix, boolean circumfix, boolean caseVariant) throws java.io.IOException
Applies the affix rule to the given word, producing a list of stems if any are found.
- Parameters:
strippedWord - Word with the affix removed and the strip added
length - valid length of stripped word
affix - HunspellAffix representing the affix rule itself
prefixFlag - when we have already stripped a prefix, we can't simply recurse and check the suffix unless both are compatible, so we must check the dictionary form against both to add it as a stem
recursionDepth - current recursion depth
prefix - true if we are removing a prefix (false if it's a suffix)
- Returns:
- List of stems for the word, or an empty list if none are found
- Throws:
java.io.IOException
-
hasCrossCheckedFlag
private boolean hasCrossCheckedFlag(char flag, char[] flags, boolean matchEmpty)
Checks if the given flag cross checks with the given array of flags.
- Parameters:
flag - Flag to cross check with the array of flags
flags - Array of flags to cross check against. Can be null
- Returns:
- true if the flag is found in the array or the array is null, false otherwise
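The documented contract can be sketched as follows. The null case and the membership test come from the description above. The handling of matchEmpty is not documented here, so treating it as "an empty array matches only when matchEmpty is true" is purely an assumption, as is the linear scan and the class name.

```java
// Illustrative sketch only; the matchEmpty behavior is an assumption.
final class CrossCheckSketch {
    static boolean hasCrossCheckedFlag(char flag, char[] flags, boolean matchEmpty) {
        if (flags == null) {
            return true; // documented: a null array always cross checks
        }
        if (flags.length == 0) {
            return matchEmpty; // assumption: not stated in the description
        }
        for (char candidate : flags) {
            if (candidate == flag) {
                return true; // documented: flag found in the array
            }
        }
        return false;
    }

    public static void main(String[] args) {
        char[] flags = {'A', 'C'};
        System.out.println(hasCrossCheckedFlag('A', flags, false));
        System.out.println(hasCrossCheckedFlag('B', flags, false));
        System.out.println(hasCrossCheckedFlag('B', null, false));
    }
}
```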
-
-