Class WordSegmenter
java.lang.Object
    org.apache.lucene.analysis.cn.smart.WordSegmenter
class WordSegmenter extends java.lang.Object
Segment a sentence of Chinese text into words.
-
Field Summary
Fields
Modifier and Type          Field            Description
private HHMMSegmenter      hhmmSegmenter
private SegTokenFilter     tokenFilter
-
Constructor Summary
Constructors
Constructor        Description
WordSegmenter()
-
Method Summary
Modifier and Type          Method                                                                              Description
SegToken                   convertSegToken(SegToken st, java.lang.String sentence, int sentenceStartOffset)    Process a SegToken so that it is ready for indexing.
java.util.List<SegToken>   segmentSentence(java.lang.String sentence, int startOffset)                         Segment a sentence into words with HHMMSegmenter
-
Field Detail
-
hhmmSegmenter
private HHMMSegmenter hhmmSegmenter
-
tokenFilter
private SegTokenFilter tokenFilter
-
Method Detail
-
segmentSentence
public java.util.List<SegToken> segmentSentence(java.lang.String sentence, int startOffset)
Segment a sentence into words with HHMMSegmenter
Parameters:
sentence - input sentence
startOffset - start offset of sentence
Returns:
List of SegToken
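The call shape of segmentSentence can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: ToyToken stands in for SegToken, ToySegmenter for WordSegmenter, and the segmentation uses plain forward maximum matching against a tiny dictionary rather than the HHMM algorithm. It also assumes that startOffset is the position of the sentence within a larger text and that returned token offsets are shifted by it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for SegToken: a word plus its offsets in the full text. */
final class ToyToken {
    final String text;
    final int startOffset; // inclusive
    final int endOffset;   // exclusive

    ToyToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

/** Toy segmenter using forward maximum matching; this is NOT the HHMM algorithm. */
final class ToySegmenter {
    private final Set<String> dictionary;
    private final int maxWordLength;

    ToySegmenter(Set<String> dictionary) {
        this.dictionary = dictionary;
        int max = 1;
        for (String word : dictionary) {
            max = Math.max(max, word.length());
        }
        this.maxWordLength = max;
    }

    /** Splits the sentence and reports offsets relative to the enclosing text (assumption). */
    List<ToyToken> segmentSentence(String sentence, int startOffset) {
        List<ToyToken> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < sentence.length()) {
            int end = Math.min(sentence.length(), pos + maxWordLength);
            // Shrink the candidate until it is a dictionary word or a single character.
            while (end > pos + 1 && !dictionary.contains(sentence.substring(pos, end))) {
                end--;
            }
            tokens.add(new ToyToken(sentence.substring(pos, end), startOffset + pos, startOffset + end));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Dictionary entries: "like" (U+559C U+6B22) and "Beijing" (U+5317 U+4EAC).
        Set<String> dict = new HashSet<>(Arrays.asList("\u559c\u6b22", "\u5317\u4eac"));
        // Sentence: "I like Beijing", assumed to start at offset 10 of a larger text.
        for (ToyToken t : new ToySegmenter(dict).segmentSentence("\u6211\u559c\u6b22\u5317\u4eac", 10)) {
            System.out.println(t.text + " [" + t.startOffset + ", " + t.endOffset + ")");
        }
    }
}
```

With this toy dictionary the five-character sentence yields three tokens covering [10, 11), [11, 13) and [13, 15). A statistical segmenter such as HHMMSegmenter may choose different word boundaries on ambiguous input.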
-
convertSegToken
public SegToken convertSegToken(SegToken st, java.lang.String sentence, int sentenceStartOffset)
Process a SegToken so that it is ready for indexing. This method calculates offsets and normalizes the token with SegTokenFilter.
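The two steps named above, offset calculation and normalization, can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: SketchToken and ConvertSketch are hypothetical names, the token is assumed to arrive with sentence-relative offsets that are then shifted by sentenceStartOffset, and lowercasing is only a placeholder for whatever rules SegTokenFilter applies.

```java
import java.util.Locale;

/** Hypothetical stand-in for SegToken; offsets start out relative to the sentence. */
final class SketchToken {
    char[] charArray;
    int startOffset; // inclusive
    int endOffset;   // exclusive

    SketchToken(int startOffset, int endOffset) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class ConvertSketch {
    /** Placeholder normalization (lowercasing); the real SegTokenFilter rules are an unknown here. */
    static char[] normalize(char[] chars) {
        return new String(chars).toLowerCase(Locale.ROOT).toCharArray();
    }

    /** Fills in the token text, normalizes it, and shifts offsets to full-text positions. */
    static SketchToken convert(SketchToken st, String sentence, int sentenceStartOffset) {
        // Take the token text from the sentence using the sentence-relative offsets.
        char[] raw = sentence.substring(st.startOffset, st.endOffset).toCharArray();
        st.charArray = normalize(raw);
        // Assumption: final offsets are the relative offsets plus the sentence start.
        st.startOffset += sentenceStartOffset;
        st.endOffset += sentenceStartOffset;
        return st;
    }

    public static void main(String[] args) {
        SketchToken t = convert(new SketchToken(0, 6), "Lucene rocks", 100);
        // prints: lucene [100, 106)
        System.out.println(new String(t.charArray) + " [" + t.startOffset + ", " + t.endOffset + ")");
    }
}
```

Mutating and returning the same token object mirrors the signature shown above, which both accepts and returns a SegToken. Whether the real method reuses the instance is not stated on this page.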