Uses of Class
org.apache.lucene.analysis.Tokenizer
Packages that use Tokenizer:

  org.apache.lucene.analysis: Text analysis.
  org.apache.lucene.analysis.cn.smart: Analyzer for Simplified Chinese, which indexes words.
  org.apache.lucene.analysis.core: Basic, general-purpose analysis components.
  org.apache.lucene.analysis.icu.segmentation: Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm.
  org.apache.lucene.analysis.ja: Analyzer for Japanese.
  org.apache.lucene.analysis.ko: Analyzer for Korean.
  org.apache.lucene.analysis.ngram: Character n-gram tokenizers and filters.
  org.apache.lucene.analysis.path: Analysis components for path-like strings such as filenames.
  org.apache.lucene.analysis.pattern: Set of components for pattern-based (regex) analysis.
  org.apache.lucene.analysis.standard: Fast, general-purpose grammar-based tokenizer. StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
  org.apache.lucene.analysis.th: Analyzer for Thai.
  org.apache.lucene.analysis.util: Utility functions for text analysis.
  org.apache.lucene.analysis.wikipedia: Tokenizer that is aware of Wikipedia syntax.
Uses of Tokenizer in org.apache.lucene.analysis
Constructors in org.apache.lucene.analysis with parameters of type Tokenizer:

  TokenStreamComponents(Tokenizer tokenizer): Creates a new Analyzer.TokenStreamComponents from a Tokenizer.
  TokenStreamComponents(Tokenizer tokenizer, TokenStream result): Creates a new Analyzer.TokenStreamComponents instance.
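The two constructors above are typically called from Analyzer.createComponents. A minimal sketch, assuming Lucene 8.x with lucene-core and lucene-analyzers-common on the classpath (the class name AnalyzerDemo, the helper analyze, and the field name "body" are illustrative, not part of the API):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class AnalyzerDemo {
    // An Analyzer whose TokenStreamComponents wraps a Tokenizer in a filter chain.
    static final Analyzer ANALYZER = new Analyzer() {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            Tokenizer source = new WhitespaceTokenizer();
            // Two-argument constructor: the Tokenizer source plus the end of the filter chain.
            return new TokenStreamComponents(source, new LowerCaseFilter(source));
        }
    };

    // Runs the analyzer over text and collects the produced terms.
    static List<String> analyze(String field, String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (TokenStream stream = ANALYZER.tokenStream(field, text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(analyze("body", "Hello Lucene"));
    }
}
```

The one-argument constructor, TokenStreamComponents(tokenizer), covers the case where the Tokenizer itself is the entire chain and no filters are attached.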
Uses of Tokenizer in org.apache.lucene.analysis.cn.smart
Subclasses of Tokenizer in org.apache.lucene.analysis.cn.smart:

  class HMMChineseTokenizer: Tokenizer for Chinese or mixed Chinese-English text.

Methods in org.apache.lucene.analysis.cn.smart that return Tokenizer:

  Tokenizer HMMChineseTokenizerFactory.create(AttributeFactory factory)
Uses of Tokenizer in org.apache.lucene.analysis.core
Subclasses of Tokenizer in org.apache.lucene.analysis.core:

  class KeywordTokenizer: Emits the entire input as a single token.
  class LetterTokenizer: A tokenizer that divides text at non-letters.
  class UnicodeWhitespaceTokenizer: A tokenizer that divides text at whitespace.
  class WhitespaceTokenizer: A tokenizer that divides text at whitespace characters as defined by Character.isWhitespace(int).

Methods in org.apache.lucene.analysis.core that return Tokenizer:

  Tokenizer WhitespaceTokenizerFactory.create(AttributeFactory factory)
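These core tokenizers can also be driven directly, outside an Analyzer, by following the Tokenizer consumption contract (setReader, reset, incrementToken, end, close). A sketch assuming Lucene 8.x with lucene-analyzers-common on the classpath; the class name CoreTokenizerDemo and helper tokenize are ours:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CoreTokenizerDemo {
    // Drains any Tokenizer over the given text into a list of term strings,
    // following the reset / incrementToken / end / close contract.
    static List<String> tokenize(Tokenizer tokenizer, String text) throws IOException {
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<>();
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            tokens.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // WhitespaceTokenizer splits at whitespace; KeywordTokenizer emits the whole input as one token.
        System.out.println(tokenize(new WhitespaceTokenizer(), "Hello Lucene world"));
        System.out.println(tokenize(new KeywordTokenizer(), "Hello Lucene world"));
    }
}
```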
Uses of Tokenizer in org.apache.lucene.analysis.icu.segmentation
Subclasses of Tokenizer in org.apache.lucene.analysis.icu.segmentation:

  class ICUTokenizer: Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/).
Uses of Tokenizer in org.apache.lucene.analysis.ja
Subclasses of Tokenizer in org.apache.lucene.analysis.ja:

  class JapaneseTokenizer: Tokenizer for Japanese that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ko
Subclasses of Tokenizer in org.apache.lucene.analysis.ko:

  class KoreanTokenizer: Tokenizer for Korean that uses morphological analysis.
Uses of Tokenizer in org.apache.lucene.analysis.ngram
Subclasses of Tokenizer in org.apache.lucene.analysis.ngram:

  class EdgeNGramTokenizer: Tokenizes the input from an edge into n-grams of given size(s).
  class NGramTokenizer: Tokenizes the input into n-grams of the given size(s).

Methods in org.apache.lucene.analysis.ngram that return Tokenizer:

  Tokenizer EdgeNGramTokenizerFactory.create(AttributeFactory factory)
  Tokenizer NGramTokenizerFactory.create(AttributeFactory factory)
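Both n-gram tokenizers take minimum and maximum gram sizes in their constructors. A sketch assuming Lucene 8.x with lucene-analyzers-common on the classpath; the class name NGramDemo and helper tokenize are ours:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenizer;
import org.apache.lucene.analysis.ngram.NGramTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class NGramDemo {
    // Drains a Tokenizer into a list of term strings.
    static List<String> tokenize(Tokenizer tokenizer, String text) throws IOException {
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<>();
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            tokens.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // NGramTokenizer(2, 3): all character 2- and 3-grams of the input.
        System.out.println(tokenize(new NGramTokenizer(2, 3), "abc"));
        // EdgeNGramTokenizer(1, 3): n-grams anchored at the leading edge only.
        System.out.println(tokenize(new EdgeNGramTokenizer(1, 3), "abc"));
    }
}
```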
Uses of Tokenizer in org.apache.lucene.analysis.path
Subclasses of Tokenizer in org.apache.lucene.analysis.path:

  class PathHierarchyTokenizer: Tokenizer for path-like hierarchies.
  class ReversePathHierarchyTokenizer: Tokenizer for domain-like hierarchies.

Methods in org.apache.lucene.analysis.path that return Tokenizer:

  Tokenizer PathHierarchyTokenizerFactory.create(AttributeFactory factory)
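PathHierarchyTokenizer emits one token per hierarchy level, each including all ancestor components, which makes prefix queries over paths cheap. A sketch assuming Lucene 8.x with lucene-analyzers-common on the classpath and the default '/' delimiter; the class name PathDemo and helper tokenize are ours:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.path.PathHierarchyTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PathDemo {
    // Drains a Tokenizer into a list of term strings.
    static List<String> tokenize(Tokenizer tokenizer, String text) throws IOException {
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<>();
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            tokens.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Each token carries the full path down to that level.
        System.out.println(tokenize(new PathHierarchyTokenizer(), "/usr/local/bin"));
    }
}
```

ReversePathHierarchyTokenizer does the same from the other end, which suits dot-delimited hostnames and domain suffixes.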
Uses of Tokenizer in org.apache.lucene.analysis.pattern
Subclasses of Tokenizer in org.apache.lucene.analysis.pattern:

  class PatternTokenizer: This tokenizer uses regex pattern matching to construct distinct tokens for the input stream.
  class SimplePatternSplitTokenizer
  class SimplePatternTokenizer
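PatternTokenizer takes a compiled Pattern and a group index: group -1 treats the pattern as a split delimiter, while a group >= 0 emits the matched group as the token. A sketch assuming Lucene 8.x with lucene-analyzers-common on the classpath; the class name PatternDemo and helper tokenize are ours:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.pattern.PatternTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PatternDemo {
    // Drains a Tokenizer into a list of term strings.
    static List<String> tokenize(Tokenizer tokenizer, String text) throws IOException {
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<>();
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            tokens.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // group -1: split on commas (plus optional trailing whitespace).
        System.out.println(tokenize(new PatternTokenizer(Pattern.compile(",\\s*"), -1), "a, b,c"));
    }
}
```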
Uses of Tokenizer in org.apache.lucene.analysis.standard
Subclasses of Tokenizer in org.apache.lucene.analysis.standard:

  class ClassicTokenizer: A grammar-based tokenizer constructed with JFlex.
  class StandardTokenizer: A grammar-based tokenizer constructed with JFlex.
  class UAX29URLEmailTokenizer: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs.
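StandardTokenizer is the usual starting point: it applies the UAX #29 word-break rules, dropping punctuation while keeping words and numbers. A sketch assuming Lucene 8.x, where StandardTokenizer ships in lucene-core; the class name StandardDemo and helper tokenize are ours:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StandardDemo {
    // Drains a Tokenizer into a list of term strings.
    static List<String> tokenize(Tokenizer tokenizer, String text) throws IOException {
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<>();
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            tokens.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Punctuation is discarded; words and numbers survive. Note: no lowercasing here,
        // that is a TokenFilter's job.
        System.out.println(tokenize(new StandardTokenizer(), "Hello, world! 42"));
    }
}
```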
Uses of Tokenizer in org.apache.lucene.analysis.th
Subclasses of Tokenizer in org.apache.lucene.analysis.th:

  class ThaiTokenizer: Tokenizer that uses a BreakIterator to tokenize Thai text.

Methods in org.apache.lucene.analysis.th that return Tokenizer:

  Tokenizer ThaiTokenizerFactory.create(AttributeFactory factory)
Uses of Tokenizer in org.apache.lucene.analysis.util
Subclasses of Tokenizer in org.apache.lucene.analysis.util:

  class CharTokenizer: An abstract base class for simple, character-oriented tokenizers.
  class SegmentingTokenizerBase: Breaks text into sentences with a BreakIterator and allows subclasses to decompose these sentences into words.

Methods in org.apache.lucene.analysis.util that return Tokenizer:

  Tokenizer TokenizerFactory.create(): Creates a TokenStream of the specified input using the default attribute factory.
  abstract Tokenizer TokenizerFactory.create(AttributeFactory factory): Creates a TokenStream of the specified input using the given AttributeFactory.
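CharTokenizer only asks subclasses to answer one question, isTokenChar(int): runs of characters for which it returns true become tokens, everything else is a separator. A sketch assuming Lucene 8.x, where CharTokenizer lives in lucene-analyzers-common; the class name CharTokenizerDemo, the factory digitsOnly, and the helper tokenize are ours:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharTokenizer;

public class CharTokenizerDemo {
    // A CharTokenizer that keeps only runs of decimal digits as tokens.
    static Tokenizer digitsOnly() {
        return new CharTokenizer() {
            @Override
            protected boolean isTokenChar(int c) {
                return Character.isDigit(c);
            }
        };
    }

    // Drains a Tokenizer into a list of term strings.
    static List<String> tokenize(Tokenizer tokenizer, String text) throws IOException {
        tokenizer.setReader(new StringReader(text));
        CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<>();
        tokenizer.reset();
        while (tokenizer.incrementToken()) {
            tokens.add(term.toString());
        }
        tokenizer.end();
        tokenizer.close();
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize(digitsOnly(), "ab12 cd345"));
    }
}
```

LetterTokenizer and WhitespaceTokenizer above are implemented as exactly this kind of CharTokenizer subclass.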
Uses of Tokenizer in org.apache.lucene.analysis.wikipedia
Subclasses of Tokenizer in org.apache.lucene.analysis.wikipedia:

  class WikipediaTokenizer: Extension of StandardTokenizer that is aware of Wikipedia syntax.