Class UserDictionary
- java.lang.Object
-
- org.apache.lucene.analysis.ja.dict.UserDictionary
-
- All Implemented Interfaces:
Dictionary
public final class UserDictionary extends java.lang.Object implements Dictionary
Class for building a User Dictionary. This class allows for custom segmentation of phrases.
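To make "custom segmentation" concrete, the sketch below parses a single user-dictionary entry using only the JDK. It assumes the four-column CSV layout commonly used for Kuromoji user dictionaries (surface form, space-separated segmentation, space-separated readings, part-of-speech tag). That layout is not stated on this page, and the class UserEntrySketch and its fields are hypothetical names for illustration, not part of Lucene.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene: parses one user-dictionary CSV entry. */
final class UserEntrySketch {
    final String surface;       // phrase as it appears in text
    final String[] segments;    // custom segmentation of the phrase
    final String[] readings;    // one reading per segment
    final String partOfSpeech;  // tag applied to the entry

    UserEntrySketch(String csvLine) {
        // Assumed layout: surface,segmentation,readings,part-of-speech
        String[] cols = csvLine.split(",", -1);
        if (cols.length != 4) {
            throw new IllegalArgumentException("expected 4 columns: " + csvLine);
        }
        surface = cols[0];
        segments = cols[1].split(" ");
        readings = cols[2].split(" ");
        partOfSpeech = cols[3];
        if (segments.length != readings.length) {
            throw new IllegalArgumentException("segment/reading count mismatch: " + csvLine);
        }
    }

    /** Lengths (in chars) of each segment, in order. */
    int[] segmentLengths() {
        return Arrays.stream(segments).mapToInt(String::length).toArray();
    }

    public static void main(String[] args) {
        UserEntrySketch e = new UserEntrySketch(
            "関西国際空港,関西 国際 空港,カンサイ コクサイ クウコウ,カスタム名詞");
        System.out.println(e.surface + " -> " + String.join(" | ", e.segments));
        System.out.println(Arrays.toString(e.segmentLengths()));
    }
}
```

In real use, entries like this would be supplied to open(java.io.Reader) rather than parsed by hand; the sketch only illustrates what one entry expresses.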
-
-
Field Summary
Fields
Modifier and Type, Field
private static int CUSTOM_DICTIONARY_WORD_ID_OFFSET
private java.lang.String[] data
private static int[][] EMPTY_RESULT
private TokenInfoFST fst
static int LEFT_ID
static int RIGHT_ID
private int[][] segmentations
static int WORD_COST
-
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR
-
-
Constructor Summary
Constructors
Modifier, Constructor
private UserDictionary(java.util.List<java.lang.String[]> featureEntries)
-
Method Summary
Methods
Modifier and Type, Method, Description
private java.lang.String[] getAllFeaturesArray(int wordId)
java.lang.String getBaseForm(int wordId, char[] surface, int off, int len)
    Get base form of word
private java.lang.String getFeature(int wordId, int... fields)
TokenInfoFST getFST()
java.lang.String getInflectionForm(int wordId)
    Get inflection form of tokens
java.lang.String getInflectionType(int wordId)
    Get inflection type of tokens
int getLeftId(int wordId)
    Get left id of specified word
java.lang.String getPartOfSpeech(int wordId)
    Get Part-Of-Speech of tokens
java.lang.String getPronunciation(int wordId, char[] surface, int off, int len)
    Get pronunciation of tokens
java.lang.String getReading(int wordId, char[] surface, int off, int len)
    Get reading of tokens
int getRightId(int wordId)
    Get right id of specified word
int getWordCost(int wordId)
    Get word cost of specified word
int[][] lookup(char[] chars, int off, int len)
    Lookup words in text
int[] lookupSegmentation(int phraseID)
static UserDictionary open(java.io.Reader reader)
private int[][] toIndexArray(java.util.Map<java.lang.Integer,int[]> input)
    Convert Map of index and wordIdAndLength to array of {wordId, index, length}
-
-
-
Field Detail
-
fst
private final TokenInfoFST fst
-
segmentations
private final int[][] segmentations
-
data
private final java.lang.String[] data
-
CUSTOM_DICTIONARY_WORD_ID_OFFSET
private static final int CUSTOM_DICTIONARY_WORD_ID_OFFSET
- See Also:
- Constant Field Values
-
WORD_COST
public static final int WORD_COST
- See Also:
- Constant Field Values
-
LEFT_ID
public static final int LEFT_ID
- See Also:
- Constant Field Values
-
RIGHT_ID
public static final int RIGHT_ID
- See Also:
- Constant Field Values
-
EMPTY_RESULT
private static final int[][] EMPTY_RESULT
-
-
Method Detail
-
open
public static UserDictionary open(java.io.Reader reader) throws java.io.IOException
- Throws:
java.io.IOException
-
lookup
public int[][] lookup(char[] chars, int off, int len) throws java.io.IOException
Lookup words in text
- Parameters:
chars - text
off - offset into text
len - length of text
- Returns:
- array of {wordId, position, length}
- Throws:
java.io.IOException
-
getFST
public TokenInfoFST getFST()
-
toIndexArray
private int[][] toIndexArray(java.util.Map<java.lang.Integer,int[]> input)
Convert Map of index and wordIdAndLength to array of {wordId, index, length}
- Returns:
- array of {wordId, index, length}
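The conversion described above can be sketched with the JDK alone. This is not the library's source: it assumes each map key is the start index of a matched phrase, and each value holds the first word ID followed by the lengths of the phrase's segments, with consecutive segments receiving consecutive word IDs. The class name IndexArraySketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical re-creation for illustration; not the Lucene implementation. */
final class IndexArraySketch {

    /**
     * Flattens {startIndex -> [firstWordId, len1, len2, ...]} into rows of
     * {wordId, index, length}, one row per segment.
     */
    static int[][] toIndexArray(Map<Integer, int[]> input) {
        List<int[]> rows = new ArrayList<>();
        for (Map.Entry<Integer, int[]> e : input.entrySet()) {
            int[] wordIdAndLength = e.getValue();
            int wordId = wordIdAndLength[0];
            int index = e.getKey();
            // Entries after the first are segment lengths; advance the index by each.
            for (int j = 1; j < wordIdAndLength.length; j++) {
                rows.add(new int[] {wordId + j - 1, index, wordIdAndLength[j]});
                index += wordIdAndLength[j];
            }
        }
        return rows.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        Map<Integer, int[]> input = new TreeMap<>(); // iterated in order of start index
        input.put(0, new int[] {100, 2, 2, 2});      // phrase at 0: three segments of length 2
        for (int[] row : toIndexArray(input)) {
            System.out.println(row[0] + " " + row[1] + " " + row[2]);
        }
    }
}
```

A sorted map is used here so that rows come out in text order; whether the real method relies on such ordering is not stated on this page.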
-
lookupSegmentation
public int[] lookupSegmentation(int phraseID)
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
- Specified by:
getLeftId in interface Dictionary
- Returns:
- left id
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
- Specified by:
getRightId in interface Dictionary
- Returns:
- right id
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
- Specified by:
getWordCost in interface Dictionary
- Returns:
- word's cost
-
getReading
public java.lang.String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get reading of tokens
- Specified by:
getReading in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Reading of the token
-
getPartOfSpeech
public java.lang.String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens
- Specified by:
getPartOfSpeech in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Part-Of-Speech of the token
-
getBaseForm
public java.lang.String getBaseForm(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get base form of word
- Specified by:
getBaseForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Base form (only different for inflected words, otherwise null)
-
getPronunciation
public java.lang.String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get pronunciation of tokens
- Specified by:
getPronunciation in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Pronunciation of the token
-
getInflectionType
public java.lang.String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens
- Specified by:
getInflectionType in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- inflection type, or null
-
getInflectionForm
public java.lang.String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens
- Specified by:
getInflectionForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- inflection form, or null
-
getAllFeaturesArray
private java.lang.String[] getAllFeaturesArray(int wordId)
-
getFeature
private java.lang.String getFeature(int wordId, int... fields)
-
-