Class UserDictionary
java.lang.Object
    org.apache.lucene.analysis.ko.dict.UserDictionary
- All Implemented Interfaces:
Dictionary
public final class UserDictionary extends java.lang.Object implements Dictionary
Class for building a user dictionary. It allows adding custom nouns (세종) or compounds followed by their decomposition (세종시 세종 시).
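The two entry shapes named above (a plain noun, and a compound followed by its parts) can be sketched with a small stand-alone parser. This is an illustration only, not the code Lucene uses: `UserEntrySketch` and its members are hypothetical names, and the sketch assumes that tokens on an entry line are separated by whitespace, with the first token being the surface form. The string literals use Unicode escapes for the Korean examples from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene): models one user dictionary entry line. */
final class UserEntrySketch {
  final String surface;        // the word as it appears in text
  final List<String> segments; // decomposition; empty for a simple noun

  UserEntrySketch(String surface, List<String> segments) {
    this.surface = surface;
    this.segments = segments;
  }

  /** Assumption: whitespace-separated tokens, first token is the surface form. */
  static UserEntrySketch parse(String line) {
    String trimmed = line.trim();
    if (trimmed.isEmpty()) {
      throw new IllegalArgumentException("empty entry line");
    }
    String[] tokens = trimmed.split("\\s+");
    List<String> rest = new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
    return new UserEntrySketch(tokens[0], rest);
  }

  boolean isCompound() {
    return !segments.isEmpty();
  }

  public static void main(String[] args) {
    // Simple noun example from the class description.
    UserEntrySketch noun = parse("\uC138\uC885");
    // Compound example from the class description: surface form, then its two parts.
    UserEntrySketch compound = parse("\uC138\uC885\uC2DC \uC138\uC885 \uC2DC");
    System.out.println(noun.surface + " compound=" + noun.isCompound());
    System.out.println(compound.surface + " -> " + compound.segments);
  }
}
```

In real use, the entries are handed to the class through `open(java.io.Reader)`, documented below, rather than parsed by hand.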
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from interface org.apache.lucene.analysis.ko.dict.Dictionary
Dictionary.Morpheme
-
-
Field Summary
Fields
Modifier and Type       Field
private TokenInfoFST    fst
private static short    LEFT_ID
private static short    RIGHT_ID
private static short    RIGHT_ID_F
private static short    RIGHT_ID_T
private short[]         rightIds
private int[][]         segmentations
private static int      WORD_COST
-
Constructor Summary
Constructors
Modifier    Constructor
private     UserDictionary(java.util.List<java.lang.String> entries)
-
Method Summary
Modifier and Type, Method, Description
TokenInfoFST getFST()
int getLeftId(int wordId)
    Get left id of specified word
POS.Tag getLeftPOS(int wordId)
    Get the left POS.Tag of specified word.
Dictionary.Morpheme[] getMorphemes(int wordId, char[] surfaceForm, int off, int len)
    Get the morphemes of specified word (e.g. 가깝으나: 가깝 + 으나).
POS.Type getPOSType(int wordId)
    Get the POS.Type of specified word (morpheme, compound, inflect or pre-analysis)
java.lang.String getReading(int wordId)
    Get the reading of specified word (mainly used for Hanja to Hangul conversion).
int getRightId(int wordId)
    Get right id of specified word
POS.Tag getRightPOS(int wordId)
    Get the right POS.Tag of specified word.
int getWordCost(int wordId)
    Get word cost of specified word
java.util.List<java.lang.Integer> lookup(char[] chars, int off, int len)
    Lookup words in text
static UserDictionary open(java.io.Reader reader)
-
-
-
Field Detail
-
fst
private final TokenInfoFST fst
-
WORD_COST
private static final int WORD_COST
- See Also:
- Constant Field Values
-
LEFT_ID
private static final short LEFT_ID
- See Also:
- Constant Field Values
-
RIGHT_ID
private static final short RIGHT_ID
- See Also:
- Constant Field Values
-
RIGHT_ID_T
private static final short RIGHT_ID_T
- See Also:
- Constant Field Values
-
RIGHT_ID_F
private static final short RIGHT_ID_F
- See Also:
- Constant Field Values
-
segmentations
private final int[][] segmentations
-
rightIds
private final short[] rightIds
-
-
Method Detail
-
open
public static UserDictionary open(java.io.Reader reader) throws java.io.IOException
- Throws:
java.io.IOException
-
getFST
public TokenInfoFST getFST()
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
- Specified by:
getLeftId in interface Dictionary
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
- Specified by:
getRightId in interface Dictionary
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
- Specified by:
getWordCost in interface Dictionary
-
getPOSType
public POS.Type getPOSType(int wordId)
Description copied from interface: Dictionary
Get the POS.Type of specified word (morpheme, compound, inflect or pre-analysis)
- Specified by:
getPOSType in interface Dictionary
-
getLeftPOS
public POS.Tag getLeftPOS(int wordId)
Description copied from interface: Dictionary
Get the left POS.Tag of specified word. For POS.Type.MORPHEME and POS.Type.COMPOUND the left and right POS are the same.
- Specified by:
getLeftPOS in interface Dictionary
-
getRightPOS
public POS.Tag getRightPOS(int wordId)
Description copied from interface: Dictionary
Get the right POS.Tag of specified word. For POS.Type.MORPHEME and POS.Type.COMPOUND the left and right POS are the same.
- Specified by:
getRightPOS in interface Dictionary
-
getReading
public java.lang.String getReading(int wordId)
Description copied from interface: Dictionary
Get the reading of specified word (mainly used for Hanja to Hangul conversion).
- Specified by:
getReading in interface Dictionary
-
getMorphemes
public Dictionary.Morpheme[] getMorphemes(int wordId, char[] surfaceForm, int off, int len)
Description copied from interface: Dictionary
Get the morphemes of specified word (e.g. 가깝으나: 가깝 + 으나).
- Specified by:
getMorphemes in interface Dictionary
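The decomposition in the example (가깝으나: 가깝 + 으나) can be pictured as cutting the surface form into consecutive pieces. The sketch below assumes each word is stored with a list of segment lengths, which is one plausible reading of the private `segmentations` field but is not confirmed by this page. `SegmentSplitSketch` and `split` are hypothetical names, and the result is a list of plain strings rather than `Dictionary.Morpheme` objects.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Lucene): splits a surface form by segment lengths. */
final class SegmentSplitSketch {

  /**
   * Cuts the window chars[off .. off+len) into consecutive pieces.
   * Assumption: segmentLengths holds the length of each piece, in order.
   */
  static List<String> split(char[] surfaceForm, int off, int len, int[] segmentLengths) {
    List<String> parts = new ArrayList<>();
    int consumed = 0;
    for (int segLen : segmentLengths) {
      if (consumed + segLen > len) {
        throw new IllegalArgumentException("segments exceed surface length");
      }
      parts.add(new String(surfaceForm, off + consumed, segLen));
      consumed += segLen;
    }
    return parts;
  }

  public static void main(String[] args) {
    // The four-character example from the description, split as 2 + 2.
    char[] text = "\uAC00\uAE5D\uC73C\uB098".toCharArray();
    System.out.println(split(text, 0, text.length, new int[] {2, 2}));
  }
}
```

Note that `off` and `len` describe a window into a larger buffer, so the same `char[]` can be shared across calls without copying.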
-
lookup
public java.util.List<java.lang.Integer> lookup(char[] chars, int off, int len) throws java.io.IOException
Lookup words in text
- Parameters:
chars - text
off - offset into text
len - length of text
- Returns:
list of wordId
- Throws:
java.io.IOException
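To make the `chars`/`off`/`len` contract concrete, here is a naive stand-alone sketch that scans a window of text and collects the ids of all entries found inside it. It is not how this class works internally (the real class holds a `TokenInfoFST`); `LookupSketch` is a hypothetical name, the word id is assumed to be the entry's index in the list, and entries are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Lucene): brute-force lookup over a text window. */
final class LookupSketch {
  private final List<String> entries; // assumption: word id = index in this list

  LookupSketch(List<String> entries) {
    this.entries = new ArrayList<>(entries);
  }

  /** Returns the ids of all entries that occur fully inside chars[off .. off+len). */
  List<Integer> lookup(char[] chars, int off, int len) {
    List<Integer> ids = new ArrayList<>();
    int end = off + len;
    for (int start = off; start < end; start++) {
      for (int id = 0; id < entries.size(); id++) {
        String entry = entries.get(id);
        if (start + entry.length() <= end && matchesAt(chars, start, entry)) {
          ids.add(id);
        }
      }
    }
    return ids;
  }

  private static boolean matchesAt(char[] chars, int start, String entry) {
    for (int i = 0; i < entry.length(); i++) {
      if (chars[start + i] != entry.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    LookupSketch dict = new LookupSketch(Arrays.asList("ab", "abc", "cd"));
    char[] text = "xabcdx".toCharArray();
    // Only the window "abcd" (offset 1, length 4) is searched.
    System.out.println(dict.lookup(text, 1, 4));
  }
}
```

Matches that would run past `off + len` are skipped, so shrinking the window changes the result even when the underlying buffer is the same.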
-
-