Class BinaryDictionary
- java.lang.Object
-
- org.apache.lucene.analysis.ja.dict.BinaryDictionary
-
- All Implemented Interfaces:
Dictionary
- Direct Known Subclasses:
TokenInfoDictionary, UnknownDictionary
public abstract class BinaryDictionary extends java.lang.Object implements Dictionary
Base class for a binary-encoded in-memory dictionary.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static class BinaryDictionary.ResourceScheme: Used to specify where (dictionary) resources get loaded from.
-
Field Summary
Fields (Modifier and Type, Field, Description):
- private java.nio.ByteBuffer buffer
- static java.lang.String DICT_FILENAME_SUFFIX
- static java.lang.String DICT_HEADER
- static int HAS_BASEFORM: flag that the entry has baseform data.
- static int HAS_PRONUNCIATION: flag that the entry has pronunciation data.
- static int HAS_READING: flag that the entry has reading data.
- private java.lang.String[] inflFormDict
- private java.lang.String[] inflTypeDict
- private java.lang.String[] posDict
- static java.lang.String POSDICT_FILENAME_SUFFIX
- static java.lang.String POSDICT_HEADER
- private java.lang.String resourcePath
- private BinaryDictionary.ResourceScheme resourceScheme
- private int[] targetMap
- static java.lang.String TARGETMAP_FILENAME_SUFFIX
- static java.lang.String TARGETMAP_HEADER
- private int[] targetMapOffsets
- static int VERSION
-
Fields inherited from interface org.apache.lucene.analysis.ja.dict.Dictionary
INTERNAL_SEPARATOR
-
-
Constructor Summary
Constructors (Modifier, Constructor):
- protected BinaryDictionary()
- protected BinaryDictionary(BinaryDictionary.ResourceScheme resourceScheme, java.lang.String resourcePath)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- private static int baseFormOffset(int wordId)
- java.lang.String getBaseForm(int wordId, char[] surfaceForm, int off, int len): Get base form of word
- static java.io.InputStream getClassResource(java.lang.Class<?> clazz, java.lang.String suffix)
- private static java.io.InputStream getClassResource(java.lang.String path)
- java.lang.String getInflectionForm(int wordId): Get inflection form of tokens
- java.lang.String getInflectionType(int wordId): Get inflection type of tokens
- int getLeftId(int wordId): Get left id of specified word
- java.lang.String getPartOfSpeech(int wordId): Get Part-Of-Speech of tokens
- java.lang.String getPronunciation(int wordId, char[] surface, int off, int len): Get pronunciation of tokens
- java.lang.String getReading(int wordId, char[] surface, int off, int len): Get reading of tokens
- protected java.io.InputStream getResource(java.lang.String suffix)
- static java.io.InputStream getResource(BinaryDictionary.ResourceScheme scheme, java.lang.String path)
- int getRightId(int wordId): Get right id of specified word
- int getWordCost(int wordId): Get word cost of specified word
- private boolean hasBaseFormData(int wordId)
- private boolean hasPronunciationData(int wordId)
- private boolean hasReadingData(int wordId)
- void lookupWordIds(int sourceId, IntsRef ref)
- private int pronunciationOffset(int wordId)
- private int readingOffset(int wordId)
- private java.lang.String readString(int offset, int length, boolean kana)
-
-
-
Field Detail
-
DICT_FILENAME_SUFFIX
public static final java.lang.String DICT_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
TARGETMAP_FILENAME_SUFFIX
public static final java.lang.String TARGETMAP_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
POSDICT_FILENAME_SUFFIX
public static final java.lang.String POSDICT_FILENAME_SUFFIX
- See Also:
- Constant Field Values
-
DICT_HEADER
public static final java.lang.String DICT_HEADER
- See Also:
- Constant Field Values
-
TARGETMAP_HEADER
public static final java.lang.String TARGETMAP_HEADER
- See Also:
- Constant Field Values
-
POSDICT_HEADER
public static final java.lang.String POSDICT_HEADER
- See Also:
- Constant Field Values
-
VERSION
public static final int VERSION
- See Also:
- Constant Field Values
-
resourceScheme
private final BinaryDictionary.ResourceScheme resourceScheme
-
resourcePath
private final java.lang.String resourcePath
-
buffer
private final java.nio.ByteBuffer buffer
-
targetMapOffsets
private final int[] targetMapOffsets
-
targetMap
private final int[] targetMap
-
posDict
private final java.lang.String[] posDict
-
inflTypeDict
private final java.lang.String[] inflTypeDict
-
inflFormDict
private final java.lang.String[] inflFormDict
-
HAS_BASEFORM
public static final int HAS_BASEFORM
Flag that the entry has baseform data. Otherwise the word is not inflected (the base form is the same as the surface form).
- See Also:
- Constant Field Values
-
HAS_READING
public static final int HAS_READING
Flag that the entry has reading data. Otherwise the reading is the surface form converted to katakana.
- See Also:
- Constant Field Values
-
HAS_PRONUNCIATION
public static final int HAS_PRONUNCIATION
Flag that the entry has pronunciation data. Otherwise the pronunciation is the reading.
- See Also:
- Constant Field Values
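The three flags above describe a fallback chain, which the following minimal sketch makes explicit. The bit values 1, 2 and 4 and the class name EntryFlagsSketch are assumptions for illustration only; the actual values are the ones listed under Constant Field Values.

```java
final class EntryFlagsSketch {
    // Assumed bit values, for illustration only (see Constant Field Values for the real ones).
    static final int HAS_BASEFORM = 1;
    static final int HAS_READING = 2;
    static final int HAS_PRONUNCIATION = 4;

    /** Returns true if the given flag bit is set in the entry's flag word. */
    static boolean has(int flags, int flag) {
        return (flags & flag) != 0;
    }

    /** Describes where each attribute comes from for an entry with the given flags. */
    static String describe(int flags) {
        StringBuilder sb = new StringBuilder();
        // No baseform data: the word is not inflected, so the surface form is the base form.
        sb.append(has(flags, HAS_BASEFORM) ? "baseform:stored" : "baseform:surface");
        // No reading data: the reading is the surface form converted to katakana.
        sb.append(has(flags, HAS_READING) ? " reading:stored" : " reading:katakana(surface)");
        // No pronunciation data: the pronunciation is the reading.
        sb.append(has(flags, HAS_PRONUNCIATION) ? " pronunciation:stored" : " pronunciation:reading");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(HAS_BASEFORM | HAS_PRONUNCIATION));
    }
}
```

Because each flag occupies its own bit, an entry can carry any combination of the three kinds of data.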
-
-
Constructor Detail
-
BinaryDictionary
protected BinaryDictionary() throws java.io.IOException
- Throws:
java.io.IOException
-
BinaryDictionary
protected BinaryDictionary(BinaryDictionary.ResourceScheme resourceScheme, java.lang.String resourcePath) throws java.io.IOException
- Parameters:
resourceScheme - scheme for loading resources (FILE or CLASSPATH).
resourcePath - where to load resources (dictionaries) from. If null (allowed with the CLASSPATH scheme only), this class's name is used as the path.
- Throws:
java.io.IOException
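The parameter rules above can be sketched as follows. This is not the library's code: the Scheme enum is a hypothetical stand-in for BinaryDictionary.ResourceScheme, and both the slash-separated form of the class name and the exception thrown for a null path with FILE are assumptions.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

final class ResourceLoadingSketch {
    /** Hypothetical stand-in for BinaryDictionary.ResourceScheme. */
    enum Scheme { FILE, CLASSPATH }

    /** Applies the null-path rule: only CLASSPATH may fall back to the owner's class name. */
    static String resolvePath(Scheme scheme, String resourcePath, Class<?> owner) {
        if (resourcePath != null) {
            return resourcePath;
        }
        if (scheme != Scheme.CLASSPATH) {
            throw new IllegalArgumentException("resourcePath is required with the FILE scheme");
        }
        // Assumption: the class name is turned into a slash-separated resource path.
        return owner.getName().replace('.', '/');
    }

    /** Opens a resource from the file system or from the classpath. */
    static InputStream open(Scheme scheme, String path) throws IOException {
        if (scheme == Scheme.FILE) {
            return Files.newInputStream(Paths.get(path));
        }
        InputStream in = ResourceLoadingSketch.class.getClassLoader().getResourceAsStream(path);
        if (in == null) {
            throw new FileNotFoundException("Not on classpath: " + path);
        }
        return in;
    }

    public static void main(String[] args) {
        System.out.println(resolvePath(Scheme.CLASSPATH, null, ResourceLoadingSketch.class));
    }
}
```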
-
-
Method Detail
-
getResource
protected final java.io.InputStream getResource(java.lang.String suffix) throws java.io.IOException
- Throws:
java.io.IOException
-
getResource
public static final java.io.InputStream getResource(BinaryDictionary.ResourceScheme scheme, java.lang.String path) throws java.io.IOException
- Throws:
java.io.IOException
-
getClassResource
public static final java.io.InputStream getClassResource(java.lang.Class<?> clazz, java.lang.String suffix) throws java.io.IOException
- Throws:
java.io.IOException
-
getClassResource
private static java.io.InputStream getClassResource(java.lang.String path) throws java.io.IOException
- Throws:
java.io.IOException
-
lookupWordIds
public void lookupWordIds(int sourceId, IntsRef ref)
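This page does not describe how lookupWordIds uses the targetMap and targetMapOffsets fields. One plausible reading, sketched below as an assumption, is an offset-indexed layout in which the word IDs for source ID i occupy targetMap[targetMapOffsets[i]] up to, but not including, targetMap[targetMapOffsets[i + 1]]. The class name and sample data are hypothetical. The real method returns void and fills the supplied IntsRef, whereas the sketch returns a copy of the slice.

```java
import java.util.Arrays;

final class TargetMapSketch {
    /**
     * Returns the word IDs for one source ID, assuming offsets[i]..offsets[i + 1]
     * delimits that source ID's slice of targetMap.
     */
    static int[] lookup(int[] targetMap, int[] targetMapOffsets, int sourceId) {
        int start = targetMapOffsets[sourceId];
        int end = targetMapOffsets[sourceId + 1];
        return Arrays.copyOfRange(targetMap, start, end);
    }

    public static void main(String[] args) {
        int[] targetMap = {10, 11, 20, 30, 31, 32}; // hypothetical word IDs
        int[] offsets = {0, 2, 3, 6};               // three source IDs plus an end sentinel
        System.out.println(Arrays.toString(lookup(targetMap, offsets, 2)));
    }
}
```

Under this layout the offsets array needs one more entry than there are source IDs, so that the last slice has an end bound.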
-
getLeftId
public int getLeftId(int wordId)
Description copied from interface: Dictionary
Get left id of specified word
- Specified by:
getLeftId in interface Dictionary
- Returns:
- left id
-
getRightId
public int getRightId(int wordId)
Description copied from interface: Dictionary
Get right id of specified word
- Specified by:
getRightId in interface Dictionary
- Returns:
- right id
-
getWordCost
public int getWordCost(int wordId)
Description copied from interface: Dictionary
Get word cost of specified word
- Specified by:
getWordCost in interface Dictionary
- Returns:
- word's cost
-
getBaseForm
public java.lang.String getBaseForm(int wordId, char[] surfaceForm, int off, int len)
Description copied from interface: Dictionary
Get base form of word
- Specified by:
getBaseForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Base form (only different for inflected words, otherwise null)
-
getReading
public java.lang.String getReading(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get reading of tokens
- Specified by:
getReading in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Reading of the token
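When an entry has no reading data (see HAS_READING), the reading is the surface form converted to katakana. The following minimal sketch shows one way to do that conversion; it relies on the Unicode fact that hiragana U+3041 to U+3096 sit exactly 0x60 below their katakana counterparts. The class and method names are hypothetical, and the library's own conversion may differ in detail.

```java
final class KatakanaSketch {
    /** Converts hiragana in surface[off .. off + len) to katakana; other characters pass through. */
    static String toKatakana(char[] surface, int off, int len) {
        StringBuilder sb = new StringBuilder(len);
        for (int i = off; i < off + len; i++) {
            char c = surface[i];
            // Hiragana U+3041..U+3096 map to katakana U+30A1..U+30F6 by adding 0x60.
            if (c >= 0x3041 && c <= 0x3096) {
                c += 0x60;
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escapes spell the hiragana word "hiragana".
        char[] surface = "\u3072\u3089\u304c\u306a".toCharArray();
        System.out.println(toKatakana(surface, 0, surface.length));
    }
}
```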
-
getPartOfSpeech
public java.lang.String getPartOfSpeech(int wordId)
Description copied from interface: Dictionary
Get Part-Of-Speech of tokens
- Specified by:
getPartOfSpeech in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Part-Of-Speech of the token
-
getPronunciation
public java.lang.String getPronunciation(int wordId, char[] surface, int off, int len)
Description copied from interface: Dictionary
Get pronunciation of tokens
- Specified by:
getPronunciation in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- Pronunciation of the token
-
getInflectionType
public java.lang.String getInflectionType(int wordId)
Description copied from interface: Dictionary
Get inflection type of tokens
- Specified by:
getInflectionType in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- inflection type, or null
-
getInflectionForm
public java.lang.String getInflectionForm(int wordId)
Description copied from interface: Dictionary
Get inflection form of tokens
- Specified by:
getInflectionForm in interface Dictionary
- Parameters:
wordId - word ID of token
- Returns:
- inflection form, or null
-
baseFormOffset
private static int baseFormOffset(int wordId)
-
readingOffset
private int readingOffset(int wordId)
-
pronunciationOffset
private int pronunciationOffset(int wordId)
-
hasBaseFormData
private boolean hasBaseFormData(int wordId)
-
hasReadingData
private boolean hasReadingData(int wordId)
-
hasPronunciationData
private boolean hasPronunciationData(int wordId)
-
readString
private java.lang.String readString(int offset, int length, boolean kana)
-
-