Package org.apache.lucene.analysis.ckb
Class SoraniNormalizer
- java.lang.Object
  - org.apache.lucene.analysis.ckb.SoraniNormalizer
public class SoraniNormalizer extends java.lang.Object
Normalizes the Unicode representation of Sorani text.
Normalization consists of:
- Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
- Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
- Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
- Alternate (joining) form of 'h' (06BE) is converted to 0647
- Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
- Harakat, tatweel, and formatting characters such as directional controls are removed.
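The rules listed above can be sketched as a single in-place pass over a char buffer. This is a hypothetical standalone class (`SoraniNormalizerSketch`), not Lucene's source. The code points come from the list; TATWEEL (U+0640) and the harakat range U+064B..U+0652 are assumed from the standard Unicode assignments, and "word-initial"/"word-final" are interpreted as the first and last position of the buffer, assuming the buffer holds exactly one token.

```java
/**
 * Hypothetical sketch of the Sorani normalization rules; not Lucene's source.
 * Assumes the buffer holds one token, so "word-initial" means index 0 and
 * "word-final" means the last valid index.
 */
public class SoraniNormalizerSketch {

  /** Removes s[pos] by shifting the tail left; returns the new length. */
  public static int delete(char[] s, int pos, int len) {
    System.arraycopy(s, pos + 1, s, pos, len - pos - 1);
    return len - 1;
  }

  /** Normalizes s[0..len) in place and returns the new length. */
  public static int normalize(char[] s, int len) {
    for (int i = 0; i < len; i++) {
      char c = s[i];
      switch (c) {
        case '\u064A': // YEH
        case '\u0649': // DOTLESS_YEH
          s[i] = '\u06CC'; // FARSI_YEH
          break;
        case '\u0643': // KAF
          s[i] = '\u06A9'; // KEHEH
          break;
        case '\u200C': // ZWNJ: a preceding HEH becomes AE; the ZWNJ is dropped
          if (i > 0 && s[i - 1] == '\u0647') {
            s[i - 1] = '\u06D5';
          }
          len = delete(s, i, len);
          i--; // re-examine the character shifted into position i
          break;
        case '\u0647': // HEH at the end of the token becomes AE
          if (i == len - 1) {
            s[i] = '\u06D5';
          }
          break;
        case '\u0629': // TEH_MARBUTA
          s[i] = '\u06D5'; // AE
          break;
        case '\u06BE': // HEH_DOACHASHMEE
          s[i] = '\u0647'; // HEH
          break;
        case '\u0631': // REH at the start of the token becomes RREH
          if (i == 0) {
            s[i] = '\u0695';
          }
          break;
        case '\u0692': // RREH_ABOVE
          s[i] = '\u0695'; // RREH
          break;
        default:
          // TATWEEL (U+0640), harakat (U+064B..U+0652) and Unicode FORMAT
          // characters such as directional controls are deleted.
          if (c == '\u0640'
              || (c >= '\u064B' && c <= '\u0652')
              || Character.getType(c) == Character.FORMAT) {
            len = delete(s, i, len);
            i--;
          }
      }
    }
    return len;
  }

  public static void main(String[] args) {
    char[] buf = "\u0631\u0640\u0647".toCharArray(); // REH, TATWEEL, HEH
    int len = normalize(buf, buf.length);
    for (int i = 0; i < len; i++) {
      System.out.printf("U+%04X%n", (int) buf[i]); // U+0695, then U+06D5
    }
  }
}
```

Because this sketch works in one left-to-right pass, a HEH followed by a haraka is not yet in final position when it is examined, so it stays HEH even though the haraka is removed afterwards.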
Field Summary
Fields
Modifier and Type                Field
(package private) static char    AE
(package private) static char    DAMMA
(package private) static char    DAMMATAN
(package private) static char    DOTLESS_YEH
(package private) static char    FARSI_YEH
(package private) static char    FATHA
(package private) static char    FATHATAN
(package private) static char    HEH
(package private) static char    HEH_DOACHASHMEE
(package private) static char    KAF
(package private) static char    KASRA
(package private) static char    KASRATAN
(package private) static char    KEHEH
(package private) static char    REH
(package private) static char    RREH
(package private) static char    RREH_ABOVE
(package private) static char    SHADDA
(package private) static char    SUKUN
(package private) static char    TATWEEL
(package private) static char    TEH_MARBUTA
(package private) static char    YEH
(package private) static char    ZWNJ
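This page defers the field values to "Constant Field Values". As a cross-check, the sketch below maps each field name to the character that name suggests and prints the Unicode name Java reports for it. The values are assumptions: most agree with the code points in the class description, and TATWEEL and the harakat use their standard Unicode assignments. `SoraniFieldNames` is a hypothetical class, not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical cross-check of the assumed field values; not Lucene code. */
public class SoraniFieldNames {

  /** Field name to assumed character, in declaration order. */
  public static Map<String, Character> fields() {
    Map<String, Character> m = new LinkedHashMap<>();
    m.put("YEH", '\u064A');
    m.put("DOTLESS_YEH", '\u0649');
    m.put("FARSI_YEH", '\u06CC');
    m.put("KAF", '\u0643');
    m.put("KEHEH", '\u06A9');
    m.put("HEH", '\u0647');
    m.put("AE", '\u06D5');
    m.put("ZWNJ", '\u200C');
    m.put("HEH_DOACHASHMEE", '\u06BE');
    m.put("TEH_MARBUTA", '\u0629');
    m.put("REH", '\u0631');
    m.put("RREH", '\u0695');
    m.put("RREH_ABOVE", '\u0692');
    m.put("TATWEEL", '\u0640');
    m.put("FATHATAN", '\u064B');
    m.put("DAMMATAN", '\u064C');
    m.put("KASRATAN", '\u064D');
    m.put("FATHA", '\u064E');
    m.put("DAMMA", '\u064F');
    m.put("KASRA", '\u0650');
    m.put("SHADDA", '\u0651');
    m.put("SUKUN", '\u0652');
    return m;
  }

  public static void main(String[] args) {
    // Print field name, code point, and the Unicode character name.
    for (Map.Entry<String, Character> e : fields().entrySet()) {
      char c = e.getValue().charValue();
      System.out.printf("%-16s U+%04X  %s%n", e.getKey(), (int) c, Character.getName(c));
    }
  }
}
```

Java reports U+0649 as ARABIC LETTER ALEF MAKSURA; DOTLESS_YEH is simply this class's name for the same character.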
Constructor Summary
Constructors
Constructor           Description
SoraniNormalizer()
Method Summary
Modifier and Type    Method                          Description
int                  normalize(char[] s, int len)    Normalize an input buffer of Sorani text.
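Because `normalize(char[] s, int len)` edits the buffer in place and returns an `int`, the caller should treat that value as the buffer's length after normalization and ignore any characters at or beyond it. The sketch below shows the caller side under that assumption. `TermBuffer` and `stripTatweel` are hypothetical stand-ins (the latter applies only the tatweel rule) so the example runs without Lucene on the classpath. In a Lucene token filter the analogous call is typically written against `CharTermAttribute` as `termAtt.setLength(normalizer.normalize(termAtt.buffer(), termAtt.length()))`; treat that as the usual pattern rather than a quote of any particular filter.

```java
/** Hypothetical demo of the in-place "buffer + returned length" contract. */
public class NormalizeContractDemo {

  /** Hypothetical stand-in for a reusable term buffer (buffer + length). */
  public static final class TermBuffer {
    public char[] buffer;
    public int length;

    public TermBuffer(String text) {
      // Over-allocate, as reusable token buffers often are.
      buffer = new char[text.length() + 8];
      text.getChars(0, text.length(), buffer, 0);
      length = text.length();
    }

    public void setLength(int newLength) {
      length = newLength;
    }

    @Override
    public String toString() {
      return new String(buffer, 0, length); // only the valid prefix
    }
  }

  /** Stand-in normalizer: removes TATWEEL (U+0640) in place, returns the new length. */
  public static int stripTatweel(char[] s, int len) {
    int out = 0;
    for (int i = 0; i < len; i++) {
      if (s[i] != '\u0640') {
        s[out++] = s[i];
      }
    }
    return out;
  }

  public static void main(String[] args) {
    // KEHEH, TATWEEL, TATWEEL, WAW, REH, DAL
    TermBuffer term = new TermBuffer("\u06A9\u0640\u0640\u0648\u0631\u062F");
    term.setLength(stripTatweel(term.buffer, term.length));
    System.out.println(term.length); // 4
    // The array still holds stale characters past the new length.
    System.out.println(Integer.toHexString(term.buffer[4])); // 631
  }
}
```

Reading the buffer without the returned length would expose the stale tail, which is why the method returns the length instead of allocating a new array.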
Field Detail
YEH
static final char YEH
- See Also:
- Constant Field Values
DOTLESS_YEH
static final char DOTLESS_YEH
- See Also:
- Constant Field Values
FARSI_YEH
static final char FARSI_YEH
- See Also:
- Constant Field Values
KAF
static final char KAF
- See Also:
- Constant Field Values
KEHEH
static final char KEHEH
- See Also:
- Constant Field Values
HEH
static final char HEH
- See Also:
- Constant Field Values
AE
static final char AE
- See Also:
- Constant Field Values
ZWNJ
static final char ZWNJ
- See Also:
- Constant Field Values
HEH_DOACHASHMEE
static final char HEH_DOACHASHMEE
- See Also:
- Constant Field Values
TEH_MARBUTA
static final char TEH_MARBUTA
- See Also:
- Constant Field Values
REH
static final char REH
- See Also:
- Constant Field Values
RREH
static final char RREH
- See Also:
- Constant Field Values
RREH_ABOVE
static final char RREH_ABOVE
- See Also:
- Constant Field Values
TATWEEL
static final char TATWEEL
- See Also:
- Constant Field Values
FATHATAN
static final char FATHATAN
- See Also:
- Constant Field Values
DAMMATAN
static final char DAMMATAN
- See Also:
- Constant Field Values
KASRATAN
static final char KASRATAN
- See Also:
- Constant Field Values
FATHA
static final char FATHA
- See Also:
- Constant Field Values
DAMMA
static final char DAMMA
- See Also:
- Constant Field Values
KASRA
static final char KASRA
- See Also:
- Constant Field Values
SHADDA
static final char SHADDA
- See Also:
- Constant Field Values
SUKUN
static final char SUKUN
- See Also:
- Constant Field Values