- java.lang.Object
-
- org.jcodings.Encoding
-
- org.jcodings.MultiByteEncoding
-
- org.jcodings.unicode.UnicodeEncoding
-
- org.jcodings.specific.UTF8Encoding
-
- All Implemented Interfaces:
Cloneable
public final class UTF8Encoding extends UnicodeEncoding
-
-
Field Summary
Fields (modifier and type, field):
static UTF8Encoding INSTANCE
-
Constructor Summary
Constructors (modifier, constructor):
protected UTF8Encoding()
-
Method Summary
Methods (modifier and type, method, description):
protected void asciiApplyAllCaseFold(int flag, ApplyAllCaseFoldFunction fun, Object arg)
protected CaseFoldCodeItem[] asciiCaseFoldCodesByString(int flag, byte[] bytes, int p, int end)
protected int asciiMbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] lower)
int codeToMbc(int code, byte[] bytes, int p)
    Extracts code point into its multibyte representation.
int codeToMbcLength(int code)
    Returns character length given a code point. Oniguruma equivalent: code_to_mbclen
int[] ctypeCodeRange(int ctype, IntHolder sbOut)
    utf8_get_ctype_code_range
String getCharsetName()
    The name of the equivalent Java Charset for this encoding.
protected boolean isCodeCTypeInternal(int code, int ctype)
    ONIGENC_IS_XXXXXX_CODE_CTYPE
boolean isNewLine(byte[] bytes, int p, int end)
    onigenc_is_mbc_newline_0x0a / used also by multibyte encodings
boolean isReverseMatchAllowed(byte[] bytes, int p, int end)
    onigenc_always_true_is_allowed_reverse_match
int leftAdjustCharHead(byte[] bytes, int p, int s, int end)
    utf8_left_adjust_char_head
int length(byte[] bytes, int p, int end)
    Returns character length given stream, character position and stream end. Returns 1 for single-byte encodings; for multibyte encodings, performs sanity validations and returns the character length, or otherwise indicates the characters missing from the stream.
int mbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] fold)
    onigenc_ascii_mbc_case_fold
int mbcToCode(byte[] bytes, int p, int end)
    Returns code point for a character. Oniguruma equivalent: mbc_to_code
-
Methods inherited from class org.jcodings.unicode.UnicodeEncoding
applyAllCaseFold, caseFoldCodesByString, caseMap, ctypeCodeRange, isCodeCType, propertyNameToCType
-
Methods inherited from class org.jcodings.MultiByteEncoding
length, lengthForTwoUptoFour, mb2CodeToMbc, mb2CodeToMbcLength, mb2IsCodeCType, mb4CodeToMbc, mb4CodeToMbcLength, mb4IsCodeCType, mbnMbcCaseFold, mbnMbcToCode, missing, missing, safeLengthForUptoFour, safeLengthForUptoThree, safeLengthForUptoTwo, strCodeAt, strLength
-
Methods inherited from class org.jcodings.Encoding
asciiToLower, asciiToUpper, digitVal, equals, getCharset, getIndex, getName, hashCode, isAlnum, isAlpha, isAscii, isAscii, isAsciiCompatible, isBlank, isCntrl, isDigit, isDummy, isFixedWidth, isGraph, isLower, isMbcAscii, isMbcCrnl, isMbcHead, isMbcWord, isNewLine, isPrint, isPunct, isSbWord, isSingleByte, isSpace, isUnicode, isUpper, isUTF8, isWord, isWordGraphPrint, isXDigit, load, maxLength, maxLengthDistance, mbcodeStartPosition, minLength, odigitVal, prevCharHead, rightAdjustCharHead, rightAdjustCharHeadWithPrev, setDummy, setName, setName, step, stepBack, strByteLengthNull, strLengthNull, strNCmp, toLowerCaseTable, toString, xdigitVal
-
-
-
-
Field Detail
-
INSTANCE
public static final UTF8Encoding INSTANCE
-
-
Method Detail
-
length
public int length(byte[] bytes, int p, int end)
Description copied from class: Encoding
Returns character length given stream, character position and stream end. Returns 1 for single-byte encodings; for multibyte encodings, performs sanity validations and returns the character length, or otherwise indicates the characters missing from the stream.
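The length check described above can be sketched in plain Java. This is a minimal illustration and not the jcodings implementation: the class name `Utf8LengthSketch`, the `INVALID` constant, and the convention of returning `-1 - n` when `n` bytes are missing are all assumptions made for this example. It also skips some well-formedness checks (overlong three-byte forms, surrogates).

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; names and return conventions are hypothetical.
final class Utf8LengthSketch {
    /** Hypothetical marker for an invalid lead or continuation byte. */
    static final int INVALID = -1;

    /**
     * Returns the byte length of the character starting at bytes[p],
     * INVALID for a malformed sequence, or -1 - n when n bytes are
     * missing before end (an assumed convention for this sketch).
     */
    static int length(byte[] bytes, int p, int end) {
        if (p >= end) return -1 - 1;           // not even a lead byte available
        int b = bytes[p] & 0xff;
        int len;
        if (b < 0x80) len = 1;                  // ASCII
        else if (b >= 0xc2 && b <= 0xdf) len = 2;
        else if (b >= 0xe0 && b <= 0xef) len = 3;
        else if (b >= 0xf0 && b <= 0xf4) len = 4;
        else return INVALID;                    // continuation byte or illegal lead
        for (int i = 1; i < len; i++) {
            if (p + i >= end) return -1 - (len - i);           // truncated stream
            if ((bytes[p + i] & 0xc0) != 0x80) return INVALID; // not 10xxxxxx
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] euro = "\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println(length(euro, 0, euro.length)); // complete character
        System.out.println(length(euro, 0, 2));           // one byte missing
    }
}
```

Passing a shortened `end` is a convenient way to see how a truncated stream is reported separately from a malformed one.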
-
getCharsetName
public String getCharsetName()
Description copied from class: Encoding
The name of the equivalent Java Charset for this encoding. Defaults to the name of the encoding. Subclasses can override this to provide a different name.
- Overrides:
getCharsetName in class UnicodeEncoding
- Returns:
the name of the equivalent Java Charset for this encoding
-
isNewLine
public boolean isNewLine(byte[] bytes, int p, int end)
onigenc_is_mbc_newline_0x0a / used also by multibyte encodings
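The Oniguruma name above suggests a single-byte comparison against 0x0a. A minimal sketch of that idea follows, assuming the check looks only at the byte at `p`; the class name `NewLineSketch` is hypothetical and this is not the library's code. The single-byte test is safe for UTF-8 because every byte of a multibyte sequence is 0x80 or higher, so 0x0a can never occur inside one.

```java
// Illustrative sketch only; not the jcodings implementation.
final class NewLineSketch {
    /** True when a byte is available at p and it is LF (0x0a). */
    static boolean isNewLine(byte[] bytes, int p, int end) {
        return p < end && bytes[p] == 0x0a;
    }

    public static void main(String[] args) {
        byte[] text = {0x61, 0x0a};
        System.out.println(isNewLine(text, 0, text.length));
        System.out.println(isNewLine(text, 1, text.length));
    }
}
```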
-
codeToMbcLength
public int codeToMbcLength(int code)
Description copied from class: Encoding
Returns character length given a code point. Oniguruma equivalent: code_to_mbclen
- Specified by:
codeToMbcLength in class Encoding
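For UTF-8, the encoded length follows from the code point's range. The sketch below shows the standard ranges; it is not the jcodings implementation, and the class name `Utf8CodeLengthSketch` and the `-1` return for out-of-range input are assumptions for illustration (the library may report errors differently or accept additional special values).

```java
// Illustrative sketch only; error handling here is hypothetical.
final class Utf8CodeLengthSketch {
    /** Returns the number of UTF-8 bytes for a code point, or -1 if out of range. */
    static int codeToMbcLength(int code) {
        if (code < 0) return -1;
        if (code < 0x80) return 1;       // U+0000..U+007F
        if (code < 0x800) return 2;      // U+0080..U+07FF
        if (code < 0x10000) return 3;    // U+0800..U+FFFF
        if (code <= 0x10ffff) return 4;  // U+10000..U+10FFFF
        return -1;
    }

    public static void main(String[] args) {
        int[] samples = {0x41, 0xe9, 0x20ac, 0x1f600};
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp) + " -> " + codeToMbcLength(cp));
        }
    }
}
```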
-
mbcToCode
public int mbcToCode(byte[] bytes, int p, int end)
Description copied from class: Encoding
Returns code point for a character. Oniguruma equivalent: mbc_to_code
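Decoding a UTF-8 character into a code point amounts to taking the payload bits of the lead byte and appending six bits from each continuation byte. The following is a minimal sketch of that technique, not the jcodings implementation; the class name `Utf8DecodeSketch` and the use of exceptions for bad input are assumptions, and the sketch does not reject overlong forms.

```java
// Illustrative sketch only; validation is deliberately minimal.
final class Utf8DecodeSketch {
    /** Decodes the character starting at bytes[p] into a code point. */
    static int mbcToCode(byte[] bytes, int p, int end) {
        int b = bytes[p] & 0xff;
        if (b < 0x80) return b;                                  // ASCII
        int len;
        int code;
        if ((b & 0xe0) == 0xc0) { len = 2; code = b & 0x1f; }     // 110xxxxx
        else if ((b & 0xf0) == 0xe0) { len = 3; code = b & 0x0f; } // 1110xxxx
        else if ((b & 0xf8) == 0xf0) { len = 4; code = b & 0x07; } // 11110xxx
        else throw new IllegalArgumentException("invalid lead byte");
        if (p + len > end) throw new IllegalArgumentException("truncated sequence");
        for (int i = 1; i < len; i++) {
            code = (code << 6) | (bytes[p + i] & 0x3f);          // append 6 payload bits
        }
        return code;
    }

    public static void main(String[] args) {
        byte[] euro = {(byte) 0xe2, (byte) 0x82, (byte) 0xac};
        System.out.println(Integer.toHexString(mbcToCode(euro, 0, euro.length)));
    }
}
```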
-
codeToMbc
public int codeToMbc(int code, byte[] bytes, int p)
Description copied from class: Encoding
Extracts code point into its multibyte representation.
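Writing a code point as UTF-8 bytes into a caller-supplied buffer can be sketched as below. This is an illustration of the standard encoding scheme and not the jcodings implementation; the class name `Utf8EncodeSketch`, the choice to return the number of bytes written, and the exception for out-of-range input are assumptions.

```java
// Illustrative sketch only; return value and error handling are hypothetical.
final class Utf8EncodeSketch {
    /** Writes code as UTF-8 into bytes starting at p; returns the bytes written. */
    static int codeToMbc(int code, byte[] bytes, int p) {
        if (code < 0 || code > 0x10ffff) {
            throw new IllegalArgumentException("code point out of range");
        }
        if (code < 0x80) {
            bytes[p] = (byte) code;
            return 1;
        }
        if (code < 0x800) {
            bytes[p] = (byte) (0xc0 | (code >> 6));
            bytes[p + 1] = (byte) (0x80 | (code & 0x3f));
            return 2;
        }
        if (code < 0x10000) {
            bytes[p] = (byte) (0xe0 | (code >> 12));
            bytes[p + 1] = (byte) (0x80 | ((code >> 6) & 0x3f));
            bytes[p + 2] = (byte) (0x80 | (code & 0x3f));
            return 3;
        }
        bytes[p] = (byte) (0xf0 | (code >> 18));
        bytes[p + 1] = (byte) (0x80 | ((code >> 12) & 0x3f));
        bytes[p + 2] = (byte) (0x80 | ((code >> 6) & 0x3f));
        bytes[p + 3] = (byte) (0x80 | (code & 0x3f));
        return 4;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[4];
        int n = codeToMbc(0x20ac, buf, 0);
        for (int i = 0; i < n; i++) {
            System.out.printf("%02x ", buf[i] & 0xff);
        }
        System.out.println();
    }
}
```

The caller is responsible for making sure the buffer has room for up to four bytes at `p`.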
-
mbcCaseFold
public int mbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] fold)
onigenc_ascii_mbc_case_fold
- Overrides:
mbcCaseFold in class UnicodeEncoding
- Parameters:
flag - case fold flag
pp - an IntHolder that points at the character head
fold - a buffer that receives the case-folded character
Oniguruma equivalent: mbc_case_fold
-
ctypeCodeRange
public int[] ctypeCodeRange(int ctype, IntHolder sbOut)
utf8_get_ctype_code_range
- Specified by:
ctypeCodeRange in class Encoding
-
leftAdjustCharHead
public int leftAdjustCharHead(byte[] bytes, int p, int s, int end)
utf8_left_adjust_char_head
- Specified by:
leftAdjustCharHead in class Encoding
- Parameters:
bytes - byte stream
p - position
s - stop
end - end
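The general idea behind adjusting to a character head in UTF-8 is to step backwards over continuation bytes (bit pattern 10xxxxxx) until a lead byte is reached. The sketch below illustrates that technique only. It deliberately uses its own parameter names (`lowerBound`, `pos`) because how `p` and `s` map onto those roles is not spelled out in this page and should be checked against the library source. The class name `Utf8CharHeadSketch` is hypothetical.

```java
// Illustrative sketch only; parameter roles are assumptions, not the library's contract.
final class Utf8CharHeadSketch {
    /**
     * Returns the index of the first byte of the character that contains
     * bytes[pos], never moving below lowerBound.
     */
    static int leftAdjust(byte[] bytes, int lowerBound, int pos) {
        int q = pos;
        // Continuation bytes have the top two bits set to 10.
        while (q > lowerBound && (bytes[q] & 0xc0) == 0x80) {
            q--;
        }
        return q;
    }

    public static void main(String[] args) {
        // "a", U+20AC (three bytes), "b"
        byte[] text = {0x61, (byte) 0xe2, (byte) 0x82, (byte) 0xac, 0x62};
        for (int i = 0; i < text.length; i++) {
            System.out.println(i + " -> " + leftAdjust(text, 0, i));
        }
    }
}
```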
-
isReverseMatchAllowed
public boolean isReverseMatchAllowed(byte[] bytes, int p, int end)
onigenc_always_true_is_allowed_reverse_match
- Specified by:
isReverseMatchAllowed in class Encoding
-
isCodeCTypeInternal
protected final boolean isCodeCTypeInternal(int code, int ctype)
ONIGENC_IS_XXXXXX_CODE_CTYPE
-
asciiMbcCaseFold
protected final int asciiMbcCaseFold(int flag, byte[] bytes, IntHolder pp, int end, byte[] lower)
-
asciiApplyAllCaseFold
protected final void asciiApplyAllCaseFold(int flag, ApplyAllCaseFoldFunction fun, Object arg)
-
asciiCaseFoldCodesByString
protected final CaseFoldCodeItem[] asciiCaseFoldCodesByString(int flag, byte[] bytes, int p, int end)
-
-