Package org.apache.lucene.util
Class BytesRefHash
- java.lang.Object
  - org.apache.lucene.util.BytesRefHash
- All Implemented Interfaces:
Accountable
public final class BytesRefHash extends java.lang.Object implements Accountable
BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. BytesRefHash maintains mappings of byte arrays to ids (Map<BytesRef,int>), storing the hashed bytes efficiently in contiguous storage. The mapping to the id is encapsulated inside BytesRefHash, and ids are guaranteed to increase with each added BytesRef.
Note: a BytesRef passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2 bytes. The internal storage is limited to 2GB of total byte storage.
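To make the description concrete, here is a minimal sketch of the same idea in plain Java, without Lucene: every key's bytes are appended to one growing byte[] buffer, an open-addressing int[] table maps hash slots to ids, and ids are handed out sequentially from 0. The class name MiniBytesRefHash, the use of Arrays.hashCode, and linear probing are assumptions made for illustration; Lucene's own hashing, probing, and pool layout are not reproduced here.

```java
import java.util.Arrays;

/** Hypothetical sketch: maps byte[] keys to dense ids 0, 1, 2, ... (not Lucene's implementation). */
final class MiniBytesRefHash {
    private byte[] pool = new byte[16];  // all key bytes, appended back to back
    private int poolUpto = 0;            // next free position in pool
    private int[] starts = new int[4];   // starts[id]: offset of key id in pool
    private int[] lengths = new int[4];  // lengths[id]: number of bytes in key id
    private int[] table;                 // hash slot -> id, or -1 if the slot is empty
    private int count = 0;

    /** capacity must be a power of two and at least 1 (slot = hash & (capacity - 1)). */
    MiniBytesRefHash(int capacity) {
        table = new int[capacity];
        Arrays.fill(table, -1);
    }

    int size() { return count; }

    /** Returns a new id (>= 0), or -(id)-1 if the key is already present. */
    int add(byte[] key) {
        if (2 * (count + 1) > table.length) rehash(table.length * 2); // keep load <= 50%
        int slot = findSlot(key);
        if (table[slot] != -1) return -table[slot] - 1;
        if (poolUpto + key.length > pool.length) {
            pool = Arrays.copyOf(pool, Math.max(2 * pool.length, poolUpto + key.length));
        }
        System.arraycopy(key, 0, pool, poolUpto, key.length);
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, 2 * count);
            lengths = Arrays.copyOf(lengths, 2 * count);
        }
        starts[count] = poolUpto;
        lengths[count] = key.length;
        poolUpto += key.length;
        table[slot] = count;
        return count++;
    }

    /** Returns the id of key, or -1 if it was never added. */
    int find(byte[] key) { return table[findSlot(key)]; }

    /** Copies the bytes of the given id out of the shared pool. */
    byte[] get(int id) { return Arrays.copyOfRange(pool, starts[id], starts[id] + lengths[id]); }

    private int hash(int id) { return Arrays.hashCode(get(id)); } // copies bytes; fine for a sketch

    private int findSlot(byte[] key) {
        int mask = table.length - 1;
        int slot = Arrays.hashCode(key) & mask;
        // Linear probing: stop at an empty slot or at the slot holding an equal key.
        while (table[slot] != -1 && !Arrays.equals(pool, starts[table[slot]],
                starts[table[slot]] + lengths[table[slot]], key, 0, key.length)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    private void rehash(int newSize) {
        int[] old = table;
        table = new int[newSize];
        Arrays.fill(table, -1);
        for (int id : old) {
            if (id == -1) continue;
            int slot = hash(id) & (newSize - 1);
            while (table[slot] != -1) slot = (slot + 1) & (newSize - 1);
            table[slot] = id;
        }
    }
}
```

With this sketch, adding "foo", "bar", and then "foo" again yields 0, 1, and -1, and find returns -1 for a key that was never added. Keeping all key bytes in one buffer, indexed by per-id start offsets, is what avoids allocating one object per key.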
Nested Class Summary
Nested Classes
- static class BytesRefHash.BytesStartArray: Manages allocation of the per-term addresses.
- static class BytesRefHash.DirectBytesStartArray: A simple BytesRefHash.BytesStartArray that tracks memory allocation using a private Counter instance.
- static class BytesRefHash.MaxBytesLengthExceededException
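The two start-array classes separate "where does each term start" from the hash itself, and DirectBytesStartArray additionally reports how much memory that bookkeeping uses. The sketch below illustrates the idea with hypothetical names (StartArraySketch, CountingStartArray) and uses java.util.concurrent.atomic.AtomicLong as a stand-in for Lucene's Counter; the method set and the 1.5x growth factor are assumptions, not Lucene's API.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a per-term start-address allocator (not Lucene's API). */
interface StartArraySketch {
    int[] init();      // allocate the initial array of start offsets
    int[] grow();      // enlarge it, keeping existing entries
    int[] clear();     // release it; returns null
    long bytesUsed();  // bytes currently held by the array
}

/** Tracks how many bytes its int[] occupies, in the spirit of DirectBytesStartArray. */
final class CountingStartArray implements StartArraySketch {
    private final int initSize;
    private int[] starts;
    private final AtomicLong bytesUsed = new AtomicLong(); // stand-in for Lucene's Counter

    CountingStartArray(int initSize) { this.initSize = initSize; }

    @Override public int[] init() {
        starts = new int[initSize];
        bytesUsed.addAndGet((long) Integer.BYTES * initSize);
        return starts;
    }

    @Override public int[] grow() {
        int oldLen = starts.length;
        int newLen = Math.max(oldLen + 1, oldLen + (oldLen >> 1)); // grow by about 1.5x
        starts = Arrays.copyOf(starts, newLen);
        bytesUsed.addAndGet((long) Integer.BYTES * (newLen - oldLen));
        return starts;
    }

    @Override public int[] clear() {
        if (starts != null) {
            bytesUsed.addAndGet(-(long) Integer.BYTES * starts.length);
            starts = null;
        }
        return null;
    }

    @Override public long bytesUsed() { return bytesUsed.get(); }
}
```

Routing every allocation through one counter is what lets a containing object answer a memory-usage query (compare ramBytesUsed()) without walking its arrays.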

Field Summary
Fields
- private static long BASE_RAM_BYTES
- (package private) int[] bytesStart
- private BytesRefHash.BytesStartArray bytesStartArray
- private Counter bytesUsed
- private int count
- static int DEFAULT_CAPACITY
- private int hashHalfSize
- private int hashMask
- private int hashSize
- private int[] ids
- private int lastCount
- (package private) ByteBlockPool pool
- private BytesRef scratch1
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE

Constructor Summary
Constructors
- BytesRefHash()
- BytesRefHash(ByteBlockPool pool): Creates a new BytesRefHash.
- BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray): Creates a new BytesRefHash.

Method Summary
- int add(BytesRef bytes): Adds a new BytesRef.
- int addByPoolOffset(int offset): Adds an "arbitrary" int offset instead of a BytesRef term.
- int byteStart(int bytesID): Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
- void clear()
- void clear(boolean resetPool)
- void close(): Closes the BytesRefHash and releases all internally used memory.
- int[] compact(): Returns the ids array in arbitrary order.
- private int doHash(byte[] bytes, int offset, int length)
- private boolean equals(int id, BytesRef b)
- int find(BytesRef bytes): Returns the id of the given BytesRef.
- private int findHash(BytesRef bytes)
- BytesRef get(int bytesID, BytesRef ref): Populates and returns a BytesRef with the bytes for the given bytesID.
- long ramBytesUsed(): Return the memory usage of this object in bytes.
- private void rehash(int newSize, boolean hashOnData): Called when the hash is too small (> 50% occupied) or too large (< 20% occupied).
- void reinit(): Reinitializes the BytesRefHash after a previous clear() call.
- private boolean shrink(int targetSize)
- int size(): Returns the number of BytesRef values in this BytesRefHash.
- int[] sort(): Returns the values array sorted by the referenced byte values.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources

Field Detail

BASE_RAM_BYTES
private static final long BASE_RAM_BYTES

DEFAULT_CAPACITY
public static final int DEFAULT_CAPACITY
See Also:
Constant Field Values

pool
final ByteBlockPool pool

bytesStart
int[] bytesStart

scratch1
private final BytesRef scratch1

hashSize
private int hashSize

hashHalfSize
private int hashHalfSize

hashMask
private int hashMask

count
private int count

lastCount
private int lastCount

ids
private int[] ids

bytesStartArray
private final BytesRefHash.BytesStartArray bytesStartArray

bytesUsed
private Counter bytesUsed

Constructor Detail

BytesRefHash
public BytesRefHash()

BytesRefHash
public BytesRefHash(ByteBlockPool pool)
Creates a new BytesRefHash.

BytesRefHash
public BytesRefHash(ByteBlockPool pool, int capacity, BytesRefHash.BytesStartArray bytesStartArray)
Creates a new BytesRefHash.

Method Detail

size
public int size()
Returns the number of BytesRef values in this BytesRefHash.
Returns:
the number of BytesRef values in this BytesRefHash.

get
public BytesRef get(int bytesID, BytesRef ref)
Populates and returns a BytesRef with the bytes for the given bytesID.
Note: the given bytesID must be a non-negative integer less than the current size (size()).
Parameters:
bytesID - the id
ref - the BytesRef to populate
Returns:
the given BytesRef instance populated with the bytes for the given bytesID
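get(int, BytesRef) fills a caller-supplied BytesRef instead of allocating a new one, and byteStart(int) exposes where an entry begins in the pool. The sketch below shows how a length-prefixed entry could be decoded in place from a single byte[] into a reusable view. It assumes a 1-byte length prefix for lengths below 128 and a 2-byte prefix otherwise, which would account for the "- 2" in the BYTE_BLOCK_SIZE length cap, but treat that layout, the 32768-byte block, and the names ByteView and PrefixedPool as assumptions.

```java
/** Hypothetical reusable view over a slice of a byte[] (a stand-in for BytesRef). */
final class ByteView {
    byte[] bytes;
    int offset;
    int length;
}

/** Hypothetical single-block pool storing length-prefixed entries (assumed layout). */
final class PrefixedPool {
    final byte[] block = new byte[1 << 15];
    int upto = 0;

    /** Appends an entry and returns its start offset (the position of its length prefix). */
    int append(byte[] data) {
        int start = upto;
        int len = data.length;
        if (len + 2 > block.length - upto) throw new IllegalArgumentException("no room");
        if (len < 128) {
            block[upto++] = (byte) len;                    // 1-byte prefix, high bit clear
        } else {
            block[upto++] = (byte) (0x80 | (len & 0x7f));  // flag bit + low 7 bits
            block[upto++] = (byte) ((len >> 7) & 0xff);    // remaining high bits
        }
        System.arraycopy(data, 0, block, upto, len);
        upto += len;
        return start;
    }

    /** Points view at the entry starting at start, without copying any bytes. */
    ByteView get(int start, ByteView view) {
        view.bytes = block;
        int b = block[start];
        if ((b & 0x80) == 0) {
            view.length = b;
            view.offset = start + 1;
        } else {
            view.length = (b & 0x7f) | ((block[start + 1] & 0xff) << 7);
            view.offset = start + 2;
        }
        return view;
    }
}
```

Reusing one view object across many lookups, as get(int, BytesRef) allows, keeps iteration over all ids free of per-entry allocation.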

compact
public int[] compact()
Returns the ids array in arbitrary order. Valid ids start at offset 0 and end at a limit of size() - 1.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
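One way to obtain "all ids, in arbitrary order, at offsets 0 to size() - 1" without allocating is to compact the hash table's own slot array in place. The sketch below does that on a plain int[] where -1 marks an empty slot; the -1 sentinel and the name compactIds are assumptions for illustration.

```java
/** Hypothetical sketch of in-place id compaction over an open-addressing table. */
final class CompactSketch {
    /** Moves every non-empty slot (-1 means empty) to the front and returns how many there are. */
    static int compactIds(int[] ids) {
        int upto = 0;
        for (int i = 0; i < ids.length; i++) {
            if (ids[i] != -1) {
                if (upto < i) {
                    ids[upto] = ids[i];
                    ids[i] = -1;   // the old slot is now empty
                }
                upto++;
            }
        }
        return upto;               // ids[0 .. upto-1] are the live ids, in slot order
    }
}
```

For example, {-1, 3, -1, -1, 0, 2, -1, 1} becomes {3, 0, 2, 1, -1, -1, -1, -1}. Once ids have left their hash slots, lookups can no longer find them, which is why such an operation is destructive and the structure must be cleared before reuse.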

sort
public int[] sort()
Returns the values array sorted by the referenced byte values.
Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.
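To illustrate what "sorted by the referenced byte values" means, the sketch below orders ids by their keys in unsigned lexicographic byte order (the order BytesRef.compareTo uses), via java.util.Arrays.compareUnsigned (Java 9+). The List<byte[]> of keys indexed by id and the name sortIds are hypothetical; a real implementation would sort without boxing and read keys straight from the pool.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: order ids by the unsigned bytes they reference. */
final class SortSketch {
    static int[] sortIds(List<byte[]> keysById) {
        Integer[] boxed = new Integer[keysById.size()];
        for (int i = 0; i < boxed.length; i++) boxed[i] = i;
        // Unsigned comparison: bytes 0x80..0xFF sort after 0x00..0x7F; a prefix sorts first.
        Arrays.sort(boxed, (a, b) -> Arrays.compareUnsigned(keysById.get(a), keysById.get(b)));
        int[] result = new int[boxed.length];
        for (int i = 0; i < result.length; i++) result[i] = boxed[i];
        return result;
    }
}
```

Unsigned order matters for UTF-8 keys: a signed comparison would place a multi-byte character such as "é" (0xC3 0xA9) before ASCII letters, while unsigned order places it after them.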

equals
private boolean equals(int id, BytesRef b)

shrink
private boolean shrink(int targetSize)

clear
public void clear(boolean resetPool)

clear
public void clear()

close
public void close()
Closes the BytesRefHash and releases all internally used memory.

add
public int add(BytesRef bytes)
Adds a new BytesRef.
Parameters:
bytes - the bytes to hash
Returns:
the id the given bytes are hashed to if there was no mapping for the given bytes, otherwise (-(id)-1). This guarantees that the return value will always be >= 0 if the given bytes haven't been hashed before.
Throws:
BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2
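Because a fresh id is always >= 0 while a duplicate comes back as -(id)-1 (always <= -1), one sign check distinguishes the two cases, and applying the same transformation again recovers the existing id. In two's complement, -(id)-1 is exactly ~id. A small helper sketch with hypothetical names; its argument would be the int returned by add(BytesRef).

```java
/** Hypothetical helpers for decoding an add(...) result of the form id or -(id)-1. */
final class AddResult {
    /** True if the value came from inserting bytes that were not present before. */
    static boolean isNew(int addResult) {
        return addResult >= 0;
    }

    /** Recovers the id in both cases: the mapping x -> -(x)-1 is its own inverse. */
    static int id(int addResult) {
        return addResult >= 0 ? addResult : -addResult - 1;
    }
}
```

This makes a "get or add" step a single call: add the bytes, then use id(result) regardless of whether they were new.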

find
public int find(BytesRef bytes)
Returns the id of the given BytesRef.
Parameters:
bytes - the bytes to look for
Returns:
the id of the given bytes, or -1 if there is no mapping for the given bytes.

findHash
private int findHash(BytesRef bytes)

addByPoolOffset
public int addByPoolOffset(int offset)
Adds an "arbitrary" int offset instead of a BytesRef term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the byte[] term directly and instead reference the byte[] term already stored by the postings BytesRefHash. See add(int textStart) in TermsHashPerField.

rehash
private void rehash(int newSize, boolean hashOnData)
Called when the hash is too small (> 50% occupied) or too large (< 20% occupied).
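The 50% and 20% occupancy thresholds can be made concrete with a small decision function. This is a hypothetical sketch: the doubling and halving factors and the minSize floor are assumptions, not taken from Lucene; only the two thresholds come from the description above.

```java
/** Hypothetical resize policy for a power-of-two hash table of int ids. */
final class ResizePolicy {
    /** Returns the table size to use for count entries, starting from hashSize. */
    static int targetSize(int count, int hashSize, int minSize) {
        if (minSize < 1 || hashSize < minSize) throw new IllegalArgumentException("bad sizes");
        int size = hashSize;
        while (count * 2 > size) {                   // more than 50% occupied: grow
            size *= 2;
        }
        while (size > minSize && count * 5 < size) { // less than 20% occupied: shrink
            size /= 2;
        }
        return size;
    }
}
```

For a 16-slot table, 9 entries (56%) grow it to 32, 4 entries (25%) keep it at 16, and 3 entries (about 19%) shrink it to 8. Shrinking only while occupancy is below 20% means a halved table ends up under 40% full, so it never immediately crosses the 50% growth threshold.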

doHash
private int doHash(byte[] bytes, int offset, int length)

reinit
public void reinit()
Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.

byteStart
public int byteStart(int bytesID)
Returns the bytesStart offset into the internally used ByteBlockPool for the given bytesID.
Parameters:
bytesID - the id to look up
Returns:
the bytesStart offset into the internally used ByteBlockPool for the given id

ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
Specified by:
ramBytesUsed in interface Accountable