Class CompressingTermVectorsWriter
- java.lang.Object
  - org.apache.lucene.codecs.TermVectorsWriter
    - org.apache.lucene.codecs.compressing.CompressingTermVectorsWriter
- All Implemented Interfaces:
java.io.Closeable, java.lang.AutoCloseable, Accountable
public final class CompressingTermVectorsWriter extends TermVectorsWriter
-
-
Nested Class Summary
Nested Classes (modifier and type, class, description):
- private class CompressingTermVectorsWriter.DocData: a pending document.
- private class CompressingTermVectorsWriter.FieldData: a pending field.
-
Field Summary
Fields (modifier and type, field, description):
- (package private) static boolean BULK_MERGE_ENABLED
- (package private) static java.lang.String BULK_MERGE_ENABLED_SYSPROP
- private int chunkSize
- private CompressionMode compressionMode
- private Compressor compressor
- private CompressingTermVectorsWriter.DocData curDoc
- private CompressingTermVectorsWriter.FieldData curField
- (package private) static int FLAGS_BITS
- private FieldsIndexWriter indexWriter
- private BytesRef lastTerm
- private int[] lengthsBuf
- (package private) static int MAX_DOCUMENTS_PER_CHUNK
- (package private) static int META_VERSION_START
- private IndexOutput metaStream
- private long numDirtyChunks
- private long numDirtyDocs
- private int numDocs
- (package private) static int OFFSETS
- (package private) static int PACKED_BLOCK_SIZE
- private GrowableByteArrayDataOutput payloadBytes
- private int[] payloadLengthsBuf
- (package private) static int PAYLOADS
- private java.util.Deque<CompressingTermVectorsWriter.DocData> pendingDocs
- (package private) static int POSITIONS
- private int[] positionsBuf
- private java.lang.String segment
- private int[] startOffsetsBuf
- private GrowableByteArrayDataOutput termSuffixes
- (package private) static java.lang.String VECTORS_EXTENSION
- (package private) static java.lang.String VECTORS_INDEX_CODEC_NAME
- (package private) static java.lang.String VECTORS_INDEX_EXTENSION
- (package private) static java.lang.String VECTORS_META_EXTENSION
- private IndexOutput vectorsStream
- (package private) static int VERSION_CURRENT
- (package private) static int VERSION_META: Version where all metadata were moved to the meta file.
- (package private) static int VERSION_OFFHEAP_INDEX
- (package private) static int VERSION_START
- private BlockPackedWriter writer
-
Fields inherited from interface org.apache.lucene.util.Accountable
NULL_ACCOUNTABLE
-
-
Constructor Summary
Constructors (constructor, description):
- CompressingTermVectorsWriter(Directory directory, SegmentInfo si, java.lang.String segmentSuffix, IOContext context, java.lang.String formatName, CompressionMode compressionMode, int chunkSize, int blockShift): Sole constructor.
-
Method Summary
Methods (modifier and type, method, description):
- private CompressingTermVectorsWriter.DocData addDocData(int numVectorFields)
- void addPosition(int position, int startOffset, int endOffset, BytesRef payload): Adds a term position and offsets.
- void addProx(int numProx, DataInput positions, DataInput offsets): Called by IndexWriter when writing new segments.
- void close()
- void finish(FieldInfos fis, int numDocs): Called before TermVectorsWriter.close(), passing in the number of documents that were written.
- void finishDocument(): Called after a doc and all its fields have been added.
- void finishField(): Called after a field and all its terms have been added.
- private void flush()
- private int[] flushFieldNums(): Returns a sorted array containing unique field numbers.
- private void flushFields(int totalFields, int[] fieldNums)
- private void flushFlags(int totalFields, int[] fieldNums)
- private int flushNumFields(int chunkDocs)
- private void flushNumTerms(int totalFields)
- private void flushOffsets(int[] fieldNums)
- private void flushPayloadLengths()
- private void flushPositions()
- private void flushTermFreqs()
- private void flushTermLengths()
- java.util.Collection<Accountable> getChildResources(): Returns nested resources of this class.
- int merge(MergeState mergeState): Merges in the term vectors from the readers in mergeState.
- long ramBytesUsed(): Return the memory usage of this object in bytes.
- void startDocument(int numVectorFields): Called before writing the term vectors of the document.
- void startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads): Called before writing the terms of the field.
- void startTerm(BytesRef term, int freq): Adds a term and its term frequency freq.
- (package private) boolean tooDirty(CompressingTermVectorsReader candidate): Returns true if we should recompress this reader, even though we could bulk merge compressed data.
- private boolean triggerFlush()
-
Methods inherited from class org.apache.lucene.codecs.TermVectorsWriter
addAllDocVectors, finishTerm
-
-
-
-
Field Detail
-
MAX_DOCUMENTS_PER_CHUNK
static final int MAX_DOCUMENTS_PER_CHUNK
- See Also:
- Constant Field Values
-
VECTORS_EXTENSION
static final java.lang.String VECTORS_EXTENSION
- See Also:
- Constant Field Values
-
VECTORS_INDEX_EXTENSION
static final java.lang.String VECTORS_INDEX_EXTENSION
- See Also:
- Constant Field Values
-
VECTORS_META_EXTENSION
static final java.lang.String VECTORS_META_EXTENSION
- See Also:
- Constant Field Values
-
VECTORS_INDEX_CODEC_NAME
static final java.lang.String VECTORS_INDEX_CODEC_NAME
- See Also:
- Constant Field Values
-
VERSION_START
static final int VERSION_START
- See Also:
- Constant Field Values
-
VERSION_OFFHEAP_INDEX
static final int VERSION_OFFHEAP_INDEX
- See Also:
- Constant Field Values
-
VERSION_META
static final int VERSION_META
Version where all metadata were moved to the meta file.
- See Also:
- Constant Field Values
-
VERSION_CURRENT
static final int VERSION_CURRENT
- See Also:
- Constant Field Values
-
META_VERSION_START
static final int META_VERSION_START
- See Also:
- Constant Field Values
-
PACKED_BLOCK_SIZE
static final int PACKED_BLOCK_SIZE
- See Also:
- Constant Field Values
-
POSITIONS
static final int POSITIONS
- See Also:
- Constant Field Values
-
OFFSETS
static final int OFFSETS
- See Also:
- Constant Field Values
-
PAYLOADS
static final int PAYLOADS
- See Also:
- Constant Field Values
-
FLAGS_BITS
static final int FLAGS_BITS
-
segment
private final java.lang.String segment
-
indexWriter
private FieldsIndexWriter indexWriter
-
metaStream
private IndexOutput metaStream
-
vectorsStream
private IndexOutput vectorsStream
-
compressionMode
private final CompressionMode compressionMode
-
compressor
private final Compressor compressor
-
chunkSize
private final int chunkSize
-
numDirtyChunks
private long numDirtyChunks
-
numDirtyDocs
private long numDirtyDocs
-
numDocs
private int numDocs
-
pendingDocs
private final java.util.Deque<CompressingTermVectorsWriter.DocData> pendingDocs
-
curDoc
private CompressingTermVectorsWriter.DocData curDoc
-
curField
private CompressingTermVectorsWriter.FieldData curField
-
lastTerm
private final BytesRef lastTerm
-
positionsBuf
private int[] positionsBuf
-
startOffsetsBuf
private int[] startOffsetsBuf
-
lengthsBuf
private int[] lengthsBuf
-
payloadLengthsBuf
private int[] payloadLengthsBuf
-
termSuffixes
private final GrowableByteArrayDataOutput termSuffixes
-
payloadBytes
private final GrowableByteArrayDataOutput payloadBytes
-
writer
private final BlockPackedWriter writer
-
BULK_MERGE_ENABLED_SYSPROP
static final java.lang.String BULK_MERGE_ENABLED_SYSPROP
-
BULK_MERGE_ENABLED
static final boolean BULK_MERGE_ENABLED
-
-
Constructor Detail
-
CompressingTermVectorsWriter
CompressingTermVectorsWriter(Directory directory, SegmentInfo si, java.lang.String segmentSuffix, IOContext context, java.lang.String formatName, CompressionMode compressionMode, int chunkSize, int blockShift) throws java.io.IOException
Sole constructor.
- Throws:
java.io.IOException
-
-
Method Detail
-
addDocData
private CompressingTermVectorsWriter.DocData addDocData(int numVectorFields)
-
close
public void close() throws java.io.IOException
- Specified by:
close in interface java.lang.AutoCloseable
- Specified by:
close in interface java.io.Closeable
- Specified by:
close in class TermVectorsWriter
- Throws:
java.io.IOException
-
startDocument
public void startDocument(int numVectorFields) throws java.io.IOException
Description copied from class: TermVectorsWriter
Called before writing the term vectors of the document. TermVectorsWriter.startField(FieldInfo, int, boolean, boolean, boolean) will be called numVectorFields times. Note that if term vectors are enabled, this is called even if the document has no vector fields; in this case numVectorFields will be zero.
- Specified by:
startDocument in class TermVectorsWriter
- Throws:
java.io.IOException
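The contract above (startField is called exactly numVectorFields times, possibly zero times) can be checked mechanically. The following is a minimal sketch using a hypothetical FieldCountChecker class that is not part of Lucene; it only counts calls and writes nothing.

```java
// Hypothetical helper, not part of Lucene: enforces the startDocument/startField
// contract described above. It only counts calls; it writes nothing.
public final class FieldCountChecker {
  private int expectedFields = -1; // -1 means "no document in progress"
  private int seenFields;

  public void startDocument(int numVectorFields) {
    if (expectedFields != -1) {
      throw new IllegalStateException("previous document was not finished");
    }
    if (numVectorFields < 0) {
      throw new IllegalArgumentException("numVectorFields must be >= 0");
    }
    expectedFields = numVectorFields;
    seenFields = 0;
  }

  public void startField() {
    if (expectedFields == -1) {
      throw new IllegalStateException("startField called outside a document");
    }
    if (++seenFields > expectedFields) {
      throw new IllegalStateException("more fields than announced");
    }
  }

  public void finishDocument() {
    if (expectedFields == -1) {
      throw new IllegalStateException("finishDocument called outside a document");
    }
    if (seenFields != expectedFields) {
      throw new IllegalStateException(
          "expected " + expectedFields + " fields, got " + seenFields);
    }
    expectedFields = -1; // ready for the next document
  }

  public static void main(String[] args) {
    FieldCountChecker checker = new FieldCountChecker();
    // Term vectors enabled, but the document has no vector fields.
    checker.startDocument(0);
    checker.finishDocument();
    // A document with two vector fields.
    checker.startDocument(2);
    checker.startField();
    checker.startField();
    checker.finishDocument();
    System.out.println("contract respected");
  }
}
```

The zero-field case matters because, as noted above, startDocument is still invoked for documents without vector fields, so a writer must accept an empty document rather than skip it.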
-
finishDocument
public void finishDocument() throws java.io.IOException
Description copied from class: TermVectorsWriter
Called after a doc and all its fields have been added.
- Overrides:
finishDocument in class TermVectorsWriter
- Throws:
java.io.IOException
-
startField
public void startField(FieldInfo info, int numTerms, boolean positions, boolean offsets, boolean payloads) throws java.io.IOException
Description copied from class: TermVectorsWriter
Called before writing the terms of the field. TermVectorsWriter.startTerm(BytesRef, int) will be called numTerms times.
- Specified by:
startField in class TermVectorsWriter
- Throws:
java.io.IOException
-
finishField
public void finishField() throws java.io.IOException
Description copied from class: TermVectorsWriter
Called after a field and all its terms have been added.
- Overrides:
finishField in class TermVectorsWriter
- Throws:
java.io.IOException
-
startTerm
public void startTerm(BytesRef term, int freq) throws java.io.IOException
Description copied from class: TermVectorsWriter
Adds a term and its term frequency freq. If this field has positions and/or offsets enabled, then TermVectorsWriter.addPosition(int, int, int, BytesRef) will be called freq times.
- Specified by:
startTerm in class TermVectorsWriter
- Throws:
java.io.IOException
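Taken together, startDocument, startField and startTerm describe a nested call sequence. The sketch below, built around a hypothetical CallTrace class (not Lucene code), records the order in which those callbacks would fire for one in-memory document. It assumes positions are enabled, abbreviates addPosition to its position argument, feeds terms in sorted order, and shows only the methods described on this page (other hooks such as finishTerm are omitted).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical driver, not Lucene code: records the order in which the
// TermVectorsWriter callbacks described above would fire for one document.
// Input: field name -> (term -> positions); positions are assumed enabled.
public final class CallTrace {
  public static List<String> trace(Map<String, TreeMap<String, int[]>> doc) {
    List<String> calls = new ArrayList<>();
    calls.add("startDocument(" + doc.size() + ")");
    for (Map.Entry<String, TreeMap<String, int[]>> field : doc.entrySet()) {
      TreeMap<String, int[]> terms = field.getValue(); // iterates terms in sorted order
      calls.add("startField(" + field.getKey() + ", numTerms=" + terms.size() + ")");
      for (Map.Entry<String, int[]> term : terms.entrySet()) {
        int[] positions = term.getValue();
        calls.add("startTerm(" + term.getKey() + ", freq=" + positions.length + ")");
        for (int position : positions) {
          // One addPosition call per occurrence, i.e. freq times in total.
          calls.add("addPosition(" + position + ")");
        }
      }
      calls.add("finishField()");
    }
    calls.add("finishDocument()");
    return calls;
  }

  public static void main(String[] args) {
    TreeMap<String, int[]> body = new TreeMap<>();
    body.put("lucene", new int[] {0, 4});
    body.put("index", new int[] {2});
    Map<String, TreeMap<String, int[]>> doc = new LinkedHashMap<>();
    doc.put("body", body);
    trace(doc).forEach(System.out::println);
  }
}
```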
-
addPosition
public void addPosition(int position, int startOffset, int endOffset, BytesRef payload) throws java.io.IOException
Description copied from class: TermVectorsWriter
Adds a term position and offsets.
- Specified by:
addPosition in class TermVectorsWriter
- Throws:
java.io.IOException
-
triggerFlush
private boolean triggerFlush()
-
flush
private void flush() throws java.io.IOException
- Throws:
java.io.IOException
-
flushNumFields
private int flushNumFields(int chunkDocs) throws java.io.IOException
- Throws:
java.io.IOException
-
flushFieldNums
private int[] flushFieldNums() throws java.io.IOException
Returns a sorted array containing unique field numbers.
- Throws:
java.io.IOException
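Building a sorted, de-duplicated array of field numbers is a small stream pipeline. The sketch below uses a hypothetical FieldNums class and assumes the per-document field numbers are already available as an int[][]; the real method reads them from the pending DocData entries. One plausible benefit of such an array is that each field can then be referred to by its small index into it.

```java
import java.util.Arrays;

// Hypothetical sketch: gather the field numbers used by each pending document
// of a chunk and return them sorted, with duplicates removed.
public final class FieldNums {
  static int[] sortedUnique(int[][] fieldNumsPerDoc) {
    return Arrays.stream(fieldNumsPerDoc) // Stream<int[]>, one array per document
        .flatMapToInt(Arrays::stream)     // flatten into a single IntStream
        .distinct()
        .sorted()
        .toArray();
  }

  public static void main(String[] args) {
    int[][] pending = {{3, 1}, {1, 7}, {}};
    System.out.println(Arrays.toString(sortedUnique(pending))); // prints [1, 3, 7]
  }
}
```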
-
flushFields
private void flushFields(int totalFields, int[] fieldNums) throws java.io.IOException
- Throws:
java.io.IOException
-
flushFlags
private void flushFlags(int totalFields, int[] fieldNums) throws java.io.IOException
- Throws:
java.io.IOException
-
flushNumTerms
private void flushNumTerms(int totalFields) throws java.io.IOException
- Throws:
java.io.IOException
-
flushTermLengths
private void flushTermLengths() throws java.io.IOException
- Throws:
java.io.IOException
-
flushTermFreqs
private void flushTermFreqs() throws java.io.IOException
- Throws:
java.io.IOException
-
flushPositions
private void flushPositions() throws java.io.IOException
- Throws:
java.io.IOException
-
flushOffsets
private void flushOffsets(int[] fieldNums) throws java.io.IOException
- Throws:
java.io.IOException
-
flushPayloadLengths
private void flushPayloadLengths() throws java.io.IOException
- Throws:
java.io.IOException
-
finish
public void finish(FieldInfos fis, int numDocs) throws java.io.IOException
Description copied from class: TermVectorsWriter
Called before TermVectorsWriter.close(), passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to TermVectorsWriter.startDocument(int)), but a Codec should check that this is the case to detect the JRE bug described in LUCENE-1282.
- Specified by:
finish in class TermVectorsWriter
- Throws:
java.io.IOException
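The redundancy check described above amounts to counting startDocument calls and comparing the count with numDocs. Below is a minimal sketch with a hypothetical DocCountCheck class; it is not Lucene's implementation, and the exception type and message are assumptions.

```java
// Hypothetical sketch, not Lucene's implementation: count startDocument calls
// and compare them with the redundant numDocs argument passed to finish.
public final class DocCountCheck {
  private int startedDocs;

  public void startDocument(int numVectorFields) {
    startedDocs++;
  }

  public void finish(int numDocs) {
    if (numDocs != startedDocs) {
      // A mismatch signals that the writer's state is inconsistent.
      throw new IllegalStateException(
          "wrote " + startedDocs + " docs, but finish was called with numDocs=" + numDocs);
    }
  }

  public static void main(String[] args) {
    DocCountCheck check = new DocCountCheck();
    for (int i = 0; i < 3; i++) {
      check.startDocument(0);
    }
    check.finish(3);
    System.out.println("document count verified");
  }
}
```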
-
addProx
public void addProx(int numProx, DataInput positions, DataInput offsets) throws java.io.IOException
Description copied from class: TermVectorsWriter
Called by IndexWriter when writing new segments.
This is an expert API that allows the codec to consume positions and offsets directly from the indexer.
The default implementation calls TermVectorsWriter.addPosition(int, int, int, BytesRef), but subclasses can override this if they want to efficiently write all the positions, then all the offsets, for example.
NOTE: This API is extremely expert and subject to change or removal!!!
- Overrides:
addProx in class TermVectorsWriter
- Throws:
java.io.IOException
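The difference between calling addPosition once per occurrence and writing "all the positions, then all the offsets" is a matter of data layout. The sketch below illustrates both layouts on plain int arrays using a hypothetical ProxLayout class; it does not reproduce how Lucene actually encodes the positions and offsets DataInput streams.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: two layouts for the freq occurrences of one term.
public final class ProxLayout {
  // Interleaved: one (position, startOffset, endOffset) triple per occurrence,
  // as a per-occurrence addPosition loop naturally produces.
  static List<Integer> interleaved(int[] pos, int[] start, int[] end) {
    List<Integer> out = new ArrayList<>();
    for (int i = 0; i < pos.length; i++) {
      out.add(pos[i]);
      out.add(start[i]);
      out.add(end[i]);
    }
    return out;
  }

  // Grouped: all positions first, then all (startOffset, endOffset) pairs.
  static List<Integer> grouped(int[] pos, int[] start, int[] end) {
    List<Integer> out = new ArrayList<>();
    for (int p : pos) {
      out.add(p);
    }
    for (int i = 0; i < pos.length; i++) {
      out.add(start[i]);
      out.add(end[i]);
    }
    return out;
  }

  public static void main(String[] args) {
    int[] pos = {0, 5};
    int[] start = {0, 20};
    int[] end = {3, 24};
    System.out.println(interleaved(pos, start, end)); // prints [0, 0, 3, 5, 20, 24]
    System.out.println(grouped(pos, start, end));     // prints [0, 5, 0, 3, 20, 24]
  }
}
```

Grouping keeps values of the same kind next to each other, which tends to suit column-oriented encoders (for example, delta-encoding a run of positions); whether that pays off depends on the codec.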
-
merge
public int merge(MergeState mergeState) throws java.io.IOException
Description copied from class: TermVectorsWriter
Merges in the term vectors from the readers in mergeState. The default implementation skips over deleted documents, and uses TermVectorsWriter.startDocument(int), TermVectorsWriter.startField(FieldInfo, int, boolean, boolean, boolean), TermVectorsWriter.startTerm(BytesRef, int), TermVectorsWriter.addPosition(int, int, int, BytesRef), and TermVectorsWriter.finish(FieldInfos, int), returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc.).
- Overrides:
merge in class TermVectorsWriter
- Throws:
java.io.IOException
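The default merge behavior described above (visit every document of every incoming reader, skip deleted ones, return the count written) can be sketched as a simple loop. In this sketch, MergeSketch is hypothetical, java.util.BitSet stands in for Lucene's own live-documents type, and appending a "segment:doc" string stands in for the startDocument ... finishDocument sequence.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch of the default merge loop: walk every document of every
// incoming segment, skip deleted ones, "write" the rest, and return the count.
public final class MergeSketch {
  // liveDocs[i] == null means segment i has no deletions.
  static int merge(int[] maxDocs, BitSet[] liveDocs, List<String> written) {
    int docCount = 0;
    for (int seg = 0; seg < maxDocs.length; seg++) {
      BitSet live = liveDocs[seg];
      for (int doc = 0; doc < maxDocs[seg]; doc++) {
        if (live != null && !live.get(doc)) {
          continue; // deleted document: not copied into the merged segment
        }
        written.add(seg + ":" + doc); // stands in for startDocument ... finishDocument
        docCount++;
      }
    }
    return docCount; // the number of documents written
  }

  public static void main(String[] args) {
    BitSet live = new BitSet();
    live.set(1); // segment 1: doc 0 deleted, doc 1 live
    List<String> written = new ArrayList<>();
    int count = merge(new int[] {3, 2}, new BitSet[] {null, live}, written);
    System.out.println(count + " " + written); // prints 4 [0:0, 0:1, 0:2, 1:1]
  }
}
```

Overriding merge lets a compressing writer avoid this per-document path when it can copy already-compressed chunks directly, which is where tooDirty comes in.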
-
tooDirty
boolean tooDirty(CompressingTermVectorsReader candidate)
Returns true if we should recompress this reader, even though we could bulk merge compressed data.
The last chunk written for a segment is typically incomplete, so without recompressing, in some worst-case situations (e.g. frequent reopen with tiny flushes), the compression ratio can degrade over time. This is a safety switch.
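One way to express such a safety switch is a pair of thresholds on dirty (incomplete) chunks and the documents they hold. The sketch below uses a hypothetical DirtyCheck class with caller-supplied thresholds (maxDocsPerChunk, maxDirtyPercent); it mirrors the numDirtyChunks and numDirtyDocs counters listed in the field summary, but the exact rule Lucene applies may differ.

```java
// Hypothetical heuristic, not necessarily Lucene's exact rule: recompress a
// segment once it has accumulated enough dirty (incomplete) chunks.
public final class DirtyCheck {
  static boolean tooDirty(long numChunks, long numDirtyChunks, long numDirtyDocs,
                          int maxDocsPerChunk, int maxDirtyPercent) {
    // Require enough dirty docs to fill at least one full chunk, so a single
    // small trailing chunk never triggers recompression by itself.
    boolean enoughDirtyDocs = numDirtyDocs > maxDocsPerChunk;
    // Also require dirty chunks to exceed the given percentage of all chunks.
    boolean tooManyDirtyChunks = numDirtyChunks * 100 > numChunks * (long) maxDirtyPercent;
    return enoughDirtyDocs && tooManyDirtyChunks;
  }

  public static void main(String[] args) {
    System.out.println(tooDirty(1000, 50, 5000, 128, 1)); // prints true
    System.out.println(tooDirty(10, 1, 10, 128, 1));      // prints false
  }
}
```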
-
ramBytesUsed
public long ramBytesUsed()
Description copied from interface: Accountable
Return the memory usage of this object in bytes. Negative values are illegal.
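A large share of this writer's memory sits in reusable buffers such as the int[] position and offset buffers and the growable byte outputs listed in the field summary. The sketch below, built around a hypothetical RamEstimate class, approximates that share by counting element bytes only; object headers and references are deliberately ignored, so it underestimates. Lucene's RamUsageEstimator utility offers more precise helpers.

```java
// Hypothetical sketch: estimate the heap held by reusable buffers by counting
// only their element bytes. Object headers and references are ignored.
public final class RamEstimate {
  static long bufferBytes(int[][] intBuffers, long byteBufferLength) {
    if (byteBufferLength < 0) {
      throw new IllegalArgumentException("byte buffer length must be >= 0");
    }
    long total = byteBufferLength;
    for (int[] buffer : intBuffers) {
      total += (long) buffer.length * Integer.BYTES; // 4 bytes per int
    }
    return total; // never negative, as the Accountable contract requires
  }

  public static void main(String[] args) {
    int[][] buffers = {new int[1024], new int[1024]};
    System.out.println(bufferBytes(buffers, 4096)); // prints 12288
  }
}
```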
-
getChildResources
public java.util.Collection<Accountable> getChildResources()
Description copied from interface: Accountable
Returns nested resources of this class. The result should be a point-in-time snapshot (to avoid race conditions).
- See Also:
Accountables
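A point-in-time snapshot means returning a copy rather than a live view, so later changes are not visible to the caller. The sketch below uses a hypothetical ChildSnapshot class in which plain strings stand in for child Accountable resources; the copy is taken under the same lock that guards updates.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: child resources are returned as a copy taken under a
// lock, so callers see a point-in-time snapshot that later changes cannot alter.
public final class ChildSnapshot {
  private final List<String> children = new ArrayList<>(); // stand-ins for child resources

  synchronized void addChild(String name) {
    children.add(name);
  }

  synchronized Collection<String> getChildResources() {
    return Collections.unmodifiableList(new ArrayList<>(children));
  }

  public static void main(String[] args) {
    ChildSnapshot owner = new ChildSnapshot();
    owner.addChild("vector index");
    Collection<String> snapshot = owner.getChildResources();
    owner.addChild("later child");
    System.out.println(snapshot.size()); // prints 1
  }
}
```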
-
-