Class DatasetSplitter
- java.lang.Object
  - org.apache.lucene.classification.utils.DatasetSplitter
public class DatasetSplitter extends java.lang.Object
Utility class for creating training / test / cross validation indexes from the original index.
-
-
Field Summary
Fields
Modifier and Type | Field | Description
private double | crossValidationRatio |
private double | testRatio |
-
Constructor Summary
Constructors
Constructor | Description
DatasetSplitter(double testRatio, double crossValidationRatio) | Create a DatasetSplitter by giving the sizes of the test and cross validation indexes
-
Method Summary
Modifier and Type | Method | Description
private Document | createNewDoc(IndexReader originalIndex, FieldType ft, ScoreDoc scoreDoc, java.lang.String[] fieldNames) |
void | split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, java.lang.String classFieldName, java.lang.String... fieldNames) | Split a given index into 3 indexes for training, test and cross validation tasks respectively
-
-
-
Constructor Detail
-
DatasetSplitter
public DatasetSplitter(double testRatio, double crossValidationRatio)
Create a DatasetSplitter by giving the sizes of the test and cross validation indexes.
- Parameters:
testRatio - the ratio of the original index to be used for the test index, as a double between 0.0 and 1.0
crossValidationRatio - the ratio of the original index to be used for the cross validation index, as a double between 0.0 and 1.0
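The two ratios leave the remainder of the original index for training. The following is a minimal sketch of that arithmetic, not Lucene code: the class SplitRatioSketch and its methods are hypothetical, and both the rounding down and the check that the ratios sum to at most 1.0 are assumptions of the sketch rather than documented behavior of DatasetSplitter.

```java
/** Hypothetical helper, not part of Lucene: shows how the two ratios divide an index. */
final class SplitRatioSketch {

    /** Checks that a ratio lies in [0.0, 1.0], the range the constructor parameters call for. */
    static double checkRatio(String name, double ratio) {
        if (Double.isNaN(ratio) || ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException(
                name + " must be between 0.0 and 1.0, got " + ratio);
        }
        return ratio;
    }

    /**
     * Returns {training, test, crossValidation} document counts for an index
     * holding totalDocs documents. Rounding down is an assumption of this sketch.
     */
    static int[] expectedSizes(int totalDocs, double testRatio, double crossValidationRatio) {
        checkRatio("testRatio", testRatio);
        checkRatio("crossValidationRatio", crossValidationRatio);
        if (testRatio + crossValidationRatio > 1.0) {
            // Assumption: a split that asks for more than the whole index is rejected.
            throw new IllegalArgumentException("ratios must not sum to more than 1.0");
        }
        int test = (int) Math.floor(totalDocs * testRatio);
        int crossValidation = (int) Math.floor(totalDocs * crossValidationRatio);
        int training = totalDocs - test - crossValidation; // whatever is left over
        return new int[] {training, test, crossValidation};
    }

    public static void main(String[] args) {
        int[] sizes = expectedSizes(1000, 0.25, 0.125);
        // prints "training=625 test=250 cv=125"
        System.out.println("training=" + sizes[0] + " test=" + sizes[1] + " cv=" + sizes[2]);
    }
}
```

With 1000 documents, a testRatio of 0.25 and a crossValidationRatio of 0.125, the sketch reserves 250 documents for test, 125 for cross validation, and leaves 625 for training.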
-
-
Method Detail
-
split
public void split(IndexReader originalIndex, Directory trainingIndex, Directory testIndex, Directory crossValidationIndex, Analyzer analyzer, boolean termVectors, java.lang.String classFieldName, java.lang.String... fieldNames) throws java.io.IOException
Split a given index into 3 indexes for training, test and cross validation tasks respectively.
- Parameters:
originalIndex - an IndexReader on the source index
trainingIndex - a Directory used to write the training index
testIndex - a Directory used to write the test index
crossValidationIndex - a Directory used to write the cross validation index
analyzer - Analyzer used to create the new docs
termVectors - true if term vectors should be kept
classFieldName - name of the field used as the label for classification; this must be indexed with sorted doc values
fieldNames - names of fields that need to be put in the new indexes, or null if all should be used
- Throws:
java.io.IOException - if any writing operation fails on any of the indexes
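To make the roles of the three targets and of fieldNames concrete, here is a rough in-memory model of a three-way split. It does not use Lucene: documents are plain maps, the three lists stand in for the three Directory arguments, and the class ThreeWaySplitSketch is hypothetical. The assignment order (test first, then cross validation, then training) is an assumption of the sketch and may differ from what DatasetSplitter actually does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of a three-way split; it does not use Lucene. */
final class ThreeWaySplitSketch {

    /** Copies a document, keeping only the requested fields; null keeps every field. */
    static Map<String, String> project(Map<String, String> doc, String... fieldNames) {
        if (fieldNames == null) {
            return new LinkedHashMap<>(doc);
        }
        Map<String, String> copy = new LinkedHashMap<>();
        for (String name : fieldNames) {
            if (doc.containsKey(name)) {
                copy.put(name, doc.get(name));
            }
        }
        return copy;
    }

    /**
     * Routes each document to exactly one target list. The test and cross
     * validation lists are filled up to their quotas first (an assumption of
     * this sketch); every remaining document goes to training.
     */
    static void split(List<Map<String, String>> docs,
                      List<Map<String, String>> training,
                      List<Map<String, String>> test,
                      List<Map<String, String>> crossValidation,
                      double testRatio,
                      double crossValidationRatio,
                      String... fieldNames) {
        int size = docs.size();
        for (Map<String, String> doc : docs) {
            Map<String, String> copy = project(doc, fieldNames);
            if (test.size() < size * testRatio) {
                test.add(copy);
            } else if (crossValidation.size() < size * crossValidationRatio) {
                crossValidation.add(copy);
            } else {
                training.add(copy);
            }
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", String.valueOf(i));
            doc.put("text", "body " + i);
            doc.put("category", i % 2 == 0 ? "even" : "odd");
            docs.add(doc);
        }
        List<Map<String, String>> training = new ArrayList<>();
        List<Map<String, String>> test = new ArrayList<>();
        List<Map<String, String>> crossValidation = new ArrayList<>();
        // Keep only the "text" and "category" fields in the copies.
        split(docs, training, test, crossValidation, 0.25, 0.25, "text", "category");
        // prints "training=4 test=2 cv=2"
        System.out.println("training=" + training.size() + " test=" + test.size()
            + " cv=" + crossValidation.size());
    }
}
```

Passing (String[]) null for fieldNames in this sketch copies every field, mirroring the "null if all should be used" convention described for the fieldNames parameter.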
-
createNewDoc
private Document createNewDoc(IndexReader originalIndex, FieldType ft, ScoreDoc scoreDoc, java.lang.String[] fieldNames) throws java.io.IOException
- Throws:
java.io.IOException