JavaML
Class DatasetDouble

java.lang.Object
  extended by JavaML.DatasetDouble
All Implemented Interfaces:
IDataset, IOnlineDataset

public class DatasetDouble
extends java.lang.Object
implements IDataset

An implementation of IDataset that stores its features as doubles.


Nested Class Summary
 
Nested classes/interfaces inherited from interface JavaML.IOnlineDataset
IOnlineDataset.DataType, IOnlineDataset.FeatureType
 
Constructor Summary
DatasetDouble(double[][] newFeatureMatrix, int[] newTargetVector)
          Constructor from data
DatasetDouble(double[][] newFeatureMatrix, int[] newTargetVector, double[] newDataWeights, int newTotalWeight)
          Constructor from weighted data
DatasetDouble(java.lang.String fileToLoad, boolean classIsFirst, boolean weightingIsFirst)
          Constructor from file
 
Method Summary
 IDataset getDataFeatureSubset(int[] featuresRequired)
          Returns a new IDataset which contains a subset of the features of the original
 double[][] getDataMatrixDouble()
           
 IDataset getDataSampleSubset(int[] samplesRequired)
          Returns a new IDataset which contains a subset of the samples of the original
 IOnlineDataset.DataType getDataType()
           
 double[] getDataWeights()
           
 double[] getNextDataPoint()
           
 int getNextTarget()
           
 int getNumberOfFeatures()
           
 int getNumberOfSamples()
           
 IDataset getSampledDataset(double[] distributionOverFeatures, int sampleSize)
          Returns a new IDataset that is a sampled version of the original, drawn according to the supplied distribution over examples
 double[] getSampleDouble(int sampleNumber, boolean hasTarget)
           
 int[] getTargetVector()
           
 int getTotalWeight()
           
 boolean hasData()
           
 void resetDataset()
           
 int returnCurrentDataSampleNumber()
           
 int returnCurrentTargetSampleNumber()
           
 IDataset returnSampledTestingSet(int foldNumber, int sampleSize)
          Returns a new dataset containing fold foldNumber, by sampling from the weight distribution
 IDataset returnSampledTrainingSet(int foldNumber, int sampleSize)
          Returns a new dataset without fold foldNumber, by sampling from the weight distribution
 IDataset returnWeightedTestingSet(int foldNumber)
          Returns a new dataset containing fold foldNumber
 IDataset returnWeightedTrainingSet(int foldNumber)
          Returns a new dataset without fold foldNumber
 void setDataWeights(double[] newWeights)
          Assigns the newWeights to the data weights.
 void setRandomSeed(long newSeed)
           
 void splitIntoFolds(int newNumberOfFolds)
          Prepares the dataset to be split into newNumberOfFolds folds; all previous splits are forgotten. MUST BE CALLED before using any of the other fold functions.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DatasetDouble

public DatasetDouble(java.lang.String fileToLoad,
                     boolean classIsFirst,
                     boolean weightingIsFirst)
Constructor from file

Parameters:
fileToLoad - is the path of the csv file to read in
classIsFirst - specifies whether the class label is the first (true) or the last (false) column of each row
weightingIsFirst - specifies whether the file contains example weights; if so, the weight column appears before the class label
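The parameter descriptions leave the exact column order open. The sketch below is a hypothetical parser for a single row under stated assumptions: values are comma-separated, the weight column (when present) sits immediately before the class label, and a missing weight defaults to 1.0. CsvRowSketch and its Row type are illustrative names, not part of JavaML, and this is not the library's parser.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of one possible CSV row layout for the file constructor.
 * ASSUMPTIONS: comma-separated values; any weight column sits immediately
 * before the class label; weight defaults to 1.0 when absent.
 */
final class CsvRowSketch {
    /** The parsed pieces of a single row. */
    static final class Row {
        final double[] features;
        final int label;
        final double weight;

        Row(double[] features, int label, double weight) {
            this.features = features;
            this.label = label;
            this.weight = weight;
        }
    }

    static Row parseRow(String line, boolean classIsFirst, boolean weightingIsFirst) {
        String[] cols = line.split(",");
        int metaColumns = weightingIsFirst ? 2 : 1;           // class label plus optional weight
        int numberOfFeatures = cols.length - metaColumns;
        int metaStart = classIsFirst ? 0 : numberOfFeatures;  // first non-feature column
        int featureStart = classIsFirst ? metaColumns : 0;

        // Assumed default weight of 1.0 when the file carries no weight column.
        double weight = weightingIsFirst ? Double.parseDouble(cols[metaStart].trim()) : 1.0;
        // The class label is the last of the non-feature columns.
        int label = Integer.parseInt(cols[metaStart + metaColumns - 1].trim());

        double[] features = new double[numberOfFeatures];
        for (int i = 0; i < numberOfFeatures; i++) {
            features[i] = Double.parseDouble(cols[featureStart + i].trim());
        }
        return new Row(features, label, weight);
    }

    public static void main(String[] args) {
        Row a = parseRow("0.5,1,2.0,3.0", true, true);   // weight, class, features...
        Row b = parseRow("2.0,3.0,0", false, false);     // features..., class
        System.out.println(Arrays.toString(a.features) + " label=" + a.label + " weight=" + a.weight);
        System.out.println(Arrays.toString(b.features) + " label=" + b.label + " weight=" + b.weight);
    }
}
```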

DatasetDouble

public DatasetDouble(double[][] newFeatureMatrix,
                     int[] newTargetVector)
Constructor from data

Parameters:
newFeatureMatrix - is the new feature matrix; it must contain the same number of examples as newTargetVector
newTargetVector - is the new list of targets

DatasetDouble

public DatasetDouble(double[][] newFeatureMatrix,
                     int[] newTargetVector,
                     double[] newDataWeights,
                     int newTotalWeight)
Constructor from weighted data

Parameters:
newFeatureMatrix - is the new feature matrix; it must contain the same number of examples as newTargetVector
newTargetVector - is the new list of targets
newDataWeights - is the new list of example weights
newTotalWeight - is the sum of all the weights

Method Detail

getDataMatrixDouble

public double[][] getDataMatrixDouble()
Specified by:
getDataMatrixDouble in interface IDataset

getTargetVector

public int[] getTargetVector()
Specified by:
getTargetVector in interface IDataset

getSampleDouble

public double[] getSampleDouble(int sampleNumber,
                                boolean hasTarget)
Specified by:
getSampleDouble in interface IDataset

getDataWeights

public double[] getDataWeights()
Specified by:
getDataWeights in interface IDataset

setDataWeights

public void setDataWeights(double[] newWeights)
Assigns newWeights as the data weights. The array is not copied, so later changes to newWeights are also seen by this dataset.

Specified by:
setDataWeights in interface IDataset
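Because setDataWeights keeps a reference to the caller's array rather than a copy, any later change to that array also changes the dataset's weights. The hypothetical WeightAliasingSketch below is a stand-in (not DatasetDouble) that contrasts storing the reference with storing a defensive copy.

```java
import java.util.Arrays;

/** Hypothetical stand-in contrasting reference assignment with a defensive copy. */
final class WeightAliasingSketch {
    private double[] weights;

    /** Stores the caller's array itself, as setDataWeights is documented to do. */
    void setWeightsNoCopy(double[] newWeights) {
        this.weights = newWeights;
    }

    /** Alternative: keeps a private copy, so later edits by the caller are not seen. */
    void setWeightsCopy(double[] newWeights) {
        this.weights = newWeights.clone();
    }

    double[] getWeights() {
        return weights;
    }

    public static void main(String[] args) {
        double[] w = {0.25, 0.25, 0.5};
        WeightAliasingSketch d = new WeightAliasingSketch();
        d.setWeightsNoCopy(w);
        w[0] = 0.0;                                          // mutate the caller's array afterwards
        System.out.println(Arrays.toString(d.getWeights())); // prints [0.0, 0.25, 0.5]
    }
}
```

If the caller needs to keep modifying its own array, passing a copy, for example dataset.setDataWeights(weights.clone()), avoids the aliasing.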

getTotalWeight

public int getTotalWeight()
Specified by:
getTotalWeight in interface IDataset

getDataFeatureSubset

public IDataset getDataFeatureSubset(int[] featuresRequired)
Returns a new IDataset which contains a subset of the features of the original

Specified by:
getDataFeatureSubset in interface IDataset
Parameters:
featuresRequired - is an array of the indices of the required features; every value must be less than numberOfFeatures
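Selecting a feature subset amounts to copying the chosen columns of every row. The helper below is a hypothetical sketch of that step on a plain double[][], assuming one row per example as returned by getDataMatrixDouble; it is not the JavaML implementation. The target vector would presumably be unchanged, since every example is kept.

```java
import java.util.Arrays;

/** Hypothetical sketch of column (feature) selection on a row-major matrix. */
final class FeatureSubsetSketch {
    static double[][] selectFeatures(double[][] matrix, int[] featuresRequired) {
        int numberOfFeatures = matrix.length == 0 ? 0 : matrix[0].length;
        // Enforce the documented precondition: every index must be < numberOfFeatures.
        for (int f : featuresRequired) {
            if (f < 0 || f >= numberOfFeatures) {
                throw new IllegalArgumentException("feature index out of range: " + f);
            }
        }
        double[][] out = new double[matrix.length][featuresRequired.length];
        for (int i = 0; i < matrix.length; i++) {
            for (int j = 0; j < featuresRequired.length; j++) {
                out[i][j] = matrix[i][featuresRequired[j]];   // copied values, not a view
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] m = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        System.out.println(Arrays.deepToString(selectFeatures(m, new int[] {2, 0})));
        // prints [[3.0, 1.0], [6.0, 4.0]]
    }
}
```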

getDataSampleSubset

public IDataset getDataSampleSubset(int[] samplesRequired)
Returns a new IDataset which contains a subset of the samples of the original

Specified by:
getDataSampleSubset in interface IDataset
Parameters:
samplesRequired - is an array of the indices of the required samples; every value must be less than numberOfSamples
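Selecting a sample subset keeps whole rows and must keep the targets aligned with them. This hypothetical sketch copies the chosen rows and their targets in the order given by samplesRequired; the class and helper names are illustrative only, not JavaML code.

```java
import java.util.Arrays;

/** Hypothetical sketch of row (sample) selection that keeps features and targets aligned. */
final class SampleSubsetSketch {
    static double[][] selectRows(double[][] matrix, int[] samplesRequired) {
        double[][] out = new double[samplesRequired.length][];
        for (int k = 0; k < samplesRequired.length; k++) {
            // Clone each row so the subset does not share storage with the original.
            out[k] = matrix[checkIndex(samplesRequired[k], matrix.length)].clone();
        }
        return out;
    }

    static int[] selectTargets(int[] targets, int[] samplesRequired) {
        int[] out = new int[samplesRequired.length];
        for (int k = 0; k < samplesRequired.length; k++) {
            out[k] = targets[checkIndex(samplesRequired[k], targets.length)];
        }
        return out;
    }

    /** Enforces the documented precondition: every index must be < numberOfSamples. */
    private static int checkIndex(int s, int numberOfSamples) {
        if (s < 0 || s >= numberOfSamples) {
            throw new IllegalArgumentException("sample index out of range: " + s);
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] m = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
        int[] y = {0, 1, 0};
        int[] pick = {2, 0};
        System.out.println(Arrays.deepToString(selectRows(m, pick)) + " "
                + Arrays.toString(selectTargets(y, pick)));
    }
}
```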

getSampledDataset

public IDataset getSampledDataset(double[] distributionOverFeatures,
                                  int sampleSize)
Returns a new IDataset that is a sampled version of the original, drawn according to the supplied distribution over examples

Specified by:
getSampledDataset in interface IDataset
Parameters:
distributionOverFeatures - is a distribution over the examples (not the features, despite the parameter name), with length equal to numberOfSamples
sampleSize - is the number of data points in the new IDataset
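One standard way to draw sampleSize examples from a distribution over examples is inverse-CDF sampling with replacement: build the running sum of the distribution, draw a uniform value, and binary-search for the first entry that exceeds it. The sketch below shows that technique on its own. It is an assumption that getSampledDataset samples with replacement, and WeightedSamplingSketch is a hypothetical name. The distribution need not sum to 1, because the uniform draw is scaled by the total.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Hypothetical sketch: draws indices with replacement, where example i is chosen
 * with probability distribution[i] / sum(distribution). Inverse-CDF method.
 */
final class WeightedSamplingSketch {
    static int[] sampleIndices(double[] distribution, int sampleSize, Random rng) {
        double[] cumulative = new double[distribution.length];
        double total = 0.0;
        for (int i = 0; i < distribution.length; i++) {
            if (distribution[i] < 0) {
                throw new IllegalArgumentException("negative probability at index " + i);
            }
            total += distribution[i];
            cumulative[i] = total;
        }
        if (!(total > 0)) {   // also rejects NaN
            throw new IllegalArgumentException("distribution must have positive total mass");
        }
        int[] picks = new int[sampleSize];
        for (int k = 0; k < sampleSize; k++) {
            double u;
            do {
                u = rng.nextDouble() * total;   // uniform in [0, total); redraw if rounding hits total
            } while (u >= total);
            // Smallest i with cumulative[i] > u; a zero-weight entry can never be the smallest.
            int lo = 0;
            int hi = distribution.length - 1;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (cumulative[mid] > u) {
                    hi = mid;
                } else {
                    lo = mid + 1;
                }
            }
            picks[k] = lo;
        }
        return picks;
    }

    public static void main(String[] args) {
        int[] picks = sampleIndices(new double[] {0.1, 0.0, 0.9}, 10, new Random(42L));
        System.out.println(Arrays.toString(picks));
    }
}
```

The returned indices could then be used to pick the corresponding rows and targets, as in a sample-subset step.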

getNextDataPoint

public double[] getNextDataPoint()
Specified by:
getNextDataPoint in interface IOnlineDataset

getNextTarget

public int getNextTarget()
Specified by:
getNextTarget in interface IOnlineDataset

returnCurrentDataSampleNumber

public int returnCurrentDataSampleNumber()
Specified by:
returnCurrentDataSampleNumber in interface IOnlineDataset

returnCurrentTargetSampleNumber

public int returnCurrentTargetSampleNumber()
Specified by:
returnCurrentTargetSampleNumber in interface IOnlineDataset

resetDataset

public void resetDataset()
Specified by:
resetDataset in interface IOnlineDataset
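The online methods above carry no descriptions on this page. The following is a hypothetical in-memory cursor that mirrors their names to illustrate one plausible reading: separate data and target cursors that each advance on every call, hasData reporting whether data points remain, and resetDataset rewinding both cursors. What returnCurrentDataSampleNumber and returnCurrentTargetSampleNumber actually count (points consumed, as assumed here, or the index of the last point returned) is not confirmed by this page.

```java
/** Hypothetical in-memory cursor mirroring the IOnlineDataset method names; not JavaML code. */
final class OnlineCursorSketch {
    private final double[][] data;
    private final int[] targets;
    private int dataCursor = 0;    // number of data points handed out so far
    private int targetCursor = 0;  // number of targets handed out so far

    OnlineCursorSketch(double[][] data, int[] targets) {
        if (data.length != targets.length) {
            throw new IllegalArgumentException("row count mismatch");
        }
        this.data = data;
        this.targets = targets;
    }

    boolean hasData() { return dataCursor < data.length; }

    // Throws ArrayIndexOutOfBoundsException if called when hasData() is false.
    double[] getNextDataPoint() { return data[dataCursor++]; }

    int getNextTarget() { return targets[targetCursor++]; }

    int returnCurrentDataSampleNumber() { return dataCursor; }

    int returnCurrentTargetSampleNumber() { return targetCursor; }

    void resetDataset() {
        dataCursor = 0;
        targetCursor = 0;
    }

    public static void main(String[] args) {
        OnlineCursorSketch ds = new OnlineCursorSketch(new double[][] {{1.0}, {2.0}}, new int[] {0, 1});
        while (ds.hasData()) {
            double[] x = ds.getNextDataPoint();
            int y = ds.getNextTarget();
            System.out.println(x[0] + " -> " + y);
        }
        ds.resetDataset();   // ready for another pass
    }
}
```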

splitIntoFolds

public void splitIntoFolds(int newNumberOfFolds)
Prepares the dataset to be split into newNumberOfFolds folds; all previous splits are forgotten. MUST BE CALLED before using any of the other fold functions.

Specified by:
splitIntoFolds in interface IDataset
Parameters:
newNumberOfFolds - is the number of folds required
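A common way to prepare k folds is to shuffle the example indices with a seeded random generator and deal them out round-robin, which keeps fold sizes within one of each other. The sketch below illustrates that approach. Whether splitIntoFolds shuffles at all, and whether setRandomSeed controls that shuffle, are assumptions; FoldAssignmentSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of random k-fold assignment; not the JavaML implementation. */
final class FoldAssignmentSketch {
    /** Returns foldOf[i] in [0, numberOfFolds) for each of numberOfSamples examples. */
    static int[] assignFolds(int numberOfSamples, int numberOfFolds, long seed) {
        if (numberOfFolds < 1 || numberOfFolds > numberOfSamples) {
            throw new IllegalArgumentException("need 1 <= numberOfFolds <= numberOfSamples");
        }
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numberOfSamples; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));      // same seed gives the same split
        int[] foldOf = new int[numberOfSamples];
        for (int pos = 0; pos < numberOfSamples; pos++) {
            foldOf[order.get(pos)] = pos % numberOfFolds;  // round-robin keeps sizes within one
        }
        return foldOf;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(assignFolds(10, 3, 42L)));
    }
}
```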

returnWeightedTrainingSet

public IDataset returnWeightedTrainingSet(int foldNumber)
Returns a new dataset without fold foldNumber

Specified by:
returnWeightedTrainingSet in interface IDataset
Parameters:
foldNumber - is the current testing fold

returnWeightedTestingSet

public IDataset returnWeightedTestingSet(int foldNumber)
Returns a new dataset containing fold foldNumber

Specified by:
returnWeightedTestingSet in interface IDataset
Parameters:
foldNumber - is the current testing fold
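Given a fold assignment, the testing set for foldNumber consists of the examples in that fold and the training set consists of every other example. The hypothetical sketch below computes both index lists and carries the original weights across. It assumes the "Weighted" variants keep every selected example with its existing weight, as opposed to the "Sampled" variants, which draw sampleSize examples from the weight distribution.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical sketch of selecting testing/training indices and weights for one fold. */
final class FoldSelectionSketch {
    /** Indices of the examples assigned to foldNumber (the testing set). */
    static int[] testingIndices(int[] foldOf, int foldNumber) {
        return IntStream.range(0, foldOf.length).filter(i -> foldOf[i] == foldNumber).toArray();
    }

    /** Indices of every example NOT in foldNumber (the training set). */
    static int[] trainingIndices(int[] foldOf, int foldNumber) {
        return IntStream.range(0, foldOf.length).filter(i -> foldOf[i] != foldNumber).toArray();
    }

    /** Carries the original weights of the selected examples over unchanged. */
    static double[] selectWeights(double[] weights, int[] indices) {
        return Arrays.stream(indices).mapToDouble(i -> weights[i]).toArray();
    }

    public static void main(String[] args) {
        int[] foldOf = {0, 1, 2, 0, 1};
        double[] w = {0.1, 0.2, 0.3, 0.2, 0.2};
        int[] test = testingIndices(foldOf, 0);
        int[] train = trainingIndices(foldOf, 0);
        System.out.println(Arrays.toString(test) + " " + Arrays.toString(train)
                + " " + Arrays.toString(selectWeights(w, test)));
    }
}
```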

returnSampledTrainingSet

public IDataset returnSampledTrainingSet(int foldNumber,
                                         int sampleSize)
Returns a new dataset without fold foldNumber, by sampling from the weight distribution

Specified by:
returnSampledTrainingSet in interface IDataset
Parameters:
foldNumber - is the current testing fold
sampleSize - is the number of examples in the new dataset

returnSampledTestingSet

public IDataset returnSampledTestingSet(int foldNumber,
                                        int sampleSize)
Returns a new dataset containing fold foldNumber, by sampling from the weight distribution

Specified by:
returnSampledTestingSet in interface IDataset
Parameters:
foldNumber - is the current testing fold
sampleSize - is the number of examples in the new dataset

setRandomSeed

public void setRandomSeed(long newSeed)
Specified by:
setRandomSeed in interface IOnlineDataset

getDataType

public IOnlineDataset.DataType getDataType()
Specified by:
getDataType in interface IOnlineDataset

getNumberOfFeatures

public int getNumberOfFeatures()
Specified by:
getNumberOfFeatures in interface IOnlineDataset

getNumberOfSamples

public int getNumberOfSamples()
Specified by:
getNumberOfSamples in interface IOnlineDataset

hasData

public boolean hasData()
Specified by:
hasData in interface IOnlineDataset