Gaia
gaia2::DataSet Class Reference

This class represents a dataset and all related information.

#include <dataset.h>

Inherits gaia2::PointArray.
Inherited by EmptyDataSet.

Public Member Functions

const QString & name () const
 Return the name of this dataset.
 
void setName (const QString &name)
 Set the name for this dataset.
 
const Point * point (const QString &id) const
 Find a point with a given ID.
 
Point * point (const QString &id)
 Find a point with a given ID.
 
bool contains (const QString &id) const
 Return whether this dataset contains a point with the given ID.
 
QStringList pointNames () const
 Return a list of the names of the points contained in this dataset.
 
const PointLayout & layout () const
 Return the layout of this dataset.
 
const PointLayout & originalLayout () const
 Return the original layout of this dataset.
 
void checkAllPointsShareSameLayout (const QVector< Point * > *points=0) const
 Check that all given points have the same layout object as this dataset.
 
const TransfoChain & history () const
 Return the history of this dataset (the list of all transformations that have been applied).
 
void setHistory (const TransfoChain &history)
 Set a predefined history for this dataset.
 
void forgetHistory ()
 Reset the history to an empty one without touching any of the points.
 
void simplifyHistory ()
 Simplify the history of the transformations to have it in a "normalized" state, which consists of at most a Remove transformation followed by a FixLength one.
 
void setReferenceDataSet (DataSet *dataset=0, bool checkOriginalLayout=true)
 Set the reference dataset, ie: the one used for fetching the values when doing filtered queries.
 
const DataSet * referenceDataSet () const
 Return the reference dataset.
 
void addPoint (const Point *point)
 Add the specified point to this dataset.
 
void addPoints (const QVector< Point * > &points)
 Add the specified points to this dataset.
 
void appendDataSet (const DataSet *dataset)
 Append the points from the second dataset to the first one.
 
void removePoint (const QString &id)
 Remove a single point from the dataset given its ID.
 
void removePoints (const QList< QString > &ids)
 Remove a list of points from the dataset given their IDs.
 
void addView (View *view)
 Register a view on this dataset.
 
void removeView (View *view)
 Remove a View from the list of registered views for this dataset.
 
DataSet * copy () const
 Performs a deep copy of this dataset (ie: the contained points are copied as well).
 
void load (const QString &filename, int start=0, int end=-1)
 Load a dataset from disk.
 
void loadNthPart (const QString &filename, int idx=0, int total=1)
 Load the n-th part of a dataset from disk.
 
void save (const QString &filename) const
 Save this dataset to disk.
 
void fromBase64 (const std::string &data)
 Load a DataSet from its base64 representation.
 
void fromBase64 (const QByteArray &data)
 Load a DataSet from its base64 representation.
 
std::string toBase64 () const
 Return a base64 representation for this DataSet.
 
Public Member Functions inherited from gaia2::PointArray
 PointArray (int n=0, bool ownsMemory=true)
 
 PointArray (bool ownsMemory)
 
void clear ()
 Delete the points this array contains (if it owns them), then resize the array to 0.
 
int totalSegments () const
 Returns the total number of segments in this PointArray (the sum of the number of segments for each point in the array).
 
const Point * samplePoint () const
 Returns any single point from the PointArray.
 

Static Public Member Functions

static DataSet * mergeFiles (const QMap< QString, QString > &sigfiles, const QStringList &descsSelect=QStringList()<< "*", const QStringList &descsExclude=QStringList(), int start=0, int end=10000000, PointLayout *reflayout=0)
 Take a map of (pointID, filename) pairs, merge the signature files into a single dataset, and return it.
 

Public Attributes

QReadWriteLock lock
 A lock available for users to take, if the dataset is to be used in a multi-threaded context.
 

Protected Member Functions

int binarySearch (const QString &id, int start, int end) const
 Looks for point with name id, between indices start and end included.
 
int pointIndex (const QString &id) const
 Returns the index of the point with the given name.
 
void clear ()
 
int load (QDataStream &in, int start=0, int end=-1, bool readAllPointsFromStream=false)
 Returns the number of points in the dataset (not the number of points loaded).
 
void setLayoutIfEmpty (const Point *point)
 
void invalidateViews ()
 
void modify ()
 
void setHistoryNoCheck (const TransfoChain &history)
 
void addTransformation (const Transformation &transfo)
 
bool consistentLinks () const
 
void forceUnlinkReferringDataSets ()
 
void unifyLayout ()
 Make all the points in this DataSet share the same layout.
 
void checkUniqueIDs ()
 Checks that all point names are unique (ie: there are no duplicates in this dataset).
 
void checkUniqueIDsFrom (const QVector< Point * > &v)
 Checks that all point names inside v are unique and also that none of them is already in this dataset.
 
void addPoints (const QVector< Point * > &points, bool layoutCheck, bool transformPoint, bool checkUnique=true, bool takeOwnership=false, bool relaySignal=true)
 Adds the given points to this dataset.
 
void removePoints (const QList< QString > &ids, bool relaySignal)
 
void sortPoints (int pivotIdx=-1)
 
 DataSet (const DataSet &rhs)
 
DataSet & operator= (const DataSet &rhs)
 
void resize (int n)
 

Protected Attributes

QString _name
 Represents the name of the dataset, which should be a short way to describe its function or where it comes from, its purpose, etc...
 
TransfoChain _history
 This represents the history of transformations that have been applied to this dataset, and also contains all the parameters to allow mapping a point from the original dataset space into the space this dataset is in.
 
PointLayout _layout
 This represents the common layout of all points contained in this dataset and provides functions for retrieving the physical location of a descriptor given its name, and reciprocally, retrieving the name of a descriptor given its physical location.
 
QList< DataSet * > _linkedDataSets
 This list contains all datasets linked to this one, like when a dataset is referencing another one for the original values used when filtering.
 
QList< View * > _linkedViews
 This list contains all Views linked to this DataSet, ie: the Views using points from this dataset to do their queries.
 
bool _isDataSorted
 
Protected Attributes inherited from gaia2::PointArray
bool _ownsMemory
 

Friends

class Applier
 
DataSet * mergeDataSets (const DataSet *ds1, const DataSet *ds2)
 Merges two datasets together, provided that their layouts don't overlap, and returns the resulting dataset.
 
QDataStream & operator<< (QDataStream &out, const DataSet &dataset)
 
QDataStream & operator>> (QDataStream &in, DataSet &dataset)
 

Detailed Description

This class represents a dataset and all related information.

A DataSet is a set of points which all share the same structure. The points are sorted internally to allow fast lookups in O(log(N)).
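
The O(log(N)) lookup follows from keeping the points sorted by ID and searching them with a binary search. The sketch below illustrates the idea with standard C++ only; it is not Gaia's implementation. `NamedPoint` and `findPointIndex` are hypothetical names, and unlike `pointIndex()`, which throws a GaiaException, this version returns -1 when the ID is missing.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a point: only the ID matters for lookup.
struct NamedPoint {
    std::string id;
};

// Return the index of the point named `id`, or -1 if it is absent.
// Precondition: `points` is sorted by ascending ID.
int findPointIndex(const std::vector<NamedPoint>& points, const std::string& id) {
    // Debug-only sanity check; it is O(N), so it disappears with NDEBUG.
    assert(std::is_sorted(points.begin(), points.end(),
                          [](const NamedPoint& a, const NamedPoint& b) { return a.id < b.id; }));

    // Binary search for the first point whose ID is not less than `id`.
    auto it = std::lower_bound(points.begin(), points.end(), id,
                               [](const NamedPoint& p, const std::string& key) { return p.id < key; });
    if (it == points.end() || it->id != id) {
        return -1;
    }
    return static_cast<int>(it - points.begin());
}
```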

The structure contains information such as dimension names, a pointer to the original dataset, the history of the applied transformations, and a point layout which maps: names of dimensions <-> indices inside the point data.

The DataSet also provides methods for adding & removing points, merging with another DataSet, and serialization functions.

Member Function Documentation

void DataSet::addPoint ( const Point *  point)

Add the specified point to this dataset.

The dataset makes a copy of the point for its own use, so no ownership is taken. If you don't want the point anymore after having added it to the dataset, it is your responsibility to free the memory for it.

Exceptions
GaiaException  if there was already a point with the same ID in this dataset.

void DataSet::addPoints ( const QVector< Point * > &  points)

Add the specified points to this dataset.

The dataset makes a copy of the points for its own use, so no ownership is taken. This method is equivalent to calling addPoint repeatedly, but is much faster.

Exceptions
GaiaException  if adding all points would result in a dataset with duplicates.

Referenced by addPoints(), gaia2::Applier::addPointsNoLayoutCheck(), Cyclops::getPoints(), gaia2::mergeDataSets(), and mergeFiles().
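
The duplicate check that guards a batch insertion can be sketched as follows: every incoming ID must be new to the dataset and must not repeat inside the batch. This is a standalone illustration using standard C++, not Gaia's code; `checkUniqueIds` is a hypothetical name, and `std::runtime_error` stands in for GaiaException.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Throw if `batch` contains an ID that is already in `existing`
// or that appears more than once in `batch` itself.
// Nothing is modified, so a failed check leaves the caller's data intact.
void checkUniqueIds(const std::set<std::string>& existing,
                    const std::vector<std::string>& batch) {
    std::set<std::string> seen;
    for (const std::string& id : batch) {
        if (existing.count(id) != 0) {
            throw std::runtime_error("point already in dataset: " + id);
        }
        if (!seen.insert(id).second) {
            throw std::runtime_error("duplicate point in batch: " + id);
        }
    }
}
```

Running the check over the whole batch before inserting anything is one way to avoid ending up with a partially extended dataset.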

void DataSet::addPoints ( const QVector< Point * > &  points,
bool  layoutCheck,
bool  transformPoint,
bool  checkUnique = true,
bool  takeOwnership = false,
bool  relaySignal = true 
)
protected

Adds the given points to this dataset.

Parameters
layoutCheck  whether to check that the layouts of all points are compatible with that of the dataset. In case a point has a layout which is incompatible, it will throw an exception and no points will have been added.
transformPoint  whether to apply the history of transformations to the points we're adding, or to insert them directly as is in the dataset. In the former case, the layouts (if checked) need to be the same as the original layout of the dataset; in the latter, they need to be the same as the current layout of the dataset.
checkUnique  whether to check for uniqueness condition in point names. A dataset is only valid if all points inside it have different names, so this makes sure that after adding the given points, we still have a valid dataset. It throws an exception otherwise.
takeOwnership  whether this method needs to make a copy of the given points or not. WARNING: if takeOwnership = true, there is no guarantee that the pointers in the vector are still valid after this call (ie: not only should you not delete them, but you should also stop using them directly afterwards).
relaySignal  whether this should be applied to all linked datasets or only to this one.

References addPoints(), gaia2::PointLayout::canMorphInto(), gaia2::Point::layout(), gaia2::PointLayout::morphPoint(), gaia2::Point::name(), gaia2::Point::setLayout(), gaia2::Point::switchLayout(), and gaia2::PointLayout::symmetricDifferenceWith().

void DataSet::addView ( View *  view)

Register a view on this dataset.

Registered Views are notified when the underlying dataset changes (ie: points are added, removed, ...).

void DataSet::appendDataSet ( const DataSet *  dataset)

Append the points from the second dataset to the first one.

They must have the same layout and transformation history for this to work.

Exceptions
GaiaException  if there were duplicate IDs in the two datasets, if the layouts were not the same, or if the transformation histories were not the same.

References history(), and layout().

void DataSet::checkAllPointsShareSameLayout ( const QVector< Point * > *  points = 0) const

Check that all given points have the same layout object as this dataset.

If no points are given, it will take those from this dataset.

Exceptions
GaiaException  if there are some points with a different layout object

References gaia2::Point::layout().

Referenced by gaia2::Analyzer::checkDataSet(), gaia2::Applier::checkLayout(), and mergeFiles().

void DataSet::checkUniqueIDs ( )
protected

Checks that all point names are unique (ie: there are no duplicates in this dataset).

A dataset with duplicate IDs is invalid and can lead to crashes.

Exceptions
GaiaException  if multiple points were found with the same ID.

void DataSet::checkUniqueIDs ( )
protected

Checks that all point names inside v are unique and also that none of them is already in this dataset.

As DataSet derives from QVector<Point*> you can also pass a DataSet instance to this method.

Exceptions
GaiaException  if either the given list of points contains duplicates or if one of them was found in this dataset.

void DataSet::load ( const QString &  filename,
int  start = 0,
int  end = -1 
)

Load a dataset from disk.

A value of end < 0 means that we should load all the points.

Parameters
filename  the path to the dataset file
start  index of the first point to be loaded
end  the index of the last point to be loaded + 1 (ie: past iterator)

Referenced by Cyclops::load(), and mergeFiles().
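
The start and end parameters describe a half-open range [start, end) of point indices, with a negative end meaning "up to the last point". A minimal sketch of that convention in standard C++ follows. `resolveLoadRange` is a hypothetical helper, not part of Gaia, and clamping an oversized end to the number of points is an assumption made here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Resolve (start, end) against a file that holds `total` points.
// Returns the half-open range [first, last) of point indices to load.
std::pair<int, int> resolveLoadRange(int total, int start = 0, int end = -1) {
    assert(total >= 0 && start >= 0);
    // end < 0 selects everything; clamping a too-large end is an assumption.
    int last = (end < 0) ? total : std::min(end, total);
    int first = std::min(start, last);
    return {first, last};
}
```

With these semantics, `last - first` is the number of points actually loaded.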

void DataSet::loadNthPart ( const QString &  filename,
int  idx = 0,
int  total = 1 
)

Load the n-th part of a dataset from disk.

Parameters
filename  the path to the dataset file
idx  the index of the part to be loaded (0 <= idx < total)
total  the number of parts in which the dataset should be split.

Referenced by Cyclops::loadNthPart().
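
One way to picture the split is as `total` contiguous, non-overlapping index ranges that together cover every point. The sketch below computes such a range for part `idx` in standard C++. How Gaia actually distributes the points between parts is not stated here, so the even split used by the hypothetical `nthPartRange` is an assumption.

```cpp
#include <cassert>
#include <utility>

// Return the half-open range [first, last) of point indices that make up
// part `idx` when `numPoints` points are split into `total` parts.
// The even split is an illustrative assumption, not Gaia's documented rule.
std::pair<int, int> nthPartRange(int numPoints, int idx = 0, int total = 1) {
    assert(numPoints >= 0 && total > 0 && idx >= 0 && idx < total);
    long long n = numPoints;  // widen to avoid overflow in n * idx
    int first = static_cast<int>(n * idx / total);
    int last = static_cast<int>(n * (idx + 1) / total);
    return {first, last};
}
```

Because part `idx` ends exactly where part `idx + 1` begins, loading all parts in turn visits each point once.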

const Point * DataSet::point ( const QString &  id) const

Find a point with a given ID.

Returns
the point with the given ID
Exceptions
GaiaException  when the point was not found in the dataset

Referenced by Cyclops::chainedSearch(), gaia2::RCA::computeCovarianceMatrix(), and Cyclops::getPoints().

Point * DataSet::point ( const QString &  id)

Find a point with a given ID.

Returns
the point with the given ID
Exceptions
GaiaException  when the point was not found in the dataset

int DataSet::pointIndex ( const QString &  id) const
protected

Returns the index of the point with the given name.

Exceptions
GaiaException  if the point name could not be found.

References gaia2::binarySearch().

const DataSet * DataSet::referenceDataSet ( ) const

Return the reference dataset.

A dataset always has a reference dataset, where it looks for the values used when filtering queries. If no dataset has been set as a reference dataset, the current dataset will be used.

Referenced by Cyclops::chainedSearch(), gaia2::PointArray::samplePoint(), and gaia2::BaseView< DataSetType, PointType, SearchPointType, DistanceType >::validate().
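
The fallback rule ("no reference set means the dataset is its own reference") can be pictured with a small standalone class. `MiniDataSet` is hypothetical and written in standard C++; it ignores the linked-dataset bookkeeping and layout checks that the real class performs.

```cpp
#include <cassert>

// Hypothetical miniature of the reference-dataset fallback.
class MiniDataSet {
public:
    // A null pointer (or no argument) makes this dataset its own reference.
    void setReference(MiniDataSet* ref = nullptr) { reference_ = ref; }

    // Never returns null: falls back to `this` when no reference was set.
    const MiniDataSet* reference() const { return reference_ ? reference_ : this; }

private:
    MiniDataSet* reference_ = nullptr;
};
```

Callers can therefore use the returned pointer without a null check.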

void DataSet::removePoint ( const QString &  id)

Remove a single point from the dataset given its ID.

Exceptions
GaiaException  if no point could be found with this ID.

void DataSet::removePoints ( const QList< QString > &  ids)

Remove a list of points from the dataset given their IDs.

This is much faster than calling DataSet::removePoint() repeatedly.

Exceptions
GaiaException  if at least one ID could not be found. In that case, no points will have been removed.

References removePoints().

Referenced by removePoints().
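
The "no points will have been removed" guarantee can be obtained by validating every ID before erasing anything. The sketch below shows that two-pass pattern in standard C++. It is an illustration rather than Gaia's code: `removeAllOrNothing` is a hypothetical name and `std::out_of_range` stands in for GaiaException.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Remove every ID in `toRemove` from `ids`, or throw and leave `ids` unchanged.
void removeAllOrNothing(std::set<std::string>& ids,
                        const std::vector<std::string>& toRemove) {
    // Pass 1: validate. Nothing has been modified if this throws.
    for (const std::string& id : toRemove) {
        if (ids.count(id) == 0) {
            throw std::out_of_range("no point with ID: " + id);
        }
    }
    // Pass 2: commit. Every ID is known to be present.
    for (const std::string& id : toRemove) {
        ids.erase(id);
    }
}
```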

void DataSet::setHistory ( const TransfoChain &  history)

Set a predefined history for this dataset.

This only works on empty datasets, as it is forbidden to change the history of preexisting points/datasets.

This can be useful in the case where you copy points from one dataset into another, and want to preserve their history.

void DataSet::setReferenceDataSet ( DataSet *  dataset = 0,
bool  checkOriginalLayout = true 
)

Set the reference dataset, ie: the one used for fetching the values when doing filtered queries.

Passing it a null pointer (or no argument) will set the calling dataset as a reference dataset. It is highly recommended to always check for the original layout, as it is impossible to add a point to 2 linked datasets that don't have the same original layout.

References _linkedDataSets, and gaia2::checkIsomorphDataSets().

Referenced by gaia2::PointArray::samplePoint().

void DataSet::simplifyHistory ( )

Simplify the history of the transformations to have it in a "normalized" state, which consists of at most a Remove transformation followed by a FixLength one.

You can only call this method on a DataSet whose history contains the following allowed transformations: [ Select, Remove, RemoveVL, Cleaner, FixLength ].

References _linkedDataSets, gaia2::PointLayout::copy(), gaia2::PointLayout::descriptorLocation(), gaia2::PointLayout::descriptorNames(), gaia2::Region::dimension(), gaia2::PointLayout::fixLength(), gaia2::PointLayout::remove(), and gaia2::PointLayout::symmetricDifferenceWith().

Member Data Documentation

TransfoChain gaia2::DataSet::_history
protected

This represents the history of transformations that have been applied to this dataset, and also contains all the parameters to allow mapping a point from the original dataset space into the space this dataset is in.

For more information on this structure, refer to the Transformation class.

Referenced by copy(), and mergeFiles().

QList<DataSet*> gaia2::DataSet::_linkedDataSets
protected

This list contains all datasets linked to this one, like when a dataset is referencing another one for the original values used when filtering.

This is necessary, because when we add a point in a dataset, we need to add it as well in the referenced datasets and further in all the datasets referencing it, etc... So we need to create a set of linked datasets, which will be stored entirely in each one of these linked datasets. The first element of this list holds a specific role as well: it is the dataset which holds the reference values used when filtering.

Referenced by setReferenceDataSet(), and simplifyHistory().

QList<View*> gaia2::DataSet::_linkedViews
protected

This list contains all Views linked to this DataSet, ie: the Views using points from this dataset to do their queries.

We need this because adding points invalidates the Views pointing to the dataset, so we need a way to inform them.

QString gaia2::DataSet::_name
protected

Represents the name of the dataset, which should be a short way to describe its function or where it comes from, its purpose, etc...

This information is only used for debugging purposes.

Referenced by copy(), and mergeFiles().

