Gaia
This class represents a dataset and all related information.
#include <dataset.h>
Public Member Functions | |
const QString & | name () const |
Return the name of this dataset. | |
void | setName (const QString &name) |
Set the name for this dataset. | |
const Point * | point (const QString &id) const |
Find a point with a given ID. | |
Point * | point (const QString &id) |
Find a point with a given ID. | |
bool | contains (const QString &id) const |
Return whether this dataset contains a point with the given ID. | |
QStringList | pointNames () const |
Return a list of the names of the points contained in this dataset. | |
const PointLayout & | layout () const |
Return the layout of this dataset. | |
const PointLayout & | originalLayout () const |
Return the original layout of this dataset. | |
void | checkAllPointsShareSameLayout (const QVector< Point * > *points=0) const |
Check that all given points have the same layout object as this dataset. | |
const TransfoChain & | history () const |
Return the history of this dataset (the list of all transformations that have been applied). | |
void | setHistory (const TransfoChain &history) |
Set a predefined history for this dataset. | |
void | forgetHistory () |
Reset the history to an empty one without touching any of the points. | |
void | simplifyHistory () |
Simplify the history of the transformations to have it in a "normalized" state, which consists at most of a Remove transformation followed by a FixLength one. | |
void | setReferenceDataSet (DataSet *dataset=0, bool checkOriginalLayout=true) |
Set the reference dataset, ie: the one used for fetching the values when doing filtered queries. | |
const DataSet * | referenceDataSet () const |
Return the reference dataset. | |
void | addPoint (const Point *point) |
Add the specified point to this dataset. | |
void | addPoints (const QVector< Point * > &points) |
Add the specified points to this dataset. | |
void | appendDataSet (const DataSet *dataset) |
Append the points from the given dataset to this one. | |
void | removePoint (const QString &id) |
Remove a single point from the dataset given its ID. | |
void | removePoints (const QList< QString > &ids) |
Remove a list of points from the dataset given their IDs. | |
void | addView (View *view) |
Register a view on this dataset. | |
void | removeView (View *view) |
Remove a View from the list of registered views for this dataset. | |
DataSet * | copy () const |
Performs a deep copy of this dataset (ie: the contained points are copied as well). | |
void | load (const QString &filename, int start=0, int end=-1) |
Load a dataset from disk. | |
void | loadNthPart (const QString &filename, int idx=0, int total=1) |
Load the n-th part of a dataset from disk. | |
void | save (const QString &filename) const |
Save this dataset to disk. | |
void | fromBase64 (const std::string &data) |
Load a DataSet from its base64 representation. | |
void | fromBase64 (const QByteArray &data) |
Load a DataSet from its base64 representation. | |
std::string | toBase64 () const |
Return a base64 representation for this DataSet. | |
Public Member Functions inherited from PointArray | |
PointArray (int n=0, bool ownsMemory=true) | |
PointArray (bool ownsMemory) | |
void | clear () |
Delete the points this array contains (if it owns them), then resize the array to 0. | |
int | totalSegments () const |
Returns the total number of segments in this PointArray (the sum of the number of segments for each point in the array). | |
const Point * | samplePoint () const |
Returns any single point from the PointArray. | |
Static Public Member Functions | |
static DataSet * | mergeFiles (const QMap< QString, QString > &sigfiles, const QStringList &descsSelect=QStringList()<< "*", const QStringList &descsExclude=QStringList(), int start=0, int end=10000000, PointLayout *reflayout=0) |
Takes a map of (pointID, filename) pairs, merges the signature files into a single dataset, and returns it. | |
Public Attributes | |
QReadWriteLock | lock |
A lock available for users to take, if the dataset is to be used in a multi-threaded context. | |
Protected Member Functions | |
int | binarySearch (const QString &id, int start, int end) const |
Looks for point with name id, between indices start and end included. | |
int | pointIndex (const QString &id) const |
Returns the index of the point with the given name. | |
void | clear () |
int | load (QDataStream &in, int start=0, int end=-1, bool readAllPointsFromStream=false) |
Returns the number of points in the dataset (not the number of points loaded). | |
void | setLayoutIfEmpty (const Point *point) |
void | invalidateViews () |
void | modify () |
void | setHistoryNoCheck (const TransfoChain &history) |
void | addTransformation (const Transformation &transfo) |
bool | consistentLinks () const |
void | forceUnlinkReferringDataSets () |
void | unifyLayout () |
Make all the points in this DataSet share the same layout. | |
void | checkUniqueIDs () |
Checks that all point names are unique (ie: there are no duplicates in this dataset). | |
void | checkUniqueIDsFrom (const QVector< Point * > &v) |
Checks that all point names inside v are unique and also that none of them is already in this dataset. | |
void | addPoints (const QVector< Point * > &points, bool layoutCheck, bool transformPoint, bool checkUnique=true, bool takeOwnership=false, bool relaySignal=true) |
Adds the given points to this dataset. | |
void | removePoints (const QList< QString > &ids, bool relaySignal) |
void | sortPoints (int pivotIdx=-1) |
DataSet (const DataSet &rhs) | |
DataSet & | operator= (const DataSet &rhs) |
void | resize (int n) |
Protected Attributes | |
QString | _name |
Represents the name of the dataset, which should be a short way to describe its function or where it comes from, its purpose, etc... | |
TransfoChain | _history |
This represents the history of transformations that have been applied to this dataset, and also contains all the parameters to allow mapping a point from the original dataset space into the space this dataset is in. | |
PointLayout | _layout |
This represents the common layout of all points contained in this dataset and provides functions for retrieving the physical location of a descriptor given its name, and reciprocally, retrieving the name of a descriptor given its physical location. | |
QList< DataSet * > | _linkedDataSets |
This list contains all datasets linked to this one, like when a dataset is referencing another one for the original values used when filtering. | |
QList< View * > | _linkedViews |
This list contains all Views linked to this DataSet, ie: the Views using points from this dataset to do their queries. | |
bool | _isDataSorted |
Protected Attributes inherited from PointArray | |
bool | _ownsMemory |
Friends | |
class | Applier |
DataSet * | mergeDataSets (const DataSet *ds1, const DataSet *ds2) |
Merges two datasets together, provided that their layouts don't overlap, and returns the resulting dataset. | |
QDataStream & | operator<< (QDataStream &out, const DataSet &dataset) |
QDataStream & | operator>> (QDataStream &in, DataSet &dataset) |
This class represents a dataset and all related information.
A DataSet is a set of points which all share the same structure. The points are sorted internally to allow fast lookups in O(log(N)).
The structure contains information such as dimension names, a pointer to the original dataset, the history of the applied transformations, and a point layout which maps: names of dimensions <-> indices inside the point data.
The DataSet also provides methods for adding & removing points, merging with another DataSet, and serialization functions.
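The O(log(N)) lookup mentioned above follows from keeping the points sorted by ID. As a rough illustration only, here is a minimal standard-library sketch of such a lookup; `NamedPoint` and `findSortedIndex` are hypothetical names invented for this example and are not part of Gaia, and the exception type merely stands in for GaiaException.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a point: only its ID matters for the lookup.
struct NamedPoint {
    std::string name;
};

// Return the index of the point called `id` in a vector sorted by name.
// Throws std::out_of_range when the ID is absent (stand-in for GaiaException).
inline int findSortedIndex(const std::vector<NamedPoint>& sorted, const std::string& id) {
    // Binary search is only valid on sorted input.
    assert(std::is_sorted(sorted.begin(), sorted.end(),
                          [](const NamedPoint& a, const NamedPoint& b) { return a.name < b.name; }));
    // lower_bound performs O(log N) comparisons on a random-access range.
    auto it = std::lower_bound(
        sorted.begin(), sorted.end(), id,
        [](const NamedPoint& p, const std::string& key) { return p.name < key; });
    if (it == sorted.end() || it->name != id) {
        throw std::out_of_range("no point with ID: " + id);
    }
    return static_cast<int>(it - sorted.begin());
}
```

The same idea explains why insertion has to keep the container sorted: a lookup on unsorted data would silently miss existing IDs.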
void DataSet::addPoint | ( | const Point * | point | ) |
Add the specified point to this dataset.
The dataset makes a copy of the point for its own use, so no ownership is taken. If you don't want the point anymore after having added it to the dataset, it is your responsibility to free the memory for it.
GaiaException | if there was already a point with the same ID in this dataset. |
void DataSet::addPoints | ( | const QVector< Point * > & | points | ) |
Add the specified points to this dataset.
The dataset makes a copy of the points for its own use, so no ownership is taken. This method is equivalent to calling addPoint repeatedly, but is much faster.
GaiaException | if adding all points would result in a dataset with duplicates. |
Referenced by addPoints(), gaia2::Applier::addPointsNoLayoutCheck(), Cyclops::getPoints(), gaia2::mergeDataSets(), and mergeFiles().
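void DataSet::addPoints | ( | const QVector< Point * > & | points, bool layoutCheck, bool transformPoint, bool checkUnique = true, bool takeOwnership = false, bool relaySignal = true | ) |
protected |
Adds the given points to this dataset.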
layoutCheck | whether to check that the layouts of all points are compatible with that of the dataset. In case a point has a layout which is incompatible, it will throw an exception and no points will have been added. |
transformPoint | whether to apply the history of transformations to the points we're adding, or to insert them directly as is in the dataset. In the former case, the layouts (if checked) need to be the same as the original layout of the dataset; in the latter, they need to be the same as the current layout of the dataset. |
checkUnique | whether to check for uniqueness condition in point names. A dataset is only valid if all points inside it have different names, so this makes sure that after adding the given points, we still have a valid dataset. It throws an exception otherwise. |
takeOwnership | whether this method needs to make a copy of the given points or not. WARNING: if takeOwnership = true, there is no guarantee that the pointers in the vector are still valid after this call (ie: not only should you not delete them, but you should also stop using them directly afterwards). |
relaySignal | whether this should be applied to all linked datasets or only to this one. |
References addPoints(), gaia2::PointLayout::canMorphInto(), gaia2::Point::layout(), gaia2::PointLayout::morphPoint(), gaia2::Point::name(), gaia2::Point::setLayout(), gaia2::Point::switchLayout(), and gaia2::PointLayout::symmetricDifferenceWith().
void DataSet::addView | ( | View * | view | ) |
Register a view on this dataset.
Registered Views are notified when the underlying dataset changes (ie: points are added, removed, ...).
void DataSet::appendDataSet | ( | const DataSet * | dataset | ) |
Append the points from the given dataset to this one.
Both datasets must have the same layout and transformation history for this to work.
GaiaException | if there were duplicate IDs in the two datasets, if the layouts were not the same or if the transformation histories were not the same. |
void DataSet::checkAllPointsShareSameLayout | ( | const QVector< Point * > * | points = 0 | ) | const |
Check that all given points have the same layout object as this dataset.
If no points are given, it will take those from this dataset.
GaiaException | if there are some points with a different layout object |
References gaia2::Point::layout().
Referenced by gaia2::Analyzer::checkDataSet(), gaia2::Applier::checkLayout(), and mergeFiles().
void DataSet::checkUniqueIDs | ( | ) |
protected |
Checks that all point names are unique (ie: there are no duplicates in this dataset).
A dataset with duplicate IDs is invalid and can lead to crashes.
GaiaException | if multiple points were found with the same ID. |
void DataSet::checkUniqueIDsFrom | ( | const QVector< Point * > & | v | ) |
protected |
Checks that all point names inside v are unique and also that none of them is already in this dataset.
As DataSet derives from QVector<Point*> you can also pass a DataSet instance to this method.
GaiaException | if either the given list of points contains duplicates or if one of them was found in this dataset. |
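The two conditions described here (no duplicates within the incoming points, and no clash with points already stored) can be checked in a single pass. The following is a minimal sketch under stated assumptions: it works on plain ID strings rather than Gaia points, `checkUniqueAgainst` is a hypothetical name, and `std::invalid_argument` stands in for GaiaException.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Throw if `incoming` contains a duplicate ID, or an ID already in `existing`.
inline void checkUniqueAgainst(const std::vector<std::string>& existing,
                               const std::vector<std::string>& incoming) {
    std::set<std::string> seen(existing.begin(), existing.end());
    // A valid dataset never holds duplicate IDs to begin with.
    assert(seen.size() == existing.size());
    for (const std::string& id : incoming) {
        // insert() reports false when the ID was already seen, which covers
        // both an internal duplicate and a clash with an existing point.
        if (!seen.insert(id).second) {
            throw std::invalid_argument("duplicate point ID: " + id);
        }
    }
}
```

Seeding the set with the existing IDs is what lets one loop cover both failure cases.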
void DataSet::load | ( | const QString & | filename, int start = 0, int end = -1 | ) |
Load a dataset from disk.
A value of end < 0 means that we should load all the points.
filename | the path to the dataset file |
start | index of the first point to be loaded |
end | one past the index of the last point to be loaded (ie: a past-the-end index) |
Referenced by Cyclops::load(), and mergeFiles().
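To make the start/end convention concrete, the sketch below resolves a requested range into a half-open interval [first, last). It is an illustration only: `resolveLoadRange` is a hypothetical helper, and clamping out-of-range values to the number of available points is an assumption of this example, not documented Gaia behaviour.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Resolve a (start, end) request against a file holding `available` points.
// end < 0 means "load up to the last point"; otherwise `end` is one past the
// last index to load. Clamping to [0, available] is an assumption.
inline std::pair<int, int> resolveLoadRange(int start, int end, int available) {
    assert(available >= 0);
    const int first = std::max(0, std::min(start, available));
    const int last = (end < 0) ? available
                               : std::max(first, std::min(end, available));
    return {first, last};
}
```

With this convention the number of points loaded is simply `last - first`, and the defaults `start = 0, end = -1` select the whole file.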
void DataSet::loadNthPart | ( | const QString & | filename, int idx = 0, int total = 1 | ) |
Load the n-th part of a dataset from disk.
filename | the path to the dataset file |
idx | the index of the part to be loaded (0 <= idx < total) |
total | the number of parts in which the dataset should be split. |
Referenced by Cyclops::loadNthPart().
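The documentation does not state how the dataset is divided into parts. One plausible scheme, shown here purely as an assumption, is to cut the points into `total` contiguous, nearly equal slices; `partRange` is a hypothetical helper and not a Gaia function.

```cpp
#include <cassert>
#include <utility>

// Half-open index range [first, last) covered by part `idx` out of `total`
// when `count` points are split into contiguous, nearly equal parts.
// This splitting rule is an assumption of the sketch, not taken from Gaia.
inline std::pair<int, int> partRange(int idx, int total, int count) {
    assert(total > 0 && idx >= 0 && idx < total && count >= 0);
    // 64-bit intermediates avoid overflow in idx * count for large datasets.
    const long long n = count;
    const int first = static_cast<int>(idx * n / total);
    const int last = static_cast<int>((idx + 1) * n / total);
    return {first, last};
}
```

Under this rule consecutive parts share their boundary, so every point belongs to exactly one part and the defaults `idx = 0, total = 1` cover the whole dataset.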
const Point * DataSet::point | ( | const QString & | id | ) | const |
Find a point with a given ID.
GaiaException | when the point was not found in the dataset |
Referenced by Cyclops::chainedSearch(), gaia2::RCA::computeCovarianceMatrix(), and Cyclops::getPoints().
Point * DataSet::point | ( | const QString & | id | ) |
Find a point with a given ID.
GaiaException | when the point was not found in the dataset |
int DataSet::pointIndex | ( | const QString & | id | ) | const |
protected |
Returns the index of the point with the given name.
GaiaException | if the point name could not be found. |
References gaia2::binarySearch().
const DataSet * DataSet::referenceDataSet | ( | ) | const |
Return the reference dataset.
A dataset always has a reference dataset, where it looks for the values used when filtering queries. If no dataset has been set as a reference dataset, the current dataset will be used.
Referenced by Cyclops::chainedSearch(), gaia2::PointArray::samplePoint(), and gaia2::BaseView< DataSetType, PointType, SearchPointType, DistanceType >::validate().
void DataSet::removePoint | ( | const QString & | id | ) |
Remove a single point from the dataset given its ID.
GaiaException | if no point could be found with this ID. |
void DataSet::removePoints | ( | const QList< QString > & | ids | ) |
Remove a list of points from the dataset given their IDs.
This is much faster than calling DataSet::removePoint() repeatedly.
GaiaException | if at least one ID could not be found. In that case, no points will have been removed. |
References removePoints().
Referenced by removePoints().
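The all-or-nothing guarantee described for removePoints() (no points removed if any ID is missing) is commonly obtained by validating first and mutating afterwards. The sketch below shows that pattern on plain ID strings; `removeAllOrNothing` is a hypothetical name, `std::invalid_argument` stands in for GaiaException, and nothing here is claimed to match Gaia's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Remove every ID in `ids` from `names`, or throw and leave `names` untouched.
inline void removeAllOrNothing(std::vector<std::string>& names,
                               const std::vector<std::string>& ids) {
    const std::set<std::string> present(names.begin(), names.end());
    // IDs must be unique in a valid dataset.
    assert(present.size() == names.size());
    // Phase 1: validate every ID before touching the container.
    for (const std::string& id : ids) {
        if (present.count(id) == 0) {
            throw std::invalid_argument("unknown point ID: " + id);
        }
    }
    // Phase 2: erase all requested IDs in one pass over the container.
    const std::set<std::string> doomed(ids.begin(), ids.end());
    names.erase(std::remove_if(names.begin(), names.end(),
                               [&doomed](const std::string& n) { return doomed.count(n) != 0; }),
                names.end());
}
```

Erasing in a single pass also hints at why a batch removal can be faster than removing points one by one, since each individual erase from a contiguous container shifts the elements that follow it.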
void DataSet::setHistory | ( | const TransfoChain & | history | ) |
Set a predefined history for this dataset.
This only works on empty datasets, as it is forbidden to change the history of preexisting points/datasets.
This can be useful in the case where you copy points from one dataset into another, and want to preserve their history.
void DataSet::setReferenceDataSet | ( | DataSet * | dataset = 0, bool checkOriginalLayout = true | ) |
Set the reference dataset, ie: the one used for fetching the values when doing filtered queries.
Passing it a null pointer (or no argument) will set the calling dataset as a reference dataset. It is highly recommended to always check for the original layout, as it is impossible to add a point to 2 linked datasets that don't have the same original layout.
References _linkedDataSets, and gaia2::checkIsomorphDataSets().
Referenced by gaia2::PointArray::samplePoint().
void DataSet::simplifyHistory | ( | ) |
Simplify the history of the transformations to have it in a "normalized" state, which consists at most of a Remove transformation followed by a FixLength one.
You can only call this method on a DataSet whose history contains the following allowed transformations: [ Select, Remove, RemoveVL, Cleaner, FixLength ].
References _linkedDataSets, gaia2::PointLayout::copy(), gaia2::PointLayout::descriptorLocation(), gaia2::PointLayout::descriptorNames(), gaia2::Region::dimension(), gaia2::PointLayout::fixLength(), gaia2::PointLayout::remove(), and gaia2::PointLayout::symmetricDifferenceWith().
TransfoChain DataSet::_history |
protected |
This represents the history of transformations that have been applied to this dataset, and also contains all the parameters to allow mapping a point from the original dataset space into the space this dataset is in.
For more information on this structure, refer to the Transformation class.
Referenced by copy(), and mergeFiles().
QList< DataSet * > DataSet::_linkedDataSets |
protected |
This list contains all datasets linked to this one, like when a dataset is referencing another one for the original values used when filtering.
This is necessary, because when we add a point in a dataset, we need to add it as well in the referenced datasets and further in all the datasets referencing it, etc... So we need to create a set of linked datasets, which will be stored entirely in each one of these linked datasets. The first element of this list holds a specific role as well: it is the dataset which holds the reference values used when filtering.
Referenced by setReferenceDataSet(), and simplifyHistory().
QList< View * > DataSet::_linkedViews |
protected |
This list contains all Views linked to this DataSet, ie: the Views using points from this dataset to do their queries.
We need this because adding points invalidates the Views pointing to the dataset, so we need a way to inform them.
QString DataSet::_name |
protected |
Represents the name of the dataset, which should be a short way to describe its function or where it comes from, its purpose, etc...
This information is only used for debugging purposes.
Referenced by copy(), and mergeFiles().