Gaia
When searching a DataSet using a View, you can supply a filter term.
Of all the results found by the search, only those that satisfy the condition expressed in the filter term are returned. In particular, the filter term must be an expression that evaluates to true or false for each point.
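As a rough mental model, the filter behaves like a boolean predicate applied to every search result. The sketch below is plain Python, not Gaia's implementation: the `results` list and the `keep_matching` helper are hypothetical stand-ins, and plain dicts take the place of Gaia points.

```python
# Hypothetical search results, already sorted by distance to the query.
# Plain dicts stand in for Gaia points; none of this is Gaia API.
results = [
    {"id": "track_a", "distance": 0.12, "bpm": 128.0},
    {"id": "track_b", "distance": 0.20, "bpm": 95.0},
    {"id": "track_c", "distance": 0.35, "bpm": 140.0},
]


def keep_matching(results, predicate):
    """Return the results for which the predicate is true, preserving order."""
    return [r for r in results if predicate(r)]


# Equivalent in spirit to the filter term "bpm > 120".
fast_tracks = keep_matching(results, lambda r: r["bpm"] > 120)
print([r["id"] for r in fast_tracks])
```

The ranking by distance is untouched; the predicate only decides which results survive.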
In order to be able to filter queries on a DataSet, you will first need 2 things:
Let's say you have a DataSet on which you applied PCA: all your dimensions are now mangled, and the original values are lost. However, you still want to be able to filter on conditions such as "bpm > 120". The reference DataSet is the DataSet in which the filter looks up these original values.
To set a DataSet as a reference DataSet for another one, just do the following:
dsQuery.setReferenceDataSet(dsRef)
The reference DataSet needs to meet two conditions before you can call this:
The corresponding grammar (expressed using EBNF notation) is the following:
Filter ::= [ 'WHERE' Predicate ]
Predicate ::= Boolean | PredComparison | PredBinaryOp | PredUnaryOp | '(' Predicate ')'
Predicate ::= Value 'BETWEEN' VALUE_CONSTANT 'AND' VALUE_CONSTANT
Predicate ::= String 'NOT'? 'IN' '(' StringList ')' | Value 'NOT'? 'IN' '(' ValueList ')'
StringList ::= String | String ',' StringList
ValueList ::= Value | Value ',' ValueList
ValueComparisonType ::= '=' | '!=' | '<' | '<=' | '>' | '>='
StringComparisonType ::= '=' | '!='
PredComparison ::= Value ValueComparisonType Value | String StringComparisonType String
BinaryOp ::= '&&' | '||' | 'AND' | 'OR'
UnaryOp ::= 'NOT'
PredBinaryOp ::= Predicate BinaryOp Predicate
PredUnaryOp ::= UnaryOp Predicate
Boolean ::= 'TRUE' | 'FALSE'
Value ::= VALUE_CONSTANT | VALUE_VARIABLE
String ::= STRING_CONSTANT | STRING_VARIABLE
NB: due to a small weakness in the parser, you have to specify the type of a variable before its name, like this:
value.tempotap_bpm.mean
to refer to the real value named tempotap_bpm.mean, or
label.key_mode.value
to refer to the string label named key_mode.value.
When using a multidimensional descriptor in a filter, you need to specify which dimension to consider. For instance, MFCCs have 13 dimensions, so when filtering on MFCCs you need to say which dimension the filter applies to. To do that, append "[#]" to the name of the descriptor in the filter, where "#" is the index of the dimension (the first dimension is 0, see the example below). Multidimensional filtering only works with fixed-length descriptors and requires that the 'FixLength' transformation has been applied to your dataset.
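The naming rules above (type prefix, optional dimension index) are easy to get wrong when filter terms are assembled by hand. Here is a minimal sketch of two helpers that build such variable names as plain strings; `value_var` and `label_var` are hypothetical names introduced for illustration and are not part of Gaia.

```python
def value_var(name, dim=None):
    """Name of a real-valued descriptor, optionally restricted to one dimension.

    Hypothetical helper: it only formats a string following the rules
    described above, it does not talk to Gaia.
    """
    ref = "value." + name
    if dim is not None:
        if dim < 0:
            raise ValueError("dimension index must be >= 0")
        ref += "[%d]" % dim  # dimensions are counted from 0
    return ref


def label_var(name):
    """Name of a string label descriptor (hypothetical helper)."""
    return "label." + name


print(value_var("tempotap_bpm.mean"))
print(value_var("lowlevel.mfcc.mean", 3))
print(label_var("key_mode.value"))
```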
Here are some concrete examples of possible filter terms for those of you not used to reading EBNF grammars! :-)
WHERE value.tempotap_bpm.mean > 100
WHERE value.danceability < 3 AND (label.genre = "classical" OR label.genre = "jazz")
WHERE value.lowlevel.mfcc.mean[3] > 20
etc.
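The grammar also allows BETWEEN and IN predicates, which the examples above do not show. The following sketch assembles such a filter term as a plain string. The helpers `quote`, `in_clause`, `between_clause` and `where` are hypothetical, and the resulting string follows the grammar as written here; it has not been checked against Gaia's actual parser.

```python
def quote(text):
    """Wrap a string constant in double quotes, as in the examples above."""
    if '"' in text:
        # Escaping rules are not described in this document, so refuse.
        raise ValueError("string constants containing quotes are not handled")
    return '"%s"' % text


def in_clause(variable, options):
    """Build 'variable IN ("a", "b", ...)' for a string variable."""
    return "%s IN (%s)" % (variable, ", ".join(quote(o) for o in options))


def between_clause(variable, low, high):
    """Build 'variable BETWEEN low AND high' for a real-valued variable."""
    return "%s BETWEEN %s AND %s" % (variable, low, high)


def where(*predicates):
    """Join parenthesized predicates with AND and prepend WHERE."""
    return "WHERE " + " AND ".join("(%s)" % p for p in predicates)


filter_term = where(
    between_clause("value.tempotap_bpm.mean", 100, 140),
    in_clause("label.genre", ["classical", "jazz"]),
)
print(filter_term)
```

Each predicate is wrapped in parentheses so that the BETWEEN keyword's own AND cannot be confused with the AND that joins the predicates.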
Starting with Gaia 2.2, you can use indexing to speed up queries that use filters. The idea is the same as in a regular database: you explicitly build an index on a given descriptor, and all subsequent queries that filter on this descriptor will be faster.
To index on a descriptor, just call the following:
descName = 'rhythm.bpm'
v = View(dataset, dist)
# here's the interesting part
v.indexOn(descName)
And that's all there is to it!
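To see why an index helps, here is a toy illustration of the general database idea in plain Python. It is an assumption-laden sketch and says nothing about how Gaia implements indexOn internally: the `SortedIndex` class and the sample points are hypothetical. Keeping the descriptor values sorted lets a condition such as "rhythm.bpm > 120" be answered with a binary search instead of a scan over every point.

```python
import bisect


class SortedIndex:
    """Toy index: descriptor values kept sorted alongside the point ids."""

    def __init__(self, points, desc_name):
        pairs = sorted((p[desc_name], p["id"]) for p in points)
        self.values = [value for value, _ in pairs]
        self.ids = [point_id for _, point_id in pairs]

    def greater_than(self, threshold):
        """Ids of the points whose value is strictly greater than threshold."""
        start = bisect.bisect_right(self.values, threshold)
        return self.ids[start:]


# Hypothetical points; plain dicts stand in for Gaia points.
points = [
    {"id": "track_a", "rhythm.bpm": 128.0},
    {"id": "track_b", "rhythm.bpm": 95.0},
    {"id": "track_c", "rhythm.bpm": 140.0},
    {"id": "track_d", "rhythm.bpm": 120.0},
]

index = SortedIndex(points, "rhythm.bpm")
print(index.greater_than(120))
```

Building the index costs one sort up front; every later lookup on that descriptor then takes logarithmic time to find the matching range.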