Gaia
Filtering queries

When searching a DataSet through a View, you can attach a filter term to the query.

That means that, of all the results found by the search, only those that satisfy the condition expressed in the filter term are returned. In particular, the filter term must be an expression that evaluates to true or false for each point.
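As a rough illustration of these semantics, a filter can be thought of as a predicate applied to each search result. The sketch below is plain Python and does not use the Gaia API; the point names, the `bpm` field, and the helper `apply_filter` are all hypothetical.

```python
# Toy model of filter semantics; this does not use the Gaia API.
# `values` stands in for the descriptor values the filter can look at.
values = {
    "song_a": {"bpm": 128.0},
    "song_b": {"bpm": 95.0},
    "song_c": {"bpm": 140.0},
}

# Results as a search might return them: (point_name, distance) pairs,
# sorted by increasing distance.
results = [("song_b", 0.1), ("song_a", 0.4), ("song_c", 0.9)]


def apply_filter(results, predicate):
    """Keep only the results whose point satisfies the predicate."""
    return [(name, dist) for name, dist in results if predicate(values[name])]


# Equivalent in spirit to a filter such as "bpm > 120".
filtered = apply_filter(results, lambda point: point["bpm"] > 120)
print(filtered)
```

The predicate returns true or false for each point, and the order of the surviving results is left unchanged.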

Prerequisites

To filter queries on a DataSet, you first need two things:

  • the DataSet in which you want to do the queries (you should have this one ready by now!)
  • a reference DataSet, which will be used to get the values used when filtering.

Suppose you have a DataSet on which you applied PCA: all its dimensions are now mixed together, and the original values are lost. However, you still want to be able to filter on something like "bpm > 120". The reference DataSet is the DataSet in which the filter looks up these values.

To set a DataSet as a reference DataSet for another one, just do the following:

dsQuery.setReferenceDataSet(dsRef)

The reference DataSet must satisfy two conditions before you can call this:

  1. it needs to be isomorphic to the original one, which means that both must have exactly the same collections with the same points inside. The layout can be different (otherwise it would be useless!), but there must be an exact mapping from the points in one DataSet to the points in the other one.
  2. it needs to be self-referencing. DataSets are self-referencing by default, so most of the time you won't have any problems. However, you cannot use as a reference a DataSet which is itself referencing another DataSet.
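The two conditions above can be sketched on a toy model. This is plain Python, not the Gaia API: the class `ToyDataSet`, its `collections` and `reference` attributes, and the function `can_be_reference` are hypothetical names used only to make the rules concrete.

```python
class ToyDataSet:
    """Hypothetical stand-in for a DataSet: named collections of point names."""

    def __init__(self, collections):
        # Maps a collection name to the set of point names it contains.
        self.collections = collections
        # Self-referencing by default, as described above.
        self.reference = self


def can_be_reference(ds_ref, ds_query):
    """Check both conditions on the toy model."""
    # Condition 1: same collections, with the same points inside.
    same_points = ds_ref.collections == ds_query.collections
    # Condition 2: the candidate reference must reference itself.
    self_referencing = ds_ref.reference is ds_ref
    return same_points and self_referencing


original = ToyDataSet({"rock": {"song_a", "song_b"}, "jazz": {"song_c"}})
reduced = ToyDataSet({"rock": {"song_a", "song_b"}, "jazz": {"song_c"}})
smaller = ToyDataSet({"rock": {"song_a"}})

print(can_be_reference(original, reduced))  # same points, self-referencing
print(can_be_reference(smaller, reduced))   # the sets of points differ

reduced.reference = original
print(can_be_reference(reduced, original))  # reduced now references another DataSet
```

The layout (the descriptors stored for each point) plays no role in either check, which matches the remark that the two DataSets may have different layouts.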

Filter Grammar

The corresponding grammar (expressed using EBNF notation) is the following:

Filter    ::= [ 'WHERE' Predicate ]
Predicate ::= Boolean | PredComparison | PredBinaryOp | PredUnaryOp | '(' Predicate ')'
Predicate ::= Value 'BETWEEN' VALUE_CONSTANT 'AND' VALUE_CONSTANT
Predicate ::= String 'NOT'? 'IN' '(' StringList ')' | Value 'NOT'? 'IN' '(' ValueList ')'
 
StringList ::= String | String ',' StringList
ValueList  ::= Value  | Value ',' ValueList
 
ValueComparisonType  ::= '=' | '!=' | '<' | '<=' | '>' | '>='
StringComparisonType ::= '=' | '!='
PredComparison       ::= Value ValueComparisonType Value | String StringComparisonType String
 
BinaryOp ::= '&&' | '||' | 'AND' | 'OR'
UnaryOp  ::= 'NOT'
 
PredBinaryOp ::= Predicate BinaryOp Predicate
PredUnaryOp  ::= UnaryOp Predicate
 
Boolean ::= 'TRUE' | 'FALSE'
Value   ::= VALUE_CONSTANT | VALUE_VARIABLE
String  ::= STRING_CONSTANT | STRING_VARIABLE

NB: due to a small weakness in the parser, you have to specify the type of a variable before its name, like this:
value.tempotap_bpm.mean to refer to the real value named tempotap_bpm.mean, or
label.key_mode.value to refer to the string label named key_mode.value.

When using multidimensional descriptors in a filter, you need to specify which dimension to consider. For instance, MFCCs have 13 dimensions, so when filtering on MFCCs you need to say which dimension should be filtered. To do that, append "[#]" to the name of the descriptor in the filter, where "#" is the index of the dimension (the first dimension is 0, see the examples below). Multidimensional filtering only works with fixed-length descriptors and requires applying the 'FixLength' transformation to your dataset.
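Since filter terms are plain strings, the naming rules above (type prefix, optional dimension index) can be wrapped in small helpers. The following is a minimal sketch; `value_var` and `label_var` are hypothetical helper names, not part of Gaia.

```python
def value_var(name, dim=None):
    """Build a real-valued variable name, optionally with a dimension index."""
    var = "value." + name
    if dim is not None:
        if dim < 0:
            raise ValueError("dimension indices start at 0")
        var += "[%d]" % dim
    return var


def label_var(name):
    """Build a string-label variable name."""
    return "label." + name


# Fourth MFCC coefficient (index 3) combined with a genre label.
filter_term = 'WHERE %s > 20 AND %s = "jazz"' % (
    value_var("lowlevel.mfcc.mean", 3),
    label_var("genre"),
)
print(filter_term)
```

Building the string this way keeps the `value.` / `label.` prefixes and the zero-based index in one place instead of scattering them across hand-written filters.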

Examples

Here are some concrete examples of possible filter terms for those of you not used to reading EBNF grammars! :-)

WHERE value.tempotap_bpm.mean > 100
WHERE value.danceability < 3 AND (label.genre = "classical" OR label.genre = "jazz")
WHERE value.lowlevel.mfcc.mean[3] > 20
etc...

Indexing views on certain descriptors

Starting with Gaia 2.2, you can use indexing to speed up queries that use filters. The idea is the same as in a regular database: you explicitly index on a certain descriptor, and all subsequent queries that make use of this descriptor will be faster.
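To see why an index helps, here is a generic sketch of the database-style idea: sort the points once by the indexed value, then answer range conditions with a binary search instead of scanning every point. This is only an illustration of the general principle and makes no claim about how Gaia implements indexing internally; the data and function names are hypothetical.

```python
import bisect

# Hypothetical descriptor values (for example a bpm) for a few points.
points = {"a": 90.0, "b": 121.0, "c": 135.0, "d": 110.0, "e": 150.0}

# Build the index once: (value, name) pairs sorted by value.
index = sorted((bpm, name) for name, bpm in points.items())
keys = [bpm for bpm, _ in index]


def greater_than_indexed(threshold):
    """Answer 'value > threshold' with a binary search on the sorted keys."""
    start = bisect.bisect_right(keys, threshold)
    return [name for _, name in index[start:]]


def greater_than_scan(threshold):
    """Answer the same condition by testing every point."""
    return [name for name, bpm in points.items() if bpm > threshold]


print(greater_than_indexed(120))
print(greater_than_scan(120))
```

Both functions select the same points; the indexed version pays the sorting cost once and then only needs a logarithmic search per query to find where the matching range starts.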

To index on a descriptor, just call the following:

descName = 'rhythm.bpm'
v = View(dataset, dist)

# here's the interesting part
v.indexOn(descName)

And that's all there is to it!