Redirected from filter-index-implemenation Home Features Demos Download Installation User Manual Developer Manual Relation function Credits

Login

Filter index implementation

The instruction filter index provides a fast search for fields if you know an exact value for a field

Example:

filter index title, director, year "2020"

Use filter index when you

  • have queries that depend on a key you know exactly
  • the number of rows in the result is not too big.

User filter when you

  • need to make fuzzy searches
  • make global searches on all values of a field.

Implementation

The swRelationIndexSearch function maintains a big key-value table of all user defined fields and the _name field in a SQLite database /site/indexes/fields.db. The database contains

  • a table fields with the columns revision, row, key, value
  • an index on revision,row
  • an index on key,value

On each call, to swRelationIndexSearch, the index is updated comparing the revisions with the currentbitmap. The update is limited by the maximal search time. New revisions are added, deleted revisions are removed.

The filter query is transformed into a SQL statement to extract the values directly as named columns. This is done with a multiple self join based on revision and row and selected values. The join adapts the order to select first in the fields that have the longest given value and uses left joins on fields without values to adapt for missing values.

The above example becomes

SELECT t1.value as _name , t2.value AS year , t3.value AS title , t4.value AS director FROM fields t1
JOIN fields t2 ON t1.revision = t2.revision AND t1.row = t2.row
LEFT JOIN fields t3 ON t1.revision = t3.revision AND t1.row = t3.row
LEFT JOIN fields t4 ON t1.revision = t4.revision AND t1.row = t4.row
WHERE t1.key = '_name'
AND t2.key = 'year'
AND t2.value = '2020'
AND t3.key = 'title'
AND t4.key = 'director'

The _name column is included to manage user rights. It is discarded later if the query did not include it.

Advantages

A single index file is used for many queries. It is therefore always up to date.

The query uses the indexes of SQLite and is therefore faster than the linear scan of Relation.

This is much faster for all kind of queries that uses indexes and return a limited set. For example, you search for cast and crew for a given film id.

It is probably not much faster for queries that return many rows, like the list of all US movies. Here, the caching mechanism of the traditional filter might sometimes be faster.

Reset and index the fields.db

Normally, there should not be a need for reset and index. Consider it, when the database seems corrupt.

The database is reset by either

The database is indexed by

  • Any query including filter index. If the indexing is not complete, the $swOverTime flag is set.
  • By Index names in Special:Indexes

NB: If you still have pages in the "logs" namespace from earlier versions of Sofawiki, which are not used any more, consider deleting them with

filter _name, _revision, _namespace "logs"
execute delete