Sofawiki Database Primer

The Sofawiki database is part of the Sofawiki CMS. The properties of the database are

Syntax of data fields

A datafield can be inserted at any position within a is delimited by double square brackets. The key and the value are separated with double colons.

[[key::value]] 

If you have multiple values, you can either use multiple fields, or separate the values with double colons. The following two notations are equivalent

[[key::value1]] [[key::value1]] 
[[key::value1::value2]] 

Order does not matter between keys. If you have multiple values for a key, order does matter. Each iteration of a field belongs to its tuple. If the number of iterations is not the same between the keys, the iterations are padded.

Note also that the query can be tricky if the number of values is not the same per key. See How Query deals multiple values.

The key can be any string if it does not contain a colon or a pipe character. However, to be useful with the query function, start with a lowercase letters and use only lowercase letters and numbers. Do not start with an underscore, that is reserved for internal values

The value can be any string, even with line returns, but it cannot use the chracacter double colon, pipe and square brackets. You will also have problems with curly brackets. Use the predefined tags colon, pipe, leftsquare, rightsquare, leftcurly and rightcurly. The query function is aware of the predefined tags.

The fields are not rendered directly on the page. They are invisible. You use a template or a function to render them to wikitext.

Querying the database

You can either lookup the values of a specific page or querying the entire database.

The query function

The query function accepts a list of statements. Each statement operates on a stack of two-dimensional tables. The result is the top table, which then can be formatted as table, tabbed text, fields or text. As in each function, statements are separated by the pipe character.

Each statement starts with a keyword in uppercase and then a list of arguments.

The most important statement is the SELECT statement. It works like the SQL statement, but you must define the fields explicitely and you can check only against one condition. In Sofawiki, you break up the conditions in a set of simple conditions and then combine them with boolean statements. If you do not know what fields are in the database, use the FIELDS statement.

SELECT fieldlist (FROM namespace)? (WHERE field operator string|WHERE field dollaroperator expression)?
fieldlist = field (, field)*

The fieldlist is a comma-separated list of fields. You cannot aggregate at this step (see GROUP statement).
FROM is optional. It restricts the query to pages where the name starts with namespace. "FROM main:" restricts to main namespace.
WHERE is optional. It adds one condition. You can either compare a field against a string or you can compare the field against an expression.

FROM VIRTUAL has a special meaning. It looks for fields only in the specified page. But while normally only the source text is searched, here the page is parsed with the template parser and then searched only. You can therefore use a query function with OUTPUT FIELDS to create what would be a VIEW in SQL. This is not faster, but creates more readable query functions.

SELECT fieldlist FROM VIRTUAL pagename (WHERE field operator string|WHERE field dollaroperator expression)?

There are many operators
You can compare also against an expression, by preceding the operator with the $ sign. Expressions are calculations within the fields of the tuple using an RPN notation. An expression can contain
As you see, the functions are uppercase like all keywords in the query function.

The IMPORT statement imports table from foreign formats, saved in a wiki page. Actually, implemented are:
The WHERE statement further filters the top table

WHERE field (operator string|dollaroperator expression)

The GROUP statement aggregates the top table. Add aggregation keywords COUNT, SUM, MIN, MAX, AVG after the fields in the field values. If there is no BY keyword, the aggregation is global

GROUP (fieldaggregation) (, fieldaggregation)* BY fieldlist
fieldaggregation = field (COUNT | SUM | MIN | MAX | AVG)?

The ORDER statement order the top table. You can order albhabetic or numeric, ascendant or descendant

ORDER ORDER fieldorder (, fieldorder)*
fieldorder = field (NUMERIC)? (DESC)?  

The LIMIT statement selects only a certain number of rows in the table. Start is zero-based

LIMIT start limit

The PROJECT statement selects only some columns. Use this to get rid of columns used before. The RENAME statement renames one field of the column or appends with * a postfix to each columns (use this before self joins). With # you can recover the key value in a normal visible column.

PROJECT fieldlist
RENAME oldfield newfield, otheroldfield newfield
RENAME * postfix
RENAME # field

The CALC statement assigns a value on a column based on a expression using values of the tuple. The FORMAT statement applies a sprintf formatation on the string. The NUMBERFORMAT statement makes pretty thousands-sepataror numbers with a chosen number of digits.

CALC field expression
FORMAT field formatstring
NUMBERFORMAT field digits, otherfield digits

The UNION and EXCEPT statements make boolean operations on the two top tables.

UNION
EXCEPT field
The JOIN statement makes a natural join based on the provided field (not optional). The CROSS statement makes a cross join. The left join adds null values for missing tuples on the right side, the outerjoin adds null values for missing matching tuples on both sides. There is no right join. You can swap and left join.

JOIN field
LEFTJOIN field
OUTERJOIN field
CROSS

The ROTATE statement rotates columns and rows. The keys become column names and the column names keys

The COPY, POP and SWAP operator work on the stack.

There are several options for the output. By default there is a HTML table. You can apply a TEMPLATE to each field. With the OUTPUT statement, you can specify a deep field list, a textarea with tabbed text to copy paste, text or a fixed format like SQL tables. Finally you can style the table

TEMPLATE field templatename
OUTPUT (HTML | FIXED (fieldlist)* | FIELDS | TAB | TABHEADER | TEXT | TEXTSPACE | ROWTEMPLATE string)
STYLE cssclass

The deep list format expects that the columns are ordered from general to detail.

To debug, finally, use the SHOWKEY and the VERBOSE statement.

The inner workings of the query function

The SELECT function uses the sofawiki PHP function swFilter. All other statements are calculated directly in the function each time the page is requested.

The swFilter function scans linearily through the revisions. For each revision, it checks if the condition is met and then extracts the fields of the fieldlist. The result of this search is persistant. It is saved in the queries folder. So even if a linear scan may seem long, it is done only one time for each revision because the Sofawiki knows that revisions are permament, they do not change overtime. This is a kind of selfindexing.

Sometimes the scanning is very long. The swFilter has a timeout, where it stops scanning and returns the result so far. At this moment you get the message Incomplete results. You can refresh the page to get further results.

The swFilter function then compares the found revisions with the current revisions (that have the status "ok" or "protected"). The revision that stay are then returned.

The set of found revisions are hold in a swBitmap, which is a set of booleans that take exactly one bit of memory space per boolean. We can then make boolean operations on the bitmaps.

The swFilter functions makes several optimizations not to search each revision