Sofawiki Database Primer
Note: Starting from 3.0.0, Sofawiki has implemented the Relation function which will over time (until 2023) replace the Query function. The documentation will be updated over time.
Transition from Query to Relation
The Sofawiki database is part of the Sofawiki CMS. The properties of the database are
- There is no predefined data schema. Fields are defined ad hoc.
- There is no interface for data inserting nor for rule based updating. Data fields are defined ad hoc when you edit the text of a wiki page.
- You can however edit an existing page with the fields editor.
- You can also use the Special:Regex for limited rule-based updating.
- Fields have no type.
- Fields can have multiple values on a page.
- Queries can be defined ad hoc in the Special:Query page.
- Queries can be integrated into pages with the query function.
- Fields of a speficic page can be integrated into pages with the value or the extvalue function.
- As in relational algebra, tuples are always unique.
Syntax of data fields
A datafield can be inserted at any position within a is delimited by double square brackets. The key and the value are separated with double colons.
[[key::value]]
If you have multiple values, you can either use multiple fields, or separate the values with double colons. The following two notations are equivalent
[[key::value1]] [[key::value1]]
[[key::value1::value2]]
Order does not matter between keys. If you have multiple values for a key, order does matter. Each iteration of a field belongs to its tuple. If the number of iterations is not the same between the keys, the iterations are padded.
Note also that the query can be tricky if the number of values is not the same per key. See How Query deals multiple values.
The key must start with a latin letter and cannot have the following characters: space, pipe, hash, lower tag, greater tag, curly brackets, square braquets, comma, colon, plus, slash. The minus is allowed for queries but will now work with relation function.
- Good keys: name, grade, title, director, stars
- Valid keys, but diffficult to use: NAME, Title_2, director-assistant
- Invalid keys: _stars, 3stars, article:title
- Predefined keys you can use for queries, but not for defining values
- _name: Name of the page
- _user: User who modifed last the page
- _timestamp: Time of last modification of the page
- _content: Entire content of the page
- _revision: Revision of the page
- _category: List of category links
- _link: List of internal links
- _fields: Fields of the page
- _any: Any field value
- _template: List of templates used by page (only directly used, no transclusion)
The value can be any string, even with line returns, but it cannot use the chracacter double colon, pipe and square brackets. You will also have problems with curly brackets. Use the predefined tags colon, pipe, leftsquare, rightsquare, leftcurly and rightcurly. The query function is aware of the predefined tags.
The fields are not rendered directly on the page. They are invisible. You use a template or a function to render them to wikitext.
Querying the database
You can either lookup the values of a specific page or querying the entire database.
- To lookup specific values that are on the same page, use the value function.
- To lookup specific values that are on a page you know, use the extvalue function.
- To query the database generally, use the query function.
The query function
The query function accepts a list of statements. Each statement operates on a stack of two-dimensional tables. The result is the top table, which then can be formatted as table, tabbed text, fields or text. As in each function, statements are separated by the pipe character.
Each statement starts with a keyword in uppercase and then a list of arguments.
The most important statement is the SELECT statement. It works like the SQL statement, but you must define the fields explicitely and you can check only against one condition. In Sofawiki, you break up the conditions in a set of simple conditions and then combine them with boolean statements. If you do not know what fields are in the database, use the FIELDS statement.
SELECT fieldlist (FROM namespace)? (WHERE field operator string|WHERE field operator expression)?
fieldlist = field (, field)*
The fieldlist is a comma-separated list of fields. You cannot aggregate at this step (see GROUP statement).
FROM is optional. It restricts the query to pages where the name starts with namespace. "FROM main:" restricts to main namespace.
WHERE is optional. It adds one condition. You can either compare a field against a string or you can compare the field against an expression.
FROM VIRTUAL has a special meaning. It looks for fields only in the specified page. But while normally only the source text is searched, here the page is parsed with the template parser and then searched only. You can therefore use a query function with OUTPUT FIELDS to create what would be a VIEW in SQL. This is not faster, but creates more readable query functions.
SELECT fieldlist FROM VIRTUAL pagename (WHERE field operator string|WHERE field dollaroperator expression)?
FROM VIRTUAL can be used with external servers, if you set up a swInternalLinkHook function to transform special pagenames (eg "art:my page") in URL (eg "https://www.artfilm.ch/my-page"). Note that these pages need to render fields so that you can read them.
There are many operators
- numeric operators compare the numeric values = < > <= >=
- string operators compare the string value == << >> >=, but also =* (starts with) *= (ends with) *=* (contains)
- relaxed string operators work on the url value of the string ~~ ~* *~ *~*
- the * operator checks if the field is present. It does not need a string
- the 0 operator checks if the field is empty. It does not need a string
- the IN operator matches with a list of values separated by ::
- the ! prefix inverts the result (examples: != !== !~* !0 !* but it works on all operators)
- the r= operator makes a regex comparison. Note that you may need to use the predefined tags to escape brackets and pipes. You can add a hint to the term to profit from the bloom filter
SELECT title WHERE directo r= /Mac(.*)/ HINT mac
For the handling of null values see Query null values
You can compare also against an expression, by preceding the operator with the $ sign. Expressions are calculations within the fields of the tuple using an RPN notation. An expression can contain
- fields
- numbers
- quoted strings (use double quotes)
- + - * / . (concat) :: (concat with double ::)
- ABS SIGN SQRT RAND COPY POP SWAP STRLEN SUBSTR REPLACE REGEX URLIFY
- Most functions with FUNCTION-function. (Why most? functions using external values cannot be used as they violate consistency of the filter function.)
As you see, the predefined functions are uppercase like all keywords in the query function.
You can also compare against a value of the last tuple of the relation on top of the stack, if that field is not present on the present relation.
DATA [[fool::bar]]
SELECT title FROM films WHERE film == $foo
New The FILTER statement is an emulation of the filter statement in Relation. You define the fields and the hints. Hint can be any URLtext (a-z0-9-). Use space to create AND logic and pipe to create OR logic. The FILTER statement is much faster than SELECT. To emulte SELECT, use WHERE in the second line
SELECT foo, bar WHERE bar = 3 | FILTER foo, bar "3" |
SELECT foo, bar WHERE bar > 3 | FILTER foo, bar "*" WHERE bar > 3 |
SELECT foo WHERE bar > 3 | FILTER foo, bar "*" WHERE bar > 3 PROJECT foo" |
The IMPORT statement imports table from foreign formats, saved in a wiki page. Actually, implemented are:
- the TAB format is, using the first line as labels and the following lines as records, all separated by tabs.
- the FIELDS format
IMPORT can be used with external servers, if you set up a swInternalLinkHook function to transform special pagenames (eg "art:my page") in URL (eg "https://www.artfilm.ch/my-page"). Note that these pages need to render fields so that you can read them.
The DATA statement allows you to create a table ad hoc using wikifield syntax
DATA [[fool::bar]] [[fool::bar2]]
The WHERE statement further filters the top table
WHERE field (operator string|dollaroperator expression)
The GROUP statement aggregates the top table. Add aggregation keywords COUNT, SUM, MIN, MAX, AVG, CONCAT after the fields in the field values. If there is no BY keyword, the aggregation is global. CONCAT concatenates all unique strings with separator ::
GROUP (fieldaggregation) (, fieldaggregation)* BY fieldlist
fieldaggregation = field (COUNT | SUM | MIN | MAX | AVG | CONCAT)?
The ORDER statement order the top table. You can order albhabetic or numeric, ascendant or descendant. URL orders alphabetically with the URL-representation of the field. FAMILYNAME orders with the collation of the familyname function. NOCASE orders case-insensitive.
Case insensitive will still order accented letters separately. You may have a more natural order with URL.
ORDER ORDER fieldorder (, fieldorder)*
fieldorder = field (NUMERIC)? (DESC)? (URL)? (FAMILYNAME)? (NOCASE)?
The LIMIT statement selects only a certain number of rows in the table. Start is zero-based
LIMIT start limit
The PROJECT statement selects only some columns. Use this to get rid of columns used before. The RENAME statement renames one field of the column or appends with * a postfix to each columns (use this before self joins). With # you can recover the key value in a normal visible column.
PROJECT fieldlist
RENAME oldfield newfield, otheroldfield newfield
RENAME * postfix
RENAME # field
The CALC statement assigns a value on a column based on a expression using values of the tuple. The FORMAT statement applies a sprintf formatation on the string. The NUMBERFORMAT statement makes pretty thousands-sepataror numbers with a chosen number of digits.
CALC field expression
FORMAT field formatstring
NUMBERFORMAT field digits, otherfield digits
The UNION and EXCEPT statements make boolean operations on the two top tables.
UNION
EXCEPT field
The JOIN statement makes a natural join based on the provided field (not optional). The CROSS statement makes a cross join. The left join adds null values for missing tuples on the right side, the outerjoin adds null values for missing matching tuples on both sides. There is no right join. You can swap and left join.
JOIN field
LEFTJOIN field
OUTERJOIN field
CROSS
See also Join and Cross
The ROTATE statement rotates columns and rows. The keys become column names and the column names keys. You must provide two arguments. The name of the old column used for the column names and the name of the new column for the old column names.
ROTATE director fields
The COPY, POP and SWAP operator work on the stack.
There are several options for the output. By default there is a HTML table. You can apply a TEMPLATE to each field. With the OUTPUT statement, you can specify a deep field list, a textarea with tabbed text to copy paste, text or a fixed format like SQL tables. Finally you can style the table. HTMLPAGED and LISTPAGED return a table limited to 50 values or how many you define, with an integrated navigation to the other values. Finally, you can display the values as Charts.
TEMPLATE (field+) templatename
OUTPUT (HTML | HTMLPAGED number | LIST | LISTPAGED number | FIXED (fieldlist)* | FIELDS | FIELDSCOMPACT | TAB | TABHEADER | TEXT | TEXTSPACE | ROWTEMPLATE string)
STYLE cssclass | CHART (BAR|LINE|PIE) (optionlist)?
The template will be applied to the first field, but you can use other fields to calculate the template.
The deep list format expects that the columns are ordered from general to detail.
To debug, finally, use the VERBOSE statement.
The inner workings of the query function
The SELECT function uses the sofawiki PHP function swFilter. All other statements are calculated directly in the function each time the page is requested.
The swFilter function scans linearily through the revisions. For each revision, it checks if the condition is met and then extracts the fields of the fieldlist. The result of this search is persistant. It is saved in the queries folder. So even if a linear scan may seem long, it is done only one time for each revision because the Sofawiki knows that revisions are permament, they do not change overtime. This is a kind of selfindexing.
Sometimes the scanning is very long. The swFilter has a timeout, where it stops scanning and returns the result so far. At this moment you get the message Incomplete results. You can refresh the page to get further results.
The swFilter function then compares the found revisions with the current revisions (that have the status "ok" or "protected"). The revision that stay are then returned.
The set of found revisions are hold in a swBitmap, which is a set of booleans that take exactly one bit of memory space per boolean. We can then make boolean operations on the bitmaps.
The swFilter functions makes several optimizations not to search each revision
- The namespace limits the search
- It compares against queries that checked the existence of the field
- For certain string operators (== *= 0* *=* ~~ *~ ~* *~*) it can check if the page contains the text. It uses therefore a global bloom filter.
Overtime
Filtering is limited by timeout. When it reaches this limit, $swOvertime is set true.
You can use AJAX to reload the page Reload automaticallly after Overtime.