SofaDoc

inc/bloom.php

Provides a bloom index over all revisions that indicates whether a trigram might be present in a revision.
The filter functions use the bloom index.
The file also initializes the bloom index.

Functions

swOpenBloom()

Opens the bloom file.

swClearBloom()

Resets the bloom file.

swIndexBloom($numberofrevisions=,$continue=false)

Indexes 1000 revisions for the bloom index.

swGetHashesFromTerm($s)

Calculates the hashes for a given term.
@link http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
The false positive rate is minimized with k = ln(2)*m/n,
where
k = number of hashes used per trigram
m = bit depth of the bloom filter
n = number of trigrams
or, solved for n, n = ln(2)*m/k.
Our design: ln(2) ≈ 0.7, k = 3, m = 1024, so n ≈ 240.
It is therefore optimal for about 240 trigrams per revision.
False positives are then about 15%.
If the text is longer, false positives will rise to 25%.
To get fewer than 10% false positives for 500 characters, the bit length would need to double with the same number of hashes.
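
The design numbers can be checked against the standard bloom filter approximation p ≈ (1 - e^(-k*n/m))^k. The following is a minimal sketch of that check. The function names are hypothetical and are not part of inc/bloom.php. With m = 1024, k = 3 and n = 240 the approximation gives roughly 13%, in the same range as the figure quoted in the note.

```php
<?php
// Hypothetical helpers for checking the design numbers; not part of inc/bloom.php.

// Approximate false positive rate for m bits, k hashes and n stored trigrams,
// using the standard approximation (1 - e^(-k*n/m))^k.
function sketchFalsePositiveRate(int $m, int $k, int $n): float
{
    return pow(1.0 - exp(-$k * $n / $m), $k);
}

// Number of trigrams for which k hashes are optimal: n = ln(2) * m / k.
function sketchOptimalTrigrams(int $m, int $k): float
{
    return log(2) * $m / $k;
}

printf("optimal n: %.0f\n", sketchOptimalTrigrams(1024, 3));
printf("rate at n = 240: %.3f\n", sketchFalsePositiveRate(1024, 3, 240));
printf("rate at n = 240, m = 2048: %.3f\n", sketchFalsePositiveRate(2048, 3, 240));
```

Changing the arguments shows how the rate reacts to longer revisions (larger n) or a wider filter (larger m).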

swFnvHash($s,$size,$prime,$offset)

Calculates a single hash with the Fowler-Noll-Vo (FNV) hash function.
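
As an illustration of the technique, here is a minimal sketch of a 32-bit FNV-1a hash that takes the same parameter list. It is not the code of swFnvHash. The function name sketchFnvHash, the choice of the FNV-1a variant (XOR first, then multiply), the 32-bit width and the reading of $size as a modulus for the bit position are all assumptions. It also assumes 64-bit PHP integers.

```php
<?php
// Hypothetical sketch, not the code of swFnvHash in inc/bloom.php.
// Assumes 64-bit PHP integers and the FNV-1a variant (XOR, then multiply).
function sketchFnvHash(string $s, int $size, int $prime, int $offset): int
{
    $hash = $offset;
    $length = strlen($s);
    for ($i = 0; $i < $length; $i++) {
        $hash ^= ord($s[$i]);                    // mix in the next byte
        $hash = ($hash * $prime) & 0xFFFFFFFF;   // multiply, keep the low 32 bits
    }
    return $hash % $size;                        // assumed: reduce to a bit position
}

// Standard 32-bit FNV prime and offset basis; whether the real caller
// passes these values is not stated in this page.
$bit = sketchFnvHash('abc', 1024, 16777619, 2166136261);
```

Passing different prime and offset pairs yields different hash functions from the same routine, which is one way to obtain several hashes per trigram.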

swGetBloomBitmapFromTerm($term)

Returns a bitmap of all revisions that probably contain the given term.
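
The idea behind such a lookup can be sketched in memory as follows. This is not how swGetBloomBitmapFromTerm is implemented: all names are hypothetical, the filters live in plain arrays instead of the bloom file, the result is a list of revision ids instead of a swBitmap, and a salted crc32 stands in for the FNV hashes. It assumes 64-bit PHP, where crc32 returns a non-negative integer.

```php
<?php
// Hypothetical in-memory sketch; none of these names exist in inc/bloom.php.

const SKETCH_BITS = 1024;   // m, taken from the design note
const SKETCH_HASHES = 3;    // k, taken from the design note

// Bit positions set by a text: k positions for each overlapping 3-byte trigram.
function sketchBuildFilter(string $text): array
{
    $text = strtolower($text);
    $filter = [];
    for ($i = 0; $i + 3 <= strlen($text); $i++) {
        $trigram = substr($text, $i, 3);
        for ($h = 0; $h < SKETCH_HASHES; $h++) {
            // Salted crc32 is a stand-in for the k FNV hashes.
            $filter[crc32($h . ':' . $trigram) % SKETCH_BITS] = true;
        }
    }
    return $filter;
}

// Revisions whose filter has every bit of the term set. A revision that is
// missing any bit cannot contain the term; the others are only probable hits.
function sketchCandidates(array $filters, string $term): array
{
    $needed = sketchBuildFilter($term);
    $hits = [];
    foreach ($filters as $rev => $filter) {
        if (count(array_diff_key($needed, $filter)) === 0) {
            $hits[] = $rev;
        }
    }
    return $hits;
}

$filters = [
    1 => sketchBuildFilter('The quick brown fox'),
    2 => sketchBuildFilter('lorem ipsum dolor'),
];
$candidates = sketchCandidates($filters, 'brown');
```

In this sketch a term shorter than three characters produces no trigrams, so every revision remains a candidate and has to be checked by the actual search.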

Used elements

Classes

swBitmap

Functions

echotime, swFileGet, swFnvHash, swGetHashesFromTerm, swNameURL, swOpenBloom, swSemaphoreRelease, swSemaphoreSignal

Globals

$db, $swBloomIndex, $swMaxSearchTime, $swMemoryLimit, $swOvertime, $swRoot

Variables

$bitmap, $block, $blocks, $byte, $char, $elem, $fileoffset, $found, $hash, $hashes, $list, $minblock, $notchecked, $nowtime, $numberofrevisions, $offset, $offsetmax, $path, $prime, $revisionpath, $size, $starttime, $stream, $term, $test, $text

Short variables: $bit, $bm, $col, $dur, $fpt, $fs, $h, $hbm, $i, $l, $p, $raw, $rev, $s, $t

Features

array, file, math, php, str, string, time