Provides a bloom index of all revisions indicating if a trigram might be present.
The bloom index is used by the filter functions.
The file also initializes the bloom index.
swOpenBloom()
Opens the bloom file.
swClearBloom()
Resets the bloom file.
swIndexBloom($numberofrevisions=,$continue=false)
Indexes 1000 revisions for the bloom index.
swGetHashesFromTerm($s)
Calculates the hashes for a given term
@link http://pages.cs.wisc.edu/~cao/papers/summary-cache/node8.html
The false positive rate is minimized with k = ln(2)*m/n
Where
k = number of hashes used per trigram
m = bitdepth of bloom filter
n = number of trigrams
or n = ln(2)*m/k
Our design: ln(2) is about 0.7, k = 3, m = 1024, hence n is about 240
It is therefore optimal for about 240 trigrams per revision.
The false positive rate is then about 15%.
If the text is longer, the false positive rate rises, reaching about 25% at roughly 340 trigrams.
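The figures above can be checked with the standard Bloom filter approximation p = (1 - e^(-k*n/m))^k, which assumes independent, uniformly distributed hashes. The following is a minimal sketch; the function names `sketchOptimalTrigrams` and `sketchFalsePositiveRate` are hypothetical and not part of this file. With k = 3 and m = 1024 it gives roughly 13% at n = 240, 15% at n = 256 and 25% at n = 341.

```php
<?php
// Hypothetical helpers for illustration only; not part of the documented file.

// Number of trigrams n for which k hashes are optimal: n = ln(2) * m / k.
function sketchOptimalTrigrams(int $k, int $m): float
{
    return log(2) * $m / $k;
}

// Approximate false positive rate of a Bloom filter with
// k hashes, m bits and n inserted trigrams: (1 - e^(-k*n/m))^k.
function sketchFalsePositiveRate(int $k, int $m, int $n): float
{
    return pow(1.0 - exp(-$k * $n / $m), $k);
}

// Design values from the text: k = 3, m = 1024.
printf("optimal n: %.0f\n", sketchOptimalTrigrams(3, 1024));
foreach ([240, 256, 341] as $n) {
    printf("n = %d: %.1f%%\n", $n, 100 * sketchFalsePositiveRate(3, 1024, $n));
}
```

The rate depends only on the ratio m/n for a fixed k, so longer revisions degrade the filter unless the bit length grows with them.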
To get below 10% false positives for 500 characters, the bit length would need to double with the same number of hashes.
swFnvHash($s,$size,$prime,$offset)
Calculates one single hash with the Fowler-Noll-Vo hash function.
swGetBloomBitmapFromTerm($term)
Returns a bitmap with all probable revisions for a given term
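As an illustration of how the hashing and lookup functions above fit together, here is a minimal sketch of a single 1024-bit filter with 3 hashes per trigram. It is not the implementation in this file. The names prefixed with `sketch`, the use of FNV-1a, the three seed values, byte-wise trigrams and the in-memory string bitmap are all assumptions made for the example. It also assumes 64-bit PHP, so that the 32-bit multiplication does not overflow into a float.

```php
<?php
// Hypothetical sketch; names, seeds and layout are assumptions.

const SKETCH_BITS = 1024;        // m, bit length of one filter
const SKETCH_PRIME = 16777619;   // standard 32-bit FNV prime
// Assumed seeds: the standard FNV offset basis plus two arbitrary values.
const SKETCH_OFFSETS = [2166136261, 40503, 2654435761];

// 32-bit FNV-1a over the bytes of $s, reduced to the range 0..$size-1.
function sketchFnvHash(string $s, int $size, int $prime, int $offset): int
{
    $hash = $offset;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $hash ^= ord($s[$i]);                   // xor the byte in first
        $hash = ($hash * $prime) & 0xFFFFFFFF;  // then multiply, keep 32 bits
    }
    return $hash % $size;
}

// Bit positions for every byte-wise trigram of $term, one per seed.
function sketchHashesFromTerm(string $term): array
{
    $hashes = [];
    $len = strlen($term);
    for ($i = 0; $i + 3 <= $len; $i++) {
        $trigram = substr($term, $i, 3);
        foreach (SKETCH_OFFSETS as $offset) {
            $hashes[] = sketchFnvHash($trigram, SKETCH_BITS, SKETCH_PRIME, $offset);
        }
    }
    return array_values(array_unique($hashes));
}

// Builds the filter for one text as a 128-byte string.
function sketchBuildFilter(string $text): string
{
    $filter = str_repeat("\0", SKETCH_BITS >> 3);
    foreach (sketchHashesFromTerm($text) as $h) {
        $byte = $h >> 3;
        $filter[$byte] = chr(ord($filter[$byte]) | (1 << ($h & 7)));
    }
    return $filter;
}

// False means the term is certainly absent; true means it may be present.
function sketchMightContain(string $filter, string $term): bool
{
    foreach (sketchHashesFromTerm($term) as $h) {
        if ((ord($filter[$h >> 3]) & (1 << ($h & 7))) === 0) {
            return false;
        }
    }
    return true;
}

$filter = sketchBuildFilter('the quick brown fox');
var_dump(sketchMightContain($filter, 'quick')); // bool(true): no false negatives
```

A term shorter than three bytes produces no hashes in this sketch, so it can never be ruled out and the check returns true. A true result only marks a candidate, which still has to be confirmed against the actual text.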
echotime, swFileGet, swFnvHash, swGetHashesFromTerm, swNameURL, swOpenBloom, swSemaphoreRelease, swSemaphoreSignal
$db, $swBloomIndex, $swMaxSearchTime, $swMemoryLimit, $swOvertime, $swRoot
$bitmap, $block, $blocks, $byte, $char, $elem, $fileoffset, $found, $hash, $hashes, $list, $minblock, $notchecked, $nowtime, $numberofrevisions, $offset, $offsetmax, $path, $prime, $revisionpath, $size, $starttime, $stream, $term, $test, $text
Short variables: $bit, $bm, $col, $dur, $fpt, $fs, $h, $hbm, $i, $l, $p, $raw, $rev, $s, $t
array, file, math, php, str, string, time