Created attachment 7253 [details]
proof of concept [NON WORKING]
i have a way to do this without making the program need to cache anything
and it can be simply extended to support caching
this is hint based like as is described in Lampson's Hints for Computer Design.
when you do a search, you remember for each page if there was a hit on it or not. this requires only O(pageno) storage. the hint starts out saying that all pages have a match on them so when searching you look at the hint if it says 'x' then the page might match so it is loaded and searched for the next hit if you search the page and don't find anything you set the hint for this page to 'o'
you have a array: char * match_hit = malloc(sizeof(char)*pageno)
you set 'match_hit' to all 'x' i.e. memset(match_hit, 'x', sizeof(char)*pageno)
you have a vector of 1s the length of the number of pages
when you start a new search you reset it to all 1s
when looking for the next hit
- if match_hit[current page] == 'o': you skip it (i.e. no need to check its text
for possible hits because there are none)
- if you look at a page and don't find a hit: you set match_hit[pageno] = 'o'
to cache this for more than one search you just put match_hit in a cache keyed
by the term and when you start a search you put the current term:vector in the
cache and get the one that matches this search from the cache in this way you
could have a limited cache of say 8 or so without the need for any data
structure than one of your existing
i have attached a patch that does attempt at this it doesn't work. for some reason it after it reaches the end of the document it starts marking off pages that do have a match on them until there are none left and it gets stuck in a loop
*** This bug has been marked as a duplicate of bug 691330 ***