691974 – after a page has been searched for a given string the fact of it presence or absence should be cached

Bug 691974 - after a page has been searched for a given string the fact of it presence or absence should be cached

Summary: after a page has been searched for a given string the fact of it presence or ...

Status:	RESOLVED DUPLICATE of bug 691330

Alias:	None

Product:	MuPDF
Classification:	Unclassified
Component:	apps (show other bugs)
Version:	unspecified
Hardware:	PC Linux

Importance:	P4 normal
Assignee:	Tor Andersson

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-02-15 16:36 UTC by Daniel Dutkiewicz
Modified:	2011-04-10 13:57 UTC (History)
CC List:	0 users

See Also:
Customer:
Word Size:	---

Attachments
proof of concept [NON WORKING] (4.79 KB, application/octet-stream) 2011-02-15 16:36 UTC, Daniel Dutkiewicz	Details
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Daniel Dutkiewicz 2011-02-15 16:36:09 UTC

Created attachment 7253 [details]
proof of concept [NON WORKING]

i have a way to do this without making the program need to cache anything
and it can be simply extended to support caching

this is hint based like as is described in Lampson's Hints for Computer Design.

when you do a search, you remember for each page if there was a hit on it or not. this requires only O(pageno) storage. the hint starts out saying that all pages have a match on them so when searching you look at the hint if it says 'x' then the page might match so it is loaded and searched for the next hit if you search the page and don't find anything you set the hint for this page to 'o'

you have a array: char * match_hit = malloc(sizeof(char)*pageno)
you set 'match_hit' to all 'x' i.e. memset(match_hit, 'x', sizeof(char)*pageno)
you have a vector of 1s the length of the number of pages

when you start a new search you reset it to all 1s
when looking for the next hit 
- if match_hit[current page] == 'o': you skip it (i.e. no need to check its text
for possible hits because there are none)
- if you look at a page and don't find a hit: you set match_hit[pageno] = 'o'

--

to cache this for more than one search you just put match_hit in a cache keyed
by the term and when you start a search you put the current term:vector in the
cache and get the one that matches this search from the cache in this way you
could have a limited cache of say 8 or so without the need for any data
structure than one of your existing

--

i have attached a patch that does attempt at this it doesn't work. for some reason it after it reaches the end of the document it starts marking off pages that do have a match on them until there are none left and it gets stuck in a loop

Comment 1 Tor Andersson 2011-04-10 13:57:56 UTC


*** This bug has been marked as a duplicate of bug 691330 ***