Bug 705621

Summary: Memory leak in `fz_new_stext_page_from_page_number` when `FZ_STEXT_PRESERVE_IMAGES` is set
Product: MuPDF
Reporter: Ali Mostafavi <a.hr.mostafavi>
Component: fitz
Assignee: MuPDF bugs <mupdf-bugs>
Status: UNCONFIRMED ---
Severity: normal
Priority: P4
Version: 1.19.0
Hardware: PC
OS: Windows 10
Customer:
Word Size: ---

Description Ali Mostafavi 2022-06-25 11:01:34 UTC
Creating an `fz_stext_page` with the `FZ_STEXT_PRESERVE_IMAGES` option leaks a large amount of memory. This is especially significant in scanned documents, where every character is an image. Here is a minimal example:


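#include <mupdf/fitz.h>

int main(){
	// Create a context with an unlimited resource store and register the built-in document handlers.
	fz_context* mupdf_context = fz_new_context(nullptr, nullptr, FZ_STORE_UNLIMITED);
	fz_register_document_handlers(mupdf_context);

	fz_document* doc = fz_open_document(mupdf_context, "path to pdf file (preferably scanned)");
	int num_pages = fz_count_pages(mupdf_context, doc);
	for (int i = 0; i < num_pages; i++) {
		// Zero-initialize the options so that no field is left indeterminate.
		fz_stext_options options = { 0 };
		options.flags = FZ_STEXT_PRESERVE_IMAGES;
		// Build the structured text page (including images) for page i, then drop it immediately.
		fz_stext_page* page = fz_new_stext_page_from_page_number(mupdf_context, doc, i, &options);
		fz_drop_stext_page(mupdf_context, page);
	}
	return 0;
}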

Actual Result: By the time execution reaches the `return`, the process is using gigabytes of memory per 100 pages.
Expected Result: Minimal memory usage.