Summary: | Search results sometimes split into horizontally overlapping rectangles | ||
---|---|---|---|
Product: | MuPDF | Reporter: | Tamir Evan <tamirevan> |
Component: | fitz | Assignee: | MuPDF bugs <mupdf-bugs> |
Status: | UNCONFIRMED --- | ||
Severity: | normal | CC: | giulitao635, tamirevan |
Priority: | P4 | ||
Version: | master | ||
Hardware: | PC | ||
OS: | Windows 7 | ||
Customer: | | Word Size: | --- |
Attachments: |
mupdf-gl showing example_033 PDF with search for "commodo", zoomed to show problem
test.js mentioned in the first comment
The image created by running test.js
Image created by running patched test.js with fixed mutool
Image created by running patched test.js with fixed mutool
Evince selection example
KOReader selection example
PDF used to make the selection examples |
Description
Tamir Evan
2019-01-09 09:12:39 UTC
Created attachment 16683
test.js mentioned in the first comment
Created attachment 16686
The image created by running test.js
commit eaa4040b69fbb01f77056a4c40f7404627bc499b
Author: Tor Andersson <tor.andersson@artifex.com>
Date: Wed Jan 9 15:35:30 2019 +0100

    Bug 700466: Use same quad merging threshold for text search as selection.

Created attachment 16722
Image created by running patched test.js with fixed mutool

(In reply to Tor Andersson from comment #3)
> commit eaa4040b69fbb01f77056a4c40f7404627bc499b
> Author: Tor Andersson <tor.andersson@artifex.com>
> Date: Wed Jan 9 15:35:30 2019 +0100
>
> Bug 700466: Use same quad merging threshold for text search as selection.

That commit gives the desired result for the example I gave, but doesn't solve the underlying problem.

Another example: if I download the PDF from http://beta.hebrewbooks.org/pagefeed/hebrewbooks_org_9717_1.pdf (saved as HebrewBooksOrg_9717_page_1.pdf), patch test.js with:

    --- test.js 2019-01-13 12:55:01.155221700 +0200
    +++ test1.js 2019-01-13 12:55:06.818031600 +0200
    @@ -1,4 +1,4 @@
    -var doc = new Document('example_033.pdf');
    +var doc = new Document('HebrewBooksOrg_9717_page_1.pdf');
     var page = doc.loadPage(0);
     var tansform = [4,0,0,4,0,0];
    @@ -6,7 +6,7 @@
     var pixmap = page.toPixmap(tansform, DeviceRGB);
     var device = new DrawDevice(Identity, pixmap);
    -var arr = page.search('commodo');
    +var arr = page.search('\u05d4\u05e0\u05e9\u05de'); // He-Nun-Shin-Mem
     var i;
     for(i = 0; i < arr.length; i++) {
    @@ -31,4 +31,4 @@
     }
     device.close();
    -pixmap.saveAsPNG('example_033.png');
    +pixmap.saveAsPNG('HebrewBooksOrg_9717_page_1.png');

and run it with mutool built from the latest git (commit eaa4040b69fbb01f77056a4c40f7404627bc499b), I get 11 rectangles (where I should now be getting 6) and the attached image.

The commit has still improved the situation: with an older mutool (built from commit 4f08f6adbbb7d6f5d3dc0257b9fc0bb79a3c55cd), the patched test.js gives 23 rectangles.

Created attachment 16723
Image created by running patched test.js with fixed mutool
(I uploaded the wrong image by mistake.)
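The commit above is about the threshold used when merging the quads of the individual characters in a search hit into larger rectangles. To illustrate why that threshold decides how many rectangles a single hit produces, here is a minimal sketch in plain ES5 JavaScript, like test.js. It is not MuPDF's code: the box format `{x0, y0, x1, y1}`, the helper names `mergeHitBoxes`, `horizontalGap` and `sameLine`, and the half-height "same line" test are all hypothetical. The gap is measured symmetrically, so right-to-left text such as the Hebrew example merges the same way as left-to-right text, and overlapping boxes count as a gap of zero.

```javascript
// Hypothetical sketch, NOT MuPDF's implementation: merge the per-character
// boxes of one search hit into as few rectangles as possible.
// A box is {x0, y0, x1, y1}; threshold is a fraction of the box height.

// Horizontal distance between two boxes; 0 if they touch or overlap.
// Symmetric, so the order of the boxes (LTR or RTL) does not matter.
function horizontalGap(a, b) {
  return Math.max(0, Math.max(a.x0, b.x0) - Math.min(a.x1, b.x1));
}

// Assumed heuristic: boxes are on one line if their vertical extents
// overlap by at least half of the smaller height.
function sameLine(a, b) {
  var overlap = Math.min(a.y1, b.y1) - Math.max(a.y0, b.y0);
  var h = Math.min(a.y1 - a.y0, b.y1 - b.y0);
  return overlap >= 0.5 * h;
}

function mergeHitBoxes(boxes, threshold) {
  var out = [];
  for (var i = 0; i < boxes.length; i++) {
    var b = boxes[i];
    var last = out.length ? out[out.length - 1] : null;
    if (last && sameLine(last, b) &&
        horizontalGap(last, b) <= threshold * (b.y1 - b.y0)) {
      // Close enough: grow the current rectangle to cover this character.
      last.x0 = Math.min(last.x0, b.x0);
      last.y0 = Math.min(last.y0, b.y0);
      last.x1 = Math.max(last.x1, b.x1);
      last.y1 = Math.max(last.y1, b.y1);
    } else {
      // Too far away: start a new rectangle (a copy, so input is untouched).
      out.push({ x0: b.x0, y0: b.y0, x1: b.x1, y1: b.y1 });
    }
  }
  return out;
}

// Four character boxes of a right-to-left word, in logical order.
var hit = [
  { x0: 115, y0: 0, x1: 120, y1: 10 },
  { x0: 110, y0: 0, x1: 115.5, y1: 10 }, // overlaps the previous box
  { x0: 105, y0: 0, x1: 110, y1: 10 },
  { x0: 99, y0: 0, x1: 104, y1: 10 }     // 1 unit gap to the previous box
];
var loose = mergeHitBoxes(hit, 0.2); // 1 rectangle: x0 = 99, x1 = 120
var strict = mergeHitBoxes(hit, 0);  // 2 rectangles: [105, 120] and [99, 104]
```

Raising the threshold yields fewer rectangles per hit, at the cost of a higher risk of bridging glyphs that really are separate.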
Created attachment 25587
Evince selection example
Created attachment 25588
KOReader selection example
Created attachment 25589
PDF used to make the selection examples
Hello, while using KOReader, which uses MuPDF as its backend, I found a bug that I first thought was in KOReader. I created an issue there, but they found that the problem comes from MuPDF. The issue is about selection boxes.

Here is how it is supposed to look, in Evince: https://bugs.ghostscript.com/attachment.cgi?id=25587

Here is what happens with MuPDF in KOReader: https://bugs.ghostscript.com/attachment.cgi?id=25588

And here is the page of the PDF used in the images: https://bugs.ghostscript.com/attachment.cgi?id=25589

And, finally, a diagnostic made by one of KOReader's maintainers:

    <span font="DLYZCT+LatinModernMath-Regular" wmode="0" bidi="0" trm="9.96264 0 0 9.96264">
      <g unicode="𝜆" glyph="4470" x="236.69" y="353.349" adv=".583"/>
      <g unicode="𝑥" glyph="1319" x="242.49822" y="353.349" adv=".572"/>
      <g unicode="." glyph="15" x="248.19684" y="353.349" adv=".278"/>
      <g unicode="𝑥" glyph="1319" x="250.96645" y="353.349" adv=".572"/>
      <g unicode="𝑧" glyph="1321" x="256.66508" y="353.349" adv=".465"/>
    </span>
    <span font="IWOTBY+LibreBaskerville-Regular" wmode="0" bidi="0" trm="10.161893 0 0 9.96264">
      <g unicode="a" glyph="66" x="264.867" y="353.349" adv=".554"/>
      <g unicode="n" glyph="79" x="270.4967" y="353.349" adv=".689"/>
      <g unicode="d" glyph="69" x="277.49827" y="353.349" adv=".675"/>
    </span>
    04/09/24-21:40:58 DEBUG dict lookup word: 𝜆𝑥𝑦.𝑦𝑥 { "124x270+497+545" } --[[table: 0x7c9d79e20440]]
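The span dump above contains enough data to reconstruct each glyph's horizontal extent. The sketch below assumes that a glyph's advance box runs from `x` to `x + adv * size`, where `size` is the first component of the span's `trm` matrix; the values in the dump are consistent with that assumption, since each glyph ends within 0.0001 of where the next one starts. The function names and data layout are hypothetical; only the numbers are copied from the dump.

```javascript
// Hypothetical helper: turn stext-style span data into per-glyph extents.
// Assumes a glyph's advance box spans [x, x + adv * size], where size is
// the first component of the span's trm matrix.
function glyphExtents(spans) {
  var out = [];
  for (var i = 0; i < spans.length; i++) {
    var s = spans[i];
    for (var j = 0; j < s.glyphs.length; j++) {
      var g = s.glyphs[j];
      out.push({ unicode: g.unicode, x0: g.x, x1: g.x + g.adv * s.size });
    }
  }
  return out;
}

// Distance from each glyph's end to the next glyph's start:
// positive means a gap, negative means the boxes overlap.
function gapsBetween(extents) {
  var out = [];
  for (var i = 1; i < extents.length; i++) {
    out.push(extents[i].x0 - extents[i - 1].x1);
  }
  return out;
}

// Values copied from the two <span> elements in the dump.
var spans = [
  { size: 9.96264, glyphs: [
    { unicode: "𝜆", x: 236.69, adv: 0.583 },
    { unicode: "𝑥", x: 242.49822, adv: 0.572 },
    { unicode: ".", x: 248.19684, adv: 0.278 },
    { unicode: "𝑥", x: 250.96645, adv: 0.572 },
    { unicode: "𝑧", x: 256.66508, adv: 0.465 }
  ] },
  { size: 10.161893, glyphs: [
    { unicode: "a", x: 264.867, adv: 0.554 },
    { unicode: "n", x: 270.4967, adv: 0.689 },
    { unicode: "d", x: 277.49827, adv: 0.675 }
  ] }
];

var gaps = gapsBetween(glyphExtents(spans));
// Every gap except gaps[4] is within 0.0001 of zero (the glyphs abut);
// gaps[4], between "𝑧" and "a", is about 3.57, with no glyph recorded there.
```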