Created attachment 16681 [details]
mupdf-gl showing example_033 PDF with search for "commodo", zoomed to show problem

If I download the example 033 PDF from the tcpdf website (https://tcpdf.org/files/examples/example_033.pdf), open it with mupdf-gl built from the latest git source (commit 6c383df4f897c3ceb562807ed92fe6075efffdaf), and search for "commodo", the last result is shown (see attached image) with two vertical lines, before and after the second 'm', that are darker than the rest of the rectangle.

These lines are caused by overlapping search result rectangles: the last result is split into three rectangles, and adjacent rectangles overlap.

To demonstrate that, I created a JavaScript file (test.js, to be attached to my next comment) that prints the coordinates of each search result rectangle and creates an image illustrating the split and the overlap. When I run it with mutool from the same build as above:

    mutool run test.js

I get:

    Result 1:
    Xul = 185.01202392578126 Yul = 204.28689575195313
    Xur = 225.01202392578126 Yur = 204.28689575195313
    Xll = 185.01202392578126 Yll = 217.62689208984376
    Xlr = 225.01202392578126 Ylr = 217.62689208984376

    Result 2:
    Xul = 216.7663116455078 Yul = 349.3339538574219
    Xur = 266.4544677734375 Yur = 349.3339538574219
    Xll = 216.7663116455078 Yll = 360.9745788574219
    Xlr = 266.4544677734375 Ylr = 360.9745788574219

    Result 3:
    Xul = 185.32339477539063 Yul = 459.0024719238281
    Xur = 203.61138916015626 Yur = 459.0024719238281
    Xll = 185.32339477539063 Yll = 471.93548583984377
    Xlr = 203.61138916015626 Ylr = 471.93548583984377

    Result 4:
    Xul = 202.3243865966797 Yul = 459.0024719238281
    Xur = 211.10838317871095 Yur = 459.0024719238281
    Xll = 202.3243865966797 Yll = 471.93548583984377
    Xlr = 211.10838317871095 Ylr = 471.93548583984377

    Result 5:
    Xul = 209.82138061523438 Yul = 459.0024719238281
    Xur = 225.1753692626953 Yur = 459.0024719238281
    Xll = 209.82138061523438 Yll = 471.93548583984377
    Xlr = 225.1753692626953 Ylr = 471.93548583984377

and an image (to be attached to my third comment).

Note that for results 3-5, all upper Ys are the same, all lower Ys are the same, and the right X of each result is larger than the left X of the next one.

What I should be getting is something like:

    [...]
    Result 3:
    Xul = 185.32339477539063 Yul = 459.0024719238281
    Xur = 225.1753692626953 Yur = 459.0024719238281
    Xll = 185.32339477539063 Yll = 471.93548583984377
    Xlr = 225.1753692626953 Ylr = 471.93548583984377
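The merge expected above can be sketched in plain JavaScript. This is only an illustration of the expected result, not MuPDF's actual merging code: `mergeLineQuads`, `sameLine` and `EPSILON` are hypothetical names, the quad layout `[Xul, Yul, Xur, Yur, Xll, Yll, Xlr, Ylr]` is assumed from the order of the printed values, and the sketch assumes axis-aligned, left-to-right quads reported in reading order.

```javascript
// Hypothetical helper, not part of MuPDF: merges consecutive quads that
// sit on the same line and touch or overlap horizontally.
// Assumed quad layout: [Xul, Yul, Xur, Yur, Xll, Yll, Xlr, Ylr].
var EPSILON = 0.001; // assumed tolerance, chosen for this illustration only

function sameLine(a, b) {
    // Same upper Y and same lower Y, within the tolerance.
    return Math.abs(a[1] - b[1]) < EPSILON && Math.abs(a[5] - b[5]) < EPSILON;
}

function mergeLineQuads(quads) {
    var merged = [];
    var i;
    for (i = 0; i < quads.length; i++) {
        var q = quads[i].slice(); // copy, so the input is left untouched
        var last = merged.length > 0 ? merged[merged.length - 1] : null;
        if (last !== null && sameLine(last, q) && q[0] <= last[2] + EPSILON) {
            // The next quad starts at or before the right edge of the
            // previous one: extend the previous quad's right edge.
            last[2] = Math.max(last[2], q[2]);
            last[6] = Math.max(last[6], q[6]);
        } else {
            merged.push(q);
        }
    }
    return merged;
}

// The five quads printed by test.js in this report.
var reported = [
    [185.01202392578126, 204.28689575195313, 225.01202392578126, 204.28689575195313,
     185.01202392578126, 217.62689208984376, 225.01202392578126, 217.62689208984376],
    [216.7663116455078, 349.3339538574219, 266.4544677734375, 349.3339538574219,
     216.7663116455078, 360.9745788574219, 266.4544677734375, 360.9745788574219],
    [185.32339477539063, 459.0024719238281, 203.61138916015626, 459.0024719238281,
     185.32339477539063, 471.93548583984377, 203.61138916015626, 471.93548583984377],
    [202.3243865966797, 459.0024719238281, 211.10838317871095, 459.0024719238281,
     202.3243865966797, 471.93548583984377, 211.10838317871095, 471.93548583984377],
    [209.82138061523438, 459.0024719238281, 225.1753692626953, 459.0024719238281,
     209.82138061523438, 471.93548583984377, 225.1753692626953, 471.93548583984377]
];

var merged = mergeLineQuads(reported);
```

With the reported values, `merged` holds three quads, and the third one spans from X = 185.32339477539063 to X = 225.1753692626953, which is the single rectangle expected for the last result. A right-to-left script would need the comparison reversed.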
Created attachment 16683 [details] test.js mentioned in the first comment
Created attachment 16686 [details] The image created by running test.js
commit eaa4040b69fbb01f77056a4c40f7404627bc499b
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Wed Jan 9 15:35:30 2019 +0100

    Bug 700466: Use same quad merging threshold for text search as selection.
Created attachment 16722 [details]
Image created by running patched test.js with fixed mutool

(In reply to Tor Andersson from comment #3)
> commit eaa4040b69fbb01f77056a4c40f7404627bc499b
> Author: Tor Andersson <tor.andersson@artifex.com>
> Date: Wed Jan 9 15:35:30 2019 +0100
>
> Bug 700466: Use same quad merging threshold for text search as selection.

That commit gives the desired result for the example I brought, but doesn't solve the underlying problem.

Another example: if I download the PDF from http://beta.hebrewbooks.org/pagefeed/hebrewbooks_org_9717_1.pdf (saved as HebrewBooksOrg_9717_page_1.pdf), patch test.js with:

    --- test.js 2019-01-13 12:55:01.155221700 +0200
    +++ test1.js 2019-01-13 12:55:06.818031600 +0200
    @@ -1,4 +1,4 @@
    -var doc = new Document('example_033.pdf');
    +var doc = new Document('HebrewBooksOrg_9717_page_1.pdf');
     var page = doc.loadPage(0);
     
     var tansform = [4,0,0,4,0,0];
    @@ -6,7 +6,7 @@
     var pixmap = page.toPixmap(tansform, DeviceRGB);
     var device = new DrawDevice(Identity, pixmap);
     
    -var arr = page.search('commodo');
    +var arr = page.search('\u05d4\u05e0\u05e9\u05de'); // He-Nun-Shin-Mem
     var i;
     for(i = 0; i < arr.length; i++)
     {
    @@ -31,4 +31,4 @@
     }
     device.close();
     
    -pixmap.saveAsPNG('example_033.png');
    +pixmap.saveAsPNG('HebrewBooksOrg_9717_page_1.png');

and run it with mutool built from the latest git (commit eaa4040b69fbb01f77056a4c40f7404627bc499b), I get 11 rectangles (where I should be getting 6 now), and the image attached.

The commit has improved the situation, because if I run the patched test.js with an older version of mutool (built from commit 4f08f6adbbb7d6f5d3dc0257b9fc0bb79a3c55cd), I get 23 rectangles.
Created attachment 16723 [details] Image created by running patched test.js with fixed mutool (By mistake I uploaded the wrong image)
Created attachment 25587 [details] Evince selection example
Created attachment 25588 [details] KOReader selection example
Created attachment 25589 [details] PDF used to make the selection examples
Hello,

Using KOReader, which uses MuPDF as its backend, I found a bug that I first thought was in KOReader. I created an issue there, but they found that it comes from MuPDF. The issue is about selection boxes.

Here is how it is supposed to look, in Evince:
https://bugs.ghostscript.com/attachment.cgi?id=25587

Here is what happens with MuPDF in KOReader:
https://bugs.ghostscript.com/attachment.cgi?id=25588

And here is the page of the PDF used in the images:
https://bugs.ghostscript.com/attachment.cgi?id=25589

And finally, a diagnostic made by one of KOReader's maintainers:

    <span font="DLYZCT+LatinModernMath-Regular" wmode="0" bidi="0" trm="9.96264 0 0 9.96264">
        <g unicode="𝜆" glyph="4470" x="236.69" y="353.349" adv=".583"/>
        <g unicode="𝑥" glyph="1319" x="242.49822" y="353.349" adv=".572"/>
        <g unicode="." glyph="15" x="248.19684" y="353.349" adv=".278"/>
        <g unicode="𝑥" glyph="1319" x="250.96645" y="353.349" adv=".572"/>
        <g unicode="𝑧" glyph="1321" x="256.66508" y="353.349" adv=".465"/>
    </span>
    <span font="IWOTBY+LibreBaskerville-Regular" wmode="0" bidi="0" trm="10.161893 0 0 9.96264">
        <g unicode="a" glyph="66" x="264.867" y="353.349" adv=".554"/>
        <g unicode="n" glyph="79" x="270.4967" y="353.349" adv=".689"/>
        <g unicode="d" glyph="69" x="277.49827" y="353.349" adv=".675"/>
    </span>

    04/09/24-21:40:58 DEBUG dict lookup word: 𝜆𝑥𝑦.𝑦𝑥 { "124x270+497+545" } --[[table: 0x7c9d79e20440]]
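To help read the trace above, here is a small sketch that computes where each glyph of the first span ends. It assumes that `adv` is the advance in text-space units and that the first `trm` entry is the horizontal scale. The numbers are consistent with that assumption, since each computed end matches the next glyph's `x`, but the trace itself does not state it. `glyphEnds` and the object layout are hypothetical names for this illustration, not a MuPDF API.

```javascript
// Glyph positions and advances copied from the first span of the trace.
// size is the first entry of trm (assumed to be the horizontal scale).
var mathSpan = {
    size: 9.96264,
    glyphs: [
        { x: 236.69,    adv: 0.583 },
        { x: 242.49822, adv: 0.572 },
        { x: 248.19684, adv: 0.278 },
        { x: 250.96645, adv: 0.572 },
        { x: 256.66508, adv: 0.465 }
    ]
};

// Hypothetical helper: right edge of each glyph, as x + adv * size.
function glyphEnds(span) {
    return span.glyphs.map(function (g) {
        return g.x + g.adv * span.size;
    });
}

var ends = glyphEnds(mathSpan);

// x of the first glyph ("a") in the second span of the trace.
var nextSpanStart = 264.867;

// Horizontal distance between the end of the first span and the start
// of the second one.
var gap = nextSpanStart - ends[ends.length - 1];
```

Under these assumptions the first span ends near x = 261.30 and the second span starts at x = 264.867, so there is a gap of about 3.57 units between the two spans. Whether that gap is what produces the wrong selection boxes is not established by the trace.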