Bug 697708 - Arabic text search highlights the wrong letter and doesn't support shaping
Summary: Arabic text search highlights the wrong letter and doesn't support shaping
Status: UNCONFIRMED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: mupdf
Version: 1.10
Hardware: PC Linux
Importance: P4 enhancement
Assignee: MuPDF bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-04-01 14:37 UTC by Munzir Taha
Modified: 2022-03-08 16:23 UTC

See Also:
Customer:
Word Size: ---


Attachments
A document with two Arabic words for testing (13.73 KB, application/pdf)
2019-11-28 18:11 UTC, Munzir Taha

Description Munzir Taha 2017-04-01 14:37:29 UTC
On my GNU/Linux system (Arch Linux), I launched mupdf, pressed slash to open the search box, and typed some Arabic, but nothing appeared in the box. As a result, searching for Arabic text does not work.
Comment 1 Tor Andersson 2017-04-02 03:16:45 UTC
Did you use mupdf-x11 or mupdf-gl?

mupdf-x11 only supports ASCII input.

mupdf-gl supports Unicode input, but it does not yet do RTL reordering or other OpenType layout.
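
To illustrate what RTL reordering means for a text field, here is a minimal Python sketch. It is not MuPDF code and not the full Unicode Bidirectional Algorithm (UAX #9); the function names `is_rtl` and `naive_visual_order` are hypothetical, and it performs no shaping. It only reverses runs of right-to-left characters so that they come out in the order a left-to-right renderer would have to draw them.

```python
import unicodedata

# Bidi classes of strong right-to-left characters (Hebrew-like "R", Arabic "AL").
RTL_CLASSES = {"R", "AL"}


def is_rtl(ch: str) -> bool:
    """True if the character has a strong right-to-left bidi class."""
    return unicodedata.bidirectional(ch) in RTL_CLASSES


def naive_visual_order(logical: str) -> str:
    """Reverse each RTL run so a left-to-right renderer draws it correctly.

    Simplifications (assumptions of this sketch): a run extends across
    neutral characters such as spaces until a strong LTR character is met,
    digits inside a run are reversed too, and no shaping is applied.
    """
    chars = list(logical)
    n = len(chars)
    i = 0
    while i < n:
        if not is_rtl(chars[i]):
            i += 1
            continue
        last_rtl = i
        k = i + 1
        # Scan forward until a strong LTR character ends the run.
        while k < n and unicodedata.bidirectional(chars[k]) != "L":
            if is_rtl(chars[k]):
                last_rtl = k
            k += 1
        chars[i:last_rtl + 1] = chars[i:last_rtl + 1][::-1]
        i = last_rtl + 1
    return "".join(chars)


# The two test words from this report, in logical (typing) order.
logical = "\u0628\u0633\u0645 \u0627\u0644\u0644\u0647"
visual = naive_visual_order(logical)
```

A text field that skips this step draws the characters in typing order from left to right, so Arabic input appears backwards and unjoined even though the stored search string is correct.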
Comment 2 Munzir Taha 2017-04-02 07:06:36 UTC
Thanks, you are right: mupdf-gl is much better. At least I can type and search now. However, when I search for a letter, it highlights the letter next to it!

Is this difficult to support? You are already using HarfBuzz, which supports Arabic shaping.

That said, mupdf is currently still better than Okular, where I had to type in reverse, so thanks for this.
Comment 3 Tor Andersson 2018-11-13 13:49:31 UTC
HarfBuzz is only used for EPUB layout. It's on my TODO list to use it for the search text field as well.

However -- in PDF there is no shaping or any intelligence behind text drawing at all, it's just a sequence of commands to draw glyphs at (x, y) coordinates.

If you could provide a document where the highlighting highlights the wrong area, that would be helpful in case there's a bug with creating the bounding box areas for the glyphs.

Beware that this may prove impossible just due to how the PDF is constructed -- the PDF format hardcodes a text advance moving LTR, and depending on how the embedded fonts are constructed, we may or may not be able to detect and reverse this for selection and searching.
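
The point about glyphs being placed left to right can be sketched as follows. This is an illustration under stated assumptions, not how MuPDF implements search: the `Glyph` type and `find_hits` function are hypothetical, each glyph is assumed to map to exactly one Unicode character, and all glyphs are assumed to lie on a single line. The sketch sorts glyphs by x position, then looks for the query both as typed and reversed, and returns the horizontal extent of the glyphs that matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    """Hypothetical record for one drawn glyph: its character and x extent."""
    char: str
    x0: float
    x1: float


def find_hits(glyphs, query):
    """Return sorted (x0, x1) highlight spans for query on one line of glyphs.

    The page text is rebuilt in visual (left-to-right) order, so an RTL word
    appears reversed. Searching for the reversed query as well lets a query
    typed in logical order still match, and the highlight is computed from
    the glyphs that actually matched.
    """
    if not query:
        return []
    line = sorted(glyphs, key=lambda g: g.x0)
    text = "".join(g.char for g in line)
    hits = set()
    for needle in {query, query[::-1]}:
        start = text.find(needle)
        while start != -1:
            matched = line[start:start + len(needle)]
            hits.add((min(g.x0 for g in matched), max(g.x1 for g in matched)))
            start = text.find(needle, start + 1)
    return sorted(hits)


# The first test word of this report, placed as it would appear on the page:
# meem leftmost, then seen, then beh rightmost.
word = [
    Glyph("\u0645", 0.0, 10.0),
    Glyph("\u0633", 10.0, 20.0),
    Glyph("\u0628", 20.0, 30.0),
]
beh_seen = find_hits(word, "\u0628\u0633")  # query typed in logical order
```

Whether a real document can be handled this way depends on the caveat above: if the embedded fonts do not map glyphs back to usable Unicode characters, there is nothing reliable to reverse or match against.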
Comment 4 Munzir Taha 2019-11-28 18:08:59 UTC
Sorry, I didn't notice your last request for a document. However, I just tried it and I can't even type Arabic.

≻ pacman -Q mupdf-gl
mupdf-gl 1.16.1-2
Comment 5 Munzir Taha 2019-11-28 18:11:44 UTC
Created attachment 18664
A document with two Arabic words for testing

The document contains two words to test searching.
بسم الله
Comment 6 erfan_Ara 2020-09-22 10:35:03 UTC
I can confirm this on an up-to-date Arch Linux. I can't type any Arabic, Persian, or similar characters in the search box in either mupdf or mupdf-gl, so I can't search for any Arabic word. I tested this on both Wayland and X11.

> pacman -Qi mupdf-gl
Name            : mupdf-gl
Version         : 1.17.0-3
Depends On      : desktop-file-utils  freetype2  freeglut  glu  harfbuzz
                  jbig2dec  libjpeg  openjpeg2  openssl
Comment 7 Tor Andersson 2020-09-22 12:03:15 UTC
Did you build with the modified FreeGLUT that we ship with the MuPDF source release?

If you link with the system provided FreeGLUT, non-ASCII input does NOT work.

Please use our FreeGLUT version.

That said, the text input does not support RTL and shaping yet, so the text will appear wrong, but you should at least be seeing something.
Comment 8 erfan_Ara 2020-09-22 14:31:29 UTC
(In reply to Tor Andersson from comment #7)
> Did you build with the modified FreeGLUT that we ship with the MuPDF source
> release?
> 
> If you link with the system provided FreeGLUT, non-ASCII input does NOT work.
> 
> Please use our FreeGLUT version.
> 
> That said, the text input does not support RTL and shaping yet, so the text
> will appear wrong, but you should at least be seeing something.

Thank you for your fast response. So maybe I should report this to the maintainer of the Arch PKGBUILD for this package, because they don't use your libraries for the builds.
Comment 9 Philipp Rösner 2022-01-03 23:43:53 UTC
Hello, 
I built mupdf-x11 with the bundled third-party libraries and still can't search for non-ASCII text via the '/' shortcut in mupdf-x11.
Is it correct then, that searching for non-ASCII text is currently only possible in mupdf-gl?
Comment 10 Tor Andersson 2022-03-08 16:23:31 UTC
Correct. Non-ASCII search only works in mupdf-gl, and then only if you build with our provided FreeGLUT fork.