Bug 707960

Summary: Simple text extraction changed from MuPDF 1.22 to 1.24.8
Product: MuPDF
Reporter: Ardo <aldo.w.buratti>
Component: mupdf
Assignee: MuPDF bugs <mupdf-bugs>
Status: RESOLVED DUPLICATE    
Severity: normal
CC: sebastian.rasmussen
Priority: P2    
Version: 1.24.8   
Hardware: PC   
OS: Windows 10   
Customer:
Word Size: ---
Attachments: Text with ligatures and accented letters

Description Ardo 2024-08-16 23:55:41 UTC
Created attachment 25957
Text with ligatures and accented letters

Please see the attached PDF (ligatures.pdf).
Note that it contains ligatures (fi) (fl) and other non-ASCII characters.

Here is the text extracted with mutool (from MuPDF 1.22):
  mutool convert -F text -O preserve-ligatures  ligatures.pdf
---
Zzz..
L’ape si pos`o sul minu-
scolo fiore blu fluore-
scente.
---


And here is the result with the new mutool (from MuPDF 1.24.8):
---
Zzz..
L’ape si pos`
o sul minu-
scolo fiore blu fluore-
scente.
---

As you can see, the ligatures are still OK, but the "accented o" is rendered
in the first case as
`o

and in the second case as
`<newline>o

----

Now, I would have expected the "accented o" to be translated as U+00F2 (ò), but more importantly, I don't see why a newline is inserted between the accent and the "o".
What is the correct result?
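
For reference, the expected U+00F2 can be recovered from either output in post-processing. A minimal sketch in Python, assuming the accent always appears as a spacing grave (U+0060) directly before its base letter, optionally separated by a line break. `compose_grave` is a hypothetical helper, not part of MuPDF or mutool:

```python
import re
import unicodedata


def compose_grave(text: str) -> str:
    """Hypothetical helper: merge a spacing grave accent and the
    following letter into one precomposed character where one exists."""

    def fix(match: re.Match) -> str:
        base = match.group(1)
        # Base letter + U+0300 COMBINING GRAVE ACCENT, then NFC composition.
        # Letters with no precomposed form stay as base + combining mark.
        return unicodedata.normalize("NFC", base + "\u0300")

    # Assumption: the accent precedes the letter, with an optional line
    # break in between (as in the 1.24.8 output).
    return re.sub(r"`\r?\n?([A-Za-z])", fix, text)


old_output = "L’ape si pos`o sul minu-"      # as extracted by 1.22
new_output = "L’ape si pos`\no sul minu-"    # as extracted by 1.24.8
print(compose_grave(old_output))
print(compose_grave(new_output))
```

Any literal backtick followed by a letter would also be rewritten, so this only suits text known to come from this kind of PDF.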
Comment 1 Sebastian Rasmussen 2024-09-02 21:25:39 UTC
Bisecting reveals that the added newline issue was fixed by the fix for bug 707859, recently included in 1.24.9.

*** This bug has been marked as a duplicate of bug 707859 ***