Bug 707045

Summary: Space characters translated to Invalid Unicode
Product: MuPDF
Reporter: Jorj <jorj.x.mckie>
Component: mupdf
Assignee: MuPDF bugs <mupdf-bugs>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P4    
Version: unspecified   
Hardware: PC   
OS: All   
Customer:
Word Size: ---
Attachments: test PDF

Description Jorj 2023-08-22 14:08:58 UTC
When extracting text from the attached file, all space characters are output as the Unicode replacement character U+FFFD (0xFFFD).

Command used: "mutool draw -o test.txt test.pdf".

Other tools like XPDF-pdftotext do not show this behaviour.
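To quantify the symptom, the extracted text can be scanned for U+FFFD. The following is only a minimal sketch: the helper name `count_replacement_chars` is hypothetical, and it assumes the text file written by the command above is UTF-8 encoded.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, the Unicode replacement character


def count_replacement_chars(text: str) -> int:
    """Return how many U+FFFD characters occur in the given text."""
    return text.count(REPLACEMENT_CHAR)


def count_in_file(path: str) -> int:
    """Count U+FFFD in a text file (assumed to be UTF-8 encoded)."""
    with open(path, encoding="utf-8") as handle:
        return count_replacement_chars(handle.read())
```

For example, calling `count_in_file("test.txt")` after running the command above would report how many characters in the output were replaced.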
Comment 1 Jorj 2023-08-22 14:11:10 UTC
Created attachment 24724
test PDF
Comment 2 Jorj 2023-08-26 18:07:37 UTC
For more background and the user discussion, please also see the corresponding GitHub issue: https://github.com/pymupdf/PyMuPDF/issues/2609
Comment 3 Tor Andersson 2023-11-14 18:52:22 UTC
commit 3afbba702d491a8f8d2bbeba72115fa9ca04a411 (origin/master, origin/HEAD)
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Tue Nov 14 19:51:09 2023 +0100

    Bug 707045: Convert some ascii control characters to spaces.
    
    Tabs, newlines, etc.
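
The idea named in the commit message can be illustrated roughly as follows. This is not the MuPDF code (which is written in C), only a Python sketch of the general technique. The commit message names just "Tabs, newlines, etc.", so the exact set of characters below (tab, line feed, vertical tab, form feed, carriage return) is an assumption, and the names `CONTROL_TO_SPACE`, `map_control_to_space` and `normalize` are hypothetical.

```python
SPACE = 0x20

# Assumed set of ASCII control characters to treat as spaces:
# tab, line feed, vertical tab, form feed, carriage return.
CONTROL_TO_SPACE = {0x09, 0x0A, 0x0B, 0x0C, 0x0D}


def map_control_to_space(codepoint: int) -> int:
    """Return U+0020 for the selected control characters, else the input unchanged."""
    if codepoint in CONTROL_TO_SPACE:
        return SPACE
    return codepoint


def normalize(text: str) -> str:
    """Apply the mapping to every character of a string."""
    return "".join(chr(map_control_to_space(ord(ch))) for ch in text)
```

With such a mapping, a tab or newline in the decoded text comes out as an ordinary space instead of U+FFFD, while all other characters pass through unchanged.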