Bug 707045 - Space characters translated to Invalid Unicode
Summary: Space characters translated to Invalid Unicode
Status: RESOLVED FIXED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: mupdf
Version: unspecified
Hardware: PC All
Importance: P4 normal
Assignee: MuPDF bugs
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-22 14:08 UTC by Jorj
Modified: 2023-11-14 18:52 UTC
CC List: 0 users

See Also:
Customer:
Word Size: ---


Attachments
test PDF (77.12 KB, application/pdf)
2023-08-22 14:11 UTC, Jorj

Description Jorj 2023-08-22 14:08:58 UTC
When extracting text from the attached file, every space character is output as U+FFFD, the Unicode replacement character.

Command: "mutool draw -o test.txt test.pdf".

Other tools like XPDF-pdftotext do not show this behaviour.
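
One way to confirm the symptom is to scan the extracted text for U+FFFD. The following is a minimal Python sketch, not part of mutool. The helper names are hypothetical, the sample string is made up rather than taken from the attached PDF, and it assumes the extracted text is available as a decoded string.

```python
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def count_replacement_chars(text):
    """Return how many U+FFFD characters occur in the text."""
    return text.count(REPLACEMENT_CHAR)


def lines_with_replacement_chars(text):
    """Return (line_number, line) pairs for lines containing U+FFFD."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if REPLACEMENT_CHAR in line
    ]


# Made-up sample standing in for extracted text (assumption, not real output).
sample = "Hello\ufffdworld\nno problem here\nfoo\ufffdbar\ufffdbaz"
print(count_replacement_chars(sample))
print(lines_with_replacement_chars(sample))
```

To check a real extraction, the contents of test.txt could be read with open("test.txt", encoding="utf-8").read() and passed to the same functions, assuming the file was written as UTF-8.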
Comment 1 Jorj 2023-08-22 14:11:10 UTC
Created attachment 24724
test PDF
Comment 2 Jorj 2023-08-26 18:07:37 UTC
For more background and the discussion with the user, please also see the corresponding GitHub issue: https://github.com/pymupdf/PyMuPDF/issues/2609.
Comment 3 Tor Andersson 2023-11-14 18:52:22 UTC
commit 3afbba702d491a8f8d2bbeba72115fa9ca04a411 (origin/master, origin/HEAD)
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Tue Nov 14 19:51:09 2023 +0100

    Bug 707045: Convert some ascii control characters to spaces.
    
    Tabs, newlines, etc.
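
The commit message suggests that certain ASCII control characters are now mapped to spaces during text extraction. The idea can be illustrated with a rough Python sketch. This is not the MuPDF C code from the commit; the function name is hypothetical, and the exact set of affected characters (here tab, line feed, vertical tab, form feed, and carriage return) is an assumption, since the message only says "Tabs, newlines, etc.".

```python
# Assumed set of control characters to treat as spaces; the commit message
# does not list them exactly, so this choice is illustrative only.
ASSUMED_WHITESPACE_CONTROLS = frozenset("\t\n\x0b\x0c\r")


def control_chars_to_spaces(text, controls=ASSUMED_WHITESPACE_CONTROLS):
    """Replace each character found in `controls` with a single space."""
    return "".join(" " if ch in controls else ch for ch in text)


# Tab and line feed become spaces; other characters are left untouched.
print(repr(control_chars_to_spaces("Hello\tworld\nagain")))
```

Characters outside the assumed set, including other control codes such as NUL, pass through unchanged in this sketch; how the actual fix treats them is not stated in the commit message.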