Summary: | new PDF interpreter may yield an incorrect ToUnicode CMap with the presence of U+2308 LEFT CEILING in input | | |
---|---|---|---|
Product: | Ghostscript | Reporter: | Vincent Lefevre <vincent-gs> |
Component: | PDF Interpreter | Assignee: | Chris Liddell (chrisl) <chris.liddell> |
Status: | RESOLVED WORKSFORME | | |
Severity: | normal | CC: | chris.liddell |
Priority: | P4 | | |
Version: | 9.56.1 | | |
Hardware: | PC | | |
OS: | Linux | | |
Customer: | | Word Size: | --- |
Attachments: | archive containing the PDF files obtained with the chartest9 script; input PDF (from the archive) file that yields the issue | | |
Description
Vincent Lefevre
2022-04-21 15:17:31 UTC
Created attachment 22439
input PDF (from the archive) file that yields the issue
Not generating a ToUnicode when none exists in the output is (for me) a conscious decision. The reason being that, in that case, we're basically guessing what the contents should be, and we're guessing based on the same information that is in the output file and thus available to subsequent interpreters. Personally, I feel that kind of heuristic should be left to the final consumer.

The current code in git doesn't produce a ToUnicode CMap when converting chartest9a1-uc.pdf; since the ToUnicode in the input contains no actual information, that seems to fall into the "no ToUnicode" case.

The only change in behaviour in that area that I can recall was this commit:

https://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=226cb507884b

so I would guess that's the cause of the difference.

It *seems* to now be behaving as I'd expect, but I won't close this yet in case I've misunderstood something.

(In reply to Chris Liddell (chrisl) from comment #2)
> The only change in behaviour in that area that I can recall was this commit:
>
> https://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=226cb507884b
>
> so I would guess that's the cause of the difference.

I've just seen this comment (I did not receive the mail from Bugzilla, which might have been lost due to the major hardware failure of the storage of my VM that occurred at that time). I'll try to do some tests with this diff against the Debian package to see if this changes the behavior on my side (I'm rather busy ATM, fighting against all kinds of bugs). But first...

> It *seems* to now be behaving as I'd expect, but I won't close this yet in
> case I've misunderstood something.

You have just closed it. So do you have any additional information or do you just confirm?

(In reply to Vincent Lefevre from comment #3)
<SNIP>
> > It *seems* to now be behaving as I'd expect, but I won't close this yet in
> > case I've misunderstood something.
>
> You have just closed it. So do you have any additional information or do you
> just confirm?

As I said, I only left it open for a few days in case I had misunderstood your description; I haven't looked into this any more.

(In reply to Vincent Lefevre from comment #3)
> I've just seen this comment (I did not receive the mail from Bugzilla, which
> might have been lost due to the major hardware failure of the storage of my
> VM that occurred at that time). I'll try to do some tests with this diff
> against the Debian package to see if this changes the behavior on my side
> (I'm rather busy ATM, fighting against all kinds of bugs). But first...
[...]

In the end I forgot to do the tests. Anyway, with Ghostscript 10.0.0 now in Debian/unstable, I can see that this bug no longer occurs.
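Chris Liddell's explanation hinges on whether an input ToUnicode CMap "contains no actual information". As a rough illustration only, the hypothetical Python sketch below (not Ghostscript code, and not how its PDF interpreter actually decides) collects the code-to-Unicode mappings declared in a CMap's bfchar and bfrange sections; a CMap that yields no mappings would correspond to the "no ToUnicode" case. The helper names `parse_tounicode` and `has_no_mappings` are invented for this example. The sketch assumes one bfrange entry per line and skips the array form of bfrange. In the sample, character code <01> maps to U+2308 LEFT CEILING, whose UTF-16BE encoding is written as <2308>.

```python
import re

# Hypothetical helpers for illustration only; this is not Ghostscript code.
# They collect the code -> Unicode mappings declared in a ToUnicode CMap.

BFCHAR_SECTION = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
BFRANGE_SECTION = re.compile(r"beginbfrange(.*?)endbfrange", re.S)
# A bfchar entry is a pair of hex strings: <source code> <UTF-16BE destination>.
BFCHAR_PAIR = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]*)>")
# Assumption: one "<lo> <hi> <dst>" bfrange entry per line; the
# "<lo> <hi> [<dst1> <dst2> ...]" array form is skipped by this sketch.
BFRANGE_LINE = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>")


def decode_utf16be(hex_str: str) -> str:
    """ToUnicode destinations are UTF-16BE byte strings written in hex."""
    return bytes.fromhex(hex_str).decode("utf-16-be")


def parse_tounicode(cmap_text: str) -> dict[int, str]:
    """Return {character code: Unicode string} for the bfchar/bfrange entries."""
    mapping: dict[int, str] = {}
    for section in BFCHAR_SECTION.findall(cmap_text):
        for src, dst in BFCHAR_PAIR.findall(section):
            mapping[int(src, 16)] = decode_utf16be(dst)
    for section in BFRANGE_SECTION.findall(cmap_text):
        for line in section.splitlines():
            match = BFRANGE_LINE.fullmatch(line.strip())
            if match is None:
                continue  # blank line or unsupported array form
            lo, hi, dst = match.groups()
            base = int(dst, 16)
            # Each code in lo..hi maps to dst incremented by its offset from lo.
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = decode_utf16be(f"{base + offset:0{len(dst)}X}")
    return mapping


def has_no_mappings(cmap_text: str) -> bool:
    """True when the CMap declares no usable mappings (no actual information)."""
    return not parse_tounicode(cmap_text)


if __name__ == "__main__":
    sample = (
        "begincmap\n"
        "1 beginbfchar\n<01> <2308>\nendbfchar\n"
        "1 beginbfrange\n<41> <43> <0041>\nendbfrange\n"
        "endcmap\n"
    )
    print(parse_tounicode(sample))
    print(has_no_mappings(sample))
```

Regular expressions keep the sketch short; a real consumer would tokenize the CMap as PostScript instead, which also handles the bfrange array form and entries spread across lines.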