692009 – Regression from 8.71 when handling fonts with "ToUnicode CMaps"?

Summary: Regression from 8.71 when handling fonts with "ToUnicode CMaps"?

Status:	RESOLVED INVALID

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Writer
Version:	master
Hardware:	PC Linux

Importance:	P4 minor
Assignee:	Ken Sharp

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-03-01 19:28 UTC by pipitas
Modified:	2011-06-17 09:31 UTC
CC List:	0 users

See Also:
Customer:
Word Size:	---

Description pipitas 2011-03-01 19:28:13 UTC

I've applied the "simple" command line

  gs -o output.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress input.pdf

to a set of PDFs. (My job was to test Ghostscript's ability to do simple "preflight" fixes such as embedding missing fonts.) On the resulting PDF output I run "pdffonts output.pdf" to check what happened to the fonts.
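The two-step check described above (convert with pdfwrite, then list the fonts) can be scripted. What follows is only a sketch: the helper names `gs_command`, `pdffonts_command` and `convert_and_list_fonts` are made up for illustration, and it assumes that `gs` and `pdffonts` are installed and on the PATH. The Ghostscript flags are the ones from the command line quoted above.

```python
import subprocess


def gs_command(src, dst):
    """Build the pdfwrite command line quoted in this report."""
    return ["gs", "-o", dst, "-sDEVICE=pdfwrite", "-dPDFSETTINGS=/prepress", src]


def pdffonts_command(pdf):
    """Build the pdffonts call used to inspect the converted file."""
    return ["pdffonts", pdf]


def convert_and_list_fonts(src, dst):
    """Run both tools in sequence (assumes gs and pdffonts are installed)."""
    subprocess.run(gs_command(src, dst), check=True)
    # pdffonts diagnostics may go to stderr, so both streams are captured.
    done = subprocess.run(pdffonts_command(dst), capture_output=True, text=True)
    return done.stderr + done.stdout
```

Calling `convert_and_list_fonts("input.pdf", "output.pdf")` would return the font listing as text, including any error lines, so that several Ghostscript versions can be compared on the same input.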


Here is one sample PDF which shows a sort of "regression" from v8.71 to v9.02svn (sorry, I can't test v9.01 release right now). On the original PDF, pdffonts (the Poppler-based version) gives this output:


kp@kpuntu:~$ pdffonts bad-orig-#NNNNN.pdf 
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
PZOXVN+FrutigerLTCom-Light           TrueType          yes yes yes      7  0
PZOXVN+FrutigerLTCom-Roman           TrueType          yes yes yes     13  0
QHYDJL+MinionPro-Regular             Type 1C           yes yes yes      8  0
QHYDJL+FrutigerLTCom-Light           CID TrueType      yes yes yes     19  0
PZOXVN+MinionPro-It                  Type 1C           yes yes yes     14  0
ORERHP+MinionPro-Bold                Type 1C           yes yes yes     15  0


This output looks OK to me. (With the "bad-orig" filename I don't mean to say that the original PDF is "bad". I mean that this file gives a "bad" result when processed by gs9.0x...) Note that, according to pdffonts, all fonts used are embedded as subsets, and all of them have a "ToUnicode CMap".

After v8.71's treatment of the original, pdffonts returns this on the output, which looks OK to me:

kp@kpuntu:~$ pdffonts gs871-bad-orig-#NNNNN.pdf 
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
TCBQTQ+FrutigerLTCom-Light-Identity-H CID TrueType      yes yes yes     22  0
GNPESZ+MinionPro-Bold                Type 1C           yes yes no      19  0
UEFSOY+FrutigerLTCom-Roman           TrueType          yes yes no      15  0
NQPBKA+MinionPro-Regular             Type 1C           yes yes yes      8  0
YRHNWT+MinionPro-It                  Type 1C           yes yes no      17  0
XLWRJU+FrutigerLTCom-Light           TrueType          yes yes no      11  0


Note however, that now 4 of the original "ToUnicode CMaps" have disappeared, and that one font has acquired the additional suffix "-Identity-H".

After v9.02svn's treatment of the original, pdffonts returns this on the output PDF, which definitely does not look OK to me:

kp@kpuntu:~$ pdffonts gs902svn-bad-orig-#NNNNN.pdf 
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
TCBQTQ+FrutigerLTCom-Light-Identity-H CID TrueType      yes yes yes     22  0
YRHNWT+MinionPro-It                  Type 1C           yes yes no      17  0
XLWRJU+FrutigerLTCom-Light           TrueType          yes yes yes     11  0
GNPESZ+MinionPro-Bold                Type 1C           yes yes no      19  0
UEFSOY+FrutigerLTCom-Roman           TrueType          yes yes yes     15  0
NQPBKA+MinionPro-Regular             Type 1C           yes yes yes      8  0


Note that gs9.0x kept 4 "ToUnicode CMaps" and removed 2 (gs8.71 kept 2 and removed 4). But what concerns me much more are all these "Error: Illegal entry in bfrange block in ToUnicode CMap" lines...
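To make this comparison repeatable across versions, the pdffonts listing can be tallied with a few lines of Python. This is a rough sketch: the function name `summarize_pdffonts` is made up, and it assumes the column layout shown in this report (name, type, emb, sub, uni, object ID); other pdffonts versions may print additional columns. The sample text is the gs9.02svn listing from above.

```python
def summarize_pdffonts(text):
    """Count 'Error:' lines and collect per-font flags from pdffonts output."""
    errors = 0
    fonts = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("name ") or line.startswith("---"):
            continue  # blank line, column header, or separator
        if line.startswith("Error:"):
            errors += 1
            continue
        fields = line.split()
        if len(fields) < 7:
            continue  # not a font row in the assumed layout
        # The last two fields are the object ID; emb/sub/uni precede them.
        emb, sub, uni = fields[-5], fields[-4], fields[-3]
        fonts.append({
            "name": fields[0],
            "type": " ".join(fields[1:-5]),  # e.g. "CID TrueType", "Type 1C"
            "emb": emb, "sub": sub, "uni": uni,
        })
    return errors, fonts


SAMPLE = (
    "name                                 type              emb sub uni object ID\n"
    "------------------------------------ ----------------- --- --- --- ---------\n"
    + "Error: Illegal entry in bfrange block in ToUnicode CMap\n" * 17
    + "TCBQTQ+FrutigerLTCom-Light-Identity-H CID TrueType      yes yes yes     22  0\n"
    "YRHNWT+MinionPro-It                  Type 1C           yes yes no      17  0\n"
    "XLWRJU+FrutigerLTCom-Light           TrueType          yes yes yes     11  0\n"
    "GNPESZ+MinionPro-Bold                Type 1C           yes yes no      19  0\n"
    "UEFSOY+FrutigerLTCom-Roman           TrueType          yes yes yes     15  0\n"
    "NQPBKA+MinionPro-Regular             Type 1C           yes yes yes      8  0\n"
)

errors, fonts = summarize_pdffonts(SAMPLE)
with_uni = [f["name"] for f in fonts if f["uni"] == "yes"]
print(errors, len(with_uni), len(fonts))
```

Run on the listing above, this reproduces the "kept 4, removed 2" count and the number of error lines.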

Now, this may well be a bug in the pdffonts utility, which claims to see an error where there is none. However, some change in pdfwrite's handling of (already embedded!) fonts between 8.71 and 9.0x may be causing a problem which was not there before.

This is not happening with *ALL* PDF files. I'll attach another example original PDF ("good-orig-#NNNNN.pdf") which has similar type of fonts embedded, created by the same application, where neither gs8.71 nor gs9.02svn cause such an effect.

I'm attaching all the files named above (replacing all NNNNN with the actual bug number).

Comment 1 pipitas 2011-03-01 19:33:34 UTC

Created attachment 7301 [details]
Original PDF file used for testing (not "bad" per se, despite its name)

Comment 2 pipitas 2011-03-01 19:35:06 UTC

Created attachment 7302 [details]
gs8.71 output of "gs -o out.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress in.pdf"

Comment 3 pipitas 2011-03-01 19:36:21 UTC

Created attachment 7303 [details]
gs9.02svn output of "gs -o out.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress in.pdf"

Comment 4 pipitas 2011-03-01 19:37:34 UTC

Created attachment 7304 [details]
Another (similar) original PDF where described problem does NOT occur with gs9.0Xsvn

Comment 5 Ken Sharp 2011-06-17 09:31:57 UTC

Well, Acrobat appears to like all the files, but I suspect this is because in the 8.71 case it is ignoring the ToUnicode CMap and simply using the character codes.

See revision 11170 (Bug #691274), where the first change was made because we were writing an invalid ToUnicode CMap. This altered the emission to follow the specification for CMaps in general by emitting a single byte where possible.

This was then reverted in revision 11975 (Bug #691849) because it caused a regression with Acrobat.

Further investigation is documented in revision 11993 (which references bug #691849 and #691862).

I suspect that pdffonts is complaining because a ToUnicode CMap is not 2 bytes and zero-padded in the bfrange (the warnings about illegal entries would seem to support this). If you read through the log in revision 11993 you'll see that, as far as I can tell, the ToUnicode specification does not match what Acrobat actually expects.
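For anyone wanting to check this on an uncompressed ToUnicode stream, the byte width of the source codes in each bfrange entry can be read off the hex strings. The sketch below is illustrative only: the function name `bfrange_source_widths` and both CMap fragments are invented, not taken from the attachments, and it does not reproduce whatever test pdffonts itself applies.

```python
import re

BFRANGE_BLOCK = re.compile(r"beginbfrange(.*?)endbfrange", re.S)
# <lo> <hi> followed by either a single <dst> string or a [ ... ] array.
ENTRY = re.compile(
    r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*(?:<[0-9A-Fa-f]+>|\[[^\]]*\])"
)


def bfrange_source_widths(cmap_text):
    """Return (lo, hi, lo_bytes, hi_bytes) for every bfrange entry."""
    rows = []
    for block in BFRANGE_BLOCK.findall(cmap_text):
        for lo, hi in ENTRY.findall(block):
            # Two hex digits encode one byte.
            rows.append((lo, hi, len(lo) // 2, len(hi) // 2))
    return rows


# Hypothetical fragments: the same range written with 1-byte and with
# zero-padded 2-byte source codes.
ONE_BYTE = "1 beginbfrange\n<20> <7E> <0020>\nendbfrange\n"
TWO_BYTE = "1 beginbfrange\n<0020> <007E> <0020>\nendbfrange\n"

print(bfrange_source_widths(ONE_BYTE))
print(bfrange_source_widths(TWO_BYTE))
```

Comparing the widths reported for the 8.71 and 9.02svn outputs would show whether the two versions differ in how the source codes are padded.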

So technically (from reading the spec) pdffonts is correct, and the ToUnicode CMap is invalid. However, in practice the CMap now matches what Acrobat expects. It's rather more important to us that Acrobat search/copy works than that we conform to a non-Adobe validator, so I don't plan to change this.

Of course, if you can find a PDF file which demonstrates that I'm wrong in my understanding of the behaviour of Acrobat, I will work on this problem some more.