691272 – gs produces incorrect pdf file

Summary: gs produces incorrect pdf file

Status:	RESOLVED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Writer
Version:	8.71
Hardware:	PC Linux

Importance:	P4 normal
Assignee:	Ken Sharp

URL:
Keywords:

Depends on:
Blocks:

Reported:	2010-04-30 00:21 UTC by johnsmith7219
Modified:	2010-05-03 13:31 UTC
CC List:	0 users

See Also:
Customer:
Word Size:	---

Attachments
test.ps (10.90 KB, application/postscript) 2010-04-30 00:21 UTC, johnsmith7219
pdf file written with r11151 (4.83 KB, application/pdf) 2010-04-30 23:18 UTC, Hin-Tak Leung

Description johnsmith7219 2010-04-30 00:21:41 UTC

Created attachment 6239 [details]
test.ps

If I run the following command (where test.ps is the attached file):

gs -dSAFER -dCompatibilityLevel=1.4 -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=test.pdf -dSAFER -dCompatibilityLevel=1.4 -c .setpdfwrite -f test.ps

then I get a pdf file which seems incorrect.  Indeed, running

pdffonts test.pdf

produces (in the middle of the output) the error "Error: Illegal entry in bfrange block in ToUnicode CMap".  Also xpdf produces this error.

I am running gs from the package ghostscript-8.71-6.fc12.i686 on an up-to-date Fedora 12.  The problem does not appear on an Ubuntu machine running ghostscript-8.61.

Comment 1 Ken Sharp 2010-04-30 08:12:14 UTC

I'm unable to reproduce this. I started on Windows with the current HEAD revision, and then moved to Fedora 12. Still unable to reproduce the problem, I downloaded the 8.71 release code from SourceForge and rebuilt it. Using the specified command line (there is no need to specify -dSAFER and -dCompatibilityLevel twice, by the way), I still get a PDF file which opens with evince and causes no problems with pdffonts.

I'm closing this as 'worksforme' with the suggestion that you try picking up and building the current version of Ghostscript from source, rather than using the Fedora package. The various Linux distributors do make custom changes to Ghostscript, and sometimes these can cause problems.

Comment 2 Hin-Tak Leung 2010-04-30 23:18:09 UTC

Created attachment 6243 [details]
pdf file written with r11151

PDF file written with an r11151 debug build. Both xpdf and pdffonts complain, but the file seems to work anyway. I am attaching it just in case Ken wants to look further. OTOH, if it isn't a fatal error, it probably should be a "warning".

$xpdf /tmp/test.pdf 
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap
Error: Illegal entry in bfrange block in ToUnicode CMap

$ pdffonts test.pdf 
name                                 type              emb sub uni object ID
------------------------------------ ----------------- --- --- --- ---------
Error: Illegal entry in bfrange block in ToUnicode CMap
IQMXLL+CMR10                         Type 1C           yes yes no      10  0
WWKJIK+CMSY10                        Type 1C           yes yes yes      8  0

Comment 3 Hin-Tak Leung 2010-04-30 23:46:34 UTC

I think this should probably be considered an xpdf/pdffonts bug: Acrobat Reader on Linux also reads the file happily, so xpdf/pdffonts seem to be needlessly alarming.

Comment 4 Ken Sharp 2010-05-01 08:21:16 UTC

I'm not sure where the ToUnicode information is coming from in the output. However I can't see anything actually wrong with it either. Admittedly specifying a range of 1 character is inefficient, but it doesn't seem to be illegal from a reading of the spec.

I'm assuming that pdffonts and xpdf are complaining that the low and high values of the range are the same, and as I say while this is unusual there's nothing obvious in the spec that says this is invalid (as long as the count of the range is 1, which this is).

ToUnicode CMaps are only used when copying text, so aren't tremendously useful anyway. Although Acrobat isn't complaining, it doesn't look like it's using the CMap either, so it may be worth looking further at this.
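
To make the point about a one-character range concrete, here is a minimal Python sketch of how a reader might expand a simple bfrange entry into a code-to-Unicode mapping. The function name `expand_bfrange` is hypothetical, and the sketch assumes the destination is a single hex string holding one BMP code point (the array form of the destination is not handled). It is an illustration only, not the logic of xpdf, pdffonts or Acrobat.

```python
import re

# One simple bfrange entry: <srcCodeLo> <srcCodeHi> <dst>.
# Assumption: the array form of the destination ([<...> <...>]) is not handled.
BFRANGE_ENTRY = re.compile(
    r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>"
)


def expand_bfrange(entry):
    """Return {source_code: unicode_character} for one simple bfrange entry."""
    m = BFRANGE_ENTRY.fullmatch(entry.strip())
    if m is None:
        raise ValueError(f"not a simple bfrange entry: {entry!r}")
    # Parsing as hexadecimal numbers ignores leading zeros.
    lo, hi, dst = (int(h, 16) for h in m.groups())
    if hi < lo:
        raise ValueError("srcCodeHi is below srcCodeLo")
    # A range with lo == hi simply yields a single mapping.
    return {lo + i: chr(dst + i) for i in range(hi - lo + 1)}


# The entry from the test file maps code 0x21 to U+2192 (right arrow).
print(expand_bfrange("<0021><0021><2192>"))
```

Under this reading, a range whose low and high values coincide is just a verbose way of writing a single mapping.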

Comment 5 johnsmith7219 2010-05-02 18:56:36 UTC

I know nothing about PDF, but just in case it might help: debugging pdffonts on the PDF file given by Hin-Tak Leung shows that it complains when reading the token <0021> (which, if I understand correctly, is the first token after beginbfrange, that is, srcCodeLo), and the reason it complains is not that srcCodeLo is equal to srcCodeHi, but that 0021 is 4 digits long.  At this point pdffonts is expecting only 2 digits (apparently because this is supposed to be an 8-bit number).  Is this a problem with pdffonts, or is the PDF file incorrect?  For what it's worth, pdfedit also produces the same error message.

Comment 6 Ken Sharp 2010-05-03 09:10:41 UTC

(In reply to comment #5)

> the pdf file given by Hin-Tak Leung shows that it complains when reading the
> token <0021> (which if I understand correctly is the first token after
> beginbfrange, that is, srcCodeLo), 

There are two entries with the value <0021>, srcCodeLo and srcCodeHi are both equal to <0021> in the file I looked at:

1 beginbfrange
<0021><0021><2192>
endbfrange


> and the reason it complains is not because
> srcCodeLo is equal to srcCodeHi, but because 0021 is 4 digits long.  At this
> point pdffonts is expecting only 2 digits (apparently because this is supposed
> to be an 8-bit number).

I would say that would be incorrect. The spec says it's a hexadecimal number, so the number of digits is irrelevant; leading zeros should be ignored. In PostScript CMaps they most certainly are.

The spec doesn't say anything about limiting the values, and since there is no limit on the potential size of a source code, this wouldn't make sense.

Notice the declared codespace range:

1 begincodespacerange
<0000><ffff>
endcodespacerange

The code space range defines the valid input values; as you can see, these range from 0 to ffff.

However it seems Acrobat suffers the same madness, and doesn't like leading zeros in the beginbfrange hex strings (large values are not a problem though, as expected).

>  Is this a problem with pdffonts or is the pdf file
> incorrect ?  For what it's worth, pdfedit also produces the same error message.

I don't personally see a problem with the definition in the CMap. However it seems that Acrobat doesn't like it either (it doesn't copy the glyph properly), so I'm prepared to accept it as a limitation at least.
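
For illustration, the strict behaviour reported in comment 5 (rejecting a source code whose hex string is not exactly as long as the expected code width) could be sketched in Python as below. This is only a guess at the shape of such a check, not the actual pdffonts or Acrobat code; `check_bfrange_widths` and its `code_bytes` parameter are hypothetical names.

```python
import re

HEX_TOKEN = re.compile(r"<([0-9A-Fa-f]*)>")


def check_bfrange_widths(cmap_text, code_bytes):
    """Return the srcCodeLo/srcCodeHi tokens whose hex length is not 2 * code_bytes."""
    expected_digits = 2 * code_bytes
    rejected = []
    # Look only inside beginbfrange ... endbfrange blocks.
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for line in block.strip().splitlines():
            tokens = HEX_TOKEN.findall(line)
            # The first two tokens on a line are srcCodeLo and srcCodeHi;
            # the destination is not length-checked here.
            for tok in tokens[:2]:
                if len(tok) != expected_digits:
                    rejected.append(tok)
    return rejected


cmap = "1 beginbfrange\n<0021><0021><2192>\nendbfrange\n"
print(check_bfrange_widths(cmap, code_bytes=1))  # a reader expecting 8-bit codes
print(check_bfrange_widths(cmap, code_bytes=2))  # a reader expecting 16-bit codes
```

A reader that fixes the width at one byte flags both <0021> tokens, while one that takes the width from the declared <0000><ffff> code space range accepts them.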

Comment 7 Ken Sharp 2010-05-03 09:33:32 UTC

And in fact Technical note 5411, "ToUnicode mapping file tutorial" has this example (page 29):

100 beginbfrange
<000A> <000B> <0384>
<0017> <0019> <0388>
...

Altering the ToUnicode CMap in the test file from:

1 beginbfrange
<0021><0021><2192>
endbfrange

To:

1 beginbfrange
    <21><21><2192>
endbfrange

correctly pastes the right arrow glyph into Wordpad, showing that Acrobat is using the Unicode values (when not using the values, a '!' is pasted instead). 

This shows that it's not the fact that the input codes are coincident, nor is it some white space issue; it's simply that Acrobat doesn't like the leading zeros. These are documented as being correct in the tutorial, so technically I would say this is an Acrobat bug.

So as usual we will strive to be bug-compatible with Acrobat instead of following the documentation.
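
The manual edit described above can be sketched as a small Python filter that narrows the source codes in bfrange blocks to a given byte width. This is a hypothetical helper (`narrow_source_codes` is not part of Ghostscript) and makes no claim about how pdfwrite itself addresses the problem; it assumes the simple `<lo><hi><dst>` entry form only, and would misread the array form of the destination.

```python
import re

# One simple entry: <srcCodeLo> ws <srcCodeHi> ws <dst>.
# Assumption: the array form of the destination is not present.
ENTRY = re.compile(
    r"<([0-9A-Fa-f]+)>(\s*)<([0-9A-Fa-f]+)>(\s*<[0-9A-Fa-f]+>)"
)


def narrow_source_codes(cmap_text, code_bytes=1):
    """Rewrite srcCodeLo/srcCodeHi in bfrange blocks to 2 * code_bytes hex digits."""
    digits = 2 * code_bytes

    def narrow(tok):
        value = int(tok, 16)
        if value >= 1 << (8 * code_bytes):
            raise ValueError(f"<{tok}> does not fit in {code_bytes} byte(s)")
        return f"<{value:0{digits}X}>"

    def fix_entry(m):
        # Keep the original whitespace and leave the destination untouched.
        return narrow(m.group(1)) + m.group(2) + narrow(m.group(3)) + m.group(4)

    def fix_block(m):
        return m.group(1) + ENTRY.sub(fix_entry, m.group(2)) + m.group(3)

    return re.sub(
        r"(beginbfrange)(.*?)(endbfrange)", fix_block, cmap_text, flags=re.S
    )


original = "1 beginbfrange\n<0021><0021><2192>\nendbfrange\n"
print(narrow_source_codes(original))
```

Raising an error for codes that do not fit the chosen width is a deliberate choice in this sketch: silently truncating a genuine two-byte code would produce a wrong mapping rather than a missing one.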

It's also worth noting that the tutorial says (page 3):

"the following “codespacerange” definition, without exception, shall always be used: 

1 begincodespacerange
  <0000> <FFFF>
endcodespacerange"

Converting the test PostScript file to PDF using Acrobat Distiller 9 produces:

1 begincodespacerange <03> <03> endcodespacerange

So Adobe seems to be pretty comprehensively ignoring their own document.

The assumption of two byte keys for ToUnicode CMaps is pretty tightly interwoven into the pdfwrite ToUnicode CMap emission, so this might be quite tricky to work around.

Comment 8 Ken Sharp 2010-05-03 13:31:30 UTC

Revision 11170 updates pdfwrite to emit ToUnicode CMaps which Acrobat 9 seems happy to read (at least the specimen file works). Please see the submission log for more details, especially caveats regarding testing.

Patch and log here:
http://ghostscript.com/pipermail/gs-cvs/2010-May/010942.html