Created attachment 6239: test.ps
If I run the following command (where test.ps is the attached file):
  gs -dSAFER -dCompatibilityLevel=1.4 -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sstdout=%stderr -sOutputFile=test.pdf -dSAFER -dCompatibilityLevel=1.4 -c .setpdfwrite -f test.ps
then I get a PDF file which seems incorrect. Indeed, running pdffonts test.pdf produces (in the middle of the output) the error "Error: Illegal entry in bfrange block in ToUnicode CMap". xpdf produces the same error. I am running gs from the package ghostscript-8.71-6.fc12.i686 on an up-to-date Fedora 12. The problem does not appear on an Ubuntu machine running ghostscript-8.61.
I'm unable to reproduce this. I started on Windows with the current HEAD revision, and then moved to Fedora 12. Still unable to reproduce the problem, I downloaded the 8.71 release code from SourceForge and rebuilt it. Using the specified command line (there is no need to specify -dSAFER and -dCompatibilityLevel twice, by the way) I still get a PDF file which opens with evince and causes no problems with pdffonts. I'm closing this as 'worksforme' with the suggestion that you try picking up and building the current version of Ghostscript from source, rather than using the Fedora package. The various Linux distributors do make custom changes to Ghostscript, and sometimes these can cause problems.
Created attachment 6243: pdf file written with r11151
PDF file written with r11151 debug build. Both xpdf and pdffonts complain, but seem to work anyway. I am attaching the file just in case kens wants to look further. OTOH, if it isn't a fatal error, it probably should be a "warning".
  $ xpdf /tmp/test.pdf
  Error: Illegal entry in bfrange block in ToUnicode CMap
  Error: Illegal entry in bfrange block in ToUnicode CMap
  Error: Illegal entry in bfrange block in ToUnicode CMap
  Error: Illegal entry in bfrange block in ToUnicode CMap
  $ pdffonts test.pdf
  name                                 type              emb sub uni object ID
  ------------------------------------ ----------------- --- --- --- ---------
  Error: Illegal entry in bfrange block in ToUnicode CMap
  IQMXLL+CMR10                         Type 1C           yes yes no      10  0
  WWKJIK+CMSY10                        Type 1C           yes yes yes      8  0
I think this should probably be considered an xpdf/pdffonts bug: Acrobat Reader on Linux also reads the file happily, so xpdf/pdffonts seem to be needlessly alarming.
I'm not sure where the ToUnicode information is coming from in the output. However, I can't see anything actually wrong with it either. Admittedly specifying a range of 1 character is inefficient, but it doesn't seem to be illegal from a reading of the spec. I'm assuming that pdffonts and xpdf are complaining that the low and high values of the range are the same and, as I say, while this is unusual, there's nothing obvious in the spec that says this is invalid (as long as the count of the range is 1, which this is). ToUnicode CMaps are only used when copying text, so aren't tremendously useful anyway. Although Acrobat isn't complaining, it doesn't look like it's using it either, so it may be worth looking further at this.
I know nothing about PDF, but just in case it might help: debugging pdffonts on the PDF file given by Hin-Tak Leung shows that it complains when reading the token <0021> (which, if I understand correctly, is the first token after beginbfrange, that is, srcCodeLo), and the reason it complains is not that srcCodeLo is equal to srcCodeHi, but that 0021 is 4 digits long. At this point pdffonts is expecting only 2 digits (apparently because this is supposed to be an 8-bit number). Is this a problem with pdffonts, or is the PDF file incorrect? For what it's worth, pdfedit also produces the same error message.
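To make the behaviour described above concrete, here is a minimal sketch of a reader that checks the width of the source codes in a bfrange entry. It is not the actual pdffonts/xpdf code: the function name check_bfrange_entry and the expected_bytes parameter are hypothetical, and the sketch simply assumes the reader has decided in advance how many bytes wide each source code must be.

```python
import re

# Matches one hex string token such as <0021>.
HEX_TOKEN = re.compile(r"<([0-9A-Fa-f]+)>")


def check_bfrange_entry(entry, expected_bytes):
    """Return a list of complaints for one 'srcCodeLo srcCodeHi dst' entry.

    expected_bytes is a hypothetical parameter: the source code width the
    reader assumes (1 for an 8-bit code, 2 for a 16-bit code).
    """
    tokens = HEX_TOKEN.findall(entry)
    if len(tokens) != 3:
        return ["expected three hex strings, found %d" % len(tokens)]

    problems = []
    # Only the two source codes are checked; the destination may be wider.
    for name, token in zip(("srcCodeLo", "srcCodeHi"), tokens[:2]):
        if len(token) != 2 * expected_bytes:
            problems.append(
                "%s <%s> has %d hex digits, expected %d"
                % (name, token, len(token), 2 * expected_bytes)
            )
    return problems


if __name__ == "__main__":
    print(check_bfrange_entry("<0021><0021><2192>", expected_bytes=1))
    print(check_bfrange_entry("<21><21><2192>", expected_bytes=1))
    print(check_bfrange_entry("<0021><0021><2192>", expected_bytes=2))
```

With a one-byte expectation the first entry is rejected for both source codes and the second is accepted; with a two-byte expectation the first entry is accepted. Note that the numeric value of 0021 and 21 is the same in both cases; only the digit count differs.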
(In reply to comment #5)
> the pdf file given by Hin-Tak Leung shows that it complains when reading the
> token <0021> (which if I understand correctly is the first token after
> beginbfrange, that is, srcCodeLo),
There are two entries with the value <0021>; srcCodeLo and srcCodeHi are both equal to <0021> in the file I looked at:
  1 beginbfrange
  <0021><0021><2192>
  endbfrange
> and the reason it complains is not because
> srcCodeLo is equal to srcCodeHi, but because 0021 is 4 digits long. At this
> point pdffonts is expecting only 2 digits (apparently because this is supposed
> to be an 8-bit number).
I would say that would be incorrect. The spec says it's a hexadecimal number, so the number of digits is irrelevant; leading zeros should be ignored. In PostScript CMaps they most certainly are. The spec doesn't say anything about limiting the values, and since there is no limit on the potential size of a source code, this wouldn't make sense. Notice the declared codespace range:
  1 begincodespacerange
  <0000><ffff>
  endcodespacerange
The codespace range defines the valid input values; as you can see, these range from 0 to ffff. However, it seems Acrobat suffers the same madness, and doesn't like leading zeros in the beginbfrange hex strings (large values are not a problem though, as expected).
> Is this a problem with pdffonts or is the pdf file
> incorrect ? For what it's worth, pdfedit also produces the same error message.
I don't personally see a problem with the definition in the CMap. However, it seems that Acrobat doesn't like it either (it doesn't copy the glyph properly), so I'm prepared to accept it as a limitation at least.
And in fact Technical Note 5411, "ToUnicode mapping file tutorial", has this example (page 29):
  100 beginbfrange
  <000A> <000B> <0384>
  <0017> <0019> <0388>
  ...
Altering the ToUnicode CMap in the test file from:
  1 beginbfrange
  <0021><0021><2192>
  endbfrange
to:
  1 beginbfrange
  <21><21><2192>
  endbfrange
correctly pastes the right arrow glyph into Wordpad, showing that Acrobat is using the Unicode values (when not using the values, a '!' is pasted instead). This shows that it's not the fact that the input codes are coincident, nor is it some white space issue; it's simply that Acrobat doesn't like the leading zeros. These are documented as being correct in the tutorial, so technically I would say this is an Acrobat bug. So, as usual, we will strive to be bug-compatible with Acrobat instead of following the documentation.
It's also worth noting that the tutorial says (page 3): "the following “codespacerange” definition, without exception, shall always be used:
  1 begincodespacerange
  <0000> <FFFF>
  endcodespacerange"
Converting the test PostScript file to PDF using Acrobat Distiller 9 produces:
  1 begincodespacerange
  <03> <03>
  endcodespacerange
So Adobe seems to be pretty comprehensively ignoring their document. The assumption of two-byte keys for ToUnicode CMaps is pretty tightly interwoven into the pdfwrite ToUnicode CMap emission, so this might be quite tricky to work around.
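The manual edit described above (dropping the leading zero byte from both source codes) can be sketched as a small transformation. This is only an illustration of that hand edit, not a description of how pdfwrite emits CMaps: the function name shorten_src_codes is hypothetical, and the sketch assumes srcCodeLo and srcCodeHi are written with the same number of hex digits.

```python
import re

# One bfrange entry: <srcCodeLo> <srcCodeHi> <dst>, with optional white space.
ENTRY = re.compile(
    r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>"
)


def shorten_src_codes(entry):
    """Strip leading zero bytes shared by both source codes of an entry.

    The destination value is left untouched. Zero bytes are only removed
    while BOTH source codes start with one, so both keep the same width.
    """
    match = ENTRY.fullmatch(entry.strip())
    if match is None:
        raise ValueError("not a 'lo hi dst' bfrange entry: %r" % entry)
    lo, hi, dst = match.groups()
    if len(lo) != len(hi):
        raise ValueError("source codes differ in width: %r" % entry)

    # Each byte is two hex digits; always keep at least one byte.
    while len(lo) > 2 and lo.startswith("00") and hi.startswith("00"):
        lo, hi = lo[2:], hi[2:]
    return "<%s><%s><%s>" % (lo, hi, dst)


if __name__ == "__main__":
    print(shorten_src_codes("<0021><0021><2192>"))
    print(shorten_src_codes("<0100><01FF><4E00>"))
```

The first call yields the hand-edited form from the test file; the second is left at two bytes because its source codes do not fit in one. A real fix would presumably also have to keep the codespace range consistent with the shortened codes, which this sketch does not attempt.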
Revision 11170 updates pdfwrite to emit ToUnicode CMaps which Acrobat 9 seems happy to read (at least the specimen file works). Please see the submission log for more details, especially caveats regarding testing. Patch and log here: http://ghostscript.com/pipermail/gs-cvs/2010-May/010942.html