Bug 689711 - ps2pdf output losing characters
Summary: ps2pdf output losing characters
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.61
Hardware: PC NetBSD
Importance: P4 normal
Assignee: Ralph Giles
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-02-18 13:48 UTC by Mark Davies
Modified: 2009-09-14 13:09 UTC

See Also:
Customer:
Word Size: ---


Attachments
input for ps2pdf exhibiting problem (48.39 KB, application/postscript)
2008-02-18 13:50 UTC, Mark Davies
output from ps2pdf exhibiting problem (15.42 KB, application/pdf)
2008-02-18 13:51 UTC, Mark Davies
conversion result by gs SVN revision 8351 (mpsuzuki) (15.43 KB, application/pdf)
2008-02-18 20:48 UTC, mpsuzuki
debug.zip (34.62 KB, application/zip)
2008-02-18 21:26 UTC, Marcos H. Woehrmann
tar file with debug output and samples (50.00 KB, application/octet-stream)
2008-02-19 03:15 UTC, Mark Davies
Fix for the discussed problem (449 bytes, patch)
2008-03-05 12:57 UTC, Mark Davies
alternative patch (564 bytes, patch)
2009-09-14 06:30 UTC, Alex Cherepanov

Description Mark Davies 2008-02-18 13:48:19 UTC
In Ghostscript 8.61, ps2pdf produces PDFs that don't display all characters,
whereas ps2pdf from 8.60 produces correct PDFs from the same input.
I will attach an example that exhibits the problem.
Comment 1 Mark Davies 2008-02-18 13:50:29 UTC
Created attachment 3796
input for ps2pdf exhibiting problem
Comment 2 Mark Davies 2008-02-18 13:51:10 UTC
Created attachment 3797
output from ps2pdf exhibiting problem
Comment 3 Marcos H. Woehrmann 2008-02-18 19:24:24 UTC
I'm having trouble duplicating this.  With gs8.61 the missing symbols aren't missing in the generated 
PDF file.  The command line I'm using:

gs861 -dCompatibilityLevel=1.4 -sPAPERSIZE=a4 -dSAFER -sDEVICE=pdfwrite -o gs861.pdf -c .setpdfwrite -f 689711.ps

I believe this should be identical to the command line that ps2pdf passes to Ghostscript (except for
the -o gs861.pdf option, but that is unlikely to be the issue).

As a potential clue, when I read the attached PDF file using gshead (r8520), I get the following warnings:

Substituting .notdef for producttext in the font YWTQOV+CMEX10
Substituting .notdef for productdisplay in the font YWTQOV+CMEX10
Substituting .notdef for summationdisplay in the font YWTQOV+CMEX10
Substituting .notdef for integraldisplay in the font YWTQOV+CMEX10
Comment 4 mpsuzuki 2008-02-18 20:48:18 UTC
Created attachment 3798
conversion result by gs SVN revision 8351 (mpsuzuki)

Sorry, I could not reproduce the problem either. I attached
my result; it includes the symbol glyphs that 3797.pdf
does not show.

Marcos, could you upload your -dDEBUG output?
Comment 5 mpsuzuki 2008-02-18 20:54:12 UTC
My worry is that my heuristic fix for bug 689495 caused this issue.
Comment 6 Marcos H. Woehrmann 2008-02-18 21:26:01 UTC
Created attachment 3799
debug.zip

Since I can't reproduce the bug either I'm not sure what good my -dDEBUG output
will be, but I've attached a zip file with the following:

my_ps_to_pdf.dbg - converting the user's ps to pdf with gs8.61
reading_my_pdf.dbg - converting my pdf file to tiff with head
reading_user_pdf.dbg - converting the user's pdf file to tiff with head

It would probably be useful if the user posted his -dDEBUG output.
Comment 7 Mark Davies 2008-02-19 03:15:15 UTC
Created attachment 3800
tar file with debug output and samples

OK, I've explored some more and it is NOT a new 8.61 issue after all but
something environmental that I don't understand.  Two machines: one with 8.60,
and one with a somewhat newer OS and 8.61. The older machine works with the
previous example input while the newer one doesn't.  If I then take the 8.60 gs
binary and ghostscript lib from the older machine to the newer one and run it
there, it fails in the same way.  The attached tar file gives the resultant
PDFs of the 8.60 gs from both machines and the associated -dDEBUG output.
If someone can tell me how these two PDFs differ and which bit of code is
likely to be responsible for writing those bits, then I can dig some more.
Comment 8 Marcos H. Woehrmann 2008-02-19 11:28:21 UTC
I suspect there may be a difference in the Fontmap or Fontmap.GS file between the two machines (or 
perhaps a difference in the actual fonts themselves).
Comment 9 Mark Davies 2008-02-19 13:30:28 UTC
> I suspect there may be a difference in the Fontmap or Fontmap.GS file between 
> the two machines (or perhaps a difference in the actual fonts themselves).

But note that, in the test whose results I sent, I copied
the /usr/pkg/share/ghostscript/{8.60,fonts} tree from the working machine to
the failing one, so in both cases they are identical. The font that it seems
to have issues with is in the PostScript source, so I can't see how there
could be a difference there.
Comment 10 mpsuzuki 2008-02-19 21:36:45 UTC
Mark, Marcos, thank you for uploading the logs.

There is one difference between Mark's successful log and Mark's
failing log: in the failing log, ~/.fonts (a directory for fontconfig)
is scanned, but it is not scanned in the successful log. However, the
only imported font is Times-Roman (replaced by NimbusRomNo9L-Regu), and
it would not be the font responsible for the missing symbol glyphs.

According to the directory /usr/pkg, I guess your platform is
NetBSD, and I'm not sure if the ghostscript you're using is a
prebuilt binary package (or built by the ports system with their patch
collections), or vanilla ghostscript 8.60. Please let me know more
details about your environment, because so far neither Marcos nor I
can reproduce the problem.
Comment 11 Mark Davies 2008-02-21 01:32:49 UTC
> According to the directory /usr/pkg, I guess your platform is
> NetBSD, and I'm not sure if the ghostscript you're using is
> prebuilt binary package (or built by ports system with their patch
> collections), or vanilla ghostscript 8.60. Please let me know more
> detail about your environment, because yet both of I and Marcos
> cannot reproduce the problem.

Yes, it's NetBSD and the ghostscript is from pkgsrc, built locally. The failing
system(s) run a recent -current while the working one runs a somewhat
older -current.  I don't see anything in the pkgsrc build that would make it
significantly different from a vanilla ghostscript (note that I'm a NetBSD and
pkgsrc developer so I do understand what the pkgsrc build is doing).

As noted in my previous comment, given the additional testing, it's likely that
the problem is some interaction with NetBSD-current (or something else that's
changed in my environment) but I need some help to know where to start looking.
If someone can tell me what the difference between the good and bad PDF is (at
some sort of functional level) and roughly which bit of the ghostscript code is
responsible for dealing with that bit, then that gives a starting point to do
some traces to pin down where/how they diverge.
Comment 12 Ken Sharp 2008-02-25 04:33:06 UTC
I've grabbed, decompressed and compared the working and non-working PDF files.
Not being conversant with TeX, I'm not sure if it's possible to simplify the
maths so that the input file contains just a single symbol, but it would be
easier to diagnose if this were possible.

Anyway, the only significant difference between the two is in the inclusion of
the font CMEX10, which is the font used to render the missing symbols. It seems
that the 'bad' file has a faulty embedded font. It's been converted to CFF and
the conversion looks wrong: the string table contains some random junk. This is
probably what causes the GS error 'substituting .notdef' which Marcos noted,
because the names of the glyphs are not present in the string table. Acrobat
just does the substitution silently.

Bad font:
==========
Index: String 7
String:
Copyright (C) 1997 American Mathematical Society. All Rights Reserved
EndString
String:
CMEX10
EndString
String:
Computer Modern
EndString
String:
ts Reserved
EndString
String:
2½sÇ‚ûlt&sïHø¥d4
EndString
String:
¦Ñ
Si¢¬ßP�
EndString
String:
¼ÓS%YÅ÷ž$uAô
EndString
Endindex: String

Good font:
===========
Index: String 7
String:
Copyright (C) 1997 American Mathematical Society. All Rights Reserved
EndString
String:
CMEX10
EndString
String:
Computer Modern
EndString
String:
producttext
EndString
String:
summationdisplay
EndString
String:
productdisplay
EndString
String:
integraldisplay
EndString
Endindex: String

The font is included in the original PostScript file, and so it seems unlikely
that this is the source of the problem, unless GS is using fonts from disk
instead of fonts defined in memory, which seems deeply unlikely. One way to be
sure would be to rerun the test, explicitly removing this font from the fontmap
(if it is there). It's also hard to see how the glyph names could be affected
this way without triggering an error when processing the incoming PostScript
file.

I do see (as mpsuzuki mentioned in comment #10) from the debug files that there
is a difference between the good and bad debug reports: the bad one includes
'/u/staff/mark/.fonts/Fontmap 40 2273276 938627 1417560 123162 true 1151 4 <1>',
which the good one does not.

I'm unable to think of anything else which could affect the conversion to CFF,
except of course binary changes to GS and pdfwrite.

Mark, are you using the ps2pdf script, or running GS directly to the pdfwrite
device? If you are using ps2pdf, can I ask you to use the GS binary instead,
and tell us what your command line is, please? If you aren't sure how to go
about that, try this:

./gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=<output filename> <input filename>

Or use Marcos' command line in comment #3.

This is just to eliminate any environment stuff which might be causing confusion
(for me, at least ;-)

Finally, Mark, to answer your question: one place where you can interrupt GS is
the routine 'psf_write_type2_font'. You will need to do this more than once
because there are a number of fonts to be written. When (*pfont).font_name.chars
is CMEX10, that's the font you are after.

You can probably see from the string table dumped above that it's the glyph names
which are wrong. I have no idea why this would be, but I suspect the actual
problem will be far from this point, when the font is interpreted or used.

The strings representing the glyph names are written into the table in the
routine 'cff_glyph_sid', and the string table is written by cff_put_Index,
called towards the end of psf_write_type2_font.
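
As a rough illustration of why junk in the string table ends in '.notdef'
substitution, here is a minimal sketch in C. It is not Ghostscript code:
find_sid and the two arrays are hypothetical names, and the junk entries are
placeholders rather than the actual bytes from the bad file. The one detail
taken from the CFF format is that SIDs 0 to 390 are predefined standard
strings, so the entries of a font's own String INDEX are numbered from 391.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* CFF reserves SIDs 0..390 for standard strings; a font's own
 * String INDEX entries are numbered from 391 upwards. */
#define CFF_NUM_STD_STRINGS 391

/* String INDEX contents modelled on the "good" dump above. */
const char *const good_strings[] = {
    "Copyright (C) 1997 American Mathematical Society. All Rights Reserved",
    "CMEX10", "Computer Modern",
    "producttext", "summationdisplay", "productdisplay", "integraldisplay"
};

/* Same layout for the "bad" font; the last three entries are
 * placeholders standing in for the junk seen in the dump. */
const char *const bad_strings[] = {
    "Copyright (C) 1997 American Mathematical Society. All Rights Reserved",
    "CMEX10", "Computer Modern",
    "ts Reserved", "<junk 1>", "<junk 2>", "<junk 3>"
};

/* Hypothetical helper: return the SID of `name`, or -1 if the name is
 * not in the table (a consumer would then fall back to .notdef). */
int find_sid(const char *const *strings, size_t count, const char *name)
{
    assert(strings != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(strings[i], name) == 0)
            return CFF_NUM_STD_STRINGS + (int)i;
    }
    return -1;
}
```

With these tables, find_sid(good_strings, 7, "producttext") yields 394, while
the same lookup in bad_strings yields -1, which corresponds to the
'Substituting .notdef for producttext' warning in comment #3.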

Anyway, you are welcome to have a poke, there isn't a lot we can do to help
unless we can reproduce the problem.

To me, this looks like it might be a memory corruption problem.
Comment 13 Mark Davies 2008-02-28 03:42:33 UTC
Thanks for all the detail.

> Mark, are you using the ps2pdf script, or running GS directly to the pdfwrite
> device ?

The tests were done with the command line from comment #3.

> To me, this looks like it might be a memory corruption problem.

That helped a lot. That reminded me that one of the significant differences in 
the libc between the two NetBSD versions is that the malloc() implementation 
switched from phkmalloc to jemalloc and indeed if I build gs on the current 
system but explicitly link in the phkmalloc then the sample file works.
So some malloc() misuse issue that you get away with in some malloc 
implementations but not others?
Comment 14 Ken Sharp 2008-02-28 04:12:14 UTC
> The tests were done with the command line from comment #3.

Ooops, sorry, missed that...

> the libc between the two NetBSD versions is that the malloc() implementation 
> switched from phkmalloc to jemalloc and indeed if I build gs on the current 
> system but explicitly link in the phkmalloc then the sample file works.

Hmm, interesting, but not my area of expertise ;-)

> So some malloc() misuse issue that you get away with in some malloc 
> implementations but not others?

Possible I guess, I'm not any kind of expert on Ghostscript's memory management.
It could just be that the memory moves around with a different library, but you
say the only thing you altered was to explicitly link against phkmalloc. So that
sounds to me like it really is the problem.

Trouble is, I don't have NetBSD, and don't have the expertise to debug using
that OS even if I had a version, or to debug the GS memory management. It is
still possible this is a red herring though. Unfortunately, without any way to
debug the problem, I'm not sure where to go next. 

Any ideas anyone ?
Comment 15 Mark Davies 2008-03-05 12:57:12 UTC
Created attachment 3837
Fix for the discussed problem

Matthias Drochner found the actual problem and this is his patch.

Fixes a botched pointer comparison which fails if the pointer difference
overflows the signed integer range.
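
To make the failure mode concrete, here is a small model in C. It is only a
sketch and not the attached patch: it assumes a range check of the general form
"p >= base && (p - base) < size", the function names are hypothetical, and
addresses are modelled as uint32_t values in a 32-bit address space so that the
arithmetic is well defined without real pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of a 32-bit address space: addresses are plain uint32_t values,
 * so the arithmetic below needs no real pointers. */

/* Buggy form: the address difference is squeezed into a signed 32-bit
 * integer. Converting an out-of-range value to int32_t is
 * implementation-defined; on common two's-complement compilers it wraps
 * to a negative number. */
bool in_range_signed_diff(uint32_t p, uint32_t base, int32_t size)
{
    assert(size >= 0);
    int32_t diff = (int32_t)(p - base);
    return p >= base && diff < size;
}

/* One possible repair: keep the difference unsigned. Because p >= base
 * is checked first, p - base cannot wrap. */
bool in_range_unsigned_diff(uint32_t p, uint32_t base, uint32_t size)
{
    return p >= base && (p - base) < size;
}
```

For example, with base = 0x08050000 and size = 0x1000 (made-up addresses), an
address of 0xBB900000 far above the table gives a difference of 0xB38B0000.
That is negative when viewed as int32_t, so in_range_signed_diff reports the
address as inside the table, while in_range_unsigned_diff does not. Whether two
addresses end up that far apart depends on where the allocator places heap
memory, which would fit the phkmalloc/jemalloc observation in comment #13.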
Comment 16 leonardo 2008-03-11 12:51:17 UTC
Well, according to "INTERNATIONAL STANDARD ISO/IEC 9899 Second edition
1999-12-01 Programming languages - C", the type of a pointer difference is
ptrdiff_t, defined in <stddef.h>. I'm not sure why the compiler fails to
convert both sides of the comparison to the same numeric type; I believe it is
a compiler error. But the suggested conversion to "unsigned long" looks
incorrect as well, because (at least theoretically) ptrdiff_t may be long long.
So I suggest casting *both* sides to ptrdiff_t.

Anyway thanks a lot to Matthias Drochner for the problem localization, which I 
think was not simple. 

I will pass this bug to Ralph who works on portability issues.
Comment 17 Alex Cherepanov 2009-09-14 06:30:46 UTC
Created attachment 5373
alternative patch

This patch avoids the question about the type of the pointer differences.
Every usable compiler should be able to add a number to a pointer
or compare 2 pointers.

I cannot check whether this patch solves the problem because old code works
just fine for me. Regression testing (on x86_64 Linux) finds no differences
after applying this patch.
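
For readers without the attachment, the general shape of such a check can be
sketched as follows. This is an assumption about the idea only, not the
committed code: name_table and points_into_table are hypothetical names. Note
that ISO C defines relational comparison only for pointers into the same
object, so for arbitrary pointers the sketch relies on a flat address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical static table standing in for a block of constant strings. */
const char name_table[] = "producttext\0summationdisplay";

/* Range check written with pointer addition and pointer comparison only;
 * no pointer difference (and hence no ptrdiff_t) is involved. */
bool points_into_table(const char *p)
{
    assert(p != NULL);
    return p >= name_table && p < name_table + sizeof name_table;
}
```

Here points_into_table(name_table + 12) is true, while the one-past-the-end
pointer name_table + sizeof name_table is rejected.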
Comment 18 Alex Cherepanov 2009-09-14 13:09:56 UTC
The patch from comment #17 has been committed as rev. 10069.
Regression testing shows no differences.