Bug 690714 - Missing Characters after PDF Write
Summary: Missing Characters after PDF Write
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.70
Hardware: Macintosh MacOS X
Importance: P4 normal
Assignee: Alex Cherepanov
URL:
Keywords:
Duplicates: 691004
Depends on:
Blocks:
 
Reported: 2009-08-18 13:12 UTC by Dylan
Modified: 2010-05-02 22:45 UTC

See Also:
Customer:
Word Size: ---


Attachments
Problem file (120.58 KB, application/pdf)
2009-08-18 13:14 UTC, Dylan
h4i7.pdf - simplified sample file. (32.21 KB, application/pdf)
2010-04-26 19:09 UTC, Alex Cherepanov
patch (978 bytes, patch)
2010-04-28 21:56 UTC, Alex Cherepanov

Description Dylan 2009-08-18 13:12:52 UTC
I came across a PDF that renders fine on its own, but after a pdfwrite pass the output is missing
random characters.  I've tried 8.70 as well as 8.64 and get the same results.  The command I ran to
test this was the following:

gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=exo_result.pdf -dBATCH exo.pdf

My current environment is as follows:

GPL Ghostscript 8.70 (2009-07-31)
Copyright (C) 2009 Artifex Software, Inc.  All rights reserved.
Usage: gs [switches] [file1.ps file2.ps ...]
Most frequently used switches: (you can use # in place of =)
 -dNOPAUSE           no pause after page   | -q       `quiet', fewer messages
 -g<width>x<height>  page size in pixels   | -r<res>  pixels/inch resolution
 -sDEVICE=<devname>  select device         | -dBATCH  exit after last file
 -sOutputFile=<file> select output file: - for stdout, |command for pipe,
                                         embed %d or %ld for page #
Input formats: PostScript PostScriptLevel1 PostScriptLevel2 PostScriptLevel3 PDF
Default output device: x11alpha
Available devices:
   alc1900 alc2000 alc4000 alc4100 alc8500 alc8600 alc9100 ap3250 appledmp
   atx23 atx24 atx38 bbox bit bitcmyk bitrgb bitrgbtags bj10e bj10v bj10vh
   bj200 bjc600 bjc800 bjc880j bjccmyk bjccolor bjcgray bjcmono bmp16 bmp16m
   bmp256 bmp32b bmpgray bmpmono bmpsep1 bmpsep8 cairo ccr cdeskjet cdj1600
   cdj500 cdj550 cdj670 cdj850 cdj880 cdj890 cdj970 cdjcolor cdjmono cdnj500
   cfax cgm24 cgm8 cgmmono chp2200 cif cljet5 cljet5c cljet5pr coslw2p
   coslwxl cp50 declj250 deskjet devicen dfaxhigh dfaxlow dj505j djet500
   djet500c dl2100 dnj650c epl2050 epl2050p epl2120 epl2500 epl2750 epl5800
   epl5900 epl6100 epl6200 eplcolor eplmono eps9high eps9mid epson epsonc
   epswrite escp escpage faxg3 faxg32d faxg4 fmlbp fmpr fs600 gdi hl1240
   hl1250 hl7x0 hpdj1120c hpdj310 hpdj320 hpdj340 hpdj400 hpdj500 hpdj500c
   hpdj510 hpdj520 hpdj540 hpdj550c hpdj560c hpdj600 hpdj660c hpdj670c
   hpdj680c hpdj690c hpdj850c hpdj855c hpdj870c hpdj890c hpdjplus
   hpdjportable ibmpro ijs imagen imdi inferno iwhi iwlo iwlq jetp3852 jj100
   jpeg jpegcmyk jpeggray la50 la70 la75 la75plus laserjet lbp310 lbp320
   lbp8 lex2050 lex3200 lex5700 lex7000 lips2p lips3 lips4 lips4v lj250
   lj3100sw lj4dith lj4dithp lj5gray lj5mono ljet2p ljet3 ljet3d ljet4
   ljet4d ljet4pjl ljetplus ln03 lp1800 lp1900 lp2000 lp2200 lp2400 lp2500
   lp2563 lp3000c lp7500 lp7700 lp7900 lp8000 lp8000c lp8100 lp8200c lp8300c
   lp8300f lp8400f lp8500c lp8600 lp8600f lp8700 lp8800c lp8900 lp9000b
   lp9000c lp9100 lp9200b lp9200c lp9300 lp9400 lp9500c lp9600 lp9600s
   lp9800c lps4500 lps6500 lq850 lx5000 lxm3200 lxm5700m m8510 mag16 mag256
   md1xMono md2k md50Eco md50Mono md5k mgr4 mgr8 mgrgray2 mgrgray4 mgrgray8
   mgrmono miff24 mj500c mj6000c mj700v2c mj8000c ml600 necp6 npdl nullpage
   oce9050 oki182 oki4w okiibm omni oprp opvp paintjet pam pbm pbmraw pcl3
   pcx16 pcx24b pcx256 pcx2up pcxcmyk pcxgray pcxmono pdfwrite pgm pgmraw
   pgnm pgnmraw photoex picty180 pj pjetxl pjxl pjxl300 pkm pkmraw pksm
   pksmraw plan9bm png16 png16m png256 png48 pngalpha pnggray pngmono pnm
   pnmraw ppm ppmraw pr1000 pr1000_4 pr150 pr201 ps2write psdcmyk psdrgb
   psgray psmono psrgb pswrite pxlcolor pxlmono r4081 rinkj rpdl samsunggdi
   sgirgb sj48 spotcmyk st800 stcolor sunhmono svg t4693d2 t4693d4 t4693d8
   tek4696 tiff12nc tiff24nc tiff32nc tiffcrle tiffg3 tiffg32d tiffg4
   tiffgray tifflzw tiffpack tiffsep uniprint wtscmyk wtsimdi x11 x11alpha
   x11cmyk x11cmyk2 x11cmyk4 x11cmyk8 x11gray2 x11gray4 x11mono x11rg16x
   x11rg32x xcf xes
Search path:
   . : %rom%Resource/Init/ : %rom%lib/ :
   /usr/local/share/ghostscript/8.70/Resource/Init :
   /usr/local/share/ghostscript/8.70/lib :
   /usr/local/share/ghostscript/8.70/Resource/Font :
   /usr/local/share/ghostscript/fonts :
   /usr/local/share/fonts/default/ghostscript :
   /usr/local/share/fonts/default/Type1 :
   /usr/local/share/fonts/default/TrueType : /usr/lib/DPS/outline/base :
   /usr/openwin/lib/X11/fonts/Type1 : /usr/openwin/lib/X11/fonts/TrueType
Initialization files are compiled into the executable.

99.99% of our PDFs translate fine, and this is the only document that we've run into problems with.
Comment 1 Dylan 2009-08-18 13:14:53 UTC
Created attachment 5308
Problem file

This is the file that has missing characters after parsing:
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=exo_result.pdf -dBATCH exo.pdf
Comment 2 Ken Sharp 2009-10-26 08:03:28 UTC
This does not appear to be a problem with pdfwrite. When run through GS to the
display device there are a number of zlib errors (which are ignored) and a
number of 'substituting .notdef for glyph ' errors. (The glyph errors are not
present when the device is pdfwrite.)

I'm unsure as to whether the problem is the decompression of streams (zlib) or
something we don't like about the font (Trebuchet, apparently). However the
pdfwrite output file contains the usual hollow square for .notdef where the
glyphs are 'missing'. The effect does not appear to be random; it's the same
glyphs missing every time.

It's likely this is simply a bad PDF file. I notice the PDF Producer field is
empty, never a good sign when the software is embarrassed enough to remain
anonymous. The fact that Acrobat reads it and doesn't complain doesn't mean the
file is valid; Acrobat silently fixes all kinds of broken files.

I'm reassigning this to Alex to see if the problem is zlib, the PDF interpreter
or the font.
Comment 3 Alex Cherepanov 2010-04-26 19:09:12 UTC
Created attachment 6234
h4i7.pdf - simplified sample file.

This is a bug in Ghostscript.
The sample file refers to the same font descriptor through
different font dictionaries with different encodings. It looks like
Ghostscript doesn't expect this.
Comment 4 Alex Cherepanov 2010-04-28 21:56:33 UTC
Created attachment 6238
patch

Associate cached font instance with PDF font resource dictionary instead of
font descriptor. The latter may be shared by font resource dictionaries with
different encodings causing incorrect rendering.
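
The cache-key change described above can be illustrated with a small model. This
is only a sketch in Python, not the Ghostscript code or the attached patch: the
class names, object numbers, and encodings are all hypothetical, chosen to mimic
two font resource dictionaries that share one font descriptor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontDescriptor:
    font_name: str  # stands in for the shared /FontDescriptor object


@dataclass(frozen=True)
class FontResource:
    obj_id: int                 # hypothetical object number of the font dictionary
    descriptor: FontDescriptor  # may be shared by several font dictionaries
    encoding: tuple             # (character code, glyph name) pairs


def build_font(resource):
    """Stand-in for the expensive step of binding a font to its encoding."""
    return dict(resource.encoding)


class FontCache:
    def __init__(self, key_fn):
        self._key_fn = key_fn
        self._fonts = {}

    def get(self, resource):
        key = self._key_fn(resource)
        if key not in self._fonts:
            self._fonts[key] = build_font(resource)
        return self._fonts[key]


def glyph_for(cache, resource, code):
    # An unmapped code falls back to .notdef (the hollow square).
    return cache.get(resource).get(code, ".notdef")


shared = FontDescriptor("Trebuchet")
font_a = FontResource(10, shared, ((65, "A"),))
font_b = FontResource(11, shared, ((65, "A"), (128, "bullet")))

# Keyed on the descriptor: font_b reuses the instance built for font_a,
# so a code that only font_b's encoding defines comes out as .notdef.
old_cache = FontCache(lambda r: r.descriptor)
glyph_for(old_cache, font_a, 65)
old_result = glyph_for(old_cache, font_b, 128)

# Keyed on the font resource dictionary: each encoding gets its own instance.
new_cache = FontCache(lambda r: r.obj_id)
glyph_for(new_cache, font_a, 65)
new_result = glyph_for(new_cache, font_b, 128)

print(old_result, new_result)
```

In this model the descriptor-keyed cache returns .notdef for a code that only
the second encoding defines, and the same glyphs go missing on every run.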

Besides fixing the problem, the patch causes minor rendering differences in
Bug690837.pdf. Some of the characters get shifted by one pixel left or right
relative to the previous position.
Comment 5 Alex Cherepanov 2010-04-28 22:00:08 UTC
The patch has been committed as rev. 11148.
Comment 6 Alex Cherepanov 2010-05-02 22:45:45 UTC
*** Bug 691004 has been marked as a duplicate of this bug. ***