Bug 689757 - Extra square characters rendering PDF file
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Text
Version: master
Hardware: Macintosh MacOS X
Importance: P1 normal
Assignee: Ken Sharp
Reported: 2008-03-18 19:35 UTC by Marcos H. Woehrmann
Modified: 2008-12-19 08:31 UTC

See Also:
Customer: 531
Word Size: ---


Description Marcos H. Woehrmann 2008-03-18 19:35:25 UTC
The customer reports and I've verified that in the text box in the upper right under the "Assembly:" 
section there are several boxes that appear when the attached file is rendered via Ghostscript, see 
screenshot.png, attached.  Every version I tried, including gs8.54, gs8.62, and gshead (r8601) is the same.  
Adobe Acrobat and Apple Preview do not display the boxes.

The command line I'm using for testing:

  bin/gs -sDEVICE=tiff24nc -o test.tif -r300 ./W1187YD.PDF

I'll try and post a simplified example later.
Comment 1 Marcos H. Woehrmann 2008-03-18 19:35:48 UTC
Created attachment 3873
W1187YD.PDF
Comment 2 Marcos H. Woehrmann 2008-03-18 19:36:18 UTC
Created attachment 3874
screenshot.png
Comment 3 Marcos H. Woehrmann 2008-03-18 20:20:47 UTC
Created attachment 3875
simplified.pdf
Comment 4 Marcos H. Woehrmann 2008-03-18 20:22:07 UTC
Comment on attachment 3875
simplified.pdf

Much simplified PDF file; everything but the applicable text (shown in the
screenshot) has been removed.

BTW, I found a cool, free plugin for Acrobat Professional that allows the page
size to be reduced: http://www.windjack.com/products/freestuff.html
Comment 5 Ken Sharp 2008-03-19 02:38:58 UTC
This looks to me like Ghostscript is using the 'standard' TrueType /.notdef
glyph which is a hollow square. Not all PDF readers do this, in particular
Acrobat simply ignores glyphs in a TrueType font which are not present.

Uncompressing the PDF file reveals the following text strings:

[(  \(Refer to )55(ASSEMBL)74(Y)18( DIAGRAM\).  \t )]TJ
( keeping top edges flush.\t\t)Tj
(using\tScrews )Tj


You can see the '\t' in the text strings correspond to the squares in the GS
output. I imagine the font doesn't have a matching glyph and GS (correctly) uses
the standard TrueType /.notdef glyph.
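
A quick way to confirm which strings carry the offending codes is to decode the literal strings and list any control characters. This is only a rough sketch in Python: the helper names (literal_strings, control_codes) are made up for illustration, it assumes the content stream is already uncompressed, and it ignores hex strings and backslash line continuations.

```python
# Escape sequences defined for PDF literal strings, mapped to character codes.
ESCAPES = {"n": 10, "r": 13, "t": 9, "b": 8, "f": 12, "(": 40, ")": 41, "\\": 92}


def literal_strings(content):
    """Yield the character codes of each (...) literal string, in order."""
    i, n = 0, len(content)
    while i < n:
        if content[i] != "(":
            i += 1
            continue
        depth, codes = 1, []
        i += 1
        while i < n and depth > 0:
            ch = content[i]
            if ch == "\\" and i + 1 < n:
                nxt = content[i + 1]
                if nxt in ESCAPES:
                    codes.append(ESCAPES[nxt])
                    i += 2
                elif nxt in "01234567":
                    # Octal escape: one to three digits.
                    j = i + 1
                    while j < n and j < i + 4 and content[j] in "01234567":
                        j += 1
                    codes.append(int(content[i + 1:j], 8) & 0xFF)
                    i = j
                else:
                    # Unknown escape: drop the backslash, keep the character.
                    i += 1
                continue
            if ch == "(":
                depth += 1  # balanced parentheses are part of the string
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    i += 1
                    break
            codes.append(ord(ch))
            i += 1
        yield codes


def control_codes(content):
    """Return (string_index, code) for every character code below 32."""
    hits = []
    for index, codes in enumerate(literal_strings(content)):
        hits.extend((index, code) for code in codes if code < 32)
    return hits


# The three text-showing lines quoted above.
sample = r"""[(  \(Refer to )55(ASSEMBL)74(Y)18( DIAGRAM\).  \t )]TJ
( keeping top edges flush.\t\t)Tj
(using\tScrews )Tj
"""

print(control_codes(sample))  # [(3, 9), (4, 9), (4, 9), (5, 9)]
```

Code 9 is the horizontal tab, so the four hits line up with the '\t' escapes in the strings above.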

In my former employment, after much complaint from customers, we implemented a
switch to either render TrueType /.notdef glyphs, or to ignore them. I think
something similar will be required here. In my opinion, technically GS is
correct, but it doesn't match Acrobat which is regarded as the standard.

Also, annoyingly, the Producer of the file is Adobe PDF Library from Adobe
Illustrator, so I don't think we can ignore it or ask the producer to fix it.
Comment 6 leonardo 2008-03-19 12:07:04 UTC
Ghostscript does paint the .notdef glyph when the required glyph is absent, although I 
didn't check whether that is the case here. Note we have had a lot of bugs 
about "boxes", which were resolved as wontfix. Maybe now we should choose a different 
resolution.
Comment 7 Ray Johnston 2008-03-19 12:40:11 UTC
Note that this is a high priority customer and FWIW, I like Ken's idea of
an option better than WONTFIX.

Displaying/rendering the same as Adobe Acrobat (and our competition) is generally
a (IMHO, reasonable) expectation.
Comment 8 leonardo 2008-03-19 13:47:37 UTC
The simplest resolution is something like this, inserted into .type42build:

  currentdevice .devicename /pdfwrite eq {
    /.notdef CharStrings /space get
  } {
    /.notdef CharStrings /.notdef get
  } ifelse

One problem here is that sometimes with pdfwrite the currentdevice is the cache, so 
we may need a global flag set with .schedule_init .
Comment 9 Ken Sharp 2008-03-20 03:55:52 UTC
Ah, I think the fact that I replied may have misled Leonardo into thinking this
is a pdfwrite-specific problem, which I don't think it is. I think the customer
is rendering the PDF file (produced by Adobe), and doesn't want the .notdef
glyphs rendered.

However the point about .type42build looks good, perhaps we could define a new
global flag (a new '-d' parameter ?) and inspect that, instead of checking if
the current device is pdfwrite ?

I do see one potential (if unlikely) problem, what happens if the font doesn't
contain a /space glyph ?

I'll shut up now, this isn't my bug...
Comment 10 Ray Johnston 2008-03-20 11:55:41 UTC
A 'space' glyph is NOT the same as ignoring the .notdef -- it moves the current
point.

I still favor Ken's original suggestion (as Jaws did) of having an option to
ignore .notdef -- if this is done by substituting a different glyph for .notdef,
it should be one that does not move the currentpoint (and this should be a
command line option).
Comment 11 leonardo 2008-03-20 19:50:15 UTC
Regarding comment #9: I did not assume it is a pdfwrite issue. It is definitely a 
PDF interpreter problem. Also, we don't need a new -d parameter, because the 
interpreter knows the file type. I have no objection if Ken works on this 
bug (and other font bugs), because I'm busy with the kernel.

Regarding comment #10: Why does Ray think that the current point must not move? 
From old bugs I know it must, so substituting space is right at least for the old 
examples. For this bug's document it looks safe as well. 
Comment 12 Ray Johnston 2008-03-21 00:07:21 UTC
Regarding comment #11:

I think that this needs to be an option because 'strict' rendering of .notdef
characters should (AFAIK) use the definition in the font -- ignoring .notdef
character definition in a font or printing a 'space' instead is (IMHO) not
something that we should assume is the desired behavior.

Regarding the comment "From old bugs I know it must [move the currentpoint], 
so substituting space is right at least for old examples." I request that
Igor provides more details on these cases. This does not seem to be usually
expected from the TT fonts I've looked at that either have a 'small square' or
a 'null' character that doesn't print anything nor move the currentpoint.

We really need to get someone to 'own' this problem and fix it, since this is
an important issue.
Comment 13 Ken Sharp 2008-03-21 02:38:04 UTC
Since it's important I'll take this one on, at least initially. If it turns out
later that I need help I'll ask to re-assign it.

I'll check the position of the glyphs, it looks to me like Acrobat is 'probably'
using the width defined in the /Widths array for the glyph. For sure it isn't
leaving no gap, because otherwise the 'using\tScrews' would run together into
one word, which it doesn't on the Acrobat display. The Widths array does have a
non-zero entry for the glyph, I can check if Acrobat is using that by modifying
the value and seeing if the positioning changes.

Right now I'm looking at a different TrueType issue (pdfwrite and PDF/A output)
so if anyone does want to jump in, feel free. Otherwise I'll get to it next.

Comment 14 leonardo 2008-03-21 12:06:00 UTC
Here are related bugs: bug 687929, bug 436099, bug 633996, bug 686959, bug 687387. 
Note that sometimes the problem is reported as a pdfwrite problem, sometimes as a PDF 
interpreter problem. I believe they both have the same nature. Likely Adobe 
ignores .notdef when rendering a PDF.

Regarding comment #12: I do not suggest to ignore .notdef in Postscript. I 
suggest to replace with space only when rendering a PDF.
Comment 15 leonardo 2008-03-21 12:08:39 UTC
The list of bugs in comment #14 is not necessarily complete. I listed the ones 
that I could find in half an hour. An additional search would be useful. The 
listed bugs were found by searching for "boxes" in a comment.
Comment 16 Ray Johnston 2008-03-27 09:51:40 UTC
Fixing the assignment according to comment #13
Comment 17 Ken Sharp 2008-04-09 07:41:01 UTC
I've spent quite some time trying to work out what prompts Acrobat to render a
/.notdef glyph, and when it doesn't. I haven't been able to draw any
conclusions, but I do have some negative information.

Starting with a file produced by Distiller, the /.notdef glyphs are not rendered
in Acrobat. It's not the font name, changing the standard 'base 14' name to
something totally different makes no difference. Making the font symbolic has no
effect, adding the /.notdef glyph to a /Differences in the Encoding has no effect.

Starting with the output of pdfwrite, the /.notdef glyphs are rendered in
Acrobat. Making the font non-symbolic and removing the direct reference to
/.notdef from the /Encoding causes Acrobat to stop rendering the /.notdef glyphs.

Whenever Acrobat elides the /.notdef it appears to still use the width of the
original glyph, if it is specified in a /Widths array. If there is no /Widths
array, it seems to use the width of the /.notdef glyph.
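
The width rule observed above could be sketched roughly as follows. This is Python for illustration only, not the code in the patch: notdef_advance and its parameters are hypothetical names, the /Widths array is assumed to be indexed from /FirstChar as in a simple PDF font, and falling back to the /.notdef width for a code outside the array is a guess rather than something checked against Acrobat.

```python
def notdef_advance(char_code, first_char, widths, notdef_width):
    """Pick the advance to use when the /.notdef glyph is elided.

    widths is the font's /Widths array (or None if the font has none),
    first_char is its /FirstChar, and notdef_width is the advance width
    of the font's own /.notdef glyph.
    """
    if widths is not None:
        index = char_code - first_char
        if 0 <= index < len(widths):
            # A /Widths entry exists for this code: use it.
            return widths[index]
    # No /Widths array (or no entry, by assumption): use the /.notdef width.
    return notdef_width


# Illustrative numbers only, not taken from the customer file.
widths = [0] * 9 + [278]                      # entry for code 9 (tab) is 278
print(notdef_advance(9, 0, widths, 500))      # 278
print(notdef_advance(9, 0, None, 500))        # 500
```

Either way the current point still moves, which fits 'using\tScrews' not running together in Acrobat.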

I've got a patch which I'm just completing testing which addresses the 'issue'.
It introduces a new command line switch RENDERTTNOTDEF, which the PDF
interpreter uses to set a user parameter '/RenderTTNotdef'. The default
behaviour is to render /.notdef glyphs when in PostScript and NOT render them
when interpreting a PDF file.

The reason this is a switch, is because Acrobat does still render /.notdef
glyphs, I just can't work out the rules it is using.
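
As a rough illustration of how the switch is meant to be used, here is a small Python sketch. The helper names are hypothetical, the device and resolution are simply the ones from the original report, and it assumes the switch is passed in the usual -d form and only ever turns rendering on; it builds the command line without running it.

```python
def renders_notdef(interpreting_pdf, switch_set):
    """Default described above: render in PostScript, not in PDF.

    Assumption: the RENDERTTNOTDEF switch forces rendering back on.
    """
    return switch_set or not interpreting_pdf


def gs_command(pdf_path, out_path, render_notdef=False):
    """Build the test command from the report, optionally with the switch."""
    args = ["bin/gs", "-sDEVICE=tiff24nc", "-o", out_path, "-r300"]
    if render_notdef:
        args.append("-dRENDERTTNOTDEF")
    args.append(pdf_path)
    return args


print(" ".join(gs_command("./W1187YD.PDF", "test.tif", render_notdef=True)))
```

Leaving render_notdef at its default gives the same command line as in the description, which with the patch should no longer show the boxes for this file.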

I've checked the behaviour of all the prior issues that Leonardo mentions, and
done a trawl through the database for other cases. There were surprisingly few,
most are related to encoding problems and the presence of the /.notdef was to be
expected. I did find three other cases, 687387, 688315 and 687929. All render as
per Acrobat with the patch, and display boxes as reported without.
Comment 18 Ken Sharp 2008-04-09 08:34:05 UTC
Patch committed as:

http://ghostscript.com/pipermail/gs-cvs/2008-April/008217.html

which resolves this, and similar issues, for me.