Bug 695259 - both (incorrect) B/W and (correct) AA rendering of Libertine font in same PDF output
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 9.14
Hardware: PC Linux
Importance: P4 normal
Assignee: Ken Sharp
Reported: 2014-05-24 09:52 UTC by Werner Lemberg
Modified: 2014-05-29 06:44 UTC


Attachments
Input file for ps2pdf. (597.42 KB, application/postscript)
2014-05-24 09:52 UTC, Werner Lemberg
Result of ps2pdf. (87.31 KB, application/pdf)
2014-05-24 09:53 UTC, Werner Lemberg
modified input file (597.42 KB, application/postscript)
2014-05-26 00:55 UTC, Ken Sharp
more bitmaps in rendering (682.63 KB, application/postscript)
2014-05-26 08:06 UTC, Werner Lemberg
Result of ps2pdf. (131.63 KB, application/pdf)
2014-05-26 08:07 UTC, Werner Lemberg

Description Werner Lemberg 2014-05-24 09:52:33 UTC
Created attachment 10927
Input file for ps2pdf.

The attached PS file gets displayed fine with gs.  However, after calling

  ps2pdf zzz.ps

I see the message

  GPL Ghostscript 9.14:
    Can't embed the complete font LinLibertineO
    as it is too large, embedding a subset.

twice, and the first `o' glyph apparently gets rendered without anti-aliasing (AA).

The problem was also present in version 9.06.
Comment 1 Werner Lemberg 2014-05-24 09:53:25 UTC
Created attachment 10928
Result of ps2pdf.
Comment 2 Ken Sharp 2014-05-26 00:35:49 UTC
The problem is in the PostScript file. Lilypond creates, frankly, very bad PostScript: all text is drawn using the glyphshow operator, which allows arbitrary glyphs to be pulled directly from a font by name, rather than using the Encoding vector, which is the preferred method.

In terms of printing, this defeats caching and is therefore very inefficient. In terms of creating PDF, this simply can't be done: there is no equivalent of glyphshow, and all glyphs must be accessed via the Encoding vector.

So, when faced with this file, we must create an encoding vector which maps the actual glyphs used into set positions, then access them from those positions. However, in this case we find that the code uses the same glyph with two different glyph names:

29.7991 -16.8697 moveto /LinLibertineO 3.25488281 output-scale div selectfont
1.1798 0.0000 0.0000 /omicron
1 print_glyphs

26.7836 -14.0697 moveto /LinLibertineO 3.25488281 output-scale div selectfont
1.2292 0.0000 0.0000 /o
1 print_glyphs

That is, the glyph is referenced as both /omicron and /o. This leads to a collision when we try to create the Encoding vector, and the code is unable to deal with it. As a result the second instance (which, bizarrely, is the first case in the reading order) can't be encoded into the font, and so we 'fall back' to rendering the glyph as an image.

Hence the different appearance: the 'first' glyph is not a glyph at all, it's a bitmap.

Because this is a rare case caused by very poor PostScript generation, and the final result is readable, we don't propose to do anything about this.
Comment 3 Ken Sharp 2014-05-26 00:55:06 UTC
Created attachment 10932
modified input file

Attached here is a copy of the input file with /omicron replaced by /o. This prevents the collision in the encoding vector generation and produces a PDF file where the initial 'o' is not converted to an image.

Note that the warning about the font size is benign; pdfwrite will simply emit two (or more) subsets for very large fonts.
Comment 4 Werner Lemberg 2014-05-26 03:12:23 UTC
Thanks for the analysis.  The PS code generated by Lilypond is quite intentionally very primitive, since we rely on ps2pdf to fix all issues.  Using `glyphshow' is very convenient, since we don't have to handle encoding vectors at all.

Until now: This report is the first time where ps2pdf fails in a fundamental way...

What approach do you recommend, given that we always use ps2pdf?  Can you provide links?
Comment 5 Ken Sharp 2014-05-26 04:53:53 UTC
(In reply to Werner Lemberg from comment #4)
> Thanks for the analysis.  The PS code generated by Lilypond is quite
> intentionally very primitive, since we rely on ps2pdf to fix all issues. 
> Using `glyphshow' is very convenient, since we don't have to handle encoding
> vectors at all.
> 
> Until now: This report is the first time where ps2pdf fails in a fundamental
> way...
> 
> What approach do you recommend, given that we always use ps2pdf?  Can you
> provide links?

Well, there's nothing actually *wrong* with what you are doing in PostScript, but there's no way to replicate that in PDF, so if you don't supply an Encoding we must manufacture one for you in order to write a PDF file.

My opinion would be that the best way to deal with this would be to create a properly encoded font (or fonts) and use those, using the show operator (and its many variants) to emit the glyphs. Note that this allows you to access all the show variants which can do useful things with text.

Now that may be a lot of effort, given you have a solution which works mostly adequately. So the next best option would be not to emit the same glyph under different names. The reason this fails is simply because you are using the names /o and /omicron, which end up referencing the same glyph position in the encoding. Since these are (or seem to be) the same actual glyph, just use one or the other; don't have two names which encode the same glyph. The problem is that we can't construct an Encoding which has two names at the same index.
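
For illustration, the first suggestion above (a properly encoded font) could be prepared along these lines. This is a minimal Python sketch, not Lilypond or Ghostscript code: the helper names (build_encodings, encoding_vector, to_postscript_array) and the choice to fill slots from code 0 upwards are assumptions made only for this example. Each distinct glyph name gets its own slot, so /o and /omicron can never share an index, and names beyond 256 spill into a further font instance.

```python
def build_encodings(glyph_names, slots_per_font=256):
    """Assign each distinct glyph name a unique (font_index, code) pair.

    Names are numbered in order of first use; every slots_per_font names
    start a new font instance (hypothetical scheme for this sketch).
    """
    mapping = {}
    for name in glyph_names:
        if name not in mapping:
            mapping[name] = divmod(len(mapping), slots_per_font)
    return mapping


def encoding_vector(mapping, font_index, slots_per_font=256):
    """Return the list of glyph names forming one instance's Encoding."""
    vector = [".notdef"] * slots_per_font
    for name, (font, code) in mapping.items():
        if font == font_index:
            vector[code] = name
    return vector


def to_postscript_array(vector):
    """Format an Encoding vector as a PostScript array of literal names."""
    return "[" + " ".join("/" + name for name in vector) + "]"


# Glyph names in the order they are drawn; repeats reuse their slot.
used = ["o", "omicron", "g", "gamma", "o"]
mapping = build_encodings(used)
vector = encoding_vector(mapping, 0)
ps_array = to_postscript_array(vector)
```

The resulting array would be installed as the /Encoding of a re-encoded copy of the font, after which the glyphs can be drawn with show rather than glyphshow.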
Comment 6 Werner Lemberg 2014-05-26 08:06:02 UTC
Created attachment 10933
more bitmaps in rendering
Comment 7 Werner Lemberg 2014-05-26 08:07:40 UTC
Created attachment 10934
Result of ps2pdf.
Comment 8 Werner Lemberg 2014-05-26 08:13:49 UTC
Oops, here's the comment text:

I'm not sure that your analysis is completely correct.  The files in comment #6 and #7 show that the problem is not restricted to glyphs with identical glyph names – glyph `g' in the Libertine font certainly does appear only once...
Comment 9 Ken Sharp 2014-05-26 08:34:42 UTC
(In reply to Werner Lemberg from comment #8)
> Oops, here's the comment text:
> 
> I'm not sure that your analysis is completely correct.  The files in comment
> #6 and #7 show that the problem is not restricted to glyphs with identical
> glyph names – glyph `g' in the Libertine font certainly does appear only
> once...

Identical glyphs but with *different* names......

The problem is that we end up with two different names referencing the same entry in the Encoding vector, which is impossible. In the first example the names were /o and /omicron, referencing the same glyph in the font.

The more text you add to the file, the harder it is for me to debug it, especially when there are multiple collisions in the Encoding vector, but I'll try and look at the second file as well.
Comment 10 Ken Sharp 2014-05-26 08:56:35 UTC
The collision in this case is /g with /gamma, as well as /n with /nu and so on.

I just debugged more deeply through the code:

If a glyph is used which is not present in the Encoding of the font, then we are forced to make a guess at where we might encode it. (This can only happen for a glyphshow, as all other operations use the Encoding vector, or the equivalent for other font types.)

Because glyphshow is a comparatively rare operation in normal PostScript, we simply use the first byte of the name as the character index. So if you have names which are defined as /o and /omicron, and neither is present in the font Encoding vector, then they will be identified as the same encoding position ('o', i.e. index 111).

The same is true for any other glyph (or glyphs) which are emitted using the glyphshow operator, are not present in the font's Encoding vector, and have the same initial byte in their name.
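
As a rough illustration of the first-byte guess described above, here is a minimal Python sketch. It is not the pdfwrite code; the function names guess_code and find_collisions are hypothetical, and it assumes none of the names are present in the font's Encoding vector.

```python
def guess_code(glyph_name):
    """Guess a character index from the first byte of the glyph name."""
    return glyph_name.encode("ascii")[0]


def find_collisions(glyph_names):
    """Group distinct glyph names by guessed index.

    Returns only the indices claimed by more than one name, i.e. the
    cases where a second glyph cannot be placed in the Encoding.
    """
    by_code = {}
    for name in glyph_names:
        names = by_code.setdefault(guess_code(name), [])
        if name not in names:
            names.append(name)
    return {code: names for code, names in by_code.items() if len(names) > 1}


collisions = find_collisions(["o", "omicron", "g", "gamma", "n", "nu", "a"])
```

Here /o and /omicron both land on index 111, /g and /gamma on 103, and /n and /nu on 110, while /a has index 97 to itself.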

So basically, to fix this you will need to create font instances with appropriate Encoding vectors and use the encoded font instances.

I'm not terribly sure we can do anything about this with the current architecture of pdfwrite; even if we can, it will be a lot of work.
Comment 11 Ken Sharp 2014-05-27 06:17:36 UTC
While looking at another problem, I came up with a bright idea that fixes both. I believe that commit 64dd281abf84ba7383aa85c99599b5aebea3998a will cause this to work correctly.

There is a possible downside: we may now create multiple font instances in order to encode all the glyphs, which could result in some glyphs being present in more than one font, thus increasing the size of the PDF file. However, the improvement in quality seems to significantly outweigh the problem (it's also likely that the unwanted bitmap is larger than the extra glyph data anyway).
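
The general idea of spreading colliding glyphs over several font instances can be sketched as follows. This is only an assumed illustration in Python and not the logic of the commit, which is not shown in this report; assign_to_instances is a hypothetical name, and the sketch keeps the first-byte guess for the character index.

```python
def assign_to_instances(glyph_names):
    """Place each distinct glyph in the first instance whose slot is free.

    Returns (instances, placement): instances is a list of dicts mapping
    character index -> glyph name, and placement maps glyph name ->
    (instance number, character index).
    """
    instances = []
    placement = {}
    for name in glyph_names:
        if name in placement:
            continue  # already encoded somewhere; reuse that slot
        code = ord(name[0])  # first-byte guess at the character index
        for number, encoding in enumerate(instances):
            if code not in encoding:
                encoding[code] = name
                placement[name] = (number, code)
                break
        else:
            # Slot is taken in every existing instance: start a new one
            # instead of falling back to a bitmap.
            instances.append({code: name})
            placement[name] = (len(instances) - 1, code)
    return instances, placement


instances, placement = assign_to_instances(["o", "omicron", "g", "gamma"])
```

In this example /o and /g end up in the first instance and /omicron and /gamma in a second one, so every glyph stays a real glyph at the cost of an extra font in the output.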
Comment 12 Werner Lemberg 2014-05-28 09:21:57 UTC
Great, thanks!  How can I test this?  Is there a document with instructions to compile gs from the git repository?
Comment 13 Ken Sharp 2014-05-28 09:32:31 UTC
(In reply to Werner Lemberg from comment #12)
> Great, thanks!  How can I test this?  Is there a document with instructions
> to compile gs from the git repository?

If you clone from the git repository, you'll find there's a ghostpdl/gs/doc folder.
Make.htm in there has instructions for building GS.
Comment 14 Werner Lemberg 2014-05-29 06:44:57 UTC
I've tested it, and it works.  Thanks a lot!