Bug 700436 - gs ignores fonts in directory CIDFont after first usage
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: master
Hardware: PC Linux
Importance: P4 normal
Assignee: Ken Sharp
QA Contact: Bug traffic
Reported: 2019-01-02 12:53 UTC by Knut Petersen
Modified: 2019-03-19 17:58 UTC



Attachments
source pdf (8.06 KB, application/pdf)
2019-01-02 12:53 UTC, Knut Petersen
Emmentaler 16 CID font (60.43 KB, application/octet-stream)
2019-01-02 12:55 UTC, Knut Petersen
broken pdf produced by gs master (42.44 KB, application/pdf)
2019-01-02 12:56 UTC, Knut Petersen
expected result (40.86 KB, application/pdf)
2019-01-02 13:03 UTC, Knut Petersen
Fix for e005c87e09f67f37ce4ae2f80f24cf9182e86d8d (856 bytes, patch)
2019-03-19 17:12 UTC, Knut Petersen

Description Knut Petersen 2019-01-02 12:53:52 UTC
Created attachment 16634 [details]
source pdf

ghostscript used: git master (ef075a44f37f89)

Emmentaler-16 is a CID font generated and used by a development version of lilypond. Since commit 04a517f39cc3e2 it is no longer possible to use a postscript file to instruct ghostscript to use this font, so Emmentaler-16 is stored in the ./CIDFont directory.

test.pdf is a pdf generated by XeTeX 3.14159265-2.6-0.99999 (TeX Live 2018/TeX Live for SUSE Linux) from test.tex. It includes two pdfs that were generated by ghostscript from postscript code generated by lilypond. Both pdfs use the Emmentaler-16 font, but that font intentionally is not included in the pdfs.

It is expected that after

gs -I. -dBATCH  -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=testfinal.pdf test.pdf

the file testfinal.pdf would correctly use Emmentaler-16 glyphs from the Emmentaler-16 font in the ./CIDFont directory. Unfortunately only references to Emmentaler-16 glyphs in the first included pdf are handled correctly. Ghostscript incorrectly attempts to replace references to Emmentaler-16 glyphs in the 2nd included pdf by glyphs from DroidSansFallback (but there are none at the CID values used).

Using pdfs without embedded fonts as intermediate files to be postprocessed by ghostscript saves more than 100MB of disk space in the final documentation of lilypond. Ken Sharp introduced commit 04a517f39cc3e2 to fix https://bugs.ghostscript.com/show_bug.cgi?id=699937. In https://bugs.ghostscript.com/show_bug.cgi?id=700367 he suggested using -Ipath as a workaround for our workflow, which was broken by commit 04a517f39cc3e2. Unfortunately this bug report shows that the suggested workaround is itself broken.
Comment 1 Knut Petersen 2019-01-02 12:55:30 UTC
Created attachment 16635 [details]
Emmentaler 16 CID font
Comment 2 Knut Petersen 2019-01-02 12:56:44 UTC
Created attachment 16636 [details]
broken pdf produced by gs master
Comment 3 Knut Petersen 2019-01-02 13:03:01 UTC
Created attachment 16637 [details]
expected result

This pdf demonstrates the expected result, produced by a ghostscript master with commit 04a517f39cc3 reverted.
Comment 4 Ken Sharp 2019-01-02 13:39:52 UTC
(In reply to Knut Petersen from comment #0)

> test.pdf is a pdf generated by XeTeX 3.14159265-2.6-0.99999 (TeX Live
> 2018/TeX Live for SUSE Linux) from test.tex. It includes two pdfs

No, it does not. It is a single PDF file, it does not 'include' any other PDF files.


> Using pdfs without embedded fonts as intermediate files to be postprocessed
> by ghostscript saves more than 100MB of disk space in the final
> documentation of lilypond.

My response to this in the past has been that you should rethink your documentation strategy; I still think this is your best approach.

The problem is caused by using identical CIDFonts on the same page, but not including the font data. This is such a marginal use case for Ghostscript that I don't consider it urgent. I'll look into it when time permits.
Comment 5 Knut Petersen 2019-01-02 14:48:29 UTC
(In reply to Ken Sharp from comment #4)
> (In reply to Knut Petersen from comment #0)
> 
> > test.pdf is a pdf generated by XeTeX 3.14159265-2.6-0.99999 (TeX Live
> > 2018/TeX Live for SUSE Linux) from test.tex. It includes two pdfs
> 
> No, it does not. It is a single PDF file, it does not 'include' any other
> PDF files.

Yes, obviously. xelatex processed a source file that instructed it to 'include' two pdfs using \includegraphics{...}. Not a usage scenario specific to lilypond.

> My response to this in the past has been that you should rethink your
> documentation strategy, I still think this is your best approach.

All the documentation (html, pdf) is generated from a single source using texinfo and other tools. I don't see this changing in the next few years.

> The problem is caused by using identical CIDFonts on the same page, but not
> including the font data. This is such a marginal use case for Ghostscript
> that I don't consider it urgent. I'll look into it when time permits.

Fine ... I hope ;-)
Comment 6 Chris Liddell (chrisl) 2019-02-26 14:30:48 UTC
Since Ken has been busy with other tasks, and I've been poking around a similar area, I've taken a look at this problem.

Basically, the original problem stems from a limitation imposed upon us by the PDF interpreter being written in Postscript and, thus, having to use much of the Postscript "machinery" for things like fonts and CIDFonts.

The fix in 04a517f39cc3e2 is to deal with files that, on the same page, have an embedded CIDFont and then, later, reference a non-embedded CIDFont with the same name. For a non-embedded CIDFont, we resort to loading the CIDFont resource by name. In order to minimise processing time, we don't repeatedly reload resources (CIDFonts included) that are already in VM.

In the above case, the problem arose that the embedded CIDFont was found already in VM, and thus reused - but since the embedded CIDFont was a subset, it did not work correctly.

Thus we need to identify embedded CIDFonts and avoid using those when loading a CIDFont by name. The fix uses the result of resourcestatus, but this doesn't work with your "pre-loaded" CIDFont(s) because they are not "real" Postscript resources - as far as the interpreter is concerned, they only exist in VM.

I cannot see a way to completely solve this in our code.

I can, however, change how we identify a substitute CIDFont (which would still allow us to work "correctly" with the above outlined PDFs), which you could take advantage of with a small-ish change in your code. Basically, our substitution code always adds a "/Path" key to the font dictionary - we can use that to decide whether we can safely reuse the CIDFont in VM.

In your code, where you currently have:
   
   (/mypath/emmentaler-16.cid) (r) file .loadfont
   /Emmentaler-16  /Identity-H [ /Emmentaler-16 ]  composefont  pop

You would change to:

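    % Load the CIDFont file, as before
    (/mypath/emmentaler-16.cid) (r) file .loadfont
    /Emmentaler-16 /CIDFont findresource
    <<
      exch          % move the mark below the CIDFont dictionary
      {} forall     % push every key/value pair of that dictionary
      /Path (/mypath/emmentaler-16.cid)   % the added key
    >>
    % Redefine the CIDFont resource using the extended copy
    /Emmentaler-16 exch /CIDFont defineresource pop
    /Emmentaler-16  /Identity-H [ /Emmentaler-16 ]  composefont  pop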

Simply adding that /Path key to the Emmentaler-16 CIDFont dictionary. It would be best to make the /Path the correct path - I'm not sure of the side-effects, otherwise.
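
As a quick sanity check (a sketch only, using standard Postscript operators; the message string is purely illustrative), something like the following, run after the lines above, should show whether the redefined CIDFont dictionary actually carries the /Path key:

```postscript
% Look up the CIDFont resource that was just (re)defined
/Emmentaler-16 /CIDFont findresource
dup /Path known
{ /Path get == }             % key present: print the stored path
{ pop (no /Path key) == }    % key absent: discard the dictionary, print a note
ifelse
```

If the key is reported as missing, the defineresource step most likely did not run before the check.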

Does that sound like a workable solution?

P.S. Sorry for the extended ramble - but I want the background recorded here.
Comment 7 Knut Petersen 2019-02-28 08:00:34 UTC
(In reply to Chris Liddell (chrisl) from comment #6)

> You would change to:
> 
>     (/mypath/emmentaler-16.cid) (r) file .loadfont
>     /Emmentaler-16 /CIDFont findresource
>     <<
>       exch
>       {} forall
>       /Path (/mypath/emmentaler-16.cid)
>     >>
>     /Emmentaler-16 exch /CIDFont defineresource pop
>     /Emmentaler-16  /Identity-H [ /Emmentaler-16 ]  composefont  pop
> 
> Simply adding that /Path key to the Emmentaler-16 CIDFont dictionary. It
> would be best to make the /Path the correct path - I'm not sure of the
> side-effects, otherwise.
> 
> Does that sound like a workable solution?

As we don't use an external library but our own postscript code generator this indeed would be a very workable and easy to implement solution. Thanks in advance!

Knut
Comment 8 Chris Liddell (chrisl) 2019-03-06 11:01:12 UTC
(In reply to Knut Petersen from comment #7)
> 
> As we don't use an external library but our own postscript code generator
> this indeed would be a very workable and easy to implement solution. Thanks
> in advance!


Sorry for the delay, we were all travelling for a meeting.

Here's the diff:
http://git.ghostscript.com/?p=user/chrisl/ghostpdl.git;a=commitdiff;h=8a34b0d53ee7

If you have the chance, give it a try. If it works for you (with the above modded Postscript), we'll pull it into the next release.
Comment 9 Knut Petersen 2019-03-06 22:51:36 UTC
> Here's the diff:
> http://git.ghostscript.com/?p=user/chrisl/ghostpdl.git;a=commitdiff;h=8a34b0d53ee7
> 
> If you have the chance, give it a try. If it works for you (with the above
> modded Postscript), we'll pull it into the next release.

Yes, I tested the patch on top of current master (77152d4b71c) and it does work as expected. Thanks!
Comment 10 Knut Petersen 2019-03-07 09:32:00 UTC
> If you have the chance, give it a try. If it works for you (with the above
> modded Postscript), we'll pull it into the next release.

As stated in the comment above the patch works for me.

But to be fair: The patch allows 

    gs -dBATCH  -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf definecidfonts.ps in.pdf

to succeed with an appropriate definecidfonts.ps.

But afaics the patch completely breaks the already partially broken 

    gs -I. -dBATCH  -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf in.pdf

that would only succeed if a suitable substitution font in ./CIDFont is loaded not more than once per page.

Wouldn't it be a better solution to teach gs to add a /ThisReallyIsASubstitutionFontFromDisk key to every substitution font loaded from a CIDFont directory and to test for the presence of that key in /findCIDFont?
Comment 11 Chris Liddell (chrisl) 2019-03-07 10:40:03 UTC
(In reply to Knut Petersen from comment #10)
> > If you have the chance, give it a try. If it works for you (with the above
> > modded Postscript), we'll pull it into the next release.
> 
> As stated in the comment above the patch works for me.
> 
> But to be fair: The patch allows 
> 
>     gs -dBATCH  -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf definecidfonts.ps in.pdf
> 
> to succeed with an appropriate definecidfonts.ps.
> 
> But afaics the patch completely breaks the already partially broken 
> 
>     gs -I. -dBATCH  -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=out.pdf in.pdf
> 
> that would only succeed if a suitable substitution font in ./CIDFont is
> loaded not more than once per page.
> 
> Wouldn't it be a better solution to teach gs to add a
> /ThisReallyIsASubstitutionFontFromDisk key to every substitution font loaded
> from a CIDFont directory and to test for the presence of that key in
> /findCIDFont?

Firstly because, if a CIDFont is loaded by name from disk, it's not really a substitute, in our terms (or in Postscript terms, for that matter). But mainly because that's a far from trivial suggestion.

TBH, I'm rather confused: I was under the impression that your workflow was to explicitly pre-load CIDFonts in Postscript, before running the PDF file. Now you're saying it's not?
Comment 12 Knut Petersen 2019-03-07 11:09:12 UTC
> TBH, I'm rather confused: I was under the impression that your workflow was
> to explicitly pre-load CIDFonts in Postscript, before running the PDF file.
> Now you're saying it's not?

You were right, my workflow is to explicitly pre-load CID fonts.

I mentioned the change regarding '-I.' just to make sure that your patch is not revoked shortly after application because of that change. I don't know if someone else depends on the pre-patch behaviour ...
Comment 13 Chris Liddell (chrisl) 2019-03-07 19:06:58 UTC
I actually may have a solution that can write the /Path entry to all CIDFonts read from disk. I need to assess if there are other implications, though.
Comment 14 Chris Liddell (chrisl) 2019-03-19 11:37:26 UTC
Fixed in:

http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=e005c87e09f6


As described above, you can still have your workflow preload CIDFonts by adding the /Path key to the dictionary. In addition, "real" CIDFont resources have a /ResourcePath key added to the dictionary. This means we can, as necessary, differentiate between embedded, substitute and "real" disk based CIDFonts.
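
To illustrate the distinction (a rough sketch only, not the code from the commit; the procedure name classifyCIDFont and the three result names are made up for this example), a dictionary could be sorted into those categories by testing for the two keys:

```postscript
% Hypothetical helper, NOT part of Ghostscript.
% <cidfontdict> classifyCIDFont <name>
/classifyCIDFont {
  dup /ResourcePath known {
    pop /DiskResource                 % "real" CIDFont resource read from disk
  } {
    /Path known
    { /SubstituteOrPreloaded }        % substitute, or preloaded with /Path added
    { /PresumedEmbedded }             % neither key present
    ifelse
  } ifelse
} bind def

% Usage, assuming a CIDFont named Emmentaler-16 can be found:
/Emmentaler-16 /CIDFont findresource classifyCIDFont ==
```

The order of the tests is an assumption: /ResourcePath is checked first so that a disk based resource is reported as such even if it also happens to carry a /Path key.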
Comment 15 Knut Petersen 2019-03-19 17:03:33 UTC
(In reply to Chris Liddell (chrisl) from comment #14)
> Fixed in:
> 
> http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=e005c87e09f6
> 
> 
> As described above, you can still have your workflow preload CIDFonts by
> adding the /Path key to the dictionary. In addition, "real" CIDFont
> resources have a /ResourcePath key added to the dictionary. This means we
> can, as necessary, differentiate between embedded, substitute and "real"
> disk based CIDFonts.

Hmm. Still breaks loading from CIDFont resource directories ...
Comment 16 Knut Petersen 2019-03-19 17:12:47 UTC
Created attachment 17199 [details]
Fix for e005c87e09f67f37ce4ae2f80f24cf9182e86d8d

Is this what you intended to write?
Comment 17 Chris Liddell (chrisl) 2019-03-19 17:16:43 UTC
(In reply to Knut Petersen from comment #16)
> Created attachment 17199 [details]
> Fix for e005c87e09f67f37ce4ae2f80f24cf9182e86d8d
> 
> Is this what you intended to write?

Balls, yes, thanks....
Comment 18 Knut Petersen 2019-03-19 17:58:46 UTC
(In reply to Chris Liddell (chrisl) from comment #17)
> (In reply to Knut Petersen from comment #16)
> > Created attachment 17199 [details]
> > Fix for e005c87e09f67f37ce4ae2f80f24cf9182e86d8d
> > 
> > Is this what you intended to write?
> 
> Balls, yes, thanks....

So this issue is really fixed now. Thanks again for your work and patience.

Knut