Bug 692608 - better support for EmbedAllFonts=false with PCL to PDF conversion
Summary: better support for EmbedAllFonts=false with PCL to PDF conversion
Status: RESOLVED FIXED
Alias: None
Product: GhostPCL
Classification: Unclassified
Component: PCL fonts
Version: 9.04
Hardware: All All
Importance: P2 enhancement
Assignee: Henry Stiles
URL:
Keywords: bountiable
Depends on:
Blocks:
 
Reported: 2011-10-17 07:50 UTC by Joe
Modified: 2014-02-17 04:44 UTC

See Also:
Customer:
Word Size: ---


Attachments
write standard 14 font names as well as setting standard encoding flag for non-embedded URW fonts (7.09 KB, patch)
2011-11-04 08:08 UTC, Hin-Tak Leung
fontlist.pcl with pdf generated from Visual PCL2PDF and GhostPCL (862.75 KB, application/x-zip-compressed)
2011-11-11 20:11 UTC, Joe

Description Joe 2011-10-17 07:50:53 UTC
Optionally suppress embedding of built-in fonts in PDF output.
Use standard font names like CourierNew, Arial, etc.,
and let the PDF reader do the substitution.
This would result in smaller PDF files, and one would not have to worry about font licensing.

If you set -dEmbedAllFonts=false in the current version, font embedding is suppressed, but URW font names are put into the pdf,
which means that the URW fonts must be installed on the pdf reader client.

Note: PCL2PDF from Visual does not embed fonts; I found this message from 2004:
http://www.ghostscript.com/pipermail/bug-gs/2004-September/003421.html
Comment 1 Henry Stiles 2011-11-03 16:36:10 UTC
Make bountiable.
Comment 2 Hin-Tak Leung 2011-11-04 08:08:35 UTC
Created attachment 8074
write standard 14 font names as well as setting standard encoding flag for non-embedded URW fonts

Write standard 14 font names as well as setting standard encoding flags for
non-embedded URW fonts.

Actually, Acrobat Reader (at least 9.x on Linux) does not use font names for substitution, but rather relies on whether the "standard encoding" flag is set to decide whether substitution is possible. But xpdf/libpoppler does use font names, so we might as well write them.

"StandardSymL" and "Dingbats" probably should not be in the URW->standard look-up table in the patch. Another refinement might be to call pdf_compute_font_descriptor() to compute the flags, or otherwise set FONT_IS_SERIF/FONT_IS_ITALIC, to make Acrobat use a serif or italic font for substitution (or just use a hard-coded mapping of which URW fonts correspond to Times, etc.). In fact, if compatibility with other non-Adobe readers is not needed, just setting the font-style and encoding flags might do it, without mapping the font names back (mapping the names is safer, though).
Comment 3 Ken Sharp 2011-11-06 14:36:21 UTC
I applied the patch but had to make some changes as it wouldn't compile as-is on Visual Studio. I also changed the 'compat' names to 'base14' because I thought it captured the essence better (purely cosmetic).

Unfortunately, the patch causes our regression test suite to die. I'm not sure why, but I suspect it may be due to many files failing and a correspondingly large log file, because it gives up while uploading logs.

I will try testing the language builds individually to see if that helps, or at least lets me see where the problems actually are!
Comment 4 Ken Sharp 2011-11-06 20:27:49 UTC
(In reply to comment #3)
 
> I will try testing the language builds individually to see if that helps, or at
> least lets me see where the problems actually are!

Running just the PCL tests got me 1591 tests with differences out of 2228 tests. I'm not sure why yet but at least I have something to look at now.
Comment 5 Ken Sharp 2011-11-07 10:02:11 UTC
Committed as: 63a5fe390d2534f6b48e2dd58f46ed9941582e83

Regression testing shows no differences. There is no original PCL file to test against but my local testing indicates that the Base 14 names are now used as expected.

Thanks for the patch Hin-Tak!
Comment 6 Hin-Tak Leung 2011-11-10 05:41:33 UTC
(In reply to comment #5)
> Committed as: 63a5fe390d2534f6b48e2dd58f46ed9941582e83

Argh, you have more confidence in my patch than I do :-). I was hoping that the original reporter might comment on whether he cares about symbols and dingbats, whether he cares if styles (italic/serif) are matched, and whether pcl2pdf does better on those two aspects. But the truth is, as we all know, there are limits to fidelity when not embedding fonts.

The patch shouldn't cause any regressions, as -dEmbedAllFonts=false (or equivalent) isn't used in regression?

Leaving as closed, but if the reporter submits a test file and/or comments on those issues, I shall prepare further refinements as appropriate.
Comment 7 Ken Sharp 2011-11-10 08:17:17 UTC
(In reply to comment #6)

> Argh, you have more confidence in my patch than I do :-).

The approach looks good. It's certainly valuable to use the standard 'base 14' names instead of the URW ones, and my limited testing looked fine: the 'base 14' names were correctly substituted.

As you pointed out in one of your comments, Acrobat does 'something else' other than use just the names, but I felt that just getting the names in was good and would give decent (i.e. correct) substitution on a wide range of PDF consumers, whereas the URW font names would not.


> The patch shouldn't cause any regressions, as -dEmbedAllFonts=false (or
> equivalent) isn't used in regression?

Exactly. There are too many possible combinations of settings to test exhaustively, so we just use the defaults for regression testing, as that's what most people use.

 
> Leaving as closed, but if the reporter submits a test file and/or comments on
> those issues, I shall prepare further refinements as appropriate.

That would be good too, thanks Hin-Tak.
Comment 8 Joe 2011-11-11 20:11:27 UTC
Created attachment 8103
fontlist.pcl with pdf generated from Visual PCL2PDF and GhostPCL
Comment 9 Joe 2011-11-11 20:12:22 UTC
Thank you Hin-Tak and Ken for your patch; it is an important first step on my wishlist.
I have attached a PCL test file, fontlist.pcl, with reference output from Visual PCL2PDF. fontlist-visual.pdf is 90 KB, fontlist-small.pdf with EmbedAllFonts=false is 120 KB, and fontlist.pdf with embedded fonts is 885 KB. For us the important font is monospaced, that is Courier.
PCL2PDF uses standard name CourierNew instead of Courier, an easy modification.
PCL2PDF also uses encoding Ansi, while GhostPCL uses Standard.
Compare the Umlauts in the first line.
Comment 10 Ken Sharp 2011-11-12 09:26:00 UTC
(In reply to comment #9)

> For us the important font is monospaced, that is Courier. 
> PCL2PDF uses standard name CourierNew instead of Courier, an easy modification.
> PCL2PDF also uses encoding Ansi, while GhostPCL uses Standard.
> Compare the Umlauts in the first line.

CourierNew is *not* a standard 'base 14' font name, the PDF standard name is 'Courier'. 

If you choose to not embed fonts then you must abide by the *PDF* conventions for what is standard, not PCL, or you must expect incorrect output. In fact if you don't embed fonts then you should generally expect incorrect output. In general we would recommend that all PDF files embed the fonts they use, and in fact this is why Adobe relaxed the restrictions on 'base 14' embedding.

While I'm happy to alter the font names from 'URW substitute names' to 'PDF standard names' I am not happy about converting them to PCL standard names. pdfwrite has to cope with input from languages other than PCL and this would break the situation where EmbedAllFonts is false and the input is PostScript, PDF or XPS.
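
For reference, the 14 standard font names defined by the PDF specification can be checked with a small helper. This is only a minimal sketch and not code from pdfwrite; `base14_names` and `is_base14_name` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 14 standard font names from the PDF specification. */
static const char *const base14_names[] = {
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
};

/* Hypothetical helper: return 1 if name is one of the 14 standard
 * PDF font names, 0 otherwise. The comparison is case sensitive. */
static int is_base14_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(base14_names) / sizeof(base14_names[0]); i++) {
        if (strcmp(name, base14_names[i]) == 0)
            return 1;
    }
    return 0;
}
```

With this helper, `is_base14_name("Courier")` returns 1, while `is_base14_name("CourierNew")` returns 0.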
Comment 11 Hin-Tak Leung 2011-11-12 12:55:05 UTC
(In reply to comment #9)

Thanks for the test files. I'll see what I can do with them; as I wrote earlier, there is a limit to fidelity if fonts are not embedded. The Umlaut may or may not be one of those cases.

Ken is right - "CourierNew" is a Microsoft-ism, and "Courier" is the name according to the Adobe PDF spec, so that cannot and will not be changed. There is a good chance of matching font style, though, i.e. suggesting that the viewer software use a monospace font; I think that's one of the flags.
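
To illustrate the style flags mentioned here: the PDF font descriptor carries a Flags bit field in which bit 1 is FixedPitch, bit 2 Serif, bit 3 Symbolic, bit 6 Nonsymbolic and bit 7 Italic. The following is a rough sketch of how such flags could be guessed from a base 14 name. It is not the Ghostscript implementation; the `PDF_FONT_*` constants and `guess_descriptor_flags` are hypothetical names, and the name-based heuristic is an assumption made for illustration.

```c
#include <assert.h>
#include <string.h>

/* Font descriptor flag bits as defined by the PDF specification
 * (bit 1 is the least significant bit). Constant names are hypothetical. */
enum {
    PDF_FONT_FIXED_PITCH = 1 << 0,  /* bit 1 */
    PDF_FONT_SERIF       = 1 << 1,  /* bit 2 */
    PDF_FONT_SYMBOLIC    = 1 << 2,  /* bit 3 */
    PDF_FONT_NONSYMBOLIC = 1 << 5,  /* bit 6 */
    PDF_FONT_ITALIC      = 1 << 6   /* bit 7 */
};

/* Hypothetical heuristic: derive style flags from a base 14 font name. */
static unsigned int guess_descriptor_flags(const char *base14_name)
{
    unsigned int flags;

    assert(base14_name != NULL);
    if (strcmp(base14_name, "Symbol") == 0 ||
        strcmp(base14_name, "ZapfDingbats") == 0)
        return PDF_FONT_SYMBOLIC;

    flags = PDF_FONT_NONSYMBOLIC;
    if (strncmp(base14_name, "Courier", 7) == 0)
        flags |= PDF_FONT_FIXED_PITCH | PDF_FONT_SERIF; /* slab serif, monospaced */
    else if (strncmp(base14_name, "Times", 5) == 0)
        flags |= PDF_FONT_SERIF;
    /* Oblique faces are slanted too, so they also get the Italic flag. */
    if (strstr(base14_name, "Italic") != NULL ||
        strstr(base14_name, "Oblique") != NULL)
        flags |= PDF_FONT_ITALIC;
    return flags;
}
```

For "Courier" this yields FixedPitch, Serif and Nonsymbolic, which is the kind of hint a viewer can use to pick a monospaced substitute.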
Comment 12 Joe 2011-11-14 17:32:46 UTC
Thanks, I now understand that PCL2PDF from Visual Software does not conform to the PDF standard when it uses the font name CourierNew.
I still have some hope that displaying Umlauts with no fonts embedded is possible. I mean the Western European character set covered by windows-1252 (WinAnsi), or at least ISO-8859-1 (Latin1).
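
To make the encoding difference concrete: in WinAnsiEncoding the umlauted letters sit at the same codes as in ISO-8859-1, whereas StandardEncoding contains no precomposed umlauted letters at all. The table below is a small sketch of the WinAnsi codes; `umlaut_codes` and `winansi_umlaut_code` are hypothetical names, and the glyph names follow the usual Adobe naming.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *glyph_name;     /* Adobe glyph name */
    unsigned char winansi_code; /* code in WinAnsiEncoding (same as ISO-8859-1) */
} umlaut_code;

static const umlaut_code umlaut_codes[] = {
    {"Adieresis", 0xC4}, {"Odieresis", 0xD6}, {"Udieresis", 0xDC},
    {"adieresis", 0xE4}, {"odieresis", 0xF6}, {"udieresis", 0xFC},
};

/* Hypothetical helper: return the WinAnsi code for an umlaut glyph
 * name, or -1 if the name is not in the table above. */
static int winansi_umlaut_code(const char *glyph_name)
{
    size_t i;

    assert(glyph_name != NULL);
    for (i = 0; i < sizeof(umlaut_codes) / sizeof(umlaut_codes[0]); i++) {
        if (strcmp(glyph_name, umlaut_codes[i].glyph_name) == 0)
            return umlaut_codes[i].winansi_code;
    }
    return -1;
}
```

A non-embedded font written with WinAnsiEncoding can therefore address these letters directly, which a font restricted to StandardEncoding cannot.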
Comment 13 Joe 2011-11-19 10:46:03 UTC
Courier-BoldItalic is not a base 14 font
--- gdevpdtb.c.orig     2011-11-07 10:57:41.000000000 +0100
+++ gdevpdtb.c  2011-11-19 10:01:09.843750000 +0100
@@ -73,3 +73,3 @@
     {"NimbusMono-Ita",        "Courier-Oblique"      },
-    {"NimbusMono-BolIta",     "Courier-BoldItalic"   },
+    {"NimbusMono-BolIta",     "Courier-BoldOblique"  },
     {"NimbusSan-Reg",         "Helvetica"            },
Comment 14 Ken Sharp 2011-11-21 08:56:28 UTC
(In reply to comment #13)
> Courier-BoldItalic is not a base 14 font

Yes, well spotted, fixed in commit: bd108fc1a4b52c885a8e26362f346c0cd2fb6670
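
For readers following along, the kind of name look-up discussed in this report can be sketched as below. This is not the committed gdevpdtb.c code: `font_name_map`, `urw_to_base14` and `map_urw_name` are hypothetical names, and only the three pairs quoted in the diff above are listed (with the Courier-BoldOblique correction applied).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table type: URW font name and its base 14 equivalent. */
typedef struct {
    const char *urw_name;
    const char *base14_name;
} font_name_map;

/* Only the pairs quoted in this report; a real table would cover
 * every non-embedded URW face. */
static const font_name_map urw_to_base14[] = {
    {"NimbusMono-Ita",    "Courier-Oblique"},
    {"NimbusMono-BolIta", "Courier-BoldOblique"},
    {"NimbusSan-Reg",     "Helvetica"},
};

/* Return the base 14 name for a URW name, or the input unchanged
 * when the table has no entry for it. */
static const char *map_urw_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(urw_to_base14) / sizeof(urw_to_base14[0]); i++) {
        if (strcmp(name, urw_to_base14[i].urw_name) == 0)
            return urw_to_base14[i].base14_name;
    }
    return name;
}
```

Returning the input unchanged for unknown names keeps fonts without a base 14 counterpart untouched, which matches the suggestion in comment 2 to leave symbol fonts out of the table.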