Bug 692000 - pdfwrite: incorrect treatment of substitution-based Type 3 Fonts
Summary: pdfwrite: incorrect treatment of substitution-based Type 3 Fonts
Status: IN_PROGRESS
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: All All
Importance: P4 enhancement
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-02-24 16:11 UTC by Zvi Gilboa
Modified: 2020-12-27 08:46 UTC

See Also:
Customer:
Word Size: ---


Attachments
Source postscript file using "vowel-friendly" Type 3 font (226.44 KB, application/postscript)
2011-02-24 16:11 UTC, Zvi Gilboa
Outcome produced by pdfwrite (using gswin32c) (905.90 KB, application/pdf)
2011-02-24 16:14 UTC, Zvi Gilboa
Outcome produced by pdfwrite (using Adobe's Acrobat Distiller 9) (33.32 KB, application/pdf)
2011-02-24 16:18 UTC, Zvi Gilboa

Description Zvi Gilboa 2011-02-24 16:11:13 UTC
Created attachment 7289
Source postscript file using "vowel-friendly" Type 3 font

BACKGROUND: In order to place vowels and diacritical marks correctly, the Type 3 fonts use the original Type 1 glyphs while adding PostScript positioning commands.

PROBLEM: Ghostscript fails to recognize that the only glyphs actually used are the font's original Type 1 glyphs.  As a consequence, it embeds custom Type 3 fonts in the pdf output file.  Even worse, every set of glyphs for a letter-vowel combination ends up with its own Type 3 font subset.

OUTCOME: The above results in an UNNECESSARILY HUGE pdf file.  Another negative side effect is that the output pdf file cannot be searched for text phrases.

IS THERE AN ALTERNATIVE?: Since Adobe's Acrobat Distiller (9.0) treats the ps source file correctly, there is apparently a way to identify the "trick" used by Type 3 fonts such as the one in the example source file.

MORE THOUGHTS: it seems that the problem lies in the point in time when font usage is recorded and embedded.  When a Type 3 glyph command is executed, the PostScript interpreter should not immediately embed the glyph (or font subset), but instead "wait" to see which commands the glyph executes.  If, apart from showing the Type 1 glyphs, these commands include only positioning commands, then it is the Type 1 font/glyph (rather than the Type 3 one) that should be "recorded" and subsequently embedded in the output pdf file.
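
A minimal sketch of the check proposed above, written in Python purely for illustration. It is not Ghostscript code: the function name, the idea of having a flat trace of executed operators, and the (deliberately incomplete) operator list are all assumptions. Only the PostScript operator names themselves are real.

```python
# Hypothetical sketch: decide whether a Type 3 glyph procedure only
# repositions glyphs borrowed from another font. Not Ghostscript code.

# PostScript operators treated as "positioning or bookkeeping".
# This list is an assumption and is deliberately incomplete.
POSITIONING_OPS = {
    "gsave", "grestore", "translate", "moveto", "rmoveto",
    "setfont", "setcachedevice", "setcharwidth",
}

# Operators that paint glyphs taken from the underlying font.
GLYPH_OPS = {"glyphshow", "show"}


def is_pure_repositioning(executed_ops):
    """Return True if the trace paints at least one borrowed glyph and
    otherwise contains only operators from POSITIONING_OPS."""
    painted = False
    for op in executed_ops:
        if op in GLYPH_OPS:
            painted = True
        elif op not in POSITIONING_OPS:
            return False  # e.g. "lineto" or "fill": genuine Type 3 drawing
    return painted


# A vowel-placing glyph: two borrowed glyphs, one of them shifted.
trace_vowel = ["setcharwidth", "gsave", "translate", "glyphshow",
               "grestore", "moveto", "glyphshow"]
# A glyph that draws its own outline.
trace_drawn = ["setcachedevice", "moveto", "lineto", "fill"]

print(is_pure_repositioning(trace_vowel))  # True
print(is_pure_repositioning(trace_drawn))  # False
```

Only a glyph whose trace passes such a check would be a candidate for being recorded against the Type 1 font; anything that draws its own paths would still have to be kept as a Type 3 glyph.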
Comment 1 Zvi Gilboa 2011-02-24 16:14:15 UTC
Created attachment 7290
Outcome produced by pdfwrite (using gswin32c)

In a pdf reader, under Properties --> Fonts: although only Type 1 glyphs are used, the document contains an "endless", repetitive list of Type 3 font subsets.
Comment 2 Zvi Gilboa 2011-02-24 16:18:25 UTC
Created attachment 7291
Outcome produced by pdfwrite (using Adobe's Acrobat Distiller 9)

Acrobat Distiller: the Type 3 "font substitution & positioning trick" is identified correctly.  Only Type 1 fonts are embedded, the characters are correctly associated with Type 1 glyphs (rather than mere graphics), and the document can be searched for text phrases.
Comment 3 Ken Sharp 2011-02-28 13:17:55 UTC
(In reply to comment #0)

> OUTCOME: The above results in an UNNECESSARILY HUGE pdf file.  Another negative
> side effect is that the output pdf file cannot be searched for text phrases.

The two are related. The encoding cannot be identified uniquely, which results in a number of type 3 fonts being embedded. These have different encodings from the original text, because a 'collision' is detected where two apparently different glyphs attempt to use the same encoding position. This may be because the type 3 BuildChar uses glyphshow to access glyphs from the original type 1 font without regard to that font's encoding.
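
The collision effect can be illustrated with a very simplified model, sketched here in Python. This is an assumption-laden toy, not how pdfwrite is actually implemented: the function name, the first-fit strategy, and the glyph names and codes are all hypothetical. It only shows why distinct glyphs competing for one encoding position multiply the number of output fonts.

```python
# Simplified, hypothetical model of the encoding "collision" described above.
# pdfwrite's real logic is more involved; this only shows the effect.

def assign_to_fonts(requests):
    """requests: iterable of (code, glyph_name) pairs in drawing order.

    Each output font is a dict mapping character code -> glyph name.
    A request reuses the first font in which the code is either free or
    already holds the same glyph; otherwise a new font is started.
    Returns (fonts, placement), where placement[i] is the index of the
    font used for the i-th request.
    """
    fonts = []
    placement = []
    for code, glyph in requests:
        for index, encoding in enumerate(fonts):
            # .get(code, glyph) yields `glyph` itself when the slot is free.
            if encoding.get(code, glyph) == glyph:
                encoding[code] = glyph
                placement.append(index)
                break
        else:
            # Every existing font has a different glyph at this code.
            fonts.append({code: glyph})
            placement.append(len(fonts) - 1)
    return fonts, placement


# Hypothetical glyph names: three different composites all arrive at code 65.
requests = [(65, "alef_patah"), (65, "alef_qamats"),
            (66, "bet_patah"), (65, "alef_hiriq")]
fonts, placement = assign_to_fonts(requests)
print(len(fonts))   # 3
print(placement)    # [0, 1, 0, 2]
```

In this model every new letter-vowel composite that lands on an already occupied code forces another font, which is consistent with both the file size and the loss of searchable text reported above.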

 
> MORE THOUGHTS: it seems that the problem lies in the point in time when
> font-usage is recorded and embedded.  When a Type 3 glyph command is executed,
> the postscript interpreter should not immediately embed the glyph (or font
> subset), but instead "wait" to see which commands the glyph executes.  If these
> commands only include---in addition to the Type 1 glyphs---only positioning
> commands, then it is the Type 1 font/glyph (rather than Type 3), that should be
> "recorded" and subsequently embedded in the output pdf file.

Adobe Acrobat Distiller converts the type 1 font to a CFF font when embedding, and I'm not a Hebrew speaker, so I'm not in a position to tell whether Acrobat has:

1) Created a new font with the 'adjustments' included

2) Left the font untouched, but individually adjusted the position of each glyph in the output PDF file.

3) Simply ignored the type 3 font movements.

I strongly suspect that '2' is what has been done, but it would take some effort to be certain. '1' is unlikely, as there would be no way to differentiate between the original and modified font; it would be necessary to include two type 1 fonts.

Given the way that type 3 fonts are handled at present, reproducing this behaviour would be extremely difficult with pdfwrite.

However, since the output is visually correct (at least I assume it is), this is not a bug; it is an enhancement request.

To me it seems that if the type 3 font does nothing but reposition glyphs from the type 1 font, then the type 1 font is incorrect, and the correct solution is to fix and use the type 1 font.

I will leave this open as a future enhancement.
Comment 4 Zvi Gilboa 2011-02-28 15:13:08 UTC
Thank you, Ken, for looking at this and making the problem clearer!  Here are just a couple of minor follow-up comments:

> 2) (Acrobat Distiller) left the font untouched, but individually adjusted 
> the position of each glyph in the output PDF file.

I have checked the pdf file, and this is indeed the case.


> ... since the output is visually correct (at least I assume it is)...

Yes, the output pdf file is visually correct (albeit unsearchable).



> if the type 3 font does nothing but reposition glyphs from
> the type 1 font, then the type 1 font is incorrect

While the task of the type 3 font is indeed to reposition (or, better, relative-position) the diacritic marks, what makes such repositioning necessary is not a problem with the type 1 font, but rather inherent aspects of letter-vowel combinations.  These aspects, as well as the type 3 technique used to overcome the challenges that they present, are described in Sivan Toledo's short article on the topic, which can be found at http://www.tau.ac.il/~stoledo/Pubs/vowels.ps


As of now, the only alternative at the font level would be to use an OpenType font, which would in turn require switching to a substantially different LaTeX framework.
Comment 5 Ken Sharp 2011-02-28 16:05:57 UTC
(In reply to comment #4)

> > if the type 3 font does nothing but reposition glyphs from
> > the type 1 font, then the type 1 font is incorrect
> 
> While the task of the type 3 font is indeed to reposition (or, better,
> relative-position) the diacritic marks, what makes such repositioning necessary
> is not a problem with the type 1 font, but rather inherent aspects of
> letter-vowel combinations.  These aspects, as well as the type 3 technique used
> to overcome the challenges that they present, are described in Sivan Toledo's
> short article on the topic, which can be found at
> http://www.tau.ac.il/~stoledo/Pubs/vowels.ps

I've read the document, and I'm still of the opinion that this is the wrong way to solve the problem. It's a clever solution, indeed very clever, but in my opinion the real way to solve this is in the typesetting application, not the PostScript interpreter.

As is demonstrated by the Distiller output, the same effect can be achieved by simply placing each glyph from the type 1 font in the correct position. It seems to me better to do this in a 'typesetting' application which really ought to know about the properties of the text, rather than building intelligence into the PostScript program. 
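
As a rough illustration of placing each glyph individually, the Python sketch below builds a PDF content-stream fragment that positions a base letter and its vowel mark separately, both from one font resource. The operators (BT, Tf, Tm, Tj, ET) are standard PDF text operators, but the helper name, the font resource name, the character codes and the coordinates are all made up for the example, and this thread does not state which exact operators Distiller emits.

```python
# Hypothetical sketch: emit PDF text operators that place a base letter and
# its vowel mark separately, both taken from the same font resource.

def place_glyphs(font_resource, size, glyphs):
    """Build a text object for `glyphs`.

    glyphs: list of (hex_code, x, y) tuples, with x and y in user-space
    units. Codes and positions are supplied by the caller; nothing here
    knows about Hebrew or about any particular font.
    """
    lines = ["BT", f"/{font_resource} {size} Tf"]
    for hex_code, x, y in glyphs:
        lines.append(f"1 0 0 1 {x:g} {y:g} Tm")  # text matrix: absolute position
        lines.append(f"<{hex_code}> Tj")         # show one glyph by character code
    lines.append("ET")
    return "\n".join(lines)


# Made-up codes and coordinates: a letter, then a mark nudged right and down.
stream = place_glyphs("F1", 12, [("E0", 100, 700), ("C7", 102.5, 698)])
print(stream)
```

Because both glyphs stay ordinary characters of one font, text emitted this way stays tied to character codes rather than to anonymous Type 3 drawings, which is the property the Distiller output is reported to have.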

After all, the typesetting program is already responsible for placing all the other glyphs; adding this ability there would seem like the correct solution, rather than placing the burden on the PostScript interpreter.

Indeed, the author notes that there are problems with this approach, since the code is unable to take into account the text to the left and right of the glyph being marked, which looks like it is potentially important (presumably to avoid clashing of vowels).

This approach is rather like trying to write Arabic text by having the PostScript interpreter apply kashidas and generate the initial, medial and terminal glyphs rather than having the typesetting application do it using all its information about surrounding text.

As I said, I'll leave it open as an enhancement.
Comment 6 Peter Cherepanov 2020-12-27 08:46:42 UTC
Ghostscript still generates Type 3 fonts.