Bug 689420

Summary:	Errors with ps2write and special chars in FontName
Product:	Ghostscript	Reporter:	SaGS <sags5495>
Component:	PS Writer	Assignee:	Ken Sharp <ken.sharp>
Status:	NOTIFIED FIXED
Severity:	normal	CC:	jesse
Priority:	P4	Keywords:	bountiable
Version:	master
Hardware:	PC
OS:	All
Customer:		Word Size:	---
Attachments:	Suggested patch. Revised patch.

Description SaGS 2007-08-26 04:26:23 UTC

Font names, like any PostScript name, may contain any characters. In 
particular these may contain separators and other special characters.
When the PS input to ps2write contains such a font, passing ps2write's 
output to a PS interpreter (including GS) results in various errors.

Real-world sample: attachment #1264 (from bug #688001 comment #1).

Steps to reproduce:
  - Convert the mentioned sample using ps2write;
  - Attempt to display the result using GS;
  - The result is an "Error: undefined in Script".
  - Bypassing this error, one will get a default font (Courier) 
    instead of the requested one.

There are 2 independent parts of this bug:

(1) To write out the font name inside the Type 1 font data, ps2write 
    outputs "/FontName", a slash, then simply dumps the characters of 
    the name ("/FontName /Brush Script MT [ITALIC] def"). When the 
    name contains special characters, this text will be tokenized 
    differently than expected when parsed by a PS interpreter, with 
    unexpected results.

    pdfwrite generates CFF font data, and is not affected.

(2) The PDF part of ps2write's output is written according to PDF 
    syntax. In particular, the font name in the PDF Font object, like 
    all PDF names, may contain "#hh" hex escapes, and these must be 
    generated when the name contains special chars. The "#hh" escapes 
    have no special meaning to a PS interpreter, so the FontName in 
    the PDF Font object, which the procset uses to locate the font, 
    does not match the FontName in the Type 1 font data, which is 
    used by definefont.
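The two failure modes can be illustrated with a short C sketch. Both helpers, name_is_literal_safe and pdf_encode_name, are hypothetical names made up for this illustration, not ps2write's actual functions. The character classes follow the PostScript/PDF definitions of whitespace and delimiters, with the conservative assumption that any byte outside printable ASCII is unsafe in a literal name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Delimiter characters shared by PostScript and PDF syntax. */
static const char delimiters[] = "()<>[]{}/%";

/* Part (1), hypothetical helper: true if the name bytes can follow a '/'
 * verbatim and still scan as one literal PostScript name. Conservatively
 * accepts only printable ASCII (0x21..0x7E) that is not a delimiter. */
static bool name_is_literal_safe(const unsigned char *name, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = name[i];
        /* the range test runs first, so strchr never sees c == 0 */
        if (c < 0x21 || c > 0x7E || strchr(delimiters, c) != NULL)
            return false;
    }
    return true;
}

/* Part (2), hypothetical helper: writes the body of a PDF name object,
 * using "#hh" for every byte that is not a regular character and for '#'
 * itself. out must hold the worst case, 3 * len + 1 bytes, else false. */
static bool pdf_encode_name(const unsigned char *in, size_t len,
                            char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    if (outsz < 3 * len + 1)
        return false;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x21 || c > 0x7E || c == '#' || strchr(delimiters, c) != NULL) {
            out[o++] = '#';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        } else {
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
    return true;
}
```

For the sample's font, name_is_literal_safe() rejects "Brush Script MT [ITALIC]" because of the spaces and brackets, and pdf_encode_name() yields "Brush#20Script#20MT#20#5BITALIC#5D". A PS interpreter scanning that PDF name treats '#' as an ordinary character, so it gets a different name from the one in the Type 1 font data.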

Comment 1 SaGS 2007-08-26 04:27:21 UTC

Created attachment 3312
Suggested patch.

Part #1 of 2:
    ps2write: if the FontName cannot be represented as a literal PS 
    name, write it as "(fontname) cvn", escaping chars as necessary.
    
    ATM compatibility notes:
  - In theory, for compatibility with Type 1 renderers that do not 
    include a full PostScript interpreter, the FontName should be a 
    literal name. Only in theory. Some time ago, I verified that in 
    practice Adobe Type Manager, Adobe Reader and the Type 1 
    interpreter built into Windows accept and handle fonts that have 
    "/FontName (name) cvn def".
  - Since ps2write's output is consumed by a full-blown PostScript 
    interpreter, ATM compatibility is not necessary.
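A minimal sketch of the "(fontname) cvn" form in C, assuming backslash escapes for '(', ')' and '\' and 3-digit octal "\ddd" escapes for bytes outside printable ASCII. ps_string_name is a hypothetical helper; per comment #8 the revised patch reuses Ghostscript's existing s_PSSE filter instead of code like this, and that filter's exact escaping choices may differ.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not the patch's code): emits a font name as the
 * PostScript expression "(...) cvn". '(', ')' and '\' get a backslash;
 * bytes outside printable ASCII become 3-digit octal escapes "\ddd". */
static bool ps_string_name(const unsigned char *in, size_t len,
                           char *out, size_t outsz)
{
    size_t o = 0;

    /* worst case: '(' + 4 chars per byte + ") cvn" + NUL */
    if (outsz < 4 * len + 7)
        return false;
    out[o++] = '(';
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c == '(' || c == ')' || c == '\\') {
            out[o++] = '\\';
            out[o++] = (char)c;
        } else if (c < 0x20 || c > 0x7E) {
            o += (size_t)sprintf(out + o, "\\%03o", (unsigned)c);
        } else {
            out[o++] = (char)c;   /* includes space, '[' and ']' */
        }
    }
    memcpy(out + o, ") cvn", 6);  /* copies the terminating NUL too */
    return true;
}
```

For the sample font this produces "(Brush Script MT [ITALIC]) cvn", so the Type 1 data would read "/FontName (Brush Script MT [ITALIC]) cvn def", which stays readable in generated files.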

Part #2 of 2:
    opdfread.ps: un-hex-escape (/#41 -> /A) font names from PDF Font 
    and related objects. Note: this un-escaping is potentially 
    necessary for any name, but font names are the only place that 
    I can think of where these escapes appear and make a difference.
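The patch performs this decoding in PostScript inside opdfread; the C below only illustrates the same "#hh" rule and is not the patch's code. pdf_unescape_name and hexval are hypothetical names, and keeping a malformed '#' verbatim is a lenient choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes PDF "#hh" escapes in place and returns the new length; the
 * caller terminates the string if needed. A '#' that is not followed by
 * two hex digits is copied unchanged. */
static size_t pdf_unescape_name(char *s, size_t len)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        if (s[i] == '#' && i + 2 < len) {
            int hi = hexval((unsigned char)s[i + 1]);
            int lo = hexval((unsigned char)s[i + 2]);
            if (hi >= 0 && lo >= 0) {
                s[o++] = (char)(hi * 16 + lo);
                i += 2;          /* skip the two digits just consumed */
                continue;
            }
        }
        s[o++] = s[i];           /* o <= i, so copying in place is safe */
    }
    return o;
}
```

Applied to "Brush#20Script#20MT#20#5BITALIC#5D", this recovers "Brush Script MT [ITALIC]", the same name that definefont sees in the Type 1 data.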

Comment 2 Ray Johnston 2007-08-28 09:54:09 UTC

Rather than encoding names into PS strings, I recommend using the much simpler
approach of hex strings. For example:

   /FontName <41424320444546> cvn def

instead of:

   /FontName (ABC DEF) cvn def

Since the target printers are PS Level 2, hex strings are supported.
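For comparison, a sketch of the hex-string alternative, which needs no escaping rules at all: every byte becomes two hex digits. ps_hex_name is a hypothetical helper; it reproduces the <41424320444546> example for "ABC DEF".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes a name as "<hh...> cvn". The output size is
 * always '<' + 2 * len + "> cvn" + NUL, i.e. 2 * len + 7 bytes. */
static bool ps_hex_name(const unsigned char *in, size_t len,
                        char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t o = 0;

    if (outsz < 2 * len + 7)
        return false;
    out[o++] = '<';
    for (size_t i = 0; i < len; i++) {
        out[o++] = hex[in[i] >> 4];
        out[o++] = hex[in[i] & 0x0F];
    }
    memcpy(out + o, "> cvn", 6);   /* copies the terminating NUL too */
    return true;
}
```

The code is shorter than any escaping variant, at the cost of output that cannot be read at a glance.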

Comment 3 Alex Cherepanov 2007-08-28 14:01:44 UTC

I vote for the string with escaped characters because it is much easier to read.
When we get the generated files back attached to bug reports, there will be
one less hurdle to jump.

Comment 4 Ken Sharp 2007-08-29 00:37:20 UTC

I think I agree with Alex, it's easier to read the result, and I'm not sure what
the hex representation gains, other than not having to escape some characters.

Comment 5 Ray Johnston 2007-08-29 08:34:55 UTC

The only gain is _much_ simpler code in 'write_font_name'.

If we are going to use the PS-style string, then I change my recommendation to:

Re-use the existing s_PSSE filter rather than implementing yet another form
of this conversion. Refer to the code in obj_cvp (iutil.c) where it handles
t_string for an example.

Since implementing the PS-style string using the filter needs extra work from
the patch submitter, I am marking this bug 'bountiable' to encourage follow-up
development.

Comment 6 Ken Sharp 2007-08-29 08:51:16 UTC

Ah, thanks Ray, I didn't realise there was already code to handle this. I guess
since it's P4 I can just sit back and wait to see if the OP would like to modify
the patch and claim the bounty :-)

Comment 7 Ray Johnston 2007-08-29 10:10:10 UTC

BTW, the code to use the s_PSSE_template.process function doesn't need to be
quite as general as that in iutil.c, since much of that code is there to handle
long strings. If the destination string (str) is large enough to accommodate the
worst-case FontName (allowing for worst-case expansion due to escapes), then
'process' only needs to be called once (not in a loop).
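The single-call idea can be sketched with a stream-style step function that stops when its output buffer is full. encode_step and worst_case_escaped_len are hypothetical; they model the idea only and are not the signature of s_PSSE_template.process. With the escaping assumed here, each input byte expands to at most 4 characters ("\ddd"), so an output buffer of 4 * len bytes always lets one call consume the whole name.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stream-style step: escapes input from *ip onward into out
 * starting at *op, stopping early if the next escape does not fit.
 * Returns true once all input has been consumed. */
static bool encode_step(const unsigned char *in, size_t len, size_t *ip,
                        char *out, size_t outcap, size_t *op)
{
    while (*ip < len) {
        unsigned char c = in[*ip];
        char tmp[5];
        size_t n;

        if (c == '(' || c == ')' || c == '\\') {
            tmp[0] = '\\';
            tmp[1] = (char)c;
            n = 2;
        } else if (c < 0x20 || c > 0x7E) {
            n = (size_t)snprintf(tmp, sizeof tmp, "\\%03o", (unsigned)c);
        } else {
            tmp[0] = (char)c;
            n = 1;
        }
        if (*op + n > outcap)
            return false;   /* output full: caller must flush and call again */
        memcpy(out + *op, tmp, n);
        *op += n;
        (*ip)++;
    }
    return true;
}

/* Worst-case output size for the escaping above: every byte as "\ddd". */
static size_t worst_case_escaped_len(size_t len)
{
    return 4 * len;
}
```

With a smaller buffer the step returns false part-way through and the caller must loop; that is the generality iutil.c needs for long strings, which a FontName encoder with a worst-case-sized buffer can skip.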

Comment 8 SaGS 2007-09-02 10:31:41 UTC

Created attachment 3341
Revised patch.

As requested, attached is a patch revised to use an existing function 
for PS string encoding, rather than implementing this conversion again.

The functionality is the same as before (comment #1).

Comment 9 leonardo 2007-10-11 11:51:14 UTC

Please take into account one more thing. With a slight modification, the old code 
allows us to write a PDF with an embedded Type 1 font (without converting it into 
CFF). In particular, this feature is used in FAPI development. I checked the 
revised patch, and it looks safe in that respect. Anyway, please be careful with 
further improvements.

Comment 10 leonardo 2007-10-11 12:16:51 UTC

Regarding the "Revised patch" : 

I see the updated write_font_name encodes with Postscript string escapes, and 
the updated opdfread.ps decodes the font name from PDF encoding. I believe they 
are inconsistent.

On the other hand, it's a good observation that PDF name decoding needs to be 
implemented in opdfread and called in the appropriate places. But I believe 
there are many more places that need to call it than the patch covers. We 
likely need to open a separate bug for it. Ken, please check and open one.

Comment 11 SaGS 2007-10-11 23:16:41 UTC

> With a light modification the old code allows us to write a PDF 
> with embedded Type 1 font (with no converting into CFF).

Some time ago (when I noticed this problem in the sample for bug #688001), 
I checked that PDFs that use this syntax work OK with Reader and in 
a few other situations - see "ATM compatibility notes" in comment #1. 
But yes, one never knows what another PDF/font consumer might expect.

> I see the updated write_font_name encodes with Postscript string 
> escapes, and the updated opdfread.ps decodes the font name from 
> PDF encoding. I believe they are inconsistent.

There is a difference, but these names are in different places:

- the PS string encode is used inside the PDF stream that contains 
  Type 1 font data, which is expected to be kind of PostScript code;
- the PDF hex encoding (and thus the necessary PDF hex decoding) 
  applies to the FontName entry in the PDF dictionary that describes 
  the font (this data is passed to the PDF interpreter).

The problem described as "Bypassing this error, one will get a 
default font (Courier) instead of the requested one." in the 
original report (comment #0) comes exactly from the fact that names 
in these 2 places didn't match (due to lack of PDF hex decoding).

Comment 12 leonardo 2007-10-14 12:05:42 UTC

In Comment #10 I wrote "I see the updated write_font_name encodes with 
Postscript string escapes, and the updated opdfread.ps decodes the font name 
from PDF encoding. I believe they are inconsistent."

This statement is wrong, because those encodings are applied to different 
occurrences of the name. I withdraw it.

Comment 13 Ken Sharp 2007-10-16 00:56:27 UTC

Patch http://ghostscript.com/pipermail/gs-cvs/2007-October/007876.html,
essentially as submitted by SaGS, resolves this issue.