Bug 689637 - Many PDF files from Microsoft® Office Word 2007 fails with error /undefined in --get--
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.61
Hardware: PC Windows XP
Importance: P4 normal
Assignee: Alex Cherepanov
URL:
Keywords:
Duplicates: 689644
Depends on:
Blocks:
 
Reported: 2008-01-02 02:02 UTC by Zina
Modified: 2010-08-07 22:55 UTC

See Also:
Customer:
Word Size: ---


Attachments
PDF file (23.25 KB, application/pdf)
2008-01-02 02:05 UTC, Zina
modified sample: uncompressed, and "[( )] TJ" is replaced by "[() ] TJ" (86.01 KB, application/pdf)
2008-01-03 05:23 UTC, mpsuzuki
modified sample: uncompressed, "[( )] TJ" is replaced by "[() ] TJ", font F2 is replaced by F1 (86.01 KB, application/pdf)
2008-01-03 05:24 UTC, mpsuzuki
patch to make PDF interpreter ignore preloaded font if it's embedded font (1.81 KB, patch)
2008-01-07 01:33 UTC, mpsuzuki

Description Zina 2008-01-02 02:02:14 UTC
I've used Microsoft® Office Word 2007 "Save as PDF" plug-in to convert doc 
files to pdf.
Many files fail with the same error:

Error: /undefined
in --get--
Operand stack:
   --nostringval--   --dict:9/18(L)--   F2   18   --dict:9/9(L)--   true   --
dict:218/256(L)--   --dict:1/4(L)--   --dict:11/13(L)--   --dict:11/13(L)--   --
dict:11/13(L)--   CharStrings
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval-
-   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   
false   1   %stopped_push   1905   1   3   %oparray_pop   1904   1   3   %
oparray_pop   1888   1   3   %oparray_pop   --nostringval--   --nostringval--   
2   1   1   --nostringval--   %for_pos_int_continue   --nostringval--   --
nostringval--   --nostringval--   --nostringval--   %array_continue   --
nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue   -
-nostringval--   --nostringval--   --nostringval--   --nostringval--   --
nostringval--   --nostringval--
Dictionary stack:
   --dict:1148/1684(ro)(G)--   --dict:2/20(G)--   --dict:75/200(L)--   --
dict:75/200(L)--   --dict:107/127(ro)(G)--   --dict:275/300(ro)(G)--   --
dict:22/25(L)--   --dict:4/6(L)--   --dict:25/40(L)--
Current allocation mode is local
GPL Ghostscript 8.61: Unrecoverable error, exit code 1
Comment 1 Zina 2008-01-02 02:05:00 UTC
Created attachment 3669
PDF file

I have more files
Comment 2 Marcos H. Woehrmann 2008-01-02 07:07:05 UTC
I've verified that the problem is still in gshead (r8472).  Also gs8.54 reports a different error:

    Error: /invalidfont in -dict-

Both Apple Preview and Acrobat Professional 8.0 open the file without error.  Acrobat reports that the file 
contains an embedded TrueType (CID) font with Encoding: Identity-H, so this may be related to Bug 689538.
Comment 3 mpsuzuki 2008-01-02 22:58:38 UTC
Zina, bug 689538 is a fallback for a buggy PDF generator
that embeds CFF OpenType as TrueType (a false declaration),
so it is not related to this issue.
Bug 689623 (Roman/Latin TTFs embedded as CIDFont)
may be related instead.
Comment 4 Zina 2008-01-02 23:16:47 UTC
Thanks!
Comment 5 mpsuzuki 2008-01-03 05:23:17 UTC
Created attachment 3670
modified sample: uncompressed, and "[( )] TJ" is replaced by "[() ] TJ"
Comment 6 mpsuzuki 2008-01-03 05:24:48 UTC
Created attachment 3671
modified sample: uncompressed, "[( )] TJ" is replaced by "[() ] TJ", font F2 is replaced by F1
Comment 7 mpsuzuki 2008-01-03 06:31:27 UTC
As Adobe Reader's "document properties" dialog shows,
the sample PDF refers to 2 "Times New Roman" font objects.
One is an embedded TrueType font with a CID interface (referred to by the name F1);
the other is an un-embedded "Times New Roman" with WinAnsiEncoding
(referred to by the name F2).
If you display the PDF on a system that lacks the real Times New Roman
(e.g. Adobe Reader on Linux), F2 is substituted by Adobe Sans MM.
The error is caused by F2, not F1.

Between attachment 3670 and attachment 3671, I rewrote object 7 0 in
the PDF as follows (the arrow marks the changed line).

7 0 obj 
/Length 271
stream
 /P <</MCID 0/Lang (ru-RU)>> BDC BT
/F1 18 Tf
1 0 0 1 90.024 703.66 Tm
[<023C0268>-5<025E025A>] TJ
 EMC  /P <</MCID 1>> BDC BT
/F2 18 Tf ------------------------------------> /F1 18 Tf
1 0 0 1 128.21 703.66 Tm
[() ] TJ
 EMC  /P <</MCID 2>> BDC BT
1 0 0 1 90.024 683.02 Tm
[() ] TJ
 EMC 
endstream 
endobj

With this modification, attachment 3671 eliminates the reference to F2,
and Ghostscript renders attachment 3671 correctly. So I think there is
no bug in the CID/CMap-related mechanism.
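
The F2 -> F1 edit described above can be checked mechanically by listing
the font resource names that the Tf operator selects. This is only a rough
sketch: the regular expression is a simplification written for this one
content stream, not a real PDF tokenizer, and the helper name
fonts_selected is made up for illustration.

```python
import re
from typing import List

# Content stream of object 7 0 as quoted above, before the F2 -> F1 edit.
CONTENT_STREAM = """\
 /P <</MCID 0/Lang (ru-RU)>> BDC BT
/F1 18 Tf
1 0 0 1 90.024 703.66 Tm
[<023C0268>-5<025E025A>] TJ
 EMC  /P <</MCID 1>> BDC BT
/F2 18 Tf
1 0 0 1 128.21 703.66 Tm
[() ] TJ
 EMC  /P <</MCID 2>> BDC BT
1 0 0 1 90.024 683.02 Tm
[() ] TJ
 EMC
"""

# "/Name size Tf": a name object, a number, then the Tf operator.
# Assumption: no strings or comments in the stream contain this pattern.
TF_PATTERN = re.compile(r"/([^\s/\[\]()<>{}%]+)\s+([-+]?\d*\.?\d+)\s+Tf\b")


def fonts_selected(stream: str) -> List[str]:
    """Return the font resource names in the order Tf selects them."""
    return [match.group(1) for match in TF_PATTERN.finditer(stream)]


if __name__ == "__main__":
    print(fonts_selected(CONTENT_STREAM))
    # Apply the same edit as in attachment 3671 and list the fonts again.
    print(fonts_selected(CONTENT_STREAM.replace("/F2 18 Tf", "/F1 18 Tf")))
```

After the replacement no Tf operator refers to F2 any more, which matches
the observation that only the unembedded font triggers the error.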

Why is the unembedded "Times New Roman" not substituted correctly?
The reason might be the naming of the embedded "Times New Roman", F1.
Traditionally, such an embedded font is expected to be renamed with
a random prefix, as in "JEGEBG+TimesNewRoman", to avoid clashing with an
external font resource. In the sample PDF, no such prefix is used.
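
For reference, the PDF Reference describes this prefix for font subsets as
a tag of exactly six uppercase letters followed by a plus sign. A small
sketch of a check for it follows; the function names are made up for
illustration and are not part of Ghostscript.

```python
import re

# Subset tag convention: six uppercase letters, then "+", then the font name.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def has_subset_tag(base_font: str) -> bool:
    """True if the font name starts with a subset tag such as 'JEGEBG+'."""
    return SUBSET_TAG.match(base_font) is not None


def strip_subset_tag(base_font: str) -> str:
    """Return the font name without a leading subset tag, if there is one."""
    return SUBSET_TAG.sub("", base_font, count=1)


if __name__ == "__main__":
    for name in ("JEGEBG+TimesNewRoman", "TimesNewRoman"):
        print(name, has_subset_tag(name), strip_subset_tag(name))
```

A name without the tag, as in the sample PDF, is indistinguishable by name
alone from the system font it shadows.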

Ghostscript's resource management tries to use the "Times New Roman"
defined by F1 for F2 as well, but this fails because the interface
types differ (F1 is a CID-keyed composite font, F2 is not composite).

Alex, is there a good way to restrict access to the "Times New Roman"
derived from F1 while another font object, F2, is being resolved?
Comment 8 mpsuzuki 2008-01-07 01:33:17 UTC
Created attachment 3680
patch to make PDF interpreter ignore preloaded font if it's embedded font

I expect the "is_resource" flag to tell whether a font is
embedded in the PDF or comes from an external resource (see the
discussion in bug 688058).

This patch lets the PDF interpreter check, in PostScript space,
whether a font is embedded in the PDF or not, via a new
operator, .isregisteredfont.

If a font with the same name is cached but it is embedded, the
PDF interpreter ignores it and tries to resolve the font by following
the object reference. If the PDF interpreter cannot reach the
explicitly referenced embedded font object, the external
font resource is checked.

# If the PDF interpreter cannot find the explicitly referenced embedded
# font object or an external font resource with the same name,
# a cached object with the same name may be tried, but this is just
# a fallback. It is not required by the PDF Reference.
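
The lookup order described above can be modelled roughly as follows. This
is a Python toy model, not the PostScript code in pdf_font.ps: the Font
class, the resolve_font function and the two dictionaries are hypothetical
stand-ins, and step 1 (reusing a cached font that is a registered
resource) is my reading of the intended behaviour.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Font:
    name: str
    is_resource: bool  # True: external resource; False: embedded in the PDF
    origin: str        # free-form label, used only for this demonstration


def resolve_font(name: str,
                 embedded: Optional[Font],
                 cache: Dict[str, Font],
                 external: Dict[str, Font]) -> Optional[Font]:
    """Pick the font for a PDF font object, following the order above."""
    cached = cache.get(name)
    # 1. Reuse a cached font only if it is a registered (external) resource.
    if cached is not None and cached.is_resource:
        return cached
    # 2. Otherwise follow the object reference to the embedded font, if any.
    if embedded is not None:
        return embedded
    # 3. Then try the external font resource with the same name.
    if name in external:
        return external[name]
    # 4. Fallback only: a cached embedded font with the same name (or None).
    return cached


if __name__ == "__main__":
    f1 = Font("Times New Roman", is_resource=False, origin="embedded F1")
    system = Font("Times New Roman", is_resource=True, origin="system font")
    cache = {"Times New Roman": f1}  # F1 was loaded first and cached
    # F2 has no embedded font program, so the external resource wins.
    print(resolve_font("Times New Roman", None, cache,
                       {"Times New Roman": system}).origin)
```

In the sample PDF this means F2 no longer picks up the CID-keyed F1 from
the cache just because both are called "Times New Roman".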
Comment 9 mpsuzuki 2008-01-07 01:34:29 UTC
The patch is now under regression testing.
Comment 10 Ray Johnston 2008-01-08 09:49:42 UTC
This looks similar to the other PDF interpreter 'name space' bugs where we get confused by the same 
name showing up as embedded, CIDFont or 'standard' font. 
 
Assigning to Alex for review and possible classification as duplicate 
Comment 11 mpsuzuki 2008-01-08 15:25:15 UTC
No regressions were found with patch 3680.
Comment 12 mpsuzuki 2008-01-08 16:54:04 UTC
*** Bug 689644 has been marked as a duplicate of this bug. ***
Comment 13 Alex Cherepanov 2008-01-14 19:56:45 UTC
I agree with the patch. It's rather simple and solves the customer's
problem.

Probably, embedded PDF fonts should not be registered as PostScript resources at
all, but creating a special version of the --definefont-- operator for PDF is
much more difficult.
Comment 14 leonardo 2008-01-14 23:58:48 UTC
IMO the patch is good.
Comment 15 Zina 2008-01-16 06:18:26 UTC
Hi,

As I understand it, the patch is OK and works.
When is the next planned official version that includes a binary with this fix?

Thanks!
Zina
Comment 16 mpsuzuki 2008-01-16 07:30:31 UTC
Thank you for the comments. Within this week, I will add the
patch log message to SVN HEAD. Please wait 24 hours.
Comment 17 mpsuzuki 2008-01-16 08:58:28 UTC
Alex, could you comment on this patch log?

--------------------------------------------------------------------
Fix: ignore the embedded font resource when PDF interpreter resolves
     the unembedded font resource.

DETAILS:

Some PDF generators (e.g. Microsoft Office 2007 add-on to export the
documents to PDF format) emits incompatible font objects with same
resource name. The sample PDF in bug 689637 includes 2 "Times New
Roman" font objects: one is embedded CID-keyed TrueType for Cyrillic
glyphs, another is unembedded WinAnsiEncoding TrueType (possibly for
empty page header or footer). When PDF interpreter resolves latter
unembedded "Times New Roman", external font resource should be used
(Adobe Reader does so). But current ghostscript uses former embedded
"Times New Roman", because the sample PDF includes "Times New Roman"
without randomization prefix.

To avoid the confusion between embedded and unembedded fonts with
same name, pfont->is_resource flag (=0 embedded, =1 unembedded)
is checked during font object resolving. Even if a cached font
object with same name is found, it is ignored once when it is
embedded font. To execute this check in PostScript space
(pdf_font.ps), new operator ".isregisteredfont" is introduced.
This patch assumes that embedded font object in PDF is resolvable
by tracking the indirect object references. If a PDF assumes name-
based resolving of embedded font object (without indirect object),
it may be rendered by external font resource. At present, we don't
have such sample.

By this patch, bug 689637 is fixed.

EXPECTED DIFFERENCES:

None.
Comment 18 Alex Cherepanov 2008-01-16 22:33:18 UTC
I'd suggest the following changes.

ghostscript  -> Ghostscript
resolving   -> resolution
is ignored once when it is embedded font -> is ignored if it is an embedded font
Comment 19 Alex Cherepanov 2008-01-26 17:51:45 UTC
Please commit the patch.
Comment 20 mpsuzuki 2008-01-28 02:38:04 UTC
I've just committed my patch as SVN revision 8509;
the revised log message follows.

--

Fix: ignore the embedded font resource when PDF interpreter resolves
     the unembedded font resource.

DETAILS:

Some PDF generators (e.g. Microsoft Office 2007 add-on to export the
documents to PDF format) emits incompatible font objects with same
resource name. The sample PDF in bug 689637 includes 2 "Times New
Roman" font objects: one is embedded CID-keyed TrueType for Cyrillic
glyphs, another is unembedded WinAnsiEncoding TrueType (possibly for
empty page header or footer). When PDF interpreter resolves latter
unembedded "Times New Roman", external font resource should be used
(Adobe Reader does so). But current Ghostscript uses former embedded
"Times New Roman", because the sample PDF includes "Times New Roman"
without randomization prefix.

To avoid the confusion between embedded and unembedded fonts with
same name, pfont->is_resource flag (=0 embedded, =1 unembedded)
is checked during font object resolution. Even if a cached font
object with same name is found, it is ignored if it is embedded
font. To execute this check in PostScript space (pdf_font.ps),
new operator ".isregisteredfont" is introduced. This patch assumes
that embedded font object in PDF is resolvable by tracking the
indirect object references. If a PDF assumes name-based resolution
of embedded font object (without indirect object), it may be
rendered by external font resource. At present, we don't have
such sample.

By this patch, bug 689637 is fixed.

EXPECTED DIFFERENCES:

None.
Comment 21 Alex Cherepanov 2008-02-28 07:54:00 UTC
The patch doesn't consider files loaded through FONTPATH as resource files.
Using memory fonts in PDF files would also be handy, at least for debugging.
Perhaps we can accept any fonts that are already defined when PDF processing
starts.

For an example of the undesired effect, look at the sample file from
bug 689495, which no longer uses the font loaded through FONTPATH.
Comment 22 leonardo 2008-05-21 07:49:21 UTC
Changing the assignment according to ownership, due to lack of progress.
Comment 23 Alex Cherepanov 2008-05-24 22:48:50 UTC
Revert rev. 8509 because it is no longer needed after rev. 8774 and interferes
with FONTPATH search and memory font resources.

Since embedded PDF fonts are now not registered as resources, there's no
need to distinguish between disk and memory font resources. Embedded fonts
bypass resource machinery; non-embedded fonts are loaded by name.
 
The following patch is committed as rev. 8775.
http://ghostscript.com/pipermail/gs-cvs/2008-May/008353.html
Regression testing shows no differences.
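
As a rough illustration of the behaviour after rev. 8774 and 8775, here is
a toy model in Python. It is not the actual Ghostscript code: the
PdfFontLoader class and its fields are hypothetical, and real Ghostscript
substitutes a font when a name is missing instead of raising an error.

```python
class PdfFontLoader:
    """Toy model: embedded fonts never enter the name-keyed registry."""

    def __init__(self, resources):
        # name -> font data; stands in for disk, FONTPATH and memory fonts.
        self.resources = dict(resources)

    def load(self, base_font, embedded_program=None):
        if embedded_program is not None:
            # Embedded: built from the PDF's own font program and returned
            # directly, without being registered under its name.
            return {"name": base_font, "source": "embedded",
                    "program": embedded_program}
        # Non-embedded: looked up by name (KeyError here if it is missing).
        return {"name": base_font, "source": "resource",
                "program": self.resources[base_font]}


if __name__ == "__main__":
    loader = PdfFontLoader({"Times New Roman": "system TrueType"})
    f1 = loader.load("Times New Roman", embedded_program="CID-keyed TrueType")
    f2 = loader.load("Times New Roman")
    print(f1["source"], f2["source"], f2["program"])
```

Because F1 is never stored under its name, loading F2 by name cannot reach
it, so no is_resource check is needed in this design.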
Comment 24 Dave 2010-01-31 08:28:24 UTC
I'm getting the same error with GS 8.7.  The file opens fine with Adobe 
9 but will not open with GS 8.7.

Loading NimbusSanL-ReguCond font from %rom%Resource/Font/NimbusSanL-ReguCond... 
2594408 1274278 13162680 11856210 3 done.
Error: /undefined in --get--
Operand stack:
   --dict:6/6(L)--   --nostringval--   false   709700   false   --dict:9/9(L)--
   --nostringval--   --nostringval--   --dict:9/9(L)--   File
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval-
-   2   %stopped_push   --nostringval--   --nostringval--   false   1   %
stopped_push   1862   1   3   %oparray_pop   1861   1   3   %oparray_pop   
1845   1   3   %oparray_pop   1739   1   3   %oparray_pop   --nostringval--   %
errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--
   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   --
nostringval--   --nostringval--   %array_continue   --nostringval--   false   
1   %stopped_push   --nostringval--   %loop_continue   --nostringval--   
1761605   --nostringval--   --nostringval--   --nostringval--   --nostringval--
   --nostringval--
Dictionary stack:
   --dict:1159/1684(ro)(G)--   --dict:1/20(G)--   --dict:79/200(L)--   --
dict:106/127(ro)(G)--   --dict:285/300(ro)(G)--   --dict:22/25(L)--   --dict:4/6
(L)--   --dict:21/40(L)--   --dict:13/15(L)--   --dict:1/1(ro)(G)--   --dict:1/1
(ro)(G)--   --dict:9/17(L)--   --dict:6/9(L)--
Current allocation mode is local
Last OS error: No such file or directory
pdf_page failed
Comment 25 Ken Sharp 2010-01-31 09:52:48 UTC
The current HEAD revision of GS works fine for me (both rendering and converting
to PDF). There will be a new release (8.71) in the next week or so, I'd
recommend waiting and trying that.
Comment 26 Alex Cherepanov 2010-08-07 22:55:23 UTC
All sample files work fine on display, pnmraw, and pdfwrite devices
at 72 and 600 dpi on 32-bit Windows and 64-bit Linux systems
in the current development version, which will be released
in about a week as v. 9.00.