Bug 688187 - Volapyk text after re-saving GS-8.51-created PDF from Acrobat 5, 6 and 7
Status: RESOLVED WORKSFORME
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PS Interpreter
Version: 8.51
Hardware: PC Windows XP
Importance: P5 normal
Assignee: Ken Sharp
Reported: 2005-07-05 08:09 UTC by Jacob Schäffer
Modified: 2010-09-29 07:21 UTC

Attachments
Page as it's output from Ghostscript (146.55 KB, application/pdf)
2005-07-06 03:16 UTC, Jacob Schäffer
Result of the same page AFTER save from Acrobat (147.93 KB, application/pdf)
2005-07-06 03:17 UTC, Jacob Schäffer
Screen dumps showing specifics for Page 11 (1.58 MB, application/x-zip-compressed)
2005-07-06 08:52 UTC, Jacob Schäffer
decompr-attachment-g.pdf (298.71 KB, application/pdf)
2005-07-13 05:11 UTC, Igor Melichev
decompr-attachment-a.pdf (299.67 KB, application/pdf)
2005-07-13 05:12 UTC, Igor Melichev

Description Jacob Schäffer 2005-07-05 08:09:08 UTC
It appears that some PDF documents created from PostScript with AFPL 
Ghostscript 8.51 cannot be re-saved with Acrobat 5, 6 or 7 without spoiling 
the text contents.

The same PostScript converted to PDF with Distiller 5.0, 5.05, 6.0 and 7.0 
does not trigger this error.
Comment 1 Dan Coby 2005-07-05 10:24:28 UTC
Please attach a sample PostScript file which causes this problem.  Also attach 
the resulting PDF file that you are producing.

What do you mean by "without spoiling text contents"?  Is the text garbled? 
Poorly positioned? etc.
Comment 2 Jacob Schäffer 2005-07-05 11:27:59 UTC
I have tried to upload an attachment 3 times now, but with no luck. The file 
we have here contains 89 pages, is 94 MB in size and was created from a 3 GB 
PostScript stream originally produced by Windows FrameMaker.

Besides ordinary Windows ANSI text it contains 437 EPS files from various 
sources. The 89 pages represent a small part of a Danish telephone directory, 
and are interesting in a technical sense because a *HUGE* number of different 
fonts are in use in the same document.

However, upon upload I keep running into time-outs.

Tomorrow I will provide a link from which the file can be downloaded.

- Jacob
Comment 3 Jacob Schäffer 2005-07-06 03:16:36 UTC
Created attachment 1507
Page as it's output from Ghostscript


This page is extracted from the full document BEFORE the document was re-saved
in full from Acrobat.
Comment 4 Jacob Schäffer 2005-07-06 03:17:50 UTC
Created attachment 1508
Result of the same page AFTER save from Acrobat


This page is extracted from the full document AFTER the document was re-saved
in full from Acrobat.
Comment 5 Jacob Schäffer 2005-07-06 08:36:04 UTC
I've now studied the situation further.

What happens upon save seems to be the same 'consolidation' of fonts that 
occurs when one PDF document is merged into another. Both the Adobe PDF 
Library and Acrobat have always had - and under certain circumstances still 
have - problems when clashing font names exist. I've only seen this error with 
TrueType fonts, where Adobe acknowledges that a problem exists with the 
PDPageAcquirePDEContent API (though it's said to be resolved with 7.0, I'm not 
so sure that's entirely true). I've never seen this error with Type 1 fonts 
unless the encoding information was wrong/corrupted.

The strange thing here is that all font subsets seem to be properly prefixed 
in the subset name (i.e. not the font name as displayed in Acrobat). I can't 
find any clashing prefix names. I can't find encoding errors, but haven't 
studied the encoding details (compared contents). Hence, I don't think the 
problem is identical with the well-known Acrobat/PDFL 'clashing font names' 
bug after all.
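
The prefix check described above can be repeated mechanically. The following is only a minimal sketch: it assumes the font names have already been listed by some other tool, and the helper names are made up for illustration. It relies on the PDF convention that a subset font name is a six-uppercase-letter tag, a plus sign, and the base font name.

```python
import re
from collections import defaultdict

# A subset font name is a six-uppercase-letter tag, a plus sign, then the base name.
SUBSET_RE = re.compile(r"^([A-Z]{6})\+(.+)$")

def split_subset_name(name):
    """Return (tag, base_name); tag is None when the name has no subset prefix."""
    m = SUBSET_RE.match(name)
    return (m.group(1), m.group(2)) if m else (None, name)

def find_clashes(font_names):
    """Return every full font name that occurs more than once in the list."""
    seen = defaultdict(int)
    for name in font_names:
        seen[name] += 1
    return sorted(n for n, count in seen.items() if count > 1)

def variants_by_base(font_names):
    """Map each base font name to the sorted subset tags seen for it."""
    groups = defaultdict(set)
    for name in font_names:
        tag, base = split_subset_name(name)
        groups[base].add(tag)
    return {base: sorted(t for t in tags if t is not None)
            for base, tags in groups.items()}

# Names taken from this report; a real check would use the full font list.
names = ["LORPMQ+Myriad-Bold", "TACLYS+Myriad-Bold", "OZEZEA+Myriad-Bold"]
print(find_clashes(names))
print(variants_by_base(names))
```

An empty clash list together with several tags per base font matches the observation here: many variants of Myriad-Bold, but no two subsets sharing a prefix.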

In this case Ghostscript has created a document where, for example, 
Myriad-Bold and Myriad-Italic are stored in multiple versions - with different 
subsets/encodings. In this document I can count 14 Myriad-Bold variants (all 
Type 1), of which 5 have CID encoding. The CID-encoded variants are found on 
document pages 6, 48, 77, 78 and 87, but none of those appears to be the 
offender.

On Page 6 I can find 'LORPMQ+Myriad-Bold' (plain Type 1, which seems to be in 
use on most pages) and 'TACLYS+Myriad-Bold' (Type 1 CID). This combination 
could traditionally cause a problem, but doesn't seem to here.

On Page 11 a new 'OZEZEA+Myriad-Bold' variant is introduced (which also is a 
plain Type 1 subset). The interesting thing about this variant is that it's 
used in parallel with 'LORPMQ+Myriad-Bold' on almost all remaining pages, for 
example in a phone number such as '86 62 01 96' the text is broken into the 
following sequence:

86 62              LORPMQ+Myriad-Bold
      01 9         OZEZEA+Myriad-Bold
          6        LORPMQ+Myriad-Bold

When Page 11 is extracted from the document BEFORE it was re-saved, and then 
is saved separately, those two font versions are consolidated into one, 
namely 'OZEZEA+Myriad-Bold', which is now used for both font references (both 
remain, but now point to the same font). The 'LORPMQ+Myriad-Bold' version is 
effectively gone. I will soon upload screen dumps that demonstrate this.
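
To illustrate why pointing both font references at one subset would garble text, here is a toy model. Everything in it is an assumption made for illustration: the code-to-glyph tables are invented (the real subsets' encodings were not inspected), and it models only the hypothesis described above, not what Acrobat actually does internally.

```python
# Hypothetical code-to-glyph tables for the two subsets (invented values).
LORPMQ = {1: "eight", 2: "six", 3: "two", 4: "space"}
OZEZEA = {1: "zero", 2: "one", 3: "nine", 4: "space"}

GLYPH_TO_CHAR = {"zero": "0", "one": "1", "two": "2", "six": "6",
                 "eight": "8", "nine": "9", "space": " "}

def render(runs, fonts):
    """Decode each (font_ref, codes) run using the table that font_ref points to."""
    out = []
    for font_ref, codes in runs:
        table = fonts[font_ref]
        out.append("".join(GLYPH_TO_CHAR[table[c]] for c in codes))
    return "".join(out)

# The phone number split across the two subsets, as in the sequence above.
runs = [
    ("LORPMQ", [1, 2, 4, 2, 3, 4]),
    ("OZEZEA", [1, 2, 4, 3]),
    ("LORPMQ", [2]),
]

# Before the re-save each reference resolves to its own subset.
before = render(runs, {"LORPMQ": LORPMQ, "OZEZEA": OZEZEA})
# After consolidation both references resolve to the OZEZEA subset.
after = render(runs, {"LORPMQ": OZEZEA, "OZEZEA": OZEZEA})

print(before)
print(after)
```

The character codes in the page content never change; only the table behind the first font reference does, which is enough to turn readable digits into nonsense.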

I mention Myriad-Bold specifically because the error is visible throughout 
the document with this font (errors start on page 2).

I haven't studied the encoding details for each of the included Myriad-Bold 
variants, nor looked at the character mapping for each.

ERRATA: What I wrote earlier about Acrobat 5.0 also producing the error was 
INCORRECT. It appears that Acrobat 5.0.1 (27-03-2001) actually CAN re-save 
this document WITHOUT doing any harm. All Acrobat 6.X and 7.X versions we 
have produce the error as described.

Hope this helps.

All the best
Jacob Schäffer
Grafikhuset
Denmark
Comment 6 Jacob Schäffer 2005-07-06 08:52:06 UTC
Created attachment 1510
Screen dumps showing specifics for Page 11


These screen dumps show page 11 before and after it has been extracted from
Acrobat.

In this case font information and the 'View only' parts are created with Quite
Revealing.

- Jacob
Comment 7 Igor Melichev 2005-07-13 04:07:15 UTC
I received the 2035M test file from the user on a DVD.
WinZip compresses it to 1365M - still too much.
Comment 8 Igor Melichev 2005-07-13 04:38:45 UTC
I do not see anything wrong with the document created.
pdfwrite may create several subsets from a single font when that font is used 
with different encodings, especially when the original font contains over 
256 glyphs and the various encodings define incompatible subsets. (That 
special case probably does not apply to the test document, but the algorithm 
must be general.)

In contrast to Adobe Distiller, Ghostscript preserves the original 
encoding, i.e. it never re-encodes text. We chose this approach since we have 
sample documents and fonts in which Adobe standard glyph names are used to 
name non-standard glyphs, such as the name /C for the Russian character "Tse".
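
The idea that one font can end up as several subsets when its encodings are incompatible can be sketched as follows. This is not pdfwrite's actual algorithm; it is a simplified greedy illustration under the assumption that an encoding is a map from character code to glyph name and that two encodings conflict when they assign different glyphs to the same code. The function names are made up.

```python
def compatible(enc_a, enc_b):
    """Two code->glyph maps are compatible when no shared code names different glyphs."""
    return all(enc_b[code] == glyph
               for code, glyph in enc_a.items() if code in enc_b)

def assign_subsets(encodings):
    """Greedily merge each encoding into the first compatible subset, else start a new one."""
    subsets = []
    for enc in encodings:
        for merged in subsets:
            if compatible(merged, enc):
                merged.update(enc)
                break
        else:
            subsets.append(dict(enc))
    return subsets

# Three uses of the same font; the third reuses code 65 for a different glyph.
uses = [
    {65: "A", 66: "B"},
    {66: "B", 67: "C"},
    {65: "Aacute"},
]
for subset in assign_subsets(uses):
    print(subset)
```

Because the text is never re-encoded, the third use cannot share a subset with the first two, so two subsets of the same base font are emitted.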

Since Jacob found out that Acrobat 5.0.1 produces a correct result, I think the 
problem should be reported to Adobe. At the same time I'll ask to assign 
the "bountiable" status to this bug - maybe somebody else will come up with an 
idea.
Comment 9 Igor Melichev 2005-07-13 05:11:18 UTC
Created attachment 1526
decompr-attachment-g.pdf

Decompressed attachment 1507
Comment 10 Igor Melichev 2005-07-13 05:12:05 UTC
Created attachment 1527
decompr-attachment-a.pdf

Decompressed attachment 1508
Comment 11 Igor Melichev 2005-07-13 05:19:08 UTC
I compared the contents of the decompressed 1526/1507 and 1527/1508 for 
the telephone number starting with "Tlf.":

q
8.33333 0 0 8.33333 0 0 cm BT
/R113 13.9705 Tf
0.998824 0 0 1 477.462 389.39 Tm
(T)Tj
-0.13972 Tc
7.51634 0 Td
[(lf)38.9587(.)]TJ
0 Tc
10.4359 0 Td
[( )10.0189(8)9.99792(7)9.99792( )10.0525(2)9.99792(7)9.99792( )]TJ
-0.13972 Tc
38.5015 0 Td
(11)Tj
0 Tc
15.2279 0 Td
( )Tj
-0.13972 Tc
2.68253 0 Td
(00)Tj
ET
Q

This fragment appears the same in both files, and refers to the same font 
resource with the same Encoding and the same FontDescriptor. However, the 
embedded font file "55 0 obj" differs, and CharSet is slightly different. 
Since 1526/1507 works fine, I guess that Acrobat wrongly associated glyphs 
with glyph names in the font file "55 0 obj".
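
A quick way to confirm that the shown strings are identical in two decompressed content streams is to pull out the literal strings and compare them. The sketch below is a rough aid, not a PDF parser: it assumes literal strings without nested parentheses, returns every literal string in the text (in this fragment all of them are Tj/TJ operands), and uses a shortened copy of the fragment above as input.

```python
import re

# Matches a PDF literal string without nested parentheses; escapes are kept raw.
LITERAL = re.compile(r"\(((?:[^()\\]|\\.)*)\)")

def shown_strings(content):
    """Return every literal string in the content, in order of appearance."""
    return LITERAL.findall(content)

fragment = """
(T)Tj
-0.13972 Tc
7.51634 0 Td
[(lf)38.9587(.)]TJ
0 Tc
10.4359 0 Td
[( )10.0189(8)9.99792(7)9.99792( )10.0525(2)9.99792(7)9.99792( )]TJ
(11)Tj
( )Tj
(00)Tj
"""

text = "".join(shown_strings(fragment))
print(text)
```

If the joined strings match in both files while the rendering differs, the difference has to lie in how the codes are mapped to glyphs, which points at the embedded font rather than the page content.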
Comment 12 leonardo 2007-08-29 19:52:29 UTC
Passing to Ken since he handles pdfwrite from now on.
Comment 13 Ken Sharp 2007-10-02 06:25:58 UTC
Help....

I don't have the original PostScript file (and at 2 GB I don't think I want it
either ;-), nor a file which reproduces the problem. I've picked up both the PDF
files, and examined them, and yes, the PDF file apparently produced from 'Save
As' in Acrobat is indeed wrong.

However, I can't reproduce this when saving the 'original' PDF file from any of
Acrobat 6, 7 or 8. So either I'm doing something wrong, the issue can't be
reproduced this way, or there's something I don't know about how to save the
file ;-)

I've spent a bit of time trying hard to reproduce the issue, but without
something to go on I'm afraid I'm going to have to leave this one alone.

Comment 14 Ken Sharp 2007-10-03 01:08:15 UTC
I've tried opening the one page attachment 'from Ghostscript', and then resaving
using File->Save As, from copies of Acrobat Professional versions 6.0, 7.0.9 and
8.0. In all three cases the resulting PDF file was fine when opened in any
version of Acrobat (Professional). 

Checking the internals of the files, I see that the font names have changed
(which confirms Acrobat actually did something), but the Myriad fonts themselves
do not have the altered CharSet observed by Igor and present in the
'after' file.
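
For anyone repeating the CharSet comparison, a small helper along these lines may be handy. It is only a sketch: it assumes the two CharSet strings have already been copied out of the font descriptors by hand, and the sample values below are invented, not taken from the attachments. In a Type 1 font descriptor, CharSet is a string of glyph names, each preceded by a slash.

```python
def parse_charset(charset):
    """Split a CharSet string such as '/T/l/f/period' into a set of glyph names."""
    return {name for name in charset.split("/") if name}

def diff_charsets(before, after):
    """Report the glyph names that appear in only one of the two CharSet strings."""
    a, b = parse_charset(before), parse_charset(after)
    return {"only_before": sorted(a - b), "only_after": sorted(b - a)}

# Invented sample values for illustration only.
before = "/T/l/f/period/space/zero"
after = "/T/l/f/period/space/one"
print(diff_charsets(before, after))
```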

So although I can see a problem with the attachment 'AFTER save from Acrobat',
I'm not able to reproduce the problem, which makes it kind of hard to work on.

I've asked Igor to bring the DVD along to the next staff meeting where I'll take
a copy of the original PostScript file and try to reproduce the issue from there.

In the meantime I'm dropping this issue again, unless someone can tell me how to
reproduce the problem with the files here.

Jacob, if you're listening, do please let me know what I'm doing wrong...
Comment 15 Jacob Schäffer 2007-10-03 08:34:40 UTC
I don't have the original job anymore. However, we have run several similar 
jobs with GPL Ghostscript 8.57 and 8.60 without seeing the problem.

In general GPL Ghostscript 8.57 and 8.60 seem to produce much better results 
in regard to fonts.

I think you should put this error on "pause" since it appears not to be 
reproducible. I'll report it and upload new test files if/when it shows up 
again.

All the best
Jacob Schäffer
Grafikhuset
Denmark
Comment 16 Ken Sharp 2007-10-03 09:05:03 UTC
Hmm, the odd thing was that I couldn't reproduce the problem with the original
(8.51 produced) PDF file.

Still, if it seems to be OK for you now, it's possible that an update to Acrobat
has resolved the problem. I'll reduce the priority so it falls down to the
bottom of my list. Of course if you find another example I can look again. I
will get the original file from Igor, but not until November.

Anyway, thanks for the feedback Jacob, let me know if this happens again.
Comment 17 Jacob Schäffer 2007-10-03 09:58:14 UTC
Ken Sharp wrote:
< Hmm, the odd thing was that I couldn't reproduce the problem with the 
original (8.51 produced) PDF file. >

It certainly makes a difference with Acrobat whether the fonts used are 
actually installed on the system or not. It seems that Acrobat performs 
differently in regard to consolidating fonts when the font is available 
locally at the time of consolidation.

Perhaps that would explain why you can't reproduce the problem.

Let me know when you have the original PS file from Igor. I'll then dig into 
our system to find the fonts you potentially need to install to reproduce.

All the best
Jacob Schäffer
Grafikhuset
Denmark
Comment 18 Ken Sharp 2007-10-04 00:16:29 UTC
Aha, you have to have the fonts installed? Well that would explain why I can't
reproduce it for sure, I don't have most of the fonts, in particular the Myriad
and Myriad-Bold fonts which I think are being 'consolidated' by Acrobat.

To be honest, this really does sound like an Acrobat bug, and it's not clear to
me that there's much we can do about it. We need to define multiple fonts in
order to address different Encodings; if Acrobat tries to collapse those into
one, and messes it up, I'm not sure there's much we can do.

But looking at what Distiller does would be useful.

I won't get the file from Igor until after November 6th, but if you have the
font Myriad-Bold I think that's the one causing the problem.

Comment 19 Ken Sharp 2010-09-29 07:21:54 UTC
This has been idle for three years now, so I'm going to close it. It seems that whatever the original problem was, it was resolved at some point in the past.

Of course, should it reappear the issue can be reopened.