Summary: | Volapyk (gibberish) text after re-saving GS-8.51-created PDF from Acrobat 5, 6 and 7 | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Jacob Schäffer <js> |
Component: | PS Interpreter | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | ||
Priority: | P5 | ||
Version: | 8.51 | ||
Hardware: | PC | ||
OS: | Windows XP | ||
Customer: | Word Size: | --- | |
Attachments: |
Page as it's output from Ghostscript
Result of the same page AFTER save from Acrobat
Screen dumps showing specifics for Page 11
decompr-attachment-g.pdf
decompr-attachment-a.pdf |
Description
Jacob Schäffer
2005-07-05 08:09:08 UTC
Please attach a sample PostScript file which causes this problem. Also attach the resulting PDF file that you are producing. What do you mean by "without spoiling text contents"? Is the text garbled? Poorly positioned? etc.
I have tried to upload an attachment three times now, but with no luck. The file
we have here contains 89 pages, is 94 MB in size and was created from a 3 GB
PostScript stream originally produced by Windows FrameMaker.
Besides ordinary Windows ANSI text it contains 437 EPS files from various
sources. The 89 pages represent a small part of a Danish telephone directory,
and are interesting in a technical sense because a *HUGE* number of different
fonts are in use in the same document.
However, upon upload I keep running into time-outs.
Tomorrow I will provide a link from which the file can be downloaded.
- Jacob
Created attachment 1507 [details]
Page as it's output from Ghostscript
This page is extracted from the full document BEFORE the document was re-saved
in full from Acrobat.
Created attachment 1508 [details]
Result of the same page AFTER save from Acrobat
This page is extracted from the full document AFTER the document was re-saved
in full from Acrobat.
I've now studied the situation further. What happens upon save seems to be the same 'consolidation' of fonts that occurs when one PDF document is merged into another. Both the Adobe PDF Library and Acrobat have always had - and under certain circumstances still have - problems when clashing font names exist. I've only seen this error with TrueType fonts, where Adobe acknowledges that a problem exists with the PDPageAcquirePDEContent API (though it's said to be resolved with 7.0, I'm not so sure that's entirely true). I've never seen this error with Type 1 fonts unless the encoding information was wrong/corrupted.
The strange thing here is that all font subsets seem to be properly prefixed in the subset name (i.e. not the font name as displayed in Acrobat). I can't find any clashing prefix names. I can't find encoding errors, but I haven't studied the encoding details (compared contents). Hence, I don't think the problem is identical to the well-known Acrobat/PDFL 'clashing font names' bug after all.
In this case Ghostscript has created a document where, for example, Myriad-Bold and Myriad-Italic are stored in multiple versions - with different subsets/encodings. In this document I can count 14 Myriad-Bold variants (all Type 1), of which 5 have CID encoding. The CID-encoded variants are found on document pages 6, 48, 77, 78 and 87, but none of those appears to be the offender.
On Page 6 I can find 'LORPMQ+Myriad-Bold' (plain Type 1, which seems to be in use on most pages) and 'TACLYS+Myriad-Bold' (Type 1 CID). This combination could traditionally cause a problem, but doesn't seem to. On Page 11 a new 'OZEZEA+Myriad-Bold' variant is introduced (which is also a plain Type 1 subset).
The interesting thing about this variant is that it's used in parallel with 'LORPMQ+Myriad-Bold' on almost all remaining pages. For example, in a phone number such as '86 62 01 96' the text is broken into the following sequence:
  86 62    LORPMQ+Myriad-Bold
  01 9     OZEZEA+Myriad-Bold
  6        LORPMQ+Myriad-Bold
When Page 11 is extracted from the document BEFORE it was re-saved, and is then saved separately, those two font versions are consolidated into one, namely 'OZEZEA+Myriad-Bold', which is now used for both font references (both remain, but now point to the same font). The 'LORPMQ+Myriad-Bold' version is effectively gone. I will soon upload screen dumps that demonstrate this.
I mention Myriad-Bold specifically because the error is visible throughout the document with this font (errors start on page 2). I haven't studied the encoding details of each included Myriad-Bold variant, nor looked at the character mapping for each.
ERRATA: What I wrote earlier about Acrobat 5.0 also producing the error was INCORRECT. It appears that Acrobat 5.0.1 (27-03-2001) actually CAN re-save this document WITHOUT doing any harm. All the Acrobat 6.X and 7.X versions we have produce the error as described.
Hope this helps.
All the best
Jacob Schäffer
Grafikhuset
Denmark
Created attachment 1510 [details]
Screen dumps showing specifics for Page 11
These screen dumps show page 11 before and after it has been extracted from
Acrobat.
In this case font information and the 'View only' parts are created with Quite
Revealing.
- Jacob
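The subset names quoted in the comments above ('LORPMQ+Myriad-Bold', 'TACLYS+Myriad-Bold', 'OZEZEA+Myriad-Bold') follow the usual PDF convention of a six-letter tag, a plus sign, and the base font name. As a rough illustration of how the variants of one font could be counted, here is a minimal Python sketch. It assumes the font names have already been collected from the PDF by some other tool (for example Quite Revealing, as used for the screen dumps); the helper names `split_subset_name` and `group_by_base` are hypothetical and not part of Ghostscript or Acrobat.

```python
import re
from collections import defaultdict

# A subset font name is a six-uppercase-letter tag, "+", then the base name.
SUBSET_RE = re.compile(r"^([A-Z]{6})\+(.+)$")

def split_subset_name(font_name):
    """Return (tag, base_name); tag is None if there is no subset prefix."""
    m = SUBSET_RE.match(font_name)
    if m:
        return m.group(1), m.group(2)
    return None, font_name

def group_by_base(font_names):
    """Map each base font name to the sorted distinct subset tags seen for it."""
    groups = defaultdict(set)
    for name in font_names:
        tag, base = split_subset_name(name)
        if tag is not None:
            groups[base].add(tag)
    return {base: sorted(tags) for base, tags in groups.items()}

# Names taken from the comment above; the list itself is only an example input.
names = [
    "LORPMQ+Myriad-Bold",
    "TACLYS+Myriad-Bold",
    "OZEZEA+Myriad-Bold",
    "LORPMQ+Myriad-Bold",
]
variants = group_by_base(names)
```

Running the same grouping on the 'before' and 'after' files would show whether a tag such as LORPMQ has disappeared after the re-save. It says nothing about whether the surviving subset contains the right glyphs.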
I received the 2035M test file from the user on a DVD. WinZip compresses it to 1365M - still too much.
I do not see anything wrong with the document created. pdfwrite may create several subsets from a single font when that font is used with different encodings, especially when the original font contains over 256 glyphs and the various encodings define incompatible subsets. (That special case is probably not applicable to the test document, but the algorithm must be general.) In contrast to Adobe Distiller, Ghostscript preserves the original encoding, i.e. it never re-encodes text. We chose this approach since we have sample documents and fonts in which Adobe standard glyph names are used to name some non-standard glyphs, such as the name /C for the Russian character "Tse".
Since Jacob found out that Acrobat 5.0.1 produces a correct result, I think the problem should be reported to Adobe. At the same time I'll ask for the "bountiable" status to be assigned to this bug - maybe somebody else will bring an idea.
Created attachment 1526 [details]
decompr-attachment-g.pdf
Decompressed attachment 1507 [details]
Created attachment 1527 [details]
decompr-attachment-a.pdf
Decompressed attachment 1508 [details]
I compared the contents of the decompressed 1526/1507 and 1527/1508 for the telephone number starting with "Tlf.":
  q
  8.33333 0 0 8.33333 0 0 cm
  BT
  /R113 13.9705 Tf
  0.998824 0 0 1 477.462 389.39 Tm
  (T)Tj
  -0.13972 Tc
  7.51634 0 Td
  [(lf)38.9587(.)]TJ
  0 Tc
  10.4359 0 Td
  [( )10.0189(8)9.99792(7)9.99792( )10.0525(2)9.99792(7)9.99792( )]TJ
  -0.13972 Tc
  38.5015 0 Td
  (11)Tj
  0 Tc
  15.2279 0 Td
  ( )Tj
  -0.13972 Tc
  2.68253 0 Td
  (00)Tj
  ET
  Q
This fragment appears the same in both files, and refers to the same font resource with the same Encoding and the same FontDescriptor. However, the embedded font file "55 0 obj" appears different, and the CharSet is slightly different. Since 1526/1507 works fine, I guess that Acrobat wrongly associated glyphs with glyph names in the font file "55 0 obj".
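The comparison described above can be partly automated. The following Python sketch pulls the strings shown by the Tj and TJ operators out of a decompressed content stream, so the 'before' and 'after' fragments can be compared as character codes. It is a simplification under stated assumptions: it only handles literal strings without escapes or nested parentheses (which is all this fragment contains), and the function name `shown_text` is made up for this example.

```python
import re

# Literal string "(...)" with no escapes or nested parentheses (assumption).
STRING_RE = re.compile(r"\(([^()\\]*)\)")
# A show operation: "(string) Tj" or "[ strings and kerning numbers ] TJ".
SHOW_RE = re.compile(r"(\([^()\\]*\))\s*Tj|\[([^\]]*)\]\s*TJ")

def shown_text(content):
    """Concatenate the string operands of all Tj/TJ operators, in order."""
    parts = []
    for m in SHOW_RE.finditer(content):
        operand = m.group(1) if m.group(1) is not None else m.group(2)
        # Kerning numbers inside a TJ array are skipped; only strings are kept.
        parts.append("".join(STRING_RE.findall(operand)))
    return "".join(parts)

# The fragment quoted in the comment above.
fragment = (
    "q 8.33333 0 0 8.33333 0 0 cm BT /R113 13.9705 Tf "
    "0.998824 0 0 1 477.462 389.39 Tm (T)Tj -0.13972 Tc 7.51634 0 Td "
    "[(lf)38.9587(.)]TJ 0 Tc 10.4359 0 Td "
    "[( )10.0189(8)9.99792(7)9.99792( )10.0525(2)9.99792(7)9.99792( )]TJ "
    "-0.13972 Tc 38.5015 0 Td (11)Tj 0 Tc 15.2279 0 Td ( )Tj "
    "-0.13972 Tc 2.68253 0 Td (00)Tj ET Q"
)
text = shown_text(fragment)
```

If `shown_text` gives the same result for both files, as it would for the identical fragment quoted here, the page content is unchanged and any visible difference has to come from the font resource, which matches the conclusion about "55 0 obj". Note that the function returns the codes as written in the stream, not the glyphs that are finally drawn.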
Passing to Ken since he handles pdfwrite from now on.
Help.... I don't have the original PostScript file (and at 2Gb I don't think I want it either ;-), nor a file which reproduces the problem.
I've picked up both the PDF files and examined them, and yes, the PDF file apparently produced from 'Save As' in Acrobat is indeed wrong. However, I can't reproduce this when saving the 'original' PDF file from any of Acrobat 6, 7 or 8. So either I'm doing something wrong, the issue can't be reproduced this way, or there's something I don't know about how to save the file ;-)
I've spent a bit of time trying hard to reproduce the issue, but without something to go on I'm afraid I'm going to have to leave this one alone.
I've tried opening the one-page attachment 'from Ghostscript', and then resaving using File->Save As, from copies of Acrobat Professional versions 6.0, 7.0.9 and 8.0. In all three cases the resulting PDF file was fine when opened in any version of Acrobat (Professional). Checking the internals of the files, I see that the font names have changed (which confirms Acrobat actually did something), but the Myriad fonts themselves do not show the CharSet changes observed by Igor, which are present in the 'after' file.
So although I can see a problem with the attachment 'AFTER save from Acrobat', I'm not able to reproduce the problem, which makes it kind of hard to work on. I've asked Igor to bring the DVD along to the next staff meeting, where I'll take a copy of the original PostScript file and try to reproduce the issue from there. In the meantime I'm dropping this issue again, unless someone can tell me how to reproduce the problem with the files here.
Jacob, if you're listening, do please let me know what I'm doing wrong...
I don't have the original job anymore. However, we have run several similar jobs with GPL Ghostscript 8.57 and 8.60 without seeing the problem. In general GPL Ghostscript 8.57 and 8.60 seem to produce much better results in regard to fonts.
I think you should put this error on "pause" since it appears not to be reproducible. I'll report it and upload new test files if/when it shows up again.
All the best
Jacob Schäffer
Grafikhuset
Denmark
Hmm, the odd thing was that I couldn't reproduce the problem with the original (8.51 produced) PDF file. Still, if it seems to be OK for you now, it's possible that an update to Acrobat has resolved the problem. I'll reduce the priority so it falls down to the bottom of my list. Of course, if you find another example I can look again. I will get the original file from Igor, but not until November. Anyway, thanks for the feedback Jacob, let me know if this happens again.
Ken Sharp wrote:
< Hmm, the odd thing was that I couldn't reproduce the problem with the original (8.51 produced) PDF file. >
It certainly makes a difference with Acrobat whether the fonts used are actually installed on the system or not. It seems that Acrobat behaves differently when consolidating fonts if the font is available locally at the time of consolidation. Perhaps that would explain why you can't reproduce the problem.
Let me know when you have the original PS file from Igor. I'll then dig into our system to find the fonts you potentially need to install to reproduce it.
All the best
Jacob Schäffer
Grafikhuset
Denmark
Aha, you have to have the fonts installed? Well, that would explain why I can't reproduce it for sure; I don't have most of the fonts, in particular the Myriad and Myriad-Bold fonts which I think are being 'consolidated' by Acrobat.
To be honest, this really does sound like an Acrobat bug, and it's not clear to me that there's much we can do about it. We need to define multiple fonts in order to address different Encodings; if Acrobat tries to collapse those into one, and messes it up, I'm not sure there's much we can do. But looking at what Distiller does would be useful.
I won't get the file from Igor until after November 6th, but if you have the font Myriad-Bold I think that's the one causing the problem.
This has been idle for three years now, so I'm going to close it. It seems that whatever the original problem was, it was resolved some time in the past. Of course, should it reappear the issue can be reopened. |