Bug 693083 - Trouble merging 5 PDFs created with MS Word 2010
Status: RESOLVED DUPLICATE of bug 689236
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: PC Windows 7
Importance: P4 normal
Assignee: Ken Sharp
Reported: 2012-06-03 21:25 UTC by Kilobajt
Modified: 2012-06-05 08:01 UTC
Attachments
Five sample PDFs to reproduce the issue (372.55 KB, application/x-zip-compressed)
2012-06-03 21:25 UTC, Kilobajt

Description Kilobajt 2012-06-03 21:25:10 UTC
Created attachment 8651
Five sample PDFs to reproduce the issue

Merging five simple PDF files into a single PDF/A file produces an error on page 3: the page is blank and Acrobat Reader complains that it cannot extract the font Calibri. Each input file was created with Word 2010 and consists of a single number set in Calibri. It might be a Word 2010 issue, because if I convert each file to PDF/A separately and then merge them, the resulting file is OK. Here are the parameters I used:
-dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -r300 -dCompatibilityLevel=1.4 -sDEVICE=pdfwrite -sOutputFile=out.pdf -sProcessColorModel=DeviceGray -dPDFACompatibilityPolicy=1 pdfa_def.ps 1.pdf 2.pdf 3.pdf 4.pdf 5.pdf

I also attached sample files.
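
The two-step workaround mentioned above (convert each file to PDF/A on its own, then merge the results) could be scripted roughly as follows. This is only a sketch: the executable name `gs`, the `pdfa_` prefix for the intermediate files, and the reuse of the same options for the final merge are assumptions, not something stated in this report. The options themselves are copied from the command line above.

```python
import subprocess

# Assumption: the Ghostscript executable is called "gs" and is on PATH.
# Windows builds are usually named differently (e.g. gswin32c).
GS = "gs"

# Options copied from the report, minus -sOutputFile and the input files.
PDFA_OPTIONS = [
    "-dPDFA", "-dBATCH", "-dNOPAUSE", "-dNOOUTERSAVE", "-r300",
    "-dCompatibilityLevel=1.4", "-sDEVICE=pdfwrite",
    "-sProcessColorModel=DeviceGray", "-dPDFACompatibilityPolicy=1",
]


def pdfa_command(inputs, output):
    """Build one Ghostscript argument list for the given inputs and output."""
    return [GS, *PDFA_OPTIONS, f"-sOutputFile={output}", "pdfa_def.ps", *inputs]


def workaround_commands(inputs, merged="out.pdf"):
    """Return the per-file conversions followed by the final merge."""
    # Hypothetical naming scheme; it assumes bare file names, not paths.
    converted = [f"pdfa_{name}" for name in inputs]
    commands = [pdfa_command([src], dst) for src, dst in zip(inputs, converted)]
    # Assumption: the merge step uses the same options as the conversions.
    commands.append(pdfa_command(converted, merged))
    return commands


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_all(workaround_commands(["1.pdf", "2.pdf", "3.pdf", "4.pdf", "5.pdf"]))` would run six Ghostscript invocations: five conversions and one merge.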

Thanks, Greg
Comment 1 Ken Sharp 2012-06-05 08:01:07 UTC
(In reply to comment #0)

> Merging five simple PDF files (created with Word 2010, they consist of single
> number (font Calibri)) to PDF/A file results in error on page 3 - page is blank
> and Acrobat Reader complains about not being able to extract font Calibri. It
> might be Word 2010 issue, because if I convert each file separately to PDF/A
> and then merge, resulting file is OK.

The problem does indeed seem to be at least related to the production of the PDF file by Word. Each file contains a subset of the font Calibri, and each subset font has the *same* prefix.

If the rules for generating prefixes are followed, a collision like this is extremely unlikely, and so I surmise that Word is generating the prefix in a way which makes it predictable.

When pdfwrite sees the 5 separate subset fonts it believes they are all the same font (as the prefixes are all the same) and consolidates them into a single font.
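
To illustrate the kind of collision described above: a subset font name in a PDF is a tag of six uppercase letters, a plus sign, and then the font's PostScript name. The sketch below is not pdfwrite's code. It assumes the BaseFont names have already been extracted from each file by some other tool, and the tag values used in the example are placeholders.

```python
import re
from collections import defaultdict

# A subset font name is a six-uppercase-letter tag, a plus sign, then the
# PostScript name of the font, e.g. "ABCDEE+Calibri".
SUBSET_NAME = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_name(base_font):
    """Return (tag, name); tag is None when the font is not a subset."""
    match = SUBSET_NAME.match(base_font)
    if match:
        return match.group(1), match.group(2)
    return None, base_font


def find_tag_collisions(fonts_by_file):
    """Map each subset font name used by more than one file to those files."""
    seen = defaultdict(list)
    for filename, base_fonts in fonts_by_file.items():
        # set() so a name repeated inside one file is counted once.
        for base_font in set(base_fonts):
            tag, _ = split_subset_name(base_font)
            if tag is not None:
                seen[base_font].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}
```

For example, `find_tag_collisions({"1.pdf": ["ABCDEE+Calibri"], "2.pdf": ["ABCDEE+Calibri"]})` reports the shared name for both files. Different subsets carrying an identical name are what a consumer could mistake for one and the same font.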

It seems that our somewhat naive TrueType code is emitting a broken GSUB table, because it is using the same one for the 5 merged fonts, and that simply won't work. At the moment there is no way to address this; it will have to wait until the TrueType font code is rewritten. However, I'd like to point out that the initial problem does originate with Word, which should not be using the same prefix for every subset of the font.

Although this is not strictly a duplicate, I'm going to bundle it in with #689236 for future work on TrueType fonts.

*** This bug has been marked as a duplicate of bug 689236 ***