Bug 692589 - "Error CIDSystemInfo and CMap dict not compatible" when converting merged file to PDF/A - #1522
Summary: "Error CIDSystemInfo and CMap dict not compatible" when converting merged fil...
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 9.04
Hardware: PC Windows 7
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-10-13 16:32 UTC by jritmeijer
Modified: 2014-02-17 04:41 UTC
4 users

See Also:
Customer:
Word Size: ---


Attachments
The file before conversion to PDF/A is carried out. (155.47 KB, application/pdf)
2011-10-13 16:32 UTC, jritmeijer
The file after conversion to PDF/A (14.40 KB, application/pdf)
2011-10-13 16:33 UTC, jritmeijer
A screenshot of Acrobat's preflight output. (84.44 KB, image/png)
2011-10-13 16:33 UTC, jritmeijer
PDF based report of the preflight output (259.89 KB, application/pdf)
2011-10-13 16:33 UTC, jritmeijer
The file before conversion to PDF/A is carried out (Updated) (19.44 KB, application/pdf)
2011-10-17 13:28 UTC, jritmeijer
The file after conversion to PDF/A (Updated) (12.74 KB, application/pdf)
2011-10-17 13:29 UTC, jritmeijer
input with itext header (78.69 KB, application/x-pdf)
2012-01-17 20:41 UTC, William Fausser
output PDF/A (490.74 KB, application/x-pdf)
2012-01-17 20:41 UTC, William Fausser
PDFA_def.ps (1.47 KB, application/postscript)
2012-01-17 20:44 UTC, William Fausser

Description jritmeijer 2011-10-13 16:32:10 UTC
Not sure to what degree we can attribute this one to Ghostscript. However, the fact is that the file cannot be converted to a PDF/A file that passes Acrobat X's preflight validation test for PDF/A.

The situation is as follows:
1. Two (very basic) files generated by MS-Word are merged together using a 3rd party library (See attached)
2. This merged file is then passed through Ghostscript to convert to PDF/A.
3. When checking the converted file using Acrobat X's preflight facility the following validation error is raised: "Error CIDSystemInfo and CMap dict not compatible".

The command-line options and definition file are as per Bug #692587.

All related files are attached, specifically:
1. Original file.pdf: The file before conversion to PDF/A is carried out.
2. PDFA File.pdf: The file after conversion to PDF/A
3. Merged preflight screenshot.PNG: A screenshot of Acrobat's preflight output.
4. Merged preflight report.pdf: PDF based report of the preflight output

I did a search through your bug database and couldn't find anything related.
Comment 1 jritmeijer 2011-10-13 16:32:46 UTC
Created attachment 7996
The file before conversion to PDF/A is carried out.
Comment 2 jritmeijer 2011-10-13 16:33:10 UTC
Created attachment 7997
The file after conversion to PDF/A
Comment 3 jritmeijer 2011-10-13 16:33:32 UTC
Created attachment 7998
A screenshot of Acrobat's preflight output.
Comment 4 jritmeijer 2011-10-13 16:33:56 UTC
Created attachment 7999
PDF based report of the preflight output
Comment 5 Henry Stiles 2011-10-14 16:59:11 UTC
Can we please have access to the original problem report #1522? Where can we find it? Thanks.
Comment 6 jritmeijer 2011-10-14 17:04:58 UTC
(In reply to comment #5)

Sorry, 1522 is my internal tracking number. All the available information is in this report.
Comment 7 Henry Stiles 2011-10-14 18:04:10 UTC
(In reply to comment #6)
> (In reply to comment #5)
> 
> Sorry, 1522 is my internal tracking number. All the available information is in
> this report.

Well, we like to know who we are fixing bugs for; commercial users are often better off with a commercial release of Ghostscript and a support contract. Is this bug associated with:

http://www.sharepointappmarket.com/ads/muhimbi-pdf-converter-for-sharepoint/

or some other commercial product?
Comment 8 jritmeijer 2011-10-14 20:30:43 UTC
Hi Henry,

Getting a commercial support agreement in place with Artifex is the first thing I tried when I started to investigate PDF to PDF/A converters. Unfortunately, according to the Artifex sales team it is not possible to get (paid) support on an individual project basis, which is what I need this for... for now.

I'll contact your sales team again on Monday and see what is possible as I am very interested in formalising support. 

With regards to this bug, if it is one, does it matter who reports it?

Have a nice weekend,

Jeroen
Comment 9 Henry Stiles 2011-10-14 21:30:29 UTC
> 
> With regards to this bug, if it is one, does it matter who reports it?
> 

No. Ken will look at it when he gets a chance; customers, of course, get priority service, and we help free users as time permits. Thanks for your info.
Comment 10 Ray Johnston 2011-10-15 03:23:42 UTC
On the subject of our priorities...

Uncommon cases from non-customers don't really get much attention from us,
particularly w.r.t. PDF/A compliance tools (which are themselves often flawed).

It takes quite a bit of our senior engineer time for duplication and analysis
and finding a change that doesn't break some other tool can be VERY time
consuming.

We (often) do this for customers, but free users may have to wait quite a while.

Frankly, a "free user" that is selling software that relies on our work (even
if that use manages to squeak by the GPL rules) may not even get as much
attention as another user of the GPL that is also developing or maintaining
GPL or other OpenSource software. I may be "speaking out of turn", but want to
make sure that the submitter realizes that licensing really has advantages,
not the least of which is that the end user doesn't have to separately download
GPL Ghostscript, but that it can legally be distributed with the application
that makes use of our software.
Comment 11 jritmeijer 2011-10-15 08:13:41 UTC
Hi Henry,

I understand and agree with your support policy, it makes perfect sense. You are not speaking out of turn at all.

For me this is all an experiment based on a request from a single customer for a piece of functionality that is a very small part of the overall product. We'll see where it goes from here.

In the meanwhile I'll reopen the dialogue with your sales team and see if we can come to some kind of agreement that makes financial sense to both parties.

Unless you tell me otherwise, when I find what I think is a bug I'll report it in this system. You are then naturally free to action or ignore it :-)

Have a nice weekend,

Jeroen
Comment 12 Ken Sharp 2011-10-15 08:42:29 UTC
I can see the problem with the supplied file, it has null entries. However I cannot reproduce this problem.

I've tried the current master code and the 9.04 binary release, neither has this issue, whether using the pdfa_def.ps and command line supplied in Bug #692587 or not.

Are you certain you are using Ghostscript 9.04? The ToUnicode CMap in the file you have supplied (after conversion) does not match the output from the released 9.04, and the structure of the files is different, which I would not expect.

The Title, Producer and other metadata fields contained in the supplied invalid file do not match the data in the pdfa_def.ps file, so it cannot have been produced using the same pdfa_def.ps file.

Fundamentally, I am unable to reproduce the problem, and no other customer has ever reported this issue (older versions of Acrobat also complain when pre-flight is applied so this is not unique to or new in Acrobat X).

If you can reproduce the problem reliably I'm still willing to look into it (please reopen the bug), but you need to give me a way to reproduce it.
Comment 13 jritmeijer 2011-10-15 16:41:18 UTC
9.04 64bit is the only version on my system. To the best of my knowledge the steps provided are correct, but I will double check and get back to you.

Thanks for your support.
Comment 14 Ken Sharp 2011-10-17 09:59:08 UTC
(In reply to comment #13)
> 9.04 64bit is the only version on my system. To the best of my knowledge the
> steps provided are correct, but I will double check and get back to you.

Tested again using a 64-bit version of Windows, the 64-bit released code from the download site and the command line and pdfa_def file from the report #692587. The resulting file passes pre-flight on Acrobat and does not have the null entries in CIDSystemInfo.

Still 'WORKSFORME'
Comment 15 jritmeijer 2011-10-17 13:27:17 UTC
Comment on attachment 7996
The file before conversion to PDF/A is carried out.

Incorrect file, please ignore this one.
Comment 16 jritmeijer 2011-10-17 13:28:18 UTC
Created attachment 8015 [details]
The file before conversion to PDF/A is carried out (Updated)

This replaces a previous (incorrect) file.
Comment 17 jritmeijer 2011-10-17 13:29:31 UTC
Created attachment 8016
The file after conversion to PDF/A (Updated)
Comment 18 jritmeijer 2011-10-17 13:33:46 UTC
Hi Ken,

I tested again and it appears I sent you an incorrect test file. Really annoyed
with myself as I hate it when people do the same to me.

The scenario is actually slightly different from what I reported before:

1. There are 2 separate files that were previously converted to PDF/A (and
validate fine)
2. These 2 files are then merged into a document that is not PDF/A compliant
(See attached file: 'Original file.pdf')
3. This merged document is then fed through Ghostscript resulting in the
attached 'PDFA File.pdf'

I am really sorry for wasting your time with the previous submission.

Jeroen
Comment 19 Ken Sharp 2011-10-17 15:41:54 UTC
This new file does in fact fail as described; I have no idea why as yet.
Comment 20 William Fausser 2012-01-17 20:37:54 UTC
I'm having the same problem.  Attached are the input before conversion
and the PDF/A from the conversion.

Following command, with messages:
 /home/fausser/ghostscript-9.04/bin/gs -sDEVICE=pdfwrite  -q -dNOPAUSE  -dBATCH  -dPDFA -dUseCIEColor  -sProcessColorModel=DeviceCMYK   -sOutputFile=/home/fausser/pdf77-PDFA1.pdf /home/fausser/PDFA_def.ps /home/fausser/pdf77-fix.pdf
   **** Warning: Outline has invalid link that was discarded.
   **** Warning: Outline has invalid link that was discarded.

   **** This file had errors that were repaired or ignored.
   **** The file was produced by:
   **** >>>> iText1.1 by lowagie.com (based on itext-paulo-142) <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.


Input: pdf77-fix.pdf
Output: pdf77-PDFA1.pdf
Comment 21 William Fausser 2012-01-17 20:41:02 UTC
Created attachment 8288
input with itext header
Comment 22 William Fausser 2012-01-17 20:41:51 UTC
Created attachment 8289
output PDF/A
Comment 23 William Fausser 2012-01-17 20:44:26 UTC
Created attachment 8290
PDFA_def.ps
Comment 24 William Fausser 2012-01-18 17:49:11 UTC
Hi,

Most likely my problem :) was due to the Liberation font set
that was part of an additional tar.gz archive I downloaded some time ago.

I substituted Helvetica and did not get any conversion warning or
suggestion of an error.


Perhaps Ghostscript should not have returned what it thought was a
converted PDF/A, but instead just returned the PDF with explicit errors.

BR,
Bill
Comment 25 William Fausser 2012-01-20 16:58:47 UTC
Hi,
   In case it is of any help or provides a clue:

I'm getting the same problem when a Calibri font is contained
in the concatenated input PDF.


BR,
Bill
Comment 26 Ken Sharp 2012-01-23 13:53:01 UTC
The real problem is with the PDF interpreter. It does not parse the CIDSystemInfo from CMaps embedded in PDF files, but it *does* parse the CIDSystemInfo from CIDFonts.

When pdfwrite emits the CIDFont and CMap it emits what it has, and for the CMap it has no CIDSystemInfo, so it emits a null.

This is really a PDF interpreter problem, though of course it only manifests with pdfwrite.
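For readers unfamiliar with the check Acrobat is performing: as we understand the PDF/A-1 rule (ISO 19005-1, clause 6.3.3.1), the CIDSystemInfo dictionaries of a Type 0 font's CIDFont and CMap must have identical Registry and Ordering strings, unless the font's Encoding is Identity-H or Identity-V. The Python sketch below models that rule under stated assumptions. The function name, the "EmbeddedCMap" label for a non-predefined CMap, and the choice to treat a null CIDSystemInfo as never compatible are our own, picked to mirror the null entries described above; this is not Acrobat's or Ghostscript's actual code.

```python
from typing import Optional, TypedDict


class CIDSystemInfo(TypedDict):
    Registry: str
    Ordering: str
    Supplement: int


# Predefined encodings that (as we read PDF/A-1) are exempt from the comparison.
IDENTITY_ENCODINGS = {"Identity-H", "Identity-V"}


def cidsysteminfo_compatible(
    encoding: str,
    cidfont_csi: Optional[CIDSystemInfo],
    cmap_csi: Optional[CIDSystemInfo],
) -> bool:
    """Simplified check of a Type 0 font's CIDFont/CMap CIDSystemInfo pair.

    `encoding` is a predefined CMap name, or any other (hypothetical) label
    standing in for an embedded CMap stream.
    """
    if encoding in IDENTITY_ENCODINGS:
        return True
    # A null CIDSystemInfo (None here) cannot be compared, so it never
    # matches; this mirrors the nulls pdfwrite wrote for embedded CMaps.
    if cidfont_csi is None or cmap_csi is None:
        return False
    # Supplement is deliberately not compared; only Registry and Ordering.
    return (
        cidfont_csi["Registry"] == cmap_csi["Registry"]
        and cidfont_csi["Ordering"] == cmap_csi["Ordering"]
    )


if __name__ == "__main__":
    identity: CIDSystemInfo = {"Registry": "Adobe", "Ordering": "Identity", "Supplement": 0}
    print(cidsysteminfo_compatible("EmbeddedCMap", identity, None))      # False
    print(cidsysteminfo_compatible("EmbeddedCMap", identity, identity))  # True
    print(cidsysteminfo_compatible("Identity-H", identity, None))        # True
```

The first call corresponds to the situation described in this comment: the CIDFont carries a CIDSystemInfo, the embedded CMap's is null, and the pair is reported as incompatible.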
Comment 27 William Fausser 2012-01-23 17:08:23 UTC
Thanks

Do I have to file this bug under PDF Interpreter or is it okay where it
resides?

BR, Bill
Comment 28 Ken Sharp 2012-01-23 17:26:05 UTC
(In reply to comment #27)
> Thanks
> 
> Do I have to file this bug under PDF Interpreter or is it okay where it
> resides?

You didn't report the bug so please do not alter it.
Comment 29 Ken Sharp 2012-02-06 08:11:27 UTC
Re-assigning to Alex to investigate retrieving the CIDSystemInfo entries from the CMap, at least when the output device is pdfwrite.
Comment 30 Alex Cherepanov 2012-03-19 02:46:59 UTC
CIDSystemInfo is now extracted from PDF CMap dictionary and
copied to the embedded CMap resource.

http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=1c558e93a7f63b470880ea44fbf883c61de4b13a

Returning to Ken.
Comment 31 William Fausser 2012-03-19 13:44:51 UTC
Hi,

I installed Alex's change to pdf_font.ps and re-ran my test.

I still get "CIDSystemInfo and CMap dict not compatible" error.


I realize that the status is still "ASSIGNED" and my retest may have been
premature.

BR,
Bill
Comment 32 Ken Sharp 2012-03-29 15:44:42 UTC
(In reply to comment #31)
> Hi,
> 
> I installed Alex's change to pdf_font.ps and re-ran my test.
> 
> I still get "CIDSystemInfo and CMap dict not compatible" error.
> 
> 
> I realize that the status is still "ASSIGNED" and my retest may have been
> premature.

Very.

The CIDSystemInfo passed through to pdfwrite is still incorrect. Currently the font is marked as Adobe Japan1 when it should be Adobe Identity. It seems that gs_ttf.ps is arbitrarily assigning the Japan1 Ordering.

Even with a hack to test that, Acrobat still complains, though I have no real idea why at the moment.

It's still assigned, and I'm still looking into it. When it's fixed or its status otherwise changes, you will be notified.
Comment 33 Ken Sharp 2012-03-30 12:08:09 UTC
Commit:

8eb4118573d2d6959f8578a10f9d76ce9d802799

patch here:

http://ghostscript.com/pipermail/gs-cvs/2012-March/014423.html

should resolve this problem. Note that *both* this and the patch in comment #30 are required.
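To illustrate why both patches are needed, here is a deliberately simplified Python model of the two defects described in comments 26 and 32: the interpreter dropping the CIDSystemInfo of an embedded CMap (so pdfwrite writes null), and gs_ttf.ps labelling the TrueType CIDFont as Adobe-Japan1 instead of Adobe-Identity. The assumption that the embedded CMap declares Adobe-Identity, and every function name here, are hypothetical. The commits referenced in comments 30 and 33 are not reproduced, and comment 32 notes Acrobat still complained after a quick hack, so the real fix may involve more than this model captures.

```python
from typing import Optional, Tuple

# (Registry, Ordering); Supplement is ignored in this simplified model.
Csi = Tuple[str, str]

ADOBE_IDENTITY: Csi = ("Adobe", "Identity")
ADOBE_JAPAN1: Csi = ("Adobe", "Japan1")


def emitted_csi(cmap_fix: bool, ttf_fix: bool) -> Tuple[Csi, Optional[Csi]]:
    """Return the (CIDFont, CMap) CIDSystemInfo pair a hypothetical run would emit."""
    # Assumption: the CMap embedded in the input PDF declares Adobe-Identity.
    pdf_cmap_csi = ADOBE_IDENTITY
    # Comment 32: without the gs_ttf.ps fix, the font is labelled Japan1.
    cidfont_csi = ADOBE_IDENTITY if ttf_fix else ADOBE_JAPAN1
    # Comment 26: without the interpreter fix, the CMap's CIDSystemInfo is
    # lost and pdfwrite writes null (modelled here as None).
    cmap_csi = pdf_cmap_csi if cmap_fix else None
    return cidfont_csi, cmap_csi


def compatible(cidfont_csi: Csi, cmap_csi: Optional[Csi]) -> bool:
    """Registry and Ordering must match; a null CMap entry never matches."""
    return cmap_csi is not None and cidfont_csi == cmap_csi


if __name__ == "__main__":
    for cmap_fix in (False, True):
        for ttf_fix in (False, True):
            ok = compatible(*emitted_csi(cmap_fix, ttf_fix))
            print(f"cmap_fix={cmap_fix} ttf_fix={ttf_fix} -> compatible={ok}")
```

Only the combination with both fixes applied prints `compatible=True`, which matches the note that both patches are required.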