Bug 691319 - PDF interpreter does not preserve the Flags from at least some annotations when creating pdfmarks
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.71
Hardware: All All
Importance: P2 normal
Assignee: Ken Sharp
 
Reported: 2010-05-18 15:02 UTC by T. Fischer
Modified: 2013-01-03 20:46 UTC

See Also:
Customer: 170
Word Size: ---


Attachments
aPDFtest2.zip (4.20 MB, application/binary)
2010-05-19 08:20 UTC, T. Fischer

Description T. Fischer 2010-05-18 15:02:18 UTC
I converted a PDF to PDF/A using Ghostscript V.8.71 and checked it with Acrobat Pro Preflight Version 9.2, which reports two errors:
German:
- Mehr als ein Encoding in CMap einer TrueType-Symbolschrift.
- “Encoding”-Eintrag in TrueType Symbol-Schrift nicht zulässig.

Which means something like:
- More than one encoding in CMap of a TrueType-symbol font.
- "Encoding"-entry in TrueType symbol font not allowed.

Preflight flags the RWECorporate-Regular font used in the text as a PDF/A error. The font is Arial-like, TrueType, ANSI-encoded and embedded in the original PDF.

The same PDF file converted to PDF/A with Ghostscript V.8.70 results in "no problems found" (PDF/A-compliant) in the test with the same Preflight version.

The gs-command used:
gswin32c.exe -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sDSCEncoding=PDFDocEncoding -sFONTPATH="C:\WINDOWS\Fonts" -sOutputFile="a.pdfa.pdf" "PDFA_def.ps" "a.pdf"
Comment 1 Ken Sharp 2010-05-18 15:11:23 UTC
Please attach a specimen input file to reproduce the problem.
Comment 2 T. Fischer 2010-05-19 06:42:37 UTC
Please use the following quarterly report Q1/2008 PDF-download for testing purposes:
http://www.rwe.com/app/Mediencenter/Mediencenter.aspx?SelCatID=206
"2008/05 - Bericht über das erste Quartal 2008 - Januar bis März 2008 ( Datum: 14.5.2008, Format: pdf, Größe: 602 KB )"

This file has an additional PDF/A Preflight error with both Ghostscript versions (which does not concern me so far):
"CIDSet in Schrift-Untergruppe ist unvollständig."
English: CIDSet in font subset is incomplete.

But the point is that it also has the two Preflight errors from the bug description when using Ghostscript V.8.71, which are not produced with Ghostscript V.8.70.
Comment 3 Ken Sharp 2010-05-19 06:59:27 UTC
(In reply to comment #2)
> Please use the following quarterly report Q1/2008 PDF-download for testing
> purposes:
> http://www.rwe.com/app/Mediencenter/Mediencenter.aspx?SelCatID=206
> "2008/05 - Bericht über das erste Quartal 2008 - Januar bis März 2008 ( Datum:
> 14.5.2008, Format: pdf, Größe: 602 KB )"

Please attach the file here. It can be some time before we get to bugs reported by free users, and files can vanish from the URL in that period. I did go to the site but was unable to quickly find the file.
Comment 4 T. Fischer 2010-05-19 08:19:03 UTC
The problem is reproduced with the PDF file from Bug 690768, too:
- aPDF2test.pdf (from the Bug 690768 attachment aPDFAtest.zip)

I attached the slightly modified files again on this bug:
The ZIP archive aPDFtest2.zip contains:
- PDFA_def_short.ps (a slightly reduced PDFA_def.ps)
- aPDFtest.bat (the gs-command used)
- ISOcoated_v2_300_eci.icc (for completeness)
- aPDF2test.pdf (just a PDF file, taken from Bug 690803)
- aPDF2test.pdfa.pdf (the PDF/A produced with Ghostscript V.8.71 /
aPDFtest.bat)
- aPDF2test.pdfa_report.pdf (the Preflight V.9.2 error report)
Comment 5 T. Fischer 2010-05-19 08:20:06 UTC
Created attachment 6299 [details]
aPDFtest2.zip
Comment 6 Ken Sharp 2010-09-23 08:18:50 UTC
The addition of an Encoding to symbolic TrueType fonts has been removed in revision 11735.

However, this exposes another problem: the TrueType font contains multiple CMAP subtables, which is also forbidden in PDF/A. This is non-trivial to solve, as it means we must synthesize a CMAP table instead of copying the existing one. Fixing this will need to wait until the TrueType font generation code is rewritten.

Leaving this bug open as a reminder that this work needs to be done.
Comment 7 Marcos H. Woehrmann 2012-06-22 18:22:36 UTC
This is now a customer issue:

We are passing a blank PDF file with a blank comment annotation into Ghostscript to convert it to PDF/A-1b, when we validate the output file our validator reports an error message “More than one encoding in symbolic TrueType font Cmap for the font 'KPSHBO+Calibri'”. We do not get this error message if we remove the comment annotation.
 
Our command line arguments:
-q -dBATCH -dNOPAUSE -dUseCIEColor -dQUIET -sDEVICE=pdfwrite -sProcessColorModel=DeviceRGB -dPDFA -sOutputFile="f88cbf94-c9f1-4ad5-8eae-d49716e0a912-pdfa.pdf" "f88cbf94-c9f1-4ad5-8eae-d49716e0a912.pdf"
 
From my understanding comment annotations should have nothing to do with fonts, as they are rendered in the reader's chosen font. Can you think of any reason Ghostscript can’t convert this PDF (f88cbf94-c9f1-4ad5-8eae-d49716e0a912.pdf) to valid PDF/A-1b?
Comment 11 Ken Sharp 2012-06-23 09:41:59 UTC
(In reply to comment #7)

> We are passing a blank PDF file with a blank comment annotation into
> Ghostscript to convert it to PDF/A-1b, when we validate the output file our
> validator reports an error message “More than one encoding in symbolic TrueType
> font Cmap for the font 'KPSHBO+Calibri'”. We do not get this error message if
> we remove the comment annotation.

The content stream for the page is not 'blank', as it contains a glyph from the Calibri font:

stream
q 0.12 0 0 0.12 0 0 cm
/R7 gs
0 0 0 RG
0 0 0 rg
q
8.33333 0 0 8.33333 0 0 cm BT
/R8 11.04 Tf
0.999402 0 0 1 72 759.56 Tm
()Tj
ET
Q
Q
endstream


It's not obvious in the Bugzilla preview, unfortunately, but there is content between the parentheses of the Tj operand.
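
For illustration only, here is a minimal Python sketch of how one could check whether a Tj operand is really empty when a preview hides its bytes. The pattern is a deliberate simplification (it ignores escaped or nested parentheses, hex strings and TJ arrays), the function name is made up, and the sample byte \x03 is a hypothetical stand-in, not the byte from the customer file.

```python
import re

# Simplified pattern: a literal string with no nested or escaped parentheses,
# followed by the Tj (show text) operator.
TJ_PATTERN = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def shown_strings(stream):
    """Return the raw byte strings passed to Tj in a decoded content stream."""
    return TJ_PATTERN.findall(stream)


# Hypothetical sample: the operand holds one non-printing byte, so it looks
# like "()Tj" in a text preview even though a glyph is shown.
sample = b"BT\n/R8 11.04 Tf\n0.999402 0 0 1 72 759.56 Tm\n(\x03)Tj\nET\n"

for operand in shown_strings(sample):
    print(len(operand), operand)
```

An operand of length zero would mean no glyph is shown; any other length means the page does use the font.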

> From my understanding comment annotations should have nothing to do with fonts,
> as they are rendered in the reader's chosen font. Can you think of any reason
> Ghostscript can’t convert this PDF (f88cbf94-c9f1-4ad5-8eae-d49716e0a912.pdf) 
> to valid PDF/A-1b?

Because the page uses a glyph from the Calibri font. I can't see why removing the annotation would remove the glyph from the page, though the customer doesn't say *how* they remove the annotation.

This just leaves us with the original problem as a customer issue, which will be addressed as soon as time permits a rewrite of the TrueType font embedding.
Comment 12 Ken Sharp 2012-07-02 07:58:38 UTC
The annotation is set to 'non-printing' which is not allowed in PDF/A-1.

In current code the 'PDFACompatibilityPolicy' can be used to determine what is done about this. With the Policy set to 0 (default) a warning is issued and the file is produced as regular PDF instead of PDF/A.

If the Policy is set to 1 then a warning is emitted, the annotation is elided, and the file is PDF/A-1 compliant.

Since we write all embedded TrueType fonts as CIDFonts (required by PDF/A), and these do not include a CMAP subtable, the 'multiple CMAP' problem no longer exists. The files attached to this bug are all converted to compliant PDF/A-1 files (if the PDFACompatibilityPolicy is set to 1).
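
The two policy settings described above can be modelled roughly as follows. This is a Python sketch of the described behaviour, not pdfwrite's actual code: the function name is hypothetical, each annotation is represented only by its /F integer, and keeping the annotation in the output under policy 0 is an assumption.

```python
PRINT = 4  # Print bit of the annotation /F entry


def apply_pdfa_policy(policy, annotation_flags):
    """Model policies 0 and 1 for non-printing annotations.

    Returns (kept_flags, still_pdfa, warnings).
    """
    kept, warnings, still_pdfa = [], [], True
    for flags in annotation_flags:
        if flags & PRINT:
            kept.append(flags)  # printing annotation: nothing to do
        elif policy == 0:
            # Warn and fall back to producing a regular PDF.
            warnings.append("non-printing annotation: reverting to regular PDF")
            still_pdfa = False
            kept.append(flags)  # assumption: the annotation stays in the output
        elif policy == 1:
            # Warn and drop the annotation so the output stays PDF/A-1.
            warnings.append("non-printing annotation: dropped from output")
        else:
            raise ValueError("policy not covered by this sketch")
    return kept, still_pdfa, warnings


print(apply_pdfa_policy(0, [28, 0]))
print(apply_pdfa_policy(1, [28, 0]))
```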

There is still at least one open bug requiring a rewrite of the TrueType code, but this is no longer required for this bug, so I'm going to close it.
Comment 14 Marcos H. Woehrmann 2012-07-22 23:15:49 UTC
I neglected to remove some confidential information in comment #13 so I've marked it private.  Here it is again, this time properly redacted:


I've reopened this bug since the customer has replied with a response to
comment 12:

Thanks for following up on this bug. I have read the comments on Bugzilla, and
Ken mentions that our comment annotation is set to ‘non-printing’, which is not
allowed in PDF/A-1. I am setting the value of the /F flag on our comment
annotation to 28, which, if I am reading the spec correctly, should set the
following bit values:

4 – Printing

8 – No Zoom

16 – No Rotate



I have tried setting the -dPDFACompatibilityPolicy=1 in the Ghostscript command
line and the result of this is that the text of the comment annotation is not
shown in the output PDF/A document. We should be able to show the comment text
in some representation in PDF/A.

"c:\Program Files (x86)\artifex\gswin32c.exe" -q -dBATCH -dNOPAUSE
-dUseCIEColor -dQUIET -sDEVICE=pdfwrite -sProcessColorModel=DeviceRGB -dPDFA
-dPDFACompatibilityPolicy=1
-sOutputFile="C5f3d15db-4e38-4312-b446-df5ece7c06d6\a90bc616-b65f-4b5f-be41-ce28351cada9-pdfa.pdf"
"c:\Program Files (x86)\artifex\Resources\PDFA_def.ps"
"5f3d15db-4e38-4312-b446-df5ece7c06d6\a90bc616-b65f-4b5f-be41-ce28351cada9.pdf"



Another question Ken raised in Bugzilla is how we removed the comment
annotation, we delete the comment annotation in our editor which will just
remove the annotation object from the PDF.
Comment 15 Marcos H. Woehrmann 2012-07-22 23:19:37 UTC
(Ken's reply to comment #14)
> 
> Thanks for following up on this bug, I have read the comments on Bugzilla and
> Ken mentions that our comment annotation is set to ‘non-printing’ which is not
> allowed in PDF/A-1. I am setting the value of the /F flag on our comment
> annotation to 28 which if I am reading the spec correctly it should set the 
> following bit values;
> 
> 4 – Printing
> 
> 8 – No Zoom
> 
> 16 – No Rotate
> 
>  

Yes, this is true when I look at the original file. What I was looking at, however, is what gets through to pdfwrite and in this case the Flags are not present in the annotation dictionary. This presumably means that the PDF interpreter isn't picking them up and passing them along.
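
To make the arithmetic concrete, here is a small Python sketch that decodes an /F value using the bit assignments from the PDF reference and checks the condition at issue, assuming the PDF/A-1 rule is "Print set; Invisible, Hidden and NoView clear". The helper names are made up for the example. It also shows why a dropped /F matters: a missing /F entry defaults to 0, so the Print bit is clear.

```python
# Annotation flag bits of the /F entry (bit value -> name).
ANNOT_FLAGS = {
    1: "Invisible",
    2: "Hidden",
    4: "Print",
    8: "NoZoom",
    16: "NoRotate",
    32: "NoView",
    64: "ReadOnly",
    128: "Locked",
}


def decode_flags(value):
    """Return the names of the flag bits set in an /F value, lowest bit first."""
    return [name for bit, name in ANNOT_FLAGS.items() if value & bit]


def pdfa1_flags_ok(value):
    """Assumed PDF/A-1 rule: Print set; Invisible, Hidden and NoView clear."""
    forbidden = 1 | 2 | 32
    return bool(value & 4) and not (value & forbidden)


print(decode_flags(28))    # ['Print', 'NoZoom', 'NoRotate']
print(pdfa1_flags_ok(28))  # True
print(pdfa1_flags_ok(0))   # False: what pdfwrite effectively sees without /F
```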

This is a problem for Alex to look into.

> 
> 
> Another question Ken raised in Bugzilla is how we removed the comment
> annotation, we delete the comment annotation in our editor which will just
> remove the annotation object from the PDF.

Well the original report related to CMAPs in fonts, and that is *not* caused by the annotation. It is caused by the 'empty' glyph in the page contents stream. I showed that removing the annotation did not prevent the problem, but removing the page content did.

So I can only conclude that their editor must be removing the page content along with the annotation.
Comment 16 Ken Sharp 2012-10-01 07:43:55 UTC
commit b81847783db2c17e11d40feeb0812ff7d129aca9

alters the handling of Text annotations to preserve the /F (flags) value from
the annotation dictionary.

The specimen file will now convert to PDF/A correctly.