Bug 693830

Summary: PDF converted to PDF/A fails some validators
Product: Ghostscript
Reporter: Mark Berry <web>
Component: PDF Writer
Assignee: Ken Sharp <ken.sharp>
Status: RESOLVED INVALID    
Severity: normal    
Priority: P4    
Version: 9.07   
Hardware: PC   
OS: Windows 7   
Customer:
Word Size: ---
Attachments: PDF input file
PDF output file, converted to PDF/A using Ghostscript
PDFA prefix file
PDFA_def.ps sample that emits /N 3 if ProcessColorModel=DeviceRGB

Description Mark Berry 2013-04-02 18:43:20 UTC
Created attachment 9477
PDF input file

I'm trying to set up a workflow for archiving PDFs received from various sources in PDF/A-1b format.  When I use GS 9.07 to convert to PDF/A-1b, the output file fails two of three validators that I have tried.

I'm converting using this command:

gswin32c ^
   -dPDFA ^
   -dNOOUTERSAVE ^
   -sProcessColorModel=DeviceCMYK ^
   -dUseCIEColor ^
   -sDEVICE=pdfwrite ^
   -o test_output.pdf ^
   -dPDFACompatibilityPolicy=1 ^
    "C:\GS_Test\PDFA_def.ps" ^
    test_input.pdf

PDFA_def.ps references sRGB_IEC61966-2-1_black_scaled.icc, downloaded from http://www.color.org/srgbprofiles.xalter#v2.

Validation results:

1. http://www.pdf-tools.com/pdf/pdfa-online-pruefen.aspx

"The value of the key N is 4 but must be 3."

If I use a text editor to change "<</Filter/FlateDecode /N 4/Length 2595>>" to "<</Filter/FlateDecode /N 3/Length 2595>>", this test passes.

2. http://www.intarsys.de/pdfa-check

2 Fehler (Errors)
PDFA Ouputintent mit fehlerhaften Angaben (PDFA output intent with erroneous information) - this error goes away if I change "/N 4" to "/N 3".
CMap mit inkonsistenten WMode Werten (CMap with inconsistent WMode values)
	
2 Warnungen (Warnings)
Kein History-Eintrag vorhanden (No history entry present)
Font DWPXCF+Calibri ohne gültigen Metadata-Eintrag. (Font DWPXCF+Calibri without valid metadata entry)

3. Adobe Acrobat XI Professional preflight test

No errors.

------------------

Are these really errors?  What does that "/N" parameter mean?
Comment 1 Mark Berry 2013-04-02 18:44:45 UTC
Created attachment 9478
PDF output file, converted to PDF/A using Ghostscript
Comment 2 Mark Berry 2013-04-02 18:45:51 UTC
Created attachment 9479
PDFA prefix file
Comment 3 Ken Sharp 2013-04-03 07:57:29 UTC
This looks to me very much like the PDF/A validators you are using have a bug. The specification does not seem (to me) to require that an OutputIntent ICC Profile has a /N (number of colourants) of 3.

Setting /N to 3 means that the profile is, in effect, an RGB profile, while a value of 4 means it is a CMYK profile. If we insist that a profile has /N = 3 then we can never embed a CMYK profile. Section 6.2.3.3 of the ISO specification says:

"DeviceRGB may be used only if the file has a PDF/A-1 OutputIntent that uses an RGB colour space. DeviceCMYK may be used only if the file has a PDF/A-1 OutputIntent that uses a CMYK colour space."

So clearly we should be able to embed an OutputIntent profile for CMYK which means it must have /N = 4.

Your other question appears to relate to warnings rather than errors. It's not clear to me what 'metadata' the tool is expecting for a font, nor what 'History' entry it is expecting. However, since these are warnings, I think you can safely ignore them.


If you can get more information from the PDF/A validator suppliers as to what exactly is invalid about the PDF/A we are producing then please feel free to reopen this bug with the additional information. At present I believe our output is conforming to the specification and the validators are in error.
Comment 4 Mark Berry 2013-04-05 07:01:55 UTC
Thank you for the reply and information.  I may look into the CMap error later, but the /N 4 is more of a concern since it was reported by two sites.

With the clue that /N refers to the number of colourants, I've made some progress:

- The Ghostscript doc says that "DeviceRGB is not allowed" when creating PDF/A (http://svn.ghostscript.com/ghostscript/trunk/gs/doc/Ps2pdf.htm#PDFA), so I specified DeviceCMYK.  This apparently corresponds to the {4} in this line from pdfa_def.ps:

[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {4} ifelse >> /PUT pdfmark

- I now realize that ICC profiles include a color space.  The sRGB_IEC61966-2-1_black_scaled.icc that I used is in the RGB color space.

- The first two tests are reporting this discrepancy:  I told it to expect 4 colors but I only gave it a 3-color profile.

- If I use a CMYK profile from http://www.adobe.com/support/downloads/detail.jsp?ftpID=3680, e.g. USWebUncoated.icc, the file passes both tests without reporting the /N error.  Unfortunately the output file expands from 25K to 393K.

Why is DeviceRGB not allowed?  Your quote of the standard would seem to indicate that either is valid.  I'm trying to archive documents that are most likely to be viewed on screen, so I thought sRGB would be the most applicable (and smallest) profile.

I decided to test it:  I used the sRGB_IEC61966-2-1_black_scaled.icc profile, changed the line in pdfa_def.ps to:

[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {3} ifelse >> /PUT pdfmark

and set -sProcessColorModel=DeviceRGB.  This created a file containing "/N 3" which passes both validators.  (Only the font complaints remain on the second validator.)

Is it okay to create DeviceRGB output this way?  Have I correctly guessed the change needed to the {icc_PDFA} definition, where I substituted {3} for {4}?  I don't quite understand the syntax with "ifelse" at the end...
Comment 5 Ken Sharp 2013-04-05 07:18:59 UTC
(In reply to comment #4)
> 
> - The Ghostscript doc says that "DeviceRGB is not allowed" when creating
> PDF/A (http://svn.ghostscript.com/ghostscript/trunk/gs/doc/Ps2pdf.htm#PDFA),
> so I specified DeviceCMYK. 

We no longer use Subversion for source control, you are referring to a very old version of the documentation. You say you are using 9.07 so I would suggest that you refer to the documentation which is installed along with that version.

 
> - I now realize that ICC profiles include a color space.  

ICC Profiles don't 'include' a colour space, they are a description of a colour space.


> - The first two tests are reporting this discrepancy:  I told it to expect 4
> colors but I only gave it a 3-color profile.

Getting PDFA_def.ps correct is, unfortunately, at the moment a job for the user. Ghostscript can only perform basic checks currently.
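
Since the match between the profile and /N has to be checked by hand, one possible way to do so is sketched below. It relies on the ICC header layout, in which bytes 16 to 19 hold the colour space signature ('GRAY', 'RGB ' or 'CMYK'). The procedure names ICCColorSpaceSig and NForSig are hypothetical and are not part of Ghostscript or of the shipped sample file; any signature other than GRAY or RGB is simply assumed to be CMYK, which is a simplification.

```postscript
% (path) -> (sig)
% Return the 4-byte colour space signature from an ICC profile header.
/ICCColorSpaceSig {
  (r) file                          % open the profile for reading
  dup 16 string readstring pop pop  % skip header bytes 0..15
  dup 4 string readstring pop       % read bytes 16..19 (the signature)
  exch closefile                    % close the file, keep the signature
} bind def

% (sig) -> integer
% Map the signature to the /N value the profile calls for.
% Assumption: anything that is not GRAY or RGB is treated as CMYK.
/NForSig {
  dup (GRAY) eq
    { pop 1 }
    { (RGB ) eq { 3 } { 4 } ifelse }
  ifelse
} bind def

% Hypothetical usage; the path is a placeholder for wherever the profile lives:
% (C:/GS_Test/sRGB_IEC61966-2-1_black_scaled.icc) ICCColorSpaceSig NForSig =
```

If the number printed differs from the /N that PDFA_def.ps emits, the validators are likely to report the kind of mismatch described above.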


> - If I use a CMYK profile from
> http://www.adobe.com/support/downloads/detail.jsp?ftpID=3680, e.g.
> USWebUncoated.icc, the file passes both tests without reporting the /N
> error.  Unfortunately the output file expands from 25K to 393K.

Almost certainly because the profile is much larger.

 
> Why is DeviceRGB not allowed?

It is allowed.


> most likely to be viewed on screen, so I thought sRGB would be the most
> applicable (and smallest) profile.

DeviceRGB and sRGB are not in fact the same.


> Is it okay to create DeviceRGB output this way?  Have I correctly guessed
> the change needed to the {icc_PDFA} definition, where I substituted {3} for
> {4}?  I don't quite understand the syntax with "ifelse" at the end...

It will work, if you want to know more you will have to learn a little PostScript.
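
For readers unfamiliar with PostScript, here is a minimal sketch of how the ifelse in that line is evaluated. PostScript is stack-based: operands come first and the operator last, so the test and both alternatives are pushed before ifelse picks one of them. Model is a hypothetical stand-in for the ProcessColorModel value and is not something PDFA_def.ps defines.

```postscript
% General form:  bool {run-if-true} {run-if-false} ifelse

/Model /DeviceCMYK def     % hypothetical stand-in for -sProcessColorModel

Model /DeviceGray eq       % compares the two names and pushes false
{1} {4} ifelse             % false selects the second procedure, leaving 4
=                          % pops the result and prints it: 4

% Note: a value set with -sProcessColorModel=... is a string, not a name,
% but eq treats a string and a name as equal when their characters match.
```

Replacing {4} with {3} therefore changes only the value produced for every colour model other than DeviceGray.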
Comment 6 Mark Berry 2013-04-05 17:24:59 UTC
(In reply to comment #5)
> We no longer use Subversion for source control, you are referring to a very
> old version of the documentation. You say you are using 9.07 so I would
> suggest that you refer to the documentation which is installed along with
> that version.
 
Aha! And I thought I was cleverly using the very latest trunk version.  Now I found http://ghostscript.com/doc/current/Ps2pdf.htm#PDFA.

> > most likely to be viewed on screen, so I thought sRGB would be the most
> > applicable (and smallest) profile.
> 
> DeviceRGB and sRGB are not in fact the same.

But if my ICC profile describes an RGB color space, I should use ProcessColorModel=DeviceRGB, right?

> It will work, if you want to know more you will have to learn a little
> PostScript.

So it seems my problem with /N goes back to the PDFA_def.ps sample only handling DeviceGray and DeviceCMYK. I've updated the sample to emit /N 3 for DeviceRGB, and added a comment re. using an ICC profile with a corresponding color space.  Attached.

I tested with each of DeviceGray, DeviceRGB, and DeviceCMYK and got /N 1, /N 3, and /N 4 respectively.  As long as the ICC profile describes a color space with the corresponding number of colors, the output passes validation.
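
The attachment itself is not reproduced in this thread. As a rough sketch only, a three-way choice of /N could be written by nesting a second ifelse inside the first. The procedure name NForModel is hypothetical, and the final commented line shows how it might slot into the existing {icc_PDFA} line rather than what the attachment actually contains.

```postscript
% name-or-string -> integer
% Number of colour components to emit as /N for a given ProcessColorModel.
/NForModel {
  dup /DeviceGray eq
    { pop 1 }                               % Gray: one component
    { /DeviceRGB eq { 3 } { 4 } ifelse }    % RGB: three, otherwise four
  ifelse
} bind def

% Quick check of the three cases tested above
/DeviceGray NForModel =    % prints 1
/DeviceRGB  NForModel =    % prints 3
/DeviceCMYK NForModel =    % prints 4

% Possible use in PDFA_def.ps (sketch, assumes {icc_PDFA} is already defined):
% [{icc_PDFA} <</N systemdict /ProcessColorModel get NForModel >> /PUT pdfmark
```

The dup is needed because the first eq consumes its operand, and the value has to survive for the second comparison; the pop in the Gray branch discards the spare copy.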
Comment 7 Mark Berry 2013-04-05 17:32:24 UTC
Created attachment 9493
PDFA_def.ps sample that emits /N 3 if ProcessColorModel=DeviceRGB