I haven't tested this, but the customer reports that the attached PDF file increases in size to 11 megs when converted to PDF/A by Ghostscript. Here are some details from the customer's email:
You’ll notice that the input pdf has a watermark on it. I believe there is something wrong with how this watermark is done and is causing the bad interaction with GhostScript. When I configure the engine to leave out the watermark, the PDFA comes out acceptable. Was wondering if there was any insight you could give. Thanks very Much!
Here's the command line they are using:
gswin32c -dPDFA -dUseCIEColor -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=2 -dNOPAUSE -r600 -dBATCH -sOutputFile=out.pdf ocred.pdf
This is probably due to transparency being used to draw the watermark.
The PDF/A version we currently generate doesn't allow transparency, so the
entire page is "flattened" to an image.
I'm sure Ken will correct me if this is not the cause, and he can comment on
the support for the newer (not yet finalized, AFAIK) PDF/A standard that does
allow transparency (PDF 1.4)
(In reply to comment #2)
> This is probably due to transparency being used to draw the watermark.
This is precisely the problem. The 'Demonstration licence' text is drawn using text rendering mode 2 (fill then stroke). The graphics state sets both CA and ca to 0.399; these are the 'constant alpha' values for stroke and fill operations respectively. Since both are less than 1, the text is not opaque.
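As a rough illustration of the check described above, the following Python sketch pulls the CA and ca values out of the text of an uncompressed ExtGState dictionary and reports whether either is below 1. The sample dictionary is made up for the example (only the 0.399 values come from this file), and the function names are hypothetical; a real PDF would usually need its streams decompressed first.

```python
import re

# Matches "/CA 0.399" (stroke alpha) or "/ca 0.399" (fill alpha) in the
# text of an uncompressed ExtGState dictionary.
ALPHA_RE = re.compile(r"/(CA|ca)\s+([0-9]*\.?[0-9]+)")


def constant_alphas(extgstate_text):
    """Return a dict such as {"CA": 0.399, "ca": 0.399} for entries found."""
    return {key: float(value) for key, value in ALPHA_RE.findall(extgstate_text)}


def uses_transparency(extgstate_text):
    """True if any constant alpha is below 1, i.e. painting is not opaque."""
    return any(alpha < 1.0 for alpha in constant_alphas(extgstate_text).values())


# Hypothetical ExtGState dictionary using the alpha values quoted above.
sample = "<< /Type /ExtGState /CA 0.399 /ca 0.399 >>"
print(constant_alphas(sample))
print(uses_transparency(sample))
```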
> The PDF/A version we currently generate doesn't allow transparency, so the
> entire page is "flattened" to an image.
PDF/A-1 does not permit transparency. Because the watermark covers so much of the page, the whole page is indeed rasterised to an image. Obviously this will also render any text unsearchable, since it will no longer be text.
> I'm sure Ken will correct me if this is not the cause, and he can comment on
> the support for the newer (not yet finalized, AFAIK) PDF/A standard that does
> allow transparency (PDF 1.4)
I believe the PDF/A-2 specification is complete. I did try to purchase a copy before Christmas but had some hiccups with the web site; I'll try again now that people should be back at work.
PDF/A-2 permits transparency, but it may not be generally accepted yet, since it is so very new.
The customer has an additional comment/question:
This is a follow up question to bug 692773. That’s an issue where transparency in a watermark of a pdf converted to PDFA comes out huge. To get around that, we’re doing our stamping without the transparency. But when we do that, the stamp seems to be double printed and misaligned after running it through GhostScript and converting it to PDFA. We’re using GS9.04 win32 but we have fixes applied for bugs 692717, 692569 and 692422.
The commandline we’re using is
gswin32c -dPDFA -dUseCIEColor -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -dNOPAUSE -r600 -dBATCH -sOutputFile=out.pdf NYTimes_ocr.pdf
Note that when I run the above commandline gs gives many “PDFA Doesn’t allow images with interpolate true” warnings. I don’t know if this relates to the stamp issue we’re seeing. I assumed that this is just a warning telling us a feature was omitted for PDFA.
Attached is NYTimes_ocr.pdf which is the input and has our altered watermark without transparency.
Out.pdf is the output showing the watermark printed very oddly
(In reply to comment #4)
> around that, we’re doing our stamping without the transparency. But when we do
> that, the stamp seems to be double printed and misaligned after running it
> through GhostScript and converting it to PDFA. We’re using GS9.04 win32 but we
> have fixes applied for bugs 692717, 692569 and 692422.
The text is not drawn twice; it is drawn (in effect) three times. Two of these are performed by what one might reasonably consider to be the 'watermark', which is contained in a form XObject:
192 0 obj
/BBox [ -0.240005 -3.92409 180.876 13.2359 ]
/Matrix [ 1 0 0 1 0 0 ]
/TT0 198 0 R
/ProcSet [ /PDF /Text ]
/TT0 1 Tf
0 Tc 0 Tw 0 Ts 100 Tz 1 Tr 12 0 0 12 0 0 Tm
/TT0 1 Tf
12 0 0 12 0 0 Tm
The font used here is 'Arial-Black' which is *NOT* embedded in the input PDF file. As a result (and because PDF/A requires that fonts be embedded) pdfwrite embeds Helvetica-Bold as a substitute.
This works perfectly well. However, the image underlying this form has been created with the same watermark in the same location, as part of the image. Obviously as this is part of the image, it is not text, so no font substitution is required, or takes place.
Because the underlying image text was rendered using the correct font, it does not match up with the watermark which uses a substituted font.
You can open the original file in Adobe Acrobat Professional, use the Touch-Up object tool to select and delete the watermark, and you will see that the watermark text remains underneath, but now the surrounding white area which makes the original watermark stand out has vanished (clearly this wasn't present when the original image was marked).
The underlying page is not a single image, so it's possible to select and delete portions of it. If you do this you will see that some of the watermark disappears each time, demonstrating that this watermark is in fact part of the image.
Embedding the Arial-Black font, or making it available in an embeddable format so that pdfwrite can embed it as required, should eliminate this problem.
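One quick way to tell whether a font is embedded is to look for a FontFile, FontFile2 or FontFile3 entry in its font descriptor. The Python sketch below does this on the descriptor's text; both descriptors and the object number 200 are invented for illustration and are not taken from the attached file, and the function name is hypothetical.

```python
import re

# A font descriptor carries the embedded font program under one of the
# keys /FontFile, /FontFile2 or /FontFile3.
FONTFILE_RE = re.compile(r"/FontFile[23]?\b")


def is_embedded(font_descriptor_text):
    """True if the descriptor text references an embedded font program."""
    return FONTFILE_RE.search(font_descriptor_text) is not None


# Hypothetical descriptors: the first has no font program, the second does.
not_embedded = "<< /Type /FontDescriptor /FontName /Arial-Black /Flags 32 >>"
embedded = (
    "<< /Type /FontDescriptor /FontName /Arial-Black /Flags 32 "
    "/FontFile2 200 0 R >>"
)

print(is_embedded(not_embedded))
print(is_embedded(embedded))
```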
Created attachment 8274
input PDF. Transparency removed from stamp and font embedded
Created attachment 8275
Output pdfa. rotated 90 degrees
We're so close. And I know I'm being a pain - I'm sorry.
We've removed the transparency from the watermark. Embedded the font we want as well.
Now when converting the pdf to pdfa with the commandline noted previously, the output gets rotated 90 degrees.
We notice that when we remove the rotation from the watermark, the output pdfa does not get rotated.
Should we just quit while we're ahead and not rotate the watermark, or is there something else we could do to get GS to not rotate the output?
The only text in the file now is the watermark.
By default pdfwrite will rotate the output page in order to make the majority of the text run left to right and top to bottom. The watermark text runs left to right, bottom to top so pdfwrite rotates the page.
Try setting "-dAutoRotatePages=/None"; see "/ghostpdl/gs/doc/Ps2pdf.htm#Orientation".
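For anyone scripting the conversion, here is a minimal Python sketch that assembles the customer's command line with the auto-rotation switch added. It assumes that `/None` is the documented value for switching rotation off (the other documented values being `/All` and `/PageByPage`); the helper name `pdfa_command` is hypothetical, and the remaining flags are copied from the command line quoted earlier in this bug.

```python
def pdfa_command(input_pdf, output_pdf, auto_rotate="/None", gs="gswin32c"):
    """Build the argument list for a PDF/A conversion with pdfwrite.

    auto_rotate is passed straight to -dAutoRotatePages; "/None" is assumed
    to disable pdfwrite's automatic page rotation.
    """
    return [
        gs,
        "-dPDFA",
        "-dUseCIEColor",
        "-sDEVICE=pdfwrite",
        "-dPDFACompatibilityPolicy=1",
        f"-dAutoRotatePages={auto_rotate}",
        "-dNOPAUSE",
        "-r600",
        "-dBATCH",
        f"-sOutputFile={output_pdf}",
        input_pdf,
    ]


args = pdfa_command("NYTimes_ocr.pdf", "out.pdf")
print(" ".join(args))

# To actually run the conversion (requires Ghostscript to be installed):
# import subprocess
# subprocess.run(args, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.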
Thanks for the explanation Ken. I appreciate it.