Summary: | Some watermarked PDF files are rasterized when converting to PDF/A - #1524 | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | jritmeijer |
Component: | PDF Writer | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED INVALID | ||
Severity: | minor | CC: | jritmeijer |
Priority: | P4 | ||
Version: | 9.04 | ||
Hardware: | PC | ||
OS: | Windows 7 | ||
Customer: | Word Size: | --- | |
Attachments: |
A simple PDF file that contains an image. The source of the image is a JPG file
The same file after converting to PDF/A. Note that it has been rasterized
A PDF file of a form before conversion. Very basic, no images
The same form after conversion to PDF/A. It is not clear to me why this is rasterized. |
Description
jritmeijer
2011-10-13 12:18:44 UTC
Created attachment 7992
A simple PDF file that contains an image. The source of the image is a JPG file
Created attachment 7993
The same file after converting to PDF/A. Note that it has been rasterized
Created attachment 7994
A PDF file of a form before conversion. Very basic, no images
Created attachment 7995
The same form after conversion to PDF/A. It is not clear to me why this is rasterized.
(In reply to comment #0)
> Rasterization
> - Text (or any object) using transparency.

PDF/A-1 does not support transparency *at all*. You can either accept the current approach, which produces an opaque representation, or use -DNOTRANSPARENCY, which will ignore all transparent operations, but obviously the output will then be incorrect.

> - The addition of any image, see attached file.

This is certainly not the case. I have many examples from customers creating PDF/A-1 output which contains images and which does not fall back to rendering the entire document.

> So my question is: What determines if a page is rasterized when converting to
> PDF/A?

If it can't be represented in PDF/A-1 format. Modulo bugs, of course, but in general we need to take very specific action in pdfwrite to render any content, so it's unlikely this is accidental.

(In reply to comment #5)
Thanks for getting back so quickly. Other files that contain images do convert to PDF/A just fine. How can I find out why this one (see attachment) does not? Interestingly, when I specify "-DNOTRANSPARENCY" the problem goes away and the image is still visible. This same switch solves the problem with the other file as well (Source-Form.pdf), and the resulting PDF/A file looks identical to the source file.

So if there is no readily available list of elements that cause a page to be rasterised, is there some kind of diagnostic output I can enable so that I can run tests on documents that don't behave as expected?

Thanks again for your assistance, this is very helpful.

(In reply to comment #6)
> Other files that contain images do convert to PDF/A just fine. How can I find
> out why this one (see attachment) does not? Interestingly, when I specify
> "-DNOTRANSPARENCY" the problem goes away and the image is still visible.

That means the input contains transparency. It may not do anything useful, but the code can't tell that; the file contains transparency, so we assume it will have some effect and treat it accordingly.

NB I haven't actually looked at the file, but this is what must be happening.

> This same switch solves the problem with the other file as well
> (Source-Form.pdf), and the resulting PDF/A file looks identical to the source
> file.

Same problem, then.

> So if there is no readily available list of elements that cause a page to be
> rasterised,

There are no specific elements which will cause a page to be rasterised, but the presence of transparency will, because the PDF/A-1 specification doesn't permit transparency.

> is there some kind of diagnostic output I can enable so that I can
> run tests on documents that don't behave as expected?

Hmm, I don't think so, no. Transparency in PDF documents is unfortunately complicated: the elements can appear in all sorts of places, and there is no overriding 'this document contains transparency' flag in the PDF file.

Ah, the pdf_info.ps file supplied as part of Ghostscript will tell you if a given page uses transparency. This tool reports that both your files use transparency.

Awesome, thanks.

BTW, I will be reporting a few more issues in the next couple of days; I have been saving them up until the end of my project. I am fairly sure at least some of them are real bugs :-)

(In reply to comment #8)
> Awesome, thanks.
>
> BTW, I will be reporting a few more issues in the next couple of days; I have
> been saving them up until the end of my project. I am fairly sure at least some
> of them are real bugs :-)

May I ask what the purpose of your project is? Are you writing something for in-house conversion, or is it an academic exercise or something?

Nothing academic, I am afraid. I am evaluating PDF to PDF/A converters for a customer who needs to archive a bunch of forms.

Note that Ghostscript has a little utility that will tell you if a PDF has transparency:

gs -- toolbin/pdf_info.ps _____.pdf

where _____.pdf is the file you want information on.
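The thread notes that transparency can appear in many places in a PDF and that there is no single document-level flag for it. As a rough illustration of where it tends to live, here is a minimal Python sketch, not part of Ghostscript, that scans raw PDF bytes for common transparency keys: soft masks (/SMask), non-Normal blend modes (/BM), constant alpha other than 1 (/CA, /ca), and transparency groups (/S /Transparency). The script name find_transparency.py and the function find_transparency_markers are hypothetical. It is only a heuristic under a stated assumption: it does not parse the PDF and assumes the relevant dictionaries are stored uncompressed, so anything inside compressed object streams will be missed.

```python
from __future__ import annotations

import re
import sys

# Heuristic byte patterns for common transparency markers in PDF dictionaries.
# ASSUMPTION: the dictionaries are stored uncompressed. Keys inside compressed
# object streams (PDF 1.5+) are invisible to this scan.
TRANSPARENCY_PATTERNS = {
    # Soft mask on an image or in an ExtGState; "/SMask /None" means no mask.
    "soft mask (/SMask)": re.compile(rb"/SMask(?![A-Za-z])(?!\s*/None)"),
    # Any blend mode other than Normal/Compatible, optionally inside an array.
    "non-Normal blend mode (/BM)": re.compile(
        rb"/BM\s*\[?\s*/(?!Normal\b|Compatible\b)\w+"
    ),
    # Stroke (/CA) or fill (/ca) constant alpha with a value other than 1.
    "constant alpha below 1 (/CA or /ca)": re.compile(
        rb"/(?:CA|ca)\s+(?!1(?:\.0*)?(?![\d.]))[\d.]+"
    ),
    # Transparency group, e.g. a page's /Group << /S /Transparency >>.
    "transparency group (/S /Transparency)": re.compile(rb"/S\s*/Transparency"),
}


def find_transparency_markers(data: bytes) -> list[str]:
    """Return the labels of all transparency markers found in raw PDF bytes."""
    return [
        label
        for label, pattern in TRANSPARENCY_PATTERNS.items()
        if pattern.search(data)
    ]


def main(paths: list[str]) -> None:
    if not paths:
        print("usage: python find_transparency.py FILE.pdf [FILE.pdf ...]",
              file=sys.stderr)
        return
    for path in paths:
        with open(path, "rb") as f:
            found = find_transparency_markers(f.read())
        if found:
            print(f"{path}: possible transparency: {', '.join(found)}")
        else:
            print(f"{path}: no uncompressed transparency markers found")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python find_transparency.py Source-Form.pdf` would list whichever markers it sees. Because it only pattern-matches bytes, a clean result is not proof that a file is free of transparency; pdf_info.ps remains the more reliable check.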