Bug 691370 - Dictionary size not limited for PDF/A-1
Summary: Dictionary size not limited for PDF/A-1
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 8.71
Hardware: PC Windows XP
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-06-06 12:00 UTC by mw
Modified: 2010-06-15 07:09 UTC

See Also:
Customer:
Word Size: ---


Description mw 2010-06-06 12:00:03 UTC
The PDF reference limits the maximum size of a dictionary to 4095 entries.
This limit appears in at least versions 1.4 and 1.5 of the reference.

I have seen GS happily create dictionaries with more than that number of entries (sorry, I cannot provide a simple test file).
Comment 1 Ken Sharp 2010-06-06 17:00:45 UTC
(In reply to comment #0)
> The PDF reference limits the max size of a dictionary to 4095 entries.
> This limit is contained at least in version 1.4 and 1.5 of the reference.

This is not a limit of the PDF language; it is an implementation limit of the version of Acrobat that was current at the time (approximately July 2000).

There isn't any real point in limiting the production of PDF files, even old-format PDF files, to the implementation limits of an archaic PDF consumer.

This is especially true since pdfwrite only produces dictionaries as large as they need to be. There is not usually any way to substitute a number of smaller dictionaries for a single large one, so if we were to limit the size of dictionaries, the only option would be to fail the conversion with a limitcheck error.

Since non-Adobe consumers of PDF 1.4, as well as later Adobe consumers, are able to read PDF 1.4 files with large dictionaries, we won't change the current behaviour.
Comment 2 mw 2010-06-07 08:24:23 UTC
Ken,

thank you for the really helpful answer.

You are completely right: the table in the reference states "Table C.1 describes the architectural limits for Acrobat viewer applications running on 32-bit machines."

Sorry that I did not read it completely.
FYI: this came up because some preflight tools complain about dictionaries with more than 4095 entries in PDF 1.4 files.

Best regards,
Markus
Comment 3 mw 2010-06-07 08:46:16 UTC
Sorry to comment on this again, but while it is not an error for PDF 1.4, it is indeed an error for PDF/A-1. The PDF/A-1 standard states that the architectural limits of PDF 1.4 shall not be violated.
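A preflight-style check for the limits under discussion might look like the following minimal sketch. It is hypothetical and is not pdfwrite code: PDF objects are modelled as plain Python dicts and lists, and the function name pdfa1_limit_violations is invented for illustration. The 4095-entry dictionary limit comes from this thread; the 8191-element array limit is the corresponding entry in Table C.1 of the PDF Reference 1.4.

```python
# Architectural limits from the PDF Reference 1.4, Appendix C (Table C.1),
# which PDF/A-1 files must not exceed.
MAX_DICT_ENTRIES = 4095
MAX_ARRAY_ELEMENTS = 8191


def pdfa1_limit_violations(obj, path="root"):
    """Return a list of PDF 1.4 size-limit violations found in obj.

    Hypothetical helper: Python dicts stand in for PDF dictionaries and
    Python lists stand in for PDF arrays; everything else is a leaf.
    """
    problems = []
    if isinstance(obj, dict):
        if len(obj) > MAX_DICT_ENTRIES:
            problems.append(
                f"{path}: dictionary has {len(obj)} entries (max {MAX_DICT_ENTRIES})"
            )
        # Recurse into values, since nested objects can also break the limits.
        for key, value in obj.items():
            problems.extend(pdfa1_limit_violations(value, f"{path}/{key}"))
    elif isinstance(obj, list):
        if len(obj) > MAX_ARRAY_ELEMENTS:
            problems.append(
                f"{path}: array has {len(obj)} elements (max {MAX_ARRAY_ELEMENTS})"
            )
        for index, value in enumerate(obj):
            problems.extend(pdfa1_limit_violations(value, f"{path}[{index}]"))
    return problems


if __name__ == "__main__":
    # An XObject resource dictionary that is one entry over the limit.
    xobjects = {f"Im{i}": i for i in range(MAX_DICT_ENTRIES + 1)}
    for problem in pdfa1_limit_violations({"Resources": {"XObject": xobjects}}):
        print(problem)
```

Run as a script, this reports the single oversized dictionary at root/Resources/XObject; a real preflight tool would of course walk the parsed object graph of the PDF file instead.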
Comment 4 Ken Sharp 2010-06-07 13:24:18 UTC
(In reply to comment #3)
> Sorry to comment this again, but while it is no error for PDF 1.4, for PDF/A-1
> this is indeed an error. The PDF/A-1 standard defines that the architectural
> limits of PDF 1.4 shall not be violated.

Given that pdfwrite only makes a dictionary as large as it needs to be, I doubt that the limit would ever be exceeded in practice when producing a PDF/A file. Of course, if you can supply an example, I'll look into it further.
Comment 5 mw 2010-06-07 19:58:56 UTC
I have seen it once, but unfortunately I have no test case for you.
But as there is currently no limit on dictionary size, it will happen given sufficiently complex input.

You said:

> There is not usually any way to substitute a single large dictionary 
> with a number of smaller dictionaries, so if we were to limit the 
> size of dictionaries the only option would be to fail the conversion 
> with a limitcheck error.

I suggest handling that case based on the value of PDFACompatibilityPolicy.

If one of the limits is exceeded, option 1 should terminate file creation with an error.

P.S.: In general, I would find a third option for PDFACompatibilityPolicy helpful:

2 - PDF creation is terminated with an error.
Comment 6 Ken Sharp 2010-06-14 12:05:12 UTC
(In reply to comment #5)
> I have seen that once, but unfortunately I have no test case for you.
> But as there is currently no limitation to the dictionary size, it will happen
> (given the input is complex enough).

Complexity isn't really the issue with dictionary sizes. Also, I suspect that fonts, for example, are exempt from this limit.

However, revision 11372 introduces a limit on dictionary and array sizes when producing PDF/A output. Strings are already limited to a compatible length in GS, so they won't be a problem.

Obviously without a test file I can't say whether this actually does anything useful...
 
> I suggest handling that case based on the value of PDFACompatibilityPolicy.
> 
> Given one of the limits is exceeded, option 1 should terminate file creation
> with an error.

Or fall back to producing a regular PDF file, as per the documentation, which is what we do in this case (see below).
 
> P.S: I would consider it helpful if there was generally a third option for
> PDFACompatibilityPolicy:

That was the reason for making the policy an integer; I simply didn't have a reason to include an abort option previously. This has now been added (and the documentation updated).

Because it is now possible to select an abort, I'm choosing to treat Policy 1 (drop the element and continue to make PDF/A) as Policy 0 (just make a regular PDF file) when it isn't possible to simply eliminate the troublesome object.
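The policy handling described in this comment can be sketched as a small decision function. This is a hypothetical illustration, not the code from revision 11372: the Policy member names, the PDFAAbort exception and the resolve_violation function are invented here, and the warn callback stands in for the message Ghostscript emits. It models only the three policy meanings described in this thread, not every call site in pdfwrite.

```python
from enum import IntEnum


class Policy(IntEnum):
    """Hypothetical names for the PDFACompatibilityPolicy integer values."""
    REGULAR_PDF = 0   # give up on PDF/A and write a regular PDF file
    DROP_ELEMENT = 1  # drop the offending element and keep producing PDF/A
    ABORT = 2         # stop the conversion with an error


class PDFAAbort(Exception):
    """Raised when Policy 2 asks for the conversion to stop."""


def resolve_violation(policy, element_droppable, warn=print):
    """Decide how to handle one PDF/A violation.

    Returns (still_pdfa, drop_element). Raises PDFAAbort for Policy 2 and
    ValueError for an unknown policy number.
    """
    policy = Policy(policy)
    if policy is Policy.ABORT:
        raise PDFAAbort("PDF/A violation with PDFACompatibilityPolicy=2: aborting")
    if policy is Policy.DROP_ELEMENT:
        if element_droppable:
            return True, True
        # e.g. an oversized dictionary that cannot simply be elided
        warn("Cannot drop the offending object; treating Policy 1 as Policy 0")
    warn("Producing a regular PDF file instead of PDF/A")
    return False, False


if __name__ == "__main__":
    log = []
    print(resolve_violation(1, element_droppable=False, warn=log.append))  # (False, False)
    print(log)
```

The fallback from Policy 1 to Policy 0 is never silent in this sketch: both steps go through warn, so a caller can route them to a log or to stderr.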
Comment 7 Ken Sharp 2010-06-14 12:06:11 UTC
Oops, I forgot to note: the patch is here:

http://ghostscript.com/pipermail/gs-cvs/2010-June/011194.html

The documentation update is here:

http://ghostscript.com/pipermail/gs-cvs/2010-June/011192.html
Comment 8 mw 2010-06-14 19:37:11 UTC
(In reply to comment #7)
> Oops, forgot to note; the patch is here:
> http://ghostscript.com/pipermail/gs-cvs/2010-June/011194.html

That looks really good, but you should check the other spot(s) where PDFACompatibilityPolicy is used. The current implementation will fall back to the behaviour of policy 0 if PDFACompatibilityPolicy=2 was specified, which is definitely not intended.

Also, I think falling back to policy 0 when policy 1 was specified and cannot be followed is dangerous. If callers specify policy 1, they expect a PDF/A file to be created. Silently falling back to policy 0 (and thus creating a plain PDF file) might be troublesome.
Comment 9 Ken Sharp 2010-06-15 07:09:33 UTC
(In reply to comment #8)
> (In reply to comment #7)
 
> That looks really good, but you should check the other spot(s) where
> PDFACompatibilityPolicy is used.

I did, all have been modified.


> The current implementation will fall back to
> the behaviour of policy 0 if PDFACompatibilityPolicy=2 was specified, which is
> definitely not intended.

Yes it is, I documented that way too ;-)

 
> Also I think falling back to policy 0 when policy 1 was specified and cannot be
> followed is dangerous. If a caller specifies policy 1, he/she expects a PDF/A
> file being created. Silently falling back to policy 0 (and thus creating a
> plain PDF file) might be troublesome.

I choose to view this differently. The user specified that they wanted a file created, even if it meant dropping parts of it (which is what Policy 1 does). I choose to treat that as a strong desire to have a file produced, and fall back to Policy 0 when Policy 1 can't be followed.

In neither case is this done silently; a message is sent on the back channel.