When creating PDF/X-compatible PDF files from PostScript files, the File Identifier is sometimes written as a string like '(text)', but most of the time it is written as a bytestring like '<812FDA7524C3467318D2A0B185604AD0>'. According to ISO 32000-1:2008, section 7.5.5:

"(Required if an Encrypt entry is present; optional otherwise; PDF 1.1) An array of two byte-strings constituting a file identifier (see 14.4, "File Identifiers") for the file. If there is an Encrypt entry this array and the two byte-strings shall be direct objects and shall be unencrypted."

So some programs expect the bytestring, even when the file is not encrypted. The problem occurs in a signature server, which is used to sign the PDF. There is no problem with viewers like Acrobat Reader.

At the moment I have modified the Ghostscript source to ensure hexadecimal output of the file identifier:

    diff -E -w -r ghostscript-8.70/base/gdevpdf.c ghostscript-8.70,org/base/gdevpdf.c
    1346,1347c1346,1347
    < psdf_write_string(pdev->strm, pdev->fileID, sizeof(pdev->fileID), PRINT_HEX);
    < psdf_write_string(pdev->strm, pdev->fileID, sizeof(pdev->fileID), PRINT_HEX);
    ---
    > psdf_write_string(pdev->strm, pdev->fileID, sizeof(pdev->fileID), 0);
    > psdf_write_string(pdev->strm, pdev->fileID, sizeof(pdev->fileID), 0);
    diff -E -w -r ghostscript-8.70/base/spsdf.c ghostscript-8.70,org/base/spsdf.c
    78,79c78
    < if ((print_ok != PRINT_HEX) &&
    < (added < size || (print_ok & PRINT_HEX_NOT_OK))) {
    ---
    > if (added < size || (print_ok & PRINT_HEX_NOT_OK)) {
    diff -E -w -r ghostscript-8.70/base/spsdf.h ghostscript-8.70,org/base/spsdf.h
    34d33
    < #define PRINT_HEX 8
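For reference: as the PDF Reference defines them, a literal string '(...)' and a hexadecimal string '<...>' are two syntaxes for the same kind of string object, and the patch above only forces the writer to choose the second one. The C sketch below shows both spellings of one identifier. The helper names pdf_hex_string and pdf_literal_string are hypothetical and are not part of Ghostscript's psdf_write_string API. The escaping follows the literal-string rules (backslash before '(', ')' and '\', octal '\ddd' for other non-printable bytes). Real writers may choose differently, for example when escaping printable bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helpers (not Ghostscript's API): serialise the same bytes
 * either as a PDF hexadecimal string <...> or as a PDF literal string (...).
 * Both spellings denote identical byte values. */

/* Writes bytes as a hex string, e.g. <BEBD...>. Returns the number of
 * characters written (excluding the NUL), or -1 if out is too small. */
int pdf_hex_string(char *out, size_t cap, const unsigned char *data, size_t len)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t need = 2 * len + 2; /* '<' + two digits per byte + '>' */
    size_t i, n = 0;

    if (cap < need + 1) /* room for the terminating NUL as well */
        return -1;
    out[n++] = '<';
    for (i = 0; i < len; i++) {
        out[n++] = digits[data[i] >> 4];
        out[n++] = digits[data[i] & 0x0F];
    }
    out[n++] = '>';
    out[n] = '\0';
    return (int)n;
}

/* Writes bytes as a literal string: printable ASCII is copied, the
 * delimiters ( ) and the escape character \ are backslash-escaped, and
 * every other byte becomes a three-digit octal escape \ddd.
 * Returns the number of characters written, or -1 if out is too small. */
int pdf_literal_string(char *out, size_t cap, const unsigned char *data, size_t len)
{
    size_t i, n = 0;

    if (cap < 4 * len + 3) /* worst case: every byte as \ddd, plus ( ) NUL */
        return -1;
    out[n++] = '(';
    for (i = 0; i < len; i++) {
        unsigned char c = data[i];
        if (c == '(' || c == ')' || c == '\\') {
            out[n++] = '\\';
            out[n++] = (char)c;
        } else if (c >= 0x20 && c < 0x7F) {
            out[n++] = (char)c;
        } else {
            n += (size_t)sprintf(out + n, "\\%03o", (unsigned)c);
        }
    }
    out[n++] = ')';
    out[n] = '\0';
    return (int)n;
}
```

When fed the 16 identifier bytes from the rejected trailer quoted later in this thread, pdf_literal_string reproduces '([\246\347;J\243RU~\375Nw,B`c)' exactly. For the same bytes, pdf_hex_string gives '<5BA6E73B4AA352557EFD4E772C426063>'.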
> When creating PDF/X compatible PDF files from PostScript files, sometimes the
> File Identifier is written as string like '(text)', most of the time it is
> written as bytestring '<812FDA7524C3467318D2A0B185604AD0>'.

That is not a 'byte string' as defined in the PDF Reference; that is a hexadecimal string. I don't have a copy of the ISO spec, so I'm working from the PDF Reference 1.7, and the quotes below are from that document.

On p. 155, Table 3.31 PDF Data Types:

"byte string A series of 8-bit bytes that represent characters or other binary data. If such a type represents characters, the encoding is not identified."

This is detailed further on p. 157:

"byte string (PDF 1.7) Used for binary data represented as a series of 8-bit bytes, where each byte can be any value representable in 8 bits. The string may represent characters or glyphs but the encoding is not known. The bytes of the string may not represent characters. This type is used for data such as MD5 hash values, signature certificates, and Web Capture identification values."

So a byte string is a binary sequence; it is not a hexadecimal-encoded sequence.

The File Identifier is documented on p. 847:

"File identifiers are defined by the optional ID entry in a PDF file's trailer dictionary (see Section 3.4.4, "File Trailer"; see also implementation note 162 in Appendix H). The value of this entry is an array of two byte strings."

So it seems from this that an ID which is hexadecimal encoded would actually be *incorrect*.

> So some programs expect the bytestring, even when the file is not encrypted.
> The problem is in a signature-server, which is used to sign the pdf. There is
> no problem with viewers, like Acrobat Reader.

It's not clear to me what the problem is. You seem to be saying that an application doesn't like these strings unless they are hex encoded, which seems to be incorrect. You also seem to imply that the strings are hex encoded when the file is encrypted, but not otherwise.
I'm doubtful this is the case. I suspect that a stream or other content object, when encrypted, might be hex encoded, and that this object contains an array of strings which are not hex encoded once decrypted. When the file is not encrypted, the array is an array of byte strings, as expected.

Currently it seems to me that the application complaining about the ID is incorrect, but without examples it is very difficult to be sure. Note that Acrobat Preflight, amongst other tools, is known to happily validate GS output as PDF/X conforming.

Note also that the presence or absence of encryption does not affect whether the strings are byte strings or not; they are *always* byte strings. The presence of encryption simply means that these strings are mandatory, not optional.

If you still believe there is a problem, please attach some examples which will allow us to reproduce it. We will need an example input file (or files) and the command lines you used.
Thank you for your fast reply. I now think it's neither a bug nor a weak interpretation of the spec in Ghostscript. In the few sentences of the spec excerpt that I have, there is no statement that the string has to be written in a hexadecimal representation. I was misled by 'direct object', but that has nothing to do with the kind of string output.

Sincerely yours

> That is not a 'byte string' as defined in the PDF Reference, that is a
> Hexadecimal string. I don't have a copy of the ISO spec, so I'm working
> from the PDF Reference 1.7, quotes are from that document.

The word 'bytestring' was my name for a hexadecimal-encoded string; sorry for the confusion.

In the case where the PDF is not accepted, the created trailer looks like this:

    trailer
    << /Size 19 /Root 1 0 R /Info 2 0 R /ID [([\246\347;J\243RU~\375Nw,B`c)([\246\347;J\243RU~\375Nw,B`c)] >> >>
    startxref
    28887
    %%EOF

When it is accepted, it looks like this:

    trailer
    << /Size 40 /Root 1 0 R /Info 2 0 R /ID [<BEBD81F87396DC4BB86CF445CA694474><BEBD81F87396DC4BB86CF445CA694474>] >>
    startxref
    492248
    %%EOF
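One way to compare the two trailers is to decode each ID string back to raw bytes. The C sketch below is a minimal decoder built on stated assumptions. pdf_decode_literal and pdf_decode_hex are hypothetical names, not functions from Ghostscript or any PDF library. Each one takes only the text between the outer delimiters. The literal decoder also skips the rule that an unescaped end-of-line inside a literal string is normalised to a single line feed. Otherwise, escapes follow the string syntax in the PDF Reference: '\n', '\r', '\t', '\b', '\f', '\(', '\)', '\\', and one to three octal digits.

```c
#include <stddef.h>

/* Hypothetical decoder (a sketch, not from any PDF library): turns the
 * body of a PDF literal string, i.e. the text between the outer '(' and
 * ')', into raw bytes. Returns the byte count, or -1 on error/overflow. */
int pdf_decode_literal(const char *s, size_t slen, unsigned char *out, size_t cap)
{
    size_t i = 0, n = 0;

    while (i < slen) {
        unsigned char c = (unsigned char)s[i++];
        if (c == '\\') {
            if (i >= slen)
                return -1; /* dangling backslash */
            c = (unsigned char)s[i++];
            switch (c) {
            case 'n': c = '\n'; break;
            case 'r': c = '\r'; break;
            case 't': c = '\t'; break;
            case 'b': c = '\b'; break;
            case 'f': c = '\f'; break;
            case '(': case ')': case '\\': break;
            case '\r': /* backslash + end-of-line: line continuation */
                if (i < slen && s[i] == '\n')
                    i++;
                continue;
            case '\n':
                continue;
            default:
                if (c >= '0' && c <= '7') {
                    /* \d, \dd or \ddd; high-order overflow is ignored */
                    unsigned v = (unsigned)(c - '0');
                    int k;
                    for (k = 1; k < 3 && i < slen && s[i] >= '0' && s[i] <= '7'; k++)
                        v = v * 8 + (unsigned)(s[i++] - '0');
                    c = (unsigned char)(v & 0xFF);
                }
                /* any other escaped character: the backslash is dropped */
                break;
            }
        }
        if (n >= cap)
            return -1;
        out[n++] = c;
    }
    return (int)n;
}

static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes the body of a hex string (between '<' and '>'). White space is
 * skipped; an odd final digit is treated as if followed by 0. */
int pdf_decode_hex(const char *s, size_t slen, unsigned char *out, size_t cap)
{
    size_t i, n = 0;
    int hi = -1; /* pending high nibble, or -1 if none */

    for (i = 0; i < slen; i++) {
        int v;
        if (s[i] == ' ' || s[i] == '\t' || s[i] == '\r' ||
            s[i] == '\n' || s[i] == '\f' || s[i] == '\0')
            continue;
        v = hexval((unsigned char)s[i]);
        if (v < 0)
            return -1;
        if (hi < 0) {
            hi = v;
        } else {
            if (n >= cap)
                return -1;
            out[n++] = (unsigned char)(hi * 16 + v);
            hi = -1;
        }
    }
    if (hi >= 0) {
        if (n >= cap)
            return -1;
        out[n++] = (unsigned char)(hi * 16);
    }
    return (int)n;
}
```

Decoding the body of the rejected ID, '[\246\347;J\243RU~\375Nw,B`c', gives 16 bytes beginning 5B A6 E7 3B. Decoding 'BEBD81F87396DC4BB86CF445CA694474' also gives 16 bytes. So both trailers carry identifiers of the same length, written in different string syntaxes.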
OK, when a file is encrypted, all strings and streams in the PDF file (but not other objects such as integers and booleans) are encrypted. What you have is an unencrypted dictionary containing an unencrypted array, which in turn contains encrypted strings.

The spec doesn't actually say so explicitly (unless the ISO spec says something different), but the encryption in effect trumps the byte string requirement. This is because decryption must take place before any other use of the string. So after the strings have been decrypted they will in fact be binary strings again, and therefore acceptable as File IDs.