Acrobat 9.1.3 Pro Preflight reports errors with:

1.) Umlauts, parentheses and other special characters (e.g. brackets) in pdfmark entries (Title, Subject, ...), and
2.) ModDate and CreationDate, reported as XMP metadata inconsistent with the document info (i.e. the document properties).

See also: [gs-devel] PDF/A xmp/document properties escaping
http://www.ghostscript.com/pipermail/gs-devel/2009-May/008375.html

When the Title or Subject contains parentheses '(' or ')', they get escaped as '\(' and '\)' respectively. They display correctly, unescaped, in the document properties/Additional Metadata/Extended tree, but when running Preflight validation on the document, it complains about inconsistent document properties and XMP data in the Title and Subject fields.

PDFA_def.ps template:
http://www.ghostscript.com/pipermail/gs-devel/2009-January/008083.html

Extract:

    % Define entries to the document Info dictionary :

    /ICCProfile (ISOcoated_v2_300_eci.icc) % Customize.
    def

    [ /Title (Test Title bzw. Titel ©(äöüßÄÖÜ))
      % The title as shown in Preflight-Metadata: "Test Title bzw. Titel \251\(\344\366\374\337\304\326\334\)"
      % BTW: The title shows up alright in the Acrobat-Dialog:
      % File->Properties...->Additional Metadata...->Advanced->XMP Core-Properties
      % Datei->Eigenschaften...->Zusätzliche Metadaten...->Erweitert->XMP Core-Eigenschaften
      /Subject (Test Subject bzw. Thema bzw. IPTC Inhalt Beschreibung)
      /Author (Test Author bzw. Verfasser Mr. X (c), © Copyright Symbol) %Verfasser(optional)
      /Producer (Test Producer bzw. erzeugt mit demo-software)
      /Keywords (Test Keywords bzw. Stichwörter, comma, separated)
      /Creator (Test Creator bzw. erstellt mit (©) GPL Ghostscript 8.70 PDF Writer)
      % 4 errors with Acrobat 9.1.3 Pro Preflight (German Acrobat version):
      %   Creator unterschiedlich in Dokument-Eigenschaften und XMP-Metadaten
      %     (Creator differs between document properties and XMP metadata)
      %   Stichwort nicht einheitlich in Dokument-Info und XMP-Metadaten
      %     (Keywords inconsistent between document info and XMP metadata)
      %   Uneinheitliche Angaben zum Autor in Dokument-Eigenschaften und XMP-Metadaten
      %     (Author inconsistent between document properties and XMP metadata)
      %   Uneinheitliche Angaben zum Titel in Dokument-Eigenschaften und XMP-Metadaten
      %     (Title inconsistent between document properties and XMP metadata)
      % /CreationDate (D:20090917) %no time allowed (problem with timezones), e.g. D:20090917152755+0100 or D:20090917152755Z
      % /ModDate (D:200808080808Z)
      % None of the three date formats validates with Acrobat 9.1.3 Pro Preflight.
      % Error message: inconsistent XMP-Metadata with document info, respectively document properties (German Acrobat version):
      %   - Die Angaben zum Erzeugungsdatum in Dokument-Eigenschaften und XMP-Metadaten ist nicht einheitlich
      %       (creation date inconsistent between document properties and XMP metadata)
      %   - Letztes Änderungsdatum nicht einheitlich in Dokument-Info und XMP-Metadaten
      %       (last modification date inconsistent between document info and XMP metadata)
      /DOCINFO pdfmark

Command used to create the PDF/A:

    gswin32c.exe -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -sFONTPATH=C:\WINDOWS\Fonts -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -sOutputFile=out.pdf PDFA_def.ps a-Eff.pdf

This uses ISOcoated_v2_300_eci.icc from ECI_Offset_2009:
http://www.eci.org/doku.php?id=de:downloads
Attached demo PDF to reproduce the PDF/A-errors to Bug 690803.
This is, or may be, expected behaviour. Please see gs/doc/ps2pdf.htm under the -sDSCEncoding switch. If you do not specify a value for this then there are two problems. Firstly, the parentheses remain escaped (which is required for PostScript strings); secondly, the data is copied directly to the output, including any octal escapement which is required for PostScript.

The first issue is resolved with revision 10142:
http://ghostscript.com/pipermail/gs-cvs/2009-October/009866.html
which 'unescapes' the data. Note that octal escapes will be converted into single-byte 'binary' data. This may or may not work with the other characters you describe (umlauts); you haven't attached a PDFA_def file to work with and I do not trust cut and paste from HTML, so I can't be sure.

If it does not work then you might like to try setting DSCEncoding to PDFDocEncoding, which will convert the characters into Unicode, using the PDFDocEncoding to decide which characters are represented by the binary values.

I do not see a problem with the ModDate or CreationDate when Acrobat preflight is applied.

The parentheses and escapement issue is dealt with under bug #690471, the other 'special' characters are probably dealt with by DSCEncoding, and I don't see a problem with the dates, so I am closing this as 'worksforme'. If you continue to see a problem please reopen the issue and attach an example file and an example PDFA_def.ps file.
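As an illustration of the two steps described above (removing the PostScript escapement, then interpreting the resulting bytes through an encoding), here is a minimal Python sketch. It is not Ghostscript's code: the function names are hypothetical, and Latin-1 is used as a stand-in for PDFDocEncoding, which agrees with it for the characters in the reported Title but not for every byte value.

```python
import re

# Single-character escapes defined for PostScript string literals.
_SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
           "(": "(", ")": ")", "\\": "\\"}

# A backslash followed by 1-3 octal digits, or by any single character.
_ESCAPE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)


def unescape_ps_string(raw: str) -> bytes:
    """Turn the body of a PostScript string literal into raw bytes."""
    def repl(match):
        token = match.group(1)
        if token[0] in "01234567":
            # Octal escape: becomes one byte of 'binary' data.
            return chr(int(token, 8) & 0xFF)
        if token == "\n":
            # Backslash-newline is a line continuation and yields nothing.
            return ""
        # Known escapes map to their character; otherwise drop the backslash.
        return _SIMPLE.get(token, token)
    return _ESCAPE.sub(repl, raw).encode("latin-1")


def bytes_to_unicode(data: bytes) -> str:
    """Assumption: Latin-1 approximates PDFDocEncoding for these bytes."""
    return data.decode("latin-1")


escaped = r"Test Title bzw. Titel \251\(\344\366\374\337\304\326\334\)"
print(bytes_to_unicode(unescape_ps_string(escaped)))
```

Applied to the string Preflight showed for the Title, this gives back the text with the copyright sign, parentheses and umlauts, which is what the XMP side would need to contain in order to match.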
Created attachment 5454 [details] PDFA_def.ps (with Umlaut in Title, Subject, Keywords)
Thank you! No more problems with the brackets when using the parameter "-sDSCEncoding=PDFDocEncoding".

I cannot check the umlauts (äöüÄÖÜß) since my version 8.70 (2009-07-31) predates October 2009. I could not find a nightly or developer build.

Ghostscript command used:

    gswin32c.exe -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -sFONTPATH=C:\WINDOWS\Fonts -dUseCIEColor -sProcessColorModel=DeviceCMYK -sDEVICE=pdfwrite -dPDFACompatibilityPolicy=1 -sDSCEncoding=PDFDocEncoding -sOutputFile=aOut.pdf PDFA_def.ps a.pdf
Since the only 'build' (executable) we ever distribute is for Windows, this comment assumes that you need help building on Windows.

The current sources are available from our repository, svn.ghostscript.com. The web page at that host has the overview. Check out the source using:

    svn co http://svn.ghostscript.com/ghostscript/trunk/gs

The TortoiseSVN svn client is used by several Artifex staff on Windows.

Then build it from an MS-DOS prompt window by changing to the top 'gs' directory and using:

    nmake -f psi/msvc32.mak

The resulting binary .exe and .dll will be in 'bin'.

Note that the free 'Visual Studio C++ Express' from Microsoft is all you need to build Ghostscript (if you don't already have MSVC).
If you don't select an encoding using DSCEncoding, then the data written to the XMP section is incorrect, even with the escapement changes. Although the characters are no longer escaped, they are required to be in Unicode for the XMP section, and unless the string is already Unicode it will be written incorrectly, because it is written without conversion. Even if it is Unicode, the PDF metadata (/Title etc.) is not; it is encoded using PDFDocEncoding. So it's probably best to set -sDSCEncoding=PDFDocEncoding whenever you use any characters outside the regular 7-bit ASCII set.
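A small Python sketch of why copying the bytes without conversion goes wrong. It assumes the XMP packet is UTF-8 and again uses Latin-1 as a stand-in for PDFDocEncoding; the helper names are hypothetical and nothing here is taken from the pdfwrite source.

```python
def xmp_bytes_without_conversion(info_bytes: bytes) -> bytes:
    """Copy the Info dictionary bytes to XMP unchanged (the faulty case)."""
    return info_bytes


def xmp_bytes_with_conversion(info_bytes: bytes) -> bytes:
    """Decode as (approximate) PDFDocEncoding, then re-encode as UTF-8."""
    return info_bytes.decode("latin-1").encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


title = b"Titel \xe4"   # 0xE4 is a-umlaut as a single byte
print(is_valid_utf8(xmp_bytes_without_conversion(title)))
print(xmp_bytes_with_conversion(title))
```

Plain 7-bit ASCII text is identical in both forms, which is why the problem only shows up once umlauts or the copyright sign appear in the metadata.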
Created attachment 6297 [details] aPDFAtest.zip
The attached file aPDFAtest.zip reproduces the Preflight error described in comment 0 (and has the two additional Preflight errors from Bug 691319, too). In the ZIP archive you will find:

- PDFA_defUmlaut.ps (like the attached PDFA_def.ps, only the gs version number changed in the text)
- aPDFtest.bat (the gs command used)
- ISOcoated_v2_300_eci.icc (for completeness)
- aPDF2test.pdf (just a PDF file, taken from Bug 690803)
- aPDF2test.pdfa.pdf (the PDF/A produced with Ghostscript V.8.71 / aPDFtest.bat)
- aPDF2test.pdfa_report.pdf (the Preflight V.9.2 error report)
If you uncomment the following two lines in PDFA_defUmlaut.ps:

    /CreationDate (D:200808080808Z)
    /ModDate (D:200808080808Z)

then Preflight reports two additional errors:
ModDate and CreationDate: inconsistent XMP-Metadata with document info (respectively document properties)

See attachment aPDF2test.pdfa_report2.pdf
Created attachment 6298 [details] aPDF2test.pdfa_report2.pdf
(In reply to comment #9)
> if you uncomment the two following lines in PDFA_defUmlaut.ps:
> /CreationDate (D:200808080808Z)
> /ModDate (D:200808080808Z)
>
> then Preflight reports two additional errors:
> ModDate and CreationDate: inconsistent XMP-Metadata with document info
> (respectively document properties)

This is a really *bad* idea. The CreationDate and ModDate are normally filled in at the time the document is created. You can override them in PostScript, but you can't override the XMP metadata creation that way. As a result the two will not match if you do that.

It is not intended that it is possible to create a false CreationDate and/or ModDate by using PostScript.
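For reference, the Info dictionary stores dates in the PDF "D:" string form while XMP uses ISO 8601 style dates, so a consistency check has to compare the two after conversion. The Python sketch below shows one way to map the formats tried in this report onto the XMP form. The function name is hypothetical, missing fields are filled with defaults as an assumption, and this is neither Ghostscript's nor Acrobat's actual logic.

```python
import re

# D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional Z or +HH'mm' offset.
# The apostrophes are optional here so that the "+0100" form also parses.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def pdf_date_to_xmp(value: str) -> str:
    """Convert a PDF date string to an ISO 8601 style date-time string."""
    match = _PDF_DATE.match(value)
    if match is None:
        raise ValueError("not a PDF date string: %r" % value)
    year, month, day, hour, minute, second, zulu, sign, tzh, tzm = match.groups()
    # Assumption: absent fields default to the first month/day and to zero.
    result = "%s-%s-%sT%s:%s:%s" % (
        year, month or "01", day or "01",
        hour or "00", minute or "00", second or "00",
    )
    if zulu:
        result += "Z"
    elif sign:
        result += "%s%s:%s" % (sign, tzh, tzm or "00")
    return result


for sample in ("D:20090917", "D:20090917152755+0100", "D:200808080808Z"):
    print(sample, "->", pdf_date_to_xmp(sample))
```

Whatever value the pdfmark writes into the Info dictionary, the XMP date is still generated from the real creation time, so the converted Info value and the XMP value differ and Preflight flags the pair.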
(In reply to comment #8)
> The attached file aPDFAtest.zip reproduces the Preflight-error described in
> comment 0 (and has the additional two Preflight-errors from Bug 691319, too).

Executing the aPDFtest.bat (with suitable alteration to the path for GS), using current source code, the file passes Acrobat 9 pre-flight without error (except for the TT issue noted below).

I've checked the Title, Subject and Author fields in both the Info dictionary and the XMP metadata. The characters with umlauts, the parentheses and the copyright symbol are all present in both sets of metadata, and appear to match. Given that the preflight doesn't complain I'm inclined to believe they do match.

The issue with Encodings being applied to Symbolic TrueType fonts already has several bug reports against it (#690744, #691036, #691319).

Closing the issue as worksforme, as the original issue was fixed in rev 10142 and by setting DSCEncoding to PDFDocEncoding, the TrueType issue is being separately tracked, and the attempt to modify dates is not supported.