Summary: | invalid xml char value in PDF/A XMP | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Matteo Gamboz <gamboz> |
Component: | PDF Writer | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED DUPLICATE | ||
Severity: | normal | ||
Priority: | P4 | ||
Version: | master | ||
Hardware: | PC | ||
OS: | Linux | ||
Customer: | | Word Size: | --- |
Attachments: | all files involved |
(In reply to comment #0)
> Hi, I get an invalid xml char in the value of the attribute DocumentID of the
> rdf:Description tag

How are you identifying an invalid character?

> In the attached b.pdf, the problematic char is the sequence "" at line 128.

b.pdf is not a valid PDF/A file; in particular, it does not have a PDF/A entry and is missing Document Metadata. Did you really set the -dPDFA flag?

I believe that you are not using the master copy of Ghostscript, especially since the document is marked as having been created by Ghostscript 9.04, whereas the master is at Ghostscript 9.05 (PRERELEASE).

This looks very much like a duplicate of bug #692422, which has already been fixed. You can either build the current source or wait until the next release, where this fix will be included.

*** This bug has been marked as a duplicate of bug 692422 ***

(In reply to comment #1)
> (In reply to comment #0)
> > Hi, I get an invalid xml char in the value of the attribute DocumentID of the
> > rdf:Description tag
>
> How are you identifying an invalid character?

I was trying to validate another file with https://www.pdf-tools.com/pdf/pdfa-online-pruefen.aspx and got the XML error. So I simplified the source as much as I could while still reproducing the error, and ended up with x.tex/b.pdf.

Validating b.pdf at the above link gives:

Validating file "b.pdf" for conformance level pdfa-1b
XML line 9:101: xmlParseCharRef: invalid xmlChar value 8.
The value of the key N is 4 but must be 3.
The document does not conform to the requested standard.
The document doesn't conform to the PDF reference (missing required entries, wrong value types, etc.).
The document's meta data is either missing or inconsistent or corrupt.
Done.

But if you were asking just about the XML, I would do something like:

$ sed -n '/<x:xmpmeta/,/<\/x:xmpmeta/p' b.pdf > f.xml
$ xmllint f.xml

> > In the attached b.pdf, the problematic char is the sequence "" at line 128.
>
> b.pdf is not a valid PDF/A file; in particular, it does not have a PDF/A entry
> and is missing Document Metadata. Did you really set the -dPDFA flag?

Yep, just did it again.

> I believe that you are not using the master copy of Ghostscript, especially
> since the document is marked as having been created by Ghostscript 9.04, whereas the
> master is at Ghostscript 9.05 (PRERELEASE).

I am really sorry about this; I did not realize that a new version was out. I can confirm that with gs 9.05 the problem disappears.
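The validator message "xmlParseCharRef: invalid xmlChar value 8" means that the XMP packet contains a numeric character reference to U+0008 (backspace), a code point the XML 1.0 Char production does not allow, so a conforming parser must reject it. As a rough sketch, a small POSIX shell function can list every such reference in the packet extracted by the sed command above. The function name bad_charrefs is hypothetical (it is not part of Ghostscript or libxml2), and it assumes a grep that supports -o, as GNU and BSD grep do.

```sh
#!/bin/sh
# Sketch (hypothetical helper, not part of Ghostscript or libxml2):
# print each numeric character reference in an XML file whose code point is
# outside the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Only references such as &#8; or &#x8; are checked; raw control bytes in
# the file are not reported. Assumes a grep that supports -o (GNU or BSD).
bad_charrefs() {
  grep -oE '&#(x[0-9A-Fa-f]+|[0-9]+);' "$1" | while IFS= read -r ref; do
    body=${ref#??}   # drop the leading "&#"
    body=${body%;}   # drop the trailing ";"
    case $body in
      x*) cp=$((0x${body#x})) ;;   # hexadecimal reference, e.g. &#x8;
      *)  # decimal reference: strip leading zeros so it is not read as octal
          digits=$(printf '%s' "$body" | sed 's/^0*//')
          cp=$((${digits:-0})) ;;
    esac
    # ok becomes 1 if the code point falls in any allowed range
    ok=$(( cp == 9 || cp == 10 || cp == 13 ))
    ok=$(( ok || (cp >= 0x20 && cp <= 0xD7FF) ))
    ok=$(( ok || (cp >= 0xE000 && cp <= 0xFFFD) ))
    ok=$(( ok || (cp >= 0x10000 && cp <= 0x10FFFF) ))
    [ "$ok" -eq 1 ] || printf '%s\n' "$ref"
  done
}
```

Usage, on the packet extracted above: bad_charrefs f.xml. Each offending reference is printed on its own line; no output means no invalid character references were found.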
Created attachment 7852
all files involved

Hi, I get an invalid xml char in the value of the attribute DocumentID of the rdf:Description tag.

The command line that I use is:

gs -P -o b.pdf \
  -sDEVICE=pdfwrite \
  -dProcessColorModel=/DeviceCMYK \
  -dPDFA PDFA_def.ps \
  x.pdf

I am attaching all the involved files: x.tex (to be compiled with pdflatex from TeX Live 2010), PDFA_def.ps (which differs from the distributed version only in the customization of /ICCProfile, /Title and /OutputConditionIdentifier), the ICC profile, and the resulting PDF files.

In the attached b.pdf, the problematic char is the sequence "" at line 128.

I think that the problem may be related to the macro "\section", since without it the problem disappears.

Thanks
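The diagnosis in this thread hinged on the creator version (9.04 versus the 9.05 master), and the reporter confirmed that 9.05 no longer shows the problem. A sketch of a guard that checks the installed Ghostscript before running the PDF/A conversion above might look like this. Both function names are hypothetical, and version_ge assumes GNU sort's -V (version sort) option, which is available on typical Linux systems but not in every sort implementation.

```sh
#!/bin/sh
# Sketch: succeed when version $1 is at least version $2.
# Relies on GNU sort's -V (version sort); the function name is hypothetical.
version_ge() {
  lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
  [ "$lowest" = "$2" ]
}

# Hypothetical guard: refuse to build a PDF/A file with a Ghostscript older
# than 9.05, the release in which this thread reports the problem is gone.
check_gs_for_pdfa() {
  v=$(gs --version) || return 1
  if version_ge "$v" 9.05; then
    return 0
  fi
  echo "Ghostscript $v is older than 9.05; upgrade before using -dPDFA" >&2
  return 1
}
```

For example: check_gs_for_pdfa && gs -P -o b.pdf -sDEVICE=pdfwrite -dProcessColorModel=/DeviceCMYK -dPDFA PDFA_def.ps x.pdf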