Bug 692474 - invalid xml char value in PDF/A XMP
Summary: invalid xml char value in PDF/A XMP
Status: RESOLVED DUPLICATE of bug 692422
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: PC Linux
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-29 13:12 UTC by Matteo Gamboz
Modified: 2011-08-29 15:31 UTC
CC List: 0 users

See Also:
Customer:
Word Size: ---


Attachments
all files involved (28.28 KB, application/gzip)
2011-08-29 13:12 UTC, Matteo Gamboz

Description Matteo Gamboz 2011-08-29 13:12:52 UTC
Created attachment 7852
all files involved

Hi, I get an invalid XML character in the value of the DocumentID attribute of the rdf:Description tag.

The command line that I use is:
gs -P -o b.pdf \
-sDEVICE=pdfwrite \
-dProcessColorModel=/DeviceCMYK \
-dPDFA PDFA_def.ps \
x.pdf

I am attaching all the involved files:
x.tex (to be compiled with pdflatex from texlive 2010),
PDFA_def.ps (which differs from the distributed version only in the customization of /ICCProfile, /Title and /OutputConditionIdentifier),
the icc profile and the resulting pdf files.

In the attached b.pdf, the problematic character is the sequence "&#8;" at line 128.
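XML 1.0 only allows character references to code points in its `Char` production (tab, LF, CR, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF), so `&#8;` (U+0008, backspace) makes the XMP packet ill-formed wherever it appears. A minimal Python sketch that scans text for such references follows; the function names and the sample `DocumentID` value are hypothetical and only illustrate the rule.

```python
import re
from typing import List

# Decimal (&#8;) or hexadecimal (&#x8;) character reference, per the XML CharRef syntax.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_xml10_char(cp: int) -> bool:
    """True if code point cp is allowed by the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_char_refs(text: str) -> List[str]:
    """Return each character reference in text that points at a disallowed code point."""
    bad = []
    for m in CHAR_REF.finditer(text):
        hex_digits, dec_digits = m.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not is_xml10_char(cp):
            bad.append(m.group(0))
    return bad


# Hypothetical attribute value containing the same reference as b.pdf.
print(invalid_char_refs('<rdf:Description DocumentID="abc&#8;def"/>'))  # ['&#8;']
```

A textual scan like this only flags character references; a raw control byte in the packet would need a separate check, which a real XML parser performs anyway.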

I think the problem may be related to the "\section" macro, since the problem disappears when it is removed.

Thanks
Comment 1 Ken Sharp 2011-08-29 14:12:06 UTC
(In reply to comment #0)

> Hi, I get an invalid xml char in the value of the attribute DocumentID of the
> rdf:Description tag 

How are you identifying an invalid character?

 
> In the attached b.pdf, the problematic char is the sequence "&#8;" at line 128.

b.pdf is not a valid PDF/A file; in particular, it does not have a PDF/A entry and is missing document metadata. Did you really set the -dPDFA flag?

I believe you are not using the master copy of Ghostscript: the document is marked as having been created by Ghostscript 9.04, whereas master is at Ghostscript 9.05 (PRERELEASE).

This looks very much like a duplicate of bug #692422 which has already been fixed. You can either build the current source, or wait until the next release where this fix will be included.

*** This bug has been marked as a duplicate of bug 692422 ***
Comment 2 Matteo Gamboz 2011-08-29 15:31:24 UTC
(In reply to comment #1)
> (In reply to comment #0)
> 
> > Hi, I get an invalid xml char in the value of the attribute DocumentID of the
> > rdf:Description tag 
> 
> How are you identifying an invalid character ?


I was trying to validate another file with
https://www.pdf-tools.com/pdf/pdfa-online-pruefen.aspx
and got the XML error. So I simplified the source as much as I could while still reproducing the error, and ended up with x.tex/b.pdf.

Validating b.pdf with the validator linked above, I get:

Validating file "b.pdf" for conformance level pdfa-1b
XML line 9:101: xmlParseCharRef: invalid xmlChar value 8.
The value of the key N is 4 but must be 3.
The document does not conform to the requested standard.
The document doesn't conform to the PDF reference (missing required entries, wrong value types, etc.).
The document's meta data is either missing or inconsistent or corrupt.
Done.

But if you were asking about the XML alone, I would do something like:
$ sed -n '/<x:xmpmeta/,/<\/x:xmpmeta/p' b.pdf > f.xml
$ xmllint f.xml
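The same check can be sketched in Python with only the standard library, as a rough equivalent of the sed/xmllint pipeline. Like the sed command, it assumes the XMP packet is stored uncompressed in the PDF; unlike sed, it slices the exact bytes from `<x:xmpmeta` to `</x:xmpmeta>` instead of whole lines, and it only looks at the first packet. `extract_xmp` and `check_xmp` are hypothetical helper names, and the error text comes from Python's expat-based parser, so its wording will differ from xmllint's.

```python
import sys
import xml.etree.ElementTree as ET
from typing import Optional

XMP_START = b"<x:xmpmeta"
XMP_END = b"</x:xmpmeta>"


def extract_xmp(pdf_bytes: bytes) -> bytes:
    """Return the bytes from the first '<x:xmpmeta' through the next '</x:xmpmeta>'.

    Assumes the XMP packet is stored uncompressed, as the sed command does;
    a compressed metadata stream will not be found.
    """
    start = pdf_bytes.find(XMP_START)
    if start < 0:
        raise ValueError("no <x:xmpmeta> element found")
    end = pdf_bytes.find(XMP_END, start)
    if end < 0:
        raise ValueError("unterminated <x:xmpmeta> element")
    return pdf_bytes[start:end + len(XMP_END)]


def check_xmp(xmp: bytes) -> Optional[str]:
    """Parse the packet as XML; return None if well-formed, else the parser's message."""
    try:
        ET.fromstring(xmp)
    except ET.ParseError as err:
        return str(err)
    return None


# Run only when a PDF path is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        problem = check_xmp(extract_xmp(f.read()))
    print("XMP is well-formed" if problem is None else f"XMP error: {problem}")
```

Saved as, say, check_xmp.py (an arbitrary name), it would be run as `python check_xmp.py b.pdf`; with a packet containing `&#8;`, the parser should report an invalid character reference rather than "well-formed".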


> 
> 
> > In the attached b.pdf, the problematic char is the sequence "&#8;" at line 128.
> 
> b.pdf is not a valid PDF/A file, in particular it does not have a PDF/A entry
> and is missing Document Metadata. Did you really set the -dPDFA flag ?

yep, just did it again

> I believe that you are not using the master copy of Ghostscript, especially
> since the document is marked as having been created by Ghostscript 9.04, the
> master is at Ghostscript 9.05 (PRERELEASE).

I am really sorry about this; I did not realize that a new version was out.

I can confirm that with gs 9.05 the problem disappears.