Bug 693215

Summary: A few problems with XMP
Product: Ghostscript Reporter: roucaries.bastien+gs
Component: PDF Writer Assignee: Ken Sharp <ken.sharp>
Status: RESOLVED INVALID    
Severity: minor CC: jackie.rosen
Priority: P4    
Version: 9.05   
Hardware: PC   
OS: Linux   
Customer: Word Size: ---
Attachments: test postscript
test.pdf

Description roucaries.bastien+gs 2012-07-23 18:17:11 UTC
According to two Debian bug reports, there are a few problems with XMP:

[0] When run on the attached file, ps2pdf generates invalid XML (and hence,
invalid XMP) on line 9 of the metadata by inserting a character
reference (&#6;).  That character is not allowed in XML and evince
complains when trying to read the XMP metadata.  The broken PDF is
included as well.

For instance

jonas@atreju:~$ evince test.pdf
Entity: line 9: parser error : xmlParseCharRef: invalid xmlChar value 6
.com/xap/1.0/mm/' 
xapMM:DocumentID='uuid:6573824f-4ee4-11ec-0000-7897baf4782&#6;

[1] ghostscript lies about the toolkit used to create the XMP in x:xmptk.
It uses an identifier normally associated with Adobe's toolkit even
though it does not use that toolkit[0].  The attribute is unspecified by
the XMP specification, so it should probably be removed.

[2] The value for rdf:about is not a URI.  XMP is based on RDF, and RDF
requires that rdf:about be a URI.  The XMP specification does as well.
There's really no reason to generate a UUID per document and
rdf:about="" is more meaningful anyway, so again, it should probably be
removed.

[3] "uuid:" (as used in xapMM:DocumentID) is not a registered URI scheme.
There is a perfectly good existing URN specification for that, so
"urn:uuid:" should be used instead.  (The use of "adobe:ns:meta/" as a
namespace is unfortunate, but we're stuck with it now.)
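
As an illustration of item [3] only, here is a minimal Python sketch of rewriting a "uuid:" identifier into the "urn:uuid:" form. The helper name to_urn and the sample UUID are hypothetical and are not taken from Ghostscript; the only library behaviour relied on is the standard uuid module's urn attribute.

```python
import uuid


def to_urn(document_id):
    """Convert 'uuid:<value>' (or a bare UUID string) to the 'urn:uuid:' form.

    Hypothetical helper for illustration; not part of Ghostscript.
    """
    if document_id.startswith("uuid:"):
        document_id = document_id[len("uuid:"):]
    # uuid.UUID validates the string; .urn renders it as 'urn:uuid:...'.
    return uuid.UUID(document_id).urn


# Made-up sample value, not the DocumentID from the attached file.
print(to_urn("uuid:12345678-1234-5678-1234-567812345678"))
```

A malformed value, such as the truncated DocumentID quoted in item [0], would make uuid.UUID raise ValueError rather than produce a URN.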

Bastien
Comment 1 roucaries.bastien+gs 2012-07-23 18:17:51 UTC
Created attachment 8788 [details]
test postscript
Comment 2 roucaries.bastien+gs 2012-07-23 18:18:12 UTC
Created attachment 8789 [details]
test.pdf
Comment 3 Ken Sharp 2012-08-02 07:54:53 UTC
(In reply to comment #0)
> According to two debian bug report they are a few problem with xmp:
> 
> [0] When run on the attached file, ps2pdf generates invalid XML (and hence,
> invalid XMP) on line 9 of the metadata by inserting a character
> reference (&#6;).  That character is not allowed in XML and evince
> complains when trying to read the XMP metadata.  The broken PDF is
> included as well.

This looks to be a duplicate of bug #692422, which is already fixed; I cannot reproduce the problem.
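
For anyone who wants to check a metadata stream for the problem described in [0], the following is a rough Python sketch, not the fix applied for bug #692422. It scans text for numeric character references and reports those whose code point falls outside the XML 1.0 Char production. The function names are hypothetical.

```python
import re

# Matches decimal (&#6;) and hexadecimal (&#x6;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_xml10_char(cp):
    """Return True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def invalid_char_refs(text):
    """Return the character references in text that XML 1.0 does not allow."""
    bad = []
    for m in CHAR_REF.finditer(text):
        cp = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        if not is_xml10_char(cp):
            bad.append(m.group(0))
    return bad


# Fragment taken from the evince error output in comment #0.
print(invalid_char_refs("xapMM:DocumentID='uuid:6573824f-4ee4-11ec-0000-7897baf4782&#6;"))
```

The sketch only inspects character references; it does not parse the XML or check for raw control characters in the stream.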

> [1] ghostscript lies about the toolkit used to create the XMP in x:xmptk.
> It uses an identifier normally associated with Adobe's toolkit even
> though it does not use that toolkit[0].  The attribute is unspecified by
> the XMP specification, so it should probably be removed.
> 
> [2] The value for rdf:about is not a URI.  XMP is based on RDF, and RDF
> requires that rdf:about be a URI.  The XMP specification does as well.
> There's really no reason to generate a UUID per document and
> rdf:about="" is more meaningful anyway, so again, it should probably be
> removed.
> 
> [3] "uuid:" (as used in xapMM:DocumentID) is not a registered URI scheme.
> There is a perfectly good existing URN specification for that, so
> "urn:uuid:" should be used instead.  (The use of "adobe:ns:meta/" as a
> namespace is unfortunate, but we're stuck with it now.)

In all of these cases I do not intend to make any changes. None of them causes any current problem, and at least one validity checker is known to parse the XML looking for "uuid:" (item [3] above).

Removing any of these is (in my opinion) more likely to cause problems with validation tools than to alleviate them. Of course, if any of these can be shown to cause problems, I will be happy to rethink this.
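
To make the compatibility concern concrete, here is a hedged Python sketch of a consumer that accepts both spellings of the identifier. How the validity checker mentioned above actually matches "uuid:" is not stated in this report, so this is only an assumption about what a tolerant reader could look like; parse_document_id is a hypothetical name.

```python
import uuid


def parse_document_id(value):
    """Return a uuid.UUID for 'urn:uuid:...' or 'uuid:...' values, else None.

    Hypothetical helper; it does not describe any existing checker.
    """
    # Try the longer prefix first so 'urn:uuid:' is not misread.
    for prefix in ("urn:uuid:", "uuid:"):
        if value.startswith(prefix):
            try:
                return uuid.UUID(value[len(prefix):])
            except ValueError:
                # Prefix matched but the remainder is not a valid UUID.
                return None
    return None


# Made-up sample value; both spellings parse to the same UUID.
old_style = parse_document_id("uuid:12345678-1234-5678-1234-567812345678")
new_style = parse_document_id("urn:uuid:12345678-1234-5678-1234-567812345678")
print(old_style == new_style)
```

A checker that tests for a leading "uuid:" would reject the "urn:uuid:" form, whereas one that searches for the substring would still match it.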