Bug 696472

Summary: pdfwrite lacks support for /Metadata pdfmark
Product: Ghostscript    Reporter: Reinhard Nißl <reinhard.nissl>
Component: PDF Writer    Assignee: Ken Sharp <ken.sharp>
Status: RESOLVED FIXED
Severity: enhancement    CC: akasha.jethwa
Priority: P4
Version: 9.18
Hardware: PC
OS: Windows 7
Customer:    Word Size: ---
Attachments: ZUGFeRD meta data sample
XML file which is to be embedded into the PDF file
A conforming ZUGFeRD PDF file sample
ZUGFeRD implementation guide (schema on pages 13 and 14)
minimal test files
ZUGFeRD meta data sample (updated)
The first validated ZUGFeRD PDF document created by GhostScript

Description Reinhard Nißl 2015-12-21 09:36:04 UTC
Created attachment 12194
ZUGFeRD meta data sample

Hi,

I'd like to create a PDF document with an embedded ZUGFeRD invoice, i.e. an XML file. ZUGFeRD further requires that the PDF contains some meta data which declares its conformity with ZUGFeRD and references, for example, the embedded XML file which provides the invoice data.

This is where I'm stuck at the moment, as GhostScript lacks support for the /Metadata pdfmark. Since GhostScript also generates its own meta data for PDF/A conformity, I cannot work around this limitation by putting my own meta data into the {Catalog}, because GhostScript overwrites this entry later.

Therefore I'd like to file a feature request to add support for /Metadata pdfmark.

The attached file can be used to test the implementation.

Thanks in advance.
Comment 1 Ken Sharp 2015-12-22 06:56:21 UTC
The attached file has several problems. Firstly, it does not include the XmlFileData (stored in d:\\ZUGFeRD-invoice-1.xml), so the file as it stands simply won't work.

Secondly, the (apparent) intention of the Metadata pdfmark is to totally replace the Metadata which would be created by pdfwrite. To do this, your Metadata would have to include the full PDF/A-3b XML, as well as the Creator, Producer, Title etc. in XML format, and these would have to be consistent with the data emitted in the non-XML Info dictionary. The supplied file doesn't do that either.

Nor does the attached file include the entire schema, which is also a requirement; the file only covers the usage of the extension schema but does not include the schema itself.

It's not entirely clear to me from the specification (which is not very good as far as the PDF format is concerned) exactly where the additional XML is expected to be found. The only example file I can find doesn't even contain the required fields.

Attempting to use a Metadata pdfmark with Adobe Acrobat Distiller was futile; it appears Distiller always overrides the Metadata too.

I'm going to need (at the very least) an example of a conforming PDF file before I can decide whether this is even possible or not.
Comment 2 Reinhard Nißl 2015-12-22 07:55:16 UTC
Created attachment 12199
XML file which is to be embedded into the PDF file

As this file was missing, the previously posted sample didn't work.
Comment 3 Reinhard Nißl 2015-12-22 08:19:46 UTC
Created attachment 12200
A conforming ZUGFeRD PDF file sample

Object 2 0 contains the Metadata.

Regarding "pdfmark Reference Manual", section "Add Metadata to the Catalog (Metadata)", sentence "Otherwise, the metadata associated with the stream XMPStreamName is added to the Catalog object with the key Metadata.":

I think the word "added" is meant in the sense that the /Metadata pdfmark may be repeated multiple times, and Distiller or GhostScript takes care of collecting all the meta data and putting it into the {Catalog} or whichever other object it is meant for.

The EMBED pdfmark may also be repeated multiple times and it is up to Distiller or GhostScript to collect the information and write the /Names /EmbeddedFiles /Names array to the {Catalog}.

An implementation like that for the /Metadata pdfmark would at least be very convenient. But I cannot tell whether this is the right interpretation of the pdfmark Reference Manual, or whether Distiller's current implementation is correct.

Providing all the meta data on my own would be possible too, but isn't very convenient. Maybe the convenient implementation of the /Metadata pdfmark could be enabled by a command line switch or PostScript fragment, if emulating Distiller's behavior has the highest priority.
Comment 4 Ken Sharp 2015-12-22 08:54:44 UTC
(In reply to Reinhard Nißl from comment #3)

> Regarding "pdfmark Reference Manual", section "Add Metadata to the Catalog
> (Metadata)", sentence "Otherwise, the metadata associated with the stream
> XMPStreamName is added to the Catalog object with the key Metadata.":
> 
> I think the word "added" is meant that way, that /Metadata pdfmark may be
> repeated multiple times and Distiller or GhostScript takes care to collect
> all meta data and put it into the {Catalog} or whichever other object it is
> meant for.

That's one way to interpret it, but I disagree with that interpretation. The spec says 'is added to the Catalog object with the key Metadata'; it does not say it is concatenated with any existing value associated with that key.

PostScript (pdfmark is a PostScript operator) does not concatenate values in dictionaries (the catalog is a dictionary). If you add a value to a dictionary with a key which is already present in the dictionary then the old value associated with the key is replaced with the new value.

Other pdfmark operations do not concatenate in this circumstance either, so this would be an exception to the general rule. Since it is not clearly stated as an exception I am inclined to believe it should follow the usual implementation. In which case the Metadata in the Catalog object would be completely replaced.
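To make the two readings concrete, here is a toy Python model (not Ghostscript's implementation; the function name is hypothetical) of a Catalog receiving repeated Metadata marks. With ordinary dictionary semantics, as with PostScript put, the last mark wins; the accumulating reading from comment 3 keeps every value.

```python
# Toy model of repeated /Metadata pdfmarks; not Ghostscript code.
# apply_metadata_marks is a hypothetical name used only for illustration.


def apply_metadata_marks(marks, concatenate=False):
    """Return the Catalog's final /Metadata value after applying marks in order."""
    catalog = {}
    for xmp in marks:
        if concatenate and "Metadata" in catalog:
            # Reading proposed in comment 3: accumulate all metadata.
            catalog["Metadata"] += xmp
        else:
            # Dictionary (PostScript 'put') semantics: the new value replaces the old.
            catalog["Metadata"] = xmp
    return catalog.get("Metadata")


if __name__ == "__main__":
    marks = ["<pdfwrite XMP/>", "<user XMP/>"]
    print(apply_metadata_marks(marks))                    # <user XMP/>
    print(apply_metadata_marks(marks, concatenate=True))  # <pdfwrite XMP/><user XMP/>
```

Note that simply joining XMP packets, as the concatenate branch does, would not yield valid XMP; a real merge would have to combine the rdf:Description elements inside a single packet.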


> At least an implementation like that for /Metadata pdfmark would be very
> convenient. But I cannot tell whether this is right interpretation of the
> pdfmark Reference Manual and whether Distiller's current implementation is
> correct.

Nor can I, and since I cannot (so far) get Distiller to honour this pdfmark I cannot tell what Adobe may have intended.


> Providing all the meta data on my own would be possible too but isn't very
> convenient.

I'm not certain it is entirely possible.


> Maybe the convenient implementation of /Metadata pdfmark could
> be enabled by some command line switch or PostScript fragment if emulation
> of Distiller's behavior has highest priority.

We regard the behaviour of Adobe implementations as the reference in all such circumstances. Which means that right now, as far as I can tell, we are completely conforming, since we too ignore the pdfmark.
Comment 5 Ken Sharp 2015-12-22 09:08:21 UTC
As I rather suspected, the only example I could find was not compliant (so much for that creator...).

The Metadata sample you supplied isn't sufficient since you didn't include the schema. In addition to the metadata you have:

<rdf:Description rdf:about="" xmlns:zf="urn:ferd:pdfa:invoice:rc#">
    <zf:DocumentType>INVOICE</zf:DocumentType>
    <zf:DocumentFileName>ZUGFeRD-invoice.xml</zf:DocumentFileName>
    <zf:Version>RC</zf:Version>
    <zf:ConformanceLevel>BASIC</zf:ConformanceLevel>
</rdf:Description>	

you also need:

<rdf:Description rdf:about=""
      xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
      xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
      xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">
  <pdfaExtension:schemas>
    <rdf:Bag>
      <rdf:li rdf:parseType="Resource">
         <pdfaSchema:schema>ZUGFeRD PDFA Extension Schema</pdfaSchema:schema>
         <pdfaSchema:namespaceURI>urn:ferd:pdfa:invoice:rc#</pdfaSchema:namespaceURI>
         <pdfaSchema:prefix>zf</pdfaSchema:prefix>
         <pdfaSchema:property>
           <rdf:Seq>
             <rdf:li rdf:parseType="Resource">
               <pdfaProperty:name>DocumentFileName</pdfaProperty:name>
               <pdfaProperty:valueType>Text</pdfaProperty:valueType>
               <pdfaProperty:category>external</pdfaProperty:category>
               <pdfaProperty:description>name of the embedded XML invoice file</pdfaProperty:description>
             </rdf:li>
             <rdf:li rdf:parseType="Resource">
               <pdfaProperty:name>DocumentType</pdfaProperty:name>
               <pdfaProperty:valueType>Text</pdfaProperty:valueType>
               <pdfaProperty:category>external</pdfaProperty:category>
               <pdfaProperty:description>INVOICE</pdfaProperty:description>
             </rdf:li>
             <rdf:li rdf:parseType="Resource">
               <pdfaProperty:name>Version</pdfaProperty:name>
               <pdfaProperty:valueType>Text</pdfaProperty:valueType>
               <pdfaProperty:category>external</pdfaProperty:category>
               <pdfaProperty:description>The actual version of the ZUGFeRD data</pdfaProperty:description>
             </rdf:li>
             <rdf:li rdf:parseType="Resource">
               <pdfaProperty:name>ConformanceLevel</pdfaProperty:name>
               <pdfaProperty:valueType>Text</pdfaProperty:valueType>
               <pdfaProperty:category>external</pdfaProperty:category>
               <pdfaProperty:description>The conformance level of the ZUGFeRD data</pdfaProperty:description>
             </rdf:li>
           </rdf:Seq>
         </pdfaSchema:property>
      </rdf:li>
    </rdf:Bag>
  </pdfaExtension:schemas>
</rdf:Description>

Which is the actual schema description.
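Because every zf: property used in the metadata must be declared in the extension schema, a quick consistency check can catch a mismatch between the two descriptions before running a validator. A minimal Python sketch using the standard library's ElementTree; it assumes the rdf:Description fragments are passed in as text exactly as quoted above (they are wrapped in a temporary rdf:RDF element so the rdf: prefix resolves). The function name is hypothetical, and this is no substitute for a PDF/A validator.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFA_SCHEMA_NS = "http://www.aiim.org/pdfa/ns/schema#"
PDFA_PROPERTY_NS = "http://www.aiim.org/pdfa/ns/property#"
ZF_NS = "urn:ferd:pdfa:invoice:rc#"


def undeclared_properties(descriptions, ns_uri=ZF_NS):
    """Return names of properties used in ns_uri that no extension schema declares."""
    # Wrap the fragments so that rdf:about / rdf:parseType have a declared prefix.
    root = ET.fromstring(f'<rdf:RDF xmlns:rdf="{RDF_NS}">{descriptions}</rdf:RDF>')
    prefix = "{" + ns_uri + "}"

    # Properties actually used, e.g. <zf:DocumentType> -> "DocumentType".
    used = {el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)}

    # Properties declared by any schema entry whose namespaceURI matches ns_uri.
    declared = set()
    for entry in root.iter():
        uri = entry.find(f"{{{PDFA_SCHEMA_NS}}}namespaceURI")
        if uri is not None and (uri.text or "").strip() == ns_uri:
            for name in entry.iter(f"{{{PDFA_PROPERTY_NS}}}name"):
                declared.add((name.text or "").strip())

    return used - declared
```

Passing both rdf:Description fragments from this comment together should give an empty set; a result such as {'DocumentFileName'} would mean the schema does not declare that property.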

I will try again to persuade Distiller to use the pdfmark.
Comment 6 Reinhard Nißl 2015-12-22 09:49:14 UTC
Created attachment 12201
ZUGFeRD implementation guide (schema on pages 13 and 14)

Ken, I must admit that I'm taking my first steps with PDF schema metadata while trying to support ZUGFeRD, so thanks a lot for your explanations on that issue.

With your explanation, I now know what the appendix on pages 13 and 14 of the attached document is for.

Thanks for your support so far.
Comment 7 Ken Sharp 2015-12-24 09:26:31 UTC
Created attachment 12209
minimal test files

I have added support for the Metadata pdfmark in commit d4056b5dab63199d86c8fb140807b9b307a427c0

This works the way I believe it should and completely replaces the Metadata produced by the pdfwrite device, or any previous Metadata pdfmark execution. I never was able to persuade Distiller to do anything useful with this pdfmark, but this is how other ones work.

I have also implemented a new, non-standard, pdfmark called "Ext_Metadata" which takes a string parameter. The data in the string will be injected into the XMP produced by pdfwrite (provided there is no Metadata pdfmark as well!)

The attached files demonstrate this feature.

This *should* make it possible to create ZUGFeRD invoices by creating a stream to hold the XML representation (using the EMBED pdfmark), and injecting the extension schema and variable XML data into the Metadata referenced from the Catalog by using the Ext_Metadata pdfmark.
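As an illustration of driving the Ext_Metadata pdfmark from a script, the sketch below writes the PostScript fragment for a given XMP snippet. The /XML key is taken from the sample prefix file quoted in comment 13 rather than from any documentation, and the helper names are hypothetical. Backslashes and parentheses are escaped because the XML is passed as a PostScript string literal.

```python
def ps_string(text):
    """Quote text as a PostScript string literal, escaping backslash and parentheses."""
    # Escape backslashes first so the escapes added below are not doubled.
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def ext_metadata_pdfmark(xmp_fragment):
    """Build an Ext_Metadata pdfmark that injects xmp_fragment into pdfwrite's XMP.

    Assumption: the /XML key follows the sample file in this report.
    """
    return "[ /XML " + ps_string(xmp_fragment) + "\n  /Ext_Metadata pdfmark\n"


if __name__ == "__main__":
    zf = ('<rdf:Description rdf:about="" xmlns:zf="urn:ferd:pdfa:invoice:rc#">'
          '<zf:DocumentType>INVOICE</zf:DocumentType></rdf:Description>')
    print(ext_metadata_pdfmark(zf))
```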

Please note that this feature is essentially untested (I don't have a validation tool for these PDF files) and may require more work.

The same commit fixes a bug I accidentally introduced when permitting PDF/A-3 output, which would prevent the EMBED pdfmark from working unless the output file was produced to be at least PDF/A-2 compliant (oops).
Comment 8 Reinhard Nißl 2015-12-27 12:07:04 UTC
Created attachment 12210
ZUGFeRD meta data sample (updated)

I've updated the attached file to make use of the /Ext_Metadata pdfmark and added the missing schema definitions for ZUGFeRD.
Comment 9 Reinhard Nißl 2015-12-27 12:12:07 UTC
Created attachment 12211
The first validated ZUGFeRD PDF document created by GhostScript

This attachment now validates (with the same minor issues) as attachment 12200 "A conforming ZUGFeRD PDF file sample" does.

Thanks for your support Ken, and a Happy New Year ;-)
Comment 10 Ken Sharp 2015-12-28 04:06:35 UTC
(In reply to Reinhard Nißl from comment #9)
> Created attachment 12211
> The first validated ZUGFeRD PDF document created by GhostScript
> 
> This attachment validates now (with the same minor issues) as attachment
> 12200 "A conforming ZUGFeRD PDF file sample" does.
> 
> Thanks for your support Ken, and a Happy New Year ;-)

Thanks for testing it out for me, it's good to know it actually works! If you find any problems, feel free to open a new report.
Comment 11 John 2017-09-12 01:32:57 UTC
Hello,

I'm working on the same idea and would like to create a PDF document with an embedded ZUGFeRD invoice. So far I have used the comments in this issue to add the /Ext_Metadata, but I'm not able to add the embedded XML file. I'm testing it on my computer and using your attached files like this:
-------------------------------------------------
/XmlFileName	(ZUGFeRD-invoice-1.xml)				def
/XmlFileDesc	(Invoice data in ZUGFeRD XML format)		def
/XmlFileDate	(D:20130121081433+01'00')			def
/XmlFileData	(Documents/git/vorpal_test/ZUGFeRD-invoice-1.xml) (r) file		def
-------------------------------------------------

and finally
-------------------------------------------------

[
  	/_objdef 	{ContentStream}
  	/type 		/stream
  /OBJ pdfmark

  [
  	{ContentStream}	<<
  			/Type		/EmbeddedFile
  			/Subtype	(text/xml) cvn
  			/Params		<<
  					/ModDate	XmlFileDate
  					>>
  			>>
  /PUT pdfmark

  [
  	{ContentStream}	XmlFileData
  /PUT pdfmark

  [
  	{ContentStream}
  /CLOSE pdfmark
-------------------------------------------------

Sorry for the code fragments, but I cannot attach files in this comment.
The Ext_Metadata shows up perfectly, but there is no output for the embedded XML invoice. Could you be so kind as to point me in the right direction?

Regards

John
Comment 12 Reinhard Nißl 2017-09-12 01:51:53 UTC
Hello John,

I'm not sure what you are trying to show with your code fragment.

If this is all you feed to GhostScript, then it won't work because the PDF file specification and further dictionary entries are missing (see PDF specs for more details about attaching files).
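The missing entries can be summarised as a small checklist. The Python sketch below models the {FSDict} dictionary as a plain dict (PDF names written without the leading slash). The list of keys is based on the embed_xml.ps sample and on the PDF/A-3 requirement that embedded file specifications carry /F, /UF and /AFRelationship; the helper is hypothetical, not part of Ghostscript, and it does not check the Catalog side (the /AF array and the EmbeddedFiles name tree created by the EMBED pdfmark).

```python
def filespec_problems(fs):
    """List missing or suspicious entries in a modelled file specification dictionary."""
    problems = []
    # ISO 32000 names this dictionary type Filespec (lower-case s).
    if fs.get("Type") != "Filespec":
        problems.append("/Type should be /Filespec")
    for key in ("F", "UF", "EF", "AFRelationship"):
        if key not in fs:
            problems.append(f"missing /{key}")
    # The /EF dictionary must point at the embedded file stream.
    if "EF" in fs and "F" not in fs["EF"]:
        problems.append("/EF has no /F stream")
    return problems


if __name__ == "__main__":
    fsdict = {
        "Type": "Filespec",
        "F": "ZUGFeRD-invoice-1.xml",
        "UF": "ZUGFeRD-invoice-1.xml",
        "EF": {"F": "{ContentStream}", "UF": "{ContentStream}"},
        "AFRelationship": "Alternative",
    }
    print(filespec_problems(fsdict))  # []
```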

Just use the complete embed_xml.ps and it should work.

Bye.
Comment 13 John 2017-09-12 01:57:10 UTC
Hello Reinhard,

I am using the file you provided, but made some changes. Because PDF/A-3 needs an ICC profile, I added one. I hope I'm not missing something here. I'm very new to PostScript and Ghostscript. The complete file looks like this:

--------------------------------
%!
% This is a sample prefix file for creating a PDF/A document.
% Feel free to modify entries marked with "Customize".
% This assumes an ICC profile to reside in the file (ISO Coated sb.icc),
% unless the user modifies the corresponding line below.

% istring SimpleUTF16BE ostring
/SimpleUTF16BE
{
	dup length
	1 add
	2 mul
	string

	% istring ostring
	dup 0 16#FE put
	dup 1 16#FF put
	2
	3 -1 roll

	% ostring index istring
	{
		% ostring index ichar
		3 1 roll
		% ichar ostring index
		2 copy 16#00 put
		1 add
		2 copy
		5 -1 roll
		% ostring index ostring index ichar
		put
		1 add
		% ostring index
	}
	forall

	% ostring index
	pop
}
bind def

% Define entries in the document Info dictionary :

[ /Title (out-a.pdf)       % Customise
  /DOCINFO pdfmark

% Define the path to the ICC profile :
/ICCProfile (/Users/Empire/Documents/gits/vorpal_test/sRGB_v4_ICC_preference_displayclass.icc) % Customise
def

% Define an ICC profile :

[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA}
<<
  /N currentpagedevice /ProcessColorModel known {
    currentpagedevice /ProcessColorModel get dup /DeviceGray eq
    {pop 1} {
      /DeviceRGB eq
      {3}{4} ifelse
    } ifelse
  } {
    (ERROR, unable to determine ProcessColorModel) == flush
  } ifelse
>> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark

% Define the output intent dictionary :

[/_objdef {OutputIntent_PDFA} /type /dict /OBJ pdfmark
[{OutputIntent_PDFA} <<
  /Type /OutputIntent             % Must be so (the standard requires).
  /S /GTS_PDFA1                   % Must be so (the standard requires).
  /DestOutputProfile {icc_PDFA}            % Must be so (see above).
  /OutputConditionIdentifier (sRGB)      % Customize
>> /PUT pdfmark
[{Catalog} <</OutputIntents [ {OutputIntent_PDFA} ]>> /PUT pdfmark

/XmlFileName	(ZUGFeRD-invoice-1.xml)				def
/XmlFileDesc	(Invoice data in ZUGFeRD XML format)		def
/XmlFileDate	(D:20130121081433+01'00')			def
/XmlFileData	(Documents/git/vorpal_test/ZUGFeRD-invoice-1.xml) (r) file		def



  % Create the {ContentStream} object and fill it
  [
  	/_objdef 	{ContentStream}
  	/type 		/stream
  /OBJ pdfmark

  [
  	{ContentStream}	<<
  			/Type		/EmbeddedFile
  			/Subtype	(text/xml) cvn
  			/Params		<<
  					/ModDate	XmlFileDate
  					>>
  			>>
  /PUT pdfmark

  [
  	{ContentStream}	XmlFileData
  /PUT pdfmark

  [
  	{ContentStream}
  /CLOSE pdfmark

  % Create the {FSDict} object for the file specification and fill it
  [
  	/_objdef	{FSDict}
  	/type		/dict
  /OBJ pdfmark

  [
  	{FSDict}	<<
  			/Type 		/Filespec
  			/F 		XmlFileName
  			/UF 		XmlFileName SimpleUTF16BE
  			/Desc		XmlFileDesc
  			/AFRelationship	/Alternative
  			/EF 		<<
  					/F		{ContentStream}
  					/UF		{ContentStream}
  					>>
  			>>
  /PUT pdfmark

  % Create the {AFArray} object for the associated files and fill it
  [
  	/_objdef	{AFArray}
  	/type		/array
  /OBJ pdfmark

  [
  	{AFArray}	{FSDict}
  /APPEND pdfmark



  % Register the associated files in the {Catalog} object
  [
  	{Catalog}	<<
  			/AF	{AFArray}
  			>>
  /PUT pdfmark



  % Register the file specification under Names/EmbeddedFiles/Names in the {Catalog} object
  [
  	/Name		XmlFileName
  	/FS		{FSDict}
  /EMBED pdfmark

  % Add the metadata to the {Catalog} object
  [
  	/XML		(

      <!-- XMP extension schema container for the zugferd schema -->
      <rdf:Description rdf:about=""
  	xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
  	xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
  	xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">

  	<!-- Container for all embedded extension schema descriptions -->
  	<pdfaExtension:schemas>
  	    <rdf:Bag>
  		<rdf:li rdf:parseType="Resource">
  		    <!-- Optional description of schema -->
  		    <pdfaSchema:schema>ZUGFeRD PDFA Extension Schema</pdfaSchema:schema>
  		    <!-- Schema namespace URI -->
  		    <pdfaSchema:namespaceURI>urn:ferd:pdfa:invoice:rc#</pdfaSchema:namespaceURI>
  		    <!-- Preferred schema namespace prefix -->
  		    <pdfaSchema:prefix>zf</pdfaSchema:prefix>

  		    <!-- Description of schema properties -->
  		    <pdfaSchema:property>
  			<rdf:Seq>
  			    <rdf:li rdf:parseType="Resource">
  				<!-- DocumentFileName: Name of the embedded file;
  				    must be equal to the value of the /F key in the
  				    file specification dictionary -->
  				<pdfaProperty:name>DocumentFileName</pdfaProperty:name>
  				<pdfaProperty:valueType>Text</pdfaProperty:valueType>
  				<pdfaProperty:category>external</pdfaProperty:category>
  				<pdfaProperty:description>name of the embedded xml invoice file</pdfaProperty:description>
  			    </rdf:li>
  			    <rdf:li rdf:parseType="Resource">
  				<!-- DocumentType: INVOICE -->
  				<pdfaProperty:name>DocumentType</pdfaProperty:name>
  				<pdfaProperty:valueType>Text</pdfaProperty:valueType>
  				<pdfaProperty:category>external</pdfaProperty:category>
  				<pdfaProperty:description>INVOICE</pdfaProperty:description>
  			    </rdf:li>
  			    <rdf:li rdf:parseType="Resource">
  				<!-- Version: The actual version of the
  				    ZUGFeRD standard -->
  				<pdfaProperty:name>Version</pdfaProperty:name>
  				<pdfaProperty:valueType>Text</pdfaProperty:valueType>
  				<pdfaProperty:category>external</pdfaProperty:category>
  				<pdfaProperty:description>The actual version of the ZUGFeRD data</pdfaProperty:description>
  			    </rdf:li>
  			    <rdf:li rdf:parseType="Resource">
  				<!-- ConformanceLevel: The actual conformance
  					level of the ZUGFeRD standard,
  					e.g. BASIC, COMFORT, EXTENDED -->
  				<pdfaProperty:name>ConformanceLevel</pdfaProperty:name>
  				<pdfaProperty:valueType>Text</pdfaProperty:valueType>
  				<pdfaProperty:category>external</pdfaProperty:category>
  				<pdfaProperty:description>The conformance level of the ZUGFeRD data</pdfaProperty:description>
  			    </rdf:li>
  			</rdf:Seq>
  		    </pdfaSchema:property>
  		</rdf:li>
  	    </rdf:Bag>
  	</pdfaExtension:schemas>
      </rdf:Description>

      <rdf:Description rdf:about="" xmlns:zf="urn:ferd:pdfa:invoice:rc#">
  	<zf:DocumentType>INVOICE</zf:DocumentType>
  	<zf:DocumentFileName>ZUGFeRD-invoice-1.xml</zf:DocumentFileName>
  	<zf:Version>RC</zf:Version>
  	<zf:ConformanceLevel>BASIC</zf:ConformanceLevel>
      </rdf:Description>

  			)
  /Ext_Metadata pdfmark

 

------------------------------------------------------
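The SimpleUTF16BE procedure in the file above builds the /UF text string by writing an FE FF byte order mark and then each input byte preceded by a zero byte. The equivalent Python sketch below (hypothetical name) makes it easy to see what ends up in /UF. Like the PostScript version, it is only a correct UTF-16BE encoding when every input byte is a Latin-1 character, which covers plain ASCII file names such as ZUGFeRD-invoice-1.xml.

```python
def simple_utf16be(data: bytes) -> bytes:
    """Mirror of the SimpleUTF16BE PostScript procedure: BOM, then 0x00 + each byte."""
    out = bytearray(b"\xfe\xff")  # UTF-16BE byte order mark
    for byte in data:
        # High byte is always zero, so only code points U+0000..U+00FF are representable.
        out += bytes((0x00, byte))
    return bytes(out)


if __name__ == "__main__":
    print(simple_utf16be(b"ZUGFeRD-invoice-1.xml").hex(" "))
```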
Comment 14 John 2017-09-12 02:09:29 UTC
Hello again, 

some additional info:
I think the XML is actually added but not as XML somehow.
The part of the embedded file in the out-a.pdf looks like this:

/Subtype /text#2fxml/Length 3292>>stream
[binary stream data beginning with the character 'x'; not readable as text]

...and so on.....
---------
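The stream data starts with 'x' (byte 0x78), which is what a zlib header usually looks like, so the embedded XML was most likely Flate-compressed rather than damaged; pdfwrite normally compresses the streams it writes. A minimal Python sketch for inspecting it, assuming the raw bytes between the stream and endstream keywords have already been cut out of the PDF (that extraction step is not shown, and the helper names are hypothetical):

```python
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Check for a zlib header: deflate method (low nibble of CMF is 8) and valid FCHECK."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return cmf & 0x0F == 8 and (cmf * 256 + flg) % 31 == 0


def inflate_stream(raw: bytes) -> bytes:
    """Decompress the raw bytes of a FlateDecode stream."""
    return zlib.decompress(raw)


if __name__ == "__main__":
    sample = zlib.compress(b"<rsm:CrossIndustryDocument/>")
    if looks_like_zlib(sample):
        print(inflate_stream(sample).decode("utf-8"))
```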

I'm using your provided xml invoice for testing and also tested my own. All of them show these errors.

Regards

John
Comment 15 John 2017-09-12 02:11:38 UTC
I'm sorry, this seems not to be part of my issue, nor part of the embedded file.

(In reply to John from comment #14)
> Hello again, 
> 
> some additional info:
> I think the XML is actually added but not as XML somehow.
> The part of the embedded file in the out-a.pdf looks like this:
> 
> /Subtype /text#2fxml/Length 3292>>stream
> [binary stream data beginning with the character 'x'; not readable as text]
> 
> ...and so on.....
> ---------
> 
> I'm using your provided xml invoice for testing and also tested my own. All
> of them show these errors.
> 
> Regards
> 
> John
Comment 16 John 2017-09-14 02:29:14 UTC
Hello again,

I checked again and now it works. It seems I used the wrong XML file. I never doubted it, but it was truly my fault ;)
Thanks for your work and your support.

Regards

John