697684 – Regression - Ghostscript 9.21 fails when pdfmark contains certain Unicode characters

Status:	RESOLVED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Writer
Version:	master
Hardware:	Macintosh MacOS X

Importance:	P4 normal
Assignee:	Chris Liddell (chrisl)

Reported:	2017-03-24 14:19 UTC by James R Barlow
Modified:	2017-03-28 00:09 UTC
CC List:	0 users

Attachments
pdfa_def.ps (1.41 KB, application/postscript) 2017-03-24 14:19 UTC, James R Barlow
Working pdfa.ps (1.40 KB, application/postscript) 2017-03-24 14:37 UTC, James R Barlow
Test PDF file (101.42 KB, application/pdf) 2017-03-24 14:38 UTC, James R Barlow

Description James R Barlow 2017-03-24 14:19:15 UTC

Created attachment 13489
pdfa_def.ps

I attempted to create a PDF with

Comment 1 James R Barlow 2017-03-24 14:20:51 UTC

Apologies for incomplete comment above.

I attempted to create a PDF/A with pdfa_def.ps.

/Author <feff5b545b50>

Comment 2 James R Barlow 2017-03-24 14:36:53 UTC

Apologies for the second incomplete comment. I don't get along with Bugzilla.

It seems that gs 9.21 does not tolerate multibyte characters or certain Unicode characters in the document info pdfmark, such as when creating a PDF/A.

When invoked under such conditions, pdfwrite will exit with an error such as:

    GPL Ghostscript 9.21: ERROR: VMerror (-25) on closing pdfwrite device.

Or in other cases:

Error: /undefinedfilename in --file--
Operand stack:
   --nostringval--   --nostringval--   (srgb.icc)   (r)
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1983   1   3   %oparray_pop   1982   1   3   %oparray_pop   1966   1   3   %oparray_pop   1852   1   3   %oparray_pop   --nostringval--   %errorexec_pop   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--
Dictionary stack:
   --dict:1208/1684(ro)(G)--   --dict:1/20(G)--   --dict:79/200(L)--
Current allocation mode is local
Last OS error: No such file or directory
Current file position is 793
GPL Ghostscript 9.21: Unrecoverable error, exit code 1

This seems to be a regression. This test comes from a test suite that passed for gs 9.20 and several earlier versions.

Full command line that demonstrates failure:

gs -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sProcessColorModel=DeviceRGB -sColorConversionStrategy=/RGB -dPDFA=2 -dPDFACompatibilityPolicy=1 -o _gs.pdf ccitt.pdf pdfa_def.ps

Full command line that works, the only difference being removal of multibyte characters from pdfmark:

gs -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sProcessColorModel=DeviceRGB -sColorConversionStrategy=/RGB -dPDFA=2 -dPDFACompatibilityPolicy=1 -o _gs.pdf ccitt.pdf pdfa_def_good.ps

Comment 3 James R Barlow 2017-03-24 14:37:24 UTC

Created attachment 13490
Working pdfa.ps

Comment 4 James R Barlow 2017-03-24 14:38:47 UTC

Created attachment 13491
Test PDF file

The original image used in this file was released under a free license (Creative Commons BY-SA 3.0).

Comment 5 Ken Sharp 2017-03-25 04:36:12 UTC

The undefinedfilename is caused by the fact that you are using the same name as the default we supply.

It seems we are opening the file with a .libfile (or something similar), which means that it searches the paths, apparently (at least sometimes) before searching the current working directory. If it hits the pdfa_def.ps in ghostpdl/lib then it runs that one, which is why you get an undefined filename on (srgb.icc), which is not the path and spec of the ICC profile from your pdfa_def.ps.

If you either change the name of the file, or use a fully qualified path for pdfa_def.ps, then I believe that problem will not exhibit. I'm going to pass that particular part of the puzzle to a colleague.

I'll look into the problem surrounding the VMerror separately, since this fails to produce a PDF file. For what it's worth, both of these problems only seem to happen with the release branch; I can't reproduce either with the current HEAD on master.

Comment 6 Ray Johnston 2017-03-25 12:28:00 UTC

Note that there is a switch defined in:
  https://ghostscript.com/doc/current/Use.htm#Finding_files

1. The current directory if enabled by the -P switch

Adding this option to your command line, e.g.,

gs -P -dQUIET ...

should allow the file named pdfa_def.ps from the current working directory to
be used rather than one from one of the LIBPATH paths.

Comment 7 James R Barlow 2017-03-25 20:40:19 UTC

Thanks Ray for the clarification about -P.  I introduced this unrelated issue while trying to create a test case for this report.  The test suite that picked up the regression uses absolute paths for all inputs to gs so it avoids the "undefinedfilename" issue.  That would also be why I observed different error output.  

When the absolute path is specified, the only error message I get is:

GPL Ghostscript 9.21: ERROR: VMerror (-25) on closing pdfwrite device.
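
As an illustration of the absolute-path approach, a test harness might build the command line from this report roughly as follows. This is a minimal sketch, not the actual test suite: `build_gs_command` is a hypothetical helper, and the flags are simply the ones quoted in comment 2.

```python
import os
import shlex

def build_gs_command(output, inputs):
    """Build the gs argument list from the report, with absolute input paths."""
    args = [
        "gs", "-dQUIET", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=pdfwrite",
        "-sProcessColorModel=DeviceRGB",
        "-sColorConversionStrategy=/RGB",
        "-dPDFA=2", "-dPDFACompatibilityPolicy=1",
        "-o", output,
    ]
    # Fully qualified paths avoid picking up a same-named file
    # (such as the default pdfa_def.ps) from the library search path.
    args.extend(os.path.abspath(p) for p in inputs)
    return args

cmd = build_gs_command("_gs.pdf", ["ccitt.pdf", "pdfa_def.ps"])
# Print a shell-quoted form for inspection; nothing is executed here.
print(" ".join(shlex.quote(a) for a in cmd))
```

The list form can be passed directly to `subprocess.run` without going through a shell.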

Comment 8 Ken Sharp 2017-03-27 01:46:08 UTC

The problem is actually nothing to do with Unicode, and it also isn't a regression. The length of the Unicode strings is important, in order to set up a particular memory configuration, but the fact that they are Unicode is not relevant. You could also trigger exactly the same behaviour in previous versions, but getting the memory configuration just right is 'difficult'.

This is also why I couldn't reproduce the behaviour with HEAD: the memory layout is slightly different because the binary has changed. (I'm a little surprised, though relieved, that it was possible to reproduce it with a debug build.)

The problem is actually caused by specifying empty strings for some metadata. When writing the XMP metadata we use gs_alloc_bytes() to allocate a string buffer; if the length of the buffer to be allocated is 0 bytes then the function can, sometimes, return a NULL pointer. Since we use a NULL pointer to indicate that the memory could not be allocated, this causes a VM error.

In commit c06135acc959dbc0458352579bafe238794f2733 we now take an early exit when presented with a 0 length string, rather than trying to allocate and fill a buffer for the metadata. This prevents the possibility of a VM error. This does not address any other places in the code which might suffer from the same problem (not testing the allocation length, and relying on a non-NULL return).
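
The failure mode described above can be modelled in a few lines. The sketch below is a Python toy, not Ghostscript source: `alloc_bytes`, `write_metadata_unguarded` and `write_metadata_guarded` are hypothetical names, and the assumption that a zero-length request yields the same sentinel as a failed allocation is taken from this comment rather than verified against gs_alloc_bytes().

```python
def alloc_bytes(size):
    """Toy allocator: None signals failure, and here also a zero-length request."""
    if size == 0:
        return None  # stand-in for the zero-length case described above
    return bytearray(size)

def write_metadata_unguarded(value: bytes) -> bytes:
    """Allocate a buffer for value and copy it, treating None as out of memory."""
    buf = alloc_bytes(len(value))
    if buf is None:
        # An empty value is indistinguishable from a real allocation failure.
        raise MemoryError("VMerror")
    buf[:] = value
    return bytes(buf)

def write_metadata_guarded(value: bytes) -> bytes:
    """Same as above, but exit early when there is nothing to allocate."""
    if len(value) == 0:
        return b""
    return write_metadata_unguarded(value)

print(write_metadata_guarded(b""))        # early exit, no allocation attempted
print(write_metadata_guarded(b"Author"))  # normal path
```

The guard checks the length before allocating, so the sentinel value only ever means a genuine allocation failure.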

Re-assigning this to Chris to look at gs_alloc_bytes() to try and figure out why it might return a NULL pointer, and ideally prevent it.

NB, we won't be changing the behaviour of the path searching, so it's best either to avoid a filename which matches one of the resource files, or to use a fully qualified pathspec. As Ray has mentioned, if you can't do either of these, then use -P (see the documentation in ghostpdl/doc/Use.htm, section 8, How Ghostscript finds files).

Comment 9 Chris Liddell (chrisl) 2017-03-27 02:54:17 UTC

Neither Ken nor I can now reproduce the issue with the memory manager, and with visual inspection of code, it doesn't look like there's a way for it to return NULL with a zero length allocation.

So closing

Comment 10 James R Barlow 2017-03-27 15:36:35 UTC

Do you have any suggestions to mitigate this problem in 9.21?

I have tried both removing empty keys and setting them to a single space, but I can still find examples of strings that will trigger this error.

e.g.

[ /Author (Just Author)
  /DOCINFO pdfmark

and:

[ /Author <feff0020>
  /Title <feff0020>
  /Subject <feff0020>
  /Keywords <feff0020>
  /Creator (Just Creator)
  /DOCINFO pdfmark

Perhaps the XMP metadata contains some empty strings by default or the set of five values I have there does not include all values handed over to XMP?

I threw together a crude test and it seems about 1% of short random non-empty strings can produce the error.  
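
The crude test itself was not attached. As an illustration only, input files for such a test might be generated along these lines; `random_docinfo` and the restriction to ASCII letters are assumptions, not the script behind the 1% figure.

```python
import random
import string

def random_docinfo(rng, max_len=8):
    """Return PostScript text for a DOCINFO pdfmark with a short random /Author."""
    length = rng.randint(1, max_len)  # non-empty by construction
    # Letters only, so no escaping of parentheses or backslashes is needed.
    author = "".join(rng.choice(string.ascii_letters) for _ in range(length))
    return f"[ /Author ({author})\n  /DOCINFO pdfmark\n"

rng = random.Random(0)  # fixed seed so a failing input can be regenerated
for _ in range(3):
    print(random_docinfo(rng))
```

Each generated snippet would be written to its own file and passed to gs in place of the pdfmark section of pdfa_def.ps.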

In addition, a different error always seems to occur for valid Unicode characters above U+FFFF, possibly a distinct problem ("ERROR: rangecheck (-15) on closing pdfwrite device").

Comment 11 Ken Sharp 2017-03-28 00:09:35 UTC

(In reply to James R Barlow from comment #10)
> Do you have any suggestions to mitigate this problem in 9.21?

As I explained, this is not limited to 9.21; it's present in every version of Ghostscript's pdfwrite device which handles Unicode strings in a pdfmark.

The solution is to rebuild Ghostscript from the current source.


> In addition, a different error always seems to occur for valid Unicode
> characters above U+FFFF, possibly a distinct problem ("ERROR: rangecheck
> (-15) on closing pdfwrite device").

Unicode strings in PDF must be UTF-16BE, no other form is permitted.
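
For reference, the hex-string form used in the pdfmark examples in this report (`<feff...>`) is UTF-16BE preceded by the U+FEFF byte order mark. A minimal Python sketch, assuming a hypothetical helper named `pdfmark_hex_string` (not part of Ghostscript), shows how such a string can be produced and how a character above U+FFFF becomes a surrogate pair in UTF-16BE:

```python
def pdfmark_hex_string(text: str) -> str:
    """Return text as a PostScript hex string in UTF-16BE with a leading BOM."""
    # The "utf-16-be" codec does not emit a BOM, so prepend U+FEFF explicitly.
    data = ("\ufeff" + text).encode("utf-16-be")
    return "<" + data.hex() + ">"

print(pdfmark_hex_string("\u5b54\u5b50"))  # <feff5b545b50>, the /Author value from comment 1
print(pdfmark_hex_string(" "))             # <feff0020>, as in comment 10
print(pdfmark_hex_string("\U0001F600"))    # <feffd83dde00>, a surrogate pair
```

This only demonstrates the encoding; it makes no claim about which inputs a given Ghostscript build accepts.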