701753 – "Error reading a content stream" with pdf using Type3 font

Status:	RESOLVED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Interpreter
Version:	unspecified
Hardware:	PC Linux

Importance:	P4 normal
Assignee:	Ken Sharp

Reported:	2019-10-19 00:58 UTC by Karl Berry
Modified:	2019-10-19 10:12 UTC
CC List:	0 users

Attachments
ch4.pdf, original character (7.16 KB, application/pdf) 2019-10-19 00:58 UTC, Karl Berry
ch4-un.pdf, uncompressed version of pdf (8.52 KB, application/pdf) 2019-10-19 00:59 UTC, Karl Berry
ch4-debug.txt, output of gs -dPDFDEBUG ch4.pdf (8.21 KB, text/plain) 2019-10-19 01:00 UTC, Karl Berry

Description Karl Berry 2019-10-19 00:58:30 UTC

Created attachment 18327
ch4.pdf, original character

gs doesn't like the attached ch4.pdf, which is one colored character in a Type 3 font, reporting:

   **** Error reading a content stream. The page may be incomplete.             
               Output may be incorrect.                                         
                                                                                
   **** Warning: Type 3 glyph has unbalanced q/Q operators (too many q's)       
               Output may be incorrect.                                         
   **** Error: File did not complete the page properly and may be damaged.      
               Output may be incorrect.                                         

The results are the same with gs9.50, 9.26, and other versions I tried. (All on GNU/Linux, compiled from your released sources, although I doubt that matters.)

I am not sure if the problem is with the pdf or with gs, but the pdf can be read by mupdf, processed by qpdf, pdftk, etc., and viewed in xpdf v4 and okular without complaint, so I thought I'd report it here.

The character definition does have nested q ... q ... Q ... Q operators. I could not discern from the PDF standard whether that was allowed or not. I couldn't find anything saying it wasn't. But I think the stream parsing is the real problem, and then the q/Q mismatch is a result of that. I couldn't see any streams that were misdefined.

I'll also attach the result of gs -dPDFDEBUG on the file, and the result of pdftk ch4.pdf output ch4-un.pdf uncompress, which runs without error. Running gs on ch4-un.pdf has the same stream errors as above, but not the q/Q error.

Thanks,
Karl

Comment 1 Karl Berry 2019-10-19 00:59:20 UTC

Created attachment 18328
ch4-un.pdf, uncompressed version of pdf

Comment 2 Karl Berry 2019-10-19 01:00:11 UTC

Created attachment 18329
ch4-debug.txt, output of gs -dPDFDEBUG ch4.pdf

Comment 3 Ken Sharp 2019-10-19 10:12:35 UTC

(In reply to Karl Berry from comment #0)

> The results are the same with gs9.50, 9.26, and other versions I tried. (All
> on GNU/Linux, compiled from your released sources, although I doubt that
> matters.)

I'm sure it doesn't, but thanks for providing the information; it's always good to know.

It's also useful, and sometimes vital, to know the command line you are using for Ghostscript.


> I am not sure if the problem is with the pdf or with gs, but the pdf can be
> read by mupdf, processed by qpdf, pdftk, etc., and viewed in xpdf v4 and
> okular without complaint, so I thought I'd report it here.

Again, it's definitely worth reporting; within reason, we like to see problem files.

 
> The character definition does have nested q ... q ... Q ... Q operators. I
> could not discern from the PDF standard whether that was allowed or not. I
> couldn't find anything saying it wasn't.

It is allowed, but they are supposed to match. The error means they didn't, but...

If we get halfway through a content stream but abort due to an error, then carry on with the next content stream, it's possible that the matching Q for an earlier q is in the portion that was aborted, leading to this error.

Essentially, once something goes wrong, it's possible for other problems to be reported as a result of the earlier error.
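
To illustrate the point about the aborted portion, here is a minimal Python sketch (not Ghostscript's code). It assumes a naive whitespace tokenizer that ignores strings, inline images and comments, and the sample content stream is made up for the example. It counts how many q operators are left without a matching Q when a stream is cut short.

```python
def q_depth(content: str) -> int:
    """Return the number of q operators left unmatched by a Q.

    Naive sketch: tokens are split on whitespace only, so strings,
    inline image data and comments are not handled.
    """
    depth = 0
    for tok in content.split():
        if tok == "q":
            depth += 1
        elif tok == "Q":
            depth -= 1
    return depth


# Hypothetical glyph description with nested, balanced q ... q ... Q ... Q.
full = "q 1 0 0 1 0 0 cm q 0 0 10 10 re f Q Q"

# The same stream if interpretation stops before the fill and both Q's.
aborted = "q 1 0 0 1 0 0 cm q 0 0 10 10 re"

print(q_depth(full))     # balanced
print(q_depth(aborted))  # two q's never closed
```

In this sketch the complete stream is balanced, while the truncated one leaves two saves open, which is the kind of situation that would be reported as "too many q's".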


> But I think the stream parsing is the real problem, and then the q/Q mismatch
> is a result of that. I couldn't see any streams that were misdefined.

It's not that the stream parsing is incorrect per se; it's just that the error in the PDF causes us to miss bits out. If you use -dPDFSTOPONERROR, so that Ghostscript doesn't attempt to continue after an error, then you'll get a better picture, but it also means that we'll stop on files that could reasonably be repaired.
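
For anyone wanting to report the exact command line along with a file, a small Python sketch like the following could build and print it. The output device (png16m), the output file name and the default resolution are assumptions for illustration, since the report does not say what was used; -dPDFSTOPONERROR and -r are the switches mentioned in this report.

```python
import shlex


def gs_command(pdf_path, resolution=72, stop_on_error=True,
               device="png16m", output="out.png"):
    """Build a Ghostscript argument list (sketch; device/output are assumed)."""
    args = ["gs", "-o", output, f"-sDEVICE={device}", f"-r{resolution}"]
    if stop_on_error:
        # Stop at the first PDF error instead of trying to carry on.
        args.append("-dPDFSTOPONERROR")
    args.append(pdf_path)
    return args


cmd = gs_command("ch4.pdf")
# Print the command so it can be pasted into a bug report.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` would actually invoke Ghostscript; the sketch only prints the command so that it can be quoted in a report.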


Now, if you open this file with Adobe Acrobat (our reference, in general) what you will get is a blank page.


The problem is, fundamentally, that the file is tripping up one of our heuristics to detect badly formed PDF files.

The Type 3 font uses a 1000x1000 design space, so the FontMatrix is:

  /FontMatrix [ 0.00100000005 0 0 0.00100000005 0 0 ]

It then fills the glyph using a Pattern colour space where the Pattern uses a Matrix:

  /Matrix [ 10 0 0 10 0 0 ]

Without going into tedious detail, we take all of this together with the CTM in order to map from user space to device space (i.e. the actual pixels on the canvas). The way the maths works out, what this means is that, at low resolution (and you haven't said what resolution you're using), the area covered by each pattern cell becomes less than 1x10^-6. We decide this is a degenerate matrix and throw an error.

If I run your file at a high resolution (e.g. -r1000) then it runs to completion, rendering an empty page, as would be expected because the pattern cells are too small to render even a single pixel.

I've pushed a commit:

618c3867b8edec9d0ea757949c926d4290995ac7

which relaxes the heuristic from 1x10^-6 to 1x10^-9 for detection of a degenerate matrix, but obviously a still smaller scaling would still trip over this.
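
As a rough illustration of this kind of check (not the actual Ghostscript code), the Python sketch below concatenates the FontMatrix with a resolution-dependent scaling and compares the area of the transformed unit square against a threshold. Exactly which matrices Ghostscript combines, and how the pattern Matrix and font size enter, is not shown in this report; the function names, the simplified CTM and the 36 dpi figure are assumptions made for the example.

```python
def concat(m1, m2):
    """Return m1 x m2 for PDF-style matrices [a b c d e f]."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return [
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    ]


def unit_area(m):
    """Area of the unit square after transformation: |a*d - b*c|."""
    a, b, c, d, _, _ = m
    return abs(a * d - b * c)


def is_degenerate(m, threshold):
    """Assumed form of the heuristic: area below threshold means degenerate."""
    return unit_area(m) < threshold


def device_ctm(resolution):
    """Simplified CTM: uniform scale from 72 units per inch to device pixels."""
    s = resolution / 72.0
    return [s, 0, 0, s, 0, 0]


font_matrix = [0.00100000005, 0, 0, 0.00100000005, 0, 0]

low = concat(font_matrix, device_ctm(36))     # assumed low resolution
high = concat(font_matrix, device_ctm(1000))  # as with -r1000

print(is_degenerate(low, 1e-6))   # True: rejected by the old threshold
print(is_degenerate(low, 1e-9))   # False: accepted by the relaxed threshold
print(is_degenerate(high, 1e-6))  # False: high resolution passes either way
```

The sketch only shows why such a test depends on resolution and why lowering the threshold lets more matrices through; a sufficiently small scaling would still fall below 1x10^-9.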

You may want to consider how you're manufacturing this PDF file, since it renders substantially differently on different consumers.