Bug 691785 - File rendered with portions of characters missing
Summary: File rendered with portions of characters missing
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: master
Hardware: PC Linux
Importance: P1 normal
Assignee: Robin Watts
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-11-21 22:49 UTC by Marcos H. Woehrmann
Modified: 2011-10-02 02:34 UTC

See Also:
Customer: 170
Word Size: ---


Attachments
lost_doc.pdf (993.72 KB, application/pdf)
2010-11-21 22:50 UTC, Marcos H. Woehrmann
lost_doc.png (223.63 KB, image/png)
2010-11-21 22:50 UTC, Marcos H. Woehrmann
test_page.pdf (65.56 KB, application/pdf)
2010-11-21 22:51 UTC, Marcos H. Woehrmann
lost_cutdown.pdf (2.29 MB, application/pdf)
2010-11-24 14:08 UTC, Robin Watts

Description Marcos H. Woehrmann 2010-11-21 22:49:40 UTC
The customer reports, and I've verified, that Ghostscript 9.00 and head (r11906) render the two attached PDF files with the first characters of each line, plus some other miscellaneous characters, partially missing (see lost_doc.png for example output).

The command line I'm using for testing:

  bin/gs -sDEVICE=ppmraw -r300 -o lost_doc.ppm ./lost_doc.pdf


Note that versions before r11704 cannot process the lost_doc.pdf file at all; they generate an error.

Also, this does not appear to be a JBIG2 decoding issue: running with the Luratech decoder produces an identical output image.
Comment 1 Marcos H. Woehrmann 2010-11-21 22:50:31 UTC
Created attachment 6934
lost_doc.pdf
Comment 2 Marcos H. Woehrmann 2010-11-21 22:50:50 UTC
Created attachment 6935
lost_doc.png
Comment 3 Marcos H. Woehrmann 2010-11-21 22:51:07 UTC
Created attachment 6936
test_page.pdf
Comment 4 Alex Cherepanov 2010-11-22 05:28:10 UTC
The text is created by drawing a masked image.
The text is a mask and the image has a few black rectangles
placed under the lines of text.

Both the image and the mask have the interpolation flag set.
Ghostscript appears to misplace the image when it renders the masked
image with interpolation. The -dNOINTERPOLATE flag can be used to work
around the problem.
Comment 5 Robin Watts 2010-11-24 14:08:59 UTC
Created attachment 6948
lost_cutdown.pdf

Cutdown PDF file to show the problem. The vast majority of the content in the original file (all the text) is actually not used; the final image is purely a masked image.
Comment 6 Robin Watts 2010-11-25 19:53:04 UTC
In gxclipm.c, in clip_runs_enumerate, the code runs through detecting horizontal runs of pixels.

When it finds one, it does not process it immediately, but keeps it in a one-place buffer, 'prev'. This lets it check the next run it finds: if that run is directly below the previous one, it combines the two. If not, it should output the previous one and keep the current one as the previous one for the next time around.

Unfortunately the code got this wrong; if it fails to match, it outputs the current one, and keeps the current one as the previous one. The net effect is that the first run on every line is (usually) lost.

The fix is simple:

Index: gs/base/gxclipm.c
===================================================================
--- gs/base/gxclipm.c   (revision 11916)
+++ gs/base/gxclipm.c   (working copy)
@@ -279,7 +279,7 @@
                prev.q.y = ty + 1;
            else {
                if (prev.q.y > prev.p.y) {
-                   code = (*process)(pccd, tx1, ty, tx, ty + 1);
+                   code = (*process)(pccd, prev.p.x, prev.p.y, prev.q.x, prev.q.y);
                    if (code < 0)
                        return code;
                }
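
For anyone following along, here is a minimal, self-contained sketch of the one-place buffering scheme described above. It is a toy model, not the gxclipm.c code: the character-grid mask, the paint() callback, and the names enumerate_runs and count_painted are all hypothetical, and the 'buggy' switch simply selects between the old and new arguments from the patch.

```c
#include <assert.h>
#include <string.h>

#define W 8
#define H 3

/* Half-open rectangle [px,qx) x [py,qy); hypothetical stand-in for 'prev'. */
typedef struct { int px, py, qx, qy; } rect_t;

static char canvas[H][W];

/* Hypothetical stand-in for the 'process' callback: paints a rectangle. */
static int paint(int x0, int y0, int x1, int y1)
{
    assert(x0 >= 0 && x1 <= W && y0 >= 0 && y1 <= H);
    for (int y = y0; y < y1; y++)
        for (int x = x0; x < x1; x++)
            canvas[y][x] = '#';
    return 0;
}

/* Scan the mask for horizontal runs of '#', merging vertically stacked
 * runs through a one-place buffer.  buggy != 0 mimics the pre-patch
 * behaviour of emitting the current run instead of the buffered one. */
static int enumerate_runs(const char mask[H][W + 1], int buggy)
{
    rect_t prev = { 0, 0, -1, 0 };      /* empty: qy == py */
    int code;

    for (int ty = 0; ty < H; ty++) {
        int tx = 0;
        while (tx < W) {
            while (tx < W && mask[ty][tx] != '#')   /* skip a run of 0s */
                tx++;
            if (tx == W)
                break;
            int tx1 = tx;
            while (tx < W && mask[ty][tx] == '#')   /* scan a run of 1s */
                tx++;
            if (prev.px == tx1 && prev.qx == tx && prev.qy == ty) {
                prev.qy = ty + 1;       /* directly below: extend */
            } else {
                if (prev.qy > prev.py) {
                    code = buggy ? paint(tx1, ty, tx, ty + 1)
                                 : paint(prev.px, prev.py, prev.qx, prev.qy);
                    if (code < 0)
                        return code;
                }
                prev.px = tx1; prev.py = ty;
                prev.qx = tx;  prev.qy = ty + 1;
            }
        }
    }
    /* Flush whatever is still buffered. */
    if (prev.qy > prev.py)
        return paint(prev.px, prev.py, prev.qx, prev.qy);
    return 0;
}

/* Run the enumerator over a fixed 3-row mask and count painted pixels. */
static int count_painted(int buggy)
{
    static const char mask[H][W + 1] = {
        "##......",
        "##......",
        "....##..",
    };
    int n = 0;

    memset(canvas, '.', sizeof canvas);
    if (enumerate_runs(mask, buggy) < 0)
        return -1;
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            n += (canvas[y][x] == '#');
    return n;
}
```

With this mask, count_painted(0) paints all 6 set pixels, while count_painted(1) paints only 2: the merged two-row run sitting in the buffer is overwritten without ever being emitted. How often that happens in the real code depends on how the enumerator is called, which this sketch does not model.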
Comment 7 Robin Watts 2010-11-25 21:03:50 UTC
Fixed in commit 11916.