Bug 691209 - PS->PDF->PNG results in lines PS->PNG does not
Summary: PS->PDF->PNG results in lines PS->PNG does not
Status: NOTIFIED WONTFIX
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 0.00
Hardware: PC All
Importance: P1 normal
Assignee: Marcos H. Woehrmann
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2010-03-24 04:04 UTC by Marcos H. Woehrmann
Modified: 2012-04-16 19:16 UTC

See Also:
Customer: 170
Word Size: ---


Attachments
idiom to remove black rectangles behind images (986 bytes, text/plain)
2010-03-25 11:30 UTC, Ken Sharp
A PostScript file containing an image composed of horizontal subimage strips, with black rectangles behind (272.04 KB, application/postscript)
2010-04-19 00:43 UTC, Cameron Stone
A new idiom to solve the black fills (3.26 KB, application/octet-stream)
2010-04-19 16:07 UTC, Ken Sharp
tiff.ps (468.67 KB, application/postscript)
2011-06-01 10:17 UTC, artifex

Description Marcos H. Woehrmann 2010-03-24 04:04:42 UTC
The customer reports:

We are experiencing rendering issues when we use ghostscript to convert certain pdfs to image formats (png and tiff). The problem manifests as horizontal lines in the images, as can be seen in the from_pdf.png file attached. The problem seems to be triggered when converting a pdf file composed only of image XObjects, such as the attached source.pdf file. This pdf file was created (using ghostscript) from the attached postscript file, source.ps.
 
This problem is occurring in at least versions 8.64 and 8.71 of ghostscript running on Windows XP.
 
As the output resolution is changed, different lines appear and disappear. These lines are mostly horizontal, but vertical lines occasionally did appear on the right hand side of the page, within an inch of the page edge. Increasing the resolution to a large enough number makes all the lines go away, but unfortunately this also slows down the process too much for our needs.
 
It appears that the lines are appearing at the boundaries of the embedded images (XObjects). This was confirmed by modifying the pdf file to remove one of the XObjects and noting that the edge of the removed object coincided with two of the lines that sometimes appeared. The other lines appeared at regular intervals down the page.
 
The following command lines were used.
Postscript to PDF:
gswin32c.exe -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -dCompatibilityLevel=1.4 -dNOPAUSE -dBATCH -sOutputFile=OUTPUT.PDF INPUT.PS
 
PDF to PNG:
gswin32c.exe -dBATCH -dNOPAUSE -r48 -dPrinted=false -sDEVICE=png16 -sOutputFile=from_pdf.png source.pdf
 
Strangely, creating the png file directly from the postscript file does not result in the problem. (See attached file, from_ps.png).
gswin32c.exe -dBATCH -dNOPAUSE -r48 -dPrinted=false -sDEVICE=png16 -sOutputFile=from_ps.png source.ps
 
This is a high priority case as it was raised by a client some months ago and we have been unable to fix or work around this ourselves.
Comment 2 Ken Sharp 2010-03-24 16:05:37 UTC
I think this *may* be a difference between the rules for filling and the rules for images, in which case it is a rendering issue rather than a pdfwrite problem. I'm therefore re-assigning it to Alex; if Alex thinks it really is my problem, he can assign it back.

The original PostScript file contains a number of images comprising horizontal strips of a scanned page. These are laid out so that the last scan line of one image is immediately above the first scan line of the next. 

This in itself can cause problems; scaling and positioning of images can lead to small gaps between consecutive images when scan-converted to device space.

Now, before the images are drawn, the job first constructs a path which is the size of the final image, less half a pixel *at the current resolution* in each direction, and fills it with black (I have no idea why).

When the PostScript file is run, the half pixel is evaluated at the current resolution, because PostScript is a programming language. PDF is not, so the difference between the path and the image is 'baked in' at the time the PDF file is created.

When the PDF file is run, it looks to me as if the scan conversion rules can cause the filled path to be one pixel larger than the image, depending on exactly where the co-ordinates in user space end up being mapped to pixels in device space.
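
The effect described above can be illustrated with a minimal one-dimensional sketch. It is not Ghostscript's actual scan converter: it assumes a fill marks every pixel it touches (an "any part of pixel" rule) while an image marks only pixels whose centres it covers. The function names, the strip coordinates (100.0 pt to 200.3 pt) and the 600 dpi creation resolution are all hypothetical, chosen only for illustration.

```python
import math

def fill_pixels(lo, hi):
    """Pixels touched by a fill spanning [lo, hi) in device space
    (assumed any-part-of-pixel rule)."""
    return set(range(math.floor(lo), math.ceil(hi)))

def image_pixels(lo, hi):
    """Pixels whose centres lie inside [lo, hi) in device space
    (assumed pixel-centre rule)."""
    candidates = range(math.floor(lo) - 1, math.ceil(hi) + 1)
    return {i for i in candidates if lo <= i + 0.5 < hi}

def stray_pixels(img_lo_pt, img_hi_pt, inset_pt, dpi):
    """Pixel rows covered by the black fill but not by the image.

    img_lo_pt, img_hi_pt: image extent in points (user space).
    inset_pt: amount by which the fill is shrunk on each side, in points.
    dpi: rendering resolution.
    """
    scale = dpi / 72.0  # points to device pixels
    fill = fill_pixels((img_lo_pt + inset_pt) * scale,
                       (img_hi_pt - inset_pt) * scale)
    image = image_pixels(img_lo_pt * scale, img_hi_pt * scale)
    return sorted(fill - image)

# PDF case: the inset was fixed when the PDF was made (hypothetically at 600 dpi).
baked_inset = 0.5 * 72 / 600
print(stray_pixels(100.0, 200.3, baked_inset, 48))

# PostScript case: the inset is recomputed as half a pixel at the render resolution.
print(stray_pixels(100.0, 200.3, 0.5 * 72 / 48, 48))
```

Under these assumptions the baked-in inset leaves one row of the fill uncovered by the image when rendered at 48 dpi, while the inset recomputed at 48 dpi leaves none. Whether a given strip shows a line depends on where its edges fall relative to the pixel grid, which would be consistent with lines appearing and disappearing as the resolution changes.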

I note that using Acrobat Distiller to produce the PDF file creates a PDF with similar co-ordinate differences between the fill and the image. However, Acrobat ignores the requested resolution in the PostScript file and uses a different CTM from Ghostscript, which results in different numbers. It's probably possible (if not easy) to find a way to run the Acrobat PDF file in GS and get a similar result.


Given that the black rectangles appear to be placed by the PostScript driver, there is probably no way to avoid their creation. I would suggest that the customer construct the scanned image as a single image instead of multiple strips, which would avoid the problem.
Comment 3 Ray Johnston 2010-03-24 17:24:17 UTC
We have seen this bogus method (fill with black, then paint a masked image with white) several times before. These files rendered differently in Acrobat 7 than in Acrobat 8 or 9 (when saving to TIFF at various resolutions), so we closed the issue then.

There are closed bugs from customer 531 and 700 if you want to look for the other examples.

I suggest that we start by seeing what Adobe Acrobat 9 does.
Comment 4 Alex Cherepanov 2010-03-25 03:32:37 UTC
The run-time behavior of the file can be modified by idiom
recognition. One can draw an empty path instead of the rectangle
and replace the image mask with an image operation.

Another approach is to recognize this sequence of strips
in the PDF generator and merge the strips.

One can even preprocess the PS file using a script.
Comment 5 Ken Sharp 2010-03-25 08:09:08 UTC
Idiom recognition occurred to me last night; I plan to have a play this morning and see if it helps. Merging the images is possible, and is already on the enhancement list for pdfwrite, but even Adobe Distiller, which already merges images, does not do that with this file.
Comment 6 Ken Sharp 2010-03-25 11:30:29 UTC
Created attachment 6125 [details]
idiom to remove black rectangles behind images

Attached file, when stored in gs/Resource/IdiomSet as 'Pscript5Idiom', results in a PDF file without the black rectangles behind the images. This PDF file does not then produce black lines depending on the resolution it is rendered at.

This should be used with caution. While I haven't found any files for which this causes a problem, the idiom does remove a path calculation (obviously, since the aim is to remove this rectangle), so any jobs which rely on it will not produce the expected output.
Comment 7 Ray Johnston 2010-03-25 16:26:26 UTC
Assigning to Marcos to give the Idiom to the customer for testing.

This is slightly risky in that it may result in different appearance on
some other files, but it may be all that we can do.
Comment 8 Cameron Stone 2010-04-16 07:56:35 UTC
That idiom does not work for our test documents. The images do not appear at all when it is installed. Removing it from the IdiomSet directory restores the old behaviour. I investigated by hacking the PostScript file so the black (irp) rectangles covered only the left half of the image width, and only that half was visible when viewing that PostScript file. It seems the black rectangles are required for the images to render correctly.
Comment 9 Alex Cherepanov 2010-04-17 03:06:55 UTC
Please attach a sample file that doesn't work with
Ken's idiom.

One can try to draw the black rectangle as an image in the hope that
both images get mapped to the same pixels regardless of the resolution.
Comment 10 Cameron Stone 2010-04-19 00:36:41 UTC
(In reply to comment #9)
> Please attach a sample file that doesn't work with
> Ken's idiom.

Sorry, I sent one with the original bug report. Here it is again.
Comment 11 Cameron Stone 2010-04-19 00:43:57 UTC
Created attachment 6166 [details]
A PostScript file containing an image composed of horizontal subimage strips, with black rectangles behind

This file, when converted to pdf, then to an image (both with ghostscript), acquires horizontal black lines.

When the Pscript5Idiom is used, the page renders as blank.

I've simulated the effects of the idiom directly in this file (by removing the black rectangles), and the postscript file then appears blank in GSView.
Comment 12 Ken Sharp 2010-04-19 16:07:53 UTC
Created attachment 6167 [details]
A new idiom to solve the black fills

It seems the point of the black fill is now explained. The images are in fact imagemasks, so only the 'set' pixels mark the page, and these use the current colour to do so.

What the code does is draw a black fill, then set the areas it doesn't want to be black by writing white pixels on top of it. I presume this is faster on some Adobe implementation than simply writing the black pixels.

The attached idiom is rather more extensive than the previous one. If we detect a white imagemask being drawn, we now set a black colour instead and invert the sense of the mask, so that instead of drawing the white pixels we draw the black pixels. If we detect a non-white imagemask, then we need to restore the rectangle we ignored earlier and also draw the imagemask normally.
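
As a rough illustration of why the two approaches give the same pixels inside the image area, here is a minimal sketch on a tiny bitmap. It is not the idiom itself (which is PostScript); the names `paint_mask`, `original_approach` and `idiom_approach` are hypothetical, and the page is modelled as a simple grid where 1 is white and 0 is black.

```python
WHITE, BLACK = 1, 0

def paint_mask(page, mask, colour):
    """Paint `colour` wherever the mask bit is set, leaving other pixels
    untouched (a simplified model of imagemask with the current colour)."""
    return [[colour if m else p for p, m in zip(page_row, mask_row)]
            for page_row, mask_row in zip(page, mask)]

def original_approach(mask):
    """Black fill behind the image, then a white imagemask on top."""
    page = [[BLACK] * len(row) for row in mask]
    return paint_mask(page, mask, WHITE)

def idiom_approach(mask):
    """No fill: start from the blank (white) page and paint black
    through the inverted mask."""
    page = [[WHITE] * len(row) for row in mask]
    inverted = [[not m for m in row] for row in mask]
    return paint_mask(page, inverted, BLACK)

mask = [[1, 0, 1],
        [0, 0, 1]]
print(original_approach(mask) == idiom_approach(mask))
```

The difference between the two lies outside the image area: with no fill there is nothing to show through where a fill edge and an image edge would otherwise round to different pixels. In PostScript terms the inversion would presumably correspond to flipping the polarity (or Decode array) of the imagemask, though that is an assumption about how the attached idiom is written.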

The idiom replacements for /scol and /Y must be used along with the idiom replacement for /irp.

The reason this is now so extensive is that I detected a problem running a regression test with this code when the imagemask wasn't white (and the background therefore wasn't black), and had to extensively recode it.

While this code works for my test suite, and for the supplied example, there are some points to be aware of:

1) I can't guarantee this will work in all cases; the number of tests I have for this condition is *very* small.

2) Two-colour images drawn as a background fill and mask combination where the background is not black will *still* be drawn as a fill and mask combination, which will result in the original problem, only in this case the background colour between the images will be something other than black. In order to deal with this situation I would need to expend quite a lot of time converting the imagemask into a 2 colour image. It's taken me a day to get this one working; it would take several days to implement and test the more general treatment.

3) As before, we won't add this to the general Ghostscript idiomset; it's just too risky.
Comment 13 Cameron Stone 2010-04-19 23:59:47 UTC
Thanks Ken. This works for the test image in our system.

I, too, am slightly terrified that this will cause weird hidden bugs at other times.
Comment 14 artifex 2011-06-01 10:17:36 UTC
Created attachment 7556 [details]
tiff.ps
Comment 15 artifex 2011-06-01 10:19:18 UTC
We (customer 870) had the same problem. I tested the new idiom and it worked fine. But with the attached file tiff.ps a stackunderflow occurs.
Comment 16 Ken Sharp 2011-06-01 14:49:44 UTC
(In reply to comment #15)
> We (customer 870) had the same problem. I tested the new idiom and it worked
> fine. But with the attached file tiff.ps a stackunderflow occurs.

This IdiomSet is supplied as-is, and does not form part of Ghostscript. As such we don't really want to do any further development or maintenance on it.

However:

The original Idiom assumes an RGB image, but this one is in DeviceGray. The job therefore supplies only one parameter to scol, not three, leading to a stackunderflow while trying to get the remaining parameters.

You can extend the idiom to test the current colour space in the replacement for /scol and retrieve a different number of parameters from the stack depending on the colour space. You would then also need to test the colour space in the replacement for /Y in order to provide the correct number of components for the current space before calling setcolor.

    {pop /IdiomB exch def /IdiomG exch def /IdiomR exch def IdiomR IdiomG IdiomB setcolor} bind

    newpath IdiomR IdiomG IdiomB setcolor
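
The stackunderflow and the suggested fix can be modelled with a small sketch of an operand stack. This is an illustration only, not PostScript and not the idiom's code: `StackUnderflow`, `pop_colour`, `scol_fixed_rgb` and `scol_by_space` are hypothetical names, and the extra operand that the real replacement discards with `pop` is left out for simplicity.

```python
# Number of colour components per colour space (standard device spaces).
COMPONENTS = {"DeviceGray": 1, "DeviceRGB": 3, "DeviceCMYK": 4}

class StackUnderflow(Exception):
    """Raised when fewer operands are available than requested."""

def pop_colour(stack, count):
    """Remove and return the top `count` operands, oldest first."""
    if len(stack) < count:
        raise StackUnderflow(f"need {count} operands, found {len(stack)}")
    values = stack[-count:]
    del stack[-count:]
    return values

def scol_fixed_rgb(stack):
    """Model of a replacement that always expects three components."""
    return pop_colour(stack, 3)

def scol_by_space(stack, colour_space):
    """Model of a replacement that first checks the current colour space."""
    return pop_colour(stack, COMPONENTS[colour_space])

gray_job = [0.5]  # a DeviceGray job supplies one component
try:
    scol_fixed_rgb(list(gray_job))
except StackUnderflow as err:
    print("stackunderflow:", err)

print(scol_by_space(list(gray_job), "DeviceGray"))
```

The replacement for /Y would need the same lookup so that it pushes back the matching number of components before calling setcolor.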