Bug 702386 - inefficient pdf gradient or transparency handling = very long processing time
Summary: inefficient pdf gradient or transparency handling = very long processing time
Status: RESOLVED WONTFIX
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Interpreter
Version: 9.52
Hardware: PC Windows 7
Importance: P4 normal
Assignee: Default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-10 16:43 UTC by Hakan
Modified: 2020-05-26 08:32 UTC
CC List: 1 user

See Also:
Customer:
Word Size: ---


Description Hakan 2020-05-10 16:43:05 UTC
Hello,

Using this command line

gswin64c.exe -dSAFER -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -o "output.pdf" "plan06.pdf"

The file at the link below takes 194 seconds to process, which is far too long for such a small file. The desired target is 2 to 3 seconds at most. Other PDFs process within a few seconds on the same CPU.

https://www.dropbox.com/s/9oiacsqi63swsrw/plan06.pdf?dl=0

Trying the same file with raster output devices is no different:

-sDEVICE=jpeg -r300 or -sDEVICE=pdfimage24 is almost equally slow.
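For anyone wanting to reproduce these timings, here is a small Python sketch that builds the same Ghostscript command lines and measures wall-clock time. It assumes Ghostscript is on the PATH; `gswin64c.exe` is the Windows console binary used in this report (on Linux/macOS it is normally `gs`). The helper names `gs_args` and `time_gs` and the output filenames are illustrative, not part of Ghostscript, and only options already used in this thread are passed.

```python
import shutil
import subprocess
import time


def gs_args(device, infile, outfile, resolution=None, no_interpolate=False,
            exe="gswin64c.exe"):
    """Build a Ghostscript argument list like the ones used in this report."""
    args = [exe, "-dSAFER", "-dBATCH", "-dNOPAUSE", f"-sDEVICE={device}"]
    if resolution is not None:
        args.append(f"-r{resolution}")  # output resolution in dpi
    if no_interpolate:
        args.append("-dNOINTERPOLATE")  # ignore /Interpolate on images
    args += ["-o", outfile, infile]
    return args


def time_gs(args):
    """Run Ghostscript once and return elapsed seconds (raises if it fails)."""
    if shutil.which(args[0]) is None:
        raise FileNotFoundError(f"{args[0]} not found on PATH")
    start = time.perf_counter()
    subprocess.run(args, check=True)
    return time.perf_counter() - start
```

For example, `time_gs(gs_args("jpeg", "plan06.pdf", "out.jpg", resolution=300))` times the 300 dpi JPEG case, and adding `no_interpolate=True` allows a comparison with a -dNOINTERPOLATE run.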

MuPDF renders the file at 300 dpi fairly fast.

I am not sure, but I suspect the large gradient or some overlapping transparency areas are causing a very inefficient processing loop in GS.

Can something be done to improve 'pdfwrite' performance?

Thank you kindly.
Comment 1 Tilman Hausherr 2020-05-13 05:43:07 UTC
866 patterns in that file. PDF.js is slow, Chrome is very fast, PDFBox is slow, Edge doesn't react at all, or I didn't wait long enough.
Comment 2 Ray Johnston 2020-05-23 20:06:28 UTC
mutool info only reports 822 patterns for that file, 818 of which are Tiling
(the others are Shading).
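A rough way to get a similar tally without mutool is to scan the file for pattern dictionaries: in the PDF specification, `/PatternType 1` marks a tiling pattern and `/PatternType 2` a shading pattern. The Python sketch below is only a heuristic and is not how `mutool info` works. It sees only uncompressed dictionaries, so patterns stored in compressed object streams are missed and its totals may not match the figures above. `count_patterns` is a hypothetical helper.

```python
import re
from collections import Counter

# PDF spec: PatternType 1 = tiling pattern, PatternType 2 = shading pattern.
PATTERN_KINDS = {b"1": "Tiling", b"2": "Shading"}
_PATTERN_RE = re.compile(rb"/PatternType\s*([12])\b")


def count_patterns(raw_pdf: bytes) -> Counter:
    """Rough count of pattern dictionaries visible in the raw file bytes.

    Only dictionaries stored uncompressed are found; patterns inside
    compressed object streams (PDF 1.5 and later) are not seen.
    """
    return Counter(PATTERN_KINDS[m.group(1)] for m in _PATTERN_RE.finditer(raw_pdf))
```

Usage: `count_patterns(open("plan06.pdf", "rb").read())` returns a `Counter` such as `{"Tiling": ..., "Shading": ...}`.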

I confirm that on my laptop doing the PDF -> pdfwrite took 400 seconds (debug
build).

I'm not sure why you are using Ghostscript to convert a PDF to a PDF, but I
do note mupdf converts this file to ppm at 155 dpi in about 4 seconds, while
Ghostscript takes 190 sec. With -dNOINTERPOLATE gs takes 134 seconds.

As Tilman comments, the problem is the use of a Pattern colorspace for MANY
fills. The cause, though, is not the number of patterns but two particular images,
object 906 and object 911, that are used in the pattern. Object 906 is a
2362x2362 Interpolated RGB image, ALL pixels with the same value "393E42".
Object 911 is similar, 2362x2362 Interpolated RGB, but the value is "6B675E".
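To see why these two images matter, a quick size estimate helps. Assuming 8 bits per component (the BitsPerComponent value is not stated in this report), each 2362x2362 RGB image decodes to roughly 16 MiB of samples, while a 1x1 image of the same color is 3 bytes. The hypothetical helper below only does that arithmetic, padding each row to a whole byte as PDF image rows are; it says nothing about how often Ghostscript actually decodes or interpolates the images.

```python
def decoded_image_bytes(width: int, height: int, components: int = 3,
                        bits_per_component: int = 8) -> int:
    """Uncompressed sample data for one image, each row padded to a whole byte."""
    row_bytes = (width * components * bits_per_component + 7) // 8
    return row_bytes * height


per_image = decoded_image_bytes(2362, 2362)  # objects 906 and 911
print(per_image)                             # 16737132 bytes, about 16 MiB
print(decoded_image_bytes(1, 1))             # 3 bytes for a 1x1 replacement
```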

Modifying the file so that those are 1x1 images, pdfwrite completes in 25 sec
(still a debug build) although converting to ppmraw still takes 100 sec.
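The 1x1 experiment could in principle be automated by a pre-processing step that spots flat images. This Python sketch works on already-decoded 8-bit samples (no row padding, as is the case for 8-bit RGB); parsing the PDF, decoding the stream filters and writing the smaller image back are left out, and `uniform_color` / `collapse_to_1x1` are hypothetical names. Because a PDF image is always drawn into the unit square of its current transformation, a 1x1 image with the same matrix covers the same area.

```python
from typing import Optional, Tuple


def uniform_color(samples: bytes, components: int = 3) -> Optional[bytes]:
    """Return the single color if every pixel of an 8-bit image is identical, else None."""
    if not samples or len(samples) % components:
        return None
    first = samples[:components]
    # Comparing against a repeated copy runs in C, far faster than a per-pixel Python loop.
    return first if samples == first * (len(samples) // components) else None


def collapse_to_1x1(samples: bytes, width: int, height: int,
                    components: int = 3) -> Tuple[bytes, int, int]:
    """Replace a flat image with a 1x1 image of the same color; leave others untouched."""
    color = uniform_color(samples, components)
    if color is None:
        return samples, width, height
    return color, 1, 1
```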

Obviously, whatever app was used to create this -- possibly iTextSharp 4.1.6
by 1T3XT -- uses images within patterns that should be reduced.

Note that the app probably should just paint with a solid color.
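Painting with a solid color would mean selecting a plain RGB fill color instead of the pattern before the path is filled. As an illustration, this Python sketch converts the two hex values above into the PDF `rg` (non-stroking DeviceRGB color) operator and wraps it around a rectangle fill (`re f`) inside `q`/`Q`. It assumes the images are in DeviceRGB (the report only says "RGB"); the rectangle and the helper names are made up for the example, since the real fills use whatever paths the producer emitted.

```python
def hex_to_rg(hex_color: str) -> str:
    """Turn a hex triple such as "393E42" into a PDF non-stroking RGB color operator."""
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4))
    return f"{r:.4f} {g:.4f} {b:.4f} rg"


def solid_fill(hex_color: str, x: float, y: float, w: float, h: float) -> str:
    """Content-stream fragment that fills a rectangle with a flat color (no pattern)."""
    return f"q {hex_to_rg(hex_color)} {x:g} {y:g} {w:g} {h:g} re f Q"


print(hex_to_rg("393E42"))  # 0.2235 0.2431 0.2588 rg
print(hex_to_rg("6B675E"))  # 0.4196 0.4039 0.3686 rg
```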

Not much we can do about this, but I recommend that if the goal is to do some
kind of conversion from PDF to PDF, use mupdf or mutool (Artifex's other
PDF software that is also AGPL).
Comment 3 Hakan 2020-05-25 10:13:22 UTC
(In reply to Ray Johnston from comment #2)
> mutool info only reports 822 patterns for that file, 818 of which are Tiling
> (the others are Shading).
> 
> I confirm that on my laptop doing the PDF -> pdfwrite took 400 seconds (debug
> build).
> 
> I'm not sure why you are using Ghostscript to convert a PDF to a PDF, 

The reasons I am trying to convert PDF to PDF are:
1- An attempt to 'sanitize' the file and make it compatible with Adobe Acrobat Reader DC. Trying to print the file starts the famous 'flattening' process, which eventually fails. Acrobat Reader DC gives up, which is rare. Acrobat Professional is able to send something to the print driver, but only after a huge processing time.
The fastest software that can open and render this type of file is PDF-XChange from Tracker Software (free version).
2- This is just one example file; there is a continuing stream of files created with the same producer. Some have unembedded fonts, and I need to keep the PDF in vector form but embed all missing fonts. GS is able to do that.
3- Optionally, the pdfimage24, jpeg and tiffscaled Ghostscript devices will be needed as well when a raster image is required, at resolutions between 300 and 600 dpi. For this file, GS cannot produce output in a reasonable amount of time.

> but I
> do note mupdf converts this file to ppm at 155 dpi in about 4 seconds, while

Correct; even at 300 dpi, mutool produces a valid file within a reasonable time.

> Ghostscript takes 190 sec. With -dNOINTERPOLATE gs takes 134 seconds.
> 
> As Tilman comments, the problem is the use of a Pattern colorspace for MANY
> fills, but it is not the number of patterns, but two particular images,
> object 906 and object 911, that are used in the pattern. Object 906 is a
> 2362x2362 Interpolated RGB image, ALL pixels with the same value "393E42".
> Object 911 is similar, 2362x2362 Interpolated RGB, but the value is "6B675E".
> 
> Modifying the file so that those are 1x1 images, pdfwrite completes in 25 sec
> (still a debug build) although converting to ppmraw still takes 100 sec.
> 
> Obviously, whatever app was used to create this -- possibly iTextSharp 4.1.6

There was a mix-up when I supplied the file link. I have tried many known PDF-to-PDF converters, including iText, to see if I could pre-process the PDF quickly and then feed it into GS. No luck.

The original file was created by a CAD software named 'Allplan 2019'; the PDF producer is the standard official Adobe PDF Library version 10.1.

The original, unmodified file is available for a while at this link:

https://www.dropbox.com/s/x79924s4lcf4old/Plan%2006.pdf?dl=0


> by 1T3XT -- uses images within patterns that should be reduced.
> 
> Note that the app probably should just paint with a solid color.
> 
> Not much we can do about this, but I recommend that if the goal is to do some
> kind of conversion from PDF to PDF, use mupdf or mutool (Artifex's other
> PDF software that is also AGPL).

I already tried cleaning, rewriting the PDF stream and sanitizing it with

mutool clean -gg -c -i -s "plan 06.pdf" "clean.pdf"

The processing is extremely fast, and a completely new file with a different size and object structure is written, but the output still cannot be printed from Adobe Acrobat Reader DC. The primary goal is to be able to print a vector file from Adobe Reader. I could not find any other tool that can rewrite the PDF in vector format so that it becomes printable in Acrobat Reader.

I appreciate you looking into this and understand that sometimes there isn't much that can be done. Not every file can be processed and printed with the same method.
Comment 4 Hakan 2020-05-25 10:18:44 UTC
And Tilman's comment is correct too. The Chrome PDF engine is the fastest of all at opening and rendering this file; even at maximum zoom, equivalent to almost 600 dpi rendering, Chrome is almost instant, with very little CPU and memory use.
Comment 5 Tilman Hausherr 2020-05-25 18:02:45 UTC
I was able to display the file you just linked to with Adobe Reader. Yeah, it's slow. But it finishes. Also printing (only tried print to PDF). The main difference from the other file is the optional content.

(Thanks also to Ray Johnston for the comment about these being the same images, I had missed that one. I found out a flaw in PDFBox, we weren't using the image cache in patterns, after fixing that we're less slow)
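The PDFBox fix described here amounts to decoding each image once and reusing it every time a pattern tile draws it. PDFBox is Java and its actual cache code is not shown here; this is only an illustration of the idea in Python. The class name, the `decode` callback and the (object number, generation) key are assumptions. The dictionary is unbounded, so a real implementation would need an eviction policy given how large a decoded image can be.

```python
from typing import Callable, Dict, Tuple

ObjRef = Tuple[int, int]  # (object number, generation number)


class DecodedImageCache:
    """Decode each image XObject once and reuse it for every pattern tile that draws it."""

    def __init__(self, decode: Callable[[ObjRef], bytes]):
        self._decode = decode
        self._cache: Dict[ObjRef, bytes] = {}
        self.misses = 0  # how many times decode() actually ran

    def get(self, ref: ObjRef) -> bytes:
        if ref not in self._cache:
            self.misses += 1
            self._cache[ref] = self._decode(ref)
        return self._cache[ref]
```

If many pattern tiles draw object 906, only the first `get((906, 0))` call decodes it; later calls return the stored samples.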
Comment 6 Hakan 2020-05-26 08:21:52 UTC
(In reply to Tilman Hausherr from comment #5)
> I was able to display the file you just linked to with Adobe Reader. Yeah,
> it's slow. But it finishes. Also printing (only tried print to pdf). The
> main difference to the other file is the optional content.
> 
> (Thanks also to Ray Johnston for the comment about these being the same
> images, I had missed that one. I found out a flaw in PDFBox, we weren't
> using the image cache in patterns, after fixing that we're less slow)


The display is fine; yes, it's a bit slow, but it renders.
If you want to test, try printing from Adobe Reader to this driver, a Canon TX4000 large-format inkjet; it crashes.

https://www.canon-europe.com/support/products/imageprograf/imageprograf-tx/imageprograf-tx-4000.html?type=drivers 

Is there a Windows command-line binary release of PDFBox that can make a new vector (or raster) PDF of this and rewrite the streams in a more compatible way?
Comment 7 Ken Sharp 2020-05-26 08:32:21 UTC
Folks, this bug is closed. If you want to continue discussing the file, can you please take the discussion elsewhere? Thanks.