Bug 705534 - Performance drop between 9.55.0 -> 9.56.1
Summary: Performance drop between 9.55.0 -> 9.56.1
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: General
Version: unspecified
Hardware: PC Windows 10
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-06-16 06:13 UTC by tr-sc
Modified: 2022-09-06 15:46 UTC
CC List: 0 users

See Also:
Customer:
Word Size: ---


Attachments
.txt containing special characters (298.16 KB, text/plain)
2022-06-16 06:13 UTC, tr-sc
pdf rendition of the sample .txt (1.94 MB, application/pdf)
2022-06-16 07:21 UTC, tr-sc

Description tr-sc 2022-06-16 06:13:22 UTC
Created attachment 22750
.txt containing special characters

When processing this .txt file with ImageMagick to create a PDF (IM uses GS as a delegate for this task), I notice a big drop in performance: processing takes 3 minutes longer than in the previous version.

GS 9.55 -> 9.56.1



STR
- Process the .txt file with different GS versions
- Notice performance degradation
Comment 1 Ken Sharp 2022-06-16 07:09:11 UTC
(In reply to tr-sc from comment #0)

> When processing this .txt file with ImageMagick to create a PDF (IM uses GS
> as a delegate for this task), I notice a big drop in performance: processing
> takes 3 minutes longer than in the previous version.

This appears to be HTML, not text. In fact given that it's all in Chinese my first thought was that this is a spam bug report, however the content of your comment makes me think otherwise, so I'm treating this as a real report.


Ghostscript doesn't interpret HTML, so this is not the input file as sent to Ghostscript.

When reporting a bug we need you to supply us with the input sent to Ghostscript, and the command line used to interpret the file.


> STR
> - Process the .txt file with different GS versions

Ghostscript doesn't interpret 'text' files (or HTML, come to that). Your comment #0 suggests you are using ImageMagick to process this file, not Ghostscript, and you haven't actually explained *how* you are using ImageMagick to do so.


> - Notice performance degradation

Ghostscript doesn't process the file: it gives a syntax error when presented with the file as input and immediately exits, so I see no performance degradation.


We need you to tell us how to reproduce your problem using just Ghostscript.
Comment 2 tr-sc 2022-06-16 07:21:09 UTC
Created attachment 22755
pdf rendition of the sample .txt
Comment 3 tr-sc 2022-06-16 07:21:49 UTC
agree, thanks for the quick answer, maybe this pdf rendition of the .txt generated by Aspose is the issue then I'll attach the file

PS: funny that you thought I was a bot :D
Comment 4 tr-sc 2022-06-16 07:22:30 UTC
we create .jpg from that
Comment 5 Ken Sharp 2022-06-16 08:32:31 UTC
(In reply to tr-sc from comment #3)
> agree, thanks for the quick answer, maybe this pdf rendition of the .txt
> generated by Aspose is the issue then I'll attach the file

OK, so that file is the HTML source drawn as text. It takes a long time to start up because the PDF is 187 pages long and the interpreter needs to check all the pages and all the resources defined on each page to see if transparency is being used.

Are you sure this is the PDF file you are sending to Ghostscript? I would have expected a rendering of the rendered HTML page rather than the HTML source.


> PS: funny that you thought I was a bot :D

I'm afraid we sometimes get postings/attachments containing either spam in a foreign language (often Chinese) or HTML security exploits. It's one of the reasons our Bugzilla is set up to require attachments such as JPEG to be downloaded rather than viewed in the browser.


(In reply to tr-sc from comment #4)
> we create .jpg from that

At what resolution, and with what other parameters? You may need to ask the ImageMagick people for answers to those questions.

For the record I tested this here using this command line :

gs -sDEVICE=jpeg -r300 -sOutputFile=/temp/out%d.jpg -dQUIET -dBATCH -dNOPAUSE sample_txt.pdf

with the 9.55.0 and 9.56.1 releases, as well as a release build from the current HEAD with the following results:

Testing 9.55.0
    Started at  9:18:33.65
    Finished at  9:19:15.67
Testing 9.56.1
    Started at  9:19:15.67
    Finished at  9:20:22.93
Testing Current
    Started at  9:20:22.93
    Finished at  9:21:34.96

So with that setup the current code is very slightly slower, probably due to being more thorough with checking resources, but it's very marginal and certainly nowhere near 3 minutes. Note that this is a single run for each binary; varying system load can result in different performance figures, and if I wanted a reliable figure I would average this over several runs. But since the times are a long way from 3 minutes I don't think this can be exhibiting the problem.
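
For anyone wanting steadier numbers than a single run gives, averaging over several runs could be scripted roughly as below. This is only a sketch, not the harness used for the figures above: the helper names `time_command` and `summarize` are made up for illustration, the Ghostscript binaries to compare are assumed to be passed as command-line arguments, and the gs options are simply copied from the command line in this comment.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run cmd `runs` times; return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken run is not timed silently.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean, sample standard deviation) of the durations."""
    mean = statistics.mean(durations)
    spread = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return mean, spread


if __name__ == "__main__":
    # Each argument is assumed to be the path to one Ghostscript build to compare.
    for gs_binary in sys.argv[1:]:
        cmd = [gs_binary, "-sDEVICE=jpeg", "-r300",
               "-sOutputFile=/temp/out%d.jpg",
               "-dQUIET", "-dBATCH", "-dNOPAUSE", "sample_txt.pdf"]
        mean, spread = summarize(time_command(cmd))
        print(f"{gs_binary}: mean {mean:.2f}s, stdev {spread:.2f}s over 5 runs")
```

A large standard deviation relative to the mean would suggest that system load, rather than the binary, is dominating the difference.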
Comment 6 tr-sc 2022-06-16 15:13:17 UTC
(In reply to Ken Sharp from comment #5)

Thanks a lot for that quick investigation. I'll try to take a deeper look on my side to see exactly where the performance loss is.

Maybe IM uses Ghostscript with different parameters or something.

I'll try to give an update soon
Comment 7 Ken Sharp 2022-06-16 15:22:03 UTC
(In reply to tr-sc from comment #6)

> Maybe IM uses Ghostscript with different parameters or something.

Almost certainly it does, and I can't guess what, I'm afraid. I chose what seemed like a reasonably simple setup for creating JPEG output.

 
> I'll try to give an update soon

Not a problem, I'll keep the bug open until we hear more at least.
Comment 8 Ken Sharp 2022-09-06 15:46:06 UTC
I had a little spare time and delved into the small slowdown with the new interpreter. This turned out to be an extreme example of a performance bottleneck we already knew about but had not yet addressed.

The problem is that the font contains 64K glyphs, i.e. it's a complete font, where we would expect an efficient PDF producer to embed a subset, because that makes the file smaller.

I've made a commit (36c3bc1dfb18d5181523e12e7dca9983b4d09b6c) which should significantly improve performance with files of this kind (which we would normally expect to be rare).

Running the same test as I did in comment 5 I now get:

Testing 9.55.0
    Started at 16:35:30.11
    Finished at 16:36:19.87
Testing 9.56.1
    Started at 16:36:19.87
    Finished at 16:37:46.67
Testing Current
    Started at 16:37:46.68
    Finished at 16:38:19.29

So 9.55.0 takes 49 seconds, 9.56.1 takes 87 seconds and current code takes 33 seconds.
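
The elapsed times can be re-derived from the Started/Finished timestamps with a few lines of Python. This is just an illustrative sketch: `elapsed_seconds` is a hypothetical helper, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(started, finished):
    """Seconds between two same-day H:MM:SS.ff wall-clock timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = (datetime.strptime(finished.strip(), fmt)
             - datetime.strptime(started.strip(), fmt))
    return delta.total_seconds()


# Timestamps copied from the test run in this comment.
runs = {
    "9.55.0": ("16:35:30.11", "16:36:19.87"),
    "9.56.1": ("16:36:19.87", "16:37:46.67"),
    "Current": ("16:37:46.68", "16:38:19.29"),
}

for label, (started, finished) in runs.items():
    print(f"{label}: {elapsed_seconds(started, finished):.2f}s")
```

This gives 49.76s, 86.80s and 32.61s, which agrees with the figures quoted above to within a second.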

So with that commit the current code is now actually faster than the old interpreter with files of this kind (large embedded CFF CIDFonts).

Since I couldn't reproduce the originally reported performance decrease, the commit fixes the slowdown I can see, and we're about to do a release, I'm going to close this as fixed.