Bug 690575

Summary:	PS to PDF Conversion extremely slow (possibly endless)
Product:	Ghostscript	Reporter:	Dr. Werner Fink <werner>
Component:	PDF Writer	Assignee:	Ken Sharp <ken.sharp>
Status:	RESOLVED FIXED
Severity:	normal
Priority:	P4
Version:	8.64
Hardware:	PC
OS:	Linux
Customer:		Word Size:	---
Attachments:	bug-509903_step2.ps; bug-509903_step2.ps; bug-509903_step2-small.ps; x.pdf done by ESP ghostscript 8.15; y.pdf.gz done by GPL ghostscript 8.64 and gzipped due size; gs-864.pdf; LI_DEC_RIMA_frontal

Description Dr. Werner Fink 2009-06-25 07:32:30 UTC

This concerns a large PostScript file generated by a2ps
... it was handled within a few minutes by the old
espgs 8.15.4, but with GPL Ghostscript 8.6x it seems to
take forever.  I have no patch for this :(

Comment 1 Dr. Werner Fink 2009-06-25 07:36:36 UTC

Created attachment 5161 [details]
bug-509903_step2.ps

This is the PostScript file which takes that long with the ps2pdf script ...
the bug number is the one used in Novell's Bugzilla.

Comment 2 Dr. Werner Fink 2009-06-25 07:51:34 UTC

Created attachment 5162 [details]
bug-509903_step2.ps

This is the PostScript file which takes that long with the ps2pdf script ...
the bug number is the one used in Novell's Bugzilla.

Comment 3 Ken Sharp 2009-06-25 08:50:02 UTC

Please state the command line used for conversion, especially note whether you
have used the ps2pdf script or the Ghostscript command line.

The file does take a long time to process, it does not seem to hang though. The
job contains 623 pages, and each page takes a few seconds to process. At one
page per second the job would take 10 minutes. I don't know what version of
pdfwrite was included with ESP GS, but it's entirely possible that the current
code takes longer due to doing a 'better' job of conversion.

Each page does take a surprisingly long time to process though, given the
simplicity of the content, and I'll look into it at some point.

FWIW the job ran to completion in 29 minutes on my PC under Windows Vista,
producing a 1.1 MB PDF file. That's ~2.75 seconds per page, not fast but not
hugely slow either. For comparison Acrobat 9 took 54 seconds to do the same job.
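
The per-page figures above can be re-derived with a few lines of Python. This is only a sketch of the arithmetic; the helper names are made up for illustration, and the page count and run times are the ones quoted in this comment.

```python
def minutes_for(seconds_each, pages):
    """Total run time in minutes for a given per-page cost."""
    return seconds_each * pages / 60.0


def seconds_per_page(minutes, pages):
    """Average per-page cost in seconds for a given total run time."""
    return minutes * 60.0 / pages


PAGES = 623  # page count quoted in this comment

print(f"{minutes_for(1.0, PAGES):.1f} min at one page per second")
print(f"{seconds_per_page(29, PAGES):.2f} s/page for the 29-minute Ghostscript run")
print(f"{seconds_per_page(54 / 60, PAGES):.3f} s/page for the 54-second Acrobat run")
```

The same helpers can be reused for any other page count or timing reported later in the thread.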

Comment 4 Dr. Werner Fink 2009-06-25 09:04:12 UTC

This should do the job:

 gs -dSAFER -dCompatibilityLevel=1.2 \
  -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=step3.pdf -dSAFER \
  -c .setpdfwrite -f bug-509903_step2.ps

The user had used the ps2pdf script from ghostscript. This job had not
finished after more than an hour on Linux i586 and Linux x86_64, whereas
the old ESP ghostscript 8.15.4 does the job within 5 minutes.
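
To make timings from different machines easier to compare, the run could be wrapped in a small script. The following is a rough sketch, not part of Ghostscript: `build_gs_command` and `timed_run` are hypothetical helper names, the options are copied from the command line above (with the repeated -dSAFER given once), and a `gs` executable on the PATH is assumed.

```python
import subprocess
import time


def build_gs_command(ps_file, pdf_file, gs="gs"):
    """Argument list for the pdfwrite run quoted in this comment."""
    return [
        gs, "-dSAFER", "-dCompatibilityLevel=1.2",
        "-q", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite", f"-sOutputFile={pdf_file}",
        "-c", ".setpdfwrite", "-f", ps_file,
    ]


def timed_run(command):
    """Run a command and return (exit status, wall-clock seconds)."""
    start = time.perf_counter()
    result = subprocess.run(command)
    return result.returncode, time.perf_counter() - start
```

Calling `timed_run(build_gs_command("bug-509903_step2.ps", "step3.pdf"))` would then give the exit status and the elapsed time in seconds for one conversion.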

Comment 5 Marcos H. Woehrmann 2009-06-29 04:26:49 UTC

I'm having trouble reproducing this.  On my amd64 linux box ESP Ghostscript 8.15.4 took 9 minutes to run
the command line from comment #4 whereas Ghostscript 8.64 took 13 minutes.  I agree this isn't optimal,
but not quite as bad as the 5 minutes to forever you are seeing.

Comment 6 Dr. Werner Fink 2009-06-29 04:33:20 UTC

Please provide the glibc and gcc versions.  Also, was
FORTIFY active during the compile?

Comment 7 Marcos H. Woehrmann 2009-06-29 04:57:57 UTC

gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)

/lib/libc-2.6.1.so

I'm not familiar with FORTIFY, but the procedure I followed to build ghostscript is "./autogen.sh ; make"

Comment 8 Dr. Werner Fink 2009-06-29 05:11:19 UTC

Currently I've here

  -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector
-funwind-tables -fasynchronous-unwind-tables

in the CFLAGS as default. The gcc version is 4.3 or 4.4 and glibc version
is 2.9 or 2.10.

The following settings and devices are active

  FILE_IMPLEMENTATION=both

  psl3.dev pdf.dev dps.dev dpsnext.dev ttfont.dev epsf.dev
  pipe.dev rasterop.dev fzlib.dev cidfont.dev fapi.dev
  posync.dev gsnogc.dev async.dev

... besides all printer and format drivers.

Comment 9 Dr. Werner Fink 2009-06-29 07:48:41 UTC

I just ran `psselect -p2' to get a smaller test case. This page takes
about 17 seconds, which for 632 pages comes to about 3 hours.
For this one-page test the resulting PDF file is 3MB in size,
and viewing it with ghostscript is so slow that I can watch the drawing
step by step.  Xpdf is also slow, but faster than gs/gv ... ,
acroread reports

   Cannot extract the embedded font 'T3Font_1'. Some
   characters may not display or print correctly

and indeed, the only word I see is `WAREHOUS' followed by dots
as replacements for all following characters.

Comment 10 Dr. Werner Fink 2009-07-03 06:13:09 UTC

Just found out that if I use a copy of the PostScript code with only one
line included:

       (WAREHOUSE) p n

all goes OK, that is, it is very fast (< 0.3s) and acroread does not
report any problem. But using a line like

       (WAREHOUSE ) p n

causes an error in acroread as described in comment #9 ... the space
can be replaced with any ASCII character.  Acroread then pops up an
error after displaying

        WAREHOUS..

that is, the 9th and 10th characters are always replaced by dots.

Comment 11 Dr. Werner Fink 2009-07-07 03:40:49 UTC

Running pdfinfo from xpdf on the output for one page of the example
PostScript shows, for ESP ghostscript:

 pdfinfo ~/x.pdf 
 Producer:       ESP Ghostscript 8.15
 CreationDate:   Tue Jul  7 12:30:57 2009
 ModDate:        Tue Jul  7 12:30:57 2009
 Tagged:         no
 Pages:          1
 Encrypted:      no
 Page size:      595 x 842 pts (A4)
 File size:      11117 bytes
 Optimized:      no
 PDF version:    1.3

and for GPL ghostscript 8.64:

 pdfinfo ~/y.pdf
 Title:          step1
 Author:         Adam Tauno Williams
 Creator:        a2ps version 4.13
 Producer:       GPL Ghostscript 8.64
 CreationDate:   Tue Jul  7 10:32:36 2009
 ModDate:        Tue Jul  7 10:32:36 2009
 Tagged:         no
 Pages:          1
 Encrypted:      no
 Page size:      595 x 842 pts (A4)
 File size:      3063903 bytes
 Optimized:      no
 PDF version:    1.4

The files differ in size by a factor of 275.  Some of the speed
difference seems to be caused by increased I/O load IMHO.

Comment 12 Ken Sharp 2009-07-07 05:53:30 UTC

The size difference is surprising (to say the least), especially since the full
632 pages produced, for me, a PDF file slightly over 1MB. It's hard to see why
running a single page from the document should produce a file 3 times as large.
There must be some significant difference in the resulting PostScript.

Please post both files so I can see what is causing the difference. Please also
post the single page PostScript file used to create the two PDF files so I can
run it here and try to figure out why this is happening.

Comment 13 Dr. Werner Fink 2009-07-07 06:07:28 UTC

Created attachment 5188 [details]
bug-509903_step2-small.ps

The single page PostScript code, which is derived
from attachment #5161 [details] with the help of pstops
from the psutils tool package.

Comment 14 Dr. Werner Fink 2009-07-07 06:09:23 UTC

Created attachment 5189 [details]
x.pdf done by ESP ghostscript 8.15

It seems that there is only one real font included,
that is, only one /FontBBox with one reencoding is
found therein.

Comment 15 Dr. Werner Fink 2009-07-07 06:14:18 UTC

Created attachment 5190 [details]
y.pdf.gz done by GPL ghostscript 8.64 and gzipped due size

Here I found 3402 /FontBBox entries and their reencodings.
It looks as if this was done for every character found
in the PostScript document.
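
A count like this can be reproduced with a short script. The sketch below is an illustration only: `count_token` is a made-up helper, and it simply counts byte occurrences of the key, which assumes the font dictionaries are stored uncompressed in the file (PDF versions before 1.5 have no compressed object streams).

```python
import gzip


def count_token(path, token=b"/FontBBox"):
    """Count raw occurrences of a PDF key in a file, reading .gz files transparently."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        return handle.read().count(token)
```

Running `count_token("x.pdf")` and `count_token("y.pdf.gz")` on the two attachments would give the figures to compare.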

Comment 16 Ken Sharp 2009-07-07 07:48:53 UTC

Created attachment 5191 [details]
gs-864.pdf

Hmm, well I see the problem you describe in the files you attached. In fact the
PDF file contains a separate type 3 font for every single glyph, which is why
the file is large. It also doesn't open with Acrobat, which complains about a
missing resource. (GS does render it, comparatively slowly).

However, I'm unable to reproduce the file using current code. Using a command
line based on what you had previously given:

gs -dSAFER -dCompatibilityLevel=1.2 -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite
-sOutputFile=out.pdf step2-small.ps

I get an 11KB PDF file which contains one Type 3 font, works with Acrobat and
renders quickly with GS. On my slightly older Fedora installation I get a 13KB
file, still only one type 3 font.

So I reverted my Fedora installation to revision 9434, which I believe to be
the revision which was released as 8.64 (according to the Subversion logs), and
tried again. Same result, a small PDF file with only one type 3 font.

Attached is the file I get under Fedora using 8.64.

Comment 17 Dr. Werner Fink 2009-07-07 07:56:52 UTC

Strange ... which compiler and glibc version were used for compiling?
Besides this, the flags used for the compiler itself would also
be interesting for me.

Comment 18 Ken Sharp 2009-07-07 08:19:32 UTC

I used ./autogen.sh and then make to build my executable.

Compiler flags appear to be:

-DHAVE_MKSTEMP -DHAVE_HYPOT -DHAVE_FILE64 -DHAVE_MKSTEMP64 -DHAVE_FONTCONFIG -O2
-Wall -Wstrict-prototypes -Wundef -Wmissing-declarations -Wmissing-prototypes
-Wwrite-strings -Wno-strict-aliasing -Wdeclaration-after-statement -fno-builtin
-fno-common -DHAVE_STDINT_H -DGX_COLOR_INDEX_TYPE="unsigned long long"  
-Ilibpng -Izlib  -DPNG_NO_ASSEMBLER_CODE

Bear in mind that different parts of the makefile can (I think) use different
compiler flags, but this should cover the main ones.


gcc -v reports:

Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info --enable-shared --enable-threads=posix
--enable-checking=release --with-system-zlib --enable-__cxa_atexit
--disable-libunwind-exceptions
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk
--disable-dssi --enable-plugin
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic
--host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)


/lib contains libc-2.7.so


To summarise; there is a possible performance issue, the original job does seem
to run peculiarly slowly for such a simple file, and I will look into it.
However, I can't make the release source for 8.64 behave as per your
description, your experience is considerably worse than mine as regards
performance, and you are generating files with an error. All this makes me think
there is some difference between what we are running.

Comment 19 Dr. Werner Fink 2009-07-07 08:27:56 UTC

I agree that there is a difference in what we are running ... the remaining
questions are what the difference is and what causes it.

The compiler flags found in Makefile

-Wall -Wstrict-prototypes -Wundef -Wmissing-declarations -Wmissing-prototypes
-Wwrite-strings -Wno-strict-aliasing -Wdeclaration-after-statement -fno-builtin
-fno-common -DHAVE_STDINT_H -DGX_COLOR_INDEX_TYPE="unsigned long long" -O2
-march=i586 -mtune=i686 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2
-fstack-protector -funwind-tables -fasynchronous-unwind-tables -g
-fno-strict-aliasing -fPIC -D_GNU_SOURCE -pipe  -Wno-write-strings
-Wno-return-type -Wno-unknown-pragmas -Wno-pointer-sign

The gcc -v reports:

Using built-in specs.
Target: i586-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info
--mandir=/usr/share/man --libdir=/usr/lib --libexecdir=/usr/lib
--enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release
--with-gxx-include-dir=/usr/include/c++/4.4 --enable-ssp --disable-libssp
--with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux'
--disable-libgcj --disable-libmudflap --with-slibdir=/lib --with-system-zlib
--enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch
--enable-version-specific-runtime-libs --program-suffix=-4.4
--enable-linux-futex --without-system-libunwind --with-arch-32=i586
--with-tune=generic --build=i586-suse-linux
Thread model: posix
gcc version 4.4.0 [gcc-4_4-branch revision 148163] (SUSE Linux)

Comment 20 Dr. Werner Fink 2009-07-09 04:49:01 UTC

I've found the cause of the problem, see
http://bugs.ghostscript.com/show_bug.cgi?id=690559#c3
After removing the part which I had introduced to avoid
a crash of ghostscript 8.62 in the pdfwrite device, I now see that
the PostScript file in attachment #5161 [details] takes 24 minutes.
This is much better than before (but remains slower than with
ESP ghostscript 8.15.4).

Comment 21 Ken Sharp 2009-07-09 04:59:36 UTC

That's great, much more in line with my results. I still think there is a
potential problem, because the file seems slow given the content, so we'll keep
this issue open until that has been investigated.

Thanks for letting us know.

Comment 22 Hin-Tak Leung 2009-12-18 20:16:28 UTC

It is also worth asking what local patches SUSE has - consider that the local
change mentioned in bug 690559 modifies how PDFs are processed in general
(unlike the bbx_create_compositor line removal -
http://bugs.ghostscript.com/show_bug.cgi?id=689340#c6 - which only affects how
x11alpha works, and also addresses 690559).

Comment 23 Dr. Werner Fink 2009-12-21 01:15:38 UTC

AFAICR I tested my patches step by step, starting from a fresh
version, see comment #20, simply to avoid such problems.

Comment 24 Ken Sharp 2009-12-21 03:00:34 UTC

The job runs slowly on my setup, which is clean of any odd patches as it's the
HEAD version, see comment #3, #18 and #21.

As stated, the file seems to run slowly, but not as slowly as the original
report, which Dr Fink has isolated to a specific work-around for bug #690559.
Removing the pdfwrite-specific code, which is not required with recent sources,
resolves the extreme problem, leaving only a smaller performance drop.

We should look at the poor performance but not as a matter of urgency.

Comment 25 Ken Sharp 2010-09-09 15:14:31 UTC

The basic problem seems to be that all the fonts in the program are converted into type 3 fonts. Type 3 fonts are slower than other types anyway, and in addition pdfwrite spends a lot of time trying to decide if a newly encountered CharProc is the same as an earlier one.

This can probably be improved. Note to self; look at revision 11618, it may be something similar.

Comment 26 Ken Sharp 2010-09-14 08:11:32 UTC

I've had a look at the performance issue. Partly it's simply the fact that all the fonts are converted to type 3 fonts, and type 3 fonts are always slow because of the way that the CharProcs are captured.

However, I did find that the way we compare a new CharProc to all the existing ones is unreasonably slow, as it involves seeking through a file and reading the stored data. We do this for every stored CharProc against every new CharProc, and we do it twice.

Profiling the code showed that it was spending something like 75% of the time waiting for I/O operations to complete.

Revision 11717:

http://ghostscript.com/pipermail/gs-cvs/2010-September/011736.html

tackles this by creating an md5 hash as the data is written and where possible comparing md5 hashes rather than re-reading the data continuously. For me this improves the performance considerably, a factor of 2-3 times faster. Profiling the code now does not show any glaring problems. Much of the time is now spent in compressing and writing the final PDF file. Type 3 fonts are still unavoidably slow, but at least they are better.
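
The idea of comparing hashes instead of re-reading stored data can be sketched in a few lines of Python. This is not the pdfwrite code (which is C and keeps its data in a file); `CharProcStore` is a hypothetical in-memory stand-in that only illustrates the technique, and the final byte comparison on a hash match is a choice made for this sketch.

```python
import hashlib


class CharProcStore:
    """Hypothetical store that reuses identical glyph procedures."""

    def __init__(self):
        self._by_digest = {}  # md5 digest -> indices into self._data
        self._data = []       # one entry per unique CharProc

    def add(self, charproc):
        """Return the index of an identical stored CharProc, storing it if new."""
        digest = hashlib.md5(charproc).digest()
        # Only entries with the same digest are candidates, so most
        # new CharProcs are rejected without touching the stored data.
        for index in self._by_digest.get(digest, []):
            if self._data[index] == charproc:  # confirm the match
                return index
        self._data.append(charproc)
        index = len(self._data) - 1
        self._by_digest.setdefault(digest, []).append(index)
        return index

    def __len__(self):
        return len(self._data)
```

With a lookup keyed on the digest, each new CharProc costs one hash computation and a dictionary lookup instead of a pass over every stored entry.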

Comment 27 cinapcejp 2014-02-10 11:54:22 UTC

Created attachment 10683 [details]
LI_DEC_RIMA_frontal

Comment 28 Ken Sharp 2014-02-10 12:34:46 UTC

(In reply to comment #27)
> Created attachment 10683 [details]
> LI_DEC_RIMA_frontal

For what purpose? This bug is resolved fixed, please don't randomly attach files to closed bugs (especially not 300+ Mb files).

If you think you have a problem, open a new bug report. I'll warn you now that a 300+Mb file won't be looked at any time soon.