Summary: | PS to PDF Conversion extremely slow (possibly endless) | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Dr. Werner Fink <werner> |
Component: | PDF Writer | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED FIXED | ||
Severity: | normal | ||
Priority: | P4 | ||
Version: | 8.64 | ||
Hardware: | PC | ||
OS: | Linux | ||
Customer: | Word Size: | --- | |
Attachments: |
bug-509903_step2.ps
bug-509903_step2.ps
bug-509903_step2-small.ps
x.pdf done by ESP ghostscript 8.15
y.pdf.gz done by GPL ghostscript 8.64 and gzipped due to its size
gs-864.pdf
LI_DEC_RIMA_frontal |
Description
Dr. Werner Fink
2009-06-25 07:32:30 UTC
Created attachment 5161
bug-509903_step2.ps
This is the PostScript file that takes so long with the ps2pdf script...
The bug number is the one used in Novell's Bugzilla.
Created attachment 5162
bug-509903_step2.ps
This is the PostScript file that takes so long with the ps2pdf script...
The bug number is the one used in Novell's Bugzilla.
Please state the command line used for conversion; in particular, note whether you used the ps2pdf script or the Ghostscript command line. The file does take a long time to process, but it does not seem to hang. The job contains 623 pages, and each page takes a few seconds to process; even at one page per second the job would take 10 minutes. I don't know what version of pdfwrite was included with ESP GS, but it's entirely possible that the current code takes longer because it does a 'better' job of conversion. Each page does take a surprisingly long time to process, though, given the simplicity of the content, and I'll look into it at some point. FWIW, the job ran to completion in 29 minutes on my PC under Windows Vista, producing a 1.1 MB PDF file. That's ~2.75 seconds per page: not fast, but not hugely slow either. For comparison, Acrobat 9 took 54 seconds to do the same job.
This should do the job:
gs -dSAFER -dCompatibilityLevel=1.2 \
-q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=step3.pdf -dSAFER \
-c .setpdfwrite -f bug-509903_step2.ps
The user had used the ps2pdf script from Ghostscript, and this job had not finished after an hour or more on Linux i586 and Linux x86_64, whereas the old ESP Ghostscript 8.15.4 does the job within 5 minutes.
I'm having trouble reproducing this. On my amd64 Linux box, ESP Ghostscript 8.15.4 took 9 minutes to run the command line from comment #4, whereas Ghostscript 8.64 took 13 minutes. I agree this isn't optimal, but it is not quite as bad as the difference between 5 minutes and forever that you are seeing.
Please provide the glibc and gcc versions. I would also like to know whether FORTIFY is active during compilation.
gcc (GCC) 4.1.3 20070929 (prerelease) (Ubuntu 4.1.2-16ubuntu2)
/lib/libc-2.6.1.so
I'm not familiar with FORTIFY, but the procedure I followed to build Ghostscript is "./autogen.sh ; make".
Currently I have -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables in the default CFLAGS.
The gcc version is 4.3 or 4.4 and the glibc version is 2.9 or 2.10. The following settings and devices are active: FILE_IMPLEMENTATION=both psl3.dev pdf.dev dps.dev dpsnext.dev ttfont.dev epsf.dev pipe.dev rasterop.dev fzlib.dev cidfont.dev fapi.dev posync.dev gsnogc.dev async.dev ..., besides all the printer and format drivers.
Just run `psselect -p2' for a smaller test case. This page takes roughly 17 seconds, which for 632 pages comes to roughly 3 hours. For a single test page the resulting PDF file is 3 MB, and viewing it with Ghostscript is so slow that I can watch the drawing step by step. Xpdf is also slow, though faster than gs/gv. Acroread reports "Cannot extract the embedded font 'T3Font_1'. Some characters may not display or print correctly", and indeed the only word I see is `WAREHOUS', followed by dots in place of all the following characters.
I just found out that if I use a copy of the PostScript code with only one line included:
(WAREHOUSE) p n
everything works: it is very fast (< 0.3 s) and acroread does not report any problem. But a line like
(WAREHOUSE ) p n
causes the acroread error described in comment #9 (the space can be replaced with any ASCII character). Acroread then pops up an error after displaying WAREHOUS.., that is, the 9th and 10th characters are always replaced by dots.
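The extrapolation above (about 17 seconds for one page, 632 pages, roughly 3 hours) is simple arithmetic. Here is a minimal Python sketch. It assumes every page costs as much as the sampled one, which is only a rough approximation, and the function name is hypothetical:

```python
def extrapolate_hours(seconds_per_page: float, pages: int) -> float:
    """Estimate total conversion time in hours.

    Assumes every page takes as long as the sampled page, which is only a
    rough approximation for a real job.
    """
    return seconds_per_page * pages / 3600.0


# Figures quoted above: ~17 s for the single test page, 632 pages in total.
hours = extrapolate_hours(17, 632)
print(f"{hours:.1f} hours")  # 3.0 hours
```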
Using pdfinfo from xpdf on one page of the example PostScript gives, for ESP Ghostscript:
pdfinfo ~/x.pdf
Producer: ESP Ghostscript 8.15
CreationDate: Tue Jul 7 12:30:57 2009
ModDate: Tue Jul 7 12:30:57 2009
Tagged: no
Pages: 1
Encrypted: no
Page size: 595 x 842 pts (A4)
File size: 11117 bytes
Optimized: no
PDF version: 1.3
and for GPL Ghostscript 8.64:
pdfinfo ~/y.pdf
Title: step1
Author: Adam Tauno Williams
Creator: a2ps version 4.13
Producer: GPL Ghostscript 8.64
CreationDate: Tue Jul 7 10:32:36 2009
ModDate: Tue Jul 7 10:32:36 2009
Tagged: no
Pages: 1
Encrypted: no
Page size: 595 x 842 pts (A4)
File size: 3063903 bytes
Optimized: no
PDF version: 1.4
The files differ in size by a factor of 275. IMHO, some of the speed difference seems to be caused by the increased I/O load.
The size difference is surprising (to say the least), especially since the full 632 pages produced, for me, a PDF file slightly over 1 MB. It's hard to see why running a single page from the document should produce a file 3 times as large. There must be some significant difference in the resulting PostScript. Please post both files so I can see what is causing the difference. Please also post the single-page PostScript file used to create the two PDF files so I can run it here and try to figure out why this is happening.
Created attachment 5188
bug-509903_step2-small.ps
The single-page PostScript code, derived from attachment #5161 with the help of pstops from the psutils package.
Created attachment 5189
x.pdf done by ESP ghostscript 8.15
It seems that only one real font is included,
that is, only one /FontBBox with one re-encoding is
found in it.
Created attachment 5190
y.pdf.gz done by GPL ghostscript 8.64 and gzipped due to its size
Here I found 3402 /FontBBox entries and their re-encodings.
It looks as if this was done for every character found
in the PostScript document.
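The two observations in these attachment comments can be checked mechanically: the 275-fold size difference and the 3402 /FontBBox entries in one file versus a single entry in the other. Here is a minimal Python sketch, with some caveats. The helper names are hypothetical. The pdfinfo text is abbreviated from the output quoted earlier. Counting /FontBBox keys is only a rough proxy for the number of fonts. It assumes the font dictionaries are stored as plain objects rather than inside compressed object streams (which PDF only allows from version 1.5), and a FontDescriptor also carries a /FontBBox.

```python
import re


def pdfinfo_file_size(pdfinfo_output: str) -> int:
    """Return the byte count from the 'File size:' line of xpdf's pdfinfo output."""
    match = re.search(r"^File size:\s+(\d+) bytes", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'File size:' line found")
    return int(match.group(1))


def count_font_bboxes(pdf_bytes: bytes) -> int:
    """Count literal /FontBBox keys in raw PDF bytes (a rough font count)."""
    return pdf_bytes.count(b"/FontBBox")


# Abbreviated from the pdfinfo output quoted in this report.
esp = "Pages: 1\nFile size: 11117 bytes\nPDF version: 1.3\n"
gpl = "Pages: 1\nFile size: 3063903 bytes\nPDF version: 1.4\n"
ratio = pdfinfo_file_size(gpl) / pdfinfo_file_size(esp)
print(f"size ratio: {ratio:.1f}")  # size ratio: 275.6

# Hypothetical fragment standing in for the bytes of a real PDF.
sample = b"<< /Type /Font /Subtype /Type3 /FontBBox [0 0 1 1] >>\n" * 3
print(count_font_bboxes(sample))  # 3
```

On a real file, pass the raw bytes, for example `count_font_bboxes(open("y.pdf", "rb").read())` after un-gzipping the attachment.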
Created attachment 5191
gs-864.pdf
Hmm, well, I see the problem you describe in the files you attached. In fact the
PDF file contains a separate type 3 font for every single glyph, which is why
the file is large. It also doesn't open with Acrobat, which complains about a
missing resource. (GS does render it, comparatively slowly.)
However, I'm unable to reproduce the file using current code. Using a command
line based on what you had previously given:
gs -dSAFER -dCompatibilityLevel=1.2 -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite
-sOutputFile=out.pdf step2-small.ps
I get an 11 KB PDF file which contains one Type 3 font, works with Acrobat and
renders quickly with GS. On my slightly older Fedora installation I get a 13 KB
file, still with only one type 3 font.
So I reverted my Fedora installation to revision 9434, which I believe to be
the revision which was released as 8.64 (according to the Subversion logs), and
tried again. Same result, a small PDF file with only one type 3 font.
Attached is the file I get under Fedora using 8.64.
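Building the command line as an argument list makes it harder to mistype a switch such as -dCompatibilityLevel=1.2. Without the '=', Ghostscript's -dNAME form simply defines NAME as true, so the compatibility level would not be set. Here is a minimal Python sketch for timing the reproduction. It assumes a `gs` executable on the PATH. The function names are hypothetical, and the switches are the ones quoted in this report.

```python
import subprocess
import time


def pdfwrite_args(infile: str, outfile: str, level: str = "1.2") -> list:
    """Build the pdfwrite command line used in this report as an argv list."""
    return [
        "gs", "-dSAFER", f"-dCompatibilityLevel={level}",
        "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
        f"-sOutputFile={outfile}", infile,
    ]


def time_conversion(infile: str, outfile: str) -> float:
    """Run Ghostscript once and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(pdfwrite_args(infile, outfile), check=True)
    return time.perf_counter() - start


# Example (requires Ghostscript and the attachment on disk):
# print(time_conversion("bug-509903_step2-small.ps", "out.pdf"))
print(" ".join(pdfwrite_args("step2-small.ps", "out.pdf")))
```

Passing a list to `subprocess.run` avoids shell quoting, so each switch reaches Ghostscript exactly as written.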
Strange... Which compiler and glibc version were used for compiling? Besides this, the flags passed to the compiler itself would also interest me.
I used ./autogen.sh and then make to build my executable. The compiler flags appear to be:
-DHAVE_MKSTEMP -DHAVE_HYPOT -DHAVE_FILE64 -DHAVE_MKSTEMP64 -DHAVE_FONTCONFIG -O2 -Wall -Wstrict-prototypes -Wundef -Wmissing-declarations -Wmissing-prototypes -Wwrite-strings -Wno-strict-aliasing -Wdeclaration-after-statement -fno-builtin -fno-common -DHAVE_STDINT_H -DGX_COLOR_INDEX_TYPE="unsigned long long" -Ilibpng -Izlib -DPNG_NO_ASSEMBLER_CODE
Bear in mind that different parts of the makefile can (I think) use different compiler flags, but this should cover the main ones.
gcc -v reports:
Target: i386-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre --enable-libgcj-multifile --enable-java-maintainer-mode --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-cpu=generic --host=i386-redhat-linux
Thread model: posix
gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)
/lib contains libc-2.7.so.
To summarise: there is a possible performance issue; the original job does seem to run peculiarly slowly for such a simple file, and I will look into it. However, I can't make the release source for 8.64 behave as you describe, your experience is considerably worse than mine as regards performance, and you are generating files with an error. All this makes me think there is some difference between what we are running.
I agree that there is a difference in what we are running... The remaining questions are what the difference is and what causes it.
The compiler flags found in the Makefile are:
-Wall -Wstrict-prototypes -Wundef -Wmissing-declarations -Wmissing-prototypes -Wwrite-strings -Wno-strict-aliasing -Wdeclaration-after-statement -fno-builtin -fno-common -DHAVE_STDINT_H -DGX_COLOR_INDEX_TYPE="unsigned long long" -O2 -march=i586 -mtune=i686 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -fno-strict-aliasing -fPIC -D_GNU_SOURCE -pipe -Wno-write-strings -Wno-return-type -Wno-unknown-pragmas -Wno-pointer-sign
gcc -v reports:
Using built-in specs.
Target: i586-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib --libexecdir=/usr/lib --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.4 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.4 --enable-linux-futex --without-system-libunwind --with-arch-32=i586 --with-tune=generic --build=i586-suse-linux
Thread model: posix
gcc version 4.4.0 [gcc-4_4-branch revision 148163] (SUSE Linux)
I've found the cause of the problem, see http://bugs.ghostscript.com/show_bug.cgi?id=690559#c3. After removing the part I had introduced to avoid a crash of Ghostscript 8.62 in the pdfwrite device, the PostScript file in attachment #5161 now takes 24 minutes. This is much better than before (but still slower than with ESP Ghostscript 8.15.4).
That's great, and much more in line with my results. I still think there is a potential problem, because the file seems slow given its content, so we'll keep this issue open until that has been investigated. Thanks for letting us know.
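The two builds compared in this thread differed in compiler flags as well as in local patches. In the end the decisive difference was the local pdfwrite workaround, not a flag. Still, diffing the quoted flag lists is a quick first check. Here is a minimal Python sketch. `flag_diff` is a hypothetical helper, and the strings below are short excerpts of the two flag lists quoted above, not the full lists. `shlex.split` keeps a quoted value such as -DGX_COLOR_INDEX_TYPE="unsigned long long" as a single token, so it compares correctly.

```python
import shlex


def flag_diff(flags_a: str, flags_b: str):
    """Return (only_in_a, only_in_b) as sorted lists of compiler flags."""
    a, b = set(shlex.split(flags_a)), set(shlex.split(flags_b))
    return sorted(a - b), sorted(b - a)


# Short excerpts of the two flag lists quoted in this thread.
fedora = ('-DHAVE_MKSTEMP -DHAVE_FONTCONFIG -O2 -Wall -fno-builtin -fno-common '
          '-DGX_COLOR_INDEX_TYPE="unsigned long long"')
suse = ('-fno-builtin -fno-common -DGX_COLOR_INDEX_TYPE="unsigned long long" '
        '-O2 -march=i586 -mtune=i686 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector')
only_fedora, only_suse = flag_diff(fedora, suse)
print(only_fedora)  # ['-DHAVE_FONTCONFIG', '-DHAVE_MKSTEMP']
print(only_suse)    # ['-D_FORTIFY_SOURCE=2', '-fstack-protector', '-march=i586', '-mtune=i686']
```

Set difference ignores ordering and duplicates (the SUSE list, for example, contains -Wall twice), which is what you want when comparing flag lists.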
It is also worth asking what local patches SUSE has. Consider that the local change mentioned in bug 690559 modifies how PDFs are processed in general (unlike the bbx_create_compositor line removal, http://bugs.ghostscript.com/show_bug.cgi?id=689340#c6, which only affects how x11alpha works, and also addresses 690559).
AFAICR I tested my patches step by step, starting from a fresh version (see comment #20), precisely to avoid such problems.
The job runs slowly on my setup, which is clean of any odd patches as it's the HEAD version; see comments #3, #18 and #21. As stated, the file seems to run slowly, but not as slowly as in the original report, which Dr Fink has traced to a specific workaround for bug #690559. Removing the pdfwrite-specific code, which is not required with recent sources, resolves the extreme problem, leaving only a smaller performance drop. We should look at the poor performance, but not as a matter of urgency. The basic problem seems to be that all the fonts in the program are converted into type 3 fonts. Type 3 fonts are slower than other types anyway, and in addition pdfwrite spends a lot of time trying to decide whether a newly encountered CharProc is the same as an earlier one. This can probably be improved. Note to self: look at revision 11618, it may be something similar.
I've had a look at the performance issue. Partly it's simply the fact that all the fonts are converted to type 3 fonts, and type 3 fonts are always slow because of the way the CharProcs are captured. However, I did find that the way we compare a new CharProc to all the existing ones is unreasonably slow, as it involves seeking through a file and reading the stored data. We do this for every stored CharProc against every new CharProc, and we do it twice. Profiling the code showed that it was spending something like 75% of its time waiting for I/O operations to complete.
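The slow comparison described above can be illustrated with a small model, alongside the MD5-based approach that, according to the next comment, revision 11717 adopts. This is not Ghostscript's pdfwrite code. The class and method names are hypothetical, and an in-memory io.BytesIO stands in for the temporary file. The final byte-for-byte check after a digest match is a safety measure assumed here, not something the report describes.

```python
import hashlib
import io


class CharProcStore:
    """Hypothetical model of CharProcs spooled to a scratch file."""

    def __init__(self):
        self.file = io.BytesIO()  # stands in for the temporary file on disk
        self.entries = []         # (offset, length, md5 digest) per CharProc
        self.reads = 0            # number of seek-and-read operations

    def add(self, data: bytes) -> int:
        """Append a CharProc, recording its MD5 digest as it is written."""
        offset = self.file.seek(0, io.SEEK_END)
        self.file.write(data)
        self.entries.append((offset, len(data), hashlib.md5(data).digest()))
        return len(self.entries) - 1

    def _read_back(self, offset: int, length: int) -> bytes:
        self.reads += 1
        self.file.seek(offset)
        return self.file.read(length)

    def find_naive(self, data: bytes):
        """Slow path: re-read every stored CharProc and compare the bytes."""
        for index, (offset, length, _) in enumerate(self.entries):
            if self._read_back(offset, length) == data:
                return index
        return None

    def find_hashed(self, data: bytes):
        """Fast path: compare lengths and digests, read back only on a match."""
        digest = hashlib.md5(data).digest()
        for index, (offset, length, stored) in enumerate(self.entries):
            if length == len(data) and stored == digest:
                # Assumed safety check against a (very unlikely) collision.
                if self._read_back(offset, length) == data:
                    return index
        return None


store = CharProcStore()
for proc in (b"0 0 m 10 10 l S", b"0 0 m 5 5 l S", b"1 1 m 2 2 l S"):
    store.add(proc)

store.reads = 0
print(store.find_naive(b"1 1 m 2 2 l S"), store.reads)   # 2 3
store.reads = 0
print(store.find_hashed(b"1 1 m 2 2 l S"), store.reads)  # 2 1
```

The sample byte strings are hypothetical stand-ins for real CharProc streams. Counting read-backs rather than timing keeps the example deterministic. The naive search touches the scratch file once per stored CharProc, while the hashed search only reads back when a digest already matches.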
Revision 11717 (http://ghostscript.com/pipermail/gs-cvs/2010-September/011736.html) tackles this by creating an MD5 hash as the data is written and, where possible, comparing MD5 hashes rather than repeatedly re-reading the data. For me this improves performance considerably, making it 2-3 times faster. Profiling the code now does not show any glaring problems; much of the time is now spent compressing and writing the final PDF file. Type 3 fonts are still unavoidably slow, but at least they are better.
Created attachment 10683
LI_DEC_RIMA_frontal
(In reply to comment #27)
> Created attachment 10683
> LI_DEC_RIMA_frontal
For what purpose? This bug is RESOLVED FIXED; please don't randomly attach files to closed bugs (especially not 300+ MB files). If you think you have a problem, open a new bug report. I'll warn you now that a 300+ MB file won't be looked at any time soon. |