Summary: | ps2pdf generates damaged PDF | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Martin Rehak <rehak> |
Component: | PDF Writer | Assignee: | Default assignee <ghostpdl-bugs> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | chris.liddell |
Priority: | P2 | ||
Version: | 10.05.0 | ||
Hardware: | PC | ||
OS: | Solaris | ||
Customer: | Word Size: | --- | |
Attachments: |
Damaged PDF.
Working PDF.
Input.
Configure output
Damaged PDF 1 |
Description
Martin Rehak
2025-06-19 07:53:16 UTC
Created attachment 26922
Working PDF.
Created attachment 26923
Input.
(In reply to Martin Rehak from comment #0)
> after upgrade from ghostscript-10.04.0 to ghostscript-10.05.0 I have found
> that ps2pdf generates a pdf which seems to be damaged (evince) and also gs
> has problems to process it.

Firstly: do not use 'ps2pdf'; you get much better control using Ghostscript directly rather than through a shell script.

> evince (even on Linux, different box) gives me 'PDF document is damaged'
> while trying to display it. No other error message.
>
> Ghostscript-9.54.0 on Linux box gives me this:

Use a current version of Ghostscript, rather than one which is 4 years old.

This:

gs -sDEVICE=pdfwrite -o out.pdf testprinter.ps

produces a PDF file which opens without complaint with Acrobat, Ghostscript, MuPDF and every PDF consumer I've tried.

I even tested this with the options from running ps2pdf:

gs -P- -dSAFER -dCompatibilityLevel=1.4 -dWriteXRefStm=false -dWriteObjStms=false -q -P- -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf -P- -dSAFER -dCompatibilityLevel=1.4 -dWriteXRefStm=false -dWriteObjStms=false testprinter.ps

And by running ps2pdf. Again, the output files open without complaint.

> Attaching working and damaged pdf output. Could anyone help me debug this,
> please?

Since this works for me, there is nothing I can suggest; we need to be able to reproduce a problem to address it.

You could use Ghostscript rather than ps2pdf; this would enable you to supply a set of options which fail, and which we would have some chance of reproducing.

Hi Ken,

Firstly, thank you for your time. It seems that the issue is platform specific, as I am not able to reproduce the same behavior on Linux. I get the same result when I open the PDF generated by 10.05.0 on Solaris on an Ubuntu box with gs-10.02.1 and evince-46.3.1-0ubuntu1. This is what gs-10.02.1 on Ubuntu tells me when I try to process the damaged PDF generated on Solaris:

$ gs testprinter.pdf
GPL Ghostscript 10.02.1 (2023-11-01)
Copyright (C) 2023 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Catalog dictionary not located in file, unable to proceed
**** Error: Couldn't initialise file.
               Output may be incorrect.
No pages will be processed (FirstPage > LastPage).

The following errors were encountered at least once while processing this file:
        missing white space after number
        xref table was repaired

   **** This file had errors that were repaired or ignored.
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

GS>XIO: fatal IO error 0 (Success) on X server ":0"
      after 70 requests (70 known processed) with 0 events remaining.

And since you say I should not use ps2pdf, here is the output from gs-10.05.0 while generating the PDF (which turns out to be damaged); that PDF is currently attached to this bug report as https://bugs.ghostscript.com/attachment.cgi?id=26921

$ gs -sDEVICE=pdfwrite -o testprinter.pdf testprinter.ps
GPL Ghostscript 10.05.0 (2025-03-12)
Copyright (C) 2025 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Loading NimbusRoman-Regular font from /usr/share/ghostscript/10.05.0/Resource/Font/NimbusRoman-Regular... 4173272 2801392 1954624 661458 2 done.

Could you try to open the damaged PDF yourself and help me find the reason why it is damaged, please?

Thank you,
m.

(In reply to Martin Rehak from comment #4)
> It seems that the issue is platform specific, as I am not able to reproduce
> the same behavior on Linux.

Ah, well I don't have access, even as a VM, to a Solaris machine, so I'm not going to be able to reproduce it.

> Could you try to open the damaged PDF yourself and help me find the
> reason why it is damaged, please?

A number of references are written incorrectly.
Perhaps the most important are entries in the trailer dictionary:

trailer
<< /Size 13d /Root 1d 0 R
/Info 2d 0 R
/ID [<C8E40AF4D4A01C26AD92EA640DC8B9A8><C8E40AF4D4A01C26AD92EA640DC8B9A8>]
>>

/Size 13d should be /Size 13; similarly 1d 0 R should be 1 0 R, and 2d 0 R should be 2 0 R. There are numerous similar problems with other dictionaries.

You must (obviously) be building from source yourself, so you could make a debug build and debug the problem. The most likely cause would be a problem with the PRId64 macro, though this has not changed at all recently. I'd suggest breaking in a debugger at line 1593 of gdevpdf.c:

        /*
         * The PDF documentation allows, and this code formerly emitted,
         * a Contents entry whose value was an empty array.  Acrobat Reader
         * 3 and 4 accept this, but Acrobat Reader 5.0 rejects it.
         * Fortunately, the Contents entry is optional.
         */
        if (page->contents_id != 0)
            pprinti64d1(s, "/Contents %"PRId64" 0 R\n", page->contents_id);

Check the string 's', which should read "/Contents <page->contents_id> 0 R".

Will do, just give me some time. And thank you very much for looking into this.
m.

Hi Ken,

$ file /usr/bin/gs
/usr/bin/gs:    ELF 64-bit LSB dynamic lib AMD64 Version 1, position-independent executable, dynamically linked, not stripped

When I step into pprinti64d1() it gets the following arguments:

pprinti64d1 (s=0xaf216d200, format=0x7fce34e63c47 "/Contents %lld 0 R\n", v=5) at ./ghostscript-10.05.1/base/spprint.c:221

To be honest I don't know what the code does, but when I step further:

(gdb) n
233         pputs_short(s, str);
(gdb) p str
$13 = "5\000\022+\335\177\000\000\316\270\"5\316\177\000\000$<\3464\316\177\000\000"
(gdb) p s
$9 = (stream *) 0xaf216d200
(gdb) p fp
$10 = 0x7fce34e63c51 "%lld 0 R\n"
(gdb) p z
$11 = 3

Take a look at the content of variable 'z': I would expect it to be 4. Either I am blind or there is something seriously wrong.
$ cat a.c
#include <stdio.h>
#include <string.h>

#define PRId64 "lld"

int main(void)
{
    const char *format = "/Contents %lld 0 R\n";
    long long v = 5;   /* %lld expects a long long argument */

    printf(format, v);
    printf("%zu\n", strlen("%"PRId64));   /* strlen() returns size_t */
    return 0;
}
$ gcc -m64 a.c
$ ./a.out
/Contents 5 0 R
4

Attaching the output from 'configure'; one of the gcc calls follows so you can see what arguments it got.

/usr/gcc/14/bin/gcc -m64 -DHAVE_MKSTEMP -DHAVE_FSEEKO -DHAVE_FONTCONFIG -DHAVE_SETLOCALE -DHAVE_SSE2 -DHAVE_DBUS -DHAVE_BSWAP32 -DHAVE_STRERROR -DHAVE_ISNAN -DHAVE_PREAD_PWRITE=1 -DGS_RECURSIVE_MUTEXATTR=PTHREAD_MUTEX_RECURSIVE -fPIC -O2 -DNDEBUG -m64 -Wall -Wstrict-prototypes -Wundef -Wmissing-declarations -Wmissing-prototypes -Wwrite-strings -fno-strict-aliasing -Werror=declaration-after-statement -fno-builtin -fno-common -Werror=return-type -Wno-unused-local-typedefs -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -DHAVE_STDINT_H=1 -DHAVE_DIRENT_H=1 -DHAVE_SYS_TIME_H=1 -DHAVE_SYS_TIMES_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_LIBDL=1 -DGX_COLOR_INDEX_TYPE="unsigned long long" -D__USE_UNIX98=1 -DHAVE_SNPRINTF -DGS_MEMPTR_ALIGNMENT=8 -DBUILD_PDF=1 -I/scratch/mrehak/gs/components/ghostscript/ghostscript-10.05.1/pdf -m64 -fPIC -DPIC -O3 -ffile-prefix-map=/scratch/mrehak/gs/components/ghostscript=. -g3 -O0 -DUSE_PDF_PERMISSIONS=1 -DHAVE_RESTRICT=1 -DHAVE_LIMITS_H=1 -DHAVE_STRING_H=1 -DUSE_LIBPAPER -I/usr/include/webp -fno-strict-aliasing -DHAVE_POPEN_PROTO=1 -DGS_DEVS_SHARED -DGS_DEVS_SHARED_DIR=\"/usr/lib/amd64/ghostscript/10.05.1\" -I/scratch/mrehak/gs/components/ghostscript/ghostscript-10.05.1/psi -I./obj -I./obj -I/scratch/mrehak/gs/components/ghostscript/ghostscript-10.05.1/base -I/scratch/mrehak/gs/components/ghostscript/ghostscript-10.05.1/devices -o ./obj/idstack.o -c /scratch/mrehak/gs/components/ghostscript/ghostscript-10.05.1/psi/idstack.c

Ignore my '-g3 -O0'; they are there to get symbols.

Any clue?

Regards,
m.

Created attachment 26940
Configure output
But it seems PRId64 is defined as '%ld' in the base/spprint.c context, while it is defined as '%lld' in the devices/vector/gdevpdf.c context.

pprinti64d1 (s=0xa40a8b490, format=0x7fe235263c47 "/Contents %lld 0 R\n", v=5) at ./ghostscript-10.05.1/base/spprint.c:219
219         const char *fp = pprintf_scan(s, format);
(gdb) n
221         const size_t z = strlen("%"PRId64);
(gdb) s
232         gs_snprintf(str, sizeof(str), "%"PRId64, v);
(gdb) p "%"PRId64
$18 = "%ld"

I would say it should be '%lld' in base/spprint.c. Will continue tomorrow.

My colleague finished a bisect in parallel and found this commit:

https://github.com/ArtifexSoftware/ghostpdl/commit/c174527d944f8c2a495ad009bf2383a0d48d33e1#diff-43ba2082cb88d09896859c03d865ae40956647b017b87d3a22881bf388899e85

And at least this seems to be wrong:

-            pprintld1(s, "/Contents %ld 0 R\n", page->contents_id);
+            pprinti64d1(s, "/Contents %"PRId64" 0 R\n", page->contents_id);

I will prepare a fix and will test it.

(In reply to Martin Rehak from comment #10)
> My colleague finished a bisect in parallel and found this commit:
>
> https://github.com/ArtifexSoftware/ghostpdl/commit/
> c174527d944f8c2a495ad009bf2383a0d48d33e1#diff-
> 43ba2082cb88d09896859c03d865ae40956647b017b87d3a22881bf388899e85
>
> And at least this seems to be wrong:
>
> -            pprintld1(s, "/Contents %ld 0 R\n", page->contents_id);
> +            pprinti64d1(s, "/Contents %"PRId64" 0 R\n", page->contents_id);
>
> I will prepare a fix and will test it.

Please don't simply revert the change; there was a reason for it. Also, the PRId64 macro is used in (many) other places; if it doesn't work properly here, then the others presumably don't either, it's simply that nobody has found them yet.

I'll discuss it with one of my colleagues later.

That is not my plan, the change looks reasonable. What I still don't know is why PRId64 is defined as '%ld' in spprint.c. Trying to find out.

(In reply to Martin Rehak from comment #12)
> That is not my plan, the change looks reasonable.
> What I still don't know is why PRId64 is defined as '%ld' in spprint.c.
> Trying to find out.

OK, I will be very interested to see what you come up with. The macro is, obviously, platform dependent and IIRC is defined in ghostpdl/base/stdint_.h. I'd have to guess there's something wrong with the selection.

(In reply to Martin Rehak from comment #12)
> That is not my plan, the change looks reasonable. What I still don't know
> is why PRId64 is defined as '%ld' in spprint.c. Trying to find out.

You put the hardware as "PC" but did not specify the word length. Is it x86 or x86_64 in question?

Hi Chris, Ken,

We build Ghostscript for both platforms (-m32 and -m64) to deliver both libraries, but only 64-bit executables are delivered, so we are talking about 64-bit: x86_64 and the -m64 /usr/bin/gs. As far as I understand, both %lld and %ld refer to 64-bit integers on x86_64.

---
When spprint.c is compiled, NO Unix standard level is specified, so the Solaris headers, namely inttypes.h, assume the latest standard level and include the definition

#define PRId64 "ld"

---
When gdevpdf.c is compiled, it includes base/unistd_.h, which defines

#define _XOPEN_SOURCE 500

Thus SUSv2 is selected and inttypes.h DOES NOT define PRId64. stdint_.h then uses the Ghostscript default definition in base/stdint_.h:

182 #  ifndef PRId64
183 #    define PRId64 "lld"
184 #  endif

---
As far as I understand, the PRId64 macro has been part of the UNIX standard since SUSv3/XPG6, so I think Solaris does nothing wrong here.

The issue is not the sizes of %lld and %ld, because they are both 64-bit, but that the object files use different PRId64 definitions, and thus

221     const size_t z = strlen("%"PRId64);

in base/spprint.c is 3, and the code doesn't account for that.

The question here is how to fix that. I believe including

base/unistd_.h

in base/spprint.c could help make the definitions the same. But it's more of a hack than a real solution.

Any comment, please?
m.
(In reply to Martin Rehak from comment #15)
> base/unistd_.h
>
> in base/spprint.c could help make the definitions the same. But it's more of a
> hack than a real solution.
>
> Any comment, please?

It seems like a reasonable solution. Testing here shows no problem with our regression tests, but since I'm unable to test it on Solaris, would you confirm this diff resolves the problem for you:

diff --git a/base/spprint.c b/base/spprint.c
index ce1698938..2981a539b 100644
--- a/base/spprint.c
+++ b/base/spprint.c
@@ -15,12 +15,12 @@
 
 /* Print values in ASCII form on a stream */
 
+#include <unistd.h>
 #include "math_.h"     /* for fabs */
 #include "stdio_.h"    /* for stream.h */
 #include "string_.h"   /* for strchr */
 #include "stream.h"
 #include "spprint.h"

(In reply to Ken Sharp from comment #16)

Actually, could you make that:

> +#include <unistd_.h>

That's what I'm currently testing, as unistd.h doesn't (D'oh!) resolve on other platforms.

The system unistd.h doesn't dictate the Unix standard, but Ghostscript's base/unistd_.h does. The following patch

$ cat patches/spprint.patch
--- ghostscript-10.05.1/base/spprint.c
+++ ghostscript-10.05.1/base/spprint.c
@@ -15,6 +15,7 @@
 
 /* Print values in ASCII form on a stream */
 
+#include "unistd_.h"
 #include "math_.h"     /* for fabs */
 #include "stdio_.h"    /* for stream.h */
 #include "string_.h"   /* for strchr */

makes the PDF much better:

/Contents 5 0 R

But it is still considered damaged by evince, and this is the gs output:

$ gs -sDEVICE=png16 -o out.png ~/testprinter.pdf
GPL Ghostscript 10.05.1 (2025-04-29)
Copyright (C) 2025 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
**** Error: Couldn't initialise file.
               Output may be incorrect.
No pages will be processed (FirstPage > LastPage).

It seems there are more places where this happens. Attached as 'Damaged PDF 1'. Could you look into it and give me some advice on what is wrong there?
m.
Created attachment 26946
Damaged PDF 1
Well, the output is now missing generation numbers for some objects. For example:

10 obj
<</Type /Catalog /Pages 3 0 R
/Metadata 130 R
>>

That should be:

10 0 obj
<</Type /Catalog /Pages 3 0 R
/Metadata 130 0 R
>>

Notice the missing '0' in both places, one defining the object, and one referring to an object with a reference.

I hate over-engineered projects. :)

There are of course many more places where PRId64 is referenced. Where _XOPEN_SOURCE is defined, the code works with the Ghostscript definition '%lld'. Where the standard is not defined, it works with the Solaris (also right, but just in -m64 mode) '%ld'.

I tried to build the whole of gs with _XOPEN_SOURCE=600 (to make inttypes.h provide the system definition) and with _XOPEN_SOURCE=500 (to use the gs default). Both exploded very quickly on memstreams and other issues. No go for me.

Could you tell me why there is the pprinti64d1() function and what it does more than sprintf(), please? If we know exactly what the formatting string is, it doesn't matter where it comes from, so why can't we just call sprintf()?

217 pprinti64d1(stream *s, const char *format, int64_t v)
218 {
219     const char *fp = pprintf_scan(s, format);
220     char str[25];
221     const size_t z = strlen("%"PRId64);
222 
223 #ifdef DEBUG
224     size_t i;
225 
226     for (i = 0; i < z; i++)
227         if (fp[i] != ("%"PRId64)[i])
228             break;
229     if (i != z)
230         lprintf1("Bad format in pprinti64d: %s\n", format);
231 #endif
232     gs_snprintf(str, sizeof(str), "%"PRId64, v);
233     pputs_short(s, str);
234     return pprintf_scan(s, fp + z);
235 }

Thanks,
m.

Sorry. s/(also right, but just in -m64 mode)/(also right, but different)/

I apologize for the emotional storm; I will humbly accept any solution.

(In reply to Martin Rehak from comment #21)
> Could you tell me why there is the pprinti64d1() function and what it does
> more than sprintf(), please?

No, because it existed and was (frequently) used in the pdfwrite device before I joined Artifex 18 years ago.
One of the original authors or the first maintainer felt it was required. Of those, one is long since retired and the other is dead, so I can't ask either of them.

> If we know exactly what the formatting string is, it doesn't matter where it
> comes from, so why can't we just call sprintf()?

Well, for a start there's a problem with sprintf: we forbid its use entirely. If you try to use it, the compiler should throw an error. We have wrapper functions, but still.

(In reply to Martin Rehak from comment #23)
> I apologize for the emotional storm; I will humbly accept any solution.

I can't give you a solution because I don't have access to a Solaris system that will build Ghostscript.

I can say that the 'x 0 obj' sequence is from line 960 of gdevpdfu.c:

    if (!pdev->WriteObjStms || pdev->strm != pdev->ObjStm.strm)
        pprinti64d1(s, "%"PRId64" 0 obj\n", id);

That seems to be identical to the other instance, so I'm kind of puzzled why it would behave differently. I'd suggest using the 'clean' or 'debugclean' target and then trying again.

Thanks for the explanation. I respect that.

In the end I gave up trying to find the specific places and instead distributed the inclusion of unistd_.h to all pdfwrite source files. I think this is a much better solution with respect to future changes. Let me know if base/gx.h is a good choice.

This is the final change, which is enough to make gs generate processable PDFs on Solaris.
--- ghostscript-10.05.1/base/gx.h
+++ ghostscript-10.05.1/base/gx.h
@@ -19,6 +19,7 @@
 #ifndef gx_INCLUDED
 #  define gx_INCLUDED
 
+#include "unistd_.h"
 #include "stdio_.h"    /* includes std.h */
 #include "gserrors.h"
 #include "gsio.h"
--- ghostscript-10.05.1/base/spprint.c
+++ ghostscript-10.05.1/base/spprint.c
@@ -15,6 +15,7 @@
 
 /* Print values in ASCII form on a stream */
 
+#include "unistd_.h"
 #include "math_.h"     /* for fabs */
 #include "stdio_.h"    /* for stream.h */
 #include "string_.h"   /* for strchr */

These changes make sure the pdfwrite source code and the spprint code get the same PRId64 definition. But they keep the UNIX standard version defined in base/unistd_.h:

#define _XOPEN_SOURCE 500

That version is not enough to let the OS header files define PRId64 in inttypes.h on OSes which are strictly UNIX compliant (e.g. Solaris), so the gs default definition in base/stdint_.h takes effect. If you would like to let the OS provide its own inttypes.h definitions, you also need this chunk:

--- ghostscript-10.05.1/base/unistd_.h
+++ ghostscript-10.05.1/base/unistd_.h
@@ -51,7 +51,7 @@
 #else
 /* _XOPEN_SOURCE 500 define is needed to get
  * access to pread and pwrite */
-#   define _XOPEN_SOURCE 500
+#   define _XOPEN_SOURCE 600
 #   define __USE_UNIX98
 #   include <unistd.h>
 #endif

Both variants were tested on Solaris and work. Let me know which you prefer and whether your testing passed.

Thank you very much for your time and your valuable help!
m.

(In reply to Martin Rehak from comment #25)
> In the end I gave up trying to find the specific places and instead
> distributed the inclusion of unistd_.h to all pdfwrite source files. I think
> this is a much better solution with respect to future changes. Let me know if
> base/gx.h is a good choice.

Well, it's not a bad choice, but it's more widely distributed than I'm completely happy with. If we have to, then we can do that.
However, I think this is a simpler and more focused change:

diff --git a/base/spprint.h b/base/spprint.h
index ac11a54df..bea6d3287 100644
--- a/base/spprint.h
+++ b/base/spprint.h
@@ -21,6 +21,7 @@
 
 #include "stdpre.h"
 #include "scommon.h"
+#include "unistd_.h"
 
 /* Put a character on a stream. */
 #define stream_putc(s, c) spputc(s, c)

The file spprint.h is already included in gdevpdfx.h, which is the basic header for the pdfwrite device and, with only a few exceptions (almost all involving only debug prints), is included in all the files which use 'PRId64'. So I'd also modify those files to include gdevpdfx.h (to be honest, I think they probably should anyway):

diff --git a/devices/vector/gdevpsf2.c b/devices/vector/gdevpsf2.c
index dbc1682da..92267673d 100644
--- a/devices/vector/gdevpsf2.c
+++ b/devices/vector/gdevpsf2.c
@@ -31,6 +31,7 @@
 #include "gxfcid.h"
 #include "stream.h"
 #include "gdevpsf.h"
+#include "gdevpdfx.h"
 
 /* Define additional opcodes used in Dicts, but not in CharStrings. */
 #define CD_LONGINT 29
diff --git a/devices/vector/gdevpsft.c b/devices/vector/gdevpsft.c
index bffaada13..ece79a2b1 100644
--- a/devices/vector/gdevpsft.c
+++ b/devices/vector/gdevpsft.c
@@ -30,6 +30,7 @@
 #include "stream.h"
 #include "spprint.h"
 #include "gdevpsf.h"
+#include "gdevpdfx.h"
 
 /* Internally used options */
 #define WRITE_TRUETYPE_STRIPPED 0x1000  /* internal */
diff --git a/devices/vector/gdevpsu.c b/devices/vector/gdevpsu.c
index aa09aed65..177403f97 100644
--- a/devices/vector/gdevpsu.c
+++ b/devices/vector/gdevpsu.c
@@ -26,6 +26,7 @@
 #include "spprint.h"
 #include "stream.h"
 #include "gserrors.h"
+#include "gdevpdfx.h"
 
 /* ---------------- Low level ---------------- */
 

That compiles and runs for me, and I'm running our regression tests now. But this is pointless if it doesn't fix the problem, and I can't test that. Would you mind trying this to see if it resolves the issue for you, please?

Also, since you've done the work, I'd like to attribute it to you in the commit.
Is "Martin Rehak <rehak@tekkirk.org>" acceptable?

Hello,

I am happy to test, just give me some time. "Martin Rehak <rehak@tekkirk.org>" is fine.

Thank you!
m.

I was able to do some testing on Solaris 11, using gcc 13.3, and I did see some problems. This addresses the problems that I saw on that setup:

https://cgit.ghostscript.com/cgi-bin/cgit.cgi/ghostpdl.git/commit/?id=ef1292db075a

Hi Ken, Chris,

Sorry for the delay, I am back from vacation. Before I left I tested Ken's patch, which still had some issues: there were some 'd' leftovers after numbers in the output PDFs. But I have finished testing Chris's patch and it works like a charm standalone. As far as I understand this is already in master. Which release can I expect it to be part of, please?

Thank you both very much for your effort!

Regards,
m.

(In reply to Martin Rehak from comment #29)
> standalone. As far as I understand this is already in master. Which release
> can I expect it to be part of, please?

It will be in the next release.

Again, thank you very much.