Created attachment 25353
Original PDF files with the converted files

In our application, we first convert the original pdf file to Black and White format with GhostScript, and then split the Black and White pdf file into individual pdf files for each page with Aspose PDF Java package.

The GhostScript command is as follows:

sudo gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sFONTPATH=/usr/share/fonts/ -dHaveTransparency=true -dProcessColorModel=/DeviceGray -dColorConversionStrategy=/Gray -o /tmp/YAX.PC.00000142_gray.pdf -f /tmp/processing/YAX.PC.00000142.pdf

Note that the following warning was found in the output of the command:

Artifex Ghostscript 10.02.1 (2023-11-01)
Copyright (C) 2023 Artifex Software, Inc. All rights reserved.
Processing pages 1 through 1.
Page 1
The following warnings were encountered at least once while processing this file:
A Form XObject had a BBox with a width or height of 0
**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Aspose.PDF for Java 23.12 <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.

The Black and White pdf file converted with GhostScript 10.01.2 has no problem with the Aspose PDF Java package (23.12). But the Black and White pdf file converted with GhostScript 10.02.1 causes the Aspose PDF Java package (23.12) to fail with the following error when the single page pdf file is saved.

class com.aspose.pdf.internal.ms.System.l6n: Wrong format of page's content.
com.aspose.pdf.internal.l4t.lu.lI(Unknown Source)
com.aspose.pdf.internal.l4t.lu.lI(Unknown Source)
com.aspose.pdf.internal.l4t.lu.lI(Unknown Source)
com.aspose.pdf.Annotation.lh(Unknown Source)
com.aspose.pdf.AnnotationCollection.lf(Unknown Source)
com.aspose.pdf.ADocument.preSave(Unknown Source)
com.aspose.pdf.ADocument.lf(Unknown Source)
com.aspose.pdf.ADocument.lf(Unknown Source)
com.aspose.pdf.ADocument.save(Unknown Source)
com.aspose.pdf.Document.save(Unknown Source)

I wonder if there is something wrong in GhostScript 10.02.1 that causes the output pdf file to have an invalid format.

Attached are four original PDF files, the Black and White pdf files converted with GhostScript 10.01.2 (no problem with Aspose), and the Black and White pdf files converted with GhostScript 10.02.1 (problem with Aspose). I hope they help you analyze the problem and find a solution.
(In reply to Xiaohong Yang from comment #0)
> In our application, we first convert the original pdf file to Black and
> White format with GhostScript, and then split the Black and White pdf file
> into individual pdf files for each page with Aspose PDF Java package.

You could do that in a single operation, which would be more efficient. If you supply %d in the filename then pdfwrite will write each page to a separate file.

> The GhostScript command is as follows
>
> sudo gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sFONTPATH=/usr/share/fonts/
> -dHaveTransparency=true -dProcessColorModel=/DeviceGray
> -dColorConversionStrategy=/Gray -o /tmp/YAX.PC.00000142_gray.pdf -f
> /tmp/processing/YAX.PC.00000142.pdf

Umm, I find it surprising that you are running gs with sudo! That seems somewhat insecure to me.

You should not set ProcessColorModel when using ColorConversionStrategy. HaveTransparency defaults to true so you don't need to set that either.

If you set "-o /tmp/YAX.PC.00000142_gray-%d.pdf" then you'll get the output split into individual pages.

> Note that the following warning was found in the output of the command:
>
> Artifex Ghostscript 10.02.1 (2023-11-01) Copyright (C) 2023 Artifex
> Software, Inc. All rights reserved.
> Processing pages 1 through 1.
> Page 1
> The following warnings were encountered at least once while processing this
> file:
> A Form XObject had a BBox with a width or height of 0
> **** This file had errors that were repaired or ignored.
> **** The file was produced by:
> **** >>>> Aspose.PDF for Java 23.12 <<<<
> **** Please notify the author of the software that produced this
> **** file that it does not conform to Adobe's published PDF
> **** specification.

I'm going to look into that tomorrow as it seems to be causing a problem for other things.

> The Black and White pdf file converted with GhostScript 10.01.2 has no
> problem with the Aspose PDF Java package (23.12). But the Black and White
> pdf file converted with GhostScript 10.02.1 causes Aspose PDF Java package
> (23.12) to fail with the following error when the single page pdf file is
> saved.

Decompressing the two files and then comparing them reveals essentially no differences. A few points do crop up:

You have used GPL Ghostscript for 10.01.2 and Artifex Ghostscript (the commercial release) for 10.02.1. This causes the Metadata stream to be different (it would be anyway because of date stamps).

The date stamps of the files are obviously different, and this causes the file ID array to be different.

Because of the difference in the length of the Metadata strings, the xref tables differ.

Other than that, there are no differences between the files as far as I can see. I'm afraid I cannot tell you why someone else's application doesn't like one file in contrast to the other.

Looking at the messages, it is possible that the Aspose package doesn't like the Annotations; that is where the BBox complaint comes from, but these seem to be identical in the two files. The problem there is that the Appearance stream has an invalid object number; as I said above, I'm going to look into that one tomorrow.

> Wonder if there is something wrong in GhostScript 10.02.1 that causes the
> output pdf file to have invalid format.

As far as I can see there is nothing invalid about the files. Ghostscript can open them, MuPDF can open them, my web browser can open them and the de facto standard, Adobe Acrobat, can open them without complaint.
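The command-line advice in the reply above can be sketched as a small shell wrapper. This is an illustration only, not the reporter's actual script: the function name `gray_split` and the `GS` override variable are hypothetical, and the paths in the usage comment are the ones quoted in the report. Compared with the original command it drops sudo, -dProcessColorModel and -dHaveTransparency, and puts %d in the output name so that pdfwrite writes one file per page.

```shell
#!/bin/sh
# GS is a hypothetical override so another binary can be substituted for "gs".
GS=${GS:-gs}

# gray_split INPUT OUTPUT_PATTERN
# Convert INPUT to grayscale with pdfwrite. OUTPUT_PATTERN should contain %d,
# which is replaced by the page number so each page lands in its own file.
gray_split() {
  # -o already implies -dBATCH and -dNOPAUSE, so they are not repeated here.
  # -dColorConversionStrategy=/Gray is kept exactly as written in the report.
  "$GS" -sDEVICE=pdfwrite \
        -sFONTPATH=/usr/share/fonts/ \
        -dColorConversionStrategy=/Gray \
        -o "$2" -f "$1"
}

# Usage, with the paths quoted in the report (run as a normal user, not root):
#   gray_split /tmp/processing/YAX.PC.00000142.pdf /tmp/YAX.PC.00000142_gray-%d.pdf
```

With this in place the separate splitting step is only needed if the splitter has to do something beyond producing one file per page.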
Created attachment 25357
File produced by modified code

So, as I said, I can see *no* significant differences between the pairs of files you have sent.

I do see the warning about the BBox with an area of 0, and this does cause *a* problem with the output; this causes pdfwrite to create an annotation with a /N (Normal) Appearance (/AP) with an invalid object number of -1.

Given that the Aspose error references an annotation I had suspected this might be the problem; however, the PDF files produced by 10.01.2 and 10.02.1 contain the same illegal object number. So either that's not the problem, or there is some mistake in the files you have sent me. Since I don't have a copy of the Aspose software I obviously cannot test my theory.

I've fixed the pdfwrite code so that it no longer causes this problem (actually fixed in the PDF interpreter), and I've attached an example produced by the modified code using one of your input files (SMJ.NW.00002128.pdf) with a version of your command line.

Can you please test this file with your Aspose software and let me know the result?

I would point out again that Ghostscript's pdfwrite device is capable of splitting the file as well, either during production of the gray scale PDF (thus saving time and effort) or as a separate step, simply by putting %d in the output filename.
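One way to check a pdfwrite output for the invalid reference described above is to search it for indirect references with a negative object number. The following is a minimal sketch under several assumptions: that the annotation dictionaries are stored uncompressed in the file (or that the file has been decompressed first), that the bad reference is written in the form `/N -1 0 R` (the generation number 0 is a guess), and that the local grep supports the common -a and -o options. The function name `find_negative_refs` is hypothetical.

```shell
#!/bin/sh
# find_negative_refs FILE
# Print every "/Key -N G R" indirect reference whose object number is negative.
# Exit status is 0 if at least one is found and 1 if none are found.
find_negative_refs() {
  # -a: treat the PDF as text even though it contains binary stream data
  # -o: print only the matching part of each line
  # LC_ALL=C: match bytes, so non-UTF-8 stream data does not confuse grep
  LC_ALL=C grep -a -o -E '/[A-Za-z]+ *-[0-9]+ +[0-9]+ +R' "$1"
}

# Usage (hypothetical file name):
#   find_negative_refs output_gray.pdf && echo "negative object reference found"
```

Because binary stream data can contain matching byte sequences by chance, a hit is a hint to inspect the surrounding object rather than proof of a defect.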
This commit 25f11d66e19747cfbbd88634979096d2b7de5385 resolves the problem I could see, described in comment #2. I'll leave this open for a while in the hope of hearing if this helps with the Aspose software.
Hi Ken,

I downloaded the sample file converted with the modified code and tested it with the Aspose PDF package (24.1). The error disappeared and the split pdf files for each page were successfully saved.

So your code change resolved the issue. Thank you very much!
(In reply to Xiaohong Yang from comment #4)
> I downloaded the sample file converted with the modified code and tested it
> with the Aspose PDF package (24.1). The error disappeared and the split pdf
> files for each page were successfully saved.
>
> So your code change resolved the issue.

Thanks for confirming. The fix will be in the next release, or you can pull the commit in and rebuild yourselves.