Created attachment 15010
When the attached file in.pdf is passed through Ghostscript 9.23 with this command line:
gs -dQUIET -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -o out.pdf in.pdf
qpdf reports that the output file is damaged. (The input file has no issues.)
$ qpdf --check out.pdf
PDF Version: 1.5
File is not encrypted
File is not linearized
WARNING: out.pdf (offset 304198): error decoding stream data for object 12 0: invalid jpeg data reading from buffer
WARNING: out.pdf (offset 304198): stream will be re-processed without filtering to avoid data loss
To confirm, I extracted the JPEG with pdfimages -j, and used jpeginfo -c to check the JPEG separately. jpeginfo reports:
_img-000.jpg 1232 x 1728 24bit JFIF N 42469 Premature end of JPEG file [WARNING]
I extracted the JPEG from in.pdf as well. jpeginfo reports no error and shows the file length as 42471 bytes instead of 42469. It appears Ghostscript omitted two bytes from the end of the JPEG.
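One way to narrow this down is to compare the two extracted JPEGs byte for byte. The sketch below assumes the two missing bytes might be the JPEG end-of-image marker (FF D9); that is a guess based on the "Premature end of JPEG file" warning, not something confirmed in this report. The function names are made up for illustration, and the two file paths are whatever pdfimages -j produced for in.pdf and out.pdf.

```python
import sys
from pathlib import Path

EOI = b"\xff\xd9"  # JPEG end-of-image marker


def ends_with_eoi(data: bytes) -> bool:
    """Return True if the byte string ends with the JPEG EOI marker."""
    return data.endswith(EOI)


def missing_tail(original: bytes, processed: bytes):
    """Return the trailing bytes of `original` that are absent from `processed`.

    Returns None when `processed` is not a prefix of `original`,
    i.e. when the difference is not a plain truncation.
    """
    if not original.startswith(processed):
        return None
    return original[len(processed):]


# Usage: python check_jpeg_tail.py <jpeg-from-in.pdf> <jpeg-from-out.pdf>
if __name__ == "__main__" and len(sys.argv) == 3:
    original = Path(sys.argv[1]).read_bytes()
    processed = Path(sys.argv[2]).read_bytes()
    print("original ends with EOI: ", ends_with_eoi(original))
    print("processed ends with EOI:", ends_with_eoi(processed))
    print("missing tail:", missing_tail(original, processed))
```

If the second file is a plain truncation of the first, missing_tail returns exactly the dropped bytes. A JPEG that carries padding after the EOI marker would not end with FF D9 even when intact, so treat a negative result as a hint rather than proof.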
out.pdf still opens in PDF viewers, and nothing looks obviously wrong with it compared to in.pdf.
in.pdf sets "/Filter [ /FlateDecode /DCTDecode ]", which is unusual and likely the cause of the issue. in.pdf was generated by an HP Officejet 8620 scanner.
"mupdf clean" cannot detect or correct the error.
This is a regression. Previous versions of Ghostscript processed this file without damaging it.
I found another example of this issue in a file I cannot share. That file had /Filter /DCTDecode and a complex /ColorSpace /Separation with CMYK.
What the two files have in common is that both JPEGs are used with image/stencil masks.
Created attachment 15045
Another test case -- pdf with JPEG image
I also spotted the same problem. The attached PDF contains a DCT stream with /Length 301160; after processing with the new Ghostscript, the output PDF has a truncated DCT stream with /Length 301158. Some viewers accept the generated PDF, but Adobe Reader fails.
When -dPassThroughJPEGImages=false is added to the arguments, the issue does not occur.
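For anyone scripting Ghostscript on an affected version, the workaround amounts to adding that one switch to the command line from the original report. A minimal sketch follows; the helper names gs_pdfwrite_args and run_gs are hypothetical, and only switches that already appear in this report are used.

```python
import subprocess


def gs_pdfwrite_args(src: str, dst: str, passthrough_jpeg: bool = False) -> list:
    """Build the pdfwrite command line used in this report.

    With passthrough_jpeg=False (the default here), the switch
    -dPassThroughJPEGImages=false is added as a workaround.
    """
    args = ["gs", "-dQUIET", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite"]
    if not passthrough_jpeg:
        args.append("-dPassThroughJPEGImages=false")
    args += ["-o", dst, src]
    return args


def run_gs(src: str, dst: str, passthrough_jpeg: bool = False) -> None:
    """Run Ghostscript; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(gs_pdfwrite_args(src, dst, passthrough_jpeg), check=True)
```

Calling run_gs("in.pdf", "out.pdf") is then equivalent to the original command with the extra switch. The output can be re-verified with qpdf --check as above.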
(In reply to Piotr Strzelczyk from comment #2)
> Created attachment 15045
> Another test case -- pdf with JPEG image
> I also spotted the same problem -- attached PDF (with DCT stream, /Length
> 301160), after processing by new Ghostscript results in PDF with truncated
> DCT stream i.e. /Length 301158. Some viewers accept generated PDF, but Adobe
> Reader fails.
This must be some difference between Acrobat Pro and Reader, or some recent change. My version of Acrobat Pro is entirely happy with the original files.
In any event, this is fixed in commit b61071c9411c3f6aa0dd594da2c5a20ff4ecd914