Bug 704160

Summary: PCLm and raster-only PDF ("pclm"/"pdfimage8"/"pdfimage24") output devices require seekable output file
Product: Ghostscript
Reporter: Till Kamppeter <till.kamppeter>
Component: PDF Writer
Assignee: Default assignee <ghostpdl-bugs>
Status: RESOLVED FIXED
QA Contact: Bug traffic <tech>
Severity: normal
Priority: P4
CC: robin.watts
Version: 9.53.3
Hardware: PC
OS: Linux
Customer:
Word Size: ---

Description Till Kamppeter 2021-08-02 15:42:12 UTC
I want to use the mentioned output devices for print filters in Printer Applications.

Printer Applications are the replacement for classic CUPS printer drivers. They emulate a driverless IPP printer and internally convert incoming print jobs to the printer's native language before passing the data on to the printer.

The Printer Application development framework PAPPL handles input in Apple or PWG Raster format in streaming mode. The whole job does not need to be stored in a temporary file before filtering and printing start; instead, the data is processed in small portions and each portion is passed on to the printer immediately. This saves local resources (e.g. on an IoT device acting as a print server or network print adapter) and also allows infinite jobs.

In general, PDF is not streamable, as the whole file must be present before it can be printed, but some variants/subsets of PDF, such as PDF/is or PCLm, are streamable. A raster-only PDF should also not be too difficult to make streamable. PCLm in particular is designed for streaming, as it is made for raster-only printers with low resources, which may not even have enough memory to hold the raster image of a single page.

I am now using the various PDF output devices of Ghostscript in print filters, trying to stream as much as possible (avoiding whole-job temporary files) by piping the job data from one filter to the next.

With the general PDF output device "pdfwrite" this works without problems: I can simply set the output file to stdout and pipe it to the next filter (or the printer). But if I use the output devices "pclm", "pdfimage8", or "pdfimage24", I get a message that the output file has to be seekable. For these formats in particular I would expect to be able to stream/pipe them: first, the files can get very large because they are pure raster, and second, "pclm" at least is a format made for streaming.
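When Ghostscript is told to write to stdout (e.g. with `-sOutputFile=-`) and stdout is a pipe to the next filter, the output cannot be repositioned, so any device that needs to seek back into its output will fail. The following is a minimal POSIX sketch of how a program can tell the two cases apart; the helper name `fd_is_seekable` is hypothetical, and this is a generic illustration, not Ghostscript's own check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `fd` can be repositioned with lseek(),
 * 0 if it is a pipe, FIFO or socket (lseek() fails with ESPIPE),
 * and -1 on any other error. Seeking by 0 from SEEK_CUR does not move
 * the file position, so the check has no side effects. */
static int fd_is_seekable(int fd)
{
    assert(fd >= 0);
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return 1;
    return errno == ESPIPE ? 0 : -1;
}
```

For example, `fd_is_seekable(STDOUT_FILENO)` would return 0 when the program's output is piped to another process and 1 when it is redirected to a regular file.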

I would therefore greatly appreciate it if you could make these output devices also work with sequential, non-seekable output, such as stdout or a pipe.
Comment 2 Robin Watts 2021-08-13 14:49:31 UTC
Fixes for this have gone in as:

commit a30e693d45f3bc5af43c28a5b341d76d3d259965
Author: Robin Watts <Robin.Watts@artifex.com>
Date:   Thu Aug 5 11:31:26 2021 +0100

    Bug 704160: Fix pdfimage devices to not need seekable stream.

    Avoid the need to seek to write a Length field by using a new object
    for the length.

commit a1017203639bf6ab74360e0dd2501bc7c69c6d30
Author: Robin Watts <Robin.Watts@artifex.com>
Date:   Wed Aug 4 17:40:14 2021 +0100

    Bug 704160: Fix pclm device to not need seekable stream.

    The pdfimage device relies on seeking to fill in the 'Length'
    entry in each dict. PCLm uses a temporary file, so doesn't need
    that. I had inadvertently left the init function asking for a
    seekable stream though. Fix that here.
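Both commit messages revolve around the stream dictionary's /Length entry: a writer that fills it in by seeking back once the data size is known cannot work on a pipe. The PDF format allows /Length to be an indirect reference, so a writer can emit `/Length N 0 R`, stream the data while counting bytes, and write object N holding the final count afterwards. The C sketch below illustrates that general idea under stated assumptions; `pdf_writer` and the `pw_*` functions are hypothetical and are not Ghostscript's internal API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical minimal writer state. Every byte goes through pw_write() or
 * pw_printf(), so the running offset is known without ftell()/fseek(). */
typedef struct {
    FILE *out;
    long pos;          /* total bytes written so far */
    long stream_start; /* offset of the first byte after "stream\n" */
    int length_obj;    /* object number reserved for the /Length value */
} pdf_writer;

static int pw_write(pdf_writer *w, const void *buf, size_t n)
{
    if (fwrite(buf, 1, n, w->out) != n)
        return -1;
    w->pos += (long)n;
    return 0;
}

static int pw_printf(pdf_writer *w, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vfprintf(w->out, fmt, ap);
    va_end(ap);
    if (n < 0)
        return -1;
    w->pos += n;
    return 0;
}

/* Start stream object `obj`. Its /Length is an indirect reference to
 * object obj + 1, whose value is only written after the data. */
static int pw_begin_stream(pdf_writer *w, int obj)
{
    w->length_obj = obj + 1;
    if (pw_printf(w, "%d 0 obj\n<< /Length %d 0 R >>\nstream\n",
                  obj, w->length_obj) < 0)
        return -1;
    w->stream_start = w->pos;
    return 0;
}

/* Finish the stream, then emit the length object. The EOL written
 * before "endstream" is not counted in /Length. */
static int pw_end_stream(pdf_writer *w)
{
    assert(w->length_obj > 0 && "pw_end_stream() without pw_begin_stream()");
    long length = w->pos - w->stream_start;
    return pw_printf(w, "\nendstream\nendobj\n%d 0 obj\n%ld\nendobj\n",
                     w->length_obj, length);
}
```

Compared with seeking back, the cost is one extra small object per stream. The running `pos` counter is also the kind of value a sequential writer would record in place of ftell() when building the cross-reference table. The PCLm commit describes a different route: buffering through a temporary file, so the length is known before the data is copied to the output.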


There have been other fixes for PCLm and pdfimage devices since, so you may want to move to the latest commits rather than just cherry-picking these.