Bug 691739

Summary: Using runpdf with stdin
Product: Ghostscript Reporter: Marcos H. Woehrmann <marcos.woehrmann>
Component: General Assignee: Ray Johnston <ray.johnston>
Status: RESOLVED WORKSFORME    
Severity: enhancement CC: a.hirth, alex
Priority: P3    
Version: 9.00   
Hardware: PC   
OS: All   
Customer: Word Size: ---
Attachments: example_stdin.zip

Description Marcos H. Woehrmann 2010-10-28 19:05:45 UTC
A customer asks:

I have a problem using ghostscript and providing it the input PDF via a
stdin callback. I get the PDF data in-memory and don't want to buffer it in
a temporary file if I can avoid it (I know that ghostscript will buffer it
in a temporary file if I use stdin but that’s the better option for me
anyway).
So I implemented a stdin callback. It works fine, I think, but I cannot
figure out which command line to send to make ghostscript behave as we need
it. What we currently do is the following:
...
run_string_continue "/NOPAUSE true def\n/NOPROMPT true def\n(<INPUT FILENAME>) (r) file runpdfbegin exch pop\n"
...
run_string_continue "<PAGENUM> <PAGENUM> dopdfpages\n"
...

To render a certain page <PAGENUM> of file <INPUT FILENAME>.

Now for using stdin I tried to replace <INPUT FILENAME> by "-" but that does
not work. It seems that I cannot use runpdfbegin without a real file for
input? What I want to achieve is 1. open a PDF file (which is provided via
stdin) and 2. select a certain page for rendering later.

Is there a way to do this?
Comment 1 Alex Cherepanov 2010-10-28 23:54:58 UTC
I think the easiest way to use in-memory data is to create a
reusable stream on top of the buffer. This may need some custom
programming but should not take more than a day.
Comment 2 Andreas Hirth 2010-10-29 08:42:51 UTC
I am the customer who originally asked the question.

Alex, thank you for your comment. I think that I understand your idea in principle. Looking into the gs sources (stream.h/.c) I see how I could create a reusable stream on top of my memory buffer. 

However, how would I pass this to gs over the interface defined in iapi.h?

Andreas
Comment 3 Ray Johnston 2010-10-30 19:20:12 UTC
Created attachment 6855 [details]
example_stdin.zip

The concept is to pass the data to Ghostscript using 'run_string_continue',
writing a ReusableStream file object (which is seekable), then use that
file object as the input to 'runpdfbegin'. Then use 'dopdfpages' as before.

This runs for me using:

gswin32c -q - < example_stdin.dat

Your program would supply the 'prefix' up to and including the 'exec\n', then
pass the PDF file, then the 'EOD' string, followed by:

/FILE exch def
FILE runpdfbegin
_ _ dopdfpages

Thus the PDF file never gets written to disk.
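
The sequence described in this comment could be sketched in C roughly as follows. This is only an illustration under assumptions: the attachment's exact prefix is not quoted in this thread, so the PostScript in PREFIX, the EOD_MARKER value, the gs_feed_fn callback type, the 32 KB chunk size, and all function names are hypothetical. In a real client the callback would forward each buffer to gsapi_run_string_continue from iapi.h (and would probably need to treat Ghostscript's "need more input" return code as success); it is kept abstract here so the sketch compiles without Ghostscript.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker; it must never occur inside the PDF bytes. */
#define EOD_MARKER "--GS-END-OF-PDF-DATA--"
/* Conservative chunk size (assumption), kept well under 64 KB per call. */
#define CHUNK_MAX ((size_t)32768)

/* Hypothetical callback type: a real client would wrap
 * gsapi_run_string_continue() here. Returns 0 on success. */
typedef int (*gs_feed_fn)(void *ctx, const char *buf, size_t len);

/* Assumed prefix (the attachment's text is not quoted in the thread).
 * The procedure is scanned completely before 'exec' runs, so currentfile
 * starts reading at the first byte after "exec\n". */
static const char PREFIX[] =
    "{ currentfile 0 (" EOD_MARKER ") /SubFileDecode filter\n"
    "  /ReusableStreamDecode filter } exec\n";

/* Sends prefix, raw PDF bytes, end marker, and the page-rendering suffix. */
int feed_pdf_as_reusable_stream(gs_feed_fn feed, void *ctx,
                                const unsigned char *pdf, size_t pdf_len,
                                int page)
{
    char suffix[160];
    size_t off = 0;
    int n, rc;

    assert(feed != NULL && pdf != NULL);

    if ((rc = feed(ctx, PREFIX, strlen(PREFIX))) != 0)
        return rc;

    /* Pass the PDF data through unchanged, one bounded chunk at a time. */
    while (off < pdf_len) {
        size_t chunk = pdf_len - off;
        if (chunk > CHUNK_MAX)
            chunk = CHUNK_MAX;
        if ((rc = feed(ctx, (const char *)pdf + off, chunk)) != 0)
            return rc;
        off += chunk;
    }

    /* The marker ends the data; the filter's file object is then on the
     * operand stack and gets bound to /FILE, as described in the comment. */
    n = snprintf(suffix, sizeof suffix,
                 EOD_MARKER "\n/FILE exch def\nFILE runpdfbegin\n"
                 "%d %d dopdfpages\n", page, page);
    if (n < 0 || (size_t)n >= sizeof suffix)
        return -1;
    return feed(ctx, suffix, (size_t)n);
}

/* Dry-run sink that records what would be sent to the interpreter. */
struct capture { char data[4096]; size_t len; size_t calls; };

int capture_feed(void *ctx, const char *buf, size_t len)
{
    struct capture *c = ctx;
    if (c->len + len >= sizeof c->data)
        return -1;
    memcpy(c->data + c->len, buf, len);
    c->len += len;
    c->data[c->len] = '\0';
    c->calls++;
    return 0;
}
```

The marker has to be a byte sequence that cannot occur in the PDF data, because the SubFileDecode filter stops at its first occurrence. A short marker such as 'EOD' could plausibly appear inside binary stream data, which is why the sketch uses a longer one.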
Comment 4 Alex Cherepanov 2010-10-31 04:09:08 UTC
IMHO, the approach suggested in comment #3 has no advantages over
sending the PDF file to stdin. In both cases the PDF stream is copied to
PostScript VM.

First, I thought about creating a reusable file by hand, following the make_rss()
example. However, this won't work through the current API. The API could be extended
to create a PostScript string on top of foreign memory. The stream could then easily
be constructed on top of an array of strings.

Alternatively, the /ReusableStreamDecode filter could be extended to take a memory
mapping attribute, get a file descriptor from the file object, mmap the file
descriptor, and create a reusable stream on top of the mapped memory buffer.
Ideally, memory-mapped files should not be flushed to disk, but everything
depends on third parties here.
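
As a rough illustration of the mapping step only (this is plain POSIX C, not Ghostscript code, and the proposed filter extension does not exist in this thread), getting a descriptor from a file object and mapping it read-only might look like the sketch below. The function name map_stream_readonly is hypothetical, and a C FILE * merely stands in for the interpreter's file object.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Maps the whole file behind fp read-only and returns the base address,
 * or NULL on failure or for an empty file. The caller releases the
 * mapping with munmap(). Only data already flushed to the descriptor
 * is visible through the mapping. */
const unsigned char *map_stream_readonly(FILE *fp, size_t *len_out)
{
    struct stat st;
    void *addr;
    int fd;

    assert(fp != NULL && len_out != NULL);

    /* Step 1: get a file descriptor from the file object. */
    fd = fileno(fp);
    if (fd < 0 || fstat(fd, &st) != 0 || st.st_size <= 0)
        return NULL;

    /* Step 2: map the descriptor; a zero-length mapping is invalid,
     * which is why empty files were rejected above. */
    addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (addr == MAP_FAILED)
        return NULL;

    /* Step 3 (not shown): a seekable stream would be built over
     * [addr, addr + size) instead of copying the data. */
    *len_out = (size_t)st.st_size;
    return addr;
}
```

Whether the mapped pages ever reach the disk is decided by the operating system, which matches the caveat above.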
Comment 5 Ray Johnston 2010-10-31 18:44:12 UTC
I don't follow Alex's comment:

> IMHO, the approach suggested in the comment #3 has no advantages over
> sending the PDF file to stdin. In both cases PDF stream is copied to
> PostScript VM.

The submitter stated:

> I get the PDF data in-memory and don't want to buffer it in a
> temporary file if I can avoid it (I know that ghostscript will buffer it
> in a temporary file if I use stdin but that’s the better option for me
> anyway).

Sending the PDF file to stdin will trigger the logic in pdf_main.ps that
reads stdin to a tempfile:

	    dup (%stdin) (r) file eq {
	      % Copy PDF from stdin to temporary file then run it.

whereas the ReusableStreamDecode filter will not trigger this logic.
Comment 6 Andreas Hirth 2010-11-01 10:12:13 UTC
Ray,

thank you very much for your help. I changed my implementation according to your suggestions and it worked from the start.

This solution is nearly perfect for me. The only remaining disadvantage is that I end up with the PDF data in memory twice (1. my buffer, 2. the gs buffer). But I fear there is no easy way of getting a reusable stream object that makes gs call back into my code via callback functions when reading?

Thanks!

Andreas
Comment 7 Ray Johnston 2011-06-05 20:48:47 UTC
This is mostly a duplicate of bug 226943, but for customer 532 I have a
%ram% IODevice that allows files to be created as 'foreign' using C calls
on a block by block basis and used as input for ANY PDL that performs the
IODevice access (as with sfopen, sfread, ... defined in base/strmio.h) or
by the PS 'file' operator which parses the %___% IODevice prefix.

As long as the stream returned to the PS interpreter from the "file"
is "seekable" (returns 'true' from s_can_seek) then the PDF interpreter will
not create a ReusableStreamDecode filter.

This approach is better than the %memory% device I provided as a quick-and-dirty
solution for a customer that required that the data be contiguous.
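
For illustration, a client that already has its data in a %ram% file could build the PostScript it sends along these lines. This is a sketch under assumptions: build_ram_open_command and the file name are hypothetical, how the %ram% file gets populated from C is not covered, and the command text simply follows the 'file' / 'runpdfbegin' / 'dopdfpages' pattern used earlier in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "(%ram%<name>) (r) file runpdfbegin" plus a dopdfpages call for
 * one page into out. Returns the length written, or -1 if the name is
 * empty or something does not fit. */
int build_ram_open_command(char *out, size_t out_size,
                           const char *ram_name, int page)
{
    char escaped[128];
    size_t j = 0;
    int n;

    assert(out != NULL && ram_name != NULL);
    if (strlen(ram_name) == 0)
        return -1;

    /* Parentheses and backslashes must be escaped inside a PostScript
     * string literal, otherwise the name could end the string early. */
    for (const char *p = ram_name; *p != '\0'; ++p) {
        int needs_escape = (*p == '(' || *p == ')' || *p == '\\');
        if (j + (needs_escape ? 2u : 1u) >= sizeof escaped)
            return -1;
        if (needs_escape)
            escaped[j++] = '\\';
        escaped[j++] = *p;
    }
    escaped[j] = '\0';

    /* "%%" is how printf-style formats emit the literal '%' characters
     * of the %ram% IODevice prefix. */
    n = snprintf(out, out_size,
                 "(%%ram%%%s) (r) file runpdfbegin\n%d %d dopdfpages\n",
                 escaped, page, page);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

Because the stream comes from an IODevice file rather than from stdin, the temp-file copy in pdf_main.ps mentioned in comment 5 should not come into play, provided the stream is seekable as described above.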
Comment 8 Ray Johnston 2011-08-30 16:15:05 UTC
Making this an enhancement as the customer is satisfied.
Comment 9 Ray Johnston 2019-12-28 16:56:27 UTC
We've discussed this, and since this customer is happy, and we have a %ram%
device, the extra method of having C callbacks providing a stream was not
reviewed favorably.

Closing as WORKSFORME as it works reasonably.