Bug 692127 - File size limit with pdfwrite
Summary: File size limit with pdfwrite
Status: NOTIFIED DUPLICATE of bug 692290
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: PC All
Importance: P1 enhancement
Assignee: Ken Sharp
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-04-06 15:09 UTC by Marcos H. Woehrmann
Modified: 2012-04-21 04:10 UTC
0 users

See Also:
Customer: 661
Word Size: ---


Attachments

Description Marcos H. Woehrmann 2011-04-06 15:09:24 UTC
The customer reports:

>
> We currently face an issue when using Ghostscript: the temporary files that
> are created when using pdfwrite cannot grow beyond 2 GB. This issue has been
> seen with versions 8.71 and 9.01. In version 9.00 the temporary files are
> apparently written to memory, which then also becomes a bottleneck. Using
> PostScript output, there is no problem and we could create an 85 GB PS file
> (with 100000+ pages).
> Can the limit for PDFs be removed/expanded? Or is there a specific reason
> for it?
>
Comment 1 Ken Sharp 2011-04-06 16:06:30 UTC
pdfwrite uses the fopen/fseek/fread/fwrite family of C run-time routines to access the temporary files. The fseek routine has a limit of +/-2^31 on the 'distance to seek' argument. This means that for a SEEK_SET (from the beginning of file) the maximum distance that can be used is 2^31 = 2GB.
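
As a rough illustration of the limit described above, the following sketch checks whether a 64-bit file offset can be passed as the 'long' argument of fseek. The function names are hypothetical and are not part of Ghostscript; the second function simply models a platform where 'long' is 32 bits, which is an assumption about the build in question.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if `offset` fits in the `long`
   argument of fseek() on the platform compiling this code. */
int offset_fits_fseek(int64_t offset)
{
    return offset >= LONG_MIN && offset <= LONG_MAX;
}

/* Hypothetical helper: the same check against a 32-bit `long`,
   whatever the host platform actually uses. */
int offset_fits_long32(int64_t offset)
{
    return offset >= INT32_MIN && offset <= INT32_MAX;
}
```

With a 32-bit 'long', offset_fits_long32(2147483647) succeeds and offset_fits_long32(2147483648) fails, which is the 2GB boundary the customer is hitting.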

In addition, the offsets to objects within the temporary files are stored internally as 'long' data types, which are usually limited to 32 bits and as such also hold a value between +/-2^31.

We could (with some effort) increase this; however, PDF itself has an architectural limit: the offset value in the xref table is limited to 10 decimal digits, which makes the maximum offset 9,999,999,999 bytes, or ~9.3GB. This is an absolute architectural maximum which cannot be exceeded.
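
To make the 10-digit limit concrete, here is a minimal sketch that formats one in-use entry of a classic xref table: a 10-digit byte offset, a 5-digit generation number, the letter 'n', and a 2-character end-of-line, 20 bytes in total. The function name and its error convention are assumptions made for illustration; this is not the code pdfwrite uses.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Largest offset that fits in the 10 decimal digits of an xref entry. */
#define XREF_MAX_OFFSET INT64_C(9999999999)

/* Hypothetical helper: writes a 20-byte in-use xref entry plus a
   terminating NUL into buf. Returns 0 on success, or -1 if the offset
   or generation number cannot be represented in the fixed-width fields. */
int format_xref_entry(int64_t offset, int gen, char buf[21])
{
    if (offset < 0 || offset > XREF_MAX_OFFSET)
        return -1;
    if (gen < 0 || gen > 65535)
        return -1;
    /* 10 digits, space, 5 digits, space, 'n', space, LF = 20 bytes */
    snprintf(buf, 21, "%010" PRId64 " %05d n \n", offset, gen);
    return 0;
}
```

An offset of 10,000,000,000 is rejected because it needs 11 digits; no amount of 64-bit file I/O in the writer changes that field width.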

PDF files have an xref table precisely so that it is possible to have random access to the contents of the file. PostScript files, by contrast, are inherently linear. The interpreter starts at the beginning and progresses to the end. As such there is no xref table and no realistic limit to the size of the PostScript file (note that the PostScript language *does* impose limits on the language's ability to handle files, and the underlying operating system may impose limits on file sizes).

Marcos, if it's decided that we want to go to the effort of increasing the maximum that can be written by pdfwrite, can you please alter this to an enhancement and reopen it.
Comment 2 Ray Johnston 2011-04-06 16:38:38 UTC
Ghostscript does have 64-bit file I/O functions (used by the clist logic in
gxclfile.c), provided as platform-independent gp_***_64 calls declared in
base/gp.h:

FILE *gp_fopen_64(const char *filename, const char *mode);

FILE *gp_open_scratch_file_64(const gs_memory_t *mem,
                              const char        *prefix,
                                    char         fname[gp_file_name_sizeof],
                              const char        *mode);
FILE *gp_open_printer_64(const gs_memory_t *mem,
                               char         fname[gp_file_name_sizeof],
                               int          binary_mode);

int64_t gp_ftell_64(FILE *strm);

int gp_fseek_64(FILE *strm, int64_t offset, int origin);

We don't define gp_fread_64 or gp_fwrite_64, because
  (1) known platforms allow regular fread, fwrite to be applied to a
      file opened with O_LARGEFILE, fopen64, etc.;
  (2) Ghostscript code never reads or writes a long (over 4 GB) block
      in one operation.

This probably could be done as an enhancement.
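
For readers without the Ghostscript tree to hand, the sketch below shows the general idea using the POSIX fseeko/ftello calls with a 64-bit off_t as a stand-in for gp_fseek_64 and gp_ftell_64. It is an assumption that this resembles what those wrappers do on POSIX platforms; the helper name is hypothetical.

```c
#define _FILE_OFFSET_BITS 64      /* request a 64-bit off_t; must precede the includes */
#define _POSIX_C_SOURCE 200809L   /* expose fseeko() and ftello() */
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: seek to an absolute 64-bit offset and report the
   resulting file position, or -1 if the seek fails. */
int64_t seek_and_tell_64(FILE *f, int64_t offset)
{
    if (fseeko(f, (off_t)offset, SEEK_SET) != 0)
        return -1;
    return (int64_t)ftello(f);
}
```

Seeking a scratch file to 3 GB this way reports a position of 3 GB, which a 'long'-based fseek could not express where 'long' is 32 bits. Seeking past the end of the file does not by itself grow the file.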
Comment 3 Ken Sharp 2012-04-17 08:46:35 UTC

*** This bug has been marked as a duplicate of bug 692290 ***