687466 – ps2ascii fails on PDF generated by Adobe InDesign

Summary: ps2ascii fails on PDF generated by Adobe InDesign

Status:	NOTIFIED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Interpreter
Version:	8.14
Hardware:	PC Linux

Importance:	P2 normal
Assignee:	Alex Cherepanov

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-05-14 15:47 UTC by Jason Rhinelander
Modified:	2008-12-19 08:31 UTC
CC List:	0 users

See Also:
Customer:
Word Size:	---

Attachments
Simple PDF causing problem (29.85 KB, application/pdf) 2004-05-14 15:47 UTC, Jason Rhinelander
patch (1.11 KB, patch) 2004-05-15 14:53 UTC, Alex Cherepanov

Description Jason Rhinelander 2004-05-14 15:47:10 UTC

I've been attempting to use pstotext or ps2ascii to extract text from some
PDFs, but whenever I run either on a PDF generated by Adobe InDesign, it gives
me a fatal error:

$ ps2ascii fails.pdf
 
 
Error: /rangecheck in --get--
Operand stack:
   --nostringval--   --dict:10/10(L)--   600   2007   5307   68  
--nostringval--   68
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--  
--nostringval--   2   %stopped_push   --nostringval--   --nostringval--  
--nostringval--   false   1   %stopped_push   2 3   %oparray_pop   2   3  
%oparray_pop   2   3   %oparray_pop   --nostringval--   2   1   1
--nostringval--   %for_pos_int_continue   --nostringval--   --nostringval--  
--nostringval--  --nostringval--   %array_continue   --nostringval--   false   1
  %stopped_push   --nostringval--   %loop_continue   --nostringval--   3   10  
%oparray_pop   --nostringval--   6   10   %oparray_pop  
(\000V\000G\000I\000D\000V\000G\000I\000G\000V\000D\000I)   --nostringval--  
%string_continue   --nostringval--
Dictionary stack:
   --dict:1166/1686(ro)(G)--   --dict:0/20(G)--   --dict:78/200(L)--  
--dict:78/200(L)--   --dict:104/127(ro)(G)--   --dict:238/347(ro)(G)--  
--dict:20/24(L)--   --dict:4/6(L)--   --dict:21/32(L)--   --dict:20/31(L)--
Current allocation mode is local
AFPL Ghostscript 8.14: Unrecoverable error, exit code 1


I'm not sure if the error is caused by something InDesign is doing (perhaps
InDesign's forced use of CID fonts has something to do with it?).  I'll attach
the fails.pdf file as well - any help would be appreciated.

Comment 1 Jason Rhinelander 2004-05-14 15:47:55 UTC

Created attachment 666
Simple PDF causing problem

Comment 2 Alex Cherepanov 2004-05-15 14:53:53 UTC

Created attachment 667
patch

There's no way to recover ASCII from the strings encoded for a
CID font. The attached patch fixes the PostScript error but produces
wrong results: it just dumps the strings in their unmodified encoding.
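To make "unmodified encoding" concrete, here is a minimal Python sketch using the string visible on the execution stack in the error log above. It assumes (the report does not say which CMap the file uses) that the font is shown through a 2-byte CMap such as Identity-H, so each pair of bytes is one character code, i.e. a CID, rather than an ASCII character. The function names are hypothetical.

```python
# Assumption: the font uses a 2-byte CMap such as Identity-H, so every
# pair of bytes in a shown string is one big-endian character code (a CID).

def split_two_byte_codes(data: bytes) -> list[int]:
    """Split a string shown with a 2-byte CMap into 16-bit character codes."""
    if len(data) % 2:
        raise ValueError("a 2-byte encoded string must have an even length")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def naive_ascii_dump(data: bytes) -> str:
    """Keep printable ASCII bytes and drop the rest, as a byte-oriented extractor might."""
    return "".join(chr(b) for b in data if 0x20 <= b < 0x7F)


# The string from the execution stack in the error log above.
shown = b"\x00V\x00G\x00I\x00D\x00V\x00G\x00I\x00G\x00V\x00D\x00I"

print(split_two_byte_codes(shown))  # [86, 71, 73, 68, 86, 71, 73, 71, 86, 68, 73]
print(naive_ascii_dump(shown))      # VGIDVGIGVDI
```

Under that assumption, the surviving letters are CIDs that select glyphs in the embedded font, not characters of the document's text, so a raw dump reads as noise.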

Text extraction from PDF should be done before conversion to PostScript,
using the /ToUnicode CMap. That is an enhancement request, not a bug.
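A ToUnicode CMap maps character codes to Unicode through `bfchar` (single code) and `bfrange` (code range) sections. The Python sketch below shows that decoding step on the PDF side. It is deliberately simplified: it assumes 2-byte codes, handles only `<hex>` operands and the non-array form of `bfrange` with a single UTF-16 code unit as destination, and the sample CMap fragment and function names are invented for illustration.

```python
import re

# Simplified sketch: handles only <hex> operands and the plain (non-array)
# form of bfrange whose destination is a single UTF-16 code unit.

def parse_tounicode(cmap_text: str) -> dict[int, str]:
    """Build a code -> Unicode text map from a ToUnicode CMap's bfchar/bfrange sections."""
    mapping: dict[int, str] = {}
    hexnum = r"<([0-9A-Fa-f]+)>"
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(hexnum + r"\s*" + hexnum, block):
            # Destinations are UTF-16BE and may hold several code units (ligatures).
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(hexnum + r"\s*" + hexnum + r"\s*" + hexnum, block):
            first = int(dst, 16)
            for offset, code in enumerate(range(int(lo, 16), int(hi, 16) + 1)):
                mapping[code] = chr(first + offset)
    return mapping


def decode_shown_string(data: bytes, to_unicode: dict[int, str]) -> str:
    """Decode a string shown with a 2-byte CMap; unmapped codes become U+FFFD."""
    # A trailing odd byte, if any, is ignored.
    codes = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data) - 1, 2)]
    return "".join(to_unicode.get(code, "\ufffd") for code in codes)


# Hypothetical CMap fragment; these mappings are invented for illustration.
SAMPLE_CMAP = """
3 beginbfchar
<0044> <0048>
<0047> <0065>
<0056> <006F>
endbfchar
1 beginbfrange
<0049> <004A> <006C>
endbfrange
"""

table = parse_tounicode(SAMPLE_CMAP)
print(decode_shown_string(b"\x00D\x00G\x00I\x00I\x00V", table))  # Hello
```

Because the lookup happens while the PDF objects are still available, the mapping never has to survive conversion to PostScript.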

Comment 3 Ray Johnston 2004-05-26 10:18:31 UTC

We should apply the patch and close the bug, but open a new
enhancement request for the Unicode mode.

Comment 4 Alex Cherepanov 2004-05-31 19:02:57 UTC

The patch has been committed to the HEAD branch.
An enhancement request (bug 687492) was created to track the
development of the ps2ascii utility.

There are two issues here:
(1) Decode source strings with well-known CMap files into Unicode or ASCII
    when possible.
(2) Use the ToUnicode CMap when available, but first it must be passed from
    the PDF level to the PostScript level (bug 685335).
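The two approaches above can be combined into a single fallback chain. The Python sketch below is hypothetical: the function name and tables are illustrative, and the priority order (ToUnicode first) is an assumption, since the comment lists both options without ranking them. Approach (1) is only valid here when the font's CMap is Identity-H, where each character code equals its CID. A real CID-to-Unicode table would have to be built from the well-known CMap resources for the font's character collection; that step is not shown.

```python
from typing import Optional


def extract_text(
    codes: list[int],
    to_unicode: Optional[dict[int, str]] = None,
    cid_to_unicode: Optional[dict[int, str]] = None,
) -> str:
    """Turn 2-byte character codes into text, trying the better source first.

    Assumed order (not stated in the bug report):
      issue (2): a ToUnicode CMap passed down from the PDF, when available;
      issue (1): otherwise a CID -> Unicode table for the font's character
                 collection, valid only if the font's CMap is Identity-H,
                 where each character code equals its CID.
    Codes that no table covers become U+FFFD.
    """
    table = to_unicode if to_unicode is not None else cid_to_unicode
    if table is None:
        # Nothing to decode with: emit U+FFFD for every code.
        return "\ufffd" * len(codes)
    return "".join(table.get(code, "\ufffd") for code in codes)


# Hypothetical tables for illustration only.
print(extract_text([86, 71], to_unicode={86: "o", 71: "k"}))     # ok
print(extract_text([86, 71], cid_to_unicode={86: "O", 71: "K"}))  # OK
```

Where the patched ps2ascii falls back to dumping the raw codes, this sketch emits U+FFFD instead, so undecodable text is visibly marked rather than passed off as ASCII.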