688465 – /rangecheck in readstring on PDF

Summary: /rangecheck in readstring on PDF

Status:	NOTIFIED FIXED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Interpreter
Version:	master
Hardware:	All All

Importance:	P2 normal
Assignee:	Alex Cherepanov

URL:
Keywords:

Duplicates (3):	688492 688577 688612
Depends on:
Blocks:

Reported:	2005-12-21 09:01 UTC by Raph Levien
Modified:	2008-12-19 08:31 UTC
CC List:	3 users

See Also:
Customer:	850
Word Size:	---

Attachments
patch (3.04 KB, patch) 2005-12-28 08:56 UTC, Alex Cherepanov
preliminary patch (7.17 KB, patch) 2006-02-20 06:50 UTC, Alex Cherepanov
3nd patch (12.47 KB, patch) 2006-02-25 07:15 UTC, Alex Cherepanov
4th path (12.57 KB, patch) 2006-02-27 04:36 UTC, Alex Cherepanov
5th path (13.80 KB, patch) 2006-03-23 07:51 UTC, Alex Cherepanov

Description Raph Levien 2005-12-21 09:01:06 UTC

On processing the test file, Ghostscript throws an error:

/Length 28464 /Length1 28464 >>
stream
%FilePosition: 1109298
endobj
%Resolving: [22 0]
Error: /rangecheck in --readstring--
Operand stack:
   --dict:9/9(L)--   F0   4.8   --dict:9/9(L)--   --dict:9/9(L)--   1091346   --dict:9/9(L)--   tables   --nostringval--   ()
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1   3   %oparray_pop   1   3   %oparray_pop   1   3   %oparray_pop   --nostringval--   --nostringval--   2   1   1   --nostringval--   %for_pos_int_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1122/1686(ro)(G)--   --dict:2/20(G)--   --dict:75/200(L)--   --dict:75/200(L)--   --dict:105/127(ro)(G)--   --dict:259/347(ro)(G)--   --dict:21/24(L)--   --dict:4/6(L)--   --dict:27/32(L)--   --dict:33/50(ro)(G)--   --dict:6/40(L)--
Current allocation mode is local
AFPL Ghostscript CVS PRE-RELEASE 8.54: Unrecoverable error, exit code 1

Comment 1 Raph Levien 2005-12-21 09:33:31 UTC

Created attachment 1877 [details]
test PDF which throws the rangecheck

Comment 2 Alex Cherepanov 2005-12-21 23:17:50 UTC

The PDF file is invalid. It has a FontFile2 key that refers to a raw PFB stream.
The Length2 and Length3 keys required for Type 1 font data are missing.

It is possible to recover the file, but the PFB-handling logic should be moved
to a higher level to avoid accessing Length2 or Length3.
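
A raw PFB stream describes its own layout, which is why the Length keys are not strictly needed to read it. The following is a minimal Python sketch of that idea, not the PostScript code from the patch; the function name split_pfb is hypothetical. It assumes the commonly documented PFB layout: each segment starts with the byte 0x80, a type byte (1 = ASCII, 2 = binary, 3 = end of file), and, for types 1 and 2, a 4-byte little-endian payload length.

```python
import struct


def split_pfb(data: bytes) -> list[tuple[int, bytes]]:
    """Split raw PFB data into (segment_type, payload) pairs.

    Hypothetical helper for illustration; segment sizes come from the
    PFB headers themselves, never from Length1/Length2/Length3.
    """
    segments = []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data) or data[pos] != 0x80:
            raise ValueError(f"bad PFB segment marker at offset {pos}")
        seg_type = data[pos + 1]
        if seg_type == 3:  # end-of-file segment carries no length field
            break
        if seg_type not in (1, 2):
            raise ValueError(f"unknown PFB segment type {seg_type}")
        if pos + 6 > len(data):
            raise ValueError("truncated PFB segment header")
        (length,) = struct.unpack_from("<I", data, pos + 2)
        payload = data[pos + 6 : pos + 6 + length]
        if len(payload) != length:
            raise ValueError("truncated PFB segment payload")
        segments.append((seg_type, payload))
        pos += 6 + length
    return segments
```

Concatenating the payloads in order yields the font program as a single stream, with the binary segments holding the eexec-encrypted part.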

Comment 3 Alex Cherepanov 2005-12-28 08:56:28 UTC

Created attachment 1897 [details]
patch

This is what it takes to make the sample file work.

Comment 4 leonardo 2006-01-17 04:02:15 UTC

The file's producer isn't specified.
I'll ask the customer whether running it is critical for them, or whether they
agree to consider it invalid.
Assigning it to myself for now.

Comment 5 leonardo 2006-01-17 23:05:12 UTC

Also I want to understand what all the font loading hacks in the PDF interpreter
were done for, and whether we can drop them all and replace them with the regular
PostScript font loader. Probably a better logic is: (1) recognize
TT/PFA/PFB/Type1C data from the beginning of the stream after the stream filters
are applied; (2) if it is a Type 1 font, run the regular PostScript font loader,
which doesn't need Length1, Length2, Length3; (3) check whether the Type 1 zeros
section is absent by putting a special name and another mark on the ostack before
loading the font; (4) Type1C and TT should work as before.

Comment 6 leonardo 2006-01-18 09:52:06 UTC

Returning the bug to Alex to allow him to consider all similar bugs as a single 
project.

Comment 7 Ray Johnston 2006-01-18 09:53:04 UTC

*** Bug 688492 has been marked as a duplicate of this bug. ***

Comment 8 Alex Cherepanov 2006-02-20 06:50:43 UTC

Created attachment 2051 [details]
preliminary patch

This is a preliminary version of the procedure that recognizes font types by
the content and dispatches the control to the appropriate font loader. It's
included here mainly as a progress report.

Comment 9 leonardo 2006-02-21 03:53:35 UTC

Nice to hear that it is moving at last.
The patch is probably on the right track (I didn't spend much time on it). There
are 3 remarks for now:

1. PS-style.htm says to place '{' on the same line as the condition, and likewise
"}{" on one line before ifelse, "} if", "} ifelse". I realize that the old code in
this module doesn't follow this rule, but IMO at least the new code should
follow it. The change would be a little more compact and therefore the
algorithm would be easier to see in a window.

2. "% Keeping for now as a reference" - What does it mean exactly? Please
don't add comments that require more guessing than the code itself.

3. bad_stream: Are you sure you properly restore the operand stack on this
error? It likely needs mark ..... cleartomark, etc.

Comment 10 Alex Cherepanov 2006-02-25 07:15:04 UTC

Created attachment 2057 [details]
3nd patch

Recognize PDF fonts by the first 4 bytes of the font stream. Simplify
Type 1 font reader and PFB font reader.

DETAILS:
Some PDF files mis-identify the font type of the embedded font streams or
include raw PFB font streams. Length1, Length2, Length3 may be wrong or
missing. Adobe Acrobat corrects these types transparently to the user.

All PDF font streams can be easily recognized by the 1st 4 bytes of the
font stream. The PFB stream can be recognized by the 1st 2 bytes.
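
As an illustration of recognition by leading bytes, here is a minimal Python sketch. It is not the table used in the patch (which is PostScript code in pdf_fonts.ps); the function name sniff_font_type and the returned labels are hypothetical, and the signatures are the ones commonly given in the published font format specifications.

```python
def sniff_font_type(head: bytes) -> str:
    """Guess an embedded font format from the start of the decoded stream."""
    if head[:2] == b"\x80\x01":
        return "PFB"  # PFB segment header: marker 0x80, type 1 (ASCII)
    if head[:2] == b"%!":
        return "PFA"  # plain-text Type 1 program starts with a %! comment
    if head[:4] in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType"  # sfnt version 1.0, or the Apple 'true' tag
    if head[:4] == b"OTTO":
        return "OpenType-CFF"  # sfnt wrapper around CFF outlines
    if head[:4] == b"ttcf":
        return "TrueTypeCollection"
    if len(head) >= 4 and head[:3] == b"\x01\x00\x04" and 1 <= head[3] <= 4:
        return "CFF"  # CFF header: major 1, minor 0, hdrSize 4, offSize 1..4
    return "unknown"
```

Under these assumptions the bytes <01000404> mentioned in comment 13 classify as bare CFF data, and a stream starting with 0x80 0x01 classifies as PFB regardless of which FontFile key points at it.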

The Type 1 font reader doesn't need to follow the 3-part structure of a Type
1 font but can interpret it as a single stream. PDF specifies that Type 1
fonts with binary-encoded eexec streams can be embedded. One can
imagine a binary eexec stream that is mis-identified as a hexadecimal
eexec stream by a standard PostScript eexec operator. So, before rev. 6568,
the encoded stream was converted to a hexadecimal stream. In
practice we've never seen a stream that needed a hexadecimal conversion
but have had several real hexadecimal streams. When the hexadecimal
conversion was removed in rev. 6568, handling of the 3-part Type 1 structure
became a rudiment, which is removed now.
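
The mis-identification risk can be illustrated with a small Python sketch. It assumes the detection rule described in the Adobe Type 1 Font Format specification: eexec data is treated as binary only if the first ciphertext byte is not white space and at least one of the first 4 bytes is not an ASCII hexadecimal digit. The function name is hypothetical and this is not Ghostscript's implementation.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"
WHITESPACE = b" \t\r\n"


def eexec_looks_binary(first4: bytes) -> bool:
    """Apply the assumed hex-versus-binary test to the first 4 eexec bytes."""
    if len(first4) < 4:
        raise ValueError("need at least 4 bytes of eexec data")
    if first4[0] in WHITESPACE:
        return False  # leading white space is only valid in the hex form
    # Binary if any of the first 4 bytes is not a hexadecimal digit.
    return any(byte not in HEX_DIGITS for byte in first4[:4])
```

A binary stream whose first 4 bytes all happen to be hexadecimal digits would be read as hexadecimal under this rule, which is the case the old conversion step guarded against.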

Existing code restores the operand and dictionary stacks after the font
interpretation. So no extra code is required to support fonts without
the 3rd part - zeros and cleartomark.

These changes enable Ghostscript to interpret many kinds of malformed
font streams but reduce its utility as a verification tool. Checking
of the declared type, subtype, and length parameters vs. actual values
can be added later if needed.
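
A check of declared versus actual values could look roughly like the Python sketch below. The names font_stream_warnings, EXPECTED_FAMILY and REQUIRED_LENGTHS are hypothetical, the family labels are placeholders, and FontFile3 is simplified to CFF although its Subtype decides the real format. The required Length keys follow the PDF Reference: FontFile needs Length1, Length2 and Length3, and FontFile2 needs Length1.

```python
# Font family each FontDescriptor key is meant to carry (simplified assumption).
EXPECTED_FAMILY = {
    "FontFile": "Type1",
    "FontFile2": "TrueType",
    "FontFile3": "CFF",
}

# Length keys the PDF Reference requires in each kind of font stream.
REQUIRED_LENGTHS = {
    "FontFile": ("Length1", "Length2", "Length3"),
    "FontFile2": ("Length1",),
    "FontFile3": (),
}


def font_stream_warnings(key: str, stream_dict: dict, detected_family: str) -> list[str]:
    """Return warnings about mismatches between declared and actual font data."""
    expected = EXPECTED_FAMILY.get(key)
    if expected is None:
        return [f"unknown font file key {key}"]
    warnings = []
    if detected_family != expected:
        warnings.append(f"{key} stream holds {detected_family} data, expected {expected}")
    for name in REQUIRED_LENGTHS[key]:
        if name not in stream_dict:
            warnings.append(f"{key} stream is missing {name}")
    return warnings
```

For the file in this bug, a FontFile2 stream with only Length and Length1 whose content is Type 1 data would produce a single type-mismatch warning, since Length1 is the only key FontFile2 requires.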

DIFFERENCES:
None

Regarding Leo's comments:

1. The patch was posted as it was written. It is reformatted according
   to the coding standards now. 

2. The following fragment is removed.
   % Keeping for now as a reference
   { ... } pop

3. bad_stream procedure is called only when readstring fails. It doesn't
   need any stack protection measures.

Comment 11 leonardo 2006-02-25 07:41:06 UTC

The patch looks almost good, but it's unclear to me what happens with 'cleartomark'
when the zeros section of a Type 1 font is not embedded. Please clarify.

Comment 12 Alex Cherepanov 2006-02-25 08:13:14 UTC

Existing code in the readtype1 procedure creates a closure that stores
the current length of the operand stack. No matter what junk the font
leaves on the stack, it is removed by the following code:
"count exch sub { pop } repeat". Although this hack was designed to
work around bugs in the fonts, it also removes the mark when the implied
cleartomark is not executed.
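
The effect of "count exch sub { pop } repeat" can be modelled in Python as follows. This is only a sketch of the idea with a list standing in for the operand stack; the names are hypothetical, and unlike PostScript's repeat, which raises an error on a negative count, the sketch simply pops nothing in that case.

```python
MARK = object()  # stands in for a PostScript mark object


def run_with_stack_cleanup(stack: list, font_program) -> None:
    """Run font_program, then drop whatever it left on the stack."""
    depth_before = len(stack)  # depth remembered before the font runs
    font_program(stack)
    excess = len(stack) - depth_before  # count exch sub
    for _ in range(max(excess, 0)):  # { pop } repeat
        stack.pop()


def font_without_cleartomark(stack: list) -> None:
    """Model a font that pushes a mark but has no trailing cleartomark."""
    stack.append(MARK)
    stack.append("junk")


stack = ["a", "b"]
run_with_stack_cleanup(stack, font_without_cleartomark)
```

After the call the stack holds only its original two entries, so the stray mark is gone even though cleartomark never ran.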

Comment 13 Alex Cherepanov 2006-02-27 04:36:07 UTC

Created attachment 2060 [details]
4th path

This patch adds one more OTC type <01000404> and improves the error message
about unidentified fonts. There are no changes in the log message.

Comment 14 leonardo 2006-02-28 08:49:04 UTC

Alex,

The algorithm looks good, but I'd like to see more comments in the code. Generally,
part of the log message should be moved into the code. In particular, please add
explanations of:

- the general assumption that it recognizes the font type from the data rather
than from Subtype, and why.
- what it does with Length1,2,3 when they are present and when they are not.
- the cleartomark trick from comment #12.
- in which cases it ignores the embedded font and finds an installed font by
name.
- anything else that you find useful.

Besides that, you probably should review the documentation about "how
Postscript finds fonts". Maybe it needs a change.

Feel free to request more hours for this job.

Thank you.

Comment 15 Alex Cherepanov 2006-03-01 15:38:19 UTC

*** Bug 688577 has been marked as a duplicate of this bug. ***

Comment 16 leonardo 2006-03-08 10:14:31 UTC

We also need to validate the PDF conformance. Please add checks and warnings to
your patch.

Comment 17 Alex Cherepanov 2006-03-23 03:41:32 UTC

*** Bug 688612 has been marked as a duplicate of this bug. ***

Comment 18 Alex Cherepanov 2006-03-23 07:51:17 UTC

Created attachment 2123 [details]
5th path

This is a patch for the current version of pdf_fonts.ps with improved comments
but no changes in the code. Perhaps it can be committed now and the PDF
verification may be added later.

Comment 19 leonardo 2006-04-01 00:52:52 UTC

Alex,

Please commit the patch, close this bug, and open a new bug about the PDF 
validation with P3 assigned to yourself.

Comment 20 Alex Cherepanov 2006-04-01 07:15:26 UTC

Patch #5 is committed as revision 6695.
Validation of the declared type, subtype, and length parameters vs. actual values
is moved to bug 688627.