On processing the test file, Ghostscript throws an error:

/Length 28464 /Length1 28464 >> stream
%FilePosition: 1109298
endobj
%Resolving: [22 0]
Error: /rangecheck in --readstring--
Operand stack:
   --dict:9/9(L)--   F0   4.8   --dict:9/9(L)--   --dict:9/9(L)--   1091346   --dict:9/9(L)--   tables   --nostringval--   ()
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1   3   %oparray_pop   1   3   %oparray_pop   1   3   %oparray_pop   --nostringval--   --nostringval--   2   1   1   --nostringval--   %for_pos_int_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1122/1686(ro)(G)--   --dict:2/20(G)--   --dict:75/200(L)--   --dict:75/200(L)--   --dict:105/127(ro)(G)--   --dict:259/347(ro)(G)--   --dict:21/24(L)--   --dict:4/6(L)--   --dict:27/32(L)--   --dict:33/50(ro)(G)--   --dict:6/40(L)--
Current allocation mode is local
AFPL Ghostscript CVS PRE-RELEASE 8.54: Unrecoverable error, exit code 1
Created attachment 1877 [details] test PDF which throws the rangecheck
The PDF file is invalid. It has a FontFile2 key that refers to a raw PFB stream, and the required Length2 and Length3 keys are missing. It is possible to recover the file, but the PFB-handling logic should be moved to a higher level to avoid accessing Length2 or Length3.
Created attachment 1897 [details] patch This is what it takes to make the sample file work.
The file's producer isn't specified. I'll ask the customer whether it is critical for them to run this file, or whether they agree to consider it invalid. Assigning it to myself for the time being.
I also want to understand what all the font-loading hacks in the PDF interpreter were done for, and whether we can drop them all and replace them with a regular PostScript font loader. A better logic is probably:
(1) Recognize TT/PFA/PFB/Type1C data from the beginning of the stream, after the stream filters are applied.
(2) If it is a Type 1 font, run a regular PostScript font loader, which doesn't need Length1, Length2, Length3.
(3) Check whether the Type 1 zeros section is absent by putting a special name and another mark on the operand stack before loading the font.
(4) Type1C and TT should work as before.
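Step (1) above can be sketched as a small content sniffer. This is only an illustration in Python, not the PostScript code of the PDF interpreter: the function name sniff_font_type and the signature table are assumptions based on the commonly documented leading bytes of each format (PFB segment marker, "%!" for PFA, the TrueType version tags, and a CFF header for Type1C), not taken from any patch in this bug.

```python
def sniff_font_type(data: bytes) -> str:
    """Guess the font format from the first bytes of a decoded font stream.

    Hypothetical helper: the signatures below are general knowledge about
    the formats, not the table used by Ghostscript.
    """
    head = data[:4]
    if head[:2] == b"\x80\x01":
        return "pfb"        # PFB: segment marker 0x80, segment type 1 (ASCII)
    if head[:2] == b"%!":
        return "pfa"        # plain PostScript Type 1 text
    if head in (b"\x00\x01\x00\x00", b"true"):
        return "truetype"   # TrueType sfnt version tags
    if len(head) == 4 and head[:3] == b"\x01\x00\x04" and 1 <= head[3] <= 4:
        return "type1c"     # CFF header: major 1, minor 0, hdrSize 4, offSize 1..4
    return "unknown"
```

With such a function, the declared Subtype and the FontFile/FontFile2/FontFile3 key only serve as a hint; the loader would be chosen from the returned value, and "unknown" would be reported as an error.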
Returning the bug to Alex to allow him to consider all similar bugs as a single project.
*** Bug 688492 has been marked as a duplicate of this bug. ***
Created attachment 2051 [details] preliminary patch This is a preliminary version of the procedure that recognizes font types by the content and dispatches the control to the appropriate font loader. It's included here mainly as a progress report.
Nice to hear that this is moving at last. The patch is probably on the right track (I didn't spend much time on it). There are 3 remarks for now:
1. PS-style.htm says to place '{' on the same line as the condition, and likewise "}{" on one line before ifelse, "} if", and "} ifelse". I realize that the old code in this module doesn't follow this rule, but IMO at least the new code should follow it. The change would be a little more compact, so the algorithm would be easier to follow in one window.
2. "% Keeping for now as a reference": what does it mean exactly? Please don't add comments that require more guessing than the code itself.
3. bad_stream: are you sure the operand stack is properly recovered on this error? It likely needs mark ... cleartomark, etc.
Created attachment 2057 [details] 3rd patch
Recognize PDF fonts by the first 4 bytes of the font stream. Simplify the Type 1 font reader and the PFB font reader.

DETAILS:
Some PDF files mis-identify the font type of the embedded font streams or include raw PFB font streams. Length1, Length2, Length3 may be wrong or missing. Adobe Acrobat corrects these transparently to the user.

All PDF font streams can be easily recognized by the first 4 bytes of the font stream. The PFB stream can be recognized by the first 2 bytes. The Type 1 font reader doesn't need to follow the 3-part structure of a Type 1 font and can interpret it as a single stream.

PDF specifies that Type 1 fonts with binary-encoded eexec streams can be embedded. One can imagine a binary eexec stream that is mis-identified as a hexadecimal eexec stream by the standard PostScript eexec operator, so before rev. 6568 the encoded stream was converted to a hexadecimal stream. In practice we've never seen a stream that needed the hexadecimal conversion, but we have seen several real hexadecimal streams. When the hexadecimal conversion was removed in rev. 6568, the handling of the 3-part Type 1 structure became a vestige, which is removed now.

Existing code restores the operand and dictionary stacks after the font interpretation, so no extra code is required to support fonts without the 3rd part (zeros and cleartomark).

These changes enable Ghostscript to interpret many kinds of malformed font streams but reduce its utility as a verification tool. Checking of the declared type, subtype, and length parameters against the actual values can be added later if needed.

DIFFERENCES:
None

Regarding Leo's comments:
1. The patch was posted as it was written. It is now reformatted according to the coding standards.
2. The following fragment is removed: % Keeping for now as a reference { ... } pop
3. The bad_stream procedure is called only when readstring fails. It doesn't need any stack protection measures.
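To illustrate why a PFB stream can be read without Length1, Length2, Length3, here is a minimal Python sketch that flattens PFB segments into one continuous Type 1 stream. It relies on the usual PFB layout (a 0x80 marker byte, a segment type of 1 for ASCII, 2 for binary, or 3 for end of file, then a 4-byte little-endian length). The name unwrap_pfb is hypothetical, and this is not the PostScript reader from the patch.

```python
import struct


def unwrap_pfb(data: bytes) -> bytes:
    """Concatenate the payloads of all PFB segments into a single stream.

    Hypothetical helper: binary (type 2) segments are copied as-is,
    without any conversion to hexadecimal.
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data) or data[pos] != 0x80:
            raise ValueError(f"bad PFB segment marker at offset {pos}")
        seg_type = data[pos + 1]
        if seg_type == 3:          # end-of-file segment carries no length
            break
        if seg_type not in (1, 2):
            raise ValueError(f"unknown PFB segment type {seg_type}")
        if pos + 6 > len(data):
            raise ValueError("truncated PFB segment header")
        (length,) = struct.unpack_from("<I", data, pos + 2)
        start, end = pos + 6, pos + 6 + length
        if end > len(data):
            raise ValueError("truncated PFB segment")
        out += data[start:end]
        pos = end
    return bytes(out)
```

Each segment header carries its own length, so the boundaries between the clear-text part, the eexec part, and the trailing zeros are found in the data itself rather than in the font descriptor.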
The patch looks almost good, but I'm unclear what happens with 'cleartomark' when the zeros section of a Type 1 font is not embedded. Please clarify.
Existing code in the readtype1 procedure creates a closure that stores the current depth of the operand stack. No matter what junk the font leaves on the stack, it is removed by the following code: "count exch sub { pop } repeat". Although this hack was designed to work around bugs in fonts, it also removes the mark when the implied cleartomark is not executed.
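The effect of "count exch sub { pop } repeat" can be modelled with a plain Python list standing in for the operand stack. This is a simplified sketch; MARK and run_with_stack_restore are made-up names, and the real code is PostScript inside readtype1.

```python
MARK = object()  # stands in for a PostScript mark object


def run_with_stack_restore(stack: list, font_program) -> None:
    """Run font_program, then pop whatever it left on the stack."""
    depth_before = len(stack)             # depth saved before the font runs
    font_program(stack)
    leftover = len(stack) - depth_before  # "count exch sub"
    # "{ pop } repeat"; clamped here, since a negative count would be an
    # error in PostScript rather than a no-op.
    for _ in range(max(leftover, 0)):
        stack.pop()


def font_without_cleartomark(stack: list) -> None:
    """Simulates a font whose trailing zeros and cleartomark are missing."""
    stack.append(MARK)
    stack.append("junk")


operand_stack = ["a", "b"]
run_with_stack_restore(operand_stack, font_without_cleartomark)
```

After the call, operand_stack again holds only the two original entries: the orphaned mark is discarded together with the other leftovers, which is why no special handling of the missing cleartomark is needed.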
Created attachment 2060 [details] 4th patch This patch adds one more OTC type <01000404> and improves the error message about unidentified fonts. There are no changes in the log message.
Alex, the algorithm looks good, but I'd like to see more comments in the code. Generally, part of the log message should be moved into the code. In particular, please add explanations of:
- the general assumption that the font type is recognized from the data rather than from Subtype, and why;
- what it does with Length1,2,3 when they are present and when they are not;
- the cleartomark trick from comment #12;
- in which cases it ignores the embedded font and finds an installed font by name;
- anything else you find useful.
Besides that, you should probably review the documentation about "how Postscript finds fonts". Maybe it needs a change. Feel free to request more hours for this job. Thank you.
*** Bug 688577 has been marked as a duplicate of this bug. ***
We also need to validate the PDF conformance. Please add checks and warnings to your patch.
*** Bug 688612 has been marked as a duplicate of this bug. ***
Created attachment 2123 [details] 5th patch This is a patch for the current version of pdf_fonts.ps with improved comments but no changes in the code. Perhaps it can be committed now, and the PDF verification can be added later.
Alex, Please commit the patch, close this bug, and open a new bug about the PDF validation with P3 assigned to yourself.
Patch #5 is committed as revision 6695. Validation of the declared type, subtype, and length parameters against the actual values is moved to bug 688627.