Bug 688124

Summary: /UserUnit (PDF 1.6) is not supported yet.
Product: Ghostscript Reporter: Toru Ukita <ukita>
Component: PDF Interpreter Assignee: Alex Cherepanov <alex>
Status: NOTIFIED FIXED    
Severity: normal CC: artifex, sags5495
Priority: P2 Keywords: bountiable
Version: 8.51   
Hardware: All   
OS: All   
URL: n/a
Customer: 870 Word Size: ---
Attachments: Open the PDF file by Acrobat7. Huge Page size
I captured this from my uploaded PDF
Suggested patch.
Test file: UserUnit-tests.pdf
Patch relative to today's TRUNK (svn rev 7001).

Description Toru Ukita 2005-06-03 12:47:31 UTC
The new /UserUnit feature from PDF spec 1.6 is not supported yet. I enlarged a
PDF page's user space with /UserUnit 100, but Ghostscript generates a PS file
at the original size, ignoring the /UserUnit value.
Comment 1 Dan Coby 2005-06-05 10:05:44 UTC
Please provide an example file.
Comment 2 Toru Ukita 2005-06-06 09:06:38 UTC
Created attachment 1424 [details]
Open the PDF file by Acrobat7. Huge Page size
Comment 3 Stefan Kemper 2005-06-08 06:42:56 UTC
Acrobat sees a 20x20 inch blank page.
   
Comment 4 Toru Ukita 2005-06-08 06:53:19 UTC
Created attachment 1432 [details]
I captured this from my uploaded PDF

I have Acrobat 7 and opened my uploaded PDF file in Firefox.
See the page size at the bottom left or the scale value at the top center.

I believe you did not open it with Acrobat 7. Acrobat 6 or earlier shows it
as 2 x 2 in.
Comment 5 artifex 2006-02-24 03:09:12 UTC
Created attachment 2054 [details]
Sample file that uses UserUnit 2

The attached file test_userunit2.pdf uses a UserUnit value of 2. Adobe Reader 7
shows the correct size of 170x140 mm. Acrobat Reader 5 shows the wrong size
85x70 mm. Ghostscript also creates output files of size 85x70 mm.
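
As a rough check of the sizes quoted above: one default user space unit is
UserUnit/72 inch, so the physical page size follows from the page box and
/UserUnit. The sketch below is Python, the function name is made up for
illustration, and the MediaBox values are hypothetical (picked to give about
85x70 mm); they are not read from the attachment.

```python
def page_size_mm(media_box, user_unit=1.0):
    """Return (width, height) in mm for a PDF page box given in user units."""
    llx, lly, urx, ury = media_box
    pt_to_mm = 25.4 / 72.0  # one default unit (1/72 inch) in millimetres
    width = (urx - llx) * user_unit * pt_to_mm
    height = (ury - lly) * user_unit * pt_to_mm
    return width, height

# Hypothetical box of about 85x70 mm (NOT taken from test_userunit2.pdf).
box = [0, 0, 240.945, 198.425]
ignored = page_size_mm(box)      # size reported when /UserUnit is ignored
honoured = page_size_mm(box, 2)  # size reported when /UserUnit 2 is applied
print(round(ignored[0]), round(ignored[1]))
print(round(honoured[0]), round(honoured[1]))
```

With these assumed numbers, ignoring /UserUnit gives the small size and
applying it doubles both dimensions.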
Comment 6 Dan Coby 2006-03-28 10:20:41 UTC
We received a question from a customer about this issue.  I am bumping the 
priority.
Comment 7 leonardo 2006-04-01 00:40:50 UTC
It is a PDF interpreter problem. Passing the bug to the PDF interpreter expert.
Comment 8 SaGS 2006-04-11 16:29:49 UTC
Created attachment 2148 [details]
Suggested patch.

The attached patch adds support for /UserUnit. It also fixes 
some related bugs that got in the way while testing it. In 
particular it fixes bug #688359.

Details about the changes follow.

---
(A) The basic implementation: scaling the PS user space

The /UserUnit implementation is very similar to -dPDFFitPage.

- If -dPDFFitPage=true, then /UserUnit is ignored; no matter 
  how small or how large 1 PDF user space unit is, it has to 
  be scaled so that the PDF page fits on the paper.
- Else, /UserUnit scales the PS user space, exactly like 
  -dPDFFitPage does. This solves [almost] everything related 
  to drawing marks on the page.

This also means that all problems with -dPDFFitPage affect 
the implementation of /UserUnit. The biggest problem was to 
get rid of those, especially for PDF->PDF "conversion" where 
some elements of the source PDF (outlines, links...) needed 
to be preserved.

---
(B) Don't scale the border width in a border style dict
    (the change to pdf_draw.ps)

See comment in code. Note that a border width in a border style 
ARRAY is specified in user space units, so it grows with 
/UserUnit and the scaling of the PS user space suffices.

---
(C) No more /PAGES for /CropBox
    (pdf_main.ps hunk @@ -129,10 +129,6 @@ 
    and the "pget" instead of "knownoget" for /PAGE pdfmark)

/UserUnit, /Rotate and the translation due to non-(0,0) PDF page 
origin are "flattened" into the page; also -dPDFFitPage scales 
the page. This means that even if 2 source PDF pages had the same 
/CropBox, in the destination PDF these may need different 
/CropBox-es.

Example:

source:  page #1 /UserUnit 1 and /CropBox [0 0 100 100]
	 page #2 /UserUnit 2 and /CropBox [0 0 100 100] (the same)
becomes: page #1 no /UserUnit and /CropBox [0 0 100 100]
	 page #2 no /UserUnit and /CropBox [0 0 200 200] (differs)

so the /CropBox cannot be inherited anymore. The new code puts a 
/CropBox into each page that has or inherits one.

NOTE:
  There are 2 more places in pdf_main.ps that do a "knownoget" 
  for /CropBox, one a few lines after "%****** DOESN'T HANDLE 
  COLOR TRANSFER YET ******" and one after "(Adobe Tech Note 
  5407, sec 9.2)". I think both should be using "pget".
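
The /CropBox example in (C) can be reproduced with a few lines of Python. 
This is only a sketch of the arithmetic, not the patch's PostScript code: 
the function name is hypothetical, and it covers only the /UserUnit scaling, 
leaving out /Rotate and the origin translation.

```python
def flatten_box(box, user_unit=1):
    """Scale a page box by /UserUnit, as when UserUnit is flattened into the page."""
    return [coord * user_unit for coord in box]

# Both source pages share the same /CropBox (inherited or repeated).
shared_crop_box = [0, 0, 100, 100]
page1 = flatten_box(shared_crop_box, 1)  # page #1: /UserUnit 1
page2 = flatten_box(shared_crop_box, 2)  # page #2: /UserUnit 2
print(page1)
print(page2)
```

The two results differ, which is why each destination page needs its own 
/CropBox.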

---
(D) EXTRA: /CropBox-es in intermediate /Pages were ignored

Old code preserved only /CropBox-es that appeared in the root 
PDF /Pages object and in /Page objects, the ones in intermediate 
nodes of the /Pages tree being lost. In the new code, this 
gets fixed for PDF->PDF as a side effect of the implementation 
for (C). See also the note above for clipping of the marks 
drawn on the page.

---
(E) PDF->PS "default" user space transform

The new code computes a matrix that transforms the PDF default 
user space of a page to the PS default user space. This matrix 
accounts for the rotation (/Rotate), scaling (-dPDFFitPage or 
/UserUnit) and any translation needed to move the PDF 
lower-left corner to the lower-left corner of the paper.

- This is done in the new pdf_main.ps::pdf_PDF2PS_matrix, 
  which inherits, with the needed changes, almost all of the 
  code in the old .pdfshowpage_Install.

- The matrix is page-specific because different pages may have 
  different dimensions (so -dPDFFitPage scales them 
  differently), different /Rotate or /UserUnit.

- pdf_main.ps::pdf_cached_PDF2PS_matrix is a utility proc that 
  ensures the matrix for a given page is computed, caches it 
  in the PDF page dictionary under the key given by 
  pdf_main.ps::PDF2PS_matrix_key, then returns the matrix.

- (The definition for PDF2PS_matrix_key exists only to allow 
  binding of a complicated name into pdf_cached_PDF2PS_matrix; 
  avoids doing "(complicated.name) cvn" at run-time.)

- This matrix is currently used:
  (E.1) by .pdfshowpage_Install (which is now reduced to 2 
	lines) for setting up the PS user space;
  (E.2) for transforming the /Crop- or /MediaBox in 
	pdfshowpage_setpage
  (E.3) to transform coordinates in view destinations (used 
	by outline entries, PDF links ...).
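
To make the idea in (E) concrete, here is a minimal Python sketch of such a 
matrix. It is NOT a port of pdf_PDF2PS_matrix: the function names are 
hypothetical, /Rotate is assumed to be a multiple of 90, and "scale" stands 
for either the /UserUnit value or the fit-to-page factor. Matrices use the 
PostScript [a b c d tx ty] layout.

```python
def concat(m, n):
    """Return the matrix that applies m first, then n."""
    a1, b1, c1, d1, tx1, ty1 = m
    a2, b2, c2, d2, tx2, ty2 = n
    return [a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            tx1 * a2 + ty1 * c2 + tx2, tx1 * b2 + ty1 * d2 + ty2]

def transform(m, x, y):
    """Apply matrix m to the point (x, y)."""
    a, b, c, d, tx, ty = m
    return (a * x + c * y + tx, b * x + d * y + ty)

# (cos, sin) of -Rotate, because /Rotate is measured clockwise.
_ROTATIONS = {0: (1, 0), 90: (0, -1), 180: (-1, 0), 270: (0, 1)}

def pdf_to_ps_matrix(box, rotate=0, scale=1):
    """Map PDF default user space to a space whose visible lower-left is (0, 0)."""
    llx, lly, urx, ury = box
    cos, sin = _ROTATIONS[rotate % 360]
    m = [1, 0, 0, 1, -llx, -lly]                 # move box origin to (0, 0)
    m = concat(m, [scale, 0, 0, scale, 0, 0])    # /UserUnit or fit factor
    m = concat(m, [cos, sin, -sin, cos, 0, 0])   # /Rotate
    corners = [transform(m, x, y) for x in (llx, urx) for y in (lly, ury)]
    min_x = min(p[0] for p in corners)
    min_y = min(p[1] for p in corners)
    return concat(m, [1, 0, 0, 1, -min_x, -min_y])  # back into the first quadrant

m = pdf_to_ps_matrix([0, 0, 100, 200], rotate=90, scale=2)
print(m)
print(transform(m, 0, 0), transform(m, 100, 200))
```

The same "transform" call is what uses (E.2) and (E.3) amount to: box 
corners and view-destination coordinates are pushed through the page's 
matrix.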

---
(F) /Orientation now always 0

pdfmark does not work correctly with /Orientation != 0 (long 
story). The old code used the /Orientation page device 
parameter to handle the /Rotate from PDF pages. The new 
code always sets /Orientation to 0 and handles /Rotate by 
explicitly doing a "rotate" (in pdf_PDF2PS_matrix).

- Avoids GS-specific hackery otherwise needed to work around 
  pdfmark problems when /Orientation != 0.

- Simplifies the code, because a single transformation matrix 
  needs to be computed both for setting up the PS user space 
  and for transforming various coordinates used in pdfmarks
  (/CropBox, view destinations).

---
(G) PDF->PDF-migrated view destinations were wrong
    (pdf_main.ps hunks @@ -939,6 +935,45 @@ 
    and @@ -947,18 +982,30 @@)

Coordinates appearing in view destinations need to be recomputed, 
because the PS default user space, AS USED BY the pdfwrite driver, 
is not identical to the original PDF default user space. The 
list of causes included rotation due to /Rotate (implemented either 
with /Orientation or a simple "rotate"), translation due to 
non-(0,0) PDF page origin, and scaling due to -dPDFFitPage; now 
we add /UserUnit to this list.

---
(H) EXTRA: -dPDFFitPage now chooses portrait or landscape
    (pdf_main.ps, near the end of hunk @@ -1031,62 +1078,133 @@)

If -dPDFFitPage is set, the code after "% Preserve page size," chooses 
portrait or landscape orientation depending on the PDF page's 
width:height ratio. I think this results in a better "fit to 
page". Example: PDF with mixed portrait + landscape letter 
pages, to be printed on A4 paper. Old code sometimes fitted 
landscape pages on portrait paper.
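
The effect described in (H) can be illustrated with the Python sketch below. 
The exact criterion used by the patch is not quoted in this report, so the 
comparison here (turn the paper when page and paper orientations differ) is 
an assumption, and the function name is hypothetical. Sizes are in points.

```python
def fit_page(page_w, page_h, paper_w, paper_h):
    """Pick the paper orientation matching the page; return (paper_w, paper_h, scale)."""
    page_is_landscape = page_w > page_h
    paper_is_landscape = paper_w > paper_h
    if page_is_landscape != paper_is_landscape:
        paper_w, paper_h = paper_h, paper_w  # turn the paper to match the page
    # Uniform scale that makes the whole page fit on the paper.
    scale = min(paper_w / page_w, paper_h / page_h)
    return paper_w, paper_h, scale

# Landscape letter page (792x612) on A4 paper (595x842).
paper_w, paper_h, scale = fit_page(792, 612, 595, 842)
unturned_scale = min(595 / 792, 842 / 612)  # fit without turning the paper
print(paper_w, paper_h, round(scale, 3), round(unturned_scale, 3))
```

Turning the paper keeps the page at nearly full size, while fitting a 
landscape page on portrait paper shrinks it to about three quarters.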

---
(I) EXTRA: better placement of imaged area

If -dPDFFitPage and the PDF page's width:height ratio differs 
from the paper's width:height, some unused space remains. With 
the old code, this extra space was placed at left/right/top/bottom 
depending on the page's /Rotate. New code always puts the extra 
space either at right or top (depending only on the PDF page 
being relatively "taller" or "wider" than the paper). This 
is mainly a side effect of not using /Orientation anymore.

---
(J) -dNOUSERUNIT

I added a new option that can be used to disable processing of 
/UserUnit. It is named NOUSERUNIT and defaults to "false", meaning 
that /UserUnit is taken into account. I implemented this default 
following Adobe Reader's 7.0.7/Windows default, but see note.

Note:
  I suggest setting the default to ignore UserUnit, and having a 
  -dDOUSERUNIT option to activate it. I can make this change 
  if desired.
  I'll explain the reason for such a choice through an example:
- I THINK that UserUnit was introduced by Adobe as part of 
  its [Adobe's] entry into the CAD world.
- Consider a floorplan plotted on a sheet of paper at a certain 
  scale, let's say 1:50.
- In this scenario, the PDF page corresponds to the plotted 
  paper, so it has a MediaBox of that size.
- If the scale is 1:50, set UserUnit = 50. This allows someone, 
  given a suitable UI, to easily and accurately MEASURE 
  various elements of the floorplan.
- Ghostscript does not have such a UI, and I think such a UI 
  is beyond GS's purpose.
- GS's role, however, is to PRINT that PDF in order to obtain 
  the "plotted" floorplan.
- To obtain the equivalent of the plotted paper, printing must 
  ignore UserUnit. Observing UserUnit for printing would 
  require a building-sized sheet of paper!

---
(K) TESTING DETAIL: "transform" returns reals

"<x> <y> <matrix> transform" returns 2 realtype objects, even 
when <x> <y> are integertype and the matrix is [1 0 0 1 0 0] 
(identity, containing only integers). This causes, for example, 
a Media- or CropBox of [0 0 612 792] in a source PDF to be 
written as [0 0 612.0 792] in the destination, which is 
annoying when comparing the output of unpatched and patched 
Ghostscript.
Comment 9 SaGS 2006-04-11 16:32:02 UTC
Created attachment 2149 [details]
Test file: UserUnit-tests.pdf

First 16 pages include all combinations of Rotate 0/90/180/270, 
with/without UserUnit, with/without a CropBox. There are 
bookmarks for each page, and links between each other. All the 
view destinations involved are of type /FitR. The CropBox-es, if 
present, are placed where the large dotted rectangles are.

Last 8 pages contain links with all types of view destinations 
(/Fit, /XYZ, etc) to the first 16 pages, to verify the 
coordinates involved are transformed correctly.

Links are related to the dark-green dotted rectangles 
around the destination page number. For example /FitR magnifies 
the page to show exactly that rectangle, /XYZ points to its 
upper-left corner, etc.

Notes:

- Attachment #1424 is not suitable for testing because it 
  requires a huge page size (200 x 200 inches), so taking 
  /UserUnit into account ends with a configurationerror in 
  setpagedevice.

- Adobe Reader for Windows sometimes has trouble computing the
  zoom after clicking a link to a page with UserUnit > 1. It 
  seems to forget to take UserUnit into account, and the resulting 
  zoom factor is exactly UserUnit times too large. After PDF->PDF 
  conversion with patched Ghostscript, these links work OK 
  because the UserUnits are "flattened into the pages".

- It [Reader] also has problems with /FitBV. I stopped trying 
  to understand why; those links are identical to the /FitV 
  ones, and /FitV works OK.
Comment 10 artifex 2006-04-12 01:08:52 UTC
We would prefer that UserUnit be interpreted by default. There are no
recommendations on how to choose the UserUnit, so there is no fixed relation
between the extent of a PDF file that uses a UserUnit and the extent it
would have if the UserUnit were ignored. UserUnit is a valid PDF element and
it should be handled.
The only reason for not interpreting it could be compatibility with previous
versions of Ghostscript.
I believe the reason for introducing the UserUnit is more trivial. The
implementation limit for Acrobat page size is 14400x14400 units. This is the
well-known 200x200 inches at 1/72 inch per unit. Adobe could not remove the
limit of 14400x14400 for two reasons:
1. The value is represented as a fixed-point value (16-bit integer part,
16-bit fractional part) whose range cannot be increased significantly
(+/-32000).
2. PDF files using a page size larger than 14400x14400 units cannot be
displayed with Acrobat.

To be able to handle files larger than 200x200 inches, an additional "scale
factor" element, the UserUnit, was added. It is simply ignored by old Adobe
Readers.
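
A small worked example of this reading of UserUnit, as a Python sketch. It
takes the 14400-unit limit quoted above as given; the function name and the
500x300 inch page are made up for illustration.

```python
MAX_UNITS = 14400  # page-size limit in user units, as quoted in this comment

def min_user_unit(width_in, height_in):
    """Smallest UserUnit (at least 1) that keeps a page of this size within the limit."""
    largest_pt = max(width_in, height_in) * 72  # longer side in 1/72 inch
    return max(1.0, largest_pt / MAX_UNITS)

print(MAX_UNITS / 72)           # the limit expressed in inches
print(min_user_unit(200, 200))  # a page exactly at the limit
print(min_user_unit(500, 300))  # hypothetical 500x300 inch page
```

Under this view, a page larger than 200 inches on a side keeps its box
within 14400 units and carries the remaining factor in UserUnit.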
Comment 11 SaGS 2006-08-13 09:28:00 UTC
Any feedback on the patch?

Note that the patch, as posted, does take UserUnit into account by default.

Also, fixing bug 688829 "Merging PDF files using gs: outlines and links not 
updated" seems to require changing "linkdest". If that patch will be relative 
to current TRUNK, applying the 2 patches will produce conflicts (not 
incompatibilities, just conflicts).
Comment 12 SaGS 2006-08-20 09:57:04 UTC
Created attachment 2419 [details]
Patch relative to today's TRUNK (svn rev 7001).

I attach an updated patch, because commits made during the last 
4+ months created conflicts that are not so easy to resolve.

The functionality is the same as before; for details please see 
comment #8.

Differences between the 2 patches:

(1) Account for changes in rev 6850->6893 "Use /PageSize from the 
    currrent page device dictionary when the /MediaBox pget fails. 
    Bug 688771, customer 581."

(2) Account for changes in rev 6893->6897 "Replace empty MediaBox or 
    CropBox box with a box that is equal to the current page size. 
    Bug 688744, customer 384."

(3) Includes the change "knownoget->pget" mentioned in the note at 
    the end of point (C) from comment #8 (pdf_main.ps, the 2 new 
    hunks @@ -1213,7 +1313,7 @@ and @@ -1230,7 +1330,7 @@).
Comment 13 Alex Cherepanov 2007-05-23 13:32:48 UTC
The patch is committed as rev. 7999.

I've tested the patch against our PDF file collection,
changed -dNoUserUnit to the mixed case for better readability
and documented the new option in Use.htm.

Thank you, SaGS.