Bug 692968 - Wrong characters displayed for dollar sign
Summary: Wrong characters displayed for dollar sign
Status: NOTIFIED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Text
Version: master
Hardware: PC All
Importance: P1 normal
Assignee: Alex Cherepanov
Reported: 2012-04-05 21:05 UTC by Marcos H. Woehrmann
Modified: 2014-02-17 04:43 UTC

See Also:
Customer: 780
Word Size: ---


Description Marcos H. Woehrmann 2012-04-05 21:05:28 UTC
The dollar sign ($) in the attached PDF is replaced by a Thorn <http://en.wikipedia.org/wiki/%C3%9E> followed by a lowercase Y with diaeresis when converted by Ghostscript.  The customer is using gs9.01 but other versions I've tried all behave the same way.

Apple Preview and Adobe Acrobat display the expected dollar sign, but MuPDF output matches Ghostscript.

The command line I'm using:

  bin/gs -sDEVICE=ppmraw -o test.ppm ./RF-6561.pdf

Note that my attempts to simplify the attached file using Acrobat have not been successful; the newly created files are all processed correctly by Ghostscript.
Comment 2 Alex Cherepanov 2012-04-05 21:34:21 UTC
The sample file has an incorrect appearance stream, which Ghostscript
displays faithfully.

The appearance stream selects a single-byte Helvetica font but uses
Unicode in the string data. Viewers that show the file correctly
ignore the appearance stream and re-create the annotations from scratch.

/Helv 10 Tf 0 g 2 3.3355 Td <FEFF010600320030002C003300320038002E00300030> Tj
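
To illustrate the mismatch, here is a minimal Python sketch that reads the hex string above in two ways: once byte by byte, as a single-byte font would, and once as UTF-16BE text. Latin-1 is used as a stand-in for the font's actual encoding, which is an assumption; the variable names are illustrative only.

```python
# Bytes of the hex string passed to Tj in the appearance stream.
hex_string = "FEFF010600320030002C003300320038002E00300030"
raw = bytes.fromhex(hex_string)

# Reading 1: one byte per character code, as a single-byte font does.
# Latin-1 is an assumption standing in for the font's real encoding.
single_byte = raw.decode("latin-1")
# Drop control codes (0x00, 0x01, 0x06), which normally have no glyph.
visible = "".join(ch for ch in single_byte if ch.isprintable())

# Reading 2: UTF-16BE text introduced by the FE FF byte order mark.
has_bom = raw.startswith(b"\xfe\xff")
as_utf16 = raw[2:].decode("utf-16-be") if has_bom else None

print(repr(visible))
print(repr(as_utf16))
```

In the single-byte reading, the byte order mark bytes FE and FF come out as a thorn and a 'y' with diaeresis in front of the digits, which matches the rendering described in this report.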
Comment 3 Chris Liddell (chrisl) 2012-04-13 17:25:12 UTC
Reassigning to Alex - Henry would like a fuller explanation of how/why we differ from Acrobat's output.
Comment 4 Alex Cherepanov 2012-04-14 02:05:15 UTC
We render the appearance stream. Adobe re-creates the appearance of the
annotation from its attributes. The appearance stream and the attributes
don't match.

I don't know a good approach to this problem.
-- Always re-creating appearance streams is difficult.
-- Detecting incorrect use of Unicode and replacing it with an
   appropriate encoding is silly. There are many other ways to break the file.
Comment 5 Marcos H. Woehrmann 2012-04-15 06:26:05 UTC
I've checked as many PDF viewers as I was able to find and here is the list of ones that display a '$':

Adobe Acrobat
Apple Preview
Foxit Reader
Nitro PDF Reader
Google Chrome
evince
xpdf


And these viewers display a thorn followed by a 'y' with diaeresis:

Ghostscript
mupdf
Comment 7 Chris Liddell (chrisl) 2012-04-17 16:27:57 UTC
Created attachment 8531
Shows Acrobat disobeying the PDF Spec
Comment 8 Alex Cherepanov 2012-04-19 19:04:46 UTC
Regenerate appearance streams when the file requests it (NeedAppearances
is set) and regeneration is implemented in the PDF interpreter.
Otherwise, continue to use the appearance streams provided by the
file.

The patch has been committed as
http://git.ghostscript.com/?p=ghostpdl.git;a=commitdiff;h=d252b4f9d3a949778894a86bb71cc2206fce11cf

Regression testing shows some improvements in comparefiles/Bug689450.pdf
but multi-line annotations are not yet implemented.
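
The rule described in this comment can be sketched roughly as follows. This is an illustration in Python, not the committed patch: the function names, the dictionary layout, and the "multiline" key are all hypothetical, and only the NeedAppearances flag comes from the file format itself.

```python
def choose_appearance(acroform, annotation, can_regenerate):
    """Return "regenerate" or "use_stream" for one annotation."""
    # The file asks for regeneration through the NeedAppearances flag.
    requested = acroform.get("NeedAppearances", False) is True
    # Regenerate only if the interpreter can handle this annotation.
    if requested and can_regenerate(annotation):
        return "regenerate"
    # Otherwise keep the appearance stream supplied by the file.
    return "use_stream"


def single_line_only(annotation):
    # Hypothetical capability check: multi-line text is not supported.
    return not annotation.get("multiline", False)


form = {"NeedAppearances": True}
print(choose_appearance(form, {"multiline": False}, single_line_only))
print(choose_appearance(form, {"multiline": True}, single_line_only))
print(choose_appearance({}, {"multiline": False}, single_line_only))
```

Falling back to the file's own stream whenever regeneration is not requested or not supported keeps the existing behavior for every file that does not set the flag.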