Bug 691849 - pdfwriter regression: pdf text element is broken
Summary: pdfwriter regression: pdf text element is broken
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: 9.00
Hardware: PC Windows 2000
Importance: P4 normal
Assignee: Ken Sharp
Reported: 2010-12-20 21:23 UTC by jojelino
Modified: 2011-01-02 02:52 UTC

Attachments
testcase (404.11 KB, application/futuresplash)
2010-12-20 21:23 UTC, jojelino
generated pdf from 9.01 for preceding attachment (7.12 KB, application/pdf)
2010-12-20 21:25 UTC, jojelino
expected result (from 8.71) (7.09 KB, application/pdf)
2010-12-20 21:30 UTC, jojelino
another testcase (372.73 KB, application/postscript)
2010-12-25 05:23 UTC, jojelino
expected result (from 8.71) (43.24 KB, application/pdf)
2010-12-25 05:26 UTC, jojelino

Description jojelino 2010-12-20 21:23:23 UTC
Created attachment 7051
testcase

This document contains the following text information:
136 <0061> def
137 <0062> def
138 <0063> def
139 <0064> def
140 <0065> def
141 <0066> def
142 <0067> def
143 <0068> def
144 <0069> def
145 <006A> def
146 <006B> def

Copy and paste works correctly with gs 8.71, but something has definitely changed in recent trunk: gs 9.01 cannot map glyphs to text properly.
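
The entries quoted above can be read mechanically. The following is a minimal sketch, assuming each entry pairs a numeric code with a hex string holding big-endian UTF-16 text; that interpretation is an assumption and is not stated in the report. The names `ENTRIES`, `ENTRY_RE` and `parse_entries` are hypothetical helpers written for this illustration.

```python
import re

# The eleven entries quoted in the description.
ENTRIES = """\
136 <0061> def
137 <0062> def
138 <0063> def
139 <0064> def
140 <0065> def
141 <0066> def
142 <0067> def
143 <0068> def
144 <0069> def
145 <006A> def
146 <006B> def
"""

# A decimal code, a hex string in angle brackets, then the word "def".
ENTRY_RE = re.compile(r"^\s*(\d+)\s+<([0-9A-Fa-f]+)>\s+def\s*$")


def parse_entries(text):
    """Return {code: text}, decoding each hex string as UTF-16BE (assumed)."""
    mapping = {}
    for line in text.splitlines():
        match = ENTRY_RE.match(line)
        if match is None:
            continue  # skip anything that is not a "code <hex> def" entry
        code = int(match.group(1))
        mapping[code] = bytes.fromhex(match.group(2)).decode("utf-16-be")
    return mapping


if __name__ == "__main__":
    for code, char in sorted(parse_entries(ENTRIES).items()):
        print(code, repr(char))
```

Under that assumption, text copied from a correctly mapped PDF should yield the decoded characters rather than the raw codes.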

It is easy to reproduce: download the official SourceForge release builds of 8.71 and 9.01, run ps2pdf separately with each build on this attachment, and compare the results of copying and pasting the text into Notepad. The PDF file from 8.71 works fine; the one from 9.01 does not.
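
The comparison described above could be scripted roughly as follows. This is a sketch only: the install paths in the usage comment are hypothetical, and it calls the Ghostscript executable directly with the `pdfwrite` device instead of going through the ps2pdf wrapper script. The helper names `pdfwrite_command` and `convert_with_each` are made up for this example.

```python
import subprocess
from pathlib import Path


def pdfwrite_command(gs_exe, ps_file, pdf_file):
    """Build a Ghostscript command line that converts ps_file to pdf_file."""
    return [
        str(gs_exe),
        "-dBATCH",            # exit after processing the input
        "-dNOPAUSE",          # do not prompt between pages
        "-sDEVICE=pdfwrite",  # select the PDF writer device
        f"-sOutputFile={pdf_file}",
        str(ps_file),
    ]


def convert_with_each(builds, ps_file, out_dir, run=subprocess.run):
    """Convert ps_file once per build; return {label: path of the output PDF}.

    builds maps a label such as "8.71" to the path of that build's executable.
    """
    outputs = {}
    for label, gs_exe in builds.items():
        pdf_file = Path(out_dir) / f"{Path(ps_file).stem}-{label}.pdf"
        run(pdfwrite_command(gs_exe, ps_file, pdf_file), check=True)
        outputs[label] = pdf_file
    return outputs


# Example call (hypothetical Windows install locations):
# convert_with_each(
#     {"8.71": r"C:\gs\gs8.71\bin\gswin32c.exe",
#      "9.01": r"C:\gs\gs9.01\bin\gswin32c.exe"},
#     "testcase.ps",
#     ".",
# )
```

The two resulting PDF files can then be opened side by side, and the same text copied from each into a plain-text editor.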
Comment 1 jojelino 2010-12-20 21:25:36 UTC
Created attachment 7052
generated pdf from 9.01 for preceding attachment.
Comment 2 jojelino 2010-12-20 21:30:28 UTC
Created attachment 7053
expected result(from 8.71)
Comment 3 Ken Sharp 2010-12-23 14:41:17 UTC
Fixed in revision 11975:

http://ghostscript.com/pipermail/gs-cvs/2010-December/012056.html
Comment 4 jojelino 2010-12-25 05:23:47 UTC
Created attachment 7062
another testcase

This bug is not resolved. I added another testcase that causes the same problem.
Comment 5 jojelino 2010-12-25 05:26:31 UTC
Created attachment 7063
expected result(from 8.71)
Comment 6 Ken Sharp 2010-12-31 13:21:44 UTC
(In reply to comment #4)
> Created an attachment (id=7062)
> another testcase
> 
> This bug is not resolved. I added another testcase that causes the same problem.

This is quite a different issue, nothing to do with CMaps at all.

The font in question (Arial) is embedded as a Symbolic TrueType font. Symbolic TrueType fonts should not have an Encoding (and in PDF/A *must* not). Previously we embedded an Encoding anyway, because (as in this case) Acrobat would use the Encoding to search and extract text.

However this *is* technically invalid, caused us to create invalid PDF/A files and recently started causing other problems. As a result the code was revised to create proper Symbolic TrueType fonts with no Encoding.

If you would like to raise an enhancement request I'm happy to consider whether we can do a better job of identifying Symbolic fonts, but as it stands this is not a bug nor a regression, even though it does appear to be.
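
The rule described in this comment (a Symbolic TrueType font should carry no Encoding) can be illustrated with a small check. This is a sketch under stated assumptions, not Ghostscript's implementation: the font is modelled as a plain Python dict, the descriptor `Flags` value is flattened into that dict for brevity, and `encoding_problems` is a hypothetical helper. The only detail taken from the PDF specification is that the Symbolic flag is bit 3 of the font descriptor flags (value 4).

```python
SYMBOLIC = 1 << 2  # font descriptor Flags, bit 3 (value 4)


def encoding_problems(font):
    """Return a list of problems for a font modelled as a plain dict.

    Only the rule from the comment above is checked: a TrueType font that is
    flagged Symbolic should not have an Encoding entry.
    """
    problems = []
    if font.get("Subtype") != "TrueType":
        return problems  # the rule discussed here concerns TrueType fonts only
    if font.get("Flags", 0) & SYMBOLIC and "Encoding" in font:
        problems.append("Symbolic TrueType font has an Encoding entry")
    return problems


if __name__ == "__main__":
    # The situation described for the older output: Symbolic plus an Encoding.
    old_style = {"Subtype": "TrueType", "Flags": 4, "Encoding": "WinAnsiEncoding"}
    # The revised output: Symbolic with no Encoding.
    new_style = {"Subtype": "TrueType", "Flags": 4}
    print(encoding_problems(old_style))
    print(encoding_problems(new_style))
```

The trade-off described in the comment shows up here: dropping the Encoding makes the font valid, but it also removes the information a viewer was using to extract text.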
Comment 7 jojelino 2011-01-01 10:15:02 UTC
--- base/gdevpdtt.c	(revision 11735)
+++ base/gdevpdtt.c	(revision 11734)
In this changeset, font->FontType == ft_TrueType is commented out.
But it causes some non-cid truetype cid font (fonts which have valid encodings such as WinAnsi, ISOLatin, ...) to lose its original encoding, regardless of whether it is tagged with the Symbolic flag. Am I understanding it correctly? I don't know whether this is a TrueType font with the Symbolic flag, but there could be several improvements to work around this non-bug problem...
Comment 8 Ken Sharp 2011-01-01 14:10:45 UTC
(In reply to comment #7)
> --- base/gdevpdtt.c    (revision 11735)
> +++ base/gdevpdtt.c    (revision 11734)
> In this changeset, font->FontType == ft_TrueType is commented out.
> But it causes some non-cid truetype cid font (fonts which have valid encodings
> such as WinAnsi, ISOLatin, ...) to lose its original encoding, regardless of
> whether it is tagged with the Symbolic flag. Am I understanding it correctly?

I'm afraid I don't quite understand what you mean. The change prevents the addition of an Encoding to a font whose type is TrueType. Since we always write TrueType fonts as Symbolic fonts (as noted in the comment) it doesn't matter what the original font's Encoding was. This change had no effect on any font other than a TrueType font.

I'm not sure what you mean by a 'non-cid truetype cid font'.
Comment 9 jojelino 2011-01-02 02:52:57 UTC
> This change had no effect on any font other than a TrueType font.
Yes, indeed.
> I'm not sure what you mean by a 'non-cid truetype cid font'.
I'm sorry for the confusing phrase; I meant a non-CID TrueType font.