696263 – Latin alphabets with grave accent are broken in pdf exported by LibreOffice

Summary: Latin alphabets with grave accent are broken in pdf exported by LibreOffice

Status:	RESOLVED INVALID

Alias:	None

Product:	Fonts
Classification:	Unclassified
Component:	free URW
Version:	unspecified
Hardware:	PC Linux

Importance:	P4 normal
Assignee:	Chris Liddell (chrisl)

URL:
Keywords:

Duplicates (1):	696495
Depends on:
Blocks:

Reported:	2015-10-11 12:38 UTC by Tom Yan
Modified:	2016-01-05 23:15 UTC
CC List:	2 users

See Also:
Customer:
Word Size:	---

Attachments
Libreoffice's export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9 (2.74 MB, application/pdf) 2015-10-11 12:39 UTC, Tom Yan
Libreoffice's export with commit e5b3fce0aadb091699b409be325468c682bd436d (2.68 MB, application/pdf) 2015-10-11 12:40 UTC, Tom Yan
Calligra Words' export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9 (79.15 KB, application/pdf) 2015-10-11 12:40 UTC, Tom Yan
broken pdf from LO with one string in Roman Regular (119.55 KB, application/pdf) 2015-10-21 07:17 UTC, Tom Yan
working pdf from LO with one string in Roman Regular (115.11 KB, application/pdf) 2015-10-21 07:17 UTC, Tom Yan
New font data in hacked up PDF (121.95 KB, application/pdf) 2015-10-21 10:05 UTC, Chris Liddell (chrisl)

Description Tom Yan 2015-10-11 12:38:17 UTC

With commit c983ed400dc278dcf20bdff68252fad6d9db7af9, AT LEAST the characters àèìòù in the Nimbus fonts (Sans, Roman and Mono, but not Sans Narrow) are displayed incorrectly by gs and Evince in PDFs exported by LibreOffice (LO). The problem does not seem to occur with commit e5b3fce0aadb091699b409be325468c682bd436d.

I also tried exporting a PDF from Calligra Words (CW) with the latest commit. Its output displays correctly, but CW seems to embed the fonts differently from LO: according to Evince, CW embeds CID fonts while LO embeds Type 1 fonts.

Attached are the PDFs exported in the three test cases mentioned above.

Comment 1 Tom Yan 2015-10-11 12:39:33 UTC

Created attachment 11977
Libreoffice's export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9

Comment 2 Tom Yan 2015-10-11 12:40:11 UTC

Created attachment 11978
Libreoffice's export with commit e5b3fce0aadb091699b409be325468c682bd436d

Comment 3 Tom Yan 2015-10-11 12:40:44 UTC

Created attachment 11979
Calligra Words' export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9

Comment 4 Tom Yan 2015-10-21 07:17:23 UTC

Created attachment 12000
broken pdf from LO with one string in Roman Regular

Comment 5 Tom Yan 2015-10-21 07:17:55 UTC

Created attachment 12001
working pdf from LO with one string in Roman Regular

Comment 6 Chris Liddell (chrisl) 2015-10-21 09:56:33 UTC

Right, okay, that made things rather easier.

In the working file, the font is embedded with an explicit encoding entry in the PDF font object, and that encoding looks like:

<</Type /Encoding
/Differences [ 0
/agrave
/egrave
/igrave
/ograve
/ugrave
]
>>

In other words: take the first five entries of the embedded font's own encoding and replace them, respectively, with the names /agrave, /egrave, /igrave, /ograve and /ugrave. (The Widths array for the font is also adjusted so that its first five entries contain the appropriate width metrics for those glyph names.) The string printed in the PDF content stream then uses character codes 0, 1, 2, 3 and 4, corresponding to the glyphs placed at those positions in the encoding.
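As a rough illustration of how a viewer resolves such a Differences array, here is a minimal Python sketch. The function name `apply_differences` is hypothetical, glyph names are written without the leading slash, and the base encoding is simplified to all /.notdef (which matches StandardEncoding for codes 0 to 4, the only codes that matter here).

```python
def apply_differences(base, differences):
    """Return a copy of a 256-entry base encoding with a PDF /Differences array applied.

    `differences` mirrors the PDF array: an integer sets the current code,
    and each following glyph name is assigned to that code, which then
    advances by one.
    """
    if len(base) != 256:
        raise ValueError("a simple-font encoding has 256 entries")
    encoding = list(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item  # start a new run of consecutive codes
            continue
        if code is None or not 0 <= code < 256:
            raise ValueError("glyph name without a valid current code")
        encoding[code] = item
        code += 1
    return encoding


# Simplified base (assumption): StandardEncoding has /.notdef at codes 0-4 anyway.
base = [".notdef"] * 256
diffs = [0, "agrave", "egrave", "igrave", "ograve", "ugrave"]
enc = apply_differences(base, diffs)
print(enc[0:5])  # ['agrave', 'egrave', 'igrave', 'ograve', 'ugrave']
```

With this encoding in place, the content-stream codes 0 to 4 select the five accented glyphs by name, independently of the font's built-in encoding.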

In the non-working PDF, the PDF font object does not contain any Encoding entry (so the interpreter uses the font's own encoding, StandardEncoding), and the string in the PDF content stream uses character codes 224, 232, 236, 242 and 249. In StandardEncoding, those character codes map to the glyph names /.notdef, /Lslash, /.notdef, /.notdef and /oslash.
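The lookup can be checked with a small Python sketch. The table below lists the defined entries of Adobe StandardEncoding in the range 224 to 255 (octal 340 to 377); every other code in that range is /.notdef. The second table shows the same codes in WinAnsiEncoding, where they are the accented vowels, which is presumably what the exporter intended. The names `STANDARD_UPPER`, `WINANSI_GRAVES` and `standard_name` are illustrative only.

```python
# Defined entries of Adobe StandardEncoding for codes 0o340-0o377;
# any code in that range not listed here is /.notdef.
STANDARD_UPPER = {
    0o341: "AE", 0o343: "ordfeminine", 0o350: "Lslash", 0o351: "Oslash",
    0o352: "OE", 0o353: "ordmasculine", 0o361: "ae", 0o365: "dotlessi",
    0o370: "lslash", 0o371: "oslash", 0o372: "oe", 0o373: "germandbls",
}

# The same codes in WinAnsiEncoding (identical to ISO Latin-1 here).
WINANSI_GRAVES = {224: "agrave", 232: "egrave", 236: "igrave",
                  242: "ograve", 249: "ugrave"}


def standard_name(code):
    """Glyph name for `code` under StandardEncoding (upper range only)."""
    if not 0o340 <= code <= 0o377:
        raise ValueError("this sketch only covers codes 224-255")
    return STANDARD_UPPER.get(code, ".notdef")


codes = [224, 232, 236, 242, 249]
for c in codes:
    print(c, WINANSI_GRAVES[c], "->", standard_name(c))
```

Run as written, it reports .notdef, Lslash, .notdef, .notdef and oslash for the five codes, so none of the intended glyphs is selected.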


If I take the differences array, the values for the Widths array and the string contents from the old file, and paste them into the new one (leaving the embedded font data exactly the same), the output is correct.

Basically, the PDF has been created incorrectly. I can't see anything obvious in the fonts (both use StandardEncoding and both have the required glyphs in the CharStrings dictionary under the same names) to cause such a change in behaviour, but it's clear this is *not* a problem with the fonts!

Comment 7 Chris Liddell (chrisl) 2015-10-21 10:05:19 UTC

Created attachment 12002
New font data in hacked up PDF

This is the "roman.pdf" file attached above, which I have decompressed and edited to contain the same Differences array, widths and string as the "roman-old.pdf" also attached above (I also fixed up the file offsets and stream lengths so that it's a valid PDF). To be clear, the embedded font data stream is unchanged from "roman.pdf".

This one displays correctly. To reiterate, the embedded font stream is *exactly* the same, and the difference between the working file and the non-working one is *not* in the font.

I've left this PDF uncompressed and relatively human readable, in case anyone wants to inspect it.
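Editing an uncompressed PDF by hand shifts byte positions, which is why the file offsets mentioned above have to be fixed up. A minimal Python sketch of regenerating a classic xref section follows; the name `rebuild_xref` is hypothetical, and it assumes objects numbered 1..N with no gaps, each "N G obj" header at the start of a line, and no stream data that happens to look like an object header. Stream /Length values are not handled.

```python
import re

OBJ_HEADER = re.compile(rb"(?m)^(\d+) (\d+) obj\b")


def rebuild_xref(pdf: bytes) -> bytes:
    """Build a classic xref section for an uncompressed PDF body."""
    offsets = {}
    for m in OBJ_HEADER.finditer(pdf):
        # Byte offset of the object header and its generation number.
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))
    size = max(offsets) + 1
    if sorted(offsets) != list(range(1, size)):
        raise ValueError("sketch assumes contiguous object numbers 1..N")
    # Each entry is exactly 20 bytes: 10-digit offset, space, 5-digit
    # generation, space, entry type, then a two-byte end of line (" \n").
    entries = [b"0000000000 65535 f \n"]  # head of the free list, object 0
    for num in range(1, size):
        off, gen = offsets[num]
        entries.append(b"%010d %05d n \n" % (off, gen))
    return b"xref\n0 %d\n" % size + b"".join(entries)
```

The trailer and the startxref value (the byte offset of the "xref" keyword) would still need to be written after this section.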

Comment 8 Chris Liddell (chrisl) 2015-10-21 10:05:57 UTC

Based on the above analysis..... I'm closing this.

Comment 9 Chris Liddell (chrisl) 2016-01-05 23:15:13 UTC

*** Bug 696495 has been marked as a duplicate of this bug. ***