Bug 696263

Summary:	Latin alphabets with grave accent are broken in pdf exported by LibreOffice
Product:	Fonts	Reporter:	Tom Yan <tom.ty89>
Component:	free URW	Assignee:	Chris Liddell (chrisl) <chris.liddell>
Status:	RESOLVED INVALID
Severity:	normal	CC:	cheinzmann3, chris.liddell
Priority:	P4
Version:	unspecified
Hardware:	PC
OS:	Linux
Customer:		Word Size:	---
Attachments:	Libreoffice's export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
	Libreoffice's export with commit e5b3fce0aadb091699b409be325468c682bd436d
	Calligra Words' export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
	broken pdf from LO with one string in Roman Regular
	working pdf from LO with one string in Roman Regular
	New font data in hacked up PDF

Description Tom Yan 2015-10-11 12:38:17 UTC

With commit c983ed400dc278dcf20bdff68252fad6d9db7af9, at least àèìòù in the Nimbus fonts (Sans, Roman, Mono, but not Sans Narrow) are rendered incorrectly by gs and evince in PDFs exported by LibreOffice. The problem does not seem to occur with commit e5b3fce0aadb091699b409be325468c682bd436d.

I also tried exporting a PDF from Calligra Words with the latest commit. Although that works fine, it seems that CW embeds fonts differently from LO. According to evince, CW embeds CID fonts while LO embeds Type 1.

Attached are the PDFs exported in the three test cases mentioned above.

Comment 1 Tom Yan 2015-10-11 12:39:33 UTC

Created attachment 11977 [details]
Libreoffice's export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9

Comment 2 Tom Yan 2015-10-11 12:40:11 UTC

Created attachment 11978 [details]
Libreoffice's export with commit e5b3fce0aadb091699b409be325468c682bd436d

Comment 3 Tom Yan 2015-10-11 12:40:44 UTC

Created attachment 11979 [details]
Calligra Words' export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9

Comment 4 Tom Yan 2015-10-21 07:17:23 UTC

Created attachment 12000 [details]
broken pdf from LO with one string in Roman Regular

Comment 5 Tom Yan 2015-10-21 07:17:55 UTC

Created attachment 12001 [details]
working pdf from LO with one string in Roman Regular

Comment 6 Chris Liddell (chrisl) 2015-10-21 09:56:33 UTC

Right, okay, that made things rather easier.

In the working file, the font is embedded with an explicit encoding entry in the PDF font object, and that encoding looks like:

<</Type /Encoding
/Differences [ 0
/agrave
/egrave
/igrave
/ograve
/ugrave
]
>>

In other words, take the first 5 entries of the embedded font's own encoding, and replace them respectively with the names /agrave, /egrave, /igrave, /ograve and /ugrave. (The Widths array for the font is also tweaked so its first five entries contain the appropriate width metrics for the above listed glyph names.) The string printed in the PDF content stream is then character codes 0, 1, 2, 3 and 4 (corresponding to the glyphs added in those positions to the encoding).
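
As an illustration only, the way a /Differences array overrides a base encoding can be sketched in Python. The function name apply_differences and the empty base table are assumptions made for this sketch, not anything taken from gs or LO; an integer in the array sets the current code, and each name that follows is assigned to consecutive codes.

```python
def apply_differences(base, differences):
    """Return a copy of `base` (code -> glyph name) with a PDF-style
    /Differences array applied. Hypothetical helper for illustration."""
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            # An integer resets the code that the next name is assigned to.
            code = item
        else:
            if code is None:
                raise ValueError("Differences array must start with a code")
            encoding[code] = item
            code += 1
    return encoding


# Assumption: an empty base table, so any unlisted code falls back to .notdef.
base = {}
differences = [0, "agrave", "egrave", "igrave", "ograve", "ugrave"]
encoding = apply_differences(base, differences)

# The working file's content stream uses character codes 0..4.
content_string = bytes([0, 1, 2, 3, 4])
glyphs = [encoding.get(code, ".notdef") for code in content_string]
print(glyphs)
```

With the Differences array from the working file, codes 0 to 4 select the five grave-accented glyphs by name.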

In the non-working PDF, the PDF font object does not contain any encoding entry (so the interpreter will use the font's own encoding - StandardEncoding), and the string in the PDF content stream uses character codes 224, 232, 236, 242 and 249. In StandardEncoding, those character codes map to the following glyph names: /.notdef, /Lslash, /.notdef, /.notdef and /oslash.
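
The lookup can be reproduced with a small Python sketch. The table below is only the part of StandardEncoding above code 224, copied by hand for illustration; every code not listed in that range is unassigned and so resolves to .notdef. The sketch also shows that the five codes equal the Latin-1 code points of àèìòù, although the thread does not establish how LO arrived at them.

```python
# Partial StandardEncoding table (codes 225..251 only), hand-copied for
# illustration. Codes missing from this range are unassigned (.notdef).
STANDARD_ENCODING_HIGH = {
    225: "AE", 227: "ordfeminine",
    232: "Lslash", 233: "Oslash", 234: "OE", 235: "ordmasculine",
    241: "ae", 245: "dotlessi",
    248: "lslash", 249: "oslash", 250: "oe", 251: "germandbls",
}

# Character codes used by the string in the non-working PDF.
codes = [224, 232, 236, 242, 249]
names = [STANDARD_ENCODING_HIGH.get(code, ".notdef") for code in codes]
print(names)

# The same five numbers are the Latin-1 code points of the expected letters.
latin1_codes = list("àèìòù".encode("latin-1"))
print(latin1_codes == codes)
```

None of the five codes selects a grave-accented glyph, which matches the broken output.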


If I take the differences array, the values for the Widths array and the string contents from the old file, and paste them into the new one (leaving the embedded font data exactly the same), the output is correct.

Basically, the PDF has been created incorrectly. I can't see anything obvious in the fonts (both use StandardEncoding and both have the required glyphs in the CharStrings dictionary under the same names) to cause such a change in behaviour, but it's clear this is *not* a problem with the fonts!

Comment 7 Chris Liddell (chrisl) 2015-10-21 10:05:19 UTC

Created attachment 12002 [details]
New font data in hacked up PDF

This is the "roman.pdf" file attached above, that I have decompressed, and edited to contain the same differences array, width and string as the "roman-old.pdf" also attached above (I also fixed up the file offsets and stream lengths so it's a valid PDF). To be clear, the embedded font data stream is unchanged from "roman.pdf".

This one displays correctly. To reiterate, the embedded font stream is *exactly* the same, and the difference between the working file and the non-working one is *not* in the font.

I've left this PDF uncompressed and relatively human readable, in case anyone wants to inspect it.
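
For anyone repeating this kind of hand edit, fixing the file offsets means recomputing the byte position of each object for the cross-reference table. The following Python is a minimal sketch of that step, not the procedure actually used here. The function name build_xref is hypothetical, and the sketch assumes every object header ("N G obj") starts at the beginning of a line, that object numbers are contiguous from 1, and that no stream data happens to contain text that looks like an object header.

```python
import re


def build_xref(pdf: bytes) -> bytes:
    """Rebuild a classic xref table from object positions in `pdf`.
    Hypothetical helper; see the assumptions stated above."""
    offsets = {}
    for match in re.finditer(rb"(?m)^(\d+) (\d+) obj\b", pdf):
        # Record the byte offset and generation number of each object.
        offsets[int(match.group(1))] = (match.start(), int(match.group(2)))
    if not offsets:
        raise ValueError("no objects found")
    size = max(offsets) + 1
    if sorted(offsets) != list(range(1, size)):
        raise ValueError("sketch only handles contiguous object numbers")
    # Each entry is exactly 20 bytes: 10-digit offset, space, 5-digit
    # generation, space, type letter, space, newline.
    lines = [b"xref", b"0 %d" % size, b"0000000000 65535 f "]
    for number in range(1, size):
        offset, generation = offsets[number]
        lines.append(b"%010d %05d n " % (offset, generation))
    return b"\n".join(lines) + b"\n"


sample = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n2 0 obj\n<<>>\nendobj\n"
xref = build_xref(sample)
print(xref.decode("ascii"))
```

Each stream's /Length value needs the same treatment after decompression or editing: it must equal the number of bytes between the stream and endstream keywords.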

Comment 8 Chris Liddell (chrisl) 2015-10-21 10:05:57 UTC

Based on the above analysis, I'm closing this.

Comment 9 Chris Liddell (chrisl) 2016-01-05 23:15:13 UTC

*** Bug 696495 has been marked as a duplicate of this bug. ***