Bug 696263

Summary: Latin alphabets with grave accent are broken in pdf exported by LibreOffice
Product: Fonts
Reporter: Tom Yan <tom.ty89>
Component: free URW
Assignee: Chris Liddell (chrisl) <chris.liddell>
Status: RESOLVED INVALID
Severity: normal
CC: cheinzmann3, chris.liddell
Priority: P4
Version: unspecified
Hardware: PC
OS: Linux
Customer:
Word Size: ---
Attachments: Libreoffice's export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
Libreoffice's export with commit e5b3fce0aadb091699b409be325468c682bd436d
Calligra Words' export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
broken pdf from LO with one string in Roman Regular
working pdf from LO with one string in Roman Regular
New font data in hacked up PDF

Description Tom Yan 2015-10-11 12:38:17 UTC
With commit c983ed400dc278dcf20bdff68252fad6d9db7af9, at least à, è, ì, ò and ù in the Nimbus fonts (Sans, Roman and Mono, but not Sans Narrow) are displayed incorrectly by gs and evince in PDFs exported by LibreOffice. The problem does not seem to exist with commit e5b3fce0aadb091699b409be325468c682bd436d.

I also tried exporting PDF in Calligra Words with the latest commit. Although it works fine, it seems that CW embeds fonts differently than LO. According to evince CW embeds CID while LO embeds Type 1.

Attached are the PDFs exported in the three test cases mentioned above.
Comment 1 Tom Yan 2015-10-11 12:39:33 UTC
Created attachment 11977 [details]
Libreoffice's export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
Comment 2 Tom Yan 2015-10-11 12:40:11 UTC
Created attachment 11978 [details]
Libreoffice's export with commit e5b3fce0aadb091699b409be325468c682bd436d
Comment 3 Tom Yan 2015-10-11 12:40:44 UTC
Created attachment 11979 [details]
Calligra Words' export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
Comment 4 Tom Yan 2015-10-21 07:17:23 UTC
Created attachment 12000 [details]
broken pdf from LO with one string in Roman Regular
Comment 5 Tom Yan 2015-10-21 07:17:55 UTC
Created attachment 12001 [details]
working pdf from LO with one string in Roman Regular
Comment 6 Chris Liddell (chrisl) 2015-10-21 09:56:33 UTC
Right, okay, that made things rather easier.

In the working file, the font is embedded with an explicit encoding entry in the PDF font object, and that encoding looks like:

<</Type /Encoding
/Differences [ 0
/agrave
/egrave
/igrave
/ograve
/ugrave
]
>>

In other words, take the first five entries of the embedded font's own encoding and replace them with the names /agrave, /egrave, /igrave, /ograve and /ugrave respectively. (The Widths array for the font is also adjusted so that its first five entries contain the width metrics for those glyph names.) The string printed in the PDF content stream then uses character codes 0, 1, 2, 3 and 4, corresponding to the glyphs added at those positions in the encoding.
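
For anyone who wants to see the mechanics, here is a minimal Python sketch of how a /Differences array overrides a base encoding. It is an illustration only, not Ghostscript's code: the function name `apply_differences` and the dict-based encoding are made up for this example, and the base table holds only codes 0 to 4, which are unassigned (/.notdef) in StandardEncoding.

```python
def apply_differences(base, differences):
    """Return a copy of `base` (code -> glyph name) with a /Differences array applied.

    An integer in the array sets the current character code; each glyph name
    that follows is assigned to the current code, which then advances by one.
    """
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("glyph name appears before any starting code")
            encoding[code] = item
            code += 1
    return encoding


# Codes 0-4 are unassigned in StandardEncoding, so they start out as .notdef.
base = {code: ".notdef" for code in range(5)}

# The array from the working file: [ 0 /agrave /egrave /igrave /ograve /ugrave ]
differences = [0, "agrave", "egrave", "igrave", "ograve", "ugrave"]

encoding = apply_differences(base, differences)

# The content stream string in the working file is the byte sequence 0 1 2 3 4.
shown = [encoding[c] for c in bytes([0, 1, 2, 3, 4])]
print(shown)
```

With the Differences array applied, each of the five codes resolves to the intended accented glyph name instead of /.notdef.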

In the non-working PDF, the PDF font object does not contain any encoding entry (so the interpreter will use the font's own encoding, StandardEncoding), and the string in the PDF content stream uses character codes 224, 232, 236, 242 and 249. In StandardEncoding, those character codes map to the following glyph names: /.notdef, /Lslash, /.notdef, /.notdef and /oslash.
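
The lookup can be reproduced with a short Python sketch. The table below is a hand-transcribed excerpt of StandardEncoding covering codes 224 to 255 only, so it should be checked against the published table before relying on it; `STANDARD_ENCODING_HIGH` and `glyph_name` are names invented for this example. Encoding the letters as Latin-1 is just a convenient way to produce the five character codes quoted above.

```python
# Hand-transcribed excerpt of Adobe StandardEncoding for codes 224-255.
# Codes in that range that are not listed are unassigned and give .notdef.
STANDARD_ENCODING_HIGH = {
    225: "AE", 227: "ordfeminine",
    232: "Lslash", 233: "Oslash", 234: "OE", 235: "ordmasculine",
    241: "ae", 245: "dotlessi",
    248: "lslash", 249: "oslash", 250: "oe", 251: "germandbls",
}


def glyph_name(code):
    """Look up a character code in the excerpt, falling back to .notdef."""
    if not 224 <= code <= 255:
        raise ValueError("this excerpt only covers codes 224-255")
    return STANDARD_ENCODING_HIGH.get(code, ".notdef")


# a-grave, e-grave, i-grave, o-grave, u-grave, written with escapes so the
# example does not depend on the source file's encoding.
text = "\u00e0\u00e8\u00ec\u00f2\u00f9"
codes = list(text.encode("latin-1"))
names = [glyph_name(c) for c in codes]

for code, name in zip(codes, names):
    print(code, "/" + name)
```

None of the five codes reaches a grave-accented glyph, which matches the broken rendering.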


If I take the differences array, the values for the Widths array and the string contents from the old file, and paste them into the new one (leaving the embedded font data exactly the same), the output is correct.

Basically, the PDF has been created incorrectly. I can't see anything obvious in the fonts (both use StandardEncoding and both have the required glyphs in the CharStrings dictionary under the same names) to cause such a change in behaviour, but it's clear this is *not* a problem with the fonts!
Comment 7 Chris Liddell (chrisl) 2015-10-21 10:05:19 UTC
Created attachment 12002 [details]
New font data in hacked up PDF

This is the "roman.pdf" file attached above, that I have decompressed, and edited to contain the same differences array, width and string as the "roman-old.pdf" also attached above (I also fixed up the file offsets and stream lengths so it's a valid PDF). To be clear, the embedded font data stream is unchanged from "roman.pdf".
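
For reference, fixing up the offsets after hand-editing an uncompressed PDF can be sketched roughly as below. This is not the procedure actually used for the attachment, just an illustration. It assumes a simple file where every object header starts at the beginning of a line terminated by a line feed, and where no stream data happens to contain text that looks like an object header. `object_offsets` and `xref_entry` are hypothetical helper names.

```python
import re


def object_offsets(pdf_bytes):
    """Map object number -> byte offset of its 'N G obj' header line."""
    pattern = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)
    return {int(m.group(1)): m.start() for m in pattern.finditer(pdf_bytes)}


def xref_entry(offset, generation=0):
    """Format one in-use cross-reference entry, which is exactly 20 bytes long."""
    return b"%010d %05d n \n" % (offset, generation)


# A tiny stand-in for an edited file body (not a complete PDF).
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< >>\nendobj\n"
)

offsets = object_offsets(sample)
for number in sorted(offsets):
    print(number, xref_entry(offsets[number]))
```

Stream /Length values need the same treatment: after editing, each one has to equal the byte count between the stream and endstream keywords, excluding the end-of-line markers around the data.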

This one displays correctly. To reiterate, the embedded font stream is *exactly* the same, and the difference between the working file and the non-working one is *not* in the font.

I've left this PDF uncompressed and relatively human readable, in case anyone wants to inspect it.
Comment 8 Chris Liddell (chrisl) 2015-10-21 10:05:57 UTC
Based on the above analysis, I'm closing this.
Comment 9 Chris Liddell (chrisl) 2016-01-05 23:15:13 UTC
*** Bug 696495 has been marked as a duplicate of this bug. ***