Summary: | Latin alphabets with grave accent are broken in pdf exported by LibreOffice | ||
---|---|---|---|
Product: | Fonts | Reporter: | Tom Yan <tom.ty89> |
Component: | free URW | Assignee: | Chris Liddell (chrisl) <chris.liddell> |
Status: | RESOLVED INVALID | ||
Severity: | normal | CC: | cheinzmann3, chris.liddell |
Priority: | P4 | ||
Version: | unspecified | ||
Hardware: | PC | ||
OS: | Linux | ||
Customer: | Word Size: | --- | |
Attachments: |
Libreoffice's export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
Libreoffice's export with commit e5b3fce0aadb091699b409be325468c682bd436d
Calligra Words' export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
broken pdf from LO with one string in Roman Regular
working pdf from LO with one string in Roman Regular
New font data in hacked up PDF
Description
Tom Yan
2015-10-11 12:38:17 UTC
Created attachment 11977 [details]
Libreoffice's export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
Created attachment 11978 [details]
Libreoffice's export with commit e5b3fce0aadb091699b409be325468c682bd436d
Created attachment 11979 [details]
Calligra Words' export with commit c983ed400dc278dcf20bdff68252fad6d9db7af9
Created attachment 12000 [details]
broken pdf from LO with one string in Roman Regular
Created attachment 12001 [details]
working pdf from LO with one string in Roman Regular
Right, okay, that made things rather easier.
In the working file, the font is embedded with an explicit encoding entry in the PDF font object, and that encoding looks like:
<</Type /Encoding
/Differences [ 0
/agrave
/egrave
/igrave
/ograve
/ugrave
]
>>
In other words, take the first five entries of the embedded font's own encoding and replace them with the names /agrave, /egrave, /igrave, /ograve and /ugrave, respectively. (The Widths array for the font is also tweaked so that its first five entries contain the appropriate width metrics for those glyph names.) The string printed in the PDF content stream then consists of the character codes 0, 1, 2, 3 and 4, corresponding to the glyphs added at those positions in the encoding.
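The way such a Differences array patches an encoding can be sketched in Python. This is only an illustration under simplifying assumptions: `apply_differences` is a hypothetical helper, not part of any PDF library, and the base encoding is a placeholder table with every code unassigned rather than the real StandardEncoding.

```python
def apply_differences(base, differences):
    """Return a copy of `base` (code -> glyph name) patched by a PDF
    /Differences array: an integer sets the current code, and each name
    that follows is assigned to the current code, which then increments."""
    encoding = dict(base)
    code = None
    for item in differences:
        if isinstance(item, int):
            code = item
        else:
            if code is None:
                raise ValueError("Differences array must start with a code")
            encoding[code] = item
            code += 1
    return encoding


# Placeholder base encoding (assumption): every code starts out unassigned.
base = {code: ".notdef" for code in range(256)}

# The Differences array from the working PDF.
differences = [0, "agrave", "egrave", "igrave", "ograve", "ugrave"]
encoding = apply_differences(base, differences)

# The content stream string in the working PDF uses codes 0..4.
content_string = bytes([0, 1, 2, 3, 4])
glyphs = [encoding[b] for b in content_string]
print(glyphs)
```

With the patched encoding, codes 0 to 4 select the five grave-accented glyphs, which is why the working file renders correctly.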
In the non-working PDF, the PDF font object does not contain any encoding entry (so the interpreter will use the font's own encoding, StandardEncoding), and the string in the PDF content stream uses the character codes 224, 232, 236, 242 and 249. In StandardEncoding, those character codes map to the following glyph names: /.notdef, /Lslash, /.notdef, /.notdef and /oslash.
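The mismatch can be checked with a short Python sketch. The dict below is only an excerpt of StandardEncoding covering the codes in question, and the Latin-1 comparison is an observation added here for illustration, not something stated in the report: the same five codes happen to be à, è, ì, ò and ù in ISO Latin-1.

```python
# Excerpt of Adobe StandardEncoding. Of the five codes below, those absent
# from this dict are unassigned in StandardEncoding and yield /.notdef.
STANDARD_ENCODING_EXCERPT = {232: "Lslash", 249: "oslash"}

# Character codes found in the content stream of the non-working PDF.
codes = [224, 232, 236, 242, 249]

# What the interpreter selects when the font's own encoding is used.
standard_names = [STANDARD_ENCODING_EXCERPT.get(c, ".notdef") for c in codes]

# What the same bytes mean when read as ISO Latin-1 text.
latin1_text = bytes(codes).decode("latin-1")

print(standard_names)
print(latin1_text)
```

None of the five codes reaches a grave-accented glyph through StandardEncoding, so without an encoding entry in the PDF font object the text cannot render as intended.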
If I take the differences array, the values for the Widths array and the string contents from the old file, and paste them into the new one (leaving the embedded font data exactly the same), the output is correct.
Basically, the PDF has been created incorrectly. I can't see anything obvious in the fonts (both use StandardEncoding and both have the required glyphs in the CharStrings dictionary under the same names) to cause such a change in behaviour, but it's clear this is *not* a problem with the fonts!
Created attachment 12002 [details]
New font data in hacked up PDF
This is the "roman.pdf" file attached above, which I have decompressed and edited to contain the same Differences array, widths and string as the "roman-old.pdf" also attached above (I also fixed up the file offsets and stream lengths so it's a valid PDF). To be clear, the embedded font data stream is unchanged from "roman.pdf".
This one displays correctly. To reiterate, the embedded font stream is *exactly* the same, and the difference between the working file and the non-working one is *not* in the font.
I've left this PDF uncompressed and relatively human readable, in case anyone wants to inspect it.
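For anyone hand-editing an uncompressed PDF in the same way, the file offsets mentioned above have to be recomputed after each change. The following Python sketch shows one naive way to do that; `find_object_offsets` and `xref_entry` are hypothetical helpers, and the scan assumes every object header starts a line and that no stream data happens to look like one.

```python
import re


def find_object_offsets(pdf_bytes):
    """Map (object number, generation) -> byte offset of each 'N G obj' header.
    Naive: assumes headers start a line and no stream data mimics one."""
    pattern = re.compile(rb"^(\d+) (\d+) obj\b", re.MULTILINE)
    return {
        (int(m.group(1)), int(m.group(2))): m.start()
        for m in pattern.finditer(pdf_bytes)
    }


def xref_entry(offset, generation=0):
    """Format one in-use cross-reference table entry (20 bytes including EOL)."""
    return b"%010d %05d n \n" % (offset, generation)


# Tiny made-up fragment, just enough to exercise the scan.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Length 5 >>\nstream\nhello\nendstream\nendobj\n"
)

offsets = find_object_offsets(sample)
for (num, gen), off in sorted(offsets.items()):
    print(num, gen, xref_entry(off, gen))
```

Stream lengths need the same treatment: the /Length value must equal the number of bytes between the `stream` and `endstream` keywords, excluding the line endings that delimit them.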
Based on the above analysis... I'm closing this.
*** Bug 696495 has been marked as a duplicate of this bug. ***