Summary: | pdfwrite: wrong characters in merged PDF file | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Michael Weghorn <m.weghorn> |
Component: | PDF Writer | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED DUPLICATE | ||
Severity: | normal | CC: | m.weghorn |
Priority: | P4 | ||
Version: | master | ||
Hardware: | PC | ||
OS: | Linux | ||
Customer: | Word Size: | --- | |
Attachments: |
input PDF files for merge
result of merging the single PDF files
PostScript file processed by Ghostscript
result of converting the PostScript file |
Description
Michael Weghorn
2015-06-30 08:10:52 UTC
Created attachment 11771
result of merging the single PDF files
Created attachment 11772
PostScript file processed by Ghostscript
Created attachment 11773
result of converting the PostScript file
The first thing to understand is that Ghostscript does *NOT* merge PDF files. It interprets its input, converts it to marking operations, and then sends those to the device. When the device is pdfwrite, those marking operations are then written out as a PDF file. This means that the output does not bear any particular resemblance to the input, other than visually.

As I keep on saying to people, you should avoid re-processing the output from Ghostscript: do it once and don't do it again.

Your problem is that LibreOffice names the font subsets it embeds in the same way, no matter what the font name or content is. Ghostscript assumes that two font subsets with the same name are the same font. In this case LibreOffice is behaving very poorly indeed.

As noted in bug #694537, there is nothing much we can do about this; the damage is done before we see the file. The work-around is to re-fry each PDF file before passing them to pdfwrite. The reason is that Ghostscript will produce a new subset font and will name it with a sensible name, which should reduce name collisions. Though since you have 250 subset fonts, all with practically the same glyph coverage but different encodings, that may not help you.

Basically you are trying to use Ghostscript in a way it's not intended to be used, from an application which frankly isn't very good at what it's doing.

*** This bug has been marked as a duplicate of bug 694537 ***

(In reply to Ken Sharp from comment #4)
> As noted in bug #694537 there is nothing much we can do about this, the
> damage is done before we see the file. The work-around is to re-fry each PDF
> file before passing them to pdfwrite. The reason is that Ghostscript will
> produce a new subset font and will name it with a sensible name, which
> should reduce name collisions. Though since you have 250 subset fonts, all
> with practically the same glyph coverage but different encodings, that may
> not help you.

Thank you very much for your quick reply.
I had actually tried to first process each PDF file individually before passing all files to pdfwrite. The result was better than without first processing each file, but there were still wrong characters. When I tried this workaround last time, I had used version 9.06 of Ghostscript. With the current master build, the resulting PDF is indeed OK. In fact, we have not just 250 documents, but many more...
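
For anyone landing here with the same problem, the work-around discussed above (run each input PDF through pdfwrite on its own, then pass the re-written files to pdfwrite together) might be scripted roughly as follows. This is only a sketch, not something taken from the attachments: the function names, the `refried_NNNN.pdf` naming scheme and the assumption that the executable is called `gs` are all illustrative. The only Ghostscript switches used are `-q`, `-dBATCH`, `-dNOPAUSE`, `-sDEVICE=pdfwrite` and `-sOutputFile=`.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Sequence

# Switches shared by both passes; "gs" as the executable name is an assumption.
COMMON = ["-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite"]


def refry_command(src: str, dst: str, gs: str = "gs") -> List[str]:
    """Build the command that re-writes a single PDF through pdfwrite."""
    return [gs, *COMMON, f"-sOutputFile={dst}", src]


def combine_command(sources: Sequence[str], dst: str, gs: str = "gs") -> List[str]:
    """Build the command that feeds several PDFs to pdfwrite in one run."""
    return [gs, *COMMON, f"-sOutputFile={dst}", *sources]


def refry_then_combine(
    sources: Sequence[str],
    workdir: str,
    dst: str,
    run: Callable = subprocess.run,
) -> List[str]:
    """Re-write every input on its own, then combine the re-written files.

    Returns the list of intermediate (hypothetically named) files.
    """
    work = Path(workdir)
    work.mkdir(parents=True, exist_ok=True)
    refried: List[str] = []
    for index, src in enumerate(sources):
        out = str(work / f"refried_{index:04d}.pdf")
        # First pass: pdfwrite emits new subset fonts with its own names.
        run(refry_command(src, out), check=True)
        refried.append(out)
    # Second pass: all re-written files go to pdfwrite together.
    run(combine_command(refried, dst), check=True)
    return refried
```

Typical use would be `refry_then_combine(["a.pdf", "b.pdf"], "tmp", "merged.pdf")`. As the thread notes, this reduces subset-name collisions rather than guaranteeing their absence, and the result depends on the Ghostscript version (9.06 still produced wrong characters here, the then-current master did not).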