690180 – Improve large font splitting in pdfwrite


Summary: Improve large font splitting in pdfwrite

Status:	UNCONFIRMED

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	PDF Writer
Version:	master
Hardware:	All Windows NT

Importance:	P4 enhancement
Assignee:	Ken Sharp

URL:
Keywords:

Depends on:
Blocks:

Reported:	2008-11-21 06:23 UTC by Ken Sharp
Modified:	2008-11-21 06:30 UTC
CC List:	0 users

See Also:
Customer:
Word Size:	---

Attachments
LargeCFF-Font.zip (9.79 MB, application/octet-stream) 2008-11-21 06:30 UTC, Ken Sharp


Description Ken Sharp 2008-11-21 06:23:36 UTC

This is a placeholder for future work. The summary below is mainly to assist my memory.

The code in scan_cmap_text which handles large fonts by creating a number of
subset fonts could use improvement. This is especially true for the attached
file, which has a single large CFF font composed with a CMap to produce a
CIDKeyed instance with a single large descendant.

The fonts created to hold the subsets default to a preferred encoding, which
means we do not fill every available code position from 0 to 255. We could
create fewer fonts if we filled each one completely.
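The saving from filling each subset completely is plain ceiling arithmetic. The C sketch below is illustrative only and assumes nothing about pdfwrite's internals; the function name subsets_needed is hypothetical, and the figure of 200 usable codes per font in the usage note is made up, since this report does not say how many positions the preferred encoding actually leaves empty.

```c
#include <assert.h>

/* Hypothetical helper: number of 8-bit subset fonts needed to hold
 * n_glyphs distinct glyphs when each subset can use codes_per_subset
 * of its 256 code positions. Ceiling division, because a partly
 * filled last subset still costs a whole font. */
static unsigned subsets_needed(unsigned n_glyphs, unsigned codes_per_subset)
{
    assert(codes_per_subset > 0 && codes_per_subset <= 256);
    return (n_glyphs + codes_per_subset - 1) / codes_per_subset;
}
```

For example, 10000 distinct glyphs need subsets_needed(10000, 256) = 40 fonts when every code is filled, but subsets_needed(10000, 200) = 50 fonts if (hypothetically) only 200 codes per font were used.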

Secondly, we currently break the text when we detect a switch of descendant
font, but not when we switch subsets. This means that all the glyphs in a given
text string must be in the same font. I *think* this means that if some glyphs
are already encoded in earlier subsets, we won't detect that and will embed
them again in the new subset. Also, if a single text string contains more than
255 glyphs that have not previously been encoded, I think we will fail to emit
the text. We should allow the text to be broken so that we can switch subsets
when required; this should lead to fewer embedded subsets. Needs more
investigation.
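A minimal C sketch of the proposed behaviour, under stated assumptions: a table records which subset and code each CID was given, glyphs already encoded in an earlier subset are reused instead of being embedded again, new glyphs go into the newest subset until all 256 codes are used, and the string is broken into a new run whenever consecutive glyphs fall in different subsets. The names subset_map, subset_for_cid and split_text, and the assumption that CIDs are below 65536, are hypothetical and not taken from scan_cmap_text.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CID 65536          /* assumption: CIDs fit in 16 bits */
#define CODES_PER_SUBSET 256   /* fill every code 0-255 */
#define UNASSIGNED (-1)

/* Hypothetical bookkeeping, not Ghostscript's actual data structures. */
typedef struct {
    int subset_of[MAX_CID];         /* subset a CID was placed in, or UNASSIGNED */
    unsigned char code_of[MAX_CID]; /* its code within that subset */
    int n_subsets;                  /* subsets opened so far */
    int used_in_last;               /* codes used in the newest subset */
} subset_map;

static void subset_map_init(subset_map *m)
{
    for (size_t i = 0; i < MAX_CID; i++)
        m->subset_of[i] = UNASSIGNED;
    m->n_subsets = 0;
    m->used_in_last = CODES_PER_SUBSET; /* forces a new subset on first use */
}

/* Return the subset holding cid. A CID seen before keeps its earlier
 * subset and code; otherwise it takes the next free code in the newest
 * subset, opening a fresh subset when that one is full. */
static int subset_for_cid(subset_map *m, unsigned cid)
{
    assert(cid < MAX_CID);
    if (m->subset_of[cid] != UNASSIGNED)
        return m->subset_of[cid];
    if (m->used_in_last == CODES_PER_SUBSET) {
        m->n_subsets++;
        m->used_in_last = 0;
    }
    m->subset_of[cid] = m->n_subsets - 1;
    m->code_of[cid] = (unsigned char)m->used_in_last++;
    return m->subset_of[cid];
}

/* Split one text string into runs that each use a single subset font.
 * Returns the number of runs; a new run starts whenever consecutive
 * glyphs live in different subsets (a real writer would select the
 * new subset font at that point). */
static int split_text(subset_map *m, const unsigned *cids, size_t n)
{
    int runs = 0, current = UNASSIGNED;
    for (size_t i = 0; i < n; i++) {
        int s = subset_for_cid(m, cids[i]);
        if (s != current) {
            runs++;
            current = s;
        }
    }
    return runs;
}
```

With this approach a string of 300 previously unseen glyphs is emitted as two runs across two subsets rather than failing, and a later string that repeats earlier glyphs reuses their existing codes. The trade-off is that a string mixing old and new glyphs may need several font switches; the sketch favours reuse over fewer switches.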

Finally, Acrobat handles these fonts differently. It embeds a single large
FontFile, a single FontDescriptor, and multiple Type 1 fonts, each of which
contains only a subset of the glyphs in its Encoding. This is more efficient
still, and would be 'nice to have'. However, it may well be impossible without
extensively rewriting the font handling code.
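The Acrobat layout can be pictured as several simple font dictionaries that all reference one FontDescriptor (and so one embedded FontFile), differing only in their Encoding. The C function below, write_subset_font, is hypothetical and only produces a skeleton: /FirstChar, /LastChar and /Widths, which a real Type 1 font dictionary needs, are left out, and the BaseFont and glyph names are placeholders. How Acrobat relates the names in each Encoding to the shared CFF program is not covered by this report.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: write a skeleton simple-font dictionary for one subset.
 * Every subset points at the same FontDescriptor object; only the
 * Encoding differs. The Differences array starts at code 0, so the
 * names are assigned codes 0, 1, 2, ... in order.
 * Returns the length snprintf reports; a value >= len means truncation. */
static int write_subset_font(char *buf, size_t len, const char *base_font,
                             int descriptor_obj,
                             const char *const *glyph_names, int n_glyphs)
{
    assert(buf != NULL && len > 0);
    int n = snprintf(buf, len,
                     "<< /Type /Font /Subtype /Type1 /BaseFont /%s\n"
                     "   /FontDescriptor %d 0 R\n"
                     "   /Encoding << /Type /Encoding /Differences [0",
                     base_font, descriptor_obj);
    for (int i = 0; i < n_glyphs && n >= 0 && (size_t)n < len; i++)
        n += snprintf(buf + n, len - n, " /%s", glyph_names[i]);
    if (n >= 0 && (size_t)n < len)
        n += snprintf(buf + n, len - n, "] >>\n>>\n");
    return n;
}
```

Calling it once per subset with the same descriptor_obj and a different list of glyph names gives the shape described above: one font program, many small encodings.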

Comment 1 Ken Sharp 2008-11-21 06:30:56 UTC

Created attachment 4617
LargeCFF-Font.zip

This test file demonstrates some of the issues and contains a usefully large
CFF font to serve as the basis for constructing further test files.