687869 – extracting embedded fonts

Status:	NOTIFIED LATER

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	Other Driver
Version:	8.14
Hardware:	All All

Importance:	P2 normal
Assignee:	Ray Johnston

URL:
Keywords:

Depends on:
Blocks:

Reported:	2004-12-20 08:53 UTC by Jack Moffitt
Modified:	2011-09-18 21:47 UTC
CC List:	1 user

See Also:
Customer:	400
Word Size:	---

Attachments
extractPDFfonts.ps (1.68 KB, application/postscript) 2005-01-07 18:41 UTC, Ray Johnston

Description Jack Moffitt 2004-12-20 08:53:22 UTC

From the customer:

Is there a way to "extract" embedded fonts from PDF / PS / EPS files?

Are there any rules / problems extracting embedded fonts for the user to
use in other applications?

I would only extract what is possible from a legal stand point and only
extract "whole" embedded fonts not any of the partial fonts.

Comment 1 Ray Johnston 2005-01-07 18:41:54 UTC

Created attachment 1138
extractPDFfonts.ps

This can be used to extract fonts (and optionally subsets) from a PDF.

It is up to the user to observe legal requirements.

Comment 2 Ray Johnston 2005-01-07 18:42:48 UTC

It is possible to extract fonts from PDF files, and one method to
extract fonts from PS files is to use Ghostscript to 'distill' the
PS into a PDF, then use the same extraction method.
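
The 'distill' step described above can be sketched as follows. This is only
a minimal sketch, assuming a `gs` executable on the PATH and the standard
`pdfwrite` device; the function names and file names are hypothetical and
are not part of the attachment.

```python
import subprocess


def distill_command(ps_path, pdf_path, gs="gs"):
    """Build a Ghostscript command line that converts PostScript to PDF."""
    return [
        gs,
        "-q",
        "-dBATCH",            # exit after processing the input file
        "-dNOPAUSE",          # do not prompt between pages
        "-sDEVICE=pdfwrite",  # write a PDF instead of rendering
        "-sOutputFile=" + pdf_path,
        ps_path,
    ]


def distill(ps_path, pdf_path, gs="gs"):
    """Run the conversion; raises CalledProcessError if gs fails."""
    subprocess.run(distill_command(ps_path, pdf_path, gs), check=True)
```

The resulting PDF can then be passed to the same extraction method. Note
that pdfwrite may subset fonts when it writes the PDF, depending on its
settings, so the output is worth checking before relying on it.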

I've attached a file that can be used with Ghostscript to extract fonts
from a PDF file. The synopsis is in the extractPDFfonts.ps file:

example usage:
  gs -q -dNODISPLAY extractPDFfonts.ps -c "(somefile.pdf) extractPDFfonts quit"
to extract embedded fonts and font subsets, use:
  gs -q -dExtractSubsets -dNODISPLAY extractPDFfonts.ps -c "(somefile.pdf) extractPDFfonts quit"

I've tested this and it seems to work. Note that if font subsets are present,
the filenames will contain '+' characters, but these should be OK on most
platforms. If subsets are extracted, no attempt is made to merge subsets.
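
For scripted use, the two command lines above can be assembled
programmatically. The following is a minimal sketch, assuming `gs` is on the
PATH and extractPDFfonts.ps is in the current directory; the helper names
`ps_string` and `extract_command` are hypothetical. The PDF file name is
passed as a PostScript string, so backslashes and parentheses in it need
escaping.

```python
import subprocess


def ps_string(text):
    """Quote text as a PostScript string literal (escapes backslashes and parentheses)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def extract_command(pdf_path, subsets=False, script="extractPDFfonts.ps", gs="gs"):
    """Build the gs command line shown in the example usage above."""
    cmd = [gs, "-q"]
    if subsets:
        cmd.append("-dExtractSubsets")  # also extract font subsets
    cmd += ["-dNODISPLAY", script, "-c",
            ps_string(pdf_path) + " extractPDFfonts quit"]
    return cmd


def extract_fonts(pdf_path, subsets=False):
    """Run the extraction; raises CalledProcessError if gs fails."""
    subprocess.run(extract_command(pdf_path, subsets), check=True)
```

Passing the arguments as a list avoids shell quoting problems with the
`-c` argument, which contains spaces and parentheses.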

Comment 3 Tony Teveris 2005-01-27 09:10:34 UTC

General info

I created the following files in AI 9.0:

90actionls-bed.pdf	TrueType font
90actionls-bed_sub.pdf
90actionls-not.pdf

90URWchan-bed.pdf	Type1 font
90URWchan-bed_sub.pdf
90URWchan-not.pdf

Running each file through the getPDFfontinfo.ps script gave me what I think is 
the correct info.

Running each file through the extractPDFfonts.ps script gave me the following:

ActionIs
URWChanceryL-MediItal
OQUYXN+ActionIs
QYXNPS+URWChanceryL-MediItal
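
The names with a '+' in them follow the PDF convention for subset fonts: a
tag of six uppercase letters, a plus sign, then the original font name. A
minimal sketch for separating the two parts (the function name is
hypothetical):

```python
import re

# Subset tag: exactly six uppercase letters followed by '+'.
SUBSET_TAG = re.compile(r"^([A-Z]{6})\+(.+)$")


def split_subset_name(name):
    """Return (tag, base_name); tag is None when the name is not a subset."""
    match = SUBSET_TAG.match(name)
    if match:
        return match.group(1), match.group(2)
    return None, name
```

This makes it possible to sort the extracted files into whole fonts and
subsets, which matters here because only whole fonts are wanted.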

Now if I put a file extension on these files so that the system can 
understand them, the ONLY one that I can do anything with is the full TrueType 
font (Actionls). The subset TrueType is not recognized by the system (W2K). I’m 
not sure what extension to put on the Type1 fonts (pfb, pfm, or something else?).
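
One way to choose an extension is to look at the first bytes of each
extracted file. The sketch below checks a few well-known font file
signatures; the function name is hypothetical, and it is an assumption that
the files written by extractPDFfonts.ps start with one of these signatures.
A .pfm file holds Windows font metrics rather than the font program itself,
so it is not a candidate for the extracted data.

```python
def guess_font_extension(data):
    """Guess a file extension from the leading bytes of a font file.

    Returns None when no known signature matches.
    """
    head = data[:16]
    if head[:4] in (b"\x00\x01\x00\x00", b"true"):
        return ".ttf"  # TrueType outlines in an sfnt container
    if head[:4] == b"OTTO":
        return ".otf"  # OpenType with CFF outlines
    if head[:4] == b"ttcf":
        return ".ttc"  # TrueType collection
    if head[:2] == b"\x80\x01":
        return ".pfb"  # Type 1, binary segmented form
    if head.startswith(b"%!PS-AdobeFont") or head.startswith(b"%!FontType1"):
        return ".pfa"  # Type 1, ASCII form
    return None


def guess_extension_for_file(path):
    """Read the start of a file and guess its font extension."""
    with open(path, "rb") as handle:
        return guess_font_extension(handle.read(16))
```

If no signature matches, the file may be a raw font program in some other
form, and it would need conversion before the system can install it.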

The whole idea behind this was to use the extracted fonts on the user’s system 
and have them accessible through the Gerber applications once installed as an 
appropriate font. This seems to be true only for the “full” TrueType font.


Now taking the TrueType files, opening them in AI CS, letting AI convert 
the job to its new internal format, and then saving gives me different results. 
I also may have to ask the Adobe Boards about this.

I cannot seem to find a way NOT to embed a font in a PDF file. For files that 
have embedded fonts (full or subset) the fontinfo script always reports a font 
type of CIDFontType2 no matter what type of font I use. When taking these files 
(CS versions) and running them through the extractor I get nothing.

So, are my findings correct, and can you shed any light on them? I’m not 
saying there is a problem. If this is all we can do then so be it; if not, what 
else is there?

Comment 4 Tony Teveris 2005-01-28 04:45:01 UTC

Doing some more digging, I see that AI CS's text engine is designed around 
Unicode, so I guess that's why the embedded font format is CID. Oh well, another 
roadblock to conquer.

Comment 5 Marcos H. Woehrmann 2011-09-18 21:47:49 UTC

Changing customer bugs that have been resolved more than a year ago to closed.