690608 – pdf2ps and ps2pdf conversion - problem with copying text


Summary: pdf2ps and ps2pdf conversion - problem with copying text

Status:	RESOLVED WORKSFORME

Alias:	None

Product:	Ghostscript
Classification:	Unclassified
Component:	General
Version:	8.61
Hardware:	PC Linux

Importance:	P4 normal
Assignee:	Default assignee

URL:
Keywords:

Depends on:
Blocks:

Reported:	2009-07-07 09:50 UTC by Thomas
Modified:	2009-07-08 07:58 UTC
CC List:	0 users

See Also:
Customer:
Word Size:	---

Attachments
test_in.pdf (101.56 KB, application/pdf) 2009-07-08 01:55 UTC, Thomas


Description Thomas 2009-07-07 09:50:56 UTC

It looks like the PDF file newly created by a pdf2ps && ps2pdf conversion can't be
searched, and you can't copy text from it.

Comment 1 Ken Sharp 2009-07-07 10:10:49 UTC

Hmm, well there can be many reasons for this, not least the fact that you appear
to be using quite an old version of Ghostscript.

I'd suggest upgrading and retrying, but if you post an example PostScript file I
can try it and let you know the results and, if this turns out to be expected
behaviour, why it happens.

Note that the dual conversion really isn't a terribly good idea, as you
potentially lose information on each conversion. In particular, the pdf2ps script
uses the pswrite (PostScript language level 1) device rather than the much more
capable ps2write device. Most likely the conversion via pswrite is discarding
information. If you are unable to copy text, this suggests that there is in fact
no text in the document, merely bitmaps.
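A rough way to test the "no text, merely bitmaps" theory on a round-tripped file is to look for font resources, since text in a PDF can only be drawn with a font. The sketch below is a heuristic only, and the helper name `has_font_resources` is hypothetical: it greps the raw file bytes, so it can miss fonts referenced only from compressed object streams, and a match does not guarantee that the text can be copied.

```shell
#!/bin/sh
# Hypothetical helper: heuristic check for font resources in a PDF.
# It searches the raw bytes for "/Font". Fonts that appear only inside
# compressed object streams will not be found, and a match does not
# prove the text is extractable (e.g. it may lack a ToUnicode mapping).
has_font_resources() {
    LC_ALL=C grep -q '/Font' "$1"
}
```

For example, `has_font_resources test_out.pdf || echo "no font resources found"` would flag an output file whose text has been turned into paths or images.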

Comment 2 Thomas 2009-07-08 01:55:20 UTC

Created attachment 5195 [details]
test_in.pdf

test PDF created with OpenOffice 3.1

Comment 3 Thomas 2009-07-08 02:02:02 UTC

And now when I run `pdf2ps test_in.pdf && ps2pdf test_in.ps test_out.pdf` the
test_out.pdf becomes unsearchable and uncopyable.

Comment 4 Thomas 2009-07-08 02:46:25 UTC

OK, a quick workaround for the problem seems to be adding the '-sDEVICE=ps2write'
option to pdf2ps.

But why isn't that device used by default, even in the latest version
(http://svn.ghostscript.com/ghostscript/trunk/gs/lib/pdf2ps)?
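The workaround described in this comment can be scripted as below. This is a sketch, not part of Ghostscript: the function name `roundtrip` and the `DRY_RUN` variable are hypothetical, and it assumes, as reported here, that pdf2ps accepts an extra `-sDEVICE=ps2write` option placed before the input file. With `DRY_RUN=1` it only prints the two commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper: PDF -> PS -> PDF round trip using ps2write
# instead of the pdf2ps default (pswrite).
# Usage: roundtrip input.pdf output.pdf
roundtrip() {
    in_pdf=$1
    out_pdf=$2
    ps_file=${in_pdf%.pdf}.ps      # test_in.pdf -> test_in.ps
    run=""
    [ "${DRY_RUN:-0}" = 1 ] && run=echo   # print instead of execute
    $run pdf2ps -sDEVICE=ps2write "$in_pdf" "$ps_file" &&
        $run ps2pdf "$ps_file" "$out_pdf"
}
```

Even so, the result is still a lossy two-step conversion, so it is worth checking the output file.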

Comment 5 Alex Cherepanov 2009-07-08 05:17:21 UTC

The pswrite device converts everything to paths and images. All text
information is discarded.

The pswrite device is used in the pdf2ps script for backward compatibility.
The PostScript generated by the pswrite device is rather simple and works on
most printers, including Level 1 ones.

We also have a pdf2ps2 script that uses the ps2write device.

Even with the ps2write device, the conversion is still lossy. PostScript has no
transparency, so transparent objects are converted to images.

If you want to convert PDF to PDF, you'd better do it in one step:
  ps2pdf src.pdf dst.pdf
Yes, ps2pdf also accepts PDF input, because Ghostscript auto-detects the
file type.

In short, Ghostscript works as designed.
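ps2pdf is a wrapper script around gs with the pdfwrite device, so the one-step conversion above can also be written as a direct gs call. The sketch below is only roughly equivalent, since ps2pdf sets further options of its own (such as a PDF compatibility level); the function name `pdf_to_pdf` and the `DRY_RUN` variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: one-step PDF -> PDF conversion with pdfwrite,
# calling gs directly. DRY_RUN=1 prints the command instead of running it.
# Usage: pdf_to_pdf src.pdf dst.pdf
pdf_to_pdf() {
    run=""
    [ "${DRY_RUN:-0}" = 1 ] && run=echo
    $run gs -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pdfwrite \
        "-sOutputFile=$2" "$1"
}
```

Because no PostScript intermediate is written, the text and transparency information that pswrite or ps2write would lose is not lost at that step.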

Comment 6 Thomas 2009-07-08 07:58:41 UTC

It all makes sense now. Thanks!