Many physics journals have put all of their older articles on the web as PDF files. The newer ones display without problems, but the older ones display very badly with gs or gs-based viewers. They seem to display much better with xpdf or Adobe Reader. Is there any way to improve the display quality, or is this an inherent problem? I am attaching a sample file. Thanks
Created attachment 3809: Sample pdf file
This file is one of the prevalent 'PDF in name only' PDFs that many lame applications (such as scanners) create. The PDF consists of a single image per page, as -dPDFDEBUG with gs shows:

    %Resolving: [3 0]
    << /Type /XObject /Subtype /Image /Name /I0
       /Filter [ /CCITTFaxDecode ]
       /Width 5169 /Height 7129 /BitsPerComponent 1
       /ColorSpace /DeviceGray /Length 4 0 R
       /DecodeParms [ << /Columns 5169 /Rows 7129 /K -1 /EndOfBlock false >> ] >>

Thus this text is rendered at approximately 720 dpi.

Running:

    gs -sFile=bug_689719.pdf toolbin/pdf_info.ps

shows:

    bug_689719.pdf has 3 pages.
    Producer: g42pdf.pl 1.0

By default Ghostscript doesn't perform any 'image smoothing', but Adobe Acrobat does. The image looks better on my screen when I force Ghostscript to use an image filter with:

    gs -dDOINTERPOLATE bug_689719.pdf
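As a rough check on the numbers above, here is a small Python sketch of the resolution arithmetic. The pixel dimensions come from the /Width and /Height entries in the debug output, and the 720 dpi figure is the estimate given above. The 96 dpi screen resolution is an assumption made only for illustration, and the function names are hypothetical.

```python
# Pixel dimensions taken from /Width and /Height in the -dPDFDEBUG output.
WIDTH_PX = 5169
HEIGHT_PX = 7129

IMAGE_DPI = 720    # approximate resolution estimated in the comment above
SCREEN_DPI = 96    # ASSUMPTION: a typical screen resolution, not from the report


def implied_size_inches(width_px, height_px, dpi):
    """Return the (width, height) in inches that the bitmap covers at `dpi`."""
    return width_px / dpi, height_px / dpi


def downscale_factor(image_dpi, display_dpi):
    """Return how many source pixels map onto one display pixel, per axis."""
    return image_dpi / display_dpi


w_in, h_in = implied_size_inches(WIDTH_PX, HEIGHT_PX, IMAGE_DPI)
factor = downscale_factor(IMAGE_DPI, SCREEN_DPI)

print(f"Area covered at {IMAGE_DPI} dpi: {w_in:.2f} x {h_in:.2f} in")
print(f"Downscale factor at {SCREEN_DPI} dpi: {factor}")
```

Under these assumptions each screen pixel has to stand in for several source pixels along each axis at 100% zoom, so how the viewer combines them matters a great deal for a 1-bit image.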
Ray, could you please clarify your 'PDF in name only' comment? I don't see why this file is any less a 'real' PDF file than any other. I agree that it's a simple PDF file, but surely there isn't a complexity requirement in the PDF spec.
Marcos requested clarification of my 'PDF in name only' comment. It's not an issue of compliance with the PDF spec, but rather a comment on a PDF that doesn't conform to the 'spirit' of creating a Portable Document Format that provides many advantages over other formats such as TIFF. At least this one doesn't use lossy (JPEG) compression.

For a page that looks like text, and is placed into an archive of documents, it seems that one might expect something more if the document is a PDF instead of a TIFF or JPEG. Most PDFs that have text are 'searchable', i.e., the text is in the PDF as PDF text operators, and usually with embedded fonts (or font subsets) to make the PDF portable. Also, these 'real' PDFs don't have a specific resolution 'baked in', so they print and display well at a wide range of resolutions/zoom factors.

A PDF that is nothing more than a PDF wrapper on a full-page bitmap is neither resolution independent nor searchable. Also, the file size is usually larger than that of a 'real' PDF.

Ghostscript originally put many fonts into PDFs as bitmap fonts, and that had the latter (resolution-specific) limitation, but at least it was text, although sometimes the Encoding would keep it from being searchable by tools that didn't handle Type 3 fonts correctly.

Note that there are tools to convert images into searchable PDFs. Scansoft is one that I've used that works quite well, although like most OCR-based s/w it may require manual 'cleanup'. Their cleanup tool is worthwhile as well. This software came with my Fujitsu ScanSnap scanner, but this scanner does default to non-OCR mode, creating exactly the type of PDF attached to this report.
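To illustrate why a 'baked in' resolution hurts on screen, here is a toy Python sketch that shrinks one scanline of a 1-bit image in two ways. It is only an illustration under stated assumptions: it is not the algorithm that Ghostscript, xpdf, or Acrobat actually uses, and the function names, the 32-pixel row, and the factor of 8 are all hypothetical.

```python
def subsample(row, factor):
    """Keep one source pixel out of every `factor` (no smoothing)."""
    return [row[i] for i in range(0, len(row), factor)]


def box_average(row, factor):
    """Average each group of `factor` source pixels into one gray value."""
    return [sum(row[i:i + factor]) / factor for i in range(0, len(row), factor)]


# One scanline of a 1-bit image: 1 = ink, 0 = paper.
# Two thin strokes, each 2 pixels wide (hypothetical data).
row = [0] * 32
row[3] = row[4] = 1
row[18] = row[19] = 1

FACTOR = 8  # ASSUMPTION: illustrative reduction factor, not from the report

print(subsample(row, FACTOR))    # [0, 0, 0, 0]
print(box_average(row, FACTOR))  # [0.25, 0.0, 0.25, 0.0]
```

In this toy case plain subsampling loses both strokes entirely, while averaging keeps them as light gray. A PDF with real text operators avoids the problem, because the glyphs are rasterized directly at whatever resolution the viewer needs.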