Summary: | ps2write does not include DSC comments | ||
---|---|---|---|
Product: | Ghostscript | Reporter: | Ralph Giles <ralph.giles> |
Component: | PS Writer | Assignee: | Ken Sharp <ken.sharp> |
Status: | RESOLVED FIXED | ||
Severity: | enhancement | CC: | till.kamppeter |
Priority: | P4 | Keywords: | bountiable |
Version: | master | ||
Hardware: | PC | ||
OS: | All | ||
Customer: | Word Size: | --- | |
Attachments: | launch_leaflet.pdf, out-ps2write-9.01.ps, out-ps2write-8.71.ps, 2008-07-Logistics.pdf |
Description
Ralph Giles
2006-01-15 14:56:30 UTC
Passing to Ken since he handles pdfwrite from now on. ps2write is a clone of pdfwrite.

*** Bug 690064 has been marked as a duplicate of this bug. ***

We discussed the problem of ps2write not being DSC-conforming today on IRC.

pswrite has a severe problem with font handling, so it is really best to deprecate it. It converts all glyphs to raster images. First, this makes output files much bigger and more awkward to process and render. Second, in certain cases, especially with Ghostscript's pxlmono/pxlcolor output devices, rendering of these glyphs fails (bug #690025).

ps2write does not do so, and therefore it does not trigger bug #690025 when the resulting PostScript is passed through Ghostscript's pxlmono/pxlcolor output devices.

Unfortunately, the missing DSC conformance prevents switching the pdftops CUPS filter to ps2write.

alexcher says that ps2write can easily be made DSC-compliant by copying the PDF part to a reusable stream and writing a DSC-compliant reader in the style of pdf2dsc. If someone could implement this, we could fix the problem.

Below is a copy of the IRC discussion (everything not about this topic is removed):

<tkamppeter> Anyone had a look at bug #690025 recently?
<tkamppeter> It does not only occur when Ghostscript applies the "pxlmono" driver to its own PostScript but also when it applies this driver to its own PDF. So it is most probably the driver.
<tkamppeter> And it breaks a PDF-based printing workflow, as many apps still send PS which needs to be converted by Ghostscript.
<kens> tkamppeter I did look today at #690025 but without any insight. I haven't so far been able to look at the pxl file to see what it looks like.
<kens> My assumption is that the problem is the PDF text rendering mode, Tr 3 or Tr 2 where the text is either stroked, or filled and then stroked. There is an open bug against that at present.
<kens> However, that being the case, it should also fail when the PDF or PostScript is sent to other drivers, which I gather it does not.
<tkamppeter> kens, the problem only occurs with the pxlmono (and probably also pxlcolor) driver and it occurs with both Ghostscript-created PostScript (from pswrite) and Ghostscript-created PDF (from pdfwrite).
<kens> tkamppeter Not surprised that PostScript and PDF from GS both cause the issue if it is indeed text rendering modes, as the code is the same. However I can't see why the pxl drivers would be the only ones to exhibit problems. I'll try again to get my PCL installation working and view the output.
<tkamppeter> It does AFAIK not occur with other output devices nor does it occur with pxlmono/pxlcolor and PostScript or PDF input coming from other sources than Ghostscript.
<kens> tkamppeter Till, I have managed to run the PostScript file and I do see the 's' in Grayscale rendered as a solid black square. Also the '1' in 100% is a solid rectangle either black or white.
<kens> This does not happen when rendering to any other device I've tried (same version of GS, same PostScript file, ie the output from pswrite).
<kens> I don't think this is a pswrite/pdfwrite problem per se, since it doesn't happen with other devices, and I don't think it can be text rendering modes, since we started with a PostScript file...
<henrys> kens:so you have a postscript file that only produces bad output on the pxl devices?
<kens> henrys It seems that way yes, weird huh ?
<ray_laptop> morning, all
<henrys> marcos has been handling pxl devices, assign it to him.
<kens> OK, I'll pass it his way, I can't really see what the problem is.
<henrys> kens:sometimes using -Z# and looking for problems in the gdevpx.c helps.
<kens> henrys tried that, no errors reported :-)
<henrys> there could also be a resolution issue if a bitmap got into the picture.
<rillian> henrys: now I have
<henrys> I guess -r600 works...
<kens> henrys there are a *lot* of bitmaps.
All the text is converted into imagemask operations when using pswrite. Haven't tried -r 600 yet, will give it a go as well.
<tkamppeter> kens, pswrite converts all text into bitmaps? Can this also be a reason why GS can run into problems with big input files?
<ray_laptop_> tkamppeter: yes, pswrite converts text to bitmaps, ps2write does "real text"
<henrys> kens:I guess marcos would be thankful if you could isolate the imagemask that is the source of the 's' being rendered as a square.
<tkamppeter> ray_laptop, and pdfwrite, does this do real text?
<ray_laptop> tkamppeter: yes, pdfwrite also does 'real text'
<ray_laptop> tkamppeter: to be more precise, pswrite creates a type 3 font of bitmaps and then paints text referring to the bitmaps in the pseudo font
<ray_laptop> tkamppeter: this is from memory -- it's been a long time since I looked into pswrite (caveat emptor)
<tkamppeter> ray_laptop: this is bad for the CUPS pdftops filter. If this filter is used, for example to print a PDF file on a PostScript printer, the PS file gets much bigger than the PDF input and so the pstops filter or the printer can crash on big input files due to lack of memory.
<kens> Apologies, been away updating the bug report. Text 'usually' ends up as bitmaps but I believe sufficiently large glyphs will be uncached and rendered as outlines (with pswrite)
<kens> pdfwrite will usually embed the original font, and therefore 'real' text. However, there are conditions under which it will resort to bitmaps as well.
<henrys> we are trying to deprecate pswrite and move folks to ps2write
<kens> tkamppeter the file size inflation and the lack of scalability are why I've been reluctant to have you continue to use pswrite.
<kens> However until I get a chance to revisit ps2write and make it DSC compliant there is no choice.
<ray_laptop> tkamppeter: so there is stuff that relies on DSC comments later in the workflow ?
<tkamppeter> ps2write will only be usable in the CUPS pdftops filter if we have the PDF workflow everywhere, as then the page management is done by pdftopdf and afterwards pdftops makes PS for PS printers. Then non-DSC PostScript is OK.
<tkamppeter> If a distro did not adopt the PDF workflow the pdftops filter is run as the first filter and afterwards page management is done by pstops. pstops needs DSC-compliant PostScript.
<kens> tkamppeter are you sure that this also occurs when using pdfwrite ? I just tried the testpage-a4.ps file here, running to PDF then converting the PDF to PXL using the pxlmono driver, this works OK.
<tkamppeter> kens is this exactly what I did in comment #16 of bug #690025? There the "s" in "grayscale" gets a black square.
<tkamppeter> Ghostscript is current trunk on Jaunty.
<kens> tkamppeter your comment #16 describes using pswrite, not pdfwrite.
<kens> Sorry, the description says pdfwrite, but the command line says pswrite.
<alexcher> ps2write can be easily made DSC-compliant by copying PDF part to a reusable stream and writing DSC-compliant reader in the style of pdf2dsc. Do we want this?
<kens> If I use PDF as the input, then it is OK, if I convert the same PDF to PostScript using pswrite then I see the identical problem to starting with the original pswrite output.
<kens> alexcher yes we do want this, we need ps2write to be DSC compliant
<alexcher> But reusable streams are a level 3 feature and the whole file is stored in the core.
<kens> Would be nice to have level 2 DSC output too of course.
<tkamppeter> kens, now I see, this test is wrong, seems that I converted PostScript to PostScript with the first command line ..
<kens> tkamppeter agreed, but while using PDF as an input works (for me) converting the PDF to PostScript gives the same wrong result.
<tkamppeter> kens, if I correct the test of comment #16, using pdfwrite in the first line I get correct output. I will need another input file.
<kens> tkamppeter Given that there's definitely a problem with the pswrite-produced PostScript and the pxl driver, maybe we should just concentrate on that for now.
<henrys> tkamppeter or kens:can you simplify the test to 1 character for marcos, that would let him focus on the driver issue.
<tkamppeter> kens, in the linked Ubuntu bug it happens when incoming PS is turned to PDF by the pstopdf CUPS filter (calls ps2pdf13) and afterwards fed into GS with the pxlmono driver.
<kens> tkamppeter I was just using regular GS and the pdfwrite driver so I hadn't restricted the version. Possibly there is some reason for this. Could be converting fonts to bitmaps, which might explain the commonality of the issue. I'll try that too.
<kens> tkamppeter just setting -dCompatibilityLevel=1.3 doesn't cause a problem for me, the PDF is still OK with pxlmono, will look further tomorrow.
<tkamppeter> kens, I will ask the Ubuntu users on the Ubuntu bug to supply sample input files for me, as I cannot reproduce the problem with pdfwrite.
<kens> tkamppeter, OK thanks Till, in the meantime I will reduce the PostScript problem and hope Marcos will work on it.
<tkamppeter> ray_laptop, kens: With ps2write there are no squares, but the problem is that it is not DSC-conforming.
<kens> tkamppeter yes, we know that ps2write is not DSC compliant. It's a useful clue that it is OK though, like the PDF file, it suggests the problem is the fact that text has been turned into images.
<kens> henrys When do you want to start ? ;-)
<tkamppeter> DSC conformance is also needed for the PDF workflow, as then the cpdftocps CUPS filter converts PDF to PS at first (using the pdftops CUPS filter) and then applies the PPD options to the PS data (using the CUPS pstops filter, which requires DSC compliance).
<tkamppeter> This is needed for PS printers.
<kens> tkamppeter I think we just need to accept that we need to make ps2write DSC compliant. It's on my little list of things to do.
<tkamppeter> So replacing pswrite by ps2write in the CUPS pdftops filter is not (yet) possible.
<tkamppeter> kens, In which time frame will this happen?
<kens> tkamppeter I'm afraid there is no time frame, it's an enhancement, and there are lots of those to do. Mostly from commercial customers.

Revisions 11827, 11828, 11835, 11838, 11938, 11946, 11950, 11951, 11952.

Taken together these modify ps2write so that it is capable of emitting DSC-compliant PostScript, and does so by default. Also updated documentation and scripts; pswrite is now deprecated.

Tested by running files through the currently released (GS 9.0) ps2write and the modified ps2write, then rendering both sets of PostScript output and comparing the results. The new code does not appear to introduce any errors in the 2700 test files used.

Also tested the DSC-compliant output with GSView, psselect and psnup, which seem to perform acceptably.

Till, it would be *really* useful to get some insight on the usage with CUPS, whether the output is now acceptable for this purpose, and if not what more needs to be done. I'm leaving the bug open for now in the hope of more feedback.

Now I have installed the current SVN snapshot of Ghostscript onto my Natty system. I have also built CUPS with the "--with-pdftops=/usr/bin/gs" option for "./configure" and with filter/pdftops.c patched so that Ghostscript is used with "ps2write" instead of "pswrite". I have also deactivated the "pdftopdf" filter to get the old PostScript-based filter workflow, as otherwise the pdftops CUPS filter is never used.

Now I have sent a PDF job (file attached) to a PostScript printer and the result is that landscape pages are not rotated to fit the paper and images and fonts are missing.
So I tried to isolate the problem by running Ghostscript manually:

cat launch_leaflet.pdf | gs -dPARANOIDSAFER -dNOPAUSE -dBATCH -sDEVICE=ps2write -sstdout=%stderr -sOutputFile=%stdout -_ > out.ps

out.ps (also attached) shows the same problem as my first test with CUPS: Missing fonts and pictures, landscape pages not rotated.

If I use good old "pswrite", the conversion takes longer but the output is absolutely correct.

The output of "ps2write" is not perfectly DSC-conforming. Comments seem to be correct, but not the maximum line length (and it also contains binary data):

till@till:~/ubuntu/cups/test/cups-1.4.5$ cupstestdsc out.ps
out.ps: FAIL
    Line 29303 is longer than 255 characters (1022)!
        REF: Page 25, Line Length
    Saw 19 lines that exceeded 255 characters!
    Warning: file contains binary data!
till@till:~/ubuntu/cups/test/cups-1.4.5$

"pswrite" is perfectly DSC-conforming:

till@till:~/ubuntu/cups/test/cups-1.4.5$ cupstestdsc out.ps
out.ps: PASS
till@till:~/ubuntu/cups/test/cups-1.4.5$

If I use Ghostscript 8.71 as it comes with Maverick, "ps2write" does not give DSC-conforming PostScript at all, as expected:

till@eee-pc:~$ cupstestdsc out.ps
out.ps: FAIL
    Missing %!PS-Adobe-3.0 on first line!
        REF: Page 17, 3.1 Conforming Documents
till@eee-pc:~$ less out.ps

With 8.71, the output of "ps2write" is somewhat better: landscape pages are still not rotated and pictures are still missing, but fonts appear correctly.

So in terms of DSC-conforming output the missing parts are the maximum line length and the binary data, but to accept "ps2write" as part of a general print filter, there are independent problems which also need to get fixed.

Created attachment 7054 [details]
launch_leaflet.pdf
Input file for the tests described in the previous comment. Pages are landscape and there are many images inside.
Created attachment 7055 [details]
out-ps2write-9.01.ps
Output of the new "ps2write" of GS 9.01 (current snapshot).
Created attachment 7057 [details]
out-ps2write-8.71.ps
Output of the old "ps2write" in GS 8.71 (from Maverick).
Snapshots are of SVN rev 11960.

Testing more PDF input files, the only DSC-spec violation I get is lines exceeding the maximum line length. I also often get the warning about the output file containing binary data.

Created attachment 7058 [details]
2008-07-Logistics.pdf
Another input file where fonts are not correctly rendered by the "ps2write" device in GS 9.01. Output file is 3.4 MB, whereas the output file of 8.71 displays the fonts correctly but is 55 MB.
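
The two complaints that cupstestdsc reports in the tests above (lines longer than 255 characters, and binary data) can be approximated with a few lines of Python. This is only a rough sketch for spot-checking ps2write output, not cupstestdsc's actual logic: the function name `check_dsc_lines` is made up for illustration, and "binary" is assumed to mean any byte outside printable ASCII, tab, CR and LF.

```python
MAX_DSC_LINE = 255  # line-length limit quoted in the cupstestdsc output


def check_dsc_lines(data: bytes, limit: int = MAX_DSC_LINE):
    """Return (long_lines, has_binary) for the raw bytes of a PostScript file.

    long_lines is a list of (line_number, length) for lines longer than limit.
    has_binary is True if any byte is outside printable ASCII, tab, CR, LF
    (an assumed definition, not necessarily the one cupstestdsc uses).
    """
    long_lines = []
    # bytes.splitlines() splits on LF, CR and CRLF and drops the terminators.
    for number, line in enumerate(data.splitlines(), start=1):
        if len(line) > limit:
            long_lines.append((number, len(line)))

    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    has_binary = any(byte not in allowed for byte in data)
    return long_lines, has_binary
```

To try it on a converted file, read out.ps in binary mode and pass the bytes to `check_dsc_lines`; a trailing 0x04 byte, for instance, would set `has_binary` under the definition assumed here.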
(In reply to comment #5)
> out.ps (also attached) shows the same problem as my first test with CUPS:
> Missing fonts and pictures, landscape pages not rotated.

I haven't run the job yet, but I'm not clear on why the pages should be rotated by ps2write. If the pages need to be rotated in order to fit the printer, then that should be handled either by the printer itself selecting media based on the setpagedevice request, or by the document manager dealing with the problem (assuming that the BoundingBox comments are correct).

I'll run the file and look into this later.

> The output of "ps2write" is not perfectly DSC-conforming. Comments seem to be
> correct, but not the maximum line length (and it also contains binary data):

As far as I'm aware it is legal to use binary data in a DSC conforming document. Indeed the DSC specification includes the %%BeginBinary: and %%EndBinary: comments. Possibly the long line length is caused by the use of binary data, and I'll look into why the line is more than 255 characters.

Perhaps I did not express myself well with the missing rotation of Landscape images. The problem is the following: If I simply do "gs out.ps", I get a Portrait-oriented window with the Landscape-oriented page in it. So the right part of the page is cut off. And I get the same printed on paper. Looks like the output has a wrong BoundingBox.

(In reply to comment #13)
> Perhaps I did not express myself well with the missing rotation of Landscape
> images. The problem is the following: If I do simply "gs out.ps". I get a
> Portrait-oriented window with the Landscape-oriented page in it. So the right
> part of the page is cut off. And the same I get printed on paper. Looks like
> the output has a wrong BoundingBox.

I see the problems with launch_leaflet, haven't tried the other one yet. Clearly the PostScript is not requesting a page size, which is probably due to a change I made to make the output portable enough to use with psnup.
The text is also clearly wrong, and I've seen that in other places too so I obviously need to tackle something there. Could be a conversion to type 3 that's the issue.

A quick check with pdfwrite shows none of these problems, and since ps2write is based very closely on pdfwrite (much of the code is the same) this shows that the information is present, so it should be straightforward to resolve the issues.

For now I'll assume the 'logistics' file is the same, but I'll test it when I have the launch leaflet working properly, in case it exposes new problems.

revision 11974 should fix the page size issue.

This was due to the presence of extra functionality in ps2write, which isn't present in pswrite. With ps2write the output PostScript file does not set the page size by default, but allows the page to be rotated/scaled/cropped depending on various switches. This doesn't really make sense in a DSC environment, so I've initialised the 'SetPageSize' flag to true, which means that the output PostScript will request an appropriate page size.

With this change there do not appear to be any further problems with launch_leaflet.pdf for me: the page is the correct size, and the text and images all appear. Because the source PDF uses TrueType fonts they are converted into type 3 bitmaps, which does cause Acrobat to display them somewhat differently to the original (this can be minimised by turning off all of Acrobat's 'enhancements' for text viewing).

2008-07-Logistics.pdf now causes a PostScript error after conversion, and page 1 is rendered incorrectly, so I'll look into that one.

Till, the warning you've quoted says the maximum line length is exceeded on page 25 (19 lines), but both these files have fewer pages than that. Can you tell me which page(s) from these PDF files have incorrect line lengths after DSC PostScript conversion please ? Or if these files are unaffected, can I see one that does cause this please ?
(In reply to comment #15)
> 2008-07-Logistics.pdf now causes a PostScript error after conversion, and page
> 1 is rendered incorrectly, so I'll look into that one.

This looks like the problem is that the file includes a Symbolic TrueType font. The opdfread prolog seems to be applying a standard Encoding for this font, which simply isn't going to work. The text uses direct glyph IDs (eg 0x01, 0x02 etc) which map to /.notdef glyphs in the standard Encoding.

I think the proper approach is to map the character codes directly to GIDs in this case but I'll need to do some more investigation to be sure.

Updated to rev 11978:

Landscape output of both files is working correctly now. Text of Launch Leaflet displays correctly, but the text of Logistics is broken as before. In Launch Leaflet the images which were missing before are still missing.

Line length issue still there:

Launch Leaflet:

till@till:~/ghostscript/gpl/testfiles$ cupstestdsc out.ps
out.ps: FAIL
    Line 29304 is longer than 255 characters (1022)!
        REF: Page 25, Line Length
    Saw 19 lines that exceeded 255 characters!
    Warning: file contains binary data!
till@till:~/ghostscript/gpl/testfiles$

Logistics:

till@till:~/ghostscript/gpl/testfiles$ cupstestdsc out.ps
out.ps: FAIL
    Line 8157 is longer than 255 characters (295)!
        REF: Page 25, Line Length
    Saw 75 lines that exceeded 255 characters!
    Warning: file contains binary data!
till@till:~/ghostscript/gpl/testfiles$

NOTE: The reference to page 25 is not a reference to the page in the PostScript file, but a reference to Adobe's document about the DSC specs. The only reference to the file is the line number in the second line of the output.

(In reply to comment #17)
> Updated to rev 11978:
>
> Landscape output of both files is working correctly now. Text of Launch Leaflet
> displays correctly, but of Logistics the text is broken as before.

Yes, that's the Symbolic TrueType font, I'll get back to that in the New Year.
The problem seems to be that the prolog is applying a standard Encoding to a symbolic font, which is clearly incorrect.

> In Launch
> Leaflet the images which were missing before are still missing.

Hmm, I didn't notice any missing images, can you point them out to me ?

> Line length issue still there:
[snip]
> NOTE: The reference to page 25 is not a reference to the page in the PostScript
> file, but a reference to Adobe's document about the DSC specs.

Aha! Brilliant, thanks. I suspect this is the type 3 font bitmaps, I think I can fix that easily enough if that's the case.

(In reply to comment #18)
> (In reply to comment #17)
> > In Launch
> > Leaflet the images which were missing before are still missing.
>
> Hmm, I didn't notice any missing images, can you point them out to me ?

"x11alpha" must die. If I use the "x11" output device, Launch Leaflet works perfectly.

(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #17)
> > > In Launch
> > > Leaflet the images which were missing before are still missing.
> >
> > Hmm, I didn't notice any missing images, can you point them out to me ?
>
> "x11alpha" must die. If I use the "x11" output device, Launch Leaflet works
> perfectly.

Ah, thanks for checking that Till!

revision 11988: http://ghostscript.com/pipermail/gs-cvs/2011-January/012077.html should fix the text problem with 2008-07-Logistics.pdf.

An earlier change to pdfwrite (to conform to the PDF spec) meant that we weren't writing an /Encoding for symbolic TrueType fonts. This is inappropriate for ps2write because if the font has no Encoding it assumes StandardEncoding and that will not work properly with a Symbolic font. Unlike PDF, PostScript has no notion of Symbolic fonts.

So just the line length left to deal with.

The (first) line length problem in logistics is the PDF Title. The (first) line length problem in launch_leaflet is a string lookup table for an /Indexed CMYK colour space.
Neither of which is what I expected :-)

Looks like I'll need to investigate string emission in ps2write. If the string exceeds 255 bytes I'll need to do 'something' about it. I think it's safe to include CR/LF white space inside a string delimited by '(' and ')' as they have no effect; if you want CR/LF in such a string you need to use \r and \n.

I'll look into it tomorrow.

I'd be a little nervous that some PS processors could be confused if we put line breaks inside a string. What if the next line starts with '%%' ?

Although it is less efficient w.r.t. the size of the output, putting strings in as 'hex' would avoid the issue. For example (string) as <737472696E67>, since hex strings _can_ contain whitespace (for line breaks), and this avoids all of the nonsense of escaping "special" characters such as ( ) \ etc. It can actually be more efficient for strings with characters beyond <7f>, which must be encoded in octal (at a cost of 4 bytes per character).

(In reply to comment #23)
> I'd be a little nervous that some PS processors could be confused if we put
> line breaks inside a string. What if the next line starts with '%%' ?

If it's reading PostScript then it will parse the () correctly; if it's not reading PostScript but scanning then it's exceedingly unlikely to parse out a meaningful DSC comment (except possibly from a PostScript representation of the DSC specification ;-). Finding %% on its own should do no more than pass on to the next line.

In addition, the page contents are by default compressed and stored in hex strings, so it's only the reusable objects (colour spaces etc) and 'furniture' like the document title which are written like this.

> Although it is less efficient w.r.t. the size of the output, putting strings
> in as 'hex' would avoid the issue.

We use a single routine to write all strings for both ps2write and pdfwrite, and I'm not certain that PDF allows us to freely substitute hex strings for ASCII strings, unlike PostScript which does.
As I said, I need to look at the problem.

revision 11994 should fix the line length issue.

When emitting DSC-compliant PostScript from ps2write we now write strings as hex strings. If the string length exceeds 255 bytes we insert newlines as appropriate. We also prepend and append newlines to the string to ensure that preceding or trailing data doesn't flow over the line length.

pdfwrite is unchanged by this and continues to emit strings as escaped ASCII.

I have tested with rev 12000 and it works great! I get only

till@till:~/ghostscript/gpl/testfiles$ cupstestdsc out.ps
out.ps: PASS
    Warning: file contains binary data!
till@till:~/ghostscript/gpl/testfiles$ gs out.ps

for both input files and if I display out.ps with Ghostscript the screen output is correct.

(In reply to comment #26)
> I have tested with rev 12000 and it works great! I get only
>
> till@till:~/ghostscript/gpl/testfiles$ cupstestdsc out.ps
> out.ps: PASS
> Warning: file contains binary data!
> till@till:~/ghostscript/gpl/testfiles$ gs out.ps

As noted on IRC I think this is the trailing 0x04 (EOF) character, it's the only binary I can identify in the file. I don't think this is really a problem.

> for both input files and if I display out.ps with Ghostscript the screen output
> is correct.

Excellent! Many thanks for your help with this Till.

At present the file is more likely to be DSC-compliant if it uses CompressPages=true (the default now) as this uses LZW compression and then the ASCII85 filter; this filter not only writes the output as non-binary, it also keeps the line length under 80 characters. Without this there is currently no limit on the line length, though I think only long lines of text will break it. I intend to look at that condition next; it's minor but it would be useful not to have to compress the pages all the time.

There may well be bugs still in the ps2write output, but a number of our customers use it so hopefully not too many.
If you don't mind I will close this issue now, and we can track any bugs that do crop up as new Bugzilla items. If you do come across any bugs, do please report them!

It is indeed the trailing 0x04 which causes the binary data warning. I tried it out by removing the 0x04 in Emacs.

For print spoolers there is no need to add this 0x04. Perhaps it is even better to remove it, as the spoolers could add PJL commands at the end of the job and they should be considered part of the same job to avoid unwanted effects like a trailing blank page.

WDYT?

By definition, PJL (which must be preceded by UEL) aborts/ends previous jobs, so 0x04 followed by UEL or PJL should not cause any extra jobs or blank pages by any conforming spooler or processor.

revision 12002 doesn't emit the trailing 0x04 (EOF) byte when producing DSC compliant output. I don't think this will cause any problems, just have to wait and see if anyone complains.
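
The hex-string approach described for revision 11994 can be illustrated with a short Python sketch. This is not the Ghostscript C code: the helper name `ps_hex_string` and the 64-digit wrap width are assumptions chosen for illustration, and the extra newline before and after the string mentioned in that comment is left to the caller.

```python
def ps_hex_string(data: bytes, width: int = 64) -> str:
    """Encode bytes as a PostScript hex string wrapped onto short lines.

    width is the number of hex digits per output line (an arbitrary choice
    here, comfortably below the 255-character DSC limit).
    """
    digits = data.hex().upper()
    # Split the digit run into fixed-size chunks; whitespace between the
    # angle brackets is ignored when the hex string is scanned.
    chunks = [digits[i:i + width] for i in range(0, len(digits), width)]
    return "<" + "\n".join(chunks) + ">"
```

For example, `ps_hex_string(b"string")` gives `<737472696E67>`, and a 1022-byte lookup table comes out as a run of lines of at most 65 characters each, with no need to escape ( ) or \ characters.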