Bug 704872 - Problem rendering pdf to ps and back to pdf using ps2write and pdfwrite
Summary: Problem rendering pdf to ps and back to pdf using ps2write and pdfwrite
Status: RESOLVED INVALID
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PS Writer
Version: 9.50
Hardware: PC Linux
Importance: P4 normal
Assignee: Default assignee
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-02-03 09:45 UTC by David Marogy
Modified: 2022-02-03 14:28 UTC
0 users

See Also:
Customer:
Word Size: ---


Attachments
PDF from Weasyprint version 54 and 52, converted ps, hack script (2.09 MB, application/zip)
2022-02-03 09:45 UTC, David Marogy
Description David Marogy 2022-02-03 09:45:52 UTC
Created attachment 22028 [details]
PDF from Weasyprint version 54 and 52, converted  ps, hack script

Hello,

I have a problem converting my PDF from RGB to CMYK with plain black.
After the conversion, my PDF is always rendered as a single image instead of separate objects such as text, images and vectors.

My steps to reproduce are: create a PDF using WeasyPrint version 54, convert the PDF to PS using ps2write, then convert the PS back to PDF with CMYK and plain black. What is really confusing is that the problem does not occur if I use WeasyPrint version 52.5 to create the PDF. I tried contacting WeasyPrint support, but they did not really understand why this happens. See issue: https://github.com/Kozea/WeasyPrint/issues/1501


It seems this problem occurs when the PDF is converted to PS using Ghostscript. I need this step to convert RGB to CMYK with plain black instead of rich black. Ghostscript does not report any error:

GPL Ghostscript 9.50 (2019-10-15)
Copyright (C) 2019 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
Processing pages 1 through 1.
Page 1
GPL Ghostscript 9.50 (2019-10-15)
Copyright (C) 2019 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.



The same problem occurred when there was a transparent image, but I know that PostScript cannot handle transparency very well, so I tried it without any.

I have attached a zip with my latest test, containing all the converted PDF and PostScript files and the hack for converting rich black to plain black. Perhaps you can find a difference between them. It would be nice if we could solve this.



Here is my Ghostscript code:

import os
import subprocess
from tempfile import NamedTemporaryFile


def convert_pdf_to_cmyk(pdf_bytes: bytes) -> bytes:
    if pdf_bytes is not None:
        with NamedTemporaryFile(prefix="touriprint_pdf_", suffix=".pdf") as rgb_pdf_file:
            rgb_pdf_file.write(pdf_bytes)
            # Flush so that gs, which opens the file by name, sees the full contents.
            rgb_pdf_file.flush()
            # Converting pdf from RGB to CMYK
            # https://stackoverflow.com/questions/6241282/converting-pdf-to-cmyk-with-identify-recognizing-cmyk
            # HACK: to get plain CMYK black instead of rich black, convert the PDF to
            # PostScript first, then back to PDF using a colour conversion script.
            # By default RGB->CMYK creates rich black instead of plain K black, see
            # https://stackoverflow.com/questions/6248563/converting-any-pdf-to-black-k-only-cmyk/9024346#9024346
            with NamedTemporaryFile(prefix="ghostscript_", suffix=".ps") as ghostscript_file:
                command = [
                    "gs",
                    "-q",
                    "-o",
                    ghostscript_file.name,
                    "-dNOPAUSE",
                    "-dBATCH",
                    "-sDEVICE=ps2write",
                    rgb_pdf_file.name,
                ]
                subprocess.check_call(command)
                with NamedTemporaryFile(prefix="converted_", suffix=".pdf") as converted_pdf_file:
                    command = [
                        "gs",
                        "-q",
                        "-o",
                        converted_pdf_file.name,
                        "-sDEVICE=pdfwrite",
                        "-dNOPAUSE",
                        "-dBATCH",
                        "-sProcessColorModel=DeviceCMYK",
                        "-sColorConversionStrategy=CMYK",
                        "-sColorConversionStrategyForImages=CMYK",
                        "-dOverrideICC",
                        "-dEncodeColorImages=true",
                        os.path.join(DOCUMENT_DATA_DIR, "rgb_to_plain_cmyk_black.ps"),
                        ghostscript_file.name,
                    ]
                    subprocess.check_call(command)
                    pdf_bytes = converted_pdf_file.read()
    return pdf_bytes


Best regards,
David
Comment 1 Ken Sharp 2022-02-03 10:16:12 UTC
(In reply to David Marogy from comment #0)

Your PDF files contain transparency, both of them.

In fact, the transparency is pointless, since all the graphics states set the alpha to 1 and there is no other transparency in the file, but we can't know that without processing the whole file (and even then there are cases where it would be difficult, as well as highly time-consuming, to be certain, such as examining the value of every image sample).

Now we do try to avoid pointless transparency, but there are limits.

The '52' file is created using Cairo, which is a well known producer of this sort of thing and we can detect that it doesn't really need the transparency. 

The '54' file is produced in an utterly different manner and I suspect is using a totally different PDF engine. In this case the transparency definition has moved from the page level to a Form XObject in the depths of the document, and we can no longer detect the fact that it does not truly use transparency.


Now as you later note, the PostScript language does not support PDF transparency, and so the only way to reliably represent the content of the PDF file in PostScript is to render it to an image. So that's what the ps2write device does.

So, for your '52' file we detect the pointless transparency and ignore it, resulting in a vector PostScript output. For your '54' file we cannot eliminate the transparency and end up rendering the content to an image.
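
As a rough illustration of the point above, the following sketch scans the raw bytes of a PDF for a transparency group entry (`/S /Transparency`). It is only a crude heuristic and not how Ghostscript detects transparency: the function name is hypothetical, it cannot see dictionaries stored inside compressed object streams, it ignores other transparency features such as soft masks, and it says nothing about whether the transparency is actually needed.

```python
import re

# Matches the subtype entry of a transparency group dictionary,
# e.g. "<< /Type /Group /S /Transparency /CS /DeviceRGB >>".
_TRANSPARENCY_GROUP = re.compile(rb"/S\s*/Transparency\b")


def may_contain_transparency_group(pdf_bytes: bytes) -> bool:
    """Return True if the uncompressed PDF bytes mention a transparency group.

    Heuristic only: a False result does not prove the file is free of
    transparency, because compressed object streams are not inspected.
    """
    return _TRANSPARENCY_GROUP.search(pdf_bytes) is not None
```

A True result would suggest that ps2write may have to fall back to rendering the page as an image, as described above.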


> It seems this problem occurs if you convert the PDF to PS using
> Ghostscript. I need this step to convert RGB to CMYK with plain black
> instead of rich black.

I can't see any reason why you 'need' to convert to PostScript. Your 'hack' doesn't seem to do anything useful, it simply writes the output to PostScript. If you actually need to go via PostScript for some reason then you are going to have to do something about your input file using transparency.


> Ghostscript doesn't really say that an error occurred:

Well it wouldn't, there was no error.


> The same problem occurred if there was a transparent image, but I know that
> PostScript cannot handle transparency very well, so I tried it without
> any.

Nevertheless, both your PDF files contain transparency operations.


> Here is my ghostscript code:

This clearly isn't the Ghostscript API, perhaps you are using some other wrapper such as Ghostscript.NET (which is nothing to do with us). In any event I can't do anything with this. If you think there is a problem you need to supply us with steps which involve using the Ghostscript executable to reproduce them.

I do not see any evidence of a bug here.
Comment 2 David Marogy 2022-02-03 13:08:11 UTC
Thanks for your quick reply.

(In reply to Ken Sharp from comment #1)
> (In reply to David Marogy from comment #0)
> 
> Your PDF files contain transparency, both of them.
> 
Okay? I didn't expect that. So the PDF is created with images having an alpha channel set to 1? I used images in RGB format, without transparency, so I didn't expect that.
> In fact, the transparency is pointless, since all the graphics states set
> the alpha to 1 and there is no other transparency in the file, but we can't
> know that without processing the whole file (and even then there are cases
> where it would be difficult, as well as highly time-consuming, to be
> certain, such as examining the value of every image sample).
> 
> Now we do try to avoid pointless transparency, but there are limits.
> 
> The '52' file is created using Cairo, which is a well known producer of this
> sort of thing and we can detect that it doesn't really need the
> transparency. 
> 
> The '54' file is produced in an utterly different manner and I suspect is
> using a totally different PDF engine. In this case the transparency
> definition has moved from the page level to a Form XObject in the depths of
> the document, and we can no longer detect the fact that it does not truly
> use transparency.
> 
Correct, WeasyPrint switched from using Cairo to their own PDF creation engine. Perhaps they will change this if it is possible, and hopefully this will work again.
> 
> Now as you later note, the PostScript language does not support PDF
> transparency, and so the only way to reliably represent the content of the
> PDF file in PostScript is to render it to an image. So that's what the
> ps2write device does.
> 
> So, for your '52' file we detect the pointless transparency and ignore it,
> resulting in a vector PostScript output. for your '54' file we cannot
> eliminate the transparency and end up rendering the content to an image.
> 
> 
> > It seems this problem occures if you convert the PDF to an PS using
> > ghostscript. I need this step to convert rgb to cmyk with plain black
> > instead of rich black.
> 
> I can't see any reason why you 'need' to convert to PostScript. Your 'hack'
> doesn't seem to do anything useful, it simply writes the output to
> PostScript. If you actually need to go via PostScript for some reason then
> you are going to have to do something about your input file using
> transparency.
> 
Yeah, I first convert the PDF to PS, and when I then convert it back to PDF the hack.ps file is used so that RGB black is mapped to CMYK plain black. Without the step of converting to PS, my converted PDF ends up as a CMYK PDF with rich black. Unfortunately I need it in plain black for printing.
> 
> > Ghostscript doesn't realy say that an error occured:
> 
> Well it wouldn't, there was no error.
> 
> 
> > The same problem occured if there was a transparend image, but i know that
> > postscript cannot handle transparency very well, so i tried it without any
> > one.
> 
> Nevertheless, both your PDF files contain transparency operations.
> 
> 
> > Here is my ghostscript code:
> 
> This clearly isn't the Ghostscript API, perhaps you are using some other
> wrapper such as Ghostscript.NET (which is nothing to do with us). In any
> event I can't do anything with this. If you think there is a problem you
> need to supply us with steps which involve using the Ghostscript executable
> to reproduce them.
> 
Ah sorry, I used Python to call the Ghostscript command-line tool from my application. In the end there are two steps.

The first is to convert the PDF to PS:
gs -q -o ghostscript_file.ps -dNOPAUSE -dBATCH -sDEVICE=ps2write rgb_file.pdf

The next step is to convert the result from above, together with rgb_to_plain_cmyk_black.ps, to the converted.pdf file:

gs -q -o converted.pdf -sDEVICE=pdfwrite -dNOPAUSE -dBATCH -sProcessColorModel=DeviceCMYK -sColorConversionStrategy=CMYK -sColorConversionStrategyForImages=CMYK -dOverrideICC -dEncodeColorImages=true rgb_to_plain_cmyk_black.ps ghostscript_file.ps


If I don't convert it to PS before converting it to a CMYK PDF using the hack, the result is a PDF with text in rich black instead of plain black. Unfortunately I need it in plain black.
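
To make the difference between rich black and plain black concrete, here is a minimal sketch of a naive RGB to CMYK conversion with full black generation and under-colour removal. This is not what Ghostscript's ICC-based conversion does, and it is not the contents of rgb_to_plain_cmyk_black.ps; the function names are hypothetical and the formula is only the textbook one.

```python
def rgb_to_cmyk_full_gcr(r: float, g: float, b: float) -> tuple:
    """Naive RGB -> CMYK where all common grey is moved into K.

    Inputs and outputs are in the range 0.0 .. 1.0.
    """
    c, m, y = 1.0 - r, 1.0 - g, 1.0 - b
    k = min(c, m, y)          # black generation: take the shared grey component
    return (c - k, m - k, y - k, k)  # under-colour removal


def is_plain_black(cmyk: tuple) -> bool:
    """True for K-only black (0, 0, 0, 1); rich black has C, M or Y > 0."""
    c, m, y, k = cmyk
    return c == 0.0 and m == 0.0 and y == 0.0 and k == 1.0
```

With this formula RGB black (0, 0, 0) maps to (0, 0, 0, 1), which is plain black. A conversion that puts ink into C, M and Y as well, for example (0.75, 0.68, 0.67, 0.9), would be rich black.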

Is there an easier way to convert an RGB PDF to a CMYK PDF with plain black instead of rich black?

> I do not see any evidence of a bug here.
Yeah, it seems so. Thanks a lot :).
Comment 3 Ken Sharp 2022-02-03 13:38:55 UTC
(In reply to David Marogy from comment #2)

> > Your PDF files contain transparency, both of them.
> > 
> Okay? I didn't expect that, so the PDF is created with images having an alpha
> channel set to 1. I used images in RGB format, without transparency, so I
> didn't expect that.

No. PDF transparency is much more complex than a simple alpha channel in images. Your files contain transparency groups in both cases. This has nothing to do with images.


> Yeah, I first convert the PDF to PS, and when I then convert it back to
> PDF the hack.ps file is used so that RGB black is mapped to CMYK
> plain black. Without the step of converting to PS, my converted PDF
> ends up as a CMYK PDF with rich black. Unfortunately I need it in plain black
> for printing.

Well that's a bad way to proceed IMO. You should rather create a colour managed workflow entirely in PDF by using ICC profiles.

If you resort to using PostScript then you are going to be constrained by the architecture of that language.

 
> Is there an easier way to convert a rgb pdf to cmyk pdf with plain black
> instead of rich black?

You'd have to create input RGB and output CMYK ICC profiles and use those in place of the default profiles supplied with Ghostscript.
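
A minimal sketch of what such an invocation might look like when built from Python, skipping the PostScript step entirely. `-sDefaultRGBProfile` and `-sOutputICCProfile` are Ghostscript's ICC switches; the function name and the profile file names are hypothetical, and whether the result contains plain black depends entirely on how the supplied profiles were built, so treat this as a starting point to check against the Ghostscript documentation for your version.

```python
def build_icc_cmyk_command(input_pdf: str, output_pdf: str,
                           rgb_profile: str, cmyk_profile: str) -> list:
    """Build a gs argument list converting RGB PDF content to CMYK via ICC profiles."""
    return [
        "gs",
        "-q",
        "-o", output_pdf,                        # -o also implies -dNOPAUSE -dBATCH
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=CMYK",
        f"-sDefaultRGBProfile={rgb_profile}",    # source profile for DeviceRGB content
        f"-sOutputICCProfile={cmyk_profile}",    # destination CMYK profile
        input_pdf,
    ]


# Hypothetical file names, for illustration only.
command = build_icc_cmyk_command("rgb_file.pdf", "converted.pdf",
                                 "my_rgb.icc", "my_cmyk.icc")
```

The list can be passed to subprocess.check_call(command) in the same way as in the original script.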
Comment 4 David Marogy 2022-02-03 13:53:30 UTC
(In reply to Ken Sharp from comment #3)
> (In reply to David Marogy from comment #2)
> 
> > > Your PDF files contain transparency, both of them.
> > > 
> > Okei? Didn't expect that, so the pdf is created with images having an alpha
> > channel set to 1. I used images in rgb format, without transparency, so i
> > didn't expect that.
> 
> No. PDF transparency is much more complex than a simple alpha channel in
> images. Your files contain transparency groups in  both cases. Nothing to do
> with images.
> 
Ah okay, thanks for the explanation.
> 
> > Yeah i first convert the pdf to ps and after that when i convert it back to
> > pdf the hack.ps file is used so that the rgb black colour is mapped to cmyk
> > plain black. Without using the step of converting it to ps, my converted pdf
> > results in a cmyk pdf with rich black. I unfortunally need it in plain black
> > for printing.
> 
> Well that's a bad way to proceed IMO. You should rather create a colour
> managed workflow entirely in PDF by using ICC profiles.
> 
> If you resort to using PostScript then you are going to be constrained by
> the architecture of that language.
> 
>  
> > Is there an easier way to convert a rgb pdf to cmyk pdf with plain black
> > instead of rich black?
> 
> You'd have to create input RGB and output CMYK ICC profiles and use those in
> place of the default profiles supplied with Ghostscript.
Okay, so using these profiles it would be possible to convert RGB to CMYK while preserving the plain black. I assume the conversion would be done in Ghostscript.

I understand. The problem is that I don't have any expertise in this field. Do you perhaps know of some ICC profiles for my problem?

Do I also need a conversion profile for mapping the input profile to the output profile?

Thanks again for helping me here.
Comment 5 Ken Sharp 2022-02-03 14:03:39 UTC
(In reply to David Marogy from comment #4)

> > You'd have to create input RGB and output CMYK ICC profiles and use those in
> > place of the default profiles supplied with Ghostscript.
> Okei, so using these profiles it would be possible to convert RGB to CMYK
> while preserving the plain black. Converting would be done in Ghostscript i
> would assume.

That would be the intention, yes.

 
> I understand, problem here is that i don't have any expertise in this field,
> do you perhaps know of some ICC profiles for my problem? 

No, and this is into the realm of technical support, we don't (generally) do tech support for free users. This is also outside my own area of expertise.

 
> Do i also need a converting profile for mapping the input to output profile?

You need two profiles, one for RGB and one for CMYK. The combination of the two maps from RGB -> CIE -> CMYK.
Comment 6 David Marogy 2022-02-03 14:21:58 UTC
(In reply to Ken Sharp from comment #5)
> (In reply to David Marogy from comment #4)
> 
> > > You'd have to create input RGB and output CMYK ICC profiles and use those in
> > > place of the default profiles supplied with Ghostscript.
> > Okei, so using these profiles it would be possible to convert RGB to CMYK
> > while preserving the plain black. Converting would be done in Ghostscript i
> > would assume.
> 
> That would be the intention, yes.
> 
>  
> > I understand, problem here is that i don't have any expertise in this field,
> > do you perhaps know of some ICC profiles for my problem? 
> 
> No, and this is into the realm of technical support, we don't (generally) do
> tech support for free users. This is also outside my own area of expertise.
> 
Yeah, I understand, no problem.
>  
> > Do i also need a converting profile for mapping the input to output profile?
> 
> You need 2 profiles, one for RGB and one for CMYK, the combination of the 2
> maps from RGB -> CIE -> CMYK
What does CIE stand for?

And again, thank you for helping me this much. I will try to find an ICC profile or contact my printer to see if he has some expertise in converting RGB to CMYK using ICC profiles.
Comment 7 Ken Sharp 2022-02-03 14:28:51 UTC
(In reply to David Marogy from comment #6)

> > You need 2 profiles, one for RGB and one for CMYK, the combination of the 2
> > maps from RGB -> CIE -> CMYK
> What does CIE stands for?

Commission Internationale de l'Éclairage, the body responsible for colour standards. To quote Wikipedia:

"The CIE color model is a visual perception system based on three colors, red (X), green (Y), and blue (Z)...."

I should technically have said RGB->CIE XYZ->CMYK but the XYZ is usually just understood.
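
For the first half of that chain, here is a minimal sketch of RGB -> CIE XYZ, assuming the input is sRGB with a D65 white point. The function names are hypothetical. Real ICC profiles carry their own transforms and typically use a D50 connection space, and the XYZ -> CMYK half needs the lookup tables from the output profile, so it is not shown.

```python
# Standard sRGB (D65) linear RGB -> XYZ matrix.
_SRGB_TO_XYZ = (
    (0.4124, 0.3576, 0.1805),
    (0.2126, 0.7152, 0.0722),
    (0.0193, 0.1192, 0.9505),
)


def srgb_to_linear(v: float) -> float:
    """Undo the sRGB transfer curve for one channel in 0.0 .. 1.0."""
    return v / 12.92 if v <= 0.04045 else ((v + 0.055) / 1.055) ** 2.4


def srgb_to_xyz(r: float, g: float, b: float) -> tuple:
    """Convert an sRGB colour to CIE XYZ (Y = 1.0 for white)."""
    linear = [srgb_to_linear(c) for c in (r, g, b)]
    return tuple(sum(m * c for m, c in zip(row, linear)) for row in _SRGB_TO_XYZ)
```

Black maps to XYZ (0, 0, 0). How that point is then separated into C, M, Y and K is decided by the CMYK profile, which is why the choice of output profile determines whether black comes out as plain or rich.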