Bug 695306 - Converting a PDF to RGB colorspace causes missing drawing parts
Summary: Converting a PDF to RGB colorspace causes missing drawing parts
Status: RESOLVED FIXED
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: PDF Writer
Version: master
Hardware: PC Linux
Importance: P4 normal
Assignee: Ken Sharp
URL:
Keywords:
Duplicates: 695335
Depends on:
Blocks:
 
Reported: 2014-06-11 02:35 UTC by Duan Yao
Modified: 2014-07-01 01:32 UTC (History)

See Also:
Customer:
Word Size: ---


Attachments
PDF to be converted to RGB colorspace (660.86 KB, application/pdf)
2014-06-11 02:35 UTC, Duan Yao
The converted RGB pdf, missing 2 brown bands (655.08 KB, application/pdf)
2014-06-27 19:11 UTC, Duan Yao
A minimized sample file to reproduce this color conversion issue (386.13 KB, application/pdf)
2014-06-28 09:15 UTC, Duan Yao

Description Duan Yao 2014-06-11 02:35:02 UTC
Created attachment 10981
PDF to be converted to RGB colorspace

I converted the attached PDF to the RGB colorspace, but some parts of the drawing are lost.

If sRGB is specified:

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dColorConversionStrategy=/sRGB -sOutputFile=p64-sRGB.pdf dili-7a-p64.pdf

a light blue part and 2 brown parts are missing.

If RGB is specified:

gs -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dColorConversionStrategy=/RGB -sOutputFile=p64-RGB.pdf dili-7a-p64.pdf

only 2 brown parts are missing.

The console outputs are similar:

GPL Ghostscript GIT PRERELEASE 9.15 (2014-03-25)
Copyright (C) 2014 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1

   **** Warning: File encountered 'rangecheck' error while processing an image.

   **** Warning: File encountered 'rangecheck' error while processing an image.

   **** Warning: File encountered 'rangecheck' error while processing an image.

   **** Warning: File encountered 'rangecheck' error while processing an image.

   **** This file had errors that were repaired or ignored.
   **** The file was produced by: 
   **** >>>> GPL Ghostscript 9.14 <<<<
   **** Please notify the author of the software that produced this
   **** file that it does not conform to Adobe's published PDF
   **** specification.

  ./base/gsicc_manage.c:1050: gsicc_open_search(): Could not find �[� 
| ./base/gsicc_manage.c:1651: gsicc_set_device_profile(): cannot find device profile
Comment 1 Ken Sharp 2014-06-26 08:03:26 UTC
With the two commits mentioned in your other bug report #695316, I can't reproduce any problem here: there are no rangecheck errors and the content appears to be OK, though your description is a little vague.

I do see some form of memory corruption causing the ICC errors and I'll continue to work on that when I have some free time, so I'm leaving this open for that purpose.
Comment 2 Ken Sharp 2014-06-26 08:11:30 UTC
*** Bug 695335 has been marked as a duplicate of this bug. ***
Comment 3 Duan Yao 2014-06-27 05:04:59 UTC
Yes, I also got a segmentation fault with the latest code.
Comment 4 Ken Sharp 2014-06-27 05:26:17 UTC
(In reply to Duan Yao from comment #3)
> Yes, I also got a segmentation fault with the latest code.

I don't. How are you building Ghostscript ? Are you using any shared libraries (I'd recommend you don't, as I then can't compare my build with yours) ? Is it a 64-bit or 32-bit build ?

What's the SHA of the code you are using ?
Comment 5 Duan Yao 2014-06-27 07:32:51 UTC
I built abcf61e770b8b4687c571f321c79e6a0bb0a7424; the system is Ubuntu 14.04 64-bit.

ldd /usr/local/bin/gs
	linux-vdso.so.1 =>  (0x00007fffd39fe000)
	libXt.so.6 => /usr/lib/x86_64-linux-gnu/libXt.so.6 (0x00007fc971086000)
	libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007fc970d51000)
	libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fc970b4c000)
	libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fc970846000)
	libidn.so.11 => /usr/lib/x86_64-linux-gnu/libidn.so.11 (0x00007fc970613000)
	libfontconfig.so.1 => /usr/lib/x86_64-linux-gnu/libfontconfig.so.1 (0x00007fc9703d6000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fc9701b8000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fc96fdf2000)
	libSM.so.6 => /usr/lib/x86_64-linux-gnu/libSM.so.6 (0x00007fc96fbe9000)
	libICE.so.6 => /usr/lib/x86_64-linux-gnu/libICE.so.6 (0x00007fc96f9cd000)
	libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007fc96f7ae000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fc97131d000)
	libfreetype.so.6 => /usr/lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007fc96f50a000)
	libexpat.so.1 => /lib/x86_64-linux-gnu/libexpat.so.1 (0x00007fc96f2e0000)
	libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007fc96f0db000)
	libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007fc96eed6000)
	libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007fc96ecd0000)
	libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fc96eab7000)
	libpng12.so.0 => /lib/x86_64-linux-gnu/libpng12.so.0 (0x00007fc96e890000)
Comment 6 Duan Yao 2014-06-27 08:16:50 UTC
Update:

I didn't run "make clean" last time, which is why gs crashed with a segmentation fault.
After a clean build gs doesn't crash, but the 'rangecheck' errors still show up, and the 2 brown bands are still missing.
Comment 7 Ken Sharp 2014-06-27 08:26:20 UTC
(In reply to Duan Yao from comment #6)
> Update:
> 
> I didn't run "make clean" last time, which is why gs crashed with a
> segmentation fault. After a clean build gs doesn't crash, but the
> 'rangecheck' errors still show up, and the 2 brown bands are still missing.

For me on 64-bit Fedora, a build from the same source SHA, run with the same command line, neither gives a warning nor drops any content (as far as I can tell). However, assuming memory corruption is the cause, this is not too surprising: different memory configurations will cause corruption of different objects.

I'll carry on looking at the problem which I can see.

By the way, what are you using to view the output PDF ?
Comment 8 Duan Yao 2014-06-27 09:23:51 UTC
I use Adobe Reader 9 and Evince 3.10.3; both give the same result.
Comment 9 Ken Sharp 2014-06-27 13:23:24 UTC
(In reply to Duan Yao from comment #8)
> I use Adobe Reader 9 and Evince 3.10.3; both give the same result.

Would you attach your output PDF as well please ? Normally I wouldn't bother with that, but I'd like to see the difference.
Comment 10 Duan Yao 2014-06-27 19:11:06 UTC
Created attachment 11033
The converted RGB pdf, missing 2 brown bands

-dColorConversionStrategy=/sRGB was used; /RGB gives the same result.
Comment 11 Duan Yao 2014-06-27 21:20:52 UTC
The gs crash mentioned above may be due to an out-of-memory problem, since converting this file requires > 2GB of virtual memory. If I set "ulimit -v 2500000", the conversion completes; if I set "ulimit -v 2000000", gs crashes.
Comment 12 Ken Sharp 2014-06-28 01:17:20 UTC
As I suspected, those portions are not missing in my output. We'll see what happens when the memory problem is fixed.
Comment 13 Ken Sharp 2014-06-28 01:20:22 UTC
(In reply to Duan Yao from comment #11)
> The gs crash mentioned above may be due to an out-of-memory problem, since
> converting this file requires > 2GB of virtual memory. If I set "ulimit -v
> 2500000", the conversion completes; if I set "ulimit -v 2000000", gs
> crashes.

There shouldn't be a crash anyway; you should get a VMerror, though it rather depends on what the OS does when it trips the limit.

But again, I'll look at that when I've fixed the memory problem. I also don't see anything like this amount of memory used, and I can't see any good reason why it should be. For me the process peaked at 28 MB.
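
A minimal sketch of the "error code instead of a crash" behaviour described above, for readers following along. It is not Ghostscript code: alloc_table() and the SKETCH_* codes are hypothetical names, and the only assumption is that an allocator reports failure to its caller rather than letting a NULL pointer be used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch (not Ghostscript's error values). */
enum { SKETCH_OK = 0, SKETCH_VMERROR = -1 };

/*
 * Allocate count * elem_size bytes and report failure through the return
 * value. An absurd count (for example one read from corrupted data) is
 * rejected or fails cleanly, instead of crashing later on a NULL pointer.
 */
int alloc_table(size_t count, size_t elem_size, unsigned char **out)
{
    *out = NULL;

    /* Reject requests whose size computation would overflow size_t. */
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return SKETCH_VMERROR;

    size_t total = count * elem_size;
    unsigned char *p = malloc(total ? total : 1);
    if (p == NULL)
        return SKETCH_VMERROR;  /* out of memory: tell the caller */

    *out = p;
    return SKETCH_OK;
}
```

A caller would check the return value and abandon the current object on SKETCH_VMERROR, which is roughly what "you should get a VMerror" implies; how the OS reacts when a "ulimit -v" limit is hit is outside what this sketch covers.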
Comment 14 Duan Yao 2014-06-28 07:49:41 UTC
I tried to debug gs and found a clue to the large memory consumption.
A few calls to gs_heap_alloc_bytes() try to allocate very large chunks. This is one of the stack traces, allocating 962636020 bytes:

gs_heap_alloc_bytes() at gsmalloc.c:173 0x942304  // size == 962636020
alloc_acquire_chunk() at gsalloc.c:1845 0x91d2bd
i_alloc_string_immovable() at gsalloc.c:1064 0x91bcc8
i_alloc_string() at gsalloc.c:1033 0x91bbff
pdf_make_sampled_base_space_function() at gdevpdfc.c:371 0x7778c0
convert_DeviceN_alternate() at gdevpdfg.c:702 0x785bbc  // pcs->params.device_n.num_components == 33776700
new_pdf_begin_typed_image() at gdevpdfi.c:1930 0x79435f
pdf_begin_typed_image() at gdevpdfi.c:2122 0x794fa9
pdf_image3x_make_mcde() at gdevpdfi.c:2600 0x79646d
gx_begin_image3x_generic() at gximag3x.c:292 0x5ddb57

At gdevpdfg.c:702, the code is:

code = pdf_make_sampled_base_space_function(pdev, &new_pfn, pcs->params.device_n.num_components, 3, data_buff);

The num_components is 33776700, which is ridiculous. 
pcs's type is gs_color_space_index_Indexed, so pcs->params.device_n.num_components must be a misuse. pcs's base_space has type gs_color_space_type_DeviceN, so shouldn't pcs->base_space->params.device_n.num_components be used instead?
Comment 15 Ken Sharp 2014-06-28 08:29:05 UTC
(In reply to Duan Yao from comment #14)

> The num_components is 33776700, which is ridiculous. 

That looks like the memory corruption, so it should be fixed when I finally find it.

> pcs's type is gs_color_space_index_Indexed, so
> pcs->params.device_n.num_components must be a misuse. pcs's base_space has
> type gs_color_space_type_DeviceN, so shouldn't
> pcs->base_space->params.device_n.num_components be used instead?

Umm, it's not so simple: we want to use the number of components of the *new* base colour space, which is 3 for me in my debug build. I feel this is the memory corruption at fault, but I will look at it, later....
Comment 16 Duan Yao 2014-06-28 09:15:21 UTC
Created attachment 11034
A minimized sample file to reproduce this color conversion issue

I removed the irrelevant portions of the original attachment; the file now contains only one image, in an Indexed colorspace. The conversion still produces an error message:

  Warning: File encountered 'rangecheck' error while processing an image.

After conversion, the image is lost. The conversion also requires a very large amount of memory on my machine, as mentioned above.
Comment 17 Duan Yao 2014-06-28 09:22:51 UTC
(In reply to Ken Sharp from comment #15)

Waiting for your good news :)
Comment 18 Ken Sharp 2014-06-30 01:49:56 UTC
Commit e492e1671b7b1041ba123a22c1df3b920cf753af addresses the 2 problems I see.

The rangecheck is almost certainly responsible for the missing content. The PDF interpreter ignores such errors and continues with the input, in order to better mimic the behaviour of Adobe Acrobat, so the object which causes the error doesn't end up in the output. I *believe* this is caused by the misplacement of the dereference of Indexed spaces, which is moved to the head of the routine in this commit. This should resolve the rangecheck errors you see and should produce complete output.
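
A minimal sketch of why the position of that dereference matters, under stated assumptions: the types below are hypothetical, simplified stand-ins, not Ghostscript's real gs_color_space, and this is not the code from the commit. It only illustrates that a union member is meaningful solely for the matching colour space kind, so an Indexed wrapper has to be stepped through before any DeviceN parameters are read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified colour space model (NOT Ghostscript's types). */
typedef enum { CS_DEVICE_RGB, CS_INDEXED, CS_DEVICE_N } cs_kind;

typedef struct color_space color_space;
struct color_space {
    cs_kind kind;
    color_space *base_space;                     /* used when kind == CS_INDEXED */
    union {
        struct { int hival; } indexed;           /* valid only for CS_INDEXED  */
        struct { int num_components; } device_n; /* valid only for CS_DEVICE_N */
    } params;
};

/* Step through an Indexed wrapper so callers see the space it is built on. */
const color_space *resolve_indexed(const color_space *pcs)
{
    if (pcs != NULL && pcs->kind == CS_INDEXED && pcs->base_space != NULL)
        return pcs->base_space;
    return pcs;
}

/*
 * Return the DeviceN component count, or -1 if the (resolved) space is not
 * DeviceN. The dereference happens first, before any union member is read,
 * so params.device_n is never read from an Indexed space.
 */
int device_n_components(const color_space *pcs)
{
    const color_space *base = resolve_indexed(pcs);

    if (base == NULL || base->kind != CS_DEVICE_N)
        return -1;
    return base->params.device_n.num_components;
}
```

Reading params.device_n directly from the Indexed space would instead reinterpret unrelated union contents, which is one way a nonsensical component count could arise. Which space's component count the real routine needs is a separate question that this sketch does not settle.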

The second part of the commit addresses the memory corruption I see, which is due to early freeing of some string data while there are still pointers to it. This should resolve the gsicc_open_search() warnings. However, it's in the nature of memory corruption that changing the memory layout even slightly can mask such problems without actually fixing them, so it's possible your problem was a different one.
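
A minimal sketch of the general hazard described above, not the actual fix in the commit: if a structure keeps a pointer to string data that something else frees early, the pointer dangles. One common remedy, assumed here purely for illustration, is for the holder to keep its own copy. The profile_ref type and its functions are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder of a profile name; it owns the memory behind 'name'. */
typedef struct {
    char *name;
} profile_ref;

/*
 * Store a private, NUL-terminated copy of name[0..len). The caller may free
 * its own buffer afterwards without leaving ref->name dangling.
 * Returns 0 on success, -1 if the copy could not be allocated.
 */
int profile_ref_set_name(profile_ref *ref, const char *name, size_t len)
{
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    copy[len] = '\0';

    free(ref->name);   /* drop any previous copy; free(NULL) is a no-op */
    ref->name = copy;
    return 0;
}

/* Release the holder's copy and clear the pointer so it cannot be reused. */
void profile_ref_release(profile_ref *ref)
{
    free(ref->name);
    ref->name = NULL;
}
```

With a dangling pointer, a later lookup sees whatever bytes now occupy the freed memory, which would be consistent with the unreadable file name in the gsicc_open_search() message.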

But for now, this file runs to completion without errors or warnings for me, and the result appears to be correct.
Comment 19 Duan Yao 2014-06-30 08:18:39 UTC
Thanks, now the result looks correct.

The image in the PDF converted by gs has an Indexed colorspace whose base colorspace is DeviceN, whose alternate colorspace in turn is DeviceRGB.

However, when I convert the PDF to sRGB with Acrobat 10's "convert colors" tool, the image has an Indexed colorspace whose base colorspace is DeviceRGB, and the output file is much smaller than gs's output. Most of the size of gs's output is taken up by the colorspace.

Can gs mimic Acrobat's behavior, which seems preferable?
Comment 20 Ken Sharp 2014-07-01 01:13:58 UTC
(In reply to Duan Yao from comment #19)

> The image in converted PDF by gs has Indexed colorspace, whose base
> colorspace is DeviceN, whose alternate colorspace is DeviceRGB.

'The' image ? Which file are you using here, the original PDF file has many images....
 
> However, when I convert the PDF to sRGB with acrobat 10's "convert colors"

My Acrobat X doesn't have an 'sRGB' profile, do you mean sRGB IEC61966-2.1 ?

> tool, the image has Indexed colorspace, whose base colorspace is DeviceRGB;
> and the output file size is much smaller than gs's output. Most of gs's
> output's file size is taken by colorspace.
> 
> Can gs mimic acrobat's behavior? which seems more ideal.

Using your original test file, the Acrobat file size is slightly larger (though not very significantly so) than Ghostscript's.

With the smaller test file, Ghostscript's output is considerably larger. This is because although we have altered all the marking objects, we do not alter the blending colour space, which remains as a DeviceN/ICCBased space. Since we haven't altered that space we must continue to include the 4 colour (CMYK) ICC profile required for it, and that amounts to more than 500 KB of data. This accounts for practically all the difference between the 2 files.

In short the answer at present is 'no' as we only colour manage marking objects.
Comment 21 Duan Yao 2014-07-01 01:32:40 UTC
(In reply to Ken Sharp from comment #20)
> (In reply to Duan Yao from comment #19)
> 
> > The image in converted PDF by gs has Indexed colorspace, whose base
> > colorspace is DeviceN, whose alternate colorspace is DeviceRGB.
> 
> 'The' image ? Which file are you using here, the original PDF file has many
> images....
The updated file "A minimized sample file to reproduce this color conversion issue" (386 KB), which has only one image.
>  
> > However, when I convert the PDF to sRGB with acrobat 10's "convert colors"
> 
> My Acrobat X doesn't have an 'sRGB' profile, do you mean sRGB IEC61966-2.1 ?
Yes, sRGB IEC61966-2.1.
> 
> > tool, the image has Indexed colorspace, whose base colorspace is DeviceRGB;
> > and the output file size is much smaller than gs's output. Most of gs's
> > output's file size is taken by colorspace.
> > 
> > Can gs mimic acrobat's behavior? which seems more ideal.
> 
> Using your original test file, the Acrobat file size is slightly larger
> (though not very significantly so) than Ghostscript's.
> 
> The smaller test file is considerably larger, this is because although we
> have altered all the marking objects, we do not alter the blending colour
> space, which remains as a DeviceN/ICCBased space. Since we haven't altered
> that space we must continue to include the 4 colour (CMYK) ICC profile
> required for it, and that amounts to more than 500Kb of data. This accounts
> for practically all the difference between the 2 files.
> 
> In short the answer at present is 'no' as we only colour manage marking
> objects.
Thanks for the explanation. I still hope gs can eliminate this kind of overhead in the future, because we just want an RGB/sRGB file.