Bug 691995 - Command line works, DLL does not
Summary: Command line works, DLL does not
Status: NOTIFIED INVALID
Alias: None
Product: Ghostscript
Classification: Unclassified
Component: Client API
Version: 9.01
Hardware: PC Windows XP
Importance: P1 normal
Assignee: Alex Cherepanov
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-02-22 18:41 UTC by Marcos H. Woehrmann
Modified: 2011-10-02 02:35 UTC

See Also:
Customer: 780
Word Size: ---


Description Marcos H. Woehrmann 2011-02-22 18:41:29 UTC
The customer reports that:

So here is what I am finding: When I run gswin32.exe on this file it seems to work. When I load gsdll32.dll and run directly from it (as we do in our software) I get the following result in OUR output:

(M-D-Y)02-07-2011 14:03:44[PID:1024,TID:10]<->CVLtoPS:Executing 'gswin32.exe -I=.\;.\FONTS -sFONTPATH=C:\PROGRA~1\xxxxxx\WORKSRV\FONTS -dNOPAUSE -sstdout=nul-sDEVICE=tiffg4 -r204x196 -sOutputFile=C:\WINDOWS\TEMP\000000FA.TIF C:\PROGRA~1\xxxxxx\outgoing\000000F9.000 '
(M-D-Y)02-07-2011 14:03:44[PID:870,TID:20090731]<->The detected revision info is (870,20090731). Revision requested is (870,20090731)
(M-D-Y)02-07-2011 14:03:44[PID:0,TID:4241604]<!>LGSD:Processor exception occured during PostScript initialization.  Process failed
(M-D-Y)02-07-2011 14:03:44[PID:1024,TID:998]<!>CVLtoPS:Return from execution (0x3E6). Time for call = 0 secs

I compared the input file using beyond compare to see if we had changed the input from the original and found that they are identical.

Any idea why running from the EXE vs the DLL would be any different?  I actually tested from the same location with the EXE in the same folder as our production gsdll32.dll file and got the same result.  It works from gswin32.exe but not from the DLL.  Other (most) files work fine using the DLL only.

Any idea what could be going on?  Does error 0x3E6 ring a bell?
Comment 3 Marcos H. Woehrmann 2011-02-22 18:48:06 UTC
Ken had some suggestions/comments:

Our executable *also* uses the DLL (unless they've built the 'big executable' which seems unlikely). So what they are comparing is their executable using the DLL against our executable using the DLL. Ours works, theirs doesn't, which rather points the finger at their implementation somehow. (I'm assuming that they are using our built DLL and not building it themselves.)

The only interesting piece of information is the error value (0x3E6), which isn't a Ghostscript error. If it is a system error, then it is error #998, 'ERROR_NOACCESS', which indicates an invalid access to a memory location.

Assuming that to be the case, then it is likely a memory error (buffer overrun, memory corruption, access to freed memory etc). The reason it exhibits when using their executable and not ours is because the DLL executes in the address space of the parent process. Obviously memory locations will be different in their process.

It's not really possible for us to debug this without having the code for the process which provokes the issue. Debug builds often don't exhibit the problem either, because the address layout changes. These kinds of problems are something of a nightmare.

At present the best suggestion I can make is that they upgrade to a newer version. I've fixed several memory problems over the years since the release of 8.64 (or 8.70, whichever they are actually using) and I'm sure the other developers have too. It's possible that the problem has been fixed already.

It's also possible it hasn't.


It looks to me like their application is trapping the exception and reporting it, to prevent the usual Windows crash dialog. The first thing I would suggest is that they try running their code without that to verify that what I'm saying is true. If it is, they should get a crash dialog.

After that, it's up to them what they want to do. They could send us the binary for their (modified to crash) application and we could see whether we can get a crash here. If we can, we could try using a DLL with symbols (but still optimised) and a just-in-time debugger. That might tell us where the crash happens and might enable us to identify whether a fix exists. We would absolutely have to know *precisely* which version of Ghostscript they are using, as we would need to build a DLL with symbols to do the test. Even the slightest deviation of the code could cause the problem to vanish as memory locations change.

They could send us the source to their application and we could try fully debugging it, but I have to say it's not very likely to be much more helpful than having their binary. The most likely thing is that we debug it, find it's a crash in the garbage collector and then have to shrug and say 'well, it's *probably* fixed in a later revision'.

The other thing they can do is upgrade to a newer version of Ghostscript and hope the problem has been fixed. The trouble with that one is that the memory locations will have moved (because we've changed the code) and so the problem might still be present, just masked.


Pretty much all of the above is speculation based on the error return value, though. You might want to mention to their developer that there is a Windows API call, 'FormatMessage', which turns system error numbers into human-readable strings; that might be more useful than an opaque number.
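
For illustration, a minimal sketch of how a host application might turn a code such as 0x3E6 into text. The helper name describe_system_error is hypothetical and is not taken from the customer's code. On Windows it calls FormatMessageA; elsewhere, or if the lookup fails, it falls back to printing the number. snprintf is assumed to be available (a C99 library).

```c
#include <stddef.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: write a readable description of a system
 * error code into buf (at most len bytes, always NUL-terminated). */
void describe_system_error(unsigned long code, char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return;
#ifdef _WIN32
    {
        /* Ask Windows for the message text belonging to this code. */
        DWORD n = FormatMessageA(
            FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
            NULL, (DWORD)code, 0, buf, (DWORD)len, NULL);
        if (n != 0)
            return; /* buf now holds the system's message */
    }
#endif
    /* Fallback: no system lookup available, or the lookup failed. */
    snprintf(buf, len, "system error %lu (0x%lX)", code, code);
}
```

Calling describe_system_error(0x3E6, buf, sizeof buf) on Windows should fill buf with the system's text for error 998 rather than leaving a bare number in the log.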
Comment 4 Marcos H. Woehrmann 2011-02-22 21:09:50 UTC
The customer has produced a sample program that demonstrates the problem, which also occurs with 9.01 (I've spoken with the customer and he's using the 9.01 DLL that we supply):

To test this, create a RFGSW folder under the gs9.01 build folder. That way it can find the needed GS files. Just put the RFGSW folder at the same level as all of the other main folders such as base, bin, contrib, cups, etc.

Then unzip the files into that folder.  Open RFGSW.sln using VS 2005.  I only worked with the debug build to duplicate the problem.  If you open the project properties and look under Debugging, you will see the command arguments that I was using.  You are welcome to use those (d:\Good.pdf d:\Output.tif) or change them to fit your needs.  I was not sure where the gsdll32.dll needed to be, so I put it both in the RFGSW folder as well as the RFGSW\Debug folder just to make sure it would load.  I am using version 9.01 with revision date 20110207.  I got it directly from the installer rather than compiling it ourselves and adding that variable to the mix.

Run the Good.pdf file through and you will see that all will work ok.  Then change the first argument to Broken.pdf and you will see the problem.  The way that we load and run the DLL has been pretty much the same for many versions of our software, so it may be that we are loading it incorrectly or that we are using old calls.
Comment 8 Ken Sharp 2011-02-28 15:33:09 UTC
If we remove the exception handling (as I suggested) then we can see that a GP fault occurs in memmove, called thus:

memmove
stream_compact
s_process_write_buf
s_std_write_flush
zflushfile             - flushing the warning message from the PDF interpreter

The buffer pointers are incorrect, resulting in a small negative read length which, treated as an unsigned long, becomes a very large positive one. Not surprisingly, this quickly attempts to read from unassigned memory and causes a GPF.

So why do we get a small negative number? In rfgsw.cpp there is a callback to handle stderr and stdout:

----8<----------------------8<-----------------8<--------------------------
int gsdll_callback(int message, char *str, unsigned long count)
{
   switch (message) {
      case GSDLL_STDIN:
         // We don't allow reading from stdin when running PostScript from within the WorkServer,
         // so always return a 0 indicating EOF
         return(0);

      case GSDLL_STDOUT:
         if (str != (char *) NULL)
            fwrite(str, 1, count, stdout);

         //8/25/2010 KLS: add these two lines to fix RF-4608:
         if( (strstr(str, "Warning") != (char *) NULL) )
            return(GSDLL_INIT_QUIT);

         return(count);
...
----8<----------------------8<-----------------8<--------------------------

The culprit is the code with the comment '//8/25/2010 KLS: add these two lines to fix RF-4608:'

The stream writing code expects to be told how much of the data has been consumed, which is why the return value is normally 'count' (consumed all of it). Returning 'GSDLL_INIT_QUIT' tells the stream code that 'GSDLL_INIT_QUIT' (decimal 101, 0x65) bytes have been processed, which is not true. In this case there were only 62 bytes (0x3e) available.

The problem is that this tells the stream code that more bytes were processed than were actually available. In this particular case this causes the buffer pointer to be moved past the end of the buffer. Later when we calculate how much space remains in the buffer we subtract the pointer from the limit, resulting in a negative number, and causing the crash.
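
The arithmetic can be modelled in a few lines. This is a simplified sketch using offsets, not Ghostscript's actual stream structures; demo_buf and the demo_* functions are hypothetical names. With 62 bytes available and a reported consumption of 101, the remaining space comes out as -39, and converting that to an unsigned size gives an enormous copy length.

```c
#include <stddef.h>

/* Hypothetical stand-in for a stream buffer, tracked by offsets. */
typedef struct {
    long pos;    /* offset of the next unconsumed byte */
    long limit;  /* number of bytes actually available */
} demo_buf;

/* Advance by whatever the callback claims to have consumed,
 * without validating it (this models the failure, not a fix). */
void demo_advance(demo_buf *b, int consumed)
{
    b->pos += consumed;
}

/* Space left in the buffer; negative once pos has passed limit. */
long demo_remaining(const demo_buf *b)
{
    return b->limit - b->pos;
}

/* The same quantity as an unsigned length, as a copy routine
 * such as memmove would see it. */
size_t demo_copy_length(const demo_buf *b)
{
    return (size_t)demo_remaining(b);
}
```

Starting from demo_buf b = {0, 62} and calling demo_advance(&b, 101), demo_remaining returns -39, while demo_copy_length returns a value just below the maximum size_t.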

I have no idea what 'RF-4608' is, but it seems to me that this code is clearly incorrect: you can't tell the stream code that you read more bytes than it gave you :-) I suppose we could have the stream code check the return value and clamp it to the amount of data supplied. I'll leave that up to Ray if he wants to do it. I assume the intention was to have this error code returned up to the calling application, but I'm afraid there is no actual way to do that from here.
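
A clamp of that kind might look roughly like the sketch below. It is an illustration only: clamp_consumed is a hypothetical helper, not an existing Ghostscript function, and treating a negative return as "nothing consumed" is an assumption made here for simplicity.

```c
#include <stddef.h>

/* Hypothetical helper: limit the byte count reported by a callback
 * to the number of bytes that were actually handed to it. */
size_t clamp_consumed(int reported, size_t available)
{
    if (reported <= 0)
        return 0;                 /* assumption: treat as nothing consumed */
    if ((size_t)reported > available)
        return available;         /* callback over-reported: clamp */
    return (size_t)reported;
}
```

With this in place, a callback that returns 101 for a 62-byte write would be treated as having consumed 62 bytes, so the buffer position could not move past the limit.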

Anyway, if I remove the two lines of code after the comment then the file runs to completion and produces an apparently correct TIFF file.
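
If the application still needs to know that a warning was printed, one possible approach is to record that fact in a flag and keep returning 'count'. The sketch below is an assumption about how that could be done, not the customer's code: the DEMO_* constants stand in for the GSDLL_* message codes, and demo_warning_seen, demo_contains and demo_callback are hypothetical names. The search is bounded by 'count' because the buffer is not assumed to be NUL-terminated.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Local stand-ins for the callback message codes (assumed values). */
enum { DEMO_STDIN = 1, DEMO_STDOUT = 2 };

/* Set when interpreter output contains "Warning"; the host
 * application can inspect it after the Ghostscript call returns. */
int demo_warning_seen = 0;

/* Bounded substring search over exactly 'count' bytes. */
int demo_contains(const char *buf, size_t count, const char *needle)
{
    size_t n = strlen(needle);
    size_t i;

    if (buf == NULL || n == 0 || count < n)
        return 0;
    for (i = 0; i + n <= count; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            return 1;
    }
    return 0;
}

int demo_callback(int message, char *str, unsigned long count)
{
    switch (message) {
    case DEMO_STDIN:
        return 0;                          /* no stdin: report EOF */
    case DEMO_STDOUT:
        if (str != NULL) {
            fwrite(str, 1, count, stdout);
            if (demo_contains(str, count, "Warning"))
                demo_warning_seen = 1;     /* remember it, don't abort */
        }
        return (int)count;                 /* always report what was supplied */
    default:
        return 0;
    }
}
```

This keeps the return value truthful for the stream code. A warning split across two callback invocations would be missed by this simple check, so it is a starting point rather than a complete solution.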

I would recommend removing the __try and __except code around the initialisation of Ghostscript, at least for development. According to the comment, this was added because initialisation was causing GPFs. While I understand a desire to shield end users from crash dialogs, this only hides real problems when developing and debugging. The message is also somewhat misleading: it states that an exception occurred during initialisation, but in fact this call not only initialises Ghostscript but also runs the PostScript file.

I would also, once again, draw attention to the Windows API call 'FormatMessage' which allows for a developer to print a meaningful error message, rather than a cryptic number.

I believe this fixes the problem, so I'm going to close this as INVALID, because Ghostscript is working as expected. Obviously if it doesn't fix the problem in the customer's code then we'll need to reopen and think again.