Bug 696115 - Missing error message and return code with release build
Summary: Missing error message and return code with release build
Status: RESOLVED FIXED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: mupdf (show other bugs)
Version: unspecified
Hardware: PC Linux
: P4 normal
Assignee: Marcos H. Woehrmann
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-25 16:00 UTC by Marcos H. Woehrmann
Modified: 2015-10-01 07:47 UTC (History)
2 users (show)

See Also:
Customer:
Word Size: ---


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marcos H. Woehrmann 2015-07-25 16:00:04 UTC
The following command lines act differently:

  ./build/release/mutool draw -o test.pgm ./missing_page_tree.pdf

With the debug build the output is as expected:

  error: Malformed pages tree
  error: cannot load page 1 in file './missing_page_tree.pdf'
  error: cannot draw './missing_page_tree.pdf'

and the mutool exit code is 1.


With the release build the final error message is missing:

  error: Malformed pages tree
  error: cannot load page 1 in file './missing_page_tree.pdf'

and more importantly the mutool exit code is 0.

Note that in neither is case an output file produced.
Comment 2 Marcos H. Woehrmann 2015-07-25 16:19:44 UTC
So far I've only been able to reproduce this with Ubuntu 15.04.  It does not appear to be related to which libraries/packages are installed.  mutool built and run on clean install of Ubuntu 15.04 Desktop amd64 acts the same way.
Comment 3 Marcos H. Woehrmann 2015-07-29 19:59:18 UTC
I've looked into this and the problem is fixed by declaring fz_context *ctx to be volatile in mudraw_main():

diff --git a/source/tools/mudraw.c b/source/tools/mudraw.c
index 693f6f9..56910c5 100644
--- a/source/tools/mudraw.c
+++ b/source/tools/mudraw.c
@@ -858,7 +858,7 @@ int mudraw_main(int argc, char **argv)
        char *password = "";
        fz_document *doc = NULL;
        int c;
-       fz_context *ctx;
+       volatile fz_context *ctx;
        fz_alloc_context alloc_ctx = { NULL, trace_malloc, trace_realloc, trace_free };
 
        fz_var(doc);


This causes a bunch of warnings since ctx is passed to functions which are not expecting the fz_context * parameter to be volatile.  

I don't fully understand the fz_try/fz_catch macros (as the comment says "pay no attention to the man behind the curtain"), so it's possible that gcc 4.9.2 has a bug in the optimizer that is masked by the volatile declaration.  I believe the issue happens in the call to drawrange() in mudraw_main() (or perhaps in the drawpage() call in drawrange()), in one of those places the fz_context gets corrupted.
Comment 4 Robin Watts 2015-07-29 20:20:04 UTC
(In reply to Marcos H. Woehrmann from comment #3)
> I've looked into this and the problem is fixed by declaring
> fz_context *ctx to be volatile in mudraw_main():

ctx is only ever assigned once, and it is never changed for the duration of the execution. It's not even assigned within an fz_try/fz_catch, so we don't have the excuse of a setjmp maybe resetting it.

> This causes a bunch of warnings since ctx is passed to functions which are
> not expecting the fz_context * parameter to be volatile.  

I'd be tempted to try calling fz_var(ctx); rather than setting it to be volatile, but that's a shot in the dark. I can't see why that should have an effect.

A better test might be to put some prints in the fz_catch clauses within that function to see which ones are hit.
Comment 5 Marcos H. Woehrmann 2015-09-04 11:03:07 UTC
(In reply to Robin Watts from comment #4)  
> 
> I'd be tempted to try calling fz_var(ctx); rather than setting it to be
> volatile, but that's a shot in the dark. I can't see why that should have an
> effect.

That does not help; still no error return with the release build.

> 
> A better test might be to put some prints in the fz_catch clauses within
> that function to see which ones are hit.


As far as I can tell this is what is happening (in more detail than is necessary so that I can recall what I was doing when I revisit this later), line numbers based on mupdf 5fb01f6:

In source/tools/mudraw.c the routine mudraw_main():855 calls drawrange():783 in a fz_try():1073/fz_catch():1188 block.  drawrange() calls drawpage():243 which calls fz_load_page() in a fz_try()/fz_catch() block. This call fails because the input file is broken, so the fz_catch() clause at line 260 is executed. This calls source/fitz/error.c:fz_rethrow_message() which ends with a call to source/fitz/error.c:throw().  

throw() calls fz_longjmp(ex->stack[ex->top].buffer, ex->stack[ex->top].code + 2) but the "+ 2" is not being passed back, causing ctx->error->stack[ctx->error->top].code to be left set to 0 and therefore the fz_catch() block at line 1115 is not executed.


I ran mudraw.c through the preprocessor and line 1073 expands to:

   if (fz_push_try(ctx->error) && ((ctx->error->stack[ctx->error->top].code = __sigsetjmp (ctx->error->stack[ctx->error->top].buffer, 0)) == 0)) { do {


I've run this prepressed code in a debugger and by settings breakpoints appropriately I can get to ctx->error->stack[ctx->error->top].code = __sigsetjmp (ctx->error->stack[ctx->error->top].buffer, 0) when the __sigsetjmp has been called by the fz_longjmp() in throw().

At this point:

(gdb) print ctx->error->stack[ctx->error->top].code
value has been optimized out
(gdb) 


This obviously doesn't happen for the non-optimized build, it correctly reports .code set to a 2 after a single step.  It also explains why declaring ctx as volatile works, since volatile variables can't be optimized out.

It seems like optimizing out a variable that is being set by a call to __sigsetjmp() is not a good idea. Otoh, my understanding of compiler optimization is minimal; as far as I'm concerned many of the optimizations that gcc does are magic.  

The current released gcc (5.2.0) behaves in the same way, so if it is a compiler bug it hasn't been fixed.  The last version of gcc that works appears to be 4.8.4 (I haven't tested 4.8.5, but 4.9.0 fails).
Comment 6 Marcos H. Woehrmann 2015-09-05 09:34:45 UTC
I was curious if it would be possible to bisect gcc to determine the commit that started this issue.  Turns out it's easy:

  https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=commit;h=a5ef9e4d
Comment 7 Hin-Tak Leung 2015-09-16 16:23:22 UTC
This is GCC (or in any case, compiler-) specific, you can tell GCC not to optimize out something by declaring:

__attribute__((used))

i.e. 

fz_context __attribute__((used)) *ctx;

I think.

This is a little tips I came across in a few other projects where the improved link-time-optimization in gcc 4.9+ (4.8+ ? ) are overly agressive and optimize out things which should stay...

if this work for you, you can make it a bit more portable with #ifdef __GNUC__ .
Comment 8 Marcos H. Woehrmann 2015-09-17 09:14:34 UTC
(In reply to Hin-Tak Leung from comment #7)
> This is GCC (or in any case, compiler-) specific, you can tell GCC not to
> optimize out something by declaring:
> 
> __attribute__((used))
> 
> i.e. 
> 
> fz_context __attribute__((used)) *ctx;
> 
> I think.
> 

This does not work for me.  

The internets suggests: 

  __asm__ __volatile__("" :: "m" (ctx));

but that doesn't work either.
Comment 9 Robin Watts 2015-10-01 07:47:35 UTC
Fixed with:

commit 1f15bade3112bfdc8ddbae9c4b06047427c2367f
Author: Robin Watts <robin.watts@artifex.com>
Date:   Mon Sep 28 15:24:00 2015 +0100

    Bug 696115: Further fix for setjmp/longjmp.

    Tor turned up an interesting section in the C spec about this. See
    page 275 of http://open-std.org/jtc1/sc22/wg14/www/docs/n1494.pdf
    regarding acceptable places for setjmp to occur.

    It seems that:

      if (setjmp(buf))
      if (!setjmp(buf))
      if (setjmp(buf) {==,!=,<,>} <integer constant>)

    etc are all valid things to do, but assignments (and subsequent
    testing of values) like:

      if ((code = setjmp(buf)) == 0)

    are not allowed.

    Further, it's not even clear that:

     if (a() && setjmp(buf))

    is permissible.

    We therefore recast the macros into the form:

      a();
      if (setjmp((buf)) == 0)

    which should be acceptable under the C spec.

    To keep try atomic, we introduce a block '{{{' around this, along
    with a matching close block '}}}' in the catch clause. This has the
    nifty extra effect of giving us a compile time error if we mismatch
    our try/catches.

Thanks!