Bug 690931 - consider not passing large generated C files to compiler to save memory
Status: RESOLVED FIXED
Alias: None
Product: MuPDF
Classification: Unclassified
Component: apps
Version: unspecified
Hardware: Other Linux
Importance: P4 enhancement
Assignee: Tor Andersson
URL:
Keywords:
Depends on:
Blocks:
Reported: 2009-11-18 14:54 UTC by Timo Juhani Lindfors
Modified: 2010-07-23 21:13 UTC

See Also:
Customer:
Word Size: ---


Attachments
encode font as a string instead of an array (719 bytes, patch)
2009-11-18 16:13 UTC, Timo Juhani Lindfors

Description Timo Juhani Lindfors 2009-11-18 14:54:25 UTC
I tried building MuPDF 2009-07-07 on openmoko (an armel system with 128 MB of RAM),
but gcc ran out of memory while compiling ./build/font_cjk.c.

This file appears to be about 15 megabytes of autogenerated content, and according
to ps -eorss,cmd it causes GCC to allocate 787424 kilobytes of RAM:

787424 /usr/lib/gcc/x86_64-linux-gnu/4.3.2/cc1 -quiet ./build/font_cjk.c -quiet -dumpbase font_cjk.c -mtune=generic -auxbase font_cjk -o /tmp/ccDgLeuP.s

Could this be improved somehow?
Comment 1 Timo Juhani Lindfors 2009-11-18 16:13:38 UTC
Created attachment 5685
encode font as a string instead of an array

The attached patch encodes the font file as a string instead of an array and
lets me build MuPDF on openmoko. I have tested that MuPDF runs OK with this
patch on amd64 and armel, but I can't be sure that the fallback font is actually
used.

However, the binary is still quite large (4.8 megabytes on amd64, 4.7 on armel).
Would it be possible to make embedding fonts optional? In a Debian package I
can use dependencies to guarantee that the font will always be available.
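A minimal sketch of the string encoding the patch switches to. The attachment itself is not reproduced in this report, so the generator below (dump_c_string) and the NAME_len companion symbol are hypothetical. It writes every byte as a fixed-width three-digit octal escape, which is always unambiguous (unlike \x escapes, which consume any number of hex digits), and it records the length separately because font data can contain zero bytes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generator: emit binary data as a C string literal.
 * Every byte becomes a fixed-width octal escape such as \101, so no
 * escape can swallow the character that follows it. The literal is
 * split into adjacent 16-byte pieces; the compiler concatenates
 * adjacent string literals into one. */
void dump_c_string(FILE *out, const char *name,
                   const unsigned char *data, size_t len)
{
    fprintf(out, "static const char %s[] =\n\"", name);
    for (size_t i = 0; i < len; i++) {
        fprintf(out, "\\%03o", (unsigned)data[i]);
        if (i % 16 == 15 && i + 1 < len)
            fputs("\"\n\"", out); /* close this piece, open the next */
    }
    /* Store the length separately: the data may contain zero bytes,
     * so strlen() on the literal would not recover it. */
    fprintf(out, "\";\nstatic const unsigned int %s_len = %zu;\n", name, len);
}
```

For the bytes 0x00, 0x41, 0xff this emits the literal "\000\101\377" followed by demo_len = 3 when called with the name "demo".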
Comment 2 Ralph Giles 2009-11-18 16:19:27 UTC
This is font data required for display of documents which don't embed their own
fonts. The normal build process converts the binary font data to C code and
compiles that into the executable. Something similar is done with the CMap files.
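To make the memory problem concrete, here is a minimal sketch of the kind of converter described above. The report does not show MuPDF's actual dump tool or its output layout, so dump_c_array and the NAME_len symbol are hypothetical. Each input byte becomes its own initializer in the emitted array, so for a multi-megabyte font the compiler must parse and represent millions of separate constant expressions, which is plausibly where the memory goes.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical generator: emit binary data as a C array initializer,
 * 16 bytes per line. Each byte becomes a separate constant expression
 * that the compiler must parse and represent individually. */
void dump_c_array(FILE *out, const char *name,
                  const unsigned char *data, size_t len)
{
    fprintf(out, "const unsigned char %s[] = {\n", name);
    for (size_t i = 0; i < len; i++) {
        fprintf(out, "0x%02x,", (unsigned)data[i]);
        if (i % 16 == 15)
            fputc('\n', out);
    }
    /* A trailing comma before '}' is valid C, which keeps the loop simple. */
    fprintf(out, "\n};\nconst unsigned int %s_len = %zu;\n", name, len);
}
```

Called with the name "demo" on the bytes 0x00, 0x41, 0xff, it prints a three-element array followed by demo_len = 3.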

If there's a portable way to ask the linker to embed the binary data directly in
the executable, that might work around the problem. I'm not aware of one.

On a system like maemo where software is always installed by a package manager,
it's reasonable to store associated resources in the filesystem and mmap() them
instead. I don't know if Tor would take a patch to optionally do that. Currently
we compile everything together for convenience.
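The mmap() approach suggested above could look roughly like this on a POSIX system. This is a sketch, not something MuPDF adopted according to this report: map_font_file and unmap_font_file are hypothetical names, and the install location of the font files is left to the caller. The mapping is read-only and MAP_PRIVATE, so pages are read from the file on demand instead of being compiled and linked into the executable.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a font file read-only.
 * Returns NULL on failure; on success stores the size in *len_out. */
const unsigned char *map_font_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;
    *len_out = (size_t)st.st_size;
    return (const unsigned char *)p;
}

/* Hypothetical helper: release a mapping returned by map_font_file. */
void unmap_font_file(const unsigned char *data, size_t len)
{
    munmap((void *)data, len);
}
```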
Comment 3 Ralph Giles 2009-11-18 16:20:11 UTC
Oops, I meant openmoko, not maemo. Sorry for the confusion.
Comment 4 Tor Andersson 2009-11-19 07:59:15 UTC
Convenience and portability are indeed the main issues here. Autogenerating the big data files
as assembler code speeds things up immensely but is not portable. If we can find something that
works with Linux, Darwin, and MinGW, we could use it there and keep the current generated C code
only as a fallback.

Using strings instead of an array, as your patch does, can run into compiler issues.
Some compilers will not emit the complete string if there are embedded zeroes;
others have a 65k limit on string literal length.
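A related embedded-zero pitfall shows up even when the compiler keeps every byte: any code that recovers the length with strlen() stops at the first zero. This small sketch (blob and both helpers are hypothetical names) shows why a generated string literal needs its length carried alongside it, for example via sizeof. It does not reproduce the compiler truncation or the 65k limit mentioned above, which depend on the particular compiler.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical generated literal: "\000" is an embedded zero byte. */
static const char blob[] = "AB\000CD";

/* sizeof counts every byte of the array, including the terminator the
 * compiler appends, so subtract one to get the payload length. */
size_t blob_len_sizeof(void) { return sizeof blob - 1; }

/* strlen() stops at the first zero byte and undercounts binary data. */
size_t blob_len_strlen(void) { return strlen(blob); }
```

Here blob_len_sizeof() returns 5 while blob_len_strlen() returns 2.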
Comment 5 Tor Andersson 2010-01-17 06:49:43 UTC
I have added inline asm directives to the generated C files so that
on Linux it will use .incbin instead of compiling a static C array.
Darwin's assembler is too ancient to support .incbin, and I haven't
looked at the other BSDs, MSVC, or MinGW yet.
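A sketch of what one generated file might look like with the .incbin approach. Comment 5 keys the asm on __linux__; this sketch instead tests a macro supplied from outside the file, the variant Comments 6 and 7 argue for. The symbol names, the section directives (GNU as, ELF targets), and the path fonts/font_demo.bin are assumptions for illustration, and the fallback array holds four placeholder bytes rather than real font data. With the macro defined, the assembler reads the file directly and the C compiler never parses the bytes; without it, the portable array is compiled as before.

```c
/* font_demo.c: hypothetical shape of one generated data file. */
#ifdef HAVE_INCBIN
/* GNU as copies the file's bytes straight into .rodata, so the C compiler
 * never has to parse them. ELF targets only; the path is an assumption. */
__asm__(
    ".pushsection .rodata\n"
    ".balign 4\n"
    ".global font_demo\n"
    "font_demo:\n"
    ".incbin \"fonts/font_demo.bin\"\n"
    "font_demo_end:\n"
    ".balign 4\n"
    ".global font_demo_size\n"
    "font_demo_size:\n"
    ".long font_demo_end - font_demo\n"
    ".popsection\n");
extern const unsigned char font_demo[];
extern const unsigned int font_demo_size;
#else
/* Portable fallback: the same data as an ordinary initialized array.
 * These four bytes are placeholders, not real font data. */
const unsigned char font_demo[] = { 'D', 'E', 'M', 'O' };
const unsigned int font_demo_size = sizeof font_demo;
#endif
```

Either way, the rest of the program sees the same two symbols, font_demo and font_demo_size.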
Comment 6 Robin Watts 2010-01-19 08:36:58 UTC
The use of .incbin caused me problems when building for the BeagleBoard. I
ended up editing the autogenerated files to get around it.

Rather than #ifdef __linux__, as I believe it is now, could we use #ifdef
USE_INCBIN or something similar?

(Of course, I was cross compiling, and that had other problems, so this is 
minor...)
Comment 7 Tor Andersson 2010-07-23 21:13:24 UTC
The generated font files now have #ifdef HAVE_INCBIN around
the asm sections. HAVE_INCBIN is defined if __linux__ or __FreeBSD__
is defined and __STRICT_ANSI__ is not. Define it manually if your
toolchain supports the .incbin directive on a platform other than those.
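The rule above can be written as a small preprocessor block. This is a sketch of that logic, not the actual MuPDF header; font_embed_method is a hypothetical helper added only to show which branch a translation unit takes. The outer #ifndef lets a build that passes -DHAVE_INCBIN enable it on other targets. Excluding __STRICT_ANSI__ is likely because GCC disables the plain asm keyword under -ansi and -std=c89/c99 (only __asm__ remains available), though the report does not state the reason.

```c
/* Hypothetical config logic: use .incbin on Linux and FreeBSD unless the
 * compiler is in strict ANSI mode. The outer #ifndef lets a build pass
 * -DHAVE_INCBIN for other targets whose assembler supports .incbin. */
#ifndef HAVE_INCBIN
#if (defined(__linux__) || defined(__FreeBSD__)) && !defined(__STRICT_ANSI__)
#define HAVE_INCBIN 1
#endif
#endif

/* Hypothetical helper reporting which embedding path this file compiles. */
const char *font_embed_method(void)
{
#ifdef HAVE_INCBIN
    return "incbin";
#else
    return "c-array";
#endif
}
```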