I tried building MuPDF 2009-07-07 on openmoko (armel system with 128 MB of RAM), but gcc ran out of memory at ./build/font_cjk.c. This file seems to be 15 megabytes of autogenerated content and seems to cause GCC to allocate 787424 kilobytes of RAM according to ps -eorss,cmd:

787424 /usr/lib/gcc/x86_64-linux-gnu/4.3.2/cc1 -quiet ./build/font_cjk.c -quiet -dumpbase font_cjk.c -mtune=generic -auxbase font_cjk -o /tmp/ccDgLeuP.s

Could this be improved somehow?
Created attachment 5685: encode font as a string instead of an array

The attached patch encodes the font file as a string instead of an array and lets me build MuPDF on openmoko. I have tested that MuPDF runs OK with this patch on amd64 and armel, but I can't be sure that the fallback font is actually used. However, the binary is still quite large (4.8 megabytes on amd64, 4.7 on armel). Would it be possible to make embedding fonts optional? In a Debian package I can use dependencies to guarantee that the font will always be available.
This is font data required for display of documents which don't embed their own fonts. The normal build process converts the binary font data to C code and compiles that into the executable. Something similar is done with the CMap files. If there's a portable way to ask the linker to embed the binary data directly in the executable, that might work around the problem. I'm not aware of one. On a system like maemo where software is always installed by a package manager, it's reasonable to store associated resources in the filesystem and mmap() them instead. I don't know if Tor would take a patch to optionally do that. Currently we compile everything together for convenience.
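To make the mmap() idea concrete, here is a minimal sketch for a POSIX system. The helper names map_font_file and unmap_font_file are hypothetical and are not part of MuPDF; the sketch only shows how a packaged font file could be mapped read-only instead of being compiled into the executable.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not MuPDF API): map a font file read-only.
 * Returns the mapped bytes and stores the length in *size_out,
 * or returns NULL if the file is missing, empty, or cannot be mapped. */
const unsigned char *map_font_file(const char *path, size_t *size_out)
{
    struct stat st;
    void *addr;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED)
        return NULL;
    *size_out = (size_t)st.st_size;
    return addr;
}

/* Release a mapping obtained from map_font_file(). */
void unmap_font_file(const unsigned char *data, size_t size)
{
    munmap((void *)data, size);
}
```

Pages of a read-only file mapping are loaded on demand and can be discarded under memory pressure, which is relevant on a 128 MB device. The cost is that the program then depends on the file being installed.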
Oops, I meant openmoko, not maemo. Sorry for the confusion.
Convenience and portability are indeed the main issues here. Autogenerating the big data files as assembler code speeds things up immensely but is not portable. Ideally we would find something that works with Linux, Darwin, and MinGW, and only use the current generated C code as a fallback. Using strings instead of an array, as in your patch, can run into compiler issues. Some compilers will not emit the complete string if there are embedded zeroes; others have a 65k limit on string literal length.
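As an illustration of the embedded-zero concern, the sketch below encodes the same six bytes both ways. The data and the names blob_array, blob_string, blob_string_size and blob_string_strlen are made up for this example. On a conforming compiler the string literal keeps every byte, so sizeof reports the full length, but anything that treats the data as a NUL-terminated string stops at the first zero.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 6-byte blob containing a zero byte, encoded two ways. */
const unsigned char blob_array[] = { 0x4f, 0x54, 0x54, 0x4f, 0x00, 0x01 };
const char blob_string[] = "OTTO\000\001"; /* octal escapes, three digits each */

/* Size of the string-encoded data, excluding the NUL the compiler appends. */
size_t blob_string_size(void)
{
    return sizeof blob_string - 1;
}

/* Length seen by a NUL-terminated view: it stops at the first zero byte. */
size_t blob_string_strlen(void)
{
    return strlen(blob_string);
}
```

A generated file that uses the string encoding could compare sizeof against the expected byte count at build time to catch a compiler that truncates the literal.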
I have added inline asm directives to the generated C files so that on Linux it will use .incbin instead of compiling a static C array. Darwin's assembler is too ancient to support .incbin, and I haven't looked at the other BSDs, MSVC, or MinGW yet.
The use of .incbin caused me problems when building for the beagleboard. I ended up editing the autogenerated files to get around it. Rather than #ifdef __linux__, as I believe it is now, could we use #ifdef USE_INCBIN or something similar? (Of course, I was cross compiling, and that had other problems, so this is minor...)
The generated font files now have #ifdef HAVE_INCBIN around the asm sections. HAVE_INCBIN is defined if __linux__ or __FreeBSD__ is defined and __STRICT_ANSI__ is not. Define it manually if your compiler supports the .incbin directive and is not one of the above.
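For anyone setting HAVE_INCBIN by hand, the following is a rough sketch of what such a guarded file could look like. It is not the actual generated code. The names demo_font_data, demo_font_size and demo_font_uses_incbin, and the DEMO_FONT_FILE macro that supplies the path, are assumptions made for illustration, and the asm branch assumes GNU as on an ELF target.

```c
#include <stddef.h>

/* Detection rule as described in this thread; define HAVE_INCBIN by hand
 * for other toolchains whose assembler supports .incbin. */
#if (defined(__linux__) || defined(__FreeBSD__)) && !defined(__STRICT_ANSI__)
#ifndef HAVE_INCBIN
#define HAVE_INCBIN
#endif
#endif

/* DEMO_FONT_FILE is a hypothetical macro holding the path as a string
 * literal, e.g. -DDEMO_FONT_FILE='"font.cff"'. */
#if defined(HAVE_INCBIN) && defined(DEMO_FONT_FILE)

/* Let the assembler copy the file into .rodata, bracketed by two labels. */
__asm__(
    ".pushsection .rodata\n"
    ".global demo_font_data\n"
    ".global demo_font_data_end\n"
    ".balign 16\n"
    "demo_font_data:\n"
    ".incbin \"" DEMO_FONT_FILE "\"\n"
    "demo_font_data_end:\n"
    ".popsection\n"
);
extern const unsigned char demo_font_data[];
extern const unsigned char demo_font_data_end[];

size_t demo_font_size(void)
{
    return (size_t)(demo_font_data_end - demo_font_data);
}
int demo_font_uses_incbin(void) { return 1; }

#else

/* Portable fallback: a plain C array (placeholder bytes, not a real font). */
const unsigned char demo_font_data[] = { 0x4f, 0x54, 0x54, 0x4f, 0x00, 0x01 };

size_t demo_font_size(void)
{
    return sizeof demo_font_data;
}
int demo_font_uses_incbin(void) { return 0; }

#endif
```

Built without DEMO_FONT_FILE, the sketch takes the array branch on every platform. With the macro defined and HAVE_INCBIN in effect, the compiler never parses the font bytes, which is what avoids the memory blowup in cc1.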