Adobe Reader extracts text much more successfully, e.g. from http://www.ice.gov/doclib/sevis/pdf/sevis_arabic_fs.pdf (one of the first results for http://www.google.com/search?q=arabic+ext%3Apdf ). This seems to be caused partly by dev_text not expecting RTL text and inserting too many unintended line breaks, and partly by differences in Unicode normalization.
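To illustrate the normalization side of the report: one plausible source of divergence (an assumption, not a confirmed diagnosis for this file) is that the PDF encodes Arabic with presentation-form code points, which a reader may fold back to the base letters on extraction. The Python sketch below uses the standard unicodedata module to show that folding; the helper name normalize_extracted is hypothetical and is not part of dev_text.

```python
import unicodedata


def normalize_extracted(text: str) -> str:
    """Fold compatibility characters to their base letters.

    Hypothetical helper for illustration only. NFKC maps Arabic
    presentation forms (for example U+FEE1, the isolated form of MEEM)
    to the plain letters (U+0645) that searching and copy/paste expect.
    """
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    isolated_meem = "\uFEE1"      # ARABIC LETTER MEEM ISOLATED FORM
    lam_alef_ligature = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
    for sample in (isolated_meem, lam_alef_ligature):
        folded = normalize_extracted(sample)
        print([f"U+{ord(c):04X}" for c in sample], "->",
              [f"U+{ord(c):04X}" for c in folded])
```

Two extractors that differ only in whether they apply this kind of folding produce strings that look identical on screen but do not compare equal.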
Hopefully fixed in:

commit cffcdf1ab2189a55b09b8ac74d552e6a2e809510
Author: Tor Andersson <tor.andersson@artifex.com>
Date:   Fri May 3 16:33:31 2013 +0200

    Add simple visual-to-logic RTL reordering as a text extraction pass.
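For readers unfamiliar with the idea, here is a minimal Python sketch of what a visual-to-logical reordering pass can look like. It is not the code from the commit above: the function name visual_to_logical and the run-detection rule are assumptions made for illustration. It takes one line of characters in left-to-right visual order and reverses each run of right-to-left characters (Unicode bidi classes R and AL), carrying along neutral characters such as spaces that sit between two RTL characters.

```python
import unicodedata

RTL_CLASSES = {"R", "AL"}          # Hebrew, Arabic and similar letters
RUN_BREAKERS = {"L", "EN", "AN"}   # strong LTR letters and digits end a run


def _bidi(ch: str) -> str:
    return unicodedata.bidirectional(ch)


def visual_to_logical(line: str) -> str:
    """Reverse each RTL run in a line given in visual (left-to-right) order.

    Illustrative sketch only: digits and nested embeddings are not
    handled the way the full Unicode bidirectional algorithm requires.
    """
    out = []
    i, n = 0, len(line)
    while i < n:
        if _bidi(line[i]) not in RTL_CLASSES:
            out.append(line[i])
            i += 1
            continue
        # Grow the run over neutrals, but end it at the last RTL character
        # so trailing spaces or punctuation stay where they are.
        end = i + 1
        j = i + 1
        while j < n and _bidi(line[j]) not in RUN_BREAKERS:
            if _bidi(line[j]) in RTL_CLASSES:
                end = j + 1
            j += 1
        out.append(line[i:end][::-1])
        i = end
    return "".join(out)


if __name__ == "__main__":
    visual = "abc \u05d2\u05d1\u05d0"   # Latin text, then Hebrew as drawn
    logical = visual_to_logical(visual)
    print([f"U+{ord(c):04X}" for c in logical])
```

A pass like this only fixes character order within a line; the extra line breaks mentioned in the report would need to be handled separately.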