As per user enhancement request #3122 (https://github.com/pymupdf/PyMuPDF/issues/3122) in PyMuPDF, is it possible to add an int member "psm" (page segmentation mode) to the OCR options structure and pass its value to Tesseract-OCR?

PSM can improve Tesseract's recognition rate very (!) significantly, for instance when the image is known to contain just a single line or a single word, or when the image has large patches of different background colors.

The default PSM value is:

  3  Fully automatic page segmentation, but no OSD. (Default)

(OSD: orientation and script detection)

Frequently desirable PSM options:

  7  Treat the image as a single text line.
  8  Treat the image as a single word.
  10 Treat the image as a single character.
  11 Sparse text. Find as much text as possible in no particular order.
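
To illustrate what passing the value through would amount to, here is a minimal sketch of the equivalent Tesseract command-line call. It only builds the argument list and does not run Tesseract. The function name `tesseract_command` and the `PSM_MODES` table are hypothetical names made up for this illustration; they are not part of MuPDF, PyMuPDF or Tesseract. The sketch assumes the `--psm N` flag as described in the Tesseract command-line documentation, with N in the range 0 to 13.

```python
# Hypothetical helper, for illustration only: builds a Tesseract CLI
# invocation that selects a page segmentation mode via "--psm".

# The PSM values mentioned in this request (Tesseract defines 0..13 in total).
PSM_MODES = {
    3: "Fully automatic page segmentation, but no OSD (default)",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    10: "Treat the image as a single character",
    11: "Sparse text, in no particular order",
}


def tesseract_command(image_path, psm=3, lang="eng"):
    """Return the argv list for 'tesseract <image> stdout -l <lang> --psm <psm>'."""
    if not isinstance(psm, int) or not 0 <= psm <= 13:
        raise ValueError(f"psm must be an integer in 0..13, got {psm!r}")
    # "stdout" as the output base makes Tesseract print the recognized text.
    return ["tesseract", image_path, "stdout", "-l", lang, "--psm", str(psm)]


if __name__ == "__main__":
    # An image known to contain one line of text: use PSM 7.
    print(" ".join(tesseract_command("line.png", psm=7)))
```

An options structure carrying an int "psm" would play the same role as the `psm` argument here, with 3 as the natural default so that existing behavior stays unchanged.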
Tesseract 5.0 and above does not support PSM. It doesn't make sense to add this feature just for the sake of Tesseract 4.0.
(In reply to mister_torn from comment #1)
> Tesseract 5.0 and above does not support PSM. It doesn't make sense to add this
> feature just for the sake of Tesseract 4.0.

Can you provide a reference for this, please? The current Tesseract 5 documentation still describes psm as a supported option, e.g. https://tesseract-ocr.github.io/tessdoc/Command-Line-Usage.html