I’ve had some trouble with PDFs that were just images of pages of text (an easy way to tell, assuming you’re on Linux, is to run pdftotext on it and see if you get anything). There’s a utility called pdfsandwich that will use Tesseract to OCR the images and add text to the PDF. That might help too.
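
In case it's useful, here is a rough Python sketch of that check-then-OCR workflow. It assumes pdftotext and pdfsandwich are both installed and on your PATH. The function names and the 20-character threshold are made up for illustration, so adjust to taste.

```python
import subprocess


def looks_scanned(extracted_text: str, min_chars: int = 20) -> bool:
    """Guess whether a PDF is image-only from what pdftotext returned.

    min_chars is an arbitrary threshold picked for this sketch. strip()
    also drops the form-feed page separators pdftotext emits, so a PDF
    with no real text comes back as (nearly) empty.
    """
    return len(extracted_text.strip()) < min_chars


def extract_text(pdf_path: str) -> str:
    # Passing "-" as the output file makes pdftotext write to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_if_needed(pdf_path: str) -> None:
    if looks_scanned(extract_text(pdf_path)):
        # pdfsandwich runs Tesseract and writes a new PDF with a text layer.
        subprocess.run(["pdfsandwich", pdf_path], check=True)
    else:
        print(f"{pdf_path} already has a text layer; skipping OCR")
```

Calling `ocr_if_needed("book.pdf")` (a placeholder filename) should leave the original alone and, as far as I recall, produce the OCR'd copy next to it with an `_ocr` suffix.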
Thanks. Crude scans are the only way to get a lot of the more fringe books on https://annas-archive.li/ etc.