Fabrice Harari, Ericus

[WD 21] PDFToText

Initial post by Ericus on 02.09.2017 12:20


PDFToText works well on a one-page PDF, but it does not work for me on a two-page scanned PDF.

The PDFToText help says that specific pages can be converted, e.g. PDFToText(sfilename,"1") or "2" etc.

But it ain't working. Am I doing something wrong?

Thanks in advance.

Ericus Steyn


Hi Ericus,

you are not doing anything wrong per se, but in this case there is no text to extract from the PDF.

A PDF can contain text just like a text file, and that is what you get when you print to PDF or save as PDF in Word or LibreOffice.

However, when you SCAN to PDF, NO OCR IS PERFORMED (unless the scanning software explicitly adds a text layer), so the content of the PDF is one image per page.
And PDFToText does not perform any OCR either; it only extracts the text content, if any.

That's why PDFToText will not give you any result on a scanned PDF. You will need to run OCR on the PDF file instead.
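
To make the check concrete, here is a minimal sketch in Python (not WLanguage) of how you could detect which pages came back empty and therefore need OCR. The function name pages_needing_ocr and the extract_page_text callback are hypothetical: the callback only stands in for whatever per-page extraction you use (for example a wrapper around PDFToText), and the sketch assumes it returns an empty or whitespace-only string for image-only pages.

```python
from typing import Callable, List, Optional


def pages_needing_ocr(
    page_count: int,
    extract_page_text: Callable[[int], Optional[str]],
) -> List[int]:
    """Return the 1-based numbers of pages whose extraction yields no text.

    extract_page_text is a hypothetical stand-in for the real extractor;
    it receives a 1-based page number and returns that page's text.
    """
    missing = []
    for page in range(1, page_count + 1):
        text = extract_page_text(page) or ""  # treat None like an empty string
        if not text.strip():  # only whitespace counts as "no text layer"
            missing.append(page)
    return missing


if __name__ == "__main__":
    # Simulated two-page scanned PDF: the extractor finds no text on either page.
    scanned = {1: "", 2: "   \n"}
    print(pages_needing_ocr(2, lambda p: scanned[p]))  # [1, 2]
```

If every page ends up in the returned list, the file is almost certainly a pure scan, and the whole document has to go through an OCR tool before any text extraction can work.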

Best regards

by Fabrice Harari - on 02.09.2017 12:35
