Yogi Yang, Joris, evanpan, Stefan Bentvelsen, GuenterP

[WD17] - Extracting Text from a PDF file

Original post by Yogi Yang on 13.10.2015 04:43

In an application I am working on, I have to provide a facility to import data from a PDF.

I checked WD's docs. There is a facility to load and display a PDF, but I could not find anything that would let me extract the text content of a PDF in a well-formatted manner.

Any ideas as to how I can extract the content of a PDF file?




Hi, extracting text from a PDF together with its exact position depends on a lot of unknown factors, such as the version of the PDF converter that was used. The best approach is to scan the PDF (or put it into a picture control if it is already scanned) and use an OCR program / library (BCL ?) to read the text. However, the x/y position of the text within the document is then lost.

by GuenterP - on 13.10.2015 06:49

In WD20 there is a PDFToText function, but I don't know whether it is already available in WD17.

See the help for the PDFToText function.



by Joris - on 13.10.2015 07:21
Thanks, everyone, for the suggestions.

I wanted to use the features offered by WD, but it seems there is no facility for this in WD17, so I will have to resort to using an ActiveX control for this.


Yogi Yang

by Yogi Yang - on 13.10.2015 07:55

PDFToText is available from version 14 onwards (see the bottom of the help page for the PDFToText function).
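
For anyone landing here, a minimal WLanguage sketch of how PDFToText might be called. The file path is a made-up example, and the behaviour assumed here (the function takes the PDF file name and returns the extracted text, or an empty string on failure) should be checked against the help page of your WINDEV version before relying on it.

```
// Sketch only: verify the PDFToText syntax in the help of your WINDEV version.
// The file path below is a hypothetical example.
sPDFFile is string = "C:\Temp\invoice.pdf"
sText is string = PDFToText(sPDFFile)
IF sText = "" THEN
	// Assumption: an empty result means an error, or a PDF without a text layer (e.g. a scan)
	Error("No text could be extracted from " + sPDFFile, ErrorInfo())
ELSE
	// Display the extracted text; parsing it into fields is up to the application
	Info(sText)
END
```

Note that this kind of extraction only returns text that is actually stored in the PDF. A scanned document that consists of page images would still need OCR, as mentioned earlier in the thread.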

by Stefan Bentvelsen - on 13.10.2015 10:28
You're right!

by Joris - on 13.10.2015 11:22
Thanks. I checked on the site, but it seemed to state this for WD20. Or I probably misread it!

by Yogi Yang - on 13.10.2015 14:52
Yep, indeed. OCR components can be very important for PDF converters when extracting text from a PDF file.

by evanpan - on 05.11.2015 12:45
