Remove non-printing chars from text file

Original post by David Egan on 27.07.2015 06:52

A client has recently started receiving text files that contain a range of non-printing characters, which we need to strip out. The exact characters vary somewhat from file to file, so we need a routine to remove ASCII 1 to 31 (excluding TAB, CR & LF) and 127 upwards.

I know we could loop through for each character & replace it, but I'm hoping someone has a more efficient way so that we're not looping through the file 100-plus times. We have up to 30 of these files a day to deal with, each up to 25 MB in size.




Hi David,

Not tested, but this should do the trick I guess:

CleanString is string = Replace(InitialString, [Charact(1), Charact(2), Charact(3), ...], "")


Peter H.

by Peter Holemans - on 27.07.2015 07:55

Hi Peter
Clever idea, but no good I'm afraid :(


by David Egan - on 27.07.2015 18:46

Hi David,

I tested it in WD20 with this piece of code and for me it works. The length of the initial string is 3 and the length of the clean string is 0 as it is supposed to be. What version are you using?

InitialString is string = Charact(1)+Charact(2)+Charact(3)
Info("Length: "+Length(InitialString))
CleanString is string = Replace(InitialString, [Charact(1),Charact(2),Charact(3)], "")
Info("Length: "+Length(CleanString))


Peter H.

by Peter Holemans - on 28.07.2015 07:30

Me too; I use the Replace function to remove white-space from strings - no problem.

by DarrenF - on 28.07.2015 08:57

You can convert it to hex format using BufferToHexa, convert every byte to an int using HexaToInt, remove the byte if its value meets your conditions, and convert back from hex using HexaToBuffer.

With Replace or this way you don't need to loop through the file more than once.
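
For WD17, where the array form of Replace throws a syntax error, a single pass over the content can also be written as a plain loop. The following is a minimal, untested sketch rather than a confirmed solution: the procedure name StripNonPrinting and its variable names are hypothetical, it assumes the file content has already been loaded into an ANSI string, and it also drops Charact(0), which the original request does not mention.

```wlanguage
// Hypothetical helper: returns a copy of sSource without non-printing characters.
// Assumption: sSource is an ANSI string, so Asc() returns a value from 0 to 255.
PROCEDURE StripNonPrinting(sSource is string)
sResult is string
nCode is int
FOR i = 1 _TO_ Length(sSource)
	nCode = Asc(sSource[[i]])
	// Keep TAB (9), LF (10), CR (13) and printable ASCII 32 to 126;
	// everything else (0 to 31 otherwise, and 127 upwards) is skipped
	IF nCode = 9 OR nCode = 10 OR nCode = 13 OR (nCode >= 32 AND nCode <= 126) THEN
		sResult += sSource[[i]]
	END
END
RESULT sResult
```

Building the result by repeated concatenation may be slow on files of 25 MB, so it is worth timing this against the other approaches before relying on it.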

by Paulo Oliveira - on 28.07.2015 09:34

Hi Peter
The project is in WD17 & that construct throws a syntax error in it. It does work fine in WD20 though, so it's either time to update it or run with Paulo's suggestion, which does work in WD17.

Thanks guys for the suggestions


by David Egan - on 28.07.2015 20:01
