Subject Re: STUFF() returns incorrect character
From Mervyn Bick <invalid@invalid.invalid>
Date Mon, 4 Jan 2021 09:58:19 +0200
Newsgroups dbase.getting-started

On 2021/01/04 07:00, John Gillen wrote:

> The text file I am processing is actually an html file from the State of California's website. I don't know the file's encoding format, but a hex view identified the Â as CHR(194) in the first example below.
>

Getting data out of a web page is a whole new ballgame. :-)  Several
users have done work on this.  If you post the URL of a page you're
interested in (or attach the page itself) and tell us what data you
want to extract, it may be possible to come up with some ideas.

As you are accessing the newsgroups via the link on the dBASE website
(which uses Web-News) you may not, depending on the size of the page, be
able to attach the page to your message in this newsgroup.  If this
happens, try attaching it to a message in the binaries newsgroup.

> I could open the html file and save it as ANSI, but I was hoping to avoid that step if possible, as there are hundreds of these files. So, I opted to use the low level commands to see if I could clean up the wayward characters.

It can be done but it's not quite that straightforward.  HTML encodes
special characters as &keyword; entities, so you would need a list of
these characters and keywords.
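
To give a rough idea of what such a keyword list involves, here is a
minimal sketch.  It is in Python rather than dBL, purely as an
illustration.  The table ENTITIES and the function decode_entities()
are hypothetical names, and the table holds only a handful of the many
keywords a real page can contain.

```python
# Hypothetical, deliberately tiny lookup table of HTML keywords.
# A real page can contain many more entities than these five.
ENTITIES = {
    "&nbsp;": " ",
    "&lt;": "<",
    "&gt;": ">",
    "&quot;": '"',
    "&amp;": "&",   # kept last so "&amp;lt;" ends up as "&lt;", not "<"
}

def decode_entities(text):
    """Replace each known &keyword; in text with its plain character."""
    for keyword, char in ENTITIES.items():
        text = text.replace(keyword, char)
    return text

print(decode_entities("Tom &amp; Jerry &lt;b&gt;"))
```

The same idea carries over to any language: walk the list and replace
each keyword wherever it occurs.  Python's standard library also has
html.unescape(), which knows the full set of named entities.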

>
> I have used STUFF() in many other programs, but this is the first time I have used it in processing an html file.
>
> Here's the code for the test file. (TestOut.log is just for troubleshooting) I normally use an .h file, but this was a quick proof of concept test. In this version, I was testing CHR(00), but I have tried "", '' and CHR(00):

You could find chr(0) in a .dbf file if you examine it byte by byte
using the low level file functions (or preferably a file object :-) )
but you should never find a chr(0) in a text file.

If you use stuff() to replace 1 character with chr(0) in a string it
actually removes the character from the string.
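
The net effect can be modelled with a small sketch.  This is Python,
not dBASE, and it only imitates the behaviour described above: the
stuff() function here is a hypothetical stand-in for STUFF() with a
1-based start position, and the removal is modelled by stuffing in an
empty string.  strip_char() is likewise a made-up helper name.

```python
def stuff(target, start, quantity, replacement):
    """Stand-in for STUFF(): start is 1-based, quantity characters are replaced."""
    return target[:start - 1] + replacement + target[start - 1 + quantity:]

def strip_char(text, unwanted):
    """Remove every occurrence of a single unwanted character from text."""
    pos = text.find(unwanted)           # 0-based position, -1 if absent
    while pos != -1:
        text = stuff(text, pos + 1, 1, "")   # convert to 1-based for stuff()
        pos = text.find(unwanted)
    return text

print(stuff("abcdef", 3, 1, ""))          # the third character is dropped
print(strip_char("a\u00c2b\u00c2c", "\u00c2"))
```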

Mervyn.