Subject	Re: STUFF() returns incorrect character
From	Gaetano <gaetanodd@hotmail.com>
Date	Mon, 4 Jan 2021 17:48:39 +1000
Newsgroups	dbase.getting-started
Try opening the HTML file in Notepad++ to see any hidden characters. I am surprised that replacing a character with null works at all, because that should nullify the string. I would certainly not replace with CHR(00); I would stick to an empty string for the replacement value.

It won't make a difference to your current issue, but the DOS/Windows end-of-line sequence is CR/LF, hence CHR(13)+CHR(10) and not CHR(10)+CHR(13). Using your current EOL sequence results in the file being identified as a Unix file because the first end-of-line character is CHR(10), and reading the file into dBASE with FGETS() would read the entire file in one go since FGETS() wouldn't find a CR+LF sequence.

I don't understand character encoding well enough to comment any further, but out of curiosity, just try the replacement routine that Mervyn or I have provided to see if it makes any difference to the output.

I ran this quick test using Mervyn's code and the output is as expected:

clear
cStr='</span>В <span style="font-size: 13pt">February'+CHR(194)+' 13,'+CHR(194)+'2020'
?cStr
cRemove = CHR(194)
if cRemove$cStr // do we need to deal with this line?
   do while cRemove$cStr // loop in case there are several characters
      cStr = stuff(cStr,at(cRemove,cStr),1,"")
   enddo
endif
?cStr
// fOut.close() // leftover from a file-based version; fOut is not defined in this test

Output:

</span>? <span style="font-size: 13pt">FebruaryÂ 13,Â2020
</span>? <span style="font-size: 13pt">February 13,2020

Cheers,
Gaetano.

On 04/01/2021 15:00, John Gillen wrote:
> John Gillen Wrote:
>
>> Hello,
>>
>> I am using low level file functions (FOPEN(), FEOF(), FREAD(), FWRITE() and FCLOSE()) to parse a text file. As part of this parsing, I am using three commands to replace characters in the incoming file: mstring = <incoming string>; mpos = AT(<where character to replace is found in the incoming string>); mstring = STUFF(mstring,mpos,1,"") to do the character replacement.
>>
>> If I run these three steps in the Command window, it works as expected.
>>
>> If I run these three steps in a program, instead of NULL, I get a comma (,).
>>
>> I tried "", '', and NULL CHR(0), all with the same result - a comma.
>>
>> dBASE 8/Windows 10 64bit
>>
>> Any ideas/suggestions are appreciated.
>>
>> John
>
> Hello All and thanks for the feedback,
>
> The text file I am processing is actually an html file from the State of California's website. I don't know the file's encoding format, but a hex view identified the В as CHR(194) in the first example below.
>
> I could open the html file and save it as ANSI, but I was hoping to avoid that step if possible, as there are hundreds of these files. So, I opted to use the low level commands to see if I could clean up the wayward characters.
>
> I have used STUFF() in many other programs, but this is the first time I have used it in processing an html file.
>
> Here's the code for the test file. (TestOut.log is just for troubleshooting.) I normally use an .h file, but this was a quick proof of concept test. In this version, I was testing CHR(00), but I have tried "", '' and CHR(00):
>
> mchaptest = FOPEN("Test.html","R")
> mtestout = FCREATE("TestOut.txt","W")
> mchaplog = FCREATE("TestOut.log","W")
>
> DO WHILE .NOT. FEOF(mchaptest)
>    mstring = FGETS(mchaptest)
>    moutstr = "Current string: " + mstring
>    FWRITE(mchaplog,moutstr)
>    FWRITE(mchaplog,CHR(10)+CHR(13))
>
>    DO WHILE CHR(194) $ mstring
>       moutstr = "Testing for CHR(194)"
>       FWRITE(mchaplog,moutstr)
>       FWRITE(mchaplog,CHR(10)+CHR(13))
>       mpos = AT(CHR(194), mstring)
>       mstring = STUFF(mstring,mpos,1,CHR(00))
>    ENDDO
>
>    DO WHILE CHR(195) $ mstring
>       moutstr = "Testing for CHR(195)"
>       FWRITE(mchaplog,moutstr)
>       FWRITE(mchaplog,CHR(10)+CHR(13))
>       mpos = AT(CHR(195), mstring)
>       mstring = STUFF(mstring,mpos,1,CHR(00))
>    ENDDO
>
>    * write the results *
>    FWRITE(mtestout,mstring)
>    FWRITE(mtestout,CHR(10)+CHR(13))
> ENDDO
>
> * close Chapters file *
> FCLOSE(mchaptest)
> FCLOSE(mtestout)
>
> Here are sample lines subject to STUFF() and the output:
>
> </span>В <span style="font-size: 13pt">FebruaryВ 13,В 2020.
> This produces:
> </span> ‚ <span style="font-size: 13pt">February ‚ 13, ‚ 2020.
>
> submitting to the voterвЂ™s county elections official
> This produces:
> submitting to the voter ўв‚¬в„ўs county elections official
>
> 2119.5.</h6>В (a)В From the 14th day
> This produces:
> 2119.5.</h6> ‚ (a) ‚ From the 14th day
>
> (2)В The voterвЂ™s former residence
> This produces:
> The voter ўв‚¬в„ўs former residence
>
> Thanks again.
>
> John
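
A note on the CHR(194) seen in the hex view: it matches the first byte of a UTF-8 encoded non-breaking space (bytes 194, 160), and the three-character debris in "voter's" matches a UTF-8 right single quotation mark (bytes 226, 128, 153). The sketch below is written in Python rather than dBASE, purely to show the byte values. It assumes the files are UTF-8, which the thread does not confirm, and the names show_bytes and clean_line are hypothetical helpers, not part of anyone's posted code.

```python
# Assumption: the HTML files are UTF-8 and the stray characters are a
# non-breaking space (U+00A0) and a right single quotation mark (U+2019).
NBSP = "\u00a0"
RSQUO = "\u2019"


def show_bytes(ch: str) -> str:
    """Return the UTF-8 bytes of ch as decimal values, like CHR() codes."""
    return " ".join(str(b) for b in ch.encode("utf-8"))


def clean_line(raw: bytes) -> str:
    """Decode a UTF-8 line, then swap the typographic characters for ASCII."""
    text = raw.decode("utf-8")
    return text.replace(NBSP, " ").replace(RSQUO, "'")


if __name__ == "__main__":
    print(show_bytes(NBSP))   # 194 160
    print(show_bytes(RSQUO))  # 226 128 153
    sample = "February\u00a013,\u00a02020 voter\u2019s".encode("utf-8")
    print(clean_line(sample))  # February 13, 2020 voter's
```

If that assumption holds, removing only CHR(194) would leave the second byte (160) of each pair behind, so the whole two- or three-byte sequence needs replacing rather than a single character.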