Subject: Re: STUFF() returns incorrect character
From: Gaetano <gaetanodd@hotmail.com>
Date: Mon, 4 Jan 2021 17:48:39 +1000
Newsgroups: dbase.getting-started
Try opening the HTML file in Notepad++ to reveal any hidden characters.
I am surprised that replacing a character with a null works at all,
because that should nullify the string. I would certainly not replace with
CHR(00); I would stick to an empty string as the replacement value.
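A minimal sketch of the empty-string case, just to show that STUFF() then shortens the string by one character instead of leaving a placeholder behind. The names cDemo and nPos are made up for this illustration, and the CHR(0) case is deliberately left out because I am not sure how dBASE treats an embedded null.

```
// Sketch only: cDemo and nPos are hypothetical names.
cDemo = "A" + CHR(194) + "B"
nPos  = AT(CHR(194), cDemo)          // position of the unwanted character
? LEN(cDemo)                         // 3
cDemo = STUFF(cDemo, nPos, 1, "")    // remove 1 character, insert nothing
? cDemo                              // AB
? LEN(cDemo)                         // 2
```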
It won't make a difference to your current issue, but the DOS/Windows
end-of-line sequence is CR/LF, hence CHR(13)+CHR(10) and not
CHR(10)+CHR(13). Using your current EOL sequence results in the file
being identified as a Unix file, because the first end-of-line character
is CHR(10). Reading the file back into dBASE with FGETS() would then read
the entire file in one go, since FGETS() wouldn't find a CR+LF sequence.
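To make the byte order concrete, here is a small sketch that writes two lines with the Windows-style terminator. The file name EolDemo.txt and the variables cEOL and hOut are placeholders, not anything from your program.

```
// Sketch only: "EolDemo.txt", cEOL and hOut are hypothetical names.
cEOL = CHR(13) + CHR(10)             // CR first, then LF
hOut = FCREATE("EolDemo.txt", "W")
FWRITE(hOut, "first line" + cEOL)
FWRITE(hOut, "second line" + cEOL)
FCLOSE(hOut)
```

Keeping the terminator in one variable means the order only has to be right in one place.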
I don't understand character encoding well enough to comment any
further, but out of curiosity, just try the replacement routine that
Mervyn or I have provided to see if it makes any difference to the
output.
I ran this quick test using Mervyn's code and the output is as expected:
clear
cStr='</span>В <span style="font-size: 13pt">February'+CHR(194)+' 13,'+CHR(194)+'2020'
?cStr
cRemove = CHR(194)
if cRemove$cStr // do we need to deal with this line?
   do while cRemove$cStr // loop in case there are several characters
      cStr = stuff(cStr,at(cRemove,cStr),1,"")
   enddo
endif
?cStr
Output:
</span>? <span style="font-size: 13pt">February 13,Â2020
</span>? <span style="font-size: 13pt">February 13,2020
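Since there are hundreds of files to process, the same loop could be wrapped in a small function and called once per unwanted character. This is only a sketch: StripChar() and its parameter names are made up here and are not a built-in dBASE function. It removes the one character it is given and nothing else, so it does not address whatever encoding the file is in.

```
// Sketch only: StripChar() is a hypothetical helper, not part of dBASE.
cStr = "February" + CHR(194) + " 13," + CHR(194) + "2020"
? StripChar(cStr, CHR(194))          // February 13,2020
return

function StripChar(cText, cRemove)
   local cResult, nPos
   cResult = cText                   // work on a copy, not the parameter itself
   nPos = at(cRemove, cResult)
   do while nPos > 0                 // at() returns 0 once nothing is left to remove
      cResult = stuff(cResult, nPos, 1, "")
      nPos = at(cRemove, cResult)
   enddo
return cResult
```

For a second character such as CHR(195), the call can simply be nested: StripChar(StripChar(cStr, CHR(194)), CHR(195)).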
Cheers,
Gaetano.
On 04/01/2021 15:00, John Gillen wrote:
> John Gillen Wrote:
>
>> Hello,
>>
>> I am using low level file functions (FOPEN(), FEOF(), FREAD(), FWRITE() and FCLOSE()) to parse a text file. As part of this parsing, I am using three commands to replace characters in the incoming file: mstring = <incoming string>; mpos = AT(<where character to replace is found in the incoming string>); mstring = STUFF(mstring,mpos,1,"") to do the character replacement.
>>
>> If I run these three steps in the Command window, it works as expected.
>>
>> If I run these three steps in a program, instead of NULL, I get a comma (,).
>>
>> I tried "",'', and NULL CHR(0) all with the same result - a comma.
>>
>> dBASE 8/Windows 10 64bit
>>
>> Any ideas/suggestions are appreciated.
>>
>> John
>
> Hello All and thanks for the feedback,
>
> The text file I am processing is actually an html file from the State of California's website. I don't know the file's encoding format, but a hex view identified the В as CHR(194) in the first example below.
>
> I could open the html file and save it as ANSI, but I was hoping to avoid that step if possible, as there are hundreds of these files. So, I opted to use the low level commands to see if I could clean up the wayward characters.
>
> I have used STUFF() in many other programs, but this is the first time I have used it in processing an html file.
>
> Here's the code for the test file. (TestOut.log is just for troubleshooting) I normally use an .h file, but this was a quick proof of concept test. In this version, I was testing CHR(00), but I have tried "", '' and CHR(00):
>
> mchaptest = FOPEN("Test.html","R")
> mtestout = FCREATE("TestOut.txt","W")
> mchaplog = FCREATE("TestOut.log","W")
>
> DO WHILE .NOT. FEOF(mchaptest)
> mstring = FGETS(mchaptest)
> moutstr = "Current string: " + mstring
> FWRITE(mchaplog,moutstr)
> FWRITE(mchaplog,CHR(10)+CHR(13))
>
> DO WHILE CHR(194) $ mstring
> moutstr = "Testing for CHR(194)"
> FWRITE(mchaplog,moutstr)
> FWRITE(mchaplog,CHR(10)+CHR(13))
> mpos = AT(CHR(194), mstring)
> mstring = STUFF(mstring,mpos,1,CHR(00))
> ENDDO
>
> DO WHILE CHR(195) $ mstring
> moutstr = "Testing for CHR(195)"
> FWRITE(mchaplog,moutstr)
> FWRITE(mchaplog,CHR(10)+CHR(13))
> mpos = AT(CHR(195), mstring)
> mstring = STUFF(mstring,mpos,1,CHR(00))
> ENDDO
>
> * write the results
> FWRITE(mtestout,mstring)
> FWRITE(mtestout,CHR(10)+CHR(13))
> ENDDO
>
> * close Chapters file
> FCLOSE(mchaptest)
> FCLOSE(mtestout)
>
> Here are sample lines subject to STUFF() and the output:
> </span>В <span style="font-size: 13pt">FebruaryВ 13,В 2020.
> This produces:
> </span> ‚ <span style="font-size: 13pt">February ‚ 13, ‚ 2020.
>
> submitting to the voter’s county elections official
> This produces:
> submitting to the voter ўв‚¬в„ўs county elections official
>
> 2119.5.</h6>В (a)В From the 14th day
> This produces:
> 2119.5.</h6> ‚ (a) ‚ From the 14th day
>
> (2) The voter’s former residence
> This produces:
> The voter ўв‚¬в„ўs former residence
>
> Thanks again.
>
> John
>