Fast replace$() function - mdijkens - 04-15-2025
For a long time, I've used the replace$() function below, which I wrote myself:
Code: (Select All)
Function replace$ (content$, from$, to$)
content2$ = content$
flen& = Len(from$): tlen& = Len(to$)
p& = InStr(content2$, from$)
If flen& = Len(to$) Then
Do While p& > 0
Mid$(content2$, p&, flen&) = to$
p& = InStr(p& + tlen&, content2$, from$)
Loop
Else
Do While p& > 0
content2$ = Left$(content2$, p& - 1) + to$ + Mid$(content2$, p& + flen&)
p& = InStr(p& + tlen&, content2$, from$)
Loop
End If
replace$ = content2$
End Function
But recently I had to run more than 3,000 replaces on each of more than 1,500 files of 1 MB.
The total replace process took over 1,100 seconds.
When I realized how much memory copying goes on when content2$ is rebuilt for every single replace, I wondered whether _Mem could be an option.
I ended up with the mreplace$() function below, which got exactly the same job done in less than 45 seconds!
Code: (Select All)
Function mreplace$ (content$, from$, to$)
Dim mp As Long, pp As Long, m As _MEM
m = _MemNew(Len(content$) * 2): mp = 0: pp = 1
flen& = Len(from$): tlen& = Len(to$)
p& = InStr(content$, from$)
Do While p& > 0
_MemPut m, m.OFFSET + mp, Mid$(content$, pp, p& - pp): mp = mp + p& - pp
_MemPut m, m.OFFSET + mp, to$: mp = mp + tlen&: pp = p& + flen&
p& = InStr(p& + flen&, content$, from$)
Loop
_MemPut m, m.OFFSET + mp, Mid$(content$, pp): mp = mp + Len(Mid$(content$, pp))
content2$ = String$(mp, 0): _MemGet m, m.OFFSET, content2$: _MemFree m
mreplace$ = content2$
End Function
Take advantage of it if you run into this issue.
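To put the copying in perspective, here is a rough back-of-the-envelope sketch. The match count is a made-up illustration, not a figure from the runs above, and the estimate ignores the temporary strings that Left$ and Mid$ create, so the real rebuild cost is likely higher.

```qb64
' Rough copy-volume estimate. All numbers are illustrative assumptions.
Const FILEBYTES = 1000000 ' one 1 MB string
matches& = 1000 ' hypothetical number of matches in that string

' Rebuilding: every match copies roughly the whole string again.
rebuildBytes# = CDbl(FILEBYTES) * matches&

' Single pass: the text goes into the buffer once and comes back out once.
singlePassBytes# = 2# * FILEBYTES

Print "Rebuild per match, bytes copied:"; rebuildBytes#
Print "Single pass, bytes copied:"; singlePassBytes#
End
```

The rebuild cost grows with the number of matches, while the single-pass cost stays close to twice the string length no matter how many matches there are.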
RE: Fast replace$() function - bplus - 04-15-2025
+1 thanks for sharing this. I haven't needed such heavy duty processing power but nice to know it's ready when I do!
RE: Fast replace$() function - hsiangch_ong - 04-18-2025
this is the function i use. see if you could figure out why.
otherwise yes, this too is flawed, but in a different way.
Code: (Select All)
SUB ReplaceString3 (tx AS STRING, sfind AS STRING, repl AS STRING, numtimes AS _UNSIGNED LONG)
DIM AS STRING s, t, searc, replac
DIM AS _UNSIGNED LONG ls, count, u, j
DIM AS _BYTE goahead
IF (tx = "") OR (sfind = "") OR (sfind = repl) OR (LEN(sfind) > LEN(tx)) THEN EXIT SUB
FOR j = 1 TO 2
IF j = 1 THEN
searc = sfind
replac = CHR$(255)
ELSE
searc = CHR$(255)
replac = repl
END IF
s = UCASE$(searc)
t = UCASE$(tx)
ls = LEN(s)
count = 0
goahead = 1
DO
u = INSTR(t, s)
IF u > 0 THEN
tx = LEFT$(tx, u - 1) + replac + MID$(tx, u + ls)
t = UCASE$(tx)
IF numtimes > 0 THEN count = count + 1: IF count >= numtimes THEN goahead = 0
ELSE
goahead = 0
END IF
LOOP WHILE goahead
NEXT
END SUB
RE: Fast replace$() function - mdijkens - 04-18-2025
What is flawed?
My point in posting this was performance with big strings...
Your routine is very slow with big strings.
Here are all 3 compared:
Code: (Select All)
$Console:Only
Const SIZE = 2000000
txt$ = String$(SIZE, 0)
For p& = 1 To SIZE
Asc(txt$, p&) = 32 + p& Mod 96
Next p&
t! = Timer(.001#)
result$ = replace$(txt$, "A", "Hello")
Print Using " replace$: ##.### seconds"; Timer(.001#) - t!
t! = Timer(.001#)
result$ = mreplace$(txt$, "A", "Hello")
Print Using " mreplace$: ##.### seconds"; Timer(.001#) - t!
t! = Timer(.001#)
ReplaceString3 txt$, "A", "Hello", 0 ' a Sub: it modifies txt$ in place
result$ = txt$ ' no extra mreplace$ call here, so only ReplaceString3 is timed
Print Using "ReplaceString3: ##.### seconds"; Timer(.001#) - t!
End
Function replace$ (content$, from$, to$)
content2$ = content$
flen& = Len(from$): tlen& = Len(to$)
p& = InStr(content2$, from$)
If flen& = Len(to$) Then
Do While p& > 0
Mid$(content2$, p&, flen&) = to$
p& = InStr(p& + tlen&, content2$, from$)
Loop
Else
Do While p& > 0
content2$ = Left$(content2$, p& - 1) + to$ + Mid$(content2$, p& + flen&)
p& = InStr(p& + tlen&, content2$, from$)
Loop
End If
replace$ = content2$
End Function
Function mreplace$ (content$, from$, to$)
Dim mp As Long, pp As Long, m As _MEM
m = _MemNew(Len(content$) * 2): mp = 0: pp = 1
flen& = Len(from$): tlen& = Len(to$)
p& = InStr(content$, from$)
Do While p& > 0
_MemPut m, m.OFFSET + mp, Mid$(content$, pp, p& - pp): mp = mp + p& - pp
_MemPut m, m.OFFSET + mp, to$: mp = mp + tlen&: pp = p& + flen&
p& = InStr(p& + flen&, content$, from$)
Loop
_MemPut m, m.OFFSET + mp, Mid$(content$, pp): mp = mp + Len(Mid$(content$, pp))
content2$ = String$(mp, 0): _MemGet m, m.OFFSET, content2$: _MemFree m
mreplace$ = content2$
End Function
Sub ReplaceString3 (tx As String, sfind As String, repl As String, numtimes As _Unsigned Long)
Dim As String s, t, searc, replac
Dim As _Unsigned Long ls, count, u, j
Dim As _Byte goahead
If (tx = "") Or (sfind = "") Or (sfind = repl) Or (Len(sfind) > Len(tx)) Then Exit Sub
For j = 1 To 2
If j = 1 Then
searc = sfind
replac = Chr$(255)
Else
searc = Chr$(255)
replac = repl
End If
s = UCase$(searc)
t = UCase$(tx)
ls = Len(s)
count = 0
goahead = 1
Do
u = InStr(t, s)
If u > 0 Then
tx = Left$(tx, u - 1) + replac + Mid$(tx, u + ls)
t = UCase$(tx)
If numtimes > 0 Then count = count + 1: If count >= numtimes Then goahead = 0
Else
goahead = 0
End If
Loop While goahead
Next
End Sub
RE: Fast replace$() function - SMcNeill - 04-18-2025
(04-18-2025, 07:31 PM)mdijkens Wrote: What is flawed?
My point in posting this was performance with big strings...
Your routine is very slow with big strings.
Code: (Select All)
m = _MemNew(Len(content$) * 2): mp = 0: pp = 1
There's only one thing I wonder about.
1) The line above, which I pulled out of your code. Why is the size LEN * 2? How can you be certain that you won't run into overflow issues with it?
For example, let's just say this is my string: "AA"
I want to replace "A" with "Hello"... A length of 4 isn't going to hold my 10 bytes for "HelloHello".
It seems that, to be safe, you'd need to run a routine once to count the instances of "A" in the string, and then use that count to calculate how much larger the buffer needs to be. (Hello is 5 characters and A is 1, so the increase is 4 times the number of instances of A in the string. In this case, we start with the original size of 2, then increase by 2 * 4, which gives us the 10 bytes needed to hold the result without overflow or wasted memory.)
2) Once you have an *exact* size and don't need to worry about overflowing the buffer, you can put $CHECKING:OFF around the rest of that Function. That speeds it up and makes up for the search/sizing time, *perhaps* even running faster in the long run, without the concern about extreme memory usage.
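The sizing arithmetic from point 1 can be sketched as a small standalone program. CountOccurrences& is a hypothetical helper name chosen for this illustration; the sketch only does the counting and the size calculation, not the replace itself.

```qb64
' Sketch: work out the exact output size before allocating any buffer.
' CountOccurrences& is a hypothetical helper, not a built-in.
content$ = "AA": from$ = "A": to$ = "Hello"

found& = CountOccurrences&(content$, from$)
needed& = Len(content$) + (Len(to$) - Len(from$)) * found&

Print "Occurrences found:"; found&
Print "Len * 2 buffer:"; Len(content$) * 2; "bytes"
Print "Exact size needed:"; needed&; "bytes"
End

Function CountOccurrences& (content$, search$)
    Dim n As Long, p As Long
    If search$ = "" Then Exit Function ' nothing to count, returns 0
    p = InStr(content$, search$)
    Do While p > 0
        n = n + 1
        ' continue after the current match so matches never overlap
        p = InStr(p + Len(search$), content$, search$)
    Loop
    CountOccurrences& = n
End Function
```

For "AA" this reports 2 occurrences, a 4-byte buffer from doubling, and 10 bytes actually needed, which matches the numbers worked out above.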
RE: Fast replace$() function - mdijkens - 04-18-2025
You are absolutely right.
For performance's sake, I just reserve 'enough' space for the specific use case at hand.
If you have a case where the replace could make the resulting string more than 2x the size, you'd have to increase it.
Making it dynamic with inline resizing would make it (a lot) slower. In my scenario this runs millions of times, so I'd rather reserve multiple GB for _Mem just to be sure (I have memory to spare).
$Checking:Off would definitely help as well.
RE: Fast replace$() function - madscijr - 04-18-2025
I use replace all the time (along with split) and will definitely give this a look.
On the subject of replace and split, is there any chance of adding these to QB64PE as fast native routines in the future? Would it improve performance at all?
RE: Fast replace$() function - SMcNeill - 04-19-2025
(04-18-2025, 09:08 PM)mdijkens Wrote: You are absolutely right.
For performance's sake, I just reserve 'enough' space for the specific use case at hand.
If you have a case where the replace could make the resulting string more than 2x the size, you'd have to increase it.
Making it dynamic with inline resizing would make it (a lot) slower. In my scenario this runs millions of times, so I'd rather reserve multiple GB for _Mem just to be sure (I have memory to spare).
$Checking:Off would definitely help as well.
To compare, here's an auto-resizing version so you can see the speed hit is fairly minimal.
Code: (Select All)
$Console:Only
Const SIZE = 200000000
Randomize Timer
Dim As Double t(20), t1(20), d(20)
Print "Original", , "Steve", , "Diff"
For i = 1 To 20
txt$ = String$(SIZE, 0)
For p& = 1 To SIZE
Asc(txt$, p&) = 32 + p& Mod 96
Next p&
t1# = Timer(.001#)
result$ = mreplace$(txt$, "A", "Hello")
t2# = Timer(.001#)
result$ = SteveReplace$(txt$, "A", "Hello")
t3# = Timer(0.001#)
t(i) = t2# - t1#
t1(i) = t3# - t2#
d(i) = t1(i) - t(i)
total# = total# + t(i)
total1# = total1# + t1(i)
tdiff# = tdiff# + d(i)
Print Using "##.### seconds"; t(i),
Print Using "##.### seconds"; t1(i),
Print " ",
Print Using "(##.### diff)"; d(i)
Next
Print
Print
Print Using "##.### seconds"; total#,
Print Using "##.### seconds"; total1#,
Print " ",
Print Using "(##.### diff)"; tdiff#
End
Function replace$ (content$, from$, to$)
content2$ = content$
flen& = Len(from$): tlen& = Len(to$)
p& = InStr(content2$, from$)
If flen& = Len(to$) Then
Do While p& > 0
Mid$(content2$, p&, flen&) = to$
p& = InStr(p& + tlen&, content2$, from$)
Loop
Else
Do While p& > 0
content2$ = Left$(content2$, p& - 1) + to$ + Mid$(content2$, p& + flen&)
p& = InStr(p& + tlen&, content2$, from$)
Loop
End If
replace$ = content2$
End Function
Function mreplace$ (content$, from$, to$)
Dim mp As Long, pp As Long, m As _MEM
m = _MemNew(Len(content$) * 2): mp = 0: pp = 1
flen& = Len(from$): tlen& = Len(to$)
p& = InStr(content$, from$)
Do While p& > 0
_MemPut m, m.OFFSET + mp, Mid$(content$, pp, p& - pp): mp = mp + p& - pp
_MemPut m, m.OFFSET + mp, to$: mp = mp + tlen&: pp = p& + flen&
p& = InStr(p& + flen&, content$, from$)
Loop
_MemPut m, m.OFFSET + mp, Mid$(content$, pp): mp = mp + Len(Mid$(content$, pp))
content2$ = String$(mp, 0): _MemGet m, m.OFFSET, content2$: _MemFree m
mreplace$ = content2$
End Function
Function String.Count& (content$, search$)
If search$ = "" Then Exit Function
p& = InStr(content$, search$)
l& = Len(search$)
Do While p& > 0
count& = count& + 1 ' Long counter: an unsuffixed variable defaults to Single
p& = InStr(p& + l&, content$, search$)
Loop
String.Count = count&
End Function
$Checking:Off
Function SteveReplace$ (content$, from$, to$)
Dim As Long mp, pp, found
Dim m As _MEM
found = String.Count(content$, from$)
flen& = Len(from$): tlen& = Len(to$)
m = _MemNew(Len(content$) + (tlen& - flen&) * found): mp = 0: pp = 1
p& = InStr(content$, from$)
Do While p& > 0
_MemPut m, m.OFFSET + mp, Mid$(content$, pp, p& - pp): mp = mp + p& - pp
_MemPut m, m.OFFSET + mp, to$: mp = mp + tlen&: pp = p& + flen&
p& = InStr(p& + flen&, content$, from$)
Loop
_MemPut m, m.OFFSET + mp, Mid$(content$, pp): mp = mp + Len(Mid$(content$, pp))
content2$ = String$(mp, 0): _MemGet m, m.OFFSET, content2$: _MemFree m
SteveReplace$ = content2$
End Function
$Checking:On
This runs replace on 200MB strings, and does it 20 times for us. It totals the time for each routine and compares the difference in performance. There are a few odd cases where the 2nd one out-performs the first one, but generally, not having to count for a perfect size is a *small* bit faster. Overall though, after 4GB of replacements, the total difference is less than a second on my laptop.
The first takes 3.8 seconds and the second takes 4.6 seconds, and no pass is as much as 0.1 second faster/slower than the other version.
For a "plug-and-go" routine, like in a library or such, I'd go with something like this. For a "quick-n-dirty" personal project, I'd do like you did -- just set a buffer large enough to handle the change and go with it.
Several folks are saying they'll swap to this routine. They may need to consider which would work best for them. The first is faster, but might overflow the buffer in various situations. The second is a *wee* bit slower, but is safer in more use cases.
It's always nice for folks to have options. (Though rename the SteveReplace, if anyone wants to copy it. That's waaay too cheesy of a name -- even for me! LOL! I just named it *something* for testing purposes.)
RE: Fast replace$() function - mdijkens - 04-19-2025
Quote:Several folks are saying they'll swap to this routine. They may need to consider which would work best for them. The first is faster, but might overflow the buffer in various situations. The second is a *wee* bit slower, but is safer in more use cases.
It's always nice for folks to have options. (Though rename the SteveReplace, if anyone wants to copy it. That's waaay too cheesy of a name -- even for me! LOL! I just named it *something* for testing purposes.)
I could not agree more!
It depends on the use case, and options are good.
RE: Fast replace$() function - hsiangch_ong - 04-19-2025
@mdijkens
if you have programmed for as long as you say (and not only in basic, because this topic could apply to any programming language, especially one that is good at text processing), you would make sure you use your really-fast routine only while the search string is not included in the replace string. the user would have to use the very slow routine only for that case, and your really-fast routine for all other cases.
what if i had a large text which only had "van gogh", but not the first name of that person? which van gogh? ok, so it's vincent? then change every occurrence of "van gogh" to "vincent van gogh". that is the trouble i was trying to indicate.
edit: now i notice above that the length of the replacement is being accounted for in your version, the slower of the two. that is crucial. but i did say my version was flawed: it assumes it can temporarily replace with a string (CHR$(255)) that nobody wants as a valid value.
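The "van gogh" case can be tried with a minimal single-pass sketch. SinglePassReplace$ is a hypothetical name and the routine favors clarity over speed. Like the mreplace$ code posted earlier in the thread, it only ever searches the original text and never the output, so a replacement that contains the search string is not matched a second time.

```qb64
' Sketch: single-pass replace that searches only the ORIGINAL text.
' SinglePassReplace$ is a hypothetical name used for illustration.
Print SinglePassReplace$("van gogh painted; van gogh drew", "van gogh", "vincent van gogh")
End

Function SinglePassReplace$ (content$, from$, to$)
    Dim p As Long, pp As Long
    If from$ = "" Then SinglePassReplace$ = content$: Exit Function
    pp = 1 ' start of the text not yet copied
    p = InStr(content$, from$)
    Do While p > 0
        ' copy the text before the match, then the replacement
        result$ = result$ + Mid$(content$, pp, p - pp) + to$
        pp = p + Len(from$)
        ' keep searching the original text, after the current match
        p = InStr(pp, content$, from$)
    Loop
    ' copy whatever follows the last match
    SinglePassReplace$ = result$ + Mid$(content$, pp)
End Function
```

This prints "vincent van gogh painted; vincent van gogh drew": each "van gogh" is replaced exactly once and no placeholder character is needed.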