02-29-2024, 10:56 AM
I submit the following as a starting point for a speed test that folks can compare against:
Code:
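dict$ = "Collins.txt" 'one "word<TAB>definition" entry per line; must be sorted by word for FindWord's binary search
Screen _NewImage(800, 600, 32)
Type Dict_Type
    As String Word, Definition
End Type
ReDim Shared Dict(1000000) As Dict_Type 'oversized up front, trimmed to the real entry count after loading
Open dict$ For Binary As #1
Do Until EOF(1)
    Line Input #1, temp$
    l = InStr(temp$, Chr$(9)) 'tab separated data file
    If l > 0 Then 'skip lines with no tab; Left$(temp$, l - 1) would raise an error when l = 0
        count = count + 1
        Dict(count).Word = _Trim$(Left$(temp$, l - 1))
        Dict(count).Definition = _Trim$(Mid$(temp$, l + 1))
    End If
Loop
Close
ReDim _Preserve Dict(count) As Dict_Type 'shrink the array to the entries actually read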
Dim Junk(10) As String 'the words to look up
Data cheese,dog,cat,elephant,rootbeer,house,food,drink,zebra,mouse
For i = 1 To 10
    Read Junk(i)
Next
t# = Timer 'timing starts after the file is loaded, so only the look-ups are measured
For i = 1 To 10
    f = FindWord(Junk(i))
    Print Junk(i),
    If f = 0 Then
        Print "Word not found"
    Else
        Print Dict(f).Definition 'reuse the index found above instead of searching a second time
    End If
Next
t1# = Timer
Print Using "###.######## seconds to find #### words and definitions"; t1# - t#, i - 1 'i is 11 after the loop
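Function FindWord (word$) 'binary search: returns the index of word$ in Dict(), or 0 if it isn't there
    Dim As Long low, hi, test
    low = 1: hi = UBound(Dict)
    While low <= hi
        test = (low + hi) \ 2 'midpoint of the range still being searched
        Select Case _StriCmp(Dict(test).Word, word$) 'case-insensitive compare returns -1, 0 or 1
            Case 0
                'Print "found"; test
                FindWord = test: Exit Function
            Case -1 'Dict(test).Word sorts before word$, so search the upper half
                'Print "low"; Dict(test).Word
                low = test + 1
            Case 1 'Dict(test).Word sorts after word$, so search the lower half
                'Print "high"; Dict(test).Word
                hi = test - 1
        End Select
    Wend
End Function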
Quick question, though: are you looking for the fastest way to LOAD the file, or the fastest way to SEARCH it?
I didn't really try to push the limits of either here, as I'm certain I can make both the load and the search faster with very little effort. I just thought I'd start with a simple OPEN FOR BINARY and LINE INPUT to read the data, followed by a simple binary search to find the words, so folks would have a baseline to compete against.
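On the LOAD side, one approach often seen in QB64 code for beating a LINE INPUT loop is to read the whole file into a single string with one GET and split it in memory. Here's a minimal sketch of that idea, not a tuned entry: it assumes one word<TAB>definition pair per line with LF or CRLF line endings, and demo_dict.txt is a made-up file name (the sketch writes its own two-line sample first so it runs standalone).

```qb64
' Sketch: read a whole tab-separated file with a single GET, then split it in memory.
' "demo_dict.txt" is a hypothetical file name; the demo writes it first so it runs standalone.
Dim As Long start, eol, tabPos, count
Open "demo_dict.txt" For Output As #1
Print #1, "cat" + Chr$(9) + "a small domesticated feline"
Print #1, "dog" + Chr$(9) + "a domesticated canine"
Close #1

Open "demo_dict.txt" For Binary As #1
buffer$ = Space$(LOF(1)) ' allocate a string exactly as long as the file
Get #1, , buffer$ ' one read fills the whole string
Close #1

start = 1
Do While start <= Len(buffer$)
    eol = InStr(start, buffer$, Chr$(10)) ' find the end of the current line
    If eol = 0 Then eol = Len(buffer$) + 1 ' the last line may have no trailing LF
    lineText$ = Mid$(buffer$, start, eol - start)
    If Right$(lineText$, 1) = Chr$(13) Then lineText$ = Left$(lineText$, Len(lineText$) - 1) ' drop the CR of a CRLF
    tabPos = InStr(lineText$, Chr$(9))
    If tabPos > 0 Then ' skip lines without a word<TAB>definition pair
        count = count + 1
        Print Left$(lineText$, tabPos - 1); " = "; Mid$(lineText$, tabPos + 1)
    End If
    start = eol + 1 ' move past the LF to the next line
Loop
Print count; "entries parsed"
```

Filling Dict(count).Word and Dict(count).Definition instead of printing would let this stand in for the LINE INPUT loop in the program above; whether it actually wins on a file the size of Collins.txt is something to time, not assume.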
The only thing is, I wasn't certain exactly what we were supposed to be timing, so my program only times the search (the TIMER calls start after the file has finished loading), and on my PC that's 0.0 seconds for 10 word look-ups.
Looks like we need a bigger search list as well! LOL!
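Until a bigger search list exists, one way to get a number that doesn't read as 0.0 is to repeat the same look-ups many times and divide by the count. A minimal standalone sketch of that, assuming QB64's TIMER(0.001) for millisecond accuracy; the eight-word Words() list, the REPEATS constant and the FindIndex& function are hypothetical stand-ins for Dict() and FindWord, using the same _STRICMP binary search.

```qb64
' Sketch: repeat a small set of binary-search look-ups many times so TIMER can measure them.
' Words(), REPEATS and FindIndex& are hypothetical stand-ins for Dict() and FindWord.
Const REPEATS = 100000
Dim Shared Words(1 To 8) As String
Dim As Long i, r, j, hits
Data ant,bee,cat,dog,eel,fox,gnu,hen
For i = 1 To 8: Read Words(i): Next ' already in sorted order, as binary search requires

t# = Timer(0.001) ' request millisecond accuracy
For r = 1 To REPEATS
    For j = 1 To 8
        If FindIndex&(Words(j)) > 0 Then hits = hits + 1
    Next
Next
elapsed# = Timer(0.001) - t#
Print hits; "look-ups in"; elapsed#; "seconds"
If hits > 0 Then Print elapsed# / hits; "seconds per look-up"

Function FindIndex& (word$)
    Dim As Long low, hi, probe
    low = LBound(Words): hi = UBound(Words)
    Do While low <= hi
        probe = (low + hi) \ 2 ' midpoint of the range still being searched
        Select Case _StriCmp(Words(probe), word$)
            Case 0: FindIndex& = probe: Exit Function
            Case -1: low = probe + 1 ' target sorts later: search the upper half
            Case 1: hi = probe - 1 ' target sorts earlier: search the lower half
        End Select
    Loop
End Function
```

The per-look-up figure is what makes entries comparable: it stays meaningful even when one pass over the list is too quick for TIMER to register.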