Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
An hash array dictonary step by step
#26
(04-11-2023, 09:17 PM)DSMan195276 Wrote:
(04-11-2023, 05:36 PM)TempodiBasic Wrote: I must understand ,thinking about your post, that the very better result of Luke's algorithm is linked to a different way to store data.

Kinda, it's more that Luke's hash table doesn't support multiple values with the same key. I should have included the code for my "fix":

Code: (Select All)
SUB InsertLookup (key$, lngIndex&)
    DIM hIndex~&

    hIndex~& = GetHash~&(key$, UBOUND(m_arrLookup))
    DO
        IF m_arrLookup(hIndex~&) = 0 THEN EXIT DO
        IF m_arrDict(m_arrLookup(hIndex~&)).key = key$ THEN EXIT DO ' I added this line
        hIndex~& = (hIndex~& + 1) MOD (UBOUND(m_arrLookup) + 1)
    LOOP
    m_arrLookup(hIndex~&) = lngIndex&
END SUB ' InsertLookup

The change causes the insert to return an existing entry if the same key is already in the hash table. Otherwise, the loop in that function has to loop over all the existing identical keys before it finds the next free spot, which is very slow and gets slower the more keys you add.

My hash table is the same and also doesn't support multiple entries with the same key (lookup can only return a single entry for a given key), but it doesn't have the performance issue Luke's does because `HashTableAdd&` doesn't have to loop over all the existing entries in a bucket. It still doesn't work though, when you add an entry with the same key the old ones are still in the bucket but can't be accessed via the lookup function.

Now if you _want_ to be able to add multiple entries with the same key my bucket approach is better but neither are really suited for it. What you'd probably want is to create a linked list for each set of entries with the same key. For my approach, that means adding another link next to `nxt` that points to entries with the same key, making each bucket a bit of a "list of lists". You could also do this with Luke's approach but I think it would require a separate array to store the links for the list of entries with the same key. It's definitely doable though.

That said, I'd probably call this the user's problem Big Grin A hash table doesn't really need to support this natively, you can always modify the `Value` Type to hold a list and get the same thing.

Additionally, for a speed comparison, I can add a similar check in my `HashTableAdd&` that blocks adding entries with the same key. Technically in typical usage this actually makes it slower, because now the `Add` function has to loop over all the entries. For your particular test though it actually makes it faster, because it means avoiding some ReDim's on the `HashEntries()` array:

[Image: hashtable2.png]

It's not particularly consistent on my machine though (you can see everything is a bit slower today).

Here's how I tend to do this type of little thing:

Code: (Select All)
Dim Shared dict(1000003) As String
Print "One moment, loading dictionary..."
ImportDict "370099 Word List.txt" 'change to your favorite local dictionary

Screen _NewImage(800, 600, 32)
Do
    Print "Enter a word to check for => ";
    Input word$
    If word$ = "" Then System
    If DictLookup(word$) Then
        Print word$; " is found in the loaded dictionary."
    Else
        Print word$; " is not found in the loaded dictionary. Sorry."
    End If
Loop


Sub ImportDict (dict$)
    f = FreeFile
    Open dict$ For Binary As #f
    Do Until EOF(1)
        Line Input #1, word$
        DictAdd word$
    Loop
    Close #f
End Sub


Function DictLookup (word$)
    v = WordValue(word$)
    If InStr(dict(v), Chr$(0) + word$ + Chr$(0)) Then DictLookup = -1
End Function

Sub DictAdd (word$)
    v = WordValue(word$)
    dict(v) = dict(v) + Chr$(0) + word$ + Chr$(0)
End Sub

Function WordValue (word$)
    For i = 1 To Len(word$)
        a = Asc(word$, i)
        If i < 7 Then m = 10 ^ (7 - i)
        a = a * m
        v = v + a
    Next
    WordValue = v Mod 1000003
End Function


.7z   370099 Word List.7z (Size: 880.44 KB / Downloads: 45)   << Get the word list here.  


Now, with a dictionary of 370000 words, this generates a dictionary which contains a quick lookup table with a max of 15 entries per value, and in most cases has a 1 to 1 lookup value.  (A binary search would have a max search of 19 passes to find our word, just for comparison...)

Quick.  Simple.  Efficient.  What's not to love?
Reply


Messages In This Thread
RE: An hash array dictonary step by step - by SMcNeill - 04-14-2023, 12:18 PM



Users browsing this thread: 5 Guest(s)