04-11-2023, 09:17 PM
(04-11-2023, 05:36 PM)TempodiBasic Wrote: I must understand ,thinking about your post, that the very better result of Luke's algorithm is linked to a different way to store data.
Kinda, it's more that Luke's hash table doesn't support multiple values with the same key. I should have included the code for my "fix":
Code: (Select All)
SUB InsertLookup (key$, lngIndex&)
DIM hIndex~&
hIndex~& = GetHash~&(key$, UBOUND(m_arrLookup))
DO
IF m_arrLookup(hIndex~&) = 0 THEN EXIT DO
IF m_arrDict(m_arrLookup(hIndex~&)).key = key$ THEN EXIT DO ' I added this line
hIndex~& = (hIndex~& + 1) MOD (UBOUND(m_arrLookup) + 1)
LOOP
m_arrLookup(hIndex~&) = lngIndex&
END SUB ' InsertLookup
The change causes the insert to return an existing entry if the same key is already in the hash table. Otherwise, the loop in that function has to loop over all the existing identical keys before it finds the next free spot, which is very slow and gets slower the more keys you add.
My hash table is the same and also doesn't support multiple entries with the same key (lookup can only return a single entry for a given key), but it doesn't have the performance issue Luke's does because `HashTableAdd&` doesn't have to loop over all the existing entries in a bucket. It still doesn't work though, when you add an entry with the same key the old ones are still in the bucket but can't be accessed via the lookup function.
Now if you _want_ to be able to add multiple entries with the same key my bucket approach is better but neither are really suited for it. What you'd probably want is to create a linked list for each set of entries with the same key. For my approach, that means adding another link next to `nxt` that points to entries with the same key, making each bucket a bit of a "list of lists". You could also do this with Luke's approach but I think it would require a separate array to store the links for the list of entries with the same key. It's definitely doable though.
That said, I'd probably call this the user's problem A hash table doesn't really need to support this natively, you can always modify the `Value` Type to hold a list and get the same thing.
Additionally, for a speed comparison, I can add a similar check in my `HashTableAdd&` that blocks adding entries with the same key. Technically in typical usage this actually makes it slower, because now the `Add` function has to loop over all the entries. For your particular test though it actually makes it faster, because it means avoiding some ReDim's on the `HashEntries()` array:
It's not particularly consistent on my machine though (you can see everything is a bit slower today).