Posts: 128
Threads: 12
Joined: Apr 2022
Reputation: 14
10-17-2024, 11:06 AM
(This post was last modified: 10-17-2024, 11:08 AM by mdijkens.)
For a project I need to store an array of variable length strings.
Let's say
Code: (Select All)
Dim Shared as String s(100000)
But the issue is that the string lengths could vary from several bytes up to 2 GB
Code: (Select All)
For i% = 1 To 100
    s(i%) = String$(100000000, 42) ' 100MB
Next i%
As soon as the array's total size goes above a couple of GB, the program aborts...
I'd like to find a way to make max use of internal memory (>=32GB)
What would be the best approach to define this?
I think _Mem is not very suitable for variable length strings
I could do one big _Mem and keep track of indexes/blocks but that's complicating the code quite a bit
Any better suggestions?
45y and 2M lines of MBASIC>BASICA>QBASIC>QBX>QB64 experience
Posts: 34
Threads: 6
Joined: Apr 2024
Reputation: 5
10-17-2024, 11:11 AM
(This post was last modified: 10-17-2024, 11:14 AM by ahenry3068.)
I have some ideas but it depends on the application.
Questions:
What do these strings represent (files, a text buffer in an editor, etc.)?
Why do you want to load them all at once?
I'm thinking of a more C-like approach, where your array is actually an array of pointers. Then you write a couple of SUBs to allocate and deallocate a _MEM for each pointer.
Then a cleanup SUB to free all the _MEMs.
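A rough sketch of that idea, in case it helps. The names StoreBlock, FreeBlock and FreeAllBlocks, the array sizes, and the `used` flag array are all made up for illustration and are not built-in QB64 features. It also assumes _MemGet can fill a variable-length string that has been pre-sized with Space$.

```basic
' Sketch only: StoreBlock, FreeBlock and FreeAllBlocks are hypothetical names.
ReDim Shared blk(1 To 100) As _MEM ' one handle ("pointer") per string
ReDim Shared used(1 To 100) As _Byte ' -1 while blk(i) owns memory

StoreBlock 1, "hello world"
t$ = Space$(11) ' assumption: _MemGet fills the string up to its current length
_MemGet blk(1), blk(1).OFFSET, t$
Print t$
FreeAllBlocks
End

Sub StoreBlock (i As Long, s As String)
    FreeBlock i ' release any previous contents first
    If Len(s) = 0 Then Exit Sub ' nothing to allocate
    blk(i) = _MemNew(Len(s))
    _MemPut blk(i), blk(i).OFFSET, s
    used(i) = -1
End Sub

Sub FreeBlock (i As Long)
    If used(i) Then
        _MemFree blk(i)
        used(i) = 0
    End If
End Sub

Sub FreeAllBlocks
    Dim i As Long
    For i = LBound(blk) To UBound(blk)
        FreeBlock i
    Next i
End Sub
```

The separate flag array is there so the cleanup SUB never calls _MemFree on a slot that was never allocated.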
Posts: 128
Threads: 12
Joined: Apr 2022
Reputation: 14
I'm reading the contents of a directory of files so I can run a lot of searches on that content and report back which files have matches.
Search terms are not known upfront but depend on content/dependencies of these files, so I can't do the searches file by file...
I am also thinking of an array of pointers to one big _Mem that I load all contents in, but I'm also curious what 'normal' variable structures can hold the biggest set of variable length strings?
Are there max size differences between variable/fixed length, arrays, shared/non-shared, dynamic/static, user-defined types, etc.?
Posts: 34
Threads: 4
Joined: Apr 2022
Reputation: 14
By all rights this should work fine as long as you have a reasonable amount of memory (which, as you say, you do). I can consistently reproduce the crash after 32 loop iterations, and at a glance in the debugger this looks like a QB64 bug.
Posts: 2,696
Threads: 327
Joined: Apr 2022
Reputation: 217
If I remember right, there's some internal logic that bugs out at around the same limit as a LONG variable type. (2GB of memory usage, or so)
The only time I've ever successfully used larger batches of memory like this, it's always been via a _MEM structure.
Posts: 34
Threads: 4
Joined: Apr 2022
Reputation: 14
Yep. The size of the string allocation area (i.e. all current string allocations) is tracked in an unsigned 32-bit value. I'll see about changing that to a size_t or similar.
Posts: 128
Threads: 12
Joined: Apr 2022
Reputation: 14
As a test, I created the following which works
(Of course _ReadFile$() only works for files up to 2GB, but I already have a filereader function with no limit, so for testing it's okay)
Code: (Select All)
Type fType
    fname As String
    fpath As String
End Type
ReDim Shared f(1 To 1000) As fType ' file names and paths
ReDim Shared m(1 To 1000) As _MEM ' file contents, one _MEM block per file
nfiles& = getFiles("E:\TEMP\test\")
Print nfiles&
End

Function getFiles& (path$)
    n& = 0
    fname$ = _Files$(path$ + "*.*")
    Do While fname$ <> ""
        If Right$(fname$, 1) <> "\" Then ' skip directories
            Print path$ + fname$;
            c$ = _ReadFile$(path$ + fname$)
            Print Len(c$)
            If n& = UBound(f) Then ' grow both arrays in steps of 1000
                ReDim _Preserve f(1 To n& + 1000) As fType
                ReDim _Preserve m(1 To n& + 1000) As _MEM
            End If
            n& = n& + 1
            f(n&).fpath = path$
            f(n&).fname = fname$
            m(n&) = _MemNew(Len(c$)) ' copy the contents into a _MEM block
            _MemPut m(n&), m(n&).OFFSET, c$
        End If
        fname$ = _Files$
    Loop
    ReDim _Preserve f(1 To n&) As fType ' trim to the number of files found
    ReDim _Preserve m(1 To n&) As _MEM
    getFiles& = n&
End Function
What would now be the fastest way to search for text in a _MEM block? There's no _MemSearch or anything like it...
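One possible approach, offered as a sketch and not as the fastest option: read the block in chunks into a reusable string buffer and let InStr do the matching, stepping back a little between chunks so a match that straddles a chunk boundary is still found. MemInstr is a hypothetical helper name, the 16 MB chunk size is an arbitrary guess, and the block size is passed in separately on the assumption that the caller already knows it (for example from Len(c$) at load time).

```basic
' Sketch only: MemInstr is a hypothetical helper, not a built-in QB64 function.
Dim m As _MEM
t$ = "the quick brown fox"
m = _MemNew(Len(t$))
_MemPut m, m.OFFSET, t$
Print MemInstr&&(m, Len(t$), "brown") ' 1-based position in the block, 0 if absent
_MemFree m
End

Function MemInstr&& (m As _MEM, memSize As _Integer64, find$)
    Dim p As _Integer64, n As _Integer64, chunkLen As _Integer64
    Dim hit As Long
    Dim buf As String
    If Len(find$) = 0 Or Len(find$) > memSize Then Exit Function
    chunkLen = 16777216 ' 16 MB per read; tune as needed
    ' Keep chunks well above the search length so every pass moves forward.
    If chunkLen < 2 * Len(find$) Then chunkLen = 2 * Len(find$)
    p = 0 ' 0-based byte offset of the current chunk
    Do While p + Len(find$) <= memSize
        n = memSize - p
        If n > chunkLen Then n = chunkLen
        If Len(buf) <> n Then buf = Space$(n) ' reuse the buffer when possible
        _MemGet m, m.OFFSET + p, buf
        hit = InStr(buf, find$)
        If hit Then MemInstr&& = p + hit: Exit Function
        ' Step back so a match straddling two chunks is not missed.
        p = p + n - (Len(find$) - 1)
    Loop
End Function
```

Each chunk is a temporary string, so the chunk size should stay far below the string-space limit discussed above.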
Posts: 128
Threads: 12
Joined: Apr 2022
Reputation: 14
10-17-2024, 12:26 PM
(This post was last modified: 10-17-2024, 01:27 PM by mdijkens.)
Hmmm, this one also aborts above 3.5GB of files ....
It's the _ReadFile$() which goes wrong after 10000+ files with combined size >3GB
Code: (Select All)
path$ = "E:\TEMP\test\"
fname$ = _Files$(path$ + "*.*")
Do While fname$ <> ""
    If Right$(fname$, 1) <> "\" Then
        Print path$ + fname$
        c$ = _ReadFile$(path$ + fname$)
    End If
    fname$ = _Files$
Loop
I think that's a bug!
Posts: 128
Threads: 12
Joined: Apr 2022
Reputation:
14
10-17-2024, 02:01 PM
(This post was last modified: 10-17-2024, 02:26 PM by mdijkens.)
I think it is the c$ assignment where memory gets corrupted:
Code: (Select All)
c$ = Space$(2000000000)
Print Using "##,###,###,###"; Len(c$)
Sleep
c$ = Space$(2000000000)
Print Using "##,###,###,###"; Len(c$)
Sleep
The second assignment aborts the program...
It seems that once strings go above 1GB, it sooner or later aborts.