Huge array of variable length strings
#1
For a project I need to store an array of variable length strings.
Let's say
Code: (Select All)
Dim Shared as String s(100000)
The issue is that the string lengths can vary from a few bytes up to 2 GB:
Code: (Select All)
For i% = 1 To 100
  s(i%) = String$(100000000, 42) ' 100MB
Next i%
As soon as the array's total size goes above a couple of GB, the program aborts...

I'd like to find a way to make maximum use of the machine's memory (>= 32 GB).
What would be the best approach to define this?
I think _Mem is not very suitable for variable-length strings.

I could allocate one big _Mem and keep track of indexes/blocks myself, but that complicates the code quite a bit.
Any better suggestions?
45y and 2M lines of MBASIC>BASICA>QBASIC>QBX>QB64 experience
Reply
#2
I have some ideas, but it depends on the application.

Questions:

  • What do these strings represent (files, a text buffer in an editor, etc.)?
  • Why do you want to load them all at once?

I'm thinking of a more C-like approach, where your array is actually an array of pointers and you write a couple of SUBs to allocate and deallocate a _MEM block for each pointer.

Then a cleanup SUB frees all the _MEM blocks.
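
To make the idea concrete, here is a minimal sketch of such an "array of pointers". All names (blk, blkLen, StoreText, FetchText$, FreeAll) are hypothetical and only for illustration. It assumes _MemGet accepts a pre-sized variable-length string as its destination, and it keeps the byte lengths in a separate array rather than reading them back from the .SIZE field.

Code: (Select All)
ReDim Shared blk(1 To 100) As _MEM ' one handle per stored string (hypothetical name)
ReDim Shared blkLen(1 To 100) As _Integer64 ' byte length; 0 means "not allocated"

StoreText 1, "short"
StoreText 2, String$(1000000, 42)
Print FetchText$(1), Len(FetchText$(2))
FreeAll
End

Sub StoreText (i As Long, s As String)
  If blkLen(i) > 0 Then _MemFree blk(i) ' release any previous contents
  blkLen(i) = Len(s)
  If blkLen(i) = 0 Then Exit Sub ' nothing to allocate for an empty string
  blk(i) = _MemNew(blkLen(i))
  _MemPut blk(i), blk(i).OFFSET, s
End Sub

Function FetchText$ (i As Long)
  Dim s As String
  If blkLen(i) = 0 Then Exit Function ' returns ""
  s = Space$(blkLen(i)) ' pre-size the destination (assumption, see above)
  _MemGet blk(i), blk(i).OFFSET, s
  FetchText$ = s
End Function

Sub FreeAll
  Dim i As Long
  For i = LBound(blk) To UBound(blk)
    If blkLen(i) > 0 Then _MemFree blk(i): blkLen(i) = 0
  Next i
End Sub

Note that FetchText$ copies the block back into an ordinary string, so anything fetched this way is again subject to whatever limits apply to strings; only the data at rest lives in _MEM blocks.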
Reply
#3
(10-17-2024, 11:11 AM)ahenry3068 Wrote: I have some ideas but it depends on the application. [...]

I'm reading the contents of a directory of files so that I can run a lot of searches on that content and report back which files have matches.
The search terms are not known up front; they depend on the content and dependencies of these files, so I can't do the searches file by file...

I am also thinking of an array of pointers into one big _Mem that I load all contents into, but I'm also curious which 'normal' variable structures can hold the biggest set of variable-length strings.
Are there maximum size differences between variable/fixed length, arrays, shared/non-shared, dynamic/static, user-defined types, etc.?
Reply
#4
By all rights this should work fine as long as you have a reasonable amount of memory (which, as you say, you do). I can consistently reproduce the crash after 32 loop iterations, and at a glance in the debugger this looks like a QB64 bug.
Reply
#5
If I remember right, there's some internal logic that bugs out at around the same limit as a LONG variable type.  (2GB of memory usage, or so)

The only time I've ever successfully used larger batches of memory like this, it's always been via a _MEM structure.
Reply
#6
(10-17-2024, 11:42 AM)luke Wrote: By all rights this should work fine as long as you have a reasonable amount of memory (which, as you say, you do). I can consistently reproduce the crash after 32 loop iterations, and at a glance in the debugger this looks like a QB64 bug.

The most likely cause of such a bug is someone using a signed 32-bit integer in the code where they should be using an unsigned 32-bit integer or a 64-bit integer. The clue is that it happens at 2 GB (which is the largest positive value of a signed 32-bit integer).

I ran into a bug in the _SND subsystem that hits the same limitation: _SNDSETPOS fails on wave files larger than 2 GB.
Reply
#7
Yep. The size of the string allocation area (i.e. all current string allocations) is tracked in an unsigned 32-bit value. I'll see about changing that to a size_t or similar.
Reply
#8
As a test, I created the following, which works.
(Of course _ReadFile$() only works for files up to 2 GB, but I already have a file reader function with no limit, so for testing it's okay.)

Code: (Select All)
Type fType
  fname As String
  fpath As String
End Type
ReDim Shared f(1 To 1000) As fType
ReDim Shared m(1 To 1000) As _MEM

nfiles& = getFiles("E:\TEMP\test\")
Print nfiles&
End

Function getFiles& (path$)
  n& = 0
  fname$ = _Files$(path$ + "*.*")
  Do While fname$ <> ""
    If Right$(fname$, 1) <> "\" Then
      Print path$ + fname$;
      c$ = _ReadFile$(path$ + fname$)
      Print Len(c$)
      If n& = UBound(f) Then
        ReDim _Preserve f(1 To n& + 1000) As fType
        ReDim _Preserve m(1 To n& + 1000) As _MEM
      End If
      n& = n& + 1
      f(n&).fpath = path$
      f(n&).fname = fname$
      m(n&) = _MemNew(Len(c$))
      _MemPut m(n&), m(n&).OFFSET, c$
    End If
    fname$ = _Files$
  Loop
  ReDim _Preserve f(1 To n&) As fType
  ReDim _Preserve m(1 To n&) As _MEM
  getFiles& = n&
End Function


What would now be the fastest way to search for text in a _Mem block? There's no _MemSearch or anything like it...
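
One straightforward option (not necessarily the fastest) is to copy the block into a string buffer one window at a time and let InStr do the matching, overlapping the windows so that a match spanning two windows is not missed. The sketch below is only an illustration: MemInstr&& and the 16 MB window size are hypothetical choices, the caller is assumed to know the block's length in bytes (for example by storing Len(c$) when the block is created), and _MemGet is assumed to accept a pre-sized variable-length string.

Code: (Select All)
Dim m As _MEM, t As String
t = "needle in a haystack"
m = _MemNew(Len(t))
_MemPut m, m.OFFSET, t
Print MemInstr&&(m, Len(t), "hay")
_MemFree m
End

' Returns the 1-based byte position of the first match of term, or 0 if none.
Function MemInstr&& (m As _MEM, total As _Integer64, term As String)
  Const CHUNK = 16777216 ' window size in bytes, an arbitrary choice
  Dim buf As String, p As _Integer64, n As _Integer64, hit As Long
  MemInstr&& = 0
  If Len(term) = 0 Or Len(term) > total Or Len(term) > CHUNK Then Exit Function
  p = 0
  Do While p < total
    n = total - p
    If n > CHUNK Then n = CHUNK
    buf = Space$(n) ' pre-size the destination (assumption, see above)
    _MemGet m, m.OFFSET + p, buf
    hit = InStr(buf, term)
    If hit > 0 Then MemInstr&& = p + hit: Exit Function
    If p + n >= total Then Exit Do ' that was the last window
    p = p + n - (Len(term) - 1) ' step back so matches across the boundary are seen
  Loop
End Function

Because each window is an ordinary string of bounded size, the search itself never needs more than one window of string memory, regardless of how large the _Mem block is.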
Reply
#9
Hmmm, this one also aborts once the files total more than 3.5 GB...
It's the _ReadFile$() call that goes wrong, after 10000+ files with a combined size above 3 GB.

Code: (Select All)
path$ = "E:\TEMP\test\"
fname$ = _Files$(path$ + "*.*")
Do While fname$ <> ""
  If Right$(fname$, 1) <> "\" Then
    Print path$ + fname$
    c$ = _ReadFile$(path$ + fname$)
  End If
  fname$ = _Files$
Loop

I think that's a bug!
Reply
#10
I think it is the c$ assignment where memory gets corrupted:

Code: (Select All)
c$ = Space$(2000000000)
Print Using "##,###,###,###"; Len(c$)
Sleep

c$ = Space$(2000000000)
Print Using "##,###,###,###"; Len(c$)
Sleep

The second assignment aborts the program...
It seems that once string usage goes above 1 GB, it aborts sooner or later.
Reply