Questions about INSTR
#1
First, what is the numerical limit to the position number returned by INSTR? I think I might be getting an overflow that is producing wrong results.

Second, how does INSTR work internally? Previously I was using a FOR NEXT loop with a MID$ inside it to search through a line of 1 billion digits of pi, and it was pretty slow. Then I realised I should have been using INSTR instead, and this sped things up by about 30 times! So how does INSTR actually work?
#2
Mid$ is extremely slow with large strings (a lot of memory copying is going on).
Instr is extremely efficient at finding the next hit in memory.
Strings have a limit of 2GB max and since Instr only works on strings, the startpos and returnval are Long (-2G to +2G)
_Mem can handle as much as your PC's RAM.

Code:
Const MAXSIZE = 2 ^ 31 - 1 '2GB

Dim a As String: a = String$(MAXSIZE, 0)
Asc(a, MAXSIZE) = 42
aa& = InStr(a, Chr$(42))
Print Len(a), InStr(a, Chr$(42)), aa&

Dim b As String * MAXSIZE
Asc(b, MAXSIZE) = 42
bb& = InStr(b, Chr$(42))
Print Len(b), InStr(b, Chr$(42)), bb&


Const MSIZE = MAXSIZE * 5 '10GB
Dim m As _MEM: m = _MemNew(MSIZE) '10GB
Dim c As String * 1: c = Chr$(42)
_MemPut m, m.OFFSET + MSIZE - 1, c
Print m.SIZE,
Dim block As String * MAXSIZE
For p&& = 0 To (m.SIZE - 1) \ MAXSIZE 'MSIZE is an exact multiple of MAXSIZE, so every block read here is full
block = _MemGet(m, m.OFFSET + p&& * MAXSIZE, String * MAXSIZE)
mm& = InStr(block, c)
If mm& Then Exit For
Next p&&
Print m.SIZE, p&& * MAXSIZE + mm&
_MemFree m
45y and 2M lines of MBASIC>BASICA>QBASIC>QBX>QB64 experience
#3
One thing to note when it comes to performance: most things you write in QB64 are always going to be slower than their counterparts in C or whatnot. Our internal commands are usually based on C and are pretty quick, but once you start writing the equivalent in BASIC, things slow down somewhat.

The reason?

Each command does error checking and system checking.   

Say you write a program that just counts to 10:

FOR i = 1 to 10
NEXT

This simple little program gets translated into this mess: 
Code:
fornext_value2= 1 ;
fornext_finalvalue2= 10 ;
fornext_step2= 1 ;
if (fornext_step2<0) fornext_step_negative2=1; else fornext_step_negative2=0;
if (is_error_pending()) goto fornext_error2;
goto fornext_entrylabel2;
while(1){
fornext_value2=fornext_step2+(*__SINGLE_I);
fornext_entrylabel2:
*__SINGLE_I=fornext_value2;
if (fornext_step_negative2){
if (fornext_value2<fornext_finalvalue2) break;
}else{
if (fornext_value2>fornext_finalvalue2) break;
}
fornext_error2:;
if(qbevent){evnt(1);if(r)goto S_1;}
fornext_continue_1:;
}
fornext_exit_1:;

Notice how much error checking is in there? And event checking? That's so things like clicking the little red X in the top right of the window will close the program even while it's in the middle of processing. It also keeps you from going out of bounds, among other issues...

In C, you could write that code without all that error checking and event checking, but it's not something QB64 can afford to pass up. These are *small* time sinks, but they add up.

If everyone wrote perfect, error-free code, these checks wouldn't be necessary. Since Steve, however, codes in QB64PE, I'm afraid they are necessary so I don't melt down my computer or destroy something, as *I* certainly don't write perfect, error-free code.



So *why* is INSTR faster than a self-created routine?

It's been optimized directly for the underlying language we translate BAS code into, and it skips a lot of the checks that are necessary at each step when you try to write the same routine yourself. Unless you want to write C yourself and skip those BAS checks, I don't think you'd ever be able to write something faster than what's packaged under the hood.
#4
Quote: "Strings have a limit of 2GB max and since Instr only works on strings, the startpos and returnval are Long (-2G to +2G)"

Would it be a big deal to update QB64 Phoenix so that it can handle longer strings? Maybe even as big as +/- 2^63. Or would that compromise other stuff? Last night I downloaded a pi listing of 10 billion decimal places, but obviously I can't use it at this point.
#5
(04-27-2025, 03:03 AM)Circlotron Wrote: Would it be a big deal to update QB64 Phoenix so that it can handle longer strings? Maybe even as big as +/- 2^63. Or would that compromise other stuff?
I hope that will happen someday

(04-27-2025, 03:03 AM)Circlotron Wrote: Last night I downloaded a pi listing of 10 billion decimal places, but obviously i can't use it at this point.
This I don't understand. Of course you can use it! QB64 does not have a size limit on files or data in general.
I am reading/writing files 100x bigger than this with QB64, no problem at all.
Irrespective of the programming language used, huge data processing requires a different approach than small data processing:
You would process it in blocks of, for example, 64MB or 1GB. In QB64 you can use _MEM to store it, but (of course) only up to the limit of your PC's memory.
#6
I have 16GB of RAM.
Okay, I'd like to try out _MEM but I can't figure out how to do it. I want to put a 10 GB single line file into memory then variously read out x characters starting from position y. I looked at examples but it wasn't clear to me. Could someone show me how?
#7
Instead of _MEM, you'd be better off opening the file for BINARY, then reading and processing it in chunks. Just keep the file on the disk and work through it piece by piece, rather than trying to use up all your memory, making your machine's fans run like airplane engines and sending your CPU temps through the roof.
#8
My sample code (line 14 onwards) above shows the use of _MEM.
But, as stated above, in most cases it is much easier to work with a BINARY file read in chunks of, for example, 64MB.
Then you can never run out of available memory, and it even works for files of 10TB.

(quick sample):
Code:
Const BLOCKSIZE = 2 ^ 26 '64MB
Dim block As String * BLOCKSIZE
Dim remaining As _Unsigned _Integer64
Open file$ For Binary Access Read As #1 'GET needs BINARY (or RANDOM) mode
remaining = LOF(1) Mod BLOCKSIZE
Do While Not EOF(1)
  Get #1, , block
  If Not EOF(1) Then
    'process block
  ElseIf remaining > 0 Then 'last incomplete block
    'process remaining characters of block
  End If
Loop
Close #1

or if you prefer for..next with access to previous block:
Code:
Const BLOCKSIZE = 2 ^ 26 '64MB
Dim block(0 To 1) As String * BLOCKSIZE
Dim As _Unsigned _Integer64 blocks, remain, r
Dim b As _Byte
Open file$ For Binary Access Read As #1 'GET needs BINARY (or RANDOM) mode
blocks = _Ceil(LOF(1) / BLOCKSIZE)
remain = LOF(1) Mod BLOCKSIZE
For r = 1 To blocks
Get #1, , block(b)
If r < blocks _OrElse remain = 0 Then
'process block(b); previous still in block(1-b) (if r>1)
Else 'last (incomplete) block
'process remain characters of block(b)
End If
b = 1 - b
Next r
Close #1