05-11-2024, 04:55 PM
(05-11-2024, 04:22 PM)bplus Wrote: do these time improvements have anything to do with string concatenation you were talking about earlier? when massive improvements seen with 100's 1000's of strings.
They do, in a roundabout way.
For our IDE, you have to understand how our program is stored -- everything is all in one long string! (idet$)
Now, when storing data like this, you also have to store the length of the data, or else you won't know where one starts and the other ends. We do that by storing the length before AND after the string (for quick navigation back and forth between lines).
So any given line of data looks like:
MKL$(LEN(text$)) + text$ + MKL$(LEN(text$))
So a simple PRINT statement would be saved as basically "5PRINT5" (each 5 there is really the 4-byte MKL$ value, not a digit character).
Now, let's add a second print statement -- idet$ is now: "5PRINT55PRINT5"
Easy enough to understand, parse, and decipher. Right?
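To make the layout concrete, here's a minimal sketch of packing a few lines that way and then walking them in both directions. PackLine$ is a made-up helper name for illustration, not something from the IDE source, and the loops assume idet$ holds nothing but well-formed records.

```basic
' Build a tiny three-line "program" in the length + text + length layout.
idet$ = PackLine$("PRINT") + PackLine$("CLS") + PackLine$("PRINT")

' Walk forward: read the 4-byte length in front of each line, then skip past it.
offset& = 1
DO WHILE offset& <= LEN(idet$)
    n& = CVL(MID$(idet$, offset&, 4))
    PRINT MID$(idet$, offset& + 4, n&)
    offset& = offset& + n& + 8 ' 4-byte prefix + text + 4-byte suffix
LOOP

' Walk backward: read the 4-byte length stored after each line.
offset& = LEN(idet$) + 1 ' one past the end of the last record
DO WHILE offset& > 1
    n& = CVL(MID$(idet$, offset& - 4, 4))
    PRINT MID$(idet$, offset& - 4 - n&, n&)
    offset& = offset& - n& - 8
LOOP
END

' Hypothetical helper: wrap one line of text with its length on both sides.
FUNCTION PackLine$ (text$)
    PackLine$ = MKL$(LEN(text$)) + text$ + MKL$(LEN(text$))
END FUNCTION
```

The trailing length is what makes the backward loop possible: from the end of any record you can read 4 bytes and know how far back the text starts.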
But now think about how you'd add a new line, or command into the middle of that program. Let's say we want to add a CLS statement:
idet$ = LEFT$(idet$, before_new_line) + MKL$(LEN(text$)) + text$ + MKL$(LEN(text$)) + MID$(idet$, after_new_line)
We're adding 5 segments of information together to make one new contiguous string to hold the whole program.
And with that in mind, do you remember the overhead that I was talking about with adding large strings? <-- That's basically where the IDE slows down as your program gets longer and longer -- it's moving larger and larger strings about in memory with each new line, or even each new character you type!
So I had this basic thought on how to improve things:
To start with, let's look at what we're adding together:
LEFT$(idet$, before_new_line) -- generally a long string (say lines 0 to 1000 in your program)
MKL$(LEN(text$)) -- short 4-byte string (string representation of the length of the new line)
text$ -- probably a fairly short string (the new line, which is what? 100 characters or so max, in most cases?)
MKL$(LEN(text$)) -- 4 bytes once again.
MID$(idet$, after_new_line) -- generally a long string (say lines 1001 to the end of your program)
So just by writing the line where we add things together, we're doing a lot of string allocation, moving, and freeing.
long string + short string = long string (lots of memory) <-- first operation: LEFT$ + MKL$
long string + short string = long string (lots of memory) <-- that result + text$
long string + short string = long string (lots of memory) <-- that result + MKL$
long string + long string = long string (lots of memory) <-- that result + MID$
So, as you can see, that's lots of large chunks of memory being allocated, moved, added, and freed just to make our final result.
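To put rough numbers on those four steps, here's a quick tally. The sizes are made up for illustration (a 40,000-byte left part, a 100-byte new line, a 40,000-byte right part), and it assumes every + builds a fresh temporary holding the combined string. It doesn't count the copies that LEFT$ and MID$ make on their own.

```basic
' Sizes below are assumptions for illustration, not measurements.
DIM leftLen AS LONG, textLen AS LONG, rightLen AS LONG
DIM t1 AS LONG, t2 AS LONG, t3 AS LONG, t4 AS LONG

leftLen = 40000 ' LEFT$(idet$, before_new_line)
textLen = 100 ' text$
rightLen = 40000 ' MID$(idet$, after_new_line)

t1 = leftLen + 4 ' LEFT$ + MKL$
t2 = t1 + textLen ' that result + text$
t3 = t2 + 4 ' that result + MKL$
t4 = t3 + rightLen ' that result + MID$

PRINT "Bytes written into temporaries:"; t1 + t2 + t3 + t4
```

With those sizes the total comes to 40,004 + 40,104 + 40,108 + 80,108 = 200,324 bytes written, just to add a 108-byte record.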
So let's make the simplest of changes in the world:
insert$ = MKL$(LEN(text$)) + text$ + MKL$(LEN(text$))
idet$ = LEFT$(idet$, before_new_line) + insert$ + MID$(idet$, after_new_line)
The insert$ is now moving those small blocks of memory and adding them together to make one final small block of memory.
And we now only need to move large blocks of memory twice to get the same result.
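Here's a small sketch showing that both ways of writing the insert give the same string, using a two-line program and a CLS dropped in between. The values of before_new_line and after_new_line are worked out by hand for this sample, and oneStep$ and twoStep$ are names made up for the comparison.

```basic
DIM before_new_line AS LONG, after_new_line AS LONG

text$ = "CLS"
idet$ = MKL$(5) + "PRINT" + MKL$(5) + MKL$(5) + "PRINT" + MKL$(5)

before_new_line = 13 ' bytes in the first record: 4 + 5 + 4
after_new_line = before_new_line + 1 ' MID$ positions are 1-based

' Everything in one expression: every + works on a long string.
oneStep$ = LEFT$(idet$, before_new_line) + MKL$(LEN(text$)) + text$ + MKL$(LEN(text$)) + MID$(idet$, after_new_line)

' Build the small record first, then touch the long strings only twice.
insert$ = MKL$(LEN(text$)) + text$ + MKL$(LEN(text$))
twoStep$ = LEFT$(idet$, before_new_line) + insert$ + MID$(idet$, after_new_line)

IF oneStep$ = twoStep$ THEN PRINT "Same result,"; LEN(twoStep$); "bytes"
```

If you assume one temporary per + and sizes of 40,000 / 100 / 40,000 bytes for the left part, new line, and right part, the two-step form writes 104 + 108 + 40,108 + 80,108 = 120,428 bytes into temporaries, and only the last two of those are large.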
And that's where my basic idea for how to improve the IDE performance started. Reduce/minimize/optimize the number of times we actually allocate/move/free those blocks of memory.
And once you start down this type of road for optimizing things, you just keep on tweaking one little thing at a time, until you just can't tweak it anymore.
It's not changing the format of the way we hold our data (I'd love to swap over to using a pure array to store each line of data sometime). We're not changing the index. Or making any sweeping changes.
We're basically just changing the way we add those strings together to minimize how many times we move large chunks of data about in memory.