QB64 Phoenix Edition
DAY 007: _PRESERVE - Printable Version

DAY 007: _PRESERVE - SMcNeill - 11-12-2022

This is another one of those commands that, unfortunately, we see people using wrong all the time.  Let's see if we can help showcase a bit of a better way to use it than what many are doing for their standard practice. 

First, let's start with the same basic questions as every day.

What is it?  _PRESERVE is a command which is used when REDIM-ing an array and one wishes to preserve the existing data within it.  It mainly only works properly with single dimensional arrays, so I'm not going to talk about the issues it faces with multi-dimensional arrays here.  If someone is truly curious about those problems, go watch my movie-length video on REDIM and how it interacts with _MEM commands and memory.  https://qb64phoenix.com/forum/showthread.php?tid=172&pid=707#pid707

How's it used?  Place your _PRESERVE statement after REDIM and then set your array to the new size that you need it to be.  The syntax here is rather simple for folks to grasp.

So how's it used wrongly??  Let me share a BAD example of the command to begin with -- we see this type of code all the time:

Code: (Select All)
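REDIM array(0) AS STRING 'a resizable array to hold the lines
OPEN "myfile.txt" FOR INPUT AS #1
DO UNTIL EOF(1)
    count = count + 1
    IF count > UBOUND(array) THEN REDIM _PRESERVE array(count) AS STRING 'grows by ONE element per line read
    LINE INPUT #1, array(count)
LOOP
CLOSE #1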

A file is opened where one might not know the length of the file contents, but they want to read each line into an array.  How do many people do this?  They end up reading the file one line at a time and growing the array with REDIM and _PRESERVE until it's large enough to hold all their data.

As I've said many times regarding this practice:  Yuck!  Yucky!  YUCK!!

Let me showcase why this is a bad practice:

Code: (Select All)
Limit = 5000000 '5,000,000  -- only a limit of five million

count = 0: t## = Timer
ReDim Array(0) As Long 'a standard redimable array, with a starting index of zero
Do
    count = count + 1
    If count > UBound(Array) Then ReDim _Preserve Array(count) As Long
Loop Until count >= Limit
Print Using " ##.##### seconds to redim and preserve our array as we go one increment at a time."; Timer - t##

Print "Phew!  That took a wee bit to just redim and count, now didn't it?"
Print "IS there a better way??"
Print
Print "How about:"

count = 0: t## = Timer
ReDim _Preserve Array(100000) As Long 'start with an arbitrary large number to begin with...

Do
    count = count + 1
    If count > UBound(Array) Then ReDim _Preserve Array(count + 100000) As Long 'add a large number of elements all at once, instead of 1 at a time
Loop Until count >= Limit
ReDim _Preserve Array(count) As Long 'resize the array to the max size AFTER the loop is finished.
Print Using " ##.##### seconds to redim and preserve our array as we go in large chunks."; Timer - t##


Now there are two REDIM and _PRESERVE loops inside the above code.  Let's explain them both a little:

The first loop does nothing but count from 0 to 5,000,000 and redim and preserve as we see above -- WITHOUT any calls to load data from the drive or anything else.  All I want to show here is *how long* it takes for REDIM and _PRESERVE to resize our array in this manner.

Our second loop works a wee bit differently than the first -- instead of resizing itself one element at a time, it resizes in *large chunks*.  At the end of our loops, BOTH arrays are the exact same size -- but there's a slight difference in the speed and performance between the two routines, as shown in the image below.

[Image: image.png]

12.2 seconds for the first loop to run and resize.  0.1 seconds for the second loop.   AND REMEMBER -- This isn't actually reading data or assigning data to the array, or anything else with these loops.  This is the simple speed difference in how long it takes them both to resize and count to five million!

Use REDIM and _PRESERVE properly in your code, and you may be able to cut down load/processing times from multiple seconds/minutes to just fractions of a second.  _PRESERVE is an important command for any programmer's tool box, but it's definitely one which needs to be used properly so that it doesn't bog down your programs unnecessarily.
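
As a rough sketch of how the chunked approach might look when applied to the file-reading example from the top of this post: the chunk size of 10000, the array name txt, and the _FILEEXISTS guard are all assumptions for illustration, not anything measured or prescribed above.

```qb64
CONST CHUNK = 10000 'assumed chunk size; tune it for your own data
DIM count AS LONG
REDIM txt(CHUNK) AS STRING 'start with room for one full chunk

IF _FILEEXISTS("myfile.txt") THEN
    OPEN "myfile.txt" FOR INPUT AS #1
    DO UNTIL EOF(1)
        count = count + 1
        'grow by a whole chunk only when the array is full
        IF count > UBOUND(txt) THEN REDIM _PRESERVE txt(UBOUND(txt) + CHUNK) AS STRING
        LINE INPUT #1, txt(count)
    LOOP
    CLOSE #1
END IF

REDIM _PRESERVE txt(count) AS STRING 'trim to the real size AFTER the loop
PRINT count; "lines read"
```

The only resizes are one per CHUNK lines plus the final trim, so the number of REDIM _PRESERVE calls stays small even for large files.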


RE: DAY 007: _PRESERVE - Pete - 11-12-2022

Woohoo! Another one that needs no <bag> tag from me.

Steve's second example is how I set up my W.P. database read/write file system. I need speed for that, as it can handle thousands of text line entries. For Sam-Clip, I'm starting with the first example for the beta phase. Why? Because it's simple to implement and track. With the first method I can just use UBOUND() as the terminal entry.

Now here's one I love that wasn't mentioned, array shuffling...

Let's say you have an array with 10 elements and you want to cut out elements 3 through 6.
Code: (Select All)
REDIM c$(10)
FOR i = 1 TO 10: c$(i) = CHR$(64 + i): PRINT c$(i);: NEXT ' Prints ABCDEFGHIJ
PRINT
x1 = 3: x2 = 6 ' Cut this range out.
FOR i = 1 TO 10
    IF i >= x1 AND i <= x2 THEN
        c$(i) = "nul" ' Tag and bag.
    END IF
NEXT
i = 0: j = 0: terminate = UBOUND(c$)
DO
    i = i + 1
    IF c$(i + j) = "nul" THEN
        DO
            j = j + 1
        LOOP UNTIL c$(i + j) <> "nul" OR i + j = terminate
    END IF
    c$(i) = c$(i + j)
LOOP UNTIL i + j = terminate
REDIM _PRESERVE c$(i): terminate = UBOUND(c$)
FOR i = 1 TO terminate
    PRINT c$(i);
NEXT

You can add to an array in a similar manner. Other methods exist, feel free to post what you use REDIM _PRESERVE for!
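
A minimal sketch of the "add to an array" direction, assuming we start from the ABGHIJ result above and want to put CDEF back in at position 3. The variable names (ins$, insertAt, oldTop) are made up for illustration; this is one possible way to do it, not necessarily the approach used in any of the programs mentioned here.

```qb64
REDIM c$(6)
src$ = "ABGHIJ"
FOR i = 1 TO 6: c$(i) = MID$(src$, i, 1): NEXT

ins$ = "CDEF": insertAt = 3 ' Insert these letters starting at index 3.
n = LEN(ins$)
oldTop = UBOUND(c$)

REDIM _PRESERVE c$(oldTop + n) ' Grow once, by the full amount needed.
FOR i = oldTop TO insertAt STEP -1
    c$(i + n) = c$(i) ' Shift the tail up, working from the end so nothing is overwritten.
NEXT
FOR i = 1 TO n
    c$(insertAt + i - 1) = MID$(ins$, i, 1) ' Fill the gap.
NEXT

FOR i = 1 TO UBOUND(c$): PRINT c$(i);: NEXT ' Prints ABCDEFGHIJ
```

Note the single REDIM _PRESERVE for the whole insert, rather than one resize per new element.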

Pete


RE: DAY 007: _PRESERVE - mnrvovrfc - 11-12-2022

Curiously chosen QB64-only keyword. I was trying to point out to this:

https://qb64phoenix.com/forum/showthread.php?tid=1109&pid=9745#pid9745

It confused a couple of users. Maybe on a 64-bit system many memory block resizing requests in a short time could be handled well, but on 32-bit this was ornery. What I tended to do was read in the file twice (but it might have been even less efficient), the first time to discover how many elements to read in, then use "REDIM" for the first time and then read the file again. It had to be done that way without "_PRESERVE" and for those who didn't have M$ BASIC PDS v7.1. I was that afraid of making a program crash, and even when I started using 32-bit Freebasic because I had really bad experiences using 16-bit Power C for a year or so.


RE: DAY 007: _PRESERVE - SMcNeill - 11-12-2022

(11-12-2022, 03:13 PM)mnrvovrfc Wrote: Curiously chosen QB64-only keyword. I was trying to point out to this:

https://qb64phoenix.com/forum/showthread.php?tid=1109&pid=9745#pid9745

It confused a couple of users. Maybe on a 64-bit system many memory block resizing requests in a short time could be handled well, but on 32-bit this was ornery. What I tended to do was read in the file twice (but it might have been even less efficient), the first time to discover how many elements to read in, then use "REDIM" for the first time and then read the file again. It had to be done that way without "_PRESERVE" and for those who didn't have M$ BASIC PDS v7.1. I was that afraid of making a program crash, and even when I started using 32-bit Freebasic because I had really bad experiences using 16-bit Power C for a year or so.

I've followed that approach as well, in the past, but don't tend to do so very often anymore.  Disk access and reading a file can take several seconds (or hours if one opens the file FOR INPUT).  As you can tell from my above demo, our REDIM _PRESERVE only tacks on a fraction of a second to resize most arrays.  When one is worried about loading times (such as when loading a large CSV database into memory, for example), it's often better to REDIM _PRESERVE in "large chunks" than it is to read the file for a line/element count, and then DIM once and have to reread the file a second time to actually read in the data.

It's honestly not very often at all anymore where I do the "read twice, dim once" method.  In fact, the method I often rely on the most is the "DIM stupidly large, read once, resize once" method.

REDIM array(1234567890) AS STRING
' ...read in data, counting as I go
REDIM _PRESERVE array(count) AS STRING 'redim down to the proper size so excessive amounts of memory aren't being reserved for no reason.


RE: DAY 007: _PRESERVE - Pete - 11-12-2022

@mnrvovrfc

Your post reminded me of the guy who said, "I'm afraid to go to Hawaii because my cheap oar made in China might break." Well sure it will break, but next time buy a boat with a friggin' engine, preferably one made in the U.S.A. and powered by diesel. Don't go buying a stupid electric ferry like a town in the state of Washington got 8-million tax dollars to buy. Anyway, reality aside for a minute and back to how this relates to REDIM _PRESERVE. My rule of thumb has always been to rely on system resources for small code where speed is of virtually no issue. Very fast to implement. For example, using UBOUND() instead of making a counter index. Other short-cuts I love are things like mytimer. My patented fill makes mytimer the absolute easiest timer to code, and if you use your PROMO code on the screen, you can get two mytimers for the price of one, but hurry, supplies are limited. What's that, time for my next lithium treatment, well hold on a minute...

Code: (Select All)
mytimer = TIMER: DO UNTIL ABS(TIMER - mytimer) >= 60: LOOP: PRINT "Okay, I held on a minute."

Okay, so it momentarily fails once at midnight. Big deal, I'm watching por.. porkie pig reruns that time of night, anyway.

My point: maybe not for 60 seconds, but for a 1/10th of a second delay, I can even live with the midnight glitch.

Oh, if someone doesn't know how to code for midnight, it goes something like this...

Code: (Select All)
mytimer = TIMER
DO
    IF mytimer > TIMER THEN mytimer = mytimer - 86400 ' Midnight adjustment for when TIMER goes to zero.
LOOP UNTIL TIMER - mytimer >= 60
PRINT "Okay, I held on a minute."

Now, would my short-cut antics win any coding awards? Oh hell no! But it sure is fun taking advantage of system resources for small non-speed dependent projects. For larger and more speed sensitive projects, I too use variables to represent resources and won't take short-cuts while watching Looney Tunes after midnight.

Pete