String Addition (Optimization) - Printable Version +- QB64 Phoenix Edition (https://qb64phoenix.com/forum) +-- Forum: Official Links (https://qb64phoenix.com/forum/forumdisplay.php?fid=16) +--- Forum: Learning Resources and Archives (https://qb64phoenix.com/forum/forumdisplay.php?fid=13) +--- Thread: String Addition (Optimization) (/showthread.php?tid=2624) Pages:
1
2
|
String Addition (Optimization) - SMcNeill - 04-26-2024 Code: (Select All)
I think the code above helps showcase one of QB64's and QB64PE's greatest flaws -- the time it takes to do anything substantial with strings! The above simply makes a list of strings, by using the numbers from 1 to whatever LIMIT we set, and then it adds those strings all into one massive string. Basicially "1" + "2" + "3" +... = "123..." -- a simple process, but all that is needed to showcase how slow this process can become. In the above, I've made use of two different methods for speed comparison, for folks. The first method is what we see a lot in our programs: temp$ = temp$ + num$. It simply adds the strings together over and over until it's finished. The second method is much more efficient: mid%(temp$, p) = num$. This has to pre-allocate the proper size of the string, create the string, and then it simply uses mid$ to swap out the proper portion of that string with num$ as it goes along. I won't post my times on this, as I'd prefer for folks to run this code at least once themselves and see what the difference in speed might be on their own system, with whatever compiler flags/optimizations they might be using. Let's just say that I'm certain folks should be able to see a noticeable difference here. NOTE: And one important thing to note here -- the larger the strings involved, the greater the difference becomes. The difference here isn't a static, "it takes 0.1 seconds to perform X additions to the string". Change that limit and you can see for yourself. A limit of 100000 takes X seconds. A limit of 110000 takes 2 * x seconds. A limit of 120000 takes 4 * x seconds. Exponential time increases as string length increases!! (Though not quite as bad as the * 2 modifier I just tossed out as an example to the left there. ) Take away from all this?? If you're going to be doing a lot of string addition, you'd probably be better served by running it through two loops, instead of just adding the strings together in one. Loop 1: Count the total size of the string FOR i = 1 TO limit totalSize = totalSize + len(num$(i)) NEXT Allocate the output string first: temp$ = SPACE$(totalSize) Loop 2: Mid$ the string together instead of adding it p = 1 FOR i = 1 TO limit mid$(temp$, p) = num(i) p = p + len(num(i)) NEXT It's thinking outside the box, and it's not the way that most folks tend to do things, but I'd think the above showcases WHY this might be an important concept to keep under your hat, if you're ever doing any serious string work in your projects. (And feel free to post your own times below. I'd love to see how large a difference such a simple approach makes for most people on their own systems. I just don't want to spoil the surprise by posting my own times here, and you guys might not want to read below this post until you run the code for yourself once, just for the "pop factor" of the figures. ) RE: String Addition (Optimization) - Jack - 04-26-2024 I like benchmarking different algorithms, my time is Quote:It took 13.715 seconds to AddStrings RE: String Addition (Optimization) - Jack - 04-26-2024 @SMcNeill in FreeBasic the times were about the same for both methods or .004 seconds but QB64pe beats FreeBasic by a factor of 2.8 in your Fibonacci program here https://qb64phoenix.com/forum/showthread.php?tid=1830&pid=17680#pid17680 Quote:QB64 time .332 seconds RE: String Addition (Optimization) - TerryRitchie - 04-26-2024 20.75 addstrings 0.007 midstrings Holy cow. Can anything be done to improve QB64's string manipulation? The second method is definitely thinking outside the box like you said, but well worth utilizing for large string concatenation. RE: String Addition (Optimization) - SMcNeill - 04-26-2024 (04-26-2024, 02:19 AM)TerryRitchie Wrote: 20.75 addstrings If so, I wouldn't know how -- it'd have to be some sort of change by someone much more knowledgeable about the inner workings of C and its memory than what I have. To explain the difference in the speed here, let me go through what both of these processes are actually doing for us. First, let's go with adding strings together. c$ = c$ + a$ 1) Fiirst we have to check the length of both strings. len(c$), len (a$) 2) Add those lengths together and make certain we have that much memory availble for use. Create a temp mem-block of the proper size. 3) Add those two strings to that temp mem-block, in order. (Put c$ in that block, then put a$ there.) 4) Free the old c$ from memory. 5) Point c$ to that temp mem-block, so it now contains the full data of both strings put together. Make a block of memory. Merge the strings to that block. Free a block of memory. Point the old block to the new block... <-- That's the basic process in a nutshell. And let's compare it to the mid$ method: 1) Fiirst we have to check the length of both strings. len(c$), len (a$) 2) Add those lengths together and make certain we have that much memory availble for use. Create a temp mem-block of the proper size. 3) Add those two strings to that temp mem-block, in order. (Put c$ in that block, then put a$ there.) Make a block large enough. Put the data in that block. Done. <--- And that's all we do with mid$. Now, where we REALLY see a difference in speed is when we repeat this process over and over in a loop. Adding a string repetitively means freeing that old block over and over, and creating a new block over and over, and then pointing the old block to the new block. On the other hand, if we pre-allocate all the memory for that new string at once, we skip that free/redirect over and over -- and the larger the memory block that we're allocating, freeing, and such, the longer it takes for the OS to make that manipulation. See why mid$ ends up being faster for us, no matter which OS/compiler options we end up setting? It basically skips the most intensive part of the process -- the repeated allocating/freeing that comes from making certain we have enough memory to add those two strings together and assign them to one. RE: String Addition (Optimization) - Dimster - 04-26-2024 Steve would you consider this use of Mid$ a worthy example to be added to the wiki?? RE: String Addition (Optimization) - RhoSigma - 04-26-2024 (04-26-2024, 01:22 PM)Dimster Wrote: Steve would you consider this use of Mid$ a worthy example to be added to the wiki?? Indeed, would be a good showcase for the concatenation flaws and help potential users to decide which method is best for their working cases. Possible places to mention that and placing the example whould be the following wiki pages: https://qb64phoenix.com/qb64wiki/index.php/STRING https://qb64phoenix.com/qb64wiki/index.php/%2B (+) https://qb64phoenix.com/qb64wiki/index.php/Concatenation RE: String Addition (Optimization) - mdijkens - 04-26-2024 Yep, standard stuff with QB64pe and large strings Imagine the differences when working with Terabyte files When working with large strings; really think twice about your implementation and know the differences Hint: _Mem Hint2: Asc(a$, x&) Hint3: Blocksizes of 4MB ~ 64MB work best RE: String Addition (Optimization) - bplus - 04-26-2024 line 15 should be o1$ not o$ not that it matters as long as it's doing it right very impressive RE: String Addition (Optimization) - DSMan195276 - 04-29-2024 (04-26-2024, 02:19 AM)TerryRitchie Wrote: Holy cow. Can anything be done to improve QB64's string manipulation?There are many improvements that could be made, the existing system has some pretty significant bugs as well as not being the most performant it could be. It requires someone to take the time to fix it though, that tends to be the limiting factor for a lot of things related to QB64 That said, there are some trade-offs that mean concatenation will always be slow. Many languages have this problem and introduce a kind of 'string builder' API to efficiently create strings, `MID$()` in QB64 like Steve used it is probably the closest thing to that. |