10-11-2023, 02:20 AM
A couple interesting notes here:
1) There's no reason to argue over which is faster, CLS or LINE ..., BF. If you look in libqb.cpp, you'll see this little snippet of code:
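```cpp
} else { // 32-bit
    i = write_page->alpha_disabled;   // remember the current alpha-blending setting
    write_page->alpha_disabled = 1;   // write the color directly, with no blending
    if (write_page->clipping_or_scaling) {
        // clipping or scaling is active: fill only the current window area
        qb32_boxfill(write_page->window_x1, write_page->window_y1, write_page->window_x2, write_page->window_y2, use_color);
    } else { // fast method (no clipping/scaling)
        // fill the whole page, from (0, 0) to (width - 1, height - 1)
        fast_boxfill(0, 0, write_page->width - 1, write_page->height - 1, use_color);
    }
    write_page->alpha_disabled = i;   // restore the previous alpha setting
    // ... (rest of the function omitted in this excerpt)
```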
That's for CLS, and it's where the screen clear happens for 32-bit screens. Notice that, when no clipping or scaling is active, it simply calls a routine named "fast_boxfill"... That's the *exact* same routine that LINE calls when we add the BF tag to the end of it (for Box Filled).
There's not going to be any real difference between the two, since they both call the same helper routine to do the exact same job.
2) As amazing as it is, CLS and LINE ..., BF are both faster than _MEMFILL and the memory-filling routines that come with C/C++. (Such as the std::fill which I was using above for testing, and which a470g was so nice as to provide for us.)
HOW is CLS so much faster than _MEMFILL and such???
It's all in the number of operations the routines end up doing! Let me explain each one, and you'll quickly see where the difference in performance comes from.
For _MEMFILL, we basically set a 4-byte color, point it at our image, and then fill *each and every* pixel with that color using our mem commands.
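For comparison, here is a minimal C++ sketch of the per-pixel pattern just described, assuming a 32-bit image stored as one `std::uint32_t` per pixel in row-major order. The function name `fill_per_pixel` is hypothetical and this is not QB64's actual _MEMFILL code; it only shows that every pixel gets its own write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-pixel fill of a 32-bit image (one 4-byte color per pixel).
// Every one of the width * height pixels receives its own individual write.
// Returns how many color writes were performed.
std::size_t fill_per_pixel(std::vector<std::uint32_t>& pixels,
                           std::size_t width, std::size_t height,
                           std::uint32_t color) {
    assert(pixels.size() == width * height);  // buffer must match the image size
    std::size_t writes = 0;
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            pixels[y * width + x] = color;  // one write per pixel
            ++writes;
        }
    }
    return writes;  // always width * height
}
```

On a 640x480 image this performs 307,200 separate color writes, one per pixel.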
For CLS (and LINE ..., BF), what Galleon has the code doing is:
a) First we basically do a _MEMFILL to create a single complete line of colors.
b) We then take that completed line and _MEMCOPY it to fill up all the other lines with that same data.
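The two steps above can be sketched in C++ roughly like this. It is an illustration of the "fill one row, then copy it" idea under the same assumptions as before (row-major `std::uint32_t` pixels), not the real fast_boxfill source from libqb.cpp; the name `fill_row_then_copy` is hypothetical, and `std::memcpy` stands in for _MEMCOPY.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical "fill one row, then copy it" fill of a 32-bit image.
// Returns the number of operations: width color writes for the first row
// plus one bulk memcpy for each remaining row.
std::size_t fill_row_then_copy(std::vector<std::uint32_t>& pixels,
                               std::size_t width, std::size_t height,
                               std::uint32_t color) {
    assert(pixels.size() == width * height);  // buffer must match the image size
    if (width == 0 || height == 0) return 0;

    std::size_t ops = 0;
    // Step a) write the color into every pixel of the first row.
    for (std::size_t x = 0; x < width; ++x) {
        pixels[x] = color;
        ++ops;
    }
    // Step b) copy that finished row over each of the other rows.
    const std::size_t row_bytes = width * sizeof(std::uint32_t);
    for (std::size_t y = 1; y < height; ++y) {
        std::memcpy(&pixels[y * width], &pixels[0], row_bytes);  // rows never overlap
        ++ops;
    }
    return ops;  // width + (height - 1)
}
```

Row 0 is only ever used as the copy source, so source and destination never overlap and `std::memcpy` (rather than `std::memmove`) is safe here.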
End result is:
_MEMFILL does _WIDTH * _HEIGHT fills of our color.
CLS does _WIDTH fills of our color for one line, plus _HEIGHT - 1 copies of that line to fill in all the rest.
See the difference in the number of operations we're making here??
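To put numbers on it, here is the operation count for an example 1024x768 screen (the size is just an illustration), with W = _WIDTH and H = _HEIGHT, counting W fills for the first line plus one copy for each of the other H - 1 lines:

$$
\underbrace{W \times H}_{\text{\_MEMFILL}} = 1024 \times 768 = 786{,}432
\qquad \text{vs.} \qquad
\underbrace{W + (H - 1)}_{\text{CLS}} = 1024 + 767 = 1{,}791
$$

The total number of bytes written is the same either way (W x H x 4 bytes); what shrinks is the number of separate operations, since each line copy is one bulk memory move rather than W individual color writes.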
Hats off to Galleon -- he really took some time to work out how to optimize what he was doing with the box_fill routines!