
TerryRitchie · (This post was last modified: 03-11-2024, 03:43 AM by TerryRitchie.)

In my quest to learn the _MEM-related commands, I'm also looking for the fastest possible ways to do things.

When working with image memory blocks, you often need to multiply by 4 to calculate pixel offsets (each 32-bit pixel takes 4 bytes). Instead of multiplying by 4, I thought shifting left by 2 might be faster while giving the same result, so I wrote a little speed test program (below) to prove this out.

When the program is first run, the very first loop will sometimes show that multiplying is faster than shifting, but subsequent loops show that shifting is faster.

Now here is the really weird part: REM out the SLEEP 3 line in the code below, and multiplying is always faster???

Any idea what is causing this? I'm using QB64PE version 3.12.0. (Update: I just ran the code in version 3.11.0 with the same results)

Also, has anyone else noticed that the 3.12.0 IDE flickers occasionally when it's not in use (not the active window)? It's flickering now and then as I write this text in the forum. It also happens when I run code and press keys while the code is running: the IDE flickers in the background.

```
DIM x AS INTEGER
DIM y AS INTEGER
DIM i AS LONG
DIM t AS DOUBLE
DIM f1 AS LONG
DIM f2 AS LONG
DIM h AS DOUBLE
DIM l AS DOUBLE
DIM p AS DOUBLE

x = 4

DO
    ' count how many multiplications fit in one second
    i = 0
    t = TIMER(.001)
    DO
        y = x * 4
        i = i + 1
    LOOP UNTIL TIMER(.001) - t >= 1
    f1 = i

    ' count how many left shifts fit in one second
    i = 0
    t = TIMER(.001)
    DO
        y = _SHL(x, 2)
        i = i + 1
    LOOP UNTIL TIMER(.001) - t >= 1
    f2 = i

    p = (f2 - f1) / f1 * 100 ' calculate percentage
    IF p > h THEN h = p ' track highest percentage seen
    IF l = 0 THEN l = p
    IF p < l THEN l = p ' track lowest percentage seen

    CLS
    PRINT
    PRINT f1; "--> Multiplying"
    PRINT f2; "--> Shifting left"
    PRINT
    PRINT f2 - f1; "--> Additional calculations using shifting"
    PRINT USING " ##.##% --> Increase"; p
    PRINT
    PRINT USING " ##.##% --> Lowest seen"; l
    PRINT USING " ##.##% --> Highest seen"; h

    ' Rem the following line for negative results ???
    SLEEP 3
LOOP
```

**DSMan195276** · 03-11-2024, 03:57 AM

Functionally speaking, a regular `* 4` with a constant 4 should be no faster than `_SHL`. The C++ compiler is smart enough to implement multiplication by a power of two with a left shift, so you don't have to use `_SHL` for that. The benefit comes when the shift amount is not a constant; in that case the compiler cannot optimize the multiplication.

For your test code, I think the general issue is that `TIMER(.001)` is significantly more expensive than the single shift you're doing, so I would expect any fluctuation to come from `TIMER(.001)`'s performance more than anything else. Instead of looping for a set amount of time, try performing a set number of `_SHL` or `* 4` operations (e.g. 1 million of them) and measure how long they take by calling `TIMER(.001)` once at the start and once at the end.

(03-11-2024, 03:36 AM)TerryRitchie Wrote: Also, has anyone else noticed that the 3.12.0 IDE flickers occasionally when not using it (not the active window)? As I am writing this text into the forum it's flickering now and then. This also happens when I run code and press keys while the code is running, the IDE flickers in the background.

I have not heard about that, but I'll keep an eye out, that sounds odd.

TerryRitchie · (This post was last modified: 03-11-2024, 04:20 AM by TerryRitchie.)


Thanks for the reply. I'll rework the code to keep TIMER out of the loops.

The flickering is the line numbers on the left-hand side highlighting very briefly, along with the highlight bar (the line the cursor is on). It's as if the IDE becomes the active window for very brief periods of time. This only happens when typing on the keyboard.

TerryRitchie · 03-11-2024, 04:34 AM

OK, I reworked the code, and multiplying is definitely faster. Thanks for the insight, DSMan195276.

```
DIM x AS INTEGER
DIM y AS INTEGER
DIM i AS LONG
DIM t AS DOUBLE
DIM t1 AS DOUBLE
DIM t2 AS DOUBLE

x = 4

DO
    ' time a fixed number of multiplications
    i = 0
    t = TIMER(.001)
    DO
        y = x * 4
        i = i + 1
    LOOP UNTIL i = 1000000000 ' a billion times
    t1 = TIMER(.001) - t

    ' time the same number of left shifts
    i = 0
    t = TIMER(.001)
    DO
        y = _SHL(x, 2)
        i = i + 1
    LOOP UNTIL i = 1000000000
    t2 = TIMER(.001) - t

    BEEP
    CLS
    PRINT
    PRINT USING " #.### --> Multiplying"; t1
    PRINT USING " #.### --> Shifting left"; t2
LOOP
```

**a740g** · 03-11-2024, 05:16 AM

You might see different results with compiler optimizations enabled, and even more so when comparing `_SHR` to integer division.

**DSMan195276** · 03-11-2024, 06:21 AM

Exactly what @a740g said. The likely issue you're seeing is that no optimization is applied to the generated C++ source (unless you enable it in `Compiler Settings`), so the compiler does not inline the `_SHL` function. That makes it significantly slower than `* 4`, which is turned into a single shift instruction. You can see it in the disassembly: the first highlighted line is the shift generated for `y = x * 4`, and the second highlighted line is the call to the `_SHL` function for `_SHL(x, 2)`.

Kernelpanic · 03-11-2024, 12:28 PM

@Terry - I changed the program slightly to compare the times and added a termination condition. Hope you don't mind.

Flickering: Maybe the IDE in QB64PE 3.12 needs more memory, or more performance in general?

```
'Speed test for image shifting, Terry - March 11, 2024

Option _Explicit

Dim x As Integer
Dim y As Integer
Dim i As Long
Dim t As Double
Dim t1 As Double
Dim t2 As Double
Dim As Integer z

Locate 2, 2
Print "Shows how long it takes to perform 1 billion"
Locate 3, 2
Print "multiplications or shifts."
Locate 5, 2

z = 0
x = 4
Do
  ' time a fixed number of multiplications
  i = 0
  t = Timer(.001)
  Do
    y = x * 4
    i = i + 1
  Loop Until i = 1000000000 ' a billion times
  t1 = Timer(.001) - t

  ' time the same number of left shifts
  i = 0
  t = Timer(.001)
  Do
    y = _ShL(x, 2)
    i = i + 1
  Loop Until i = 1000000000
  t2 = Timer(.001) - t

  Beep
  'Cls
  Print
  Print Using " #.### --> Multiplying"; t1
  Print Using " #.### --> Shifting left"; t2
  Locate CsrLin + 1, 2

  z = z + 1
Loop Until z = 4 ' stop after four rounds

End
```

[Image: Bildverschiebung-Zeitdauer2024-03-11.jpg]

mcalkins · 05-05-2024, 11:51 PM

When writing assembly, shl is better than mul, and shr is way better than div.

When writing C++ with constants, as DSMan said, it shouldn't matter if optimization is enabled: the C++ compiler should be able to optimize multiplications and divisions by constant powers of 2 into shifts.

`_SHL(x, 2)`

Ugh. I'd rather have seen it as a `<<` operator; I don't think that would conflict with existing BASIC source code. Or even `x _SHL 2`, in the same style as `MOD` or `AND`. But regardless, why is it a function at all? Couldn't the QB64 compiler treat it as an intrinsic and just transform it into `x << 2`?
