Shifting versus Multiplying
#1
In my quest to learn the _MEM-related commands, I'm also looking for the fastest possible ways to do things.

When working with image memory blocks, you often need to multiply by 4 to calculate offsets to pixel locations (a 32-bit pixel is 4 bytes). Instead of multiplying by 4, I thought shifting left by 2 might be faster while giving the same result, so I wrote a little speed test program (below) to check.

When the program first runs, the very first loop sometimes shows multiplying to be faster than shifting, but subsequent loops show shifting to be faster.

Now here is the really weird part: comment out (REM) the SLEEP 3 line in the code below and multiplying is always faster!

Any idea what is causing this? I'm using QB64PE version 3.12.0. (Update: I just ran the code in version 3.11.0 with the same results.)

Also, has anyone else noticed that the 3.12.0 IDE occasionally flickers when it isn't the active window? As I write this text into the forum, it flickers now and then. It also happens when I run code and press keys while the code is running: the IDE flickers in the background.

Code:
DIM x AS INTEGER
DIM y AS INTEGER
DIM i AS LONG
DIM t AS DOUBLE
DIM f1 AS LONG
DIM f2 AS LONG
DIM h AS DOUBLE
DIM l AS DOUBLE
DIM p AS DOUBLE

x = 4
DO
    i = 0
    t = TIMER(.001)
    DO
        y = x * 4
        i = i + 1
    LOOP UNTIL TIMER(.001) - t >= 1
    f1 = i

    i = 0
    t = TIMER(.001)
    DO
        y = _SHL(x, 2)
        i = i + 1
    LOOP UNTIL TIMER(.001) - t >= 1
    f2 = i

    p = (f2 - f1) / f1 * 100 ' calculate percentage

    IF p > h THEN h = p
    IF l = 0 THEN l = p
    IF p < l THEN l = p

    CLS
    PRINT
    PRINT f1; "--> Multiplying"
    PRINT f2; "--> Shifting left"
    PRINT
    PRINT f2 - f1; "--> Additional calculations using shifting"
    PRINT USING " ##.##% --> Increase"; p
    PRINT
    PRINT USING " ##.##% --> Lowest seen"; l
    PRINT USING " ##.##% --> Highest seen"; h

    ' Comment out (REM) the following line to get negative results (why?)
    SLEEP 3

LOOP
New to QB64pe? Visit the QB64 tutorial to get started.
QB64 Tutorial
#2
Functionally speaking, a regular `* 4` with a constant 4 should not be faster than `_SHL`. The C++ compiler is smart enough to use a left shift to implement multiplication by a power of two, so you don't have to use `_SHL` for that. The benefit comes when the shift amount is not a constant; in that case the compiler cannot optimize the multiplication into a shift.

For your test code, I think the general issue is that `TIMER(.001)` is significantly more expensive than the single shift you're doing, so I would expect any fluctuation to come from `TIMER(.001)`'s performance more than anything else. Instead of looping for a set amount of time, try performing a set number of `_SHL` or `* 4` operations (e.g. 1 million of them) and measure how long they all take by calling `TIMER(.001)` only at the start and the end.

(03-11-2024, 03:36 AM)TerryRitchie Wrote: Also, has anyone else noticed that the 3.12.0 IDE flickers occasionally when not using it (not the active window)? As I am writing this text into the forum it's flickering now and then. This also happens when I run code and press keys while the code is running, the IDE flickers in the background.
I haven't heard about that, but I'll keep an eye out; it sounds odd.
#3
(03-11-2024, 03:57 AM)DSMan195276 Wrote: Instead of looping for a set amount of time, you should instead try performing a set number of `_SHL` or `* 4` commands (ex. 1 million of them) and then measure how long it takes to execute all of them by calling `TIMER(.001)` at the start and end.
Thanks for the reply. I'll rework the code to keep TIMER out of the loops.

The flickering is the line numbers on the left-hand side highlighting very briefly, along with the highlight bar (the line the cursor is on). It's as if the IDE becomes the active window for very brief moments. This only happens when typing on the keyboard.
#4
Ok, I reworked the code and multiplying is definitely faster. Thanks for the insight DSMan195276.

Code:
DIM x AS INTEGER
DIM y AS INTEGER
DIM i AS LONG
DIM t AS DOUBLE
DIM t1 AS DOUBLE
DIM t2 AS DOUBLE

x = 4
DO
    i = 0
    t = TIMER(.001)
    DO
        y = x * 4
        i = i + 1
    LOOP UNTIL i = 1000000000 ' a billion times
    t1 = TIMER(.001) - t

    i = 0
    t = TIMER(.001)
    DO
        y = _SHL(x, 2)
        i = i + 1
    LOOP UNTIL i = 1000000000
    t2 = TIMER(.001) - t

    BEEP
    CLS
    PRINT
    PRINT USING " #.### --> Multiplying"; t1
    PRINT USING " #.### --> Shifting left"; t2

LOOP
#5
You might see different results with compiler optimizations enabled, and even bigger differences when comparing _SHR to integer division.
#6
Exactly what @a740g said. The likely cause of what you're seeing is that no optimization is applied to the generated source (unless you enable it in `Compiler Settings`), so the `_SHL` function is not inlined by the compiler. That makes it significantly slower than `* 4`, which is turned into a single shift instruction. You can see it in the disassembly: the first highlighted line is the shift generated for `y = x * 4`, and the second highlighted line is the call to the `_SHL` function for `_SHL(x, 2)`.


[Image: shift.png]
#7
@Terry - I changed the program slightly to compare the times and added a termination condition. Hope you don't mind.

Flickering: maybe the IDE in QB64PE 3.12 needs more memory, or more processing power in general?

Code:

'Speed test for image shifting, Terry - 11 March 2024

Option _Explicit

Dim x As Integer
Dim y As Integer
Dim i As Long
Dim t As Double
Dim t1 As Double
Dim t2 As Double
Dim As Integer z

Locate 2, 2
Print "Shows how long it takes to do 1 billion shifts"
Locate 3, 2
Print "by multiplying or by using _SHL."
Locate 5, 2

z = 0
x = 4
Do
  i = 0
  t = Timer(.001)
  Do
    y = x * 4
    i = i + 1
  Loop Until i = 1000000000 ' a billion times
  t1 = Timer(.001) - t

  i = 0
  t = Timer(.001)
  Do
    y = _ShL(x, 2)
    i = i + 1
  Loop Until i = 1000000000
  t2 = Timer(.001) - t

  Beep
  'Cls
  Print
  Print Using " #.### --> Multiplying"; t1
  Print Using " #.### --> Shifting left"; t2
  Locate CsrLin + 1, 2

  z = z + 1
Loop Until z = 4

End

[Image: Bildverschiebung-Zeitdauer2024-03-11.jpg]
#8
When writing assembly, shl is cheaper than mul, and shr is much cheaper than div.

When writing C++ with constants, as DSMan said, it shouldn't matter if optimization is enabled: the C++ compiler should be able to optimize multiplications and divisions by constant powers of 2 into shifts (plus a small fix-up for signed division).

_SHL(x, 2)

Ugh. I think I'd rather have seen it as a << operator; I don't think that would conflict with existing BASIC source code. Or even x _SHL 2, in the same style as MOD or AND. But regardless, why is it a function at all? Couldn't the QB64 compiler treat it as an intrinsic and just transform it into x << 2?