Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Working on Base-85 Encoder/Decoder Functions
#1
When I learned QB64-PE allows resources to be loaded from memory it re-kindled my interest in encoding methods to put small files in BAS code, like Base64 is used in several programs found here.  One step more space efficient than Base64 is Base85 (also called ASCII85) which outputs smaller encoded data and uses a larger character set, so I started trying to figure it out.  It's been a real challenge as I am not very math savvy and there were no BASIC sources of Base85 out there I could find to study.

Here's what I have so far.  This will take 4 bytes of binary data, convert it to an integer, divide that up and map them to a characterset, outputting 5 bytes of printable characters that can be included in BAS code.  The decoder reverses the process. I have tested these functions on a few files, and they seem to encode/decode ok here - but they is VERY SLOW.  If you would like to try them and/or add some improvements please do (Speed it up if you can).   I'm also working on Base91 too, but it's not ready to post yet.

- Dav

Edit: Updated version.  Now encode/decode much faster.
Code: (Select All)

'==========
'BASE85.BAS - v0.30
'==========
'Base85 Encoder/Decoder Functions.
'Coded by Dav for QB64-PE 3.8.0, SEP/2023

'CREDIT:  Thanks to RhoSigma for helping me make these lightning fast!!


a1$ = Space$(1000000) ' make a 1MB string to test encoder/decoder speed

Print
Print "Original size:"; Len(a1$); "bytes."
Print

t1# = Timer
Print "Encoding"; Len(a1$); "bytes....";
a2$ = Base85Encode$(a1$)
Print Timer - t1#; "secs, output size:"; Len(a2$)
Print
t1# = Timer
Print "Decoding"; Len(a2$); "bytes....";
a3$ = Base85Decode$(a2$)
Print Timer - t1#; "secs, output:"; Len(a3$)
Print

If a1$ = a3$ Then
    Print "Original Data and Decoded Data Match!"
    Print Len(a1$), Len(a3$)
Else
    Print "Original Data and Decoded DO NOT Match!"
    Print Len(a1$), Len(a3$)
End If



Function Base85Encode$ (in$)
    For i = 39 To 125 'Make 85 character set to use
        If i <> 64 And i <> 96 Then c$ = c$ + Chr$(i)
    Next
    Dim v As _Unsigned Long

    t$ = in$ 'make a working copy so in$ isn't changed

    If Len(t$) Mod 4 > 0 Then 'pad needed bytes on end
        a = 5 - Len(t$) Mod 4
        t$ = t$ + Space$(a - 1)
    End If

    out$ = Space$(Len(t$) * 1.25): outb& = 1

    For i& = 1 To Len(t$) Step 4
        v = CVL(Mid$(t$, i&, 4))
        For j& = 4 To 0 Step -1
            p& = 85 ^ j&
            r& = v \ p&
            v = v Mod p&
            Mid$(out$, outb&, 1) = Mid$(c$, r& + 1, 1)
            outb& = outb& + 1
        Next
    Next
    Base85Encode$ = LTrim$(RTrim$(Str$(a))) + out$
End Function

Function Base85Decode$ (in$)
    For i = 39 To 125 'Make an 85 character set
        If i <> 64 And i <> 96 Then c$ = c$ + Chr$(i)
    Next
    Dim v As _Unsigned Long
    t$ = in$ 'use a working copy so in$ isn't changed
    a = Val(Mid$(t$, 1, 1)) 'grab pad number

    out$ = Space$(Len(t$) / 1.25 - 1): outb& = 1

    For i& = 2 To Len(in$) Step 5: v = 0
        For j& = 0 To 4: p& = 85 ^ (4 - j&)
            cv& = InStr(c$, Mid$(t$, i& + j&, 1)) - 1
            v = v + cv& * p&
        Next

        Mid$(out$, outb&, 4) = MKL$(v)
        outb& = outb& + 4

    Next
    Base85Decode$ = Mid$(out$, 1, Len(out$) - a + 1)
End Function

Find my programs here in Dav's QB64 Corner
Reply
#2
Why not go from 64 to base-128?  Everything from ASC(32) to ASC(160) should represent in the IDE with 0 issues, and all you'd need to do is convert from 8-bit to 7-bit data.
Reply
#3
That's an idea to play with.   WIll give it a go.  Would base-128 code post on forums ok?  Maybe would need to skip the " chr(34) in the ide. The ` seems to be messing up posted code here lately when several are in a row, and the @ symbol was a problem on the old form I recall. 

Edit:  Hey Steve - I just remembered something.  Didn't you do a base-128 thing in a Christmas demo a few years back?  I seem to recall a loooong one line code thing.  I'm gonna look through my code stash!

- Dav

Find my programs here in Dav's QB64 Corner
Reply
#4
Code: (Select All)
SCREEN _NEWIMAGE(800, 600, 32)

FOR i = 0 TO 25 'Print A to Z, from the alphabet
    text$ = text$ + CHR$(65 + i)
    a$ = B256to128(text$)
    PRINT "B128: "; a$;
    LOCATE , 40
    b$ = B128to256(a$)
    PRINT "B256: "; b$
    SLEEP
NEXT
_DELAY 2
END

FUNCTION B256to128$ (text$)
    l = 8 * LEN(text$)
    DIM A(1 TO l) AS _UNSIGNED _BIT, b AS _UNSIGNED _BYTE
    'convert the text to the 8 bit array
    FOR i = 1 TO LEN(text$)
        b = ASC(text$, i)
        p = (i - 1) * 8 + 1
        FOR j = 0 TO 7
            IF b AND (2 ^ j) THEN A(p + j) = 1 ELSE A(p + j) = 0
        NEXT
    NEXT
    'convert the array to 7bit strings
    FOR i = 1 TO l STEP 7
        b = 0
        FOR j = 6 TO 0 STEP -1
            IF i + j < l THEN
                IF A(i + j) THEN b = b + (2 ^ j)
            END IF
        NEXT
        b = b + 45
        t$ = t$ + CHR$(b)
    NEXT
    IF LEN(t$) MOD 8 <> 0 AND RIGHT$(t$, 1) = "-" THEN t$ = LEFT$(t$, LEN(t$) - 1)
    B256to128 = t$
END FUNCTION

FUNCTION B128to256$ (text$)
    l = 7 * LEN(text$)
    DIM A(1 TO l) AS _UNSIGNED _BIT, b AS _UNSIGNED _BYTE
    'convert the text to the 8 bit array
    FOR i = 1 TO LEN(text$)
        b = ASC(text$, i) - 45
        FOR j = 0 TO 6
            p = p + 1: IF p > l THEN EXIT FOR
            IF b AND (2 ^ j) THEN A(p) = 1 ELSE A(p) = 0
        NEXT
    NEXT
    'convert the array to 8bit strings
    p = 0
    FOR i = 1 TO l STEP 8
        b = 0
        FOR j = 0 TO 7
            p = p + 1
            IF p > l THEN EXIT FOR
            IF A(p) THEN b = b + (2 ^ j)
        NEXT
        t$ = t$ + CHR$(b)
    NEXT
    B128to256 = t$
END FUNCTION
Reply
#5
Oh cool - you posted what I was asking about in my edit post.  Thanks!  I will look those over.

- Dav

Find my programs here in Dav's QB64 Corner
Reply
#6
https://qb64forum.alephc.xyz/index.php?topic=2017.0 -- There's where me and Petr played around with the encoding back in .net days.  Wink
Reply
#7
Thanks for the link!  That must have been when I was gone for a while back then, or I sure would have remembered reading that!

- Dav

Find my programs here in Dav's QB64 Corner
Reply
#8
Steve, the Base-128 is the most space efficient yet.  Will be great for stuffing files in code.  A 256 byte input will output 293 byte encoded. My Base85 outputs 321 bytes for the the same input, larger.  However, I seem to be getting an extra byte added converting 128 back to 256.

Here's what I'm testing with...

- Dav

Code: (Select All)

_ControlChr Off
For i = 0 To 255
    a1$ = a1$ + Chr$(i)
Next

Color 7: Print a1$
Color 11: Print Len(a1$); "bytes"
Print
a2$ = B256to128(a1$)

Color 7: Print a2$
Color 11: Print Len(a2$); "bytes"
Print
a3$ = B128to256(a2$) 'adding an extra byte at end?

Color 7: Print a3$
Color 11: Print Len(a3$); "bytes"


Function B256to128$ (text$)
    l = 8 * Len(text$)
    Dim A(1 To l) As _Unsigned _Bit, b As _Unsigned _Byte
    'convert the text to the 8 bit array
    For i = 1 To Len(text$)
        b = Asc(text$, i)
        p = (i - 1) * 8 + 1
        For j = 0 To 7
            If b And (2 ^ j) Then A(p + j) = 1 Else A(p + j) = 0
        Next
    Next
    'convert the array to 7bit strings
    For i = 1 To l Step 7
        b = 0
        For j = 6 To 0 Step -1
            If i + j < l Then
                If A(i + j) Then b = b + (2 ^ j)
            End If
        Next
        b = b + 45
        t$ = t$ + Chr$(b)
    Next
    If Len(t$) Mod 8 <> 0 And Right$(t$, 1) = "-" Then t$ = Left$(t$, Len(t$) - 1)
    B256to128 = t$
End Function

Function B128to256$ (text$)
    l = 7 * Len(text$)
    Dim A(1 To l) As _Unsigned _Bit, b As _Unsigned _Byte
    'convert the text to the 8 bit array
    For i = 1 To Len(text$)
        b = Asc(text$, i) - 45
        For j = 0 To 6
            p = p + 1: If p > l Then Exit For
            If b And (2 ^ j) Then A(p) = 1 Else A(p) = 0
        Next
    Next
    'convert the array to 8bit strings
    p = 0
    For i = 1 To l Step 8
        b = 0
        For j = 0 To 7
            p = p + 1
            If p > l Then Exit For
            If A(p) Then b = b + (2 ^ j)
        Next
        t$ = t$ + Chr$(b)
    Next
    B128to256 = t$
End Function

Find my programs here in Dav's QB64 Corner
Reply
#9
It's an extra 0 at the end from padding sometimes.  I've corrected it already in the version under my name in Prolific Programmers.  

Think of it as the following 14 bites (two 7-bit "words"):

1111111 1000000....

Now, you can see where we'd make two 8-bit characters with those:

11111111 000000..

It's extra padding left over, giving us a stray 0 at the end, which I didn't notice and ignore in my math above.  Wink
Reply
#10
Thanks, Steve!  I grabbed the newer ones you posted in your code area.  Been testing it.  The last byte is still decodes odd sometimes.  I've narrowed it down to higher ascii input.  Seems whenever the input data to be encoded contains ascii values of 128 or higher then the last byte of the decoded output will either be missing (only missing when 128 is present) or it's changed from the original (when 129 to 255 values).  Everything is decoded perfect if input data is 0-127 only. 

- Dav

Find my programs here in Dav's QB64 Corner
Reply




Users browsing this thread: 7 Guest(s)