Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
A hybrid pattern-based data compressor
#1
This is made as a learning experience, and not to try and replace DEFLATE or a real compressor.  It's a hybrid data compressor using pattern recognition in a data window size, sort of like LZ77 does, and VLQ encoding like base-128 uses.  It searches for patterns in the data and encodes those as references, compressing data.   It's better than RLE, but not that much.  Was a fun code to work on.  More info in the code.

- Dav

Code: (Select All)
'=========
'PCOMP.BAS
'=========
'Simple Pattern-Based data Compressor with VLQ encoding.
'Coded by Dav for QB64-PE, DEC/2025

'This is a hybrid pattern-based data compressor. It searches data for
'repeated patterns within a specified window size, similar to how LZ77
'does, but without maintaining a sliding window or dynamic dictionary.
'Instead, it performs pattern matching within a fixed window size.

'Pattern references (distance and length) are encoded using VLQ, using
'the most significant bit (MSB) as a flag. If MSB=1, then more bytes
'follow. If MSB=0, this is the last byte. This method is like base-128
'encoding and is also the method used in MIDI files.

'Unlike RLE, which detects only consecutive runs, this compressor can
'identify non-consecutive repeats, matching patterns that appeared
'earlier in the data.

'Note: This compressor works well with patterned data but is not good
'for compressing random data.

'=====================================================================

Screen 12: _FullScreen

'make string of data for testing
For o = 1 To 12
    orig$ = orig$ + _Trim$(Str$(Timer))
    orig$ = orig$ + Time$ + Date$
Next

cdat$ = Compress$(orig$)
ddat$ = Decompress$(cdat$)

_ControlChr Off

Print "==============================================================================="
Print "ORIGINAL STRING      Size ="; Len(orig$); "bytes"
Print "==============="
Print orig$
Print
Print "==============================================================================="
Print "COMPRESS STRING      Size ="; Len(cdat$); "bytes"
Print "==============="
Print cdat$
Print
Print "==============================================================================="
Print "DECOMPRESSED STRING  Size ="; Len(ddat$); "bytes"
Print "==================="
Print ddat$
Print
Print "==============================================================================="
Print
If orig$ = ddat$ Then
    Print "MATCH! Original and decompressed are the same!"
    If Len(cdat$) >= Len(orig$) Then
        Print " >> Sorry, this data could not be compressed!"
    Else
        Print "Ratio ------>: Compressed to"; _Round(Len(cdat$) / Len(orig$) * 100); "% of original!"
    End If
    Print
Else
    Print "ERROR! Original and decompressed don't match!"
End If
Print "============================================="

Function Compress$ (in$)
    p = 1
    While p <= Len(in$)
        matchlen = 0: dis = 0
        For f = _Max(1, p - 2048) To p - 1
            lenmatch = 0
            While (p + lenmatch <= Len(in$)) And (f + lenmatch <= Len(in$)) And (Mid$(in$, f + lenmatch, 1) = Mid$(in$, p + lenmatch, 1))
                lenmatch = lenmatch + 1: If lenmatch >= 64 Then Exit While
            Wend
            If lenmatch > matchlen Then
                matchlen = lenmatch
                dis = p - f
            End If
        Next
        If matchlen >= 4 Then
            out$ = out$ + Chr$(255)
            v = dis: d$ = ""
            Do
                bval = v Mod 128: v = v \ 128
                If v > 0 Then bval = bval Or 128
                d$ = d$ + Chr$(bval)
            Loop While v > 0
            out$ = out$ + d$
            v = matchlen: b$ = ""
            Do
                bval = v Mod 128: v = v \ 128
                If v > 0 Then bval = bval Or 128
                b$ = b$ + Chr$(bval)
            Loop While v > 0
            out$ = out$ + b$
            p = p + matchlen
        Else
            c$ = Mid$(in$, p, 1)
            bval = Asc(c$)
            If bval = 255 Then
                out$ = out$ + Chr$(0) + Chr$(2)
            ElseIf bval = 0 Then
                out$ = out$ + Chr$(0) + Chr$(1)
            Else
                out$ = out$ + c$
            End If
            p = p + 1
        End If
    Wend
    Compress$ = out$
End Function

Function Decompress$ (in$)
    p = 1
    While p <= Len(in$)
        v = Asc(Mid$(in$, p, 1))
        If v = 255 Then
            p = p + 1: vlq = 0: b = 0
            Do
                bVal = Asc(Mid$(in$, p, 1)): p = p + 1
                vlq = vlq + ((bVal And 127) * (2 ^ b))
                b = b + 7
            Loop While (bVal And 128) <> 0
            d = vlq: vlq = 0: b = 0
            Do
                bVal = Asc(Mid$(in$, p, 1)): p = p + 1
                vlq = vlq + ((bVal And 127) * (2 ^ b))
                b = b + 7
            Loop While (bVal And 128) <> 0
            l = vlq
            For k = 1 To l
                If Len(out$) - d + 1 < 1 Then Exit For
                out$ = out$ + Mid$(out$, Len(out$) - d + 1, 1)
            Next
        ElseIf v = 0 Then
            bVal = Asc(Mid$(in$, p + 1, 1))
            If bVal = 1 Then
                out$ = out$ + Chr$(0)
            ElseIf bVal = 2 Then
                out$ = out$ + Chr$(255)
            End If
            p = p + 2
        Else
            out$ = out$ + Chr$(v)
            p = p + 1
        End If
    Wend
    Decompress$ = out$
End Function

Find my programs here in Dav's QB64 Corner
Reply
#2
Very impressive - especially to someone whos nearest thing to data compression is using LOL and ASAP in my E-mails!   Big Grin Nice work.
Of all the places on Earth, and all the planets in the Universe, I'd rather live here (Perth, Western Australia.) Big Grin
Please visit my Website at: http://oldendayskids.blogspot.com/
Reply
#3
(12-24-2025, 05:28 AM)PhilOfPerth Wrote: Very impressive - especially to someone whos nearest thing to data compression is using LOL and ASAP in my E-mails!   Big Grin Nice work.

lol.  Thanks, Phil.

Tested it by compressing the qb64pe.bas source and compared to _DEFLATE$.  Got the following results:

original size: 1,084,708 bytes
using PCOM:  383,271 bytes
_DEFLATE$:  176,435 bytes

Not too shabby, but PCOM took a LONG time to compress the file.  I'll work on it and try to improve compression, or at least speed it up.  Maybe replace the string concatenating building with the MID$ swapping method instead that @RhoSigma suggested on my base-85 encoder - that speed up the encoding/decoding dramatically.

- Dav

Find my programs here in Dav's QB64 Corner
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
Information Top4 Data Compression (compressor and decompressor included) JamesAlexander 7 2,197 09-22-2025, 12:22 PM
Last Post: Dragoncat
  Alt-Keys pattern eoredson 3 1,104 07-12-2023, 03:42 AM
Last Post: eoredson
  A single line function to modify MOD for better pattern recognition.... Pete 4 1,166 11-29-2022, 12:53 AM
Last Post: Pete

Forum Jump:


Users browsing this thread: 1 Guest(s)