Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
A hybrid pattern-based data compressor
#1
This is made as a learning experience, and not to try and replace DEFLATE or a real compressor.  It's a hybrid data compressor using pattern recognition in a data window size, sort of like LZ77 does, and VLQ encoding like base-128 uses.  It searches for patterns in the data and encodes those as references, compressing data.   It's better than RLE, but not that much.  Was a fun code to work on.  More info in the code.

- Dav

Code: (Select All)
'=========
'PCOMP.BAS
'=========
'Simple Pattern-Based data Compressor with VLQ encoding.
'Coded by Dav for QB64-PE, DEC/2025

'This is a hybrid pattern-based data compressor. It searches data for
'repeated patterns within a specified window size, similar to how LZ77
'does, but without maintaining a sliding window or dynamic dictionary.
'Instead, it performs pattern matching within a fixed window size.

'Pattern references (distance and length) are encoded using VLQ, using
'the most significant bit (MSB) as a flag. If MSB=1, then more bytes
'follow. If MSB=0, this is the last byte. This method is like base-128
'encoding and is also the method used in MIDI files.

'Unlike RLE, which detects only consecutive runs, this compressor can
'identify non-consecutive repeats, matching patterns that appeared
'earlier in the data.

'Note: This compressor works well with patterned data but is not good
'for compressing random data.

'=====================================================================

Screen 12: _FullScreen

'make string of data for testing
For o = 1 To 12
    orig$ = orig$ + _Trim$(Str$(Timer))
    orig$ = orig$ + Time$ + Date$
Next

cdat$ = Compress$(orig$)
ddat$ = Decompress$(cdat$)

_ControlChr Off

Print "==============================================================================="
Print "ORIGINAL STRING      Size ="; Len(orig$); "bytes"
Print "==============="
Print orig$
Print
Print "==============================================================================="
Print "COMPRESS STRING      Size ="; Len(cdat$); "bytes"
Print "==============="
Print cdat$
Print
Print "==============================================================================="
Print "DECOMPRESSED STRING  Size ="; Len(ddat$); "bytes"
Print "==================="
Print ddat$
Print
Print "==============================================================================="
Print
If orig$ = ddat$ Then
    Print "MATCH! Original and decompressed are the same!"
    If Len(cdat$) >= Len(orig$) Then
        Print " >> Sorry, this data could not be compressed!"
    Else
        Print "Ratio ------>: Compressed to"; _Round(Len(cdat$) / Len(orig$) * 100); "% of original!"
    End If
    Print
Else
    Print "ERROR! Original and decompressed don't match!"
End If
Print "============================================="

Function Compress$ (in$)
    p = 1
    While p <= Len(in$)
        matchlen = 0: dis = 0
        For f = _Max(1, p - 2048) To p - 1
            lenmatch = 0
            While (p + lenmatch <= Len(in$)) And (f + lenmatch <= Len(in$)) And (Mid$(in$, f + lenmatch, 1) = Mid$(in$, p + lenmatch, 1))
                lenmatch = lenmatch + 1: If lenmatch >= 64 Then Exit While
            Wend
            If lenmatch > matchlen Then
                matchlen = lenmatch
                dis = p - f
            End If
        Next
        If matchlen >= 4 Then
            out$ = out$ + Chr$(255)
            v = dis: d$ = ""
            Do
                bval = v Mod 128: v = v \ 128
                If v > 0 Then bval = bval Or 128
                d$ = d$ + Chr$(bval)
            Loop While v > 0
            out$ = out$ + d$
            v = matchlen: b$ = ""
            Do
                bval = v Mod 128: v = v \ 128
                If v > 0 Then bval = bval Or 128
                b$ = b$ + Chr$(bval)
            Loop While v > 0
            out$ = out$ + b$
            p = p + matchlen
        Else
            c$ = Mid$(in$, p, 1)
            bval = Asc(c$)
            If bval = 255 Then
                out$ = out$ + Chr$(0) + Chr$(2)
            ElseIf bval = 0 Then
                out$ = out$ + Chr$(0) + Chr$(1)
            Else
                out$ = out$ + c$
            End If
            p = p + 1
        End If
    Wend
    Compress$ = out$
End Function

Function Decompress$ (in$)
    p = 1
    While p <= Len(in$)
        v = Asc(Mid$(in$, p, 1))
        If v = 255 Then
            p = p + 1: vlq = 0: b = 0
            Do
                bVal = Asc(Mid$(in$, p, 1)): p = p + 1
                vlq = vlq + ((bVal And 127) * (2 ^ b))
                b = b + 7
            Loop While (bVal And 128) <> 0
            d = vlq: vlq = 0: b = 0
            Do
                bVal = Asc(Mid$(in$, p, 1)): p = p + 1
                vlq = vlq + ((bVal And 127) * (2 ^ b))
                b = b + 7
            Loop While (bVal And 128) <> 0
            l = vlq
            For k = 1 To l
                If Len(out$) - d + 1 < 1 Then Exit For
                out$ = out$ + Mid$(out$, Len(out$) - d + 1, 1)
            Next
        ElseIf v = 0 Then
            bVal = Asc(Mid$(in$, p + 1, 1))
            If bVal = 1 Then
                out$ = out$ + Chr$(0)
            ElseIf bVal = 2 Then
                out$ = out$ + Chr$(255)
            End If
            p = p + 2
        Else
            out$ = out$ + Chr$(v)
            p = p + 1
        End If
    Wend
    Decompress$ = out$
End Function

Find my programs here in Dav's QB64 Corner
Reply


Messages In This Thread
A hybrid pattern-based data compressor - by Dav - 12-23-2025, 03:51 PM

Possibly Related Threads…
Thread Author Replies Views Last Post
Information Top4 Data Compression (compressor and decompressor included) JamesAlexander 7 2,255 09-22-2025, 12:22 PM
Last Post: Dragoncat
  Alt-Keys pattern eoredson 3 1,122 07-12-2023, 03:42 AM
Last Post: eoredson
  A single line function to modify MOD for better pattern recognition.... Pete 4 1,200 11-29-2022, 12:53 AM
Last Post: Pete

Forum Jump:


Users browsing this thread: 1 Guest(s)