Login

SagaraS · 07-01-2023, 09:48 PM

Hello,

I started to write a code to read, count and sort a byte array.

Now I want to take the next step, but I don't know how to start exactly by creating a Huffman code tree with QB64 syntax.

I don't want a C or C++ solution, I want a QB64 solution for it.

Here is my current code:

In the 'test.txt' stand an example like 'aaavvrijgtmmspoe'
The file input can be anything. So all bytes should be considered from 0 to 255.

Code: (Select All)
'Huffman Encoding

TYPE assignment

  CHAR AS _UNSIGNED _BYTE '<-- ASCII Character

  COUNT AS _UNSIGNED LONG '<-- Frequenzy of ASCII Chars (Counter)

END TYPE

DIM File AS STRING

File = "test.txt"

OPEN File FOR BINARY ACCESS READ AS #1

REDIM MEM(LOF(1) - 1) AS _UNSIGNED _BYTE

GET #1, , MEM()

CLOSE #1

' Step 1 - Calc ASCII Char Frequenzy

REDIM Table(0) AS assignment

CALC_Table Table(), MEM()

COLOR 11: PRINT " STEP 1 *** Calc ASCII Frequenzy ***"

COLOR 7

FOR i = 0 TO UBOUND(Table)

  PRINT Table(i).CHAR; " - "; Table(i).COUNT

NEXT i

OPEN "test_TABLE.txt" FOR OUTPUT AS #1

FOR i = 0 TO UBOUND(table)

  PRINT #1, HEX$(Table(i).CHAR) + " - " + LTRIM$(STR$((Table(i).COUNT)))

NEXT i

CLOSE #1

'SLEEP

' Step 2 - Huffman Tree create

SUB InsertElement (Array() AS assignment, Index AS _UNSIGNED LONG)

  DIM I AS _UNSIGNED LONG

  DIM Empty AS assignment

  IF Index > (UBOUND(Array) + 1) THEN EXIT SUB

  REDIM _PRESERVE Array(UBOUND(Array) + 1) AS assignment

  FOR I = UBOUND(Array) - 1 TO Index STEP -1

    Array(I + 1) = Array(I)

  NEXT I

  Array(Index) = Empty

END SUB

SUB RemoveElement (Array() AS assignment, Index AS _UNSIGNED LONG)

  DIM I AS _UNSIGNED LONG

  FOR I = Index TO UBOUND(Array) - 1

    Array(I) = Array(I + 1)

  NEXT I

  REDIM _PRESERVE Array(UBOUND(Array) - 1) AS assignment

END SUB

SUB CALC_Table (Table() AS assignment, Array() AS _UNSIGNED _BYTE)

  ' Step 1 - Calc ASCII Char Frequenzy

  DIM i AS _UNSIGNED LONG ' <- Counter for Array

  DIM r AS _UNSIGNED LONG ' <- Counter for Table

  DIM TableIDX AS _UNSIGNED LONG ' <- MAX Index for Table

  DIM NewEntry AS _UNSIGNED _BYTE ' <- becomes 1 if character is missing from table

  Table(TableIDX).CHAR = Array(i)

  FOR i = 0 TO UBOUND(Array)

    FOR r = 0 TO UBOUND(Table)

      ' If the character is already in the table,

      ' then increase the number of characters by 1,

      ' otherwise create a new entry.      '

      IF Array(i) = Table(r).CHAR THEN

        Table(r).COUNT = Table(r).COUNT + 1

        NewEntry = 0

        EXIT FOR

      ELSE

        NewEntry = 1

      END IF

    NEXT r

    ' New Entry in Table

    IF NewEntry = 1 THEN

      TableIDX = TableIDX + 1

      REDIM _PRESERVE Table(TableIDX) AS assignment

      Table(TableIDX).CHAR = Array(i)

      Table(TableIDX).COUNT = 1

    END IF

  NEXT i

  ' Sort table by counter of characters

  QUICKSORT Table(), LBOUND(Table), UBOUND(Table), 1

END SUB

SUB QUICKSORT (Array() AS assignment, LB AS _UNSIGNED LONG, UB AS _UNSIGNED LONG, Mode AS _UNSIGNED _BYTE)

  DIM P1 AS _UNSIGNED LONG

  DIM P2 AS _UNSIGNED LONG

  DIM REF AS assignment

  DIM temp AS assignment

  P1 = LB

  P2 = UB

  REF.CHAR = Array((P1 + P2) \ 2).CHAR

  REF.COUNT = Array((P1 + P2) \ 2).COUNT

  DO

    SELECT CASE Mode

      CASE 0:

        DO WHILE Array(P1).CHAR < REF.CHAR

          P1 = P1 + 1

        LOOP

        DO WHILE Array(P2).CHAR > REF.CHAR

          P2 = P2 - 1

        LOOP

      CASE 1:

        DO WHILE Array(P1).COUNT < REF.COUNT

          P1 = P1 + 1

        LOOP

        DO WHILE Array(P2).COUNT > REF.COUNT

          P2 = P2 - 1

        LOOP

    END SELECT

    IF P1 <= P2 THEN

      temp = Array(P1)

      Array(P1) = Array(P2)

      Array(P2) = temp

      P1 = P1 + 1

      P2 = P2 - 1

    END IF

  LOOP WHILE P1 <= P2

  IF LB < P2 THEN CALL QUICKSORT(Array(), LB, P2, Mode)

  IF P1 < UB THEN CALL QUICKSORT(Array(), P1, UB, Mode)

END SUB

**DSMan195276** · 07-02-2023, 02:12 AM

I haven't closely read all of your code, but it looks like you have the right idea for creating the table of character frequencies. For creating the actual Huffman tree you can just make an array of nodes, and in place of where you'd normally use pointers you just store an index into the array of nodes. Once you do that it's reasonably easy to use the typical algorithm to make the huffman tree. So it would work something like this (I didn't test this):

Code: (Select All)

' I think this is all you need

Type HuffmanNode

  Char As _Unsigned _Byte

  Count As _Unsigned Long

  Left As Long ' 0-bit path

  Right As Long ' 1-bit path

End Type



ReDim tree(20) As HuffmanNode ' I just picked a random size, in practice you'll resize this as you create more nodes



tree(2).Char = ASC("A") ' Leaf node

tree(2).Count = 10

tree(2).Left = -1 ' -1 for the indexes indicates a leaf node

tree(2).Right = -1



tree(3).Char = ASC("B")

tree(3).Count = 15

tree(3).Left = -1

tree(3).Right = -1



' This is an internal node created when creating the huffman tree

tree(1).Char = 0 ' doesn't actually have a char

tree(1).Count = 25 ' Addition of the counts of left/right

tree(1).Left = 2 ' The "A" leaf node

tree(1).Right = 3 ' The "B" leaf node



' When traversing the tree, you index the array with the .Left and .Right values

Print "0-bit path of internal node 1: "; tree(tree(1).Left)

Print "1-bit path of internal node 1: "; tree(tree(1).Right)

When creating the huffman tree you'll initialize that `tree()` array with leaf nodes created from your existing `Table()` array (and really, you could probably combine them into one array with a little work).

One catch is that you can't sort that array to find the next two nodes to handle, because sorting will change the position of the nodes in the array and make all your `.Left` and `.Right` entries wrong. To fix that you can just use another array of indexes and sort that (which is what you'd typically do in C anyway), like this:

Code: (Select All)

ReDim HuffmanQueue(256) As Long



' Assuming the starting leafnodes are entries 1 to 256 in tree()

For i = 1 To 256

  HuffmanQueue(i) = i

Next



' When sorting you do this kind of thing

If tree(HuffmanQueue(x)).Count > tree(HuffmanQueue(y)).Count Then

  ' Swapping two entries in the queue

  z = HuffmanQueue(y)

  HuffmanQueue(y) = HuffmanQueue(x)

  HuffmanQueue(x) = z

End If



' Create a new tree() node and assign it into HuffmanQueue()



tree(nextNode).Left = HuffmanQueue(1)

tree(nextNode).Right = HuffmanQueue(2)

tree(nextNode).Count = tree(HuffmanQueue(1)).Count + tree(HuffmanQueue(2)).Count



HuffmanQueue(1) = nextNode

HuffmanQueue(2) = -1

nextNode = nextNode + 1



' The above requires your sort to ignore the -1 entries.

' You could also just shift all the entries in the HuffmanQueue() array up one so that the HuffmanQueue(2) entry is gone and the array is one less in size.

To actually build the huffman tree you will sort the `HuffmanQueue()` array, then take the first two entries and build a new `HuffmanNode` that has those two entries as its `.Left` and `.Right`. Then you replace one of the `HuffmanQueue()` entries with your newly created node, remove the other one (Maybe set it to `-1`, or shift everything over and reduce the size of the array), and then sort and remove again until only one node is left in `HuffmanQueue()` (either due to reduced size, or all of them are `-1`).

bplus · (This post was last modified: 07-02-2023, 12:05 PM by bplus.)

Hi @SagaraS Welcome to the forum!

What's a Hoffman Tree and why in the world would anyone want one?

Asking such questions is how I got into Pathfinder and an alternate Paint3 code. You never know what rabbit hole leads to buried treasure Smile

**DSMan195276** · 07-02-2023, 03:16 PM

(07-02-2023, 12:01 PM)bplus Wrote: What's a Hoffman Tree and why in the world would anyone want one?

Asking such questions is how I got into Pathfinder and an alternate Paint3 code. You never know what rabbit hole leads to buried treasure

A Huffman tree is a data structure that maps bit sequences to bytes/characters. Each node in the tree has a left and right node, representing either a 1 or a 0. If you follow a particular left/right path through the tree, you eventually end up at a leaf node representing an actual byte or character (Ex. 'A'). The particular left/right path then tells you the bit pattern mapping to that character.

In this case it's being used to do compression. SagaraS is creating the huffman tree via analyzing the frequency of each character in the text he is compressing, and then the huffman tree is built such that higher frequency characters have shorter bit sequences (and lower frequency characters have longer ones). You then replace each character in the original text with the bit sequence it maps to in the huffman tree, which results in less bits than the original text. Someone else can then decompress the bits into the original text via using the huffman tree to read each bit and follow the tree until they find a leaf node. Then they output the character at the leaf node and start over at the next bit.

Jack · 07-02-2023, 03:34 PM

DSMan195276
that's an excellent explanation, brief and eloquent Smile

**SMcNeill** · 07-02-2023, 04:58 PM

Did you guys know that morse code is a huffman/binary tree? Absolutely blew my mind when I learned that fact a few years back!

[Image: morse-code-binary-tree-bg.png]

bplus · (This post was last modified: 07-02-2023, 05:06 PM by bplus.)

(07-02-2023, 03:34 PM)Jack Wrote: DSMan195276
that's an excellent explanation, brief and eloquent

Well sure Matt knows but describes well too!

I was hoping our new guy might respond and give us more a glimpse of who he is. I mean I could have looked it up on Wiki. Smile

but applications of algos are very interesting.

BTW is all that fooling around really necessary in this day and age, I know back in DOS days space was precious but these days space is cheap and speed is king!

SagaraS · 07-02-2023, 08:50 PM

(07-02-2023, 02:12 AM)DSMan195276 Wrote: I haven't closely read all of your code, but it looks like you have the right idea for creating the table of character frequencies. For creating the actual Huffman tree you can just make an array of nodes, and in place of where you'd normally use pointers you just store an index into the array of nodes. Once you do that it's reasonably easy to use the typical algorithm to make the huffman tree. ...

(07-02-2023, 12:01 PM)bplus Wrote: Hi @SagaraS Welcome to the forum!

What's a Hoffman Tree and why in the world would anyone want one?

Asking such questions is how I got into Pathfinder and an alternate Paint3 code. You never know what rabbit hole leads to buried treasure

Sorry for my late reply.

How the tree is built and how it works on a piece of paper is simple.
But implementing the whole thing is harder for me.
Especially these recursive matters.

After all, the whole thing should automatically generate the code for the respective ASCII characters at the end.
Had found many examples on the net, but didn't really know where to start.

@'DSMan1s276' Thanks for your post with the sample code. Now I have a plan how I can implement it for my purposes. ^^

@'bplus'
On the one hand, I need this for my own interest. Would like to learn that. On the other hand, I would like to use it to reduce the size of small tables or small images.

I have already managed to get a working RLE compression for images. Based on Targa images.

But if the entropy is too big, it doesn't do me much good. Therefore an additional Huffman Compression.

This means that not everyone can manipulate the data immediately.
And the data is smaller in size.
That's my goal at the moment.

Have another QB64 project where I would like to include this for the resources.

Here are some pictures of it:
[Image: pic171i5h.png]

The helpfile on the last image is a B800 text format. 1 byte char + 1 byte for foreground and background color. So 2 bytes per character.
The helpfile exist 1x in german an 1x in english. The total size for the helpfile is 28 KB. This * 2 is 56 KB.

Then i have 3 images in the ressource file. About 212 KB.

mouse cursor + my own 8x8 font. About 2,2 KB.

I have brought all resources into an archive which is foot-controlled.
I can attach this archive to the EXE file.

Thus, the EXE file with the archive is standalone.

And I also like to apologize for my bad English. Actually, German is my mother tongue. ^^

mnrvovrfc · 07-02-2023, 10:07 PM

Welcome to the forums. You seem to have a nice project going.

**SMcNeill** · 07-03-2023, 06:07 AM

@ SagaraS One thing which you might want to keep in mind: QB64PE has built-in compression routines already. If you can simply use _INFLATE and _DEFLATE for your file size reduction, then you may not have to struggle to reinvent the wheel. (Or in this case, huffman compression trees and routines.)

Login
Username/Email:
Password:	Lost Password?
	Remember me