Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Problem with creating a Huffman code tree
#1
Hello,

I started to write a code to read, count and sort a byte array.

Now I want to take the next step, but I don't know how to start exactly by creating a Huffman code tree with QB64 syntax.

I don't want a C or C++ solution, I want a QB64 solution for it.

Here is my current code:

In the 'test.txt' stand an example like 'aaavvrijgtmmspoe'
The file input can be anything. So all bytes should be considered from 0 to 255.
Code: (Select All)
'Huffman Encoding

TYPE assignment
  CHAR AS _UNSIGNED _BYTE '<-- ASCII Character
  COUNT AS _UNSIGNED LONG '<-- Frequenzy of ASCII Chars (Counter)
END TYPE

DIM File AS STRING

File = "test.txt"

OPEN File FOR BINARY ACCESS READ AS #1
REDIM MEM(LOF(1) - 1) AS _UNSIGNED _BYTE
GET #1, , MEM()
CLOSE #1

' Step 1 - Calc ASCII Char Frequenzy
REDIM Table(0) AS assignment
CALC_Table Table(), MEM()

COLOR 11: PRINT " STEP 1 *** Calc ASCII Frequenzy ***"
COLOR 7
FOR i = 0 TO UBOUND(Table)
  PRINT Table(i).CHAR; " - "; Table(i).COUNT
NEXT i

OPEN "test_TABLE.txt" FOR OUTPUT AS #1
FOR i = 0 TO UBOUND(table)
  PRINT #1, HEX$(Table(i).CHAR) + " - " + LTRIM$(STR$((Table(i).COUNT)))
NEXT i
CLOSE #1

'SLEEP

' Step 2 - Huffman Tree create



SUB InsertElement (Array() AS assignment, Index AS _UNSIGNED LONG)
  DIM I AS _UNSIGNED LONG
  DIM Empty AS assignment

  IF Index > (UBOUND(Array) + 1) THEN EXIT SUB

  REDIM _PRESERVE Array(UBOUND(Array) + 1) AS assignment

  FOR I = UBOUND(Array) - 1 TO Index STEP -1
    Array(I + 1) = Array(I)
  NEXT I

  Array(Index) = Empty
END SUB

SUB RemoveElement (Array() AS assignment, Index AS _UNSIGNED LONG)
  DIM I AS _UNSIGNED LONG

  FOR I = Index TO UBOUND(Array) - 1
    Array(I) = Array(I + 1)
  NEXT I

  REDIM _PRESERVE Array(UBOUND(Array) - 1) AS assignment
END SUB

SUB CALC_Table (Table() AS assignment, Array() AS _UNSIGNED _BYTE)
  ' Step 1 - Calc ASCII Char Frequenzy
  DIM i AS _UNSIGNED LONG ' <- Counter for Array
  DIM r AS _UNSIGNED LONG ' <- Counter for Table
  DIM TableIDX AS _UNSIGNED LONG ' <- MAX Index for Table
  DIM NewEntry AS _UNSIGNED _BYTE ' <- becomes 1 if character is missing from table

  Table(TableIDX).CHAR = Array(i)
  FOR i = 0 TO UBOUND(Array)
    FOR r = 0 TO UBOUND(Table)

      ' If the character is already in the table,
      ' then increase the number of characters by 1,
      ' otherwise create a new entry.      '
      IF Array(i) = Table(r).CHAR THEN
        Table(r).COUNT = Table(r).COUNT + 1
        NewEntry = 0
        EXIT FOR
      ELSE
        NewEntry = 1
      END IF
    NEXT r

    ' New Entry in Table
    IF NewEntry = 1 THEN
      TableIDX = TableIDX + 1
      REDIM _PRESERVE Table(TableIDX) AS assignment
      Table(TableIDX).CHAR = Array(i)
      Table(TableIDX).COUNT = 1
    END IF
  NEXT i

  ' Sort table by counter of characters
  QUICKSORT Table(), LBOUND(Table), UBOUND(Table), 1
END SUB

SUB QUICKSORT (Array() AS assignment, LB AS _UNSIGNED LONG, UB AS _UNSIGNED LONG, Mode AS _UNSIGNED _BYTE)
  DIM P1 AS _UNSIGNED LONG
  DIM P2 AS _UNSIGNED LONG
  DIM REF AS assignment
  DIM temp AS assignment

  P1 = LB
  P2 = UB
  REF.CHAR = Array((P1 + P2) \ 2).CHAR
  REF.COUNT = Array((P1 + P2) \ 2).COUNT

  DO

    SELECT CASE Mode
      CASE 0:
        DO WHILE Array(P1).CHAR < REF.CHAR
          P1 = P1 + 1
        LOOP

        DO WHILE Array(P2).CHAR > REF.CHAR
          P2 = P2 - 1
        LOOP
      CASE 1:
        DO WHILE Array(P1).COUNT < REF.COUNT
          P1 = P1 + 1
        LOOP

        DO WHILE Array(P2).COUNT > REF.COUNT
          P2 = P2 - 1
        LOOP
    END SELECT

    IF P1 <= P2 THEN
      temp = Array(P1)
      Array(P1) = Array(P2)
      Array(P2) = temp

      P1 = P1 + 1
      P2 = P2 - 1
    END IF

  LOOP WHILE P1 <= P2

  IF LB < P2 THEN CALL QUICKSORT(Array(), LB, P2, Mode)
  IF P1 < UB THEN CALL QUICKSORT(Array(), P1, UB, Mode)
END SUB
Reply
#2
I haven't closely read all of your code, but it looks like you have the right idea for creating the table of character frequencies. For creating the actual Huffman tree you can just make an array of nodes, and in place of where you'd normally use pointers you just store an index into the array of nodes. Once you do that it's reasonably easy to use the typical algorithm to make the huffman tree. So it would work something like this (I didn't test this):

Code: (Select All)
' I think this is all you need
Type HuffmanNode
  Char As _Unsigned _Byte
  Count As _Unsigned Long
  Left As Long ' 0-bit path
  Right As Long ' 1-bit path
End Type

ReDim tree(20) As HuffmanNode ' I just picked a random size, in practice you'll resize this as you create more nodes

tree(2).Char = ASC("A") ' Leaf node
tree(2).Count = 10
tree(2).Left = -1 ' -1 for the indexes indicates a leaf node
tree(2).Right = -1

tree(3).Char = ASC("B")
tree(3).Count = 15
tree(3).Left = -1
tree(3).Right = -1

' This is an internal node created when creating the huffman tree
tree(1).Char = 0 ' doesn't actually have a char
tree(1).Count = 25 ' Addition of the counts of left/right
tree(1).Left = 2 ' The "A" leaf node
tree(1).Right = 3 ' The "B" leaf node

' When traversing the tree, you index the array with the .Left and .Right values
Print "0-bit path of internal node 1: "; tree(tree(1).Left)
Print "1-bit path of internal node 1: "; tree(tree(1).Right)

When creating the huffman tree you'll initialize that `tree()` array with leaf nodes created from your existing `Table()` array (and really, you could probably combine them into one array with a little work).

One catch is that you can't sort that array to find the next two nodes to handle, because sorting will change the position of the nodes in the array and make all your `.Left` and `.Right` entries wrong. To fix that you can just use another array of indexes and sort that (which is what you'd typically do in C anyway), like this:

Code: (Select All)
ReDim HuffmanQueue(256) As Long

' Assuming the starting leafnodes are entries 1 to 256 in tree()
For i = 1 To 256
  HuffmanQueue(i) = i
Next

' When sorting you do this kind of thing
If tree(HuffmanQueue(x)).Count > tree(HuffmanQueue(y)).Count Then
  ' Swapping two entries in the queue
  z = HuffmanQueue(y)
  HuffmanQueue(y) = HuffmanQueue(x)
  HuffmanQueue(x) = z
End If

' Create a new tree() node and assign it into HuffmanQueue()

tree(nextNode).Left = HuffmanQueue(1)
tree(nextNode).Right = HuffmanQueue(2)
tree(nextNode).Count = tree(HuffmanQueue(1)).Count + tree(HuffmanQueue(2)).Count

HuffmanQueue(1) = nextNode
HuffmanQueue(2) = -1
nextNode = nextNode + 1

' The above requires your sort to ignore the -1 entries.
' You could also just shift all the entries in the HuffmanQueue() array up one so that the HuffmanQueue(2) entry is gone and the array is one less in size.

To actually build the huffman tree you will sort the `HuffmanQueue()` array, then take the first two entries and build a new `HuffmanNode` that has those two entries as its `.Left` and `.Right`. Then you replace one of the `HuffmanQueue()` entries with your newly created node, remove the other one (Maybe set it to `-1`, or shift everything over and reduce the size of the array), and then sort and remove again until only one node is left in `HuffmanQueue()` (either due to reduced size, or all of them are `-1`).
Reply
#3
Hi @SagaraS Welcome to the forum!

What's a Hoffman Tree and why in the world would anyone want one? 

Asking such questions is how I got into Pathfinder and an alternate Paint3 code. You never know what rabbit hole leads to buried treasure Smile
b = b + ...
Reply
#4
(07-02-2023, 12:01 PM)bplus Wrote: What's a Hoffman Tree and why in the world would anyone want one? 

Asking such questions is how I got into Pathfinder and an alternate Paint3 code. You never know what rabbit hole leads to buried treasure Smile
A Huffman tree is a data structure that maps bit sequences to bytes/characters. Each node in the tree has a left and right node, representing either a 1 or a 0. If you follow a particular left/right path through the tree, you eventually end up at a leaf node representing an actual byte or character (Ex. 'A'). The particular left/right path then tells you the bit pattern mapping to that character.

In this case it's being used to do compression. SagaraS is creating the huffman tree via analyzing the frequency of each character in the text he is compressing, and then the huffman tree is built such that higher frequency characters have shorter bit sequences (and lower frequency characters have longer ones). You then replace each character in the original text with the bit sequence it maps to in the huffman tree, which results in less bits than the original text. Someone else can then decompress the bits into the original text via using the huffman tree to read each bit and follow the tree until they find a leaf node. Then they output the character at the leaf node and start over at the next bit.
Reply
#5
DSMan195276
that's an excellent explanation, brief and eloquent Smile
Reply
#6
Did you guys know that morse code is a huffman/binary tree?  Absolutely blew my mind when I learned that fact a few years back!

[Image: morse-code-binary-tree-bg.png]
Reply
#7
(07-02-2023, 03:34 PM)Jack Wrote: DSMan195276
that's an excellent explanation, brief and eloquent Smile

Well sure Matt knows but describes well too!

I was hoping our new guy might respond and give us more a glimpse of who he is. I mean I could have looked it up on Wiki. Smile but applications of algos are very interesting.

BTW is all that fooling around really necessary in this day and age, I know back in DOS days space was precious but these days space is cheap and speed is king!
b = b + ...
Reply
#8
(07-02-2023, 02:12 AM)DSMan195276 Wrote: I haven't closely read all of your code, but it looks like you have the right idea for creating the table of character frequencies. For creating the actual Huffman tree you can just make an array of nodes, and in place of where you'd normally use pointers you just store an index into the array of nodes. Once you do that it's reasonably easy to use the typical algorithm to make the huffman tree. ...
(07-02-2023, 12:01 PM)bplus Wrote: Hi @SagaraS Welcome to the forum!

What's a Hoffman Tree and why in the world would anyone want one? 

Asking such questions is how I got into Pathfinder and an alternate Paint3 code. You never know what rabbit hole leads to buried treasure Smile
Sorry for my late reply.

How the tree is built and how it works on a piece of paper is simple.
But implementing the whole thing is harder for me.
Especially these recursive matters.

After all, the whole thing should automatically generate the code for the respective ASCII characters at the end.
Had found many examples on the net, but didn't really know where to start.

@'DSMan1s276' Thanks for your post with the sample code. Now I have a plan how I can implement it for my purposes. ^^

@'bplus'
On the one hand, I need this for my own interest. Would like to learn that. On the other hand, I would like to use it to reduce the size of small tables or small images.

I have already managed to get a working RLE compression for images. Based on Targa images.

But if the entropy is too big, it doesn't do me much good. Therefore an additional Huffman Compression.

This means that not everyone can manipulate the data immediately.
And the data is smaller in size.
That's my goal at the moment.

Have another QB64 project where I would like to include this for the resources.

Here are some pictures of it:
[Image: pic171i5h.png][Image: pic2a0cs0.png][Image: pic3rgf2x.png][Image: pic4uefxo.png][Image: pic5q0ef9.png]

The helpfile on the last image is a B800 text format. 1 byte char + 1 byte for foreground and background color. So 2 bytes per character.
The helpfile exist 1x in german an 1x in english. The total size for the helpfile is 28 KB. This * 2 is 56 KB.

Then i have 3 images in the ressource file. About 212 KB.

mouse cursor + my own 8x8 font. About 2,2 KB.

I have brought all resources into an archive which is foot-controlled.
I can attach this archive to the EXE file.

Thus, the EXE file with the archive is standalone.

And I also like to apologize for my bad English. Actually, German is my mother tongue. ^^
Reply
#9
Welcome to the forums. You seem to have a nice project going.
Reply
#10
@ SagaraS One thing which you might want to keep in mind:  QB64PE has built-in compression routines already.   If you can simply use _INFLATE and _DEFLATE for your file size reduction, then you may not have to struggle to reinvent the wheel.  (Or in this case, huffman compression trees and routines.)
Reply




Users browsing this thread: 3 Guest(s)