Understanding Huffman Coding

Lima Jmr
4 min readApr 9, 2022

What is Huffman coding in algorithms? Small guide to the coding algorithm

This algorithm, developed by David Huffman, an American pioneer in the field of Computer Science, has been a go-to for lossless data encoding by many developers and researchers alike since 1951. It was developed by David Huffman, an American pioneer in the field of Computer Science, especially known for his work in the topic we’re discussing in the blog. He was a big contributor in the field of mathematical origami as well.

Coders must solve algorithms quickly and effectively, which is where Huffman coding comes in, allowing encoders and developers working on the algorithm to work more efficiently. And in this post, you’ll discover how to use this strategy to solve algorithms, as well as where and when to use it.

First things first, what is Huffman coding? —

With variable-duration codes, Huffman coding is a well-known technique for reducing records. The method generates a hard and fast of variable-duration codewords with the least common duration and assigns them to the symbols given a hard and fast of records characters and their frequencies of occurrence. Huffman coding is the foundation for a number of programs that are used on well-known platforms. Some algorithms use the Huffman algorithm alone, while others use it as part of a multi — step compression process. The Huffman technique is similar to the Shannon — Fano coding technique, which was proposed independently by Claude Shannon and Robert Fano in the late 1940s. It generally provides higher codes, and it produces them in the very same way that the Shannon-Fano algorithm operates. It generates excellent variable-duration codes, with the symbols’ odds being negative powers of the number 2. The primary distinction between the two systems is that Shannon — Fano builds its codes from top to bottom (and the bits of each codeword are created from left to right), whereas Huffman builds a code tree from the bottom up (and the bits of every codeword are produced from the right to the left).

How is it used?

Let’s take the example of a question that needs to be solved to get the average code length. Because Huffman Coding algorithm uses the concept of a prefix code to eliminate any ambiguity in the decoding process, it firstly has to form a tree using the character’s frequencies before it starts generating the possible code for each character.

To implement and apply this solving algorithm, we have to use the following steps:-

  1. Start the program or the code(algorithm) and go through what the values of the variables and their respective frequencies are. Then there is a summation of the frequencies at the end.
  2. Secondly, we have to arrange the source symbols in descending order of their values or probabilities in the form of a tree.
  3. Merge the two of the lowest frequencies and then put the symbols into one subgroup.
  4. We now have to assign the values of zero’s and one’s to the top and bottom branches of the frequencies of the tree figure.
  5. Then we check if there is more than one node that is not unmerged and if there is, then we go back to the beginning of step 2 and repeat it until we get the results we desire. If there isn’t then we proceed into the final step.
  6. Lastly, we stop the coding then read the transition bits on the branches starting from top to bottom to generate the code words. Then we average the total sum of the values hence resulting in the answer that we desire.

Let’s take the help of an example to clarify things further —

Example:- Q. Say a text string of 47 letters contains 10 a’s , 20 b’s, 2 c’s and 15 d’s. Using Huffman encoding schema, we have to compute the code for the letters a, b, c, and d respectively. After that, we have to clearly mention the final hub of man tree along with the average code length.

fig. answer sheet.

The figure given above is the solution sheet for the question that was used as an example. As we can see the question given above follows the algorithm steps as was mentioned above and the question asked to find the average of the string, which the figure shows as the value of Average= 1.829.

This is only a small example of how the coding works and there is a wide variety of how it is implemented in several algorithm solving methods, just that there is a need for practice to correctly and efficiently use it.

When to use this coding technique-

Huffman’s coding can be used on the algorithms that require a faster and greedier approach(problems that require Greedy solution)to acquiring the results in a fast pace. It is also used in traditional compression formats such as PKZIP and other known formats. The only possible case for Huffman encoding can happen in a bad manner is when the probability of the most likely symbol exceeds the value of 2^-1 = 0.5 by far, hence making the upper limit of inefficiency unbounded, thus resulting in a very unbalanced output which disrupts the entire code. Other than this, the technique is vastly used by many and is the medium used by many to solve algorithms.

--

--