Run Length Encoding (RLE)

Run-Length Encoding (RLE) is a simple and highly efficient lossless data compression technique. It works by replacing consecutive, repeating characters (or "runs") with a single character and a count of how many times it repeats. For example, the string AAABBBBBBBAAA transforms into 3A7B3A.

Why Use RLE?

RLE is incredibly useful for data that contains long sequences of repeating elements. While it isn't great for standard text documents (where letters rarely repeat in long chains), it shines in:

Image Compression: Especially for simple graphics, icons, or black-and-white line drawings where the same pixel color extends across a large area.
File Formats: RLE is the foundation for encoding in legacy formats like the TGA image file and plays a role in scientific data like computer vision segmentation masks.

Because RLE is lossless, no data is lost during the compression process; the original data can be perfectly reconstructed when decompressed.

Implementation

Let’s look at how to build a basic RLE encoder and decoder from scratch in Python.

1. The Encoding Function

Our encoder will iterate through a string, keep track of the current character, and count its consecutive occurrences. When the character changes, it appends the count and the character to our output list.

def rle_encode(data):
    if not data:
        return ""

    encoded_chars = []
    count = 1
    
    # Iterate starting from the second character
    for i in range(1, len(data)):
        if data[i] == data[i - 1]:
            count += 1
        else:
            encoded_chars.append(f"{count}{data[i - 1]}")
            count = 1
            
    # Don't forget to append the final run
    encoded_chars.append(f"{count}{data[-1]}")
    
    return "".join(encoded_chars)

# Example Usage:
original_text = "WWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWW"
print("Original:", original_text)
print("Encoded: ", rle_encode(original_text))

2. The Decoding Function

To reverse the process, the decoder reads the encoded string in pairs: a number (the count) followed by a character. It then repeats that character according to the specified count.

import re

def rle_decode(encoded_data):
    # Split the string into pairs of digits and characters using regex
    pattern = r'(\d+)([A-Za-z])'
    pairs = re.findall(pattern, encoded_data)
    
    decoded_chars = []
    for count, char in pairs:
        decoded_chars.append(char * int(count))
        
    return "".join(decoded_chars)

# Example Usage:
encoded_text = "10W1B12W3B8W"
print("Encoded: ", encoded_text)
print("Decoded: ", rle_decode(encoded_text))

Run-Length Encoding (RLE)

Run Length Encoding (RLE)

Why Use RLE?

Implementation

1. The Encoding Function

2. The Decoding Function

Suggested Problems

Tags