Run Length Encoding (RLE)
Run-Length Encoding (RLE) is a simple and highly efficient lossless data compression technique. It works by replacing consecutive, repeating characters (or "runs") with a single character and a count of how many times it repeats. For example, the string AAABBBBBBBAAA transforms into 3A7B3A.
Why Use RLE?
RLE is incredibly useful for data that contains long sequences of repeating elements. While it isn't great for standard text documents (where letters rarely repeat in long chains), it shines in:
-
Image Compression: Especially for simple graphics, icons, or black-and-white line drawings where the same pixel color extends across a large area.
-
File Formats: RLE is the foundation for encoding in legacy formats like the TGA image file and plays a role in scientific data like computer vision segmentation masks.
Because RLE is lossless, no data is lost during the compression process; the original data can be perfectly reconstructed when decompressed.
Implementation
Let’s look at how to build a basic RLE encoder and decoder from scratch in Python.
1. The Encoding Function
Our encoder will iterate through a string, keep track of the current character, and count its consecutive occurrences. When the character changes, it appends the count and the character to our output list.
def rle_encode(data): if not data: return "" encoded_chars = [] count = 1 # Iterate starting from the second character for i in range(1, len(data)): if data[i] == data[i - 1]: count += 1 else: encoded_chars.append(f"{count}{data[i - 1]}") count = 1 # Don't forget to append the final run encoded_chars.append(f"{count}{data[-1]}") return "".join(encoded_chars) # Example Usage: original_text = "WWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWW" print("Original:", original_text) print("Encoded: ", rle_encode(original_text))
2. The Decoding Function
To reverse the process, the decoder reads the encoded string in pairs: a number (the count) followed by a character. It then repeats that character according to the specified count.
import re def rle_decode(encoded_data): # Split the string into pairs of digits and characters using regex pattern = r'(\d+)([A-Za-z])' pairs = re.findall(pattern, encoded_data) decoded_chars = [] for count, char in pairs: decoded_chars.append(char * int(count)) return "".join(decoded_chars) # Example Usage: encoded_text = "10W1B12W3B8W" print("Encoded: ", encoded_text) print("Decoded: ", rle_decode(encoded_text))