🧬 DNA Sequencing Compression
A DNA sequencing machine represents genetic markers as digits.
Given a DNA marker sequence, the compression system summarizes it by reading consecutive identical markers from left to right.
For each group:
count + marker
For example:
111221
is compressed as:
- three 1s -> 31
- two 2s -> 22
- one 1 -> 11
So the compressed sequence becomes:
312211
Now suppose the first DNA sequence is:
1
After each compression cycle, the output becomes the input for the next cycle.
Given an integer n, return the DNA sequence after n compression cycles.
Example
Input:
n = 5
Output:
111221
Explanation:
Cycle 1: 1 Cycle 2: 11 Cycle 3: 21 Cycle 4: 1211 Cycle 5: 111221
🔧 Solution
The main idea is to scan the current DNA sequence and group consecutive identical markers.
For example:
111221
We scan it as:
111 | 22 | 1
Then we describe each group:
- 111 -> 31
- 22 -> 22
- 1 -> 11
So the next sequence is:
312211
This is just run-length encoding applied repeatedly.
</> Python Implementation
class Solution: def countAndSay(self, n: int) -> str: sequence = "1" for _ in range(n - 1): sequence = self.compress(sequence) return sequence def compress(self, sequence: str) -> str: # Implementation similar to encoder of RLE algorithm. result = [] i = 0 while i < len(sequence): j = i while j < len(sequence) and sequence[j] == sequence[i]: j += 1 count = j - i marker = sequence[i] result.append(str(count)) result.append(marker) i = j return "".join(result)