Table of contents
Open Table of contents
Problem
Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.
Example 1:
Input: dummy_input = ["Hello","World"]
Output: ["Hello","World"]
Example 2:
Input: dummy_input = [""]
Output: [""]
Constraints:
1 <= strs.length <= 200
0 <= strs[i].length <= 200
strs[i]
contains any possible characters out of 256 valid ASCII characters.
Follow up: Could you write a generalized algorithm to work on any possible set of characters?
Explanation/Approach
The challenge is to encode and decode strings in a way that we can uniquely identify each string after transmission. To handle any possible set of characters, including delimiters, we can use a length-prefixed encoding.
Length-Prefixed Encoding:
- Encoding: For each string, encode it as
<length>#<string>
. For example,"Hello"
becomes"5#Hello"
. - Decoding: Scan the encoded string for
#
, extract the length, and then read the nextlength
characters as a string. Repeat until the end of the encoded string.
This approach handles any set of characters, including those that could be confused with delimiters in other encoding schemes.
Solution: Length-Prefixed Encoding
class Codec:
def encode(self, strs):
"""Encodes a list of strings to a single string."""
encoded = []
for s in strs:
encoded.append(str(len(s)) + "#" + s)
return ''.join(encoded)
def decode(self, s):
"""Decodes a single string to a list of strings."""
strs = []
i = 0
while i < len(s):
j = i
while s[j] != '#':
j += 1
length = int(s[i:j])
strs.append(s[j + 1: j + 1 + length])
i = j + 1 + length
return strs
Time and Memory Complexity
The time complexity for both encoding and decoding is O(n), where n is the total number of characters in all strings. The space complexity is also O(n) as we need to store the encoded string or the list of strings.
Explanation
This solution uses length-prefixing to encode each string, ensuring that even strings containing the delimiter (#
) can be correctly encoded and decoded. During decoding, the algorithm locates each #
, determines the length of the next string, and extracts it accordingly. This method is robust for any set of ASCII characters.
Visualization Process:
Encoding:
Input: ["Hello", "World"]
Encoded: "5#Hello5#World"
Decoding:
Encoded: "5#Hello5#World"
Output: ["Hello", "World"]
Each string is prefixed with its length and a #
delimiter. During decoding, these prefixes are used to accurately extract each original string, ensuring that the decoded list of strings matches the original list.