LZWL explained

LZWL is a syllable-based variant of the LZW (Lempel-Ziv-Welch) compression algorithm. It operates on syllables produced by any syllable decomposition algorithm, and the same scheme can process words as well as syllables.

Algorithm

The character-based algorithm starts by filling the dictionary with every character of the alphabet. At each step it finds the longest string S that is both in the dictionary and a prefix of the not yet encoded part of the input. It outputs the identifier of S and adds a new phrase to the dictionary, formed by S followed by the next character of the input. The input position then advances by the length of S. During decoding, the received phrase identifier may not exist in the dictionary yet. This happens only when the encoder used a phrase immediately after creating it, so the decoder can reconstruct the missing phrase by concatenating the previously decoded phrase with its own first character.
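The character-based steps can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names lzw_encode and lzw_decode and the alphabet parameter are assumptions made for the example, identifiers are kept as plain integers rather than packed into bits, and every input character is assumed to occur in an alphabet of distinct characters.

```python
def lzw_encode(text, alphabet):
    # Dictionary maps phrase -> identifier; it starts with every character.
    dictionary = {ch: i for i, ch in enumerate(alphabet)}
    codes = []
    pos = 0
    while pos < len(text):
        # Grow the match while the longer string is still a known phrase.
        end = pos + 1
        while end < len(text) and text[pos:end + 1] in dictionary:
            end += 1
        s = text[pos:end]  # longest phrase S that prefixes the input
        codes.append(dictionary[s])
        # New phrase: S followed by the next input character, if any.
        if end < len(text):
            dictionary[text[pos:end + 1]] = len(dictionary)
        pos = end  # advance by the length of S
    return codes


def lzw_decode(codes, alphabet):
    phrases = list(alphabet)  # identifier -> phrase
    if not codes:
        return ""
    previous = phrases[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code < len(phrases):
            current = phrases[code]
        else:
            # Identifier not in the dictionary yet: rebuild the phrase from
            # the previously decoded phrase plus its first character.
            current = previous + previous[0]
        # Mirror the encoder: previous phrase plus first character of current.
        phrases.append(previous + current[0])
        out.append(current)
        previous = current
    return "".join(out)


codes = lzw_encode("abababa", "ab")
print(codes)
print(lzw_decode(codes, "ab"))
```

The input "abababa" is chosen because its last identifier refers to a phrase the decoder has not stored yet, so the reconstruction branch in lzw_decode is exercised.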

Syllable-Based Adaptation

In its syllable-based adaptation, LZWL uses a list of syllables as its alphabet. Initialization adds the empty syllable and small, frequently occurring syllables to the dictionary. Finding S and encoding its identifier work as in the character-based algorithm, except that S is a string of syllables. If S is the empty syllable, the algorithm reads a syllable K from the input, encodes K with the method used for new syllables, adds K to the dictionary, and advances the input position by the length of K.
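A sketch of the initialization and of the empty-syllable escape, under several assumptions: the input is already decomposed into a Python list of syllables, phrases are represented as tuples of syllables, and a new syllable K is written to the output as a ("NEW", K) pair, which only stands in for whatever new-syllable coder is actually used. All function names are hypothetical, and phrase expansion is deliberately left out here.

```python
EMPTY = ()  # the empty syllable, represented as a phrase of zero syllables


def init_dictionary(frequent_syllables):
    # Identifier 0 is the empty syllable; frequent syllables follow it.
    dictionary = {EMPTY: 0}
    for syllable in frequent_syllables:
        dictionary[(syllable,)] = len(dictionary)
    return dictionary


def longest_phrase(syllables, pos, dictionary):
    # Longest dictionary phrase S prefixing syllables[pos:], or EMPTY if
    # even the first syllable is unknown.
    end = pos
    while end < len(syllables) and tuple(syllables[pos:end + 1]) in dictionary:
        end += 1
    return tuple(syllables[pos:end])


def encode_step(syllables, pos, dictionary, output):
    s = longest_phrase(syllables, pos, dictionary)
    output.append(dictionary[s])
    if s == EMPTY:
        # Unknown syllable K: emit it through the stand-in new-syllable
        # coder, then add it to the dictionary.
        k = syllables[pos]
        output.append(("NEW", k))
        dictionary[(k,)] = len(dictionary)
        return pos + 1
    return pos + len(s)


def encode(syllables, frequent_syllables):
    dictionary = init_dictionary(frequent_syllables)
    output, pos = [], 0
    while pos < len(syllables):
        pos = encode_step(syllables, pos, dictionary, output)
    return output


print(encode(["ba", "na", "na"], ["na"]))
```

Here "ba" is not among the preloaded syllables, so it is announced by identifier 0 (the empty syllable) followed by the literal, while each "na" is emitted as its preloaded identifier.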

Dictionary Expansion

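The syllable-based LZWL also differs in how it expands the dictionary. Let S1 denote the phrase found in the step after S. When both S and S1 are non-empty syllable strings, a new phrase is added to the dictionary by concatenating S with the first syllable of S1. Because the addition waits until S1 is known, the decoder has already decoded both phrases and can add the same entry at the same moment, so it never receives an undefined phrase identifier. And because a syllable seen for the first time is encoded through the empty syllable, no phrase is built from it, which avoids creating strings from syllables that appear only once.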
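The expansion rule can be illustrated with a small encoder and decoder pair. The sketch rests on one reading of the rule, namely that S comes first in the input and S1 follows it, so the new phrase is S plus the first syllable of S1. Further assumptions: the input is a list of syllables, the preloaded syllables are distinct, a new syllable is emitted as a raw literal right after identifier 0 in place of a real new-syllable coder, and a phrase that already exists is not added again. All names are hypothetical.

```python
EMPTY = ()  # the empty syllable


def new_tables(frequent_syllables):
    # phrases: identifier -> phrase, ids: phrase -> identifier.
    phrases = [EMPTY] + [(s,) for s in frequent_syllables]
    return phrases, {p: i for i, p in enumerate(phrases)}


def add(phrases, ids, phrase):
    if phrase not in ids:  # never assign a second identifier
        ids[phrase] = len(phrases)
        phrases.append(phrase)


def expand(phrases, ids, s, s1):
    # Expansion rule: only when both consecutive phrases are non-empty.
    if s and s1:
        add(phrases, ids, s + s1[:1])


def encode(syllables, frequent_syllables):
    phrases, ids = new_tables(frequent_syllables)
    out, pos, s = [], 0, EMPTY
    while pos < len(syllables):
        end = pos
        while end < len(syllables) and tuple(syllables[pos:end + 1]) in ids:
            end += 1
        s1 = tuple(syllables[pos:end])  # phrase found in this step
        out.append(ids[s1])
        if s1:
            expand(phrases, ids, s, s1)
            pos = end
        else:
            k = syllables[pos]  # new syllable, sent as a literal
            out.append(k)
            add(phrases, ids, (k,))
            pos += 1
        s = s1  # becomes S for the next step
    return out


def decode(codes, frequent_syllables):
    phrases, ids = new_tables(frequent_syllables)
    out, s = [], EMPTY
    stream = iter(codes)
    for code in stream:
        s1 = phrases[code]  # always defined, no special case needed
        if s1:
            out.extend(s1)
            expand(phrases, ids, s, s1)
        else:
            k = next(stream)  # literal that follows the empty syllable
            out.append(k)
            add(phrases, ids, (k,))
        s = s1
    return out


text = ["na", "na", "na", "na", "ba", "na", "na"]
codes = encode(text, ["na"])
print(codes)
print(decode(codes, ["na"]) == text)
```

In this sketch the decoder looks up every identifier directly, without the reconstruction branch that character-based LZW needs, and the first occurrence of "ba" does not take part in any new phrase because the step that encodes it has an empty S1.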
