Grammar-based codes or Grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed. Examples include universal lossless data compression algorithms.[1] To compress a data sequence , a grammar-based code transforms into a context-free grammar . The problem of finding a smallest grammar for an input sequence (smallest grammar problem) is known to be NP-hard,[2] so many grammar-transform algorithms are proposed from theoretical and practical viewpoints. Generally, the produced grammar is further compressed by statistical encoders like arithmetic coding.

Thumb
Straight-line grammar (with start symbol ß) for the second sentence of the United States Declaration of Independence. Each blue character denotes a nonterminal symbol; they were obtained from a gzip-compression of the sentence.

Examples and characteristics

The class of grammar-based codes is very broad. It includes block codes, the multilevel pattern matching (MPM) algorithm,[3] variations of the incremental parsing Lempel-Ziv code,[4] and many other new universal lossless compression algorithms. Grammar-based codes are universal in the sense that they can achieve asymptotically the entropy rate of any stationary, ergodic source with a finite alphabet.

Practical algorithms

The compression programs of the following are available from external links.

  • Sequitur[5] is a classical grammar compression algorithm that sequentially translates an input text into a CFG, and then the produced CFG is encoded by an arithmetic coder.
  • Re-Pair[6] is a greedy algorithm using the strategy of most-frequent-first substitution. The compressive performance is powerful, although the main memory space requirement is very large.
  • GLZA,[7] which constructs a grammar that may be reducible, i.e., contain repeats, where the entropy-coding cost of "spelling out" the repeats is less than the cost creating and entropy-coding a rule to capture them. (In general, the compression-optimal SLG is not irreducible, and the Smallest Grammar Problem is different from the actual SLG compression problem.)

See also

References

Wikiwand in your browser!

Seamless Wikipedia browsing. On steroids.

Every time you click a link to Wikipedia, Wiktionary or Wikiquote in your browser's search results, it will show the modern Wikiwand interface.

Wikiwand extension is a five stars, simple, with minimum permission required to keep your browsing private, safe and transparent.