Weissman score explained

The Weissman score is a performance metric for lossless compression applications. It was developed by Tsachy Weissman, a professor at Stanford University, and Vinith Misra, a graduate student, at the request of the producers of HBO's television series Silicon Valley, which follows a fictional tech start-up working on a data compression algorithm.[1] [2] [3] [4] It compares both the compression ratio and the compression time of the application being measured with those of a de facto standard compressor for the same type of data.

The formula is the following, where r is the compression ratio, T is the time required to compress, the overlined symbols are the same metrics for the standard compressor, and α is a scaling constant:[1]

$$W = \alpha \, \frac{r}{\bar{r}} \, \frac{\log \bar{T}}{\log T}$$
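The formula W = α (r / r̄)(log T̄ / log T) translates directly into a small function. The sketch below is a minimal Python version; the name `weissman_score`, its parameter names and the default α = 1 are illustrative assumptions, not part of any published implementation. Because log_b x / log_b y does not depend on the base b, the base of the logarithm does not affect the score; base 10 is used here to match the tables in this article.

```python
import math


def weissman_score(r, t, r_std, t_std, alpha=1.0):
    """Hypothetical helper: W = alpha * (r / r_std) * (log t_std / log t).

    r, t         -- compression ratio and compression time of the scored compressor
    r_std, t_std -- the same metrics for the standard compressor
    Both times must be given in the same unit.
    """
    return alpha * (r / r_std) * (math.log10(t_std) / math.log10(t))


# Standard: ratio 2.1 in 2 min; scored compressor: ratio 3.4 in 3 min.
# Prints about 1.0215; a value of 1.021506 results if the logarithms
# are first rounded to six digits.
print(round(weissman_score(3.4, 3, 2.1, 2), 6))
```

Scoring the standard compressor against itself always yields α, which is why the standard row of any table built this way reads 1 when α = 1.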

Daniel Reiter Horn and Mehant Baid of Dropbox have used the Weissman score to explain real-world work on lossless compression. According to the authors, it "favors compression speed over ratio in most cases."[5]

Example

This example shows scores for the Hutter Prize data,[6] using paq8f as the standard compressor and 1 as the scaling constant.

| Application | Compression ratio | Compression time [min] | Weissman score |
|---|---|---|---|
| paq8f | 5.467600 | 300 | 1.000000 |
| raq8g | 5.514990 | 420 | 0.720477 |
| paq8hkcc | 5.682593 | 300 | 1.039321 |
| paq8hp1 | 5.692566 | 300 | 1.041145 |
| paq8hp2 | 5.750279 | 300 | 1.051701 |
| paq8hp3 | 5.800033 | 300 | 1.060801 |
| paq8hp4 | 5.868829 | 300 | 1.073826 |
| paq8hp5 | 5.917719 | 300 | 1.082325 |
| paq8hp6 | 5.976643 | 300 | 1.093102 |
| paq8hp12 | 6.104276 | 540 | 0.620247 |
| decomp8 | 6.261574 | 540 | 0.636230 |
| decomp8 | 6.276295 | 540 | 0.637726 |
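The sketch below recomputes the Hutter Prize table from its ratio and time columns. Doing so suggests that the listed scores were obtained with the plain time ratio T̄ / T rather than with the ratio of logarithms log T̄ / log T: for raq8g, (5.514990 / 5.467600) × (300 / 420) ≈ 0.720477, matching the listed value, whereas the logarithmic formula gives about 0.952. One row does not reproduce under either reading: the listed ratio for paq8hp4 yields about 1.073383 rather than 1.073826. The data are copied from the table; the helper names `linear_score` and `log_score` are hypothetical.

```python
import math

# Rows copied from the table: (application, compression ratio, time in minutes, listed score)
ROWS = [
    ("paq8f", 5.467600, 300, 1.000000),
    ("raq8g", 5.514990, 420, 0.720477),
    ("paq8hkcc", 5.682593, 300, 1.039321),
    ("paq8hp1", 5.692566, 300, 1.041145),
    ("paq8hp2", 5.750279, 300, 1.051701),
    ("paq8hp3", 5.800033, 300, 1.060801),
    ("paq8hp4", 5.868829, 300, 1.073826),
    ("paq8hp5", 5.917719, 300, 1.082325),
    ("paq8hp6", 5.976643, 300, 1.093102),
    ("paq8hp12", 6.104276, 540, 0.620247),
    ("decomp8", 6.261574, 540, 0.636230),
    ("decomp8", 6.276295, 540, 0.637726),
]

R_STD, T_STD = 5.467600, 300  # paq8f is the standard compressor
ALPHA = 1.0                   # scaling constant used in the example


def linear_score(r, t):
    # Plain time ratio T_std / t: this reading reproduces the listed column.
    return ALPHA * (r / R_STD) * (T_STD / t)


def log_score(r, t):
    # Logarithmic formula: alpha * (r / r_std) * (log T_std / log t).
    return ALPHA * (r / R_STD) * (math.log(T_STD) / math.log(t))


for name, r, t, listed in ROWS:
    print(f"{name:9s} listed={listed:.6f} "
          f"linear={linear_score(r, t):.6f} log={log_score(r, t):.6f}")
```

Rows with the same time as the standard (300 min) score identically under both readings, since the time factor is then exactly 1; only raq8g, paq8hp12 and decomp8 distinguish the two.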

Limitations

Although the score is relative to the standard compressor, the unit used to measure time still changes it (see examples 1 and 2). This is a consequence of the requirement that the argument of a logarithm must be dimensionless: the times enter the formula as bare numbers, and rescaling them adds the same constant to both logarithms, which changes their quotient. For the same reason, the numeric value of each time must be greater than 1 in the chosen unit. The logarithm of 1 is 0 (examples 3 and 4), and the logarithm of any value less than 1 is negative (examples 5 and 6), so the score becomes 0 regardless of other changes, undefined, or negative even when the scored compressor outperforms the standard.

| Example | Standard: compression ratio | Standard: compression time | Standard: log10 (time) | Scored: compression ratio | Scored: compression time | Scored: log10 (time) | Weissman score | Observations |
|---|---|---|---|---|---|---|---|---|
| 1 | 2.1 | 2 min | 0.30103 | 3.4 | 3 min | 0.477121 | 1 × (3.4/2.1) × (0.30103/0.477121) = 1.021506 | A change in unit or scale changes the result. |
| 2 | 2.1 | 120 s | 2.079181 | 3.4 | 180 s | 2.255273 | 1 × (3.4/2.1) × (2.079181/2.255273) = 1.492632 | |
| 3 | 2.2 | 1 min | 0 | 3.3 | 1.5 min | 0.176091 | 1 × (3.3/2.2) × (0/0.176091) = 0 | If a time is 1, its logarithm is 0, so the score is either 0 or undefined. |
| 4 | 2.2 | 0.667 min | −0.176091 | 3.3 | 1 min | 0 | 1 × (3.3/2.2) × (−0.176091/0) = undefined (division by zero) | |
| 5 | 1.6 | 0.5 h | −0.30103 | 2.9 | 1.1 h | 0.041393 | 1 × (2.9/1.6) × (−0.30103/0.041393) = −13.18138 | If a time is less than 1, its logarithm is negative, so the score can be negative. |
| 6 | 1.6 | 1.1 h | 0.041393 | 1.6 | 0.9 h | −0.045757 | 1 × (1.6/1.6) × (0.041393/−0.045757) = −0.904627 | |
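The unit dependence follows from log(kT) = log k + log T: expressing both times in a unit k times smaller adds log k to both the numerator and the denominator of log T̄ / log T, and (a + c) / (b + c) equals a / b only when c = 0 or a = b. The minimal sketch below reproduces examples 1, 2, 3, 4 and 6 with base-10 logarithms. The function name and the decision to raise `ValueError` when the scored compressor's time is exactly 1 are illustrative assumptions, not part of the metric's definition.

```python
import math


def weissman_score(r, t, r_std, t_std, alpha=1.0):
    """Hypothetical helper: alpha * (r / r_std) * (log10 t_std / log10 t).

    Raises ValueError when log10(t) == 0 (t == 1 in the chosen unit),
    because the score is then undefined (division by zero).
    """
    denominator = math.log10(t)
    if denominator == 0:
        raise ValueError("scored time is 1 in this unit; score undefined")
    return alpha * (r / r_std) * (math.log10(t_std) / denominator)


# Examples 1 and 2: identical timings, expressed in minutes and then in seconds.
in_minutes = weissman_score(3.4, 3, 2.1, 2)
in_seconds = weissman_score(3.4, 180, 2.1, 120)
print(f"minutes: {in_minutes:.6f}, seconds: {in_seconds:.6f}")

# Example 3: the standard's time is 1, so the numerator is 0 and so is the score.
print(f"example 3: {weissman_score(3.3, 1.5, 2.2, 1):.6f}")

# Example 4: the scored compressor's time is 1, so the score is undefined.
try:
    weissman_score(3.3, 1, 2.2, 0.667)
except ValueError as err:
    print(f"example 4: {err}")

# Example 6: same ratio and faster than the standard (0.9 h vs 1.1 h), yet negative.
print(f"example 6: {weissman_score(1.6, 0.9, 1.6, 1.1):.6f}")
```

The printed values for examples 1, 2 and 6 differ from the table in the last digit or two because the table divides logarithms already rounded to six digits.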


Notes and References

  1. Perry, Tekla. "A Fictional Compression Metric Moves Into the Real World". July 28, 2014. Retrieved January 25, 2016.
  2. Perry, Tekla. "A Made-For-TV Compression Algorithm". July 25, 2014. Retrieved January 25, 2016.
  3. Sandberg, Elise. "HBO's 'Silicon Valley' Tech Advisor on Realism, Possible Elon Musk Cameo". The Hollywood Reporter. April 12, 2014. Retrieved June 10, 2014.
  4. Jurgensen, John; Rusli, Evelyn M. "There's a New Geek in Town: HBO's 'Silicon Valley'". The Wall Street Journal. April 3, 2014. Retrieved June 10, 2014.
  5. "Lossless compression with Brotli in Rust for a bit of Pied Piper on the backend". Dropbox Tech Blog. June 24, 2017.
  6. Hutter, Marcus. "Contestants". July 2016. Retrieved January 25, 2016.