Collision resistance explained

In cryptography, collision resistance is a property of cryptographic hash functions: a hash function H is collision-resistant if it is hard to find two inputs that hash to the same output; that is, two inputs a and b where a ≠ b but H(a) = H(b).^[1] The pigeonhole principle means that any hash function with more inputs than outputs will necessarily have such collisions; the harder they are to find, the more cryptographically secure the hash function is.

The "birthday paradox" places an upper bound on collision resistance: if a hash function produces N bits of output, an attacker who computes only $2^{N/2}$ (or $\sqrt{2^N}$) hash operations on random input is likely to find two matching outputs. If there is an easier method to do this than brute-force attack, it is typically considered a flaw in the hash function.^[2]
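The birthday bound can be observed directly on a deliberately weakened hash. The following is a minimal sketch, not a real attack: it assumes a toy hash built by keeping only the top 24 bits of SHA-256, and it uses counter strings rather than random inputs so that runs are reproducible. The names `truncated_hash` and `birthday_collision` are hypothetical.

```python
import hashlib
from itertools import count


def truncated_hash(data: bytes, bits: int) -> int:
    """Toy hash: the top `bits` bits of SHA-256(data), as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def birthday_collision(bits: int):
    """Hash distinct inputs until two of them share a truncated output."""
    seen = {}  # truncated output -> input that produced it
    for i in count():
        msg = str(i).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            # Two different inputs with the same output, plus hashes computed.
            return seen[h], msg, i + 1
        seen[h] = msg


a, b, tries = birthday_collision(24)
print(a, b, tries)
```

With a 24-bit output, the number of hashes computed is expected to be on the order of 2^12 (a few thousand), far below the 2^24 possible outputs.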

Cryptographic hash functions are usually designed to be collision resistant. However, many hash functions that were once thought to be collision resistant were later broken. MD5 and SHA-1 in particular both have published techniques more efficient than brute force for finding collisions.^[3] ^[4] Some hash functions, on the other hand, come with a proof that finding collisions is at least as difficult as some hard mathematical problem (such as integer factorization or discrete logarithm). Those functions are called provably secure.

Definition

A family of functions $\{h_k : \{0,1\}^{m(k)} \to \{0,1\}^{l(k)}\}$ generated by some algorithm G is a family of collision-resistant hash functions if |m(k)| > |l(k)| for any k (that is, h_k compresses the input string), every h_k can be computed within polynomial time given k, and for any probabilistic polynomial-time algorithm A we have

$\Pr[\,k \leftarrow G(1^n),\ (x_1, x_2) \leftarrow A(k, 1^n) \ \text{s.t.}\ x_1 \neq x_2 \ \text{but}\ h_k(x_1) = h_k(x_2)\,] < \mathrm{negl}(n)$,

where negl(·) denotes some negligible function, and n is the security parameter.^[5]
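The experiment inside the probability can be written out as a small game. The sketch below is illustrative only and rests on several assumptions: the key generator `G` samples random bytes, the keyed family `h` is BLAKE2b in keyed mode with a 2-byte digest, and the adversary `A` is a plain birthday search. The output is deliberately tiny, so this adversary always wins; for a collision-resistant family, the same game would be won with only negligible probability in n.

```python
import hashlib
import secrets


def G(n: int) -> bytes:
    """Key generation (assumed): sample a random n-byte key k."""
    return secrets.token_bytes(n)


def h(k: bytes, x: bytes, out_bytes: int = 2) -> bytes:
    """Toy keyed hash h_k: keyed BLAKE2b with a very short digest."""
    return hashlib.blake2b(x, key=k, digest_size=out_bytes).digest()


def A(k: bytes):
    """Adversary: birthday search over 8-byte counter inputs."""
    seen = {}  # output -> input that produced it
    i = 0
    while True:
        x = i.to_bytes(8, "big")
        y = h(k, x)
        if y in seen:
            return seen[y], x
        seen[y] = x
        i += 1


def adversary_wins(n: int) -> bool:
    """Run one round of the game: k <- G, (x1, x2) <- A(k), check the collision."""
    k = G(n)
    x1, x2 = A(k)
    return x1 != x2 and h(k, x1) == h(k, x2)


print(adversary_wins(16))
```

Inputs here are 64 bits and outputs 16 bits, so each h_k compresses its input as the definition requires.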

Weak and strong collision resistance

There are two different types of collision resistance.

A hash function H has weak collision resistance when, given an input x, it is computationally infeasible to find another input x' ≠ x such that H(x) = H(x'). In words, the attacker is handed a fixed x and must find a different input that collides with it. This property is also known as second-preimage resistance.

A hash function H has strong collision resistance when it is computationally infeasible to find any two distinct inputs x and x' such that H(x) = H(x'). In words, the attacker is free to choose both inputs, and still cannot find a pair that collides.
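The difference between the two notions shows up in the work an attacker must do. The sketch below contrasts the two searches on a toy 16-bit hash (SHA-256 truncated, an assumption made so that both searches finish quickly); `toy_hash`, `second_preimage`, and `collision` are hypothetical names.

```python
import hashlib


def toy_hash(data: bytes, bits: int = 16) -> int:
    """Toy hash: the top `bits` bits of SHA-256(data)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)


def second_preimage(x: bytes, bits: int = 16):
    """Weak game: x is fixed; search for x' != x with toy_hash(x') == toy_hash(x)."""
    target = toy_hash(x, bits)
    tries = 0
    i = 0
    while True:
        candidate = b"candidate-" + str(i).encode()
        i += 1
        if candidate == x:
            continue
        tries += 1
        if toy_hash(candidate, bits) == target:
            return candidate, tries


def collision(bits: int = 16):
    """Strong game: search for any two distinct inputs with the same toy hash."""
    seen = {}  # output -> input that produced it
    i = 0
    while True:
        msg = b"msg-" + str(i).encode()
        i += 1
        out = toy_hash(msg, bits)
        if out in seen:
            return seen[out], msg, i
        seen[out] = msg


x = b"fixed message"
x_prime, weak_tries = second_preimage(x)
a, b, strong_tries = collision()
print(weak_tries, strong_tries)
```

For a b-bit output, the fixed-target search is expected to need about 2^b hashes, while the free-choice search typically needs about 2^(b/2). This is why strong collision resistance is the harder property for a hash function to provide.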

Rationale

Collision resistance is desirable for several reasons.

In some digital signature systems, a party attests to a document by publishing a public key signature on a hash of the document. If it is possible to produce two documents with the same hash, an attacker could get a party to attest to one, and then claim that the party had attested to the other.
In some distributed content systems, parties compare cryptographic hashes of files in order to make sure they have the same version. An attacker who could produce two files with the same hash could trick users into believing they had the same version of a file when they in fact did not.
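
The version check described in the second scenario can be sketched as follows. This is a minimal illustration, assuming SHA-256 as the hash and hypothetical helper names `digest` and `same_version`; it is not taken from any particular content system.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the file contents."""
    return hashlib.sha256(data).hexdigest()


def same_version(local_copy: bytes, advertised_digest: str) -> bool:
    """Compare the local copy's digest with the digest another party advertises."""
    return hmac.compare_digest(digest(local_copy), advertised_digest)


published = digest(b"release 1.0 contents")
print(same_version(b"release 1.0 contents", published))  # True
print(same_version(b"tampered contents", published))     # False
```

The check is only as trustworthy as the hash is collision resistant: if an attacker could craft two different files with the same digest, `same_version` would return True for both.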

Notes and References

Goldwasser, S.
Pass, R. "Lecture 21: Collision-Resistant Hash Functions and General Digital Signature Scheme". Course on Cryptography, Cornell University, 2009.
Wang, Xiaoyun; Yu, Hongbo. "How to Break MD5 and Other Hash Functions". Archived at https://web.archive.org/web/20090521024709/http://merlot.usc.edu/csac-f06/papers/Wang05a.pdf on 2009-05-21. Retrieved 2009-12-21.
Wang, Xiaoyun; Yin, Yiqun Lisa; Yu, Hongbo. "Finding Collisions in the Full SHA-1". CRYPTO 2005. doi:10.1007/11535218_2.
Dodis, Yevgeniy. "Lecture 12 of Introduction to Cryptography". Retrieved 3 January 2016. Def. 1.
