75

Given a set of 100 different strings of equal length, how can you quantify the probability that a SHA1 digest collision for the strings is unlikely... ?

6
  • clarify, how can you have 'different but equal length' strings?
    – KevinDTimm
    Dec 8, 2009 at 14:13
  • 6
    @kevindtimm "a", "b", "c" are equal length but different strings Dec 8, 2009 at 14:16
  • 4
    @anthony, why is that obvious? I don't know if that's true Dec 8, 2009 at 14:19
  • 3
    doh, upon re-reading, it's perfectly clear.
    – KevinDTimm
    Dec 8, 2009 at 15:17
  • 1
    This is relevant to this question: sites.google.com/site/itstheshappening At least one collision has been found. Oct 8, 2015 at 21:44

3 Answers 3

148

alt text

Are the 160 bit hash values generated by SHA-1 large enough to ensure the fingerprint of every block is unique? Assuming random hash values with a uniform distribution, a collection of n different data blocks and a hash function that generates b bits, the probability p that there will be one or more collisions is bounded by the number of pairs of blocks multiplied by the probability that a given pair will collide.

(source : http://bitcache.org/faq/hash-collision-probabilities)

5
  • 13
    In conclusion, the likelihood of a collision is in the order of 10^-45. That is very, very unlikely. Dec 8, 2009 at 14:41
  • 5
    But SHA-1 is not uniform distribution, it could be bigger than this upper bound. I doubt that this equation is not correct. AS least the EQUAL.
    – Kamel
    Apr 11, 2015 at 2:12
  • 3
    This answer doesn't take into account the chinese discovery in 2005 where they are able to produce collisions in 2^69 iterations rather than the 2^80 projected by brute force schneier.com/blog/archives/2005/02/sha1_broken.html
    – Djarid
    Apr 18, 2016 at 14:09
  • 7
    @Djarid It's important not to confound accidental hash collision and adversarial collision hunting. The former is the probability that the hash of two items will collide, and follows the formula above (although, as noted by Kamel, the distribution is not perfectly uniform and thus the probability is likely higher). The latter is used to intentionally try to find collisions, and relies on discovering and exploiting weaknesses in the hash. Cryptographic hashes attempt to be robust against such attacks, but often they are overkill for simpler hashing applications (not transmitting secrets).
    – Pierre D
    Feb 19, 2017 at 18:01
  • that formula is accurate when 2^b >> n^2 (and when 2^b very big). I know it is the mayority (if not all) the cases for sha1... but for the record!
    – MrIo
    Dec 8, 2020 at 22:21
7

Well, the probability of a collision would be:

1 - ((2^160 - 1) / 2^160) * ((2^160 - 2) / 2^160) * ... * ((2^160 - 99) / 2^160)

Think of the probability of a collision of 2 items in a space of 10. The first item is unique with probability 100%. The second is unique with probability 9/10. So the probability of both being unique is 100% * 90%, and the probability of a collision is:

1 - (100% * 90%), or 1 - ((10 - 0) / 10) * ((10 - 1) / 10), or 1 - ((10 - 1) / 10)

It's pretty unlikely. You'd have to have many more strings for it to be a remote possibility.

Take a look at the table on this page on Wikipedia; just interpolate between the rows for 128 bits and 256 bits.

6

That's Birthday Problem - the article provides nice approximations that make it quite easy to estimate the probability. Actual probability will be very very very low - see this question for an example.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.