Given a set of 100 different strings of equal length, how can you quantify the probability that a SHA1 digest collision for the strings is unlikely... ?
-
clarify, how can you have 'different but equal length' strings?– KevinDTimmDec 8, 2009 at 14:13
-
6@kevindtimm "a", "b", "c" are equal length but different strings– Joe PhillipsDec 8, 2009 at 14:16
-
4@anthony, why is that obvious? I don't know if that's true– Joe PhillipsDec 8, 2009 at 14:19
-
3doh, upon re-reading, it's perfectly clear.– KevinDTimmDec 8, 2009 at 15:17
-
1This is relevant to this question: sites.google.com/site/itstheshappening At least one collision has been found.– Christophe De TroyerOct 8, 2015 at 21:44
3 Answers
Are the 160 bit hash values generated by SHA-1 large enough to ensure the fingerprint of every block is unique? Assuming random hash values with a uniform distribution, a collection of n different data blocks and a hash function that generates b bits, the probability p that there will be one or more collisions is bounded by the number of pairs of blocks multiplied by the probability that a given pair will collide.
(source : http://bitcache.org/faq/hash-collision-probabilities)
-
13In conclusion, the likelihood of a collision is in the order of 10^-45. That is very, very unlikely. Dec 8, 2009 at 14:41
-
5But SHA-1 is not uniform distribution, it could be bigger than this upper bound. I doubt that this equation is not correct. AS least the EQUAL.– KamelApr 11, 2015 at 2:12
-
3This answer doesn't take into account the chinese discovery in 2005 where they are able to produce collisions in 2^69 iterations rather than the 2^80 projected by brute force schneier.com/blog/archives/2005/02/sha1_broken.html– DjaridApr 18, 2016 at 14:09
-
7@Djarid It's important not to confound accidental hash collision and adversarial collision hunting. The former is the probability that the hash of two items will collide, and follows the formula above (although, as noted by Kamel, the distribution is not perfectly uniform and thus the probability is likely higher). The latter is used to intentionally try to find collisions, and relies on discovering and exploiting weaknesses in the hash. Cryptographic hashes attempt to be robust against such attacks, but often they are overkill for simpler hashing applications (not transmitting secrets).– Pierre DFeb 19, 2017 at 18:01
-
that formula is accurate when 2^b >> n^2 (and when 2^b very big). I know it is the mayority (if not all) the cases for sha1... but for the record!– MrIoDec 8, 2020 at 22:21
Well, the probability of a collision would be:
1 - ((2^160 - 1) / 2^160) * ((2^160 - 2) / 2^160) * ... * ((2^160 - 99) / 2^160)
Think of the probability of a collision of 2 items in a space of 10. The first item is unique with probability 100%. The second is unique with probability 9/10. So the probability of both being unique is 100% * 90%
, and the probability of a collision is:
1 - (100% * 90%), or 1 - ((10 - 0) / 10) * ((10 - 1) / 10), or 1 - ((10 - 1) / 10)
It's pretty unlikely. You'd have to have many more strings for it to be a remote possibility.
Take a look at the table on this page on Wikipedia; just interpolate between the rows for 128 bits and 256 bits.
That's Birthday Problem - the article provides nice approximations that make it quite easy to estimate the probability. Actual probability will be very very very low - see this question for an example.