57

I have a 10-character string key field in a database. I've used CRC32 to hash this field, but I'm worrying about duplicates. Could somebody show me the probability of collision in this situation?

P.S.: My string field is unique in the database. If the number of string fields is 1 million, what is the probability of a collision?

2 Answers 2

118

Duplicate of Expected collisions for perfect 32bit crc

The answer referenced this article: http://arstechnica.com/civis/viewtopic.php?f=20&t=149670

Found the image below from: http://preshing.com/20110504/hash-collision-probabilities

enter image description here

2
  • 5
    SO WHAT's THE ANSWER?
    – Tommix
    Jan 18, 2023 at 14:16
  • 1
    Technically, odds of a meteor landing on your house is 0, unless your house is in orbit. Sep 29, 2023 at 20:08
47

In the case you cite, at least one collision is essentially guaranteed. The probability of at least one collision is about 1 - 3x10-51. The average number of collisions you would expect is about 116.

In general, the average number of collisions in k samples, each a random choice among n possible values is:

N(n,k)~=k(k-1)/(2n)

The probability of at least one collision is:

p(n,k)~=1-e^(-k(k-1)/(2n))

In your case, n = 232 and k = 106.

The probability of a three-way collision in your case is about 0.01. See the Birthday Problem.

2
  • Thank you so much, I wonder that this probability still depend on CRC32 algorithm ? Jan 11, 2013 at 10:03
  • 2
    Any good 32-bit hash algorithm will give exactly the same result.
    – Mark Adler
    Jan 11, 2013 at 16:50

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Not the answer you're looking for? Browse other questions tagged or ask your own question.