First off: I know this is on the verge of being subjective and i'll try my best to make it as specific as possible.
I am in a situation where I need to generate a fixed number of hashes (k), say 10,000. My goal is to choose a small as possible number space for those hashes (N) while keeping collision probability below a fixed threshold (p).
The collision probability can be approximated by: $p\left(k,N\right)=1-e^{\frac{-k\left(k-1\right)}{2N}}$
The key question now is: How low is low enough for my collision threshold? I've tried numerous things to come up with a number:
- Comparing various unlikely events like winning the Powerball, meteorite hits, lightning strikes, etc. (While doing this I found out that China is going to be hit by a meteorite in 2027)
- Looking into the failure rates of technical items like uncorrectable bit errors in hard drives
- Read a lot of Stack, Mathematics and CS posts about the topic which all explain the birthday problem and show me why I have a problem but are unfortunately not helping to solve it.
Yet nothing seems to do the trick for me. The system I am implementing relies on the hashes generated and collision would corrupt the functionality of the system at its core. Unfortunately perfect hashing is not an option for reasons not relevant here.
Does anyone have an Idea how I can tackle this problem and come up with a solid, profound probability threshold that is not just made up from the top of my head?