Given a Graph, where we know
- Total number of nodes (~100,000)
- Average no of connections per node (~200)
- Maximum distance between two nodes (~5)
How many nodes (and its connections) do we have to sample so that for 20% of node-pairs, we know the exact connection (any path or shortest path or a path less than 5 nodes away)? My idea right now is to simulate this. Is there anything better I can do?
To give a relevant example of possible application, let’s say, when you visit a LinkedIn profile, LinkedIn wants to suggest a person whom you can ask for introduction, a second or third or fourth degree contact who is a common contact for both of you.
After LinkedIn launches in India, how many people (out of ~100,000 working professionals in India) do they need on their platform (with all of their ~200 their contacts) before they can start doing this for 20% of profile visits?
Please note that I am in no way related to LinkedIn :)
Here is a related question (still unanswered): What is probability that two persons are befriended in a social network?