Use Wolfram curated databases to determine how many randomly chosen people are needed to have a 50% chance two live in the same or adjacent states?

Question

Background:

This question is based on one asked on the statistics stack exchange, CrossValidated.SE here. Alas, the full answer to the statistical question seems to require enormous computational resources to perform rigorously, so I gather the only approach forward is to perform instead a stochastic simulation.

Hence this question on Mathematica.SE.

How would one use Mathematica and its curated geographic and census database to determine how many Americans, randomly chosen, are needed to have a 50% chance that two of them live in a) the same state or b) in the same or an adjacent state?

One can determine the populations of each states by:

WolframAlpha["US state population table", 
{{"PropertyRanking:USStateData", 1}, "QuantityData"}, 
 PodStates -> {"PropertyRanking:USStateData__More", 
   "PropertyRanking:USStateData__More", 
   "PropertyRanking:USStateData__More"}]

One can determine the adjacency matrix of the states (or undirected graph $g$) using GeoData and neighboring, as described by this analogous problem with the counties of Florida:

counties=EntityList[US counties in Florida (administrative divisions)];

and

Cases[GeoNearest["USCounty", counties[[16]]], Except[counties[[16]]]]

(Alas, the generalization to states within the US does not seem to work directly.)

So the approach to part b) would be to do a large simulation of choosing $n=2$ people randomly according to the probabilities based on the state populations. Then find what percentage of the time these two people live in the same or adjacent states. Surely for $n=2$ this will be a small number, say $1\%$. Then repeat with $n=3$. And $n=4$... until one finds the probability of roughly $50\%$. Is there a more efficient approach?

Given the population statistics in Wolfram curated data and the neighboring state data inherent in the GeoData, what are the numerical values of the solutions to parts a) and b)?

I'm voting to close this question as off-topic because this question has nothing to do with Mathematica. — JimB, Commented Jul 11, 2017 at 20:51
@JimBaldwin: This question is entirely about Mathematica, specifically the methods for using Wolfram's unique curated database of geographic data to compute the adjacency matrix for US states. I don't know how one would even approach this problem using other computer languages. (Moreover, I urge you to wait more than two minutes until a question has been fully posted before voting to close it.) — David G. Stork, Commented Jul 11, 2017 at 20:53
Pointer : RandomChoice with arguments "list of states" rule "populations of states" — LLlAMnYP, Commented Jul 11, 2017 at 21:04
You can find a list of which states neighbor which states at theincidentaleconomist.com/wordpress/…. — JimB, Commented Jul 11, 2017 at 22:00

Coolwater · Accepted Answer · 2017-07-12 13:52:38Z

9

Answer is 3.5 when including District of Columbia:

states = EntityValue[Entity["Country", "UnitedStates"], "AdministrativeDivisions"];
pops = AdministrativeDivisionData[states, "Population"][[All, 1]];
(* very slow *)  dists = GeoDistance[{#, #2}][[1]] & @@@ Subsets[states, {2}];
(* fast *)       dists = Uncompress["1:eJztWDkOg0AM3Eh8hP+k4gkUSFSRQn5PAwVKE5G9PONZBBIWje2xx/Z66cfXMK1dCMv+hue8fObH/pEg3uF4vh+pmnGT1Zb41i8lANzWUmWChBpO3JlFmjleNPLlyaad7zPVvyZz/GWUhGwymqfaljC16KgoNCA2K4pa7M55daZR51nacvR76pejU5xP+IkhsfuiuhJPP6cZCJVmopVWxmJNCXNxkdOlnC6zWjPbn1v591JrCXDhkO1EwNBFxIq6AzoWHXH3Awv9AZOLUHCCxPWL1mAmQJiWd904oSME6p3ZbP8btjQYMQ=="];

true = Association[Thread[Join[Pick[Subsets[Range[51], {2}], dists, 0.], Table[{i, i}, {i, 51}]] -> 1]];
N[Mean[Map[Max[Lookup[true, Subsets[Sort[#], {2}], 0]] &,
   RandomChoice[pops -> Range[51], {10000, #}]]]] & /@ {3, 4}

{0.3444, 0.5889}

The true-assosation contains the pairs (Michigan, Minnesota) and (New York, Rhode Island), because the first state has an island next to the coast of the other state. The website Jim Baldwin's linked to and using "BorderingStates" in Mathematica both disagree on this inclusion. However the populations there are too small to change the result.

edited Jul 12, 2017 at 13:52

answered Jul 11, 2017 at 21:23

Coolwater

20.4k3 gold badges36 silver badges65 bronze badges

$\begingroup$ +1, of course the problem is small enough to bruteforce $\endgroup$
– LLlAMnYP
Commented Jul 11, 2017 at 21:27
$\begingroup$ Wonderful. Elegant. Efficient. (Somehow, though, your code doesn't work on my v. 10.4.) Accept. $\endgroup$
– David G. Stork
Commented Jul 11, 2017 at 21:36
2

$\begingroup$ +1. A warning to people who attempt to run the code in this answer: the computations might take up to 10-15 minutes. $\endgroup$
– Anton Antonov
Commented Jul 12, 2017 at 13:22
$\begingroup$ @AntonAntonov added altenative for the slow part $\endgroup$
– Coolwater
Commented Jul 12, 2017 at 13:53
$\begingroup$ @AntonAntonov you can get a lot of speed up by using #[[Range[1, Length[#], 2]]] & @ Normal[GeoDistanceList[spairList] & @ Subsets[states, {2}]] instead of that mapped GeoDistance. In my experience GeoDistanceList is orders of magnitude faster. $\endgroup$
– b3m2a1
Commented Aug 2, 2017 at 2:58

Add a comment |

ubpdqn · Accepted Answer · 2017-07-12 06:34:49Z

Using the property of "BorderingStates" (assuming this qualifies as adjacent):

states = EntityValue[Entity["Country", "UnitedStates"], 
   "AdministrativeDivisions"];
pops = AdministrativeDivisionData[states, "Population"][[All, 1]];
f[x_, y_] := If[y == {}, Nothing, x <-> # &
   /@ y]
g = DeleteDuplicates[
   Catenate[
    f[#, EntityValue[#, 
        EntityProperty["AdministrativeDivision", 
         "BorderingStates"]]] & /@ states], (Sort[#1] == Sort[#2] &)];
grph = Graph[states, g];
sr = Thread[Range[51] -> states];
rnc[n_, num_] := RandomChoice[pops -> Range[51], {n, num}] /. sr;
func[ss_] := Module[{prt = Subsets[ss, {2}]},
  Unitize@
   Total[If[#1 === #2, 1, 
       If[Length[FindPath[grph, ##, {1}]] > 0, 1, 0]] & @@@ prt]]
res[n_, num_] := Total[func /@ rnc[n, num]]/n // N

So simulation:

TableForm[
 Catenate[Table[{i, j, 
    res[i, j]}, {i, {100, 1000, 10000, 100000}}, {j, {2, 3, 4}}]], 
 TableHeadings -> {None, {"N", "choice size", "proportion adjacent"}}]

Stack Exchange Network

Use Wolfram curated databases to determine how many randomly chosen people are needed to have a 50% chance two live in the same or adjacent states?

2 Answers 2

Not the answer you're looking for? Browse other questions tagged
probability-or-statistics
curated-data
combinatorics
geography
or ask your own question.

Linked

Hot Network Questions

Use Wolfram curated databases to determine how many randomly chosen people are needed to have a 50% chance two live in the same or adjacent states?

2 Answers 2

Not the answer you're looking for? Browse other questions tagged probability-or-statisticscurated-datacombinatoricsgeography or ask your own question.

Linked

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
probability-or-statistics
curated-data
combinatorics
geography
or ask your own question.