4
$\begingroup$

Background:

This question is based on one asked on the statistics stack exchange, CrossValidated.SE here. Alas, the full answer to the statistical question seems to require enormous computational resources to perform rigorously, so I gather the only approach forward is to perform instead a stochastic simulation.

Hence this question on Mathematica.SE.


How would one use Mathematica and its curated geographic and census database to determine how many Americans, randomly chosen, are needed to have a 50% chance that two of them live in a) the same state or b) in the same or an adjacent state?

One can determine the populations of each states by:

WolframAlpha["US state population table", 
{{"PropertyRanking:USStateData", 1}, "QuantityData"}, 
 PodStates -> {"PropertyRanking:USStateData__More", 
   "PropertyRanking:USStateData__More", 
   "PropertyRanking:USStateData__More"}]

One can determine the adjacency matrix of the states (or undirected graph $g$) using GeoData and neighboring, as described by this analogous problem with the counties of Florida:

counties=EntityList[US counties in Florida (administrative divisions)];

and

Cases[GeoNearest["USCounty", counties[[16]]], Except[counties[[16]]]]

(Alas, the generalization to states within the US does not seem to work directly.)

So the approach to part b) would be to do a large simulation of choosing $n=2$ people randomly according to the probabilities based on the state populations. Then find what percentage of the time these two people live in the same or adjacent states. Surely for $n=2$ this will be a small number, say $1\%$. Then repeat with $n=3$. And $n=4$... until one finds the probability of roughly $50\%$. Is there a more efficient approach?

Given the population statistics in Wolfram curated data and the neighboring state data inherent in the GeoData, what are the numerical values of the solutions to parts a) and b)?

$\endgroup$
8
  • 1
    $\begingroup$ I'm voting to close this question as off-topic because this question has nothing to do with Mathematica. $\endgroup$
    – JimB
    Commented Jul 11, 2017 at 20:51
  • 1
    $\begingroup$ @JimBaldwin: This question is entirely about Mathematica, specifically the methods for using Wolfram's unique curated database of geographic data to compute the adjacency matrix for US states. I don't know how one would even approach this problem using other computer languages. (Moreover, I urge you to wait more than two minutes until a question has been fully posted before voting to close it.) $\endgroup$ Commented Jul 11, 2017 at 20:53
  • $\begingroup$ Pointer : RandomChoice with arguments "list of states" rule "populations of states" $\endgroup$
    – LLlAMnYP
    Commented Jul 11, 2017 at 21:04
  • $\begingroup$ Excellent pointer, but what's the exact code? $\endgroup$ Commented Jul 11, 2017 at 21:06
  • 1
    $\begingroup$ You can find a list of which states neighbor which states at theincidentaleconomist.com/wordpress/…. $\endgroup$
    – JimB
    Commented Jul 11, 2017 at 22:00

2 Answers 2

9
$\begingroup$

Answer is 3.5 when including District of Columbia:

states = EntityValue[Entity["Country", "UnitedStates"], "AdministrativeDivisions"];
pops = AdministrativeDivisionData[states, "Population"][[All, 1]];
(* very slow *)  dists = GeoDistance[{#, #2}][[1]] & @@@ Subsets[states, {2}];
(* fast *)       dists = Uncompress["1:eJztWDkOg0AM3Eh8hP+k4gkUSFSRQn5PAwVKE5G9PONZBBIWje2xx/Z66cfXMK1dCMv+hue8fObH/pEg3uF4vh+pmnGT1Zb41i8lANzWUmWChBpO3JlFmjleNPLlyaad7zPVvyZz/GWUhGwymqfaljC16KgoNCA2K4pa7M55daZR51nacvR76pejU5xP+IkhsfuiuhJPP6cZCJVmopVWxmJNCXNxkdOlnC6zWjPbn1v591JrCXDhkO1EwNBFxIq6AzoWHXH3Awv9AZOLUHCCxPWL1mAmQJiWd904oSME6p3ZbP8btjQYMQ=="];

true = Association[Thread[Join[Pick[Subsets[Range[51], {2}], dists, 0.], Table[{i, i}, {i, 51}]] -> 1]];
N[Mean[Map[Max[Lookup[true, Subsets[Sort[#], {2}], 0]] &,
   RandomChoice[pops -> Range[51], {10000, #}]]]] & /@ {3, 4}

{0.3444, 0.5889}

The true-assosation contains the pairs (Michigan, Minnesota) and (New York, Rhode Island), because the first state has an island next to the coast of the other state. The website Jim Baldwin's linked to and using "BorderingStates" in Mathematica both disagree on this inclusion. However the populations there are too small to change the result.

$\endgroup$
5
  • $\begingroup$ +1, of course the problem is small enough to bruteforce $\endgroup$
    – LLlAMnYP
    Commented Jul 11, 2017 at 21:27
  • $\begingroup$ Wonderful. Elegant. Efficient. (Somehow, though, your code doesn't work on my v. 10.4.) Accept. $\endgroup$ Commented Jul 11, 2017 at 21:36
  • 2
    $\begingroup$ +1. A warning to people who attempt to run the code in this answer: the computations might take up to 10-15 minutes. $\endgroup$ Commented Jul 12, 2017 at 13:22
  • $\begingroup$ @AntonAntonov added altenative for the slow part $\endgroup$
    – Coolwater
    Commented Jul 12, 2017 at 13:53
  • $\begingroup$ @AntonAntonov you can get a lot of speed up by using #[[Range[1, Length[#], 2]]] & @ Normal[GeoDistanceList[spairList] & @ Subsets[states, {2}]] instead of that mapped GeoDistance. In my experience GeoDistanceList is orders of magnitude faster. $\endgroup$
    – b3m2a1
    Commented Aug 2, 2017 at 2:58
2
$\begingroup$

Using the property of "BorderingStates" (assuming this qualifies as adjacent):

states = EntityValue[Entity["Country", "UnitedStates"], 
   "AdministrativeDivisions"];
pops = AdministrativeDivisionData[states, "Population"][[All, 1]];
f[x_, y_] := If[y == {}, Nothing, x <-> # &
   /@ y]
g = DeleteDuplicates[
   Catenate[
    f[#, EntityValue[#, 
        EntityProperty["AdministrativeDivision", 
         "BorderingStates"]]] & /@ states], (Sort[#1] == Sort[#2] &)];
grph = Graph[states, g];
sr = Thread[Range[51] -> states];
rnc[n_, num_] := RandomChoice[pops -> Range[51], {n, num}] /. sr;
func[ss_] := Module[{prt = Subsets[ss, {2}]},
  Unitize@
   Total[If[#1 === #2, 1, 
       If[Length[FindPath[grph, ##, {1}]] > 0, 1, 0]] & @@@ prt]]
res[n_, num_] := Total[func /@ rnc[n, num]]/n // N

So simulation:

TableForm[
 Catenate[Table[{i, j, 
    res[i, j]}, {i, {100, 1000, 10000, 100000}}, {j, {2, 3, 4}}]], 
 TableHeadings -> {None, {"N", "choice size", "proportion adjacent"}}]

enter image description here

$\endgroup$

Not the answer you're looking for? Browse other questions tagged or ask your own question.