Milwaukee ward sizes are small and there is a highly preferred candidate #17
I believe you are correct. https://results.enr.clarityelections.com/PA/Allegheny/63905/Web02.193333/#/cid/0104 |
but according to your simulation: |
@tcauth would the smaller portion of votes possibly not move the needle due to the fact that they are blending in with the rest of the wards of equal size? |
@tcauth what does the 2016 election look like -- are we able to get the data to run it? |
And this is the point of one of the main research papers, which argues against using Benford's law in these situations. I think the issue is to have the proper baselines, which may be determined by looking at the 2016 data. |
Trump's average vote probability was roughly 26% in Milwaukee for this election, and Biden's was roughly 73%. With those numbers, Trump gets a nice Benford's law in my simulation, and Biden gets the observed spike around the digit 4. I've updated the notebook at the same link so you can see Trump's distributions too: https://rpubs.com/frycast/687633 It makes sense: the populations in Milwaukee Wards are often around 500-1000, so 73% of a random Milwaukee ward population is a number that starts with a digit somewhere between 3 and 7. |
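A minimal sketch of the kind of simulation described above, written in Python rather than the R used in the linked notebook. It assumes ward sizes are uniform between 500 and 1000 (a simplification of "often around 500-1000"; the real ward sizes are not reproduced here) and that each voter independently picks the candidate with probability `p`. The function names and parameters are illustrative assumptions, not taken from the notebook.

```python
import random
from collections import Counter


def first_digit(n):
    """Return the leading decimal digit of a positive integer."""
    return int(str(n)[0])


def simulate_first_digits(p, n_wards=2000, lo=500, hi=1000, seed=0):
    """Simulate binomial vote counts per ward and tally leading digits.

    ASSUMPTION: ward sizes are uniform on [lo, hi]; this is for
    illustration only and is not the real Milwaukee distribution.
    """
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(n_wards):
        size = rng.randint(lo, hi)
        # One Bernoulli(p) draw per voter gives a binomial vote count.
        votes = sum(rng.random() < p for _ in range(size))
        if votes > 0:
            tally[first_digit(votes)] += 1
    total = sum(tally.values())
    return {d: tally[d] / total for d in range(1, 10)}


biden = simulate_first_digits(0.73)
for digit, freq in biden.items():
    print(digit, round(freq, 3))
```

With `p = 0.73`, nearly all simulated counts start with a digit from 3 to 7. Because the ward sizes here are idealized, this sketch should not be expected to reproduce the Benford-like shape reported for Trump, which depends on the real spread of ward sizes.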
Makes sense. Can we assume that if the samples are large enough, at the state or county level, the distribution should be better?
|
|
Y is turnout, X is percentage of the votes for a given candidate in each precinct. Color is the size of the bucket. It's a 2D histogram. |
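For readers who want to rebuild this kind of plot, here is a rough sketch of the binning step behind a 2D histogram. The precinct values are hypothetical placeholders (NOT real data), and the function name and bin count are assumptions.

```python
from collections import Counter


def histogram_2d(shares, turnouts, bins=10):
    """Count precincts per (share bin, turnout bin).

    Both inputs are fractions in [0, 1]; a value of exactly 1.0 is
    placed in the last bin.
    """
    counts = Counter()
    for x, y in zip(shares, turnouts):
        i = min(int(x * bins), bins - 1)
        j = min(int(y * bins), bins - 1)
        counts[(i, j)] += 1
    return counts


# HYPOTHETICAL precincts, for illustration only.
shares = [0.72, 0.75, 0.31, 0.78, 1.0]
turnouts = [0.61, 0.64, 0.83, 0.66, 0.5]
hist = histogram_2d(shares, turnouts)
print(hist[(7, 6)])  # precincts with share in [0.7, 0.8) and turnout in [0.6, 0.7)
```

Each bucket count would then be mapped to a color, with X as the candidate's share and Y as turnout.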
And left is Biden right is Trump? |
Yes. Earlier comment I made had them inadvertently swapped on the Allegheny histogram. I've deleted that comment and uploaded the correct one. I could push my fork if someone wants to play with this. In particular, it'd be interesting to see if this is a normally occurring phenomenon by comparing these with a county where Trump won. I started looking into Miami Dade, but precinct-level voter turnout data is not easily available for it, unfortunately. |
To be clear (for folks who will repost this on social media) - I'm not alleging any fraud or anything like that. Just surfacing a pattern which I thought was counterintuitive, and seeking an explanation. |
State-level data seems to be very easy to get and process, and should be statistically very solid. |
Well, precinct level turnout data is technically available, but "available" in this case means a gigantic PDF, from which hundreds of values need to be copied out by hand. I don't have that kind of time or motivation. |
That's interesting. I haven't played around with that data yet. Do those turnouts count mail-in ballots? Is there also a negative correlation between turnout and something else that is correlated with Biden support, such as population density? |
Yes:
|
I haven't studied other confounders, perhaps someone else will. |
There's a pretty informative thread evolving on this in general here https://skeptics.stackexchange.com/questions/49782/do-vote-counts-for-joe-biden-in-the-2020-election-violate-benfords-law |
@frycast it has nothing to do with what is presented here. They discuss other graphs of unknown source. |
That's very helpful, thanks. I assume you are estimating the true probabilities using the observed frequencies. But what if these are wrong, and significantly so? I think the big question is: is it possible to detect this kind of fraud, with any level of confidence, using advanced techniques, and then correlate it with real-world behavior? A good starting point, it seems, would be to look at earlier data from 2016, use it as a baseline for the analysis, and estimate Biden's and Trump's true 'expected probability', with some error bars, from that frequency data. In Milwaukee, however, we see that turnout was not significantly different from 2016, and that the expected probabilities for Biden would, in fact, be lower than in 2016 in majority-Black wards. Is it possible to get the 2016 data? |
You cannot detect fraud using statistics alone. But you can detect statistical anomalies. |
Thanks for clarifying. By 'detect fraud' I mean to detect anomalies suggestive of fraud. (I deleted this last part by accident when editing.) Is it possible to get the 2016 data? |
@frycast, you want to run that in base 16 for giggles? |
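For what a base-16 run might involve: Benford's law generalizes to any base b, with expected leading-digit probability log_b(1 + 1/d) for d = 1, ..., b-1. A small sketch follows; the helper names are assumptions and do not come from the notebook.

```python
import math


def benford_expected(base=10):
    """Expected leading-digit probabilities under Benford's law in a base."""
    return {d: math.log(1 + 1 / d, base) for d in range(1, base)}


def leading_digit(n, base=10):
    """Leading digit of a positive integer n written in the given base."""
    while n >= base:
        n //= base
    return n


print(round(benford_expected(16)[1], 3))  # expected share of leading digit 1 in hex
print(leading_digit(365, 16), leading_digit(730, 16))
```

Counts of 365 and 730 (73% of wards with 500 and 1000 voters) have leading hex digits 1 and 2, so changing the base moves the cluster to different digits but would not by itself spread the counts over more orders of magnitude.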
"Empirical probability" means the probability on the observed data alone. There is no need for any estimate of the true probability here. We only need the empirical probability. This is because my simulation demonstrates that the disagreement with Benford's law arises even if the true probability is equal to the empirical probability. So, in other words, if there is no fraud, then you still get the observed disagreement in Milwaukee. My simulation cautions against an erroneous use of Benford's law to try to detect election fraud in Milwaukee, since the observed result is exactly what we would expect if there was no fraud. |
Could be. According to this data GOP voter turnout in Allegheny slightly exceeds 100% (so there are probably some unaffiliated in it), whereas Joe's voter turnout is about 74%. I'm not sure what "count of all other voters" means in the spreadsheet, though. |
On the voter participation data... I think the question that is relevant to this thread is: if the Biden vote counts are distributed normally in the range 300-400 (and therefore non-Benford), whereas the Trump counts are Benford-like, are the overall voter participation trends consistent with the vote distributions seen? |
That is a good criticism. I think this isn't a problem here, though. My argument for this is that the simulated vote count distributions do look visually very similar to the observed ones, for both Biden and Trump (and not just the Benford distributions). A visual comparison would not be sufficient in many cases, but in this case, especially since no inference is being made about the true distribution, we can see that, even if the underlying data generating process is being misrepresented, there is enough agreement to justify a visual comparison of the Benford distributions. So a clearer conclusion is: if the data are generated binomially, with no difference in data generating process (DGP) between Biden and Trump other than the probability of receiving a vote, then the observed data visually agree with the simulation in count and Benford distribution for both Biden and Trump. |
@markr-github That's very interesting. Let me suggest plotting the vote count distributions themselves. Benford's Law data is heavy-tailed, but heavy-tailed data may not be Benford. We can see if the data is heavy-tailed or not by looking at plots of the vote count distributions (and by checking the tail statistics; you can use the powerlaw packages in R or Python to do this). Also, can you share the data sets and notebooks, if they are checked in? |
@charlesmartin14 Not in a notebook, but the code and data are here: |
@markr-github Thanks, I'll take a look after work today. I think what we have learned so far is that when we see deviations from Benford's Law, the data is clustered around a high value (say 100-200 votes). I'm just a bit surprised that in the cases I have looked at so far (Biden's Election Day data for Allegheny) the data appears nearly perfectly Gaussian and not seemingly heavy-tailed (i.e. Biden's absentee data, Trump's data, etc.). That is, it appears that there are (unusually?) very few districts with really high turnout for Biden. See #31. But maybe there is just not enough data to see the tail? That could certainly be, and it may be necessary to study total Biden districts across, say, an entire state. I'm still checking that and need to do more careful tests. This also, however, appears to be how the Biden vote distributions behave. That's what is in the charts that @andrewzigerelli is showing above, if I understand this correctly. The higher the turnout in a district, the lower the Biden percentage. And the exact opposite for Trump. |
@charlesmartin14 That relationship between turnout and margin is exactly what I would have guessed beforehand, so it doesn't surprise me. In every US presidential election with data, the highest-turnout ethnic group has been "white non-Hispanic". If you were convinced that fraud was happening, then the naive approach would be to look at high-turnout areas, since more ballots increases the probability of "fake" ballots being included. I don't think there's any evidence that Trump precincts were fabricating votes, though. |
I asked the question because I see Benford's Law as a statistical test for heavy-tailed behavior, characteristic of natural (i.e. not fake) data. I agree, I don't think it can be interpreted using a naive approach. However, there are other tests for heavy-tailed behavior, more suitable to finite-size systems, that might prove more useful here. The simplest of these is to fit the tail of the data to a truncated power law distribution, and then compare this to an exponential distribution using a non-parametric Kolmogorov–Smirnov test (see #31).
But it also increases the probability of "real" ballots, so it says nothing about the signal-to-noise ratio, which will certainly affect any estimator we use. |
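A rough sketch of the tail comparison suggested above, using only the Python standard library. Two simplifications are assumptions of this sketch and not the method from #31: it fits a plain (untruncated) continuous power law instead of a truncated one, and it only reports the Kolmogorov-Smirnov distance of each fit, not a formal significance test. The synthetic data and all function names are hypothetical.

```python
import math
import random


def ks_distance(sorted_xs, cdf):
    """Largest gap between the empirical CDF and a model CDF."""
    n = len(sorted_xs)
    return max(
        max(abs(cdf(x) - i / n), abs(cdf(x) - (i + 1) / n))
        for i, x in enumerate(sorted_xs)
    )


def compare_tail_fits(data, xmin):
    """Fit a power law and an exponential to the tail x >= xmin.

    Returns the KS distance of each maximum-likelihood fit; a smaller
    distance means that model describes the tail better. Assumes the
    tail holds values strictly above xmin.
    """
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    # MLE for a continuous power law with density ~ x ** (-alpha).
    alpha = 1 + n / sum(math.log(x / xmin) for x in tail)
    # MLE for an exponential shifted to start at xmin.
    rate = n / sum(x - xmin for x in tail)
    return {
        "power_law": ks_distance(tail, lambda x: 1 - (x / xmin) ** (1 - alpha)),
        "exponential": ks_distance(tail, lambda x: 1 - math.exp(-rate * (x - xmin))),
    }


# SYNTHETIC heavy-tailed sample (not election data).
rng = random.Random(1)
sample = [rng.paretovariate(1.5) for _ in range(2000)]
result = compare_tail_fits(sample, xmin=1.0)
print(result)
```

On real vote counts, the choice of `xmin` matters a great deal, and a dedicated package such as `powerlaw` would be a more careful tool than this sketch.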
And notice...Taleb also used normal random data as an example of Benford https://twitter.com/nntaleb/status/1326212740273278978 This seems qualitatively correct. I don't think it's helpful, however, to chime in. I prefer to avoid a flame war on Twitter. There are lots of smart people here and I think we should just figure this out ourselves. Maybe there is something here, maybe not. I'm hoping to see more once we dig into the vote distributions. |
This just hit the web. Do we have a way to check this or comment on it? Do we need to open another issue? |
I would encourage anyone planning on watching that video to read Dr. Shiva Ayyadurai's Wikipedia page as well. Here are the first few sentences, for your convenience: V. A. Shiva Ayyadurai (born Vellayappa Ayyadurai Shiva, December 2, 1963) is an Indian-American scientist, engineer, politician, entrepreneur, and promoter of conspiracy theories and unfounded medical claims. He is notable for his widely discredited claim to be the "inventor of email". |
@MechanicalTim agreed. Can the data be grabbed and can we run this on our own to either confirm or deny the outcome? |
@chavenor We should try to get the data ourselves. I would also suggest reaching out to the researcher at MIT. |
@charlesmartin14 I'm way ahead of you. Already asked on Twitter. Who was the guy from MIT? Did they have their info on that presentation? |
I found the other guys and reached out on LinkedIn. Hope they can share their data with us so we can double-check it. |
@chavenor For reference, Dr. Shiva Ayyadurai ran for the senate as a Republican in Massachusetts. He's considered a bit of a joke over here. |
@alexsullivan114 It doesn't matter. What matters is getting data and doing our own honest analysis. |
@alexsullivan114 there seems to be a trend that anyone who is anti-establishment gets the "crazy stamp" -- I've moved beyond that prism. They made claims. I've asked for the data. If we get it and can verify the results, then that is all the proof we should need. I didn't see that @charlesmartin14 already responded. Tossed ya a thumbs-up, happy to have your input. |
Sure - totally fair. I was just trying to add some context about who this person was - of course the data should stand on its own. |
There are 3 people presenting, one of whom is a state election commissioner. Remember also that there are claims that some media companies like Twitter, CNN, etc. are actively censoring information claiming to be (potential) evidence of fraud. So he may be forced to go 'underground', so to speak. The data should speak for itself. |
It seems to me that the Shiva stuff is a case of deliberately deceptive plotting. They display plots using the following data:
If we posit that people who are more likely to vote straight Republican are more likely to vote for Trump, then the mean percentages voting for Republican, and voting for Trump, might look something like this: repub_prec_fraction = [20; 30; 40; 50; 60; 70; 80]; % and rough approx of "straight Rep" THE ABOVE ARE NOT REAL DATA! USED FOR ILLUSTRATIVE PURPOSES ONLY! (Also, excuse the MATLAB syntax.) Here are two subplots:
(In Shiva's plot, there is of course the random scatter of real data around those lines.) He then claims that this shape is somehow evidence of Biden stealing votes from Trump. I have admittedly over-simplified a bit, for the sake of making my fundamental point more directly. But I think this is at the heart of Shiva's plot. I think he is obscuring truth, not revealing it. Shiva does other deceptive things on the plot, like adding lines to "guide the eye", which, if you ignore them, you realize do not actually follow the data. There are also edge effects on the plot, that he ignores. Finally, he also makes verbal statements that are similarly deceiving. I rate the video 1 out of 10. Would not watch again. (Disclaimer: I only watched the first 37 minutes before writing this.) |
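Since the rest of the MATLAB snippet and the subplots did not survive in this copy of the thread, here is a hypothetical Python illustration of how a downward slope can arise with no vote-switching at all. Only `repub_prec_fraction` comes from the comment above; the intercept and slope used for `trump_fraction` are arbitrary assumptions chosen for illustration, not the commenter's numbers and NOT real data.

```python
# Straight-ticket Republican percentage per precinct (from the comment above).
repub_prec_fraction = [20, 30, 40, 50, 60, 70, 80]

# HYPOTHETICAL: Trump's percentage among the remaining voters rises with the
# Republican percentage, but less steeply. The 25 and 0.5 are arbitrary.
trump_fraction = [25 + 0.5 * r for r in repub_prec_fraction]

# Plotting this difference against repub_prec_fraction gives a line that
# slopes downward even though both percentages increase together.
difference = [t - r for t, r in zip(trump_fraction, repub_prec_fraction)]

for r, t, d in zip(repub_prec_fraction, trump_fraction, difference):
    print(r, t, d)
```

Under these assumed numbers, the difference falls from 15 to -15 as the Republican percentage rises from 20 to 80, simply because the second series has a slope below 1.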
This should be moved to another thread |
@MechanicalTim I do not believe that is what they are saying. I took straight ticket as assuming that all Republicans vote for Trump, so as a precinct gets more Republican you would expect the number to be at 0%, not down at -25%. Also, this does play into the discussion above about lower Dem turnout and trying to figure out where the votes came from. I'll wait for the data so we can just see what they did. Moved here. #38 |
It's exactly what happens with their plots, that's the whole story :) Well said. |
The disappearance of Benford's law in Milwaukee is a function of voter preference alone. If one candidate has between 60% and 80% average chance of receiving a vote, then the sizes of the wards in Milwaukee are too small to accommodate Benford's law. See further details with my simulations here https://rpubs.com/frycast/687633
Edit: Not just too small, but too concentrated. They do not span many orders of magnitude.
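The point about orders of magnitude can be checked with a short sketch. It compares leading-digit frequencies for two synthetic samples (neither is election data): one spread log-uniformly over five orders of magnitude, and one confined to 300-800. The ranges, sample sizes, and names are assumptions made for illustration.

```python
import math
import random
from collections import Counter

# Benford's expected leading-digit probabilities in base 10.
benford = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit_freqs(values):
    """Observed leading-digit frequencies for positive integers."""
    tally = Counter(int(str(v)[0]) for v in values if v > 0)
    total = sum(tally.values())
    return [tally[d] / total for d in range(1, 10)]


def max_gap(freqs):
    """Largest absolute deviation from Benford across the nine digits."""
    return max(abs(f - b) for f, b in zip(freqs, benford))


rng = random.Random(0)
# SYNTHETIC: log-uniform values covering 1 up to about 100000.
wide = [int(10 ** rng.uniform(0, 5)) for _ in range(20000)]
# SYNTHETIC: values concentrated within a single order of magnitude.
narrow = [rng.randint(300, 800) for _ in range(20000)]

print(round(max_gap(first_digit_freqs(wide)), 3))
print(round(max_gap(first_digit_freqs(narrow)), 3))
```

The wide sample stays close to Benford, while the narrow one cannot, since no value in 300-800 starts with a 1 or a 2.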
Edit 2: The thread below becomes distracted by an effort to look into election data anomalies that are not directly related to this issue. My intention here is not to develop a fraud detection tool, but to highlight the major flaws with the one being used, and currently being touted by various news sources as evidence of fraud. So far, this issue is still open, and should be resolved by at least adding some comments to the README clarifying that the pattern observed in Milwaukee is a pattern that can arise in election data absent of fraud. Hopefully the owner of this popular repository, and the people involved here in this thread, are all interested in acting in good faith, and will focus on resolving the issue.