13

This is an excerpt of data scraped from the New York Times election tracking website. At line 5969, the vote count suddenly drops by 240k votes, then at line 5972 it drops again by 114k votes, then at line 5974 it suddenly goes up 500k votes, then at line 5977 it drops nearly 600k again.

To be clear, I'm not alleging fraud here. I'm simply wondering whether anybody knows what happened. If you look at the ratios, Trump gained a 3.6% swing between lines 5968 and 5977. That works out to 249k votes for Biden "vanishing" between those lines and only 122k for Trump. Comparing line 5968 to 6003 when the vote count finally gets back up to that level again, Trump gets a 21.2% swing between those two similar lines, so if this vote subtraction benefited anybody, it benefited Trump.

So, anybody know the story?

line    state           timestamp               d_ratio r_ratio votes
5966    pennsylvania    2020-11-04T02:10:32Z    0.65    0.342   1087107
5967    pennsylvania    2020-11-04T02:11:01Z    0.643   0.349   1106477
5968    pennsylvania    2020-11-04T02:13:11Z    0.641   0.351   1111586
5969    pennsylvania    2020-11-04T02:14:32Z    0.592   0.399   871782
5970    pennsylvania    2020-11-04T02:14:56Z    0.59    0.402   877724
5971    pennsylvania    2020-11-04T02:16:43Z    0.586   0.406   888907
5972    pennsylvania    2020-11-04T02:17:03Z    0.689   0.301   774021
5973    pennsylvania    2020-11-04T02:17:32Z    0.685   0.306   781428
5974    pennsylvania    2020-11-04T02:18:59Z    0.677   0.299   1288475
5975    pennsylvania    2020-11-04T02:19:33Z    0.677   0.299   1288604
5976    pennsylvania    2020-11-04T02:21:59Z    0.664   0.312   1325632
5977    pennsylvania    2020-11-04T02:22:45Z    0.627   0.363   739443
5978    pennsylvania    2020-11-04T02:23:32Z    0.629   0.361   779178
5979    pennsylvania    2020-11-04T02:24:00Z    0.625   0.364   786831
...
6003    pennsylvania    2020-11-04T02:41:23Z    0.532   0.455   1125611
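
For anyone who wants to check the arithmetic, here is a minimal Python sketch. It assumes each candidate's votes can be approximated as ratio × total votes (the table does not give per-candidate counts), and it measures the margin as Biden share minus Trump share, which may not match the definition of "swing" used in the question. The names `ROWS`, `implied_votes`, `margin`, and `compare` are made up for this illustration.

```python
# Rows copied from the table above: line -> (d_ratio, r_ratio, votes).
ROWS = {
    5968: (0.641, 0.351, 1111586),
    5969: (0.592, 0.399, 871782),
    5977: (0.627, 0.363, 739443),
    6003: (0.532, 0.455, 1125611),
}


def implied_votes(line):
    """Return (biden, trump) votes, assuming votes = ratio * total (rounded)."""
    d_ratio, r_ratio, votes = ROWS[line]
    return round(d_ratio * votes), round(r_ratio * votes)


def margin(line):
    """Biden share minus Trump share, in percentage points."""
    d_ratio, r_ratio, _ = ROWS[line]
    return (d_ratio - r_ratio) * 100


def compare(a, b):
    """Summarize the change from line a to line b."""
    biden_a, trump_a = implied_votes(a)
    biden_b, trump_b = implied_votes(b)
    return {
        "total_change": ROWS[b][2] - ROWS[a][2],
        "biden_change": biden_b - biden_a,
        "trump_change": trump_b - trump_a,
        "margin_shift_pts": round(margin(b) - margin(a), 1),
    }


if __name__ == "__main__":
    for a, b in [(5968, 5969), (5968, 5977), (5968, 6003)]:
        print(a, "->", b, compare(a, b))
```

Under those assumptions, the 5968 to 5977 comparison gives roughly 249k fewer implied Biden votes and 122k fewer implied Trump votes.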

Dataset source

4
  • 6
    As someone who's run an election night reporting website, I can offer you the possibility that nobody knows. It could be a problem with the NYT team, a data middleman, or a county clerk report. If it's a county clerk who, say, temporarily and accidentally entered all ballots instead of in-person counts, they might not know the NYT even knew of their mistake. These kinds of things happen all the time when numbers are flying in fast and furious and your traffic is record-setting, putting enormous social pressure on the clerks to get things updated as fast as possible.
    – dandavis
    Commented Nov 12, 2020 at 7:43
  • A fairly mundane possibility, with no evidence, so a comment. There is some kind of local collection, say at the level of 10 machines doing counts. When this gets "full", somebody walks it over to a larger collection area. You expect that sometimes, one group goes faster than the other. For truly mundane reasons like somebody had to unclog a card reader, or put paper in a printer, or some such thing. Then at the next reporting period there is a backlog that makes the next time bin look bigger.
    – puppetsock
    Commented Nov 12, 2020 at 20:45
  • Note that those NYT files are apparently obtained by "concatenating" (adding) the precinct-level updates, which are available separately e.g. static01.nyt.com/elections-assets/2020/data/api/2020-11-03/… The latter don't have clear timestamps alas, even though they contain some kind of historical updates for precincts. Commented Nov 22, 2020 at 8:06
  • Also of note, some reporting errors during election night originated with press' systems without any corresponding event in the (even unofficial) boards' counts, e.g. politifact.com/factchecks/2020/nov/10/eric-trump/… Commented Nov 22, 2020 at 20:29

3 Answers

5

Possibly this, or at least something related to this human error at Edison (whose precinct-level data the NYT aggregated in its own software):

On Twitter and Facebook, thousands of users shared a false claim that a CNN video from election night is evidence of the supposed vote-stealing operation in action.

It’s not.

In the video, vote totals for Pennsylvania displayed on screen by CNN can be seen going up for Biden by 19,958 as they go down for Trump by that same amount.

But the error was actually a simple mistake by a research firm that gathers results data for CNN and others; it wasn’t a change in the reporting by election officials. (It’s worth noting, though, that those vote tallies are unofficial tallies, not final, official results.) The mistake was corrected about an hour later.

Edison Research, which provides vote tabulation data to CNN and other networks, said that the moment of Trump dropping by 19,958 was caused by a brief reporting error on its end.

Rob Farbman, Edison’s executive vice president, told us that Edison did receive the correct vote totals from Armstrong County in Pennsylvania through a state feed — which at that point in time was 24,233 votes for Trump and 4,275 votes for Biden. But a team member, while scouring individual county vote totals, then mistakenly entered the county’s totals backwards — 24,233 votes for Biden and 4,275 votes for Trump.

A CNN source also confirmed that the matter came down to vote totals in Armstrong County being briefly transposed in the feed provided to the network, but that the accurate figures were soon restored.

“It was simple human error,” Farbman told us by email, noting the mistake was corrected about an hour later.

The duration of the anomaly in the NYT data is about 31 minutes, which is somewhat consistent with how long it took to fix this (other) data problem at Edison.

1
1

A number of media sources (including the New York Times) switched from the Associated Press to Edison Research for their exit poll data. I've searched around for Edison's data descriptions without much luck, but I suspect that 'votes' in this case is a projection: estimated votes based on some combination of exit polls and piecemeal returns from counties. Since the time period we're talking about is shortly after the polls closed, we can expect that each new chunk of data received would have a disproportionately large impact on projections (in the sense that if only 1% of the population has been surveyed, and a chunk of data that leans heavily towards (or against) Biden is added, that chunk will have an outsized influence on the next projection). These projection errors would naturally diminish as greater proportions of the population report in, but early on they might show some artificially large swings.

In other words, it isn't that the vote count dropped by 240,000, it's that someone's projection is magnified by current, momentary trends to make the assumed vote count look like it dropped. It's Bayesian logic: if you roll three snake-eyes in a row on a pair of dice, then all of a sudden snake-eyes seem extremely likely (at 100% probability); roll an eight after that, and the perceived probability of snake-eyes drops precipitously to 75%, and will ultimately decrease all the way to its 'natural' probability of 1 in 36. Early election results show the large statistical variability of polling. Later results (by the law of large numbers) pull back towards the actual vote share.
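
To make the dice example concrete, here is a small Python sketch. It tracks the naive observed frequency of snake-eyes after each roll, which is the "perceived probability" described above; a full Bayesian update with a prior would pull the early estimates toward 1/36 much sooner. The function name `running_estimates` is invented for this illustration.

```python
from fractions import Fraction


def running_estimates(outcomes):
    """Yield the observed frequency of snake-eyes after each roll.

    `outcomes` is an iterable of booleans: True if the roll was snake-eyes.
    """
    hits = 0
    for n, is_snake_eyes in enumerate(outcomes, start=1):
        hits += is_snake_eyes
        yield Fraction(hits, n)


# Three snake-eyes in a row, then an eight.
rolls = [True, True, True, False]
print([str(f) for f in running_estimates(rolls)])  # ['1', '1', '1', '3/4']
```

With only four rolls, a single non-snake-eyes result moves the estimate from 100% to 75%; the same single roll would barely register after thousands of rolls.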

EDIT: 11/12

Looking at the image of vote count over time provided by CDJB in the comments (I'm assuming she downloaded that data and ran it through Excel or SPSS), I've revised my view.

This steep 'V' is what I would expect from a glitch in a distributed network. Distributed networks hold data across multiple servers, with cross-checks to make sure that all servers are aware of changes in other servers. If one server goes down, large chunks of data become unverifiable by other servers; those other servers then mark that data as suspect, and hide it from totals until the absent server comes back on line. Nothing is gained or lost; distributed databases are designed that way to prevent data corruption.

At any rate, the volume of change speaks to a computational issue, not a ballot count issue. Ballot counting is too slow to effect that level of change over that time-frame. Edison may step up and explain precisely what happened, but I think we can safely cast this into the 'glitch' category.

7
  • 3
    I don't think this is correct - see this image of the votecount variable over time. It looks like it's just a brief glitch in data entry. I would add an answer but I don't see any reporting on this so I can't source it. Do you have any source for your interpretation of these vote counts as projected vote counts?
    – CDJB
    Commented Nov 11, 2020 at 12:46
  • @CDJB: No, just rational extrapolation from what I know about the process behind the scenes. If I had seen that graphic you posted, I'd probably have gone with your analysis that it's a data glitch. With that 'V'-shaped drop, it's more likely a problem with their distributed database than a data entry issue: e.g., one of the servers might have gone offline and come back on, temporarily ghosting a whole lot of database entries that couldn't be verified by the other servers. If you don't want to make an answer of your own, I'm happy to take that image and rewrite mine. Let me know. Commented Nov 11, 2020 at 18:44
  • 2
    Call me stupid, but I don't understand this answer or how it is an answer to the question. OP asks "why did the counted vote value drop?" and you answer with probabilities and so on? Why would the column be a projection? Do you have a source for that?
    – Federico
    Commented Nov 12, 2020 at 14:58
  • 2
    @Federico: The point is that it's wrong to assume that we are seeing something happening in the actual ballot reading. What we are seeing is almost certainly changes in reporting, where reporting is a complex task involving the aggregation and distillation of many wide-ranging events over some type of technology. When we talk to someone on a Zoom call we are actually seeing a digitized, encoded and transmitted image, right? So why would we think we're looking at actual 'ballots' here, when that information is obviously accumulated, processed, and transmitted before it reaches us? Commented Nov 12, 2020 at 15:49
  • 1
    The two lines 5968 and 5969 have the same "edison" data source in the json file, so the decrease wasn't due to switching sources. Commented Nov 22, 2020 at 7:08
-2

I found some additional time stamped precinct level data that may be of interest. Time stamps are in the .JSON links and Power BI can be used to compile this data into a useable format, as seen in this video https://www.youtube.com/watch?v=7m7-ku181Ik&t=336s. Godspeed!

PA- https://pastebin.com/5xBeaQzh GA- https://pastebin.com/CD05B8fw FL- https://pastebin.com/ai9skssU

4
  • 2
    I encourage you to take this data and use Power BI to compile it into a usable format, and then post your results. Assuming it doesn't just duplicate the graph in the other answer, anyway.
    – Bobson
    Commented May 15, 2022 at 6:45
  • This is the NYT .JSON real-time PA precinct vote tally updates from Nov 3, 2020 through Nov 11, 2020. This is the data that the defamed Edward Solomon used to claim that the reported percentages for Trump's votes did not meet the natural laws of standard deviation and number distribution for naturally occurring numbers. Based on my credentials, which involve critical risk assessment and root cause analysis for naval submarine systems, the math is irrefutable. If this data is accurate, the implications are astronomical.
    – Analog
    Commented May 15, 2022 at 19:07
  • 2
    As I said, you should actually perform the analysis and post your results. Linking to the raw data is good support for anyone who wants to dig deeper, but most people here don’t have the mathematical or programming knowledge to actually do anything with the raw data.
    – Bobson
    Commented May 15, 2022 at 19:51
  • I do not have the expertise to compile all of this data. I am a mechanical systems engineer, not a programmer, which is why I provided it here to the question's originator. I have hand-checked the entries Solomon used to support his claims, however, and it's a perfect match.
    – Analog
    Commented May 15, 2022 at 21:42
