24

I went on to this website

https://projects.fivethirtyeight.com/polls/president-general/2024/national/

to see recent presidential polls because I am interested in having an idea of who would win the presidential election. Showing Trump+X or Biden+Y leads me to understand that Trump is X more (units) likely to win, or that Biden is Y more likely to win.

However, when I look at the polls, I see that most of their methodology is to sample a diverse range of ages, race, income, and gender. But they don't divide them into state electoral college. So the reports might give an accurate prediction of the popular vote, but given that it is not that uncommon to win the popular vote but lose the election, the reports give not that good of a prediction regarding who will win the election. I assume that most people who read those polls are more interested in learning who is more likely to win an election than win the popular vote.

Presumably this shouldn't even be that hard to do. The study logs the age, race, income and gender, I assume that it isn't that much harder to log the state as well.

So why do so many polls simply aggregate the voters rather than segment into states? Is a generic population based poll that interesting?

1
  • 2
    It could be hard to determine what state the people that they are polling live in due to changes in assigning phone numbers.
    – Joe W
    Commented Jan 31 at 20:17

4 Answers

41

Initial Considerations

why do so many polls simply aggregate the voters rather than segment into states? Is a generic population based poll that interesting?

One reason to do a national poll is that it is a lot cheaper than doing a state poll with a sufficient sample size in every battleground state.

But, as the question notes, what you actually care about when push comes to shove, is the state by state outcome in the electoral college, not the popular vote that national polls are trying to measure.

Also, of course, doing a poll in every state is a waste of money. You don't need polls to tell you how West Virginia or the District of Columbia are going to cast their electoral votes. There are only about six to ten states in the United States that are seriously in play in the 2024 Presidential election.

enter image description here

Specifically, Biden starts the general election with 226 likely votes in the Electoral College and Trump with 235. To get to the 270 needed for victory, one of them will have to harvest some of the 77 votes up for grabs in half a dozen states (shown with the number of EVs of each): Arizona (11), Georgia (16), Michigan (15), Nevada (6), Pennsylvania (19), and Wisconsin (10).
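As a quick check of that arithmetic, here is a minimal sketch using only the figures quoted above (the variable names are mine, not anything from 538):

```python
# Electoral-vote figures as quoted in the paragraph above.
TOTAL_EV = 538
NEEDED = 270
likely = {"Biden": 226, "Trump": 235}
battlegrounds = {
    "Arizona": 11, "Georgia": 16, "Michigan": 15,
    "Nevada": 6, "Pennsylvania": 19, "Wisconsin": 10,
}

# Votes still in play across the six battleground states.
in_play = sum(battlegrounds.values())

# Sanity check: likely votes plus battleground votes cover the whole college.
assert sum(likely.values()) + in_play == TOTAL_EV

# How many battleground votes each candidate still needs to reach 270.
shortfall = {name: NEEDED - ev for name, ev in likely.items()}

print(in_play)    # 77
print(shortfall)  # {'Biden': 44, 'Trump': 35}
```

On these numbers Biden needs 44 of the 77 votes in play and Trump needs 35.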

You can disagree over which states are in play at the margins and tweak that list a little, but the basic concept still holds. We already know how most states are going to cast their electoral votes in the November general election.

So, what kind of tradeoff between the cost savings and directly measuring what you want to know makes sense?

State polls and national polls are closely correlated

Ultimately that comes down to how tightly correlated individual battleground state polls are with the national polling.

One of Nate Silver's real innovations, when he built predictive election outcome models based upon polling for his 538 blog (since taken over by big media companies), which the question cites, was to recognize that polling in individual states and national polling are strongly correlated (and so are the deviations from the outcomes predicted by those polls).

If a candidate gets more popular in Ohio, they tend to get more popular in Pennsylvania and by a similar magnitude, even if the absolute levels of support in each state are different.

Nate Silver quantified that by looking at old polling data, and while there are exceptions to that principle (such as home state advantages for Presidential candidates from a particular state), it turns out that polling in battleground states and national polling are pretty strongly correlated. (Another side effect of this fact, not further discussed in this answer, is that this greatly increases the uncertainty of any polling based prediction, because you can't assume that the results for each state are independent and that the law of averages will smooth out unpredictable differences between the real results and the polls.)
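To illustrate the parenthetical point about correlated errors, here is a minimal sketch. It assumes every state has the same polling error and every pair of states has the same correlation; the 3 point error and the correlation values are hypothetical numbers picked for illustration, not estimates from any real model.

```python
import math

def sd_of_average(state_sd, n_states, rho):
    """Standard deviation of the average polling error across n_states
    when every pair of state errors has correlation rho."""
    variance = state_sd**2 * (1 + (n_states - 1) * rho) / n_states
    return math.sqrt(variance)

STATE_SD = 3.0  # hypothetical per-state polling error, in percentage points
for rho in (0.0, 0.5, 0.8):  # hypothetical correlations between state errors
    print(rho, round(sd_of_average(STATE_SD, 6, rho), 2))
```

With independent errors (rho = 0) averaging over six states shrinks the error by a factor of the square root of six. As rho approaches 1 that benefit disappears, which is why correlated errors leave the overall forecast more uncertain.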

This is hardly surprising. In a Presidential election race, everyone in the nation is considering the same candidates, and the lion's share of the news that people are basing their opinions upon comes from the same collection of national news sources.

People everywhere in the nation are also basically evaluating Presidential candidates on the same national policy issues and on the candidates as people; information which is the same everywhere. While "all politics is local", nobody expects the President to be front and center on dealing with the potholes in your local main street, or the dispute over how to zone the defunct golf course north of town.

It helps that in the current election, battleground polls are close to 50-50 and so is national polling, something that didn't have to be true mathematically. So, we know that battleground states are not really outliers relative to national polling and you don't have to adjust for a different partisan balance or lean for the battleground state versus the nation as a whole.

Quantity v. Quality

Obviously, the best information is polling from battleground states.

But because national polls are cheaper and capture a large share of the data that battleground state polling would, due to the strong correlation between battleground state polling and national polling, there is a tradeoff.

You can do more national polls on a more frequent basis than you can battleground polls, for the same resources. Having more independent polls reduces the risk of pollster specific systemic error, making your overall polling average data more robust. More frequent polling also reduces error due to polling data being outdated as public opinion shifts over time.

So, while battleground polling, which is conducted less frequently and by fewer pollsters, does have the benefit of directly measuring what you really care about, the greater number of pollsters and the higher polling frequency that the lower cost of national polls allows counterbalance that benefit. This can make using national polling a tolerable tradeoff, because national polling turns out to be quite a good proxy for battleground polling, since the two are highly correlated.

Battleground polls matter more later in the race, national polls are good enough earlier on

Finally, national polling is less problematic early on, because early on, no poll can accurately capture the ultimate question you want to answer, which is who will win on election day. This is because voter preferences change as they approach election day.

So, as long as the general trend lines of national polling and battleground polling are very similar, national polling is just as good in the early part of the election cycle to capture the trend lines over time that will show you in which direction voter preferences are likely to shift between now and the election, or how voters are reacting to particular current events.

As you get closer to the election, the battleground polls are more accurate gauges of actual election day behavior. At that point getting the absolute values of election day outcomes right, rather than just the trend lines, matters. So, as you get closer to election day, battleground polling needs to get more weight in your analysis of the state of the Presidential election race than it does early on.

Binning results by state doesn't work

Presumably this shouldn't even be that hard to do. The study logs the age, race, income and gender, I assume that it isn't that much harder to log the state as well.

Not really. The relationship between sample size and accuracy in statistical sampling is non-linear. You can do a tolerably accurate national poll with 300-1000 respondents.

For the sake of argument, let's say that 10% of the U.S. population lives in battleground states.

This means that your battleground state sample is 30-100 respondents from your national poll, which has a profoundly greater sampling error uncertainty than a 300-1000 respondent poll, to the point of basically being garbage. And, you really have to spread those 30-100 respondents over six states, which leaves you even smaller subsamples (i.e. about 5-17 respondents per battleground state).

At a 200 person sample you are at +/- 6.9 percentage points of sampling error (at the usual 95% confidence level, with support split roughly 50-50). At a 100 person sample you are at +/- 9.8 percentage points. At a 30 person sample you are at +/- 17.9 percentage points. At a 17 person sample, you are at about +/- 23.8 percentage points. Less than that, and it is even worse. And, don't forget that in addition to your sampling error there is always at least some sample selection bias systemic error that is the same regardless of sample size.
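These figures can be reproduced, up to rounding, with the usual normal approximation. This is a minimal sketch that assumes a 95% confidence level (z = 1.96) and a 50-50 split; the function name is mine, and for samples as small as 17 the normal approximation is itself only rough.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval,
    in percentage points, for a sample of n respondents."""
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Sample sizes discussed above: a full national poll, then shrinking subsamples.
for n in (1000, 200, 100, 30, 17):
    print(n, round(margin_of_error(n), 1))
```

Going from 1,000 respondents down to a 17 person state subsample multiplies the margin of error by more than seven.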

These combined polling uncertainties are so large that they render the polling data completely useless in a reasonably close contest like the 2024 Presidential race.

Of course, if you bin by state, you can also pretty much throw away the other 90% of your national sample data, because we know, without doing any meaningful polling, which candidate the non-battleground states are going to support.

A few technical details

  1. Sample size matters, sample population doesn't matter.

Statistical sampling error mostly depends upon the size of the sample, regardless of the size of the population being sampled, so long as the population being surveyed is much larger than the size of the survey sample and so long as the proportions of support for each candidate are roughly similar.

For example, suppose you have a national survey with 1,000 respondents that shows 50% support for Trump and 50% support for Biden in a head to head race. The one standard deviation margin of error is about 1.6 percentage points.

If you survey 1,000 Wisconsin respondents and support for Trump is, say, 45% or 55%, with the balance supporting Biden, the one standard deviation margin of error is also about 1.6 percentage points.

If the support for each candidate is roughly similar, and the sample is much smaller than the population sampled, the statistical sampling error depends entirely on the sample size without regard to how large the population you are sampling is, so long as the sampled population is roughly 100,000 people or larger.

So, surveying six battleground states separately, each with the same sample size as a national survey, costs about six times as much as one national survey.
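To see how little the population size matters, here is a minimal sketch using the standard finite population correction. The population figures are rough, illustrative round numbers, not official counts.

```python
import math

def standard_error(n, population, p=0.5):
    """One-standard-deviation sampling error in percentage points,
    with the finite population correction applied."""
    fpc = math.sqrt((population - n) / (population - 1))
    return 100 * math.sqrt(p * (1 - p) / n) * fpc

# Illustrative population sizes only (assumptions, not census data).
populations = [("town", 100_000), ("state", 6_000_000), ("nation", 330_000_000)]
for label, population in populations:
    print(label, round(standard_error(1000, population), 2))
```

For a 1,000 person sample the three results differ by less than a hundredth of a percentage point, so a state poll needs about as many respondents as a national poll to reach the same precision.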

  2. There are diminishing returns to using a larger sample size.

Also, while the cost of doing a survey is proportional to the number of respondents, increasing the sample size by X% reduces the sampling error by much less than X%, and the bigger your sample, the greater the diminishing returns of a larger sample size. The chart below illustrates this point:

enter image description here

For example, you need about 1,100 respondents to get a +/- 3 percentage point sampling error (at 95% confidence) when support is split roughly 50-50 between the candidates. But, to cut that statistical sampling error in half to +/- 1.5 percentage points, you need about 4,300 respondents, roughly four times as many, because sampling error shrinks only with the square root of the sample size.
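The same relationship can be run in reverse to get the sample size needed for a target margin of error. This is a minimal sketch assuming the 95% normal approximation and a 50-50 split; exact figures shift with the confidence level and rounding, and the function name is mine.

```python
import math

def required_sample_size(moe_points, p=0.5, z=1.96):
    """Smallest n whose normal-approximation margin of error
    is at most moe_points percentage points."""
    moe = moe_points / 100
    return math.ceil(z**2 * p * (1 - p) / moe**2)

for target in (3.0, 1.5):
    print(target, required_sample_size(target))
```

Because the margin of error scales with one over the square root of n, halving it takes about four times the respondents, and so about four times the cost.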

  3. Multiple independent surveys reduce systemic error which is just as important as sample size error.

Finally, at some point it is pointless to reduce your sampling error any further because any poll has two sources of uncertainty.

One is sampling error which can be calculated precisely with pure math as I've done above.

The other is systemic error that flows from bias in how you collect your sample. Systemic error is the same no matter how big your sample is because it flows from picking a sample that is proportionately skewed toward one candidate or another, perhaps because your likely voter model of who to survey is wrong, and perhaps because supporters of one candidate are more likely to respond than supporters of another candidate (or are more likely to respond untruthfully).

In a typical political poll your systemic error is similar in order of magnitude to your sampling error. It is almost impossible to know exactly what a survey's systemic error is except from the track records of polling firms in hindsight. (But, if you do know each polling firm's historical track record, one common way to weight their results is with a factor of one divided by the difference between their historical poll results and the actual election results, which is called inverse error weighting.)

Also, if you make your sampling error arbitrarily small by using a larger sample, you only get part of that benefit because the systemic polling bias will swamp the reduced sampling error. This is because when you combine two sources of margin of error, the larger number always dominates the combined error from both sources of uncertainty taken together.

For example, if you (without knowing it) have a +3 percentage point systemic bias in your sample for Biden, even if you can get your sampling error down to +/- 1 percentage point with a 10,000 person sample, your bias in selecting the sample will basically destroy the benefit of the reduced sampling error you got by surveying 10,000 people instead of 1,200 people.
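A minimal sketch of the inverse error weighting described in the parenthetical above. The poll margins and historical errors are made up for illustration, and the function name is mine; real aggregators use more elaborate weighting schemes.

```python
def inverse_error_weighted_average(polls):
    """Average poll margins, weighting each by 1 / historical error.

    polls: list of (margin, historical_error) pairs in percentage points.
    historical_error must be positive, since a zero error would divide by zero.
    """
    weights = [1 / err for _, err in polls]
    weighted_sum = sum(w * margin for w, (margin, _) in zip(weights, polls))
    return weighted_sum / sum(weights)

# Hypothetical pollsters: (candidate lead, average past miss).
polls = [(2.0, 1.0), (4.0, 2.0), (-1.0, 4.0)]
print(round(inverse_error_weighted_average(polls), 2))
```

The pollster with the smallest historical miss pulls the average hardest, and the one with the worst track record contributes least.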
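To put numbers on that example, here is a minimal sketch that combines a fixed bias with sampling noise as a root-mean-square error. The 3 point bias is the hypothetical figure from the example; treating the two error sources as combining in quadrature is my assumption for illustration.

```python
import math

def sampling_se(n, p=0.5):
    """One-standard-deviation sampling error in percentage points."""
    return 100 * math.sqrt(p * (1 - p) / n)

def rmse(bias_points, n):
    """Root-mean-square error of one poll with a fixed bias plus sampling noise."""
    return math.sqrt(bias_points**2 + sampling_se(n)**2)

BIAS = 3.0  # hypothetical systemic bias, in percentage points
for n in (1_200, 10_000):
    print(n, round(sampling_se(n), 2), round(rmse(BIAS, n), 2))
```

Going from 1,200 to 10,000 respondents cuts the sampling error by almost two thirds, but the total error falls by less than a third of a percentage point because the bias term dominates.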

As a practical matter, the best way to reduce systemic error is by averaging polls from multiple different pollsters of the same thing in a way that gives weight to their past accuracy.

While every pollster has some systemic bias in their sample, generally speaking, they will not have exactly the same sources of systemic bias, so averaging the polls tends to make the biases of individual polling firms balance out and give you a less biased final average.

So, if you have the money, having six different firms each do the same national survey will wipe out a lot of the sample bias systemic error while still leaving a decent enough sampling error. That gives you a lot more precision in the combined systemic and sampling error than having a single survey from each battleground state, which has no reduction in systemic error and a similar sampling error.

Likewise, having six firms do one 1,200 person national survey each is much better than having one firm do a 7,200 person national survey, because with six firms you'll greatly reduce sample selection bias in the combined result, which will far outweigh the reduced sampling error from having a 7,200 person sample instead of a 1,200 person sample.
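A minimal sketch of that comparison. It assumes each firm's bias is an independent draw with mean zero across firms, and the 2 point spread is a hypothetical figure chosen for illustration; if firms share a common bias, averaging helps less than this suggests.

```python
import math

def combined_error(firms, respondents_per_firm, house_bias_sd, p=0.5):
    """Standard deviation of the average of several polls, combining
    sampling noise with independent per-firm bias (percentage points)."""
    total_n = firms * respondents_per_firm
    sampling_var = 100**2 * p * (1 - p) / total_n
    bias_var = house_bias_sd**2 / firms  # independent biases average down
    return math.sqrt(sampling_var + bias_var)

HOUSE_BIAS_SD = 2.0  # hypothetical spread of firm-level bias, in points

one_big_poll = combined_error(1, 7_200, HOUSE_BIAS_SD)
six_small_polls = combined_error(6, 1_200, HOUSE_BIAS_SD)
print(round(one_big_poll, 2), round(six_small_polls, 2))
```

Both designs use 7,200 respondents and so have the same sampling error, but under these assumptions the six firm average has roughly half the total error of the single large poll.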

7

There are many polls for individual states, which you can see in 538 by clicking the dropdown box under "State":

enter image description here

Examples: Florida:

enter image description here

Pennsylvania:

enter image description here


And, importantly, when it gets closer to the elections, 538 (and also many others) will give an election forecast/model that "relies mainly on state polls" (see e.g. their 2020 forecast).


In the national/international media, it's often simpler (and perhaps more interesting for most viewers/readers) to just quote the national polls. This may have given you the wrong impression that there are mostly only national polls and few/no state polls.

Indeed, in 538's 2020 forecast that I quoted from above, they state,

Our model relies mainly on state polls, which it combines with demographic, economic and other data to forecast what will happen on Election Day. If you want to see a snapshot of what voters are thinking right now — with no fancy modeling — check out the national polls.

Often what consumers of media want is a quick snapshot. And for this, the national polls are probably better than a complicated listing of the polls in 10 different swing states.

5

These polls are not intended to be a very accurate prediction of who will win. Except in landslide cases, it's somewhat pointless to do that a year away from the actual election anyway, and substantially more expensive to poll with the corresponding margin of error at state level. (In fact, predicting state-level results just a few days before the election can be a greater challenge than predicting national-level/popular-vote results many months ahead!) The battleground state polls, which are also run sometimes, are generally more insightful if your goal is to suss out the outcome of the election in close cases. But note that the latter have e.g. around 5,000 participants (in 7 states), whereas the national polls seldom reach that total [sample] figure, and more typically hover around 1,000. (ohwilleke's answer explains [in part] why you can't do well with far fewer people per state. There's also the complication of predicting state-level turnout by social strata, which may exhibit differences between the states; see e.g. this paper.)

So what are the nationwide polls useful for, nonetheless? A couple of things:

  • trends: you can usefully compare these polls to each other, over time (Ideally, you'd have to use the same pollster though, because of bias etc.)
  • predicting landslides: sometimes these do happen, even if not recently (except intra-party).

The presidential job approval rating polls are somewhat similar in scope, by the way. In some cases, you don't even need to know who the [exact] opponent is [going to be].

Besides those purely data-gathering purposes, polls [of all kinds] are also sometimes used in an effort to influence the ultimate result.

0

So the reports might give an accurate prediction of the popular vote, but given that it is not that uncommon to win the popular vote but lose the election, the reports give not that good of a prediction regarding who will win the election.

First off, it's only become common in recent elections; historically it was very uncommon. But because we know of recent trends, country-wide polling is still useful because we can take those trends into account. That is, assuming this election is like recent elections, if lots of polls near election time show the national average to be 50-50, that's very good news for Trump and bad news for Biden (because Biden probably has to beat Trump by ~2% in the popular vote to be competitive in the electoral college).

Of course, nearer to election time there will be lots of state polls in the places that matter, and those polls will be what is weighted most heavily in the prediction algos. But it's still good to get national polling for trends, error detection / state-wide polling improvement, and also to ask questions that are of national interest, such as the president's approval rating.
