After the release of the Richmond Park constituency poll on Friday 28th October, a couple of eagle-eyed polling enthusiasts noted the term “effective base size” and may have conflated it with the actual number of responses to the poll.
To clear things up, 543 Richmond Park residents were surveyed for the poll, 419 gave a voting intention after the first time of asking, and a further 39 gave the way they were leaning if they were undecided (459 in total).
To be fair, effective base sizes are not a simple concept, and I feel that we should offer at least a bit of an explanation about what they are, why BMG publishes them and why we feel that the Richmond Park poll example might be a misleading use.
What does “effective base size” mean & why do we publish it?
Most statistical tests and margin of error calculations assume that data, like those for the Richmond Park poll, have been randomly sampled without replacement, and also that response rates are broadly even across all groups. However, there are many situations where this is not possible, for instance where certain groups are harder to reach or refuse to participate, or where a poll has to be turned around very quickly. For this reason, in order to make the final result a more ‘accurate’ reflection of the population that we’re interested in polling, in this case Richmond Park residents, we adjust (weight) the data to known population statistics. All BPC polling companies do this.
However, what most of the public don’t realise is that when we’re calculating the margin of error for a statistic, or doing a statistical test, these adjustments must be taken into account. In short, weighting the results has a similar effect to that of simply reducing your overall sample size. For instance, a poll of 1,000 people that is adjusted severely might have a similar margin of error to a poll of just 300 people with relatively light adjustments.
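To put rough numbers on that comparison, here is a minimal sketch using the textbook 95% margin of error for a proportion from a simple random sample. The function name, the worst-case share of 50% and the z value of 1.96 are illustrative assumptions, not a description of how BMG calculates its published figures.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p, given a
    simple random sample of size n (textbook formula, assumed here)."""
    return z * math.sqrt(p * (1 - p) / n)

# A poll of 1,000 whose heavy weighting leaves an effective base of 300
# has roughly the precision of the second figure, not the first.
moe_nominal = margin_of_error(1000)
moe_effective = margin_of_error(300)

print(f"n = 1000: +/- {moe_nominal:.1%}")
print(f"n = 300:  +/- {moe_effective:.1%}")
```

Under these assumptions the margin of error widens from roughly three points to between five and six points.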
The effective base, then, is an estimate of the sample size that would be needed to achieve the same level of precision if a perfectly sampled set of data, without any adjustments, were used.
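One widely used way to estimate an effective base from a set of weights is Kish's approximation: the square of the sum of the weights divided by the sum of the squared weights. The sketch below assumes that formula and uses made-up weights; it is not necessarily the exact calculation behind our published tables.

```python
def effective_base(weights):
    """Kish's approximation of effective sample size:
    (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_squared = sum(w * w for w in weights)
    return total * total / total_squared

# Hypothetical weights for 1,000 respondents (illustrative values only).
equal_weights = [1.0] * 1000                  # no adjustment at all
uneven_weights = [0.5] * 500 + [1.5] * 500    # half weighted down, half up

base_equal = effective_base(equal_weights)
base_uneven = effective_base(uneven_weights)

# Weighting efficiency is the effective base as a share of the actual base.
efficiency = base_uneven / len(uneven_weights)

print(base_equal, base_uneven, efficiency)
```

With equal weights the effective base matches the number of respondents. With the uneven weights it falls to 800, a weighting efficiency of 80%, even though nobody has been removed from the sample.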
As far as I am aware (though I could well be wrong), most polling companies don’t tend to publish the effective base sizes of their polls (probably to avoid confusion), but BMG releases this information as standard on most (if not all) of our public polls.
For the Richmond Park poll, the results have been weighted so they are more representative of the entire population of the Richmond Park constituency, which is good for a snapshot of the entire constituency, but not ideal for voting intentions. There are no official statistics for voters. The UK census doesn’t produce population statistics broken down by those on the electoral register, and certainly not by people who are ‘likely to vote’ at any given by-election.
So in order to produce an accurate estimate of voting intentions in a constituency, we first have to adjust the results to be representative of a wider population, which is likely to include many people who will not vote. We then have to adjust these results back down again for the voting intention questions, so that the result is based on the population of people who we feel are likely to turn out.
Making all these adjustments hammers the effective sample size of polls, but we feel they make our estimate more ‘accurate’.
Why the effective bases might be a little misleading for the Richmond Park poll
Broadly speaking, publishing the weighting efficiency of national polls of the general public is a good and appropriate exercise because most of the national statistics that we adjust to are known and robustly collected from official results/surveys like the General Election or the Census.
But producing voting intentions that are relatively accurate is more difficult, because turnout intentions shift and the demographics of those who tend to turn out aren’t so well profiled.
Each polling organisation has its own particular approach to adjusting the results to best reflect how it feels voters will turn out; some lean more towards self-reporting, others more towards fixed adjustments. We do a little of both. But this means, unfortunately, that the effective base calculations also include BMG’s “factoring” for the likelihood of a respondent voting, which is based on a combination of self-reported likelihood and whether respondents have voted previously (thought to be a very strong predictor of future turnout).
However, effective base calculations are not designed to account for these factorings, meaning that they have an additional, but perhaps misleading, reductive effect.
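To illustrate that reductive effect, the sketch below takes a hypothetical sample that is already perfectly balanced and applies hypothetical turnout factors to it. It assumes Kish's approximation for the effective base; the sample size, the factors and the variable names are invented for illustration and are not taken from the Richmond Park data or from BMG's actual turnout model.

```python
def effective_base(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical sample of 100 respondents that needs no demographic
# adjustment, so every demographic weight is 1.0.
demographic_weights = [1.0] * 100

# Hypothetical turnout factors: half the sample is treated as certain to
# vote (1.0) and half as unlikely to vote (0.2). Illustrative values only.
turnout_factors = [1.0] * 50 + [0.2] * 50

# The final weight on a voting intention question is the product of the two.
combined_weights = [w * t for w, t in zip(demographic_weights, turnout_factors)]

base_before = effective_base(demographic_weights)
base_after = effective_base(combined_weights)

print(round(base_before, 1), round(base_after, 1))
```

In this invented case the effective base falls from 100 to about 69 purely because of the turnout factoring, even though the sample needed no demographic correction at all.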
Also, it should be noted that the poll was actually reasonably balanced across all Richmond Park population groups, including ward, gender, age [except the young], Leavers and Remainers, and those who voted at the General Election in 2015. The exception was those who didn’t vote at the General Election and EU Referendum.
Over-reporting of General Election and EU Referendum turnout is normal for randomly sampled phone polls and, depending on what you’re looking for, can be a good or a bad thing. In this instance, over-sampling voters might not be that bad for producing voting intention statistics.
The table below shows that the proportion of those sampled in the poll who said they did not vote at the General Election in 2015 is much lower (by a factor of around 2.5) than the actual proportion from 2015, but all the other groups are relatively balanced. It also shows that although we weight this group up by around 2.5 times, a majority of them say that they are unlikely to vote in the by-election, whereas the vast majority of those who had voted previously say they are likely to. So we’re adjusting them up, only to bring them down again later.
Finally, the ultimate value of any adjustment of polling results post-fieldwork is in ‘more accurate’ (or less ‘biased’ in statistical speak) results. Therefore, if the unweighted results aren’t that different from the weighted results, it is fair to ask what the overall utility is of weighting them in the first place, especially if the effective base is likely to be significantly reduced. In fact, if the raw unweighted results and the weighted results tell practically the same story, perhaps that adds even greater strength to the overall results, suggesting that no noteworthy voting bloc has been underrepresented in the poll.
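The comparison described above amounts to calculating each share twice, once with every respondent counted equally and once with the weights applied. A minimal sketch, using invented responses and weights rather than the Richmond Park data (the `share` helper and the candidate labels are hypothetical):

```python
def share(responses, weights, choice):
    """Weighted share of respondents who gave `choice`."""
    total = sum(weights)
    chosen = sum(w for r, w in zip(responses, weights) if r == choice)
    return chosen / total

# Hypothetical poll of 100 respondents choosing between candidates A and B.
responses = ["A"] * 60 + ["B"] * 40

# Hypothetical light weights: A supporters nudged down, B supporters up.
weights = [0.98] * 60 + [1.03] * 40

unweighted = share(responses, [1.0] * len(responses), "A")
weighted = share(responses, weights, "A")
difference = weighted - unweighted

print(f"unweighted {unweighted:.1%}, weighted {weighted:.1%}, "
      f"difference {difference:+.1%}")
```

Here the weighting moves candidate A's share by only about one point, which is the kind of small gap that raises the question of whether the loss of effective base is worth it.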
So for the avoidance of any doubt, we’ve run the tables again, now showing both the weighted and unweighted results. The full tables can be found below, with a sheet that shows readers the differences in the final statistics.
For a quick glance, readers can compare the weighted and unweighted voting intention results for the Richmond Park constituency poll below, to see how small the effect actually is.
We will continue to publish effective base sizes
I still think there is real value in publishing the effective sample size (or effective base) of polls, especially for polling aggregators and enthusiasts, but also for reasons of transparency. Just because a poll has a low effective sample size doesn’t mean it is inaccurate, but people should know how much confidence they should place in the final estimates.
In this case, however, I feel that publishing the effective base size without qualification was misleading, and that it was therefore easily misinterpreted.
We’ll have a think about how to caution the interpretation of effective bases for particular polls in the future, but a balance needs to be struck between making them easy to interpret and keeping them quick to produce.
For now, I hope this goes some way to furthering people’s understanding of our polling results in Richmond Park and other polls we conduct.
Fieldwork dates and methodology can be found here.
A full breakdown of these results including unweighted and weighted results can be found here.
For a more detailed breakdown of results from this poll, or any other results from our polling series, please get in touch by email or phone.
0121 333 6006
Dr Michael Turner – Research Director & Head of Polling