Wednesday, March 12, 2014

On the Hickman Analytics Survey of Arkansas, and Why I Misread It

A few days ago, Hickman Analytics published a poll of Arkansas that had some interesting aspects to it.

Let me start by confessing that some of my tweets the morning this hit the wire misinterpreted what their pdf was presenting.

They asked several questions about the Senate race between Pryor and Cotton. But before they presented those results, they showed that their sample was composed entirely of people who said either that they would definitely vote in this election, or that they would probably vote in it. Intriguingly, they presented some of the results with the definite voters as their own breakout.

For example, when they asked the two-man horserace question (meaning, offering just Pryor and just Cotton; hereafter referred to as Q5, as in the pdf), they found that definite voters chose Cotton by 9 points, 51-42%. However, they said likely voters were split evenly between the two candidates.

This is where my confusion started. Earlier in the pdf, they showed that their sample consisted of 77% definite voters and 23% probable voters. Obviously, the definite voters in the breakout on Q5 are those definite voters. But when they said likely voters, did they mean "all of the probables plus all of the definites", or did they mean "just the probables"?

The lack of an overall topline result on top of those two breakouts for the question suggests that the answer is "all of the probables plus all of the definites." In retrospect, more than suggests. But that is not how I took it. I immediately read it as though they were simply using the words probable and likely interchangeably. (Spoiler alert: I was wrong.)

Let me explain why I jumped to that conclusion, because I think it still speaks to some oddness in the data. To do so, let's revisit Q5, which showed Cotton leading 51-42% among definite voters and the two candidates tied at 46% among likely voters.

Since the definite voters make up more than three quarters of the total sample, for the likely voter breakout to include the definite voters, the "likely but not definite" voters would need to break somewhere around 59% for Pryor to 29% for Cotton (Pryor +30!).
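For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It assumes the published integers are exact and that the 77/23 split applies to Q5; the function name implied_remainder is my own, not anything from the Hickman pdf.

```python
def implied_remainder(overall, subgroup, subgroup_share):
    """Back out the percentage among everyone outside the subgroup.

    Rearranges: overall = share * subgroup + (1 - share) * remainder
    """
    return (overall - subgroup_share * subgroup) / (1 - subgroup_share)


DEFINITE_SHARE = 0.77  # definite voters as a share of the full sample

# Q5: likely voters tied 46-46, definite voters 51-42 for Cotton
cotton = implied_remainder(46, 51, DEFINITE_SHARE)
pryor = implied_remainder(46, 42, DEFINITE_SHARE)

print(f"Cotton among probables: {cotton:.1f}%")
print(f"Pryor among probables:  {pryor:.1f}%")
print(f"Pryor margin:           {pryor - cotton:+.1f}")
```

The same function works for any question where both a likely-voter and a definite-voter number are given.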

That is a fairly astounding gap, and feels like the kind of thing that would be noted in the writeup since it is such a surprising tidbit.

The math that led me to the 59-29% breakout assumes that the integers Hickman presented are precise. However, rounding can make a difference. Could rounding make the gap between the two candidates look less astounding among these probable-but-not-definite voters? The values that would shrink the gap in this cohort as far as possible while still rounding to the published integers are 50.5% for Cotton and 42.4% for Pryor among definites, and 46.4% for Cotton and 45.5% for Pryor among likelies. Under that assumption, the gap between the two among probable-but-not-definite voters drops, but it is still huge, at Pryor +23.
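The rounding bounds can be sketched the same way. This is a rough check, assuming results reported to one decimal place before rounding and treating the 77% definite share as exact; the helper name and the "widest" case are my own additions for comparison.

```python
def implied_remainder(overall, subgroup, subgroup_share):
    """Back out the percentage among everyone outside the subgroup."""
    return (overall - subgroup_share * subgroup) / (1 - subgroup_share)


DEFINITE_SHARE = 0.77  # assumed exact here; it is itself a rounded figure

# Narrowest Pryor margin among probables: Cotton as high as possible
# (high among likelies, low among definites), Pryor as low as possible.
cotton_hi = implied_remainder(46.4, 50.5, DEFINITE_SHARE)
pryor_lo = implied_remainder(45.5, 42.4, DEFINITE_SHARE)
narrowest = pryor_lo - cotton_hi

# Widest Pryor margin: the opposite extremes that still round
# to 51-42 among definites and 46-46 among likelies.
cotton_lo = implied_remainder(45.5, 51.4, DEFINITE_SHARE)
pryor_hi = implied_remainder(46.4, 41.5, DEFINITE_SHARE)
widest = pryor_hi - cotton_lo

print(f"Narrowest Pryor margin among probables: {narrowest:+.1f}")
print(f"Widest Pryor margin among probables:    {widest:+.1f}")
```

Even at the friendliest rounding, the implied margin stays above 20 points.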

I suppose it is possible that this cohort, while being not quite sure that they would actually vote, could yet be so overwhelmingly tilted toward Pryor as to bring the Q5 result to a tie overall. Stranger things have happened.

But there were other questions where they had results listed for both likely and definite voters. One put Pryor head-to-head against an unnamed Republican (Q3, asked before Cotton was mentioned). Here, they found the Republican was favored among definite voters 50-38%, which is quite similar to the Q5 result among definites of 51-42%. For this question, however, there was a much more modest swing when going from definites to likelies: 47-39%. Doing the same math as above, this suggests the probable-but-not-definite cohort would support Pryor 42-37% (Pryor +5 over an unnamed opponent).

I suppose it is also possible that this cohort, while being not quite sure that they would actually vote, would be modestly for Pryor (Pryor +5) against an unnamed Republican while being overwhelmingly for him when that Republican is named (Pryor +30 against Cotton). Perhaps Cotton is simply viewed by this cohort as being completely radioactive.

But I have not mentioned Q2. This question was all about name recognition and favorability. Here we find that just 28% of the sample has heard of Tom Cotton and has an unfavorable opinion of him. No signs of radioactivity there. And if one goes down to the bottom of the pdf, where the crosstabs live, one can see that Cotton's unfavorability must actually be lower among the probable-but-not-definites than it is with the definites; it is 31% for the definites but just 28% for the total.
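To put a rough number on that, the same back-out arithmetic can be applied to the unfavorable figures. This is only a sketch: it treats the published integers as exact and assumes the 77/23 definite/probable split holds for Q2, and the variable names are mine.

```python
DEFINITE_SHARE = 0.77      # definite voters as a share of the sample
UNFAV_TOTAL = 28           # % of all respondents unfavorable toward Cotton
UNFAV_DEFINITE = 31        # % of definite voters unfavorable toward Cotton

# total = share * definite + (1 - share) * probable, solved for probable
unfav_probable = (UNFAV_TOTAL - DEFINITE_SHARE * UNFAV_DEFINITE) / (1 - DEFINITE_SHARE)

print(f"Implied Cotton unfavorable among probables: {unfav_probable:.0f}%")
```

Under those assumptions the implied unfavorable rating in this cohort is well below the 31% seen among definites, which is hard to square with Cotton being radioactive to them.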

That is why I interpreted the "Likely" breakouts in the pdf to be synonymous with "probably vote" rather than inclusive of the definite voters. I had not done the math, but a quick look at the numbers told me roughly what I presented just now.

But once you get down to the crosstabs at the bottom of the pdf, it is clear that I did not read it right, and that I had not read that far when I tweeted that morning. That was sloppy on my part.

The weirdness in what it implies about the probable-but-not-definite voters remains, though. There are only 92 in this cohort according to the crosstabs, which works out to a MoE of around +/- 10% for it. Usually, though, if a small subsample happened to skew toward one side, its answers would skew consistently toward that side, whereas here that does not seem to be the case. Perhaps this cohort had an unusual number of number mashers, and the rotation of the choices causes their results to be inconsistent. Or maybe they really do hold these views in the percentages I worked out above. Either way, with a subsample that small, it should not have shocked me that they did not break it out separately.
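The +/- 10% figure comes from the usual textbook formula. A minimal sketch, assuming simple random sampling, a 95% confidence level, and the worst case of a 50/50 split (no design effect or weighting adjustment, which the pdf may or may not apply):

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(92)  # 92 probable-but-not-definite respondents
print(f"MoE for n=92: +/- {moe * 100:.1f} points")
```

That margin applies to each candidate's share on its own; the uncertainty on the gap between two candidates is larger still.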

Regardless, the results of this poll hardly seem good news for Pryor. His generic re-elect number puts him at 39% among likely voters, and he gets only 1 point more in a named horserace with all of the candidates. He gets up to 46% if respondents are limited to him and Cotton, but among those most certain to vote, he loses there by 9 points. All of this is true with 33% of respondents insufficiently familiar with his main opponent to offer an opinion on how much they like him. All of this is true with 85% effective recognition for Pryor. All of this is true with him being the incumbent. All of this is true in a state where 61% of likely voters identify as conservative.