Then and Now: How Online Conversations About Voter Fraud Have Changed Since 2016

Authors: Cooper Raterink, Gene Tanaka, Carly Miller, Renee DiResta, Stanford Internet Observatory; Jack Nassetta, Graphika

Contributors: Kate Starbird, Andrew Beers, University of Washington Center for an Informed Public; Ben Nimmo, Graphika


In our work investigating voting-related misinformation over the past two months, Election Integrity Partnership researchers have observed several recurring themes alleging voting impropriety in the 2020 election (ballot harvesting, mail dumping, voter intimidation, electronic voting machine hacking, among others). Given what appeared to be an increase in the prevalence of delegitimization narratives in 2020, we examined how conversations leading up to this Election Day compare to those of 2016, and how the platform policy environment has evolved.

Comparing narratives between the two election years is challenging, as the conditions around the 2020 election are quite distinct. The COVID-19 pandemic has led to more time on social media and more mail-in voting. Additionally, while Black Lives Matter protests were certainly part of the landscape in 2016, extensive social unrest and recurring protests around racial justice are a far more significant dynamic in 2020; the protests themselves have been incorporated into voting-related misinformation. The 2016 election, of course, had its own distinct social and political circumstances, and its own collection of scandals — among them the Access Hollywood tape, and the Russian military intelligence hack of the Clinton campaign that led to the Wikileaks dumps.

However, examining the relative prevalence of narratives about the voting process itself, such as those alleging fraud, reveals some notable shifts. After analyzing Twitter conversations in 2016 and 2020, we found evidence of a significant increase in conversations about election fraud and delegitimization in 2020.

Key Takeaways:

  • Data from 2016 and 2020 show a dramatic increase in tweets mentioning mail-in voting fraud in this election.

  • There has been an increase in discussions of voter fraud among both right-leaning and left-leaning communities. However, while right-leaning communities often allege incidents of fraud, left-leaning communities often dispute those allegations.

  • Since 2016, platforms including Twitter, Facebook, and YouTube have published and updated their policies in response to misinformation and disinformation, and have made efforts to take action against these threats by labeling and moderating content.

Historical Context: Social Conversations During Elections 2016 and 2020

To examine topics of conversation in 2016 and 2020, we selected a random 250-user sample from a group of approximately 4,500 influential Twitter accounts cultivated from a large Twitter dataset gathered by the University of Washington’s Center for an Informed Public (CIP)* on the 2020 Election. This dataset, which we will call Dataset 1, categorized users along ideological and sometimes behavioral lines — conservative, progressive, extremely densely-connected follow-back communities, etc. — by automatically clustering accounts that were retweeted by similar sets of users. We gathered all tweets produced by this sample of 250 users during comparable 1-week timespans during the 2016 and 2020 elections (Oct. 4 - Oct. 11, 2016, and Sep. 30 - Oct. 6, 2020). Then, we identified the most common n-grams (sequences of words) from these tweets, and evaluated the relative prominence of topics of misinformation between 2016 and 2020.

* Details of Dataset 1: The CIP used the Twitter API to stream tweets on a variety of voting related terms and candidate names. To be included as a node in their database, a user must have been retweeted by 20 distinct accounts at least 3 times each. Two users in the database are linked if the same 20 accounts retweeted both of them, and the strength of links increases when more accounts retweet both users. Clusters were automatically identified using the Louvain community detection algorithm.
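One reading of the construction described above can be sketched in Python. This is an illustrative sketch, not the CIP's code: the function name `build_coretweet_graph`, its parameter names, and the interpretation that a retweeter only "counts" toward an author once it has retweeted that author at least 3 times are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_coretweet_graph(retweets, min_accounts=20, min_retweets=3, min_shared=20):
    """Build a co-retweet graph from (retweeter, author) pairs, one per retweet.

    Assumed reading of the description: an author becomes a node once at
    least `min_accounts` distinct accounts have each retweeted them at least
    `min_retweets` times; two nodes are linked when at least `min_shared`
    of those accounts are common to both. Edge weight = shared accounts.
    """
    pair_counts = Counter(retweets)

    # For each author, the set of accounts that retweeted them often enough.
    audience = defaultdict(set)
    for (retweeter, author), count in pair_counts.items():
        if count >= min_retweets:
            audience[author].add(retweeter)

    nodes = {a for a, fans in audience.items() if len(fans) >= min_accounts}

    edges = {}
    for a, b in combinations(sorted(nodes), 2):
        shared = len(audience[a] & audience[b])
        if shared >= min_shared:
            edges[(a, b)] = shared  # more shared retweeters, stronger link
    return nodes, edges
```

The weighted edges could then be handed to a graph library's Louvain implementation (for example, `louvain_communities` in recent NetworkX releases) to recover clusters; that step is left out here to keep the sketch dependency-free.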

While the top-40 quadgrams for both 2016 and 2020 were primarily focused on the election-related headlines du jour, they succeeded in surfacing some major themes of social media conversation. For the analyzed week in 2016, the quadgram chart below shows that narratives questioning Donald Trump’s suitability for the presidency were common: prominent topics referenced his lewd behavior towards women (“donald trump lewd comments” and “trump locker room talk”) and how other Republican politicians were not fully supportive of his bid for President (“calls trump step aside” and “late republicans replace trump”). Similarly, the 2020 set of quadgrams primarily focuses on recent material: Donald Trump’s bout with COVID-19, his Proud Boys comment during the first debate, and the nomination of Amy Coney Barrett to the Supreme Court. However, some topics that appeared in 2016 have continued to hold the public interest four years later: “donald trump tax returns” was a topic of conversation in both elections, and the same goes for “fact check[ing]” the presidential candidates. The 2016 quadgrams referencing scandals involving the Clinton family (“foundation division clinton crime” and “division clinton crime family”) are evocative of the claims about Hunter Biden and the alleged corruption of the Biden family currently prevalent in the 2020 campaign.

Figure 1: Most prevalent 4-word sequences (quadgrams) in Twitter Dataset 1 for 2016 and 2020, broken down by an estimate of the user’s political ideology.
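For readers unfamiliar with n-gram counting, a minimal Python sketch of the idea follows. It is not the pipeline used for Figure 1. The tokenizer, the `STOPWORDS` set, and the function names are hypothetical; stopword removal is assumed only because the published quadgrams (for example “calls trump step aside”) read as though common words were dropped.

```python
import re
from collections import Counter

# Hypothetical stopword list; the article does not say which words were dropped.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "in", "for", "is", "on"}


def tokenize(text):
    """Lowercase a tweet and keep word-like tokens, minus stopwords."""
    tokens = re.findall(r"[a-z0-9@#']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(tweets, n=4, k=40):
    """Count n-grams across all tweets and return the k most common."""
    counts = Counter()
    for tweet in tweets:
        counts.update(ngrams(tokenize(tweet), n))
    return counts.most_common(k)
```

Calling `top_ngrams(tweets, n=4, k=40)` on one week of tweets from each year would give two ranked lists that can be compared side by side; splitting `tweets` by the cluster label of the author first would give the per-ideology breakdown.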

Conversations Around Voter Fraud: An In-Depth Comparison

While the n-gram analysis succeeded in showing an interesting one-week snapshot of the major stories being spread on Twitter in 2016 and 2020, the EIP is most concerned about voting-related misinformation, including false or exaggerated claims of fraud that could be used to delegitimize the process and outcome. Perhaps fortunately, concerning terms like ‘voter fraud’ or ‘ballot harvesting’ do not appear in the top-40 quadgrams shown above from either year. To examine the prevalence of fraud-related commentary during the two elections more directly, we conducted a keyword analysis of Dataset 1.

We first constrained the analysis to election-focused conversations by filtering for tweets that contained terms specific to the election or election processes, such as “election,” “ballot,” and “vote.” Then, we gauged the prevalence of search terms common within prominent voting-misinformation narratives that the EIP has observed. These terms included “ballot harvesting,” “suppression,” “fraud,” and a lexicon of fraud-related words made up of common search terms such as “undermine,” “illegal,” and “illegitimate.” We also tracked the prevalence of tweets mentioning the word “mail,” as well as those mentioning both “mail” and any fraud-related words. The word “mail” and terms related to delegitimization and fraud, we found, appear to be significantly more prevalent in 2020 than they were in 2016.
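The two-step filter described above can be sketched as follows. This is a hedged illustration, not the analysis code. The term lists contain only the words named in this article (the full lexicon is not published), and simple case-insensitive substring matching is an assumption, as are the names `contains_any`, `MATCHERS`, and `prevalence`.

```python
ELECTION_TERMS = ("election", "ballot", "vote")
# Only the three lexicon words named in the text; the real list is longer.
FRAUD_LEXICON = ("undermine", "illegal", "illegitimate")


def contains_any(text, terms):
    """True if any term occurs in the text (case-insensitive substring match)."""
    text = text.lower()
    return any(term in text for term in terms)


# One predicate per keyword area.
MATCHERS = {
    "ballot harvesting": lambda t: contains_any(t, ("ballot harvesting",)),
    "suppression": lambda t: contains_any(t, ("suppression",)),
    "fraud": lambda t: contains_any(t, ("fraud",)),
    "fraud-related lexicon": lambda t: contains_any(t, FRAUD_LEXICON),
    "mail": lambda t: contains_any(t, ("mail",)),
    "mail & fraud-related lexicon": lambda t: (
        contains_any(t, ("mail",)) and contains_any(t, FRAUD_LEXICON)
    ),
}


def prevalence(tweets, matchers=MATCHERS):
    """Share of election-related tweets matched by each keyword area."""
    election = [t for t in tweets if contains_any(t, ELECTION_TERMS)]
    if not election:
        return {label: 0.0 for label in matchers}
    return {
        label: sum(match(t) for t in election) / len(election)
        for label, match in matchers.items()
    }
```

Running `prevalence` separately on the 2016 and 2020 tweets and dividing the 2020 share by the 2016 share for a given label gives the kind of fold change reported in the text. Note that substring matching means “mail” also matches “email”, so a production version would likely want word-boundary matching.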

Figure 2 shows the prevalence of keywords or keyword areas in Dataset 1. The election-related 2020 data has substantially higher prevalence of keywords related to allegations of fraud and other forms of delegitimization than does the election-related 2016 data. Some terms, such as “ballot harvesting” and “suppression,” had almost no presence in 2016. Other terms, such as “fraud” and the lexicon of fraud-related words, were part of the conversation in 2016 but nonetheless have a notably higher presence in 2020. 

Figure 2: Prevalence of a selection of threatening search terms or term sets in the 2016 and 2020 Twitter data we described above. Tweets must be election-related to be counted. ““Mail” & Fraud-related Lexicon” refers to tweets that include both topics.

Despite the extremely low rate (less than a hundredth of a percent) of confirmed mail-in voting-related fraud, our results suggest that it nonetheless remains a significant topic of concern for those participating in election-related conversation on Twitter: we witnessed a 1.6x increase in fraud-related conversation in our 2020 data and a 25x increase in the proportion of tweets mentioning both “mail” and a fraud-related term. Compare the 25x increase to the Pew Research Center’s finding that mail-in voting likely increased around 2x in the 2020 primaries when compared to the 2016 and 2018 general elections. This suggests that the reaction by online communities concerned about mail-in voter fraud far outpaced the actual increase in mail-in voting.

Interested in how these increases were related to ideological narratives, we performed the keyword analysis again on only the progressive and conservative subsets of the same dataset. The results, shown in Figure 3, indicate that the increase in conversation about ballot harvesting in 2020 is driven by conservative users, and that progressives drive the conversation about voter suppression. The results also show that conversations about voter fraud and mail-in related voter fraud were fairly common on both sides of the aisle in 2020, giving the initial impression that there is bipartisan concern about these issues. 

Figure 3: Prevalence of a selection of threatening search terms or term sets in Dataset 1, broken down by users’ political ideology (estimated). Tweets must be election-related to be counted.

However, upon further inspection it becomes clear these topics are a source of intense polarization in 2020, with conservative Twitter users spreading messages about alleged voter fraud and progressive Twitter users spreading counter-messaging:

Examples of fraud-related tweets from conservative users (2020):

  • “thanks to a new @project_veritas video, minneapolis police say they are ‘looking into’ allegations of voter fraud by supporters of rep. ilhan omar.”

  • “@realdonaldtrump @vp @thejusticedept #electionfraud #voterid #mailinballots #mailinvotingfraud”

  • “the only competition trump has is voter fraud.”

Examples of fraud-related tweets from progressive users (2020):

  • “pennsylvania’s top elections official says that the 9 discarded ballots were not voter fraud.”

  • “in 2018, president trump claimed that the midterm election results in florida had been tainted by ballot fraud. the findings of an 18-month inquiry, released earlier this year, proved him wrong.”

  • “rightwing trolls and tricksters try to conjure up fake proof of trump’s voter fraud claims”

The results also show that while election-related conversations mentioning “fraud” were equally prevalent among conservative users in both years, such conversations have increased almost 5x among progressives. In the 2016 conversations, we found that the same messaging/counter-messaging pattern about voter fraud was present, but not to nearly the same degree. There were simply not many conversations in the progressive sphere on this topic, as Figure 3 shows. Interestingly, the only 2016 tweet among conservative users in Dataset 1 mentioning both the word “mail” and a word in the fraud-related lexicon discussed Hillary Clinton’s emails: “if @realdonaldtrump loses for old private comments as @hillaryclinton walks for illegal private email, many will feel election was “rigged.””

2020: Voter Fraud, Voter Fraud, Voter Fraud

To confirm that the above results about the pervasiveness of voter fraud-related conversation in 2020 generalize to a wider array of Twitter activity, we turned to a larger Twitter dataset. Our findings suggest that conversations about voter fraud are indeed widespread in 2020.

We analyzed the top 3-word sequences (i.e., a trigram analysis) tweeted by users in Graphika’s Twitter dataset for misinformation around the 2020 general election. This dataset samples over 9,000 users that make up groups representing nearly every facet of the political conversation; we will refer to it as Dataset 2. The first group analyzed was followers of pro-Trump journalists and influencers. These followers are receiving messaging cues from the influencers who are the primary drivers of conversation among conservative communities on Twitter. In this group, we found that terms related to allegations of voter fraud ranked highly. Specifically, among the top ten trigrams, phrases related to voter fraud ranked first and fourth in focus.

Figure 4. Trigrams used by followers of pro-Trump journalists and influencers in the Graphika general election misinformation map from September 28th to October 28th, 2020. They are ranked according to the level of unique attention being paid to them by the group.

We conducted a hashtag analysis for the same group and time period and found similar focus on electoral delegitimization as a narrative. Out of the top 9 hashtags, 3 referred to voter fraud.

Figure 5. Hashtags used by followers of pro-Trump journalists and influencers in the Graphika general election misinformation map from September 28th to October 28th, 2020. They are ranked according to the level of unique attention being paid to them by the group. 

We also conducted a hashtag analysis of a more general group of pro-Trump users in the dataset across the entirety of the fall election cycle (since August 1st). Out of over 30,000 hashtags captured, “standagainstvoterfraud” ranked 1st in focus. 

Additionally, we conducted a parallel analysis of followers of anti-Trump influencers and general anti-Trump “Resistance” users from the same dataset. In the trigram and hashtag analyses, we did not find a comparable level of election delegitimization content.

These results as a whole strongly suggest that there was a substantial increase in voter fraud-related conversation on Twitter in 2020 as compared to 2016, and that this push is largely driven by conservative users, with liberal users responding. These results also reinforce our earlier assessments about the pervasiveness of social media conversations surrounding mail-in voter fraud in 2020. 

To understand how social media platforms’ policies have evolved during this four-year period, and how they might manage this challenge, we also took a closer look at how the policies themselves have changed.

Policies to Address Election-related Misinformation: 2016 v 2020

Social media platforms’ policies have changed dramatically since 2016. Four years ago, we did not have the same clarity on how platforms regulate content that we have now. For example, Facebook made its guidelines for enforcing community standards public for the first time in April 2018. Similarly, in September 2018 Twitter began publishing blog posts about how it intended to protect election integrity on its platform, promoting ideals of transparency and clarity.

There has also been a greater focus on how the platforms can combat election misinformation and disinformation. In the weeks leading up to the 2016 election, the platforms’ blogs were focused on their voter registration campaigns or announcing new ways to watch the presidential debates or keep up with election news. In contrast, over the past few months we have seen election-related policy updates from Facebook, Twitter, YouTube, Pinterest, TikTok, Nextdoor and Snapchat. These changes respond to the kinds of misinformation, and the concerns about misinformation, discussed above. Platforms such as Facebook, Twitter, and TikTok now have explicit policies that relate to claims aiming to undermine public confidence in the election. Twitter and Facebook specifically address claims that cast doubt on legitimate and legal methods of voting. YouTube similarly announced in September that it will add voter information panels to videos that discuss vote-by-mail to help draw attention to authoritative sources.

The policy changes are in large part in response to the different types of threats that the platforms faced during the 2016 election. Prior to 2016, social media platforms were mostly focused on mitigating cyber threats against their physical networks. Now, these companies have built teams of threat investigators for disinformation and war rooms on election nights. These companies have also shared more information about how their platforms are manipulated by both domestic and foreign actors, allowing users and researchers to better understand the techniques and strategies behind these threats.

Learning from 2020

While the most popular topics of conversation within our digital public square remain those tied to current events, our analysis shows that there has been a concerning uptick in election-related content alleging fraud in the 2020 election as compared to 2016. These narratives seek to delegitimize the election before it occurs by sowing doubt and eroding trust in the process leading up to election day. 

Platforms face a significant challenge: how to ensure that they remain spaces for political expression and discussions of the important issues of the day, while also minimizing mis- and disinformation. This is particularly critical given the rise in delegitimization narratives surrounding the 2020 election, and it remains to be seen whether the recent enhancement in platforms’ content moderation and labeling policies will prove a sustainable solution.
