Machine Learning to Save Money for Cash-Strapped Political Campaigns

We have a three-day weekend in honor of the great civil rights activist Dr. Martin Luther King, Jr. There were no high school basketball games last night and no peewee games for us today, so I had a little time to play around.

A friend of mine gets voter registration information from the Secretary of State every month. He shares the Faulkner County, Arkansas data with me whenever I ask. I'm extremely grateful, and it gives me an opportunity to practice the data science skills I gained when I was tasked with teaching a data science course a couple of years ago.

In addition to teaching, I serve as the elected Justice of the Peace for Faulkner County District 3. This year I don't have an opponent, but every two years, I'm up for reelection, and I'm sure the day is coming when I will run against an opponent. The office of JP isn't one that garners tons of donations, so my campaigns are mostly self-funded. I'm not a rich man, so stretching the dollars I sink into running for and holding this office is extremely important to me.

So last year, I played around and built a machine learning model to predict which voters in my district will vote in the upcoming election. That was a first draft. Last night I was able to improve it significantly, using data current as of January 2, 2024, to produce a mailing list of voters that my model predicts will vote in the upcoming primary election, which is less than two months away.

I actually built several machine learning models, which produced similar, though not identical, predictions. I created a Decision Tree model, a Logistic Regression model, and a Support Vector Machine model, then used each one to predict which registered voters in my district will vote in the upcoming primary election. To train the models, I used data from the 2012 to 2020 election cycles to predict who would vote in the 2022 election. I then ran the models on data from the 2014 to 2022 cycles to predict who will vote in 2024.
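The rolling window described above (train on 2012–2020 to predict 2022, then slide forward to 2014–2022 to predict 2024) can be sketched in a few lines of Python. This is a minimal illustration, not the author's code. It assumes a hypothetical record layout in which each voter's history is a dict mapping election year to whether that voter cast a ballot, and it uses only past turnout as features, which is also an assumption.

```python
# Hypothetical record layout (assumption): {year: True/False} for whether the
# voter voted in that cycle. Missing years are treated as "did not vote".
CYCLES = [2012, 2014, 2016, 2018, 2020, 2022, 2024]

def window_features(history, target_year, width=5):
    """Return 1/0 turnout flags for the `width` cycles before target_year."""
    idx = CYCLES.index(target_year)
    if idx < width:
        raise ValueError("not enough prior cycles for this target year")
    prior = CYCLES[idx - width:idx]
    return [1 if history.get(year, False) else 0 for year in prior]

def training_row(history, target_year):
    """Features from the prior cycles plus the known outcome for target_year."""
    return window_features(history, target_year), int(history.get(target_year, False))

voter = {2012: True, 2016: True, 2018: False, 2020: True, 2022: True}

# Training: 2012-2020 history -> known 2022 outcome.
features, label = training_row(voter, 2022)   # ([1, 0, 1, 0, 1], 1)
# Prediction: 2014-2022 history -> 2024 (outcome not yet known).
future = window_features(voter, 2024)          # [0, 1, 0, 1, 1]
```

Because the window slides by one cycle, a model fitted on the 2022 rows can be applied unchanged to the 2024 rows: both have five features in the same relative positions.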

For those not familiar with machine learning algorithms, a very short, not-too-technical description will make the graphs below a little easier to understand. With supervised machine learning algorithms, which these are, the data set is randomly divided into two groups: a training set and a test set. The training set is used to build the model. The model is then run on the test set so that we can get an idea of how accurate it is.
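A random train/test split like the one just described looks roughly like this in plain Python. The 25% test fraction and the fixed seed are arbitrary choices for illustration, not necessarily the settings used for these models. In practice, scikit-learn's `train_test_split` (in `sklearn.model_selection`, with its `test_size` and `random_state` parameters) does the same job.

```python
import random

def split_train_test(rows, test_fraction=0.25, seed=42):
    """Shuffle rows randomly, then hold out test_fraction of them for testing.

    test_fraction and seed are illustrative assumptions.
    """
    shuffled = list(rows)                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)     # seeded, so the split is repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

rows = list(range(100))                       # stand-ins for voter records
train, test = split_train_test(rows)
print(len(train), len(test))                  # 75 25
```

The key property is that no record appears in both sets. The model is judged only on voters it never saw during training.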

Each of the graphs that follow is a Confusion Matrix, a display showing how well the model built from the training data actually performed on the test data. The first shows the results of the Decision Tree model.
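Behind each of those graphs is a simple tally. Every test-set voter lands in one of four boxes, depending on whether they actually voted and whether the model predicted they would. Here is a minimal Python sketch using six made-up voters, coded as 1 = voted and 0 = did not vote.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Tally (actual, predicted) pairs for a yes/no prediction (1 = voted)."""
    counts = Counter(zip(actual, predicted))
    return {
        "true_pos": counts[(1, 1)],   # voted, predicted to vote
        "false_neg": counts[(1, 0)],  # voted, predicted to stay home
        "false_pos": counts[(0, 1)],  # stayed home, predicted to vote
        "true_neg": counts[(0, 0)],   # stayed home, predicted to stay home
    }

actual    = [1, 1, 0, 0, 0, 1]   # made-up outcomes for six voters
predicted = [1, 0, 0, 1, 0, 1]   # made-up model calls for the same voters
print(confusion_counts(actual, predicted))
# {'true_pos': 2, 'false_neg': 1, 'false_pos': 1, 'true_neg': 2}
```

scikit-learn's `sklearn.metrics.confusion_matrix` returns the same four counts as a 2×2 array.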

[Figure: Confusion Matrix, Decision Tree model]

The test set contained data for 1,184 voters. Of those, 368 actually voted in the 2022 primary election. The Decision Tree Model correctly predicted that 215 of them, 58%, would vote. There were 816 registered voters in the test set who did not vote in the 2022 primary election. Of those, the model correctly predicted that 743, 91%, would not vote.

The model predicted that 288 of the 1,184 voters in the test set would vote. Of those, 75% actually voted. It predicted that 896 would not vote. Of those, 83% did not vote.
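The four percentages above come straight from the Decision Tree's test-set counts. The only inputs are the numbers in the text (215 of 368 voters caught, 743 of 816 non-voters caught). The rest follows by subtraction: 153 voters were missed, and 73 non-voters were wrongly flagged, so the model predicted 215 + 73 = 288 voters and 743 + 153 = 896 non-voters. The metric names are the standard ones; the overall accuracy line is derived here and is not stated in the text.

```python
def summarize(tp, fn, fp, tn):
    """Standard rates computed from confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),          # share of actual voters the model caught
        "specificity": tn / (tn + fp),     # share of non-voters correctly excluded
        "precision": tp / (tp + fp),       # share of predicted voters who voted
        "npv": tn / (tn + fn),             # share of predicted non-voters who stayed home
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
    }

# Decision Tree test-set counts from the text: 368 voters, 816 non-voters.
tp, tn = 215, 743
fn, fp = 368 - tp, 816 - tn      # 153 missed voters, 73 false alarms
for name, value in summarize(tp, fn, fp, tn).items():
    print(f"{name:12s} {value:.0%}")
```

This prints recall 58%, specificity 91%, precision 75%, NPV 83%, and accuracy 81%. For a mailing list, precision matters most because it is the share of the pieces you send that reach actual voters.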

Before we move on to show how the other models performed, let's think about how this can be used, and how it can benefit a politician with limited funds, time, and other resources.

Reaching voters costs time and money. Candidates for low-profile local races often have little of either. They don't have war chests to pay an army of canvassers to go door-to-door. Political Action Committees aren't lining up to pay for advertising and phone banks. Most of the work in campaigning for local office is done by the candidate, and paid for out of his or her own bank account. Strategic, targeted use of time and money is essential, especially in local campaigns.

Canvassing (door-to-door campaigning) means walking streets, knocking on doors, and having conversations with voters willing to talk. It usually also includes leaving some sort of advertising, such as flyers, door hangers, or pamphlets, with the voter or on the door. Canvassing is expensive in both time and money. But it's necessary in local campaigns because most voters pay little or no attention to races below the state and federal level. Candidates need to canvass to build name recognition and support.

But starting at A Street and working your way to Z Street, knocking on every door and talking to everyone with an ear will quickly use up all your time and money. Those door hangers cost money, and the window of time for effective canvassing is small. Targeting voters most likely to vote, getting your literature in their hands, looking them in the eye and asking them for your vote, is critical if you want to win. 

As of January 2, 2024, there are 5,918 registered voters in my district. Of the 5,622 who were old enough to vote in 2022, only about 32% participated in that primary election. Even if I could have talked to every one of them, 68 out of every 100 hands I shook would have belonged to someone who never cast a ballot. For ease of calculation, let's assume that each piece of literature you hand out costs a dollar. Even if we could have reached them all, it would have taken $5,622 worth of campaign materials to do so.

It takes 50% plus one vote to win an election. Of the 1,822 registered voters who participated in the 2022 primary election, I would have needed 912 to vote for me. If I had to reach all 5,622 registered voters to earn those 912 votes, at one dollar per contact, it would have cost me $6.16 per vote required to win. If each contact took an average of 5 minutes, reaching all 5,622 voters would have taken 468.5 hours, more than half an hour per vote required to win.

The Decision Tree Model predicts that 1,531 of the 5,912 registered voters will vote in the upcoming March primary election. Instead of $5,622, canvassing would cost me $1,531. Instead of 468.5 hours, canvassing at an average of 5 minutes per contact would take about 127.6 hours. That is far more affordable, in both money and time.
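The back-of-the-envelope arithmetic in the last few paragraphs fits in one small Python function. It assumes the text's simplifications: $1 of literature and 5 minutes per contact, and a turnout like 2022's 1,822 voters, so that half plus one is 912 votes. The function name and parameters are illustrative.

```python
def canvass_cost(contacts, turnout, dollars_per_contact=1.00, minutes_per_contact=5):
    """Money and time to reach `contacts` voters, plus dollars per vote needed.

    Per-contact cost and time are the text's simplifying assumptions.
    """
    votes_needed = turnout // 2 + 1              # majority: half plus one
    dollars = contacts * dollars_per_contact
    hours = contacts * minutes_per_contact / 60
    return votes_needed, dollars, hours, dollars / votes_needed

# Assume turnout like the 2022 primary: 1,822 voters.
for label, contacts in [("everyone", 5622), ("model list", 1531)]:
    need, dollars, hours, per_vote = canvass_cost(contacts, 1822)
    print(f"{label:10s} ${dollars:,.0f}  {hours:.1f} h  ${per_vote:.2f} per vote needed")
```

Contacting everyone costs $5,622 and 468.5 hours, or $6.16 per vote needed. The model's list costs $1,531 and about 127.6 hours, or roughly $1.68 per vote needed. The comparison assumes the targeted list still contains enough of the eventual voters to reach 912 supporters.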

The main idea here is to make the best use of available time and money to campaign for small, local, mostly self-funded races for political office. 

Now let's take a look at the other two models I built. For the most part, the results are quite similar. However, it's worth taking a closer look at the Support Vector Machine (SVM) Model. Here is the Confusion Matrix for the SVM Model. 

[Figure: Confusion Matrix, Support Vector Machine model]

[Figure: Confusion Matrix, Logistic Regression model]

Though the three models' predictions on the test set differ slightly, the fact that three widely used and quite different classification models produce such similar counts lends some credibility to the approach. The models take different approaches to the classification problem. Similar results suggest that the characteristics of the data set are distinct enough for each model to identify very similar patterns hidden in the data.
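One simple way to put a number on "similar results" is the share of voters on which each pair of models makes the same call. The sketch below uses made-up predictions for six voters, not the real model output. It also shows one possible design choice, mailing only to voters that all three models flag; this is an option, not necessarily how the author built the mailing list. Raw agreement does not correct for chance. For a stricter measure, scikit-learn's `sklearn.metrics.cohen_kappa_score` is a common choice.

```python
from itertools import combinations

def agreement(preds_a, preds_b):
    """Fraction of voters on which two models make the same call."""
    same = sum(a == b for a, b in zip(preds_a, preds_b))
    return same / len(preds_a)

# Made-up predictions (1 = will vote) from three models on the same six voters.
predictions = {
    "decision_tree": [1, 0, 0, 1, 0, 1],
    "logistic":      [1, 0, 0, 1, 1, 1],
    "svm":           [1, 0, 0, 0, 0, 1],
}
for (name_a, a), (name_b, b) in combinations(predictions.items(), 2):
    print(f"{name_a} vs {name_b}: {agreement(a, b):.0%}")

# Optional stricter list: voters every model predicts will vote.
consensus = [i for i, calls in enumerate(zip(*predictions.values())) if all(calls)]
print(consensus)   # [0, 5]
```

With these toy numbers, the pairwise agreement is 83%, 83%, and 67%. The consensus list trades a few missed voters for fewer wasted contacts.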

Obviously, each new election cycle could bring changing circumstances that alter voters' decisions to vote or stay home. A recent school board election runoff here saw record turnout that no model would have predicted. Mathematical models are not foolproof, as the many models used during the Covid pandemic that never proved themselves on the ground made obvious. But the widespread use of such models shows that they can improve results in all sorts of situations.

These models do not promise perfection, or even a win. However, used as part of a broader campaign strategy, they could save candidates significant sums, allowing them to use scarce and precious campaign dollars more efficiently and effectively.