I really enjoyed the challenge of sourcing this data and I only almost pulled my hair out two or three times in the process. Election data is seriously difficult to get hold of because every state exposes its results in its own very specific way.
First, I’m just going to give you a link to my video and the dashboard I made.
This time I was only able to provide you with scraped election data for Michigan, current as of 23 November, from a table presented by CBS News. Yes, I know, I wanted the .gov source data, but I was unable to find any structured data from a .gov source. When I say “structured”, understand that I’m extremely critical of the way that many entities (especially local, state and federal governments) structure their data. When your available data has any of these problems, it is broken and likely not very useful:
- It can’t provide a big picture visualization
- It can’t give a piece by piece representation of your organization’s data
- It can’t represent data by time (timelines and date driven analysis)
- It doesn’t provide the details that your stakeholders care about
- It is closed source, with multiple barriers to obtaining data/information
I experienced this with Michigan’s election data set, and I want to break down exactly why I say this with examples, so that I’m not just standing on words.
So first, I hit the Google machine. Things are looking good, right?
But when I got there, all I got was a list of counties. No overall result was provided for the entire state; the site just hands the responsibility for providing data off to the counties. So this is a state government election site that doesn’t provide any data and delegates it down to the county level. That keeps me from seeing the big picture, and it also creates as many barriers to getting data as there are counties, and Michigan has 83 counties.
So at this point I’m realizing that there’s likely no centralized dataset that I can grab, but I dig into the county websites anyway. Once I’m there, I get directed to yet another website. Kudos to them, though, the button was pretty big.
I just want to point out that by this stage of my journey, I’ve had to dive down a pretty decent rabbit hole just to get to a data source. Multiple barriers in my way.
Also, this site was more or less a disaster in terms of actually obtaining any bigger picture or downloadable source that had the data for even one complete county breakdown. They provided a county by county request option for a table of results by precinct. This would require me to go through and scrape every single county’s results, and then I would have to stack and add up every single precinct from its own file. It’s not looking good at this point, so I move on to find another source.
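To give a feel for what that stacking would involve, here is a minimal Python sketch. I’m assuming each county export is a CSV with one row per precinct; the layout, the column names (`precinct`, `candidate_1`, `candidate_2`) and the numbers are made up for illustration and are not the real format of the county site.

```python
import csv
import io
from collections import defaultdict

# Hypothetical per-county exports, one row per precinct (made-up layout and numbers).
county_files = {
    "County A": "precinct,candidate_1,candidate_2\nP1,100,80\nP2,50,70\n",
    "County B": "precinct,candidate_1,candidate_2\nP1,10,20\n",
}


def county_totals(csv_text):
    """Add up every precinct row in one county's export."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column, value in row.items():
            if column != "precinct":
                totals[column] += int(value)
    return dict(totals)


# One totals dict per county, then a statewide roll-up on top of that.
by_county = {name: county_totals(text) for name, text in county_files.items()}

state_total = defaultdict(int)
for totals in by_county.values():
    for candidate, votes in totals.items():
        state_total[candidate] += votes
state_total = dict(state_total)
```

The code itself is short. The real cost is that you would first have to request and save one file per county, 83 times, before any of this could run.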
Through some additional deep diving on Google, I was eventually able to find a CBS News dashboard that had a county by county breakdown. But sadly, the battle for clean and usable data didn’t end there.
I was luckily able to scrape this data and format it as a CSV, which eventually got me to this point.
From this point, I had to clean and restructure the data with a bunch of different formulas that did a few different things.
- Split the county name apart from the reporting percentage
- Transposed the vote count rows into columns
- Identified the correct winner of each county to isolate the vote values into two distinct columns (this was tricky)
- Filtered the data in each sheet so that the county names and correct counts aligned
- Quality checked the data for accuracy against the original dataset to make sure nothing had gone wrong
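I did all of this with spreadsheet formulas, but the same steps can be sketched in Python. This is only an illustration under assumptions: I’m pretending the scrape comes out as a county label fused with its reporting percentage, followed by one row per candidate with the leader listed first. The county names, candidate names and counts below are placeholders, not the real scraped values.

```python
import re

# Hypothetical shape of the scraped rows (placeholder names and numbers).
raw_rows = [
    ["Alpha 95% reporting"],
    ["Candidate A", "1,200"],
    ["Candidate B", "900"],
    ["Beta 100% reporting"],
    ["Candidate B", "400"],  # leader listed first, so the order flips here
    ["Candidate A", "300"],
]

# Splits "Alpha 95% reporting" into a county name and a percentage.
COUNTY_PATTERN = re.compile(r"^(?P<county>.+?)\s+(?P<pct>\d+(?:\.\d+)?)%")


def restructure(rows):
    """Turn the stacked rows into one record per county."""
    records = []
    current = None
    for row in rows:
        if len(row) == 1:
            match = COUNTY_PATTERN.match(row[0])
            if match is None:
                raise ValueError(f"unrecognized county label: {row[0]!r}")
            current = {
                "county": match["county"],
                "pct_reporting": float(match["pct"]),
            }
            records.append(current)
        else:
            if current is None:
                raise ValueError("vote row found before any county label")
            name, votes = row
            # Key by candidate name so each candidate always lands in the
            # same column, no matter who was listed first.
            current[name] = int(votes.replace(",", ""))
    return records


records = restructure(raw_rows)

# Quality check: the cleaned totals must match the raw totals.
raw_vote_sum = sum(int(r[1].replace(",", "")) for r in raw_rows if len(r) == 2)
clean_vote_sum = sum(
    rec[name] for rec in records for name in ("Candidate A", "Candidate B")
)
assert raw_vote_sum == clean_vote_sum
```

Keying the votes by candidate name is what takes care of the tricky part in this sketch: the row order changes from county to county, but the columns stay put.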
At this point I ended up with a clean and structured dataset, where each row represented a county and the vote counts for each of the two major candidates.
Only at this point was I able to take the data and build it into a meaningful visualization for people to see and understand.
From there I added population totals and calculated which candidate won in each specific county.
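As a rough sketch of that last step, assuming the cleaned rows look like the one-row-per-county table described above: the county names, candidate names, vote counts and population figures here are all placeholders, and the `margin` column is my own addition for illustration.

```python
CANDIDATES = ("Candidate A", "Candidate B")

# Placeholder cleaned rows and population lookup (not real figures).
rows = [
    {"county": "Alpha", "Candidate A": 1200, "Candidate B": 900},
    {"county": "Beta", "Candidate A": 300, "Candidate B": 400},
]
population = {"Alpha": 5000, "Beta": 1500}


def enrich(row, population_by_county):
    """Attach the population, the winner and the vote margin to one county row."""
    votes = {name: row[name] for name in CANDIDATES}
    top = max(votes.values())
    leaders = [name for name, count in votes.items() if count == top]
    return {
        **row,
        "population": population_by_county.get(row["county"]),
        # An exact tie has no winner, so it is left as None.
        "winner": leaders[0] if len(leaders) == 1 else None,
        "margin": top - min(votes.values()),
    }


enriched = [enrich(row, population) for row in rows]
```

Counties missing from the population lookup end up with `None` rather than raising an error, which makes gaps easy to spot before they reach a chart.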
I want to voice the idea that (maybe) the state of Michigan should have provided data in this format to begin with, because this process I just went through creates a MASSIVE barrier for accessibility to this public information. It’s available, you just have to be really good with data manipulation, scraping and cleaning to turn it into anything usable.