Data Analysis: The Pennsylvania 2020 Election
So I’ve decided to start doing data science projects on topics that interest me. In this case, many people are looking for a way to understand and discuss what occurred in the 2020 US election. The way I find to be most effective in exploring this issue is through collecting and visualizing data.
Here’s a report I put together in Google Data Studio that covers multiple angles of the 2020 Pennsylvania election data, all gathered from official .gov sources. The intent is just to show statistics and comparisons; it is not meant to support any political party or any claims associated with any campaign. It’s just visualized and sourced data. Check out the full interactive report here: https://datastudio.google.com/s/qft-xZmUZEw
Here’s how I went about it.
First, I wanted to try to define the “mood” of PA voters. I found an interesting dataset that helped with this: it shows the number of voters who left either the Republican or Democratic party during 2020. I added this data to show where most voters had decided to leave, with a county-by-county view of who had drifted away.
I also sorted by the total population of each county, so we could see how many people switched their party affiliation in the most populated counties in PA.
Then I narrowed the view to show only the number of people who left their party as a percentage of the registered voters in that party.

Through this I was able to see counties like Lancaster and Delaware, which both had a significant percentage of total switches, as well as how those switches relate to their total voters.
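To make the percentage step concrete, here is a minimal Python sketch of the same calculation. The report itself was built in Google Data Studio, so this is only an illustration: the `CountySwitches` type, the function names, and the county names and numbers are all made-up placeholders, not values from the PA.gov data.

```python
from dataclasses import dataclass


@dataclass
class CountySwitches:
    county: str
    registered: int  # registered voters in the party (placeholder values below)
    left_party: int  # voters who switched away from that party


def switch_rate(row: CountySwitches) -> float:
    """Share of a party's registered voters who left it, as a percentage."""
    if row.registered == 0:
        return 0.0  # avoid dividing by zero for counties with no registrations
    return 100.0 * row.left_party / row.registered


def rank_by_switch_rate(rows):
    """Sort counties from highest to lowest switch rate."""
    return sorted(rows, key=switch_rate, reverse=True)


# Placeholder rows for illustration only; not real county figures.
rows = [
    CountySwitches("County A", 100_000, 2_500),
    CountySwitches("County B", 40_000, 1_600),
    CountySwitches("County C", 250_000, 0),
]

for row in rank_by_switch_rate(rows):
    print(f"{row.county}: {switch_rate(row):.2f}%")
```

Ranking by rate rather than by raw count is what lets a smaller county stand out when a large share of its party's voters switched.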
I was also surprised to find that some of the most populous counties in this dataset, like Allegheny, Philadelphia, Montgomery and Bucks, reported no changes to their voter affiliations throughout 2020. Taken at face value, this means that across those four counties of over 2M people, nobody switched their affiliation, according to the datasets provided by the official PA.gov source. My personal assessment is that there is likely a gap in the data, rather than nobody in a population that size switching their voter affiliation.
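A quick way to surface this kind of likely gap is to flag large counties that report zero switches or are missing from the switch data entirely. The sketch below assumes two plain dictionaries keyed by county name; the function name, the `min_registered` cutoff of 100,000, and the sample values are my own placeholders, not anything from the source data.

```python
def suspicious_zero_counties(switches_by_county, registered_by_county,
                             min_registered=100_000):
    """Return large counties whose switch count is zero or missing.

    min_registered is an arbitrary cutoff chosen for illustration.
    """
    flagged = []
    for county, registered in registered_by_county.items():
        switched = switches_by_county.get(county)  # None if the county is absent
        if registered >= min_registered and not switched:
            flagged.append(county)
    return sorted(flagged)


# Placeholder values for illustration only.
switches = {"County A": 1_200, "County B": 0, "County D": 0}
registered = {
    "County A": 300_000,
    "County B": 450_000,
    "County C": 500_000,  # missing from the switch data
    "County D": 20_000,   # zero switches, but below the size cutoff
}

print(suspicious_zero_counties(switches, registered))
```

A flag here does not prove the data is wrong; it only marks counties worth checking against the original source.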
Then I proceeded to do some comparisons between three different sets of data.
- The official counts from the PA.gov website that reflected Nov 3rd vote counts.
- Data scraped from the PA.gov website on Nov 7th by another researcher.
- Data I scraped myself from the PA.gov website on Nov 22nd.
I took these datasets and compared them against each other to see if I could find significant differences between those three “snapshots” in time.
First I took the total sum of Democrat and Republican votes as reported 4 days after the election, 7 Nov. I also compared the difference between the vote totals by county between 7 Nov and 22 Nov on this page.
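For anyone who wants to reproduce this kind of snapshot comparison outside Data Studio, here is a rough Python sketch. It assumes each snapshot has already been loaded into a nested dictionary of county, then party, then vote count; the function names and all numbers are hypothetical placeholders, not the scraped PA.gov figures.

```python
def snapshot_diff(earlier, later):
    """Per-county change in votes for each party between two snapshots."""
    diff = {}
    for county in sorted(set(earlier) | set(later)):
        before = earlier.get(county, {})
        after = later.get(county, {})
        parties = sorted(set(before) | set(after))
        # A party missing from one snapshot is treated as 0 votes there.
        diff[county] = {p: after.get(p, 0) - before.get(p, 0) for p in parties}
    return diff


def total_change(diff):
    """Sum the per-county changes into one statewide change per party."""
    totals = {}
    for changes in diff.values():
        for party, delta in changes.items():
            totals[party] = totals.get(party, 0) + delta
    return totals


# Placeholder snapshots for illustration only.
nov7 = {"County A": {"D": 1000, "R": 1200}, "County B": {"D": 500, "R": 300}}
nov22 = {"County A": {"D": 1500, "R": 1250}, "County B": {"D": 520, "R": 300}}

changes = snapshot_diff(nov7, nov22)
print(changes)
print(total_change(changes))
```

Keeping the per-county differences separate from the statewide totals makes it easy to see which counties account for most of the change.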

On this next page I highlighted the total gain in votes for each party from November 3rd (election day) to the vote counts represented by the pa.gov site on November 22nd.
Since Election Day, Biden has gained more than 2 million votes and Trump almost 650k.

In my next visualization I highlight the counties that were in favor of Trump or Biden on election day by using red/blue to show who was in the lead, and then used that same visual to compare it to the 22 November vote count.
A conclusion I’ve derived from this data is that on Election Day 2020 Trump was heavily favored in 9 of the top 10 counties in PA, with Biden leading only in Philadelphia. We can also observe that in the results pulled from the pa.gov website on November 22nd, Trump’s lead was reduced to 3 of those 10 counties. The bulk of the increase in votes for Biden came from Allegheny and Philadelphia counties. This conclusion does not support any theory; it just indicates the switch in counties between the two dates and shows where those votes came from.
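The red/blue comparison boils down to finding the leader in each county at both dates and listing the counties where the leader changed. Below is a small Python sketch of that logic, assuming the same nested county-to-party-to-votes dictionaries; the names `leader` and `flipped_counties` and all of the numbers are placeholders for illustration, not the actual vote counts.

```python
def leader(votes):
    """Return the party with the most votes, or None if empty or tied."""
    if not votes:
        return None
    ranked = sorted(votes.items(), key=lambda item: item[1], reverse=True)
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie at the top, so no single leader
    return ranked[0][0]


def flipped_counties(earlier, later):
    """Map each county whose leader changed to (old_leader, new_leader)."""
    flips = {}
    for county in sorted(set(earlier) & set(later)):
        old, new = leader(earlier[county]), leader(later[county])
        if old != new:
            flips[county] = (old, new)
    return flips


# Placeholder snapshots for illustration only.
election_day = {
    "County A": {"D": 1000, "R": 1200},
    "County B": {"D": 500, "R": 300},
    "County C": {"D": 700, "R": 900},
}
nov22 = {
    "County A": {"D": 1500, "R": 1250},
    "County B": {"D": 520, "R": 310},
    "County C": {"D": 800, "R": 950},
}

print(flipped_counties(election_day, nov22))
```

Only counties present in both snapshots are compared, so a county missing from one scrape is left out rather than reported as a flip.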

I’ve also added some more visualizations of this data since I first published this. You can check those out here: https://datastudio.google.com/s/lgpZszcCzJY
Please send any feedback you have or ideas on other data projects I should do to my email: *so the bots don’t get me* micah.d.johns AT gmail.com