There are many data initiatives in the Tableau community, and I like to participate in them when I have time. As a professional in healthcare, one of my favorites is #projectHealthViz. Each month, Lindsay Betzendahl posts health-related data and invites the community to take a crack at visualizing it. The December data set ended up sparking a very interesting data journey for me, and I thought it might be useful to share the thought process I went through on this project.
After downloading the data, our first step is to open it in Excel. This gives us a sense of how the data is structured and helps us start thinking about how to analyze it. Here is a small sample to show how it looks:
This particular data set happens to be nicely structured and fairly straightforward, with one row of data per day. No real data cleanup is needed, so on to Tableau!
Getting a sense of the story
Next, let’s pull the data into Tableau and do some quick prep work. After combining year, month, and date_of_month into one Date field, let’s plot the total births per year to see the basic shape of the data:
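For readers following along outside Tableau, here is a minimal Python sketch of the same prep step: building one date from year, month, and date_of_month, then totaling births per year. The `births` column name and the sample values are assumptions made up for illustration, not rows from the actual data set.

```python
from collections import defaultdict
from datetime import date

# Toy rows in the shape described above: one row per day.
# The values are made up for illustration; "births" is an assumed column name.
rows = [
    {"year": 2000, "month": 1, "date_of_month": 1, "births": 100},
    {"year": 2000, "month": 1, "date_of_month": 2, "births": 110},
    {"year": 2001, "month": 1, "date_of_month": 1, "births": 120},
]


def add_date(row):
    """Return a copy of the row with a single combined date field."""
    combined = date(row["year"], row["month"], row["date_of_month"])
    return {**row, "date": combined}


def births_per_year(dated_rows):
    """Sum daily births into one total per calendar year."""
    totals = defaultdict(int)
    for row in dated_rows:
        totals[row["date"].year] += row["births"]
    return dict(totals)


dated = [add_date(r) for r in rows]
yearly_births = births_per_year(dated)
print(yearly_births)
```

Building a real `date` object also acts as a light sanity check, since an impossible day such as February 30 would raise an error.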
This isn’t very interesting, but there’s a somewhat intriguing bump in the middle. Let’s investigate by zooming in on the y-axis:
This is much more interesting. What happened in 2008 that caused such a huge drop in births? After some quick searching on Wikipedia and elsewhere, I remembered the 2008 recession! I bet that had something to do with it…
Bringing in more data
My brother works in finance, and we’ve definitely geeked out over data on multiple occasions. I remember him telling me about FRED, so that seemed like a good place to start looking. Sure enough, there is yearly GDP data on FRED. Pulling it into Tableau and plotting by year, we get the following:
While the drop isn’t as big as the one in births, there is definitely an inflection point in 2007. Let’s see how they look when we compare them directly:
Okay, so it’s not an immediately obvious correlation, but it looks promising. Let’s dig a little deeper. What if we looked at the yearly percent change?
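As a rough sketch of what "yearly percent change" means here, each year's value is compared with the previous year's value. The function name and the toy numbers below are hypothetical and only illustrate the arithmetic; they are not taken from the births or GDP data.

```python
def percent_change(series):
    """Map each year to its percent change from the previous year in the series.

    `series` is a dict of {year: value}. The earliest year has no
    predecessor, so it does not appear in the result.
    """
    years = sorted(series)
    changes = {}
    for prev, curr in zip(years, years[1:]):
        changes[curr] = (series[curr] - series[prev]) / series[prev] * 100
    return changes


# Made-up values, purely to show the calculation.
toy_series = {2000: 200.0, 2001: 210.0, 2002: 189.0}
toy_changes = percent_change(toy_series)
print(toy_changes)
```

Working in percent change puts births and GDP on the same scale, which is what makes the two lines directly comparable.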
This actually looks pretty good! At this point, I started to realize that births are a bit of a lagging measure. In fact, on average they lag by about 40 weeks, or roughly 9 months. How do things look if we adjust for the length of pregnancy?
This is starting to look like a really compelling association!
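One way to sketch the lag adjustment described above is to shift each month of births back 9 months before totaling by year, so that births are counted in the year of approximate conception. This is an assumed approach for illustration and may differ from how the shift was done in Tableau; the function names and toy values are hypothetical.

```python
from collections import defaultdict

LAG_MONTHS = 9  # approximate length of pregnancy, as noted above


def shift_back(year, month, lag=LAG_MONTHS):
    """Return the (year, month) that falls `lag` months before the given month."""
    index = year * 12 + (month - 1) - lag
    return index // 12, index % 12 + 1


def lag_adjusted_yearly(monthly_births, lag=LAG_MONTHS):
    """Re-bucket monthly births into the year of approximate conception."""
    totals = defaultdict(int)
    for (year, month), births in monthly_births.items():
        shifted_year, _ = shift_back(year, month, lag)
        totals[shifted_year] += births
    return dict(totals)


# Made-up monthly totals keyed by (year, month).
toy_monthly = {(2008, 9): 10, (2008, 10): 20, (2009, 1): 30}
adjusted = lag_adjusted_yearly(toy_monthly)
print(adjusted)
```

With this shift, a birth in September 2008 is attributed to December 2007, so it lines up with the economic conditions around the time of conception rather than the time of birth.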
At this point, let’s add some finishing touches, and create our end product:
I’ve recently been adding very light color to the background of my visualizations. I generally prefer a light background, but find a touch of color makes it less stark, and easier on the eyes. Also, as a relatively new parent, I know how expensive kids can be, so I had a little fun with the title. See the interactive (and downloadable) dashboard here.
To satisfy our statistical side, let’s do a quick check of the Pearson correlation coefficient to see how strong the correlation actually is. To do this, we can make a scatter plot comparing the yearly percent change in GDP with the yearly percent change in births, with a circle for each year:
The Pearson coefficient comes out to 0.8453, which is a fairly strong positive correlation.
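For anyone who wants to run the same check outside Tableau, here is a minimal sketch of the Pearson coefficient: the covariance of the two series divided by the product of their standard deviations. The inputs below are made-up numbers, so this shows the calculation only and does not reproduce the figure above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Made-up percent changes, one pair per year (hypothetical values).
gdp_change = [1.0, 2.0, 3.0, 4.0]
births_change = [0.5, 1.0, 1.5, 2.0]
r = pearson(gdp_change, births_change)
print(round(r, 4))
```

A value near 1 means the two series rise and fall together, a value near -1 means they move in opposite directions, and a value near 0 means there is little linear relationship.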
While I never would have thought about the relationship between birth rate and GDP, it does make sense, as you’re probably less likely to have a kid if you lose your job. That said, there are many possible interpretations. For instance, my co-author Jorge mentioned that when the US GDP slowed down, there were changes to immigration and emigration patterns that reduced the foreign-born population in the US, which could certainly impact the birth rate as well (related paper).
All in all, I had a great time looking into this data set, and really enjoyed the opportunity to contribute to the conversation in the community. I hope this glimpse into my thought process is useful, or at least interesting.
Until next time, stay warm out there!