My previous post was my first dive into examining rent affordability for Minimum Wage (MW) earners in Texas leveraging D3.js and data from the Census. One of my next steps was to expand the example to every county within the USA and this is the resulting visualization (give the page a bit of time to load all of the map data - grey space indicates missing data from the Census)
Eventually, I will throw the data into Python and analyze it from a different perspective than just a visualization. For now, I wanted to provide an overview of the work & code that went into creating this d3.js map.
Retrieving and Cleansing the Data
There are three main datasets used in this viz.
- The TopoJSON to create the map w/ all USA counties
- The gross rent of every county used to calculate the "heat"/color of that county
- Minimum Wage of every state (which will be combined into the dataset above)
Straight from the creator, TopoJSON is, "an extension of GeoJSON that encodes topology. Rather than representing geometries discretely, geometries in TopoJSON files are stitched together from shared line segments called arcs." The Census Bureau includes Shapefiles for pretty much any kind of map your heart desires - from which you can convert to GeoJSON and then reduce to TopoJSON, but the county JSON was already available on GitHub so I pulled it from there.
The United States Census Bureau website provides the American Fact Finder to easily find data collected from the Census. I used the "Guided Search" of the Fact Finder to look for "information about housing", selected the "Gross Rent" financial characteristic and then limited my data to "All Counties within the United States." After some table manipulation like removing the margin of error columns and pivoting, I exported the dataset as a CSV.
The TopoJSON file is good to go and no further editing is needed. The Census rent dataset we downloaded has a little while to go and we will need to write some Python code to scrape the minimum wage of each state to eventually merge into the Census rent data.
Let's start with finding the minimum wage of every state. I found this site with a table of every state and their corresponding minimum wage. The BeautifulSoup Python library is excellent for scraping data from the web. I've put the code I used to scrape the minimum wage table in an iPython notebook you can find here.
Now for the Census Gross Rent CSV file - the columns we want are the following:
- FIPS Code (unique ID of a county - used to link to TopoJSON id)
- County Name
- State Name (used to match to the web scraped minimum wage data)
- 2 Bedroom Gross Rent
- Gross Minimum Wage (MW) Hourly Rate
- Gross MW Monthly
- 2 Bedroom as Percent of Income
All you will see from the imported Census file is the FIPs, County Name, and 2BR Gross Rent information so go ahead and add the additional columns and we will walk through how to derive them a little later. Once the Census data is imported, create a new sheet and import the Python web-scraped data. Since the Python minimum wage data is based upon the State name, we will want to use this as our join key between the two Excel sheets and pull the minimum wage per hour figure.
In the Census Rent Data sheet, you will see the name of the state succeeds the county name after a comma/"," character. Therefore, in the State column of the first sheet, apply a function to grab everything to the right of the comma, trim any whitespace, and capitalize every character (to match the state on the second sheet):
After applying the formula, your State column should look like the following:
Ok, so now we have our Census rent data on one sheet, our MW data on a second sheet, and our key column of State. Let's create a VLOOKUP function to join on State name across each sheet and pull the hourly wage value. Select the first cell in the "Gross MW Hourly" column. Insert a function and mimic the following options for the VLOOKUP:
- Lookup Value - The join Key, "State" in our case which falls in my F2 intersection.
- Table Array - The sheet and range of data to search through for the key - the web-scraped data sheet (don't highlight any header rows when selecting the table array area.)
- Col Index Number - the column number of the data to pull into the target column once a join match is found - I want the second column's value on my second sheet.
- Range Lookup - you can leave this blank for this example but this option focuses on exact vs accurate matching and other options.
You should now see $7.25 in your MW Hourly column next to Alabama. You can now populate the Monthly MW Income for each record by multiplying the MW Hourly column (Hourly Wage * 40hrs * 4 weeks). You can also calculate the percentage of income of a 2 bedroom rental for each county
(2 Bedroom Gross Rent / MW Monthly Income) * 100
Alright, our Rental data CSV dataset is ready to be loaded into D3! Part 2 will cover creating the visualization with the datasets we have worked with here.