A friendly tutorial on getting zip codes and other geographic data from street addresses.
Knowing how to deal with geographic data is a must-have for a data scientist. In this post, we will play around with the MapQuest Search API to get zip codes from street addresses along with their corresponding latitude and longitude to boot!
In 2019, my friends and I participated in the CivTechSA Datathon. At one point in the competition, we wanted to visualize the data points and overlay them on San Antonio’s map. The problem was, we had incomplete data. Surprise! All we had were a street number and a street name: no zip code, no latitude, and no longitude. We then turned to the great internet for some help.
We found a great API by MapQuest that gave us exactly what we needed. With just a sprinkle of Python code, we were able to accomplish our goal.
Today, we’re going to walk through this process.
To follow along, you can download the data from here. Just scroll down to the bottom and tab on over to the Data Catalog 2019. Look for SAWS (San Antonio Water System) as shown below.
Download the file by clicking on the link to the Excel file.
OR, you can click on this.
MapQuest API Key
Head on over to https://developer.mapquest.com/ and create an account to get a free API key.
Copy the ‘Consumer Key’ and keep it in a safe place. We’ll need it later.
Now, let’s fire up a Jupyter notebook and get coding!
For starters, let’s set up the environment by doing a couple of imports.
Don’t forget to replace the API_KEY value (line 12 of the snippet above) with your own key.
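If you would rather not paste the key into the notebook itself, one option is to read it from an environment variable. This is only a sketch: the variable name MAPQUEST_API_KEY is my own invention, not something MapQuest requires, and the '#####' fallback is just the placeholder used in this post.

```python
import os

# MAPQUEST_API_KEY is a hypothetical variable name chosen for this sketch.
# Set it in your shell before launching Jupyter, or paste your Consumer Key
# in place of the '#####' placeholder.
API_KEY = os.environ.get("MAPQUEST_API_KEY") or "#####"

if API_KEY == "#####":
    print("No key found: set MAPQUEST_API_KEY or replace the placeholder.")
```

Keeping the key out of the notebook makes it harder to share it by accident when you publish your work.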
Now, let’s read the Excel file with a simple
df = pd.read_excel().
Next, we’ll combine the street number and street name columns.
The ALL CAPS hurts my eyes. Let’s do something about it:
df['street_address'] = df.street_address.str.title()
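Here is a small sketch of the combine-and-title-case steps on a toy dataframe. The column names street_number and street_name are assumptions, and the two rows are made up, so adjust both to match whatever your spreadsheet actually contains.

```python
import pandas as pd

# Made-up sample rows; the column names 'street_number' and 'street_name'
# are assumptions and may differ in the real spreadsheet.
df = pd.DataFrame({
    "street_number": [123, 456],
    "street_name": ["MAIN ST", "OAK AVE "],
})

# Cast the number to a string before concatenating, and trim stray spaces.
df["street_address"] = (
    df["street_number"].astype(str) + " " + df["street_name"].str.strip()
)

# Convert ALL CAPS to Title Case.
df["street_address"] = df["street_address"].str.title()

print(df["street_address"].tolist())
```

The astype(str) call matters because adding an integer column to a string column raises an error in pandas.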
Below are two functions that call the API and return geo data.
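In case the embedded snippet does not load for you, here is a rough sketch of what such a pair of functions could look like. It is not the original code. It assumes MapQuest's Geocoding API (the v1 "address" endpoint) and its usual response layout (results, then locations, then postalCode and latLng), and it uses only the standard library. The function names fetch_geocode and extract_geo are my own.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: MapQuest's Geocoding API (v1, "address" resource).
GEOCODE_URL = "https://www.mapquestapi.com/geocoding/v1/address"


def fetch_geocode(address, api_key, timeout=10):
    """Call the API for one address and return the decoded JSON payload."""
    query = urllib.parse.urlencode({"key": api_key, "location": address})
    with urllib.request.urlopen(f"{GEOCODE_URL}?{query}", timeout=timeout) as resp:
        return json.load(resp)


def extract_geo(payload):
    """Pull zip code, latitude, and longitude from the first match, if any."""
    try:
        location = payload["results"][0]["locations"][0]
    except (KeyError, IndexError, TypeError):
        return {"zip": None, "lat": None, "lng": None}
    lat_lng = location.get("latLng", {})
    return {
        "zip": location.get("postalCode") or None,
        "lat": lat_lng.get("lat"),
        "lng": lat_lng.get("lng"),
    }


# Hand-written payload with made-up values, shaped like the assumed response.
sample = {
    "results": [
        {"locations": [{"postalCode": "00000", "latLng": {"lat": 1.5, "lng": -2.5}}]}
    ]
}
print(extract_geo(sample))
```

Splitting the network call from the parsing step means you can test the parsing on a saved response without spending any of your API quota.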
We can manually call it with the line below. Don’t forget to replace the ‘#####’ with your own API key. You can use any address you want (replace spaces with a + character).
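As a sketch of the idea, the snippet below builds the request URL for a single address. The endpoint is assumed to be MapQuest's Geocoding API (v1 "address"), the address is an arbitrary example, and quote_plus from the standard library does the space-to-plus replacement for us.

```python
from urllib.parse import quote_plus

API_KEY = "#####"  # placeholder; substitute your own Consumer Key
address = "123 Main St, San Antonio, TX"  # arbitrary example address

# quote_plus turns spaces into '+' and escapes characters such as commas.
url = (
    "https://www.mapquestapi.com/geocoding/v1/address"  # assumed endpoint
    f"?key={API_KEY}&location={quote_plus(address)}"
)
print(url)
```

You can paste the printed URL into a browser to inspect the raw JSON before writing any parsing code.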
But we’ve got many addresses, so we’ll use a loop to call the API repeatedly.
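A loop like this can be sketched as follows. The helper geocode_all and the stand-in fake_lookup are hypothetical names for illustration. In practice you would pass in your own function that calls the API. The short pause between calls is my own precaution, not a documented MapQuest requirement.

```python
import time


def geocode_all(addresses, lookup, pause=0.1):
    """Run lookup on each address, keeping None for any that fail."""
    results = []
    for address in addresses:
        try:
            results.append(lookup(address))
        except Exception as exc:  # one bad address should not stop the run
            print(f"Lookup failed for {address!r}: {exc}")
            results.append(None)
        time.sleep(pause)  # be gentle with the API
    return results


def fake_lookup(address):
    """Stand-in for a real API call so the sketch runs offline."""
    if not address:
        raise ValueError("empty address")
    return {"address": address, "zip": "00000"}  # made-up value


results = geocode_all(["123 Main St", ""], fake_lookup, pause=0)
print(results)
```

Appending None on failure keeps the results list the same length as the address list, which makes it easy to line the two up afterwards.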
Let’s see what the result looks like:
Finally, let’s create a dataframe that will house the street addresses, complete with zip code, latitude, and longitude.
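One way to assemble that dataframe is sketched below. It assumes each lookup produced a small dictionary with zip, lat, and lng keys (or None when the lookup failed). The addresses and coordinates here are made up.

```python
import pandas as pd

# Made-up inputs: two addresses and their lookup results (None = failed).
addresses = ["123 Main St", "456 Oak Ave"]
results = [
    {"zip": "00000", "lat": 1.5, "lng": -2.5},
    None,
]

rows = []
for address, geo in zip(addresses, results):
    geo = geo or {}  # treat a failed lookup as an empty result
    rows.append({
        "street_address": address,
        "zip": geo.get("zip"),
        "lat": geo.get("lat"),
        "lng": geo.get("lng"),
    })

geo_df = pd.DataFrame(rows, columns=["street_address", "zip", "lat", "lng"])
print(geo_df)
```

Storing the zip code as a string keeps any leading zeros intact. From here, geo_df.to_csv("geo_data.csv", index=False) would write a file that Tableau can open (the file name is just an example).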
Voila! We’ve got ourselves geo data.
For extra credit, let’s import the data into Tableau and get a pretty spiffy visual:
And that’s it, folks!
You can find the Jupyter notebook here.
Thanks for stopping by and reading my post. Hope it was useful 🙂
If you want to learn more about my journey from slacker to data scientist, check out the article below: “From Slacker to Data Scientist: My journey into data science without a degree” (towardsdatascience.com).
And if you’re thinking about switching gears and venturing into data science, start thinking about rebranding now: “The Slacker’s Guide to Rebranding Yourself as a Data Scientist: Opinionated advice for the rest of us. Love of math, optional.” (towardsdatascience.com).
This article was first published in Towards Data Science’s Medium publication.