A friendly tutorial on getting zip codes and other geographic data from street addresses.
Knowing how to deal with geographic data is a must-have for a data scientist. In this post, we will play around with the MapQuest Search API to get zip codes from street addresses along with their corresponding latitude and longitude to boot!
In 2019, my friends and I participated in the CivTechSA Datathon. At one point in the competition, we wanted to visualize the data points and overlay them on a map of San Antonio. The problem was that we had incomplete data. Surprise! All we had were a street number and a street name: no zip code, no latitude, and no longitude. We then turned to the great internet for some help.
We found a great API by MapQuest that gave us exactly what we needed. With just a sprinkle of Python code, we were able to accomplish our goal.
Today, we’re going to walk through this process.
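As a rough preview, here is what such a lookup could look like in Python. This is a minimal sketch, not the exact code we used at the datathon. It assumes MapQuest's Geocoding API endpoint (`/geocoding/v1/address`) and a response shaped like `results` → `locations` → `postalCode` / `latLng`. The function names and the sample payload are hypothetical, and you would need your own API key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: MapQuest's Geocoding API (an assumption, not taken from this post).
GEOCODE_URL = "https://www.mapquestapi.com/geocoding/v1/address"


def build_geocode_url(address, api_key):
    """Build the request URL for a single free-form address."""
    return GEOCODE_URL + "?" + urlencode({"key": api_key, "location": address})


def extract_zip_and_latlng(payload):
    """Pull (zip, lat, lng) from a response dict, or None if nothing matched.

    Assumes the shape results -> locations -> postalCode / latLng.
    """
    results = payload.get("results") or []
    if not results:
        return None
    locations = results[0].get("locations") or []
    if not locations:
        return None
    best = locations[0]
    lat_lng = best.get("latLng", {})
    return best.get("postalCode"), lat_lng.get("lat"), lat_lng.get("lng")


def geocode(address, api_key):
    """Call the API over the network and return (zip, lat, lng) or None."""
    with urlopen(build_geocode_url(address, api_key), timeout=10) as response:
        return extract_zip_and_latlng(json.load(response))


# Offline check with a hand-written payload in the assumed shape (made-up values).
sample = {"results": [{"locations": [
    {"postalCode": "78205", "latLng": {"lat": 29.42, "lng": -98.49}}
]}]}
parsed = extract_zip_and_latlng(sample)
```

Keeping the parsing separate from the network call makes it easy to try the logic on a saved response before spending any API quota.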
To follow along, you can download the data from here. Just scroll down to the bottom and tab on over to the Data Catalog 2019. Look for SAWS (San Antonio Water System) as shown below.
Download the file by clicking on the link to the Excel file.
If you’re using Windows 10, it will ask you to open Microsoft Store.
Go ahead and click on the “Install” button.
And let’s get started by clicking on the “Launch” button.
A Thousand Clicks
Click on “Get data” when the splash screen appears.
You will be presented with a lot of file formats and sources; let’s choose “Text/CSV” and click on the “Connect” button.
Select “order_products_prior.csv” and click on the “Open” button.
The image below shows what the data looks like. Click on the “Load” button to load the dataset into Power BI Desktop.
Load the rest of the dataset by selecting “Get Data” and choosing the “Text/CSV” option on the dropdown.
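If you’d like to peek at a CSV file before loading it into Power BI Desktop, a few lines of Python will do. This is only a sketch: the `preview_csv` helper is hypothetical, and the sample rows are made up so the snippet runs without the download.

```python
import csv
import io
from itertools import islice


def preview_csv(handle, n_rows=5):
    """Return (column_names, first n_rows rows as dicts) from an open CSV file."""
    reader = csv.DictReader(handle)
    rows = list(islice(reader, n_rows))
    return reader.fieldnames, rows


# Made-up stand-in for a real file; with the download you would instead pass
# open("order_products_prior.csv", newline="").
sample_file = io.StringIO("order_id,product_id\n2,33120\n2,28985\n3,33754\n")
columns, rows = preview_csv(sample_file, n_rows=2)
```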
You should have these three files loaded into Power BI Desktop:
You should see the following tables appear on the “Fields” panel of Power BI Desktop, as shown below. (Note: the image shows Power BI in Report View.)
Let’s see what the Data View looks like by clicking on the second icon on the left side of Power BI Desktop.
And now, let’s check out the Model View where we will see how the different tables are related to each other.
If we hover over a line, it will turn yellow and the corresponding related fields will both be highlighted as well.
In this case, Power BI Desktop is smart enough to infer the two relationships. However, most of the time, we will have to create the relationships ourselves. We will cover this topic in the future.
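To get a feel for what a relationship is, think of it as two tables sharing a key column. The sketch below uses one simple heuristic, matching column names, to suggest candidate relationships. This is an illustration only and not necessarily how Power BI Desktop detects them; the table names and layouts are hypothetical, built from field names used in this article.

```python
def shared_columns(tables):
    """Map each pair of table names to the column names they have in common."""
    names = sorted(tables)
    candidates = {}
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            common = sorted(set(tables[left]) & set(tables[right]))
            if common:
                candidates[(left, right)] = common
    return candidates


# Hypothetical table layouts (assumed for illustration).
tables = {
    "orders": ["order_id", "order_dow", "order_hour_of_day"],
    "order_products": ["order_id", "product_id"],
    "products": ["product_id", "product_name", "aisle_id", "department_id"],
}
links = shared_columns(tables)
```

With these layouts, the helper suggests two links: one through `order_id` and one through `product_id`.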
Let’s go back to the Report View and examine the “Visualizations” panel closely. Look for the “slicer” icon which looks like a square with a funnel at the bottom right corner. Click on it to add a visual to the report.
In the “Fields” panel, find the “department_id” and click the checkbox on its left.
This will cause the “department_id” field to appear under the “Visualizations” panel in the “Field” box.
Next, take your mouse cursor and hover over the top right corner of the visual in the Report View. Click on the three dots that appear in the corner as shown below.
Click on “List” in the dropdown that appears.
While the “department_id” visual is selected, you should see corner marks indicating that it is the active visual. While the “department_id” visual is active, press CTRL+C to copy it and then CTRL+V to paste it. Move the new visual to the right of the original visual.
Make the second visual active by clicking somewhere inside it. Then look for the “aisle_id” field in the “Fields” panel on the right of Power BI Desktop as shown below.
Try selecting a value on the “department_id” visual and observe how the selection on “aisle_id” changes accordingly.
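Under the hood, the idea is simple: picking a value in one slicer narrows the rows, and the other slicer only shows values that survive the filter. Here is a minimal Python sketch of that idea, using a hypothetical helper and made-up rows rather than the real dataset.

```python
def related_values(rows, selected_field, selected_value, target_field):
    """Distinct target_field values among the rows that match the selection."""
    return sorted(
        {row[target_field] for row in rows if row[selected_field] == selected_value}
    )


# Made-up product rows for illustration only.
products = [
    {"product_id": 1, "aisle_id": 61, "department_id": 19},
    {"product_id": 2, "aisle_id": 104, "department_id": 13},
    {"product_id": 3, "aisle_id": 94, "department_id": 7},
    {"product_id": 4, "aisle_id": 38, "department_id": 1},
    {"product_id": 5, "aisle_id": 5, "department_id": 13},
]

# Selecting department 13 leaves only the aisles that belong to it.
aisles = related_values(products, "department_id", 13, "aisle_id")
```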
Now, examine the “Visualizations” panel again and click on the table visual as shown below.
In the “Fields” panel, select “product_id” and “product_name” or drag them into the “Values” box.
Power BI Desktop should look similar to the image below.
This time, try selecting a value from both “department_id” and “aisle_id” — observe what happens to the table visual on the right.
Let’s create another visual by copying and pasting the table visual. This time, select (or drag) the following fields to the “Values” box of the visual.
Power BI Desktop should now look similar to the image below.
Try clicking one of the selections in the table visual (where it’s showing “product_id” and “product_name”) and observe how the table on the right changes accordingly.
For a closer look, activate Focus Mode by clicking on the icon as shown below.
The table displays the details of orders that have the product that you selected in the table with “product_id” and “product_name.”
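The same lookup can be sketched in Python: find the order IDs that contain the selected product, then keep only those orders. The helper name, table layouts, and rows below are assumptions for illustration.

```python
def orders_containing(order_lines, orders, product_id):
    """Return the order records that include the given product."""
    matching_ids = {
        line["order_id"] for line in order_lines if line["product_id"] == product_id
    }
    return [order for order in orders if order["order_id"] in matching_ids]


# Made-up rows: which products are in which orders, plus order details.
order_lines = [
    {"order_id": 1, "product_id": 10},
    {"order_id": 1, "product_id": 11},
    {"order_id": 2, "product_id": 10},
    {"order_id": 3, "product_id": 12},
]
orders = [
    {"order_id": 1, "order_dow": 0, "order_hour_of_day": 9},
    {"order_id": 2, "order_dow": 1, "order_hour_of_day": 14},
    {"order_id": 3, "order_dow": 4, "order_hour_of_day": 20},
]
selected = orders_containing(order_lines, orders, product_id=10)
```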
Get out of Focus Mode by clicking on “Back to report” as shown below.
Let’s rename this page or tab by right-clicking on the page name (“Page 1”) and selecting “Rename Page.”
Type in “PRODUCTS” and press ENTER.
Let’s add another page or tab to the report by right-clicking on the page name again (“PRODUCTS”) and selecting “Duplicate Page.”
Rename the new page “TRANSACTIONS” and delete (or remove) the right-most table with order details on it.
Change the top-left visual and update its fields as shown below. The “Fields” box should say “order_dow” while the top-left visual is active.
Move the visuals around so they look similar to the image below.
Do the same thing for the next visual. This time, select “order_hour_of_day” and your Power BI Desktop should look like the image below.
Do the same thing one last time for the last table and it should now contain fields as shown below.
Let’s add another page or tab to the report by clicking on the “+” icon at the bottom of the report’s main work area.
In the “Visualizations” panel, select “Stacked column chart.”
Resize the chart by dragging its move-handles.
Make sure the “Axis” box contains “order_dow” and the “Values” box contains “order_id.” Power BI Desktop should automatically calculate the count of “order_id” and display the field as “Count of order_id” as shown below.
The graph above is interesting because it shows a higher number of orders for Day 0 and Day 1 than for the other days.
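“Count of order_id” is just a group-and-count. For the curious, here is a minimal Python sketch of the same aggregation; the `count_by` helper is hypothetical and the rows are made up.

```python
from collections import Counter


def count_by(rows, field):
    """Count how many rows fall under each distinct value of field."""
    return Counter(row[field] for row in rows)


# Made-up orders for illustration only.
orders = [
    {"order_id": 1, "order_dow": 0},
    {"order_id": 2, "order_dow": 0},
    {"order_id": 3, "order_dow": 1},
    {"order_id": 4, "order_dow": 3},
    {"order_id": 5, "order_dow": 1},
    {"order_id": 6, "order_dow": 0},
]
orders_per_day = count_by(orders, "order_dow")
busiest_day, busiest_count = orders_per_day.most_common(1)[0]
```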
Let’s make another chart.
We will follow the same procedure to add a chart; this time, we’ll use “order_hour_of_day” in the “Axis” box as shown below.
The graph shows the peak time for the number of orders.
One last graph!
We will add another chart with “days_since_prior_order” in the “Axis” box.
This last graph is the most interesting because the number of reorders peaks at three points: 7 days, 14 days, and 30 days since the prior order. This suggests that people are in the habit of resupplying every week, every two weeks, and every month.
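Spotting peaks by eye works for one chart, but the idea can also be sketched in code: a peak is a bucket whose count is higher than both of its neighbors. The helper below is hypothetical and the counts are made up to mimic the shape of the chart, not taken from the dataset.

```python
def local_peaks(counts):
    """Return keys whose count is higher than the counts of both neighboring keys.

    Neighbors are the adjacent keys present in counts (after sorting), and the
    first and last keys only need to beat their single neighbor.
    """
    keys = sorted(counts)
    peaks = []
    for i, key in enumerate(keys):
        left = counts[keys[i - 1]] if i > 0 else -1
        right = counts[keys[i + 1]] if i < len(keys) - 1 else -1
        if counts[key] > left and counts[key] > right:
            peaks.append(key)
    return peaks


# Made-up reorder counts keyed by days_since_prior_order.
reorders = {5: 3, 6: 4, 7: 9, 8: 4, 13: 2, 14: 6, 15: 2, 29: 1, 30: 8}
peak_days = local_peaks(reorders)
```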
That’s it, folks!
In the next article, we will “prettify” our charts and make them more readable to others.
The procedures above are admittedly drawn out. But if you’re a novice Power BI user, don’t despair! With regular practice, the concepts demonstrated in this article will soon become second nature and you’ll probably be able to do them in your sleep.