This is the first in a series of practical application examples of Pivot Billions that we will be calling 5-Minute Analysis. Each week, we'll take an interesting open data set and use Pivot Billions to find insights within a five-minute span.
In this installment, we access the Kaggle LA Restaurant & Market Health Data, explore it in real time, and pivot it to report the top violators of the health code and their violations.
- Load the data to Pivot Billions and view its structure.
- Explore the data using Pivot Billions' built-in features.
- Pivot the data to organize it by violator name and violation to see the worst violators and report our findings.
Load the Data and View its Structure
- Download the dataset from Kaggle and unzip your downloaded data.
- Access the Pivot Billions URL for your machine and click the Plus icon on the top right-hand side of the window.
- Select Drag & Drop and drag your downloaded “restaurant-and-market-health-violations.csv” file into Pivot Billions.
- Then select the left checkbox next to the file and click Preview at the bottom of the screen.
- You can now see the columns and types of the dataset and modify them as you see fit. You can also view or change which column or columns are set as primary keys.
- When you are done viewing or modifying the data structure to be imported, click Import.
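For readers who want to sanity-check the file's structure outside the tool, the preview step can be roughly approximated in plain Python. This is only a sketch using the standard library, not how Pivot Billions itself inspects the file: the helper names `infer_type` and `preview_structure` are hypothetical, the type guessing is deliberately crude, and the rows below are made-up placeholders rather than real Kaggle records.

```python
import csv
import io


def infer_type(values):
    """Guess a column type from its string values: int, float, or str."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            for value in values:
                caster(value)
            return name
        except (ValueError, TypeError):
            continue
    return "str"


def preview_structure(csv_file, sample_size=100):
    """Return {column_name: inferred_type} based on the first sample_size rows."""
    reader = csv.DictReader(csv_file)
    sample = []
    for i, row in enumerate(reader):
        if i >= sample_size:
            break
        sample.append(row)
    # Skip empty cells so a blank value does not force a column to "str".
    return {
        col: infer_type([row[col] for row in sample if row[col]])
        for col in reader.fieldnames
    }


# Toy stand-in for the downloaded file (placeholder rows, not Kaggle data).
toy_csv = io.StringIO(
    "facility_name,score,violation_code\n"
    "EXAMPLE CAFE,95,F001\n"
    "SAMPLE MARKET,88,F002\n"
)
structure = preview_structure(toy_csv)
print(structure)
```

To try it on the real download, you could pass a file opened with `open("restaurant-and-market-health-violations.csv", newline="", encoding="utf-8")` in place of `toy_csv`. The encoding is an assumption here.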
View and Explore the Data
Once the data has been imported, you can see and access all 272,801 rows. By hovering over each column name, you can sort the data by that column, view that column’s distribution over all of the data, filter by the data in that column, or rename that column. We’ll view the distribution of the data by the facility name and owner’s name.
- Click on the distribution icon for the facility name column to see the distribution of total health code violations by facility.
- Click on the distribution icon for the owner name column to see the distribution of total health code violations by owner.
You can quickly see that Dodger Stadium is the facility with the most violations, while Ralphs Grocery is the owner with the most violations.
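The distribution view boils down to counting rows per distinct value in a column. A minimal sketch of that idea in Python follows. It assumes each row records one violation and that the columns are named `facility_name` and `owner_name` (the exact names are an assumption). The function `column_distribution` is a hypothetical helper and the rows are made-up placeholders.

```python
import csv
import io
from collections import Counter


def column_distribution(csv_file, column):
    """Count rows per distinct value of `column`, most frequent first."""
    reader = csv.DictReader(csv_file)
    counts = Counter(row[column] for row in reader)
    return counts.most_common()


# Toy stand-in for the real data (placeholder rows, not Kaggle data).
toy_csv = io.StringIO(
    "facility_name,owner_name\n"
    "EXAMPLE CAFE,OWNER A\n"
    "EXAMPLE CAFE,OWNER A\n"
    "SAMPLE MARKET,OWNER B\n"
)
by_facility = column_distribution(toy_csv, "facility_name")
print(by_facility)
```

Running the same function with `"owner_name"` on a freshly opened file would give the per-owner counts.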
Pivot and Report the Data
Now that we know which facility had the highest number of violations, we want to drill down into the data and see which health codes were violated. This is made extremely simple and fast using Pivot Billions.
- First hover over the facility name column and click on the filter icon.
- Set the filter condition to "Contains" and enter "Dodger" in the field beneath.
- Click on the Pivot icon and select "violation_code" and "violation_description" as your dimensions.
- Click on the View button to create the pivot table.
- In the newly created table, select the Pivot View option.
- Drag and drop both dimension labels into the row section of the pivot table and sort vertically.
- Select the bar chart visualization to be able to see a graphical comparison of the violations.
- Hover over any bar to get details for that item.
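The filter-then-pivot steps above can be sketched in plain Python as a filtered group count. This illustrates the logic only and is not Pivot Billions' implementation. `pivot_violations` is a hypothetical helper, the `facility_name` column name is an assumption, and the case-insensitive match is a choice made here (the tool's "Contains" filter may behave differently). The rows are made-up placeholders.

```python
import csv
import io
from collections import Counter


def pivot_violations(csv_file, facility_substring):
    """Count violations by (code, description) for facilities whose name
    contains `facility_substring`, sorted from most to least frequent."""
    needle = facility_substring.lower()
    reader = csv.DictReader(csv_file)
    counts = Counter(
        (row["violation_code"], row["violation_description"])
        for row in reader
        if needle in row["facility_name"].lower()
    )
    return counts.most_common()


# Toy stand-in for the real data (placeholder rows, not Kaggle data).
toy_csv = io.StringIO(
    "facility_name,violation_code,violation_description\n"
    "EXAMPLE STADIUM,F001,Placeholder description one\n"
    "EXAMPLE STADIUM,F001,Placeholder description one\n"
    "EXAMPLE STADIUM,F002,Placeholder description two\n"
    "SAMPLE MARKET,F001,Placeholder description one\n"
)
pivot = pivot_violations(toy_csv, "stadium")
for (code, description), count in pivot:
    print(code, description, count)
```

On the real file, calling `pivot_violations(f, "Dodger")` would mirror the filter used in the steps above.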
So it looks like Dodger Stadium has some work to do. I might have to think twice next time I consider getting a Dodger Dog.
That wraps up this 5-Minute Analysis. Check back next week for another quick analysis of a new data set using Pivot Billions.