Date: 4 July 2014
Author: Alphan Kırayoğlu
Client: Self-published

Grading NYC Restaurants

Since July 2010, restaurants in New York City have been required to post letter grades showing their sanitary inspection results. The Department of Health inspects every restaurant at least once a year and checks for compliance with city and state food safety regulations concerning food handling, food temperature, personal hygiene and vermin control. Violations are assigned point values, and a restaurant’s letter grade is determined by the total score of its violations: a lower score indicates better compliance with the regulations and earns a higher letter grade. Restaurants with a score between 0 and 13 points earn an A, those with 14 to 27 points receive a B and those with 28 or more a C. More information can be found here.
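The score-to-grade rule above is simple enough to write down directly. Here is a minimal sketch in Python (not the R code used for this project); the function name letter_grade is just an illustrative choice, and it ignores the ungraded and pending states that the real data set also contains.

```python
def letter_grade(score: int) -> str:
    """Map a total violation score to a letter grade using the cutoffs above."""
    if score < 0:
        raise ValueError("violation scores cannot be negative")
    if score <= 13:
        return "A"   # 0 to 13 points
    if score <= 27:
        return "B"   # 14 to 27 points
    return "C"       # 28 points or more


# Check the boundaries of each grade band.
for s in (0, 13, 14, 27, 28, 45):
    print(s, letter_grade(s))
```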

Inspection results of individual restaurants can be found on the Department of Health’s website or in its app. The inspection results data set for all restaurants is available on the NYC Open Data portal: https://nycopendata.socrata.com/dashboard

The inspection results data set has been used previously to visualize possible relationships between cuisine and inspection results, to visualize the density of restaurants as a heat map (here), and to visualize the most roach-infested neighborhoods in the city (here).

I wanted to analyze and visualize the inspection results data set as well. Mainly, I wanted to learn about the mapping capabilities and the census packages in R. Along the way, I discovered that most restaurants request a second inspection before the grades are posted, and on average improve their scores by 10 points. Hence, you will see that most of the scores are in the A range. Still, you may start eating at home more frequently after learning about the violations and the scoring system behind the letter grades.

Analysis
I downloaded the data from the NYC Open Data portal and excluded inspection results before June 1st, 2013. Since every restaurant is inspected at least once a year, this should (hopefully) include all active restaurants.

We can look at the distribution of inspection results by month to see if warmer months lead to more violations (higher scores):



At first glance, the distributions look fairly similar across months. We can tabulate the results and see that the means and the 75th percentiles are similar as well. The 90th percentiles do differ, but this may be due to variation in the number of restaurants inspected each month.
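For readers who want to reproduce that kind of table, here is a rough sketch of the tabulation in Python, using only the standard library. It assumes the data has already been reduced to (inspection date, score) pairs; the sample rows are made up for illustration and are not taken from the real data set. The percentile helper uses the nearest-rank definition, which is one of several common conventions and may differ slightly from R's default.

```python
from collections import defaultdict
from datetime import date
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def monthly_summary(inspections):
    """Group (inspection_date, score) pairs by year and month and summarize each group."""
    by_month = defaultdict(list)
    for day, score in inspections:
        by_month[(day.year, day.month)].append(score)
    summary = {}
    for month, scores in sorted(by_month.items()):
        summary[month] = {
            "n": len(scores),  # inspections that month, which affects how stable the tail is
            "mean": sum(scores) / len(scores),
            "p75": percentile(scores, 75),
            "p90": percentile(scores, 90),
        }
    return summary


# Made-up rows for illustration only.
sample = [
    (date(2013, 7, 2), 9), (date(2013, 7, 9), 12), (date(2013, 7, 15), 30),
    (date(2013, 7, 20), 11), (date(2014, 1, 8), 7), (date(2014, 1, 21), 13),
]
for month, stats in monthly_summary(sample).items():
    print(month, stats)
```

Reporting the count next to each percentile makes it easier to judge whether a difference in the 90th percentile is real or just a small-sample effect.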



It is comforting to see that ~75% of restaurants are graded A, though one should keep in mind that the Department of Health is more generous than an Ivy League institution. Even if a restaurant has mice running around in its kitchen, it may still have an A rating. Furthermore, after correcting the violations cited in the initial visit, a restaurant can request a second inspection. The example below shows a restaurant that was re-inspected after an ungraded inspection on multiple occasions. Each time, the inspection score dropped dramatically and yielded an A rating.



In fact, over the period from 06/01/2013 to 06/24/2014 restaurants on average improved their inspection scores by 10 points on the second inspection.
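The average improvement is just the mean of (initial score minus re-inspection score) across restaurants. A minimal sketch in Python, assuming each restaurant has already been reduced to one initial score and one re-inspection score; the restaurant ids and scores below are hypothetical and are not meant to reproduce the 10-point figure.

```python
from statistics import mean


def mean_improvement(cycles):
    """cycles maps a restaurant id to (initial_score, reinspection_score).

    A positive result means scores dropped (improved) on re-inspection.
    """
    drops = [initial - second for initial, second in cycles.values()]
    return mean(drops)


# Hypothetical restaurants and scores, for illustration only.
cycles = {
    "cafe-1": (24, 12),   # improved by 12 points
    "deli-2": (30, 12),   # improved by 18 points
    "grill-3": (18, 21),  # got worse by 3 points
}
print(mean_improvement(cycles))
```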



Maps
We can try to visualize the data on a map. The data set obtained from the NYC Open Data portal includes the street address and zip code of each restaurant, which can be translated to coordinates (geocoding). The R package ggmap, developed by David Kahle and Hadley Wickham, makes this an easy task. However, you will be limited to 2500 queries per day by the Google API.
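One way to live with the daily query limit is to geocode each distinct address only once, keep the results in a cache, and spread the remaining addresses over several days. The sketch below, in Python, only does the planning; it does not call any geocoding service. The function name, the cache layout (normalized address mapped to coordinates) and the sample addresses are all assumptions made for illustration.

```python
def daily_batches(addresses, cache, limit=2500):
    """Split addresses that still need geocoding into batches of at most `limit`.

    `cache` holds normalized addresses that already have coordinates, so they
    are skipped. Duplicate addresses are only queued once.
    """
    pending = []
    seen = set()
    for addr in addresses:
        key = addr.strip().upper()  # crude normalization so duplicates collapse
        if key in cache or key in seen:
            continue
        seen.add(key)
        pending.append(key)
    return [pending[i:i + limit] for i in range(0, len(pending), limit)]


# Made-up addresses; limit=2 keeps the example small.
addresses = [
    "1 Main St, New York, NY 10001",
    "1 main st, New York, NY 10001 ",   # duplicate after normalization
    "2 Side Ave, Brooklyn, NY 11201",
    "3 Park Pl, Bronx, NY 10451",
]
cache = {"3 PARK PL, BRONX, NY 10451": (40.82, -73.92)}
for day, batch in enumerate(daily_batches(addresses, cache, limit=2), start=1):
    print("day", day, batch)
```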

Once we have the coordinates for each location, we can plot the C-rated restaurants in New York. Z-C marks inspections that have not been graded yet but would have resulted in a C grade.



Or, again using the facet functionality of ggplot/ggmap, we can plot separate maps for individual months. Once again, though, this plot is not very useful, since the number of restaurants inspected varies from month to month. Hence, I did not spend a lot of time making it more legible.



We can create a more sophisticated visualization by overlaying the census tracts on our map and coloring them according to the average score within each tract, i.e. a choropleth map. I utilized the UScensus2010tract package to obtain the tract data for New York and wrote a little script to convert the coordinates of restaurants to 11-digit tract FIPS codes using the FCC’s census API. (here)
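To make the tract step concrete: an 11-digit tract FIPS code is the 2-digit state code, the 3-digit county code and the 6-digit tract code, which are also the first 11 digits of a 15-digit census block FIPS code. The Python sketch below assumes the coordinate lookup has already returned a 15-digit block code for each restaurant, and then computes the two per-tract quantities a choropleth like this would be colored by. It is not the script linked above, and the block codes and scores are made up.

```python
from collections import defaultdict


def tract_fips(block_fips: str) -> str:
    """Return the 11-digit tract FIPS (state 2 + county 3 + tract 6) of a 15-digit block FIPS."""
    if len(block_fips) != 15 or not block_fips.isdigit():
        raise ValueError("expected a 15-digit block FIPS code")
    return block_fips[:11]


def tract_summary(rows):
    """rows: iterable of (block_fips, score) pairs.

    Returns a dict mapping tract FIPS to (mean score, share of scores in the C range).
    """
    scores = defaultdict(list)
    for block, score in rows:
        scores[tract_fips(block)].append(score)
    return {
        tract: (sum(v) / len(v), sum(s >= 28 for s in v) / len(v))
        for tract, v in scores.items()
    }


# Made-up block codes and scores, for illustration only.
rows = [
    ("360610031001000", 10),
    ("360610031002001", 30),  # same tract as the row above, different block
    ("360470001001000", 12),
]
for tract, (avg, share_c) in tract_summary(rows).items():
    print(tract, avg, share_c)
```

Keeping the codes as strings matters, because FIPS codes can start with a zero and would lose it if read as numbers.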

Here is the map showing the mean inspection score for each tract:



And here is the map showing the proportion of C-rated restaurants in each tract:



The mean scores and the proportion of C-rated restaurants show very little variation across tracts. As I pointed out, since restaurants can request a second inspection, posted grades tend to gravitate towards A, which explains why the majority of restaurants receive A's.

My code for this project can be found on github.com/alphankirayoglu.