Go to
NYC Open Data.
In the “Search Open Data for things like 311”
search box, type
DOHMH New York City Restaurant Inspection Results
(DOHMH stands for “Department of Health and Mental Hygiene”.)
Click on the first
DOHMH New York City Restaurant Inspection Results
that it finds for you.
Press the blue View Data button.
You should see
the
first page
of records.
Each record consists of 26 fields,
named on the first line in the file.
In Python,
they will be numbered from 0 to 25 inclusive when stored in a
list
.
To see a typical restaurant, type
Wo Hop 17
into the “Find in this Dataset” search box on the right side.
Then scroll to the right to the Violation Description column.
To download the file
DOHMH_New_York_City_Restaurant_Inspection_Results.csv
make sure the “Find in this Dataset” box is cleared out.
Press the blue Export button and select “CSV”.
#AWS Linux: cd ~/bin wget https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv ls -l rows.csv wc -l rows.csv
cd ~/Downloads pwd /Users/myname/Downloads ls -l DOHMH_New_York_City_Restaurant_Inspection_Results.csv -rw-r--r--@ 1 myname mygroup 166877269 Sep 15 17:38 DOHMH_New_York_City_Restaurant_Inspection_Results.csv wc -l DOHMH_New_York_City_Restaurant_Inspection_Results.csv 391077 DOHMH_New_York_City_Restaurant_Inspection_Results.csv head -3 DOHMH_New_York_City_Restaurant_Inspection_Results.csv CAMIS,DBA,BORO,BUILDING,STREET,ZIPCODE,PHONE,CUISINE DESCRIPTION,INSPECTION DATE,ACTION,VIOLATION CODE,VIOLATION DESCRIPTION,CRITICAL FLAG,SCORE,GRADE,GRADE DATE,RECORD DATE,INSPECTION TYPE,Latitude,Longitude,Community Board,Council District,Census Tract,BIN,BBL,NTA 50018145,LA FUENTES STEAKHOUSE,Bronx,1306,JEROME AVE,10452,6467735144,Spanish,01/29/2017,Violations were cited in the following area(s).,10F,"Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.",N,5,,,09/15/2019,Inter-Agency Task Force / Initial Inspection,40.838261018744,-73.919133506998,204,16,022101,2008025,2028390005,BX63 50077581,L'ADRESSE AMERICAN BISTRO,Manhattan,5,BRYANT PARK,10018,2122212510,French,08/08/2019,Violations were cited in the following area(s).,06D,"Food contact surface not properly washed, rinsed and sanitized after each use and following any activity when contamination may have occurred.",Y,12,A,08/08/2019,09/15/2019,Cycle Inspection / Initial Inspection,40.753936992473,-73.985948643127,105,04,011300,1022565,1009930029,MN17 awk -F, 'NR == 1 {print NF}' DOHMH_New_York_City_Restaurant_Inspection_Results.csv 26
Move
DOHMH_New_York_City_Restaurant_Inspection_Results.csv
into the directory that holds your Python programs.
WO HOP 17 05/25/2018 Food not cooked to required minimum temperature. WO HOP 17 05/25/2018 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 06/19/2017 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/23/2019 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 06/19/2017 Cold food item held above 41º F (smoked fish and reduced oxygen packaged foods above 38 ºF) except during necessary preparation. WO HOP 17 05/23/2019 Food contact surface not properly maintained.
list
named
hopLines
holding only the Wo Hop records.
Each item in
hopLines
will be a
list
of 26
str
ings.
Sort the records in the
list
by inspection date.
Change the part of the script after
lines = csv.reader(csvfile)
to the following.
Since the return value of this
score
function is of type string,
sort
will perform string comparisons to compare the scores.
All characters, including digits, have an “alphabetical order”;
see the
ASCII chart.
hopLines = [] #Start with an empty list. for line in lines: #During each iteration, line is a list of 26 strings. if line[0] == "41320866": #CAMIS number for Wo Hop hopLines.append(line) #Append line to hopLines. csvfile.close() def score(line): """ Return the line's datestamp, but with the format changed from "12/31/2019" to "2019/12/31". That makes alphabetical order the same as chronological order. """ fields = line[8].split("/") return f"{fields[2]}/{fields[0]}/{fields[1]}" hopLines.sort(key = score) for line in hopLines: print(line[1], line[8]) #name and inspection date print(line[11]) #violation description print() sys.exit(0)
WO HOP 17 06/19/2017 Cold food item held above 41º F (smoked fish and reduced oxygen packaged foods above 38 ºF) except during necessary preparation. WO HOP 17 06/19/2017 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/25/2018 Food not cooked to required minimum temperature. WO HOP 17 05/25/2018 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/23/2019 Food contact surface not properly maintained. WO HOP 17 05/23/2019 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
hopLines = [] #Start with an empty list. for line in lines: #During each iteration, line is a list of 26 strings. if line[0] == "41320866": #CAMIS number for Wo Hop hopLines.append(line) #Append line to hopLines.to the following list comprehension.
hopLines = [line for line in lines if line[0] == "41320866"]
score
function and change the call to
sort
to the following.
Instead of creating and returning a rearranged string such as
"2019/12/31"
,
the
lambda
function
creates and returns an object of class
datetime.date
.
import datetime
#During each call to the lambda function, line is a list of 26 strings. hopLines.sort(key = lambda line: datetime.datetime.strptime(line[8], "%m/%d/%Y").date())
sort
to the following
(and remove the
csvfile.close()
).
The entire file is read in as one big
sequence
of bytes.
The
decode
function converts this sequence into the
str
ing
of characters
s
,
and
splitlines
converts this
str
ing
into a
list
of
str
ings.
As we have already seen,
we can use a
csv.reader
to split apart the comma-separated fields each of these
str
ings.
import sys import csv #Comma-separated values. Do not name this Python script csv.py. import datetime import urllib.request #Database is at #https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv" try: fileFromUrl = urllib.request.urlopen(url) except urllib.error.URLError as error: print(error, file = sys.stderr) sys.exit(1) sequenceOfBytes = fileFromUrl.read() #Slurp whole file into one big sequenceOfBytes. fileFromUrl.close() try: s = sequenceOfBytes.decode("utf-8") #s is a string except UnicodeError as error: print(error, file = sys.stderr) sys.exit(1) lines = csv.reader(s.splitlines()) #lines is a list of lists hopLines = [line for line in lines if line[0] == "41320866"] #each line is a list of 26 strings
WO HOP 17 06/19/2017 Cold food item held above 41º F (smoked fish and reduced oxygen packaged foods above 38 ºF) except during necessary preparation. WO HOP 17 06/19/2017 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/25/2018 Food not cooked to required minimum temperature. WO HOP 17 05/25/2018 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/23/2019 Food contact surface not properly maintained. WO HOP 17 05/23/2019 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
decode
,
split into fields,
and process them one by one?
Hint:
if the argument of
csv.reader
is a
list
containing one
str
ing,
the
csv.reader
will split that one
str
ing
into fields:
reader = csv.reader([s]) #s is a string listOfStrings = next(reader) #listOfStrings is a list of the fields of s
PHONE
column, so I had to tell
read_csv
not to always expect a phone number there.
The
DataFrame
df
contains thousands of rows and 26 columns.
The
DataFrame
df.loc[df["CAMIS"] == camis]
contains only 6 lines and 26 columns.
The
DataFrame
df.loc[df["CAMIS"] == camis][columns]
contains 6 lines and 3 columns.
To sort the lines of
df2
in chronological order,
I temporarily added a fourth column named
dates
to
df2
,
drop
ping
it as soon as it was no longer needed.
Each value in the
formatters
dict
ionary
is a (nameless) function that takes one argument
and returns a
str
ing.
I printed a newline after each inspection date
to start a new line in the output.
I printed a newline after each violation description
to skip an empty line after each inspection.
The violation descriptions were very long,
so I set the maximum column width to infinity.
The output looked neater when I
strip
ped
the
whitespace
from the start and end of each line of output.
""" Print inspection results for Wo Hop. """ import sys import csv import pandas as pd #https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv" try: df = pd.read_csv(url, dtype = {"PHONE": str}) #df is a pandas DataFrame. except BaseException as error: print(error, file = sys.stderr) sys.exit(1) formatters = { "INSPECTION DATE": lambda id: f"{id}\n", "VIOLATION DESCRIPTION": lambda vd: f"{vd}\n" } camis = 41320866 #doing business as "WO HOP 17" columns = ["DBA", "INSPECTION DATE", "VIOLATION DESCRIPTION"] df2 = df.loc[df["CAMIS"] == camis][columns] #Sort by date, oldest at top. df2["date"] = pd.to_datetime(df2["INSPECTION DATE"], format = "%m/%d/%Y") df2.sort_values(by = "date", inplace = True) df2.drop("date", axis = "columns", inplace = True) pd.set_option("display.max_colwidth", -1) #Prevent ellipsis dots ... s = df2.to_string(formatters = formatters, header = False, index = False) #Strip whitespace from the start and end of each line. lines = [line.strip() for line in s.splitlines()] s = "\n".join(lines) print(s) sys.exit(0)
WO HOP 17 06/19/2017 Cold food item held above 41º F (smoked fish and reduced oxygen packaged foods above 38 ºF) except during necessary preparation. WO HOP 17 06/19/2017 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/25/2018 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit. WO HOP 17 05/25/2018 Food not cooked to required minimum temperature. WO HOP 17 05/23/2019 Food contact surface not properly maintained. WO HOP 17 05/23/2019 Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.
""" restaurantinspection.py This module contains the function getDf. """ import sys import os import pandas as pd #https://data.cityofnewyork.us/Health/DOHMH-New-York-City-Restaurant-Inspection-Results/43nn-pn8j url = "https://data.cityofnewyork.us/api/views/43nn-pn8j/rows.csv" filename = "~/Downloads/DOHMH_New_York_City_Restaurant_Inspection_Results.csv" def getDf(fromWeb = False, verbose = False): "Return a pandas DataFrame containing the restaurant inspection results." source = url if fromWeb else os.path.expanduser(filename) #Fields that should be integers. Int64 can tolerate a NA. dtype = { "ZIPCODE": "Int64", "Community Board": "Int64", "Council District": "Int64", "Census Tract": "Int64", "BIN": "Int64", "BBL": "Int64" } df = pd.read_csv(source, dtype = dtype) #If BORO == "0" but ZIPCODE is present, change BORO to the correct borough. zips = [ [10168, "Manhattan"], [10285, "Manhattan"], [11249, "Brooklyn"] ] for zipcode, boro in zips: mask = (df.BORO == "0") & (df.ZIPCODE == zipcode) if verbose: nrows = len(df.loc[mask]) q = '"' print(f'Changing BORO "0" to {q + boro + q:11} on the {nrows:2} rows containing ZIPCODE {zipcode}.') df.loc[mask, "BORO"] = boro if verbose: print() seriesOfBools = df.BORO == "0" if verbose: print(f'The remaining rows whose BORO is "0" are') columns = [ "CAMIS", "DBA", "BORO", "BUILDING", "STREET", "ZIPCODE" ] print(df[seriesOfBools].to_string(columns = columns, show_dimensions = True)) print() #Remove the remaining rows whose BORO == "0". if verbose: print(f'Deleting the {sum(df.BORO == "0")} rows whose BORO is still "0".') print() df = df[~seriesOfBools] if verbose: badPhones = df.loc[~df.PHONE.str.match(r"^\d{10}$").astype(bool)] print(f"{len(badPhones)} of the {len(df):,} rows have bad phones, including") print() s = badPhones.to_string( columns = ["CAMIS", "DBA", "PHONE", "INSPECTION DATE"], max_rows = 10, show_dimensions = True ) print(s) print() return df if __name__ == "__main__": print(f"Downloading {url} ...") df = getDf(fromWeb = True, verbose = True) columns = ["CAMIS", "DBA", "BORO", "CUISINE DESCRIPTION"] print(f"Got the following DataFrame (showing {len(columns)} of {len(df.columns)} columns).") print(df.to_string(columns = columns, max_rows = 10, show_dimensions = True)) sys.exit(0)
""" Call restaurantinspection.getDf and print the resulting DataFrame. """ import sys import pandas as pd import restaurantinspection try: df = restaurantinspection.getDf(fromWeb = False, verbose = True) except BaseException as error: print(error, file = sys.stderr) sys.exit(1) columns = [ "CAMIS", #Commercial Activities Management Information System "DBA", #Doing Business As "BORO", #Borough "CUISINE DESCRIPTION", "INSPECTION DATE" ] s = df.to_string( columns = columns, index = False, max_rows = 20, show_dimensions = True ) print(f"Showing {len(columns)} of the {len(df.columns)} columns:") print() print(s) sys.exit(0)
Changing BORO "0" to "Manhattan" on the 14 rows containing ZIPCODE 10168. Changing BORO "0" to "Manhattan" on the 5 rows containing ZIPCODE 10285. Changing BORO "0" to "Brooklyn" on the 81 rows containing ZIPCODE 11249. The remaining rows whose BORO is "0" are CAMIS DBA BORO BUILDING STREET ZIPCODE 308 50098157 NaN 0 NaN NaN NaN 36325 50098208 KONDITORI 0 NaN NaN NaN 112576 50098189 AUNT JAKE'S 0 NaN NaN NaN 130863 50098217 NaN 0 NaN NaN NaN 134039 50099742 AMBO 0 NaN NaN NaN 156097 50098161 NaN 0 NaN NaN NaN 184554 50098171 STARBUCKS COFFEE #57154 0 NaN NaN NaN 205375 50098206 BENJYS KOSHER PIZZA 0 NaN NaN NaN 218840 50098177 NaN 0 NaN NaN NaN 219844 50098202 NaN 0 NaN NaN NaN 232978 50097237 NaN 0 NaN NaN NaN 359325 50098200 DAILY PROVISIONS 0 NaN NaN NaN [12 rows x 6 columns] Deleting the 12 rows whose BORO is still "0". 472 of the 395,617 rows have bad phones, including CAMIS DBA PHONE INSPECTION DATE 274 41541587 SEATTLE COFFEE ROASTERS, PIER 79/ NY WATERWAY 917536391_ 11/30/2017 780 50063627 JOE & THE JUICE 917_566_57 07/25/2019 782 50085540 ILLY CAFFE 917407793_ 12/19/2018 1887 50073925 GRACE WOK 917_370_46 02/14/2019 2124 50056461 SAINT ANN'S CAFE RESTAURANT 917_688_16 12/05/2017 ... ... ... ... ... 392935 41637438 BOAT HOUSE & CAJUN CUISINE 41637438__ 10/12/2018 394366 50056461 SAINT ANN'S CAFE RESTAURANT 917_688_16 02/01/2019 395250 41541587 SEATTLE COFFEE ROASTERS, PIER 79/ NY WATERWAY 917536391_ 07/01/2016 395445 50056461 SAINT ANN'S CAFE RESTAURANT 917_688_16 08/09/2019 395493 50033071 GABY RESTAURANT __________ 07/27/2017 [472 rows x 4 columns] Showing 5 of the 26 columns: CAMIS DBA BORO CUISINE DESCRIPTION INSPECTION DATE 50088180 HAZAR TURKISH KEBAB Brooklyn Turkish 03/26/2019 41654607 BROOKLYN CRAB Brooklyn Seafood 09/14/2017 50058012 HARMONI Manhattan Spanish 09/25/2018 41175577 MARIO'S PIZZERIA Brooklyn Pizza 02/04/2019 50000060 ATLAS STEAKHOUSE Brooklyn Russian 10/23/2019 50063691 THAI TERMINAL Manhattan Thai 05/08/2018 41701903 SHINJUNG GALBI BBQ KOREAN RESTAURANT Queens Korean 02/08/2018 50057790 BAGELS AND SCHMEAR Manhattan Bagels/Pretzels 12/26/2018 50069955 KOTO Brooklyn Japanese 10/11/2017 41457400 VINTRY Manhattan American 11/26/2018 ... ... ... ... ... 40843676 PAZZA NOTTE Manhattan Italian 04/09/2018 41697654 HOT JALAPENO RESTAURANT Manhattan Mexican 10/30/2019 50076162 CHINA GARDEN & WING STOP Queens Chinese 09/23/2019 50016943 EL NUEVO ROBLE BILLIARDS Bronx Spanish 08/21/2017 40513217 B66 CLUB Brooklyn Russian 06/17/2016 41433964 WILD GINGER VEGETARIAN KITCHEN Manhattan Vegetarian 09/07/2016 50074107 ZAIQA RESTAURANT Brooklyn Indian 05/01/2019 50073262 BROADWAY PIZZA Brooklyn Pizza 07/25/2019 41335337 LOBSTER BOX Bronx Seafood 04/27/2019 41313395 FIVE NAPKIN BURGER Manhattan American 10/25/2017 [395617 rows x 5 columns]
""" Print the number of restaurant inspections in each borough. """ import sys import pandas as pd import restaurantinspection try: df = restaurantinspection.getDf(fromWeb = False, verbose = False) except BaseException as error: print(error, file = sys.stderr) sys.exit(1) print("Number of restaurant inspections in each borough:") print() seriesOfInts = df.groupby("BORO").size() print(seriesOfInts) print() print(seriesOfInts.to_string(dtype = False, header = False)) print() #Convert the seriesOfInts to a DataFrame so we can specify a formatter. dataFrameOfInts = pd.DataFrame(seriesOfInts) #DataFrame with only one column. s = dataFrameOfInts.to_string( formatters = [lambda i: f"{i:,}"], #Print each integer with commas. header = False, #The name of the one column of this DataFrame is 0. Don't print it. index_names = False #The name of the index of this DataFrame is BORO. Don't print it. ) print(s) sys.exit(0)
Number of restaurant inspections in each borough: BORO Bronx 35948 Brooklyn 100748 Manhattan 155391 Queens 90362 Staten Island 13168 dtype: int64 Bronx 35948 Brooklyn 100748 Manhattan 155391 Queens 90362 Staten Island 13168 Bronx 35,948 Brooklyn 100,748 Manhattan 155,391 Queens 90,362 Staten Island 13,168
""" Print the number of restaurants in each borough. """ import sys import pandas as pd import restaurantinspection df = restaurantinspection.getDf(fromWeb = False, verbose = False) #Index is CAMIS number of each restaurant in increasing order, value is restaurant's boro. seriesOfBoros = df.groupby("CAMIS")["BORO"].first() print("Number of restaurants in each borough") print() #Index is boros in alphabetical order, value is number of restaurants in boro. seriesOfNumbers = seriesOfBoros.groupby(seriesOfBoros).size() #seriesOfNumbers.sort_values(inplace = True) #increasing size order print(seriesOfNumbers) print() #Index is boros, value is number of restaurants in boro in decreasing order. seriesOfNumbers = seriesOfBoros.value_counts() #seriesOfNumbers.sort_index(inplace = True) #alphabetical order print(seriesOfNumbers) sys.exit(0)
Number of restaurants in each borough BORO Bronx 2395 Brooklyn 6775 Manhattan 10761 Queens 6140 Staten Island 973 Name: BORO, dtype: int64 Manhattan 10761 Brooklyn 6775 Queens 6140 Bronx 2395 Staten Island 973 Name: BORO, dtype: int64
""" Print the number of mouse violations in each month. """ import sys import pandas as pd import restaurantinspection #Add three columns to the DataFrame. df = restaurantinspection.getDf(fromWeb = False, verbose = False) df["timestamp"] = pd.to_datetime(df["INSPECTION DATE"], format = "%m/%d/%Y") df["monthname"] = df["timestamp"].apply(lambda timestamp: timestamp.month_name()) df["monthnumber"] = df["timestamp"].apply(lambda timestamp: timestamp.month) s = "Evidence of mice or live mice present in facility's food and/or non-food areas." mask = (df["VIOLATION DESCRIPTION"].str != "") & (df["VIOLATION DESCRIPTION"].str.contains(s)) dataFrameOfViolations = df[mask] print(f"{len(dataFrameOfViolations):,} of the {len(df):,} inspections revealed mouse violations.") print() print("Mouse violations per month:") print() #Indexed by month number and month name, in order of ascending month number. seriesOfNumbers = dataFrameOfViolations.groupby(["monthnumber", "monthname"]).size() print(seriesOfNumbers) print() #Drop the month numbers. They were there only to get the month names in chronological order. seriesOfNumbers.reset_index(level = 0, drop = True, inplace = True) print(seriesOfNumbers) sys.exit(0)
29,090 of the 395,617 inspections revealed mouse violations. Mouse violations per month: monthnumber monthname 1 January 2425 2 February 2380 3 March 2980 4 April 2626 5 May 2610 6 June 2453 7 July 2032 8 August 2303 9 September 2425 10 October 2820 11 November 2053 12 December 1983 dtype: int64 monthname January 2425 February 2380 March 2980 April 2626 May 2610 June 2453 July 2032 August 2303 September 2425 October 2820 November 2053 December 1983 dtype: int64
""" Print the number of restaurants of each cuisine. series is indexed by cuisines in alphabetical order. """ import sys import pandas as pd import restaurantinspection df = restaurantinspection.getDf(fromWeb = False, verbose = False) series = df.groupby("CUISINE DESCRIPTION").apply(lambda df: len(df["CAMIS"].unique())) #could also say #series = df.groupby("CUISINE DESCRIPTION").apply(lambda df: len(df.groupby("CAMIS"))) #series = df.groupby("CUISINE DESCRIPTION").apply(lambda df: len(df["CAMIS"].value_counts())) print("Number of restaurants:") print() s = series.to_string(max_rows = len(series)) print(s) print() s = series.sort_values(ascending = False).to_string(max_rows = len(series)) print(s) sys.exit(0)
Number of restaurants: CUISINE DESCRIPTION Afghan 14 African 74 American 5735 Armenian 25 Asian 402 Australian 21 Bagels/Pretzels 153 Bakery 748 Bangladeshi 43 Barbecue 54 Basque 1 Bottled beverages, including water, sodas, juices, etc. 109 Brazilian 29 Café/Coffee/Tea 1798 Cajun 6 Californian 4 Caribbean 706 Chicken 505 Chilean 1 Chinese 2380 Chinese/Cuban 21 Chinese/Japanese 44 Continental 48 Creole 22 Creole/Cajun 6 Czech 4 Delicatessen 298 Donuts 522 Eastern European 69 Egyptian 12 English 12 Ethiopian 17 Filipino 31 French 310 Fruits/Vegetables 6 German 28 Greek 138 Hamburgers 363 Hawaiian 48 Hotdogs 28 Hotdogs/Pretzels 31 Ice Cream, Gelato, Yogurt, Ices 346 Indian 320 Indonesian 9 Iranian 4 Irish 191 Italian 964 Japanese 868 Jewish/Kosher 335 Juice, Smoothies, Fruit Salads 423 Korean 308 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 828 Mediterranean 281 Mexican 963 Middle Eastern 192 Moroccan 9 Not Listed/Not Applicable 9 Nuts/Confectionary 6 Other 1739 Pakistani 30 Pancakes/Waffles 15 Peruvian 83 Pizza 1199 Pizza/Italian 476 Polish 25 Portuguese 11 Russian 68 Salads 78 Sandwiches 358 Sandwiches/Salads/Mixed Buffet 234 Scandinavian 8 Seafood 195 Soul Food 66 Soups 8 Soups & Sandwiches 43 Southwestern 3 Spanish 630 Steak 79 Tapas 41 Tex-Mex 126 Thai 307 Turkish 71 Vegetarian 130 Vietnamese/Cambodian/Malaysia 99 CUISINE DESCRIPTION American 5735 Chinese 2380 Café/Coffee/Tea 1798 Other 1739 Pizza 1199 Italian 964 Mexican 963 Japanese 868 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 828 Bakery 748 Caribbean 706 Spanish 630 Donuts 522 Chicken 505 Pizza/Italian 476 Juice, Smoothies, Fruit Salads 423 Asian 402 Hamburgers 363 Sandwiches 358 Ice Cream, Gelato, Yogurt, Ices 346 Jewish/Kosher 335 Indian 320 French 310 Korean 308 Thai 307 Delicatessen 298 Mediterranean 281 Sandwiches/Salads/Mixed Buffet 234 Seafood 195 Middle Eastern 192 Irish 191 Bagels/Pretzels 153 Greek 138 Vegetarian 130 Tex-Mex 126 Bottled beverages, including water, sodas, juices, etc. 109 Vietnamese/Cambodian/Malaysia 99 Peruvian 83 Steak 79 Salads 78 African 74 Turkish 71 Eastern European 69 Russian 68 Soul Food 66 Barbecue 54 Continental 48 Hawaiian 48 Chinese/Japanese 44 Soups & Sandwiches 43 Bangladeshi 43 Tapas 41 Filipino 31 Hotdogs/Pretzels 31 Pakistani 30 Brazilian 29 German 28 Hotdogs 28 Polish 25 Armenian 25 Creole 22 Australian 21 Chinese/Cuban 21 Ethiopian 17 Pancakes/Waffles 15 Afghan 14 English 12 Egyptian 12 Portuguese 11 Not Listed/Not Applicable 9 Indonesian 9 Moroccan 9 Soups 8 Scandinavian 8 Creole/Cajun 6 Cajun 6 Fruits/Vegetables 6 Nuts/Confectionary 6 Czech 4 Californian 4 Iranian 4 Southwestern 3 Basque 1 Chilean 1
""" Print the number of restaurants of each cuisine. series is indexed by cuisines in alphabetical order. """ import sys import pandas as pd import restaurantinspection df = restaurantinspection.getDf(fromWeb = False, verbose = False) df = (df .groupby("CAMIS") .first() .groupby(["CUISINE DESCRIPTION", "BORO"])["BORO"] .count() .unstack() .fillna(0) .astype(int) ) pd.set_option("display.width", None) #prevent column from beoing elided pd.set_option("display.max_colwidth", 64) #prevent index from being truncated s = df.to_string(max_rows = len(df), show_dimensions = True) print(s) sys.exit(0)
BORO Bronx Brooklyn Manhattan Queens Staten Island CUISINE DESCRIPTION Afghan 0 2 4 8 0 African 26 23 20 4 1 American 404 1253 2882 977 219 Armenian 1 7 13 4 0 Asian 6 95 171 115 15 Australian 0 4 16 1 0 Bagels/Pretzels 7 37 60 39 10 Bakery 75 188 240 223 22 Bangladeshi 8 9 6 20 0 Barbecue 3 16 24 6 5 Basque 0 0 1 0 0 Bottled beverages, including water, sodas, juices, etc. 9 23 38 35 4 Brazilian 0 5 13 11 0 Café/Coffee/Tea 62 455 978 275 28 Cajun 0 4 0 1 1 Californian 0 1 3 0 0 Caribbean 119 339 38 205 5 Chicken 113 156 111 108 17 Chilean 0 0 0 1 0 Chinese 297 726 507 774 76 Chinese/Cuban 3 3 11 4 0 Chinese/Japanese 0 13 19 10 2 Continental 0 10 26 11 1 Creole 0 15 2 5 0 Creole/Cajun 0 4 0 2 0 Czech 1 1 0 2 0 Delicatessen 27 52 112 94 13 Donuts 66 122 143 156 35 Eastern European 3 33 17 13 3 Egyptian 0 4 5 3 0 English 0 1 10 1 0 Ethiopian 0 4 13 0 0 Filipino 0 1 8 20 2 French 2 51 242 15 0 Fruits/Vegetables 0 1 3 1 1 German 0 6 12 7 3 Greek 6 18 56 52 6 Hamburgers 63 86 90 102 22 Hawaiian 1 8 33 6 0 Hotdogs 1 7 18 2 0 Hotdogs/Pretzels 3 3 17 5 3 Ice Cream, Gelato, Yogurt, Ices 23 102 115 74 32 Indian 7 60 138 106 9 Indonesian 0 2 3 4 0 Iranian 0 0 2 2 0 Irish 10 12 115 52 2 Italian 42 169 570 113 70 Japanese 16 187 471 161 33 Jewish/Kosher 8 186 76 63 2 Juice, Smoothies, Fruit Salads 42 108 174 88 11 Korean 0 26 109 173 0 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 175 139 188 315 11 Mediterranean 5 68 149 50 9 Mexican 123 288 326 186 40 Middle Eastern 4 86 58 38 6 Moroccan 0 0 8 1 0 Not Listed/Not Applicable 1 1 4 2 1 Nuts/Confectionary 0 4 2 0 0 Other 104 449 760 371 55 Pakistani 1 13 7 9 0 Pancakes/Waffles 2 3 3 5 2 Peruvian 3 13 15 49 3 Pizza 203 307 355 277 57 Pizza/Italian 59 130 127 105 55 Polish 0 17 2 3 3 Portuguese 0 2 4 5 0 Russian 0 51 8 6 3 Salads 0 8 65 5 0 Sandwiches 45 63 155 72 23 Sandwiches/Salads/Mixed Buffet 25 32 113 57 7 Scandinavian 0 3 5 0 0 Seafood 26 43 93 29 4 Soul Food 10 26 14 14 2 Soups 0 1 7 0 0 Soups & Sandwiches 2 7 30 3 1 Southwestern 0 0 1 2 0 Spanish 131 143 124 211 21 Steak 5 8 54 11 1 Tapas 0 10 22 8 1 Tex-Mex 10 32 50 28 6 Thai 5 80 139 80 3 Turkish 0 30 28 10 3 Vegetarian 1 47 69 13 0 Vietnamese/Cambodian/Malaysia 1 33 41 21 3 [84 rows x 5 columns]
""" For each cuisine, print the fraction of inspections that report mice. """ import sys import pandas as pd import restaurantinspection df = restaurantinspection.getDf(fromWeb = False, verbose = False) s = "Evidence of mice or live mice present in facility's food and/or non-food areas." #Return the fraction of inspections that report mice. def f(df): seriesOfBools = df["VIOLATION DESCRIPTION"].str.contains(s).astype(bool) return sum(seriesOfBools) / len(df) series = df.groupby("CUISINE DESCRIPTION").apply(f) #alphabetical order print(series.to_string( dtype = True, length = True, max_rows = len(series), name = True )) print() series.sort_values(inplace = True) #increasing numeric order print(series.to_string( dtype = True, length = True, max_rows = len(series), name = True )) sys.exit(0)
CUISINE DESCRIPTION Afghan 0.120773 African 0.118414 American 0.085564 Armenian 0.070000 Asian 0.086534 Australian 0.097744 Bagels/Pretzels 0.098411 Bakery 0.106604 Bangladeshi 0.090909 Barbecue 0.067183 Basque 0.000000 Bottled beverages, including water, sodas, juices, etc. 0.145559 Brazilian 0.066955 Café/Coffee/Tea 0.081409 Cajun 0.058140 Californian 0.055556 Caribbean 0.128698 Chicken 0.095354 Chilean 0.172414 Chinese 0.104725 Chinese/Cuban 0.088993 Chinese/Japanese 0.081049 Continental 0.057725 Creole 0.136531 Creole/Cajun 0.107143 Czech 0.146341 Delicatessen 0.089939 Donuts 0.054654 Eastern European 0.122524 Egyptian 0.136126 English 0.075117 Ethiopian 0.105485 Filipino 0.128319 French 0.091407 Fruits/Vegetables 0.038462 German 0.085586 Greek 0.079593 Hamburgers 0.060984 Hawaiian 0.084746 Hotdogs 0.109290 Hotdogs/Pretzels 0.121951 Ice Cream, Gelato, Yogurt, Ices 0.122258 Indian 0.094237 Indonesian 0.089286 Iranian 0.193548 Irish 0.102481 Italian 0.089796 Japanese 0.095803 Jewish/Kosher 0.100401 Juice, Smoothies, Fruit Salads 0.079560 Korean 0.074265 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 0.090856 Mediterranean 0.097950 Mexican 0.083830 Middle Eastern 0.109768 Moroccan 0.077844 Not Listed/Not Applicable 0.028571 Nuts/Confectionary 0.088235 Other 0.603555 Pakistani 0.121951 Pancakes/Waffles 0.116592 Peruvian 0.100948 Pizza 0.105466 Pizza/Italian 0.100672 Polish 0.137615 Portuguese 0.101796 Russian 0.106420 Salads 0.049844 Sandwiches 0.083353 Sandwiches/Salads/Mixed Buffet 0.076412 Scandinavian 0.142857 Seafood 0.079058 Soul Food 0.112989 Soups 0.125000 Soups & Sandwiches 0.066547 Southwestern 0.025316 Spanish 0.090044 Steak 0.084507 Tapas 0.132039 Tex-Mex 0.105292 Thai 0.091229 Turkish 0.090105 Vegetarian 0.105071 Vietnamese/Cambodian/Malaysia 0.095848 Length: 84, dtype: float64 CUISINE DESCRIPTION Basque 0.000000 Southwestern 0.025316 Not Listed/Not Applicable 0.028571 Fruits/Vegetables 0.038462 Salads 0.049844 Donuts 0.054654 Californian 0.055556 Continental 0.057725 Cajun 0.058140 Hamburgers 0.060984 Soups & Sandwiches 0.066547 Brazilian 0.066955 Barbecue 0.067183 Armenian 0.070000 Korean 0.074265 English 0.075117 Sandwiches/Salads/Mixed Buffet 0.076412 Moroccan 0.077844 Seafood 0.079058 Juice, Smoothies, Fruit Salads 0.079560 Greek 0.079593 Chinese/Japanese 0.081049 Café/Coffee/Tea 0.081409 Sandwiches 0.083353 Mexican 0.083830 Steak 0.084507 Hawaiian 0.084746 American 0.085564 German 0.085586 Asian 0.086534 Nuts/Confectionary 0.088235 Chinese/Cuban 0.088993 Indonesian 0.089286 Italian 0.089796 Delicatessen 0.089939 Spanish 0.090044 Turkish 0.090105 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 0.090856 Bangladeshi 0.090909 Thai 0.091229 French 0.091407 Indian 0.094237 Chicken 0.095354 Japanese 0.095803 Vietnamese/Cambodian/Malaysia 0.095848 Australian 0.097744 Mediterranean 0.097950 Bagels/Pretzels 0.098411 Jewish/Kosher 0.100401 Pizza/Italian 0.100672 Peruvian 0.100948 Portuguese 0.101796 Irish 0.102481 Chinese 0.104725 Vegetarian 0.105071 Tex-Mex 0.105292 Pizza 0.105466 Ethiopian 0.105485 Russian 0.106420 Bakery 0.106604 Creole/Cajun 0.107143 Hotdogs 0.109290 Middle Eastern 0.109768 Soul Food 0.112989 Pancakes/Waffles 0.116592 African 0.118414 Afghan 0.120773 Pakistani 0.121951 Hotdogs/Pretzels 0.121951 Ice Cream, Gelato, Yogurt, Ices 0.122258 Eastern European 0.122524 Soups 0.125000 Filipino 0.128319 Caribbean 0.128698 Tapas 0.132039 Egyptian 0.136126 Creole 0.136531 Polish 0.137615 Scandinavian 0.142857 Bottled beverages, including water, sodas, juices, etc. 0.145559 Czech 0.146341 Chilean 0.172414 Iranian 0.193548 Other 0.603555 Length: 84, dtype: float64
""" For each cuisine, print the fraction of restaurants that have had mice at one time or another. """ import sys import pandas as pd import restaurantinspection df = restaurantinspection.getDf(fromWeb = False, verbose = False) s = "Evidence of mice or live mice present in facility's food and/or non-food areas." #Called once for each CAMIS. #df is a DataFrame of all the inspections belonging to the CAMIS. #Return True if any inspection revealed mice. def hadMice(df): return df["VIOLATION DESCRIPTION"].str.contains(s).any() #Called once for each cusine. #df is a DataFrame of all the inspections belonging to the cusine. def get3Numbers(df): seriesOfBools = df.groupby("CAMIS").apply(hadMice) nr = len(seriesOfBools) #number of restaurants in this cuisine nm = seriesOfBools.sum() #number of restaurants in this cuisine with mice return pd.Series([nr, nm, nm / nr]) dataFrameOfNumbers = df.groupby("CUISINE DESCRIPTION").apply(get3Numbers) #alphabetical order dataFrameOfNumbers.columns = [ "n", #number of restaurants in this cuisine "mice", #number of restaurants in this cuisine with mice "fraction" ] dataFrameOfNumbers["n"] = dataFrameOfNumbers["n"].astype(int) dataFrameOfNumbers["mice"] = dataFrameOfNumbers["mice"].astype(int) print("Fraction of restaurants that have had mice at one time or another:") print() pd.options.display.max_colwidth = 64 st = dataFrameOfNumbers.to_string(max_rows = len(dataFrameOfNumbers)) print(st) print() dataFrameOfNumbers.sort_values(by = "fraction", inplace = True) st = dataFrameOfNumbers.to_string(max_rows = len(dataFrameOfNumbers)) print(st) sys.exit(0)
Fraction of restaurants that have had mice at one time or another: n mice fraction CUISINE DESCRIPTION Afghan 14 10 0.714286 African 74 55 0.743243 American 5735 2395 0.417611 Armenian 25 7 0.280000 Asian 402 185 0.460199 Australian 21 13 0.619048 Bagels/Pretzels 153 84 0.549020 Bakery 748 392 0.524064 Bangladeshi 43 27 0.627907 Barbecue 54 15 0.277778 Basque 1 0 0.000000 Bottled beverages, including water, sodas, juices, etc. 109 45 0.412844 Brazilian 29 13 0.448276 Café/Coffee/Tea 1798 552 0.307008 Cajun 6 4 0.666667 Californian 4 2 0.500000 Caribbean 706 473 0.669972 Chicken 505 236 0.467327 Chilean 1 1 1.000000 Chinese 2380 1356 0.569748 Chinese/Cuban 21 13 0.619048 Chinese/Japanese 44 23 0.522727 Continental 48 13 0.270833 Creole 22 15 0.681818 Creole/Cajun 6 4 0.666667 Czech 4 4 1.000000 Delicatessen 298 173 0.580537 Donuts 522 129 0.247126 Eastern European 69 47 0.681159 Egyptian 12 5 0.416667 English 12 5 0.416667 Ethiopian 17 9 0.529412 Filipino 31 22 0.709677 French 310 142 0.458065 Fruits/Vegetables 6 0 0.000000 German 28 14 0.500000 Greek 138 67 0.485507 Hamburgers 363 109 0.300275 Hawaiian 48 12 0.250000 Hotdogs 28 6 0.214286 Hotdogs/Pretzels 31 8 0.258065 Ice Cream, Gelato, Yogurt, Ices 346 128 0.369942 Indian 320 204 0.637500 Indonesian 9 7 0.777778 Iranian 4 3 0.750000 Irish 191 114 0.596859 Italian 964 490 0.508299 Japanese 868 417 0.480415 Jewish/Kosher 335 189 0.564179 Juice, Smoothies, Fruit Salads 423 134 0.316785 Korean 308 129 0.418831 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 828 476 0.574879 Mediterranean 281 131 0.466192 Mexican 963 463 0.480789 Middle Eastern 192 101 0.526042 Moroccan 9 4 0.444444 Not Listed/Not Applicable 9 2 0.222222 Nuts/Confectionary 6 1 0.166667 Other 1739 43 0.024727 Pakistani 30 18 0.600000 Pancakes/Waffles 15 10 0.666667 Peruvian 83 54 0.650602 Pizza 1199 632 0.527106 Pizza/Italian 476 269 0.565126 Polish 25 15 0.600000 Portuguese 11 6 0.545455 Russian 68 37 0.544118 Salads 78 22 0.282051 Sandwiches 358 120 0.335196 Sandwiches/Salads/Mixed Buffet 234 77 0.329060 Scandinavian 8 5 0.625000 Seafood 195 85 0.435897 Soul Food 66 40 0.606061 Soups 8 2 0.250000 Soups & Sandwiches 43 12 0.279070 Southwestern 3 1 0.333333 Spanish 630 349 0.553968 Steak 79 39 0.493671 Tapas 41 18 0.439024 Tex-Mex 126 57 0.452381 Thai 307 167 0.543974 Turkish 71 35 0.492958 Vegetarian 130 70 0.538462 Vietnamese/Cambodian/Malaysia 99 55 0.555556 n mice fraction CUISINE DESCRIPTION Basque 1 0 0.000000 Fruits/Vegetables 6 0 0.000000 Other 1739 43 0.024727 Nuts/Confectionary 6 1 0.166667 Hotdogs 28 6 0.214286 Not Listed/Not Applicable 9 2 0.222222 Donuts 522 129 0.247126 Hawaiian 48 12 0.250000 Soups 8 2 0.250000 Hotdogs/Pretzels 31 8 0.258065 Continental 48 13 0.270833 Barbecue 54 15 0.277778 Soups & Sandwiches 43 12 0.279070 Armenian 25 7 0.280000 Salads 78 22 0.282051 Hamburgers 363 109 0.300275 Café/Coffee/Tea 1798 552 0.307008 Juice, Smoothies, Fruit Salads 423 134 0.316785 Sandwiches/Salads/Mixed Buffet 234 77 0.329060 Southwestern 3 1 0.333333 Sandwiches 358 120 0.335196 Ice Cream, Gelato, Yogurt, Ices 346 128 0.369942 Bottled beverages, including water, sodas, juices, etc. 109 45 0.412844 Egyptian 12 5 0.416667 English 12 5 0.416667 American 5735 2395 0.417611 Korean 308 129 0.418831 Seafood 195 85 0.435897 Tapas 41 18 0.439024 Moroccan 9 4 0.444444 Brazilian 29 13 0.448276 Tex-Mex 126 57 0.452381 French 310 142 0.458065 Asian 402 185 0.460199 Mediterranean 281 131 0.466192 Chicken 505 236 0.467327 Japanese 868 417 0.480415 Mexican 963 463 0.480789 Greek 138 67 0.485507 Turkish 71 35 0.492958 Steak 79 39 0.493671 German 28 14 0.500000 Californian 4 2 0.500000 Italian 964 490 0.508299 Chinese/Japanese 44 23 0.522727 Bakery 748 392 0.524064 Middle Eastern 192 101 0.526042 Pizza 1199 632 0.527106 Ethiopian 17 9 0.529412 Vegetarian 130 70 0.538462 Thai 307 167 0.543974 Russian 68 37 0.544118 Portuguese 11 6 0.545455 Bagels/Pretzels 153 84 0.549020 Spanish 630 349 0.553968 Vietnamese/Cambodian/Malaysia 99 55 0.555556 Jewish/Kosher 335 189 0.564179 Pizza/Italian 476 269 0.565126 Chinese 2380 1356 0.569748 Latin (Cuban, Dominican, Puerto Rican, South & Central American) 828 476 0.574879 Delicatessen 298 173 0.580537 Irish 191 114 0.596859 Polish 25 15 0.600000 Pakistani 30 18 0.600000 Soul Food 66 40 0.606061 Australian 21 13 0.619048 Chinese/Cuban 21 13 0.619048 Scandinavian 8 5 0.625000 Bangladeshi 43 27 0.627907 Indian 320 204 0.637500 Peruvian 83 54 0.650602 Cajun 6 4 0.666667 Pancakes/Waffles 15 10 0.666667 Creole/Cajun 6 4 0.666667 Caribbean 706 473 0.669972 Eastern European 69 47 0.681159 Creole 22 15 0.681818 Filipino 31 22 0.709677 Afghan 14 10 0.714286 African 74 55 0.743243 Iranian 4 3 0.750000 Indonesian 9 7 0.777778 Chilean 1 1 1.000000 Czech 4 4 1.000000