Point your browser at the New York City
datasets page.
Click on the magnifying glass with the plus along the left edge.
In the New York City Restaurant Inspection Results window that pops up,
click
here
to view the dataset.
In the upper right corner,
click on the blue Export button and select Download as CSV
(comma-separated
values).
Move the resulting file
DOHMH_New_York_City_Restaurant_Inspection_Results.csv
from your Downloads folder to your Desktop.
Two restaurants might have the same name (DBA, “Doing Business As”), but each restaurant has its own CAMIS number (“Commercial Activities Management Information System”). Each restaurant has one record for each of its inspections. The building column is text because some values have characters that are not digits.
The
.import
command is confused by the commas in the
cuisine_description
and
violation_description
fields, even though these fields have
"
double quotes"
around them when they contain commas.
To avoid the problem,
perl
changes each of these commas into a
|
,
and
sqlite3
changes them back to commas.
The
||
operator is
concatenation.
I ran the following commands in the Terminal application on my Mac OS X 10.10.4. The downloaded CSV file (minus lowercase L for “long”):
ls -l DOHMH_New_York_City_Restaurant_Inspection_Results.csv -rw-r--r-- 1 myname mygroup 188445195 Aug 15 21:27 DOHMH_New_York_City_Restaurant_Inspection_Results.csv
Number of lines in the file (minus lowercase L for “lines”):
wc -l DOHMH_New_York_City_Restaurant_Inspection_Results.csv 493373 DOHMH_New_York_City_Restaurant_Inspection_Results.csv
First three lines in the file:
head -3 DOHMH_New_York_City_Restaurant_Inspection_Results.csv CAMIS,DBA,BORO,BUILDING,STREET,ZIPCODE,PHONE,CUISINE DESCRIPTION,INSPECTION DATE,ACTION,VIOLATION CODE,VIOLATION DESCRIPTION,CRITICAL FLAG,SCORE,GRADE,GRADE DATE,RECORD DATE,INSPECTION TYPE 30075445,MORRIS PARK BAKE SHOP,BRONX,1007,MORRIS PARK AVE,10462,7188924968,Bakery,02/09/2015,Violations were cited in the following area(s).,06C,"Food not protected from potential source of contamination during storage, preparation, transportation, display or service.",Critical,6,A,02/09/2015,06/23/2015,Cycle Inspection / Initial Inspection 30075445,MORRIS PARK BAKE SHOP,BRONX,1007,MORRIS PARK AVE,10462,7188924968,Bakery,03/03/2014,Violations were cited in the following area(s).,10F,"Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed, raised, spaced or movable to allow accessibility for cleaning on all sides, above and underneath the unit.",Not Critical,2,A,03/03/2014,06/23/2015,Cycle Inspection / Initial Inspection
The file
temp.csv
that will be created by
perl
has no title line,
has an extra column on the left,
and has a
|
in place of each comma that is not a field delimiter.
ls -l temp.csv -rw-r--r-- 1 myname mygroup 192100190 Jun 24 14:56 temp.csv wc -l temp.csv 492560 temp.csv head -3 temp.csv 1,30075445,MORRIS PARK BAKE SHOP,BRONX,1007,MORRIS PARK AVE,10462,7188924968,Bakery,02/09/2015,Violations were cited in the following area(s).,06C,"Food not protected from potential source of contamination during storage| preparation| transportation| display or service.",Critical,6,A,02/09/2015,06/23/2015,Cycle Inspection / Initial Inspection 2,30075445,MORRIS PARK BAKE SHOP,BRONX,1007,MORRIS PARK AVE,10462,7188924968,Bakery,03/03/2014,Violations were cited in the following area(s).,10F,"Non-food contact surface improperly constructed. Unacceptable material used. Non-food contact surface or equipment improperly maintained and/or not properly sealed| raised| spaced or movable to allow accessibility for cleaning on all sides| above and underneath the unit.",Not Critical,2,A,03/03/2014,06/23/2015,Cycle Inspection / Initial Inspection 3,30075445,MORRIS PARK BAKE SHOP,BRONX,1007,MORRIS PARK AVE,10462,7188924968,Bakery,10/10/2013,No violations were recorded at the time of this inspection.,,,Not Applicable,,,,06/23/2015,Trans Fat / Second Compliance Inspection
We installed a version of
sqlite3
when we installed Android Studio.
which sqlite3 /Users/myname/Library/Android/sdk/platform-tools/sqlite3 sqlite3 -version 3.8.6 2014-08-15 11:46:33 9491ba7d738528f168657adb43a198238abde19e whereis sqlite3 /usr/bin/sqlite3
I ran the following
bash
(Bourne Again) shellscript,
named
makedb
,
to create a
.db
file containing a table named
latest_table
that contains only the most recent inspection for each restaurant.
It creates the files
temp.csv
and
restaurants.db
.
Then it removes
temp.csv
.
#!/bin/bash cd /Users/myname/restaurant curl \ 'https://data.cityofnewyork.us/api/views/xx67-kt59/rows.csv?accessType=DOWNLOAD' \ > DOHMH_New_York_City_Restaurant_Inspection_Results.csv #List the file. -l stands for "long". ls -l DOHMH_New_York_City_Restaurant_Inspection_Results.csv #Print the number of lines in the file. -l stands for "lines". wc -l DOHMH_New_York_City_Restaurant_Inspection_Results.csv #Create the file temp.csv, which is a copy of #DOHMH_New_York_City_Restaurant_Inspection_Results.csv #with three changes: # 1. Remove the first line. # 2. Add a new column of numbers on the left: 1, 2, 3, etc. # 3. In any field surrounded by "double quotes", change each comma to | # because the .import statement would be confused by the commas. perl -ne ' next if $. == 1; print $. - 1, ","; chomp; while ($_ ne "") { #If the line begins with a double-quoted string #that contains no double quotes, chop it off and store it in $&. if (s/^"[^"]*"//) { $field = $&; $field =~ tr/,/|/; } else { s/^[^,]*//; $field = $&; } print $field; #If the line begins with a comma, #chop it off and store it in $&. if (s/^,//) { print $&; } } print "\n"; ' DOHMH_New_York_City_Restaurant_Inspection_Results.csv > temp.csv ls -l temp.csv wc -l temp.csv #Remove the old restaurants.db file (if it exists) and create a new one. rm -f restaurants.db sqlite3 restaurants.db << EOF create table android_metadata ( "locale" text default "en_US" ); insert into android_metadata (locale) values ("en_US"); create table restaurant_table ( _id integer primary key autoincrement, camis integer, dba text, boro text, building text, street text, zipcode integer, phone text, cuisine_description text, inspection_date text, action text, violation_code text, violation_description text, critical_flag text, score integer, grade text, grade_date text, record_date text, inspection_type text ); .mode csv .import temp.csv restaurant_table select "Table restaurant_table contains " || count(*) || " records." from restaurant_table; --Change the vertical bars back to commas now that the .import has been done. update restaurant_table set violation_description = replace(violation_description, "|", ","); --Change the format of each date from 12/31/2015 to 2015-12-31. --Now an alphabetical comparison of two dates will tell us which one is earlier. update restaurant_table set inspection_date = substr(inspection_date, 7, 4) || "-" || substr(inspection_date, 1, 2) || "-" || substr(inspection_date, 4, 2) where length(inspection_date) = 10; update restaurant_table set grade_date = substr(grade_date, 7, 4) || "-" || substr(grade_date, 1, 2) || "-" || substr(grade_date, 4, 2) where length(grade_date) = 10; update restaurant_table set record_date = substr(record_date, 7, 4) || "-" || substr(record_date, 1, 2) || "-" || substr(record_date, 4, 2) where length(record_date) = 10; --Create a new table containing only the most recent inspection record --for each restaurant. create table latest_table ( _id integer primary key autoincrement, camis integer, dba text, boro text, building text, street text, zipcode integer, phone text, cuisine_description text, latest_inspection_date text, action text, violation_code text, violation_description text, critical_flag text, score integer, grade text, grade_date text, record_date text, inspection_type text ); insert into latest_table ( _id, camis, dba, boro, building, street, zipcode, phone, cuisine_description, latest_inspection_date, action, violation_code, violation_description, critical_flag, score, grade, grade_date, record_date, inspection_type ) select * from (select _id, camis, dba, boro, building, street, zipcode, phone, cuisine_description, max(inspection_date) as latest_inspection_date, action, violation_code, violation_description, critical_flag, score, grade, grade_date, record_date, inspection_type from restaurant_table group by camis); select "Table latest_table contains " || count(*) || " records." from latest_table; drop table restaurant_table; vacuum; --The following select statement prints too many fields to fit on one line, --so we'll print each field on a separate line. .mode line --dba = "WO HOP 17", 17 Mott Street, Chinatown select * from latest_table where camis = 41320866; .quit EOF rm temp.csv ls -l restaurants.db exit 0
The output of the shellscript is
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 179M 100 179M 0 0 603k 0 0:05:04 0:05:04 --:--:-- 2068k -rw-r--r-- 1 myname mygroup 188218168 Aug 14 12:58 DOHMH_New_York_City_Restaurant_Inspection_Results.csv 492715 DOHMH_New_York_City_Restaurant_Inspection_Results.csv -rw-r--r-- 1 myname mygroup 191555871 Aug 14 12:58 temp.csv 492714 temp.csv "Table restaurant_table contains 492714 records." "Table latest_table contains 25356 records." _id = 192839 camis = 41320866 dba = WO HOP 17 boro = MANHATTAN building = 17 street = MOTT STREET zipcode = 10013 phone = 2122672536 cuisine_description = Chinese latest_inspection_date = 2014-09-04 action = Violations were cited in the following area(s). violation_code = 04L violation_description = Evidence of mice or live mice present in facility's food and/or non-food areas. critical_flag = Critical score = 10 grade = A grade_date = 2014-09-04 record_date = 2015-08-13 inspection_type = Cycle Inspection / Initial Inspection -rw------- 1 myname mygroup 11620352 Aug 14 12:59 restaurants.db