SQL is the Structured Query Language for reading and writing database files.
The Python version of SQL is
SQLite,
convenient to use
because you don’t have to set up a separate server somewhere else.
Before we use SQLite in Python,
however,
we will play with it in by running the program
sqlite3
in the macOS Terminal window,
the Linux shell window,
and the Microsoft Command Prompt window.
You probably already have
sqlite3
on your Mac.
which sqlite3 /usr/bin/sqlite3 sqlite3 -help sqlite3 -version 3.16.0 2016-11-04 19:09:39 0e5ffd9123d6d2d2b8f3701e8a73cc98a3a7ff5f
To get the latest version of SQLite, go
here
and download the
.zip
file for
“Precompiled Binaries for Mac OS X (x86)”.
Double-click on the downloaded
.zip
file to create a new folder with the same name as the
.zip
file but without the
.zip
suffix.
This folder contains the three programs
sqldiff
,
sqlite3
,
and
sqlite_analyzer
.
Move
these three programs to your
/usr/local/bin
directory.
cd ~/Downloads/sqlite-tools-osx-x86-3200000 pwd /Users/myname/Downloads/sqlite-tools-osx-x86-3200000 mv sql* /usr/local/bin cd /usr/local/bin pwd /usr/local/bin ls -l sql* -rwxr-xr-x@ 1 myname mygroup 612864 Aug 1 21:02 sqldiff -rwxr-xr-x@ 1 myname mygroup 974212 Aug 1 21:06 sqlite3 -rwxr-xr-x@ 1 myname mygroup 660080 Aug 1 21:04 sqlite3_analyzer
See if you now have the current version of
sqlite3
:
which sqlite3 /usr/local/bin/sqlite3 sqlite3 -version 3.20.0 2017-08-01 13:24:15 9501e22dfeebdcefa783575e47c60b514d7c2e0cad73b2a496c0bc4b680900a8 sqlite3 -help Usage: /usr/local/bin/sqlite3 [OPTIONS] FILENAME [SQL] FILENAME is the name of an SQLite database. A new database is created if the file does not previously exist. OPTIONS include: -ascii set output mode to 'ascii' -bail stop after hitting an error -batch force batch I/O -column set output mode to 'column' -cmd COMMAND run "COMMAND" before reading stdin -csv set output mode to 'csv' -echo print commands before execution -init FILENAME read/process named file -[no]header turn headers on or off -help show this message -html set output mode to HTML -interactive force interactive I/O -line set output mode to 'line' -list set output mode to 'list' -lookaside SIZE N use N entries of SZ bytes for lookaside memory -mmap N default mmap size set to N -newline SEP set output row separator. Default: '\n' -nullvalue TEXT set text string for NULL values. Default '' -pagecache SIZE N use N slots of SZ bytes each for page cache memory -quote set output mode to 'quote' -scratch SIZE N use N slots of SZ bytes each for scratch memory -separator SEP set output column separator. Default: '|' -stats print memory stats before each finalize -version show SQLite version -vfs NAME use NAME as the default VFS
Let’s create a file named
zipcodes.db
in your
/Users/myname/python
directory.
cd ~/python pwd /Users/myname/python sqlite3 zipcodes.db SQLite version 3.20.0 2017-08-01 13:24:15 Enter ".help" for usage hints. sqlite> .help .auth ON|OFF Show authorizer callbacks .backup ?DB? FILE Backup DB (default "main") to FILE .bail on|off Stop after hitting an error. Default OFF .binary on|off Turn binary output on or off. Default OFF .cd DIRECTORY Change the working directory to DIRECTORY .changes on|off Show number of rows changed by SQL .check GLOB Fail if output since .testcase does not match .clone NEWDB Clone data into NEWDB from the existing database .databases List names and files of attached databases .dbinfo ?DB? Show status information about the database .dump ?TABLE? ... Dump the database in an SQL text format If TABLE specified, only dump tables matching LIKE pattern TABLE. .echo on|off Turn command echo on or off .eqp on|off|full Enable or disable automatic EXPLAIN QUERY PLAN .exit Exit this program .fullschema ?--indent? Show schema and the content of sqlite_stat tables .headers on|off Turn display of headers on or off .help Show this message .import FILE TABLE Import data from FILE into TABLE .imposter INDEX TABLE Create imposter table TABLE on index INDEX .indexes ?TABLE? Show names of all indexes If TABLE specified, only show indexes for tables matching LIKE pattern TABLE. .limit ?LIMIT? ?VAL? Display or change the value of an SQLITE_LIMIT .lint OPTIONS Report potential schema issues. Options: fkey-indexes Find missing foreign key indexes .load FILE ?ENTRY? Load an extension library .log FILE|off Turn logging on or off. FILE can be stderr/stdout .mode MODE ?TABLE? Set output mode where MODE is one of: ascii Columns/rows delimited by 0x1F and 0x1E csv Comma-separated values column Left-aligned columns. (See .width) html HTML <table> code insert SQL insert statements for TABLE line One value per line list Values delimited by "|" quote Escape answers as for SQL tabs Tab-separated values tcl TCL list elements .nullvalue STRING Use STRING in place of NULL values .once FILENAME Output for the next SQL command only to FILENAME .open ?OPTIONS? ?FILE? Close existing database and reopen FILE The --new option starts with an empty file .output ?FILENAME? Send output to FILENAME or stdout .print STRING... Print literal STRING .prompt MAIN CONTINUE Replace the standard prompts .quit Exit this program .read FILENAME Execute SQL in FILENAME .restore ?DB? FILE Restore content of DB (default "main") from FILE .save FILE Write in-memory database into FILE .scanstats on|off Turn sqlite3_stmt_scanstatus() metrics on or off .schema ?PATTERN? Show the CREATE statements matching PATTERN Add --indent for pretty-printing .selftest ?--init? Run tests defined in the SELFTEST table .separator COL ?ROW? Change the column separator and optionally the row separator for both the output mode and .import .sha3sum ?OPTIONS...? Compute a SHA3 hash of database content .shell CMD ARGS... Run CMD ARGS... in a system shell .show Show the current values for various settings .stats ?on|off? Show stats or turn stats on or off .system CMD ARGS... Run CMD ARGS... in a system shell .tables ?TABLE? List names of tables If TABLE specified, only list tables matching LIKE pattern TABLE. .testcase NAME Begin redirecting output to 'testcase-out.txt' .timeout MS Try opening locked tables for MS milliseconds .timer on|off Turn SQL timer on or off .trace FILE|off Output each SQL statement as it is run .vfsinfo ?AUX? Information about the top-level VFS .vfslist List all available VFSes .vfsname ?AUX? Print the name of the VFS stack .width NUM1 NUM2 ... Set column widths for "column" mode Negative values right-justify sqlite> .databases main: /Users/myname/python/zipcodes.db
A
table
is a rectangle made of rows and columns.
The columns are things like lastname, firstname, social security number.
The
datatype
of each column could be
text
,
integer
(a number without a decimal point or fraction),
real
(a number with a decimal point and fraction),
etc.
text
might be called
varchar
in other versions of SQL.
The rows are the records (Moe, Larry, Curly, and Shemp).
The name of our table will be
zipcodeTable
.
Each of the commands
drop
table
and
create
table
is two separate words with a space between them.
Don’t forget the semicolon at the end of every SQL statement.
(The statements that began with a dot are not SQL statements,
and do not need a semicolon.
They are the “dot commands” listed under
Special
commands to sqlite3 (dot-commands).)
The
sqlite3
continuation prompt is
…>
.
Try control-p to go back to the previous line.
sqlite> drop table if exists zipcodeTable; sqlite> .tables sqlite> create table zipcodeTable ( zipcode integer primary key, city text, state text, latitude real, longitude real, county text ); sqlite> .tables zipcodeTable sqlite> .schema zipcodeTable CREATE TABLE zipcodeTable ( zipcode integer primary key, city text, state text, latitude real, longitude real, county text );
Our building 11 West 42nd Street is in zipcode 10036.
select
means “print”.
By default,
the
select
statement selects all the
lines
in the specified table.
The asterisk in
select *
means
“all the
fields
on the selected lines”.
sqlite> select count(*) from zipcodeTable; 0 sqlite> insert into zipcodeTable (zipcode, city, state, latitude, longitude, county) values (10036, "New York", "NY", 40.759724, -73.991826, "New York"); sqlite> select count(*) from zipcodeTable; 1 sqlite> select * from zipcodeTable; 10036|New York|NY|40.759724|-73.991826|New York sqlite> .dump PRAGMA foreign_keys=OFF; BEGIN TRANSACTION; CREATE TABLE zipcodeTable ( zipcode integer primary key, city text, state text, latitude real, longitude real, county text ); INSERT INTO zipcodeTable VALUES(10036,'New York','NY',40.75972399999999851,-73.991826000000003203,'New York'); COMMIT;
And quit
sqlite3
,
and examine the
zipcodes.db
file.
sqlite> delete from zipcodeTable where zipcode == 10036; sqlite> select count(*) from zipcodeTable; 0 sqlite> select * from zipcodeTable; sqlite> .quit file zipcodes.db zipcodes.db: SQLite 3.x database od -c zipcodes.db | head -5 0000000 S Q L i t e f o r m a t 3 \0 0000020 020 \0 001 001 \0 @ \0 \0 \0 \a \0 \0 001 377 0000040 \0 \0 \0 003 \0 \0 001 375 \0 \0 \0 001 \0 \0 \0 004 0000060 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 001 \0 \0 \0 \0 0000100 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
Save the file
zip5.csv
into your
/Users/myname/python
directory.
Then type the following commands.
The last line in the CSV file does not end with a newline character,
so
wc -l
does not count that line.
That’s why
wc -l
theinks the file contains only 41,754 lines,
when it really contains 41,755.
cd ~/python pwd /Users/myname/python ls -l zip5.csv -rw-r--r--@ 1 myname mygroup 1975105 Aug 9 22:46 zip5.csv wc -l zip5.csv 41754 zip5.csv head -5 zip5.csv 00501,Holtsville,NY,40.8152,-73.0455,Suffolk 00544,Holtsville,NY,40.8152,-73.0455,Suffolk 00601,Adjuntas,PR,18.1788,-66.7516,Adjuntas 00602,Aguada,PR,18.381389,-67.188611,Aguada 00603,Aguadilla,PR,18.4554,-67.1308,Aguadilla sqlite3 zipcodes.db sqlite> .mode csv sqlite> .show sqlite> .import zip5.csv zipcodeTable sqlite> select count(*) from zipcodeTable; 41755
New England zipcodes start with zero.
The
f
in the name of the SQLite
printf
function stands for “formatted”.
The
"%05d"
format is like the Python
"{:05}".format()
.
They mean
“print the integer with a minimum of 5 digits.
If there are less than 5 digits, pad out the integer with leading zeros.”
The
d
in
"%05d"
stands for “decimal integer”,
as opposed to hexadecimal or octal integer.
The
as zipcode
specifies the column header above the output of
printf
.
See what happens if you
leave out the
as zipcode
.
sqlite> .headers on sqlite> .mode column sqlite> .show sqlite> select * from zipcodeTable where 10036 <= zipcode and zipcode <= 10040; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 10036 New York NY 40.759724 -73.991826 New York 10037 New York NY 40.813491 -73.9381 New York 10038 New York NY 40.710092 -74.001298 New York 10039 New York NY 40.826458 -73.938266 New York 10040 New York NY 40.858308 -73.929601 New York sqlite> select zipcode, city, county, state, zipcode from zipcodeTable where 10036 <= zipcode and zipcode <= 10040; zipcode city county state zipcode ---------- ---------- ---------- ---------- ---------- 10036 New York New York NY 10036 10037 New York New York NY 10037 10038 New York New York NY 10038 10039 New York New York NY 10039 10040 New York New York NY 10040 sqlite> select * from zipcodeTable where city == "Bennington" and state == "VT"; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 5201 Bennington VT 42.882658 -73.192321 Bennington sqlite> select printf("%05d", zipcode) as zipcode, city, state, latitude, longitude, county from zipcodeTable where city == "Bennington" and state == "VT"; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 05201 Bennington VT 42.882658 -73.192321 Bennington
We can change the second call to
substr
to
substr(city, -3, 3)
,
which means “the last three characters of the city”.
sqlite> select * from zipcodeTable where length(city) == 7 and substr(city, 1, 3) == substr(city, 5, 3); zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 25434 Paw Paw WV 39.492297 -78.458573 Morgan 49079 Paw Paw MI 42.234931 -85.900488 Van Buren 61353 Paw Paw IL 41.685228 -88.967377 Lee
sqlite> select * from zipcodeTable where city like "James%" and state == "NY"; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 11947 Jamesport NY 40.949444 -72.581944 Suffolk 13078 Jamesville NY 42.982973 -76.076571 Onondaga 14701 Jamestown NY 42.092845 -79.243989 Chautauqua 14702 Jamestown NY 42.096944 -79.235556 Chautauqua sqlite> select * from zipcodeTable where city like "%ning" and state == "NY"; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 10562 Ossining NY 41.167344 -73.853791 Westcheste 14830 Corning NY 42.138331 -77.047546 Steuben 14831 Corning NY 42.142778 -77.055 Steuben sqlite> select * from zipcodeTable where city like "Y%k" and state == "NY"; zipcode city state latitude longitude county ---------- ---------- ---------- ---------- ---------- ---------- 11980 Yaphank NY 40.837037 -72.917435 Suffolk 14592 York NY 42.871111 -77.885556 Livingston
Fast and sloppy, no error checking or
sys.exit
.
I didn’t even
close
the input file.
""" widestcity.py How many characters wide is the widest city? """ import csv filename = "zip5.csv" infile = open(filename, encoding = "utf-8", newline = "") lines = csv.reader(infile) listOfIntegers = [len(line[1]) for line in lines] #line[1] is the name of the city print(max(listOfIntegers))
27
sqlite> select * from zipcodeTable where city like "%Saint Johnsbury%"; zipcode city state latitude longitude county ---------- --------------- ---------- ---------- ---------- ---------- 5819 Saint Johnsbury VT 44.427195 -72.005062 Caledonia 5838 East Saint John VT 44.438611 -71.946111 Caledonia 5863 Saint Johnsbury VT 44.456667 -72.016111 Caledonia sqlite> select max(length(city)) from zipcodeTable; max(len ------- 27 sqlite> .width 7, 27, 5, 9, 10, 22 sqlite> .show sqlite> select * from zipcodeTable where city like "%Saint Johnsbury%"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 5819 Saint Johnsbury VT 44.427195 -72.005062 Caledonia 5838 East Saint Johnsbury VT 44.438611 -71.946111 Caledonia 5863 Saint Johnsbury Center VT 44.456667 -72.016111 Caledonia sqlite> --What has (at least) four eyes and can't see? sqlite> select * from zipcodeTable where city like "%i%i%i%i%"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 18341 Minisink Hills PA 40.998333 -75.1375 Monroe 39762 Mississippi State MS 33.459444 -88.790833 Oktibbeha 59755 Virginia City MT 45.247216 -112.00261 Madison 61259 Illinois City IL 41.389236 -90.892507 Rock Island 74549 Kiamichi Christian Mission OK 34.592604 -94.991557 Le Flore 89440 Virginia City NV 39.387282 -119.59606 Storey sqlite> select * from zipcodeTable where city like "% % % % %"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 23085 King And Queen Court House VA 37.717644 -76.863059 King And Queen 47876 Saint Mary Of The Woods IN 39.510833 -87.467222 Vigo 72099 Little Rock Air Force Base AR 34.898611 -92.144167 Pulaski 80840 U S A F Academy CO 38.990448 -104.86013 El Paso 80841 U S A F Academy CO 38.996667 -104.87583 El Paso 83648 Mountain Home A F B ID 43.049315 -115.87360 Elmore 96863 M C B H Kaneohe Bay HI 21.4492 -157.766 Honolulu
The wildcard
_
(underscore)
looks for exactly one character,
no more and no less.
sqlite> select * from zipcodeTable where city like "%gr_ys%"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 7950 Greystone Park NJ 40.843982 -74.479645 Morris 15337 Graysville PA 39.909165 -80.395223 Greene etc.
sqlite> select * from zipcodeTable where city like "fort %" or city like "ft %"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 7024 Fort Lee NJ 40.850312 -73.974455 Bergen 22211 Ft Myer VA 38.8795 -77.080258 Arlington etc. sqlite> select * from zipCodeTable where not city like "%a%" and not city like "%e%" and not city like "%i%" and not city like "%o%" and not city like "%u%"; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 1901 Lynn MA 42.463378 -70.945516 Essex 23066 Gwynn VA 37.500594 -76.29042 Mathews 31795 Ty Ty GA 31.45595 -83.621989 Tift 97622 Bly OR 42.397778 -121.04055 Klamath etc.
A degree of latitude is equal to approximately 57.2957795 miles anywhere on the globe, so a mile is equal to 1/57.2957795 degrees and a half mile is a equal to .5/57.2957795 degrees.
A degree of longitude is equal to approximately .75 * 57.2957795 miles in New York City, because the cosine of our latitude is approximately .75. The longitude lines are closer together than the latitude lines here.
Let’s find all the zip codes in the square mile centered on 10036.
sqlite> select * from zipcodeTable where zipcode == 10036; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 10036 New York NY 40.759724 -73.991826 New York sqlite> select * from zipcodeTable where abs(latitude - 40.759724) < .5 / 57.2957795 and abs(longitude - -73.991826) < .5 / (.75 * 57.2957795) order by zipcode; zipcode city state latitude longitude county ------- --------------------------- ----- --------- ---------- ---------------------- 10018 New York NY 40.754713 -73.992503 New York 10019 New York NY 40.765069 -73.985834 New York 10020 New York NY 40.759729 -73.982347 New York 10036 New York NY 40.759724 -73.991826 New York 10069 New York NY 40.7543 -73.9997 New York 10101 New York NY 40.7632 -73.9862 New York 10102 New York NY 40.7632 -73.9862 New York 10105 New York NY 40.7632 -73.9862 New York 10106 New York NY 40.7647 -73.9804 New York 10107 New York NY 40.7661 -73.9825 New York 10108 New York NY 40.7574 -73.9918 New York 10109 New York NY 40.7574 -73.9918 New York 10110 New York NY 40.7533 -73.9808 New York 10129 New York NY 40.7574 -73.9918 New York 10138 New York NY 40.754 -73.9909 New York 10149 New York NY 40.7655 -73.9873 New York 10156 New York NY 40.753 -73.9924 New York 10157 New York NY 40.753 -73.9924 New York
Staten Island secedes from New York City!
sqlite> select * from zipcodeTable where city == "Staten Island" and county == "Richmond" and state == "NY" limit 5; zipcode city state latitude longitude county ---------- ------------- ---------- ---------- ---------- ---------- 10301 Staten Island NY 40.631602 -74.092663 Richmond 10302 Staten Island NY 40.630597 -74.137918 Richmond 10303 Staten Island NY 40.630062 -74.160679 Richmond 10304 Staten Island NY 40.610249 -74.087836 Richmond 10305 Staten Island NY 40.597296 -74.076795 Richmond sqlite> update zipcodeTable set state = "SI" where city == "Staten Island" and county == "Richmond" and state == "NY";
sqlite> delete from zipcodeTable where zipcode == 10036; --Delete one row. sqlite> delete from zipcodeTable; --Delete every row.
Suppose a
sinister
force
changed the above 5 characters
10036
to the 31 characters
10036; delete from zipcodeTable
Then the computer would execute the following two statements,
deleting every row in the table.
This is called an
injection attack.
sqlite> delete from zipcodeTable where zipcode == 10036; delete from zipcodeTable;
Download the file
states.csv
to your
/Users/myname/python
directory.
Then
sqlite> create table stateTable ( abbreviation text primary key, name text ); sqlite> .mode csv sqlite> .import states.csv stateTable sqlite> .mode column sqlite> .headers on sqlite> select * from stateTable limit 5; abbreviation name ------------ ---------- AL Alabama AK Alaska AZ Arizona AR Arkansas CA California
zipcodeTable.zipcode
means the
zipcode
column in the
zipcodeTable
.
We can now print out the full name of the state of each zip code
(New York vs. NY)
without having to store the full names in the
zipcodeTable
.
That’s because
the
state
column in the
zipcodeTable
contains the same two-letter codes as the
abbreviation
column in the
stateTable
.
sqlite> select zipcodeTable.zipcode, zipcodeTable.city, zipcodeTable.county, stateTable.name from zipcodeTable, stateTable where zipcodeTable.city == "Yonkers" and zipcodeTable.state == stateTable.abbreviation order by zipcodeTable.zipcode; zipcode city county name ---------- ---------- ----------- ---------- 10701 Yonkers Westchester New York 10702 Yonkers Westchester New York 10703 Yonkers Westchester New York 10704 Yonkers Westchester New York 10705 Yonkers Westchester New York 10710 Yonkers Westchester New York sqlite> .quit