Play with SQLite

SQL is the Structured Query Language for reading and writing database files. The Python version of SQL is SQLite, convenient to use because you don’t have to set up a separate server somewhere else. Before we use SQLite in Python, however, we will play with it in by running the program sqlite3 in the macOS Terminal window, the Linux shell window, and the Microsoft Command Prompt window.

Install sqlite3 on macOS

You probably already have sqlite3 on your Mac.

which sqlite3
/usr/bin/sqlite3

sqlite3 -help

sqlite3 -version
3.16.0 2016-11-04 19:09:39 0e5ffd9123d6d2d2b8f3701e8a73cc98a3a7ff5f

To get the latest version of SQLite, go here and download the .zip file for “Precompiled Binaries for Mac OS X (x86)”. Double-click on the downloaded .zip file to create a new folder with the same name as the .zip file but without the .zip suffix. This folder contains the three programs sqldiff, sqlite3, and sqlite_analyzer. Move these three programs to your /usr/local/bin directory.

cd ~/Downloads/sqlite-tools-osx-x86-3200000

pwd
/Users/myname/Downloads/sqlite-tools-osx-x86-3200000

mv sql* /usr/local/bin

cd /usr/local/bin

pwd
/usr/local/bin

ls -l sql*
-rwxr-xr-x@ 1 myname  mygroup  612864 Aug  1 21:02 sqldiff
-rwxr-xr-x@ 1 myname  mygroup  974212 Aug  1 21:06 sqlite3
-rwxr-xr-x@ 1 myname  mygroup  660080 Aug  1 21:04 sqlite3_analyzer

See if you now have the current version of sqlite3:

which sqlite3
/usr/local/bin/sqlite3

sqlite3 -version
3.20.0 2017-08-01 13:24:15 9501e22dfeebdcefa783575e47c60b514d7c2e0cad73b2a496c0bc4b680900a8

sqlite3 -help
Usage: /usr/local/bin/sqlite3 [OPTIONS] FILENAME [SQL]
FILENAME is the name of an SQLite database. A new database is created
if the file does not previously exist.
OPTIONS include:
   -ascii               set output mode to 'ascii'
   -bail                stop after hitting an error
   -batch               force batch I/O
   -column              set output mode to 'column'
   -cmd COMMAND         run "COMMAND" before reading stdin
   -csv                 set output mode to 'csv'
   -echo                print commands before execution
   -init FILENAME       read/process named file
   -[no]header          turn headers on or off
   -help                show this message
   -html                set output mode to HTML
   -interactive         force interactive I/O
   -line                set output mode to 'line'
   -list                set output mode to 'list'
   -lookaside SIZE N    use N entries of SZ bytes for lookaside memory
   -mmap N              default mmap size set to N
   -newline SEP         set output row separator. Default: '\n'
   -nullvalue TEXT      set text string for NULL values. Default ''
   -pagecache SIZE N    use N slots of SZ bytes each for page cache memory
   -quote               set output mode to 'quote'
   -scratch SIZE N      use N slots of SZ bytes each for scratch memory
   -separator SEP       set output column separator. Default: '|'
   -stats               print memory stats before each finalize
   -version             show SQLite version
   -vfs NAME            use NAME as the default VFS

Create a database file in the python directory.

Let’s create a file named zipcodes.db in your /Users/myname/python directory.

cd ~/python

pwd
/Users/myname/python

sqlite3 zipcodes.db
SQLite version 3.20.0 2017-08-01 13:24:15
Enter ".help" for usage hints.

sqlite> .help
.auth ON|OFF           Show authorizer callbacks
.backup ?DB? FILE      Backup DB (default "main") to FILE
.bail on|off           Stop after hitting an error.  Default OFF
.binary on|off         Turn binary output on or off.  Default OFF
.cd DIRECTORY          Change the working directory to DIRECTORY
.changes on|off        Show number of rows changed by SQL
.check GLOB            Fail if output since .testcase does not match
.clone NEWDB           Clone data into NEWDB from the existing database
.databases             List names and files of attached databases
.dbinfo ?DB?           Show status information about the database
.dump ?TABLE? ...      Dump the database in an SQL text format
                         If TABLE specified, only dump tables matching
                         LIKE pattern TABLE.
.echo on|off           Turn command echo on or off
.eqp on|off|full       Enable or disable automatic EXPLAIN QUERY PLAN
.exit                  Exit this program
.fullschema ?--indent? Show schema and the content of sqlite_stat tables
.headers on|off        Turn display of headers on or off
.help                  Show this message
.import FILE TABLE     Import data from FILE into TABLE
.imposter INDEX TABLE  Create imposter table TABLE on index INDEX
.indexes ?TABLE?       Show names of all indexes
                         If TABLE specified, only show indexes for tables
                         matching LIKE pattern TABLE.
.limit ?LIMIT? ?VAL?   Display or change the value of an SQLITE_LIMIT
.lint OPTIONS          Report potential schema issues. Options:
                         fkey-indexes     Find missing foreign key indexes
.load FILE ?ENTRY?     Load an extension library
.log FILE|off          Turn logging on or off.  FILE can be stderr/stdout
.mode MODE ?TABLE?     Set output mode where MODE is one of:
                         ascii    Columns/rows delimited by 0x1F and 0x1E
                         csv      Comma-separated values
                         column   Left-aligned columns.  (See .width)
                         html     HTML <table> code
                         insert   SQL insert statements for TABLE
                         line     One value per line
                         list     Values delimited by "|"
                         quote    Escape answers as for SQL
                         tabs     Tab-separated values
                         tcl      TCL list elements
.nullvalue STRING      Use STRING in place of NULL values
.once FILENAME         Output for the next SQL command only to FILENAME
.open ?OPTIONS? ?FILE? Close existing database and reopen FILE
                         The --new option starts with an empty file
.output ?FILENAME?     Send output to FILENAME or stdout
.print STRING...       Print literal STRING
.prompt MAIN CONTINUE  Replace the standard prompts
.quit                  Exit this program
.read FILENAME         Execute SQL in FILENAME
.restore ?DB? FILE     Restore content of DB (default "main") from FILE
.save FILE             Write in-memory database into FILE
.scanstats on|off      Turn sqlite3_stmt_scanstatus() metrics on or off
.schema ?PATTERN?      Show the CREATE statements matching PATTERN
                          Add --indent for pretty-printing
.selftest ?--init?     Run tests defined in the SELFTEST table
.separator COL ?ROW?   Change the column separator and optionally the row
                         separator for both the output mode and .import
.sha3sum ?OPTIONS...?  Compute a SHA3 hash of database content
.shell CMD ARGS...     Run CMD ARGS... in a system shell
.show                  Show the current values for various settings
.stats ?on|off?        Show stats or turn stats on or off
.system CMD ARGS...    Run CMD ARGS... in a system shell
.tables ?TABLE?        List names of tables
                         If TABLE specified, only list tables matching
                         LIKE pattern TABLE.
.testcase NAME         Begin redirecting output to 'testcase-out.txt'
.timeout MS            Try opening locked tables for MS milliseconds
.timer on|off          Turn SQL timer on or off
.trace FILE|off        Output each SQL statement as it is run
.vfsinfo ?AUX?         Information about the top-level VFS
.vfslist               List all available VFSes
.vfsname ?AUX?         Print the name of the VFS stack
.width NUM1 NUM2 ...   Set column widths for "column" mode
                         Negative values right-justify

sqlite> .databases
main: /Users/myname/python/zipcodes.db

Put a table into the database

A table is a rectangle made of rows and columns. The columns are things like lastname, firstname, social security number. The datatype of each column could be text, integer (a number without a decimal point or fraction), real (a number with a decimal point and fraction), etc. text might be called varchar in other versions of SQL. The rows are the records (Moe, Larry, Curly, and Shemp).

The name of our table will be zipcodeTable. Each of the commands drop table and create table is two separate words with a space between them. Don’t forget the semicolon at the end of every SQL statement. (The statements that began with a dot are not SQL statements, and do not need a semicolon. They are the “dot commands” listed under Special commands to sqlite3 (dot-commands).) The sqlite3 continuation prompt is …>. Try control-p to go back to the previous line.

sqlite> drop table if exists zipcodeTable;

sqlite> .tables

sqlite> create table zipcodeTable (
    zipcode integer primary key,
    city text,
    state text,
    latitude real,
    longitude real,
    county text
);

sqlite> .tables
zipcodeTable

sqlite> .schema zipcodeTable
CREATE TABLE zipcodeTable (
    zipcode integer primary key,
    city text,
    state text,
    latitude real,
    longitude real,
    county text
);

Insert a row into the table.

Our building 11 West 42nd Street is in zipcode 10036. select means “print”. By default, the select statement selects all the lines in the specified table. The asterisk in select * means “all the fields on the selected lines”.

sqlite> select count(*) from zipcodeTable;
0

sqlite> insert into zipcodeTable (zipcode, city, state, latitude, longitude, county)
    values (10036, "New York", "NY", 40.759724, -73.991826, "New York");

sqlite> select count(*) from zipcodeTable;
1

sqlite> select * from zipcodeTable;
10036|New York|NY|40.759724|-73.991826|New York

sqlite> .dump
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE zipcodeTable (
    zipcode integer primary key,
    city text,
    state text,
    latitude real,
    longitude real,
    county text
);
INSERT INTO zipcodeTable VALUES(10036,'New York','NY',40.75972399999999851,-73.991826000000003203,'New York');
COMMIT;

Delete a record from the table.

And quit sqlite3, and examine the zipcodes.db file.

sqlite> delete from zipcodeTable where zipcode == 10036;

sqlite> select count(*) from zipcodeTable;
0

sqlite> select * from zipcodeTable;

sqlite> .quit

file zipcodes.db
zipcodes.db: SQLite 3.x database

od -c zipcodes.db | head -5
0000000    S   Q   L   i   t   e       f   o   r   m   a   t       3  \0
0000020  020  \0 001 001  \0   @          \0  \0  \0  \a  \0  \0 001 377
0000040   \0  \0  \0 003  \0  \0 001 375  \0  \0  \0 001  \0  \0  \0 004
0000060   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0 001  \0  \0  \0  \0
0000100   \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0

Import a CSV file into a table.

Save the file zip5.csv into your /Users/myname/python directory. Then type the following commands. The last line in the CSV file does not end with a newline character, so wc -l does not count that line. That’s why wc -l theinks the file contains only 41,754 lines, when it really contains 41,755.

cd ~/python

pwd
/Users/myname/python

ls -l zip5.csv
-rw-r--r--@ 1 myname  mygroup  1975105 Aug  9 22:46 zip5.csv

wc -l zip5.csv
   41754 zip5.csv

head -5 zip5.csv
00501,Holtsville,NY,40.8152,-73.0455,Suffolk
00544,Holtsville,NY,40.8152,-73.0455,Suffolk
00601,Adjuntas,PR,18.1788,-66.7516,Adjuntas
00602,Aguada,PR,18.381389,-67.188611,Aguada
00603,Aguadilla,PR,18.4554,-67.1308,Aguadilla

sqlite3 zipcodes.db

sqlite> .mode csv
sqlite> .show

sqlite> .import zip5.csv zipcodeTable

sqlite> select count(*) from zipcodeTable;
41755

Select any columns, in any order.

New England zipcodes start with zero. The f in the name of the SQLite printf function stands for “formatted”. The "%05d" format is like the Python "{:05}".format(). They mean “print the integer with a minimum of 5 digits. If there are less than 5 digits, pad out the integer with leading zeros.” The d in "%05d" stands for “decimal integer”, as opposed to hexadecimal or octal integer. The as zipcode specifies the column header above the output of printf. See what happens if you leave out the as zipcode.

sqlite> .headers on
sqlite> .mode column
sqlite> .show

sqlite> select * from zipcodeTable where 10036 <= zipcode and zipcode <= 10040;
zipcode     city        state       latitude    longitude   county
----------  ----------  ----------  ----------  ----------  ----------
10036       New York    NY          40.759724   -73.991826  New York
10037       New York    NY          40.813491   -73.9381    New York
10038       New York    NY          40.710092   -74.001298  New York
10039       New York    NY          40.826458   -73.938266  New York
10040       New York    NY          40.858308   -73.929601  New York

sqlite> select zipcode, city, county, state, zipcode from zipcodeTable
    where 10036 <= zipcode and zipcode <= 10040;
zipcode     city        county      state       zipcode
----------  ----------  ----------  ----------  ----------
10036       New York    New York    NY          10036
10037       New York    New York    NY          10037
10038       New York    New York    NY          10038
10039       New York    New York    NY          10039
10040       New York    New York    NY          10040

sqlite> select * from zipcodeTable where city == "Bennington" and state == "VT";
zipcode     city        state       latitude    longitude   county
----------  ----------  ----------  ----------  ----------  ----------
5201        Bennington  VT          42.882658   -73.192321  Bennington

sqlite> select printf("%05d", zipcode) as zipcode, city, state, latitude, longitude, county
    from zipcodeTable where city == "Bennington" and state == "VT";
zipcode     city        state       latitude    longitude   county
----------  ----------  ----------  ----------  ----------  ----------
05201       Bennington  VT          42.882658   -73.192321  Bennington

The length and substr functions

We can change the second call to substr to substr(city, -3, 3), which means “the last three characters of the city”.

sqlite> select * from zipcodeTable
    where length(city) == 7 and substr(city, 1, 3) == substr(city, 5, 3);
zipcode     city        state       latitude    longitude   county
----------  ----------  ----------  ----------  ----------  ----------
25434       Paw Paw     WV          39.492297   -78.458573  Morgan
49079       Paw Paw     MI          42.234931   -85.900488  Van Buren
61353       Paw Paw     IL          41.685228   -88.967377  Lee

A wildcard

sqlite> select * from zipcodeTable where city like "James%" and state == "NY";
zipcode     city        state       latitude    longitude   county
----------  ----------  ----------  ----------  ----------  ----------
11947       Jamesport   NY          40.949444   -72.581944  Suffolk
13078       Jamesville  NY          42.982973   -76.076571  Onondaga
14701       Jamestown   NY          42.092845   -79.243989  Chautauqua
14702       Jamestown   NY          42.096944   -79.235556  Chautauqua

sqlite> select * from zipcodeTable where city like "%ning" and state == "NY";
zipcode     city        state       latitude    longitude   county
----------  ----------  ----------  ----------  ----------  ----------
10562       Ossining    NY          41.167344   -73.853791  Westcheste
14830       Corning     NY          42.138331   -77.047546  Steuben
14831       Corning     NY          42.142778   -77.055     Steuben

sqlite> select * from zipcodeTable where city like "Y%k" and state == "NY";
zipcode     city        state       latitude    longitude   county
----------  ----------  ----------  ----------  ----------  ----------
11980       Yaphank     NY          40.837037   -72.917435  Suffolk
14592       York        NY          42.871111   -77.885556  Livingston

A Python digression

Fast and sloppy, no error checking or sys.exit. I didn’t even close the input file.

"""
widestcity.py
How many characters wide is the widest city?
"""

import csv

filename = "zip5.csv"
infile = open(filename, encoding = "utf-8", newline = "")
lines = csv.reader(infile)
listOfIntegers = [len(line[1]) for line in lines]   #line[1] is the name of the city
print(max(listOfIntegers))
27

More than one wildcard

sqlite> select * from zipcodeTable where city like "%Saint Johnsbury%";
zipcode     city             state       latitude    longitude   county
----------  ---------------  ----------  ----------  ----------  ----------
5819        Saint Johnsbury  VT          44.427195   -72.005062  Caledonia
5838        East Saint John  VT          44.438611   -71.946111  Caledonia
5863        Saint Johnsbury  VT          44.456667   -72.016111  Caledonia

sqlite> select max(length(city)) from zipcodeTable;
max(len
-------
27

sqlite> .width 7, 27, 5, 9, 10, 22
sqlite> .show

sqlite> select * from zipcodeTable where city like "%Saint Johnsbury%";
zipcode  city                         state  latitude   longitude   county
-------  ---------------------------  -----  ---------  ----------  ----------------------
5819     Saint Johnsbury              VT     44.427195  -72.005062  Caledonia
5838     East Saint Johnsbury         VT     44.438611  -71.946111  Caledonia
5863     Saint Johnsbury Center       VT     44.456667  -72.016111  Caledonia

sqlite> --What has (at least) four eyes and can't see?
sqlite> select * from zipcodeTable where city like "%i%i%i%i%";
zipcode  city                         state  latitude   longitude   county
-------  ---------------------------  -----  ---------  ----------  ----------------------
18341    Minisink Hills               PA     40.998333  -75.1375    Monroe
39762    Mississippi State            MS     33.459444  -88.790833  Oktibbeha
59755    Virginia City                MT     45.247216  -112.00261  Madison
61259    Illinois City                IL     41.389236  -90.892507  Rock Island
74549    Kiamichi Christian Mission   OK     34.592604  -94.991557  Le Flore
89440    Virginia City                NV     39.387282  -119.59606  Storey

sqlite> select * from zipcodeTable where city like "% % % % %";
zipcode  city                         state  latitude   longitude   county
-------  ---------------------------  -----  ---------  ----------  ----------------------
23085    King And Queen Court House   VA     37.717644  -76.863059  King And Queen
47876    Saint Mary Of The Woods      IN     39.510833  -87.467222  Vigo
72099    Little Rock Air Force Base   AR     34.898611  -92.144167  Pulaski
80840    U S A F Academy              CO     38.990448  -104.86013  El Paso
80841    U S A F Academy              CO     38.996667  -104.87583  El Paso
83648    Mountain Home A F B          ID     43.049315  -115.87360  Elmore
96863    M C B H Kaneohe Bay          HI     21.4492    -157.766    Honolulu

The other wildcard

The wildcard _ (underscore) looks for exactly one character, no more and no less.

sqlite> select * from zipcodeTable where city like "%gr_ys%";
zipcode  city                         state  latitude   longitude   county
-------  ---------------------------  -----  ---------  ----------  ----------------------
7950     Greystone Park               NJ     40.843982  -74.479645  Morris
15337    Graysville                   PA     39.909165  -80.395223  Greene
etc.

and, or, not

sqlite> select * from zipcodeTable where city like "fort %" or city like "ft %";
zipcode  city                         state  latitude   longitude   county
-------  ---------------------------  -----  ---------  ----------  ----------------------
7024     Fort Lee                     NJ     40.850312  -73.974455  Bergen
22211    Ft Myer                      VA     38.8795    -77.080258  Arlington
etc.

sqlite> select * from zipCodeTable
    where not city like "%a%"
    and not city like "%e%"
    and not city like "%i%"
    and not city like "%o%"
    and not city like "%u%";
zipcode  city                         state  latitude   longitude   county
-------  ---------------------------  -----  ---------  ----------  ----------------------
1901     Lynn                         MA     42.463378  -70.945516  Essex
23066    Gwynn                        VA     37.500594  -76.29042   Mathews
31795    Ty Ty                        GA     31.45595   -83.621989  Tift
97622    Bly                          OR     42.397778  -121.04055  Klamath
etc.

Zipcodes near 10036

A degree of latitude is equal to approximately 57.2957795 miles anywhere on the globe, so a mile is equal to 1/57.2957795 degrees and a half mile is a equal to .5/57.2957795 degrees.

A degree of longitude is equal to approximately .75 * 57.2957795 miles in New York City, because the cosine of our latitude is approximately .75. The longitude lines are closer together than the latitude lines here.

Let’s find all the zip codes in the square mile centered on 10036.

sqlite> select * from zipcodeTable where zipcode == 10036;
zipcode  city                         state  latitude   longitude   county
-------  ---------------------------  -----  ---------  ----------  ----------------------
10036    New York                     NY     40.759724  -73.991826  New York

sqlite> select * from zipcodeTable
    where abs(latitude  -  40.759724) < .5 / 57.2957795
    and   abs(longitude - -73.991826) < .5 / (.75 * 57.2957795)
    order by zipcode;
zipcode  city                         state  latitude   longitude   county
-------  ---------------------------  -----  ---------  ----------  ----------------------
10018    New York                     NY     40.754713  -73.992503  New York
10019    New York                     NY     40.765069  -73.985834  New York
10020    New York                     NY     40.759729  -73.982347  New York
10036    New York                     NY     40.759724  -73.991826  New York
10069    New York                     NY     40.7543    -73.9997    New York
10101    New York                     NY     40.7632    -73.9862    New York
10102    New York                     NY     40.7632    -73.9862    New York
10105    New York                     NY     40.7632    -73.9862    New York
10106    New York                     NY     40.7647    -73.9804    New York
10107    New York                     NY     40.7661    -73.9825    New York
10108    New York                     NY     40.7574    -73.9918    New York
10109    New York                     NY     40.7574    -73.9918    New York
10110    New York                     NY     40.7533    -73.9808    New York
10129    New York                     NY     40.7574    -73.9918    New York
10138    New York                     NY     40.754     -73.9909    New York
10149    New York                     NY     40.7655    -73.9873    New York
10156    New York                     NY     40.753     -73.9924    New York
10157    New York                     NY     40.753     -73.9924    New York

Update one or more rows in the table.

Staten Island secedes from New York City!

sqlite> select * from zipcodeTable
    where city == "Staten Island" and county == "Richmond" and state == "NY"
    limit 5;
zipcode     city           state       latitude    longitude   county
----------  -------------  ----------  ----------  ----------  ----------
10301       Staten Island  NY          40.631602   -74.092663  Richmond
10302       Staten Island  NY          40.630597   -74.137918  Richmond
10303       Staten Island  NY          40.630062   -74.160679  Richmond
10304       Staten Island  NY          40.610249   -74.087836  Richmond
10305       Staten Island  NY          40.597296   -74.076795  Richmond

sqlite> update zipcodeTable set state = "SI"
    where city == "Staten Island" and county == "Richmond" and state == "NY";

Delete one or more rows from the table.

sqlite> delete from zipcodeTable where zipcode == 10036; --Delete one row.

sqlite> delete from zipcodeTable;                        --Delete every row.

Suppose a sinister force changed the above 5 characters
10036
to the 31 characters
10036; delete from zipcodeTable
Then the computer would execute the following two statements, deleting every row in the table. This is called an injection attack.

sqlite> delete from zipcodeTable where zipcode == 10036; delete from zipcodeTable;

Create another table.

Download the file states.csv to your /Users/myname/python directory. Then

sqlite> create table stateTable (
    abbreviation text primary key,
    name text
);

sqlite> .mode csv
sqlite> .import states.csv stateTable

sqlite> .mode column
sqlite> .headers on

sqlite> select * from stateTable limit 5;
abbreviation  name
------------  ----------
AL            Alabama
AK            Alaska
AZ            Arizona
AR            Arkansas
CA            California

Perform a join using the two tables.

zipcodeTable.zipcode means the zipcode column in the zipcodeTable. We can now print out the full name of the state of each zip code (New York vs. NY) without having to store the full names in the zipcodeTable. That’s because the state column in the zipcodeTable contains the same two-letter codes as the abbreviation column in the stateTable.

sqlite> select zipcodeTable.zipcode, zipcodeTable.city, zipcodeTable.county, stateTable.name
    from zipcodeTable, stateTable
    where zipcodeTable.city == "Yonkers" and zipcodeTable.state == stateTable.abbreviation
    order by zipcodeTable.zipcode;
zipcode     city        county       name
----------  ----------  -----------  ----------
10701       Yonkers     Westchester  New York
10702       Yonkers     Westchester  New York
10703       Yonkers     Westchester  New York
10704       Yonkers     Westchester  New York
10705       Yonkers     Westchester  New York
10710       Yonkers     Westchester  New York

sqlite> .quit