Let’s find the most popular /~meretzkm/python/ web page on the host oit2.scps.nyu.edu, and the number of times the page was downloaded.
We saw the web server’s access_log file in For line. This script can be run only on a machine that has an access_log file.
The keys of the count dictionary are the filenames read from the access_log file. The value corresponding to each key is the number of times that filename appeared in the access_log file.
The fields[8] in line 29 is the HTTP status code that tells whether or not the web server successfully served out the requested web page. Codes 200 and 304 indicate success.
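The counting described above can be sketched as follows. This is only an illustration: the four log lines are made up, and the sketch assumes the Apache common log format, in which splitting a line on whitespace puts the requested filename in fields[6] and the status code in fields[8].

```python
# Made-up sample lines, assumed to be in the Apache common log format.
sampleLines = [
    '192.0.2.1 - - [10/Oct/2019:13:55:36 -0400] "GET /~meretzkm/python/ HTTP/1.1" 200 2326',
    '192.0.2.2 - - [10/Oct/2019:13:55:40 -0400] "GET /~meretzkm/python/ HTTP/1.1" 304 -',
    '192.0.2.3 - - [10/Oct/2019:13:56:02 -0400] "GET /~meretzkm/python/missing.html HTTP/1.1" 404 209',
    '192.0.2.4 - - [10/Oct/2019:13:56:10 -0400] "GET /index.html HTTP/1.1" 200 512',
]

count = {}   #Start with an empty dictionary.

for line in sampleLines:
    fields = line.split()
    if len(fields) < 9:
        continue   #Skip a line too short to contain a status code.

    filename = fields[6]   #assumed position of the requested filename
    if "/~meretzkm/python/" in filename \
        and (fields[8] == "200" or fields[8] == "304"):
        if filename not in count:
            count[filename] = 0
        count[filename] += 1

print(count)
```

Only the first two sample lines are counted: the third has status code 404, and the fourth is not a /~meretzkm/python/ page.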
    963 /~meretzkm/python/

The above line of output means that the most popular Python page on oit2.scps.nyu.edu is http://oit2.scps.nyu.edu/~meretzkm/python/
Change

    if "/~meretzkm/python/" in filename \
        and (fields[8] == "200" or fields[8] == "304"):
        if filename not in count:
            count[filename] = 0
        count[filename] += 1

to

    if "/~meretzkm/python/" in filename \
        and (fields[8] == "200" or fields[8] == "304"):
        count[filename] = count.get(filename, 0) + 1
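As a quick check that the two forms count the same way, here is a small sketch that runs both over the same list. The list of filenames is made up for illustration.

```python
#A made-up list of requested filenames.
filenames = [
    "/~meretzkm/python/",
    "/~meretzkm/python/stylesheet.css",
    "/~meretzkm/python/",
]

#The original form: create the item with a value of 0 before incrementing it.
countIf = {}
for filename in filenames:
    if filename not in countIf:
        countIf[filename] = 0
    countIf[filename] += 1

#The shorter form: get returns 0 if the key is not yet in the dictionary.
countGet = {}
for filename in filenames:
    countGet[filename] = countGet.get(filename, 0) + 1

print(countIf == countGet)
```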
Alternatively, let count be a collections.defaultdict whose items are automatically created with a value of 0. The function that constructs a collections.defaultdict takes as its argument a function such as defaultValue that takes no arguments.
    import collections   #alternatives to the plain vanilla, built-in list and dictionary

    def defaultValue():
        return 0

    count = collections.defaultdict(defaultValue)   #Start with an empty defaultdict.

    if "/~meretzkm/python/" in filename \
        and (fields[8] == "200" or fields[8] == "304"):
        count[filename] += 1

Better yet,

    #A lambda function that takes no arguments and returns 0.
    count = collections.defaultdict(lambda: 0)   #Start with an empty defaultdict.
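The following sketch shows how a collections.defaultdict behaves when a missing key is used. The filenames are made up. The last three lines are an addition not in the text above: since int called with no arguments returns 0, int itself can serve as the function passed to collections.defaultdict.

```python
import collections

def defaultValue():
    return 0

count = collections.defaultdict(defaultValue)   #Start with an empty defaultdict.
count["/~meretzkm/python/"] += 1   #The item is created with a value of 0, then incremented.
count["/~meretzkm/python/"] += 1
print(count["/~meretzkm/python/"])

#Merely reading a missing key also creates the item.
before = len(count)
unused = count["/~meretzkm/python/never-requested.html"]
after = len(count)
print(before, after)

#int() returns 0, so int can be the function that supplies the default value.
countInt = collections.defaultdict(int)
countInt["/~meretzkm/python/"] += 1
print(countInt["/~meretzkm/python/"])
```

Because reading a missing key creates it, test for membership with in rather than with subscripting if the defaultdict should not grow.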
Find the most popular filename with max and the get function we saw in exercise 1.

    mostPopular = max(count, key = count.get)   #mostPopular is a filename (a string)
    print(count[mostPopular], mostPopular)
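Here is a small sketch of max with key = count.get, using a made-up dictionary of counts. max calls count.get on each key and returns the key whose value is largest. Note that max raises ValueError when the dictionary is empty; the default keyword argument shown at the end is one possible way to handle that case.

```python
#Made-up counts for illustration.
count = {
    "/~meretzkm/python/stylesheet.css": 3,
    "/~meretzkm/python/": 5,
    "/~meretzkm/python/control/for.html": 1,
}

mostPopular = max(count, key = count.get)   #mostPopular is a filename (a string)
print(count[mostPopular], mostPopular)

#With an empty dictionary, max would raise ValueError unless given a default.
empty = {}
nothing = max(empty, key = empty.get, default = None)
print(nothing)
```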
Change

    #Find the most popular Python file and the number of times it was downloaded.
    downloads = 0

    for filename in count:   #Loop through all the keys in the dictionary.
        if count[filename] > downloads:
            #This file is more popular than any we've seen so far.
            popular = filename
            downloads = count[filename]

    if downloads == 0:
        sys.exit(1)   #No Python files in access_log?  Suspicious.

    print(downloads, popular)
    sys.exit(0)

to the following.
After the call to sorted, twoColumns is a list with the same number of items as count. Each item in twoColumns is a tuple of two items: a filename and its number of occurrences.
    twoColumns = count.items()

    def score(item):
        "item[0] is the filename, item[1] is its number of downloads."
        return item[1]

    twoColumns = sorted(twoColumns, key = score, reverse = True)

    for key, value in twoColumns:   #key is filename, value is number of occurrences
        print("{:4} {}".format(value, key))

    sys.exit(0)
     964 /~meretzkm/python/
     298 /~meretzkm/python/stylesheet.css
     295 /~meretzkm/python/INFO1-CE9990/
     240 /~meretzkm/python/WS19PB02/homework.html
     206 /~meretzkm/python/WS19PB02/
     153 /~meretzkm/python/WS19PB02/github.html
     138 /~meretzkm/python/string/forURL.html
     135 /~meretzkm/python/control/while.html
     120 /~meretzkm/python/tkinter/tkflag.html
     116 /~meretzkm/python/control/for.html
    etc.
Change the call to sorted to the following. Remove the definition of the score function.

    twoColumns = sorted(twoColumns, key = lambda item: item[1], reverse = True)
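The sketch below sorts a made-up dictionary of counts both ways, with the score function and with the lambda, to show that the two calls to sorted produce the same list of tuples.

```python
#Made-up counts for illustration.
count = {
    "/~meretzkm/python/control/for.html": 1,
    "/~meretzkm/python/": 5,
    "/~meretzkm/python/stylesheet.css": 3,
}

def score(item):
    "item[0] is the filename, item[1] is its number of downloads."
    return item[1]

withDef = sorted(count.items(), key = score, reverse = True)
withLambda = sorted(count.items(), key = lambda item: item[1], reverse = True)
print(withDef == withLambda)

for key, value in withLambda:   #key is filename, value is number of occurrences
    print("{:4} {}".format(value, key))
```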