Most popular Python webpage on oit2.scps.nyu.edu

Let’s find the most popular /~meretzkm/python/ web page on the host oit2.scps.nyu.edu, and the number of times it was downloaded. We saw the web server’s access_log file in “For line”. This script can be run only on a machine that has an access_log file.

The keys of the count dictionary are the filenames read from the access_log file. The value corresponding to each key is the number of times that filename appeared in the access_log file.
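As a minimal sketch of how count fills up (an illustration, not the actual popular.py), the loop below counts a short hypothetical list, filenames, that stands in for the filenames read from access_log. It uses the same “insert 0, then add 1” pattern that exercise 1 below starts from.

```python
# Hypothetical filenames, standing in for the ones read from access_log.
filenames = [
    "/~meretzkm/python/",
    "/~meretzkm/python/stylesheet.css",
    "/~meretzkm/python/",
]

count = {}   # key: filename, value: number of times it appeared

for filename in filenames:
    if filename not in count:
        count[filename] = 0   # first time we've seen this filename
    count[filename] += 1

print(count)
```

Each distinct filename becomes one key, so the dictionary ends up with two items: the home page with a value of 2 and the stylesheet with a value of 1.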

In line 29, fields[8] is the HTTP status code that tells whether the web server successfully served the requested page. Codes 200 (OK) and 304 (Not Modified, meaning the browser’s cached copy is still current) count as success.
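Here is a sketch of why the status code lands in fields[8] (and the requested filename in fields[6]) when a line is split on whitespace. It assumes access_log is in the Apache Common Log Format; the sample lines are made up for illustration, and the helper name successful_filename is hypothetical.

```python
def successful_filename(line):
    """Return the requested filename if the request succeeded, else None.

    Assumes Common Log Format:
    host ident user [date zone] "METHOD /path PROTOCOL" status size
    """
    fields = line.split()          # split on runs of whitespace
    if len(fields) < 9:
        return None                # too short to be a Common Log Format line
    filename = fields[6]           # the path inside "GET /path HTTP/1.1"
    status = fields[8]             # HTTP status code, still a string
    if status in ("200", "304"):   # 200 OK, 304 Not Modified
        return filename
    return None

ok = '127.0.0.1 - - [10/Oct/2019:13:55:36 -0400] "GET /~meretzkm/python/ HTTP/1.1" 200 2326'
missing = '127.0.0.1 - - [10/Oct/2019:13:56:01 -0400] "GET /~meretzkm/python/nope.html HTTP/1.1" 404 209'

print(successful_filename(ok))        # /~meretzkm/python/
print(successful_filename(missing))   # None
```

Counting positions in the split line: 0 is the host, 3 and 4 are the two halves of the bracketed date, 5 is "GET (with its opening quote), 6 is the path, 7 is the protocol, and 8 is the status. A malformed request line with extra spaces would shift these positions, which is one reason the sketch checks the length first.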

popular.py

963 /~meretzkm/python/

The above line of output means that the most popular Python page on oit2.scps.nyu.edu, downloaded 963 times, is
http://oit2.scps.nyu.edu/~meretzkm/python/

Things to try

  1. If the filename is not already one of the keys in the dictionary, lines 30–31 insert the filename into the dictionary together with a value of 0. Change lines 28–32 from
        if "/~meretzkm/python/" in filename \
            and (fields[8] == "200" or fields[8] == "304"):
            if filename not in count:
                count[filename] = 0
            count[filename] += 1
    
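    to the following. count.get(filename, 0) returns count[filename] if filename is already a key, and 0 if it is not.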
        if "/~meretzkm/python/" in filename \
            and (fields[8] == "200" or fields[8] == "304"):
            count[filename] = count.get(filename, 0) + 1
    
  2. To make the “add 1” statement even simpler, let count be a collections.defaultdict, whose missing items are automatically created with a value of 0. The constructor of a collections.defaultdict takes as its argument a function that takes no arguments, such as defaultValue below; it calls that function to produce the initial value for each new key.
    import collections   #alternatives to the plain vanilla, built-in list and dictionary
    
    def defaultValue():
        return 0
    
    count = collections.defaultdict(defaultValue)   #Start with an empty defaultdict.
    
        if "/~meretzkm/python/" in filename \
            and (fields[8] == "200" or fields[8] == "304"):
            count[filename] += 1
    
    Better yet,
    #A lambda function that takes no arguments and returns 0.
    count = collections.defaultdict(lambda: 0)   #Start with an empty defaultdict.
    
  3. Here’s a simpler way to find the most popular page. Change lines 38–46 to the following code, which uses the get method we saw in exercise 1.
    mostPopular = max(count, key = count.get)   #mostPopular is a filename (a string)
    print(count[mostPopular], mostPopular)
    
  4. Instead of listing only the most popular page, list all the pages in order of decreasing popularity. Change lines 37–47 of the original program from
    #Find the most popular Python file and the number of times it was downloaded.
    downloads = 0
    
    for filename in count:    #Loop through all the keys in the dictionary.
        if count[filename] > downloads:
            #This file is more popular than any we've seen so far.
            popular = filename
            downloads = count[filename]
    
    if downloads == 0:
        sys.exit(1)           #No Python files in access_log?  Suspicious.
    
    print(downloads, popular)
    sys.exit(0)
    
    to the following. count.items() returns a view of the items in count, and sorted copies them into a list, so twoColumns ends up as a list with the same number of items as count. Each item in twoColumns is a tuple of two items: a filename and its number of occurrences.
    twoColumns = count.items()
    
    def score(item):
        "item[0] is the filename, item[1] is its number of downloads."
        return item[1]
    
    twoColumns = sorted(twoColumns, key = score, reverse = True)
    
    for key, value in twoColumns:
        #key is filename, value is number of occurrences
        print("{:4} {}".format(value, key))
    
    sys.exit(0)
    
     964 /~meretzkm/python/
     298 /~meretzkm/python/stylesheet.css
     295 /~meretzkm/python/INFO1-CE9990/
     240 /~meretzkm/python/WS19PB02/homework.html
     206 /~meretzkm/python/WS19PB02/
     153 /~meretzkm/python/WS19PB02/github.html
     138 /~meretzkm/python/string/forURL.html
     135 /~meretzkm/python/control/while.html
     120 /~meretzkm/python/tkinter/tkflag.html
     116 /~meretzkm/python/control/for.html
    etc.
    
  5. Better yet, change the above call to sorted to the following. Remove the definition of the score function.
    twoColumns = sorted(twoColumns, key = lambda item: item[1], reverse = True)
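To check that exercises 1 and 2 change only the style of the counting and not the result, the sketch below counts a short hypothetical list of filenames three ways. It also uses collections.defaultdict(int): int() called with no arguments returns 0, so the built-in int can stand in for defaultValue or lambda: 0.

```python
import collections

# Hypothetical filenames, standing in for the ones read from access_log.
filenames = ["/a/", "/b/", "/a/", "/c/", "/a/", "/b/"]

# Original style: make sure the key exists before adding 1.
count1 = {}
for filename in filenames:
    if filename not in count1:
        count1[filename] = 0
    count1[filename] += 1

# Exercise 1: get returns 0 when the key is missing.
count2 = {}
for filename in filenames:
    count2[filename] = count2.get(filename, 0) + 1

# Exercise 2: a missing key is created automatically with the value int(), i.e. 0.
count3 = collections.defaultdict(int)
for filename in filenames:
    count3[filename] += 1

print(count1 == count2 == count3)   # True
```

One caution with a defaultdict: merely reading count3[key] for a key that is not there inserts that key with a value of 0. When you only want to look, test with in or call get.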
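Exercises 3–5 can be tried without an access_log file. This sketch runs them on a small count dictionary built from three of the numbers in the sample output of exercise 4. One behavioral difference from the original loop: the original exits with status 1 when no Python page was found, while max raises ValueError on an empty dictionary unless it is given a default argument (available since Python 3.4).

```python
# Three of the counts from the sample output of exercise 4.
count = {
    "/~meretzkm/python/": 964,
    "/~meretzkm/python/stylesheet.css": 298,
    "/~meretzkm/python/INFO1-CE9990/": 295,
}

# Exercise 3: the key whose value is largest.
mostPopular = max(count, key=count.get)
print(count[mostPopular], mostPopular)   # 964 /~meretzkm/python/

# Exercise 5: all items, most downloads first.
twoColumns = sorted(count.items(), key=lambda item: item[1], reverse=True)
for key, value in twoColumns:
    print("{:4} {}".format(value, key))

# An empty dictionary: without default, max would raise ValueError.
empty = {}
print(max(empty, key=empty.get, default=None))   # None
```

If two pages tie, max returns the first of them it encounters, and because sorted is stable, tied pages keep the order they had in the dictionary.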