Sort a list

One reason we store items in a list is so we can sort them.

Documentation

  1. the two sorting functions
    1. the sort method of class list
    2. the sorted built-in function
  2. Sorting HOWTO
  3. Key function
    1. key function in the Python Glossary
    2. Key Functions in the Sorting HOWTO
  4. Lambda functions
    1. lambda in the Python Glossary
    2. Lambda functions in the Python Tutorial
    3. lambda keyword in the Python Language Reference

Sort a list of numbers

By default (i.e., in the absence of a key function), the sort method sorts ints and floatss into increasing numeric order. That’s because the < operator, when placed between two numbers (a < b), returns True if the two numbers are in increasing numeric order.

sortnumbers.py

 476
1066
1492
1776
1812
1945
1984
2001
2019

Things to try

  1. Sort the numbers in decreasing order. reverse is a keyword argument of the sort function, just as sep and end are keyword arguments of the print function. See Ascending and Descending.
    years.sort(reverse = True)
    
    2019
    2001
    1984
    1945
    1812
    1776
    1492
    1066
     476
    
  2. Sort the numbers in order of how close to 1900 they are. My score function returns the distance between 1900 and the given number n; the absolute value ensures that this distance is never negative. Each number is given a score by being passed to score. The numbers are then sorted in order of their scores.

    Since the scoring function returns an int, the scores are ints. And since the scores are ints, the items in the list are sorted in increasing numeric order of their scores.

    My score function is an example of a key function.

  3. def score(year):
        "Return the distance between the year and 1900."
        return abs(year - 1900)
    
    for year in years:
        print(f"The score of {year:4} is {score(year):4}.")
    
    print()   #Skip a line.
    
    years.sort(key = score)
    
    The score of 1492 is  408.
    The score of 1776 is  124.
    The score of 1066 is  834.
    The score of 1812 is   88.
    The score of 1984 is   84.
    The score of 2001 is  101.
    The score of  476 is 1424.
    The score of 2019 is  119.
    The score of 1945 is   45.
    
    1945
    1984
    1812
    2001
    2019
    1776
    1492
    1066
     476
    
  4. The above score function contains only one statement (the return statement) and is called from only one place (the sort function). We can therefore replace it with a lambda function (i.e., a nameless function) that returns the same thing. You don’t have to give a name to a lambda function, and, in a lambda function, you don’t even have to write the words def and return.

    #Sort in order of increasing distance from 1900.
    years.sort(key = lambda year: abs(year - 1900))
    
    1945
    1984
    1812
    2001
    2019
    1776
    1492
    1066
     476
    
    #Sort in order of decreasing distance from 1900.
    years.sort(key = lambda year: abs(year - 1900), reverse = True)
    
     476
    1066
    1492
    1776
    2019
    2001
    1812
    1984
    1945
    
  5. Each number must be converted to a string before we can call count. That’s because a number does not have a count function, but a str does.
    #Sort in order of increasing number of zeros.
    years.sort(key = lambda year: str(year).count("0"))
    
    1492
    1776
    1812
    1984
     476
    1945
    1066
    2019
    2001
    
  6. #Sort in order of the sum of the digits.
    #Create a list of 4 or 3 ints with a list comprehension.
    
    years.sort(key = lambda year: sum([int(c) for c in str(year)]))
    
    2001   (sum is 3)
    1812
    2019
    1066
    1492
     476
    1945
    1776
    1984   (sum is 22)
    
  7. #Sort in order of the rightmost digit.
    years.sort(key = lambda year: year % 10)
    
    2001
    1492
    1812
    1984
    1945
    1776
    1066
     476
    2019
    
  8. #Put the even numbers first, the odd numbers last.
    #Note that within each of these two groups, the numbers are not necessarily in
    #increasing order.  We'll fix this in the next example.
    
    years.sort(key = lambda year: year % 2)
    
    1492
    1776
    1066
    1812
    1984
     476
    2001
    2019
    1945
    

    Each score in the following example is a list of two items. A pair of two-item lists are compared by comparing their first items, breaking ties by comparing their second items. See the second bullet in “Lexicographical comparison” in Value comparisons. Later, we’ll change the [square brackets] to (parentheses), i.e., the score will be a two-item tuple instead of a two-item list.

    #Put the even numbers first, the odd numbers last.
    #Within each of these two groups, sort the numbers into increasing order.
    
    years.sort(key = lambda year: [year % 2, year])
    
     476
    1066
    1492
    1776
    1812
    1984
    1945
    2001
    2019
    
    #Put the even numbers first, the odd numbers last.
    #Within each of these two groups, sort the numbers into decreasing order.
    
    years.sort(key = lambda year: [year % 2, -year])
    
    1984
    1812
    1776
    1492
    1066
     476
    2019
    2001
    1945
    

Sort a list of strings

By default (i.e., in the absence of a key function), the sort method sorts strings into case-sensitive alphabetical order. That’s because the < operator, when placed between two strings (a < b), returns True if the two strings are in case-sensitive alphabetical order.

sortstrings.py

dog
dragon
hare
horse
monkey
ox
pig
rat
rooster
sheep
snake
tiger

Things to try

  1. Sort the strings in reverse alphabetical order.
    animals.sort(reverse = True)
    
    tiger
    snake
    sheep
    rooster
    rat
    pig
    ox
    monkey
    horse
    hare
    dragon
    dog
    
  2. Sort the strings in order of increasing (or decreasing) length.

    Since the scoring function returns an int, the scores are ints. And since the scores are ints, the items in the list are sorted in increasing numeric order of their scores.

    animals.sort(key = lambda animal: len(animal))   #unnecessarily complicated
    
    #Simpler way to do the same thing:
    #use the built-in function len directly as the scoring function.
    
    animals.sort(key = len)
    
    ox
    dog
    pig
    rat
    hare
    tiger
    snake
    horse
    sheep
    monkey
    dragon
    rooster
    
    animals.sort(key = len, reverse = True)
    
    rooster
    monkey
    dragon
    tiger
    snake
    horse
    sheep
    hare
    dog
    pig
    rat
    ox
    
  3. In case-sensitive alphabetical order, uppercase letters come before lowercase ones as in the ASCII chart.
    words = [
        "boy",
        "Cow",
        "Apple",
        "Boy",
        "apple",
        "cow"
    ]
    
    words.sort()   #Alphabetical order is case-sensitive by default.
    
    for word in words:
        print(word)
    
    Apple
    Boy
    Cow
    apple
    boy
    cow
    
  4. Until now, the score of each item has been a number. But scores can also be strings. When you take a course, for example, your grade could be one of "A", "B", "C", "D", "F".

    Let’s sort the strings in case-insensitive alphabetical order. Since this scoring function returns a string, the scores are strings. And since the scores are strings, the items in the list are sorted in case-sensitive alphabetical order of their scores. (Since the scores are all one case (lower), this case-sensitivity doesn’t matter. In the following code, upper would work just as well as lower.)

    words = [
        "boy",
        "Cow",
        "Apple",
        "Boy",
        "apple",
        "cow"
    ]
    
    words.sort(key = lambda word: word.lower())   #case-insensitive alphabetical order
    
    #Another way to make a lambda function that simply calls one method.  Remember to import operator.
    #words.sort(key = operator.methodcaller("lower"))
    
    for word in words:
        print(word)
    
    Apple
    apple
    boy
    Boy
    Cow
    cow
    

    In the above red and yellow groups, the capitalized word precedes the uncapitalized one. But in the above orange group, the the uncapitalized word precedes the capitalized one. We’ll fix this in the next exercise.

  5. Sort the strings in case-insensitive alphabetical order. If two or more strings are equal in case-insensitive alphabetical order, decide which of them goes first by sorting them in case-sensitive alphabetical order.
    words = [
        "boy",
        "Cow",
        "Apple",
        "Boy",
        "apple",
        "cow"
    ]
    
    words.sort(key = lambda word: [word.lower(), word])
    
    for word in words:
        print(word)
    

    Within the red group, the lines are now in case-insensitive alphabetical order (i.e., uppercase first). Ditto for the other groups.

    Apple
    apple
    Boy
    boy
    Cow
    cow
    
  6. Sort the strings in order of increasing length. Break any ties by sorting in order of how many e’s they contain. Each score is a list of two items. A pair of two-item lists are compared by comparing their first items, breaking ties by comparing their second items.
    def score(animal):
        return [len(animal), animal.count("e")]
    
    animals.sort(key = score)
    
    animals.sort(key = lambda animal: [len(animal), animal.count("e")])
    
    ox
    dog
    pig
    rat
    hare
    tiger
    snake
    horse
    sheep
    dragon
    monkey
    rooster
    

    Sort by length. Break ties by number of e’s. Break further ties by alphabetical order.

    animals.sort(key = lambda animal: [len(animal), animal.count("e"), animal])
    
    ox
    dog
    pig
    rat
    hare
    horse
    snake
    tiger
    sheep
    dragon
    monkey
    rooster
    
  7. Sort by last name, break ties by first name, break further ties by middle name. An absent middle name will come alphabetically before any present middle name.
    persons = [
        "John Adam Doe",
        "John Barry Smith",
        "John Adam Smith",
        "Jane Ann Doe",
        "Jane Ann Smith",
        "John Smith"
    ]
    
    def score(person):
        names = person.split()
        if len(names) == 2:     #no middle name
            names.insert(1, "") #Give them an empty middle name.
        return [names[2], names[0], names[1]]   #last, first, middle
    
    persons.sort(key = score)
    
    for i, person in enumerate(persons, start = 1):
        print(f"{i:2} {person}")
    
     1 Jane Ann Doe
     2 John Adam Doe
     3 Jane Ann Smith
     4 John Smith
     5 John Adam Smith
     6 John Barry Smith
    
  8. Let’s go back to plain vanilla alphabetical order. Sort the list of strings in alphabetical order. Print the original and new orders side by side.

    We must make a copy of the list and then sort the copy. But the following code does not make a copy of the list.

    animals = [
        "monkey",  # 0
        "rooster", # 1
        "dog",     # 2
        "pig",     # 3
        "rat",     # 4
        "ox",      # 5
        "tiger",   # 6
        "hare",    # 7
        "dragon",  # 8
        "snake",   # 9
        "horse",   #10
        "sheep"    #11
    ]
    
    sortedAnimals = animals   #Try (unsuccessfully) to make a copy of the list.
    sortedAnimals.sort()
    
    for animal, sortedAnimal in zip(animals, sortedAnimals):
        #Right pad the left column with spaces.
        print(f"{animal:7} {sortedAnimal}")
    
    dog     dog
    dragon  dragon
    hare    hare
    horse   horse
    monkey  monkey
    ox      ox
    pig     pig
    rat     rat
    rooster rooster
    sheep   sheep
    snake   snake
    tiger   tiger
    

    Here’s what went wrong. The Python script has only one list, and originally we could refer to (i.e., access) that list only by using the variable animals.

    Instead of making a copy of that list, the above assignment statement sortedAnimals = animals merely made it possible to refer to that list by using either one of the variables sortedAnimals or animals. But there is still only one list.

    So how can we create a copy of a list? If a variable contains a plain old number, the number can be copied simply by writing an equal sign:

    i = 10
    j = i   #Create a copy of the number.
    

    If a variable refers to a list, we have to do more than just write an equal sign in order to copy the list. We also have to call the copy function that belongs to the list.

    animals = [
        "monkey",  # 0
        "rooster", # 1
        "dog",     # 2
        "pig",     # 3
        "rat",     # 4
        "ox",      # 5
        "tiger",   # 6
        "hare",    # 7
        "dragon",  # 8
        "snake",   # 9
        "horse",   #10
        "sheep"    #11
    ]
    
    #Create a copy of the list.
    sortedAnimals = animals.copy() #or sortedAnimals = animals[:]
    
    sortedAnimals.sort()           #Now we can sort without disturbing the original list.
    
    for animal, sortedAnimal in zip(animals, sortedAnimals):
        #Right pad the left column with spaces.
        print(f"{animal:7} {sortedAnimal}")
    
    monkey  dog
    rooster dragon
    dog     hare
    pig     horse
    rat     monkey
    ox      ox
    tiger   pig
    hare    rat
    dragon  rooster
    snake   sheep
    horse   snake
    sheep   tiger
    
  9. Here’s a way to copy and sort a list at the same time. The following call to the sorted built-in function creates a new list named sortedAnimals and sorts this new list, without making any change to the original list animals. Unlike the sort function, sorted is an orphan: it does not belong to anything. In other words, there is no dot in front of its name.
    animals = [
        "monkey",  # 0
        "rooster", # 1
        "dog",     # 2
        "pig",     # 3
        "rat",     # 4
        "ox",      # 5
        "tiger",   # 6
        "hare",    # 7
        "dragon",  # 8
        "snake",   # 9
        "horse",   #10
        "sheep"    #11
    ]
    
    sortedAnimals = sorted(animals)
    assert len(sortedAnimals) == len(animals)   #just to explain what's going on
    
    for animal, sortedAnimal in zip(animals, sortedAnimals):
        print(f"{animal:7} {sortedAnimal}")
    
    monkey  dog
    rooster dragon
    dog     hare
    pig     horse
    rat     monkey
    ox      ox
    tiger   pig
    hare    rat
    dragon  rooster
    snake   sheep
    horse   snake
    sheep   tiger
    
  10. Sort the cards of a poker hand in order of increasing rank. Aces are high.

    Some of the cards are single-digit numbers, some are double-digit, and some are non-numeric, so I thought the simplest way to write the score function was to put all possible cards into the list named increasing. If index doesn’t find what it’s looking for, it will raise a ValueError exception, which means we should now put our call to sort between a try and an except. (Later we’ll write the score function more simply with a dict.)

    import sys
    
    cards = [
        "3",
        "J",
        "10",
        "7",
        "Q"
    ]
    
    #a list of 13 strings
    increasing = ["2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A"]
    
    def score(card):
        "Return an int in the range 0 to 12 inclusive, or raise ValueError."
        return increasing.index(card)
    
    try:
        cards.sort(key = score)
    except ValueError as error:
        print(error, file = sys.stderr)
        sys.exit(1)
    
    for card in cards:
        print(card)
    
    3
    7
    10
    J
    Q
    

    If the list of cards contains an illegal string such as "X", sort will raise a ValueError because index will raise a ValueError. The ValueError carries information about what went wrong. This information will be copied into the variable error and printed by the print under the keyword except:

    'X' is not in list
    
    A simpler way to do the same thing:
        cards.sort(key = lambda s: increasing.index(s))
    
  11. Sort days in chronological order, starting with Sunday and ending with Saturday.
    days = [
        "Wednesday",
        "Monday",
        "Tuesday",
        "Thursday",
        "Saturday",
        "Monday"
    ]
    
    chronologicalOrder = [
        "Sunday",    #0
        "Monday",    #1
        "Tuesday",   #2
        "Wednesday", #3
        "Thursday",  #4
        "Friday",    #5
        "Saturday"   #6
    ]
    
    days.sort(key = lambda day: chronologicalOrder.index(day))
    
    for day in days:
        print(day)
    
    Monday
    Monday
    Tuesday
    Wednesday
    Thursday
    Saturday
    
  12. Sort states in alphabetical order, except that “Miscellaneous” comes last. How could you make it come first?
    states = [
        "New YorK",
        "New Jersey",
        "Connecticut",
        "Miscellaneous",
        "Pennsylvania",
        "Massachusetts"
    ]
    
    states.sort(key = lambda state: "ZZZZZ" if state == "Miscellaneous" else state)
    
    for state in states:
        print(state)
    
    Connecticut
    Massachusetts
    New Jersey
    New YorK
    Pennsylvania
    Miscellaneous
    
  13. A trick you should know about. If you’re sorting strings that look like non-negative integers, and each string has the same number of digits, then an alphabetical sort will rearrange the strings into ascending numerical order. That’s because all characters, not just the letters, have a place in one big alphabetrical order; see the ASCII chart. This trick works only if all the integers have the same number of digits.
    years = [
        "1776",
        "1492",
        "1968",
        "1945",
        "2019",
        "1066"
    ]
    
    years.sort()   #alphabetical sort, because years is a list of strings.
    
    for year in years:
        print(year)
    
    1066
    1492
    1776
    1945
    1968
    2019
    

    The trick fails if the integers don’t all have the same length:

    years = [
        "1776",
        "1492",
        "1968",
         "476",
        "1945",
        "2019",
        "1066"
    ]
    
    years.sort()   #alphabetical sort, because years is a list of strings.
    
    for year in years:
        print(year)
    
    1066
    1492
    1776
    1945
    1968
    2019
    476
    

    Two ways to get the correct order if the integers do not all have the same number of digits:

    years = [
        "1776",
        "1492",
        "1968",
         "476",
        "1945",
        "2019",
        "1066"
    ]
    
    years.sort(key = lambda year: f"{year:>4}") #Left-pad the shorter strings with spaces.
    years.sort(key = lambda year: int(year))    #Perform a numerical sort.
    
    for year in years:
        print(year)
    
    476
    1066
    1492
    1776
    1945
    1968
    2019
    
  14. Sort a list of addresses by increasing zip code. Assume that the fourth field on each line is always the zip code, and assume that the zip code is always exactly five digits. Since all the zip codes are strings with the same number of characters, alphabetical order is the same as numerical order.
    addresses = [
        "350 Fifth Avenue,New York,NY,10118",           #Empire State Building
        "1600 Pennsylvania Avenue,Washington,DC,20500", #White House
        "89 East 42nd Street,New York,NY,10017",        #Grand Central Terminal
        "405 Lexington Avenue,New York,NY,10174",       #Chrysler Building
        "1000 Fifth Avenue,New York,NY,10028",          #Metropolitan Museum of Art
        "1491 Mill Run Road,Mill Run,PA,15464"          #Fallingwater
    ]
    
    def score(address):
        "Return the address's zip code."
        fields = address.split(",") #fields is a list of 4 strings
        return fields[3]            #fields[3] is a string of 5 characters that are digits
    
    addresses.sort(key = score)
    
    for address in addresses:
        print(address)
    
    89 East 42nd Street,New York,NY,10017
    1000 Fifth Avenue,New York,NY,10028
    350 Fifth Avenue,New York,NY,10118
    405 Lexington Avenue,New York,NY,10174
    1491 Mill Run Road,Mill Run,PA,15464
    1600 Pennsylvania Avenue,Washington,DC,20500
    
    A simpler way to do the same thing:
    addresses.sort(key = lambda address: address.split(",")[3])   #by zip code