Building lists – literals, appending, and comprehensions
If we've decided to create a collection based on each item's position in the container (a list), we have several ways of building this structure. We'll look at a number of ways we can assemble a list object from the individual items.
In some cases, we'll need a list because it allows duplicate values. This is common in statistical work, where we will have duplicates but don't require the index positions. A different structure, called a multiset, would be useful for a statistically oriented collection that permits duplicates. This kind of collection isn't built in as a distinct type (although collections.Counter is an excellent multiset, as long as the items are hashable), which often leads us to use a list object.
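As a rough illustration of the multiset idea, the following sketch counts duplicate values with collections.Counter. The sample values are made up for this example; only Counter itself comes from the text above.

```python
from collections import Counter

# Hypothetical sample of sizes containing a duplicate value (28).
sample_sizes = [1810, 28, 1790, 215, 45, 28]

# A Counter maps each distinct (hashable) item to the number of times it occurs.
size_counts = Counter(sample_sizes)

print(size_counts[28])             # 2
print(size_counts.most_common(1))  # [(28, 2)]

# elements() repeats each item by its count, so the duplicates are preserved,
# but the original index positions are not.
print(sorted(size_counts.elements()) == sorted(sample_sizes))  # True
```

The Counter keeps the duplicates but discards positions, which is why a list remains the usual choice when order or indexing might matter later.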
Getting ready
Let's say we need to do some statistical analyses of some file sizes. Here's a short script that will provide us with the sizes of some files:
>>> from pathlib import Path
>>> home = Path.cwd()
>>> for path in home.glob('data/*.csv'):
...     print(path.stat().st_size, path.name)
1810 wc1.csv
28 ex2_r12.csv
1790 wc.csv
215 sample.csv
45 craps.csv
28 output.csv
225 fuel.csv
166 waypoints.csv
412 summary_log.csv
156 fuel2.csv
We've used a pathlib.Path object to represent a directory in our filesystem. The glob() method expands all names that match a given pattern. In this case, we used a pattern of 'data/*.csv' to locate all CSV-formatted data files. We can use the for statement to assign each item to the path variable. The print() function displays the size from the file's OS stat data and the name from the Path instance, path.
We'd like to accumulate a list object that has the various file sizes. From that, we can compute the total size and average size. We can look for files that seem too large or too small.
How to do it...
We have many ways to create list objects:
- Literal: We can create a literal display of a list using a sequence of values surrounded by [] characters. It looks like this: [value, ... ]. Python needs to match the [ and ] to see a complete logical line, so the literal can span physical lines. For more information, refer to the Writing long lines of code recipe in Chapter 2, Statements and Syntax.
- Conversion Function: We can convert some other data collection into a list using the list() function. We can convert a set, or the keys of a dict, or the values of a dict. We'll look at a more sophisticated example of this in the Slicing and dicing a list recipe.
- Append Method: We have list methods that allow us to build a list one item at a time. These methods include append(), extend(), and insert(). We'll look at append() in the Building a list with the append() method section of this recipe. We'll look at the other methods in the How to do it… and There's more... sections of this recipe.
- Comprehension: A comprehension is a specialized generator expression that describes the items in a list using a sophisticated expression to define membership. We'll look at this in detail in the Writing a list comprehension section of this recipe.
- Generator Expression: We can use generator expressions to build list objects. This is a generalization of the idea of a list comprehension. We'll look at this in detail in the Using the list function on a generator expression section of this recipe.
The first two ways to create lists are single Python expressions. We won't provide recipes for these. The last three are more complex, and we'll show recipes for each of them.
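Although the first two techniques don't get their own recipes, a brief sketch may help show what they look like. The file names and sizes here are illustrative values only.

```python
# A list literal can span several physical lines because Python waits
# for the closing ] before the logical line is complete.
sizes_literal = [
    1810, 28, 1790,
    215, 45, 28,
]

# list() converts other collections into a list.
unique_sizes = list({28, 45})        # from a set; the order is arbitrary
name_to_size = {"wc1.csv": 1810, "craps.csv": 45}  # hypothetical data
names = list(name_to_size)           # the keys, in insertion order
values = list(name_to_size.values()) # the values, in insertion order

print(len(sizes_literal))  # 6
print(names)               # ['wc1.csv', 'craps.csv']
print(values)              # [1810, 45]
```

Because a set has no defined order, the list built from it should be sorted if a predictable sequence is required.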
Building a list with the append() method
- Create an empty list using literal syntax, [], or the list() function:
>>> file_sizes = []
- Iterate through some source of data. Append the items to the list using the append() method:
>>> home = Path.cwd()
>>> for path in home.glob('data/*.csv'):
...     file_sizes.append(path.stat().st_size)
>>> print(file_sizes)
[1810, 28, 1790, 160, 215, 45, 28, 225, 166, 39, 412, 156]
>>> print(sum(file_sizes))
5074
We used the path's glob() method to find all files that match the given pattern. The stat() method of a path provides the OS stat data structure, which includes the size, st_size, in bytes.
When we print the list, Python displays it in literal notation. This is handy if we ever need to copy and paste the list into another script.
It's very important to note that the append() method mutates the list object in place and returns None. It does not return the updated list.
Generally, almost all methods that mutate an object have no return value. Methods like append(), extend(), sort(), and reverse() have no return value. They adjust the structure of the list object itself. The notable exception is the pop() method, which mutates a collection and returns a value.
It's surprisingly common to see wrong code like this:
a = ['some', 'data']
a = a.append('more data')
This is emphatically wrong. This will set a to None. The correct approach is a statement like this, without any additional assignment:
a.append('more data')
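The following sketch makes the difference visible by capturing the results of append() and pop(). The variable names are chosen only for this illustration.

```python
a = ["some", "data"]

# append() mutates the list and returns None.
append_result = a.append("more data")
print(append_result)  # None
print(a)              # ['some', 'data', 'more data']

# pop() is the exception: it mutates the list and returns the removed item.
popped = a.pop()
print(popped)         # more data
print(a)              # ['some', 'data']
```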
Writing a list comprehension
The goal of a list comprehension is to build a list with a single expression that occupies the same syntactic role as a list literal:
- Write the wrapping [] brackets that surround the list object to be built.
- Write the source of the data. This will include the target variable. Note that there's no : at the end because we're not writing a complete statement:
for path in home.glob('data/*.csv')
- Prefix this with the expression to evaluate for each value of the target variable. Again, since this is only a single expression, we cannot use complex statements here:
[path.stat().st_size for path in home.glob('data/*.csv')]
In some cases, we'll need to add a filter. This is done with an if clause, included after the for clause. We can make the generator expression quite sophisticated.
Here's the entire list object construction:
>>> [path.stat().st_size
... for path in home.glob('data/*.csv')]
[1810, 28, 1790, 160, 215, 45, 28, 225, 166, 39, 412, 156]
Now that we've created a list object, we can assign it to a variable and do other calculations and summaries on the data.
The list comprehension is built around a central generator expression, called a comprehension in the language manual. The generator expression at the heart of the comprehension has a data expression clause and a for clause. Since this generator is an expression, not a complete statement, there are some limitations on what it can do. The data expression clause is evaluated repeatedly, driven by the variables assigned in the for clause.
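The if clause mentioned earlier in this section can be sketched as follows. To keep the example self-contained, it creates a temporary directory with a few files of known size; the file names, sizes, and the 100-byte threshold are all assumptions made for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical file names and sizes, created only for this illustration.
SAMPLE_FILES = {"a.csv": 1810, "b.csv": 28, "c.csv": 215, "notes.txt": 500}

with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)
    (home / "data").mkdir()
    for name, size in SAMPLE_FILES.items():
        (home / "data" / name).write_bytes(b"x" * size)

    # Same shape as the recipe, plus an if clause after the for clause.
    # sorted() is used because glob() does not promise any particular order.
    large_sizes = [
        path.stat().st_size
        for path in sorted(home.glob("data/*.csv"))
        if path.stat().st_size > 100
    ]

print(large_sizes)  # [1810, 215]
```

The .txt file is excluded by the glob pattern, and the 28-byte file is excluded by the if clause.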
Using the list function on a generator expression
We'll apply the list() function to a generator expression:
- Write the wrapping list() function that surrounds the generator expression.
- We'll reuse steps 2 and 3 from the list comprehension version to create a generator expression. Here's the generator expression:
list(path.stat().st_size for path in home.glob('data/*.csv'))
Here's the entire list object:
>>> list(path.stat().st_size
... for path in home.glob('data/*.csv'))
[1810, 28, 1790, 160, 215, 45, 28, 225, 166, 39, 412, 156]
Using the explicit list() function has an advantage when we consider the possibility of changing the data structure. We can easily replace list() with set(). In the case where we have a more advanced collection class, which is the subject of Chapter 6, User Inputs and Outputs, we may use one of our own customized collections here. List comprehension syntax, using [], can be a tiny bit harder to change because [] are used for many things in Python.
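As a small sketch of that substitution, the same generator expression can be handed to list() or to set(). The dictionary of names and sizes is hypothetical data standing in for the filesystem.

```python
# Hypothetical name-to-size mapping used in place of real files.
files = {"wc1.csv": 1810, "ex2_r12.csv": 28, "output.csv": 28, "readme.txt": 99}

# The generator expression is identical in both calls; only the
# surrounding function name changes.
as_list = list(size for name, size in files.items() if name.endswith(".csv"))
as_set = set(size for name, size in files.items() if name.endswith(".csv"))

print(as_list)         # [1810, 28, 28]
print(sorted(as_set))  # [28, 1810]
```

The list keeps the duplicate 28, while the set collapses it to a single member.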
How it works...
A Python list object has a dynamic size. The bounds of the array are adjusted when items are appended or inserted, or when the list is extended with another list. Similarly, the bounds shrink when items are popped or deleted. We can access any item very quickly, and the speed of access doesn't depend on the size of the list.
In rare cases, we might want to create a list with a given initial size, and then set the values of the items separately. We can do this with a list comprehension, like this:
sieve = [True for i in range(100)]
This will create a list with an initial size of 100 items, each of which is True. It's rare to need this, though, because lists can grow in size as needed. We might need this kind of initialization to implement the Sieve of Eratosthenes:
>>> sieve[0] = sieve[1] = False
>>> for p in range(100):
...     if sieve[p]:
...         for n in range(p*2, 100, p):
...             sieve[n] = False
>>> prime = [p for p in range(100) if sieve[p]]
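The sieve above can also be wrapped in a function so the limit isn't hard-coded. This is only a sketch: the name primes_below and its limit parameter are invented for this example, and it uses [True] * limit as a shorthand for the comprehension, which is safe here because True is immutable.

```python
def primes_below(limit: int) -> list[int]:
    """Return all primes less than limit using a pre-sized list of flags."""
    if limit < 2:
        return []
    # Pre-size the list; every position starts out as a prime candidate.
    sieve = [True] * limit
    sieve[0] = sieve[1] = False
    for p in range(2, limit):
        if sieve[p]:
            # Mark every multiple of p, starting at 2*p, as not prime.
            for n in range(p * 2, limit, p):
                sieve[n] = False
    return [p for p in range(limit) if sieve[p]]


print(primes_below(20))        # [2, 3, 5, 7, 11, 13, 17, 19]
print(len(primes_below(100)))  # 25
```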
The list comprehension syntax, using [], and the list() function both consume items from a generator and append them to create a new list object.
There's more...
A common goal for creating a list object is to be able to summarize it. We can use a variety of Python functions for this. Here are some examples:
>>> sizes = list(path.stat().st_size
... for path in home.glob('data/*.csv'))
>>> sum(sizes)
5074
>>> max(sizes)
1810
>>> min(sizes)
28
>>> from statistics import mean
>>> round(mean(sizes), 3)
422.833
We've used the built-in sum(), min(), and max() functions, along with statistics.mean(), to produce some descriptive statistics of these file sizes. Which of these files is the smallest? We want to know the position of the minimum in the list of values. We can use the index() method for this:
>>> sizes.index(min(sizes))
1
We found the minimum, and then used the index() method to locate the position of that minimal value.
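One detail worth noting is that index() reports only the first matching position. In this data the minimum value, 28, occurs twice. The sketch below, which copies the sizes shown earlier into a literal, uses enumerate() to find every position.

```python
sizes = [1810, 28, 1790, 160, 215, 45, 28, 225, 166, 39, 412, 156]

smallest = min(sizes)

# index() stops at the first match.
first_position = sizes.index(smallest)

# A comprehension with enumerate() collects every matching position.
all_positions = [i for i, size in enumerate(sizes) if size == smallest]

print(first_position)  # 1
print(all_positions)   # [1, 6]
```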
Other ways to extend a list
We can extend a list object, as well as insert an item into the middle or beginning of a list. We have two ways to extend a list: we can use the + operator or we can use the extend() method. Here's an example of creating two lists and putting them together with +:
>>> home = Path.cwd()
>>> ch3 = list(path.stat().st_size
... for path in home.glob('Chapter_03/*.py'))
>>> ch4 = list(path.stat().st_size
... for path in home.glob('Chapter_04/*.py'))
>>> len(ch3)
12
>>> len(ch4)
16
>>> final = ch3 + ch4
>>> len(final)
28
>>> sum(final)
61089
We have created a list of sizes of files with names that match Chapter_03/*.py. We then created a second list of sizes of files that match a slightly different name pattern, Chapter_04/*.py. We then combined the two lists into a final list.
We can do this using the extend() method as well. We'll reuse the two lists and build a new list from them:
>>> final_ex = []
>>> final_ex.extend(ch3)
>>> final_ex.extend(ch4)
>>> len(final_ex)
28
>>> sum(final_ex)
61089
Previously, we noted that the append() method does not return a value. Similarly, the extend() method does not return a value either. Like append(), the extend() method mutates the list object "in-place."
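The practical difference between + and extend() shows up when another variable refers to the same list. The following sketch uses short made-up lists in place of the chapter file sizes.

```python
# Short hypothetical lists standing in for the two chapters' file sizes.
ch3 = [10, 20]
ch4 = [30]

# The + operator builds a new list and leaves both operands unchanged.
final = ch3 + ch4
print(final)  # [10, 20, 30]
print(ch3)    # [10, 20]

# extend() changes the existing object, so every reference sees the change.
alias = ch3
extend_result = ch3.extend(ch4)
print(extend_result)  # None
print(alias)          # [10, 20, 30]
```

If other parts of a program hold a reference to the original list, + avoids surprising them, while extend() avoids building a second list.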
We can insert a value prior to any particular position in a list as well. The insert() method accepts the position of an item; the new value will be before the given position:
>>> p = [3, 5, 11, 13]
>>> p.insert(0, 2)
>>> p
[2, 3, 5, 11, 13]
>>> p.insert(3, 7)
>>> p
[2, 3, 5, 7, 11, 13]
We've inserted two new values into a list object. As with the append() and extend() methods, the insert() method does not return a value. It mutates the list object.
See also
- Refer to the Slicing and dicing a list recipe for ways to copy lists and pick sublists from a list.
- Refer to the Deleting from a list – deleting, removing, popping, and filtering recipe for other ways to remove items from a list.
- In the Reversing a copy of a list recipe, we'll look at reversing a list.
- This wiki page provides some insights into how Python collections work internally:
https://wiki.python.org/moin/TimeComplexity. When looking at the tables, it's important to note that the expression O(1) means the cost is essentially constant, regardless of the size of the collection. The expression O(n) means the cost grows in proportion to the size of the collection. For some list operations, such as inserting or deleting an item, the actual cost also depends on the position of the item we're trying to process.