- Mastering Python
- Rick van Hattem
- 2021-07-16 11:10:32
Common pitfalls
Python is a language meant to be clear and readable without any ambiguities and unexpected behaviors. Unfortunately, these goals are not achievable in all cases, and that is why Python does have a few corner cases where it might do something different than what you were expecting.
This section will show you some issues that you might encounter when writing Python code.
Scope matters!
There are a few cases in Python where you might not be using the scope that you are actually expecting. Some examples are when declaring a class and with function arguments.
The following example shows a case that breaks due to a careless choice in default parameters:
    def spam(key, value, list_=[], dict_={}):
        list_.append(value)
        dict_[key] = value

        print('List: %r' % list_)
        print('Dict: %r' % dict_)

    spam('key 1', 'value 1')
    spam('key 2', 'value 2')
You would probably expect the following output:
    List: ['value 1']
    Dict: {'key 1': 'value 1'}
    List: ['value 2']
    Dict: {'key 2': 'value 2'}
But it's actually this:
    List: ['value 1']
    Dict: {'key 1': 'value 1'}
    List: ['value 1', 'value 2']
    Dict: {'key 1': 'value 1', 'key 2': 'value 2'}
The reason is that list_ and dict_ are actually shared between multiple calls. The only time this is actually useful is if you are doing something hacky, so please avoid using mutable objects as default parameters in a function.
The safe alternative of the same example is as follows:
    def spam(key, value, list_=None, dict_=None):
        # Create fresh objects on every call instead of sharing defaults
        if list_ is None:
            list_ = []

        if dict_ is None:
            dict_ = {}

        list_.append(value)
        dict_[key] = value
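To see why the mutable defaults are shared, it can help to look at where Python keeps them. The following is a minimal sketch; append_to is a hypothetical function used only for illustration. Default values are evaluated once, when the def statement runs, and are stored on the function object in its __defaults__ attribute.

```python
def append_to(value, list_=[]):
    # The default list was created once, when the function was defined
    list_.append(value)
    return list_


# The default object lives on the function itself
defaults_before = append_to.__defaults__[0]

first = append_to('value 1')
second = append_to('value 2')

# Both calls returned the very same list object
print(first is second)           # True
print(first is defaults_before)  # True
print(append_to.__defaults__)    # (['value 1', 'value 2'],)
```

Using None as the default sidesteps this because None is immutable; the real list is then created inside the function body on every call.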
The problem also occurs when defining classes. It is very easy to mix class attributes and instance attributes. Especially when coming from other languages such as C#, this can be confusing. Let's illustrate it:
    class Spam(object):
        list_ = []
        dict_ = {}

        def __init__(self, key, value):
            self.list_.append(value)
            self.dict_[key] = value

            print('List: %r' % self.list_)
            print('Dict: %r' % self.dict_)

    Spam('key 1', 'value 1')
    Spam('key 2', 'value 2')
As with the function arguments, the list and dictionaries are shared. So, the output is as follows:
    List: ['value 1']
    Dict: {'key 1': 'value 1'}
    List: ['value 1', 'value 2']
    Dict: {'key 1': 'value 1', 'key 2': 'value 2'}
A better alternative is to initialize the mutable objects within the __init__ method of the class. This way, they are not shared between instances:
    class Spam(object):
        def __init__(self, key, value):
            # Each instance gets its own list and dict
            self.list_ = [value]
            self.dict_ = {key: value}

            print('List: %r' % self.list_)
            print('Dict: %r' % self.dict_)
Another important thing to note when dealing with classes is that a class property will be inherited, and that's where things might prove to be confusing. When inheriting, the original properties will stay (unless overwritten), even in subclasses:
    >>> class A(object):
    ...     spam = 1

    >>> class B(A):
    ...     pass

Regular inheritance, the spam attribute of both A and B is 1 as you would expect.

    >>> A.spam
    1
    >>> B.spam
    1

Assigning 2 to A.spam now modifies B.spam as well:

    >>> A.spam = 2
    >>> A.spam
    2
    >>> B.spam
    2
While this is to be expected due to inheritance, someone else using the class might not suspect the variable to change in the meantime. After all, we modified A.spam, not B.spam.
There are two easy ways to prevent this. It is obviously possible to simply set spam for every class separately. But the better solution is never to modify class properties. It's easy to forget that the property will change in multiple locations, and if it has to be modifiable anyway, it's usually better to put it in an instance variable instead.
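As a small illustration of the instance variable approach, the sketch below uses a hypothetical class named Bacon. Assigning through an instance creates an attribute on that instance only, so the class attribute and every other instance stay untouched.

```python
class Bacon(object):
    # Class attribute: a single value shared by the class and its instances
    spam = 1


a = Bacon()
b = Bacon()

# Assigning through an instance creates an instance attribute on `a` only
a.spam = 2

print(a.spam)      # 2, found in the instance dictionary
print(b.spam)      # 1, still read from the class
print(Bacon.spam)  # 1, the class attribute is unchanged
print(vars(a))     # {'spam': 2}
print(vars(b))     # {}
```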
A common problem when accessing variables from the global scope is that assigning to a variable anywhere in a function makes it local to that function, even if you meant to use the global variable.
This works:
    >>> spam = 1
    >>> def eggs():
    ...     print('Spam: %r' % spam)

    >>> eggs()
    Spam: 1
But the following does not:
    >>> spam = 1
    >>> def eggs():
    ...     spam += 1
    ...     print('Spam: %r' % spam)

    >>> eggs()
    Traceback (most recent call last):
        ...
    UnboundLocalError: local variable 'spam' referenced before assignment
The problem is that spam += 1 actually translates to spam = spam + 1, and anything containing spam = makes the variable local to your scope. Since that local variable has not been assigned yet when spam + 1 is evaluated, you are trying to read a name that has no value. For these cases, there is the global statement, although I would really recommend that you avoid globals altogether.
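For completeness, here is a minimal sketch of both options. The function names eggs_global and eggs_pure are hypothetical. The first uses the global statement; the second avoids globals by passing the value in and returning the result.

```python
spam = 1


def eggs_global():
    # Declare that `spam` refers to the module-level variable
    global spam
    spam += 1
    return spam


def eggs_pure(value):
    # Alternative without globals: take the value in, hand the result back
    return value + 1


after_global = eggs_global()
print(after_global)      # 2
print(spam)              # 2, the module-level variable was changed

print(eggs_pure(spam))   # 3
print(spam)              # 2, nothing outside the function was touched
```

The second form is usually easier to test and reason about, because the function has no hidden effect on module state.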
Overwriting and/or creating extra built-ins
While it can be useful in some cases, generally you will want to avoid overwriting global functions. The PEP8 convention for names that would otherwise collide with built-in statements, functions, and variables is to append a trailing underscore.
So, do not use this:
list = [1, 2, 3]
Instead, use the following:
list_ = [1, 2, 3]
For lists and such, this is just a good convention. For statements such as from, import, and with, it's a requirement. Forgetting about this can lead to very confusing errors:
    >>> list = list((1, 2, 3))
    >>> list
    [1, 2, 3]
    >>> list((4, 5, 6))
    Traceback (most recent call last):
        ...
    TypeError: 'list' object is not callable

    >>> import = 'Some import'
    Traceback (most recent call last):
        ...
    SyntaxError: invalid syntax
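If a built-in name has already been shadowed, the original object is still reachable through the builtins module. The sketch below wraps the shadowing in a hypothetical function, shadowing_demo, so that the damage stays local to that function.

```python
import builtins


def shadowing_demo():
    # This local name hides the built-in list inside the function
    list = [1, 2, 3]

    try:
        list((4, 5, 6))
    except TypeError as error:
        message = str(error)

    # The original is always available through the builtins module
    restored = builtins.list((4, 5, 6))
    return message, restored


message, restored = shadowing_demo()
print(message)   # 'list' object is not callable
print(restored)  # [4, 5, 6]
```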
If you actually want to define a built-in that is available everywhere, it's possible. For debugging purposes, I've been known to add this code to a project while developing:
    import builtins
    import inspect
    import pprint
    import re

    def pp(*args, **kwargs):
        '''PrettyPrint function that prints the variable name when
        available and pprints the data'''
        name = None
        # Fetch the current frame from the stack
        frame = inspect.currentframe().f_back
        # Prepare the frame info
        frame_info = inspect.getframeinfo(frame)

        # Walk through the lines of the function
        for line in frame_info[3]:
            # Search for the pp() function call with a fancy regexp
            m = re.search(r'\bpp\s*\(\s*([^)]*)\s*\)', line)
            if m:
                print('# %s:' % m.group(1), end=' ')
                break

        pprint.pprint(*args, **kwargs)

    builtins.pf = pprint.pformat
    builtins.pp = pp
Much too hacky for production code, but it is still useful when working on a large project where you need print statements to debug. Alternative (and better) debugging solutions can be found in Chapter 11, Debugging – Solving the Bugs.
The usage is quite simple:
    x = 10
    pp(x)
Here is the output:
# x: 10
Modifying while iterating
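At one point or another, you will run into this problem: while iterating through mutable objects such as lists, dicts, or sets, you should not modify them. Dicts and sets raise a RuntimeError telling you that the object changed size during iteration. Lists raise no error at all but silently skip items, which is arguably worse:
    dict_ = {'spam': 'eggs'}
    list_ = ['spam']
    set_ = {'spam', 'eggs'}

    # RuntimeError: dictionary changed size during iteration
    for key in dict_:
        del dict_[key]

    # No exception, but with more than one item some would be skipped
    for item in list_:
        list_.remove(item)

    # RuntimeError: Set changed size during iteration
    for item in set_:
        set_.remove(item)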
This can be avoided by copying the object. The most convenient option is to use the list function:
    dict_ = {'spam': 'eggs'}
    list_ = ['spam']
    set_ = {'spam', 'eggs'}

    for key in list(dict_):
        del dict_[key]

    for item in list(list_):
        list_.remove(item)

    for item in list(set_):
        set_.remove(item)
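The difference between the container types is easy to miss, so here is a short sketch of what happens without the copy. The variable names are chosen for illustration only. The last part shows a common alternative when the goal is filtering: build a new list instead of removing items in place.

```python
# A list raises nothing, but the iterator's index and the shrinking list
# get out of step, so every other item is skipped
numbers = [1, 2, 3, 4]
for number in numbers:
    numbers.remove(number)
print(numbers)  # [2, 4]

# A dict notices the size change and raises on the next iteration step
dict_ = {'spam': 'eggs'}
try:
    for key in dict_:
        del dict_[key]
except RuntimeError as e:
    error_message = str(e)
print(error_message)  # dictionary changed size during iteration

# When the goal is filtering, building a new list avoids the issue
values = [1, 2, 3, 4]
odd_values = [value for value in values if value % 2]
print(odd_values)  # [1, 3]
```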
Catching exceptions – differences between Python 2 and 3
With Python 3, catching an exception and storing it has been made more obvious with the as keyword. The problem is that many people are still used to the except Exception, variable syntax, which doesn't work anymore. Luckily, the Python 3 syntax has been backported to Python 2, so now you can use the following syntax everywhere:
    try:
        ...  # do something here
    except (ValueError, TypeError) as e:
        print('Exception: %r' % e)
Another important difference is that Python 3 makes this variable local to the except block, so it is no longer available once the try/except block has finished. The following example tries to use it later anyway:
    def spam(value):
        try:
            value = int(value)
        except ValueError as exception:
            print('We caught an exception: %r' % exception)

        return exception

    spam('a')
You might expect that since we get an exception here, this works; but actually, it doesn't, because exception does not exist at the point of the return statement.
The actual output is as follows:
    We caught an exception: ValueError("invalid literal for int() with base 10: 'a'",)
    Traceback (most recent call last):
      File "test.py", line 14, in <module>
        spam('a')
      File "test.py", line 11, in spam
        return exception
    UnboundLocalError: local variable 'exception' referenced before assignment
Personally I would argue that the preceding code is broken in any case: what if there isn't an exception somehow? It would have raised the same error. Luckily, the fix is simple; just write the value to a variable outside of the scope. One important thing to note here is that you explicitly need to save the variable to the parent scope. This code does not work either:
    def spam(value):
        exception = None
        try:
            value = int(value)
        except ValueError as exception:
            print('We caught an exception: %r' % exception)

        return exception
We really need to save it explicitly because Python 3 automatically deletes anything saved with as variable at the end of the except block. The reason for this is that exceptions in Python 3 contain a __traceback__ attribute. Having this attribute makes it much more difficult for the garbage collector to handle as it introduces a recursive self-referencing cycle (exception -> traceback -> exception -> traceback… ad nauseam). To solve this, Python essentially does the following:
    exception = None
    try:
        value = int(value)
    except ValueError as exception:
        try:
            print('We caught an exception: %r' % exception)
        finally:
            del exception
The solution is simple enough, luckily, but you should keep in mind that keeping a reference to the exception like this can introduce memory leaks into your program. The Python garbage collector is smart enough to understand that the variables are not visible anymore and will delete them eventually, but it can take a lot more time. How the garbage collection actually works is covered in Chapter 12, Performance – Tracking and Reducing Your Memory and CPU Usage. Here is the working version of the code:
    def spam(value):
        exception = None
        try:
            value = int(value)
        except ValueError as e:
            exception = e
            print('We caught an exception: %r' % exception)

        return exception
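The implicit deletion can be observed directly. In the following sketch, name_survives is a hypothetical helper that checks whether the name bound by as is still present in the local namespace after the except block has finished.

```python
def name_survives():
    try:
        int('a')
    except ValueError as exception:
        # Inside the block the name is bound as usual
        inside = 'exception' in locals()

    # After the block, Python 3 has already deleted the name
    after = 'exception' in locals()
    return inside, after


print(name_survives())  # (True, False)
```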
Late binding – be careful with closures
Closures are a method of implementing local scopes in code. They make it possible to locally define variables without overriding variables in the parent (or global) scope and hide the variables from the outside scope later. The problem with closures in Python is that Python tries to bind its variables as late as possible for performance reasons. While generally useful, it does have some unexpected side effects:
    eggs = [lambda a: i * a for i in range(3)]

    for egg in eggs:
        print(egg(5))
The expected result? Should be something along the lines of this, right?
    0
    5
    10
No, unfortunately not. This is similar to how class inheritance works with properties. Due to late binding, the variable i is looked up in the surrounding scope at call time, and not when the lambda is actually defined.
The actual result is as follows:
    10
    10
    10
So what to do instead? As with the cases mentioned earlier, the variable needs to be made local. One alternative is to force immediate binding by currying the function with partial:
    import functools

    eggs = [functools.partial(lambda i, a: i * a, i) for i in range(3)]

    for egg in eggs:
        print(egg(5))
A better solution would be to avoid binding problems altogether by not introducing extra scopes (the lambda) that use external variables. If both i and a were specified as arguments to lambda, this would not be a problem.
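One way to make i an argument of the lambda while still calling it with a single value is to give it a default. This is a common idiom rather than something specific to this chapter, so treat the sketch below as one possible approach. Default values are evaluated when the lambda is created, so each function keeps the value that i had at that moment.

```python
# `i=i` binds the current loop value as a default argument at definition time
eggs = [lambda a, i=i: i * a for i in range(3)]

results = [egg(5) for egg in eggs]
print(results)  # [0, 5, 10]
```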
Circular imports
Even though Python is fairly tolerant towards circular imports, there are some cases where you will get errors.
Let's assume we have two files.
eggs.py:
    from spam import spam

    def eggs():
        print('This is eggs')
        spam()
spam.py:
    from eggs import eggs

    def spam():
        print('This is spam')

    if __name__ == '__main__':
        eggs()
Running spam.py will result in a circular import error:
    Traceback (most recent call last):
      File "spam.py", line 1, in <module>
        from eggs import eggs
      File "eggs.py", line 1, in <module>
        from spam import spam
      File "spam.py", line 1, in <module>
        from eggs import eggs
    ImportError: cannot import name 'eggs'
There are a few ways to work around this. Restructuring the code is usually the best way to go, but the best solution depends on the problem. In the preceding case, it can be solved easily. Just use module imports instead of function imports (which I recommend regardless of circular imports).
eggs.py:
    import spam

    def eggs():
        print('This is eggs')
        spam.spam()
spam.py:
    import eggs

    def spam():
        print('This is spam')

    if __name__ == '__main__':
        eggs.eggs()
An alternative solution is to move the imports within the functions so that they occur at runtime. This is not the prettiest solution but it does the trick in many cases.
eggs.py:
    def eggs():
        from spam import spam
        print('This is eggs')
        spam()
spam.py:
    def spam():
        print('This is spam')

    if __name__ == '__main__':
        # Imported where it is used; spam() itself does not need eggs
        from eggs import eggs
        eggs()
Lastly, there is the solution of moving the imports below the code that actually uses them. This is generally not recommended because it can make it non-obvious where the imports are, but I still find it preferable to having the import within the function calls.
eggs.py:
    def eggs():
        print('This is eggs')
        spam()

    from spam import spam
spam.py:
    def spam():
        print('This is spam')

    from eggs import eggs

    if __name__ == '__main__':
        eggs()
And yes, there are still other solutions such as dynamic imports. One example of this is how the Django ForeignKey fields support strings instead of actual classes. But those are generally a really bad idea to use since they will be checked only at runtime. Because of this, bugs will surface only when the code that uses them is executed, not when the code is modified. So please try to avoid these whenever possible, or make sure you add proper automated tests to prevent unexpected bugs. Especially when they cause circular imports internally, they become an enormous pain to debug.
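To illustrate why string-based references are only checked at runtime, here is a minimal sketch built on importlib.import_module from the standard library. The helper load_object is hypothetical and is not how Django resolves its references; it only shows the general pattern.

```python
import importlib


def load_object(dotted_path):
    # Hypothetical helper: split 'package.module.name' into module and attribute
    module_name, _, attribute = dotted_path.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# Nothing is checked when the strings are written down
path = 'json.loads'
typo = 'json.laods'

# The lookup only happens when the string is resolved
loads = load_object(path)
print(loads('[1, 2, 3]'))  # [1, 2, 3]

try:
    load_object(typo)
except AttributeError as e:
    # The typo goes unnoticed until this line actually runs
    print('Failed at runtime: %s' % e)
```

A regular import statement with the same typo would fail as soon as the module is loaded, which is why automated tests matter more when references are resolved this way.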
Import collisions
One problem that can be extremely confusing is having colliding imports: multiple packages/modules with the same name. I have had more than a few bug reports on my packages where, for example, people tried to use my numpy-stl project, which resides in a package named stl, from a test file named stl.py. The result: it was importing itself instead of the stl package. While this case is difficult to avoid, at least within packages, a relative import is generally a better option. This is because it also tells other programmers that the import comes from the local scope instead of another package. So, instead of writing import spam, write from . import spam. This way, the code will always load from the current package instead of any global package that happens to have the same name.
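When a collision is suspected, a quick check is to ask Python which file it would load for a given name. The sketch below uses importlib.util.find_spec from the standard library; module_origin is a hypothetical helper name, and json is used only because it is always available.

```python
import importlib.util


def module_origin(name):
    # Hypothetical helper: report the file a top-level import would load
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


# Prints the path of the file that `import json` would use
print(module_origin('json'))

# A top-level name that cannot be found yields None
print(module_origin('no_such_module_for_this_example'))  # None
```

If the printed path points at a file in your own project instead of the installed package, a local module is shadowing it.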
In addition to this there is also the problem of packages being incompatible with each other. Common names might be used by several packages, so be careful when installing those packages. When in doubt, just create a new virtual environment and try again. Doing this can save you a lot of debugging.