
Common pitfalls

Python is meant to be a clear and readable language, free of ambiguity and unexpected behavior. Unfortunately, these goals are not achievable in all cases, which is why Python has a few corner cases where it might do something different from what you expect.

This section will show you some issues that you might encounter when writing Python code.

Scope matters!

There are a few cases in Python where you might not be using the scope that you expect. Two examples are default function arguments and attributes declared in a class body.

Function arguments

The following example shows a case that breaks due to a careless choice in default parameters:

def spam(key, value, list_=[], dict_={}):
    list_.append(value)
    dict_[key] = value

    print('List: %r' % list_)
    print('Dict: %r' % dict_)

spam('key 1', 'value 1')
spam('key 2', 'value 2')

You would probably expect the following output:

List: ['value 1']
Dict: {'key 1': 'value 1'}
List: ['value 2']
Dict: {'key 2': 'value 2'}

But it's actually this:

List: ['value 1']
Dict: {'key 1': 'value 1'}
List: ['value 1', 'value 2']
Dict: {'key 1': 'value 1', 'key 2': 'value 2'}

The reason is that default values are evaluated only once, when the function is defined, so list_ and dict_ are shared between all calls that rely on the defaults. The only time this is useful is if you are doing something hacky, so please avoid using mutable objects as default parameters in a function.
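
To make the sharing visible, here is a minimal sketch (a reduced version of the spam function above, not the original listing) that inspects the function's __defaults__ attribute, which is where Python stores the default values:

```python
def spam(value, list_=[]):
    # The default list is created once, when the def statement runs,
    # and is stored on the function object itself.
    list_.append(value)
    return list_


first = spam('value 1')
second = spam('value 2')

# Both calls returned the very same list object.
print(first is second)      # True
print(spam.__defaults__)    # (['value 1', 'value 2'],)
```

Every call that omits list_ mutates the object held in __defaults__, which is why the values accumulate.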

The safe alternative of the same example is as follows:

def spam(key, value, list_=None, dict_=None):
    if list_ is None:
        list_ = []

    if dict_ is None:
        dict_ = {}

    list_.append(value)
    dict_[key] = value

Class properties

The problem also occurs when defining classes. It is very easy to mix class attributes and instance attributes. Especially when coming from other languages such as C#, this can be confusing. Let's illustrate it:

class Spam(object):
    list_ = []
    dict_ = {}

    def __init__(self, key, value):
        self.list_.append(value)
        self.dict_[key] = value

        print('List: %r' % self.list_)
        print('Dict: %r' % self.dict_)


Spam('key 1', 'value 1')
Spam('key 2', 'value 2')

As with the function arguments, the list and the dictionary are shared. So, the output is as follows:

List: ['value 1']
Dict: {'key 1': 'value 1'}
List: ['value 1', 'value 2']
Dict: {'key 1': 'value 1', 'key 2': 'value 2'}

A better alternative is to initialize the mutable objects within the __init__ method of the class. This way, they are not shared between instances:

class Spam(object):
    def __init__(self, key, value):
        self.list_ = [value]
        self.dict_ = {key: value}

        print('List: %r' % self.list_)
        print('Dict: %r' % self.dict_)

Another important thing to note when dealing with classes is that a class property will be inherited, and that's where things might prove to be confusing. When inheriting, the original properties will stay (unless overwritten), even in subclasses:

>>> class A(object):
...     spam = 1

>>> class B(A):
...     pass

With regular inheritance, the spam attribute of both A and B is 1, as you would expect:

>>> A.spam
1
>>> B.spam
1

Assigning 2 to A.spam now modifies B.spam as well:

>>> A.spam = 2

>>> A.spam
2
>>> B.spam
2

While this is to be expected due to inheritance, someone else using the class might not expect the variable to change in the meantime. After all, we modified A.spam, not B.spam.

There are two easy ways to prevent this. It is obviously possible to simply set spam for every class separately. But the better solution is never to modify class properties. It's easy to forget that the property will change in multiple locations, and if it has to be modifiable anyway, it's usually better to put it in an instance variable instead.
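
Both options can be sketched as follows. The class names B and C are only illustrative: B sets spam separately, and C keeps the value in an instance variable:

```python
class A(object):
    spam = 1


class B(A):
    # B defines its own attribute, so later changes to A.spam
    # no longer show up in B.
    spam = 1


class C(A):
    def __init__(self):
        # Per-instance state lives on the instance, not on the class.
        self.spam = 1


A.spam = 2
c = C()
print(A.spam, B.spam, c.spam)  # 2 1 1
```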

Modifying variables in the global scope

A common problem when accessing variables from the global scope is that assigning to a variable anywhere in a function makes it local to that function, even if a global variable with the same name exists.

This works:

>>> spam = 1

>>> def eggs():
...     print('Spam: %r' % spam)

>>> eggs()
Spam: 1

But the following does not:

>>> spam = 1

>>> def eggs():
...     spam += 1
...     print('Spam: %r' % spam)

>>> eggs()
Traceback (most recent call last):
 ...
UnboundLocalError: local variable 'spam' referenced before assignment

The problem is that spam += 1 effectively translates to spam = spam + 1, and any assignment to spam inside a function makes the variable local to that function. Since the local variable has not been assigned a value yet when spam + 1 is evaluated, reading it fails. For these cases, there is the global statement, although I would really recommend that you avoid globals altogether.
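
For completeness, here is a minimal sketch of both approaches. The function name eggs_without_global is made up for this illustration:

```python
spam = 1


def eggs():
    # Without this declaration, the assignment below would make
    # spam local to eggs() and raise UnboundLocalError.
    global spam
    spam += 1
    return spam


def eggs_without_global(value):
    # Usually preferable: pass the value in and return the result.
    return value + 1


print(eggs())                     # 2
print(eggs_without_global(spam))  # 3
```

The second version has no hidden dependency on module state, which makes it easier to test and reuse.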

Overwriting and/or creating extra built-ins

While it can be useful in some cases, you will generally want to avoid overwriting built-in functions. The PEP8 convention for a name that would otherwise clash with a built-in statement, function, or variable is to append a trailing underscore.

So, do not use this:

list = [1, 2, 3]

Instead, use the following:

list_ = [1, 2, 3]

For lists and such, this is just a good convention. For statements such as from, import, and with, it's a requirement. Forgetting about this can lead to very confusing errors:

>>> list = list((1, 2, 3))
>>> list
[1, 2, 3]

>>> list((4, 5, 6))
Traceback (most recent call last):
 ...
TypeError: 'list' object is not callable

>>> import = 'Some import'
Traceback (most recent call last):
 ...
SyntaxError: invalid syntax
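
If you are unsure whether a name is safe to use, the standard library can tell you. The following is only a sketch; check_name is a hypothetical helper, not part of any library:

```python
import builtins
import keyword


def check_name(name):
    '''Classify a candidate variable name (hypothetical helper)'''
    if keyword.iskeyword(name):
        # Assigning to a keyword is a SyntaxError
        return 'keyword'
    if hasattr(builtins, name):
        # Legal, but the built-in gets shadowed
        return 'built-in'
    return 'free'


for name in ('import', 'list', 'list_'):
    print('%s: %s' % (name, check_name(name)))
```

This prints keyword for import, built-in for list, and free for list_.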

If you actually want to define a built-in that is available everywhere, it's possible. For debugging purposes, I've been known to add this code to a project while developing:

import builtins
import inspect
import pprint
import re


def pp(*args, **kwargs):
    '''PrettyPrint function that prints the variable name when
    available and pprints the data'''
    name = None
    # Fetch the current frame from the stack
    frame = inspect.currentframe().f_back
    # Prepare the frame info
    frame_info = inspect.getframeinfo(frame)

    # Walk through the lines of the function
    for line in frame_info[3]:
        # Search for the pp() function call with a fancy regexp
        m = re.search(r'\bpp\s*\(\s*([^)]*)\s*\)', line)
        if m:
            print('# %s:' % m.group(1), end=' ')
            break

    pprint.pprint(*args, **kwargs)

builtins.pf = pprint.pformat
builtins.pp = pp

This is much too hacky for production code, but it is still useful when working on a large project where you need print statements to debug. Alternative (and better) debugging solutions can be found in Chapter 11, Debugging – Solving the Bugs.

The usage is quite simple:

x = 10
pp(x)

Here is the output:

# x: 10

Modifying while iterating

At one point or another, you will run into this problem: modifying a mutable object such as a list, dict, or set while iterating over it. For a dict or a set, this results in a RuntimeError telling you that the object changed size during iteration. A list does not raise an error, but items are silently skipped, which is arguably worse:

dict_ = {'spam': 'eggs'}
list_ = ['spam']
set_ = {'spam', 'eggs'}

for key in dict_:
    del dict_[key]

for item in list_:
    list_.remove(item)

for item in set_:
    set_.remove(item)

This can be avoided by iterating over a copy of the object. The most convenient option is to use the list function:

dict_ = {'spam': 'eggs'}
list_ = ['spam']
set_ = {'spam', 'eggs'}

for key in list(dict_):
    del dict_[key]

for item in list(list_):
    list_.remove(item)

for item in list(set_):
    set_.remove(item)
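
When the goal is to filter a collection rather than empty it, building a new object is often simpler than removing items in place. The following sketch uses made-up data, and its first loop shows how a list skips items when it is modified during iteration:

```python
list_ = [1, 2, 3, 4]

# No exception for a list, but every second item is skipped
# because the remaining items shift to the left.
for item in list_:
    list_.remove(item)

print(list_)  # [2, 4]

# Building new objects avoids the problem entirely.
numbers = [1, 2, 3, 4]
odd_numbers = [n for n in numbers if n % 2]

dict_ = {'spam': 'eggs', 'ham': 'bacon'}
dict_ = {k: v for k, v in dict_.items() if k != 'spam'}

print(odd_numbers)  # [1, 3]
print(dict_)        # {'ham': 'bacon'}
```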

Catching exceptions – differences between Python 2 and 3

With Python 3, catching an exception and storing it has been made more obvious with the as keyword. The problem is that many people are still used to the except Exception, variable syntax, which doesn't work anymore. Luckily, the Python 3 syntax has been backported to Python 2, so you can use the following syntax everywhere:

try:
    ... # do something here
except (ValueError, TypeError) as e:
    print('Exception: %r' % e)

Another important difference is that Python 3 makes this variable local to the except block, so it cannot be used after the block has ended:

def spam(value):
    try:
        value = int(value)
    except ValueError as exception:
        print('We caught an exception: %r' % exception)

    return exception


spam('a')

You might expect that since we get an exception here, this works; but actually, it doesn't, because exception does not exist at the point of the return statement.

The actual output is as follows:

We caught an exception: ValueError("invalid literal for int() with base 10: 'a'",)
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    spam('a')
  File "test.py", line 11, in spam
    return exception
UnboundLocalError: local variable 'exception' referenced before assignment

Personally, I would argue that the preceding code is broken in any case: what if no exception is raised at all? The return statement would fail with the same error. Luckily, the fix is simple; just write the value to a variable outside of the except block. One important thing to note here is that you need to save it explicitly under a different name. Merely defining the variable before the try block does not work either:

def spam(value):
    exception = None
    try:
        value = int(value)
    except ValueError as exception:
        print('We caught an exception: %r' % exception)

    return exception

We really need to save it explicitly because Python 3 automatically deletes anything saved with as variable at the end of the except block. The reason for this is that exceptions in Python 3 contain a __traceback__ attribute. Having this attribute makes it much more difficult for the garbage collector to handle, as it introduces a reference cycle (the exception references the traceback, the traceback references the stack frame, and the frame's local variables reference the exception again). To solve this, Python essentially does the following:

exception = None
try:
    value = int(value)
except ValueError as exception:
    try:
        print('We caught an exception: %r' % exception)
    finally:
        del exception

The solution is simple enough, luckily, but you should keep in mind that keeping the exception around can introduce memory leaks into your program. The Python garbage collector is smart enough to understand that the variables are not visible anymore and will delete them eventually, but it can take a lot more time. How the garbage collection actually works is covered in Chapter 12, Performance – Tracking and Reducing Your Memory and CPU Usage. Here is the working version of the code:

def spam(value):
    exception = None
    try:
        value = int(value)
    except ValueError as e:
        exception = e
        print('We caught an exception: %r' % exception)

    return exception

Late binding – be careful with closures

Closures are a method of implementing local scopes in code. They make it possible to locally define variables without overriding variables in the parent (or global) scope and hide the variables from the outside scope later. The problem with closures in Python is that variables are bound as late as possible: a name from the surrounding scope is looked up when the inner function is called, not when it is defined. While generally useful, it does have some unexpected side effects:

eggs = [lambda a: i * a for i in range(3)]

for egg in eggs:
    print(egg(5))

The expected result? Should be something along the lines of this, right?

0
5
10

No, unfortunately not. This is similar to how class inheritance works with properties. Due to late binding, the variable i is looked up in the surrounding scope at call time, and not when the lambda is defined. By the time the functions are called, the loop has finished and i is 2.

The actual result is as follows:

10
10
10

So what to do instead? As with the cases mentioned earlier, the variable needs to be made local. One alternative is to force immediate binding by partially applying the function with functools.partial:

import functools


eggs = [functools.partial(lambda i, a: i * a, i) for i in range(3)]

for egg in eggs:
    print(egg(5))

A better solution would be to avoid binding problems altogether by not introducing extra scopes (the lambda) that use external variables. If both i and a were specified as arguments to the lambda, this would not be a problem.
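
One common way to do that, sketched here as an illustration rather than as the only option, is to pass i as a default argument so that its value is captured when the lambda is created. The make_multiplier function is a hypothetical name for the equivalent factory-function approach:

```python
# The default value i=i is evaluated when each lambda is created,
# so every function keeps its own copy of i.
eggs = [lambda a, i=i: i * a for i in range(3)]
results = [egg(5) for egg in eggs]
print(results)  # [0, 5, 10]


def make_multiplier(i):
    # i is a local variable of this call, so each returned
    # function closes over a different i.
    return lambda a: i * a


more_eggs = [make_multiplier(i) for i in range(3)]
more_results = [egg(5) for egg in more_eggs]
print(more_results)  # [0, 5, 10]
```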

Circular imports

Even though Python is fairly tolerant towards circular imports, there are some cases where you will get errors.

Let's assume we have two files.

eggs.py:

from spam import spam


def eggs():
    print('This is eggs')
    spam()

spam.py:

from eggs import eggs


def spam():
    print('This is spam')


if __name__ == '__main__':
    eggs()

Running spam.py will result in a circular import error:

Traceback (most recent call last):
  File "spam.py", line 1, in <module>
    from eggs import eggs
  File "eggs.py", line 1, in <module>
    from spam import spam
  File "spam.py", line 1, in <module>
    from eggs import eggs
ImportError: cannot import name 'eggs'

There are a few ways to work around this. Restructuring the code is usually the best way to go, but the best solution depends on the problem. In the preceding case, it can be solved easily. Just use module imports instead of function imports (which I recommend regardless of circular imports).

eggs.py:

import spam


def eggs():
    print('This is eggs')
    spam.spam()

spam.py:

import eggs


def spam():
    print('This is spam')


if __name__ == '__main__':
    eggs.eggs()

An alternative solution is to move the imports within the functions so that they occur at runtime. This is not the prettiest solution but it does the trick in many cases.

eggs.py:

def eggs():
    from spam import spam
    print('This is eggs')
    spam()

spam.py:

def spam():
    from eggs import eggs
    print('This is spam')


if __name__ == '__main__':
    # eggs is not imported at module level, so import it here
    from eggs import eggs
    eggs()

Lastly, there is the solution of moving the imports below the code that actually uses them. This is generally not recommended because it can make it non-obvious where the imports are, but I still find it preferable to having the imports within the functions.

eggs.py:

def eggs():
    print('This is eggs')
    spam()


from spam import spam

spam.py:

def spam():
    print('This is spam')


from eggs import eggs


if __name__ == '__main__':
    eggs()

And yes, there are still other solutions such as dynamic imports. One example of this is how the Django ForeignKey fields support strings instead of actual classes. But those are generally a really bad idea to use since they will be checked only at runtime. Because of this, bugs will surface only when the code that uses them is executed, not when the code is modified. So please try to avoid these whenever possible, or make sure you add proper automated tests to prevent unexpected bugs. Especially when they cause circular imports internally, they become an enormous pain to debug.

Import collisions

One problem that can be extremely confusing is having colliding imports, that is, multiple packages or modules with the same name. I have had more than a few bug reports on my packages where, for example, people tried to use my numpy-stl project, which resides in a package named stl, from a test file named stl.py. The result: it was importing itself instead of the stl package. While this particular case is difficult to avoid, within packages a relative import is generally the better option, because it also tells other programmers that the import comes from the local package instead of another package. So, instead of writing import spam, write from . import spam. This way, the code will always load from the current package instead of any global package that happens to have the same name.

In addition to this there is also the problem of packages being incompatible with each other. Common names might be used by several packages, so be careful when installing those packages. When in doubt, just create a new virtual environment and try again. Doing this can save you a lot of debugging.
