Learning to log for robust error checking

Notebooks are useful to keep track of what you did and what went wrong. Logging works in a similar fashion, and we can log errors and other useful information with the standard Python logging library.

For reproducible data analysis, it helps to know which versions of the modules our Python scripts import. In this recipe, I introduce a minimal API from dautil that logs the package versions of imported modules on a best-effort basis.

Getting ready

In this recipe, we import NumPy and pandas, so you may need to install them. See the Configuring pandas recipe for pandas installation instructions. Installation instructions for NumPy can be found at http://docs.scipy.org/doc/numpy/user/install.html (retrieved July 2015). Alternatively, install NumPy with pip using the following command:

$ [sudo] pip install numpy

The command for Anaconda users is as follows:

$ conda install numpy

I have installed NumPy 1.9.2 via Anaconda. We also require AppDirs to find the appropriate directory to store logs. Install it with the following command:

$ [sudo] pip install appdirs

I have AppDirs 1.4.0 on my system.

How to do it...

To log, we need to create and set up loggers. We can either set up the loggers with code or use a configuration file. Configuring loggers with code is the more flexible option, but configuration files tend to be more readable. I use the log.conf configuration file from dautil:

[loggers]
keys=root

[handlers]
keys=consoleHandler,fileHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler,fileHandler

[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)

[handler_fileHandler]
class=dautil.log_api.VersionsLogFileHandler
formatter=simpleFormatter
args=('versions.log',)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=%d-%b-%Y
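
For comparison, here is a rough sketch of a similar setup configured with code instead of a configuration file. It is not part of dautil: the build_logger() helper is hypothetical, a plain logging.FileHandler writing to a temporary directory stands in for dautil.log_api.VersionsLogFileHandler, and a named logger is configured rather than the root logger so that the example does not disturb other logging. An in-memory stream plays the role of the screen.

```python
import io
import logging
import os
import tempfile


def build_logger(name, log_path, stream):
    """Console handler at INFO, file handler with no level of its own."""
    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s',
        datefmt='%d-%b-%Y')

    console = logging.StreamHandler(stream)
    console.setLevel(logging.INFO)
    console.setFormatter(formatter)

    # Assumption: a plain FileHandler replaces VersionsLogFileHandler here.
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(formatter)

    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # keep the sketch isolated from the root logger
    logger.addHandler(console)
    logger.addHandler(file_handler)

    return logger, file_handler


screen = io.StringIO()  # captures what would otherwise go to sys.stdout
log_path = os.path.join(tempfile.mkdtemp(), 'versions.log')
logger, file_handler = build_logger('config_sketch', log_path, screen)

logger.debug('only in the file')
logger.info('in the file and on the screen')
file_handler.close()

with open(log_path) as f:
    file_text = f.read()
```

After this runs, file_text holds both messages, whereas screen.getvalue() holds only the INFO message, because the console handler filters out anything below INFO.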

The file configures the root logger at the DEBUG level with two handlers. The console handler is limited to INFO and above, while the file handler sets no level of its own and therefore records everything the root logger passes on, so more ends up in the file than on the screen. The file also specifies the format of the log messages. I created a tiny API in dautil, which creates a logger with its get_logger() function and uses it to log the package versions of a client program with its log() function. The code is in the log_api.py file of dautil:

from pkg_resources import get_distribution
from pkg_resources import resource_filename
import logging
import logging.config
import pprint
from appdirs import AppDirs
import os


def get_logger(name):
    log_config = resource_filename(__name__, 'log.conf')
    logging.config.fileConfig(log_config)
    logger = logging.getLogger(name)

    return logger


def shorten(module_name):
    dot_i = module_name.find('.')

    return module_name[:dot_i]


def log(modules, name):
    skiplist = ['pkg_resources', 'distutils']

    logger = get_logger(name)
    logger.debug('Inside the log function')

    for k in modules.keys():
        str_k = str(k)

        if '.version' in str_k:
            short = shorten(str_k)

            if short in skiplist:
                continue

            try:
                logger.info('%s=%s' % (short,
                            get_distribution(short).version))
            except ImportError:
                # Pass the name as a logging argument with a placeholder
                logger.warning('Could not import %s', short)


class VersionsLogFileHandler(logging.FileHandler):
    def __init__(self, fName):
        dirs = AppDirs("PythonDataAnalysisCookbook", 
                       "Ivan Idris")
        path = dirs.user_log_dir
        print(path)

        if not os.path.exists(path):
            # The log directory can be nested, so create parents as well
            os.makedirs(path)

        super(VersionsLogFileHandler, self).__init__(
              os.path.join(path, fName))
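
The listing above relies on pkg_resources, which recent setuptools releases mark as deprecated. As a hedged alternative, and not what dautil does, the same best-effort lookup can be sketched with importlib.metadata from the standard library (Python 3.8 and later). The package_version() helper and the distribution names below are made up for illustration. Note that a distribution name can differ from the name of the module you import, which is one reason why the lookup is only best effort.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(name):
    """Return the installed version of a distribution, or None if unknown."""
    try:
        return version(name)
    except PackageNotFoundError:
        # Best effort: an unknown distribution is reported, not fatal.
        return None


found = package_version('pip')  # None if pip is not installed
missing = package_version('no-such-distribution-xyz')  # hypothetical name
```

A caller can log a warning whenever the helper returns None and log the version string otherwise.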

The program that uses the API is in the log_demo.py file in this book's code bundle:

import sys
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from dautil import log_api

log_api.log(sys.modules, sys.argv[0])

How it works...

We configured a handler (VersionsLogFileHandler) that writes to a file and a handler (StreamHandler) that displays messages on the screen. StreamHandler is a class in the Python standard library. To configure the format of the log messages, we defined a formatter named simpleFormatter in log.conf; because that section names no class, it is an instance of the standard library's logging.Formatter class.

The API I made goes through modules listed in the sys.modules variable and tries to get the versions of the modules. Some of the modules are not relevant for data analysis, so we skip them. The log() function of the API logs a DEBUG level message with the debug() method. The info() method logs the package version at INFO level.
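
The selection step can be illustrated in isolation. The following is a minimal sketch, assuming the same rule as log(): keep module names that contain '.version', shorten them to the top-level package, and drop the names on the skip list. The versioned_packages() helper is hypothetical, and unlike log() it removes duplicates and sorts the result. The sample keys are written by hand and are not real output from sys.modules.

```python
def versioned_packages(module_names,
                       skiplist=('pkg_resources', 'distutils')):
    """Collect top-level package names that expose a '.version' submodule."""
    found = set()

    for name in module_names:
        if '.version' in name:
            # Same effect as shorten(): keep the text before the first dot.
            top_level = name.split('.', 1)[0]

            if top_level not in skiplist:
                found.add(top_level)

    return sorted(found)


# Hand-written sample keys (assumption), standing in for sys.modules.keys().
sample = ['numpy', 'numpy.version', 'pandas', 'distutils.version',
          'os.path', 'matplotlib.pyplot']
packages = versioned_packages(sample)
```

With this sample, packages contains only 'numpy': the pandas entry has no '.version' key in the sample, and distutils is on the skip list.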

See also
