- Practical Data Analysis Using Jupyter Notebook
- Marc Wintjen Andrew Vlahutin
- 2021-06-18 18:59:00
Exploring Python packages
Before wrapping up this chapter, let's explore the Python packages required for data analysis and verify that they are available in the Jupyter Notebook app. These packages have evolved over time and are open source, so programmers can contribute to and improve the source code.
We will go into more depth about each package as we use its awesome features in future chapters. The focus in this chapter is to verify that the specific libraries are available. There are a few different approaches, such as inspecting the installation folder on your workstation for specific files or running commands from a Python command line. I find the easiest method is to run a few simple commands in a new notebook.
Navigate back to the notebooks folder and create a new notebook file by clicking the New menu and selecting Python 3 in the submenu, which creates a default Untitled notebook. To stay consistent with best practices, be sure to rename the notebook verify_python_packages before moving forward.
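As a rough illustration of the command-line approach, the following sketch uses only the Python standard library to check whether a package can be located without actually importing it. The helper name is_available is an assumption made for this example and does not come from the book.

```python
from importlib.util import find_spec


def is_available(package_name):
    """Return True if Python can locate the top-level package (hypothetical helper)."""
    # find_spec returns None when no module with this name can be found.
    return find_spec(package_name) is not None


# Import names of the packages verified in this chapter.
for name in ["pandas", "numpy", "sklearn", "matplotlib", "scipy"]:
    status = "available" if is_available(name) else "missing"
    print(f"{name}: {status}")
```

The same lines can be typed into a Python prompt or saved as a script; the notebook-based method described next gives the same answer one package at a time.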
Checking for pandas
The steps to verify whether each Python package is available are similar, with slight variations in the code. The first one is pandas, which makes it easier to carry out common data analysis techniques such as pivoting, cleaning, merging, and grouping datasets all in one place without going back to the source of record.
To verify whether the pandas library is available in Jupyter, follow these steps:
1. Type in import pandas as pd in the In []: cell.
2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
   - Click the Run button on the toolbar.
   - Select Run Cells from the Cell menu.
   - Press the Shift + Enter or Ctrl + Enter keys.
3. Type in the pd.__version__ command in the next In []: cell.
4. Run the cell using the preferred method from step 2.
5. Verify that the output cell, displayed as Out [], shows the installed version number.
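Put together, the two cells from the steps above might look like the following sketch. In a notebook, a bare pd.__version__ expression on the last line of a cell is displayed automatically as the output; the pandas_version variable and the print call are added here only so the snippet also runs as a plain script. The exact version string depends on your installation.

```python
# Cell 1: import pandas under its usual alias.
import pandas as pd

# Cell 2: read the version string of the installed library.
pandas_version = pd.__version__
print(pandas_version)
```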
Now repeat these steps for each of the remaining packages used in this book: numpy, sklearn, matplotlib, and scipy. Note that I import each library under a short alias to stay consistent with common practice in the industry.
For example, pandas has been shortened to pd, so when you call features from a library, you can just use the shortcut name.
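To make the alias idea concrete, here is a minimal sketch that calls pandas features through the pd shortcut. The column names and values are made-up sample data for illustration only.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
df = pd.DataFrame({
    "region": ["east", "west", "east"],
    "sales": [10, 20, 5],
})

# Every pandas feature is reached through the alias: pd.DataFrame, not pandas.DataFrame.
totals = df.groupby("region")["sales"].sum()
print(totals)
```

Writing pandas.DataFrame after a plain import pandas would work just as well; the alias simply saves typing and matches what you will see in most examples.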
Checking for NumPy
NumPy is a powerful and widely used mathematical extension of Python created to perform fast numeric calculations on a collection of values known as an array. We will learn more about the power of NumPy features in Chapter 3, Getting Started with NumPy.
To verify whether the numpy library is available in Jupyter, follow these steps:
1. Type in import numpy as np in the In []: cell.
2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
   - Click the Run button on the toolbar.
   - Select Run Cells from the Cell menu.
   - Press the Shift + Enter or Ctrl + Enter keys.
3. Type in the np.__version__ command in the next In []: cell.
4. Run the cell using the preferred method from step 2.
5. Verify that the output cell, displayed as Out [], shows the installed version number.
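The following sketch combines the version check with a small taste of what an array is. The sample values are arbitrary and chosen only for illustration.

```python
import numpy as np

# The version string depends on your installation.
print(np.__version__)

# An array holds a collection of values and applies arithmetic to all of them at once.
values = np.array([1, 2, 3, 4])
doubled = values * 2
print(doubled)        # [2 4 6 8]
print(values.mean())  # 2.5
```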
Checking for sklearn
sklearn (the import name of the scikit-learn package) is an advanced open source data science library used for tasks such as clustering and regression analysis. While we will not leverage all of the advanced capabilities of this library, having it installed will make future lessons easier.
To verify whether the sklearn library is available in Jupyter, follow these steps:
1. Type in import sklearn as sk in the In []: cell.
2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
   - Click the Run button on the toolbar.
   - Select Run Cells from the Cell menu.
   - Press the Shift + Enter or Ctrl + Enter keys.
3. Type in the sk.__version__ command in the next In []: cell.
4. Run the cell using the preferred method from step 2.
5. Verify that the output cell, displayed as Out [], shows the installed version number.
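One detail can trip up this check: the library is imported as sklearn, but it is distributed under the name scikit-learn. As a hedged alternative to the steps above, this sketch looks up the installed version by distribution name using the standard library (Python 3.8 or later is assumed). The helper installed_version is a hypothetical name introduced for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Note the distribution name here, not the import name sklearn.
sklearn_version = installed_version("scikit-learn")
print(sklearn_version)
```

If this prints None, the package is not installed in the environment that Jupyter is using.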
Checking for Matplotlib
Matplotlib is a Python library used for data visualization and plotting charts.
To verify whether the matplotlib library is available in Jupyter, follow these steps:
1. Type in import matplotlib as mp in the In []: cell.
2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
   - Click the Run button on the toolbar.
   - Select Run Cells from the Cell menu.
   - Press the Shift + Enter or Ctrl + Enter keys.
3. Type in the mp.__version__ command in the next In []: cell.
4. Run the cell using the preferred method from step 2.
5. Verify that the output cell, displayed as Out [], shows the installed version number.
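If a package is missing, the import cell stops with an error instead of finishing silently. The sketch below shows one way to make that outcome explicit for matplotlib; the message variable and its wording are assumptions made for this example.

```python
try:
    import matplotlib as mp
except ImportError:
    # Raised when the library cannot be found in the current environment.
    mp = None

if mp is None:
    message = "matplotlib is not installed in this environment"
else:
    message = f"matplotlib version: {mp.__version__}"

print(message)
```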
Checking for SciPy
SciPy is a library that depends on NumPy and includes additional mathematical functions used for data analysis.
To verify whether the scipy library is available in Jupyter, follow these steps:
1. Type in import scipy as sc in the In []: cell.
2. Run the cell using the preferred method discussed earlier in the Installing Python and using Jupyter Notebook section:
   - Click the Run button on the toolbar.
   - Select Run Cells from the Cell menu.
   - Press the Shift + Enter or Ctrl + Enter keys.
3. Type in the sc.__version__ command in the next In []: cell.
4. Run the cell using the preferred method from step 2.
5. Verify that the output cell, displayed as Out [], shows the installed version number.
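Since the same check is repeated for every package, it can also be written once as a loop. This is a minimal sketch rather than the book's method; the names PACKAGES, collect_versions, and report are assumptions introduced for this example.

```python
import importlib

# Import names of the packages verified in this chapter.
PACKAGES = ["pandas", "numpy", "sklearn", "matplotlib", "scipy"]


def collect_versions(package_names):
    """Map each package name to its version string, or None if unavailable."""
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is not installed in this environment.
            versions[name] = None
            continue
        # Fall back to None if a module does not expose __version__.
        versions[name] = getattr(module, "__version__", None)
    return versions


report = collect_versions(PACKAGES)
for name, found in report.items():
    print(f"{name}: {found if found is not None else 'not available'}")
```

Running this in a single cell gives a quick summary, although stepping through the packages one at a time, as above, makes it easier to see exactly which import fails.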
Once you have completed all of the steps, your notebook should look similar to the following screenshot:
