Authors: Brian Lipp, Shubhadeep Roychowdhury, Dr. Tirthajyoti Sarkar
Updated: 2021-06-18 18:11:51
NumPy Arrays
A NumPy array is similar to a list but differs in some ways. In the life of a data scientist, reading and manipulating an array is of prime importance, and it is also the most frequently encountered task. These arrays could be a one-dimensional list, a multi-dimensional table, or a matrix full of numbers and can be used for a variety of mathematical calculations.
An array could be filled with integers, floating-point numbers, Booleans, strings, or even mixed types. However, in the majority of cases, numeric data types are predominant. Some example scenarios where you will need to handle numeric arrays are as follows:
To read a list of phone numbers and postal codes and extract a certain pattern
To create a matrix with random numbers to run a Monte Carlo simulation on a statistical process
To scale and normalize a sales figure table, with lots of financial and transactional data
To create a smaller table of key descriptive statistics (for example, mean, median, min/max range, variance, and inter-quartile ranges) from a large raw data table
To read in and analyze daily time series data in a one-dimensional array, such as the stock price of an organization over a year or daily temperature data from a weather station
In short, arrays and numeric data tables are everywhere. As a data wrangling professional, the importance of the ability to read and process numeric arrays cannot be overstated. It is very common to work with data and need to modify it with a mathematical function. In this regard, NumPy arrays are the most important objects in Python that you need to know about.
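Two of the scenarios listed above, normalizing a numeric table and condensing it into descriptive statistics, can be sketched in a few lines. This is only an illustration: the random table, its size, and the seed are assumptions made for this example, not data used elsewhere in the chapter.

```python
import numpy as np

# Seeded generator so the sketch is reproducible; the seed value is arbitrary.
rng = np.random.default_rng(seed=42)

# Hypothetical "raw data table": 1,000 rows and 3 columns of random numbers.
raw = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))

# Scale each column to zero mean and unit standard deviation.
normalized = (raw - raw.mean(axis=0)) / raw.std(axis=0)

# Quartiles per column, used for the median and the inter-quartile range.
q1, median, q3 = np.percentile(raw, [25, 50, 75], axis=0)

# A smaller table of descriptive statistics: one row per statistic,
# one column per column of the raw table.
summary = np.vstack([
    raw.mean(axis=0),   # mean
    median,             # median
    raw.min(axis=0),    # minimum
    raw.max(axis=0),    # maximum
    raw.var(axis=0),    # variance
    q3 - q1,            # inter-quartile range
])

print(summary.shape)
```

Every statistic here is computed along axis=0, that is, down each column, so the summary table has one column per column of the raw data.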
NumPy Arrays and Features
NumPy and SciPy are open source add-on modules for Python that provide common mathematical and numerical routines as fast, pre-compiled functions. Over the years, these have grown into highly mature libraries that provide functionality that meets, or perhaps exceeds, what is associated with common commercial software such as MATLAB or Mathematica.
One of the main advantages of the NumPy module is that it can be used to handle or create one-dimensional or multi-dimensional arrays. This advanced data structure/class is at the heart of the NumPy package and it serves as the fundamental building block of more advanced concepts, such as the pandas library and specifically, the pandas DataFrame, which we will cover shortly in this chapter.
NumPy arrays differ from ordinary Python lists, even though a list can be thought of as a simple array. NumPy arrays are built for vectorized mathematical operations that process large amounts of numerical data in a single line of code. Many of NumPy's built-in mathematical functions are written in low-level languages such as C or Fortran and are pre-compiled for fast execution.
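To make the idea of a vectorized operation concrete, here is a minimal sketch that scales the same numbers once with a plain list and once with a NumPy array. The prices and the scaling factor are hypothetical values chosen for illustration.

```python
import numpy as np

prices = [10.0, 20.0, 30.0]  # hypothetical values

# Plain Python list: every element must be visited explicitly,
# here with a list comprehension.
scaled_list = [p * 1.1 for p in prices]

# NumPy array: one expression applies the multiplication to every element.
scaled_array = np.array(prices) * 1.1

# A built-in NumPy function is likewise applied element-wise in one call.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

print(scaled_array)
print(roots)
```

Both approaches give the same numbers; the array version simply moves the loop out of Python and into NumPy's compiled code.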
Note
NumPy arrays are optimized data structures for numerical analysis, and that's why they are so important to data scientists.
Let's go through the first exercise in this chapter, where we will learn how to create a NumPy array from a list.
Exercise 3.01: Creating a NumPy Array (from a List)
In this exercise, we will create a NumPy array from a list. We're going to define a list first and use the array function of the NumPy library to convert the list into an array. Next, we'll read from a .csv file and store the data in a NumPy array using the genfromtxt function of the NumPy library. To do so, let's go through the following steps:
To work with NumPy, we must import it. By convention, we give it a short alias, np, when importing it. This keeps references to the objects in the NumPy package short and consistent:
import numpy as np
Create a list with three elements: 1, 2, and 3:
list_1 = [1,2,3]
list_1
The output is as follows:
[1, 2, 3]
Use the array function to convert it into an array:
array_1 = np.array(list_1)
array_1
The output is as follows:
array([1, 2, 3])
We just created a NumPy array object called array_1 from the regular Python list object, list_1.
Create an array of floating-point elements, that is, 1.2, 3.4, and 5.6, using the array function directly:
a = np.array([1.2, 3.4, 5.6])
a
The output is as follows:
array([1.2, 3.4, 5.6])
Let's check the type of the newly created object, a, using the type function:
type(a)
The output is as follows:
numpy.ndarray
Use the type function to check the type of array_1:
type(array_1)
The output is as follows:
numpy.ndarray
As we can see, both a and array_1 are NumPy arrays.
Now, use type on list_1:
type(list_1)
The output is as follows:
list
As we can see, list_1 is essentially a Python list and we have used the array function of the NumPy library to create a NumPy array from that list.
Now, let's read a .csv file as a NumPy array using the genfromtxt function of the NumPy library:
data = np.genfromtxt('../datasets/stock.csv',
                     delimiter=',', names=True, dtype=None,
                     encoding='ascii')
data
Note
The path (highlighted) should be specified based on the location of the file on your system. The stock.csv file can be found here: https://packt.live/2YK0XB2.
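If the stock.csv file is not at hand, the same call can be tried on an in-memory stand-in. The column names and values below are invented for illustration and are not the contents of the real file. This sketch assumes only that genfromtxt accepts a file-like object in place of a path.

```python
import io
import numpy as np

# Hypothetical stand-in for a small CSV file with a header row.
csv_text = """Symbol,Price,Volume
AAA,10.5,100
BBB,20.0,200
CCC,30.25,300
"""

# names=True takes the field names from the header row, and
# dtype=None lets NumPy infer a type for each column separately.
data = np.genfromtxt(io.StringIO(csv_text),
                     delimiter=',', names=True, dtype=None,
                     encoding='ascii')

print(data.dtype.names)        # field names taken from the header
print(data['Price'])           # one column, selected by name
print(data['Price'].mean())    # numeric columns support the usual math
```

Because names=True is set, the result is a structured array: each row is a record, and a column is selected by its field name rather than by position.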
From this exercise, we can observe that the NumPy array is different from the regular list object. The most important point to keep in mind is that NumPy arrays do not have the same methods as lists and that they are essentially designed for mathematical functions.
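The difference in available methods can be checked directly. The following sketch reuses the names list_1 and array_1 from the exercise; the remaining variable names are introduced here for illustration only.

```python
import numpy as np

list_1 = [1, 2, 3]
array_1 = np.array(list_1)

# Lists grow in place with append; a NumPy array has no such method.
list_1.append(4)
has_append = hasattr(array_1, 'append')

# The np.append function returns a new array and leaves array_1 unchanged.
array_2 = np.append(array_1, 4)

# Arrays instead offer numerical methods that lists lack.
total = array_1.sum()
mean = array_1.mean()

print(has_append, array_2, total, mean)
```

Note that array_1 was created before list_1 was extended, so it still holds only the original three elements.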
NumPy arrays are like mathematical objects – vectors. They are built for element-wise operations, that is, when we add two NumPy arrays, we add the first element of the first array to the first element of the second array – there is an element-to-element correspondence in this operation. This is in contrast to Python lists, where the elements are simply appended and there is no element-to-element relation. This is the real power of a NumPy array: they can be treated just like mathematical vectors.
A vector is a collection of numbers that can represent, for example, the coordinates of a point in three-dimensional space or the color (RGB values) of a pixel in a picture. Naturally, relative order is important for such a collection and, as we discussed previously, a NumPy array maintains this order. That's why NumPy arrays are well suited to numerical computations.
With this knowledge, we're going to perform the addition operation on NumPy arrays in the next exercise.
Exercise 3.02: Adding Two NumPy Arrays
This simple exercise will demonstrate the addition of two NumPy arrays using the + notation, and thereby show the key difference between a regular Python list/array and a NumPy array. Let's perform the following steps:
Import the NumPy library:
import numpy as np
Declare a Python list called list_1 and a NumPy array called array_1:
list_1 = [1,2,3]
array_1 = np.array(list_1)
Use the + notation to concatenate two list_1 objects and save the results in list_2:
list_2 = list_1 + list_1
list_2
The output is as follows:
[1, 2, 3, 1, 2, 3]
Apply the same + notation to two array_1 objects and save the result in array_2. This time, the elements are added rather than concatenated:
array_2 = array_1 + array_1
array_2
The output is as follows:
array([2, 4, 6])
Load a .csv file and add it to itself:
data = np.genfromtxt('../datasets/numbers.csv',
                     delimiter=',', names=True)
data = data.astype('float64')
data + data
Note
The path (highlighted) should be specified based on the location of the file on your system. The .csv file that will be used is numbers.csv; this can be found at: https://packt.live/30Om2wC.
Did you notice the difference? The first print shows a list with 6 elements, [1, 2, 3, 1, 2, 3], but the second print shows another NumPy array (or vector) with the elements [2, 4, 6], which are just the sum of the individual elements of array_1. As we discussed earlier, NumPy arrays are perfectly designed to perform element-wise operations since there is element-to-element correspondence.
NumPy arrays even support element-wise exponentiation. For example, suppose there are two arrays – the elements of the first array will be raised to the power of the elements in the second array.
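As a minimal sketch of element-wise exponentiation, the following raises each element of one array to the power given by the matching element of a second array. The array names and values are chosen for illustration.

```python
import numpy as np

base = np.array([2, 3, 4])
exponent = np.array([3, 2, 1])

# Each element of base is raised to the matching element of exponent:
# 2**3, 3**2, 4**1.
powers = base ** exponent
print(powers)   # [8 9 4]

# A single number on the right is applied to every element.
squares = base ** 2
print(squares)  # [ 4  9 16]
```

The two arrays must have the same number of elements (or otherwise compatible shapes) for the element-to-element correspondence to hold.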
In the following exercise, we will try out some mathematical operations on NumPy arrays.
Exercise 3.03: Mathematical Operations on NumPy Arrays
In this exercise, we'll generate a NumPy array with the values extracted from a .csv file. We'll be using the multiplication and division operators on the generated NumPy array. Let's go through the following steps: