  • The Data Science Workshop
  • Anthony So Thomas V. Joseph Robert Thas John Andrew Worsley Dr. Samuel Asare
  • 2021-06-11 18:27:18

Overview of Python

As mentioned earlier, Python is one of the most popular programming languages for data science. But before diving into Python's data science applications, let's have a quick introduction to some core Python concepts.

Types of Variable

In Python, you can handle and manipulate different types of variables. Each has its own specificities and benefits. We will not go through every single one of them but rather focus on the main ones that you will have to use in this book. For each of the following code examples, you can run the code in Google Colab to view the given output.

Numeric Variables

The most basic variable type is numeric. A numeric variable can hold an integer or a decimal (float) number, and you can perform mathematical operations on it.

Let's use an integer variable called var1 that will take the value 8 and another one called var2 with the value 160.88, and add them together with the + operator, as shown here:

var1 = 8

var2 = 160.88

var1 + var2

You should get the following output:

Figure 1.3: Output of the addition of two variables

In Python, you can perform other mathematical operations on numerical variables, such as multiplication (with the * operator) and division (with /).
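
As a minimal sketch of these operators, the snippet below reuses the values of var1 and var2 from above. The names product and quotient are introduced here only for illustration. Note that / always returns a float, even when both operands are integers.

```python
var1 = 8
var2 = 160.88

# Multiplying an integer by a float gives a float
product = var1 * var2
print(product)

# Division with / always returns a float, even for two integers
quotient = var1 / 2
print(quotient)        # 4.0
print(type(quotient))  # <class 'float'>
```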

Text Variables

Another interesting type of variable is the string, which contains textual information. You can create a string variable by wrapping text in single or double quotes, as shown in the following example:

var3 = 'Hello, '

var4 = 'World'

In order to display the content of a variable, you can call the print() function:

print(var3)

print(var4)

You should get the following output:

Figure 1.4: Printing the two text variables

Python also provides a feature called f-strings for embedding the values of defined variables in text. It is very handy when you want to print results alongside additional text that makes them easier to read and interpret. It is also quite common to use f-strings to print logs. To create an f-string, add f before the opening quote (single or double). You can then place an existing variable inside the quotes, wrapped in curly brackets, {}, and its value will be displayed in the text.

For instance, if we want to print Text: before the values of var3 and var4, we will write the following code:

print(f"Text: {var3} {var4}!")

You should get the following output:

Figure 1.5: Printing with f-strings
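
The curly brackets of an f-string can hold more than a bare variable name. The sketch below repeats the example above and then, as an illustration that goes slightly beyond the original text, places an expression with a format specifier inside the brackets. The names message and summary are hypothetical.

```python
var1 = 8
var2 = 160.88
var3 = 'Hello, '
var4 = 'World'

# Each variable is substituted where its curly brackets appear.
# var3 already ends with a space, so two spaces separate the words.
message = f"Text: {var3} {var4}!"
print(message)  # Text: Hello,  World!

# An expression and a format specifier can also go inside the brackets;
# .2f rounds the result to two decimal places
summary = f"Sum: {var1 + var2:.2f}"
print(summary)  # Sum: 168.88
```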

You can also perform text-related transformations on string variables, such as capitalizing or replacing characters. The simplest operation is concatenation: you can join the two variables together with the + operator:

var3 + var4

You should get the following output:

Figure 1.6: Concatenation of the two text variables
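
The capitalizing and replacing transformations mentioned above can be sketched with the built-in string methods .upper() and .replace(). The variable names greeting, shouted, and replaced are assumptions made for this example only.

```python
var3 = 'Hello, '
var4 = 'World'

greeting = var3 + var4
print(greeting)   # Hello, World

# Strings are immutable: each method returns a new string
shouted = greeting.upper()
print(shouted)    # HELLO, WORLD

replaced = greeting.replace('World', 'Python')
print(replaced)   # Hello, Python

# The original string is left unchanged
print(greeting)   # Hello, World
```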

Python List

Another very useful type of variable is the list. It is a collection of items that can be changed (you can add, update, or remove items). To declare a list, you will need to use square brackets, [], like this:

var5 = ['I', 'love', 'data', 'science']

print(var5)

You should get the following output:

Figure 1.7: List containing only string items

A list can have different item types, so you can mix numerical and text variables in it:

var6 = ['Packt', 15019, 2020, 'Data Science']

print(var6)

You should get the following output:

Figure 1.8: List containing numeric and string items

An item in a list can be accessed by its index (its position in the list). To access the first (index 0) and third elements (index 2) of a list, you do the following:

print(var6[0])

print(var6[2])

Note

In Python, all indexes start at 0.

You should get the following output:

Figure 1.9: The first and third items in the var6 list

Python lets you access a range of items with slicing, which uses the : operator inside the square brackets. You specify the starting index on the left side of the operator and the ending index on the right side. The ending index is always excluded from the range. So, if you want to get the first three items (index 0 to 2), you should do as follows:

print(var6[0:3])

You should get the following output:

Figure 1.10: The first three items of var6
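
Either side of the : operator can be left empty. The following sketch, which uses the same var6 values, shows that an omitted start means "from the beginning" and an omitted end means "through the last item". The names first_three, first_two, and last_two are hypothetical.

```python
var6 = ['Packt', 15019, 2020, 'Data Science']

# Index 3 is excluded, so this returns items 0, 1, and 2
first_three = var6[0:3]
print(first_three)  # ['Packt', 15019, 2020]

# Omitting the start index means "from the beginning"
first_two = var6[:2]
print(first_two)    # ['Packt', 15019]

# Omitting the end index means "through the last item"
last_two = var6[2:]
print(last_two)     # [2020, 'Data Science']
```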

You can also iterate through every item of a list using a for loop. If you want to print every item of the var6 list, you should do this:

for item in var6:

    print(item)

You should get the following output:

Figure 1.11: Output of the for loop
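
If you also need the position of each item while looping, one option is the built-in enumerate() function, which is not covered in the text above. This is a small sketch; the lines list is a hypothetical helper used to collect what gets printed.

```python
var6 = ['Packt', 15019, 2020, 'Data Science']

lines = []
# enumerate() yields (index, item) pairs, starting at index 0
for index, item in enumerate(var6):
    lines.append(f"{index}: {item}")
    print(lines[-1])
```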

You can add an item at the end of the list using the .append() method:

var6.append('Python')

print(var6)

You should get the following output:

Figure 1.12: Output of var6 after inserting the 'Python' item

To delete an item from the list, you use the .remove() method, which removes the first item equal to the value you pass in:

var6.remove(15019)

print(var6)

You should get the following output:

Figure 1.13: Output of var6 after removing the 15019 item
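
Two details of .remove() are worth seeing in code: it deletes only the first matching item, and it raises a ValueError when the value is not in the list. The sketch below adds 'Python' twice purely for demonstration, and the value 'R' and the flag removed are hypothetical.

```python
var6 = ['Packt', 15019, 2020, 'Data Science']
var6.append('Python')
var6.append('Python')  # duplicate added only for this demonstration

# Only the first 'Python' is removed; the second one stays
var6.remove('Python')
print(var6)  # ['Packt', 15019, 2020, 'Data Science', 'Python']

# Removing a value that is not in the list raises ValueError
try:
    var6.remove('R')
    removed = True
except ValueError:
    removed = False
print(removed)  # False
```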

Python Dictionary

Another very popular Python variable used by data scientists is the dictionary type. For example, it can be used to load JSON data into Python so that it can then be converted into a DataFrame (you will learn more about the JSON format and DataFrames in the following sections). A dictionary contains multiple elements, like a list, but each element is organized as a key-value pair. A dictionary is not indexed by numbers but by keys. So, to access a specific value, you will have to call the item by its corresponding key. To define a dictionary in Python, you will use curly brackets, {}, and specify the keys and values separated by :, as shown here:

var7 = {'Topic': 'Data Science', 'Language': 'Python'}

print(var7)

You should get the following output:

Figure 1.14: Output of var7
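
The JSON use case mentioned above can be sketched with the json module from the Python standard library. The json_text string is a made-up example, not data from this book, and the name loaded is hypothetical.

```python
import json

# A JSON document held in a Python string (hypothetical example data)
json_text = '{"Topic": "Data Science", "Language": "Python"}'

# json.loads() parses a JSON object into a Python dictionary
loaded = json.loads(json_text)
print(type(loaded))  # <class 'dict'>
print(loaded)        # {'Topic': 'Data Science', 'Language': 'Python'}
```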

To access a specific value, you need to provide the corresponding key name. For instance, if you want to get the value Python, you do this:

var7['Language']

You should get the following output:

Figure 1.15: Value for the 'Language' key

Note

Each key in a dictionary needs to be unique; values can be repeated.
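
To illustrate how a dictionary behaves when a key is reused, here is a short sketch. The 'R' value and the 'Tool' key are hypothetical and serve only this example.

```python
var7 = {'Topic': 'Data Science', 'Language': 'Python'}

# Assigning to an existing key replaces its value;
# no second 'Language' entry is created
var7['Language'] = 'R'
print(var7)       # {'Topic': 'Data Science', 'Language': 'R'}
print(len(var7))  # 2

# Two different keys may hold the same value
var7['Tool'] = 'R'
print(len(var7))  # 3
```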

Python provides a method to access all the key names from a dictionary, .keys(), which is used as shown in the following code snippet:

var7.keys()

You should get the following output:

Figure 1.16: List of key names

There is also a method called .values(), which is used to access all the values of a dictionary:

var7.values()

You should get the following output:

Figure 1.17: List of values
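
The objects returned by .keys() and .values() are views rather than plain lists. If you need a list, for example to access an item by position, you can wrap the call in list(), as in this sketch. The names keys, values, and has_topic are assumptions.

```python
var7 = {'Topic': 'Data Science', 'Language': 'Python'}

# list() turns the views into ordinary lists
keys = list(var7.keys())
values = list(var7.values())
print(keys)    # ['Topic', 'Language']
print(values)  # ['Data Science', 'Python']

# The in operator checks whether a key exists
has_topic = 'Topic' in var7
print(has_topic)  # True
```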

You can iterate through all items from a dictionary using a for loop and the .items() method, as shown in the following code snippet:

for key, value in var7.items():

    print(key)

    print(value)

You should get the following output:

Figure 1.18: Output after iterating through the items of a dictionary

You can add a new element in a dictionary by providing the key name like this:

var7['Publisher'] = 'Packt'

print(var7)

You should get the following output:

Figure 1.19: Output of a dictionary after adding an item

You can delete an item from a dictionary with the del command:

del var7['Publisher']

print(var7)

You should get the following output:

Figure 1.20: Output of a dictionary after removing an item
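
Accessing or deleting a key that does not exist raises a KeyError. The sketch below shows this, along with the .get() method, which returns a default value instead of raising an error. The flag deleted_twice and the default 'unknown' are hypothetical.

```python
var7 = {'Topic': 'Data Science', 'Language': 'Python'}
var7['Publisher'] = 'Packt'
del var7['Publisher']

# Deleting the same key a second time raises KeyError
try:
    del var7['Publisher']
    deleted_twice = True
except KeyError:
    deleted_twice = False
print(deleted_twice)  # False

# .get() returns None, or the default you supply, for a missing key
missing = var7.get('Publisher')
fallback = var7.get('Publisher', 'unknown')
print(missing)   # None
print(fallback)  # unknown
```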

In Exercise 1.01, Creating a Dictionary That Will Contain Machine Learning Algorithms, we will put the concepts we've just covered into practice.

Note

If you are interested in exploring Python in more depth, head over to our website (https://packt.live/2FcXpOp) to get yourself the Python Workshop.

Exercise 1.01: Creating a Dictionary That Will Contain Machine Learning Algorithms

In this exercise, we will create a dictionary using Python that will contain a collection of different machine learning algorithms that will be covered in this book.

The following steps will help you complete the exercise:

Note

Every exercise and activity in this book is to be executed on Google Colab.

  1. Open a new Colab notebook.
  2. Create a list called algorithm that will contain the following elements: Linear Regression, Logistic Regression, RandomForest, and a3c:

    algorithm = ['Linear Regression', 'Logistic Regression', \

                 'RandomForest', 'a3c']

    Note

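    The code snippet shown above uses a backslash ( \ ) to split the logic across multiple lines. When the code is executed, Python will ignore the backslash, and treat the code on the next line as a direct continuation of the current line. Inside square brackets, as here, the backslash is optional because Python already continues a statement until the closing bracket.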

  3. Now, create a list called learning that will contain the following elements: Supervised, Supervised, Supervised, and Reinforcement:

    learning = ['Supervised', 'Supervised', 'Supervised', \

                'Reinforcement']

  4. Create a list called algorithm_type that will contain the following elements: Regression, Classification, Regression or Classification, and Game AI:

    algorithm_type = ['Regression', 'Classification', \

                      'Regression or Classification', 'Game AI']

  5. Add an item called k-means into the algorithm list using the .append() method:

    algorithm.append('k-means')

  6. Display the content of algorithm using the print() function:

    print(algorithm)

    You should get the following output:

    Figure 1.21: Output of 'algorithm'

    From the preceding output, we can see that we added the k-means item to the list.

  7. Now, add the Unsupervised item into the learning list using the .append() method:

    learning.append('Unsupervised')

  8. Display the content of learning using the print() function:

    print(learning)

    You should get the following output:

    Figure 1.22: Output of 'learning'

    From the preceding output, we can see that we added the Unsupervised item into the list.

  9. Add the Clustering item into the algorithm_type list using the .append() method:

    algorithm_type.append('Clustering')

  10. Display the content of algorithm_type using the print() function:

    print(algorithm_type)

    You should get the following output:

    Figure 1.23: Output of 'algorithm_type'

    From the preceding output, we can see that we added the Clustering item into the list.

  11. Create an empty dictionary called machine_learning using curly brackets, {}:

    machine_learning = {}

  12. Create a new item in machine_learning with the key as algorithm and the value as all the items from the algorithm list:

    machine_learning['algorithm'] = algorithm

  13. Display the content of machine_learning using the print() function:

    print(machine_learning)

    You should get the following output:

    Figure 1.24: Output of 'machine_learning'

    From the preceding output, we can see that the dictionary now holds a single key, algorithm, whose value is the algorithm list.

  14. Create a new item in machine_learning with the key as learning and the value as all the items from the learning list:

    machine_learning['learning'] = learning

  15. Now, create a new item in machine_learning with the key as algorithm_type and the value as all the items from the algorithm_type list:

    machine_learning['algorithm_type'] = algorithm_type

  16. Display the content of machine_learning using the print() function:

    print(machine_learning)

    You should get the following output:

    Figure 1.25: Output of 'machine_learning'

  17. Remove the a3c item from the list stored under the algorithm key using the .remove() method:

    machine_learning['algorithm'].remove('a3c')

  18. Display the content of the algorithm item from the machine_learning dictionary using the print() function:

    print(machine_learning['algorithm'])

    You should get the following output:

    Figure 1.26: Output of 'algorithm' from 'machine_learning'

  19. Remove the Reinforcement item from the list stored under the learning key using the .remove() method:

    machine_learning['learning'].remove('Reinforcement')

  20. Remove the Game AI item from the list stored under the algorithm_type key using the .remove() method:

    machine_learning['algorithm_type'].remove('Game AI')

  21. Display the content of machine_learning using the print() function:

    print(machine_learning)

    You should get the following output:

    Figure 1.27: Output of 'machine_learning'

You have successfully created a dictionary containing the machine learning algorithms that you will come across in this book. You learned how to create and manipulate Python lists and dictionaries.
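
For reference, the steps of the exercise can be condensed into one short script. This is only a sketch, not the book's official solution: it builds the dictionary in a single literal instead of key by key, and the loop at the end, which uses the built-in zip() function to read the three lists side by side, is an addition that is not part of the exercise.

```python
# Lists from steps 2 to 4 (line breaks inside brackets need no backslash)
algorithm = ['Linear Regression', 'Logistic Regression',
             'RandomForest', 'a3c']
learning = ['Supervised', 'Supervised', 'Supervised', 'Reinforcement']
algorithm_type = ['Regression', 'Classification',
                  'Regression or Classification', 'Game AI']

# Steps 5 to 10: append one item to each list
algorithm.append('k-means')
learning.append('Unsupervised')
algorithm_type.append('Clustering')

# Steps 11 to 16, written as a single dictionary literal
machine_learning = {'algorithm': algorithm,
                    'learning': learning,
                    'algorithm_type': algorithm_type}

# Steps 17 to 20: remove the reinforcement learning entries
machine_learning['algorithm'].remove('a3c')
machine_learning['learning'].remove('Reinforcement')
machine_learning['algorithm_type'].remove('Game AI')

# The dictionary stores references to the lists, not copies,
# so the original algorithm list has changed as well
print(algorithm)

# zip() pairs up the items that share the same position
for name, kind, task in zip(machine_learning['algorithm'],
                            machine_learning['learning'],
                            machine_learning['algorithm_type']):
    print(f"{name}: {kind}, {task}")
```

Because the dictionary values are the same list objects as algorithm, learning, and algorithm_type, removing an item through the dictionary also removes it from the original variable.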

Note

To access the source code for this specific section, please refer to https://packt.live/315EmRP.

You can also run this example online at https://packt.live/3ay1tYg.

In the next section, you will learn more about the two main Python packages used for data science:

  • pandas
  • scikit-learn