- Learning Jupyter 5
- Dan Toomey
- 2021-08-13 15:42:10
Python pandas in Jupyter
One of the most widely used Python libraries is pandas, a free, open source package of data analysis tools. It is not part of the Python standard library, although distributions such as Anaconda include it. In this example, we will develop a Python script that uses pandas and see whether running it in Jupyter has any effect.
I am using the Titanic dataset from https://www.kaggle.com/c/titanic/data. The same data is also available from a variety of other sources.
Here is our Python script that we want to run in Jupyter:
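```python
import pandas as pd

# Load the Titanic training data (train.csv from the Kaggle page above)
training_set = pd.read_csv('train.csv')
training_set.head()  # shown in the output when it is the last line of a notebook cell

# Split the passengers by sex; column names are case-sensitive ('Sex', not 'sex')
male = training_set[training_set.Sex == 'male']
female = training_set[training_set.Sex == 'female']

# Survived is 0 or 1, so sum / count is the fraction of each group that survived
womens_survival_rate = float(sum(female.Survived)) / len(female)
mens_survival_rate = float(sum(male.Survived)) / len(male)

womens_survival_rate, mens_survival_rate
```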
The script calculates the survival rate of the passengers by sex.
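As an aside, pandas can compute both rates in one step by grouping on `Sex` and averaging `Survived`, which holds 0 or 1. The sketch below is an alternative to the script, not the author's code, and it uses a small made-up DataFrame in place of `train.csv`, so the numbers are illustrative only. With the real file, the same `groupby` call would be applied to `training_set`.

```python
import pandas as pd

# Hypothetical stand-in for train.csv: same column names, made-up rows
passengers = pd.DataFrame({
    "Sex": ["female", "female", "female", "female",
            "male", "male", "male", "male", "male"],
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0, 0],
})

# Survived is 0/1, so the mean of each group is that group's survival rate
survival_rates = passengers.groupby("Sex")["Survived"].mean()
print(survival_rates)
# expected rates: female 0.75 (3 of 4), male 0.2 (1 of 5)
```

Grouping avoids building a separate filtered DataFrame for each category, which matters more once a column has more than two values.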
We create a new Notebook, enter the script into the appropriate cells, add a display of the calculated data at each step, and produce our results.
Here is our Notebook laid out, with a display of the calculated data in each cell:
On Windows, it is common to use a backslash (\) to separate the parts of a filename. However, Python treats a backslash in an ordinary string literal as the start of an escape sequence, so I had to switch to a forward slash (/) in my .csv file path.
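To illustrate the backslash problem, here is a short sketch using a made-up path, `C:\temp\train.csv` (hypothetical; substitute your own). In an ordinary string literal, `\t` becomes a tab character, so the path is silently corrupted. A forward slash, a raw string (`r"..."`), or doubled backslashes all avoid this.

```python
from pathlib import PureWindowsPath

# Hypothetical Windows path; in a normal literal each "\t" is read as a TAB
broken = "C:\temp\train.csv"

# Three ways to keep the path intact
forward = "C:/temp/train.csv"       # Windows generally accepts forward slashes
raw = r"C:\temp\train.csv"          # raw string: backslashes kept literally
doubled = "C:\\temp\\train.csv"     # each backslash escaped explicitly

print("\t" in broken)   # True: the path now contains tab characters
print(raw == doubled)   # True
# PureWindowsPath treats "/" and "\" as separators, so these name the same file
print(PureWindowsPath(forward) == PureWindowsPath(raw))  # True
```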
The dataset column names are taken directly from the file and are case-sensitive. In this case, I was originally using the sex field in my script, but in the .csv file, the column is named Sex. Similarly, I had to change survived to Survived.
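A quick way to see the case-sensitivity is to check the column names directly. This sketch uses a hypothetical two-row DataFrame with the same column names as `train.csv`; looking up `sex` fails while `Sex` succeeds.

```python
import pandas as pd

# Hypothetical rows with the same column names as train.csv
df = pd.DataFrame({"Sex": ["male", "female"], "Survived": [0, 1]})

print(list(df.columns))      # ['Sex', 'Survived']
print("sex" in df.columns)   # False: column lookups are case-sensitive
print("Sex" in df.columns)   # True

try:
    df["sex"]                # wrong case
except KeyError as err:
    print("KeyError:", err)
```

Printing `df.columns` before writing any filters is a cheap way to catch this kind of mismatch early.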
The final script and results look like this when we run it:
I have used the head() function to display the first few rows of the dataset. It is interesting how much detail is available for each passenger.
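`head()` returns the first five rows by default; passing a number shows more or fewer, and `tail()` does the same for the last rows. A sketch on a hypothetical 10-row DataFrame standing in for the Titanic data:

```python
import pandas as pd

# Hypothetical 10-row frame standing in for the Titanic data
df = pd.DataFrame({"PassengerId": range(1, 11), "Survived": [0, 1] * 5})

first_default = df.head()    # first 5 rows (the default n=5)
first_three = df.head(3)     # first 3 rows
last_two = df.tail(2)        # last 2 rows

print(len(first_default), len(first_three), len(last_two))  # 5 3 2
```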
If you scroll down, you will see the results:
We can see that 74% of the women survived, versus just 19% of the men. I would like to think that chivalry is not dead.
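It may seem curious that the two figures do not total 100%, but they are not expected to: each is the proportion of a different group (women or men) that survived, not a share of the survivors. Like every other dataset I have seen, this one does contain missing and/or inaccurate data in places, but that is not why these two rates fail to add up.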
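The difference between a group's survival rate and its share of the survivors can be checked directly. This sketch uses a small made-up DataFrame (not the real Titanic numbers). The rates for the two groups have different denominators and need not sum to 1, while the shares of survivors by sex all divide by the same total and do.

```python
import pandas as pd

# Hypothetical passengers: 4 women (3 survived), 5 men (1 survived)
passengers = pd.DataFrame({
    "Sex": ["female"] * 4 + ["male"] * 5,
    "Survived": [1, 1, 1, 0, 1, 0, 0, 0, 0],
})

# Rate: fraction of each group that survived (what the script computes)
rates = passengers.groupby("Sex")["Survived"].mean()

# Share: fraction of all survivors who belong to each group
survivors = passengers[passengers["Survived"] == 1]
shares = survivors["Sex"].value_counts(normalize=True)

print(rates.sum())   # roughly 0.95 (0.75 + 0.2): no reason to total 1
print(shares.sum())  # 1.0: one common denominator (all survivors)
```

Here women make up 75% of the survivors, which is a different statement from 75% of women surviving, even though the two numbers happen to coincide in this toy data.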