Table of Contents (195 chapters)
- Cover Page
- Title Page
- Copyright
- Practical Data Wrangling
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Programming with Data
- Understanding data wrangling
- Getting and reading data
- Cleaning data
- Shaping and structuring data
- Storing data
- The tools for data wrangling
- Python
- R
- Summary
- Introduction to Programming in Python
- External resources
- Logistical overview
- Installation requirements
- Using other learning resources
- Python 2 versus Python 3
- Running programs in Python
- Using text editors to write and manage programs
- Writing the hello world program
- Using the terminal to run programs
- Running the Hello World program
- What if it didn't work?
- Data types, variables, and the Python shell
- Numbers - integers and floats
- Why integers?
- Strings
- Booleans
- The print function
- Variables
- Adding to a variable
- Subtracting from a variable
- Multiplication
- Division
- Naming variables
- Arrays (lists if you ask Python)
- Dictionaries
- Compound statements
- Compound statement syntax and indentation level
- For statements and iterables
- If statements
- Else and elif clauses
- Functions
- Passing arguments to a function
- Returning values from a function
- Making annotations within programs
- A programmer's resources
- Documentation
- Online forums and mailing lists
- Summary
- Reading, Exploring, and Modifying Data - Part I
- External resources
- Logistical overview
- Installation requirements
- Data
- File system setup
- Introducing a basic data wrangling workflow
- Introducing the JSON file format
- Opening and closing a file in Python using file I/O
- The open function and file objects
- File structure - best practices to store your data
- Opening a file
- Reading the contents of a file
- Modules in Python
- Parsing a JSON file using the json module
- Exploring the contents of a data file
- Extracting the core content of the data
- Listing out all of the variables in the data
- Modifying a dataset
- Extracting data variables from the original dataset
- Using a for loop to iterate over the data
- Using a nested for loop to iterate over the data variables
- Outputting the modified data to a new file
- Specifying input and output file names in the Terminal
- Specifying the filenames from the Terminal
- Summary
- Reading, Exploring, and Modifying Data - Part II
- Logistical overview
- File system setup
- Data
- Understanding the CSV format
- Introducing the CSV module
- Using the CSV module to read CSV data
- Using the CSV module to write CSV data
- Using the pandas module to read and process data
- Counting the total road length in 2011 revisited
- Handling non-standard CSV encoding and dialect
- Understanding XML
- XML versus JSON
- Using the XML module to parse XML data
- XPath
- Summary
- Manipulating Text Data - An Introduction to Regular Expressions
- Logistical overview
- Data
- File structure setup
- Understanding the need for pattern recognition
- Introducing regular expressions
- Writing and using a regular expression
- Special characters
- Matching whitespace
- Matching the start of string
- Matching the end of a string
- Matching a range of characters
- Matching any one of several patterns
- Matching a sequence instead of just one character
- Putting patterns together
- Extracting a pattern from a string
- The regex split() function
- Python regex documentation
- Looking for patterns
- Quantifying the existence of patterns
- Creating a regular expression to match the street address
- Counting the number of matches
- Verifying the correctness of the matches
- Extracting patterns
- Outputting the data to a new file
- Summary
- Cleaning Numerical Data - An Introduction to R and RStudio
- Logistical overview
- Data
- Directory structure
- Installing R and RStudio
- Introducing R and RStudio
- Familiarizing yourself with RStudio
- Running R commands
- Setting the working directory
- Reading data
- The R dataframe
- R vectors
- Indexing R dataframes
- Finding the 2011 total in R
- Conducting basic outlier detection and removal
- Handling NA values
- Deleting missing values
- Replacing missing values with a constant
- Imputation of missing values
- Variable names and contents
- Summary
- Simplifying Data Manipulation with dplyr
- Logistical overview
- Data
- File system setup
- Installing the dplyr and tibble packages
- Introducing dplyr
- Getting started with dplyr
- Chaining operations together
- Filtering the rows of a dataframe
- Summarizing data by category
- Rewriting code using dplyr
- Summary
- Getting Data from the Web
- Logistical overview
- Filesystem setup
- Installing the requests module
- Internet connection
- Introducing APIs
- Using Python to retrieve data from APIs
- Using URL parameters to filter the results
- Summary
- Working with Large Datasets
- Logistical overview
- System requirements
- Data
- File system setup
- Installing MongoDB
- Planning out your time
- Cleaning up
- Understanding computer memory
- Understanding databases
- Introducing MongoDB
- Interfacing with MongoDB from Python
- Summary

Last updated: 2021-07-02 15:16:37