Table of Contents (195 chapters)
- Cover Page
- Title Page
- Copyright
- Practical Data Wrangling
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Programming with Data
- Understanding data wrangling
- Getting and reading data
- Cleaning data
- Shaping and structuring data
- Storing data
- The tools for data wrangling
- Python
- R
- Summary
- Introduction to Programming in Python
- External resources
- Logistical overview
- Installation requirements
- Using other learning resources
- Python 2 versus Python 3
- Running programs in Python
- Using text editors to write and manage programs
- Writing the hello world program
- Using the terminal to run programs
- Running the Hello World program
- What if it didn't work?
- Data types, variables, and the Python shell
- Numbers - integers and floats
- Why integers?
- Strings
- Booleans
- The print function
- Variables
- Adding to a variable
- Subtracting from a variable
- Multiplication
- Division
- Naming variables
- Arrays (lists if you ask Python)
- Dictionaries
- Compound statements
- Compound statement syntax and indentation level
- For statements and iterables
- If statements
- Else and elif clauses
- Functions
- Passing arguments to a function
- Returning values from a function
- Making annotations within programs
- A programmer's resources
- Documentation
- Online forums and mailing lists
- Summary
- Reading, Exploring, and Modifying Data - Part I
- External resources
- Logistical overview
- Installation requirements
- Data
- File system setup
- Introducing a basic data wrangling workflow
- Introducing the JSON file format
- Opening and closing a file in Python using file I/O
- The open function and file objects
- File structure - best practices to store your data
- Opening a file
- Reading the contents of a file
- Modules in Python
- Parsing a JSON file using the json module
- Exploring the contents of a data file
- Extracting the core content of the data
- Listing out all of the variables in the data
- Modifying a dataset
- Extracting data variables from the original dataset
- Using a for loop to iterate over the data
- Using a nested for loop to iterate over the data variables
- Outputting the modified data to a new file
- Specifying input and output file names in the Terminal
- Specifying the filenames from the Terminal
- Summary
- Reading, Exploring, and Modifying Data - Part II
- Logistical overview
- File system setup
- Data
- Installing pandas
- Understanding the CSV format
- Introducing the CSV module
- Using the CSV module to read CSV data
- Using the CSV module to write CSV data
- Using the pandas module to read and process data
- Counting the total road length in 2011 revisited
- Handling non-standard CSV encoding and dialect
- Understanding XML
- XML versus JSON
- Using the XML module to parse XML data
- XPath
- Summary
- Manipulating Text Data - An Introduction to Regular Expressions
- Logistical overview
- Data
- File structure setup
- Understanding the need for pattern recognition
- Introducing regular expressions
- Writing and using a regular expression
- Special characters
- Matching whitespace
- Matching the start of a string
- Matching the end of a string
- Matching a range of characters
- Matching any one of several patterns
- Matching a sequence instead of just one character
- Putting patterns together
- Extracting a pattern from a string
- The regex split() function
- Python regex documentation
- Looking for patterns
- Quantifying the existence of patterns
- Creating a regular expression to match the street address
- Counting the number of matches
- Verifying the correctness of the matches
- Extracting patterns
- Outputting the data to a new file
- Summary
- Cleaning Numerical Data - An Introduction to R and RStudio
- Logistical overview
- Data
- Directory structure
- Installing R and RStudio
- Introducing R and RStudio
- Familiarizing yourself with RStudio
- Running R commands
- Setting the working directory
- Reading data
- The R dataframe
- R vectors
- Indexing R dataframes
- Finding the 2011 total in R
- Conducting basic outlier detection and removal
- Handling NA values
- Deleting missing values
- Replacing missing values with a constant
- Imputation of missing values
- Variable names and contents
- Summary
- Simplifying Data Manipulation with dplyr
- Logistical overview
- Data
- File system setup
- Installing the dplyr and tibble packages
- Introducing dplyr
- Getting started with dplyr
- Chaining operations together
- Filtering the rows of a dataframe
- Summarizing data by category
- Rewriting code using dplyr
- Summary
- Getting Data from the Web
- Logistical overview
- File system setup
- Installing the requests module
- Internet connection
- Introducing APIs
- Using Python to retrieve data from APIs
- Using URL parameters to filter the results
- Summary
- Working with Large Datasets
- Logistical overview
- System requirements
- Data
- File system setup
- Installing MongoDB
- Planning out your time
- Cleaning up
- Understanding computer memory
- Understanding databases
- Introducing MongoDB
- Interfacing with MongoDB from Python
- Summary

Last updated: 2021-07-02 15:16:37