Practical Data Wrangling
By Allan Visochek
Updated: 2021-07-02 15:16:37
If you are a data scientist, data analyst, or statistician who wants to learn how to wrangle your data for analysis in the best possible manner, this book is for you. As this book covers both R and Python, some understanding of them will be beneficial.
Brand: 中圖公司
Listed: 2021-07-02 12:41:47
Publisher: Packt Publishing
Digital rights to this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 to produce and distribute it.
- Cover page
- Title Page
- Copyright
- Practical Data Wrangling
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Programming with Data
- Understanding data wrangling
- Getting and reading data
- Cleaning data
- Shaping and structuring data
- Storing data
- The tools for data wrangling
- Python
- R
- Summary
- Introduction to Programming in Python
- External resources
- Logistical overview
- Installation requirements
- Using other learning resources
- Python 2 versus Python 3
- Running programs in Python
- Using text editors to write and manage programs
- Writing the hello world program
- Using the terminal to run programs
- Running the Hello World program
- What if it didn't work?
- Data types, variables, and the Python shell
- Numbers - integers and floats
- Why integers?
- Strings
- Booleans
- The print function
- Variables
- Adding to a variable
- Subtracting from a variable
- Multiplication
- Division
- Naming variables
- Arrays (lists if you ask Python)
- Dictionaries
- Compound statements
- Compound statement syntax and indentation level
- For statements and iterables
- If statements
- Else and elif clauses
- Functions
- Passing arguments to a function
- Returning values from a function
- Making annotations within programs
- A programmer's resources
- Documentation
- Online forums and mailing lists
- Summary
- Reading, Exploring, and Modifying Data - Part I
- External resources
- Logistical overview
- Installation requirements
- Data
- File system setup
- Introducing a basic data wrangling workflow
- Introducing the JSON file format
- Opening and closing a file in Python using file I/O
- The open function and file objects
- File structure - best practices to store your data
- Opening a file
- Reading the contents of a file
- Modules in Python
- Parsing a JSON file using the json module
- Exploring the contents of a data file
- Extracting the core content of the data
- Listing out all of the variables in the data
- Modifying a dataset
- Extracting data variables from the original dataset
- Using a for loop to iterate over the data
- Using a nested for loop to iterate over the data variables
- Outputting the modified data to a new file
- Specifying input and output file names in the Terminal
- Specifying the filenames from the Terminal
- Summary
- Reading, Exploring, and Modifying Data - Part II
- Logistical overview
- File system setup
- Data
- Installing pandas
- Understanding the CSV format
- Introducing the CSV module
- Using the CSV module to read CSV data
- Using the CSV module to write CSV data
- Using the pandas module to read and process data
- Counting the total road length in 2011 revisited
- Handling non-standard CSV encoding and dialect
- Understanding XML
- XML versus JSON
- Using the XML module to parse XML data
- XPath
- Summary
- Manipulating Text Data - An Introduction to Regular Expressions
- Logistical overview
- Data
- File structure setup
- Understanding the need for pattern recognition
- Introducing regular expressions
- Writing and using a regular expression
- Special characters
- Matching whitespace
- Matching the start of a string
- Matching the end of a string
- Matching a range of characters
- Matching any one of several patterns
- Matching a sequence instead of just one character
- Putting patterns together
- Extracting a pattern from a string
- The regex split() function
- Python regex documentation
- Looking for patterns
- Quantifying the existence of patterns
- Creating a regular expression to match the street address
- Counting the number of matches
- Verifying the correctness of the matches
- Extracting patterns
- Outputting the data to a new file
- Summary
- Cleaning Numerical Data - An Introduction to R and RStudio
- Logistical overview
- Data
- Directory structure
- Installing R and RStudio
- Introducing R and RStudio
- Familiarizing yourself with RStudio
- Running R commands
- Setting the working directory
- Reading data
- The R dataframe
- R vectors
- Indexing R dataframes
- Finding the 2011 total in R
- Conducting basic outlier detection and removal
- Handling NA values
- Deleting missing values
- Replacing missing values with a constant
- Imputation of missing values
- Variable names and contents
- Summary
- Simplifying Data Manipulation with dplyr
- Logistical overview
- Data
- File system setup
- Installing the dplyr and tibble packages
- Introducing dplyr
- Getting started with dplyr
- Chaining operations together
- Filtering the rows of a dataframe
- Summarizing data by category
- Rewriting code using dplyr
- Summary
- Getting Data from the Web
- Logistical overview
- Filesystem setup
- Installing the requests module
- Internet connection
- Introducing APIs
- Using Python to retrieve data from APIs
- Using URL parameters to filter the results
- Summary
- Working with Large Datasets
- Logistical overview
- System requirements
- Data
- File system setup
- Installing MongoDB
- Planning out your time
- Cleaning up
- Understanding computer memory
- Understanding databases
- Introducing MongoDB
- Interfacing with MongoDB from Python
- Summary