Python Data Science Essentials
Fully expanded and upgraded, the latest edition of Python Data Science Essentials will help you succeed in data science operations using the most common Python libraries. This book offers up-to-date insight into the core of Python, including the latest versions of the Jupyter Notebook, NumPy, pandas, and scikit-learn. The book covers detailed examples and large hybrid datasets to help you grasp essential statistical techniques for data collection, data munging and analysis, visualization, and reporting activities. You will also gain an understanding of advanced data science topics such as machine learning algorithms, distributed computing, tuning predictive models, and natural language processing. Furthermore, you'll be introduced to deep learning and gradient boosting solutions such as XGBoost, LightGBM, and CatBoost. By the end of the book, you will have gained a complete overview of the principal machine learning algorithms, graph analysis techniques, and all the visualization and deployment instruments that make it easier to present your results to an audience of both data science experts and business users.
Brand: 中圖公司
Listed: 2021-08-13 15:12:05
Publisher: Packt Publishing
Digital rights for this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 to produce and distribute it.
- Cover
- Title Page
- Copyright and Credits
- Python Data Science Essentials Third Edition
- Packt Upsell
- Why subscribe?
- Packt.com
- Contributors
- About the authors
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- First Steps
- Introducing data science and Python
- Installing Python
- Python 2 or Python 3?
- Step-by-step installation
- Installing the necessary packages
- Package upgrades
- Scientific distributions
- Anaconda
- Leveraging conda to install packages
- Enthought Canopy
- WinPython
- Explaining virtual environments
- Conda for managing environments
- A glance at the essential packages
- NumPy
- SciPy
- pandas
- pandas-profiling
- Scikit-learn
- Jupyter
- JupyterLab
- Matplotlib
- Seaborn
- Statsmodels
- Beautiful Soup
- NetworkX
- NLTK
- Gensim
- PyPy
- XGBoost
- LightGBM
- CatBoost
- TensorFlow
- Keras
- Introducing Jupyter
- Fast installation and first test usage
- Jupyter magic commands
- Installing packages directly from Jupyter Notebooks
- Checking the new JupyterLab environment
- How Jupyter Notebooks can help data scientists
- Alternatives to Jupyter
- Datasets and code used in this book
- Scikit-learn toy datasets
- The MLdata.org and other public repositories for open source data
- LIBSVM data examples
- Loading data directly from CSV or text files
- Scikit-learn sample generators
- Summary
- Data Munging
- The data science process
- Data loading and preprocessing with pandas
- Fast and easy data loading
- Dealing with problematic data
- Dealing with big datasets
- Accessing other data formats
- Putting data together
- Data preprocessing
- Data selection
- Working with categorical and textual data
- A special type of data – text
- Scraping the web with Beautiful Soup
- Data processing with NumPy
- NumPy's n-dimensional array
- The basics of NumPy ndarray objects
- Creating NumPy arrays
- From lists to unidimensional arrays
- Controlling memory size
- Heterogeneous lists
- From lists to multidimensional arrays
- Resizing arrays
- Arrays derived from NumPy functions
- Getting an array directly from a file
- Extracting data from pandas
- NumPy fast operation and computations
- Matrix operations
- Slicing and indexing with NumPy arrays
- Stacking NumPy arrays
- Working with sparse arrays
- Summary
- The Data Pipeline
- Introducing EDA
- Building new features
- Dimensionality reduction
- The covariance matrix
- Principal component analysis
- PCA for big data – RandomizedPCA
- Latent factor analysis
- Linear discriminant analysis
- Latent semantical analysis
- Independent component analysis
- Kernel PCA
- T-SNE
- Restricted Boltzmann Machine
- The detection and treatment of outliers
- Univariate outlier detection
- EllipticEnvelope
- OneClassSVM
- Validation metrics
- Multilabel classification
- Binary classification
- Regression
- Testing and validating
- Cross-validation
- Using cross-validation iterators
- Sampling and bootstrapping
- Hyperparameter optimization
- Building custom scoring functions
- Reducing the grid search runtime
- Feature selection
- Selection based on feature variance
- Univariate selection
- Recursive elimination
- Stability and L1-based selection
- Wrapping everything in a pipeline
- Combining features together and chaining transformations
- Building custom transformation functions
- Summary
- Machine Learning
- Preparing tools and datasets
- Linear and logistic regression
- Naive Bayes
- K-Nearest Neighbors
- Nonlinear algorithms
- SVM for classification
- SVM for regression
- Tuning SVM
- Ensemble strategies
- Pasting by random samples
- Bagging with weak classifiers
- Random Subspaces and Random Patches
- Random Forests and Extra-Trees
- Estimating probabilities from an ensemble
- Sequences of models – AdaBoost
- Gradient tree boosting (GTB)
- XGBoost
- LightGBM
- CatBoost
- Dealing with big data
- Creating some big datasets as examples
- Scalability with volume
- Keeping up with velocity
- Dealing with variety
- An overview of Stochastic Gradient Descent (SGD)
- A peek into natural language processing (NLP)
- Word tokenization
- Stemming
- Word tagging
- Named entity recognition (NER)
- Stopwords
- A complete data science example – text classification
- An overview of unsupervised learning
- K-means
- DBSCAN – a density-based clustering technique
- Latent Dirichlet Allocation (LDA)
- Summary
- Visualization, Insights, and Results
- Introducing the basics of matplotlib
- Trying curve plotting
- Using panels for clearer representations
- Plotting scatterplots for relationships in data
- Histograms
- Bar graphs
- Image visualization
- Selected graphical examples with pandas
- Working with boxplots and histograms
- Plotting scatterplots
- Discovering patterns by parallel coordinates
- Wrapping up matplotlib's commands
- Introducing Seaborn
- Enhancing your EDA capabilities
- Advanced data learning representation
- Learning curves
- Validation curves
- Feature importance for RandomForests
- Gradient Boosting Trees partial dependence plotting
- Creating a prediction server with machine-learning-as-a-service
- Summary
- Social Network Analysis
- Introduction to graph theory
- Graph algorithms
- Types of node centrality
- Partitioning a network
- Graph loading, dumping, and sampling
- Summary
- Deep Learning Beyond the Basics
- Approaching deep learning
- Classifying images with CNN
- Using pre-trained models
- Working with temporal sequences
- Summary
- Spark for Big Data
- From a standalone machine to a bunch of nodes
- Making sense of why we need a distributed framework
- The Hadoop ecosystem
- Hadoop architecture
- Hadoop Distributed File System
- MapReduce
- Introducing Apache Spark
- PySpark
- Starting with PySpark
- Setting up your local Spark instance
- Experimenting with Resilient Distributed Datasets
- Sharing variables across cluster nodes
- Read-only broadcast variables
- Write-only accumulator variables
- Broadcast and accumulator variables together—an example
- Data preprocessing in Spark
- CSV files and Spark DataFrames
- Dealing with missing data
- Grouping and creating tables in-memory
- Writing the preprocessed DataFrame or RDD to disk
- Working with Spark DataFrames
- Machine learning with Spark
- Spark on the KDD99 dataset
- Reading the dataset
- Feature engineering
- Training a learner
- Evaluating a learner's performance
- The power of the machine learning pipeline
- Manual tuning
- Cross-validation
- Final cleanup
- Summary
- Strengthen Your Python Foundations
- Your learning list
- Lists
- Dictionaries
- Defining functions
- Classes, objects, and object-oriented programming
- Exceptions
- Iterators and generators
- Conditionals
- Comprehensions for lists and dictionaries
- Learn by watching, reading, and doing
- Massive open online courses (MOOCs)
- PyCon and PyData
- Interactive Jupyter
- Don't be shy, take a real challenge
- Other Books You May Enjoy
- Leave a review - let other readers know what you think (updated: 2021-08-13 15:20:17)