Python Data Science Essentials
Fully expanded and upgraded, the latest edition of Python Data Science Essentials will help you succeed in data science operations using the most common Python libraries. This book offers up-to-date insight into the core of Python, including the latest versions of the Jupyter Notebook, NumPy, pandas, and scikit-learn. The book covers detailed examples and large hybrid datasets to help you grasp essential statistical techniques for data collection, data munging and analysis, visualization, and reporting activities. You will also gain an understanding of advanced data science topics such as machine learning algorithms, distributed computing, tuning predictive models, and natural language processing. Furthermore, you'll be introduced to deep learning and gradient boosting solutions such as XGBoost, LightGBM, and CatBoost. By the end of the book, you will have gained a complete overview of the principal machine learning algorithms, graph analysis techniques, and all the visualization and deployment instruments that make it easier to present your results to an audience of both data science experts and business users.
Table of Contents (261 chapters)
- Cover
- Title Page
- Copyright and Credits
- Python Data Science Essentials Third Edition
- Packt Upsell
- Why subscribe?
- Packt.com
- Contributors
- About the authors
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- First Steps
- Introducing data science and Python
- Installing Python
- Python 2 or Python 3?
- Step-by-step installation
- Installing the necessary packages
- Package upgrades
- Scientific distributions
- Anaconda
- Leveraging conda to install packages
- Enthought Canopy
- WinPython
- Explaining virtual environments
- Conda for managing environments
- A glance at the essential packages
- NumPy
- SciPy
- pandas
- pandas-profiling
- Scikit-learn
- Jupyter
- JupyterLab
- Matplotlib
- Seaborn
- Statsmodels
- Beautiful Soup
- NetworkX
- NLTK
- Gensim
- PyPy
- XGBoost
- LightGBM
- CatBoost
- TensorFlow
- Keras
- Introducing Jupyter
- Fast installation and first test usage
- Jupyter magic commands
- Installing packages directly from Jupyter Notebooks
- Checking the new JupyterLab environment
- How Jupyter Notebooks can help data scientists
- Alternatives to Jupyter
- Datasets and code used in this book
- Scikit-learn toy datasets
- The MLdata.org and other public repositories for open source data
- LIBSVM data examples
- Loading data directly from CSV or text files
- Scikit-learn sample generators
- Summary
- Data Munging
- The data science process
- Data loading and preprocessing with pandas
- Fast and easy data loading
- Dealing with problematic data
- Dealing with big datasets
- Accessing other data formats
- Putting data together
- Data preprocessing
- Data selection
- Working with categorical and textual data
- A special type of data – text
- Scraping the web with Beautiful Soup
- Data processing with NumPy
- NumPy's n-dimensional array
- The basics of NumPy ndarray objects
- Creating NumPy arrays
- From lists to unidimensional arrays
- Controlling memory size
- Heterogeneous lists
- From lists to multidimensional arrays
- Resizing arrays
- Arrays derived from NumPy functions
- Getting an array directly from a file
- Extracting data from pandas
- NumPy fast operation and computations
- Matrix operations
- Slicing and indexing with NumPy arrays
- Stacking NumPy arrays
- Working with sparse arrays
- Summary
- The Data Pipeline
- Introducing EDA
- Building new features
- Dimensionality reduction
- The covariance matrix
- Principal component analysis
- PCA for big data – RandomizedPCA
- Latent factor analysis
- Linear discriminant analysis
- Latent semantical analysis
- Independent component analysis
- Kernel PCA
- T-SNE
- Restricted Boltzmann Machine
- The detection and treatment of outliers
- Univariate outlier detection
- EllipticEnvelope
- OneClassSVM
- Validation metrics
- Multilabel classification
- Binary classification
- Regression
- Testing and validating
- Cross-validation
- Using cross-validation iterators
- Sampling and bootstrapping
- Hyperparameter optimization
- Building custom scoring functions
- Reducing the grid search runtime
- Feature selection
- Selection based on feature variance
- Univariate selection
- Recursive elimination
- Stability and L1-based selection
- Wrapping everything in a pipeline
- Combining features together and chaining transformations
- Building custom transformation functions
- Summary
- Machine Learning
- Preparing tools and datasets
- Linear and logistic regression
- Naive Bayes
- K-Nearest Neighbors
- Nonlinear algorithms
- SVM for classification
- SVM for regression
- Tuning SVM
- Ensemble strategies
- Pasting by random samples
- Bagging with weak classifiers
- Random Subspaces and Random Patches
- Random Forests and Extra-Trees
- Estimating probabilities from an ensemble
- Sequences of models – AdaBoost
- Gradient tree boosting (GTB)
- XGBoost
- LightGBM
- CatBoost
- Dealing with big data
- Creating some big datasets as examples
- Scalability with volume
- Keeping up with velocity
- Dealing with variety
- An overview of Stochastic Gradient Descent (SGD)
- A peek into natural language processing (NLP)
- Word tokenization
- Stemming
- Word tagging
- Named entity recognition (NER)
- Stopwords
- A complete data science example – text classification
- An overview of unsupervised learning
- K-means
- DBSCAN – a density-based clustering technique
- Latent Dirichlet Allocation (LDA)
- Summary
- Visualization, Insights, and Results
- Introducing the basics of matplotlib
- Trying curve plotting
- Using panels for clearer representations
- Plotting scatterplots for relationships in data
- Histograms
- Bar graphs
- Image visualization
- Selected graphical examples with pandas
- Working with boxplots and histograms
- Plotting scatterplots
- Discovering patterns by parallel coordinates
- Wrapping up matplotlib's commands
- Introducing Seaborn
- Enhancing your EDA capabilities
- Advanced data learning representation
- Learning curves
- Validation curves
- Feature importance for RandomForests
- Gradient Boosting Trees partial dependence plotting
- Creating a prediction server with machine-learning-as-a-service
- Summary
- Social Network Analysis
- Introduction to graph theory
- Graph algorithms
- Types of node centrality
- Partitioning a network
- Graph loading, dumping, and sampling
- Summary
- Deep Learning Beyond the Basics
- Approaching deep learning
- Classifying images with CNN
- Using pre-trained models
- Working with temporal sequences
- Summary
- Spark for Big Data
- From a standalone machine to a bunch of nodes
- Making sense of why we need a distributed framework
- The Hadoop ecosystem
- Hadoop architecture
- Hadoop Distributed File System
- MapReduce
- Introducing Apache Spark
- PySpark
- Starting with PySpark
- Setting up your local Spark instance
- Experimenting with Resilient Distributed Datasets
- Sharing variables across cluster nodes
- Read-only broadcast variables
- Write-only accumulator variables
- Broadcast and accumulator variables together—an example
- Data preprocessing in Spark
- CSV files and Spark DataFrames
- Dealing with missing data
- Grouping and creating tables in-memory
- Writing the preprocessed DataFrame or RDD to disk
- Working with Spark DataFrames
- Machine learning with Spark
- Spark on the KDD99 dataset
- Reading the dataset
- Feature engineering
- Training a learner
- Evaluating a learner's performance
- The power of the machine learning pipeline
- Manual tuning
- Cross-validation
- Final cleanup
- Summary
- Strengthen Your Python Foundations
- Your learning list
- Lists
- Dictionaries
- Defining functions
- Classes, objects, and object-oriented programming
- Exceptions
- Iterators and generators
- Conditionals
- Comprehensions for lists and dictionaries
- Learn by watching, reading, and doing
- Massive open online courses (MOOCs)
- PyCon and PyData
- Interactive Jupyter
- Don't be shy, take a real challenge
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-08-13 15:20:17