Python Data Science Essentials
Fully expanded and upgraded, the latest edition of Python Data Science Essentials will help you succeed in data science operations using the most common Python libraries. This book offers up-to-date insight into the core of Python, including the latest versions of the Jupyter Notebook, NumPy, pandas, and scikit-learn. The book covers detailed examples and large hybrid datasets to help you grasp essential statistical techniques for data collection, data munging and analysis, visualization, and reporting activities. You will also gain an understanding of advanced data science topics such as machine learning algorithms, distributed computing, tuning predictive models, and natural language processing. Furthermore, you'll be introduced to deep learning and gradient boosting solutions such as XGBoost, LightGBM, and CatBoost. By the end of the book, you will have gained a complete overview of the principal machine learning algorithms, graph analysis techniques, and all the visualization and deployment instruments that make it easier to present your results to an audience of both data science experts and business users.
Table of contents (261 chapters)
- Cover
- Title Page
- Copyright and Credits
- Python Data Science Essentials Third Edition
- Packt Upsell
- Why subscribe?
- Packt.com
- Contributors
- About the authors
- About the reviewers
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- First Steps
- Introducing data science and Python
- Installing Python
- Python 2 or Python 3?
- Step-by-step installation
- Installing the necessary packages
- Package upgrades
- Scientific distributions
- Anaconda
- Leveraging conda to install packages
- Enthought Canopy
- WinPython
- Explaining virtual environments
- Conda for managing environments
- A glance at the essential packages
- NumPy
- SciPy
- pandas
- pandas-profiling
- Scikit-learn
- Jupyter
- JupyterLab
- Matplotlib
- Seaborn
- Statsmodels
- Beautiful Soup
- NetworkX
- NLTK
- Gensim
- PyPy
- XGBoost
- LightGBM
- CatBoost
- TensorFlow
- Keras
- Introducing Jupyter
- Fast installation and first test usage
- Jupyter magic commands
- Installing packages directly from Jupyter Notebooks
- Checking the new JupyterLab environment
- How Jupyter Notebooks can help data scientists
- Alternatives to Jupyter
- Datasets and code used in this book
- Scikit-learn toy datasets
- The MLdata.org and other public repositories for open source data
- LIBSVM data examples
- Loading data directly from CSV or text files
- Scikit-learn sample generators
- Summary
- Data Munging
- The data science process
- Data loading and preprocessing with pandas
- Fast and easy data loading
- Dealing with problematic data
- Dealing with big datasets
- Accessing other data formats
- Putting data together
- Data preprocessing
- Data selection
- Working with categorical and textual data
- A special type of data – text
- Scraping the web with Beautiful Soup
- Data processing with NumPy
- NumPy's n-dimensional array
- The basics of NumPy ndarray objects
- Creating NumPy arrays
- From lists to unidimensional arrays
- Controlling memory size
- Heterogeneous lists
- From lists to multidimensional arrays
- Resizing arrays
- Arrays derived from NumPy functions
- Getting an array directly from a file
- Extracting data from pandas
- NumPy fast operation and computations
- Matrix operations
- Slicing and indexing with NumPy arrays
- Stacking NumPy arrays
- Working with sparse arrays
- Summary
- The Data Pipeline
- Introducing EDA
- Building new features
- Dimensionality reduction
- The covariance matrix
- Principal component analysis
- PCA for big data – RandomizedPCA
- Latent factor analysis
- Linear discriminant analysis
- Latent semantical analysis
- Independent component analysis
- Kernel PCA
- T-SNE
- Restricted Boltzmann Machine
- The detection and treatment of outliers
- Univariate outlier detection
- EllipticEnvelope
- OneClassSVM
- Validation metrics
- Multilabel classification
- Binary classification
- Regression
- Testing and validating
- Cross-validation
- Using cross-validation iterators
- Sampling and bootstrapping
- Hyperparameter optimization
- Building custom scoring functions
- Reducing the grid search runtime
- Feature selection
- Selection based on feature variance
- Univariate selection
- Recursive elimination
- Stability and L1-based selection
- Wrapping everything in a pipeline
- Combining features together and chaining transformations
- Building custom transformation functions
- Summary
- Machine Learning
- Preparing tools and datasets
- Linear and logistic regression
- Naive Bayes
- K-Nearest Neighbors
- Nonlinear algorithms
- SVM for classification
- SVM for regression
- Tuning SVM
- Ensemble strategies
- Pasting by random samples
- Bagging with weak classifiers
- Random Subspaces and Random Patches
- Random Forests and Extra-Trees
- Estimating probabilities from an ensemble
- Sequences of models – AdaBoost
- Gradient tree boosting (GTB)
- XGBoost
- LightGBM
- CatBoost
- Dealing with big data
- Creating some big datasets as examples
- Scalability with volume
- Keeping up with velocity
- Dealing with variety
- An overview of Stochastic Gradient Descent (SGD)
- A peek into natural language processing (NLP)
- Word tokenization
- Stemming
- Word tagging
- Named entity recognition (NER)
- Stopwords
- A complete data science example – text classification
- An overview of unsupervised learning
- K-means
- DBSCAN – a density-based clustering technique
- Latent Dirichlet Allocation (LDA)
- Summary
- Visualization, Insights, and Results
- Introducing the basics of matplotlib
- Trying curve plotting
- Using panels for clearer representations
- Plotting scatterplots for relationships in data
- Histograms
- Bar graphs
- Image visualization
- Selected graphical examples with pandas
- Working with boxplots and histograms
- Plotting scatterplots
- Discovering patterns by parallel coordinates
- Wrapping up matplotlib's commands
- Introducing Seaborn
- Enhancing your EDA capabilities
- Advanced data learning representation
- Learning curves
- Validation curves
- Feature importance for RandomForests
- Gradient Boosting Trees partial dependence plotting
- Creating a prediction server with machine-learning-as-a-service
- Summary
- Social Network Analysis
- Introduction to graph theory
- Graph algorithms
- Types of node centrality
- Partitioning a network
- Graph loading, dumping, and sampling
- Summary
- Deep Learning Beyond the Basics
- Approaching deep learning
- Classifying images with CNN
- Using pre-trained models
- Working with temporal sequences
- Summary
- Spark for Big Data
- From a standalone machine to a bunch of nodes
- Making sense of why we need a distributed framework
- The Hadoop ecosystem
- Hadoop architecture
- Hadoop Distributed File System
- MapReduce
- Introducing Apache Spark
- PySpark
- Starting with PySpark
- Setting up your local Spark instance
- Experimenting with Resilient Distributed Datasets
- Sharing variables across cluster nodes
- Read-only broadcast variables
- Write-only accumulator variables
- Broadcast and accumulator variables together—an example
- Data preprocessing in Spark
- CSV files and Spark DataFrames
- Dealing with missing data
- Grouping and creating tables in-memory
- Writing the preprocessed DataFrame or RDD to disk
- Working with Spark DataFrames
- Machine learning with Spark
- Spark on the KDD99 dataset
- Reading the dataset
- Feature engineering
- Training a learner
- Evaluating a learner's performance
- The power of the machine learning pipeline
- Manual tuning
- Cross-validation
- Final cleanup
- Summary
- Strengthen Your Python Foundations
- Your learning list
- Lists
- Dictionaries
- Defining functions
- Classes, objects, and object-oriented programming
- Exceptions
- Iterators and generators
- Conditionals
- Comprehensions for lists and dictionaries
- Learn by watching, reading, and doing
- Massive open online courses (MOOCs)
- PyCon and PyData
- Interactive Jupyter
- Don't be shy, take a real challenge
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-08-13 15:20:17