- Cover
- Copyright page
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- eBooks, discount offers, and more
- Preface
- Why do you need this book?
- Data analysis, data science, big data – what is the big deal?
- A brief history of data analysis with Python
- A conjecture about the future
- What this book covers
- What you need for this book
- Who this book is for
- Sections
- Conventions
- Reader feedback
- Customer support
- Chapter 1. Laying the Foundation for Reproducible Data Analysis
- Introduction
- Setting up Anaconda
- Installing the Data Science Toolbox
- Creating a virtual environment with virtualenv and virtualenvwrapper
- Sandboxing Python applications with Docker images
- Keeping track of package versions and history in IPython Notebook
- Configuring IPython
- Learning to log for robust error checking
- Unit testing your code
- Configuring pandas
- Configuring matplotlib
- Seeding random number generators and NumPy print options
- Standardizing reports, code style, and data access
- Chapter 2. Creating Attractive Data Visualizations
- Introduction
- Graphing Anscombe's quartet
- Choosing seaborn color palettes
- Choosing matplotlib color maps
- Interacting with IPython Notebook widgets
- Viewing a matrix of scatterplots
- Visualizing with d3.js via mpld3
- Creating heatmaps
- Combining box plots and kernel density plots with violin plots
- Visualizing network graphs with hive plots
- Displaying geographical maps
- Using ggplot2-like plots
- Highlighting data points with influence plots
- Chapter 3. Statistical Data Analysis and Probability
- Introduction
- Fitting data to the exponential distribution
- Fitting aggregated data to the gamma distribution
- Fitting aggregated counts to the Poisson distribution
- Determining bias
- Estimating kernel density
- Determining confidence intervals for mean, variance, and standard deviation
- Sampling with probability weights
- Exploring extreme values
- Correlating variables with Pearson's correlation
- Correlating variables with the Spearman rank correlation
- Correlating a binary and a continuous variable with the point biserial correlation
- Evaluating relations between variables with ANOVA
- Chapter 4. Dealing with Data and Numerical Issues
- Introduction
- Clipping and filtering outliers
- Winsorizing data
- Measuring central tendency of noisy data
- Normalizing with the Box-Cox transformation
- Transforming data with the power ladder
- Transforming data with logarithms
- Rebinning data
- Applying logit() to transform proportions
- Fitting a robust linear model
- Taking variance into account with weighted least squares
- Using arbitrary precision for optimization
- Using arbitrary precision for linear algebra
- Chapter 5. Web Mining, Databases, and Big Data
- Introduction
- Simulating web browsing
- Scraping the Web
- Dealing with non-ASCII text and HTML entities
- Implementing association tables
- Setting up database migration scripts
- Adding a table column to an existing table
- Adding indices after table creation
- Setting up a test web server
- Implementing a star schema with fact and dimension tables
- Using HDFS
- Setting up Spark
- Clustering data with Spark
- Chapter 6. Signal Processing and Timeseries
- Introduction
- Spectral analysis with periodograms
- Estimating power spectral density with the Welch method
- Analyzing peaks
- Measuring phase synchronization
- Exponential smoothing
- Evaluating smoothing
- Using the Lomb-Scargle periodogram
- Analyzing the frequency spectrum of audio
- Analyzing signals with the discrete cosine transform
- Block bootstrapping time series data
- Moving block bootstrapping time series data
- Applying the discrete wavelet transform
- Chapter 7. Selecting Stocks with Financial Data Analysis
- Introduction
- Computing simple and log returns
- Ranking stocks with the Sharpe ratio and liquidity
- Ranking stocks with the Calmar and Sortino ratios
- Analyzing returns statistics
- Correlating individual stocks with the broader market
- Exploring risk and return
- Examining the market with the non-parametric runs test
- Testing for random walks
- Determining market efficiency with autoregressive models
- Creating tables for a stock prices database
- Populating the stock prices database
- Optimizing an equal weights two-asset portfolio
- Chapter 8. Text Mining and Social Network Analysis
- Introduction
- Creating a categorized corpus
- Tokenizing news articles in sentences and words
- Stemming, lemmatizing, filtering, and TF-IDF scores
- Recognizing named entities
- Extracting topics with non-negative matrix factorization
- Implementing a basic terms database
- Computing social network density
- Calculating social network closeness centrality
- Determining the betweenness centrality
- Estimating the average clustering coefficient
- Calculating the assortativity coefficient of a graph
- Getting the clique number of a graph
- Creating a document graph with cosine similarity
- Chapter 9. Ensemble Learning and Dimensionality Reduction
- Introduction
- Recursively eliminating features
- Applying principal component analysis for dimension reduction
- Applying linear discriminant analysis for dimension reduction
- Stacking and majority voting for multiple models
- Learning with random forests
- Fitting noisy data with the RANSAC algorithm
- Bagging to improve results
- Boosting for better learning
- Nesting cross-validation
- Reusing models with joblib
- Hierarchically clustering data
- Taking a Theano tour
- Chapter 10. Evaluating Classifiers, Regressors, and Clusters
- Introduction
- Getting classification straight with the confusion matrix
- Computing precision, recall, and F1-score
- Examining a receiver operating characteristic and the area under a curve
- Visualizing the goodness of fit
- Computing MSE and median absolute error
- Evaluating clusters with the mean silhouette coefficient
- Comparing results with a dummy classifier
- Determining MAPE and MPE
- Comparing with a dummy regressor
- Calculating the mean absolute error and the residual sum of squares
- Examining the kappa of classification
- Taking a look at the Matthews correlation coefficient
- Chapter 11. Analyzing Images
- Introduction
- Setting up OpenCV
- Applying Scale-Invariant Feature Transform (SIFT)
- Detecting features with SURF
- Quantizing colors
- Denoising images
- Extracting patches from an image
- Detecting faces with Haar cascades
- Searching for bright stars
- Extracting metadata from images
- Extracting texture features from images
- Applying hierarchical clustering on images
- Segmenting images with spectral clustering
- Chapter 12. Parallelism and Performance
- Introduction
- Just-in-time compiling with Numba
- Speeding up numerical expressions with Numexpr
- Running multiple threads with the threading module
- Launching multiple tasks with the concurrent.futures module
- Accessing resources asynchronously with the asyncio module
- Distributed processing with execnet
- Profiling memory usage
- Calculating the mean, variance, skewness, and kurtosis on the fly
- Caching with a least recently used cache
- Caching HTTP requests
- Streaming counting with the Count-min sketch
- Harnessing the power of the GPU with OpenCL
- Appendix A. Glossary
- Appendix B. Function Reference
- IPython
- Matplotlib
- NumPy
- pandas
- Scikit-learn
- SciPy
- Seaborn
- Statsmodels
- Appendix C. Online Resources
- IPython notebooks and open data
- Mathematics and statistics
- Appendix D. Tips and Tricks for Command-Line and Miscellaneous Tools
- IPython notebooks
- Command-line tools
- The alias command
- Command-line history
- Reproducible sessions
- Docker tips
- Index (updated 2021-07-14 11:06:29)