- Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
- Tarek Amr
- 645 words
- 2021-06-18 18:24:28
Introduction to scikit-learn
Since you have already picked up this book, you probably don't need me to convince you why machine learning is important. However, you may still have doubts about why to use scikit-learn in particular. You may encounter names such as TensorFlow, PyTorch, and Spark more often during your daily news consumption than scikit-learn. So, let me convince you of my preference for the latter.
It plays well with the Python data ecosystem
scikit-learn is a Python toolkit built on top of NumPy, SciPy, and Matplotlib. These choices mean that it fits well into your daily data pipeline. As a data scientist, you most likely have Python as your language of choice, since it is good for both offline analysis and real-time implementations. You will also be using tools such as pandas to load data from your database and to perform a vast number of transformations on your data. Since both pandas and scikit-learn are built on top of NumPy, they play very well with each other. Matplotlib is the de facto data visualization tool for Python, which means you can use its sophisticated data visualization capabilities to explore your data and unravel your model's ins and outs.
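To make the interplay concrete, here is a minimal sketch of a pandas DataFrame being handed straight to a scikit-learn estimator. The column names and values are made up for illustration and stand in for whatever you would load from your own database; `LinearRegression` is just one convenient estimator to demonstrate with.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data standing in for rows loaded from a database.
df = pd.DataFrame({
    "rooms": [1, 2, 3, 4, 5],
    "area_sqm": [30.0, 45.0, 70.0, 85.0, 110.0],
    "price": [100.0, 150.0, 230.0, 280.0, 360.0],
})

# A typical pandas transformation: derive a new column before modelling.
df["area_per_room"] = df["area_sqm"] / df["rooms"]

# Select features and target with pandas; no manual conversion is needed.
X = df[["rooms", "area_sqm", "area_per_room"]]
y = df["price"]

# scikit-learn accepts the DataFrame directly because both libraries
# sit on top of NumPy arrays.
model = LinearRegression()
model.fit(X, y)

# Predictions come back as a plain NumPy array.
predictions = model.predict(X)
print(type(predictions), predictions.shape)
```

The predictions can be assigned back to the DataFrame as a new column or passed to Matplotlib for plotting, since all of these tools exchange NumPy arrays under the hood.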
Since it is an open source tool that is heavily used in the community, it is very common to see other data tools use an almost identical interface to scikit-learn. Many of these tools are built on top of the same scientific Python libraries, and they are collectively known as SciKits (short for SciPy Toolkits), hence the scikit prefix in scikit-learn. For example, scikit-image is a library for image processing, while categorical-encoding and imbalanced-learn are separate libraries for data preprocessing that are built as add-ons to scikit-learn.
We are going to use some of these tools in this book, and you will notice how easy it is to integrate these different tools into your workflow when using scikit-learn.
Being a key player in the Python data ecosystem is what makes scikit-learn the de facto toolset for machine learning. This is the tool that you will most likely use for your job application assignments, for Kaggle competitions, and for most of the day-to-day machine learning problems in your job.
Practical level of abstraction
scikit-learn implements a vast number of machine learning, data processing, and model selection algorithms. These implementations are abstract enough that you only need to apply minor changes when switching from one algorithm to another. This is a key feature, since you will need to iterate quickly between different algorithms when developing a model to pick the best one for your problem. That said, this abstraction doesn't shield you from the algorithms' configurations. In other words, you are still in full control of your hyperparameters and settings.
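A short sketch may help show what these minor changes look like in practice. The dataset, the three classifiers, and the hyperparameter values below are arbitrary choices made for illustration, not recommendations; the point is only that every estimator is trained and scored through the same `fit` and `score` calls while its settings remain explicit.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# A small built-in dataset, used here purely as an example.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Switching algorithms only means swapping the estimator object.
# Each one still exposes its own hyperparameters in the constructor.
candidates = {
    "decision_tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "k_neighbors": KNeighborsClassifier(n_neighbors=5),
    "logistic_regression": LogisticRegression(C=1.0, max_iter=1000),
}

scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)                 # same call for every algorithm
    scores[name] = clf.score(X_test, y_test)  # mean accuracy on held-out data

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Adding another algorithm to the comparison is a one-line change to the `candidates` dictionary, and `get_params()` on any of the estimators lists the settings it was configured with.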
When not to use scikit-learn
Most likely, the reasons not to use scikit-learn will involve deep learning, scale, or a combination of both. scikit-learn's implementation of neural networks is limited. Unlike scikit-learn, TensorFlow and PyTorch allow you to use a custom architecture, and they support GPUs for training at a massive scale. All of scikit-learn's implementations run in memory on a single machine. I'd say that way more than 90% of businesses are at a scale where these constraints are fine. Data scientists can still fit their data in memory on large enough machines thanks to the cloud options available. They can also cleverly engineer workarounds to deal with scaling issues, but if these limitations become something that they can no longer deal with, then they will need other tools to do the trick for them.