- Mastering Java for Data Science
- Alexey Grigorev
- 2021-07-02 23:44:31
Machine learning
Machine learning is a part of computer science, and it is at the core of data science. Data on its own, especially in large volumes, is of little use, but it hides highly valuable patterns. With the help of machine learning, we can recognize these hidden patterns, extract them, and then apply what we have learned to new, unseen items.
For example, given an image of an animal, a machine learning algorithm can say whether the picture shows a dog or a cat; or, given the history of a bank client, it can estimate how likely the client is to default, that is, to fail to repay a debt.
Often, machine learning models are seen as black boxes that take in a data point and output a prediction for it. In this book, we will look at what is inside these black boxes and see how and when it is best to use them.
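To make the black-box view concrete, here is a minimal Java sketch, not code from this book: a model exposes a `fit` step that learns from labeled examples and a `predict` step that maps a new data point to an output. We assume a nearest-centroid classifier, chosen only because it is short; the `Classifier` interface, the class names, and the two-number feature vectors standing in for animal images are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlackBoxDemo {

    /** Hypothetical black-box contract: learn from labeled data, then predict. */
    interface Classifier {
        void fit(List<double[]> points, List<String> labels);

        String predict(double[] point);
    }

    /** Nearest-centroid classifier: remembers the mean point of each label. */
    static class NearestCentroid implements Classifier {
        private final Map<String, double[]> centroids = new HashMap<>();

        @Override
        public void fit(List<double[]> points, List<String> labels) {
            Map<String, double[]> sums = new HashMap<>();
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i < points.size(); i++) {
                double[] p = points.get(i);
                // Accumulate a running sum of feature values per label.
                double[] sum = sums.computeIfAbsent(labels.get(i), k -> new double[p.length]);
                for (int j = 0; j < p.length; j++) {
                    sum[j] += p[j];
                }
                counts.merge(labels.get(i), 1, Integer::sum);
            }
            centroids.clear();
            // Divide each sum by its count to get the centroid (mean point).
            for (Map.Entry<String, double[]> e : sums.entrySet()) {
                double[] sum = e.getValue();
                int n = counts.get(e.getKey());
                double[] mean = new double[sum.length];
                for (int j = 0; j < sum.length; j++) {
                    mean[j] = sum[j] / n;
                }
                centroids.put(e.getKey(), mean);
            }
        }

        @Override
        public String predict(double[] point) {
            // Return the label whose centroid is closest to the new point.
            String best = null;
            double bestDist = Double.POSITIVE_INFINITY;
            for (Map.Entry<String, double[]> e : centroids.entrySet()) {
                double d = squaredDistance(point, e.getValue());
                if (d < bestDist) {
                    bestDist = d;
                    best = e.getKey();
                }
            }
            return best;
        }

        private static double squaredDistance(double[] a, double[] b) {
            double s = 0;
            for (int j = 0; j < a.length; j++) {
                double diff = a[j] - b[j];
                s += diff * diff;
            }
            return s;
        }
    }

    public static void main(String[] args) {
        // Made-up 2-D feature vectors standing in for animal images.
        List<double[]> points = List.of(
                new double[] {1.0, 2.0}, new double[] {2.0, 1.0},  // cats
                new double[] {8.0, 9.0}, new double[] {9.0, 8.0}); // dogs
        List<String> labels = List.of("cat", "cat", "dog", "dog");

        Classifier model = new NearestCentroid();
        model.fit(points, labels);
        System.out.println(model.predict(new double[] {1.5, 1.0})); // prints cat
        System.out.println(model.predict(new double[] {7.5, 9.0})); // prints dog
    }
}
```

Training computes one mean point (centroid) per label; prediction returns the label whose centroid is closest in squared Euclidean distance. Real image classifiers use far richer features and models, but the fit-then-predict contract is the same.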
The typical problems that machine learning solves can be categorized into the following groups:
- Supervised learning: For each data point, we have a label, that is, extra information that describes the outcome we want to learn. In the cats versus dogs case, the data point is an image of the animal, and the label describes whether it's a dog or a cat.
- Unsupervised learning: We only have raw data points and no label information is available. For example, we have a collection of e-mails and we would like to group them based on how similar they are. There is no explicit label associated with the e-mails, which makes this problem unsupervised.
- Semi-supervised learning: Labels are available for only part of the data.
- Reinforcement learning: Instead of labels, we have a reward, something the model receives by interacting with the environment it runs in. The model adapts its behavior based on this reward and tries to maximize it. For example, a model that learns how to play chess gets a positive reward each time it captures one of the opponent's pieces and a negative reward each time it loses one of its own; the size of the reward is proportional to the value of the piece.
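The unsupervised case can be sketched with k-means clustering, a standard grouping algorithm that we swap in here purely for illustration. Each e-mail is assumed to have already been turned into a numeric feature vector (the vectors below are made up), and the algorithm groups the vectors by distance without ever seeing a label. The class name `KMeansDemo` and the simple first-k-points initialization are assumptions of this sketch.

```java
import java.util.Arrays;

public class KMeansDemo {

    /**
     * Groups points into k clusters without using any labels (Lloyd's k-means).
     * Returns the cluster index assigned to each point. Assumes points.length >= k.
     */
    static int[] kMeans(double[][] points, int k, int iterations) {
        int dim = points[0].length;
        // Initialise centroids with the first k points (a simple, deterministic choice).
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                assignment[i] = nearest(points[i], centroids);
            }
            // Update step: move each centroid to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                int c = assignment[i];
                counts[c]++;
                for (int j = 0; j < dim; j++) {
                    sums[c][j] += points[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // keep an empty cluster's centroid where it was
                }
                for (int j = 0; j < dim; j++) {
                    centroids[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assignment;
    }

    /** Index of the centroid with the smallest squared Euclidean distance to p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int j = 0; j < p.length; j++) {
                double diff = p[j] - centroids[c][j];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical 2-D feature vectors, e.g. two numbers summarising each e-mail.
        double[][] emails = {
            {1.0, 1.0}, {1.2, 0.8}, {8.0, 9.0}, {0.9, 1.1}, {8.5, 8.7}
        };
        int[] clusters = kMeans(emails, 2, 10);
        System.out.println(Arrays.toString(clusters));
    }
}
```

With the sample data, the three vectors near (1, 1) end up in one cluster and the two near (8, 9) in the other; the cluster numbers themselves carry no meaning, which reflects the absence of labels. Practical implementations typically use smarter initialization, such as k-means++, and stop once the assignments no longer change.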