- Mastering Java for Data Science
- Alexey Grigorev
- 358 words
- 2021-07-02 23:44:31
Machine learning
Machine learning is a branch of computer science, and it is at the core of data science. Raw data, especially in large volumes, is of little use on its own, but it hides highly valuable patterns. With the help of machine learning, we can recognize these hidden patterns, extract them, and then apply what was learned to new, unseen items.
For example, given an image of an animal, a machine learning algorithm can say whether the picture shows a dog or a cat; or, given the history of a bank client, it can estimate how likely the client is to default, that is, to fail to repay the debt.
Often, machine learning models are seen as black boxes that take in a data point and output a prediction for it. In this book, we will look at what is inside these black boxes and see how and when it is best to use them.
The typical problems that machine learning solves can be categorized in the following groups:
- Supervised learning: For each data point, we have a label, that is, extra information describing the outcome that we want to learn. In the cats versus dogs case, the data point is an image of the animal; the label describes whether it's a dog or a cat.
- Unsupervised learning: We only have raw data points and no label information is available. For example, we have a collection of e-mails and we would like to group them based on how similar they are. There is no explicit label associated with the e-mails, which makes this problem unsupervised.
- Semi-supervised learning: Labels are given only for a part of the data.
- Reinforcement learning: Instead of labels, we have a reward, something the model gets by interacting with the environment it runs in. Based on the reward, the model can adapt its behavior to maximize it. For example, a model that learns how to play chess gets a positive reward each time it captures an opponent's piece and a negative reward each time it loses one of its own, with the reward proportional to the value of the piece.
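
The chess reward described in the last item can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `ChessReward`, the `Piece` enum, and the two helper methods are hypothetical, and the point values (pawn 1, knight 3, bishop 3, rook 5, queen 9) are the conventional ones, not values given in the text.

```java
import java.util.EnumMap;
import java.util.Map;

public class ChessReward {

    // Hypothetical piece types; the king is left out because it is never captured.
    public enum Piece { PAWN, KNIGHT, BISHOP, ROOK, QUEEN }

    // Conventional point values, assumed here purely for illustration.
    private static final Map<Piece, Integer> VALUE = new EnumMap<>(Piece.class);
    static {
        VALUE.put(Piece.PAWN, 1);
        VALUE.put(Piece.KNIGHT, 3);
        VALUE.put(Piece.BISHOP, 3);
        VALUE.put(Piece.ROOK, 5);
        VALUE.put(Piece.QUEEN, 9);
    }

    // Positive reward for capturing an opponent's piece, proportional to its value.
    public static int rewardForCapture(Piece captured) {
        return VALUE.get(captured);
    }

    // Negative reward for losing one of our own pieces, proportional to its value.
    public static int rewardForLoss(Piece lost) {
        return -VALUE.get(lost);
    }

    public static void main(String[] args) {
        // Accumulate the reward over a short, made-up sequence of events.
        int total = 0;
        total += rewardForCapture(Piece.KNIGHT);
        total += rewardForLoss(Piece.PAWN);
        total += rewardForCapture(Piece.QUEEN);
        System.out.println("Total reward: " + total);
    }
}
```

A learning algorithm would use such a signal to prefer moves that lead to a higher accumulated reward; the sketch covers only the reward itself, not the learning procedure.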