- Mastering Java for Data Science
- Alexey Grigorev
- 2021-07-02 23:44:31
Data science
Data science is the discipline of extracting actionable knowledge from data of various forms. The name data science emerged quite recently: it was coined by DJ Patil and Jeff Hammerbacher and popularized in 2012 by the article Data Scientist: The Sexiest Job of the 21st Century. The discipline itself, however, had existed for quite a while under other names, such as data mining or predictive analytics. Data science, like its predecessors, is built on statistics and machine learning algorithms for knowledge extraction and model building.
The science part of the term data science is no coincidence: if we look up science, its definition can be summarized as the systematic organization of knowledge in terms of testable explanations and predictions. This is exactly what data scientists do: by extracting patterns from available data, they make predictions about future, unseen data, and they make sure these predictions are validated beforehand.
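The idea of validating predictions beforehand can be illustrated with a minimal sketch. It assumes the simplest possible setup: a straight line fitted by least squares to one part of the data, then scored on a held-out part the model never saw. The class name HoldOutSketch, the helper methods, and the numbers are all made up for illustration and are not taken from the book.

```java
import java.util.Arrays;

class HoldOutSketch {

    /** Fits y = slope * x + intercept by ordinary least squares; returns {slope, intercept}. */
    static double[] fitLine(double[] x, double[] y) {
        int n = x.length;
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }

    /** Mean absolute error of the fitted line on the given points. */
    static double meanAbsoluteError(double[] model, double[] x, double[] y) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            double predicted = model[0] * x[i] + model[1];
            total += Math.abs(predicted - y[i]);
        }
        return total / x.length;
    }

    public static void main(String[] args) {
        // Made-up data for illustration: roughly y = 2x + 1 with small deviations.
        double[] x = {1, 2, 3, 4, 5, 6, 7, 8};
        double[] y = {3.1, 4.9, 7.2, 9.0, 10.8, 13.1, 15.0, 16.9};

        // Hold out the last two points; the model never sees them during fitting.
        int split = 6;
        double[] model = fitLine(
                Arrays.copyOfRange(x, 0, split), Arrays.copyOfRange(y, 0, split));
        double error = meanAbsoluteError(model,
                Arrays.copyOfRange(x, split, x.length), Arrays.copyOfRange(y, split, y.length));

        System.out.printf("slope=%.3f intercept=%.3f held-out MAE=%.3f%n",
                model[0], model[1], error);
    }
}
```

The held-out error, rather than the error on the fitting data, is what indicates how the model may behave on future data. In practice the split would usually be random rather than positional; the fixed split here only keeps the sketch short.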
Nowadays, data science is used across many fields, including (but not limited to):
- Banking: Risk management (for example, credit scoring), fraud detection, trading
- Insurance: Claims management (for example, accelerating claim approval), risk and loss estimation, and fraud detection
- Health care: Predicting diseases (such as strokes, diabetes, cancer) and relapses
- Retail and e-commerce: Market basket analysis (identifying products that go well together), recommendation engines, product categorization, and personalized search
This book covers the following practical use cases:
- Predicting whether a URL is likely to appear on the first page of a search engine
- Predicting how fast an operation will be completed given the hardware specifications
- Ranking text documents for a search engine
- Checking whether there is a cat or a dog in a picture
- Recommending friends in a social network
- Processing large-scale textual data on a cluster of computers
In all these cases, we will use data science to learn from data and use the learned knowledge to solve a particular business problem.
We will also use a running example throughout the book: building a search engine. We will use it to illustrate many data science concepts, such as supervised machine learning, dimensionality reduction, text mining, and learning-to-rank models.