Data science

Data science is the discipline of extracting actionable knowledge from data of various forms. The name data science emerged quite recently; it was coined by DJ Patil and Jeff Hammerbacher and popularized in 2012 by the article Data Scientist: The Sexiest Job of the 21st Century. The discipline itself, however, had existed for quite a while and was previously known by other names, such as data mining or predictive analytics. Data science, like its predecessors, is built on statistics and machine learning algorithms for knowledge extraction and model building.

The science part of the term data science is no coincidence: if we look up science, its definition can be summarized as the systematic organization of knowledge in terms of testable explanations and predictions. This is exactly what data scientists do: by extracting patterns from available data, they can make predictions about future unseen data, and they make sure the predictions are validated beforehand.
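The idea of validating predictions before relying on them can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synthetic data, the helper names `fit_line` and `mean_abs_error`, and the 80/20 split are not taken from the book. The sketch fits a straight line on one part of the data and measures the error on a held-out part that the model never saw.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x

def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and true values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in zip(xs, ys)) / len(xs)

# Synthetic data (an assumption): y is roughly 2x + 1 plus a little noise.
rng = random.Random(0)
data = [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(100)]
rng.shuffle(data)

# Hold out 20% of the data; the model never sees it while fitting.
train, held_out = data[:80], data[80:]
model = fit_line([x for x, _ in train], [y for _, y in train])

# Validate on the held-out part before trusting predictions on future data.
error = mean_abs_error(model, [x for x, _ in held_out], [y for _, y in held_out])
print(f"slope={model[0]:.2f} intercept={model[1]:.2f} held-out MAE={error:.2f}")
```

A low error on the held-out part suggests that the learned pattern generalizes beyond the examples used to find it, which is the sense in which the predictions are testable.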

Nowadays, data science is used across many fields, including (but not limited to):

  • Banking: Risk management (for example, credit scoring), fraud detection, trading
  • Insurance: Claims management (for example, accelerating claim approval), risk and loss estimation, and fraud detection
  • Health care: Predicting diseases (such as strokes, diabetes, cancer) and relapses
  • Retail and e-commerce: Market basket analysis (identifying products that go well together), recommendation engines, product categorization, and personalized searches

This book covers the following practical use cases:

  • Predicting whether a URL is likely to appear on the first page of a search engine
  • Predicting how fast an operation will be completed given the hardware specifications
  • Ranking text documents for a search engine
  • Checking whether there is a cat or a dog in a picture
  • Recommending friends in a social network
  • Processing large-scale textual data on a cluster of computers

In all these cases, we will use data science to learn from data and use the learned knowledge to solve a particular business problem.

We will also use a running example throughout the book: building a search engine. We will use it to illustrate many data science concepts, such as supervised machine learning, dimensionality reduction, text mining, and learning-to-rank models.
