Book title: Mastering Java for Data Science
Author: Alexey Grigorev
Updated: 2021-07-02 23:44:31

Data science

Data science is the discipline of extracting actionable knowledge from data of various forms. The name data science emerged quite recently: it was coined by DJ Patil and Jeff Hammerbacher and popularized in the 2012 article Data Scientist: The Sexiest Job of the 21st Century. The discipline itself, however, had existed for quite a while before that and was previously known by other names, such as data mining or predictive analytics. Data science, like its predecessors, is built on statistics and machine learning algorithms for knowledge extraction and model building.

The science part of the term data science is no coincidence. If we look up science, its definition can be summarized as the systematic organization of knowledge in terms of testable explanations and predictions. This is exactly what data scientists do: by extracting patterns from available data, they can make predictions about future unseen data, and they make sure the predictions are validated beforehand.
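To make the idea of validated predictions concrete, here is a minimal sketch in plain Java. It is not taken from the book: the class name HoldoutValidation, the method names, and the numbers are all hypothetical and chosen only for illustration. It extracts a pattern (a straight line) from training data and then checks the predictions on data the model has not seen.

```java
import java.util.Arrays;

public class HoldoutValidation {

    /** Fits y = slope * x + intercept by ordinary least squares; returns {slope, intercept}. */
    static double[] fit(double[] x, double[] y) {
        double meanX = Arrays.stream(x).average().orElse(0.0);
        double meanY = Arrays.stream(y).average().orElse(0.0);
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    /** Applies the fitted line to a single input value. */
    static double predict(double[] model, double x) {
        return model[0] * x + model[1];
    }

    /** Mean absolute error of the model on the given data. */
    static double meanAbsoluteError(double[] model, double[] x, double[] y) {
        double total = 0.0;
        for (int i = 0; i < x.length; i++) {
            total += Math.abs(predict(model, x[i]) - y[i]);
        }
        return total / x.length;
    }

    public static void main(String[] args) {
        // Made-up observations (assumption): y is roughly 2 * x + 1 with some noise.
        double[] trainX = {1, 2, 3, 4, 5, 6};
        double[] trainY = {3.1, 4.9, 7.2, 9.0, 10.8, 13.1};

        // Held-out observations that play no part in fitting the model.
        double[] testX = {7, 8};
        double[] testY = {15.2, 16.9};

        double[] model = fit(trainX, trainY);
        System.out.printf("slope=%.3f intercept=%.3f%n", model[0], model[1]);
        System.out.printf("error on unseen data: %.3f%n",
                meanAbsoluteError(model, testX, testY));
    }
}
```

The key design choice is that the error is measured only on the held-out arrays. A low error on the training data alone would say little about how the model behaves on future data.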

Nowadays, data science is used across many fields, including (but not limited to):

Banking: Risk management (for example, credit scoring), fraud detection, trading
Insurance: Claims management (for example, accelerating claim approval), risk and loss estimation, and fraud detection
Health care: Predicting diseases (such as strokes, diabetes, cancer) and relapses
Retail and e-commerce: Market basket analysis (identifying products that go well together), recommendation engines, product categorization, and personalized search

This book covers the following practical use cases:

Predicting whether a URL is likely to appear on the first page of a search engine
Predicting how fast an operation will be completed given the hardware specifications
Ranking text documents for a search engine
Checking whether there is a cat or a dog in a picture
Recommending friends in a social network
Processing large-scale textual data on a cluster of computers

In all these cases, we will use data science to learn from data and use the learned knowledge to solve a particular business problem.

We will also use a running example throughout the book: building a search engine. We will use it to illustrate many data science concepts, such as supervised machine learning, dimensionality reduction, text mining, and learning-to-rank models.
