- Hands-On Data Science with SQL Server 2017
- Marek Chmel Vladimír Mu?n?
- 398字
- 2021-06-10 19:13:52
Modelling and analysis
This part of the project might be the most creative one, since it includes numerous tasks, which have to be taken to deliver the final product. The list of tasks can be very long, and may include these:
- Data mining
- Text analytics
- Model building
- Feature engineering and extraction
- Model testing
Microsoft SQL Server has tools built in, which can provide a delivery platform for most of the tasks. When we talk about data mining, there are several different methodologies or frameworks to follow, where so far the Cross Industry Standard Process for Data Mining (CRISP-DM) is the most frequently used one, based on several different methods of research regarding the methodology usage. In 2015, IBM released a new methodology called Analytics Solutions Unified Method for Data Mining/Predictive Analytics, which refined and extended CRISP-DM. CRISP-DM is an open-standard process model that describes common approaches used by data-mining experts, and it's still the most widely used analytics model. CRISP-DM breaks the process of data mining into six major phases. The sequence of the phases is not strict and moves back and forth between different phases, as it is always required. The arrows in the process diagram indicate the most important and frequent dependencies between phases. The outer circle in the diagram symbolizes the cyclic nature of data mining itself. A data-mining process continues after a solution has been deployed. The lessons learned during the process can trigger new, often more focused business questions, and subsequent data-mining processes will benefit from the experiences of the previous ones:
The purpose of data mining is to put structured and unstructured data in relation to each other so as to easily interface them and provide the workers in the sector with a system that is easy to use. The experts of each specified area of business will therefore have access to a complex data system that is able to process information at different levels. This has the advantage of bringing to light the relationships among data, predictive analysis, assessments for specific business decisions, and much more.
Data mining can be used for solving many business problems and to prepare the data for a more advanced approach, such as machine learning, which can be used for:
- Searching for anomalies
- Churn analysis
- Customer segmentation
- Forecasting
- Market basket analysis
- Network intrusion detection
- Targeted advertisement