Modelling and analysis

This part of the project is arguably the most creative one, since it comprises numerous tasks that must be completed to deliver the final product. The list of tasks can be very long and may include the following:

  • Data mining
  • Text analytics
  • Model building
  • Feature engineering and extraction
  • Model testing

Microsoft SQL Server has built-in tools that can serve as a delivery platform for most of these tasks. Several methodologies or frameworks exist for data mining. According to several surveys of methodology usage, the Cross Industry Standard Process for Data Mining (CRISP-DM) is so far the most frequently used one. In 2015, IBM released a new methodology called Analytics Solutions Unified Method for Data Mining/Predictive Analytics (ASUM-DM), which refines and extends CRISP-DM.

CRISP-DM is an open-standard process model that describes common approaches used by data-mining experts. It breaks the process of data mining into six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment. The sequence of the phases is not strict; moving back and forth between phases is always required. In the CRISP-DM process diagram, the arrows indicate the most important and frequent dependencies between phases, and the outer circle symbolizes the cyclic nature of data mining itself.

A data-mining process continues after a solution has been deployed. The lessons learned during the process can trigger new, often more focused business questions, and subsequent data-mining processes will benefit from the experience of the previous ones.

The purpose of data mining is to relate structured and unstructured data to each other so that they can be combined easily, and to give the people working in a given sector a system that is easy to use. The experts in each business area therefore gain access to a complex data system that can process information at different levels. The advantage is that it brings to light relationships among the data and supports predictive analysis, assessments for specific business decisions, and much more.

Data mining can be used to solve many business problems and to prepare data for more advanced approaches, such as machine learning, which can be used for:

  • Searching for anomalies
  • Churn analysis
  • Customer segmentation
  • Forecasting
  • Market basket analysis
  • Network intrusion detection
  • Targeted advertisement
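
To make one item from the list above concrete, here is a minimal sketch of market basket analysis. It is not taken from the text and does not use any SQL Server tooling: the function name `pair_rules`, the `min_support` threshold, and the sample baskets are all hypothetical and chosen only for illustration. The sketch counts how often pairs of items appear together and derives the support and confidence of simple "A implies B" rules.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return sorted (antecedent, consequent, support, confidence) tuples.

    Hypothetical helper for illustration: only item pairs are considered,
    which is far simpler than a full association-rule algorithm.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        # confidence(A -> B) = count(A and B) / count(A)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return sorted(rules)


# Made-up sample data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk"],
]

rules = pair_rules(baskets, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

With this sample data, every basket that contains butter also contains bread, so the rule from butter to bread has a confidence of 1.0, while the reverse rule is weaker. A production analysis would typically use a dedicated algorithm such as Apriori on far larger data.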