
Developing a churn analytics pipeline

In ML, we observe an algorithm's performance in two stages: learning and inference. The ultimate goal of the learning stage is to prepare and describe the available data as feature vectors, which are then used to train the model.

The learning stage is one of the most important stages, but it is also very time-consuming. It involves transforming the training data into a list of feature vectors (vectors of numbers representing the value of each feature) that we can feed to the learning algorithm. Training data often contains noisy or incomplete information as well, so it first needs some pre-processing, such as cleaning.
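The cleaning and transformation step described above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the record fields (`total_day_minutes`, `international_plan`, `customer_service_calls`, `churn`) and the helper `to_feature_vector` are hypothetical names chosen for the example, not part of any dataset or API introduced so far.

```python
from typing import Dict, List, Optional

# Hypothetical raw customer records; field names and values are assumptions.
raw_records = [
    {"total_day_minutes": "265.1", "international_plan": "no",
     "customer_service_calls": "1", "churn": "False"},
    {"total_day_minutes": "", "international_plan": "yes",
     "customer_service_calls": "3", "churn": "True"},   # missing value
    {"total_day_minutes": "129.3", "international_plan": "yes",
     "customer_service_calls": "4", "churn": "True"},
]


def to_feature_vector(record: Dict[str, str]) -> Optional[List[float]]:
    """Turn one raw record into a numeric feature vector.

    Returns None when a numeric field is missing or malformed, so the
    caller can drop the record (a very simple form of cleaning).
    """
    try:
        minutes = float(record["total_day_minutes"])
        calls = float(record["customer_service_calls"])
    except (KeyError, ValueError):
        return None
    # Encode the categorical yes/no field as 1.0/0.0.
    plan = 1.0 if record.get("international_plan", "").strip().lower() == "yes" else 0.0
    return [minutes, plan, calls]


feature_vectors: List[List[float]] = []
labels: List[int] = []
for record in raw_records:
    vector = to_feature_vector(record)
    if vector is None:
        continue  # skip records that cannot be cleaned
    feature_vectors.append(vector)
    labels.append(1 if record["churn"] == "True" else 0)

print(feature_vectors)
print(labels)
```

Dropping incomplete records is the simplest possible cleaning policy; imputing a default value instead would keep more training data at the cost of some noise.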

Once we have the feature vectors, the next step is to prepare (that is, write or reuse) the learning algorithm and then train it to produce the predictive model. Depending on the data size, training may take hours (or even days) before the algorithm converges to a useful model, as shown in the following figure:

Figure 2: Learning and training a predictive model (feature vectors are generated from the training data to train the learning algorithm, which produces a predictive model)

The second stage is inference, which puts the model to intelligent use, such as predicting from never-before-seen data, making recommendations, or deducing future rules. Typically, it takes less time than the learning stage and sometimes even runs in real time. Thus, inferencing is all about applying the model to new (that is, unobserved) data and evaluating the performance of the model itself, as shown in the following figure:

Figure 3: Inferencing from an existing model towards predictive analytics (feature vectors are generated from unknown data for making predictions)
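To make the split between the two stages concrete, here is a minimal sketch in plain Python. It uses a nearest-centroid classifier purely as a stand-in because it fits in a few lines; it is not the algorithm used for churn prediction in this chapter. The function names `train` and `predict`, the two features, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, List


def train(vectors: List[List[float]], labels: List[int]) -> Dict[int, List[float]]:
    """Learning stage: compute one centroid (mean vector) per class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for vector, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vector))
        for i, value in enumerate(vector):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model: Dict[int, List[float]], vector: List[float]) -> int:
    """Inference stage: return the class whose centroid is closest."""
    def squared_distance(centroid: List[float]) -> float:
        return sum((c - v) ** 2 for c, v in zip(centroid, vector))
    return min(model, key=lambda label: squared_distance(model[label]))


# Toy training data: [customer_service_calls, has_international_plan].
train_vectors = [[1.0, 0.0], [2.0, 0.0], [8.0, 1.0], [9.0, 1.0]]
train_labels = [0, 0, 1, 1]  # 1 means the customer churned

model = train(train_vectors, train_labels)   # slow part on real data
unseen = [[2.5, 0.0], [7.0, 1.0]]            # never-before-seen customers
predictions = [predict(model, v) for v in unseen]
print(predictions)
```

The `train` call is where the hours or days go on real data, whereas `predict` is a cheap lookup against the finished model, which is why inference can sometimes run in real time.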

Throughout this whole process, data is the first-class citizen: in any ML task, the success of the predictive model depends on it. Keeping all this in mind, the following figure shows an analytics pipeline that can be used by telecommunication companies:

Figure 4: Churn analytics pipeline

With this kind of analysis, telecom companies can predict churn and learn how to enhance the customer experience, which can, in turn, prevent churn and help tailor marketing campaigns. In practice, these business assessments are often used to focus retention efforts on the customers most likely to leave, rather than on those who are likely to stay.

Thus, we need to develop a predictive model that is sensitive to the Churn = True samples. Because each customer either churns or does not, this is a binary classification problem. We will see more details in upcoming sections.
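One common way to quantify how sensitive a classifier is to the Churn = True class is recall (also called sensitivity): the fraction of actual churners that the model flags. The sketch below assumes that reading of "sensitive"; the function name `churn_recall` and the label lists are made up for illustration.

```python
from typing import List


def churn_recall(actual: List[bool], predicted: List[bool]) -> float:
    """Fraction of actual churners (True) that the model also predicted as churners."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a and p)
    false_negatives = sum(1 for a, p in zip(actual, predicted) if a and not p)
    churners = true_positives + false_negatives
    return true_positives / churners if churners else 0.0


def accuracy(actual: List[bool], predicted: List[bool]) -> float:
    """Fraction of all predictions that match the actual label."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


# Hypothetical ground truth and model output for six customers.
actual = [True, False, True, True, False, False]
predicted = [True, False, False, True, False, True]

recall = churn_recall(actual, predicted)
acc = accuracy(actual, predicted)
print(f"recall on Churn = True: {recall:.2f}, accuracy: {acc:.2f}")
```

Accuracy alone can look good even when many churners are missed, especially when churners are a small minority, so tracking recall on the Churn = True class keeps the focus on the customers the business wants to retain.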
