Analytics challenges

Analytics often requires deciding whether to fill in or ignore missing values. Either choice may lead to a dataset that is not representative of reality.
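As a minimal sketch of how either choice can skew a result, consider the Python example below. The readings are made up for illustration, and the example assumes a hypothetical sensor that fails mostly when it is cold, so the values are not missing at random. The helper names `mean_ignoring_missing` and `mean_after_fill` are invented here and do not come from any library.

```python
from statistics import mean

# Hypothetical hourly temperature readings in degrees Celsius.
# true_values is what actually happened; reported is what arrived.
# Assumption for illustration: the sensor drops out when it is cold.
true_values = [2.0, 1.0, 0.0, -1.0, 8.0, 10.0, 12.0, 9.0]
reported = [2.0, None, None, None, 8.0, 10.0, 12.0, 9.0]


def mean_ignoring_missing(values):
    """Average only the readings that are present."""
    present = [v for v in values if v is not None]
    return mean(present)


def mean_after_fill(values):
    """Replace each missing reading with the mean of the present ones, then average."""
    fill = mean_ignoring_missing(values)
    return mean([fill if v is None else v for v in values])


true_mean = mean(true_values)
ignored_mean = mean_ignoring_missing(reported)
filled_mean = mean_after_fill(reported)

print(f"true mean:            {true_mean:.3f}")
print(f"ignoring missing:     {ignored_mean:.3f}")
print(f"filling with average: {filled_mean:.3f}")
```

In this made-up case, both approaches give the same estimate, and both overstate the true average, because the missing readings were systematically the cold ones. Neither ignoring nor mean-filling can recover information that never arrived.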

As an example of how this can affect results, consider the inaccurate political poll results of recent years. Many experts believe polling is now in near crisis because much of the world has shifted to using a mobile number as their only phone number. For pollsters, it is cheaper and easier to reach people on landline numbers. This can lead to the overrepresentation of people with landlines, who tend to be both older and wealthier than mobile-only respondents.

The response rate has also dropped from near 80% in the 1970s to about 8% (if you are lucky) today. This makes it more difficult (and expensive) to obtain a representative sample, which has led to many embarrassingly wrong poll predictions.

There can also be outside influences, such as environmental conditions, that are not captured in the data. Winter storms can lead to power failures that stop otherwise working devices from reporting data. You may end up drawing conclusions based on a non-representative sample of data without realizing it. This can affect the results of IoT analytics, and it will not be clear why.
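One way to catch this kind of silent gap is to compare which devices actually reported against which devices were expected to report. The following is a rough sketch only: the device IDs, regions, and the 50% threshold are hypothetical, and `coverage_by_group` is a helper written for this illustration rather than part of any library.

```python
from collections import Counter

# Hypothetical fleet registry: device id -> region.
registry = {
    "d1": "north", "d2": "north", "d3": "north",
    "d4": "south", "d5": "south", "d6": "south",
}

# Hypothetical set of devices that sent data during a storm window.
reported_ids = {"d1", "d4", "d5", "d6"}


def coverage_by_group(registry, reported_ids):
    """Return the fraction of expected devices that reported, per region."""
    expected = Counter(registry.values())
    seen = Counter(
        region for device, region in registry.items() if device in reported_ids
    )
    return {region: seen[region] / expected[region] for region in expected}


coverage = coverage_by_group(registry, reported_ids)

# Flag regions where fewer than half of the devices reported (arbitrary threshold).
low_coverage = sorted(region for region, share in coverage.items() if share < 0.5)

print(coverage)
print("under-reporting regions:", low_coverage)
```

A check like this does not explain why a group went quiet, but it does show that the sample is lopsided before any conclusions are drawn from it.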

Because connectivity is new for many devices, there is often little historical data on which to base predictive models. This can limit the type of analytics that can be done with the data.

It can also lead to a recency bias in datasets, as newer products are overrepresented in the data simply because a higher percentage of them are now part of the IoT.

This leads us to the author's number one rule in IoT analytics:

Never trust data you don't know.

Treat it like a stranger offering you candy.