
The data volume

A company can easily have thousands to millions of IoT devices with several sensors on each unit, each sensor reporting values on a regular basis. The inflow of data can grow large very quickly. Since IoT devices send data on an ongoing basis, the total volume of data can increase much faster than many companies are used to.

To demonstrate how this can happen, imagine a company that manufactures small monitoring devices. It produces 12,000 devices a year, starting in 2010 when the product was launched. Each one is tested at the end of assembly and the values reported by the sensors on the device are kept for analysis for five years. The data growth looks like the following image:

A chart showing data storage needs for production snapshot of 200 KB and 1,000 units per month. Five years of production data is kept

Now, imagine the device also had internet connectivity to track sensor values, and each one remains connected for two years. Since the data inflow continues well after the devices are built, stored data grows at an accelerating rate until the inflow levels off when older devices stop reporting values. This looks more like the blue area in the following chart:

Chart shows the addition of IoT data at 0.5 KB per message, 10 messages per day. Devices are connected for two years from production
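The growth pattern in the two charts can be approximated with a few lines of arithmetic. The following is a minimal sketch, not the method used to draw the charts: it takes the snapshot size, message size, message rate, production rate, and connection period from the captions, and it assumes a 30-day month and that IoT messages are never deleted, neither of which is stated in the text. All function and constant names are hypothetical.

```python
# Parameters taken from the two chart captions.
SNAPSHOT_KB = 200          # production snapshot stored per device
UNITS_PER_MONTH = 1_000    # devices built per month
RETENTION_MONTHS = 60      # production data is kept for five years
MESSAGE_KB = 0.5           # size of one IoT message
MESSAGES_PER_DAY = 10      # messages sent by each connected device
CONNECTED_MONTHS = 24      # each device reports for two years

# Assumptions that are NOT in the text.
DAYS_PER_MONTH = 30        # simplification to keep the arithmetic exact
# IoT messages are kept indefinitely in this sketch.


def production_storage_kb(month: int) -> int:
    """Production snapshot data held at the end of `month` (1-indexed)."""
    months_retained = min(month, RETENTION_MONTHS)
    return months_retained * UNITS_PER_MONTH * SNAPSHOT_KB


def connected_devices(month: int) -> int:
    """Devices still reporting during `month`."""
    return min(month, CONNECTED_MONTHS) * UNITS_PER_MONTH


def iot_inflow_kb(month: int) -> float:
    """IoT message data received during `month`."""
    return connected_devices(month) * MESSAGES_PER_DAY * DAYS_PER_MONTH * MESSAGE_KB


def iot_storage_kb(month: int) -> float:
    """Cumulative IoT message data stored through the end of `month`."""
    return sum(iot_inflow_kb(m) for m in range(1, month + 1))


if __name__ == "__main__":
    for year in range(1, 8):
        month = year * 12
        print(year, production_storage_kb(month), iot_storage_kb(month))
```

Under these assumptions, the monthly IoT inflow rises for the first 24 months and is constant afterward, while production storage stops growing once the five-year retention window is full. Changing `MESSAGE_KB` or `MESSAGES_PER_DAY` shows how sensitive the totals are to message size and frequency.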

In order to illustrate how large this can get, consider the following example. If you capture 10 messages per day and the message size is half of a full production snapshot, by 2017, data storage requirements would be over 1,500 times higher than production-only data.

For many companies, this introduces some problems. The database software, storage infrastructure, and available computing horsepower are not typically intended to handle this kind of growth. The licensing agreements with software vendors tend to be tied to the number of servers and CPU cores. Storage is handled by standard backup planning and retention policies.

The data volume rapidly leads to computing and storage requirements well beyond what can be held by a single server. Under traditional architectures, distributing the data across hundreds or thousands of servers becomes cost-prohibitive very quickly. To do the best analytics, you need lots of historical data, and since you are unlikely to know ahead of time which data is most predictive, you have to keep as much as you can on hand.

With large-scale data, computing horsepower requirements for analytics are not very predictable and change dramatically depending on the question being asked. Analytic needs are very elastic. Traditional server planning determines peak needs in advance and then ratchets up on-premises resources to the number of servers expected to meet that peak. Doubling compute power in a short amount of time, if even possible, is very expensive.
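To make the cost of sizing for the peak concrete, the sketch below compares capacity fixed at the peak with capacity that follows demand. The weekly demand figures are invented purely for illustration and do not come from the text, the function names are hypothetical, and the comparison counts server-weeks only, ignoring any difference in price per server.

```python
# Hypothetical weekly analytics demand, in server-equivalents.
# These values are illustrative only; they are not data from the text.
weekly_demand = [4, 5, 4, 6, 40, 5, 4, 80, 6, 5, 4, 5]


def peak_sized_server_weeks(demand: list[int]) -> int:
    """Server-weeks provisioned when capacity is fixed at the peak."""
    return max(demand) * len(demand)


def demand_sized_server_weeks(demand: list[int]) -> int:
    """Server-weeks provisioned when capacity follows demand each week."""
    return sum(demand)


def peak_sized_utilization(demand: list[int]) -> float:
    """Fraction of peak-sized capacity that is actually used."""
    return demand_sized_server_weeks(demand) / peak_sized_server_weeks(demand)


if __name__ == "__main__":
    print(peak_sized_server_weeks(weekly_demand))
    print(demand_sized_server_weeks(weekly_demand))
    print(peak_sized_utilization(weekly_demand))
```

With a spiky profile like this one, most of the peak-sized capacity sits idle in a typical week, which is the planning problem the paragraph above describes.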

IoT data volumes and computing resource requirements can quickly outpace all the other company data needs combined.
