官术网_书友最值得收藏!

The big problem

Hadoop is a distributed file system and a distributed framework designed to compute large chunks of data. It is relatively easy to get data into Hadoop. There are plenty of tools for getting data into different formats, such as Apache Phoenix. However it is actually extremely difficult to get value out of the data you put into Hadoop.

Let's look at the path from data to value. First we have to start with collecting data. Then we also spend a lot of time preparing it, making sure that this data is available for analysis, and being able to question the data. This process is as follows:

Unfortunately, you may not have asked the right questions or the answers are not clear, and you have to repeat this cycle. Maybe you have transformed and formatted your data. In other words, it is a long and challenging process.

What you actually want is to collect the data and spend some time preparing it; then you can ask questions and get answers repetitively. Now, you can spend a lot of time asking multiple questions. In addition, you can iterate with data on those questions to refine the answers that you are looking for. Let's look at the following diagram, in order to find a new approach:

主站蜘蛛池模板: 镇远县| 寿宁县| 仙游县| 阿克| 册亨县| 山东省| 长汀县| 金坛市| 新平| 陆河县| 平乡县| 墨玉县| 田东县| 常州市| 十堰市| 彩票| 平度市| 田阳县| 贡嘎县| 海盐县| 施秉县| 汉中市| 光泽县| 营山县| 宣化县| 沧源| 金山区| 华宁县| 威宁| 青阳县| 瑞昌市| 巴彦县| 肥东县| 雷州市| 广宁县| 团风县| 普宁市| 青铜峡市| 武邑县| 中西区| 盐边县|