
Summary

We have covered the basics of interacting with data in several commonly used storage mechanisms, from simple ones such as text files, through more structured ones such as HDF5, to more sophisticated data storage systems such as MongoDB and Redis. The most suitable type of storage depends on your use case. The choice of storage layer technology plays an important role in the overall design of a data processing system. Sometimes we need to combine several database systems to store our data, depending on factors such as the complexity of the data, the performance of the system, or the computation requirements.

Practice exercises

  • Take a data set of your choice and design storage options for it. Consider text files, HDF5, a document database, and a data structure store as possible persistent options. Also evaluate how difficult (by some metric, for example, the number of lines of code) it would be to update or delete a specific item. Which storage type is the easiest to set up? Which storage type supports the most flexible queries?
  • In Chapter 3, Data Analysis with Pandas, we saw that it is possible to create hierarchical indices with Pandas. As an example, assume that you have data on each city with more than 1 million inhabitants and a two-level index, so that you can address individual cities as well as whole countries. How would you represent this hierarchical relationship with the various storage options presented in this chapter: text files, HDF5, MongoDB, and Redis? What do you believe would be most convenient to work with in the long run?
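
As a starting point for the second exercise, the following is a minimal sketch of three ways the country/city hierarchy could be laid out, using only the Python standard library. It is not a solution and does not talk to MongoDB or Redis: the nested dictionary only mimics a document-style layout, and the colon-joined keys only mimic a common key-value naming convention. The records, the names `rows`, `documents`, `flat_keys`, and the helper `cities_in` are all hypothetical placeholders, not real data.

```python
import csv
import io
import json

# Placeholder records (hypothetical names and numbers, not real data).
rows = [
    ("CountryA", "CityA1", 1_200_000),
    ("CountryA", "CityA2", 3_400_000),
    ("CountryB", "CityB1", 2_500_000),
]

# 1) Flat text file: the hierarchy is implicit in two key columns.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["country", "city", "population"])
writer.writerows(rows)
csv_text = buffer.getvalue()

# 2) Document-style layout: one nested mapping per country.
documents = {}
for country, city, population in rows:
    documents.setdefault(country, {})[city] = population
json_text = json.dumps(documents, indent=2)

# 3) Key-value layout: both index levels joined into a single key.
flat_keys = {
    f"{country}:{city}": population for country, city, population in rows
}


def cities_in(country):
    """Return the sorted city names stored under one country."""
    return sorted(documents.get(country, {}))


print(csv_text)
print(json_text)
print(cities_in("CountryA"))
```

Note how the same question ("which cities belong to one country?") is a dictionary lookup in the nested layout, but would require scanning every row in the flat text layout or matching on a key prefix in the key-value layout. Comparing that effort across the real storage systems is the point of the exercise.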