- Apache Hadoop 3 Quick Start Guide
- Hrishikesh Vijay Karambelkar
- 2021-06-10 19:18:44
Initial load of data
The initial load of data is driven by the existing content that migrates to Hadoop, so it can be calculated from the existing landscape. For example, if three applications hold different types of data (structured and unstructured), the initial storage estimate is based on their existing data sizes. However, the data size changes depending on the target Hadoop component. If you are moving tables from an RDBMS to Hive, look at the size of each table and at its column data types, and compute the size from those rather than from the size of the database files. Note that Hive data sizes are available here.
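As a rough illustration of sizing from table data types rather than from database files, the following Python sketch multiplies a per-row byte estimate by the row count. The function names, the sample `orders` table, and the average string length of 32 bytes are assumptions made for this example. The widths used for the integer and floating-point types are the nominal sizes of those Hive types; actual on-disk size also depends on the file format and compression, which this sketch ignores.

```python
# Nominal widths in bytes for fixed-width Hive numeric types.
FIXED_WIDTH_BYTES = {
    "tinyint": 1,
    "smallint": 2,
    "int": 4,
    "bigint": 8,
    "float": 4,
    "double": 8,
}


def estimate_row_bytes(columns, avg_string_bytes=32):
    """Estimate the raw size of one row from (name, hive_type) pairs.

    avg_string_bytes is an assumed average; measure it on real data
    before relying on the result.
    """
    total = 0
    for name, hive_type in columns:
        type_name = hive_type.lower()
        if type_name in FIXED_WIDTH_BYTES:
            total += FIXED_WIDTH_BYTES[type_name]
        elif type_name == "string":
            total += avg_string_bytes
        else:
            raise ValueError(f"no size rule for column {name!r} of type {hive_type!r}")
    return total


def estimate_table_bytes(columns, row_count, avg_string_bytes=32):
    """Raw (uncompressed) size estimate for one table."""
    return estimate_row_bytes(columns, avg_string_bytes) * row_count


def estimate_initial_load_bytes(tables, avg_string_bytes=32):
    """Sum the estimates over (columns, row_count) pairs for all tables."""
    return sum(
        estimate_table_bytes(columns, row_count, avg_string_bytes)
        for columns, row_count in tables
    )


# Hypothetical table used only to show the calculation.
orders = [
    ("order_id", "bigint"),
    ("customer_id", "int"),
    ("amount", "double"),
    ("status", "string"),
]

if __name__ == "__main__":
    total = estimate_initial_load_bytes([(orders, 1_000_000)])
    print(f"Estimated initial load: {total / 1024**2:.1f} MiB")
```

Treat the result as a starting figure only: variable-length columns dominate many tables, so sampling real rows to set the average string length usually matters more than the fixed-width columns.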