- Hadoop Real-World Solutions Cookbook (Second Edition)
- Tanmay Deshpande
- 2021-07-09 20:02:50
Setting the HDFS block size for a specific file in a cluster
In this recipe, we are going to take a look at how to set the block size for a specific file only.
Getting ready
To perform this recipe, you should already have a running Hadoop cluster.
How to do it...
In the previous recipe, we learned how to change the block size at the cluster level. However, a cluster-wide change is not always what we want, and HDFS also lets us set the block size for a single file. The following command copies a file called myfile to HDFS with a block size of 1 MB (1048576 bytes):
hadoop fs -Ddfs.block.size=1048576 -put /home/ubuntu/myfile /
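The value passed to -Ddfs.block.size is given in bytes, so 1 MB is written as 1048576. The bash sketch below wraps that conversion in a hypothetical helper named block_size_cmd (not part of Hadoop). It also rejects sizes below 1 MB, on the assumption that the NameNode uses the Hadoop 2.x default for dfs.namenode.fs-limits.min-block-size (1048576 bytes); if your cluster overrides that property, adjust min_block accordingly. The helper prints the command instead of running it, so it can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): builds the per-file upload
# command for a block size given in whole megabytes.
block_size_cmd() {
  local src=$1 dest=$2 size_mb=$3
  local bytes=$(( size_mb * 1024 * 1024 ))   # dfs.block.size is in bytes
  local min_block=1048576                     # assumed NameNode minimum (1 MB)
  if (( bytes < min_block )); then
    echo "error: ${bytes} B is below the assumed ${min_block} B minimum" >&2
    return 1
  fi
  # Print instead of execute, so the command can be checked first.
  echo "hadoop fs -Ddfs.block.size=${bytes} -put ${src} ${dest}"
}

block_size_cmd /home/ubuntu/myfile / 1
# prints: hadoop fs -Ddfs.block.size=1048576 -put /home/ubuntu/myfile /
```

To actually upload the file, run the printed command, or replace the final echo with the hadoop call itself.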
Once the file is copied, you can verify that its block size is 1 MB and that it has been split into 1 MB blocks (only the last block may be smaller):
hdfs fsck -blocks /myfile
Connecting to namenode via http://localhost:50070/fsck?ugi=ubuntu&blocks=1&path=%2Fmyfile
FSCK started by ubuntu (auth:SIMPLE) from /127.0.0.1 for path /myfile at Thu Oct 29 14:58:00 UTC 2015
.Status: HEALTHY
Total size: 17276808 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 17 (avg. block size 1016282 B)
Minimally replicated blocks: 17 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Oct 29 14:58:00 UTC 2015 in 2 milliseconds

The filesystem under path '/myfile' is HEALTHY
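The fsck summary can be cross-checked with simple arithmetic: a file of S bytes stored with block size B occupies ceil(S / B) blocks, and only the last one may be partial. The bash sketch below plugs in the Total size reported by fsck for /myfile (17276808 B) and the 1 MB block size; it is a sanity check written for this recipe, not a Hadoop tool.

```shell
#!/usr/bin/env bash
# Cross-check the fsck summary with integer arithmetic.
file_size=17276808   # "Total size" reported by fsck, in bytes
block_size=1048576   # 1 MB, as passed with -Ddfs.block.size

# Number of blocks = ceil(file_size / block_size)
blocks=$(( (file_size + block_size - 1) / block_size ))
# Every block except the last is full; the last holds the remainder.
last_block=$(( file_size - (blocks - 1) * block_size ))
# fsck's "avg. block size" is total size divided by number of blocks.
avg_block=$(( file_size / blocks ))

echo "blocks=${blocks} last_block=${last_block} avg_block=${avg_block}"
# prints: blocks=17 last_block=499592 avg_block=1016282
```

The result matches the report: 16 full 1 MB blocks plus one 499592-byte block give 17 blocks, which is also why the average block size is slightly below 1 MB.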
How it works...
When we specify the block size while copying a file, it overrides the default block size, and the file is stored in HDFS as chunks of the given size. Such changes are usually made as part of some other optimization, so make sure you understand their consequences before applying them. If the block size is too small, parallelism increases, but so does the load on the NameNode, because it has to keep track of more blocks and its FSImage holds more entries. On the other hand, if the block size is too large, parallelism decreases and processing performance can suffer.
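To make the trade-off concrete, the bash sketch below counts how many blocks a single hypothetical 10 GB file would occupy at several block sizes. Each block is an object the NameNode must track, and with default MapReduce input splitting there is roughly one map task per block, so the block count drives both the metadata load and the available parallelism. The 10 GB file size, the list of block sizes, and the helper name blocks_for are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Illustrative only: block counts for one hypothetical 10 GB file.
file_size=$(( 10 * 1024 * 1024 * 1024 ))   # 10 GB in bytes (assumed)

# Hypothetical helper: blocks_for SIZE_BYTES BLOCK_MB -> ceil(size / block)
blocks_for() {
  local size=$1 block=$(( $2 * 1024 * 1024 ))
  echo $(( (size + block - 1) / block ))
}

for mb in 1 64 128 256; do
  echo "${mb} MB blocks -> $(blocks_for "$file_size" "$mb") blocks"
done
# prints:
# 1 MB blocks -> 10240 blocks
# 64 MB blocks -> 160 blocks
# 128 MB blocks -> 80 blocks
# 256 MB blocks -> 40 blocks
```

Going from 128 MB to 1 MB blocks multiplies the number of blocks, and hence NameNode entries and potential map tasks, by 128 for the same data.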