- Hadoop Real-World Solutions Cookbook (Second Edition)
- Tanmay Deshpande
Enabling transparent encryption for HDFS
When handling sensitive data, it is important to consider appropriate security measures. Hadoop allows us to encrypt sensitive data stored in HDFS. In this recipe, we are going to see how to encrypt data in HDFS.
Getting ready
To perform this recipe, you should already have a running Hadoop cluster.
How to do it...
For many applications that hold sensitive data, it is very important to adhere to standards such as PCI, HIPAA, FISMA, and so on. To enable this, HDFS provides a facility called encryption zones: directories in which data is transparently encrypted on write and decrypted on read.
To use this encryption facility, we first need to enable Hadoop Key Management Server (KMS):
/usr/local/hadoop/sbin/kms.sh start
This starts KMS inside an embedded Tomcat web server.
Next, we need to add the following properties to core-site.xml and hdfs-site.xml.
In core-site.xml, add the following property:
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
In hdfs-site.xml, add the following property:
<property>
  <name>dfs.encryption.key.provider.uri</name>
  <value>kms://http@localhost:16000/kms</value>
</property>
Restart the HDFS daemons:
/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh
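The two property additions above can also be scripted. The following is a minimal sketch; the `conf-demo` directory and the `add_property` helper are illustrative stand-ins (on a real cluster, point `HADOOP_CONF` at your actual configuration directory, e.g. `/usr/local/hadoop/etc/hadoop`), and it assumes GNU sed:

```shell
# Sketch: append the KMS provider properties to the Hadoop config files.
# HADOOP_CONF and the stand-in files below are for illustration only.
HADOOP_CONF=${HADOOP_CONF:-./conf-demo}
mkdir -p "$HADOOP_CONF"

# Create minimal stand-in files; on a real cluster these already exist.
for f in core-site.xml hdfs-site.xml; do
  [ -f "$HADOOP_CONF/$f" ] || printf '<configuration>\n</configuration>\n' > "$HADOOP_CONF/$f"
done

# add_property <file> <name> <value>: insert a <property> block
# just before the closing </configuration> tag (GNU sed syntax).
add_property() {
  sed -i "s|</configuration>|<property>\n  <name>$2</name>\n  <value>$3</value>\n</property>\n</configuration>|" "$1"
}

add_property "$HADOOP_CONF/core-site.xml" hadoop.security.key.provider.path kms://http@localhost:16000/kms
add_property "$HADOOP_CONF/hdfs-site.xml" dfs.encryption.key.provider.uri   kms://http@localhost:16000/kms
```

After running this against the real configuration directory, restart the HDFS daemons as shown above so the new provider settings take effect.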
Now, we are all set to use KMS. Next, we need to create a key that will be used for the encryption:
hadoop key create mykey
This creates a key and saves it in KMS. Next, we create an encryption zone, which is a directory in HDFS where all data is stored encrypted:
hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName mykey -path /zone
We will change the ownership to the current user:
hadoop fs -chown ubuntu:ubuntu /zone
Any file we put into this directory is encrypted on write and decrypted when read:
hadoop fs -put myfile /zone
hadoop fs -cat /zone/myfile
How it works...
Various levels of encryption can be applied in order to comply with security standards, for example, application-level, database-level, file-level, and disk-level encryption.
HDFS transparent encryption sits between database-level and file-level encryption. KMS acts as a proxy between HDFS clients and the underlying key provider, communicating via an HTTP REST API. Two types of keys are used for encryption: the Encryption Zone Key (EZK) and the Data Encryption Key (DEK). The EZK encrypts the DEK, producing the Encrypted Data Encryption Key (EDEK), which is then saved on the NameNode.
When a file is written to an HDFS encryption zone, the client obtains an EDEK from the NameNode and asks KMS, which holds the EZK, to decrypt it into the DEK; the client then uses the DEK to encrypt the data before storing it in HDFS (the encryption zone).
When an encrypted file is read, the client again obtains the EDEK from the NameNode and has KMS decrypt it into the DEK, which is used to decrypt the data. Thus, encryption and decryption are handled transparently by HDFS, and the end user does not need to manage them manually.
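The EZK/DEK/EDEK relationship is a form of envelope encryption, which can be illustrated with a toy model. This sketch is for illustration only: real HDFS uses AES-CTR and a hardened KMS, whereas the XOR keystream and all names here are hypothetical stand-ins:

```python
# Toy model of HDFS envelope encryption (illustration only --
# real HDFS uses AES-CTR via KMS; this XOR cipher is a stand-in).
import os
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # Derive n pseudo-random bytes from key (stand-in for a real cipher).
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def xor_crypt(data: bytes, key: bytes) -> bytes:
    # XOR with a key-derived stream; applying it twice restores the input.
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# KMS side: the encryption zone key (EZK) never leaves KMS.
ezk = os.urandom(32)

# Write path: a fresh DEK is generated; only its encrypted form (EDEK)
# is handed out and stored on the NameNode alongside the file metadata.
dek = os.urandom(32)
edek = xor_crypt(dek, ezk)
ciphertext = xor_crypt(b"my secret record", dek)  # what lands on the DataNodes

# Read path: the client sends the EDEK to KMS, receives the DEK back,
# and decrypts the file data locally.
dek_again = xor_crypt(edek, ezk)
plaintext = xor_crypt(ciphertext, dek_again)
print(plaintext)  # b'my secret record'
```

The key design point this illustrates is that the data key travels only in encrypted form (EDEK), so a compromise of the NameNode metadata alone does not expose plaintext keys.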
Note
You can read more on this topic at http://blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs/.