Enabling transparent encryption for HDFS

When handling sensitive data, it is important to consider appropriate security measures. Hadoop allows us to encrypt sensitive data stored in HDFS. In this recipe, we are going to see how to enable transparent encryption for data in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

For many applications that hold sensitive data, it is very important to adhere to standards such as PCI, HIPAA, FISMA, and so on. To support this, HDFS provides a feature called an encryption zone: a directory in which data is transparently encrypted on write and decrypted on read.

To use this encryption facility, we first need to start the Hadoop Key Management Server (KMS):

/usr/local/hadoop/sbin/kms.sh start

This starts KMS inside its embedded Tomcat web server.
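If you want to verify that KMS is up, you can query its REST API; the following call lists the names of the keys it manages. This is a minimal check assuming KMS is listening on its default port, 16000 (with the default simple authentication, you may need to pass your username via the user.name query parameter):

curl "http://localhost:16000/kms/v1/keys/names?user.name=$USER"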

Next, we need to add the following properties to core-site.xml and hdfs-site.xml.

In core-site.xml, add the following property:

<property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
</property>

In hdfs-site.xml, add the following property:

<property>
    <name>dfs.encryption.key.provider.uri</name>
    <value>kms://http@localhost:16000/kms</value>
</property>

Restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh
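Once the daemons are back up, you can confirm that HDFS has picked up the key provider. The hdfs getconf command prints the effective value of a configuration property, so the following should print the KMS URI we configured:

hdfs getconf -confKey dfs.encryption.key.provider.uri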

Now, we are all set to use KMS. Next, we need to create a key that will be used for encryption:

hadoop key create mykey

This will create a key named mykey and save it on KMS.
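To confirm that the key exists, you can list the keys known to KMS along with their metadata, such as the cipher and key length:

hadoop key list -metadata

Next, we have to create an encryption zone, which is a directory in HDFS where all the encrypted data is saved: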

hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName mykey -path /zone
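You can verify the result with the -listZones subcommand, which prints every encryption zone along with its key name (it needs to be run as the HDFS superuser):

hdfs crypto -listZones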

We will change the ownership to the current user:

hadoop fs -chown ubuntu:ubuntu /zone

If we put a file into this directory, it will be encrypted on write and transparently decrypted when we read it back:

hadoop fs -put myfile /zone
hadoop fs -cat /zone/myfile
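To see that the data really is stored encrypted, you can read the same file through the /.reserved/raw virtual path, which exposes the raw underlying bytes without decrypting them (this path is accessible only to the HDFS superuser). Instead of the original contents, you should see ciphertext:

hadoop fs -cat /.reserved/raw/zone/myfile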

How it works...

Encryption can be applied at various levels in order to comply with security standards, for example, application-level, database-level, file-level, and disk-level encryption.

The HDFS transparent encryption sits between database-level and file-level encryption. KMS acts as a proxy between HDFS clients and the underlying key store, exposing its operations over an HTTP REST API. Two types of keys are used for encryption: the Encryption Zone Key (EZK) and the Data Encryption Key (DEK). The EZK is used to encrypt the DEK, producing an Encrypted Data Encryption Key (EDEK), which is then saved on the NameNode as part of the file's metadata.

When a file is written to an HDFS encryption zone, the client obtains an EDEK from the NameNode and asks KMS to decrypt it with the EZK; the resulting DEK is used to encrypt the data before it is stored in HDFS (the EZK itself never leaves KMS).

When an encrypted file is read, the client again fetches the EDEK from the NameNode and has KMS decrypt it into the DEK, which is then used to decrypt the file's data. Thus, encryption and decryption are handled automatically by HDFS, and the end user does not need to worry about performing them on their own.
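On newer Hadoop releases, you can inspect the encryption metadata that the NameNode stores for a file, including the EDEK and the zone key name; note that this subcommand is not available in older versions:

hdfs crypto -getFileEncryptionInfo -path /zone/myfile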

Note

You can read more on this topic at http://blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs/.
