Enabling transparent encryption for HDFS

When handling sensitive data, it is important to consider appropriate security measures. Hadoop allows us to encrypt sensitive data stored in HDFS. In this recipe, we are going to see how to encrypt data in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

For many applications that handle sensitive data, it is very important to comply with standards such as PCI DSS, HIPAA, FISMA, and so on. To support this, HDFS provides a feature called encryption zones: directories in which data is transparently encrypted on write and decrypted on read.

To use this encryption facility, we first need to enable Hadoop Key Management Server (KMS):

/usr/local/hadoop/sbin/kms.sh start

This starts KMS in its embedded Tomcat web server.
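To confirm that KMS is up, you can query its REST API. The port 16000 and the key-names endpoint below match the default KMS configuration, and the user.name parameter assumes the default pseudo authentication with the ubuntu user used later in this recipe; adjust these for your setup:

```shell
# List the key names known to KMS; an empty JSON list means
# KMS is running but no keys have been created yet
curl "http://localhost:16000/kms/v1/keys/names?user.name=ubuntu"
```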

Next, we need to add the following properties to core-site.xml and hdfs-site.xml.

In core-site.xml, add the following property:

<property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:16000/kms</value>
</property>

In hdfs-site.xml, add the following property:

<property>
    <name>dfs.encryption.key.provider.uri</name>
    <value>kms://http@localhost:16000/kms</value>
</property>

Restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

Now, we are all set to use KMS. Next, we need to create a key that will be used for the encryption:

hadoop key create mykey

This creates a key and stores it in KMS. Next, we have to create an encryption zone, which is a directory in HDFS where all data is stored encrypted:

hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName mykey -path /zone

We will change the ownership to the current user:

hadoop fs -chown ubuntu:ubuntu /zone
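Before writing any data, it is worth verifying that the key and the zone exist. The exact output format varies across Hadoop versions, and -listZones must be run as the HDFS superuser:

```shell
# The list of keys known to KMS should include mykey
hadoop key list

# List the configured encryption zones; /zone should appear with key mykey
hdfs crypto -listZones
```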

Any file we put into this directory is encrypted on write and transparently decrypted when read:

hadoop fs -put myfile /zone
hadoop fs -cat /zone/myfile
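To convince yourself that the data really is encrypted at rest, you can read the file through the /.reserved/raw namespace, which bypasses decryption (this requires HDFS superuser privileges):

```shell
# Plaintext, decrypted transparently for the client
hadoop fs -cat /zone/myfile

# Raw ciphertext exactly as stored on the DataNodes
hadoop fs -cat /.reserved/raw/zone/myfile
```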

How it works...

Encryption can be applied at various levels in order to comply with security standards, for example, application-level, database-level, file-level, and disk-level encryption.

HDFS transparent encryption sits between database-level and file-level encryption. KMS acts as a proxy between HDFS clients, the NameNode, and the underlying key provider, communicating over an HTTP REST API. Two types of keys are used: the Encryption Zone Key (EZK) and the Data Encryption Key (DEK). The EZK encrypts the DEK, producing the Encrypted Data Encryption Key (EDEK), which is stored on the NameNode.

When a file is written to an HDFS encryption zone, the client obtains the EDEK from the NameNode and asks KMS to decrypt it with the EZK, recovering the DEK, which is then used to encrypt the data before it is stored in HDFS (the encryption zone). The EZK itself never leaves KMS.

When an encrypted file is read, the client again obtains the EDEK from the NameNode and has KMS decrypt it to recover the DEK, which is then used to decrypt the data. Thus, encryption and decryption are handled automatically by HDFS, and the end user does not need to worry about performing them.
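The envelope-encryption scheme described above can be sketched locally with OpenSSL. This is only an illustration of the key-wrapping idea, not Hadoop's actual implementation; the file names, key sizes, and the CBC wrapping cipher are assumptions (HDFS encrypts file data with AES-CTR):

```shell
# EZK: the zone key, held only by KMS in HDFS
EZK=$(openssl rand -hex 32)

# DEK: a fresh per-file key (AES-256 -> 32 bytes, 64 hex characters)
DEK=$(openssl rand -hex 32)

echo "sensitive data" > demo.txt

# Encrypt the file with the DEK, as an HDFS client would (AES-CTR)
openssl enc -aes-256-ctr -K "$DEK" -iv 00000000000000000000000000000000 \
    -in demo.txt -out demo.enc

# Wrap the DEK with the EZK: the result plays the role of the EDEK,
# which HDFS stores on the NameNode alongside the file's metadata
EDEK=$(echo "$DEK" | openssl enc -aes-256-cbc -pbkdf2 -pass pass:"$EZK" -a)

# To read: unwrap the EDEK with the EZK (done by KMS) to recover the DEK...
DEK2=$(echo "$EDEK" | openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:"$EZK" -a)

# ...then decrypt the file with the recovered DEK
openssl enc -d -aes-256-ctr -K "$DEK2" -iv 00000000000000000000000000000000 \
    -in demo.enc -out demo.dec

cmp demo.txt demo.dec && echo "round trip OK"
```

The point of the indirection is that the per-file DEK is never stored anywhere in the clear: only its wrapped form (the EDEK) is persisted, and only KMS, which holds the EZK, can unwrap it.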

Note

You can read more on this topic at http://blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs/.
