Hive overview

Hive is the de facto standard for SQL queries over petabytes of data in Hadoop. It provides SQL-like access to data in HDFS, allowing Hadoop to be used as a data warehouse. The Hive Query Language (HQL) has semantics and functions similar to those of standard SQL in relational databases, so experienced database analysts can get up to speed quickly. HQL queries can run on different computing frameworks, such as MapReduce, Tez, and Spark, and the latter two generally deliver better performance than MapReduce.

Hive's data model provides a high-level, table-like structure on top of HDFS. It supports three data units: tables, partitions, and buckets. Tables correspond to HDFS directories and can be divided into partitions, which in turn can be divided into buckets. Hive supports most primitive data types, such as TIMESTAMP, STRING, FLOAT, BOOLEAN, DECIMAL, DOUBLE, INT, SMALLINT, and BIGINT, as well as complex data types, such as UNION, STRUCT, MAP, and ARRAY.
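To make the table, partition, and bucket hierarchy concrete, here is a minimal Python sketch of how these units could map onto HDFS paths. It assumes the default warehouse directory /user/hive/warehouse; the helper names are hypothetical, the bucket file names only approximate what Hive writes, and the bucket hash mirrors the classic scheme for integer columns (newer Hive releases may use a different hash function).

```python
"""Illustrative mapping of Hive tables, partitions, and buckets to HDFS paths.

Assumptions: default warehouse directory, integer bucketing column, and
simplified bucket file names. Helper names are hypothetical.
"""

WAREHOUSE = "/user/hive/warehouse"  # assumed default warehouse location


def table_dir(db: str, table: str) -> str:
    # A table is a directory; tables outside "default" live under "<db>.db".
    if db == "default":
        return f"{WAREHOUSE}/{table}"
    return f"{WAREHOUSE}/{db}.db/{table}"


def partition_dir(db: str, table: str, partition: dict) -> str:
    # Each partition is a nested subdirectory named key=value, in declaration order.
    parts = "/".join(f"{k}={v}" for k, v in partition.items())
    return f"{table_dir(db, table)}/{parts}"


def bucket_for(key: int, num_buckets: int) -> int:
    # Rows go to a bucket by hashing the bucketing column modulo the bucket count.
    # Sketch assumption: an INT value is its own hash, masked to a non-negative int.
    return (key & 0x7FFFFFFF) % num_buckets


def bucket_file(db: str, table: str, partition: dict, key: int, num_buckets: int) -> str:
    # Each bucket is stored as one file inside the partition directory.
    b = bucket_for(key, num_buckets)
    return f"{partition_dir(db, table, partition)}/{b:06d}_0"


if __name__ == "__main__":
    print(bucket_file("sales", "orders", {"year": 2023, "month": 1}, 42, 4))
```

Because partitions are plain directories, a query that filters on year and month only needs to read the matching subdirectories, while buckets split each partition into a fixed number of files.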

The following diagram shows the architecture of Hive within the Hadoop ecosystem. The Hive metadata store (also called the metastore) can use an embedded, local, or remote database. Hive servers are built on Apache Thrift Server technology. Since the release of Hive 0.11, HiveServer2 has been available to handle multiple concurrent clients. It supports Kerberos, LDAP, and custom pluggable authentication, providing better options for JDBC and ODBC clients, especially for metadata access.

Hive architecture
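JDBC clients typically reach HiveServer2 through a URL of the form jdbc:hive2://host:port/database. The following Python sketch assembles such URLs; it assumes the default HiveServer2 port 10000, and the function name and host names are hypothetical. When Kerberos is enabled, the server principal is appended as a ;principal=... session variable, while LDAP or custom authentication usually passes a user name and password as credentials instead (for example, Beeline's -n and -p options).

```python
"""Hypothetical helper that builds HiveServer2 JDBC connection URLs."""

from typing import Optional

DEFAULT_HS2_PORT = 10000  # HiveServer2's default Thrift port


def hs2_jdbc_url(host: str, database: str = "default",
                 port: int = DEFAULT_HS2_PORT,
                 kerberos_principal: Optional[str] = None) -> str:
    # Base URL: scheme, HiveServer2 host and port, then the target database.
    url = f"jdbc:hive2://{host}:{port}/{database}"
    if kerberos_principal:
        # With Kerberos, the server's principal is passed as a URL session variable.
        url += f";principal={kerberos_principal}"
    # LDAP or custom pluggable authentication: user name and password are
    # supplied as connection credentials, so nothing is added to the URL here.
    return url


if __name__ == "__main__":
    print(hs2_jdbc_url("hs2.example.com", kerberos_principal="hive/_HOST@EXAMPLE.COM"))
```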

Here are some highlights of Hive that we can keep in mind moving forward:

  • Hive provides a simpler query model with less coding than MapReduce
  • HQL and SQL have similar syntax
  • Hive provides many built-in functions that make analytics easier
  • Query response times are typically much faster than those of other query approaches on similarly large datasets
  • Hive supports running on different computing frameworks
  • Hive supports ad hoc querying of data on HDFS
  • Hive supports user-defined functions, scripts, and custom I/O formats to extend its functionality
  • Hive is scalable and extensible to various types of data and bigger datasets
  • Mature JDBC and ODBC drivers allow many applications to pull Hive data for seamless reporting
  • Hive allows users to read data in arbitrary formats, using SerDes and Input/Output formats
  • Hive has a well-defined architecture for metadata management, authentication, and query optimizations
  • There is a big community of practitioners and developers working on and using Hive
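The scripting extension point in the list above works through HQL's TRANSFORM clause: Hive streams each input row to an external program's standard input as a tab-separated line (with NULL written as \N by default) and reads tab-separated rows back from its standard output. Below is a minimal sketch of such a script in Python; the file name upper_name.py and the users table with id and name columns are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical streaming script for Hive's TRANSFORM clause.

Assumed HQL usage (file, table, and column names are illustrative):
    ADD FILE upper_name.py;
    SELECT TRANSFORM (id, name)
      USING 'python3 upper_name.py'
      AS (id, name_upper)
    FROM users;
"""

import sys
from typing import Iterable, Iterator

NULL = "\\N"  # Hive's default text representation of NULL


def transform(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        # Each input row arrives as one tab-separated line.
        fields = line.rstrip("\n").split("\t")
        # Upper-case the second column, leaving NULLs and missing fields untouched.
        if len(fields) > 1 and fields[1] != NULL:
            fields[1] = fields[1].upper()
        # Emit the row back to Hive as a tab-separated line.
        yield "\t".join(fields)


if __name__ == "__main__":
    for out in transform(sys.stdin):
        print(out)
```

Keeping the row logic in a generator separate from the stdin loop makes the script easy to test locally before shipping it to the cluster with ADD FILE.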