- MySQL 8 for Big Data
- Shabbir Challawala, Jaydip Lakhatariya, Chintan Mehta, Kandarp Patel
- 2021-08-20 10:06:01
Store
In this section, we will discuss storing data that has been collected from various sources. Consider the example of crawling reviews of organizations for sentiment analysis: the crawler gathers data from several sites, and each site displays its data in its own way.
Traditionally, data was processed using the ETL (Extract, Transform, and Load) procedure, which gathers data from various sources, modifies it according to the requirements, and uploads it to the store for further processing or display. Tools often used for such scenarios were spreadsheets, relational databases, business intelligence tools, and so on, and sometimes manual effort was also part of the process.
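The three ETL steps can be sketched for the review-crawling example as follows. This is a minimal illustration rather than a production pipeline: the two site formats, the field names, the `Review` schema, and the in-memory list used as the "store" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Review:
    organization: str
    rating: float  # normalized to a 0-5 scale
    text: str


# Hypothetical raw records, shaped as two different sites might expose them.
SITE_A_ROWS = [
    {"org": "Acme", "stars": 4, "comment": "  Great support  "},
    {"org": "Globex", "stars": 2, "comment": "Slow delivery"},
]
SITE_B_ROWS = [
    {"company": "Acme", "score_out_of_10": 9, "body": "Would buy again"},
]


def extract():
    """Extract: yield (source, raw_row) pairs from every source."""
    for row in SITE_A_ROWS:
        yield "site_a", row
    for row in SITE_B_ROWS:
        yield "site_b", row


def transform(source, row):
    """Transform: map a site-specific row onto the shared Review schema."""
    if source == "site_a":
        return Review(row["org"], float(row["stars"]), row["comment"].strip())
    if source == "site_b":
        # Rescale a 0-10 score to the 0-5 scale used by site_a.
        return Review(row["company"], row["score_out_of_10"] / 2, row["body"].strip())
    raise ValueError(f"unknown source: {source}")


def load(reviews, store):
    """Load: append the normalized reviews to the target store."""
    store.extend(reviews)
    return store


# The plain list stands in for a real data store.
store = load((transform(src, row) for src, row in extract()), [])
for review in store:
    print(review)
```

Keeping the site-specific logic inside `transform` means a new source only needs one more branch there, while the load step and everything downstream see a single schema.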
The most common storage used in Big Data platforms is HDFS. Apache Hive, which runs on top of HDFS, provides HQL (Hive Query Language), which helps us do many analytical tasks that are traditionally done in business intelligence tools. A few other technologies that can be considered at this stage are Apache Spark (a processing engine rather than a store), Redis, and MongoDB. Each option has its own way of working in the backend; however, many of them expose SQL or SQL-like APIs, which can be used for further data analysis.
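To illustrate the kind of analysis a SQL interface makes possible, the sketch below aggregates stored reviews per organization. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in; it is not Hive or any of the systems named above, and the `reviews` table, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a SQL-capable store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (organization TEXT, rating REAL, review_text TEXT)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [
        ("Acme", 4.0, "Great support"),
        ("Acme", 4.5, "Would buy again"),
        ("Globex", 2.0, "Slow delivery"),
    ],
)

# Count the reviews and average the rating for each organization.
rows = conn.execute(
    "SELECT organization, COUNT(*) AS review_count, AVG(rating) AS avg_rating "
    "FROM reviews GROUP BY organization ORDER BY avg_rating DESC"
).fetchall()

for organization, review_count, avg_rating in rows:
    print(f"{organization}: {review_count} reviews, average {avg_rating:.2f}")

conn.close()
```

A similar `GROUP BY` query could be written in HQL or another SQL dialect, although function names and data types may differ between systems.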
There might also be cases where we need to gather data in real time and present it in real time. Such data does not need to be stored for future use; real-time analytics can run on it directly and produce results in response to requests.
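A rough sketch of this idea is shown below: each incoming review updates a running average and is then discarded, so only small aggregates are kept in memory. The `running_averages` function and the event format are hypothetical, and the hard-coded list stands in for a live feed such as a message queue or socket.

```python
from collections import defaultdict


def running_averages(events):
    """Yield the updated average rating per organization after each event.

    Only a count and a total per organization are kept; the individual
    events are never stored.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for organization, rating in events:
        counts[organization] += 1
        totals[organization] += rating
        yield organization, totals[organization] / counts[organization]


# Hypothetical stream of (organization, rating) events.
events = iter([("Acme", 4.0), ("Globex", 2.0), ("Acme", 5.0)])

results = []
for organization, average in running_averages(events):
    results.append((organization, average))
    print(f"{organization}: running average {average:.2f}")
```

Because the generator consumes one event at a time, it could be attached to an unbounded stream without its memory use growing with the number of events, only with the number of organizations.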