- Elasticsearch for Hadoop
- Vishal Shukla
- 2021-07-09 21:34:30
Chapter 2. Getting Started with ES-Hadoop
Hadoop provides batch-oriented distributed storage and a computing engine. Elasticsearch is a full-text search engine with rich aggregation capabilities. Moving data from Hadoop into Elasticsearch lets you run data discovery tools to find interesting patterns and perform full-text search or geospatial analytics. ES-Hadoop is a library that bridges Hadoop with Elasticsearch. The goal of this book is to get you up and running with ES-Hadoop and enable you to solve real-world analytics problems.
Our goal in this chapter is to develop MapReduce jobs that write data to and read data from Elasticsearch. You probably already know how to write basic MapReduce jobs in Hadoop that write their output to HDFS. ES-Hadoop is a connector library that provides a dedicated InputFormat and OutputFormat that you can use to read data from and write data to Elasticsearch in Hadoop jobs. To take the first step in this direction, we will start with how to set up Hadoop, Elasticsearch, and the related toolsets, which you will use throughout the rest of the book.
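To make the role of the connector's InputFormat and OutputFormat more concrete, the following is a minimal sketch of the settings a job would typically carry to reach Elasticsearch. It uses only the Java standard library so that it runs without a Hadoop installation. The property keys es.nodes, es.resource, and es.query are ES-Hadoop configuration settings; the class name EsHadoopSettingsSketch, the helper methods, the node address localhost:9200, and the index/type eshadoop/wordcount are assumptions made purely for illustration. In a real job, each pair would be copied onto the Hadoop Configuration with conf.set(key, value), and the job would select the connector's EsOutputFormat or EsInputFormat class as its output or input format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: collects the ES-Hadoop settings a MapReduce job
// would need. Class and method names here are hypothetical.
public class EsHadoopSettingsSketch {

    // Settings for a job that writes its output to Elasticsearch.
    // "es.nodes" is the Elasticsearch node to connect to and
    // "es.resource" is the target index/type.
    static Map<String, String> writeSettings(String nodes, String resource) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.nodes", nodes);
        settings.put("es.resource", resource);
        return settings;
    }

    // Settings for a job that reads its input from Elasticsearch.
    // "es.query" selects which documents are handed to the mappers.
    static Map<String, String> readSettings(String nodes, String resource, String query) {
        Map<String, String> settings = writeSettings(nodes, resource);
        settings.put("es.query", query);
        return settings;
    }

    public static void main(String[] args) {
        // Assumed values: a local single-node cluster and a sample index/type.
        Map<String, String> write = writeSettings("localhost:9200", "eshadoop/wordcount");
        Map<String, String> read = readSettings("localhost:9200", "eshadoop/wordcount", "?q=*");

        // In a real driver these pairs would be applied with conf.set(key, value).
        write.forEach((key, value) -> System.out.println("write: " + key + " = " + value));
        read.forEach((key, value) -> System.out.println("read:  " + key + " = " + value));
    }
}
```

Keeping the settings in a plain map separates the connection details from the job wiring, so the same values can be reused when the direction of the data flow is reversed later in the chapter.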
We encourage you to try the examples in the book to speed up the learning process.
We will cover the following topics in this chapter:
- Understanding the WordCount program
- Going real: network monitoring data
- Writing a network logs mapper job
- Getting data from Elasticsearch to HDFS