Apache Hadoop 3 Quick Start Guide
Apache Hadoop is a widely used distributed data platform. It enables large datasets to be efficiently processed instead of using one large computer to store and process the data. This book will get you started with the Hadoop ecosystem, and introduce you to the main technical topics, including MapReduce, YARN, and HDFS. The book begins with an overview of big data and Apache Hadoop. Then, you will set up a pseudo Hadoop development environment and a multi-node enterprise Hadoop cluster. You will see how the parallel programming paradigm, such as MapReduce, can solve many complex data processing problems. The book also covers the important aspects of the big data software development lifecycle, including quality assurance and control, performance, administration, and monitoring. You will then learn about the Hadoop ecosystem, and tools such as Kafka, Sqoop, Flume, Pig, Hive, and HBase. Finally, you will look at advanced topics, including real-time streaming using Apache Storm, and data analytics using Apache Spark. By the end of the book, you will be well versed with different configurations of the Hadoop 3 cluster.
Table of Contents (176 chapters)
- coverpage
- Title Page
- Dedication
- Packt Upsell
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Code in action
- Conventions used
- Get in touch
- Reviews
- Hadoop 3.0 - Background and Introduction
- How it all started
- What Hadoop is and why it is important
- How Apache Hadoop works
- Resource Manager
- Node Manager
- YARN Timeline Service version 2
- NameNode
- DataNode
- Hadoop 3.0 releases and new features
- Choosing the right Hadoop distribution
- Cloudera Hadoop distribution
- Hortonworks Hadoop distribution
- MapR Hadoop distribution
- Summary
- Planning and Setting Up Hadoop Clusters
- Technical requirements
- Prerequisites for Hadoop setup
- Preparing hardware for Hadoop
- Readying your system
- Installing the prerequisites
- Working across nodes without passwords (SSH in keyless)
- Downloading Hadoop
- Running Hadoop in standalone mode
- Setting up a pseudo Hadoop cluster
- Planning and sizing clusters
- Initial load of data
- Organizational data growth
- Workload and computational requirements
- High availability and fault tolerance
- Velocity of data and other factors
- Setting up Hadoop in cluster mode
- Installing and configuring HDFS in cluster mode
- Setting up YARN in cluster mode
- Diagnosing the Hadoop cluster
- Working with log files
- Cluster debugging and tuning tools
- JPS (Java Virtual Machine Process Status)
- JStack
- Summary
- Deep Dive into the Hadoop Distributed File System
- Technical requirements
- How HDFS works
- Key features of HDFS
- Achieving multi tenancy in HDFS
- Snapshots of HDFS
- Safe mode
- Hot swapping
- Federation
- Intra-DataNode balancer
- Data flow patterns of HDFS
- HDFS as primary storage with cache
- HDFS as archival storage
- HDFS as historical storage
- HDFS as a backbone
- HDFS configuration files
- Hadoop filesystem CLIs
- Working with HDFS user commands
- Working with Hadoop shell commands
- Working with data structures in HDFS
- Understanding SequenceFile
- MapFile and its variants
- Summary
- Developing MapReduce Applications
- Technical requirements
- How MapReduce works
- What is MapReduce?
- An example of MapReduce
- Configuring a MapReduce environment
- Working with mapred-site.xml
- Working with Job history server
- RESTful APIs for Job history server
- Understanding Hadoop APIs and packages
- Setting up a MapReduce project
- Setting up an Eclipse project
- Deep diving into MapReduce APIs
- Configuring MapReduce jobs
- Understanding input formats
- Understanding output formats
- Working with Mapper APIs
- Working with the Reducer API
- Compiling and running MapReduce jobs
- Triggering the job remotely
- Using Tool and ToolRunner
- Unit testing of MapReduce jobs
- Failure handling in MapReduce
- Streaming in MapReduce programming
- Summary
- Building Rich YARN Applications
- Technical requirements
- Understanding YARN architecture
- Key features of YARN
- Resource models in YARN
- YARN federation
- RESTful APIs
- Configuring the YARN environment in a cluster
- Working with YARN distributed CLI
- Deep dive with YARN application framework
- Setting up YARN projects
- Writing your YARN application with YarnClient
- Writing a custom application master
- Building and monitoring a YARN application on a cluster
- Building a YARN application
- Monitoring your application
- Summary
- Monitoring and Administration of a Hadoop Cluster
- Roles and responsibilities of Hadoop administrators
- Planning your distributed cluster
- Hadoop applications ports and URLs
- Resource management in Hadoop
- Fair Scheduler
- Capacity Scheduler
- High availability of Hadoop
- High availability for NameNode
- High availability for Resource Manager
- Securing Hadoop clusters
- Securing your Hadoop application
- Securing your data in HDFS
- Performing routine tasks
- Working with safe mode
- Archiving in Hadoop
- Commissioning and decommissioning of nodes
- Working with Hadoop Metric
- Summary
- Demystifying Hadoop Ecosystem Components
- Technical requirements
- Understanding Hadoop's Ecosystem
- Working with Apache Kafka
- Writing Apache Pig scripts
- Pig Latin
- User-defined functions (UDFs)
- Transferring data with Sqoop
- Writing Flume jobs
- Understanding Hive
- Interacting with Hive – CLI beeline and web interface
- Hive as a transactional system
- Using HBase for NoSQL storage
- Summary
- Advanced Topics in Apache Hadoop
- Technical requirements
- Hadoop use cases in industries
- Healthcare
- Oil and Gas
- Finance
- Government Institutions
- Telecommunications
- Retail
- Insurance
- Advanced Hadoop data storage file formats
- Parquet
- Apache ORC
- Avro
- Real-time streaming with Apache Storm
- Data analytics with Apache Spark
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-06-10 19:19:10