Hadoop Beginner's Guide
As a Packt Beginner's Guide, the book is packed with clear step-by-step instructions for performing the most useful tasks, getting you up and running quickly, and learning by doing. This book assumes no existing experience with Hadoop or cloud services. It assumes you have familiarity with a programming language such as Java or Ruby but gives you the needed background on the other topics.
Brand: 中圖公司
Listed: 2021-07-29 16:47:12
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 to produce and distribute it.
- Hadoop Beginner's Guide
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Support files, eBooks, discount offers and more
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Time for action – heading
- Reader feedback
- Customer support
- Chapter 1. What It's All About
- Big data processing
- Cloud computing with Amazon Web Services
- Summary
- Chapter 2. Getting Hadoop Up and Running
- Hadoop on a local Ubuntu host
- Time for action – checking the prerequisites
- Time for action – downloading Hadoop
- Time for action – setting up SSH
- Time for action – using Hadoop to calculate Pi
- Time for action – configuring the pseudo-distributed mode
- Time for action – changing the base HDFS directory
- Time for action – formatting the NameNode
- Time for action – starting Hadoop
- Time for action – using HDFS
- Time for action – WordCount, the Hello World of MapReduce
- Using Elastic MapReduce
- Time for action – WordCount on EMR using the management console
- Comparison of local versus EMR Hadoop
- Summary
- Chapter 3. Understanding MapReduce
- Key/value pairs
- The Hadoop Java API for MapReduce
- Writing MapReduce programs
- Time for action – setting up the classpath
- Time for action – implementing WordCount
- Time for action – building a JAR file
- Time for action – running WordCount on a local Hadoop cluster
- Time for action – running WordCount on EMR
- Time for action – WordCount the easy way
- Walking through a run of WordCount
- Time for action – WordCount with a combiner
- Time for action – fixing WordCount to work with a combiner
- Hadoop-specific data types
- Time for action – using the Writable wrapper classes
- Input/output
- Summary
- Chapter 4. Developing MapReduce Programs
- Using languages other than Java with Hadoop
- Time for action – implementing WordCount using Streaming
- Analyzing a large dataset
- Time for action – summarizing the UFO data
- Time for action – summarizing the shape data
- Time for action – correlating of sighting duration to UFO shape
- Time for action – performing the shape/time analysis from the command line
- Time for action – using ChainMapper for field validation/analysis
- Time for action – using the Distributed Cache to improve location output
- Counters, status, and other output
- Time for action – creating counters, task states, and writing log output
- Summary
- Chapter 5. Advanced MapReduce Techniques
- Simple, advanced, and in-between
- Joins
- Time for action – reduce-side join using MultipleInputs
- Graph algorithms
- Time for action – representing the graph
- Time for action – creating the source code
- Time for action – the first run
- Time for action – the second run
- Time for action – the third run
- Time for action – the fourth and last run
- Using language-independent data structures
- Time for action – getting and installing Avro
- Time for action – defining the schema
- Time for action – creating the source Avro data with Ruby
- Time for action – consuming the Avro data with Java
- Time for action – generating shape summaries in MapReduce
- Time for action – examining the output data with Ruby
- Time for action – examining the output data with Java
- Summary
- Chapter 6. When Things Break
- Failure
- Time for action – killing a DataNode process
- Time for action – the replication factor in action
- Time for action – intentionally causing missing blocks
- Time for action – killing a TaskTracker process
- Time for action – killing the JobTracker
- Time for action – killing the NameNode process
- Time for action – causing task failure
- Time for action – handling dirty data by using skip mode
- Summary
- Chapter 7. Keeping Things Running
- A note on EMR
- Hadoop configuration properties
- Time for action – browsing default properties
- Setting up a cluster
- Time for action – examining the default rack configuration
- Time for action – adding a rack awareness script
- Cluster access control
- Time for action – demonstrating the default security
- Managing the NameNode
- Time for action – adding an additional fsimage location
- Time for action – swapping to a new NameNode host
- Managing HDFS
- MapReduce management
- Time for action – changing job priorities and killing a job
- Scaling
- Summary
- Chapter 8. A Relational View on Data with Hive
- Overview of Hive
- Setting up Hive
- Time for action – installing Hive
- Using Hive
- Time for action – creating a table for the UFO data
- Time for action – inserting the UFO data
- Time for action – validating the table
- Time for action – redefining the table with the correct column separator
- Time for action – creating a table from an existing file
- Time for action – performing a join
- Time for action – using views
- Time for action – exporting query output
- Time for action – making a partitioned UFO sighting table
- Time for action – adding a new User Defined Function (UDF)
- Hive on Amazon Web Services
- Time for action – running UFO analysis on EMR
- Summary
- Chapter 9. Working with Relational Databases
- Common data paths
- Setting up MySQL
- Time for action – installing and setting up MySQL
- Time for action – configuring MySQL to allow remote connections
- Time for action – setting up the employee database
- Getting data into Hadoop
- Time for action – downloading and configuring Sqoop
- Time for action – exporting data from MySQL to HDFS
- Time for action – exporting data from MySQL into Hive
- Time for action – a more selective import
- Time for action – using a type mapping
- Time for action – importing data from a raw query
- Getting data out of Hadoop
- Time for action – importing data from Hadoop into MySQL
- Time for action – importing Hive data into MySQL
- Time for action – fixing the mapping and re-running the export
- AWS considerations
- Summary
- Chapter 10. Data Collection with Flume
- A note about AWS
- Data, data everywhere...
- Time for action – getting web server data into Hadoop
- Introducing Apache Flume
- Time for action – installing and configuring Flume
- Time for action – capturing network traffic in a log file
- Time for action – logging to the console
- Time for action – capturing the output of a command to a flat file
- Time for action – capturing a remote file in a local flat file
- Time for action – writing network traffic onto HDFS
- Time for action – adding timestamps
- Time for action – multi level Flume networks
- Time for action – writing to multiple sinks
- The bigger picture
- Summary
- Chapter 11. Where to Go Next
- What we did and didn't cover in this book
- Upcoming Hadoop changes
- Alternative distributions
- Other Apache projects
- Other programming abstractions
- AWS resources
- Sources of information
- Summary
- Appendix A. Pop Quiz Answers
- Chapter 3, Understanding MapReduce
- Chapter 7, Keeping Things Running
- Index (updated: 2021-07-29 16:52:10)