
Comparison of local versus EMR Hadoop

After our first experience of both a local Hadoop cluster and its equivalent in EMR, this is a good point to consider the differences between the two approaches.

As may be apparent, the key differences are not really about capability; if all we want is an environment to run MapReduce jobs, either approach is entirely suitable. Instead, the distinguishing characteristics revolve around a topic we touched on in Chapter 1, What It's All About: whether you prefer a cost model with upfront infrastructure costs and ongoing maintenance effort, or a pay-as-you-go model with a lower maintenance burden and rapid, conceptually infinite scalability. Other than the cost decisions, there are a few things to keep in mind:

  • EMR supports specific versions of Hadoop and has a policy of upgrading over time. If you need a specific version, and in particular if you need the latest version immediately after release, then the lag before it is live on EMR may be unacceptable.
  • You can start up a persistent EMR job flow and treat it much as you would a local Hadoop cluster, logging into the hosting nodes and tweaking their configuration. If you find yourself doing this, it's worth asking whether that level of control is really needed and, if so, whether it is stopping you from getting all the cost model benefits of a move to EMR.
  • If it does come down to a cost consideration, remember to factor in the hidden costs of a local cluster that are often forgotten. Think about the costs of power, space, cooling, and facilities, not to mention the administration overhead, which can be nontrivial if things start breaking in the early hours of the morning.
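The cost comparison described above can be made concrete with a rough calculation. The following is only a minimal sketch: the class names, fields, and the `break_even_month` helper are hypothetical, and every figure you feed in would need to be your own estimate rather than any real price.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocalCluster:
    """Hypothetical cost inputs for a self-hosted cluster."""
    hardware_upfront: float          # one-off purchase cost
    monthly_power_cooling: float     # hidden cost: power and cooling
    monthly_space_facilities: float  # hidden cost: space and facilities
    monthly_admin: float             # hidden cost: administration effort

    def total_cost(self, months: int) -> float:
        monthly = (self.monthly_power_cooling
                   + self.monthly_space_facilities
                   + self.monthly_admin)
        return self.hardware_upfront + monthly * months


@dataclass
class PayAsYouGo:
    """Hypothetical cost inputs for an on-demand service such as EMR."""
    hourly_rate_per_node: float
    nodes: int
    hours_per_month: float           # hours the job flow actually runs

    def total_cost(self, months: int) -> float:
        return (self.hourly_rate_per_node * self.nodes
                * self.hours_per_month * months)


def break_even_month(local: LocalCluster, cloud: PayAsYouGo,
                     horizon: int = 120) -> Optional[int]:
    """Return the first month in which the local cluster's cumulative cost
    is no higher than the pay-as-you-go cost, or None within the horizon."""
    for month in range(1, horizon + 1):
        if local.total_cost(month) <= cloud.total_cost(month):
            return month
    return None
```

With placeholder inputs such as `LocalCluster(10000, 100, 100, 300)` and `PayAsYouGo(1.0, 10, 100)`, the helper reports how many months of usage it takes before the upfront investment pays off. Leaving out the power, space, and administration terms makes the local cluster look cheaper than it really is, which is the trap the last point warns about.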