Comparison of local versus EMR Hadoop

Having now had our first experience of both a local Hadoop cluster and its equivalent in EMR, this is a good point at which to consider the differences between the two approaches.

As may be apparent, the key differences are not really about capability; if all we want is an environment in which to run MapReduce jobs, either approach is entirely suitable. Instead, the distinguishing characteristics revolve around a topic we touched on in Chapter 1, What It's All About: whether you prefer a cost model with upfront infrastructure costs and ongoing maintenance effort, or a pay-as-you-go model with a lower maintenance burden and rapid, conceptually infinite scalability. Beyond the cost decision, there are a few things to keep in mind:

  • EMR supports specific versions of Hadoop and has a policy of upgrading over time. If you need a particular version, and especially if you need the latest release as soon as it is available, the lag before it goes live on EMR may be unacceptable.
  • You can start up a persistent EMR job flow and treat it much as you would a local Hadoop cluster, logging into the hosting nodes and tweaking their configuration. If you find yourself doing this, it's worth asking whether that level of control is really needed and, if so, whether it is stopping you from getting all the cost model benefits of a move to EMR.
  • If it does come down to a cost consideration, remember to factor in all the hidden costs of a local cluster that are often forgotten. Think about the costs of power, space, cooling, and facilities, not to mention the administration overhead, which can be nontrivial if things start breaking in the early hours of the morning.
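
To make the cost comparison concrete, the following is a minimal sketch of how the two cost models could be set side by side. It is not a pricing tool: the class names, field names, and every figure in the example are hypothetical placeholders, and real hardware, facilities, and EMR charges would need to be substituted from your own quotes and the current AWS price list.

```python
from dataclasses import dataclass


@dataclass
class LocalCluster:
    """Monthly cost of an owned cluster (all fields are hypothetical inputs)."""
    hardware_cost: float         # upfront purchase price
    amortization_months: int     # period over which the hardware is written off
    facilities_per_month: float  # power, space, cooling, and facilities
    admin_per_month: float       # administration overhead

    def monthly_cost(self) -> float:
        # Spread the upfront cost evenly, then add the recurring hidden costs.
        return (self.hardware_cost / self.amortization_months
                + self.facilities_per_month
                + self.admin_per_month)


@dataclass
class EmrUsage:
    """Monthly cost of a pay-as-you-go cluster (hypothetical inputs)."""
    nodes: int
    hours_per_month: float
    rate_per_node_hour: float  # assumed combined hourly charge per node

    def monthly_cost(self) -> float:
        return self.nodes * self.hours_per_month * self.rate_per_node_hour


def break_even_hours(local: LocalCluster, nodes: int,
                     rate_per_node_hour: float) -> float:
    """Hours of EMR use per month at which both models cost the same."""
    return local.monthly_cost() / (nodes * rate_per_node_hour)


if __name__ == "__main__":
    # Placeholder figures purely for illustration, not real prices.
    local = LocalCluster(hardware_cost=36000, amortization_months=36,
                         facilities_per_month=500, admin_per_month=1500)
    emr = EmrUsage(nodes=10, hours_per_month=100, rate_per_node_hour=0.5)
    print("local per month:", local.monthly_cost())
    print("EMR per month:", emr.monthly_cost())
    print("break-even hours:", break_even_hours(local, 10, 0.5))
```

With inputs like these, the pay-as-you-go model is cheaper for as long as monthly usage stays below the break-even number of hours; a cluster that is busy around the clock shifts the balance toward owning the hardware.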