If your organization is about to enter the world of big data, you not only have to decide whether Apache Hadoop is the right platform to use, but also which of its many components are best suited to your task. This field guide makes the exercise manageable by breaking down the Hadoop ecosystem into short, digestible sections. You'll quickly understand how Hadoop's projects, subprojects, and related technologies work together.
Each chapter introduces a different topic, such as core technologies or data transfer, and explains why certain components may or may not be useful for particular needs. When it comes to data, Hadoop is a whole new ballgame, but with this handy reference, you'll have a good grasp of the playing field.
* Core technologies: Hadoop Distributed File System (HDFS), MapReduce, YARN, and Spark
* Database and data management: Cassandra, HBase, MongoDB, and Hive
* Serialization: Avro, JSON, and Parquet
* Administration and monitoring: Puppet, Chef, ZooKeeper, and Oozie
* Analytic helpers: Pig, Mahout, and MLlib
* Data transfer: Sqoop, Flume, distcp, and Storm
* Security, access control, and auditing: Sentry, Kerberos, and Knox
* Cloud computing and virtualization: Serengeti, Docker, and Whirr