Tuesday, September 1, 2015

Hadoop Introduction

For a detailed introduction to Hadoop, please visit: http://hadoop.apache.org/

Hadoop

  • Setting Up a Hadoop Single-Node Cluster
  • Setting Up a Hadoop Multi-Node Cluster
  • Hadoop documentation, classified by module:
      • Hadoop Common
      • Hadoop Distributed File System (HDFS)
      • Hadoop MapReduce
      • Hadoop Yet Another Resource Negotiator (YARN)
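
To give a feel for the programming model behind the MapReduce module listed above, here is a minimal sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. It imitates, in a single process, the three steps a MapReduce job goes through when counting words: map each line to (word, 1) pairs, group the pairs by key, and reduce each group to a total.

```python
from collections import defaultdict


def map_phase(line):
    # Map step: emit a (word, 1) pair for every whitespace-separated word.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all emitted values by their key.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce step: collapse the values for one key into a single count.
    return key, sum(values)


def word_count(lines):
    # Run the three steps in sequence, in memory, on a list of strings.
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["hadoop stores data", "hadoop processes data"]))
```

On a real cluster the map and reduce steps would run in parallel on many machines, with the input read from HDFS and resources scheduled by YARN; the sketch only shows how the data flows between the steps.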

Other Hadoop-Related Projects

  • Ambari - A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters.
  • Avro - A data serialization system.
  • Cassandra - A scalable multi-master database with no single point of failure.
  • Chukwa - A data collection system.
  • HBase - A scalable, distributed database that supports structured data storage for large tables.
  • Hive - A data warehouse infrastructure that provides data summarization and ad hoc querying.
  • Mahout - A scalable machine learning and data mining library.
  • Pig - A high-level data-flow language and execution framework for parallel computation.
  • Spark - A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
  • Tez - A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use cases. Tez is being adopted by Hive™, Pig™, and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
  • ZooKeeper - A high-performance coordination service for distributed applications.
