Introduction to the Apache Hadoop Ecosystem

Apache Hadoop

In the last article we looked at what big data is. In this article we will look at a big data framework called "Hadoop". Hadoop is a software library that lets users distribute and process large amounts of data across clusters of commodity servers. The project includes the following modules:

Hadoop Common – Common utilities that support the other Hadoop modules.

Hadoop Distributed File System (HDFS) – A distributed file system that provides high-throughput access to data.

Hadoop YARN (Yet Another Resource Negotiator) – A framework for job scheduling and cluster resource management.

Hadoop MapReduce (MRv2) – A programming model for processing large data sets in parallel.
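To make the MapReduce programming model more concrete, here is a minimal single-process sketch of the classic word count example in plain Python. It does not use Hadoop or its API. The function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only for illustration. In a real cluster, the map and reduce steps would run in parallel on many machines and the framework would handle the shuffle.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence (illustrative, not distributed)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "Hadoop processes big data"]
    print(word_count(sample))
```

Because each call to map_phase depends only on its own input line, and each call to reduce_phase depends only on the values for one key, both steps can be spread across many servers. That independence is what allows the model to scale.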

In addition to the above modules, there are other Hadoop-related projects:

ZooKeeper – Coordination service for distributed applications.

Pig – A high-level data-flow language and execution framework for parallel computation.

HBase – A scalable, distributed, column-oriented database.

Hive – A data warehouse infrastructure that provides data summarization and ad hoc querying.

Oozie – A workflow scheduler system to manage Apache Hadoop jobs.

Flume – A tool for moving unstructured data, such as log data, into Apache Hadoop.

Sqoop – A tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.

The Apache Hadoop ecosystem is depicted below.

[Figure: Apache Hadoop ecosystem]

In the next article we will look at an introduction to the Hadoop Distributed File System.



Siva Janapati is an Architect with experience in building cloud-native microservices architectures, reactive systems, large-scale distributed systems, and serverless systems. Siva has hands-on experience in the architecture, design, and implementation of scalable systems using Cloud, Java, Go, Apache Kafka, Apache Solr, Spring, Spring Boot, the Lightbend reactive tech stack, APIGEE Edge and on-premise, and other open-source and proprietary technologies. He has expertise in working with and building RESTful and GraphQL APIs. He has successfully delivered multiple applications in the retail, telco, and financial services domains. He maintains a GitHub account where he publishes the source code related to his blog posts.

