
Big Data and Hadoop Viva Questions

Introduction to Big Data and Hadoop

1. What is the definition of Big Data?

Ans:

Big Data is generally defined as high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.

2. What are the characteristics of Big Data?

Ans:

i) Volume: Vast volumes of data are generated daily from many sources, such as business processes, machines, social media platforms, networks, and human interactions.

ii) Variety: Big Data comes in structured, semi-structured, and unstructured forms, collected from many different sources.

iii) Velocity: Velocity refers to the speed at which data is created, often in real time, and to how quickly it must be made available for processing.

iv) Veracity: Veracity refers to how reliable the data is, i.e. its accuracy, consistency, and trustworthiness, and therefore to how well the data can be filtered, interpreted, and managed.

3. Distinguish between traditional data and Big Data.

Ans:

Traditional Data:
● Traditional data is structured data, maintained by businesses of every size, from very small firms to large organizations.
● In a traditional database system, a centralized database architecture is used to store and maintain the data in a fixed format, or as fields in a file.
Big Data:
● Big Data can be considered an extension of traditional data.
● Big Data deals with data sets that are too large or complex to manage with traditional data-processing application software.
● It deals with large volumes of structured, semi-structured, and unstructured data.

4. What is Hadoop? Why is Hadoop used in Big Data?

Ans:

Hadoop is an Apache open-source framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models.
Hadoop is used in Big Data because it allows companies to manage huge volumes of data easily: a big problem is broken down into smaller pieces that can be processed in parallel, so analysis is done quickly and cost-effectively (see the sketch below).
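A conceptual sketch of this divide-and-conquer idea in plain Java (this is not the Hadoop API; the class name and sample data are illustrative only):

import java.util.*;
import java.util.stream.*;

public class DivideAndConquerCount {
    public static void main(String[] args) {
        // Illustrative input: each line could be processed on a different node.
        List<String> lines = List.of("big data big", "data hadoop", "big hadoop");

        // "Map" step: split each line into words, independently and in parallel.
        // "Reduce" step: merge the per-word tallies into one global count.
        Map<String, Long> counts = lines.parallelStream()
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

        System.out.println(counts); // e.g. {big=3, data=2, hadoop=2}
    }
}

Hadoop applies the same pattern at cluster scale: the input is split into blocks, each block is processed on the node where it is stored, and the partial results are merged.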

5. What are the components of Hadoop?

Ans:

Hadoop has two major layers:
● Processing/computation layer (MapReduce)
● Storage layer (Hadoop Distributed File System, HDFS)
These are supported by two further modules:
● Hadoop Common
● Hadoop YARN

6. Explain the components of Hadoop Architecture.

Ans:

MapReduce
MapReduce is a parallel programming model, devised at Google, for writing distributed applications that process large amounts of data efficiently.
Hadoop Distributed File System (HDFS)
HDFS is based on the Google File System (GFS) and provides a distributed file system designed to run on commodity hardware.
Hadoop Common
These are the Java libraries and utilities required by the other Hadoop modules.
Hadoop YARN (Yet Another Resource Negotiator)
This is a framework for job scheduling and cluster resource management.
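To make the MapReduce model concrete, here is a minimal word-count sketch against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class name WordCount and the use of command-line arguments for the input/output paths are illustrative assumptions, not part of the answer above:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in each input line.
    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        // YARN schedules this job's map and reduce tasks across the cluster.
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}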

7. Explain the Hadoop Ecosystem

Ans:

The Hadoop Ecosystem has the following stages:
i) Data Management
ii) Data Access
iii) Data Processing
iv) Data Storage
● HDFS: Hadoop Distributed File System (a Java client sketch follows this list)
● YARN: Yet Another Resource Negotiator
● MapReduce: programming-model-based data processing
● Spark: in-memory data processing
● Pig, Hive: query-based data processing services
● HBase: NoSQL database
● Mahout, Spark MLlib: machine-learning algorithm libraries
● Solr, Lucene: searching and indexing
● ZooKeeper: cluster management
● Oozie: Job Scheduling
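As a small illustration of HDFS from the list above, here is a sketch using the HDFS Java client API (org.apache.hadoop.fs.FileSystem). The path /user/demo/hello.txt is a made-up example, and the NameNode address is assumed to come from a core-site.xml on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDemo {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS (the NameNode address) from core-site.xml.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a file; HDFS splits it into blocks and replicates
        // them across DataNodes behind this one API call.
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("stored and replicated across the cluster");
        }

        System.out.println("exists: " + fs.exists(file));
        fs.close();
    }
}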
