Top 30 MapReduce Quiz/Test with Answers: Big Data and Hadoop Interview Questions


Here is a quiz set on one of the fastest-growing technologies, Big Data. Many database developers are now training in it to build a career in this field, and experts say the technology has a bright future. If you are taking a course or preparing on your own, work through this quiz to test your knowledge of Big Data and Hadoop. These questions, with answers, are the kind that can be asked in an interview. If you would like more quizzes like this one, leave a comment and we will prepare more test sets. We also have more tests like this one:
Click here for another Quiz/Test 1
Click here for another Quiz/Test 2

1). Which of the following describes the map function?  
A) It processes data to create a list of key-value pairs
B) It converts a relational database into key-value pairs
C) It indexes the data to list all the words occurring in it
D) It tracks data across multiple tables and clusters in Hadoop

2). In a word count query using MapReduce, what does the map function do?
A) It sorts the words alphabetically and returns a list of the most frequently used words.
B) It returns a list with each document as a key and the number of words in it as the value.
C) It creates a list with each word as a key and every occurrence as value 1.
D) It creates a list with each word as a key and the number of occurrences as the value.
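
Note: the map step of a word count can be sketched in a few lines. The Python below is only an illustration and not Hadoop API code; the function name map_word_count and the whitespace tokenization are assumptions made for this example.

```python
def map_word_count(document):
    """Emit one (word, 1) pair per word occurrence; no counting happens here."""
    return [(word.lower(), 1) for word in document.split()]


pairs = map_word_count("the cat saw the dog")
print(pairs)
# [('the', 1), ('cat', 1), ('saw', 1), ('the', 1), ('dog', 1)]
```

The word "the" appears twice in the output, each time with the value 1. Adding those values up is left to the later combine and reduce steps.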

3). The Combine stage, if present, must perform the same aggregation operation as Reduce. True or False?
A) True
B) False
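
Note: a per-key average is a common case where Combine and Reduce differ. In the sketch below, which is plain Python and not Hadoop code, the hypothetical combine function only merges partial (sum, count) pairs, while the hypothetical reduce_mean function also performs the final division. The sample values are made up for illustration.

```python
def combine(values):
    """Combine: merge (sum, count) pairs locally; does not compute the mean."""
    total = sum(v for v, _ in values)
    count = sum(c for _, c in values)
    return (total, count)


def reduce_mean(values):
    """Reduce: merge the partial (sum, count) pairs, then divide once."""
    total, count = combine(values)
    return total / count


# Map output for one key, held on two different nodes and combined separately.
node_a = [(10.0, 1), (20.0, 1)]
node_b = [(60.0, 1)]
partials = [combine(node_a), combine(node_b)]
print(partials)               # [(30.0, 2), (60.0, 1)]
print(reduce_mean(partials))  # 30.0
```

Averaging the two node averages (15.0 and 60.0) would give 37.5 instead of 30.0, which is why the combiner here cannot simply reuse the reducer's logic. Hadoop may run a combiner zero, one, or several times, so the result has to be correct in every case.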

4). A combinator in MapReduce is a function that:
A) Assembles program fragments
B) Builds programs from program fragments
C) Helps in fragmenting the program
D) Builds program fragments

5). Identify the utility that allows you to create and run MapReduce jobs with any executable or script as the mapper and/or the reducer.
A) Oozie
B) Sqoop
C) Flume
D) Hadoop Streaming
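
Note: with Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write lines to standard output, and they are passed to the job through the -mapper and -reducer options. A streaming reducer receives its input as key<TAB>value lines sorted by key and has to detect key boundaries itself. The sketch below shows one way to write such a reducer in Python; the function name reduce_stream is hypothetical, and an in-memory buffer stands in for sys.stdin so the example runs on its own.

```python
import io
from itertools import groupby


def reduce_stream(lines):
    """Sum counts per key from 'key<TAB>count' lines that arrive sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield key, sum(int(count) for _, count in group)


# Stand-in for sys.stdin: mapper output after the framework has sorted it by key.
sorted_mapper_output = io.StringIO("cat\t1\ndog\t1\nthe\t1\nthe\t1\n")
for key, total in reduce_stream(sorted_mapper_output):
    print(f"{key}\t{total}")
```

In a real streaming script, reduce_stream would be called with sys.stdin in place of the sample buffer.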

6). Which message does a DataNode send to the NameNode so that connectivity between them can be detected and ensured?
A) Heartbeat
B) Pipeline
C) Map
D) None

7). Which MapReduce phase is theoretically able to utilize features of the underlying file system in order to optimize parallel execution? 
A) Split
B) Map
C) Combine
D) MapCombine

8). In designing the MapReduce framework, which of the following needs did the engineers consider?
A) Developers should be able to create new languages
B) Processing should expand and contract automatically
C) Processing should be stopped in the case of network failure
D) It should be cheap and distributed free of cost

9). MapReduce can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of unstructured data.
A) True
B) False

10). The function of the Secondary NameNode:
A) is to serve as a backup for NameNode
B) is to continue the functioning of NameNode
C) is to serve as a checkpoint mechanism for primary NameNode
D) is to provide advanced technology as compared with primary

11). Hadoop is a framework that works with a variety of related tools. Common cohorts include:
A) MapReduce, Hive and HBase
B) MapReduce, MySQL and Google Apps
C) MapReduce, Hummer and Iguana
D) MapReduce, Heron and Trumpet

12). In a MapReduce job, you want each of your input files processed by a single map task. How do you configure a MapReduce job so that a single map task processes each input file regardless of how many blocks the input file occupies?
A) Set the number of mappers equal to the number of input files you want to process.
B) Write a custom MapRunner that iterates over all key–value pairs in the entire file.
C) Write a custom FileInputFormat and override the method isSplitable to always return false.
D) Increase the parameter that controls minimum split size in the job configuration

13). What is the input to the Reduce function? 
A) An arbitrarily sized list of key/value pairs.
B) An arbitrarily sized list of key/value with that key.
C) One key and a list of all values associated with that key.
D) One key and a list of some values associated with that key.
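
Note: the grouping that happens between Map and Reduce can be sketched as follows. This is a plain-Python illustration and not Hadoop code; the function names shuffle and reduce_sum and the sample pairs are assumptions made for the example.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group map output so each key is paired with the list of all its values."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_sum(key, values):
    """Reduce: called once per key with every value for that key."""
    return key, sum(values)


map_output = [("the", 1), ("cat", 1), ("the", 1), ("dog", 1)]
reduce_input = shuffle(map_output)
results = [reduce_sum(k, v) for k, v in reduce_input.items()]
print(reduce_input)  # {'cat': [1], 'dog': [1], 'the': [1, 1]}
print(results)       # [('cat', 1), ('dog', 1), ('the', 2)]
```

Each call to reduce_sum sees exactly one key together with the complete list of values emitted for it.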

14). A combinator in MapReduce is a function that:
A) Assembles program fragments
B) Helps in fragmenting the program
C) Builds programs from program fragments
D) Builds program fragments

15). What is the implementation language of the Hadoop MapReduce framework?
A) Java
C) Python
D) C

16). Which of the following MapReduce execution frameworks focus on execution in shared-memory environments? 
A) Hadoop
B) Twister
C) Phoenix
D) BitXise

17). If the output directory specified in the output format already exists, MapReduce execution:
A) Throws an error
B) Continues execution
C) Seeks further instructions
D) Shuts down completely

18). Which of the following statement(s) is/are correct?
A) The masters and slaves files are optional in Hadoop 2.x
B) The masters file has a list of all NameNodes
C) core-site has HDFS and MapReduce related common properties
D) The hdfs-site file is now deprecated in Hadoop 2.x

19). What does the following piece of code return?
public abstract float getProgress() throws IOException, InterruptedException
A) A number above 1
B) A number below 0
C) Either 0 or 1
D) A number between 0 and 1

20). Mike has a Hadoop cluster with 50 machines under default setup (replication factor 3, 128MB input split size). Each machine has 100GB of HDFS disk space. The cluster is currently empty (no job, no data). Mike intends to upload 1 Terabyte of plain text (in 5 files of approximately 200GB each), followed by running Hadoop’s standard WordCount job. What is going to happen?
A) WordCount fails: too many input splits to process
B) WordCount runs successfully
C) The data upload fails at the last file: due to replication, all disks are full
D) The data upload fails at the first file: it is too large to fit onto a node

Note: WordCount runs successfully. Explanation: the total HDFS storage is 50 × 100 GB = 5000 GB. The total input size is 5 × 200 GB = 1000 GB; with a replication factor of 3 it occupies 3000 GB, which is still less than the total HDFS storage. So the files will be loaded and WordCount will run successfully.
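
The same arithmetic can be checked with a short script. The figures are taken directly from the question; the variable names are just labels chosen for this sketch.

```python
NODES = 50
DISK_PER_NODE_GB = 100
REPLICATION = 3
FILES = 5
FILE_SIZE_GB = 200

capacity_gb = NODES * DISK_PER_NODE_GB                # total HDFS space
raw_needed_gb = FILES * FILE_SIZE_GB * REPLICATION    # data plus replicas
avg_per_node_gb = raw_needed_gb / NODES               # if blocks spread evenly

print(capacity_gb, raw_needed_gb, avg_per_node_gb)    # 5000 3000 60.0
print(raw_needed_gb <= capacity_gb)                   # True
```

A single 200 GB file does not need to fit on one 100 GB node, because HDFS stores a file as blocks that are distributed across the cluster.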

22). How does HDFS ensure the integrity of the stored data?
A) Through checksums
B) Through error logs
C) By comparing the replicated data blocks with each other
D) By comparing the replicated blocks to the master copy

23). The time it takes for a Hadoop job’s Map task to finish mostly depends on?
A) The duration of the job’s Reduce task
B) The duration of the job’s shuffle & sort phase
C) The placement of the NameNode in the cluster
D) The placement of the blocks required for the Map task

24). What is speculative execution in Hadoop?
A) Hadoop executes the delayed map/reduce tasks in parallel on other DataNodes, speculating failures/delays in DataNodes.
B) Hadoop always executes every map/reduce task on more than one DataNode, speculatively.
C) Hadoop does not implement any speculative execution unless specified by the user.
D) Nodes in Hadoop cluster never fail. So there is no speculative execution in Hadoop.

25). If the output directory specified in the output format already exists, MapReduce execution:
A) Seeks further instructions
B) Throws an error
C) Shuts down completely
D) Continues execution

26). What decides the number of mappers for a MapReduce job?
A) File Location
B) parameter
C) Input file size
D) Input Splits
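
Note: the number of map tasks follows the number of input splits, and splits are computed per file, so the same amount of data can lead to very different mapper counts. The sketch below is a simplified approximation (one split per 128 MB chunk of each non-empty file) and ignores details of the real split calculation; the function name count_splits is hypothetical.

```python
import math


def count_splits(file_sizes_mb, split_size_mb=128):
    """Approximate one split per split_size_mb chunk, computed file by file."""
    return sum(math.ceil(size / split_size_mb) for size in file_sizes_mb)


one_large_file = count_splits([1024])        # a single 1024 MB file
many_small_files = count_splits([1] * 1024)  # 1024 files of 1 MB each
print(one_large_file, many_small_files)      # 8 1024
```

Both inputs hold 1024 MB in total, yet the first needs about 8 map tasks and the second about 1024.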

27). What happens if mapper output does not match reducer input?
A) Hadoop API will convert the data to the type that is needed by the reducer.
B) Data input/output inconsistency cannot occur. A preliminary validation check is executed prior to the full execution of the job to ensure there is consistency
C) A runtime exception will be thrown, and the MapReduce job will fail
D) The Java compiler will report an error during compilation, but the job will complete with exceptions.

28). A 10-GB file is split into chunks of 100 MB and distributed among the nodes of a Hadoop cluster. Due to a power failure, the system is switched off; when power returns, the system administrator restarts the process. How will the NameNode know what kind of processing was being performed on which file?
A) Through the combiner
B) Through the scheduler
C) Through the input list
D) Through the DataNode

29). What are the core methods of a Reducer?
A) setup(), reduce(), cleanup()
B) Get(), Mapreduce(), cleanup()
C) Put(), reduce(), clean()
D) set-up(), reduce(), cleanup()

30). Which of the following is the correct sequence of MapReduce flow?
A) Map → Reduce → Combine
B) Combine → Reduce → Map
C) Map → Combine → Reduce
D) Reduce → Combine → Map

Answers: 1). A  2). C  3). B  4). B  5). D  6). A  7). A  8). B  9). A  10). C  11). A  12). C  13). C  14). C  15). A  16). C  17). A  18). C  19). D  20). B  21).  22). A  23). D  24). A  25). B  26). D  27). C  28). C  29). A  30). C





