HADOOP QUIZ DESCRIPTION

The __________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.

  • Map task
     

  •  Mapper
     

  •  Task execution
     

  • All of the mentioned

Point out the wrong statement.

  • Hadoop's processing capabilities are huge, and its real advantage lies in the ability to process terabytes and petabytes of data
     

  • Hadoop uses a programming model called “MapReduce”; all programs must conform to this model in order to work on the Hadoop platform
     

  • The programming model, MapReduce, used by Hadoop is difficult to write and test
     

  •  All of the mentioned

What was Hadoop written in?

  • Java (software platform)
     

  •  Perl
     

  • Java (programming language)
     

  •  Lua (programming language)

Mapper implementations are passed the JobConf for the job via the ________ method.

  •  JobConfigure.configure
     

  •  JobConfigurable.configure
     

  •  JobConfigurable.configurable
     

  •  None of the mentioned

Point out the correct statement.

  • Applications can use the Reporter to report progress
     

  •  The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job
     

  •  The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
     

  •  All of the mentioned

_________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.

  • MapReduce
     

  •  Mahout
     

  •  Oozie
     

  •  All of the mentioned

________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.

  • Map Parameters
     

  • JobConf
     

  • MemoryConf
     

  •  None of the mentioned

_________ is the default Partitioner for partitioning the key space.

  • HashPar
     

  •  Partitioner
     

  •  HashPartitioner
     

  •  None of the mentioned
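
The HashPartitioner answer can be made concrete with a small JDK-only sketch of its partitioning rule, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. It assumes plain String keys (a Hadoop Text key computes its hash differently), and the class name HashPartitionSketch is hypothetical; this is not Hadoop's own class.

```java
import java.util.List;

// JDK-only sketch of the HashPartitioner rule for choosing a reducer.
// Assumption: keys are plain Strings, so String.hashCode() stands in for the
// hashCode() of a Hadoop Writable key such as Text.
class HashPartitionSketch {

    // Clear the sign bit so a negative hash code still yields a non-negative
    // partition, then take the remainder modulo the number of reducers.
    static int getPartition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 3;
        for (String key : List.of("apple", "banana", "cherry", "apple")) {
            System.out.println(key + " -> reducer " + getPartition(key, reducers));
        }
    }
}
```

Masking the sign bit, rather than calling Math.abs, matters because Math.abs(Integer.MIN_VALUE) is still negative. Every key therefore lands in a valid partition, and equal keys always reach the same reducer.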

_______ is the most popular high-level Java API in the Hadoop ecosystem.

  • Scalding
     

  •  HCatalog
     

  • Cascalog
     

  •  Cascading

Point out the wrong statement.

  • Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering
     

  • Amazon Web Services Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering
     

  •  Scalding is a Scala API on top of Cascading that removes most Java boilerplate
     

  • All of the mentioned

______ is a framework for performing remote procedure calls and data serialization.

  • Drill
     

  •  BigTop
     

  •  Avro
     

  •  Chukwa

The output of the _______ is not sorted in the MapReduce framework for Hadoop.

  • Cascader
     

  •  Scalding
     

  • None of the mentioned

_______ jobs are optimized for scalability but not latency.

  • MapReduce
     

  •  Drill
     

  •  Oozie
     

  • Hive

The number of maps is usually driven by the total size of ____________

  •  outputs
     

  • inputs
     

  •  tasks
     

  •  None of the mentioned
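
As a rough illustration of why the number of maps follows the size of the inputs, the sketch below counts one map task per input split. It assumes the split-size rule used by FileInputFormat, splitSize = max(minSize, min(maxSize, blockSize)), and deliberately ignores refinements such as the slop allowance for the last split, zero-length files and non-splittable files. MapCountSketch and its methods are hypothetical names.

```java
// Simplified sketch: one map task per input split, with the split size chosen
// by the rule max(minSize, min(maxSize, blockSize)).
// Assumption: ignores the last-split slop allowance, zero-length files and
// non-splittable (e.g. some compressed) files.
class MapCountSketch {

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Each file is split on its own, so the per-file split counts are summed.
    static long estimateMapTasks(long[] fileSizes, long blockSize, long minSize, long maxSize) {
        long size = splitSize(blockSize, minSize, maxSize);
        long maps = 0;
        for (long fileSize : fileSizes) {
            // Ceiling division written so it cannot overflow.
            maps += fileSize / size + (fileSize % size == 0 ? 0 : 1);
        }
        return maps;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = {10_000 * mb}; // a single 10,000 MB input file
        System.out.println(estimateMapTasks(files, 128 * mb, 1L, Long.MAX_VALUE));
    }
}
```

With a 128 MB block size, the single 10,000 MB file in main yields 79 splits, so roughly 79 map tasks. Raising minSize above the block size produces fewer, larger splits and therefore fewer maps.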

Point out the correct statement.

  • MapReduce tries to place the data and the compute as close as possible
     

  • Map Task in MapReduce is performed using the Mapper() function
     

  •  Reduce Task in MapReduce is performed using the Map() function
     

  •  All of the mentioned

Input to the _______ is the sorted output of the mappers.

  • Reducer
     

  •  Mapper
     

  • Shuffle
     

  •  All of the mentioned
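
The phrase "sorted output of the mappers" can be pictured with a JDK-only sketch: pairs emitted by several mappers are merged into groups ordered by key, which is the shape a reducer consumes. It assumes (word, 1) pairs and uses an in-memory TreeMap in place of the framework's network fetch and on-disk merge; ShuffleSortSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// JDK-only sketch of the reducer-side view after shuffle and sort.
// Assumption: each mapper's output is an in-memory list of (word, count) pairs.
class ShuffleSortSketch {

    // Merge the outputs of all mappers into groups keyed in sorted order:
    // one key together with every value emitted for it.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> outputs = List.of(
            List.of(Map.entry("pig", 1), Map.entry("hive", 1)),   // mapper 1
            List.of(Map.entry("hive", 1), Map.entry("avro", 1))); // mapper 2
        // The reducer sees the keys in sorted order.
        shuffleAndSort(outputs).forEach((k, v) -> System.out.println(k + " " + v));
    }
}
```

Running main prints avro [1], then hive [1, 1], then pig [1]: the keys arrive sorted, and each key's values arrive together even though they came from different mappers.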

Point out the correct statement.

  • Hadoop is an ideal environment for extracting and transforming small volumes of data
     

  • Hadoop stores data in HDFS and supports data compression/decompression
     

  • The Giraph framework is less useful than a MapReduce job for solving graph and machine learning problems
     

  • None of the mentioned

The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________

  • SQL
     

  •  JSON
     

  •  XML
     

  •  All of the mentioned

_______ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.

  • Pig Latin
     

  • Oozie
     

  •  Pig
     

  • Hive

Which of the following phases occur simultaneously?

  • Shuffle and Sort
     

  •  Reduce and Sort
     

  •  Shuffle and Map
     

  •  All of the mentioned

Which of the following software genres does Hadoop belong to?

  • Distributed file system
     

  •  JAX-RS
     

  •  Java Message Service
     

  •  Relational Database Management System

A ________ node acts as the slave and is responsible for executing a task assigned to it by the JobTracker.

  • MapReduce
     

  • Mapper
     

  •  TaskTracker
     

  •  JobTracker

__________ is a general-purpose computing model and runtime system for distributed data analytics.

  •  MapReduce
     

  •  Drill
     

  •  Oozie
     

  • None of the mentioned

The Hadoop ecosystem includes the HBase database, the Apache Mahout ________ system, and matrix operations.

  • Machine learning
     

  • Pattern recognition
     

  •  Statistical classification
     

  •  Artificial intelligence

As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________

  • Improved data storage and information retrieval
     

  •  Improved extract, transform and load features for data integration
     

  •  Improved data warehousing functionality
     

  •  Improved security, workload management, and SQL support

Which of the following platforms does Hadoop run on?

  • Bare metal
     

  •  Debian
     

  • Cross-platform
     

  •  Unix-like

Point out the wrong statement.

  • A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
     

  • The MapReduce framework operates exclusively on <key, value> pairs
     

  •  Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
     

  • None of the mentioned
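
The statements above describe the whole model: independent chunks, map tasks running in parallel, and everything expressed as <key, value> pairs. A word count makes this concrete. The sketch is JDK-only; WordMapper and WordReducer are hypothetical interfaces that mimic, but are not, Hadoop's Mapper and Reducer, and a parallel stream stands in for map tasks spread across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// JDK-only sketch of the MapReduce model applied to word count.
// WordMapper and WordReducer are hypothetical stand-ins, not Hadoop classes.
class WordCountSketch {

    interface WordMapper { List<Map.Entry<String, Integer>> map(String chunk); }
    interface WordReducer { int reduce(String key, List<Integer> values); }

    static Map<String, Integer> run(List<String> chunks, WordMapper mapper, WordReducer reducer) {
        // Map phase: chunks are independent, so they can be processed in parallel.
        List<Map.Entry<String, Integer>> pairs = chunks.parallelStream()
            .flatMap(chunk -> mapper.map(chunk).stream())
            .collect(Collectors.toList());

        // Group all values by key; the TreeMap keeps keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }

        // Reduce phase: one call per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((key, values) -> result.put(key, reducer.reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        // Emit (word, 1) for every word in the chunk.
        WordMapper mapper = chunk -> {
            List<Map.Entry<String, Integer>> out = new ArrayList<>();
            for (String word : chunk.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) out.add(Map.entry(word, 1));
            }
            return out;
        };
        // Sum the counts for one word.
        WordReducer reducer = (key, values) -> values.stream().mapToInt(Integer::intValue).sum();
        System.out.println(run(List.of("Hadoop runs MapReduce", "MapReduce uses keys"), mapper, reducer));
    }
}
```

For the two chunks in main, the program prints {hadoop=1, keys=1, mapreduce=2, runs=1, uses=1}. Because the map function only sees its own chunk, the chunks can be processed in any order or in parallel without changing the result.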

What license is Hadoop distributed under?

  • Apache License 2.0
     

  • Shareware
     

  •  Commercial

________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.

  •  Scalding
     

  •  HCatalog
     

  • Cascalog
     

  •  All of the mentioned

Point out the correct statement.

  • Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data
     

  •  Hive is a relational database with SQL support
     

  •  Pig is a relational database with SQL support
     

  • All of the mentioned