The number of maps is usually driven by the total size of ____________

  •  outputs

  • inputs

  •  tasks

  •  None of the mentioned
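
The sizing arithmetic behind this question can be sketched roughly as follows, assuming one map task per input block. The function name `estimate_map_count` and the 128 MB default block size are illustrative assumptions, not Hadoop API names; a real job's map count also depends on the InputFormat and its split settings.

```python
def estimate_map_count(total_input_bytes: int,
                       block_bytes: int = 128 * 1024**2) -> int:
    """Estimate map tasks as one per input block (an assumption of this sketch)."""
    if total_input_bytes < 0 or block_bytes <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    # Integer ceiling division: a partially filled last block still needs a map.
    return (total_input_bytes + block_bytes - 1) // block_bytes


# 10 TiB of input split into 128 MiB blocks.
print(estimate_map_count(10 * 1024**4))
```

Under these assumptions, the estimate grows with the amount of data read, not with the amount written.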

A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.

  • MapReduce

  • Mapper

  •  TaskTracker

  •  JobTracker

_________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.

  • MapReduce

  •  Mahout

  •  Oozie

  •  All of the mentioned

What was Hadoop written in?

  • Java (software platform)

  •  Perl

  • Java (programming language)

  •  Lua (programming language)

What license is Hadoop distributed under?

  • Apache License 2.0

  • Shareware

  •  Commercial

________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.

  •  Scalding

  •  HCatalog

  • Cascalog

  •  All of the mentioned

________ is a generalization of the facility provided by the MapReduce framework to collect data output by the Mapper or the Reducer.

  • Partitioner

  • OutputCollector

  •  Reporter

  •  All of the mentioned

Point out the correct statement.

  • Applications can use the Reporter to report progress

  •  The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job

  •  The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format

  •  All of the mentioned
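
The (key-len, key, value-len, value) layout named in one of the options above can be illustrated with a small length-prefixed encoder. This is a minimal sketch only: the helper names `encode_record` and `decode_records` are hypothetical, and the fixed 4-byte big-endian length fields are an assumption of this example, not Hadoop's actual on-disk encoding.

```python
import struct
from typing import Iterator, Tuple


def encode_record(key: bytes, value: bytes) -> bytes:
    """Serialize one record as key-len, key, value-len, value."""
    # ">I" is a 4-byte unsigned big-endian length (an assumption of this sketch).
    return (struct.pack(">I", len(key)) + key
            + struct.pack(">I", len(value)) + value)


def decode_records(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Walk a buffer of concatenated records and yield (key, value) pairs."""
    offset = 0
    while offset < len(buf):
        (key_len,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        key = buf[offset:offset + key_len]
        offset += key_len
        (value_len,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        value = buf[offset:offset + value_len]
        offset += value_len
        yield key, value
```

Length prefixes let a reader skip from record to record without scanning for delimiters, which is why formats of this shape suit sorted intermediate data.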

Mapper and Reducer implementations can use the ________ to report progress or just indicate that they are alive.

  • Partitioner

  • OutputCollector

  •  Reporter

  • All of the mentioned

______ is a framework for performing remote procedure calls and data serialization.

  • Drill

  •  BigTop

  •  Avro

  •  Chukwa

_________ has the world’s largest Hadoop cluster.

  • Apple

  • Datamatics

  •  Facebook

  • None of the mentioned

Point out the correct statement.

  • Hadoop is an ideal environment for extracting and transforming small volumes of data

  • Hadoop stores data in HDFS and supports data compression/decompression

  • The Giraph framework is less useful than a MapReduce job for solving graph and machine learning problems

  • None of the mentioned

Hive also supports custom extensions written in ____________

  • C#

  •  Java

  • C

  •  C++

Point out the wrong statement.

  • A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner

  • The MapReduce framework operates exclusively on <key, value> pairs

  •  Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods

  • None of the mentioned

_________ is the default Partitioner for partitioning key space.

  • HashPar

  •  Partitioner

  •  HashPartitioner

  •  None of the mentioned
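
Hash-based partitioning of the key space can be sketched as below. The partition formula mirrors the commonly documented `(hashCode & Integer.MAX_VALUE) % numReduceTasks` rule; the function names are hypothetical, and hashing the key the way Java's `String.hashCode` does is an assumption of this example (real Hadoop key types define their own `hashCode`). The sketch also assumes characters in the Basic Multilingual Plane.

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode with 32-bit signed overflow."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # keep only the low 32 bits
    # Reinterpret the unsigned 32-bit value as a signed int.
    return h - 0x100000000 if h >= 0x80000000 else h


def hash_partition(key: str, num_reduce_tasks: int) -> int:
    """Pick a reducer index for a key; equal keys always get the same index."""
    # Masking with 0x7FFFFFFF clears the sign bit so the modulo is non-negative.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks
```

Because the partition depends only on the key and the number of reduce tasks, every record with the same key reaches the same reducer regardless of which mapper emitted it.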

Which of the following genres does Hadoop belong to?

  • Distributed file system

  •  JAX-RS

  •  Java Message Service

  •  Relational Database Management System

_________ maps input key/value pairs to a set of intermediate key/value pairs.

  • Mapper

  •  Reducer

  •  Both Mapper and Reducer

  •  None of the mentioned
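
The transformation described in this question, from input key/value pairs to intermediate key/value pairs, can be illustrated with a word-count style map function. This is a plain-Python sketch, not the Hadoop Java API; the name `word_count_map` and the choice of a byte offset as the input key are assumptions of the example.

```python
from typing import Iterator, Tuple


def word_count_map(offset: int, line: str) -> Iterator[Tuple[str, int]]:
    """Turn one (offset, line) input pair into zero or more (word, 1) pairs."""
    # The input key (offset) is ignored here; only the value (line) matters.
    for word in line.split():
        yield word.lower(), 1
```

One input pair may produce many intermediate pairs, or none at all for an empty line, and the intermediate key and value types need not match the input types.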

Point out the correct statement.

  • Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data

  •  Hive is a relational database with SQL support

  •  Pig is a relational database with SQL support

  • All of the mentioned

The __________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.

  • Maptask

  •  Mapper

  •  Task execution

  • All of the mentioned

Point out the wrong statement.

  • Hadoop's processing capabilities are huge, and its real advantage lies in the ability to process terabytes and petabytes of data

  • Hadoop uses a programming model called “MapReduce”; all programs should conform to this model in order to work on the Hadoop platform

  • The programming model, MapReduce, used by Hadoop is difficult to write and test

  •  All of the mentioned

The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations.

  • Machine learning

  • Pattern recognition

  •  Statistical classification

  •  Artificial intelligence

Hadoop achieves reliability by replicating the data across multiple hosts and hence does not require ________ storage on hosts.

  • RAID

  •  Standard RAID levels

  •  ZFS

  •  Operating system

_______ is a utility which allows users to create and run jobs with any executables as the mapper and/or the reducer.

  • Hadoop Strdata

  •  Hadoop Streaming

  •  Hadoop Stream

  •  None of the mentioned
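
Running an arbitrary executable as the mapper usually means a program that reads input lines on standard input and writes tab-separated key/value lines to standard output. The sketch below follows that convention; the names `stream_map` and `main` are hypothetical, and the word-count logic is only an example workload.

```python
import sys
from typing import Iterable, Iterator


def stream_map(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word found in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Wire the mapper to standard input and output, one record per line."""
    for record in stream_map(stdin):
        stdout.write(record + "\n")
```

Saved as a script that ends with a call to `main()`, a program of this shape could be supplied as the mapper executable of such a job; the script itself has no dependency on Java or on Hadoop libraries.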

Point out the wrong statement.

  • Reducer has 2 primary phases

  •  Increasing the number of reduces increases the framework overhead, but increases load balancing and lowers the cost of failures

  •  It is legal to set the number of reduce-tasks to zero if no reduction is desired

  • The framework groups Reducer inputs by keys (since different mappers may have output the same key) in the sort stage
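
The grouping of reducer inputs by key can be simulated locally as shown below. This is a single-process sketch under stated assumptions, not how the framework is implemented: `group_by_key` and `sum_reduce` are hypothetical names, and an in-memory sort stands in for the distributed merge that a real cluster performs.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    """Sort intermediate pairs by key, then collect the values for each key."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort keeps value order
    return [(key, [value for _, value in group])
            for key, group in groupby(ordered, key=itemgetter(0))]


def sum_reduce(key: str, values: List[int]) -> Tuple[str, int]:
    """Example reduce step: add up all values seen for one key."""
    return key, sum(values)
```

Sorting first is what makes the grouping work: `groupby` only merges adjacent equal keys, so pairs emitted for the same key by different mappers must be brought next to each other before the reduce step sees them.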

Which of the following phases occur simultaneously?

  • Shuffle and Sort

  •  Reduce and Sort

  •  Shuffle and Map

  •  All of the mentioned

Which of the following platforms does Hadoop run on?

  • Bare metal

  •  Debian

  • Cross-platform

  •  Unix-like

The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________

  • SQL

  •  JSON

  •  XML

  •  All of the mentioned

_______ jobs are optimized for scalability but not latency.

  • MapReduce

  •  Drill

  •  Oozie

  • Hive

As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________

  • Improved data storage and information retrieval

  •  Improved extract, transform and load features for data integration

  •  Improved data warehousing functionality

  •  Improved security, workload management, and SQL support

IBM and ________ have announced a major initiative to use Hadoop to support university courses in distributed computer programming.

  • Google Latitude

  • Android (operating system)

  • Google Variations

  • Google