HADOOP QUIZ DESCRIPTION
Total Questions: 30
Max Time: 15:00

The __________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.
  Maptask
  Mapper
  Task execution
  All of the mentioned

Point out the wrong statement.
  Hadoop processing capabilities are huge, and its real advantage lies in the ability to process terabytes and petabytes of data
  Hadoop uses a programming model called “MapReduce”; all programs should conform to this model in order to work on the Hadoop platform
  The programming model, MapReduce, used by Hadoop is difficult to write and test
  All of the mentioned

What was Hadoop written in?
  Java (software platform)
  Perl
  Java (programming language)
  Lua (programming language)

Mapper implementations are passed the JobConf for the job via the ________ method.
  JobConfigure.configure
  JobConfigurable.configure
  JobConfigurable.configurable
  None of the mentioned

Point out the correct statement.
  Applications can use the Reporter to report progress
  The Hadoop MapReduce framework spawns one map task for each InputSplit generated by the InputFormat for the job
  The intermediate, sorted outputs are always stored in a simple (key-len, key, value-len, value) format
  All of the mentioned

_________ can best be described as a programming model used to develop Hadoop-based applications that can process massive amounts of data.
  MapReduce
  Mahout
  Oozie
  All of the mentioned

________ is the primary interface for a user to describe a MapReduce job to the Hadoop framework for execution.
  Map Parameters
  JobConf
  MemoryConf
  None of the mentioned

_________ is the default Partitioner for partitioning key space.
  HashPar
  Partitioner
  HashPartitioner
  None of the mentioned

_______ is the most popular high-level Java API in the Hadoop ecosystem.
  Scalding
  HCatalog
  Cascalog
  Cascading

Point out the wrong statement.
  Elastic MapReduce (EMR) is Facebook’s packaged Hadoop offering
  Amazon Web Services Elastic MapReduce (EMR) is Amazon’s packaged Hadoop offering
  Scalding is a Scala API on top of Cascading that removes most Java boilerplate
  All of the mentioned

______ is a framework for performing remote procedure calls and data serialization.
  Drill
  BigTop
  Avro
  Chukwa

The output of the _______ is not sorted in the MapReduce framework for Hadoop.
  Cascader
  Scalding
  None of the mentioned

_______ jobs are optimized for scalability but not latency.
  MapReduce
  Drill
  Oozie
  Hive

The number of maps is usually driven by the total size of ____________
  outputs
  inputs
  tasks
  None of the mentioned

Point out the correct statement.
  MapReduce tries to place the data and the compute as close as possible
  Map Task in MapReduce is performed using the Mapper() function
  Reduce Task in MapReduce is performed using the Map() function
  All of the mentioned

Input to the _______ is the sorted output of the mappers.
  Reducer
  Mapper
  Shuffle
  All of the mentioned

Point out the correct statement.
  Hadoop is an ideal environment for extracting and transforming small volumes of data
  Hadoop stores data in HDFS and supports data compression/decompression
  The Giraph framework is less useful than a MapReduce job to solve graph and machine learning problems
  None of the mentioned

The Pig Latin scripting language is not only a higher-level data flow language but also has operators similar to ____________
  SQL
  JSON
  XML
  All of the mentioned

_______ is a platform for constructing data flows for extract, transform, and load (ETL) processing and analysis of large datasets.
  Pig Latin
  Oozie
  Pig
  Hive

Which of the following phases occur simultaneously?
  Shuffle and Sort
  Reduce and Sort
  Shuffle and Map
  All of the mentioned

Which of the following genres does Hadoop produce?
  Distributed file system
  JAX-RS
  Java Message Service
  Relational Database Management System

A ________ node acts as the Slave and is responsible for executing a Task assigned to it by the JobTracker.
  MapReduce
  Mapper
  TaskTracker
  JobTracker

__________ is a general-purpose computing model and runtime system for distributed data analytics.
  MapReduce
  Drill
  Oozie
  None of the mentioned

The Hadoop list includes the HBase database, the Apache Mahout ________ system, and matrix operations.
  Machine learning
  Pattern recognition
  Statistical classification
  Artificial intelligence

As companies move past the experimental phase with Hadoop, many cite the need for additional capabilities, including _______________
  Improved data storage and information retrieval
  Improved extract, transform and load features for data integration
  Improved data warehousing functionality
  Improved security, workload management, and SQL support

Which of the following platforms does Hadoop run on?
  Bare metal
  Debian
  Cross-platform
  Unix-like

Point out the wrong statement.
  A MapReduce job usually splits the input dataset into independent chunks which are processed by the map tasks in a completely parallel manner
  The MapReduce framework operates exclusively on <key, value> pairs
  Applications typically implement the Mapper and Reducer interfaces to provide the map and reduce methods
  None of the mentioned

What license is Hadoop distributed under?
  Apache License 2.0
  Shareware
  Commercial

________ hides the limitations of Java behind a powerful and concise Clojure API for Cascading.
  Scalding
  HCatalog
  Cascalog
  All of the mentioned

Point out the correct statement.
  Hive is not a relational database, but a query engine that supports the parts of SQL specific to querying data
  Hive is a relational database with SQL support
  Pig is a relational database with SQL support
  All of the mentioned
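Several of these questions turn on the shape of a MapReduce job: mappers turn input splits into key/value pairs, a partitioner assigns each key to a reducer, and each reducer receives its keys in sorted order after the shuffle. The plain-Java sketch below simulates that flow in memory for a word count. It does not use the Hadoop API at all; the class `MiniMapReduce`, its methods, and the choice of treating each input line as one split are illustrative assumptions. Only the partition formula mirrors what Hadoop's `HashPartitioner` computes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of a MapReduce word count (no Hadoop classes used).
public class MiniMapReduce {

    // Same formula as Hadoop's HashPartitioner: clear the sign bit, then take the modulus.
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Map phase: each input split (here, assumed to be one line) yields (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Runs map, partition, shuffle/sort and reduce; returns one sorted result map per reducer.
    static List<TreeMap<String, Integer>> run(List<String> splits, int numReduceTasks) {
        // Shuffle: one TreeMap per reducer, so each reducer sees its keys in sorted order.
        List<TreeMap<String, List<Integer>>> shuffled = new ArrayList<>();
        for (int r = 0; r < numReduceTasks; r++) {
            shuffled.add(new TreeMap<>());
        }
        for (String split : splits) {
            for (Map.Entry<String, Integer> kv : map(split)) {
                shuffled.get(partition(kv.getKey(), numReduceTasks))
                        .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        // Reduce phase: sum the grouped values for each key.
        List<TreeMap<String, Integer>> results = new ArrayList<>();
        for (TreeMap<String, List<Integer>> group : shuffled) {
            TreeMap<String, Integer> reduced = new TreeMap<>();
            for (Map.Entry<String, List<Integer>> e : group.entrySet()) {
                int sum = 0;
                for (int v : e.getValue()) {
                    sum += v;
                }
                reduced.put(e.getKey(), sum);
            }
            results.add(reduced);
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> splits = List.of("hadoop map reduce", "map reduce hive", "hadoop pig");
        List<TreeMap<String, Integer>> results = run(splits, 2);
        for (int r = 0; r < results.size(); r++) {
            System.out.println("reducer " + r + ": " + results.get(r));
        }
    }
}
```

The `TreeMap` in the shuffle step stands in for the sort that a real job gets from the framework; replacing it with a `HashMap` would still give correct counts but would drop the sorted-key guarantee that reducer input normally carries.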