APACHE SPARK QUIZ

DESCRIPTION
Total Questions: 30
Max Time: 15:00

1. Which of the following is a tool of the Machine Learning Library (MLlib)?
    Persistence
    Utilities such as linear algebra and statistics
    Pipelines
    All of the above

2. When does Apache Spark evaluate an RDD?
    Upon action
    Upon transformation
    On both transformation and action

3. In how many ways can an RDD be created?
    4
    3
    2
    1

4. Can you combine Apache Spark libraries, for example MLlib, GraphX, SQL and DataFrames, in the same application?
    Yes
    No

5. Can we add or set up a new streaming computation after the SparkContext starts?
    Yes
    No

6. In which year was Apache Spark made open source?
    2010
    2011
    2008
    2009

7. Which of the following algorithms is not present in MLlib?
    Streaming Linear Regression
    Streaming KMeans
    Tanimoto distance
    None of the above

8. Apache Spark has APIs in:
    Java
    Scala
    Python
    All of the above

9. Which of the following is not a feature of Spark?
    Supports in-memory computation
    Fault tolerance
    It is cost efficient
    Compatible with other file storage systems

10. In Spark Streaming, which sources can the data come from?
    Kafka
    Flume
    Kinesis
    All of the above

11. Which of the following is true for RDD?
    RDD is a programming paradigm
    RDD in Apache Spark is an immutable collection of objects
    It is a database
    None of the above

12. RDDs are fault-tolerant and immutable.
    True
    False

13. Which of the following is not an output operation on DStream?
    saveAsTextFiles
    foreachRDD
    saveAsHadoopFiles
    reduceByKeyAndWindow

14. Which of the following is true about DataFrame?
    DataFrames provide a more user-friendly API than RDDs
    The DataFrame API provides compile-time type safety
    Both of the above

15. Which of the following is false for Apache Spark?
    It provides high-level APIs in Java, Python, R and Scala
    It can be integrated with Hadoop and can process existing Hadoop HDFS data
    Spark is an open-source framework written in Java
    Spark can be up to 100 times faster than Hadoop MapReduce

16. Which cluster managers does Spark support?
    Standalone Cluster Manager
    Mesos
    YARN
    All of the above

17. In addition to stream processing jobs, what other functionality does Spark provide?
    Machine learning
    Graph processing
    Batch processing
    All of the above

18. Does SparkR make use of MLlib in any aspect?
    Yes
    No

19. Which of the following is not an action?
    collect()
    take(n)
    top()
    map()

20. How much faster can Apache Spark potentially run batch-processing programs in memory than MapReduce can?
    10 times faster
    20 times faster
    100 times faster
    200 times faster

21. How many tasks does Spark run on each partition?
    Any number of tasks
    One
    More than one but fewer than five
    None of the above

22. For a regression problem, which algorithm is not a solution?
    Logistic Regression
    Ridge Regression
    Decision Trees
    Gradient-Boosted Trees

23. Which of the following uses Spark Core's fast scheduling capability to perform streaming analytics?
    RDD
    GraphX
    Spark Streaming
    SparkR

24. Which of the following is true for the Spark Shell?
    It helps Spark applications run easily from the system command line
    It runs and tests application code interactively
    It allows reading from many types of data sources
    All of the above

25. Which of the following is not a component of the Spark ecosystem?
    Sqoop
    GraphX
    MLlib
    BlinkDB

26. In which Spark release was Dataset introduced?
    Spark 1.6
    Spark 1.4.0
    Spark 2.1.0
    Spark 1.1

27. Is MLlib deprecated?
    Yes
    No

28. The basic abstraction of Spark Streaming is:
    DStream
    RDD
    Shared Variable
    None of the above

29. You can connect an R program to a Spark cluster from:
    RStudio
    R Shell
    Rscript
    All of the above

30. Which cluster managers does Spark support?
    Standalone Cluster Manager
    Mesos
    YARN
    All of the above