APACHE SPARK QUIZ

DESCRIPTION
Total Questions: 30
Max Time: 15:00

Which of the following is true for Spark Shell?
- It helps Spark applications run easily from the system command line
- It runs and tests application code interactively
- It allows reading from many types of data sources
- All of the above

What is the default storage level of cache()?
- MEMORY_ONLY
- MEMORY_AND_DISK
- DISK_ONLY
- MEMORY_ONLY_SER

What are the features of a Spark RDD?
- In-memory computation
- Lazy evaluation
- Fault tolerance
- All of the above

In addition to stream processing jobs, what other functionality does Spark provide?
- Machine learning
- Graph processing
- Batch processing
- All of the above

In Spark Streaming, which sources can the data come from?
- Kafka
- Flume
- Kinesis
- All of the above

In which Spark release was Dataset introduced?
- Spark 1.6
- Spark 1.4.0
- Spark 2.1.0
- Spark 1.1

Which of the following is true for SparkR?
- It allows data scientists to analyze large datasets and run jobs interactively
- It is the kernel of Spark
- It is the scalable machine learning library which delivers efficiencies
- It enables users to run SQL / HQL queries on top of Spark

When does Apache Spark evaluate an RDD?
- Upon action
- Upon transformation
- On both transformation and action

Which of the following is not a feature of Spark?
- Supports in-memory computation
- Fault tolerance
- It is cost efficient
- Compatible with other file storage systems

Is MLlib deprecated?
- Yes
- No

In which year was Apache Spark made open source?
- 2010
- 2011
- 2008
- 2009

Apache Spark has APIs in
- Java
- Scala
- Python
- All of the above

Which is not a component on top of Spark Core?
- Spark RDD
- Spark Streaming
- MLlib
- None of the above

Internally, a DStream is a
- Continuous stream of RDDs
- Continuous stream of DataFrames
- Continuous stream of Datasets
- None of the above

Which of the following is true for Spark MLlib?
- It provides an execution platform for all Spark applications
- It is the scalable machine learning library which delivers efficiencies
- It enables powerful interactive and analytical applications across live streaming data
- All of the above

Which of the following is not an action?
- collect()
- take(n)
- top()
- map()

Which of the following is not true for Hadoop and Spark?
- Both are data processing platforms
- Both are cluster computing environments
- Both have their own file system
- Both use open source APIs to link between different tools

Which of the following algorithms is not present in MLlib?
- Streaming Linear Regression
- Streaming KMeans
- Tanimoto distance
- None of the above

Which of the following is true about DataFrame?
- DataFrames provide a more user-friendly API than RDDs
- The DataFrame API provides compile-time type safety
- Both of the above

Which of the following is true for RDD?
- RDD is a programming paradigm
- An RDD in Apache Spark is an immutable collection of objects
- It is a database
- None of the above

Which of the following is true for RDD?
- We can operate on Spark RDDs in parallel with a low-level API
- RDDs are similar to a table in a relational database
- It allows processing of a large amount of structured data
- It has a built-in optimization engine

Which of the following is not an output operation on a DStream?
- saveAsTextFiles
- foreachRDD
- saveAsHadoopFiles
- reduceByKeyAndWindow

Which cluster managers does Spark support?
- Standalone Cluster Manager
- Mesos
- YARN
- All of the above

Can we add or set up new string computation after SparkContext starts?
- Yes
- No

Which of the following is true for Spark Core?
- It is the kernel of Spark
- It enables users to run SQL / HQL queries on top of Spark
- It is the scalable machine learning library which delivers efficiencies
- It drastically improves the performance of iterative algorithms

The basic abstraction of Spark Streaming is
- DStream
- RDD
- Shared Variable
- None of the above

You can connect an R program to a Spark cluster from
- RStudio
- R Shell
- Rscript
- All of the above

Which of the following is a tool of the Machine Learning Library?
- Persistence
- Utilities such as linear algebra and statistics
- Pipelines
- All of the above

Does SparkR make use of MLlib in any aspect?
- Yes
- No

Which of the following is false for Apache Spark?
- It provides high-level APIs in Java, Python, R, and Scala
- It can be integrated with Hadoop and can process existing Hadoop HDFS data
- Spark is an open source framework which is written in Java
- Spark is 100 times faster than Big Data Hadoop