APACHE SPARK QUIZ
Total Questions: 30 | Max Time: 15:00

Q1. Which is not a component on top of Spark Core?
a) Spark RDD
b) Spark Streaming
c) MLlib
d) None of the above

Q2. What are the features of Spark RDD?
a) In-memory computation
b) Lazy evaluation
c) Fault tolerance
d) All of the above

Q3. How many tasks does Spark run on each partition?
a) Any number of tasks
b) One
c) More than one, less than five
d) None of the above

Q4. Apache Spark has APIs in
a) Java
b) Scala
c) Python
d) All of the above

Q5. RDDs are fault-tolerant and immutable.
a) True
b) False

Q6. Which of the following is not true for Hadoop and Spark?
a) Both are data processing platforms
b) Both are cluster computing environments
c) Both have their own file system
d) Both use open source APIs to link between different tools

Q7. Which of the following is true for Spark Core?
a) It is the kernel of Spark
b) It enables users to run SQL/HQL queries on top of Spark
c) It is the scalable machine learning library which delivers efficiencies
d) It improves the performance of iterative algorithms drastically

Q8. The default storage level of cache() is
a) MEMORY_ONLY
b) MEMORY_AND_DISK
c) DISK_ONLY
d) MEMORY_ONLY_SER

Q9. Which of the following is true about DataFrames?
a) DataFrames provide a more user-friendly API than RDDs
b) The DataFrame API has provision for compile-time type safety
c) Both of the above

Q10. How much faster can Apache Spark potentially run batch-processing programs in memory than MapReduce can?
a) 10 times faster
b) 20 times faster
c) 100 times faster
d) 200 times faster

Q11. Which of the following is not a transformation?
a) flatMap
b) map
c) reduce
d) filter

Q12. Which cluster managers does Spark support?
a) Standalone Cluster Manager
b) Mesos
c) YARN
d) All of the above

Q13. Is MLlib deprecated?
a) Yes
b) No

Q14. When does Apache Spark evaluate an RDD?
a) Upon action
b) Upon transformation
c) On both transformation and action

Q15. The basic abstraction of Spark Streaming is
a) DStream
b) RDD
c) Shared Variable
d) None of the above

Q16. Which of the following is a tool of the Machine Learning Library (MLlib)?
a) Persistence
b) Utilities like linear algebra and statistics
c) Pipelines
d) All of the above

Q17. Which is the abstraction of Apache Spark?
a) Shared Variable
b) RDD
c) Both of the above
d) None of the above

Q18. Can we add or set up new streaming computation after the SparkContext starts?
a) Yes
b) No

Q19. A DStream internally is
a) A continuous stream of RDDs
b) A continuous stream of DataFrames
c) A continuous stream of Datasets
d) None of the above

Q20. Which of the following is true for an RDD?
a) We can operate on Spark RDDs in parallel with a low-level API
b) RDDs are similar to a table in a relational database
c) It allows processing of a large amount of structured data
d) It has a built-in optimization engine

Q21. Spark is developed in which language?
a) Java
b) Scala
c) Python
d) R

Q22. Which of the following is not an action?
a) collect()
b) take(n)
c) top()
d) map

Q23. Can we edit the data of an RDD, for example through case conversion?
a) Yes
b) No

Q24. In Spark Streaming, the data can come from which sources?
a) Kafka
b) Flume
c) Kinesis
d) All of the above

Q25. Which of the following provides Spark Core's fast scheduling capability to perform streaming analytics?
a) RDD
b) GraphX
c) Spark Streaming
d) SparkR

Q26. Which cluster managers does Spark support?
a) Standalone Cluster Manager
b) Mesos
c) YARN
d) All of the above

Q27. Which of the following is not an output operation on a DStream?
a) saveAsTextFiles
b) foreachRDD
c) saveAsHadoopFiles
d) reduceByKeyAndWindow

Q28. In how many ways can an RDD be created?
a) 4
b) 3
c) 2
d) 1

Q29. In addition to stream processing jobs, what other functionality does Spark provide?
a) Machine learning
b) Graph processing
c) Batch processing
d) All of the above
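Several of the questions above concern lazy evaluation, the difference between transformations and actions, the default storage level of cache(), and RDD immutability. The following is a minimal, illustrative Scala sketch of these points. It is not part of the quiz; the object name, sample data, application name, and local[*] master setting are placeholders chosen for the example.

import org.apache.spark.sql.SparkSession

// Illustrative sketch only: RDD creation, lazy transformations,
// an action, and the default storage level used by cache().
object RddQuizSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-quiz-sketch")   // placeholder application name
      .master("local[*]")           // local mode for illustration
      .getOrCreate()
    val sc = spark.sparkContext

    // One way to create an RDD: parallelizing an existing collection.
    val numbers = sc.parallelize(1 to 10)

    // map and filter are transformations: they are recorded lazily,
    // so nothing is computed at this point.
    val evensDoubled = numbers.filter(_ % 2 == 0).map(_ * 2)

    // cache() stores the RDD at the default level MEMORY_ONLY;
    // persist() with an explicit StorageLevel would choose another level.
    evensDoubled.cache()

    // collect() is an action: only now does Spark evaluate the lineage.
    println(evensDoubled.collect().mkString(", "))

    // RDDs are immutable: "editing" data (e.g. case conversion) really
    // means deriving a new RDD from the old one.
    val words = sc.parallelize(Seq("spark", "rdd"))
    val upper = words.map(_.toUpperCase)  // new RDD; `words` is unchanged
    println(upper.collect().mkString(", "))

    spark.stop()
  }
}

cache() is simply shorthand for persist() at the MEMORY_ONLY level, which is the behaviour the storage-level question above is probing; the same expressions can also be typed directly into the interactive Spark shell, where sc is already provided.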
Q30. Which of the following is true for the Spark Shell?
a) It helps Spark applications run easily on the command line of the system
b) It runs/tests application code interactively
c) It allows reading from many types of data sources
d) All of the above
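The streaming questions (the DStream abstraction, its internal representation as a sequence of RDDs, output operations, and the fact that new computation cannot be added once the context has started) can be illustrated with a similar sketch. This is a minimal, illustrative Scala example using the DStream API of Spark Streaming; the socket source, the localhost host name, port 9999, and the batch interval are placeholders, and real deployments reading from Kafka, Flume, or Kinesis would use the corresponding connector libraries instead.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Illustrative sketch only: a DStream word count over a socket source.
object DStreamQuizSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("dstream-quiz-sketch")  // placeholder application name
      .setMaster("local[2]")              // at least 2 threads: receiver + processing
    val ssc = new StreamingContext(conf, Seconds(5))  // 5-second batch interval

    // A DStream is internally a continuous sequence of RDDs, one per batch.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Word count over each batch; reduceByKeyAndWindow would be a
    // (windowed) transformation, not an output operation.
    val counts = lines.flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // print(), foreachRDD, saveAsTextFiles and saveAsHadoopFiles are output
    // operations; here print() shows the first results of every batch.
    counts.print()

    // All streaming computation must be defined before start(); once the
    // context is running, no new stream computation can be added.
    ssc.start()
    ssc.awaitTermination()
  }
}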