APACHE SPARK QUIZ DESCRIPTION

Which of the following is true for Spark Shell?

  •  It helps Spark applications run easily from the system command line
     

  •   It runs/tests application code interactively
     

  •    It allows reading from many types of data sources
     

  •      All of the above

What is the default storage level of cache() on an RDD?

  •  MEMORY_ONLY
     

  •   MEMORY_AND_DISK
     

  •   DISK_ONLY
     

  •     MEMORY_ONLY_SER

What are the features of Spark RDD?

  •  In-memory computation
     

  •  Lazy evaluation
     

  •  Fault tolerance
     

  •    All of the above

In addition to stream processing jobs, what other functionality does Spark provide?

  •  Machine learning
     

  •     Graph processing
     

  •   Batch processing
     

  •    All of the above

In Spark Streaming, which sources can the data come from?

  •  Kafka
     

  •    Flume
     

  •    Kinesis
     

  •   All of the above

In which Spark release was the Dataset API introduced?

  •  Spark 1.6
     

  •     Spark 1.4.0
     

  •     Spark 2.1.0
     

  •   Spark 1.1

Which of the following is true for SparkR?

  •  It allows data scientists to analyze large datasets and interactively run jobs
     

  •     It is the kernel of Spark
     

  •     It is the scalable machine learning library which delivers efficiencies
     

  •  It enables users to run SQL / HQL queries on top of Spark.

When does Apache Spark evaluate an RDD?

  • Upon action
     

  •   Upon transformation
     

  •  On both transformation and action

Which of the following is not a feature of Spark?

  •  Supports in-memory computation
     

  •   Fault-tolerance
     

  •    It is cost efficient
     

  •  Compatible with other file storage systems

Is MLlib deprecated?

  • Yes

  • No

In which year was Apache Spark made open source?

  •  2010
     

  •     2011
     

  •    2008
     

  •    2009

Apache Spark has APIs in:

  •  Java
     

  •   Scala
     

  •   Python
     

  •   All of the above

Which of the following is not a component built on top of Spark Core?

  •  Spark RDD
     

  •   Spark Streaming
     

  •     MLlib
     

  •     None of the above

Internally, a DStream is:

  •  A continuous stream of RDDs
     

  •  A continuous stream of DataFrames
     

  •  A continuous stream of Datasets

     

  •     None of the above
     

Which of the following is true for Spark MLlib?

  •      Provides an execution platform for all the Spark applications
     

  •     It is the scalable machine learning library which delivers efficiencies
     

  •  It enables powerful interactive and data analytics applications across live streaming data
     

  •     All of the above

Which of the following is not an action?

  •  collect()
     

  •    take(n)
     

  •     top()
     

  •  map()
     

Which of the following is not true for Hadoop and Spark?

  •  Both are data processing platforms
     

  •     Both are cluster computing environments
     

  •    Both have their own file system
     

  •    Both use open source APIs to link between different tools

Which of the following algorithms is not present in MLlib?

  •  Streaming Linear Regression
     

  •    Streaming KMeans
     

  •  Tanimoto distance
     

  •     None of the above

Which of the following is true about DataFrame?

  •  DataFrames provide a more user-friendly API than RDDs.
     

  •  The DataFrame API provides compile-time type safety
     

  •  Both of the above

Which of the following is true for RDD?

  •  RDD is a programming paradigm
     

  •  RDD in Apache Spark is an immutable collection of objects
     

  •  It is a database
     

  •    None of the above

Which of the following is true for RDD?

  • We can operate Spark RDDs in parallel with a low-level API
     

  •  RDDs are similar to a table in a relational database
     

  •     It allows processing of a large amount of structured data
     

  •  It has a built-in optimization engine

Which of the following is not an output operation on a DStream?

  •  saveAsTextFiles
     

  •  foreachRDD
     

  •  saveAsHadoopFiles
     

  •  reduceByKeyAndWindow

Which cluster managers does Spark support?

  •  Standalone Cluster Manager
     

  •  Mesos
     

  •     YARN
     

  •     All of the above

Can we add or set up new streaming computations after the StreamingContext starts?

  • Yes

  • No

Which of the following is true for Spark Core?

  •  It is the kernel of Spark
     

  •  It enables users to run SQL / HQL queries on top of Spark.
     

  •    It is the scalable machine learning library which delivers efficiencies
     

  •  It drastically improves the performance of iterative algorithms.

The basic abstraction of Spark Streaming is:

  •  DStream
     

  •     RDD
     

  •   Shared Variable
     

  •    None of the above

You can connect an R program to a Spark cluster from:

  •  RStudio
     

  •    R Shell
     

  •     Rscript
     

  •     All of the above

Which of the following is a tool provided by the machine learning library (MLlib)?

  •  Persistence
     

  •   Utilities like linear algebra, statistics
     

  •     Pipelines
     

  •    All of the above

Does SparkR make use of MLlib in any way?

  • Yes

  • No

Which of the following is false for Apache Spark?

  •  It provides high-level APIs in Java, Python, R, and Scala
     

  •     It can be integrated with Hadoop and can process existing Hadoop HDFS data
     

  •  Spark is an open-source framework written in Java
     

  •  Spark can run up to 100 times faster than Hadoop MapReduce for in-memory workloads