APACHE SPARK QUIZ DESCRIPTION

Which of the following is a tool provided by Spark's Machine Learning Library (MLlib)?

  • Persistence

  • Utilities such as linear algebra and statistics

  • Pipelines

  • All of the above

When does Apache Spark evaluate an RDD?

  • Upon action

  • Upon transformation

  • On both transformation and action

In how many ways can an RDD be created?

  • 4

  • 3

  • 2

  • 1

Can you combine Apache Spark libraries, for example MLlib, GraphX, SQL and DataFrames, in the same application?

  • Yes

  • No

Can we add or set up new streaming computations after the SparkContext starts?

  • Yes

  • No

In which year was Apache Spark made open source?

  • 2010

  • 2011

  • 2008

  • 2009

Which of the following algorithms is not present in MLlib?

  • Streaming Linear Regression

  • Streaming KMeans

  • Tanimoto distance

  • None of the above

Apache Spark has APIs in:

  • Java

  • Scala

  • Python

  • All of the above

Which of the following is not a feature of Spark?

  • Supports in-memory computation

  • Fault tolerance

  • It is cost efficient

  • Compatible with other file storage systems

In Spark Streaming, which sources can the data come from?

  • Kafka

  • Flume

  • Kinesis

  • All of the above

Which of the following is true of an RDD?

  • An RDD is a programming paradigm

  • An RDD in Apache Spark is an immutable collection of objects

  • An RDD is a database

  • None of the above

RDDs are fault-tolerant and immutable.

  • True

  • False

Which of the following is not an output operation on a DStream?

  • saveAsTextFiles

  • foreachRDD

  • saveAsHadoopFiles

  • reduceByKeyAndWindow

Which of the following is true about DataFrames?

  • DataFrames provide a more user-friendly API than RDDs.

  • The DataFrame API provides compile-time type safety.

  • Both of the above

Which of the following is false about Apache Spark?

  • It provides high-level APIs in Java, Python, R, and Scala

  • It can be integrated with Hadoop and can process existing Hadoop HDFS data

  • Spark is an open-source framework that is written in Java

  • Spark can be up to 100 times faster than Hadoop MapReduce

Which cluster managers does Spark support?

  • Standalone Cluster Manager

  • Mesos

  • YARN

  • All of the above

In addition to stream processing jobs, what other functionality does Spark provide?

  • Machine learning

  • Graph processing

  • Batch processing

  • All of the above

Does SparkR make use of MLlib in any way?

  • Yes

  • No

Which of the following is not an action?

  • collect()

  • take(n)

  • top()

  • map()

How much faster than MapReduce can Apache Spark potentially run batch-processing programs when the data is processed in memory?

  • 10 times faster

  • 20 times faster

  • 100 times faster

  • 200 times faster

How many tasks does Spark run on each partition?

  • Any number of tasks

  • One

  • More than one but fewer than five

  • None of the above

Which algorithm is not a solution for regression problems?

  • Logistic Regression

  • Ridge Regression

  • Decision Trees

  • Gradient-Boosted Trees

Which of the following uses Spark Core’s fast scheduling capability to perform streaming analytics?

  • RDD

  • GraphX

  • Spark Streaming

  • SparkR

Which of the following is true of the Spark Shell?

  • It helps Spark applications run easily from the system command line

  • It runs and tests application code interactively

  • It allows reading from many types of data sources

  • All of the above

Which of the following is not a component of the Spark ecosystem?

  • Sqoop

  • GraphX

  • MLlib

  • BlinkDB

In which Spark release was the Dataset API introduced?

  • Spark 1.6

  • Spark 1.4.0

  • Spark 2.1.0

  • Spark 1.1

Is MLlib deprecated?

  • Yes

  • No

The basic abstraction of Spark Streaming is:

  • DStream

  • RDD

  • Shared Variable

  • None of the above

You can connect an R program to a Spark cluster from:

  • RStudio

  • R Shell

  • Rscript

  • All of the above
