APACHE SPARK QUIZ DESCRIPTION

Which of the following is not a component built on top of Spark Core?

  • Spark RDD

  • Spark Streaming

  • MLlib

  • None of the above

What are the features of Spark RDD?

  • In-memory computation

  • Lazy evaluation

  • Fault tolerance

  • All of the above

How many tasks does Spark run on each partition?

  • Any number of tasks

  • One

  • More than one but fewer than five

  • None of the above

Apache Spark has APIs in

  • Java

  • Scala

  • Python

  • All of the above

RDDs are fault-tolerant and immutable.

  • True

  • False

Which of the following is not true for Hadoop and Spark?

  • Both are data processing platforms

  • Both are cluster computing environments

  • Both have their own file system

  • Both use open source APIs to link between different tools

Which of the following is true for Spark Core?

  • It is the kernel of Spark.

  • It enables users to run SQL / HQL queries on top of Spark.

  • It is a scalable machine learning library that delivers efficiency.

  • It drastically improves the performance of iterative algorithms.

What is the default storage level of cache()?

  • MEMORY_ONLY

  • MEMORY_AND_DISK

  • DISK_ONLY

  • MEMORY_ONLY_SER

Which of the following is true about DataFrames?

  • DataFrames provide a more user-friendly API than RDDs.

  • The DataFrame API provides compile-time type safety.

  • Both of the above

How much faster than MapReduce can Apache Spark potentially run batch-processing programs when the data is processed in memory?

  • 10 times faster

  • 20 times faster

  • 100 times faster

  • 200 times faster

Which of the following is not a transformation?

  • flatMap

  • map

  • reduce

  • filter

Which cluster managers does Spark support?

  • Standalone Cluster Manager

  • Mesos

  • YARN

  • All of the above

Is MLlib deprecated?

  • Yes

  • No

When does Apache Spark evaluate an RDD?

  • Upon action

  • Upon transformation

  • On both transformation and action

The basic abstraction of Spark Streaming is

  • DStream

  • RDD

  • Shared variable

  • None of the above

Which of the following is a tool of the Machine Learning Library?

  • Persistence

  • Utilities such as linear algebra and statistics

  • Pipelines

  • All of the above

Which of the following is an abstraction of Apache Spark?

  • Shared variable

  • RDD

  • Both of the above

  • None of the above

Can we add or set up a new string computation after SparkContext starts?

  • Yes

  • No

Internally, a DStream is

  • A continuous stream of RDDs

  • A continuous stream of DataFrames

  • A continuous stream of Datasets

  • None of the above

Which of the following is true for RDDs?

  • We can operate on Spark RDDs in parallel with a low-level API.

  • RDDs are similar to a table in a relational database.

  • They allow processing of a large amount of structured data.

  • They have a built-in optimization engine.

In which language is Spark developed?

  • Java

  • Scala

  • Python

  • R

Which of the following is not an action?

  • collect()

  • take(n)

  • top()

  • map()

Can we edit the data of an RDD, for example by converting its case?

  • Yes

  • No

In Spark Streaming, which sources can the data come from?

  • Kafka

  • Flume

  • Kinesis

  • All of the above

Which of the following uses Spark Core’s fast scheduling capability to perform streaming analytics?

  • RDD

  • GraphX

  • Spark Streaming

  • SparkR

Which of the following is not an output operation on a DStream?

  • saveAsTextFiles

  • foreachRDD

  • saveAsHadoopFiles

  • reduceByKeyAndWindow

In how many ways can an RDD be created?

  • 4

  • 3

  • 2

  • 1

In addition to stream processing jobs, what other functionality does Spark provide?

  • Machine learning

  • Graph processing

  • Batch processing

  • All of the above

Which of the following is true for Spark Shell?

  • It helps Spark applications run easily from the system command line.

  • It runs and tests application code interactively.

  • It allows reading from many types of data sources.

  • All of the above