Which statement about Apache Spark is true?

A. It supports HDFS, MS-SQL, and Oracle.
B. It is much faster than MapReduce for complex applications on disk.
C. It runs on Hadoop clusters with RAM drives configured on each DataNode.
D. It features APIs for C++ and .NET.



Final answer:

The answer is B: Apache Spark, an open-source distributed data processing framework, is much faster than MapReduce for complex applications, even on disk.

Step-by-step explanation:

Apache Spark is an open-source distributed data processing framework designed for big data processing and analytics. It provides high-level APIs in Java, Scala, Python, and R, so developers can use it from several programming languages.

Here is how each statement holds up:

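  1. Statement A is not the answer the question is looking for. Spark does read HDFS natively, and it can also read from databases such as MS-SQL and Oracle, but only through its generic JDBC data source with the vendor's JDBC driver, not through dedicated built-in connectors.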
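  2. Statement B is correct. Spark is much faster than MapReduce for complex, multi-stage applications: it keeps intermediate data in memory where possible, and even when working from disk its DAG execution engine pipelines stages instead of writing every intermediate result back to HDFS, as a chain of MapReduce jobs must.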
  3. Statement C is incorrect. Spark can run on Hadoop clusters (for example under YARN), but it caches data in the ordinary memory of its executor processes and does not require RAM drives on each DataNode.
  4. Statement D is incorrect. Spark's official APIs are for Java, Scala, Python, and R; Apache Spark itself has no native C++ or .NET API.
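To make the reasoning behind B more concrete, here is a minimal toy sketch in plain Python. It is not Spark or Hadoop code, and the names `mapreduce_style` and `spark_style` are hypothetical. The same three-stage word count runs two ways: once with each intermediate result written to a file and read back (roughly how chained MapReduce jobs hand data to each other through HDFS), and once with results passed between stages in memory (roughly how Spark handles stages within a job). It only counts the disk round trips and does not measure real cluster performance.

```python
import json
import os
import tempfile

# Toy three-stage pipeline: tokenize -> drop short words -> count.
def tokenize(lines):
    return [w.lower() for line in lines for w in line.split()]

def drop_short(words):
    return [w for w in words if len(w) > 2]

def count(words):
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

STAGES = [tokenize, drop_short, count]

def mapreduce_style(lines, workdir):
    """Hypothetical: run each stage as a separate 'job' whose output round-trips through a file."""
    data = lines
    disk_round_trips = 0
    for i, stage in enumerate(STAGES):
        data = stage(data)
        if i < len(STAGES) - 1:
            # Intermediate result: persist it to disk, then reload it for the next job.
            path = os.path.join(workdir, f"stage_{i}.json")
            with open(path, "w") as f:
                json.dump(data, f)
            with open(path) as f:
                data = json.load(f)
            disk_round_trips += 1
    return data, disk_round_trips

def spark_style(lines):
    """Hypothetical: run all stages in one pipeline; intermediate results stay in memory."""
    data = lines
    for stage in STAGES:
        data = stage(data)
    return data, 0

if __name__ == "__main__":
    lines = ["Spark keeps data in memory", "MapReduce writes data to disk"]
    with tempfile.TemporaryDirectory() as d:
        mr_result, mr_trips = mapreduce_style(lines, d)
    sp_result, sp_trips = spark_style(lines)
    print(mr_result == sp_result, mr_trips, sp_trips)  # True 2 0
```

Both paths produce the same counts. The only difference is the two intermediate disk round trips, which on a real cluster would also involve replicated HDFS writes and per-job startup costs.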