Which of the following statements about storage levels is incorrect?
MEMORY_AND_DISK replicates cached DataFrames both on memory and disk.
Correct, this statement is wrong. MEMORY_AND_DISK does not duplicate data across memory and disk: Spark stores the data in memory first and spills to disk only the partitions that do not fit.
DISK_ONLY will not use the worker node's memory.
Wrong, this statement is correct. DISK_ONLY keeps the data exclusively on the worker node's disk and does not use its memory.
In client mode, DataFrames cached with the MEMORY_ONLY_2 level will not be stored in the edge node's memory.
Wrong, this statement is correct. Spark has no provision to cache DataFrames in the driver (which runs on the edge node in client mode); MEMORY_ONLY_2 stores two replicas of the data, both in the executors' memory.
Caching can be undone using the DataFrame.unpersist() operator.
Wrong, this statement is correct. Caching, whether achieved via the DataFrame.cache() or DataFrame.persist() operators, can be undone using the DataFrame.unpersist() operator, which removes the DataFrame's cached blocks from the executors' memory and disk.
The cache operator on DataFrames is evaluated like a transformation.
Wrong, this statement is correct. DataFrame.cache() is evaluated like a transformation, through lazy evaluation. This means that calling DataFrame.cache() has no effect until a subsequent action, such as DataFrame.count(), is executed on the cached DataFrame.
More info: pyspark.sql.DataFrame.unpersist --- PySpark 3.1.2 documentation