
Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 6 Question 15 Discussion

Actual exam question from the Databricks Databricks-Certified-Professional-Data-Engineer exam
Question #: 15
Topic #: 6

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

Suggested Answer: C

In the Spark UI's Storage tab, an indicator that a cached table is not performing optimally is the presence of the _disk annotation in the RDD Block Name. This annotation shows that some partitions of the cached data reside on disk rather than in memory, and reading from disk is far slower than reading from memory, so the goal of caching, fast in-memory access, is not fully achieved. Note that under the MEMORY_ONLY storage level Spark does not spill cached partitions to disk; partitions that do not fit in memory are simply dropped and recomputed on access, which the Storage tab surfaces as a Fraction Cached below 100%. Either symptom, cached blocks touching disk or a cache fraction below 100%, means the cache is not delivering its intended benefit.
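
The checks described above can be sketched as a small helper that inspects Storage-tab-style metrics. This is a hypothetical illustration, not Spark's actual API: the `CachedTableMetrics` fields and the `cache_warnings` function are invented names for this sketch.

```python
# Hypothetical sketch: flag a suboptimal MEMORY_ONLY cache from
# Storage-tab-style metrics. Field names are illustrative, not Spark's API.
from dataclasses import dataclass

@dataclass
class CachedTableMetrics:
    storage_level: str      # e.g. "MEMORY_ONLY"
    fraction_cached: float  # 0.0 - 1.0, as shown in the Storage tab
    size_on_disk: int       # bytes reported on disk
    block_names: list       # RDD block names, e.g. ["rdd_42_0", ...]

def cache_warnings(m: CachedTableMetrics) -> list:
    """Return human-readable warnings for a MEMORY_ONLY cache."""
    warnings = []
    # A MEMORY_ONLY cache should never touch disk; any disk usage is a red flag.
    if m.size_on_disk > 0 or any("_disk" in name for name in m.block_names):
        warnings.append("cached blocks found on disk")
    # Partitions that did not fit in memory are dropped, not cached.
    if m.fraction_cached < 1.0:
        warnings.append("fraction cached below 100%")
    return warnings
```

For example, a MEMORY_ONLY table that only fit 80% of its partitions in memory, `CachedTableMetrics("MEMORY_ONLY", 0.8, 0, ["rdd_42_0"])`, would report `["fraction cached below 100%"]`.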


Contribute your Thoughts:

Anissa
4 months ago
I'm no data engineer, but I hear caching is like trying to remember where you left your car keys. If the Spark UI is confused, you know you've got a problem!
upvoted 0 times
Phillip
4 months ago
B) The number of Cached Partitions > the number of Spark Partitions
upvoted 0 times
Jettie
4 months ago
A) Size on Disk is > 0
upvoted 0 times
Raul
4 months ago
I feel like the answer is B. Anything that doesn't match the actual Spark partitions is probably not a good sign.
upvoted 0 times
Lizbeth
4 months ago
Yeah, it's important to keep an eye on those indicators in the Spark UI's Storage tab.
upvoted 0 times
Jerry
4 months ago
I agree, that would definitely be a sign that the cached table is not performing optimally.
upvoted 0 times
Frederica
4 months ago
I think the answer is B. If the number of Cached Partitions is greater than the number of Spark Partitions, it's not good.
upvoted 0 times
Lea
5 months ago
I think C) The RDD Block Name included the '' annotation signaling failure to cache is also a valid indicator of poor performance.
upvoted 0 times
Enola
5 months ago
But if the number of Cached Partitions is greater than the number of Spark Partitions, wouldn't that indicate a performance issue?
upvoted 0 times
Lemuel
5 months ago
I disagree; I believe the correct answer is A) Size on Disk is > 0.
upvoted 0 times
Georgeanna
5 months ago
This is a tough one, but I'm going to go with C. The '' annotation in the RDD Block Name is a clear indication of caching failure.
upvoted 0 times
Valentin
3 months ago
I agree with User1, C) The RDD Block Name included the '' annotation signaling failure to cache seems like the right choice
upvoted 0 times
Lachelle
3 months ago
I'm leaning towards B) The number of Cached Partitions > the number of Spark Partitions
upvoted 0 times
Jenelle
3 months ago
I think it's A) Size on Disk is > 0
upvoted 0 times
Latanya
4 months ago
I'm not sure, but I think it might be B) The number of Cached Partitions > the number of Spark Partitions
upvoted 0 times
Helaine
4 months ago
I disagree; I believe it's C) The RDD Block Name included the '' annotation signaling failure to cache
upvoted 0 times
Nell
5 months ago
I think it's A) Size on Disk is > 0
upvoted 0 times
Thaddeus
5 months ago
Hmm, I'm going with B. If the number of cached partitions is greater than the actual Spark partitions, that's a red flag.
upvoted 0 times
Yolando
4 months ago
Yeah, if the number of cached partitions is more than the Spark partitions, it's not performing optimally.
upvoted 0 times
Darci
4 months ago
I think B is the right indicator to look for.
upvoted 0 times
Enola
5 months ago
I think the answer is B) The number of Cached Partitions > the number of Spark Partitions.
upvoted 0 times
Christoper
5 months ago
D sounds like the right choice to me. If the on-heap and off-heap memory usage are out of balance, that's a sign of suboptimal caching.
upvoted 0 times
Marylin
5 months ago
I think the correct answer is C. The RDD Block Name with the '' annotation signals that the caching was unsuccessful.
upvoted 0 times
Scarlet
4 months ago
D) On Heap Memory Usage is within 75% of Off Heap Memory Usage
upvoted 0 times
Gwenn
4 months ago
C) The RDD Block Name included the '' annotation signaling failure to cache
upvoted 0 times
Mariko
4 months ago
B) The number of Cached Partitions > the number of Spark Partitions
upvoted 0 times
Clare
5 months ago
A) Size on Disk is > 0
upvoted 0 times
