Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 6 Question 15 Discussion

Actual exam question for Databricks's Databricks-Certified-Professional-Data-Engineer exam

Question #: 15
Topic #: 6

The data engineer is using Spark's MEMORY_ONLY storage level.

Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

A. Size on Disk is > 0

B. The number of Cached Partitions > the number of Spark Partitions

C. The RDD Block Name included the '' annotation signaling failure to cache

D. On Heap Memory Usage is within 75% of Off Heap Memory Usage

Suggested Answer: C

In the Spark UI's Storage tab, an indicator that a cached table is not performing optimally would be the presence of the _disk annotation in the RDD Block Name. This annotation indicates that some partitions of the cached data have been spilled to disk because there wasn't enough memory to hold them. This is suboptimal because accessing data from disk is much slower than from memory. The goal of caching is to keep data in memory for fast access, and a spill to disk means that this goal is not fully achieved.

by Lazaro at Jul 06, 2024, 01:14 PM
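For context on the storage level named in the question, here is a toy model. Spark's RDD persistence documentation describes MEMORY_ONLY as leaving partitions that do not fit in memory uncached, so they are recomputed when needed, whereas MEMORY_AND_DISK writes them to disk. The names `simulate_cache`, `CacheResult`, and `fraction_cached` are hypothetical illustrations, not a Spark API, and the model assumes a fixed memory budget filled greedily in partition order (real Spark uses a unified memory manager with LRU eviction). Treat it only as a sketch of why a Fraction Cached below 100% means extra recomputation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CacheResult:
    """Outcome of the toy caching model (hypothetical, not a Spark API)."""
    in_memory: list[int] = field(default_factory=list)   # partition ids held in memory
    on_disk: list[int] = field(default_factory=list)     # partition ids written to disk
    not_cached: list[int] = field(default_factory=list)  # ids recomputed on every access


def simulate_cache(partition_sizes_mb: list[float], memory_budget_mb: float,
                   storage_level: str) -> CacheResult:
    """Greedily place partitions in memory until the budget is exhausted.

    Assumption: a fixed budget filled in partition order; real Spark uses a
    unified memory manager with LRU eviction, which this model ignores.
    """
    if storage_level not in ("MEMORY_ONLY", "MEMORY_AND_DISK"):
        raise ValueError(f"unsupported storage level: {storage_level}")
    result = CacheResult()
    used = 0.0
    for pid, size in enumerate(partition_sizes_mb):
        if used + size <= memory_budget_mb:
            result.in_memory.append(pid)
            used += size
        elif storage_level == "MEMORY_AND_DISK":
            # Partitions that do not fit in memory are written to disk.
            result.on_disk.append(pid)
        else:
            # MEMORY_ONLY: the partition is simply not cached.
            result.not_cached.append(pid)
    return result


def fraction_cached(result: CacheResult, num_partitions: int) -> float:
    """Share of partitions stored anywhere (memory or disk)."""
    return (len(result.in_memory) + len(result.on_disk)) / num_partitions


if __name__ == "__main__":
    sizes = [40, 40, 40, 40]  # four 40 MB partitions, 100 MB budget
    for level in ("MEMORY_ONLY", "MEMORY_AND_DISK"):
        r = simulate_cache(sizes, 100, level)
        print(level, r, fraction_cached(r, len(sizes)))
```

With four 40 MB partitions and a 100 MB budget, MEMORY_ONLY keeps two partitions in memory (fraction 0.5) and recomputes the other two, while MEMORY_AND_DISK stores all four (fraction 1.0), two of them on disk.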

Anissa

4 months ago

I'm no data engineer, but I hear caching is like trying to remember where you left your car keys. If the Spark UI is confused, you know you've got a problem!

Phillip

4 months ago

B) The number of Cached Partitions > the number of Spark Partitions

Jettie

4 months ago

A) Size on Disk is > 0

Raul

4 months ago

I feel like the answer is B. Anything that doesn't match the actual Spark partitions is probably not a good sign.

Lizbeth

4 months ago

Yeah, it's important to keep an eye on those indicators in the Spark UI's Storage tab.

Jerry

4 months ago

I agree, that would definitely be a sign that the cached table is not performing optimally.

Frederica

4 months ago

I think the answer is B. If the number of Cached Partitions is greater than the number of Spark Partitions, it's not good.

Lea

5 months ago

I think C) The RDD Block Name included the '' annotation signaling failure to cache is also a valid indicator of poor performance.

Enola

5 months ago

But if the number of Cached Partitions is greater than the number of Spark Partitions, wouldn't that indicate a performance issue?

Lemuel

5 months ago

I disagree; I believe the correct answer is A) Size on Disk is > 0.

Georgeanna

5 months ago

This is a tough one, but I'm going to go with C. The '' annotation in the RDD Block Name is a clear indication of caching failure.

Valentin

3 months ago

I agree with User1; C) The RDD Block Name included the '' annotation signaling failure to cache seems like the right choice.

Lachelle

3 months ago

I'm leaning towards B) The number of Cached Partitions > the number of Spark Partitions.

Jenelle

3 months ago

I think it's A) Size on Disk is > 0.

Latanya

4 months ago

I'm not sure, but I think it might be B) The number of Cached Partitions > the number of Spark Partitions.

Helaine

4 months ago

I disagree; I believe it's C) The RDD Block Name included the '' annotation signaling failure to cache.

Nell

5 months ago

I think it's A) Size on Disk is > 0.

Thaddeus

5 months ago

Hmm, I'm going with B. If the number of cached partitions is greater than the actual Spark partitions, that's a red flag.

Yolando

4 months ago

Yeah, if the number of cached partitions is more than the Spark partitions, it's not performing optimally.

Darci

4 months ago

I think B is the right indicator to look for.

Enola

5 months ago

I think the answer is B) The number of Cached Partitions > the number of Spark Partitions.

Christoper

5 months ago

D sounds like the right choice to me. If the on-heap and off-heap memory usage are out of balance, that's a sign of suboptimal caching.

Marylin

5 months ago

I think the correct answer is C. The RDD Block Name with the '' annotation signals that the caching was unsuccessful.

Scarlet

4 months ago

D) On Heap Memory Usage is within 75% of Off Heap Memory Usage

Gwenn

4 months ago

C) The RDD Block Name included the '' annotation signaling failure to cache

Mariko

4 months ago

B) The number of Cached Partitions > the number of Spark Partitions

Clare

5 months ago

A) Size on Disk is > 0

Vivienne

5 months ago

User2

Layla

5 months ago

User1
