The data engineer is using Spark's MEMORY_ONLY storage level.
Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?
In the Spark UI's Storage tab, the key indicator that a cached table is not performing optimally is the presence of the _disk annotation in the RDD Block Name column. It shows that some partitions of the cached data were written to disk because there was not enough memory to hold them, and reading those partitions back from disk is much slower than reading them from memory. Note that with a strict MEMORY_ONLY storage level Spark never actually spills cached blocks to disk; partitions that do not fit are simply dropped and recomputed on access, so a Fraction Cached value below 100% is the corresponding warning sign for that level (the _disk annotation appears under disk-backed levels such as MEMORY_AND_DISK). Either way, the goal of caching, keeping data in memory for fast access, is not fully achieved.