Databricks Machine Learning Professional Exam - Topic 1 Question 5 Discussion

Actual exam question for Databricks's Databricks Machine Learning Professional exam

Question #: 5
Topic #: 1

A machine learning engineering team has written predictions computed in a batch job to a Delta table for querying. However, the team has noticed that queries against the table are running slowly. The team has already tuned the size of the data files. Upon investigating, the team has concluded that the rows meeting the query condition are sparsely located throughout each of the data files.

Based on the scenario, which of the following optimization techniques could speed up the query by colocating similar records while considering values in multiple columns?

A. Z-Ordering

B. Bin-packing

C. Write as a Parquet file

D. Data skipping

E. Tuning the file size

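Suggested Answer: A

Z-Ordering is the listed technique that colocates related records across multiple columns within the same data files. Tuning the file size (E) is ruled out by the scenario, because the team has already done it.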

by Kanisha at Dec 08, 2023, 07:38 PM
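
For readers unsure how Z-Ordering relates to the scenario, here is a minimal pure-Python sketch of the idea, not Delta Lake's actual implementation. It assumes two integer columns, fixed-size "files" of 256 rows, and per-file min/max statistics used for data skipping; the helper names (`interleave_bits`, `split_into_files`, `files_scanned`) and the query range are made up for illustration. On Databricks itself the technique is applied with `OPTIMIZE <table> ZORDER BY (col1, col2)`, where the table and column names are whatever your predictions table uses.

```python
import random


def interleave_bits(x: int, y: int, bits: int = 8) -> int:
    """Build a Z-order (Morton) key by interleaving the bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x takes the even bit positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y takes the odd bit positions
    return key


def split_into_files(rows, rows_per_file):
    """Chunk rows into fixed-size 'files' (a stand-in for already-tuned file sizes)."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_scanned(files, x_range, y_range):
    """Count files whose min/max stats overlap the query ranges.

    Files that do not overlap can be skipped without reading them.
    """
    count = 0
    for rows in files:
        xs = [r[0] for r in rows]
        ys = [r[1] for r in rows]
        overlaps_x = min(xs) <= x_range[1] and max(xs) >= x_range[0]
        overlaps_y = min(ys) <= y_range[1] and max(ys) >= y_range[0]
        if overlaps_x and overlaps_y:
            count += 1
    return count


random.seed(0)
# Hypothetical table: 4096 rows with two 8-bit integer columns.
rows = [(random.randrange(256), random.randrange(256)) for _ in range(4096)]

# Same data, same file size; only the row order differs.
unordered_files = split_into_files(rows, 256)
z_files = split_into_files(sorted(rows, key=lambda r: interleave_bits(*r)), 256)

# Hypothetical query: 10 <= x <= 20 AND 10 <= y <= 20.
query = ((10, 20), (10, 20))
scanned_unordered = files_scanned(unordered_files, *query)
scanned_z = files_scanned(z_files, *query)

print(f"files scanned without Z-ordering: {scanned_unordered} of {len(unordered_files)}")
print(f"files scanned with Z-ordering:    {scanned_z} of {len(z_files)}")
```

The file size is identical in both layouts, which mirrors the scenario: tuning file size alone does not help when matching rows are scattered. Sorting by the interleaved key narrows each file's min/max range on both columns at once, so data skipping can rule out most files.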

Contribute your Thoughts:

Dong

3 months ago

Parquet files are great, but I doubt they'll solve this issue.

upvoted 0 times

...

Kaycee

4 months ago

Definitely not tuning file size again, that didn't work!

upvoted 0 times

...

Jesse

4 months ago

Wait, can bin-packing really improve query speed?

upvoted 0 times

...

Frederic

4 months ago

I think data skipping could help too.

upvoted 0 times

...

Carisa

4 months ago

Z-Ordering is the way to go for this!

upvoted 0 times

...

Shawna

4 months ago

Bin-packing sounds familiar, but I can't recall how it specifically relates to this scenario. I feel like Z-Ordering is the stronger option.

upvoted 0 times

...

Leonida

5 months ago

I practiced a question about file formats, and I recall that Parquet files are efficient, but I don't think that's the main issue here.

upvoted 0 times

...

Becky

5 months ago

I'm not entirely sure, but I think data skipping could help with performance too. It might reduce the amount of data scanned.

upvoted 0 times

...

Kristel

5 months ago

I remember Z-Ordering being mentioned in class as a way to optimize queries by colocating similar records. It seems like a good fit here.

upvoted 0 times

...

Sanjuana

5 months ago

Z-Ordering sounds promising, but I'm not entirely sure how it works. I'll need to do some research on that technique before I can confidently select it as the answer.

upvoted 0 times

...

Pedro

5 months ago

Tuning the file size is something they've already tried, so that's not the answer. I'm leaning towards Z-Ordering or Parquet as the best options to consider.

upvoted 0 times

...

Audrie

5 months ago

I'm a bit confused by the options. Bin-packing and data skipping don't seem directly relevant to the problem statement. I'll need to think this through more carefully.

upvoted 0 times

...

Jovita

5 months ago

I think Z-Ordering could be a good option here. It allows you to colocate similar records based on multiple columns, which should help speed up the queries.

upvoted 0 times

...

Corazon

5 months ago

I think Z-Ordering is the way to go. It's designed to colocate similar records, which should help with the sparse data distribution issue described in the problem.

upvoted 0 times

...

Emmanuel

5 months ago

I'm pretty confident that the answer is A. The question is very specific about checking the certificate, and the AAM SMI's Server/Application Certificates section is where I would expect to find that information.

upvoted 0 times

...

Melodie

5 months ago

Hmm, I'm not totally sure about this one. I think it might be Psychographic, but I'm not 100% confident. I'll have to think it through carefully.

upvoted 0 times

...