
Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 6 Question 18 Discussion

Actual exam question for Databricks's Databricks-Certified-Professional-Data-Engineer exam
Question #: 18
Topic #: 6
[All Databricks-Certified-Professional-Data-Engineer Questions]

The business intelligence team has a dashboard configured to track various summary metrics for retail stores. These include total sales for the previous day alongside totals and averages for a variety of time periods. The fields required to populate this dashboard have the following schema:

For demand forecasting, the Lakehouse contains a validated table of all itemized sales, updated incrementally in near real time. This table, named products_per_order, includes the following fields:

Because reporting on long-term sales trends is less volatile, analysts using the new dashboard only require data to be refreshed once daily. Because the dashboard will be queried interactively by many users throughout a normal business day, it should return results quickly and reduce total compute associated with each materialization.

Which solution meets the expectations of the end users while controlling and limiting possible costs?

Suggested Answer: D

Given the requirement for a daily data refresh, fast response times for interactive queries, and cost control, a nightly batch job that pre-computes and saves the required summary metrics is the most suitable approach.

Because the data is pre-aggregated during off-peak hours, the dashboard can serve queries quickly without on-the-fly computation, which would be resource-intensive and slow with many concurrent users.

This approach also limits cost: rather than computing continuously throughout the day, a single batch process efficiently computes and stores the necessary data.
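As a rough sketch of what such a nightly job could look like (not part of the original question; the schema images are not reproduced above, so the column names store_id, order_date, and price and the target table daily_store_metrics are assumptions):

```python
# Hypothetical nightly batch job: pre-aggregate itemized sales into a small
# summary table that the dashboard can query cheaply. Column names are
# assumed, since the question's schema is shown only as an image.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

summary = (
    spark.table("products_per_order")          # validated itemized sales
    .groupBy("store_id", "order_date")         # assumed grouping columns
    .agg(
        F.sum("price").alias("total_sales"),   # daily total per store
        F.avg("price").alias("avg_sale"),      # daily average per store
    )
)

# Overwrite the summary once per night; interactive dashboard queries then
# scan this small table instead of recomputing aggregates on the fly.
summary.write.format("delta").mode("overwrite").saveAsTable("daily_store_metrics")
```

Scheduled as a once-daily Databricks job, a pattern like this confines the heavy computation to a single nightly run while the dashboard reads a small, pre-computed table throughout the business day.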

The other options (A, B, C) either fail to address the cost and performance requirements effectively or do not suit a use case that pairs an infrequent data refresh with high interactivity.


Databricks documentation on batch processing

Data Lakehouse patterns and best practices

Contribute your Thoughts:

Giovanna
2 months ago
That's a valid point, but it also ensures real-time data availability for the users. It's a trade-off between speed and cost.
upvoted 0 times
...
Vilma
2 months ago
But won't live streaming consume more compute resources and increase costs?
upvoted 0 times
...
Giovanna
2 months ago
I disagree, I believe option C is better as it allows for live updates and interactive querying.
upvoted 0 times
...
Vilma
2 months ago
I think option A is the best choice because caching the table in memory will make the dashboard faster.
upvoted 0 times
...
Zona
2 months ago
Option A sounds tempting, but caching the entire table in memory might not be the most cost-effective solution. I'd probably go with option D as well.
upvoted 0 times
...
Kristel
2 months ago
Option C, huh? Looks like someone's been watching too many Databricks demos. Let's keep it simple, folks.
upvoted 0 times
Mollie
1 months ago
B) Populate the dashboard by configuring a nightly batch job to save the required values, so the dashboard can update quickly with each query.
upvoted 0 times
...
Rex
1 months ago
A) Use the Delta Cache to persist the products_per_order table in memory, to quickly update the dashboard with each query.
upvoted 0 times
...
...
Milly
2 months ago
Hold on, a nightly batch job? That's so 2010s. What is this, the dark ages of data engineering?
upvoted 0 times
...
Kenia
3 months ago
I think option D is the best solution. Defining a view against the products_per_order table and using that for the dashboard will provide the required data refresh frequency and reduce compute costs.
upvoted 0 times
Viki
1 months ago
I think using a view for the dashboard is a smart choice in this scenario.
upvoted 0 times
...
Lai
1 months ago
Yeah, defining a view against the table will definitely help with data refresh and cost control.
upvoted 0 times
...
Ona
2 months ago
I agree, option D seems like the most efficient solution.
upvoted 0 times
...
...
