
Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 6 Question 18 Discussion

Actual exam question from the Databricks Certified Data Engineer Professional exam
Question #: 18
Topic #: 6
[All Databricks Certified Data Engineer Professional Questions]

The business intelligence team has a dashboard configured to track various summary metrics for retail stores. This includes total sales for the previous day alongside totals and averages for a variety of time periods. The fields required to populate this dashboard have the following schema:

For demand forecasting, the Lakehouse contains a validated table of all itemized sales, updated incrementally in near real time. This table, named products_per_order, includes the following fields:

Because reporting on long-term sales trends is less volatile, analysts using the new dashboard only require data to be refreshed once daily. Because the dashboard will be queried interactively by many users throughout a normal business day, it should return results quickly and reduce total compute associated with each materialization.

Which solution meets the expectations of the end users while controlling and limiting possible costs?

Suggested Answer: D

Given the requirement for daily refresh of data and the need to ensure quick response times for interactive queries while controlling costs, a nightly batch job to pre-compute and save the required summary metrics is the most suitable approach.

Because the data is pre-aggregated during off-peak hours, the dashboard can serve queries quickly without on-the-fly computation, which would be resource-intensive and slow with many concurrent users.

This approach also limits cost by avoiding continuous computation throughout the day; instead, a batch process efficiently computes and stores the necessary data once.
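For illustration, here is a minimal PySpark sketch of such a nightly job. The table name products_per_order comes from the question; the column names (order_timestamp, price) and the output table name (daily_sales_summary) are assumptions, since the actual schemas appear in the question only as images.

# Minimal sketch of the nightly batch approach described above.
# Column names (order_timestamp, price) and the output table name
# (daily_sales_summary) are assumptions; only products_per_order
# is named in the question.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Aggregate the validated itemized-sales table once per night.
daily_metrics = (
    spark.table("products_per_order")
    .groupBy(F.to_date("order_timestamp").alias("order_date"))
    .agg(
        F.sum("price").alias("total_sales"),
        F.avg("price").alias("avg_sale"),
    )
)

# Overwrite the small summary table that the dashboard queries directly,
# so interactive queries never touch the large fact table.
(
    daily_metrics.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("daily_sales_summary")
)

Scheduled as a daily Databricks job, this keeps compute to one aggregation pass per day, while interactive dashboard queries read only the small pre-computed summary table.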

The other options (A, B, C) either do not address the cost and performance requirements effectively or are not suited to a use case combining infrequent data refresh with high interactivity.



Contribute your Thoughts:

Giovanna
17 days ago
That's a valid point, but it also ensures real-time data availability for the users. It's a trade-off between speed and cost.
upvoted 0 times
...
Vilma
18 days ago
But won't live streaming consume more compute resources and increase costs?
upvoted 0 times
...
Giovanna
20 days ago
I disagree, I believe option C is better as it allows for live updates and interactive querying.
upvoted 0 times
...
Vilma
22 days ago
I think option A is the best choice because caching the table in memory will make the dashboard faster.
upvoted 0 times
...
Zona
27 days ago
Option A sounds tempting, but caching the entire table in memory might not be the most cost-effective solution. I'd probably go with option D as well.
upvoted 0 times
...
Kristel
1 month ago
Option C, huh? Looks like someone's been watching too many Databricks demos. Let's keep it simple, folks.
upvoted 0 times
Mollie
8 days ago
B) Populate the dashboard by configuring a nightly batch job to save the required summary metrics, so the dashboard can update quickly with each query.
upvoted 0 times
...
Rex
11 days ago
A) Use the Delta Cache to persist the products_per_order table in memory to quickly update the dashboard with each query.
upvoted 0 times
...
...
Milly
1 month ago
Hold on, a nightly batch job? That's so 2010s. What is this, the dark ages of data engineering?
upvoted 0 times
...
Kenia
1 month ago
I think option D is the best solution. Defining a view against the products_per_order table and using that for the dashboard will provide the required data refresh frequency and reduce compute costs.
upvoted 0 times
Viki
11 days ago
I think using a view for the dashboard is a smart choice in this scenario.
upvoted 0 times
...
Lai
12 days ago
Yeah, defining a view against the table will definitely help with data refresh and cost control.
upvoted 0 times
...
Ona
13 days ago
I agree, option D seems like the most efficient solution.
upvoted 0 times
...
...
