Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 1 Question 2 Discussion

Actual exam question for Databricks's Databricks-Certified-Professional-Data-Engineer exam
Question #: 2
Topic #: 1

A data engineer wants to join a stream of advertisement impressions (when an ad was shown) with another stream of user clicks on advertisements to correlate when impressions led to monetizable clicks.

Which solution would improve the performance?

A)

B)

C)

D)

Suggested Answer: A

When joining a stream of advertisement impressions with a stream of user clicks, the goal is to minimize the state the engine must keep for the join. Option A uses a left outer join with the condition clickTime == impressionTime, which only correlates events that occur at exactly the same time. In a real-world scenario you would likely need some leeway to account for the delay between an impression and a possible click, so the join condition and the time window should be chosen to capture the relevant interactions while keeping performance acceptable. A watermark helps here: it lets the engine discard old state that can no longer match new data, so the state does not grow without bound.
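To make the state-management point concrete, here is a minimal pure-Python sketch of how a watermark plus a time-range condition bounds join state. It is a toy simulation, not Spark code and not any of the answer options: the function name `join_streams`, the parameters `max_click_delay` and `watermark_delay`, and the sample events are all hypothetical. In Spark Structured Streaming the corresponding pieces would be `withWatermark` on each stream and a time-range predicate in the join condition.

```python
def join_streams(events, max_click_delay=None, watermark_delay=0):
    """Toy inner join of impressions and clicks on ad_id (illustration only).

    events: iterable of (kind, ad_id, event_time), kind is "impression" or "click".
    max_click_delay: a click matches an impression only if it happens within
        this many time units after it; None means no time bound.
    watermark_delay: how far the watermark trails the largest event time seen.
    Returns (matches, peak_state_size).
    """
    state = {}                      # ad_id -> list of buffered impression times
    matches = []
    max_event_time = float("-inf")
    peak = 0

    for kind, ad_id, t in events:
        max_event_time = max(max_event_time, t)

        if kind == "impression":
            state.setdefault(ad_id, []).append(t)
        else:
            # Match the click against buffered impressions for the same ad.
            for imp_t in state.get(ad_id, []):
                within_bound = max_click_delay is None or t <= imp_t + max_click_delay
                if imp_t <= t and within_bound:
                    matches.append((ad_id, imp_t, t))

        # Eviction is only possible when the join has a time bound: once the
        # watermark passes imp_t + max_click_delay, no future click can match.
        if max_click_delay is not None:
            watermark = max_event_time - watermark_delay
            for key in list(state):
                kept = [x for x in state[key] if x + max_click_delay >= watermark]
                if kept:
                    state[key] = kept
                else:
                    del state[key]

        # State size is sampled after eviction at each step.
        peak = max(peak, sum(len(v) for v in state.values()))

    return matches, peak


# Hypothetical sample data: (kind, ad_id, event_time)
events = [
    ("impression", "a", 0),
    ("impression", "b", 10),
    ("click", "a", 5),
    ("impression", "c", 100),
    ("click", "b", 200),
    ("impression", "d", 300),
]

bounded = join_streams(events, max_click_delay=60, watermark_delay=10)
unbounded = join_streams(events)
```

With this sample data, the bounded run buffers at most 2 impressions and the unbounded run ends up holding all 4. The trade-off is that the bounded run no longer matches the click on ad "b", which arrives 190 time units after its impression. The explanation above alludes to this tension when it mentions choosing the time window.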


Contribute your Thoughts:

Brittni
5 months ago
I agree with Nana, Option B looks like the best choice for improving performance.
upvoted 0 times
...
Nana
5 months ago
Option B seems to have a more efficient way of joining the streams based on the image provided.
upvoted 0 times
...
Werner
5 months ago
Why do you think Option B is better?
upvoted 0 times
...
Nana
6 months ago
I disagree, I believe Option B would be more effective.
upvoted 0 times
...
Werner
6 months ago
I think the solution to improve performance is Option A.
upvoted 0 times
...
Juliana
6 months ago
I think option D is the way to go, it seems to offer a more scalable solution for correlating ad impressions with clicks.
upvoted 0 times
...
Werner
6 months ago
I'm leaning towards option C because it looks like it could potentially enhance the performance of joining the streams.
upvoted 0 times
...
Rebeca
6 months ago
I disagree, I believe option B is the better choice as it might offer a more optimized solution for correlating impressions with clicks.
upvoted 0 times
...
Quiana
6 months ago
I think the answer is option A because it seems to provide a more efficient way to join the two streams.
upvoted 0 times
...