Welcome to Pass4Success


Databricks Exam Databricks-Certified-Professional-Data-Engineer Topic 1 Question 2 Discussion

Actual exam question for Databricks's Databricks-Certified-Professional-Data-Engineer exam
Question #: 2
Topic #: 1

A data engineer wants to join a stream of advertisement impressions (when an ad was shown) with another stream of user clicks on advertisements to correlate when an impression led to a monetizable click.

Which solution would improve the performance?

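A) [option shown as an image; not reproduced]
B) [option shown as an image; not reproduced]
C) [option shown as an image; not reproduced]
D) [option shown as an image; not reproduced]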

Suggested Answer: A

When joining a stream of advertisement impressions with a stream of user clicks, the goal is to minimize the state that must be maintained for the join. Option A uses a left outer join with the condition clickTime == impressionTime, which suits correlating events that occur at exactly the same time. In a real-world scenario, however, you would likely need some leeway to account for the delay between an impression and a possible click. The join condition and the time window it considers should therefore be designed to keep state small while still capturing the relevant user interactions. Here, the watermark helps with state management: it keeps state from growing without bound by discarding old data that is unlikely to match new data.
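The watermark and time-range ideas in the explanation above can be illustrated with a small, self-contained Python model. This is not Spark code: the class and parameter names (`ToyImpressionClickJoin`, `max_click_delay`, `allowed_lateness`) are hypothetical, and the model simplifies Spark's behavior by using one global watermark and buffering only impressions (it assumes each impression arrives before its clicks). In Spark Structured Streaming itself, the same idea is usually expressed with `withWatermark` on each stream plus a join condition that bounds `clickTime` relative to `impressionTime`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Impression:
    ad_id: str
    time: int  # event time in seconds (hypothetical unit)


@dataclass(frozen=True)
class Click:
    ad_id: str
    time: int  # event time in seconds (hypothetical unit)


class ToyImpressionClickJoin:
    """Hypothetical, simplified model of a watermarked stream-stream inner join.

    Assumptions: one global watermark (max event time seen minus allowed_lateness);
    impressions arrive before their clicks, so only impressions are kept as state.
    """

    def __init__(self, max_click_delay: int, allowed_lateness: int) -> None:
        self.max_click_delay = max_click_delay    # a click must occur within this many seconds of its impression
        self.allowed_lateness = allowed_lateness  # how far behind the newest event time data may still arrive
        self.max_event_time: Optional[int] = None
        self.buffered: List[Impression] = []      # join state: impressions that can still match

    def watermark(self) -> Optional[int]:
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def _advance(self, event_time: int) -> None:
        # Move the watermark forward, then evict impressions that no future on-time
        # click can match: such a click has time >= watermark, but it must also
        # satisfy click.time <= impression.time + max_click_delay.
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        wm = self.watermark()
        self.buffered = [i for i in self.buffered if i.time + self.max_click_delay >= wm]

    def on_impression(self, imp: Impression) -> None:
        self.buffered.append(imp)
        self._advance(imp.time)

    def on_click(self, click: Click) -> List[Tuple[Impression, Click]]:
        wm = self.watermark()
        if wm is not None and click.time < wm:
            return []  # too late: dropped instead of joined
        # Time-range condition rather than exact clickTime == impressionTime equality.
        matches = [
            (i, click)
            for i in self.buffered
            if i.ad_id == click.ad_id and i.time <= click.time <= i.time + self.max_click_delay
        ]
        self._advance(click.time)
        return matches
```

With `max_click_delay=60` and `allowed_lateness=10`, a click on `ad1` at time 30 matches the `ad1` impression at time 0 under the range condition, whereas an exact-equality condition would miss it. Once an event at time 100 pushes the watermark to 90, the impressions at times 0 and 5 can no longer match any on-time click and are evicted, so the buffered state stays bounded.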


Contribute your Thoughts:

Brittni
10 months ago
I agree with Nana; Option B looks like the best choice for improving performance.
Nana
10 months ago
Option B seems to have a more efficient way of joining the streams based on the image provided.
Werner
10 months ago
Why do you think Option B is better?
Nana
11 months ago
I disagree; I believe Option B would be more effective.
Werner
11 months ago
I think the solution to improve performance is Option A.
Juliana
11 months ago
I think option D is the way to go; it seems to offer a more scalable solution for correlating ad impressions with clicks.
Werner
11 months ago
I'm leaning towards option C because it looks like it could potentially enhance the performance of joining the streams.
Rebeca
12 months ago
I disagree; I believe option B is the better choice, as it might offer a more optimized solution for correlating impressions with clicks.
Quiana
12 months ago
I think the answer is option A because it seems to provide a more efficient way to join the two streams.