Welcome to Pass4Success

Databricks Exam Databricks Certified Data Engineer Professional Topic 1 Question 3 Discussion

Actual exam question for Databricks's Databricks Certified Data Engineer Professional exam
Question #: 3
Topic #: 1

A data architect has heard about Delta Lake's built-in versioning and time travel capabilities. For auditing purposes, they have a requirement to maintain a full history of all valid street addresses as they appear in the customers table.

The architect is interested in implementing a Type 1 table, overwriting existing records with new values and relying on Delta Lake time travel to support long-term auditing. A data engineer on the project feels that a Type 2 table will provide better performance and scalability.

Which piece of information is critical to this decision?

Suggested Answer: A

Delta Lake's time travel feature lets users query previous versions of a table, which makes it a useful tool for auditing and recovery. Relying on it as a long-term versioning solution, however, becomes costly and slow as the volume of data and the number of versions grow. Time travel only works while the transaction log and the older data files are retained; VACUUM deletes unreferenced data files once they pass the retention threshold (7 days by default). Keeping every version indefinitely means paying to store all superseded data files, and an audit that spans many versions has to query each snapshot separately.

For maintaining a full history of valid street addresses as they appear in the customers table, a Type 2 table (where each update inserts a new record with version or effective-date columns instead of overwriting the old one) keeps the history as ordinary rows in the current version of the table, avoiding the overhead of reading old snapshots of a large table. A Type 1 table that overwrites existing records and leans on time travel for auditing looks simpler, but the critical piece of information is that time travel does not scale well in cost or latency for long-term versioning, which makes the Type 2 approach more viable for performance and scalability.

References:

Databricks documentation on Delta Lake time travel

Databricks blog on managing slowly changing dimensions (SCDs) in Delta Lake
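The difference between the two designs can be sketched in plain Python. This is a minimal, hypothetical model rather than Delta Lake code: the row schema (`customer_id`, `street_address`, `valid_from`, `valid_to`) and the helper names are assumptions made for illustration, and on Databricks the same upsert logic would typically be written as a Delta Lake `MERGE INTO` statement. It shows that a Type 1 upsert discards the old address, while a Type 2 upsert closes the current row and appends a new one, so a point-in-time audit lookup becomes an ordinary filter on current data instead of a time travel query.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AddressRow:
    """One row of a hypothetical customers address table (illustrative schema)."""
    customer_id: int
    street_address: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the row that is currently valid


def type1_upsert(table: dict[int, AddressRow], customer_id: int,
                 address: str, as_of: date) -> None:
    """Type 1: overwrite the customer's row in place; the old address is lost."""
    table[customer_id] = AddressRow(customer_id, address, as_of)


def type2_upsert(rows: list[AddressRow], customer_id: int,
                 address: str, as_of: date) -> None:
    """Type 2: close the current row and append a new one, keeping history as data."""
    for row in rows:
        if row.customer_id == customer_id and row.valid_to is None:
            if row.street_address == address:
                return  # address unchanged, so no new version is needed
            row.valid_to = as_of  # end the old row's validity period
    rows.append(AddressRow(customer_id, address, as_of))


def type2_address_as_of(rows: list[AddressRow], customer_id: int,
                        when: date) -> Optional[str]:
    """Point-in-time audit lookup: a plain filter on current rows, no time travel."""
    for row in rows:
        if (row.customer_id == customer_id
                and row.valid_from <= when
                and (row.valid_to is None or when < row.valid_to)):
            return row.street_address
    return None  # customer had no recorded address on that date


# Apply the same two address changes to both designs.
type1: dict[int, AddressRow] = {}
type2: list[AddressRow] = []
for addr, day in [("1 Main St", date(2023, 1, 1)), ("9 Oak Ave", date(2024, 6, 1))]:
    type1_upsert(type1, 42, addr, day)
    type2_upsert(type2, 42, addr, day)

print(type1[42].street_address)                            # 9 Oak Ave
print(len(type2))                                          # 2
print(type2_address_as_of(type2, 42, date(2023, 12, 31)))  # 1 Main St
```

Because the Type 2 history lives in the rows of the current table version, it is not affected by VACUUM or retention settings; the trade-off is a larger table and somewhat more involved upsert logic.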


Discussion:

Verdell
12 months ago
True, we need to weigh the pros and cons of Type 1 and Type 2 tables before moving forward with our implementation.
Adelle
12 months ago
I still think a Type 2 table would provide better performance and scalability overall. It's worth considering all options before making a decision.
Lucy
12 months ago
That's a good point. Shallow clones combined with Type 1 tables could be a game changer for performance.
Deandrea
12 months ago
What about using shallow clones with Type 1 tables? Would that help accelerate historic queries for long-term versioning?
Verdell
12 months ago
I agree. If we can't query previous versions, then relying on Delta Lake time travel for long-term auditing wouldn't make sense.
Lucy
12 months ago
I think the critical information for this decision is whether Delta Lake time travel can be used to query previous versions of the tables.
Laine
12 months ago
That's a valid point. We need to ensure our chosen table type can handle updates without risking data integrity.
Selene
1 year ago
But wouldn't data corruption be a big issue if we don't handle queries in a Type 2 table properly?
Sharika
1 year ago
I believe shallow clones can help accelerate historic queries, which could be beneficial for long-term versioning.
Shayne
1 year ago
I agree. It's important to consider the scalability and performance of the versioning solution.
Laine
1 year ago
I think the critical information is whether Delta Lake time travel can support long-term auditing effectively.
