
Databricks Certified Data Engineer Professional Exam: Topic 6, Question 27 Discussion

Actual exam question for the Databricks Certified Data Engineer Professional exam
Question #: 27
Topic #: 6

A data pipeline uses Structured Streaming to ingest data from Kafka to Delta Lake. Data is being stored in a bronze table and includes the Kafka-generated timestamp, key, and value. Three months after the pipeline was deployed, the data engineering team noticed latency issues during certain times of the day.

A senior data engineer updates the Delta table's schema and ingestion logic to include the current timestamp (as recorded by Apache Spark) as well as the Kafka topic and partition. The team plans to use the additional metadata fields to diagnose the transient processing delays.
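For context, here is a minimal PySpark sketch of what the updated ingestion logic might look like. The broker address, topic name, checkpoint path, and table name are illustrative placeholders, not details from the question:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, current_timestamp

spark = SparkSession.builder.getOrCreate()

# Spark's Kafka source already exposes topic, partition, and the
# Kafka-generated timestamp as columns alongside key and value.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    .load()
)

bronze = raw.select(
    col("timestamp"),                              # Kafka-generated timestamp
    col("key"),
    col("value"),
    col("topic"),                                  # added metadata field
    col("partition"),                              # added metadata field
    current_timestamp().alias("processing_time"),  # Spark-recorded timestamp
)

(
    bronze.writeStream
    .format("delta")
    .option("checkpointLocation", "/checkpoints/bronze_events")  # placeholder
    .toTable("bronze_events")                                    # placeholder
)
```

Note that topic and partition come for free from the Kafka source; the only genuinely new information here is the Spark-side processing timestamp.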

Which limitation will the team face while diagnosing this problem?

Suggested Answer: B


Contribute your Thoughts:

Vallie
1 month ago
Wait, they're using Structured Streaming with Kafka and Delta Lake? Someone's been watching too many Big Data tutorials on YouTube.
upvoted 0 times
Melissa
11 days ago
I wonder if adding those new fields will really help with the latency issues.
upvoted 0 times
Marsha
14 days ago
I know right, seems like they're trying to implement all the latest technologies.
upvoted 0 times
Miesha
2 months ago
Spark can't capture the topic partition fields from Kafka? That's wild. I guess the team's gonna have to get creative with their diagnostics.
upvoted 0 times
Winifred
2 months ago
Option C, huh? Providing default values for each field added? That's a pain, but I suppose it's better than having the schema update fail entirely.
upvoted 0 times
Marcelle
6 days ago
True, it's a trade-off for maintaining the integrity of the data pipeline.
upvoted 0 times
Gerald
8 days ago
I agree, but at least it ensures the schema update doesn't fail completely.
upvoted 0 times
Ronny
1 month ago
Yeah, it can be a hassle to provide default values for each field added.
upvoted 0 times
Frederica
2 months ago
Hmm, my money's on option B. Messing with the Delta transaction log metadata? That sounds like a recipe for disaster.
upvoted 0 times
Vernell
26 days ago
True, but I still think option B is the most risky choice here.
upvoted 0 times
Nicholle
1 month ago
I think option C might also be a limitation; having to provide a default value for each field added sounds like a hassle.
upvoted 0 times
Willis
1 month ago
But what about option A? Would that also cause problems with historic records?
upvoted 0 times
Rosamond
2 months ago
I agree, messing with the transaction log metadata could cause some serious issues.
upvoted 0 times
Elbert
3 months ago
I think the limitation will be that Spark cannot capture the topic partition fields from the Kafka source.
upvoted 0 times
Theron
3 months ago
I disagree, I believe the limitation will be that updating the table schema will invalidate the Delta transaction log metadata.
upvoted 0 times
Kenneth
3 months ago
Ah, the joys of schema evolution! I guess the team is in for a fun time with those 'transient processing delays'. At least they're trying to get to the bottom of it.
upvoted 0 times
Glory
2 months ago
B) Updating the table schema will invalidate the Delta transaction log metadata.
upvoted 0 times
Emile
2 months ago
A) New fields cannot be computed for historic records.
upvoted 0 times
Soledad
3 months ago
I think the limitation will be that new fields cannot be computed for historic records.
upvoted 0 times
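Several commenters above debate whether the new fields can be computed for historic records. As an illustration of Delta's schema-evolution behaviour (the table and column names are hypothetical, matching the sketch earlier on this page):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Adding columns to a Delta table is a metadata-only change: existing data
# files are not rewritten or back-filled, so rows ingested before the
# schema update read back as NULL for the new columns.
spark.sql("""
    ALTER TABLE bronze_events
    ADD COLUMNS (topic STRING, `partition` INT, processing_time TIMESTAMP)
""")

# Every pre-existing row is NULL here, which is why the new metadata
# fields cannot help diagnose delays that occurred before the change.
historic = spark.table("bronze_events").where("topic IS NULL")
print(historic.count())
```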
