A data engineer is building an automated extract, transform, and load (ETL) ingestion pipeline by using AWS Glue. The pipeline ingests compressed files that are in an Amazon S3 bucket. The ingestion pipeline must support incremental data processing.
Which AWS Glue feature should the data engineer use to meet this requirement?
Problem Analysis:
The pipeline processes compressed files in S3 and must support incremental data processing.
The chosen AWS Glue feature must track processing progress so that data ingested in earlier runs is not reprocessed.
Key Considerations:
Incremental data processing requires tracking which files or partitions have already been processed.
The solution must be automated and efficient for large-scale ETL jobs.
Solution Analysis:
Option A: Workflows
Workflows organize and orchestrate multiple Glue jobs but do not track progress for incremental data processing.
Option B: Triggers
Triggers initiate Glue jobs based on a schedule or events but do not track which data has been processed.
Option C: Job Bookmarks
Job bookmarks persist state about the data that each job run has already processed, enabling incremental processing.
On subsequent runs, the job automatically skips files or partitions that were processed previously (see the example below).
Option D: Classifiers
Classifiers determine the schema of incoming data but do not handle incremental processing.
Final Recommendation:
Job bookmarks are specifically designed to enable incremental data processing in AWS Glue ETL pipelines.
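For reference, here is a minimal PySpark sketch of a bookmark-enabled Glue job. The bucket paths and format are placeholders; bookmarks themselves are turned on outside the script with the job parameter --job-bookmark-option set to job-bookmark-enable (or "Job bookmark: Enable" in the console).

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# transformation_ctx gives the bookmark a stable key so Glue can record
# which S3 objects this source has already read in previous runs.
# "example-bucket" and the JSON format are placeholder assumptions;
# Glue reads common compression formats such as gzip transparently.
source = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/incoming/"], "recurse": True},
    format="json",
    transformation_ctx="source",
)

glueContext.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/processed/"},
    format="parquet",
    transformation_ctx="sink",
)

# Committing the job persists the bookmark state; without job.commit(),
# the next run would reprocess the same files.
job.commit()
```

With bookmarks enabled and transformation_ctx set on the source, each scheduled or triggered run picks up only the objects added since the last successful commit, which is exactly the incremental behavior the question asks for.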