
Amazon Exam Amazon-DEA-C01 Topic 2 Question 7 Discussion

Actual exam question for Amazon's Amazon-DEA-C01 exam
Question #: 7
Topic #: 2

A company uses AWS Glue Data Catalog to index data that is uploaded to an Amazon S3 bucket every day. The company uses a daily batch process in an extract, transform, and load (ETL) pipeline to upload data from external sources into the S3 bucket.

The company runs a daily report on the S3 data. Some days, the company runs the report before all the daily data has been uploaded to the S3 bucket. A data engineer must be able to send a message that identifies any incomplete data to an existing Amazon Simple Notification Service (Amazon SNS) topic.
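The requirement boils down to comparing what has arrived in S3 against what is expected and publishing a message for anything missing. A minimal sketch of that check in Python (the expected-key list, the daily prefix, and the message format are illustrative assumptions, not part of the question):

```python
def find_missing(expected_keys, uploaded_keys):
    """Return the expected S3 object keys that have not been uploaded yet."""
    return sorted(set(expected_keys) - set(uploaded_keys))

def build_sns_message(missing_keys):
    """Build a human-readable message identifying incomplete data."""
    if not missing_keys:
        return None  # data is complete; nothing to notify
    return "Incomplete daily data. Missing objects: " + ", ".join(missing_keys)

# Illustrative example: two of three expected daily files have arrived.
expected = ["daily/2024-01-01/orders.csv",
            "daily/2024-01-01/customers.csv",
            "daily/2024-01-01/payments.csv"]
uploaded = ["daily/2024-01-01/orders.csv",
            "daily/2024-01-01/customers.csv"]

message = build_sns_message(find_missing(expected, uploaded))
# In the actual solution this message would be published to the existing
# SNS topic, e.g. boto3.client("sns").publish(TopicArn=..., Message=message).
print(message)
```

The real check would list the bucket (or query the Data Catalog partition index) instead of using hard-coded lists; the point is only that the notification identifies the specific missing data.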

Which solution will meet this requirement with the LEAST operational overhead?

Suggested Answer: C

AWS Glue workflows are designed to orchestrate the ETL pipeline, and you can add data quality checks to confirm that the uploaded datasets are complete before the report runs. If a dataset is incomplete, the workflow can emit an Amazon EventBridge event that sends a message to the existing SNS topic.

AWS Glue Workflows:

AWS Glue workflows let you automate and monitor complex ETL processes. You can include data quality actions that check for null values, data types, and other consistency issues.
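As a hedged illustration, Glue Data Quality rules are written in its Data Quality Definition Language (DQDL); a ruleset along these lines (the thresholds and column names are assumptions for this example) could flag an incomplete daily upload:

```
Rules = [
    RowCount > 1000,
    IsComplete "order_id",
    ColumnCount = 12
]
```

A `RowCount` rule is a natural fit here, since a partial daily upload typically shows up as far fewer rows than expected.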

In the event of incomplete data, an EventBridge event can be generated to notify via SNS.
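The routing step can be illustrated with the event pattern an EventBridge rule would match. Glue Data Quality publishes evaluation results as events from the `aws.glue-dataquality` source; the detail-type string and `state` field below follow the documented event shape, but verify them against the current schema before relying on this sketch:

```python
import json

# Hedged sketch: an EventBridge event pattern matching failed Glue Data
# Quality evaluations, so only incomplete-data runs trigger the SNS message.
event_pattern = {
    "source": ["aws.glue-dataquality"],
    "detail-type": ["Data Quality Evaluation Results Available"],
    "detail": {"state": ["FAILED"]},
}

# EventBridge expects the pattern as a JSON string when the rule is created,
# e.g. boto3.client("events").put_rule(Name=..., EventPattern=pattern_json),
# with the existing SNS topic attached via put_targets.
pattern_json = json.dumps(event_pattern)
print(pattern_json)
```

Because the rule targets the existing SNS topic directly, no extra compute has to run for the notification, which is what keeps the operational overhead low.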


Alternatives Considered:

A (Airflow cluster): Managed Airflow introduces more operational overhead and complexity compared to Glue workflows.

B (EMR cluster): Setting up an EMR cluster is also more complex compared to the Glue-centric solution.

D (Lambda functions): While Lambda functions can work, Glue workflows offer a more integrated solution with lower operational overhead.

AWS Glue Workflow Documentation

Contribute your Thoughts:

Johana
14 days ago
That's a good point, option C does seem like a simpler solution with less operational overhead.
upvoted 0 times
Jennifer
18 days ago
I'm just hoping the data engineer has a good sense of humor. Imagine getting that SNS notification every time there's a data hiccup - 'Houston, we have a problem... and it's in the cloud!'
upvoted 0 times
Gene
7 days ago
I know right, that SNS notification would definitely keep things interesting!
upvoted 0 times
Micheal
19 days ago
Ah, the age-old debate - Airflow or EMR? Option A and B both have their merits, but I'm just glad I don't have to make that call. As long as it works, I'm happy!
upvoted 0 times
Nguyet
23 days ago
I like the idea of using Lambda functions in Option D, but orchestrating the whole thing through Step Functions feels a bit overkill. Maybe a simpler Lambda-based solution could work just as well.
upvoted 0 times
Brett
23 days ago
I disagree, I believe option C is the most efficient as it uses AWS Glue workflows and EventBridge to handle data quality checks.
upvoted 0 times
Johana
27 days ago
I think option A is the best choice because it uses Apache Airflow to run data quality checks and send notifications.
upvoted 0 times
Sheridan
1 month ago
Option C seems the most straightforward. Integrating the data quality checks directly into the Glue workflow and using EventBridge to trigger the notification is a nice clean solution.
upvoted 0 times
Anjelica
3 days ago
Using EventBridge to trigger the notification is a smart move.
upvoted 0 times
Phyliss
10 days ago
I agree, integrating the data quality checks into the Glue workflow seems efficient.
upvoted 0 times
Margurite
17 days ago
I think option C is the best choice here.
upvoted 0 times
