
Amazon Exam Amazon-DEA-C01 Topic 2 Question 7 Discussion

Actual exam question for Amazon's Amazon-DEA-C01 exam
Question #: 7
Topic #: 2

A company uses AWS Glue Data Catalog to index data that is uploaded to an Amazon S3 bucket every day. The company runs a daily batch process in an extract, transform, and load (ETL) pipeline to upload data from external sources into the S3 bucket.

The company runs a daily report on the S3 data. Some days, the company runs the report before all the daily data has been uploaded to the S3 bucket. A data engineer must be able to send a message that identifies any incomplete data to an existing Amazon Simple Notification Service (Amazon SNS) topic.

Which solution will meet this requirement with the LEAST operational overhead?

Suggested Answer: C

AWS Glue workflows are designed to orchestrate the ETL pipeline, and you can include data quality checks that verify the uploaded datasets are complete before the report runs. If a check detects a problem with the data, the workflow can emit an Amazon EventBridge event that sends a message to the SNS topic.
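As an illustration, an EventBridge rule that routes Glue Data Quality failures to an SNS target might use an event pattern like the one below. The `source` and `detail-type` values come from the Glue Data Quality EventBridge integration; the `state` filter field inside `detail` is an assumption and should be verified against the current event schema:

```json
{
  "source": ["aws.glue-dataquality"],
  "detail-type": ["Data Quality Evaluation Results Available"],
  "detail": {
    "state": ["FAILED"]
  }
}
```

With the existing SNS topic set as the rule's target, no additional infrastructure is needed, which is what keeps the operational overhead low.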

AWS Glue Workflows:

AWS Glue workflows let users automate and monitor complex ETL processes. You can include data quality actions that check for null values, validate data types, and run other consistency checks.

When incomplete data is detected, the workflow can generate an EventBridge event that publishes a notification through SNS.
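To make the mechanism concrete, here is a minimal sketch of the completeness check and notification step. All names (function names, bucket, topic ARN, object keys) are hypothetical; the pure logic is separated from the boto3 `publish` call so the check itself needs no AWS credentials:

```python
import json


def find_missing(expected_keys, uploaded_keys):
    """Return the expected S3 object keys that have not been uploaded yet."""
    return sorted(set(expected_keys) - set(uploaded_keys))


def build_incomplete_data_message(missing_keys, bucket):
    """Format a payload identifying the incomplete data for the SNS topic."""
    return json.dumps({
        "status": "INCOMPLETE",
        "bucket": bucket,
        "missingObjects": missing_keys,
    })


def notify_if_incomplete(expected_keys, uploaded_keys, bucket,
                         topic_arn, sns_client=None):
    """If any expected objects are missing, build the message and
    (optionally) publish it to the existing SNS topic.

    Pass sns_client=boto3.client("sns") when running inside the workflow;
    leave it as None to exercise the logic locally.
    """
    missing = find_missing(expected_keys, uploaded_keys)
    if not missing:
        return None
    message = build_incomplete_data_message(missing, bucket)
    if sns_client is not None:
        sns_client.publish(TopicArn=topic_arn, Message=message,
                           Subject="Daily ETL data incomplete")
    return message
```

In the actual solution this logic would run as the final node of the Glue workflow (or be replaced by a built-in Glue Data Quality ruleset), with EventBridge handling the hand-off to SNS.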


Alternatives Considered:

A (Airflow cluster): Managed Airflow introduces more operational overhead and complexity than Glue workflows.

B (EMR cluster): Setting up an EMR cluster is also more complex compared to the Glue-centric solution.

D (Lambda functions): While Lambda functions could work, Glue workflows offer a more integrated solution with lower operational overhead.

AWS Glue Workflow Documentation

Contribute your Thoughts:

Johana
3 months ago
That's a good point, option C does seem like a simpler solution with less operational overhead.
upvoted 0 times
...
Jennifer
3 months ago
I'm just hoping the data engineer has a good sense of humor. Imagine getting that SNS notification every time there's a data hiccup - 'Houston, we have a problem... and it's in the cloud!'
upvoted 0 times
Jamika
1 month ago
D: It's important to have a sense of humor when dealing with data hiccups in the cloud.
upvoted 0 times
...
Geraldo
2 months ago
C: I can only imagine the data engineer's reaction every time they get that notification.
upvoted 0 times
...
Ivory
2 months ago
B: Haha, 'Houston, we have a problem in the cloud' - that's a good one!
upvoted 0 times
...
Gene
3 months ago
A: I know right, that SNS notification would definitely keep things interesting!
upvoted 0 times
...
...
Micheal
3 months ago
Ah, the age-old debate - Airflow or EMR? Option A and B both have their merits, but I'm just glad I don't have to make that call. As long as it works, I'm happy!
upvoted 0 times
...
Nguyet
3 months ago
I like the idea of using Lambda functions in Option D, but orchestrating the whole thing through Step Functions feels a bit overkill. Maybe a simpler Lambda-based solution could work just as well.
upvoted 0 times
Earlean
2 months ago
Lucina: That's a good point, a simpler Lambda-based solution could still meet the requirement with less overhead.
upvoted 0 times
...
Lucina
2 months ago
User 2: I agree, but maybe we can simplify it by just using Lambda functions without Step Functions.
upvoted 0 times
...
Paola
2 months ago
User 1: Option D sounds good, using Lambda functions for data quality checks is efficient.
upvoted 0 times
...
...
Brett
3 months ago
I disagree, I believe option C is the most efficient as it uses AWS Glue workflows and EventBridge to handle data quality checks.
upvoted 0 times
...
Johana
3 months ago
I think option A is the best choice because it uses Apache Airflow to run data quality checks and send notifications.
upvoted 0 times
...
Sheridan
3 months ago
Option C seems the most straightforward. Integrating the data quality checks directly into the Glue workflow and using EventBridge to trigger the notification is a nice clean solution.
upvoted 0 times
Billye
2 months ago
Option C definitely has the least operational overhead.
upvoted 0 times
...
Anjelica
2 months ago
Using EventBridge to trigger the notification is a smart move.
upvoted 0 times
...
Phyliss
3 months ago
I agree, integrating the data quality checks into the Glue workflow seems efficient.
upvoted 0 times
...
Margurite
3 months ago
I think option C is the best choice here.
upvoted 0 times
...
...
