
Amazon Exam DAS-C01 Topic 2 Question 95 Discussion

Actual exam question for Amazon's DAS-C01 exam
Question #: 95
Topic #: 2

A company receives datasets from partners at various frequencies. The datasets include baseline data and incremental data. The company needs to merge and store all the datasets without reprocessing the data.

Which solution will meet these requirements with the LEAST development effort?

Suggested Answer: B
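
Editor's note: the answer options are not reproduced above, but the comments below quote option B as splitting the load files into a multiple of the cluster's slice count, compressing them, uploading them to Amazon S3, and running a single COPY command. As an illustration only, here is a minimal Python sketch of that workflow under stated assumptions; every name in it (bucket, prefix, table, IAM role, connection settings) is a placeholder, not something taken from the exam question.

```python
"""
Illustrative sketch only: split a load file into a multiple of the
cluster's slice count, gzip each part, upload to Amazon S3, and load
everything with one COPY command. All names below (bucket, prefix,
table, IAM role, connection settings) are placeholders.
"""
import gzip

import boto3
import redshift_connector

SLICE_COUNT = 4                # e.g. from: SELECT COUNT(*) FROM stv_slices;
NUM_FILES = SLICE_COUNT * 2    # any multiple of the slice count works

BUCKET = "my-staging-bucket"                                   # placeholder
PREFIX = "partner-data/load/"                                  # placeholder
TABLE = "partner_data"                                         # placeholder
IAM_ROLE = "arn:aws:iam::123456789012:role/RedshiftCopyRole"   # placeholder


def split_and_compress(source_path: str) -> list[str]:
    """Round-robin the input rows into NUM_FILES gzip-compressed parts."""
    names = [f"part_{i:03d}.csv.gz" for i in range(NUM_FILES)]
    parts = [gzip.open(name, "wt") for name in names]
    with open(source_path) as src:
        for line_no, line in enumerate(src):
            parts[line_no % NUM_FILES].write(line)
    for part in parts:
        part.close()
    return names


def upload(paths: list[str]) -> None:
    """Put every part under the same S3 prefix so COPY picks them all up."""
    s3 = boto3.client("s3")
    for path in paths:
        s3.upload_file(path, BUCKET, PREFIX + path)


def run_copy() -> None:
    """COPY loads all objects under the prefix in parallel across slices."""
    conn = redshift_connector.connect(
        host="my-cluster.example.us-east-1.redshift.amazonaws.com",  # placeholder
        database="dev",
        user="awsuser",
        password="...",  # placeholder
    )
    cursor = conn.cursor()
    cursor.execute(
        f"COPY {TABLE} FROM 's3://{BUCKET}/{PREFIX}' "
        f"IAM_ROLE '{IAM_ROLE}' CSV GZIP;"
    )
    conn.commit()


if __name__ == "__main__":
    upload(split_and_compress("partner_dataset.csv"))
    run_copy()
```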

Contribute your Thoughts:

Farrah
9 months ago
That makes sense. I agree with you.
upvoted 0 times
...
Phung
9 months ago
Wait, are we sure these are all the right answers? I thought there was supposed to be a 'none of the above' option for these tricky AWS questions.
upvoted 0 times
...
Argelia
9 months ago
Because splitting the files to match the number of slices in the Redshift cluster will optimize the COPY process.
upvoted 0 times
...
Asuncion
9 months ago
Haha, Shawna's got a point. If you have a skewed distribution on that DISTKEY, option D could be a real game-changer. Gotta love those database optimization tricks!
upvoted 0 times
Deonna
8 months ago
Definitely, leveraging database optimization techniques is key in scenarios like this.
upvoted 0 times
...
Josephine
8 months ago
I agree, it's all about maximizing efficiency when dealing with large datasets.
upvoted 0 times
...
Elly
9 months ago
Yeah, sharding based on the DISTKEY columns could really improve performance.
upvoted 0 times
...
Cheryl
9 months ago
Option D sounds like a solid choice for optimizing the COPY process.
upvoted 0 times
...
...
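Editor's note on the sharding idea discussed in this thread (it reappears below as the quoted option D): the intent is that rows sharing a DISTKEY value end up in the same compressed file. A rough Python sketch follows, with the DISTKEY column position and the shard count as assumptions for illustration only.

```python
"""
Rough sketch of sharding by DISTKEY value before compressing and
uploading: rows with the same key land in the same file. The column
position and shard count are assumptions, not values from the question.
"""
import csv
import gzip

DISTKEY_COLUMN = 0   # assumed position of the DISTKEY column in the CSV
NUM_SHARDS = 8       # assumed number of output files


def shard_by_distkey(source_path: str) -> list[str]:
    """Hash each row's DISTKEY value so equal values go to the same shard."""
    files, writers = {}, {}
    with open(source_path, newline="") as src:
        for row in csv.reader(src):
            shard = hash(row[DISTKEY_COLUMN]) % NUM_SHARDS
            if shard not in files:
                files[shard] = gzip.open(f"shard_{shard:03d}.csv.gz", "wt", newline="")
                writers[shard] = csv.writer(files[shard])
            writers[shard].writerow(row)
    for handle in files.values():
        handle.close()
    return [f"shard_{s:03d}.csv.gz" for s in sorted(files)]
```

The compressed shards would then be uploaded to S3 and loaded with COPY in the same way as in the earlier sketch.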
Farrah
9 months ago
Why do you think that?
upvoted 0 times
...
Argelia
9 months ago
I think option B is the best solution.
upvoted 0 times
...
Shawna
10 months ago
Hold on, what if I have a really big DISTKEY column? Wouldn't option D be even better by sharding the files based on that?
upvoted 0 times
...
Norah
10 months ago
I was thinking the same thing. Compressing and uploading the files to S3 in a way that aligns with the Redshift architecture is a smart move.
upvoted 0 times
Darrin
9 months ago
Option D) Apply sharding by breaking up the files so that the DISTKEY columns with the same values go to the same file. Compress and upload the sharded files to Amazon S3. Run the COPY command on the files.
upvoted 0 times
...
Myra
9 months ago
Option B) Split the files so that the number of files is equal to a multiple of the number of slices in the Redshift cluster. Compress and upload the files to Amazon S3. Run the COPY command on the files.
upvoted 0 times
...
...
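Editor's note: both options quoted above turn on the cluster's slice count, which can be read from the STV_SLICES system view documented by Amazon Redshift. A quick check, with placeholder connection settings:

```python
"""Read the slice count so the load files can be split into a multiple of it."""
import redshift_connector

conn = redshift_connector.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",  # placeholder
    database="dev",
    user="awsuser",
    password="...",  # placeholder
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM stv_slices;")
slice_count = cursor.fetchone()[0]
print(f"Split the load into a multiple of {slice_count} files.")
```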
Amie
10 months ago
Option B sounds like the way to go. Splitting the files to match the number of slices in the Redshift cluster should definitely speed up the COPY process.
upvoted 0 times
Clorinda
9 months ago
Definitely, it's important to optimize the process for faster performance.
upvoted 0 times
...
Brock
9 months ago
Yeah, splitting the files to match the number of slices in the Redshift cluster makes a lot of sense.
upvoted 0 times
...
Valentin
9 months ago
I agree, option B seems like the most efficient way to accelerate the COPY process.
upvoted 0 times
...
...
