Amazon MLS-C01 Exam - Topic 4 Question 115 Discussion

Actual exam question for Amazon's MLS-C01 exam

Question #: 115
Topic #: 4

A company wants to create a data repository in the AWS Cloud for machine learning (ML) projects. The company wants to use AWS to perform complete ML lifecycles and wants to use Amazon S3 for the data storage. All of the company's data currently resides on premises and is 40 TB in size.

The company wants a solution that can transfer and automatically update data between the on-premises object storage and Amazon S3. The solution must support encryption, scheduling, monitoring, and data integrity validation.

Which solution meets these requirements?

A. Use the S3 sync command to compare the source S3 bucket and the destination S3 bucket. Determine which source files do not exist in the destination S3 bucket and which source files were modified.

B. Use AWS Transfer for FTPS to transfer the files from the on-premises storage to Amazon S3.

C. Use AWS DataSync to make an initial copy of the entire dataset. Schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS.

D. Use S3 Batch Operations to pull data periodically from the on-premises storage. Enable S3 Versioning on the S3 bucket to protect against accidental overwrites.

Suggested Answer: C

The best solution to meet the requirements of the company is to use AWS DataSync to make an initial copy of the entire dataset, and schedule subsequent incremental transfers of changing data until the final cutover from on premises to AWS. This is because:

AWS DataSync is an online data movement and discovery service that simplifies data migration and helps you quickly, easily, and securely transfer your file or object data to, from, and between AWS storage services [1]. AWS DataSync can copy data between on-premises object storage and Amazon S3, and also supports encryption, scheduling, monitoring, and data integrity validation [1].

AWS DataSync can make an initial copy of the entire dataset by using a DataSync agent, which is a software appliance that connects to your on-premises storage and manages the data transfer to AWS [2]. The DataSync agent can be deployed as a virtual machine (VM) on your existing hypervisor, or as an Amazon EC2 instance in your AWS account [2].

AWS DataSync can schedule subsequent incremental transfers of changing data by using a task, which is a configuration that specifies the source and destination locations, the options for the transfer, and the schedule for the transfer [3]. You can create a task to run once or on a recurring schedule, and you can also use filters to include or exclude specific files or objects based on their names or prefixes [3].

AWS DataSync supports the final cutover from on premises to AWS because each run of a task synchronizes the data in the source and destination locations [4]. By default, a run transfers only the data that has changed or that doesn't exist in the destination. The task can also be configured to delete files or objects from the destination that were deleted from the source since the last run [4].
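The task settings described above can be sketched as the request that would be passed to the DataSync `create_task` call in boto3. This is a minimal illustration, not the only valid configuration: the task name, the nightly cron expression, the excluded paths, and the ARNs supplied by the caller are all hypothetical placeholders. The function only builds the request dictionary, so it runs without AWS credentials.

```python
def build_datasync_task_request(source_location_arn, destination_location_arn, log_group_arn):
    """Build keyword arguments for boto3's DataSync create_task call.

    All ARNs are supplied by the caller; the name, schedule, and
    exclude pattern below are hypothetical example values.
    """
    return {
        "Name": "onprem-to-s3-ml-data",  # hypothetical task name
        "SourceLocationArn": source_location_arn,        # on-premises object storage location
        "DestinationLocationArn": destination_location_arn,  # Amazon S3 location
        # Monitoring: send task logs to a CloudWatch Logs log group.
        "CloudWatchLogGroupArn": log_group_arn,
        "Options": {
            # Incremental behavior: copy only data that differs from the destination.
            "TransferMode": "CHANGED",
            # Data integrity validation: verify checksums of the transferred data.
            "VerifyMode": "ONLY_FILES_TRANSFERRED",
            # Keep destination objects even if they were deleted at the source.
            # Use "REMOVE" to mirror deletions instead.
            "PreserveDeletedFiles": "PRESERVE",
            # Log each transferred object.
            "LogLevel": "TRANSFER",
        },
        # Scheduling: run every day at 02:00 UTC (assumed schedule).
        "Schedule": {"ScheduleExpression": "cron(0 2 * * ? *)"},
        # Filters: skip hypothetical scratch paths; patterns are separated by "|".
        "Excludes": [{"FilterType": "SIMPLE_PATTERN", "Value": "/tmp|/scratch"}],
    }


# Usage (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# request = build_datasync_task_request(src_arn, dst_arn, log_arn)
# boto3.client("datasync").create_task(**request)
```

Encryption does not appear as a task option here because DataSync encrypts data in transit with TLS by default; encryption at rest would be configured on the S3 bucket itself.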

Therefore, by using AWS DataSync, the company can create a data repository in the AWS Cloud for machine learning projects, and use Amazon S3 for the data storage, while meeting the requirements of encryption, scheduling, monitoring, and data integrity validation.
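For a sense of scale on the initial copy, the following back-of-the-envelope sketch estimates how long 40 TB takes to move over a network link. The link speeds and the 80% utilization factor are assumptions for illustration, not figures from the question, and the estimate treats 1 TB as 10^12 bytes and ignores per-object overhead.

```python
def estimate_transfer_days(size_tb, link_gbps, utilization=0.8):
    """Estimate transfer time in days for size_tb terabytes over a link.

    Assumes decimal units (1 TB = 1e12 bytes, 1 Gbps = 1e9 bits/s) and that
    only the given fraction of the link is available to the transfer.
    """
    size_bits = size_tb * 1e12 * 8
    effective_bits_per_second = link_gbps * 1e9 * utilization
    seconds = size_bits / effective_bits_per_second
    return seconds / 86_400  # seconds per day


if __name__ == "__main__":
    for gbps in (1, 10):  # hypothetical link speeds
        days = estimate_transfer_days(40, gbps)
        print(f"40 TB over {gbps} Gbps at 80% utilization: about {days:.1f} days")
```

Under these assumptions the initial copy takes several days on a 1 Gbps link and well under a day on a 10 Gbps link, after which the scheduled incremental runs only move changed data.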

References:

1. Data Transfer Service - AWS DataSync

2. Deploying a DataSync Agent

3. Creating a Task

4. Syncing Data with AWS DataSync

by Hannah at Apr 09, 2025, 03:59 AM

Jutta

3 months ago

D sounds interesting, but I'm not sure it covers all the requirements.

upvoted 0 times

...

Michell

3 months ago

I think A could work, but it seems a bit manual.

upvoted 0 times

...

Detra

4 months ago

Wait, can AWS DataSync really handle 40 TB efficiently?

upvoted 0 times

...

Harris

4 months ago

Totally agree with C! It’s designed for this kind of task.

upvoted 0 times

...

Louvenia

4 months ago

C is definitely the best choice for this scenario.

upvoted 0 times

...

Arminda

4 months ago

S3 Batch Operations sounds interesting, but I don't recall it being designed for continuous updates. I wonder if it can really meet the need for automatic data syncing.

upvoted 0 times

...

Zana

4 months ago

AWS Transfer for FTPS was mentioned in one of our practice questions, but I feel like it might not cover all the requirements, especially the scheduling and monitoring aspects.

upvoted 0 times

...

Geoffrey

5 months ago

I think the S3 sync command could work, but it sounds more manual than what they need. They mentioned automatic updates, and I'm not confident it can handle that well.

upvoted 0 times

...

Lanie

5 months ago

I remember studying about AWS DataSync and its ability to handle large data transfers efficiently. It seems like a good fit for this scenario, but I'm not entirely sure about the encryption part.

upvoted 0 times

...

Kimberely

5 months ago

I'm leaning towards option C with AWS DataSync. It seems to cover the key requirements like encryption, scheduling, and monitoring for the data transfer process. The initial full copy and then incremental updates also make sense for this use case.

upvoted 0 times

...

Krissy

5 months ago

Option B using AWS Transfer for FTPS could work, but I'm not sure if it supports all the required features like automatic updates and data integrity validation. I'll have to research that option further.

upvoted 0 times

...

Starr

5 months ago

Hmm, I'm a bit confused. The question mentions encryption, scheduling, and monitoring, but doesn't specify if those are required for the data transfer itself or for the overall ML lifecycle. I'll need to re-read the question carefully.

upvoted 0 times

...

Aileen

5 months ago

This seems like a straightforward data migration problem. I think option C using AWS DataSync is the best solution to meet the requirements.

upvoted 0 times

...

Beatriz

11 months ago

Haha, I bet the person who came up with option B has never actually used FTPS before. That's just asking for trouble!

upvoted 0 times

Fatima

10 months ago

C: Agreed, using AWS DataSync for incremental transfers seems like a much safer option.

upvoted 0 times

...

Edgar

10 months ago

B: Definitely, I wouldn't want to rely on that for such an important task.

upvoted 0 times

...

Beatriz

10 months ago

A: Yeah, using FTPS for transferring data sounds risky.

upvoted 0 times

...

Arthur

11 months ago

I agree, C is the best choice here. AWS DataSync seems like the perfect fit for this use case.

upvoted 0 times

...

Brunilda

11 months ago

But what about enabling S3 Versioning for data protection in option D?

upvoted 0 times

...

Desmond

11 months ago

I agree. Using AWS DataSync for initial copy and incremental transfers seems efficient.

upvoted 0 times

...

Nieves

11 months ago

Option C looks the most comprehensive. It covers all the requirements like initial data transfer, incremental updates, scheduling, and data integrity.

upvoted 0 times