Microsoft DP-100 Exam - Topic 8 Question 16 Discussion

Actual exam question for Microsoft's DP-100 exam
Question #: 16
Topic #: 8

A set of CSV files contains sales records. All the CSV files have the same data schema.

Each CSV file contains the sales records for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales to create the following hierarchical structure:

At the end of each month, a new folder with that month's sales file is added to the sales folder.

You plan to use the sales data to train a machine learning model based on the following requirements:

You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe.

You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month.

You must register the minimum number of datasets possible.

You need to register the sales data as a dataset in the Azure Machine Learning service workspace.

What should you do?

Suggested Answer: B

Specify the path.

Example:

The following code gets the existing workspace and the desired datastore by name, then passes the datastore and file locations to the path parameter to create a new TabularDataset, weather_ds.

from azureml.core import Workspace, Datastore, Dataset

datastore_name = 'your datastore name'

# get existing workspace
workspace = Workspace.from_config()

# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)

# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]

weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
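The answer options are not reproduced on this page, so the following is only a sketch of the wildcard idea that the example above and some of the comments below refer to. It assumes a path pattern such as sales/*/sales.csv and hypothetical month folder names (2019-01, 2019-02, 2019-03). It uses Python's standard glob module on a temporary local directory as a stand-in for the datastore; it is not the Azure Machine Learning API, and the service's own wildcard matching may differ in detail.

```python
import glob
import os
import tempfile


def add_month(root, folder):
    """Create sales/<folder>/sales.csv with a tiny placeholder record."""
    month_dir = os.path.join(root, "sales", folder)
    os.makedirs(month_dir, exist_ok=True)
    with open(os.path.join(month_dir, "sales.csv"), "w") as f:
        f.write("id,amount\n1,10\n")


def matching_sales_files(root, pattern="sales/*/sales.csv"):
    """Return the paths (relative to root) that match the wildcard pattern."""
    matches = glob.glob(os.path.join(root, *pattern.split("/")))
    return sorted(os.path.relpath(m, root).replace(os.sep, "/") for m in matches)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical folder names; the real naming scheme is not shown above.
    add_month(root, "2019-01")
    add_month(root, "2019-02")
    before = matching_sales_files(root)

    # A new month arrives: the same pattern picks it up with no changes.
    add_month(root, "2019-03")
    after = matching_sales_files(root)

print(before)
print(after)
```

The point of the sketch is that one path definition keeps covering newly added month folders, which is why a single wildcard-based dataset can satisfy the "minimum number of datasets" requirement.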


Alpha
4 months ago
D sounds like a good approach for version control!
upvoted 0 times
...
Selma
4 months ago
Not sure if B will work as expected with future data.
upvoted 0 times
...
Chaya
4 months ago
Wait, why not just use A? Seems simpler!
upvoted 0 times
...
Yong
4 months ago
I agree, B is definitely the way to go.
upvoted 0 times
...
Maryann
4 months ago
Option B seems the most efficient!
upvoted 0 times
...
Emmanuel
4 months ago
I recall that we should aim to minimize the number of datasets registered. Option B seems straightforward and meets the requirements without extra hassle.
upvoted 0 times
...
Alaine
5 months ago
I practiced a similar question where we had to manage datasets efficiently. I think option D could work since it allows versioning, but it seems a bit complicated.
upvoted 0 times
...
Aileen
5 months ago
I'm not entirely sure, but I feel like specifying each file every month could lead to a lot of unnecessary registrations. Maybe option A is too cumbersome?
upvoted 0 times
...
Thaddeus
5 months ago
I remember we discussed the importance of using a wildcard path for datasets to avoid registering multiple versions. I think option B might be the right choice.
upvoted 0 times
...
Alyce
5 months ago
This is a good opportunity to apply my knowledge of Python syntax and control flow. I'll work through it systematically.
upvoted 0 times
...
Georgeanna
5 months ago
This seems to be a certificate-related problem. I'm thinking the solution might be to add a server certificate to the Expressway-C that is signed by a certificate authority.
upvoted 0 times
...
Christa
5 months ago
This seems like a pretty straightforward security question. I'd focus on option D - considering proper authentication options. That's probably the most important thing to address before allowing external access.
upvoted 0 times
...
Kanisha
5 months ago
Hmm, I'm a bit unsure about the details of the default Solaris 11 configuration. I'll need to think through each option carefully and try to eliminate the incorrect ones.
upvoted 0 times
...
