
Amazon Exam MLS-C01 Topic 1 Question 107 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 107
Topic #: 1

An ecommerce company has developed an XGBoost model in Amazon SageMaker to predict whether a customer will return a purchased item. The dataset is imbalanced: only 5% of customers return items.

A data scientist must find the hyperparameters to capture as many instances of returned items as possible. The company has a small budget for compute.

How should the data scientist meet these requirements MOST cost-effectively?

Suggested Answer: B

The best way to meet the requirements is to tune the csv_weight and scale_pos_weight hyperparameters by using automatic model tuning (AMT) and to optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:f1", "Type": "Maximize"}}.

The csv_weight hyperparameter tells the built-in XGBoost algorithm to read per-instance weights from the training data supplied in CSV format. This helps with imbalanced data because minority-class examples can be given higher weights than majority-class examples. The scale_pos_weight hyperparameter controls the balance of positive and negative weights; it is typically set to the ratio of negative examples to positive examples, and raising it increases the importance of the positive class and improves recall. Together, these two hyperparameters help the XGBoost model capture as many instances of returned items as possible.
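
For reference, here is a minimal sketch (not part of the question) of setting these two hyperparameters directly on the SageMaker built-in XGBoost container with the Python SDK. The role ARN, bucket, and file names are placeholders, and note that the SageMaker hyperparameter reference spells the weight flag csv_weights:

```python
# Sketch only: compute scale_pos_weight from the training labels and pass it,
# together with the csv_weights flag, to the SageMaker built-in XGBoost container.
import pandas as pd
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role

# First column = label (1 = item returned, 0 = not returned), per the built-in
# XGBoost CSV convention; with a 5% return rate the ratio is roughly 95/5 = 19.
train = pd.read_csv("train.csv", header=None)  # placeholder local copy of the data
labels = train[0]
scale_pos_weight = (labels == 0).sum() / (labels == 1).sum()

image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")
xgb = Estimator(
    image_uri=image_uri,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/xgb-returns/output",  # placeholder bucket
    sagemaker_session=session,
)
xgb.set_hyperparameters(
    objective="binary:logistic",
    num_round=200,
    scale_pos_weight=scale_pos_weight,  # up-weight the rare "returned" class
    csv_weights=1,                      # read per-instance weights from the 2nd CSV column
)
```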

Automatic model tuning (AMT) is a feature of Amazon SageMaker that automates the process of finding the best hyperparameter values for a machine learning model. By default, AMT uses Bayesian optimization to search the hyperparameter space, evaluating model performance against a predefined objective metric. The objective metric is the metric that AMT tries to optimize by adjusting the hyperparameter values. For imbalanced classification problems, accuracy is not a good objective metric, as it can be misleading and biased towards the majority class. A better objective metric is the F1 score, which is the harmonic mean of precision and recall. The F1 score reflects the balance between precision and recall and is more suitable for imbalanced data. The F1 score ranges from 0 to 1, where 1 is the best possible value. Therefore, the type of the objective should be "Maximize" to achieve the highest F1 score.
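
To make the accuracy-versus-F1 point concrete, here is a small illustrative check (not from the question) on a synthetic label vector with roughly 5% positives, using scikit-learn:

```python
# A "model" that always predicts "not returned" scores ~95% accuracy on a
# 5%-positive dataset but has an F1 of 0, because it never captures a single
# returned item (recall = 0).
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
y_true = (rng.random(10_000) < 0.05).astype(int)  # ~5% positives (returned items)
y_pred = np.zeros_like(y_true)                    # always predict the majority class

print(f"accuracy = {accuracy_score(y_true, y_pred):.3f}")               # ~0.95
print(f"f1       = {f1_score(y_true, y_pred, zero_division=0):.3f}")    # 0.000
```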

By tuning the csv_weight and scale_pos_weight hyperparameters and optimizing on the F1 score, the data scientist can meet the requirements most cost-effectively. This solution requires tuning only two hyperparameters, which can reduce the computation time and cost compared to tuning all possible hyperparameters. This solution also uses the appropriate objective metric for imbalanced classification, which can improve the model performance and capture more instances of returned items.
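
As a rough illustration of this approach, the sketch below (again with placeholder S3 paths, reusing the xgb estimator from the earlier sketch) lets AMT search only the two hyperparameters and maximize validation:f1. It assumes the algorithm version in use emits the validation:f1 metric, as the suggested answer implies, and that the training CSV carries per-instance weights in its second column when the flag is enabled:

```python
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import CategoricalParameter, ContinuousParameter, HyperparameterTuner

tuner = HyperparameterTuner(
    estimator=xgb,                          # estimator from the previous sketch
    objective_metric_name="validation:f1",  # assumed to be emitted by the algorithm
    objective_type="Maximize",
    hyperparameter_ranges={
        # Search around the class ratio (~95/5 = 19 for a 5% positive rate).
        "scale_pos_weight": ContinuousParameter(1, 100),
        # csv_weights is a 0/1 flag; the answer names it, so expose it as a choice.
        "csv_weights": CategoricalParameter([0, 1]),
    },
    max_jobs=10,            # a small search budget keeps compute cost down
    max_parallel_jobs=2,
)

tuner.fit({
    "train": TrainingInput("s3://my-bucket/xgb-returns/train.csv", content_type="text/csv"),
    "validation": TrainingInput("s3://my-bucket/xgb-returns/validation.csv", content_type="text/csv"),
})
print(tuner.best_training_job())
```

Limiting the search to two hyperparameters with a small max_jobs budget keeps the number of training jobs, and therefore the compute cost, low compared with tuning every available hyperparameter.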

References:

* XGBoost Hyperparameters

* Automatic Model Tuning

* How to Configure XGBoost for Imbalanced Classification

* Imbalanced Data


Contribute your Thoughts:

Francis
13 days ago
I'd go with option B. It's like a cheat code for imbalanced datasets - just tweak those key hyperparameters and watch the returns roll in!
upvoted 0 times
Mitzie
3 days ago
I agree, focusing on csv_weight and scale_pos_weight seems like the best strategy for imbalanced datasets.
upvoted 0 times
...
Chu
4 days ago
Option B sounds like the way to go. Tweak those hyperparameters for maximum returns!
upvoted 0 times
...
...
Nydia
1 month ago
Option A looks good, but it might be a bit too ambitious for a small budget. Why not let the computer do the heavy lifting, right?
upvoted 0 times
Annabelle
16 days ago
A) Tune all possible hyperparameters by using automatic model tuning (AMT). Optimize on {HyperParameterTuningJobObjective: {MetricName: validation:accuracy, Type: Maximize}}
upvoted 0 times
...
...
Antione
1 month ago
I agree with Alaine, maximizing accuracy is crucial in this case.
upvoted 0 times
...
Alaine
1 month ago
I think option A is the best choice.
upvoted 0 times
...
Antonette
1 month ago
Option D is interesting, but minimizing F1 score doesn't really make sense. I think B is the way to go.
upvoted 0 times
Osvaldo
6 days ago
Yeah, maximizing the F1 score is important for capturing instances of returned items.
upvoted 0 times
...
Jerry
7 days ago
Let's go with option B then, it seems like the most cost-effective approach.
upvoted 0 times
...
Jessenia
15 days ago
I think B is a good option because it focuses on maximizing the F1 score.
upvoted 0 times
...
Marla
25 days ago
I agree, option D doesn't seem like the best choice.
upvoted 0 times
...
...
Miss
2 months ago
Optimizing for accuracy doesn't seem like the right approach here. F1 score is a better metric to capture the minority class (returned items).
upvoted 0 times
Alyce
29 days ago
F1 score is a better metric to capture the minority class (returned items).
upvoted 0 times
...
Jerry
1 month ago
Optimizing for accuracy doesn't seem like the right approach here.
upvoted 0 times
...
Cora
1 month ago
B) Tune the csv_weight hyperparameter and the scale_pos_weight hyperparameter by using automatic model tuning (AMT). Optimize on {HyperParameterTuningJobObjective: {MetricName: validation:f1, Type: Maximize}}
upvoted 0 times
...
Cherri
1 month ago
A) Tune all possible hyperparameters by using automatic model tuning (AMT). Optimize on {HyperParameterTuningJobObjective: {MetricName: validation:accuracy, Type: Maximize}}
upvoted 0 times
...
...
Tora
2 months ago
Hmm, tuning all hyperparameters seems like overkill for a small budget. I'd go with option B to focus on the relevant hyperparameters and optimize for F1 score.
upvoted 0 times
Amalia
18 days ago
Definitely, optimizing for F1 score will help capture more instances of returned items without breaking the budget.
upvoted 0 times
...
Ernestine
19 days ago
Agreed, it's a more targeted approach that should be cost-effective for the company.
upvoted 0 times
...
Josefa
23 days ago
Option B sounds like the best choice. Focusing on specific hyperparameters and optimizing for F1 score is key.
upvoted 0 times
...
...
