Welcome to Pass4Success

Amazon Exam MLS-C01 Topic 1 Question 85 Discussion

Actual exam question for Amazon's MLS-C01 exam
Question #: 85
Topic #: 1

An ecommerce company has developed an XGBoost model in Amazon SageMaker to predict whether a customer will return a purchased item. The dataset is imbalanced. Only 5% of customers return items.

A data scientist must find the hyperparameters to capture as many instances of returned items as possible. The company has a small budget for compute.

How should the data scientist meet these requirements MOST cost-effectively?

Suggested Answer: B

The best solution to meet the requirements is to tune the csv_weight hyperparameter and the scale_pos_weight hyperparameter by using automatic model tuning (AMT). Optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:f1", "Type": "Maximize"}}.
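As a rough sketch, the objective above could sit inside a tuning job configuration like the one below. The dictionary follows the shape of the HyperParameterTuningJobConfig argument of the SageMaker CreateHyperParameterTuningJob API. The search range for scale_pos_weight, the job limits, and the helper name build_tuning_config are illustrative assumptions, not values from the question. The hyperparameter names are copied as spelled in the suggested answer; check the exact spelling and whether each one is tunable against the XGBoost hyperparameter reference. Nothing here calls AWS; the code only builds and prints the dictionary.

```python
import json


def build_tuning_config(max_jobs=10, max_parallel=2):
    """Build a HyperParameterTuningJobConfig-shaped dict (hypothetical helper, no AWS call)."""
    return {
        "Strategy": "Bayesian",
        # Objective taken directly from the suggested answer.
        "HyperParameterTuningJobObjective": {
            "MetricName": "validation:f1",
            "Type": "Maximize",
        },
        # Assumed limits: capping the number of training jobs caps compute spend.
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        # Only the two hyperparameters named in the answer are searched.
        # The range bounds are assumptions; the API expects them as strings.
        "ParameterRanges": {
            "ContinuousParameterRanges": [
                {
                    "Name": "scale_pos_weight",
                    "MinValue": "1",
                    "MaxValue": "40",
                    "ScalingType": "Auto",
                }
            ],
            "CategoricalParameterRanges": [
                {"Name": "csv_weight", "Values": ["0", "1"]}
            ],
            "IntegerParameterRanges": [],
        },
    }


config = build_tuning_config()
print(json.dumps(config["HyperParameterTuningJobObjective"], indent=2))
```

Keeping the search to two parameters and a small job cap is what keeps the cost down; widening either one trades budget for a more thorough search.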

The csv_weight hyperparameter is a flag that, when enabled, tells XGBoost to read per-instance weights from the CSV training data (the column after the label). This can help handle imbalanced data by assigning higher weights to the minority class examples and lower weights to the majority class examples. The scale_pos_weight hyperparameter controls the balance of positive and negative weights. A typical starting value is the ratio of the number of negative class examples to the number of positive class examples, which is 95/5 = 19 when only 5% of customers return items. Setting a higher value for this hyperparameter increases the importance of the positive class and tends to improve recall, usually at some cost in precision. Both of these hyperparameters can help the XGBoost model capture as many instances of returned items as possible.
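To make the ratio concrete, here is a minimal sketch that computes a starting scale_pos_weight from class counts and builds matching per-instance weights. The function names and the 100-row toy label list are assumptions made up for illustration; only the 5% return rate comes from the question.

```python
def scale_pos_weight_from_counts(n_negative, n_positive):
    """Return the negative-to-positive ratio used as a starting scale_pos_weight."""
    if n_positive <= 0:
        raise ValueError("need at least one positive example")
    return n_negative / n_positive


def instance_weights(labels, pos_weight):
    """Give each positive example weight pos_weight and each negative example weight 1."""
    return [pos_weight if y == 1 else 1.0 for y in labels]


# Toy dataset: 5 returned items (label 1) out of 100 purchases.
labels = [1] * 5 + [0] * 95
n_pos = sum(labels)
n_neg = len(labels) - n_pos

ratio = scale_pos_weight_from_counts(n_neg, n_pos)
weights = instance_weights(labels, ratio)

# Total weight carried by each class after reweighting.
pos_total = sum(w for w, y in zip(weights, labels) if y == 1)
neg_total = sum(w for w, y in zip(weights, labels) if y == 0)
print(ratio, pos_total, neg_total)
```

With this weighting the two classes carry equal total weight, which is the intuition behind using the ratio as a starting point before tuning around it.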

Automatic model tuning (AMT) is a feature of Amazon SageMaker that automates the process of finding the best hyperparameter values for a machine learning model. By default, AMT uses Bayesian optimization to search the hyperparameter space and evaluates each candidate against a predefined objective metric. The objective metric is the metric that AMT tries to optimize by adjusting the hyperparameter values.

For imbalanced classification problems, accuracy is not a good objective metric, because it is dominated by the majority class: a model that never predicts a return is still 95% accurate on this dataset. A better objective metric is the F1 score, which is the harmonic mean of precision and recall. The F1 score reflects the balance between precision and recall and is more suitable for imbalanced data. It ranges from 0 to 1, where 1 is the best possible value. Therefore, the type of the objective should be "Maximize" to achieve the highest F1 score.
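The difference between accuracy and F1 on a 5% positive class can be checked with a small worked example. The two sets of predictions below are invented for illustration, and the helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall, and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# 5 returns (label 1) followed by 95 non-returns (label 0).
y_true = [1] * 5 + [0] * 95

# Model A never predicts a return.
pred_a = [0] * 100
# Model B catches 4 of the 5 returns and raises 8 false alarms.
pred_b = [1, 1, 1, 1, 0] + [1] * 8 + [0] * 87

acc_a, f1_a = accuracy(y_true, pred_a), precision_recall_f1(y_true, pred_a)[2]
acc_b, f1_b = accuracy(y_true, pred_b), precision_recall_f1(y_true, pred_b)[2]
print(f"A: accuracy={acc_a:.2f} f1={f1_a:.2f}")
print(f"B: accuracy={acc_b:.2f} f1={f1_b:.2f}")
```

Model A wins on accuracy (0.95 against 0.91) while finding no returns at all, whereas Model B has the higher F1 (about 0.47 against 0.00), which is why F1 is the more useful objective here.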

By tuning the csv_weight and scale_pos_weight hyperparameters and optimizing on the F1 score, the data scientist can meet the requirements most cost-effectively. This solution tunes only two hyperparameters, so the search space is small and fewer training jobs are needed than when tuning all possible hyperparameters, which reduces computation time and cost. It also uses an objective metric suited to imbalanced classification, which can improve the model performance and capture more instances of returned items.


Contribute your Thoughts:

Charlene
7 months ago
You know, I was leaning towards option C as well. But then I remembered that the company has a small budget, and tuning all the hyperparameters might be overkill. Maybe we could compromise and try option B first, and if that's not cutting it, we can always go back and tune everything. But let's not get too fancy, we need to keep this simple and cost-effective.
Valentine
7 months ago
Hmm, I'm not so sure. Optimizing for F1 score is a good idea, but I'm not convinced that just tuning the csv_weight and scale_pos_weight hyperparameters is enough. What if there are other hyperparameters that could really boost the model's performance on the minority class? I might lean more towards option C, where we tune all the hyperparameters but still optimize for F1 score.
Lavera
7 months ago
C
Marcelle
7 months ago
C
Anissa
7 months ago
C
Elinore
7 months ago
C
Nana
7 months ago
I agree, option B seems like the most cost-effective approach. Tuning all the hyperparameters might be overkill, and the company has a small budget. Focusing on the key hyperparameters that can help with the imbalanced dataset is a smart move. And optimizing for F1 score instead of just accuracy is a good call.
Loreta
7 months ago
Hmm, this is a tricky one. The dataset is imbalanced, with only 5% of customers returning items. We need to find a way to capture as many of those returned items as possible, but the company has a small budget for compute. I think option B might be the way to go. Tuning the csv_weight and scale_pos_weight hyperparameters specifically could help us focus on the minority class and maximize the F1 score, which is a good balance between precision and recall.
