A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products.
Which solution will meet these requirements with the MOST operational efficiency?
Amazon SageMaker's Neural Topic Model (NTM) is designed to uncover underlying topics within text data by clustering documents based on topic similarity. For document categorization, NTM can identify product categories by analyzing and grouping the documents, making it an efficient choice for unsupervised learning where predefined categories do not exist.
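As a rough illustration of how topic output can become categories, the sketch below assumes each document already has a vector of topic weights (the kind of per-document output a topic model such as NTM produces) and assigns every document to its highest-weight topic. The document IDs, the weights, and the helper name `group_by_dominant_topic` are hypothetical.

```python
from collections import defaultdict


def group_by_dominant_topic(topic_weights):
    """Group document IDs by the index of their highest-weight topic."""
    groups = defaultdict(list)
    for doc_id, weights in topic_weights.items():
        # Index of the largest weight is treated as the document's category.
        dominant = max(range(len(weights)), key=lambda i: weights[i])
        groups[dominant].append(doc_id)
    return dict(groups)


# Hypothetical topic weights for three documents and three topics.
example_weights = {
    "doc-a": [0.70, 0.20, 0.10],
    "doc-b": [0.05, 0.15, 0.80],
    "doc-c": [0.60, 0.30, 0.10],
}

groups = group_by_dominant_topic(example_weights)
print(groups)
```

In practice, someone would still need to inspect the top words of each topic to give the resulting groups meaningful product-category names.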
A business to business (B2B) ecommerce company wants to develop a fair and equitable risk mitigation strategy to reject potentially fraudulent transactions. The company wants to reject fraudulent transactions despite the possibility of losing some profitable transactions or customers.
Which solution will meet these requirements with the LEAST operational effort?
Amazon Fraud Detector is a managed service designed to detect potentially fraudulent online activities, such as transactions. It uses machine learning and business rules to classify activities as fraudulent or legitimate, minimizing the need for custom model training. By using the Amazon Fraud Detector prediction API, the company can automatically approve legitimate transactions and reject those flagged as fraudulent, implementing an efficient risk mitigation strategy without extensive operational effort.
This approach requires minimal setup and effectively allows the company to block fraudulent transactions with high confidence, addressing the business's need to balance risk mitigation and customer impact.
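The decision step can be sketched as follows. This is a minimal example that assumes the prediction response is a dictionary with a `ruleResults` list, where each entry carries an `outcomes` list, as in the service's event prediction response. The outcome names `approve` and `reject`, the rule IDs, and the helper name `decide` are hypothetical; real outcome names are whatever the detector's rules were configured to return.

```python
def decide(prediction, reject_outcomes=frozenset({"reject"})):
    """Return "reject" if any matched rule yields a rejecting outcome, else "approve"."""
    for rule in prediction.get("ruleResults", []):
        if reject_outcomes.intersection(rule.get("outcomes", [])):
            return "reject"
    return "approve"


# Hypothetical responses shaped like a prediction API result.
flagged = {"ruleResults": [{"ruleId": "high_risk", "outcomes": ["reject"]}]}
clean = {"ruleResults": [{"ruleId": "low_risk", "outcomes": ["approve"]}]}

print(decide(flagged))
print(decide(clean))
```

Rejecting on any rejecting outcome matches the stated preference to block fraud even at the cost of losing some legitimate transactions.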
An insurance company is creating an application to automate car insurance claims. A machine learning (ML) specialist used an Amazon SageMaker Object Detection - TensorFlow built-in algorithm to train a model to detect scratches and dents in images of cars. After the model was trained, the ML specialist noticed that the model performed better on the training dataset than on the testing dataset.
Which approach should the ML specialist use to improve the performance of the model on the testing data?
The machine learning model in this scenario shows signs of overfitting, as evidenced by better performance on the training dataset than on the testing dataset. Overfitting indicates that the model is capturing noise or details specific to the training data rather than general patterns.
One common approach to reduce overfitting is L2 regularization, which adds a penalty to the loss function for large weights and helps the model generalize better by smoothing out the weight distribution. By increasing the value of the L2 hyperparameter, the ML specialist can increase this penalty, helping to mitigate overfitting and improve performance on the testing dataset.
The other options do not help here: increasing momentum changes how the optimizer converges but does not regularize the model, and reducing dropout removes regularization, which tends to make overfitting worse.
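To make the effect of the L2 penalty concrete, here is a small sketch that adds the penalty term to a fixed data loss. It assumes the common form loss + λ · Σ w² (some frameworks include an extra factor of 1/2); the weights, loss value, and λ values are made up for illustration.

```python
def l2_penalized_loss(data_loss, weights, l2):
    """Add an L2 penalty (l2 times the sum of squared weights) to a data loss."""
    return data_loss + l2 * sum(w * w for w in weights)


weights = [3.0, -4.0]  # sum of squares is 25
small_penalty = l2_penalized_loss(1.0, weights, 0.01)
large_penalty = l2_penalized_loss(1.0, weights, 0.1)

# A larger l2 value makes the same weights more expensive,
# which pushes training toward smaller weights.
print(small_penalty, large_penalty)
```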
An ecommerce company has developed an XGBoost model in Amazon SageMaker to predict whether a customer will return a purchased item. The dataset is imbalanced. Only 5% of customers return items.
A data scientist must find the hyperparameters to capture as many instances of returned items as possible. The company has a small budget for compute.
How should the data scientist meet these requirements MOST cost-effectively?
The best solution to meet the requirements is to tune the csv_weight hyperparameter and the scale_pos_weight hyperparameter by using automatic model tuning (AMT). Optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:f1", "Type": "Maximize"}}.
The csv_weight hyperparameter is used to specify the instance weights for the training data in CSV format. This can help handle imbalanced data by assigning higher weights to the minority class examples and lower weights to the majority class examples. The scale_pos_weight hyperparameter controls the balance of positive and negative weights. A typical starting value is the ratio of the number of negative class examples to the number of positive class examples. Setting a higher value for this hyperparameter increases the importance of the positive class and can improve recall. Both of these hyperparameters can help the XGBoost model capture as many instances of returned items as possible.
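For the 5% return rate in this scenario, the negative-to-positive ratio works out as shown below. This is only a sketch of the arithmetic on a made-up label list; the helper name `scale_pos_weight_ratio` is hypothetical, and the tuned value may end up different from this starting point.

```python
def scale_pos_weight_ratio(labels):
    """Return (number of negative labels) / (number of positive labels)."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("at least one positive example is required")
    return negatives / positives


# 100 made-up labels: 5 returned items (1) and 95 kept items (0).
labels = [1] * 5 + [0] * 95
ratio = scale_pos_weight_ratio(labels)
print(ratio)  # 95 / 5 = 19.0
```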
Automatic model tuning (AMT) is a feature of Amazon SageMaker that automates the process of finding the best hyperparameter values for a machine learning model. AMT uses Bayesian optimization to search the hyperparameter space and evaluate the model performance based on a predefined objective metric. The objective metric is the metric that AMT tries to optimize by adjusting the hyperparameter values. For imbalanced classification problems, accuracy is not a good objective metric, as it can be misleading and biased towards the majority class. A better objective metric is the F1 score, which is the harmonic mean of precision and recall. The F1 score can reflect the balance between precision and recall and is more suitable for imbalanced data. The F1 score ranges from 0 to 1, where 1 is the best possible value. Therefore, the type of the objective should be "Maximize" to achieve the highest F1 score.
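The gap between accuracy and F1 on imbalanced data is easy to see with a small worked example. The sketch below uses made-up labels matching the 5% return rate and a degenerate model that always predicts "not returned"; F1 is computed as 2PR / (P + R), and the helper names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; defined as 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1] * 5 + [0] * 95   # 5% of customers return items
y_pred = [0] * 100            # model that never predicts a return

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
f1 = f1_score(tp, fp, fn)
print(accuracy, f1)  # high accuracy, but F1 is 0.0
```

The always-negative model scores 95% accuracy while catching no returned items, which is why F1 is the more informative objective here.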
By tuning the csv_weight and scale_pos_weight hyperparameters and optimizing on the F1 score, the data scientist can meet the requirements most cost-effectively. This solution requires tuning only two hyperparameters, which can reduce the computation time and cost compared to tuning all possible hyperparameters. This solution also uses the appropriate objective metric for imbalanced classification, which can improve the model performance and capture more instances of returned items.
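A tuning job configuration along these lines might look like the sketch below. It is shaped after the `HyperParameterTuningJobConfig` structure accepted when creating a SageMaker tuning job, but the job limits and the search range are assumptions chosen for a small budget, and only the scale_pos_weight range is shown. The helper name `build_tuning_config` is hypothetical.

```python
def build_tuning_config(max_jobs, max_parallel):
    """Build a tuning-job config dict that maximizes validation F1."""
    return {
        "Strategy": "Bayesian",
        "HyperParameterTuningJobObjective": {
            "MetricName": "validation:f1",
            "Type": "Maximize",
        },
        # Keeping the job counts low limits compute cost (illustrative values).
        "ResourceLimits": {
            "MaxNumberOfTrainingJobs": max_jobs,
            "MaxParallelTrainingJobs": max_parallel,
        },
        "ParameterRanges": {
            # Range bounds are passed as strings; these bounds are assumptions.
            "ContinuousParameterRanges": [
                {"Name": "scale_pos_weight", "MinValue": "1", "MaxValue": "30"},
            ],
        },
    }


config = build_tuning_config(max_jobs=10, max_parallel=2)
print(config["HyperParameterTuningJobObjective"])
```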
An insurance company developed a new experimental machine learning (ML) model to replace an existing model that is in production. The company must validate the quality of predictions from the new experimental model in a production environment before the company uses the new experimental model to serve general user requests.
Only one model can serve user requests at a time. The company must measure the performance of the new experimental model without affecting the current live traffic.
Which solution will meet these requirements?
The other solutions are not suitable, because they have the following drawbacks: