A data scientist needs to create a model for predictive maintenance. The model will be based on historical data to identify rare anomalies in the data.
The historical data is stored in an Amazon S3 bucket. The data scientist needs to use Amazon SageMaker Data Wrangler to ingest the data. The data scientist also needs to perform exploratory data analysis (EDA) to understand the statistical properties of the data.
Which solution will meet these requirements with the LEAST amount of compute resources?
For efficient exploratory data analysis (EDA) on a large dataset for anomaly detection, the First K option in SageMaker Data Wrangler is the optimal choice. This option imports only the first K rows, limiting the data loaded into memory and conserving compute resources.
Because the First K option lets the data scientist choose K based on domain knowledge, it yields a representative sample without requiring extensive compute. Alternatives such as randomized sampling may produce samples that are less useful for initial analysis of time-series or sequential data.
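Data Wrangler applies First K sampling through its import UI, but the effect is easy to reproduce for a quick local sanity check. A minimal sketch with pandas, assuming a hypothetical bucket and CSV key (s3fs must be installed for S3 paths); this illustrates the sampling idea, not Data Wrangler itself:

```python
import pandas as pd

K = 50_000  # chosen from domain knowledge of the maintenance history

# First K sampling: read only the first K rows instead of the full dataset,
# keeping memory and compute usage low. The bucket and key are hypothetical.
df = pd.read_csv("s3://example-bucket/maintenance/history.csv", nrows=K)

# Quick EDA: summary statistics of the sampled rows.
print(df.describe(include="all"))
```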
An ecommerce company wants to train a large image classification model with 10,000 classes. The company runs multiple model training iterations and needs to minimize operational overhead and cost. The company also needs to avoid loss of work and model retraining.
Which solution will meet these requirements?
Amazon SageMaker managed spot training allows for cost-effective training by using Spot Instances, which are spare EC2 capacity offered at a steep discount but subject to interruption when demand rises. By enabling checkpointing, the training job saves intermediate model states to Amazon S3, and SageMaker automatically restores them so training resumes from the last checkpoint after an interruption. This minimizes operational overhead by automating checkpoint syncing and job resumption, removing the need to retrain from scratch.
This setup provides a reliable, cost-efficient approach to training large models with minimal operational overhead and minimal risk of lost work.
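In the SageMaker Python SDK, spot training and checkpointing are enabled with a few estimator arguments. A minimal sketch, assuming a hypothetical training image, IAM role, and S3 paths:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

estimator = Estimator(
    image_uri="<training-image-uri>",                     # hypothetical image
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    # Managed spot training: run on Spot capacity at a discount.
    use_spot_instances=True,
    max_run=8 * 3600,    # max training time, in seconds
    max_wait=12 * 3600,  # max wait for Spot capacity (must be >= max_run)
    # Checkpoints written to /opt/ml/checkpoints are synced to this S3 URI
    # and restored automatically when an interrupted job restarts.
    checkpoint_s3_uri="s3://example-bucket/checkpoints/",
    sagemaker_session=session,
)

estimator.fit({"train": "s3://example-bucket/train/"})
```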
A company stores its documents in Amazon S3 with no predefined product categories. A data scientist needs to build a machine learning model to categorize the documents for all the company's products.
Which solution will meet these requirements with the MOST operational efficiency?
Amazon SageMaker's Neural Topic Model (NTM) is designed to uncover underlying topics within text data by clustering documents based on topic similarity. For document categorization, NTM can identify product categories by analyzing and grouping the documents, making it an efficient choice for unsupervised learning where predefined categories do not exist.
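A minimal training sketch with the SageMaker Python SDK, assuming a hypothetical IAM role and S3 paths, and assuming the documents have already been converted to bag-of-words vectors in a format NTM accepts:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Resolve the built-in NTM container image for the current region.
ntm_image = sagemaker.image_uris.retrieve("ntm", session.boto_region_name)

ntm = Estimator(
    image_uri=ntm_image,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.c5.xlarge",
    sagemaker_session=session,
)

# feature_dim: vocabulary size of the bag-of-words vectors.
# num_topics: number of latent product categories to discover.
ntm.set_hyperparameters(feature_dim=20000, num_topics=25)

ntm.fit({"train": "s3://example-bucket/ntm/train/"})
```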
A business-to-business (B2B) ecommerce company wants to develop a fair and equitable risk mitigation strategy to reject potentially fraudulent transactions. The company wants to reject fraudulent transactions even at the risk of losing some profitable transactions or customers.
Which solution will meet these requirements with the LEAST operational effort?
Amazon Fraud Detector is a managed service designed to detect potentially fraudulent online activities, such as transactions. It uses machine learning and business rules to classify activities as fraudulent or legitimate, minimizing the need for custom model training. By using the Amazon Fraud Detector prediction API, the company can automatically approve or reject transactions flagged as fraudulent, implementing an efficient risk mitigation strategy without extensive operational effort.
This approach requires minimal setup and effectively allows the company to block fraudulent transactions with high confidence, addressing the business's need to balance risk mitigation and customer impact.
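Once a detector is deployed, calling the prediction API is a single boto3 request. A minimal sketch, assuming hypothetical detector, event-type, variable, and outcome names:

```python
import boto3
from datetime import datetime, timezone

client = boto3.client("frauddetector")

response = client.get_event_prediction(
    detectorId="transaction_fraud_detector",  # hypothetical detector
    eventId="txn-0001",
    eventTypeName="transaction",
    eventTimestamp=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    entities=[{"entityType": "customer", "entityId": "cust-42"}],
    eventVariables={"order_amount": "149.99", "ip_address": "203.0.113.7"},
)

# Rules attached to the detector map model scores to outcomes; reject the
# transaction if any matched rule returned the (hypothetical) "block" outcome.
outcomes = [o for rule in response["ruleResults"] for o in rule["outcomes"]]
decision = "reject" if "block" in outcomes else "approve"
print(decision)
```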
An insurance company is creating an application to automate car insurance claims. A machine learning (ML) specialist used an Amazon SageMaker Object Detection - TensorFlow built-in algorithm to train a model to detect scratches and dents in images of cars. After the model was trained, the ML specialist noticed that the model performed better on the training dataset than on the testing dataset.
Which approach should the ML specialist use to improve the performance of the model on the testing data?
The machine learning model in this scenario shows signs of overfitting, as evidenced by better performance on the training dataset than on the testing dataset. Overfitting indicates that the model is capturing noise or details specific to the training data rather than general patterns.
One common approach to reducing overfitting is L2 regularization, which adds a penalty on large weights to the loss function, so the model minimizes L(w) + λ·Σw² instead of the data loss L(w) alone. By increasing the value of the L2 hyperparameter (λ), the ML specialist strengthens this penalty, discouraging large weights and helping the model generalize to the testing dataset.
Alternatives such as increasing momentum or reducing dropout do not address overfitting here; reducing dropout removes a form of regularization and would typically make overfitting worse.
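The built-in algorithm exposes this penalty as a hyperparameter, but the underlying effect is the same as attaching an L2 regularizer per layer in Keras. A conceptual sketch only; the layer sizes and coefficient are illustrative, not the built-in algorithm's actual configuration:

```python
import tensorflow as tf

# Increasing the L2 coefficient (e.g., 1e-3 -> 1e-2) strengthens the
# penalty on large weights, trading training fit for generalization.
l2 = tf.keras.regularizers.l2(1e-2)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu",
                           kernel_regularizer=l2,
                           input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax",  # scratch vs. dent
                          kernel_regularizer=l2),
])

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```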