Amazon Exam MLS-C01 Topic 4 Question 51 Discussion

Actual exam question for Amazon's MLS-C01 exam

Question #: 51
Topic #: 4

A global financial company is using machine learning to automate its loan approval process. The company has a dataset of customer information. The dataset contains some categorical fields, such as customer location by city and housing status. The dataset also includes financial fields in different units, such as account balances in US dollars and monthly interest in US cents.

The company's data scientists are using a gradient boosting regression model to infer the credit score for each customer. The model has a training accuracy of 99% and a testing accuracy of 75%. The data scientists want to improve the model's testing accuracy.

Which process will improve the testing accuracy the MOST?

A. Use a one-hot encoder for the categorical fields in the dataset. Perform standardization on the financial fields in the dataset. Apply L1 regularization to the data.

B. Use tokenization of the categorical fields in the dataset. Perform binning on the financial fields in the dataset. Remove the outliers in the data by using the z-score.

C. Use a label encoder for the categorical fields in the dataset. Perform L1 regularization on the financial fields in the dataset. Apply L2 regularization to the data.

D. Use a logarithm transformation on the categorical fields in the dataset. Perform binning on the financial fields in the dataset. Use imputation to populate missing values in the dataset.

Suggested Answer: B

by Carey at May 07, 2022, 12:14 AM

Contribute your Thoughts:

yak22

3 years ago

The correct answer is A. First, it encodes the categorical data with one-hot encoding. Second, the financial input variables are in different units (US dollars and US cents), so they need to be on the same scale, which standardization provides. Third, the gap between training accuracy (99%) and testing accuracy (75%) shows that the model is overfitted, so it should be regularized (constrained) using L1.

upvoted 1 time
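For anyone who wants to see the three steps from option A concretely, here is a minimal sketch in plain Python. The toy rows, column names, and helper functions (`one_hot`, `standardize`, `l1_penalty`) are all hypothetical and made up for illustration; they are not from the question or from any AWS service. In practice a gradient boosting library would usually expose L1 regularization as a hyperparameter (for example, `alpha` in XGBoost) rather than as a hand-written penalty.

```python
from statistics import mean, pstdev


def one_hot(values):
    """Map each category to a 0/1 indicator vector (columns in sorted order)."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded


def standardize(values):
    """Rescale a numeric column to zero mean and unit (population) std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def l1_penalty(weights, alpha):
    """L1 term added to a training loss: alpha * sum(|w|)."""
    return alpha * sum(abs(w) for w in weights)


# Hypothetical toy data: one categorical field, two financial fields
# recorded in different units (US dollars vs. US cents).
housing = ["rent", "own", "rent", "mortgage"]
balance_usd = [1200.0, 5300.0, 800.0, 2700.0]
interest_cents = [150.0, 900.0, 75.0, 410.0]

categories, housing_encoded = one_hot(housing)
balance_scaled = standardize(balance_usd)
interest_scaled = standardize(interest_cents)

# After standardization both financial columns are on the same scale,
# regardless of their original units.
print(categories)
print(housing_encoded)
print(balance_scaled)
print(interest_scaled)
```

The penalty function only shows what the L1 term looks like; it grows with the absolute size of the model weights, which is what constrains an overfitted model.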

...