Google Exam Professional Data Engineer Topic 2 Question 29 Discussion

Actual exam question for Google's Professional Data Engineer exam

Question #: 29
Topic #: 2


You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After you trained the neural network and evaluated your model on a test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?

A. Increase the share of the test sample in the train-test split.

B. Try to collect more data and increase the size of your dataset.

C. Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.

D. Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of the vocabularies or n-grams used.


Suggested Answer: D
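
The likely reasoning behind D is that a training error well above the test error means the model is not even fitting the data it was trained on, so more capacity is called for rather than regularization or more data. A minimal Python sketch of that check follows. The `diagnose` helper, its labels, and the 10% tolerance are hypothetical and only for illustration; they are not part of any library or of the exam material.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def diagnose(train_rmse, test_rmse, tolerance=0.1):
    """Hypothetical rule of thumb comparing train and test error.

    tolerance is an assumed relative margin (10% by default) below which
    the two errors are treated as roughly equal.
    """
    if train_rmse > test_rmse * (1 + tolerance):
        # The model does worse on data it has already seen:
        # it is probably too simple for the task.
        return "possible underfitting: consider increasing model capacity"
    if test_rmse > train_rmse * (1 + tolerance):
        # The model does much better on seen data than on unseen data.
        return "possible overfitting: consider regularization or more data"
    return "train and test error are comparable"


# The scenario from the question: train RMSE is twice the test RMSE.
print(diagnose(train_rmse=2.0, test_rmse=1.0))
```

With the question's numbers, the sketch takes the first branch, which matches option D rather than option C.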

by Barbra at May 06, 2022, 06:18 AM
