Databricks Exam Databricks-Certified-Professional-Data-Scientist Topic 5 Question 22 Discussion

Actual exam question for Databricks's Databricks-Certified-Professional-Data-Scientist exam
Question #: 22
Topic #: 5

You are building a text classification model for a book written by HadoopExam Learning Resources, to determine whether the book is about Hadoop or cloud computing. To cut down the size of the feature space, you use the mutual information of each word with the label (hadoop or cloud) to select the 1000 best features as input to a Naive Bayes model. When you compare the performance of a model built with the 250 best features to a model built with the 1000 best features, you notice that the model with only 250 features performs slightly better on your test data.
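For readers unfamiliar with the selection step described above, here is a minimal sketch of ranking words by their mutual information with the class label and keeping the top k. It is an illustration only: the toy corpus, the function names `mutual_information` and `top_k_words`, and the choice to treat each word as a binary present/absent feature are assumptions, not part of the exam question.

```python
import math
from collections import Counter

def mutual_information(docs, labels, word):
    """Mutual information (in bits) between presence of `word` and the label."""
    n = len(docs)
    # Joint counts of (word present?, label) over all documents.
    joint = Counter((word in doc, label) for doc, label in zip(docs, labels))
    present = Counter(word in doc for doc in docs)
    classes = Counter(labels)
    mi = 0.0
    for (u, c), count in joint.items():
        p_uc = count / n
        mi += p_uc * math.log2(p_uc / ((present[u] / n) * (classes[c] / n)))
    return mi

def top_k_words(docs, labels, k):
    """Return the k words with the highest mutual information (ties alphabetical)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda w: (-mutual_information(docs, labels, w), w))
    return ranked[:k]

# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"hdfs", "mapreduce", "cluster"},
    {"hdfs", "yarn", "cluster"},
    {"aws", "instance", "cluster"},
    {"aws", "storage", "instance"},
]
labels = ["hadoop", "hadoop", "cloud", "cloud"]

best = top_k_words(docs, labels, 3)
```

In this toy data, words that appear in only one class (such as "hdfs" or "aws") score highest, while a word spread across both classes (such as "cluster") scores lower. Changing `k` corresponds to the 250-versus-1000 comparison in the question.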

What would help you choose better features for your model?

Suggested Answer: A
