
Amazon Exam AIF-C01 Topic 4 Question 1 Discussion

Actual exam question for Amazon's AIF-C01 exam
Question #: 1
Topic #: 4

A company has built a solution by using generative AI. The solution uses large language models (LLMs) to translate training manuals from English into other languages. The company wants to evaluate the accuracy of the solution by examining the text generated for the manuals.

Which model evaluation strategy meets these requirements?

A. Bilingual Evaluation Understudy (BLEU)
B. Root mean squared error (RMSE)
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
D. F1 score

Suggested Answer: A

Contribute your Thoughts:

Alease
28 days ago
RMSE? Really? That measures error in numeric predictions, like regression outputs, not the quality of generated text. I don't think that's what the company is looking for here.
upvoted 0 times
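To back up Alease's point: RMSE is the square root of the mean squared difference between predicted and actual numeric values, so it only makes sense when the model outputs numbers. A minimal sketch (the values are invented for illustration):

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length numeric sequences."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical regression outputs -- there is no natural way to feed
# translated sentences into this, which is the point of the comment above.
print(rmse([2.5, 0.0, 2.1], [3.0, -0.5, 2.0]))  # ~0.41
```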
Colton
29 days ago
I think F1 score could also be useful in evaluating the accuracy of the solution, as it considers both precision and recall.
upvoted 0 times
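For context on Colton's suggestion: F1 is the harmonic mean of precision and recall, which presumes a classification setup with true/false positives; translated sentences don't naturally map onto that. A minimal sketch with made-up confusion-matrix counts:

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for a binary classifier."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only; a translation task has no confusion matrix.
print(f1_score(true_positives=80, false_positives=10, false_negatives=20))  # ~0.84
```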
Ma
1 month ago
I'm not convinced BLEU is the best option. Shouldn't we also consider ROUGE, which is better suited to evaluating text summarization? Hmm, decisions, decisions.
upvoted 0 times
Kirby
9 days ago
I think we should consider ROUGE as well; it's better for text summarization.
upvoted 0 times
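For reference, ROUGE is recall-oriented: ROUGE-1 asks how many of the reference's unigrams the candidate text recovers, which is why it suits summarization (did the summary keep the important content?). A simplified sketch; real implementations, such as the rouge-score package, add stemming and other normalization:

```python
from collections import Counter

def rouge_1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams recovered by the candidate (ROUGE-1 recall),
    with clipped counts so repeated words are not over-credited."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(cand_counts[w], c) for w, c in ref_counts.items())
    return overlap / sum(ref_counts.values())

# Invented example sentences for illustration.
reference = "the manual explains the safety procedure"
candidate = "the manual describes the safety procedure"
print(rouge_1_recall(candidate, reference))  # 5/6 ~ 0.83
```

The recall orientation is the key contrast with BLEU, which is precision-oriented, and it is why ROUGE is the usual pick for summarization rather than translation.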
Marvel
1 month ago
I'm not sure, but I think C) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) could also be a good option for evaluating text generation.
upvoted 0 times
Margurite
1 month ago
BLEU seems like the obvious choice here. It's designed specifically for evaluating machine translation, which is exactly what this company is trying to do.
upvoted 0 times
Xochitl
12 days ago
Yes, BLEU is widely used in the field for assessing the quality of machine-translated text.
upvoted 0 times
Leontine
14 days ago
I agree, BLEU is the best choice for evaluating machine translation.
upvoted 0 times
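Since BLEU is the suggested answer, here is what computing it can look like. BLEU measures n-gram precision between the machine translation and one or more human reference translations, with a brevity penalty for overly short output. A minimal sketch using NLTK's sentence_bleu (assumes nltk is installed; the sentences are invented for illustration):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One (or more) human reference translations, tokenized.
references = [
    "turn off the machine before cleaning the filter".split(),
]
# The LLM's translated sentence, tokenized.
hypothesis = "switch off the machine before you clean the filter".split()

# Smoothing avoids a zero score when some higher-order n-grams are absent,
# which is common for a single short sentence.
smooth = SmoothingFunction().method1
score = sentence_bleu(references, hypothesis, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```

In practice a corpus-level score over the whole translated manual (NLTK's corpus_bleu, or a tool like sacrebleu) is more meaningful than per-sentence scores.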
Ashley
1 month ago
I agree with Dierdre; BLEU is commonly used for evaluating machine translation.
upvoted 0 times
Dierdre
1 month ago
I think the best model evaluation strategy for this scenario is A) Bilingual Evaluation Understudy (BLEU).
upvoted 0 times
