
Amazon Exam AIF-C01 Topic 4 Question 1 Discussion

Actual exam question for Amazon's AIF-C01 exam
Question #: 1
Topic #: 4

A company has built a solution by using generative AI. The solution uses large language models (LLMs) to translate training manuals from English into other languages. The company wants to evaluate the accuracy of the solution by examining the text generated for the manuals.

Which model evaluation strategy meets these requirements?

A. Bilingual Evaluation Understudy (BLEU)
B. Root mean squared error (RMSE)
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
D. F1 score

Suggested Answer: A

Contribute your Thoughts:

Alease
28 days ago
RMSE? Really? That measures error in numeric predictions, like regression outputs, not the quality of generated text. I don't think that's what the company is looking for here.
upvoted 0 times
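To back up Alease's point: RMSE is the square root of the mean squared difference between predicted and actual numeric values, so it only makes sense when the model outputs numbers. A minimal sketch (the values are invented for illustration):

```python
import math

def rmse(predictions, targets):
    """Root mean squared error between two equal-length numeric sequences."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical regression outputs -- there is no natural way to feed
# translated sentences into this, which is the point of the comment above.
print(rmse([2.5, 0.0, 2.1], [3.0, -0.5, 2.0]))  # ~0.41
```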
Colton
29 days ago
I think F1 score could also be useful in evaluating the accuracy of the solution, as it considers both precision and recall.
upvoted 0 times
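For context on Colton's suggestion: F1 is the harmonic mean of precision and recall, which presumes a classification setup with true/false positives; translated sentences don't naturally map onto that. A minimal sketch with made-up confusion-matrix counts:

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for a binary classifier."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)

# Illustrative counts only; a translation task has no confusion matrix.
print(f1_score(true_positives=80, false_positives=10, false_negatives=20))  # ~0.84
```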
Ma
1 month ago
I'm not convinced BLEU is the best option. Shouldn't we also consider ROUGE, which is better suited to evaluating text summarization? Hmm, decisions, decisions.
upvoted 0 times
Kirby
9 days ago
I think we should consider ROUGE as well; it's better for text summarization.
upvoted 0 times
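For reference, ROUGE is recall-oriented: ROUGE-1 asks how many of the reference's unigrams the candidate text recovers, which is why it suits summarization (did the summary keep the important content?). A simplified sketch; real implementations, such as the rouge-score package, add stemming and other normalization:

```python
from collections import Counter

def rouge_1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams recovered by the candidate (ROUGE-1 recall),
    with clipped counts so repeated words are not over-credited."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(cand_counts[w], c) for w, c in ref_counts.items())
    return overlap / sum(ref_counts.values())

# Invented example sentences for illustration.
reference = "the manual explains the safety procedure"
candidate = "the manual describes the safety procedure"
print(rouge_1_recall(candidate, reference))  # 5/6 ~ 0.83
```

The recall orientation is the key contrast with BLEU, which is precision-oriented, and it is why ROUGE is the usual pick for summarization rather than translation.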
Marvel
1 month ago
I'm not sure, but I think C) Recall-Oriented Understudy for Gisting Evaluation (ROUGE) could also be a good option for evaluating text generation.
upvoted 0 times
Margurite
1 month ago
BLEU seems like the obvious choice here. It's designed specifically for evaluating machine translation, which is exactly what this company is trying to do.
upvoted 0 times
Xochitl
12 days ago
Yes, BLEU is widely used in the field for assessing the quality of machine-translated text.
upvoted 0 times
Leontine
14 days ago
I agree, BLEU is the best choice for evaluating machine translation.
upvoted 0 times
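Since BLEU is the suggested answer, here is what computing it can look like. BLEU measures n-gram precision between the machine translation and one or more human reference translations, with a brevity penalty for overly short output. A minimal sketch using NLTK's sentence_bleu (assumes nltk is installed; the sentences are invented for illustration):

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# One (or more) human reference translations, tokenized.
references = [
    "turn off the machine before cleaning the filter".split(),
]
# The LLM's translated sentence, tokenized.
hypothesis = "switch off the machine before you clean the filter".split()

# Smoothing avoids a zero score when some higher-order n-grams are absent,
# which is common for a single short sentence.
smooth = SmoothingFunction().method1
score = sentence_bleu(references, hypothesis, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```

In practice a corpus-level score over the whole translated manual (NLTK's corpus_bleu, or a tool like sacrebleu) is more meaningful than per-sentence scores.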
Ashley
1 month ago
I agree with Dierdre; BLEU is commonly used for evaluating machine translation.
upvoted 0 times
Dierdre
1 month ago
I think the best model evaluation strategy for this scenario is A) Bilingual Evaluation Understudy (BLEU).
upvoted 0 times
