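This course equips machine learning practitioners with the essential tools, methods, and best practices for evaluating generative and predictive AI models. Sound model evaluation is critical to ensuring that machine learning systems deliver reliable, accurate, and efficient results in real-world use. Learners gain a deep understanding of evaluation metrics and methods, and of how to apply them appropriately across different model types and tasks. The course focuses on the unique challenges posed by generative AI models and provides strategies for addressing them effectively. Using Google Cloud's Vertex AI Platform, learners see how to implement effective evaluation processes for model selection, optimization, and continuous monitoring.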
This course covers how to assess the quality of large language model (LLM) outputs. It examines the challenges enterprises face because LLM responses can be subjective and sometimes incorrect, including hallucinations and inconsistent results. The course introduces evaluation metrics suited to different tasks: Accuracy, Precision, Recall, and F1 score for classification; ROUGE and BLEU for text generation; and Exact Match for question answering. It also explores the evaluation methods offered by Vertex AI LLM Evaluation Services (computation-based, autorater, and human evaluation) and explains where each applies and what it offers. Finally, the course covers how to unit test LLM applications within Vertex AI.
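To make the metrics named above concrete, here is a minimal sketch in plain Python of Accuracy, Precision, Recall, F1 score, and Exact Match. It is an illustration only and does not use the Vertex AI evaluation services; the function names, the sample labels, and the whitespace and case normalization used for Exact Match are assumptions chosen for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def exact_match(references, predictions):
    """Share of predictions identical to the reference answer.

    Assumption: strings are lowercased and whitespace is collapsed
    before comparison; other normalization schemes are possible.
    """
    def normalize(s):
        return " ".join(s.lower().split())

    hits = sum(normalize(r) == normalize(p)
               for r, p in zip(references, predictions))
    return hits / len(references)


# Hypothetical classification labels (1 = positive class).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]
acc = accuracy(y_true, y_pred)
prec, rec, f1 = precision_recall_f1(y_true, y_pred)

# Hypothetical question-answering outputs.
em = exact_match(["Paris", "4"], [" paris ", "four"])

print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
print(f"exact_match={em:.2f}")
```

Exact Match is strict by design: "four" does not match "4", which is one reason overlap-based metrics such as ROUGE and BLEU, or autorater and human evaluation, are used for free-form text.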
Model Garden is a model library that helps you discover, test, and deploy models from Google and Google partners. Learn how to explore the available models, select the right ones for your use case, and deploy and interact with Model Garden models through the Google Cloud console and APIs.