AWS previews "Model Evaluation on Bedrock" to streamline AI development

2023-11-30

At the AWS re:Invent conference, Swami Sivasubramanian, Vice President of Databases, Analytics, and Machine Learning at AWS, announced Model Evaluation on Bedrock, now available in preview. The tool is meant to help users evaluate the models available through Amazon Bedrock before committing to one. Without a transparent way to test models, developers risk building a project on a model that is not accurate enough, or on one that is far larger than the use case requires. "Model selection and evaluation should not only be done at the beginning but should be repeated regularly," Sivasubramanian said. He also stressed the importance of keeping humans in the loop, saying the tool offers a convenient way to manage human evaluation workflows and model performance metrics.

Sivasubramanian has previously noted that some developers reach for a larger model simply because they assume a more powerful model will meet their needs, only to discover later that a smaller model would have sufficed.

Model Evaluation consists of two parts: automated evaluation and human evaluation. For automated evaluation, developers open the Bedrock console and select a model to test. They can then measure its performance on metrics such as robustness, accuracy, or toxicity across tasks such as summarization, text classification, question answering, and text generation. Bedrock includes popular third-party AI models such as Meta's Llama 2, Anthropic's Claude 2, and Stability AI's Stable Diffusion. AWS provides test datasets, but customers can also import their own data into the benchmarking platform to get a clearer picture of how a model performs; the system then generates a report.

For human evaluation, users can work with AWS's evaluation team or with their own. Customers specify the task type (for example, summarization or text generation), the evaluation metrics, and the dataset they want to use. AWS provides customized pricing and timelines for customers who work with its evaluation team.

Vasi Philomin, Vice President of AI at AWS, said in an interview that a better understanding of model performance can guide development. It also lets companies check whether a model meets certain responsible AI standards, such as an appropriate sensitivity to toxicity. "It is important for our customers to understand which model is best suited for them, and we are providing a better evaluation method," Philomin said. Sivasubramanian also pointed out that human evaluators can pick up on qualities automated systems cannot, such as empathy or friendliness.

Philomin said AWS does not require every customer to benchmark their models, since some developers have already used the foundation models on Bedrock or already know what a model can do for them. Companies still exploring which model to adopt, however, may benefit from the benchmarking process. AWS said that while the benchmarking service is in preview, customers will be charged only for the model inference used during evaluation.

Although there are no formal standards for benchmarking AI models, certain industry-accepted metrics exist. Philomin said the goal of benchmarking on Bedrock is not to evaluate models exhaustively but to give companies a way to measure a model's impact on their projects.
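
For developers who would rather script the automated workflow described above than click through the console, Bedrock's control-plane API can start an evaluation job. The following is a minimal sketch assuming the boto3 `bedrock` client's `create_evaluation_job` operation; the role ARN, S3 URIs, metric identifiers, and model ID are illustrative placeholders, and exact field names may differ in the preview release.

```python
import boto3

# Control-plane client for Amazon Bedrock (not the bedrock-runtime client).
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Hypothetical job configuration: the role ARN, S3 locations, metric names,
# and model identifier below are placeholders for illustration only.
response = bedrock.create_evaluation_job(
    jobName="summarization-eval-demo",
    roleArn="arn:aws:iam::123456789012:role/BedrockEvalRole",
    evaluationConfig={
        "automated": {
            "datasetMetricConfigs": [
                {
                    "taskType": "Summarization",
                    "dataset": {
                        "name": "CustomSummarizationSet",
                        "datasetLocation": {"s3Uri": "s3://my-eval-bucket/prompts.jsonl"},
                    },
                    # Built-in metrics roughly matching those named in the announcement.
                    "metricNames": [
                        "Builtin.Accuracy",
                        "Builtin.Robustness",
                        "Builtin.Toxicity",
                    ],
                }
            ]
        }
    },
    inferenceConfig={
        "models": [{"bedrockModel": {"modelIdentifier": "anthropic.claude-v2"}}]
    },
    outputDataConfig={"s3Uri": "s3://my-eval-bucket/results/"},
)

print("Started evaluation job:", response["jobArn"])
```

Under these assumptions, the report described above would land in the configured S3 output location once the job finishes, and the inference calls made during the run would be what the customer is billed for while the service remains in preview.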