GenAI chat arena: Compare model responses
We start with a multimodal chat project. During setup, we selected two frontier models to compare.

Pre-loaded models include Gemini 2.5 Pro, Claude 3.7, Grok 3, OpenAI o3, and more. You can also load a custom model. The tool can compare up to 10 models at once. Below, we can examine and edit our ontology, where you can customize the ranking, classification, and other annotation fields you want captured on each response.
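To make the ontology idea more concrete, here is a minimal sketch of how ranking and classification fields for a model comparison might be represented and validated. All class and field names are hypothetical and do not reflect the tool's actual ontology format. The only constraint taken from the text is the 10-model limit. The 2-model minimum is an assumption based on the idea that a comparison needs at least two responses.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical schema for illustration only; not the tool's real ontology format.
MAX_MODELS = 10  # from the text: up to 10 models can be compared at once
MIN_MODELS = 2   # assumption: a comparison needs at least two models


@dataclass
class RankingField:
    """A ranking question: annotators order every compared model's response."""
    name: str
    models: List[str]

    def validate(self, ranking: List[str]) -> None:
        # A valid ranking lists each compared model exactly once.
        if sorted(ranking) != sorted(self.models):
            raise ValueError(f"{self.name}: ranking must list each model exactly once")


@dataclass
class ClassificationField:
    """A classification question with a fixed set of allowed answers."""
    name: str
    options: List[str]

    def validate(self, value: str) -> None:
        if value not in self.options:
            raise ValueError(f"{self.name}: {value!r} is not one of {self.options}")


@dataclass
class Ontology:
    """Annotation fields captured on the responses of the compared models."""
    models: List[str]
    rankings: List[RankingField] = field(default_factory=list)
    classifications: List[ClassificationField] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enforce the model-count bounds when the ontology is created.
        if not MIN_MODELS <= len(self.models) <= MAX_MODELS:
            raise ValueError(f"compare between {MIN_MODELS} and {MAX_MODELS} models")
```

For example, `Ontology(models=["model-a", "model-b"], rankings=[RankingField("overall_preference", ["model-a", "model-b"])])` accepts the ranking `["model-b", "model-a"]` but rejects one that lists the same model twice. Keeping validation on each field type makes it easy to add further annotation kinds without changing the ontology container.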
