Configure evaluations to verify that your agents provide the results that you expect. Create test cases with prompts and expected responses, run evaluations to generate agent responses, then review results to identify where your agents can improve.

Before you start

You must create an agent in the Agent builder.

Configure evaluation

Configure an evaluation with test cases that define prompts and expected responses.
Step 1: Start configuration

  1. In CDF, navigate to Atlas AI > Evaluate agents.
  2. Select + Create new evaluation.
Step 2: Enter evaluation details

  1. Enter a Name for your evaluation.
Use descriptive names, like “Pump maintenance agent - v1”, to help you identify your evaluations.
  2. Add a Description to explain the purpose of the evaluation.
Step 3: Define test cases

Add test cases to define prompts and the expected responses.
  1. In the Test cases section, enter a Prompt.
  2. Enter the Expected response. Atlas AI compares your expected response to the agent’s response to test performance.
  3. Optionally, choose an agent and select Generate answer to use the agent to generate the expected response.
Be specific about what details, like operational status, metrics, or alert information, you want agents to include in their responses.
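Conceptually, each test case pairs a prompt with the response you expect the agent to give. A minimal sketch of that structure, with hypothetical field names rather than the actual Atlas AI schema:

```python
# Hypothetical test-case structure; the field names and example
# content are illustrative only, not the Atlas AI data model.
test_cases = [
    {
        "prompt": "What is the operational status of pump P-101?",
        "expected_response": (
            "Pump P-101 is running normally, with discharge pressure "
            "in the expected range and no active alerts."
        ),
    },
    {
        "prompt": "List any open maintenance alerts for compressor C-201.",
        "expected_response": (
            "Compressor C-201 has one open alert: high vibration "
            "detected on the drive-end bearing."
        ),
    },
]
```

Notice how each expected response names the specific details (status, metrics, alerts) the agent should include, which makes the comparison meaningful.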

Run and monitor evaluation

Run an evaluation to test your agent. Monitor real-time progress as the agent processes test cases.
  1. Go to Evaluation overview > Run evaluation.
  2. Use the search or filter to find your agent by name, status, or latest version.
  3. Select the agent from the list.
  4. Select Run with selected agent to run the evaluation.
  5. If Atlas AI prompts you to allow code tool access for your agent, select Confirm and run to continue, or Cancel to return to the list of agents.
When you run the evaluation, Atlas AI sends your prompts to the agent and records the responses in real time. This can take several minutes depending on how many test cases you have. You can select Cancel run to stop the evaluation at any time.
Do not close the browser tab while the evaluation is running. Evaluations run in the browser, and closing or navigating away from the tab will stop the evaluation.
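The run described above can be sketched as a simple loop: each prompt goes to the agent and the response is recorded alongside the expected answer. In this sketch, `ask_agent` is a hypothetical stand-in for the agent call that Atlas AI performs for you in the browser:

```python
def run_evaluation(test_cases, ask_agent):
    """Send each test-case prompt to the agent and record the response.

    `ask_agent` is a hypothetical callable standing in for the agent
    query that Atlas AI executes during a run; each test case is a
    dict with "prompt" and "expected_response" keys.
    """
    results = []
    for case in test_cases:
        results.append({
            "prompt": case["prompt"],
            "expected": case["expected_response"],
            # Record the live agent response next to the expectation
            # so the two can be compared afterwards.
            "actual": ask_agent(case["prompt"]),
        })
    return results
```

Because each prompt is sent and answered in turn, total run time grows with the number of test cases, which is why larger evaluations can take several minutes.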

View and analyze results

Review evaluation results to identify where agents need improvement and track performance across multiple runs.
  1. On the Evaluation overview tab, review the list of completed evaluations.
  2. Select an evaluation to view the status of your test cases.
  3. Select View details to compare the agent responses to your expected responses for each test case.
Use evaluation insights to improve your agent by refining instructions in the Agent builder, updating tools or data access, or editing test cases.
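As a rough illustration of what comparing an agent response to an expected response can look like (the source does not describe Atlas AI's actual scoring method), a simple term-overlap check:

```python
def term_overlap(expected, actual):
    """Fraction of expected-response terms that appear in the agent's
    response. A crude, hypothetical proxy for response comparison,
    not the metric Atlas AI uses.
    """
    expected_terms = set(expected.lower().split())
    if not expected_terms:
        return 0.0
    actual_terms = set(actual.lower().split())
    return len(expected_terms & actual_terms) / len(expected_terms)
```

A low score on a test case points you at the specific detail the agent omitted, which is the kind of insight that guides instruction or tool changes in the Agent builder.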
