Two models can reach the same answer, but one takes far more tries to get there
By tracking learning trajectories, we find that the most efficient optimizer is frequently not the one with the highest final score.
Which models are the most efficient?
The table below compares each model's search efficiency against a standard classical baseline, Gaussian Process Upper Confidence Bound (GP-UCB). For each model, we report how often it outperforms this baseline, how close its overall optimization path comes to parity with the baseline, and how often scoring its full trajectory, rather than only its final result, changes its ranking. Every model is evaluated over a budget of 30 iterations.
By-model comparison
| Model | Beats baseline? | How efficient is the path? | Best final result | Most efficient path | Rank shifts |
|---|---|---|---|---|---|
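The page does not spell out how these columns are computed. As a minimal sketch, assuming maximization, a "path" score defined as the mean of the best-so-far curve (a normalized area under the learning curve), and "beats baseline" meaning a higher path score than GP-UCB on the same task, a per-model comparison could look like this. The function names, metric definitions, and scores below are hypothetical.

```python
from itertools import accumulate

def best_so_far(scores):
    """Running maximum of observed scores (assumes higher is better)."""
    return list(accumulate(scores, max))

def endpoint_score(scores):
    """Best value found by the end of the budget."""
    return max(scores)

def trajectory_score(scores):
    """Mean of the best-so-far curve, so finding good values early scores higher."""
    curve = best_so_far(scores)
    return sum(curve) / len(curve)

def compare_to_baseline(model_runs, baseline_runs):
    """Summarize one model against a baseline (hypothetical metric definitions).

    Both arguments map a task name to that run's per-iteration scores.
    Returns the fraction of tasks where the model's trajectory score beats
    the baseline's, and the mean trajectory-score ratio (1.0 = parity).
    The ratio assumes strictly positive scores.
    """
    wins, ratios = 0, []
    for task, scores in model_runs.items():
        model_path = trajectory_score(scores)
        base_path = trajectory_score(baseline_runs[task])
        wins += model_path > base_path
        ratios.append(model_path / base_path)
    return wins / len(model_runs), sum(ratios) / len(ratios)

# Toy scores for illustration only, not results from the study.
baseline = {"task_a": [0.2, 0.4, 0.5, 0.9], "task_b": [0.1, 0.3, 0.3, 0.6]}
model = {"task_a": [0.7, 0.8, 0.8, 0.8], "task_b": [0.1, 0.2, 0.2, 0.7]}

frac, parity = compare_to_baseline(model, baseline)
print(f"beats baseline on {frac:.0%} of tasks; mean path ratio {parity:.2f}")
# prints: beats baseline on 50% of tasks; mean path ratio 1.24
```

On `task_a` the toy model finishes below the baseline (0.8 vs 0.9) yet has the higher path score, which is exactly the kind of disagreement between final score and efficiency that a trajectory-based metric is meant to surface.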
Does scientific context help?
We tested the common assumption that providing scientific context (such as specific protein names or units) helps LLMs optimize better. By removing this terminology in a 'domain-agnostic' control group, we isolate the effect of these domain priors on in-context learning.
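The page does not show how the terminology was stripped. A minimal sketch, assuming a hand-written mapping from domain terms to neutral labels, could use whole-word regex substitution. The prompt text, the mapping, and `make_domain_agnostic` are all hypothetical.

```python
import re

def make_domain_agnostic(prompt, term_map):
    """Replace domain-specific terms with neutral placeholders (hypothetical helper).

    term_map maps a domain term (e.g. a protein name or a unit) to a generic
    label. Longer terms are replaced first so overlapping names are not
    partially rewritten; word boundaries keep substrings like "GFPs" intact.
    """
    for term in sorted(term_map, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        label = term_map[term]
        # A function replacement avoids backslash handling in the label text.
        prompt = re.sub(pattern, lambda _m, label=label: label, prompt)
    return prompt

# Illustrative prompt and mapping, not taken from the study.
prompt = "Choose the GFP variant and IPTG dose (mM) that maximize fluorescence."
term_map = {
    "GFP variant": "variable x1",
    "IPTG dose": "variable x2",
    "mM": "units",
    "fluorescence": "the objective",
}
print(make_domain_agnostic(prompt, term_map))
# prints: Choose the variable x1 and variable x2 (units) that maximize the objective.
```

A plain substitution like this only removes the listed terms; a real control condition would also need to check that no residual domain cues remain in the prompt.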
Explore the task set
Scientific optimization tasks differ widely in how models approach them. Some tasks yield the same winner regardless of the metric used, while others reveal sharp disagreements between final scores and path efficiency.
| Task | Winner changes? | Beats baseline? | Does context help? | Endpoint winner | Trajectory winner |
|---|---|---|---|---|---|
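Whether the winner changes on a task depends on how "endpoint" and "trajectory" are scored. As a minimal sketch, assuming the endpoint score is the best value found and the trajectory score is the mean of the best-so-far curve (both assumptions, not definitions given on this page), the per-task check could be written as follows. The model names and scores are made up.

```python
from itertools import accumulate

def endpoint_score(scores):
    """Best value found within the budget (assumes higher is better)."""
    return max(scores)

def trajectory_score(scores):
    """Mean of the best-so-far curve, rewarding early progress."""
    curve = list(accumulate(scores, max))
    return sum(curve) / len(curve)

def winners(runs):
    """Compare winners on one task (hypothetical helper).

    runs maps a model name to its per-iteration scores. Returns the
    endpoint winner, the trajectory winner, and whether they differ.
    Ties go to whichever model appears first in the dict.
    """
    endpoint_winner = max(runs, key=lambda m: endpoint_score(runs[m]))
    trajectory_winner = max(runs, key=lambda m: trajectory_score(runs[m]))
    return endpoint_winner, trajectory_winner, endpoint_winner != trajectory_winner

# Toy runs for illustration only.
runs = {"model_fast": [0.7, 0.8, 0.8, 0.8], "model_slow": [0.2, 0.4, 0.5, 0.9]}
print(winners(runs))
# prints: ('model_slow', 'model_fast', True)
```

Here `model_slow` reaches the higher final value, but `model_fast` gets close much sooner, so the two metrics name different winners.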