
The Arboria Swarm Benchmark

A fixed set of scenarios and baselines. Every Arboria paper — and any follow-on work that wants to be comparable — reports against the same matrix. This is what separates “we tuned flocking” from “we made measurable progress against prior art.”

Source: gossamer.benchmarks.

Scenarios

Each scenario defines an initial state, a per-step reward, and a terminal metric. The full reference is in gossamer/benchmarks/scenarios.py; short form:

| Name | Question | Terminal metric | Agent range |
| --- | --- | --- | --- |
| `dispersal` | How fast can a clumped swarm spread without colliding? | Mean NN distance at termination | 100 – 10,000 |
| `rendezvous` | How fast does a scattered swarm meet at a common point? | Final mean distance to centroid (lower is better) | 100 – 10,000 |
| `coverage` | Explore a bounded region; maximize cells visited per unit time. | Unique cells visited / total cells | 500 – 50,000 |
| `leader_follower` | One agent is exogenously driven; keep the swarm within range. | Mean follower distance to leader path | 100 – 10,000 |
| `predator_prey` | Adversarial agents chase. Measure survival and evasion. | Survival rate | 100 – 10,000 |
| `byzantine` | Inject k% silently faulty agents. Measure robustness of the base scenario. | Terminal metric under perturbation | 100 – 10,000 |
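To make the terminal metrics concrete, here is a hedged pure-NumPy sketch of the `dispersal` metric (mean nearest-neighbor distance). The helper name and exact formula are assumptions for illustration; the authoritative definitions live in `gossamer/benchmarks/scenarios.py`.

```python
import numpy as np

def mean_nn_distance(pos: np.ndarray) -> float:
    """Mean distance from each agent to its nearest neighbor.

    pos: (n, d) array of agent positions. Hypothetical helper --
    the real metric is defined in gossamer/benchmarks/scenarios.py.
    """
    # Pairwise distance matrix via broadcasting: shape (n, n).
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Mask self-distances so each agent's own row never wins the min.
    np.fill_diagonal(dist, np.inf)
    return float(dist.min(axis=1).mean())

# Four agents on a unit grid: every agent's nearest neighbor is 1.0 away.
pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(mean_nn_distance(pos))  # 1.0
```

A higher value at termination means the swarm spread further apart, which is why `dispersal` rewards it and `rendezvous` inverts the sign of the analogous centroid metric.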

Baselines

Every new policy must report against the same set so reviewers have a stable reference point:

  • `random` — uniform random accelerations. Lower bound.
  • `greedy` — scenario-appropriate hand-crafted greedy policy (go-to-centroid for rendezvous, push-from-nearest for dispersal, persistent random walk for coverage, etc.).
  • `gossamer_flocking` — classical Boids via `gossamer.algorithms.coordination.flocking.flock_step`.
  • `mappo` — learned policy trained via `gossamer.learning.mappo` on the same scenarios.

Add your own policy to the leaderboard by implementing a Baseline callable (`(pos, vel, rng) -> accel`) and passing it to `gossamer.benchmarks.run_benchmark`.
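A minimal custom baseline with the `(pos, vel, rng) -> accel` signature might look like the go-to-centroid policy below. The gain and damping constants are illustrative assumptions, and the comment about wiring it up defers to the actual `run_benchmark` signature rather than guessing it.

```python
import numpy as np

def centroid_seek(pos: np.ndarray, vel: np.ndarray,
                  rng: np.random.Generator) -> np.ndarray:
    """Hypothetical Baseline: accelerate every agent toward the centroid.

    pos, vel: (n, d) arrays; returns an (n, d) acceleration array.
    The 0.5 gain and 0.1 velocity damping are illustrative choices.
    """
    to_centroid = pos.mean(axis=0) - pos      # vector to the common point
    return 0.5 * to_centroid - 0.1 * vel      # seek with mild damping

# Sanity check on a toy swarm; a real run would hand this callable
# to gossamer.benchmarks.run_benchmark instead.
rng = np.random.default_rng(0)
pos = rng.normal(size=(8, 2))
vel = np.zeros((8, 2))
accel = centroid_seek(pos, vel, rng)
print(accel.shape)  # (8, 2)
```

Note that the centroid pull sums to zero across agents, so a policy like this contracts the swarm without imparting net drift, which is the behavior the `rendezvous` metric rewards.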

Leaderboard

Run `leaderboard()` over any subset of scenarios and baselines; `generate_leaderboard_md()` turns the results into a Markdown table ready to paste into a paper or the docs. The canonical leaderboard is regenerated on every tool release and published alongside latest.mdx.

```python
from gossamer.benchmarks import leaderboard, generate_leaderboard_md

results = leaderboard(num_seeds=5)
print(generate_leaderboard_md(results))
```

Reproducibility

Each benchmark row carries: scenario, baseline, num_agents, steps, seed, metric, mean_reward, elapsed_sec. Seeds are published; rerunning with the same seed yields identical numbers.
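The determinism guarantee boils down to seeding one generator per row. The toy stand-in below is not gossamer code, just a sketch of the pattern: a fixed seed pins every draw, so the final metric is bit-for-bit reproducible.

```python
import numpy as np

def toy_run(seed: int, steps: int = 100) -> float:
    """Stand-in for a benchmark run: final metric of a seeded random walk.

    A harness that seeds its RNG this way reproduces identical rows
    whenever the published seed is reused.
    """
    rng = np.random.default_rng(seed)
    pos = np.zeros(2)
    for _ in range(steps):
        pos += rng.normal(size=2)
    return float(np.linalg.norm(pos))

# Same seed, same number -- bit-for-bit.
print(toy_run(seed=42) == toy_run(seed=42))  # True
```

This is why each benchmark row records its seed alongside the metric: any reader can replay exactly the run that produced a published number.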

The benchmark harness uses a pure-NumPy stepper by default so the suite has no physics-engine dependency at test time. For numbers that feed directly into a Leviathan paper, re-run with `ENGINE_MODE=inprocess` to route through the actual C++ core.
