Operator handbook
Daily playbook for teams running Velocity in staging and production.
Last updated October 8, 2025View on GitHub
Run Velocity with confidence
This handbook is written for the people who babysit Velocity day and night: platform engineers, SREs, and operations teams. Use it as a living checklist—keep it close to your runbooks and automation.
Readiness scan
Item | Target state | Verification tip |
---|---|---|
Tooling | Rust ≥ 1.81, Node ≥ 20, velocity-cli installed | rustc --version && node --version && velocity --version |
Network | UDP/4433 permitted end-to-end; fallback TCP path defined | nmap -sU <host> -p 4433 and a one-shot HTTP/3 probe |
Certificates | Hybrid Dilithium + ECDSA chain available, stored securely | velocity-cli cert inspect certs/<name> |
Hosts | Service account owns /etc/velocity (read) and /var/lib/velocity (write) | sudo -u velocity touch /var/lib/velocity/.probe |
Bake these checks into CI or pre-flight jobs so every environment starts from the same baseline.
First hour runbook
- Clone the pilot config: copy the starter YAML, rename it to match the environment, and commit it to your infrastructure repo.
- Generate or hybridise certificates: use
velocity-cli cert hybridize
against real chains; store outputs in your secret manager. - Launch the service:
velocity-cli serve --config /etc/velocity/<env>.yaml
from your automation or systemd unit. - Smoke-test clients: run the Velocity client and a plain HTTP/3 request to verify both the PQ and fallback paths.
- Snapshot telemetry: capture metrics, logs, and the admin diagnostic bundle as your gold baseline.
Configuration layers in practice
- Base file (
serve.<env>.yaml
) holds the long-lived settings: listeners, profiles, ticket rotation, and proxy targets. - Environment variables (
VELOCITY_*
) override secrets or unique host data—perfect for templated deployments. - Edge runtime file (
edge.yaml
) gives you programmable routing, auth, and templated responses without rebuilding binaries. - CLI flags have the last word. Use them sparingly for emergency overrides so the source of truth stays declarative.
Keep version-controlled samples for each layer and run velocity-cli config lint
before committing changes.
Observe and react
- Scrape the Prometheus endpoint and watch handshake latency (
velocity_handshake_duration_seconds
) plus downgrade counters. - Stream structured logs to your SIEM. Tag by
profile
,workload
, anddowngrade_reason
to accelerate incident triage. - Schedule
velocity-cli admin diagnostics --dump
as a cron job so you always have recent snapshots for forensics.
Troubleshooting playbook
Signal | Likely cause | First move |
---|---|---|
CERT_PQ_VALIDATION_FAILED | Missing Dilithium signature or expired chain | Re-issue the hybrid cert, confirm host clock, reload Velocity. |
Spike in downgrades | Clients negotiating HTTP/3 fallback or policy mismatch | Inspect downgrade_reason , adjust profile permits, notify client owners. |
Tickets failing to rotate | Secret not writable or cron misconfigured | Check permissions on /var/lib/velocity , re-run rotation job manually. |
Metrics gap | Scrape target moved or auth token expired | Reconcile Prometheus config, validate bearer tokens, retry scrape. |
UDP drops | Load balancer unaware of QUIC | Switch to UDP pass-through mode or deploy a supported proxy. |
Keep everyone aligned
- Publish the readiness and troubleshooting tables above in your internal wiki.
- Automate ticket rotation (24h max age, 12h rotation) using your secrets platform.
- Review the Security playbooks quarterly and practise the exploit hardening drills.
- Surface lessons learned in the next release retro and feed updates back into this handbook via pull requests.