Automated simulation pipeline for defence data foundations
1 December 2025 • Defence sector client • Defence / Simulation
Before: manual, one run at a time
After: automated, thousands of runs per day
Training ML models to predict real-world behaviour requires large volumes of high-quality simulation data. When simulations are run manually, one at a time, with results gathered by hand, the data bottleneck blocks the entire ML initiative before it starts.
The challenge
Our client needed to run thousands of simulation scenarios across a wide range of input parameters to build a data foundation for ML models. The existing process was entirely manual: an engineer would configure a simulation, run it, wait for it to complete, and then collect and organise the output by hand.
This created several problems:
- A single engineer could only run a handful of simulations per day, far short of the volume needed for meaningful ML training data
- There was no traceability: it was impossible to reliably link a batch of output data back to the exact configuration that produced it
- When simulations failed, there was no visibility into what went wrong. A failed run meant starting over from scratch
- No automatic recovery: transient failures required manual intervention and re-runs
- The manual process could not scale, and it was blocking downstream ML work that depended on having a large, well-structured dataset
Our approach
We designed and built a fully automated simulation pipeline that could schedule, execute, monitor, and store simulation runs around the clock with no manual intervention.
Scheduling (Prefect · cron triggers) → Simulation execution (Monte Carlo scenarios) → Data collection (MinIO · batch capture) → Processing (validation · transforms) → Storage (PostgreSQL · MinIO) → ML-ready dataset (training · evaluation)
Pipeline orchestration
Prefect serves as the orchestration engine. The client's infrastructure spans multiple on-premise machines: dedicated servers for running simulations and separate servers for post-processing. Prefect's worker and work pool model was a natural fit, allowing us to assign workloads to the right machines without building custom distribution logic. Simulation runs are scheduled via configurable triggers, and Prefect manages the full lifecycle of each run: dispatching to the correct worker pool, monitoring progress, handling retries on failure, and recording outcomes. The pipeline runs autonomously 24/7. Failed simulations are automatically retried with configurable backoff, and operators have full visibility into run status, throughput, and failure rates through Prefect's built-in dashboards.
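The automatic retry behaviour described above can be illustrated in plain Python. This is a minimal stand-in for what Prefect provides via `@task(retries=..., retry_delay_seconds=...)`, not the client's actual code; the function names are illustrative:

```python
import time

def run_with_retries(task, max_retries=3, base_delay=1.0, backoff=2.0, sleep=time.sleep):
    """Retry a failing task with exponential backoff, as the orchestrator
    does for failed simulation runs."""
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the failure for alerting
            sleep(delay)
            delay *= backoff

# Example: a simulation stub that fails twice before succeeding.
attempts = {"n": 0}

def flaky_simulation():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient failure")
    return "results"

result = run_with_retries(flaky_simulation, sleep=lambda _: None)
```

In the real pipeline the orchestrator also persists each attempt's state, so a run interrupted mid-simulation can be reassigned to another worker rather than merely retried in place.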
Traceability and metadata
Every simulation run is linked to its exact configuration parameters in PostgreSQL. This gives the team full lineage from input settings through to output dataset batches. Any data point in the resulting dataset can be traced back to the simulation run and configuration that produced it. This audit trail supports both reproducibility and compliance requirements.
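A lineage record of this shape can be sketched with `sqlite3` standing in for PostgreSQL. The table and column names, and the example configuration fields, are illustrative assumptions, not the client's actual schema:

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run (
    run_id      INTEGER PRIMARY KEY,
    config_hash TEXT NOT NULL,
    config_json TEXT NOT NULL
);
CREATE TABLE output_batch (
    batch_id INTEGER PRIMARY KEY,
    run_id   INTEGER NOT NULL REFERENCES run(run_id),
    uri      TEXT NOT NULL
);
""")

def record_run(config: dict) -> int:
    """Store the exact configuration alongside every run so outputs stay traceable."""
    blob = json.dumps(config, sort_keys=True)
    digest = hashlib.sha256(blob.encode()).hexdigest()
    cur = conn.execute(
        "INSERT INTO run (config_hash, config_json) VALUES (?, ?)", (digest, blob)
    )
    return cur.lastrowid

run_id = record_run({"scenario": 7, "seed": 42})
conn.execute(
    "INSERT INTO output_batch (run_id, uri) VALUES (?, ?)",
    (run_id, "s3://sim-output/batch-0001"),
)

# Trace a batch back to the exact configuration that produced it.
(config_json,) = conn.execute(
    "SELECT r.config_json FROM output_batch b "
    "JOIN run r ON r.run_id = b.run_id WHERE b.batch_id = 1"
).fetchone()
```

Hashing the canonicalised configuration gives a stable identifier for "same inputs", which is also what makes re-runs and audits straightforward.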
Data storage and processing
MinIO provides S3-compatible object storage for simulation outputs, running entirely on-premise. Automated post-processing pipelines validate and transform raw simulation output into ML-ready datasets. Batch tracking gives clear visibility into which simulation batches completed successfully and which need attention, making it straightforward to identify and re-run problematic batches.
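The batch-tracking idea reduces to a small status ledger. A minimal sketch (class and status names are illustrative; the real pipeline keeps this state in PostgreSQL):

```python
from enum import Enum

class BatchStatus(Enum):
    PENDING = "pending"
    COMPLETED = "completed"
    FAILED = "failed"

class BatchTracker:
    """Record the outcome of each simulation batch, then query which ones
    need attention and should be re-run."""

    def __init__(self):
        self._status = {}

    def mark(self, batch_id: str, status: BatchStatus) -> None:
        self._status[batch_id] = status

    def needing_attention(self) -> list:
        return sorted(b for b, s in self._status.items() if s is BatchStatus.FAILED)

tracker = BatchTracker()
tracker.mark("batch-0001", BatchStatus.COMPLETED)
tracker.mark("batch-0002", BatchStatus.FAILED)
tracker.mark("batch-0003", BatchStatus.COMPLETED)
```

A single query over this ledger answers "which batches failed overnight?", which is what makes targeted re-runs straightforward instead of starting over from scratch.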
Security and reliability
The entire stack runs on-premise on Linux servers with no external network dependencies. PostgreSQL, MinIO, and Prefect are all self-hosted within the client's infrastructure. No simulation data leaves the network, meeting strict data sovereignty requirements.
Reliability was a core design constraint. The pipeline handles hardware failures and transient errors gracefully: every task is idempotent and safely retryable. Prefect tracks the state of every run, so if a machine goes down mid-simulation, the work is automatically reassigned and restarted. Structured logging and alerting ensure the team is notified of failures before they compound. The result is a system that runs unattended for weeks at a time without data loss or silent failures.
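One way to make a storage task safely retryable, sketched here as an assumption about the general technique rather than the client's implementation: derive the output location deterministically from the run configuration, so a retried run overwrites its own partial output instead of creating a duplicate.

```python
import hashlib
import json

store = {}  # stand-in for the object store

def output_key(config: dict) -> str:
    """Derive a deterministic key from the run configuration, so a retried
    run writes to the same location as the original attempt."""
    blob = json.dumps(config, sort_keys=True).encode()
    return "sim-output/" + hashlib.sha256(blob).hexdigest()[:16]

def store_results(config: dict, payload: bytes) -> str:
    key = output_key(config)
    store[key] = payload  # overwrite on retry: running the task twice is safe
    return key

cfg = {"scenario": 7, "seed": 42}
k1 = store_results(cfg, b"first attempt")
k2 = store_results(cfg, b"retry after transient failure")
```

Because the key depends only on the inputs, the two attempts collide on the same location and the dataset never accumulates duplicate or half-written batches.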
Results
The pipeline replaced a manual process that could produce a handful of simulation runs per day with a fully autonomous system.
- 1000s of simulations per day, up from single manual runs
- 24/7 autonomous operation, with no manual intervention
- Full data lineage: traceability and sovereignty
- Automatic failure recovery: retry, logging, and alerting
The entire system is deployed fully on-premise with zero external network dependencies. The data bottleneck was removed, and the client's ML team can now train models on a data foundation that was previously impossible to produce at the required scale, all running within their own infrastructure, with full traceability and complete data sovereignty.
Client feedback
“What used to take an engineer all day now runs overnight without anyone touching it.”
“We finally have the data volume to train models properly. The pipeline removed the bottleneck we had been stuck on for months.”
Working on a similar challenge?
We build AI systems for defence and critical infrastructure clients across Northern Europe. Let's talk about what's possible for your environment.
Let's talk