Cyber-physical systems simulation for critical infrastructure validation
Simulation
05 / 15 / 2026

Key Takeaways
- Cyber-physical systems need validation that treats software, timing, sensors, and plant behaviour as one loop.
- Critical infrastructure testing works best when scenarios are ranked by operational consequence and recovery impact.
- Acceptance criteria matter only when they tie test results to safe response, service continuity, and operator clarity.
Critical infrastructure validation works only when you test timing, sensing, and control as one closed loop.
Cyber-physical systems connect software logic to electrical, mechanical, and process behaviour, so failure often appears at the boundary between code and physics. About 85% of the nation’s critical infrastructure is owned and operated by the private sector. That mix of legacy assets, custom controls, and sector rules means you can’t rely on generic software verification. You need simulation and testing that reproduces signals, delays, faults, and operator actions before deployment.
Cyber-physical systems couple computation with physical process timing

Cyber-physical systems combine computation, communications, and physical processes under timing limits that matter to safety and service. A controller scan that arrives late can change valve position, relay pickup, or braking force. That tight coupling makes validation a systems task. Software tests alone won’t capture it.
A feeder relay shows the point clearly. Current samples enter an analogue front end, protective logic calculates a trip, and a breaker coil must act within a set window. A water pump station works the same way because pressure, motor speed, and network delay feed back into control logic. Each path has timing and physical consequences.
You need to model the plant and the controller in one loop so the system behaves as it will in service. That means reproducing sensor resolution, actuator limits, communication jitter, and fault transients. Teams that treat cyber and physical pieces separately usually miss compound faults. Those misses show up late, when fixes cost more and field confidence is lower.
Validation fails when timing assumptions stay untested
Validation fails when engineers assume time is a constant rather than a test variable. Scan periods, task pre-emption, bus latency, and interrupt handling all shift control response. A design can look correct in offline analysis and still trip, stall, or hunt in operation. Timing needs direct testing.
A compressor control loop makes this visible. The model may hold pressure within limits at a 1-millisecond step, yet the embedded target can miss its deadline during a communications burst and command the wrong valve state. The fault won’t appear in code review. It appears when execution time, I/O updates, and plant response interact.
Timing tests need more than worst-case CPU numbers. You should inject latency, vary scheduler load, and record the exact point where control quality degrades. That approach separates harmless delay from delay that changes physical behaviour. Once you know the boundary, design changes become specific instead of speculative.
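A latency sweep of this kind can be sketched in a few lines. The example below is a minimal illustration, not a validated plant model. It assumes a first-order plant, a proportional-integral controller, a measurement delay modelled as a sample buffer, and a 10% overshoot limit as the marker for degraded control quality. The function names, gains, and limits are all hypothetical.

```python
from collections import deque


def overshoot_for_delay(delay_steps, kp=2.0, ki=4.0, tau=0.5, dt=0.001, t_end=5.0):
    """Simulate a unit-step response with the measurement delayed by delay_steps samples.

    Returns the peak overshoot above the setpoint (0.0 if the output never exceeds it).
    Plant and gains are illustrative assumptions, not values from a real system.
    """
    setpoint = 1.0
    y = 0.0          # plant output
    integral = 0.0   # PI integrator state
    peak = 0.0
    buffer = deque([0.0] * delay_steps)  # FIFO that models the injected latency

    for _ in range(int(t_end / dt)):
        buffer.append(y)
        y_measured = buffer.popleft()    # value from delay_steps samples ago
        error = setpoint - y_measured
        integral += error * dt
        u = kp * error + ki * integral
        y += dt * (u - y) / tau          # first-order plant: tau * dy/dt = u - y
        peak = max(peak, y)

    return max(0.0, peak - setpoint)


def first_unacceptable_delay(delays_ms, limit=0.10, dt=0.001):
    """Return the first delay (in ms) whose overshoot exceeds the limit, or None."""
    for delay_ms in delays_ms:
        steps = round(delay_ms / 1000.0 / dt)
        if overshoot_for_delay(steps, dt=dt) > limit:
            return delay_ms
    return None


if __name__ == "__main__":
    boundary = first_unacceptable_delay(range(0, 400, 25))
    print("control quality degrades at a delay of about", boundary, "ms")
```

On a real target the same sweep would run against the deployed scheduler and I/O path, with the delay injected in the communication or task layer instead of in a software buffer.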
Sensor simulation must match field behaviour under stress
Sensor simulation has to reproduce how field instruments misread, saturate, drift, and recover under stress. Clean signals produce clean results, but critical systems rarely operate on clean signals. If you want valid evidence, your test inputs must include the same imperfections that shape controller response in service.
A distance relay fed with ideal current waveforms will appear stable until a fault pushes a current transformer into saturation. A pipeline pressure controller can look accurate until the transmitter adds noise after a pump start. Those cases are not edge details. They decide trip timing, alarm quality, and operator trust.
Good sensor simulation covers range limits, quantization, missing packets, stuck values, and calibration error. You also need the right fault duration because a 20-millisecond dropout and a 2-second freeze trigger different logic. Matching field behaviour matters more than signal volume. A small set of believable sensor faults tells you far more than thousands of perfect runs.
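The fault types listed above can be layered onto an ideal signal before it reaches the controller under test. The sketch below is one possible arrangement, assuming a uniformly sampled signal and a hypothetical `apply_sensor_faults` helper. A dropped sample is represented as `None`, and a stuck sensor repeats its last reading. Real instrument models, such as a current transformer saturation curve, would need far more detail.

```python
import math


def apply_sensor_faults(samples, dt, *, gain_error=0.0, offset=0.0,
                        lo=-math.inf, hi=math.inf, lsb=0.0,
                        freeze=None, dropout=None):
    """Return a copy of samples with sensor imperfections applied.

    freeze and dropout are (start_s, duration_s) tuples. All parameter names
    are illustrative assumptions.
    """
    def window(spec):
        # Convert a (start, duration) pair in seconds to a range of sample indices.
        if spec is None:
            return range(0)
        start_s, duration_s = spec
        return range(round(start_s / dt), round((start_s + duration_s) / dt))

    frozen, dropped = window(freeze), window(dropout)
    out, last_good = [], None
    for i, x in enumerate(samples):
        y = x * (1.0 + gain_error) + offset   # calibration error
        y = max(lo, min(hi, y))               # range limit (hard saturation)
        if lsb > 0:
            y = round(y / lsb) * lsb          # quantization to the converter step
        if i in frozen and last_good is not None:
            y = last_good                     # stuck value: repeat the last reading
        else:
            last_good = y
        out.append(None if i in dropped else y)  # missing packet
    return out


if __name__ == "__main__":
    dt = 0.001
    ideal = [math.sin(2 * math.pi * 5 * i * dt) for i in range(3000)]
    short_dropout = apply_sensor_faults(ideal, dt, dropout=(0.5, 0.020))
    long_freeze = apply_sensor_faults(ideal, dt, freeze=(0.5, 2.0))
    print(short_dropout.count(None), "samples dropped")
    print(len(set(long_freeze[500:2500])), "distinct value(s) during the freeze")
```

Keeping the fault duration as an explicit parameter makes it easy to run the same scenario with a 20-millisecond dropout and a 2-second freeze and compare which logic path the controller takes.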
Embedded systems testing needs closed-loop execution fidelity
Embedded systems testing needs closed-loop execution so code, I/O, and plant response interact at the same rate. Open-loop playback can confirm functions, but it won’t expose unstable control, missed deadlines, or unsafe state transitions. Fidelity here means the controller sees consequences immediately. That is what turns a bench check into validation.
A motor drive controller shows why. Firmware that looks solid in a software debugger can oscillate once pulse updates, ADC sampling, and load torque hit the hardware with exact timing. Teams often use platforms such as OPAL-RT to connect controller hardware to a plant model and verify those interactions before a field test. That setup shows the difference between code that runs and control that holds.
Closed-loop fidelity is not about maximum detail everywhere. You need high fidelity at the interfaces that set control response, especially PWM timing, interrupt service, network exchange, and protection logic. Less important subsystems can stay simplified if they do not change those interfaces. That balance keeps tests credible without turning the model into a maintenance burden.
| Validation focus | Why it matters in practice |
|---|---|
| Controller task timing must match deployed scheduling. | Small deadline slips can alter trip logic even when average CPU load looks safe. |
| Sensor inputs must include noise, saturation, and dropout. | Controllers tuned on ideal data often mishandle faults or issue nuisance alarms. |
| Actuator limits must be part of the loop. | Ignoring deadbands and slew limits hides recovery problems after disturbances. |
| Network effects must be injected as delay and loss. | Protection and supervisory functions can diverge once messages arrive late or out of order. |
| Operator actions must be tested as discrete events. | Manual override paths often create unsafe transitions that automation tests never touch. |
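The actuator row is a common omission in plant models, so a small illustration may help. The sketch below assumes a valve-like actuator with a deadband, a slew-rate limit, and hard travel limits. The `actuator_step` function and its parameters are hypothetical and stand in for whatever actuator model your plant simulation uses.

```python
def actuator_step(position, command, dt, *, slew_rate, deadband, lo=0.0, hi=1.0):
    """Advance an actuator by one time step toward the commanded position.

    slew_rate is the maximum travel per second, and deadband is the smallest
    error the actuator responds to. All values are illustrative assumptions.
    """
    error = command - position
    if abs(error) <= deadband:
        return position                      # inside the deadband: no movement
    max_move = slew_rate * dt
    move = max(-max_move, min(max_move, error))  # slew-rate limit
    return max(lo, min(hi, position + move))     # hard travel limits


if __name__ == "__main__":
    position, steps = 0.0, 0
    while abs(1.0 - position) > 0.01:
        position = actuator_step(position, 1.0, 0.1, slew_rate=0.5, deadband=0.01)
        steps += 1
    print("valve reached the command after", steps, "steps of 0.1 s")
```

A controller tuned against an instantaneous actuator will not see this travel time, which is why recovery after a disturbance can look faster in simulation than in service.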
Critical infrastructure validation starts with consequence-based scenarios
Critical infrastructure validation should start with consequence, because not every fault deserves the same test depth. A nuisance alarm on a redundant panel is different from a missed trip on a feeder or a wrong open command on a valve. Test scope follows impact. That is how scarce lab time stays useful.
Sector breadth makes that prioritization necessary. Canada groups critical infrastructure into 10 sectors, and each sector mixes different processes, failure costs, and response times. A hospital backup power controller needs a different scenario set than a wastewater lift station. Uniform test plans look tidy on paper and fail in practice.
A consequence-based plan ranks scenarios by service loss, safety exposure, recovery time, and operator workload. That ranking helps you choose which faults need hardware-in-the-loop testing, which need longer endurance runs, and which can stay in model review. You won’t eliminate all uncertainty. You will focus effort where a wrong assumption carries the highest cost.
Power grid protection needs hardware-in-the-loop fault studies

Power grid protection needs hardware-in-the-loop fault studies because relays act on fast transients, not on averaged behaviour. Pickup thresholds, logic timing, and breaker coordination depend on exact current and voltage waveforms. A static model will miss that detail. Fault studies have to run at protection speed.
A feeder protection test can inject a single line-to-ground fault, current transformer saturation, breaker failure, and delayed teleprotection in one sequence. The relay then sees the same waveform distortion and timing pressure it will face on the system. That is where underreach, overreach, and nuisance tripping become visible. Settings that seemed conservative can prove unstable under stressed conditions.
Grid teams also need to test restoration logic after the trip. Reclosing timers, synchronism checks, and blocking signals can interact in awkward ways after a disturbance. That is why protection validation covers the full event, from pre-fault steady state to post-fault recovery. A pass based only on the initial trip leaves the risky part of the sequence untouched.
Security testing must measure deterministic response under attack
Security testing for cyber-physical systems must measure deterministic response, not only intrusion detection. An attack matters because it alters timing, data quality, or control state at a specific moment. If your test cannot show how the plant responds within that window, you don’t yet know the operational risk.
Consider a replay attack on a substation measurement stream. The packet content can look valid while the time stamp is stale, which can hold a controller in an unsafe state or delay a trip. A water treatment skid faces a similar problem when a spoofed level signal suppresses a pump shutdown. Security evidence has to include the physical consequence of the cyber event.
Deterministic response testing focuses on bounded outcomes. You should measure maximum safe delay, maximum tolerable packet loss, and the point where fallback logic takes control. That method gives operators rules they can use under stress. It also keeps security work tied to plant performance instead of abstract severity scores.
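The replay case above suggests one bounded check a test bench can exercise: reject measurements whose time stamp is too old or does not advance, and hand control to fallback logic after a fixed number of bad samples. The sketch below is a minimal illustration, assuming synchronised clocks and a hypothetical `FreshnessGuard` class. The age limit and the bad-sample count are placeholders for the limits your own tests establish.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    value: float
    timestamp: float  # seconds on the source clock (assumed synchronised)


class FreshnessGuard:
    """Decide whether a measurement can be used, held over, or must trigger fallback."""

    def __init__(self, max_age_s: float, max_consecutive_bad: int):
        self.max_age_s = max_age_s
        self.max_consecutive_bad = max_consecutive_bad
        self.bad_count = 0
        self.last_timestamp: Optional[float] = None

    def accept(self, m: Optional[Measurement], now: float) -> str:
        """Return 'use', 'hold', or 'fallback' for this control cycle."""
        bad = (
            m is None                                     # lost packet
            or (now - m.timestamp) > self.max_age_s       # too old
            or (self.last_timestamp is not None
                and m.timestamp <= self.last_timestamp)   # replayed or out of order
        )
        if not bad:
            self.last_timestamp = m.timestamp
            self.bad_count = 0
            return "use"
        self.bad_count += 1
        if self.bad_count >= self.max_consecutive_bad:
            return "fallback"
        return "hold"


if __name__ == "__main__":
    guard = FreshnessGuard(max_age_s=0.05, max_consecutive_bad=3)
    captured = Measurement(value=1.0, timestamp=0.00)
    print(guard.accept(captured, now=0.01))       # fresh sample
    for now in (0.02, 0.03, 0.04):
        print(guard.accept(captured, now=now))    # same packet replayed
```

In a closed-loop test the useful result is how the plant behaves during the "hold" cycles and after the switch to fallback, since that behaviour is what sets the maximum safe delay.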
Acceptance criteria should reflect operational risk before deployment
Acceptance criteria should reflect operational risk before deployment, because a passing test only matters if it maps to service, safety, and recovery goals. You need thresholds that say when timing error is acceptable, when sensor deviation is tolerable, and when fallback behaviour is mandatory. That is the standard that counts.
A useful acceptance set is specific enough to fail a weak design and simple enough for teams to apply consistently. One relay test might pass on trip speed and still fail on reset stability after a breaker lockout. A pump controller might keep flow within limits and still overload an operator with ambiguous alarms. Good criteria reflect the full operating sequence, not a single clean moment.
- Latency limits are tied to the protective or control function they affect.
- Sensor error bounds include drift, dropout, and saturation under faulted states.
- Fallback modes restore a safe state within a defined recovery window.
- Operator actions remain clear when alarms, overrides, and trips occur together.
- Pass results require repeatable performance across nominal and stressed runs.
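The measurable items in this list can be encoded so every run is judged the same way. The sketch below is an assumption-based illustration with hypothetical field names and thresholds. It covers latency, sensor error, recovery time, and repeatability across nominal and stressed runs. Operator clarity is left out because it needs human assessment, not a numeric check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    condition: str            # "nominal" or "stressed"
    trip_latency_ms: float
    sensor_error_pct: float
    recovery_time_s: float


@dataclass(frozen=True)
class Criteria:
    max_trip_latency_ms: float
    max_sensor_error_pct: float
    max_recovery_time_s: float


def failures(run, criteria):
    """List the criteria a single run violates (empty list means the run passed)."""
    failed = []
    if run.trip_latency_ms > criteria.max_trip_latency_ms:
        failed.append("latency")
    if run.sensor_error_pct > criteria.max_sensor_error_pct:
        failed.append("sensor error")
    if run.recovery_time_s > criteria.max_recovery_time_s:
        failed.append("recovery")
    return failed


def accept(runs, criteria):
    """Pass only if nominal and stressed runs are both present and every run passes."""
    conditions = {run.condition for run in runs}
    if not {"nominal", "stressed"} <= conditions:
        return False
    return all(not failures(run, criteria) for run in runs)


if __name__ == "__main__":
    criteria = Criteria(max_trip_latency_ms=40.0, max_sensor_error_pct=2.0,
                        max_recovery_time_s=5.0)
    runs = [
        RunResult("nominal", 22.0, 0.4, 1.8),
        RunResult("stressed", 37.0, 1.6, 6.2),
    ]
    for run in runs:
        print(run.condition, failures(run, criteria) or "pass")
    print("accepted:", accept(runs, criteria))
```

Requiring both conditions in the result set stops a design from passing on nominal runs alone.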
Teams that keep these criteria visible build better judgment over time because every test result maps to a concrete operating risk. That discipline matters more than a long report full of isolated pass marks. OPAL-RT appears in many labs for exactly this execution work, where controller hardware, plant models, and fault cases need to run as one timed system. The value comes from disciplined test design and honest acceptance thresholds, because that is what turns simulation into evidence.
EXata CPS is designed for real-time performance so that teams can study cyberattacks on power systems through a communication network layer of any size, connected to any number of devices, in HIL and PHIL simulations. It is a discrete-event simulation toolkit that accounts for the physics-based properties that affect how a wired or wireless network behaves.


