How to Build Large Scale Cyber-Physical Systems

To build large-scale safety-critical systems, we need to decompose the system into smaller solvable problems, resolve what is known, and resolve unknowns through experiments, Robin Yeman argued at QCon New York. She suggested investing early in test environments for both software and hardware so that teams can work test-driven from the start, which increases the safety, security, reliability, and availability of the systems.

Robin Yeman spoke about building large-scale cyber-physical systems at QCon New York.

There are several challenges in building hardware-reliant cyber-physical systems, such as hardware lead times, organisational structure, common language, system decomposition, cross-team communication, alignment, and culture. People engaged in the development of large-scale safety-critical systems need a line of sight to business objectives, Yeman said. Each team should be able to connect its daily work to those objectives.

Yeman suggested communicating the objectives through the intent and goals of the system rather than through specific tasks. An example of an intent-based system objective would be to ensure the system can communicate securely with military platforms, as opposed to specifying that the system must communicate via Link 16, she added.

Yeman advised breaking the system problem down into smaller solvable problems. For each of those problems, resolve what is known first and then resolve the unknowns through a series of experiments, she said. This approach allows you to iteratively and incrementally build a continuously validated solution.

Each requirement should be written as an objective test case that does not need to be interpreted, Yeman said. Large-scale systems are built by multiple teams, so each requirement is read by many people with different mental models. Requirements that leave room for interpretation will be interpreted differently, resulting in problems, as Yeman explained:

Think of the Mars Orbiter we lost in 1999 due to a navigation error caused by a mix-up of units—specifically, the spacecraft’s thrusters were controlled using the metric system, while NASA’s ground-based software used the Imperial (English) system for calculations.
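A unit mix-up like this is the kind of ambiguity that a requirement written as an objective test case can remove. The following is a minimal sketch, not code from the talk or from any mission: the `Impulse` type, the `from_pound_force_seconds` helper, the 10 lbf·s example value, and the 0.01 N·s tolerance are all hypothetical. It assumes a requirement along the lines of "impulse values crossing a team interface are expressed in newton-seconds" and turns it into a check that either passes or fails.

```python
from dataclasses import dataclass

# 1 lbf is defined as exactly 4.4482216152605 N, so the same factor
# converts pound-force seconds to newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """Hypothetical interface type: impulse is stored in SI units only."""

    newton_seconds: float

    @classmethod
    def from_pound_force_seconds(cls, value: float) -> "Impulse":
        # Non-SI input must be converted explicitly at the boundary.
        return cls(newton_seconds=value * LBF_S_TO_N_S)


def test_impulse_interface_uses_newton_seconds() -> None:
    # Hypothetical requirement: a 10 lbf*s burn must cross the interface
    # as 44.48 N*s, within a tolerance of 0.01 N*s.
    impulse = Impulse.from_pound_force_seconds(10.0)
    assert abs(impulse.newton_seconds - 44.482216152605) < 0.01


test_impulse_interface_uses_newton_seconds()
```

Because the unit is part of the type and the expected value is part of the test, two teams reading the same requirement cannot settle on different units without the check failing.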

Yeman advised investing early in test environments for both software and hardware. In many cases we do not test early enough, she said. The waterfall lifecycle places testing at the end, which means test environments are purchased late in the lifecycle. With agile, we want to test early and often.

Being test-driven increases the safety, security, reliability, and availability of the systems, Yeman mentioned:

Lockheed Martin ensures that programs such as F-35 or Orion leverage models and digital twins early to verify and validate early in the life cycle. Digital twins are cyber copies of the physical system. These are significant investments, but they allow us to validate system behaviours early. Digital twins range from low fidelity to high fidelity depending on the stage of the lifecycle and the investment made. High fidelity twins are connected to the physical system and process full telemetry.

Modelling physical systems in the digital space, using models and digital twins, gives feedback on hardware far earlier, closer to the timelines of software.
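To illustrate the low-fidelity end of that range, here is a small sketch of control software being exercised against a simulated stand-in before any hardware exists. It is purely illustrative and not drawn from the programs mentioned above: the heater scenario, the `LowFidelityHeaterTwin` class, the `thermostat` function, and every coefficient are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LowFidelityHeaterTwin:
    """First-order thermal model standing in for hardware not yet built."""

    temperature_c: float = 20.0
    ambient_c: float = 20.0
    heating_rate_c_per_s: float = 0.5   # assumed value, not measured
    cooling_coeff_per_s: float = 0.01   # assumed value, not measured

    def step(self, heater_on: bool, dt_s: float = 1.0) -> float:
        # Heat added by the heater minus heat lost to the surroundings.
        heat = self.heating_rate_c_per_s if heater_on else 0.0
        loss = self.cooling_coeff_per_s * (self.temperature_c - self.ambient_c)
        self.temperature_c += (heat - loss) * dt_s
        return self.temperature_c


def thermostat(temperature_c: float, setpoint_c: float) -> bool:
    """Control software under test: heater is on below the setpoint."""
    return temperature_c < setpoint_c


def run(twin: LowFidelityHeaterTwin, setpoint_c: float, steps: int) -> List[float]:
    """Close the loop between the controller and the twin."""
    history = []
    for _ in range(steps):
        heater_on = thermostat(twin.temperature_c, setpoint_c)
        history.append(twin.step(heater_on))
    return history


history = run(LowFidelityHeaterTwin(), setpoint_c=40.0, steps=300)
# Example acceptance check: after warm-up, stay within 0.5 degC of the setpoint.
assert all(39.5 < t < 40.5 for t in history[100:])
```

A higher-fidelity twin would replace the assumed coefficients with measured ones and, as described above, eventually be fed by telemetry from the physical system; the controller and its acceptance check could stay the same.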

InfoQ interviewed Robin Yeman about building cyber-physical systems.

InfoQ: How do you deal with compliance and safety regulations using agile?

Robin Yeman: Begin with the constraint using security test-driven development and compliance-driven development. This forces us to build compliance and safety into the system, as opposed to bolting them on after we have built the system.

A good example is when you build software for a classified environment. If you build it in an unclassified, open environment, you do not realise how many calls are going out to the internet for libraries. The software developed in this way will never work in a classified environment. Each library needs to be downloaded and placed in the environment. Beginning development in the closed environment will show what items can actually be built.
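One way to surface hidden internet calls before a closed environment is available is to make outbound connections fail in tests. The sketch below is an assumption-laden illustration rather than a practice described in the interview: `closed_environment`, `NetworkBlockedError`, and `fetches_from_network` are hypothetical names, and the patch only intercepts Python-level `socket.socket.connect` calls in the current process, so it is no substitute for building in the real closed environment.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkBlockedError(RuntimeError):
    """Raised when code under test tries to open an outbound connection."""


@contextmanager
def closed_environment():
    """Hypothetical helper: refuse every socket connection inside the block."""

    def refuse(*args, **kwargs):
        raise NetworkBlockedError(f"outbound connection attempted: {args!r}")

    # patch.object restores the original method when the block exits.
    with mock.patch.object(socket.socket, "connect", side_effect=refuse):
        yield


def fetches_from_network() -> None:
    # Stand-in for a build step that quietly downloads a library.
    with socket.socket() as sock:
        sock.connect(("192.0.2.1", 443))  # TEST-NET-1 documentation address


try:
    with closed_environment():
        fetches_from_network()
except NetworkBlockedError as exc:
    print(f"would fail in a closed environment: {exc}")
```

Running a test suite inside such a block gives an early, if partial, list of the libraries and services that would have to be brought into the closed environment by hand.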

InfoQ: What benefits can we get from this approach?

Yeman: The earlier the feedback, the less rework, and we can mitigate risk early. There have been times when I have used an Excel spreadsheet to evaluate how to test requirements before implementation and have found that in some cases more than 30% of requirements are not testable.

By getting that feedback early before solutions have been deployed, I have saved millions of dollars and countless hours of frustration. Be like a carpenter: measure twice and cut once.
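The spreadsheet review Yeman describes can be approximated in a few lines. This is a sketch under stated assumptions, not her actual method: the `Requirement` record, the `verification` column, and the four example rows are invented, and "testable" is reduced here to "someone has written down how it would be verified".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Requirement:
    identifier: str
    text: str
    verification: Optional[str]  # how it would be tested; None if nobody can say


def untestable_share(requirements: List[Requirement]) -> float:
    """Fraction of requirements with no stated way to verify them."""
    if not requirements:
        return 0.0
    untestable = [r for r in requirements if not (r.verification or "").strip()]
    return len(untestable) / len(requirements)


# Invented example rows, standing in for an exported spreadsheet.
rows = [
    Requirement("REQ-1", "Link latency shall be below 200 ms.",
                "Measure round-trip time on the test bench"),
    Requirement("REQ-2", "The system shall be user friendly.", None),
    Requirement("REQ-3", "The operator display shall refresh at 10 Hz.",
                "Log frame timestamps for 60 s"),
    Requirement("REQ-4", "The system shall be robust.", ""),
]

print(f"{untestable_share(rows):.0%} of requirements have no test approach")
```

Requirements flagged this way can be rewritten or clarified before implementation starts, which is when the feedback is cheapest to act on.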

About the Author

Ben Linders
