Event-driven systems in manufacturing
I’ve been teaching software design and architecture for over twelve years now and applying it for over twelve more. A lot of that work has centered around objects, components and services, all of which are more or less structural elements of software systems. A key notion since the early 70s is that our world consists of objects and that it makes sense to represent them one-to-one in software structures. This has been stressed even more in the past 20 years in domain-driven design. Over time, however, both my own experience and that of the advocates of domain-driven design have made it clear that the world doesn’t consist solely of objects, but that most of these objects react to things that happen: events.
As a simple example, let’s consider a customer calling our factory to put in a new order. Looking from an object-oriented perspective, we model the customer, the factory employee answering the phone and the phone itself all as objects. Traditionally, these objects invoke operations on each other. Or rather, in the terminology of Smalltalk, the first real object-oriented programming language, they send messages to each other.
In a naive design, the customer sends a dial message to the phone, which connects to the phone in the factory, which then alerts the operator. In real life, however, my phone doesn’t alert me; it just rings and I decide whether I’m alerted or not. I may not even be in the vicinity, in which case I can’t be alerted at all, but the phone still rings.
We all know that, in reality, the phone isn’t going to invoke a pickup operation on the employee. The employee picks up the phone (or not) as a reaction to it ringing. The ringing is the actual event and the employee handles it by picking up the phone. Similar reasoning can be applied to the customer dialing a number on their phone and the call coming in at the factory phone.
So, we end up with objects triggering and handling events instead of just sending messages to each other. Events are then defined as things that happen in the system. They’re directly connected to a change in the state of that system and contain data that relates to the nature of the change. Because the source of an event doesn’t wait for, or even know about, the objects that handle it, object communication becomes a lot more asynchronous than the invocation of operations in ‘classic’ object-oriented systems.
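To make this definition a bit more tangible, here’s a minimal Python sketch of what an event could look like as a data structure. The field names (name, source, payload, event_id, occurred_at) and the example values are my own assumptions for illustration, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any
from uuid import uuid4


@dataclass(frozen=True)
class Event:
    """Something that happened: a state change plus the data describing it."""

    name: str    # what happened, e.g. "phone.ringing" (hypothetical naming scheme)
    source: str  # the object that raised the event
    payload: dict[str, Any] = field(default_factory=dict)  # data about the change
    event_id: str = field(default_factory=lambda: uuid4().hex)
    occurred_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


# The factory phone reports that it's ringing. It doesn't say who should react.
ringing = Event(
    name="phone.ringing",
    source="factory-phone",
    payload={"caller": "customer-42"},
)
```

The event is immutable (frozen) because it describes something that has already happened. Note that it carries no reference to a receiver: whether the employee picks up is not the phone’s concern.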
Flexibility
Enter event-driven architecture. This is an architectural style that considers a system to be a set of components that react to change by handling and/or raising events. Each component can then be an event source or an event handler. In many cases, they’re actually both.
Let’s look at a manufacturing system where a sensor detects a problem with a machine and raises an event. The alarm system then handles the event by notifying the operators, since problems with machines often require immediate attention. At the same time, the event is also handled by the order system of the machine supplier, triggering an order for relevant spare parts in the supplier’s warehouse. Similarly, an event-driven architecture makes it very easy to track the flow of materials or orders through a workshop or trace the use of containers, trays and mobile equipment through the production process. It allows operational processes and statistical analysis to be handled in the same way.
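The sensor example can be sketched with a toy in-process event bus in Python. This is an illustration under assumptions, not a production design: the EventBus class, the event name "machine.fault" and the machine and part identifiers are all made up, and a real factory would use a broker rather than a Python dictionary.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Toy in-process bus: sources publish, handlers subscribe by event name."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Handler) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event_name: str, data: dict) -> None:
        # The source doesn't know who is listening, or whether anyone is.
        for handler in list(self._handlers.get(event_name, [])):
            handler(data)


notifications: list[str] = []
spare_part_orders: list[str] = []


def alarm_system(data: dict) -> None:
    """Notifies the operators about the machine problem."""
    notifications.append(
        f"Operator alert: {data['machine']} reports {data['fault']}"
    )


def supplier_order_system(data: dict) -> None:
    """Orders the relevant spare part at the supplier."""
    spare_part_orders.append(data["part"])


bus = EventBus()
bus.subscribe("machine.fault", alarm_system)
bus.subscribe("machine.fault", supplier_order_system)

# The sensor acts as the event source and raises a single event.
bus.publish(
    "machine.fault",
    {"machine": "press-3", "fault": "bearing overheating", "part": "bearing-6204"},
)
```

Both handlers react to the same event and neither knows about the other. Adding a third handler, for example one that feeds statistical analysis, wouldn’t require touching the sensor code.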
One of the strengths of an event-driven architecture is that with the right technology in place, different components can be much more decoupled from each other. Distribution and handling of events can be based on standard technology and simple data structures, removing the need for big, product-specific APIs.
These APIs now often get in the way of making manufacturing more flexible and more open. A lot of factories operate a software stack that consists, from top to bottom, of an ERP system, an MES and often a SCADA system and multiple PLCs. Each communicates with the layer above or below via predefined interfaces. Every change in the process that requires new or different data leads to (expensive) changes in these interfaces.
Raising a new event and publishing it on an event bus only requires a change in the component that raises the event. Components interested in that event can be modified as needed. An event bus is a piece of purpose-built communications software for exchanging events, often in the form of messages. Examples are MQTT brokers, RabbitMQ and Apache Kafka.
Having an event bus in place brings a lot of flexibility to the shop floor and the systems that control it. It allows different parts of the manufacturing system to respond to events in real time.
Scaling
While event-driven architecture has clear benefits, it also has some disadvantages if not managed properly. In the same way that a person may get overwhelmed by the number of stimuli encountered in a big city or a crowded amusement park, an event-driven system, and its maintainers, can get overwhelmed by the sheer number and variety of events.
This is where the combination of an event-driven architecture and a unified namespace (UNS) comes into play. A UNS can be compared to a well-organized folder structure on a PC. Just like every file goes into a specific folder in a folder hierarchy, every event can be placed in a specific location in an event hierarchy. A UNS does exactly that: it defines a structure that allows us to position events where they belong. In manufacturing, the ISA-95 hierarchy is often used as a starting point for this, with factory-specific modifications being introduced over time.
With this structure in place as part of the event-handling system, different components can focus on (i.e., subscribe to) events in the parts of the hierarchy they’re interested in while ignoring the rest. It also makes it possible to introduce an extra component, a so-called historian, that subscribes to all events and stores them in a database. This allows for offline analysis and improvement of production processes.
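As an illustration of subscribing to part of the hierarchy, here’s a simplified Python sketch of MQTT-style topic matching, where "+" stands for exactly one level and a trailing "#" for the rest of the tree. The topic layout (enterprise/site/area/line/cell/event) is loosely inspired by ISA-95, and names such as acme, plant1 and press3 are hypothetical. Real brokers do this matching for you and apply additional rules that are left out here.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Simplified MQTT-style match of a subscription pattern against a topic."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(pattern_levels):
        if level == "#":
            return True  # matches this point in the tree and everything below
        if i >= len(topic_levels):
            return False  # pattern is more specific than the topic
        if level != "+" and level != topic_levels[i]:
            return False
    return len(pattern_levels) == len(topic_levels)


# Hypothetical subscriptions: each component picks its part of the namespace.
subscriptions = {
    "historian": "#",                               # stores everything
    "line1-dashboard": "acme/plant1/assembly/line1/#",
    "alarm-system": "acme/plant1/+/+/+/fault",      # faults anywhere in plant1
}


def receivers(topic: str) -> list[str]:
    """Names of the components whose subscription matches the topic."""
    return sorted(
        name
        for name, pattern in subscriptions.items()
        if topic_matches(pattern, topic)
    )


fault_receivers = receivers("acme/plant1/assembly/line1/press3/fault")
completed_receivers = receivers("acme/plant1/packaging/line2/wrapper1/item-completed")
```

In this sketch, the fault on line 1 reaches the alarm system, the historian and the line 1 dashboard, while the completed item on line 2 only reaches the historian.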
The combination of an event-driven architecture and a unified namespace also introduces complexity. Even with the UNS structure in place, the number of events occurring is still large. The event processing system being used should be able to deal with all of them and make sure that none get lost or duplicated, as either can be disastrous for production.
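One common way to guard against duplicates and detect losses on the consuming side is to give every event a unique ID and a per-source sequence number. The Python sketch below assumes exactly that; the field names event_id, seq and item are hypothetical, and the in-memory set would have to be replaced by persistent storage in a real system.

```python
from typing import Callable


class IdempotentHandler:
    """Wraps a handler so that a redelivered event is processed only once."""

    def __init__(self, handle: Callable[[dict], None]) -> None:
        self._handle = handle
        self._seen: set[str] = set()

    def __call__(self, event: dict) -> bool:
        event_id = event["event_id"]
        if event_id in self._seen:
            return False  # duplicate delivery: ignore it
        self._handle(event)
        # Only mark as seen after successful handling, so a failure can be retried.
        self._seen.add(event_id)
        return True


def missing_sequence_numbers(received: list[int]) -> list[int]:
    """Gaps in a source's sequence numbers hint at lost events."""
    if not received:
        return []
    expected = set(range(min(received), max(received) + 1))
    return sorted(expected - set(received))


completed_items: list[str] = []
handler = IdempotentHandler(lambda event: completed_items.append(event["item"]))

# Event e2 is delivered twice and the event with sequence number 3 never arrives.
deliveries = [
    {"event_id": "e1", "seq": 1, "item": "A"},
    {"event_id": "e2", "seq": 2, "item": "B"},
    {"event_id": "e2", "seq": 2, "item": "B"},
    {"event_id": "e4", "seq": 4, "item": "D"},
]
for delivery in deliveries:
    handler(delivery)

gaps = missing_sequence_numbers([delivery["seq"] for delivery in deliveries])
```

Here, item B is counted once despite the double delivery and the gap at sequence number 3 is flagged, so that the missing event can be requested again or investigated.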
Even though the UNS helps structure the events and their related data and the event-driven architecture ensures that system components can automatically handle events, there’s always a need for monitoring. Getting this arranged in real-time dashboards, reports and process analysis isn’t trivial. Think only of the number of events that need to be combined to be able to say something useful about the overall equipment effectiveness (OEE) of a production line: every item change, item completion, item discard and so on needs to be taken into account.
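To give an idea of how such events combine, the Python sketch below computes OEE with the commonly used formula availability × performance × quality. The event kinds (item_completed, item_discarded, stop) and the shift figures are invented for illustration. Item changes and breakdowns are both simplified to "stop" events with a duration.

```python
from dataclasses import dataclass


@dataclass
class LineEvent:
    kind: str             # "item_completed", "item_discarded" or "stop"
    minutes: float = 0.0  # duration, only used for "stop" events


def oee(
    events: list[LineEvent],
    planned_minutes: float,
    ideal_cycle_minutes: float,
) -> float:
    """OEE = availability x performance x quality for one production line."""
    good = sum(1 for e in events if e.kind == "item_completed")
    discarded = sum(1 for e in events if e.kind == "item_discarded")
    stopped = sum(e.minutes for e in events if e.kind == "stop")

    total = good + discarded
    run_time = planned_minutes - stopped
    if total == 0 or run_time <= 0:
        return 0.0

    availability = run_time / planned_minutes
    performance = (ideal_cycle_minutes * total) / run_time
    quality = good / total
    return availability * performance * quality


# Hypothetical shift: 480 planned minutes, two stops, 380 good and 20 discarded items.
shift_events = (
    [LineEvent("item_completed")] * 380
    + [LineEvent("item_discarded")] * 20
    + [LineEvent("stop", minutes=45), LineEvent("stop", minutes=15)]
)
shift_oee = oee(shift_events, planned_minutes=480, ideal_cycle_minutes=1.0)
```

Even this simplified calculation needs every completion, discard and stop of the shift. Missing or duplicated events immediately distort the outcome, which is one more reason why reliable event delivery matters.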
The combination does allow us to scale systems more easily and make them more resilient to failure. The new components from an additional machine or production line are inserted in the appropriate place in the UNS and known event types are raised and handled. As the event system records all events, including possible failures of production line elements, other components can react by rerouting work or starting a graceful shutdown.