Jurgen Vinju is a full professor of automated software analysis at Eindhoven University of Technology and the leader of the software analysis and transformation group at Centrum Wiskunde & Informatica, the Dutch national research institute for mathematics and computer science.

13 May 2016

The specialists who create, maintain and extend software are having trouble managing its exponentially increasing complexity, especially after the initial development phase. Context-specific analysis can help, TUE professor Jurgen Vinju argues.

Most of today’s innovation takes place in software, and the prominence of software’s role increases every day. Its impact is profound in technology, but it is also strongly felt in our society and our personal environments. Sometimes we can see it directly; most of the time it plays its part more discreetly, hidden in the background. In recent decades, software has become an omnipresent, pervasive and dominant force in our daily lives, and this development is still far from its end.

The sheer volume of software is growing rapidly, due to both strong demand for new applications and extensions to existing systems. At the same time its complexity has been increasing exponentially. Software’s complexity naturally grows much faster than its volume. This is mainly due to a lack of time to reconsider the software’s internal design to accommodate evolving requirements or remove deprecated functionality. Other factors add even more dimensions of complexity to our code bases. These include the distributed character of today’s software systems, and the freedom and expressive power of modern programming languages and tooling.

The tools that support this forward engineering direction, in which new software is built from requirements, are reasonably well understood from a scientific point of view, and form part of every modern development environment. As they are being developed alongside new programming concepts and languages, they allow programmers to build increasingly complex software systems in a highly efficient and well-managed way. Provided, that is, that the programmers are working in a green field.

Generally, however, software is read far more often than it is written. During the lifetime of a typical program, bugs need to be found and fixed, some components will be exchanged for others, new functionality needs to be incorporated, and – hopefully – existing code can be reused in other packages. At this point software development becomes an evolutionary process, requiring a very different kind of tooling to analyze the intricacies of an organically growing code base.

Unfortunately, the tools to support this reverse engineering direction are far more difficult to develop than those for the forward direction. To provide information at a higher level of abstraction, they need to extract the meaning and purpose of the source code, qualities that mostly existed only in the heads of the original programmers. These people worked their way through the requirements to understand the what, how and why of the program; they interacted with customers, users and colleagues about parts that needed further clarification; and they were probably busy getting things done rather than meticulously logging every administrative detail of the process. In general, reading the resulting code and making sense of it becomes harder and harder as the software changes over time and its original context fades into history.

It should come as no surprise that unmanageable complexity leads to errors, which lead to extra costs at best and accidents at worst.

Rascal

The solution to this problem is to incorporate knowledge of a very different nature than we do now into the analysis phase of our reverse-engineering tools. This includes technical information on the interfaces and coding standards used, as well as business information on terminology and processes. This additional input is expected to be specific to a particular sector or industry, or even to an individual organization or development process. The result will be an advanced type of software analysis tooling that is domain-specific or context-specific.

This new direction in the development of tools for reverse engineering analysis is still in its infancy, however. It requires a lot of fundamental research before its benefits can be reaped. So I propose context-specific analysis as an important research area, with the explicit goal of understanding software complexity and finding new ways to prevent, manage and mitigate it.

What we are trying to accomplish is to clearly separate the reusable ‘heavy lifting’ in analysis tools from the hopefully lightweight specializations needed for specific contexts. To do this, we first transform source code into reusable abstract objects and relations, and then analyze these models in conjunction with context-specific information. Examples of the latter are knowledge about which specific APIs, platforms and coding standards are being used, along with the professional terminology and idioms.
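To make this separation concrete, here is a minimal sketch in Python rather than Rascal. It is an illustration under stated assumptions, not our actual tooling: the function names, the example code and the deprecation rule are all hypothetical. The first function is the reusable part, reducing source code to a plain relation of callers and callees. The second is the lightweight, context-specific part, checking that relation against a house rule an organization might have.

```python
import ast


def extract_calls(source: str) -> set:
    """Reusable step: reduce source code to a relation of (caller, callee) pairs."""
    calls = set()
    for func in ast.walk(ast.parse(source)):
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            if isinstance(node, ast.Call):
                callee = node.func
                if isinstance(callee, ast.Name):         # plain call: validate(x)
                    calls.add((func.name, callee.id))
                elif isinstance(callee, ast.Attribute):  # method call: db.legacy_save(x)
                    calls.add((func.name, callee.attr))
    return calls


# Context-specific step: a hypothetical house rule of one organization.
DEPRECATED_API = {"legacy_save": "use store_record instead"}


def check_context(calls: set, deprecated: dict) -> list:
    """Combine the generic call relation with context-specific knowledge."""
    return sorted(
        (caller, callee, deprecated[callee])
        for caller, callee in calls
        if callee in deprecated
    )


# Hypothetical code base under analysis.
EXAMPLE = '''
def export_invoice(invoice):
    validate(invoice)
    db.legacy_save(invoice)

def archive(invoice):
    store_record(invoice)
'''

findings = check_context(extract_calls(EXAMPLE), DEPRECATED_API)
for caller, callee, advice in findings:
    print(f"{caller} calls {callee}: {advice}")
```

Running the sketch reports that export_invoice calls legacy_save. Swapping in a different rule set changes the questions being asked without touching the extraction step, which is the division of labor we are after.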

That requires us to link up to particular sectors and industries, which often have this type of knowledge available in a formalized or semi-formalized form. Think, for example, of domain-specific languages (DSLs), which provide specialized features for particular domains, and UML models, representing software systems, workflows and business processes. So reaching out to specific sectors, finding industry partners and talking to experts are important parts of this quest.

For our research we rely heavily on Rascal, a meta-programming system aimed at code analysis, code transformation (such as refactoring) and the implementation of DSLs. As a matter of fact, Rascal itself is a domain-specific language catering to meta-programmers. We can already read source code and transform it into abstractions for analysis for Java and PHP. We are currently starting on the connectors for C and C++. Developing these connectors has now become an engineering problem rather than a scientific feat.

The main research questions concern the analysis phase. How do we incorporate context-specific knowledge in our tools, and how do we process the resulting information? At the other end we need to find out how to query these abstractions quickly and intelligently. That end is a scientific challenge in itself.

Collaboration

What drives us in this direction is the reward this scientific research will hopefully bring. Context-specific analysis will allow us to see not only what the source code says but also what it means. It will let us ask context-specific questions. Since more specific questions lead to more specific answers, those answers will be more relevant to the software engineering experts – and their managers – than current state-of-the-art software tools can provide.

All in all, context-specific analysis is expected to facilitate higher-quality software at lower cost, especially after the initial development phase. Naturally, to get there we need collaboration and an extensive exchange of knowledge between software researchers and software engineers. Making sense of software is eminently a multidisciplinary affair.

Edited by Nieke Roos