Joost Visser is a professor of software and data science at Leiden University. Together with Alex Serban, Holger Hoos and Koen van der Blom, he’s currently looking into the relevance of software engineering for industrial machine learning systems.

24 February

Artificial intelligence (AI) is undeniably experiencing a new wave of attention, energy and sky-high expectations. This wave is driven by the abundance of data that’s being generated in our connected, digital society, and by the low-barrier availability of enormous computational resources. Among the various AI techniques, machine learning, in particular, has come to play a key role.

Machine learning allows us to solve complex problems, not by arduously writing new code, but by letting an existing algorithm learn new behavior from examples. We’re now witnessing breakthrough results in image recognition, speech processing, medical diagnostics, securities trading, autonomous driving, product design and manufacturing, and much more.

Relative interest in search terms according to Google Trends.

Does ML’s rapid ascent mean that software systems will no longer need to be programmed? Will we need data scientists instead of software developers? To those who have experienced software-related project delays, system outages and indefinitely incomplete feature sets, a world without programmers might seem attractive.

There are several reasons why machine learning will not replace programming but will instead make the software engineering discipline even richer and more complex. First, ML algorithms are themselves software that needs to be developed, tested and maintained. Second, using an ML algorithm requires programming: for ingesting, cleaning, merging and enhancing data; for feeding the data into the algorithm; for running repeated training experiments to generate, evaluate and optimize an ML model; and for testing, integrating, deploying and operating ML models in production systems. Third, trained ML models are just one building block in the construction of complex software systems.

Around the globe, numerous organizations are learning step by step how to develop software systems that include ML components. With an increasing number of people self-identifying as ML engineers, the discipline of machine learning engineering is emerging. This raises interesting questions. Is ML engineering distinct from software engineering? Or is one a subdiscipline of the other? Do established software engineering best practices apply equally when building software systems with ML components? Or do these best practices need to be modified or replaced? Can we identify a canonical set of ML engineering best practices to guide practitioners and educate newcomers?

To investigate these questions, researchers in the fields of software engineering and machine learning have teamed up. We’ve started with an extensive review of both scientific and popular literature, to identify which practices are described and recommended by practitioners and researchers. These practices range from data management (eg how to deal with storage and versioning of large data sets), through model training (eg how to run and evaluate training experiments), to operations (eg how to deploy and monitor trained models).

We then embedded the identified practices in a survey among representatives of teams that build software with ML components. This survey is currently in progress and open for new participants. At the time of writing, about 200 teams have participated. Early results show that some practices are widely adopted, and can be considered ‘basic’, while other practices are only applied by more experienced teams in larger organizations.

We’ll use the results of our survey to organize the best practices into a comprehensive catalog. This requires sorting out the level of difficulty of each practice, the interdependencies between practices and their applicability in various contexts. Our objective is for the resulting catalog to support the formation and effectiveness of ML engineering teams.

Meet us on 11 March at the Machine Learning Conference in ’s-Hertogenbosch.