It was developed in the heart of the Brainport region and boasts more than 150 thousand users worldwide, yet the Open Machine Learning platform is largely unknown in local industry. With the support of initiator Joaquin Vanschoren, Georgo Angelis of TUE’s High Tech Systems Center and the Eindhoven AI Systems Institute wants to change that with his startup PortML.
When Joaquin Vanschoren started developing the Open Machine Learning platform about six years ago, it was out of need and out of frustration. As a researcher at KU Leuven, he kept running into the same walls when he wanted to use machine learning techniques. “How can I get access to many datasets? And how can I properly compare different machine learning algorithms?” says Vanschoren, describing some of his daily obstacles. “The challenge was – and still is – that most datasets aren’t accessible or, at least, require weeks of work before they’re useful. Moreover, what’s published in research papers is often very difficult to reproduce, if it’s even possible at all. Especially when there’s a commercial company behind them, papers contain a lot of marketing. When you try it for yourself, it often doesn’t work.”

Vanschoren started the OpenML platform as an open-source project because his ambition was too big for one person to achieve. His initiative was quickly picked up by the research community, and about twenty people now contribute to the tool. “Mostly volunteers,” says Vanschoren, currently an associate professor at Eindhoven University of Technology. “Initially, they were predominantly PhDs who, like myself, were struggling with the same challenges, but now more and more people from industry are getting involved.”

The basic idea behind OpenML is that it should be an open platform where datasets are easily available and where you can find algorithms that are relevant to your problem. “An accessible interface to all machine learning research,” summarizes Vanschoren. At the moment, OpenML serves a community of about 150 thousand users worldwide. Understandably, a similar tool wasn’t available. “Commercial parties have little interest in transparency. They’d rather hold their cards close to their chest. However, they can benefit from having such a platform for internal use – something we realized quickly during the development. Large companies like Amazon have their own tools, of course, but for most companies and organizations, it’s unfeasible to build one themselves.”
Enterprise version
It’s precisely that last point that triggered Georgo Angelis from TUE’s High Tech Systems Center and the Eindhoven AI Systems Institute (EAISI) to start his own company, PortML, in collaboration with the OpenML Foundation and Eindhoven University of Technology. “In the academic world, OpenML has many users but in industry, the platform is still largely unknown,” says Angelis. Although the tool is available to anyone, and companies could start right away, there’s some reluctance. “That’s understandable considering the open character of OpenML. The uploaded datasets, the models, the algorithms – it’s all public. Commercial companies aren’t too keen on sharing these kinds of data with everyone.”
Speaking with potential industrial users, Angelis notices that many are interested but that they indeed aren’t happy about releasing all their valuable data. “Within PortML, and supported by the OpenML community, we’re working on an enterprise version,” says Angelis. “Currently, we’re in the pilot phase with several companies to really understand which features are required from an industrial point of view.”
“Openness is pivotal for OpenML, but companies want to safeguard their data,” Vanschoren adds. “PortML tries to find the middle ground by building a platform that combines the advantages of access to the latest research with the requirements from industry.”

Michelin chefs
Angelis expects the first beta version of OpenML for industrial users to become available this quarter. “Data scientists are specialists who use every tool they can find to optimize their machine learning flow,” Angelis points out. “They can approach OpenML from their toolset through an API. They can keep using their own environment – whether it’s Python, or R, or any other suite – and use the stand-alone version of OpenML to organize all data, algorithms and models, and make them suitable for reuse.”
Does OpenML also make sense for smaller companies that more often than not lack an in-house data scientist? Angelis definitely thinks so: “Bigger companies can use OpenML to improve the efficiency of their process, increase the quality and automate several steps. For smaller companies, OpenML lowers the threshold to start with machine learning. With a few mouse clicks, they have access to what’s already available and can reach a sufficiently good solution. That result may not be next-level, but it will surely be a big step in the right direction. They’ll benefit from the recipes created by Michelin chefs.”
Vanschoren: “When you want to build a machine learning model, you need to make an incredible number of decisions. Which algorithms, which models, which parameters, to name a few. Currently, you need a PhD – or at least someone with a lot of experience in machine learning – to create efficient models. Because there’s so much data and metadata available in OpenML, we can learn from ourselves. We use machine learning to decide what will work and what won’t.” That’s the research field of automated machine learning, or AutoML, which focuses on search algorithms that find the best solution for a given dataset. “The resulting solution may not be a panacea but for many small and medium-sized companies, it can be very insightful to experience what they can do with machine learning, and it will give them a perfect starting point to build on.”
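To make the idea of "searching for the best solution for a given dataset" concrete, here is a minimal sketch of random search over algorithms and their parameters. It is only an illustration, not how OpenML or PortML implement AutoML: the toy dataset, the two tiny models and all names (`make_data`, `SEARCH_SPACE`, `random_search`) are assumptions made up for this example, and real AutoML systems use far richer search spaces and smarter search strategies.

```python
import random

def make_data(n, rng):
    """Toy 1-D binary classification: the label is 1 when x > 0.6."""
    return [(x, int(x > 0.6)) for x in (rng.random() for _ in range(n))]

def threshold_model(train, threshold):
    """Ignores the training data; predicts 1 when x exceeds a fixed threshold."""
    return lambda x: int(x > threshold)

def knn_model(train, k):
    """k-nearest-neighbours majority vote on the training data."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return int(2 * sum(label for _, label in nearest) > k)
    return predict

# Hypothetical search space: for each algorithm, a sampler for its parameters.
SEARCH_SPACE = {
    "threshold": lambda rng: {"threshold": rng.random()},
    "knn": lambda rng: {"k": rng.choice([1, 3, 5, 7])},
}
BUILDERS = {"threshold": threshold_model, "knn": knn_model}

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def random_search(train, valid, n_trials=30, seed=0):
    """Sample random (algorithm, parameters) pairs; keep the best on validation data."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))
        params = SEARCH_SPACE[name](rng)
        score = accuracy(BUILDERS[name](train, **params), valid)
        if best is None or score > best[0]:
            best = (score, name, params)
    return best

data_rng = random.Random(42)
train, valid = make_data(200, data_rng), make_data(100, data_rng)
best_score, best_name, best_params = random_search(train, valid)
print(best_name, best_params, round(best_score, 3))
```

The loop makes the decisions Vanschoren lists (which algorithm, which parameters) by trial and error on held-out data. Learning from earlier experiments, as described above, would roughly amount to replacing the uniform random sampling with choices informed by results on similar datasets.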

Angelis is actively seeking collaboration with the outside world. The companies interested in the pilot phase are involved in healthcare, manufacturing and mobility. “We at HTSC/EAISI want to contribute to the power of the Brainport region. So obviously, we’ve started with our own network in high tech,” he explains. “Later, we plan to expand to, for instance, telecom or finance. To me, this article is a pitch for the industry. We want to work together with companies to get to the best possible version of OpenML for that target audience.”