The World Health Organization estimates that more than 25 million Europeans are living with a visual impairment, more than 75 percent of whom are estimated to be unemployed. To help this population move around more independently and read text more easily, the Yes!Delft start-up Envision is developing its own hands-free, AI-based solution for smart glasses.
Envision, a member of the Yes!Delft high-tech start-up incubator, is the brainchild of co-founders Karthik Mahadevan and Karthik Kannan: the Karthiks. As an industrial design student at Delft University of Technology, Mahadevan wrote his master’s thesis on ways to improve independence for the visually impaired. But it wasn’t until afterwards, while on holiday in India, that inspiration struck the Karthiks and the Hague-based start-up got off the starting blocks.
While on vacation, Mahadevan, a designer, and Kannan, an engineer, were invited to a school for blind children to give a talk about their career choices. The topic: what it really meant to be an engineer and a designer. For the Karthiks, the answer was simple: it was all about finding solutions to problems. After the session, the founders stayed to talk with the children about the problems they would like to solve when they grew up. “The central theme we received was about independence. They wanted to be able to read books and go out independently,” recalls Kannan. “That triggered us to think about what we could do and how we could actually solve this independence problem.”
Almost immediately, the duo got started on their new project – Envision. But it didn’t take long before the realization struck them: “You can’t expect the world around you to change for you. You can’t put a braille display everywhere; you can’t make everything braille compatible. The best way to give independence is to help people access the world exactly the way it is,” explains Kannan.
Text in the wild
By adopting this ethos, Envision jumped into the mobile app development arena. Currently, the Envision app has more than 40,000 active users on iOS and Android, gained almost entirely through word-of-mouth referrals. The app utilizes computer vision, a field of technology in which computers process and understand digital images and videos, essentially mimicking and automating the way the human visual system operates. The idea for Envision was to apply this technology to two specific areas of focus: text recognition and object recognition.
Text recognition is an especially complex task. From personal communications to street signs and the numerous screens of everyday living, text is nearly everywhere and comes in countless fonts, forms and overlays – making it very tricky for conventional OCR (optical character recognition) systems to recognize. With an emphasis on experiencing the world as it is, the Karthiks invested heavily in reading what they refer to as “text in the wild” – technically known as scene text. Today, Envision can recognize virtually any text, including handwriting, in any font and can do so in almost sixty different languages – something few other solutions offer.
Object recognition, Envision’s second point of focus, is still largely experimental and not as developed as the text recognition. Currently, the most common use is to make sense of images from the internet, social media and communication apps like WhatsApp. When an image is received, it can be shared with the Envision app, which will then describe the image and read aloud any text it contains. By uploading photos and creating a custom database for each user, the app is also designed to be trained and customized. This means the user can tailor it to help detect specific objects or even people in their surroundings (e.g. friends, neighbors or colleagues).
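Envision hasn’t published how this per-user personalization works, but the idea can be sketched with a minimal nearest-neighbor recognizer: each uploaded photo is turned into a feature vector and stored under a label, and a new image is matched against that personal gallery by cosine similarity. The `embed` function below is a deliberately crude stand-in (flattened, normalized pixels); a real system would use a learned embedding network.

```python
import numpy as np

def embed(image: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor: flatten and L2-normalize the pixels.
    A production system would use a learned CNN or transformer embedding."""
    v = image.astype(np.float32).ravel()
    return v / (np.linalg.norm(v) + 1e-8)

class PersonalRecognizer:
    """Per-user gallery of labeled example images, matched by cosine similarity."""

    def __init__(self):
        self.labels = []
        self.vectors = []

    def add_example(self, label: str, image: np.ndarray) -> None:
        """Enroll one user-uploaded photo under a label."""
        self.labels.append(label)
        self.vectors.append(embed(image))

    def recognize(self, image: np.ndarray, threshold: float = 0.8):
        """Return the best-matching label, or None if nothing is similar enough."""
        if not self.vectors:
            return None
        sims = np.stack(self.vectors) @ embed(image)  # cosine similarities
        best = int(np.argmax(sims))
        return self.labels[best] if sims[best] >= threshold else None

# Example: enroll two stand-in "faces" and query a slightly noisy copy of one.
rng = np.random.default_rng(0)
alice, bob = rng.random((8, 8)), rng.random((8, 8))
db = PersonalRecognizer()
db.add_example("Alice", alice)
db.add_example("Bob", bob)
print(db.recognize(alice + 0.01 * rng.random((8, 8))))  # → Alice
```

The threshold is what lets the recognizer say “I don’t know” instead of forcing a match, which matters when the gallery holds only a handful of photos per person.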
From a technology standpoint, Envision’s key focus was to make the system work offline as much as possible. To achieve this, the company took a two-step approach. First, it utilizes a technique called quantization – a way to trim the fat, so to speak. Quantization stores a machine learning model’s weights at a lower numerical precision, which can shrink a bulky model considerably while retaining most of its accuracy. Second, the start-up works heavily on preprocessing images. In a typical machine learning setup, the images being sent are full-sized and can take a while to process. But by processing the images before they’re sent to the AI model, inference can be made much faster – which is especially useful when working with a cloud-based model.
Using the Envision app as a stepping-stone, the self-described software start-up is ready for the next phase of development: embedding its AI system into smart glasses. According to Kannan, “From the beginning, our aim was to put this software on the smart glasses. It’s the best and most user-friendly application for visually impaired people. When they’re out walking, using a cane or guide dog in one hand and having a phone in the other hand isn’t optimal for safely moving around. Glasses can offer more of a hands-free experience.”
To kick off this transition to hands-free smart glasses, Envision is teaming up with the municipality of The Hague for a new pilot program. Starting in September, the start-up will partner with ten visually impaired people from The Hague who are currently at a distance from the labor market. Sponsored by the municipality, Envision will provide each of them with a prototype of its intelligent glasses to help them live more independently and re-enter the workforce.
This shift in focus doesn’t come without difficulties. Smart glasses aren’t a new idea. Various systems started popping up in the early 2010s, but these devices were never fully accepted by the market. The social aspect of technology is incredibly important, particularly in wearables, and bulky, stigmatizing glasses certainly won’t be accepted by those with a visual impairment, or anyone else. Additionally, despite a decade of technological advances, the processing power of smart glasses is nowhere near that of a smartphone, and designing software for the glasses that retains the effectiveness of the Envision app is certainly no small feat.
To simplify some of these challenges, Envision wants to partner with companies already established in the smart glasses market. After all, it’s not a manufacturing company, it’s a software company, and that’s where it wants to leverage its potential. In fact, the Yes!Delft start-up is already talking to the American company Vuzix, an industry leader in wearable display tech, as well as technology juggernaut Google about Google Glass 2. “That’s a big thing for us,” voices Kannan. “Not only is Google Glass 2 much more streamlined, but it looks like normal sunglasses and is much more powerful than the existing smart glasses on the market today.”
Another benefit of working with experts like Vuzix and Google is that, in terms of development, they’re very open. Envision is already taking advantage of their SDKs (software development kits), some of which are very robust and allow developers to use the touch gestures on the glasses, operate the camera or tap into the onboard processor. Kannan: “From a development perspective, since they all run on Android, we can build common modules for all of them and then reuse the elements in different instances, which is exactly what we’re doing right now with two or three different prototypes.”
Over the next few years, Envision will focus on ushering smart glasses into the mainstream as a solution for the visually impaired community. It also plans to offer similar solutions for other groups, such as people with dementia or dyslexia, some of whom already use the app and could benefit from the glasses. In the long term, Envision sees itself as a camera-plus-AI company. “We believe that the camera is going to be the user interface within the next 5-10 years,” proclaims Kannan. “It’s going to be very ubiquitous and our aim is to go ahead and build AI that uses the camera as its input medium.”