Artificial intelligence is not just making our lives easier by automating mundane and tedious tasks; it is also unlocking myriad possibilities for people with disabilities and promising unique ways of experiencing the world. Thanks to AI-powered technologies, the assistive technology industry has advanced far beyond wheelchairs, prostheses, and vision and hearing aids.
For example, AI-powered technologies can give a voice to non-verbal people: the BrightSign glove is a machine-learning-based tool that translates sign language into speech, empowering signers to communicate independently with the world. Advances in computational linguistics are helping people with autism and dementia overcome their cognitive disabilities by simplifying and summarizing digital content. Robotic exoskeletons promise a whole new life to people with motor disabilities, and self-driving cars could soon allow visually impaired people to explore their physical environment freely without relying on others.
Computer vision methods such as object recognition, scene understanding, visual question answering (VQA), and visual dialogue hold great promise for easing the lives of about 1.3 billion people with visual impairments.
Computer vision meets visual assistive technology
Functioning in our physical environment assumes we have the ability to see. Small daily tasks such as reading a menu at a restaurant, finding a restroom, reading the cooking instructions on a packaged meal, or merely looking in the mirror to check whether one's clothes match are among the challenges blind people encounter every day. Vision and language models can help blind people overcome these challenges, mainly through VQA, which lets them take pictures of objects, ask questions about those pictures, and get timely spoken answers.
How does VQA work? Almost ten years ago, a group of researchers developed the VizWiz app, which enabled blind users to take pictures with their phones, ask questions about these pictures, and receive spoken answers from remote sighted employees in almost real time. VQA models have advanced since then. Researchers from the computer vision community took advantage of the data collected through the app and put together the VizWiz dataset from more than 31,000 questions asked by visually impaired users. The question remained, however: how could they develop “sighted” VQA models that could answer these questions under natural settings?
This task is difficult because these real-life images and natural language questions originate in a natural setting, which exposes the limitations of standard VQA datasets and models that are developed primarily under artificial settings.
This year, the European Conference on Computer Vision (ECCV) featured a VizWiz workshop, which included a VizWiz Grand Challenge urging the research community to address the challenges of the VizWiz dataset and the VQA task at large and come up with new approaches that meet the needs of blind people.
VizWiz: VQA for blind people
Although VQA models and algorithms have made remarkable progress over the past few years, they usually perform well only on artificially curated datasets with clear, high-quality images and direct written questions that the algorithm can easily interpret and answer.
The VizWiz dataset, on the other hand, is based on real-life data originating from blind people. The images are often of poor quality, and the questions are mostly conversational or suffer from audio recording issues. In many cases, the questions cannot be answered at all because the images are irrelevant or out of focus. To address these issues, the VizWiz Grand Challenge included two tasks for the dataset: to predict the answer to a visual question, and to predict whether a visual question cannot be answered.
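To make the two tasks more concrete, here is a minimal Python sketch of how a single visual question could be turned into one training target per task. It is only an illustration, not the challenge's actual data format or scoring method: the `VisualQuestion` record, its field names, the `"unanswerable"` marker, and the majority-vote rule are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical marker a human annotator gives when the image does not
# contain enough information to answer the question.
UNANSWERABLE = "unanswerable"


@dataclass
class VisualQuestion:
    """Illustrative record: one image, one question, several human answers."""
    image_path: str
    question: str
    human_answers: List[str]


def task_targets(vq: VisualQuestion) -> Tuple[str, bool]:
    """Derive one target per task from the human answers.

    Returns (most_common_answer, is_answerable).
    Task 1 target: the answer given most often.
    Task 2 target: False when at least half of the answers are the
    'unanswerable' marker (a simplifying assumption of this sketch).
    """
    if not vq.human_answers:
        raise ValueError("at least one human answer is required")

    # Normalize so that "Blue " and "blue" count as the same answer.
    normalized = [answer.strip().lower() for answer in vq.human_answers]
    counts = Counter(normalized)

    most_common_answer = counts.most_common(1)[0][0]
    is_answerable = counts[UNANSWERABLE] * 2 < len(normalized)
    return most_common_answer, is_answerable
```

In this sketch, a blurry photo whose answers are mostly "unanswerable" would yield a negative answerability target, while a clear photo of a shirt with answers such as "blue" and "navy" would yield "blue" as the answer target. A real system would learn both predictions from the image and the question rather than read them off the human answers.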
Many researchers participated in the challenge, including the SAP Machine Learning Research team. The results were very encouraging, pointing towards a future where machines could lend an eye to visually impaired people and answer their questions accurately and promptly. There are already some promising visual assistive technologies, including Seeing AI and OrCam, which use different computer vision methods.
However, as the technology is still in its infancy, a significant breakthrough is yet to come. The VizWiz challenge initiated dialogue within the research community to stimulate further research and develop VQA systems that better meet the needs of blind people. This should lead to cutting-edge visual assistive technologies that can facilitate simple tasks and transform the lives of blind people by granting them greater independence and freedom.
Learn more about VizWiz:
Learn more about SAP’s machine learning research: