The BBVA Foundation Frontiers of Knowledge Award goes to the mathematician who made artificial vision possible for robots

Japanese researcher Takeo Kanade has received the BBVA Foundation Frontiers of Knowledge Award in the Information and Communication Technologies category for developing the algorithms that underlie the current capabilities of computers and robots in comprehending and interpreting visual images and scenes. His work has enabled key technologies such as self-driving vehicles, robotic surgery and facial recognition systems, and has revolutionized the broadcasting of sports events.

The award committee remarked that the work of this professor at Carnegie Mellon University (United States) “has significantly transformed the technological world in which we live.” The emergence of self-driving vehicles, robots that assist surgeons in all kinds of operations, facial recognition systems for accessing our cell phones, and sports broadcasts offering panoramic replays of match highlights from multiple angles: all these advances owe a large debt to the contributions of this Japanese researcher.

“As is evidenced by the fact that the visual cortex occupies the dominant portion of the human brain, vision or visual information processing provides humans with our richest and most important information channel for understanding and communication. AI and robots with similar or even better computer vision capabilities contribute to the betterment of our lives. I see a lot of opportunities,” said Professor Kanade in an interview granted shortly after hearing of the award.

The father of 3D machine vision

Just as humans and animals need two eyes to see in depth, three-dimensional artificial vision requires the merging of images from at least two cameras. But the first artificial vision algorithms were designed to process just one image, and using them to combine several images was too slow a process to be useful in practice.

These algorithms, which provided 2D computer vision, would analyze a video frame by frame, trying to reconstruct the objects that appear and deduce how they move. But this method is ruled out if each frame consists of merged images from multiple cameras, given the sheer amount of computing power required.

Kanade realized that, rather than merging each frame and then tracking the movement of the objects, it would be far quicker to use the object motion information recorded by each camera to understand how the image is moving, even before combining the videos. “Once we understand that, there’s no need to send all the color or video information, we can just send the motion only,” explained Kanade. This is the idea underpinning the method developed by Kanade, alongside his doctoral student Bruce Lucas, to capture the shape of objects and deduce the speed and direction of their movement.
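
The method Lucas and Kanade published in 1981 reduces motion estimation to a small least-squares problem: within a window around a feature, the image’s spatial and temporal gradients jointly constrain the displacement. Below is a minimal Python/NumPy sketch of that core computation; the function name, window handling and gradient choice are illustrative assumptions, not the original implementation.

```python
import numpy as np

def lucas_kanade_flow(frame_a, frame_b, x, y, win=7):
    """Estimate the (u, v) displacement of a small window centered on
    (x, y) between two grayscale frames, in the least-squares spirit of
    Lucas and Kanade. Assumes 2-D float arrays and small motion."""
    half = win // 2
    patch = np.s_[y - half:y + half + 1, x - half:x + half + 1]

    # Spatial gradients (Ix, Iy) and temporal gradient (It).
    Iy, Ix = np.gradient(frame_a)
    It = frame_b - frame_a

    # Each pixel in the window contributes one brightness-constancy
    # equation: Ix*u + Iy*v = -It. Stack them and solve by least squares.
    A = np.stack([Ix[patch].ravel(), Iy[patch].ravel()], axis=1)
    b = -It[patch].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v
```

In practice this solve is iterated, often within a coarse-to-fine image pyramid, to handle larger motions; only the per-window displacements, not the full pixel data, need to be passed along, which is exactly the saving Kanade describes.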

Even so, 3D images require much more computing power than their 2D equivalents. To get around this problem, Kanade and Carlo Tomasi devised a way to drastically simplify the calculations that the computer must run to process the images.
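
The article does not spell out the simplification, but the technique widely associated with Tomasi and Kanade is the factorization method: stack the 2D coordinates of P features tracked across F frames into a 2F x P measurement matrix; under an orthographic camera model that matrix, once centered, has rank 3, so a single truncated singular value decomposition separates camera motion from 3D shape. Here is a minimal sketch of the idea, with illustrative names, omitting the final metric-upgrade step.

```python
import numpy as np

def factorize_tracks(W):
    """Split a 2F x P measurement matrix of tracked feature coordinates
    into motion (2F x 3) and shape (3 x P) factors via rank-3 SVD,
    recovering structure and motion up to an affine ambiguity."""
    # Center each row: subtract the mean feature position per frame.
    W0 = W - W.mean(axis=1, keepdims=True)

    U, s, Vt = np.linalg.svd(W0, full_matrices=False)

    # Keep the three dominant components and share the singular values.
    motion = U[:, :3] * np.sqrt(s[:3])            # per-frame camera geometry
    shape = np.sqrt(s[:3])[:, None] * Vt[:3, :]   # 3-D point coordinates
    return motion, shape
```

Replacing an expensive nonlinear reconstruction with one standard matrix decomposition is what makes the approach so much cheaper to compute.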

Takeo Kanade, Carnegie Mellon University (United States).

Autonomous cars, helicopters and drones

With the techniques developed by Kanade and his associates, two Carnegie Mellon researchers crossed the U.S. by freeway, coast to coast, in one of the first autonomous vehicles ever built, manually engaging the accelerator and the brake but hardly touching the steering wheel. This was back in 1995, and “No Hands Across America”, as the project was called, showed that a van could steer itself autonomously, relying solely on the information from its cameras.

Kanade's work has also been used as a blueprint for robots operating in settings such as restaurants, airports and museums. The techniques proposed by the awardee are also built into the drones in use today and into virtually every robot equipped with visual capabilities. And Kanade has recently been working on an autonomous helicopter capable of tracking an object.

Advances in medical scanners and robotic surgery

Computer vision is also a core enabling technology for robotic surgery, a burgeoning field whose expansion owes much to the techniques pioneered by Kanade.

It was in fact Kanade himself who, together with his team, developed the first robotized system for hip replacement surgery. Called HipNav, it achieved much greater precision in the placement of the prosthesis with a far less invasive procedure than traditional surgery, reducing the risk of side effects like dislocation. Real-time tracking of the exact position of the patient’s pelvis was key to this success.

Moreover, thanks in no small measure to Kanade’s contributions, it is now possible to design robots capable of performing simple medical tests, like certain ultrasound scans, and detecting possible pathological regions. “Many villages have no hospitals,” remarks committee member Oussama Khatib, Professor of Computer Science and Director of the Robotics Laboratory at Stanford University (United States).

“So we are trying to set up small clinics with a robot that can perform very simple scanning, and can detect via an algorithm if there is anything suspicious that requires further tests.” This same robot, he adds, can be connected to a hospital, however distant, and controlled remotely by a radiologist, who can run more detailed tests without the patient having to travel.

‘Virtualized reality’ for a 360-degree view of sporting highlights

In 2001, the Super Bowl, the most watched program on American TV, featured a technological breakthrough in the field of computer vision that forever changed the way sports are broadcast, and Professor Kanade himself appeared on screen to explain it to viewers.

The new technique, in effect, enabled 360-degree reproduction. To get this all-round view, a scene has to be recorded with a number of cameras, but with Kanade’s methods it is possible to obtain images of that scene independently of the cameras’ actual viewing angles, or to reconstruct any vantage point from a video recorded by a moving camera. This is the basis of the “virtualized reality” that has transformed sporting events by allowing viewers, for instance, to follow a football match from the ball’s point of view or use the Hawk-Eye system in tennis.
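
A rough illustration of why a reconstructed scene frees the replay from the physical cameras: once the 3D geometry of a moment has been recovered, rendering any vantage point reduces to projecting those points through a virtual camera. The pinhole-projection sketch below is a generic textbook formulation, not the EyeVision pipeline; all names and inputs are illustrative.

```python
import numpy as np

def project(points_3d, K, R, t):
    """Project N x 3 world points into a virtual pinhole camera with
    intrinsics K (3 x 3) and pose R (3 x 3), t (3,). Returns N x 2
    pixel coordinates. Sweeping R and t along a path yields the
    'spin-around-the-player' replay effect."""
    cam = R @ points_3d.T + t[:, None]   # world -> camera frame, 3 x N
    pix = K @ cam                        # homogeneous pixel coordinates
    return (pix[:2] / pix[2]).T          # perspective divide, N x 2
```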

“When the term virtual reality appeared in the 1990s,” Kanade recalls, “people worked mostly on creating artificial worlds with computer graphics. But I thought it would be more interesting to start from reality; in other words, to input reality to the computer so it becomes virtual.” It was to highlight this aspect and distinguish his proposal from the artificial worlds beginning to emerge that the researcher coined the concept of “virtualized reality.”

The system’s debut at the 2001 Super Bowl, under the name EyeVision, was the first time viewers could enjoy replays of the game’s highlights in panoramic mode. “The stadium had 33 cameras mounted on the upper deck, looking onto the field, and when there was a beautiful move, the broadcaster could replay it, spinning around the main player. It was like the big scene in The Matrix, where the camera seems to circle the protagonist,” remarks Kanade: “And now this 360-degree replay is used in almost every sport.”

Technologies conceived to improve quality of life

Kanade is confident that, in a few years’ time, his work will facilitate the spread of “quality-of-life technologies,” particularly through robots and other devices that “can help older people or people with disabilities to live independently.”

He also insists that “this technology is not just for leisure and entertainment. It may also be useful, for example, in coordinating the response to humanitarian emergencies through virtual reconstructions of disaster-hit zones.”

He has concerns, he admits, about the possible misuse of some of the technologies his work has helped develop: “I hate to see how artificial intelligence and computer vision are being applied to create the likes of deepfake videos.”

In fact, in 2010, Kanade and his team made a video in which President Obama speaks Japanese, with the images generated from a recording of the researcher himself. “It was a fun experiment, but the intention was serious, and we had meaningful applications in mind,” he recalls today. “For example, we wanted to better understand human facial expressions and the effects of certain gestures, like head or eye movements, to help people who have difficulty communicating smoothly, and we were also working on the creation of avatars to participate virtually in video conferences.”

In any case, Kanade is convinced that technology will be able to detect artificially generated videos to prevent their malicious use: “It should be easy to certify whether an image is genuine or non-genuine, and add a watermark to identify frauds. That said, it saddens me that this technology has the potential for harm, due to misuse by certain people.”