Teaching computers to see

UC Davis computer science professor researches computer vision

Yong Jae Lee, an associate professor in the UC Davis Department of Computer Science, recently won the National Science Foundation CAREER Award, a grant that provides $501,000 over a period of five years to early-career faculty who have demonstrated potential to make significant advances in their fields.

Lee specializes in teaching computers how to “see.” The branch of computer science that seeks to emulate the human visual system is known as “computer vision.”

Computer vision involves developing algorithms that process large numbers of digital images and videos and detect patterns associated with various objects. In a very simple example, a computer program trying to learn what a dog looks like would first be fed millions of images containing a dog and millions of images containing no dogs.

The program attempts to learn the differences between images that have a dog in them and images that do not. Then, it tries to find patterns and similarities among the images with dogs: for example, it may detect two ears, two light shapes (eyes), and a dark spot (the nose) in the images containing dogs and conclude that this pattern indicates a dog.
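For readers who want to see the idea in miniature, the following is a toy sketch, not the method Lee's group uses. It assumes each image has already been boiled down to three made-up numbers (ear-like shapes, eye-like light shapes, nose-like dark spots), whereas real systems learn such features from raw pixels. The function names and sample values are hypothetical.

```python
# Toy illustration only: each "image" is a hand-made list of three counts,
# (ear-like shapes, eye-like light shapes, nose-like dark spots).

def centroid(rows):
    """Average each feature across a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def squared_distance(a, b):
    """Sum of squared differences between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(dog_images, other_images):
    """'Learning' here is just remembering the average dog and non-dog."""
    return {"dog": centroid(dog_images), "not dog": centroid(other_images)}


def predict(model, image):
    """Pick the label whose average is closest to the new image."""
    return min(model, key=lambda label: squared_distance(model[label], image))


# Made-up training examples for the two groups of images.
dog_images = [[2, 2, 1], [2, 2, 1], [1, 2, 1]]
other_images = [[0, 0, 0], [0, 2, 0], [0, 1, 0]]

model = train(dog_images, other_images)
print(predict(model, [2, 2, 1]))  # two ears, two eyes, one nose
print(predict(model, [0, 1, 0]))  # one light shape and nothing else
```

The first query lands closest to the average dog and the second closest to the average non-dog, which is the "pattern" the paragraph above describes, reduced to arithmetic.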

Current image-recognition techniques rely on images that have been painstakingly annotated by humans. Consider, for example, a photo of an airplane. A human annotator would have to go in, draw a box around the object they’re interested in (this is known as a “bounding box”), and label it as an airplane. Easy enough, right?

Not so easy, it turns out.

“What happens when we have thousands of categories? Human annotation can get very expensive,” Lee said.

Lee’s research seeks to eliminate the need for expensive, time-consuming and cumbersome human annotation by developing “weakly-supervised” algorithms. Weakly-supervised algorithms require very little human annotation and labeling. Instead of having a bounding box and label around each item in a photo, the photo would have no bounding boxes. It would also have minimal labeling: for example, just “plane” or “dog in grass.”
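The difference between the two kinds of annotation can be pictured as two record types. This is a minimal sketch under assumed names: the classes, fields, file name and pixel values are all hypothetical and are not drawn from Lee's work.

```python
from dataclasses import dataclass, field


@dataclass
class BoundingBox:
    """A hand-drawn box plus its label, as used in full annotation."""
    label: str
    x: int
    y: int
    width: int
    height: int


@dataclass
class FullyAnnotatedPhoto:
    """A photo where a person has boxed and labeled every object."""
    filename: str
    boxes: list = field(default_factory=list)


@dataclass
class WeaklyAnnotatedPhoto:
    """A photo with only short tags and no box positions at all."""
    filename: str
    tags: list = field(default_factory=list)


def to_weak(photo):
    """Drop the box positions and keep only which categories appear."""
    tags = sorted({box.label for box in photo.boxes})
    return WeaklyAnnotatedPhoto(photo.filename, tags)


# Hypothetical example: one airplane boxed by a human annotator.
full = FullyAnnotatedPhoto("runway.jpg", [BoundingBox("plane", 40, 60, 300, 120)])
weak = to_weak(full)
print(weak)
```

A weakly-supervised algorithm would only ever see records like `weak`: it knows a plane is somewhere in the photo, but has to work out where on its own.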

Lee noted that people have been working on weakly-supervised computer vision algorithms for a while.

“Typically, weakly-supervised learning has been done using only images. We’re adding video,” Lee said. “In video, things are changing, objects are moving. This motion information can be useful for any algorithm trying to learn about the world.”

Lee noted that his research relies on a combination of photo and video. While photos can provide a diverse breadth of information about an object category, videos can teach an algorithm more deeply about a category.

For example, consider an algorithm trying to learn what a dog looks like. Photos can expose an algorithm to thousands of different kinds of dogs. But videos can show the algorithm a more complete picture of the dog, such as what it looks like from a variety of angles.

Lee also teaches ECS 174, an undergraduate course on computer vision. According to his TA, Yash Bhartia, Lee makes an effort to ensure that class projects reflect the most recent and relevant computer vision techniques.

Students in ECS 174 also have the opportunity to build a Video Search Tool, an application that lets a user search video footage. For this project, students are asked to make footage from the TV show “Friends” searchable. For example, if a user searches for “Joey in a red shirt,” the application returns frames containing the “Friends” character Joey in a red shirt.
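The article does not describe how the class project is built, so the following is only a sketch of the final lookup step. It assumes a vision model has already tagged each frame with the characters and clothing it found; the frame numbers, tag names and `search_frames` function are hypothetical.

```python
def search_frames(frames, required_tags):
    """Return the numbers of frames whose tags include every required tag."""
    required = set(required_tags)
    return [f["frame"] for f in frames if required <= f["tags"]]


# Made-up tags standing in for the output of a recognition model.
frames = [
    {"frame": 120, "tags": {"joey", "red shirt"}},
    {"frame": 121, "tags": {"joey", "blue shirt"}},
    {"frame": 300, "tags": {"rachel", "red shirt"}},
    {"frame": 450, "tags": {"joey", "red shirt", "chandler"}},
]

print(search_frames(frames, ["joey", "red shirt"]))
```

The hard part of the project is producing those tags from raw video in the first place; once they exist, the search itself is a simple filter.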

“The reason I like this project is because people who are not familiar with computer science can still recognize and appreciate the complexity of the task,” Bhartia said.

The potential applications of Lee’s research and similar efforts are many and varied. One such application is building robots to navigate our world. They would need to be equipped with sophisticated object detection and recognition systems in order to learn quickly about their surroundings.

Harshita Kaushal, a second-year computer engineering student at UC Davis, worked on precisely such a system during her internship at Intel last summer. She implemented an algorithm for Intel’s Autonomous Driving Platform that would allow a computer to accurately detect the depths of various objects.

“This is important for self-driving cars, so that they can accurately navigate the road,” Kaushal said. “After working on this project, I truly understood the far-reaching applications of it, beyond just identifying faces in your photos.”

A computer system that can perfectly recognize all objects, including human faces, also poses ethical dilemmas. But Lee says that, as with all technologies, there will be “good” and “bad” uses of computer vision technology.

A positive use of human facial recognition would be an intelligent home assistant for elderly people or children. Such a system would need to be able to accurately recognize the faces of the people it is monitoring.

“It is still definitely something to be cautious about,” Lee said. “We need to be aware of security and privacy concerns and think ahead about potential negative consequences.”

Written by: Nausheen Sujela — science@theaggie.org
