Understanding complex visual scenes is one of the hallmark tasks of computer vision. Given a picture or a video, the goal of scene understanding is to build a representation of its content: what objects are present, how they are related, what actions any people are performing, what place is depicted, and so on.
With the appearance of large-scale databases such as ImageNet [1] and Places [2], and the recent success of machine learning techniques such as deep neural networks [3], scene understanding has progressed considerably. This progress has made it possible to build vision systems capable of addressing some of the tasks mentioned above [4].
This line of research is being undertaken in collaboration with the computer vision group at the Massachusetts Institute of Technology. Our goal is to improve existing algorithms for scene understanding and to define new problems made attainable by recent advances in neural networks and machine learning.
[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proc. CVPR, 2009.
[2] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using Places database. In Advances in Neural Information Processing Systems 27 (NIPS), 2014.
[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012.
[4] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. In Proc. CVPR, 2016.