Solving Semantic Segmentation: Precision Matrix, Knowledge-Based Rules and Generative Adversarial Network (GAN)
Figure-ground separation is a landmark problem in visual perception that has fascinated scientists for centuries. In computer vision, edge detection and region segmentation have been two grand challenges in understanding an image in terms of its objects, their shapes and appearances, and their contextual surroundings. Generic segmentation of an image involves grouping pixels that are perceptually similar. In semantic segmentation, however, the aim is to assign a semantic label to each pixel in the image. Although semantic segmentation can be achieved by simply applying classifiers trained with supervised learning to each pixel or region in the image, the results may not be desirable because context information beyond simple smoothness is not considered. In this talk, I will start by briefly presenting two supervised approaches to address this problem. First, I will discuss an approach to discover interactions between labels and regions using a sparse estimate of the precision matrix (the inverse of the covariance matrix of the data), obtained by graphical lasso. In this context, we find a graph over labels as well as regions in the image that encodes significant interactions and is also able to capture long-distance associations. Second, I will introduce a knowledge-based method to incorporate dependencies among regions in the image during inference. High-level knowledge rules, such as co-occurrence, spatial relations, and mutual exclusivity, are extracted from training data and transformed into constraints in an Integer Programming formulation.
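For readers unfamiliar with graphical lasso, its standard objective can be written as follows. This is the textbook formulation, not necessarily the exact variant used in the talk; the symbols $S$ (empirical covariance of the data), $\Theta$ (precision matrix), and $\lambda$ (sparsity weight) are notation assumed here for illustration.

```latex
\hat{\Theta} \;=\; \arg\max_{\Theta \succ 0}\;
  \log\det\Theta \;-\; \operatorname{tr}(S\,\Theta) \;-\; \lambda \sum_{i \neq j} \lvert \Theta_{ij} \rvert
```

Under a Gaussian model, an entry $\hat{\Theta}_{ij} = 0$ means that variables $i$ and $j$ are conditionally independent given all the others, so the nonzero off-diagonal entries define the edges of the interaction graph. Larger values of $\lambda$ yield a sparser graph.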
A difficulty that most supervised semantic segmentation approaches face is the lack of sufficient training data, particularly for deep learning methods, which have become enormously popular recently. Annotations must be at the pixel level (i.e., each pixel of every training image must be annotated), which is highly expensive to obtain. To address this limitation, I will next present a semi-supervised learning approach that exploits the plentiful unlabeled images available, as well as synthetic images generated via Generative Adversarial Networks (GAN). Furthermore, I will discuss an extension of the model that uses additional weakly labeled data to solve the problem in a weakly supervised manner. The basic idea is that, by providing fake data from the generator and through the competition between real and fake data (the discriminator and generator networks), true samples are encouraged to be close in the feature space. The model therefore learns more discriminative features, which lead to better classification results for semantic segmentation.
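One common way to set up such a semi-supervised discriminator, assumed here for illustration since the abstract does not give the exact loss, is to have it predict K semantic classes plus one extra "fake" class for each pixel. The sketch below computes that per-pixel loss in plain Python; the function names, the `kind` strings, and the K+1 layout are all hypothetical choices, not the speaker's implementation.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_loss(logits, kind, label=None):
    """Discriminator loss for one pixel.

    logits: K+1 scores; the LAST entry is the assumed 'fake' class.
    kind:   'labeled', 'unlabeled', or 'generated' (hypothetical names).
    label:  true class index in [0, K), used only for labeled pixels.
    """
    probs = softmax(logits)
    p_fake = probs[-1]
    eps = 1e-12  # guards against log(0)
    if kind == "labeled":
        # Supervised cross-entropy on the annotated class.
        return -math.log(probs[label] + eps)
    if kind == "unlabeled":
        # Real but unannotated: any class except 'fake' is acceptable.
        return -math.log(1.0 - p_fake + eps)
    if kind == "generated":
        # Generator output: should be assigned to the 'fake' class.
        return -math.log(p_fake + eps)
    raise ValueError(f"unknown pixel kind: {kind}")

def discriminator_loss(batch):
    """Mean loss over a list of (logits, kind, label) tuples."""
    return sum(pixel_loss(*item) for item in batch) / len(batch)

if __name__ == "__main__":
    batch = [
        ([2.0, 0.5, -1.0], "labeled", 0),      # annotated pixel, class 0
        ([0.1, 1.5, -2.0], "unlabeled", None), # real pixel, no annotation
        ([-1.0, -1.0, 3.0], "generated", None) # pixel from a synthetic image
    ]
    print(discriminator_loss(batch))
```

In this setup the unlabeled and generated pixels contribute training signal without any pixel-level annotation, which is the motivation the abstract gives for using GAN-generated data.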
Dr. Mubarak Shah, the UCF Trustee Chair Professor, is the founding director of the Center for Research in Computer Vision at the University of Central Florida (UCF). He is a co-author of five books (Motion-Based Recognition (1997); Video Registration (2003); Automated Multi-Camera Surveillance: Algorithms and Practice (2008); Modeling, Simulation and Visual Analysis of Crowds (2013); and Robust Subspace Estimation Using Low-Rank Optimization (2014)), all published by Springer. He has published extensively on topics related to visual surveillance, tracking, human activity and action recognition, object detection and categorization, shape from shading, geo-registration, visual crowd analysis, etc. Dr. Shah is a fellow of IEEE, IAPR, AAAS and SPIE. He has been an ACM and IEEE Distinguished Visitor Program speaker and is often invited to present seminars, tutorials and invited talks all over the world. He received the Pegasus Award in 2006; the University Distinguished Research Award in 2017, 2012 and 2005; the Faculty Excellence in Mentoring Doctoral Students award in 2016; the Scholarship of Teaching and Learning award in 2011; the Teaching Incentive Program award in 1995 and 2003; the Research Incentive Award in 2003, 2009 and 2012; the Harris Corporation Engineering Achievement Award in 1999; the TOKTEN awards from UNDP in 1995, 1997, and 2000; 2009 IEEE Outstanding Engineering Educator Award in 1997; an honorable mention for the ICCV 2005 Where Am I? Challenge Problem; the 2013 NGA Best Research Poster Presentation; 2nd place in the Grand Challenge at the ACM Multimedia 2013 conference; and runner-up for the best paper award at the ACM Multimedia Conference in 2005 and 2010.