Human Pose Estimation in Videos ( Detail... )
In this paper, we present a method to estimate a sequence of human poses in unconstrained videos. In contrast to the commonly employed graph optimization framework, which is NP-hard and needs approximate solutions, we formulate this problem into a unified two stage tree-based optimization problem for which an efficient and exact solution exists. Although the proposed method finds an exact solution, it does not sacrifice the ability to model the spatial and temporal constraints between body parts in the video frames; indeed it even models the symmetric parts better than the existing methods. The proposed method is based on two main ideas: `Abstraction' and `Association' to enforce the intra- and inter-frame body part constraints respectively without inducing extra computational complexity to the polynomial time solution. Using the idea of `Abstraction', a new concept of `abstract body part' is introduced to model not only the tree based body part structure similar to existing methods, but also extra constraints between symmetric parts. Using the idea of `Association', the optimal tracklets are generated for each abstract body part, in order to enforce the spatiotemporal constraints between body parts in adjacent frames. Finally, a sequence of the best poses is inferred from the abstract body part tracklets through the tree-based optimization. We evaluated the proposed method on three publicly available video based human pose estimation datasets, and obtained dramatically improved performance compared to the state-of-the-art methods.
Video Object Segmentation ( Detail... )
The goal of video object segmentation is to detect the primary object in videos and to delineate it from the background in all frames. Video object segmentation is a well-researched problem in the computer vision community and is a prerequisite for a variety of high-level vision applications, including content based video retrieval, video summarization, activity understanding and targeted content replacement. The proposed approach has several contributions: First, a novel layered Directed Acyclic Graph (DAG) based framework is presented for detection and segmentation of the primary object in video. We exploit the fact that, in general, objects are spatially cohesive and characterized by locally smooth motion trajectories, to extract the primary object from the set of all available proposals based on motion, appearance and predicted-shape similarity across frames. Second, the DAG is initialized with an enhanced object proposal set where motion based proposal predictions (from adjacent frames) are used to expand the set of object proposals for a particular frame. Last, the paper presents a motion scoring function for selection of object proposals that emphasizes high optical flow gradients at proposal boundaries to discriminate between moving objects and the background.
Video Object CoSegmentation ( Detail... )
We propose a novel approach for object co-segmentation in arbitrary videos by sampling, tracking and matching object proposals via a Regulated Maximum Weight Clique (RMWC) extraction scheme. The proposed approach is able to achieve good segmentation results by pruning away noisy segments in each video through selection of object proposal tracklets that are spatially salient and temporally consistent, and by iteratively extracting weighted groupings of objects with similar shape and appearance (with-in and across videos). The object regions obtained from the video sets are used to initialize per-pixel segmentation to get the final co-segmentation results. Our approach is general in the sense that it can handle multiple objects, temporary occlusions, and objects going in and out of view. Additionally, it makes no prior assumption on the commonality of objects in the video collection. The proposed method is evaluated on publicly available multi-class video object co-segmentation dataset and demonstrates improved performance compared to the state-of-the-art methods.
Face Verification ( Detail... )
In this project, we propose to extract cross-image features, i.e. features across the pair of images, which, as we demonstrate, is more discriminative to the similarity and the dissimilarity of faces. Our features are designed for face verification problem instead of face detection. We collect a large bank of cross-image features using filters of different sizes, locations, and orientations. Consequently, we use AdaBoost to select and weight the most discriminative features.
Computer Vision in Dynamical Scenes ( Detail... )
Traditional computer vision methods for Structure from Motion (SFM), 3D reconstruction and camera pose estimation are mostly designed for static scenes, in which there is no moving objects in the scenes. However, the static scene assumption of the methods severely limits the applications, because most of the real world scenes are dynamical scenes. In this project we have proposed a framework to deal with all kinds of computer vision applications in dynamical scenes. Details coming soon...