Executive summary of the proposal
SoLSTiCe is a fundamental research project that aims to design new models and tools for representing and managing images and videos in order to, for example, retrieve images or videos similar to a query image or video, recognize objects in images, track objects in videos, or detect typical activities in videos. One way to tackle these applications is to describe images by global features such as histograms or eigen-features. A major current trend is to use bag-of-visual-words (BoVW) models, whose basic idea is to extract local features from small image regions so that images are mapped into a vector space of visual words. However, BoVW models, like many other global models proposed in the literature, do not integrate structural information such as the spatial or temporal relationships between local features, which hinders their applicability to realistic problems requiring high discriminative power. The lack of structural information can be an advantage, since it makes it easier to render the models invariant to a large class of transformations. The drawback is that such models cannot capture the geometrical and temporal relationships between parts of objects and actions, which complex applications require.
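To make the BoVW idea concrete, here is a minimal sketch in Python. It is an illustration only, not the project's implementation: the function names, the toy 2-D descriptors and the three-word codebook are all assumptions. Each local descriptor is assigned to its nearest visual word, and the image is summarised by the normalised word counts. Reordering the features (standing in here for rearranging them in the image) leaves the histogram unchanged, which is exactly the loss of structural information discussed above.

```python
from math import dist

def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda k: dist(descriptor, codebook[k]))

def bovw_histogram(descriptors, codebook):
    """L1-normalised histogram of visual-word occurrences for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)

# Hypothetical toy data: a 3-word codebook and four 2-D local descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
image_a = [(0.1, 0.0), (0.9, 1.2), (5.1, 4.8), (0.2, 0.1)]
image_b = list(reversed(image_a))  # same features, different arrangement

hist_a = bovw_histogram(image_a, codebook)
hist_b = bovw_histogram(image_b, codebook)
print(hist_a)            # [0.5, 0.25, 0.25]
print(hist_a == hist_b)  # True: the arrangement of features is not encoded
```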
In this project we would like to explore locally structured data (LSD), which combine visual features (such as interest points, segmented regions or visual words) with discrete structures (such as strings, trees, combinatorial maps or, more generally, graphs) in order to model local (spatio-temporal) relationships holding between these features. Using LSD for classification, recognition or indexing tasks leads us to study three main issues:
- Extracting LSD from images and videos: We extract relevant visual features and structure them w.r.t. spatial and temporal relationships.
- Measuring the similarity of LSD: We design relevant similarity measures for comparing LSD, and efficient algorithms for computing these measures.
- Mining LSD: We characterize LSD by means of frequently (or infrequently) occurring patterns (itemsets, sequences or graphs) and use them to create discriminative features for solving computer vision tasks.
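The three issues above can be sketched on toy data. The following Python example is a deliberately simple illustration under stated assumptions, not the models proposed in the project: features are hypothetical `(word_id, (x, y))` pairs, structure is reduced to edges between features closer than an assumed `radius`, similarity is a multiset Jaccard index over labelled edges, and mining is a plain support count of edge patterns across images.

```python
from collections import Counter
from itertools import combinations
from math import dist

def edge_patterns(features, radius):
    """Extraction: multiset of unordered visual-word pairs whose positions
    lie within `radius` of each other (one labelled edge per such pair)."""
    patterns = Counter()
    for (word_a, pos_a), (word_b, pos_b) in combinations(features, 2):
        if dist(pos_a, pos_b) <= radius:
            patterns[tuple(sorted((word_a, word_b)))] += 1
    return patterns

def similarity(patterns_a, patterns_b):
    """Similarity: multiset Jaccard index between two sets of labelled edges."""
    inter = sum((patterns_a & patterns_b).values())
    union = sum((patterns_a | patterns_b).values())
    return inter / union if union else 1.0

def frequent_patterns(pattern_sets, min_support):
    """Mining: edge patterns that occur in at least `min_support` images."""
    support = Counter()
    for patterns in pattern_sets:
        support.update(set(patterns))  # count each pattern once per image
    return {pat for pat, n in support.items() if n >= min_support}

# Hypothetical images: img1 and img2 contain the same visual words (0, 1, 2),
# so their BoVW histograms are identical, but the spatial layout differs.
img1 = [(0, (0, 0)), (1, (1, 0)), (2, (5, 5))]
img2 = [(0, (0, 0)), (2, (1, 0)), (1, (5, 5))]
img3 = [(0, (2, 2)), (1, (2, 3))]

p1, p2, p3 = (edge_patterns(img, radius=1.5) for img in (img1, img2, img3))
print(similarity(p1, p2))                    # 0.0: no shared local structure
print(similarity(p1, p3))                    # 1.0: same local structure
print(frequent_patterns([p1, p2, p3], 2))    # {(0, 1)}
```

Richer structures (strings, trees, combinatorial maps, general graphs) and larger patterns would replace the single-edge patterns used here; the sketch only shows how local structure can separate images that a bag of words cannot.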
Two main issues in computer vision that motivate our LSD are the need to deal with occlusions and with non-rigid objects. We propose to validate our new models and tools on three computer vision applications that share this need: the recognition of human actions and events in videos, the tracking of objects in videos, and the recognition of objects in 3D scenes from 2D images or videos. These applications remain open problems and have complementary constraints, mainly due to the different media types addressed (2D(+t), 3D and 3D+t).