Seminars in 2014
2014-12-20
    This paper proposes a method for detecting objects carried by pedestrians, such as backpacks and suitcases, from video sequences. In common with earlier work [14], [16] on the same problem, the method produces a representation of motion and shape (known as a temporal template) that has some immunity to noise in foreground segmentations and phase of the walking cycle. Our key novelty is for carried objects to be revealed by comparing the temporal templates against view-specific exemplars generated offline for unencumbered pedestrians. A likelihood map of protrusions, obtained from this match, is combined in a Markov random field for spatial continuity, from which we obtain a segmentation of carried objects using the MAP solution. We also compare the previously used method of periodicity analysis for distinguishing carried objects from other protrusions against the use of prior probabilities for carried-object locations relative to the silhouette. We have reimplemented the earlier state-of-the-art method [14] and demonstrate a substantial improvement in performance for the new method on the PETS2006 data set. The carried-object detector is also tested on another outdoor data set. Although developed for a specific problem, the method could be applied to the detection of irregularities in appearance for other categories of object that move in a periodic fashion.
2014-12-13
    Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc postprocessing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset.
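    A minimal NumPy sketch of the block-overlap voting idea described above, in which interim block-level decisions are averaged into a pixel-level foreground probability. The classifier is a placeholder (the paper uses a low-dimensional texture descriptor and an adaptive classifier cascade), and the block size, step, and threshold are illustrative values rather than the paper's settings.

        import numpy as np

        def block_vote_foreground(frame_gray, block_classifier, block=16, step=8, thresh=0.5):
            # Accumulate overlapping block-level decisions into a pixel-level probability.
            h, w = frame_gray.shape
            votes = np.zeros((h, w), dtype=np.float32)   # sum of foreground votes per pixel
            count = np.zeros((h, w), dtype=np.float32)   # number of blocks covering each pixel
            for y in range(0, h - block + 1, step):
                for x in range(0, w - block + 1, step):
                    patch = frame_gray[y:y + block, x:x + block]
                    is_fg = float(block_classifier(patch))   # placeholder: returns 0 or 1
                    votes[y:y + block, x:x + block] += is_fg
                    count[y:y + block, x:x + block] += 1.0
            prob = votes / np.maximum(count, 1.0)            # fraction of covering blocks voting "foreground"
            return prob > thresh                             # smooth mask without ad-hoc postprocessing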
2014-12-06
    This paper proposes a vacant parking slot detection and tracking system that fuses the sensors of an Around View Monitor (AVM) system and an ultrasonic sensor-based automatic parking system. The proposed system consists of three stages: parking slot marking detection, parking slot occupancy classification, and parking slot marking tracking. The parking slot marking detection stage recognizes various types of parking slot markings using AVM image sequences. It detects parking slots in individual AVM images by exploiting a hierarchical tree structure of parking slot markings and combines sequential detection results. The parking slot occupancy classification stage identifies vacancies of detected parking slots using ultrasonic sensor data. Parking slot occupancy is probabilistically calculated by treating each parking slot region as a single cell of the occupancy grid. The parking slot marking tracking stage continuously estimates the position of the selected parking slot while the ego-vehicle is moving into it. During tracking, AVM images and motion sensor-based odometry are fused together at the chamfer score level to achieve robustness against inevitable occlusions caused by the ego-vehicle. In the experiments, it is shown that the proposed method can recognize the positions and occupancies of various types of parking slot markings and stably track them under practical situations in a real-time manner. The proposed system is expected to help drivers conveniently select one of the available parking slots and support the parking control system by continuously updating the designated target positions.
2014-11-29
    Video text, which contains rich semantic information, can be utilized for video indexing and summarization. However, compared with scanned documents, text recognition for video text is still a challenging problem due to complex backgrounds. Segmenting a text line into single characters before text extraction can achieve higher recognition accuracy, since the background of a single character is less complex than that of the whole text line. Therefore, we first perform character segmentation, which can accurately locate the character gaps in the text line. More specifically, we compute a fusion map which fuses the results of a color gradient and a log-Gabor filter. Then, candidate segmentation points are obtained by vertical projection analysis of the fusion map. The final segmentation points are found as the minimum projection values among the candidate points within a limited range. Finally, we obtain the binary image of each single character by applying K-means clustering and combine the results to form the binary image of the whole text line. The binary image is further refined by inward filling and the fusion map. Experimental results on a large amount of data show that the proposed method contributes to a better binarization result, which leads to a higher character recognition rate for the OCR engine.
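    A small sketch of the projection-analysis step, assuming the fusion map is already available as a 2-D array; the function and parameter names are illustrative and not taken from the paper.

        import numpy as np

        def candidate_split_columns(fusion_map, search_range=8):
            # Vertical projection: one accumulated response value per column.
            proj = fusion_map.sum(axis=0)
            # Candidate segmentation points are local minima of the projection.
            cand = [x for x in range(1, len(proj) - 1)
                    if proj[x] <= proj[x - 1] and proj[x] <= proj[x + 1]]
            # Keep a candidate only if it is the minimum within a limited range around it.
            splits = []
            for x in cand:
                lo, hi = max(0, x - search_range), min(len(proj), x + search_range + 1)
                if proj[x] == proj[lo:hi].min():
                    splits.append(x)
            return splits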
2014-11-22
    People reidentification is one of the most challenging tasks in computer vision, and considerable efforts have been directed toward providing solutions to this problem. The existence of extensive camera networks and surveillance systems increases the amount of people images obtained, but, on the other hand, implies the need for new algorithms to enable reidentification of people captured by the cameras. There is no one optimal model that solves the entire problem, but a set of distinctive features can be used to help in the matching process. Our proposal consists of using the orientation of each person captured in the surveillance scene to considerably improve the reidentification process. An iterative algorithm maximizes the number of successful matches and speeds up the process. A comparison with other earlier relevant studies is presented using available datasets.
2014-11-15
2014-11-08
    This paper provides a real-time vehicle classification and counting system based on wireless sensor networks (WSNs), namely EasiSee. Its design goals are accurate vehicle classification, low-delay real-time performance, and low resource consumption. The paper proposes an event trigger mechanism, CSM (collaborative sensing mechanism), which activates the camera sensor node only when a vehicle is detected, to avoid keeping the camera sensor node working all the time. It also proposes a robust vehicle image processing algorithm with low computational complexity, including vehicle image segmentation and physical feature extraction.
2014-11-01
    We present a stereo algorithm designed for speed and efficiency that uses local slanted plane sweeps to propose disparity hypotheses for a semi-global matching algorithm. Our local plane hypotheses are derived from initial sparse feature correspondences followed by an iterative clustering step. Local plane sweeps are then performed around each slanted plane to produce out-of-plane parallax and matching-cost estimates. A final global optimization stage, implemented using semi-global matching, assigns each pixel to one of the local plane hypotheses. By only exploring a small fraction of the whole disparity space volume, our technique achieves significant speedups over previous algorithms and achieves state-of-the-art accuracy on high-resolution stereo pairs of up to 19 megapixels.
2014-10-18
2014-10-11
    Recent years have seen greater interest in the use of discriminative classifiers in tracking systems, owing to their success in object detection. They are trained online with samples collected during tracking. Unfortunately, the potentially large number of samples becomes a computational burden, which directly conflicts with real-time requirements. On the other hand, limiting the samples may sacrifice performance. Interestingly, we observed that, as we add more and more samples, the problem acquires circulant structure. Using the well-established theory of circulant matrices, we provide a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform. This can be done in the dual space of kernel machines as fast as with linear classifiers. We derive closed-form solutions for training and detection with several types of kernels, including the popular Gaussian and polynomial kernels. The resulting tracker achieves performance competitive with the state-of-the-art, can be implemented with only a few lines of code and runs at hundreds of frames-per-second. MATLAB code is provided in the paper.
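    The closed-form training and detection can indeed be written in a few lines. Below is a simplified single-channel sketch with a Gaussian kernel (no cosine window, feature channels, or model update), so the constants and function names are illustrative rather than the authors' reference implementation.

        import numpy as np

        def gaussian_correlation(x, z, sigma=0.5):
            # Kernel correlation between z and all cyclic shifts of x, computed via the FFT.
            xf, zf = np.fft.fft2(x), np.fft.fft2(z)
            cross = np.real(np.fft.ifft2(xf * np.conj(zf)))          # circular cross-correlation
            d2 = (np.sum(x * x) + np.sum(z * z) - 2.0 * cross) / x.size
            return np.exp(-np.maximum(d2, 0.0) / (sigma ** 2))

        def train(x, y, lam=1e-4):
            # Dual ridge-regression solution in the Fourier domain: alpha = F(y) / (F(k) + lambda).
            k = gaussian_correlation(x, x)
            return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

        def detect(alphaf, x_model, z):
            # Response over all cyclic shifts of the new patch z; the argmax gives the translation.
            k = gaussian_correlation(z, x_model)
            return np.real(np.fft.ifft2(alphaf * np.fft.fft2(k)))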
2014-10-04
    Automatic pedestrian detection for advanced driver assistance systems (ADASs) is still a challenging task. Major reasons are dynamic and complex backgrounds in street scenes and variations in clothing or postures of pedestrians. We propose a simple yet effective detector for robust pedestrian detection. Observing that pedestrians usually appear upright in video data, we employ a statistical model of the upright human body in which the head, upper body, and lower body are treated as three distinct components. Our main contribution is to systematically design a pool of rectangular features that are tailored to this shape model. As we incorporate different kinds of low-level measurements, the resulting multimodal and multichannel Haar-like features represent characteristic differences between parts of the human body but are robust against variations in clothing or environmental settings. Our approach neither performs exhaustive searches over all possible configurations of rectangular features nor relies on random sampling. It thus marks a middle ground among recently published techniques and yields efficient low-dimensional yet highly discriminative features. Experimental results on the well-established INRIA, Caltech, and KITTI pedestrian data sets show that our detector reaches state-of-the-art performance at low computational costs and that our features are robust against occlusions.
2014-09-27
    Most of the existing traffic sign recognition (TSR) systems make use of the inner region of the signs or local features such as Haar, histograms of oriented gradients (HOG), and scale-invariant feature transform for recognition, but these features are still limited in dealing with rotation, illumination, and scale variations. A good feature of a traffic sign is desired to be discriminative and robust. In this paper, a novel Color Global and Local Oriented Edge Magnitude Pattern (Color Global LOEMP) is proposed. The Color Global LOEMP is a framework that is able to effectively combine color, global spatial structure, global direction structure, and local shape information and balance the two concerns of distinctiveness and robustness. The contributions of this paper are as follows: 1) color angular patterns are proposed to provide the color distinguishing information; 2) a context frame is established to provide global spatial information, due to the fact that the context frame is established by the shape of the traffic sign, thus allowing the cells to be aligned well with the inside part of the traffic sign even when rotation and scale variations occur; and 3) a LOEMP is proposed to represent each cell. In each cell, the distribution of the orientation patterns is described by the HOG feature, and then, each direction of HOG is represented in detail by the occurrence of the local binary pattern histogram in this direction. Experiments are performed to validate the effectiveness of the proposed approach with TSR systems, and the experimental results are satisfying, even for images containing traffic signs that have been rotated, damaged, altered in color, or undergone affine transformations or images that were photographed under different weather or illumination conditions.
2014-09-20
    In this paper, we present a novel object detection approach that is capable of regressing the aspect ratio of objects. This results in accurately predicted bounding boxes having high overlap with the ground truth. In contrast to most recent works, we employ a Random Forest for learning a template-based model but exploit the nature of this learning algorithm to predict arbitrary output spaces. In this way, we can simultaneously predict the object probability of a window in a sliding window approach as well as regress its aspect ratio with a single model. Furthermore, we also exploit the additional information of the aspect ratio during the training of the Joint Classification-Regression Random Forest, resulting in better detection models. Our experiments demonstrate several benefits: (i) Our approach gives competitive results on standard detection benchmarks. (ii) The additional aspect ratio regression delivers more accurate bounding boxes than standard object detection approaches in terms of overlap with ground truth, especially when tightening the evaluation criterion. (iii) The detector itself becomes better by only including the aspect ratio information during training.
2014-09-13
    This paper presents a robust and efficient text detection algorithm for news video. The proposed algorithm uses the temporal information of video and a logical AND operation to remove most of the irrelevant background. Then a window-based method that counts the black-and-white transitions is applied on the resulting edge map to obtain rough text blobs. A line deletion technique is used twice to refine the text blocks. The proposed algorithm is applicable to multiple languages (English, Japanese and Chinese), robust to text polarities (positive or negative), various character sizes (from 4×7 to 30×30), and text alignments (horizontal or vertical). Three metrics, recall (R), precision (P), and quality of bounding preciseness (Q), are adopted to measure the efficacy of text detection algorithms. According to the experimental results on various multilingual video sequences, the proposed algorithm achieves 96% or above in all three metrics. Compared to existing methods, our method has better performance, especially in the quality of bounding preciseness, which is crucial to the later binarization process.
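    As a rough illustration of the transition-counting step, assuming a binary edge map has already been produced; the window size and names are illustrative, not the paper's parameters.

        import numpy as np

        def transition_counts(edge_map, win_h=8, win_w=16):
            # 1 wherever two horizontally adjacent edge-map pixels differ (a black/white transition).
            e = (edge_map > 0).astype(np.int32)
            trans = np.abs(np.diff(e, axis=1))
            h, w = trans.shape
            counts = np.zeros((h, w), dtype=np.int32)
            for y in range(h - win_h + 1):
                for x in range(w - win_w + 1):
                    counts[y, x] = int(trans[y:y + win_h, x:x + win_w].sum())
            return counts   # windows with many transitions are kept as rough text blobs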
2014-09-06
    Computer-aided sports analysis is demanded by coaches and the media. Image processing and machine learning techniques that allow for "live" recognition and tracking of players exist. But these methods are far from collecting and analyzing event data fully autonomously. To generate accurate results, human interaction is required at different stages including system setup, calibration, supervision of classifier training, and resolution of tracking conflicts. Furthermore, the real-time constraints are challenging: in contrast to other object recognition and tracking applications, we cannot treat data collection, annotation, and learning as an offline task. A semi-automatic labeling of training data and robust learning given few examples from unbalanced classes are required. We present a real-time system acquiring and analyzing video sequences from soccer matches. It estimates each player's position throughout the whole match in real-time. Performance measures derived from these raw data allow for an objective evaluation of physical and tactical profiles of teams and individuals. The need for precise object recognition, the restricted working environment, and the technical limitations of a mobile setup are taken into account. Our contribution is twofold: (1) the deliberate use of machine learning and pattern recognition techniques allows us to achieve high classification accuracy in varying environments. We systematically evaluate combinations of image features and learning machines in the given online scenario. Switching between classifiers depending on the amount of training data and available training time improves robustness and efficiency. (2) A proper human-machine interface decreases the number of required operators who are incorporated into the system's learning process. Their main task reduces to the identification of players in uncertain situations. Our experiments showed high performance in the classification task, achieving an average error rate of 3% on three real-world datasets. The system has proved to collect accurate tracking statistics throughout different soccer matches in real-time by incorporating two human operators only. We finally show how the resulting data can be used instantly for consumer applications and discuss further development in the context of behavior analysis.
2014-08-09
    Recent years have seen greater interest in the use of discriminative classifiers in tracking systems, owing to their success in object detection. They are trained online with samples collected during tracking. Unfortunately, the potentially large number of samples becomes a computational burden, which directly conflicts with real-time requirements. On the other hand, limiting the samples may sacrifice performance. Interestingly, we observed that, as we add more and more samples, the problem acquires circulant structure. Using the well-established theory of circulant matrices, we provide a link to Fourier analysis that opens up the possibility of extremely fast learning and detection with the Fast Fourier Transform. This can be done in the dual space of kernel machines as fast as with linear classifiers. We derive closed-form solutions for training and detection with several types of kernels, including the popular Gaussian and polynomial kernels. The resulting tracker achieves performance competitive with the state-of-the-art, can be implemented with only a few lines of code and runs at hundreds of frames-per-second. MATLAB code is provided in the paper (see Algorithm 1).
2014-08-02
    Traffic sign detection and recognition has been thoroughly studied for a long time. However, traffic panel detection and recognition still remains a challenge in computer vision due to the different panel types and the huge variability of the information depicted on them. This paper presents a method to detect traffic panels in street-level images and to recognize the information contained on them, as an application to intelligent transportation systems (ITS). The main purpose is to make an automatic inventory of the traffic panels located along a road to support road maintenance and to assist drivers. Our proposal extracts local descriptors at some interest keypoints after applying blue and white color segmentation. Then, images are represented as a "bag of visual words" and classified using Naïve Bayes or support vector machines. This visual appearance categorization method is a new approach for traffic panel detection in the state of the art. Finally, our own text detection and recognition method is applied on those images where a traffic panel has been detected, in order to automatically read and save the information depicted in the panels. We propose a language model partly based on a dynamic dictionary for a limited geographical area using a reverse geocoding service. Experimental results on real images from Google Street View prove the efficiency of the proposed method and give way to using street-level images for different applications on ITS.
2014-05-31
    -
2014-07-26
    This paper proposes a novel method for tracking failure detection. The detection is based on the Forward-Backward error, i.e. the tracking is performed forward and backward in time and the discrepancies between these two trajectories are measured. We demonstrate that the proposed error enables reliable detection of tracking failures and selection of reliable trajectories in video sequences. We demonstrate that the approach is complementary to commonly used normalized cross-correlation (NCC). Based on the error, we propose a novel object tracker called Median Flow. State-of-the-art performance is achieved on challenging benchmark video sequences which include non-rigid objects.
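    The Forward-Backward error itself is easy to reproduce. Below is a sketch using OpenCV's pyramidal Lucas-Kanade tracker; the paper's Median Flow tracker additionally filters points by the median of this error and by NCC, which is omitted here.

        import cv2
        import numpy as np

        def forward_backward_error(img0, img1, points):
            # Track points forward (img0 -> img1), then backward (img1 -> img0),
            # and measure how far each back-tracked point lands from its start.
            pts = points.reshape(-1, 1, 2).astype(np.float32)
            fwd, st_f, _ = cv2.calcOpticalFlowPyrLK(img0, img1, pts, None)
            bwd, st_b, _ = cv2.calcOpticalFlowPyrLK(img1, img0, fwd, None)
            fb_err = np.linalg.norm(pts - bwd, axis=2).ravel()
            valid = (st_f.ravel() == 1) & (st_b.ravel() == 1)
            return fb_err, valid   # keep points whose error is small, e.g. below the median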
2014-07-19
    We describe a state-of-the-art system for finding objects in cluttered images. Our system is based on deformable models that represent objects using local part templates and geometric constraints on the locations of parts. We reduce object detection to classification with latent variables. The latent variables introduce invariances that make it possible to detect objects with highly variable appearance. We use a generalization of support vector machines to incorporate latent information during training. This has led to a general framework for discriminative training of classifiers with latent variables. Discriminative training benefits from large training datasets. In practice we use an iterative algorithm that alternates between estimating latent values for positive examples and solving a large convex optimization problem. Practical optimization of this large convex problem can be done using active set techniques for adaptive subsampling of the training data.
2014-07-12
    Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.
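    The core of SLIC is the combined color-spatial distance used in the k-means assignment step. A sketch of that distance follows, with the usual compactness weight m = 10 as an illustrative default.

        import numpy as np

        def slic_distance(lab_pixels, xy_pixels, lab_center, xy_center, S, m=10.0):
            # d_lab: CIELAB color distance; d_xy: spatial distance normalised by the
            # grid interval S and weighted by the compactness parameter m.
            d_lab = np.linalg.norm(lab_pixels - lab_center, axis=-1)
            d_xy = np.linalg.norm(xy_pixels - xy_center, axis=-1)
            return np.sqrt(d_lab ** 2 + (d_xy / S) ** 2 * m ** 2)

    Because each cluster center only searches a limited neighborhood around itself, the clustering stays linear in the number of pixels, which is where the speed and memory advantages come from.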
2014-06-28
    Catadioptric omnidirectional view sensors have found increasing adoption in various robotic and surveillance applications due to their 360° field of view. However, the inherent distortion caused by the sensors prevents their direct utilisation with existing image processing techniques developed for perspective images. Therefore, a correction process known as "unwrapping" is commonly performed. However, the unwrapping process incurs additional computational load on central processing units. In this paper, a method to reduce this burden is investigated by exploiting the parallelism of graphical processing units (GPUs) based on the Compute Unified Device Architecture (CUDA). More specifically, we first introduce a general approach of parallelisation to the said process. Then, a series of adaptations to the CUDA platform is proposed to enable an optimised usage of the hardware platform. Finally, the performance of the unwrapping function was evaluated on high-end and low-end GPUs to demonstrate the effectiveness of the parallelisation approach.
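    The unwrapping itself is a fixed polar-to-cartesian lookup, which is why it maps so naturally onto a GPU (one output pixel per thread). A nearest-neighbour CPU sketch is shown below, with the mirror centre and radii taken as assumed inputs.

        import numpy as np

        def build_unwrap_lut(cx, cy, r_inner, r_outer, out_h, out_w):
            # One output column per viewing angle, one output row per radius.
            theta = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
            radius = np.linspace(r_inner, r_outer, out_h)
            tt, rr = np.meshgrid(theta, radius)
            map_x = np.round(cx + rr * np.cos(tt)).astype(int)
            map_y = np.round(cy + rr * np.sin(tt)).astype(int)
            return map_y, map_x

        def unwrap(omni_img, lut):
            map_y, map_x = lut
            return omni_img[map_y, map_x]   # pure gather, trivially parallelisable per output pixel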
2014-06-21
    Detecting text and captions in videos is important and in great demand for video retrieval, annotation, indexing, and content analysis. In this paper, we present a corner-based approach to detect text and captions in videos. This approach is inspired by the observation that there exist dense and orderly presences of corner points in characters, especially in text and captions. We use several discriminative features to describe the text regions formed by the corner points. The usage of these features is flexible and can thus be adapted to different applications. Language independence is an important advantage of the proposed method. Moreover, based upon the text features, we further develop a novel algorithm to detect moving captions in videos. In the algorithm, the motion features, extracted by optical flow, are combined with text features to detect the moving caption patterns. A decision tree is adopted to learn the classification criteria. Experiments conducted on a large volume of real video shots demonstrate the efficiency and robustness of our proposed approaches and the real-world system. Our text and caption detection system was recently highlighted in a worldwide multimedia retrieval competition, Star Challenge, by achieving superior performance with a top ranking.
2014-06-07
    Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We observe that generic objects with well-defined closed boundary can be discriminated by looking at the norm of gradients, with a suitable resizing of their corresponding image windows into a small fixed size. Based on this observation and computational reasons, we propose to resize the window to 8x8 and use the norm of the gradients as a simple 64D feature to describe it, for explicitly training a generic objectness measure. We further show how the binarized version of this feature, namely binarized normed gradients (BING), can be used for efficient objectness estimation, which requires only a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.). Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300fps on a single laptop CPU) generates a small set of category-independent, high quality object windows, yielding 96.2% object detection rate (DR) with 1,000 proposals. Increasing the numbers of proposals and color spaces for computing BING features, our performance can be further improved to 99.5% DR.
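    A sketch of the 64D normed-gradient feature (before binarization), assuming an OpenCV resize and simple finite differences; the paper's own implementation uses approximate, binarized operations for speed.

        import cv2
        import numpy as np

        def normed_gradient_feature(window_gray):
            # Resize the candidate window to 8x8 and take one gradient-magnitude value per cell.
            small = cv2.resize(window_gray, (8, 8)).astype(np.float32)
            gx = np.diff(small, axis=1, prepend=small[:, :1])
            gy = np.diff(small, axis=0, prepend=small[:1, :])
            ng = np.minimum(np.abs(gx) + np.abs(gy), 255.0)   # clipped |gx| + |gy| as the gradient norm
            return ng.ravel()                                  # 64D descriptor, scored by a linear model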
2014-05-24
    In this paper, we study the problem of detecting sudden pedestrian crossings to assist drivers in avoiding accidents. This application has two major requirements: to detect crossing pedestrians as early as possible just as they enter the view of the car-mounted camera and to maintain a false alarm rate as low as possible for practical purposes. Although many current sliding-window-based approaches using various features and classification algorithms have been proposed for image-/video-based pedestrian detection, their performance in terms of accuracy and processing speed falls far short of practical application requirements. To address this problem, we propose a three-level coarse-to-fine video-based framework that detects partially visible pedestrians just as they enter the camera view, with a low false alarm rate and high speed. The framework is tested on a new collection of high-resolution videos captured from a moving vehicle and yields a performance better than that of state-of-the-art pedestrian detection while running at a frame rate of 55 fps.
2014-05-17
    We consider the problem of detection and tracking of multiple people in crowded street scenes. State-of-the-art methods perform well in scenes with relatively few people, but are severely challenged by scenes with many subjects that partially occlude each other. This limitation is due to the fact that current people detectors fail when persons are strongly occluded. We observe that typical occlusions are due to overlaps between people and propose a people detector tailored to various occlusion levels. Instead of treating partial occlusions as distractions, we leverage the fact that person/person occlusions result in very characteristic appearance patterns that can help to improve detection results. We demonstrate the performance of our occlusion-aware person detector on a new dataset of people with controlled but severe levels of occlusion and on two challenging publicly available benchmarks outperforming single person detectors in each case.
2014-05-10
    Background subtraction has been widely investigated in recent years. Most previous work has focused on stationary cameras. Recently, moving cameras have also been studied since videos from mobile devices have increased significantly. In this paper, we propose a unified and robust framework to effectively handle diverse types of videos, e.g., videos from stationary or moving cameras. Our model is inspired by two observations: 1) background motion caused by orthographic cameras lies in a low rank subspace, and 2) pixels belonging to one trajectory tend to group together. Based on these two observations, we introduce a new model using both low rank and group sparsity constraints. It is able to robustly decompose a motion trajectory matrix into foreground and background ones. After obtaining foreground and background trajectories, the information gathered on them is used to build a statistical model to further label frames at the pixel level. Extensive experiments demonstrate very competitive performance on both synthetic data and real videos.
2014-05-03
    Since the initial comparison of Seitz et al. [48], the accuracy of dense multiview stereovision methods has been increasing steadily. A number of limitations, however, make most of these methods not suitable to outdoor scenes taken under uncontrolled imaging conditions. The present work consists of a complete dense multiview stereo pipeline which circumvents these limitations, being able to handle large-scale scenes without sacrificing accuracy. Highly detailed reconstructions are produced within very reasonable time thanks to two key stages in our pipeline: a minimum s-t cut optimization over an adaptive domain that robustly and efficiently filters a quasidense point cloud from outliers and reconstructs an initial surface by integrating visibility constraints, followed by a mesh-based variational refinement that captures small details, smartly handling photo-consistency, regularization, and adaptive resolution. The pipeline has been tested over a wide range of scenes: from classic compact objects taken in a laboratory setting, to outdoor architectural scenes, landscapes, and cultural heritage sites. The accuracy of its reconstructions has also been measured on the dense multiview benchmark proposed by Strecha et al. [59], showing the results to compare more than favorably with the current state-of-the-art methods.
2014-04-19
    This paper develops a theoretical model for the formation of transparent overlays and proposes a temporal algorithm to detect them independent of their degree of transparency. The proposed algorithm exploits our novel observation that the appearance of a transparent overlay results in a proportionally constant decrease in the intensity variance. In order to detect transparent regions, we first compute intensity variances about each pixel. After that, the ratios of the variances between the pixels of the consecutive frames are computed to form variance ratio images. Because the degree of transparency is unknown and may vary, we generate binary images by thresholding variance ratio images for every possible fine interval of the degree of transparency. Various morphological, textural, and contextual information are applied to every candidate binary image to detect spatial location of transparent overlays. We can also accurately detect the color and the degree of transparency of the transparent overlay so that we can remove the transparency or apply user-specific enhancement operations. We also demonstrate the application of the algorithm to video indexing and retrieval.
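    The variance-ratio cue can be computed directly. Here is a sketch using a box-filter local variance; the window size is an illustrative choice, and the paper additionally thresholds the ratio over fine intervals of the degree of transparency.

        import numpy as np
        from scipy.ndimage import uniform_filter

        def variance_ratio(frame_prev, frame_cur, win=7):
            # Local variance via E[x^2] - E[x]^2 with a box filter of size `win`.
            def local_var(img):
                f = img.astype(np.float32)
                mean = uniform_filter(f, win)
                return uniform_filter(f * f, win) - mean ** 2
            v_prev = local_var(frame_prev)
            v_cur = local_var(frame_cur)
            # A transparent overlay scales the variance by a roughly constant factor < 1,
            # so the ratio image is near-constant inside the overlay region.
            return v_cur / np.maximum(v_prev, 1e-6)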
2014-04-12
    The efficiency and quality of a feature descriptor are critical to the user experience of many computer vision applications. However, the existing descriptors are either too computationally expensive to achieve real-time performance, or not sufficiently distinctive to identify correct matches from a large database with various transformations. In this paper, we propose a highly efficient and distinctive binary descriptor, called local difference binary (LDB). LDB directly computes a binary string for an image patch using simple intensity and gradient difference tests on pairwise grid cells within the patch. A multiple-gridding strategy and a salient bit-selection method are applied to capture the distinct patterns of the patch at different spatial granularities. Experimental results demonstrate that compared to the existing state-of-the-art binary descriptors, primarily designed for speed, LDB has similar construction efficiency, while achieving a greater accuracy and faster speed for mobile object recognition and tracking tasks.
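    A single-grid sketch of the LDB idea: pairwise binary tests on per-cell mean intensity and mean x/y gradients. The actual descriptor uses several grid sizes and a salient-bit selection, which are omitted here.

        import numpy as np

        def ldb_bits(patch, grid=3):
            p = patch.astype(np.float32)
            h, w = p.shape
            gy, gx = np.gradient(p)                 # per-pixel gradients
            cells = []
            for i in range(grid):
                for j in range(grid):
                    ys = slice(i * h // grid, (i + 1) * h // grid)
                    xs = slice(j * w // grid, (j + 1) * w // grid)
                    cells.append((p[ys, xs].mean(), gx[ys, xs].mean(), gy[ys, xs].mean()))
            bits = []
            for a in range(len(cells)):             # one test per cell pair and per channel
                for b in range(a + 1, len(cells)):
                    for k in range(3):              # intensity, dx, dy
                        bits.append(1 if cells[a][k] > cells[b][k] else 0)
            return np.array(bits, dtype=np.uint8)   # packed into bytes in a real implementation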
2014-03-29
    Multioriented text detection in video frames is not as easy as detection of captions or graphics or overlaid texts, which usually appears in the horizontal direction and has high contrast compared to its background. Multioriented text generally refers to scene text that makes text detection more challenging and interesting due to unfavorable characteristics of scene text. Therefore, conventional text detection methods may not give good results for multioriented scene text detection. Hence, in this paper, we present a new enhancement method that includes the product of Laplacian and Sobel operations to enhance text pixels in videos. To classify true text pixels, we propose a Bayesian classifier without assuming a priori probability about the input frame but estimating it based on three probable matrices. Three different ways of clustering are performed on the output of the enhancement method to obtain the three probable matrices. Text candidates are obtained by intersecting the output of the Bayesian classifier with the Canny edge map of the input frame. A boundary growing method is introduced to traverse the multioriented scene text lines using text candidates. The boundary growing method works based on the concept of nearest neighbors. The robustness of the method has been tested on a variety of datasets that include our own created data (nonhorizontal and horizontal text data) and two publicly available data, namely, video frames of Hua and complex scene text data of ICDAR 2003 competition (camera images). Experimental results show that the performance of the proposed method is encouraging compared with results of existing methods in terms of recall, precision, F-measures, and computational times.
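    The enhancement step is essentially a pointwise product of two standard operators. A sketch using SciPy's Laplacian and Sobel filters; the normalisation is an illustrative choice.

        import numpy as np
        from scipy.ndimage import laplace, sobel

        def enhance_text_pixels(frame_gray):
            f = frame_gray.astype(np.float32)
            lap = np.abs(laplace(f))                              # second-derivative response
            grad = np.hypot(sobel(f, axis=0), sobel(f, axis=1))   # Sobel gradient magnitude
            prod = lap * grad                                     # strong only where both respond (text strokes)
            return prod / (prod.max() + 1e-6)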
2014-03-22
    Part-based models have demonstrated their merit in object detection. However, there is a key issue to be solved on how to integrate the inaccurate scores of part detectors when there are occlusions or large deformations. To handle the imperfectness of part detectors, this paper presents a probabilistic pedestrian detection framework. In this framework, a deformable part-based model is used to obtain the scores of part detectors and the visibilities of parts are modeled as hidden variables. Unlike previous occlusion handling approaches that assume independence among visibility probabilities of parts or manually define rules for the visibility relationship, a discriminative deep model is used in this paper for learning the visibility relationship among overlapping parts at multiple layers. Experimental results on three public datasets (Caltech, ETH and Daimler) and a new CUHK occlusion dataset specially designed for the evaluation of occlusion handling approaches show the effectiveness of the proposed approach.
2014-03-15
    The automatic extraction of line-networks from images is a well-known computer vision issue. Appearance and shape considerations have been deeply explored in the literature to improve accuracy in presence of occlusions, shadows, and a wide variety of irrelevant objects. However most existing works have ignored the structural aspect of the problem. We present an original method which provides structurally-coherent solutions. Contrary to the pixel-based and object-based methods, our result is a graph in which each node represents either a connection or an ending in the line-network. Based on stochastic geometry, we develop a new family of point processes consisting in sampling junction-points in the input image by using a Monte Carlo mechanism. The quality of a configuration is measured by a probability density which takes into account both image consistency and shape priors. Our experiments on a variety of problems illustrate the potential of our approach in terms of accuracy, flexibility and efficiency.
2014-03-08
    There are few fully automated methods for liver segmentation in magnetic resonance images (MRI) despite the benefits of this type of acquisition in comparison to other radiology techniques such as computed tomography (CT). Motivated by medical requirements, liver segmentation in MRI has been carried out. For this purpose, we present a new method for liver segmentation based on the watershed transform and stochastic partitions. The classical watershed over-segmentation is reduced using a marker-controlled algorithm. To improve the accuracy of the selected contours, the gradient of the original image is successfully enhanced by applying a new variant of the stochastic watershed. Moreover, a final classifier is applied in order to obtain the final liver mask. Optimal parameters of the method are tuned using a training dataset and then applied to the rest of the studies (17 datasets). The obtained results (a Jaccard coefficient of 0.91 ± 0.02) in comparison to other methods demonstrate that the new variant of stochastic watershed is a robust tool for automatic segmentation of the liver in MRI.
2014-03-01
    In this paper, we propose a depth-map merging based multiple view stereo method for large-scale scenes which takes both accuracy and efficiency into account. In the proposed method, an efficient patch-based stereo matching process is used to generate a depth map at each image with acceptable errors, followed by a depth-map refinement process to enforce consistency over neighboring views. Compared to state-of-the-art methods, the proposed method can reconstruct quite accurate and dense point clouds with high computational efficiency. Besides, the proposed method could be easily parallelized at the image level, i.e., each depth map is computed individually, which makes it suitable for large-scale scene reconstruction with high resolution images. The accuracy and efficiency of the proposed method are evaluated quantitatively on benchmark data and qualitatively on large data sets.
2014-02-08
    This work introduces a novel descriptor called Binary Robust Appearance and Normals Descriptor (BRAND), that efficiently combines appearance and geometric shape information from RGB-D images, and is largely invariant to rotation and scale transform. The proposed approach encodes point information as a binary string providing a descriptor that is suitable for applications that demand speed performance and low memory consumption. Results of several experiments demonstrate that as far as precision and robustness are concerned, BRAND achieves improved results when compared to state of the art descriptors based on texture, geometry and combination of both information. We also demonstrate that our descriptor is robust and provides reliable results in a registration task even when a sparsely textured and poorly illuminated scene is used.
2014-01-25
    There is a growing body of work addressing the problem of localizing printed text regions occurring in natural scenes, all of it focused on images in which the text to be localized is resolved clearly enough to be read by OCR. This paper introduces an alternative approach to text localization based on the fact that it is often useful to localize text that is identifiable as text but too blurry or small to be read, for two reasons. First, an image can be decimated and processed at a coarser resolution than usual, resulting in faster localization before OCR is performed (at full resolution, if needed). Second, in real-time applications such as a cell phone app to find and read text, text may initially be acquired from a lower-resolution video image in which it appears too small to be read; once the text’s presence and location have been established, a higher-resolution image can be taken in order to resolve the text clearly enough to read it. We demonstrate proof of concept of this approach by describing a novel algorithm for binarizing the image and extracting candidate text features, called “blobs,” and grouping and classifying the blobs into text and non-text categories. Experimental results are shown on a variety of images in which the text is resolved too poorly to be clearly read, but is still identifiable by our algorithm as text.
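    A toy version of the coarse-resolution pipeline, assuming dark text on a lighter background; the decimation factor, threshold and minimum area are illustrative, and the paper's binarization and blob grouping/classification are considerably richer.

        import numpy as np
        from scipy.ndimage import label, zoom

        def coarse_text_blobs(img_gray, decimate=4, offset=10, min_area=5):
            # Process at a coarser resolution than usual for speed.
            small = zoom(img_gray.astype(np.float32), 1.0 / decimate, order=1)
            binary = small < (small.mean() - offset)      # crude global binarization
            labels, n = label(binary)                     # connected components = candidate "blobs"
            boxes = []
            for k in range(1, n + 1):
                ys, xs = np.nonzero(labels == k)
                if ys.size >= min_area:
                    boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))   # coarse-grid bounding box
            return boxes   # a classifier then groups and labels blobs as text vs non-text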
2014-01-18
    In robotics, vertical lines have been always very useful for autonomous robot localization and navigation in structured environments. This paper presents a robust method for matching vertical lines in omnidirectional images. Matching robustness is achieved by creating a descriptor which is very distinctive and is invariant to rotation and slight changes of illumination. We characterize the performance of the descriptor on a large image dataset by taking into account the sensitiveness to the different parameters of the descriptor. The robustness of the approach is also validated through a real navigation experiment with a mobile robot equipped with an omnidirectional camera.
2014-01-11
    A full-automatic method for recognizing parking slot markings is proposed. The proposed method recognizes various types of parking slot markings by modeling them as a hierarchical tree structure. This method mainly consists of two processes: bottom-up and top-down. First, the bottom-up process climbs up the hierarchical tree structure to excessively generate parking slot candidates so as not to lose the correct slots. This process includes corner detection, junction and slot generation, and type selection procedures. After that, the top-down process confirms the final parking slots by eliminating falsely generated slots, junctions, and corners based on the properties of the parking slot marking type by climbing down the hierarchical tree structure. The proposed method was evaluated in 608 real-world parking situations encompassing a variety of different parking slot markings. The experimental result reveals that the proposed method outperforms the previous semiautomatic method while requiring a small amount of computational costs even though it is fully automatic.
2014-01-04
    We present a system able to predict the future behavior of the ego-vehicle in an inner-city environment. Our system learns the mapping between the current perceived scene (information about the ego-vehicle and the preceding vehicle, as well as information about the possible traffic lights) and the future driving behavior of the ego-vehicle. We improve the prediction accuracy by estimating the prediction confidence and by discarding unconfident samples. The behavior of the driver is represented as a sequence of elementary states termed behavior primitives. These behavior primitives are abstractions from the raw actuator states. Behavior prediction is therefore considered to be a multi-class learning problem. In this contribution, we explore the possibilities of situation-specific learning. We show that decomposing the perceived complex situation into a combination of simpler ones, each of them with a dedicated prediction, allows the system to reach a performance equivalent to a system without situation-specificity. We believe that this is advantageous for the scalability of the approach to the number of possible situations that the driver will encounter. The system is tested on a real-world scenario, using streams recorded in inner-city scenes. The prediction is evaluated for a prediction horizon of 3 s into the future, and the quality of the prediction is measured using established evaluation methods.