Seminars in 2012
The scorebox plays an important role in understanding the content of sports videos. However, a tiny scorebox can make it uncomfortable for viewers with small displays to grasp the game situation. In this paper, we propose a novel framework to extract the scorebox from sports video frames. We first extract candidates using accumulated intensity and edge information gathered over a short learning period. Since various types of scoreboxes are inserted into sports videos, multiple attributes need to be used for efficient extraction. Based on those attributes, the information gain of each attribute is computed, and the top three attributes ranked by information gain are selected as a three-dimensional feature vector for a Support Vector Machine (SVM) that distinguishes the scorebox from other candidates, such as logos and advertisement boards. The proposed method is tested on videos of various sports, and experimental results show its efficiency and robustness.
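The attribute-ranking step described above can be sketched generically. This is a minimal illustration of information-gain ranking over discrete attributes; the attribute names and values below are hypothetical, since the abstract does not specify the actual scorebox attributes:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction from splitting the labels on a discrete attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def top_k_attributes(attributes, labels, k=3):
    """Rank attributes by information gain and keep the k best names."""
    gains = {name: information_gain(vals, labels) for name, vals in attributes.items()}
    return sorted(gains, key=gains.get, reverse=True)[:k]
```

The k retained attributes would then form the feature vector handed to the SVM classifier.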
Abstract—Automatic face recognition is one of the most challenging tasks in computer vision and pattern recognition, and face detection is the first critical step in a fully automatic face recognition system. The skin-color feature is effective but is easily disturbed by interference. This paper proposes a method for detecting faces in a picture based on an improved skin-color model. First, an improved "reference white" method is used to remove the interference of non-skin-color regions; then a color classifier is designed from statistics over a large number of skin-color pixels, and each pixel of the color picture is classified as skin-color or non-skin-color by this classifier; finally, faces are detected in the candidate regions, non-face regions are removed, and the face regions are located. Experimental results show that the algorithm can effectively detect faces subject to skin-color interference against complex backgrounds.
Keywords-Face detection; Skin-color model; Skin-color classifier; Reference white; Non-face regions
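The per-pixel skin/non-skin decision the abstract describes can be illustrated with a plain YCbCr rule. The Cb/Cr box below is a commonly cited fixed range, not the paper's statistically trained classifier; it is a sketch of the general technique only:

```python
def rgb_to_ycbcr(r, g, b):
    """ITU-R BT.601 full-range RGB -> YCbCr conversion (inputs in 0..255)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def is_skin(r, g, b, cb_range=(77, 127), cr_range=(133, 173)):
    """Classify one pixel as skin using a rectangular Cb/Cr decision region."""
    _, cb, cr = rgb_to_ycbcr(r, g, b)
    return cb_range[0] <= cb <= cb_range[1] and cr_range[0] <= cr <= cr_range[1]
```

A trained classifier would replace the fixed rectangle with a region estimated from labeled skin pixels.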
This paper introduces a novel framework for estimating the motion of a robotic car from image information, a scenario widely known as visual odometry. Most current monocular visual odometry algorithms rely on a calibrated camera model and recover relative rotation and translation by tracking image features and applying geometrical constraints. This approach has some drawbacks: translation is recovered only up to a scale, it requires camera calibration, which can be tricky under certain conditions, and uncertainty estimates are not directly obtained. We propose an alternative approach that uses semi-parametric statistical models to recover scale, infer camera parameters and provide uncertainty estimates given a training dataset. As opposed to conventional non-parametric machine learning procedures, where standard models for egomotion would be neglected, we present a novel framework in which existing parametric models and powerful non-parametric Bayesian learning procedures are combined. We devise a multiple-output Gaussian Process (GP) procedure, named Coupled GP, that uses a parametric model as the mean function and a non-stationary covariance function to map image features directly into vehicle motion. Additionally, this procedure is able to infer joint uncertainty estimates (full covariance matrices) for rotation and translation. Experiments performed using data collected from a single camera under challenging conditions show that this technique outperforms traditional methods over trajectories of several kilometers.
In this paper, we present a method for extracting consistent foreground regions when multiple views of a scene are available. We propose a framework that automatically identifies such regions in images under the assumption that, in each image, background and foreground regions present different color properties. To achieve this task, monocular color information is not sufficient and we exploit the spatial consistency constraint that several image projections of the same space region must satisfy. Combining the monocular color consistency constraint with multiview spatial constraints allows us to automatically and simultaneously segment the foreground and background regions in multiview images. In contrast to standard background subtraction methods, the proposed approach does not require a priori knowledge of the background nor user interaction. Experimental results under realistic scenarios demonstrate the effectiveness of the method for multiple-camera setups.
In this paper we present a method for 3D urban reconstruction from a single catadioptric omnidirectional image. First, we classify the catadioptric omnidirectional image into horizontal ground, vertical building surfaces and vertical background surfaces through registration between the catadioptric omnidirectional image and a remote sensing image. Based on the classification results, we recover the geometry using the catadioptric projection model. Experiments show that our method is feasible and achieves a precise 3D reconstruction of city scenes.
Due to the limitation of dynamic range, a single still image is usually insufficient to describe a high-contrast scene. Fusing multi-exposure images of the same scene can produce a resulting image with details in both bright and dark regions. However, fusion methods may be sensitive to the exposure parameters of the input images. To improve robustness, a novel layer-based exposure fusion algorithm is proposed in this paper. In our algorithm, a global layer is introduced to improve the robustness of the fusion method; it preserves the overall luminance of the real scene and avoids possible luminance reversion. Details are then recovered in the gradient domain by a Poisson solver. Experimental results show the superior performance of our approach in terms of robustness and color consistency.
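The basic per-pixel flavor of multi-exposure fusion can be sketched with a Mertens-style well-exposedness weight. This illustrates the general idea of exposure weighting only; it is not the paper's layered, gradient-domain method, and the sigma value is an assumed parameter:

```python
import math

def well_exposedness(v, sigma=0.2):
    """Weight intensities near mid-gray (0.5) highest; v is in [0, 1]."""
    return math.exp(-((v - 0.5) ** 2) / (2 * sigma ** 2))

def fuse_pixels(exposures):
    """Weighted average of one pixel's values across differently exposed shots."""
    weights = [well_exposedness(v) for v in exposures]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, exposures)) / total
```

Well-exposed samples dominate the average, so clipped shadows and blown highlights contribute little to the fused value.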
The latest generation of consumer electronic devices is endowed with Augmented Reality (AR) tools. These tools require moving-object detection strategies, which should be fast and efficient, to carry out higher-level object analysis tasks. We propose a lightweight spatio-temporal non-parametric background-foreground modeling strategy on a General Purpose Graphics Processing Unit (GPGPU), which provides real-time, high-quality results in a great variety of scenarios and is suitable for AR applications.
This paper presents two new, efficient solutions to the two-view relative pose problem from three image point correspondences and one common reference direction. This three-plus-one problem can be used either as a substitute for the classic five-point algorithm, using a vanishing point for the reference direction, or to make use of an inertial measurement unit, commonly available on robots and mobile devices, where the gravity vector becomes the reference direction. We provide a simple, closed-form solution and a solution based on algebraic geometry which offers numerical advantages. In addition, we introduce a new method for computing visual odometry with RANSAC and four point correspondences per hypothesis. In a set of real experiments, we demonstrate the power of our approach by comparing it to the five-point method in a hypothesize-and-test visual odometry setting.
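The hypothesize-and-test loop underlying this kind of visual odometry can be sketched with a generic RANSAC skeleton. A two-point line model stands in here for the paper's minimal pose solver; the tolerance and iteration count are illustrative assumptions:

```python
import random

def fit_line(p, q):
    """Line through two points as normalized (a, b, c) with a*x + b*y + c = 0."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    norm = (a * a + b * b) ** 0.5
    return a / norm, b / norm, c / norm

def ransac_line(points, iters=200, tol=1.0, seed=0):
    """Hypothesize-and-test: sample a minimal set, count inliers, keep the best."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        p, q = rng.sample(points, 2)
        if p == q:
            continue  # degenerate sample, skip
        a, b, c = fit_line(p, q)
        inliers = [(x, y) for x, y in points if abs(a * x + b * y + c) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b, c), inliers
    return best_model, best_inliers
```

A smaller minimal set (three-plus-one instead of five points) means fewer iterations are needed for the same inlier-hit probability, which is the practical appeal of the paper's solver.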
Occlusion and lack of visibility in crowded and cluttered scenes make it difficult to track individual people correctly and consistently, particularly in a single view. We present a multiview approach to solve this problem. In our approach, we neither detect nor track objects from any single camera or camera pair; rather, evidence is gathered from all of the cameras into a synergistic framework and detection and tracking results are propagated back to each view. Unlike other multiview approaches that require fully calibrated views, our approach is purely image-based and uses only 2D constructs. To this end, we develop a planar homographic occupancy constraint that fuses foreground likelihood information from multiple views to resolve occlusions and localize people on a reference scene plane. For greater robustness, this process is extended to multiple planes parallel to the reference plane in the framework of plane-to-plane homologies. Our fusion methodology also models scene clutter using the Schmieder and Weathersby clutter measure, which acts as a confidence prior, to assign higher fusion weight to views with lesser clutter. Detection and tracking are performed simultaneously by graph cuts segmentation of tracks in the space-time occupancy likelihood data. Experimental results with detailed qualitative and quantitative analysis are demonstrated in challenging multiview crowded scenes.
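The core fusion idea, warping per-view foreground likelihood maps to a common reference plane with homographies and combining them, can be sketched in miniature. This is a pure-Python, nearest-neighbor illustration under assumed 3x3 homography matrices, not the paper's clutter-weighted formulation:

```python
def apply_h(H, x, y):
    """Apply a 3x3 homography (row-major nested list) to a point."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def warp_map(src, H_inv, h, w):
    """Warp a likelihood map onto an h x w reference grid by inverse mapping."""
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = apply_h(H_inv, x, y)
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= iy < len(src) and 0 <= ix < len(src[0]):
                out[y][x] = src[iy][ix]  # nearest-neighbor lookup
    return out

def fuse(maps):
    """Multiply per-view likelihoods cell-wise on the reference plane."""
    h, w = len(maps[0]), len(maps[0][0])
    out = [[1.0] * w for _ in range(h)]
    for m in maps:
        for y in range(h):
            for x in range(w):
                out[y][x] *= m[y][x]
    return out
```

Cells where all warped views agree on high foreground likelihood survive the product, which is how ground-plane occupancy peaks emerge at people's feet.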
In this paper, we propose a Hangul grapheme segmentation method based on a structural approach, developed on machine-printed characters in widely known fonts such as Myunjo, Gulim and Gothic and then applied to much more deformed fonts. The process is composed of two steps. The first is structural grapheme segmentation applied to characters classified into 20 types, a more reasonable classification than the usual 6 types in that the intensified common features of the 20 types make the grapheme segmentation algorithm simpler and more effective. Furthermore, characters classified into the 20 types are quite easy to post-process using the connected components separated by boundary information. With the proposed method, we obtained a 99% correct segmentation rate at very high execution speed.
Abstract. Most commercial television channels use video logos, which can be considered a form of visible watermark, as a declaration of intellectual property ownership. They are also used as a symbol of authorization to rebroadcast when original logos are used in conjunction with newer logos. An unfortunate side effect of such logos is the concomitant decrease in viewing pleasure. In this paper, we use the temporal correlation of video frames to detect and remove video logos. In the video-logo-detection part, as an initial step, the logo boundary box is first located by using a distance threshold of video frames and is further refined by employing a comparison of edge lengths. Second, our proposed Bayesian classifier framework locates fragments of logos called logolets. In this framework, we systematically integrate the prior knowledge about the location of the video logos and their intrinsic local features to achieve a robust detection result. In our logo-removal part, after the logo region is marked, a matching technique is used to find the best replacement patch for the marked region within that video shot. This technique is found to be useful for small logos. Furthermore, we extend the image inpainting technique to videos. Unlike the use of 2D gradients in the image inpainting technique, we inpaint the logo region of video frames by using 3D gradients exploiting the temporal correlations in video. The advantage of this algorithm is that the inpainted regions are consistent with the surrounding texture and hence the result is perceptually pleasing. We present the results of our implementation and demonstrate the utility of our method for
Face detection is the image processing task of determining the location, size and number of faces. It is also the premise of face recognition, human-computer interaction and related applications. This paper presents a new face detection method, which first builds a skin-color model by clustering in the YCbCr chrominance space using collected templates, then locates candidate face areas through the given skin-color model. After normalization of the candidate face areas, the Hausdorff distance between the given template and each candidate is computed. Finally, according to the length of this distance, whether the given area is a face is determined. Extensive experiments indicate that this method achieves high accuracy.
Keywords: skin-color clustering, template matching, face detection
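The Hausdorff distance used for template matching above has a direct definition over two point sets; a minimal sketch, with points as 2D coordinate tuples:

```python
import math

def directed_hausdorff(A, B):
    """Max over a in A of the distance from a to its nearest point in B."""
    return max(min(math.dist(a, b) for b in B) for a in A)

def hausdorff(A, B):
    """Symmetric Hausdorff distance between two finite point sets."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))
```

A small Hausdorff distance between a normalized candidate region's edge points and the template's edge points indicates a likely face.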
We propose bridge routing based on network coding for wireless mesh networks. Our bridge routing exploits network coding to minimize the usage of time slots. We present feasible and practical ways to study the performance of routing with network coding compared to conventional shortest-path algorithms. Bridge routing consists of two procedures, a node-coordination procedure that builds the bridge and a routing procedure, and it works in a decentralized way. Simulation results show that our bridge routing is more efficient than the shortest-path algorithm, whose performance depends on the network
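The time-slot saving that network coding offers can be shown with the classic two-flow relay example: two nodes exchanging packets through a relay need one coded broadcast instead of two separate forwards. This is an illustration of the underlying coding idea, not the paper's bridge-construction procedure:

```python
def xor_bytes(a, b):
    """Bitwise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))

# Nodes A and B each want the other's packet, relayed through R.
# Store-and-forward costs the relay two transmissions; coding costs one.
pa = b"hello"              # packet from node A, also held by A
pb = b"world"              # packet from node B, also held by B
coded = xor_bytes(pa, pb)  # relay broadcasts a single coded packet

# Each endpoint decodes by XOR-ing with the packet it already holds.
assert xor_bytes(coded, pa) == pb   # A recovers B's packet
assert xor_bytes(coded, pb) == pa   # B recovers A's packet
```

Saving one relay transmission per exchange is exactly a saved time slot in a slotted mesh schedule.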
Vision sensors give mobile robots a relatively cheap means of obtaining rich information about their environment, but they lack the depth accuracy that a laser range finder can provide. This paper describes a novel composite sensor approach that combines the information given by an omnidirectional camera and a laser range finder to efficiently solve the indoor Simultaneous Localization and Mapping problem and reconstruct a 3D representation of the environment. We report the results of validating our methodology using a mobile robot equipped with a 2D laser range finder and an omnidirectional camera.
Text detection in natural images has gained much attention in recent years, as it is a primary step towards fully autonomous text recognition. Understanding visual text content is of vital importance in many application areas, from internet search engines to PDA signboard translators. Images of natural scenes, however, pose numerous difficulties compared to traditional scanned documents. They often contain diverse, complex text of different sizes, styles and colors against complex backgrounds. Furthermore, such images are captured under variable lighting conditions and are often affected by skew distortion and perspective projection. In this article an improved edge-profile-based text detection method is presented. It uses a set of heuristic rules to eliminate detection of non-text areas. The method is evaluated on CVL OCR DB, an annotated image database of text in natural scenes.
Abstract: The authors propose a vision-based automatic system to detect preceding vehicles on the highway under various lighting and weather conditions. To adapt to the different characteristics of vehicle appearance under various lighting conditions, four cues, including underneath shadow, vertical edge, symmetry and taillight, are fused for vehicle detection. The authors achieve this goal by generating the probability distribution of vehicles under a particle filter framework through the processes of initial sampling, propagation, observation, cue fusion and evaluation. Unlike a normal particle filter focusing on a single target distribution in a state space, the authors detect multiple vehicles with a single particle filter through a high-level tracking strategy using clustering. In addition, the data-driven initial sampling technique helps the system detect new objects and prevents the multi-modal distribution from collapsing to local maxima. Experiments demonstrate the effectiveness of the proposed system.
We integrate the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variable-size blocks that capture salient features of humans automatically. Using AdaBoost for feature selection, we identify the appropriate set of blocks from a large set of possible blocks. In our system, we use the integral image representation and a rejection cascade, which significantly speed up the computation. For a 320 × 280 image, the system can process 5 to 30 frames per second, depending on the density at which we scan the image, while maintaining an accuracy level similar to existing methods.
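The constant-time block sums that make variable-size blocks affordable rest on the integral image (summed-area table); a minimal sketch of that representation:

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column for clean indexing."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = img[y][x] + ii[y][x + 1] + ii[y + 1][x] - ii[y][x]
    return ii

def block_sum(ii, x, y, bw, bh):
    """Sum of the bw x bh block with top-left corner (x, y), in O(1)."""
    return ii[y + bh][x + bw] - ii[y][x + bw] - ii[y + bh][x] + ii[y][x]
```

With one such table per orientation bin, a HoG of any block size is assembled from a fixed number of lookups, independent of block size.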
We propose a novel localization method for outdoor mobile robots using High Dynamic Range (HDR) vision technology. To obtain an HDR image, multiple images at different exposures are typically captured and combined. However, since mobile robots may be moving during a capture sequence, the images cannot be fused easily. Instead, we generate a set of keypoints that incorporates those detected in each image. The position of the robot is estimated using the keypoint sets to match measured positions with a map. We conducted experimental comparisons of HDR and auto-exposure images, and our HDR method showed higher robustness and localization accuracy.
Visual surveillance using multiple cameras has attracted increasing interest in recent years. Correspondence between multiple cameras is one of the most important and basic problems that visual surveillance with multiple cameras raises. In this paper, we propose a simple and robust method, based on the principal axes of people, to match people across multiple cameras. A correspondence likelihood reflecting the similarity of pairs of principal axes is constructed according to the relationship between the "ground points" of people detected in each camera view and the intersections of principal axes detected in different camera views and transformed to the same view. Our method has the following desirable properties: 1) camera calibration is not needed; 2) accurate motion detection and segmentation are less critical, due to the robustness of the principal-axis-based feature to noise; 3) based on the fused data derived from the correspondence results, the positions of people in each camera view can be accurately located even when the people are partially occluded in all views. Experimental results on several real video sequences from outdoor environments demonstrate the effectiveness, efficiency, and robustness of our method.
Most computer graphics pictures have been computed all at once, so that the rendering program takes care of all computations relating to the overlap of objects. There are several applications, however, where elements must be rendered separately, relying on compositing techniques for the anti-aliased accumulation of the full image. This paper presents the case for four-channel pictures, demonstrating that a matte component can be computed similarly to the color channels. The paper discusses guidelines for the generation of elements and the arithmetic for their arbitrary compositing.
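The four-channel arithmetic argued for above centers on the "over" operator applied to premultiplied-alpha pixels; a minimal sketch:

```python
def over(fg, bg):
    """Porter-Duff 'over' on premultiplied (r, g, b, a) tuples in [0, 1]."""
    fr, fgc, fb, fa = fg
    br, bgc, bb, ba = bg
    k = 1.0 - fa  # fraction of the background showing through
    return (fr + k * br, fgc + k * bgc, fb + k * bb, fa + k * ba)
```

Because the matte (alpha) channel is combined with exactly the same arithmetic as the color channels, separately rendered, anti-aliased elements can be accumulated in any order of pairwise "over" operations.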
Tilted license plate correction is an important part of a license plate recognition system. In reality, many license plates are inclined for various reasons, such as perspective distortion and uneven or curvy road surfaces. The usual rotation methods are often based on only one theory, which makes it difficult to combine the advantages of different methods, and we do not know whether the rotation results are correct. We propose a mutual correction method based on pairwise fitting of parallel straight lines, which helps verify the credibility of the line-fitting result by measuring the parallelism of the two lines. If this check fails, we can use another method for tilt correction or give up. The proposed method provides reliable correction results and utilizes the advantages of different rotation algorithms. The experimental results are better than those obtained using only one method.
This paper describes a face detection framework that is capable of processing images extremely rapidly while achieving high detection rates. There are three key contributions. The first is the introduction of a new image representation called the "Integral Image", which allows the features used by our detector to be computed very quickly. The second is a simple and efficient classifier built using the AdaBoost learning algorithm to select a small number of critical visual features from a very large set of potential features. The third contribution is a method for combining classifiers in a "cascade", which allows background regions of the image to be quickly discarded while spending more computation on promising face-like regions. A set of experiments in the domain of face detection is presented. The system yields face detection performance comparable to the best previous systems (Sung and Poggio, 1998; Rowley et al., 1998; Schneiderman and Kanade, 2000; Roth et al., 2000). Implemented on a conventional desktop, face detection proceeds at 15 frames per second.
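The cascade's speed comes from rejecting most windows at the earliest stages; the control flow can be sketched generically. The stage score functions below are placeholders, not the paper's boosted Haar-feature classifiers:

```python
def make_cascade(stages):
    """Build a classifier from (score_fn, threshold) stages.

    A window is accepted only if it passes every stage; most background
    windows exit at the first stages, so average cost per window stays low.
    """
    def classify(window):
        for score, threshold in stages:
            if score(window) < threshold:
                return False  # rejected early, later stages never run
        return True
    return classify

# Toy usage: stage 1 checks total mass, stage 2 checks the peak value.
detector = make_cascade([(sum, 3), (max, 2)])
```

Each successive stage is trained mainly on windows that survived the previous ones, so later stages can afford to be more expensive.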