Literature Translation: Mean Shift
English Translation
School:
Major:
Class year:
Student ID:
Name:
Advisor:
May 10, 2009
<Literature Translation 1: Original Text>
NON-RIGID OBJECT LOCALIZATION FROM COLOR MODEL USING MEAN SHIFT¹
Gaël Jaffré, Alain Crouzil
Institut de Recherche en Informatique de Toulouse
Université Paul Sabatier - 118 route de Narbonne - 31062 Toulouse Cedex 4 - France {jaffre,crouzil}@irit.fr
ABSTRACT
This paper deals with non-rigid object localization in an image from object colors. Our method detects all the objects in an image that correspond to a color model, without a priori information about their number. Our approach consists in creating a binary image that represents the distribution of the pixels most likely to be part of the object. Considering this image as a cluster in $\mathbb{R}^2$, object localization is done by finding all the cluster modes. This search is carried out by applying a statistical method: the mean shift procedure. To illustrate our approach, we use sport images, in which we try to detect all the players.
1. INTRODUCTION
Our framework is sport image sequence analysis, especially player tracking. In this study, our research topic is non-rigid object localization in an image, in order to automatically initialize tracking procedures. This problem is often encountered in tracking applications where the search is local (tracking with snakes, mean shift tracking, …). Since the search is done in the neighborhood of the object position in the previous frame, an initialization is needed for the first frame. Due to many difficulties, this initialization is usually manual [1, 2].
Our problem presents two main difficulties: the non-rigid and the 3D aspects of the objects. To address both, we make use of the color densities of the objects. In this paper, we assume that objects have a discriminant color, i.e. their color is characteristic of them. Under this assumption, color density is robust to object non-rigidity, partial occlusions, camera zooming and changes in camera position.
Among publications about object detection from color, we were inspired by [3] and [4]. In [3], Comaniciu proposes a face tracking application where, in the first frame, a face is iteratively detected from multiple initializations. In [4], Vandenbroucke detects all the players by pixel classification. However, this approach needs various learning data, and no results with noisy models are presented.
We combine both approaches. First, the pixels are classified: from an object model, a binary image is created, where each pixel carries a belonging measure to the model (0 or 1). This image represents the distribution of the pixels most likely to be part of the searched object. An example is given in Fig. 1.b. This image will be used in the sequel as a model to illustrate the different examples.
Our approach consists in considering this binary image as a cluster in $\mathbb{R}^2$: the task of object localization reduces to the detection of local modes in the cluster. Each mode is associated with an object, which corresponds to the model (a mode is a local density maximum).
¹ Gaël Jaffré and Alain Crouzil. Non-rigid object localization from color model using mean shift[D]. IEEE International Conference on Image Processing (ICIP 2003)
(a) Original image (b) Desired cluster
Fig. 1. Example of the cluster we would like to obtain from the white player model. Black points represent pixels with value 1, and white ones those with value 0.
In the sequel, we will no longer use binary images, but weighted binary images, i.e. the belonging measure to the model will not be exactly 0 or 1, but will lie in the real interval [0, 1].
This paper is structured in four parts. First, we detail the statistical tools used to estimate the cluster density and to search for the cluster modes. Then, we discuss the methods used to create the cluster which represents the distribution of the most probable pixels. Next, we show how the object coordinates can be extracted from this cluster. In the last part, we present experiments applying our method to sport images.
2. STATISTICAL TOOLS
Our approach consists in considering binary images as clusters in $\mathbb{R}^2$. In this section, we present the statistical methods used in this paper.
2.1. The Mean Shift Procedure
The mean shift is a nonparametric estimator of the density gradient, proposed in 1975 by Fukunaga [5] for pattern recognition problems. However, it only came into real use with Cheng [6] in 1995, and then with Comaniciu and Meer [7] from 1997 onward.
Let $\{x_i\}_{i=1 \ldots n}$ be an arbitrary set of n points in $\mathbb{R}^d$. The mean shift vector computed with kernel K and kernel bandwidth h is given by [7]
$$M_h(x) = \frac{\sum_{i=1}^{n} x_i \, K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)} - x$$
In this paper, we only deal with the mean shift procedure in the case of the Epanechnikov kernel [5]. The mean shift vector definition becomes
$$M_h(x) = \frac{1}{n_x} \sum_{x_i \in S_h(x)} (x_i - x)$$
(a) White player (b) Red player (c) Noisy white player
Fig. 2. Player models used in the examples of this paper.
where $S_h(x)$ is the sphere centered on x, of radius h and containing $n_x$ data points. The mean shift vector has the direction of the gradient of the density estimate at x. The mean shift procedure is obtained by successive computations of the mean shift vector $M_h(x)$ and translation of the sphere $S_h(x)$ by $M_h(x)$. The procedure is guaranteed to converge [7].
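As an illustration of the procedure just described, here is a minimal Python sketch. It assumes NumPy and unweighted points; the function names, the stopping tolerance `eps`, the iteration cap and the synthetic data are choices made for this example only, not details given in the paper.

```python
import numpy as np

def mean_shift_vector(points, x, h):
    """M_h(x): mean of the points inside the sphere S_h(x), minus x."""
    inside = np.linalg.norm(points - x, axis=1) <= h
    if not inside.any():
        # Empty window: no shift (a convention of this sketch).
        return np.zeros_like(x)
    return points[inside].mean(axis=0) - x

def mean_shift(points, start, h, eps=1e-3, max_iter=100):
    """Translate the window by M_h(x) until the shift becomes negligible."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        shift = mean_shift_vector(points, x, h)
        x = x + shift
        if np.linalg.norm(shift) < eps:
            break
    return x

# Hypothetical data: one blob of 2-D points around (5, 5).
rng = np.random.default_rng(0)
points = rng.normal(loc=(5.0, 5.0), scale=0.5, size=(500, 2))
mode = mean_shift(points, start=(4.0, 4.0), h=2.0)
```

Starting away from the blob center, the window drifts toward the densest region and stops near the sample mean of the blob.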
2.2. Mode Seeking
In [5], Fukunaga proposes a mode seeking algorithm that uses the mean shift procedure. He applies the procedure to each point: when two data points converge to the same final position, they are considered to belong to the same cluster.
However, the complexity is $O(n^2)$, where n is the number of data points, which leads to a large computation time. To be usable for quickly initializing a tracking procedure, the complexity of mode seeking must be reduced. First, we use images as clusters, so the data lie on a regular grid, which allows an efficient computation of the mean shift procedure [7]. To reduce the complexity further, we use the approach presented in [8]: instead of applying the mean shift procedure to every point, we only apply it to a subset. This is carried out as described below.
(1) Define a tessellation of the cluster with m spheres of radius h:
– the distance between two neighbors should not be smaller than h,
– (optional) the number of points inside the sphere should not be below a threshold $T_1$.
(2) Apply the mean shift procedure to the sphere centers. The different points of convergence are the modes of the cluster.
(3) Merge modes whose distance is less than h.
(4) Discard modes whose density estimate is below a fraction $T_2$ of the maximal density estimate.
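The four steps above can be sketched as follows for a weighted image. This is only an illustration under stated assumptions: the window centers are placed on a square grid of spacing h, the sum of weights inside a window stands in for both the point count compared with $T_1$ and the density estimate compared with $T_2$, and merging keeps the denser of two close modes. None of these choices, nor the names `find_modes`, `window_mass` and `eps`, come from the paper.

```python
import numpy as np

def window_mass(coords, weights, x, h):
    """Sum of the weights inside S_h(x); used here as a density proxy."""
    return weights[np.linalg.norm(coords - x, axis=1) <= h].sum()

def mean_shift_weighted(coords, weights, start, h, eps=0.5, max_iter=50):
    """Move the window to the weighted mean of its points until it stabilizes."""
    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        inside = np.linalg.norm(coords - x, axis=1) <= h
        total = weights[inside].sum()
        if total == 0:
            break
        new_x = (weights[inside, None] * coords[inside]).sum(axis=0) / total
        moved = np.linalg.norm(new_x - x)
        x = new_x
        if moved < eps:
            break
    return x

def find_modes(image, h, t1=0.0, t2=0.25):
    ys, xs = np.nonzero(image > 0)
    coords = np.column_stack([xs, ys]).astype(float)
    weights = image[ys, xs].astype(float)
    height, width = image.shape
    # Step 1: tessellation with window centers spaced by h, keeping
    # only the windows whose mass exceeds t1.
    centers = [np.array([cx, cy])
               for cy in np.arange(h / 2, height, h)
               for cx in np.arange(h / 2, width, h)]
    centers = [c for c in centers if window_mass(coords, weights, c, h) > t1]
    # Step 2: run the mean shift procedure from every remaining center.
    candidates = [mean_shift_weighted(coords, weights, c, h) for c in centers]
    # Step 3: merge modes closer than h, keeping the densest one.
    scored = sorted(((window_mass(coords, weights, m, h), m) for m in candidates),
                    key=lambda s: s[0], reverse=True)
    modes = []
    for mass, m in scored:
        if all(np.linalg.norm(m - kept) >= h for _, kept in modes):
            modes.append((mass, m))
    # Step 4: discard modes below the fraction t2 of the maximal density.
    if not modes:
        return []
    best = modes[0][0]
    return [m for mass, m in modes if mass >= t2 * best]

# Hypothetical binary image with two square blobs.
image = np.zeros((60, 60))
image[10:20, 10:20] = 1.0
image[40:50, 35:45] = 1.0
modes = find_modes(image, h=10)
```

Each returned mode is an (x, y) position; on this toy image the two modes fall near the centers of the two squares.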
3. CLUSTER CREATION FROM IMAGES
The color models used in this paper are presented in Fig. 2.
3.1. Binary Images
The most naive approach consists in creating a binary image in the following way: each pixel whose color appears in the model has the value 1, and each pixel whose color does not appear has the value 0. An example is given in Fig. 3, where the result seems satisfactory: the density estimate of the cluster presents two maxima which can be distinguished, and which correspond to the “centers” of the two players. However, this approach is very limited, because on real data the color model is often slightly different from the color of the object in the image. Moreover, results are very sensitive to noise in the model, as we will see in section 3.3.
To deal with slight changes in color, histograms can be used. To compare two pixels, we must now compare their indices in the histogram instead of directly comparing their color or gray level. The problem consists in finding the optimal number of bins. Too small a number will put too many points in the cluster, whereas too large a number will not tolerate significant changes in color.
(a) Cluster. (b) Density estimate.
Fig. 3. Cluster from the naive approach, with the model of Fig. 2.a.
Vandenbroucke proposes another approach to get binary images from models, with soccer images [4]. Learning data are drawn from various player models, for which many color representation systems are used. Finally, he only keeps the system which provides the best separation between the two classes of pixels according to the players' soccer suits. The clusters obtained are of very good quality (like the desired model shown in Fig. 1.b). However, this method needs many learning models, many changes of color system, and above all noise-free models (noise is manually removed).
3.2. Weighted Binary Images
So far, our clusters were obtained from binary images. Each pixel has only two possibilities: either it belongs to the model, or it does not. The weighted binary image allows more degrees of belonging to the model. The largest values correspond to the most probable pixels, whereas lower ones correspond to unlikely pixels. The cluster is now obtained by taking only the pixels with a strictly positive value, each value corresponding to a weight: thus, the most probable pixels will have a greater weight than unlikely pixels.
Our method to create the cluster is the one proposed by Swain and Ballard in [9]: the measure of the object presence in the image is defined from what is called the backprojected image.
Let us denote by
– $\{x_i\}_{i=1 \ldots n}$ the locations of the n pixels in the image,
– $\{q_k\}_{k=1 \ldots m}$ the m-bin color histogram (linearized color histogram with m bins) of the image,
– $\{p_k\}_{k=1 \ldots m}$ the m-bin color histogram of the model.
We also define the function $c: \mathbb{R}^2 \to \{1 \ldots m\}$ which associates to the pixel at location $x_i$ the index $c(x_i)$ of the histogram bin corresponding to the color of that pixel. The ratio histogram $\{r_k\}_{k=1 \ldots m}$ is defined as $r_k = \min\left(\frac{p_k}{q_k}, 1\right)$. It associates to each color a belonging measure to the model. It will allow the object presence in the image to be measured. Its backprojection onto the image associates the value $r_{c(x_i)}$ to the pixel $x_i$, for all $i = 1 \ldots n$. Since the ratio histogram emphasizes the predominant colors of the target while diminishing the presence of clutter and background colors, the backprojected image $\{r_{c(x_i)}\}_{i=1 \ldots n}$ represents a spatial measure of the object presence.
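The backprojection can be sketched in a few lines of NumPy. The sketch assumes 8-bit RGB arrays, histograms stored as raw pixel counts, and a linearized histogram with `bins_per_channel` bins per color channel; the function names and the toy image are hypothetical and only serve the illustration.

```python
import numpy as np

def color_bin_index(pixels, bins_per_channel=16):
    """c(x_i): linearized histogram bin of each RGB pixel (values 0..255)."""
    q = (pixels.astype(np.int64) * bins_per_channel) // 256
    return (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]

def backproject(image, model, bins_per_channel=16):
    """Return the weight r_{c(x_i)} of every pixel of `image`."""
    m = bins_per_channel ** 3
    c_img = color_bin_index(image, bins_per_channel)
    c_mod = color_bin_index(model, bins_per_channel)
    q = np.bincount(c_img.ravel(), minlength=m).astype(float)  # image histogram
    p = np.bincount(c_mod.ravel(), minlength=m).astype(float)  # model histogram
    r = np.zeros(m)
    seen = q > 0                      # bins that occur in the image
    r[seen] = np.minimum(p[seen] / q[seen], 1.0)
    return r[c_img]

# Toy image: green background with two white "player" pixels.
image = np.zeros((4, 4, 3), dtype=np.uint8)
image[...] = (0, 128, 0)
image[0, 0] = image[0, 1] = (255, 255, 255)
model = image[0:2, 0:2]               # noisy model: 2 white + 2 green pixels
weights = backproject(image, model)
```

In this toy case the white pixels get weight 1, while the green background pixels that leaked into the model only get 2/14, because green is far more frequent in the whole image than in the model.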
In order to compute the mean shift vector for all $x \in \mathbb{R}^d$, we use a definition which takes the weights into account:
$$M_h(x) = \frac{\sum_{x_i \in S_h(x)} w(x_i) \, (x_i - x)}{\sum_{x_i \in S_h(x)} w(x_i)}$$
where $S_h(x)$ is the sphere of radius h and center x, and $w(x_i)$ is the weight associated to the point $x_i$.
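A small numerical check of the weighted definition, written as a sketch with NumPy; the function name and the three sample points are made up for the example.

```python
import numpy as np

def weighted_mean_shift_vector(coords, weights, x, h):
    """Weighted M_h(x) over the points of S_h(x)."""
    inside = np.linalg.norm(coords - x, axis=1) <= h
    total = weights[inside].sum()
    if total == 0:
        # No weight in the window: no shift (a convention of this sketch).
        return np.zeros_like(x, dtype=float)
    return (weights[inside, None] * (coords[inside] - x)).sum(axis=0) / total

coords = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
weights = np.array([1.0, 3.0, 5.0])
shift = weighted_mean_shift_vector(coords, weights, np.array([1.0, 0.0]), h=2.0)
```

With the window centered at (1, 0) and h = 2, only the first two points are inside, so the shift is (1·(−1) + 3·(+1)) / 4 = 0.5 along the first axis: the window moves toward the heavier point.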
(a) Cluster. (b) Density estimate.
Fig. 4. Backprojected image.
The backprojected image obtained from our soccer image is given in Fig. 4. We computed the histogram in the RGB space with 16 bins for each color.
3.3. Comparison with Noisy Models
With synthetic models, the naive approach and the backprojected image give similar results. However, the difference between them becomes obvious as models become noisy. In order to have a synthetic model from an image of the object, we must select the image area where the object is, then manually remove the background [4]. This model can then be used for various images.
However, we would like to obtain a model from an image area without manual correction. Background pixels will then be part of the model. The methods previously presented behave differently in the presence of noisy pixels, as we can observe in Fig. 5: with the naive approach, the presence of background pixels produces a cluster that is too noisy, which makes player localization impossible. One can see that in the rectangular area where the model was extracted, every pixel was kept for the cluster. On the other hand, with the backprojected image, the ratio $\frac{p_k}{q_k}$ decreases the influence of the background pixels [9], because in the image, the background pixels are more numerous than the object pixels. In the area where the model is extracted, background pixels have low weights, whereas player pixels keep large weights. The density estimate of the backprojected image has two maxima which can be distinguished, the most important being obviously the one which corresponds to the player selected for the model. Since this last method provides more reliable results on real (noisy) data, we will use it in our applications.
4. OBJECT LOCALIZATION
Once the cluster is computed, object localization is carried out by applying the algorithm described in section 2.2, which allows the detection of all the cluster modes without knowledge about their number. The only a priori information needed is the scale of the cluster, i.e. the radius h we will use in the mean shift procedure. When the mode computation has ended, each mode corresponds to an object center.
When there is only one object to find, its position corresponds to the global maximum of the cluster density estimate. Even if pixel-by-pixel density computation is possible, the algorithm of section 2.2 is more efficient in terms of computation time because the density is only estimated at a small number of points. Moreover, the last step of the algorithm becomes useless, because we only keep the mode with maximum density. Thus, the threshold $T_2$ is not used.
This approach is inspired by [3], where face localization is carried out from various initializations. For each of them, the mean shift procedure is run in order to compute the corresponding mode, and the mode with the highest peak is kept. However, we can point out some differences with our approach. First, in [3] the initialization points are not taken in sufficiently populated regions, but are always chosen at the same locations, for all images. Besides, the quantity optimized is not the density estimate of the backprojected image, but the Bhattacharyya coefficient, which measures the similarity between the color distributions of the model and of an image area. The advantage of using the backprojected image instead of the Bhattacharyya coefficient is the computation time, which is lower with the backprojected image.
(a) Cluster obtained from naive approach. (b) Density estimate.
(c) Cluster obtained from backprojected image. (d) Density estimate.
Fig. 5. Noise influence on clusters. Models were drawn from an image area, without background removal (Fig. 2.c).
When we know the number N of objects in the image, the threshold $T_2$ is useless. The method is the same as with only one object, but we keep the N modes with the highest density estimates.
When we do not know the number of objects in the image, which is the usual case in applications, we apply all the steps of the algorithm of section 2.2. In the last step, the modes with too small a density estimate are removed. In our experiments, we used $T_2 = 25\%$. This threshold is not very dependent on the current environment, i.e. we did not have to change it in our different experiments.
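The three cases discussed in this section (a single object, a known number N of objects, an unknown number of objects) differ only in how the detected modes are filtered. A minimal sketch of that filtering, with a hypothetical function name and made-up density values:

```python
def select_modes(modes, densities, n_objects=None, t2=0.25):
    """Keep the N densest modes when N is known; otherwise keep the modes
    whose density reaches the fraction t2 of the maximal density."""
    order = sorted(range(len(modes)), key=lambda i: densities[i], reverse=True)
    if n_objects is not None:
        return [modes[i] for i in order[:n_objects]]
    if not order:
        return []
    best = densities[order[0]]
    return [modes[i] for i in order if densities[i] >= t2 * best]

# Hypothetical modes (labels stand in for positions) and density estimates.
modes = ["A", "B", "C"]
densities = [100.0, 40.0, 10.0]
single = select_modes(modes, densities, n_objects=1)   # one object to find
known = select_modes(modes, densities, n_objects=2)    # N = 2 objects
unknown = select_modes(modes, densities)               # T2 = 25 %
```

With these values, mode C is dropped in the unknown-count case because its density (10) is below 25 % of the maximum (100).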
5. EXPERIMENTS
We applied this method to different sport images, and the results are satisfactory. When we deal with team sports with two teams, we must run the procedure twice (once per team model).
Fig. 6 and 7 present results of our method on different sport images. Player locations are given by the circle centers. In both cases, we did not know the number of players in the images.
The limits of our method arise when conditions become difficult, as in Fig. 8, where there is a billboard with the same color as the white players, and those players are only represented by a few pixels.
All the players are detected, but there are two false alarms: the billboard and the referee.
Fig. 6. Soccer player localization.
(a) Noisy surfer model. (b) Result.
Fig. 7. Surfer localization from a noisy model.
6. CONCLUSIONS AND DIRECTIONS OF FURTHER RESEARCH
Our method allows non-rigid object localization from a color model, even in the presence of noise, provided the objects have enough pixels. This method can be used to automatically initialize tracking procedures, which are often manually initialized in the first frame [1, 2].
The interest of our approach is the use of the mean shift procedure, which is a nonparametric statistical procedure, robust to the noise always present in real data. The mean shift procedure allows quick convergence toward object locations, due to low complexity, a low number of iterations and local search.
Another advantage of our method is the low number of parameters: the object color model, the approximate scale of the objects, and the thresholds $T_1$ and $T_2$. The threshold $T_1$ is only used to reduce the number of runs of the mean shift procedure; it can be removed if we do not want to determine it. The threshold $T_2$ is only used when we do not know the number of objects in the image. We set it experimentally, and as it is not very dependent on the current environment, we did not change it in any of our examples.
So, the user only has to give a color model of the object and its approximate size. To avoid manually creating a synthetic model, the user can simply give the location of the object center: the model will be extracted from the neighborhood (which depends on the scale) of the object. Finally, the user may not choose any scale, but instead select an area in the image which corresponds to an object: the color model will be initialized from the colors of this area, and the scale will be taken from the size of the area. In this case, we can take two different scales $h_x$ and $h_y$ for the height and the width of the object. Background pixels will only have a weak influence, as discussed in section 3.3.
However, prior knowledge about the scale is still needed. We would like to remove this a priori knowledge by automatically computing the scale. To this end, we will use the works presented in [10] and [11], which describe clustering methods that use the mean shift procedure but automatically compute the scale from the data.
7. ACKNOWLEDGMENT
This work has been conducted on behalf of the KLIMT project.
Fig. 8. Soccer player localization: difficult case.
8. REFERENCES
[1] Chris Needham and Roger Boyle, “Tracking multiple sports players through occlusion, congestion and scale,” in Proceedings of the 12th British Machine Vision Conference, Manchester, UK, Sept. 2001, vol. 1, pp. 93–102.
[2] Janez Perš and Stanislav Kovačič, “Tracking People in Sport: Making Use of Partially Controlled Environment,” in Proceedings of the 9th International Conference on Computer Analysis of Images and Patterns, Warsaw, Poland, Sept. 2001, pp. 374–382.
[3] Dorin Comaniciu and Visvanathan Ramesh, “Robust Detection and Tracking of Human Faces with an Active Camera,” in Proceedings of the 3rd IEEE International Workshop on Visual Surveillance, Dublin, Ireland, July 2000, pp. 11–18.
[4] Nicolas Vandenbroucke, Ludovic Macaire, and Jack-Gérard Postaire, “Color image segmentation by supervised pixel classification in a color texture feature space. Application to soccer image segmentation,” in Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, Sept. 2000, vol. 3, pp. 625–628.
[5] Keinosuke Fukunaga and Larry Hostetler, “The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition,” IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 32–40, Jan. 1975.
[6] Yizong Cheng, “Mean Shift, Mode Seeking, and Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790–799, Aug. 1995.
[7] Dorin Comaniciu and Peter Meer, “Mean Shift: A Robust Approach Toward Feature Space Analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, May 2002.
[8] Dorin Comaniciu and Peter Meer, “Distribution Free Decomposition of Multivariate Data,” Pattern Analysis and Applications, vol. 2, no. 1, pp. 22–30, 1999.
[9] Michael Swain and Dana Ballard, “Color Indexing,” International Journal of Computer Vision, vol. 7, no. 1, pp. 11–32, Nov. 1991.
[10] Dorin Comaniciu, “An Algorithm for Data-Driven Bandwidth Selection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 281–288, Feb. 2003.
[11] Hong Pan, Stan Li, and Guodong Guo, “Robust Unsupervised Clustering Using Generalized Annealing M Estimator,” in Proceedings of the 4th Asian Conference on Computer Vision, Taipei, Taiwan, Jan. 2000, vol. 1, pp. 104–109.
<Literature Translation 1: Translated Text>
Non-Rigid Object Localization from a Color Model Using Mean Shift
Gaël Jaffré, Alain Crouzil
Institut de Recherche en Informatique de Toulouse
Université Paul Sabatier - 118 route de Narbonne - 31062 Toulouse Cedex 4 - France {jaffre,crouzil}@irit.fr
ABSTRACT
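This paper addresses color-based localization of non-rigid objects in an image. Our method can detect the objects in an image that match a color model without prior information about their number. Our approach is to create a binary image that marks the pixels most likely to belong to the object. Treating this image as a cluster in $\mathbb{R}^2$, object localization amounts to finding all the cluster modes. This search uses a statistical method: the mean shift procedure. To illustrate our approach, we use sport images, from which we try to find all the players.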
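1. INTRODUCTION
Our framework is sport image sequence analysis, especially player tracking. In this study, our aim is to localize non-rigid objects in an image in order to automatically initialize tracking procedures. This problem often arises when the search is local (snake-based tracking, mean shift tracking). Because tracking searches near the object position in the previous frame, the first frame requires an initialization which, owing to many difficulties, is usually done manually [1, 2].
Our problem has two main difficulties: the non-rigidity and the three-dimensional nature of the objects. To handle both, we use the color density of the objects. In this paper, we assume that objects have a discriminant color, i.e. their color is easy to tell apart. Color density is therefore robust to object non-rigidity, partial occlusion, and changes in camera position and zoom.
Among the publications on color-based object detection, we were inspired by [3] and [4]. In [3], Comaniciu presents a face tracking example in which, in the first frame, a face is detected from multiple initializations. In [4], Vandenbroucke presents a different application: starting from learning images, all the players are detected by pixel classification. However, this approach needs various learning data, and no results with noisy models are presented.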
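We use both approaches together. First, the pixels are classified: from the object model, a binary image is created in which each pixel carries a belonging measure to the model (0 or 1). This image represents the distribution of the pixels most likely to belong to the object. An example is given in Fig. 1.b; this image is used as a model to illustrate the examples that follow.
Our approach treats this binary image as a cluster in $\mathbb{R}^2$: the task of object localization then reduces to detecting the local modes of the cluster. Each mode is associated with an object that corresponds to the model (a mode is a local density maximum).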
(a) Original image
(b) Desired cluster
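Fig. 1. Example of the cluster we would like to obtain from the white player model. Black points represent pixels with value 1, and white points those with value 0.
In the rest of the paper, we no longer use binary images but weighted binary images, i.e. the belonging measure to the model is no longer exactly 0 or 1 but lies in the interval [0, 1].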
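This paper is structured in four parts. First, we detail the statistical tools used to estimate the cluster density and to search for the cluster modes. Then, we discuss the methods used to create the cluster. We also show how object coordinates can be extracted from the cluster. In the last part, we present experiments applying our method to sport images.
2. STATISTICAL TOOLS
Our approach treats binary images as clusters in $\mathbb{R}^2$. In this section, we present the statistical methods used in this paper.
2.1. The Mean Shift Procedure
Mean shift is a nonparametric estimator of the density gradient, proposed in 1975 by Fukunaga [5] for pattern recognition problems. However, it only came into real use with Cheng [6] in 1995, and then with Comaniciu and Meer [7] from 1997.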
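Let $\{x_i\}_{i=1 \ldots n}$ be a set of n points in $\mathbb{R}^d$. The mean shift vector computed with kernel K and kernel bandwidth h is given in [7]:
$$M_h(x) = \frac{\sum_{i=1}^{n} x_i \, K\left(\frac{x - x_i}{h}\right)}{\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)} - x$$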
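In this paper, we only consider the mean shift procedure with the Epanechnikov kernel [5]. The definition of the mean shift vector then becomes
$$M_h(x) = \frac{1}{n_x} \sum_{x_i \in S_h(x)} (x_i - x)$$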
(a) White player
(b) Red player
(c) Noisy white player
Fig. 2. Player models used in the examples of this paper.
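where $S_h(x)$ is the sphere centered on x, of radius h, containing $n_x$ data points. The mean shift vector has the direction of the gradient of the density estimate at x. The mean shift procedure consists of successive computations of the mean shift vector $M_h(x)$ and translations of the sphere $S_h(x)$ by $M_h(x)$. The procedure is guaranteed to converge [7].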
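2.2. Mode Seeking
In [5], Fukunaga proposes a mode seeking algorithm based on the mean shift procedure. He applies the procedure to every point: when two data points converge to the same position, they are considered to belong to the same cluster.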
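However, with n data points the complexity is $O(n^2)$, which costs a great deal of computation time. To initialize a tracking procedure quickly, the complexity of mode seeking must be reduced. First, we use images as clusters, so the data lie on a regular grid, which allows an efficient computation of the mean shift procedure [7]. To reduce the complexity further, we use the approach of [8]: we apply the mean shift procedure only to a subset of points rather than to every point. The algorithm is as follows:
(1) Define a tessellation of the cluster with m spheres of radius h:
– the distance between two neighbors should not be smaller than h,
– (optional) the number of points inside a sphere should not be below a threshold $T_1$.
(2) Apply the mean shift procedure to the sphere centers. The different points of convergence are the modes of the cluster.
(3) Merge modes whose distance is less than h.
(4) Discard modes whose density estimate is below a fraction $T_2$ of the maximal density estimate.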
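3. CLUSTER CREATION FROM IMAGES
The color models used in this paper are given in Fig. 2.
3.1. Binary Images
The simplest way to build a binary image is as follows: each pixel whose color appears in the model takes the value 1, and every other pixel the value 0. In the example of Fig. 3 the result looks satisfactory: the density estimate of the cluster shows two distinguishable maxima, which correspond to the “centers” of the two players. However, this approach is quite limited, because on real data the color model often differs slightly from the color of the object in the image. Moreover, the results are very sensitive to noise in the model, as we will see in section 3.3.
To handle slight changes in color, histograms can be used. To compare two pixels, we then compare their histogram indices instead of directly comparing their colors or gray levels. The problem is to find the optimal number of bins: too few bins put too many points in the cluster, whereas too many do not tolerate large changes in color.
Vandenbroucke proposes another way to obtain binary images from models, applied to soccer images [4]. Learning data are drawn from many player models, for which many color representation systems are used. In the end, he keeps only the system that best separates the two classes of pixels according to the players' jerseys. The clusters obtained are of very high quality (like the desired model shown in Fig. 1.b). However, this method needs many learning models, many changes of color system, and above all noise-free models (noise is removed manually).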
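3.2. Weighted Binary Images
So far our clusters were obtained from binary images, in which each pixel has only two possibilities: it either belongs to the model or it does not. A weighted binary image allows more degrees of belonging to the model. The largest values correspond to the most probable pixels, and the smallest values to the least probable ones. The cluster is now obtained by keeping only the pixels with a strictly positive value, each value acting as a weight: the most probable pixels thus have a larger weight.
Our method for creating the cluster is the one proposed by Swain and Ballard in [9]: the measure of object presence in the image is defined from the backprojected image.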
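Let us denote by
– $\{x_i\}_{i=1 \ldots n}$ the locations of the n pixels in the image,
– $\{q_k\}_{k=1 \ldots m}$ the m-bin color histogram of the image,
– $\{p_k\}_{k=1 \ldots m}$ the m-bin color histogram of the model.
We also define the function $c: \mathbb{R}^2 \to \{1 \ldots m\}$, which associates to the pixel at location $x_i$ the index $c(x_i)$ of the histogram bin corresponding to the color of that pixel. The ratio histogram $\{r_k\}_{k=1 \ldots m}$ is defined as $r_k = \min\left(\frac{p_k}{q_k}, 1\right)$. It allows the presence of the object in the image to be measured. Its backprojection onto the image gives the pixel $x_i$ the value $r_{c(x_i)}$. Since the ratio histogram emphasizes the dominant colors of the target while reducing clutter and background colors, the backprojected image $\{r_{c(x_i)}\}_{i=1 \ldots n}$ represents a spatial measure of the object presence.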
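To compute the mean shift vector for all $x \in \mathbb{R}^d$, we use a definition that takes the weights into account:
$$M_h(x) = \frac{\sum_{x_i \in S_h(x)} w(x_i) \, (x_i - x)}{\sum_{x_i \in S_h(x)} w(x_i)}$$
Here, $S_h(x)$ is the sphere of radius h centered on x, and $w(x_i)$ is the weight of the point $x_i$.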
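The backprojected image obtained from our soccer image is given in Fig. 4. We computed the histogram in RGB space with 16 bins per color.
(a) Cluster (b) Density estimate
Fig. 4. Backprojected image.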
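3.3. Comparison with Noisy Models
With synthetic models, the naive approach and the backprojected image give similar results. However, the difference becomes obvious when the model is noisy. To obtain a synthetic model from an image of the object, we must select the image area containing the object and then manually remove the background [4]. Such a model can be used for various images.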
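However, we would like to obtain a model from an image area without manual correction. Background pixels then become part of the model. The methods presented above behave differently in the presence of noisy pixels, as Fig. 5 shows: with the naive approach, the background pixels produce a very noisy cluster, which makes player localization impossible. One can see that in the rectangular area from which the model was extracted, every pixel was kept in the cluster. With the backprojected image, on the other hand, the ratio $\frac{p_k}{q_k}$ reduces the influence of background pixels [9], because in the image background pixels are far more numerous than object pixels. In the area from which the model was extracted, background pixels have low weights, whereas player pixels keep large weights. The density estimate of the backprojected image has two distinguishable maxima, the larger one corresponding to the player selected for the model. Since this last method gives more reliable results on real (noisy) data, we use it in our applications.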
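4. OBJECT LOCALIZATION
Once the cluster is computed, object localization is carried out with the algorithm described in section 2.2, which detects all the cluster modes without prior knowledge of their number. The only prior information needed is the scale of the cluster, i.e. the radius h used in the mean shift procedure. When the computation ends, each mode corresponds to an object center.
When there is only one object to find, its position corresponds to the global maximum of the cluster density estimate. Although pixel-by-pixel density computation is possible, the algorithm of section 2.2 is more efficient, because the density is estimated at only a few points. Moreover, the last step of the algorithm becomes unnecessary, because we only keep the mode with maximum density. The threshold $T_2$ is therefore not used.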
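This approach is inspired by [3], where face localization is carried out from many different initializations. The mean shift procedure is run from each of them to compute the corresponding mode, and the highest peak is kept. However, we can point out some differences with our approach. First, the initialization points are not taken at random in sufficiently populated regions, but are always chosen at the same locations in all images. Besides, the quantity optimized is not the density estimate of the backprojected image but the Bhattacharyya coefficient, which measures the similarity between the color distributions of the model and of an image area. The advantage of using the backprojected image instead of the Bhattacharyya coefficient is the computation time, which is shorter with the backprojected image.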
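When we know the number N of objects in the image, the threshold $T_2$ is not needed. The method is the same as for a single object, but we keep the N modes with the highest density estimates.
When we do not know the number of objects in the image, we apply every step of the algorithm of section 2.2. In the last step, the modes with too small a density estimate are removed. In our experiments, we used $T_2 = 25\%$. This threshold does not depend much on the current environment, i.e. we did not have to change it across our experiments.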
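5. EXPERIMENTS
We applied our method to different sport images, and the results are satisfactory. For team sports with two teams, the procedure must be run twice (once per team model).
Fig. 6 and Fig. 7 show the results of our method on different sport images. Player locations are given by the circle centers. In both cases, we did not know the number of players.
The limits of our method appear when conditions become difficult, as in Fig. 8, where a billboard has the same color as the white players and the players are represented by only a few pixels. All the players are detected, but there are two false alarms: the billboard and the referee.
Fig. 6. Soccer player localization.
(a) Noisy surfer model (b) Result
Fig. 7. Surfer localization from a noisy model.
Fig. 8. Soccer player localization.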
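6. CONCLUSIONS AND DIRECTIONS OF FURTHER RESEARCH
Our method localizes non-rigid objects from a color model, even in the presence of noise, as long as the objects have enough pixels. It can be used to automatically initialize tracking procedures, which are often initialized manually in the first frame [1, 2].
The interest of our approach lies in the use of the mean shift procedure, a nonparametric statistical procedure that is robust to the noise present in real data. The mean shift procedure converges quickly toward object locations, thanks to its low complexity, low number of iterations and local search.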
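Another advantage of our method is its small number of parameters: the object color model, the approximate scale of the objects, and the thresholds $T_1$ and $T_2$. $T_1$ is only used to reduce the number of runs of the mean shift procedure and can be removed if we do not want to set it. The threshold $T_2$ is only used when we do not know the number of objects. We set it experimentally, and since it does not depend much on the current environment, we did not change it in any of our experiments.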
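So the user only has to give a color model of the object and its approximate size. To avoid creating a synthetic model by hand, the user can simply give the location of the object center: the model is then extracted from the neighborhood of the object (which depends on the scale). Finally, the user may choose no scale at all and instead select an area of the image that corresponds to an object: the color model is initialized from the colors of this area, and the scale is taken from the size of the area. In this case, we can use two different scales $h_x$ and $h_y$ for the height and the width of the object. Background pixels have only a weak influence, as discussed in section 3.3.
However, prior knowledge of the scale is still needed. We would like to remove it by computing the scale automatically. To this end, we will use the methods presented in [10] and [11], which are clustering methods that use the mean shift procedure and compute the scale automatically from the data.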
7. ACKNOWLEDGMENT
This work was conducted on behalf of the KLIMT project.
8. REFERENCES
[1] Chris Needham and Roger Boyle, “Tracking multiple sports players through occlusion, congestion and scale,” in Proceedings of the 12th British Machine Vision Conference, Manchester, UK, Sept. 2001, vol. 1, pp. 93–102.
[2] Janez Perš and Stanislav Kovačič, “Tracking People in Sport: Making Use of Partially Controlled Environment,” in Proceedings of the 9th International Conference on Computer Analysis of Images and Patterns, Warsaw, Poland, Sept. 2001, pp. 374–382.
[3] Dorin Comaniciu and Visvanathan Ramesh, “Robust Detection and Tracking of Human Faces with an Active Camera,” in Proceedings of the 3rd IEEE International Workshop on Visual Surveillance, Dublin, Ireland, July 2000, pp. 11–18.
[4] Nicolas Vandenbroucke, Ludovic Macaire, and Jack-Gérard Postaire, “Color image segmentation by supervised pixel classification in a color texture feature space. Application to soccer image segmentation,” in Proceedings of the 15th International Conference on Pattern Recognition, Barcelona, Spain, Sept. 2000, vol. 3, pp. 625–628.
[5] Keinosuke Fukunaga and Larry Hostetler, “The Estimation of the Gradient of a Density Function, with Applications in Pattern Recognition,” IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 32–40, Jan. 1975.
[6] Yizong Cheng, “Mean Shift, Mode Seeking, and Clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790–799, Aug. 1995.
[7] Dorin Comaniciu and Peter Meer, “Mean Shift: A Robust Approach Toward Feature Space Analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, May 2002.
[8] Dorin Comaniciu and Peter Meer, “Distribution Free Decomposition of Multivariate Data,” Pattern Analysis and Applications, vol. 2, no. 1, pp. 22–30, 1999.
[9] Michael Swain and Dana Ballard, “Color Indexing,” International Journal of Computer Vision, vol. 7, no. 1, pp. 11–32, Nov. 1991.
[10] Dorin Comaniciu, “An Algorithm for Data-Driven Bandwidth Selection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 2, pp. 281–288, Feb. 2003.
[11] Hong Pan, Stan Li, and Guodong Guo, “Robust Unsupervised Clustering Using Generalized Annealing M Estimator,” in Proceedings of the 4th Asian Conference on Computer Vision, Taipei, Taiwan, Jan. 2000, vol. 1, pp. 104–109.
<Literature Translation 2: Original Text>
MEAN SHIFT AND OPTIMAL PREDICTION FOR EFFICIENT OBJECT TRACKING²
Dorin Comaniciu and Visvanathan Ramesh
Imaging and Visualization Department, Siemens Corporate Research
755 College Road East, Princeton, NJ 08540
{comanici,vramesh}@scr.siemens.com
ABSTRACT
A new paradigm for the efficient color-based tracking of objects seen from a moving camera is presented. The proposed technique employs the mean shift analysis to derive the target candidate that is the most similar to a given target model, while the prediction of the next target location is computed with a Kalman filter. The dissimilarity between the target model and the target candidates is expressed by a metric based on the Bhattacharyya coefficient. The implementation of the new method achieves real-time performance, being appropriate for a large variety of objects with different color patterns. The resulting tracking, tested on various sequences, is robust to partial occlusion, significant clutter, target scale variations, rotations in depth, and changes in camera position.
1. INTRODUCTION
Object tracking is a task required by different computer vision applications, such as perceptual user interfaces [3], intelligent video compression [8], and surveillance [12]. To achieve robustness to out-of-plane rotations of the target, the color distribution of the target model is employed instead of raw image pixels. The location of the target in the new frame is predicted based on the past trajectory, and a search is performed in its neighborhood for image regions (target candidates) whose distribution is similar to that of the model. In single hypothesis tracking the best match determines the new location estimate; however, more complex strategies also exist to form multiple hypotheses [1].
The exhaustive search in the neighborhood of the predicted target location for the best target candidate is, however, a computationally intensive process. As a solution to this problem we propose a color-based tracking method based on the mean shift iterations [4, 5] which works in real time, being based on a gradient ascent optimization rather than exhaustive search. The measurement vector is derived based on mean shifts, while the prediction of the next target location is computed by a Kalman filter (Figure 1).
² Dorin Comaniciu and Visvanathan Ramesh. Mean Shift and optimal prediction for efficient object tracking[D]. Imaging and Visualization Department, Siemens Corporate Research: 2000
Fig. 1. Block diagram showing the main computational modules of the proposed tracking: the fast target localization based on mean shift iterations and the state prediction using Kalman filtering. The motion of the target is assumed to have a velocity that undergoes slight changes, modeled by a zero mean white noise that affects the acceleration.
In the following, we assume the support of two modules which should provide (a) detection and localization in the initial frame of the objects to track (targets) [12], and (b) periodic analysis of each object to account for possible updates of the target models due to significant changes in color [13].
The organization of the paper is as follows. Section 2 presents the employed similarity measure. The mean shift based localization of the target is described in Section 3. Section 4 discusses the Kalman filter, while the scale adaptation is presented in Section 5. Experimental results are given in Section 6.
2. COLOR-BASED SIMILARITY MEASURE
Given the predicted location of the target in the current frame and its uncertainty, the measurement task assumes the search of a confidence region for the target candidate that is the most similar to the target model. The similarity measure we develop is based on color information. The feature z representing the color of the target model is assumed to have a density function $q_z$, while the target candidate centered at location y has the feature distributed according to $p_z(y)$. The problem is to find the discrete location y whose associated density $p_z(y)$ is the closest to the target density $q_z$.
Our measure of the distance between the two densities is based on the Bhattacharyya coefficient, whose general form is defined by [11]
$$\rho(y) \equiv \rho[p(y), q] = \int \sqrt{p_z(y) \, q_z} \, dz \qquad (1)$$
Properties of the Bhattacharyya coefficient such as its relation to the Fisher measure of information, quality of the sample estimate, and explicit forms for various distributions are discussed in [7, 11].
The derivation of the Bhattacharyya coefficient from sample data involves the estimation of the densities p and q, for which we employ the histogram formulation. The discrete density $\hat{q} = \{\hat{q}_u\}_{u=1 \ldots m}$ (with $\sum_{u=1}^{m} \hat{q}_u = 1$) is estimated from the m-bin histogram of the target model, while $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1 \ldots m}$ (with $\sum_{u=1}^{m} \hat{p}_u = 1$) is estimated at a given location y from the m-bin histogram of the target candidate. Therefore, the sample estimate of the Bhattacharyya coefficient is given by
$$\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y) \, \hat{q}_u} \qquad (2)$$
Based on equation (2) we define the distance between two distributions as
$$d(y) = \sqrt{1 - \rho[\hat{p}(y), \hat{q}]} \qquad (3)$$
The statistical measure (3) is a metric valid for arbitrary distributions, being nearly optimal (due to its link to the Bayes error [11]) and invariant to the scale of the target. It is therefore superior to other measures such as histogram intersection [14], Bhattacharyya distance, Fisher linear discriminant [10], or Kullback divergence.
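As a small illustration of equations (2) and (3), the sketch below computes the sample Bhattacharyya coefficient and the derived distance for two normalized histograms given as plain Python lists. The function names and the clamping of rounding error are assumptions made for this example; they are not part of the paper.

```python
import math


def bhattacharyya_coefficient(p, q):
    """Sample estimate (2): sum over the m bins of sqrt(p_u * q_u)."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(pu * qu) for pu, qu in zip(p, q))


def distance(p, q):
    """Distance (3): sqrt(1 - rho).

    The max() guards against rho exceeding 1 by a rounding error,
    which would make the square root undefined.
    """
    rho = bhattacharyya_coefficient(p, q)
    return math.sqrt(max(0.0, 1.0 - rho))
```

Identical histograms give a coefficient of one and a distance of (numerically) zero, while histograms with no shared non-empty bin give a coefficient of zero and a distance of one.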
3. TARGET LOCALIZATION
This section shows how to efficiently minimize (3) as a function of $y$ in the neighborhood of a predicted location. By contrast to object tracking based on exhaustive search in a confidence region [2, 9, 12], our optimization through mean shift iterations is faster since it exploits the spatial gradient of the measure (3).
3.1. Weighted Histogram Computation
Target Model. We denote by $\{x_i^*\}_{i=1 \ldots n}$ the pixel locations of the target model, centered at 0. Let $b: R^2 \to \{1 \ldots m\}$ be the function which associates to the pixel at location $x_i^*$ the index $b(x_i^*)$ of the histogram bin corresponding to the color of that pixel. The probability of the color $u$ in the target model is derived by employing a convex and monotonic decreasing function $k: [0, \infty) \to R$ which assigns a smaller weight to the locations that are farther from the center of the target. The weighting increases the robustness of the estimation, since the peripheral pixels are the least reliable, being often affected by occlusions (clutter) or background. By assuming that the generic coordinates $x$ and $y$ are normalized with $h_x$ and $h_y$, respectively, we can write
$$\hat{q}_u = C \sum_{i=1}^{n} k(\|x_i^*\|^2)\, \delta[b(x_i^*) - u] \qquad (4)$$
where $\delta$ is the Kronecker delta function. The normalization constant $C$ is derived by imposing the condition $\sum_{u=1}^{m} \hat{q}_u = 1$, from where
$$C = \frac{1}{\sum_{i=1}^{n} k(\|x_i^*\|^2)} \qquad (5)$$
the summation of delta functions for $u = 1 \ldots m$ being equal to one.
Target Candidates. Let us denote by $\{x_i\}_{i=1 \ldots n_h}$ the pixel locations of the target candidate, centered at $y$ in the current frame. Employing the same weighting function $k$, the probability of the color $u$ in the target candidate is given by
$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta[b(x_i) - u] \qquad (6)$$
The scale of the target candidate (i.e., the number of pixels) is determined by the constant $h$ which plays the same role as the bandwidth (radius) in the case of kernel density estimation [5]. By imposing the condition that $\sum_{u=1}^{m} \hat{p}_u = 1$ we obtain the normalization constant
$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)} \qquad (7)$$
Note that $C_h$ does not depend on $y$, since the pixel locations $x_i$ are organized in a regular lattice, $y$ being one of the lattice nodes. Therefore, $C_h$ can be precalculated for a given kernel and different values of $h$.
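The kernel-weighted histograms of equations (4) to (7) can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the profile $k$ is taken to be the Epanechnikov profile $k(x) = 1 - x$ on $[0, 1]$ (the text only requires a convex, monotonic decreasing function), bins are indexed from 0 rather than 1, and `weighted_histogram` and `bin_of` are hypothetical names.

```python
def epanechnikov_profile(x):
    """Assumed kernel profile k on [0, inf): convex and monotonically decreasing."""
    return 1.0 - x if x <= 1.0 else 0.0


def weighted_histogram(pixels, bin_of, m, center, h, k=epanechnikov_profile):
    """Kernel-weighted color histogram, equation (6).

    pixels : list of (x, y) pixel locations x_i
    bin_of : function b mapping a pixel location to a bin index in 0..m-1
    m      : number of histogram bins
    center : candidate location y
    h      : bandwidth (radius) of the kernel
    """
    hist = [0.0] * m
    for px, py in pixels:
        # Squared distance to the center, normalized by the bandwidth.
        r2 = ((center[0] - px) ** 2 + (center[1] - py) ** 2) / (h * h)
        hist[bin_of((px, py))] += k(r2)
    total = sum(hist)  # equals 1 / C_h, equation (7)
    if total == 0.0:
        raise ValueError("no pixel falls inside the kernel support")
    return [value / total for value in hist]
```

The target model histogram of equation (4) corresponds to calling the same function with `center=(0, 0)` and `h=1` on coordinates that have already been normalized.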
3.2. Distance Minimization
The search for the new target location in the current frame starts at the predicted location $\hat{y}_0$ of the target computed by the Kalman filter (Figure 1). Thus, the color probabilities $\{\hat{p}_u(\hat{y}_0)\}_{u=1 \ldots m}$ of the target candidate at location $\hat{y}_0$ in the current frame have to be computed first.
The minimization of the distance (3) being equivalent to the maximization of the Bhattacharyya coefficient (2), we start with the Taylor expansion of $\rho[\hat{p}(y), \hat{q}]$ around the values $\hat{p}_u(\hat{y}_0)$:
$$\rho[\hat{p}(y), \hat{q}] \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\, \hat{q}_u} + \frac{1}{2} \sum_{u=1}^{m} \hat{p}_u(y) \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}} \qquad (8)$$
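For readers who want the intermediate step, expression (8) follows from a first-order Taylor expansion of each term $\sqrt{\hat{p}_u(y)\, \hat{q}_u}$ in the variable $\hat{p}_u(y)$ around $\hat{p}_u(\hat{y}_0)$. The worked form below is a sketch that assumes $\hat{p}_u(\hat{y}_0) > 0$ for every bin $u$.

```latex
\begin{align*}
\sqrt{\hat{p}_u(y)\,\hat{q}_u}
  &\approx \sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}
     + \frac{1}{2}\bigl(\hat{p}_u(y) - \hat{p}_u(\hat{y}_0)\bigr)
       \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}} \\
  &= \sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}
     - \frac{1}{2}\sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}
     + \frac{1}{2}\,\hat{p}_u(y)\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}} \\
  &= \frac{1}{2}\sqrt{\hat{p}_u(\hat{y}_0)\,\hat{q}_u}
     + \frac{1}{2}\,\hat{p}_u(y)\sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}}
\end{align*}
```

Summing the last line over $u = 1 \ldots m$ gives the two sums of (8).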
3.3. Measurement Uncertainty
The uncertainty in the localization of the target is determined by the image noise, the similarity between the target colors and background/clutter colors, and the percentage of occlusion. However, the perturbation sources also influence the maximum value of the Bhattacharyya coefficient and the curvature around the maximum. Since these two parameters (the maximum value and the curvature around the maximum) can be evaluated in real time, we derived through Monte Carlo simulations a lookup table that relates the maximum value and the surface curvature to the uncertainty in the location estimate. As a result, after each mean shift optimization that gives the measured location of the target, the uncertainty of the estimate can also be computed.
4. KALMAN PREDICTION
The tracker employs two independent Kalman filters, one for each direction x and y. The target motion is assumed to have a slightly changing velocity ([1, p. 82]) modeled by a zero mean, low variance (0.01) white noise that affects the acceleration.
The tracking process consists in running for each frame the mean shift based optimization which determines the measurement vector and its uncertainty, followed by the Kalman iteration which gives the predicted position of the target and a confidence region. These entities are used in turn to initialize the mean shift optimization for the next frame.
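The predict/update cycle for one image axis can be sketched as follows. This is an illustrative constant-velocity filter, not the authors' implementation: the class name `Kalman1D`, the unit time step, the identity initial covariance, and the discretized white-noise-acceleration form of the process noise are all assumptions. Only the acceleration noise variance of 0.01 comes from the text. The measurement variance passed to `update` stands in for the uncertainty obtained after each mean shift optimization.

```python
class Kalman1D:
    """Constant-velocity Kalman filter for one axis (state: position, velocity)."""

    def __init__(self, pos, vel=0.0, accel_var=0.01, dt=1.0):
        self.pos, self.vel = pos, vel
        # Initial covariance is an arbitrary assumption for this sketch.
        self.p00, self.p01, self.p10, self.p11 = 1.0, 0.0, 0.0, 1.0
        self.q = accel_var  # variance of the white noise acting on the acceleration
        self.dt = dt

    def predict(self):
        """Propagate state and covariance one frame ahead; return predicted position."""
        dt, q = self.dt, self.q
        self.pos += dt * self.vel
        a, b, c, d = self.p00, self.p01, self.p10, self.p11
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        self.p00 = a + dt * (b + c) + dt * dt * d + q * dt ** 4 / 4.0
        self.p01 = b + dt * d + q * dt ** 3 / 2.0
        self.p10 = c + dt * d + q * dt ** 3 / 2.0
        self.p11 = d + q * dt * dt
        return self.pos

    def update(self, z, meas_var):
        """Correct the state with a measured position z of variance meas_var."""
        s = self.p00 + meas_var              # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s  # Kalman gain for H = [1, 0]
        residual = z - self.pos
        self.pos += k0 * residual
        self.vel += k1 * residual
        a, b, c, d = self.p00, self.p01, self.p10, self.p11
        # P <- (I - K H) P
        self.p00, self.p01 = (1.0 - k0) * a, (1.0 - k0) * b
        self.p10, self.p11 = c - k1 * a, d - k1 * b
```

A tracker in the spirit of the text would hold two such filters, one for x and one for y. For each frame it would call `predict()` to obtain the starting point of the mean shift search, then `update()` with the location the search returns.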
5. SCALE ADAPTATION
The scale adaptation scheme exploits the property of the distance (3) to be invariant to changes in the object scale. We simply modify the bandwidth $h$ of the kernel profile by a certain fraction (we used 10%), let the mean shift based algorithm converge again, and choose the radius yielding the largest decrease in the distance (3). An IIR filter is used to derive the new radius based on the current measurements and the old radius.
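One possible reading of this scheme is sketched below. Several details are assumptions, because the text does not spell them out: the bandwidth is tried both smaller and larger by the given fraction, the IIR filter is a first-order blend with a hypothetical coefficient `gamma`, and `distance_at` is a hypothetical callback that runs the mean shift optimization to convergence with a given bandwidth and returns the resulting distance (3).

```python
def adapt_bandwidth(h_prev, distance_at, fraction=0.10, gamma=0.1):
    """Return the new kernel bandwidth after one scale adaptation step.

    h_prev      : bandwidth used in the previous frame
    distance_at : callable h -> distance (3) after mean shift convergence
    fraction    : relative change tried for the bandwidth
    gamma       : weight of the current measurement in the IIR filter
    """
    candidates = [h_prev, h_prev * (1.0 - fraction), h_prev * (1.0 + fraction)]
    # Keep the radius with the smallest distance, i.e. the largest decrease.
    h_best = min(candidates, key=distance_at)
    # First-order IIR filter: blend the measured radius with the old one.
    return gamma * h_best + (1.0 - gamma) * h_prev
```

The smoothing keeps the radius from jumping between frames when the distance differs only marginally across the three candidates.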
6. EXPERIMENTS
The proposed tracking has been applied to various test sequences with superior performance and low computational complexity. Figure 2 shows the successful tracking in the presence of a complete occlusion of the hand-drawn ellipsoidal region of size $(h_x, h_y) = (55, 39)$ marked in the first image. Note that the target histogram has been derived in the RGB space with 32 × 32 × 32 bins. The algorithm runs comfortably at 30 fps on a 600 MHz PC, Java implementation.
Figure 3 shows samples from a sequence taken with a moving camera, demonstrating the tracking of an electronic device whose colors are close to those of the background. One can observe the scale adaptation provided by the algorithm.
Fig. 2. Tennis sequence: The frames 21, 47, and 52 are shown (left-right).
Fig. 3. Device sequence: The frames 1, 100, 200, and 300 are shown.
7. REFERENCES
[1] Y. Bar-Shalom, T. Fortmann, Tracking and Data Association, Academic Press, London, 1988.
[2] S. Birchfield, “Elliptical Head Tracking using Intensity Gradients and Color Histograms,” IEEE Conf. on Comp. Vis. and Pat. Rec., Santa Barbara, 232–237, 1998.
[3] G.R. Bradski, “Computer Vision Face Tracking as a Component of a Perceptual User Interface,” IEEE Work. on Applic. Comp. Vis., Princeton, 214–219, 1998.
[4] D. Comaniciu, V. Ramesh, P. Meer, “Real-Time Tracking of Non-Rigid Objects using Mean Shift,” To appear, IEEE Conf. on Comp. Vis. and Pat. Rec., Hilton Head Island, South Carolina, 2000.
[5] D. Comaniciu, P. Meer, “Mean Shift Analysis and Applications,” IEEE Int’l Conf. Comp. Vis., Kerkyra, Greece, 1197–1203, 1999.
[6] D. Comaniciu, P. Meer, “Distribution Free Decomposition of Multivariate Data,” Pattern Anal. and Applic., 2:22–30, 1999.
[7] A. Djouadi, O. Snorrason, F.D. Garber, “The Quality of Training-Sample Estimates of the Bhattacharyya Coefficient,” IEEE Trans. Pattern Analysis Machine Intell., 12:92–97, 1990.
[8] A. Eleftheriadis, A. Jacquin, “Automatic Face Location Detection and Tracking for Model-Assisted Coding of Video Teleconference Sequences at Low Bit Rates,” Signal Processing - Image Communication, 7(3):231–248, 1995.
[9] P. Fieguth, D. Terzopoulos, “Color-Based Tracking of Heads and Other Mobile Objects at Video Frame Rates,” IEEE Conf. on Comp. Vis. and Pat. Rec., Puerto Rico, 21–27, 1997.
[10] K. Fukunaga, Introduction to Statistical Pattern Recognition, Second Ed., Academic Press, Boston, 1990.
[11] T. Kailath, “The Divergence and Bhattacharyya Distance Measures in Signal Selection,” IEEE Trans. Commun. Tech., COM-15:52–60, 1967.
[12] A.J. Lipton, H. Fujiyoshi, R.S. Patil, “Moving Target Classification and Tracking from Real-Time Video,” IEEE Workshop on Applications of Computer Vision, Princeton, 8–14, 1998.
[13] S.J. McKenna, Y. Raja, S. Gong, “Tracking Colour Objects using Adaptive Mixture Models,” Image and Vision Computing, 17:223–229, 1999.
[14] M.J. Swain, D.H. Ballard, “Color Indexing,” Intern. J. Comp. Vis., 7(1):11–32, 1991.
[15] “Real-Time Tracking of Non-Rigid Objects using Mean Shift,” US patent pending.
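<Literature Translation 2: Translated Text>
Mean Shift and Optimal Prediction for Efficient Object Tracking
Dorin Comaniciu and Visvanathan Ramesh
Imaging and Visualization Department, Siemens Corporate Research, 755 College Road East, Princeton, NJ 08540
{comanici,vramesh}@scr.siemens.com
ABSTRACT
This paper presents an efficient color-based technique for tracking objects seen from a moving camera. The proposed technique uses mean shift analysis to derive the target candidate that is most similar to the target model, while the target location in the next frame is predicted with a Kalman filter. The similarity between the target model and a candidate is expressed by the Bhattacharyya coefficient. The new method is practical and suits a wide variety of colored objects. Across different image sequences it is robust to partial occlusion, noise, changes in target shape, and moving backgrounds.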
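1. INTRODUCTION
Object tracking is indispensable in many computer vision applications, such as perceptual user interfaces, intelligent video compression, and surveillance. To improve robustness to target rotation, the raw image pixels are replaced by a color description of the target model. The target location in a new frame is predicted from the past trajectory, and a search is performed in its neighborhood for a region (the target candidate) whose distribution is similar to that of the target model. In single-hypothesis tracking the best match determines the new location estimate, whereas multiple-hypothesis tracking has to handle more complex situations.
Exhaustively searching for the target candidate around the predicted location is computationally expensive. As a solution, we propose a color-based mean shift tracking method in which the mean shift iterations perform a gradient-ascent optimization instead of an exhaustive search.
The measurement vector is obtained from the mean shift iterations, while the next predicted target location is computed with a Kalman filter (Figure 1).
Fig. 1. Block diagram showing the main computational modules of the tracker: fast target localization based on mean shift iterations and state prediction using Kalman filtering. The target motion is assumed to have a velocity that undergoes slight changes, modeled by a zero-mean white noise that affects the acceleration.
We assume the support of two modules that provide (1) detection and localization of the targets in the initial frame, and (2) periodic analysis of each target to account for significant changes in its color. The paper is organized as follows. Section 2 presents the similarity measure. The mean shift based target localization is described in Section 3, and Section 4 discusses the Kalman filter. Scale adaptation is presented in Section 5, and Section 6 gives the experimental results.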
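2. COLOR-BASED SIMILARITY MEASURE
Given the predicted location of the target in the current frame and its uncertainty, the task is to find, within a confidence region, the target candidate that is most similar to the target model. We adopt a similarity measure based on color. Let $z$ denote the color feature of the target model and assume that it has the density function $q_z$, while the feature of the target candidate centered at $y$ is described by $p_z(y)$. The problem then becomes finding the discrete location $y$ whose density $p_z(y)$ is closest to the target density $q_z$.
To measure the similarity between the two densities we use the Bhattacharyya coefficient, defined as
$$\rho(y) \equiv \rho[p(y), q] = \int \sqrt{p_z(y)\, q_z}\; dz \qquad (1)$$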
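Properties of the Bhattacharyya coefficient, such as its relation to the Fisher measure of information and the quality of the sample estimate, are discussed in [7, 11].
Deriving the Bhattacharyya coefficient from sample data involves estimating the densities $p$ and $q$, for which we use histograms. The discrete density $\hat{q} = \{\hat{q}_u\}_{u=1 \ldots m}$ (with $\sum_{u=1}^{m} \hat{q}_u = 1$) is estimated from the $m$-bin histogram of the target model, while $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1 \ldots m}$ (with $\sum_{u=1}^{m} \hat{p}_u = 1$) is estimated at location $y$ from the $m$-bin histogram of the target candidate. The sample estimate of the Bhattacharyya coefficient is then
$$\hat{\rho}(y) \equiv \rho[\hat{p}(y), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\, \hat{q}_u} \qquad (2)$$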
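Based on equation (2) we define the distance between two distributions as
$$d(y) = \sqrt{1 - \rho[\hat{p}(y), \hat{q}]} \qquad (3)$$
The statistical measure (3) is a metric valid for arbitrary distributions. It is nearly optimal (owing to its link to the Bayes error [11]) and invariant to the scale of the target. It is therefore superior to other measures such as histogram intersection [14], the Bhattacharyya distance, the Fisher linear discriminant, or the Kullback divergence.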
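3. TARGET LOCALIZATION
This section shows how to efficiently minimize (3) as a function of $y$ in the neighborhood of the predicted location. In contrast to object tracking based on an exhaustive search in a confidence region [2, 9, 12], the mean shift iterations reach the optimum faster by exploiting the spatial gradient.
3.1. Weighted Histogram Computation
Target Model. We denote by $\{x_i^*\}_{i=1 \ldots n}$ the pixel locations of the target model, centered at 0. Let $b: R^2 \to \{1 \ldots m\}$ be the function that associates to the pixel at location $x_i^*$ the index $b(x_i^*)$ of the histogram bin corresponding to the color of that pixel. The probability of the color $u$ in the target model is derived with a monotonic decreasing function $k: [0, \infty) \to R$, which assigns smaller weights to points farther from the center of the target. The weighting increases the robustness of the estimate, because the peripheral pixels are the least reliable, being often affected by occlusions (clutter) or background. Assuming that the coordinates $x$ and $y$ are normalized with $h_x$ and $h_y$, respectively, we can write
$$\hat{q}_u = C \sum_{i=1}^{n} k(\|x_i^*\|^2)\, \delta[b(x_i^*) - u] \qquad (4)$$
where $\delta$ is the Kronecker delta function. $C$ is the normalization constant that enforces $\sum_{u=1}^{m} \hat{q}_u = 1$, hence
$$C = \frac{1}{\sum_{i=1}^{n} k(\|x_i^*\|^2)} \qquad (5)$$
the summation of the delta functions over $u = 1 \ldots m$ being equal to one.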
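Target Candidates. We denote by $\{x_i\}_{i=1 \ldots n_h}$ the pixel locations of the target candidate, centered at $y$ in the current frame. With the same weighting function $k$, the probability of the color $u$ in the target candidate is
$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta[b(x_i) - u] \qquad (6)$$
The scale of the target candidate (i.e., the number of pixels) is determined by the constant $h$, which plays the role of the bandwidth in kernel density estimation [5]. To satisfy the condition $\sum_{u=1}^{m} \hat{p}_u = 1$, we take the normalization constant
$$C_h = \frac{1}{\sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right)} \qquad (7)$$
Note that $C_h$ does not depend on $y$: the pixel locations $x_i$ are organized in a regular lattice, and $y$ is one of the lattice nodes. Therefore, $C_h$ can be precalculated for a given kernel and different values of $h$.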
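3.2. Distance Minimization
The search for the new target location in the current frame starts at the location $\hat{y}_0$ computed by the Kalman filter (Figure 1). The color probabilities $\{\hat{p}_u(\hat{y}_0)\}_{u=1 \ldots m}$ of the target candidate at location $\hat{y}_0$ in the current frame are therefore computed first.
Minimizing the distance (3) is equivalent to maximizing the Bhattacharyya coefficient (2). We expand $\rho[\hat{p}(y), \hat{q}]$ in a Taylor series around $\hat{p}_u(\hat{y}_0)$:
$$\rho[\hat{p}(y), \hat{q}] \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\, \hat{q}_u} + \frac{1}{2} \sum_{u=1}^{m} \hat{p}_u(y) \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}} \qquad (8)$$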
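From (6) and (8) we obtain
$$\rho[\hat{p}(y), \hat{q}] \approx \frac{1}{2} \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\, \hat{q}_u} + \frac{C_h}{2} \sum_{i=1}^{n_h} w_i\, k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \qquad (9)$$
where
$$w_i = \sum_{u=1}^{m} \delta[b(x_i) - u] \sqrt{\frac{\hat{q}_u}{\hat{p}_u(\hat{y}_0)}} \qquad (10)$$
Thus, to minimize the distance, the second term in (9) has to be maximized, the first term being independent of $y$. The second term represents the kernel density estimate at $y$ in the current frame, with the data weighted by $w_i$. The maximization can be carried out efficiently with mean shift iterations, using the following algorithm.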
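Maximization of the Bhattacharyya coefficient $\rho[\hat{p}(y), \hat{q}]$
Given the distribution $\{\hat{q}_u\}_{u=1 \ldots m}$ of the target model and the predicted location $\hat{y}_0$ of the target:
1. Compute the distribution $\{\hat{p}_u(\hat{y}_0)\}_{u=1 \ldots m}$ and evaluate $\rho[\hat{p}(\hat{y}_0), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_0)\, \hat{q}_u}$.
2. Compute the weights $\{w_i\}_{i=1 \ldots n_h}$ according to (10).
3. Compute the new target location [5]
$$\hat{y}_1 = \frac{\sum_{i=1}^{n_h} x_i\, w_i\, g\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)}{\sum_{i=1}^{n_h} w_i\, g\left(\left\|\frac{\hat{y}_0 - x_i}{h}\right\|^2\right)} \qquad (11)$$
Update $\{\hat{p}_u(\hat{y}_1)\}_{u=1 \ldots m}$ and evaluate $\rho[\hat{p}(\hat{y}_1), \hat{q}] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(\hat{y}_1)\, \hat{q}_u}$.
4. While $\rho[\hat{p}(\hat{y}_1), \hat{q}] < \rho[\hat{p}(\hat{y}_0), \hat{q}]$, do $\hat{y}_1 \leftarrow \frac{1}{2}(\hat{y}_0 + \hat{y}_1)$.
5. If $\|\hat{y}_1 - \hat{y}_0\| < \epsilon$, stop. Otherwise set $\hat{y}_0 \leftarrow \hat{y}_1$ and go to Step 1.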
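The optimization above uses the mean shift vector in Step 3 to increase the value of the approximated Bhattacharyya coefficient $\tilde{\rho}(y)$. Since this operation does not necessarily increase the value of the coefficient $\hat{\rho}(y)$ itself, the test in Step 4 is needed to validate the new target location. However, practical experiments (tracking different targets over long periods of time) showed that the Bhattacharyya coefficient computed at the location given by (11) was almost always larger than the coefficient corresponding to $\hat{y}_0$. Fewer than 0.1% of the maximizations performed required the Step 4 iterations. The termination threshold used in Step 5 is obtained by requiring $\hat{y}_0$ and $\hat{y}_1$ to refer to the same pixel in image coordinates.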
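Steps 2 and 3 of the algorithm can be illustrated with a short sketch. It assumes that $g$ is the negative derivative of the kernel profile $k$, as is usual in the mean shift literature, and that $k$ is the Epanechnikov profile, for which $g$ is constant inside the kernel support. Bins are indexed from 0, and the function names are hypothetical.

```python
import math


def pixel_weights(pixels, bin_of, p0, q):
    """Weights (10): w_i = sqrt(q_u / p_u(y0)) for the bin u = b(x_i)."""
    weights = []
    for x in pixels:
        u = bin_of(x)
        # A bin that is empty in the candidate contributes no weight.
        weights.append(math.sqrt(q[u] / p0[u]) if p0[u] > 0.0 else 0.0)
    return weights


def uniform_g(r2):
    """g = -k' for the assumed Epanechnikov profile: constant on the support."""
    return 1.0 if r2 <= 1.0 else 0.0


def mean_shift_step(pixels, weights, y0, h, g=uniform_g):
    """New location (11): weighted mean of the pixel locations around y0."""
    num_x = num_y = den = 0.0
    for (px, py), w in zip(pixels, weights):
        r2 = ((y0[0] - px) ** 2 + (y0[1] - py) ** 2) / (h * h)
        c = w * g(r2)
        num_x += c * px
        num_y += c * py
        den += c
    if den == 0.0:
        return y0  # nothing inside the kernel support: stay in place
    return (num_x / den, num_y / den)
```

A complete tracker would wrap these two functions in the loop of Steps 1 to 5, recomputing the candidate histogram at each new location and halving the step whenever the Bhattacharyya coefficient decreases.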
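3.3. Measurement Uncertainty
The uncertainty in the target location is determined by the image noise, the similarity between the target colors and the background colors, and the percentage of occlusion. These factors also influence the maximum value of the Bhattacharyya coefficient and the curvature around the maximum. Since these two parameters (the maximum value and the curvature around the maximum) can be evaluated in real time, we derived through Monte Carlo simulations a lookup table that relates the maximum value and the surface curvature to the uncertainty in the location estimate. As a result, after each mean shift optimization that gives a measured target location, the uncertainty of the estimate can also be computed.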
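4. KALMAN PREDICTION
The tracker employs two independent Kalman filters, one for each of the directions x and y. The target motion is assumed to have a slowly changing velocity, modeled by a zero-mean, low-variance (0.01) white noise that affects the acceleration.
For each frame, the tracking process first runs the mean shift optimization, which determines the measurement vector and its uncertainty, and then the Kalman filter, which predicts the target position and a confidence region. These quantities are used to initialize the mean shift optimization in the next frame.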
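5. SCALE ADAPTATION
The scale adaptation scheme exploits the property of the distance (3) to be invariant to changes in target size. We simply modify the bandwidth $h$ of the kernel by a certain fraction (we used 10%), let the mean shift algorithm converge again, and choose the radius that yields the largest decrease in the distance (3). An IIR filter is used to compute the new radius on the basis of the old radius.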
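6. EXPERIMENTS
The tracking algorithm has shown superior performance and low computational complexity on various test sequences. Figure 2 shows that tracking succeeds even under complete occlusion. Note that the RGB space of the target histogram is divided into 32 × 32 × 32 bins. The algorithm, implemented in Java, runs comfortably at 30 fps on a 600 MHz PC.
Figure 3 shows the tracking of an electronic device filmed with a moving camera, whose colors are close to those of the background. One can observe the scale adaptation provided by the algorithm.
Fig. 2. Table tennis sequence: frames 21, 47, and 52 (left to right).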