1 The Likelihood Function: Illustration from a Computer Vision Problem
The likelihood function gives a probabilistic interpretation of the measurements given a specific model. It is probabilistic because the measurement is always corrupted by noise at the input stage. In many vision theories the likelihood function depends on the image formation process and involves physical constraints such as scene geometry and surface reflectance.
1.1 Background
The theory of stochastic processes provides a mathematical framework for modelling signals that vary over time, such as motion signals. These processes can be used to predict the probabilities of future states of the system (e.g. the future velocities) and are called Markov if the probability of future states depends only on the present state (i.e. not on the time history of how the system arrived at this state). In computer vision, Markov processes have been used to model temporal coherence for applications such as the optimal fusing of data from multiple frames of measurements (Matthies et al. 1989; Clark and Yuille 1990; Chin et al. 1994). Such optimal fusing is typically defined in terms of least-squares estimates, which reduces the problem to Kalman filtering theory (Kalman 1960). Because Kalman filters are (recursive) linear estimators that apply only to Gaussian densities, their applicability in complex scenes involving several moving objects is questionable.
A Bayesian formulation of temporal coherence makes it possible to generalize standard Kalman filters so as to deal with several targets moving in complex environments.
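To make the recursive least-squares idea concrete, the following is a minimal sketch of a scalar Kalman filter that fuses noisy measurements of one quantity (say, one velocity component) across frames. It is an illustration only, not the method of the cited papers: the function name `kalman_fuse`, the random-walk state model, and all numerical values are assumptions chosen for this example.

```python
def kalman_fuse(measurements, meas_var, process_var=0.0,
                init_mean=0.0, init_var=1e6):
    """Recursively fuse noisy scalar measurements (hypothetical helper).

    Assumes a random-walk Markov model: the state carries over from frame
    to frame, with optional process noise of variance `process_var`.
    Returns the list of (mean, variance) estimates after each frame.
    """
    mean, var = init_mean, init_var
    history = []
    for z in measurements:
        # Predict: the state is unchanged, but its uncertainty grows.
        var = var + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = var / (var + meas_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        history.append((mean, var))
    return history


# Five frames of made-up measurements of the same velocity component.
frames = [1.2, 0.8, 1.1, 0.9, 1.0]
estimates = kalman_fuse(frames, meas_var=0.04)
final_mean, final_var = estimates[-1]
```

With no process noise and a very broad initial variance, the final estimate approaches the sample mean of the measurements and its variance approaches the measurement variance divided by the number of frames, which is the least-squares answer. The belief is a single Gaussian at every step, which is why this form struggles when several moving objects produce multimodal evidence.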
1.2 A Specific Example from Visual Neurosciences
To determine the likelihood function, we must first specify the input stage. Ideally, this would involve modeling the behavior of the cortical cells sensing natural image motion, but this would be too complex to be practical. Instead, we use a simplified model of a bank of receptive fields tuned to various velocities and positioned at each image pixel. These cells have observation activities which are intended to represent the output of a neuronally plausible motion model (see for instance Grzywacz and Yuille 1990). In the proposed implementation, these simple model cells receive contributions from the motion of dots falling within a local neighbourhood, typically the four nearest neighbours. Intuitively, the closer a dot is to the center of the receptive field, and the closer its velocity is to the preferred velocity of the cell, the larger the response. The spatial profile and velocity tuning curve of these receptive fields are described by Gaussian functions whose covariance matrices depend on the direction of the velocity and are specified in terms of their longitudinal and transverse components (for more details, see section 4.1 of Yuille, Burgi, Grzywacz 1997).
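The receptive field description above can be sketched in code as follows. This is a hedged illustration rather than the model of the cited paper: the function names, the choice to multiply the spatial and velocity Gaussians, the choice to sum contributions over dots, and every numerical value are assumptions made for this example. The covariance is assumed to be aligned with the cell's preferred velocity, which must be nonzero.

```python
import math


def direction_covariance(preferred_v, var_long, var_trans):
    """2x2 covariance with variance `var_long` along `preferred_v` and
    `var_trans` across it (hypothetical helper; needs a nonzero velocity)."""
    vx, vy = preferred_v
    norm = math.hypot(vx, vy)
    ux, uy = vx / norm, vy / norm  # unit vector along the preferred velocity
    # Sigma = var_long * u u^T + var_trans * n n^T, with n = (-uy, ux).
    off_diag = (var_long - var_trans) * ux * uy
    return [[var_long * ux * ux + var_trans * uy * uy, off_diag],
            [off_diag, var_long * uy * uy + var_trans * ux * ux]]


def gaussian_profile(d, cov):
    """Unnormalized Gaussian exp(-0.5 * d^T cov^-1 d) for a 2-vector d."""
    (a, b), (c, e) = cov
    det = a * e - b * c
    dx, dy = d
    quad = (e * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * quad)


def cell_response(cell_pos, preferred_v, dots, spatial_vars, velocity_vars):
    """Sum of contributions from dots given as ((x, y), (vx, vy)) pairs.

    `spatial_vars` and `velocity_vars` are (longitudinal, transverse)
    variance pairs; the product-and-sum form is an assumption.
    """
    cov_x = direction_covariance(preferred_v, *spatial_vars)
    cov_v = direction_covariance(preferred_v, *velocity_vars)
    total = 0.0
    for (px, py), (vx, vy) in dots:
        dpos = (px - cell_pos[0], py - cell_pos[1])
        dvel = (vx - preferred_v[0], vy - preferred_v[1])
        total += gaussian_profile(dpos, cov_x) * gaussian_profile(dvel, cov_v)
    return total


cell, pref = (0.0, 0.0), (1.0, 0.0)
spatial, velocity = (1.0, 1.0), (1.0, 0.1)  # illustrative variances only
on_target = cell_response(cell, pref, [((0.0, 0.0), (1.0, 0.0))], spatial, velocity)
speed_error = cell_response(cell, pref, [((0.0, 0.0), (1.5, 0.0))], spatial, velocity)
direction_error = cell_response(cell, pref, [((0.0, 0.0), (1.0, 0.5))], spatial, velocity)
```

In this example the velocity tuning is broad along the preferred direction and narrow across it, so a dot whose speed is off by 0.5 still drives the cell more strongly than a dot whose velocity is off by the same amount in the perpendicular direction.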
The likelihood function specifies the probability of the receptive field responses conditioned on a "true" external motion field. We assume that the measurements depend only on the velocity field at that specific position, so the likelihood function can be written in terms of local conditional distributions, one for each position. Each local distribution is expressed through a joint probability distribution, for which we adopt a particular form. This is only one of several possible choices; it is attractive because it leads to a simple linear update rule (see Yuille, Burgi, Grzywacz 1997). The chosen form is built from Gaussian velocity tuning curves whose covariance matrix depends on the direction of the velocity and is specified in terms of its longitudinal and transverse components.
The experimental data (Anstis and Ramachandran 1987; Gottsdanker 1956; Werkhoven et al. 1992) suggest that temporal integration occurs for velocity direction rather than for speed. This is built into our model by choosing the covariances so that the variance is larger in the direction of motion than in the perpendicular direction. The velocity component perpendicular to the motion has mean zero and very small variance, so the direction of motion is estimated fairly accurately, whereas the variance of the velocity component along the direction of motion is larger, so the speed is estimated less accurately.
Observe that we have assumed that the response of the measurement device is instantaneous. It would be possible to adapt the likelihood function to allow for a time lag, but we have not pursued this. Such a model might be needed to account for motion blurring.
We develop a theory for the temporal integration of visual motion motivated by psychophysical experiments. The theory proposes that input data are temporally grouped and used to predict and estimate the motion flows in the image sequence. This temporal grouping can be considered a generalization of the data association techniques used by engineers to study motion sequences. Our temporal-grouping theory is expressed in terms of the Bayesian generalization of standard Kalman filtering.
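As an illustration of what a Bayesian generalization of Kalman filtering can look like, the sketch below runs a predict-and-update recursion over a small discrete set of candidate states (for example, candidate velocities). It is a generic discrete Bayes filter written under stated assumptions, not the update rule of the cited work: the function names, the three-state space, the transition matrix, and the likelihood values are all hypothetical.

```python
def predict(belief, transition):
    """Markov prediction step; transition[i][j] = P(next = j | current = i)."""
    n = len(belief)
    return [sum(belief[i] * transition[i][j] for i in range(n))
            for j in range(n)]


def update(predicted, likelihood):
    """Bayes update: multiply by the likelihood of the new measurement
    under each candidate state, then renormalize."""
    unnorm = [p * l for p, l in zip(predicted, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def bayes_filter(prior, transition, likelihoods):
    """Alternate prediction and update over a sequence of measurements."""
    belief = list(prior)
    for lik in likelihoods:
        belief = update(predict(belief, transition), lik)
    return belief


# Three candidate states with a "sticky" transition model (temporal coherence).
transition = [[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]]
prior = [1.0 / 3.0] * 3
# Made-up likelihoods: each frame weakly favours the third state.
likelihoods = [[0.1, 0.2, 0.7]] * 3
posterior = bayes_filter(prior, transition, likelihoods)
best_state = max(range(3), key=lambda i: posterior[i])
```

Unlike the Kalman filter, the belief here is an arbitrary distribution over the candidate states, so it can stay multimodal when the evidence is ambiguous, at the cost of storing and propagating one probability per state.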