Our approach to model evaluation is based on directly measuring key properties of a given model. An effective model is one which is able to generate a broad range of examples of the class of modelled images. This property is referred to as Generalisation ability. Generalisation alone is not sufficient, since the model must also generate only examples that are consistent with the class of modelled images. This second property is referred to as Specificity. As will be shown later, the two properties are related to (and their estimates can even be substituted by) the notion of Shannon's entropy [22].
The approach to the assessment of NRR relies on the close
relationship between registration and statistical model building,
and extends the work of Davies et al. on evaluating shape models [8].
We note that NRR of a set of images establishes the dense correspondence
which is required to build a combined appearance model. Given the
correct correspondence, the model provides a concise description of
the training set. As the correspondence is degraded, the model also
degrades in terms of its ability to reconstruct images of the same
class that are not in the training set (Generalisation), and its ability to
synthesise only new images similar to those in the training set (Specificity).
If we represent training images and those synthesised by the model
as points in a high-dimensional space, the clouds formed by training
and synthetic images ideally overlap fully (see Fig. 2). Given a measure
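of the distance between images (as described in the next subsection), Specificity, $S$,
Generalisation, $G$, and their standard errors, $\sigma_S$ and $\sigma_G$,
can be defined as follows.

Let $\{I_j : j = 1, \ldots, m\}$ be a large image set which
has been sampled from the model and has the same distribution as the
model. The distance between two images is described by $|\cdot|$,
which enables us to define:

$$S = \frac{1}{m} \sum_{j=1}^{m} \min_{i} |I_i - I_j| \tag{5}$$

$$G = \frac{1}{n} \sum_{i=1}^{n} \min_{j} |I_i - I_j| \tag{6}$$

$$\sigma_S = \frac{\mathrm{SD}\left(\min_{i} |I_i - I_j|\right)}{\sqrt{m - 1}} \tag{7}$$

$$\sigma_G = \frac{\mathrm{SD}\left(\min_{j} |I_i - I_j|\right)}{\sqrt{n - 1}} \tag{8}$$

where $I_i$, $i = 1, \ldots, n$, are the training images and
SD is standard deviation.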
Both values are low for a good model as short distances imply image proximity. Specificity measures the mean distance between images generated by the model and their closest neighbours in the training set, whilst Generalisation measures the mean distance between images in the training set and their closest neighbours in the synthesised set. The approach is illustrated diagrammatically in Fig. 3.
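The two nearest-neighbour averages described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: images are assumed to be flattened into equal-length tuples of numbers, the Euclidean distance is only a stand-in for the image distance of the next subsection, and the standard error is assumed to be the population standard deviation divided by the square root of the sample count minus one. The function names are hypothetical.

```python
import math
import statistics


def euclidean(a, b):
    """Stand-in image distance (an assumption): Euclidean distance
    between two flattened images of equal length."""
    return math.dist(a, b)


def nearest_distances(queries, references, distance=euclidean):
    """For each query image, the distance to its closest reference image."""
    return [min(distance(q, r) for r in references) for q in queries]


def standard_error(values):
    """Population SD divided by sqrt(count - 1); the normalisation is an
    assumption. Requires at least two values."""
    return statistics.pstdev(values) / math.sqrt(len(values) - 1)


def specificity_generalisation(training, synthetic, distance=euclidean):
    # Specificity: each synthetic image is matched to its closest training image.
    to_training = nearest_distances(synthetic, training, distance)
    # Generalisation: each training image is matched to its closest synthetic image.
    to_synthetic = nearest_distances(training, synthetic, distance)
    return {
        "specificity": statistics.fmean(to_training),
        "specificity_se": standard_error(to_training),
        "generalisation": statistics.fmean(to_synthetic),
        "generalisation_se": standard_error(to_synthetic),
    }


# Toy data: two "training images" and three "synthetic images" in 2-D.
training = [(0.0, 0.0), (4.0, 0.0)]
synthetic = [(0.0, 3.0), (4.0, 0.0), (4.0, 4.0)]
result = specificity_generalisation(training, synthetic)
```

In this toy example the nearest-neighbour distances are 3, 0 and 4 for the synthetic images and 3 and 0 for the training images, so Specificity is 7/3 and Generalisation is 1.5. Any other image distance can be passed through the `distance` argument without changing the rest of the sketch.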
It can be observed that Specificity and Generalisation fail to account for many of the image distances: each uses only the nearest-neighbour distance for every image and discards the rest. This might lead to poor or incomplete results. While these measures provide good approximations, we seek a less simplistic method that exploits work related to the minimum description length (MDL) principle, which was shown to be valuable when dealing with shapes alone.