Differences

This shows you the differences between two versions of the page.

@@ Line 1: / Line 1: @@
+Assessing the Accuracy of Non-Rigid
+Registration With and Without Ground Truth
+R. S. Schestowitz^{1}, W. R. Crum^{2}, V. S. Petrovic^{1},
+C. J. Twining^{1}, T. F. Cootes^{1} and C. J. Taylor^{1}
+^{1}Imaging Science and Biomedical Engineering, University
+of Manchester
+Stopford Building, Oxford Road, Manchester M13 9PT,
+United Kingdom
+^{2}Centre for Medical Image Computing, Department of
+Computer Science, University
+College London, Gower Street, London WC1E 6BT, United Kingdom
+We present two methods for assessing the performance of
+non-rigid registration algorithms. One of them requires
+ground-truth solutions, whereas the other does not need
+any form of ground truth. The former method is based on
+label overlap, which can be computed using Tanimoto's
+formulation. The method which requires no ground truth
+exploits the fact that, given a set of non-rigidly
+registered images, a generative statistical appearance
+model can be constructed. The quality of the model
+depends on the quality of the registration, and can be
+evaluated by comparing images sampled from it with the
+original image set. We derive indices of model
+specificity and generalisation, and show that they
+demonstrate the loss of registration as a set of
+correctly registered images is progressively perturbed.
+We finally compare the two methods of assessment and
+show that the latter method, which requires no ground
+truth, is in fact more sensitive than the one that does.
+Over the past few years, non-rigid registration (NRR)
+has been used increasingly as a basis for medical image
+analysis. Applications include structural analysis,
+atlas matching and change analysis. Many different
+approaches to NRR have been proposed, for registering
+both pairs and groups of images. These differ in terms
+of the objective function used to assess the degree of
+mis-registration, the representation of spatial
+deformation fields, and the approach to minimizing the
+mis-registration with respect to the deformations. The
+problem is highly under-constrained and, given a set of
+images to be registered, each approach will, in
+general, give a different result. This leads to a
+requirement for methods of assessing the quality of registration.
+Hereby we outline two methods for assessment, one of
+which requires ground-truth solutions to be provided a
+priori while the other does not. We shall present
+results which confirm that both methods are valid and
+proceed to calculating their sensitivities. We find
+that the method which requires ground-truth solutions
+is not as sensitive as the method which requires
+nothing but the raw images and the corresponding
+deformation fields, i.e. the registration.
+The first among the methods to be described relies on the
+existence of ground-truth data such as boundaries of
+image structures, produced by manual markup of
+distinguishable points. Having registered an image set,
+the method can measure overlap between structures that
+have been annotated, thereby implying how good a
+registration was.
+Our latter method is able to assess registration
+without ground truth of any form. The approach involves
+automatic construction of appearance models from the
+registered data, subsequently evaluating, using model
+syntheses, the quality of that model. Quality of the
+registration is tightly-related to the quality of its
+resulting model and the two tasks, namely model
+construction and image registration, are innately the
+same one. Both involve the identification of corresponding
+points, also known as landmarks in the context of
+model-building. Expressed differently, a registration
+produces a dense set of correspondences and models
+of appearance require the images and these
+correspondences in order to be built.
+To put the validity of both methods to the test, we
+assembled a set of 2-D 38 MR images of the brain. Each
+of these images was carefully annotated to identify
+different compartments within the brain. These
+anatomical compartments can be perceived as simplified
+labels that faithfully define the structure of the brain. Our
+first method of assessment uses the Tanimoto overlap
+measure to calculate the degree to which labels across
+the image set concur. In that respect, it exploits
+ground truth, which has been identified by an expert,
+to reason about registration quality.
+The second method takes an entirely different approach.
+It feeds on the results of a registration algorithm,
+where correspondences have been highlighted, and builds
+an appearance model given the images and their
+correspondences. From that model, many synthetic brain
+images are derived. Vectorisation of these images
+allows us to embed them in a
+high-dimensional space. We can then compare the spatial
+cloud that these synthetic images form with the cloud
+that gets composed from the original image set -- the set
+from which the model has been build. Computing the
+overlap between these clouds gives insight into the
+quality of the registration. Simply put, it is a model
+fit evaluation paradigm. The better the registration,
+the greater the overlap between those clouds will be.
+To compute overlap between two clouds of data, we have
+devised measures that we refer to as Specificity and
+Generalisablity. The former tells how well the model
+fits its seminal data, whereas the latter tells how
+well the data fits its derived model. It is a
+reciprocal relationship that 'locks' data to its
+model and vice versa. We calculate Specificity and
+Generalisablity by measuring distances in space. As we
+seek a distance measure that is tolerant to slight anatomical differences,
+we use the shuffle distance, not neglecting to compare
+it against Euclidean distance. The shuffle distance compares each point in one image with a larger corresponding region in another image. It adheres to the best fit, i.e. matches the two points whose distance is minimal.
+Our assessment framework, by which we test both
+methods, uses non-rigid registration, whereby many
+degrees of freedom are involved in image
+transformations. To systematically generate data over
+which our hypotheses can be tested, we perturb the
+brain data using clamped-plate splines, which are diffeomorphic. In the brain
+data which we use, correspondences among images are said to be
+perfect so they can only ever be degraded. We wish
+to show that as the degree of perturbation increases,
+so do the measures of our registration assessment methods.
+In an extensive batch of experiments we perturbed the
+datasets at progressively increasing levels, which led
+to well-understood mis-registration of the data. We
+repeated these experiments 10 times to demonstrate that
+both approaches to assessment are consistent and
+results are unbiased. Having investigated and plotted the
+measures of overlap for each perturbation extent, we
+see a rather linear decrease in the amount of overlap
+(Figure X). This means that, when ground-truth-based
+registration is eroded, the overlap-based measure is
+able to detect that and the response is very
+well-behaved, thus meaningful and reliable.
+<Graphics file: ./Graphics/1.eps>
+    <Graphics file: ./Graphics/2.eps>
+          Figures X&Y. The measured quality of registration as perceived
+          by the overlap-based evaluation (left) and the model-based
+          evaluation (right).
+We then undertake another assessment task, this time
+exploiting the method which does not make use of ground truth.
+We notice a very similar behaviour (Figure Y), which is
+evidence that the latter is a powerful and reliable
+method of assessing the degree of mis-registration -- or
+conversely -- the quality of registration.
+As a last step, we embark on the task of comparing the
+two algorithms, identifying sensitivity as the factor
+which is most important. Sensitivity reflects on our
+ability to confidently tell apart a good registration
+from a worse one. The slighter the difference which can
+be detected reliably, the more sensitive the method.
+To calculate sensitivity, we compute the amount of
+change in terms of mean pixel displacement --
+deviation from the correct solution, that is. We then
+look at differences in our assessor's value, be it
+overlap, or Specificity, or Generalisation. We also must
+stress the need to take account of the errors bars as
+there is both an inter-instantiation error and a
+measure-specific error; the two must be composed
+carefully. The derivation of sensitivity can be
+expressed as follows:
+placeholder
+where X is... (TODO)
+<Graphics file: ./Graphics/3.eps>
+          Figure Z. The sensitivity of registration assessment methods.
+          note to self: exclude Gen.? Combined? Plots?
+          -.
+Figure Z suggests that, for roughly any selection of
+shuffle distance neighbourhood, the method which does
+not require ground truth is more sensitive than the
+method which depends on it. When the trends of these
+curves are inspected closely, it can be observed that
+they are approximately parallel, which implies that the two
+methods are very closely correlated.
+In summary, we have shown two valid methods for
+assessing non-rigid registration. The methods are
+correlated in practice, but the principles they build
+upon are quite separable. Their pre-requisites -- if
+any -- likewise. Registration can be evaluated with or
+without ground-truth annotation and the behaviour our measures is consistent across distinct datasets, is
+well-behaved, and is sensitive. Both methods have been
+successfully applied to assessment of non-rigid
+registration algorithms and both led to the
+expected conclusions. That aspect of the work,
+nonetheless, is beyond the scope of this paper.

Schestowitz Wiki

User Tools

Site Tools

Differences

Page Tools