Non-rigid registration is widely used in a variety of image analysis applications. We compare two methods for assessing the performance of groupwise non-rigid registration algorithms. One approach, which has been described previously, utilises a measure of overlap between data labels. Our new approach exploits the fact that, given a set of non-rigidly registered images, a generative statistical appearance model can be constructed. We observe that the quality of the model depends on the quality of the registration, and can be evaluated by comparing synthetic images sampled from the model with the original image set. We derive indices of model specificity and generalisation that can be used to assess model/registration quality. We show that both approaches detect the loss of registration as a set of correctly registered MR images of the brain is progressively perturbed. We compare the sensitivities of the different methods and show that, as well as requiring no ground truth, our new specificity measure provides the most sensitive approach to detecting misregistration.
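To make the evaluation concrete, the sketch below illustrates one plausible way such specificity and generalisation indices could be computed from a set of synthetic images sampled from the model and the original image set. The choice of image distance (mean absolute intensity difference) and the function names are illustrative assumptions, not the exact definitions used in the paper.

```python
import numpy as np

def image_distance(a, b):
    """Mean absolute intensity difference between two images of equal shape.
    (Assumed metric; other image distances could be substituted.)"""
    return np.mean(np.abs(a - b))

def specificity(synthetic_images, training_images):
    """Mean distance from each model-sampled image to its nearest original image.
    Lower values suggest the model only generates images close to the original set."""
    return np.mean([
        min(image_distance(s, t) for t in training_images)
        for s in synthetic_images
    ])

def generalisation(synthetic_images, training_images):
    """Mean distance from each original image to its nearest model-sampled image.
    Lower values suggest the model can represent images like those in the original set."""
    return np.mean([
        min(image_distance(t, s) for s in synthetic_images)
        for t in training_images
    ])
```

Under this reading, misregistration degrades the appearance model, synthetic samples drift away from the original images, and the specificity index rises accordingly.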