The most expensive and invasive measurement tool is not necessarily the one with the least test–retest measurement error. Moreover, every method used to measure some aspect of human physiology carries some degree of test–retest error attributable to natural biological variation. For example, the so-called ‘gold standard’ Douglas bag method of gas analysis is still associated with substantial test–retest error due to human variability in oxygen consumption kinetics during exercise.
Similarly, whilst it is conventional to compare a new automatic blood pressure monitor with sphygmomanometry, this latter method is, again, associated with substantial test–retest measurement error that is biological in origin. This ubiquity of biological variability governs several major considerations when analysing the performance characteristics of physiological measurement tools.
A METHOD AGREEMENT AND MEASUREMENT ERROR CHECKLIST
Ideally, the measurement study should involve at least 40 participants. If there are fewer than 40 participants, then scrutiny of confidence limits for the error statistics becomes even more important (see Section 4), since error estimates calculated on a small sample can be imprecise.
Try to match the characteristics of the measurement study to the planned uses of the measurement tool: a similar population, a similar time between repeated measurements (for investigations into test–retest error), a similar exercise protocol and comparable resting conditions during measurements.
Calculate the 95% confidence interval (CI) for the mean difference between methods/tests (Jones et al., 1996) and compare this CI to a ‘region of equivalence’ for the two methods of measurement (Figure 5.2). This CI is not the same as the LOA statistic. Scrutiny of the lower and upper limits of this CI should not change the conclusion that has been arrived at regarding the acceptability of systematic error between methods/tests.
For example, one might observe a mean difference of 10 mmHg between blood pressure measuring devices but the 95% CI might be −4 to +24 mmHg. This means that the population mean difference between methods could be as much as 4 mmHg in one direction or as much as 24 mmHg in the other direction. Only a narrower CI, obtained mostly through a greater sample size, would allow one to make a more conclusive statement regarding systematic error. Atkinson and Nevill (2001) and Atkinson et al. (2005a) discuss further the use of CIs and limits
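The calculation of a 95% CI for the mean difference, and its comparison with a region of equivalence, can be sketched as follows. This is a minimal illustration rather than a prescribed procedure: the function names, the invented blood pressure readings and the 5 mmHg equivalence margin are all assumptions made for the example, and the critical t value is supplied by the user from tables or software. Five participants are used only to keep the arithmetic visible; a real study would follow the sample size advice above.

```python
import math
import statistics


def mean_difference_ci(method_a, method_b, t_crit):
    """Return (mean difference, lower limit, upper limit) for paired data.

    t_crit is the two-sided 95% critical value of Student's t with n - 1
    degrees of freedom, supplied by the caller from tables or software.
    """
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference
    sem = statistics.stdev(diffs) / math.sqrt(n)
    half_width = t_crit * sem
    return mean_diff, mean_diff - half_width, mean_diff + half_width


def ci_within_equivalence(lower, upper, margin):
    """True only if the whole CI lies inside [-margin, +margin]."""
    return -margin <= lower and upper <= margin


# Invented systolic pressures (mmHg) from two devices, five participants.
device_a = [120, 130, 125, 140, 135]
device_b = [118, 131, 120, 138, 130]

# 2.776 is the two-sided 95% t value for 4 degrees of freedom.
mean_diff, lower, upper = mean_difference_ci(device_a, device_b, t_crit=2.776)
print(f"mean difference {mean_diff:.1f} mmHg, "
      f"95% CI {lower:.1f} to {upper:.1f} mmHg")
print("CI inside a 5 mmHg region of equivalence:",
      ci_within_equivalence(lower, upper, margin=5))
```

In this sketch the methods are judged equivalent only when both CI limits fall inside the region of equivalence, which mirrors the advice that neither limit should change the conclusion about systematic error.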
Calculate confidence limits for the random error statistics. As above, scrutiny of the CI should not change the decision that has been made about acceptability of random error. For example, a physiological measurement method with a repeatability CV of 30% and associated CI of indicates poor repeatability, even if the lower limit of the CI is taken into account. Bland and Altman (1999) provide details relevant to limits of agreement. Hopkins (2000) shows how to calculate confidence limits for CV and Morrow and Jackson (1993) provide details for intraclass correlation.
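As an illustration for one of the random error statistics, the sketch below computes 95% limits of agreement together with approximate confidence limits for each limit. It assumes a widely quoted large-sample approximation for the variance of a limit, s²(1/n + 1.96²/(2(n − 1))), which should be checked against Bland and Altman (1999) before use. The function name, the invented differences and the returned dictionary keys are hypothetical, and the critical t value is again supplied by the user.

```python
import math
import statistics


def limits_of_agreement_with_ci(diffs, t_crit, z=1.96):
    """Return 95% limits of agreement and approximate CIs for each limit.

    diffs  : paired differences between methods or tests
    t_crit : two-sided 95% critical t value with n - 1 degrees of freedom
    z      : multiplier for the limits of agreement (1.96 for 95%)
    """
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    lower_loa = mean_diff - z * sd
    upper_loa = mean_diff + z * sd
    # Assumed approximation for the standard error of a single limit
    se_limit = sd * math.sqrt(1 / n + z ** 2 / (2 * (n - 1)))
    half_width = t_crit * se_limit
    return {
        "lower_loa": lower_loa,
        "upper_loa": upper_loa,
        "se_limit": se_limit,
        "lower_loa_ci": (lower_loa - half_width, lower_loa + half_width),
        "upper_loa_ci": (upper_loa - half_width, upper_loa + half_width),
    }


# Invented differences (mmHg) for five participants; 2.776 is the
# two-sided 95% t value for 4 degrees of freedom.
result = limits_of_agreement_with_ci([2, -1, 5, 2, 5], t_crit=2.776)
print(f"limits of agreement: {result['lower_loa']:.2f} to "
      f"{result['upper_loa']:.2f} mmHg")
print("CI for the upper limit:",
      tuple(round(x, 2) for x in result["upper_loa_ci"]))
```

With so few participants the CI around each limit is wide, which is the practical reason for inspecting confidence limits before deciding whether random error is acceptable.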