Inter-Rater Agreement Study

Many factors must be taken into account when choosing the most appropriate statistical test, such as the metric in which a variable was coded (e.g., nominal, ordinal, interval, or ratio), the study design (e.g., whether all subjects are rated by multiple coders or only a subset of them), and the intended purpose of the error estimate (e.g., the reliability of a single coder's ratings versus the reliability of the average ratings of several coders). Researchers should ensure that the chosen statistic is appropriate for their study design and that no other option is better suited to it. The statistics appropriate to different study designs are examined in more detail in the calculation sections below.

The Transition after Childhood Cancer (TaCC) project aims to assess the transition from paediatric to adult care among childhood cancer survivors in Switzerland by collecting medical record (MR) data from nine clinics across three language regions. Since no previous study had assessed transition using a systematic chart review for data collection, we had to develop and pilot a data abstraction form based on the available literature and the project objectives. For these reasons, we felt it was important to assess the reliability of the collected data by examining a) the intra-rater reliability of two raters at two time points; b) possible learning effects over time, comparing each rater against a gold standard at both time points; and c) the inter-rater reliability between the two raters.

The kappa is a form of correlation coefficient.
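As a sketch of what a kappa calculation involves, the following computes Cohen's kappa for two raters coding the same items; the rating lists are hypothetical, and this assumes nominal data with two raters (the simplest design mentioned above):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same items (nominal data)."""
    n = len(rater_a)
    # Observed agreement: proportion of items both raters coded identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's marginal distribution.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    # Kappa: agreement beyond chance, scaled by the maximum possible
    # agreement beyond chance.
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codes from two raters on ten items.
a = ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]
print(round(cohens_kappa(a, b), 3))
```

Here the raters agree on 8 of 10 items (80% raw agreement), but because some of that agreement is expected by chance, kappa is lower than 0.80.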

Correlation coefficients cannot be interpreted directly, but a squared correlation coefficient, called the coefficient of determination (COD), is directly interpretable. The COD is interpreted as the amount of variation in the dependent variable that can be explained by the independent variable. While the true COD is calculated only on the Pearson r, an estimate of the variance accounted for can be obtained for any correlation statistic by squaring the correlation value. Squaring the kappa translates conceptually into the accuracy (i.e., the inverse of error) in the data that is due to congruence among the data collectors. Figure 2 provides an estimate of the amount of correct and incorrect data in research data sets given the level of congruence, as measured by percent agreement or kappa.

The concept of "rater agreement" is quite simple, and for many years interrater reliability was measured as the percentage of agreement among the data collectors. To obtain the measure of percent agreement, the statistician creates a matrix in which the columns represent the different raters and the rows represent the variables for which the raters collected data (Table 1). The cells of the matrix contain the values the data collectors recorded for each variable. An example of this procedure can be found in Table 1.
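The squaring described above is simple arithmetic; this sketch (the kappa values are hypothetical, chosen only for illustration) shows how a correlation-type statistic converts into an estimated proportion of variance accounted for:

```python
def squared_coefficient(stat):
    """Square a correlation-type statistic (Pearson r, or, as a rough
    analogue, kappa) to estimate the proportion of variance accounted for."""
    return stat ** 2

# Hypothetical kappa values: higher congruence among data collectors
# implies a larger estimated share of variance accounted for.
for kappa in (0.40, 0.60, 0.80):
    print(kappa, "->", round(squared_coefficient(kappa), 2))
```

Note that for kappa this is only a conceptual analogue of the COD, not a true coefficient of determination, which is defined for the Pearson r.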

In this example, there are two raters, Mark and Susan. Each records a value for variables 1 through 10. To obtain percent agreement, the researcher subtracts Susan's scores from Mark's scores and counts the number of resulting zeros. Dividing the number of zeros by the total number of variables yields the measure of agreement between the raters. In Table 1, the agreement is 80%. This means that 20% of the data collected in the study are erroneous, because when the raters disagree, at most one of them can be correct. This statistic is directly interpreted as the percentage of correct data.
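The subtract-and-count-zeros procedure above can be sketched in a few lines; the numeric codes below are hypothetical stand-ins for the Table 1 values, constructed so that two of the ten variables disagree:

```python
def percent_agreement(rater_a, rater_b):
    """Percent agreement via the procedure described above: subtract one
    rater's scores from the other's and count the resulting zeros."""
    differences = [a - b for a, b in zip(rater_a, rater_b)]
    zeros = sum(d == 0 for d in differences)
    return 100 * zeros / len(differences)

# Hypothetical numeric codes for the 10 variables; the two raters
# disagree on variables 3 and 10, giving 80% agreement as in Table 1.
mark  = [1, 2, 1, 3, 2, 1, 1, 2, 3, 1]
susan = [1, 2, 2, 3, 2, 1, 1, 2, 3, 2]
print(percent_agreement(mark, susan))  # 80.0
```

Eight zeros out of ten differences gives 80% agreement, matching the worked example.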