Inter-tester reproducibility and inter-method agreement of two variations of the Beighton test for determining Generalised Joint Hypermobility in primary school children
Peer reviewed, Journal article
Background: The assessment of Generalised Joint Hypermobility (GJH) is usually based on the Beighton tests, which consist of a series of nine tests. Methodological shortcomings can arise because the tests do not include detailed descriptions of performance, interpretation or classification of GJH. The purpose of this study was to evaluate, among children aged 7-8 and 10-12 years: 1) the inter-tester reproducibility of the tests and criteria for classification of GJH for two variations of the Beighton test battery (Methods A and B), which differ in starting positions and benchmarks, and 2) the inter-method agreement for the two batteries. Methods: A standardised three-phase protocol for clinical reproducibility studies was followed, including a training phase, an overall agreement phase and a study phase. The number of participants in the three phases was 10, 70 and 39, respectively. For the inter-method study a total of 103 children participated. Two testers judged each test battery. A score of ≥5 was set as the cut-off level for GJH. Cohen's kappa statistics and McNemar's test were used to assess agreement and to test for significant differences. Results: Kappa values for GJH (≥5) were 0.64 (Method A, prevalence 0.42) and 0.59 (Method B, prevalence 0.46), with no difference between testers in Method A (p = 0.45) and B (p = 0.29). Prevalence of GJH in the inter-method study was 31% (A) and 35% (B), with no difference between methods (p = 0.54). Conclusions: Inter-tester reproducibility of Methods A and B was moderate to substantial when following a standardised study protocol. Both test batteries can be used in the same population of children, as there was no difference in prevalence of GJH at a cut-off of ≥5 when applying Methods A and B. However, both methods need to be tested for their predictive validity at higher cut-off levels, e.g. ≥6 and ≥7.
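
Illustration: the two statistics named in the Methods can be sketched for a single 2x2 table of paired GJH classifications (two testers, or two methods, judging the same children). This is a minimal sketch only. The function names and the counts are hypothetical and are not taken from the study, and the exact binomial form of McNemar's test is an assumption, since the abstract does not state which variant was used.

```python
from math import comb


def cohen_kappa(a, b, c, d):
    """Cohen's kappa for a 2x2 table of paired yes/no ratings.

    a = both raters positive, b = rater 1 positive only,
    c = rater 2 positive only, d = both raters negative.
    """
    n = a + b + c + d
    observed = (a + d) / n                      # observed agreement
    r1_pos = (a + b) / n                        # rater 1 positive rate
    r2_pos = (a + c) / n                        # rater 2 positive rate
    # Agreement expected by chance from the marginal rates.
    expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)
    return (observed - expected) / (1 - expected)


def mcnemar_exact_p(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Assumption: exact binomial variant (discordant pairs ~ Binomial(b + c, 0.5)).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for 39 children (NOT the study's data).
a, b, c, d = 14, 4, 3, 18
print(f"kappa = {cohen_kappa(a, b, c, d):.2f}")
print(f"McNemar exact p = {mcnemar_exact_p(b, c):.2f}")
```

Kappa summarises agreement beyond chance across all four cells, whereas McNemar's test uses only the discordant cells (b and c) to check whether one rater or method classifies systematically more children as having GJH than the other.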