But does it measure what I want it to?


While there are thousands of assessment tools available for various aspects of pain and function, one of the most important things to consider is content validity – does the assessment measure what I want it to measure? Reliability is all very well, and ensures consistency, but if the test doesn’t measure anything useful or important, then it’s not going to be very helpful!

This article, published in 2006, is one of the few that attempts a qualitative evaluation of the content of several questionnaires while basing it on a reasonably sound theoretical framework, with methodology solid enough that other researchers can repeat the process. So far I haven’t found much to compare it with, but it’s a helpful study for clinicians trying to define exactly what they want to include in an assessment battery, even if it concludes that there are gaps in the existing repertoire!

Sigl, Cieza, Brockow, Chatterji, Kostanjsek, and Stucki set about comparing three very common low back pain measures using the International Classification of Functioning, Disability and Health (ICF), approved by the World Health Assembly in May 2001. Their intention was twofold: to review whether three common instruments cover the areas in the ICF, and whether the ICF can function as a somewhat atheoretical framework for comparing different instruments.

To recap, the ICF is a multipurpose classification belonging to the WHO family of international health classifications. Part 1 covers functioning and disability and includes the components “body functions” (b), “structures” (s) and “activities and participation” (d). Part 2 covers contextual factors and includes the components “environmental factors” (e) and “personal factors.”

To quote directly from the WHO, ‘The ICF puts the notions of ‘health’ and ‘disability’ in a new light. It acknowledges that every human being can experience a decrement in health and thereby experience some degree of disability. Disability is not something that only happens to a minority of humanity. The ICF thus ‘mainstreams’ the experience of disability and recognises it as a universal human experience. [my emphasis – BFT] By shifting the focus from cause to impact it places all health conditions on an equal footing allowing them to be compared using a common metric – the ruler of health and disability. Furthermore ICF takes into account the social aspects of disability and does not see disability only as ‘medical’ or ‘biological’ dysfunction. By including Contextual Factors, in which environmental factors are listed, ICF allows to record the impact of the environment on the person’s functioning.’

I quite like the ideal of ‘everyone’ having both limitations and abilities, and especially the idea that limitations are contextual. I’m not sure that this model has yet had an impact on the systems in which we usually work, however! I use the idea that everyone has abilities and everyone has limitations when working with people experiencing chronic pain – it has the effect of encouraging people to focus on their abilities rather than defining themselves by their limitations. The flow from conceptual ideals to measurement and implementation of these ideas takes time, and because it’s a nonmedical concept, it is unlikely to have a significant impact on health delivery systems for many years yet.

Back to the article…
The methodology is well described in the article. Three clinicians already trained in the ICF were used. Two reviewed the content and linked each questionnaire item to a content area in the ICF, applying 10 different linking rules; the concepts they identified and the ICF categories they selected were then compared to calculate a Kappa statistic. Where the two disagreed, a third person trained in the ICF and in the linking rules was consulted, and independently determined how the item should be classified.

Clear guidelines were set for how linkages were to be developed. Although the full rules are not provided in the article itself, several examples demonstrate how different items were allocated to categories, for example: ‘If an item of a measure contains more than one concept, each concept has to be linked separately. For example, in the item of the ODI “Pain doesn’t prevent me from walking any distance,” the concepts “pain” and “walking” were linked to “b28013 pain in back” and “d450 walking,” respectively. The response options of an item are linked to the ICF if they refer to concepts other than those contained in the corresponding item. For example, in the item 14 “sleeping” of the NASS, in which two of the response categories of the item are “I sleep well” and “pain interrupts my sleep,” the concept “sleeping” was linked to the ICF category “b134 sleep functions,” the concept “sleep well” to “b1343 quality of sleep,” and the concept “interrupts my sleep” to “b1342 maintenance of sleep.” If an item/concept is not contained in the ICF classification, it is labeled “nc” (not covered by the ICF). “nc” does not differentiate between concepts relating to function not covered by the ICF, concepts relating to personal factors for which no categories currently exist, and other concepts relating to aspects like time and space.’
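To make the linking idea concrete, here is a minimal sketch in Python. It is not the authors’ procedure: in the study the linking was a judgement made by trained raters, whereas this toy version uses a lookup table (the name `CONCEPT_TO_ICF` and the function `link_concepts` are my own, hypothetical) built only from the examples quoted above. It shows two of the rules: each concept in an item is linked separately, and anything without a category is labelled “nc”.

```python
# Hypothetical lookup table, built only from the examples quoted in the article.
# Real linking is a rater judgement, not a dictionary lookup.
CONCEPT_TO_ICF = {
    "pain": "b28013",                # pain in back
    "walking": "d450",               # walking
    "sleeping": "b134",              # sleep functions
    "sleep well": "b1343",           # quality of sleep
    "interrupts my sleep": "b1342",  # maintenance of sleep
}


def link_concepts(concepts):
    """Link each concept separately; concepts with no category become 'nc'."""
    return {c: CONCEPT_TO_ICF.get(c.lower(), "nc") for c in concepts}


# ODI item "Pain doesn't prevent me from walking any distance" holds two concepts.
odi_item = link_concepts(["pain", "walking"])

# NASS item 14: the item concept plus concepts found in its response options.
nass_item = link_concepts(["sleeping", "sleep well", "interrupts my sleep"])

# A made-up concept that is not in the table is labelled "nc".
unknown = link_concepts(["time of day"])

print(odi_item)
print(nass_item)
print(unknown)
```

The point of the sketch is simply that one questionnaire item can produce several ICF codes, which is why the comparison between instruments is done on concepts rather than on whole items.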

Although this sounds tedious to read here, I’m certain that the process ensures precision and enables the majority of items to be appropriately categorised.

Well, the first thing to establish is whether the two (and occasionally three) clinicians agreed on the categories to which they allocated items. Agreement was measured with the Kappa statistic, adjusted for the skewness of the sample (a result of high Kappa values and small sample size) using a bootstrapping technique of sampling from percentiles based on the observed data. Agreement ranged from 0.67 at the broadest level of category through to 1.0 (total agreement) at the fourth level. To illustrate the levels, an example selected from the component “body functions” is presented below:
b2: Sensory functions and pain (first level) – at this level there was a small level of disagreement
b280: Sensation of pain (second level)
b2801: Pain in body part (third level)
b28013: Pain in back (fourth level)
b28018: Pain in body part, other specified (fourth level) – at this level, there was total agreement
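For readers who want to see the mechanics, here is a rough sketch of how agreement at each ICF level could be calculated. Everything in it is an assumption for illustration: the two lists of codes are made up, the level is derived simply by truncating the code (b2, b280, b2801, b28013), and the bootstrap is a plain percentile bootstrap, which may not match the exact procedure the authors used. It will not reproduce the figures reported in the article.

```python
import random
from collections import Counter

# Characters kept at each ICF level: b2 / b280 / b2801 / b28013
LEVEL_LENGTH = {1: 2, 2: 4, 3: 5, 4: 6}


def truncate(code, level):
    """Cut an ICF code back to the given level (shorter codes are left as is)."""
    return code[:LEVEL_LENGTH[level]]


def cohen_kappa(a, b):
    """Cohen's kappa for two raters who each assigned one category per item."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single, identical category
    return (observed - expected) / (1 - expected)


def bootstrap_interval(a, b, n_boot=1000, seed=0):
    """Percentile bootstrap: resample items with replacement, recompute kappa."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohen_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    return stats[int(0.025 * n_boot)], stats[int(0.975 * n_boot) - 1]


# Made-up ratings for six concepts (NOT data from the article).
rater_a = ["b28013", "d450", "b134", "b1343", "d430", "b28013"]
rater_b = ["b28013", "d450", "b134", "b1342", "d430", "b28018"]

kappa_by_level = {
    level: cohen_kappa([truncate(c, level) for c in rater_a],
                       [truncate(c, level) for c in rater_b])
    for level in LEVEL_LENGTH
}
low, high = bootstrap_interval(rater_a, rater_b)

print(kappa_by_level)
print(low, high)
```

With a sample as small as this the bootstrap interval is very wide, which is the same caution that applies to any Kappa value based on a handful of raters and items.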

This demonstrates very good inter-rater reliability, although it should be appreciated that there were only three individuals involved. A larger number of raters would have provided a much better determination of the accuracy of this approach to content validation – but would also increase the time required to do it!

Now, for the real work of this study: what areas were covered by the three assessment tools, and which areas were not well-covered?

  • The representation of body functions is similar in all three measures incorporating pain and sleep.
  • All three questionnaires contain a similar number of concepts representing the ICF component “activities and participation.”
  • None of the selected instruments covered aspects of remunerative work (d850). “Domestic life, other specified” (d698), which had to be linked for carrying out household tasks (“doing any of the jobs that I usually do around the house,” “heavy jobs around the house”), is applicable only for the RMQ.

The two research questions were: whether three common instruments cover the areas in the ICF, and whether the framework was a useful way to determine content.

  1. It was found that yes, all three instruments cover aspects of the ICF – to varying extents. Only one looked at the psychological impact of pain, and none looked at factors such as fatigue that are well-known to be associated with poorer function. Interestingly, none of the measures looked at ‘context’ – for example, ‘attitudes of immediate family members or friends or society are important prognostic determinants for life satisfaction, work performance, and disability in patients with back pain. This also holds true for remunerative work, which is not covered by any of the measures.’
  2. The second question was whether the ICF could be helpful as a framework – one use of this type of comparison work is to create an item bank. Item banks consist of large sets of questions representing various levels of a latent variable that can be used to develop brief, efficient scales for measuring that latent variable. Using Rasch analysis, items that measure the variable of interest can be identified and selected to form a measurement tool that precisely assesses that specific level of function.
  3. The first finding alone is interesting – why have these very important areas of function been ignored? Does this reflect the western idea that ‘the person with the disability’ exists in isolation?
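As a rough illustration of the item bank idea raised under the second question, here is a small sketch of the dichotomous Rasch model in Python. The item names and difficulty values in `ITEM_BANK` are invented for the example and do not come from the article or from any of the three questionnaires; in practice the difficulties would be estimated from response data. The sketch only shows the principle: people and items sit on one scale, and a short test can be built from the items closest to a person’s level.

```python
import math


def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: chance a person succeeds on (or endorses) an item.

    Both values are on the same logit scale; when they are equal the
    probability is exactly 0.5.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical item bank: item -> difficulty in logits (made-up values).
ITEM_BANK = {
    "walk a short distance": -1.5,
    "do light housework": -0.5,
    "lift heavy objects": 1.0,
    "do heavy jobs around the house": 1.5,
}


def pick_items(ability, bank, k=2):
    """Choose the k items whose difficulty is closest to the person's level."""
    return sorted(bank, key=lambda item: abs(bank[item] - ability))[:k]


person_ability = 1.2  # hypothetical estimate for one person
selected = pick_items(person_ability, ITEM_BANK)

print(selected)
for item in selected:
    print(item, round(rasch_probability(person_ability, ITEM_BANK[item]), 2))
```

Items far from the person’s level tell us little (they are almost certainly passed or failed), which is why a well-targeted short scale drawn from a bank can be as informative as a much longer fixed questionnaire.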

    The final comment I want to make is about the usefulness of this research from a clinical perspective. Key areas that are well-known to be important both to people with pain, and to funders of health care and compensation are not included in three commonly-used assessment tools. Perhaps if these agencies could see their way to fund this type of comparison, it might be possible to develop supplementary measures to ensure this information is available for use in clinical situations.

    Sigl, T., Cieza, A., Brockow, T., Chatterji, S., Kostanjsek, N., Stucki, G. (2006). Content Comparison of Low Back Pain-Specific Measures Based on the International Classification of Functioning, Disability and Health (ICF). Clinical Journal of Pain, 22(2), 147-153.

    World Health Organization. International Classification of Functioning, Disability and Health: ICF. Geneva: WHO, 2001.

    Schultz IZ, Crook JM, Berkowitz J, et al. Biopsychosocial multivariate predictive model of occupational low back disability. Spine. 2002;27:

    Takeyachi Y, Konno S, Otani K, et al. Correlation of low back pain with functional status, general health perception, social participation, subjective happiness, patient satisfaction. Spine. 2003;28:1461–1466.


  1. Have you considered the Rasch model? It has been used extensively in Occupational Therapy in areas such as activities of daily living and fatigue, for example.

    Once developed, questions and patients/people are on the same “ruler” of interest, and diagnosis, intervention and communication are far easier than with traditional approaches (e.g. classical test theory) or with other modern psychometric approaches, such as Item Response Theory (IRT), that are in my opinion more complex and more controversial.

  2. Hi there!
    Thanks for suggesting this – I haven’t posted on it yet, but mean to shortly. There are some detractors from this model, as well as proponents, and it’s a great area to debate.
    Thanks for stopping by and taking the time to comment – visitors are always welcome!

  3. Yes, I agree there are controversies between Rasch and IRT proponents. I favor Rasch for practical reasons:

    a) Smaller sample sizes required
    b) Raw score sufficiency, makes it easy to score for non-technical people
    c) Similar to how people already think about measurement (e.g. like a thermometer)


  4. Hi Matt
    How would you feel about writing a brief summary of Rasch for the ‘uninitiated’ who would like to know about it?!
    Give me an email if you’re interested.

  5. I too did my masters research on this, mapping the pain measures recommended by the IMMPACT group for assessing pain in clinical trials to the ICF. The questionnaires were mapped to the ICF by 15 health professionals, and there was good evidence that the ICF is a good framework. There were a lot of omissions shown. I like the ICF as a framework for measuring chronic pain and use a questionnaire based on the ICF.
