More discussion on Functional Capacity Evaluations

Some years ago I wrote about Functional Capacity Evaluations and the lack of evidence supporting their use, particularly their use as predictive tools for establishing work “fitness”. 

I’ve received some sharp criticism in the past for my stance on FCE, and I continue to look for evidence that FCE are valid and reliable.  I haven’t found anything recently, and I’m still concerned that FCE are used inappropriately for people with chronic pain.  The demand characteristics of a testing situation push a person with chronic pain in one of two directions. They can push themselves – and have a flare-up for some days afterwards, but get a “good” report suggesting they have put in “full effort” and can manage a full time job of a certain MET demand. Or they can pace themselves, using pain management strategies – and avoid a flare-up, but receive a “bad” report suggesting they haven’t put in “full effort” and that, despite this, they can manage a full time job of a certain MET demand.

I can’t understand why FCE providers don’t work alongside people with chronic pain and their vocational counsellors, to help them define their sustainable level of physical demand, and systematically help them to gain confidence that they can find suitable work without exacerbating their pain. 

For the record, I’m not against establishing functional abilities, and I think having a systematic approach to doing this is worthwhile.  I am against FCEs being touted as a way to reliably determine work capacity, or as being able to determine “effort” through “consistency”. There simply isn’t published evidence to support these claims.  If someone can provide me with evidence, I’ll gladly change my mind because if there is one thing scientific training teaches, it’s that it’s OK to change your mind – if there’s evidence to do so.

This doesn’t mean that FCEs would then be fine and dandy – because, as I’ve seen far too many times, HOW they’re used often goes well beyond what any FCE can possibly do. Very often an FCE is used as a blunt instrument when some good motivational interviewing and careful vocational counselling would achieve the same.

Here’s my original post, and some very good references are at the end of it.

Questionnaire Validation: A Brief Guide for Readers of the Research Literature

Questionnaire Validation: A Brief Guide for Readers of the Research Literature. Mark Jensen.

I thought I’d give you a quick overview of a brief but very useful (and readable) article that explains how readers of research literature in pain can evaluate the literature.  It provides a summary of the issues surrounding the evaluation of pain measures by reviewing the essential concepts of validity and reliability, and how these are usually evaluated in pain assessment research.

It also has a glossary of terms used in evaluating psychometric properties of pain measures that is very helpful as a brief dictionary, and it covers just what needs to be included in any paper about a new pain assessment:

(1) the rationale for the measure (what will this measure do that previous measures cannot?);

(2) validity data that specifically addresses the uses for which the measure is being proposed; and

(3) initial reliability data.

Any psychology student (and many other health science students) will very quickly realise that there are thousands of pain measures already available, yet each year there are many more that are published.  Why oh why would we need any more?  The answer is not just that each researcher keenly wants to be ‘known’ for his or her new questionnaire – but that ‘our understanding of pain, and the effects of pain treatments, is so dependent on our ability to measure pain, any improvement in pain assessment should ultimately result in an improvement in our understanding and treatment of pain.’

Jensen writes that there are two main reasons for developing a new pain measure:

(1) that the new measure assesses a dimension or component of pain not assessed by existing measures, and

(2) that the new measure shows clear improvements over existing measures of the same pain dimension (eg, it is shorter, it is easier to administer and score, it is a better predictor of important outcomes, it is more sensitive to change).

Although this is not a new article (it was published in 2003), it summarises all the relevant psychometric areas in such a succinct and reader-friendly way that I think it should be compulsory reading for anyone learning about pain assessment (and certainly anyone in the midst of dreaming up a new measure!).

Jensen, M. (2003). Questionnaire Validation: A Brief Guide for Readers of the Research Literature. The Clinical Journal of Pain, 19:345–352.

‘it’s taken over my life’…

Each time I spend time listening to someone who is really finding it hard to cope with his or her pain, I hear the unspoken cry that pain has taken over everything. It can be heartbreaking to hear someone talk about their troubled sleep, poor concentration, difficult relationships, losing their job and ending up feeling out of control and at the mercy of the grim slave-driver we call chronic pain. The impact of pain can be all-pervasive, and it can be hard to work out what the key problems are.

To help break the areas down a little, I’ve been quite arbitrary really. I’m going to explore functional limitations in terms of the following:
1. Movement changes such as mobility (walking), manual handling, personal activities of daily living
2. Disability – participation in usual activities and roles such as grocery shopping, household management, parenting, relationships/intimacy/communication
3. Sleep – because it is such a common problem in pain
4. Work disability – mainly because this is such a complex area
5. Quality of life measures

The two following areas are ones I’ll discuss in a day or so – they’re associated with disability because they mediate the pain experience and disability…as I mentioned yesterday, they’re the ‘suffering’ component of the Loeser ‘rings’ model.
6. Affective impact – things like anxiety, fear, mood, anger that are influenced by thoughts and beliefs about pain and directly influence behaviour
7. Beliefs and attitudes– these mediate behaviour often through mood, but can directly influence behaviour also (especially treatment seeking)

There are so many other areas that could be included as well, but these are some that I think are important.
Before I discuss specific instruments, I want to spend yet more time looking at who and how – and the factors that may influence the usefulness of any assessment measure.

Who should assess these areas? Well, it’s not perhaps who ‘should’ but how can these areas be assessed in a clinical setting.

Most clinicians working in pain management (doctors, psychologists, occupational therapists, physiotherapists, nurses, social workers – have I missed anyone?) will want to know about these areas of disability but will interpret findings in slightly different ways, and perhaps assess by focusing on different aspects of these areas.

As I pointed out yesterday, there are many confounding factors when we start to look at pain assessment, and these need to be borne in mind throughout the assessment process.

How can the functional impact of pain be assessed?

  • Self report, eg interview, questionnaires – and the limitations of these approaches are threats to reliability and validity, as well as ‘motivation’ or expectancies
  • Observation, either in a ‘natural’ setting such as home or work, or a clinical setting
  • Functional testing, again either in a ‘natural’ setting such as home or work, or a clinical setting – and functional testing can include naturalistic procedures such as the AMPS assessment, formal and structured testing such as the 6 minute walk test, the sock test, or even certain functional capacity tests; or it may be clinical testing such as manual muscle testing or range of movement, or even Waddell’s signs

All self report measures, whether they’re verbal questions, interviews or pen and paper measures, are subject to the problem that they are simply the individual’s own perception of the degree of interference they attribute to pain. The accuracy of this perception can be called into question, especially if the person hasn’t carried out a particular activity recently, but in the end, it is the person’s perception of their abilities.

All measures need to be evaluated in terms of their reliability and validity – how much can we depend on this measure to (1) assess current status (2) contribute to a useful diagnosis (or formulation) (3) provide a basis for treatment decisions (4) evaluate or measure function over time (Dworkin & Sherman, 2001).

Reliability refers to how consistently a measure performs across time, people and clinicians.

Validity refers to how well a test actually measures what it says it’s measuring.  The best way to determine validity is if there is a ‘gold standard’ against which the test can be compared – of course in pain and functional performance, this is not easy, because there is no gold standard!  The closest we can come to is a comparison between, for example, a self report in a clinic on a pen and paper test compared with a naturalistic observation in a person’s home or workplace – when they’re not being observed.
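To make these two ideas a little more concrete, here is a minimal sketch in Python. It uses a plain Pearson correlation in two ways: between the same measure taken a week apart (a rough look at test-retest reliability), and between a clinic self report and what was observed at home (a rough look at validity against the nearest thing we have to a criterion). All of the numbers are invented for illustration, and the function and variable names are my own assumptions. Published reliability studies usually report an intraclass correlation rather than Pearson's r, because r ignores systematic shifts between sessions, so treat this as a sketch of the idea rather than a recommended analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: the same questionnaire completed by five people, one week apart.
week_1 = [12, 18, 25, 31, 40]
week_2 = [14, 17, 27, 30, 41]

# Hypothetical data: minutes of walking tolerance reported in clinic
# versus minutes actually observed at home for the same five people.
self_report_minutes = [10, 20, 30, 45, 60]
observed_minutes = [25, 15, 30, 20, 40]

test_retest_r = pearson_r(week_1, week_2)
criterion_r = pearson_r(self_report_minutes, observed_minutes)

print(f"test-retest r = {test_retest_r:.2f}")
print(f"self report vs observation r = {criterion_r:.2f}")
```

In this made-up example the questionnaire is very consistent from week to week, yet the self report tracks the home observation only moderately. A measure can be reliable without being a good reflection of what happens in the real world.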

Probably one of the best chapters discussing these aspects of pain assessment is Chapter 32, written by Dworkin & Sherman, in the 2nd Edition of the Handbook of Pain Assessment 2001 (DC Turk & R Melzack, Eds), The Guilford Press.

Importantly for clinicians working in New Zealand, or outside of North America and the UK, the reference group against which the client’s performance is being compared, needs to be somewhat similar to the population the client comes from.  Unfortunately, there are very few assessment instruments that have normative data derived from a New Zealand or Australasian population – and we simply don’t know whether the people seeking treatment in New Zealand are the same on many dimensions as those in North America.

I’m also interested in how well any instruments, whether pen and paper, observation or performance-based assessment, translate into the everyday context of the person.  This is a critical aspect of pain assessment validity that hasn’t really been examined well.  For example, the predictive validity (which is what I’m talking about) of functional capacity tests such as Isernhagen, Blankenship or other systems has never been satisfactorily established, despite the extensive reliance on these tests by insurers.

Observation is almost always included in disability assessment. The main problems with observation are:
– there are relatively few formal observation assessments available for routine clinical use
– they do take time to carry out
– maintaining inter-rater reliability over time can be difficult (while people may initially maintain a high level of integrity with the original assessment process, it’s common to ‘drift’ over time, and ‘recalibration’ is rarely carried out)
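The inter-rater point can be checked with very little effort. As a minimal sketch, assuming two clinicians have independently rated the same ten observed tasks into categories, Cohen's kappa gives their agreement corrected for chance. The ratings, category labels and names below are hypothetical, invented only to show what 'drift' might look like if one clinician's ratings are compared with a colleague's at two points in time.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of ratings")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance, from each rater's category frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of ten observed tasks:
# I = independent, A = needs assistance, U = unable.
clinician_a = ["I", "I", "A", "A", "U", "I", "A", "I", "U", "A"]
clinician_b_at_training = ["I", "I", "A", "A", "U", "I", "A", "I", "U", "A"]
clinician_b_a_year_later = ["I", "A", "A", "A", "U", "I", "A", "A", "U", "A"]

kappa_at_training = cohens_kappa(clinician_a, clinician_b_at_training)
kappa_a_year_later = cohens_kappa(clinician_a, clinician_b_a_year_later)

print(f"kappa at training: {kappa_at_training:.2f}")
print(f"kappa a year later: {kappa_a_year_later:.2f}")
```

Repeating a check like this every so often is one simple way to notice when 'recalibration' is due.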

While it’s tempting to think that observation, and even functional testing, is more ‘objective’ than self report, it’s also important to consider that these are tests of what a person will do rather than what a person can do (performance rather than capacity). As a result, these tests can’t be considered infallible or completely reliable indicators of actual performance in another setting or over a different time period.

Influences on observation or performance-based assessments include:
– the person’s beliefs about the purpose of the test
– the person’s beliefs about his or her pain (for example, the meaning of it such as hurt = harm, and whether they believe they can cope with fluctuations of intensity)
– the time of day, previous activities
– past experience of the testing process

And of course, all the usual validity and reliability issues.
More on this tomorrow. In the meantime, you really can’t go far past the 2nd Edition of the Handbook of Pain Assessment 2001 (DC Turk & R Melzack, Eds), The Guilford Press.

Here’s a review of the book when the 2nd Edition was published. And it’s still relevant.

Colour therapy…
With only a small proportion of the people experiencing acute low back pain becoming chronically disabled by their pain, a holy grail of sorts has been to quickly and effectively identify those who need additional help and those who don’t.

The ‘Psychosocial Yellow Flags’, initially developed in New Zealand by Kendall, Linton & Main (1999), provide a useful mnemonic for the factors that have been established as predicting longterm disability – but require clinicians to be aware of the flags, and record them. Because the ‘Yellow Flags’ are not ‘objective’ and can’t be summed or scored, there is no way to determine a cut-off point to identify those people only just at risk. The tendency is to under-estimate those who need more assistance, and many clinicians report that they don’t feel comfortable or confident to assess ‘Yellow Flags’ in a primary health setting.

For those who can’t remember, the risk factors known to be associated with ongoing disability in people with acute low back pain are:

A: Attitudes and beliefs – e.g. catastrophising, a passive approach to rehabilitation, ‘Doctor fix me’, hurt = harm

B: Behaviours – e.g. resting for extended periods, poor sleep, using aids and appliances such as crutches or braces, inappropriate use of medication, self medication

C: Compensation – e.g. difficulty obtaining cover, inadequate or ineffective case management, multiple claims in the past, poor knowledge of what is available for assistance

D: Diagnosis/Doctor or treatment provider effects – unexplained technical language that is misunderstood, multiple diagnoses, multiple investigations, multiple ineffective treatments, assuring a ‘techno-fix’ is available, recommending changing jobs or stopping jobs

E: Emotions – anger, depression, sense of helplessness or feeling out of control, numbed emotions

F: Family and friends – unintentionally reinforcing pain behaviour, unsupportive of returning to work, punishing responses, or being socially isolated

W: Work – an employer who is unsupportive, history of frequent job changes or limited employment history, heavy manual work, monotonous work, high responsibility with limited control, shiftwork, working alone or while isolated, disliking the job

As I mentioned, these factors are well-known, and relatively easily recognised. Many people say to me that they have an intuitive ‘feel’ for those who will have trouble recovering from ALBP – but feel that if they ask about these factors, they risk ‘opening Pandora’s box’, or being unable to extricate themselves from a complex or emotionally charged situation. Many people don’t feel adequately skilled in managing the issues involved, and would prefer to either let well enough alone, or quickly refer to someone else (Crawford, Ryan & Shipton, 2007).

Well, I don’t agree with any of those options. Although sometimes people will have ‘saved up’ a lot of their concerns and want to offload with a lot of emotion, for many people it’s a simple case of exploring what their concerns are and problem-solving around the practical issues. A referral to a psychologist or counsellor isn’t always necessary, and can for some people escalate their distress and disability.

What skills can you use to identify and manage ‘Yellow Flags’?

Open-ended questions like ‘How do you feel about your recovery so far?’, ‘Are there any things that concern you about your recovery?’, ‘What do you think is going on in your back?’, ‘What do you think this [diagnosis] means for you?’

Reflective listening demonstrates two things: (1) that you are listening and (2) that you want to understand. It should be used whenever someone begins to display emotional responses. Reflective listening can be simple ‘So from what you’ve said, I think you mean….’, or more complex ‘It seems that you think your boss wants you to go back to full duties and you’re not sure you can. I wonder if you’re feeling really anxious?’ When in doubt, reflect!

Action and responsibility – Then it can be really useful to pose this question: ‘So where does this leave you?’ or ‘What do you think you need to have happen next?’

For many people the step of demonstrating your acceptance of their point of view and understanding their distress is a good start. And many ‘Yellow Flags’ can be simply influenced by the person themselves – just being given permission to think of a solution that they’re ready to do, or being asked whether they want to hear of other options allows the person to feel more in control.

And for slightly more complex situations – such as financial strain from time off work, or difficulty within a relationship because of changed roles – these can be helped by budgeting advice, bringing the partner in to the clinic to be a part of the rehabilitation, or asking the person to think of community-based self-help organisations. Intense psychological therapy isn’t always necessary, and if misguided or not consistent with other messages about engaging in activity despite pain, can impede recovery.

And if you’re REALLY pressed for time, but still want to ‘pick winners and losers’ – a study by Westman, Linton, Ohrvik, Wahlén & Leppert (2007) finds support for the use of the ‘Orebro Musculoskeletal Pain Screening Questionnaire’, which is a 25 item questionnaire covering five groups (function, pain, psychological factors, fear avoidance, and miscellaneous, which includes things like sick leave, age, gender, nationality, monotonous or heavy work and job satisfaction). It’s been used extensively in New Zealand in a compensation setting since 1999, as a screening tool to identify those who may be at increased risk of ongoing disability. This study reviews its use in sub-acute pain, with a three-year follow-up to identify the predictive validity of the instrument for sick leave.

The results are very strong – psychosocial factors as measured by OMPSQ were related to work disability and perceived health even 3 years after treatment in primary care. The screening questionnaire had discriminative power even for patients with non-acute or recurrent pain problems. The OMPSQ had better predictive power than any of the questionnaires included in the study, which included the Job Strain, the Coping Strategies Questionnaire (CSQ), the Pain Catastrophizing Scale (PCS) and the Tampa Scale for Kinesiophobia (TSK). This study shows that among the factors, pain and function are the factors most strongly related to sick leave 3 years later.
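For anyone wondering what ‘predictive power’ and a screening cut-off look like in numbers, here is a minimal sketch. It takes a set of screening scores, a record of who later ended up on long-term sick leave, and a cut-off, and works out sensitivity (the proportion of people with the outcome who were flagged) and specificity (the proportion without the outcome who were not flagged). Every score, outcome and cut-off below is invented for illustration. None of them come from the Westman study or from the questionnaire's scoring instructions, and the function name is my own.

```python
def screening_accuracy(scores, had_outcome, cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff are flagged as at risk."""
    true_pos = false_pos = true_neg = false_neg = 0
    for score, outcome in zip(scores, had_outcome):
        flagged = score >= cutoff
        if flagged and outcome:
            true_pos += 1
        elif flagged and not outcome:
            false_pos += 1
        elif not flagged and outcome:
            false_neg += 1
        else:
            true_neg += 1
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one person with and one without the outcome")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical screening scores for ten people, and whether each one
# was on long-term sick leave at follow-up (True) or not (False).
scores = [62, 75, 88, 95, 101, 110, 118, 125, 133, 140]
on_sick_leave = [False, False, False, True, False, False, True, True, False, True]

# Two hypothetical cut-offs, to show the trade-off between them.
low_cutoff_result = screening_accuracy(scores, on_sick_leave, cutoff=90)
high_cutoff_result = screening_accuracy(scores, on_sick_leave, cutoff=115)

for label, (sens, spec) in [("cut-off 90", low_cutoff_result), ("cut-off 115", high_cutoff_result)]:
    print(f"{label}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this made-up data the lower cut-off catches everyone who went on to long-term sick leave but flags half of the people who didn't, while the higher cut-off misses one person and flags far fewer unnecessarily. That trade-off is why a scored questionnaire with a documented cut-off is easier to evaluate than an unscored checklist.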

Interestingly for me, the study demonstrated that function with focus on daily living, sleep capacity and pain experience had the most powerful predictive value concerning sick leave at 3 years. While earlier studies have shown that emotional and cognitive variables such as distress and fear avoidance beliefs have been strong predictors for 6–12-month outcomes, the best predictor in this study is having problems functioning.

The authors suggest that this probably reflects the length of the follow-up and suggests that different variables may be predictive at various stages in the process of chronification. Even though it is recognized that psychological variables are influential factors, little is known about how and when these variables interact in the process toward disability. Furthermore, psychological variables might operate differently for different people and at different time points.

So, this particular instrument, which has been widely used at least within New Zealand for many years, is readily available and gives clinicians and others very useful guidance on who might benefit the most from high intensity therapeutic input early.

Westman, A., et al. (2007). Do psychosocial factors predict disability and health at a 3-year follow-up for patients with non-acute musculoskeletal pain? A validation of the Orebro Musculoskeletal Pain Screening Questionnaire. European Journal of Pain. DOI: 10.1016/j.ejpain.2007.10.007

Crawford C, Ryan K, Shipton E (2007) Exploring general practitioner identification and management of psychosocial Yellow Flags in acute low back pain, New Zealand medical journal, 120:1254, pp U2536



There are some very weird and crazy measures out there in pain assessment land… some of them take a little stretch of the imagination to work out how they were selected and what they’re meant to mean in the real world.

Functional measures are especially challenging – given that they are about what a person will do on a given day in a given setting, they are inherently prone to performance variation (test-retest reliability) and can’t really be held up as gold standards in terms of objectivity. Nevertheless, most pain management programmes are asked to provide measures of performance, and over the years I’ve seen quite a few different ones. For example, the ‘how long can you stand on one leg’ timed measure…the ‘sock test’ measure…the ‘pick up a crate from the floor and put it on a table’ measure…the ‘timed 50 m walk test’…the ‘step up test’… – and I could go on.

Some of these tests have normative data against age and gender, some even have standardised instructions (and some of these instructions are even followed!), and some even have predictive validity – but all measures beg the question – ‘why?’

I’m not being deliberately contentious here, not really… I think we as clinicians should always ask ‘why’ of ourselves and what we do, and reflect on what we do in light of new evidence over time. At the same time I know that each of us will come up with slightly different answers to the question ‘why’ depending on our professional background, experience, the purpose of the measure, and even our knowledge of scientific methodology. So, given that I’m in a thinking sort of mood, I thought I’d spend a moment or two noting down some of the thoughts I have about measures of function in a pain management setting.

  1. The first thing I’d note is that functional performance is at least in part, a measure of pain behaviour. That is, it’s about what a person is prepared to do, upon request, in a specific setting, at a certain time of day, for a certain purpose. And each person who is asked to carry out a functional task will bring a slightly different context to the functional performance task. For example, one person may want to demonstrate that their pain is ‘really bad’, another may want to ‘fake good’ because their job is on the line, another may be fearful of increased pain or harm and self-limit, while another may be keen to show ‘this new therapist just what it’s like for me with pain’. As a result, there will be variations in performance depending on the instructions given, the beliefs of the person about their pain – and about the way the assessment results will be used, and even on the gender, age and other characteristics of the therapist conducting the testing. And this is normal, and extremely difficult to control.
  2. The second is that the purpose of the functional performance testing must be clear to the therapist and the participant. Let’s look at the purpose of the test for the therapist – is it to act as a baseline before any intervention is undertaken? is it to be used diagnostically? (ie to help assess the performance style or approach to activity that the client has) is it to establish whether the participant meets certain performance criteria? (eg able to sustain manual handling safely in order to carry out a work task) is it to help the participant learn something about him or herself? (eg that this movement is safe, that this is the baseline and they are expected to improve over time etc).  And for the participant? Is this test to demonstrate that they are ‘faking’? (or do they think that’s what it’s about?) Is it to help them test out for themselves whether they are safe? Is it a baseline measure, something to improve on?  Is it something they’ve done before and know how to do, or is it something they’ve not done since before they hurt themselves? You see, I can go on!!
  3. Then the functional measures must be relevant to the purpose of the testing. It’s no use measuring ‘timed get up and go’, for example, if the purpose of the assessment is to determine whether this person with back pain can manage his or her job as a dock worker. Likewise, if it’s to help the person learn about his or her ability to approach a feared task, then it’s not helpful to have a standardised set of measures (unless this is a set that is taken pre-treatment and again at post-treatment). This means the selection of the measures should at least include consideration of predictive validity for the purpose of the test. For example, while a ‘timed get up and go’ may be predictive of falls risk in an elderly population, it may be an inappropriate measure in a young person who is being assessed for hand pain. It’s probably more useful to have a slightly inaccurate measure that measures something relevant than a highly accurate measure that measures something irrelevant. For example, we may know the normative data for (plucking something out of the air here…) ‘standing on one leg’, but unless this predicts something useful in the ‘real world’, then it may be a waste of time.
  4. Once we’ve determined a useful, hopefully predictive measure, then it’s critical that the assessment process is carried out in a standard way. That means the whole process, not just the task itself. What do I mean? Well, there are multiple influences on performance, such as time of day, presence or absence of other people, and even the way the test is measured (eg if it’s timed with a stop-watch, when is the button pushed to start? When is it pushed to stop? Is this documented so everyone carries it out exactly the same way?). There is also a phenomenon known as assessment drift (well, that’s what I call it!) where the person carrying out the assessment drifts from the original measurement criteria over time. This happens for all of us as we get more experienced, and as we forget the original instructions. Essentially we are a bit like a set of scales – we need to be calibrated just as much as any other piece of equipment. So the entire assessment needs to be documented right down to the words used, and the exact criteria used for each judgement.
  5. And finally, probably for me a plea from the heart – that the measures are recorded, analysed, repeated appropriately, and returned to the participant, along with the interpretation of the findings. This means the person being assessed gains from the process, not just the clinician, or the funder or requester of the assessment.

So over the Easter break (have a good one!), take a moment or two to think about the validity and reliability of the functional assessments you take. Know the confounds that may influence the individual’s performance and try to take these into account when interpreting the findings. Consider why you are using these specific measures, and when you were last ‘calibrated’. Make a resolution: ask yourself ‘what will this measure mean in the real world?’ And if, as I suspect most of us know, your assessments don’t reflect the reality of carrying the groceries in from the boot of the car, or pushing a supermarket trolley around a busy supermarket, or squeezing the pegs above the head to hang out the washing – well, there might be a research project in it!!