Measurement in pain is complicated. Firstly, it’s an experience, so inherently subjective – how do we measure “taste”, for example? Or “joy”? Secondly, there’s so much riding on its measurement: how much pain relief a person gets, whether a treatment has been successful, whether a person is thought sick enough to be excused from working, whether a person even gets treatment at all…
And on top of these, given it’s so important and we have to use surrogate ways to measure the unmeasurable, we have the language of assessment. In physiotherapy practice, what the person says is called “subjective” while the measurements the clinician takes are called “objective” – as if, because they are conducted by a clinician using instruments, they’re unbiased, or “not influenced by personal feelings or opinions in considering and representing facts”. Subjective, in this instance, is defined by Merriam-Webster as “relating to the way a person experiences things in his or her own mind; based on feelings or opinions rather than facts.” Of course, we know that variability exists between clinicians even when carrying out seemingly “objective” tests of, for example, range of movement or muscle strength, when interpreting radiological images, or even when conducting a Timed Up and Go test (there is a very good review of this common functional test).
In the latest issue of Pain, Professor Stephen Morley reflects on bias and reliability in pain ratings, reminding us that “measurement of psychological variables is an interaction between the individual, the test material, and the context in which the measure is taken” (Morley, 2016). While there are many ways formal testing can be standardised to reduce the amount of bias, it doesn’t completely remove the variability inherent in a measurement situation.
Morley was providing commentary on a study published in the same journal, a study in which participants were given training and prompts each day when they were asked to rate their pain. Actually, three groups were compared: a group without training, a group with training but no prompts, and a group with training and daily prompts (Smith, Amtmann, Askew, Gewandter et al, 2016). The hypothesis was that people given training would provide more consistent pain ratings than those who weren’t. But no, in another twist to the pain story, the results showed that during the first post-training week, participants with training were less reliable than those who simply gave a rating as usual.
Morley considers two possible explanations for this. The first relates to the whole notion of reliability. Reliability is about identifying how much of the variability in scores is due to the test being a bit inaccurate versus how much is due to real variability in the thing being measured, on the assumption that errors are only random. So perhaps one problem is that pain intensity does vary a great deal from day to day. The second relates to the way people make judgements about their own pain intensity. Smith and colleagues identify two main sources of bias (bias = systematic error): scale anchoring effects (the idea being that giving people a set word or concept to “anchor” their ratings might reduce the tendency to wander off and report pain based only on emotion, setting or memory), and daily variations in context that might also influence pain ratings. Smith and colleagues believed that by providing anchors between least and “worst imaginable pain”, they’d be able to guide people to reflect on these same imagined experiences each day, that these imagined experiences would be pretty stable, and that people could compare what they were actually experiencing at the time with these imagined pain intensities.
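To make the reliability idea concrete, here is a minimal Python sketch. It is not the analysis used by Smith and colleagues; it is a generic one-way intraclass correlation (ICC), and the function name `icc_oneway` and all the ratings are made up for illustration. It shows how a reliability coefficient drops when ratings vary within a person from day to day, whether that variation comes from measurement error or from pain that really does change.

```python
from statistics import mean


def icc_oneway(ratings):
    """One-way random-effects ICC for a people x days table of ratings.

    Each inner list holds one person's daily ratings. The coefficient
    compares variability between people with variability within each
    person. Within-person variability lumps together measurement error
    and real day-to-day change in pain, which is why genuinely variable
    pain lowers the coefficient.
    """
    n = len(ratings)       # number of people
    k = len(ratings[0])    # ratings per person
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 people with the same number (>= 2) of ratings")

    grand_mean = mean([x for row in ratings for x in row])
    person_means = [mean(row) for row in ratings]

    # Mean square between people and mean square within people.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(ratings, person_means) for x in row
    ) / (n * (k - 1))

    denominator = ms_between + (k - 1) * ms_within
    if denominator == 0:
        raise ValueError("all ratings are identical; reliability is undefined")
    return (ms_between - ms_within) / denominator


# Hypothetical 0-10 ratings: three people, three days each.
steady = [[2, 2, 2], [5, 5, 5], [8, 8, 8]]      # no day-to-day change
variable = [[0, 2, 4], [3, 5, 7], [6, 8, 10]]   # same averages, more daily change

print(icc_oneway(steady))
print(icc_oneway(variable))
```

Both made-up groups have the same average pain per person, yet the second gives a lower coefficient purely because the ratings move around more within each person. The number alone cannot say whether that movement is error or real change.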
But, and it’s a big but, how do people scale and remember pain? And as Morley asks, “What aspect of the imagined pain is reimagined and used as an anchor at the point of rating?” He points out that re-experiencing the somatosensory-intensity aspect of pain is rare (though people can remember the context in which they experienced that pain, and they can give a summative evaluative assessment such as “oh it was horrible”). Smith and colleagues’ study attempted to control for contextual effects by asking people to reflect only on intensity and duration, and only on pain intensity rather than other associated experiences such as fatigue or stress. This, it must be said, is pretty darned impossible, and Morley again points out that the “peak-end” phenomenon (our remembered estimate of pain intensity is weighted towards the worst moment and the final moment of an experience), together with how long we think an experience might go on, disparities between what we expect and what we actually feel, and differences between each of us, will bias self-report.
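As a rough illustration of why a remembered rating can drift from what was actually felt, here is a small Python sketch. It assumes the common simplification of the peak-end rule, in which a remembered rating is the average of the worst moment and the final moment. That formula, the function names and the ratings are illustrative assumptions and do not come from Morley or from Smith and colleagues.

```python
def average_summary(ratings):
    """Plain average of every momentary rating."""
    if not ratings:
        raise ValueError("need at least one rating")
    return sum(ratings) / len(ratings)


def peak_end_summary(ratings):
    """Simplified peak-end estimate: mean of the worst and the last rating."""
    if not ratings:
        raise ValueError("need at least one rating")
    return (max(ratings) + ratings[-1]) / 2


# Hypothetical momentary 0-10 ratings across one day, with a brief flare.
day = [2, 3, 8, 3, 2]

print(average_summary(day))   # what was felt on average
print(peak_end_summary(day))  # what a peak-end style memory would report
```

In this made-up day the brief flare pulls the peak-end estimate well above the plain average, which is the kind of systematic bias a single end-of-day rating could carry.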
Smith et al. (2016) carefully review and discuss their findings, and I strongly encourage readers to read the entire paper themselves. This is important stuff – even though this was an approach designed to help improve pain intensity measurement within treatment trials, what it tells us is that our understanding of pain intensity measurement needs more work, and that some of our assumptions about measuring our pain experience using a simple numeric rating scale might be challenged. The study participants were people living with chronic pain, and their experiences may be different from those of people with acute pain (e.g. post-surgical pain). The training did appear to help people correctly rank their daily ratings of least pain, average pain, and worst pain.
What can we learn from this study? I think it’s a good reminder to us to think about our assumptions about ANY kind of measurement in pain, including what we observe, what we do when carrying out pain assessments, and the influences we don’t yet know about on pain intensity ratings.
Morley, S. (2016). Bias and reliability in pain ratings. Pain, 157(5), 993-994.
Smith, S. M., Amtmann, D., Askew, R. L., Gewandter, J. S., Hunsinger, M., Jensen, M. P., . . . Dworkin, R. H. (2016). Pain intensity rating training: Results from an exploratory study of the ACTTION PROTECCT system. Pain, 157(5), 1056-1064.