
Series: Rethinking the Likert-Scale, Part 2

Part 2: Drawbacks

What exactly are the main drawbacks of using Likert-scale items in educational settings, or in the evaluation of learning programs? And what are some ways we can address those drawbacks?

Much has been written elsewhere by scholars and evaluation specialists about the drawbacks of Likert-scale items and their interpretation. Their work is highly recommended to any practitioner or scholar who uses Likert-based items in their data collection practice (e.g., Bishop & Herron, 2015; Yeager, Bryk, Muhich, Hausman, & Morales, 2013).

To this robust thinking I’d like to add a few drawbacks I have identified, along with a suggested starting point for addressing each one, depending on the context.

Drawback #1. Likert scales are too easy for respondents to complete.

Rationale: Many would argue that Likert scales are desirable in some contexts because they are easy for respondents to complete, reducing the respondent’s cognitive load. I would offer that this is actually a drawback: it gives respondents the impression that they can quickly zip through a survey instead of taking the time to reflect and evaluate.

A Better Approach: Use fewer items that carry a greater cognitive load. Ask respondents to pause and reflect. Use “evaluative items,” shifting the burden of evaluation to the respondent.

Drawback #2. Likert scales are often non-evaluative.

Rationale: In our quest to craft Likert-scale items that are clear, short, and relatable to a broad audience, we usually pose watered-down items that demand only surface agreement and/or perceived value.

A Better Approach: Embed evaluative language into the items. Replace agreement scales with meaningful descriptors that capture the progression of expected outcomes.

Drawback #3. Long-form instruments are not always practical.

Rationale: Long-form, battery-style tools developed for research purposes are not always suitable for practical contexts. The statistical assumptions that must be met (e.g., normality, independence, adequate sample size, homogeneity, continuous variables) before comparing item scores to each other or evaluating performance over time are seldom checked.

A Better Approach: Use short-form or single-item measures that align closely with the key constructs of interest. Measure fewer items using varied scales.

Drawback #4. Mean scores and percent agreement are relatively meaningless.

Rationale: Reporting mean scores for ordinal data is one of the seven deadly sins of statistical analysis. Reporting percent agreement is a better way to interpret ordinal data, but it rarely yields much variation between indicators (for comparative purposes) or over time (pre-post). Most reports of percent agreement land between 65% and 85% agreement (top two boxes), every single time.

A Better Approach: Use frequency distributions to report scaled data. Replace agreement scales with a 3-5 point range of meaningful descriptors that capture the progression of expected outcomes. Use visual analog scales for ratings that need a larger range, applying a single descriptor to the scale.

Drawback #5. Likert scales don’t play nice with longitudinal or pre-post administrations.

Rationale: Using Likert scales in a pre-post program evaluation design ignores the effect of response shift bias, which occurs when participants overrate their performance before an intervention due to unconscious incompetence. As a result, Likert-scaled instruments are less likely to detect change from pre to post, particularly for attributes that are less malleable.

A Better Approach: Use a retrospective pretest design with an evaluative scale at program completion, collecting respondents’ ratings for both before and after program participation in a single administration.

Drawback #6. Frequency data from Likert scales are seldom comparable.

Rationale: Likert-scale items that ask about the frequency of use or implementation of a key practice are seldom written with the theoretically desired frequencies in mind. Some items may have a different desired frequency than others, so reports that display multiple frequency items side by side can distort how we interpret frequency-scaled data.

A Better Approach: Get clear about whether frequency is what you intend to measure, or whether a scale of quality would be better suited to each indicator. Where frequency is desired, group all indicators with a similar desired frequency in a single block, or report frequency items individually.

The Surveys of the Future

Likert-scale instruments have certainly enabled researchers to make great strides in academic studies where longer batteries of items are useful for measuring latent constructs, or where a large sample size is required for benchmarking performance across contexts. However, many scholars and evaluation specialists have long advocated for alternative item designs and administration protocols built for practical applications.

The surveys of the future will draw from better approaches to item design, which will result in better analysis, reporting, interpretation, and use.

In Part 3 of this blog series, we will take a peek at a few item designs that boost the meaning and value of our self-report tools. 

References

Bishop, P. A., & Herron, R. L. (2015). Use and misuse of the Likert item responses and other ordinal measures. International Journal of Exercise Science, 8(3), 297.

Yeager, D., Bryk, A., Muhich, J., Hausman, H., & Morales, L. (2013). Practical measurement. Palo Alto, CA: Carnegie Foundation for the Advancement of Teaching.
