Let’s Bust Up Our Likert-scale Surveys and Measure Program Outcomes

In developing programs aimed at changing systems, structures, or behaviors, we often rely on self-report surveys to help us measure progress. We typically use Likert or other numeric scales to gather input from program participants or their beneficiaries to quickly assess their perceptions, then use those data to evaluate program effectiveness or identify opportunities for improvement.

While surveys certainly have their advantages, and there are times when they are the right tool, such measures can lead us astray in many ways when we are evaluating our progress in systems change. Let’s consider the following case study to see how alternative item types can help us get clearer about our program outcomes and better measure and assess our progress toward achieving them.

A Case Study

Let’s pretend we have developed a program to educate instructional leaders. One of the outcomes we have for the program is that leaders will be able to collaborate with their staff to develop a school-level vision and values statement, communicate the vision and values to all stakeholders, and celebrate when staff demonstrates evidence of supporting them. We would like to measure the extent to which leaders have established such a process in their schools, and ask instructional leaders to use these data to make improvements. So we create a survey aimed at gathering staff perceptions about their leaders.

We would probably build this survey using Likert-scaled items that look something like this:

Sample instrument: leadership development of school vision and values
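
Since the sample instrument itself isn’t reproduced here, the sketch below shows roughly what it might contain. The indicator wording is reconstructed from the program outcomes described above, and the 5-point agreement scale is an assumption, not the actual survey.

```python
# A minimal sketch of the hypothetical instrument, assuming a standard 5-point
# agreement scale. Indicator wording is reconstructed from the outcomes above,
# not copied from the actual survey.
LIKERT_SCALE = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

INDICATORS = [
    "Our leaders develop or revise the vision for our school on an annual basis.",
    "Our leaders encourage staff to participate in developing a vision and guiding values for our school.",
    "Our leaders communicate the vision and values to staff and stakeholders.",
    "Our leaders celebrate evidence of staff support of the vision and values of our school.",
]

# Print the instrument roughly as a respondent would see it.
for number, item in enumerate(INDICATORS, start=1):
    print(f"{number}. {item}")
    print("   " + " / ".join(LIKERT_SCALE))
```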

After administering the survey, we reported the results using a visual like the one below. First, we ranked the items from high to low according to the highest percent agreement, then created a stacked bar chart to highlight the frequency of respondents who indicated each level of agreement across the scale. We used special colors to call out the total percent agreement (strongly agree plus agree) for each indicator. 

Sample report: leadership development of school vision and values
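
As a rough sketch, a report like this could be produced with pandas and matplotlib. The code below assumes a DataFrame `responses` with one row per respondent and one column per indicator, where each cell holds one of the agreement labels; the column layout, colors, and function names are illustrative, not the actual reporting tool.

```python
# A rough sketch of the stacked-bar report, assuming `responses` is a DataFrame
# with one row per respondent and one column per indicator, where each cell
# holds an agreement label. Data, colors, and layout are illustrative.
import pandas as pd
import matplotlib.pyplot as plt

SCALE = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

def percent_by_level(responses: pd.DataFrame) -> pd.DataFrame:
    """Percent of respondents at each agreement level, per indicator."""
    pct = responses.apply(lambda col: col.value_counts(normalize=True) * 100)
    return pct.reindex(SCALE).fillna(0).T  # rows = indicators, columns = levels

def plot_report(responses: pd.DataFrame) -> None:
    pct = percent_by_level(responses)
    # Rank indicators by total percent agreement (agree + strongly agree);
    # ascending sort puts the highest-agreement indicator at the top of the chart.
    pct = (pct.assign(_agree=pct["Agree"] + pct["Strongly agree"])
              .sort_values("_agree")
              .drop(columns="_agree"))
    # Grays for disagreement and neutral, blues to call out the agreement levels.
    colors = ["#d9d9d9", "#bdbdbd", "#f0f0f0", "#74add1", "#2166ac"]
    pct.plot(kind="barh", stacked=True, color=colors, figsize=(8, 4))
    plt.xlabel("Percent of respondents")
    plt.legend(bbox_to_anchor=(1.02, 1), loc="upper left")
    plt.tight_layout()
    plt.show()
```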

Drawbacks to an Atheoretical Approach to Measuring Program Outcomes

Once these data are returned, how would our instructional leaders interpret and act upon them?

One of the most common approaches to data interpretation in today’s data culture is to identify which indicators scored the highest and pat ourselves on the back for our achievement, then identify the 2-3 indicators that scored the lowest and establish plans to improve in those areas. Using this approach to action planning with our sample data, our leaders may be tempted to dive into a plan to improve how they communicate the vision and values to stakeholders or how they celebrate evidence of staff support of the vision and values in the school (the two lowest-scoring indicators).

While this method is an improvement over reporting mean scores for Likert-scaled data, such an approach to measurement, visualization, and interpretation still fails to take into account many key elements of our program’s theory. In other words, this approach to measurement and interpretation of findings is extremely atheoretical. Because of the way we identified our program outcomes, and then listed, presented, and measured them as indicators, we unintentionally communicated our program theory in a way that will affect how program beneficiaries interpret and act on the data. In this case, we inadvertently communicated that the outcomes we measured:

  • Are equally important to our program’s guiding theory,
  • Carry equal weight in what is important to measure and are equally relevant for all stakeholders, regardless of their level of maturity,
  • Are equally likely to reflect a desired use case in most contexts,
  • May be implemented in isolation from each other, and
  • May be equally easy or difficult to implement in the journey toward excellence.

In essence, the way we measured our outcomes muddied the waters about what leaders should do next in their journey. We measured “around the veneer” of our program outcomes instead of “at the root” of our program theory and the actual outcomes we expect to see once participants translate their learning into their own contexts.

Toward a Theoretical Approach to Measuring Program Outcomes

It is important that our measures of program outcomes better communicate the way in which we expect our program theory to operate in the participants’ target contexts. By getting clear about how a participant is likely to apply program outcomes along a theoretical progression of practice (Danks & Allen, 2014), we can support program participants to better interpret and more meaningfully act upon the data.

Let’s go back to our case study and see if we can measure these program outcomes in a way that better communicates our program theory.

To help us rework these indicators, it is useful to first review a few questions about our program theory:

  • Are there any outcomes that are more important than others?
  • Are there any outcomes that should carry more weight in what is important to measure, particularly those that are relevant for all stakeholders, regardless of their level of maturity?
  • Are there any outcomes that reflect a desired use case in most contexts? On the flipside, are there any that are not critical for participant success?
  • Are there any outcomes that should not be implemented in isolation from each other? Or are there indicators that have a tendency to go together in the typical progression of practice?
  • Are there any outcomes that are simply more challenging to implement than the others?

In reviewing our program outcomes, theory of action, level of difficulty, and the typical readiness of most program participants to embark upon this work, we may notice that some of the actions take place at an earlier phase in the journey than others. We also may notice that some are more challenging to implement than others, or that some are more important to do in all contexts and at each step of the journey. We may opt to group our indicators into a progression of practice, like this:

Key program outcomes grouped into a progression of practice

Level 1 – Initiating
  • Our leaders develop or revise the vision for our school on an annual basis.

Level 2 – Organizing
  • Our leaders encourage staff to participate in the revision of the school vision.
  • Our leaders communicate the vision and values to staff.

Level 3 – Implementing
  • Our leaders encourage staff to participate in developing a vision and guiding values for our school.
  • Our leaders communicate the vision and values to staff and stakeholders.

Level 4 – Integrating
  • Our leaders engage staff to lead the development of a vision and guiding values for our school.
  • Our leaders communicate the vision and values to staff and stakeholders.
  • Our leaders celebrate evidence of staff support of the vision and values of our school.

From ARKEN, 2020. Sustainable Systems Checkup.
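
For anyone wiring this progression into a survey or reporting tool, it can be encoded as a simple ordered structure. The sketch below mirrors the table above; the particular data structure is just one illustrative choice, not part of the source model.

```python
# One way to encode the progression of practice above as a simple data
# structure, so a survey tool or report script can reason about level order.
PROGRESSION = {
    1: ("Initiating", [
        "Our leaders develop or revise the vision for our school on an annual basis.",
    ]),
    2: ("Organizing", [
        "Our leaders encourage staff to participate in the revision of the school vision.",
        "Our leaders communicate the vision and values to staff.",
    ]),
    3: ("Implementing", [
        "Our leaders encourage staff to participate in developing a vision and guiding values for our school.",
        "Our leaders communicate the vision and values to staff and stakeholders.",
    ]),
    4: ("Integrating", [
        "Our leaders engage staff to lead the development of a vision and guiding values for our school.",
        "Our leaders communicate the vision and values to staff and stakeholders.",
        "Our leaders celebrate evidence of staff support of the vision and values of our school.",
    ]),
}
```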

When we group our outcomes into a progression of practice, we don’t need to worry about whether we have a fancy label to describe each level on the progression, though the best-built progressions do typically align with findings from the implementation sciences and/or relevant models of change. But it is important that the progression communicates which indicators:

  • are the easiest to accomplish, 
  • should be implemented regardless of the context, 
  • tend to go hand in hand on the journey, and
  • are more important and carry more weight than others.

The progression of practice from our case study could be administered using a survey-based tool like this:

Revised instrument: leadership development of school vision and values
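
One way a survey tool might render this as a single select-one item is sketched below. The prompt wording is an assumption, and the level descriptions are condensed from the full progression above rather than taken from the actual instrument.

```python
# A sketch of how the progression could be presented as a single select-one
# item. The prompt wording is an assumption, and the level descriptions are
# condensed from the full progression above.
PROMPT = "Select the level that best describes your leaders' current practice:"

LEVEL_DESCRIPTIONS = [
    ("1 - Initiating", "Leaders develop or revise the school vision annually."),
    ("2 - Organizing", "Leaders encourage staff to participate in revising the vision and communicate it to staff."),
    ("3 - Implementing", "Leaders encourage staff to participate in developing the vision and values and communicate them to staff and stakeholders."),
    ("4 - Integrating", "Leaders engage staff to lead development of the vision and values, communicate them widely, and celebrate evidence of staff support."),
]

print(PROMPT)
for label, description in LEVEL_DESCRIPTIONS:
    print(f"  ( ) {label}: {description}")
```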

After administering this item as a progression of practice, we could visualize the results in a similar manner:

Revised report: leadership development of school vision and values
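
As a sketch, the revised report could be as simple as a count of respondents at each level. The code below assumes each respondent selects the single level that best describes their leaders; the response format and data are assumptions, not the actual tool.

```python
# A minimal sketch of the revised report, assuming each respondent selects the
# single level that best describes their leaders. Data shown is illustrative.
import pandas as pd
import matplotlib.pyplot as plt

LEVELS = ["1 - Initiating", "2 - Organizing", "3 - Implementing", "4 - Integrating"]

def plot_progression_report(level_choices: pd.Series) -> None:
    """Bar chart of how many respondents placed their school at each level."""
    counts = level_choices.value_counts().reindex(LEVELS).fillna(0)
    counts.plot(kind="barh", color="#2166ac", figsize=(7, 3))
    plt.xlabel("Number of respondents")
    plt.tight_layout()
    plt.show()

# Example usage with made-up responses:
# plot_progression_report(pd.Series(
#     ["2 - Organizing", "3 - Implementing", "2 - Organizing", "1 - Initiating"]))
```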

The Benefits of Measuring the Progression of Practice

Measuring and visualizing the data in this manner accomplishes many things. First, it reduces complexity so that the program developer or instructional leader interpreting the data can zero in on a single indicator. To act upon the data, the leader would first identify which level on the progression of practice received the greatest number of responses to identify their current “level,” then prioritize the actions needed to achieve the next level on the progression. This method of interpretation is much more manageable than reviewing abstract percent agreements across multiple indicators. While the leader may still need to take multiple action steps to achieve the next level on the progression, those steps have been grouped into items that are meaningfully linked to the program’s theory of change or maturity.
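
The interpretation rule described above is simple enough to express directly. Here is a minimal sketch, assuming responses are recorded as level names from the case-study progression; the function and variable names are illustrative.

```python
# A sketch of the interpretation rule described above: take the level with the
# most responses as the school's current level, then target the next level on
# the progression. Level labels mirror the case-study progression.
from collections import Counter

LEVEL_ORDER = ["Initiating", "Organizing", "Implementing", "Integrating"]

def current_and_next_level(level_choices):
    """Return (current level by most responses, next level to prioritize or None)."""
    current = Counter(level_choices).most_common(1)[0][0]
    idx = LEVEL_ORDER.index(current)
    next_level = LEVEL_ORDER[idx + 1] if idx + 1 < len(LEVEL_ORDER) else None
    return current, next_level

# Example: most staff place the school at "Organizing", so the leader would
# prioritize the actions needed to reach "Implementing".
print(current_and_next_level(["Organizing", "Organizing", "Implementing", "Initiating"]))
```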

Wrap-Up

Program developers have theories, whether implicit or explicit, about how their programs should work and how their participants should translate their learning into practice. When we use measurement tools that ask respondents to rate each and every thing we value about the program, and then ask program participants to interpret those ratings side by side in an atheoretical manner, we impose the burden of the program theory upon the interpreter of the data. When we ask participants to interpret and take action based on data from Likert-scale surveys, we are asking them to re-theorize about which elements of the program are most important to act upon, which may reduce the potency of the program as participants adapt it in their contexts over time.

By embedding progression of practice models into our measurement tools, we can clearly communicate how the program is intended to operate, reduce the data analysis burden on program participants, and make action planning simpler and more meaningful for all. 

Let’s bust up our Likert-scale surveys and measure in a way that supports our program theory and tells the story of impact over time!

Reference

Danks, S., & Allen, J. (2014). Performance‐Based Rubrics for Measuring Organizational Strategy and Program Implementation. Performance Improvement Quarterly, 27(1), 33-49.
