Throughout the history of psychometric assessment, we have relied heavily on structured responses, such as multiple-choice questions, because they are easy to score and make the assessment process scalable. But they are artificial, low-fidelity evaluations of skills. Technology, however, is creating opportunities to think differently about the questions we ask, how we ask them, and how we evaluate the responses.
But we are not delivering on the promise technology offers to truly innovate in assessment design and delivery. Yes, small steps have been taken, but our industry is slow to adopt anything that is truly different or that challenges the status quo, and that slowness is likely to be to our detriment. We need to think big; even if we can't implement those big ideas, by thinking big we can leap toward innovation in ways that fundamentally change how we approach assessment.
Prior to Covid-19, there was already a fair amount of criticism and skepticism about the assessment industry, and the pandemic has brought this sentiment into stark relief, underscoring the smallness of the steps we have taken as an industry to leverage technology to change how we assess people. Many people who have been opposed to assessment see this as an opportunity to opt out on a grand scale; see, for example, the University of California's recent decision to phase the ACT and SAT out of its admissions process by 2025.
The risks that Covid has magnified include the risk that our audience will decide objective measurement is irrelevant, easily replaced, or not worth the associated costs. In addition, our reliance on item formats, development processes, analytics, test delivery, and psychometrics that have not evolved to accommodate today's technologies, big data, the declining importance of recall, and the rising importance of non-knowledge-based skills will undermine the testing industry. To address these risks, we need to understand customer needs deeply and create appropriate assessments… multiple-choice questions will not meet this need.
So, we must rethink our approach to assessment.
Emerging technologies, such as machine learning, artificial and ambient intelligence, gaming, animation, virtual reality, speech/gesture/gaze/voice recognition, blockchain, and bots, just to name a few, can be harnessed to change the world of assessment.
Let’s take a look at how this could happen, starting with job task analysis, position analysis, competency analysis… whatever you call it. Quite honestly, this may be one of the easiest places to radically change assessment development, and it may even be easier than we think.
Leveraging telemetry, we can truly understand how people are performing their jobs. Today, we rely on (often unreliable) subject matter experts to help us define the job role, tasks performed, and skills and abilities needed for success. However, research consistently shows that true experts in a job often forget to include some of the basic tasks they perform, or what it was like to be new in the role, resulting in a list of skills and abilities that misses some of the critical job tasks we should be evaluating on our exams.
If we use telemetry to understand what people are really doing in their jobs and align that to their skill level, we will have a better understanding of what we should be assessing on our exams to determine competence; we will understand the criticality and importance of each task through a data-driven process rather than the subjective consensus of SMEs. Constantly monitoring this telemetry will help us identify new and emerging tasks and skills that are needed for success and tell us when to add those skills to our assessment process. It would also reveal how frequently tasks are performed and what the outcomes are when they are not performed correctly. Further, it can help us build a learning culture, identifying common mistakes and providing learning opportunities in the moment.
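As a minimal sketch of the idea, telemetry events could be aggregated into per-task frequency and error-rate statistics that feed a data-driven JTA. The event schema, task names, and the use of error rate as a criticality proxy are illustrative assumptions, not an established standard:

```python
from collections import Counter, defaultdict

def summarize_telemetry(events):
    """Aggregate raw telemetry events into per-task statistics.

    Each event is a dict like {"task": str, "outcome": "success" | "error"}.
    (Hypothetical schema, for illustration only.)
    """
    counts = Counter()
    errors = defaultdict(int)
    for e in events:
        counts[e["task"]] += 1
        if e["outcome"] == "error":
            errors[e["task"]] += 1
    total = sum(counts.values())
    summary = {}
    for task, n in counts.items():
        summary[task] = {
            "frequency": n / total,          # how often the task is performed
            "error_rate": errors[task] / n,  # rough proxy for criticality/risk
        }
    return summary

# Illustrative events; real telemetry would stream from the work environment.
events = [
    {"task": "configure_backup", "outcome": "success"},
    {"task": "configure_backup", "outcome": "error"},
    {"task": "restore_data", "outcome": "success"},
    {"task": "restore_data", "outcome": "success"},
]
print(summarize_telemetry(events))
```

Run continuously, the same aggregation would surface new tasks as they appear in the stream, rather than waiting for the next SME workshop.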
Certainly, SMEs will still need to refine the skills into something coherent that we can use for training and exam development, but by building a JTA on telemetry, we will have a much more accurate foundation for our assessment process. Not only will it help us quickly identify emerging tasks and skills and surface important but infrequent tasks, it may allow us to redefine what competence means by helping us better understand aspects of skill that have largely been ignored, such as the elegance and efficiency of a solution, the quality of the outcome, and the speed of implementation or problem solving.
Assessment development can also change radically. Imagine a tool that "reads" text, "watches" videos, "completes" tutorials, and then identifies the instructional nature of the content and its key concepts to build a series of questions that assess someone's understanding, skill, or ability in minutes or even seconds.
This goes beyond automatic item generation, which has been around for years, to automatic, on-the-fly assessment development.
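To make the contrast concrete, here is a sketch of the template-based automatic item generation that has been around for years: an item model (a stem with variable slots) is instantiated into many unique items. The item model and answer logic are invented for illustration:

```python
import itertools
import random

# A simple item model: a stem template plus variable slots, the core idea
# behind template-based automatic item generation (AIG).
ITEM_MODEL = {
    "stem": ("A server handles {rate} requests per second. "
             "How many requests does it handle in {seconds} seconds?"),
    "slots": {"rate": [50, 100, 200], "seconds": [10, 30, 60]},
}

def generate_items(model, n, seed=0):
    """Instantiate n unique items from an item model."""
    rng = random.Random(seed)
    combos = list(itertools.product(*model["slots"].values()))
    rng.shuffle(combos)
    items = []
    for combo in combos[:n]:
        values = dict(zip(model["slots"], combo))
        key = values["rate"] * values["seconds"]  # the correct answer
        # Plausible-but-wrong options derived from common errors.
        distractors = [key - values["rate"], key + values["rate"], key * 2]
        items.append({"stem": model["stem"].format(**values),
                      "key": key, "distractors": distractors})
    return items

for item in generate_items(ITEM_MODEL, 2):
    print(item["stem"], "->", item["key"])
```

The on-the-fly vision described above would replace the hand-authored template with content that the tool derives itself from text, video, or tutorials.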
Think about how work in your organization has changed in a virtual world in the wake of Covid-19. For example, your employees are certainly having more interactions via email, chat, or text. How has that changed the dynamic of these interactions? Other than virtual meetings, how else has work changed for your employees? Have your assessments, whether for selection or for training and development, changed to reflect this new reality? Are your assessments evaluating the skills needed for success in a virtual world? In most cases, our current testing models are not flexible enough to accommodate disruptions that fundamentally change the job. We need to think differently about our approach to assessment development.
Ultimately, to counter the risk that people will begin to see assessment as irrelevant, we need to create experiences that match their expectations in a world where technology is embedded in everything they do. Taking a multiple-choice assessment is "old school," if I may use that pun. Leveraging technology, we can deliver assessments that are more interactive, whether through interviews with bots or through more reliable and scalable scoring of verbal and written responses; we could even design a "choose your own adventure" style of assessment based on personal needs or known strengths and weaknesses. Virtual reality and similar technologies that immerse the candidate in the experience offer more fidelity and feel less like a test than multiple-choice questions.
In addition, advances in technology enable us to think differently about how we evaluate skills such as collaboration, communication, and writing, especially if we consider the types of data we can capture in a virtual world. As long as we are intentional about the data we collect, we can design an evaluation based on meaningful patterns of behavior that help us better understand a test taker's engagement, collaboration abilities, and communication styles.
One example of a company leveraging the ubiquity of technology and the increasing use of digital assistants in assessment is OpenEyes Technologies. Recognizing that people want convenient assessment solutions that happen where they are, the company is working on an AI natural language data collection platform for assessment, surveying, and employment screening.
This solution is designed so that test takers, whether students, job applicants, or candidates for certification or licensure, will not need to leave the comfort of their homes or offices; they won't even need to sit in front of a computer to take the assessment. The digital assistant will ask the questions, the test taker will respond verbally, and the assistant will record the answer and move to the next question. Imagine the possibilities…
Cognitive services and related technologies leverage powerful algorithms to see, hear, speak, understand, and interpret our needs using natural methods of communication, including emotion and sentiment detection, image and speech recognition, language understanding, and semantic engines. They will make it easier and more accurate to assess the soft skills that are of critical importance to organizations. They may also improve and automate the localization of our assessment content.
These services allow testing programs to move from structured-response questions to assessment processes that are more open-ended in nature (e.g., interviews, written documentation, portfolio reviews, projects) in scalable, repeatable, cost-effective ways. If we are intentional in identifying the data that is critical to understanding engagement, collaboration, communication, and the like, cognitive services open a whole new world for soft-skills assessment.
Finally, technology is allowing us to think about other ways to assess relevant skills and abilities indirectly. Could something that evaluates physiological measures, such as modeling of eye movements or keystrokes, be a good predictor of communication or other soft skills that today we evaluate in low-tech ways? Maybe… the Canadian Medical Board is currently doing promising research on this idea.
Let's talk a bit about how technology can move assessment delivery beyond computer adaptive testing. Probabilistic modeling allows computers to account for uncertainty and estimate the likelihood that a test taker will complete a task or answer a question correctly. Leveraging technology, we can take computer adaptive testing one step further and present the "right" task or question at the "right" time to determine competence more quickly (and potentially foster an ongoing learning journey, but more on that later). This would be a more effective approach to computer adaptive testing because it would use the item pool more efficiently.
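The probabilistic core of adaptive delivery can be sketched with a standard two-parameter IRT model: estimate the probability of a correct response at the current ability estimate, then pick the item that is most informative there. The item pool and parameter values below are invented for illustration:

```python
import math

def prob_correct(theta, a, b):
    """2PL IRT model: probability of a correct response at ability theta,
    given item discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = prob_correct(theta, a, b)
    return a * a * p * (1 - p)

def next_item(theta, pool):
    """Present the 'right' item: the one most informative at the
    current ability estimate."""
    return max(pool, key=lambda item: item_information(theta, item["a"], item["b"]))

# Hypothetical item pool.
pool = [
    {"id": "easy",   "a": 1.0, "b": -1.5},
    {"id": "medium", "a": 1.2, "b":  0.0},
    {"id": "hard",   "a": 1.0, "b":  1.5},
]
print(next_item(0.1, pool)["id"])  # a mid-difficulty item suits theta near 0
```

After each response, theta would be re-estimated and the selection repeated, which is what lets an adaptive exam converge on a competence decision with fewer items.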
Going one step further, imagine a tool that generates items in real time during the assessment, based on the skills the test taker has demonstrated up to that point. Every exam becomes a unique, accurate, and more efficient evaluation of skills.
Now, let's turn to psychometrics. Measurement, psychometrics, and exams as we know them today have been around for millennia; we have evidence of similar approaches to assessment being used in China and ancient Greece thousands of years ago for job placement and educational purposes. Psychometrics itself is about 150 years old if you go back to the work of Galton, and even our beloved classical test theory and IRT models are now 70+ years old. We are dealing with a legacy exam design process, analyzed using legacy methods, that has not evolved to keep pace with technology, changing educational models, the explosion of knowledge, and disruptive factors like Covid-19. We are still largely measuring crystallized knowledge (what I know) at a fixed point in time, which is in direct conflict with the fact that knowledge is doubling every few months in some fields and in less than a year in nearly all fields. It is insufficient to measure only what is justifiably considered "foundational" knowledge. As a testing industry, we have not shown the world that we can assess skills in modern and relevant ways, and much of this comes back to the psychometrics that underlie our ability to say whether an item or assessment is valid and reliable. We need to completely rethink our approach to psychometrics to better reflect what and how we should be assessing, especially given the doors that technological advancements have opened for us.
As a timely example, as more and more people take assessments from home, I am starting to question the emphasis we place on standardized administration. Don't get me wrong, standardization is important, but our administration models still assume candidates show up to test centers with a fixed set of equipment that the test provider can completely control. With more testing from home, it is becoming painfully clear that equitable access matters more and more. There is a growing question of equality vs. equity: which matters more in times of disruption? Equity may matter just as much, and the rigidity of standardization may be in direct conflict with it.
Our current psychometric models cannot handle the changes technology allows in assessment or our need to change in times of disruption. They cannot handle approaches that create assessments on the fly, as I proposed above, telemetry-based evaluations, or solutions that assess everything there is to measure about a skill with a single "item." Perhaps Covid-19 will be the impetus we need to rethink psychometrics as well.
I leave you with this big idea. Imagine a world without tests or exams—at least without the traditional definition of a test or exam, because objective evaluation of skills will always be necessary.
To truly understand whether someone has the skills to be successful in a job, we should assess people as they do something as close as possible to that job. When possible, this means designing "in work" assessments that evaluate skills as people are doing their jobs. xAPI or similar telemetry could be used to determine whether someone has the skills needed to be considered competent and to identify any weaknesses.
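A rough sketch of how xAPI-style statements (actor / verb / object / result) could feed an "in work" competence check follows. The verbs, skill mapping, and the 80% success threshold are illustrative assumptions, not part of the xAPI specification:

```python
# Hypothetical mastery threshold for this illustration.
PASS_THRESHOLD = 0.8

def competence_report(statements, skill_map):
    """Map xAPI-like statements onto skills and flag weaknesses."""
    stats = {skill: {"attempts": 0, "successes": 0} for skill in set(skill_map.values())}
    for s in statements:
        skill = skill_map.get(s["object"])
        if skill is None:
            continue  # activity not mapped to any assessed skill
        stats[skill]["attempts"] += 1
        if s["result"]["success"]:
            stats[skill]["successes"] += 1
    report = {}
    for skill, st in stats.items():
        rate = st["successes"] / st["attempts"] if st["attempts"] else 0.0
        report[skill] = {"success_rate": rate, "competent": rate >= PASS_THRESHOLD}
    return report

# Invented mapping from work activities to skills, and sample statements.
skill_map = {"task/deploy": "deployment", "task/rollback": "incident-response"}
statements = [
    {"actor": "alice", "verb": "completed", "object": "task/deploy",   "result": {"success": True}},
    {"actor": "alice", "verb": "completed", "object": "task/deploy",   "result": {"success": True}},
    {"actor": "alice", "verb": "completed", "object": "task/rollback", "result": {"success": False}},
]
print(competence_report(statements, skill_map))
```

The same report that certifies competence also surfaces the weaknesses, which is exactly the dual assessment-and-learning role argued for below.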
Barring that, we need to design assessments that get as close as possible to what people will be doing on the job. These assessments are "free form" because there are many ways to accomplish the same task, and our scoring and evaluation processes need to account for that. Traditional analysis and scoring techniques have not been able to adequately evaluate these free-form responses (that is the problem with our current psychometrics). Natural language processing and neural networks can be brought together to dramatically increase our ability to evaluate and understand unstructured responses, but this requires partnering with data and AI engineers.
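As a deliberately simple sketch of machine scoring of unstructured responses, a free-form answer can be compared to model answers by lexical similarity. Production systems use far richer NLP (embeddings, trained scoring models); the reference answer and responses here are invented:

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term-frequency vector for a free-form response."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(u, v):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0

def score_response(response, references):
    """Score an unstructured response by its best similarity to any model answer."""
    rv = vectorize(response)
    return max(cosine(rv, vectorize(ref)) for ref in references)

references = ["restart the service and check the logs for errors"]
good = score_response("I would restart the service, then check the logs for errors", references)
bad = score_response("I would reinstall the operating system", references)
print(round(good, 2), round(bad, 2))
```

Even this crude measure ranks the on-target response above the off-target one; the point is that open-ended answers can be scored repeatably and at scale once the evaluation criteria are made explicit.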
In addition, we need to expand our definition of assessment to one that is a more integrated learning and assessment experience that combines frequent, planned low and high stakes assessments that not only evaluate competency but guide learning. Why are we not embracing the learning mindset that is needed to be successful in nearly every job as technology continues to advance so rapidly?
We need to break out of our paradigm of what we think assessment is. We need to stop creating obstacles that prevent us from thinking differently about assessments. If assessment becomes a seamless, integrated part of our daily experience that drives not only achievement but ongoing learning, we would not only have a more accurate, higher-fidelity evaluation of skills, but odds are good people will see assessments as relevant and possibly, dare I say, fun.