Quite soon into the first chapters of this book, I recalled a humorous little adage that often
comes to mind in my work: "The truth will make you free, but first it'll
make you miserable." The first two chapters present a thorough and critical
history of developments in performance assessment as a test method, first
in education and occupational settings, then moving on to language testing
theory and practice, particularly since the 1970s, when the ideas of
communicative competence and authenticity of test tasks began to be
incorporated into discussions about validity in language assessment. McNamara
describes language performance assessment as the product of two traditions.
The first is the fundamentally pragmatic 'work sample' approach influenced
by sociolinguistic theory, which treats the performance itself as the
target of assessment (the "strong" argument). The second stems from
psycholinguistic theory, which views performance as merely a medium and the
underlying knowledge and ability as the target (the "weak" argument).
The book's initial message is that language performance
tests possess a seductive face validity that obscures the boundaries
between what is to be observed, how the subject reacts to the task at hand,
and how it is registered as a score. Theories across and within the
differing hues in the spectrum from the 'strong' to 'weak' arguments remain
far from definitive, and lack empirical evidence to support their constructs.
So okay, my most painful doubts from hands-on, in-shop experience were
not only confirmed but, thanks to the author's comprehensive treatment,
expanded. But in these first chapters, McNamara is simply telling us
what we have to face.
McNamara proposes a 'three-pronged' approach to tackling the
problems he identifies. The first task is to incorporate a model of
communicative competence which can explain interaction between all
participants in performance assessment, including the interlocutor(s).
Secondly, we must direct our research towards finding how significant each
variable in our assessment method (tasks, participants, settings, topics,
scales) is to our measurement. Lastly, we must decide, once we have a
picture of the impact of these variables, which will fit or inform our
model of communicative competence and what the practical boundaries for
"testability" are. McNamara uses the remaining two-thirds of the book to
illustrate how it might be done, and in doing so makes this book a must for
those who wish to gain a clear understanding of what Rasch-based analysis can do.
From here on, the book can be read like a detailed journal that an
explorer might leave behind for others, telling us what to look for, how to
find it, and what it means. McNamara uses as his primary example the
development and data analysis of the Occupational English Test (OET), a
test of ESL for health professionals in Australia which assessed speaking
and writing in work-related simulations. He takes a chapter to explain the
procedures for determining the OET's test content (analysis of needs,
resources, and the communicative demands of the profession); writing up the
specifications, materials and scoring protocols; training and recruiting
evaluators; piloting and revision. He includes here, and throughout the
book, examples of the actual materials used in the decision-making process
and in the test itself, all of which can serve as models for the reader to consider.
The final four chapters constitute what I think McNamara meant when he talked of directing our research to find which variables in performance assessment can inform our construct of communicative competence and which cannot. He begins with raters and ratings, presenting evidence of wide variation in how raters apply criteria, even when traditional methods to limit this have been employed. Here the author introduces the advantages of using multi-faceted measurement which, as a Rasch-based method, can process raw scores in such a way as to estimate factors such as ability, item (criterion) difficulty, and rater severity all on the same scale. This not only allows the analyst to map the variables side by side to see how they introduce bias or interrelate, but, as demonstrated in this chapter, offers the test user an improved, fairer (more accurate) measurement than raw scores. McNamara then illustrates this with detail and clarity in his next chapter on the concepts and procedures of Rasch analysis.

- reviewed by Jeff Hubbell