
Communicating Instructional Objectives

Know and communicate your instructional objectives. Not only do teachers need to understand what their learning objectives are, but students need to know what is expected of them—hence the importance of having clear instructional objectives and of communicating them to students. If students understand what the most important learning goals are, they are far more likely to reach them than if they are just feeling their way around in the dark. For example, at the beginning of a unit on geography, Mrs. Wyatt tells her seventh graders that at the end of the unit, they will be able to create a map of their town to scale. To do this, Mrs. Wyatt explains, she will have to teach them some map-making skills and the math they will need to calculate the scaling for the map. She then begins a lesson on ratios.
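As a purely hypothetical illustration of the kind of ratio work such a unit involves (the distances below are invented, not taken from the example): if the town is about 2 km across and the map must fit on a 40 cm sheet, the scale works out as

```latex
\frac{40\ \text{cm}}{2\ \text{km}}
  = \frac{40\ \text{cm}}{200{,}000\ \text{cm}}
  = \frac{1}{5{,}000},
\qquad \text{that is, a scale of } 1{:}5000,
```

so each centimeter on the students' maps would stand for 50 meters of the actual town.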

Aligning Goals, Instruction, and Assessments

Match instruction, assessment, and grading to goals. In theory, this guideline might seem obvious; but in practice, the match is not always evident.

Figure 2.1: Four kinds of educational assessment based on their main purposes

Assessment can serve at least four distinct purposes in education. However, the same assessment procedures and instruments can be used for all these purposes. Further, there is often no clear distinction among these purposes. For example, diagnosis is often part of placement and can also have formative functions.


Summative assessment: summarizes the extent to which instructional objectives have been met; provides a basis for grading; useful for further educational or career decisions.

Diagnostic assessment: identifies instructional objectives that have not been met; identifies reasons why targets have not been met; suggests approaches to remediation.

Formative assessment: designed to improve the teaching/learning process; provides feedback for teachers and learners to enhance learning and motivation; enhances learning, fosters self-regulated learning, and increases motivation.

Placement assessment: assesses pre-existing knowledge and skills; provides information for making decisions about a learner’s readiness; useful for placement and selection decisions.


For example, one of my high school teachers, Sister Ste. Mélanie, delighted in teaching us obscure details about the lives and times of the English authors whose poems and essays were part of our curriculum. We naturally assumed that some of her most important objectives had to do with learning these intriguing details.

Her tests matched her instructional objectives. As we expected, most of the items on the tests she gave us asked questions like “Name three different kinds of food that would have been common in Shakespeare’s time.” (Among the correct answers were responses like cherry cordial, half-moon-shaped meat and potato pies called pasties, toasted hazelnuts, and stuffed game hens. We often left her classes very hungry.)

But sadly, Sister Ste. Mélanie’s grading was based less on what we had learned in her classes than on the quality of our English. She had developed an elaborate system for subtracting points based on the number and severity of our grammatical and spelling errors. Our grades reflected the accuracy of our responses only when our grammar and spelling were impeccable. Her grading did not match her instruction; it exemplified poor educational alignment.

Educational alignment is the expression used to describe an approach to education that deliberately matches learning objectives with instruction and assessment. As Biggs and Tang (2011) describe it, alignment involves three key components:

1. A conscious attempt to provide learners with clearly specified goals

2. The deliberate use of instructional strategies and learning activities designed to foster achievement of instructional goals

3. The development and use of assessments that provide feedback to improve learning and to gauge the degree of alignment between goals, instruction, and assessment

Good alignment happens when teachers think through their whole unit before beginning instruction. They identify the learning objectives, the evidence they will collect to document student learning (assessments), and the sequence in which they will have students access and interact with information before administering assessments.

One elementary teacher did just that when she designed a unit on watersheds. Her goal was the Virginia science standard: Science 4.8—The student will investigate and understand important VA natural resources (a) watershed and water resources. To organize her unit, she used a focusing question: “What happens to the water flowing down your street after a big rainstorm?” She wanted students to understand the overarching concept that every action has a consequence—in this case, that the flow of water affects areas downstream.

Throughout the course of her unit, she had children actively engaged in a variety of meaningful tasks. The children discussed and debated the issues of pollution and their responsibilities to avoid polluting their water. They created tangible vocabulary tools to learn the vocabulary of the unit. They responded to academic prompts to explain key concepts. They built models of watersheds. As the students performed these tasks, the teacher noted the level of performance of each child and documented individual knowledge, skill, and understanding. She used these instructional strategies as formative assessments as she provided feedback to each student. In addition to multiple quizzes throughout the unit, the students demonstrated their understanding of important concepts by completing a performance-based task. Everything was aligned so that the teacher could infer that students truly understood the importance of Virginia’s watersheds and water resources.


Using Assessment to Improve Instruction

Use assessment as an integral part of the teaching–learning process. Good formative assessment is designed to provide frequent and timely feedback that is of immediate assistance to learners and teachers. As we see in Chapter 5, this doesn’t mean that teachers need to make up specially designed and carefully constructed placement and formative tests to assess their learners’ readiness for instruction, gauge their strengths and weaknesses, and monitor their progress. The best formative assessment will often consist of brief, informal assessments, perhaps in the form of oral questions or written problems that provide immediate feedback and inform both teaching and learning. Formative feedback might then lead the teacher to modify instructional approaches and learners to adjust their learning activities and strategies.

Using Different Approaches to Assessment

Employ a variety of assessments, especially when important decisions depend on their outcomes. Test results are not always entirely valid (they don’t always measure what they are intended to measure) or reliable (they don’t always measure very accurately). The results of a single test might reflect temporary influences such as those related to fatigue, test anxiety, illness, situational distractions, current preoccupations, or other factors. Grades and decisions based on a variety of assessments are more likely to be fair and valid. For example, when someone is ready to demonstrate driving knowledge and skills, the Department of Motor Vehicles (DMV) gives multiple assessments. Drivers need to know the rules of the road, and they also need to know how to parallel park. So DMV assessments include both a written test and a field driving test.

Constructing Tests According to Blueprints

A house construction blueprint describes in detail the components of a house—its dimensions, the number of rooms and their placement, the materials to be used for building it, the pitch of its roof, the depth of its basement, its profile from different directions. A skilled contractor can read a blueprint and almost see the completed house.

In much the same way, a test blueprint describes in detail the nature of the items to be used in building the test. It includes information about the number of items it will contain, the content areas they will tap, and the intellectual processes that will be assessed. A skilled educator can look at a test blueprint and almost see the completed test (Figure 2.2).

Test Blueprints

It’s important to keep in mind that tests are only one form of educational assessment. Assessment is a broad term referring to all the various methods that might be used to obtain information about different aspects of teaching and learning. The word test has a more specific meaning: In education, it refers to specific instruments or procedures designed to measure student achievement or progress, or various student characteristics.

Educational tests are quite different from many of the other measuring instruments we use—instruments like rulers, tape measures, thermometers, and speedometers. These instruments measure directly and relatively exactly: We don’t often have reason to doubt them.

Our psychological and educational tests aren’t like that: They measure indirectly and with varying accuracy. In effect, they measure a sample of behaviors. And from students’ behaviors (responses), we make inferences about qualities we can’t really measure directly at all. Thus, from a patient’s responses to questions like “What is the first word you think of when I say mother?” the psychologist makes inferences about hidden motives and feelings—and perhaps eventually arrives at a diagnosis.

In much the same way, the teacher makes inferences about what the learner knows—and perhaps inferences about the learner’s thought processes as well—from responses to a handful of questions like this one:

Which of the following is most likely to be correct?

1. Mr. Wilson will still be alive at the end of the story.

2. Mr. Wilson will be in jail at the end of the story.

3. Mr. Wilson will have died within the next 30 pages.

4. Mr. Wilson will not be mentioned again.

Tests that are most likely to allow the teacher to make valid and useful inferences are those that actually tap the knowledge and skills that make up course objectives. And the best way of ensuring that this is the case is to use test construction blueprints that take these objectives into consideration (see Tables 4.3 and 4.4 for examples of test blueprints).

Figure 2.2: Guidelines for assessment

These guidelines are most useful when planning for assessment. Many other considerations have to be kept in mind when devising, administering, grading, and interpreting teacher-made tests.


Some guidelines for assessment: know and communicate learning targets; align instruction, goals, and assessment; use assessment to improve instruction; use a variety of assessments; develop blueprints to construct tests.


Guidelines for Constructing Test Blueprints

A good test blueprint will contain most of the following:

• A clear statement of the test content related directly to instructional objectives

• The performance, affective, or cognitive skills to be tapped

• An indication of the test format, describing the kinds of test items to be used or the nature of the performances required

• A summary of how marks are to be allocated in relation to different aspects of the content

• Some notion of the achievement levels expected of learners

• An indication of how achievement levels will be graded

• A review of the implications of different grades

Regrettably, not all teachers use test blueprints. Instead, when a test is required, many find it less trouble to sit down and simply write a number of test items that seem to them a reasonable examination of what they have taught. And sadly, in too many cases what they have taught is aimed loosely at what are often implied and vague rather than specific instructional objectives.

Using test blueprints has a number of important benefits. Among them is that blueprints force the teacher to clarify learning objectives and to make decisions about the importance of different aspects of content. They also encourage teachers to become more aware of the learner’s cognitive processes and, by the same token, to pay more attention to the development of higher cognitive skills.

At a more practical level, using test blueprints makes it easier for teachers to produce similar tests at different times, thus maintaining uniform standards and allowing for comparisons among different classes and different students. Also, good test blueprints serve as a useful guide for constructing test items and perhaps, in the long run, make the teacher’s work easier. Figure 2.3 summarizes some of the many benefits of using test blueprints.

2.2 Test Fairness

Determining what the best assessment procedures and instruments are is no simple matter and is not without controversy. But although educators and parents don’t always agree about these matters, there is general agreement about the characteristics of good measuring instruments. Most important among these is that evaluative instruments be fair and that students see them as being fair. The most common student complaint about tests and testing practices has to do with their imagined or real lack of fairness (Bouville, 2008; Felder, 2002).

The importance of test fairness was highlighted during the Vietnam War in the 1960s. President Kennedy’s decision to send troops to Vietnam led to the drafting of large numbers of age-eligible men, some of whom died or were seriously injured in Vietnam. But men who went to college were usually exempt from the draft—or their required military service was at least deferred. So, for many, it became crucial to be admitted to undergraduate or postgraduate studies. For some, passing college or graduate entrance exams was literally a matter of life and death. That the exams upon which admission decisions would be based should be as fair as possible seemed absolutely vital.


Just how fair are our educational assessments? We don’t always know. But science provides ways of defining and sometimes of actually measuring the characteristics of tests. It says, for example, that the best assessment instruments have three important qualities:

1. Fairness

2. Validity

3. Reliability

As we saw, from the student’s point of view, the most important of these is the apparent— and real—fairness of the test.

There are two ways of looking at test fairness, explains Bouville (2008): On the one hand, there is fairness of treatment; on the other, there is fairness of opportunity. Fairness of treatment issues include problems relating to not making accommodations for children with special needs, biases and stereotypes, the use of misleading “trick” questions, and inconsistent grading.

Figure 2.3: Advantages of test blueprints

Making and using test blueprints presents a number of distinct benefits. And, although developing blueprints can be time-consuming, contrary to what some think, it can make the teacher’s task easier rather than more difficult and complicated.


Advantages of devising and using test blueprints: forces the teacher to clarify learning targets; promotes decisions about the relative importance of different aspects of content; encourages teachers to become more aware of learners’ cognitive activity; promotes the development of thinking rather than mainly remembering skills; increases test validity and reliability; simplifies test construction; leads to more consistency among different tests, allowing more meaningful comparisons.


Fairness of opportunity problems include testing students on material not covered, not providing an opportunity to learn, not allowing sufficient time for the assessment, and not guarding against cheating. We look at each of these issues in the following sections (Figure 2.4).

Content Problems

Tests are—or at the very least, seem—highly unfair when they ask questions or pose problems about matters that have not been covered or assigned. This issue sometimes has to do with bad teaching; at other times, it simply relates to bad test construction. For example, in my second year in high school, we had a teacher who almost invariably peppered her quizzes and exams with questions about matters we had never heard about in class. “We didn’t have time,” she would protest when someone complained and pointed out that she had never mentioned rhombuses and trapezoids and quadrilaterals. “But it’s important and it’s in the book and it might be on the final exam,” she would add.

Had she simply told us that we were responsible for the content in Chapter 6, we would not have felt so unfairly treated. This example illustrates bad teaching as much as bad test construction.

In connection with content problems that affect test fairness, it is interesting to note that when test results are higher, students tend to perceive the test as being fairer. It’s an intriguing observation that, it turns out, may have a grain of truth in it. As Oller (2012) points out, higher scores are evidence that there is agreement between test makers and the better students about the content that is most important. This agreement illustrates what we termed educational alignment: close correspondence among goals, instructional approaches, and assessments.

Conversely, exams that yield low scores for all students may reflect poor educational alignment: They indicate that what the teacher chose to test is not what even the better learners have learned. Hence there is good reason to believe that tests that yield higher average scores are, in fact, fairer than those on which most students do very poorly. And raising the marks, perhaps by scaling them so that they approximate a normal distribution with an acceptably high average, will do little to alter the apparent fairness of the test.
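For readers unfamiliar with what “scaling” marks involves, here is a minimal sketch, assuming a simple linear rescaling to a chosen mean and standard deviation; the raw scores and target values are invented, and the text does not prescribe any particular method.

```python
import statistics

def rescale(raw_scores, target_mean=75.0, target_sd=10.0):
    """Linearly rescale raw scores to a chosen mean and standard deviation.

    A linear rescaling preserves every student's relative standing; it raises
    the average but does nothing to improve the alignment between what was
    taught and what was tested.
    """
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    return [target_mean + target_sd * (score - mean) / sd for score in raw_scores]

# Invented raw scores with a distressingly low average (about 52):
raw = [38, 42, 45, 50, 55, 58, 61, 66]
print([round(s) for s in rescale(raw)])
```

The order of the students is unchanged after rescaling, which is the point of the passage: a test that sampled the wrong content is no fairer once its average has been raised.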

Figure 2.4: Issues affecting test fairness

That a test is fair, and that it seems to be fair, is one of the most important characteristics of good assessment.


Issues of fairness of opportunity: testing material not covered; not providing an opportunity to learn; not allowing sufficient time to complete the test; not guarding against cheating.

Issues of fairness of treatment: not accommodating special needs; being influenced by biases and stereotypes; using misleading, trick questions; grading inconsistently.


Trick Questions

Trick questions illustrate problems that have less to do with test content than with test construction—which, of course, doesn’t mean that the test maker is always unaware that one or more questions might be considered trick questions.

Trick questions are questions that mislead and deceive, regardless of whether the deception is intentional or is simply due to poor item construction. Trick questions do not test the intended learning targets, but rather a student’s ability to navigate a deceptive test. Items that students are most likely to consider trick questions include:

1. Questions that are ambiguous (even when the ambiguity is accidental rather than deliberate). Questions are ambiguous when they have more than one possible interpretation. For example, “Did you see the man in your car?” might mean, “Did you see the man who is in your car?” or “Did you see the man when you were in your car?”

2. Multiple-choice items where two nearly identical alternatives seem correct. Or, as in the following example, where all alternatives are potentially correct:

The Spanish word fastidiar means:

annoy

damage

disgust

harm

3. Items deliberately designed to catch students off their guard. For example, consider this item from a science test:

During a very strong north wind, a rooster lays an egg on a flat roof: On what side of the roof is the egg most likely to roll off?

North

South

East

West

No egg will roll off the roof

Students who aren’t paying sufficient attention on this fast-paced, timed test might well say South. Seems reasonable. (But no; apparently, roosters rarely lay eggs.)

4. Questions that use double negatives. For example: Is it true that people should never not eat everything they don’t like?

5. Items in which some apparently trivial word turns out to be crucial. That is often the case for words such as always, never, all, and most, as in this item: True or False? Organic products are always better for you than those that are nonorganic.

6. Items that make a finer discrimination than expected. For example, say a teacher has described the speed of sound in dry air at 20 degrees centigrade as being right around 340 meters per second. Now she presents her students with this item:

What is the speed of sound in dry air at 20 degrees centigrade?

A. 300 meters per second

B. around 340 meters per second

C. 343.2 meters per second

D. 343.8 meters per second


Because the alternatives contain both the correct answer (C) and the less precise information given by the teacher (B), the item is deceptive.

7. Long stems in multiple-choice questions that include extraneous and irrelevant information but that serve to distract. Consider, for example, this multiple-choice item:

A researcher found that the average score of a sample consisting of 106 females was 52. The highest score was 89 while the lowest score was 34. In this study, the median score was 55 and the two most frequent scores were 53 and 58. What was the sum of all the scores?

A. 5,512

B. 5,830

C. 5,618

D. 6,148

All the information required to answer this item correctly (A) is included in the first sentence. Everything after that sentence is irrelevant and, for that reason, misleading.
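To spell out the arithmetic, the sum of a set of scores is simply the number of scores multiplied by their mean, so the first sentence alone gives the answer:

```latex
\text{sum of scores} = n \times \bar{x} = 106 \times 52 = 5{,}512
```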

Opportunity to Learn

Tests are patently unfair when they sample concepts, skills, or cognitive processes that students have not had an opportunity to acquire. Lack of opportunity to learn might reflect an instructional problem. For example, it might result from not being exposed to the material either in class or through instructional resources. It might also result from not having sufficient time to learn. Bloom (1976), for example, believed that there are faster and slower learners (not gifted learners and those less gifted), and that all, given sufficient time, can master what schools offer.

If Bloom is mostly correct, the results of many of our tests indicate that we simply don’t allow some of our learners sufficient time for what we ask of them. Bloom’s mastery learning system offers one solution. Mastery learning describes an instructional approach in which course content is broken into small, sequential units and steps are taken to ensure that all learners eventually master instructional objectives (see Chapter 6 for a discussion of mastery learning).

Another solution, suggests Beem (2010), is the expanded use of technology and of virtual reality instructional programs. These are instructional computer-based simulations designed to provide a sensation of realism. She argues that these, along with other digital technologies including computers and handheld devices, offer students an opportunity to learn at their own rate. Besides, digital technology might also reduce the negative influence of poorly qualified teachers—if there are any left.


▲ Ambiguous questions, misleading items, items about material not covered or assigned, overly long tests—all of these contribute to the perceived unfairness of tests.


As Ferlazzo (2010) argues, great teaching is about giving students the opportunity to learn. Poor and unfair testing is about assessing the extent to which they have reached instructional objectives they have never had an opportunity to reach. See Applications: Addressing Unfair Tests.

Insufficient Time

Closely related to the unfairness that results from not having an opportunity to learn is the injustice of a test that doesn’t give students an opportunity to demonstrate what they actually have learned. For some learners, this is a common occurrence simply because they tend to respond more slowly than others and, as a result, often find that they don’t have enough time to complete the test.

APPLICATIONS: Addressing Unfair Tests

In spring 2013, students in New York City had their first experience with English Language Arts tests designed to tap the curriculum of the Common Core State Standards. After adopting these standards in 2010, New York hired a test-publishing company to design a test that would reflect the knowledge and skills within the Common Core Standards.

After witnessing their students’ anguish following the initial testing experience, 21 principals were so outraged that they felt compelled to issue a formal protest through a letter to the State’s Commissioner of Education. In that letter, they highlighted how the English/Language Arts test was unfair. One of their major concerns was the lack of alignment between the types of questions asked and the critical thinking skills valued in the Common Core State Standards. The Common Core State Standards emphasize deep and rich analysis of fiction and nonfiction. But the ELA tests focused mostly on specific lines and words rather than on the wider standards. What was taught in the classrooms was not assessed on the test: The test failed to meet the criterion of fairness of opportunity.

While alignment between what was tested and what is in the Standards is important, this was not the administrators’ only complaint. In reviewing the tests taken by the students, they concluded that the structure of the tests was not developmentally appropriate. For example, testing required 90-minute sessions on each of three consecutive days—a difficult undertaking for a mature student, let alone for a 10-year-old. Clearly, there is a violation here of the criterion of fairness of opportunity.

Finally, the principals expressed concern that too much was riding on a flawed test developed by a company with a track record of errors. They feared that the tests might not be valid. Yet students’ promotion to the next grade, entry into middle and secondary school, and admission to special programs are often based on these tests. In addition, teachers and schools are evaluated in terms of how well their students perform, even though that is not an intended use of the tests. As a result, scores on these tests can affect the extent to which schools receive special funds or are put on improvement plans. These complications raise questions about the test’s validity for these purposes.

Clearly, as the principals reflected on the new English/Language Arts test, they saw problems with both fairness of opportunity and validity.


Suppose that a 100-item test is designed to sample all the target skills and information that define a course of study. If a student has time to respond to only 80 of these 100 items, then only 80% of the instructional objectives have been tested. That test is probably unfair for that student.

There is clearly a place for speeded testing, particularly with respect to standardized tests such as those that assess some of the abilities that contribute to intelligence. (We look at some of these tests in Chapter 10.) But as a rule of thumb, teacher-made tests should always be of such a length and difficulty level that most, if not all, students can easily complete them within the allotted time (van der Linden, 2011).

Failure to Make Accommodations for Special Needs

Timed tests can be especially unfair to some learners with special needs. For example, Gregg and Nelson (2012) reviewed a large number of studies that looked at performance on timed graduation tests—a form of high-stakes testing (so called because results on these tests can have important consequences relating to transition from high school, school funding, and even teaching and administrative careers). These researchers found that whereas students with learning disabilities would normally be expected to achieve at a lower than average level on these tests, when they are given the extra time they require, their test scores are often comparable to those of students without disabilities.

Giving students with special needs extra time is the most common of possible accommodations. It is also one of the most effective and fairest adjustments. Even for more gifted and talented learners, additional time may be important. Coskun (2011) reports a study where the number of valuable ideas produced in creative brainstorming groups was positively related to the amount of time allowed.

Accommodations for Test Anxiety

In addition to being given extra time for learning and assessment, many other accommodations for learners with special needs are possible and often desirable. For example, steps can be taken to improve the test performance of learners with severe test anxiety. Geist (2010) suggests that one way of doing this is to reduce negative attitudes toward school subjects such as mathematics. As Putwain and Best (2011) showed, when elementary school students are led to fear a subject by being told that it will be difficult and that important decisions will be based on how well they do, their performance suffers. The lesson is clear: Teachers should not try to motivate their students by appealing to their fears.

For severe cases of test anxiety, certain cognitive and behavioral therapies, in the hands of a skilled therapist, are sometimes highly effective (e.g., Brown et al., 2011). And even in less skilled hands, the use of simple relaxation techniques might be helpful (for example, Larson et al., 2010).

It is worth keeping in mind, too, that test anxiety often results from inadequate instruction and learning. Not surprisingly, after Faber (2010) had exposed his “spelling-anxious” students to a systematic remedial spelling training program, their spelling performance increased and their test anxiety scores decreased.


Accommodations for Minority Languages

Considerable research indicates that children whose first language is not the dominant school language are often at a measurable disadvantage in school. And this disadvantage can become very apparent if no accommodations are made in assessment instruments and procedures—as is sometimes the case for standardized tests given to children whose dominant language is not the test language (Sinharay, Dorans, & Liang, 2011). As Lakin and Lai (2012) note, there are some serious issues with the fairness and reliability of ability measures given to these children without special accommodations. As we saw in Chapter 1, accommodations in these cases are mandated by law (see In the Classroom: Culturally Unfair Assessments).

Accommodations for Other Special Needs

Teachers must be sensitive to, and they must make accommodations for, many other “special needs.” These might include medical problems, sensory disabilities such as vision and hearing problems, emotional exceptionalities, learning disabilities, and intellectual disabilities. They might also include cultural and ethnic differences among learners.

Figure 2.5 describes some of the accommodations that fair assessments of students with special needs might require.

IN THE CLASSROOM: Culturally Unfair Assessments

Joseph Born-With-One-Tooth knew all the legends his grandfather and the other elders told—even those he had heard only once. His favorites were the legend of the Warriors of the Rainbow, and the legend of Kuikuhâchâu, the man who took the form of the wolverine. These legends are long, complicated stories, but Joseph never forgot a single detail, never confused one with the other. The elders named him ôhô, which is the word for owl, the wise one. They knew that Joseph was extraordinarily gifted.

But in school, it seemed that Joseph was unremarkable. He read and wrote well, and he performed better than many. But no one even bothered to give him the tests that singled out those who were gifted and talented. Those who are talented and gifted are often identified through a combination of methods, beginning with teacher nominations that then lead to further testing and perhaps interviews and auditions (Pfeiffer & Blei, 2008). Those who don’t do as well in school, sometimes because of cultural or language differences, tend to be overlooked.

Joseph Born-With-One-Tooth is not alone. Aboriginal and other culturally different children are vastly underrepresented among the gifted and the talented (Baldwin & Reis, 2004). By the same token, they tend to be overrepresented in programs for those with learning disabilities and emotional disorders (Briggs, Reis, & Sullivan, 2008).

There is surely a lesson here for those concerned with the fairness of assessments.


Biases and Stereotypes

Accommodations for language differences are not especially difficult. But overcoming the many biases and stereotypes that can affect the fairness of assessments often is.

Biases are preconceived judgments usually in favor of or against some person, thing, or idea. For example, I might think that Early Girl tomatoes are better than Big Boys. That is a harmless bias. And like most biases, it is a personal tendency. But if we North Americans tend to believe that all Laplanders are such and such, and most Roma are this and that (such and such and this and that of course being negative), then we hold some stereotypes that are potentially highly detrimental.

Closer to home, historically there have been gender stereotypes about male–female differences whose consequences can be unfair to both genders. Some of these stereotypes are based on long-held beliefs rooted in culture and tradition and propagated through centuries of recorded “expert” opinion. And some are based on various controversial and often contested findings of science.

It’s clear that males and females have some biologically linked sex differences, mainly in physical skills requiring strength, speed, and stamina. But it’s not quite so clear whether we also have important, gender-linked psychological differences. Still, early research on male–female differences (Maccoby & Jacklin, 1974) reported significant differences in four areas: verbal ability, favoring females; mathematical ability, favoring males; spatial–visual ability (evident, for example, in navigation and orientation skills), favoring males; and aggression (higher in males).

Figure 2.5: Fair assessment accommodations for children with special needs

These are only a few of the many possible accommodations that might be required for fair assessment of children with special needs. Each child’s requirements might be different. Note, too, that some of these accommodations might increase the fairness of assessments for all children.


Possible accommodations for fair assessment of students with special needs:

Instructional accommodations: teacher aides and other professional assistance; special classes and programs; individual education plans; special materials such as large print or audio devices; provisions for reducing test anxiety; increased time for learning.

Testing accommodations: increased time for test completion; special equipment for test-taking; a different form of test (for example, verbal rather than written); giving the test in a different setting; testing in a different language.


Many of these differences are no longer as apparent now as they were in 1974. There is increasing evidence that when early experiences are similar, differences are minimal or nonexistent (Strand, 2010).

But the point is that experiences are not always similar, nor are opportunities and expectations. In the results of many assessments, there are still gender differences. These often favor males in mathematics and females in language arts (e.g., De Lisle, Smith, Keller, & Jules, 2012). And there is evidence that the stereotypes many people still hold regarding, say, girls’ inferiority in mathematics might unfairly affect girls’ opportunities and their outcomes.

In an intriguing study, Jones (2011) found that when women were shown a video supporting the belief that females perform more poorly than males in mathematics, subsequent tests revealed a clear gender difference in favor of males on a mathematics achievement test. But when they were shown a video indicating that women performed as well as men, no sex differences were later apparent.

Inconsistent Grading

Approaches to grading can vary enormously in different schools and even in different classrooms within the same school. They might involve an enormous range of practices, including

• Giving or deducting marks for good behavior

• Giving or deducting marks for class participation

• Giving or deducting marks for punctuality

• Using well-defined rubrics for grading

• Basing grades solely on test results

• Giving zeros for missed assignments

• Ignoring missed assignments

• Using grades as a form of reward or punishment

• Grading on any of a variety of letter, number, percentage, verbal descriptor, or other systems

• Allowing students to disregard their lowest grade

• And on and on . . .

No matter what practices are used in a given school, for assessments to be fair, grades need to be arrived at in a predictable and transparent manner. Moreover, the rules and practices that underlie their calculation need to be consistent. This approach is also critical for describing what students know and are able to do. If a math grade is polluted with behavioral objectives such as participation, how will the student and parents know what the student’s math skills are?

Inconsistent grading practices are sometimes evident in disparities within schools, where different teachers grade their students using very different rules. In one class, for example, students might be assured of receiving relatively high grades if they dutifully complete and hand in all their assignments as required. But in another class, grades might depend entirely on test results. And in yet another, grades might be strongly influenced by class participation or by spelling and grammar.


Inconsistent grading within a class can also present serious problems of fairness for students. A social studies teacher should not ignore grammatical and spelling errors on a short-answer test one week and deduct handfuls of marks for the same sorts of errors the following week. Simply put, the criteria that govern grading should be clearly understood by both the teacher and students, and those criteria should be followed consistently.

Cheating and Test Fairness

Most of us, even the cheaters among us, believe that cheating is immoral. Sometimes it is even illegal—such as when you do it on your income tax return. And clearly, cheating is unfair.

First, if cheating results in a higher than warranted grade, then it does not represent the student’s progress or accomplishments—which hardly seems fair.

Second, those who cheat, by that very act, cheat other students. I once took a statistics course where, in the middle of a dark night, a fellow student sneaked up the brick wall of the education building, jimmied open the window to Professor Clark’s office, and copied the midterm exam we were about to take. He then wrote out what he thought were all the correct answers and sold copies to a bunch of his classmates.

I didn’t buy. No money, actually. And I didn’t do nearly as well on the test as I expected. I thought I had answered most of the questions correctly; but, this being a statistics course, the raw scores (original scores) were scaled so that the class average would be neither distressingly low nor alarmingly high.

The deception was soon uncovered. Some unnamed person later informed Professor Clark who, after reexamining the test papers, discovered that 10 of his 35 students had nearly identical marks. More telling was that on one item, all 10 of these students made the same, highly unlikely, computational error.

Cheating is not uncommon in schools, especially in higher grades and in postsecondary programs where the stakes are so much higher. In addition, today there are far more opportunities for cheating than there were in the days of our grandparents. Wireless electronic communication; instant transmission of photos, videos, and messages; and wide-scale access to Internet resources have seen to that.

High-Stakes Tests and Cheating

There is evidence, too, that high-stakes testing may be contributing to increased cheating, especially when the consequences of doing well or poorly can dramatically affect entire school systems. For example, state investigators in Georgia found that 178 administrators and teachers in 44 Atlanta schools who had early access to standardized tests systematically cheated to improve student scores (Schachter, 2011).

Some school systems cheat on high-stakes tests by excluding certain students who are not expected to do well; others cheat by not adhering to guidelines for administering the tests, perhaps by giving students more time or even by giving them hints and answers (Ehren & Swanborn, 2012).

A more subtle form of administrative and teacher cheating on high-stakes tests takes the form of “narrowing” the curriculum. In effect, instructional objectives are narrowed to topics covered by the tests, and instruction is focused specifically on those targets to the exclusion of all others. This practice, notes Berliner (2011), is a rational—meaning “reasonable or intelligent”—reaction to high-stakes testing.

With the proliferation of online courses and online universities, the potential for electronic cheating has also increased dramatically (Young, 2012). For example, online tests can be taken by the student, the student’s friend, or even some paid expert, with little fear of detection.

Preventing Cheating

Among the various suggestions for preventing or reducing cheating on exams are the following:

• Encourage students to value honesty.

• Be aware of school policy regarding the consequences of cheating, and communicate them to students.

• Clarify for students exactly what cheating is.

• When possible, use more than one form of an exam so that no two adjacent students have the same form.

• Stagger seats so that seeing other students’ work is unlikely.

• Randomize and assign seating for exams.

• Guard the security of exams and answer sheets.

• Monitor exams carefully.

• Prohibit talking or other forms of communication during exams.

Of course, none of these tactics, or even all of them taken together, is likely to guarantee that none of your students cheat. In fact, one large-scale study found that 21% of 40,000 undergraduate students surveyed had cheated on tests, and an astonishing 51% had cheated at least once on their written work (McCabe, 2005; Figure 2.6).

Sadly, that cheating is prevalent does not justify it. Nor does it do anything to increase the fairness of our testing practices.

Figure 2.6: Cheating among college undergraduates

Percentage of undergraduate students who admitted having cheated at least once.


Bar chart: percent of the 40,000 surveyed students who admitted cheating (written work: 51%; tests: 21%).

Source: Based on McCabe, D. (2005). It takes a village: Academic dishonesty. Liberal Education. Retrieved September 2, 2012, from http://www.middlebury.edu/media/view/257515/original/It_takes_a_village.pdf.


Figure 2.7 summarizes the main characteristics of fair assessment practices. Related to this, Table 2.3 presents the American Psychological Association (APA) Code of Fair Testing Practices in Education. Because of its importance, the code is reprinted in its entirety at the end of this chapter.

2.3 Validity

In addition to the characteristics of fair assessment practices listed in Figure 2.7 and Table 2.3, the fairness of a test or assessment system depends on the reliability of the test instruments and the validity of the inferences made from the test results.

Simply put, a test is valid if it measures what it is intended to measure. For example, a high schooler’s ACT scores should not be used to decide if a student should have a driver’s license. The test is designed to predict college performance rather than readiness to drive. From a measurement point of view, validity is the most important characteristic of a measuring instrument. If a test does not measure what it is intended to measure, the scores derived from it are of no value whatsoever, no matter how consistent and predictable they are.

Test validity has to do not only with what the test measures, but also with how the test results are used. It relates to the inferences we base on test results and the consequences that follow. In effect, interpreting test scores amounts to making an inference about some quality or characteristic of the test taker.

For example, based on Nora’s brilliant performance on a mathematics test, her teacher infers that Nora has commendable mathematical skills and understanding. And one consequence of this inference might be that Nora is invited to join the lunchtime mathematics enrichment group. But note that the inference and the consequence are appropriate and defensible only if the test on which Nora performed so admirably actually measures relevant mathematical skills and understanding.

The important point is that in educational assessment, validity is closely related to the way test results are used. Accordingly, a test may be valid for some purposes but totally invalid for others.

Figure 2.7: Fair assessment practices

Assessments are not always fair for all learners. But their fairness can be improved by paying attention to some simple guidelines.


The fairest assessment practices:

• Cover material that every student has had an opportunity to learn.

• Reflect learning targets for that course.

• Allow sufficient time for students to finish the test.

• Discourage cheating.

• Provide accommodations for learners with special needs.

• Ensure that tests are free of biases and stereotypes.

• Avoid misleading questions.

• Follow consistent and clearly understood grading practices.

• Base important decisions on a variety of assessments.