Understanding Test Scores

NATIONAL ASSOCIATION OF SCHOOL PSYCHOLOGISTS

by Andrea Canter, Ph.D., NCSP, Minneapolis Schools

Background: Using Test Scores

Since the beginning of public schooling, students' skills have been measured by some type of testing. Invariably, the product of testing is a score, a "yardstick" by which an individual student is compared with others and/or by which progress is documented. Teachers and other educators use tests, and subsequently test scores, in a variety of ways: 1) to provide a profile of the entire class in a given subject such as reading; 2) to identify students with specific strengths and weaknesses who might benefit from different instruction; 3) to sort out a group of students for additional testing or special help; 4) to evaluate the effectiveness of instruction; 5) to illustrate student progress over time; 6) to compare individual students or groups of students with some national or local standard.

Types of Tests and Test Scores

Most tests used in schools are either norm-referenced or criterion-referenced.

Norm-referenced tests compare an individual's performance to that of his or her classmates, thus emphasizing relative rather than absolute performance. Scores on norm-referenced tests indicate the student's ranking relative to that group. Typical scores used with norm-referenced tests:

Percentiles: A percentile indicates the rank of a score compared to others (same age or same grade), using a hypothetical group of 100 students. A percentile of 25, for example, indicates that the student's test performance equals or exceeds that of 25 out of 100 students on the same measure; a percentile of 87 indicates that the student equals or surpasses 87 out of 100 (or 87% of) students. Percentiles are derived from raw scores using the norms obtained from testing a large population when the test was first developed. Percentiles are probably the most commonly used type of test score in education.
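As an illustration only, the percentile definition above can be sketched in a few lines of Python. The function name percentile_rank and the norm group of 20 raw scores are hypothetical; published tests derive percentiles from much larger norming samples and may use slightly different counting rules.

```python
def percentile_rank(raw_score, norm_scores):
    """Percent of the norm group whose score this raw score equals or exceeds."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    at_or_below = sum(1 for s in norm_scores if s <= raw_score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical norm group of 20 raw scores (illustrative data, not real norms).
norm_group = [12, 15, 15, 18, 20, 21, 22, 22, 23, 25,
              26, 27, 27, 28, 30, 31, 33, 35, 36, 40]

# A raw score of 22 equals or exceeds 8 of the 20 scores in this group.
print(percentile_rank(22, norm_group))  # 40.0
```

Here a raw score of 22 corresponds to a percentile of 40 only with respect to this particular group; the same raw score would rank differently against a different norm group.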

Stanines: Stanines are essentially groups of percentile ranks. The entire range of scores is divided into 9 parts, with the largest number of individuals falling in the middle stanines and fewer falling at the extremes. Few tests in common use report stanines today, although these scores can be useful in understanding the relative range of a student's performance.
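The text does not give the stanine boundaries. The following is a minimal sketch, assuming the conventional split in which the nine stanines hold roughly 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of scores. The function name and the treatment of percentiles that land exactly on a boundary are assumptions, and individual test publishers may draw the lines differently.

```python
from bisect import bisect_right

# Cumulative percent of scores at the top of stanines 1 through 8,
# assuming the conventional 4-7-12-17-20-17-12-7-4 percent split.
STANINE_UPPER_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile):
    """Map a percentile (0 to 100) to a stanine (1 to 9)."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    # Assumption: a percentile exactly on a boundary goes to the higher stanine.
    return bisect_right(STANINE_UPPER_BOUNDS, percentile) + 1


for p in (3, 25, 50, 87, 97):
    print(p, stanine_from_percentile(p))
```

Under these assumptions, the percentiles of 25 and 87 used as examples earlier fall in stanines 4 and 7, and a percentile of 50 falls in the middle stanine, 5.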

Standard scores: A standard score is also derived from raw scores using the norming information gathered when the test was developed. Instead of reflecting a student's rank compared to others, standard scores indicate how far above or below the average (mean) an individual score falls, using a common scale, such as one with an "average" of 100. Standard scores also take "variance" into account, or the degree to which scores typically will deviate from the average score. Standard scores can be used to compare individuals from different grades or age groups because all scores are converted to the same numerical scale. Most intelligence tests and many achievement tests use some type of standard scores.
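A rough sketch of how a raw score might be converted to a standard score follows. The scale mean of 100 comes from the example above; the standard deviation of 15 is an assumption (a common choice, but not one stated here), and the norm-group mean of 28 and standard deviation of 4 are hypothetical values chosen for the arithmetic.

```python
def standard_score(raw_score, norm_mean, norm_sd, scale_mean=100.0, scale_sd=15.0):
    """Rescale a raw score so the norm group has mean scale_mean and SD scale_sd."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    # z is the distance from the norm-group mean in standard deviation units.
    z = (raw_score - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


# Hypothetical test: norm-group mean of 28 raw points, standard deviation of 4.
print(standard_score(34, norm_mean=28, norm_sd=4))  # 122.5
print(standard_score(28, norm_mean=28, norm_sd=4))  # 100.0
```

A raw score of 34 is 1.5 standard deviations above the hypothetical norm-group mean, so on the assumed scale it becomes 100 + 1.5 x 15 = 122.5, while a raw score equal to the mean becomes exactly 100.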

Age/Grade Equivalent scores: Some tests provide age or grade equivalent scores. Such scores indicate that the student has attained the same score (not skills) as an average student of that age or grade. For example, if Sally obtains a grade-equivalent score of 3.6 on a reading comprehension test, this means that she obtained the same score as the typical student in the sixth month of third grade. Sally may or may not have acquired the same skills as the typical third grader. Age/grade scores seem easy to understand but are often misunderstood, and many educators discourage their use.

Standard scores, percentile ranks, and stanines can be compared using the "normal" or bell shaped curve. Most tests used in education are developed in order to yield a standard curve of scores, where the majority of all students would fall within a small range (or one "standard deviation") of the mean or average score, and where 50% of all students would fall above, and 50% would fall below the average score. Some tests, however, do not have such "normal" distributions of scores and these different types of scores may not be comparable.
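When scores do follow a normal distribution, the relationship between standard scores and percentiles can be sketched with Python's standard library. This example assumes a scale with a mean of 100 and a standard deviation of 15; the standard deviation and the function name are assumptions for illustration, and the conversion does not hold for tests whose scores are not normally distributed.

```python
from statistics import NormalDist

# Assumed scale: mean 100, standard deviation 15, normally distributed scores.
SCALE = NormalDist(mu=100, sigma=15)


def percentile_from_standard_score(score):
    """Percent of a normal distribution falling at or below the given score."""
    return 100.0 * SCALE.cdf(score)


# Share of students expected within one standard deviation of the mean.
share_within_one_sd = SCALE.cdf(115) - SCALE.cdf(85)

print(round(percentile_from_standard_score(100)))  # 50
print(round(percentile_from_standard_score(115)))  # 84
print(round(100 * share_within_one_sd))            # 68
```

Under these assumptions a score at the mean sits at the 50th percentile, and roughly 68% of students fall within one standard deviation of the mean, which is the "majority" described above.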

Criterion-referenced tests are used to measure student mastery of instructional objectives or curriculum (absolute performance), rather than to compare one student with another or to rank students. They are often used as end-of-unit tests in textbooks or as a "benchmark" to identify areas of strength or weakness in a given curriculum, readiness to move on to a different level of instruction, etc. Typically, raw scores are used to reflect the number of correct responses, the number of completed objectives, etc. Such tests will often use percentages to reflect the level of mastery of a given instructional objective, such as setting a goal of "90% correct addition problems." Raw scores are converted to a percent correct.
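The conversion from a raw score to a percent correct, and the comparison with a mastery goal, might look like the following sketch. The 90% criterion is taken from the example above; the function names and the 20-item addition quiz are hypothetical.

```python
def percent_correct(num_correct, num_items):
    """Convert a raw score (number correct) to a percent correct."""
    if num_items <= 0 or not 0 <= num_correct <= num_items:
        raise ValueError("need 0 <= num_correct <= num_items and num_items > 0")
    return 100.0 * num_correct / num_items


def has_mastered(num_correct, num_items, criterion=90.0):
    """True if the percent correct meets or exceeds the mastery criterion."""
    return percent_correct(num_correct, num_items) >= criterion


# Hypothetical 20-item addition quiz.
print(percent_correct(18, 20))  # 90.0
print(has_mastered(18, 20))     # True
print(has_mastered(17, 20))     # False
```

Note that the result depends only on the student's own responses and the criterion, not on how any other student performed.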

Limitations of Test Scores

Scores can be useful, but must be interpreted carefully, keeping the purpose of the test in mind. A single test can only reflect a sample of skills; therefore, every score includes some degree of error in measurement - no score is absolutely accurate. Scores are influenced by many factors, but particularly by the reliability and validity of the test. Reliable tests will yield consistent results if administered many times. Valid tests actually measure and predict what they are intended to measure. For example, a valid reading decoding test will accurately distinguish students with strong and weak decoding skills; it will accurately predict which students will perform well in a phonics program and which students may have difficulty. We can place greater confidence in the scores provided by reliable and valid tests.

Many tests are of limited value when used with students who come from communities or cultures which are different from the White, middle class groups from which the scores are derived. Students who grow up in poor, rural or inner city communities, in ethnic minority families, or in families who primarily speak a language other than English are not typically well-represented in the standardization of tests, and scores obtained by these students often reflect some degree of bias.

Certain test scores are only useful for certain purposes, and not for others. Percentiles and standard scores, for example, are not intended to measure student growth. They only show relative standing compared to others, and this relationship does not necessarily change as students progress through the curriculum. Raw scores, such as words read correctly or number of multiplication problems solved correctly, can be plotted over time to reflect progress.
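One possible way to summarize raw scores plotted over time is the slope of a straight line fitted through them. The sketch below uses an ordinary least-squares slope, which is an illustrative choice rather than a method prescribed here; the function name and the five weeks of "words read correctly" data are hypothetical.

```python
def average_gain_per_week(weeks, raw_scores):
    """Least-squares slope of raw scores against week number."""
    if len(weeks) != len(raw_scores) or len(weeks) < 2:
        raise ValueError("need at least two (week, score) pairs of equal length")
    mean_week = sum(weeks) / len(weeks)
    mean_score = sum(raw_scores) / len(raw_scores)
    sxy = sum((w - mean_week) * (s - mean_score) for w, s in zip(weeks, raw_scores))
    sxx = sum((w - mean_week) ** 2 for w in weeks)
    if sxx == 0:
        raise ValueError("weeks must not all be the same")
    return sxy / sxx


# Hypothetical progress data: words read correctly in one minute, by week.
weeks = [1, 2, 3, 4, 5]
words_read_correctly = [42, 45, 47, 52, 54]

print(round(average_gain_per_week(weeks, words_read_correctly), 1))  # 3.1
```

In this made-up example the student gains about 3 words per week. A percentile for the same student could stay flat over the same period if classmates improved at a similar rate.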

Limitations of test scores can be minimized by using other types of information to describe student performance. Test scores alone should never be used to make placement or other serious educational decisions. Observations, school and health history, parent information, daily work samples, etc. should be used in conjunction with test scores. Limitations of test scores can be further minimized by the use of qualified assessment personnel such as school psychologists to assist in the selection of tests and the interpretation of results.

Reporting Test Results to Parents

Parents are often overwhelmed by the test reports they receive from school personnel. In order to help establish a true partnership between parents and teachers, it is essential that information about student progress be communicated clearly, respectfully, and accurately. Test results should be provided to parents using:

• Simple, clear language free from educational and test jargon

• Explanations of the purpose(s) of the tests used

• Scores appropriate to the test's purpose

• Clear explanations of the meaning of scores

• Information about the limitations of the test scores

• Corroborating information from other sources

• Opportunities for parents to ask questions about scores, testing, etc.

• Information about how the test results will be used

Raw scores and percentiles, when appropriate to the tests used, are generally the most easily explained and familiar types of scores to report to parents. A visual reference such as the figure above may also be helpful, particularly in explaining the concept of "average." Numbers can be misleading and in some cases frightening; it is important for parents to understand the limitations of any test or score and to receive a comprehensive picture of their child's performance, not just test results.

Resources

The following texts provide an overview of assessment issues for educators, including test scores:

Gellman, E. (1995). School testing: What parents and educators need to know. Westport, CT: Praeger.

Hamill, D. (1987). Assessing the abilities and instructional needs of students. Austin, TX: Pro-Ed.

Salvia, J. & Ysseldyke, J. (1992). Assessment in special and remedial education, 5th edition. Boston: Houghton-Mifflin.

©1998 National Association of School Psychologists, 4340 East West Highway, Suite 402, Bethesda, MD 20814; 301-657-0270.