Achievement Tests
Designed to measure what you already know.
In 1962, the Scholastic Aptitude Test replaced the essay test used by the College Entrance Examination Board.
This test, together with the advent of machine scoring, led to a rapid increase in the use of standardized achievement tests in the U.S.
Achievement testing serves many purposes:
1. Assess level of competence
2. Diagnose strengths and weaknesses
3. Assign grades
4. Achieve certification or promotion
5. Advanced placement/college credit exams
6. Curriculum evaluation
7. Accountability
8. Informational purposes
Differences in Approaches to Achievement Testing
The information gained from a standardized test depends on how the testing is incorporated into the learning material.
Summative Evaluation: Testing is done at the end of the instructional unit. The test score is seen as the summation of all knowledge learned during a particular subject unit.
Formative Evaluation: Testing occurs continually alongside learning so that teachers can evaluate the effectiveness of teaching methods along with students' abilities.
Standardized achievement tests can be:
Norm-Referenced
Criterion-Referenced
The National Assessment of Educational Progress
This organization is dedicated to improving the effectiveness of our schools.
In order to accomplish this goal, it makes objective information concerning scholastic performance available to educators and public policy officials.
It uses a criterion-referenced approach to evaluating performance in ten subject areas, with test takers stratified into four age groups: 9, 13, 17, and 25-35.
The 10 subject areas for which they develop criterion references:
Art
Occupational Development
Citizenship
Literature
Math
Music
Reading
Science
History
Writing
The criteria they set can be used as guidelines to evaluate the effectiveness of the educational system within a particular area by comparing local performance to the national criterion levels.
Types of Standardized Achievement Tests
4 Major Categories
Survey Test Batteries: Commonly used to determine general standing with respect to group performance. The battery is a group of subject area tests, usually containing a fairly limited sample of questions within each subject area. Test batteries usually have lower reliabilities than single-subject survey tests because of the limited question sample in each subject area.
Single-Subject Survey Tests: Longer and more detailed than batteries, but only one subject area is covered by the test. Greater sampling of questions means higher levels of reliability than survey batteries.
Diagnostic Tests: Allow for the identification of specific strengths and weaknesses within a subject area by subdividing the subject area into its underlying components. Diagnostic tests are most common in the areas of reading, mathematics, spelling, and foreign languages.
Prognostic Tests: Aptitude tests designed to predict achievement in specific school subjects.
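The link between question sample size and reliability noted above for survey batteries and single-subject tests can be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result that these notes do not name. The sketch below is only an illustration under stated assumptions: the function name, the starting reliability of 0.70, and the threefold length increase are all hypothetical, and the formula assumes the added questions are comparable in quality to the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict reliability after a test is lengthened by `length_factor`.

    Spearman-Brown prophecy formula: r_new = k*r / (1 + (k - 1)*r),
    where r is the current reliability and k is the length multiplier.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical numbers: a short battery subtest with reliability 0.70,
# compared with a single-subject test three times as long.
battery_subtest = 0.70
single_subject = spearman_brown(battery_subtest, 3.0)
print(round(single_subject, 3))  # 0.875
```

Under these assumed numbers, tripling the number of questions raises the predicted reliability from 0.70 to about 0.88, which is the pattern the two category descriptions point to.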
Criticisms of College Entrance Examinations
1. Preparation for college entrance exams takes up time previously devoted to learning.
2. Multiple-choice questions are inherently biased because they:
A. Favor shrewd, nimble-witted, rapid readers.
B. Penalize creative, more profound thinkers.
C. Are concerned only with the answer, not with how the person arrived at it. (Banesh Hoffmann, 1962)
D. Encourage improper study habits such as rote memorization.
Criticism of Educational Testing Service (ETS)
ETS came under fire in the 1980s from Ralph Nader, a consumer advocate and perennial candidate for the presidency.
Nader criticized the SAT and ACT for not measuring imagination, idealism, determination, and other abilities that he considered important in quantifying human qualities.
Allan Nairn (1980), a colleague of Nader's, claimed that the SAT measures social class rather than educational aptitude.
Nairn claims ETS is trying to suppress this information because, in his view, the purpose behind the SAT is to maintain the status quo of society and to deny opportunity to those of lower socioeconomic status.
Nairn also claims that since the SAT is a poor predictor of college grades, a different measure of assessment should be developed.
ETS responded to these highly publicized attacks by claiming the SAT does not deny access to higher education to individuals from working-class and poor families.
The National Center for Fair and Open Testing (FairTest)
FairTest continued the attacks made by Nairn by claiming the SAT is biased against minority groups and women, and therefore denies them an equal opportunity for higher education.
They also criticized the use of "experimental" sections of the SAT, which were not used for grading purposes.
New York State has a "truth in testing" law which requires that all test takers be given copies of their own answer sheet and be informed how test scores will be computed. In addition, test developers must file information concerning the validity of the standardized measure with the State Commission of Education.
ETS claims that careful internal review of all potential test items has resulted in the removal of item bias that would adversely affect the scores of women and/or minorities.
Another Concern of Standardized Testing
How much can coaching affect scores?
If scores are significantly affected by Kaplan seminars or other preparation methods, how reliable are these standardized tests?
To the degree that people with more disposable resources (higher SES) are more likely to take advantage of these programs, is a class bias being created that affects the interpretation of the test scores?
Studies designed to measure the effectiveness of these test-preparation seminars have produced mixed results:
College Entrance Examination Board (1971): Claimed there was no evidence that short-term, intensive drilling on SAT-type questions led to significantly higher scores on the verbal portion of the SAT.
However, studies done by Stanley Kaplan and the FTC (1979) showed significant gains after a 10-week coaching course.
ETS reanalyzed the FTC data and concluded the coaching sessions could add 20-35 points to both math and verbal scores.
Demographic Differences in SAT Scores:
SAT scores increased from the 1950s to the 1960s and have been declining ever since.
The SAT scores were renormed in 1996 to bring the mean back to 500 and the standard deviation back to 100.
Declines occurred for both sexes, for all ethnic groups, and for both low and high performers.
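What renorming to a mean of 500 and a standard deviation of 100 amounts to can be sketched with a simple linear rescaling. This is an assumption made for illustration only: these notes do not describe the procedure actually used in 1996, and the function name and raw scores below are hypothetical.

```python
from statistics import mean, pstdev


def renorm(scores, target_mean=500.0, target_sd=100.0):
    """Linearly rescale scores so the group has the target mean and SD.

    Each score is converted to a z-score (distance from the group mean in
    standard deviation units) and then mapped onto the target scale.
    """
    group_mean = mean(scores)
    group_sd = pstdev(scores)  # population SD of the reference group
    if group_sd == 0:
        raise ValueError("all scores are identical; cannot rescale")
    return [target_mean + target_sd * (s - group_mean) / group_sd for s in scores]


# Hypothetical reference group whose average has drifted below 500.
raw_scores = [380, 420, 450, 480, 520]
rescaled = renorm(raw_scores)
print(round(mean(rescaled), 1), round(pstdev(rescaled), 1))
```

A rescaling of this kind leaves the rank order of test takers unchanged; it only changes the numbers attached to each position in the distribution.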
Numerous Explanations for the Drop in Scores:
Television
Less parental attention
Teachers paying less attention to students
Less parental supervision
Less parental concern
Students less motivated
Substance abuse among students
Increased permissiveness in society
Austin & Garber (1982) analyzed the decline in SAT scores.
They found that:
Half of the variance in the decline in scores was due to differences in the overall composition of students taking the exam.
As college became more accessible to students from middle- and lower-class backgrounds, more and more students began taking the entrance exam.
As the population of students taking the SAT began to more closely resemble the population of all possible students who could take the exam, the scores dropped.
This is one qualification public schools in NYS make when cautioning parents about interpreting standardized test scores by school district.
If a school district tries to maximize the proportion of students taking these exams, it will take pains to mention that fact when its "report cards" come out.
A high level of performance at a school where only 20% of students take the exam cannot be meaningfully compared to test performance at a school where 85% of students take it.
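The composition effect described above can be shown with a weighted average: the overall mean can fall even when no group of students scores any lower, simply because the mix of test takers changes. The group sizes and group means below are hypothetical numbers chosen for illustration, not data from Austin & Garber.

```python
def overall_mean(groups):
    """Weighted mean score for a list of (number_of_test_takers, group_mean)."""
    total_takers = sum(n for n, _ in groups)
    return sum(n * group_mean for n, group_mean in groups) / total_takers


# Hypothetical: two groups whose own average scores never change.
# Earlier, few students from the second group take the exam.
earlier = [(800, 550.0), (200, 450.0)]
# Later, participation from the second group grows fourfold.
later = [(800, 550.0), (800, 450.0)]

print(overall_mean(earlier))  # 530.0
print(overall_mean(later))    # 500.0
```

The same arithmetic applies to the school comparison: a school testing 20% of its students and a school testing 85% are averaging over different mixes of students, so their means are not directly comparable.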
Demographic Differences in Scores
Examine Gender, Geography, and Ethnicity
Gender: Men score 37 points higher than women on the SAT math section. Testosterone, hemispheric lateralization, differential reinforcement from math teachers, and differential cultural expectations are four hypothesized explanations for this discrepancy.
Men score 7 points higher than women on the SAT verbal section.
Bob Schaeffer from FairTest: Girls are more inclined to think through a problem and weigh all the options, and that puts them at a strategic disadvantage on multiple-choice tests.
This gender gap means women are less likely to receive scholarships than men.
ETS claims the gender gap represents genuine differences in education.
In the freshman and sophomore years of college, women achieve a higher GPA than men, on average.
Ethnicity Differences in SAT Scores
SAT scores for Asian-Americans are higher than for Caucasian Americans.
SAT scores for all other minority groups fall below the test score levels of Caucasian Americans.
These lower SAT scores are accounted for in large part by:
Lower family income of minorities, compared to Caucasians
Lower educational level of parents, compared to Caucasian parents
Average SAT scores from large cities are typically below the overall average.
Average SAT scores from suburban regions are typically above the overall average.
To the degree that primary education systems suffer from institutional racism as a result of funding policies, the class differences that produce error variation in SAT scores will persist into the future.
Conclusion
Despite the many criticisms, the SAT is still the single best predictor of who will be seen as academically successful during the first year of college.
The SAT will continue to be used extensively in the future.
The differences in education exposed by the SAT (gender, ethnicity, geography) illustrate areas of improvement which must be made to ensure equal opportunity for higher education for all Americans, not simply those who are well off financially and live in educational districts where high levels of educational attainment are the norm rather than the exception.
Both 2000 presidential candidates have stated that education is one of their "top priorities" should they be elected.
Bush is pushing "National Testing" as a way to ensure progress is being made.