Final Report

Intermediate Level Assessment in Technology Education--Data and Information Related to Standard Setting

A study performed for the New York State Education Department by

Gary Echternacht
Gary Echternacht, Inc.
4 State Park Drive
Titusville, NJ 08560

(609) 737-8187
garyecht@aol.com

 

April 27, 2001

Introduction

The New York State Board of Regents has established learning standards that all students must meet. One set of learning standards applies to mathematics, science, and technology. The standards pertain to accomplishment in a broad range of content. Key ideas, performance indicators, and sample tasks further describe each learning standard. Standards are also broken down by educational level--elementary, intermediate, and commencement.

To assess the extent to which students have met the learning standards, the New York State Education Department has developed a testing program. The content of the tests reflects accomplishment of the learning standards. For technology education, the State Education Department has developed an assessment that reflects accomplishment of the learning standards for technology. Most students will take this test while in middle school.

The purpose of the test is to help school districts identify specific areas of the middle school technology education program that may need change to help students achieve the intermediate-level technology standards. Thus, the technology test is intended for program evaluation rather than for individual student accountability. School districts are encouraged to use the scores from the testing to assess their technology education programs and align them with the learning standards.

The purpose of the study described in this report is to obtain information that the State Education Department can use to establish scores that will classify test takers into two categories. One category is intended to mean that the student did not meet the standards; the other is intended to mean that the student met the standards. Setting cut-points requires judgment. This study employs professionally established methods to quantify and summarize the judgments of experts about how individuals who have met the learning standards will perform on the test.

The Assessment in Technology

The assessment in technology is a two-part written examination administered in one 90-minute block of time in June. Test content is based on the intermediate-level key ideas and performance indicators found in the learning standards and core curriculum developed and adopted by the Board of Regents. Seven content areas define content of the test:

  • Engineering design
  • Tools, resources and technological processes
  • Computer technology
  • Technological systems
  • History and evolution of technology
  • Impacts of technology
  • Management of technology

The two parts of the assessment are as follows:

  • Part I consists of 40 multiple-choice questions. The questions relate to all of the above content areas.
  • Part II consists of 8 extended response items. In an extended response question, the test taker must write a response, complete a drawing, or complete some other specified task. Each extended response item is worth two or three points, for a total of 20 possible points on this part of the assessment. The extended response items are designed to test a student's critical thinking and problem-solving skills in a key idea content area.

Together, the two parts yield a maximum possible raw score of 60 points. A complete description of the examination, including test specifications and scoring rubrics, is given in a test sampler.

Methods Employed

Data related to the performance standards for the test were obtained from a committee of experts. Judgments from committee members were quantified using standard practices employed by psychometricians who conduct standard setting studies. The committee made their judgments with respect to items from one test form, which had been designated as the anchor test form. Subsequent forms of the test would be equated to the anchor test form so that all scores would be comparable over different test forms.

Committee members were given definitions of two performance categories--having met the learning standards and not having met the learning standards. The State Education Department developed these category definitions, and they are applied to all of the tests being developed. In addition, committee members were given an exercise designed to refresh their memories about the learning standards and an exercise in which they were asked to categorize some of their own students into the performance categories as defined by the State Education Department.

The committee met as a group on February 24, 2001 at the State Education Department.

The standard setting study used the bookmarking approach because all the multiple-choice and extended response items had been scaled using item response theory (IRT) methods and because the bookmarking procedure enables committee members to consider multiple-choice and extended response items together.

In the bookmarking procedure, the multiple-choice and extended response items are ordered by the difficulty parameters obtained in the scaling and equating of pretest forms. Committee members are asked to apply their judgments to these ordered items. The committee meeting is conducted in rounds. The rounds and the activities employed in each round are given below; a short illustrative sketch of the bookmark mechanics follows the round descriptions.

Round 1. Committee members review the Learning Standards for technology. Committee members discuss how students who have not met the performance standards perform in class and on similar assessments. Committee members discuss the meaning of minimum competence and how that is reflected in student performance.

Round 2. Working individually, committee members set their bookmark for having met the standards. That is, each committee member conceives of an individual who has the minimum level of skill and knowledge needed to meet the learning standards and indicates the item at which such a student becomes more likely to miss the item than to answer it correctly (or, for an extended response item, to write a response that is at least that good). Responses are recorded both on data sheets and in the notebook of ordered items.

Round 3. A report of the results of round 2 is given to committee members. The committee is divided into small groups, and the individual results are discussed. Committee members revise their judgments in light of the discussion. Responses are again recorded both on data sheets and in the notebook of ordered items.
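
As a concrete illustration of the bookmark placement described in round 2, the short sketch below (in Python) shows how ordering items by difficulty and recording a bookmark position yield a cut-point on the difficulty scale. It is a hypothetical sketch only: the item identifiers, difficulty values, and bookmark position are invented and are not the study's data.

```python
# Hypothetical sketch of the bookmark placement mechanics (not study data).
# Each entry pairs an item (or an extended response score point) with the IRT
# difficulty parameter obtained from the pretest scaling; values are invented.
items = [
    ("MC-12", -1.30),
    ("MC-03", -0.85),
    ("ER-2 (1 point)", -0.40),
    ("MC-27", -0.05),
    ("ER-5 (2 points)", 0.35),
    ("MC-31", 0.90),
]

# Step 1: order the items from easiest to hardest, as in the ordered item booklet.
ordered = sorted(items, key=lambda item: item[1])

# Step 2: a committee member, thinking of a minimally competent student, places a
# bookmark at the first item that student would be more likely to miss than to
# answer correctly. Here the (hypothetical) bookmark is placed at the fifth item.
bookmark_index = 4

# Step 3: the member's cut-point on the difficulty scale is the difficulty of the
# bookmarked item; these individual cut-points are what the study summarizes.
cut_difficulty = ordered[bookmark_index][1]
print(f"Individual cut-point on the difficulty scale: {cut_difficulty:+.2f}")
```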

Committee members were also asked two overall questions about accomplishment of the learning standards and test performance. Answers to these questions might aid New York in setting appropriate performance standards on the test. The questions asked for:

  • Each committee member's estimate of the percentage of students in his or her classes who are currently meeting the learning standards, based on observations of class work and the member's own assessments.
  • Each committee member's judgment of which classification error is more serious--categorizing a student as having met the learning standards who in reality has not, or categorizing a student as not having met the learning standards when in reality that student has.

Committee Members

The New York State Education Department's Office of Curriculum and Instruction assembled a committee of 19 people to provide judgments for the study. Committee members were all current or former classroom teachers. All committee members were recognized as very knowledgeable about the learning standards related to technology education and about how students perform on standardized tests similar to the Intermediate Level Technology Education test. Some had worked on an aspect of either the standards or the development of the tests.

Committee members, their schools, and the number of years of experience each has teaching technology education are given in the table below:

 

Committee Member        School and Location                                      Years Teaching Technology Education
Robert Bloom            Haviland Middle School, Hyde Park                        15
Irene Bodnaruk          Cosgrove Middle School, Spencerport                       6
Nancy Bryan             North Junior High School, Newburgh                       20
Sharon Crnkovich        Cosgrove Middle School, Spencerport                      20
Karin Dykeman           Liverpool High School, Liverpool                          9
John Gagliardo          Horseheads Middle School, Horseheads                     28
Sandra Gleason          J.W. Leary Junior High School, Massena                   14
Laura Gutenmann         Norwood-Norfolk Junior High School, Norwood              10
Jack Hall               J.W. Watson Bailey, Kingston                             15
Melissa Hirt            Philip Livingston Magnet Academy, Albany                 10
Alan Horowitz           Felix V. Festa Middle School, West Nyack                 33
Kenneth Johnson         Roosevelt Middle School, Roosevelt                       31
Christopher Lombardi    Burnt Hills-Ballston Lake Middle School, Burnt Hills     11
James McNeight          Schools 27 and 70, Buffalo                               12
Joseph Pesce            Hazard Street Middle School, Solvay                      18
William Rock            Somers Middle School, Somers                             26
Henry Strada            Louis M. Klein Middle School, Harrison                   16
James Taylor            Charlotte Middle School, Rochester                        2
Jason Zuba              Emmet Belknap Middle School, Lockport                     2

Committee members were chosen so that they would represent a wide range of schools and different types of students. Each committee member was asked to complete a short background questionnaire that included questions about their sex, ethnic background, and the setting for their school. Results of the questionnaire tabulations are given in the table below.

Characteristic                              Percent of Committee

Sex
  Female                                    37%
  Male                                      63%

Ethnic Background of Committee Member
  African-American                          11%
  White                                     89%

School Setting
  Urban                                     37%
  Suburban                                  42%
  Rural                                     21%

Findings Related to the Bookmarking Procedure

Findings--Round 2

In round 2, every committee member independently placed his or her own bookmark for having met the learning standards. Committee members noted the place on the difficulty continuum at which the minimally competent individual (described to the committee as an individual on the borderline between having and not having met the learning standards) becomes more likely to miss an item than to answer it correctly. The results of the placements are given in the table below. The table gives the item response difficulty parameter and the corresponding raw score on the anchor test form. The maximum raw score for the test is 60. The cut-points presented include the committee average plus or minus one or two standard deviations (i.e., standard deviations of the committee estimates) and the median committee cut-point, along with the cut-points corresponding to the 75th and 25th percentile ranks of the committee estimates.

Cut-point          Difficulty    Raw score (Max=60)
Mean + 2 SD           1.14              59
Mean + 1 SD            .54              54
Mean                  -.06              41
Mean - 1 SD           -.66              18
Mean - 2 SD          -1.26               9
75th percentile        .7               57
Median                -.1               41
25th percentile       -.6               21
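
The summary statistics in the table above can be computed mechanically from the individual committee cut-points, and a difficulty cut-point can be translated into an expected raw score through a test characteristic curve. The sketch below illustrates both steps under a simple one-parameter (Rasch) assumption; the committee cut-points and the 60 item difficulties are invented placeholders, and the operational scaling and raw score conversion used for the actual test may differ.

```python
import math
import statistics

# Invented placeholder cut-points (difficulty scale) for 19 committee members.
committee_cuts = [-0.9, -0.6, -0.5, -0.4, -0.3, -0.2, -0.2, -0.1, -0.1, -0.1,
                  0.0, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8]

mean = statistics.mean(committee_cuts)
sd = statistics.stdev(committee_cuts)
q1, median, q3 = statistics.quantiles(committee_cuts, n=4)

summary = {
    "Mean + 2 SD": mean + 2 * sd,
    "Mean + 1 SD": mean + sd,
    "Mean": mean,
    "Mean - 1 SD": mean - sd,
    "Mean - 2 SD": mean - 2 * sd,
    "75th percentile": q3,
    "Median": median,
    "25th percentile": q1,
}

# Under a Rasch assumption, the expected raw score at a given ability is the sum of
# the correct-response probabilities over all score points (the test characteristic
# curve). The 60 difficulties below are invented placeholders for the anchor form.
item_difficulties = [-2.5 + 5.0 * k / 59 for k in range(60)]

def expected_raw_score(theta):
    """Expected raw score (0-60) at ability theta under the Rasch assumption."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in item_difficulties)

for label, cut in summary.items():
    print(f"{label:15s} difficulty {cut:+.2f}   expected raw score {expected_raw_score(cut):4.1f}")
```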

Committee estimates varied widely, which is typical of an independent pass through the notebook. Variation was also expected because of the content area: technology education programs vary widely in how they are designed and delivered, and programs have wide latitude in the content they emphasize. Committee members indicated that their judgments of whether a test item was easy or difficult depended in large part on whether the content was emphasized in class work.

Findings--Round 3

In round 3, committee members received a report of their round 2 results. They were also placed in small groups in which individual results were discussed. After the discussion, committee members were asked to place another bookmark for having met the standards based on the information and knowledge they had gained up to that point. The round 3 results, which generally show little change from the round 2 results, are given in the table below.

Cut-point          Difficulty    Raw score (Max=60)
Mean + 2 SD           1.12              59
Mean + 1 SD            .52              54
Mean                  -.08              41
Mean - 1 SD           -.68              18
Mean - 2 SD          -1.28               9
75th percentile        .4               54
Median                -.2               40
25th percentile       -.6               21

Although most committee members changed their bookmark, the overall distribution of bookmarks changed very little.

Other Judgments Obtained

Committee members were asked to provide their best judgment of the percentage of their students who are not currently achieving at least a proficient level of performance with respect to the learning standards. These judgments were made not with respect to the test, but with respect to the learning standards and the definitions of having met the learning standards. The committee averages and standard deviation are presented in the table below.

Category                      Committee Average    Standard Deviation
% not meeting standards             23%
% meeting standards                 77%                     8%

The data in the table above relate to the cut-point score for the test in that the committee, on average, was indicating that in its judgment almost one in four students in the state is not currently achieving at the minimum level suggested by the learning standards. This assessment was made without test scores and is independent of them. The reported percentages of students not currently meeting the learning standards varied between 15% and 40%.

With respect to the relative severity of the errors of classification, the committee reported different results depending on the use of the test. When the test is used for program evaluation, 58% of the committee said that the more serious classification error was classifying a student as having met the standards when in fact the student had not. If the test were used as an individual accountability measure (i.e., passing the test were a requirement for graduation or promotion), 95% of the committee indicated that the more serious classification error was classifying a student as not having met the learning standards when in fact the student had.

Discussion and Recommendations

The purpose of this study is to obtain data and information that New York may use in setting cut-points for its assessment in technology education. The data should be used to guide those decisions.

The committee that provided the data was diverse and well represented the diversity of New York students, teachers, and school districts. With that diversity, it is not surprising that committee judgments varied.

The final bookmarks from the procedure are given in the table below.

Cut-point          Difficulty    Raw score (Max=60)
Mean + 2 SD           1.12              59
Mean + 1 SD            .52              54
Mean                  -.08              41
Mean - 1 SD           -.68              18
Mean - 2 SD          -1.28               9
75th percentile        .4               54
Median                -.2               40
25th percentile       -.6               21

The committee also indicated that about 23% of students are not currently achieving the learning standards. Further, the committee believed that, as long as the test is used for program evaluation, the misclassification error of classifying a student as having met the standards who has not actually met them is the one that should be minimized.

What should be made of these results?

The study author recognizes that New York has the responsibility and duty to set cut-points in such a way that the purpose of the testing program is best accomplished. That requires judgment and consideration of all the data and information that is available at the time cut-points are set.

To the study author, one item stands out in importance. Committee members indicated in their judgments of the relative severity of classification errors that their choice of cut-point depends on how the test is used. As long as the test is used for program evaluation, the committee is much more tolerant of misclassifying students who have met the standards but who score below the cut-point on the test. To the study author, that implies setting an initial cut-point that is near the committee bookmark average.

Thus, the study author recommends that the initial cut-point be set somewhere between raw scores of 37 and 45 (the maximum raw score for the test is 60). If forced to choose a single raw score for initial implementation of the testing, the study author would choose 41. New York is strongly encouraged to make its own decision based on the data available, however.

It is important to note that this study relies heavily on the accuracy of the IRT difficulty parameters that were obtained during pretesting. The study author notes that those parameters have extremely large standard errors--on the order of .1 unit. These large standard errors are due to three factors (the effect of sample size alone is sketched after the list below):

  • Relatively small sample sizes.
  • Variation in motivation on the part of students.
  • Variation in degree to which different content areas are currently emphasized.
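
As a rough illustration of how pretest sample size alone affects the precision of an item difficulty estimate, the sketch below uses the standard Rasch approximation that the standard error of an estimated difficulty is about one over the square root of the summed item information; the sample sizes shown are hypothetical and are not the study's pretest counts.

```python
import math

# Rough illustration (not study data): under a Rasch model, the standard error of an
# estimated item difficulty is approximately 1 / sqrt(sum over examinees of p*(1-p)).
# For an item of middling difficulty (p near .5) this is roughly 2 / sqrt(N).
for n_examinees in (100, 400, 1600):  # hypothetical pretest sample sizes
    se = 1.0 / math.sqrt(n_examinees * 0.5 * 0.5)
    print(f"N = {n_examinees:5d}   approximate SE of difficulty = {se:.2f}")
```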

To the study author, these large standard errors make the initial choice of a cut-point tentative until further data is collected. The study author recommends that:

  1. The state set an initial cut-point that falls within the range of 37 to 45 raw score points.
  2. The state obtain a distribution of scores from school districts that use the test in 2001. This distribution should include both raw scores and passing percentages.
  3. If the percentage of students who are considered to have met the learning standards based on test scores is between 70% and 85% (recall that the committee indicated that 77% of students were currently meeting state learning standards for technology education), keep the initially established cut-point; a minimal sketch of this check follows the list.
  4. If the percentage falls outside of that range, reconvene the committee and repeat the bookmarking procedure, but add another round in which committee members are given the estimated percentage of students who would be considered as having met the learning standards for their choice of cut-point.
  5. Repeat recommendations 2-4 for the tests given in 2002.
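
The check described in recommendations 2 and 3 amounts to a simple tabulation. The sketch below is hypothetical: the raw scores are invented placeholders standing in for the 2001 score distribution, and the cut-point of 41 is the initial value suggested above.

```python
# Hypothetical sketch of the check in recommendations 2 and 3 (not operational data).
raw_scores = [58, 44, 39, 52, 41, 36, 47, 60, 33, 45]  # placeholder 2001 raw scores
cut_point = 41                                          # initial cut-point (raw score)

percent_meeting = 100.0 * sum(score >= cut_point for score in raw_scores) / len(raw_scores)

if 70.0 <= percent_meeting <= 85.0:
    print(f"{percent_meeting:.0f}% meeting standards: keep the initial cut-point.")
else:
    print(f"{percent_meeting:.0f}% meeting standards: reconvene the committee (recommendation 4).")
```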

If the state chooses to change the intended use of the test from program evaluation to individual accountability, the study author recommends that, at that time, the cut-point be lowered by one or two raw score points and that recommendations 2-4 again be implemented.