Final Report

Living Environment Regents Examination--
Data and Information Related to Standard Setting

A study performed for the New York State Education Department by

Gary Echternacht
Gary Echternacht, Inc.
4 State Park Drive
Titusville, NJ 08560

(609) 737-8187
garyecht@aol.com

April 27, 2001

Introduction

The New York State Board of Regents has established learning standards that all students must meet to graduate from high school. One set of learning standards covers mathematics, science, and technology. Within that set, some standards apply to Living Environment.

Key ideas, performance indicators, and sample tasks further describe each learning standard. Standards are also broken down by educational level--elementary, intermediate, and commencement. To assess the extent to which students have met the learning standards, the New York State Education Department has developed a testing program. The content of the tests reflects accomplishment of the learning standards. For Living Environment, the State Education Department has developed the Regents Examination in Living Environment to reflect accomplishment of the relevant standards.

Although scores for the Living Environment Regents Examination are placed on a numerical scale, there are essentially only three score categories--does not meet standards, meets standards, and meets standards with distinction. New York State teachers, using professionally established procedures, have developed the test items, and the items have been pretested and field-tested on samples of students.

The purpose of the study described in this report is to obtain information that the State Education Department can use to establish scores that will classify test takers into the does not meet standards, meets standards, and meets standards with distinction categories. Setting cut-scores requires judgment. This study employs professionally established methods to quantify and summarize expert judgments about how individuals who have met the learning standards will perform on the test.

The Living Environment Regents Examination

The Living Environment Regents Examination assesses student achievement at the commencement level. Items for the examination were developed through the cooperative efforts of teachers, school districts, other science educators, and New York State Education Department staff. The concepts and skills tested can be found in the Living Environment Core Curriculum. Students are asked to graph, complete a data table, label diagrams, design experiments, analyze data, and write responses. In addition, questions require students to hypothesize, interpret, evaluate, and apply their scientific knowledge to real-world situations.

The examination is administered in a three-hour period and has three parts:

Part A consists of multiple choice questions assessing the student’s knowledge and understanding of core material.
Part B consists of multiple choice and constructed response questions assessing the student’s ability to apply, analyze, and evaluate material.
Part C consists of constructed response and extended response questions assessing the student’s ability to apply knowledge of science concepts and skills to address real-world situations.

The examination blueprint, taken from the test sampler, is given in the table below:

Content                                        Approximate Weight (%)

Standard 1 (Analysis, Inquiry, and Design)
    Laboratory Checklist                       10-20

Standard 4
    Key Idea 1                                 13-17
    Key Idea 2                                 9-13
    Key Idea 3                                 8-12
    Key Idea 4                                 6-10
    Key Idea 5                                 13-17
    Key Idea 6                                 10-14
    Key Idea 7                                 11-13

A complete description of the examination, including test specifications and scoring rubrics, is given in the test sampler.

Methods Employed

Data related to the performance standards for the test were obtained from a committee of experts. Judgments from committee members were quantified using standard practices employed by psychometricians who conduct standard setting studies. The committee made its judgments with respect to the difficulty scale resulting from the scaling and equating of field test items. In the field testing, each item (or score category, if the item has multiple score points) is given a difficulty parameter obtained through item response theory methods. Test items corresponding to various points on the difficulty scale are presented as examples of test items at that difficulty level. The items used for the study came from the anchor test form, the form upon which the cut-points are set and to which all later forms of the test will be equated.
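To make the difficulty scale concrete: a cut-point on that scale can be translated into an expected raw score through the test characteristic curve. The sketch below is a minimal illustration, assuming a Rasch model and invented item difficulties; the operational conversion would use the calibrated field-test parameters and would handle multiple-score items through their score categories.

```python
import math

# Minimal sketch: mapping a theta cut-point to an expected raw score via the
# test characteristic curve. Assumes a Rasch model; the item difficulties
# below are invented for illustration, not the operational calibration.

def rasch_prob(theta, b):
    """Probability of a correct response at ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_raw_score(theta, difficulties):
    """Test characteristic curve: expected raw score at ability theta."""
    return sum(rasch_prob(theta, b) for b in difficulties)

difficulties = [-1.2, -0.5, 0.0, 0.4, 1.1]  # invented example values
print(round(expected_raw_score(0.13, difficulties), 2))  # raw score at theta = .13
```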

Committee members were given definitions of three performance categories--does not meet standards, meets standards, and meets standards with distinction. The State Education Department developed these category definitions, and they are applied to all of the Regents examinations being developed. In addition, committee members were given an exercise designed to familiarize them with the examination and an exercise in which they were asked to categorize some of their own students into the three performance categories.

The committee met as a group on March 30, 2001, at the State Education Department.

The standard setting study used the bookmarking approach because all of the multiple choice and constructed response items had been scaled using item response theory methods and because the bookmarking procedure enables committee members to consider the two item types together.

In the bookmarking procedure, multiple choice items and constructed response items are ordered in terms of their difficulty parameters. The purpose of the items is to illustrate the meaning of the difficulty scale at specific points. Committee members are asked to apply their judgments to these ordered items. The committee meeting is conducted in rounds. The rounds and the activities employed in each round are given below; a computational sketch of the round 2 judgment criterion follows the list.

Round 1: Committee members review the Learning Standards for the content area and consider ways of measuring accomplishment of the performance indicators and key ideas. Committee members also review the ordered items to understand the increasing complexity of the items and of the responses required.

Round 2: Working individually, committee members set their bookmark for meeting standards. That is, committee members conceive of an individual who has the minimum level of skill and knowledge needed to meet standards and indicate the last item (or difficulty level) that this hypothetical individual is likely to answer correctly two-thirds of the time (or to construct a response that is at least as good).

Round 3: Working individually, committee members set their bookmark for meeting standards with distinction. That is, committee members conceive of an individual who has the minimum level of skill and knowledge needed to meet the standards with distinction and indicate the last item that this hypothetical individual is likely to answer correctly (or to construct a response that is at least as good).

Round 4: A report of the round 2 results is given to committee members. The committee is divided into small groups, and the individual results are discussed. Committee members revise their judgments in light of the discussion.

Round 5: The same procedure as in round 4 is used with the round 3 results.

Round 6: A report of rounds 4 and 5 is given to the committee, along with the impacts (the percent not meeting standards and the percent meeting standards with distinction, based on field test results). Committee members make final judgments based on the accumulated judgments and data.
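The two-thirds criterion used in round 2 has a simple algebraic form. As a minimal sketch, assuming a Rasch model and assuming the reported difficulty scale has not already absorbed the response-probability adjustment, the ability cut implied by a bookmarked item of difficulty b is b plus the logit of 2/3; the bookmarked difficulty in the example is an invented value.

```python
import math

# Sketch of the round 2 response-probability criterion under a Rasch model:
# the theta at which P(correct) = p for an item of difficulty b is
# theta = b + ln(p / (1 - p)). For p = 2/3 the adjustment is ln(2) ~ 0.69.

def theta_for_response_probability(b, p=2/3):
    """Ability at which an item of difficulty b is answered correctly with probability p."""
    return b + math.log(p / (1.0 - p))

print(round(theta_for_response_probability(0.13), 3))  # 0.13 is an invented bookmark
```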

Committee members were also asked four overall questions about accomplishment of the learning standards and test performance. Answers to these questions might aid New York in setting appropriate performance standards on the test. The questions asked for:

Each committee member's estimate of the percentage of students in their classes who are currently meeting the learning standards.

Each committee member's estimate of the percentage of students in their classes who are currently meeting the learning standards with distinction.

A judgment of which is the more serious error--to categorize a student as having not met the standards when in reality the student has met the standards, or to categorize a student as having met the standards when in reality the student has not met the standards.

A judgment of which is the more serious error--to grant distinction to a student who has not met the learning standards at that level, or to fail to grant distinction to a student who has achieved that level of proficiency.

Committee Members

The New York State Education Department's Office of Curriculum and Instruction assembled a committee of 22 people to provide judgments for the study. Committee members were, with one exception, current or former classroom teachers. All committee members were recognized as very knowledgeable about the learning standards pertaining to the Living Environment and about how students perform on standardized tests similar to the Living Environment Examination. Some had worked on some aspect of either the standards or the development of the tests.

Committee members, their schools, the number of years of experience each has in teaching Living Environment or Biology, and the number of students currently in their Living Environment or Biology classes are given in the table below.

Committee Member       School and Location                                        Years Teaching   Number of Current Students
Donna Barosso          Maple Hill High School, Castleton                          21               60
Joyce Thornton Barry   Carle Place High School, Carle Place                       12               27
Natasha Bell           Josh Lofton High School, Rochester                         16               27
Arlene Blecher         Long Island City High School, Long Island City             30               34
Arthur Broga           Canastota High School, Canastota                           33               40
James Buckley          Edwards-Knox Central School, Russell                       18               60
Barbara Byrne          Ichabod Crane High School, Valatie                         23               95
Diane DiGravio         Armstrong Middle School, Ontario Center                    3                38
Jack Edelman           John Dewey High School, Brooklyn                           22               140
James Fizer            Port Jervis High School, Port Jervis                       25               60
Joyce Fruchter         Yeshivah of Flatbush High School, Brooklyn                 16               90
Kathy Giglio           Odyssey School, Rochester                                  22               0
Dawn George            Hilton High School, Hilton                                 9                53
Ellen Mandel           Campus Magnet High Schools, Cambria Heights                22               15
Barbara Poseluzny      Phillip Randolph Campus High School, New York              25               60
Patricia Prime         Canajoharie High School, Canajoharie                       17               78
Kathleen Schuehler     Liverpool High School, Liverpool                           22               100
Jim Senning            Peekskill High School, Peekskill                           22               68
Diana Shanty           Scotia-Glenville High School, Scotia                       18               115
William Siebert        Arlington High School, LaGrangeville                       19               85
Winston Silvera        Harry S. Truman High School, Bronx                         30               170
Philip Zenowich        Saratoga Central Catholic High School, Saratoga Springs    12               40

Committee members were chosen so that they would represent a wide range of schools and different types of students. Each committee member was asked to complete a short background questionnaire that included questions about their sex, ethnic background, and the setting for their school. Results of the questionnaire tabulations are given in the table below.

Characteristic                            Percent of Committee

Sex
    Female                                64%
    Male                                  36%

Ethnic Background of Committee Member
    African-American                      5%
    White                                 90%
    Other                                 5%

School Setting
    New York City                         27%
    Other urban                           9%
    Suburban                              41%
    Rural                                 23%

Findings Related to the Bookmarking Procedure

Findings--Round 2

In round 2, every committee member independently placed his or her own bookmark for meeting standards. The results of the placements are given in the table below. The table gives the difficulty parameter, its corresponding raw score, and the corresponding percent of students that would fall below that cut-point based on the field test data. The cut-points shown include the committee mean plus or minus one and two standard deviations (i.e., standard deviations of the committee estimates), the committee median, and the cut-points corresponding to the 75th and 25th percentiles of the committee estimates.

Cut-point          Difficulty   Raw score (Max=85)   Percent below
Mean + 2 SD        1.38         60                   74%
Mean + 1 SD        .75          46                   44%
Mean               .13          34                   19%
Mean - 1 SD        -.50         15                   2%
Mean - 2 SD        -1.12        6                    0%
75th percentile    .25          34                   19%
Median             .00          31                   14%
25th percentile    -.20         22                   6%
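The summary rows in these findings tables follow directly from the individual committee judgments. As a minimal sketch, assuming an invented set of theta judgments (the individual judgments are not reproduced in this report), the statistics could be computed as follows; the raw-score and percent-below columns would then come from the test characteristic curve and the field-test score distribution, as sketched earlier.

```python
import statistics

# Sketch of the summary statistics behind each findings table. The committee
# judgments below are invented placeholders, not the study's actual data.
judgments = [0.45, -0.10, 0.30, 0.00, 0.25, -0.35, 0.13, 0.20]

mean = statistics.mean(judgments)
sd = statistics.stdev(judgments)
q1, median, q3 = statistics.quantiles(judgments, n=4)  # quartiles

for label, value in [("Mean + 2 SD", mean + 2 * sd), ("Mean + 1 SD", mean + sd),
                     ("Mean", mean), ("Mean - 1 SD", mean - sd),
                     ("Mean - 2 SD", mean - 2 * sd), ("75th percentile", q3),
                     ("Median", median), ("25th percentile", q1)]:
    print(f"{label:18s}{value:+.2f}")
```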

Findings--Round 3

In round 3, every committee member independently placed his or her own bookmark for meeting standards with distinction. The results of the placements are given in the table below. The table gives the resulting difficulty parameter, its corresponding raw score, and the corresponding percent of students that would fall above that cut-point based on the field test data. The cut-points shown are the same summary statistics as in the round 2 table.

Cut-point          Difficulty   Raw score (Max=85)   Percent above
Mean + 2 SD        2.70         84                   0%
Mean + 1 SD        2.22         82                   1%
Mean               1.74         70                   13%
Mean - 1 SD        1.26         57                   31%
Mean - 2 SD        .78          47                   54%
75th percentile    2.00         75                   8%
Median             1.80         71                   11%
25th percentile    1.58         66                   18%

Findings--Round 4

In round 4, committee members received a report of their round 2 results. They also met in small groups to discuss the individual results. After the discussion, committee members were asked to place another bookmark for meeting standards based on the information and knowledge they had gained up to that point. The round 4 results, which generally show less variation than the round 2 results, are given in the table below.

Cut-point          Difficulty   Raw score (Max=85)   Percent below
Mean + 2 SD        .54          38                   26%
Mean + 1 SD        .24          34                   19%
Mean               -.07         28                   11%
Mean - 1 SD        -.37         17                   3%
Mean - 2 SD        -.68         12                   1%
75th percentile    .10          33                   18%
Median             .00          31                   14%
25th percentile    -.28         21                   5%

Findings--Round 5

In round 5, committee members received a report of their round 3 results. They also met in small groups to discuss the individual results. After the discussion, committee members were asked to place another bookmark for meeting standards with distinction based on the information and knowledge they had gained up to that point. The round 5 results, which generally show less variation than the round 3 results, are given in the table below.

Cut-point          Difficulty   Raw score (Max=85)   Percent above
Mean + 2 SD        2.18         81                   2%
Mean + 1 SD        2.03         77                   6%
Mean               1.88         72                   10%
Mean - 1 SD        1.74         70                   13%
Mean - 2 SD        1.59         66                   18%
75th percentile    1.90         72                   10%
Median             1.80         71                   11%
25th percentile    1.80         71                   11%

Findings--Round 6

In round 6, committee members received a report of their round 4 and round 5 judgments, along with a report of the impact of those judgments. Impact was reported in terms of the frequency distributions of the field test scores. The committee was also advised that scores from field testing generally underestimate operational test performance, but that the amount of the underestimate was not known. Committee members then returned to their groups and discussed the report and their judgments. At the end of the discussion, committee members were asked to place final bookmarks, both for meeting standards and for meeting standards with distinction, based on the information and knowledge they had at that time. Results of this final placement are given in the table below.


                   Meeting standards                    Meeting standards with distinction
Cut-point          Diff     Raw score   Percent below   Diff     Raw score   Percent above
Mean + 2 SD        .45      36          23%             2.24     83          0%
Mean + 1 SD        .21      34          19%             2.10     80          3%
Mean               -.02     30          13%             1.95     73          10%
Mean - 1 SD        -.26     21          5%              1.81     71          11%
Mean - 2 SD        -.49     16          3%              1.66     68          15%
75th percentile    .10      33          18%             2.00     75          8%
Median             .00      31          14%             1.90     72          10%
25th percentile    -.15     24          7%              1.86     72          10%
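The impact percentages reported throughout these tables come from the field-test score distribution. The sketch below illustrates the computation, assuming an invented frequency distribution of field-test raw scores; the actual distribution is not reproduced in this report.

```python
from collections import Counter

# Sketch of the impact computation: percent of field-test examinees falling
# below a raw-score cut (or at/above it, for distinction). The frequency
# distribution here is invented for illustration.
field_test_scores = Counter({15: 40, 25: 110, 35: 210, 50: 190, 65: 120, 75: 40})
total = sum(field_test_scores.values())

def percent_below(cut):
    """Percent of examinees with raw scores below the cut."""
    return 100.0 * sum(n for s, n in field_test_scores.items() if s < cut) / total

def percent_at_or_above(cut):
    return 100.0 - percent_below(cut)

print(f"{percent_below(31):.0f}% below 31; {percent_at_or_above(72):.0f}% at or above 72")
```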

Other Judgments Obtained

Committee members were asked to provide their best judgment of the percentage of their students who are currently achieving the learning standards, as well as the percentage of their students who are achieving the learning standards with distinction. These judgments were made not with respect to the test, but with respect to the learning standards and the definitions of meeting standards and meeting standards with distinction. Summary statistics for these judgments are presented in the table below.

Statistic          % Meeting Standards   % Meeting Standards with Distinction
Mean + 2 SD        100%                  57%
Mean + 1 SD        89%                   38%
Mean               73%                   19%
Mean - 1 SD        58%                   0%
Mean - 2 SD        43%                   0%
75th percentile    84%                   24%
Median             75%                   14%
25th percentile    71%                   10%

The data in the table above relate to the cut-points for the test in that the committee, on average, judged that almost one of every four students in the state is not currently achieving at the minimum level suggested by the learning standards. This assessment was made without test scores and is independent of them. Similarly, the committee on average judged that about 15%-20% of students are achieving at the distinction level.

Also noteworthy is the relatively large variation in the estimates, which reflects very real variation in achievement among classrooms. For example, estimates of the percentage of students achieving at least at the meets standards level ranged from 20% to 90%. For meeting standards with distinction, the estimates ranged from 0% to 85%.

With respect to the relative severity of the errors of classification, 73% of the committee said that categorizing a student as having not met the standards when in reality the student has met them was the more serious error; 27% said the opposite. With respect to meeting the standards with distinction, the committee was nearly evenly divided: 55% said that granting distinction to a student who in reality has not reached that level of achievement was more serious than failing to grant distinction to a student who has achieved at that level, and 45% said the opposite.

The above suggests that the committee might be regarded as "lenient" with respect to the cut-point for meeting standards, but could be considered indifferent with respect to errors of classification for the cut-point for meets standards with distinction.
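One way to see how such preferences bear on a cut-point is through expected classification cost. The sketch below is purely illustrative: the score distributions, base rate, and cost weights are invented assumptions, not study data. Weighting the more serious error (failing a student who has met the standards) more heavily pushes the cost-minimizing cut downward, the "lenient" direction suggested above.

```python
from statistics import NormalDist

# Illustrative only: how asymmetric error costs shift a cut-point. All of the
# distributions, base rate, and cost weights below are invented assumptions.
masters = NormalDist(mu=50, sigma=10)      # scores of students meeting standards
non_masters = NormalDist(mu=30, sigma=10)  # scores of students not meeting them
p_master = 0.75                            # invented base rate
fn_cost, fp_cost = 2.0, 1.0                # failing a true master weighted 2x

def expected_cost(cut):
    false_negative = masters.cdf(cut) * p_master * fn_cost
    false_positive = (1 - non_masters.cdf(cut)) * (1 - p_master) * fp_cost
    return false_negative + false_positive

lenient = min(range(0, 86), key=expected_cost)
fn_cost = fp_cost  # with symmetric costs, the cost-minimizing cut moves upward
neutral = min(range(0, 86), key=expected_cost)
print(lenient, neutral)  # the asymmetric (lenient) cut is the lower of the two
```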

Discussion and Recommendations

The purpose of this study is to obtain data and information that New York may use in setting cut-points for the Living Environment Examination. The data should be used to guide those decisions.

The committee that provided the data was diverse and broadly representative of New York students, teachers, and school districts. Given that diversity, it is not surprising that committee judgments varied.

The final bookmarks from the procedure are given in the table below.


                   Meeting standards                    Meeting standards with distinction
Cut-point          Diff     Raw score   Percent below   Diff     Raw score   Percent above
Mean + 2 SD        .45      36          23%             2.24     83          0%
Mean + 1 SD        .21      34          19%             2.10     80          3%
Mean               -.02     30          13%             1.95     73          10%
Mean - 1 SD        -.26     21          5%              1.81     71          11%
Mean - 2 SD        -.49     16          3%              1.66     68          15%
75th percentile    .10      33          18%             2.00     75          8%
Median             .00      31          14%             1.90     72          10%
25th percentile    -.15     24          7%              1.86     72          10%

The committee also indicated that currently about 20%-30% of students are not achieving the learning standards and that about 15%-20% of students are achieving at the distinction level with respect to the learning standards. Further, the committee overwhelmingly believes that the error of classifying a student as having not met standards who has met the standards should be minimized. The committee seems indifferent with respect to classification errors at the distinction level.

Finally, it is well known that student performance improves once operational testing begins. What is not known is the amount of improvement that might be expected.

What should be made of these results?

The study author recognizes that New York has the responsibility and duty to set cut-points in such a way that the purpose of the testing program is best accomplished. That requires judgment and consideration of all the data and information available at the time cut-points are set. The study author strongly encourages New York not to adopt the mean committee bookmarks routinely, but to consider all of the relevant data presented and to exercise its judgment within the parameters suggested in this report.

To the study author, one item stands out in importance. Committee members overwhelmingly indicated that it was more important to avoid failing a student who had the requisite level of skill and knowledge than to avoid passing someone who did not. To the study author, that implies setting a cut-point that might best be described as "lenient," one that gives the benefit of any doubt to the student.

At the same time, committee members indicated that in their judgment about 25% of students are not currently performing at least at the meets standards level.

Although the committee was willing to give the benefit of the doubt to the student when it came to meeting standards, that was not the case for meeting standards with distinction. There, committee members were about equally divided over the severity of the two types of classification error. To the study author, that implies that one need not, and should not, give the benefit of the doubt to the student at this level of performance.

A second extremely important item has to do with the impact of any cut-point. It is well known that field test results underestimate how well students perform on operational testing. Underperformance in field testing is due to several factors, chief among them students' recognition that the test scores do not count and teaching practices that are not yet congruent with the standards on which the tests are based. The amount of underestimation for Living Environment is unknown. This suggests to the study author that the state closely monitor the initial operational administrations and repeat the standard setting with difficulty parameters and impact data based on operational data.

For initial operational testing, the study author recommends that the cut-point for meeting standards be set within the raw score range of 30-40. The committee means and medians fall within this range. Within this range, the final cut-point should be set based on the state's best judgment as to the improvement that will actually occur once operational testing begins. That judgment should be informed by discussions with test developers, curriculum specialists, and teachers. The study author recommends choosing a raw score of 36, which is high with respect to the committee bookmarks, but more in line with committee estimates of current performance.

For initial operational testing, the study author recommends that the cut-point for meeting standards with distinction be set within the raw-score range of 72-78. Again, all committee mean and median judgments fall within that range. And again, within that range choice should be made based on the estimated improvement from field testing to operational testing. The state should realize, however, that improvement in the upper range of scores is likely to be less than improvement in the lower range of scores. The study author recommends choosing a raw score of 73 for that cut-point.

Supplement: Reconvened Round of Living Environment Standard Setting

Six committee members were reconvened for a final round of judgment for the standard setting of Living Environment. Their instructions are reproduced at the end of this supplement. The final round was convened to confirm or adjust the judgments made to this point in the study. The same criteria for judgments were employed in this round.

After about two hours of deliberation, the six judges set the cut values on the logit (difficulty) scale for passing and for passing with distinction as follows:

Judge   Passing   Passing with Distinction
1       0.1       1.8
2       0.1       1.9
3       0.4       2.1
4       0.4       2.1
5       0.0       1.9
6       0.0       2.1
Sum     1.0       11.9
Mean    0.167     1.983
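As a quick check of the Sum and Mean rows, assuming simple unweighted averaging of the six judgments:

```python
# Check of the Sum and Mean rows above (values copied from the table),
# assuming simple unweighted averaging of the six judgments.
passing = [0.1, 0.1, 0.4, 0.4, 0.0, 0.0]
distinction = [1.8, 1.9, 2.1, 2.1, 1.9, 2.1]

print(round(sum(passing), 1), round(sum(passing) / len(passing), 3))               # 1.0 0.167
print(round(sum(distinction), 1), round(sum(distinction) / len(distinction), 3))  # 11.9 1.983
```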

Conclusion

The passing cut score is about .639 standard deviations higher than the mean judgment to this point. The passing with distinction cut score is about the same as the earlier mean. In the best judgment of the panelists, both values represent closer agreement with the definitions of achieving the standards.

Final Round: Questions for Panel

You have made your judgments and have had the opportunity to review and adjust them in terms of the judgments made by the other panelists. I am going to ask you to:

Please record what you think was your original judgment for passing and for passing with distinction.

The cut scores recommended thus far are -0.02 and 1.95, respectively. If there are any adjustments you would like to make to your original judgments, please record them.