Final Report

Mathematics B Regents Examination--Data and Information Related to Standard Setting

A study performed for the New York State Education Department by

Gary Echternacht
Gary Echternacht, Inc.
4 State Park Drive
Titusville, NJ 08560

(609) 737-8187
garyecht@aol.com

April 27, 2001

Introduction

The New York State Board of Regents has established learning standards that all students must meet to graduate from high school. One set of learning standards covers mathematics, science, and technology. Students entering grade nine in September 2001 and thereafter will have to complete three credits in mathematics and will have two diploma options. All students will have to pass the Mathematics A Regents Examination in order to obtain a Regents Diploma. Students wishing to pursue an Advanced Regents Diploma will also have to pass the Mathematics B Regents Examination. Thus, the Mathematics B Regents Examination is required only of students seeking an Advanced Regents Diploma. A curriculum for mathematics B is currently being developed but is not yet widely implemented.

Although scores for the Mathematics B Regents Examination are placed on a numerical scale, there are essentially only three scores—does not meet standards, meets standards, and meets standards with distinction. New York State teachers, using professionally established procedures, have developed the test items, and the items have been pretested and field-tested on samples of students.

The purpose of the study described in this report is to obtain information that the State Education Department can use to establish scores that will classify test takers into the does not meet standards, meets standards, and meets standards with distinction categories. Setting cut-scores requires judgment. This study employs professionally established methods to quantify and summarize the judgments of experts about how individuals who have met the learning standards and curricular objectives for mathematics B will perform on the test.

The Mathematics B Regents Examination

The Mathematics B Regents Examination is a four-part examination. Test content is based on the commencement-level key ideas and performance indicators found in the Learning Standards for mathematics and the Mathematics Resource Guide with Core Curriculum, published by the State Education Department. The four parts of the examination are as follows:

  • Part I consists of 20 multiple-choice questions covering all content areas. No partial credit is given on this part. Answers are recorded on a separate answer sheet. Each correct answer scores two points; each incorrect answer scores zero points.
  • Part II consists of six open-ended response questions scored 0, 1, or 2. A question answered incorrectly receives a score of 0. A question answered correctly, but without the supporting work shown, receives a score of 1. A question answered correctly with the work shown receives a score of 2.
  • Part III consists of six open-ended response questions scored on a 0-4 point scale. A correct answer given without supporting work receives a score of 1; otherwise, partial credit is given for responses.
  • Part IV consists of two open-ended questions scored on a 0-6 point scale. A correct answer given without supporting work receives a score of 1; otherwise, partial credit is given for responses.

In general, the complexity of the questions increases from Part II through Part IV. The total test score is found by summing the points earned across all parts of the test.
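As a concrete illustration of these scoring rules, the sketch below tallies a total raw score from per-item scores. It is a minimal sketch based on the rules above; the data layout and function name are invented for illustration and are not drawn from any Department scoring software.

    # Hypothetical sketch of the raw-score tally described above. Part point
    # values and item counts follow the report; everything else is invented.

    PART_RULES = {
        "I":   {"items": 20, "max_points": 2},  # multiple choice, no partial credit
        "II":  {"items": 6,  "max_points": 2},  # short constructed response, 0-2
        "III": {"items": 6,  "max_points": 4},  # longer constructed response, 0-4
        "IV":  {"items": 2,  "max_points": 6},  # extended constructed response, 0-6
    }

    def total_raw_score(scores_by_part):
        """Sum item scores across all four parts, validating counts and ranges."""
        total = 0
        for part, item_scores in scores_by_part.items():
            rule = PART_RULES[part]
            assert len(item_scores) == rule["items"], f"Part {part}: wrong item count"
            for s in item_scores:
                assert 0 <= s <= rule["max_points"], f"Part {part}: score {s} out of range"
                total += s
        return total

    # Example: 15 correct multiple-choice answers plus mixed partial credit.
    example = {
        "I":   [2] * 15 + [0] * 5,
        "II":  [2, 2, 1, 0, 2, 1],
        "III": [4, 3, 2, 0, 1, 4],
        "IV":  [5, 3],
    }
    print(total_raw_score(example))  # 30 + 8 + 14 + 8 = 60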

Specifications for the test are given in the table below:

Key Ideas                           Range     Item Type                       Number of Items, Points
Mathematical Reasoning              5-10%     Multiple choice                 20, 2 points each
Number and Numeration               5-10%     Short constructed response      6, 2 points each
Operations                          5-10%     Longer constructed response     6, 4 points each
Modeling/Multiple Representations   15-25%    Extended constructed response   2, 6 points each
Measurement                         15-20%
Uncertainty                         10-15%
Patterns and Functions              15-25%

Test takers are required to use graphing calculators.

A panel of mathematics experts at the high school and college levels, with representatives from business and the community, developed the Mathematics B section of the core curriculum from portions of the commencement and four-year sequence levels of the mathematics learning standards.

A complete description of the examination, including test specifications and scoring rubrics, is given in a test sampler.

Methods Employed

Data related to the performance standards for the test were obtained from a committee of experts. Judgments from committee members were quantified using standard practices employed by psychometricians who conduct standard setting studies. The committee made their judgments with respect to the difficulty scale resulting from the scaling and equating of field test items. In the field testing, each item, or each score category if the item has multiple score points, is given a difficulty parameter obtained through item response theory methods. Test items corresponding to various points on the difficulty scale are presented as examples of test items at those difficulty levels. The majority of the items used came from the anchor test form, the form on which the passing standards are set and to which all later forms of the test will be equated.
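To make the ordered-item presentation concrete, the sketch below assembles a short booklet from calibrated items ordered by difficulty. The item identifiers, types, and difficulty values are invented for illustration; the actual item parameters are not published in this report.

    # A minimal sketch of assembling an ordered item booklet for bookmarking,
    # assuming each item (or score category of a polytomous item) already has
    # an IRT difficulty parameter from the field-test calibration.

    items = [
        {"id": "MC-07",  "type": "multiple choice",      "difficulty": -0.8},
        {"id": "CR2-03", "type": "short CR, score 2",    "difficulty":  0.4},
        {"id": "MC-15",  "type": "multiple choice",      "difficulty":  0.1},
        {"id": "CR4-02", "type": "longer CR, score 3",   "difficulty":  1.2},
        {"id": "CR6-01", "type": "extended CR, score 5", "difficulty":  1.9},
    ]

    # Bookmarking presents items from easiest to hardest, so a judge's bookmark
    # position corresponds to a point on the common difficulty scale.
    booklet = sorted(items, key=lambda item: item["difficulty"])
    for position, item in enumerate(booklet, start=1):
        print(f"{position:2d}. {item['id']:7s} ({item['type']}) b = {item['difficulty']:+.1f}")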

Committee members were given definitions of three performance categories—not meeting standards, meeting standards, and meeting standards with distinction. The State Education Department developed these category definitions, and they are applied to all of the Regents tests being developed. In addition, committee members were given an exercise designed to familiarize them with the examination and an exercise in which they were asked to categorize some of their students into the performance categories as defined by the State Education Department.

The committee met as a group on March 1, 2001 at the State Education Department.

The standard setting study used the bookmarking approach because all of the multiple-choice and constructed-response items had been scaled using item response theory methods and because the bookmarking procedure enables committee members to consider the two item types together.

In the bookmarking procedure, multiple choice items and constructed response items are ordered in terms of their difficulty parameters. The purpose of the items is to illustrate the meaning of the difficulty scale at specific points. Committee members are asked to apply their judgments to these ordered items. The committee meeting is conducted in rounds. The rounds and the activities employed in each round are given below.

Round 1: Committee members review the Learning Standards for the content area and consider ways of measuring accomplishment of the performance indicators and key ideas. Committee members also review the ordered items to understand the increasing complexity of the items and the responses required.

Round 2: Working individually, committee members set their bookmarks for passing. That is, each committee member conceives of an individual who has the minimum level of skill and knowledge needed to meet the standards and indicates the last item (or difficulty level) that this hypothetical individual is likely to answer correctly two-thirds of the time (or to construct a response that is at least as good). A sketch of this two-thirds criterion follows this list.

Round 3: Working individually, committee members set their bookmarks for meeting standards with distinction. That is, each committee member conceives of an individual who has the minimum level of skill and knowledge needed to meet the standards with distinction and indicates the last item that such an individual is likely to answer correctly (or to construct a response that is at least as good).

Round 4: A report of the round 2 results is given to committee members. The committee is divided into small groups, and the individual results are discussed. Committee members revise their judgments in light of the discussion.

Round 5: The same procedure as in round 4 is used with the round 3 results.

Round 6: A report of rounds 4 and 5 is given to the committee, along with the impacts (the percent failing and the percent passing with distinction based on field test results). Committee members make final judgments based on the accumulated judgments and data.
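The two-thirds response probability criterion used in round 2 has a simple closed form under a Rasch (1PL) model. The report does not name the specific IRT model used in the calibration, so the model in this sketch is an illustrative assumption.

    # Sketch of the round 2 response-probability criterion under an assumed
    # Rasch (1PL) model.
    import math

    def rasch_probability(theta, b):
        """Probability that a test taker at ability theta answers an item of difficulty b."""
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def ability_at_rp(b, rp=2/3):
        """Ability at which the response probability equals rp (two-thirds here).
        Solving rp = 1 / (1 + exp(-(theta - b))) gives theta = b + ln(rp / (1 - rp))."""
        return b + math.log(rp / (1.0 - rp))

    # For a bookmarked item of difficulty b = 0.2, the implied cut on the
    # ability scale is 0.2 + ln(2), about 0.89.
    print(ability_at_rp(0.2))             # ~0.893
    print(rasch_probability(0.893, 0.2))  # ~0.667, confirming the two-thirds criterion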

Committee members were also asked two overall questions about accomplishment of the learning standards and test performance, the answers to which might aid New York in setting appropriate performance standards on the test. The questions were:

  • Which is the more serious error: to pass a student who has not met the learning standards and curricular objectives, or to fail a student who has met them?
  • Which is the more serious error: to award a pass with distinction to a student who has not met the learning standards and curricular objectives at that level, or to withhold distinction from a student who has achieved at that level?

Committee Members

The New York State Education Department's Office of Curriculum and Instruction assembled a committee of 20 people to provide judgments for the study. Committee members were, with one exception, current classroom teachers. The remaining member was a representative of the teachers' union who had taught mathematics and who was well versed in the learning standards and the mathematics B curriculum. All committee members were recognized as very knowledgeable about the learning standards and the mathematics B curriculum and about how students perform on standardized tests similar to the Mathematics B Examination. Some had worked on the development of the standards, the curriculum, or the tests.

Committee members, their schools, the number of years of experience each has teaching mathematics, and the number of students each currently teaches in advanced mathematics are given in the table below.

 

Committee Member       School and Location                                         Years Teaching Mathematics   Number of Students Currently
Steven Arnofsky        George W Wingate High School, Brooklyn                      32                           36
Antoine Atinkpahoun    Lincoln High School, Yonkers                                3                            25
Sheila Batson          Hempstead High School, Hempstead                            20                           100
Carole Bernhardt       Sheepshead Bay High School, Brooklyn                        30                           34
James Burrell          McKinley High School, Buffalo                               32                           140
Virginia Cronin        Lincoln High School, Yonkers                                20                           40
Eva Demyen             Valley Stream Central High School, Valley Stream            27                           40
Melody DeRosa          New York State United Teachers                              7                            0
Peggy Fisher           North Syracuse Central High School, Cicero                  30                           140
Arlane Frederick       Kenmore West High School, Kenmore                           31                           75
Marcia Horelick        Saint Anne Institute, Albany                                27                           25
Kathleen Klee          Randolph Central High School                                30                           115
John Maus              North Shore Middle School, Glenhead                         10                           80
S. Mary Ann Napier     St. Francis Preparatory, Fresh Meadows                      30                           124
Marguerite Niforos     Galway High School, Galway                                  27                           55
David Passer           Mexico High School, Mexico                                  16                           96
Harry Rattien          Townsend High School, Flushing                              28                           70
Richard Robertson      Susquehanna Valley High School, Conklin                     30                           80
John Woodward          Northville High School, Northville                          35                           100
Phyllis Zagelbaum      Samuel H Wang Yeshiva University High School, Holliswood    32                           105

Committee members were chosen so that they would represent a wide range of schools and different types of students. Each committee member was asked to complete a short background questionnaire that included questions about their sex, ethnic background, and the setting for their school. Results of the questionnaire tabulations are given in the table below.

Characteristic                            Percent of Committee

Sex
  Female                                  60%
  Male                                    40%

Ethnic Background of Committee Member
  African-American                        15%
  White                                   85%

School Setting
  New York City                           25%
  Other urban                             20%
  Suburban                                25%
  Rural                                   25%
  Not representing a school               5%

Findings Related to the Bookmarking Procedure

Findings--Round 2

In round 2, every committee member independently placed his or her bookmark for meeting standards. The results of the placements are given in the table below. The table gives the difficulty level of the last item that a student who has minimally met the learning standards is likely to answer correctly, the corresponding raw score, and the corresponding percent of students falling below each cut-point based on the field test data. The cut-points shown include the committee mean plus or minus one and two standard deviations (i.e., standard deviations of the committee estimates), the committee median, and the cut-points corresponding to the 75th and 25th percentiles of the committee estimates.

Cut-point          Difficulty   Raw Score (Maximum 87)   Percent Below
Mean + 2 SD        1.5          66                       98%
Mean + 1 SD        1.0          47                       83%
Mean               0.5          33                       58%
Mean - 1 SD        0.0          20                       28%
Mean - 2 SD        -0.5         8                        8%
75th percentile    0.8          46                       82%
Median             0.7          41                       75%
25th percentile    0.3          26                       41%

It is important to note that the individuals in the field test had not taken the mathematics B course. The field tests were administered on a voluntary basis, and many of the test takers had just completed the mathematics 3 course. Thus, the estimates provided are surely overestimates of the percentage of students who would fall below the cut-point.
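The summaries in the table above can be reproduced from the committee's individual bookmark difficulties. Below is a minimal sketch with invented values: the report publishes only the resulting summaries, and the difficulty-to-raw-score conversion assumes a Rasch-type test characteristic curve rather than the Department's actual calibration.

    # Sketch of computing the cut-point summaries and mapping each difficulty
    # cut to an approximate raw score. All numbers here are invented.
    import math
    import statistics

    # Invented bookmark difficulties from the 20 committee members.
    bookmarks = [0.9, 0.7, 0.8, 0.3, 0.5, 0.6, 0.2, 0.7, 0.4, 0.8,
                 0.1, 0.3, 0.9, 0.5, 0.6, 0.4, 0.2, 0.7, 0.3, 0.1]

    # Invented item difficulties and point values; the real parameters are
    # not given in the report.
    items = [(-1.5 + 0.05 * k, 1) for k in range(60)]  # (difficulty, points)

    def expected_raw_score(theta, items):
        """Test characteristic curve: expected raw score at ability theta."""
        return sum(pts / (1.0 + math.exp(-(theta - b))) for b, pts in items)

    mean = statistics.mean(bookmarks)
    sd = statistics.stdev(bookmarks)          # SD of the committee estimates
    q1, med, q3 = statistics.quantiles(bookmarks, n=4)

    for label, cut in [("Mean + 2 SD", mean + 2 * sd), ("Mean + 1 SD", mean + sd),
                       ("Mean", mean), ("Mean - 1 SD", mean - sd),
                       ("Mean - 2 SD", mean - 2 * sd), ("75th percentile", q3),
                       ("Median", med), ("25th percentile", q1)]:
        raw = expected_raw_score(cut, items)
        print(f"{label:16s} difficulty {cut:+.2f} -> raw score ~{raw:.0f}")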

Findings--Round 3

In round 3, every committee member independently placed his or her bookmark for meeting standards with distinction. The results of the placements are given in the table below. The table gives the difficulty of the item corresponding to each cut-point, the corresponding raw score, and the corresponding percent of students above that cut-point based on the field test data. The cut-points shown include the committee mean plus or minus one and two standard deviations (i.e., standard deviations of the committee estimates), the committee median, and the cut-points corresponding to the 75th and 25th percentiles of the committee estimates.

Cut-point          Difficulty   Raw Score (Maximum 87)   Percent Above
Mean + 2 SD        2.0          72                       1%
Mean + 1 SD        1.8          70                       1%
Mean               1.5          66                       2%
Mean - 1 SD        1.3          58                       5%
Mean - 2 SD        1.0          47                       17%
75th percentile    1.6          68                       1%
Median             1.5          66                       2%
25th percentile    1.4          64                       2%

Again, it is important to note that the individuals in the field test had not taken the mathematics B course. The field tests were administered on a voluntary basis, and the majority of test takers had just completed the mathematics 3 course. Thus, the impact estimates provided are surely underestimates of the percentage of students who would fall above the cut-points.

Findings--Round 4

In round 4, committee members received a report of their round 2 results. They were also placed in small groups, where individual results were discussed. After the discussion, committee members were asked to place another bookmark for meeting standards based on the information and knowledge they had gained to that point. The round 4 results are given in the table below:

Cut-point          Difficulty   Raw Score (Maximum 87)   Percent Below
Mean + 2 SD        1.1          50                       88%
Mean + 1 SD        0.7          41                       75%
Mean               0.4          31                       53%
Mean - 1 SD        0.0          20                       27%
Mean - 2 SD        -0.4         9                        8%
75th percentile    0.8          46                       83%
Median             0.3          29                       49%
25th percentile    0.1          21                       30%

Similar comments about the nature of the field test results apply again.

Findings--Round 5

In round 5, committee members received a report of their round 3 results. They were also placed in small groups, where individual results were discussed. After the discussion, committee members were asked to place another bookmark for meeting standards with distinction based on the information and knowledge they had gained to that point. The round 5 results, which generally show less variation than the round 3 results, are given in the table below.

Cut-point          Difficulty   Raw Score (Maximum 87)   Percent Above
Mean + 2 SD        2.0          72                       1%
Mean + 1 SD        1.7          69                       1%
Mean               1.5          66                       2%
Mean - 1 SD        1.3          58                       5%
Mean - 2 SD        1.1          50                       12%
75th percentile    1.7          69                       1%
Median             1.5          66                       2%
25th percentile    1.4          64                       2%

Similar comments about the nature of the field test results apply again.

Findings--Round 6

In round 6, committee members received a report of their round 4 and round 5 judgments, together with a report of the impact of those judgments. Impact was reported in terms of the frequency distributions of the field test scores. The committee was also advised that scores from field testing will underestimate operational test performance, but that the amount of the underestimate was not known. Committee members then returned to their groups and discussed the report and their judgments. At the end of the discussion, committee members were asked to place new bookmarks for both meeting standards and meeting standards with distinction based on the information and knowledge they had at that time. Results of this final placement are given in the table below.
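The impact computation itself is straightforward. The sketch below shows the idea on an invented frequency distribution; the actual field test distributions are not reproduced in this report.

    # Sketch of the impact computation: given a frequency distribution of
    # field-test raw scores, report the percent of test takers below a passing
    # cut and at or above a distinction cut. The distribution is invented.

    def impact(freq, pass_cut, distinction_cut):
        """freq maps raw score -> number of field-test takers at that score."""
        total = sum(freq.values())
        below = sum(n for score, n in freq.items() if score < pass_cut)
        above = sum(n for score, n in freq.items() if score >= distinction_cut)
        return 100.0 * below / total, 100.0 * above / total

    # Invented field-test distribution over a handful of score points.
    freq = {10: 40, 20: 60, 30: 80, 40: 70, 50: 50, 60: 30, 70: 10}
    pct_below, pct_above = impact(freq, pass_cut=25, distinction_cut=64)
    print(f"{pct_below:.0f}% below passing, {pct_above:.0f}% at or above distinction")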

 

                     Meeting Standards                   Meeting Standards with Distinction
Cut-point            Diff    Raw Score   Percent Below   Diff    Raw Score   Percent Above
Mean + 2 SD          0.6     38          70%             1.8     70          1%
Mean + 1 SD          0.4     31          53%             1.6     68          1%
Mean                 0.2     25          39%             1.4     64          2%
Mean - 1 SD          0.0     20          28%             1.2     56          6%
Mean - 2 SD          -0.2    14          15%             1.0     47          17%
75th percentile      0.3     29          49%             1.5     66          2%
Median               0.1     23          33%             1.5     66          2%
25th percentile      0.1     23          33%             1.1     50          12%

Other Judgments Obtained

When tests are used to classify individuals into categories, two kinds of classification error are always possible. For example, in classifying students as passing or failing, a student who should pass may be failed, and a student who should fail may be passed. These misclassifications always occur, and they are inversely related: reducing one type of classification error increases the other.
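To illustrate the inverse relationship, the sketch below assumes, purely hypothetically, normally distributed scores for students who have and have not met the standards, and shows how moving the cut-point trades one error for the other. The distributions are invented.

    # Sketch of the classification-error trade-off under invented normal
    # score distributions for true masters and true non-masters.
    from statistics import NormalDist

    masters = NormalDist(mu=40, sigma=10)      # students who met the standards
    non_masters = NormalDist(mu=25, sigma=10)  # students who did not

    for cut in (25, 30, 35):
        false_fail = masters.cdf(cut)          # masters scoring below the cut
        false_pass = 1 - non_masters.cdf(cut)  # non-masters scoring at or above it
        print(f"cut={cut}: false fail {false_fail:.0%}, false pass {false_pass:.0%}")

    # Raising the cut from 25 to 35 shrinks false passes (50% -> 16%) while
    # false failures grow (7% -> 31%): reducing one error increases the other.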

With respect to the relative severity of the errors of classification, 85% of the committee said that failing a student who should pass was more serious than passing a student who should fail. Fifteen percent of the committee said the opposite. Thirty percent of the committee said that passing a student with distinction who should only pass was more serious than just passing a student who should pass with distinction. Seventy percent of the committee said the opposite.

Discussion and Recommendations

The purpose of this study is to obtain data and information that New York may use in setting passing points for its Mathematics B Examination. The data should be used to guide those decisions.

The committee that provided the data was diverse and well represented the diversity of New York students, teachers, and school districts. With that diversity, it is not surprising that committee judgments varied.

The final bookmarks from the procedure are given in the table below.

                     Meets Standards                     Meets Standards with Distinction
Cut-point            Diff    Raw Score   Percent Below   Diff    Raw Score   Percent Above
Mean + 2 SD          0.6     38          70%             1.8     70          1%
Mean + 1 SD          0.4     31          53%             1.6     68          1%
Mean                 0.2     25          39%             1.4     64          2%
Mean - 1 SD          0.0     20          28%             1.2     56          6%
Mean - 2 SD          -0.2    14          15%             1.0     47          17%
75th percentile      0.3     29          49%             1.5     66          2%
Median               0.1     23          33%             1.5     66          2%
25th percentile      0.1     23          33%             1.1     50          12%

Further, the committee overwhelmingly believes that the error of failing a student who should pass is the one to minimize. The committee holds the same view, though to a lesser extent, about the passing with distinction classification.

Finally, the impact data—i.e., the performance data from the field-testing—was based on students who had not had the Mathematics B course and many of whom had just completed the mathematics 3 course. Thus, these estimates of the percentage of students failing are overestimated. The percentage of students who would achieve passing with distinction is also underestimated.

What should be made of these results?

The study author recognizes that New York has the responsibility and duty to set cut-points in such a way that the purpose of the testing program is best accomplished. That requires judgment and consideration of all the data and information that is available at the time cut-points are set.

To the study author, one item stands out in importance. The field test data upon which the difficulty parameters were calculated, and which form the basis for estimating the impact of the average passing and passing-with-distinction points, are seriously flawed. The data are flawed because the students who took the field test had not been exposed to the course content and because most of them had just completed the mathematics 3 course. It is possible, and highly probable at the higher levels of difficulty, that the items chosen to represent specific levels of difficulty do not accurately represent those levels.

Thus, the study author’s strongest recommendation to New York is to repeat both the scaling and standard setting studies after this year’s administration is completed. At that time, more valid and reliable data should be available. Further, the study author urges New York to repeat both the scaling and standard setting annually until the curriculum is in place statewide and operational testing is taking place.

It is extremely important to recognize that cut-points are not immutable. All cut-points should be set based on the best information available. But as more information becomes available, cut-points should be revised (or at a minimum reviewed) to make sure that they remain consistent with that information. This may result in periodically raising or lowering cut-points until stable conditions of instruction and testing are achieved.

Having said that, the issue at hand is what to implement as the passing and passing-with-distinction scores for the 2001 operational year. For 2001, the study author recommends that New York choose a cut-score for meeting standards between 23 and 31 raw score points and a cut-score for meeting standards with distinction between 56 and 68. If forced to recommend single cut-points, the study author would recommend raw scores of 30 and 66. The study author is most concerned about the effect the reported impact data had on the round 6 bookmarks. There are 87 possible raw score points on the test, and although the test is recognized as difficult, a cut-score of about 30 raw score points appears to the study author to be very low. For that reason, the recommended cut-score for meeting standards is slightly higher than the round 6 average and median.

The study author believes that test developers and other state staff who know and understand implementation of the Mathematics B curriculum can make the best choice of cut-points within the proposed ranges.

In general, the study author also believes that medians are better guides than means because the judgments committee members give appear not to be normally distributed.
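A small invented example illustrates the point: with one outlying judgment, the mean shifts noticeably while the median barely moves.

    # Why the median can be the better summary of committee judgments:
    # one outlier pulls the mean but hardly affects the median. Invented values.
    import statistics

    judgments = [0.1, 0.1, 0.2, 0.2, 0.3, 0.3, 0.4, 1.8]  # one high outlier
    print(statistics.mean(judgments))    # 0.425, pulled up by the outlier
    print(statistics.median(judgments))  # 0.25, close to the bulk of the committee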