Mathematics B Regents Examination--Data and Information Related to Standard Setting
A study performed for the New York State Education Department by
April 27, 2001
The New York State Board of Regents has established learning standards all students must meet to graduate from high school. One set of learning standards is for mathematics, science and technology. Students entering grade nine in September 2001 and thereafter will have to complete three credits in mathematics and will have two diploma options. All students will have to pass the Mathematics A Regents Examination in order to obtain a Regents Diploma. Students wishing to pursue an Advanced Regents Diploma will also have to pass the Mathematics B Regents Examination. Thus, the Mathematics B Regents Examination is required only for students seeking an advanced Regents Diploma. A curriculum for mathematics B is currently being developed, but is not yet widely implemented.
Although scores for the Mathematics B Regents Examination are placed on a numerical scale, essentially there are only three scores—does not meet standards, meets standards, and meets standards with distinction. New York State teachers, using professionally established procedures, have developed the test items, and the items have been pretested and field-tested on samples of students.
The purpose of the study described in this report is to obtain information that the State Education Department can use to establish scores that will classify test takers into does not meet standards, meets standards, and meets standards with distinction categories. Setting cut-scores requires judgment. This study employs professionally established methods to quantify and summarize the judgments of experts about how individuals who have met the learning standards and curricular objectives for mathematics B will perform on the test.
The Mathematics B Regents Examination
The Mathematics B Regents Examination is a four-part examination. Test content is based on the commencement-level key ideas and performance indicators found in the Learning Standards for mathematics and the Mathematics Resource Guide with Core Curriculum, published by the State Education Department. The four parts of the examination are as follows:
In general, the complexity of questions increases from Part II to Part IV. Total test scores are found by adding the points earned across all parts of the test.
Specifications for the test are given in the table below:
Test takers are required to use graphing calculators.
A panel of mathematics experts at the high school and college level, with representatives from business and the community, developed the Mathematics B section of the core curriculum from portions of the commencement and four-year sequence level of the mathematics learning standards.
A complete description of the examination, including test specifications and scoring rubrics, is given in a test sampler.
Data related to the performance standards for the test were obtained from a committee of experts. Judgments from committee members were quantified using standard practices employed by psychometricians who conduct standard setting studies. The committee made their judgments with respect to the difficulty scale resulting from the scaling and equating of field test items. In the field testing, each item, or score category if the item has multiple scores, is given a difficulty parameter obtained through item response theory methods. Test items corresponding to various points on the difficulty scale are presented as examples of test items at that difficulty level. The majority of the items used came from the anchor test form. The anchor test form is the test form upon which the passing standards are set and the form to which all later forms of the test will be equated.
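The report does not specify which item response theory model was used in the scaling. As a purely illustrative sketch, the one-parameter (Rasch) model relates a student's ability and an item's difficulty parameter to the probability of a correct answer; the values below are hypothetical and not taken from the examination.

```python
import math

def rasch_p_correct(theta, b):
    """Probability that a student with ability theta answers an item
    with difficulty parameter b correctly (one-parameter Rasch model)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# An easier item (b = -1) versus a harder one (b = 2) for a student of
# average ability (theta = 0); values are invented for illustration.
print(round(rasch_p_correct(0.0, -1.0), 3))  # 0.731
print(round(rasch_p_correct(0.0, 2.0), 3))   # 0.119
```

Under any such model, items with higher difficulty parameters are answered correctly by fewer students at a given ability level, which is what lets items mark out points on the difficulty scale.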
Committee members were given definitions of three performance categories—not meeting standards, meeting standards, and meeting standards with distinction. The State Education Department has developed these category definitions and they are applied to all of the Regents tests that are being developed. In addition, committee members were given an exercise designed to familiarize them with the examination and an exercise in which they were asked to categorize some of their students into the performance categories as defined by the State Education Department.
The committee met as a group on March 1, 2001 at the State Education Department.
The standard setting study used the bookmarking approach because all the multiple choice and constructed response items had been scaled using item response theory methods and because the bookmarking procedure enables committee members to consider these two item types together.
In the bookmarking procedure, multiple choice items and constructed response items are ordered in terms of their difficulty parameters. The purpose of the items is to illustrate the meaning of the difficulty scale at specific points. Committee members are asked to apply their judgments to these ordered items. The committee meeting is conducted in rounds. The rounds and the activities employed in each round are given below.
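The ordering step described above can be sketched as follows. The item labels, difficulty values, and bookmark position are all invented for illustration and do not come from the actual examination.

```python
# Sketch of the bookmarking idea: items (or score categories) are
# sorted by their difficulty parameters, a committee member places a
# bookmark after the last item a borderline student is judged likely
# to answer correctly, and the bookmark's difficulty defines that
# member's cut-point. All labels and values below are hypothetical.
items = [
    ("MC-3", -1.2), ("MC-7", -0.6), ("MC-1", -0.1),
    ("CR-2a", 0.4), ("MC-12", 0.9), ("CR-5b", 1.6),
]
ordered = sorted(items, key=lambda item: item[1])  # easiest first

bookmark_index = 3  # member judges the first four items attainable
cut_difficulty = ordered[bookmark_index][1]
print(cut_difficulty)  # 0.4
```

Each committee member's bookmark yields one cut-point on the difficulty scale; the rounds then summarize and refine these individual judgments.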
Committee members were also asked two overall questions about accomplishment of the learning standards and test performance. Answers to these questions might aid New York in setting appropriate performance standards on the test. These questions asked:
The New York State Education Department's Office of Curriculum and Instruction assembled a committee of 20 people to provide judgments for the study. Committee members were, with one exception, current classroom teachers. One committee member was a representative from the teachers union who had taught mathematics and who was well versed in the learning standards and mathematics B curriculum. All committee members were recognized as very knowledgeable of the learning standards and mathematics B curriculum and of how students perform on standardized tests similar to the Mathematics B Examination. Some had worked on some aspect of the standards or on the development of the curriculum or tests.
Committee members, their schools, the number of years of experience each has in teaching mathematics, and the number of advanced mathematics students each is currently teaching are given in the table below.
Committee members were chosen so that they would represent a wide range of schools and different types of students. Each committee member was asked to complete a short background questionnaire that included questions about their sex, ethnic background, and the setting for their school. Results of the questionnaire tabulations are given in the table below.
Findings related to the bookmarking procedure
In round 2 every committee member independently placed his or her own bookmark for meeting standards. The results of the placements are given in the table below. The table gives the difficulty level of the last item that a student who has minimally met the learning standards is likely to answer correctly, the corresponding raw score for that item, and the corresponding percent of students who fall below each cut-point based on the field test data. The cut-points include the committee average plus or minus one or two standard deviations (i.e., standard deviations of the committee estimates) and the median committee cut-point, as well as the cut-points corresponding to the 75th and 25th percentile ranks of committee estimates.
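The summary statistics named above (committee average plus or minus standard deviations, median, and 25th/75th percentiles) can be computed as in this sketch. The committee cut-points used here are invented and are not the study's actual round 2 data.

```python
import statistics

# Hypothetical round-2 cut-points (raw scores) from 20 committee
# members; these values are invented for illustration only.
cuts = [24, 26, 26, 27, 28, 28, 29, 29, 30, 30,
        30, 31, 31, 32, 32, 33, 34, 35, 37, 40]

mean = statistics.mean(cuts)
sd = statistics.stdev(cuts)        # SD of the committee estimates
median = statistics.median(cuts)
q1, _, q3 = statistics.quantiles(cuts, n=4)  # 25th and 75th percentiles

print(f"mean={mean:.1f}, sd={sd:.1f}")                        # mean=30.6, sd=3.9
print(f"mean-1sd={mean - sd:.1f}, mean+1sd={mean + sd:.1f}")  # 26.7 and 34.5
print(f"median={median}, 25th={q1}, 75th={q3}")
```

The percent of field-test students falling below each such cut-point is then read off the field test score distribution.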
It is important to note that individuals in the field test had not taken the mathematics B course. The field tests were administered on a voluntary basis, and many of the test takers had just completed the mathematics 3 course. Thus, the estimates provided are surely overestimates of the percentage of students who fall below the cut-point.
In round 3 every committee member independently placed his or her own bookmark for meeting standards with distinction. The results of the placements are given in the table below. The table gives the raw score, the difficulty of the item corresponding to the cut-point, and the corresponding percent above that cut-point based on the field test data. The cut-points include the committee average plus or minus one or two standard deviations (i.e., standard deviations of the committee estimates) and the median committee cut-point, as well as the cut-points corresponding to the 75th and 25th percentile ranks of committee estimates.
Again, it is important to note that individuals in the field test had not taken the mathematics B course. The field tests were administered on a voluntary basis, and the majority of test takers had just completed the mathematics 3 course. Thus, the impact estimates provided are surely underestimates of the percentage of students who might fall above the cut-points.
In round 4, committee members received a report of their round 2 results. They were also placed in small groups where individual results were discussed. After the discussion, committee members were asked to place another bookmark for meeting standards based on the information and knowledge they had gained up to this point. The round 4 results are given in the table below:
Similar comments about the nature of the field test results apply again.
In round 5, committee members received a report of their round 3 results. They were also placed in small groups where individual results were discussed. After the discussion, committee members were asked to place another bookmark for meeting standards with distinction based on the information and knowledge they had gained up to this point. The round 5 results, which generally show less variation than the round 3 results, are given in the table below.
Similar comments about the nature of the field test results apply again.
In round 6, committee members received a report of their round 4 and round 5 judgments. They also received a report of the impact of their estimates from those rounds. Impact was reported in terms of the frequency distributions of the field test scores. The committee was also advised that scores from field-testing would underestimate operational test performance, but that the amount of the underestimate was not known. Committee members then returned to their groups and discussed the report and their judgments. At the end of the discussion, committee members were asked to place new bookmarks for both meeting standards and meeting standards with distinction based on the information and knowledge they had at that time. Results of this final placement are given in the table below.
Other Judgments Obtained
When tests are used to classify individuals into categories, two kinds of classification errors can be made. For example, in classifying students into passing and failing categories, a student who should pass may be failed, or a student who should fail may be passed. These misclassifications always occur, and they are inversely related: when we try to reduce one type of classification error, we increase the other type of classification error.
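The inverse relationship between the two classification errors can be illustrated with a small sketch. The student scores and true classifications below are invented: as the cut-score rises, false passes fall while false failures rise.

```python
# Illustrative trade-off between the two classification errors.
# Each entry is (test score, truly meets standards?); all invented.
students = [
    (18, False), (22, False), (25, True), (27, False),
    (29, True), (31, True), (33, False), (36, True), (40, True),
]

def classification_errors(cut):
    """Count both error types at a given cut-score."""
    false_fail = sum(1 for s, meets in students if meets and s < cut)
    false_pass = sum(1 for s, meets in students if not meets and s >= cut)
    return false_fail, false_pass

for cut in (24, 28, 32, 36):
    ff, fp = classification_errors(cut)
    print(f"cut={cut}: false failures={ff}, false passes={fp}")
```

No choice of cut-score eliminates both errors; choosing a cut-point amounts to deciding which error is more tolerable, which is exactly what the committee was asked about.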
With respect to the relative severity of the errors of classification, eighty-five percent of the committee said that failing a student who should pass was more serious than passing a student who should fail. Fifteen percent of the committee said the opposite. Thirty percent of the committee said that passing a student with distinction who should only pass was more serious than just passing a student who should pass with distinction. Seventy percent of the committee said the opposite.
Discussion and Recommendations
The purpose of this study is to obtain data and information that New York may use in setting passing points for its Mathematics B Examination. The data should be used to guide those decisions.
The committee that provided the data was diverse and well represented the diversity of New York students, teachers, and school districts. With that diversity, it is not surprising that committee judgments varied.
The final bookmarks from the procedure are given in the table below.
Further, the committee overwhelmingly believes that the error of failing a student who should pass should be minimized. The committee also believes, though to a lesser extent, the same about the passing with distinction classification.
Finally, the impact data—i.e., the performance data from the field-testing—was based on students who had not had the Mathematics B course, many of whom had just completed the mathematics 3 course. Thus, the estimated percentage of students failing is an overestimate, and the estimated percentage of students who would achieve passing with distinction is an underestimate.
What should be made of these results?
The study author recognizes that New York has the responsibility and duty to set cut-points in such a way that the purpose of the testing program is best accomplished. That requires judgment and consideration of all the data and information that is available at the time cut-points are set.
To the study author, one item stands out in importance. The field test data upon which the difficulty parameters were calculated, and which forms the basis for estimating the impact of the average passing and passing with distinction points, is seriously flawed. It is flawed because the students who took the field test had not been exposed to the course content and because most of the students had just completed the mathematics 3 course. It is possible, and at the higher levels of difficulty highly probable, that the items chosen to represent specific levels of difficulty do not accurately represent those levels of difficulty.
Thus, the study author’s strongest recommendation to New York is to repeat both the scaling and standard setting studies after this year’s administration is completed. At that time, more valid and reliable data should be available. Further, the study author urges New York to repeat both the scaling and standard setting annually until the curriculum is in place statewide and operational testing is taking place.
It is extremely important to recognize that cut-points are not immutable. All cut-points should be set based on the best information that is available. But as more information becomes available, cut-points should be revised (or at a minimum reviewed) to make sure that they are consistent with the information available. This may result in periodically raising or lowering cut-points until stable conditions of instruction and testing are achieved.
Having said that, the issue at hand is what to implement as a passing score and passing with distinction score for the 2001 operational year. For 2001, the study author recommends that New York choose a cut-score for meeting standards between 23 and 31 raw score points and a cut-score for meeting standards with distinction between 56 and 68. If forced to recommend single cut-points, the study author would recommend raw scores of 30 and 66 for the two cut-points. The study author is most concerned about the effect the reported impact data had on the round 6 bookmarks. There are 87 possible raw score points on the test, and although the test is recognized as being difficult, a cut-score of about 30 raw score points appears to the study author to be very low. For that reason, the recommended cut-score for meeting standards is slightly higher than the round 6 average and median.
The study author believes that test developers and other state staff who know and understand implementation of the Mathematics B curriculum can make the best choice of cut-points within the proposed ranges.
In general, the study author also believes that medians are better guides than means because the judgments committee members give appear not to be normally distributed.
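The robustness argument for the median can be seen in a one-line comparison: a single extreme judgment shifts the mean noticeably but leaves the median essentially unchanged. The judgment values below are invented.

```python
import statistics

# Six typical cut-point judgments plus one outlier; values invented.
judgments = [28, 29, 30, 30, 31, 32, 55]

mean_cut = statistics.mean(judgments)
median_cut = statistics.median(judgments)
print(round(mean_cut, 1))  # 33.6 -- pulled toward the outlier
print(median_cut)          # 30  -- robust to the outlier
```

This is why, when committee judgments are skewed or contain outliers, the median is often the more defensible summary of the panel's intent.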