Intermediate Level Assessment in Technology Education--Data and Information Related to Standard Setting
A study performed for the New York State Education Department by
April 27, 2001
The New York State Board of Regents has established learning standards all students must meet. One set of learning standards applies to mathematics, science and technology. The standards pertain to accomplishment in a broad range of content. Key ideas, performance indicators, and sample tasks further describe each learning standard. Standards are also broken down by educational level--elementary, intermediate, and commencement.
To assess the extent to which students have met the learning standards, the New York State Education Department has developed a testing program. The content of each test reflects accomplishment of the learning standards. For technology education, the State Education Department has developed an assessment that reflects accomplishment of the learning standards for technology. Most students will take this test while in middle school.
The purpose of the test is to help school districts identify specific areas of the middle school technology education program that may need change to help students achieve the intermediate level technology standards. Thus, the technology test is intended for program evaluation rather than for individual student accountability. School districts are encouraged to use the scores from the testing to assess their technology education programs and align them with the learning standards.
The purpose of the study described in this report is to obtain information that the State Education Department can use to establish scores that will classify test takers into two categories. One category is intended to mean that the student did not meet the standards. The other category is intended to mean that the student met the standards. Setting cut-points requires judgment. This study employs professionally established methods to quantify and summarize the judgments of experts related to how individuals who have met the learning standards will perform on the test.
The Assessment in Technology
The assessment in technology is a two-part written examination administered in one 90-minute block of time in June. Test content is based on the intermediate-level key ideas and performance indicators found in the learning standards and core curriculum developed and adopted by the Board of Regents. Seven content areas define the content of the test:
The two parts of the assessment are as follows:
A complete description of the examination, including test specifications and scoring rubrics, is given in a test sampler.
Data related to the performance standards for the test were obtained from a committee of experts. Judgments from committee members were quantified using standard practices employed by psychometricians who conduct standard setting studies. The committee made their judgments with respect to items from one test form, which had been designated as the anchor test form. Subsequent forms of the test would be equated to the anchor test form so that all scores would be comparable over different test forms.
Committee members were given definitions of two performance categories—having met the learning standards and not having met the learning standards. The State Education Department has developed these category definitions and they are applied to all of the tests that are being developed. In addition, committee members were given an exercise designed to help refresh their memories about the learning standards and an exercise in which they were asked to categorize some of their students into the performance categories as defined by the State Education Department.
The committee met as a group on February 24, 2001 at the State Education Department.
The standard setting study used the bookmarking approach because all the multiple-choice and extended-response items had been scaled using item response theory methods and because the bookmarking procedure enables committee members to consider multiple-choice and extended-response items together.
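As background on the scaling mentioned above, item response theory expresses the probability of a correct response as a function of examinee ability and item difficulty. The report does not say which IRT model was used, so the one-parameter (Rasch) logistic model below is an assumption chosen for simplicity.

```python
import math

def p_correct(theta, b):
    """Probability that an examinee with ability theta answers correctly
    an item with difficulty parameter b, under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals item difficulty, the probability is exactly one half.
print(p_correct(0.5, 0.5))  # 0.5
```

Under such a model, items with larger difficulty parameters are harder for every examinee, which is what allows the items to be placed in a single difficulty order for the bookmarking procedure.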
In the bookmarking procedure, multiple-choice and extended-response items are ordered in terms of the difficulty parameters obtained in the scaling and equating of pretest forms. Committee members are asked to apply their judgments to these ordered items. The committee meeting is conducted in rounds. The rounds and the activities employed in each round are given below.
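The ordering-and-bookmark logic described above can be sketched as follows. All item identifiers, difficulty values, and point values here are invented for illustration, and each extended-response item is treated as a single unit, which is a simplification of usual bookmarking practice.

```python
# Each item: (item_id, IRT difficulty parameter, maximum raw-score points).
# These values are hypothetical, not taken from the actual anchor test form.
items = [
    ("MC-03", -1.20, 1),
    ("MC-07", -0.45, 1),
    ("ER-01",  0.10, 2),
    ("MC-12",  0.35, 1),
    ("ER-02",  0.90, 3),
]

# Order items from easiest to hardest, as in an ordered item booklet.
ordered = sorted(items, key=lambda item: item[1])

def cut_score_for_bookmark(ordered_items, bookmark_index):
    """Raw-score cut implied by a bookmark placed after `bookmark_index`
    items: the points available on all items before the bookmark."""
    return sum(points for _, _, points in ordered_items[:bookmark_index])

# A judge placing the bookmark after the third-easiest item implies a
# raw-score cut equal to the points on the first three ordered items.
print(cut_score_for_bookmark(ordered, 3))
```

The point of the sketch is only that each bookmark position corresponds to a point on the difficulty continuum and, through the items below it, to a raw score on the anchor form.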
Committee members were also asked two overall questions about accomplishment of the learning standards and test performance. Answers to these questions might aid New York in setting appropriate performance standards on the test. These questions asked:
The New York State Education Department's Office of Curriculum and Instruction assembled a committee of 19 people to provide judgments for the study. Committee members were all current or former classroom teachers. All committee members were recognized as very knowledgeable about the learning standards related to technology education and about how students perform on standardized tests similar to the Intermediate Level Technology Education test. Some had worked on an aspect of either the standards or the development of the tests.
Committee members, their schools, and the number of years of experience each has in teaching technology education are given in the table below:
Committee members were chosen so that they would represent a wide range of schools and different types of students. Each committee member was asked to complete a short background questionnaire that included questions about their sex, ethnic background, and the setting for their school. Results of the questionnaire tabulations are given in the table below.
Findings related to the bookmarking procedure
In round 2, every committee member independently placed his or her own bookmark for having met the learning standards. Each member marked the place on the difficulty continuum corresponding to the performance of a minimally competent individual (described to the committee as an individual on the borderline between having and not having met the learning standards). The results of the placements are given in the table below. The table gives the item response difficulty parameter and the corresponding raw score on the anchor test form. The maximum raw score for the test is 60. The cut-points presented include the committee average plus or minus one or two standard deviations (i.e., standard deviations of the committee estimates) and the median committee cut-point, together with the cut-points corresponding to the 25th and 75th percentile ranks of the committee estimates.
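The summary statistics described above can be computed directly from the committee's raw-score cuts. The bookmark cut scores below are invented for illustration; they are not the committee's actual round 2 placements.

```python
import statistics

# Hypothetical example: one raw-score cut per committee member.
cuts = [36, 38, 39, 40, 40, 41, 41, 42, 43, 44, 45, 47]

mean = statistics.mean(cuts)
sd = statistics.stdev(cuts)       # SD of the committee estimates
median = statistics.median(cuts)
q1, _, q3 = statistics.quantiles(cuts, n=4)  # 25th and 75th percentiles

print(f"mean ± 1 SD: {mean - sd:.1f} to {mean + sd:.1f}")
print(f"mean ± 2 SD: {mean - 2 * sd:.1f} to {mean + 2 * sd:.1f}")
print(f"median: {median}; 25th/75th percentiles: {q1}, {q3}")
```

The mean-plus-or-minus-SD bands and the interquartile range give two complementary pictures of how much the committee's judgments spread around the central cut-point.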
Committee estimates varied widely, which is typical of an independent pass through the ordered item booklet. Variation was also expected because of the content area: technology education programs vary widely in design and delivery model, and programs have wide latitude in how content is delivered. Committee members indicated that their judgments of whether a test item was easy or difficult depended in large part on whether the content was emphasized in class work.
In round 3, committee members received a report of their round 2 results. They were also placed in small groups where individual results were discussed. After the discussion, committee members were asked to place another bookmark for passing based on the information and knowledge they had gained up to this point. The round 3 results, which generally show little change from the round 2 results, are given in the table below.
Although most committee members changed their bookmark, the overall distribution of bookmarks changed very little.
Other Judgments Obtained
Committee members were asked to provide their best judgment of the percentage of their students who are not currently achieving at least a proficient level of performance with respect to the learning standards. These judgments were made not with respect to the test, but with respect to the learning standards and the definitions of having met the learning standards. The committee average and standard deviation are presented in the table below.
The data in the table above bear on the cut-point score for the test: on average, the committee judged that almost one in four students in the state is not currently achieving at the minimum level suggested by the learning standards. This assessment was made without test scores and is independent of them. The reported percentages of students not currently meeting the learning standards varied between 15% and 40%.
With respect to the relative severity of the errors of classification, the committee reported different results depending on the use of the test. When the test is used for program evaluation, 58% of the committee said that the most serious classification error was classifying a student as having met the standards when in fact the student had not. If the test were used as an individual accountability measure (i.e., passing the test is a requirement for graduation or promotion), 95% of the committee indicated that the most serious classification error was classifying a student as not having met the learning standards when in fact the student had.
Discussion and Recommendations
The purpose of this study is to obtain data and information that New York may use in setting cut-points for its assessment in technology education. The data should be used to guide those decisions.
The committee that provided the data was diverse and well represented the diversity of New York students, teachers, and school districts. With that diversity, it is not surprising that committee judgments varied.
The final bookmarks from the procedure are given in the table below.
The committee also indicated that about 23% of students are not currently achieving the learning standards. Further, the committee believed that as long as the test is used for program evaluation, the misclassification error of classifying a student as having met standards who has not actually met the learning standards should be minimized.
What should be made of these results?
The study author recognizes that New York has the responsibility and duty to set cut-points in such a way that the purpose of the testing program is best accomplished. That requires judgment and consideration of all the data and information that is available at the time cut-points are set.
To the study author, one item stands out in importance. Committee members indicated in their judgments of the relative severity of classification errors that their choice of cut-point depends on how the test is used. As long as the test is used for program evaluation, the committee is much more tolerant of misclassifying students who have met the standards but who score below the cut-point on the test. To the study author, that implies setting an initial cut-point that is near the committee bookmark average.
Thus, the study author recommends that the initial cut-point be somewhere between raw scores of 37 and 45. The maximum raw score for the test is 60. If forced to choose a single raw score for initial implementation of the testing, the study author would choose a raw score of 41. New York is strongly encouraged to make its own decision based on the data available, however.
It is important to note that this study relies heavily on the accuracy of the IRT difficulty parameters that were obtained during pretesting. The study author notes that those parameters have extremely large standard errors, on the order of about 0.1 unit. These large standard errors are due to three factors:
To the study author, these large standard errors make the initial choice of a cut-point tentative until further data is collected. The study author recommends that:
If the state chooses to change the intended use of the test from program evaluation to individual accountability, the study author recommends that at that time the cut-point be lowered by one or two raw score points and that recommendations 2-4 again be implemented.