IRS

Information and Reporting Services

Confidence Intervals

An underlying assumption of a school accountability system that measures change from grade cohort to grade cohort (that is, compares this year’s fourth grade, for example, with last year’s) is that the performance differences from cohort to cohort are caused by changes under the school’s control: revisions in curriculum, instruction, and/or support systems.

However, a performance measure is subject to error from three sources that are not under a school’s or district’s control:

  • measurement error around each student’s hypothetical true score, which is related to fluctuating factors such as health, motivation, attention, and fatigue, and which averages zero when the sample is sufficiently large;
  • sampling error (that is, the error caused by random variations in student ability, early preparedness, and motivation from grade to grade in the same school); and
  • changes in the environment not under the school’s control, for example, the events of September 11.

These sources of error, which a school or district cannot control, may cause a school’s or district’s observed performance to differ from its “true” performance. To minimize the chance that a district or school will erroneously be deemed not to have made Adequate Yearly Progress (AYP), New York State’s accountability system uses a “confidence interval” to determine whether a group has met its Annual Measurable Objective (AMO). A confidence interval recognizes the sampling error associated with an observed score. It permits the analyst to determine whether the difference between the observed Performance Index (PI) and the AMO falls within the margin of error attributable to random sampling error, or falls outside that margin and is therefore not attributable to chance alone.

On average, the sampling error associated with the observed score (PI) for each accountability group decreases as the group gets larger. Through empirical analyses, we have determined the distribution of probable observed PIs around the “true score” for groups of varying sizes.

To make it relatively easy to determine whether an accountability group has achieved the AMO, we have put the confidence interval into practice as a table of Effective AMOs (Table 2). The Effective AMO is, for an accountability group of size n, the smallest observed PI that is not statistically different from the AMO. The graph below illustrates the distribution of probable observed PIs around the true score for a group of size n, when the true score equals the AMO. Any group with an observed PI at or above the Effective AMO will be counted as having achieved the AMO.

[Graph: The Expected Frequency of Each Observed PI Given a "True" Score Equal to the AMO]

Because it is impossible to make statistical statements about the performance of a school with total accuracy, there will always be a degree of error when deciding whether a group met the AMO. New York State’s system minimizes the chance that we will erroneously conclude that a group did not meet the AMO. The Effective AMOs have been set so that there is at most a 10 percent chance that we will falsely conclude that the group did not meet the AMO when its true performance was, in fact, equal to or greater than the AMO. This 10 percent band is the area of the graph below the Effective AMO. On the other hand, when the observed PI is exactly equal to the Effective AMO, there is a 90 percent chance that the group’s true score is below the AMO. Even when the observed PI is exactly equal to the AMO, there is a 50 percent chance that the group’s true score is below the AMO.
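The 10 percent criterion can be written compactly. The notation here is introduced only for illustration and does not appear in the original: \widehat{PI}_n stands for the observed PI of a group of size n, and E_n for the Effective AMO for that group size.

```latex
% Notation assumed for illustration; it is not part of the original text.
% \widehat{PI}_n : observed PI for an accountability group of size n
% E_n            : Effective AMO for a group of size n
\[
  \Pr\!\left( \widehat{PI}_n < E_n \;\middle|\; \text{true score} \ge \text{AMO} \right) \le 0.10
\]
```

Read this way, E_n is chosen for each group size so that a group truly performing at the AMO falls below it no more than about one time in ten.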

Use of the Effective AMO Table

Table 2 provides an Effective AMO for each accountability grade and subject and each group size. The Effective AMOs apply to accountability decisions for English language arts and mathematics. They do not apply to decisions about science or graduation rate. In those areas, the school must meet the State standard to make Adequate Yearly Progress. To use the table, compare the observed PI with the Effective AMO for the appropriate group size. If the observed PI is equal to or greater than the Effective AMO, we conclude that the group's performance is not statistically different from the AMO. If the observed PI is smaller than the Effective AMO, we conclude that the group's performance is statistically lower than the AMO.

Table 1 below shows the number of continuously enrolled students tested in each accountability group in a sample school. For every accountability group with 30 or more students, the Effective AMO from Table 2 is shown. The observed PI of the group must equal or exceed the Effective AMO for the group to make Adequate Yearly Progress. Groups with an observed PI that is lower than the Effective AMO may still make Adequate Yearly Progress by making “safe harbor.” The English language arts or mathematics safe harbor target requires the group to reduce the difference between its previous year’s PI and the goal of 200 by 10 percent. The Effective AMOs do not apply to decisions about whether a group has made safe harbor. To make safe harbor, the group must meet both its English language arts or mathematics safe harbor target and its science or graduation rate target. Verification reports will show the Effective AMO and the English language arts or mathematics safe harbor target for each group with at least 30 members.
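The safe harbor arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the State's official calculation: the function name and the example PI are hypothetical, and the text does not say how fractional targets are rounded, so no rounding is applied.

```python
def safe_harbor_target(previous_pi: float, goal: float = 200.0,
                       reduction: float = 0.10) -> float:
    """Return the PI that closes `reduction` of the gap to `goal`.

    Assumption: the target is last year's PI plus 10 percent of the
    distance between last year's PI and 200. Rounding is not specified
    in the text, so the raw value is returned.
    """
    gap = goal - previous_pi            # distance from last year's PI to 200
    return previous_pi + reduction * gap


# Hypothetical example: a group whose previous-year PI was 100.
print(safe_harbor_target(100))
```

For a previous-year PI of 100 the gap to 200 is 100 points, so the group would need to gain 10 points and reach a PI of 110 to meet this part of safe harbor.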

Table 1: Accountability Groups for Sample School

Accountability Group              Number in Group   Effective AMO for Grade 4 ELA
All Students                      99                121
Students with Disabilities        12
American Indian/Alaskan Native    2
Asian or Pacific Islander         5
Black (not Hispanic)              35                115
Hispanic                          17
White                             40                116
Not English Proficient            7
Economically Disadvantaged        55                118

An Effective AMO is the lowest PI that an accountability group of a given size can achieve in a subject for the group’s PI not to be considered significantly different from the AMO for that subject. If an accountability group’s PI equals or exceeds the Effective AMO, the group is considered to have made AYP.

Table 2: Effective Annual Measurable Objectives (Effective AMOs) for 2004–05

Each column after AMO gives the Effective AMO for the stated Number of Students Participating (Valid Scores).

Subject  AMO  30-34  35-39  40-44  45-49  50-59  60-69  70-89  90-119  120-149  150-219  220-279  280-399  400-589  590-979  980-1899  1900-5299  5300+
ELA 4    131  114    115    116    117    118    119    120    121     122      123      124      125      126      127      128       129        130
Math 4   142  125    126    127    128    129    130    131    132     133      134      135      136      137      138      139       140        141
ELA 8    116  99     100    101    102    103    104    105    106     107      108      109      110      111      112      113       114        115
Math 8   93   76     77     78     79     80     81     82     83      84       85       86       87       88       89       90        91         92
HS ELA   148  131    132    133    134    135    136    137    138     139      140      141      142      143      144      145       146        147
HS Math  139  122    123    124    125    126    127    128    129     130      131      132      133      134      135      136       137        138
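The table lookup described under "Use of the Effective AMO Table" can be sketched in Python. This is a minimal illustration, not an official tool: the names BAND_STARTS, ELA4_EFFECTIVE_AMOS, effective_amo, and meets_effective_amo are hypothetical, and only the Grade 4 ELA row of Table 2 is encoded.

```python
from bisect import bisect_right

# Lower bound of each group-size band in Table 2 (30-34, 35-39, ..., 5300+).
BAND_STARTS = [30, 35, 40, 45, 50, 60, 70, 90, 120, 150,
               220, 280, 400, 590, 980, 1900, 5300]

# Effective AMOs for Grade 4 ELA (AMO = 131), copied from Table 2.
ELA4_EFFECTIVE_AMOS = [114, 115, 116, 117, 118, 119, 120, 121, 122,
                       123, 124, 125, 126, 127, 128, 129, 130]


def effective_amo(n, effective_amos=ELA4_EFFECTIVE_AMOS):
    """Return the Effective AMO for a group of size n.

    Returns None for groups with fewer than 30 students, for which
    Table 2 lists no Effective AMO.
    """
    if n < BAND_STARTS[0]:
        return None
    # bisect_right counts the band starts <= n; the last of those is n's band.
    return effective_amos[bisect_right(BAND_STARTS, n) - 1]


def meets_effective_amo(observed_pi, n, effective_amos=ELA4_EFFECTIVE_AMOS):
    """True if the observed PI equals or exceeds the Effective AMO."""
    threshold = effective_amo(n, effective_amos)
    if threshold is None:
        return None  # no Effective AMO for groups under 30
    return observed_pi >= threshold


# Group sizes from the sample school in Table 1.
for group, size in [("All Students", 99), ("Black (not Hispanic)", 35),
                    ("White", 40), ("Economically Disadvantaged", 55),
                    ("Hispanic", 17)]:
    print(group, size, effective_amo(size))
```

For the sample school's group sizes, this lookup returns the same Effective AMOs that Table 1 lists (121, 115, 116, and 118), and returns None for the Hispanic group of 17, which is below the 30-student minimum.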
Last Updated: February 15, 2011