Confidence Intervals
An underlying assumption of a school accountability system that measures change from grade cohort to grade cohort (that is, compares this year’s fourth grade, for example, with last year’s) is that the performance differences from cohort to cohort are caused by changes under the school’s control: revisions in curriculum, instruction, and/or support systems.
However, a performance measure is subject to error from three sources that are not under a school’s or district’s control:
 measurement error — related to such fluctuating factors as health, motivation, attention, and fatigue — around each student’s hypothetical true score, which averages zero when the sample is sufficiently large;
 sampling error (that is, the error caused by random variations in student ability, early preparedness, and motivation from grade to grade in the same school); and
 changes in the environment not under the school’s control, for example, the events of September 11.
These sources of error, not controllable by a school or district, may cause a school's or district's observed performance to differ from its "true" performance. To minimize the chance that a district or school will erroneously be deemed not to have made Adequate Yearly Progress, New York State’s accountability system uses a “confidence interval” to determine whether a group has met its Annual Measurable Objective (AMO). A confidence interval recognizes the sampling error associated with an observed score and permits the analyst to determine whether the difference between the observed Performance Index (PI) and the AMO falls within certain bounds (that is, within the margin of error attributable to random sampling error) or whether that difference falls outside the margin of error and is, therefore, not attributable to chance alone.
On average, the sampling error associated with the observed score (PI) for each accountability group decreases as the group gets larger. Through empirical analyses, we have determined the distribution of probable observed PIs around the “true score” for groups of varying sizes.
To operationalize the confidence interval in a way that makes it relatively easy to determine whether an accountability group has achieved the AMO, we have developed a table of Effective AMOs. The Effective AMO indicates, for an accountability group of size n, the smallest observed PI that is not statistically different from the AMO. The graph below illustrates the distribution of probable observed PIs around the true score for a group of size n, when the true score equals the AMO. Any group with an observed PI above the Effective AMO will be counted as having achieved the AMO.
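The construction described above can be sketched in code. This is a hypothetical illustration only, not the State's actual empirical method: it assumes the sampling distribution of the observed PI is approximately normal with a standard error that shrinks as the group size n grows, and the spread parameter s is an invented placeholder.

```python
import math

# z-value leaving 10 percent in the lower tail of a normal distribution,
# matching the "at most a 10 percent chance" decision rule described later.
Z_90_ONE_SIDED = 1.2816

def effective_amo(amo: float, n: int, s: float = 60.0) -> float:
    """Smallest observed PI not statistically below the AMO (sketch).

    Assumes a normal approximation with standard error s / sqrt(n);
    s is a hypothetical spread parameter, not an official value.
    """
    margin = Z_90_ONE_SIDED * s / math.sqrt(n)
    return amo - margin
```

Note the behavior this implies: as n grows, the margin of error shrinks, so the Effective AMO rises toward the AMO itself — consistent with the pattern in the table of Effective AMOs, where larger groups face higher Effective AMOs.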
[Figure: The Expected Frequency of Each Observed PI Given a "True" Score Equal to the AMO]
Because it is impossible to make statistical statements about the performance of a school with total accuracy, there will always be a degree of error when deciding whether a group met the AMO. New York State’s system minimizes the chance that we will erroneously conclude that a group did not make the AMO. The Effective AMOs have been set so that there is at most a 10 percent chance that we will falsely conclude that the group did not meet the AMO when its true performance was, in fact, equal to or greater than the AMO. This ten percent band is shown in the area of the graph below the Effective AMO. On the other hand, when the observed PI is exactly equal to the Effective AMO, there is a 90 percent chance that the group’s true score is below the AMO. Even when the observed PI is exactly equal to the AMO, there is a 50 percent chance that the group’s true score is below the AMO.
Use of the Effective AMO Table
The Effective Annual Measurable Objectives (Effective AMOs) table provides an Effective AMO for each accountability grade and subject and each group size. The Effective AMOs apply to accountability decisions for English language arts and mathematics. They do not apply to decisions about science or graduation rate; in those areas, the school must meet the State standard to make Adequate Yearly Progress. To use the table, compare the observed PI with the Effective AMO for the appropriate group size. If the observed PI is equal to or greater than the Effective AMO, we conclude that the group's performance is not statistically different from the AMO. If the observed PI is smaller than the Effective AMO, we conclude that the group's performance is statistically below the AMO.
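The lookup-and-compare rule above reduces to a single comparison. A minimal sketch, assuming the Effective AMO has already been read from the published table for the group's size:

```python
def made_amo(observed_pi: float, effective_amo: float) -> bool:
    """True when the observed PI is equal to or greater than the
    Effective AMO, i.e., not statistically below the AMO."""
    return observed_pi >= effective_amo

# Example with values from Table 1 below: a White group of 40 students
# has an Effective AMO of 107 for ELA, so an observed PI of 107 passes
# and an observed PI of 106 does not.
```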
Table 1 below shows the number of continuously enrolled students tested in each accountability group in a sample school. For every accountability group with 30 or more students, the Effective AMO from the Effective Annual Measurable Objectives (Effective AMOs) table is shown. The observed PI of the group must equal or exceed the Effective AMO for the group to make Adequate Yearly Progress. Groups with an observed PI that is lower than the Effective AMO may make Adequate Yearly Progress by making “safe harbor.” To make safe harbor, the group must reduce the difference between its previous year’s PI and the goal of 200 by 10 percent. The Effective AMOs do not apply to decisions about whether a group has made safe harbor. To make safe harbor, the group must meet its English language arts or mathematics safe harbor target and its science or graduation rate target. Verification reports will show the Effective AMO and the English language arts or mathematics safe harbor target for each group with at least 30 members.
Table 1: Continuously Enrolled Students Tested, by Accountability Group (Sample School)

Accountability Group             Number in Group    Effective AMO for ELA
All Students                            99                  112
Students with Disabilities              12                  --
American Indian/Alaskan Native           2                  --
Asian or Pacific Islander                5                  --
Black (not Hispanic)                    35                  106
Hispanic                                17                  --
White                                   40                  107
Not English Proficient                   7                  --
Economically Disadvantaged              55                  109

-- No Effective AMO is shown for groups with fewer than 30 students.
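The safe harbor arithmetic described above — reducing the gap between the previous year’s PI and the goal of 200 by 10 percent — can be sketched as:

```python
def safe_harbor_target(previous_pi: float) -> float:
    """Safe harbor target: close 10 percent of the gap between the
    previous year's PI and the goal of 200."""
    return previous_pi + 0.1 * (200 - previous_pi)

# For example, a group with a previous year's PI of 100 has a gap of
# 100 to the goal; closing 10 percent of that gap gives a target of 110.
```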