Confidence Intervals

An underlying assumption of a school accountability system that measures change from grade cohort to grade cohort (that is, compares this year’s fourth grade, for example, with last year’s) is that the performance differences from cohort to cohort are caused by changes under the school’s control: revisions in curriculum, instruction, and/or support systems.

However, a performance measure is subject to error from three sources that are not under a school’s or district’s control:

measurement error (related to fluctuating factors such as health, motivation, attention, and fatigue) around each student’s hypothetical true score, which averages zero when the sample is sufficiently large;
sampling error (that is, the error caused by random variations in student ability, early preparedness, and motivation from grade to grade in the same school); and
changes in the environment not under the school’s control, for example, the events of September 11.

These sources of error, not controllable by a school or district, may cause a school's or district's observed performance to differ from its "true" performance. To minimize the chance that a district or school will erroneously be deemed not to have made Adequate Yearly Progress, New York State’s accountability system uses a “confidence interval” to determine whether a group has met its Annual Measurable Objective (AMO). A confidence interval recognizes the sampling error associated with an observed score and permits the analyst to determine whether the difference between the observed Performance Index (PI) and the AMO falls within certain bounds (that is, within the margin of error attributable to random sampling error) or whether that difference falls outside of the margin of error and is, therefore, not attributable to chance alone.

On average, the sampling error associated with the observed score (PI) for each accountability group decreases as the group gets larger. Through empirical analyses, we have determined the distribution of probable observed PIs around the “true score” for groups of varying sizes.

To operationalize the confidence interval in a way that makes it relatively easy to determine whether an accountability group has achieved the AMO, we have developed a table of Effective AMOs (Table 2). The Effective AMO indicates, for an accountability group of size n, the smallest observed PI that is not statistically different from the AMO. The graph below illustrates the distribution of probable observed PIs around the true score for a group of size n, when the true score equals the AMO. Any group with an observed PI at or above the Effective AMO will be counted as having achieved the AMO.

The Expected Frequency of Each Observed PI Given a "True" Score Equal to the AMO

Because it is impossible to make statistical statements about the performance of a school with total accuracy, there will always be a degree of error when deciding whether a group met the AMO. New York State’s system minimizes the chance that we will erroneously conclude that a group did not make the AMO. The Effective AMOs have been set so that there is at most a 10 percent chance that we will falsely conclude that the group did not meet the AMO when its true performance was, in fact, equal to or greater than the AMO. This ten-percent band is shown in the area of the graph below the Effective AMO. On the other hand, when the observed PI is exactly equal to the Effective AMO, there is a 90 percent chance that the group’s true score is below the AMO. Even when the observed PI is exactly equal to the AMO, there is a 50 percent chance that the group’s true score is below the AMO.
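The link between group size and the Effective AMO can be illustrated with a textbook normal approximation. This is only a sketch and not the procedure used to build Table 2, which rests on empirically determined distributions. The function name approx_effective_amo and the parameter sd (the spread of individual contributions to the PI) are hypothetical and introduced only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approx_effective_amo(amo, n, sd, alpha=0.10):
    """Approximate one-sided lower threshold for a group of size n.

    Assumes (hypothetically) that the observed PI is normally distributed
    around the AMO with standard error sd / sqrt(n). The source instead
    derives its thresholds from empirical distributions.
    """
    # z is negative for alpha < 0.5 (about -1.28 for alpha = 0.10), so the
    # threshold sits below the AMO and leaves alpha of the area beneath it.
    z = NormalDist().inv_cdf(alpha)
    return amo + z * sd / sqrt(n)


# With a hypothetical sd of 50, the threshold rises toward the AMO as n grows.
for size in (30, 100, 1000):
    print(size, round(approx_effective_amo(131, size, sd=50.0), 1))
```

Under this approximation the margin shrinks in proportion to the square root of the group size, which mirrors the general pattern of larger groups having Effective AMOs closer to the AMO.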

Use of the Effective AMO Table

Table 2 provides an Effective AMO for each accountability grade and subject and each group size. The Effective AMOs apply to accountability decisions for English language arts and mathematics. They do not apply to decisions about science or graduation rate. In those areas, the school must meet the State standard to make Adequate Yearly Progress. To use the table, compare the observed PI with the Effective AMO for the appropriate group size. If the observed PI is equal to or greater than the Effective AMO, we conclude that the group's performance is not statistically lower than the AMO. If the observed PI is smaller than the Effective AMO, we conclude that the group's performance is statistically lower than the AMO.

Table 1 below shows the number of continuously enrolled students tested in each accountability group in a sample school. For every accountability group with 30 or more students, the Effective AMO from Table 2 is shown. The observed PI of the group must equal or exceed the Effective AMO for the group to make Adequate Yearly Progress. Groups with an observed PI that is lower than the Effective AMO may make Adequate Yearly Progress by making “safe harbor.” To make safe harbor, the group must reduce the difference between its previous year’s PI and the goal of 200 by 10 percent. The Effective AMOs do not apply to decisions about whether or not a group has made safe harbor. To make safe harbor the group must meet its English language arts or mathematics safe harbor target and its science or graduation rate target. Verification reports will show the Effective AMO and the English language arts or mathematics safe harbor target for each group with at least 30 members.
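The safe harbor arithmetic described above can be sketched as follows. The function name safe_harbor_target and the sample PI of 100 are hypothetical; the source does not state a rounding rule, so the result is left unrounded. The sketch covers only the English language arts or mathematics target, not the accompanying science or graduation rate requirement.

```python
PI_GOAL = 200  # the goal named in the text


def safe_harbor_target(previous_pi, goal=PI_GOAL, reduction=0.10):
    """Return the PI needed to close `reduction` of last year's gap to the goal."""
    gap = goal - previous_pi  # distance between last year's PI and the goal
    return previous_pi + reduction * gap


# A group with a hypothetical previous-year PI of 100 has a gap of 100,
# so it must gain 10 percent of that gap.
print(safe_harbor_target(100))
```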

Table 1: Accountability Groups for Sample School
Accountability Group	Number in Group	Effective AMO for Grade 4 ELA
All Students	99	121
Students with Disabilities	12
American Indian/Alaskan Native	2
Asian or Pacific Islander	5
Black (not Hispanic)	35	115
Hispanic	17
White	40	116
Not English Proficient	7
Economically Disadvantaged	55	118

An Effective AMO is the lowest PI that an accountability group of a given size can achieve in a subject for the group’s PI not to be considered significantly different from the AMO for that subject. If an accountability group’s PI equals or exceeds the Effective AMO, the group is considered to have made AYP.

Table 2: Effective Annual Measurable Objectives (Effective AMOs) for 2004–05
Subject	AMO	Effective AMO by Number of Students Participating (Valid Scores)
		30-34	35-39	40-44	45-49	50-59	60-69	70-89	90-119	120-149	150-219	220-279	280-399	400-589	590-979	980-1899	1900-5299	5300+
ELA 4	131	114	115	116	117	118	119	120	121	122	123	124	125	126	127	128	129	130
Math 4	142	125	126	127	128	129	130	131	132	133	134	135	136	137	138	139	140	141
ELA 8	116	99	100	101	102	103	104	105	106	107	108	109	110	111	112	113	114	115
Math 8	93	76	77	78	79	80	81	82	83	84	85	86	87	88	89	90	91	92
HS ELA	148	131	132	133	134	135	136	137	138	139	140	141	142	143	144	145	146	147
HS Math	139	122	123	124	125	126	127	128	129	130	131	132	133	134	135	136	137	138
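The lookup in Table 2 can be sketched in a few lines of Python. The names BAND_STARTS, AMO, effective_amo, and meets_effective_amo are hypothetical. The sketch relies on a pattern visible in the 2004–05 table, where each row runs consecutively from AMO minus 17 in the smallest band to AMO minus 1 in the largest; it should not be assumed to hold for other years.

```python
from bisect import bisect_right

# Lower bounds of the 17 group-size bands in Table 2 (30-34, 35-39, ..., 5300+).
BAND_STARTS = [30, 35, 40, 45, 50, 60, 70, 90, 120, 150,
               220, 280, 400, 590, 980, 1900, 5300]

# AMOs from Table 2 (2004-05).
AMO = {"ELA 4": 131, "Math 4": 142, "ELA 8": 116,
       "Math 8": 93, "HS ELA": 148, "HS Math": 139}


def effective_amo(subject, n):
    """Return the Table 2 Effective AMO, or None for groups under 30."""
    if n < BAND_STARTS[0]:
        return None  # no Effective AMO is listed for groups this small
    band = bisect_right(BAND_STARTS, n) - 1  # index of the band containing n
    # In the 2004-05 table, band 0 is AMO - 17 and band 16 is AMO - 1.
    return AMO[subject] - (len(BAND_STARTS) - band)


def meets_effective_amo(subject, n, observed_pi):
    """True if the observed PI equals or exceeds the Effective AMO."""
    target = effective_amo(subject, n)
    return None if target is None else observed_pi >= target


# Reproduce the Grade 4 ELA column of Table 1.
for group, size in [("All Students", 99), ("Black (not Hispanic)", 35),
                    ("White", 40), ("Economically Disadvantaged", 55)]:
    print(group, size, effective_amo("ELA 4", size))
```

For the sample school, this returns 121, 115, 116, and 118, matching Table 1, and returns None for groups with fewer than 30 students.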

Last Updated: February 15, 2011

Information and Reporting Services

University of the State of New York - New York State Education Department