Homogeneity of the Regents Comprehensive Examination in English
for English Language Learners (ELL) and Students
in Monolingual English Curricula, June 1999

Gerald E. DeMauro
December 2000

Introduction

Samples of English Language Learners (ELL) and monolingual English curriculum students were analyzed to estimate the homogeneity of the Regents Comprehensive Examination in English (CEE) from June 1999. Four ELL samples were compared to the Department Review sample of 488 monolingual curriculum students.

The ELL samples were drawn from large and small cities and from a suburban district. The Department Review sample is a random subsample of the randomly selected 10 percent of all papers that are re-read as part of the State Education Department audit of test scores.

Because test validity is largely concerned with the support for score-based inferences (see Joint Standards, 1999, Validity), it is important, if the test has the same consequences based on those inferences for different population groups, that the evidence in support of those inferences be comparable across the groups.

Among the most effective means of measuring the relative degree of support for inferences across populations is assessing homogeneity, that is, the concurrence of the meaning of test scores across population groups in terms of the agreement in the ordering of test questions from least to most difficult. The logic is that certain levels of skill and knowledge imply mastery of some content and difficulty with other content. Naturally, then, there should be a greater probability of correctly answering the items that measure the mastered content than of correctly answering the items that measure the difficult content. If the same inferences about strengths and weaknesses can be drawn across populations from the same score, then the ordering of item difficulties should be homogeneous across populations, because this ordering operationally defines which content is mastered and which is difficult.

Methods

A common demonstration of the agreement of item ordering is the correlation of item difficulties (Angoff and Modu, 1973). The difficulty of an item is measured by the proportion of students who answer it correctly. The proportion is converted to an equal interval scale, and these converted item values are correlated across populations.

The current study assesses the homogeneity of the construct measured by the CEE for ELL and monolingual curriculum students, and for groups that vary according to three ability levels based on overall scale score: 0-54 (fail), 55-64 (local pass), and 65-100 (Regents pass).

Samples

A small city, two large cities, and a suburban district were sampled because they volunteered. The means and standard deviations are presented below.

Table 1

Means and Standard Deviations on the June 1999 Administration of the
Regents Comprehensive Examination in English (CEE)
Per Selected Samples of English Language Learners (ELL)
and Monolingual English Curriculum Students

   

                         Raw             Raw             Raw
                        Total     Multiple-Choice   Open-Ended         Scale
Sample           n    mean    s.d.    mean    s.d.    mean    s.d.    mean    s.d.
Large City 1    41   40.24    7.69   17.46    4.29   11.39    2.58   60.93    9.63
Large City 2     9   24.89    6.33   12.33    3.12    6.28    1.94   41.67    8.03
Small City      17   47.47   16.22   19.12    6.06   14.18    5.42   69.24   22.22
Suburb          18   29.50    7.65   14.39    3.55    7.56    2.30   47.50    9.54
Monolingual    488   45.84    8.80   19.06    3.41   13.39    3.33   67.94   10.99

Item Difficulty Correlation

Item difficulty values were converted from proportion correct to delta values. These values are centered at 13 with a standard deviation of 4. They form an equal interval scale, which permits statistical manipulations (e.g., averaging, correlations) that would not be possible with p-values (proportion correct).
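The conversion described above can be sketched as follows. The report gives only the center (13) and standard deviation (4) of the delta scale; the formula below, 13 minus 4 times the normal deviate of the proportion correct, is the conventional form of the delta transformation and is assumed here rather than taken from the report. The function name `p_to_delta` is hypothetical.

```python
from statistics import NormalDist


def p_to_delta(p: float) -> float:
    """Convert a proportion correct to the delta scale (center 13, s.d. 4).

    Assumed formula: delta = 13 - 4 * z, where z is the standard normal
    deviate corresponding to p. Harder items (lower p) get higher deltas.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)


# Hypothetical p-values for three items, easiest first.
deltas = [p_to_delta(p) for p in (0.80, 0.50, 0.20)]
```

Under this assumed formula an item answered correctly by half of the students falls exactly at the center of the scale, and equal steps in delta correspond to equal steps in normal deviates, which is what makes averaging and correlating the values defensible.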

The samples were each divided into three groups: fail, low pass, and pass. The test has 26 multiple-choice questions worth a maximum of one point each and four open-ended questions worth a maximum of six points each. The raw total is computed as the multiple-choice total plus 2 x the open-ended total. This raw value is converted to a scale score ranging from 0 to 100, on which students may be eligible for a local diploma at 55 (as decided by their school board) and are eligible for a State-endorsed Regents diploma at 65. For the purpose of this research, low pass was 55-64 and pass was 65-100.
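The scoring rules in the preceding paragraph can be illustrated with a short sketch. The function names are hypothetical, and the raw-to-scale conversion is not reproduced because the report does not give it; only the raw total formula and the three score bands come from the text.

```python
def raw_total(multiple_choice_total: float, open_ended_total: float) -> float:
    """Raw total = multiple-choice total plus 2 x open-ended total."""
    return multiple_choice_total + 2 * open_ended_total


def performance_group(scale_score: int) -> str:
    """Assign a scale score (0-100) to the fail, low pass, or pass group."""
    if not 0 <= scale_score <= 100:
        raise ValueError("scale score must be between 0 and 100")
    if scale_score < 55:
        return "fail"
    if scale_score < 65:
        return "low pass"
    return "pass"


# Maximum possible raw total: 26 one-point items plus 2 x (4 items x 6 points).
max_raw = raw_total(26, 4 * 6)
```

As a check against Table 1, the Large City 1 means of 17.46 (multiple-choice) and 11.39 (open-ended) give a raw total of 40.24, which matches the reported raw total mean.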

The total for each open-ended question was first divided by six, the maximum possible point value, before being converted to the delta scale.
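A minimal sketch of this step, under two assumptions that are not stated in the report: that the quantity divided by six is the group's mean score on the question, and that the result is then treated like a proportion correct in the conventional delta formula (13 minus 4 times the normal deviate). The function name is hypothetical.

```python
from statistics import NormalDist, mean

MAX_OPEN_ENDED_POINTS = 6  # maximum score per open-ended question (from the text)


def open_ended_delta(scores: list[float]) -> float:
    """Delta value for one open-ended question from a group's 0-6 scores."""
    # Assumption: rescale the mean score to a 0-1 "proportion of maximum".
    proportion = mean(scores) / MAX_OPEN_ENDED_POINTS
    if not 0.0 < proportion < 1.0:
        raise ValueError("mean score must lie strictly between 0 and the maximum")
    # Assumed conventional delta formula; lower proportions give higher deltas.
    return 13.0 - 4.0 * NormalDist().inv_cdf(proportion)
```

Rescaling in this way places the open-ended questions on the same footing as the multiple-choice items, so all 30 questions can enter one set of delta values.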

Table 2 shows the item difficulty correlations. Note that no correlation, within or between the ELL and monolingual curriculum groups, was below .58. The lowest correlation was between failing and low passing ELL students (.581). The highest was between passing and low passing monolingual curriculum students (.941). On the whole, the correlations among the monolingual curriculum groups (.938, .890, and .941) were higher than the corresponding correlations among the ELL groups (.581, .685, and .680), although both sets show considerable homogeneity. The correlations between ELL and monolingual curriculum students at the same performance level were .638, .644, and .698, again showing substantial agreement in the measurement properties of the test for the two groups.

Table 2
Item Difficulty Correlations (in Delta Units)
for English Language Learners (ELL)
and Monolingual English Curriculum Students
on the June 1999 Regents Comprehensive Examination in English (CEE)

 

                    Fail          Low Pass          Pass
                 ELL     Eng     ELL     Eng     ELL     Eng      Mean
Fail      ELL                                                    13.47
          Eng   .638                                             12.01

Low Pass  ELL   .581    .586                                     11.09
          Eng   .661    .938    .644                             11.63

Pass      ELL   .685    .687    .680    .708                      .924
          Eng   .617    .890    .612    .941    .698              .922
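Each entry in Table 2 is a correlation between two groups' delta values taken over the same set of test questions. A minimal sketch of that calculation follows, assuming the ordinary Pearson product-moment correlation; the function name and the delta values in the example are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two equal-length lists of delta values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical delta values for the same five items in two groups.
ell_deltas = [14.2, 12.9, 11.0, 15.1, 9.8]
eng_deltas = [13.0, 12.1, 10.2, 14.4, 10.5]
r = pearson(ell_deltas, eng_deltas)
```

A value near 1 means the two groups order the items from easiest to hardest in nearly the same way, which is the sense of homogeneity used in this report.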

Standard Errors of Estimate

Finally, the two groups were matched on scale scores. The delta values for the ELL group were regressed onto the delta values for the monolingual curriculum students to yield a predicted value and a residual value. The square root of the average squared residual, called the standard error of estimate, is a good measure of the accuracy of predicting the difficulty. The smaller the value, the more accurate the prediction.
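The calculation described above can be sketched as follows, assuming an ordinary least-squares line that predicts the ELL delta values from the monolingual delta values. Following the wording of the text, the squared residuals are averaged over the number of items; many textbooks divide by the number of items minus 2 instead, so this divisor is an assumption. The function name is hypothetical.

```python
from math import sqrt


def standard_error_of_estimate(x: list[float], y: list[float]) -> float:
    """Root mean squared residual from the least-squares regression of y on x.

    x: monolingual delta values (predictor); y: ELL delta values (predicted).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Average squared residual (divisor n, per the text), then square root.
    return sqrt(sum(r * r for r in residuals) / n)
```

When the ELL values fall exactly on a straight line of the monolingual values, the function returns zero; larger values indicate that item difficulties for one group predict those of the other less accurately.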

Table 3 shows that the greatest agreement (smallest standard error of estimate) was generally found for scale scores between 55 and 76. This range includes the passing scores of 55 and 65, which is where agreement is most needed to support the inferences drawn from the test scores. This finding agrees with the Table 2 evidence that the construct for the ELL and monolingual curriculum students is substantially equivalent.

Table 3
Standard Errors of Estimate
between ELL and Monolingual English Curriculum Students
in Relation to CEE Scale Score

Scale    Standard Errors      Scale    Standard Errors
Score      of Estimate        Score      of Estimate
  46          13.46             74           5.62
  47           7.71             76           6.62
  48           9.34             77          10.26
  49           5.22             79           9.87
  51           8.20             81           7.94
  52          11.81             84           7.59
  54          11.91             91           7.89
  56           5.68             98           5.84
  59           8.92
  61          10.00
  62           6.58
  63           6.00
  64           5.89
  66           6.47
  67           4.86
  68           8.24
  69           7.40
  71           5.83
  72           4.40

Conclusion

The generalizations supported by this use of available samples are limited. Nevertheless, the results of these analyses suggest that the Regents Comprehensive Examination in English defines the same strengths and weaknesses for ELL and monolingual curriculum students. This supports the validity and use of this examination for ELL students.

References

American Educational Research Association, American Psychological Association, and National Council on Measurement in Education.
Standards for Educational and Psychological Testing (Joint Standards).
(Washington, D.C.: Author, 1999).

Angoff, W. H. & Modu, C. C.
Equating the Scales of the Prueba de Aptitud Académica and the Scholastic Aptitude Test.
(New York: College Entrance Examination Board, Research Report 3, 1973).