GRADE 4 ELA 2001 QUESTION & ANSWER DOCUMENT

*For help with questions not addressed below, please call the Measurement Incorporated Grade 4 ELA Helpline at (877) 516-2403. The line is available weekdays, February 7-16, from 8:00 a.m. to 5:00 p.m.

INTRODUCTION - After the videotaping of the training sessions, the Scoring Leaders participated in Question and Answer sessions. Staff from Measurement, Inc. and the State Education Department provided responses to participants’ questions. Many of the questions refer to particular student answer papers in the Practice Sets, and responses may refer to the Scoring Guide. The following transcript places questions of a general nature first. These are followed by questions organized by content area: Reading; Writing; Writing Mechanics; and Listening.

Q: How do I use the Videotapes?

A: Videotapes have been provided for each component of the Grade 4 English Language Arts test to assist in training scoring leaders/scorers. The trainer in each videotape will discuss the contents of the Scoring Guide and the Practice Set for that content area. The Scoring Guide will be presented first to demonstrate how the scoring rubric should be applied to student responses. We suggest that Training Leaders stop or pause the videotape before the videotaped trainer begins discussion of the Practice Set. This provides an opportunity for those being trained to read their Practice Sets and practice making scoring decisions.

We also suggest that scorers practice on only one or two student responses at a time, stopping and reviewing the correct score(s) before moving on to the next. The Scoring Leader may read and discuss the annotations and marginalia in their copies of the Practice Sets, or may resume the videotape at appropriate intervals. Several short practice segments followed by review maximize the opportunity to learn by doing and assist in building scorer skill and confidence.

Q: As a Scoring Leader, how should I prepare to train Table Facilitators and Scorers?

A: Training procedures and the logistics of live scoring are covered in the Scoring Leader Handbook, which should be read thoroughly before training. You should also review your Scoring Guide and Practice Set while viewing the videotape.

Q: How should a response be scored if it’s entirely blank, or it says the student refuses to answer, or it’s written in another language?

A: A list of Condition Codes can be found near the back of the Scoring Guide, and the Scoring Leader Handbook contains the procedures for assigning such codes. Responses written in another language should always be scored as an "E" even if the Scoring Leader or another scorer understands that language, since the test is intended to assess English communication skills.

Q: Sometimes a student will respond to some but not all items. What Overall score should such responses receive?

A: Near the back of the Scoring Guide is a list of Scoring Considerations. These outline the effect of missing responses on the Overall score.

Q: Suppose a student leaves the short responses blank and answers only the extended response, but in the extended response clearly demonstrates understanding of all of the questions posed in the other items? Since the Overall score is supposed to holistically reflect the understanding of the student, can such a response receive a "4"?

A: No. If only the extended response is answered, the Scoring Considerations limit the Overall score to a "2."

Q: When training is over, should scorers refer to the training materials while scoring actual student responses?

A: YES! To maintain accuracy and consistency in scoring, it is very helpful to refer occasionally to the student responses used in the training materials as examples of the various score points. These responses are often called "anchor papers" because they help to fix the acceptable range within a score point and prevent the scorer from "drifting" higher or lower in their expectations for awarding a score point.

Q: I understand that holistic scoring involves weighing and balancing various factors. What are these factors, and what weight should be given to each?

A: The scoring rubric addresses the factors that should be considered in determining the score of a response by listing characteristics that tend to occur among the score points. These characteristics reflect the degree to which focus, development, organization, and writing style are found within a response. Focus is how well the response fulfills the requirements of the task and the connections to the task found within the response. Development is how much information is presented: the details, the specifics, and the amount of elaboration on ideas. Organization is the order in which the information is presented. Does one idea logically follow another? If it’s a narrative, how tight is the sequence of events? Writing style generally concerns word choice and sentence patterns. How fluent is the response? Is it easy to read?

Writing style should not be confused with writing mechanics. Style concerns what word is used, whereas mechanics concerns how the word is spelled. Style looks at how the sentence patterns create a flow of ideas, while mechanics looks at how the sentences are punctuated. Remember that writing mechanics is scored separately and should not be a factor in scoring Independent Writing.

In assigning a score to an Independent Writing response, all relevant factors should be assessed. However, the most important factor by far, and the one accorded the most weight, is DEVELOPMENT. The amount of development is central to each score point. How much information are we being given? What are the details and the specifics? Are ideas or events elaborated and expanded upon? Development is not only important in and of itself; it also impacts the other factors. There must be a certain amount of information presented for a scorer to be able to assess a response’s focus, organization, and fluency.

Caution: development is not synonymous with length! Obviously, presenting the amount of information necessary to reach a higher score point will result in longer responses. However, note that the training materials contain several examples where a response that appears longer receives a lower score than a response that appears shorter. For example, in the Writing Scoring Guide, #8 is a high "2" that covers a page and a half, while #10 is a low "3" that covers only one page. Handwriting size obviously makes a difference, but other considerations also come into play. Repetition will make a response appear longer, as in Writing Practice Set #3, #4, and #6. Word choice can also affect development: specific and/or vivid words pack more information into less space. One example is in Writing Scoring Guide #5, where the word "satisfied" paints a mental picture of a dog that is full, comfortable, happy, and unlikely to run away.

Q: Our scorers are experienced teachers who adhere to certain standards in their classrooms. Some scorers may find it difficult to follow the standards set by the rubric and the training materials if those standards seem higher or lower than those used by the scorer in the classroom. How should I advise a scorer who hesitates to apply the standards appropriate for this test?

A: We value the classroom experience of our scorers, and we realize that some variation in expectations may exist among districts, schools, and individuals. However, it is very important that all scorers separate their classroom expectations from the standards used in scoring this statewide test. Every scorer should use the same standards in applying the rubric to student responses. Uniform standards in scoring are crucial to obtaining the consistency and accuracy necessary for a valid assessment of student performance across the entire state. Accurate assessments ultimately benefit everyone.

Q: How can a scorer avoid "drifting" from the correct standards while scoring?

A: After scoring a number of responses, a scorer may gradually, even unconsciously, begin to accept more or less than is appropriate in awarding a particular score point. This could result in scoring inequity, where a student response could receive a different score from the same person depending on when it was scored. To maintain the consistency and accuracy of all scores, it is important to prevent any "drift" in scoring standards. This is best accomplished by frequent reference to the "anchor papers" in the training materials, and by encouraging scorers to consult their Table Facilitators or Scoring Leaders with responses that seem on the line between two score points.

Q: What if I should encounter a response where the student indicates that he or she is in a crisis situation and needs intervention? How should such sensitive responses be handled?

A: Sometimes a student in a difficult situation will use the test as an opportunity to reach out and ask for help. The Scoring Leader Handbook and the School Administrator Manuals have information on the procedures to be followed if such a response is encountered. Scorers should be instructed to bring such responses to the immediate attention of the Scoring Leader.

Q: What if a student puts the correct information for a response on a different page, such as the planning page, instead of on the correct response page?

A: If the response page is blank, it must be scored to reflect that it is blank. However, if a student indicates graphically on the correct response page that a response is written or continued onto another page, then the scorer can follow the student’s instructions and consider the information on the indicated page.

Q: The rubric says a "4" will have "vivid language and a sense of engagement or voice." Where in each of the "4"s in the training materials can I find examples of vivid language and voice?

A: Not all "4"s will have vivid language or a sense of engagement. However, the precision of language and the manner of expression can be factors in strengthening a response if all of the other elements are present. Voice, where the personality of the student shows itself in the manner of expression, is like a cherry on the sundae. The sundae must be there first before the cherry can be seen as adding anything substantial. Keep in mind also that what is vivid language or voice for an elementary school student may be different from what you or I may consider to be vivid. An example of voice on the fourth grade level may be seen in Practice Set #7c: "He realized nothing is better than himself."

Q: On borderline calls, when deciding between adjacent score points, should the scorer always give the "benefit of the doubt" to the student and award the higher score?

A: No. Such a practice can result in scoring "drift." After scoring a number of responses, a scorer may gradually, even unconsciously, begin to accept less (or demand more) than is appropriate in awarding a particular score point. Scoring "drift" can create an unfair situation where a student response could receive a different score from the same scorer depending on when the response was scored. To prevent "drift" and maintain the consistency and accuracy of all scores, refer occasionally to the "anchor papers" in the training materials as examples of the various score points. Scorers should also be encouraged to consult their Table Facilitators and Scoring Leaders with responses that seem on the line between two score points.

READING

Q: Practice Set #5 seems comparable to Practice Set #4 in many ways, and its response to Item 34 seems clearer than that in #4. Why does #4 receive a higher Overall score than #5?

A: These two responses provide a good opportunity to practice the process of holistic scoring, and should also provide a useful reference when making difficult decisions between score points. To determine the Overall score, all four responses must be considered, and the strengths and weaknesses of each should be assessed. The responses to Item 33 in #4 and #5 are comparable, and Item 34 in #5 is clearer in its connection between the event and the change of attitude than in #4. However, Item 35 in #4 contains more correct information than in #5, and the extended response, Item 36, is significantly stronger in #4 than in #5. The extended response in #5 has text support only from the story, while in #4 there is text support from both the story and the poem. Weighing and balancing these factors for each response is necessary when determining which Overall score is most appropriate.

Q: For Item 34, all acceptable responses in the Scoring Guide and Practice Set involve a contrast between the ideas of playful (or nice) and dangerous. Are any other contrasts or comparisons acceptable?

A: The training materials were assembled using student responses from a field test, which provided a limited sample size. Given the large number of students who will be taking the actual test, it is certainly possible that other valid and acceptable contrasts or comparisons may be encountered in scoring the responses. If a response raises a point that seems valid but was not addressed in the training materials, please call the Helpline for advice and assistance. The toll-free number is on the last page of the Scoring Leader Handbook.

Q: In Practice Set #3, Item 34 gets a "2" in Analytics, while in Practice Set #7, Item 34 only gets a "1." The responses seem very similar. Why does #3 get a higher Analytics score?

A: In #7, no connection is made between an event and the change of attitude, which is necessary for a "2" in Analytics. Note that the response does not mention falling in the water, so the reference to fishbait is unclear. However, the response in #3 makes that connection between the event of falling in and the change of attitude.

Q: Do the answers in Item 35 have to be in separate boxes to receive credit?

A: No. The answers can all be in one box, or outside the boxes, as long as they are on the appropriate response page.

Q: If a similarity listed for Item 35 comes from the graphics rather than the text, can it be accepted? An example might be "a whale’s flippers look like a bird’s wings."

A: Yes. Comparisons based on graphics are acceptable.

Q: The response to Item 35 in Scoring Guide #8 receives credit for "large and grand" as a comparison. This may describe whales, but doesn’t seem to apply to birds. Sparrows are small! Why is this a valid response?

A: The student’s phrase is supported by the text since it is a direct quote from the poem. The description could reasonably be applied to some birds, such as eagles and ostriches, and is therefore not incorrect. A phrase from the poem that could be applied to birds as well as to whales is acceptable. The same rationale is used for accepting comparisons in environment (sea versus air) implied by the poem.

Q: The extended response (Item 36) in Practice Set #3 doesn’t seem to have much engagement or voice. Why is this worthy of an Overall score of "4"?

A: Practice Set #3 is a low "4" that should be helpful for comparison purposes when making scoring decisions between a "3" and a "4". The Overall score is assigned based on all four responses. Not all "4"s will have engagement or voice. That is just one factor that you may find in a "4." Please remember that the process of holistic scoring involves weighing and balancing various factors and assessing the relative strengths and weaknesses of a student’s response. In Practice Set #3, Items 33 and 34 both contain solid detail from the text. Text detail from the poem is used to clearly imply comparison in Item 35. In the extended response, Item 36, the student presents a concise and coherent sequence of ideas, synthesizing text information with information outside the text to create a new idea. This demonstrates insight and a clear understanding of the task. The overall strengths of this student’s responses deserve a low "4."

WRITING

Q: The student responses in Scoring Guide #5 and Scoring Guide #8 are very different from each other. The type of organization used is different, and Scoring Guide #8 has much more development than #5, yet they both receive the same score of "2." Why is there such a wide range within the same score point?

A: Given the large number of students taking the test and the three-point scale used to evaluate on-topic responses, each score point will necessarily encompass a wide variety of response types. For this reason, the training materials provide examples of student responses representing the lower and the higher end of each score point. In the Scoring Guide, #4 is a high "1," #5 is a low "2," #8 is a high "2," and #9 is a low "3." In the Practice Set, #1 and #4 are high "1"s, #2 is a low "2," #8 is a high "2," and #7 is a low "3." These responses demonstrate the range and the boundaries of the score points.

Q: Some student responses do not resemble any of those seen in training, and others seem to fall between the examples of the score points in the training materials. For example, a response may be better than Guide #8 and Practice #8, but not quite as good as Guide #9 and Practice #7. How should such responses be scored?

A: The scoring rubric in the Scoring Guide should be the starting point for evaluating all responses. The rubric addresses the factors that are used in assessing the relative strengths and weaknesses of a response, and the examples in the training materials demonstrate how the rubric has been applied to various student responses. Since the student responses in the training materials were taken from a field test, there was a limited sample from which to draw examples. Because a large number of students are taking the actual test, we can expect to see a much wider variety of responses than those encountered during training.

The rubric lists characteristics that tend to occur among the various score points. These characteristics address the qualities of focus, development, organization, and writing style found within a response. However, many responses deserving of a particular score point will not have all of the characteristics listed in the rubric for that score point, and many contain elements of more than one score point. For example, some "1"s exhibit good word choice, good sentence structure, or some evidence of organization, but are too brief to receive a higher score. Many "3"s will not have vivid language or stylistic sophistication. The holistic scoring process involves weighing and balancing various factors in evaluating the relative strengths and weaknesses of a response.

If a scorer can’t assign a score to a response after consulting the "anchor papers" in the training materials and weighing the response’s relative strengths and weaknesses, the Table Facilitator or Scoring Leader should be asked. (Follow the procedure for your scoring site.) The Scoring Leader may FAX the response and call the toll-free Helpline for advice and assistance if necessary. The telephone numbers for the toll-free Helpline and the FAX line can be found in the back of the Scoring Leader Handbook.

Q: If a response fulfills all of the other requirements for receiving a "3" but does not address what happened after the problem was solved, should I score the response as a "2"? Such a response does not cover all of the bullets listed in the prompt, and therefore does not "fulfill the requirements of the task" as the rubric specifies for a score of "3."

A: The writing prompt and the instructions are intended to give the student a topic and a launching point from which to write. An overly restrictive interpretation of what the instructions require can defeat the purpose of the test, which is assessing how well the student can write on a sustained basis. Of course, a response must be on-topic to receive higher than a "0," but any response addressing the topic of an animal with a problem should be scored according to its merits. The factors from the rubric should be weighed and balanced. Remember that not all of the characteristics listed for a score point on the rubric will be found in every response that deserves that score point.

Q: How should I score a response that tells a story from a book or a movie or some other source? The prompt tells the student to "make up a story or write about something that really happened."

A: Any response that addresses the topic of an animal with a problem should be scored according to its relative strengths and weaknesses. It is not necessary for a story to be either original or true; the response need not even be a narrative (see Scoring Guide #5). As noted above, the writing prompt and the instructions are intended to give the student a subject and a launching point from which to write. An overly restrictive interpretation of what the instructions require can defeat the purpose of the test, which is assessing how well the student can write on a sustained basis. A student who retells a story already known is still performing the tasks of focusing, developing, organizing, and verbalizing the story. These are the skills we are evaluating.

WRITING MECHANICS

Q: Sometimes a student will respond to one or two but not all three extended response items. What Writing Mechanics score should such responses receive?

A: Near the back of the Scoring Guide is a list of Scoring Considerations. These outline the effect of missing responses on the Writing Mechanics score. At least two responses must be present for a Writing Mechanics score to be assigned. If only one extended response is present, it should be scored as Insufficient.

Q: If a student answers two or three of the extended responses, but the responses are very brief, how should the Writing Mechanics be scored? Are a few short but clean sentences enough to earn a "3"?

A: A score of "3" in Writing Mechanics indicates that the student has demonstrated control of the conventions of written English. A certain amount of sustained writing is necessary before such control can be demonstrated. While a "3" does not have to be as lengthy as Writing Mechanics Guide #4 or Practice Set #3, it should contain enough substance to show that the student has acquired control of the conventions of written English. Remember that we are considering and balancing what the student attempts and what the student accomplishes. Therefore, not only the length of the response but also the complexity of the words and sentence structures are important. As with any question during live scoring, if a scorer is not sure what score to assign, the Table Facilitator or Scoring Leader should be asked. The Scoring Leader may FAX the response and call the toll-free Helpline for advice and assistance if necessary. The telephone numbers for the toll-free Helpline and the FAX line can be found in the back of the Scoring Leader Handbook.

Q: Can we make notes on the actual responses to keep track of errors as we read?

A: No. Scorers should never write on the actual student responses. The margin notes in the training materials were provided to assist with recognizing and identifying errors. If you find it necessary to keep a running tally, please do so on a separate piece of paper. Please remember that there is no ratio or formula for scoring mechanics; a strictly numerical approach to errors can result in inaccurate scoring. Not only the number of errors, but also the length and complexity of the responses must be considered, as well as the degree to which the errors affect one’s ability to read and comprehend the responses. Although some scorers may find it helpful to count errors to assist in their final determination of a score, please remember that the density and severity of the errors are what determine the score for Writing Mechanics.

Q: The scoring rubric says a "2" will have errors that interfere with readability but don’t substantially interfere with comprehension. What is the difference between readability and comprehension?

A: Readability is how easy it is to read through the response, whereas comprehension is how well you can understand what is written. A helpful technique may be to ask yourself how much effort it takes to determine what the response is saying. Do the errors merely slow you down while you mentally "fix" them, or are you having to stop and puzzle out what the response means to say?

Q: Is there a hierarchy of errors? Are some errors always more serious than others?

A: No. The important factor is not what type of error it is, but how the error affects readability and comprehension. While it is true that some types of errors are more likely to affect the clarity of writing, one can’t say that all errors of a certain type are more or less serious than all errors of another type. Consider spelling: a minor misspelling or a homonym may be easy to read through without losing the meaning, while a serious misspelling of a key word can render an entire passage incomprehensible.

Q: Some student test books indicate that the student has been given testing modifications in an area involving Writing Mechanics, or someone has scribed the response for the student. How should such responses be scored?

A: Special procedures must be followed. For example, if a notation on the test book indicates that the student is exempt from spelling requirements, misspelled words should have been crossed out and correctly written at the home school. (If this has not been done, the Table Facilitator or Scoring Leader should be consulted.) The scorer should score the corrected response. If there is a notation that someone has scribed the response for the student, the scorer should assume that the student directed the scribe’s use of spelling, punctuation, and other mechanical considerations, or that the student was provided with the testing modifications specified in the IEP. A scribed response should be scored on its merits as written, as any other response would be scored.

LISTENING

Q: Should the information in Item 29, the graphic organizer, be listed in any particular order? Do we count it wrong if it is out of sequence?

A: No. The information can even be outside the boxes and still be considered, as long as it is on the correct response page.

Q: In Item 29, some students may provide all of the correct information asked for, but include some incorrect information. Can we ignore what’s wrong and give credit only for what’s right?

A: No. Incorrect information indicates that the student did not fully understand the task. The score assigned in Listening is intended to holistically reflect the degree of understanding of the folktale that the student exhibits. Incorrect information affects the degree of understanding demonstrated by the student.

Q: In Item 30, the instructions ask for why Toshiyuki changed "each time." Does this mean the student has to indicate why each separate change took place to receive a "4"?

A: No. See Scoring Guide #12, which receives a low "4" and gives only one reason for the changes. One effective explanation with text support is sufficient to fulfill the requirement.

Q: In Item 31, "insight" seems to be important for the higher score points. Is it enough to merely paraphrase the lesson learned by Toshiyuki? What is insight and how do I recognize it?

A: Insight is when a student takes information from the text and adds outside information to it to create a new idea. When a student’s response takes relevant details from the text, brings in relevant information from outside of the text (such as the student’s own experience), and makes strong connections to a relevant idea not explicit in the text, insight is demonstrated. Examples in the training materials include "important," "special," and "The most powerful thing is you."

Q: What if an extended response, Item 31, makes a good relevant connection at the beginning of the response rather than at the end? Would that affect the score?

A: Not necessarily. The fact that such a connection is made is much more important than where it is found. Of course, for the higher score points, organization is a factor. However, organization looks at whether the ideas presented are in a coherent and logical sequence. It is not necessary that a key idea be in any particular place within an essay.

Q: Some responses seem to be on the line between the characteristics of adjacent score points on the scoring rubric. How do I decide if a cluster of items deserves one score point or another?

A: When making difficult scoring decisions, it is helpful to consult the training materials. The examples used in the Listening Guide and the Practice Set show how the rubric has been applied to student responses. The responses representative of the upper and lower ends of each score point are a valuable reference in making scoring decisions. In the Scoring Guide, #4 is a high "1," #5 is a low "2," #8 is a high "2," #9 is a low "3," #11 is a high "3," and #12 is a low "4."