One of the great professional highlights of my twenty-plus years as an educator has been getting to know some of the giants in the field, such as Jim Popham, John Hattie, and Larry Ainsworth, on a first-name basis. One of the most influential has certainly been Dr. Jim Popham, Professor Emeritus at UCLA. The ways in which Dr. Popham has challenged me on issues related to assessment literacy have been some of the greatest sources of my professional growth, beyond anything a Master’s or Doctor of Philosophy degree has afforded.
This growth coincides with an unprecedented bastardization of educational assessment to name, shame, and blame educators, regardless of the adverse effects on the very results some of these assessments claim to improve. One such case of shabby assessment use is what has come to be known as the “Third Grade Gate.” This assessment has most recently been implemented in Mississippi under a state law known as the Literacy-Based Promotion Act. The premise is that students in 3rd grade must earn a certain score on a standardized reading assessment (at the time of this writing, in March, that score has not been set) or be retained in 3rd grade. Hence the name, “Third Grade Gate.”
Now when I look to Merriam-Webster Online for the word “Gate,” I find it to mean “a city or castle entrance often with defensive structures (as towers).” This is hardly the picture I suppose the pioneers of educational assessment had in mind.
One issue with the 3rd grade assessment is the disingenuous ways in which these results can be reported to parents and the public. The idea of using a single assessment as the sole determinant of whether a student is reading on grade level is problematic. From what I understand, the scale score will be reported, and the passing score, which has not yet been set, will be somewhere in the neighborhood of 530. In addition, the “screener” reports a grade-equivalent score. In the study of tests (psychometrics), it is widely accepted that a test scale score, and especially a grade-equivalent score, is an estimate of actual performance in the skill being measured. The problem with grade-equivalent scores is that they are created on the basis of estimation, not real test-score data. This being the case, if a 3rd grader gets a grade-equivalent score of 7.5, it is not accurate to say, “The 3rd grader is doing well in 7th grade reading.” It is more accurate to say that a grade-equivalent score of 7.5 is an estimate of how an average 7th grader “might” have performed on the 3rd grader’s reading test (Popham, 2014, p. 336).
The key takeaway here is that the estimates we may be using to decide whether a student is retained in 3rd grade could be in error because of sampling issues and the very nature of estimates. I accept that, at best, all test-related inferences are estimates, but are these estimates worthy of being the sole factor in whether students (or teachers, for that matter) are retained?
Another problem with this paltry policy is the awful and persistently negative impact that “retention” has on students in the first place. How many more studies do we need to show that having students repeat a grade has a negative impact on learning? John Hattie conducted one of the largest syntheses of educational research of all time, entitled Visible Learning (2009, 2012). Hattie found that “retention,” or having low-performing students repeat a grade, was one of the few things we do in education that had a negative impact on student learning in every single study (effect size = -0.16). When a student repeats a single grade, the chances of that student dropping out of school double, and when the student is retained twice, this all but ensures the student will drop out of school (Foster, 1993). This is based on a synthesis of over 207 studies on the subject.
Please understand that I am passionately committed to student learning. Going further, I am passionately in favor of measuring that learning with instructionally sensitive assessments that give teachers, students, and parents important information about learning, or the lack thereof. But we must ask whether tests, and tests alone, can truly take the place of human judgment when it comes to evaluating human beings. In addition, is using a single estimate of ability to decide whether we inflict on kids one of the most harmful practices we can worth the risk of getting it wrong?
What are your thoughts?