The Transparency Files: CAT4 Results (Yes, Even During COVID) Part II

Welcome to “Part II” of our analysis of this year’s CAT4 results!  In Tuesday’s post, we provided a lot of background context and shared out the simple results of how we did this year.  Here in our second post, we are now able to begin sharing comparative data, however patchy.  It will take at least one more non-COVID year before we can accurately compare the same grade and the same cohort year after year.  But we can get a taste of it with Grades 5-8.  What you have below are snapshots of the same cohort (the same group of children) from 2019 to 2021 (with bonus data from 2018’s Grade 3):

What are the key takeaways from this comparison (remembering that any score ending in “.9” one grade above represents the max score, like getting an “8.9” for Grade 7)?

Now bear in mind that the metric we normally look at when comparing a cohort over time is whether or not we see at least one full year’s growth (on average) each year – here we are looking for two full years’ growth, since we last took the test in 2019.  This would be the place one might expect to see the full measure of COVID’s impact – these are the two years of COVID between the two tests.  However, for all four cohorts in all categories save two (2019 Grade 3 to 2021 Grade 5 “Computation & Estimation” and 2019 Grade 5 to 2021 Grade 7 “Spelling”), you see at least two full years’ growth (technically 2019 Grade 5 to 2021 Grade 7 “Computation & Estimation” was just shy), and in many cases you see more than two full years’ growth.
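Since these are grade-equivalent scores, the growth check described above reduces to simple subtraction. Here is a minimal sketch of that arithmetic (the function name and the example scores are hypothetical illustrations, not taken from our actual results):

```python
def met_expected_growth(score_2019: float, score_2021: float,
                        expected_years: float = 2.0) -> bool:
    """Check whether a cohort's grade-equivalent score grew by at least
    the expected number of years between the two test sittings."""
    return (score_2021 - score_2019) >= expected_years

# e.g. a hypothetical cohort moving from 5.8 to 8.1 over the two years:
print(met_expected_growth(5.8, 8.1))  # True
```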

I’m going to say that again.

During the time of the pandemic, with all the pivots back and forth, all the many challenges of both hyflex and at-home learning, all the prolonged absences by many students (and teachers), with all the social and emotional stress and anxiety, with everything we know about what COVID has been doing to children and to families – spending no time or energy preparing for the exams and with diverse and inclusive classes – in 22 of the 24 domains we tested in Grades 5-8 we see at least the pre-COVID expected two-year gain, and in many cases we see more than two full years’ growth.

As was true with our overall scores, I was expecting to see a significant number of gaps for all the reasons I just described, but surprisingly and encouragingly, that is not what the data yields.

Let’s look at one more set of data points.  We can also get a taste of how the same grade performs from one year to the next.  Again, we only have Grades 5-8 to look at (with a bonus 2018 Grade 6):

Now, remember that these scores represent a completely different group of children, so it is not unusual or surprising to see variances.  Teachers can only grow students from the place they received them, and it is that annual growth that we are concerned with.  But over time you are looking for patterns.  Ideally each domain settles in at least a full grade above grade level, with slight fluctuations from year to year depending on that year’s particular constellation of students.  Even better would be to see slight ticks up each year as a result of new ideas, new pedagogies, new programs, etc.  And that is actually where much of the story currently is.

In the places where we aren’t quite where we want to be, we still have work to do.  If with additional data we come to believe that Spelling or Computation & Estimation are institutional weaknesses, we will want to know whether they are weaknesses in every grade or whether they dip only in certain grades.  Between COVID and gaps in testing, we simply have no way to conclude much more than we have already laid out.  But in another year or so, we will be able to plot the trajectory of both cohorts (the same students) and grades over time to see what additional stories they tell.

To sum up both posts, we have a lot to be proud of in our standardized test scores.  We have two areas (Spelling and Computation & Estimation) to prioritize in two grades (Five & Seven).  With regard to Spelling, it is interesting to note that when we flagged it in 2019 as a more global concern, we began providing professional growth opportunities for language arts teachers in our school on Structured Word Inquiry.  The sample sizes are too small to draw grand conclusions, but it is possible that those interventions help explain why Spelling is no longer a global concern, although we do need to pay attention to where and why it still lags.  With regard to Computation & Estimation, we will – like with Spelling – have an internal conversation which may lead to PD for Math Teachers.

This fits in with the work we began on our November PD Day, which focused on “Data-Driven Decision Making”.  The Math and Language Arts Teachers in Grades 5-8 will be meeting to go through CAT4 results in greater detail, with an eye towards what kinds of interventions are needed now – in this year – to fill any gaps (both for individual students and for grades), and how we might adapt our long-term planning to ensure we are best meeting needs.

The bottom line is that our graduates – year after year – successfully place into the high school programs of their choice.  Each one had a different ceiling – they are all different – but working with them, their families and their teachers, we successfully transitioned them all to the schools (private and public) and programs (IB, Gifted, French Immersion, Arts, etc.) that they qualified for.

And now again this year, despite all the qualifications and caveats, and in the face of the most challenging set of educational circumstances any generation of students and teachers has faced, our CAT4 scores continue to demonstrate excellence.  Excellence within the grades and between them.

Not a bad place to be as we prepare to open the 2022-2023 enrollment season…

The Transparency Files: CAT4 Results (Yes, Even During COVID) Part I

This may seem like a very odd time to be sharing out results from this year’s standardized testing, which in our school is the CAT4.  We are just finishing up our first days in this year’s most recent pivot back to distance learning and we are confident that everyone – students, parents and teachers – has more pressing concerns than a very long and detailed analysis of standardized tests that we managed to squeeze in during the in-person portion of our school year.  (The post is so long that I am splitting it into two parts, and each part is still a bit lengthy.)  But with our launch of Annual Grades 9 & 12 Alumni Surveys and the opening of the admissions season for the 2022-2023 school year, one might argue that there is not a better time to be more transparent about how well we are (or aren’t) succeeding academically against an external set of benchmarks while facing extraordinary circumstances.

There is a very real question about “COVID Gaps” and the obvious impacts on children and schools from the many pivots, hyflex, hybrid, masked and socially-distanced, in-person and at-home learning experiences we have all cycled through together since March of 2020.  (I wrote earlier in the year about some of the non-academic COVID gaps that we are very much experiencing, all of which I imagine growing proportionate to the length of this current pivot.)  And it seems logical that there should be and are academic gaps, at least at the individual student level.  One might ask why we even bothered taking the CAT4 at all this year; we didn’t take it last school year for example, so it will be really hard to make meaningful apples-to-apples comparisons.  So why take them?  And why share the results, whatever they may be?

We did it for a few reasons…

The first and primary reason is that we are curious.  Curiosity may not be a “North Star” at OJCS, but it is a value.  And we are very curious to see how our standardized test scores measure up pre-COVID and post-COVID, both by grade (2019 Grade 5 v. 2021 Grade 5) and by cohort (2019 Grade 5 v. 2021 Grade 7).  We would normally be looking for patterns and outliers anyway, but now we can also look for COVID impacts as well.

Why share the results?  Because that’s what “transparency” as a value and a verb looks like.  We commit to sharing the data and our analysis regardless of outcome because we believe in the value of transparency.  We also do it because we know that for the overwhelming majority of our parents, excellence in secular academics is a non-negotiable, and that in a competitive marketplace with both well-regarded public schools and secular private schools, our parents deserve to see the school’s value proposition validated beyond anecdotes.

Now for the caveats and preemptive statements…

We have not yet shared out individual reports to our parents.  First our teachers have to have a chance to review the data to identify which test results resemble their children well enough to simply pass on, and which results require contextualization in private conversation.  Those contextualizing conversations will take place in the next few weeks, and thereafter we should be able to return all results.

There are a few things worth pointing out:

  • Because of COVID, this is now only our third year taking this assessment at this time of year.  We were in the process of expanding the range from Grades 3-8 in 2019, but we paused in 2020 and restricted this year’s testing to Grades 5-8.  This means that we can only compare at the grade level from 2019’s Grades 5-8 to 2021’s Grades 5-8, and we can only compare at the cohort level from 2019’s Grades 3-6 to 2021’s Grades 5-8.  And remember we have to take into account the missing year…this will make more sense in “Part II” (I hope).  Post-COVID, we will have tracking data across all grades which will allow us to see if…
    • The same grade scores as well or better each year.
    • The same cohort grows at least a year’s worth of growth.
  • The other issue is in the proper understanding of what a “grade equivalent score” really is.

Grade-equivalent scores attempt to show at what grade level and month your child is functioning.  However, they cannot actually show this.  Let me use an example to illustrate.  In reading comprehension, your son in Grade 5 scored a 7.3 grade equivalent on his Grade 5 test.  The 7 represents the grade level and the 3 represents the month: 7.3 would be the seventh grade, third month, which is December.  It is the third month because September counts as zero, October as one, and so on.  It is not true, though, that your son is functioning at the seventh-grade level, since he was never tested on seventh-grade material.  He was only tested on fifth-grade material.  He performed like a seventh grader on fifth-grade material.  That is why grade-equivalent scores should not be used to decide at what grade level a student is functioning.
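The month arithmetic above can be sketched in a few lines of code (a hypothetical illustration of the convention; the helper name is made up and is not part of the CAT4 itself):

```python
# Decode a grade-equivalent score such as 7.3 into its two parts.
# September counts as month zero, October as one, and so on.
SCHOOL_MONTHS = ["September", "October", "November", "December", "January",
                 "February", "March", "April", "May", "June"]

def decode_grade_equivalent(score: float) -> tuple[int, str]:
    """Split a grade-equivalent score into (grade level, month name)."""
    grade = int(score)
    month_index = round((score - grade) * 10)  # the digit after the decimal
    return grade, SCHOOL_MONTHS[month_index]

grade, month = decode_grade_equivalent(7.3)
print(grade, month)  # 7 December
```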

Let me finish this section by being very clear: We do not believe that standardized test scores represent the only, nor surely the best, evidence for academic success.  Our goal continues to be providing each student with a “floor, but no ceiling” representing each student’s maximum success.  Our best outcome is still producing students who become lifelong learners.

But I also don’t want to undersell the objective evidence that shows that the work we are doing here does in fact lead to tangible success.  That’s the headline, but let’s look more closely at the story.  (You may wish to zoom (no pun intended!) in a bit on whatever device you are reading this on…)

A few tips on how to read this:

  • We take this exam in the “.2” of each grade-level year.  That means that “at grade level” [again, please refer above to a more precise definition of “grade equivalent scores”] for any grade we are looking at would be 5.2, 6.2, 7.2, etc.  For example, if you are looking at Grade 6, anything below 6.2 would constitute “below grade level” and anything above 6.2 would constitute “above grade level.”
  • The maximum score for any grade is “.9” of the next year’s grade.  If, for example, you are looking at Grade 8 and see a score of 9.9, on our forms it actually reads “9.9+” – the maximum score that can be recorded.
  • Because of when we take this test – approximately two months into the school year – it is reasonable to assume a significant responsibility for results is attributable to the prior year’s teachers and experiences.  But it is very hard to tease it out exactly, of course.
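Putting the first two tips together (test taken at the “.2” of the year, maximum score at “.9” of the next grade), classifying a score relative to grade level could be sketched like this (a hypothetical helper, not anything from our actual reporting tools):

```python
def classify(score: float, grade: int) -> str:
    """Classify a grade-equivalent score relative to grade level.

    The test is taken at the ".2" point of the year, so "at grade level"
    for Grade N is N.2; the maximum recordable score is (N + 1).9.
    """
    at_level = grade + 0.2
    max_score = grade + 1.9
    if score >= max_score:
        return "maximum score"
    if score > at_level:
        return "above grade level"
    if score < at_level:
        return "below grade level"
    return "at grade level"

print(classify(6.0, 7))  # below grade level
print(classify(8.5, 7))  # above grade level
```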

What are the key takeaways from these snapshots of the entire school?

  • Looking at four different grades through six different dimensions there are only three instances (out of twenty-four) of scoring below grade-level: Grade 5 in Computation & Estimation (4.4), and Grade 7 in Spelling (6.6) and Computation & Estimation (6.0).
  • Interestingly, compared to our 2019 results, those two dimensions – Spelling and Computation & Estimation – are no longer globally lower as a school relative to the other dimensions.  In 2019, for example, “Spelling” was a dimension where we scored lower as a school (even when above grade level) relative to the other dimensions.  In 2021, we don’t see “Spelling” scoring globally below.  (That’s a good thing!)  [We also have some anecdotal evidence that a fair number of students in Grade 7 may not have finished the Computation section, leaving a fair number of questions blank – in the case of this cohort, it might be more valuable to know how well they did on the questions they actually answered (which we will do).]

What stands out the most is how exceedingly well each and every grade has done in just about each and every section.  In almost all cases, each and every grade is performing significantly above grade-level.  This is NOT what I was expecting considering the impacts of COVID over the last two years – I was fully expecting to see at least .5 (a half-year) gap globally across the grades and subjects.  This is a surprising and very encouraging set of data points.

Stay tuned for “Part II” in which we will dive into the comparative data – of both the same grade and the same cohort (the same group of students) over time – and offer some additional summarizing thoughts.

This Is (Not) A Test: OJCS (Doesn’t) Prep For CAT-4

From October 29th-31st, students at the Ottawa Jewish Community School in Grades 3, 6 and 8 will be writing the Fourth Edition of the Canadian Achievement Tests (CAT-4).  The purpose of this test is to inform instruction and programming for the 2018-2019 school year, and to measure our students’ achievement growth over time.

Seems pretty non-controversial, eh?

These days, however, “standardized testing” has become a hot topic.  So with our testing window ready to open next week, this feels like a good time to step back and clarify why we take this test and how we intend to use and share the results.  But first, two things that are new this year:

  • We moved our test window from the spring to the fall to align ourselves with other private schools in our community.  This will be helpful for comparison data.  (This is also why we didn’t take them last year.)
  • We have expanded the number of grades taking the test.  We have not yet decided whether that number will expand again in future years.

What exactly is the value of standardized testing and how do we use the information it yields?

It sounds like such a simple question…

My starting point on this issue, like many others, is that all data is good data.  There cannot possibly be any harm in knowing all that there is to know.  It is merely a question of how to best use that data to achieve the fundamental task at hand – to lovingly move a child to reach his or her maximum potential.  [North Star Alert!  “We have a floor, but not a ceiling.”]  To the degree that the data is useful for accomplishing this goal is the degree to which the data is useful at all.

Standardized tests in schools that do not explicitly teach to the test nor use curriculum specifically created to succeed on the tests – like ours – are very valuable snapshots.  Allow me to be overly didactic and emphasize each word: They are valuable – they are; they really do mean something.  And they are snapshots – they are not the entire picture, not by a long shot, of either the child or the school.  Only when contextualized in this way can we avoid the unnecessary anxiety that often bubbles up when results roll in.

Like any snapshot, the standardized test ought to resemble its object. The teacher and the parent should see the results and say to themselves, “Yup, that’s him.”  It is my experience that this is the case more often than not.  Occasionally, however, the snapshot is less clear.  Every now and again, the teacher and/or the parent – who have been in healthy and frequent communication all the year long – both look at the snapshot and say to themselves, “Who is this kid?”

When that happens and when there is plenty of other rich data – report cards, prior years’ tests, portfolios, assessments, etc. and/or teacher’s notes from the testing which reveal anxiety, sleepiness, etc. – it is okay to decide that someone put their thumb on the camera that day (or that part of the test) and discard the snapshot altogether.

Okay, you might say, but besides either telling us what we already know or deciding that it isn’t telling us anything meaningful, what can we learn?

Good question!

Here is what I expect to learn from standardized testing in our school over time if our benchmarks and standards are in alignment with the test we have chosen to take:

Individual Students:

Do we see any trends worth noting?  If the overall scores go statistically significantly down in each area test after test that would definitely be an indication that something is amiss (especially if it correlates to grades).  If a specific section goes statistically significantly down test after test, that would be an important sign to pay attention to as well.  Is there a dramatic and unexpected change in any section or overall in this year’s test?

The answers to all of the above would require conversation with teachers, references to prior tests and a thorough investigation of the rest of the data to determine if we have, indeed, discovered something worth knowing and acting upon.

This is why we will be scheduling individual meetings with parents in our school to personally discuss and unpack any test result that comes back with statistically significant changes (either positive or negative) from prior years’ testing or from current assessments.

Additionally, the results themselves are not exactly customer friendly.  There are a lot of numbers and statistics to digest, “stanines” and “percentiles” and whatnot.  It is not easy to read and interpret the results without someone who understands them guiding you.  As the educators, we feel it is our responsibility to be those guides.

Individual Classes:

Needless to say, if an entire class’ scores took a dramatic turn from one test to the next it would be worth paying attention to – especially if history keeps repeating.  To be clear, I do not mean the CLASS AVERAGE.  I do not particularly care how the “class” performs on a standardized test qua “class”.  [Yes, I said “qua” – sometimes I cannot help myself.]  What I mean is, should it be the case that each year in a particular class each student’s scores go up or down in a statistically significant way – that would be meaningful to know.  Because the only metric we concern ourselves with is an individual student’s growth over time – not how s/he compares with the “class”.

That’s what it means to cast a wide net (admissions) while having floors, but no ceilings (education).

School:

If we were to discover that as a school we consistently perform excellently or poorly in any number of subjects, it would present an opportunity to examine our benchmarks, our pedagogy, and our choice in curriculum.  If, for example, as a Lower School we do not score well in Spelling historically, it would force us to consider whether or not we have established the right benchmarks for Spelling, whether or not we teach Spelling appropriately, and/or whether or not we are using the right Spelling curriculum.

Or…if we think that utilizing an innovative learning paradigm is best for teaching and learning then we should, in time, be able to provide evidence from testing that in fact it is.  (It is!)

We eagerly anticipate the results and intend to make full use of them to help each student and teacher continue to grow and improve.  We look forward to fruitful conversations.

That’s what it means to be a learning organization.