Education Assessment and Accountability Review Subcommittee

Minutes of May 20, 2005

The Education Assessment and Accountability Review Subcommittee met on Friday, May 20, 2005, at 10:00 a.m., in Room 131 of the Capitol Annex. Senator Jack Westwood, Co-Chair, called the meeting to order, and the secretary called the roll.

Present were:

Members: Senator Jack Westwood, Co-Chair; Representative Harry Moberly Jr., Co-Chair; Senators Dan Kelly and Ken Winters; Representatives Jon Draud, Mary Lou Marzian, and Frank Rasche.

Guests: Clyde Caudill, Jefferson County Public Schools; Andrea Sinclair and Art Thacker, HumRRO; Eleanor Mills, Murray Independent/SCAAC; Skip Kifer and Scott Trimble, University of Kentucky; Kathy Lousignont, Kentucky School Boards Association; and Cindy Heine, Prichard Committee for Academic Excellence.

LRC Staff: Sandy Deaton, Audrey Carr, Jonathan Lowe, Janet Stevens, and Lisa Moore.

Senator Westwood deferred approval of the minutes because the subcommittee did not have a quorum. He said Senate Joint Resolution 156 from the 2004 Regular Session directed the Office of Education Accountability (OEA) to conduct a study of the Commonwealth Accountability and Testing System (CATS). He introduced Ms. Marcia Seiler, Director, OEA, who explained that staff from AEL would report on the literature review they had conducted on the validity and reliability of CATS. LRC staff would report on an analysis of the writing portfolio audits as they relate to the validity and reliability of the assessment, and on the results of the surveys conducted for this study.

Ms. Seiler introduced Dr. Doris Redfield, Chief Executive Officer, and Dr. Kimberly Hambrick, Director of Assessment, AEL. Dr. Redfield said AEL's purpose was to examine what the research literature has shown over the years about the components included in CATS. The key question was whether the CATS components provide appropriate measures of achievement of core content for students in different grades and classifications.

Dr. Redfield said one of the issues Kentucky is grappling with is the reliability and validity of the system for holding schools accountable, as distinct from the reliability and validity of individual components of the system for making accurate inferences about individual students. She said AEL's focus has been on the reliability and validity of the system. The literature was selected through: 1) a review of existing research literature, including HumRRO documents; and 2) a review of nationally recognized guidelines on assessments and their uses. She said the HumRRO reports are important because the general research literature is not specific to CATS.

Dr. Redfield said several conclusions were drawn from AEL's research. They are: 1) Appropriateness and validity are purpose specific; 2) Format is not the issue; multiple formats can provide increased "opportunities to perform" - a validity issue; 3) The main issue is alignment of the system as a whole; and 4) As stakes increase, the desired level of reliability also increases.

Dr. Redfield said nationally norm-referenced tests (NRTs) are not specific to Kentucky's learning goals and academic expectations. NRTs may have a purpose in the system, but they do not provide adequate information specific to Kentucky's core content. She said standards-based or criterion-referenced tests (e.g., KCCT) are designed to test specific core content, and establishing their content validity requires an independent alignment review.

Dr. Redfield said multiple-choice items are efficient and can be scored with high degrees of reliability, although researchers disagree about the extent to which they can measure complex skills or higher levels of thinking. She said open-response items can measure achievement of rigorous content, can provide a more complete measure of student learning, and may influence teacher practices, but it can be challenging to establish high levels of technical quality for them. Dr. Redfield said on-demand writing tasks have advantages and disadvantages similar to those of open-response items, but multiple writing samples (the recommendation is five) are needed for accurate conclusions.

Dr. Redfield said writing portfolios need to include multiple samples and require consistency in administration and scoring to be reliable. Researchers disagree about their usefulness in large-scale assessment. She said the on-demand writing task and the writing portfolio may measure different writing constructs, so it cannot be concluded that one can be substituted for the other without further, specific study.

Dr. Redfield said that using combinations of the multiple-choice and open-response items on the reading, math, science, social studies, arts and humanities, and practical living assessments strengthens the content coverage and alignment of the assessments with the Kentucky Core Content. She said the combination of on-demand writing and portfolio writing provides the number of tasks recommended by researchers for reliable assessment inferences.

Dr. Redfield said the Kentucky Writing Portfolio is intended to closely link assessment to classroom practice. The Kentucky Department of Education (KDE) provides extensive training in writing instruction and in the administration and scoring of the portfolio, and conducts audits to monitor the consistency of portfolio scoring. She said whether the link to classroom instruction is achieved is an empirical question, and whether audits of portfolio scoring adequately ensure scoring accuracy for high-stakes purposes warrants continued study.

Dr. Redfield discussed CATS at various grade levels. She said the combination of multiple-choice and open-response items at the elementary, middle, and high school levels yields internal consistency reliability in reading, mathematics, social studies, and science comparable to that of other states with similar assessment systems. Reliabilities and classification accuracy for arts and humanities and for practical living at all three levels are lower than for other content domains. She said the smaller number of items taken by each student on these assessments may account for the difference.

Dr. Redfield said the on-demand writing tasks meet the inter-rater reliability levels recommended by some researchers for reasonable agreement at the elementary, middle, and high school levels, and for exact agreement at the elementary and middle school levels. She said the writing characteristics scored for the writing portfolio are the same as those for the on-demand writing task. The portfolio audit results for exact agreement meet levels recommended by some researchers, except at the high school level, where the grade 12 exact agreement rate between scorers and auditors was lower than at other grade levels. Dr. Redfield also explained that the required number of portfolio writing samples differs from grade to grade.

Dr. Redfield discussed developmental characteristics and testing accommodations. She said read-aloud accommodations produce mixed results for students with disabilities and provide no benefit to general education students. Multi-day test accommodations have been shown to benefit students with disabilities, but not general education students. Multiple accommodations on a single test do not conclusively benefit students with disabilities. Computer accommodations seem to benefit students who are highly experienced with computers, but overall findings are inconclusive. She said one qualitative study of the CATS online assessment indicates that students were able to demonstrate their content knowledge in a way similar to their daily classroom activities.

Dr. Redfield said students with limited English proficiency benefit most from: 1) modified English accommodations, in which non-content language is simplified; 2) read-aloud accommodations; 3) a glossary plus extended time; and 4) native-language tests, which are the most effective but also the most expensive to administer. She said Kentucky currently uses a translator to provide oral, word-for-word translations.

Dr. Redfield said researchers disagree on whether writing portfolios can be successfully integrated into large-scale assessments. There are several conditions required for valid inclusion of writing portfolios: 1) close match to classroom writing requirements; 2) extensive training in the administration and scoring of the portfolio; 3) external scoring and/or effective audits of scoring; and 4) audits/monitoring of portfolio administration. She said that Kentucky's writing portfolio meets levels of inter-rater agreement and scoring consistency suggested by some researchers.

Dr. Redfield said the primary source of information about student achievement of core content must be items that, together, measure the core content. She said researchers disagree on which combination of item types best accomplishes this goal, and many suggest that full coverage is likely to require more complex performance than multiple-choice items can elicit.

Dr. Redfield said Kentucky reports internal consistency reliabilities and classification accuracy for its combined multiple-choice and open-response assessments of reading, mathematics, science, and social studies that are comparable to those of other states (e.g., Maine and Massachusetts). She said up to 20 items may be required to draw valid, reliable, and generalizable conclusions about individual student performance. This implies that accurate conclusions about individual student performance cannot be drawn below the content domain level.

Representative Draud asked whether Kentucky currently has a system that can track students over time. Dr. Redfield deferred to the KDE on how the data are managed and whether students' test scores are tracked from year to year. Mr. Gene Wilhoit, Commissioner, KDE, said Kentucky will begin doing so next year.

Representative Draud said it was his understanding that the only way to track a student over time was to use an NRT. Dr. Redfield said the ability to track students does not depend on whether the test is norm-referenced or criterion-referenced, but the database has to be set up in certain ways. She believes the NRT Kentucky is using (the CTBS) has some vertical scaling built in. This, however, does not answer the question of how students are doing relative to Kentucky's standards; measuring the Kentucky Core Content requires analyzing the criterion-referenced test. She questioned why one would want to track student progress from grade to grade, since the standards, or the content a student is expected to know, differ at each grade level.

In response to a question from Representative Draud, Dr. Redfield said student progress could be tracked from year to year if the tests were vertically scaled and certain statistical procedures were used. Representative Draud asked whether Kentucky could measure student progress from year to year with its current criterion-referenced test. Dr. Redfield said yes, but scaling studies would have to be completed, as well as vertical equating.

Representative Draud commented that researchers seem to disagree on many of the issues Dr. Redfield presented. Dr. Redfield agreed. She said the literature on what is expected of current accountability systems is very new, and the many years of research on reliability and validity were not necessarily conducted in the context of today's systems, which are expected to make decisions using a variety of measures that carry significant stakes for students.

Representative Moberly discussed the portion of the presentation on integrating writing portfolios and the conditions required for their inclusion. Dr. Redfield said that for an assessment to be aligned with classroom instruction, it is important for students to have the opportunity to learn material in the same format in which they will be tested on it. For example, if students use a computer for writing in the classroom but are tested on writing in longhand, some of them may have difficulty. She said instruction should be aligned with how the intended learning is measured, and she raised the question of how representative the portfolios are of actual instruction and learning in the classroom. She said this issue causes much disagreement among researchers. Some say the samples in the portfolio should be the very best writing the student can produce, because Kentucky needs to know the best writing the student can do. Others say random samples of writing are better for getting a true sense of what the student is doing in the classroom day to day.

Representative Moberly said the writing portfolios were never intended to consume such a large amount of time. He said the big issue in Kentucky concerning the writing portfolio is whether it is worth the vast amount of time spent on it, and that Kentucky needs to determine the purpose of the writing portfolio. Dr. Redfield agreed and said the other issue is whether the writing portfolio is intended solely as an assessment or as part of the instructional process. Representative Moberly said it was intended to be both.

Senator Westwood said another issue Kentucky struggles with in portfolio assessment is how much of the work is completed by the student versus parents, teachers, and peers, an issue on which Kentucky has not been able to get a good handle. He asked about the difference between the agreement and correlation rates in the presentation handout. Dr. Redfield explained that the agreement rate is the percentage of time that the scorer and the auditor rating a writing sample are in exact agreement. She said the correlation rate is obtained by rank ordering the scorers' ratings and the auditors' ratings and then computing a correlation between them. Senator Westwood commented that the correlation seemed low. Dr. Redfield said it depends on the purpose and on what the score is going to be used for. Senator Westwood asked whether anything useful can be gathered from a correlation of .50. Dr. Redfield said it is a fairly gross measure, and she would be more concerned with the percent of agreement. The correlation takes into account all of the ratings, not just those in agreement. This could be useful for some purposes, but not for making a decision about an individual student.

Senator Westwood asked whether AEL would be comfortable using the current portfolios and CATS to determine a child's advancement from grade to grade or whether a student should graduate. Dr. Redfield said the question is not just about the portfolio but about the system overall. She said that had Senator Westwood asked the same question about the core content test, her answer would be no, because it yields a score that is too gross for making a decision about an individual student. She said she would not use any of Kentucky's measures in isolation to make individual student decisions, but would use them together to triangulate information.

Representative Rasche said that when Kentucky uses the NRT in fifth and sixth grades, percentiles are used to measure students against their peers. He said this suggests a change in content expectations, although it is based only on statistics describing how a particular peer group is performing. He asked whether a child at the 70th percentile in fifth grade and at the 70th percentile in sixth grade has gone up a grade. Dr. Redfield said no. Representative Rasche asked what did happen to that child. Dr. Redfield said there would be no change in performance. Representative Rasche said this is interesting because the notion of grade level is so often used. Dr. Redfield said grade-level scores are controversial because they are not particularly reliable and are not an equal-interval scale, as some other measures are.

Representative Rasche said he does not like the NRT because somebody has to be first and somebody has to be last, and a student's ranking depends on other students' progress. Dr. Redfield said some states, rather than focusing on vertical scaling, are more interested in vertical alignment, in which the scope and sequence of content across grades is appropriate. She said Kentucky could benefit, especially instructionally, from information showing a student's relative strengths and weaknesses at the point of testing, and from examining whether these change over time, which may or may not be represented by a number.

Senator Winters discussed agreement and correlation rates. He said the presentation handout shows Kentucky's correlation rate increasing one year and then decreasing the following three years, and he asked what caused the decline. Dr. Redfield said there could be a number of reasons. It is not unusual to see an increase over time, because rates tend to rise as people gain experience with something newly introduced. It would seem logical for Kentucky to see increases from 1999 to 2002, and then, when CATS was introduced in 2002, to see a drop in the correlation rate. She said this is a normal course of events.

Representative Rasche asked whether turnover among scorers could cause fluctuations in the correlation rate. Dr. Redfield said this could be the case, but it is typical in these systems to run drift checks, or intra-rater reliability checks, over time to make sure that individual raters and paired ratings do not drift.

Ms. Seiler introduced Dr. Barry Boardman, Staff Economist, Legislative Research Commission (LRC), who discussed the writing portfolio audit review. Dr. Boardman said the KDE provided six years of portfolio audit data, from 1999 to 2004, covering more than 39,000 randomly audited portfolios; the data include both the school's rating and the auditor's rating.

Dr. Boardman explained inter-rater reliability, which he described as the difference, or variability, between the auditor's and the school's ratings. Common ways to measure inter-rater reliability are: 1) rater agreement; 2) correlation; 3) comparison of average ratings; and 4) generalizability study.

Dr. Boardman said his office could not find a specific threshold value that must be crossed to establish reliability. This is due in part to the fact that there are four score points, but only three of them are actually used: less than one percent of portfolios are ever rated distinguished. Another problem is that more than 53 percent of portfolios are rated apprentice. Given this distribution, it is difficult to say that any particular reliability measure is appropriate for the writing portfolio as it is used in CATS.

Dr. Boardman said rater agreement has been steady since 2001. Notably, fourth grade rater agreement has typically been higher than that of the other two grades.

Dr. Boardman discussed correlation rates. He said correlation is not entirely appropriate for the way the writing portfolio is used in Kentucky, because the portfolio is scored against a criterion, or absolute standard: for example, whether a student has reached a certain level of aptitude or reached proficiency. He said correlation really measures how consistently the scorers and auditors rank the portfolios.

Dr. Boardman said the average-score comparison combined all randomly audited school portfolios and compared the schools' average rating with the auditors' average rating. He said that every year, schools rated the portfolios higher on average than the auditors did. He noted the average has been increasing steadily since 2001. He also noted that fourth grade ratings were much closer to the auditors' ratings, and in two out of six years they were statistically the same. He said the same could not be said for the seventh and twelfth grades.

Dr. Boardman concluded his presentation with three major conclusions. They were: 1) difficulty using suggested reliability standards with CATS writing portfolio; 2) fourth grade inter-rater reliability better than seventh or twelfth grades; and 3) no measurable improvement in rater reliability.

Senator Westwood asked about differences between the auditors' ratings and the schools' ratings, and whether there is any research on which rating is more accurate based on experience or training. Dr. Boardman said auditors and school raters receive the same training. He said the literature offers anecdotal, not empirical, evidence that teachers have more knowledge about their students and that this is reflected in their scores. He said there are a host of possible reasons the scores may differ, but the data available to him do not allow him to answer those types of questions.

Senator Westwood asked whether the differences between auditor and school ratings were significant or to be expected. Dr. Boardman said the differences are statistically significant. He said the portfolio represents 11.4 percent of the accountability measure over two years, so a ten-point swing would produce only about a half-point difference in any given year. It is hard to say whether this is significant in practice.

Ms. Seiler introduced Dr. Kimberly Hambrick, Director of Assessment, AEL, and Dr. Greg Hager, Committee Staff Administrator, Program Review and Investigations Committee, LRC, who presented background and format of the focus groups, and the survey results to the subcommittee.

Dr. Hambrick said AEL was hired to help with focus groups for the study, having responded to LRC's Request for Proposals (RFP) in July 2004. AEL agreed to conduct at least 20 focus groups across the six role groups. She said the first series of focus groups was conducted in the fall of 2004. The focus groups were used to identify emergent themes, or salient issues, to inform the development of statewide surveys distributed to a larger group that included teachers, principals, superintendents, school board members, parents, and students in grades 10-12.

Dr. Hambrick said AEL worked closely with LRC staff on survey questions. They divided the state into six regions to set up the focus groups. There were 33 focus groups across the six regions and six role groups, each with between six and ten participants.

Dr. Hambrick said LRC staff proposed additional focus groups with teachers and principals. In November 2004, the plan was modified to compensate for low response rates from superintendents, school board members, and parents. She said 27 focus groups were conducted in October and November 2004. The final numbers in attendance were 125 teachers, 13 principals, 16 superintendents, 12 school board members, 10 parents, and 17 students.

Dr. Hambrick said LRC staff drafted surveys based on findings from the focus groups. The response rates were as follows: 1) teachers: 338 respondents of 1143 surveys sent = 30 percent; 2) school board members: 149 respondents of 536 surveys sent = 28 percent; 3) parents: 119 respondents of 612 surveys sent = 19 percent; 4) students: 160 respondents of 612 surveys sent = 26 percent; 5) principals: 234 respondents of 602 surveys sent = 39 percent; and 6) superintendents: 109 respondents of 176 surveys sent = 62 percent.

Senator Westwood asked who conducted the other statewide survey. Dr. Hambrick said the KDE had a survey sent to the teachers about the writing portfolio. Ms. Seiler said OEA's survey covered more topics. Senator Westwood said the two surveys should be compared in order to review the results.

Dr. Hager gave a brief summary of the survey results. He said the number of respondents is low, particularly for parents, guardians, and students. He said margins of error range from plus or minus 5.3 percentage points (teachers) to plus or minus 9.5 percentage points (parents and guardians).

Dr. Hager said more than 90 percent of each group felt multiple-choice questions were appropriate for all three grade levels. More than 80 percent of each group responded that the open-response and on-demand formats were appropriate for grades nine to twelve, and more than 70 percent reported that both formats were appropriate for grades seven and eight.

Dr. Hager said there was no consensus, however, about open-response questions and on-demand writing in grades four and five. More than three-fourths of principals and superintendents reported that open-response questions were appropriate at this level, but only 64 percent of teachers agreed. Nearly half of the teachers and principals responding did not agree that on-demand writing was appropriate for fourth and fifth graders. Three-fourths of the superintendents said on-demand writing was appropriate for grades four and five, but this was that group's lowest level of support for any of the three question formats.

Dr. Hager said one of the main objectives of the study was to determine the validity of assessments for students overall, for students at different levels, and for different subpopulations of students. He explained the survey responses, which are available in detail in the report in the LRC library. He noted that more superintendents than principals agreed that the Kentucky Core Content Test (KCCT) subject tests were valid for special education students, and fewer teachers believed they were valid for special education students. In none of the three educator groups did a majority agree that any subject test was valid for special education students.

Dr. Hager said there were corresponding survey questions regarding students with limited English proficiency. Overall, educators were even less positive about CATS testing for these students than for special education students. Again, there was no test that a majority agreed was valid.

Dr. Hager said most students, parents, and guardians felt that the CATS test is a fair measure of the child's knowledge of school subjects. He said all groups rated on-demand writing above the writing portfolio as a good measure of students' writing. He also discussed the integration of writing portfolios into a student's learning experience. The strongest response on the survey was that a majority of students thought the writing portfolio was worth the time. Copies of the report are available in the LRC library.

Dr. Hager discussed the effects of CATS testing on curriculum, instruction, and learning. He said average group responses indicated that the majority of respondents felt that: 1) Getting ready for or taking the CATS tests takes too much time away from class time; 2) CATS testing is too stressful and reduces enjoyment of teaching and learning; 3) Teaching what is to be covered on the CATS test is too limited; 4) Teachers and students are forced to cover material too quickly in order to prepare for CATS tests; 5) CATS testing provides needed focus and organization; and 6) CATS testing helps align the curriculum. He also noted that 40 percent of teachers specifically said that they taught to the CATS test.

Dr. Hager said the percentages of respondents agreeing that CATS tests provided useful information about how well students or schools were doing were lowest among teachers and highest among superintendents. He said there was near consensus among focus groups of teachers, principals, superintendents, and school board members that student test reports are not useful to, or well understood by, parents. The general issues with the student reports are that: 1) they are too general; 2) parents receive them too late; 3) they are confusing; and 4) many parents understand A, B, C grades but not the terms novice, apprentice, proficient, and distinguished. He said positive comments usually noted that the reports must be discussed with school personnel to be effective.

Senator Kelly said parents seem to have found the student reports more useful than school professionals did. Dr. Hager said just under one-half of the parents and guardians indicated that the CATS test reports helped them understand how their children were doing in school. Only 14 percent of the parents and guardians who responded indicated that someone from their child's school usually discussed the CATS test reports with them. Senator Kelly said the parents who responded to the survey were probably the more motivated parents. Dr. Hager agreed that they were likely the more motivated parents.

Senator Kelly commented that in almost every area of the survey, superintendents tend to have a rosier view than principals, who see things more positively than teachers. Dr. Hager said the Program Review and Investigations Committee conducted a similar survey about two years ago and received the same pattern of responses: teachers were less positive than principals, who in turn were less positive than superintendents. Senator Kelly said he was surprised at both the consistency and the divergence of the groups' responses.

Dr. Hager said the survey data suggest that teachers appreciate that CATS testing helps focus the curriculum and organize material, but they see a negative aspect in having to cover the material for the CATS test, thereby neglecting other content areas that also need to be taught. Senator Kelly agreed, noting that teachers responded that the organization and focus on the CATS test forced them to cover material too quickly, limited what they could cover, created extra stress, reduced the enjoyment of learning, and took too much time away from other things they feel are important for students to learn.

Dr. Hager said the survey information is anecdotal, but it is fairly common for teachers to say they lost teachable moments when students were really engaged in a subject, and then for time reasons had to move on because they were straying from the core content. He said teachers experience the negative aspects more than principals and superintendents do.

Representative Draud said he is concerned about the number of respondents and asked whether policymakers should take the data seriously given how many people responded. Dr. Hager said surveys are based on samples of larger populations, and even with a high response rate there is no absolute certainty that the results represent what people think. He is more confident in the results for teachers, principals, and superintendents because they are consistent with responses to similar questions on a survey sent out two years ago. He said he is less confident in the responses of the parent, student, and school board member groups, and that it should be kept in mind that the students and parents responding are probably the more motivated ones.

Representative Draud discussed the "teaching to the test" issue. He said educators are under pressure because the Kentucky General Assembly adopted a high-stakes testing program, and teachers have figured out that they must teach to the core content, or the test, to succeed in such a program. He asked Dr. Hager whether he wished to comment. Dr. Hager said that if the decision is made for teachers to teach a specific curriculum, the question is whether they teach it effectively, not necessarily whether they like it.

Representative Rasche said the survey sample was roughly one percent of Kentucky's teachers. He wondered whether those selected for the survey felt strongly one way or the other about CATS, and whether this could have affected the responses. Dr. Hager said this is possible, but the previous CATS survey had twice as many respondents in each group, and its responses were very similar to those of this survey.

Representative Rasche asked whether there was any correlation between schools and the respondents, or whether certain teaching practices led to certain responses on the survey. He asked whether the origin of the answers was examined, or whether the sample was too small for that kind of analysis. Dr. Hager said all responses were confidential, and staff did not know who the respondents were.

Representative Moberly said the information in the survey about the writing portfolio was not very helpful. He said he was hoping to find out exactly how much time teachers were spending working on the portfolios. He said some schools may be spending more time on this than was intended, but the survey data do not show how much time is being spent. He is searching for meaningful information about why some teachers are spending inordinate amounts of time on the portfolios. He wondered whether the cause is a lack of professional development or a lack of guidance from KDE. He said the survey responses to CATS were more positive than the responses to the Kentucky Essential Skills Test, which was not a high stakes accountability test.

Senator Westwood asked for a motion to approve the minutes since a quorum was present. Representative Rasche moved for approval of the minutes, seconded by Senator Winters. The motion was approved by voice vote. The subcommittee took a lunch recess at 12:20 p.m. and reconvened at 1:30 p.m.

Senator Westwood introduced Commissioner Wilhoit and Ms. Hilma Prather, a member of the Kentucky Board of Education (KBE), and Chair of the Assessment and Accountability Committee. He said Ms. Prather had dealt with these issues for a number of years.

Commissioner Wilhoit made some general comments on the report. He said Kentucky is in alignment with industry standards across the country for inter-rater agreement, school characteristics, and classification criteria. He said that, given all the general measures of acceptability, the report indicates that Kentucky is within the criteria.

Commissioner Wilhoit said there are several areas of the report that need to be reviewed, particularly in terms of how to improve efficiency in the assessment of reading scores. He said the National Technical Advisory Panel on Assessment and Accountability (NTAPAA) has advised KDE in the past on how to increase reliability, and suggested adding more items on which to judge student performance. He said KDE is moving in that direction to increase reliability for on-demand and portfolio writing.

Commissioner Wilhoit said he is concerned about the number of responses to the survey, and KDE will be looking for other ways to find additional answers to these questions. KDE conducted a survey last year that generated several thousand responses, and he will provide that information to the members.

Commissioner Wilhoit said Kentucky has some practices that need modification. He said the writing portfolio needs to be examined, while remembering, however, that powerful writing needs to be in place for the K-12 education system.

Commissioner Wilhoit said the two greatest areas of contention in the report relate to special needs students and students with limited English proficiency. These two groups present difficult challenges for teachers, and they have the biggest deficits when they leave Kentucky's system. The question is how to apply positive pressure to the system to make sure that Kentucky does more for these students than it is doing now, while at the same time being responsive to and thoughtful of the teachers. He is optimistic, based on messages coming from Secretary Spellings, that the federal government will provide states some relief in these areas by considering growth as a possible measure instead of an actual standard for performance. He said we need to support teachers in trying to help these students, and not create additional tensions and non-productive energy.

Commissioner Wilhoit said Kentucky has a continuing problem aligning instruction and assessment. He said Kentucky has direct control in many ways over what kind of goals we set for our assessment system, and how to leverage the system around assessment, but has less direct control over the instructional process that takes place in schools. He said we need to provide tools and examples to teachers in the field, and more partnerships will be needed to achieve this.

Ms. Prather said the state board has tried to be responsive to concerns from legislators, advisory groups, and students. She discussed the 2004 task force on writing, which included parents, teachers, administrators, postsecondary staff, and workforce representatives. A majority of the writing task force members wanted to keep writing portfolios in the accountability formula, but at a reduced weight. She said this will be a recommendation from the board in the new RFP.

Ms. Prather said professional development is the key to effecting classroom change. She said the recent drop in portfolio inter-rater agreement correlations is probably due to Kentucky's closing of regional service centers and the reduction of KDE staff by over 100 employees in the last few years, which has diminished the ability of KDE staff to visit schools and disseminate information. She said Kentucky needs to be creative in how it delivers the new assessment package to schools.

Ms. Prather asked the legislators to be patient as the state board and KDE work through these changes. She said the worst thing that Kentucky could do would be to dismantle the assessment system on a piecemeal basis. She said the system is meant to work as a whole to effect change.

Senator Kelly said he appreciated the hard work the state board has put into this issue. He believes the most important issue facing education, both nationally and in the state, is the question of the appropriate role for assessment in teaching practices and in accountability. He said assessment serves a different use in each of those cases.

Senator Kelly said he and Representative Moberly feel very strongly about the need for increased use of assessment for diagnostic purposes, for intervention, and to determine whether or not Kentucky is having successful results. He said the current accountability testing system is not designed to do that; it is designed to measure how well schools are teaching the core content.

Senator Kelly said the federal No Child Left Behind (NCLB) testing requirements are different as well. He said two major problems that policymakers in Kentucky are facing right now are: 1) the time it takes to implement policy effectively, given that Kentucky's RFP, which will establish policy for the next four years, is due; and 2) political pressure to respond to problems within the testing system.

Ms. Prather discussed student accountability and the intended purpose of CATS. She said the board has operated in the past from the perception that the system's primary purpose is to measure school accountability, not individual student accountability. She said the board is currently trying very hard to find a way to expand that purpose and the role of the system itself to incorporate some student accountability measures. She noted that other states with assessment systems that measure individual student accountability offer students repeated opportunities to take an individual assessment, such as the ACT. She said Kentucky hopes to avoid an assessment system in which students have multiple opportunities to take an assessment, for many reasons, not the least of which is test security.

Senator Kelly said he is concerned about the continued layering of assessments. There needs to be some relief in Kentucky in the amount of time being invested in assessment. He said it is important that, as Kentucky implements diagnostic and longitudinal assessments, they are not incorporated into accountability, so that they do not become high stakes. He said Kentucky needs to find the right balance in using assessment for both teaching practices and accountability. He thinks the rubber band is a little tight right now in Kentucky and needs to be relaxed.

Commissioner Wilhoit said he agreed with Senator Kelly's assessment, and that as Kentucky has attempted to make changes and incorporate federal requirements into the system, it has added to the structure. He said Kentucky has stepped back from the assumption that things have to be done as usual, and some major shifts in test design have been made to the system that is in place. He also said that, under the proposed KBE assessment directions, new tools will be given to teachers and students in an effort to support them.

Senator Westwood asked how reading is tested for disabled students or students with limited English proficiency. Commissioner Wilhoit said it depends upon the nature of the disability. For example, if a student has a disability such that he or she cannot write, that student could give a narrative response. He said these are called accommodations. He said it is important for the test accommodation to match the instructional accommodation agreed upon by the student, parent, and school. Commissioner Wilhoit noted that whenever an accommodation is used, the difficult question arises of whether it gives that student an unfair advantage, or a disadvantage, relative to other students. He said Kentucky is limited by resources in what accommodations can be offered. It is very expensive to deliver assessments to students with limited English proficiency in each student's own language, so Kentucky currently uses an interpreter for those students. He said this does cause problems when comparisons are made with tests such as the National Assessment of Educational Progress (NAEP), which does not allow accommodations for its disabled students. He said it has to be considered what is best instructionally for each child, and how that child's knowledge can best be captured. He said it is more of an issue in English and language arts than in any other content area.

Senator Westwood said he is concerned about testing students on reading who cannot read. Commissioner Wilhoit said NAEP measures decoding capacities in children, whereas Kentucky accounts for comprehension if a student can understand what is being read to him or her. He said a student could lack the ability to decode but still have a high level of comprehension of what is being read. Senator Westwood asked if that was a different test altogether. Commissioner Wilhoit said yes, it is a different test. Senator Westwood asked whether there was a way to test reading itself. Commissioner Wilhoit said yes, reading is tested, but if an accommodation is needed, the decoding process is removed. Senator Westwood said he would like to know how many students receive the test in this format, and Commissioner Wilhoit said it is only a few, but he will get him the exact numbers.

Senator Westwood asked how Kentucky's test aligns or does not align with the core content. He said that because Kentucky's core content is unique and differs from that of other states, the state is prevented from using a norm-referenced test (NRT) instead of CATS. He asked what would be wrong with looking at the national standards and revising the core content to align with them so Kentucky could use an NRT.

Commissioner Wilhoit said this would require two steps. Regarding aligning core content to national standards, he said Kentucky is in the final stages of examining how coherent the content is, and of revising the core content so that it builds on the cognitive challenges Kentucky expects of students. Kentucky is also examining, as another part of the process, how these expectations align with expectations at the national level.

Commissioner Wilhoit said the states are all over the place in terms of expectations. Kentucky's expectations are more rigorous than most, both in the content expected and in the cognitive challenges students are expected to undertake, which calls for a different type of assessment than would suffice in other states. He said this is important, first, because the state's agenda is to overcome its history and move forward aggressively. Second, it is more in line with national thinking. He said Kentucky has begun to align with the NAEP assessment and with The American Diploma Project, which is the work of five benchmark states (Indiana, Kentucky, Massachusetts, Ohio, and Nevada). These five states will be subject to a national critique. Commissioner Wilhoit believes the final product will be more in line with what higher education expects for success, and more in line with the national standards. That still leaves the difficult task of finding an assessment that meets those standards. He said this leads Kentucky to a criterion-referenced approach, using normative comparisons in the development of the criterion-referenced test.

Representative Draud said that the Tenth Amendment to the United States Constitution reserves education to the states. He does not want a national curriculum or a national core content. He can understand states working together to reach certain goals, but not following the same curriculum.

Representative Moberly said the KBE is approaching this in the right way. He said Kentucky has the right concepts, but needs to deal with the problems. It would not be wise to return to the attitude of the period before 1998, when there was a bunker mentality and the whole concept of assessment and accountability was almost lost because of a refusal to deal with the problems. He continues to believe that a significant number of Kentucky's problems are failures of implementation, not of concept, and are related to inadequate professional development.

Representative Rasche asked how much of what is occurring in classroom practice is filtering into the schools of education in the postsecondary system. Ms. Prather said two members of the KBE work in higher education. She also said they have regular conversations with the P-16 council. She realizes that postsecondary needs to be a partner.

Ms. Prather said Kentucky needs to turn out teachers who understand the link between assessment and instruction, and can implement these things appropriately in the classroom. In fact, these teachers need to serve as models for teachers who are already in the classroom. Representative Rasche said traditionally this has not been a very tight feedback loop.

Commissioner Wilhoit said the Education Professional Standards Board (EPSB) conducts an annual survey of first-, second-, and third-year teachers and interns to see what kinds of alignments or misalignments are occurring. Generally, the areas of greatest need are identifying learning deficits and prescribing appropriate interventions. He said KDE is working on getting these teachers additional tools to help them cope with these problems.

Senator Winters said postsecondary educators want to be as responsive as they can, but exceptional dialogue is needed. He said KBE's role is important, and commended the board on the types of conversations it is currently engaging in.

Senator Winters asked Commissioner Wilhoit for the results of the large survey KDE conducted with teachers. Commissioner Wilhoit said he would provide the members with copies of that survey immediately. Senator Winters said he felt this survey would be important to see because the number of teachers surveyed was much larger than in the current survey conducted by AEL.

Senator Westwood introduced Dr. James Catterall, Chair, and Dr. John Poggio, Vice Chair, NTAPAA, to give a summary of the review and recommendations by NTAPAA regarding assessment and accountability issues. He explained that the subcommittee had asked NTAPAA to provide a written summary of the assessment and accountability issues brought to the panel by the KBE and KDE as CATS revision discussions have progressed, as well as the advice offered by NTAPAA regarding each issue. He also said NTAPAA was requested to provide the subcommittee a written review of the overall assessment and accountability modifications being considered by KBE as it prepares for testing contract negotiations. The review should include any suggestions or recommendations the panel would like to make to the subcommittee.

Dr. Catterall said the big issues facing the design of CATS for the new contract include: 1) test design; 2) revising the core content; 3) changing the writing assessment; 4) shifts to online testing; 5) explorations of student-level assessments and student accountability; and 6) plans and steps for complying with NCLB, especially longitudinal assessments, augmented NRTs to fill in required test grades, and diagnostic assessments.

Dr. Catterall discussed several issues for system design. He said NTAPAA members emphasize the importance of clarity of purpose in their consideration of all issues related to the design and administration of future tests. Since multiple purposes are inevitable in large-scale assessment systems, an assessment design will necessarily reflect questions of balance and priority among goals, in the context of limited resources.

Dr. Catterall said NTAPAA members were queried by KBE on the status of CATS. He said members attended part of the December 2004 meeting of KBE. Most of the discussion focused on various provisions of the writing assessment. In addition, panel members commented on areas of Kentucky's system that needed improvement. A comprehensive list of all comments is on file in the LRC library.

Dr. Catterall said another recommendation is to continue key design elements. NTAPAA has encouraged spreading testing across grade levels, and continuing to assess content areas beyond the reading and mathematics required by NCLB.

Dr. Poggio said there should be more focused content specifications for tests. NTAPAA members emphasized the advantages of a test design calling for highly focused content specifications, somewhat narrower or more selective than the blueprints for CATS tests. A test with focused content specifications is more easily interpreted by teachers, students, and parents, and is thus potentially more useful in effecting improvements.

Dr. Poggio said the system design should maintain a strong open-response format. NTAPAA recommends that Kentucky's test design should continue to call for open-response items of the length, complexity, and general format currently used -- if the state continues to value those types of performances.

Regarding the transition to a common English/language arts and math test design, Dr. Poggio said NTAPAA members suggest moving expeditiously to a common test design for reading and math across all grade levels. The design should not use a mix of augmented NRTs and traditional KCCT assessments, which are very different designs. Such a mix will be used in 2006, the first year of required testing in all of grades 3-8, and probably in 2007 as well.

Dr. Poggio discussed the role of commercial NRTs. NTAPAA supports the use of augmented NRTs as part of the test design for 2006 and 2007, but only to the extent that an independent panel review finds sufficient alignment of the skills tested by the NRT with the Kentucky Core Content. It appears that achieving all presumed goals of CATS will include taking advantage of the efficiencies afforded by a commercial NRT.

Senator Westwood said Kentucky would not want a core content that was not fairly in line with an existing NRT unless, for some reason, it simply wanted to be unique among the states. Dr. Poggio agreed. He personally feels an alignment study of all the major NRTs against Kentucky's core content should be conducted right now.

Senator Westwood asked if Kentucky was doing any of the longitudinal assessment in the current system. Dr. Poggio said Kentucky is not doing it at all. Senator Westwood said the statute requires longitudinal studies so he is concerned about finding a way for Kentucky to have valid longitudinal studies so that Kentucky is in compliance with the law.

Dr. Poggio said Kentucky can incorporate longitudinal studies within the current system; however, doing so will incur some costs. It will also take some time, because the system will have to be built to allow for it.

Representative Moberly asked Commissioner Wilhoit about the several pilot projects KDE was working on to examine the longitudinal aspect. Commissioner Wilhoit said the longitudinal pilot is being implemented this year. He said the spring 2005 assessment included it, but he would also like to see improvements made to it. Representative Moberly said he hoped this was being done, and that he wants to see a report on it soon. Commissioner Wilhoit said KDE has decided to change the alignment of the design somewhat.

Dr. Poggio discussed the student-level longitudinal efforts that began in 2000. He said there was an effort to begin this, but it was not promising enough, and then NCLB required every grade from 3 through 8 to begin common testing in reading and math.

Representative Moberly asked whether the longitudinal aspect that Kentucky had been counting on is gone, and if not, what needs to happen. Commissioner Wilhoit said it is not gone.

Dr. Catterall said the opportunities are greater for the longitudinal aspect now that math is being tested at every grade level. He said it would have been very hard to study something longitudinally when some students were only being tested for math skills every three years. He said longitudinal means different things to different people.

Senator Westwood asked whether it would be a problem to base a longitudinal assessment on CATS, given that CATS, as currently constructed, is not designed to be valid or reliable for individual students. Dr. Catterall said it would need to be more valid for individual student scores. Senator Westwood said this presents another challenge.

Dr. Poggio said the hardest part of longitudinal assessment is building the database to track each child, not building the test to produce the data. He said Dr. Catterall's model is well respected.

Senator Winters said Kentucky has a very short time to make some very big decisions. He said longitudinal data is needed particularly for postsecondary institutions.

Senator Winters commented on all the negativity he has heard regarding the writing portfolios, but said he has refused to allow the concept to die. He said the very best expertise is needed before crucial decisions are made.

Dr. Catterall said the surveys seem to indicate some areas where knowing more about what actually occurs in the classroom would be very valuable. He said information on what teachers do with assessment results, and how those results affect their day-to-day instruction, is very hard to obtain. He said a small survey is certainly not the answer to obtaining this information, and a survey may not be the answer at all.

Dr. Poggio said the survey numbers are very palatable, considering this was a first read. He said, however, that the survey's questions about the writing portfolio should have targeted experienced teachers with at least ten years of experience.

Senator Winters said he had just been given a survey in which 152 teachers responded by email. He said its results are very similar to AEL's survey results.

Commissioner Wilhoit briefly presented the KBE assessment directions. The ten major initiatives are: 1) The KBE has directed the KDE to improve the Kentucky Core Content for Assessment; 2) The KBE will expand the purpose of CATS beyond school accountability to include additional student-based measures; 3) The KBE approves moving from 100 percent per year core content coverage to a model that would allow more flexibility (a. 100 percent - 85 percent or b. one or two years); 4) The KBE will consider a change to the number of on-demand writing prompts or to how we assess on-demand writing; 5) The KBE prefers that the KCCT test design include a core of common items to provide additional student-level results, and matrix items for coverage of core content, equating, and pretesting; 6) The KBE wishes to continue emphasizing higher order skills while assigning greater weight to open-response items; 7) If possible, the KBE wishes the state to pursue an embedded NRT for a longitudinal measure in reading and mathematics; 8) The KBE wishes staff to initiate pilot studies to develop and/or identify assessment approaches in arts and humanities and practical living/vocational studies that will address what students do as well as what they know in these areas; 9) The KBE wishes staff to include in the RFP a predictive measure of college success; and 10) The KBE has directed staff to make improvements in the writing portfolio process.

Commissioner Wilhoit said KDE is also aggressively pursuing end-of-course tests in high schools. This will increase accountability at the high school level.

With no further business before the subcommittee, the meeting adjourned at 3:20 p.m.