Evaluation can be conducted as formative, summative, or both. Formative evaluation is a way to detect problems and weaknesses in components so that they can be revised. In projects with sufficient time and funding, formative evaluation is conducted before the final program is implemented; in practice, many projects begin with a "best effort" and conduct formative evaluation during implementation, correcting weaknesses and errors as the project unfolds. Summative evaluation is the final evaluation that asks whether the project or program met its goals. In both types the media or instructional program can be evaluated, but summative evaluation typically concentrates on learner outcomes rather than only on the program of instruction. Traditional tests and other methods commonly employed in classrooms are used in both cases, though certain kinds of measures are particularly suited to formative evaluation. Records, observations, interviews, and other data permit qualitative analysis of information for both formative and summative evaluation.
The following points are adapted from a classification system defined by William M. Trochim:

The major evaluation models in education are:
In recent years there have been many approaches to program evaluation based on the discrepancy analysis of Malcolm Provus in the 1960s, but with his untimely death and his books out of print, the testament to this heritage lives on primarily through the work of Daniel Stufflebeam and his associates at The Evaluation Center. While the DEM is called an evaluation method, Scriven considers the term "evaluation" to be inappropriate and seems to prefer "monitoring" as a more accurate description. Alter described discrepancy evaluation as follows:
The Provus Discrepancy Evaluation Model, designed by Malcolm Provus in 1969, is a well-tested and commonly accepted utilitarian model to use in evaluating academic programs. He defined evaluation as the process of agreeing upon program standards, determining whether a discrepancy exists between some aspect of the program and the standards governing that aspect, and using discrepancy information to identify weaknesses of the program. His stated purpose of evaluation is to determine whether to improve, maintain, or terminate a program . . . His model is primarily a problem-solving set of procedures that seeks to identify weaknesses (according to selected standards) and to take corrective actions, with termination as the option of last resort. A more comprehensive coverage of the DEM is online in a reprinted article by Alter, and you may find copies of original works by Provus in libraries or on the shelves of older professors: Discrepancy Evaluation; Discrepancy evaluation for educational program improvement and assessment; and The grand experiment: the life and death of the TTT program as seen through the eyes of its evaluators.
The DEM uses stages and content categories to permit comparisons. The stages are design, installation, process, and product.
At each of the four stages the defined standard is compared to actual program performance to determine if any discrepancy exists. The use of discrepancy information always leads to one of four choices:
According to Gredler (1996), cited by Alter, the model is most effective under the following circumstances:
Discrepancy Evaluation Model (DEM) Components

The discrepancy evaluation model is based on standards that are mutually agreed to by all members of a project at the outset. The standards are the most important part of development, and all evaluations are later based on the discrepancy between standards and actual performance. This is essentially a systems theory approach with three major divisions: input, process, and output. A DEM may show that a program or parts of it failed for lack of sufficient inputs, which could be money, qualified personnel, or other necessary prerequisites. After standards are developed, it is possible at any point in the process to determine whether a discrepancy exists, which can be done in many ways with available data and with instruments developed for the purpose. In theory, the discrepancy found in a formative assessment can be used to determine what must be done to come up to standard, or even to eliminate some element and start over.
Standards
A standard can be expressed in the form of objectives or outcomes, and it can be enumerated by reference to specific parts of the design when the project is planned. In practice, the standards for a project should be stated in such a way that it is possible to determine whether they have been met, much as objectives are stated in behavioral terms. If not, some consideration needs to be given to defining terms. If a standard is that subjects will "appreciate" classical music, some attention needs to be given to what is meant by appreciation; otherwise, how will you know whether you have reached the standard? For many administrative tasks the standard may be simple and straightforward, such as organizing a committee, hiring a staff, or opening an office, but even in these cases there may be more descriptive or qualifying terms. For example, an office may have to meet certain criteria imposed by the ADA or another external agency.
Performance
Performance is simply a matter of determining if the project meets standards and can be considered in the divisions of input, process, and output.
Discrepancies
The measure of the difference between a standard and what actually is or has happened is a discrepancy. If a project activity does not meet standards, then there is a discrepancy.
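To make the comparison concrete, here is a minimal sketch of a discrepancy check, assuming a program's standards and its observed performance can each be reduced to measurable targets for input, process, and output. All element names and values are hypothetical.

```python
# Minimal sketch of a discrepancy check: compare agreed-upon standards
# against observed performance, element by element. All names and values
# below are hypothetical placeholders.

standards = {                                    # Stage 1 design criteria
    "input: qualified staff hired": 4,           # people
    "process: training sessions held": 12,       # sessions
    "output: participants meeting objective": 0.80,  # proportion
}

performance = {                                  # what actually happened
    "input: qualified staff hired": 3,
    "process: training sessions held": 12,
    "output: participants meeting objective": 0.64,
}

for element, standard in standards.items():
    actual = performance[element]
    if standard - actual > 0:
        print(f"DISCREPANCY  {element}: standard={standard}, actual={actual}")
    else:
        print(f"ok           {element}: standard met")
```

In practice the "actual" values would come from the data collection instruments described below, and any reported discrepancy would feed the decision to improve, maintain, or terminate the element in question.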
Data Collection
A variety of tools and instruments can be used to collect data; the Checklist Project of The Evaluation Center at Western Michigan University gives examples of such tools.
Terminology (From Kirk Alter)
Inputs: (a) the things the program is attempting to change, and (b) things that are prerequisite to program operation but which remain constant.
Process: those activities which change inputs into desired outputs.
Outputs: the changes that have come about, including (a) enabling objectives, (b) terminal outcomes, and (c) other benefits.
Enabling Objectives: intervening behaviors/tasks which students must complete as a necessary basis for terminal outcomes.
Terminal Outcomes: the behaviors the clients are expected to demonstrate upon completion of the program.
Design Criteria: a comprehensive list of program elements (input, process, output) that becomes the standard of performance in Stage 1.
Below is a level 2 network of only one component (organization and planning).
A discrepancy evaluation approach can be used to account for design, formative, and summative evaluations. In fact, evaluation questions are best asked prior to the development of a project. The process is based on the expressed objectives and the activities that lead to outputs. The discrepancy evaluation has these characteristics:
1. It includes more than evaluation of terminal or outcome objectives. The evaluation focuses on process objectives related to the design and implementation of the program and on outcome objectives related to student affect, participation, and achievement of objectives.

2. It looks for relationships among context, inputs, processes, and products.

3. It collects data on key developmental factors to assess progress at a given point.

A variety of tools and procedures can be used to conduct the evaluation, most of which will be determined by project staff, teachers, and consultants. These would include criterion tests, records and files, expert opinion, formal and informal observations, and products. A detailed design, including questions, data collection procedures and forms, survey instruments and tests, and data analysis procedures, can be developed. The following types of instruments can be used:
Criterion tests. Criterion tests will be used to assess the progress of participants. Pre- and post-test information will be collected and submitted to accepted statistical procedures to determine significance (a sketch of one such analysis appears after this list).

Records & files. Records and files of the project will be used to address questions pertaining to the major activities of the project. Records and files can be audited and examined to answer specific questions.
Expert opinion. Experts from the education community and external experts will offer assessments of the project activities. Such information will be used to improve the project activities.
Formal & informal observations. Formal and informal observations will be made by staff and appointed professionals to collect information and data to improve the project.
Printed or other materials. Any products developed by the project will be examined for quality and effectiveness.
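As a rough illustration of the criterion-test analysis mentioned above, the sketch below runs a paired t-test on pre- and post-test scores. The scores are invented placeholders, and a paired t-test is only one of the "accepted statistical procedures" a project might choose.

```python
# Sketch of a pre/post criterion-test comparison for the same participants.
# Scores are illustrative only; a paired t-test is one of several defensible
# procedures for judging whether the gain is statistically significant.
from scipy import stats

pre  = [52, 61, 48, 70, 66, 55, 59, 63, 45, 68]   # pre-test scores (hypothetical)
post = [64, 72, 55, 78, 71, 60, 70, 69, 58, 75]   # post-test scores (hypothetical)

t_stat, p_value = stats.ttest_rel(post, pre)
print(f"mean gain = {sum(post)/len(post) - sum(pre)/len(pre):.1f}")
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Gain is statistically significant at the .05 level.")
```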
The evaluation plan may be depicted as follows (questions could also be asked about inputs):

Process Questions:
1. Is the staff effectively managing the project?
2. Do the delivery systems work effectively and efficiently?
3. Are the training materials comprehensive, technically accurate, instructionally diversified, and related to explicit objectives?

Outcome Questions:
4. Did participants acquire the needed skills?
5. Were more teachers actually available for employment in rural schools as science, math, and foreign language teachers?
6. Is this an effective and efficient means for providing training and support to school personnel?
7. Was a management system developed to allow management of non-traditional delivery of education for pre-service and in-service teachers?

In the process of evaluating multimedia applications and uses of computers in education and training at any level, instructional designers and educational technologists are confronted with many of the same issues and problems that confront anyone concerned with evaluation of programs and projects. In most small projects the developer may be responsible for evaluation. In large projects an external evaluator will usually be employed, someone who is not likely to be influenced by the organization, in order to be impartial.
Misuse and Abuse of Standardized Tests in Program Evaluation
While standardized tests may be used to unfairly characterize individuals and groups, the most blatant abuse of such tests is by state governments. Many states now require schools to administer standardized achievement tests, and the scores are then used to award grades to the schools. Alabama and Florida issue annual report cards, so to speak, with grades of A to F depending upon the average achievement scores of students. The Florida A+ grading system passed in December 1999; a brief news article describes the last meeting, when opponents and proponents of the system had their final arguments.
School scores may be listed in comparison to other schools or to national averages, but so far only Florida has gone to the extent of awarding bonus money to schools with an "A" and providing public money as vouchers to children in "F" schools for use in private and parochial schools. The absurdity of this is that schools are not matched, nor are controls used; only the achievement scores count. Florida manages to arrange schools by size for football and basketball divisions and championships, and almost everyone understands that a big school in Miami is likely to have a team that will easily crush a team from a small community. The reason is simple: the schools are not comparable.
Until the 1970s there was no expectation that all students would finish high school; in fact, most did not. Dropout rates, one bête noire of modern education (the other being achievement scores), were once treated as a more or less Darwinian matter of sorting the wheat from the chaff. High school completion was for those few who might go on to college; most people in the "good old days" did not need a college degree or even a high school diploma. Vocational programs and other options built around practical skills were developed, but the core academic program of the high school has always imitated the college curriculum and been geared to preparing a few to matriculate. The fact that so many students are encouraged to stay in high school today has caused the proliferation of AP programs and the International Baccalaureate, which give greater prestige to the college bound, a sort of academic varsity team. With the new demand for high levels of literacy and technical skill just to get ordinary jobs, expectations have changed: schools are now expected to graduate most students with high levels of literacy. This is like expecting the coach to keep all the players on the team and also make them all excel in their positions.
Continuing with the sports analogy, the distribution of academic skills among schools is not normal. If we think in terms of height instead of scholastic ability, some schools have mostly short kids, some have mostly tall kids, and some have a good mix of both. Statistically, half the people in the United States are below average in height; that is, they are short. This is expected. If school enrollments were distributed according to the normal curve, or if students were sorted into neighborhoods by height the way they are sorted by poverty, some school teams would never win. Of course, there might then be a state mandate for schools to make all the short kids taller. One way to do this is to spread the short kids around to different schools in order to change the averages, or to get rid of the short kids altogether. Private schools do not have to accept students unless they meet their criteria for enrollment; public schools have no choice. This is similar to professional football and basketball teams, which can select the best players from anywhere. There is no question that the performance of the successful teams is different from that of the rest, but the circumstances that cause this have less to do with coaching (read: teaching) than with selection.
Schools that have high concentrations of minority and poor children are dealing with students whose academic skill levels differ from those in wealthy schools. In Florida and other states, schools are punished for the type of children they enroll. It is no surprise that the overall achievement of some schools will be very high, especially when the children are drawn from the same affluent neighborhoods. In Alabama, for example, the highest achieving school district serves children from the families with the highest median income and has the fewest students on free or reduced-price lunches. This is not surprising to anyone. Under the Florida plan, this Alabama school system, the Mountain Brook schools, would get bonuses for the happy circumstance of being located in an affluent neighborhood, using funds removed from the decaying inner-city schools of Birmingham and other poor, minority communities. The true test of the Florida presumptions would be to switch the faculties of the affluent schools with those of the poorest schools and see if there are differences in the achievement scores of students.
Alternatives to the Florida A+ system include the following:
Indiana uses a canonical analysis to group schools by SES and cognitive ability (http://ideanet.doe.state.in.us/pba/welcome.html).
Georgia clusters schools by SES level in order to make comparisons within each cluster (http://arcweb.gsu.edu/csp/default.htm, http://arcweb.gsu.edu/csp/csp_cluster.htm#why).
Oregon also uses clustering (http://www.ode.state.or.us/asmt/results/glossary.htm).
The Tennessee Value-Added Assessment System controls for SES (http://www.k-12.state.tn.us/arc/rptcrd97/tvaas.htm).
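The cluster-based approaches share a simple idea: group schools by demographic profile first, then compare achievement only within a group. Below is a rough sketch of that idea; the data, variables, and number of clusters are hypothetical, and k-means stands in for whatever grouping method a state actually employs.

```python
# Sketch of cluster-then-compare: group schools by demographic profile and
# compare achievement only within each cluster. Data are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

# columns: percent free/reduced lunch, median household income ($1000s)
schools = np.array([
    [85, 24], [78, 28], [80, 26],     # high-poverty schools
    [45, 52], [50, 48], [40, 55],     # mixed schools
    [10, 95], [12, 90], [ 8, 99],     # affluent schools
], dtype=float)
scores = np.array([62, 65, 60, 74, 71, 76, 90, 88, 92], dtype=float)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(schools)
for cluster in sorted(set(labels)):
    in_cluster = scores[labels == cluster]
    print(f"cluster {cluster}: mean score {in_cluster.mean():.1f} "
          f"(compare each school only to this mean)")
```

Comparing each school to the mean of its own cluster avoids the Florida-style mismatch of grading a high-poverty school against an affluent one.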
Is a standardized test score the major indicator that distinguishes effective from ineffective educational practice? There is considerable difficulty in using student achievement data in models for teacher and school evaluation, primarily because of the difficulty of delineating teacher and school effects on student learning from demographic effects, that is, effects inherent in the students independent of formal education.
The American Educational Research Association (AERA) has issued a "Position Statement" concerning high-stakes testing in PreK-12 education:
Production Function

Much of the research in education has been aimed at finding a production function (a mathematical expression of the relationship between inputs and outputs in education) similar to that used in manufacturing. It must be remembered that the shape of modern education has its roots in scientific management, so research approaches have regarded schooling as something done to students rather than as something students do for themselves. Traditional schools, school-based management, total quality management (TQM), charter schools, voucher systems, and other recent trends in education are still based on the concept of the school as a manufacturer, with students as the raw materials. Many educational research paradigms are predicated on the assumption that the task is to relate variations in measured achievement or attitudes of pupils to variations in the observed behaviors or other traits of teachers, which have included gender, age, amount of training, type of training, personality, type of questions used, and so forth, or to sets of circumstances, behaviors, or routines that can be reliably replicated to produce higher achievement scores (Rosenshine & Stevens). Some investigators, such as Walberg, presume to reveal the precise degree of relationship between teachers' behaviors (e.g., questions, classroom organization, homework, seatwork) and student achievement scores. However, Monk (1990; 1992) concluded that production studies of education have not yielded very much useful knowledge.
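For readers unfamiliar with the term, the sketch below shows what a simple education production function looks like in practice: achievement regressed on a few school inputs. The variables, the data, and the linear form are all hypothetical, chosen only to illustrate the kind of model such studies estimate.

```python
# Illustrative education "production function": regress a school-level
# achievement measure on a few inputs. All variables and data are invented.
import numpy as np

# rows: schools; columns: per-pupil spending ($1000s),
# mean teacher experience (years), percent of students in poverty
X = np.array([
    [6.2, 11, 60], [7.0, 14, 35], [8.5, 18, 10],
    [5.8,  9, 72], [7.8, 16, 22], [6.6, 12, 48],
])
y = np.array([58, 71, 88, 52, 82, 64])            # mean achievement score

X1 = np.column_stack([np.ones(len(X)), X])        # add an intercept column
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)     # ordinary least squares
print("intercept and coefficients:", np.round(coef, 2))
# Nothing in a model of this form separates what the school contributed from
# what the students brought with them, which is part of the difficulty Monk
# and others describe.
```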
Unlike those in manufacturing or service industries, outcomes, inputs, and processes in education are difficult to identify, isolate, and investigate. Educational outcomes are confounded: multiple, jointly produced, and not easily transformed into standardized units for comparison. Unlike a factory, a school finds it nearly impossible to track inputs directly to students. As Monk illustrated, a teacher may invest considerable time providing tutorial instruction for one student, but the student may ". . . decline the assistance, either overtly or covertly."
Relationships between aptitude (entry ability) and the criterion (what the student is to learn) are shaped by the nature and quality of learning experiences, but primarily by ability and motivation. Some students fall farther behind as the pace of the curriculum moves forward. Instruction can vary significantly across classrooms, particularly in the extent to which it meets the needs of a learner. At the micro level, any differences may be attributed to a myriad of variables, such as the child's intelligence, motivation, or SES, or to the classroom environment, curriculum, teacher, peers, parents, nutrition, and so forth. There may be a fundamental problem with the production metaphor as applied to education, because in school it is not evident what (or who) the raw materials are, who is doing the producing, or what the product is. Quite unlike manufacturers, few schools have any real control over their raw materials and basic inputs, except for the most prestigious and exclusive private schools. Under this premise, the most productive schools simply have the best raw materials.
As can be seen, the difficulties in educational research are related to the roles of students in the learning process. Are students producers, or raw materials shaped by teachers? If students are relatively free agents, as in constructivist models of learning, one set of assumptions is invoked. If they are objects to be manipulated, as in behavioral models of learning, then clearly the teacher is entirely responsible for productivity. This helps explain the problems researchers and policymakers face in building accountability models that assume a production relationship in education. As Levin (1993) observed, it is interesting to imagine a factory in which the raw materials had minds of their own and could make independent decisions about whether or not they would be part of whatever was being manufactured. Students decide whether to attend, pay attention, undertake work seriously, and concentrate on grades (Doyle, 1986).
This only underscores the importance of deciding how to think about the purpose of a school or any educational program. If it is not quite like manufacturing, what is it? Attempts in recent years to improve schools have adopted yet another manufacturing concept, Total Quality Management (TQM), spawned by W. Edwards Deming. The Deming story is almost legendary: after World War II, armed with the principles of total quality management, he could find no followers in the United States, so he took his program to Japan and helped transform the Japanese economy into the most productive in the world. Deming summarized his principles of management in 14 points; three important points are summarized here:
- Cease dependence on inspection to achieve quality. Eliminate the need for inspection on a mass basis by building quality into the product in the first place.
- Eliminate slogans, exhortations, and targets for the work force asking for zero defects and new levels of productivity.
- The only route to excellence is by means of self-assessment.

Deming observed that most problems in organizations are created by the structure, not by the workers. Efforts to change the productivity of an organization must focus on the structure and not on individual workers; the causes of low quality and low productivity belong to the system and lie beyond the power of the work force. If education were to follow Deming's recommendations, we would abolish annual merit ratings, management by objective, testing, and SAT scores.
As Deming showed, evaluation of another person does not create excellence. In schools, the board assesses the superintendent, who assesses the principals; the principals assess the teachers, and the teachers assess the students. Ultimately, the state department of education, which is usually not assessed by anyone, assesses the school district. Giving outstanding-teaching awards and bonuses will not alter the system, nor will posting achievement scores in the local newspaper. A state department grading system for school districts is just another elaboration of the problem of assessment. For schools to improve, the system must change to engage in self-assessment. The best teachers spend their time improving rather than assessing (and blaming) their circumstances, the principal, their students, their students' parents, and so forth. Apparently we may never reach the levels in education suggested by Deming, for schools have actually become more tied to inspection, evaluation, and assessment than ever, something that is certain to be a central focus of the upcoming presidential campaign.
There is no reason to think that program evaluation for computer or multimedia applications and projects would differ from any other kind of program evaluation in education and training. While educators and IT personnel will be required to use evaluation, there are contradictory forces at work. At the same time that TQM advocates and others are saying that we need to change the system and eliminate evaluation in favor of systemic self-evaluation, we find more evaluation being required by state governments and federal agencies. While constructivist philosophy seems to be spreading throughout education at all levels, we are bound ever more tightly to imposed instructional procedures and assessments. Perhaps this is an illustration of the Chinese curse, "May you live in interesting times."
References

Gredler, M. E. (1996). Program evaluation. Englewood Cliffs, NJ: Prentice-Hall.

Levin, B. (1993). Students and educational productivity. Education Policy Analysis Archives, 1(5).

Monk, D. (1990). Educational finance: An economic approach. New York: McGraw-Hill.

Monk, D. (1992). Education productivity research: An update and assessment of its role in education finance reform. Educational Evaluation and Policy Analysis, 14(4), 307-332.

Worthen, B. R., & Sanders, J. R. (1987). Educational evaluation: Alternative approaches and practical guidelines. White Plains, NY: Longman.