The Nature of Evaluation and Measurement in Education

This article discusses the nature of evaluation and measurement in education. It explains what educational measurement and evaluation are, and how measurement differs from evaluation.
 
The article also tries to answer the following questions:

What is Measurement?
What is Evaluation?
What are the Definitions of Measurement and Evaluation?
What is Measurement in Education?
What is Evaluation in Education?
What is the Purpose of Evaluation and Measurement in Education?
What is the Nature of Evaluation and Measurement in Education?
What is the Concept of Measurement in Education?
What is the Concept of Evaluation in Education?
What is the Concept of Test in Education?
What is the Difference between Measurement and Evaluation?
What is the Relationship between Evaluation and Measurement?
 

Purpose of Evaluation and Measurement

The purpose of evaluation is to make a judgment about the quality or worth of something: an educational program, a worker's performance or proficiency, or a student's attainments. That is what we attempt to do when we evaluate students' achievements, employees' productivity or prospective practitioners' competencies.

In each case the goal is not simply to describe what the students, employees or other personnel can do. Instead we seek answers to such questions as:

  • How good is the level of achievement?
  • How good is the performance?
  • Have they learned enough?
  • Is their work good enough?

These are questions of value that require the exercise of judgement. To say simply that evaluation is the process of making value judgements understates the complexity and difficulty of the effort required.

Once it has been determined that evaluation is needed, the evaluator must decide what kind of information is needed, how the information should be gathered and how it should be synthesized to support the outcome, the value judgement.

Thus, evaluation is as concerned with information gathering as it is with making decisions. In addition, the term is used to refer to the product or outcome of the process.

That is, we might, for example, submit our evaluation (the product) of a pupil's school performance to his parents, following our evaluation (the process) of his accomplishments. In this respect evaluation has a dual connotation.

1. Concept of Evaluation/ Concept of Evaluation in Education

Educational evaluation is broader in scope and more objective than measurement. It is the process of carefully appraising the individual using a variety of information-gathering devices.

Besides testing and other tools of measurement, evaluation seeks additional evidence from various sources of information that supplement each other, such as interviews, questionnaires, anecdotal records, cumulative records, case conferences, mechanical or electronic records, case studies, projective techniques etc.

It involves the selection, through careful analysis, of the data most pertinent to a wise, just and comprehensive interpretation, in order to make a value judgement of the individual or group under study.

Evaluation is based on two philosophies. The first, traditional philosophy holds that the ability to learn is randomly distributed in the general population. Suppose a learning task is assigned to a class and a test is then administered to study their performance.

The test results show that some students score very high, some score low, and the scores of the majority fall between these two extremes. It was the opinion of old educators that not all students are endowed with the same intellectual abilities to benefit from schooling.

Generally, teachers weeded out students who tended to learn less well than their peers. This was the old philosophy based on the superiority of heredity.

This gave birth to norm-referenced measurement of intellectual abilities. It has been used in schools to differentiate among the individuals of some defined group on whatever is being measured. In norm-referenced measurement, an individual's score is interpreted by comparing it to the scores of a defined group, often called the normative group. The comparison is relative rather than absolute.
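As a minimal sketch of a norm-referenced interpretation, the Python below places one score relative to a normative group. The group scores, the function names and the choice of percentile rank and z-score as the relative measures are illustrative assumptions, not something the article prescribes.

```python
from statistics import mean, pstdev

def percentile_rank(score, norm_scores):
    """Percentage of the normative group scoring strictly below `score`."""
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)

def z_score(score, norm_scores):
    """Distance of `score` from the group mean, in standard deviations."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)

# Hypothetical normative group: test scores out of 20.
norm_group = [8, 10, 12, 12, 14, 15, 15, 16, 18, 20]

print(percentile_rank(15, norm_group))  # share of the group below a score of 15
print(z_score(14, norm_group))          # a score equal to the group mean
```

The same raw score would receive a different percentile rank against a different normative group, which is what makes the comparison relative rather than absolute.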

The second philosophy of measurement has emerged more recently. It is based on democratic values and gives importance to the environment. It is based on the universalization of education.

It assumes that if education is to be universal, the responsibility of the teacher is to help as many students as possible to learn. It has discarded the selection philosophy of norm-referenced measurement.

All individuals can attain mastery of a learning task if they are given opportunities and time. This philosophy assumes that with a properly developed instructional sequence every child could reach 100% mastery of any objective. It suggests that an absolute standard should be used as the reference for evaluation.

These standards are the objectives specified for instruction. Each student's status is determined by how well he achieves and satisfies these objectives. For example, before a unit begins, the teacher may have decided that three objectives are essential for every student. A student has to satisfy each one in order to receive a passing grade.
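The three-objective example can be sketched in Python as follows. This is only an illustration of an absolute standard: the objective names and the student records are hypothetical, and a real scheme would also need a rule for deciding when a single objective counts as satisfied.

```python
def passes(objectives_met, required):
    """A student passes only if every required objective is satisfied."""
    return all(objectives_met.get(objective, False) for objective in required)

# Hypothetical objectives fixed by the teacher before the unit begins.
required = ["objective_1", "objective_2", "objective_3"]

# Hypothetical records of which objectives each student has satisfied.
student_a = {"objective_1": True, "objective_2": True, "objective_3": True}
student_b = {"objective_1": True, "objective_2": False, "objective_3": True}

print(passes(student_a, required))  # all three objectives satisfied
print(passes(student_b, required))  # one objective not satisfied
```

Note that the result for one student does not depend on how any other student performed, unlike a norm-referenced comparison.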

Thus we see that the two philosophies of evaluation are based on different concepts of human potentialities and their development. 

One believes that human abilities are not evenly distributed in the population and that the achievement of individual learners differs greatly, whereas the other believes that all learners can attain mastery of a learning task irrespective of individual differences among them.

William Wiersma and Stephen G. Jurs (1990) remark that evaluation is a process that includes measurement and possibly testing, but it also contains the notion of value judgement. If a teacher administers a test to a class and computes the percentage of correct responses, it is said that measurement and testing have taken place.

The scores must also be interpreted, for example by converting them to grades such as As, Bs and Cs, or by judging them to be excellent, good, fair or poor. This process is called evaluation.

So we can say evaluation is concerned with making judgements about things. When we act as evaluators, we attribute 'value' or 'worth' to behavior, objects and processes. 

In the wider community, for example, one may make evaluative comments about a play, clothes, a restaurant, a book or someone's behavior. We may enjoy a play, admire someone's clothes, rave about a restaurant and so on. Invariably these are rather simple, straightforward comments of value or worth.

According to William Wiersma and Stephen G. Jurs (1990), to be effective, evaluation requires that judgements be based on appropriate and relevant data.

Ineffective evaluation is made upon whim or fancy, even in the broader community context. To say, for example, that a film was 'good' or 'bad' says little unless the basis of these judgements is made clear.

An enjoyable or good film may have a well-written script, tight direction, mood-enhancing music and so forth. These are the characteristics upon which a judgement can subsequently be made.

According to Norman E. Gronlund (1990):

Evaluation is a systematic process of collecting, analyzing and interpreting information to determine the extent to which pupils are achieving instructional objectives. (It answers the question "How good?")

In the light of the above discussion, evaluation in our schools is essentially concerned with two major approaches to making judgements.

1. Product evaluation is an evaluation of student performance in a specific learning context. Such an evaluation essentially seeks to determine how well the students have achieved the stated objectives of the learning situation. In this sense the student's performance is seen as a product of the educational experience. A school report is an example of product evaluation.

2. Process evaluation examines the experiences and activities involved in the learning situation, i.e. it makes judgements about the process by which students acquired learning, or examines the learning experience before it has been concluded.

In most cases, process evaluation is used when making judgements about school effectiveness, classroom interactions, the curriculum and the effectiveness of specific programs.

For example, process evaluation may be conducted upon the nature of student-teacher interaction, instructional methods, school curricula, a program for gifted students, and so forth.

Robert L. Ebel and David A. Frisbie (1986) observe that the difference between product and process evaluations is something of a fine line.

Students usually pass through a school, experience a curriculum and then depart. In that sense we can refer to a product, just as we can say of a student's progress that 'the proof is in the product'. Curriculum evaluation is different, in that the activity involved rarely comes to a conclusion in schools, i.e. the curriculum is ongoing.

However, if a curriculum or a particular program had been terminated, then a form of product evaluation would be conducted.

What are subcategories of process evaluation?

The subcategories of process evaluation are frequently referred to in the literature as curriculum evaluation, teacher evaluation and program evaluation.

A. According to L.R. Gay (1985, p-6):

(i) Evaluation is the systematic process of collecting and analyzing data in order to determine whether, and to what degree objectives have been achieved. 

(ii) Evaluation is the systematic process of collecting and analyzing data in order to make decisions. 

A systematic process of data collection, that is measurement, and the analysis of the collected data are common to both definitions. Although some definitions seem to equate measurement with evaluation, most recognize that measurement is one of the essential components of evaluation.

The basic difference between the two definitions is the issue of decisions, or judgements, whether they are an integral component of evaluation or not.

Proponents of the two definitions:

(i) agree that the results of evaluation may be used for decision-making.

(ii) consider decision-making to be a part of evaluation.

For two major reasons, the second definition would seem to be preferable. 

First of all, definition (ii) is more inclusive. Second, the notion that evaluation can be conducted for strictly descriptive purposes, as the first definition implies, is naive at best. Perhaps ideally the sole purpose of evaluation should be to provide feedback in order to improve the object of the evaluation, by showing the difference (if any) between where we are and where we would like to be.

B. Evaluation has been broadly defined by Stufflebeam (1971) as:

The process of delineating, obtaining and providing useful information for judging decision alternatives (p- 1 5).

C.  According to Thorndike and Hagen (1977, pp 1 5):

Measurement provides only information such as a test score and not the judgement of insight that is required for reaching a sound conclusion or plan of action. The judgement of insight is considered as the set of evaluative procedure used to interpret information into an appraisal.

D. Ebel (1979) clarified the difference between measurement and evaluation as follows:

An evaluation is a judgement of merit, sometimes based solely on measurements such as those provided by test scores but more frequently involving the synthesis of various measurements, critical incidents, subjective impressions, and other kinds of evidence. (p-376)

E. In practice, evaluation is specific in terms of function and each type of evaluation uses this general definition in a special way. Common to all evaluation is the use of adequate information to make a judgement about someone or something.

F. Each of these interpretations points out that judgement and introspection are necessary when evaluating. One should clearly understand that evaluation goes beyond measurement.

Evaluation is the continuous appraisal of all available information concerning the student, teacher, educational program and the teaching-learning process, in order to ascertain the degree of change in students and to form valid judgements about the students and the effectiveness of the program.

Value judgement on an observation, performance test or any data whether directly measured or inferred is called evaluation.

2. Concept of Measurement/ Concept of Measurement in Education

The term "Educational Measurement" refers to any device for the general study and practice of testing, scaling, and appraising the outcomes of the educational process. It includes the administration and scoring of tests, scale construction, validation and standardization, and the application of statistical techniques in the interpretation of obtained measures or test results.

Definition of Measurement: Measurement is the process of assigning numbers to individuals or their characteristics according to specified rules.

Measurement requires the use of numbers but does not require that value judgements be made about the numbers obtained from the process. We measure achievement with a test by counting the number of test items a student answers correctly, and we use exactly the same rule to assign a number to the achievement of each student in the class. 
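A minimal Python sketch of this rule is given below: one function counts correct answers and is applied unchanged to every student. The answer key, student names and responses are hypothetical, and the code deliberately stops at the number, since no value judgement is part of the measurement.

```python
def raw_score(responses, answer_key):
    """Count the test items answered correctly (the measurement rule)."""
    return sum(1 for given, correct in zip(responses, answer_key) if given == correct)

# Hypothetical five-item test and two students' answers.
answer_key = ["b", "d", "a", "c", "a"]
responses = {
    "student_1": ["b", "d", "a", "a", "a"],
    "student_2": ["b", "c", "c", "c", "b"],
}

# The same rule assigns a number to each student in the class.
scores = {name: raw_score(given, answer_key) for name, given in responses.items()}
print(scores)
```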

Measurements are useful for describing the amount of certain abilities that individuals have. For that reason, they represent useful information for the evaluation process. 

Education is an extensive, diverse, and complex enterprise, not only in terms of the achievements it seeks to develop, but also in terms of the means by which it seeks to develop them. 

Our understanding of the nature and process of education is far from perfect. Hence it is easy to agree that we do not know how to measure all important educational outcomes.

But, in principle, all important outcomes of education are measurable. They may not even be measurable in principle, using only paper and pencil tests. But if they are known to be important, they must be measurable.

To be important, an outcome of education must make an observable difference. That is, at some time, under some circumstances, a person who has more of it must behave differently from a person who has less of it.

If different degrees or amounts of an educational achievement never make any observable difference, what evidence can be found to show that it is in fact important? But if such differences can be observed, then the achievement is measurable, for all that measurement requires is verifiable observation of a more-less relationship. 

Can integrity be measured? It can if verifiable differences in integrity can be observed among individuals. 

Can mother love be measured? If observers can agree that a hen shows more mother love than a female trout, or that Mrs. "A" shows more love for her children than Mrs. "B" then mother love can be measured. 

If it makes a difference, the basis for measurement exists. To say that Asma shows more "Spunk" than Omer, may not seem like much of a measurement. 

Where are the numbers? Yet out of a series of such more-less comparisons, a scale for measuring people's spunk can be constructed. 

The Ayres scale for measuring the quality of handwriting is a familiar example of this (Ayres, 1912). If a sequence of numbers is assigned to the sequence of steps or intervals which make up the scale, then the scale can yield quantitative measurements. If it is used carefully by a skilled judge, it yields measurements that are reasonably objective (that is, free from errors associated with the use of a particular set of test items or tasks).
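The idea of building a scale out of more-less comparisons can be sketched in Python. The approach used here, counting how often each person is judged to show "more" and numbering the resulting order, is one simple assumption among many possible ones and is not the procedure behind the Ayres scale. The third name and the judgements themselves are hypothetical.

```python
from collections import Counter

def ordinal_scale(comparisons):
    """Turn (more, less) judgements into scale positions, 1 = least.

    Ties in the number of 'more' judgements are broken alphabetically,
    which is an arbitrary choice made only to keep the sketch simple.
    """
    wins = Counter()
    for more, less in comparisons:
        wins[more] += 1
        wins[less] += 0  # make sure every person appears in the scale
    ranked = sorted(wins, key=lambda name: (wins[name], name))
    return {name: position for position, name in enumerate(ranked, start=1)}

# Hypothetical observer judgements of the form "X shows more spunk than Y".
comparisons = [("Asma", "Omer"), ("Asma", "Sara"), ("Sara", "Omer")]
scale = ordinal_scale(comparisons)
print(scale)
```

The numbers produced this way express order only; they say nothing about how much more spunk one person has than another.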

Are some outcomes of education essentially qualitative rather than quantitative? If so, is it reasonable to expect that these qualitative outcomes can be measured? 

This person is a man; that one is a woman. This one speaks only Punjabi; that one speaks only Urdu. But we can express these qualitative differences in quantitative terms, too. This person has more of the characteristics of a man; that one has less. This person has more eye-blueness; that one has less. This person has more ability to speak Punjabi; that one has less.

We may think of the weight of a man, his age or the size of his bank account as quantities, while regarding his health, his friendliness, or his honesty as qualities. And if they serve to differentiate him from other men because he exhibits more or less of them than other men, they become quantitative qualities.

It is difficult to think of any quality that interests us that cannot also be quantified. "Whatever exists at all exists in some amount," said E.L. Thorndike (1918, p-16), and William A. McCall (1939) has added, "Anything that exists in amount can be measured" (p-18).

William Wiersma and Stephen G. Jurs (1990) define measurement as:
Measurement: For all practical purposes assessment and measurement can be considered synonymous. When assessment is taking place, information or data are being collected and measurement is being conducted. 

Measurement could also involve the collection of data about teacher performance or about the performance of a curriculum. However, regardless of what is being measured, the data obtained have little value in themselves and require interpretation by someone skilled in evaluation procedures. Indeed, measurement data in the hands of unskilled persons may be grossly misinterpreted.

For example, what does a student's score of 12/20 on a test indicate? By itself, it means very little and it requires interpretation before it is considered meaningful. It could mean that on that test, the student has performed quite poorly as the mean score on the test was 15/20 or perhaps it means that the student has performed quite well as the median score was 8/20. 

Thus the score by itself has little meaning and it requires interpretation through the use of assessment procedures.
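The 12/20 example can be restated as a short Python sketch. The reference values 15 and 8 are the mean and median from the example above; the function name and the three labels are illustrative assumptions.

```python
def interpret(score, reference):
    """Describe a score relative to a reference point such as a class mean or median."""
    if score > reference:
        return "above"
    if score < reference:
        return "below"
    return "at"

score = 12  # the student's score out of 20

print(interpret(score, 15))  # class mean of 15: the score looks poor
print(interpret(score, 8))   # class median of 8: the same score looks good
```

The score is identical in both calls; only the reference point, and therefore the interpretation, changes.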

A) L.R. Gay says (1985, p-8)

Measurement is the process of quantifying the degree to which someone or something possesses a given trait, i.e. a quality, characteristic or feature. Measurement permits more objective description concerning traits and facilitates comparisons.

Thus instead of saying that Aslam is underweight for his age and height, we can say that Aslam is 18 years old, 5' 8" tall, and weighs only 85 pounds.

Further, instead of saying that Aslam is more intelligent than Ali, we can say that Aslam has a measured IQ of 125 and Ali has a measured IQ of 88. 

In each case, the numerical statement is more precise, more objective, and less open to interpretation than the corresponding verbal statement.

B) Norman E. Gronlund (1990, p-5) states in his book "Measurement and Evaluation in Teaching":

Measurement is the systematic ascertaining of a characteristic, property or attribute through a numerical device. The device may be an inventory, a checklist, questionnaire, scale or test.
 
Measurement is limited to quantitative descriptions of behavior and does not include qualitative descriptions or judgement of the desirability of the behavior being measured. In this respect measurement differs from evaluation.

C) Robert L. Thorndike and Elizabeth P. Hagen (1977)

They pointed out that three steps are involved in developing a measurement device. First we must identify and define the quality or attribute that is to be measured. We never measure a person, only a quality or attribute of the person, like intelligence or emotional maturity.

Similarly, we do not measure a table but one of its attributes; not the fire but the temperature of the fire; not the automobile tire but the durability of the tire. If we are concerned with the durability of a tire, do we mean its resistance to puncture, its endurance against road wear, or its ability to hold up against deterioration?

D) According to Robert L. Thorndike and Elizabeth P. Hagen (1977, p-137)

The second step in developing a measurement device is to devise a set of operations to isolate the attribute and make it apparent to us. Take the durability of an automobile tire. Once we have identified and defined the attribute that interests us, we need to develop some standard to allow us to gauge or index it.

If our concern is with the tire's resistance to roadway abrasion, we need to develop a procedure for ascertaining the rate at which the rubber wears away. 

Similarly, various educators and psychologists have developed the Stanford-Binet and other tests that include operations for eliciting behavior that we take to be indicative of intelligence.

Thorndike and Hagen noted that there is no single universally accepted test and that different tests vary somewhat in the tasks they include and in the order in which they rank people.

This is evidence that we do not have complete consensus as to what intelligence is on the one hand, or what the appropriate procedures for eliciting it are on the other.

The third step in measurement is to express the result of the operations established in the second step in numerical or quantitative terms.

This involves an answer to the question: how many or how much? For example, we may employ millimeters as the units for indicating the thickness of the tread on the face of the tire and hence express the amount of wear on the tire in terms of millimeters.

Similarly, educators and psychologists require numerical units for gauging anxiety, emotional maturity, intelligence, and other attributes. In the case of intelligence, they may have individuals perform a number of tasks and count the total number of successes, which they then convert into IQ units.

Clearly each step in measurement rests on human-fashioned definitions. In the first step, we define the attribute that interests us. In the second step, we define the set of operations that will allow us to identify the attribute.

And in the third step we define the units in which we will state the results of our operations.

Thus what is measured is always a function of our definitions, and these have their own inherent limitations.

3. Concept of Test/ Concept of Test in Education

What is a Test? In education, a test consists of a question or series of questions, exercises or other devices for measuring the mental ability, capacity, skill, knowledge, achievement, progress, aptitude, attitude, interest, social and emotional adjustment or personality etc. of an individual or group.

Tests represent one particular measurement technique. A test is a set of questions, each of which has a correct answer, that examinees usually answer orally or in writing.

Test questions differ from those used in measures of attitudes, interest or preference, and certain other aspects of personality. Ideally, the questions in tests of achievement or many tests of intelligence have answers that content experts can agree are correct; correctness is not determined by the particular values, preferences, or dislikes of a group of judges.

All tests are a subset of the quantitative tools or techniques that are classified as measurements. And all measurement techniques are a subset of the quantitative and qualitative techniques used in evaluation. A major concern in this text, but certainly not the only one, will be with the development of tests that can contribute to summative evaluation of student learning.

Other measurement and evaluation techniques are useful for other evaluation purposes, but tests that measure relevant school learning with precision are the most useful tools available to teachers for most classroom summative evaluation needs.

4. Relationship Between Evaluation and Measurement

What is the relation between measurement and evaluation in education?

A) Stevens (1951) explains the difference between evaluation and measurement: "In its broadest sense, measurement is the assignment of numerals to objects or events according to rules.

We measure height and weight following certain rules and then assign some numerical value to the measurements. We do not assign numbers in all cases of measurement, especially when using criterion-referenced measuring instruments. 

Here the symbols assigned may be equivalent to (+) or (-), since the measuring instrument sets a single standard and the individual either meets or fails to meet the absolute standard set by the objective. When evaluating data we go beyond the concept of measurement and make a judgement about the measurements taken".

B) Dubois, Alverson and Staley (1979) explain the distinction between evaluation and measurement in these words, "As with any assessment process, the evaluation of entering behavior involves the collection and evaluation of data. 

Psychologists working in the field of tests and measurements use the term measurements to refer to the collection portion of the process."

C) A Dictionary of Education (1981) explains the concept of measurement as "Fundamentally we can say that measurement entails certain rules and procedures for assigning numbers to attributes in such a way that the numbers represent the quantity of the attribute. 

It is necessary to be clear that it is not the object, organism or event itself which is being measured. For example, we don't measure a 'piece of wood' but we measure one of its attributes such as its length or weight. In educational measurement we are faced with attributes that do not lend themselves to such intuitive procedures (as used in physical sciences)."

D) According to Lester O. Cron and others, evaluation is a broader term than measurement. Evaluation is concerned not only with the determination of learning results but also with value judgements about the desirability of these results.

It is a continuous process in which various techniques of testing or measurement can be utilized. Evaluation is a cooperative activity in which the principal, the teacher, the pupils and the parents participate. 

E) H.H. Remmers and N.L. Gage point out, "It is the felt need that has caused the shift from the term 'measurement', implying mathematically precise mensuration of knowledge, to the term 'evaluation', which widens the areas to be studied to include subjective opinions and qualitative changes as well as objective and quantitative changes, and to include changes in attitudes, appreciation and understandings as well as acquisitions of knowledge and skills."

F) Prof. Adediran A Taiwo (1995) distinguishes measurement and evaluation in these words, "While measurement is concerned with only the amount, quantity or frequency of a variable, evaluation matches such an amount, quantity or frequency with relevant criteria for the purpose of making some value judgement about the measured or the observed amount. 

In essence, the term evaluation involves both quantitative and qualitative description of events, behaviors, things, parameters and variables, as well as value judgement of the things or events being described. It therefore follows that any time one talks of evaluation, be it in the realm of the achievements of students, the effectiveness of a teaching method or the appropriateness of a curriculum, one is concerned with both numerical and verbal description as well as value judgement of what is being described."

G) Evaluation is a comprehensive and continuous process, which covers every aspect of an individual's achievement in the educative program. It is an integral part of education in which students and teachers are partners. It signifies a wider process of judging students' progress in various aspects.

On the other hand, measurement implies only a precise quantitative assessment of instructional outcomes. 

Evaluation is integrated with the entire task of education and not only with examinations, tests and measurement. 

Evaluation encompasses tests and measurement but also goes beyond them. 

Evaluation depends upon measurement but is not synonymous with it. 

Measurement is a quantitative determination of how much an individual's performance has been, while evaluation is a qualitative judgement of how good or how satisfactory an individual's performance has been. 

Measurement describes a situation; evaluation judges its worth or value. 

Measurement is only a tool to be used in evaluation. By itself, it is meaningless, but without it evaluation is likely to be of little significance. 

Sound evaluation is based upon the results of accurate and relevant measurement. It is also to be remembered that not all uses of a test or measurement in education can be considered evaluation, for evaluation is always in the light of some particular goal, purpose or value.

Evaluation is not only quantitative but also qualitative and includes value judgement. Mathematically it may be said that:

Evaluation = Measurement (quantitative description of students' achievements) + qualitative description of students' abilities + value judgement about students' achievements and abilities.

Difference Between Measurement and Evaluation

H) The difference between evaluation and measurement may be explained with the help of following examples:

  • A teacher measures Aslam's height to be 180 cm. He evaluates his height when he says that he is 'tall'.
  • A teacher measures Ali's achievement in Economics to be 50%. He evaluates his achievement when he says that Ali's achievement in Economics is 'satisfactory'. 
  • A teacher measures the size of a classroom and finds that it is 4mx3m. He evaluates the classroom dimensions when he reports that the classroom is 'too small' for 40 students.
  • Aslam and Ali study in the same class. In the first test they obtain 50 and 70 marks respectively in English. In the second test, both of them obtain 80 marks. In the second measurement (test scores), their achievement in English is the same, yet the evaluation will differ when the teacher states that the rate of progress of Aslam is comparatively better than that of Ali.

Measurement helps in evaluation. This may be clarified with the following example:

Aslam and Ali study in the same class. They take two tests. In the first test, they obtain 45 and 65 marks respectively in Civics. In the second test, both of them obtain 80 marks. 

Now, in the second test, the measurement (test scores) of their achievement in Civics is the same, yet the evaluation will differ when the teacher says that the rate of progress of Aslam is comparatively better than that of Ali.
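The Civics example can be worked through in a few lines of Python. The marks are those given above; treating "rate of progress" as the simple gain between the two tests is an assumption made for illustration.

```python
# Marks in Civics from the example above.
first_test = {"Aslam": 45, "Ali": 65}
second_test = {"Aslam": 80, "Ali": 80}

# Measurement: the second-test scores are identical.
same_score = second_test["Aslam"] == second_test["Ali"]

# Gain from the first test to the second, used here as a stand-in for rate of progress.
gains = {name: second_test[name] - first_test[name] for name in first_test}
better_progress = max(gains, key=gains.get)

print(same_score)
print(gains)
print(better_progress)
```

The gains are still measurements; the evaluation is the teacher's judgement that the larger gain represents comparatively better progress.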

5. Summary

Measurement is principally concerned with quantitative descriptions of student achievement. Unlike evaluation, it does not imply judgements about the worth of an educational program. 

Measurement involves the assigning of numbers that represent the amount of a property possessed (that is, value) by an object or system. Scales associated with measurement include nominal, ordinal, interval and absolute. 

A test is a measuring device concerned with the specific achievement of a student in terms of given objectives.

Evaluation, on the other hand, deals with finding out as far as possible the worth of a process, system or program. When, on the basis of test results, a teacher decides on what should be done to improve the outcomes of instruction, he is assuming the role of an evaluator. Thus there is a continuous interplay between testing or measurement and evaluation. 

Measurement and evaluation play an important role in the instructional program of the school. Basically, they provide information that can be used in a variety of educational decisions. The main emphasis in classroom evaluation, however, is on decisions concerning pupil learning and development. 

From an instructional standpoint, evaluation may be defined as a systematic process of determining the extent to which instructional objectives (i.e., intended learning outcomes) are achieved by pupils.

The evaluation process includes both measurement procedures (e.g., tests) and non-measurement procedures (e.g., informal observation) for describing changes in pupil performance, as well as value judgements concerning the desirability of the changes.

The process of evaluation is likely to be most effective when guided by a set of general principles. The principles emphasize the importance of:

1) clearly specifying what is to be evaluated,

2) selecting evaluation techniques in terms of their relevance, 

3) using a variety of evaluation techniques,

4) being aware of their limitations, and

5) regarding evaluation as a means to an end, and not an end in itself.

