Test and Item Analysis
When you go to the market to purchase a commodity, which shop will you have more faith in: the one that uses stones as weights or the one that uses certified weights? Obviously the latter, because you do not want to be cheated. Something similar is the case with examinations. The questions and tests that we use are like measures against which we compare the knowledge possessed by the students. What will happen if this measure is not standardised? A student will get either more or fewer marks than he actually deserves. This harms the cause of learning in more than one way: on the one hand, we erode society's faith in the system, and on the other, we may produce incompetent doctors. One way to overcome this problem is to use standardised tests, developed by undertaking what is called test and item analysis.
Preparing for Item Analysis
The next step is to divide this distribution into two groups, i.e. a higher ability group (HAG) and a lower ability group (LAG). If there are up to 50 students, the class is split into two halves (25 students each for a class of 50); if the number is large, say 200, you should put the top 30% and the bottom 30% of students into the two groups respectively. Now, for each question, count the number of students in each of these two groups who ticked option a, b, c or d. For example, suppose a test was administered to a group of 50 students, who were then divided into HAG and LAG. For question no. 1, the distribution of options could be something like this:
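The grouping and counting step can be sketched in Python. This is a minimal illustration, assuming each student's record is a `(total score, list of chosen options)` pair; the function names `split_groups` and `tally_options` are hypothetical. It reads "up to 50 students" as "split the class into halves" (with an odd class size the middle student is left out) and uses the 30% cut for larger classes, as the text describes. Students tied at the cut-off keep their input order, because Python's sort is stable.

```python
from collections import Counter

def split_groups(records, large_class_fraction=0.30, small_class_limit=50):
    """Split (score, answers) records into HAG and LAG.

    Up to `small_class_limit` students: top half vs bottom half
    (the middle student is dropped when the class size is odd).
    Larger classes: top and bottom `large_class_fraction` of students.
    """
    ranked = sorted(records, key=lambda r: r[0], reverse=True)
    n = len(ranked)
    k = n // 2 if n <= small_class_limit else round(n * large_class_fraction)
    return ranked[:k], ranked[n - k:]

def tally_options(group, question_index, options="abcd"):
    """Count how many students in `group` chose each option for one question."""
    counts = Counter(answers[question_index] for _, answers in group)
    return {opt: counts.get(opt, 0) for opt in options}

# Hypothetical class of 10 students with scores 0-9 and a single question.
records = [(score, ["a" if score >= 5 else "b"]) for score in range(10)]
hag, lag = split_groups(records)
print(tally_options(hag, 0))  # {'a': 5, 'b': 0, 'c': 0, 'd': 0}
print(tally_options(lag, 0))  # {'a': 0, 'b': 5, 'c': 0, 'd': 0}
```

For a class of 200, the same call returns 60 students in each group.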
Once this information is available for all the questions, we can calculate the indices related to each. Facility Value (FV): simply stated, FV is the proportion of the group answering a question correctly. If 60% of the group answers the question correctly, the FV will be 60%. FV can be calculated by the formula:
Returning to the previous example, FV will be:
FV is a measure of how easy or how difficult a question is: the higher the FV, the easier the question.
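As a worked sketch, the snippet below assumes the form of the formula commonly used for this index: FV = (H + L) / N × 100, where H and L are the numbers of students answering correctly in the HAG and the LAG, and N is the total number of students in both groups. The function name `facility_value` and the counts in the usage line are hypothetical.

```python
def facility_value(correct_hag, correct_lag, total_students):
    """Facility value as a percentage: (H + L) / N * 100.

    correct_hag:    H, number answering correctly in the higher ability group
    correct_lag:    L, number answering correctly in the lower ability group
    total_students: N, number of students in both groups together
    """
    if total_students <= 0:
        raise ValueError("total_students must be positive")
    return 100 * (correct_hag + correct_lag) / total_students

# Hypothetical counts: 20 of 25 HAG and 10 of 25 LAG students answered correctly.
fv = facility_value(20, 10, 50)
print(f"FV = {fv:.0f}%")  # FV = 60%
```

Multiplying by 100 before dividing keeps whole-number results exact in floating point.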
(Adapted from J.J. Guilbert, 1992)

At this stage, we would also like to introduce you to another term, called negative discrimination. Simply stated, it means that more LAG students than HAG students answer the question correctly. Look at the following distribution:
We shall return to negative discrimination when we discuss the uses of item analysis.
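The discrimination index (DI) and negative discrimination can be sketched the same way. The snippet assumes the commonly used form DI = 2(H − L) / N, where H and L are the numbers of correct answers in the HAG and the LAG and N is the total number of students in both groups. With equal-sized groups, this gives a maximum of 1.0 when every HAG student and no LAG student answers correctly. The value becomes negative when more LAG than HAG students answer correctly. The function names and counts are hypothetical.

```python
def discrimination_index(correct_hag, correct_lag, total_students):
    """DI = 2 * (H - L) / N; lies between -1.0 and +1.0 when both groups are equal in size."""
    if total_students <= 0:
        raise ValueError("total_students must be positive")
    return 2 * (correct_hag - correct_lag) / total_students

def is_negative_discrimination(correct_hag, correct_lag):
    """True when more LAG students than HAG students answered correctly."""
    return correct_lag > correct_hag

# Hypothetical question: 8 of 25 HAG and 14 of 25 LAG students answer correctly.
di = discrimination_index(8, 14, 50)
print(f"DI = {di:.2f}")                   # DI = -0.24
print(is_negative_discrimination(8, 14))  # True
```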
Item Analysis for Non-MCQs

For other objective-type questions, the responses can be arranged as 'a' (correct answer) and 'b' (wrong answer). After dividing the students into HAG and LAG as before, FV and DI can be calculated using the following formula:
Uses of Item Analysis

You may be wondering what purpose these calculations serve. Item analysis helps to detect specific technical flaws in a question and provides information for improving it. It increases the examiners' skill in item writing. It provides information for class discussion of the results. It helps students to improve their learning and helps teachers to identify common misconceptions in the class. Let us elaborate on some of these points.
This indicates that the students already know the subject area related to the objective of item 1 well, so it does not need much teaching time. The subject related to item 2 has been well taught and has been understood by most of the students. On the other hand, students were answering item 3 correctly before teaching but gave wrong answers after teaching. This indicates that either the question is poorly worded or the teaching has not been adequate.

(c) For tests that are used for selection, we prefer items with a high DI. As already stated, an item can have a maximum DI of 1.0, but this is difficult to attain. For practical purposes, an item with a DI of 0.35 or more is considered good, while a DI between 0.20 and 0.34 can be considered acceptable. Items with a DI of less than 0.20 need to be revised.
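The selection cut-offs above can be applied mechanically when screening a bank of items. The sketch below is a minimal illustration. The function name `classify_di` and the DI values are hypothetical. The text gives two-decimal bands, so this code assumes that values between 0.34 and 0.35 count as acceptable. A negative DI falls into the "revise" band.

```python
def classify_di(di):
    """Label a discrimination index using the cut-offs given in the text."""
    if di >= 0.35:
        return "good"
    if di >= 0.20:
        return "acceptable"
    return "revise"  # includes negative discrimination

# Hypothetical DI values for four items
for item, di in {"Q1": 0.48, "Q2": 0.27, "Q3": 0.12, "Q4": -0.24}.items():
    print(item, di, classify_di(di))
# Q1 0.48 good
# Q2 0.27 acceptable
# Q3 0.12 revise
# Q4 -0.24 revise
```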
The DI for the question will be:
Now suppose that, by mistake, the key is marked as 'a' instead of 'c'. In that case, the DI will become:
Also, a brilliant student who has read a very recent reference quoting a figure of, say, 52 will tick option 'd'. Thus, test and item analysis will give a clue to a wrong key and prevent injustice to many deserving candidates.
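One way to screen for a wrong key is to compute the DI for every option as if it were the key. If a distractor discriminates better than the keyed answer, the question is flagged. The sketch below assumes DI = 2(H − L) / N, where H and L are the numbers of HAG and LAG students choosing the option and N is the total number of students in both groups. The function names and option counts are hypothetical. They are not the figures from the original example.

```python
def di_per_option(hag_counts, lag_counts, total_students):
    """DI computed as if each option in turn were the key: 2 * (H - L) / N."""
    return {
        opt: 2 * (hag_counts.get(opt, 0) - lag_counts.get(opt, 0)) / total_students
        for opt in set(hag_counts) | set(lag_counts)
    }

def suspect_key(hag_counts, lag_counts, total_students, key):
    """Return the best-discriminating option if it beats the keyed option, else None."""
    dis = di_per_option(hag_counts, lag_counts, total_students)
    best = max(dis, key=dis.get)
    if best != key and dis[best] > dis.get(key, float("-inf")):
        return best
    return None

# Hypothetical option counts (25 students per group); the key was entered as 'a'.
hag = {"a": 3, "b": 2, "c": 18, "d": 2}
lag = {"a": 9, "b": 6, "c": 6, "d": 4}
print(di_per_option(hag, lag, 50)["a"])    # -0.24
print(suspect_key(hag, lag, 50, key="a"))  # c
```

A flagged question still needs to be checked by a subject expert. As the text notes, a distractor may attract able students for a legitimate reason, such as a more recent reference.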
Reliability of the Test
Do you recall our discussions on reliability? We discussed the various types of reliability. The one we are going to discuss here in detail is the internal consistency of the test. Internal consistency is calculated by dividing the whole test into odd- and even-numbered items, and hence the method is also called the split-half method. The following example illustrates how we calculate the reliability.
(Σ indicates summation.)
This gives the reliability of the half test as 0.03, which can be converted to the reliability of the full test by using the Spearman-Brown formula, $r_{\text{full}} = \dfrac{2\,r_{\text{half}}}{1 + r_{\text{half}}}$.
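The split-half calculation can be sketched in Python. This is a minimal illustration, assuming each student's marks are stored as a row of 0/1 scores in item order. The function names and the 5-student, 4-item data set are hypothetical. The odd and even totals are correlated with the Pearson coefficient, and the result is stepped up with the Spearman-Brown formula. The Pearson function divides by zero if every student has the same odd total or the same even total.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_brown(r_half):
    """Step a half-test correlation up to full-test reliability: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """item_scores: one row per student, one 0/1 mark per item, in item order."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return r_half, spearman_brown(r_half)

# Hypothetical marks: 5 students x 4 items (1 = correct, 0 = wrong).
marks = [
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(marks)
print(f"half-test r = {r_half:.2f}, full-test r = {r_full:.2f}")
# half-test r = 0.79, full-test r = 0.88
```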
This indicates that the reliability of the test is poor. One of the prime reasons for such a low reliability is the small number of items in the test. From standard statistical tables, you can find the reliability figure that would be statistically acceptable; for a group of 100 students, this value is 0.27. How do you attain this? By increasing the length of the test, using the following formula:
Building Internal Consistency
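$$N = \frac{r_{\text{want}} \times (1 - r_{\text{got}})}{r_{\text{got}} \times (1 - r_{\text{want}})} = \frac{0.27 \times (1 - 0.05)}{0.05 \times (1 - 0.27)} \approx 7$$

where $r_{\text{want}}$ is the reliability you want, $r_{\text{got}}$ is the reliability you got, and $N$ is the number of times the test must be lengthened.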
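The lengthening formula (the Spearman-Brown prophecy formula) is easy to evaluate directly. The sketch below uses the text's rounded values, 0.27 and 0.05, which give a factor of about 7.0. It also assumes an original test of 10 items, as the 70-item figure in the text implies. The function name `lengthening_factor` is hypothetical.

```python
def lengthening_factor(r_want, r_got):
    """Spearman-Brown prophecy: how many times longer the test must be
    for its reliability to rise from r_got to r_want."""
    if not (0 < r_got < 1 and 0 < r_want < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return (r_want * (1 - r_got)) / (r_got * (1 - r_want))

factor = lengthening_factor(0.27, 0.05)
print(f"lengthen by a factor of {factor:.2f}")        # lengthen by a factor of 7.03
print("items needed (from 10):", round(factor * 10))  # items needed (from 10): 70
```

The formula assumes that the added items are of similar quality and FV to the existing ones. The estimate is only as good as that assumption.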
This result means that, to reach an acceptable reliability, the test would have to be about 7 times longer; in other words, it would need about 70 items of a similar level of FV.

Standard error of measurement (SEM): SEM is a concept related to the reliability of a test. It depends on the number of items in the test and is calculated by the formula

$$\text{SEM} = 0.4\sqrt{n}$$

where $n$ is the number of items in the test. For example, the SEM for a test of 20 items would be $0.4\sqrt{20} \approx 1.8$, and the SEM for a test of 100 items would be $0.4\sqrt{100} = 4$.

Does it sound odd that the longer the test, the higher the SEM? Look at it this way: for 20 items of 1 mark each, the SEM represents approximately 9% of the total marks, while for 100 items of 1 mark each, it represents only 4%. What do these figures mean? Just as with a standard deviation, about two-thirds of the students would obtain marks within 1 SEM above or below what they actually deserve, and about 95% would obtain marks within ±2 SEM.

New Items

We have already emphasised that, before we actually use a test, we must have the data for each item available. You may be wondering how you will obtain these figures if you have written a few new items. One way to calculate the various indices for such questions is to give them a trial run. For example, in an actual test, the first 20 questions can be new questions. The students answer them and are marked on them, but the scores on these 20 questions are not used in computing the results. They are used only for calculating FV and DI. A question is used in an actual test only after it has been found to have a satisfactory FV and DI.
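The SEM rule of thumb given earlier in this section can be checked numerically. The sketch assumes 1 mark per item, as in the text's example, and a hypothetical student with an observed score of 12 out of 20. The ±1 and ±2 SEM bands correspond to the "about two-thirds" and "95%" statements. The function names `sem` and `score_band` are hypothetical.

```python
import math

def sem(n_items):
    """Rule-of-thumb standard error of measurement from the text: 0.4 * sqrt(n)."""
    return 0.4 * math.sqrt(n_items)

def score_band(observed, n_items, k=1):
    """Range of k SEMs either side of an observed score."""
    s = sem(n_items)
    return observed - k * s, observed + k * s

for n in (20, 100):
    s = sem(n)
    # With 1 mark per item, the maximum mark equals n.
    print(f"{n} items: SEM = {s:.2f} marks, i.e. {100 * s / n:.0f}% of the maximum")
# 20 items: SEM = 1.79 marks, i.e. 9% of the maximum
# 100 items: SEM = 4.00 marks, i.e. 4% of the maximum

low, high = score_band(12, 20, k=2)
print(f"12/20 with a 2-SEM band: {low:.1f} to {high:.1f}")  # 12/20 with a 2-SEM band: 8.4 to 15.6
```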