Quality Control
Metric Quality of Tests
Learning is evaluated by means of tests and questions (i.e., "items"). The results obtained after administering a test are used to make various decisions: promoting a student, selecting candidates for admission to a school or to a sector of employment, or planning school intervention programs, among many other possible uses. It is therefore essential that the instruments used to gather this information meet high standards of quality.
Statistical techniques exist to assess the metric qualities of a test; we present a few of them here, along with some related concepts. We address item analysis from two angles: the metric quality of the items themselves, and the absence of bias penalizing certain subgroups of the students evaluated.
A. Item Analysis
It is important to examine the qualities of a measuring instrument and of the items that compose it; this statistical process is usually referred to as item analysis. Several techniques can be used to assess the metric quality of a test. We briefly describe what is meant by classical item analysis and by techniques based on item response theory.
1. Classical item analysis
Classical item analysis applies to multiple-choice items as well as to items scored on a continuous scale or with partial credit. It allows the metric quality of each item in a test to be estimated. Three sets of statistical characteristics are usually retained.
The EduStat software performs such analyses. For each multiple-choice item of a test, the report shows the number of individuals, the proportion who answered the item correctly, and the item/test correlation coefficient. It is also possible to obtain additional statistics (e.g., the success percentages of those who answered the item correctly and of those who failed it, the Alpha coefficient recalculated excluding the examined item, discrimination statistics for each response choice, the grouping of items relative to the table of specifications, and the description of each item). Moreover, for each item whose answers are on a continuous scale (partial credit), the report provides the number of individuals, the minimum and maximum observed scores, the mean obtained, that mean expressed on a 0-100 scale, and the item/test correlation coefficient. For the test as a whole and for each grouping of items, the report provides the internal consistency coefficient in the form of Cronbach's Alpha, the standard error of measurement, and the standard deviation.
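By way of illustration, here is a minimal Python sketch, not EduStat's implementation, that computes the classical statistics just listed from a small hypothetical 0/1 response matrix: item difficulty (proportion correct), corrected item/test correlation, Cronbach's Alpha, and the standard error of measurement.

    import numpy as np

    # Hypothetical 0/1 response matrix: rows = students, columns = items.
    responses = np.array([
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [1, 1, 1, 1],
        [0, 0, 0, 1],
        [1, 1, 0, 0],
        [1, 0, 1, 1],
    ])

    n_students, n_items = responses.shape
    total = responses.sum(axis=1)            # each student's total score

    # Item difficulty: proportion of students answering the item correctly.
    difficulty = responses.mean(axis=0)

    # Item/test discrimination: correlation between the item score and the
    # total score on the remaining items (corrected item-total correlation).
    discrimination = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(n_items)
    ])

    # Cronbach's Alpha: internal consistency of the whole test.
    item_var = responses.var(axis=0, ddof=1)
    total_var = total.var(ddof=1)
    alpha = (n_items / (n_items - 1)) * (1 - item_var.sum() / total_var)

    # Standard error of measurement, from the test SD and Alpha.
    sem = np.sqrt(total_var) * np.sqrt(1 - alpha)

    print("difficulty:", difficulty)
    print("discrimination:", discrimination)
    print(f"alpha = {alpha:.3f}, SEM = {sem:.3f}")

The same quantities generalize directly to partial-credit items by replacing the 0/1 matrix with the observed score scale.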
2. Item analysis using item response theory
Item response theory is a set of statistical techniques whose aim is to assess the quality of a measuring instrument and of the items that make it up. It is possible to retain models taking into account one or another of the following parameters (or all three): the difficulty of the item, its discrimination, and a pseudo-guessing parameter accounting for correct answers obtained by chance.
Specialized software exists to perform these calculations. EduStat allows the preparation of files that can be used by such software. Furthermore, EduStat can use the results calculated by other software (for example, XCalibre) to trace curves illustrating the results of the analysis performed for each item of a test.
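As an illustration of the three-parameter model mentioned above, here is a minimal Python sketch, independent of both EduStat and XCalibre, that computes the probability of a correct answer under the three-parameter logistic (3PL) model and traces the corresponding item characteristic curve; the parameter values are hypothetical.

    import numpy as np
    import matplotlib.pyplot as plt

    def p_correct(theta, a, b, c):
        """3PL model: probability of a correct answer for ability theta,
        given discrimination a, difficulty b, and pseudo-guessing c."""
        return c + (1 - c) / (1 + np.exp(-a * (theta - b)))

    # Hypothetical item parameters, as calibration software might estimate them.
    a, b, c = 1.2, 0.5, 0.2

    theta = np.linspace(-4, 4, 200)          # ability scale
    plt.plot(theta, p_correct(theta, a, b, c))
    plt.xlabel("Ability (theta)")
    plt.ylabel("P(correct)")
    plt.title("Item characteristic curve (3PL)")
    plt.show()

Setting c = 0 gives the two-parameter model, and additionally fixing a at a common value gives the one-parameter (Rasch) model.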
B. Analysis of Differential Item Functioning (DIF)
It is important to ensure that the items making up an evaluation instrument do not favor one subgroup of students to the detriment of others. Such an examination may touch on several socio-economic and cultural dimensions. We may want to check for the presence of bias between two subgroups of students (for example, boys and girls) or among more than two (e.g., administrative regions); the analysis techniques differ for each of these situations. Items identified as "biased" on the basis of the calculated indices should be removed from the test or modified.
1. The "omnibus" analysis
The technique called "omnibus" allows indices to be calculated for several groups simultaneously. The report produced contains a set of statistics for each item selected for analysis. In addition, a graph illustrates the fit of the theoretical distribution to the empirical distribution by plotting rank values on the "X" axis against DR values on the "Y" axis. A good fit between the "rank" and "DR" values yields a graph in which the points (x, y) form a straight line; any point deviating markedly from this line is interpreted as a potentially biased item.
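A minimal Python sketch of this graphical check follows, assuming the rank and DR values are already available from the report (the values below are hypothetical); the two-standard-deviation cutoff used to flag deviating points is an illustrative choice, not EduStat's rule.

    import numpy as np

    # Hypothetical (rank, DR) pairs as they might appear in the omnibus
    # report; in practice both columns come from the EduStat output.
    rank = np.arange(1, 11, dtype=float)
    dr = np.array([0.12, 0.25, 0.33, 0.46, 0.51, 1.40,
                   0.72, 0.81, 0.93, 1.02])

    # Fit a straight line through the points (least squares).
    slope, intercept = np.polyfit(rank, dr, 1)
    residuals = dr - (slope * rank + intercept)

    # Flag points deviating markedly from the line; the 2-SD cutoff
    # is an illustrative choice.
    cutoff = 2 * residuals.std(ddof=1)
    for item, res in zip(rank.astype(int), residuals):
        if abs(res) > cutoff:
            print(f"item at rank {item}: residual {res:.2f} -> potentially biased")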
2. Analysis using two groups
When differential item functioning is examined by comparing two groups, the Mantel-Haenszel technique can be used. The report produced by the EduStat software when the "M-H" coefficient calculation options are used contains the statistics described below.
Here are some notes for interpreting the statistics produced by the Mantel-Haenszel differential item functioning analysis.
Items identified by the label "C" are those most likely to have biases affecting their functioning: the absolute value of Delta is equal to or greater than 1.5 and is significantly different from 1.
As for the items least likely to be biased (category "A"), the absolute value of Delta is less than 1 or is not significantly different from 0.
Other items, whose absolute Delta value falls between 1 and 1.5, are placed in category "B" and may be slightly biased.
A positive Delta value indicates that the item was harder for students belonging to the reference group; a negative value identifies an item that was more difficult for the focal group than for the reference group.
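To make this classification concrete, here is a minimal Python sketch of the Mantel-Haenszel statistic for a single item, using hypothetical counts stratified by total score. It is not EduStat's implementation, and it omits the significance test that the full procedure also applies when assigning categories "A" and "C".

    import math

    # Hypothetical 2x2 counts per total-score stratum for one item:
    # (reference correct, reference wrong, focal correct, focal wrong).
    strata = [
        (30, 10, 25, 15),
        (40, 20, 30, 30),
        (20, 30, 10, 40),
    ]

    # Mantel-Haenszel common odds ratio across the score strata.
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    alpha_mh = num / den

    # ETS Delta scale: Delta = -2.35 * ln(alpha_MH).
    delta = -2.35 * math.log(alpha_mh)

    # Classification sketch (significance tests omitted).
    if abs(delta) < 1.0:
        category = "A"
    elif abs(delta) < 1.5:
        category = "B"
    else:
        category = "C"

    print(f"alpha_MH = {alpha_mh:.3f}, Delta = {delta:.2f}, category {category}")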
Email address: info@mcpythagore.com