Rejecting Doubtful Data Values Guide
Suspicious points
If a dataset contains a value that is appreciably different from
all the others, there is a strong possibility that it is erroneous, the
result of a gross error. One must choose whether to keep or reject the
value. If an erroneous value is kept, the mean of the data, and also
their standard deviation, will be distorted. On the other hand,
there is naturally the possibility that the questionable value is in fact
valid and merely unexpected; in this case it may be that the
precision of the analytical procedure is lower than expected. Great care
must be taken, since rejecting a valid data point introduces a bias
into the data.
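As a rough illustration of how a single suspect value distorts the summary statistics, the sketch below compares the mean and the sample standard deviation with and without the suspect point. The data values and the helper name `summarize` are invented for this illustration and are not taken from this guide.

```python
from statistics import mean, stdev

def summarize(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    return mean(values), stdev(values)

# Hypothetical replicate measurements; the last value looks suspicious.
data = [10.1, 10.2, 10.0, 10.3, 12.5]

mean_all, sd_all = summarize(data)               # suspect value kept
mean_trimmed, sd_trimmed = summarize(data[:-1])  # suspect value dropped

print(f"with suspect value:    mean = {mean_all:.2f}, s = {sd_all:.2f}")
print(f"without suspect value: mean = {mean_trimmed:.2f}, s = {sd_trimmed:.2f}")
```

Keeping the suspect point shifts the mean toward it and inflates the standard deviation, which is the distortion described above.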
The Q test
If a data point is suspected of being questionable, the Q test allows a
quotient, Q_exp, to be calculated and compared with a table in order to decide
whether to reject or keep the value. The test does not yield a definitive
result, but it gives some idea of the confidence that can be associated with
rejecting a data value. It is calculated with the equation:
$$Q_{exp} = \frac{x_q - x_n}{x_h - x_1}$$
where x_q represents the doubtful value, x_n is the nearest neighboring value,
x_h is the data point with the maximum value, and x_1 is the data point with
the minimum value.
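The quotient can be sketched in a few lines of Python. The function name `q_exp` and the sample readings are hypothetical. Taking the absolute value of the numerator is an added assumption, so that a suspiciously low value also gives a positive quotient.

```python
def q_exp(values, questionable):
    """Return Q_exp = |x_q - x_n| / (x_h - x_1) for one questionable value."""
    if questionable not in values:
        raise ValueError("questionable value must be one of the data points")
    data_range = max(values) - min(values)  # x_h - x_1
    if data_range == 0:
        raise ValueError("all values are identical, so Q is undefined")
    others = list(values)
    others.remove(questionable)  # drop a single occurrence of x_q
    # x_n: the remaining value closest to the questionable one
    nearest = min(others, key=lambda v: abs(v - questionable))
    # Absolute value is an assumption so that low outliers also give Q > 0.
    return abs(questionable - nearest) / data_range

# Hypothetical replicate readings, invented for this illustration.
readings = [5.12, 5.15, 5.10, 5.14, 5.40]
q_value = q_exp(readings, 5.40)
print(f"Q_exp = {q_value:.3f}")
```

The function only computes the quotient. Deciding whether to reject the point still requires the table of critical values.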
Having calculated Q_exp, compare it with the table values that
correspond to the number of replicate measurements made. If
Q_exp is smaller than all the values in the table, the data point cannot be
rejected with any of the confidence levels listed there. If Q_exp is greater
than a value of Q that appears in the table, the data point can be rejected
with at least the confidence associated with that Q. Frequently,
Q_exp falls between two tabulated values, and in this case the data point can
be rejected with a confidence between the two levels shown.
No. of replicate    Rejection with    Rejection with    Rejection with
measurements        90% confidence    95% confidence    99% confidence
3                   0.941             0.970             0.994
4                   0.765             0.829             0.926
5                   0.642             0.710             0.821
6                   0.560             0.625             0.740
7                   0.507             0.568             0.680
8                   0.468             0.526             0.634
9                   0.437             0.493             0.598
10                  0.412             0.466             0.568
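The table lookup can also be expressed in code. The following is a minimal sketch: the critical values are copied from the table above, while the names `Q_TABLE` and `rejection_confidence` and the Q values passed to the function are assumptions made for illustration.

```python
# Critical Q values from the table above:
# number of replicates -> {confidence level in %: critical Q}
Q_TABLE = {
    3: {90: 0.941, 95: 0.970, 99: 0.994},
    4: {90: 0.765, 95: 0.829, 99: 0.926},
    5: {90: 0.642, 95: 0.710, 99: 0.821},
    6: {90: 0.560, 95: 0.625, 99: 0.740},
    7: {90: 0.507, 95: 0.568, 99: 0.680},
    8: {90: 0.468, 95: 0.526, 99: 0.634},
    9: {90: 0.437, 95: 0.493, 99: 0.598},
    10: {90: 0.412, 95: 0.466, 99: 0.568},
}

def rejection_confidence(q_value, n):
    """Return the highest tabulated confidence level (in %) whose critical Q
    is exceeded by q_value for n replicates, or None if none is exceeded."""
    if n not in Q_TABLE:
        raise ValueError("the table covers 3 to 10 replicates only")
    passed = [level for level, q_crit in Q_TABLE[n].items() if q_value > q_crit]
    return max(passed) if passed else None

# Hypothetical quotients for six replicates.
print(rejection_confidence(0.60, 6))  # between the 90% and 95% entries
print(rejection_confidence(0.40, 6))  # below every entry in the row
```

A return value of 90 means "at least 90%, but less than 95%", which matches the in-between case described in the text. `None` means the point cannot be rejected at any tabulated level.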
Example
A series of replicate measurements of the water content in a sample of
ethanol by the Karl Fischer method gave the following results:
0.71%
0.65%
0.68%
0.72%
0.91%
Solution:
Calculate Q_exp and compare it with the Q table.
x_q = 0.91% is the questionable value
x_n = 0.72% is the nearest neighboring value
x_h = 0.91% is the maximum value of the data
x_1 = 0.65% is the minimum value of the data
$$Q_{exp} = \frac{0.91 - 0.72}{0.91 - 0.65} = \frac{0.19}{0.26} = 0.73$$
Compare Q_exp with the Q test table, using the values that
correspond to five data points. The Q values for five data points are 0.642
to reject a value with 90% confidence, 0.710 to reject it with 95% confidence,
and 0.821 to reject it with 99% confidence. Since 0.710 < 0.73 < 0.821, the
value 0.91% can be rejected with a confidence between 95% and 99%.
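A related statistic, T_n, is calculated with the equation:
$$T_n = \frac{x_q - x_n}{s}$$
where x_q is the questionable value, x_n is the value of the nearest
neighboring data point, and s is the standard deviation of the data.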
In this case, the value of T_n is likewise compared with a standard table for
the t-test, for the appropriate number of replicate measurements.
Example:
If the data from the previous example are taken as replicate measurements of
water content in an organic solvent, how confident can one be in
rejecting the value 0.91%?
0.71%
0.65%
0.68%
0.72%
0.91%
s = 0.10% H2O
x̄ = 0.73% H2O
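The summary statistics quoted for this example can be checked with a short script, which then evaluates T_n exactly as the equation is written in this guide, with x_n taken as the nearest neighboring value. This is a sketch only: the variable names are illustrative, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, chosen because it reproduces the quoted s = 0.10%. No critical-value comparison is attempted, since no T_n table appears here.

```python
from statistics import mean, stdev

# Karl Fischer water-content results from the example, in % H2O.
water = [0.71, 0.65, 0.68, 0.72, 0.91]
x_q = 0.91  # questionable value

x_bar = mean(water)  # mean of all five results
s = stdev(water)     # sample standard deviation (assumed n - 1 form)

# x_n: the value closest to x_q among the remaining results
x_n = min((v for v in water if v != x_q), key=lambda v: abs(v - x_q))

# T_n following the equation as stated in this guide
t_n = (x_q - x_n) / s

print(f"mean = {x_bar:.2f}% H2O, s = {s:.2f}% H2O")
print(f"x_n = {x_n:.2f}%, T_n = {t_n:.2f}")
```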