RP-03 Field: Mass Calibration Performance Monitoring

N.Dupuis-Désormeaux, Senior Engineer - Gravimetry
January 2003


Table of Contents

1. Abstract

2. Monitoring Method - When to update the “base” values

3. Data Required for Analysis

3.1 Tolerance Testing (n = 1) and Comparative Weighing (n > 1)

3.1.1 Group Mean
3.1.2 Group Variance

3.2 Computing the base values

3.2.1 Process Global Mean
3.2.2 Process Variance

3.3 Procedure for Gathering Data

3.3.1 Initial Evaluation (Initial 15 Days Test)
3.3.2 Step by Step Procedure

4. Statistical Analysis

4.1 Comparison of Means Obtained from Two Groups of Data

4.1.1 The T-test
4.1.2 The Z-test

4.2 Comparison of Variances from Two Groups of Data

4.2.1 The F-test

5. Performance Criteria

6. Summary - At a Glance

7. References

Annex A: Percentage Points of the T Distribution for the T-test

Annex B: Cumulative Normal Distribution for the Z-test

Annex C: Percentage Points of the F Distribution for the F-test

Annex D: Background Information - Extracts of References

Abstract

In recent years, quality assurance and quality control mechanisms have not only gained popularity within the private sector but within the public sector as well. This increase in process monitoring needs has prompted the development of more advanced statistical evaluation techniques.

When a process is monitored for conformance to fixed statistical limits based on a history of experimental data, the process properties themselves, such as the process mean and variance, must be re-evaluated periodically.

For example, to monitor mass calibration activities, two known standards are initially compared over a minimum of fifteen days, after which the mean of their differences and the associated variance are calculated. These parameters are referred to as the initial "base" values for the process global mean and process variance. After an additional fifteen days of data have been accumulated, a new group mean and variance for the comparison are computed. If, since the last base values were established, one of the standards has been inadvertently dropped or scratched, or the device used to measure this difference has suffered damage, then the new mean difference between the two standards, computed from the additional fifteen days of data, will likely differ significantly from the previously calculated value (the base value of the process global mean). Such a variation is an actual change in the process properties, and past data are no longer relevant. To differentiate statistical variation from real process variation, we perform the statistical tests proposed herein. These statistical methods are widely used to evaluate whether data obtained from two experiments are statistically different or can be combined to form one larger pool of values; in statistical terms, they evaluate whether the two samples have been drawn from the same population. In other words, these methods are employed to detect significant changes in process properties and to determine when new data can be combined with existing data.


To ensure that the process remains in statistical control, the process global mean and the process variance must be monitored and updated regularly. The frequency of this verification is dependent upon operational priorities, but as a minimum, entering values for each check standard should be performed once per month.


Monitoring Method

When to update the "base" values

The base values of the process global mean (µ) and process uncertainty u(Di) (see section 3 for definitions) for a given mass standard cannot be updated until the number of new points accumulated equals the number used to calculate the previous base values, and until the statistical tests show that the samples can be combined. For example, looking at the base value for the process uncertainty, if an initial uncertainty is computed after fifteen days, call it u1(Di), then u1(Di) is the initial base value for the process uncertainty u(Di). In this case, an additional fifteen days of testing is required before the base value can be changed. Hence, the second group uncertainty, call it u2(Di), must be calculated from at least fifteen new sets of values. Once the statistical analysis presented in this recommended practice (RP) is performed, it can be decided whether both groups of data can be combined. If they can be combined, the new base value for the process uncertainty is calculated from the combination of the thirty sets of values. This new base value can then be modified only once an additional thirty sets of values are compiled and the corresponding u3(Di) is computed and statistically compared to the base value. This evaluation continues until 120 points are accumulated for the base value, after which any new uk(Di) must be calculated from at least 120 new points. In this case, the mean and variance are compared as usual; however, although all points are retained for the process uncertainty base value u(Di), only the last 120 points are retained for the computation of the process global mean base value (µ).
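The update rule described above can be sketched in a few lines of Python. This is only an illustration, not part of the recommended practice: the function names and the constant `MAX_GROUP_SIZE` are hypothetical, and `tests_passed` stands for the outcome of the statistical tests of section 4.

```python
MAX_GROUP_SIZE = 120  # beyond this, a new group needs at least 120 points (from the text)


def required_new_points(base_points: int) -> int:
    """Number of new points needed before the base values may be re-evaluated.

    The new group must match the size of the data behind the current base
    values, capped at MAX_GROUP_SIZE.
    """
    if base_points < 1:
        raise ValueError("base_points must be a positive integer")
    return min(base_points, MAX_GROUP_SIZE)


def may_update_base_values(base_points: int, new_points: int, tests_passed: bool) -> bool:
    """True only if enough new points exist AND the statistical tests allow combining."""
    return new_points >= required_new_points(base_points) and tests_passed


# Progression described in the text: 15 -> 30 -> 60 -> 120, then 120 thereafter.
progression = [required_new_points(n) for n in (15, 30, 60, 120, 240)]
```

With these assumptions, a base value built on 15 points needs 15 new points, one built on 30 needs 30, and any base value built on 120 or more points needs 120.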

If the total number of new points accumulated is greater than or equal to the number used to calculate the previous base values for a given standard, and the statistical tests presented in this RP show that the data cannot be combined, the cause of this discrepancy shall be identified and documented. Such a result indicates that at least one element of the process has changed sufficiently to warrant the use of the updated values. Please note that this is also the case when any of the standards used in establishing the process global mean and process uncertainty (or process variance) at a given nominal value is re-calibrated, as this will affect the base value of the process global mean. Likewise, should a mass comparator be repaired or replaced, this will have an effect on the base value of the process uncertainty. In both these cases, the existing base values for the affected range of nominal value(s) shall be noted and a statement explaining the modification or recalibration shall be entered in the Equipment Performance Log Book. Please note that, in this case, the new base value for the process global mean shall be set to the value of the most recent group mean obtained. However, the old base value for the process uncertainty shall continue to be used for the calculation of the combined uncertainties until a new group of 15 days of data is accumulated. At this point, new data shall be gathered for a minimum of 15 days and the values evaluated and updated as discussed previously.


Even if insufficient points have been accumulated to warrant a change to the process global mean and process uncertainty base values, it is crucial to note the values for each set as well as for each group and ensure that the minimum performance criteria of section 5 are met.


Data Required for Analysis



The base values of the process global mean (µ) and process uncertainty u(Di) involved in the calibration of weights are initially estimated over a minimum of fifteen days, where comparative weighing is performed every day using the same standards called check standards. Please note that the process global mean and process uncertainty must be calculated for each nominal value (or range of values), for each method/equipment employed. For example, u(Di) will be different if an equal arm balance is used rather than an electronic mass comparator.

If it is not feasible to monitor the process global mean (µ) and process uncertainty u(Di) for each nominal value, ranges of nominal values can be monitored. The usual and most representative method for grouping is based on which devices are used for the calibration. In this case, one or more relative u(Di) values are computed for given ranges of masses to be calibrated on a given mass comparator or balance.

Check standards are compared to the district standards by means of comparative weighing. Please note that district standards cannot be used as check standards because the system would be redundant as we would be comparing masses to themselves. The check standards should be of the same nominal value and accuracy class as the weights involved in the process being monitored. In other words, for monitoring field calibrations we need check standards of the same class as the field working standards.


The check standards should be dedicated to this end and be clearly identified as such by, for example, being painted a different colour from the other weights and stored in a location ensuring they remain undamaged and clean. The process global mean µ and process uncertainty u(Di) should be continuously monitored and adjusted using the check standards. An Equipment Performance Log Book should be produced.

Process Global Mean (µ)

The base value of the process global mean (µ) is initially established over a minimum of fifteen calibration days (not necessarily consecutive), where comparative weighing with “n=4” comparisons (see section 3.3.1) is performed once per day for each day of calibration. This comparison establishes the difference in mass between a check standard and a district standard. The base value for the process global mean (µ) is initially set equal to the first group mean (µk=1) computed. Subsequently, after sufficient points have been accumulated, a new group mean is computed, the statistical tests in this RP are performed and a new base value for the process global mean is established, if warranted (see section 3.2.1 for more details).

Process Uncertainty u(Di)

The base value for the process uncertainty u(Di) is initially set equal to the first group uncertainty uk=1(Di) computed. The group uncertainty uk(Di) is derived from the observations made in computing the group mean (µk). Subsequently, after sufficient points have been accumulated, a second group uncertainty u2(Di) is computed, the statistical tests in this RP are performed and a new base value for the process uncertainty u(Di) is established, if warranted (see section 3.2.2 for more details).

As discussed in RP-02FIELD: Determination of Mass Calibration Values and Related Uncertainties, the process variance u²(Dt) for a given mass, and likewise its process uncertainty u(Dt), is a function of two main components: the short-term variance (Si²), also called within-group variance; and the long-term variance u²(LTi), also called between-groups variance.

$$u^2(D_t) = S_i^2 + u^2(LT_i)$$

The short-term variance (Si²)

The short-term variance (Si²), also known as the repeatability, is based on the individual results of the calibration of the given mass and should be entered directly into its equation for the process variance u²(Dt) at the time of calibration.


Therefore, the short-term variance (Si²) portion of the Process Uncertainty u(Dt) for a given mass is NOT monitored and should be entered into the equation at the time of calibration of the given mass.


The long-term variance u²(LTi)

The long-term variance u²(LTi), also known as the reproducibility, is a measure of the long-term performance of devices, methods and environment involved in calibrating a given mass. Since it is most often impractical or impossible to obtain long-term data on each mass calibrated, the behaviour of a Check Standard of equivalent nominal value is monitored instead. Hence, u²(LTi) of a given mass is estimated by tracking the variations in the mass difference obtained when comparing a Check Standard to a District Standard and establishing u²(Di). This value IS CONTINUOUSLY MONITORED and adjusted, as described in this recommended practice.

Please recall that u²(Di) for a Check Standard must be calculated for each nominal value (or range of values) and for each method employed. Therefore u²(Di) is continuously updated as described further and is downloaded directly into the calculations for the process variance of the mass being calibrated u²(Dt) as per RP-02Field: Determination of Mass Calibration Values and Related Uncertainties.


Therefore, the long-term variance u²(LTi) portion of the Process Variance u²(Dt) for a given mass is approximated by using the base value for the process variance obtained with the check standard u²(Di); this value IS monitored and should be downloaded into the equation for u(Dt) at the time of calibration of the given mass.


3.1 Tolerance Testing (n=1) and Comparative Weighing (n>1)

Tolerance testing (and single comparison weighing) is when the measuring process only requires one comparative weighing; i.e. n=1 from section 3.3.1. In this case, because only one comparative weighing is performed, there is only one value in a daily set.

When section 3.3.1 shows that more than one reading is required we call this comparative weighing with “n” comparisons. In this case, there are “n” values in each daily set.

3.1.1 Group Mean µk

The group mean computed for the check standard when compared to a district standard for M sets of values is calculated as follows:


Group Mean µk

$$\mu_k = \frac{1}{M} \sum_{\zeta=1}^{M} x_\zeta$$

xζ = the mass value mt computed for the Check Standard on day (set) ‘ζ’. If more than one reading of the Check Standard is required (n>1) in order to meet the performance requirements, the average value is retained as xζ.
M = number of points in that group; i.e. number of monitoring days for that Check Standard for this group of data.
µk = the group mean of all xζ values obtained for the check standard over the total number ‘M’ of days.


3.1.2 Group Variance uk²(Di)

Recall from RP-02Field that the Process Uncertainty u(Dt) for a given mass is a function of two main components: short-term variance (Si²) and long-term variance u²(LTi). The short-term variance is based on the individual calibration of the given mass and the long-term variance is estimated with the base value of the Process Variance u²(Di) for the Check Standard of the same (or similar) value as the mass being calibrated. In other words,

$$u^2(D_t) = S_i^2 + u^2(LT_i)$$

$$u^2(D_t) \approx S_i^2 + u^2(D_i)$$

Now, as discussed previously, the base value of the Process Variance u²(Di) is initially established with the first Group Variance u1²(Di) (k = 1), then monitored, and then updated as described in 3.2.2.

This implies that, initially, we set:

$$u^2(D_i) = u_{k=1}^2(D_i) = u_1^2(D_i)$$

Each time a Group Variance uk²(Di) is to be calculated, with either tolerance testing, single comparative weighing or comparative weighing with “n” comparisons, we use the following:

Group Variance uk²(Di)

$$u_k^2(D_i) = \frac{1}{M-1} \sum_{\zeta=1}^{M} \left(x_\zeta - \mu_k\right)^2 \tag{3.1.2}$$

xζ = the mass value mt computed for the Check Standard on day (set) ‘ζ’. If more than one reading of the Check Standard is required (n>1) in order to meet the performance requirements, the average value is retained as xζ.
M = number of points in that group; i.e. number of monitoring days for that Check Standard for this group of data.
µk = the group mean of all xζ values obtained for the check standard over the total number ‘M’ of days.
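As an illustration of sections 3.1.1 and 3.1.2, the group mean and group variance of one group of daily values can be computed as follows. This is a minimal sketch: the function name `group_statistics` and the sample data are hypothetical, and the divisor M - 1 (the usual sample variance) is an assumption.

```python
from statistics import mean, variance


def group_statistics(daily_values):
    """Return (group mean, group variance, group uncertainty) for M daily values.

    Each entry of daily_values is one x_zeta: the check-standard value for one
    day (already averaged over the n readings of that day when n > 1).
    """
    if len(daily_values) < 2:
        raise ValueError("at least two daily values are needed for a variance")
    mu_k = mean(daily_values)
    # Sample variance with divisor M - 1 (assumption, see lead-in).
    var_k = variance(daily_values, mu_k)
    return mu_k, var_k, var_k ** 0.5


# Hypothetical daily differences, in mg, for a short group of four days.
mu_k, var_k, u_k = group_statistics([1.0, 2.0, 3.0, 4.0])
```

In practice a group holds at least fifteen daily values; four are used here only to keep the arithmetic easy to check by hand.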


3.2 Computing the base values

After performing the statistical tests presented in the following sections, if it is decided that the data for a newly computed group can be combined with the existing base values for the process global mean and the process variance, the following equations are used.

From the Central Limit Theorem, we know that the Process Global Mean µ will be equal to the average of the Group Means. This implies that the average of all points (in the population) will be equal to the average of the Group Means. Further, the Central Limit Theorem says that the standard deviation obtained when comparing the Group Means for groups of ‘n’ points taken from the population will be equal to the standard deviation of the population divided by the square root of the number ‘n’ of points in each group.

3.2.1 Process Global Mean µ

Using the Central Limit Theorem, we have:

Initially:

$$\mu = \mu_{k=1}$$

Subsequently, by the Central Limit Theorem:

$$\mu = \frac{1}{N} \sum_{\zeta=1}^{N} x_\zeta \tag{3.2.1}$$

µ = the base value of the Process Global Mean of all µk values obtained for that Check Standard for ‘N’ days.
xζ = the mass value mt computed on each day ‘ζ’ for the Check Standard. If multiple readings are required each day, then the average of the daily readings is the value of xζ.
N = the total combined number of days for which we calculated a value of the Check Standard. In other words, it is the sum of the number of points in the previous group and the number of points in the new group.


3.2.2 Process Variance u²(Di)

Using again the Central Limit Theorem, we have:

$$u^2(D_i) = \frac{1}{N-1} \sum_{\zeta=1}^{N} \left(x_\zeta - \mu\right)^2 \tag{3.2.2}$$

u²(Di) = the base value for the Process Variance of the Check Standard ‘i’.
xζ = the mass value mt computed on each day ‘ζ’ for the Check Standard. If multiple readings are required each day, then the average of the daily readings is the value of xζ.
µ = the base value of the Process Global Mean.
N = the total combined number of days for which we calculated a value of the Check Standard. In other words, it is the sum of the number of points in the previous group and the number of points in the new group.
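Once the statistical tests allow two groups to be combined, the new base values follow from pooling the daily values of both groups. The sketch below is illustrative only: `combine_base_values` and the sample data are hypothetical, the divisor N - 1 is an assumption, and the rule of section 2 that keeps only the last 120 points for the mean is not modelled.

```python
from statistics import mean, variance


def combine_base_values(previous_points, new_points):
    """Pool two groups of daily values and return (mu, u2_Di, N).

    mu    : base value of the process global mean
    u2_Di : base value of the process variance (divisor N - 1, an assumption)
    N     : total combined number of days
    """
    all_points = list(previous_points) + list(new_points)
    if len(all_points) < 2:
        raise ValueError("at least two points are needed")
    mu = mean(all_points)
    u2_di = variance(all_points, mu)
    return mu, u2_di, len(all_points)


# Hypothetical values: a previous group of three days and a new group of three days.
mu, u2_di, n_total = combine_base_values([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

The process uncertainty u(Di) is then the square root of `u2_di`.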


3.3 Procedure for Gathering Data

3.3.1 Initial Evaluation (Initial 15 days test)

As discussed at the beginning of section 3, the base values of the Process Global Mean (µ) and Process Uncertainty u(Di) involved in the calibration of weights are initially estimated over a minimum of fifteen days.

This is done by comparing the District Standards to Check Standards by performing comparison weighing to replicate calibrations that take place within district offices.

The District Standards and Check Standards are initially compared over a minimum of fifteen days, after which the mean of their differences and the associated variance are calculated. These parameters are then the initial “base” values for the Process Global Mean and Process Variance.

The initial 15 day test is performed using comparative weighing with n=4 repeated readings. This establishes whether the required performance can be met with a single comparative weighing or whether more than one reading is necessary. The results are analyzed four ways: as if 4 readings will always be performed when calibrating, as if 3 readings will be performed, as if 2 readings will be performed, and as if only the first daily comparative reading is used (which is equivalent to single comparative weighing).

In other words, for comparative weighing, the Process Global Mean and the Process Variance are first said to be equal to, respectively, the Group Mean and the Group Variance for the initial 15 day test as if n=4, n=3, n=2 and n=1. After this is done, we compare the results to the performance monitoring criteria in section 5 and establish how many readings need to be performed for each comparative weighing such that the performance requirements are met.

Once the Process Uncertainty (square root of the Process Variance) has been determined for n=4,3,2,1 we apply the following criteria to determine the number of mass comparisons required to verify masses against given tolerances.

The maximum variability of the calibration or tolerance testing procedure performed on a given device and at a given nominal value is estimated by the total dispersion in the data obtained for a given Check Standard. Based on the standard normal Z statistic, µ ± Z σ will contain a given percentage of points of the population; for example: µ ± 3 σ will contain 99.73% of the points. Since the process uncertainty u(Di) is in fact the standard deviation σ for the population of values gathered for the Check Standard, we have that, µ ± 3 u(Di) will contain 99.73% of the points. In other words, in 0.27% of the cases, a value obtained for the Check Standard will be outside of the interval µ ± 3 u(Di). In equation form, this is represented by:

$$P\big(\mu - 3\,u(D_i) \le x_\zeta \le \mu + 3\,u(D_i)\big) = 99.73\% \qquad \text{(Equation 4)}$$

From the Central Limit Theorem, we know that the Process Global Mean µ will be equal to the average of the Group Means. This implies that the average of all points (in the population) will be equal to the average of the Group Means. Further, the Central Limit Theorem says that the standard deviation obtained when comparing the Group Means for groups of ‘n’ points taken from the population will be equal to the standard deviation of the population divided by the square root of the number ‘n’ of points in each group.

Accordingly, the standard normal Z value is expressed as:

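$$Z = \frac{\bar{x}_n - \mu}{\sigma / \sqrt{n}} \qquad \text{(Equation 5)}$$

where x̄n is the mean of a group of ‘n’ points. Rearranging this expression yields:

$$\bar{x}_n - \mu = \frac{Z\,\sigma}{\sqrt{n}} \qquad \text{(Equation 6)}$$

The above equation is important, since it implies that the difference between the Process Global Mean and any group mean of ‘n’ points will be no greater than Zσ/√n.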

Since we want

Equation 7

The Central Limit Theorem implies that

Equation 8

Therefore

Equation 9

Re-writing yields:

$$\frac{9\,u(D_i)}{\sqrt{n}} < \text{Tolerance} \qquad \text{(Equation 10)}$$

The above equation implies that, for comparative weighing with 'n' comparisons, we have that, if
n=1 comparative weighing: 9.00 u(Di) must be smaller than the tolerance of the mass being calibrated
n=2 comparative weighings: 6.36 u(Di) must be smaller than the tolerance of the mass being calibrated
n=3 comparative weighings: 5.20 u(Di) must be smaller than the tolerance of the mass being calibrated
n=4 comparative weighings: 4.50 u(Di) must be smaller than the tolerance of the mass being calibrated


Example

From the initial 15 day test, we have determined that for a given device the Process Uncertainty in the 20 kg range is u(Di) = 0.15 g.

From the above, we have that, in order to calibrate a 20 kg mass to an (OIML R111) M1 tolerance of 1 g, we need:

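$$\frac{9 \times 0.15\ \text{g}}{\sqrt{n}} < 1\ \text{g} \qquad \text{(Equation 11)}$$

Re-writing this, we see that:

$$n > \left(\frac{9 \times 0.15\ \text{g}}{1\ \text{g}}\right)^2 = 1.8225 \approx 1.8 \qquad \text{(Equation 12)}$$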

Therefore, in this example, we see that ‘n’ must be bigger than 1.8 and thus, two or more comparative weighings must be performed in order to meet the performance requirements. This implies that, for this example, comparative weighing with at least 2 comparisons is necessary when calibrating 20 kg masses to an M1 tolerance.

Setting n=1 in the first equation and isolating u(Di) shows that, until we can obtain a Process Uncertainty u(Di) < 0.11 g, we will need to perform at least two readings of each 20 kg weight being calibrated and take the average as its calibrated value.

It should be noted that although the initial 15 day test showed that more than one reading was needed in order to calibrate the 20 kg weights, after more data is gathered (at minimum another 15 days) the performance is again evaluated. If the statistical analyses presented in this document show that both groups of 15 days of data can be combined, then the Process Uncertainty is calculated from the combined 30 days of data. If this new base value of the Process Uncertainty meets the criteria (smaller than 0.11 g for this example), the calibrations of the 20 kg masses can henceforth be performed with only one comparative weighing.
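The 20 kg example can be checked numerically. The sketch below applies the criterion used in this section (9 u(Di)/√n must be smaller than the tolerance); the function names are hypothetical and the code is an illustration rather than a prescribed tool. Note that the initial 15 day test only evaluates n = 1 to 4, so a larger result would indicate that the method cannot meet the tolerance within that range.

```python
import math

COVERAGE_FACTOR = 9.0  # 9.00 u(Di) for n = 1, as listed in this section


def min_comparisons(u_di: float, tolerance: float) -> int:
    """Smallest n such that COVERAGE_FACTOR * u_di / sqrt(n) < tolerance."""
    if u_di < 0 or tolerance <= 0:
        raise ValueError("u_di must be >= 0 and tolerance must be > 0")
    n = 1
    while COVERAGE_FACTOR * u_di / math.sqrt(n) >= tolerance:
        n += 1
    return n


def max_uncertainty_for_single_weighing(tolerance: float) -> float:
    """Upper bound on u(Di) (not included) that allows n = 1."""
    return tolerance / COVERAGE_FACTOR


# Example from the text: u(Di) = 0.15 g, M1 tolerance of 1 g at 20 kg.
n_required = min_comparisons(0.15, 1.0)
u_limit = max_uncertainty_for_single_weighing(1.0)
```

For the values of the example, `n_required` is 2 and `u_limit` rounds to 0.11 g, in agreement with the text.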


3.3.2 Step by step procedure

The substitution weighing procedure in RP-01Field is performed to gather necessary data to evaluate the process global mean and process uncertainty.

The process global mean and process uncertainty are obtained by performing ‘n’ (where ‘n’ is as described in section 3.3.1) comparative weighings between the check standard and the district standard of the same nominal value.

One set of ‘n’ comparative weighings is performed each day of data gathering and never two sets on the same day. This is initially done for 15 days and subsequently for the number of days necessary, as described in section 2.

As seen in RP-02FIELD, the value of the check standard for each comparative weighing is obtained in the same way as a test weight would be, with:


$$m_{ct} = \frac{\rho_t - \rho_{ac}}{\rho_t - \rho_{at}} \left\{ m_{cR}\,\frac{\rho_R - \rho_{at}}{\rho_R - \rho_{ac}} + D_t\,\frac{\rho_c}{\rho_c - \rho_{ac}} \right\}$$

All terms are expressed in the same unit of mass.


mct = conventional mass of unknown (check standard)
mcR = conventional mass of reference (district standard)
Dt = calculated difference between check standard and district standard
ρt = density of the check standard (kg/m³) - usually provided by manufacturer; an estimate can be made by knowing the material and class
ρR = density of the district standard (kg/m³) - usually found on calibration certificate for the reference; an estimate can be made by knowing the material and class
ρat = air density at the time of the calibration (kg/m³); the formula provided in section 4.2 of RP-02FIELD: Determination of Mass Calibration Values and Related Uncertainties can be used
ρac = conventional air density; equal by definition to 1.2 kg/m³
ρc = conventional mass density; equal by definition to 8000 kg/m³

It can easily be seen that only* when the actual air density (ρat) is close to the “conventional” value of the air density (ρac), i.e., ρat ≈ ρac ≈ 1.2 kg/m³, can we assume that:

mct = { mcR + (Dt ρc / (ρc - ρac)) }
mct ≈ mcR + Dt
mc check standard ≈ mc district standard + Dt

* This will be the case if the temperature is close to 20 °C, the relative humidity close to 50%, and the barometric pressure close to 101.4 kPa.
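The conventional-mass equation of this section can be written as a small function, which also makes the simplification at ρat = ρac easy to verify. This is a sketch under stated assumptions: the function and argument names are hypothetical, the numerical inputs are invented for illustration, and all masses are taken to be in the same unit.

```python
RHO_AC = 1.2     # conventional air density, kg/m^3 (by definition)
RHO_C = 8000.0   # conventional mass density, kg/m^3 (by definition)


def conventional_mass(m_cr, d_t, rho_t, rho_r, rho_at):
    """Conventional mass of the check standard.

    m_cr   : conventional mass of the district (reference) standard
    d_t    : calculated difference, check standard minus district standard
    rho_t  : density of the check standard, kg/m^3
    rho_r  : density of the district standard, kg/m^3
    rho_at : air density at the time of calibration, kg/m^3
    """
    buoyancy_factor = (rho_t - RHO_AC) / (rho_t - rho_at)
    reference_term = m_cr * (rho_r - rho_at) / (rho_r - RHO_AC)
    difference_term = d_t * RHO_C / (RHO_C - RHO_AC)
    return buoyancy_factor * (reference_term + difference_term)


# Hypothetical 20 kg comparison (values in g) at exactly the conventional air density.
m_ct = conventional_mass(20000.0, 0.1285, 7950.0, 8000.0, 1.2)
simplified = 20000.0 + 0.1285  # mct ≈ mcR + Dt
```

At ρat = 1.2 kg/m³ the two density ratios equal 1, and the full result differs from mcR + Dt only by the factor ρc / (ρc - ρac) applied to Dt, which is about 0.00002 g in this illustration.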

When using an electronic mass comparator, the calculated difference in mass Dt between the check standard and the district standard is:

Dt = SR (I check standard - I district standard)

Where:

I check standard = indicated reading for the check standard
I district standard = indicated reading for the district standard
SR = Sensitivity Reciprocal at that nominal value

When using an equal arm balance, the difference Dt is calculated as follows:

Dt = SR (CRP check standard - CRP district standard)

Where:

CRP check standard = calculated rest point for the check standard
CRP district standard = calculated rest point for the district standard
SR = Sensitivity Reciprocal at that nominal value

Statistical Analyses


4.1 Comparison of Means Obtained from Two Groups of Data

In statistical terminology, when two means are compared in order to determine if they are approximately equal, this is referred to as testing the hypothesis that the two means are equal, also called the null hypothesis. If the hypothesis is true, the means are considered equivalent and may be combined if the variances of the samples are also equivalent (this will be discussed in section 4.2); if the hypothesis is false, the samples cannot be combined for they are judged as representing different events and only the latter set of data can be used until another sample of the same size is obtained and this analysis repeated.

When there are fewer than thirty points in each sample, a T-test must be used. If each sample has at least thirty points, the Z-test is employed. Because the Z-test is simpler than the T-test, it is usually the preferred method.

4.1.1 The T-test

$$T_{\alpha,\;n_1+n_2-2} \;\ge\; \frac{\left|X_1 - X_2\right|}{\sqrt{\dfrac{(n_1-1)\,S_1^2 + (n_2-1)\,S_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} \tag{4.1.1}$$


The T statistic is taken from the table Percentage Points of the t Distribution (see Annex A) and is evaluated at (n1 + n2 - 2) degrees of freedom and at the desired level of significance α. It should be noted that a two-sided null hypothesis must be considered. This implies that, should the null hypothesis fail, the second mean may be either greater or smaller than the first; in other words, both sides of the statistical distribution must be observed. In this case, α is calculated as follows:

α = (1 - c.l.) ÷ 2
where c.l. is the desired confidence level

The other symbols used in equation 4.1.1 are:

X1= mean of sample 1. In our case, this will be the base value for the process global mean µ
S1= standard deviation of sample 1. In our case, this will be the base value for the process uncertainty u(Di)
n1= number of points in sample 1. This will be the total number of points N that were used to calculate the base value of the process global mean

X2= mean of sample 2. In our case, this will be the new group mean µk
S2= standard deviation of sample 2. In our case, this will be the square root of the new group variance uk²(Di)
n2= number of points in sample 2. In our case, this will be the number of points in the new group k

Example:

For an initial 16 days, two 20 kg mass standards are compared using a given method and their difference is noted. From these 16 comparisons, a group mean µ1 and a group variance u1²(Di) are calculated. The group mean µ1 for the difference is 128.5 mg, and the group uncertainty (square root of the group variance) is 81 mg. As explained in section 3.3, we initially set the base values equal to the first group values. This means that the base value for the process global mean is set at µ = 128.5 mg, and the base value for the process uncertainty is set at u(Di) = 81 mg.

For an additional 26 days, the same mass standards are again compared. This time, the group mean µ2 is calculated as 114.9 mg and the group standard deviation u2(Di) is 36 mg.

Can these two sample means be considered equivalent with a 95% confidence level? Can the two samples be combined and the new process global mean and process uncertainty base values be calculated from the entire 42 data points?

Answer:

Using the above formula for the T-test, with

X1 = µ = 128.5 mg, S1 = u(Di) = 81 mg, n1 = N = 16
X2 = µ2 = 114.9 mg, S2= u2(Di) = 36 mg, n2 = 26

$$\frac{\left|128.5 - 114.9\right|}{\sqrt{\dfrac{(16-1)(81)^2 + (26-1)(36)^2}{16+26-2}\left(\dfrac{1}{16}+\dfrac{1}{26}\right)}} = \frac{13.6}{18.17} = 0.748$$

The T value is taken from Annex A with (16 + 26 - 2) = 40 degrees of freedom, and the α value is: α = (1 - c.l.) ÷ 2 = (1 - 0.95) ÷ 2 = 0.025. Therefore,

T = 2.021 ≥? 0.748

The above inequality holds and shows that there is no significant difference between the means; the null hypothesis is not rejected. However, the data cannot be combined until an analysis of the variances is performed (this will be done in section 4.2).
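The T-test example can be reproduced with a few lines of Python. This is a sketch, not a prescribed tool: the function name `pooled_t_statistic` is hypothetical, the pooled-variance form is the one that reproduces the value 0.748 quoted in the example, and the critical value 2.021 is simply the tabulated value used in the text (40 degrees of freedom, α = 0.025).

```python
import math


def pooled_t_statistic(x1, s1, n1, x2, s2, n2):
    """Two-sample T statistic using a pooled variance with n1 + n2 - 2 degrees of freedom."""
    pooled_variance = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_variance * (1.0 / n1 + 1.0 / n2))
    return abs(x1 - x2) / standard_error


# Values from the example, in mg.
t_calculated = pooled_t_statistic(128.5, 81.0, 16, 114.9, 36.0, 26)

T_CRITICAL = 2.021  # tabulated value quoted in the text
means_equivalent = t_calculated <= T_CRITICAL
```

Here `t_calculated` rounds to 0.748, so `means_equivalent` is true; the variances must still be compared before the groups are combined.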

4.1.2 The Z-test

As mentioned at the beginning of section 4.1 above, if each sample has at least thirty points, the Z-test is used because the Z-test is simpler than the T-test.

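$$Z_p \;\ge\; \frac{\left|(X_1 - X_2) - (\mu_1^* - \mu_2^*)\right|}{\sqrt{\dfrac{S_1^2}{n_1}+\dfrac{S_2^2}{n_2}}} \tag{4.1.2}$$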


The Z statistic is taken from the table Cumulative Normal Distribution (see Annex B) and is evaluated from the cumulative probabilities, which depend on the desired level of confidence. Again a two-sided null hypothesis must be considered. Therefore the cumulative probabilities are calculated as follows:

p = (1 + c.l.) ÷ 2
where c.l. is the desired confidence level

The other symbols used in equation 4.1.2 are:

µ1*= “true” theoretical mean of population 1
µ2*= “true” theoretical mean of population 2

X1= mean of sample 1. Here, this is the base value for the process global mean µ
S1= standard deviation of sample 1. In our case, this is the base value for the process uncertainty u(Di)
n1= number of points in sample 1. This is the total number of points N that were used to calculate the base value of the process global mean

X2= mean of sample 2. In our case, this is the new group mean µk
S2= standard deviation of sample 2. In our case, this is the square root of the new group variance uk²(Di)
n2= number of points in sample 2. In our case, this is the number of points in the new group k

Above, it is mentioned that a two-sided null hypothesis is considered. This “null” statement implies that the means are equivalent and can be regarded as representing the same population; therefore the difference µ1* - µ2* is posed equal to zero.

Example:

An initial test is performed where, for 36 days, two 20 kg mass standards are compared and their difference is noted. From these 36 comparisons, a group mean µ1 and a group variance u12(Di) are calculated. The group mean µ1 for the difference is 98.0 mg, and the group uncertainty (square root of the group variance) u1(Di) is 73 mg. As explained in section 3.3, we initially set the base values equal to the first group values. This means that the base value for the process global mean is set at µ= 98.0mg and the process uncertainty is set at u(Di)=73 mg.

For an additional 41 days, the same mass standards are again compared. This time, the group mean µ2 is calculated as 141.7 mg and the group standard deviation u2(Di) is 55 mg.

Can these two sample means be considered equivalent with a 95% confidence level? Can the two samples be combined and the new process global mean and process uncertainty base values be calculated from the entire 77 data points?

Answer:

Using the above formula for the Z-test, with

X1 = µ = 98.0 mg, S1= u(Di) = 73 mg, n1= N = 36
X2 = µ2 = 141.7 mg, S2= u2(Di) = 55 mg, n2= 41

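$$\frac{\left|(X_1 - X_2) - (\mu_1^* - \mu_2^*)\right|}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{\left|(98.0 - 141.7) - 0\right|}{\sqrt{\dfrac{73^2}{36} + \dfrac{55^2}{41}}} = \frac{43.7}{14.89} \approx 2.93$$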

The Z value is chosen from Annex B with a p value of p = (1 + 0.95) ÷ 2 = 0.975. Therefore,

Z(0.975) = 1.96 ≥? 2.93

The above equation does not hold and the null hypothesis must be rejected. This implies that the populations are different and that the difference between the means is so significant that the samples cannot be combined.

In this case, as discussed in section 2, since the number of new points accumulated is greater than the number used to calculate the base values and the statistical test fails, the new data cannot be combined with the data behind the base values. The cause of this discrepancy shall be identified and documented in the Equipment Performance Log Book, the base value for the process global mean shall be set at µ = µ2, and the process uncertainty u(Di) shall remain unchanged until at least 15 days of new values are gathered.
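The Z-test above can also be checked numerically. The following is a minimal sketch only, not part of this RP: the function name `z_test_means` is hypothetical, and the critical value is computed with the Python standard library instead of being read from the Annex B table. The difference µ1* - µ2* is taken as zero, as required by the null hypothesis.

```python
from math import sqrt
from statistics import NormalDist


def z_test_means(x1, s1, n1, x2, s2, n2, confidence=0.95):
    """Return (z_critical, z_observed, means_equivalent) for the two-sided Z-test."""
    # Cumulative probability for a two-sided test: p = (1 + c.l.) / 2
    p = (1 + confidence) / 2
    # Assumption: computed here in place of the Annex B table lookup
    z_critical = NormalDist().inv_cdf(p)
    # Null hypothesis: mu1* - mu2* = 0, so only the sample means remain
    z_observed = abs(x1 - x2) / sqrt(s1**2 / n1 + s2**2 / n2)
    return z_critical, z_observed, z_critical >= z_observed


# Values from the worked example above (all in mg)
z_crit, z_obs, equivalent = z_test_means(98.0, 73, 36, 141.7, 55, 41)
print(f"critical = {z_crit:.2f}, observed = {z_obs:.2f}, equivalent = {equivalent}")
```

With the example values, the observed statistic exceeds the critical value, which matches the rejection of the null hypothesis reached above.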

4.2 Comparison of Variances from Two Groups of Data

This test compares the variances of two samples in order to evaluate if they can be regarded as representing the same population. The F-test can be performed for any sample size.

An important observation is that, unlike the T and Z distributions, which are symmetrical and centered on zero (0), the F distribution is not symmetrical; it is skewed to the right. Therefore, an absolute value cannot be used in the test and two limits, FL and FH, must be calculated; the calculated value must fall within these limits.

4.2.1 The F-test

(4.2.1)

$$F_L \le \frac{S_1^2}{S_2^2} \le F_H$$


The F statistics are taken from the table Percentage Points of the F Distribution (see Annex C) and are evaluated at F(n1-1, n2-1, β) for FL,H and at F(n1-1, n2-1, α) for FH,L. Since the limits are defined such that FL < FH, the subscripts are assigned according to which value is smaller: if FL,H < FH,L, then FL = FL,H and FH = FH,L; if FL,H > FH,L, then FH = FL,H and FL = FH,L.

The desired level of significance α is calculated as follows:

α = (1 - c.l.) ÷ 2
where c.l. is the desired confidence level

The β value is calculated as: β = 1 - α

The other symbols used in equation 4.2.1 are:

S1= standard deviation of sample 1. In our case, this will be the base value for the process uncertainty u(Di)
n1= number of points in sample 1. This will be the total number of points N that were used to calculate the base value of the process global mean
S2= standard deviation of sample 2. In our case, this will be the square root of the new group variance uk²(Di)
n2= number of points in sample 2. In our case, this will be the number of points in the new group k of data

Please note that F(n1-1, n2-1, β) is not directly available, since tables for the F distribution are usually computed only in terms of α.

That is why the relationship F(n1-1, n2-1, β) = 1 ÷ F(n2-1, n1-1, α) is used.

Example: This is the same example as in section 4.1.1.

For 16 days, two 20 kg mass standards are compared and their difference is noted. From these 16 comparisons, a group mean µ1 and a group variance u1²(Di) are calculated. The group mean µ1 for the difference is 128.5 mg, and the group uncertainty (square root of the group variance) is 81 mg. As explained in section 3.3, we initially set the base values equal to the first group values. This means that the base value for the process global mean is set at µ = 128.5 mg, and the base value for the process uncertainty is set at u(Di) = 81 mg.

For an additional 26 days, the same mass standards are again compared. This time, the group mean µ2 is calculated as 114.9 mg and the group standard deviation u2(Di) is 36 mg.

Can these two sample variances be considered equivalent with a 95% confidence level? Can the two samples be combined and the new process global mean and process uncertainty base values be calculated from the entire 42 data points?

Answer:

Using the above formula for the F-test, with
S1= u(Di) = 81 mg, n1= N = 16
S2= u2(Di) = 36 mg, n2 = 26

$$F_{L,H} \overset{?}{\le} \frac{(81\ \text{mg})^2}{(36\ \text{mg})^2} \overset{?}{\le} F_{H,L}$$

The value of FL,H is calculated from Annex C with F(n1-1, n2-1, β) = 1 ÷ F(n2-1, n1-1, α).
The tables in Annex C are expressed in terms of a significance level of 2α, thus the table Upper 5 percent points is for a 95% confidence level.

We have α = (1 - c.l.) ÷ 2 = (1 - 0.95) ÷ 2 = 0.025, and thus FL,H = 1 ÷ F(25, 15, 0.025). Unfortunately, the value for ν1 = 25 is not provided in the table and we must approximate. Since

F (24, 15, 0.025) = 2.29 and
F (30, 15, 0.025) = 2.25
then F (25, 15, 0.025) is approximately 2.28.

Hence FL,H = 1 ÷ F(25, 15, 0.025) = 1 ÷ 2.28 = 0.438.
The FH,L value is chosen directly with F(15, 25, α) = F(15, 25, 0.025) = 2.09.
Because FL,H = 0.438 and FH,L = 2.09, FL,H < FH,L and thus FL = FL,H and FH = FH,L.

Since (81 mg)² ÷ (36 mg)² = 5.06, the equation for the test is:


FL ≤? [(S1)² ÷ (S2)²] ≤? FH
0.438 ≤? 5.06 ≤? 2.09


The above relationship does not hold. It shows that, although the means could be considered equal (see section 4.1.1), the variances differ so much that the two samples cannot be combined.

If, however, the first standard deviation had been equal to, say, 50 mg, the result would be as follows:

FL ≤? [(S1)² ÷ (S2)²] ≤? FH
0.438 ≤? 1.93 ≤? 2.09

and the above equation would hold and it would be possible to combine the samples. In this case, the new base values for the process global mean µ and process uncertainty u(Di) would be computed from the entire pool of data.
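The F-test example above can be reproduced with a short script. This is a minimal sketch under stated assumptions: the function names `interpolate` and `f_test_variances` are hypothetical, the tabulated F values are supplied by the caller exactly as quoted in the example (they are not recomputed from the distribution), and the value for ν1 = 25 is obtained by simple linear interpolation between the two neighbouring table entries.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between two tabulated points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def f_test_variances(s1, s2, f_n2_n1_alpha, f_n1_n2_alpha):
    """Two-sided F-test using table values supplied by the caller.

    f_n2_n1_alpha: tabulated F(n2-1, n1-1, alpha)
    f_n1_n2_alpha: tabulated F(n1-1, n2-1, alpha)
    Returns (f_low, ratio, f_high, variances_equivalent).
    """
    f_lh = 1 / f_n2_n1_alpha  # F(n1-1, n2-1, beta) via the reciprocal relationship
    f_hl = f_n1_n2_alpha
    # Assign the limits so that f_low < f_high
    f_low, f_high = min(f_lh, f_hl), max(f_lh, f_hl)
    ratio = s1**2 / s2**2
    return f_low, ratio, f_high, f_low <= ratio <= f_high


# Table values quoted in the worked example (n1 = 16, n2 = 26)
f_25_15 = interpolate(25, 24, 2.29, 30, 2.25)  # approximately 2.28
f_15_25 = 2.09

failing = f_test_variances(81, 36, f_25_15, f_15_25)  # original example
passing = f_test_variances(50, 36, f_25_15, f_15_25)  # S1 changed to 50 mg
print(failing)
print(passing)
```

The first call fails the test and the second passes, in agreement with the two cases discussed above.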

Performance Criteria

Minimum Performance Criteria: INDIVIDUAL SETS (Amendment)

1- When evaluating repeatability and reproducibility of an instrument with a given check standard, each computed value xζ for the check standard shall be contained in the interval:

[µ - (3u(Di)/√n) , µ + (3u(Di)/√n)]

In other words, any computed value for the check standard must lie within 3u(Di)/√n of the base value of the process global mean (µ) of that check standard, where u(Di) is the base value of its process uncertainty and n is the number of points used to compute the value. The base value of the process global mean (µ) is the long-term average of the check standard. Should any value obtained for the check standard fall outside the interval above, all calibration/tolerance testing of masses in that range must cease until the problem is rectified.

If a calculated value for a check standard is more than 2u(Di)/√n away from the base value of the process global mean (µ) for that same standard, another value should be immediately computed. If this new value is now within 2u(Di)/√n of µ, register the first average as usual. If the second value is also more than 2u(Di)/√n away from µ, the cause of this discrepancy should be determined and corrected before further measurements are made in that range of nominal values and using that procedure.

2- The standard deviation Sζ (repeatability) for each set of measurements obtained in calculating any value of a given check standard must be smaller than 1/9 the tolerance for that standard.

3- The sensitivity of the instrument shall be better than 1/9 of the tolerance applicable to the standard being monitored.

Minimum Performance Criteria: PROCESS

1- The base value of the process uncertainty u(Di) shall always be smaller than 1/9 of the tolerance for that standard, or √n/9 of the tolerance in the case of comparative weighing with ‘n’ comparisons.

2- The base value of the process global mean µ for a check standard shall always meet the following requirement: (Nom - ½ tolerance) ≤ µ ≤ ( Nom + ½ tolerance)

where Nom is the nominal value of the standard; otherwise, the check standard shall be adjusted to within 1/3 tolerance and re-calibrated.
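The two process criteria can be expressed as a pair of inequalities. The following is a minimal sketch, assuming that µ, the nominal value and the tolerance are expressed in the same unit; the function name `check_process` and the numbers in the usage lines are hypothetical. Setting n = 1 gives the tolerance testing limit, and n > 1 gives the comparative weighing limit.

```python
from math import sqrt


def check_process(mu, u, nominal, tolerance, n=1):
    """Check the base values against the two process criteria.

    Returns (uncertainty_ok, mean_ok).
    """
    # Criterion 1: u < tolerance/9, or sqrt(n)/9 of the tolerance for n comparisons
    uncertainty_ok = u < sqrt(n) * tolerance / 9
    # Criterion 2: Nom - tolerance/2 <= mu <= Nom + tolerance/2
    mean_ok = nominal - tolerance / 2 <= mu <= nominal + tolerance / 2
    return uncertainty_ok, mean_ok


# Hypothetical numbers: nominal value 1000, tolerance 9 (same unit)
print(check_process(mu=1002.0, u=0.8, nominal=1000.0, tolerance=9.0))
print(check_process(mu=1006.0, u=1.5, nominal=1000.0, tolerance=9.0))
print(check_process(mu=1002.0, u=1.5, nominal=1000.0, tolerance=9.0, n=4))
```

When the second value returned is False, the check standard is to be adjusted and re-calibrated as stated above.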

Summary - At a Glance

This RP has discussed methods used to: calculate the process global mean and process uncertainty, monitor and update these parameters when required, and evaluate if the calibration process remains in statistical control. The statistical tests presented have shown that, when performing process monitoring, it is important not only to look at the equivalence of the means of two samples but also to look at the equivalence of the variances.


At a Glance

To monitor the mass calibration process, the following should be done:

a) For a given check standard, the group mean µ1 and group variance u1²(Di) shall be initially evaluated over a period of 15 days and computed as described in section 3. These will be considered, respectively, as the initial base values for the process global mean and process variance (square of the process uncertainty) and should be noted in the Equipment Performance Log Book.

b) On each day of calibration at a given nominal value, “n” comparative weighings shall be performed at that nominal value between the check standard and the district standard.

c) Ensure that the minimum performance criteria (see section 5) are met for each set (day).

d) Repeat steps b) and c) until the new group of data is at least the same size as that which was used to last determine the base values for the process global mean µ and process uncertainty u(Di).

e) Compute the new group mean µ2 and group variance u2²(Di).

f) Compare the group mean µ2 to the process global mean µ, using the T-test (if either sample has fewer than thirty days of data) or the Z-test (if each sample has at least thirty days of data).

g) Compare the group variance u2²(Di) to the process variance u²(Di) using the F-test.

h) Should both tests (means and variances) pass, the data from the two samples can be combined and new base values for the process global mean µ and process uncertainty u(Di) can be computed using both samples as one larger sample, as described in section 3.2.

i) Ensure that the process meets the minimum performance criteria of section 5.

j) Should one or both tests (means or variances) fail, the reason for failure should be investigated, identified and noted in the Equipment Performance Log Book. In this case, the new group mean µ2 shall be retained as the base value for the process global mean. The previous base value for the process uncertainty u(Di) shall continue to be used until a new 15 day test is performed as described in section 2.

k) To ensure proper control of the calibration activities, repeat all the above steps on an ongoing basis and perform comparative weighing at least once per month for each nominal value or range of values.
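The decision made in steps h) and j) can be summarized in a few lines. This is a sketch only: the function name `update_base_values` and the action labels are hypothetical, and the recomputation of the base values from the combined data (section 3.2) is deliberately left out, so the function returns `None` for both values when the samples may be combined.

```python
def update_base_values(base_u, group_mean, means_pass, variances_pass):
    """Return (action, new_base_mean, new_base_u) after the two tests.

    When both tests pass, the new base values must be recomputed from the
    combined data as described in section 3.2; that computation is not
    reproduced here, so None is returned as a placeholder for both values.
    """
    if means_pass and variances_pass:
        return "combine", None, None
    # One or both tests failed: investigate and log the cause, adopt the new
    # group mean, and keep the previous process uncertainty until a new
    # 15 day test is performed.
    return "investigate", group_mean, base_u


# F-test example of section 4.2.1: means pass, variances fail
print(update_base_values(81, 114.9, means_pass=True, variances_pass=False))
```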


References

1. Chao, Lincoln L., Introduction to Statistics, Brooks/Cole Publishing Company, Monterey, California, 1980.

2. Dupuis-Désormeaux, N., Comparison of Means and Variances for Process Monitoring, Legal Metrology Branch, Industry Canada, 10 April 1996.

3. Dupuis-Désormeaux, N., Criteria and Procedures for the Verification of Inspection Standards, Legal Metrology Branch, Industry Canada, third edition, February 21, 1995.

4. Dupuis-Désormeaux, N., Calibration Services Laboratory - Recommended Practices, Version 1: RP-01: Laboratory Calibration Procedures for Standards of Mass, RP-02: Determination of Mass Calibration Values and Related Uncertainties; RP-03: Mass Calibration Process Monitoring; RP-04: Monitoring and Re-Adjustment of Calibration Intervals for Mass Standards, Measurement Canada, February 2001.

5. Joint International Committee ISO/IEC/OIML/BIPM- TAG-4, Guide to the Expression of Uncertainty in Measurement, first edition, 1993.

6. Johnson, Robert, Elementary Statistics, third edition, North Scituate, Massachusetts: Duxbury Press, 1980.

7. Taylor, John K., and Oppermann, Henry V., Handbook for the Quality Assurance of Metrological Measurements, NBS Handbook 145, pages 8.6 - 8.9.

8. Wonnacott, Thomas H., and Wonnacott, Ronald J., Introductory Statistics, New York: John Wiley & Sons, Inc., 1969.

Annex A: Percentage Points of the T Distribution for the T-test

Annex B: Cumulative Normal Distribution for the Z-test

Annex C: Percentage Points of the F Distribution for the F-test

Annex D: Background Information - Extracts of References