RP-03 Laboratory: Mass Calibration Process Monitoring
N. Dupuis-Désormeaux, Senior Engineer, Gravimetry
August 2002
Table of Contents
2. Monitoring Method - When to update the “base” values
3.1 Tolerance Testing and Comparative Weighing
3.1.1 Group Mean
3.1.2 Group Variance
3.2 Series of Intercomparisons
3.2.1 Group Mean
3.2.2 Group Variance
3.3 Procedure for gathering data
3.4.1 Process Global Mean
3.4.2 Process Variance
4.1 Comparison of Means Obtained from Two Groups of Data
4.1.1 The T-test
4.1.2 The Z-test
4.2 Comparison of Variances from Two Groups of Data
Annex A: Percentage Points of the T Distribution for the T-test
Annex B: Cumulative Normal Distribution for the Z-test
Annex C: Percentage Points of the F Distribution for the F-test
Annex D: Background Information - Extracts of References
Abstract
In recent years, quality assurance and quality control mechanisms have gained popularity not only within the private sector but within the public sector as well. The resulting increase in process monitoring needs has prompted the development of more advanced statistical evaluation techniques.
When a process is monitored for conformance to fixed statistical limits based on a history of experimental data, the process properties themselves, such as the process mean and variance, must be re-evaluated periodically.
For example, to monitor mass calibration activities, two known standards are initially compared over a period of at least fifteen days, after which the mean of their differences and the associated variance are calculated. These parameters are referred to as the initial “base” values for the Process Global Mean and Process Variance. After an additional fifteen days of data have been accumulated, a new group mean and variance for the comparison are computed. If, since the last base values were established, one of the standards has been inadvertently dropped or scratched, or the device used to measure this difference has suffered damage, then the new mean difference between the two standards, computed from the additional fifteen days of data, will likely differ significantly from the previously calculated value (the base value of the Process Global Mean). Such a variation is an actual change in the process properties, and past data are no longer relevant. To differentiate statistical variation from real process variation, we perform the statistical tests proposed herein. These statistical methods are widely used to evaluate whether the data obtained from two experiments are statistically different or can be combined to form one larger pool of values; in statistical terms, they evaluate whether the two samples have been drawn from the same population. In other words, these methods are employed to detect significant changes in process properties and to determine when new data can be combined with existing data.
To ensure that the process remains in statistical control, the Process Global Mean and the Process Variance must be monitored and updated regularly. The frequency of this verification is dependent upon the operational priorities of the Calibration Standards Laboratory (CSL).
Monitoring Method
The base values of the Process Global Mean (μ) and Process Uncertainty u(Di) for a given mass standard cannot be updated until the number of new points accumulated equals the number used to calculate the previous base values, and until the statistical tests show that the samples can be combined. For example, looking at the base value for the Process Uncertainty, if an initial uncertainty is computed after fifteen days, call it u1(Di), then u1(Di) is the initial base value for the Process Uncertainty u(Di). In this case, an additional fifteen days of testing are required before the base value can be changed. Hence, the second group uncertainty, call it u2(Di), must be calculated from at least fifteen new sets of values. Once the statistical analysis presented in this RP is performed, it can be decided whether both groups of data can be combined. If they can be combined, then the new base value for the Process Uncertainty is calculated from the combination of the thirty sets of values. This new base value can only be modified once an additional thirty sets of values are compiled and the corresponding u3(Di) is computed and statistically compared to the base value. This evaluation continues until 120 points are accumulated for the base value, after which any new uk(Di) must be calculated from at least 120 new points. In this case, the mean and variance are compared as usual; however, although all points are retained for the Process Uncertainty base value u(Di), only the last 120 points are retained for the computation of the Process Global Mean base value (μ).
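The update schedule described above can be sketched in a few lines of Python. This is only an illustration of the rule as worded in this section; the function name `new_points_required` and the parameters `minimum` and `cap` are hypothetical and are not part of the RP.

```python
def new_points_required(base_count: int, minimum: int = 15, cap: int = 120) -> int:
    """Number of new daily sets needed before the base values may be re-tested.

    Assumption: the new group must match the size of the current base,
    up to a ceiling of 120 points, as described in the Monitoring Method.
    """
    if base_count < minimum:
        raise ValueError("base values require at least `minimum` daily sets")
    return min(base_count, cap)


# Walk through the schedule, assuming every new group passes the
# statistical tests and is combined with the existing base.
schedule = []
count = 15
while count < 120:
    schedule.append(new_points_required(count))
    count += schedule[-1]
schedule.append(new_points_required(count))

print(schedule)  # [15, 30, 60, 120]
```

The doubling pattern (15, 30, 60, then a constant 120) follows directly from requiring as many new points as the base already holds.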
If the total new points accumulated are greater or equal in number to those used to calculate the previous base values for a given standard, and if the statistical tests presented in this RP show that data cannot be combined, the cause of this discrepancy shall be identified and documented. This can be explained by the fact that at least one element of the process can now be considered to have changed sufficiently to warrant the use of the updated values. Please note that this is also the case when any of the standards used in establishing the Process Global Mean and Process Uncertainty (or Variance) at a given nominal value is re-calibrated as this will affect the base value of the Process Global Mean. Likewise, should a mass comparator be repaired or replaced, this will have an effect on the base value of the Process Uncertainty. In both these cases, the existing base values for the affected range of nominal value(s) shall be noted and a statement explaining the modification or recalibration shall be entered in the CSL Equipment Performance Log Book. Please note that, in this case, the new base value for the Process Global Mean shall be set to the value of the most recent group mean obtained. However, the old base value for the Process Uncertainty shall continue to be used for the calculation of the combined uncertainties until a new group of 15 days of data is accumulated. At this point, new data shall be gathered for a minimum of 15 days and the values evaluated and updated as discussed previously.
Even if insufficient points have been accumulated to warrant a change to the Process Global Mean and Process Uncertainty base values, it is crucial to note the values for each set as well as for each group and ensure the minimum performance criteria of section 5 are met.
Data Required for Analysis
The base values of the Process Global Mean (μ) and Process Uncertainty u(Di) involved in the calibration of weights are initially estimated over a minimum of fifteen days, where comparative weighing is performed every day using the same standards called Check Standards. Please note that the Process Global Mean and Process Uncertainty must be calculated for each nominal value (or range of values) and for each method employed. For example, the u(Di) will be different if series of intercomparisons are performed rather than single comparison weighing.
If it is not feasible to monitor the Process Global Mean (μ) and Process Uncertainty u(Di) for each nominal value, ranges of nominal values can be monitored instead. The usual and most representative method of grouping is based on which devices are used for the calibration. In this case, one or more relative u(Di) values are computed for given ranges of masses to be calibrated on a given mass comparator.
Check Standards are compared to the Working Standards either through a series of intercomparisons or comparative weighing depending upon the process we are evaluating. Please note that Working Standards cannot be used as Check Standards because the system would be redundant as we would be comparing masses to themselves. The Check Standards should be of the same nominal value and accuracy class as the weights involved in the process being monitored. In other words, for the method of series of intercomparisons we need Check Standards of the same class as the District Standards; and for single comparative weighing we need Check Standards of the same class as the Inspector Weights.
Please note that the data of only one set per day is to be recorded; if many sets are performed on a given day, only the first set is to be recorded for the process monitoring purposes described herein.
One set of Check Standards of Class F2 and one set of Check Standards of Class M1 should be acquired for process monitoring; the Process Global Mean μ and Process Uncertainty u(Di) should be continuously monitored and adjusted using these standards; and a CSL Equipment Performance Log Book should be produced.
Process Global Mean (μ)
The base value of the Process Global Mean (μ) is initially established over a minimum of fifteen calibration days (not necessarily consecutive), where comparative weighing is performed once per day for each day of calibration. This comparison establishes the difference in mass between a Check Standard and a Working Standard. The base value for the Process Global Mean (μ) is initially set equal to the first group mean (μk=1) computed. Subsequently, after sufficient points have been accumulated, a new group mean is computed, the statistical tests in this RP are performed and a new base value for the Process Global Mean is established, if warranted.
Process Uncertainty u(Di)
The base value for the Process Uncertainty u(Di) is initially set equal to the first group uncertainty uk=1(Di) computed. The group uncertainty uk(Di) is derived from the observations made in computing the group mean (μk). Subsequently, after sufficient points have been accumulated, a second group uncertainty u2(Di) is computed, the statistical tests in this RP are performed and a new base value for the Process Uncertainty u(Di) is established, if warranted.
As discussed in section 2.4 of RP-02: Determination of Mass Calibration Values and Related Uncertainties, the Process Variance u²(Dt) for a given mass, and likewise its Process Uncertainty u(Dt), is a function of two main components: the short-term variance S²i and the long-term variance u²(LTi).
$$u^2(D_t) = S_i^2 + u^2(LT_i)$$
The short-term variance S²i, also known as the repeatability, is based on the individual results of the calibration of the given mass and should be entered directly into its equation for the process variance u²(Dt) at the time of calibration. Further, when performing series of intercomparisons, the repeatability is dependent upon which Form is used and therefore is entered into the calculations for the Process Uncertainty u(Dt) for the given mass at the time of performing the calibration.
Therefore, the short-term variance (S²i or σ²m) portion of the Process Uncertainty u(Dt) for a given mass is NOT monitored and should be entered into the equation at the time of calibration of the given mass.
The long-term variance u²(LTi), also known as the reproducibility, is a measure of the long-term performance of the devices, methods and environment involved in calibrating a given mass. Since it is most often impractical or impossible to obtain long-term data on each mass calibrated, the behaviour of a Check Standard of equivalent nominal value is monitored instead. Hence, u²(LTi) of a given mass is estimated by tracking the variations in the mass difference obtained when comparing a Check Standard to a Working Standard and establishing u²(Di). This value is continuously monitored and adjusted, as described in this recommended practice.
Please recall that u²(Di) for a Check Standard must be calculated for each nominal value (or range of values) and for each method employed. Therefore u²(Di) is continuously updated as described further and is downloaded directly into the calculations for the process variance u²(Dt) of the mass being calibrated, as per RP-02: Determination of Mass Calibration Values and Related Uncertainties.
Therefore, the long-term variance u²(LTi) portion of the Process Variance u²(Dt) for a given mass is approximated by the base value of the process variance obtained with the Check Standard, u²(Di); this value IS monitored and should be downloaded into the equation for u(Dt) at the time of calibration of the given mass.
3.1 Tolerance Testing (n=1) and Comparative Weighing (n>1)
Tolerance Testing (Single Comparison Weighing) is when the measuring process only requires one comparative weighing (i.e. n=1 from section 3.3). In this case, because only one comparative weighing is performed, there is only one value in a daily set.
When section 3.3 shows that more than one reading is required we call this Comparative Weighing with “n” comparisons. In this case, there are “n” values in each daily set.
The group mean computed for the Check Standard when compared to a Working Standard for M daily sets of values is calculated as follows:
(3.1.1)
Group Mean μk
$$\mu_k = \frac{1}{M}\sum_{\zeta=1}^{M} x_\zeta$$
xζ = is the mass value mt computed for the Check Standard obtained on day (set) ”ζ”. If more than one reading of the Check Standard is required to be performed (n>1) in order to meet the performance requirements, the average value is retained as xζ.
M = number of points in that group; i.e. number of monitoring days for that Check Standard for this group of data.
µk = is the group mean of all xζ values obtained for the check standard over the total number ‘M’ of days.
Recall from RP-02 that the Process Uncertainty u(Dt) for a given mass is a function of two main components: the short-term variance S²i and the long-term variance u²(LTi). The short-term variance is based on the individual calibration of the given mass, and the long-term variance is estimated with the base value of the Process Variance u²(Di) for the Check Standard of the same (or similar) value as the mass being calibrated. In other words,
$$u^2(D_t) = S_i^2 + u^2(LT_i)$$
$$u^2(D_t) \approx S_i^2 + u^2(D_i)$$
Now, as discussed previously, the base value of the Process Variance u²(Di) is initially established with the first Group Variance u²k=1(Di), then monitored, and then updated as described in 3.4.2.
This implies that, initially, we set:
$$u^2(D_i) = u^2_{k=1}(D_i) = u_1^2(D_i)$$
Each time a Group Variance u2k(Di) is to be calculated, with either tolerance testing, single comparative weighing or comparative weighing with “n” comparisons, we use the following:
(3.1.2)
Group Variance u²k(Di)
$$u_k^2(D_i) = \frac{1}{M-1}\sum_{\zeta=1}^{M}\left(x_\zeta - \mu_k\right)^2$$
xζ = is the mass value mt computed for the Check Standard obtained on day (set) ”ζ”. If more than one reading of the Check Standard is required to be performed (n>1) in order to meet the performance requirements, the average value is retained as xζ.
M = number of points in that group; i.e. number of monitoring days for that Check Standard for this group of data.
µk = is the group mean of all xζ values obtained for the check standard over the total number ‘M’ of days
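As a minimal sketch, the group statistics defined above can be computed with the Python standard library. The function name `group_statistics` and the sample values are hypothetical, and the use of the sample variance (divisor M − 1) is an assumption rather than something this section states explicitly.

```python
from statistics import mean, variance


def group_statistics(daily_values):
    """Return (group mean, group variance) for one group of M daily values.

    Each entry is the value x for the Check Standard on one day (set); when
    several readings are taken in a day, pass their daily average.
    Assumption: the group variance uses the sample divisor M - 1.
    """
    if len(daily_values) < 2:
        raise ValueError("at least two daily values are needed for a variance")
    mu_k = mean(daily_values)
    u2_k = variance(daily_values, mu_k)  # sum of squared deviations / (M - 1)
    return mu_k, u2_k


# Hypothetical daily differences in mg, for illustration only.
mu_k, u2_k = group_statistics([128.0, 130.0, 127.0, 129.0, 131.0])
print(mu_k, u2_k)  # 129.0 2.5
```

The group uncertainty is the square root of the returned variance.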
3.2 Series of Intercomparisons
When comparing Check Standards to Working Standards using series of intercomparisons, the calibration session usually spans over more than one day. The daily value for the Check Standard (as computed by the Form for the series of intercomparisons) is xζ. Most often, only one series of intercomparisons per calibration session is performed and there is again only one value per “daily” set. However, if a Check Standard and a Working Standard are compared more than once in the same day (or calibration session), only the first value is retained for process monitoring.
The group mean computed for the Check Standard when compared to a Working Standard using series of intercomparisons for M daily sets of values is calculated as follows:
(3.2.1)
Group Mean μk
$$\mu_k = \frac{1}{M}\sum_{\zeta=1}^{M} x_\zeta$$
xζ = is the mass value mt computed for the Check Standard obtained on day (set) ”ζ”. If more than one reading of the Check Standard is required to be performed (n>1) in order to meet the performance requirements, the average value is retained as xζ.
M = number of points in that group; i.e. number of monitoring days for that Check Standard for this group of data.
µk = is the group mean of all xζ values obtained for the check standard over the total number ‘M’ of days.
Recall from RP-02 that the Process Variance u2(Dt) associated with the calibration of a given mass using a series of intercomparisons is:

u²(Dt) = the process variance (repeatability and reproducibility) for mass “t”
Ni, Nj = respectively, the number of rows and the number of columns in the Q matrix
Cij^Q = the coefficient in the Q matrix corresponding to row i and column j
u(i) = the uncertainty of the mass i (along the row of Q) due to process (repeatability and reproducibility)
u²(i) = (Si)²i + u²(Di) = σ²m + u²(Di)
u(j) = the uncertainty of the mass j (along the column of Q) due to process (repeatability and reproducibility)
u²(j) = (Si)²j + u²(Dj) = σ²m + u²(Dj)
σ²m = the variance associated with the Form being used (this is a measure of the process repeatability)
u²(Di), u²(Dj) = the base value of the Process Variance for the Check Standard of nominal value ‘i’ or ‘j’ (this is a measure of the process reproducibility). This value is directly inserted into the uncertainty equation for u(i) or u(j).
Note that u(i) and u(j) are the Process Uncertainty of, respectively, mass i (row) and j (column) used in the “Q” matrix associated with the intercomparison, where the long-term component of u(i) and u(j) is estimated by use of the base values of the process uncertainty for the Check Standards of corresponding nominal value, u2(Di) and u2(Dj).
When performing series of intercomparisons, the following equation is used for estimating each u²k(Di) corresponding to each Check Standard ‘i’ (or ‘j’) in the series. Please note that a different value of σ²m will be obtained with the Forms for different calibration sessions ζ. Please also note that within the same series of intercomparisons, although σ²m is the same for all weights, u²k(Di) may be different for each weight involved since nominal values and equipment used may differ.
As discussed previously, the base value of the Process Variance u²(Di) for the Check Standard is initially established with the first Group Variance u²k=1(Di), then monitored, and then updated as described in 3.4.2. Each time a Group Variance u²k(Di) is to be calculated for a series of intercomparisons, we use the following:
(3.2.2)
Group Variance u2k(Di)

σ²m,ζ = variance implicit to the Form used for calibration of the Check Standard being monitored, as obtained with this Form on each day ‘ζ’.
xζ = is the mass value mt computed on day ‘ζ’ for the Check Standard.
M = number of points in the group; number of monitoring days for that Check Standard for that group
μk = is the group mean for the Check Standard of nominal value ‘i’ being monitored, as obtained by the above 3.2.1
3.3 Procedure for Gathering Data (Initial 15 days test)
As discussed at the beginning of section 3, the base values of the Process Global Mean (μ) and Process Uncertainty u(Di) involved in the calibration of weights are initially estimated over a minimum of fifteen days.
This is done by comparing the District Standards to Check Standards by performing comparison weighing to replicate calibrations that take place within district offices.
The District Standards and Check Standards are initially compared over a period of at least fifteen days, after which the mean of their differences and the associated variance are calculated. These parameters are the initial “base” values for the Process Global Mean and Process Variance.
Series of Intercomparisons:
When comparing the Working Standards to the Check Standards by performing series of intercomparisons, the initial base values for the Process Global Mean and the Process Variance are established as discussed in 3.2 above by looking at the results obtained for 15 calibration sessions where each calibration session most likely will be longer than one day and involve either an entire weight set or only certain submultiples.
Comparative Weighing:
When comparing the Working Standards to the Check Standards by performing comparative weighing, the initial 15 day test is performed with n=4 repeated readings. This is to establish whether the required performance can be met by performing a single comparative weighing, or whether more than one reading is necessary. The results are then analysed four ways: as if 4 readings, 3 readings or 2 readings were always performed when calibrating, and as if only the first daily comparative reading were taken (which is equivalent to single comparative weighing).
In other words, for comparative weighing, the Process Global Mean and the Process Variance are first said to be equal to, respectively, the Group Mean and the Group Variance for the initial 15 day test as if n=4, n=3, n=2 and n=1. After this is done, we compare the results to the performance monitoring criteria in section 5 and establish how many readings need to be performed for each comparative weighing such that the performance requirements are met.
Once the Process Uncertainty (square root of the Process Variance) has been determined for n=4,3,2,1 we apply the following criteria to determine the number of mass comparisons required to verify masses against given tolerances.
The maximum variability of the calibration or tolerance testing procedure performed on a given device and at a given nominal value is estimated by the total dispersion in the data obtained for a given Check Standard. Based on the standard normal Z statistic, μ ± Z σ will contain a given percentage of points of the population; for example: μ ± 3 σ will contain 99.73% of the points. Since the process uncertainty u(Di) is in fact the standard deviation σ for the population of values gathered for the Check Standard, we have that, μ ± 3 u(Di) will contain 99.73% of the points. In other words, in 0.27% of the cases, a value obtained for the Check Standard will be outside of the interval μ ± 3 u(Di). In equation form, this is represented by:
$$P\left(\,\left|x_\zeta - \mu\right| > 3\,u(D_i)\,\right) = 0.27\%$$
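The coverage figures quoted above can be checked numerically. The following sketch uses the standard normal distribution from the Python standard library; the helper name `coverage` is hypothetical.

```python
from statistics import NormalDist


def coverage(z: float) -> float:
    """Fraction of a normal population lying within mu +/- z * sigma."""
    return 2 * NormalDist().cdf(z) - 1


inside = coverage(3.0)   # fraction within mu +/- 3 u(Di)
outside = 1 - inside     # fraction outside that interval

print(f"{inside:.4%} inside, {outside:.4%} outside")
```

For Z = 3 this reproduces the 99.73% and 0.27% figures given in the text.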
From the Central Limit Theorem, we know that the Process Global Mean μ will be equal to the average of the Group Means. This implies that the average of all points (in the population) will be equal to the average of the Group Means. Further, the Central Limit Theorem says that the standard deviation obtained when comparing the Group Means for groups of ‘n’ points taken from the population will be equal to the standard deviation of the population divided by the square root of the number ‘n’ of points in each group.
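Accordingly, the standard normal Z value for the mean x̄n of a group of ‘n’ points is expressed as:
$$Z = \frac{\bar{x}_n - \mu}{\sigma/\sqrt{n}}$$
Rearranging this expression yields:
$$\bar{x}_n - \mu = \frac{Z\,\sigma}{\sqrt{n}}$$
The above equation is important since it implies that, at the confidence level associated with Z, the difference between the Process Global Mean and any group mean of ‘n’ points will be no greater than Zσ/√n.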
Since we want

The Central Limit Theorem implies that

Therefore
![]()
Re-writing yields:
$$\frac{9\,u(D_i)}{\sqrt{n}} < \text{tolerance}$$
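The above equation implies that, for comparative weighing with ‘n’ comparisons:
n=1 comparative weighing: 9.00 u(Di) must be smaller than the tolerance of the mass being calibrated
n=2 comparative weighings: 6.36 u(Di) must be smaller than the tolerance of the mass being calibrated
n=3 comparative weighings: 5.20 u(Di) must be smaller than the tolerance of the mass being calibrated
n=4 comparative weighings: 4.50 u(Di) must be smaller than the tolerance of the mass being calibrated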
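One chain of reasoning that reproduces the factors listed above is sketched here. It assumes the criterion is that three standard deviations of the mean of ‘n’ readings must stay below one third of the tolerance T, so that the factor 9 arises as 3 × 3. This criterion is an inference from the numerical factors and is not stated explicitly in the surviving text; the symbol σ with subscript x̄ is introduced here for illustration only.

```latex
\begin{align*}
3\,\sigma_{\bar{x}} &< \frac{T}{3}
  && \text{(assumed criterion)} \\
\sigma_{\bar{x}} &= \frac{u(D_i)}{\sqrt{n}}
  && \text{(standard deviation of the mean of $n$ readings)} \\
\frac{3\,u(D_i)}{\sqrt{n}} &< \frac{T}{3} \\
\frac{9\,u(D_i)}{\sqrt{n}} &< T
\end{align*}
```

With n = 1, 2, 3 and 4, the coefficient 9/√n evaluates to 9.00, 6.36, 5.20 and 4.50 respectively, matching the list.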
Example
From the initial 15 day test, we have determined that for a given device the Process Uncertainty in the 20 kg range is u(Di) = 0.15 g.
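From the above, we have that, in order to calibrate a 20 kg mass to an (OIML R111) M1 tolerance of 1 g, we need:
$$\frac{9 \times 0.15\ \text{g}}{\sqrt{n}} < 1\ \text{g}$$
Re-writing this, we see that:
$$\sqrt{n} > 1.35 \quad\Longrightarrow\quad n > 1.82$$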
Therefore, in this example, we see that ‘n’ must be bigger than 1.8 and thus, two or more comparative weighings must be performed in order to meet the performance requirements. This implies that, for this example, comparative weighing with at least 2 comparisons is necessary when calibrating 20 kg masses to an M1 tolerance.
As obtained by posing n=1 in the first equation and isolating u(Di), the result implies that until we can obtain a Process Uncertainty u(Di) < 0.11g, we will need to perform at least two readings of each 20kg weight being calibrated and take the average as its calibrated value.
It should be noted that although the initial 15 day test showed that more than one reading was needed in order to calibrate the 20 kg weights, after more data is gathered (at minimum another 15 days) the performance is again evaluated. If the statistical analyses presented in this document show that both groups of 15 days of data can be combined, then the Process Uncertainty is calculated from the combined 30 days of data. If this new base value of the Process Uncertainty meets the criteria (smaller than 0.11 g for this example), the calibrations of the 20 kg masses can henceforth be performed with only one comparative weighing.
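The selection of the number of readings in this example can be sketched as follows. The function name `readings_required` and its parameters are hypothetical; the factor 9 and the strict inequality are taken from the criteria listed in this section, and both arguments must be in the same unit.

```python
import math


def readings_required(u_di: float, tolerance: float, factor: float = 9.0) -> int:
    """Smallest whole n such that factor * u_di / sqrt(n) < tolerance."""
    if u_di <= 0 or tolerance <= 0:
        raise ValueError("u_di and tolerance must be positive")
    n_min = (factor * u_di / tolerance) ** 2  # n must be strictly greater than this
    return max(1, math.floor(n_min) + 1)


# Coefficients of u(Di) for n = 1..4 comparative weighings.
factors = [round(9 / math.sqrt(n), 2) for n in range(1, 5)]
print(factors)                        # [9.0, 6.36, 5.2, 4.5]

# The 20 kg example: u(Di) = 0.15 g against an M1 tolerance of 1 g.
print(readings_required(0.15, 1.0))   # 2

# Largest process uncertainty that still allows a single reading.
print(round(1.0 / 9.0, 2))            # 0.11
```

The source only tabulates n up to 4; the function returns larger values of n if the uncertainty demands them, which would indicate that the device is poorly suited to the tolerance.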
When more than thirty points are available, the population of data is large enough to use the Central Limit Theorem. This is the case as soon as it is found, through the statistical tests in the following sections, that we can combine the data from the initial 15 day test (that established the base values) to the newly gathered days of data. Therefore, after performing the statistical tests presented in the following sections, if it is decided that the data obtained for a newly computed group can be combined with the existing base values for the Process Global Mean and the Process Variance, the following equations are used.
(3.4.1)
Initially: μ = μk=1
Subsequently (Central Limit Theorem):
$$\mu = \frac{1}{N}\sum_{\zeta=1}^{N} x_\zeta$$
μ = is the base value of the Process Global Mean of all μk values obtained for that Check Standard for ‘N’ days.
xζ = is the mass value mt computed on each day ‘ζ’ for the Check Standard. If multiple readings are required to be performed each day, then the average of the daily readings is the value of xζ.
N = is the total combined number of days for which we calculated a value of the Check Standard. In other words, it is the sum of the number of points in the previous group to the number of points in the new group.
Using again the Central Limit Theorem, we have:
(3.4.2)

u²(Di) = the base value for the Process Variance of the Check Standard ‘i’.
xζ = is the mass value mt computed on each day ‘ζ’ for the Check Standard. If multiple readings are required to be performed each day, then the average of the daily readings is the value of xζ.
μ = is the base value of the Process Global Mean
N = is the total combined number of days for which we calculated a value of the Check Standard. In other words, it is the sum of the number of points in the previous group to the number of points in the new group.
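Once the statistical tests allow two groups to be combined, the new base values can be sketched as below. Several points are assumptions and not confirmed by this section: the function name `combined_base_values`, the divisor N − 1 in the variance, and the handling of the 120-point window for the mean (taken from the Monitoring Method section).

```python
def combined_base_values(previous_values, new_values, mean_window=120):
    """Return (base mean, base variance) after combining two groups.

    Assumptions: the mean uses only the most recent `mean_window` points,
    the variance uses all retained points measured about that mean, and
    the variance divisor is N - 1.
    """
    all_values = list(previous_values) + list(new_values)
    n_total = len(all_values)
    if n_total < 2:
        raise ValueError("at least two points are needed")
    recent = all_values[-mean_window:]
    mu = sum(recent) / len(recent)
    u2 = sum((x - mu) ** 2 for x in all_values) / (n_total - 1)
    return mu, u2


# Hypothetical values, for illustration only.
mu, u2 = combined_base_values([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(mu, u2)  # 3.5 3.5
```

This function should only be called after both the comparison of means and the comparison of variances indicate that the groups can be pooled.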
Statistical Analyses
4.1 Comparison of Means Obtained from Two Groups of Data
In statistical terminology, when two means are compared in order to determine if they are approximately equal, this is referred to as testing the hypothesis that the two means are equal, also called the null hypothesis. If the hypothesis is true, the means are considered equivalent and may be combined if the variances of the samples are also equivalent (this will be discussed in section 4.2); if the hypothesis is false, the samples cannot be combined for they are judged as representing different events and only the latter set of data can be used until another sample of the same size is obtained and this analysis repeated.
When there are fewer than thirty points in each sample, a T-test must be used. If each sample has at least thirty points, then the Z-test is employed. Because the Z-test is simpler than the T-test, it is usually the preferred method.
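(4.1.1)
$$T_{\alpha,\;n_1+n_2-2} \;\ge\; \frac{\left|\bar{x}_1 - \bar{x}_2\right|}{\sqrt{\dfrac{(n_1-1)\,S_1^2 + (n_2-1)\,S_2^2}{n_1+n_2-2}\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}}$$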
The T statistic is taken from the table Percentage Points of the t Distribution (see Annex A) and is evaluated at (n1 + n2 - 2) degrees of freedom and at the desired level of significance α. It should be noted that a two-sided null hypothesis must be considered. This implies that, should the null hypothesis fail, the second mean may be either greater or smaller than the first; in other words, both sides of the statistical distribution must be considered. In this case, α is calculated as follows:
α = (1 - c.l.) ÷ 2
where c.l. is the desired confidence level
The other symbols used in equation 4.1.1 are:
x̄1 = mean of sample 1. In our case, this will be the base value for the process global mean µ
S1 = standard deviation of sample 1. In our case, this will be the base value for the process uncertainty u(Di)
n1 = number of points in sample 1. This will be the total number of points N that were used to calculate the base value of the process global mean
x̄2 = mean of sample 2. In our case, this will be the new group mean μk
S2 = standard deviation of sample 2. In our case, this will be the square root of the new group variance u²k(Di)
n2 = number of points in sample 2. In our case, this will be the number of points in the new group k
Example:
For an initial 16 days, two 20 kg mass standards are compared using a given method and their difference is noted. From these 16 comparisons, a group mean μ1 and a group variance u²1(Di) are calculated. The group mean μ1 for the difference is 128.5 mg, and the group uncertainty (square root of the group variance) is 81 mg. As explained in section 3, we initially set the base values equal to the first group values. This means that the base value for the process global mean is set at μ = 128.5 mg, and the base value for the process uncertainty is set at u(Di) = 81 mg.
For an additional 26 days, the same mass standards are again compared. This time, the group mean μ2 is calculated as 114.9 mg and the group standard deviation u2(Di) is 36 mg.
Can these two sample means be considered equivalent with a 95% confidence level ? Can the two samples be combined and the new process global mean and process uncertainty base values be calculated from the entire 42 data points?
Answer:
Using the above formula for the T-test, with
x̄1 = μ = 128.5 mg, S1 = u(Di) = 81 mg, n1 = N = 16
x̄2 = μ2 = 114.9 mg, S2 = u2(Di) = 36 mg, n2 = 26
$$\frac{\left|128.5 - 114.9\right|}{\sqrt{\dfrac{(15)(81)^2 + (25)(36)^2}{40}\left(\dfrac{1}{16}+\dfrac{1}{26}\right)}} = 0.748$$
The T value is chosen from Annex A with (16 + 26 - 2) = 40 degrees of freedom, where the α value is: α = (1 - c.l.) ÷ 2 = (1 - 0.95) ÷ 2 = 0.025. Therefore,
T = 2.021 ≥? 0.748
The above inequality holds and shows that there is no significant difference between the means; the null hypothesis is not rejected. However, the data cannot be combined until an analysis of the variances is performed (this will be done in section 4.2).
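The arithmetic of this example can be reproduced with a short sketch. The pooled-variance form used here is the one that yields the 0.748 quoted in the example; the function name `pooled_t_statistic` is hypothetical, and the critical value 2.021 is simply the table value quoted in the text.

```python
import math


def pooled_t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Two-sample t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return abs(mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Values from the example (all in mg).
t_stat = pooled_t_statistic(128.5, 81.0, 16, 114.9, 36.0, 26)
t_critical = 2.021  # tabulated value for 40 degrees of freedom, alpha = 0.025

means_equivalent = t_critical >= t_stat
print(round(t_stat, 3), means_equivalent)  # 0.748 True
```

A result of `True` only clears the means; the variances must still be compared before the groups are combined.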
As mentioned at the beginning of section 4.1 above, if each sample has at least thirty points, the Z-test is used because the Z-test is simpler than the T-test.
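(4.1.2)
$$Z_p \;\ge\; \frac{\left|(\bar{x}_1 - \bar{x}_2) - (\mu_1^* - \mu_2^*)\right|}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$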
The Z statistic is taken from the table Cumulative Normal Distribution (see Annex B) and is evaluated from the cumulative probability, which depends on the desired level of confidence. Again, a two-sided null hypothesis must be considered. Therefore, the cumulative probability is calculated as follows:
p = (1 + c.l.) ÷ 2
where c.l. is the desired confidence level
The other symbols used in equation 4.1.2 are:
μ1*= “true” theoretical mean of population 1
μ2*= “true” theoretical mean of population 2
x̄1 = mean of sample 1. Here, this is the base value for the process global mean μ
S1 = standard deviation of sample 1. In our case, this is the base value for the process uncertainty u(Di)
n1 = number of points in sample 1. This is the total number of points N that were used to calculate the base value of the process global mean
x̄2 = mean of sample 2. In our case, this is the new group mean μk
S2 = standard deviation of sample 2. In our case, this is the square root of the new group variance u²k(Di)
n2 = number of points in sample 2. In our case, this is the number of points in the new group k
Above, it is mentioned that a two-sided null hypothesis is considered. This “null” statement implies that the means are equivalent and can be regarded as representing the same population; therefore the difference μ1* - μ2* is set equal to zero.
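The cumulative probability p = (1 + c.l.) ÷ 2 and the matching Z value can also be obtained numerically. The sketch below swaps the Annex B table lookup for Python's standard-library `statistics.NormalDist`; that substitution and the helper name `z_critical` are assumptions made for illustration, not part of the RP.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Two-sided critical Z value for the given confidence level."""
    # Cumulative probability for a two-sided test: p = (1 + c.l.) / 2.
    p = (1 + confidence_level) / 2
    # Inverse of the standard normal cumulative distribution at p.
    return NormalDist().inv_cdf(p)

print(round(z_critical(0.95), 2))  # 1.96
```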
Example:
An initial test is performed where, for 36 days, two 20 kg mass standards are compared and their difference is noted. From these 36 comparisons, a group mean μ1 and a group variance u1²(Di) are calculated. The group mean μ1 for the difference is 98.0 mg, and the group uncertainty (square root of the group variance) u1(Di) is 73 mg. As explained in section 3, we initially set the base values equal to the first group values. This means that the base value for the process global mean is set at μ = 98.0 mg and the process uncertainty is set at u(Di) = 73 mg.
For an additional 41 days, the same mass standards are again compared. This time, the group mean μ2 is calculated as 141.7 mg and the group standard deviation u2(Di) is 55 mg.
Can these two sample means be considered equivalent at a 95% confidence level? Can the two samples be combined and the new base values for the process global mean and process uncertainty be calculated from all 77 data points?
Answer:
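Using the above formula for the Z-test, with
x̄1 = μ = 98.0 mg, S1 = u(Di) = 73 mg, n1 = N = 36
x̄2 = μ2 = 141.7 mg, S2 = u2(Di) = 55 mg, n2 = 41
the calculated value is
$$\frac{|98.0 - 141.7|}{\sqrt{\dfrac{(73)^2}{36} + \dfrac{(55)^2}{41}}} = \frac{43.7}{14.89} = 2.93$$
The Z value is taken from Annex B with a p value of p = (1 + 0.95) ÷ 2 = 0.975. Therefore,
Z(0.975) = 1.96 ≥? 2.93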
The above equation does not hold and the null hypothesis must be rejected. This implies that the populations are different and that the difference between the means is so significant that the samples cannot be combined.
In this case, as discussed in section 2, since the number of new points accumulated is greater than the number used to calculate the base values and the statistical test fails, the new data cannot be combined with the data behind the base values. The cause of this discrepancy shall be identified and documented in the Equipment Log Book; the base value for the process global mean shall be set at μ = μ2, and the process uncertainty u(Di) shall remain unchanged until at least 15 days of new values are gathered.
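The Z-test example above can be reproduced as follows. This is a sketch only: the function name `z_statistic` and its `true_diff` parameter (the hypothesised difference μ1* - μ2*, zero under the null hypothesis) are illustrative, and the table value 1.96 is copied from the text.

```python
import math

def z_statistic(mean1, s1, n1, mean2, s2, n2, true_diff=0.0):
    """Return |(mean1 - mean2) - true_diff| over the unpooled standard error."""
    std_err = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return abs((mean1 - mean2) - true_diff) / std_err

# Values from the example, all in mg.
z_calc = z_statistic(98.0, 73, 36, 141.7, 55, 41)

# Table value quoted in the text (Annex B, p = 0.975).
Z_TABLE = 1.96

# The null hypothesis is rejected when the calculated value exceeds the table value.
means_equivalent = Z_TABLE >= z_calc
print(round(z_calc, 2), means_equivalent)
```

Here the comparison fails, matching the conclusion that the new group mean replaces the base value while u(Di) is retained.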
4.2 Comparison of Variances from Two Groups of Data
This test compares the variances of two samples in order to evaluate if they can be regarded as representing the same population. The F-test can be performed for any sample size.
An important observation is that, unlike the T and Z distributions, which are symmetrical and centered around zero (0), the F distribution is not symmetrical; it is skewed to the right. Therefore, an absolute value cannot be used in the test and two limits, FL and FH, must be calculated; the calculated value must fall within these limits.
$$F_L \le \frac{S_1^2}{S_2^2} \le F_H \qquad (4.2.1)$$

The F statistics are taken from the table Percentage Points of the F Distribution (see Annex C) and are evaluated at F(n1-1, n2-1, β) for FL,H and at F(n1-1, n2-1, α) for FH,L. Since the limits are set such that FL < FH, the choice of subscript depends on whether FL,H < FH,L or FL,H > FH,L. In other words, if FL,H < FH,L, then FL,H = FL and FH,L = FH. Likewise, if FL,H > FH,L, then FL,H = FH and FH,L = FL.
The desired level of significance α is calculated as follows:
α = (1 - c.l.) ÷ 2
where c.l. is the desired confidence level
The β value is calculated as: β = 1 - α
The other symbols used in equation 4.2.1 are:
S1 = standard deviation of sample 1. In our case, this will be the base value for the process uncertainty u(Di)
n1 = number of points in sample 1. This will be the total number of points N that were used to calculate the base value of the process global mean
S2 = standard deviation of sample 2. In our case, this will be the square root of the new group variance uk²(Di)
n2 = number of points in sample 2. In our case, this will be the number of points in the new group k of data
Please note that F(n1-1, n2-1, β) is not directly available, since tables for the F distribution are usually only computed in terms of α.
That is why the relationship F(n1-1, n2-1, β) = 1 ÷ F(n2-1, n1-1, α) is used.
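For readers who want to see where this reciprocal relationship comes from, a short derivation follows. It is a sketch that assumes F(ν1, ν2, γ) denotes the value exceeded with probability γ by a variable X following the F distribution with (ν1, ν2) degrees of freedom, with ν1 = n1 - 1 and ν2 = n2 - 1; the symbols X, ν1, ν2 and γ are introduced here for illustration.

```latex
% Assumed notation: P(X > F(\nu_1, \nu_2, \gamma)) = \gamma for X ~ F(\nu_1, \nu_2).
\begin{align*}
P\bigl(X > F(\nu_1, \nu_2, \beta)\bigr) = \beta = 1 - \alpha
  &\Longrightarrow P\bigl(X \le F(\nu_1, \nu_2, \beta)\bigr) = \alpha \\
  &\Longrightarrow P\!\left(\frac{1}{X} \ge \frac{1}{F(\nu_1, \nu_2, \beta)}\right) = \alpha \\
  &\Longrightarrow \frac{1}{F(\nu_1, \nu_2, \beta)} = F(\nu_2, \nu_1, \alpha)
\end{align*}
% The last step uses the fact that 1/X follows the F distribution
% with the degrees of freedom swapped, (\nu_2, \nu_1).
```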
Example: This is the same example as in section 4.1.1.
For 16 days, two 20 kg mass standards are compared and their difference is noted. From these 16 comparisons, a group mean μ1 and a group variance u1²(Di) are calculated. The group mean μ1 for the difference is 128.5 mg, and the group uncertainty (square root of the group variance) is 81 mg. As explained in section 3, we initially set the base values equal to the first group values. This means that the base value for the process global mean is set at μ = 128.5 mg, and the base value for the process uncertainty is set at u(Di) = 81 mg.
For an additional 26 days, the same mass standards are again compared. This time, the group mean μ2 is calculated as 114.9 mg and the group standard deviation u2(Di) is 36 mg.
Can these two sample variances be considered equivalent at a 95% confidence level? Can the two samples be combined and the new base values for the process global mean and process uncertainty be calculated from all 42 data points?
Answer:
Using the above formula for the F-test, with
S1= u(Di) = 81 mg, n1= N = 16
S2= u2(Di) = 36 mg, n2 = 26

The value of FL,H is calculated from Annex C with F(n1-1, n2-1, β) = 1 ÷ F(n2-1, n1-1, α).
The tables in Annex C are expressed in terms of a significance level of 2α, thus the table Upper 5 percent points is for a 95% confidence level.
We have α = (1 - c.l.) ÷ 2 = (1 - 0.95) ÷ 2 = 0.025, and thus FL,H = 1 ÷ F(25, 15, 0.025). Unfortunately, the value for ν1 = 25 is not provided in the table and we must approximate by interpolation. Since
F(24, 15, 0.025) = 2.29 and
F(30, 15, 0.025) = 2.25
then F(25, 15, 0.025) is approximately 2.28.
Hence FL,H = 1 ÷ F(25, 15, 0.025) = 1 ÷ 2.28 = 0.438.
The FH,L value is taken directly as F(15, 25, α) = F(15, 25, 0.025) = 2.09.
Because FL,H = 0.438 and FH,L = 2.09, then FL,H < FH,L; thus FL,H = FL and FH,L = FH.
Since (81 mg)² ÷ (36 mg)² = 5.06, the test is:
FL ≤? [(S1)² ÷ (S2)²] ≤? FH
0.438 ≤? 5.06 ≤? 2.09
The above relationship does not hold and shows that although the means could be considered as equal (by section 4.1.1.), there is such a difference between the variances that the two samples cannot be combined.
If, however, the first standard deviation had been equal to, say, 50 mg, the result would be as follows:
FL ≤? [(S1)² ÷ (S2)²] ≤? FH
0.438 ≤? 1.93 ≤? 2.09
and the above relationship would hold, making it possible to combine the samples. In this case, the new base values for the process global mean μ and process uncertainty u(Di) would be computed from the entire pool of data.
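Both F-test cases in this example can be checked with the sketch below. It assumes linear interpolation between the two tabulated points, which gives the approximate 2.28 used above, and takes the table values exactly as quoted in the text; the helper names `interpolate` and `f_test` are illustrative.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between the table points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def f_test(s1, s2, f_swapped, f_direct):
    """Check FL <= s1^2 / s2^2 <= FH.

    f_swapped is the table value F(n2-1, n1-1, alpha), used as a reciprocal;
    f_direct is the table value F(n1-1, n2-1, alpha).
    """
    # Order the two limits so that f_low < f_high, as the text requires.
    f_low, f_high = sorted((1 / f_swapped, f_direct))
    ratio = s1**2 / s2**2
    return f_low, ratio, f_high, f_low <= ratio <= f_high

# Table values quoted in the text for 24 and 30 degrees of freedom.
f_25_15 = interpolate(25, 24, 2.29, 30, 2.25)
F_15_25 = 2.09

# Original example: S1 = 81 mg, S2 = 36 mg.
f_low, ratio, f_high, variances_equivalent = f_test(81, 36, f_25_15, F_15_25)
print(round(f_low, 3), round(ratio, 2), f_high, variances_equivalent)

# Alternative case from the text: S1 = 50 mg.
print(f_test(50, 36, f_25_15, F_15_25)[3])
```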
Performance Criteria
Minimum Performance Criteria: INDIVIDUAL SETS
1- When evaluating repeatability and reproducibility of an instrument with a given check standard, each computed value xζ for the check standard shall be contained in the interval:
[μ - (3u(Di)/√n) , μ + (3u(Di)/√n)]
In other words: any computed value for the check standard must be no more than three times the base value of its process uncertainty u(Di), divided by the square root of the number of points used to compute this value of the standard, away from the base value of the process global mean (μ) of that check standard. The base value of the process global mean (μ) is the long-term average of the check standard. Should any value obtained for the check standard be outside the interval above, all calibration/tolerance testing of masses in that range must cease until the problem is rectified.
If a calculated value for a check standard is more than 2u(Di)/√n away from the base value of the process global mean (μ) for that same standard, another value should be immediately computed. If this new value is now within 2u(Di)/√n of μ, register the first average as usual. If the second value is also more than 2u(Di)/√n away from μ, the cause of this discrepancy should be determined and corrected before further measurements are made in that range of nominal values and using that procedure.
2- The standard deviation Sζ (repeatability) for each set of measurements obtained in calculating any value of a given check standard must be smaller than 1/9 the tolerance for that standard.
3- The sensitivity of the instrument shall be better than 1/9 of the tolerance applicable to the standard being monitored.
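The 2u and 3u limits of criterion 1 lend themselves to a simple check. The sketch below is one possible reading of that criterion; the function name, the returned labels ("accept", "repeat", "stop") and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
import math

def classify_check_value(value, base_mean, base_u, n):
    """Classify a computed check-standard value against the 2u and 3u limits.

    base_mean is the base value of the process global mean, base_u the base
    value of the process uncertainty u(Di), and n the number of points used
    to compute the value.
    """
    deviation = abs(value - base_mean)
    unit = base_u / math.sqrt(n)
    if deviation > 3 * unit:
        return "stop"    # outside the 3u interval: work in that range must cease
    if deviation > 2 * unit:
        return "repeat"  # beyond 2u: compute another value immediately
    return "accept"

# Hypothetical numbers in mg: base mean 128.5, u(Di) = 81, values built from 4 points.
print(classify_check_value(200.0, 128.5, 81, 4))  # accept
print(classify_check_value(215.0, 128.5, 81, 4))  # repeat
print(classify_check_value(260.0, 128.5, 81, 4))  # stop
```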
Minimum Performance Criteria: PROCESS
1- The base value of the process uncertainty u(Di) shall always be smaller than 1/9 of the tolerance for that standard, or √n/9 of the tolerance in the case of comparative weighing with ‘n’ comparisons.
2- The base value of the process global mean μ for a check standard shall always meet the following requirement: (Nom - ½ tolerance) ≤ μ ≤ (Nom + ½ tolerance)
where Nom is the nominal value of the standard; otherwise, the check standard shall be adjusted to within 1/3 of the tolerance and re-calibrated.
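The two process criteria can be expressed as a pair of checks. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the mean and nominal value are expressed as deviations from nominal in mg, and the tolerance figures in the usage lines are invented for illustration and do not come from any tolerance table.

```python
import math

def process_criteria_met(base_u, base_mean, nominal, tolerance, n_comparisons=None):
    """Return (uncertainty_ok, mean_ok) for the two process criteria.

    n_comparisons is given only for comparative weighing with n comparisons,
    in which case the uncertainty limit is sqrt(n)/9 of the tolerance.
    """
    factor = math.sqrt(n_comparisons) if n_comparisons else 1.0
    uncertainty_ok = base_u < factor * tolerance / 9
    mean_ok = (nominal - tolerance / 2) <= base_mean <= (nominal + tolerance / 2)
    return uncertainty_ok, mean_ok

# Hypothetical tolerances in mg; mean expressed as a deviation from nominal.
print(process_criteria_met(73, 98.0, 0.0, 1000))                  # (True, True)
print(process_criteria_met(73, 98.0, 0.0, 600))                   # (False, True)
print(process_criteria_met(73, 98.0, 0.0, 600, n_comparisons=4))  # (True, True)
```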
Summary - At a Glance
This RP has discussed methods used to: calculate the Process Global Mean and Process Uncertainty, monitor and update these parameters when required, and evaluate if the calibration process remains in statistical control. In addition, the statistical tests presented in this RP have shown that, when performing process monitoring, it is important not only to look at the equivalence of the means of two samples but also to look at the equivalence of the variances.
At a Glance
To monitor the CSL’s mass calibration process, the following should be done:
a) For a given Check Standard, the group mean μ1 and group variance u1²(Di) shall be initially evaluated over a period of 15 days and computed as described in section 3. These will be considered respectively as the initial base values for the Process Global Mean and Process Variance (square of the Process Uncertainty) and should be noted in the CSL Equipment Performance Log Book.
b) Each day of calibration at a given nominal value, a comparison weighing or series of intercomparisons shall be performed at that nominal value (depending on which process is being monitored) to evaluate the value of the Check Standard.
c) Ensure that the Minimum Performance Criteria (see section 5) are met for each set (day).
d) Repeat steps b) and c) until the new group of data is at least the same size as that which was last used to determine the base values for the Process Global Mean μ and Process Uncertainty u(Di).
e) Compute the new group mean μ2 and group variance u2²(Di).
f) Compare the group mean μ2 to the Process Global Mean μ, using the Z-test if each sample has at least 30 days of data, or the T-test otherwise.
g) Compare the group variance u2²(Di) to the Process Variance u²(Di) using the F-test.
h) Should both tests (means and variances) pass, the data from the two samples can be combined and new base values for the Process Global Mean μ and Process Uncertainty u(Di) can be computed using both samples as one larger sample; this can be done as described in section 3.4.
i) Ensure that the process meets the Minimum Performance Criteria of section 5.
j) Should one or both tests (means or variances) fail, the reason for failure should be investigated, identified and noted in the CSL Equipment Performance Log Book. In this case, the new group mean μ2 shall be retained as the base value for the Process Global Mean. The previous base value for the Process Uncertainty u(Di) shall continue to be used until a new 15 day test is performed as described in section 2.
k) To ensure proper control of the calibration activities, repeat all the above steps on an ongoing basis.
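The decision logic of steps d) to j) can be summarised in a few lines. This is only a sketch of the flow: the function names and the returned labels are hypothetical, the test outcomes are supplied by the caller, and the recomputation of base values from the pooled data (section 3.4) is not shown.

```python
def ready_to_test(n_base, n_new):
    """Step d): the new group must be at least as large as the base group."""
    return n_new >= n_base

def choose_mean_test(n_base, n_new):
    """Step f): Z-test when each sample has at least 30 points, otherwise T-test."""
    return "Z" if n_base >= 30 and n_new >= 30 else "T"

def update_decision(mean_test_passed, variance_test_passed):
    """Steps h) and j): combine only when both tests pass."""
    if mean_test_passed and variance_test_passed:
        return "combine"      # recompute both base values from the pooled data
    return "investigate"      # log the cause; base mean becomes the new group mean

# Sample sizes from the worked examples in section 4.
print(ready_to_test(16, 26), choose_mean_test(16, 26))  # True T
print(ready_to_test(36, 41), choose_mean_test(36, 41))  # True Z

# First example: the means agree (4.1.1) but the variances do not (4.2).
print(update_decision(True, False))  # investigate
```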
References
1. Chao, Lincoln L., Introduction to Statistics, Brooks/Cole Publishing Company, Monterey, California, 1980.
2. Dupuis-Désormeaux, N., Comparison of Means and Variances for Process Monitoring, Ottawa, Legal Metrology Branch, Industry Canada, 10 April 1996.
3. Dupuis-Désormeaux, N., Criteria and Procedures for the Verification of Inspection Standards, Legal Metrology Branch, Industry Canada, third edition, February 21, 1995.
4. Joint International Committee ISO/IEC/OIML/BIPM- TAG-4, Guide to the Expression of Uncertainty in Measurement, first edition, 1993.
5. Johnson, Robert, Elementary Statistics, third edition, North Scituate, Massachusetts: Duxbury Press, 1980.
6. Taylor, John K., and Oppermann, Henry V., Handbook for the Quality Assurance of Metrological Measurements, NBS Handbook 145, pages 8.6 - 8.9.
7. Wonnacott, Thomas H., and Wonnacott, Ronald J., Introductory Statistics, New York: John Wiley & Sons, Inc., 1969.
Annex A: Percentage Points of the T Distribution for the T-test
Annex B: Cumulative Normal Distribution for the Z-test
Annex C: Percentage Points of the F Distribution for the F-test
Annex D: Background Information - Extracts of References