
Precision Estimates and the AASHTO re:source Proficiency Sample Program: Why Precise Results Aren’t Always Satisfactory

By Tracy Barnhart, Quality Manager

Posted: April 2014

Why did I receive low ratings on the samples even though my results meet the precision estimate of the test method? That is bogus! Okay, I made up the “bogus” part, but you get the general idea. Why on Earth does AASHTO re:source disregard the precision estimate information when determining each laboratory’s proficiency sample ratings? It seems crazy, doesn’t it? Allow me a chance to explain the method to our madness.

Before we dive in, let’s begin with some basic information. If you haven’t done so already, you might want to check out this article: Proficiency Sample Ratings: Being Average Has Never Been So Good. If you don’t feel like reading it now, or if you just need a refresher, here are some things you should know about the AASHTO re:source Proficiency Sample Program (PSP) final reports:

AASHTO re:source determines the grand average and standard deviation (i.e. 1s) for each data set
The z-score indicates how many standard deviations (1s values) a lab’s individual result is from the grand average of the results from all participating laboratories (excluding invalid and outlier results)
The laboratory rating is based on the absolute value of the z-score (that is, its size regardless of whether it is positive or negative)
High ratings (3’s, 4’s, and 5’s) and low z-scores are considered “good”
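The points above can be sketched in a few lines of Python. This is only an illustration, not the program's actual software: the sample results are made up, the use of the sample standard deviation is an assumption, and the cutoffs that turn a z-score into a 0 to 5 rating are hypothetical values chosen for the example (the article only states that 3's, 4's, and 5's are "good").

```python
from statistics import mean, stdev

def z_score(result, grand_average, one_s):
    """Number of standard deviations (1s values) a result lies from the grand average."""
    return (result - grand_average) / one_s

# Hypothetical cutoffs as (upper bound on |z|, rating) pairs.
# These are assumptions for illustration, not values taken from this article.
ASSUMED_CUTOFFS = [(1.0, 5), (1.5, 4), (2.0, 3), (2.5, 2), (3.0, 1)]

def rating(z, cutoffs=ASSUMED_CUTOFFS):
    """Map the absolute value of z to a rating; the sign of z is ignored."""
    for limit, score in cutoffs:
        if abs(z) <= limit:
            return score
    return 0

# Made-up results from seven labs, with invalid and outlier results already removed.
results = [10.0, 10.4, 9.6, 10.2, 9.8, 10.6, 9.4]
grand_average = mean(results)
one_s = stdev(results)  # assumption: sample standard deviation is the "1s" value

z = z_score(10.6, grand_average, one_s)
print(f"grand average = {grand_average:.2f}, 1s = {one_s:.3f}")
print(f"z-score = {z:.2f}, rating = {rating(z)}")
```

Because the rating depends only on the absolute value, a lab that is the same distance below the average would receive the same rating.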

Here is an example page from a final report that shows this information:

Now let’s move on to precision estimates. For those of you who may not be familiar with precision and bias, here are a couple of helpful articles: The Skinny on Precision and Bias: Aiming Our Sights on Precision and Aiming Our Sights on Bias. If you have ever read a Precision & Bias (P&B) statement in an AASHTO or ASTM test method, you may have noticed that many of the estimates of precision are based on the results from the AASHTO re:source PSP. Oh, what a tangled web we weave…

In a nutshell, precision is a measurement of how close test results are to each other. Precision is measured in terms of repeatability and reproducibility. Repeatability (or single-operator precision) is the precision of test results obtained with the same test method, in the same laboratory, by the same person, with the same equipment. Reproducibility (or multi-laboratory precision) is the precision of test results obtained in different laboratories using the same test method. In other words, repeatability is within-lab precision, while reproducibility is between-lab precision. Together, repeatability and reproducibility establish the upper and lower limits for the precision of a test method.

Whew! Now let’s dig in a little deeper. The standard deviation (1s value) in the test methods and the rating system for the AASHTO re:source PSP are determined in a similar manner. That explains the references to the AASHTO re:source PSP results in many Precision & Bias statements. Although the 1s values may look different between the test methods and the AASHTO re:source PSP reports, they are not incompatible. So what’s the catch? Well, the information is used for somewhat different purposes and is presented slightly differently.

For those of you who have never seen a P&B statement before, here’s one from AASHTO T 11.

[Image: Precision and bias statement from AASHTO T 11]

To further clarify:

1s = standard deviation = how much variation there is from the average value
d2s = acceptable range of two results = 2.8 x 1s
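As a rough sketch of how the d2s limit is applied to a pair of results, the snippet below uses a made-up 1s value and made-up results. The function names are hypothetical. The comment about where 2.8 comes from reflects the usual statistical reasoning rather than anything stated in this article.

```python
import math

def d2s(one_s):
    """Acceptable range of two results: 2.8 x 1s."""
    return 2.8 * one_s

def within_d2s(result_a, result_b, one_s):
    """True if two results differ by no more than the d2s limit."""
    return abs(result_a - result_b) <= d2s(one_s)

# 2.8 is roughly 2 * sqrt(2): the difference of two independent results has a
# standard deviation of sqrt(2) * 1s, and about 95% of such differences fall
# within two of those standard deviations.
print(f"2 * sqrt(2) = {2 * math.sqrt(2):.2f}")

one_s = 0.10  # hypothetical 1s value from a precision statement
print(f"d2s = {d2s(one_s):.2f}")
print(within_d2s(1.20, 1.45, one_s))  # difference of 0.25
print(within_d2s(1.20, 1.50, one_s))  # difference of 0.30
```

Note that this check only compares two results with each other. It says nothing about how far either one is from the average.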

As explained above, the d2s values in the test method precision statements provide guidance for comparing two different test results. In contrast, the AASHTO re:source PSP is directed toward each laboratory being able to evaluate its own testing performance by comparing its results to the “correct” (or average) value for that material. AASHTO re:source PSP ratings are then determined by how far a laboratory’s result is from the “correct” value. The d2s values in the test methods are not appropriate for that purpose. I will explain further later. In the meantime, consider that two test results can be very close to one another in value (i.e. precise), yet be very far from the average value (i.e. not accurate).

Also, it is important to note that the 1s values stated in test method precision estimates usually aren’t applicable to any one specific material. Rather, the precision estimates are typically ballpark estimates applicable to a range of materials. Therefore, some judgment must be used when applying the estimates to different materials. For example, the 1s value for sieving a gravel with rounded particles would likely be different from the 1s value for sieving a crushed stone containing angular and elongated particles.

For some sample materials used in the AASHTO re:source PSP, the 1s values noted in the test methods would allow too much variability to give laboratories useful information about their own testing. Conversely, these values would be too confining for other types of materials. The advantage of the AASHTO re:source PSP is that it determines the standard deviation specific to the material that was used for a specific round of testing. The standard deviation provides a sliding scale that automatically adjusts for any testing bias resulting from the material being tested. For materials that are “easy” to test (such as granite) and show similar results between laboratories, the 1s value will be smaller. For materials that are more “difficult” to test (such as shale) and show higher variability between laboratories, the 1s value used for comparison will generally be larger.
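To see the sliding scale at work, consider a small sketch in which the same deviation from the grand average is judged against two different round-specific 1s values. All numbers here are invented for illustration, and the "easy" and "difficult" 1s values are assumptions, not data from any actual round.

```python
def z_score(result, grand_average, one_s):
    """Number of standard deviations a result lies from the grand average."""
    return (result - grand_average) / one_s

grand_average = 10.0
result = 10.6  # the lab is 0.6 above the grand average in both cases

# Hypothetical 1s values for two different sample materials.
one_s_easy = 0.3       # labs agree closely, so the 1s value is small
one_s_difficult = 0.8  # labs vary more, so the 1s value is larger

z_easy = z_score(result, grand_average, one_s_easy)
z_difficult = z_score(result, grand_average, one_s_difficult)

print(f"easy material:      z = {z_easy:.2f}")
print(f"difficult material: z = {z_difficult:.2f}")
```

The same 0.6 difference comes out at about two standard deviations for the easy material but well under one for the difficult material, so the rating scale tightens or loosens with the material.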

So, what’s the difference between the d2s results in the precision statement of a test method and the way that the AASHTO re:source PSP ratings are determined? Both use the standard deviation of a set of test results from a group of laboratories that are all testing the same material. Both apply the same level of scrutiny or evaluation. Both compare results based on assumptions about how the results would be dispersed for testing that is properly performed. Sounds identical, doesn’t it? Well, once again, the purposes and criteria are a little different. The AASHTO re:source PSP examines the accuracy of a laboratory’s test result, while the test method precision statement examines the precision of two results compared to one another. Remember, accuracy is a measurement of how close a test value is to the true (or “correct”) value, while precision is a measurement of how close individual results are to each other. Here is a handy graphic that represents the difference between accuracy and precision:

[Image: Accurate vs. Precise: The Bullseye Approach]

The AASHTO re:source PSP and the AASHTO Accreditation Program (AAP) evaluate participating laboratories based on how far their results are from the “grand average.” The grand average is used as the “correct” (or “expected”) result, and laboratories are judged by how close their results are to the “correct” results for a properly conducted test on that particular material. This is an examination of the accuracy (not precision) of test results.

The precision estimate in a test method uses the “acceptable range of two results” (d2s) to evaluate how close the test results are to one another, regardless of what the “correct” result should be. This is an examination of the precision (not accuracy) of test results. That being said, keep in mind that it is possible for a pair of test results to be precise but not accurate. For example, if a balance isn’t providing correct readings and that balance is used repeatedly for a set of tests, the test results might be very close to the same value (i.e. “precise”), but that value might be very inaccurate.
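The balance example can be put into numbers. In this sketch the true mass, the balance offset, and the small reading-to-reading scatter are all hypothetical values picked to make the point visible.

```python
from statistics import mean

true_mass = 500.0     # hypothetical true mass in grams
balance_offset = 2.5  # hypothetical constant error in the balance, in grams
scatter = [-0.1, 0.0, 0.1]  # small made-up variation between repeat readings

# Every reading carries the same offset, so the readings agree with each other
# even though all of them are wrong.
readings = [true_mass + balance_offset + s for s in scatter]

spread = max(readings) - min(readings)  # closeness to each other (precision)
error = mean(readings) - true_mass      # closeness to the true value (accuracy)

print(f"spread between readings = {spread:.1f} g")
print(f"average error           = {error:.1f} g")
```

The readings differ from one another by only a fraction of a gram, yet their average is off by the full balance offset.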

So, why can’t the test method precision statement be used to compare a test result to the “average” test result? If a result lies within the “acceptable range of two results” when compared to the “average” result, why isn’t that good enough? The reason is that the test result in question satisfies the precision requirement, but not the accuracy requirement. The reverse is also true. It is possible for two results to be accurate enough for the AASHTO re:source PSP, but not precise enough to meet the precision of the test method.

Here’s an example:

[Image: example]
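The two situations described above can also be checked numerically. The sketch below is an illustration only: the grand average, the 1s value, and the lab results are invented, and for simplicity it assumes the 1s value in the test method equals the 1s value calculated for the round, which will not generally be the case.

```python
def z_score(result, grand_average, one_s):
    """Number of standard deviations a result lies from the grand average."""
    return (result - grand_average) / one_s

def d2s(one_s):
    """Acceptable range of two results: 2.8 x 1s."""
    return 2.8 * one_s

grand_average = 10.0  # hypothetical grand average for the round
one_s = 0.5           # assumed to be the same in the method and in the round

# Case 1: a result within d2s of the average, yet far from it in z-score terms.
result = 11.3
meets_d2s_vs_average = abs(result - grand_average) <= d2s(one_s)
z = z_score(result, grand_average, one_s)
print(f"Case 1: within d2s of average = {meets_d2s_vs_average}, z = {z:.1f}")

# Case 2: two results that are each fairly close to the average,
# but on opposite sides of it, so they are not close to each other.
low, high = 9.1, 10.9
z_low = z_score(low, grand_average, one_s)
z_high = z_score(high, grand_average, one_s)
pair_meets_d2s = abs(high - low) <= d2s(one_s)
print(f"Case 2: z = {z_low:.1f} and {z_high:.1f}, pair within d2s = {pair_meets_d2s}")
```

In the first case the result passes a d2s comparison against the average but sits about 2.6 standard deviations away from it, which is far enough that a low rating would be plausible. In the second case each result is under two standard deviations from the average, yet the pair differs by more than the d2s limit.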

The ratings methodology currently in use by AASHTO re:source is considered to be fair and useful for the purpose of the AASHTO re:source Proficiency Sample Program and the AASHTO Accreditation Program. These programs share a common goal: to judge a laboratory’s testing results by comparing them to the average of a large body of results for a specific material. Since the precision estimates stated in the test methods are not based on an average result for a specific material, applying the test method precision estimates to the AASHTO re:source PSP results would not be appropriate. If you have any further questions, please contact John Malusky at jmalusky@aashtoresource.org or 240-436-4830.

References

AASHTO, “AASHTO T 11, Standard Method of Test for Materials Finer Than 75-μm (No. 200) Sieve in Mineral Aggregates by Washing,” Standard Specifications for Transportation Materials and Methods of Sampling and Testing, Part 2A: Tests, 2013.
Johnson, Brian. "Proficiency Sample Ratings: Being Average Has Never Been So Good." AASHTO re:source In-Focus Newsletter. AASHTO re:source, Oct. 2010. Web. 07 Apr. 2014.
Norris, Rob. "The Skinny on Precision and Bias: Aiming Our Sights on Precision." AASHTO re:source In-Focus Newsletter. AASHTO re:source, Apr. 2012. Web. 07 Apr. 2014.
Price, Evan. "The Skinny on Precision and Bias: Aiming Our Sights on Bias." AASHTO re:source In-Focus Newsletter. AASHTO re:source, Oct. 2012. Web. 07 Apr. 2014.
