I was going to comment on the point made above about variability and how it relates to the way group size (as measured c-t-c, aka extreme spread, ES) is seen to increase with the number of shots, but maybe I'll get back to that later.
I think Hovis raises some very important questions in his last post about the problems of doing and using statistical analysis of target groups. There are definitely limitations and some "costs," and these have to be balanced against the circumstances and goals of the individual. I agree that for some purposes the costs outweigh the benefits and it's not worth it. Also, the results of any statistical analysis are only valid under certain circumstances. So I'm not trying to say the points raised are wrong, just to say something about where the limits lie in applying this kind of analysis to answer some of the questions raised.
I apologize for this being a rather long post, but the questions raised are good, and as often happens with good questions, the answers can be long. Maybe someone else could provide shorter and better-written answers.
When raising CEP or any other form of statistical analysis that involves something other than what is commonly done, I tend to point out the advantages, which may make it seem like I'm saying c-t-c is "wrong" or useless. That's not what I mean to say. It is a legitimate tool for certain purposes and not for others. Just as a hammer is good for driving nails, for driving machine screws a screwdriver is the better tool.
One of the biggest misunderstandings in using statistical results is failing to distinguish between predictions and descriptions. Descriptions are easy. The group sizes (or scores) for a target in a competition are descriptive. They are meant to say how the shooter did right then and there. It doesn't matter if he had a lucky day, an unlucky day, or his average (good or bad) day.
Predictions are different. In that case you are saying what will happen in the future. For example, people often quote a group size as a measure of how accurate a rifle is. The implication is that the group size is somehow representative of the accuracy you can expect from the rifle IN THE FUTURE. Is it a 1 MOA rifle, 0.8 MOA, etc.? As Hovis correctly points out, varying conditions will ultimately affect the accuracy achieved, so predictions are really limited to particular circumstances.
You have to know how barrel condition (wear, cleanliness, temperature, etc.) can be expected to affect the predictions, as well as different loads, weather, etc. In principle, measurements could be made of how all these variables affect the accuracy of the equipment, or the equipment and shooter, but in practice it might take forever. So while possible in principle, it is impossible in practice. But if you limit the conditions, the predictions become useful. So if someone says rifle X is a 1 MOA rifle and rifle Y is a 3 MOA rifle, you take it to mean that under conditions reasonably similar to how they were tested, you can expect X to be considerably more accurate than Y. Knowing exactly what conditions to accept as "reasonably similar" depends on your knowledge of equipment and experience with shooting. It would be foolish to treat statistical results as a substitute for that knowledge and experience; I haven't seen anyone suggest it, and I hope no one would. Without the knowledge and experience to say what is "reasonably similar" in the first place, the statistical results aren't much use.
So "reasonably similar" puts a pretty strong limitation on how you can use a statistical analysis. But still people find it useful to say things like this is a 1 MOA rifle. Some may not care if it is 0.8 MOA or 1 MOA or 1.2 MOA, as long as it's in that range. In that case it is probably true that a more sophisticated way of doing a statistical analysis is a waste of time. On the other hand, there are constantly discussions on these forums about some new way of modifying a rifle, or some new attachment which will squeeze just a little more accuracy out of it. For someone interested in that, this kind of analysis can be very useful and I'll get back to that below.
Consider that 1 MOA rifle again, and let's say it was characterized by shooting five 5-shot groups. Why five? Because people know that even under effectively identical conditions each group will differ in the placement of each individual shot, and the variation from group to group needs to be "averaged" out to get a "representative" result. (Why not four or six? That's a question for another post.) Suppose the next time the rifle is fired it is under "reasonably similar" conditions. Will the next shot fall exactly 1 MOA from the point of aim? Will the next five all fall within 1 MOA of the POI? Maybe, maybe not. There will be some variation from shot to shot no matter how close the conditions are to when the rifle was characterized. This natural variation that everyone knows is there can be described by statistics. The 1 MOA number is one statistic. It is commonly calculated by c-t-c (extreme spread). The problem is that when making a PREDICTION there is a tolerance (also called an uncertainty) in this 1 MOA number, and here is where a big part of the complication comes in. When you go from using statistical results for "that day at that moment" (that is, a description) to talking about later (that is, a prediction), that uncertainty becomes critical in knowing how much trust you can have in the prediction.
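This natural variation is easy to see in a quick simulation. The sketch below is not data from any real rifle: it assumes shots scatter independently with Gaussian error around the point of aim, with an arbitrary made-up sigma, and just shows how much the extreme spread of 5-shot groups wanders even under perfectly identical "conditions":

```python
# Monte Carlo sketch (illustrative assumptions, not real data):
# simulate many 5-shot groups from the SAME true dispersion and watch
# the extreme spread (c-t-c) vary from group to group anyway.
import math
import random

random.seed(1)
SIGMA = 0.3           # assumed per-axis scatter, in MOA (arbitrary)
SHOTS_PER_GROUP = 5
N_GROUPS = 2000

def extreme_spread(shots):
    """Largest center-to-center distance between any two shots."""
    return max(
        math.dist(a, b)
        for i, a in enumerate(shots)
        for b in shots[i + 1:]
    )

spreads = []
for _ in range(N_GROUPS):
    group = [(random.gauss(0, SIGMA), random.gauss(0, SIGMA))
             for _ in range(SHOTS_PER_GROUP)]
    spreads.append(extreme_spread(group))

mean_es = sum(spreads) / len(spreads)
lo, hi = min(spreads), max(spreads)
print(f"mean ES {mean_es:.2f} MOA, range {lo:.2f} to {hi:.2f} MOA")
```

Even with nothing changing between groups, the smallest and largest extreme spreads differ by a factor of two or more, which is exactly the tolerance that gets ignored when a single group size is quoted as "the" accuracy.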
Suppose the whole characterization of the rifle was done over again under reasonably identical conditions. (So we assume the barrel was broken in and that the wear in the barrel from 25 shots isn't significant.) This time we might get 1.2 MOA. A third time 0.9 MOA. If we kept testing, we might find that 68% of the tests gave us a result of 1 +/- 0.2 MOA and 95% gave us 1 +/- 0.4 MOA. And maybe after all of this testing we would start to see the effect of wear on the barrel and so our results which were true for the original rifle are no longer of much use.
But, the whole mathematics of statistics and probability has been developed and TESTED over a few hundred years in solving this kind of problem. So by doing a more complete statistical analysis on the original 5 x 5 group data, you could get the 1 +/- 0.2 MOA result without having to shoot so many shots that you actually begin to wear the barrel.
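As a minimal sketch of what that buys you, here is the simplest version of extracting an uncertainty from the original data: treat the five measured group sizes as a sample and compute the standard error of their mean. The numbers below are invented for illustration, not real measurements:

```python
# Sketch: a "+/-" for the average group size from the original 5 x 5
# data alone, with no extra shots fired. Group sizes are made up.
import math
import statistics

group_sizes = [1.0, 1.2, 0.9, 1.1, 0.8]   # assumed ES of 5 groups, MOA

mean = statistics.mean(group_sizes)
# standard error of the mean = sample std dev / sqrt(number of groups)
sem = statistics.stdev(group_sizes) / math.sqrt(len(group_sizes))
print(f"{mean:.2f} +/- {sem:.2f} MOA (1-sigma)")
```

This is the crudest version (a full CEP analysis would use the individual shot coordinates rather than just the five group sizes), but it already turns "1 MOA" into "1 MOA, give or take this much."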
Certainly for a lot of people the +/- 0.2 MOA information wouldn't be worth the extra trouble of doing the statistical calculations. But for some it might, and it might explain some of the "flyers" coming from a "1 MOA" rifle.
Now to the question of making a modification or adding some kind of "device" to a rifle to make it just a little more accurate. I'm not saying ANYTHING about the merits of any particular modification or device. These forums are full of heated arguments over such things. But, what a more complete statistical analysis does allow is a way of measuring whether any particular device provides an improvement, and if so, how much.
The common c-t-c (extreme spread) statistic for group size is very nice in that it can be done simply and quickly with minimal calculation, and it's very easy to visualize. Suppose that we use a 5 x 5 set of shots and get a result of 1 MOA. Now we haven't calculated the uncertainty in that number, but it's still there. Next we add our device, follow the same procedure, and get 0.9 MOA. Again, we haven't calculated an uncertainty, but it's still there. So did the device really produce an improvement? Maybe, or maybe it was just the random variation we know exists from one set of shots to the next. This may be one reason some of the arguments about this kind of thing can go on forever and get so heated.
So here is where a more sophisticated analysis than c-t-c (such as CEP) is the right tool for the job. With the right set of statistics, but using the same data, the uncertainty in the predicted results can be calculated. So in the above case, you might find that without the device you could expect the rifle to average 1.1 +/- 0.2 MOA groups, and with it 1.0 +/- 0.2 MOA. The difference is so small that it might just be random chance. This tells you that to decide with more certainty you have to shoot a few more groups to "average out" more of the uncertainty. Then you might get 1.1 +/- 0.1 and 0.9 +/- 0.1, and you could say with very good certainty that the device really made a difference. On the other hand, if the device didn't really do anything, then you might get 1.1 +/- 0.1 MOA and 1.0 +/- 0.1 MOA, and it looks like the improvement is either less than about 0.1 MOA or just the result of random variation in shot groups.
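One standard way to put a number on "might just be random chance" is a two-sample comparison of the group sizes measured with and without the device. The sketch below uses Welch's t-statistic on made-up numbers; it's just one of several tests that could be applied here, not the only way to do it:

```python
# Rough sketch of the "did the device help?" comparison: Welch's
# t-statistic on two sets of group sizes (hypothetical numbers).
# |t| well above ~2 suggests a real difference; near or below it,
# the gap could plausibly be random group-to-group variation.
import math
import statistics

without_dev = [1.1, 1.3, 1.0, 1.2, 0.9]   # hypothetical ES, MOA
with_dev    = [0.9, 1.0, 0.8, 1.1, 0.7]   # hypothetical ES, MOA

def welch_t(a, b):
    """Difference of means in units of its combined standard error."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)

t = welch_t(without_dev, with_dev)
print(f"t = {t:.2f}")
```

With these invented numbers the difference sits right at the borderline, which is the "shoot a few more groups" situation described above: the data neither confirm nor rule out a real improvement yet.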
As far as what "very good certainty" means, statistics and probability make it possible to pin that phrase down to a specific number.
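As a rough sketch of how that number comes out, a large-sample (normal) approximation converts a t-like statistic into the two-sided probability that a difference at least that big would show up by luck alone; a proper small-sample analysis would use the t-distribution instead, but the idea is the same:

```python
# Sketch: normal-approximation p-value for a t-like statistic.
# Phi is the standard normal CDF, built from the error function.
import math

def two_sided_p(t):
    """Two-sided chance of |statistic| >= t under pure luck (normal approx.)."""
    phi = 0.5 * (1 + math.erf(abs(t) / math.sqrt(2)))
    return 2 * (1 - phi)

print(f"p = {two_sided_p(2.0):.3f}")
```

So a difference of about two standard errors corresponds to a few-percent chance of arising from random variation alone, which is where the rule of thumb "|t| above about 2" comes from.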
The down-side compared to the c-t-c statistic is that the calculations are long and tedious enough that doing them without a computer is really painful. You also lose the simplicity of being able to easily visualize how the statistic is calculated. However, with a little experience, you can develop a "feel" for the more complicated statistics. And the very simplicity of the c-t-c statistic tends to give people a false sense of confidence in what it can actually predict.
The particular numbers used here are just to illustrate the idea; they aren't meant to say anything about any particular real device. But the point is that people spend a lot of time and money trying to squeeze out a little more accuracy, and the arguments over what works and what doesn't never end. By using more complete statistics, such as CEP, it's possible to make more precise measurements, so you can see what works and what doesn't without being misled by the random variations from one target to the next.
I agree it wouldn't make sense to carry a computer around to try to figure out what to do during a match. Where it seems it would be useful is in trying to improve the AVERAGE performance of equipment and shooter, so that you could expect to do better on average. That might even be more than I should say. Probably the best thing to say is that a more complete statistical analysis can help characterize performance more accurately and precisely. Whether that is useful depends on what the individual is trying to achieve.