Confidence intervals for frequencies and proportions. Confidence interval and confidence probability

The confidence interval is calculated from the standard error (the "average error") of the corresponding parameter. It shows the limits within which the true value of the estimated parameter lies with probability (1-α). Here α is the significance level; (1-α) is also called the confidence level.

In the first chapter we showed that, for the arithmetic mean, the true population mean lies within two standard errors of the sample mean about 95% of the time. Thus, the boundaries of the 95% confidence interval lie two standard errors on either side of the sample mean; in general, we multiply the standard error by a coefficient that depends on the confidence level. For the mean and the difference of means, this coefficient is Student's coefficient (the critical value of Student's t-test); for a share and the difference of shares, it is the critical value of the z-test. The product of the coefficient and the standard error can be called the marginal error of the parameter, i.e. the largest error we can expect when estimating it at the chosen confidence level.

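Confidence interval for the arithmetic mean:

$$\bar{x} - t_{\alpha,f}\, s_{\bar{x}} \le \mu \le \bar{x} + t_{\alpha,f}\, s_{\bar{x}}$$

Here $\bar{x}$ is the sample mean;

$s_{\bar{x}} = s/\sqrt{n}$ is the standard error of the arithmetic mean;

s is the sample standard deviation;

n is the sample size;

$t_{\alpha,f}$ is the critical value of Student's t-test for the given significance level α and the number of degrees of freedom f = n-1 (Student's coefficient).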

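Confidence interval for the difference of arithmetic means:

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha,f}\, s_{\bar{x}_1-\bar{x}_2} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha,f}\, s_{\bar{x}_1-\bar{x}_2}$$

Here $\bar{x}_1 - \bar{x}_2$ is the difference between the sample means;

$s_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$ is the standard error of the difference of arithmetic means;

$s_1, s_2$ are the sample standard deviations;

$n_1, n_2$ are the sample sizes;

$t_{\alpha,f}$ is the critical value of Student's t-test for the given significance level α and the number of degrees of freedom f = n1 + n2 - 2 (Student's coefficient).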

Confidence interval for shares:

$$d - z_{\alpha}\, s_d \le p \le d + z_{\alpha}\, s_d$$

Here d is the sample share;

$s_d = \sqrt{d(1-d)/n}$ is the standard error of the share;

n is the sample size (group size);

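Confidence interval for share differences:

$$(d_1 - d_2) - z_{\alpha}\, s_{d_1-d_2} \le p_1 - p_2 \le (d_1 - d_2) + z_{\alpha}\, s_{d_1-d_2}$$

Here $d_1 - d_2$ is the difference between the sample shares;

$s_{d_1-d_2}$ is the standard error of the difference between the shares;

$n_1, n_2$ are the sample sizes (group sizes);

$z_{\alpha}$ is the critical value of the z-test at the given significance level α ($z_{0.05} = 1.96$, $z_{0.01} = 2.58$, $z_{0.001} = 3.29$).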

By calculating confidence intervals for the difference in indicators we, first, see directly the possible values of the effect, not just its point estimate. Second, we can decide whether to accept or reject the null hypothesis. Third, we can judge the power of the test.

When testing hypotheses using confidence intervals, the following rule applies:

If the 100(1-α)-percent confidence interval of the mean difference does not contain zero, then the differences are statistically significant at the significance level α; conversely, if this interval contains zero, then the differences are not statistically significant.

Indeed, if this interval contains zero, the compared indicator may be either larger or smaller in one group than in the other, i.e. the observed differences may be due to chance.

The position of zero within the confidence interval gives an indication of the power of the test. If zero is close to the lower or upper limit of the interval, then perhaps with larger groups the differences would reach statistical significance. If zero is close to the middle of the interval, then an increase and a decrease of the indicator in the experimental group are about equally probable, and there are probably really no differences.

Examples:

Surgical mortality was compared for two types of anesthesia: 61 people were operated on under the first type of anesthesia, of whom 8 died; 67 people were operated on under the second type, of whom 10 died.

d1 = 8/61 = 0.131; d2 = 10/67 = 0.149; d1 - d2 = -0.018.

With a probability of 100(1-α) = 95%, the difference in mortality between the compared methods lies in the range (-0.018 - 0.122; -0.018 + 0.122), i.e. (-0.14; 0.104). The interval contains zero, so the hypothesis of equal mortality under the two types of anesthesia cannot be rejected.

Thus, with a probability of 95%, mortality may be lower by up to 14 percentage points or higher by up to 10.4 percentage points. Zero lies approximately in the middle of the interval, so it can be argued that, most likely, the two methods really do not differ in mortality.
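The anesthesia example can be reproduced with a short Python sketch. It assumes an unpooled standard error for the difference of shares and z = 1.96; the text does not show which formula it used, so the margin may differ from the quoted 0.122 in the third decimal place. The function name is illustrative only.

```python
import math

def share_difference_ci(events1, n1, events2, n2, z=1.96):
    """Confidence interval for d1 - d2.

    Assumption: unpooled standard error sqrt(d1(1-d1)/n1 + d2(1-d2)/n2).
    """
    d1, d2 = events1 / n1, events2 / n2
    se = math.sqrt(d1 * (1 - d1) / n1 + d2 * (1 - d2) / n2)
    diff = d1 - d2
    margin = z * se
    return diff, margin, (diff - margin, diff + margin)

# 8 deaths out of 61 vs 10 deaths out of 67
diff, margin, (low, high) = share_difference_ci(8, 61, 10, 67)
contains_zero = low < 0 < high  # True: equal mortality cannot be rejected

print(f"difference = {diff:.3f}, margin = {margin:.3f}, "
      f"CI = ({low:.3f}; {high:.3f}), contains zero: {contains_zero}")
```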

In the example considered earlier, the average tapping time was compared in four groups of students who differed in their examination scores. Let's calculate the confidence intervals of the average tapping time for the students who scored 2 and 5 on the exam, and the confidence interval for the difference between these averages.

Student's coefficients are found from the tables of Student's distribution (see Appendix): for the first group, t(0.05; 48) = 2.011; for the second group, t(0.05; 61) = 2.000. Thus, the confidence interval for the first group is (162.19 - 2.011 * 2.18; 162.19 + 2.011 * 2.18) = (157.8; 166.6), and for the second group (156.55 - 2.000 * 1.88; 156.55 + 2.000 * 1.88) = (152.8; 160.3). So, with a probability of 95%, the average tapping time of those who scored 2 lies between 157.8 ms and 166.6 ms, and that of those who scored 5 lies between 152.8 ms and 160.3 ms.
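The two intervals can be checked with a minimal Python sketch. The means, standard errors and Student's coefficients are taken from the text; the function name is an illustrative choice.

```python
def mean_ci(mean, std_error, t_crit):
    """Confidence interval: mean -/+ t_crit * standard error of the mean."""
    margin = t_crit * std_error
    return (mean - margin, mean + margin)

# Group that scored 2: mean 162.19 ms, standard error 2.18, t(0.05; 48) = 2.011
ci_group_2 = mean_ci(162.19, 2.18, 2.011)
# Group that scored 5: mean 156.55 ms, standard error 1.88, t(0.05; 61) = 2.000
ci_group_5 = mean_ci(156.55, 1.88, 2.000)

print([round(v, 1) for v in ci_group_2])
print([round(v, 1) for v in ci_group_5])
```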

You can also test the null hypothesis using the confidence intervals for the means themselves, and not just for the difference of the means. If the confidence intervals for the means do not overlap, the null hypothesis can be rejected at the chosen significance level; if they overlap, as in our case, this check does not allow the hypothesis to be rejected. This check is conservative: overlapping intervals do not by themselves prove that there is no difference, so the interval for the difference of the means is the more precise tool.

Let's find the confidence interval for the difference in the average tapping time between the groups that scored 2 and 5. The difference of the averages is 162.19 - 156.55 = 5.64. Student's coefficient: t(0.05; 49 + 62 - 2) = t(0.05; 109) = 1.982. The group standard deviations follow from the standard errors and sample sizes: $s_1 = 2.18\sqrt{49} = 15.26$; $s_2 = 1.88\sqrt{62} = 14.80$. The standard error of the difference between the means is $s_{\bar{x}_1-\bar{x}_2} = 2.87$. Confidence interval: (5.64 - 1.982 * 2.87; 5.64 + 1.982 * 2.87) = (-0.044; 11.33).

So, the difference in the average tapping time between the groups that scored 2 and 5 lies in the range from -0.044 ms to 11.33 ms. This interval includes zero, i.e. the average tapping time of those who passed the exam with excellent results may be either lower or higher than that of those who failed it, so the null hypothesis cannot be rejected. But zero is very close to the lower limit, so the tapping time is much more likely to be lower for the excellent students. Thus, we can suppose that differences in the average tapping time between those who scored 2 and 5 probably exist, but we could not detect them given this difference in the averages, this spread of values and these sample sizes.
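The interval for the difference can be sketched in Python as follows. The sketch assumes a pooled-variance standard error, which is consistent with f = n1 + n2 - 2 degrees of freedom, and recovers the group standard deviations from the standard errors quoted above; the last digits may differ slightly from the text because of rounding.

```python
import math

n1, n2 = 49, 62
mean1, mean2 = 162.19, 156.55
se1, se2 = 2.18, 1.88          # standard errors of the two means (from the text)
t_crit = 1.982                 # t(0.05; 109) from the text

# Recover the group standard deviations: s = SE * sqrt(n)
s1, s2 = se1 * math.sqrt(n1), se2 * math.sqrt(n2)

# Assumption: pooled variance across the two groups
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

diff = mean1 - mean2
low, high = diff - t_crit * se_diff, diff + t_crit * se_diff

print(f"SE of difference = {se_diff:.2f}, CI = ({low:.3f}; {high:.2f})")
```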

The power of a test (criterion) is the probability of rejecting a false null hypothesis, i.e. of finding differences where they really exist.

The power of a test is determined by the significance level, the magnitude of the differences between groups, the spread of values in the groups, and the sample size.

For Student's t-test and analysis of variance, you can use power charts.

The power of a test can be used to determine the required group sizes in advance.

The confidence interval shows within what limits the true value of the estimated parameter lies with a given probability.

With the help of confidence intervals, you can test statistical hypotheses and draw conclusions about the power of tests.


Questions for self-examination of students.

1. What is the power of the criterion?

2. In what cases is it necessary to evaluate the power of criteria?

3. Methods for calculating power.

6. How to test a statistical hypothesis using a confidence interval?

7. What can be said about the power of the criterion when calculating the confidence interval?


For the vast majority of simple measurements, the so-called normal law of random errors (Gauss's law) holds quite well. It is derived from the following empirical propositions:

1) measurement errors can take a continuous series of values;

2) with a large number of measurements, errors of the same magnitude but of opposite sign occur equally often;

3) the larger the random error, the lower the probability of its occurrence.

The graph of the normal (Gaussian) distribution is shown in Fig. 1. The equation of the curve has the form

$$f(\Delta x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(\Delta x)^2}{2\sigma^2}},$$

where $f(\Delta x)$ is the distribution function of random errors, which characterizes the probability of an error $\Delta x$, and σ is the root-mean-square error.

The value σ is not a random variable; it characterizes the measurement process. If the measurement conditions do not change, σ remains constant. The square of this quantity is called the variance of the measurements. The smaller the variance, the smaller the spread of individual values and the higher the measurement accuracy.

The exact value of the root-mean-square error σ, like the true value of the measured quantity, is unknown. There is a so-called statistical estimate of this parameter: the root-mean-square error of the arithmetic mean, whose value is determined by the formula

$$S_{\bar{x}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n(n-1)}},$$

where $x_i$ is the result of the i-th measurement; $\bar{x}$ is the arithmetic mean of the obtained values; n is the number of measurements.

The larger the number of measurements, the smaller $S_{\bar{x}}$ becomes. If the true value of the measured quantity is μ, its arithmetic mean obtained from the measurements is $\bar{x}$, and the random absolute error is $\Delta x$, then the measurement result is written as $\mu = \bar{x} \pm \Delta x$.

The interval of values from $\bar{x} - \Delta x$ to $\bar{x} + \Delta x$, which contains the true value of the measured quantity μ, is called the confidence interval. Since $\bar{x}$ is a random variable, the true value falls into the confidence interval with a probability α, which is called the confidence probability, or the reliability of the measurements. This value is numerically equal to the area of the shaded curvilinear trapezoid (see figure).

All this is true for a sufficiently large number of measurements, when the sample estimate of the root-mean-square error is close to its true value. To find the confidence interval and confidence level for a small number of measurements, as in laboratory work, Student's probability distribution is used. This is the probability distribution of a random variable called Student's coefficient, which gives the half-width of the confidence interval in units of the root-mean-square error of the arithmetic mean: $t = \Delta x / S_{\bar{x}}$.


The probability distribution of this quantity does not depend on σ², but depends substantially on the number of experiments n. As the number of experiments n increases, Student's distribution tends to the Gaussian distribution.

The distribution function is tabulated (Table 1). The value of Student's coefficient is found at the intersection of the row corresponding to the number of measurements n and the column corresponding to the confidence level α.

From this article you will learn:

    What is a confidence interval?

    What is the essence of the 3 sigma rule?

    How can this knowledge be put into practice?

Nowadays, because of the overabundance of information associated with a large assortment of products, sales channels, employees, lines of business, etc., it is hard to pick out the main things that deserve attention and management effort first. Determining the confidence interval and analyzing when actual values go beyond its boundaries is a technique that helps you identify situations that influence trends. You will be able to develop positive factors and reduce the influence of negative ones. This technique is used in many well-known global companies.

There are so-called "alerts" that inform managers that the next value in a certain area has gone beyond the confidence interval. What does this mean? It is a signal that some non-standard event has occurred, which may change the existing trend in this area. It is a signal to look into the situation and understand what influenced it.

For example, consider several situations. We calculated the sales forecast with forecast boundaries for 100 commodity items for 2011 by month, and compared it with actual sales in March:

  1. Sales of "Sunflower Oil" broke through the upper limit of the forecast and fell outside the confidence interval.
  2. Sales of "Dry Yeast" went below the lower limit of the forecast.
  3. Sales of "Oatmeal Porridge" broke through the upper limit.

For the rest of the goods, actual sales were within the specified forecast boundaries, i.e. their sales were in line with expectations. So, we identified 3 products that went beyond the boundaries and began to figure out what caused this:

  1. With Sunflower Oil, we entered a new retail chain, which gave us additional sales volume and pushed sales above the upper limit. For this product, it is worth recalculating the forecast until the end of the year, taking into account the forecast for sales to this chain.
  2. For Dry Yeast, the truck got stuck at customs and the product was out of stock for 5 days, which reduced sales and pushed them below the lower limit. It may be worthwhile to find the cause and try not to repeat this situation.
  3. For Oatmeal Porridge, a sales promotion was launched, which resulted in a significant increase in sales and pushed them above the forecast.

We identified 3 factors that pushed sales beyond the forecast boundaries. In real life there can be many more. To improve the accuracy of forecasting and planning, it is worth singling out the factors that cause actual sales to go beyond the forecast and building forecasts and plans for them separately, and then taking into account their impact on the main sales forecast. You can also regularly evaluate the impact of these factors and improve the situation by reducing the influence of the negative factors and increasing the influence of the positive ones.

With a confidence interval, we can:

  1. Highlight the areas that deserve attention because events have occurred in them that may change the trend.
  2. Determine the factors that actually make a difference.
  3. Make a well-grounded decision (for example, about purchasing, planning, etc.).

Now let's look at what a confidence interval is and how to calculate it in Excel using an example.

What is a confidence interval?

The confidence interval is the forecast boundaries (upper and lower) within which the actual values will fall with a given probability (sigma).

I.e. we calculate the forecast, which is our main benchmark, but we understand that the actual values are unlikely to match the forecast exactly. The question arises: within what boundaries can the actual values fall if the current trend continues? The confidence interval calculation, i.e. the upper and lower bounds of the forecast, helps answer this question.

What is the given probability sigma?

When calculating the confidence interval, we can set the probability that the actual values fall within the given forecast boundaries. How? We set the value of sigma. If sigma is equal to:

    3 sigma, then the probability that the next actual value falls within the confidence interval is 99.7%, i.e. about 300 to 1, and there is a 0.3% probability of going beyond the boundaries.

    2 sigma, then the probability that the next value falls within the boundaries is ≈ 95.5%, i.e. the odds are about 20 to 1, and there is a 4.5% chance of going out of bounds.

    1 sigma, then the probability is ≈ 68.3%, i.e. the odds are about 2 to 1, and there is a 31.7% chance that the next value will fall outside the confidence interval.

This gives the 3 sigma rule, which says that, for normally distributed deviations, the probability that the next random value falls within the three-sigma confidence interval is 99.7%.

The great Russian mathematician Chebyshev proved a theorem (Chebyshev's inequality) according to which, for any distribution with a finite variance, the probability of going beyond the three-sigma boundaries is at most 1/9, i.e. about 11%. I.e. the probability of falling into the 3 sigma confidence interval is at least about 89%, whereas an attempt to estimate the forecast and its boundaries "by eye" is fraught with much more significant errors.
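The probabilities quoted for 1, 2 and 3 sigma, and the Chebyshev bound, can be checked with a few lines of Python. This is a sketch under the assumption that deviations from the forecast are normally distributed (for the first function) or have any distribution with a finite variance (for the second); the function names are illustrative.

```python
import math

def normal_coverage(k):
    """P(|X - mean| <= k * sigma) for a normally distributed X."""
    return math.erf(k / math.sqrt(2))

def chebyshev_lower_bound(k):
    """Guaranteed minimum of P(|X - mean| < k * sigma) for any distribution."""
    return 1 - 1 / k**2

for k in (1, 2, 3):
    inside = normal_coverage(k)
    print(f"{k} sigma: inside {inside:.1%}, outside {1 - inside:.1%}")

print(f"Chebyshev bound for 3 sigma: at least {chebyshev_lower_bound(3):.1%}")
```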

How to calculate the confidence interval in Excel yourself?

Let's consider the calculation of the confidence interval in Excel (i.e. the upper and lower bounds of the forecast) using an example. We have a time series: monthly sales for 5 years. See the attached file.

To calculate the boundaries of the forecast, we calculate:

  1. Sales forecast.
  2. Sigma: the standard deviation of the forecast model from the actual values.
  3. Three sigma.
  4. Confidence interval.

1. Sales forecast.

The squared deviation of the time series data from the model value is calculated as =(RC[-14]-RC[-1])^2, where RC[-14] is the time series data and RC[-1] is the model value.


3. For each month, sum the squared deviations from stage 8, Sum((Xi-Ximod)^2), i.e. sum the January values, the February values, and so on across all years.

To do this, use the formula =SUMIF()

SUMIF(array with the period numbers inside the cycle (for months, from 1 to 12); reference to the number of the period in the cycle; reference to the array with the squared differences between the initial data and the model values)


4. Calculate the standard deviation for each period in the cycle from 1 to 12 (stage 10 in the attached file).

To do this, divide the value calculated at stage 9 by the number of periods in this cycle minus 1 and take the square root: =SQRT(Sum((Xi-Ximod)^2)/(n-1))

In Excel: =SQRT(R8/(COUNTIF($O$8:$O$67;O8)-1)), where R8 is the reference to Sum((Xi-Ximod)^2), $O$8:$O$67 is the array with the cycle numbers, and O8 is the specific cycle number being counted in the array.

The Excel function COUNTIF counts the number n.


By calculating the standard deviation of the actual data from the forecast model, we obtained the sigma value for each month (stage 10 in the attached file).

3. Calculate 3 sigma.

At stage 11, we set the number of sigmas, in our example "3" (stage 11 in the attached file).

Other sigma values useful in practice:

1.64 sigma - 10% chance of going out of bounds (1 chance in 10);

1.96 sigma - 5% chance of going out of bounds (1 chance in 20);

2.58 sigma - 1% chance of going out of bounds (1 chance in 100).

5) We calculate three sigma: multiply the "sigma" value for each month by "3".

4. Determine the confidence interval.

  1. Upper forecast limit: sales forecast taking into account growth and seasonality + (plus) 3 sigma;
  2. Lower forecast limit: sales forecast taking into account growth and seasonality - (minus) 3 sigma.

For the convenience of calculating the confidence interval over a long period (see attached file), we use the Excel formula =Y8+VLOOKUP(W8;$U$8:$V$19;2;0), where

Y8 is the sales forecast;

W8 is the number of the month for which we take the 3 sigma value.

I.e. upper forecast limit = "sales forecast" + "3 sigma" (in the example, VLOOKUP(month number; table with 3 sigma values; column from which we take the sigma value for the row matching the month number; 0)).

Lower forecast limit = "sales forecast" minus "3 sigma".

So, we have calculated the confidence interval in Excel.
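For readers who prefer code to spreadsheets, the same steps (squared deviations, sum per month, division by n-1, square root, forecast plus or minus 3 sigma) can be sketched in Python. The data and function names below are hypothetical and are not taken from the attached file; each month needs at least two observations.

```python
import math
from collections import defaultdict

def monthly_sigma(months, actual, model):
    """Standard deviation of actual values from model values, per month number."""
    sq_dev = defaultdict(float)   # analogue of SUMIF over squared deviations
    count = defaultdict(int)      # analogue of COUNTIF over month numbers
    for month, x, x_mod in zip(months, actual, model):
        sq_dev[month] += (x - x_mod) ** 2
        count[month] += 1
    return {m: math.sqrt(sq_dev[m] / (count[m] - 1)) for m in sq_dev}

def forecast_bounds(forecast, month, sigma_by_month, n_sigmas=3):
    """Lower and upper forecast limits: forecast -/+ n_sigmas * sigma(month)."""
    delta = n_sigmas * sigma_by_month[month]
    return forecast - delta, forecast + delta

# Hypothetical history: two months observed over three years
months = [1, 2, 1, 2, 1, 2]
actual = [100, 120, 110, 130, 105, 125]
model = [102, 118, 108, 133, 104, 126]

sigma = monthly_sigma(months, actual, model)
lower, upper = forecast_bounds(115, 1, sigma)
print(f"sigma by month: {sigma}")
print(f"forecast 115 for month 1: limits ({lower:.2f}; {upper:.2f})")
```

Setting n_sigmas to 1.96 or 2.58 gives the narrower 5% and 1% boundaries mentioned above.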

Now we have a forecast and a range with boundaries within which the actual values will fall with the probability set by sigma.

In this article, we looked at what sigma and the three sigma rule are, how to determine a confidence interval, and how you can use this technique in practice.

Accurate forecasts and success to you!


Probabilities recognized as sufficient for confident judgments about population parameters based on sample characteristics are called confidence probabilities.

Usually the values 0.95, 0.99 and 0.999 are chosen as confidence probabilities (they are usually expressed as percentages: 95%, 99%, 99.9%). The greater the responsibility attached to the conclusion, the higher the confidence level chosen: 99% or 99.9%.

A confidence level of 0.95 (95%) is considered sufficient in scientific research in the field of physical culture and sports.

The interval that contains the population arithmetic mean with a given confidence probability is called the confidence interval.

The significance level is a small number α, the probability that the value lies outside the confidence interval. In accordance with the confidence probabilities: α1 = (1 - 0.95) = 0.05; α2 = (1 - 0.99) = 0.01, etc.

Confidence interval for the mean (expectation) of a normal distribution:

$$\bar{x} - t_\gamma \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_\gamma \frac{s}{\sqrt{n}},$$

where γ is the reliability (confidence probability) of the estimate; $\bar{x}$ is the sample mean; s is the corrected standard deviation; n is the sample size; $t_\gamma$ is the value determined from the Student's distribution table (see Appendix, Table 1) for the given n and γ.

To find the boundaries of the confidence interval for the population mean:

1. Calculate $\bar{x}$ and s.

2. Set the confidence probability (reliability) γ of the estimate, 0.95 (95%), or the significance level α, 0.05 (5%).

3. From the table of Student's t-distribution (Appendix, Table 1), find the boundary values $t_\gamma$.

Since the t-distribution is symmetrical about zero, it is sufficient to know only the positive value of t. For example, if the sample size is n = 16, then the number of degrees of freedom (df) of the t-distribution is df = 16 - 1 = 15. According to Table 1 of the Appendix, $t_{0.05}$ = 2.13.

4. Find the boundaries of the confidence interval for α = 0.05 and n = 16:

Confidence limits: $\bar{x} \pm 2.13 \cdot \frac{s}{\sqrt{16}}$.

For large sample sizes (n ≥ 30), Student's t-distribution approaches the normal distribution. Therefore, the confidence interval for μ for n ≥ 30 can be written as follows:

$$\bar{x} - u \frac{s}{\sqrt{n}} < \mu < \bar{x} + u \frac{s}{\sqrt{n}},$$

where u are the percentage points of the standard normal distribution.

For the standard confidence probabilities (95%, 99%, 99.9%) and the corresponding significance levels α, the values of u are given in Table 8.

Table 8

Values of u for standard significance levels α

α       u
0.05    1.96
0.01    2.58
0.001   3.29
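The percentage points u can also be computed rather than looked up. The sketch below uses the standard library class statistics.NormalDist (Python 3.8 or later) and assumes two-sided intervals, so each tail receives α/2; the function name is illustrative.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Two-sided critical value u of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)

# Values rounded to two decimals, as in a printed table
z_table = {alpha: round(z_critical(alpha), 2) for alpha in (0.05, 0.01, 0.001)}
for alpha, u in z_table.items():
    print(f"alpha = {alpha}: u = {u}")
```

The unrounded value for α = 0.001 is about 3.2905.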

Based on the data of example 1, we determine the boundaries of the 95% confidence interval (α = 0.05) for the average result in the standing vertical jump. In our example the sample size is n = 65, so the recommendations for a large sample size can be used to determine the boundaries of the confidence interval.

The confidence interval is the range of limiting values of a statistical quantity within which, with a given confidence probability γ, its value will lie for a larger sample. It is denoted $P(\theta^* - \varepsilon < \theta < \theta^* + \varepsilon) = \gamma$. In practice, the confidence probability γ is chosen sufficiently close to unity: γ = 0.9, γ = 0.95, γ = 0.99.

Purpose of the service. This service determines:

  • the confidence interval for the population mean and the confidence interval for the variance;
  • the confidence interval for the standard deviation and the confidence interval for the population share.

Example #1. On a collective farm, 100 sheep out of a total herd of 1,000 were selected for control shearing. The average wool yield was found to be 4.2 kg per sheep. Determine, with a probability of 0.99, the standard error of the sample for the average wool yield per sheep and the limits within which the yield lies, if the variance is 2.5. The sample is non-repetitive.
Example #2. From a batch of imported products at a post of the Moscow Northern Customs, 20 samples of product "A" were taken by random repeated sampling. The check established the average moisture content of product "A" in the sample, which turned out to be 6% with a standard deviation of 1%.
Determine, with a probability of 0.683, the limits of the average moisture content of the product in the entire batch of imported products.
Example #3. A survey of 36 students showed that the average number of textbooks they read per academic year was 6. Assuming that the number of textbooks read by a student per semester is normally distributed with a standard deviation equal to 6, find: A) an interval estimate for the mathematical expectation of this random variable with a reliability of 0.99; B) the probability with which it can be stated that the average number of textbooks read by a student per semester, calculated for this sample, deviates from the mathematical expectation by no more than 2 in absolute value.

Classification of confidence intervals

By the type of parameter being evaluated:

By sample type:

  1. Confidence interval for an infinite sample;
  2. Confidence interval for a finite sample.
A sample is called repeated if the selected object is returned to the general population before the next one is chosen. A sample is called non-repetitive if the selected object is not returned to the general population. In practice, one usually deals with non-repetitive samples.

Calculation of the mean sampling error for random selection

The discrepancy between the values ​​of indicators obtained from the sample and the corresponding parameters of the general population is called representativeness error.
Designations of the main parameters of the general and sample population.
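Sample mean error formulas (σ² is the variance, w is the sample share, n is the sample size, N is the population size):

                 Repeated selection              Non-repetitive selection
For the mean     $\mu = \sqrt{\sigma^2/n}$       $\mu = \sqrt{(\sigma^2/n)(1 - n/N)}$
For the share    $\mu = \sqrt{w(1-w)/n}$         $\mu = \sqrt{(w(1-w)/n)(1 - n/N)}$

The relationship between the marginal sampling error Δ, guaranteed with some probability P(t), and the mean sampling error μ has the form Δ = tμ, where t is the confidence coefficient, determined from the probability level P(t) according to the table of the Laplace integral function.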
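As an illustration of Δ = tμ, Example #1 above (100 of 1,000 sheep, average yield 4.2 kg, variance 2.5, probability 0.99, non-repetitive sample) can be worked through in Python. This is a sketch that assumes the usual finite population correction (1 - n/N) and takes t from the normal distribution instead of a printed Laplace table; the function names are illustrative.

```python
import math
from statistics import NormalDist

def mean_error_nonrepetitive(variance, n, N):
    """Mean sampling error of the mean for non-repetitive selection."""
    return math.sqrt(variance / n * (1 - n / N))

def confidence_coefficient(p):
    """Coefficient t such that P(|Z| <= t) = p for a standard normal Z."""
    return NormalDist().inv_cdf((1 + p) / 2)

mu = mean_error_nonrepetitive(variance=2.5, n=100, N=1000)
t = confidence_coefficient(0.99)
delta = t * mu                       # marginal sampling error
low, high = 4.2 - delta, 4.2 + delta

print(f"mu = {mu:.3f} kg, t = {t:.3f}, delta = {delta:.3f} kg")
print(f"average yield lies within ({low:.2f}; {high:.2f}) kg")
```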

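Formulas for calculating the sample size with a proper random selection method

Selection method    For the mean                                       For the share
Repeated            $n = t^2\sigma^2/\Delta^2$                         $n = t^2 w(1-w)/\Delta^2$
Non-repeating       $n = t^2\sigma^2 N/(\Delta^2 N + t^2\sigma^2)$     $n = t^2 w(1-w) N/(\Delta^2 N + t^2 w(1-w))$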
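A minimal Python sketch of the sample size calculation for the mean follows. It assumes the standard formulas for proper random selection (with the finite population adjustment when the population size N is given) and rounds up to a whole number of units; the input values are hypothetical.

```python
import math

def sample_size_mean(t, variance, delta, N=None):
    """Required sample size for estimating a mean with marginal error delta.

    N=None means repeated selection; otherwise non-repetitive selection
    from a population of size N.
    """
    if N is None:
        return math.ceil(t**2 * variance / delta**2)
    return math.ceil(t**2 * variance * N / (delta**2 * N + t**2 * variance))

n_repeated = sample_size_mean(t=2, variance=2.5, delta=0.5)
n_halved_error = sample_size_mean(t=2, variance=2.5, delta=0.25)
n_nonrepetitive = sample_size_mean(t=2, variance=2.5, delta=0.5, N=1000)

print(n_repeated, n_halved_error, n_nonrepetitive)
```

With repeated selection the sample size is inversely proportional to Δ², so halving the marginal error requires four times as many observations.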

Confidence interval method

The algorithm for finding the confidence interval includes the following steps:
  1. the confidence probability γ (reliability) is specified;
  2. the estimate of the parameter a is determined from the sample;
  3. from the relation P(α1 < a < α2) = γ, the confidence interval (a - ε; a + ε) is calculated.

Example #1. When checking the suitability of a batch of tablets (250 pieces), it turned out that the average weight of a tablet is 0.3 g and the standard deviation of the weight is 0.01 g. Find the confidence interval that contains the standard tablet weight with a probability of 90%.

Example. Based on the results of the sample observation (sample B appendix), calculate the unbiased estimates of the mean, variance, and standard deviation of the population.

Example. Find the confidence intervals for estimating the mean and the standard deviation of the populations at a confidence level y, if B and y are sampled from the populations.

Example.

1. Using the results of the calculations performed in task No. 2 and assuming that these data were obtained by proper random 10% non-repetitive selection, determine:
a) the limits within which, with a confidence probability of 0.954, the average value of the attribute calculated for the general population lies;
b) how to change the sample size to reduce the marginal error of the mean by 50%.
2. Using the results of the calculations performed in task No. 2 and assuming that these data were obtained by repeated selection, determine:
a) the limits within which, with a confidence probability of 0.954, the share of enterprises whose individual values of the attribute exceed the mode lies in the general population;
b) how to change the sample size to reduce the marginal share error by 20%.

Exercise. A production line for parts of the same type was reconstructed. Two samples are given showing the percentage of rejects in batches of parts produced on this line before and after the reconstruction. Can it be reliably stated that the percentage of rejects in batches of parts decreased after the reconstruction?

Example. Below are data on drilling costs (c.u.) for 49 wells of the West Siberian oil base of Russia:

129 142 132 61 96 96 142 17 135 32
77 58 37 132 79 15 145 64 83 120
11 54 48 100 43 25 67 25 140 130
48 124 29 107 135 101 93 147 112 121
89 97 60 84 46 139 43 145 29
For the purpose of estimating the cost of drilling a new well:
  1. draw a proper random sample of size n = 5;
  2. determine the interval estimate of the population mean (X) from the calculated sample indicators (X, s²) using Student's t-distribution at a significance level of α = 0.05;
  3. determine the point value of the population mean (X) from the initial data;
  4. evaluate the correctness of the interval calculation by comparing the point value (X) with the interval calculated from the sample.
Solution using this calculator:

1. Select 5 values from the table. Let it be column 3: 132, 37, 48, 29, 60.
In the section "Type of statistical series", choose "Discrete series". Enter 5 in the "Number of lines" field.

2. Enter the initial data.

In the "Number of groups" field, select "do not group".

In the field "Confidence interval of the general average, variance and standard deviation", indicate the value γ = 0.95 (which corresponds to α = 0.05).

In the "Sampling" field, specify the value 10 (since 5 out of 49 values were chosen, which corresponds to 10.2% (5 / 49 x 100%)).

In the section "Outputs to report", mark the first item, "Confidence interval for the general average".

3. The resulting solution is saved in Word format.
Before the calculations, a preliminary table is created in which the number of repetitions of the X values is counted.

x             (x - x̄)²
29            1036.84
37            585.64
48            174.24
60            1.44
132           5012.64
Total: 306    6810.8

In this case, all values of X occur exactly once. The interval values of the population mean are calculated in the section "Interval estimation of the population center".
Note: in this case, the calculation uses the estimate of the standard deviation.
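The same check can be sketched in Python without the calculator. The sketch uses the sample from column 3, the table value t(0.05; 4) = 2.776 and the corrected (n-1) standard deviation; it deliberately ignores the finite population correction, so the calculator's interval may be somewhat narrower.

```python
import math
from statistics import mean, stdev

population = [
    129, 142, 132, 61, 96, 96, 142, 17, 135, 32,
    77, 58, 37, 132, 79, 15, 145, 64, 83, 120,
    11, 54, 48, 100, 43, 25, 67, 25, 140, 130,
    48, 124, 29, 107, 135, 101, 93, 147, 112, 121,
    89, 97, 60, 84, 46, 139, 43, 145, 29,
]
sample = [132, 37, 48, 29, 60]   # column 3 of the data table
t_crit = 2.776                   # Student's t for alpha = 0.05, df = 4

x_bar = mean(sample)                               # sample mean
s = stdev(sample)                                  # corrected standard deviation
margin = t_crit * s / math.sqrt(len(sample))
low, high = x_bar - margin, x_bar + margin

population_mean = mean(population)                 # point value from all 49 wells
print(f"sample mean = {x_bar:.1f}, s = {s:.2f}")
print(f"95% interval: ({low:.2f}; {high:.2f}), population mean = {population_mean:.2f}")
```

The population mean computed from all 49 wells falls inside the interval obtained from the 5 sampled wells, which is the comparison asked for in step 4.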

Task number 2: In order to study the time spent on the manufacture of one part, a 10% random non-repetitive sample of factory workers was taken, which resulted in the distribution of parts by time spent presented in App. B.
Based on these data, calculate:
a) the average time spent on the manufacture of one part;
b) the variance and the standard deviation;
c) the coefficient of variation;
d) with a probability of 0.954, the marginal error of the sample mean and the possible boundaries within which the average time spent on manufacturing one part at the factory is expected to lie;
e) with a probability of 0.954, the marginal error of the sample share and the boundaries of the share of parts with the minimum manufacturing time. Before making the calculations, write down the conditions of the problem and fill in Table 2.1.

Solution.
To obtain a solution, specify the following parameters:

  • Type of statistical series: a discrete series is given;
  • Number of groups: do not group;
  • To build the confidence interval for the general mean, variance and standard deviation: γ = 0.954;
  • To build the confidence interval for the general share: γ = 0.954;
  • Sample: 10;
  • Output to the report: confidence interval for the general mean, confidence interval for the general share.

Task number 3: Using the results of the calculations performed in task No. 2 and assuming that these data were obtained by repeated selection, determine:
a) the limits within which, with a confidence probability of 0.954, the share of enterprises whose individual values of the attribute exceed the mode lies in the general population;
b) how to change the sample size to reduce the marginal share error by 20%.

Task number 4: A 20% random non-repetitive sample was taken from a batch of light bulbs to determine the average coil weight. The sampling results are as follows. Weight, mg: 38-40; 40-42; 42-44; 44-46. Number of coils: 15; 30; 45; 10. Determine, with a probability of 0.95, the confidence limits within which the average coil weight lies for the entire batch of electric lamps.

Solution.
Enter the following parameters:

  • Type of statistical series: an interval series is specified;
  • To build the confidence interval for the general mean, variance and standard deviation: γ = 0.95;
  • Sample: 20;
  • Report: confidence interval for the general mean.
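Task 4 can also be worked through directly. The Python sketch below assumes that each interval is represented by its midpoint, that the variance is computed with n in the denominator, that the 20% non-repetitive sample gives the correction factor (1 - 0.2), and that t = 1.96 for a probability of 0.95.

```python
import math

midpoints = [39, 41, 43, 45]   # midpoints of 38-40, 40-42, 42-44, 44-46 mg
counts = [15, 30, 45, 10]      # number of coils in each interval
sampling_fraction = 0.20       # 20% non-repetitive sample
t = 1.96                       # confidence coefficient for P = 0.95

n = sum(counts)
x_bar = sum(x * f for x, f in zip(midpoints, counts)) / n
variance = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, counts)) / n

mu = math.sqrt(variance / n * (1 - sampling_fraction))  # mean sampling error
delta = t * mu                                          # marginal error
low, high = x_bar - delta, x_bar + delta

print(f"mean = {x_bar:.1f} mg, variance = {variance:.1f}, delta = {delta:.2f} mg")
print(f"average coil weight lies within ({low:.2f}; {high:.2f}) mg")
```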

Task number 5: At an electric lamp factory, a sample of 1,600 lamps was taken from a batch of 16,000 lamps (random, non-repetitive selection), of which 40 turned out to be defective. Determine, with a probability of 0.997, the limits within which the percentage of rejects lies for the entire batch of products.

Solution.
Here N = 16000 , n = 1600 , w = d / n = 40/1600 = 0.025.
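The remaining arithmetic for Task 5 can be sketched in Python. The sketch assumes the mean error of the share for non-repetitive selection, sqrt(w(1-w)/n · (1 - n/N)), and the confidence coefficient t = 3, which is the value conventionally used for a probability of 0.997.

```python
import math

N, n, defective = 16000, 1600, 40
t = 3.0                          # confidence coefficient for P = 0.997

w = defective / n                # sample share of rejects
mu = math.sqrt(w * (1 - w) / n * (1 - n / N))   # mean error of the share
delta = t * mu                                  # marginal error of the share
low, high = w - delta, w + delta

print(f"w = {w:.3f}, mu = {mu:.4f}, delta = {delta:.4f}")
print(f"share of rejects in the batch: from {low:.2%} to {high:.2%}")
```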

 
