Standard Deviation and Normal Distribution in Six Sigma

Standard deviation is a measure of dispersion, or how spread out a data set is. It is calculated by taking the square root of the variance, which is the average of the squared differences between each data point and the mean.

Standard deviation is a useful measure because it is expressed in the same units as the original data, making it easier to interpret.(2) For example, if a set of heights is roughly normally distributed with a standard deviation of 3 inches, about 68% of the data points lie within 3 inches of the mean.

A normal distribution is a type of probability distribution that is symmetrical and bell-shaped. It is defined by its mean, which is the center of the distribution, and its standard deviation, which determines the width of the distribution. The standard deviation should not be confused with the “standard error,” which measures the uncertainty in an estimate of the mean and equals the standard deviation divided by the square root of the sample size.(2)

The normal distribution is important because it is used to model, at least approximately, many real-world phenomena, such as IQ scores and adult height. It is also the basis for statistical tests such as the z-test and, through the closely related t-distribution, the t-test, which are used to compare means and to judge how likely an observed difference would be if chance alone were at work.

In a normal distribution, most data points cluster around the mean, with fewer toward the extremes, which gives the curve the bell shape behind the name “bell curve.” About 68% of the data falls within one standard deviation of the mean, about 95% within two standard deviations, and about 99.7% within three standard deviations.
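These percentages can be checked numerically. The following is a minimal sketch using Python's standard-library statistics.NormalDist; the helper name share_within is illustrative and does not come from the article.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k: float) -> float:
    """Fraction of a normal population within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2 and 3
```

The same fractions hold for any mean and standard deviation, because they depend only on the distance from the mean measured in standard deviations.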

One of the key characteristics of the normal distribution is that it is continuous, meaning that there is an infinite number of possible values between any two points on the curve. This is in contrast to a discrete distribution, in which only distinct, separate values can occur, such as whole-number counts of defects.

The normal distribution is also known as the Gaussian distribution, after the mathematician Carl Friedrich Gauss, who developed the method of least squares and used the normal distribution to describe errors in measurement.(3) Gauss’s work helped lay the foundations of modern statistics and has significantly influenced how we analyze data.

Applications in Six Sigma

In Six Sigma, the normal distribution and standard deviation are used to understand how process data is distributed and to identify opportunities for process improvement. In particular, a high standard deviation flags a process with a high level of variability, which can lead to defects or poor quality outcomes.

By understanding the normal distribution and using standard deviation to measure the dispersion of data, Six Sigma practitioners can identify patterns and trends in data and use this information to improve processes and reduce variability. This can lead to higher quality outcomes and increased efficiency.

These techniques are used by Lean Six Sigma Black Belts while leading process improvement projects. They are required learning under the Six Sigma Body of Knowledge for Black Belt and Master Black Belt practitioners. You can gain your Lean Six Sigma Black Belt certification from the Management and Strategy Institute.

Population versus Sample Standard Deviation

In statistics, standard deviation measures how spread out a dataset is. There are two common types: Population Standard Deviation and Sample Standard Deviation.

Population Standard Deviation measures the standard deviation of a population, which is the entire group of people, objects, or events being studied. On the other hand, Sample Standard Deviation is used to measure the standard deviation of a sample, which is a smaller subset of the population being studied.

The main difference between Population Standard Deviation and Sample Standard Deviation is the formula used to calculate them. Population Standard Deviation divides the sum of the squared differences between each data point and the population mean by the total number of data points in the population, then takes the square root. Sample Standard Deviation uses a slightly different formula: it divides the sum of the squared differences between each data point and the sample mean by the sample size minus one, then takes the square root.
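Written as formulas, the two calculations look like this. The symbols are notation chosen here for illustration and do not appear in the article: N is the population size and μ the population mean, while n is the sample size and x̄ the sample mean.

```latex
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
\qquad\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```

The only structural differences are the divisor (N versus n - 1) and the mean used (the population mean versus the sample mean).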

So why is understanding the difference between Population Standard Deviation and Sample Standard Deviation important for a Six Sigma project? Six Sigma is a data-driven approach to quality improvement that seeks to reduce defects and errors in a process to 3.4 defects per million opportunities. To achieve this goal, Six Sigma practitioners use statistical tools and methods, including standard deviation, to identify and reduce variability in a process.
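The 3.4 defects per million figure can be related to the normal distribution. The sketch below assumes the widely used Six Sigma convention of a 1.5 standard deviation long-term shift in the process mean, which the article does not state; under that assumption a "six sigma" process leaves 4.5 standard deviations between the shifted mean and the nearest specification limit. The function name dpmo and the constant SHIFT are illustrative.

```python
from statistics import NormalDist

SHIFT = 1.5  # assumed long-term drift of the process mean, in standard deviations


def dpmo(sigma_level: float, shift: float = SHIFT) -> float:
    """One-sided defects per million opportunities for a given sigma level."""
    tail = 1.0 - NormalDist().cdf(sigma_level - shift)
    return tail * 1_000_000


print(round(dpmo(6.0), 1))  # 3.4
```

Without the assumed shift, the tail beyond six standard deviations would be far smaller than 3.4 per million, so the shift convention matters when quoting the figure.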

In a Six Sigma project, it is vital to understand whether the standard deviation is being calculated for the entire population or just a sample of it. If it is being calculated for a sample, it is important to use the Sample Standard Deviation formula. Because the sample mean is itself estimated from the same data, the data points sit slightly closer to it than to the true population mean, and dividing by the sample size minus one compensates for this. Failing to use the correct formula could lead to inaccurate conclusions and decisions.

For example, if a Six Sigma project is focused on improving the quality of a product, the standard deviation of the product’s weight could be calculated to identify the amount of variation in weight. If the standard deviation is calculated for a sample of the product, using the Population Standard Deviation formula could underestimate the amount of variation in weight, leading to incorrect conclusions about the process and potential improvements.
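A small simulation can illustrate the underestimate. This is a sketch under stated assumptions, not data from a real process: the true standard deviation, mean weight, sample size, and number of trials are all hypothetical values chosen for the demonstration.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(0)  # fixed seed so the run is repeatable

TRUE_SIGMA = 2.0   # hypothetical process standard deviation, in grams
SAMPLE_SIZE = 5    # hypothetical number of units weighed per sample
TRIALS = 10_000

pop_formula, sample_formula = [], []
for _ in range(TRIALS):
    sample = [random.gauss(100.0, TRUE_SIGMA) for _ in range(SAMPLE_SIZE)]
    pop_formula.append(pstdev(sample))    # divides by n
    sample_formula.append(stdev(sample))  # divides by n - 1

avg_pop = mean(pop_formula)
avg_sample = mean(sample_formula)
print(f"true sigma:            {TRUE_SIGMA:.3f}")
print(f"average, divide by n:  {avg_pop:.3f}")
print(f"average, divide by n-1: {avg_sample:.3f}")
```

On average the divide-by-n result falls further below the true value than the divide-by-(n - 1) result. The sample formula still comes out slightly low for small samples, because it is the variance, not the standard deviation, that the n - 1 divisor makes unbiased.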

Example of Calculating Standard Deviation

Let’s say we have a set of data consisting of the weights of ten apples, in grams:

75, 78, 80, 83, 85, 86, 89, 92, 94, 97

To calculate the standard deviation of this data set, we first need to find the mean, or average, weight of the apples. We do this by adding up all of the weights and dividing by the total number of apples:

Mean = (75 + 78 + 80 + 83 + 85 + 86 + 89 + 92 + 94 + 97) / 10 = 859 / 10 = 85.9

Next, we need to calculate the variance, which is the average of the squared differences between each data point and the mean. We do this by subtracting the mean from each data point, squaring the difference, adding up all of the squared differences, and dividing by the total number of data points:

Variance = [(75 – 85.9)^2 + (78 – 85.9)^2 + (80 – 85.9)^2 + (83 – 85.9)^2 + (85 – 85.9)^2 + (86 – 85.9)^2 + (89 – 85.9)^2 + (92 – 85.9)^2 + (94 – 85.9)^2 + (97 – 85.9)^2] / 10

Variance = 460.9 / 10 = 46.09

Finally, we can calculate the standard deviation by taking the square root of the variance:

Standard Deviation = sqrt(46.09) = 6.789

So the standard deviation of the weights of the ten apples is about 6.789 grams. This tells us how much variation there is in the data set and helps us to understand how tightly clustered the weights are around the mean. Because the calculation divides by 10, this is the Population Standard Deviation. If the ten apples are treated as a sample from a larger crop, dividing by 9 instead gives a variance of about 51.21 and a Sample Standard Deviation of about 7.156 grams.
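The arithmetic for the ten apple weights can be reproduced with Python's standard-library statistics module. This is a minimal sketch; the variable names are illustrative.

```python
from statistics import mean, pstdev, pvariance, stdev

weights = [75, 78, 80, 83, 85, 86, 89, 92, 94, 97]  # apple weights in grams

avg = mean(weights)           # arithmetic mean
pop_var = pvariance(weights)  # population variance: divides by n
pop_sd = pstdev(weights)      # population standard deviation
sample_sd = stdev(weights)    # sample standard deviation: divides by n - 1

print(round(avg, 1))        # 85.9
print(round(pop_var, 2))    # 46.09
print(round(pop_sd, 3))     # 6.789
print(round(sample_sd, 3))  # 7.156
```

Running both pstdev and stdev on the same data makes the effect of the divisor visible: the sample figure is always the larger of the two.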

Understanding the difference between Population Standard Deviation and Sample Standard Deviation is important for statistical analysis and Six Sigma projects. Using the correct formula for the type of data being analyzed is crucial to accurately measure the amount of variation in a process and identify areas for improvement.

What tools can a manufacturer use to determine standard deviation?

Control Charts: Control charts are a type of graph that shows how a process variable changes over time. They can help manufacturers to identify trends, patterns, and shifts in their process, and to calculate standard deviation based on the data collected over time.

Histograms: Histograms are graphs showing the frequency distribution of a data set. They can be used to identify the shape of the distribution, any outliers, and the spread of the data, all of which help in judging how well a calculated standard deviation summarizes the process.

Statistical Software: Many software programs are available that can perform statistical analysis, including calculating standard deviation. Examples of statistical software include Minitab, SAS, and R.

Excel: Excel is a widely used tool for data analysis, and includes built-in functions for calculating standard deviation. The STDEV.S and STDEV.P functions calculate sample and population standard deviation, respectively; the older STDEV and STDEVP functions do the same and remain available for compatibility.
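As an illustration of how a control chart ties back to standard deviation, the following sketch computes limits for an individuals chart. It assumes one common convention that the article does not specify: the short-term standard deviation is estimated from the average moving range divided by the chart constant 1.128, and the limits sit three standard deviations either side of the center line. The measurements are hypothetical.

```python
from statistics import mean

# Hypothetical individual measurements in time order (e.g. fill weights in grams).
measurements = [50.1, 49.8, 50.3, 50.0, 49.7, 50.2, 50.4, 49.9, 50.1, 49.6]

D2 = 1.128  # control-chart constant for moving ranges of two consecutive points

# Absolute differences between consecutive measurements.
moving_ranges = [abs(b - a) for a, b in zip(measurements, measurements[1:])]
sigma_hat = mean(moving_ranges) / D2  # short-term standard deviation estimate

center = mean(measurements)
ucl = center + 3 * sigma_hat  # upper control limit
lcl = center - 3 * sigma_hat  # lower control limit

# Points outside the limits would signal a possible shift in the process.
out_of_control = [x for x in measurements if not lcl <= x <= ucl]

print(f"center={center:.2f}  LCL={lcl:.2f}  UCL={ucl:.2f}")
print("points outside limits:", out_of_control)
```

Dedicated packages such as Minitab apply the same kind of calculation along with additional run rules, so a sketch like this is best treated as a way to understand the chart rather than a replacement for it.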

All of these tools are part of the certified Six Sigma Black Belt’s toolkit.

In general, manufacturers should choose the statistical tool best suited to the type of data they are analyzing and the precision required for their process. By calculating standard deviation and using other statistical tools, manufacturers can gain insight into the variability of their process, identify areas for improvement, and make data-driven decisions to improve quality and reduce defects.

References:

Rei Takako

BS, Business Management
Chicago, IL
