A histogram is a graphical representation of data grouped into bins, showing the distribution of data values or proportions falling into each bin. It can approximate a probability density function (PDF) by making the bin widths smaller. A PDF is a curve where the area under the curve in an interval gives the probability of a data point falling in that interval. The student created a histogram using test data on the burst pressures of heat pipes, then fitted a PDF curve through the histogram bin midpoints to more closely approximate the underlying distribution.
A good way to understand a probability density function (PDF) is to start with a histogram. A histogram is the preferred graphical way of presenting data that have been collected in categories. Data samples grouped in categories or bins cannot be plotted as scatter plots.
Histograms are similar to bar charts except that:
(i) A histogram is drawn to represent the proportion (fraction) of the data in each category (bin).
(ii) Bar width represents the range of the category (bin).
(iii) Bar height is given by height = fraction / width.
(iv) Bar area (not height) represents the proportion.
(v) The bars are adjacent (no gaps) because the abscissa is a continuous variable.
(vi) The proportion in each category is also the probability of belonging to that category. In other words, the next data point is most likely to fall in the category with the largest proportion (area).
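Rules (iii) and (iv) can be sketched in a few lines of Python. The bin edges and counts below are hypothetical, chosen only to show what happens when the widths differ, and `histogram_heights` is an illustrative helper name rather than a library function.

```python
def histogram_heights(edges, counts):
    """Return (fractions, widths, heights) for bins defined by consecutive edges."""
    total = sum(counts)
    fractions = [c / total for c in counts]
    widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
    # Rule (iii): height = fraction / width
    heights = [f / w for f, w in zip(fractions, widths)]
    return fractions, widths, heights


# Hypothetical data: three bins that are 1, 1 and 2 units wide
edges = [0.0, 1.0, 2.0, 4.0]
counts = [5, 10, 5]

fractions, widths, heights = histogram_heights(edges, counts)
# Rule (iv): the area of each bar recovers the proportion in that bin
areas = [h * w for h, w in zip(heights, widths)]

print("fractions:", fractions)
print("heights:  ", heights)
print("areas:    ", areas)
```

In this sketch the first and last bins hold the same fraction of the data, yet the last bar is only half as tall because it is twice as wide. A bar chart of the raw counts would draw those two bars at equal height.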
Histograms and bar charts look very different when the category widths are not all the same. See the example on the first page of http://en.wikipedia.org/wiki/Histogram.
For example, we would make a bar chart with two bars to show the number of men and women in a population. Let's work through a simple example of a histogram:
The NEW ENERGY COMPANY makes pressurized heat pipes to sell commercially. The product performs well but there is concern that the heat pipes may burst. A co-op student is hired to test the burst pressure of heat pipes that are manufactured. He runs tests on 20 samples and gets the following table to make a histogram:
In this simple case the category widths are all the same (1 atm), so the height of each bar (fraction / width) is numerically equal to its proportion. The proportion in each category equals the probability that the next data point falls in that pressure range.
We can get more points if we make the bins (categories) smaller, e.g. 3 to 3.1, 3.1 to 3.2, etc. As we do so, the shape of the histogram approaches the probability density function (PDF). A histogram is therefore an approximation of a PDF.
We can draw a PDF curve that approximates this distribution by using the 3 points from the above table. Using EXCEL, we can get the curve going through these 3 points:
$\mathrm{pdf}(x) = -25x^{2} + 245x - 550$ in %, or $\mathrm{pdf}(x) = -0.25x^{2} + 2.45x - 5.50$ in fractions.
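The coefficients can be checked without a spreadsheet. The sketch below swaps the EXCEL fit for Newton divided differences, which give the unique quadratic through three points. The three points are taken here to be the bin midpoints 4, 5 and 6 atm with proportions 30 %, 50 % and 20 %. That is an inference from the quadratic and from the interval comparison later in the text, not a copy of the table, and `quadratic_through` is an illustrative helper name.

```python
def quadratic_through(points):
    """Coefficients (a, b, c) of a*x**2 + b*x + c through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    slope01 = (y1 - y0) / (x1 - x0)        # first divided differences
    slope12 = (y2 - y1) / (x2 - x1)
    a = (slope12 - slope01) / (x2 - x0)    # second divided difference
    # Expand y0 + slope01*(x - x0) + a*(x - x0)*(x - x1) into powers of x
    b = slope01 - a * (x0 + x1)
    c = y0 - slope01 * x0 + a * x0 * x1
    return a, b, c


# Assumed points: (bin midpoint in atm, proportion in %)
points_percent = [(4.0, 30.0), (5.0, 50.0), (6.0, 20.0)]

a, b, c = quadratic_through(points_percent)
print(a, b, c)                        # -25.0 245.0 -550.0
print(a / 100, b / 100, c / 100)      # the same curve expressed in fractions
```

Because a quadratic has exactly three coefficients, the curve passes through all three points; it is an interpolation, not a least-squares fit.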
The plot is shown below:
Definition: The probability density function is a curve such that the area under it over an interval gives the probability that a data point lies in that interval. It can be obtained by smoothing a histogram.
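The idea that a histogram approaches the PDF as the bins shrink can be illustrated numerically. The sketch below assumes, purely for illustration, that the fitted quadratic (in fractions) is the true density over 3.5 to 6.5 atm. It builds an idealised histogram from that curve using height = fraction / width and measures how far the bars sit from the curve. The function names are hypothetical.

```python
def pdf(x):
    """Quadratic from the text, in fractions per atm."""
    return -0.25 * x**2 + 2.45 * x - 5.50


def antiderivative(x):
    """Antiderivative of pdf (constant of integration omitted)."""
    return -0.25 * x**3 / 3.0 + 2.45 * x**2 / 2.0 - 5.50 * x


def max_gap(lo, hi, n_bins):
    """Largest |bar height - pdf| sampled at the edges and midpoint of each bin."""
    width = (hi - lo) / n_bins
    worst = 0.0
    for i in range(n_bins):
        a = lo + i * width
        b = a + width
        fraction = antiderivative(b) - antiderivative(a)  # area of this bin
        height = fraction / width                         # histogram bar height
        for x in (a, 0.5 * (a + b), b):
            worst = max(worst, abs(height - pdf(x)))
    return worst


# 3 bins are 1 atm wide, 30 bins are 0.1 atm wide, 300 bins are 0.01 atm wide
gaps = [max_gap(3.5, 6.5, n) for n in (3, 30, 300)]
for n, gap in zip((3, 30, 300), gaps):
    print(f"{n:4d} bins: largest gap = {gap:.4f}")
```

The gap shrinks roughly in proportion to the bin width, so the staircase of bars converges on the smooth curve.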
If we use P as the probability, then:
Mathematically, $\mathrm{pdf}(x) = \dfrac{dP}{dx} \approx \dfrac{\Delta P}{\Delta x}$. Therefore the probability:
For the interval $\Delta x$ from $a$ to $b$: $\Delta P = \int_{a}^{b} \mathrm{pdf}(x)\,dx$ = area under the curve (just like the histogram!)
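The link between the integral and the histogram can be made concrete by summing many thin bars, each of area pdf × Δx. The sketch below is a plain midpoint-rule estimate. The interval 4 to 5 atm is an arbitrary choice for illustration, and `area_under` is a hypothetical helper name.

```python
def pdf(x):
    """Quadratic from the text, in fractions per atm."""
    return -0.25 * x**2 + 2.45 * x - 5.50


def area_under(f, a, b, n=1000):
    """Midpoint-rule estimate of the integral of f from a to b using n thin bars."""
    dx = (b - a) / n
    # Each term is the area of one bar: height f(midpoint) times width dx
    return sum(f(a + (i + 0.5) * dx) * dx for i in range(n))


p = area_under(pdf, 4.0, 5.0)
print(f"P(4 <= x <= 5) is about {p:.3f}")
```

With 1000 bars the estimate comes out at about 0.44, that is, roughly a 44 % chance under the fitted curve that a burst pressure lies between 4 and 5 atm.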
What is the probability that the next data point will belong to the range 3.5 to 6.5? The answer is given by:
$$P(3.5 \le x \le 6.5) = \int_{3.5}^{6.5} \left(-25x^{2} + 245x - 550\right)dx = 93.75\% \approx 94\% \approx 100\%$$
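For reference, this integral can be evaluated by hand from the antiderivative of the quadratic. The intermediate values are rounded to two decimals.

```latex
\begin{aligned}
\int_{3.5}^{6.5} \left(-25x^{2} + 245x - 550\right)dx
  &= \left[-\tfrac{25}{3}x^{3} + 122.5\,x^{2} - 550\,x\right]_{3.5}^{6.5} \\
  &\approx (-687.92) - (-781.67) \\
  &= 93.75\ \%
\end{aligned}
```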
We can do this calculation for other intervals and compare with the histogram. We expect the PDF to approximate the histogram:
3.5 to 4.5: $P(3.5 \le x \le 4.5) = \int_{3.5}^{4.5} \left(-25x^{2} + 245x - 550\right)dx = 27.9\% \approx 30\%$
4.5 to 5.5: $P(4.5 \le x \le 5.5) = \int_{4.5}^{5.5} \left(-25x^{2} + 245x - 550\right)dx = 47.9\% \approx 50\%$
5.5 to 6.5: $P(5.5 \le x \le 6.5) = \int_{5.5}^{6.5} \left(-25x^{2} + 245x - 550\right)dx = 17.9\% \approx 20\%$
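The three comparisons can be reproduced in one pass with the exact antiderivative. In the sketch below the histogram proportions are the rounded targets quoted in the comparison above (30 %, 50 %, 20 %), and the function names are illustrative.

```python
def antiderivative(x):
    """Antiderivative of -25x^2 + 245x - 550 (pdf in % per atm)."""
    return -25.0 * x**3 / 3.0 + 122.5 * x**2 - 550.0 * x


def prob_percent(a, b):
    """Area under the fitted curve between a and b, in percent."""
    return antiderivative(b) - antiderivative(a)


# Histogram proportions (%) for each 1 atm bin, as quoted in the text
histogram = {(3.5, 4.5): 30.0, (4.5, 5.5): 50.0, (5.5, 6.5): 20.0}

total = 0.0
for (a, b), hist in histogram.items():
    p = prob_percent(a, b)
    total += p
    print(f"{a} to {b} atm: curve {p:.1f} %, histogram {hist:.0f} %, "
          f"difference {hist - p:.1f}")
print(f"sum over all bins: {total:.2f} %")
```

Each interval falls about 2.1 percentage points short of its histogram bar, which accounts for the total of 93.75 % rather than 100 %. The curve passes through the top of each bar at its midpoint but bends below that height across the rest of the bin.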