If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable.

Univariate data. When we conduct a study that looks at only one variable, we say that we are working with univariate data. Suppose, for example, that we conducted a survey to estimate the average weight of high school students. Since we are only working with one variable (weight), we would be working with univariate data.

Bivariate data. When we conduct a study that examines the relationship between two variables, we are working with bivariate data. Suppose we conducted a study to see if there were a relationship between the height and weight of high school students. Since we are working with two variables (height and weight), we would be working with bivariate data.

When a population element can be selected more than one time, we are sampling with replacement. When a population element can be selected only one time, we are sampling without replacement.

The Mean and the Median: Measures of central tendency are the summary measures used to describe the most "typical" value in a set of values. The two most common measures of central tendency are the median and the mean, which can be illustrated with an example. Suppose we draw a sample of five women and measure their weights. They weigh 100 pounds, 100 pounds, 130 pounds, 140 pounds, and 150 pounds.

To find the median, we arrange the observations in order from smallest to largest value. If there is an odd number of observations, the median is the middle value. If there is an even number of observations, the median is the average of the two middle values. Thus, in the sample of five women, the median value would be 130 pounds, since 130 pounds is the middle weight.

The mean of a sample or a population is computed by adding all of the observations and dividing by the number of observations. Thus, in the sample of five women, the mean weight would be (100 + 100 + 130 + 140 + 150) / 5 = 620 / 5 = 124 pounds.
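As a quick check of the two definitions, here is a minimal Python sketch using the standard-library `statistics` module on the five weights from the example. The four-value list used for the even-count case is a hypothetical variation that does not appear in the text.

```python
from statistics import mean, median

# Weights (in pounds) of the five women in the sample above.
weights = [100, 100, 130, 140, 150]

# Mean: add all observations and divide by the number of observations.
sample_mean = mean(weights)      # 620 / 5

# Median: the middle value of the sorted observations (odd count).
sample_median = median(weights)

# Even count (hypothetical variation: drop the heaviest woman).
# median() then averages the two middle values, 100 and 130.
even_median = median(weights[:-1])

print(sample_mean, sample_median, even_median)
```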

As measures of central tendency, the mean and the median each have advantages and disadvantages. Some pros and cons of each measure are summarized below.

The median may be a better indicator of the most typical value if a set of scores has an outlier. An outlier is an extreme value that differs greatly from other values. However, when the sample size is large and does not include outliers, the mean score usually provides a better measure of central tendency.
If you add a constant to every value, the mean and median increase by the same constant. Suppose you multiply every value by a constant. Then, the mean and the median will also be multiplied by that constant.
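The behaviour described above can be checked numerically. The sketch below reuses the five weights from the earlier example; the 500-pound outlier and the constants 10 and 2 are hypothetical values chosen only for illustration.

```python
from statistics import mean, median

weights = [100, 100, 130, 140, 150]

# Hypothetical outlier: the heaviest weight is replaced by 500 pounds.
# The mean is pulled upward, while the median does not move.
with_outlier = [100, 100, 130, 140, 500]

# Add a constant to every value, then multiply every value by a constant.
shifted = [w + 10 for w in weights]
scaled = [w * 2 for w in weights]

summary = {
    "original": (mean(weights), median(weights)),
    "with_outlier": (mean(with_outlier), median(with_outlier)),
    "shifted": (mean(shifted), median(shifted)),
    "scaled": (mean(scaled), median(scaled)),
}

for name, (m, med) in summary.items():
    print(f"{name}: mean={m}, median={med}")
```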

Measures of variability: Statisticians use summary measures to describe the amount of variability or spread in a set of data. The most common measures of variability are the range, the interquartile range (IQR), variance, and standard deviation.

Range: The range is the difference between the largest and smallest values in a set of values.

Interquartile range: The interquartile range (IQR) is a measure of variability based on dividing a data set into quartiles. Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part are called the first, second, and third quartiles, and they are denoted by Q1, Q2, and Q3, respectively.
o Q1 is the "middle" value in the first half of the rank-ordered data set.
o Q2 is the median value in the set.
o Q3 is the "middle" value in the second half of the rank-ordered data set.
The interquartile range is equal to Q3 minus Q1. In some texts, the interquartile range is defined differently: as the difference between the largest and smallest values in the middle 50% of a set of data.

Variance: In a population, variance is the average squared deviation from the population mean, as defined by the following formula:

$\sigma^2 = \sum (X_i - \mu)^2 / N$

where $\sigma^2$ is the population variance, $\mu$ is the population mean, $X_i$ is the ith element from the population, and $N$ is the number of elements in the population.

The variance of a sample is defined by a slightly different formula and uses slightly different notation:

$s^2 = \sum (x_i - \bar{x})^2 / (n - 1)$

where $s^2$ is the sample variance, $\bar{x}$ is the sample mean, $x_i$ is the ith element from the sample, and $n$ is the number of elements in the sample.

Standard Deviation: The standard deviation is the square root of the variance. Thus, the standard deviation of a population is:

$\sigma = \sqrt{\sigma^2} = \sqrt{\sum (X_i - \mu)^2 / N}$

and the standard deviation of a sample is:

$s = \sqrt{s^2} = \sqrt{\sum (x_i - \bar{x})^2 / (n - 1)}$
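The four measures can be computed for the five weights used earlier. This is a minimal sketch, not a definitive implementation: the helper `quartiles` is a hypothetical name, and it assumes that when the number of values is odd, the overall median is left out of both halves. Other texts and libraries use other quartile conventions and may give different results.

```python
from statistics import median, pstdev, pvariance, stdev, variance


def quartiles(values):
    """Return (Q1, Q2, Q3) using the 'middle value of each half' rule.

    Assumption: with an odd number of values, the overall median is
    excluded from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]
    upper = data[(n + 1) // 2:]
    return median(lower), median(data), median(upper)


weights = [100, 100, 130, 140, 150]

data_range = max(weights) - min(weights)   # largest minus smallest
q1, q2, q3 = quartiles(weights)
iqr = q3 - q1                              # Q3 minus Q1

pop_var = pvariance(weights)   # population formula, divides by N
samp_var = variance(weights)   # sample formula, divides by n - 1
pop_sd = pstdev(weights)       # square root of the population variance
samp_sd = stdev(weights)       # square root of the sample variance

print(data_range, (q1, q2, q3), iqr)
print(pop_var, samp_var, round(pop_sd, 2), round(samp_sd, 2))
```

Dividing by n - 1 rather than N makes the sample variance (530) larger than the population variance (424) for the same five numbers.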
If you add a constant to every value, the distance between values does not change. As a result, all of the measures of variability (range, interquartile range, standard deviation, and variance) remain the same. On the other hand, suppose you multiply every value by a constant. This has the effect of multiplying the range, interquartile range (IQR), and standard deviation by that constant. It has an even greater effect on the variance. It multiplies the variance by the square of the constant.
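A short numerical check of these two rules, again using the five weights from the earlier example. The constant 3 and the helper name `spread` are arbitrary choices made for illustration.

```python
import math
from statistics import pstdev, pvariance


def spread(values):
    """Return (range, population standard deviation, population variance)."""
    return max(values) - min(values), pstdev(values), pvariance(values)


weights = [100, 100, 130, 140, 150]
c = 3  # arbitrary constant chosen for illustration

base = spread(weights)
shifted = spread([w + c for w in weights])   # add c to every value
scaled = spread([w * c for w in weights])    # multiply every value by c

# Adding a constant leaves every measure of spread unchanged.
shift_unchanged = (
    shifted[0] == base[0]
    and math.isclose(shifted[1], base[1])
    and shifted[2] == base[2]
)

# Multiplying by c scales the range and standard deviation by c,
# and the variance by c squared.
scale_matches = (
    scaled[0] == c * base[0]
    and math.isclose(scaled[1], c * base[1])
    and scaled[2] == c ** 2 * base[2]
)

print(shift_unchanged, scale_matches)
```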

Measures of position: Statisticians often talk about the position of a value relative to other values in a set of observations. The most common measures of position are quartiles, percentiles, and standard scores (aka, z-scores).

Percentiles: Assume that the elements in a data set are rank ordered from the smallest to the largest. The values that divide a rank-ordered set of elements into 100 equal parts are called percentiles. An element having a percentile rank of Pi would have a greater value than i percent of all the elements in the set. Thus, the observation at the 50th percentile would be denoted P50, and it would be greater than 50 percent of the observations in the set. An observation at the 50th percentile would correspond to the median value in the set.

Quartiles: Quartiles divide a rank-ordered data set into four equal parts. The values that divide each part are called the first, second, and third quartiles, and they are denoted by Q1, Q2, and Q3, respectively. Note the relationship between quartiles and percentiles: Q1 corresponds to P25, Q2 corresponds to P50, and Q3 corresponds to P75. Q2 is the median value in the set.
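The percentile-rank definition above can be sketched directly in Python. The function name `percentile_rank` is hypothetical, and it follows the "greater than i percent of the elements" wording literally; other definitions (for example, counting ties as half) exist. The quartile cut points come from `statistics.quantiles`, whose default interpolation method is only one of several conventions.

```python
from statistics import median, quantiles


def percentile_rank(value, data):
    """Percent of elements in data that are strictly less than value."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)


weights = [100, 100, 130, 140, 150]

# Three of the five weights are smaller than 140.
rank_140 = percentile_rank(140, weights)

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3
# (that is, P25, P50, P75) under the library's default method.
q1, q2, q3 = quantiles(weights, n=4)

print(rank_140, q1, q2, q3)
print(q2 == median(weights))  # Q2 is the median
```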

Standard Scores (z-Scores): A standard score (aka, a z-score) indicates how many standard deviations an element is from the mean. A standard score can be calculated from the following formula:

$z = (X - \mu) / \sigma$

where $z$ is the z-score, $X$ is the value of the element, $\mu$ is the mean of the population, and $\sigma$ is the standard deviation.
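A minimal sketch of the z-score formula, assuming for illustration that the five weights from the earlier example form the whole population, so that the population standard deviation applies. The function name `z_score` is hypothetical.

```python
from statistics import mean, pstdev


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# Assumption: these five weights are treated as the whole population.
weights = [100, 100, 130, 140, 150]
mu = mean(weights)
sigma = pstdev(weights)

z_values = [z_score(w, mu, sigma) for w in weights]

for w, z in zip(weights, z_values):
    print(f"{w} pounds -> z = {z:+.2f}")
```

Values below the mean get negative z-scores, values above it get positive ones, and a value equal to the mean has a z-score of zero.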

Right skewed: fewer observations fall on the right side (the high end of the X axis of a rank-ordered graph), so the tail of the distribution extends to the right.

The center of a distribution is located at the median of the distribution. This is the point in a graphic display where about half of the observations are on either side. Distributions with one clear peak are called unimodal, and distributions with two clear peaks are called bimodal. When a symmetric distribution has a single peak at the center, it is referred to as bell-shaped. When the observations in a set of data are equally spread across the range of the distribution, the distribution is called a uniform distribution. A uniform distribution has no clear peaks.

Like a bar chart, a histogram is made up of columns plotted on a graph. Usually, there is no space between adjacent columns. The main difference between bar charts and histograms is this: with bar charts, each column represents a group defined by a categorical variable; with histograms, each column represents a group defined by a quantitative variable. One implication of this distinction: it is always appropriate to talk about the skewness of a histogram, that is, the tendency of the observations to fall more on the low end or the high end of the X axis. With bar charts, however, the X axis does not have a low end or a high end, because the labels on the X axis are categorical, not quantitative. As a result, it is less appropriate to comment on the skewness of a bar chart.

Frequency vs. Cumulative Frequency: In a data set, the cumulative frequency for a value x is the total number of scores that are less than or equal to x.

Absolute vs. Relative Frequency: Frequency counts can be measured in terms of absolute numbers or relative numbers (e.g., proportions or percentages). Columns in the chart have the same shape, whether the Y axis is labelled with actual frequency counts or with percentages.

Discrete vs. Continuous Variables: In a cumulative frequency plot, Q1 is the value for which the cumulative percentage is 25%. To find Q3, follow the grid line to the right from the Y axis at 75% until it meets the curve, then read the corresponding value on the X axis.

Patterns of Data in Scatterplots: Scatterplots are used to analyze patterns in bivariate data. These patterns are described in terms of linearity, slope, and strength.

Linearity refers to whether a data pattern is linear (straight) or nonlinear (curved). Slope refers to the direction of change in variable Y when variable X gets bigger. If variable Y also gets bigger, the slope is positive; but if variable Y gets smaller, the slope is negative. Strength refers to the degree of "scatter" in the plot. If the dots are widely spread, the relationship between variables is weak. If the dots are concentrated around a line, the relationship is strong.
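Slope and strength can also be put into numbers. The sketch below is one possible approach, not something prescribed by the text: it uses the least-squares slope for direction and the Pearson correlation coefficient r as a stand-in for strength (values near +1 or -1 mean the dots sit close to a line, values near 0 mean wide scatter). The helper name `slope_and_r` and the height values are hypothetical; the weights are the five from the earlier example.

```python
import math


def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson correlation r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical heights (inches) paired with the five weights (pounds).
heights = [60, 62, 65, 68, 70]
weights = [100, 100, 130, 140, 150]

slope, r = slope_and_r(heights, weights)
print(f"slope = {slope:.2f} pounds per inch, r = {r:.2f}")

# A perfectly linear, decreasing pattern for comparison.
neg_slope, neg_r = slope_and_r([1, 2, 3, 4], [8, 6, 4, 2])
print(neg_slope, neg_r)
```

Note that r only measures how tightly the points follow a straight line, so a strong curved pattern can still produce an r near zero.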

Additionally, scatterplots can reveal unusual features in data sets, such as clusters, gaps, and outliers. When the slope is positive in one half of a scatterplot and negative in the other half, the slope for the entire scatterplot is zero.

Methods of Data Collection: There are four main methods of data collection.
o Census
o Sample survey
o Experiment: a controlled study in which the researcher attempts to understand cause-and-effect relationships. The study is "controlled" in the sense that the researcher controls (1) how subjects are assigned to groups and (2) which treatments each group receives.
o Observation: Like experiments, observational studies attempt to understand cause-and-effect relationships. However, unlike experiments, the researcher is not able to control (1) how subjects are assigned to groups and/or (2) which treatments each group receives.

Resources: a sample survey is cheaper than a census.
Generalization: results from a random sample can be generalized to the population; results from an experiment generally cannot, unless the subjects were themselves randomly selected.
Causal (cause and effect): an experiment is better than an observational study for establishing cause and effect.

