• Definition of content analysis
• Uses of content analysis
• Characteristics of content analysis
• Steps in content analysis
• Probability and non-probability sampling
Some Definitions…
• Defined as: “a research technique for the objective, systematic, and
quantitative description of the manifest content of communication.”

• Krippendorff (2004) defines it as a research technique for making
replicable and valid inferences from data to their context.

• Kerlinger (2000) defines it as a method of studying and analyzing
communication in a systematic, objective, and quantitative manner for the
purpose of measuring variables.

• “Any technique for making inferences by objectively and systematically
identifying specified characteristics of messages” (Holsti 1969)

• “The study of recorded human communications” (Babbie 2010: 356).

• One of the major mass communication research methods
• Allows us to describe the nature of content systematically
• Very useful because it allows one to know what is present and
what is not present
• Content analyses tell us nothing about content effects; but
still useful for effects studies as they allow us to characterize
the content that is (or isn’t) having the studied effect
• Content analysis cannot establish causes or tell you
anything about the intentions of the message’s producers
• You are only analysing the product, stripped of both its causes and
its producers’ intentions
Uses of Content Analysis
• Describing Communication Content
• Fact-checking (Analysis of accuracy and sources)
• Testing Hypotheses
- E.g. “If the source has characteristic A, then messages containing elements x and y will be produced;
if the source has characteristic B, then messages with elements w and z will be produced”
• Comparing Media Content to the “Real World”
- E.g. Many content analyses are reality checks in which the portrayal of a certain group,
phenomenon, trait, or characteristic is assessed against a standard taken from real life (e.g. violence)
• Assessing the Image of Particular Groups in Society
- E.g. media images of certain minority or otherwise notable groups
• Establishing a Starting Point for Studies of Media Effects
- E.g. relationship between television content and perceptions of reality
• Quantitative content analysis (e.g. frequencies of words,
occurrences of phrases or labels, word counts, size of pictures,
placement within the paper, running order)
– Concerned with producing numerical data
– Use of numbers makes quantitative analysis very precise
• can report exact increases or decreases or other changes in
degree (as opposed to simply saying there is a lot or a little)
– Usually larger samples (generalization possible)

• Qualitative content analyses (e.g. themes, stereotypes,
representation, labelling, discourse and visual analyses)
– Generally more in-depth
– Often use small samples, which makes generalization difficult
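The word-frequency measures mentioned under quantitative content analysis can be illustrated with a short Python sketch. The headlines, the tracked terms, and the helper name `count_terms` are hypothetical and chosen only for illustration; a real study would take its terms from the codebook.

```python
import re
from collections import Counter

# Hypothetical headlines standing in for a sample of news content.
headlines = [
    "Brexit talks stall as deadline nears",
    "Prime Minister defends Brexit deal",
    "Storm warning issued for the north",
]

def count_terms(texts, terms):
    """Count how often each tracked term occurs across all texts (case-insensitive)."""
    words = Counter()
    for text in texts:
        # Split each text into lowercase word tokens.
        words.update(re.findall(r"[a-z']+", text.lower()))
    return {term: words[term.lower()] for term in terms}

counts = count_terms(headlines, ["Brexit", "deal", "election"])
print(counts)
```

Reporting the resulting counts (rather than "a lot" or "a little") is what gives the quantitative approach its precision.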

Characteristics of Content Analysis
• Objective
• Permits multiple researchers to examine the same
content and come to the same conclusions
• This is possible because content analysis is
• Systematic
• Specifies an unambiguous set of rules or
procedures for coding content (theoretically,
anyone who codes will arrive at the same conclusions)
• Quantitative
• Quantification is important in fulfilling that objective
because it aids researchers in the quest for precision.
- The statement “Seventy percent of all prime-time programs
contain at least one act of violence” is more precise than “Most
shows are violent.”
• Focuses mostly on manifest (outward) content and
less on latent (hidden, between the lines) content
• E.g. “that is a nice shirt” (it is outwardly complimentary, but
could be sarcastic, which would give it a different latent meaning)

Steps in Content Analysis
1. Formulate the research question or hypothesis.

2. Think of a Research Design
3. Define the population (i.e. universe in question) and
select a sample from the population.
4. Select and define a unit of analysis.
5. Create a codebook and a coding sheet
6. Start coding and test for reliability
7. Run statistical analyses and interpret findings
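Step 6 asks for a reliability test but does not name a measure. As an assumption, the sketch below uses two common choices, simple percent agreement and Cohen's kappa (agreement corrected for chance), for two coders who coded the same ten items. The codes ("V" for violent, "N" for nonviolent) and the data are hypothetical.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of items on which both coders assigned the same code."""
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same category at random.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    # Note: undefined (division by zero) if expected agreement equals 1.
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to the same 10 programs.
coder_a = ["V", "V", "N", "N", "V", "N", "V", "N", "V", "N"]
coder_b = ["V", "V", "N", "N", "V", "N", "V", "N", "N", "V"]

agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(agreement, kappa)
```

Here the coders agree on 8 of 10 items, but because half of that agreement could arise by chance, kappa is lower than the raw agreement figure.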
1. Formulate the research question or hypothesis

• Quantitative content analysis is most efficient when explicit
hypotheses or research questions are posed. Describing the
value of such explicitness, McCombs (1972) argued that a
hypothesis (or, presumably, research question) “gives guidance
to the observer trying to understand the complexities of reality.
Those who start out to look at everything in general and
nothing in particular seldom find anything at all” (p. 5).
• RQ e.g. How visible are the Brexit negotiations on British news
(BBC news at 10)?
• Can you think of a hypothesis based on the above question?
2. Think of a Research Design
• Research design addresses questions of the study’s
time frame and how many data points are used.
• It also addresses any comparisons that may be involved.
• Comparisons can be between media (contrasting one
communicator or one medium with another), within
media (comparing among networks or newspapers with
one another), longitudinal or across time (within- or
between-media but at different points in time),
between markets, between nations, and so on.
Research Design Considerations
• How can the research question or hypothesis best be addressed?
• How Will Coders Know the Data When They See It?
- E.g. The analyst must move from the conceptual level to the
operational level, describing abstract or theoretical variables in terms of
actual measurement procedures that coders can apply.
• How Much Data Will Be Needed to Test the Hypothesis or
Answer the Research Question? Will sampling from that
population be necessary? What kind of sample? How
large a sample?
• How Can the Quality of the Data Be Maximized?
• What Kind of Data Analysis Will Be Used?
3. Sampling

• Population – “a group or class of subjects, variables,
concepts, or phenomena.”
• Census – studying every member or item in a population.
Most of the time, this is not practical.
• Sample – “a subset of the population that is
representative of the entire population.”
- Representativeness is very important. A sample that is not representative of the population “is
inadequate for testing purposes because the results cannot be generalized to the population”.
• The inferences drawn from a probability sample are
subject to sampling error, but statistical procedures
enable researchers to generate estimates of this
sampling error with a given level of probability.
• If researchers assemble samples in any way other than
random sampling (and many do or must), the sample
cannot be assumed to be representative, and
sampling error cannot be calculated accurately.
• The larger the sample size, the smaller the sampling
error, but the more resources the project will require.

• Probability sampling – involves random sampling;
every unit has a known, nonzero chance of selection. The
amount of sampling error can be calculated.

• Non-probability sampling – does not employ random
sampling techniques. The amount of sampling error cannot
be calculated.

• [Look at the advantages and disadvantages of the types
of sampling listed in the book. Know, given your
situation, what you should do. Sometimes it is not
possible to take an SRS (simple random sample).]

• Simple random sample – each unit has an equal chance of
selection. (Researchers often use random number generators.)

• Characteristics found more frequently in the population,
whether of TV dramas, news stories, or poems, also will turn
up more frequently in the sample, and less frequent
characteristics in the population will turn up less frequently in
the sample. This occurs because of the laws of probability.

• Sometimes the best probability sample is a simple random
sample, whereas at other times, a stratified or systematic
sample might work best.
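A minimal Python sketch of a simple random sample, using the standard library's random number generator. The population (1,000 stories, 30% of them about politics) and the function name `simple_random_sample` are hypothetical; the fixed seed only makes the draw repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical sampling frame: 1,000 news stories, 30% about politics.
population = ["politics"] * 300 + ["other"] * 700

sample = simple_random_sample(population, 200, seed=42)
share = sample.count("politics") / len(sample)
print(f"Share of politics stories in the sample: {share:.1%}")
```

The share of politics stories in the sample should land close to the 30% found in the population, which is the "laws of probability" point made above.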
• Systematic random sampling – every nth
item/subject/member is selected from a population. Here,
you must know the size of the population and
how large a sample you want (e.g. to select a sample of
100 from a population of 500, you would select every 5th
member). A starting point is selected at random.
• Because the starting point is randomly selected, each unit has
an equal chance of being selected.
• It requires a listing of all possible units for sampling. If the
sampling frame is incomplete (the entire population is not
listed), inference cannot be made to the population.
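The 100-from-500 example can be sketched in Python as follows. The function name `systematic_sample` and the numbered frame are hypothetical; the sketch assumes the frame is a complete list and that the population size is at least the sample size.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit from a complete sampling frame, from a random start."""
    k = len(frame) // n  # sampling interval, e.g. 500 // 100 = 5
    if k < 1:
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random starting point within the first interval
    return frame[start::k][:n]

# Hypothetical frame: 500 units numbered 1 to 500.
frame = list(range(1, 501))
sample = systematic_sample(frame, 100, seed=3)
print(sample[:5])
```

Only the starting point is random; every later unit follows mechanically at the fixed interval.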
• Stratified sample – divide population into groups, and then randomly
select a sufficient proportion from each group.

• This approach is used to get adequate representation of a subsample. The
characteristics of the subsample (strata or segment) may include almost any
variable: age, gender, religion, income level, or even individuals who listen to
specific radio stations or read certain magazines.

• E.g. Before randomly selecting subjects, the researcher divides the
population into three education levels: grade school, high school, and
college. Then, if it is determined that 10% of the population completed
college, a random sample proportional to the population should draw 10%
of its members from college graduates.

• Proportionate stratified sampling is based on the proportions in the population.
• Disproportionate stratified sampling is used to oversample or undersample a stratum.
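A sketch of proportionate stratified sampling for the education example. Only the 10% college share comes from the example above; the 60% and 30% shares, the population size, and the function name are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

def proportionate_stratified_sample(units, stratum_of, n, seed=None):
    """Randomly sample from each stratum in proportion to its population share."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from n in general.
        quota = round(n * len(members) / len(units))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical population of 1,000 people tagged with an education level.
population = (
    [("college", i) for i in range(100)]
    + [("high school", i) for i in range(600)]
    + [("grade school", i) for i in range(300)]
)

sample = proportionate_stratified_sample(population, lambda u: u[0], 100, seed=7)
counts = Counter(level for level, _ in sample)
print(counts)
```

A disproportionate design would replace the proportional quota with quotas chosen by the researcher, for example to oversample a small stratum.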
Sample Size
• Depends on one or more of the following seven
factors: (1) project type, (2) project purpose, (3) project
complexity, (4) amount of error tolerated, (5) time constraints,
(6) financial constraints, and (7) previous research in the area.
• Focus groups use samples of 6–12 people.
• Samples with 10–50 subjects are commonly used for
pretesting measurement instruments and pilot studies.
• Researchers often use samples of 50, 75, or 100 subjects per
group, or cell, such as adults 18–24 years old.
• Samples of 1,000 or more are usually used for surveys.
• Also consider time, cost and similar studies in the field.

• Convenience sample – collection of readily accessible subjects
- E.g. a group of students enrolled in an introductory mass media course or
shoppers in a mall.
• It can be helpful in collecting exploratory information, but
such samples are problematic because they contain unknown
quantities of error.
• Can be used when: a) material is difficult to obtain, b) resources
and ability are limited, and c) the subject is under-researched.
• Purposive sample – elements are selected for specific reasons.
Elements not possessing certain desired characteristics are
automatically excluded. Common in advertising when researchers
want, for example, only people who use a specific type of product.

• Studies of particular types of publications or particular times may
be of interest because these publications were important or the
time played a key role in history.

• Purposive samples differ from convenience samples as they require
justifications other than lack of money and availability.
- E.g. consecutive-unit sampling, which involves taking a series of content produced during a
certain time. Content analyzing local newscasts during a 2-week period is a consecutive-day
sample. Consecutive-day sampling can be important when studying a continuing news or
feature story because connected events cannot be examined adequately otherwise (e.g.
elections, Brexit).
• Volunteer sample – people volunteer to participate and are
not selected according to any mathematical guidelines.
- E.g. the media and many websites inappropriately legitimize volunteers through various polls
or “studies” conducted for radio and television stations
- The results are, at best, only indications, not scientific evidence or proof.

• Quota sample – selection of subjects based on a predetermined
percentage (e.g. a study of users of a particular medium).

• Snowball sampling – contacting a few qualified
respondents/subjects, and then asking them to help secure
other respondents/subjects (by providing names, contact
information, etc. of other people).

• In a strict sense, nonprobability samples are a census of the
units being studied. However, they differ from a true census
because a true census defines the population along
theoretical lines, whereas purposive and convenience samples
define the population based on practical considerations.
• The value of research using convenience samples should not
be diminished.
• Over a period of time, consistent results from a large number
of convenience samples suggest important research questions
and hypotheses or even generalizations to be checked with
probability samples or censuses (i.e. science is cumulative).
Other types of probability samples
• Cluster sampling – divide the population into groups and then randomly
select whole groups (e.g. sampling by district)

• Often with communication research, complete lists of units are
unavailable. To sample when no list is available, researchers use
cluster sampling, which is the process of selecting content units
from clusters or groups of content.

• Multistage sampling is a description of a common practice that may
involve one or several probability sampling techniques at different stages.
- E.g., someone studying the content of talk radio would have to randomly select the radio stations, then
particular days from which to get content, and then the particular talk programs. Yet another stage might be
the particular topics within the radio talk programs. For magazines, the titles, dates, and content within
would be the stages. Pure multistage sampling requires random sampling for each stage.
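The talk radio example can be sketched as three nested random draws: stations, then days, then programs. The station, day, and program names, the counts at each stage, and the function name `multistage_sample` are all hypothetical.

```python
import random

def multistage_sample(stations, n_stations, n_days, n_programs, seed=None):
    """Randomly select stations, then days within each station, then programs."""
    rng = random.Random(seed)
    chosen = []
    for station in rng.sample(sorted(stations), n_stations):      # stage 1
        days = stations[station]
        for day in rng.sample(sorted(days), n_days):              # stage 2
            for program in rng.sample(days[day], n_programs):     # stage 3
                chosen.append((station, day, program))
    return chosen

# Hypothetical frame: 6 stations, 7 days each, 4 talk programs per day.
stations = {
    f"station_{s}": {
        f"day_{d}": [f"program_{p}" for p in range(1, 5)]
        for d in range(1, 8)
    }
    for s in range(1, 7)
}

sample = multistage_sample(stations, n_stations=2, n_days=3, n_programs=2, seed=1)
print(len(sample), sample[0])
```

Because each stage uses a random draw, this corresponds to the "pure" multistage case; replacing one stage with a purposive choice would make the design mixed.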
Sampling Error
• Sampling error (designated as se, m, or CI) provides an indication
of how close the data from a sample are to the population mean. A
low sampling error indicates that there is less variability or range in
the sampling distribution.
• Sampling error is an indication of the accuracy of the sample.
• Sampling error computations are essential in research and are based
on the concept of the central limit theorem.
• The central limit theorem (CLT) states that the distribution of sample
means approaches a normal distribution as the sample size increases,
whatever the shape of the population distribution (provided its
variance is finite).
• CLT allows a researcher to estimate the amount of error in a
probability sample at a particular level of probability
• Sampling error for a given sample is represented by standard error.
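The slides do not give a formula for the standard error. As an assumption, the sketch below uses the usual estimate for the mean of a simple random sample: the sample standard deviation divided by the square root of the sample size. The data (violent acts counted in eight programs) are hypothetical.

```python
import math
import statistics

def standard_error_of_mean(values):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical coding results: violent acts counted in 8 sampled programs.
violent_acts = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(violent_acts)
se = standard_error_of_mean(violent_acts)
print(f"mean = {mean}, standard error = {se:.3f}")
```

Because the sample size sits under a square root, quadrupling the sample only halves the standard error, which is why precision gets expensive quickly.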
Sample Weighting
• In an ideal study, a researcher has enough respondents or subjects
with the required demographic. But this is rare due to budget and
time constraints.
• Most researchers utilize a statistical procedure known as
weighting, or sample balancing.
• That is, when the subject totals in given categories do not reach
the necessary population percentages, subjects’ responses are
multiplied (weighted) to allow for the shortfall.
• A single subject’s responses may be multiplied by any figure to
reach the predetermined required level.
• Consideration must be given to who should be weighted
and how much weighting should be applied (both choices are debatable).
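One common way to compute such weights, shown here as an assumption rather than as the only procedure, is to divide each group's population share by its share of the sample. The age groups, counts, and population shares below are hypothetical.

```python
def balancing_weights(sample_counts, population_shares):
    """Weight for each group = population share / share of the sample."""
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }

# Hypothetical study: 18-24 year olds are 30% of the population
# but only 20 of the 100 respondents.
sample_counts = {"18-24": 20, "25+": 80}
population_shares = {"18-24": 0.30, "25+": 0.70}

weights = balancing_weights(sample_counts, population_shares)
weighted_counts = {g: sample_counts[g] * weights[g] for g in sample_counts}
print(weights)
print(weighted_counts)
```

Each response from the under-represented group counts for more than one respondent, so a few unusual respondents in that group can distort the results; this is one reason the amount of weighting is debatable.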

• With an infinite number of samples, the sampling distribution
of any population will have the characteristics of a normal curve.
• The mean, median (the middle score in a series arranged from
low to high), and mode (the most frequent score value) will be identical.
• 50% of all the sample means will be on each side of the
mean; 68% of all sample means will be within plus or minus 1
standard error (SE) of the mean.
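These properties can be checked with a small simulation. The sketch below repeatedly samples from a deliberately skewed population and counts how many sample means fall below the population mean and within one standard error of it. The population shape (exponential), the sample size of 50, and the 2,000 repetitions are arbitrary choices for illustration, and a finite simulation will only approximate the 50% and 68% figures.

```python
import random
import statistics

def simulate_sample_means(pop_size=20_000, n=50, draws=2_000, seed=0):
    """Return the share of sample means below the mean and within 1 SE of it."""
    rng = random.Random(seed)
    # A strongly skewed (exponential) population, nothing like a bell curve.
    population = [rng.expovariate(1.0) for _ in range(pop_size)]
    pop_mean = statistics.fmean(population)
    se = statistics.pstdev(population) / n ** 0.5
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]
    below = sum(m < pop_mean for m in means) / draws
    within_1se = sum(abs(m - pop_mean) <= se for m in means) / draws
    return below, within_1se

below, within_1se = simulate_sample_means()
print(f"below the mean: {below:.0%}, within 1 SE: {within_1se:.0%}")
```

Even though individual values in the population are far from normally distributed, the sample means cluster symmetrically enough for the normal-curve percentages to apply roughly.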
Confidence Level and Confidence Interval

• Computing sampling error is a process of determining,
with a certain amount of confidence, the difference
between a sample and the target population.
• The confidence level indicates a degree of certainty (as
a percentage) that the results of a study fall within
a given range of values. Typical confidence levels are
95% and 99%.
• The confidence interval (margin of error or sampling
error) is the plus-or-minus range around a sample result
within which the population value is expected to fall at
the chosen confidence level.
- For example, if a 5% confidence interval is used, and 50% of the sample gives a
particular answer for a question, the actual result for that question is estimated to fall between
45% and 55%.
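The 45% to 55% example can be reproduced with the standard margin-of-error formula for a proportion from a simple random sample, z × sqrt(p(1 − p) / n). The formula and the sample size of 384 are not given in the slides; they are assumptions chosen because they yield a margin of roughly 5 points at the 95% confidence level.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence=0.95):
    """Margin of error for a sample proportion p with sample size n."""
    # z-score for a two-sided interval at the given confidence level
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)

p, n = 0.50, 384  # 50% of a hypothetical sample of 384 give a particular answer
moe = margin_of_error(p, n)
low, high = p - moe, p + moe
print(f"95% interval: {low:.1%} to {high:.1%}")

moe_99 = margin_of_error(p, n, confidence=0.99)
print(f"99% margin of error: {moe_99:.1%}")
```

Raising the confidence level from 95% to 99% widens the interval for the same sample: more certainty costs precision unless the sample size grows.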

