Using a Word Processor to Tag and Retrieve Blocks of Text
1177/1525822X03261269
FIELD METHODS
GERY W. RYAN
RAND Corporation
Coding serves two purposes in qualitative analysis: (1) Codes act as tags
to identify text in a corpus for later retrieval or indexing. Tags are not
associated with any fixed units of text. They can mark simple phrases or extend
across multiple pages. (2) Codes act as values assigned to fixed units of data
(see Bernard 1991, 2002; Seidel and Kelle 1995). In this case, codes are
nominal, ordinal, or ratio scale values that are applied to nonoverlapping units of
texts (such as paragraphs, pages, or documents), episodes, cases, or persons.
Codes as tags are associated with grounded theory (e.g., Glaser and Strauss
1967; Strauss and Corbin 1990; Dey 1993). Codes as values are associated
with classic content analysis and content dictionaries (e.g., Berelson 1952;
Pool 1959; Krippendorff 1980; Weber 1990). The two types of code are not
mutually exclusive, but the use of one gloss, code, for both concepts can be
misleading.
Table 1 illustrates the difference between codes as tags and codes as val-
ues. The three illness narratives come from undergraduates at a Midwestern
university. Signs and symptoms are tagged with italicized text; treatments
and behavioral modifications are tagged with underlining, and diagnosis is
tagged with small caps. Note that the tags vary in size from a single word
(cold) to several lines.
The columns to the right of the narratives represent value codes. Each nar-
rative is coded as a separate unit. The variable diagnosis takes on nominal/
categorical values such as cold or sinus/upper respiratory/asthma. Signs and
symptoms and treatments are dummy variables, with dichotomous values
(yes or no). Duration is coded in days, an interval-level variable.
Assigning values to a unit of data is inherently an interpretive, qualitative
act. Of course, sometimes the interpretation is obvious. When a respondent
says in a narrative that he had a cough, runny nose, and headache, it is clear
that we would code the variable coughing as a yes. Coding decisions are not
always so simple. In one narrative, the respondent says, “Then the next min-
ute it was like I was in an ice-cube bath!” Coding this as “having chills”
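TABLE 1
[Table body not recovered. Value-code columns: Cough, Sore Throat, Vomiting, Fever, Chills,
Fatigue, HR, OTC, WM, CAM, Duration. Surviving narrative fragment: “. . . unusual for me.
I usually only get sick only once or twice a year.”]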
NOTE: Signs and symptoms are in italics. Treatments and behavioral modifications are underlined. Diagnoses are in small caps. HR = home remedies; OTC = over the counter;
WM = Western Medical; CAM = complementary and alternative medicine.
Over the years, researchers have developed many ways to tag and retrieve
themes from data. Before computers, we wrote notes in the margins, high-
lighted texts with colored pencils and markers, and cut and sorted multiple
copies of notes and transcripts. (These are still good ways to start a project.)
High-tech solutions to the tag-and-retrieve problem in precomputer days
involved edge-notched cards and knitting needles. (One famous system was
the McBee Cards. See Bolton [1984] for a discussion of how these systems
were used in the coding and managing of field notes.)
The simplest tagging system is an index, like the one at the back of a book.
An index provides a reference table that links themes (subject headings) with
pages in a text. Indexes, however, do not specify where on the page any
theme occurs, nor do they tell us what other themes are located nearby.
With point markers (another tagging system), you place codes directly
into the text to indicate that the theme occurs “around here.” Notes scribbled
in the margins are point markers. The retrieval process consists of extracting
chunks of text (e.g., sentences or paragraphs) above and below the marker.
Deciding how much text to extract is important: Picking too much produces
extraneous information, and picking too little produces truncated hits.
Contiguous tagging solves this problem by linking themes with contigu-
ous blocks of data. In written data, blocks include words, phrases, sentences,
paragraphs, or entire pages. For sound and video, blocks mark segments of
tape. For visual data, blocks mark segments of an image. Using colored pen-
cils to underline sections of text or circle portions of an image is an example
of contiguous tagging.
Below, I outline three approaches to tagging and retrieving texts with
Microsoft Word. (Similar results can be achieved with other word
processors, such as WordPerfect.) The first approach allows you to tag and retrieve
contiguous blocks of text but is best used with small codebooks (i.e., fewer than
ten themes). The second approach allows for larger, more complex codebooks
but limits you to using point markers. The last approach allows for
contiguous coding and large codebooks but requires more programming steps.
FIGURE 1
Word’s Find Font Characteristic Screen
6. Leave the Find What text box blank (this allows you to search across all text;
the attributes you selected will appear under this box).
7. Click on Find Next.
Word finds the next instance of the desired attribute(s). Note that Word will
highlight the block of text. If you close the Find dialogue box and hit Ctrl C,
the block of text will be copied into memory. You can then switch to a second
document, hit Ctrl V to paste the copied data, and then switch back to the
original document. (If you like using the mouse instead of commands such as
Ctrl C and Ctrl V, just click on Edit at the top of the document and choose
either “copy” or “paste.”)
You can automate this process with Word’s macro capability. Macros
allow users to record and play back a series of keystrokes or mouse clicks.
Before making a macro, it is good practice to run through the steps a couple
of times to make sure you consistently get the desired results.
Creating a Macro
The easiest way to create a macro is to turn on the macro recorder, run
through all the steps you want to do, and then turn off the macro recorder
(described below).
Before creating the macro, you need to do three things:
1. Open up your original document (the one that has been coded).
2. Open up a second, blank document. Save this blank document with the name
Hits.doc.
3. Return to the top of the original document.
Next we will build a macro that (1) locates the next chunk of text you have
marked in red to indicate a particular theme, (2) copies the red text to
memory, (3) pastes it in the Hits.doc document, and (4) returns you to the original
document:
1. Select Tools/Macros.
2. Select Find_Red.
3. Hit Run.
The macro should find and paste the next instance of red text into Hits.doc.
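For readers who prefer to see the four actions as code rather than recorded keystrokes, the following is a minimal sketch of what such a macro might look like in Visual Basic. It is not the recorder's own output. It assumes that Hits.doc is already open, that the theme was marked with Word's standard red font color, and that the macro is started from the original document; the name Find_Red_Sketch is hypothetical.

```vba
Sub Find_Red_Sketch()
    ' Hypothetical sketch: find the next run of red text, copy it,
    ' paste it at the end of Hits.doc, and return to the original.
    ' Assumes Hits.doc is already open in Word.
    Dim src As Document
    Set src = ActiveDocument

    With Selection.Find
        .ClearFormatting
        .Font.Color = wdColorRed   ' the attribute used as the tag
        .Text = ""                 ' blank Find What: match any text
        .Forward = True
        .Wrap = wdFindStop         ' stop at the end of the document
        .Format = True             ' search on formatting
    End With

    If Selection.Find.Execute Then
        Selection.Copy
        Documents("Hits.doc").Activate
        Selection.EndKey Unit:=wdStory      ' go to the end of Hits.doc
        Selection.Paste
        Selection.TypeParagraph             ' separate hits with a hard return
        src.Activate
        Selection.Collapse Direction:=wdCollapseEnd   ' ready for the next search
    Else
        MsgBox "No more red text found."
    End If
End Sub
```

Each run retrieves one hit, so running it repeatedly from the top of the original document walks through all the red text in order.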
To save time, you can either place a button on a toolbar for the macro or
define a shortcut key for it. This can be done either before you record your
keystrokes or afterward.
To assign the macro to either a toolbar or a specific keystroke before
recording your keystrokes, do the following:
To place the macro onto a toolbar after you have recorded the keystrokes,
do the following:
Now to find, copy, and paste the next instance of red, all you have to do is
to click the button.
To assign a macro to a shortcut key after recording your keystrokes,
perform the following steps:
1. Select Tools/Customize.
2. Click on the Keyboard button at the bottom of the window.
To run the macro, hold down the Alt key and hit X.
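The same key assignment can also be made from Visual Basic instead of the Customize dialogue. The sketch below assumes the macro is named Find_Red and that the binding should be stored in the Normal template; both are assumptions, and the procedure name is hypothetical.

```vba
Sub Assign_AltX_Sketch()
    ' Hypothetical sketch: bind Alt X to the Find_Red macro.
    ' Assumes a macro named Find_Red exists and that the binding
    ' should be saved in the Normal template.
    CustomizationContext = NormalTemplate
    KeyBindings.Add KeyCategory:=wdKeyCategoryMacro, _
                    Command:="Find_Red", _
                    KeyCode:=BuildKeyCode(wdKeyAlt, wdKeyX)
End Sub
```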
Step 1:
• Find: ^p^p (^p is Word’s code for new paragraphs or hard returns)
• Replace: **pp**
Step 2:
• Find: ^p
• Replace: (hit space bar once)
Step 3:
• Find: **pp**
• Replace: ^p^p
The first step converts blank lines to **pp**, the second step converts single
hard returns to spaces, and the final step converts the **pp** back to blank
lines. I recommend running this procedure on all text before tagging it with
any of the Word features described here. This is especially critical if you are
working with colleagues and you are sharing texts. You never know when a
pesky hard return will creep into a text.
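If you run this cleanup often, the three find-and-replace steps can be wrapped in a macro. The following is a sketch under the assumption that the placeholder **pp** does not otherwise occur in the text; the names Strip_Hard_Returns and ReplaceAll are hypothetical.

```vba
Sub Strip_Hard_Returns()
    ' Hypothetical sketch of the three replace steps described above.
    ReplaceAll "^p^p", "**pp**"   ' Step 1: protect blank lines
    ReplaceAll "^p", " "          ' Step 2: single hard returns become spaces
    ReplaceAll "**pp**", "^p^p"   ' Step 3: restore blank lines
End Sub

Private Sub ReplaceAll(findText As String, replText As String)
    ' Replace every occurrence of findText in the active document.
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = findText
        .Replacement.Text = replText
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchWildcards = False   ' ^p is only valid with wildcards off
        .Execute Replace:=wdReplaceAll
    End With
End Sub
```

As with the manual steps, it is safest to run this on a copy of the document first.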
marked by both Theme X and Theme Y, check multiple text attributes in the
Font Dialog box described above. For example, checking bold and underline
locates all texts that are underlined and bolded. The simplest way to find text
marked by either Theme X or Theme Y is to search first on one theme, then
the next. All hits will be placed in the Hits Document. (Beware, though: You
might encounter duplicates if the same text is marked for both themes.)
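In macro form, an “X and Y” search on text attributes simply sets both attributes before searching. The sketch below assumes Theme X was marked in bold and Theme Y with a single underline; the macro name is hypothetical.

```vba
Sub Find_Bold_And_Underline()
    ' Hypothetical sketch: select the next text that is both bold
    ' and single-underlined (i.e., marked for both themes).
    With Selection.Find
        .ClearFormatting
        .Font.Bold = True
        .Font.Underline = wdUnderlineSingle
        .Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = True
    End With
    If Not Selection.Find.Execute Then
        MsgBox "No more text marked with both attributes."
    End If
End Sub
```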
Microsoft Word offers ten or so text attributes for marking themes (ital-
ics, bold, underline, double underline, strikethrough, shadow, and so on).
Themes also can be marked with combinations of attributes (e.g., bold and
strikethrough, underline and shadow, etc.). Such combinations, however,
tend to be more cumbersome and make it more difficult to search for overlap-
ping themes. If you anticipate building a longer codebook with subthemes,
then consider using a system of point markers.
With point markers, you place codes or mnemonics directly into the text
to indicate that the theme occurs “around here.” To use this system, read
through your document. Each time you find a place that is related to Theme1,
type in the corresponding mnemonic (e.g., [[Theme1]]). If the paragraph also
refers to Theme2, then embed the mnemonic for Theme2 as well. For addi-
tional examples of point markers, particularly those used for field notes, see
Ryan (1993).
You can make light work of the theme-marking chore by building a series
of little macros, one for each theme. You can assign each macro to a button
and place them all in a toolbar such as the one shown in Figure 2.
Alternatively, you can assign each macro to a key combination, such as Alt J. If you
use key combinations, avoid ones that Word already uses, such as Ctrl F (or you
won’t be able to use that combination to find things in texts) or Alt F (or you
won’t be able to open the File menu from the keyboard).
You can, of course, assign a macro to Alt F and still open the File menu with
the mouse. For theme mnemonics, be sure to use characters such as double
square brackets [[ ]] that don’t occur anywhere in your text except for theme
markers. That way, when you look for the [[marriage]] theme, say, you won’t
find all the uses of the word marriage in your text, only the uses of the word
that mark a section of text as being about marriage.
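A theme-marking macro of this kind can be very short. The following sketch inserts the mnemonic at the cursor; the macro name Mark_Theme1 is hypothetical, and you would make one copy per theme, changing only the text inside the brackets.

```vba
Sub Mark_Theme1()
    ' Hypothetical sketch: insert the point marker for Theme1
    ' at the current cursor position.
    Selection.Collapse Direction:=wdCollapseStart   ' avoid overwriting selected text
    Selection.TypeText Text:="[[Theme1]]"
End Sub
```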
To retrieve all the paragraphs that refer to Theme1, you can build a macro
that searches for the Theme1 mnemonic and copies a fixed chunk of text
(such as a paragraph, sentence, or line) to the second document.
FIGURE 2
Toolbar for Inserting Point Markers
First, make sure that both the original document and the Hits.doc file are
open and that you are in the original document. Then do the following:
15. Hit the right arrow key once (this deselects the marked texts and moves the
cursor one position to the right so you are ready to search for the next texts).
16. Click on the Stop square in the Macro toolbox (alternatively, hit Tools/
Macro/Stop Recording).
You have now recorded a macro called THEME1. Whenever the macro
encounters a mnemonic for Theme1, it copies the entire paragraph to the
second document, Hits.doc. If you want to pull larger chunks, say the paragraphs
above and below the point marker, just increase the number of times you hit
the Ctrl Up and Ctrl Down arrow keys in steps 8 and 9. For smaller chunks,
you can use just the up and down arrow keys to extend the highlighted block one
or two lines above and below the point marker.
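The recorded THEME1 macro relies on keystrokes; a hand-written equivalent might look like the sketch below. It is an illustration rather than the recorded macro itself. It assumes the mnemonic is [[Theme1]] and that Hits.doc is already open, and the name Theme1_Sketch is hypothetical.

```vba
Sub Theme1_Sketch()
    ' Hypothetical sketch: find the next [[Theme1]] marker and copy
    ' the paragraph that contains it to the end of Hits.doc.
    Dim src As Document
    Set src = ActiveDocument

    With Selection.Find
        .ClearFormatting
        .Text = "[[Theme1]]"
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = False    ' brackets are treated literally
    End With
    If Not Selection.Find.Execute Then
        MsgBox "No more [[Theme1]] markers found."
        Exit Sub
    End If

    Selection.Paragraphs(1).Range.Select   ' expand to the whole paragraph
    Selection.Copy
    Documents("Hits.doc").Activate
    Selection.EndKey Unit:=wdStory
    Selection.Paste
    src.Activate
    Selection.Collapse Direction:=wdCollapseEnd   ' continue after this paragraph
End Sub
```

To pull the neighboring paragraphs as well, one option is to widen the selection before copying, for example with Selection.MoveStart Unit:=wdParagraph, Count:=-1 and Selection.MoveEnd Unit:=wdParagraph, Count:=1.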
With point markers, you can use a hierarchical codebook and search for
families of themes. For example, suppose you built the following codebook:
Theme1
    Theme1.aa
    Theme1.ab
    Theme1.ca
    Theme1.cb
Theme2
    Theme2.a
    Theme2.ab
    Theme3.da
If you plan to use the wildcard option, do not use wildcard symbols such
as *, ?, –, @, !, , (, ), {, or } in your mnemonic coding conventions.
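As an illustration of a family search, the sketch below uses Word's wildcard mode to find any marker that begins with Theme1, such as [[Theme1]] or [[Theme1.aa]]. It assumes the double-square-bracket convention used earlier; because square brackets are themselves wildcard symbols in Word, they are escaped with a backslash. The macro name is hypothetical, and the pattern would also match a code such as [[Theme10]], so codebook names need to be chosen with that in mind.

```vba
Sub Find_Theme1_Family()
    ' Hypothetical sketch: select the next marker in the Theme1 family.
    With Selection.Find
        .ClearFormatting
        .Text = "\[\[Theme1*\]\]"   ' \[ and \] match literal brackets
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchWildcards = True
    End With
    If Not Selection.Find.Execute Then
        MsgBox "No more Theme1 family markers found."
    End If
End Sub
```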
For those who want to use a complex codebook but need the functionality
of contiguous tagging, a two-step process is required. First, indicate where a
text block begins and ends. Then you can locate these blocks and copy them
to a second document. Following the marking system suggested by Truex
(1993), I have written two macros to accomplish these tasks.
To tag a block of text, first select the text, then start the macro
Tag_Theme. A dialogue box like that in Figure 3 will appear and ask you
which theme you want to use. If you type in “Treat” and hit the OK button,
the macro will embed [[<Treat]] at the beginning of the selection and
[[>Treat]] at the end.
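The tagging behavior just described can be sketched in a few lines of Visual Basic. This is a simplified illustration, not the code in Appendix A: it omits the CleanString step, and the name Tag_Theme_Sketch is hypothetical.

```vba
Sub Tag_Theme_Sketch()
    ' Hypothetical sketch: wrap the selected text in begin and end
    ' markers for a theme typed in by the user.
    Dim themeName As String

    If Selection.Type = wdSelectionIP Then
        MsgBox "Select a block of text first."
        Exit Sub
    End If

    themeName = Trim$(InputBox("What theme do you want to use?", "Mark Themes", ""))
    If themeName = "" Then Exit Sub   ' cancelled or left blank

    Selection.InsertBefore "[[<" & themeName & "]]"
    Selection.InsertAfter "[[>" & themeName & "]]"
End Sub
```

If the selection includes a paragraph's closing hard return, Word places the end marker after that return, at the start of the next paragraph, so it can be worth selecting only up to the last character of the block.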
To find all instances of a theme, move the cursor to the top of the
document. Start the second macro, Find_Theme. A similar dialogue box will
appear and ask you what theme you want to find. The macro then searches
through the entire text and copies all hits to a file named Hits.doc. (The
Find_Theme macro won’t work unless the Hits.doc file is located in the
default directory. See note in Appendix B for more details.)
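The retrieval logic can be sketched as a loop that finds each beginning marker, then the next matching end marker, and copies everything between them. The code below is an illustration under simplifying assumptions, not the macro in Appendix B: it expects Hits.doc to be open already rather than opening it from the default directory, it copies plain text without formatting, and it does not handle blocks of the same theme nested inside each other. The name Find_Theme_Sketch is hypothetical.

```vba
Sub Find_Theme_Sketch()
    ' Hypothetical sketch: copy every block tagged with a theme to Hits.doc.
    Dim themeName As String, beginTag As String, endTag As String
    Dim src As Document, hits As Document
    Dim rng As Range, blockEnd As Range
    Dim blockStart As Long, afterBegin As Long, hitCount As Long

    themeName = Trim$(InputBox("What theme do you want to search for?", "Search Themes", ""))
    If themeName = "" Then Exit Sub
    beginTag = "[[<" & themeName & "]]"
    endTag = "[[>" & themeName & "]]"

    Set src = ActiveDocument
    Set hits = Documents("Hits.doc")          ' assumes Hits.doc is open
    Set rng = src.Content

    Do
        With rng.Find                          ' look for the next begin marker
            .ClearFormatting
            .Text = beginTag
            .Forward = True
            .Wrap = wdFindStop
            .MatchWildcards = False
        End With
        If Not rng.Find.Execute Then Exit Do
        blockStart = rng.Start
        afterBegin = rng.End

        Set blockEnd = src.Range(afterBegin, src.Content.End)
        With blockEnd.Find                     ' look for the matching end marker
            .ClearFormatting
            .Text = endTag
            .Forward = True
            .Wrap = wdFindStop
            .MatchWildcards = False
        End With
        If Not blockEnd.Find.Execute Then Exit Do

        ' Append the block, markers included, followed by a blank line.
        hits.Content.InsertAfter src.Range(blockStart, blockEnd.End).Text & vbCr & vbCr
        hitCount = hitCount + 1
        rng.SetRange afterBegin, src.Content.End   ' resume after this begin marker
    Loop

    MsgBox "End of search: " & hitCount & " hit(s) found."
End Sub
```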
Since each task requires input from the user, I have written the macros in
Microsoft’s Visual Basic. The code appears in Appendices A and B. The
codes are also downloadable from www.qualquant.net.
To reproduce the macro in Appendix A on your own computer, do the
following:
The Microsoft Visual Basic screen will appear showing the current mac-
ros you have stored on your computer. At the very bottom of your screen, you
should see the macro you just created: Tag_Themes. A horizontal line should
separate it from the other macros, and the following programming language
should appear below the line:
Sub Tag_Themes()
'
' Tag_Themes Macro
' Macro recorded [date] by [your name here]
'
End Sub
FIGURE 3
Tag_Theme Dialogue Box
You have two options: (1) Retype the code exactly as it appears in Appen-
dix A (you can skip the comments demarked by a single quote at the end of
each line) or (2) copy the code from the URL above and paste it into the
macro-editing window. When you are done, hit Ctrl S (save) and close the
Visual Basic screen. Follow the same steps to create the macro in Appendix B
(you might want to name it Find_Themes). If you want to assign these mac-
ros to a toolbar or a shortcut key, use the steps described above.
This approach works well if you want to tag and retrieve specific text
within a paragraph or larger text chunks that extend across paragraphs. You
can also search for “X or Y” combinations by first searching on one theme
then another. Unfortunately, it is difficult to search for “X and Y” combina-
tions. Such searches require many steps depending on the degree to which
themes overlap or are nested entirely within each other and are probably eas-
ier to do manually.
COMPARING APPROACHES
text, searches that retrieve the paragraph in which the point marker is embed-
ded will be filled with extraneous text. On the other hand, if a theme extends
across multiple paragraphs, hits will be truncated (unless you have marked
each paragraph separately).
Using the two macros described above to embed beginning and end
markers in the text allows researchers to tag and retrieve text with the same
precision they can obtain using text attributes. In addition, the markers make it
easy to locate where specific themes occur in a document. Instead of creating
a macro for each theme, the two generic macros handle codebooks of any size
and complexity. Although the macro programming appears daunting, the
code is available on the Web and can be readily copied into the macro editor.
This approach, however, does not yet allow the use of wildcard or “X and Y”
Boolean searches. (I say “not yet” because the capabilities of modern word
processors are upgraded with each new release, and this might well be among
the next things that are built in.) Furthermore, the more themes that are
coded, the more cluttered the document becomes.
In general, using your word processor for basic tagging and retrieval tasks
is quite efficient. There is very little learning curve since you begin with a
program that you already know a lot about. You can use your original word-
processing documents without having to reformat them, and there is no addi-
tional cost for new software.
APPENDIX A
Macro for Marking the Beginning and End of a Text Block
The following macro can be used after you select a block of text in your document.
The macro begins by querying the user for the theme associated with the block, then
embeds appropriate beginning and end markers to the selected text.
The Tag_Theme macro embeds beginning and ending code markers
directly in your text. They are a hassle to remove. I strongly suggest
that you make a copy of your original text file before you begin tagging
the file for codes. This way, if you decide you don’t want to use the tags,
you can always start over with a clean document.
Sub Tag_Theme()
'
' Tag_Theme Macro
' Macro recorded 7/16/2002 by Gery Ryan
'
Dim Tag$
Tag$ = InputBox("What theme do you want to use?", "Mark Themes", "")
Tag$ = CleanString(Tag$) 'cleans nonprinting chars
Tag$ = LTrim$(RTrim$(Tag$)) 'removes spaces at beginning and end
APPENDIX B
Macro for Finding Contiguous Tagged Texts
The following macro can be used to find blocks of texts marked with the macro in
Appendix A. The macro begins by querying the user for which theme is to be
searched, then finds the appropriate text and copies each hit to document 2.
This macro searches for themes in one document and pastes them in a
second document called Hits.doc. Two conditions must be met for the
macro to function correctly. First, the document to be searched must be
saved and have a real filename. Temporary files produced when you
open a new file in Word (i.e., document 1, document 2, etc.) do not
count. Second, the Hits.doc file must be located in the current default
directory. To see whether the Hits.doc is in the correct place, use the
pull-down File-Open menu and see if file Hits.doc is listed. If not, open
a new file and save it as Hits.doc.
Sub Find_Theme()
'
' Find_Theme Macro
' Macro recorded 7/16/2002 by Gery Ryan
'
Dim Tag$
Dim BeginTag$
Dim EndTag$
Dim Workdoc$
Dim Hitsdoc$
Dim Currentdir$
Dim Count_
Hitsdoc$ = "Hits.doc"
Workdoc$ = WordBasic.[FileName$]() 'identifies current working document
Currentdir$ = WordBasic.[FileNameInfo$](WordBasic.[FileName$](), 5)
Hitsdoc$ = "Hits.doc" 'identifies location of hits document
Hitsdoc$ = Currentdir$ + Hitsdoc$ '
Tag$ = InputBox("What theme do you want to search for?", "Search Themes", "")
Tag$ = CleanString(Tag$) 'cleans nonprinting chars
BeginTag$ = "[[<" + Tag$ + "]]" 'beginning marker, in the format described in the text
EndTag$ = "[[>" + Tag$ + "]]" 'end marker, in the format described in the text
For Count_ = 1 To 1000 'Beginning of loop (max set for 1,000 hits)
    Selection.EscapeKey
    Selection.Find.ClearFormatting 'Search for beginning marker
    With Selection.Find
        .Text = BeginTag$
        .Replacement.Text = ""
        .Forward = True
        .Wrap = False
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute
    If WordBasic.EditFindFound() = 0 Then 'Stop if not found
        WordBasic.FileOpen Name:=Hitsdoc$, Revert:=0
        If Count_ = 1 Then 'Hit summary
            WordBasic.Insert "End of Search: No Hits Found"
        Else
            WordBasic.Insert "End of Search:" + Str(Count_ - 1) + " Hits Found"
            Selection.TypeParagraph
            Selection.TypeParagraph
            WordBasic.FileOpen Name:=Workdoc$, Revert:=0
        End If
        GoTo Finish
    Else
REFERENCES
Berelson, B. 1952. Content analysis in communication research. Glencoe, IL: Free Press.
Bernard, H. R. 1991. About text management and computers. Cultural Anthropology Methods
Journal 3:1–4, 7, 12.
Bernard, H. R. 2002. Research methods in anthropology: Qualitative and quantitative approaches.
Thousand Oaks, CA: Sage.
Bolton, R. 1984. Computers in ethnographic research: Final report. Washington, DC: National
Institute of Education.
Dey, I. 1993. Qualitative data analysis: A user friendly guide for social scientists. London:
Routledge and Kegan Paul.
Gillespie, G. W., Jr. 1986. Using word processor macros for computer-assisted qualitative
analysis. Qualitative Sociology 9:283–92.
Glaser, B. G., and A. Strauss. 1967. The discovery of grounded theory: Strategies for qualitative
research. Chicago: Aldine.
Krippendorff, K. 1980. Content analysis: An introduction to its methodology. Beverly Hills, CA:
Sage.
Pool, I. D. S. 1959. Trends in content analysis. Urbana: University of Illinois Press.
Ryan, G. 1993. Using WordPerfect macros to handle field notes I: Coding. Cultural
Anthropology Methods Journal 5:10, 11.
Seidel, J., and U. Kelle. 1995. Different functions of coding in the analysis of textual data.
In Computer-aided qualitative data analysis: Theory, methods and practice, edited by
U. Kelle, 52–61. London: Sage.
Strauss, A., and J. Corbin. 1990. Basics of qualitative research: Grounded theory procedures
and techniques. Newbury Park, CA: Sage.
Truex, G. F. 1993. Tagging and typing: Notes on codes in anthropology. Cultural Anthropology
Methods Journal 5:3–5.
Weber, R. P. 1990. Basic content analysis. Newbury Park, CA: Sage.