Field Investigations in Statistics



"Science begins with observation,

but it is not until we count and
measure that we have begun to
truly study a thing."

Lord Kelvin [1824-1907]

Timothy Vogel, M.S. Statistics; Florida State University

Nathan Good, Ph.D. Computer Science; U-C Berkeley
Field Investigations In Statistics
A Curriculum For Training First-time Researchers

Class Development Motivation

A recent complaint about science education is that it doesn't teach young
scientists how science is actually done. This class seeks to fill that gap in
scientific training.
Tim Vogel's first elective Biology classes at the University of Illinois in 1975 were
Biology 208-209, two semesters of "Field Investigations in Biology". The course
comprised six topical areas within the biological sciences and afforded
undergraduates the rare opportunity to be trained as principal research
investigators. Authoring their own hypotheses, experimental designs,
methodologies, statistical analyses, technical papers, and result presentations
under "peer" review, these students "did" actual science for the first time under
the supervision of experts.
We take the same approach here, proposing to train fledgling researchers by
allowing them to propose, run, analyze, summarize, and defend their own
research projects across nine units of statistical inference.

•  Undergraduate classes in:
•  Introductory Statistics
•  Rhetoric or Technical Writing
•  60 hours of undergraduate coursework.
•  Basic computer skills (spreadsheets; SAS, R, Matlab, etc.)

•  Boundless curiosity about the world.

•  A commitment to accomplishment tempered by the ego-strength required to risk public failure.

Class Goals
Completion of this class will see each student having:
•  gained the confidence to tackle any research project that
presents itself.
•  garnered a profound sense of the nature of inference.
•  learned to recognize the unique inferential signature, or
lack thereof, inherent to all exercises in inference.


•  project design and implementation 40% (Units 1-8, 5% each)

•  final proposal and paper 40% (20% each)
•  attendance 10%
•  final examination 10%

The First Class

Introduction to the 9 statistical units
1.  Hypothesis Testing
2.  Tests of Independence (correlation)
3.  Discrete Data Analysis
4.  Analysis of Variance/ANOVA
5.  Estimation and Ordinary Least Squares Regression
6.  Multiple Regression and MANOVA
7.  Bayesian Inference
8.  Unstructured Data Analysis;
•  Data Mining and Text Mining
9.  Carte Blanche;
•  pick your own poison.
•  logistic regression, logits/probits, eigenspace, multidimensional scaling,
canonical correlation, principal components analysis, factor analysis,
supervised and unsupervised clustering, edit distances, similarity/
distance measures, cladistics (numerical taxonomy), neural nets,
time-series analysis, ...

The First Class (continued)

How each class period will be conducted
Class time trajectory for each unit:
•  lecture/review of that unit's statistical approach.
•  presentation of an example from the past; well known experiments
from the archives of the "Science Hall of Fame".
•  group break-out sessions discussing research questions and
experimental designs of your own.
•  individual class presentation summarizing each student's proposed
study and how they expect to generate their own data.
•  constructive critique by staff and students.
•  individual work to finish off your introductory proposal.

A Study You Could Run At Home

Could I prove this with my cat or dog?

•  When You Are Generous Your Dog Is Watching You!

•  from a study by Marshall-Pescini et al. (2011). Social eavesdropping in the domestic dog.
Animal Behaviour, doi:10.1016/j.anbehav.2011.02.029
•  Asking a good (i.e., testable) question is hard to teach, but not so difficult to learn
given the proper training and opportunity to grow.

"Some people have generous natures, and some people are miserly. The generous ones are
happy to share what they have with others, while the miserly folks resent having to share
anything with anybody. So if given the choice, which type of person would a dog likely
approach first?"

•  "Could I prove this with my dog or cat?" might be an excellent trigger for your muse as
you face the next 9 units' requirement that you pose and test just such a question as "...if
given the choice, which type of person would a dog likely approach first?"
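
One way to turn the dog question into a testable hypothesis is an exact binomial test: if the dog has no preference, it should approach the generous person first about half the time. The sketch below is only an illustration. The function name and the tally of 15 approaches in 20 trials are made-up assumptions, not the cited study's method or data.

```python
from math import comb

def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of every outcome that is no more likely
    than the observed one under the null hypothesis.
    """
    pmf = [comb(trials, k) * p_null**k * (1 - p_null)**(trials - k)
           for k in range(trials + 1)]
    observed = pmf[successes]
    # A small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))

# Hypothetical tally (an assumption, not data from the study):
# the dog approaches the generous person first in 15 of 20 trials.
p_value = binomial_two_sided_p(15, 20)
print(f"p = {p_value:.4f}")
```

A small p-value would suggest the dog's choices are not a coin flip. Deciding in advance how many trials to run and what counts as an "approach" is part of the experimental design each student proposes.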

Unit 1
Hypothesis Testing

Ambrose, H.A., Young, D. (1978). Underwater Orientation in the Sand Fiddler
Crab, Uca pugilator. Biol Bull 155: 246-258 (August 1978).

•  Assignment #1; Biology 208; Investigations of Field Biology; Fall, 1975.

Thumin, F. J. (1962). Identification of Cola Beverages. Journal of Applied
Psychology, 46, 358-360.

•  a strong hypothesis-testing example used in myriad graduate-level statistics textbooks.


Unit 2
Tests of Independence (correlation)

Kinsey, Alfred C. et al. (1948). Sexual Behavior in the Human Male.
Philadelphia: W.B. Saunders; Bloomington, IN: Indiana U. Press.

Kinsey, Alfred C. et al. (1953). Sexual Behavior in the Human Female.
Philadelphia: W.B. Saunders; Bloomington, IN: Indiana U. Press.

•  perhaps the most controversial data analyses ever performed.
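
A test of independence asks whether two categorical variables are associated. The sketch below computes the Pearson chi-square statistic for a 2x2 table using only the Python standard library. The function name and the counts are hypothetical assumptions chosen for illustration; they are not taken from the Kinsey reports.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (statistic, p_value). With 1 degree of freedom the
    p-value equals erfc(sqrt(statistic / 2)).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, erfc(sqrt(stat / 2))

# Hypothetical counts (an assumption): rows are two groups,
# columns are "yes"/"no" answers to a survey question.
counts = [[30, 20], [15, 35]]
stat, p_value = chi_square_2x2(counts)
print(f"chi-square = {stat:.3f}, p = {p_value:.4f}")
```

The shortcut for the p-value holds only for one degree of freedom; larger tables need the general chi-square distribution from a statistics package.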


Unit 3
Discrete Data Analysis
Goodman, L. (1970). The multivariate analysis of qualitative data:
Interactions among multiple classifications. JASA, 65: 225-256.

•  the "father of the logit", Dr. Goodman is now a joint professor of both
Sociology and Statistics at the University of California-Berkeley.

Unit 4
Analysis of Variance
Fisher, R.A. "The use of multiple measurements in taxonomic
problems"; Annals of Eugenics, 7, Part II, 179-188 (1936).

•  there is not an ANOVA class taught on earth that doesn't begin and
end with this famous dataset. The range of experimental designs and
inferences that can be built from this single dataset is simply
astounding.
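
The core of a one-way analysis of variance is the F statistic: variation between group means divided by variation within groups. The sketch below is a minimal standard-library illustration. The function name and the three small samples are made up for the example and are not Fisher's measurements.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Made-up measurements for three groups (an assumption, not Fisher's data).
samples = [[5.0, 5.5, 6.0], [6.5, 7.0, 7.5], [8.0, 8.5, 9.0]]
f_stat = one_way_anova_f(samples)
print(f"F = {f_stat:.2f}")
```

A large F relative to the F distribution with (groups - 1, observations - groups) degrees of freedom suggests that at least one group mean differs from the others.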

Unit 5
Estimation and Regression (OLS)
Forbes, J. (1857). Further experiments and remarks on the
measurement of heights by the boiling point of water. Transactions of
the Royal Society of Edinburgh, 21, 235-243.

•  One of the first statistical studies by a non-mathematician to be
universally accepted by the field of statistics as a
flagship example for teaching ordinary least-squares regression.
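
Ordinary least squares picks the straight line that minimizes the sum of squared vertical distances to the data. The sketch below fits a slope and intercept from the closed-form formulas. The function name and the five (x, y) pairs are invented for illustration and are not Forbes's measurements.

```python
def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope is the covariance of x and y divided by the variance of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line always passes through the point of means.
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Invented data (an assumption), roughly following y = 2x.
x_values = [1.0, 2.0, 3.0, 4.0, 5.0]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = ols_fit(x_values, y_values)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Plotting the residuals after a fit like this is a useful habit, since a curved pattern signals that a straight line is the wrong model.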

Unit 6
Multiple Regression/MANOVA
Harris, RJ; "Directive versus non directive instructions in the prisoner's
dilemma"; presented at The Rocky Mountain Psychological Association (1970).

•  (pp 68-69)

•  from Tim Vogel's first class in multivariate statistics.


Unit 7
Bayesian Inference
An Intuitive Explanation of Bayes' Theorem

•  this is not a scholarly paper, per se, but the best way to see
the benefits of Bayesian reasoning and analysis is via Java
applets like this one.

There are many to choose from but this really does a good
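
Bayes' theorem itself fits in a few lines. The sketch below updates a prior probability after a positive test result. The function name and the screening numbers (1% prevalence, 80% sensitivity, 9.6% false-positive rate) are illustrative assumptions, not figures taken from this curriculum.

```python
def posterior_given_positive(prior: float, sensitivity: float,
                             false_positive_rate: float) -> float:
    """P(condition | positive test) by Bayes' theorem."""
    # Probability of a positive result from people who have the condition.
    true_positives = prior * sensitivity
    # Probability of a positive result from people who do not have it.
    false_positives = (1 - prior) * false_positive_rate
    return true_positives / (true_positives + false_positives)

# Illustrative numbers (assumptions for this example only).
posterior = posterior_given_positive(prior=0.01, sensitivity=0.80,
                                     false_positive_rate=0.096)
print(f"P(condition | positive) = {posterior:.3f}")
```

With a rare condition, most positive results come from the much larger group without it, which is why the posterior stays far below the test's sensitivity.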

Unit 8
Unstructured Data Analysis - Data Mining and Text Mining

Authorship of the eleven unattributed Federalist Papers, written
under the pseudonym Publius and published in various New York
newspapers in 1787-1788, during the debate over ratification of the
U.S. Constitution.

•  "Who wrote it; Hamilton, Madison, or Jay?"

•  The entire field of text-mining began with this question, posed
about the eleven unattributed "Federalist Papers" of Publius.

•  Tim Vogel hopes to present his own published paper; "Statistical
Constitutionality Testing? Citizens United v. the Federal Elections
Commission (2010)". (Wired; V?; no??)

Unit 9
Answer a "nagging" research question of your own

•  logistic regression
•  logits/probits
•  canonical correlation
•  principal components analysis
•  factor analysis
•  supervised clustering
•  unsupervised clustering
•  multidimensional scaling
•  similarity/distance measures
•  cladistics (numerical taxonomy)
•  neural nets
•  hidden Markov models
•  Monte Carlo simulation
•  machine learning algorithms
•  linear programming
•  dynamic programming
•  optimized auctions
•  pick your poison!

Final Class
•  Examination (30 minutes – essay).
• style 5-minute presentations from each unit's "winner".

•  Discussion; dual perspectives on this class' most important lessons.

•  A statistical-inference flow-chart; the scientist's "friend for life".
•  Class evaluations;
•  the class and the instructors
•  Class dismissed; pot-luck refreshments.

Timothy Vogel

•  B.S., Genetics; University of Illinois
•  M.S., Statistics, Florida State University
•  Biologist; Newfound Harbor Marine Institute; Big Pine Key, FL
•  Management Consultant; The Werner Group;
•  Analytics Product Manager; National Demographics & Lifestyles, Inc; Denver, CO
•  Sr. Manager Data Mining; MCI; Denver, CO
•  Analytics Architect; Macromedia, Inc; San Francisco, CA
•  Sr. Software Engineer/Statistician; IBM; RTP, NC
•  Chief Scientist; Aggregate Knowledge, Inc; San Mateo, CA
•  Founder/CEO;; San Mateo,

Nathan Good

•  B.S., Mathematics; University of Minnesota
•  M.S., Computer Science; University of Minnesota
•  Ph.D., Computer Science; U-C Berkeley (Hal Varian, ?, ?)
•  Research Fellow; Parc/Xerox
•  Contract Researcher; Aggregate Knowledge, Inc; San Mateo, CA
•  Founder; Good Research

Public Datasets
•  UC-Irvine Machine Learning repository
•  SBA City & County Web Database
•  Google public data explorer
•  Journal of Applied Econometrics Data Archive
•  Knowledge Discovery in Databases
•  public-datasets.html
•  Monte Carlo Simulation
•  Amazon's Public Database in the "Clouds"


Mentors' Links
•  Dr. Harrison J. Ambrose III
    •  Writing a Scientific Research Article
•  Dr. Leo Goodman
    •  Sociologist/Demographer
•  Dr. Con Slobodchikoff
    •  Animal Communication
•  Dr. Jeffery Carrier
    •  Shark Biologist/Physiologist
•  A guide to increased creativity in research - inspiration or perspiration?
