Chapter 9 - Secondary data and archival sources

9.1 Secondary data

Secondary data is information or data that has already been collected and recorded by someone else, usually for other purposes.

The advantages of secondary data
• Saves time and money
  o The researcher can quickly start to analyse the data and try to find an answer to his or her research problem.
• Ability to obtain high-quality data
  o Institutions have better access to information providers
  o Huge budgets for data collection
  o Many experts involved in the data collection

The disadvantages of secondary data
• Data were not collected with your specific research problem in mind
  o Might not fit perfectly with the requirements of your research problem

The usefulness of secondary data can be assessed by the following questions:
1. Information quality: Is the information provided in the secondary data sufficient to answer your research problem?
   a. All information you need?
   b. Detailed enough?
   c. Same definitions?
   d. Accurate enough?
2. Sample quality: Do the secondary data address the same population you want to investigate?
   a. Same samples used?
3. Were the secondary data collected in the relevant time period?

If you cannot say yes to all the questions above, your secondary data may be questionable.

Five factors that can be applied to any type of information source:
1. Purpose
   a. What the author is trying to accomplish.
2. Scope
3. Authority
   a. Credentials of author or website
4. Audience
5. Format
   a. How the information is presented and how easy it is to find a specific piece of information.

9.2 Sources of secondary data

It still makes a difference how information is stored electronically: all combined in one file or spread across several different files.

A wrapper identifies certain information on target web sites, extracts the information and saves the information obtained.

Steps in the data mining process:
Sample: Decide whether to use the entire dataset or a sample of the data.
Explore: Explore the selected data visually or numerically for trends or groups.
Modify: Based on the above phase, the data may require modification. Sometimes descriptive segmentation of the data is all that is required to answer the investigative question, and no further steps need to be taken.
Model: Construction of a model begins.
Assess: A common method of assessment involves applying a portion of data that was not used during the sampling stage. If the model is valid, it will work for this 'holdout' sample.

The validity of the data mining information can be based on the following criteria:
• Accuracy
  o Data are complete and match the desired information.
• Reliability
  o To what extent the information obtained is independent of different settings.
• Reality check

Big data

Big data refers to the tremendous amount of data traces each of us leaves when we use the web, our mobile phones, and customer, credit and debit cards.

Terms:
• Data visualization → viewing aggregated data on multiple dimensions.
• Clustering → enables the researcher to segment a population.
• Neural networks → collections of simple processing nodes that are connected.
• Decision tree models → segregate data by using a hierarchy of if-then statements based on the values of the variables, and create a tree-shaped structure that represents the segregation decision.
• Classification → uses a set of pre-classified examples to develop a model that can classify the population of records at large.
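The holdout assessment and the if-then idea behind decision tree models can be illustrated with a small sketch. This is not from the chapter: the data, the `monthly_spend` variable, the segment labels and all function names are hypothetical, and the "model" is just a one-level tree (a single if-then rule) chosen to keep the example short.

```python
import random

def make_records(n=100):
    """Hypothetical pre-classified examples: (monthly_spend, segment) pairs."""
    return [(spend, "loyal" if spend >= 50 else "occasional") for spend in range(n)]

def split_holdout(records, holdout_fraction=0.3, seed=42):
    """Shuffle the records and set aside a holdout portion not used to build the model."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]

def predict(spend, threshold):
    """One-level decision tree: a single if-then rule on the spend variable."""
    if spend >= threshold:
        return "loyal"
    return "occasional"

def accuracy(records, threshold):
    """Share of records whose predicted segment matches the recorded segment."""
    hits = sum(1 for spend, label in records if predict(spend, threshold) == label)
    return hits / len(records)

def fit_threshold(train):
    """Pick the spend value seen in the training data that classifies it best."""
    candidates = sorted(spend for spend, _ in train)
    return max(candidates, key=lambda t: accuracy(train, t))

records = make_records()
train, holdout = split_holdout(records)      # Sample
threshold = fit_threshold(train)             # Model
train_acc = accuracy(train, threshold)
holdout_acc = accuracy(holdout, threshold)   # Assess on unseen data

print(f"rule: if monthly_spend >= {threshold} then 'loyal' else 'occasional'")
print(f"training accuracy: {train_acc:.2f}, holdout accuracy: {holdout_acc:.2f}")
```

If the rule performs about as well on the holdout records as on the training records, that supports the validity of the model; a large drop would suggest the rule only fits the sample it was built on.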
