hcs211-2 Answered
INSTRUCTIONS TO CANDIDATES
Answer a total of three (3) questions, one (1) question from SECTION A, one (1) question
from SECTION B and any other question.
Marks are indicated in brackets [ ] at the end of each question or part question.
ADDITIONAL MATERIALS
Answer papers/booklet
Graph paper
Plain paper
Ruler
Sharp HB pencil
Calculator
1. Convenience sampling: This method involves selecting subjects who are easily
accessible to the researcher. For example, a researcher might survey students in their
computer science class or interview colleagues at their workplace. Convenience sampling
is often used in exploratory research or pilot studies, where the goal is to get a
preliminary understanding of a topic.
2. Purposive sampling (or judgmental sampling): This method involves selecting subjects
who the researcher believes are representative of the population of interest. For example,
a researcher might interview experts in a particular field of computer science to get their
insights on a new technology. Purposive sampling is often used in qualitative research,
where the goal is to gain in-depth understanding of a topic.
3. Snowball sampling: This method involves selecting subjects and then asking them to
refer other potential subjects. For example, a researcher might interview a few software
developers and then ask them to refer other software developers in their network.
Snowball sampling is often used in hard-to-reach populations, such as people who use
niche technologies or who have rare medical conditions.
4. Quota sampling: This method involves selecting subjects until certain quotas are met.
For example, a researcher might want to have a sample that is 50% male and 50% female,
so they would select subjects until they have reached that quota. Quota sampling is often
used in surveys to ensure that the sample is representative of the population of interest in
terms of certain demographic characteristics.
Example in computer science: A researcher might use quota sampling to survey computer
science students to ensure that their sample is representative of the student body in terms
of gender, race, and ethnicity.
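The quota sampling procedure described above can be sketched in a few lines of Python; the student pool, field names, and quotas here are hypothetical, chosen only to illustrate the mechanism:

```python
# Minimal sketch of quota sampling (hypothetical data, not from the text).
# Conveniently available respondents are accepted until each quota is filled.

def quota_sample(pool, quotas, key):
    """Select records from `pool` until every quota in `quotas` is met.

    pool   : iterable of records (dicts)
    quotas : dict mapping a group label to the number of respondents wanted
    key    : field of the record that identifies the group
    """
    counts = {group: 0 for group in quotas}
    sample = []
    for person in pool:
        group = person[key]
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
        if counts == quotas:  # every quota filled, stop sampling
            break
    return sample

# Hypothetical pool of computer science students.
students = [{"name": f"s{i}", "gender": g}
            for i, g in enumerate("MFMMFFMFMM")]
sample = quota_sample(students, {"M": 2, "F": 2}, key="gender")
```

The first two students of each group encountered are taken, after which the remaining pool is ignored; this is what makes quota sampling non-probabilistic.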
Advantages and Disadvantages of Non-probability Sampling Methods
Non-probability sampling methods are often less expensive and time-consuming than
probability sampling methods. However, they are also less rigorous and less likely to
produce representative samples. It is important to be aware of the limitations of non-
probability sampling methods when interpreting the results of research studies.
In computer science, non-probability sampling of this kind is common in pilot usability
studies, expert interviews about emerging technologies, and surveys of hard-to-reach
developer communities.
A2 Reliability and validity issues play a critical role in research. Discuss in detail the
different ways you can use to validate your results. [100]
Reliability and validity are two important concepts in research. Reliability refers to the
consistency of research results, while validity refers to the accuracy of research results.
There are a number of ways to validate research results, including:
Replication: Replication is the process of repeating a research study to see if the same
results are obtained. Replication is one of the most important ways to validate research
results, as it helps to ensure that the results are reliable and not due to chance.
External validation: External validation involves comparing the results of a research study
to other relevant research studies. This can be done by conducting a literature review or
by conducting a meta-analysis. External validation helps to ensure that the results of a
research study are consistent with other research on the same topic.
Internal validation: Internal validation involves examining the design and methods of a
research study to ensure that they are sound. This includes checking for potential biases
and errors in the data collection and analysis. Internal validation helps to ensure that the
results of a research study are valid.
In addition to these general methods of validation, there are a number of specific methods
that can be used to validate research results depending on the type of research being
conducted. For example, in quantitative research, statistical tests can be used to assess the
significance of the results. In qualitative research, triangulation can be used to validate the
results by using multiple data sources and methods.
Be transparent about your methods and data: This will allow other researchers to evaluate
your study and assess its validity.
Use multiple data sources and methods: This will help to reduce bias and increase the
validity of your results.
Pilot test your study: This will help to identify any potential problems with your design or
methods before you collect your data.
Seek feedback from other researchers: This can help you to identify any potential
problems with your study and to improve your design and methods.
It is important to note that there is no single perfect way to validate research results. The
best approach will depend on the type of research being conducted and the specific goals
of the study. However, by using the methods described above, researchers can increase
the likelihood that their results are reliable and valid.
Here are some specific examples of how the different methods of validation can be used
in computer science research:
Replication: A researcher might replicate a study that found a new algorithm for machine
learning outperforms existing algorithms. If the researcher is able to replicate the results,
this would provide evidence that the new algorithm is indeed more effective.
External validation: A researcher might compare the results of a study on the
effectiveness of a new educational software program to other studies on similar software
programs. If the researcher finds that the results of their study are consistent with other
studies, this would provide evidence that the new software program is effective.
Internal validation: A researcher might conduct a thorough review of the design and
methods of their study to ensure that they are sound. This would include checking for
potential biases and errors in the data collection and analysis. If the researcher finds that
their study is well-designed and executed, this would provide evidence that the results are
valid.
By using the different methods of validation, researchers in computer science can
increase the likelihood that their results are reliable and valid. This is important because it
helps to ensure that the research findings can be trusted and used to inform real-world
decisions.
A3 Explore any five ethical considerations that you need to adhere to when carrying out
research in your area of specialisation. [100]
Five ethical considerations that I need to adhere to when carrying out research in my
area of specialization (computer science) are:
Privacy: I need to protect the privacy of the people who participate in my research. This
means collecting and using data only in ways that have been agreed to by the
participants, and taking steps to ensure that the data is not compromised.
Confidentiality: I need to keep all participant data confidential. This means that I
should not share the data with anyone without the participant's consent, and that I
should take steps to protect the data from unauthorized access.
Conflicts of interest: I need to disclose any potential conflicts of interest. This means
disclosing any financial or other relationships that I have with any organizations or
individuals that could influence my research.
Informed consent: I need to obtain informed consent from participants before collecting
any data. This means explaining the purpose of the research, what participation
involves, and any risks, and making it clear that participation is voluntary and can be
withdrawn at any time.
Avoiding harm: I need to ensure that my research does not cause physical,
psychological, or social harm to participants, and I need to minimize any risks that
cannot be eliminated.
In addition to these general ethical considerations, there are a number of specific ethical
considerations that are relevant to computer science research. For example, I need to be
careful about how I collect and use personal data, and I need to be aware of the
potential for bias in my research algorithms. I also need to be mindful of the potential
impact of my research on society, and I need to take steps to ensure that my research is
used for good.
Here are some specific examples of how the different ethical considerations can be
applied in computer science research:
Privacy: A researcher who is collecting data from social media users needs to make sure
that the users have agreed to their data being collected and used for research purposes.
The researcher also needs to take steps to protect the data from unauthorized access,
such as by encrypting the data and storing it on a secure server.
Conflicts of interest: A researcher who is developing a new social media platform needs
to disclose any financial or other relationships that they have with any organizations or
individuals that could influence their research. For example, if the researcher has a
financial stake in the company that is developing the social media platform, they need to
disclose this conflict of interest to their research participants.
SECTION B: STATISTICS
B4 A lecturer for C++ conducted a test on his students and the following scores were
obtained and recorded below.
36 74 25 50 40 39 62 39 41 65 55 66 59 48 55 57 71 49 42 44
50 40 45 50 61 45 21 58 56 54 56 63 70 39 38 49 53 64 56 34
The merits and demerits of using the mean, median, and mode as measures of central
tendency are as follows:
Mean
Merits:
Takes into account all of the values in the data set.
Easy to calculate and widely understood.
Can be used to calculate other statistical measures, such as standard deviation and
variance.
Demerits:
Sensitive to outliers, which can pull its value away from the centre of the data.
Cannot be used with qualitative (categorical) data.
Median
Merits:
Not affected by outliers or extreme values.
Can be used with ordinal data.
Demerits:
Does not take into account all of the values in the data set.
Less suitable for further algebraic treatment than the mean.
Mode
Merits:
Easy to identify and can be used with categorical data.
Not affected by outliers.
Demerits:
Does not take into account all of the values in the data set.
May not be representative of the central tendency of the data set, especially if there
are multiple modes.
Overall, the mean is the most commonly used measure of central tendency, as it is
easy to calculate and understand, and it takes into account all of the values in the data set.
However, it is important to be aware of the potential for outliers to skew the mean. The
median is a good alternative to the mean when outliers are present, or when the data is
ordinal. The mode is not as commonly used as the mean or median, but it can be useful
for certain types of data, such as ordinal or categorical data.
Which measure of central tendency is most appropriate to use will depend on the
specific data set and the purpose of the analysis.
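For the test scores in question B4, the three measures can be computed directly with Python's standard statistics module; this sketch also shows the multimodality issue mentioned above, since this data set has three modes:

```python
# The C++ test scores from question B4, summarized with the stdlib.
import statistics

scores = [36, 74, 25, 50, 40, 39, 62, 39, 41, 65, 55, 66, 59, 48, 55, 57, 71, 49, 42, 44,
          50, 40, 45, 50, 61, 45, 21, 58, 56, 54, 56, 63, 70, 39, 38, 49, 53, 64, 56, 34]

mean = statistics.mean(scores)        # 50.475
median = statistics.median(scores)    # 50.0 (average of the 20th and 21st sorted values)
modes = statistics.multimode(scores)  # multimodal: 39, 50 and 56 each occur three times
```

Because 39, 50 and 56 all occur three times, no single mode represents the data well, whereas the mean (50.475) and median (50) agree closely here.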
(a) A university follows up with its Computer Science graduating students and
collects data on salary earned and the degree classification obtained by the
students. The following data was obtained as shown in Table 1.
Table 1
                  Salary being earned
Degree class      Low    Medium    High
First class        15      39       50
Second class       12      35       55
Third class        20      33       45
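The question prompt for Table 1 is not reproduced here; a common analysis for such a contingency table is a chi-square test of independence between degree class and salary band. A stdlib-only sketch (treat it as one plausible analysis, not necessarily the one the question asks for):

```python
# Chi-square test of independence for Table 1 (degree class vs. salary band),
# computed by hand with the standard library.

observed = [
    [15, 39, 50],   # First class
    [12, 35, 55],   # Second class
    [20, 33, 45],   # Third class
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)  # 4 degrees of freedom
# chi_square comes to about 3.47, below the 5% critical value of 9.488 for
# 4 df, so this data shows no evidence of association between class and salary.
```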
B5 (a) A research student was investigating the Gross Domestic Product (GDP) of a
country during the years 2015 to 2021. From the information obtained in the
table below, draw an appropriate diagram to classify this data.
(b) The table below shows the temperature received in winter and the sales of
coffee on 10 consecutive days.
Day               1   2   3   4   5   6   7   8   9   10
Temperature (°C)  8   12  9   10  12  8   9   14  10  14
[20]
Type I and type II errors are two types of errors that can occur in hypothesis testing.
Type I error is the error of rejecting a true null hypothesis. This is also known as a false
positive.
Type II error is the error of failing to reject a false null hypothesis. This is also known as a
false negative.
The probability of making a type I error is denoted by alpha (α), and the probability of
making a type II error is denoted by beta (β).
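The meaning of alpha can be illustrated by simulation: when the null hypothesis is true, a two-tailed z-test at the 5% level should reject about 5% of the time. A minimal sketch (the simulated test statistic is a standard normal draw):

```python
# Simulating the Type I error rate: when H0 is true and we reject whenever
# |z| > 1.96, the long-run rejection rate should be close to alpha = 0.05.
import random

random.seed(42)
trials = 100_000
rejections = sum(1 for _ in range(trials) if abs(random.gauss(0, 1)) > 1.96)
type_i_rate = rejections / trials  # close to 0.05
```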
Here is a table that summarizes the four possible outcomes of hypothesis testing:

Decision             H0 is true                  H0 is false
Reject H0            Type I error (α)            Correct decision (power = 1 − β)
Fail to reject H0    Correct decision (1 − α)    Type II error (β)
Descriptive statistics are used to summarize the characteristics of a data set. They can be used
to calculate measures of central tendency, such as the mean, median, and mode, as well as
measures of dispersion, such as the standard deviation and range.
Inferential statistics are used to make inferences about a population based on a sample. They
can be used to test hypotheses, such as whether the mean of one population is different from
the mean of another population.
Suppose we have a data set of the heights of all 10 students in our class. We can use
descriptive statistics to calculate the mean height of the students in our class. This would give
us an estimate of the central tendency of the data.
However, we cannot use descriptive statistics to infer that the mean height of all students in
our school is the same as the mean height of students in our class. To do this, we would need
to use inferential statistics to test the hypothesis that the mean height of students in our class
is equal to the mean height of all students in our school.
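A minimal sketch of this class-height example in Python; the heights and the hypothesized school-wide mean are hypothetical, since the text gives no actual data:

```python
# Descriptive vs. inferential statistics on the class-height example.
# The heights below are hypothetical values for a class of 10 students.
import math
import statistics

heights = [160, 165, 170, 155, 172, 168, 158, 175, 163, 169]  # cm, n = 10

# Descriptive: summarize this class only.
class_mean = statistics.mean(heights)   # 165.5
class_sd = statistics.stdev(heights)    # sample standard deviation

# Inferential: test H0 that the school-wide mean is 167 cm (hypothetical value)
# using a one-sample t statistic.
school_mean_h0 = 167
t_stat = (class_mean - school_mean_h0) / (class_sd / math.sqrt(len(heights)))
# Compare |t_stat| with the critical t value for 9 degrees of freedom.
```

The descriptive step says something only about the class; the t statistic is what lets us reason about the wider school population.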
Nonparametric tests are statistical tests that do not require the data to be normally distributed.
Parametric tests are statistical tests that require the data to be normally distributed.
Here is a table that summarizes the key differences between nonparametric tests and
parametric tests:

                          Parametric tests           Nonparametric tests
Distribution assumption   Data normally distributed  No particular distribution assumed
Level of measurement      Interval or ratio          Ordinal or nominal
Measure of centre used    Mean                       Median (or ranks)
Examples                  t-test, z-test, ANOVA      Mann-Whitney U, Wilcoxon, Kruskal-Wallis
Interviewer A   14  12  20  18  22  13  15  20  11  16
Interviewer B   20  18  17  16  21  17  20  17  25  10
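The question prompt for this interviewer data is not reproduced here; given the surrounding discussion of nonparametric tests, it likely concerns comparing the two interviewers' scoring. As a first stdlib sketch, the descriptive comparison is:

```python
# Descriptive comparison of the two interviewers' scores.
import statistics

interviewer_a = [14, 12, 20, 18, 22, 13, 15, 20, 11, 16]
interviewer_b = [20, 18, 17, 16, 21, 17, 20, 17, 25, 10]

mean_a = statistics.mean(interviewer_a)      # 16.1
mean_b = statistics.mean(interviewer_b)      # 18.1
median_a = statistics.median(interviewer_a)  # 15.5
median_b = statistics.median(interviewer_b)  # 17.5
```

Interviewer B scores candidates about two points higher on average; a nonparametric test such as the Mann-Whitney U would be the natural way to judge whether that difference is significant without assuming normality.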
Symmetric: The normal distribution is symmetrical around its mean, which means
that the left and right halves of the distribution are mirror images of each other.
Unimodal: The normal distribution has a single mode, or peak, which corresponds to
the mean.
Bell-shaped: The normal distribution is bell-shaped, with the majority of values
falling near the mean and fewer values falling in the tails of the distribution.
Empirical rule: The empirical rule, also known as the 68-95-99.7 rule, states that
approximately 68% of the values in a normal distribution fall within one standard deviation
of the mean, approximately 95% of the values fall within two standard deviations of the
mean, and approximately 99.7% of the values fall within three standard deviations of the
mean.
Continuous: The normal distribution is a continuous distribution, which means that
any value within the range of the distribution is possible.
In addition to these five characteristics, normal distributions also have the following
properties:
The mean, median, and mode of a normal distribution are all equal.
The normal distribution is the only distribution that is completely determined by its
mean and standard deviation.
The normal distribution is the limiting distribution of many statistical tests, such as
the t-test and the z-test. This means that these tests are more accurate when the data is
normally distributed.
Normal distributions are very common in nature and in many different fields of study,
including statistics, physics, biology, and economics.
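The empirical rule above can be checked numerically, since for a normal distribution P(|X − μ| < kσ) = erf(k/√2); a minimal sketch:

```python
# Verifying the empirical (68-95-99.7) rule with the error function:
# P(|X - mu| < k*sigma) = erf(k / sqrt(2)) for a normal distribution.
import math

within = [math.erf(k / math.sqrt(2)) for k in (1, 2, 3)]
# within is approximately [0.6827, 0.9545, 0.9973]
```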
A manufacturing plant produces bags of cement with a mean weight of 60
kilogrammes and a variance of 25. If a bag of cement is picked at random, find
the probability that the weight of the bag of cement is
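The question text above is cut off, so the 65 kg threshold in this sketch is hypothetical; it only illustrates how such a probability would be computed with the standard library's NormalDist:

```python
# Cement bag weights: mean 60 kg, variance 25, so sigma = 5 kg.
# The 65 kg threshold is a hypothetical example (the question is truncated).
from statistics import NormalDist

weight = NormalDist(mu=60, sigma=25 ** 0.5)
p_over_65 = 1 - weight.cdf(65)   # P(X > 65) = P(Z > 1), about 0.1587
```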
(c) The data below represent the height of plants recorded to the nearest
centimetre.
Height (cm)   26-30  31-35  36-40  41-45  46-50  51-55  56-60  61-65
Frequency       4      5     23     58     61     30      3      3
(i) Construct a histogram for the distribution. [15]
(ii) Superimpose a frequency polygon on the histogram [5]
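The class midpoints used when plotting the frequency polygon can be worked out as below, together with the estimated mean of the grouped data (a quick stdlib sketch, not the required drawing):

```python
# Class midpoints for the plant-height distribution; these give the x-axis
# positions of the frequency-polygon vertices and an estimate of the mean.
midpoints = [28, 33, 38, 43, 48, 53, 58, 63]   # e.g. (25.5 + 30.5) / 2 = 28
freqs = [4, 5, 23, 58, 61, 30, 3, 3]

n = sum(freqs)                                                # 187 plants
est_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n   # about 45.6 cm
```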
(c) Using examples define
(i) Type II error
(ii) Significance level
(iii) Critical value [15]
A type II error is the error of failing to reject a false null hypothesis. This is also
known as a false negative.
Example:
A medical researcher is testing a new drug to treat a particular disease. The null
hypothesis is that the drug has no effect on the disease. The researcher
conducts a clinical trial and finds that the drug does not appear to be effective.
However, the drug may actually be effective, but the sample size of the
clinical trial was too small to detect the effect. In this case, the researcher has
made a type II error.
The significance level is the probability of rejecting a true null hypothesis. This is
denoted by the symbol alpha (α).
Example:
A researcher is testing the hypothesis that the mean height of men is greater than the
mean height of women. The researcher sets the significance level to alpha =
0.05. This means that the researcher is willing to accept a 5% chance of
rejecting the true null hypothesis that the mean height of men is equal to the
mean height of women.
The critical value is the value of the test statistic that must be exceeded in order to
reject the null hypothesis. The critical value depends on the significance level
and the distribution of the test statistic.
Example:
A researcher is using a t-test to test the hypothesis that the mean height of men is
greater than the mean height of women. The researcher has set the significance
level to alpha = 0.05. The critical value for a one-tailed t-test with alpha = 0.05 and 20
degrees of freedom is 1.725. This means that if the t-statistic is greater than
1.725, then the researcher will reject the null hypothesis.
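A sketch of finding a critical value with the standard library; NormalDist covers the z-test case directly, while t critical values are normally read from a t table:

```python
# One-tailed critical value at alpha = 0.05 for a z-test, via the inverse CDF
# of the standard normal distribution.
from statistics import NormalDist

alpha = 0.05
z_critical = NormalDist().inv_cdf(1 - alpha)   # about 1.645
# Reject H0 (one-tailed) when the test statistic exceeds z_critical.
```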
[END]