JULY 2022 Biostatistics & Research Methodology QP
A Type I error happens when the null hypothesis is true, but the statistical test or analysis leads
to its rejection. This means that the researcher concludes there is a meaningful relationship,
effect, or difference when, in reality, there is no such relationship or difference.
Suppose a pharmaceutical company is testing a new drug for a particular condition. The null
hypothesis states that the drug has no effect or is not different from a placebo, while the
alternative hypothesis suggests that the drug does have a significant effect.
If the researchers conduct a study and find a statistically significant result, leading them to reject
the null hypothesis, but in reality, the drug has no effect, it would be a Type I error. This means
they incorrectly conclude that the drug is effective when it is not.
Type I errors are important to consider in statistical analysis because they can lead to incorrect
conclusions, wasted resources, or even harmful actions based on false positive findings.
Researchers typically strive to control the risk of Type I errors by setting appropriate significance
levels, conducting power calculations, and critically interpreting the results of their analyses.
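The false-positive logic described above can be illustrated with a small simulation; this is a sketch assuming Python with SciPy installed, and all numbers are illustrative. When the null hypothesis is true, a test run at α = 0.05 should reject in roughly 5% of studies, and each of those rejections is a Type I error.

```python
import random
from scipy import stats  # assumes SciPy is installed

# Simulate many studies in which the null hypothesis is TRUE:
# both "drug" and "placebo" groups come from the same distribution,
# so every rejection of H0 is a false positive (Type I error).
random.seed(42)
alpha = 0.05
n_studies = 2000
false_positives = 0

for _ in range(n_studies):
    drug = [random.gauss(0, 1) for _ in range(30)]
    placebo = [random.gauss(0, 1) for _ in range(30)]
    _, p = stats.ttest_ind(drug, placebo)
    if p < alpha:
        false_positives += 1

rate = false_positives / n_studies
print(f"Observed Type I error rate: {rate:.3f}")  # close to alpha = 0.05
```

The observed rejection rate hovers near the chosen significance level, which is exactly what setting α controls.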
Q:2 EXPLAIN POWER OF SUDY.
The power of a study refers to the probability of correctly rejecting the null hypothesis when it is
false. In other words, it measures the ability of a statistical test or analysis to detect a true effect
or difference between groups or variables being studied. A study with high power is more likely
to identify real effects, while a study with low power is more likely to miss or fail to detect true
effects. Several factors influence the power of a study:
1. Sample size: Generally, larger sample sizes tend to result in higher power because they
provide more data to analyze and detect small or subtle effects. Increasing the sample
size reduces random variability and increases the chances of finding a true effect if it
exists.
2. Effect size: The magnitude of the difference or effect being studied plays a crucial role in
power. Larger effect sizes are easier to detect and lead to higher power, while smaller
effect sizes require larger sample sizes to achieve sufficient power.
3. Significance level: The chosen significance level (α) also affects power. A lower
significance level (e.g., 0.01) decreases the chances of a Type I error (false positive) but
reduces power. Conversely, a higher significance level (e.g., 0.10) increases power but
raises the risk of Type I errors.
4. Variability or standard deviation: The amount of variability or dispersion within the data
can impact power. Higher variability makes it more challenging to detect true effects and
reduces power. Conversely, lower variability increases the power of a study.
5. Study design and analysis methods: The choice of study design and statistical analysis
methods can influence power. A well-designed study with appropriate control of
confounding factors, randomization, and suitable statistical tests can enhance the power
of the study.
It is essential to consider power during the planning phase of a study. Researchers aim to
achieve sufficient power to detect meaningful effects or differences with a reasonable sample
size. Power analysis can be performed before conducting a study to estimate the required
sample size or after the study to assess the sensitivity of the analysis to detect the observed
effect.
In summary, the power of a study represents its ability to detect true effects or differences. It is
influenced by factors such as sample size, effect size, significance level, variability, and study
design. Adequate power is crucial to ensure reliable and meaningful results in statistical
analyses.
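The interplay of sample size, effect size, and significance level described above can be sketched numerically. The function below is a simplified power calculation for a two-sided one-sample z-test (a normal-approximation sketch, assuming SciPy is installed); the effect size here is the standardized difference (μ₁ − μ₀)/σ.

```python
from scipy.stats import norm  # assumes SciPy is installed

def power_one_sample_z(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided one-sample z-test.

    effect_size: standardized effect (mu1 - mu0) / sigma.
    n: sample size.
    """
    z_crit = norm.ppf(1 - alpha / 2)
    shift = effect_size * n ** 0.5
    # Probability that the test statistic lands in either rejection region
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)

p_small_n = power_one_sample_z(0.5, 20)   # moderate effect, small sample
p_large_n = power_one_sample_z(0.5, 80)   # same effect, larger sample
p_weak = power_one_sample_z(0.2, 80)      # weaker effect, same sample
print(round(p_small_n, 3), round(p_large_n, 3), round(p_weak, 3))
```

Increasing the sample size raises power for the same effect, while shrinking the effect size lowers it, matching factors 1 and 2 in the list above.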
1. Measurement Scale:
Histogram: Histograms are used to represent continuous or quantitative data.
The horizontal axis of a histogram represents the range of values divided into
intervals or bins, while the vertical axis represents the frequency or count of
observations falling within each bin.
Bar Diagram: Bar diagrams are typically used to represent categorical or discrete
data. The categories or groups are displayed on the horizontal axis, while the
vertical axis represents the frequency, count, or proportion associated with each
category.
2. Data Representation:
Histogram: Histograms display the distribution of data, showing how values are
spread across the range. The bars in a histogram are typically connected to each
other as they represent continuous data.
Bar Diagram: Bar diagrams compare discrete categories or groups and their
associated values. Each category is represented by a separate bar, and there is
usually a gap between adjacent bars to distinguish between different categories.
3. Bar Width:
Histogram: In a histogram, the width of each bar corresponds to the range of
values included in a particular bin. The width of the bars can vary, depending on
the range of values and the number of bins used.
Bar Diagram: The width of bars in a bar diagram is usually consistent across all
categories and does not convey any specific quantitative information. The focus
is on comparing the heights or lengths of the bars rather than their widths.
4. X-axis Labeling:
Histogram: The x-axis of a histogram represents the range of values or intervals,
and it is labeled accordingly with numerical values or intervals.
Bar Diagram: The x-axis of a bar diagram represents the categorical or discrete
groups being compared, and it is labeled with the names or labels of those
groups.
5. Usage:
Histogram: Histograms are commonly used to visualize the distribution of data,
identify patterns, detect outliers, and analyze continuous variables such as height,
weight, time, etc.
Bar Diagram: Bar diagrams are frequently used to compare and display
categorical data, such as survey responses, product sales by category,
population by region, etc.
While histograms and bar diagrams share some similarities in terms of using bars to represent
data, they differ in their purpose, measurement scale, and representation of the underlying data.
Understanding these differences allows for the appropriate selection of the graphical
representation that best suits the type of data being analyzed.
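The distinction between binned continuous data and counted categories can be shown without any plotting library: a minimal sketch (assuming NumPy is installed; the height data are simulated for illustration) that mirrors what a histogram and a bar diagram would each display.

```python
import numpy as np  # assumes NumPy is installed
from collections import Counter

# Continuous data -> histogram: values are grouped into numeric bins
rng = np.random.default_rng(0)
heights_cm = rng.normal(170, 10, size=500)        # illustrative data
counts, bin_edges = np.histogram(heights_cm, bins=8)
for c, lo, hi in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{lo:6.1f} - {hi:6.1f}: {c}")          # each bar spans an interval

# Categorical data -> bar diagram: one separate bar per category
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]
category_counts = Counter(responses)
print(category_counts)                            # heights compare categories
```

Note how the histogram's x-axis is a numeric range split into intervals, while the bar diagram's x-axis is simply a set of category labels.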
The general form of the multiple regression model is:
Y = β₀ + β₁X₁ + β₂X₂ + … + βₖXₖ + ε
Where: Y is the dependent variable, X₁ … Xₖ are the independent variables, β₀ is the intercept, β₁ … βₖ are the regression coefficients, and ε is the error term.
The multiple regression analysis estimates the regression coefficients (β) that best fit the data
and minimize the sum of squared differences between the observed and predicted values of the
dependent variable. These coefficients provide information about the direction and magnitude of
the relationships between the independent variables and the dependent variable.
Multiple regression allows for the examination of the individual contributions of each
independent variable while controlling for the effects of other variables. It also provides
information about the overall significance of the model, the goodness-of-fit, and statistical
inference regarding the significance of individual regression coefficients.
Multiple regression is widely used in various fields, including social sciences, economics,
psychology, finance, and biomedical research, to understand and predict the relationships
between multiple variables.
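The least-squares estimation described above can be sketched in a few lines (assuming NumPy is installed; the data are simulated from a known model so the recovered coefficients can be checked against the truth):

```python
import numpy as np  # assumes NumPy is installed

# Simulate data from a known model, then recover the coefficients by
# minimizing the sum of squared residuals (ordinary least squares).
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])    # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta, 2))                     # close to [2.0, 1.5, -0.8]
```

Each estimated coefficient gives the direction and magnitude of one independent variable's relationship with the dependent variable, holding the others fixed.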
1. Null Hypothesis (H0): The null hypothesis represents the default assumption or the
absence of an effect or relationship. It states that there is no significant difference, effect,
or relationship between variables or groups being studied. In other words, any observed
differences or associations in the data are due to chance or random variation.
For example, consider a study examining the effect of a new drug on blood pressure. The null
hypothesis would state that the drug has no effect on blood pressure, and any observed changes
in blood pressure between the treatment group and the control group are purely coincidental.
The null hypothesis is typically denoted as H0 and is initially assumed to be true. The goal of
hypothesis testing is to gather evidence to either reject or fail to reject the null hypothesis based
on the observed data.
2. Alternative Hypothesis (H1): The alternative hypothesis represents the researcher's claim
or the desired outcome. It contradicts the null hypothesis and suggests that there is a
significant difference, effect, or relationship between the variables or groups being
studied. It is the statement that the researcher hopes to support or find evidence for.
In the previous example, the alternative hypothesis (H1) would state that the new drug has a
significant effect on blood pressure. The alternative hypothesis can be directional (one-tailed),
specifying the direction of the effect (e.g., "the drug lowers blood pressure"), or non-directional
(two-tailed), simply stating that there is a difference or relationship without specifying the
direction.
During hypothesis testing, the goal is to gather evidence to support the alternative hypothesis
and reject the null hypothesis if the evidence is strong enough. The strength of evidence is
typically assessed through statistical tests and measures, such as p-values and confidence
intervals.
It is important to note that failing to reject the null hypothesis does not imply that the null
hypothesis is true. It simply means that there is not enough evidence to support the alternative
hypothesis based on the observed data. Hypothesis testing provides a systematic approach to
assess the likelihood of the observed data occurring under the assumption of the null hypothesis.
In summary, the null hypothesis represents the absence of an effect or relationship, while the
alternative hypothesis represents the claim or desired outcome. Hypothesis testing involves
gathering evidence to either reject or fail to reject the null hypothesis based on the observed data.
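The reject/fail-to-reject decision above can be sketched with a one-sample t-test (assuming SciPy is installed; the blood-pressure changes are hypothetical numbers used only for illustration):

```python
from scipy import stats  # assumes SciPy is installed

# H0: the mean blood-pressure change is 0 (drug has no effect)
# H1: the mean change differs from 0 (two-tailed alternative)
# Hypothetical before-minus-after changes in mmHg:
changes = [4.1, 5.6, 3.2, 6.0, 4.8, 5.1, 3.9, 4.4, 5.3, 4.7]

t_stat, p_value = stats.ttest_1samp(changes, popmean=0.0)
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(t_stat, 2), round(p_value, 6), decision)
```

A p-value below α counts as sufficient evidence against H0; a p-value above α leads only to a failure to reject, not to acceptance of H0.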
In hypothesis testing, the critical value is a threshold or cut off point used to determine whether
to reject or fail to reject the null hypothesis. It is based on the chosen level of significance (α),
which represents the maximum probability of making a Type I error (rejecting the null hypothesis
when it is actually true).
The critical value is derived from a probability distribution, such as the standard normal
distribution (Z-distribution) or the t-distribution, depending on the specific test and assumptions
being made. The critical value corresponds to a specific level of significance (α) and the degrees
of freedom associated with the test.
To determine the critical value, the researcher selects the desired level of significance (α) before
conducting the test. The significance level is typically set at 0.05 (5%) but can vary depending on
the context and the importance of making Type I errors.
For example, if the chosen significance level is α = 0.05 and the test assumes a standard normal
distribution with an upper one-tailed alternative, the critical value corresponds to the 95th
percentile of the standard normal distribution (z ≈ 1.645). This means that 95% of the
distribution lies to the left of the critical value, and only 5% lies in the tail beyond it.
During the hypothesis test, the test statistic (e.g., Z-score or t-value) is compared to the critical
value. If the test statistic falls in the critical region (the tail beyond the critical value), it provides
evidence to reject the null hypothesis in favour of the alternative hypothesis. On the other hand, if
the test statistic falls within the non-critical region (the region between the critical values), the
null hypothesis is not rejected.
The critical value serves as a decision rule, providing a clear boundary for accepting or rejecting
the null hypothesis based on the observed test statistic. It helps maintain a balance between
making correct conclusions and controlling the risk of Type I errors.
It is important to note that critical values are specific to the chosen level of significance and the
probability distribution assumed by the test. Different tests and scenarios may require the use of
different critical values or tables specific to those tests.
In summary, the critical value is a threshold used to determine whether to reject or fail to reject
the null hypothesis in hypothesis testing. It is based on the chosen level of significance and is
derived from a probability distribution. The test statistic is compared to the critical value to make
a decision about the null hypothesis.
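The critical values discussed above can be computed directly from the inverse cumulative distribution function (assuming SciPy is installed):

```python
from scipy.stats import norm, t  # assumes SciPy is installed

alpha = 0.05

# Two-tailed z critical value (alpha/2 in each tail)
z_crit = norm.ppf(1 - alpha / 2)
print(round(z_crit, 3))              # 1.96

# One-tailed (upper) z critical value
z_one_tail = norm.ppf(1 - alpha)
print(round(z_one_tail, 3))          # 1.645

# Two-tailed t critical value with 9 degrees of freedom
t_crit = t.ppf(1 - alpha / 2, df=9)
print(round(t_crit, 3))              # 2.262

# Decision rule: reject H0 when |test statistic| exceeds the critical value
test_statistic = 2.5
print(abs(test_statistic) > z_crit)  # True
```

Note that the t critical value is larger than the z critical value at the same α, reflecting the heavier tails of the t-distribution at small degrees of freedom.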
Minitab is a powerful statistical software package that offers several advantages for data
analysis and quality improvement. Here are some of the key advantages of Minitab:
1. User-friendly interface: menu-driven commands and a spreadsheet-style worksheet make analyses accessible without programming.
2. Extensive statistical tools: supports descriptive statistics, hypothesis tests, regression, ANOVA, and design of experiments.
3. Graphical capabilities: produces histograms, boxplots, scatterplots, and other clear, presentation-ready graphs.
4. Quality improvement features: includes control charts, process capability analysis, and other Six Sigma tools.
5. Data manipulation capabilities: offers sorting, recoding, and transformation tools for preparing data.
6. Comprehensive output: organizes results clearly, making interpretation straightforward.
7. Robust support: provides extensive documentation, tutorials, and help resources.
Overall, Minitab's user-friendly interface, extensive statistical tools, graphical capabilities, quality
improvement features, data manipulation capabilities, comprehensive output, and robust support
make it a valuable tool for data analysis and quality improvement across various industries.
The standard error of the mean (SEM) is a measure of the variability or precision of the sample
mean. It quantifies how much the sample means from different random samples of the same
population are expected to differ from each other. The significance of the standard error of the
mean lies in its ability to provide important information for statistical inference. Here are some
key points highlighting the significance of the standard error of the mean:
1. Precision of the Sample Mean: The SEM helps to assess the precision of the sample
mean as an estimate of the population mean. A smaller SEM indicates that the sample
mean is a more precise estimate of the population mean, while a larger SEM indicates
greater variability in the estimates. Understanding the precision of the sample mean is
crucial for making accurate inferences about the population.
2. Confidence Intervals: The SEM is used to calculate confidence intervals around the
sample mean. Confidence intervals provide a range of values within which the population
mean is likely to fall. The SEM is a key component in determining the width of the
confidence interval. A smaller SEM results in a narrower confidence interval, indicating
greater certainty about the population mean.
3. Hypothesis Testing: The SEM is essential in hypothesis testing involving the sample
mean. It helps determine the standard deviation of the sampling distribution of the mean,
which is required to calculate test statistics such as t-tests. The SEM is used in
calculating the standard error of the difference between two means, which is crucial in
comparing means from different groups or conditions.
4. Sample Size Determination: The SEM plays a role in sample size determination for
studies involving means. A smaller SEM allows for a smaller sample size to achieve a
desired level of precision in estimating the population mean. By understanding the SEM,
researchers can plan their studies more effectively, optimizing resources and ensuring
sufficient statistical power.
5. Comparing Studies: The SEM facilitates the comparison of studies with different sample
sizes. While the standard deviation provides an absolute measure of variability, the SEM
provides a relative measure that standardizes the variability by dividing it by the square
root of the sample size. This allows for meaningful comparisons of study findings, even
when sample sizes differ.
6. Meta-analysis: In meta-analysis, where multiple studies are combined to obtain an overall
estimate, the SEM is used to weigh the contribution of each study to the pooled mean
estimate. Studies with smaller SEMs (i.e., more precise estimates) are given more weight,
while studies with larger SEMs have less influence on the overall estimate.
In summary, the standard error of the mean is a significant statistical measure that informs the
precision of the sample mean, helps construct confidence intervals, facilitates hypothesis testing
and sample size determination, allows for comparisons between studies, and plays a crucial role
in meta-analysis. It provides valuable information for making reliable inferences about the
population mean based on sample data.
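The SEM and the confidence interval it supports can be computed with the standard library alone; a minimal sketch with a hypothetical sample of eight measurements:

```python
import math
import statistics

# Hypothetical sample of eight measurements
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9]
n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)      # sample standard deviation
sem = sd / math.sqrt(n)            # standard error of the mean

# Approximate 95% confidence interval (normal approximation)
ci_low = mean - 1.96 * sem
ci_high = mean + 1.96 * sem
print(round(mean, 3), round(sem, 3))
print(f"95% CI: ({ci_low:.3f}, {ci_high:.3f})")
```

Because the SEM divides the standard deviation by √n, quadrupling the sample size halves the SEM and narrows the confidence interval accordingly.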
Sampling is the process of selecting a subset of individuals or units from a larger population to
gather data and make inferences about the population as a whole. In other words, it involves
selecting a representative sample from a population to study and draw conclusions or make
generalizations.
Sampling Techniques:
1. Simple Random Sampling: In simple random sampling, each member of the population
has an equal chance of being selected. This technique ensures that every individual in
the population has an equal opportunity to be included in the sample. It is often done
using random number generators or drawing names from a hat.
2. Stratified Sampling: Stratified sampling involves dividing the population into
homogeneous subgroups called strata and then randomly selecting samples from each
stratum. This technique ensures that the sample represents the diversity or variability
within the population. Stratification can be based on demographic factors, such as age,
gender, or location.
3. Cluster Sampling: Cluster sampling involves dividing the population into clusters or
groups and then randomly selecting entire clusters as the sampling units. This technique
is useful when it is impractical or costly to sample individuals directly. It can be more
efficient in terms of time and resources, but it may introduce more variability within the
clusters.
4. Systematic Sampling: Systematic sampling involves selecting every nth individual from
the population after a random starting point. For example, if the population size is 1000
and the desired sample size is 100, every 10th individual can be selected after a random
number between 1 and 10 is chosen. This technique is straightforward and easy to
implement but can introduce bias if there is any periodicity in the population.
5. Convenience Sampling: Convenience sampling involves selecting individuals who are
readily available or easily accessible to the researcher. This technique is often used for
its simplicity and convenience, but it may not be representative of the entire population
and can introduce bias.
6. Purposive Sampling: Purposive sampling involves selecting individuals who possess
specific characteristics or meet certain criteria relevant to the research study. This
technique is often used when studying a particular subgroup or population of interest.
While it allows for targeted and focused data collection, it may not be representative of
the entire population.
7. Snowball Sampling: Snowball sampling involves identifying a few initial participants who
meet the research criteria and then asking them to refer other eligible individuals. This
technique is commonly used when studying rare populations or hard-to-reach groups. It
relies on the network of participants to identify and recruit additional participants.
Each sampling technique has its advantages and disadvantages, and the choice of technique
depends on the research objectives, available resources, and the characteristics of the
population under study. The goal is to select a sample that is representative of the population
and allows for valid and reliable inferences to be made.
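Three of the probability-based techniques above can be sketched with the standard library's `random` module; the population, strata split, and sample sizes here are all illustrative:

```python
import random

population = list(range(1, 1001))    # 1000 hypothetical individuals
random.seed(7)

# 1. Simple random sampling: every individual equally likely
srs = random.sample(population, k=100)

# 2. Stratified sampling (sketch): split into two strata, sample each
stratum_a = population[:400]         # e.g. one demographic group
stratum_b = population[400:]
stratified = random.sample(stratum_a, 40) + random.sample(stratum_b, 60)

# 4. Systematic sampling: every 10th individual after a random start
interval = len(population) // 100    # sampling interval = 10
start = random.randint(0, interval - 1)
systematic = population[start::interval]

print(len(srs), len(stratified), len(systematic))
```

Each technique yields a sample of 100, but the mechanism differs: pure chance, proportional chance within strata, and a fixed interval from a random start.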
Q:15 EXPLAIN PAIRED t-TEST IN DETAIL.
The paired t-test, also known as the dependent samples t-test or paired-samples t-test, is a
statistical test used to compare the means of two related or paired samples. It is specifically
applicable when the two samples are not independent, such as when measurements are taken
on the same individuals or subjects before and after an intervention. The paired t-test determines
whether there is a significant difference between the means of the paired samples.
Assumptions: Before conducting a paired t-test, it is important to ensure that the following
assumptions are met:
1. The observations are paired or dependent (e.g., measurements taken on the same subjects).
2. The differences between the paired observations are approximately normally distributed.
3. The pairs are independent of one another.
4. The variable being measured is continuous.
Hypotheses: The paired t-test involves testing the null hypothesis (H₀) and the alternative
hypothesis (H₁):
Null Hypothesis (H₀): There is no significant difference between the means of the paired
samples.
Alternative Hypothesis (H₁): There is a significant difference between the means of the
paired samples.
Step 1: Define the paired samples: Identify the two related samples or measurements that are
paired together. For example, measurements before and after a treatment or measurements on
the same individuals under different conditions.
Step 2: Calculate the differences: Calculate the differences between the paired observations by
subtracting the value of one observation from the corresponding value of the other observation.
These differences represent the change or the effect of the intervention or treatment.
Step 3: Calculate the mean difference: Calculate the mean of the differences. This gives an
estimate of the average change between the paired observations.
Step 4: Calculate the standard deviation of the differences: Calculate the standard deviation of
the differences. This quantifies the variability of the differences between the paired observations.
Step 5: Calculate the t-statistic: The t-statistic is calculated using the formula: t = (mean
difference) / (standard deviation of differences / sqrt(sample size))
Step 6: Determine the degrees of freedom: The degrees of freedom (df) for the paired t-test is
equal to the sample size minus 1.
Step 7: Determine the critical value or calculate the p-value: Compare the calculated t-statistic to
the critical value from the t-distribution table with the appropriate degrees of freedom.
Alternatively, you can calculate the p-value associated with the t-statistic using statistical
software or an online calculator.
Step 8: Make a conclusion: If the calculated t-statistic is greater than the critical value or the p-
value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis. This
indicates that there is a significant difference between the means of the paired samples. If the t-
statistic is less than the critical value or the p-value is greater than the significance level, we fail
to reject the null hypothesis, suggesting no significant difference between the means of the
paired samples.
The paired t-test allows for a direct comparison of the paired samples, taking into account the
individual differences within the pairs. It is commonly used in various fields, including medicine,
psychology, and social sciences, where measurements are often collected before and after an
intervention or treatment on the same individuals.
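Steps 2 through 6 above can be carried out by hand and cross-checked against a library routine; a sketch assuming SciPy is installed, with hypothetical before/after blood-pressure readings:

```python
import math
from scipy import stats  # assumes SciPy is installed

# Hypothetical systolic blood pressure before and after a treatment
before = [140, 152, 138, 147, 160, 151, 143, 149]
after = [132, 148, 135, 140, 151, 145, 139, 143]

# Steps 2-5: differences, mean difference, SD of differences, t-statistic
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_d = sum(diffs) / n
sd_d = math.sqrt(sum((d - mean_d) ** 2 for d in diffs) / (n - 1))
t_manual = mean_d / (sd_d / math.sqrt(n))
df = n - 1                       # Step 6: degrees of freedom

# Cross-check against SciPy's built-in paired t-test
t_scipy, p_value = stats.ttest_rel(before, after)
print(round(t_manual, 4), round(t_scipy, 4), round(p_value, 4))
```

The manually computed t-statistic matches the library value exactly, confirming that the paired t-test is simply a one-sample t-test on the differences.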
1. Symmetry: The normal distribution is symmetric around its mean. This means that the
curve is equally balanced on both sides of the mean, and the probabilities of
observations falling to the left or right of the mean are equal.
2. Bell-shaped curve: The shape of the normal distribution resembles a bell, with a peak at
the mean and gradually tapering off towards the tails. The curve is smooth and
continuous.
3. Mean, Median, and Mode: The mean, median, and mode of a normal distribution are all
equal and located at the centre of the distribution. The mean represents the balance
point of the distribution.
4. Empirical Rule: The normal distribution follows the empirical rule, also known as the 68-
95-99.7 rule, which states that approximately 68% of the data falls within one standard
deviation of the mean, about 95% falls within two standard deviations, and nearly 99.7%
falls within three standard deviations.
5. Standardization: Values in a normal distribution can be standardized by transforming
them into z-scores. A z-score represents the number of standard deviations an
observation is from the mean. This allows for comparison and interpretation of data
across different normal distributions.
6. Central Limit Theorem: The normal distribution has a special property known as the
Central Limit Theorem (CLT). According to the CLT, the distribution of the sample means,
regardless of the shape of the population distribution, tends to follow a normal
distribution as the sample size increases.
7. Probability Density Function (PDF): The normal distribution is described by its probability
density function, which is a mathematical function that represents the likelihood of
observing a particular value or range of values. The PDF of the normal distribution is
symmetric, bell-shaped, and defined by the mean and standard deviation.
The normal distribution is widely used in statistics and probability theory due to its well-
understood properties and its occurrence in many natural phenomena. It serves as a foundation
for various statistical tests, confidence intervals, hypothesis testing, and modelling in numerous
fields of study.
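The empirical rule in point 4 can be verified directly from the standard normal CDF (assuming SciPy is installed):

```python
from scipy.stats import norm  # assumes SciPy is installed

# Area under the standard normal curve within k standard deviations
probs = {k: norm.cdf(k) - norm.cdf(-k) for k in (1, 2, 3)}
for k, p in probs.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827, within 2 SD: 0.9545, within 3 SD: 0.9973
```

The computed areas reproduce the 68-95-99.7 rule to four decimal places.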
1. Sample Space: The sample space represents the set of all possible outcomes of an
experiment or a random event. It is denoted by the symbol Ω.
2. Event: An event is a subset of the sample space, which represents a specific outcome or
a combination of outcomes of interest. Events are denoted by capital letters (A, B, C, etc.).
3. Probability: Probability is a numerical measure that quantifies the likelihood of an event
occurring. It is denoted by the symbol P and ranges from 0 to 1. A probability of 0
indicates impossibility, while a probability of 1 indicates certainty.
4. Probability Axioms: Probability theory is based on three fundamental axioms or rules: a)
Non-Negativity: The probability of any event is greater than or equal to zero: P(A) ≥ 0. b)
Additivity: For a collection of mutually exclusive events (events that cannot occur
simultaneously), the probability of their union is the sum of their individual probabilities:
P(A ∪ B) = P(A) + P(B). c) Normalization: The probability of the entire sample space is
equal to 1: P(Ω) = 1.
5. Complementary Event: The complement of an event A, denoted by A', represents all
outcomes in the sample space that are not in A. The probability of the complement is
given by P(A') = 1 - P(A).
6. Union and Intersection of Events: The union of two events A and B (A ∪ B) represents the
event that either A or B or both occur. The intersection of two events A and B (A ∩ B)
represents the event that both A and B occur.
7. Conditional Probability: Conditional probability measures the probability of an event A
occurring given that another event B has already occurred. It is denoted by P(A|B) and is
calculated as P(A|B) = P(A ∩ B) / P(B), where P(B) ≠ 0.
8. Independent Events: Two events A and B are independent if the occurrence or non-
occurrence of one event does not affect the probability of the other event.
Mathematically, P(A ∩ B) = P(A) × P(B).
9. Bayes' Theorem: Bayes' theorem provides a way to update the probability of an event A
based on new information or evidence B. It is expressed as: P(A|B) = [P(B|A) × P(A)] /
P(B).
10. Random Variables: A random variable is a variable that takes on different values based
on the outcome of a random event. It can be discrete (taking on distinct values) or
continuous (taking on any value within a range).
11. Probability Distributions: Probability distributions describe the probabilities of different
values that a random variable can take. The two main types of distributions are the
discrete probability distribution (e.g., binomial, Poisson) and the continuous probability
distribution (e.g., normal, exponential).
Probability theory has numerous applications in various fields, including statistics, economics,
physics, engineering, and social sciences. It helps in making informed decisions, assessing risks,
predicting outcomes, and understanding the behaviour of random phenomena.
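Bayes' theorem (point 9) is easy to work through numerically. The following sketch uses a hypothetical diagnostic test with illustrative prevalence, sensitivity, and false-positive figures, and shows why a positive result from an accurate test can still leave the probability of disease surprisingly low when the condition is rare:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical diagnostic-test example (all numbers illustrative):
p_disease = 0.01               # P(A): prevalence of the disease
p_pos_given_disease = 0.95     # P(B|A): test sensitivity
p_pos_given_healthy = 0.05     # false-positive rate in healthy people

# Law of total probability gives P(B), the chance of a positive test:
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior probability of disease given a positive result:
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # about 0.161 despite the 95% sensitivity
```

The low posterior arises because the many false positives among the large healthy group swamp the few true positives among the rare diseased group.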
Q:21 EXPLAIN 2² FACTORIAL DESIGN AND WRITE ITS ADVANTAGES
A 2² factorial design is a type of experimental design commonly used in research to study the
effects of two independent variables, also known as factors, on a dependent variable. In this
design, each independent variable has two levels or conditions, resulting in a total of four
treatment combinations.
For example, let's consider two independent variables, A and B, each with two levels, low (L) and
high (H). The four treatment combinations in a 2² factorial design would be LL, LH, HL, and
HH.
Overall, the 2² factorial design provides a powerful and efficient approach to study the
effects of two independent variables and their interactions on a dependent variable. It allows for
a comprehensive analysis of main effects, interaction effects, and their combined influence,
leading to a better understanding of the relationships among variables.
OR
A 2² factorial design is a type of experimental design commonly used in research studies.
It involves two independent variables, each with two levels, resulting in a total of four treatment
combinations. The design allows researchers to examine the main effects of each variable
individually, as well as the interaction between the variables.
In a 2² factorial design, the independent variables are often referred to as factor A and
factor B. Each factor has two levels, typically labeled as low (L) and high (H). The four treatment
combinations in the design are represented as follows:
1. LL: factor A low, factor B low
2. LH: factor A low, factor B high
3. HL: factor A high, factor B low
4. HH: factor A high, factor B high
Overall, a 2² factorial design provides a robust and efficient approach to study the effects
of two independent variables, enabling researchers to explore main effects, interaction effects,
and control for confounding factors in a relatively compact experimental setup.
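The main effects and interaction of a 2² design can be computed directly from the four runs using the standard contrasts; the response values below are illustrative:

```python
# Responses for the four runs of a 2^2 factorial design
# (illustrative yields; keys combine the levels of factors A and B)
y = {"LL": 20.0, "LH": 30.0, "HL": 40.0, "HH": 52.0}

# Main effect of A: average change in response when A goes low -> high
effect_A = ((y["HL"] + y["HH"]) - (y["LL"] + y["LH"])) / 2

# Main effect of B: average change in response when B goes low -> high
effect_B = ((y["LH"] + y["HH"]) - (y["LL"] + y["HL"])) / 2

# AB interaction: does the effect of A depend on the level of B?
interaction_AB = ((y["LL"] + y["HH"]) - (y["LH"] + y["HL"])) / 2

print(effect_A, effect_B, interaction_AB)  # 21.0 11.0 1.0
```

The small interaction term here means the effect of A is nearly the same at both levels of B; a large interaction would signal that the factors cannot be interpreted separately.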
Q:22 EXPLAIN OPTIMIZATION TECHNIQUES IN RESPONSE SURFACE METHODOLOGY
Optimization techniques play a crucial role in response surface methodology (RSM), which is a
collection of statistical and mathematical techniques used to model and optimize complex
processes. RSM aims to find the optimal values of input variables (factors) that result in the
desired response (output) of a system or process. Here are three common optimization
techniques used in RSM:
1. Gradient-based optimization (method of steepest ascent/descent): uses a fitted first-order model to move sequentially in the direction of the fastest improvement in the response until the region of the optimum is approached.
2. Response surface optimization: fits a second-order (quadratic) model in the region of the optimum and locates the stationary point (maximum, minimum, or saddle point) of the fitted surface, often through canonical analysis.
3. DOE-based optimization: uses designed experiments, such as central composite or Box-Behnken designs, to explore the factor space efficiently and identify the factor settings that optimize the response.
Overall, optimization techniques in RSM aim to identify the factor settings that result in the
optimal response. By employing gradient-based optimization, response surface optimization, or
DOE-based optimization, researchers can efficiently explore the factor space and determine the
optimal values for improved process performance or system design.
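The second-order fitting step at the heart of RSM can be sketched for a single factor (assuming NumPy is installed; the factor settings and responses are illustrative): fit a quadratic model to the observed responses and solve for the stationary point analytically.

```python
import numpy as np  # assumes NumPy is installed

# One-factor sketch of response surface optimization: fit a second-order
# (quadratic) model to observed responses and solve for the stationary point.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])    # factor settings (illustrative)
y = np.array([2.1, 5.0, 6.2, 5.1, 2.0])    # observed responses (illustrative)

b2, b1, b0 = np.polyfit(x, y, deg=2)        # y ~ b2*x^2 + b1*x + b0
x_opt = -b1 / (2 * b2)                      # stationary point of the parabola
print(round(x_opt, 2))                      # near the centre of the design
```

A negative quadratic coefficient confirms the stationary point is a maximum; in a full RSM study the same idea extends to several factors via canonical analysis of the fitted surface.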
SPSS offers several important modules that cater to specific analytical needs. Here are some of
the key modules in SPSS:
1. SPSS Base: This is the core module of SPSS that provides basic data management and
statistical analysis capabilities. It includes features for data manipulation, data
transformation, and descriptive statistics.
2. SPSS Advanced Statistics: This module offers advanced statistical techniques beyond
the basic ones available in the base module. It includes features such as factor analysis,
cluster analysis, nonparametric tests, and survival analysis.
3. SPSS Regression: This module focuses on regression analysis, which is used to examine
the relationship between variables and predict outcomes. It includes various regression
methods, such as linear regression, logistic regression, and ordinal regression.
4. SPSS Custom Tables: This module is used for creating customized tables and charts to
summarize and present data. It allows users to generate complex tables and graphs with
advanced formatting options.
5. SPSS Decision Trees: This module is used for building decision trees, which are
predictive models that use a tree-like structure to represent decisions and their possible
consequences. Decision trees are often used in classification and prediction tasks.
6. SPSS Missing Values: This module provides tools for handling missing data in datasets.
It offers methods for imputing missing values and analysing the impact of missing data
on statistical results.
7. SPSS Data Preparation: This module focuses on data cleaning and preparation tasks. It
includes features for data screening, data recoding, and data transformation, helping
users to prepare their datasets for analysis.
8. SPSS Bootstrapping: Bootstrapping is a resampling technique used for estimating the
sampling distribution of a statistic. The bootstrapping module in SPSS allows users to
perform bootstrap analyses to assess the stability and variability of statistical estimates.
These are just a few examples of the important modules available in SPSS. The software also
offers modules for specific domains like SPSS Amos for structural equation modelling and SPSS
Conjoint for conjoint analysis, among others. Each module provides additional functionality and
techniques to enhance the capabilities of SPSS for statistical analysis.