
4marks:

1.measures of central tendency

A measure of central tendency is a number that represents the typical value in a collection of
numbers. Three familiar measures of central tendency are the mean, the median, and the mode.

We will let n represent the number of data points in the distribution.

Mean = (sum of all data points) / n (The mean is also known as the "average" or the "arithmetic
average.")

Median = "middle" data point (or average of two middle data points) when the data points are
arranged in numerical order.

Mode = the value that occurs most often (if there is such a value).
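
As a quick check of the three definitions, here is a minimal sketch using Python's standard statistics module. The data set is made up for illustration and is not taken from these notes.

```python
import statistics

# Illustrative data (an assumption, not from the notes)
data = [2, 3, 3, 5, 7, 10]
n = len(data)

mean = sum(data) / n               # sum of all data points divided by n
median = statistics.median(data)   # average of the two middle points, 3 and 5
mode = statistics.mode(data)       # the value that occurs most often

print(mean, median, mode)          # 5.0 4.0 3

# multimode lists every most-frequent value; when all values are distinct
# it returns all of them, which means there is no single mode.
print(statistics.multimode([1, 2, 3]))  # [1, 2, 3]
```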

2. Python program to draw a scatter plot using 2 arrays

import matplotlib.pyplot as plt

# Sample data

x = [1, 2, 3, 4, 5]

y = [2, 3, 5, 7, 11]

# Create scatter plot

plt.scatter(x, y)

# Add labels and title

plt.xlabel('X-axis')

plt.ylabel('Y-axis')

plt.title('Scatter Plot')

# Show plot

plt.show()

The plot will show points at coordinates (1, 2), (2, 3), (3, 5), (4, 7), and (5, 11), representing the data
from the x and y arrays.
3. steps involved in making plots

Import Libraries:

import matplotlib.pyplot as plt

Prepare Data:

x = [1, 2, 3, 4, 5]

y = [2, 3, 5, 7, 11]

Create Plot:

plt.scatter(x, y)

Customize Plot:

plt.xlabel('X-axis')

plt.ylabel('Y-axis')

plt.title('Scatter Plot')

Show or Save Plot:

plt.show() # Display the plot

# OR

plt.savefig('plot.png') # Save the plot to a file

Close Plot:

plt.close() # Close the plot window
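
Put together, the steps might look like the sketch below. It saves the figure instead of showing it, because with some backends the figure is empty once the window opened by plt.show() has been closed, so savefig should come first if both are wanted. The Agg backend line is an assumption so the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

# Prepare data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Create and customize the plot
plt.scatter(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')

# Save the plot to a file, then release the figure
plt.savefig('plot.png')
plt.close()
```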

4.data visualization is better than the traditional text based data methods.

● Data visualization means showing information with pictures instead of just words or
numbers. It's like when you look at a graph or chart instead of reading a long list of numbers.
● Visuals help us understand things better because our brains like pictures. They make it easier
to see patterns, like trends going up or down, or groups of similar things.
● Using pictures to show data makes it easier to talk about it with other people. Everyone can
understand what's going on, even if they're not experts.
● It also helps us remember information better. Think about how you remember a picture you
saw compared to something you read.
● When we see data in pictures, we can spot mistakes more easily. If something doesn't look
right, it's easier to notice when it's in a graph or chart.
● So, data visualization is like telling a story with pictures, making it easier to understand,
remember, and share information.
6.box plot by calculating five number summary. Also, find the Inter Quartile Range

Let's take a subset of 5 data points from the provided sample data:

{31.5, 36.9, 33.8, 30.1, 33.9}

First, let's sort the data in ascending order: {30.1, 31.5, 33.8, 33.9, 36.9}

Now, let's calculate the five-number summary and IQR:

Minimum (Min): The smallest value in the dataset.

Min = 30.1

First Quartile (Q1): The median of the lower half of the dataset.

Q1 = Median of {30.1, 31.5} ≈ 30.8

Median (Q2): The middle value of the dataset.

Q2 (Median) = 33.8

Third Quartile (Q3): The median of the upper half of the dataset.

Q3 = Median of {33.9, 36.9} ≈ 35.4

Maximum (Max): The largest value in the dataset.

Max = 36.9

Now, we can find the Interquartile Range (IQR):

IQR = Q3 − Q1 = 35.4 − 30.8 = 4.6

So, for the subset of 5 data points, the five-number summary is:

Min = 30.1

Q1 ≈ 30.8

Q2 (Median) = 33.8

Q3 ≈ 35.4

Max = 36.9

And the Interquartile Range (IQR) is approximately 4.6.
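
The calculation can be checked with a short sketch. The function name five_number_summary is hypothetical, and it uses the same quartile rule as the worked example: Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the number of points is odd. Other quartile conventions, such as linear interpolation, can give different Q1 and Q3 for the same data.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves rule (needs >= 2 values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]                               # values below the middle
    upper = data[half + 1:] if n % 2 else data[half:]  # values above the middle
    return {
        "min": data[0],
        "q1": median(lower),
        "median": median(data),
        "q3": median(upper),
        "max": data[-1],
    }

summary = five_number_summary([31.5, 36.9, 33.8, 30.1, 33.9])
iqr = summary["q3"] - summary["q1"]

print({name: round(value, 1) for name, value in summary.items()})
print("IQR:", round(iqr, 1))  # IQR: 4.6
```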

7.non-probability sampling methods

Non-probability sampling: In non-probability sampling, the researcher chooses members for
research based on judgment or convenience rather than by random selection. There is no fixed or
predefined selection process, so not all population elements have an equal chance of being
included in the sample.

Types of non-probability sampling with examples:

● Convenience sampling:

○ This method depends on the ease of access to subjects, such as
surveying customers at a mall or passers-by on a busy street.

○ It is called convenience sampling because of the ease with which the
researcher can carry it out and get in touch with the subjects.

○ Selection is based purely on proximity, not representativeness. This
non-probability sampling method is used when there are time and cost
limitations in collecting feedback.

● Judgmental or purposive sampling:

○ Researchers select subjects at their own discretion, considering purely the
purpose of the study and their understanding of the target audience.

○ For instance, when researchers want to understand the thought process of
people interested in studying for their master's degree, they select only
people who have that interest.

● Snowball sampling:

○ Researchers apply this method when the subjects are difficult to trace. For
example, surveying shelterless people or illegal immigrants is extremely
challenging. In such cases, researchers can track down a few subjects to
interview and ask them to refer others, so the sample grows like a rolling
snowball. Researchers also use this sampling method when the topic is highly
sensitive and not openly discussed, for example surveys to gather information
about HIV/AIDS. Not many victims will readily respond to the questions.
Still, researchers can contact people they might know, or volunteers
associated with the cause, to get in touch with the victims and collect
information.

● Quota sampling:

○ In quota sampling, members are selected based on a pre-set standard.
Because the sample is formed based on specific attributes, it will have
the same qualities, in those attributes, as the total population. It is a
rapid method of collecting samples.
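
As an illustration of quota sampling, the sketch below fills a pre-set quota per group from respondents in the order they are encountered. The function name quota_sample, the respondent names, and the quotas are all made up for the example.

```python
def quota_sample(respondents, group_of, quotas):
    """Accept respondents in arrival order until each group's quota is full."""
    counts = {group: 0 for group in quotas}
    sample = []
    for person in respondents:
        group = group_of(person)
        if group in counts and counts[group] < quotas[group]:
            sample.append(person)
            counts[group] += 1
    return sample

# Hypothetical passers-by as (name, age group), in the order they were met
respondents = [
    ("Asha", "under 30"), ("Ben", "under 30"), ("Chen", "30 and over"),
    ("Dev", "under 30"), ("Eva", "30 and over"), ("Farid", "30 and over"),
]
quotas = {"under 30": 2, "30 and over": 2}  # pre-set standard for the sample

sample = quota_sample(respondents, lambda person: person[1], quotas)
print([name for name, _ in sample])  # ['Asha', 'Ben', 'Chen', 'Eva']
```

No random draw is involved: whoever is reached first fills the quota, which is why the method is fast but not a probability sample.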

8.various data partitioning schemes

● Random Partitioning: Imagine throwing your data into different groups without any
particular order or reason. It's like putting names into hats and picking which hat each name
goes into randomly.
● Hash Partitioning: Think of a secret code (a hash computed from its key) that each piece of
data gets. Data with the same code goes into the same group. This is useful for spreading data
evenly across different places.
● Range Partitioning: Suppose you have a big list of numbers, and you want to split them into
groups based on how big or small they are. You might put numbers between 1 and 10 in one
group, and numbers between 11 and 20 in another group, and so on.
● Round-robin Partitioning: This is like taking turns. Each piece of data gets put into the next
group, one after the other, in a circle. It's simple and fair but might not be the best for
organizing the data.
● Key-based Partitioning: Imagine sorting your data based on a special key, like the first letter
of a name. All the names starting with "A" go in one group, "B" in another, and so on. It's
helpful for quickly finding specific pieces of data.
● List Partitioning: Think of having lists of specific things and putting each piece of data into the
list it belongs to. For example, you might have lists for different types of fruits, and each fruit
goes into its respective list.
● Composite Partitioning: This is like using two or more methods together. You might first
organize your data by size and then by color, so you have small red things together, large blue
things together, and so on.
● Vertical Partitioning: Imagine splitting your data by columns. Groups of columns are stored
separately. It's useful when some columns are accessed more often than others, so you can
manage them separately.
● Horizontal Partitioning: Picture cutting your data by rows. Groups of rows are kept together
as chunks, which can be spread out across different places for easier handling.
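
Three of these schemes are easy to sketch in a few lines of Python. The function names are hypothetical, and using CRC32 as the hash is an assumption made only so that the result is the same on every run; a real system would use its own hash function.

```python
import zlib
from bisect import bisect_left

def hash_partition(keys, num_parts):
    """Send each key to the partition chosen by a hash of the key."""
    parts = [[] for _ in range(num_parts)]
    for key in keys:
        code = zlib.crc32(str(key).encode("utf-8"))
        parts[code % num_parts].append(key)
    return parts

def range_partition(values, upper_bounds):
    """upper_bounds=[10, 20] gives three groups: <=10, 11..20, and >20."""
    parts = [[] for _ in range(len(upper_bounds) + 1)]
    for value in values:
        parts[bisect_left(upper_bounds, value)].append(value)
    return parts

def round_robin_partition(items, num_parts):
    """Deal items out to the partitions in turn, like dealing cards."""
    parts = [[] for _ in range(num_parts)]
    for position, item in enumerate(items):
        parts[position % num_parts].append(item)
    return parts

print(range_partition([3, 15, 10, 25, 11, 20], [10, 20]))  # [[3, 10], [15, 11, 20], [25]]
print(round_robin_partition(["a", "b", "c", "d", "e"], 2))  # [['a', 'c', 'e'], ['b', 'd']]
print(hash_partition(["apple", "banana", "apple", "cherry"], 2))
```

In the hash example both copies of "apple" always land in the same partition, which is the property that makes lookups by key possible.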

9. Balas additive algorithm for linear binary optimization.

● The Balas Additive Algorithm is a method used to solve linear binary optimization problems.
These problems involve maximizing or minimizing a linear objective function subject to
constraints, where the decision variables are binary (taking on values of either 0 or 1).
● Initialization: Write the problem in standard form: minimize an objective whose coefficients
are all non-negative (a variable with a negative coefficient is replaced by its complement,
x = 1 − x'), with the constraints written as ≤ inequalities. Start with every variable free and
no incumbent (best known) solution.
● Implicit enumeration:
○ At each step, some variables are fixed at 0 or 1 and the rest are free. Setting every free
variable to 0 gives the cheapest completion, because all costs are non-negative.
○ If that completion satisfies every constraint, record it as the incumbent if it improves on the
current one, and do not explore this branch further.
○ If its cost is already no better than the incumbent, or if some violated constraint cannot be
satisfied even by setting every helpful free variable to 1, discard the branch.
○ Otherwise, pick a free variable, fix it to 1 in one branch and to 0 in the other, and repeat.
● Optimal Solution:
○ When every branch has been explored or discarded, the incumbent is optimal for the original
binary optimization problem. If there is no incumbent, the problem is infeasible.
○ The procedure needs only additions and comparisons (hence "additive"). It never solves a
continuous relaxation, so no fractional values arise and no rounding is needed.
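
A minimal Python sketch of implicit enumeration in the style of the Balas additive algorithm, assuming the problem is already in the form: minimize c·x subject to A·x ≤ b with x binary and every cost non-negative. The function name balas_additive and the small example are made up for illustration, and the sketch branches on variables in index order, which is a simplification of the branching rule used in textbook presentations.

```python
def balas_additive(c, A, b):
    """Minimize c.x subject to A x <= b, x binary, assuming every c[j] >= 0."""
    n = len(c)
    best = {"cost": float("inf"), "x": None}

    def explore(fixed):
        # fixed holds values for variables 0..k-1; the remaining ones are free
        k = len(fixed)
        cost = sum(c[j] * fixed[j] for j in range(k))
        if cost >= best["cost"]:
            return  # cannot beat the incumbent
        # Slack of each constraint with all free variables set to 0
        slack = [b[i] - sum(A[i][j] * fixed[j] for j in range(k))
                 for i in range(len(b))]
        if all(s >= 0 for s in slack):
            # Cheapest completion is feasible: new incumbent, stop this branch
            best["cost"], best["x"] = cost, fixed + [0] * (n - k)
            return
        for i, s in enumerate(slack):
            if s < 0:
                # Most that free variables could add to the slack of row i
                gain = sum(-A[i][j] for j in range(k, n) if A[i][j] < 0)
                if s + gain < 0:
                    return  # row i can never be satisfied on this branch
        explore(fixed + [1])
        explore(fixed + [0])

    explore([])
    return best["cost"], best["x"]

# Illustrative problem: minimize 3*x1 + 2*x2 + 4*x3
# subject to x1 + x2 + x3 >= 2 and x1 + x3 >= 1, rewritten as <= rows
c = [3, 2, 4]
A = [[-1, -1, -1], [-1, 0, -1]]
b = [-2, -1]
result = balas_additive(c, A, b)
print(result)  # (5, [1, 1, 0])
```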
