
Use structuring methods to establish order in your dataset

As a data analytics professional, you will often need to learn more about your datasets. This is where the structuring practice of exploratory data analysis (EDA) can help. Let's explore the valuable methods that are part of this practice. As you'll recall from earlier in the program, structuring helps you to organize, gather, separate, group, and filter your data in different ways to learn more about it. Next, we'll talk about the methods involved in structuring, and later, you'll practice these concepts using Python.

First on the list of structuring methods is sorting. Sorting is the process of arranging data into a meaningful order. Imagine that you are given a dataset about kangaroos. These furry creatures are native to Australia and Papua New Guinea. They are known for their strong tails and the belly pouches they use to cradle their babies, called joeys. The kangaroo dataset contains information about kangaroo characteristics like pouch size, tail length, total body length, and much more. The first data we'll consider is a column measuring the volume of the kangaroo pouches in cubic centimeters. We can sort those values in ascending or descending order, that is, from smallest to biggest or from biggest to smallest.

Another useful structuring tool is extraction. Extracting is the process of retrieving data from a dataset or source for further processing. You can think of extraction as retrieving whole columns of data. An example of extraction is to take the kangaroo data from before, then evaluate just two of the columns from the dataset, such as pouch volume and tail length. You can use the resulting data for analysis, comparisons, or visualization.

Another structuring method is filtering. Filtering is the process of selecting a smaller part of your dataset based on specified parameters and using it for viewing or analysis. You can think of filtering as selecting rows of a dataset. In the case of our kangaroo dataset, filtering can look like viewing only the pouch measurements of kangaroos that also have tails shorter than one meter. This is useful in finding meaningful groups or trends in the data.

Next on the list of structuring methods is slicing. Slicing breaks information down into smaller parts to facilitate efficient examination and analysis from different viewpoints. Think of slicing as selecting columns, rows, or both: a combination of extraction and filtering. In the kangaroo dataset, let's say you have a column of body lengths called total body length. In another column, each kangaroo is identified as belonging to one of three regional populations. If you were to take the body lengths of only one of the three regional populations, you would be pulling a slice of the data.

Grouping is our next structuring method. Grouping, sometimes called bucketizing, is aggregating individual observations of a variable into groups. An example of grouping is to place the total body length column next to the kangaroo tail length column, then group all the tail lengths into three types (long, average, and short) based on the measurements in the tail length column. You can now find and organize the total body length values based on the kangaroo tail length groups.

The last structuring method is merging. Merging is a method to combine two different data frames along a specified starting column. For example, imagine we had an additional dataset of kangaroo information from a different field study, but with the same parameters and variables. We might use the merge or join functions to align the columns and combine the new data into one dataset.

It's essential that you do not change the meaning of the data while performing your filtering, sorting, slicing, joining, and merging operations. If, for example, we did not merge the kangaroo pouch measurements correctly with their matching kangaroo name and ID, the data would not be representative and our analysis would be far less useful. Being true to the data is being true to its story.

Hopefully, you are beginning to understand the value of organizing and structuring data in order to analyze it. Coming up, we will practice structuring using Python.
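
As a small preview of that practice, here is a minimal sketch of sorting, extraction, filtering, and slicing, assuming the pandas library. The kangaroo table, its column names, and every value in it are made up for illustration and do not come from a real dataset.

```python
import pandas as pd

# Hypothetical kangaroo data; column names and values are invented for illustration.
kangaroos = pd.DataFrame({
    "kangaroo_id": [1, 2, 3, 4, 5],
    "region": ["north", "south", "east", "north", "south"],
    "pouch_volume_cm3": [820.0, 640.0, 910.0, 700.0, 760.0],
    "tail_length_m": [1.05, 0.85, 1.10, 0.92, 0.98],
    "total_body_length_m": [2.10, 1.75, 2.25, 1.90, 2.00],
})

# Sorting: arrange rows by pouch volume, smallest to biggest (ascending).
by_pouch = kangaroos.sort_values("pouch_volume_cm3", ascending=True)

# Extraction: retrieve whole columns for further processing.
pouch_and_tail = kangaroos[["pouch_volume_cm3", "tail_length_m"]]

# Filtering: keep only the rows whose tail is shorter than one meter.
short_tails = kangaroos[kangaroos["tail_length_m"] < 1.0]

# Slicing: choose rows and columns together, here the body lengths of one region.
north_lengths = kangaroos.loc[kangaroos["region"] == "north", "total_body_length_m"]
```

Each operation returns a new object and leaves `kangaroos` itself unchanged, which helps keep the original measurements intact while you explore.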

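Grouping and merging can be sketched in the same spirit, again assuming pandas. The tables, the bin edges that define short, average, and long tails, and the kangaroo names are all hypothetical choices made for this example. The merge matches rows on a shared ID column, which is one way to read the merging description above.

```python
import pandas as pd

# Hypothetical measurements; names, values, and bin edges are invented for illustration.
measurements = pd.DataFrame({
    "kangaroo_id": [1, 2, 3, 4],
    "tail_length_m": [1.10, 0.80, 0.95, 1.20],
    "total_body_length_m": [2.20, 1.70, 1.95, 2.30],
})

# Grouping (bucketizing): label each tail length as short, average, or long.
measurements["tail_group"] = pd.cut(
    measurements["tail_length_m"],
    bins=[0.0, 0.9, 1.05, float("inf")],  # assumed cut points, not from the source
    labels=["short", "average", "long"],
)

# Organize total body length by tail length group, here as a mean per group.
mean_body_by_group = (
    measurements.groupby("tail_group", observed=True)["total_body_length_m"].mean()
)

# Merging: attach names from a second table by matching on the shared ID column.
names = pd.DataFrame({
    "kangaroo_id": [1, 2, 3, 4],
    "name": ["Matilda", "Skippy", "Boomer", "Joey"],
})
# validate="one_to_one" raises an error if an ID is duplicated in either table,
# a simple guard against pairing measurements with the wrong kangaroo.
merged = measurements.merge(names, on="kangaroo_id", how="inner", validate="one_to_one")
```

Checking that `merged` has the same number of rows as `measurements` is a quick way to confirm that no kangaroo was dropped or duplicated by the merge.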