
1.What is Pandas and why is it used in Python?

Pandas is a powerful open-source data manipulation and analysis library for Python.
It provides easy-to-use data structures and functions to work with structured data,
such as tabular data, time series, and more. Pandas is widely used in data analysis
and manipulation tasks due to its flexibility, efficiency, and rich functionality.

2.How do you install Pandas in Python?

You can install Pandas using pip, the Python package manager. Run the following
command in your terminal or command prompt:

pip install pandas

3.Explain the primary data structures in Pandas.

The primary data structures in Pandas are:


Series: One-dimensional labeled array capable of holding any data type.
DataFrame: Two-dimensional labeled data structure with columns of potentially
different types. It is similar to a spreadsheet or SQL table.
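
As a minimal sketch of the two structures (the names, ages, and cities below are made-up sample values), a Series pairs values with index labels, and a DataFrame is a set of columns sharing one index:

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels.
ages = pd.Series([28, 35, 42], index=["John", "Anna", "Peter"], name="Age")

# A DataFrame: several columns that share the same index.
people = pd.DataFrame(
    {"Age": [28, 35, 42], "City": ["Oslo", "Rome", "Lima"]},
    index=["John", "Anna", "Peter"],
)

print(ages["Anna"])          # look up one value by its label
print(type(people["Age"]))   # a single DataFrame column is a Series
print(people.shape)          # (rows, columns)
```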

4.How do you create a Pandas DataFrame from a Python dictionary?

You can create a DataFrame from a dictionary using the pd.DataFrame() constructor.
Each key-value pair in the dictionary corresponds to a column in the DataFrame.

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 35, 42, 32]}

df = pd.DataFrame(data)

5.What is the purpose of the head() and tail() functions in Pandas?

The head() function returns the first n rows of a DataFrame, while the tail()
function returns the last n rows. They are useful for quickly inspecting the
beginning or end of a large DataFrame.

print(df.head())   # Returns the first 5 rows by default (fewer if the DataFrame is shorter)
print(df.tail(3))  # Returns the last 3 rows

6.Differentiate between a DataFrame and a Series in Pandas.

A DataFrame is a two-dimensional labeled data structure with columns of potentially
different data types, similar to a table in a relational database or a spreadsheet.
A Series, on the other hand, is a one-dimensional labeled array capable of holding
any data type, similar to a single column in a DataFrame.

7.How do you check for missing values in a DataFrame?

You can use the isnull() function to check for missing values in a DataFrame. It
returns a DataFrame of the same shape as the input with True where NaN values are
present and False otherwise.

missing_values = df.isnull()
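
The boolean result is usually summarized rather than read cell by cell. A small sketch, assuming a made-up DataFrame named sample with two gaps in it:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing name and one missing age.
sample = pd.DataFrame({"Name": ["John", "Anna", None],
                       "Age": [28, np.nan, 42]})

mask = sample.isnull()                        # True where a value is missing
missing_per_column = mask.sum()               # count of missing values per column
rows_with_missing = sample[mask.any(axis=1)]  # rows with at least one gap

print(missing_per_column)
print(rows_with_missing)
```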

8.Explain the purpose of the shape attribute in Pandas.

The shape attribute of a DataFrame returns a tuple representing the dimensions of
the DataFrame. It indicates the number of rows and columns in the DataFrame.

print(df.shape) # Output: (4, 2) - 4 rows, 2 columns

9.How can you rename columns in a Pandas DataFrame?

You can rename columns in a DataFrame using the rename() function. Specify the
current column names as keys and the new names as values in a dictionary.

df.rename(columns={'old_name': 'new_name'}, inplace=True)

10.What is the role of the dtype parameter in Pandas?

The dtype parameter in Pandas specifies the data type of the elements in a
DataFrame or Series. In the pd.DataFrame() constructor it accepts a single type
(such as int, float, object, or a datetime type) that is applied to every column;
when it is omitted, Pandas infers a type for each column. Setting types explicitly
helps ensure data integrity and optimize memory usage.

# A single dtype applies to all columns, so every column must be convertible to it
ages = pd.DataFrame({'Age': [28, 35, 42, 32]}, dtype=float)

11.Explain the difference between loc and iloc in Pandas.

loc is used for label-based indexing, meaning you can specify row and column labels
to select data. iloc is used for integer-based indexing, meaning you can specify
integer indices to select data.

# Using loc
df.loc[2, 'column_name']

# Using iloc
df.iloc[2, 0]
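
The difference is easiest to see when the index labels are not 0, 1, 2. The sketch below assumes a made-up DataFrame named scores whose row labels are 10, 20, and 30:

```python
import pandas as pd

# Hypothetical data: row labels deliberately differ from row positions.
scores = pd.DataFrame({"score": [90, 75, 60]}, index=[10, 20, 30])

by_label = scores.loc[20, "score"]   # the row labelled 20
by_position = scores.iloc[2, 0]      # third row, first column

# Slices behave differently: loc includes the end label, iloc excludes the end position.
label_slice = scores.loc[10:20]
position_slice = scores.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```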

12.How do you select specific columns from a DataFrame?

You can select specific columns from a DataFrame by passing a list of column names
to the indexing operator [] or by using the loc or iloc accessor methods.

# Using indexing operator
selected_columns = df[['column1', 'column2']]

# Using loc
selected_columns = df.loc[:, ['column1', 'column2']]

# Using iloc
selected_columns = df.iloc[:, [0, 1]]

13.What is boolean indexing, and how is it used in Pandas?

Boolean indexing is a technique used to filter rows in a DataFrame based on a
specified condition. It involves creating a boolean mask (a Series of True and
False values) that indicates which rows satisfy the condition.

# Boolean indexing example
filtered_df = df[df['column'] > 50]
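
To make the mask visible, it can be built as a separate step. This is a sketch with made-up data; sales, item, and amount are illustrative names:

```python
import pandas as pd

sales = pd.DataFrame({"item": ["a", "b", "c"], "amount": [40, 55, 70]})

mask = sales["amount"] > 50   # boolean Series, one entry per row
filtered = sales[mask]        # keeps only rows where the mask is True

print(mask.tolist())
print(filtered)
```

The filtered rows keep their original index labels, so the result here is indexed 1 and 2 rather than 0 and 1.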

14.How do you drop columns and rows from a DataFrame in Pandas?


You can drop columns and rows from a DataFrame using the drop() function. Name the
labels to remove with the columns or index keyword, or pass the labels together
with the axis parameter (axis=1 for columns, axis=0 for rows).

# Drop column
df.drop(columns=['column_name'], inplace=True)

# Drop row
df.drop(index=0, inplace=True)
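
Without inplace=True, drop() returns a new DataFrame and leaves the original untouched. A short sketch with a made-up DataFrame named df_demo, showing both keyword styles:

```python
import pandas as pd

df_demo = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

without_b = df_demo.drop(columns=["b"])     # keyword style
without_c = df_demo.drop("c", axis=1)       # equivalent axis style
without_first_row = df_demo.drop(index=0)   # drop the row labelled 0

print(without_b.columns.tolist())
print(without_first_row.index.tolist())
print(df_demo.shape)   # the original is unchanged
```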

15.Explain the purpose of the isin() function in Pandas.

The isin() function is used to filter rows in a DataFrame based on whether the
values in a column are present in a specified list or array. It returns a boolean
mask indicating which rows match the specified condition.

# Example of isin() function
filtered_df = df[df['column'].isin(['value1', 'value2'])]
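
The same mask can be inverted with ~ to exclude the listed values instead. A sketch using made-up order data:

```python
import pandas as pd

orders = pd.DataFrame({"city": ["Oslo", "Rome", "Lima", "Rome"],
                       "qty": [1, 2, 3, 4]})

in_mask = orders["city"].isin(["Rome", "Lima"])
selected = orders[in_mask]    # rows whose city is in the list
excluded = orders[~in_mask]   # rows whose city is not in the list

print(selected)
print(excluded)
```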

16.How can you set a specific column as the index in a DataFrame?

You can set a specific column as the index in a DataFrame using the set_index()
function. Specify the column name to be used as the index.

df.set_index('column_name', inplace=True)

17.What is the purpose of the at and iat accessors in Pandas?

The at and iat accessors are used for fast scalar value access in a DataFrame. They
provide optimized methods for accessing a single value based on label (at) or
integer position (iat).

# Using at accessor
value = df.at[row_label, column_label]

# Using iat accessor
value = df.iat[row_position, column_position]
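
A concrete sketch, assuming a small made-up DataFrame named grid with string row labels. Both accessors can also be used to assign a single value:

```python
import pandas as pd

grid = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["r1", "r2"])

v_label = grid.at["r2", "y"]   # by row label and column label
v_pos = grid.iat[0, 1]         # by row position and column position

grid.at["r1", "x"] = 10        # scalar assignment works the same way

print(v_label, v_pos)
print(grid)
```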

18.How do you reset the index of a DataFrame?

You can reset the index of a DataFrame using the reset_index() function. By
default, it creates a new DataFrame with the old index as a column and a new
sequential index. Use the drop parameter to avoid adding the old index as a column.

df.reset_index(inplace=True, drop=True)
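
The effect of the drop parameter is clearest right after a set_index() call. The sketch below uses made-up data and the non-inplace form, so each step returns a new DataFrame:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["John", "Anna"], "Age": [28, 35]})

indexed = people.set_index("Name")          # "Name" becomes the index
restored = indexed.reset_index()            # old index comes back as a column
dropped = indexed.reset_index(drop=True)    # old index is discarded

print(indexed.loc["Anna", "Age"])
print(restored.columns.tolist())
print(dropped.columns.tolist())
```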

19.Explain the role of the isin() function in Pandas.

The isin() function in Pandas is used to filter rows based on whether the values in
a column are present in a specified list or array. It returns a boolean mask
indicating which rows match the specified condition.

# Example of isin() function
filtered_df = df[df['column'].isin(['value1', 'value2'])]

20.How can you filter rows based on multiple conditions in Pandas?

You can filter rows based on multiple conditions using boolean indexing with
logical operators (& for AND, | for OR, ~ for NOT). Enclose each condition within
parentheses.

# Example of filtering based on multiple conditions
filtered_df = df[(df['column1'] > 50) & (df['column2'] == 'value')]
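
All three operators are sketched below on made-up data (staff, age, and dept are illustrative names). The parentheses matter because &, |, and ~ bind more tightly than comparisons such as > and ==:

```python
import pandas as pd

staff = pd.DataFrame({"age": [25, 40, 55], "dept": ["ops", "dev", "dev"]})

both = staff[(staff["age"] > 30) & (staff["dept"] == "dev")]     # AND
either = staff[(staff["age"] > 50) | (staff["dept"] == "ops")]   # OR
negated = staff[~(staff["dept"] == "dev")]                       # NOT

print(both.index.tolist(), either.index.tolist(), negated.index.tolist())
```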

21.How do you handle missing values in a DataFrame?

Missing values in a DataFrame can be handled using methods like fillna(), dropna(),
or interpolate(). fillna() is used to fill missing values with a specified value,
dropna() is used to remove rows or columns with missing values, and interpolate()
is used to fill missing values by interpolation.

# Example of handling missing values
df.fillna(0, inplace=True) # Fill missing values with 0
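
The three methods can be compared side by side. This is a sketch on made-up sensor readings; each call returns a new DataFrame:

```python
import numpy as np
import pandas as pd

readings = pd.DataFrame({"temp": [20.0, np.nan, 24.0],
                         "wind": [5.0, 7.0, np.nan]})

filled = readings.fillna(0)             # replace every gap with 0
dropped = readings.dropna()             # keep only rows with no gaps
interpolated = readings.interpolate()   # linear interpolation by default

print(filled)
print(dropped)
print(interpolated["temp"].tolist())
```

Which one is appropriate depends on the data: filling with 0 can distort averages, while dropping rows discards the values that were present.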

22.Explain the purpose of the drop_duplicates() function in Pandas.

The drop_duplicates() function is used to remove duplicate rows from a DataFrame.
By default, it considers all columns, but you can specify subset columns to
identify duplicates.

df.drop_duplicates(inplace=True)
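
A sketch of the subset and keep parameters, using a made-up visit log:

```python
import pandas as pd

visits = pd.DataFrame({"user": ["a", "a", "b", "a"],
                       "page": ["home", "home", "home", "cart"]})

# Rows are duplicates only if every column matches.
unique_rows = visits.drop_duplicates()

# Judge duplicates by "user" alone and keep the last occurrence of each.
one_per_user = visits.drop_duplicates(subset=["user"], keep="last")

print(unique_rows.index.tolist())
print(one_per_user.index.tolist())
```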

23.What is the purpose of the apply() function in Pandas?

The apply() function in Pandas is used to apply a function along an axis of a
DataFrame or Series. It can be used to perform custom operations on data, such as
transformations, aggregations, or element-wise calculations.

# Example of apply() function (custom_function is any function of one value)
df['new_column'] = df['existing_column'].apply(custom_function)
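
The sketch below shows apply() on a Series and on DataFrame rows with axis=1. The data and the to_cents helper are hypothetical. When a built-in vectorized operation exists, it is usually the simpler choice:

```python
import pandas as pd

items = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

def to_cents(price):
    """Hypothetical helper: convert a price to whole cents."""
    return int(round(price * 100))

# Series.apply: the function receives one value at a time.
items["price_cents"] = items["price"].apply(to_cents)

# DataFrame.apply with axis=1: the function receives one row at a time.
items["total"] = items.apply(lambda row: row["price"] * row["qty"], axis=1)

# The same total computed with plain column arithmetic.
vectorized_total = items["price"] * items["qty"]

print(items)
```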

24.How do you convert data types in a Pandas DataFrame?

You can convert data types in a Pandas DataFrame using the astype() function or
specific conversion functions like to_numeric(), to_datetime(), or to_timedelta().

# Example of converting data types
df['column'] = df['column'].astype('int')
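
astype() raises an error when a value cannot be converted, so messy input is often passed through to_numeric() first. A sketch on made-up data, where errors="coerce" turns unparseable entries into NaN:

```python
import pandas as pd

raw = pd.DataFrame({"amount": ["10", "20", "n/a"],
                    "day": ["2024-01-05", "2024-02-10", "2024-03-15"]})

# Unparseable strings become NaN instead of raising an error.
raw["amount_num"] = pd.to_numeric(raw["amount"], errors="coerce")

# Parse ISO-formatted date strings into datetime values.
raw["day"] = pd.to_datetime(raw["day"])

# Integer columns cannot hold NaN, so drop the gaps before casting.
clean = raw.dropna(subset=["amount_num"]).copy()
clean["amount_int"] = clean["amount_num"].astype("int64")

print(raw.dtypes)
print(clean["amount_int"].tolist())
```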

25.Explain the purpose of the groupby() function in Pandas.

The groupby() function in Pandas is used to split a DataFrame into groups based on
some criteria, such as unique values in one or more columns. It is typically
followed by an aggregation function to perform calculations within each group.

# Example of groupby() function
grouped_df = df.groupby('column').sum()
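
Selecting a column before aggregating, or using named aggregations with agg(), gives more control over the result. A sketch on made-up sales data:

```python
import pandas as pd

sales = pd.DataFrame({"region": ["north", "south", "north", "south"],
                      "units": [3, 5, 4, 1],
                      "price": [2.0, 2.0, 3.0, 4.0]})

# Aggregate one column per group.
units_by_region = sales.groupby("region")["units"].sum()

# Named aggregations: output column = (input column, function).
summary = sales.groupby("region").agg(
    total_units=("units", "sum"),
    mean_price=("price", "mean"),
)

print(units_by_region)
print(summary)
```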

26.How do you pivot a DataFrame in Pandas?

You can pivot a DataFrame using the pivot() function, which reshapes the data by
rearranging the rows and columns. It requires specifying columns to use as the new
index, columns, and values.

# Example of pivot() function
pivoted_df = df.pivot(index='index_column', columns='column_to_pivot',
values='value_c
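
A self-contained sketch of pivot() on made-up long-format data (the date, city, and temp column names are illustrative). Each (index, columns) pair must be unique; pivot() raises a ValueError otherwise, and pivot_table() with an aggregation function is the usual alternative in that case:

```python
import pandas as pd

long_df = pd.DataFrame({"date": ["d1", "d1", "d2", "d2"],
                        "city": ["Oslo", "Rome", "Oslo", "Rome"],
                        "temp": [1, 12, 3, 14]})

# One row per date, one column per city, cells filled from "temp".
wide_df = long_df.pivot(index="date", columns="city", values="temp")

print(wide_df)
```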

27.What is the merge() function, and how is it used in Pandas?

The merge() function in Pandas is used to combine two DataFrames based on one or
more common columns (or their indexes). It performs database-style joins, such as
inner, outer, left, and right joins.

# Example of merge() function
merged_df = pd.merge(df1, df2, on='common_column', how='inner')
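
The join types differ in which unmatched rows they keep. The sketch below uses two made-up tables, customers and orders; indicator=True adds a _merge column recording where each row came from:

```python
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2, 3],
                          "name": ["John", "Anna", "Peter"]})
orders = pd.DataFrame({"cust_id": [1, 1, 4],
                       "total": [50, 20, 70]})

inner = pd.merge(customers, orders, on="cust_id", how="inner")  # matches only
left = pd.merge(customers, orders, on="cust_id", how="left")    # all customers
outer = pd.merge(customers, orders, on="cust_id", how="outer",
                 indicator=True)                                # everything

print(len(inner), len(left), len(outer))
print(outer)
```

Customers without orders appear in the left join with NaN in the total column.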

28.How do you handle outliers in a DataFrame?

Outliers in a DataFrame can be handled by filtering out or transforming extreme
values using techniques like winsorization, truncation, or imputation.
Additionally, you can use robust statistical measures or outlier detection
algorithms to identify and manage outliers.

# Example of handling outliers with winsorization
from scipy.stats import mstats
winsorized_values = mstats.winsorize(df['column'], limits=[0.05, 0.05])
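
A Pandas-only alternative is sketched below. It uses the interquartile-range (IQR) rule, which is a different technique from the winsorization call above, and the conventional 1.5 multiplier is an assumption that may need tuning. The sample values are made up:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95])

# Fences at 1.5 * IQR beyond the first and third quartiles.
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = (values < lower) | (values > upper)
without_outliers = values[~is_outlier]             # filter them out
capped = values.clip(lower=lower, upper=upper)     # or cap them at the fences

print(is_outlier.tolist())
print(capped.tolist())
```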

29.Explain the purpose of the map() function in Pandas.

The map() function in Pandas is used to apply a mapping or transformation to each
element of a Series. It accepts a dictionary, function, or Series as an argument to
perform the mapping. With a dictionary, values that have no matching key become
NaN.

# Example of map() function with a dictionary
df['column'] = df['column'].map({'value1': 'new_value1', 'value2': 'new_value2'})
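
A sketch of map() with a dictionary and with a function, on a made-up Series of sizes. The value "XL" has no dictionary entry, so it maps to NaN unless it is filled afterwards:

```python
import pandas as pd

sizes = pd.Series(["S", "M", "L", "XL"])
size_codes = {"S": 1, "M": 2, "L": 3}   # "XL" is deliberately absent

codes = sizes.map(size_codes)                               # unmapped -> NaN
codes_filled = sizes.map(size_codes).fillna(0).astype(int)  # fill the gap with 0
lowered = sizes.map(str.lower)                              # function applied per element

print(codes.tolist())
print(codes_filled.tolist())
print(lowered.tolist())
```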

30.How do you perform one-hot encoding in Pandas?

One-hot encoding is performed using the get_dummies() function in Pandas. It
converts categorical variables into dummy/indicator variables, where each category
is represented as a binary feature.

# Example of one-hot encoding
encoded_df = pd.get_dummies(df, columns=['categorical_column'])
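
A sketch on made-up data showing the generated column names, which follow the pattern column_category. The type of the indicator columns depends on the Pandas version, so dtype=int is passed here to get 0/1 integers explicitly:

```python
import pandas as pd

pets = pd.DataFrame({"name": ["Rex", "Tom", "Kit"],
                     "kind": ["dog", "cat", "cat"]})

# "kind" is replaced by one indicator column per category.
encoded = pd.get_dummies(pets, columns=["kind"])

# Same encoding with explicit integer indicators.
encoded_int = pd.get_dummies(pets, columns=["kind"], dtype=int)

print(encoded.columns.tolist())
print(encoded_int)
```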
