Day 18 - Numpy

Day 18: Array Masking and Filtering
Applying masks to arrays to filter data

Using Boolean arrays for advanced data manipulation
On Day 18, we focus on array masking and filtering in NumPy. A mask is a Boolean array with the same shape as the data: True marks the elements that satisfy a condition and False marks the rest. Indexing an array with a mask selects the True elements, and assigning through a mask modifies only those elements. Let's explore how to apply masks to arrays and use Boolean arrays for advanced data manipulation.

Applying Masks to Arrays to Filter Data:

Example 1: Masking with a Condition
import numpy as np

# Create a 1D NumPy array
arr = np.array([10, 20, 30, 40, 50])

# Create a mask for values greater than 30
mask = arr > 30

# Apply the mask to filter the array
filtered_arr = arr[mask]

print("Original Array:", arr)
print("Filtered Array:", filtered_arr)

Original Array: [10 20 30 40 50]
Filtered Array: [40 50]
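
The mask does not have to be stored in a variable, and the same idea carries over to higher dimensions. The following is a minimal sketch (the array `grid` is made-up illustration data, not part of the example above) showing that indexing a 2D array with a Boolean mask of the same shape returns a flat 1D array of the selected values:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# The condition can be written directly inside the brackets
inline_filtered = arr[arr > 30]
print(inline_filtered)                   # [40 50]

# Hypothetical 2D data, used only for illustration
grid = np.array([[1, 8, 3],
                 [9, 2, 7]])

# A mask has the same shape as the array and dtype bool
grid_mask = grid > 5
print(grid_mask.shape, grid_mask.dtype)  # (2, 3) bool

# Boolean indexing a 2D array yields a 1D array in row-major order
selected = grid[grid_mask]
print(selected)                          # [8 9 7]
```

Note that Boolean indexing returns a copy, so changing `selected` afterwards does not change `grid`.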
Example 2: Applying Multiple Masks

# Create a 1D NumPy array
arr = np.array([10, 20, 30, 40, 50])

# Create masks for values greater than 20 and less than 40
mask1 = arr > 20
mask2 = arr < 40

# Combine masks using the element-wise AND operator
combined_mask = mask1 & mask2

# Apply the combined mask to filter the array
filtered_arr = arr[combined_mask]

print("Original Array:", arr)
print("Filtered Array:", filtered_arr)

Original Array: [10 20 30 40 50]
Filtered Array: [30]
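
Masks can be combined with `&` (AND), `|` (OR), and `~` (NOT). A common pitfall when writing the conditions inline is operator precedence: `&` and `|` bind more tightly than `>` and `<`, so each comparison needs its own parentheses. The sketch below reuses the array from Example 2 and also shows that Python's `and` keyword is not an element-wise operator:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# AND: parentheses are required around each comparison
between = arr[(arr > 20) & (arr < 40)]

# OR: values at either extreme
extremes = arr[(arr < 20) | (arr > 40)]

# NOT: everything except the values strictly between 20 and 40
not_between = arr[~((arr > 20) & (arr < 40))]

print(between)      # [30]
print(extremes)     # [10 50]
print(not_between)  # [10 20 40 50]

# `and` tries to reduce each array to a single True/False and fails
try:
    arr[(arr > 20) and (arr < 40)]
except ValueError as exc:
    print("ValueError:", exc)
```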

Using Boolean Arrays for Advanced Data Manipulation:

Example 3: Broadcasting with Boolean Arrays

# Create a 1D NumPy array
arr = np.array([1, 2, 3, 4, 5])

# Create a mask for even values
mask = arr % 2 == 0

# Replace even values with -1 using the mask.
# The scalar -1 is broadcast to every selected element, and arr is modified in place.
arr[mask] = -1

print("Modified Array:", arr)

Modified Array: [ 1 -1  3 -1  5]
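
Masked assignment changes the array in place. When the original values are still needed, one alternative (a sketch, not the approach used in Example 3) is the three-argument form of `np.where`, which builds a new array instead:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
mask = arr % 2 == 0

# np.where(condition, x, y) takes x where the condition is True and y elsewhere
replaced = np.where(mask, -1, arr)

print(replaced)  # [ 1 -1  3 -1  5]
print(arr)       # [1 2 3 4 5]  (the original is left untouched)

# Masked assignment also accepts one value per selected element
arr_copy = arr.copy()
arr_copy[mask] = arr_copy[mask] * 10
print(arr_copy)  # [ 1 20  3 40  5]
```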

Example 4: Using Boolean Arrays for Indexing

# Create a 1D NumPy array
arr = np.array([10, 20, 30, 40, 50])

# Create a mask for values greater than 25
mask = arr > 25

# Get the indices of the True values in the mask
# (np.where returns a tuple with one index array per dimension)
indices = np.where(mask)[0]

print("Indices of Values Greater than 25:", indices)

Indices of Values Greater than 25: [2 3 4]
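
The `[0]` in Example 4 is needed because single-argument `np.where` returns a tuple. As a small sketch of related options, `np.flatnonzero` returns the index array directly for 1D input, and `np.argwhere` returns one coordinate row per match for multi-dimensional input (the array `grid` is made-up illustration data):

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
mask = arr > 25

# Single-argument np.where returns a tuple, one index array per dimension
idx_tuple = np.where(mask)
print(len(idx_tuple))                           # 1

# np.flatnonzero returns the positions of the True values directly
indices = np.flatnonzero(mask)
print(indices)                                  # [2 3 4]

# Indexing by position selects the same values as indexing by mask
print(np.array_equal(arr[indices], arr[mask]))  # True

# Hypothetical 2D data: np.argwhere gives one [row, col] pair per match
grid = np.array([[1, 8],
                 [9, 2]])
coords = np.argwhere(grid > 5)
print(coords.tolist())                          # [[0, 1], [1, 0]]
```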

In these examples:
* We created masks using Boolean conditions to filter elements in an array.
* We applied masks to filter and manipulate array data.
* We used Boolean arrays for advanced data manipulation, including broadcasting and indexing.

Masking and filtering are powerful techniques for data manipulation and analysis. Boolean arrays provide a
flexible way to perform selective operations on array elements based on conditions.
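
Because True counts as 1 and False as 0, a mask can also be summarized directly, without filtering first. The following sketch reuses the array from Example 4 to count matches, compute the matching fraction, and test whether any or all elements satisfy the condition:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
mask = arr > 25

# Number of elements that satisfy the condition
count = np.count_nonzero(mask)
print(count)              # 3

# Fraction of elements that satisfy the condition (3 out of 5)
fraction = mask.mean()
print(fraction)           # 0.6

# Does at least one element match? Do all of them?
print(mask.any(), mask.all())  # True False

# Aggregate only the selected values
selected_total = arr[mask].sum()
print(selected_total)     # 120
```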

🌐 Real-World Scenarios:

1. Data Cleaning and Preprocessing:
Use Case: In data preprocessing for machine learning, you often need to separate relevant records from irrelevant or noisy ones.
NumPy Application: Boolean indexing handles this directly. For instance, in a dataset of customer reviews, you can build a mask from the
ratings (or from specific keywords) to pick out reviews, either to inspect them or to exclude them from your analysis.
Example: Suppose you have a dataset of product reviews, and you want to isolate the reviews rated lower than 3 stars. You can
create a mask from a condition on the ratings and use it to select the matching reviews and ratings.
# Sample data: an array of customer reviews and star ratings
reviews = np.array(["Great product!", "Not so good...", "Excellent!", "Average.", "Terrible experience."])
ratings = np.array([5, 2, 5, 3, 1])

# Create a mask for reviews with ratings less than 3 stars
low_rating_mask = ratings < 3

# Use the mask to select the low-rated reviews and their ratings
filtered_reviews = reviews[low_rating_mask]
filtered_ratings = ratings[low_rating_mask]

# Display the selected reviews and ratings
for review, rating in zip(filtered_reviews, filtered_ratings):
    print(f"Review: '{review}' | Rating: {rating} stars")

Review: 'Not so good...' | Rating: 2 stars
Review: 'Terrible experience.' | Rating: 1 stars
By using Boolean indexing and masks, you've isolated the reviews with low ratings in one step. Depending on the
goal, you can study these reviews on their own or invert the mask with ~ to drop them and keep the rest. Either way,
this is a common preprocessing step in machine learning and data analysis workflows that can improve data quality.
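
To remove the low-rated reviews rather than select them, the same mask can be inverted with `~`. The sketch below reuses the sample data from this scenario; the keyword filter at the end is an added illustration, and the keyword "Terrible" is an assumption chosen only because it appears in the sample reviews:

```python
import numpy as np

reviews = np.array(["Great product!", "Not so good...", "Excellent!", "Average.", "Terrible experience."])
ratings = np.array([5, 2, 5, 3, 1])

low_rating_mask = ratings < 3

# ~ flips every True/False, so this keeps reviews rated 3 stars or higher
kept_reviews = reviews[~low_rating_mask]
kept_ratings = ratings[~low_rating_mask]

for review, rating in zip(kept_reviews, kept_ratings):
    print(f"Review: '{review}' | Rating: {rating} stars")

# Keyword filter: np.char.find returns -1 where the substring is absent
keyword_mask = np.char.find(reviews, "Terrible") >= 0
print(reviews[keyword_mask])  # ['Terrible experience.']
```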

2. Financial Data Analysis:
Use Case: In financial analysis, you might have a dataset containing daily stock prices.
NumPy Application: Boolean indexing helps in selecting specific days or conditions.
Example: You can use Boolean indexing to select the days on which the stock price exceeded a certain threshold, helping you identify
potentially significant market events.
# Sample data: an array of dates and corresponding stock prices
dates = np.array(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])
prices = np.array([100.0, 102.0, 105.5, 103.2, 101.8])

# Create a mask for days when the stock price exceeded 105
significant_event_mask = prices > 105

# Use the mask to filter dates and prices
significant_dates = dates[significant_event_mask]
significant_prices = prices[significant_event_mask]

# Display the dates and corresponding stock prices for significant events
for date, price in zip(significant_dates, significant_prices):
    print(f"Date: {date} | Stock Price: ${price:.2f}")

Date: 2023-01-03 | Stock Price: $105.50
Using Boolean indexing, you've selected the days on which the stock price exceeded the specified threshold,
which helps you spot potentially significant market events. This can feed into investment decisions or
further financial analysis.
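
The mask above marks days on which the price was above the threshold. If the goal is to find the days on which the price actually crossed it, one possible approach (a sketch, not part of the original example) is to compare each day's mask value with the previous day's by slicing the mask:

```python
import numpy as np

dates = np.array(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'])
prices = np.array([100.0, 102.0, 105.5, 103.2, 101.8])
threshold = 105

above = prices > threshold

# above[1:] is "today", above[:-1] is "yesterday"
crossed_up = above[1:] & ~above[:-1]     # above today, not above yesterday
crossed_down = ~above[1:] & above[:-1]   # not above today, above yesterday

# The comparison starts at the second day, so index dates[1:]
up_dates = dates[1:][crossed_up]
down_dates = dates[1:][crossed_down]

print(up_dates)    # ['2023-01-03']
print(down_dates)  # ['2023-01-04']
```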

3. Epidemiology and Health Analysis:
Use Case: When analyzing health data, you can filter patient records based on specific medical conditions.
NumPy Application: Boolean indexing allows you to select only the patients who have been diagnosed with a particular disease for further
analysis.
Example: Imagine you have a dataset of patient health records, and you want to analyze data only for patients with a specific medical
condition, like diabetes. You can create a mask based on the condition and apply it to each array to select the relevant patient records.
# Sample data: arrays of patient IDs, medical conditions (e.g., 'diabetes' or 'none'), and ages
patient_ids = np.array([101, 102, 103, 104, 105])
medical_conditions = np.array(['diabetes', 'none', 'diabetes', 'none', 'diabetes'])
ages = np.array([45, 62, 38, 55, 60])

# Create a mask for patients with diabetes
diabetes_mask = medical_conditions == 'diabetes'

# Use the same mask on every array so the records stay aligned
diabetes_patients = patient_ids[diabetes_mask]
diabetes_conditions = medical_conditions[diabetes_mask]
diabetes_ages = ages[diabetes_mask]

# Display the information of patients with diabetes
for patient_id, condition, age in zip(diabetes_patients, diabetes_conditions, diabetes_ages):
    print(f"Patient ID: {patient_id} | Condition: {condition} | Age: {age} years")

Patient ID: 101 | Condition: diabetes | Age: 45 years
Patient ID: 103 | Condition: diabetes | Age: 38 years
Patient ID: 105 | Condition: diabetes | Age: 60 years

Using Boolean indexing, you've selected the records of patients diagnosed with diabetes,
allowing you to focus on data specific to this medical condition in epidemiology and health analysis.
This can be useful for research, treatment planning, or identifying trends related to diabetes.
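
Once the mask exists, it can feed summary statistics or be combined with a second condition. The sketch below reuses the sample patient data; the age cutoff of 40 is an assumption made up for illustration:

```python
import numpy as np

patient_ids = np.array([101, 102, 103, 104, 105])
medical_conditions = np.array(['diabetes', 'none', 'diabetes', 'none', 'diabetes'])
ages = np.array([45, 62, 38, 55, 60])

diabetes_mask = medical_conditions == 'diabetes'

# Mean age of patients with diabetes: (45 + 38 + 60) / 3
mean_age = ages[diabetes_mask].mean()
print(round(mean_age, 2))                # 47.67

# Combine two conditions: diabetes AND at least 40 years old (assumed cutoff)
older_diabetes_mask = diabetes_mask & (ages >= 40)
older_diabetes_ids = patient_ids[older_diabetes_mask]
print(older_diabetes_ids)                # [101 105]
```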
