Day 18 - NumPy
On Day 18, we will focus on Array Masking and Filtering in NumPy. Masking involves using Boolean arrays to filter and manipulate data in
arrays based on certain conditions. Let's explore how to apply masks to arrays and use Boolean arrays for advanced data manipulation:
Modified Array: [ 1 -1 3 -1 5]
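The code that produced the output line above did not survive in this copy. A minimal sketch that yields the same values is shown here. It assumes the original started from the array [1, 2, 3, 4, 5] and replaced its even elements with -1; both the starting array and the condition are assumptions, not taken from the source.

```python
import numpy as np

# Assumed starting array; the original input is not shown in the source
arr = np.array([1, 2, 3, 4, 5])

# Boolean mask: True where the element is even (assumed condition)
mask = arr % 2 == 0

# Assigning through the mask overwrites only the selected elements, in place
arr[mask] = -1

print("Modified Array:", arr)
```

Assigning through a mask modifies the array in place, whereas reading with `arr[mask]` returns a copy of the selected elements.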
In these examples:
Masking and filtering are powerful techniques for data manipulation and analysis. Boolean arrays provide a
flexible way to perform selective operations on array elements based on conditions.
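As a rough illustration of such selective operations, the sketch below combines two conditions with `&`, inverts a mask with `~`, and uses `np.where` for conditional replacement. The sample values and thresholds are hypothetical.

```python
import numpy as np

values = np.array([3, 8, 12, 5, 20, 7])  # hypothetical sample values

# Each comparison yields a Boolean array. The parentheses are required
# because & binds more tightly than the comparison operators.
in_range = (values >= 5) & (values <= 12)

# ~ inverts a Boolean mask element by element
out_of_range = ~in_range

selected = values[in_range]       # elements between 5 and 12, inclusive
rejected = values[out_of_range]   # everything else

# np.where picks from two alternatives per element: here values above 10
# are replaced with 10 and the rest are kept unchanged
capped = np.where(values > 10, 10, values)
```

Python's `and`, `or`, and `not` do not work element-wise on arrays, which is why the bitwise operators `&`, `|`, and `~` are used for masks.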
🌐 Real-World Scenarios:
Use Case: In data preprocessing for machine learning, you often need to clean and filter out irrelevant or noisy data.
NumPy Application: Boolean indexing helps in this process. For instance, in a dataset containing customer reviews, you can use
Boolean indexing to filter out reviews with low ratings or specific keywords that are irrelevant to your analysis.
Example: Suppose you have a dataset of product reviews, and you want to filter out reviews with a rating lower than 3 stars. You can
create a mask with a condition for ratings and use it to filter the relevant data.
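The review example could be sketched as follows. The ratings, review texts, and the keyword "shipping" are hypothetical and only serve to show the mechanics.

```python
import numpy as np

# Hypothetical review data: one rating (1 to 5 stars) per review text
ratings = np.array([5, 2, 4, 1, 3])
reviews = np.array([
    "Great product",
    "Stopped working after a week",
    "Good value",
    "Terrible, do not buy",
    "Average, shipping was slow",
])

# Mask is True for reviews rated 3 stars or higher
keep_mask = ratings >= 3
kept_ratings = ratings[keep_mask]
kept_reviews = reviews[keep_mask]

# Keyword mask built with a plain Python comprehension (assumed keyword)
mentions_shipping = np.array(["shipping" in text.lower() for text in reviews])

# Keep reviews that are rated 3 or higher AND do not mention the keyword
final_reviews = reviews[keep_mask & ~mentions_shipping]

for rating, text in zip(kept_ratings, kept_reviews):
    print(f"{rating} stars: {text}")
```

Because the same mask indexes both arrays, the ratings and review texts stay aligned after filtering.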
Use Case: In financial analysis, you might have a dataset containing daily stock prices.
NumPy Application: Boolean indexing selects the days that satisfy a given condition.
Example: You can use Boolean indexing to select the days on which the stock price exceeded a certain threshold, helping you identify
significant market events.
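import numpy as np
dates = np.array(['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04'])  # hypothetical sample dates
prices = np.array([100.0, 106.2, 104.8, 107.1])  # hypothetical sample prices
# Create a mask for days when the stock price exceeded 105
significant_event_mask = prices > 105
significant_dates = dates[significant_event_mask]
significant_prices = prices[significant_event_mask]
# Display the dates and corresponding stock prices for significant events
for date, price in zip(significant_dates, significant_prices):
    print(f"Date: {date} | Stock Price: ${price:.2f}")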
Using Boolean indexing, you've efficiently selected the days when the stock price exceeded the specified
threshold, helping you identify significant market events. This can be valuable for making investment decisions or
for further financial analysis.
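import numpy as np
# Sample data: arrays of patient IDs, medical conditions (e.g., 'diabetes' or 'none'), and ages
patient_ids = np.array([101, 102, 103, 104, 105])
medical_conditions = np.array(['diabetes', 'none', 'diabetes', 'none', 'diabetes'])
ages = np.array([45, 62, 38, 55, 60])
# Create a mask for patients diagnosed with diabetes and apply it to the other arrays
diabetes_mask = medical_conditions == 'diabetes'
print(patient_ids[diabetes_mask], ages[diabetes_mask])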
Using Boolean indexing, you've efficiently selected the records of patients diagnosed with diabetes,
allowing you to focus on data specific to this medical condition in epidemiology and health analysis.
This can be crucial for research, treatment planning, or identifying trends related to diabetes.
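One possible next step, sketched below, is to combine the condition mask with a second criterion and compute a summary statistic on the subset. The sample arrays repeat the ones from the example; the age cutoff of 45 is a hypothetical choice for illustration.

```python
import numpy as np

# Same sample data as in the patient example
patient_ids = np.array([101, 102, 103, 104, 105])
medical_conditions = np.array(['diabetes', 'none', 'diabetes', 'none', 'diabetes'])
ages = np.array([45, 62, 38, 55, 60])

# Mask is True for patients diagnosed with diabetes
has_diabetes = medical_conditions == 'diabetes'

# Combine with an age condition (the cutoff of 45 is an assumed value)
older_diabetic = has_diabetes & (ages >= 45)
older_diabetic_ids = patient_ids[older_diabetic]

# Summary statistic computed only over the diabetes subset
mean_age_diabetes = ages[has_diabetes].mean()

print("Diabetes patients aged 45 or older:", older_diabetic_ids)
print(f"Mean age of diabetes patients: {mean_age_diabetes:.1f}")
```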