Lay-Off Analysis & Prediction System-1

LAY-OFF ANALYSIS & PREDICTION SYSTEM
Arpita Gupta, Ameya Santosh Gidh, Anurag Akrathuveetil, Param Rajesh Joshi
Northeastern University
Course Code: DS 5110
1. SUMMARY
a. Project Background
In recent years, economic fluctuations, technological advancements, and global
events such as the COVID-19 pandemic have significantly affected companies and
caused employment levels to fluctuate. One of the most significant challenges
businesses face in this volatile environment is layoffs, which can have profound
consequences for employees, investors, and the overall economy. The impact of a
layoff extends beyond the immediate workforce reduction: it affects employee
morale, company reputation, investor confidence, and the economic stability of
the broader community.
b. Motivation
Understanding the factors contributing to layoffs and accurately predicting their
occurrence is crucial for businesses to make informed decisions, mitigate risks,
and proactively manage their workforce strategies. The motivation behind this
project stems from the significant impact that layoffs can have on various
stakeholders, including employees, investors, and the broader economy. By
leveraging data science techniques, machine learning algorithms, and interactive
tools, the project seeks to provide actionable insights for decision-makers,
investors, policymakers, and employees alike.
c. Purpose And Goals
This project delves into historical layoff data spanning multiple years, aiming to
uncover patterns, trends, and factors influencing layoffs in companies across
different industries and regions.
The primary goal of this project is to develop a machine-learning model that can
predict layoff percentages based on company characteristics and historical data.
Additionally, we aim to create an interactive user interface that allows users to
input specific parameters (such as company name, stage, industry, country, and
year) and receive accurate predictions of layoff percentages along with
visualizations to aid in decision-making. By achieving these goals, we aim to
provide a valuable tool for stakeholders to make informed decisions regarding
investments, workforce planning, and strategic initiatives in response to potential
layoffs.
d. Description Of The Data
The primary dataset for our project was the layoff dataset (theakhilb, 2024).
It includes key information about companies that conducted layoffs, including
their status, the extent of the layoffs, and the target variable, the
"percentage of layoff" inside the company. It was supplemented with a secondary
dataset (WARN), which offers additional insight into the reason behind the
layoffs at each company (e.g., closure, downsizing). The two datasets were
combined into a single dataset containing the candidate predictors of the
target variable.
e. Methods And Results
Following the preprocessing steps, which involved eliminating null values and
strategically filtering out outliers, our analysis delved deeply into the data. The
findings revealed a robust correlation between COVID-19 and the percentage of
layoffs, with the financial status of companies, particularly their available funds,
emerging as a significant determining factor. Leveraging insights from this
analysis, we employed a stepwise predictor selection process, subjecting the data
to various regression algorithms. Furthermore, forecasting algorithms were
utilized for trend analysis, indicating a slight decrease in layoffs by the end of
2024, with momentum expected to pick up again by the beginning of 2025.
2. METHODS
a. Data Preprocessing (Anello)
i. Removing null and duplicate values
The percentage of missing values in the dataset (layoff) (theakhilb, 2024)
varies across different columns. Key columns like 'Company,' 'City,'
'Industry,' 'Date,' 'Source,' 'Stage,' 'Date_Added,' 'Country,' and
'List_of_Employees_Laid_Off' have no missing values. However,
'Funds_Raised' has a relatively high percentage of missing values at
10.6%, followed by 'Reason' at 6.4%. We cleaned the dataset by
eliminating empty or missing entries (null values) and redundant rows
(duplicate values). Addressing these missing values was crucial for
ensuring the completeness and accuracy of the dataset for analysis and
modeling purposes.
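The cleaning step above can be sketched with pandas. This is a minimal
illustration, not the project's code: the toy rows and the helper names
missing_percentages and drop_nulls_and_duplicates are assumptions made for
the example.

```python
import numpy as np
import pandas as pd


def missing_percentages(df: pd.DataFrame) -> pd.Series:
    """Percentage of missing values per column, rounded to one decimal."""
    return (df.isna().mean() * 100).round(1)


def drop_nulls_and_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows containing any null value, then exact duplicate rows."""
    return df.dropna().drop_duplicates().reset_index(drop=True)


# Hypothetical toy rows (not taken from the real dataset).
raw = pd.DataFrame({
    "Company": ["A", "B", "B", "C"],
    "Funds_Raised": [10.0, 25.0, 25.0, np.nan],
    "Percentage": [0.10, 0.25, 0.25, 0.40],
})

print(missing_percentages(raw))
cleaned = drop_nulls_and_duplicates(raw)
print(cleaned)
```

Dropping rows is the simplest option; it discards every row that has a
missing value in any column.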
ii. Renaming parameters
As part of the preprocessing steps, the column 'Location_HQ' was
renamed to 'City' for clarity and consistency in the dataset. Additionally,
the column 'Closure/Layoff' was renamed to 'Reason' in the mixed dataset
to better reflect its content. Subsequently, the 'Reason' values were
categorized into distinct categories based on their descriptions, including
closure_temporary, covid_reduction, layoff_temporary, layoff_permanent,
changed_business, closure_permanent, and downsizing. These
preprocessing steps ensure that the dataset is organized, labeled
appropriately, and ready for further analysis and modeling tasks.
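A rough sketch of the renaming and categorization follows. The category names
come from the list above, but the keyword rules, the example raw strings, and
the fallback label "other" are assumptions for illustration; the report does
not state how descriptions were mapped to categories.

```python
import pandas as pd


def categorize_reason(description: str) -> str:
    """Map a free-text reason to one of the report's categories.

    The keyword rules are illustrative assumptions, not the project's rules.
    """
    text = str(description).lower()
    if "covid" in text:
        return "covid_reduction"
    if "closure" in text or "closing" in text:
        return "closure_temporary" if "temp" in text else "closure_permanent"
    if "layoff" in text:
        return "layoff_temporary" if "temp" in text else "layoff_permanent"
    if "downsiz" in text:
        return "downsizing"
    if "change" in text:
        return "changed_business"
    return "other"  # hypothetical fallback, not a category from the report


# Hypothetical rows using the original column names.
df = pd.DataFrame({
    "Location_HQ": ["Boston", "Austin"],
    "Closure/Layoff": ["Closure Permanent", "Layoff Temporary"],
})
df = df.rename(columns={"Location_HQ": "City", "Closure/Layoff": "Reason"})
df["Reason"] = df["Reason"].map(categorize_reason)
print(df)
```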
iii. Merging the datasets
A mixed dataset was created by left-joining dataset 2 (WARN) onto dataset
1 (layoff) on the 'Company' and 'City' columns. A left join keeps every
row and column of dataset 1 (layoff), whether or not a matching WARN
record exists. From dataset 2 (WARN), only the 'Closure/Layoff' column
was selected, and it was renamed to "Reason" in the mixed dataset.
Merging on common identifiers and selecting only the relevant columns
kept the mixed dataset consistent for subsequent analysis and modeling
tasks.
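The merge can be sketched with pandas as shown here. The toy frames are
hypothetical, and de-duplicating the WARN rows per ('Company', 'City') pair
is an assumption added so that the join cannot multiply rows of dataset 1;
the report does not say how repeated WARN notices were handled.

```python
import pandas as pd

# Hypothetical toy versions of the two datasets.
layoff = pd.DataFrame({
    "Company": ["A", "B"],
    "City": ["Boston", "Austin"],
    "Percentage": [0.10, 0.30],
})
warn = pd.DataFrame({
    "Company": ["A", "Z"],
    "City": ["Boston", "Miami"],
    "Closure/Layoff": ["Layoff Permanent", "Closure Permanent"],
    "Notice_Date": ["2023-01-05", "2023-02-01"],
})


def merge_layoff_with_warn(layoff: pd.DataFrame, warn: pd.DataFrame) -> pd.DataFrame:
    """Left-join the WARN reason onto the layoff rows by Company and City."""
    reasons = warn[["Company", "City", "Closure/Layoff"]].drop_duplicates(
        subset=["Company", "City"]  # assumption: keep one reason per key
    )
    mixed = layoff.merge(reasons, on=["Company", "City"], how="left")
    return mixed.rename(columns={"Closure/Layoff": "Reason"})


mixed_df = merge_layoff_with_warn(layoff, warn)
print(mixed_df)
```

Rows of dataset 1 without a WARN match keep a missing 'Reason', which is
consistent with the missing values reported for that column.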
iv. Dropping irrelevant columns
In the mixed dataset, several columns were identified as irrelevant for the
layoff prediction task and subsequently dropped. These columns included
'List_of_Employees_Laid_Off,' 'Source,' and 'Date_Added.' The
'List_of_Employees_Laid_Off' column, while informative, was not
directly related to predicting layoff percentages and was therefore deemed
unnecessary for the analysis. Similarly, the 'Source' and 'Date_Added'
columns, while providing context and timing information, were not
integral to the prediction model's inputs. Removing these irrelevant
columns streamlines the dataset, focusing only on the essential features
needed for accurate layoff percentage predictions.
v. Detecting and removing outliers
Outliers were detected and removed from the mixed dataset using specific
criteria. Any entries with 'Percentage' values greater than 0.6,
'Funds_Raised' values exceeding 1108.2, or 'Laid_Off_Count' values
surpassing 416.2 were considered outliers and subsequently eliminated.
The resulting cleaned dataset, named 'clean_df,' was obtained by filtering
the mixed dataset based on these outlier thresholds. This process ensures
that the dataset is free from extreme values that could skew the analysis
and modeling, thereby improving the accuracy and reliability of the
predictive models for layoff percentages.
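The filter can be sketched as below, using the three thresholds quoted above.
The toy rows are hypothetical. The helper iqr_upper_fence shows one common
way such cut-offs are derived (third quartile plus 1.5 times the
interquartile range); whether the project derived its thresholds this way is
an assumption, since the report does not say.

```python
import pandas as pd

# Upper thresholds quoted in the report.
THRESHOLDS = {"Percentage": 0.6, "Funds_Raised": 1108.2, "Laid_Off_Count": 416.2}


def remove_outliers(df: pd.DataFrame, thresholds=THRESHOLDS) -> pd.DataFrame:
    """Keep only rows whose values do not exceed any of the upper thresholds."""
    mask = pd.Series(True, index=df.index)
    for column, upper in thresholds.items():
        mask &= df[column] <= upper
    return df[mask].reset_index(drop=True)


def iqr_upper_fence(values: pd.Series, k: float = 1.5) -> float:
    """Q3 + k * IQR, a common outlier cut-off (assumed, not confirmed)."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    return float(q3 + k * (q3 - q1))


# Hypothetical rows: only the first one is within all three thresholds.
mixed_df = pd.DataFrame({
    "Percentage": [0.1, 0.7, 0.3, 0.2],
    "Funds_Raised": [100.0, 200.0, 5000.0, 300.0],
    "Laid_Off_Count": [50, 60, 70, 900],
})
clean_df = remove_outliers(mixed_df)
print(clean_df)
```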
b. Modeling
i. Predictor selection
The target variable, "Percentage of Layoffs," is continuous, so
regression algorithms were required. To select predictors, the stepwise
predictor selection method was employed. At each step, this method adds
the predictor that most improves performance, to uncover the
best-performing combination of predictor variables. We ran multiple
regression algorithms: linear regression, gradient boost regression,
decision tree regression, and support vector regression. Each algorithm
went through the stepwise method, and the RMSE curve over the selected
predictors was plotted. This aided in selecting the optimal algorithm
among them.
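A minimal sketch of forward stepwise selection is given below, assuming
scikit-learn and a held-out validation split for scoring. The synthetic data
and the feature names are hypothetical stand-ins for numerically encoded
predictors; the project's actual encoding and scoring setup are not described
in the report.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def forward_stepwise(model, X_train, y_train, X_val, y_val, names):
    """Greedily add the predictor that gives the lowest validation RMSE.

    Returns a list of (selected_names, rmse) pairs, one per step, which is
    the RMSE curve over the growing predictor set.
    """
    remaining = list(range(X_train.shape[1]))
    chosen, curve = [], []
    while remaining:
        scores = []
        for j in remaining:
            cols = chosen + [j]
            model.fit(X_train[:, cols], y_train)
            pred = model.predict(X_val[:, cols])
            rmse = float(np.sqrt(mean_squared_error(y_val, pred)))
            scores.append((rmse, j))
        best_rmse, best_j = min(scores)
        chosen.append(best_j)
        remaining.remove(best_j)
        curve.append(([names[k] for k in chosen], best_rmse))
    return curve


# Synthetic data: the target depends strongly on column 0, weakly on column 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=200)
names = ["funds", "year", "stage", "city"]  # hypothetical labels

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
curve = forward_stepwise(LinearRegression(), X_tr, y_tr, X_val, y_val, names)
for selected, rmse in curve:
    print(f"{rmse:.4f}  {selected}")
```

Passing a different estimator (for example a gradient boosting or decision
tree regressor) in place of LinearRegression gives one RMSE curve per
algorithm, which is the comparison described above.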
ii. Time Forecast
A time forecasting method was utilized to predict future layoff counts. The
initial step involved extracting the year, month, and day from the "Date"
variable. Subsequently, this date information, along with the layoff count,
was fed into the "Prophet" forecasting algorithm as 'ds' and 'y'
respectively. The algorithm then generated a comprehensive line chart
depicting trends, along with an approximate prediction extending up to
early 2025.
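The Prophet input can be prepared roughly as shown here. Prophet expects a
frame with a 'ds' date column and a 'y' value column. Summing the layoff
counts per date, the column name 'Laid_Off_Count' as the count source, and
the forecast horizon are assumptions for this sketch; the prophet package is
third-party and is imported only inside forecast_layoffs.

```python
import pandas as pd


def to_prophet_frame(df, date_col="Date", count_col="Laid_Off_Count"):
    """Aggregate layoff counts per date and rename columns to 'ds' and 'y'."""
    out = df[[date_col, count_col]].copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.groupby(date_col, as_index=False)[count_col].sum()
    out = out.rename(columns={date_col: "ds", count_col: "y"})
    return out.sort_values("ds").reset_index(drop=True)


def forecast_layoffs(frame, periods=12, freq="MS"):
    """Fit Prophet and predict `periods` future steps (assumed horizon)."""
    from prophet import Prophet  # third-party; install separately

    model = Prophet()
    model.fit(frame)
    future = model.make_future_dataframe(periods=periods, freq=freq)
    return model.predict(future)[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Hypothetical rows: two events on the same day are summed.
events = pd.DataFrame({
    "Date": ["2023-02-01", "2023-01-05", "2023-01-05"],
    "Laid_Off_Count": [5, 10, 20],
})
frame = to_prophet_frame(events)
print(frame)
```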
3. RESULTS
a. Data Analysis
Multiple charts and visualizations were made, out of which prominent insights
were as follows:
Fig 1: Top industries by laid-off count and Percent of layoff
The bar plot above illustrates the industries most affected by layoffs, with Retail,
Consumer, and Transport topping the list. This highlights the significant impact of
COVID-19 on these sectors and underscores how quarantine measures disrupted
their supply chain management. In terms of the percentage of layoffs relative to
total employment, the finance sector stands out, indicating a higher proportion of
job losses. This observation may be linked to the 2023 US bank crisis, during
which banks experienced substantial declines in value, leading to closures and
widespread employee layoffs.
Fig 2: Top reasons for the layoff
The most common reason for layoffs appears to be permanent layoff. This could
be a direct effect of over-hiring during the COVID-19 pandemic. The second most
common reason, "closure permanent," could reflect the closure of startups and
logistics companies due to COVID-19 quarantine measures and a lack of
available capital.
Fig 3: Distribution of layoff percentage grouped by Funding Stage
A significant observation from the analysis is a clear relationship between
layoff percentage and a company's funding stage. Startups and seed-funded firms
tend to lay off 20% to 80-90% of their workforce. As company size increases,
the percentage of the workforce laid off decreases.
Additional observations are mentioned at the end. (see Appendix)
b. Model Prediction
Fig 4: Distribution of layoff percentage grouped by Funding Stage
The primary predictor selection method utilized was stepwise selection. After
assessing various algorithms, linear regression (MALI, 2021) and gradient
boost regression (Saini, 2021) emerged as the best-performing solutions. For
linear regression (MALI, 2021), the primary predictors are funds, year,
industry, and city, while for gradient boost regression they are stage,
industry, month, and year.
Industry and year were selected by both models, while funds and stage were each
selected by one. This supports our analysis, affirming the significant impact
of company stage and industry on our target variable "Percentage of layoff".
Metrics: 80% of the data was used for training the model and the remaining 20%
was used for testing it.
Algorithm                    RMSE     MSE      MAE
Gradient Boost Regression    0.1263   0.0159   0.092
Linear Regression            0.1352   0.0172   0.095

Table 1: Modeling Algorithms Efficiency metrics
With a slightly lower RMSE, MSE, and MAE, gradient boost regression
(Saini, 2021) performed best overall.
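The three metrics can be computed as in the following sketch, assuming
scikit-learn and the 80/20 split described above. The synthetic data is a
hypothetical stand-in, so the printed numbers will not match Table 1.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


def evaluate(model, X, y, seed=42):
    """Train on 80% of the data and report RMSE, MSE, and MAE on the rest."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    mse = float(mean_squared_error(y_te, pred))
    mae = float(mean_absolute_error(y_te, pred))
    return {"RMSE": float(np.sqrt(mse)), "MSE": mse, "MAE": mae}


# Synthetic stand-in data (hypothetical, not the layoff dataset).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 0.2 + 0.05 * X[:, 0] - 0.03 * X[:, 1] + rng.normal(scale=0.02, size=200)

results = {
    "Gradient Boost Regression": evaluate(GradientBoostingRegressor(random_state=0), X, y),
    "Linear Regression": evaluate(LinearRegression(), X, y),
}
for name, metrics in results.items():
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

RMSE is the square root of MSE, so the two always rank models identically;
MAE is less sensitive to a few large errors.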
c. Forecast
Fig 5: Forecasting of layoff count until early 2025

The project utilized time forecasting with 'Prophet' analysis, uncovering a trend of
over-hiring between 2021 and 2022. Interestingly, the data contradicts the
common misconception of a worsening job market, indicating that it has
historically been competitive, even pre-COVID. Looking ahead, late 2024 is
projected to witness a decrease in layoffs, suggesting it could be an opportune
time for job seekers, although layoffs may rise again in early 2025.
d. User Interface
Fig 6: Interactive User Interface
A user interface was also built that allows users to input parameters relevant
to the layoff prediction, such as company details, industry type, location, or
funding stage. When the user submits the input parameters, the Flask backend
processes the input, generates predictions using the trained models, and
prepares the relevant graphs and visualizations. The predicted layoff
percentages, along with the corresponding visualizations, are then displayed to
the user in the web interface. Users can adjust the input parameters and view
updated predictions and visualizations in real time.
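A minimal sketch of such a backend is shown below. The route name /predict,
the JSON field names, and the predict_fn callback are assumptions for
illustration; the project's actual Flask routes and templates are not
described in this report. Flask is imported only inside create_app.

```python
REQUIRED_FIELDS = ("company", "stage", "industry", "country", "year")


def parse_inputs(payload: dict) -> dict:
    """Validate the submitted parameters and convert the year to an integer."""
    missing = [f for f in REQUIRED_FIELDS if payload.get(f) in ("", None)]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    parsed = {f: str(payload[f]).strip() for f in REQUIRED_FIELDS if f != "year"}
    parsed["year"] = int(payload["year"])  # raises ValueError if not numeric
    return parsed


def create_app(predict_fn):
    """Build a Flask app; predict_fn maps the parsed inputs to a percentage."""
    from flask import Flask, jsonify, request  # third-party

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])  # hypothetical route name
    def predict():
        try:
            features = parse_inputs(request.get_json(force=True) or {})
        except ValueError as exc:
            return jsonify({"error": str(exc)}), 400
        return jsonify({"layoff_percentage": float(predict_fn(features))})

    return app


example = parse_inputs({
    "company": " Acme ", "stage": "Series B", "industry": "Retail",
    "country": "United States", "year": "2024",
})
print(example)
```

Keeping validation in a plain function makes it testable without starting a
server; a trained model's prediction function would be passed to create_app.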
4. DISCUSSIONS
The results of this project hold significant meaning and impact for various stakeholders in
the business and investment communities along with employees and workforce. By
accurately predicting layoff percentages in companies based on historical data and
relevant factors, this project provides valuable insights for decision-makers, investors,
policymakers, and employees. The ability to forecast layoff trends enables proactive
planning and strategic decision-making, leading to better allocation of resources, risk
mitigation, and workforce management. Investors can make informed decisions regarding
investment portfolios based on companies' layoff risks and resilience. Policymakers can
use these insights to develop targeted interventions and support mechanisms for affected
industries. Job seekers can use these insights to decide where they should work and
commit themselves in terms of security and compatibility.
In the future, the project could be enhanced by incorporating real-time data feeds and
more company details, such as stock value and social value, by refining the machine
learning models for improved accuracy, and by expanding the scope to include the impact
of broader economic indicators on layoffs. Collaborating with domain experts and
stakeholders to validate and refine the predictive models would further strengthen the
project's relevance and impact in addressing workforce challenges proactively.
5. STATEMENT OF CONTRIBUTIONS
a. Arpita Gupta:
Arpita Gupta contributed to the data collection process, including gathering and
organizing layoff data from various sources such as layoffs.fyi and
layoffdata.com/data. She also played a key role in the data preprocessing phase,
which involved cleaning, merging, and transforming the raw data into a format
suitable for analysis. Additionally, Arpita was involved in the exploratory data
analysis (EDA) stage, where she utilized advanced statistical techniques and data
visualization methods to identify insights and derive actionable recommendations
from the layoff data.
b. Anurag Akrathuveetil:
Anurag Akrathuveetil was responsible for developing and implementing
predictive modeling algorithms, including linear regression, gradient boost
regression, and Prophet forecasting. He conducted feature engineering, model
training, and evaluation to generate accurate layoff predictions and assess the
performance of the predictive models. Additionally, Anurag also contributed to
the exploratory data analysis (EDA) stage, where he utilized statistical techniques
and visualization tools to uncover insights and patterns in the layoff data. He also
conducted in-depth analyses of layoff trends, patterns, and correlations.
c. Ameya Santosh Gidh:
Ameya Santosh Gidh contributed to the development of the user interface,
enabling users to interact with the predictive models and visualize the results.
Along with that, Ameya also implemented the modeling algorithms and evaluated
their performance on the layoff dataset. He also compared the results obtained
from different models to get the best model for making the layoff predictions
accurately.
d. Param Rajesh Joshi:
Param Rajesh Joshi led the data preprocessing efforts, which involved cleaning,
merging, and standardizing the raw layoff data from multiple sources. Param also
contributed to the user interface development using the Flask framework,
ensuring that it provides a seamless and intuitive experience for users to input
parameters, receive predictions, and view visualizations.
Every team member contributed to the documentation of the project, including writing
the abstract, introduction, methods, discussion, and conclusion sections, as well as
formatting and organizing the project report.
Each team member's contributions were integral to the completion of the Lay-Off
Analysis & Prediction System project, demonstrating a collaborative effort to gather,
analyze, and interpret layoff data, develop predictive models, and create an interactive
user interface for stakeholders.
6. REFERENCES
a. Dataset 1: layoff.fyi. "Dataset1." Layoff Data, 2024, layoff.fyi.
b. Dataset 1: theakhilb. (2024). Layoffs Dataset 2024. Kaggle. Retrieved April 20,
2024, from https://www.kaggle.com/datasets/theakhilb/layoffs-data-2022
c. Dataset 2: WARN, layoffdata. "WARN dataset." WARN Database | layoff notices
across the U.S., 2024, http://layoffdata.com. Accessed 20 April 2024.
d. Anello, Eugenia. "DATA-CLEANING-PREPROCESSING." KDNUGGETS.COM, 2023,
https://www.kdnuggets.com/2023/08/7-STEPS-MASTERING-DATA-CLEANING-PREPROCESSING-TECHNIQUES.HTML.
Accessed 20 April 2024.
e. MALI, K. (2021). LINEAR-REGRESSION. KAVITA MALI.
HTTPS://WWW.ANALYTICSVIDHYA.COM/BLOG/2021/10/EVERYTHING-YOU-NEED-TO-KNOW-ABOUT-LINEAR-REGRESSION/
f. Saini, A. (2021, 09 09). GRADIENT-BOOSTING-ALGORITHM.
ANALYTICSVIDHYA.COM. Retrieved April 21, 2024, from
https://www.analyticsvidhya.com/BLOG/2021/09/GRADIENT-BOOSTING-ALGORITHM-A-COMPLETE-GUIDE-FOR-BEGINNERS/
g. GitHub Code Link:
https://github.khoury.northeastern.edu/anuragav/LayoffDataAnalysis
7. APPENDIX
Some other observations that have been found after analyzing the data are:
a. Layoffs peaked in 2022 and continued rising in 2023, shown graphically
b. Wednesdays might see significant layoff events
c. The USA had over 200,000 layoffs, while India had under 50,000
d. Forecast predicts fewer layoffs in 20 days due to halts and new hires
e. Post-IPO, many US companies, like Google and Microsoft, faced layoffs, notably
after COVID-19
