
Industrial Training Report

on
Web Scraping & Data Analysis
Completed at
British Airways
Duration
8 May 2023 to 8 July 2023
Submitted By
Devansh Vishwa
V Semester
Enrolment No.: ECB 2021/10/05

DEPARTMENT OF ARTIFICIAL INTELLIGENCE & DATA SCIENCE

ENGINEERING COLLEGE BIKANER


(BIKANER TECHNICAL UNIVERSITY, BIKANER)
BIKANER, RAJASTHAN
CERTIFICATE OF COMPLETION
STUDENT DECLARATION

I hereby declare that the Industrial Training Report on Web scraping and data analysis completed
at British Airways is an authentic record of my own work as requirement of Industrial Training as a
part of the V semester syllabus during the period from May 2023 to July 2023 submitted at the
Department of Artificial Intelligence and Data Science, Engineering College Bikaner for the
award of the degree of B.Tech. in Artificial Intelligence and Data Science by Bikaner Technical
University, Bikaner.

Devansh Vishwa
(21EEBAD005)
Date:
TABLE OF CONTENTS

Front Page

Certificate of completion

Student Declaration

1. Introduction 1-2

2. Web Scraping 3-6


2.1 The Skytrax Website 3

2.2 Beautiful Soup as an HTML Parser 4

2.2.1 Step-by-step Procedure for Extracting Data from Skytrax 4

2.2.2 Challenges Faced during the Data Extraction Process 5

2.3 Market Research and Competitive Analysis 6

3. Data Cleaning 7-9

3.1 Imputation Techniques 8

3.2 Handling Outliers and Anomalies in the Data 9

4. Data Analysis 10-13

4.1 Exploratory Data Analysis (EDA) 10

4.2 Data Visualization with Matplotlib and PyGWalker 11

4.2.1 Introduction to Matplotlib 11

4.2.2 Introduction to PyGWalker 12

5. Results and Findings 14-15

5.1 Summary of Extracted Data from Skytrax 14

5.2 Impact and Implications of the Analysis Results 15

6. Conclusion 16-17
6.1 Recap of Objectives and Project Overview 16

6.2 Importance and Relevance of Web Scraping and Data Analysis 17

References

Appendices
INTRODUCTION

In today's data-driven world, the extraction and analysis of information from various sources have
become paramount for businesses, researchers, and enthusiasts alike. Web scraping, a technique for
automating the extraction of data from websites, has emerged as a powerful tool for accessing and
leveraging valuable information available on the internet.

One such website ripe for exploration is Skytrax, a comprehensive database containing extensive
information about airlines, airports, and aviation-related data. With its vast repository of data on
flights, routes, aircraft, and passenger statistics, Skytrax presents a wealth of opportunities for
extracting insights and uncovering trends within the aviation industry.

To harness the potential of Skytrax data, we embark on a journey of web scraping and data
visualization. Through the use of Python programming language and libraries such as Beautiful Soup
and Selenium, we set out to extract relevant data from the Skytrax website efficiently and effectively.

The process begins with identifying the structure of the Skytrax website and the location of desired
data elements within its HTML code. Using Beautiful Soup, a Python library for parsing HTML and
XML documents, we navigate through the website's pages, locate specific data elements, and extract
relevant information such as airline names, flight routes, departure and arrival times, aircraft types,
and passenger counts.

In addition to Beautiful Soup, we leverage Selenium, another powerful Python library for automating
web browsers, to handle dynamic content and interactions within the Skytrax website. With
Selenium, we can simulate user actions such as clicking buttons, filling out forms, and scrolling
through pages, enabling us to access data that may be loaded dynamically via JavaScript or AJAX
requests.

Once we have collected a substantial amount of data from Skytrax, the next step is to transform and
analyse it to derive meaningful insights. We employ various data wrangling techniques, such as
cleaning, filtering, and aggregating the extracted data to prepare it for visualization.

With our pre-processed data in hand, we turn our attention to data visualization, a powerful tool for
communicating complex information in a clear and intuitive manner. We utilize libraries such as
Matplotlib, Seaborn, and plotly in Python to create a wide range of visualizations, including line
plots, bar charts, pie charts, scatter plots, and heatmaps.

Through the sophisticated utilization of data visualization techniques, we embark on a
comprehensive exploration of the rich and extensive Skytrax dataset, a treasure trove of information
that encapsulates the multifaceted nuances of the aviation industry. Our endeavor transcends mere
surface-level analysis; it is an immersive journey into the intricate interplay of factors that shape the
realm of air travel.

At the forefront of our analysis lies the meticulous examination of the distribution of flights across an
expansive array of airlines and airports. We meticulously dissect the intricate tapestry of flight
patterns, scrutinizing the frequency and popularity of routes traversed by airlines across the globe.
By unraveling this complex web of data, we gain invaluable insights into the preferences of both
passengers and airlines alike, discerning trends that serve as the bedrock of the aviation landscape.

Furthermore, our exploration extends beyond the realm of flight paths to encompass the performance
of different aircraft types. We delve deep into the realms of operational metrics, scrutinizing factors
such as on-time performance, cabin comfort, and overall service quality. Through this meticulous
analysis, we provide stakeholders with a panoramic view of the aviation experience, shedding light
on the myriad factors that influence passenger satisfaction and operational efficiency.

Moreover, our journey of exploration takes us into the realm of passenger demographics and travel
preferences. By immersing ourselves in the rich tapestry of traveler data, we unravel the diverse
mosaic of passenger demographics, booking behaviours, and travel preferences. This granular
understanding enables airlines and airports to tailor their services and offerings to cater to the unique
needs and preferences of their clientele, thereby enhancing the overall travel experience.

In addition to traditional static visualizations, we harness the power of interactive visualization
techniques to provide users with an immersive and dynamic exploration experience. Through
interactive plots, dashboards, and data-driven narratives, users are empowered to navigate the data
landscape in real-time, zooming in on specific data points, filtering information based on predefined
criteria, and gaining deeper insights into the intricacies of the aviation industry.

In conclusion, the synergistic fusion of web scraping Skytrax data and visualizing the extracted
insights represents a transformative paradigm shift in the aviation industry. By leveraging the power
of Python and advanced data visualization techniques, we unlock the full potential of Skytrax data,
empowering stakeholders to make informed decisions, identify areas for optimization and
innovation, and gain a nuanced understanding of the complex dynamics that govern air travel in the
modern era.

WEB SCRAPING

Web scraping is the process of extracting data from websites in an automated manner. It involves
parsing the HTML structure of web pages and extracting relevant information. Web scraping has
gained popularity due to its ability to gather large amounts of data quickly and efficiently.

Web scraping involves automating the extraction of data from websites, typically by parsing the
HTML structure. The purpose of web scraping is to gather data for various applications, such as
market research, price comparison, sentiment analysis, and trend monitoring.

While web scraping offers significant advantages, it is important to consider the legal and ethical
implications. Organizations must adhere to the website's terms of service, respect the website's
robots.txt file, and ensure that the data scraped is used in a responsible and ethical manner.

2.1 The Skytrax Website

Skytrax stands tall as a preeminent website within the aviation industry, serving as a trusted
repository of airline and airport reviews, rankings, and ratings. With its vast array of data, Skytrax
plays a pivotal role in shaping industry standards, providing invaluable insights into airline and
airport performance, passenger experiences, and industry trends.

As a beacon of credibility and trust, Skytrax holds immense influence within the aviation sector,
serving as a guiding light for travelers, airlines, and industry professionals alike. Its comprehensive
data offerings provide a wealth of information that informs decision-making processes, drives
improvements, and sets industry benchmarks.

At the heart of Skytrax lies a treasure trove of data encompassing a myriad of aspects crucial to
understanding the aviation landscape. From airline services and cabin comfort to staff behavior and
cleanliness standards, Skytrax leaves no stone unturned in its quest to provide comprehensive
insights into airline and airport operations.

Skytrax's data repository encompasses a diverse range of information, including airline and airport
reviews, ratings, and rankings. Travelers from around the globe share their experiences and
perspectives, offering valuable feedback on various aspects of their journey, from check-in
procedures to in-flight amenities.

Skytrax employs a multi-faceted approach to data collection, soliciting feedback from passengers
who rate and review different airlines and airports based on their first-hand experiences. This
crowdsourced data collection methodology ensures a diverse range of perspectives and contributes to
the richness and depth of the data available on the platform.

By delving into the data available on Skytrax, stakeholders gain valuable insights into airline and
airport performance metrics, including service quality, punctuality, and overall customer satisfaction
levels. This granular level of detail enables stakeholders to identify areas for improvement, address
pain points, and enhance the overall passenger experience.

Skytrax's reputation as a trusted authority in the aviation industry stems from its unwavering
commitment to transparency, accuracy, and integrity in data reporting. Travelers, airlines, and
industry professionals alike rely on Skytrax's insights and recommendations to make informed
decisions and navigate the complexities of air travel with confidence.

2.2 Beautiful Soup as an HTML Parser

Beautiful Soup is a popular Python library used for web scraping. It provides a convenient way to
parse HTML and XML documents, allowing easy extraction of relevant data. Beautiful Soup offers
various methods and functionalities that simplify the process of navigating and manipulating the
parsed data.

Beautiful Soup offers a simple and intuitive interface to parse HTML documents. It handles poorly
formatted HTML gracefully and provides powerful features to search and extract specific elements
from the parsed data. Its flexibility and ease of use make it a preferred choice for web scraping tasks.

Using Beautiful Soup for web scraping offers several advantages. It provides a high-level API that
abstracts the complexities of parsing HTML, making it accessible even to beginners. It supports
different parsers, allowing compatibility with various HTML documents. Beautiful Soup also offers
powerful searching and filtering capabilities, making it efficient for extracting specific data from web
pages.

The data extraction process involved several steps to extract relevant information from the Skytrax
website. These steps included fetching the HTML content of the web pages, parsing the HTML using
Beautiful Soup, identifying the desired elements and their corresponding attributes, and extracting
the required data.

2.2.1 Step-by-Step Procedure for Extracting Data from Skytrax

1. Retrieve the HTML content of the target web pages using Python's requests library.

2. Parse the HTML content using Beautiful Soup, creating a parse tree representation of the
document.

3. Identify the HTML elements and attributes containing the desired data, such as airline names,
ratings, reviews, and airport details.

4. Utilize Beautiful Soup's methods and functionalities to navigate the parse tree and extract the
required data.

5. Store the extracted data in an appropriate format, such as CSV or a database, for further analysis.
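The five steps above can be sketched in Python. The HTML snippet and CSS class names below are illustrative stand-ins for Skytrax's actual markup, which should be inspected in the browser before writing real selectors; in a live run, the string would come from `requests.get(url).text`.

```python
import csv
from bs4 import BeautifulSoup

# Stand-in for HTML fetched in step 1 with requests.get(url).text;
# the tag structure and class names here are hypothetical, not Skytrax's real markup.
html = """
<article class="review">
  <h2 class="airline">British Airways</h2>
  <span class="rating">7</span>
  <p class="text">Comfortable seats and friendly crew.</p>
</article>
<article class="review">
  <h2 class="airline">British Airways</h2>
  <span class="rating">4</span>
  <p class="text">Long delay at check-in.</p>
</article>
"""

# Step 2: build the parse tree.
soup = BeautifulSoup(html, "html.parser")

# Steps 3-4: locate each review element and pull out the fields of interest.
rows = []
for review in soup.find_all("article", class_="review"):
    rows.append({
        "airline": review.find("h2", class_="airline").get_text(strip=True),
        "rating": int(review.find("span", class_="rating").get_text(strip=True)),
        "text": review.find("p", class_="text").get_text(strip=True),
    })

# Step 5: persist the extracted records to CSV for later analysis.
with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["airline", "rating", "text"])
    writer.writeheader()
    writer.writerows(rows)
```

The same loop scales to paginated review listings by iterating over page URLs before parsing each response.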

2.2.2 Challenges Faced during the Data Extraction Process and Solutions Applied

During the data extraction process, various challenges may arise, including handling dynamic
content, dealing with anti-scraping measures, and managing large volumes of data. To overcome
these challenges, techniques such as handling JavaScript-based websites, implementing delays and
timeouts, rotating IP addresses, and using session management can be applied. In addition to the
basic web scraping techniques used in this project, there are advanced techniques that can be
employed to overcome specific challenges.
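The delay-and-timeout idea can be sketched with a small throttle that enforces a minimum gap between successive requests. The pacing value below is deliberately tiny for demonstration; a real scraper would use a delay of several seconds, reuse a `requests.Session` for cookies, and honour the site's robots.txt and terms of service.

```python
import time

class Throttle:
    """Enforce a minimum delay between successive requests to the same host."""

    def __init__(self, min_delay_seconds):
        self.min_delay = min_delay_seconds
        self.last_call = None

    def wait(self):
        # Sleep just long enough to keep at least min_delay between calls.
        if self.last_call is not None:
            elapsed = time.monotonic() - self.last_call
            if elapsed < self.min_delay:
                time.sleep(self.min_delay - elapsed)
        self.last_call = time.monotonic()

throttle = Throttle(min_delay_seconds=0.05)  # use a few seconds in practice

start = time.monotonic()
for _ in range(3):
    throttle.wait()
    # a real scraper would issue session.get(page_url) here
elapsed = time.monotonic() - start
```

Because the throttle only tracks timestamps, it composes cleanly with retries, timeouts, and proxy rotation layered on top.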

Many modern websites use JavaScript to dynamically load content. To scrape such websites, tools
like Selenium WebDriver can be used, which allows interaction with the website, including
executing JavaScript code and extracting the desired data.

Websites frequently deploy sophisticated measures such as captchas or IP blocking mechanisms to
deter automated scraping activities. These obstacles pose significant challenges for organizations
seeking to extract data from online sources in a systematic and efficient manner. However, with the
advent of innovative techniques and strategies, these challenges can be effectively addressed,
opening up new avenues for data extraction and analysis.

To overcome captchas, organizations can leverage various techniques, including captcha-solving
services or machine learning-based approaches. These solutions automate the process of solving
captchas, allowing for seamless data extraction without human intervention. Similarly, for IP
blocking, rotating proxies or proxy services can be employed to ensure uninterrupted scraping
operations. By dynamically switching IP addresses, organizations can evade detection and
circumvent IP-based restrictions, thereby enabling continuous and uninterrupted data collection.

The versatility of web scraping extends across a multitude of industries, offering transformative
benefits and unlocking new opportunities for organizations across various sectors. One notable
application of web scraping is in the realm of ecommerce, where businesses leverage scraping
techniques to monitor competitor prices, gather product information, and provide real-time price
comparisons to customers. This invaluable data empowers ecommerce businesses to make informed
pricing decisions, optimize their product offerings, and maintain competitiveness in the market
landscape.

2.3 Market Research and Competitive Analysis

Web scraping stands as an indispensable tool in the arsenal of modern organizations, offering a
gateway to a treasure trove of market data that spans the vast digital landscape. From customer
reviews and sentiment analysis to competitor insights and industry trends, web scraping empowers
businesses to harness the wealth of information available online, unlocking valuable insights that
drive strategic decision-making and propel organizational growth.

One of the primary domains where web scraping demonstrates its prowess is in the aggregation and
analysis of customer reviews. By casting a wide net across diverse online platforms, forums, and
social media channels, organizations can build a comprehensive view of customer sentiments and
perceptions regarding their products or services. Through the judicious application of sentiment
analysis techniques to this rich tapestry of data, businesses can gain nuanced insights into customer
satisfaction levels, discern prevailing sentiments, and pinpoint areas ripe for enhancement.

Furthermore, the longitudinal analysis of sentiment trends enables organizations to track the evolving
landscape of consumer preferences and sentiment towards their brand, products, or the industry at
large. By charting sentiment fluctuations over time, businesses can uncover valuable patterns and
shifts in consumer behaviour, gaining foresight into emerging trends and opportunities for
innovation.

But the utility of web scraping extends far beyond the realm of customer sentiment analysis.
Organizations can leverage web scraping techniques to glean competitive intelligence, monitoring
competitor activities, pricing strategies, and product offerings. By keeping a finger on the pulse of
the competitive landscape, businesses can adapt swiftly to changing market dynamics, identify gaps
in the market, and position themselves strategically for success.

Moreover, web scraping facilitates the aggregation of industry trends and market insights from a
multitude of sources, enabling organizations to stay abreast of the latest developments and emerging
opportunities. By synthesizing data from diverse sources, businesses can gain a holistic
understanding of market dynamics, identify niche segments, and inform their strategic planning
initiatives. This not only helps build a clearer picture of competitors but also reveals where current
trends are heading; in today's social-media-driven world, knowing what people want and where their
interests lie is extremely valuable to companies.

DATA CLEANING

Imputation stands as a fundamental process within the realm of data pre-processing, offering a means
to address the challenge of missing values by extrapolating estimates based on the available data.
Within the gamut of imputation techniques, several approaches have emerged as common practices,
each with its unique methodology tailored to the nature of the data at hand.

Among these techniques, mean imputation holds sway as a widely employed method, wherein
missing numerical values are replaced with the arithmetic mean of the respective variable. This
approach leverages the central tendency of the data to provide a reasonable estimate for the missing
values, ensuring the preservation of the dataset's overall structure and integrity.

In a similar vein, median imputation serves as an alternative strategy, particularly effective in
scenarios where the data distribution is skewed or contains outliers. By replacing missing values with
the median of the variable, this approach offers robustness against extreme values while maintaining
the essence of the original dataset.

Regression imputation represents a more sophisticated imputation technique, wherein missing values
are predicted using regression models trained on the available data. By leveraging relationships
between variables, regression imputation offers a more nuanced approach to imputation, capturing
underlying patterns and dependencies within the data.

In the context of the project at hand, the mean imputation technique took precedence, serving as the
chosen method for handling missing numerical values. By replacing missing entries with the mean
value of the respective variable, the project aimed to ensure data completeness while minimizing
disruption to the analytical process.

On the categorical front, mode imputation emerged as the preferred approach, wherein missing
categorical values were replaced with the most frequent category within the variable. This
methodological choice was motivated by its simplicity and efficacy in preserving the categorical
structure of the data, thereby facilitating downstream analysis and interpretation.

However, it's worth noting that imputation represents just one facet of the broader landscape of
missing data handling techniques. Alternatives such as deletion offer a contrasting approach, wherein
rows or columns containing missing values are simply removed from the dataset. While deletion can
be effective in certain scenarios, it comes at the cost of potentially discarding valuable information,
particularly when missing data patterns exhibit non-random distributions.

In essence, the process of imputation serves as a cornerstone of data pre-processing, offering a means
to mitigate the impact of missing values and ensure the integrity and utility of the dataset for
subsequent analysis. By judiciously selecting and applying imputation techniques, researchers and
analysts can navigate the complexities of missing data with confidence, unlocking insights and
driving informed decision-making.

3.1 Imputation Techniques

Imputation, a cornerstone process in data pre-processing, plays a pivotal role in addressing the
challenge of missing values by extrapolating estimates based on available data. It encompasses a
variety of techniques tailored to different types of data and scenarios, with common approaches
including mean imputation, median imputation, and regression imputation.

In the context of the project under consideration, the mean imputation technique took precedence as
the chosen method for handling missing numerical values. This involved replacing the missing
entries with the arithmetic mean of the respective variable, leveraging the central tendency of the
data to provide reasonable estimates while preserving the overall structure and integrity of the
dataset.

Similarly, mode imputation emerged as the preferred strategy for handling missing categorical values
in the project. By replacing missing entries with the most frequent category within the variable,
mode imputation ensured the preservation of categorical structure and facilitated downstream
analysis and interpretation.
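The project's two imputation choices, mean for numeric columns and mode for categorical ones, can be sketched with pandas. The column names and values below are illustrative, not the project's actual schema.

```python
import numpy as np
import pandas as pd

# Toy review data with gaps; the columns are hypothetical stand-ins.
df = pd.DataFrame({
    "rating": [7.0, np.nan, 4.0, 9.0, np.nan],
    "cabin": ["Economy", "Business", None, "Economy", "Economy"],
})

# Mean imputation: fill numeric gaps with the arithmetic mean of the column.
df["rating"] = df["rating"].fillna(df["rating"].mean())

# Mode imputation: fill categorical gaps with the most frequent category.
df["cabin"] = df["cabin"].fillna(df["cabin"].mode()[0])
```

After these two lines the frame is complete: the missing ratings become (7 + 4 + 9) / 3 ≈ 6.67 and the missing cabin becomes "Economy".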

While imputation represents a widely adopted approach to handling missing data, it's important to
acknowledge that alternative methods exist, each with its own strengths and limitations. For instance,
deletion offers a straightforward solution where rows or columns containing missing values are
simply removed from the dataset. However, this approach can lead to a loss of valuable information,
particularly when missing data is not randomly distributed across the dataset.

Moreover, beyond imputation and deletion, other techniques for handling missing data include
interpolation, where missing values are estimated based on neighboring values, and multiple
imputation, which involves generating multiple imputed datasets to account for uncertainty in the
imputation process.

In summary, imputation stands as a versatile tool in the data analyst's toolkit, offering a means to
address missing data while preserving the integrity and utility of the dataset for subsequent analysis.
By understanding the various imputation techniques available and their implications, analysts can
make informed decisions tailored to the specific characteristics of their data and research objectives.

3.2 Handling Outliers and Anomalies in the Data
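A common way to flag outliers in numeric review data is the interquartile-range (IQR) rule: values outside [Q1 − 1.5·IQR, Q3 + 1.5·IQR] are treated as anomalous. A minimal sketch, assuming a pandas Series of ratings (the data is illustrative):

```python
import pandas as pd

# Illustrative ratings series with one implausible entry (42 on a 0-10 scale).
ratings = pd.Series([6, 7, 5, 8, 7, 6, 9, 42])

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = ratings.quantile(0.25), ratings.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = ratings[(ratings < lower) | (ratings > upper)]
cleaned = ratings[(ratings >= lower) & (ratings <= upper)]
```

Dropping flagged rows is one option; winsorising (clipping values to the bounds) is a gentler alternative when every record must be kept.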


DATA ANALYSIS

Data visualization is a powerful tool for effectively communicating information and insights from
data. It involves creating graphical representations of the data to visually explore patterns, trends,
and relationships.

4.1 Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) stands as a pivotal stage in the data analysis journey, serving as the
initial step in unraveling the intricacies of a dataset. Far beyond merely scratching the surface, EDA
delves deep into the data, employing an arsenal of statistical techniques and visualization tools to
extract meaningful insights, unveil hidden patterns, and uncover relationships that lie beneath the
surface.

At its core, EDA entails a comprehensive examination of the dataset through various lenses,
encompassing summary statistics, graphical representations, and exploratory visualizations. By
scrutinizing measures of central tendency and dispersion, analysts gain a holistic understanding of
the dataset's distribution and variability, laying the groundwork for further investigation.

Furthermore, EDA harnesses the power of visualizations to bring the data to life, offering intuitive
insights into its underlying structure and characteristics. Histograms provide a visual depiction of the
frequency distribution of a variable, shedding light on its shape and spread. Scatter plots, on the other
hand, unveil relationships between two variables, showcasing trends, clusters, or outliers that may be
present.

In addition to these fundamental visualizations, EDA incorporates a diverse array of graphical
techniques, including bar charts, box plots, correlation matrices, and heatmaps. Bar charts offer a
categorical view of the data, facilitating comparisons between different groups or categories. Box
plots provide a succinct summary of a variable's distribution, highlighting key statistical measures
such as quartiles and outliers.

Correlation matrices and heatmaps, meanwhile, offer insights into the relationships between multiple
variables, unveiling patterns of association and dependency. These visualizations are invaluable for
identifying potential predictors or drivers within the dataset, guiding the formulation of hypotheses
and informing subsequent analysis.
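The summary statistics and correlation matrix described above take only a few lines with pandas. The column names and scores below are illustrative stand-ins for the scraped review fields.

```python
import pandas as pd

# Illustrative extract of numeric review scores (hypothetical columns).
df = pd.DataFrame({
    "seat_comfort": [3, 4, 2, 5, 4, 1],
    "cabin_service": [4, 5, 2, 5, 4, 2],
    "overall": [5, 8, 3, 9, 7, 2],
})

# Summary statistics: count, mean, std, quartiles, min/max for each column.
summary = df.describe()

# Pairwise Pearson correlations: the numbers behind a correlation heatmap.
corr = df.corr()
```

Passing `corr` to a heatmap function (for example `seaborn.heatmap`) turns the matrix into the visual form discussed above.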

In essence, EDA serves as a gateway to deeper understanding, enabling analysts to glean insights,
formulate hypotheses, and pave the way for further analysis. By leveraging a rich repertoire of
techniques and visualizations, EDA illuminates the data landscape, empowering analysts to extract
actionable insights and unlock the full potential of the dataset.

4.2 Data Visualization with Matplotlib and PyGWalker

Many libraries are available for visualising the data after cleaning; the ones used in this project are:

1. Matplotlib
2. PyGWalker

4.2.1 Introduction to Matplotlib

Matplotlib is a versatile and powerful plotting library in Python, widely used for creating static,
interactive, and publication-quality visualizations. Developed by John D. Hunter in 2003, Matplotlib
has become the go-to tool for data visualization tasks due to its flexibility, extensive functionality,
and ease of use.

Matplotlib offers a plethora of features and capabilities that empower users to create a wide range of
visualizations to suit their needs. From simple line plots and scatter plots to complex heatmaps and
3D plots, Matplotlib provides the tools necessary to visualize data in meaningful and insightful ways.

One of the key strengths of Matplotlib is its rich assortment of plotting functions, each tailored to
specific visualization tasks. Users can create line plots, bar plots, histograms, scatter plots, pie charts,
and more with just a few lines of code, making it an invaluable tool for exploratory data analysis and
presentation.
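A few lines indeed suffice. The sketch below draws a bar chart and a pie chart side by side; the airline codes and values are illustrative, and the non-interactive Agg backend is selected so the script also runs on headless machines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt

# Illustrative data, not real Skytrax results.
airlines = ["BA", "AF", "LH", "EK"]
avg_rating = [6.8, 6.5, 7.1, 7.9]
review_counts = [120, 80, 95, 150]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3.5))

# Bar chart of average rating per airline.
ax1.bar(airlines, avg_rating, color="steelblue")
ax1.set_ylabel("Average rating (0-10)")
ax1.set_title("Ratings by airline")

# Pie chart of each airline's share of reviews.
ax2.pie(review_counts, labels=airlines, autopct="%1.0f%%")
ax2.set_title("Share of reviews")

fig.tight_layout()
fig.savefig("ratings.png", dpi=150)
```

Swapping `ax1.bar` for `ax1.plot` or `ax1.hist` yields the other plot types mentioned above with no other changes.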

Matplotlib offers extensive customization options, allowing users to tailor their visualizations to
meet specific requirements and preferences. With fine-grained control over colors, line styles,
markers, fonts, and annotations, users can create visually stunning and informative plots that
effectively convey their data.

Matplotlib stands as a cornerstone of data visualization in Python, offering a rich array of features,
customization options, and integration capabilities. Whether you're a beginner exploring data
visualization for the first time or a seasoned researcher creating complex visualizations for
publication, Matplotlib provides the tools and flexibility needed to bring your data to life in
meaningful and impactful ways.

With its wide range of plotting functions, Matplotlib empowers users to create an extensive variety
of visualizations, including line plots, scatter plots, bar plots, histograms, pie charts, and more. This
versatility ensures that users can choose the most suitable plot type to effectively convey their data
insights.

Moreover, Matplotlib offers extensive customization options, allowing users to tailor the appearance
of their plots to meet specific requirements and preferences. From colours and markers to line styles
and fonts, users can fine-tune every aspect of their plots to achieve the desired visual aesthetics.

Furthermore, Matplotlib supports multiple backends for rendering plots, offering flexibility in
generating plots for different purposes. Whether you're working in an interactive environment like
Jupyter notebooks or need to save plots as image files for publication, Matplotlib has you covered.

One notable feature of Matplotlib is its seamless integration with LaTeX, enabling users to include
mathematical expressions, symbols, and formatted text in their plots with ease. This functionality is
particularly useful for academic and scientific presentations where mathematical notation is
prevalent.

Matplotlib's object-oriented interface allows for the creation of complex and sophisticated
visualizations with fine-grained control over plot elements. By working directly with plot objects and
axes, users can build customized plots that effectively communicate their data insights.

In addition, Matplotlib provides support for 3D plotting through the `mpl_toolkits.mplot3d` module,
enabling users to create immersive visualizations of three-dimensional data. This capability is
valuable for fields such as scientific computing, engineering, and geospatial analysis. Furthermore,
Matplotlib seamlessly integrates with NumPy and Pandas, two fundamental libraries for numerical
computing and data manipulation in Python. This integration enables users to plot NumPy arrays and
Pandas DataFrame and Series objects directly, streamlining the visualization process and facilitating
exploratory data analysis.
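As a brief sketch of this integration, a pandas DataFrame can be plotted directly, with pandas delegating the rendering to Matplotlib. The airline names and ratings here are invented stand-ins, not the actual scraped data:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Invented sample data; column names are illustrative only
df = pd.DataFrame({
    "airline": ["A", "B", "C", "D"],
    "avg_rating": [4.1, 3.6, 4.4, 3.9],
})

# DataFrame.plot.bar delegates to Matplotlib and returns an Axes object
ax = df.plot.bar(x="airline", y="avg_rating", legend=False)
ax.set_ylabel("Average rating")
ax.figure.savefig("airline_ratings.png")
```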

Overall, Matplotlib's extensibility, versatility, and user-friendly interface make it a go-to tool for data
visualization in Python. Whether you're visualizing data for exploratory analysis, presentations, or
publication, Matplotlib empowers you to create compelling and informative visualizations that
enhance your data-driven insights.

4.2.2 Introduction to PyGWalker

In the realm of data analysis, navigating vast datasets and extracting meaningful insights can be a
daunting task. PyGWalker stands out as a transformative tool, offering a comprehensive suite of
capabilities to empower both seasoned analysts and data enthusiasts alike.

PyGWalker transcends mere analysis, offering an intuitive, user-friendly interface designed to
streamline the data exploration process. Imagine intuitively manipulating data through drag-and-drop
actions, crafting insightful visualizations like charts and graphs that bring your findings to life. This
visual storytelling fosters deeper understanding and facilitates the identification of trends, patterns,
and anomalies with remarkable ease.

Furthermore, PyGWalker automates repetitive tasks, freeing you from the burden of manual
calculations and manipulations. This newfound efficiency allows you to focus on what truly matters:
extracting valuable insights and translating them into impactful decisions.

The PyGWalker Advantage:

 Enhanced Efficiency: Achieve results faster with automation and an intuitive interface that
simplifies complex data manipulation tasks.
 Democratizing Data Analysis: Gain insights with ease, regardless of extensive coding
knowledge or technical expertise. PyGWalker empowers users of all skill levels to dive into
data analysis and derive meaningful insights.
 Unveiling Deeper Truths: Discover hidden patterns and relationships previously masked by
traditional methods. PyGWalker's advanced analytical capabilities enable users to uncover
insights that may have gone unnoticed with conventional approaches.
 Data-Driven Decisions: Make informed choices with confidence, empowered by robust
analysis and clear visualizations generated by PyGWalker. By transforming raw data into
actionable insights, PyGWalker facilitates data-driven decision-making across various
domains and industries.

By embracing PyGWalker, you unlock the true potential of your tabular data, transforming it from
raw numbers into a springboard for informed decision-making and impactful outcomes. Embark on a
rewarding journey of data exploration today, and see your understanding soar with PyGWalker as
your guide. With its intuitive interface, powerful analytical capabilities, and commitment to
democratizing data analysis, PyGWalker empowers users to extract insights, make informed
decisions, and drive meaningful outcomes in today's data-driven world.
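In practice, launching the PyGWalker explorer takes a single call. The DataFrame below is an invented sample of review data (column names are assumptions); the `pyg.walk` call is shown as a comment because it opens an interactive interface inside a Jupyter notebook:

```python
import pandas as pd

# Invented sample of scraped reviews; real column names would come from the dataset
df = pd.DataFrame({
    "airline": ["BA", "BA", "LH"],
    "rating": [4, 3, 5],
    "recommended": [True, False, True],
})

# In a Jupyter notebook, one call opens the drag-and-drop explorer:
# import pygwalker as pyg
# pyg.walk(df)

print(df.shape)
```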

RESULTS AND FINDINGS

5.1 Summary of Extracted Data from Skytrax

The extracted data from the Skytrax website encompassed a plethora of fields, each brimming with
valuable insights into the performance, services, and customer experiences of airlines and airports.
Among the data fields were airline names, ratings, passenger reviews, airport details, and various
other pertinent information. Each of these fields served as a lens through which to scrutinize and
understand the intricacies of the aviation industry, offering a comprehensive view of its multifaceted
landscape.

This dataset, a veritable treasure trove of information, comprised thousands of records, representing a
substantial volume of data ripe for analysis. Structured in a tabular format, with rows denoting
individual observations such as airlines or airports and columns delineating different attributes or
variables, the dataset provided a robust foundation for rigorous analysis and exploration.

As the data analysis process unfolded, it unveiled a myriad of key insights into the aviation industry,
illuminating trends, patterns, and correlations that lay hidden within the vast expanse of data. Among
these revelations were trends in passenger satisfaction, rankings of airlines and airports based on
various criteria, correlations between different service attributes, and the identification of factors
exerting significant influence on customer reviews and ratings.

Through rigorous statistical analysis, these insights were brought to light with clarity and precision.
Statistical techniques uncovered significant patterns and relationships within the data, shedding light
on nuanced interactions and dependencies. These findings included correlations between factors such
as cabin comfort and overall passenger ratings, associations between airline size and customer
satisfaction levels, and the identification of service attributes that wielded considerable influence
over passenger perceptions and experiences.
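The kind of correlation check described above can be sketched with pandas. The values and column names below are synthetic stand-ins for the actual Skytrax records:

```python
import pandas as pd

# Synthetic stand-in for the cleaned review data (not the real dataset)
reviews = pd.DataFrame({
    "cabin_comfort":  [3, 4, 5, 2, 4, 5, 1, 3],
    "overall_rating": [3, 4, 5, 2, 5, 5, 1, 3],
})

# Pearson correlation between cabin comfort and the overall passenger score
corr = reviews["cabin_comfort"].corr(reviews["overall_rating"])
print(round(corr, 2))
```

A correlation close to 1 on the real data would support the finding that cabin comfort moves with overall ratings.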

In essence, the analysis of the Skytrax dataset transcended mere data crunching; it was a journey of
discovery and enlightenment, unveiling the intricate tapestry of the aviation industry and offering
invaluable insights into its inner workings. Armed with these insights, stakeholders were empowered
to make informed decisions, implement targeted interventions, and chart a course toward enhanced
customer satisfaction and operational excellence in the dynamic and ever-evolving realm of aviation.

5.2 Impact and Implications of the Analysis Results

The meticulous examination of data within British Airways transcended the confines of mere
organizational boundaries, its impact resonating far beyond the company's boardrooms and
operational hubs. Indeed, the insights unearthed through this rigorous analysis reverberated across
the entire aviation industry, sparking a transformative wave of innovation, optimization, and strategic
recalibration that rippled through every sector and segment within the dynamic aviation landscape.

At the heart of this data-driven revolution lay a deep-seated commitment to understanding and
harnessing the power of data to drive informed decision-making and propel progress. The depth and
breadth of the insights gleaned from the data went far beyond surface-level observations, delving
into the intricate layers of operational nuances, customer sentiments, and prevailing industry trends
with an unparalleled level of granularity and precision.

This comprehensive understanding of the aviation ecosystem served as a potent catalyst for
stakeholders across the industry to embark on a journey of exploration and enhancement, traversing a
diverse spectrum of opportunities ripe for optimization and innovation. From optimizing service
delivery and refining passenger experiences to streamlining operational efficiencies and crafting
targeted marketing endeavors, the insights derived from the data analysis provided a roadmap for
strategic action, guiding decision-makers toward pathways of sustainable growth and competitive
differentiation.

But the impact of this data-driven approach extended well beyond the confines of individual
companies or organizations. It permeated the very fabric of the aviation industry, catalyzing a
collective movement toward greater efficiency, effectiveness, and excellence. By embracing data-driven
decision-making and leveraging insights gleaned from comprehensive analysis, stakeholders
across the aviation ecosystem were able to chart a course toward a future defined by innovation,
resilience, and customer-centricity.

In essence, the outcomes of the analysis did not merely illuminate a pathway for British Airways'
continuous evolution; they served as a beacon of inspiration and guidance for the entire aviation
industry. The ripple effects of this data-driven revolution continue to shape the future trajectory of
the industry, fostering a culture of innovation, collaboration, and continuous improvement that
propels the aviation sector forward into a new era of growth and prosperity.

CONCLUSION

6.1 Recap of Objectives and Project Overview

The overarching goal of the project was to delve into the vast repository of data available on the
Skytrax website, employing the cutting-edge technique of web scraping to extract pertinent
information. Beyond mere extraction, the project aimed for a deeper dive into the intricacies of the
aviation industry, seeking to uncover hidden insights and trends through rigorous data analysis.

To accomplish this ambitious objective, the project adopted a multifaceted approach that
encompassed several key stages. Firstly, extensive web scraping techniques were employed to
systematically retrieve data from various sections of the Skytrax website, ranging from airline and
airport reviews to passenger ratings and industry rankings. This process required meticulous
attention to detail, as different web pages presented unique challenges in terms of data structure and
accessibility.

Following the extraction phase, the project transitioned into a phase of data cleaning and
preprocessing. Here, the focus was on refining the raw data obtained from web scraping, removing
any inconsistencies, duplicates, or outliers that could potentially skew the analysis results. Through
careful data wrangling and transformation, the project ensured that the subsequent analysis would be
built upon a solid foundation of clean and reliable data.
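A minimal sketch of this cleaning step, using pandas on a synthetic example (the column names and valid rating range are assumptions for illustration):

```python
import pandas as pd

# Synthetic raw data containing a duplicate row and an out-of-range rating
raw = pd.DataFrame({
    "review": ["good", "good", "poor", "great"],
    "rating": [4, 4, 2, 99],  # 99 falls outside the assumed 1-5 scale
})

cleaned = raw.drop_duplicates()                       # remove duplicate records
cleaned = cleaned[cleaned["rating"].between(1, 5)]    # drop out-of-range ratings
print(len(cleaned))
```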

With the cleaned dataset in hand, the project then embarked on a journey of exploratory data analysis
(EDA), delving into the nuances of the aviation data to uncover meaningful patterns, correlations, and
anomalies. This phase involved employing a variety of statistical techniques and visualization tools
to gain a holistic understanding of the data landscape. From identifying trends in passenger
satisfaction ratings to exploring geographical variations in airport performance, the EDA phase
provided invaluable insights into the inner workings of the aviation industry.

Building upon the insights gleaned from EDA, the project proceeded to conduct more in-depth
statistical analysis, delving into the underlying drivers of key performance indicators and exploring
potential correlations between different variables. This phase involved the application of advanced
statistical models and hypothesis testing techniques to extract actionable insights from the data.

Finally, the project culminated in the creation of visually stunning and informative data
visualizations, designed to distill complex insights into intuitive and digestible formats. Through the
use of interactive charts, graphs, and dashboards, the project aimed to communicate its findings in a
compelling and accessible manner, enabling stakeholders to easily interpret and act upon the insights
generated.

In summary, the project represented a comprehensive exploration of the aviation industry, leveraging
web scraping, data analysis, and visualization techniques to uncover valuable insights and trends. By
successfully achieving its objectives, the project laid the groundwork for informed decision-making
and strategic planning within the aviation sector, paving the way for future innovation and growth.

6.2 Importance and Relevance of Web Scraping and Data Analysis in Today's World

In the contemporary digital landscape, where the expanse of the internet hosts a wealth of
information, the indispensability of web scraping and data analysis cannot be overstated. These twin
pillars of data science serve as invaluable instruments for organizations endeavoring to navigate the
intricacies of the online realm. They facilitate the extraction of actionable insights, which, in turn,
drive informed decision-making, shape strategic initiatives, and ultimately, propel business success.

At the core of this process lies the extraction of data through web scraping—a technique that
empowers organizations to systematically gather information from diverse online sources. By
harnessing the power of web scraping, organizations gain access to a plethora of valuable data that
serves as the bedrock for subsequent analysis and interpretation.

Data analysis, in its essence, serves as the linchpin of the decision-making process. It involves the
transformation of raw data into actionable insights that inform strategic direction and operational
decision-making. Through the adept application of statistical techniques, machine learning
algorithms, and advanced visualization tools, organizations can uncover hidden patterns, discern
trends, and unveil correlations within the data. This, in turn, illuminates customer behaviours, unveils
market dynamics, and elucidates industry trends.

In the case of British Airways, the strategic utilization of web scraping and data analysis proved
instrumental in deriving key insights and implications for the airline's operations and strategic
planning. By meticulously analysing data related to airline and airport reviews, passenger satisfaction
ratings, and industry benchmarks obtained through web scraping, British Airways garnered a
comprehensive understanding of its performance relative to competitors. Moreover, it identified
areas for improvement and uncovered emerging trends within the aviation industry.

In conclusion, the symbiotic interplay between web scraping and data analysis represents a
cornerstone for organizations seeking to thrive in the digital era. By deftly harnessing the power of
these techniques, organizations can unearth invaluable insights, gain a competitive edge, and chart a
trajectory towards sustained success and innovation.

REFERENCES

1. https://www.airlinequality.com/airlinereviews/britishairways

2. https://beautifulsoup4.readthedocs.io/en/latest/

3. https://matplotlib.org/stable/index.html

4. https://pandas.pydata.org/docs/

5. https://requests.readthedocs.io/en/latest/

6. https://docs.kanaries.net/pygwalker

DATA SHEET

WEB SCRAPING CODE

The code below is designed to extract data from the Skytrax website.
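The original scraping code is included as an image. As a hedged reconstruction, a typical requests + Beautiful Soup approach is sketched below. The CSS class name is an assumption about the page structure, and the example parses an inline HTML snippet so that the sketch runs offline; in the real scraper the HTML would come from a `requests.get` call against a Skytrax review page:

```python
from bs4 import BeautifulSoup

# In the real scraper the HTML would be fetched, e.g.:
#   import requests
#   html = requests.get(review_page_url).text  # review_page_url: a Skytrax review page
# Here a small inline snippet stands in so the sketch runs offline.
html = """
<article class="review">
  <div class="text_content">Comfortable seats and friendly crew.</div>
</article>
<article class="review">
  <div class="text_content">Flight was delayed by two hours.</div>
</article>
"""

soup = BeautifulSoup(html, "html.parser")
# "text_content" is an assumed class name for the review body
reviews = [div.get_text(strip=True) for div in soup.select("div.text_content")]
print(len(reviews))
```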

DATA ANALYSIS CODE

This code is designed to analyse the data extracted from Skytrax.
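The original analysis code is likewise included as an image. A minimal pandas sketch of the kind of summary such an analysis might compute is shown below; the DataFrame and its column names are synthetic assumptions, not the actual scraped records:

```python
import pandas as pd

# Synthetic stand-in for the scraped review table; column names are assumptions
df = pd.DataFrame({
    "airline": ["BA", "BA", "BA", "LH", "LH"],
    "rating": [4, 3, 5, 4, 2],
    "recommended": [True, False, True, True, False],
})

# Average rating and share of recommendations per airline
summary = df.groupby("airline").agg(
    avg_rating=("rating", "mean"),
    pct_recommended=("recommended", "mean"),
)
print(summary)
```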

