4251 Assignment 2
Brief Information: Web scraping and data visualization are two powerful techniques in data
science and analysis.
2. HTML Parsing: Extracting relevant data from HTML using libraries like `Beautiful
Soup` or `lxml`.
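Below is a minimal sketch of HTML parsing with Beautiful Soup that extracts the header and rows of a small table. The HTML string and its values are made-up placeholders. It uses the built-in `html.parser` so that `lxml` does not need to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page
html = """
<table id="stats">
  <thead><tr><th>Country</th><th>Cases</th></tr></thead>
  <tbody>
    <tr><td>A</td><td>1,200</td></tr>
    <tr><td>B</td><td>850</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="stats")

# Header cells become column names; each body row becomes a list of cell texts
header = [th.get_text(strip=True) for th in table.thead.find_all("th")]
rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in table.tbody.find_all("tr")]

print(header)  # ['Country', 'Cases']
print(rows)    # [['A', '1,200'], ['B', '850']]
```

The resulting `header` and `rows` lists can be passed directly to `pandas.DataFrame(rows, columns=header)`.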
4. Handling Dynamic Content: Using tools such as Selenium to scrape websites whose content is
loaded by JavaScript after the initial page load.
Data visualization is the graphical representation of data to uncover insights and patterns.
It involves creating visualizations such as charts, graphs, and maps to communicate
information effectively. Key concepts include:
1. Types of Visualizations: Understanding when to use different types of charts like bar
charts, line charts, scatter plots, etc.
2. Choosing the Right Visualization: Selecting appropriate visualizations based on the data
and the insights you want to convey.
3. Color Theory and Design: Applying color schemes and design principles to create visuals
that are both appealing and informative.
4. Interactivity: Adding interactivity to visualizations for better exploration and
understanding.
5. Storytelling with Data: Presenting data in a narrative format to convey a clear message.
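The sketch below illustrates how to choose a chart type. It uses a bar chart to compare categories and a line chart to show a trend over an ordered axis. All numbers are invented for illustration. It selects matplotlib's non-interactive `Agg` backend so the script also runs without a display, and the output file name `chart_types.png` is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical data for illustration only
regions = ["North", "South", "East", "West"]
totals = [120, 95, 140, 80]
days = list(range(1, 8))
daily = [5, 8, 12, 9, 15, 18, 14]

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Bar chart: compares discrete categories; a single muted colour keeps attention on bar length
ax_bar.bar(regions, totals, color="steelblue")
ax_bar.set_title("Comparison across categories")
ax_bar.set_ylabel("Total")

# Line chart: shows change along an ordered axis such as time
ax_line.plot(days, daily, marker="o", color="darkorange")
ax_line.set_title("Trend over time")
ax_line.set_xlabel("Day")
ax_line.set_ylabel("Count")

fig.tight_layout()
fig.savefig("chart_types.png")
```

Swapping the two chart types would make both panels harder to read. Bars between consecutive days hide the continuity of the trend, and a line joining unrelated regions implies an order that does not exist.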
1. Web Scraping:
2. Data Visualization:
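import requests
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup

def data_Downloader():
    url = "https://www.worldometers.info/coronavirus/"
    html_page = requests.get(url).text
    #print(html_page)
    soup = BeautifulSoup(html_page, "lxml")
    # Locate the main per-country table and collect its body rows
    get_table = soup.find("table", id="main_table_countries_today")
    get_table_data = get_table.tbody.find_all("tr")
    dic = {}
    for i in range(len(get_table_data)):
        try:
            # Country rows carry a link whose text is the country name
            key = get_table_data[i].find_all("a", href=True)[0].string
        except IndexError:
            # Rows without a link (e.g. the World row) fall back to the first cell
            key = get_table_data[i].find_all("td")[0].string
        # Store the text of every cell in the row under the country name
        dic[key] = [cell.text for cell in get_table_data[i].find_all("td")]
    #print(dic)
    # Names for the 13 data cells that follow the row number and country name
    # (assumes the Worldometers table layout at the time of writing)
    column_names = ["Total Cases", "New Cases", "Total Deaths", "New Deaths",
                    "Total Recovered", "New Recovered", "Active Cases",
                    "Serious Critical", "Tot Cases/1M pop", "Deaths/1M pop",
                    "Total Tests", "Tests/1M pop", "Population"]
    # Drop the row-number and name cells, then transpose so countries become rows
    table = pd.DataFrame(dic).iloc[2:, :]
    table = table.reset_index(drop=True).T.iloc[:, :13]
    table.index.name = 'Country'
    table.columns = column_names
    # The linkless World row ends up under the key None
    table = table.rename(index={None: "World"})
    print(table.head(20))
    table.to_csv("Corona Data.csv")

data_Downloader()
print('code worked')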
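The reshaping at the end of `data_Downloader()` is easiest to follow on a toy input. This sketch builds a small dictionary by hand, in the same shape the scraping loop produces: a country name mapped to its list of cell texts, where the first two cells are the row number and the name. It then applies the same steps. The values and the two data columns are hypothetical.

```python
import pandas as pd

# Hypothetical rows: [row number, name, total cases, total deaths]
dic = {
    None: ["", "World", "1,000", "50"],  # a row without a link gets the key None
    "A": ["1", "A", "600", "30"],
    "B": ["2", "B", "400", "20"],
}

# Each dict key becomes a column; each cell position becomes a row
table = pd.DataFrame(dic)
# Drop the row-number and name cells (positions 0 and 1)
table = table.iloc[2:, :]
# Transpose so countries form the index and cell positions form the columns
table = table.reset_index(drop=True).T
table.index.name = "Country"
table.columns = ["Total Cases", "Total Deaths"]
# Give the unnamed aggregate row a readable label
table = table.rename(index={None: "World"})
print(table)
```

The values remain strings such as `"1,000"` at this stage. They need converting before any numeric sorting or plotting.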
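import requests
import pandas as pd
from bs4 import BeautifulSoup

def scrape_corona_data():
    url = "https://www.worldometers.info/coronavirus/"
    html_page = requests.get(url).text
    soup = BeautifulSoup(html_page, "lxml")
    table = soup.find("table", id="main_table_countries_today")
    # Header texts, e.g. 'Country,Other' and 'TotalCases'
    columns = [th.text.strip() for th in table.thead.find_all("th")]
    rows = []
    for tr in table.tbody.find_all("tr"):
        row_data = [td.text.strip() for td in tr.find_all("td")]
        # Keep only rows whose cell count matches the header
        if len(row_data) == len(columns):
            rows.append(row_data)
    # Create DataFrame
    df = pd.DataFrame(rows, columns=columns)
    # Clean data (assumed step): turn '1,234' into 1234 so the column sorts numerically
    df['TotalCases'] = pd.to_numeric(
        df['TotalCases'].str.replace(',', '', regex=False), errors='coerce')
    return df

# Scrape data
corona_data = scrape_corona_data()
print(corona_data.head())
corona_data.to_csv('/content/drive/MyDrive/Corona_Data.csv', index=False)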
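Cleaning the scraped table mostly means turning formatted strings such as `"1,234"` or `"+56"` into numbers. This sketch shows a reusable helper for that step built on `pandas.to_numeric`. The helper name `to_number` and the sample values are hypothetical.

```python
import pandas as pd

def to_number(series: pd.Series) -> pd.Series:
    """Strip thousands separators and '+' signs, then parse as numbers.

    Blank or non-numeric cells (e.g. 'N/A') become NaN instead of raising.
    """
    cleaned = series.astype(str).str.replace(r"[,+]", "", regex=True).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")

# Hypothetical scraped values
raw = pd.Series(["1,234", "+56", "", "N/A"])
result = to_number(raw)
print(result.tolist())  # [1234.0, 56.0, nan, nan]
```

Applying the helper column by column (for example `df[col] = to_number(df[col])`) keeps the cleaning step in one place.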
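import pandas as pd
import matplotlib.pyplot as plt

# Load the scraped data (assumes the CSV saved above); thousands=',' parses '1,234' as a number
df = pd.read_csv('/content/drive/MyDrive/Corona_Data.csv', thousands=',')
df_cleaned = df.dropna(subset=['Country,Other'])
# Sort the DataFrame by 'TotalCases' in descending order
df_sorted = df_cleaned.sort_values(by='TotalCases', ascending=False)
top_10_countries = df_sorted.head(10)
print(top_10_countries[['Country,Other', 'TotalCases']])
# Plotting
plt.figure(figsize=(12, 6))
plt.bar(top_10_countries['Country,Other'], top_10_countries['TotalCases'])
plt.xlabel('Country/Region')
plt.ylabel('Total Cases')
plt.title('Top 10 by Total Cases')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()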
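Sorting in descending order and taking `head(n)` can also be written with `DataFrame.nlargest`. The scraped table also contains aggregate rows such as `World`, which would otherwise top the ranking. This sketch shows one way to exclude them, assuming their names are known in advance. The small frame is hypothetical but uses the scraped column names.

```python
import pandas as pd

# Hypothetical data using the scraped column names
df = pd.DataFrame({
    "Country,Other": ["World", "A", "B", "C", None],
    "TotalCases": [1000, 600, 300, 100, 5],
})

# Assumed set of aggregate (non-country) rows to leave out
aggregates = {"World"}

df_cleaned = df.dropna(subset=["Country,Other"])
countries = df_cleaned[~df_cleaned["Country,Other"].isin(aggregates)]

# Same rows as sort_values("TotalCases", ascending=False).head(2) when values are distinct
top = countries.nlargest(2, "TotalCases")
print(top[["Country,Other", "TotalCases"]])
```

For the real table, the `aggregates` set would also need the continent names used on the page.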
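import pandas as pd
import matplotlib.pyplot as plt

# Load the scraped data (assumes the CSV saved above); thousands=',' parses '1,234' as a number
df = pd.read_csv('/content/drive/MyDrive/Corona_Data.csv', thousands=',')
df_cleaned = df.dropna(subset=['Country,Other'])
# Sort the DataFrame by 'TotalCases' in descending order
df_sorted = df_cleaned.sort_values(by='TotalCases', ascending=False)
top_10_countries = df_sorted.head(10)
# Plotting
plt.figure(figsize=(8, 8))
plt.pie(top_10_countries['TotalCases'], labels=top_10_countries['Country,Other'],
        autopct='%1.1f%%', startangle=140)
plt.title('Share of Total Cases among the Top 10')
plt.tight_layout()
plt.show()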
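The `autopct='%1.1f%%'` labels on a pie show each slice's share of the values actually plotted, not of the global total. This sketch computes both shares for hypothetical numbers. It shows why the pie should be read as a comparison within the top 10 only.

```python
# Hypothetical case counts for the plotted countries and a hypothetical global total
cases = {"A": 600, "B": 300, "C": 100}
world_total = 2000

plotted_total = sum(cases.values())
# What autopct displays: share of the plotted slices
pie_shares = {c: 100 * n / plotted_total for c, n in cases.items()}
# Share of the global total, which the pie does not show
world_shares = {c: 100 * n / world_total for c, n in cases.items()}

for c in cases:
    print(f"{c}: {pie_shares[c]:.1f}% of pie, {world_shares[c]:.1f}% of world")
```

Here country A fills 60% of the pie but accounts for only 30% of all cases.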
Data Visualization:
1. Sample screenshot of the created .csv file