Happay
Q1)
Created the tables in PostgreSQL and worked out the answers below.
Explanation:
The query orders employees by salary in descending order, then uses OFFSET to skip the first 75% of rows, leaving only the bottom 25% in terms of salary.
Output:
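The query itself is not reproduced in the document, so here is a minimal runnable sketch of the approach described, using SQLite through Python; the employees table, its columns, and the sample rows are all assumptions:

```python
import sqlite3

# Hypothetical employees table; the real schema and data are not shown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Asha", 30000), ("Ben", 45000),
                  ("Cara", 60000), ("Dev", 90000)])

# Order by salary descending, then skip the first 75% of rows, leaving the
# bottom 25% of earners. (SQLite requires LIMIT -1 alongside OFFSET;
# PostgreSQL accepts a bare OFFSET.)
total = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
skip = int(total * 0.75)  # number of rows to skip = 75% of the table
rows = conn.execute(
    "SELECT name, salary FROM employees ORDER BY salary DESC LIMIT -1 OFFSET ?",
    (skip,)).fetchall()
print(rows)  # the lowest-paid 25%
```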
b. List all employees whose salary is greater than the average salary of the entire dataset.
Explanation:
I used a subquery that calculates the average salary. The subquery is evaluated first, and its result filters the rows of the main query.
Output:
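A minimal sketch of that scalar-subquery pattern, again on a made-up employees table (names and salaries are invented for illustration):

```python
import sqlite3

# Hypothetical employees table standing in for the document's dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Asha", 30000), ("Ben", 45000),
                  ("Cara", 60000), ("Dev", 90000)])

# The scalar subquery computes the overall average (56250 here) and is
# evaluated first; the outer query keeps only salaries above it.
rows = conn.execute(
    "SELECT name, salary FROM employees "
    "WHERE salary > (SELECT AVG(salary) FROM employees) "
    "ORDER BY salary").fetchall()
print(rows)
```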
Explanation:
The inner subquery computes the average salary using a window function partitioned by department, giving each department's average. Because a window function cannot appear directly in a WHERE clause, the main query uses this value as its filter.
Output:
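A sketch of that window-then-filter pattern, with an assumed dept_id column and invented data (window functions require SQLite 3.25+):

```python
import sqlite3  # window functions need SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [("Asha", 1, 30000), ("Ben", 1, 50000),
                  ("Cara", 2, 60000), ("Dev", 2, 90000)])

# A window function can't sit in a WHERE clause, so the inner subquery
# attaches each department's average to every row and the outer query filters.
rows = conn.execute("""
    SELECT name, salary FROM (
        SELECT name, salary,
               AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
        FROM employees
    ) AS t
    WHERE salary > dept_avg
    ORDER BY salary""").fetchall()
print(rows)  # employees earning above their department's average
```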
Explanation:
The query identifies and lists duplicate rows based on the combination of employee_id, first_name, last_name and dept_id.
The given data contained zero duplicates on that combination.
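The standard way to express this is GROUP BY over the full combination plus a HAVING filter; here is a sketch with a deliberately duplicated (invented) row so the result is non-empty:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees "
             "(employee_id INTEGER, first_name TEXT, last_name TEXT, dept_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)",
                 [(1, "Asha", "Rao", 1),
                  (1, "Asha", "Rao", 1),   # deliberate duplicate
                  (2, "Ben", "Das", 2)])

# GROUP BY the full combination and keep only groups occurring more than once.
dups = conn.execute("""
    SELECT employee_id, first_name, last_name, dept_id, COUNT(*) AS cnt
    FROM employees
    GROUP BY employee_id, first_name, last_name, dept_id
    HAVING COUNT(*) > 1""").fetchall()
print(dups)
```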
Explanation:
The query orders rows by salary in descending order, then selects the second-highest value using LIMIT and OFFSET.
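A sketch of the LIMIT/OFFSET approach on invented data; DISTINCT is added here to guard against tied salaries, which the original explanation does not mention:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Asha", 30000), ("Ben", 45000),
                  ("Cara", 60000), ("Dev", 90000)])

# Sort descending; OFFSET 1 skips the highest salary, LIMIT 1 takes the next.
# DISTINCT guards against ties at the top.
second = conn.execute(
    "SELECT DISTINCT salary FROM employees "
    "ORDER BY salary DESC LIMIT 1 OFFSET 1").fetchone()[0]
print(second)
```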
Q2:
Explanation:
The query uses SUM() as a window function ordered by transaction date, producing a running total.
Output:
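A sketch of that pattern, assuming the task is a running total of transaction amounts; the transactions table and its rows are invented:

```python
import sqlite3  # window functions need SQLite >= 3.25

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)",
                 [("2023-01-01", 100), ("2023-01-02", 50), ("2023-01-03", 75)])

# SUM() as a window function ordered by transaction date yields a running total.
rows = conn.execute("""
    SELECT transaction_date, amount,
           SUM(amount) OVER (ORDER BY transaction_date) AS running_total
    FROM transactions""").fetchall()
print(rows)
```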
Q3
Explanation:
Here I created a CTE (common table expression) named LatestEmployment, which selects emp_id, emp_profile and emp_join_date for each employee where emp_join_date is the latest, using MAX().
I then LEFT JOINed it with employee_table to get the name and the other required columns.
Output:
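A sketch of that CTE-plus-join shape; the table names follow the explanation, but the columns and rows of employment_history are invented, and the MAX() is written as a correlated subquery so it is standard SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee_table (emp_id INTEGER, emp_name TEXT);
    CREATE TABLE employment_history (emp_id INTEGER, emp_profile TEXT, emp_join_date TEXT);
    INSERT INTO employee_table VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO employment_history VALUES
        (1, 'Analyst',  '2020-01-01'),
        (1, 'Engineer', '2022-06-01'),
        (2, 'Manager',  '2021-03-15');
""")

# The CTE keeps, per employee, the row whose emp_join_date is the latest
# (a correlated MAX); the LEFT JOIN then pulls in the employee's name.
rows = conn.execute("""
    WITH LatestEmployment AS (
        SELECT emp_id, emp_profile, emp_join_date
        FROM employment_history eh
        WHERE emp_join_date = (SELECT MAX(emp_join_date)
                               FROM employment_history
                               WHERE emp_id = eh.emp_id)
    )
    SELECT e.emp_name, l.emp_profile, l.emp_join_date
    FROM LatestEmployment l
    LEFT JOIN employee_table e ON e.emp_id = l.emp_id
    ORDER BY e.emp_name""").fetchall()
print(rows)
```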
Q4
Here I used a LEFT JOIN, treating the employee table as the main table, because I wanted all of its rows even where the joined columns are NULL.
An INNER JOIN would have dropped Nikita; with the LEFT JOIN her row is kept, so I can later update the data and add her details.
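The difference can be demonstrated on a two-table toy example (all names, tables and columns here are invented stand-ins):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER, name TEXT);
    CREATE TABLE details (emp_id INTEGER, phone TEXT);
    INSERT INTO employees VALUES (1, 'Asha'), (2, 'Nikita');
    INSERT INTO details VALUES (1, '555-0101');  -- no details row for Nikita
""")

# LEFT JOIN keeps every employee, filling NULL where details are missing ...
left_rows = conn.execute(
    "SELECT e.name, d.phone FROM employees e "
    "LEFT JOIN details d ON d.emp_id = e.emp_id ORDER BY e.name").fetchall()
# ... while INNER JOIN silently drops Nikita.
inner_rows = conn.execute(
    "SELECT e.name, d.phone FROM employees e "
    "JOIN details d ON d.emp_id = e.emp_id").fetchall()
print(left_rows, inner_rows)
```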
PYTHON:
Python P1
url1 = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv"
cars1 = pd.read_csv(url1)
cars1
url2 = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv"
cars2 = pd.read_csv(url2)
cars2
● Oops, it seems our first dataset has some unnamed blank columns; fix:
cars1 = cars1.loc[:, ~cars1.columns.str.contains('^Unnamed')]
cars1
● What is the number of observations in each dataset?
Overall code:
import pandas as pd
import numpy as np

## cars1
url1 = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars1.csv"
cars1 = pd.read_csv(url1)

## cars2
url2 = "https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/05_Merge/Auto_MPG/cars2.csv"
cars2 = pd.read_csv(url2)

## drop the unnamed blank columns from cars1
cars1 = cars1.loc[:, ~cars1.columns.str.contains('^Unnamed')]

## number of observations in each dataset
observations_cars1 = len(cars1)
observations_cars2 = len(cars2)

## join cars1 and cars2 into a single dataset
cars = pd.concat([cars1, cars2], ignore_index=True)

## add a random 'owners' column (the range is an arbitrary choice)
random_owners = np.random.randint(15000, 73001, size=len(cars))
cars['owners'] = random_owners
cars
Python P2
● Import the necessary libraries
● Create a histogram with the 10 countries that have the most 'Quantity'
ordered except UK
Check:
The same graph after removing negative quantities.
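The document shows only the resulting graphs, so here is a minimal sketch of the aggregation behind them; the tiny frame below is an invented stand-in for the Online Retail data, with column names 'Country' and 'Quantity' taken from the exercise text:

```python
import pandas as pd

# Toy stand-in for the Online Retail data.
online_rt = pd.DataFrame({
    "Country":  ["Germany", "France", "United Kingdom", "Germany", "EIRE", "France"],
    "Quantity": [5, 3, 100, -2, 7, 4],
})

# Drop negative quantities (returns), as noted above.
online_rt = online_rt[online_rt["Quantity"] > 0]

# Total quantity per country, excluding the UK, top 10.
totals = (online_rt[online_rt["Country"] != "United Kingdom"]
          .groupby("Country")["Quantity"].sum()
          .sort_values(ascending=False)
          .head(10))
# totals.plot(kind="bar")  # uncomment to draw the histogram (needs matplotlib)
print(totals.to_dict())
```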
Python P3
Import these two datasets and check their heads. Name the columns of the first dataset as below.
https://media.geeksforgeeks.org/wp-content/uploads/file.tsv (column names -
'user_id', 'item_id', 'rating', 'timestamp')
'https://media.geeksforgeeks.org/wp-content/uploads/Movie_Id_Titles.csv'
● Calculate mean rating of all movies
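The mean-rating step can be sketched as follows; the tiny frames below are invented stand-ins for the two files named above (the real data would be loaded with pd.read_csv, using sep='\t' for the .tsv file):

```python
import pandas as pd

# Toy stand-ins for the ratings file and the title lookup table.
ratings = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 3],
    "item_id":   [10, 20, 10, 30, 20],
    "rating":    [4, 5, 3, 4, 5],
    "timestamp": [0, 1, 2, 3, 4],
})
titles = pd.DataFrame({"item_id": [10, 20, 30],
                       "title": ["Movie A", "Movie B", "Movie C"]})

# Merge on item_id, then take the mean rating per movie title.
df = ratings.merge(titles, on="item_id")
mean_ratings = df.groupby("title")["rating"].mean()
print(mean_ratings.to_dict())
```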
On analysing, we get a correlation close to 0, as most of the data points have very little variance and are almost identical in value.