Movie Que

@equals(activity('Get Metadata Source File').output.firstRow.properties.schema, activity('Get Metadata Combined File').output.firstRow.properties.schema)

--------------------------------------------------------------------------
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder \
    .appName("MovieRecommendation") \
    .getOrCreate()

# Load CSV files into DataFrames
movies_df = spark.read.csv("movies.csv", header=True, inferSchema=True)
ratings_df = spark.read.csv("ratings.csv", header=True, inferSchema=True)
users_df = spark.read.csv("users.csv", header=True, inferSchema=True)

# Show the schema and a few rows of each DataFrame
movies_df.printSchema()
movies_df.show(5, truncate=False)

ratings_df.printSchema()
ratings_df.show(5)

users_df.printSchema()
users_df.show(5)

# Register the DataFrames as temporary views so they can be queried with SparkSQL
movies_df.createOrReplaceTempView("movies")
ratings_df.createOrReplaceTempView("ratings")
users_df.createOrReplaceTempView("users")

# Example query: Get average ratings for each movie
average_ratings = spark.sql("""
    SELECT m.movieId, m.title, AVG(r.rating) AS avg_rating
    FROM movies m
    JOIN ratings r ON m.movieId = r.movieId
    GROUP BY m.movieId, m.title
    ORDER BY avg_rating DESC
""")
average_ratings.show(10)

# Example query: Get top-rated movies by a specific user
user_id = 1
top_movies_for_user = spark.sql(f"""
    SELECT m.movieId, m.title, r.rating
    FROM movies m
    JOIN ratings r ON m.movieId = r.movieId
    WHERE r.userId = {user_id}
    ORDER BY r.rating DESC
""")
top_movies_for_user.show(10)
----------------------------------------------------------------

Here are some movie recommendation questions along with corresponding
SparkSQL queries:

Top Rated Movies:


Question: What are the top-rated movies based on average ratings?
SparkSQL Query:
SELECT m.movieId, m.title, AVG(r.rating) as avg_rating
FROM movies m
JOIN ratings r ON m.movieId = r.movieId
GROUP BY m.movieId, m.title
ORDER BY avg_rating DESC
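
A plain average lets a movie with a single 5-star rating outrank a widely rated one. Below is a minimal sketch of one common guard, a minimum-ratings threshold in a HAVING clause. It uses Python's built-in sqlite3 as a stand-in for a Spark session so the logic can be checked without a cluster. The sample rows, the helper name top_rated, and the min_ratings threshold are illustrative assumptions, not part of the original dataset.

```python
import sqlite3


def top_rated(conn, min_ratings):
    """Average rating per movie, skipping movies with fewer than min_ratings ratings."""
    query = """
        SELECT m.movieId, m.title, AVG(r.rating) AS avg_rating, COUNT(*) AS num_ratings
        FROM movies m
        JOIN ratings r ON m.movieId = r.movieId
        GROUP BY m.movieId, m.title
        HAVING COUNT(*) >= ?
        ORDER BY avg_rating DESC
    """
    return conn.execute(query, (min_ratings,)).fetchall()


# Hypothetical toy data: Movie A has three ratings, Movie B has a single 5.0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movieId INTEGER, title TEXT)")
conn.execute("CREATE TABLE ratings (userId INTEGER, movieId INTEGER, rating REAL)")
conn.executemany("INSERT INTO movies VALUES (?, ?)", [(1, "Movie A"), (2, "Movie B")])
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 1, 4.0), (2, 1, 5.0), (3, 1, 4.0), (1, 2, 5.0)],
)

print(top_rated(conn, 1))  # no effective threshold: Movie B ranks first
print(top_rated(conn, 3))  # threshold of 3 ratings: only Movie A remains
```

The `?` placeholder is sqlite3's parameter syntax. In SparkSQL the same HAVING clause can be added to the query above with a literal threshold.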

Most Rated Movies:

Question: Which movies have received the most ratings?


SparkSQL Query:
SELECT m.movieId, m.title, COUNT(r.rating) as num_ratings
FROM movies m
JOIN ratings r ON m.movieId = r.movieId
GROUP BY m.movieId, m.title
ORDER BY num_ratings DESC
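
Top Recommendations for a User:

Question: What are the top movies for a specific user (e.g., user with ID 1)?
Note that the query returns the ten movies this user has rated highest, so it
ranks titles the user has already seen rather than suggesting new ones.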
SparkSQL Query:
SELECT m.movieId, m.title, rec.rating
FROM movies m
JOIN (
    SELECT movieId, rating
    FROM (
        SELECT userId, movieId, rating,
               row_number() OVER (PARTITION BY userId
                                  ORDER BY rating DESC, movieId) AS rn
        FROM ratings
        WHERE userId = 1
    ) tmp
    WHERE rn <= 10
) rec ON m.movieId = rec.movieId
ORDER BY rec.rating DESC

Recommendations Based on Similar Users:

Question: What movies would be recommended to a user based on the preferences of
users with similar tastes?
SparkSQL Query:
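-- Assumption: a rating of 4 or higher counts as "liked"; adjust to your rating scale.
WITH target_likes AS (
    SELECT movieId FROM ratings WHERE userId = 1 AND rating >= 4
),
similar_users AS (
    -- Users who liked at least one movie that user 1 also liked
    SELECT r.userId, COUNT(*) AS shared_likes
    FROM ratings r
    JOIN target_likes t ON r.movieId = t.movieId
    WHERE r.userId <> 1 AND r.rating >= 4
    GROUP BY r.userId
)
SELECT m.movieId, m.title, SUM(s.shared_likes) AS score
FROM ratings r
JOIN similar_users s ON r.userId = s.userId
JOIN movies m ON r.movieId = m.movieId
WHERE r.rating >= 4
  AND NOT EXISTS (
      SELECT 1 FROM ratings seen
      WHERE seen.userId = 1 AND seen.movieId = r.movieId
  )
GROUP BY m.movieId, m.title
ORDER BY score DESC
LIMIT 10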
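
A similar-user recommendation has to read other users' ratings and then leave out what the target user has already rated. Below is a minimal sketch of one overlap-based approach, run on a few rows so the result can be checked by hand. It uses Python's built-in sqlite3 as a stand-in for a Spark session. The helper name recommend_from_similar_users, the "liked" threshold of 4.0, the scoring by shared likes, and the sample rows are all assumptions made for illustration.

```python
import sqlite3


def recommend_from_similar_users(conn, user_id, liked=4.0, top_n=10):
    """Unseen movies liked by users who share liked movies with user_id."""
    query = """
        WITH target_likes AS (
            SELECT movieId FROM ratings WHERE userId = :uid AND rating >= :liked
        ),
        similar_users AS (
            -- Other users, weighted by how many liked movies they share with the target
            SELECT r.userId, COUNT(*) AS shared_likes
            FROM ratings r
            JOIN target_likes t ON r.movieId = t.movieId
            WHERE r.userId <> :uid AND r.rating >= :liked
            GROUP BY r.userId
        )
        SELECT m.movieId, m.title, SUM(s.shared_likes) AS score
        FROM ratings r
        JOIN similar_users s ON r.userId = s.userId
        JOIN movies m ON m.movieId = r.movieId
        WHERE r.rating >= :liked
          AND NOT EXISTS (
              SELECT 1 FROM ratings seen
              WHERE seen.userId = :uid AND seen.movieId = r.movieId
          )
        GROUP BY m.movieId, m.title
        ORDER BY score DESC, m.movieId
        LIMIT :top_n
    """
    params = {"uid": user_id, "liked": liked, "top_n": top_n}
    return conn.execute(query, params).fetchall()


# Hypothetical toy data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movieId INTEGER, title TEXT)")
conn.execute("CREATE TABLE ratings (userId INTEGER, movieId INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(1, "Movie A"), (2, "Movie B"), (3, "Movie C"), (4, "Movie D")],
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        (1, 1, 5.0), (1, 2, 4.0),               # target user likes A and B
        (2, 1, 5.0), (2, 2, 5.0), (2, 3, 5.0),  # shares two likes, also likes C
        (3, 1, 4.0), (3, 4, 5.0), (3, 3, 2.0),  # shares one like, likes D, dislikes C
        (4, 4, 5.0),                            # shares nothing with the target
    ],
)

print(recommend_from_similar_users(conn, 1))
```

Movie C ranks above Movie D here because the user who liked it shares two liked movies with the target, against one for the user who liked Movie D. User 4 contributes nothing, since they share no liked movie with the target.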

Movies Not Yet Rated by a User:

Question: Which movies have not been rated by a specific user (e.g., user with ID
1)?
SparkSQL Query:
SELECT m.movieId, m.title
FROM movies m
LEFT JOIN ratings r ON m.movieId = r.movieId AND r.userId = 1
WHERE r.userId IS NULL
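
The LEFT JOIN with an IS NULL filter is an anti-join. The join condition matches only the chosen user's ratings, so a movie with no such rating comes back with NULLs on the ratings side and passes the WHERE filter. Below is a small check of that behaviour, using Python's built-in sqlite3 as a stand-in for a Spark session. The helper name unrated_movies and the sample rows are assumptions made for illustration.

```python
import sqlite3


def unrated_movies(conn, user_id):
    """Movies for which user_id has no row in ratings."""
    query = """
        SELECT m.movieId, m.title
        FROM movies m
        LEFT JOIN ratings r ON m.movieId = r.movieId AND r.userId = ?
        WHERE r.userId IS NULL
        ORDER BY m.movieId
    """
    return conn.execute(query, (user_id,)).fetchall()


# Hypothetical toy data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movieId INTEGER, title TEXT)")
conn.execute("CREATE TABLE ratings (userId INTEGER, movieId INTEGER, rating REAL)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [(1, "Movie A"), (2, "Movie B"), (3, "Movie C")],
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 1, 5.0), (2, 2, 4.0), (2, 3, 3.0)],
)

print(unrated_movies(conn, 1))  # user 1 rated only Movie A
print(unrated_movies(conn, 2))  # user 2 rated Movie B and Movie C
```

Keeping the userId test inside the ON clause matters. Moved to the WHERE clause, it would drop the NULL rows that the anti-join relies on.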

These are just a few examples of questions you can ask and corresponding SparkSQL
queries you can execute to perform movie recommendation tasks. You can modify these
queries based on your specific requirements and the structure of your data.