
PySpark/SQL Interview Questions

1. Page with no likes: Write a query to return the IDs of the Facebook pages that have
zero likes. The output should be sorted in ascending order based on the page IDs.

Solution: We can use either a left anti join or the isin operator (the DataFrame counterpart of SQL NOT IN).


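# Approach 1: a left anti join keeps only the pages with no match in page_likes_df.
result_df = pages_df.select("page_id").join(
    page_likes_df.select("page_id").distinct(),
    on="page_id",
    how="leftanti",
).orderBy("page_id")  # ascending order, as the question requires
result_df.show()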

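from pyspark.sql.functions import col

# Approach 2: collect the liked page IDs to the driver, then exclude them with isin().
# This is only practical when the list of liked pages fits in driver memory.
liked_ids = [r["page_id"] for r in page_likes_df.select("page_id").distinct().collect()]

result_df = pages_df.select("page_id").filter(~col("page_id").isin(liked_ids)).orderBy("page_id")
result_df.show()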
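
Since the question asks for a query, the same two ideas can also be sketched in plain SQL. The example below is only an illustration: it uses Python's built-in sqlite3 module in place of Spark so that it runs without a cluster, and the sample rows and the user_id column are made up for the demo. Only the pages and page_likes tables and the page_id column come from the question.

```python
import sqlite3

# Hypothetical sample data; sqlite3 stands in for Spark SQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (page_id INTEGER);
    CREATE TABLE page_likes (user_id INTEGER, page_id INTEGER);
    INSERT INTO pages VALUES (20701), (20001), (20045), (10500);
    INSERT INTO page_likes VALUES (111, 20001), (121, 20045), (156, 20001);
""")

# Approach 1: LEFT JOIN and keep the rows that found no matching like.
left_join_sql = """
    SELECT p.page_id
    FROM pages AS p
    LEFT JOIN page_likes AS pl ON p.page_id = pl.page_id
    WHERE pl.page_id IS NULL
    ORDER BY p.page_id
"""

# Approach 2: NOT IN. The IS NOT NULL guard matters: a single NULL in the
# subquery would make NOT IN return no rows at all.
not_in_sql = """
    SELECT page_id
    FROM pages
    WHERE page_id NOT IN (
        SELECT page_id FROM page_likes WHERE page_id IS NOT NULL
    )
    ORDER BY page_id
"""

left_join_ids = [row[0] for row in conn.execute(left_join_sql)]
not_in_ids = [row[0] for row in conn.execute(not_in_sql)]
print(left_join_ids)
print(not_in_ids)
```

With this sample data both queries return the two pages that have no rows in page_likes, in ascending order. In Spark, the same statements could presumably be run through spark.sql after registering the DataFrames as temporary views.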
