Spark With Scala Recently Asked Interview Questions: Trendytech Insights
How should the block size be chosen in Hadoop?
a) use a 64 MB block size
b) use a 128 MB block size
c) it makes no difference

The answer given was to use 64 MB. Can you explain why? Sorry, I couldn't frame the question exactly. I felt that if we divide data into small blocks, there is a chance the metadata in the NameNode increases, so 64 MB did not seem like an ideal choice. There was also a vice-versa question: with a 4-node cluster, I think the correct answer given was 256 MB. Could you please go through these two questions and give some clarification?
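On the metadata point, a rough back-of-the-envelope sketch may help. It assumes the commonly cited figure of roughly 150 bytes of NameNode heap per block object, and the 1 TB file size is a hypothetical example:

// Rough estimate of how block size affects NameNode metadata.
// Assumptions: ~150 bytes of NameNode heap per block object (a commonly
// cited approximation) and a hypothetical 1 TB file.
object BlockMetadataEstimate extends App {
  val fileSizeBytes     = 1024L * 1024 * 1024 * 1024  // 1 TB
  val bytesPerBlockMeta = 150L                        // approx. heap per block object

  def estimate(blockSizeMb: Long): Unit = {
    val blockSizeBytes = blockSizeMb * 1024 * 1024
    val numBlocks      = math.ceil(fileSizeBytes.toDouble / blockSizeBytes).toLong
    val metadataKb     = numBlocks * bytesPerBlockMeta / 1024
    println(f"$blockSizeMb%4d MB blocks -> $numBlocks%6d blocks, ~$metadataKb%d KB NameNode heap")
  }

  estimate(64)   // 16384 blocks: more metadata, more (smaller) map tasks
  estimate(128)  // 8192 blocks: half the metadata
  estimate(256)  // 4096 blocks: fewer, larger tasks
}

Halving the block count halves the NameNode metadata, and it also halves the number of map tasks to schedule, which may be why a larger block size such as 256 MB was suggested for the small 4-node cluster.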
Trendytech Insights
MODERATOR · 8 MONTHS AGO
In the data above, you will notice that the data is not evenly distributed on "position". Hence we call it a skewed table on the key "position".
If you create partitions of the data on this column, one partition will have 2000 records while the other three partitions have comparatively few records.
🔸The 3 tasks working on the smaller partitions get completed, but the task working on the large partition will still be running.
This impacts overall performance.
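To see this kind of skew concretely, a minimal Spark-with-Scala sketch is below (the SparkSession setup and the table name "employees" are assumptions for illustration):

// Inspect per-key record counts to spot skew on "position".
// The table name "employees" is hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

val spark = SparkSession.builder()
  .appName("skew-check")
  .enableHiveSupport()
  .getOrCreate()

spark.table("employees")
  .groupBy("position")
  .count()                 // records per "position" value
  .orderBy(desc("count"))  // the skewed key (~2000 records) shows up first
  .show()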
🔹If either of the two tables is skewed, then we should use a skew join.
▪️Hive Properties:
🔸hive.optimize.skewjoin=true;
🔸hive.skewjoin.key=500000; --threshold
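These properties can be set per session. As an illustrative sketch, from a Spark session with Hive support it might look like the following (whether Spark's own execution path honors hive.optimize.skewjoin depends on the engine, so treat this as an assumption):

// Illustrative only: enable Hive's skew-join optimization for the
// current session, reusing the SparkSession from the sketch above.
spark.sql("SET hive.optimize.skewjoin=true")
spark.sql("SET hive.skewjoin.key=500000")  // keys with more rows than this are treated as skewed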