This Python code uses Spark SQL to read a CSV file, calculate the average age for each name, and write the results to Cloud Storage. It first creates a SparkSession and reads the input CSV. It then groups the rows by name, computes the average age, and displays the result. Finally, it selects the name column, drops duplicate values, and writes the output to the silver and gold layers in Cloud Storage. The step that would also write to a BigQuery table is commented out.