This Python code uses Spark SQL to convert employee data from JSON to Parquet and then extract the rows for a single stream. It runs in two steps. First, it reads the JSON file into a DataFrame, displays it, coalesces it, and writes it to a Parquet file. Second, it reads that Parquet file back, keeps only the rows where the stream is "JAVA", displays the result, and writes the filtered DataFrame to a new Parquet file.
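The two steps described above could be sketched roughly as follows. This is a minimal illustration, not the original script: the function names, the column name `stream`, the `coalesce(1)` partition count, and the `overwrite` save mode are all assumptions made for the example, and the file paths are left to the caller.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Needed only for the type hints; the caller passes in a live session.
    from pyspark.sql import DataFrame, SparkSession


def json_to_parquet(spark: "SparkSession", json_path: str, parquet_path: str) -> "DataFrame":
    """Step 1: read the employee JSON, display it, and write it as Parquet."""
    df = spark.read.json(json_path)
    df.show()
    # coalesce(1) merges the data into one partition so a single part file
    # is written (assumed partition count; the description only says "coalesces").
    # mode("overwrite") is also an assumption, so that reruns do not fail.
    df.coalesce(1).write.mode("overwrite").parquet(parquet_path)
    return df


def filter_java_stream(spark: "SparkSession", parquet_path: str, filtered_path: str) -> "DataFrame":
    """Step 2: read the Parquet back, keep the "JAVA" rows, and write them out."""
    df = spark.read.parquet(parquet_path)
    # SQL-expression filter; assumes a string column named "stream".
    java_df = df.filter("stream = 'JAVA'")
    java_df.show()
    java_df.write.mode("overwrite").parquet(filtered_path)
    return java_df
```

With a session created through `SparkSession.builder.appName("employees").getOrCreate()`, the steps would be called in order, for example `json_to_parquet(spark, "employees.json", "employees.parquet")` followed by `filter_java_stream(spark, "employees.parquet", "employees_java.parquet")`. These paths are hypothetical placeholders.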