Week7 Bda
Split lines from the textbook into arrays of words and perform
operations using select().
Aim: To write a program that splits lines from the textbook into arrays of words and
performs operations on them using select().
Description:
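split() divides a string into an array of substrings and returns the new array; the
original string is not changed. If a single space (" ") is used as the separator, the
string is split between words. In PySpark, split() from pyspark.sql.functions takes a
column and a pattern (interpreted as a regular expression) and returns a new array
column, leaving the source column untouched.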
Dataset: Text.txt (the plain-text file read by the program)
Program:
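from pyspark.sql import SparkSession, Window
from pyspark.sql.functions import monotonically_increasing_id, row_number, split

# Create (or reuse) a Spark session.
spark = SparkSession.builder.appName("Week7SplitLines").getOrCreate()
# Read the book: one row per line, in a single string column named "value".
book = spark.read.text("Text.txt")
book.show()
# Split each line on spaces into an array column named "words".
lines = book.withColumn("words", split(book["value"], " "))
lines.show()
# Number the rows 1, 2, 3, ... monotonically_increasing_id() is increasing and
# unique but not consecutive across partitions, so rank it with row_number().
order = Window.orderBy(monotonically_increasing_id())
linesID = lines.withColumn("ID", row_number().over(order))
linesID.show()
# lines.select("words").head(5)
# First five lines.
linesID.select("value", "ID").filter(linesID.ID <= 5).show()
# Five lines centred on the middle of the book.
middleRowID = linesID.count() // 2
linesID.select("value", "ID").filter(
    (linesID.ID >= middleRowID - 2) & (linesID.ID <= middleRowID + 2)
).show()
# Odd-numbered lines.
linesID.select("value", "ID").filter(linesID.ID % 2 != 0).show()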
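
To check the logic without a Spark installation, the same steps can be sketched in plain Python. This is only an illustration, not part of the Spark program: it uses Python's built-in str.split() in place of the Spark split() function, and the helper names split_and_number and select_ids, as well as the 10-line sample text, are made up for this example.

```python
def split_and_number(lines):
    """Pair each line with a 1-based ID and its list of words."""
    return [(i, text, text.split(" ")) for i, text in enumerate(lines, start=1)]


def select_ids(rows):
    """Return the IDs picked by the three filters used in the program."""
    ids = [i for i, _, _ in rows]
    middle = len(rows) // 2
    first_five = [i for i in ids if i <= 5]
    middle_five = [i for i in ids if middle - 2 <= i <= middle + 2]
    odd_rows = [i for i in ids if i % 2 != 0]
    return first_five, middle_five, odd_rows


# Hypothetical 10-line "book"; line 1 has a double space on purpose.
sample = ["call me  Ishmael"] + [f"line {n} of text" for n in range(2, 11)]
rows = split_and_number(sample)

print(rows[0][2])   # ['call', 'me', '', 'Ishmael']
first_five, middle_five, odd_rows = select_ids(rows)
print(first_five)   # [1, 2, 3, 4, 5]
print(middle_five)  # [3, 4, 5, 6, 7]
print(odd_rows)     # [1, 3, 5, 7, 9]
```

Note that splitting on a single space leaves an empty string wherever two spaces meet, so word arrays built this way may contain empty entries. With 10 lines the middle row is 10 // 2 = 5, so the middle selection covers IDs 3 to 7.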