
SCALA-continuation

The data set

Transformation required
Output:

This list cannot be persisted to disk as it is, so we convert it into a String.

Each record then comes out as a single string.
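A minimal sketch of this step, assuming each transformed record is a list of fields. The names trans and lines and the sample values are made up for illustration; they are not the data set from the notes.

```scala
// Hypothetical transformed records: each record is a list of fields.
val trans: List[List[String]] = List(
  List("101", "Amar", "40000"),
  List("102", "Amala", "50000")
)

// mkString(",") joins the fields of each record into one comma-delimited line,
// so every record becomes a plain String that can be written out as text.
val lines: List[String] = trans.map(fields => fields.mkString(","))
```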


Create a schema and append it to Trans.
To perform any grouping aggregation in Spark, the data should be in
key-value pairs.

Transforming into key-value pairs
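As a rough sketch, the records can be turned into (key, value) pairs like this. The record layout id,name,sal,gender,dno and the sample rows are assumptions, and plain Scala collections stand in for Spark here; on a Spark RDD of pairs the grouped sum would typically be done with reduceByKey instead of groupBy.

```scala
// Hypothetical emp records in the assumed layout id,name,sal,gender,dno.
val emp: List[String] = List(
  "101,Amar,40000,m,11",
  "102,Amala,50000,f,12",
  "103,Kiran,30000,m,11"
)

// Key = dno (department number), value = sal.
val pairs: List[(Int, Int)] = emp.map { line =>
  val w = line.split(",")
  (w(4).toInt, w(2).toInt)
}

// Grouping aggregation on the pairs: total salary per department.
val totalByDept: Map[Int, Int] =
  pairs.groupBy(_._1).map { case (dno, ps) => (dno, ps.map(_._2).sum) }
```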


Making records into structures
There are two techniques
1) Tuple
2) case class

Each record in emp is transformed into a tuple.
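A small sketch of the tuple technique, assuming the same hypothetical layout id,name,sal,gender,dno for the emp records:

```scala
// Hypothetical emp records (id,name,sal,gender,dno).
val emp: List[String] = List("101,Amar,40000,m,11", "102,Amala,50000,f,12")

// Each delimited line becomes one tuple with typed elements.
val tuples: List[(Int, String, Int, String, Int)] = emp.map { line =>
  val w = line.split(",")
  (w(0).toInt, w(1), w(2).toInt, w(3), w(4).toInt)
}
```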


Converting tuple objects into key-value pairs
Here we access the elements not with an index such as x(0) but through the positional accessors "._1", "._2", and so on.
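For illustration, assuming tuples shaped as (id, name, sal, gender, dno), the pair (dno, sal) can be picked out with the positional accessors:

```scala
// Hypothetical tuples: (id, name, sal, gender, dno).
val tuples = List((101, "Amar", 40000, "m", 11), (102, "Amala", 50000, "f", 12))

// ._5 is the fifth element (dno) and ._3 the third (sal); positions start at 1.
val pairs: List[(Int, Int)] = tuples.map(t => (t._5, t._3))
```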

Task:
In the data below there are many spaces between the words. We need
exactly one space between each pair of words.

First we need to remove the spaces from the left and right ends.

Now we split the data, using a space as the delimiter.


The split leaves empty elements where there were repeated spaces, and we
want to remove them. To remove elements, use filter().

Now all the blank elements are eliminated.


But we want a single string with only one space between the words.
mkString(): converts the elements of a list, array, or similar collection into one String.

Called with no argument, it concatenates all the elements into a string
without any separator.
If you need a space between the words, use mkString(" ").
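The steps of the task above (trim, split, filter, mkString) can be sketched end to end as follows. The sample line is made up for illustration.

```scala
// Hypothetical input with irregular spacing.
val line = "   spark    is   a  fast     engine  "

// 1) Remove the spaces on the left and right.
val trimmed: String = line.trim

// 2) Split on a single space; repeated spaces produce empty strings.
val parts: Array[String] = trimmed.split(" ")

// 3) Keep only the non-empty elements.
val words: Array[String] = parts.filter(w => w != "")

// 4a) No separator: every word is concatenated directly.
val joined: String = words.mkString

// 4b) Single-space separator: the result we actually want.
val cleaned: String = words.mkString(" ")
```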
These kinds of transformations are required when you work with
unstructured data, for example from Twitter or Facebook, because such
data may not follow any structure.
The problem with a tuple is that we cannot remember the position of
each element. A tuple gives the data a structure, but it has no
schema.
As with SQL tables, we need to provide a schema. This is possible with
case classes.
Accessing elements is easy.
Now I want to access element a (a column) of the s1 object.

Here each element of the list is a samp object. Now I want to add up
all the fields of each object.
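A minimal sketch of both points, field access by name and a per-object sum. The field names a, b, c, their Int types, and the values are assumptions; only the names samp, s1, and a come from the notes.

```scala
// Hypothetical case class: the field names act as the schema.
case class Samp(a: Int, b: Int, c: Int)

val s1 = Samp(10, 20, 30)

// Access by field name instead of by position.
val first: Int = s1.a

// A list of Samp objects; add the fields of each object.
val samples: List[Samp] = List(s1, Samp(1, 2, 3))
val sums: List[Int] = samples.map(s => s.a + s.b + s.c)
```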
Now we will convert the emp records into case class objects.

Apply the transformation and convert each record into a case class object.


Here each element is an emp case class object.
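One way this conversion could look, assuming a hypothetical Emp case class with the fields id, name, sal, gender, and dname and comma-delimited input records:

```scala
// Hypothetical schema for the emp records.
case class Emp(id: Int, name: String, sal: Int, gender: String, dname: String)

// Hypothetical input records.
val records: List[String] = List(
  "101,Amar,40000,m,Marketing",
  "102,Amala,50000,f,HR"
)

// Split each record and build one Emp object from its fields.
val emps: List[Emp] = records.map { line =>
  val w = line.split(",")
  Emp(w(0).toInt, w(1), w(2).toInt, w(3), w(4))
}
```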
Now we want to convert the case class objects into key-value pairs.
Below, the fields are not accessed by index or position; we just give
the name of the field.
So you can easily access dname and sal if you want (dname, sal) as the pair.
This helps when you want to convert an RDD into a table: we have to
give that RDD a schema, and case classes let us do that.
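A short sketch of building the (dname, sal) pairs by field name. The Emp definition and the sample objects are assumptions for illustration.

```scala
// Hypothetical schema; only dname and sal are named in the notes.
case class Emp(id: Int, name: String, sal: Int, gender: String, dname: String)

val emps: List[Emp] = List(
  Emp(101, "Amar", 40000, "m", "Marketing"),
  Emp(102, "Amala", 50000, "f", "HR")
)

// No positions to remember: the pair is built from field names.
val pairs: List[(String, Int)] = emps.map(e => (e.dname, e.sal))
```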

Functions
Zero-arg (no-arg) function
Previously we did the transformations manually each time. If you
need them regularly, create a function once and reuse it whenever
required.
Create a function that makes the first letter of a name upper case and
the remaining letters lower case.
Upper case function
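The two name functions might be written as below. The function names initCap and upper are made up here, and returning an empty name unchanged is a choice of this sketch.

```scala
// First letter upper case, the remaining letters lower case.
// An empty (or all-blank) name is returned as an empty string.
def initCap(name: String): String = {
  val n = name.trim
  if (n.isEmpty) n
  else n.substring(0, 1).toUpperCase + n.substring(1).toLowerCase
}

// Whole string in upper case.
def upper(s: String): String = s.toUpperCase

// Reuse the function wherever the transformation is needed.
val names: List[String] = List("aMAR", "amala").map(initCap)
```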

Gender function

Grade based on Salary


Department function: based on the deptnum, you need to get the deptname.
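The gender, grade, and department functions could be sketched as follows. The codes ("m"/"f"), the salary thresholds, the grade letters, and the department names are all assumptions; the notes do not show the actual values.

```scala
// Gender code to full word; anything other than "m" is treated as female here.
def gender(g: String): String = {
  if (g.trim.toLowerCase == "m") "Male" else "Female"
}

// Grade from salary, using hypothetical thresholds.
def grade(sal: Int): String = {
  if (sal >= 70000) "A"
  else if (sal >= 50000) "B"
  else if (sal >= 30000) "C"
  else "D"
}

// Department number to department name, using a hypothetical mapping.
def dept(dno: Int): String = dno match {
  case 11 => "Marketing"
  case 12 => "HR"
  case 13 => "Finance"
  case _  => "Others"
}
```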
Combined function to create emp case class objects

Now I want to convert the data below into case class objects.

We already have the function, so we just need to use it.


All the required transformations are applied inside the toEmp() function.
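A sketch of what a combined toEmp() might look like. Only the name toEmp comes from the notes; the Emp fields, the helper functions, the thresholds, and the sample data are assumptions.

```scala
// Hypothetical schema for the transformed record.
case class Emp(id: Int, name: String, sal: Int, gender: String, grade: String, dname: String)

// Hypothetical helper functions, one per transformation.
def initCap(name: String): String = {
  val n = name.trim
  if (n.isEmpty) n else n.substring(0, 1).toUpperCase + n.substring(1).toLowerCase
}
def gender(g: String): String = { if (g.trim.toLowerCase == "m") "Male" else "Female" }
def grade(sal: Int): String = {
  if (sal >= 70000) "A" else if (sal >= 50000) "B" else if (sal >= 30000) "C" else "D"
}
def dept(dno: Int): String = dno match {
  case 11 => "Marketing"
  case 12 => "HR"
  case _  => "Others"
}

// Combined function: one raw line (id,name,sal,gender,dno) in, one Emp object out.
def toEmp(line: String): Emp = {
  val w = line.trim.split(",")
  val sal = w(2).toInt
  Emp(w(0).toInt, initCap(w(1)), sal, gender(w(3)), grade(sal), dept(w(4).toInt))
}

// Reuse it on new data with a single map.
val data: List[String] = List("101,aMAR,40000,m,11", "102,amala,70000,f,12")
val emps: List[Emp] = data.map(toEmp)
```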

Create a function that returns a Boolean.


Now we apply this function in filter().
We have a list and we want only the males.

Now we want only the females.
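For example, assuming the list holds gender codes "m" and "f" (the codes and the name isMale are made up for this sketch):

```scala
// Boolean-returning function: true for a male code, ignoring case and spaces.
def isMale(g: String): Boolean = g.trim.toLowerCase == "m"

val genders: List[String] = List("m", "f", "M", "f", "m")

// filter keeps the elements for which the function returns true.
val males: List[String] = genders.filter(isMale)

// Negate the same function to keep only the females.
val females: List[String] = genders.filter(g => !isMale(g))
```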

Now we have a list of records, and we need to separate them by gender,
i.e. males in one list and females in another.
Now I have emp case class objects, and from them I need to extract only the males.
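One way to do both is sketched below. It uses partition to split the list in a single pass, which is a choice of this sketch; the notes may well have used two separate filter() calls. The Emp fields and sample objects are assumptions.

```scala
// Hypothetical schema with the gender already expanded to a full word.
case class Emp(id: Int, name: String, sal: Int, gender: String)

val emps: List[Emp] = List(
  Emp(101, "Amar", 40000, "Male"),
  Emp(102, "Amala", 50000, "Female"),
  Emp(103, "Kiran", 30000, "Male")
)

// partition returns a pair of lists: (elements passing the test, the rest).
val split = emps.partition(e => e.gender == "Male")
val males: List[Emp] = split._1
val females: List[Emp] = split._2

// Extracting only the males with filter, by field name.
val onlyMales: List[Emp] = emps.filter(e => e.gender == "Male")
```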

Recursive functions
For factorial
The result is wrong here because the value is out of range for Int.
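A minimal recursive factorial, with a BigInt variant to show the range problem. The function names are made up here. Int is 32-bit, so 12! (479001600) still fits but 13! (6227020800) does not, and the Int version silently overflows.

```scala
// Recursive factorial with an Int result: correct only up to n = 12.
def factInt(n: Int): Int = if (n <= 1) 1 else n * factInt(n - 1)

// Same recursion with BigInt, which has no fixed range.
def factBig(n: Int): BigInt = if (n <= 1) BigInt(1) else BigInt(n) * factBig(n - 1)

val small: Int = factInt(5)      // 120
val large: BigInt = factBig(13)  // 6227020800
```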
Power function
x is declared as Int, but a String is being passed, so there is a type mismatch.
Converting the String to a number first is valid only if the string contains purely numeric characters.
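A sketch of a recursive power function and the String case. Writing it recursively and converting with toInt are assumptions of this sketch; it also assumes a non-negative exponent.

```scala
import scala.util.Try

// x to the power n, for n >= 0 (a negative n would recurse without ending).
def power(x: Int, n: Int): Int = if (n == 0) 1 else x * power(x, n - 1)

val fromInt: Int = power(2, 3)

// power("2", 3) does not compile: x is an Int, so a String is a type mismatch.
// Converting first works, but only for purely numeric strings.
val fromString: Int = power("2".toInt, 3)

// "2a".toInt throws NumberFormatException; Try captures that as a failure.
val numericOnly: Boolean = Try("2a".toInt).isSuccess
```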
Then what is the difference between & and &&?
Dno=11
Active evaluation

Here it still checks all the conditions. This is active evaluation.


Lazy Evaluation

Active (&): in the expression below the second condition is false, but
all the conditions are still checked. Lazy (&&): checking stops at the
first false condition.
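The difference can be made visible by logging which conditions actually run. This is only a sketch: the helper names and the three conditions are made up, and dno = 11 follows the value in the notes.

```scala
import scala.collection.mutable.ListBuffer

// Evaluates three conditions on dno with either && (lazy) or & (active)
// and returns the result together with the conditions that were checked.
def evaluated(useLazy: Boolean): (Boolean, List[String]) = {
  val log = ListBuffer[String]()
  // Records the condition name, then returns its value.
  def check(name: String, result: Boolean): Boolean = { log += name; result }
  val dno = 11
  val outcome =
    if (useLazy) {
      check("dno > 10", dno > 10) && check("dno == 12", dno == 12) && check("dno < 20", dno < 20)
    } else {
      check("dno > 10", dno > 10) & check("dno == 12", dno == 12) & check("dno < 20", dno < 20)
    }
  (outcome, log.toList)
}

val active = evaluated(useLazy = false) // & checks all three conditions
val lazyAnd = evaluated(useLazy = true) // && stops after the second one is false
```

Both forms give the same Boolean result; they differ only in how many conditions get evaluated, which matters when a condition is expensive or has side effects.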
