What are basic characteristics of Data and how is Parallel processing system different from distributed system?
The main difference between parallel and distributed computing is that parallel computing lets multiple processors execute tasks simultaneously within a single machine, while distributed computing divides a single task among multiple computers that cooperate to achieve a common goal.
Parallel processing takes place in a single computer, whereas distributed processing takes place across different systems.
In a parallel system, processors communicate through buses, whereas in a distributed system communication takes place over a network.
In a parallel system the processors can share memory, whereas in a distributed system each computer has its own memory.
5. Explore and list few use cases of big data Analytics in the following domains
Healthcare
Retail
Telecom
Entertainment
1) Healthcare:-
Hospital quality and patient safety in the ICU.
Real-Time Alerting.
Enhancing Patient Engagement.
Precision medicine, personalized care, and genomics.
Population health management, risk stratification, and prevention.
Big data might just cure cancer.
2) Retail:-
Up-Sell/Cross-Sell Recommendations.
Fraud Detection.
Personalizing customer experience.
Forecasting demand in Retail.
Customer journey analytics.
3) Telecom:-
Improved Network Security
Better Customer Service
Contextualized Location-Based Promotions
Predictive Maintenance
Targeted Campaigns
Real-Time Network Analytics
4) Entertainment:-
Predicting what your audience wants.
Insights into customer churn.
Optimized scheduling of media streams.
Content monetization.
Effective Ad Targeting.
8.How is a file stored in HDFS to support parallel processing as well as support fault
tolerance?
HDFS provides fault tolerance by replicating the data blocks and distributing them among different DataNodes across the cluster. By default the replication factor is set to 3, and it is configurable.
So, if I store a file of 1 GB in HDFS with the replication factor at its default of 3, it will finally occupy a total space of 3 GB because of the replication. Now, even if one DataNode fails, the data can still be retrieved from the replicas stored on other DataNodes. Because the blocks of a file are spread across several DataNodes, different blocks can also be read and processed in parallel, which is what supports parallel processing.
9.What are the HDFS Daemons and what are their responsibilities?
Daemons are the processes that run in the background. There are mainly 4 daemons
which run for Hadoop.
NameNode – runs on the master node for HDFS; it maintains the file system namespace and the metadata recording which DataNodes hold each block.
DataNode – runs on the slave nodes for HDFS; it stores the actual data blocks and serves read/write requests.
ResourceManager – runs on the master node for YARN; it allocates cluster resources among applications.
NodeManager – runs on the slave nodes for YARN; it launches and monitors containers on its node.
10.What assumptions were made in the design of HDFS? Do these assumptions make
sense? Discuss why?
1. Large datasets: The architecture is designed to be the best fit for large amounts of data.
2. Write once, read many: It assumes that a file in HDFS, once written, will not be modified, though it can be accessed any number of times. This assumption helps ensure high throughput of data access.
3. Commodity hardware: HDFS assumes that the cluster(s) will run on commodity hardware, that is, inexpensive, ordinary machines. This reduces the overall cost to a great extent.
4. Data replication and fault tolerance: HDFS works on the assumption that hardware is bound to fail at some point in time. To overcome such failures, each block is stored on three nodes (replication factor 3 by default): two on the same rack and one on a different rack. This redundancy enables robustness, fault detection and quick recovery.
5. Moving code to the data, rather than data to the code: This increases overall efficiency, since it is much better to perform the computation near the data and send back the results than to send the data itself. It also reduces network congestion, because large amounts of data are not transferred.
MapReduce serves two essential functions: it filters and parcels out work to the various nodes within the cluster (the map function, executed by the mapper), and it organizes and reduces the results from each node into a cohesive answer to a query (the reduce function, executed by the reducer).
How MapReduce works
JobTracker -- the master node that manages all the jobs and resources in a cluster;
TaskTrackers -- agents deployed to each machine in the cluster to run the map and reduce
tasks; and
JobHistory Server -- a component that tracks completed jobs and is typically deployed as
a separate function or with JobTracker.
12.What are the two main phases of the Map Reduce programming ? Explain through an
example.
The two main phases are the Map phase and the Reduce phase.
The Mapper phase takes the input as (K, V) pairs and emits intermediate (K, V) pairs. The framework's sort-and-shuffle step then groups these by key into Key and List-of-Values pairs (K, List(v)).
The Reducer phase takes (K, List(v)) as input and generates the output as (K, V). The Reducer phase output is the final output.
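For example, word count, the classic MapReduce illustration. The sketch below mimics the phases with plain Scala collections (it is not the Hadoop API, just an illustration of the data flow):

```scala
// Input: each element stands for one line of a file.
val input = Seq("deer bear river", "car car river", "deer car bear")

// Map phase: each line is split into (word, 1) pairs.
val mapped: Seq[(String, Int)] =
  input.flatMap(_.split(" ").map(w => (w, 1)))

// Sort-and-shuffle: pairs are grouped by key into (K, List(v)).
val shuffled: Map[String, Seq[Int]] =
  mapped.groupBy(_._1).map { case (k, pairs) => (k, pairs.map(_._2)) }

// Reduce phase: the list of values for each key is summed into the final (K, V).
val counts: Map[String, Int] =
  shuffled.map { case (k, vs) => (k, vs.sum) }
// counts contains car -> 3, deer -> 2, bear -> 2, river -> 2
```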
Hadoop YARN is the resource management and job scheduling technology in the open
source Hadoop distributed processing framework. One of Apache Hadoop's core components,
YARN is responsible for allocating system resources to the various applications running in
a Hadoop cluster and scheduling tasks to be executed on different cluster nodes.
In a cluster architecture, Apache Hadoop YARN sits between HDFS and the processing engines
being used to run applications. It combines a central resource manager with containers,
application coordinators and node-level agents that monitor processing operations in individual
cluster nodes. YARN can dynamically allocate resources to applications as needed, a capability
designed to improve resource utilization and application performance compared with
MapReduce's more static allocation approach.
In addition, YARN supports multiple scheduling methods, all based on a queue format for
submitting processing jobs. The default FIFO Scheduler runs applications on a first-in-first-out
basis, as reflected in its name. However, that may not be optimal for clusters that are shared by
multiple users. Apache Hadoop's pluggable Fair Scheduler tool instead assigns each job running
at the same time its "fair share" of cluster resources, based on a weighting metric that the
scheduler calculates.
Hadoop YARN also includes a Reservation System feature that lets users reserve cluster
resources in advance for important processing jobs to ensure they run smoothly. To avoid
overloading a cluster with reservations, IT managers can limit the amount of resources that can
be reserved by individual users and set automated policies to reject reservation requests that
exceed the limits.
14.What are the YARN daemons and what are their functions ?
Resource Manager:- Runs on the master node and manages the resource allocation in the cluster.
Node Manager:- They run on the slave nodes and are responsible for the execution of tasks on every single DataNode.
Application Master:- Manages the user job life cycle and the resource needs of an individual application. It works along with the Node Manager and monitors the execution of tasks.
15.Explain how Map Reduce, YARN and HDFS work together. Diagrams are not
necessary for the explanation, but the sequence of handshakes between the components
should be clearly explained.
To process any data, the client submits the data and the program to Hadoop. HDFS stores the data, MapReduce processes the data, and YARN divides and schedules the tasks.
• NameNode is the daemon running on the master machine. It stores the directory tree of all files in the file system and tracks where across the cluster the file data resides.
• The DataNode daemon runs on the slave nodes. It stores data in the Hadoop file system; in a functional file system, data is replicated across many DataNodes.
➢ MapReduce: The general idea of the MapReduce algorithm is to process the data in parallel
on your distributed cluster. It subsequently combines it into the desired result or output.
• Map: Map takes a set of data and converts it into another set of data, where individual
elements are broken down into tuples (key/value pairs).
• Reduce: which takes the output from a map as an input and combines those data tuples into a
smaller set of tuples.
➢ YARN: Yarn divides the task on resource management and job scheduling/monitoring into
separate daemons. Yarn supports the concept of Resource Reservation via Reservation System.
In this, a user can fix several resources for execution of a job over time and temporal constraints.
➢ In summary: input data is broken into blocks of size 128 MB and the blocks are moved to different nodes. Once all the blocks of the data are stored on DataNodes, the user can process the data. The ResourceManager then schedules the program (submitted by the user) on individual nodes. Once all the nodes have processed the data, the output is written back to HDFS.
16.What shortcomings of Hadoop motivated the development of Spark. How were the
shortcomings addressed by Spark ?
Low processing speed: MapReduce is well suited only to large batch workloads, so other workloads run slowly.
Latency: In Hadoop, the MapReduce framework is slower, since it supports different formats, structures, and huge volumes of data.
Lengthy code: Since Hadoop programs are written in Java, the code is lengthy, and this takes more time to write and execute.
These shortcomings are addressed by Spark as follows.
In-memory processing: In-memory processing is faster than Hadoop's approach, as no time is spent moving data and processes in and out of the disk.
Stream processing: Apache Spark supports stream processing, which involves continuous input and output of data. Stream processing is also called real-time processing.
Lower latency: Apache Spark is considerably faster than Hadoop, since it caches most of the input data in memory.
Lazy evaluation: Apache Spark starts evaluating only when it is absolutely needed. This plays an important role in contributing to its speed.
Fewer lines of code: Although Spark exposes both Scala and Java APIs, the implementation is in Scala, so the number of lines is relatively smaller in Spark compared with Hadoop.
17.Is Spark completely different from Hadoop ? If not, what is same between the two and
what is different ?
Spark can run on top of Hadoop and provides better computational speed; at the same time, there are various similarities between the two.
• Recovery: RDDs allow recovery of partitions on failed nodes in Spark by recomputing the DAG (directed acyclic graph), while Spark also supports a recovery style more similar to Hadoop's by way of checkpointing.
• Fault tolerance: Both have countermeasures for handling faults, so there is no need to restart the system.
• Speed: Since Spark works in memory, it runs much faster than Hadoop MapReduce.
• Real-time analysis: Spark enables real-time analysis of the data, while Hadoop MapReduce does not.
Like Hadoop, Spark is Apache open source, so there is no license cost.
Hardware cost is higher than for MapReduce: even though Spark can work on commodity hardware, it needs much more memory (RAM) than MapReduce, since for optimal performance it should be able to fit all the data in memory. The cluster therefore needs fairly high-end commodity hardware with lots of RAM, or performance suffers.
Machine Learning: Spark comes with an integrated framework for performing advanced
analytics that helps users run repeated queries on sets of data which essentially amounts to
processing machine learning algorithms.
Interactive analysis: Apache Spark is fast enough to perform exploratory queries without sampling. Spark also interfaces with a number of development languages, including SQL, R, and Python.
20.In the Scala REPL, type “3.” and then hit the TAB key. What do you see ? Note: do not
ignore the “.” after the 3.
It displays all the operations that can be performed on the Int, such as !=, +, <<, >>, abs, compareTo, getClass, isNaN, isWhole, round, toInt, %, -, ^, ==, /, *, <, > and many more.
21.Do the same as above by typing “Hello.” followed by the TAB key. Note: do not ignore
the “.” after “Hello”. Repeat by typing “Hello.s” and then applying the TAB key.
=> "Hello". followed by the TAB key displays all the operations that can be performed on Strings, such as *, ++, capitalize, contentEquals, flatten, charAt, lengthCompare, map, isBlank, max, min, exists, equals and many more.
"Hello".s followed by the TAB key displays all the operations on Strings that start with the letter "s", such as sameElements, scanLeft, seq, slice, sortBy, split, startsWith, substring, splitAt and many more.
22.In the Scala REPL, compute the square root of 3, and then square that value. What do
you observe and how can you explain your observation ?
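A REPL session (the exact trailing digits may vary slightly by platform, but the effect is the same):

```scala
val r = math.sqrt(3)  // 1.7320508075688772
val sq = r * r        // 2.9999999999999996, not exactly 3
```

Observation: squaring the square root of 3 does not give back exactly 3. Double values have finite binary precision, and sqrt(3) is irrational, so the stored root is already rounded; squaring that rounded value yields a result a tiny fraction below 3.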
23.Scala lets you multiply a string with a number. Try out "crazy" * 3 in the REPL. What does this operation do?
scala> "crazy" * 3
res0: String = crazycrazycrazy
The * operator repeats the string: "crazy" * 3 concatenates three copies of "crazy".
24.How do you get the first character of a string in Scala? The last character?
(Assuming some string s whose first character is 'W' and last character is 'e', e.g. val s = "White".)
scala> s.head
res0: Char = W
scala> s(0)
res1: Char = W
scala> s.last
res2: Char = e
scala> s(s.length - 1)
res3: Char = e
Lazy initialization is a technique that defers the creation of an object until the first time it is
needed. In other words, initialization of the object happens only on demand. We can improve the
application’s performance by avoiding unnecessary computation and memory consumption.
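A minimal sketch of the behaviour; the flag is only there to make the deferred evaluation visible:

```scala
var initialized = false

// The right-hand side does not run at declaration time...
lazy val words = { initialized = true; "expensive result" }

// ...so at this point `initialized` is still false.
val w = words // first access triggers the initialization
// `initialized` is now true; later accesses reuse the cached value.
```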
26.Write a Scala equivalent for the Java loop for (int i = 10; i >= 0; i--)
System.out.println(i);
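One equivalent, using a range with a negative step:

```scala
// Scala equivalent of: for (int i = 10; i >= 0; i--) System.out.println(i);
for (i <- 10 to 0 by -1)
  println(i)
```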
27. Write a procedure countdown(n: Int) that prints the numbers from n to 0.
=> Two equivalent definitions:
scala> def countdown(n: Int) {
     |   for (i <- (0 to n).reverse)
     |     println(i)
     | }
countdown: (n: Int)Unit
OR
scala> def countdown(n: Int) {
     |   for (i <- n to (0, -1))
     |     println(i)
     | }
countdown: (n: Int)Unit
scala> countdown(4)
4
3
2
1
0
scala> countdown(5)
5
4
3
2
1
0
28.Write a for loop for computing the product of the Unicode codes of all letters in a string.
For example, the product of the characters in "Hello" is 825152896.
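A sketch; note that with Int arithmetic the product of "Hello" overflows and wraps around to exactly the 825152896 quoted in the exercise, while Long arithmetic gives the true product:

```scala
// Product of the Unicode codes of all letters in "Hello", with Int overflow:
var product = 1
for (c <- "Hello") product *= c
// product == 825152896 (wrapped around)

// With Long, no overflow occurs:
var productL = 1L
for (c <- "Hello") productL *= c
// productL == 9415087488
```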
29.Write a function that computes x^n, where n is an integer. Use the following recursive
definition:
x^0 = 1.
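The printed definition is cut off after the base case; the standard full definition for this exercise is: x^n = y*y where y = x^(n/2) if n is even and positive; x * x^(n-1) if n is odd and positive; 1 if n = 0; and 1/x^(-n) if n is negative. A sketch:

```scala
// x^n by the recursive definition above.
def pow(x: Double, n: Int): Double =
  if (n == 0) 1
  else if (n < 0) 1 / pow(x, -n)
  else if (n % 2 == 0) { val y = pow(x, n / 2); y * y }
  else x * pow(x, n - 1)

pow(2, 10)  // 1024.0
pow(2, -2)  // 0.25
```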
31.Set up a map of gadgets that you want, along with their prices. Create a second map of
the gadgets with a 10% discount. Print the second map and inspect its contents.
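A sketch; the gadget names and prices are made-up sample values:

```scala
// Gadgets I want, with prices.
val gadgets = Map("iPhone" -> 800.0, "Kindle" -> 120.0, "Drone" -> 300.0)

// Second map: the same gadgets with a 10% discount.
val discounted = gadgets.map { case (name, price) => (name, price * 0.9) }

println(discounted)
// contains iPhone -> 720.0, Kindle -> 108.0, Drone -> 270.0
```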
32.Use a mutable map to count the number of times each word appears in a sentence
(provided as a string).
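A sketch using a mutable map; words are split on whitespace, and punctuation handling is omitted:

```scala
import scala.collection.mutable

// Count how many times each word appears in a sentence.
def wordCount(sentence: String): mutable.Map[String, Int] = {
  val counts = mutable.Map[String, Int]()
  for (word <- sentence.split("\\s+"))
    counts(word) = counts.getOrElse(word, 0) + 1
  counts
}

println(wordCount("to be or not to be"))
// contains to -> 2, be -> 2, or -> 1, not -> 1
```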
34. In the scala console, type “Hello”.zip(“World”). What does it do ? Give a scenario
where you
scala> "Hello".zip("World")
res38: scala.collection.immutable.IndexedSeq[(Char, Char)] = Vector((H,W), (e,o), (l,r), (l,l),
(o,d))
It pairs up corresponding characters of the two strings into tuples, stopping at the end of the shorter string. A scenario where this is useful: zipping a sequence of keys with a sequence of values and calling .toMap to build a Map.
Comprehension is the ability to understand something after processing text and grasping its meaning. Scala offers a lightweight notation for expressing sequence comprehensions. Comprehensions have the form for (enums) yield e, where enums refers to a semicolon-separated list of enumerators. An enumerator is either a generator, which introduces new variables, or a filter. It is similar to comprehension in that it filters to obtain certain data.
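For example, a comprehension with one generator and one filter:

```scala
// Generator: i <- 1 to 10; filter: if i % 2 == 0; yield builds the result.
val evensSquared = for (i <- 1 to 10 if i % 2 == 0) yield i * i
// Vector(4, 16, 36, 64, 100)
```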
36 .When does it make sense to use iterators ? Why is map / foreach usually a better option
?
It makes sense to use iterators when the collection is too big to fit completely in
memory, for example when processing large files.
map/foreach is usually a better option otherwise, because these methods visit the items one at a time for you, without the error-prone manual hasNext/next state that an explicit iterator requires.
37.Given an array of strings, find the sum of the length of all the strings.
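A sketch:

```scala
// Sum of the lengths of all strings in an array.
def totalLength(strings: Array[String]): Int = strings.map(_.length).sum

totalLength(Array("big", "data", "analytics"))  // 3 + 4 + 9 = 16
```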
38.Write a function that given a string, produces a map of the indexes of all the characters
as a list. For example, indexes(“Mississippi”) should return a map associating ‘M’ with the
List(0), ‘i’ with the List(1, 4, 7, 10) and so on. Use a mutable map of characters and
ListBuffer’s which are mutable, in place of List’s which are not mutable.
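A sketch:

```scala
import scala.collection.mutable
import scala.collection.mutable.ListBuffer

// Map each character to the ListBuffer of all indexes at which it occurs.
def indexes(s: String): mutable.Map[Char, ListBuffer[Int]] = {
  val result = mutable.Map[Char, ListBuffer[Int]]()
  for (i <- s.indices)
    result.getOrElseUpdate(s(i), ListBuffer[Int]()) += i
  result
}

println(indexes("Mississippi"))
// M -> ListBuffer(0), i -> ListBuffer(1, 4, 7, 10),
// s -> ListBuffer(2, 3, 5, 6), p -> ListBuffer(8, 9)
```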
39.Write a class Time with read only properties hours and minutes and a method
before(other: Time): Boolean that checks whether this time comes before the other. A
Time object should be constructed as new Time(hrs, min), where hrs is in 24 hour format
(0 to 23).
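A sketch:

```scala
// Time with read-only hours and minutes (val parameters generate getters only).
class Time(val hours: Int, val minutes: Int) {
  require(hours >= 0 && hours <= 23 && minutes >= 0 && minutes <= 59)

  // True if this time comes strictly before the other.
  def before(other: Time): Boolean =
    hours * 60 + minutes < other.hours * 60 + other.minutes
}

val t1 = new Time(9, 30)
val t2 = new Time(17, 5)
println(t1.before(t2))  // true
```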
scala> Conversions.inchesToCms(5)
res5: Double = 12.7
scala> Conversions.milesToKms(10)
res6: Double = 16.0934
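The definition of Conversions is not shown in these notes; an object consistent with the transcript above would be:

```scala
// Unit-conversion helpers (method names taken from the transcript above).
object Conversions {
  def inchesToCms(inches: Double): Double = inches * 2.54
  def milesToKms(miles: Double): Double = miles * 1.60934
}
```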
Getters are a technique through which we get the value of the variables of a class. There are two ways:
Getting the value of a public variable directly, by specifying the name of the variable on the object.
Getting the value of a variable through a method call on the object. This technique is used when the class variables are not directly accessible but public methods are available.
Example:
// Class with a private field plus getter/setter methods
class Student {
  var student_name = ""
  var student_age = 0
  private var student_rollno = 0
  // Class methods
  def set_rollno(x: Int) {
    student_rollno = x
  }
  // Getter
  def get_rollno(): Int = {
    return student_rollno
  }
}
// Creating object
object Main
{
  // Main method
  def main(args: Array[String])
  {
    // Class object
    var obj = new Student()
    obj.student_name = "Yash"
    obj.student_age = 22
    obj.set_rollno(59)
    // Read the private field through the getter
    println("Student Rollno: " + obj.get_rollno())
  }
}
Setters
Setters are a technique through which we set the value of variables of a class.
Setting a variable of a class is simple and can be done in two ways:
First, if the members of a class are accessible from anywhere, i.e. no access modifier is specified.
Example:
// Class with public members
class Student {
  var student_name = ""
  var student_age = 0
  var student_rollno = 0
}
// Creating object
object Main
{
  // Main method
  def main(args: Array[String])
  {
    // Class object
    var obj = new Student()
    obj.student_name = "Yash"
    obj.student_age = 22
    obj.student_rollno = 59
    println("Student Name: " + obj.student_name)
    println("Student Age: " + obj.student_age)
    println("Student Rollno: " + obj.student_rollno)
  }
}
Output:
Student Name: Yash
Student Age: 22
Student Rollno: 59
For security reasons this is not recommended: accessing the members of a class directly is not a good way to initialize and change values, since it exposes the variables to everyone.
Second, if the members of a class are defined as private, initialization of the variables is done by passing the values to a public method of that class, using an object of the class.
Example:
// Class with a private member and a public setter method
class Student {
  var student_name = ""
  var student_age = 0
  private var student_rollno = 0
  // Class method
  def set_roll_no(x: Int)
  {
    student_rollno = x
  }
}
// Creating object
object GFG
{
  // Main method
  def main(args: Array[String])
  {
    // Class object
    var obj = new Student()
    obj.student_name = "Yash"
    obj.student_age = 22
    obj.set_roll_no(59)
  }
}
43.Using pattern matching, write a function swap that swaps the first two elements of an
array provided its length is at least two. Refer to the example on List matching for hints.
scala> swap2(num)
res11: Array[Int] = Array(3, 4, 6, 7, 9)
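The definition behind this transcript is not shown; one possible pattern-matching implementation (named swap2 to match the transcript, with the input assumed to be Array(4, 3, 6, 7, 9)) is:

```scala
// Swap the first two elements of an array if it has at least two elements.
def swap2(a: Array[Int]): Array[Int] = a match {
  case Array(first, second, rest @ _*) => Array(second, first) ++ rest
  case _                               => a // fewer than two elements: unchanged
}

val num = Array(4, 3, 6, 7, 9)
swap2(num)  // Array(3, 4, 6, 7, 9)
```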
The key factor of functional programming is "immutable state". In other words, once a variable
is initialized with one value, it cannot be assigned a different value later. This is in contrast with
imperative programming, where it is common to re-assign variables to new values.
• All built-in operators and user defined functions are "pure functions"
A functional style can be used in most (though not all) imperative languages. To do so, the programmer must avoid mutable data by discipline, since the language itself continues to allow it. As a side note, modern software engineering principles generally advise functional-style code when possible, even in non-functional languages. So if you code in an imperative language, spend time learning a functional language -- you'll improve your habits and learn a new set of patterns to apply to solving problems. But in a language not designed for functional programming, the programmer would find it difficult to accomplish some straightforward tasks in purely functional terms, looping being a good example. To make functional coding easier, most functional languages provide features such as:
• Recursion
• Closures
• Currying