
Big Data Questions

1. What is meant by Big Data?


a. Big data is data that is too ‘big’ to be processed using traditional methods, because of
its sheer size, the high rate at which it is produced or its diversity (it can be structured,
semi-structured or unstructured).
2. Name the three defining characteristics of Big Data.
a. Velocity, Volume and Variety
3. Explain the meaning of the Big Data characteristic known as volume.
a. The volume of data that needs to be analysed is too large to fit on (or be processed by) a single server.
4. Explain how a Big Data processing system differs from a traditional data processing system.
a. Data is distributed across multiple servers and stored in much larger blocks than a
traditional file system uses. The program that processes the data must also be able to run
across multiple machines at once, rather than on a single computer.
5. What is meant by a distributed file system?
a. A distributed file system is one in which the blocks of each individual file are spread
across multiple servers, although the file still appears to the user as a single file.
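A minimal Python sketch of the idea, assuming a toy cluster: a file's bytes are split into fixed-size blocks and each block is copied to two servers chosen round-robin. The server names, the 8-byte block size and the replication factor of 2 are hypothetical values chosen for illustration; real systems such as HDFS use far larger blocks and more careful placement rules.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split a file's contents into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks: int, servers: list[str], replicas: int) -> dict[int, list[str]]:
    """Assign each block to `replicas` distinct servers, round-robin (hypothetical policy)."""
    if replicas > len(servers):
        raise ValueError("cannot have more replicas than servers")
    return {
        block_id: [servers[(block_id + r) % len(servers)] for r in range(replicas)]
        for block_id in range(num_blocks)
    }


blocks = split_into_blocks(b"hello distributed world", block_size=8)
layout = place_blocks(len(blocks), ["server-a", "server-b", "server-c"], replicas=2)
for block_id, block in enumerate(blocks):
    print(block_id, block, layout[block_id])
```

No single server ends up holding the whole file, which is what lets both storage and processing be spread across the cluster.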
6. What is meant by fault-tolerant?
a. A fault-tolerant system keeps working when one of its components fails. If one device
that is handling data fails, the rest of the system continues running as normal and the
failed device can simply be replaced. The system never relies on any single device,
because it is very important that data handling does not stop.
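One way fault tolerance can be achieved is replication: every block is stored on more than one device, so a read can fall back to another copy. The sketch below assumes a hypothetical three-server cluster held in a Python dictionary; it is an illustration, not how any particular system stores data.

```python
# Hypothetical cluster: the file "fault tolerant" is split into three blocks,
# and every block is stored on two different servers.
cluster = {
    "server-a": {0: b"fault ", 2: b"erant"},
    "server-b": {0: b"fault ", 1: b"tol"},
    "server-c": {1: b"tol", 2: b"erant"},
}


def read_file(cluster, failed, num_blocks):
    """Rebuild a file from its blocks, using any copy on a server that is still up."""
    parts = []
    for block_id in range(num_blocks):
        for name, blocks in cluster.items():
            if name not in failed and block_id in blocks:
                parts.append(blocks[block_id])
                break
        else:  # no working server holds this block
            raise OSError(f"block {block_id} lost: every copy is on a failed server")
    return b"".join(parts)


print(read_file(cluster, failed=set(), num_blocks=3))         # b'fault tolerant'
print(read_file(cluster, failed={"server-b"}, num_blocks=3))  # b'fault tolerant'
```

With two copies of every block, any one server can fail without losing data. Losing both servers that share a block (for example server-a and server-b, the only holders of block 0) would raise the error.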
7. Explain the meaning of the Big Data characteristic known as variety.
a. Data can appear in many different forms (structured, semi-structured or unstructured),
so each piece of data cannot necessarily be handled in the same way as the previous one.
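As an illustration, a processing step might choose how to handle each record based on its format. This is a minimal sketch using only the standard library; the `kind` labels and the sample records are hypothetical.

```python
import csv
import io
import json


def parse_record(kind: str, raw: str):
    """Handle each record according to its format (the 'kind' labels are hypothetical)."""
    if kind == "structured":       # fixed columns, e.g. a CSV row
        name, age = next(csv.reader(io.StringIO(raw)))
        return {"name": name, "age": int(age)}
    if kind == "semi-structured":  # self-describing but flexible, e.g. JSON
        return json.loads(raw)
    if kind == "unstructured":     # free text with no schema: extract a simple feature
        return {"word_count": len(raw.split())}
    raise ValueError(f"unknown kind: {kind}")


records = [
    ("structured", "Ada,36"),
    ("semi-structured", '{"name": "Alan", "tags": ["maths", "codes"]}'),
    ("unstructured", "Loved the show last night, great stage!"),
]
for kind, raw in records:
    print(kind, parse_record(kind, raw))
```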
8. Explain the meaning of the Big Data characteristic known as velocity.
a. Velocity is the rate at which data is being created or is ‘in motion’ (arriving and needing to be processed), often in real time.
9. Explain how machine learning is used in Big Data systems to leverage value in stored
datasets. Give real examples of two different types of machine learning Big Data systems.
a. Machine learning is used to find patterns in large datasets. One type fits a model to
data, such as a line of best fit (regression) when measuring temperature against growth
rate, which can then be used to predict new values. Another type predicts future
behaviour, such as a recommendation system that predicts which video a particular
person is most likely to watch next, based on their previous interests.
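The line-of-best-fit idea is ordinary least-squares linear regression. A minimal pure-Python sketch follows. The temperature and growth-rate readings are invented and chosen to lie exactly on a line, so the result is easy to check.

```python
def line_of_best_fit(xs, ys):
    """Ordinary least squares: return (gradient, intercept) minimising the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    gradient = sxy / sxx
    intercept = mean_y - gradient * mean_x
    return gradient, intercept


# Made-up readings: temperature (°C) against growth rate (mm/day).
temps = [10, 15, 20, 25, 30]
growth = [1.0, 2.0, 3.0, 4.0, 5.0]
m, c = line_of_best_fit(temps, growth)
print(m, c)
print(m * 22 + c)  # predicted growth rate at 22 °C
```

Once the gradient and intercept have been learned from past data, predicting a new value is a single calculation.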
10. Explain what is meant by
a. Immutability
i. The state of a data structure cannot be changed once it has been defined; any update produces a new data structure instead.
b. Statelessness
i. Functions do not rely on or change any state outside themselves, so a function
always returns the same value whenever it is called with the same arguments.
c. higher-order functions.
i. A function that takes another function as an argument and/or returns a function as its result.
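The three ideas can be sketched in Python, which is not a purely functional language but supports each of them. The `Reading` class, `to_fahrenheit`, `apply_to_all` and `make_scaler` are hypothetical names chosen for illustration, and a frozen dataclass stands in for an immutable data structure.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Reading:
    """Immutable record: its fields cannot be reassigned after creation."""
    sensor: str
    celsius: float


r = Reading("s1", 20.5)
try:
    r.celsius = 21.0  # attempting to change the state raises an error
except FrozenInstanceError:
    print("cannot modify an immutable Reading")

# To "update" it, build a new value and leave the original untouched.
r2 = Reading(r.sensor, 21.0)


def to_fahrenheit(celsius: float) -> float:
    """Stateless (pure): the result depends only on the argument; nothing outside is read or changed."""
    return celsius * 9 / 5 + 32


def apply_to_all(f, values):
    """Higher-order: takes the function f as an argument."""
    return [f(v) for v in values]


def make_scaler(k):
    """Higher-order: returns a new function."""
    return lambda x: x * k


print(apply_to_all(to_fahrenheit, [0, 100]))  # [32.0, 212.0]
double = make_scaler(2)
print(apply_to_all(double, [1, 2, 3]))        # [2, 4, 6]
```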
11. What features of functional programming make it easier to
a. write correct code
i. Functions have no side effects, so it is easy to see which parts of the code are
independent and which parts rely on other functions, and each function can be tested on its own.
b. distribute code to run across more than one machine?

i. Independent parts of the code do not share state, so they can be run in parallel on different machines without affecting each other's results.
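A MapReduce-style word count shows why pure functions distribute easily. Each chunk of text is counted independently and the partial counts are merged afterwards, so the order and location of the work do not matter. In this minimal sketch a thread pool stands in for separate machines (an assumption; a real cluster would send chunks over a network), and the sample text is made up.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def count_words(chunk: str) -> Counter:
    """Map step: a pure function of its input, so any chunk can be processed anywhere, in any order."""
    return Counter(chunk.lower().split())


def combine(a: Counter, b: Counter) -> Counter:
    """Reduce step: returns a new Counter and modifies neither input."""
    return a + b


chunks = [
    "big data needs big machines",
    "functional code is easy to distribute",
    "big ideas in small functions",
]

# Each worker stands in for a separate machine (an assumption for illustration).
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(count_words, chunks))

total = reduce(combine, partial_counts, Counter())
print(total.most_common(1))  # [('big', 3)]
```

Because `count_words` neither reads nor changes shared state, running it in parallel gives the same result as running it one chunk at a time.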
Other Questions
1.
2.
3.
4.

5.
a. The data has a large volume, a high velocity or a wide variety.
b. Functional programming
c. It is easy to see which parts of the code are independent of each other, and
independent code can be run in parallel on separate machines.
6.
a. CREATE TABLE Booking (
       BookingID int,
       ActName VARCHAR(50),
       StageName VARCHAR(50),
       Day VARCHAR(10),
       StartTime TIME,
       PRIMARY KEY (BookingID),
       FOREIGN KEY (ActName) REFERENCES Act(Name),
       FOREIGN KEY (StageName) REFERENCES Stage(Name)
   );

b. BookingID
c. ActName: it references the primary key (Name) of the Act table.
d. There is no redundant data; every table has a primary key; all data is atomic; and there
are no partial dependencies, no many-to-many relationships and no non-key (transitive) dependencies.
e. One-to-many from Stage to Booking
   One-to-many from Act to Booking
   One-to-many from Agent to Act

f. INSERT INTO Stage (Name, StageType, Curfew)
   VALUES ('Trapezoid', 'Comedy', '11 pm');

g. DELETE FROM Act WHERE ActType = 'fire eater';

h. SELECT A.Name, A.ActType, B.Day
   FROM Booking B
   JOIN Act A ON B.ActName = A.Name
   WHERE A.Agent = 'Paul Scott'
   ORDER BY B.Day DESC;


i. Create a new table, Restrictions, that links each act type to the stage it plays on.
Then, instead of entering the stage name manually when creating a booking, look up the
act's type in the Act table and take the stage from Restrictions.
j. A client-server database system should be used, so that the server can handle
problems such as concurrent access (for example, by locking records while they are being updated).
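Answers a and f to i can be tried out against a small in-memory SQLite database using Python's built-in `sqlite3` module. This is a minimal sketch under stated assumptions: the Act and Stage column lists are inferred from answers f to h, the `book` helper and every row of data (acts, agents, the 'Big Top' stage) are hypothetical, and SQLite is more permissive about column types than many other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Act (Name VARCHAR(50) PRIMARY KEY, ActType VARCHAR(30), Agent VARCHAR(50));
CREATE TABLE Stage (Name VARCHAR(50) PRIMARY KEY, StageType VARCHAR(30), Curfew VARCHAR(10));
CREATE TABLE Restrictions (
    ActType   VARCHAR(30) PRIMARY KEY,
    StageName VARCHAR(50) REFERENCES Stage(Name)
);
CREATE TABLE Booking (
    BookingID INTEGER PRIMARY KEY,
    ActName   VARCHAR(50) REFERENCES Act(Name),
    StageName VARCHAR(50) REFERENCES Stage(Name),
    Day       VARCHAR(10),
    StartTime TIME
);
INSERT INTO Act VALUES ('The Jugglers', 'circus', 'Paul Scott'),
                       ('Stand-Up Sam', 'comedy', 'Jo Bloggs');
INSERT INTO Stage VALUES ('Trapezoid', 'Comedy', '11 pm'), ('Big Top', 'Circus', '10 pm');
INSERT INTO Restrictions VALUES ('comedy', 'Trapezoid'), ('circus', 'Big Top');
""")


def book(booking_id, act_name, day, start_time):
    """Hypothetical helper: create a booking, looking up the stage from the act's type."""
    conn.execute("""
        INSERT INTO Booking (BookingID, ActName, StageName, Day, StartTime)
        SELECT ?, A.Name, R.StageName, ?, ?
        FROM Act A JOIN Restrictions R ON A.ActType = R.ActType
        WHERE A.Name = ?
    """, (booking_id, day, start_time, act_name))


book(1, "The Jugglers", "Saturday", "14:00")
book(2, "Stand-Up Sam", "Friday", "20:00")
book(3, "The Jugglers", "Friday", "12:00")

# Acts managed by Paul Scott, joined to their bookings.
rows = conn.execute("""
    SELECT A.Name, A.ActType, B.Day
    FROM Booking B
    JOIN Act A ON B.ActName = A.Name
    WHERE A.Agent = 'Paul Scott'
    ORDER BY B.Day DESC
""").fetchall()
print(rows)
```

Because Day is stored as text, `ORDER BY B.Day DESC` sorts alphabetically (Saturday before Friday), not in calendar order. Dropping the join condition would pair every booking with every act managed by Paul Scott, including bookings for other agents' acts.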
7.
a. See attached image
b. The quantity of data is too large to handle on a single server.
c. The data is being created, or needs to be analysed, at a high velocity.

The data has a wide variety, appearing as different types in structured, unstructured
and semi-structured formats.

d. Data cannot be changed once it has been defined (it is immutable). Because the data
stays the same, it can be grouped and analysed more easily, and parts of it can be processed
in parallel on different machines without conflicting updates, so a paradigm that uses
immutable data is very useful.
