
1) What is Big Data?

Big Data describes large volumes of data, both structured and unstructured. The term refers not simply to the amount of data but to the use of predictive analytics, user behavior analytics, and other advanced data analytics methods that extract value from data; it seldom refers to a particular size of data set. Challenges include capture, curation, storage, search, sharing, transfer, and analysis.

2) Where does Big Data come from?


There are three sources of Big Data:
a. Social Data: comes from social media channels' insights on consumer behavior.
b. Machine Data: consists of real-time data generated from sensors and weblogs; it tracks user behavior online.
c. Transaction Data: generated by large retailers and B2B companies on a frequent basis.

3) What is machine learning?


ML refers to computer-based techniques that seek to extract knowledge from large amounts of data without making any assumptions about the data's underlying probability distribution. ML requires massive amounts of data (Big Data) for "training." The data must be clean and free of biases and spurious values before being input to the ML model.

4) What are the different types of machine learning?


a. Supervised ML – machine learning that makes use of labeled training data. It is the process of training an algorithm to take a set of inputs X and find a model that best relates them to the output Y.
b. Unsupervised ML – machine learning that does not make use of labeled training data; the ML program has to discover structure within the data itself.
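A minimal sketch of the contrast, using toy data invented for this example (the values, variable names, and the simple least-squares/threshold methods are illustrative, not from the source): supervised learning fits inputs X to known labels Y, while unsupervised learning groups unlabeled values by structure it discovers itself.

```python
# Supervised: labeled pairs (x, y); fit y = a*x + b by least squares.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]  # the labels (outputs Y)

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
    sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x
print(f"supervised fit: y ~ {a:.2f}x + {b:.2f}")

# Unsupervised: no labels; the program discovers structure on its own
# (here, a single threshold split around the overall mean).
values = [1.1, 0.9, 1.0, 9.8, 10.2, 10.0]
center = sum(values) / len(values)
clusters = {"low": [v for v in values if v < center],
            "high": [v for v in values if v >= center]}
print("unsupervised clusters:", clusters)
```

The key difference is visible in the inputs: the supervised step consumes (x, y) pairs, while the unsupervised step never sees a label.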

5) What is ‘training set’ and ‘test set’ in a machine learning model? How much data will you
allocate for your training, validation, and test sets?
Training dataset – the labeled examples given to the model to analyze and learn from; it is used to identify relationships between inputs and outputs based on historical patterns in the data. Typically, 70% of the total data is allocated to training.
Test dataset – used to test the model's ability to predict well on new data. The remaining 30% is typically taken as the test dataset: the model predicts without seeing the labels, and the predictions are then verified against the held-out labels to check the accuracy of the hypothesis generated by the model.
A validation dataset, when used, is carved out of the training portion to tune model parameters and compare candidate models before the final test.
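The 70/30 split described above can be sketched in a few lines; the dataset and random seed below are made up for illustration.

```python
import random

# Hypothetical labeled dataset: (input, label) pairs.
data = [(x, 2 * x) for x in range(100)]

random.seed(42)       # fixed seed so the split is reproducible
random.shuffle(data)  # shuffle before splitting to avoid ordering bias

split = int(0.7 * len(data))
train_set = data[:split]  # 70%: model learns from these labeled examples
test_set = data[split:]   # 30%: labels held back until evaluation

print(len(train_set), len(test_set))  # 70 30
```

Shuffling before the split matters: if the data were ordered (say, by date), a naive head/tail split would train and test on systematically different samples.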

6) What are the five major aspects in data processing?


• Capture – refers to how the data are collected and transformed into a format that can be
used by the analytical process.
• Curation – refers to the process of ensuring data quality and accuracy through a data
cleaning exercise.
• Storage – refers to how the data will be recorded, archived, and accessed and the
underlying database design.
• Search – refers to how to query data.
• Transfer – refers to how the data will move from the underlying data source or storage
location to the underlying analytical tool.
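The five steps can be traced end to end with a toy pipeline; the CSV feed, table name, and query below are all hypothetical.

```python
import csv
import io
import sqlite3

raw = "ticker,price\nAAPL,101.5\nMSFT,\nGOOG,99.0\n"  # hypothetical feed

# Capture: collect the raw feed and parse it into usable records.
records = list(csv.DictReader(io.StringIO(raw)))

# Curation: clean the data - drop rows with missing prices, fix types.
clean = [{"ticker": r["ticker"], "price": float(r["price"])}
         for r in records if r["price"]]

# Storage: record the data in a database (in-memory here).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE prices (ticker TEXT, price REAL)")
db.executemany("INSERT INTO prices VALUES (?, ?)",
               [(r["ticker"], r["price"]) for r in clean])

# Search: query the stored data.
rows = db.execute("SELECT ticker FROM prices WHERE price > 100").fetchall()

# Transfer: hand the query result to the analytical tool.
print([t for (t,) in rows])  # ['AAPL']
```

Note how curation happens before storage: the MSFT row with a missing price never reaches the database, so every later step works with clean data.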

7) What is distributed ledger technology and consensus mechanism?


Distributed ledger technology (DLT) is an efficient means to create, exchange, and track ownership of financial assets on a
peer-to-peer basis. Entries are recorded, stored, and distributed across a network of
participants so that each participant has a matching copy of the digital database.
Basic elements of a DLT network include a digital ledger, a consensus mechanism used to confirm new entries, and a participant network. The consensus mechanism is the process by which the computer entities (or nodes) in a network agree on a common state of the ledger.

8) What are the benefits of using distributed ledger technology in finance?


Benefits include greater accuracy, transparency, and security in record keeping; faster
transfer of ownership; and peer-to-peer interactions. The records are considered
immutable, or unchangeable, yet they are transparent and accessible to network
participants on a near-real-time basis.

9) Discuss the mechanism of blockchain and its application of distributed ledger technology.
Blockchain is a type of digital ledger in which information is recorded sequentially within
blocks that are then linked or “chained” together and secured using cryptographic methods.
Each block contains a grouping of transactions (or entries) and a secure link (known as a
hash) to the previous block. New transactions are inserted into the chain only after
validation via a consensus mechanism in which authorized members agree on the
transaction and the preceding order, or history, in which previous transactions have
occurred.
The consensus mechanism used to verify a transaction includes a cryptographic problem
that must be solved by some computers on the network (known as miners) each time a
transaction takes place. The process to update the blockchain can require substantial
amounts of computing power, making it very difficult and extremely expensive for an
individual third party to manipulate historical data.
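The hash-linking and mining described above can be sketched with a toy blockchain. The difficulty target, field names, and transactions below are illustrative choices, not from any real protocol; real networks use far harder targets and richer block formats.

```python
import hashlib
import json

DIFFICULTY = "00"  # a valid block hash must start with these characters

def block_hash(block):
    # Hash a block's full contents deterministically.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def mine(transactions, prev_hash):
    # Search for a nonce that makes the hash meet the difficulty target -
    # this is the "cryptographic problem" solved by miners.
    block = {"tx": transactions, "prev": prev_hash, "nonce": 0}
    while not block_hash(block).startswith(DIFFICULTY):
        block["nonce"] += 1
    return block

# Each new block stores the hash of the previous one: the "chain".
chain = [mine(["genesis"], "0" * 64)]
chain.append(mine(["alice->bob: 5"], block_hash(chain[0])))
chain.append(mine(["bob->carol: 2"], block_hash(chain[1])))

def valid(chain):
    # Every stored link must match the recomputed hash of its predecessor.
    return all(chain[i]["prev"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print(valid(chain))                    # True
chain[1]["tx"] = ["alice->bob: 500"]   # attempted manipulation of history
print(valid(chain))                    # False
```

Changing one historical entry breaks every later link, so an attacker would have to re-mine the tampered block and all blocks after it, which is what makes manipulating history computationally expensive.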
