Digital Tech

1. Define Cloud Computing?

Cloud computing is a method for delivering information technology (IT) services in
which resources are retrieved from the Internet through web-based tools and
applications, as opposed to a direct connection to a server.

2. What are the uses of Cloud Computing?


Uses of cloud computing:
 Create new apps and services.
 Store, back up, and recover data.
 Host websites and blogs.
 Stream audio and video.

3. What are some of the top companies that provide cloud services?

 Amazon Web Services (AWS)
 Microsoft Azure.
 Google Cloud Platform.
 Adobe.
 VMware. 

4. Define Artificial Intelligence?


AI is the simulation of human intelligence processes by machines, especially
computer systems. These processes include learning (the acquisition of information
and rules for using the information), reasoning (using the rules to reach approximate
or definite conclusions), and self-correction.
 

5. Define Machine Learning?


Machine learning is a method of data analysis that automates analytical model
building. It is a branch of artificial intelligence based on the idea that systems can
learn from data, identify patterns and make decisions with minimal human
intervention.
 

6. List down the usage of AI and ML?

Both Artificial Intelligence and Machine Learning need a large amount of data to
perform their analysis.
This is where the concept of Big Data comes in; it helps us to store and retrieve
large amounts of data.
 

7. Define Deep Learning


Deep Learning is an artificial intelligence function that imitates the workings of the
human brain in processing data and creating patterns for use in decision making.

8. What are the characteristics of Deep Learning?

Deep learning is a subset of machine learning in Artificial Intelligence (AI) that has
networks capable of learning unsupervised from data that is unstructured or
unlabeled.
Also known as Deep Neural Learning or Deep Neural Network.
 

9. List down the applications of Deep Learning?


Applications:
 Automatic speech recognition
 Image recognition
 Customer relationship management
 Mobile advertising
 Image restoration
 

10. Define Big Data Analytics? 


Big Data is data of huge size, growing exponentially with time. In short, such data is
so large and complex that none of the traditional data management tools can store
or process it efficiently.
 

11. List down the characteristics of Big Data?


Huge in size and yet growing exponentially with time.
None of the traditional data management tools are able to store it or process it
efficiently.
 

12. Define Cyber Security?


Cybersecurity comprises technologies, processes, and controls that are designed to
protect systems, networks, and data from cyber-attacks.
Effective cybersecurity reduces the risk of cyberattacks.
 

13. What are the types of Cyberattacks?


 Malware attack
 Spoofing                
 Phishing
 

14. Define IoT?


The Internet of Things (IoT) is the network of physical devices, vehicles, home
appliances, and other items embedded with electronics, software, sensors,
actuators, and connectivity which enables these things to connect and exchange
data.
 

15. List down the applications of IoT?


 Smart homes
 Wearables (smartwatches)
 Manufacturing industries
 Transportation
 Agriculture
 Retail industries
 Healthcare, etc.

16. Define AR?


Augmented Reality (AR) is a general term for a collection of technologies used to
blend computer-generated information with the viewer’s natural senses.
 

17. List down the characteristics of AR?


Augmented Reality (AR) refers to deploying virtual images over real-world objects.
The overlay is executed simultaneously with the input received from a camera or
another input device like smart glasses.
 

18. List down the applications of AR?


 Games
 Movies
 Medical
 Advertising Media
 Shopping
 Interior Design etc.

19. What is the definition of Digital technology?


Digital technology includes all types of electronic equipment and applications that
use information in the form of numeric code.
Example: Personal computers, calculators, automobiles, etc.
 
What is Cloud Computing?
It is an advanced-stage technology implemented so that the cloud provides services
globally as per user requirements. It provides a method to access several servers
worldwide.
What are the benefits of cloud computing?
The main benefits of cloud computing are:
1. Data backup and storage of data
2. Powerful server capabilities.
3. Increased productivity.
4. Cost effective and time saving.
Companies like Amazon, which owns AWS; Microsoft, which owns Azure; VMware,
which provides cloud desktops; and Google, which provides various cloud solutions
like Google Drive, Slides/Docs and Google Cloud Platform, are examples of such
providers.
  What are the Cloud Service Models?  (Not relevant but it would be good if you know and you
tell it on your own somewhere in the interview; the interviewer may be impressed)
 Infrastructure as a Service (IaaS)
 Platform as a Service (PaaS)
 Software as a Service (SaaS)
  What is Digital Technology?
When TCS comes to your campus, they will give a presentation about TCS Digital.
According to TCS, Digital Technology is the blend of these five –
 Data
 Cloud
 Intelligence
 Interconnectivity
 Visual Computing
What is Artificial Intelligence?
Artificial intelligence (AI) makes it possible for machines to learn from experience,
adjust to new inputs and perform human-like tasks. Most AI examples that you hear
about today – from chess-playing computers to self-driving cars – rely heavily on
deep learning and natural language processing. Using these technologies,
computers can be trained to accomplish specific tasks by processing large amounts
of data and recognizing patterns in the data.
Why is artificial intelligence important?
 AI automates repetitive learning and discovery through data. But AI is different from
hardware-driven, robotic automation. Instead of automating manual tasks, AI
performs frequent, high-volume, computerized tasks reliably and without fatigue. For
this type of automation, human inquiry is still essential to set up the system and ask
the right questions.
 AI adds intelligence to existing products. In most cases, AI will not be sold as an
individual application. Rather, products you already use will be improved with AI
capabilities, much like Siri was added as a feature to a new generation of Apple
products. Automation, conversational platforms, bots and smart machines can be
combined with large amounts of data to improve many technologies at home and in
the workplace, from security intelligence to investment analysis.
 AI adapts through progressive learning algorithms to let the data do the
programming. AI finds structure and regularities in data so that the algorithm
acquires a skill: The algorithm becomes a classifier or a predictor. So, just as the
algorithm can teach itself how to play chess, it can teach itself what product to
recommend next online. And the models adapt when given new data. Back
propagation is an AI technique that allows the model to adjust, through training and
added data, when the first answer is not quite right.
 AI analyzes more and deeper data using neural networks that have many hidden
layers. Building a fraud detection system with five hidden layers was almost
impossible a few years ago. All that has changed with incredible computer power
and big data. You need lots of data to train deep learning models because they learn
directly from the data. The more data you can feed them, the more accurate they
become.
 AI achieves incredible accuracy through deep neural networks – which was
previously impossible. For example, your interactions with Alexa, Google Search and
Google Photos are all based on deep learning – and they keep getting more accurate
the more we use them. In the medical field, AI techniques from deep learning, image
classification and object recognition can now be used to find cancer on MRIs with
the same accuracy as highly trained radiologists.
What is Machine Learning?
Machine learning is an application of artificial intelligence (AI) that provides systems
the ability to automatically learn and improve from experience without being
explicitly programmed. Machine learning focuses on the development of computer
programs that can access data and use it to learn for themselves.

The process of learning begins with observations or data, such as examples, direct
experience, or instruction, in order to look for patterns in data and make better
decisions in the future based on the examples that we provide. The primary aim is to
allow computers to learn automatically without human intervention or assistance
and adjust actions accordingly.

What is Big Data?


Big data is a term that describes the large volume of data – both structured and
unstructured – that inundates a business on a day-to-day basis. But it’s not the
amount of data that’s important. It’s what organizations do with the data that
matters. Big data can be analyzed for insights that lead to better decisions and
strategic business moves.

Why Is Big Data Important?


The importance of big data doesn’t revolve around how much data you have, but
what you do with it. You can take data from any source and analyze it to find
answers that enable 1) cost reductions, 2) time reductions, 3) new product
development and optimized offerings, and 4) smart decision making. When you
combine big data with high-powered analytics, you can accomplish business-related
tasks such as:
 Determining root causes of failures, issues and defects in near-real time.
 Generating coupons at the point of sale based on the customer’s buying habits.
 Recalculating entire risk portfolios in minutes.
 Detecting fraudulent behavior before it affects your organization.

What is deep Learning?


Deep learning is a machine learning technique that teaches computers to do what
comes naturally to humans: learn by example. Deep learning is a key technology
behind driverless cars, enabling them to recognize a stop sign, or to distinguish a
pedestrian from a lamppost. It is the key to voice control in consumer devices like
phones, tablets, TVs, and hands-free speakers. Deep learning is getting lots of
attention lately and for good reason. It’s achieving results that were not possible
before.

What is cyber security?


Cybersecurity is the protection of internet-connected systems, including hardware,
software and data, from cyberattacks.
In a computing context, security comprises cybersecurity and physical security —
both are used by enterprises to protect against unauthorized access to data centers
and other computerized systems. Information security, which is designed to
maintain the confidentiality, integrity and availability of data, is a subset of
cybersecurity.
What are different types of cyber attacks? (Not
Important but good to know)
Malware 

 If you’ve ever seen an antivirus alert pop up on your screen, or if you’ve
mistakenly clicked a malicious email attachment, then you’ve had a close call
with malware. Attackers love to use malware to gain a foothold in users’
computers—and, consequently, the offices they work in—because it can be so
effective.
 “Malware” refers to various forms of harmful software, such as viruses
and ransomware. Once malware is in your computer, it can wreak all sorts of
havoc, from taking control of your machine, to monitoring your actions and
keystrokes, to silently sending all sorts of confidential data from your
computer or network to the attacker’s home base.
 Attackers will use a variety of methods to get malware into your computer, but
at some stage it often requires the user to take an action to install the
malware. This can include clicking a link to download a file, or opening an
attachment that may look harmless (like a Word document or PDF
attachment), but actually has a malware installer hidden within.
Phishing 

 Of course, chances are you wouldn’t just open a random attachment or click
on a link in any email that comes your way—there has to be a compelling
reason for you to take action. Attackers know this, too. When an attacker
wants you to install malware or divulge sensitive information, they often turn
to phishing tactics, or pretending to be someone or something else to get you
to take an action you normally wouldn’t. Since they rely on human curiosity
and impulses, phishing attacks can be difficult to stop.
 In a phishing attack, an attacker may send you an email that appears to be
from someone you trust, like your boss or a company you do business with.
The email will seem legitimate, and it will have some urgency to it (e.g.
fraudulent activity has been detected on your account). In the email, there will
be an attachment to open or a link to click. Upon opening the malicious
attachment, you’ll thereby install malware in your computer. If you click the
link, it may send you to a legitimate-looking website that asks for you to log in
to access an important file—except the website is actually a trap used to
capture your credentials when you try to log in.
 In order to combat phishing attempts, understanding the importance of
verifying email senders and attachments/links is essential.
SQL Injection Attack 

 SQL (pronounced “sequel”) stands for structured query language; it’s a
programming language used to communicate with databases. Many of the
servers that store critical data for websites and services use SQL to manage
the data in their databases. A SQL injection attack specifically targets this
kind of server, using malicious code to get the server to divulge information it
normally wouldn’t. This is especially problematic if the server stores private
customer information from the website, such as credit card numbers,
usernames and passwords (credentials), or other personally identifiable
information, which are tempting and lucrative targets for an attacker.
 An SQL injection attack works by exploiting any one of the known SQL
vulnerabilities that allow the SQL server to run malicious code. For example, if
a SQL server is vulnerable to an injection attack, it may be possible for an
attacker to go to a website’s search box and type in code that would force the
site’s SQL server to dump all of its stored usernames and passwords for the
site.

What is IoT, i.e. the Internet of Things?


Internet of Things (IoT) is an ecosystem of connected physical objects that are
accessible through the internet. The ‘thing’ in IoT could be a person with a heart
monitor or an automobile with built-in-sensors, i.e. objects that have been assigned
an IP address and have the ability to collect and transfer data over a network without
manual assistance or intervention. The embedded technology in the objects helps
them to interact with internal states or the external environment, which in turn
affects the decisions taken.

 Applications include –
 Smart homes
 Wearables (smartwatches)
 Manufacturing industries
 Transportation
 Agriculture
 Retail Industries
 Healthcare etc.
AR vs VR vs MR
A lot of people use the term “virtual reality” for different types of “Immersive
Experiences”. This includes augmented and mixed reality as well as 360° video.
Although they each offer alternate or altered reality experiences, they are quite
different, and the technologies are too frequently confused with one another.

1. Virtual Reality
2. Augmented Reality
3. Mixed Reality
4. 360° Video
Virtual Reality
In its simplest form, Virtual Reality (VR) transposes the user to an alternate world.
The real world which the user is in does not exist. This is done through live video or
computer-generated graphics and uses closed head-mounted displays (HMDs) that
completely blind the user from seeing anything in the real world. The Oculus Rift,
PlayStation VR, HTC Vive, Google Daydream, and Gear VR are examples of HMDs.
Non-Interactive VR:
With non-interactive VR applications and content, the user is a spectator in another
world. They sit back and can look anywhere as if they were there. But they cannot
interact (other than point and click). They are still fully immersed though and with
added components such as immersive audio, the user is fully engulfed into a
different reality thus altering the user’s senses, etc.
Types of Non-Interactive VR:
Experience — These are experiences which allow the user to feel actively involved
and engaged but still in an entirely passive role.
 Examples of Experience VR:  Oculus Dreamdeck, G2A Land, Face Your Fears,
Everest VR
 Storytelling and Story Enabling — Examples are movies, short stories and
narrative pieces. This is another important subject regarding the art of
storytelling vs. story enabling with VR. Both deal with a certain plot the author
has introduced, but while storytelling gives no control to the user, story
enabling allows the user to interact with the story; thus giving them choices
and some amounts of freedom to engage.
 It is argued that this type of “VR Cinema” is yet to be discovered because
current storytellers are unable to see beyond conventional cinematic
processes which dictate every part of how the story is told. Creators of
cinema content for the virtual realm are yet to be born.
Interactive VR

 This type of experience gives the user interactive abilities while in their
alternate world. Users can fully immerse themselves in their alternate realities
by moving forward or backwards, sideways, up and down. This immersion
expands the user’s senses and they can also interact with objects by holding,
throwing, pushing and pulling.
 Most of this is not real-world but instead computer generated using high
powered game engine PC’s that allow for real time rendering.
Augmented Reality
 This adds to our reality. It supplements the real world with digital objects. It
does not take us elsewhere but instead enhances our present. It literally
“augments” our reality instead of blocking out the world.
 With AR, computer generated graphics overlay the current reality and provide
enhancing data that can be used regularly in day-to-day life. Examples of AR
have been seen in movies for quite some time such as The Terminator,
Minority Report and others.
 The digital object overlays can be text data, 3D objects or video such as with
Google Glass notifications or on the head-up displays (HUD) in cars which
provide valuable information to a driver.
 As Tim Cook, CEO of Apple said, “AR allows individuals to be present in the
world but hopefully allows an improvement on what’s happening presently”.
Mixed Reality
 MR is a mixture of VR and AR where virtual objects interact with real world
objects. An example would be if you placed a virtual object (a cup for
example) onto a real-world object (a table). The cup would remain in that
same position as you walk or change locations. Basically, the virtual object
attaches itself to the real-world object and becomes part of the real world.
Take a look at the Magic Beans demo or Bridge Engine Demo for awesome
examples.
 Examples of Mixed Reality: Microsoft HoloLens, ODG headsets, Google
Glass, Magic Beans Demo
360° Content
 360° content can be easily created using the plethora of 360° cameras on the
market today. Typically, this is a two-step process where multiple cameras or
lenses capture images from different angles, which are then stitched together
to create a single image that can be projected into a 360° environment. There
are many challenges in filming and stitching 360° video.
 360° content utilizes “live” video, or even pre-rendered computer generated
graphics. It can be of concerts, car rides, drone videos, and so much more.
Is 360° content VR?
(The main argument is that if the user views 360° content within a VR headset, like
the Oculus Rift, is it not technically VR because the user is then immersed into an
alternate world with no visibility of the “real world”?)
No. This isn’t VR.
Why?
 Well, because true VR utilizes sensors to track your head movements giving
you the illusion that you are in this alternate world. When your head moves,
the view of the world moves as well, affecting both your subconscious and
conscious mind. It also tracks your position in space and requires highly
powered computers with head-mounted displays such as the Rift, Vive or PS
VR to give precise displays of individual frames that accurately match the
head’s position.
 With 360° content, you are not fully immersed. You can look up, down and
around, but you can’t move forward. And frame rates are nowhere near
comparable, therefore not tricking your brain into really believing you are in
another world.
 360° content at its most basic can be viewed without a headset on
applications such as YouTube, Facebook posts or through websites requiring
mouse movements to navigate.
Data Science and Machine Learning

What are the differences between supervised and unsupervised learning?

Supervised Learning:
 Uses known and labeled data as input
 Has a feedback mechanism
 The most commonly used supervised learning algorithms are decision trees,
logistic regression, and support vector machines

Unsupervised Learning:
 Uses unlabeled data as input
 Has no feedback mechanism
 The most commonly used unsupervised learning algorithms are k-means
clustering, hierarchical clustering, and the apriori algorithm

Explain the steps in making a decision tree.


1. Take the entire data set as input
2. Calculate entropy of the target variable, as well as the predictor attributes
3. Calculate the information gain of all attributes (we gain information on
sorting different objects from each other)
4. Choose the attribute with the highest information gain as the root node 
5. Repeat the same procedure on every branch until the decision node of each
branch is finalized
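A minimal sketch of steps 2 and 3, assuming the target variable is a simple list of
class labels (the sample data below is hypothetical):

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a list of class labels (step 2)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    # Entropy of the parent minus the weighted entropy of the
    # child groups produced by splitting on an attribute (step 3)
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

target = ["yes", "yes", "no", "no", "yes", "no"]
split = [["yes", "yes", "yes"], ["no", "no", "no"]]  # a perfect split
print(information_gain(target, split))  # 1.0: maximal gain, best root node (step 4)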

How do you build a random forest model?


A random forest is built up of a number of decision trees. If you split the data into
different packages and make a decision tree in each of the different groups of data,
the random forest brings all those trees together.
Steps to build a random forest model:
1. Randomly select 'k' features from a total of 'm' features where k << m
2. Among the 'k' features, calculate the node D using the best split point
3. Split the node into daughter nodes using the best split
4. Repeat steps two and three until leaf nodes are finalized 
5. Build forest by repeating steps one to four for 'n' times to create 'n' number of
trees 
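A minimal sketch with scikit-learn, assuming it is installed (the iris data set is only
an example); max_features plays the role of 'k' and n_estimators the role of 'n':

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
# 100 trees, each split considering a random subset of sqrt(m) features
model = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))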
How can you avoid overfitting your model?
Overfitting refers to a model that fits a very small amount of data too closely and
ignores the bigger picture. There are three main methods to avoid overfitting:
1. Keep the model simple—take fewer variables into account, thereby removing
some of the noise in the training data
2. Use cross-validation techniques, such as k folds cross-validation 
3. Use regularization techniques, such as LASSO, that penalize certain model
parameters if they're likely to cause overfitting
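A minimal sketch of methods 2 and 3 together, assuming scikit-learn is available (the
generated data set is hypothetical): k-fold cross-validation scores a LASSO-regularized
model whose alpha penalty shrinks noisy coefficients toward zero.

from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=50, noise=10, random_state=0)
model = Lasso(alpha=1.0)  # larger alpha = stronger penalty, simpler model
scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validation
print(scores.mean())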

Differentiate between univariate, bivariate, and multivariate analysis.


Univariate
Univariate data contains only one variable. The purpose of the univariate analysis is
to describe the data and find patterns that exist within it. 
Bivariate
Bivariate data involves two different variables. The analysis of this type of data deals
with causes and relationships and the analysis is done to determine the relationship
between the two variables.
Multivariate
When data involves three or more variables, it is categorized as multivariate. It is
similar to bivariate analysis but contains more than one dependent variable.

You are given a data set consisting of variables with more than 30 percent
missing values. How will you deal with them?
The following are ways to handle missing data values:
If the data set is large, we can simply remove the rows with missing data values.
This is the quickest way, and we use the rest of the data to predict the values.
For smaller data sets, we can substitute missing values with the mean or average of
the rest of the data using a pandas DataFrame in Python. There are different ways
to do so, such as df.mean() and df.fillna(df.mean()).
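A minimal sketch of both strategies with pandas (the DataFrame below is
hypothetical); note that fillna needs the computed means passed to it:

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 32, np.nan, 40],
                   "salary": [50000, 60000, np.nan, 80000, 90000]})

dropped = df.dropna()           # large data set: remove rows with missing values
imputed = df.fillna(df.mean())  # small data set: substitute the column mean
print(imputed)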

What are dimensionality reduction and its benefits?


Dimensionality reduction refers to the process of converting a data set with vast
dimensions into data with fewer dimensions (fields) to convey similar information
concisely. 
This reduction helps in compressing data and reducing storage space. It also
reduces computation time as fewer dimensions lead to less computing. It removes
redundant features; for example, there's no point in storing a value in two different
units (meters and inches). 
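As a sketch, assuming scikit-learn is available, PCA (one common dimensionality
reduction technique) can compress four fields into two components that retain most
of the variance:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)               # 150 rows x 4 dimensions
reduced = PCA(n_components=2).fit_transform(X)  # 150 rows x 2 dimensions
print(reduced.shape)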

What are recommender systems?


A recommender system predicts what a user would rate a specific product based on
their preferences. It can be split into two different areas:
Collaborative Filtering
As an example, Last.fm recommends tracks that other users with similar interests
play often. This is also commonly seen on Amazon after making a purchase;
customers may notice the following message accompanied by product
recommendations: "Users who bought this also bought…"
Content-based Filtering
As an example: Pandora uses the properties of a song to recommend music with
similar properties. Here, we look at content, instead of looking at who else is
listening to music.

How can you select k for k-means? 


We use the elbow method to select k for k-means clustering. The idea of the elbow
method is to run k-means clustering on the data set for a range of values of 'k' (the
number of clusters).
The within-cluster sum of squares (WSS) is defined as the sum of the squared
distances between each member of a cluster and its centroid; we pick the 'k' at
which the WSS curve starts to flatten.
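A minimal sketch assuming scikit-learn, where the WSS is exposed as KMeans'
inertia_ attribute (the blob data is hypothetical):

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
for k in range(1, 8):
    wss = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(wss, 1))  # WSS drops sharply up to k=4, then flattens: the elbow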

How can you calculate accuracy using a confusion matrix?


The formula for accuracy is:
Accuracy = (True Positive + True Negative) / Total Observations
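A quick worked example with hypothetical counts from a 2x2 confusion matrix:

tp, tn, fp, fn = 50, 35, 5, 10  # true/false positives and negatives
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)  # (50 + 35) / 100 = 0.85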

Which of the following machine learning algorithms can be used for imputing
missing values of both categorical and continuous variables?
 K-means clustering
 Linear regression 
 K-NN (k-nearest neighbor)
 Decision trees 
The k-nearest neighbor algorithm can be used because it can compute the nearest
neighbors; if a value is missing, it imputes it from the nearest neighbors found using
all the other features.
When you're dealing with K-means clustering or linear regression, you need to handle
missing values in your pre-processing; otherwise, they'll crash. Decision trees also
have the same problem, although there is some variance.
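A minimal sketch assuming scikit-learn's KNNImputer (the tiny array is hypothetical):
each missing entry is filled in from the nearest neighbors on the remaining features.

import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 6.0], [np.nan, 8.0]])
print(KNNImputer(n_neighbors=2).fit_transform(X))  # NaNs replaced by neighbor averages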

Explain cross-validation.
Cross-validation is a model validation technique for evaluating how the outcomes of
a statistical analysis will generalize to an independent data set. It is mainly used in
backgrounds where the objective is to forecast and one wants to estimate how
accurately a model will perform in practice.
The goal of cross-validation is to term a data set to test the model in the training
phase (i.e. validation data set) to limit problems like overfitting and gain insight into
how the model will generalize to an independent data set.
Cross-validation is also a model performance improvement technique. It is a
statistics-based approach in which the model is trained and tested in rotation within
the training dataset so that the model can perform well on unknown or test data.
In this approach, the training data are split into different groups, and in rotation
those groups are used for validation of model performance.
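A minimal sketch of this rotation, assuming scikit-learn (the model and data set are
only examples): KFold splits the training data into 5 groups, and each group takes a
turn as the validation set while the rest train the model.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = load_iris(return_X_y=True)
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    print(round(model.score(X[val_idx], y[val_idx]), 3))  # one score per rotation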

What are the drawbacks of the linear model?


 The assumption of linearity of the errors
 It can't be used for count outcomes or binary outcomes
 There are overfitting problems that it can't solve

What are eigenvalue and eigenvector?


Eigenvectors are the directions along which a particular linear transformation acts
by flipping, compressing, or stretching; they help us understand linear
transformations. Eigenvalues are the factors by which the transformation stretches
or compresses along those directions. In data analysis, we usually calculate the
eigenvectors of a correlation or covariance matrix.
What is Linear Regression?
The question can also be phrased as to why linear regression is not a very effective
algorithm.
Linear Regression is a mathematical relationship between an independent and a
dependent variable. The relationship is a direct proportion, making it the simplest
relationship between the variables.
Y = mX+c
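A quick sketch recovering m and c from noisy hypothetical data with NumPy's
least-squares polyfit:

import numpy as np

X = np.arange(10)
Y = 3 * X + 2 + np.random.default_rng(0).normal(0, 0.5, 10)  # true m=3, c=2
m, c = np.polyfit(X, Y, deg=1)  # least-squares line of best fit
print(m, c)  # close to 3 and 2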
What is Logistic Regression?
Logistic Regression is used for binary classification. It is a statistical model that
applies the logistic function to produce a probability, which is then thresholded to
give 0 or 1 as a result.
Difference between Regression and Classification?
The major difference between Regression and Classification is that Regression results in
a continuous quantitative value while Classification is predicting the discrete labels.
However, there is no clear line that draws the difference between the two. We have a few
properties of both Regression and Classification. These are as follows:
Regression
 Regression predicts the quantity.
 We can have discrete as well as continuous values as input for regression.
 If input data are ordered with respect to the time it becomes time series
forecasting.
Classification
 The Classification problem for two classes is known as Binary Classification.
 Classification can be split into Multi-Class Classification or Multi-Label
Classification.
 We focus more on accuracy in Classification while we focus more on the error
term in Regression.
What is Natural Language Processing?  State some real life example of NLP.
Natural Language Processing is a branch of Artificial Intelligence that deals with the
conversion of human language into machine-understandable language so that it can
be processed by ML models.
Examples – NLP has many practical applications, including chatbots, Google
Translate, and many other real-time applications like Alexa.
Some of the other applications of NLP are in text completion, text suggestions, and
sentence correction.

What do you understand by Confusion Matrix ? How does Confusion Matrix help in
evaluating model performance?
Confusion Matrix is a matrix to find the performance of a Classification model. It is in
general a 2×2 matrix with one side as prediction and the other side as actual values.
We can find different accuracy measures using a confusion matrix. These parameters
are Accuracy, Recall, Precision, F1 Score, and Specificity.
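A minimal sketch with scikit-learn on hypothetical actual and predicted labels:

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             f1_score, precision_score, recall_score)

actual    = [1, 0, 1, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 1, 0]
print(confusion_matrix(actual, predicted))  # 2x2: actual values vs predictions
print(accuracy_score(actual, predicted),    # the accuracy measures derived from it
      precision_score(actual, predicted),
      recall_score(actual, predicted),
      f1_score(actual, predicted))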
What is the significance of Sampling?
For large datasets, we cannot proceed with the whole volume of data at once. We
need to take samples from the data which can represent the whole population.
While making a sample out of the complete data, we should take data which is a
true representative of the whole data set.
What are Type 1 and Type 2 errors? In which scenarios the Type 1 and Type 2 errors
become significant?
Rejection of a true null hypothesis is known as a Type 1 error. In simple terms, false
positives are known as Type 1 errors.
Not rejecting a false null hypothesis is known as a Type 2 error. False negatives are
known as Type 2 errors.
Type 1 error is significant in cases where a false positive is costly. For example, if a
man who is not suffering from a particular disease is marked as positive for that
infection, the medications given to him might damage his organs.
Type 2 error is significant in cases where a false negative is costly. For example, an
alarm has to be raised in case of a burglary in a bank; if the system identifies it as a
false case, the alarm won't be raised on time, resulting in a heavy loss.
What are the conditions for Overfitting and Underfitting?
In overfitting, the model performs well on the training data but fails to generalize to
any new data. In underfitting, the model is very simple and not able to identify the
correct relationship. The following are the bias and variance conditions.
Overfitting – Low bias and high variance result in an overfitted model. Decision
trees are more prone to overfitting.
Underfitting – High bias and low variance. Such a model doesn't perform well on
test data either. For example, Linear Regression is more prone to underfitting.
Describe Decision tree Algorithm?
Decision tree is a Supervised Machine Learning approach. It uses previously labeled
decision data to prepare a model. It follows a systematic approach to identify
patterns and predict the classes or output variable from previous outputs.
What is Ensemble Learning. Give an important example of Ensemble Learning?
Ensemble Learning is a process of accumulating multiple models to form a better
prediction model. In Ensemble Learning the performance of the individual model
contributes to the overall development in every step. There are two common techniques
in this – Bagging and Boosting.
Bagging – In this the data set is split to perform parallel processing of models and
results are accumulated based on performance to achieve better accuracy.
Boosting – This is a sequential technique in which a result from one model is passed to
another model to reduce error at every step making it a better performance model.
The most important example of Ensemble Learning is the Random Forest Classifier.
It combines multiple Decision Trees to form a better-performing Random Forest
model.
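A minimal sketch contrasting the two techniques, assuming scikit-learn (the data set
is only an example): BaggingClassifier trains trees in parallel on random splits of the
data, while AdaBoostClassifier trains them sequentially, reweighting the errors of the
previous step.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
for name, model in [("bagging", BaggingClassifier(n_estimators=50, random_state=0)),
                    ("boosting", AdaBoostClassifier(n_estimators=50, random_state=0))]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))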
 Explain Naive Bayes Classifier and the principle on which it works?
Naive Bayes Classifier algorithm is a probabilistic model. This model works on the Bayes
Theorem principle.  The accuracy of Naive Bayes can be increased significantly by
combining it with other kernel functions for making a perfect Classifier.
Bayes Theorem – This is a theorem which explains conditional probability. If we
need to identify the probability of occurrence of Event A given that Event B has
already occurred, such cases are known as Conditional Probability.
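A quick worked example of Bayes Theorem, P(A|B) = P(B|A) * P(A) / P(B), with
hypothetical numbers: a test that is 99% sensitive for a disease affecting 1% of
people, with a 5% false-positive rate.

p_a = 0.01                       # P(disease)
p_b_given_a = 0.99               # P(positive | disease)
p_b = 0.99 * 0.01 + 0.05 * 0.99  # P(positive), by total probability
print(p_b_given_a * p_a / p_b)   # P(disease | positive) is only about 0.167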
What is Deep Learning?
Deep Learning is the branch of Machine Learning and AI which tries to achieve
better accuracy and is able to build complex models. Deep Learning models are
structured like the human brain, with an input layer, hidden layers, activation
functions, and an output layer.
Deep Learning has many real-time applications –
 Self-driving cars
 Computer vision and image processing
 Real-time chatbots
 Home automation systems

What is the difference between data science and big data?
The common differences between data science and big data are –

Big Data:
 A large collection of data sets that cannot be stored in a traditional system
 Popular in the fields of communication, purchase and sale of goods, financial
services, and the educational sector
 Solves problems related to data management and handling, and analyzes
insights resulting in informed decision making
 Popular tools are Hadoop, Spark, Flink, NoSQL, Hive, etc.

Data Science:
 An interdisciplinary field that includes analytical aspects, statistics, data mining,
machine learning, etc.
 Common applications are digital advertising, web research, recommendation
systems (Netflix, Amazon, Facebook), and speech and handwriting recognition
applications
 Uses machine learning algorithms and statistical methods to obtain accurate
predictions from raw data
 Popular tools are Python, R, SAS, SQL, etc.

What are Interpolation and Extrapolation?


Interpolation – This is the method of estimating data points between known data
points. It is a prediction within the range of the given data points.
Extrapolation – This is the method of estimating data points beyond the known
data points. It is a prediction beyond the range of the given data points.
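A quick sketch with NumPy (the points are hypothetical, lying on y = 2x):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

print(np.interp(2.5, x, y))     # interpolation: 5.0, inside the given points
m, c = np.polyfit(x, y, deg=1)  # fit a line to the known points
print(m * 10 + c)               # extrapolation: 20.0, beyond the given points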
What is the difference between ‘expected value’
and ‘average value’?
When it comes to functionality, there is no difference between the two. However,
they are used in different situations.
An expected value usually reflects random variables, while the average value reflects
the population sample.
 What is the importance of statistics in data
science?
Statistics help data scientists to get a better idea of a customer’s expectations.
Using statistical methods, data scientists can acquire knowledge about consumer
interest, behavior, engagement, retention, etc. Statistics also help to build robust
data models and to validate certain inferences and predictions.

What is association analysis? Where is it used?


Ans. Association analysis is the task of uncovering relationships among data. It is
used to understand how the data items are associated with each other.
What is an API? What are APIs used for?
Ans. API stands for Application Program Interface and is a set of routines, protocols,
and tools for building software applications.
With API, it is easier to develop software applications.
What is market basket analysis?
Ans. Market Basket Analysis is a modeling technique based upon the theory that if
you buy a certain group of items, you are more (or less) likely to buy another group of
items.
What is the goal of A/B Testing?
Ans. A/B testing is a comparative study, where two or more variants of a page are
presented before random users and their feedback is statistically analyzed to check
which variation performs better.
Explain the purpose of group functions in SQL. Cite
certain examples of group functions.
Ans. Group functions provide summary statistics of a data set. Some examples of
group functions are –
a) COUNT
b) MAX
c) MIN
d) AVG
e) SUM
f) DISTINCT
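A minimal sketch running these group functions through Python's built-in sqlite3
module (the table and rows are hypothetical):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 100), ("north", 150), ("south", 200)])

query = """SELECT region, COUNT(*), MAX(amount), MIN(amount),
                  AVG(amount), SUM(amount)
           FROM sales GROUP BY region"""
for row in conn.execute(query):
    print(row)  # one summary row of statistics per region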
What is Root Cause Analysis?
Ans. Root Cause is defined as a fundamental failure of a process. To analyze such
issues, a systematic approach has been devised that is known as Root Cause
Analysis (RCA). This method addresses a problem or an accident and gets to its
“root cause”.
What is the difference between a Validation Set
and a Test Set?
Ans. The validation set is used to minimize overfitting. It is used in parameter
selection, which means that it helps to verify any accuracy improvement over the
training data set. The test set is used to test and evaluate the performance of a
trained Machine Learning model.
What packages are used for data mining in Python
and R?
Ans. There are various packages in Python and R:
Python – Pandas, NLTK, Matplotlib, and Scikit-learn are some of them.
R – forecast and ggplot2 are some of the packages.
What is Pattern Recognition?
Ans. Pattern recognition is the process of classifying data by identifying patterns
and regularities in it. This methodology involves the extensive use of machine
learning algorithms.
What is Data Science?
Data Science is a combination of algorithms, tools, and machine learning
techniques that helps you to find hidden patterns in the given raw data.
What is Power Analysis?
Power analysis is an integral part of experimental design. It helps you to determine
the sample size required to detect the effect of a given size from a cause with a
specific level of assurance. It also allows you to work with a particular probability
within a sample size constraint.
Explain the difference between Data Science and Data Analytics
Data scientists need to slice data to extract valuable insights that a data analyst
can apply to real-world business scenarios. The main difference between the two is
that data scientists have more technical knowledge than business analysts.
Moreover, they don't need the understanding of the business that is required for
data visualization.
What is reinforcement learning?
Reinforcement Learning is a learning mechanism for mapping situations to actions.
The end result should help you to maximize the reward signal. In this method, a
learner is not told which action to take but instead must discover which action
offers the maximum reward, as this method is based on a reward/penalty
mechanism.
What is a recall?
Recall is the ratio of true positives to the total number of actual positives (true
positives plus false negatives), i.e. Recall = TP / (TP + FN). It ranges from 0 to 1.
What is bias in Data Science?
Bias is a type of error that occurs in a Data Science model because of using an
algorithm that is not strong enough to capture the underlying patterns or trends
that exist in the data. In other words, this error occurs when the data is too
complicated for the algorithm to understand, so it ends up building a model that
makes simple assumptions. This leads to lower accuracy because of underfitting.
Algorithms that can lead to high bias are linear regression, logistic regression,
etc.
What is pruning in a decision tree algorithm?
Pruning a decision tree is the process of removing the sections of the tree that
are not necessary or are redundant. Pruning leads to a smaller decision tree,
which performs better and gives higher accuracy and speed.
Explain selection bias.
Selection bias is the bias that occurs during the sampling of data. This kind of
bias occurs when a sample is not representative of the population, which is
going to be analyzed in a statistical study.
What is ROC curve?
It stands for Receiver Operating Characteristic. It is basically a plot between a
true positive rate and a false positive rate, and it helps us to find out the right
tradeoff between the true positive rate and the false positive rate for different
probability thresholds of the predicted values. So, the closer the curve to the
upper left corner, the better the model is. In other words, whichever curve has
greater area under it that would be the better model.
Explain SVM algorithm in detail.
SVM stands for support vector machine; it is a supervised machine learning
algorithm which can be used for both Regression and Classification. If you have
n features in your training data set, SVM tries to plot it in n-dimensional space
with the value of each feature being the value of a particular coordinate. SVM
uses hyperplanes to separate out different classes based on the provided kernel
function.
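A minimal sketch assuming scikit-learn's SVC, where the kernel parameter selects
the kernel function used to separate the classes:

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
model = SVC(kernel="rbf").fit(X, y)  # hyperplane in the kernel's feature space
print(model.predict(X[:3]))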
What Is the Difference Between Epoch, Batch, and Iteration in
Deep Learning?
 Epoch – Represents one iteration over the entire dataset (everything put
into the training model).
 Batch – Refers to when we cannot pass the entire dataset into the neural
network at once, so we divide the dataset into several batches.
 Iteration – if we have 10,000 images as data and a batch size of 200, then
an epoch should run 50 iterations (10,000 divided by 200).
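A quick worked check of the arithmetic: iterations per epoch is the dataset size
divided by the batch size, rounded up when it doesn't divide evenly.

import math

dataset_size, batch_size = 10_000, 200
print(math.ceil(dataset_size / batch_size))  # 50 iterations per epoch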
How are KNN and K-means clustering different?
Firstly, KNN is a supervised learning algorithm. In order to train this
algorithm, we require labeled data. K-means is an unsupervised learning
algorithm that looks for patterns that are intrinsic to the data. The K in
KNN is the number of nearest data points. On the contrary, the K in K-means
specifies the number of centroids.
