Data Mining Tools and Tech 06.07.2024

Let me start with the books for this course.

The first is Data Mining: Concepts and Techniques by Han and Kamber, and probably this will be the
main textbook for this course. Then Introduction to Data Mining by Tan, Steinbach, and Kumar.
Another nice book is An Introduction to Statistical Learning with Applications in Python by James,
Witten, Hastie, Tibshirani, and Taylor. Then Hands-On Machine Learning with Scikit-Learn, Keras,
and TensorFlow. This one is for those who want to learn Python. It is a nice book. And if you want
to learn R, you can follow this one.

Regarding the requirement for programming: your assignments will require programming exercises,
but I am not insisting on any specific language. If you know R or Python or, let's say, Power BI,
you can work on the assignments. So from my side there is no restriction on the language. But this
course will definitely require some programming skills to work with data sets and real-life
problems. I have shared with you a few YouTube videos on Python and data science. The first one is
on the installation of Python. Have you checked that? Yes. OK, I hope all of you have installed
Python. Yes. OK. We will also handle some case studies and some data sets, which I will share
whenever required.

So these are the modules: introduction to data mining, data mining tools, data mining
functionalities and applications; data understanding; data preparation for data mining; supervised
and unsupervised learning; decision trees; artificial neural networks; the naive Bayes classifier;
and classifier evaluation and improvement techniques.
OK, this will be covered over 8 weeks. So we'll start with the first module: introduction to data
mining, data mining tools, functionalities and applications. I will start with a question: why do
we require data mining? Why are you studying this course? What is the need? Anyone?

I think there is huge data and we need to retrieve some important part of it, the information
from the data.

Before performing any function on the data itself, we first have to know how to get the data, and
exactly the data that we need.

Finding patterns in large data sets, basically.

OK, some of you mentioned large data, and some of you said we require information. Anything else?

Professor, in this information era even a single click is important for a company or organization.
So to track patterns and customer behavior, we need sample data to support our hypothesis.

In the current scenario we have enough capacity to collect data and store data. That is not the
issue, and that is not data mining. Data mining is something else.

So are you saying that we have to turn raw data into information?

Yes, we can make sense of the data and make use of the data. We have a lot of raw data that we are
collecting from several devices, sensors and so on. But we want to find some pattern or some
knowledge: to gain knowledge from the data, what is inside it, what inference you can draw from
it. So, why data mining?
There are automatic data collection tools, database systems, the web, and a computerized society.
All these hardware capabilities are available nowadays, along with the growth of many application
areas. This has led to an explosive growth of data, from terabytes to petabytes. The major sources
are business, for example stock data; science, such as remote sensing, bioinformatics, and
scientific simulation; and society and everyone, through social media, news, and YouTube. So we
are drowning in an ocean of data but starving for knowledge. We are collecting a lot of data. As
you said, mouse clicks and so on are all being collected. But what is the hidden knowledge or
pattern that we want to find? That is the objective. OK, what is the solution?
The solution is that we have to apply data mining techniques. Have you heard about the Internet of
Events? No. Have you heard about IoT? Yes. The Internet of Things. What is the meaning?

Controlling any device, let's suppose a simple light bulb, with our mobile phone or ecosystem.

Like machine to machine?

Yes. All these sensors that are deployed transmit data. Say a solar panel, or your own car: all
of these devices are emitting data, and they emit data in the form of events over various
protocols. All of this data is compiled, analyzed and visualized at a central place, and analytics
is done.

OK. So what is happening is that all your devices are now connected to the Internet, and they
create a network: the Internet of Things.

One question: can you move back to the previous slide, the data mining slide? You are saying that
we find insights from all the data using data mining techniques. There are also techniques for
supervised and unsupervised learning in machine learning. So how is data mining different from
them?

Data mining is the overall umbrella. The techniques you will study, supervised learning and so on,
come under data mining. I will explain where supervised learning and those things come under data
mining. You will see these things today. OK, so, the Internet of Events.
The Internet of Events covers several things: the Internet of Content, the Internet of People, the
Internet of Things, and the Internet of Locations. Wikipedia, web pages, and the blogs people
write are the Internet of Content: an increase in knowledge. The Internet of People is the data
from social interaction: Facebook, Twitter, and other social media. The Internet of Things is the
data generated by objects connected to the network. The Internet of Locations is the data that
mobility provides: when you are moving, you are also generating data with a spatial dimension,
because your location is changing. Sometimes time is also associated with the location data. All
these things are collectively called the Internet of Events. So you can imagine how much data is
being generated nowadays from each of these sources.

Now the question is: what is data mining? It is the iterative and interactive process of
discovering novel, valid, useful, comprehensible and understandable patterns and models in massive
data sources. Why data mining you have understood, but what is data mining? What kind of
activities will be called data mining? There are a few keywords here, which I have underlined.
They are very important.

What do we mean by iterative? There are many steps or passes involved. Do not expect that you will
find the patterns or results with a single click. You have to go through several rounds.

Interactive means human intervention is required. You cannot leave it to the computer. Machine
learning algorithms are smart, but they are not so smart that they will do everything on their
own. If something is going wrong, they cannot understand that it is wrong, so human intervention
is required. If you apply an algorithm, it will find several patterns, but not all of them will
be correct or useful. Human intervention is required to judge whether the patterns are correct
and whether they are useful.

Then novel. What is the meaning of novel? Unseen, new. Yes, novel means new. You have to find some
new pattern. If you find something obvious, which is known from common sense, that is not data
mining. For example, you may say that if bread is sold, butter is going to be sold. That is a very
trivial kind of pattern, so you will not call it data mining.

Then valid. What is the meaning of valid? Valid here means generalizable to the future. The entire
objective of finding a pattern is that this pattern will repeat in the future. You are putting in
all the time and effort to find the pattern, and the main thing is that the pattern keeps
repeating in the future. If you find a pattern which is specific only to the data you have used,
and which is not valid in the future, then that is a waste.
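
One common way to check this kind of validity, sketched here under assumptions that are not part
of the lecture, is to learn a pattern on older records and then measure how well it holds on later
records that were not used to find it. The loan records, the income threshold rule, and the
function names below are all hypothetical and only illustrate the idea.

```python
# Hypothetical records: (monthly_income, repaid_loan). Values are invented for illustration.
history = [
    (20, False), (25, False), (30, False), (45, True),
    (50, True), (60, True), (28, False), (55, True),
]
# Later records that were NOT used to find the pattern.
future = [(22, False), (48, True), (35, True), (65, True)]


def accuracy(threshold, records):
    """Share of records where the rule 'income >= threshold' matches the actual outcome."""
    hits = sum((income >= threshold) == repaid for income, repaid in records)
    return hits / len(records)


def learn_threshold(records):
    """Pick the income threshold that best separates the given records."""
    candidates = sorted({income for income, _ in records})
    return max(candidates, key=lambda t: accuracy(t, records))


threshold = learn_threshold(history)
print("learned threshold:", threshold)
print("accuracy on history:", accuracy(threshold, history))
print("accuracy on future:", accuracy(threshold, future))
```

A rule that looks perfect on the data used to find it can still do worse on later data, which is
why the pattern is judged on records it has not seen.
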
Whenever you do any forecasting, what is the assumption? That the pattern which was there in the
past will repeat. So you have to find a pattern that can be generalized to the future.

Then useful. It means action is possible, in the business sense. Suppose there is a bank which has
developed a machine learning model to classify whether a customer will repay a loan or not:
whether the customer is good or bad. If you find a pattern there, some action is possible: if you
find that a customer is bad you will not give the loan, and if the customer is good you will give
the loan. Similarly, if you find that bread and butter are sold together, what action will you
take in the supermarket? Place them together. Yes, one way of thinking is to place them together.
Another way of thinking is to keep them far apart, so that customers searching for these two
products also buy the products in between. Another possible action is to create a combo of the
products and give some discount. Some action of this kind should be possible.

Then comprehensible and understandable, which means leading to insights. Finally you should be
able to explain why something is happening and understand what is causing it. This is the
knowledge or wisdom that the human will gain. Any questions from here? It is very important. Each
word is very important. Tell me.
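
The bread-and-butter pattern can be made concrete with the two standard measures used for
association rules, support and confidence. This is only a minimal sketch: the baskets are invented,
the function names are hypothetical, and the lecture itself does not introduce these measures at
this point.

```python
# Hypothetical supermarket baskets, invented for illustration.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def count(itemset, transactions):
    """Number of baskets that contain every item in itemset."""
    return sum(set(itemset) <= basket for basket in transactions)


def support(itemset, transactions):
    """Fraction of all baskets that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among baskets containing the antecedent, the fraction that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


bread_butter_support = support({"bread", "butter"}, baskets)
bread_to_butter_confidence = confidence({"bread"}, {"butter"}, baskets)
print("support(bread, butter):", bread_butter_support)
print("confidence(bread -> butter):", bread_to_butter_confidence)
```

High support and confidence only say that the items co-occur. Whether that is novel or useful
enough to act on is still the human judgment discussed above.
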
We say novel, so we need something new, but we have also used the words human intervention.
Wouldn't that itself mean that it is a known pattern?

No, no. The human is there to check the results: you have to see whether they are meaningful or
not. If they are not meaningful, you have to repeat the process again. Human intervention comes
after the process has completed. We take the output and check whether it is correct or not. If the
results are not correct or not meaningful, or there are too many patterns, then you have to start
the process again. That is why it is iterative. Novel means we should find something which is not
obvious, something you cannot think of from common sense. Is this clear? Human intervention, and
novel.

I have a question on generalization to the future. Can we build patterns only for a specific
period, say only for five years?

It depends on the application context how long a pattern will be valid, and on whether any other
factors are having an impact. We cannot say in general for what duration. Suppose you know that
the other environmental factors are not changing for the next five years. Then this pattern should
be valid for the next five years. So it depends on other factors.
Suppose the price of airfare is not going to change much in the next two years. Then the seasonal
factors and patterns and all those things will repeat. So you have to take care of those things.
Thank you.

And a question on understandable, leading to insights: once the action is taken, do we need to
take the insights from it?

I am not saying that this one comes first and this one comes second. Don't think of it that way.
It means that from the pattern you can understand why something is happening. You have to
determine what the cause is, which factors are acting. You have to understand why it is happening,
because these algorithms are capable of giving a lot of patterns, and many may be junk. You have
to carefully analyze whether the results make sense or not. Thank you.

Now, the term mining is used here. When you hear this word mining, a picture like this one will
come to your mind: somebody is mining gold or iron. So is this term data mining correct? Is there
anything wrong with the term data mining when you compare it with coal mining?
We mine large amounts of data to get the relevant data, analyzing the data to find what is useful
to us, like in mining we dig deep into the earth to find something.

Mining is just related to extracting, whereas in data mining what we do is filter out the useful
patterns and models, as you mentioned on the earlier slide.

Yes. See, the word mining is correct. Mining generally means you are digging through a lot of
stuff and finding the useful thing. But when you say coal mining, coal is the useful thing that
you want to extract. When you say data mining, do you want to extract data? No, we want to extract
information, like a pattern, or knowledge, or insight. We are not extracting data. The data is
already there. So the technically correct names are knowledge discovery in databases, knowledge
extraction, pattern analysis, data archaeology, information harvesting, and so on. The term data
mining is used just to emphasize the large volume of data.

Now you will see this: knowledge discovery versus data mining. Knowledge discovery is the overall
process of extracting knowledge from data. Data mining is a step in knowledge discovery: the
application of specific algorithms to extract patterns. So it is one of the steps in the knowledge
discovery process, where we apply specific algorithms. But before applying an algorithm there are
several other steps.
Which types of activities are not data mining? Which kinds of operations or activities will we not
classify as data mining? Give me your answers.

Extracting the raw data is not called data mining.

OK, correct. Just collecting and storing. Any other?

Cleaning. ETL.

OK. See, simple operations like simple search or query processing, and computing simple
statistical measures like average and standard deviation, are not data mining. Small machine
learning or statistical programs are also not data mining. OK, so let us continue.

This is the data mining process. How does it start? You have an organization. It has recorded raw
data, and that raw data is stored in a data warehouse. What is a data warehouse? What is the
difference between a data warehouse and a database?

A data warehouse can contain data from several databases. It is a single repository containing all
the data from all the sources.

A data warehouse can have structured as well as unstructured data, while a database has a
structured, defined schema.

A database is the platform where you store the data, but a data warehouse is where you bring in
data from various sources, standardize it, and then store it as a single source.

A data warehouse also has a collection of historical data.

Yes, the data warehouse is generally historical in nature. It may store the last five or ten years
of data, while a database contains the current data. We will cover these things in the second or
third chapter.
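
As a rough illustration of this distinction, the sketch below contrasts a day-to-day lookup of one
current transaction with a historical summary of the kind used for decision making. It is only a
sketch under assumptions: a single in-memory SQLite table stands in for both systems, and the
`sales` table, its columns, and its rows are hypothetical.

```python
import sqlite3

# One in-memory table stands in for both systems; schema and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "order_id INTEGER PRIMARY KEY, year INTEGER, region TEXT, product TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2020, "North", "phone", 120),
        (2, 2021, "North", "phone", 150),
        (3, 2021, "South", "phone", 90),
        (4, 2022, "North", "phone", 170),
        (5, 2022, "South", "laptop", 40),
    ],
)

# Database-style query: fetch one specific, current transaction.
current_order = conn.execute(
    "SELECT product, units FROM sales WHERE order_id = ?", (4,)
).fetchone()

# Warehouse-style query: summarize several years of history for one product.
yearly_phone_units = conn.execute(
    "SELECT year, SUM(units) FROM sales WHERE product = 'phone' GROUP BY year ORDER BY year"
).fetchall()

print(current_order)
print(yearly_phone_units)
conn.close()
```
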
But why do we maintain these two separately, databases and a data warehouse?

The kind of data and the volume of data are different.

With only the relevant data, we are able to query it faster at run time.

Yes, correct. For daily transactions we require the current data, so that is stored in the
database. The tables in the database are optimized for the kind of queries that are used every
day. The data warehouse is generally historical, and its purpose is decision making, so the
queries are different there. On the database the queries are about current transactions,
transaction processing only: OLTP, online transaction processing. We will study the difference
later, but I have quickly told you why they are maintained separately.

The data warehouse will contain a lot of data, let's say for employees, for sales, for products,
maybe for a long period: five years, ten years. Now suppose your objective is to forecast the
sales of some product, let's say a mobile phone. That is your task. Depending on this task, we
will create the target data. From the data warehouse, through selection and cleaning, you fetch
and create a subset. Which variables are required? When you want to forecast sales you will not
require the employee data, so the employee data will not be fetched. Maybe past sales, the price,
and how much was sold in which region: that data you will fetch, and that is the target data.
Then some transformation will be required, depending on the task and the algorithm you will apply.
Selection, cleaning, and transformation together are your preprocessing. They prepare the data for
applying the algorithm. Next comes the data mining part.
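
The selection, cleaning, and transformation steps for the sales-forecasting example can be
sketched as follows. This is a minimal sketch under assumptions: the warehouse rows, the field
names, the choice to drop incomplete rows, and the min-max scaling are all hypothetical choices for
illustration, not steps prescribed in the lecture.

```python
# Hypothetical warehouse rows; field names and values are invented for illustration.
warehouse = [
    {"product": "phone", "region": "North", "price": 300.0, "units": 120, "employee_id": 7},
    {"product": "phone", "region": "South", "price": None, "units": 90, "employee_id": 3},
    {"product": "laptop", "region": "North", "price": 900.0, "units": 40, "employee_id": 7},
    {"product": "phone", "region": "East", "price": 280.0, "units": 150, "employee_id": 5},
]


def select_target(rows, product, fields):
    """Selection: keep only the rows for one product and only the task-relevant fields."""
    return [{f: r[f] for f in fields} for r in rows if r["product"] == product]


def clean(rows):
    """Cleaning: drop rows that have a missing value in any field."""
    return [r for r in rows if all(v is not None for v in r.values())]


def min_max_scale(rows, field):
    """Transformation: rescale one numeric field to the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in rows]


# Employee data is not needed for a sales forecast, so it is never selected.
target = select_target(warehouse, "phone", ["region", "price", "units"])
prepared = min_max_scale(clean(target), "price")
for row in prepared:
    print(row)
```

Only `prepared` would be handed to a mining algorithm. The raw warehouse rows are left untouched.
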
Here we are applying the algorithms. They may be statistical algorithms or machine learning
algorithms, supervised or unsupervised. They are applied here, and this will generate a lot of
patterns. But how good are they? Here the human is required, for interpretation and evaluation of
those patterns: whether the patterns are interesting, whether something new is there, whether they
are valid for the future, whether they are useful or not. This evaluation is done here. Whatever
comes out of this evaluation adds to the knowledge of the user. Based on this knowledge, and its
integration with the business, some action is taken, and rules or something similar are added to
the organization. So is this picture clear? What are the steps? Yes. Applying the algorithm is
only in this phase. Before that you have to do a lot of things: selection, cleaning,
transformation. And after that you have to evaluate the patterns.

OK, so let's continue. I will briefly explain these steps. First, learning the application domain.
It is very important to know what your objective is: the relevant prior knowledge and the goals of
the application. Then creating a target data set, the data selection. Then data cleaning and
preprocessing, which may take 60% of the effort. This is very crucial. The success of your data
mining project will depend mainly on this step: whether you have the correct data or not, and
whether it is clean and in the right format for applying the algorithm. So this is very important.
Don't jump directly to applying algorithms.
Spend a lot of time on this. These machine learning algorithms are like black boxes. If you feed
them junk data, they will give you junk results. They cannot tell you themselves that the data is
not clean. If you provide junk data, you will get junk patterns.

Then data reduction and transformation. These are required if your data is very large and the
algorithms cannot handle it. So either reduce or transform the data such that the algorithms can
work: find useful features, dimensionality or variable reduction, invariant representation. Data
understanding we will cover in the second week. Cleaning, reduction, and transformation will all
be covered in the third week.

Then choosing the function of data mining: whether it is summarization, where we just want to
describe the data, or classification, regression, association, or clustering. We will get to
these things.

Then choosing the mining algorithm. For classification there are several algorithms, for example
decision tree, artificial neural network, or naive Bayes classifier. Which algorithm is suitable
for this application or this data is something we have to check.

Then data mining itself: the search for patterns of interest. Once you get the patterns, you check
whether they are interesting or not. That is pattern evaluation and knowledge presentation. It
covers visualization, because whatever output you get, you have to present it in such a way that
it is understandable, as well as transformation, removing redundant patterns, etcetera. And
finally, use of the discovered knowledge: you will take some action based on it. OK. So, any
questions?

OK. So what is the history of data mining? How has it evolved? In the 1980s, initially, when these
computer systems were developed, the main focus was on ERP. What is ERP?
Enterprise resource planning. OK. What is the purpose of ERP? Your voice is not clear. What do you
think?

I said it is like a system of records and a system of transactions, for recording your data.
Systems like SAP or Oracle ERP let organizations do their transactions as well as maintain the
data.

It actually helps us run our business in various ways: operational efficiency, supply chain, the
financial domain, the HR domain, anything. It supports decisions which help us run the company in
a more efficient way.

OK, any other answer?

It is a sort of information management system.

OK. So basically it connects the different verticals in the company and improves the visibility
from one vertical to another, and the data is kept in one place in a synchronized format. That is
the benefit of ERP. Then in the 90s came the CRM systems. What is CRM?

Customer relationship management.

OK, what is meant by this?

It is meant to provide various services to existing customers, for example if they need support,
or if you want to sell some other engagement to them. Systems like Salesforce.

OK. Anything else?

It is basically to keep a record of what stage the relationship with any particular customer is
at, step by step.

OK, so some of you are describing something like a complaint management system, for service and
complaints.

Yes. Say somebody has bought a certain car and has an issue with the engine. They can call
customer care, and customer care would have all the information in the CRM: when they purchased
the car, when the last service happened. All the records would be in one single place, and based
on the complaint raised they would be able to investigate the issue and provide a specific service
to the client.

OK. That is there, but it is a very small part. It is not the primary objective of customer
relationship management.
Basically, you want to assess each customer's lifetime value. Yes, you provide service and all
those things, but the objective is to maintain a long-term relationship and also to estimate the
customer lifetime value. What is this customer lifetime value?

How much revenue does one customer bring to the company?

How much the customer has helped the company, in multiple senses: not just from sales but also
from referrals, etcetera.

Yes. Basically there are several customers, and the company wants to track which customer is
valuable: if they stay with the company for, let's say, the next five years, how much revenue they
will bring to the company. There are different methods to estimate those things. So these are the
main points.

Then in the 2000s e-commerce came into existence, with online shopping. It changed the entire way
of buying and selling. What are the challenges in e-commerce?

Sorry, can you please repeat?

Before purchasing we want to touch and feel the product and see the quality. Online we go by the
pictures and the advertisements, and we don't know exactly what the quality is.

So this is one of the challenges: in conventional shopping you go and physically touch the product
and experience it, but that is missing in e-commerce. That is one challenge. Any other?

A security breach at the time of payment. There can be fraud on the network.

From the reliability point of view, there is no reliability. The product which they claim is
branded may be a fake.

On-time delivery.
been in there. Delivery time and the return process is the customer wants to return the outlier
chain management, there can be data breach also they can share our identity, address, phone
number with some third parties. Different things. People are good and are not. Not going to
happen. I'm just saying how how the shopping experience has changed. Between the difference
between the conventions for that even you get it immediately or go to the store, but in
ecommerce will have to wait for it until it get delivered. Everything is on the Internet, needs on
phone so it's instant right now. OK, OK, OK, OK, so let me ask you this question. Suppose we
go to the conventional store to buy a cloth. OK, from garment. OK, so with whom do you
interact? The child. We don't know much about the seller. No, You interact with the girlfriend,
right? And then you tell your what is your objective? OK, suppose you tell that. OK, I've come to
Bayshore. OK, so then. What this cell phone you love? You can talk. Hold on. Yeah, so.
Questions like which you want to what you want OK, which button, which design these things
OK, they will ask you now. So what is the what is the salesman is doing is trying to understand
your taste. OK, what is there are many things hidden in your mind. Would you think you cannot
directly express? OK this one ask your question and then it tries to retrieve your that test
objective. So who will do this job in in commerce? Filters on the ecommerce website then we are
buying the product. So based on that the systems, yes, yes. In your mind, everything is not
increasing. You don't go, you don't go. You don't need to shop with, uh, in your that. OK, I want
to buy a shirt of ₹2335. OK, these things are not OK, this color exactly blue color with so much
darkness and all those things. OK, so you go there with some. Some big kind of choices, OK.
And then the salesman, you talk to the salesman, salesman will show something and then you
finally freeze and some item, OK, So who will come, that is the main difference, main challenge.
OK. So one of you mentioned the recommendation system. Understanding the taste of the user is a
big challenge in e-commerce.

So for that case we can use the process of clustering, right?

Yes, there are some techniques. Suppose you go to an e-commerce site such as Amazon and just type
smartphone. You will see a list of maybe 100 or 200 smartphones. Is it possible for the user to
check each one and then find the relevant one? So Amazon, or any e-commerce site, has some
algorithm for ranking these items, for deciding in which order to show them, so that by checking
the top two or three you will find a good one. Otherwise, if it gives you the results in random
order, the customer will check maybe five or six and then leave the website, while the product of
their choice may be in the 20th place. The customer does not have the patience to check all 100
items and find the best one. These things are done by the salesman in a conventional store. In
e-commerce some algorithm has to work to understand the taste of the customer and recommend items.

Then around 2010, data mining and big data started. So this is the evolution of data mining.
Initially, what was the purpose? The collection of data. Then you come to understanding what is in
the mind of the customer, and then, using big data, you are finding patterns so that you can take
some action. Do you agree? Do you have any questions?

Nowadays ERP and CRM are still used, right?

I am talking about when each one started. I am not saying that they have disappeared.

OK, so this is the evolution of data science or data mining. Have you heard about the industrial
revolution? You have heard this term? Yes. What happened in it?

A lot of manual processes done by human beings moved to machines, and faster and more reliable
products were made.
They understand. Also action. OK, OK. You're saying you're saying manual things that were
replaced by machines. Yeah. And a lot of industry has been set up against the mass specific
service, specific service. OK, OK, OK. So if you compare this with this evolution database data.
What is the analogy? All differences between these two things. There is also some revolution
here, right? Revolution is happening. So how do you compare and contrast them with decision
with this? Artificial intelligence replace manual works that are being done by humans right now.
Yes, OK. Yeah. Any other thing? So ecommerce, let us have hands to help. A large collection of
data. So after that we data mining field. On the data analysis. The low, the machine learning and
the building on that, building on that, building on that. Make use of those data or building
patterns on the. Automating backups. Any other? OK, so one of you told you correctly in a, in a
what you're trying to do, you're trying to replicate the muscle power of human being. OK thing
instead of things done by manual. Manual man done manually. You're trying to end machines
which can do a few things now. And now what we are doing in this, in this. Here we are trying to
replicate the brain brain power of human being. With the meeting. QQ. Yeah. So this is the
difference of what is happening, why these are going in this direction. OK, what has happened in
the background, the story behind this? OK, so let us continue with the new age. Why are we in
this age? You know that there is what people call the computing storm: the technology has become
cheaper. Let's say the hardware, the hard disk or RAM or your processor: these things have become
very cheap now. Mobile computing: even your small device has a lot of power, which was
unbelievable earlier. Networking has become better. Then there is cloud computing. What is cloud
computing? Earlier you could not judge how many servers you would need, and you had to buy them.
Now you don't even have to buy them; you can rent them. So a start-up does not need that much
capital.

Then there is the data storm. There is volume, and there is velocity: the data is arriving at a
very rapid pace these days. The data comes from different types of industry and from different
fields: your comments, feedback, text, audio, video. And what is the truthfulness of the data?
Maybe there is ambiguity. We will discuss this later. So, if you see the advancement.
Is there a problem? Is there background noise coming from my device? Yes, when you're talking
there is some sound. It's not from outside; I think it's from the shutter. Now it's gone, OK. OK,
thank you. So you see why this new age has emerged if you compare the devices. OK, OK. 50.
The computers. Currently and currently your. OK, OK. I I feel like 112. So so it is 5000 * 40
day. Guidance Computer the. On the moon. OK, 1959 OK, so your phones are built phones are
more powerful than. OK, OK, your this is your this computing stone mobile. OK, OK, OK. Ohh.
All the creating terms of. Pause the city supercomputer. £500. I can't do all this. Contaminate.
And the and the piece required is. Maybe the supercomputer? Square feet. And you can come to
iPhone icon. So you can see the advancement. Watching the technology. So. At Formatting at .3
Mega. 3 Mega. Instrument. Even more than one. 14,500 times more. Again. No. OK, nowadays
we see USB C charges are for. OK. To just see how much one and one thing has been. OK ohh.
So now the question is: what is big data? Large data? Yes. As mentioned earlier, the data can be
structured or unstructured. So when does data become large enough? Volume is one V of big data.
I think it is big data when it becomes hard to process. Volume is one issue, OK: the data is big.
But how big? I think it depends upon the system which needs to process it. Yes. If you are
working on your desktop and your computer cannot handle the data, then for you it is big data.
Is it volume only? No, volume is only one dimension. The challenge is that you need novel methods
for storage and for mining, with cost-effective improvements and minimum resources. OK, so you
have to change your algorithms, the statistical or formula-based ones. You cannot keep on
investing in hardware. And the methods should still give answers in time. This is where big data
analytics is different from traditional analytics. It will differ in speed. OK. Yes.
One is velocity: how fast is the data coming, all the time? OK, so speed. Say you are watching
any video that is streaming: then you have velocity data. Then the data can be in many forms.
What do you mean by structured data? Data which can be placed in a relational database, which can
be stored in a table. OK. Then there is unstructured data. Can you give some examples? Images of
people. Video files. Yes, text, feedback on a product. It does not have a predefined structure
and cannot be put into a relational database. OK. So audio, video, image, Internet data and log
files. Volume is not the big issue; the main issue is this. 90% of the body remains hidden: the
majority, 90%, is not visible. Similarly, only this much of the data is visible, and the main
thing is the part which you are not able to see. OK. You cannot expect exact answers; you get
some results, and there may be a chance that they are wrong. What is your question? OK. OK, so a
few examples of big
data from the right. And young based government painter. They'll have 16, each of which is one.
So which aspect? 1. 11 Below, Friend City, Volume 111. So could you please elaborate? Close. You
can do. Monitor the space and all the data. 1B. Wouldn't be all. OK. What is the volume of the
data that is coming? Will you? Again. And the handles. Maybe you cannot detect count as two. This
has to be done on the strike. Ample. Average of data points. To the band. What you will do?
Calculation next. And. Entire. Uh. Yeah. OK, so. Like this. Long. Down. OK, so you think your
traditional methods will not work on this kind of data. OK. You get to meet. Molding. For.
Thank you. If any business will invest there. Often. Two. But by the hardware and software.
Return. Right. Question is, what is? Will still. Probably just too much. Uh. Justifications. Also
there's. Healthy plants help. Capital. You need to do it. Everything. How everything. Everything
for everything. Right competitive advantage. 313 advantage. Advantage globally the margins.
Destination industry. People in the industry. Is very very. Ready. The beginning of this. We've
been. Something. That one day, three word. They will. You know which cannot. OK, OK.
Competitive advantage is a term from marketing. OK, you may have heard about Porter in marketing;
you have studied competitive advantage already. OK, there are nice articles. So big data
analytics can provide competitive advantage to the company. Nucleus Research concluded that
analytics pays back $10.66 for every dollar spent. So they found that if you spend $1.00, the
return is $10.66. The MediaMath company achieved their 12% ROI in five months with an annual
revenue of $2.2 million. Big data analytics can drive the top line, and it also minimizes
operational cost, the bottom line. Yeah. So your top line is increasing and the cost is
decreasing, so your profit will increase. What did the MediaMath company do in this example to
achieve this ROI? What did the company apply? I do not have that with me; I will share the
references. OK. OK, so big data analytics isn't constrained by
a predefined set of questions. OK, see, the natural thing is, if you're in a company, you will try
to solve the problems which you observe. Until you observe something, you will not take any steps
to solve it. But big data analytics can find the patterns if something is going wrong. It isn't
constrained by a predefined set of questions. OK, the main problem with us, or our brain, is that
you don't know what you don't know. OK. Do you agree? Yes. OK, so big data can help you find the
things you don't know. Then, once you know them, you can take action to solve those things. OK,
so you don't have to guess. You can look at the data to find answers that are more specific and
more useful. Earlier, when analytics was not there, decisions were taken based on intuition. Now,
since you have this kind of analytics, you can take decisions based on facts, based on numbers.
OK. So you can go through this paper on gaining competitive advantage from big data; I will share
it with you. It shows how different companies have gained an advantage using big data. OK,
now. There are different types of analytics. Some involve simple operations, you can say, and
some are complex. So based on the complexity, analytics are classified into four groups. OK, the
first one is descriptive analytics. What you're trying to find is what has happened. You want to
know what has happened. So it's just a reporting tool. What is the average sales? What is the
maximum sales, the minimum sales in a year? OK, so it is just reporting.
A reporting tool, that is, a review of key metrics and measures. It is also called data analysis.
OK, you draw charts and plots. You want to see what is going on, what is happening. So data
queries, reports, descriptive statistics, data visualization, dashboards, basic what-if models:
these are the tools you use to know what has happened. This kind of simple result you can find
through descriptive analytics.

OK, diagnostic analytics is the second one. So why did something happen? So
once you observe something, you may question it, like why the sales went down in, let's say, the
last quarter. So why did something happen? Diagnostic analytics tries to find the root cause of
the problem. So basically you want to find the cause of the problem. Did the latest marketing
campaign impact sales? Suppose sales increased. You want to know whether they increased naturally
or because of some marketing campaign. That is diagnostic. Or, did the weather affect beer sales?
OK. So this is the second one.

The third one is predictive analytics. OK. So what is likely to happen? You want to predict.
So you predict the outcome of a certain action. Let's say you have given some discount on a
product, and you want to predict the sales of that product. Being able to predict allows one to
make better decisions. If you know now what is going to happen next, you can take better
decisions. OK, you construct models based on past data to predict the future. So for this kind of
analytics we have to build models, based on the assumption that what has happened in the past
will repeat in the future. OK, so now what kind of models do you have to build? Linear
regression, time series analysis, data mining techniques, simulation: these tools.

And the fourth one, and the last one, is prescriptive analytics, OK. So what do I need to
do? OK, so this will prescribe what to do, the best course of action to take. It uses the
understanding of what has happened and why it has happened, and a variety of "what might happen"
analyses, to help the user determine the best course of action to take. So once you know what has
happened, why it has happened, and the different scenarios, like what happens if I do this or
this, it will help you decide. For example, a credit-rating model estimates the probability that
a customer will default on a loan; that you can estimate using predictive analytics. And what are
the tools required for prescriptive analytics? You require optimization techniques here, so
linear programming and operations research, this kind of tool is used. So what are the four types
of analytics? Correct. It starts with descriptive, then diagnostic, and then predictive, and then
prescriptive. Which one is the easiest? Descriptive.
OK. And which one is the most valuable? Predictive? Predictive will only tell you, like, what if I do this, what will happen? You give it the scenario and it will
tell you the outcome, but it will not tell you what is the best action to take. Predictive will not tell you that. OK. So if you compare the different types of
analytics on two dimensions, complexity and value for the user, you will see that as you go from descriptive to diagnostic to predictive to prescriptive, the value
increases and the complexity increases.
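
The step from descriptive to predictive to prescriptive analytics can be sketched on a toy example. This is only an illustration: the discount and sales figures, the prices, and all function names are made up, and a simple search over candidate discounts stands in for the optimization techniques (such as linear programming) mentioned in the lecture.

```python
from statistics import mean

# Hypothetical past data: discount offered (%) and units sold in that month.
discounts = [0, 5, 10, 15, 20, 25]
units_sold = [100, 130, 165, 190, 230, 260]


def describe(values):
    """Descriptive analytics: report what has happened."""
    return {"average": mean(values), "maximum": max(values), "minimum": min(values)}


def fit_line(xs, ys):
    """Predictive analytics: fit y = intercept + slope * x by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return y_bar - slope * x_bar, slope


def predict_units(model, discount):
    """Predicted units sold at a given discount, using the fitted line."""
    intercept, slope = model
    return intercept + slope * discount


def best_discount(model, list_price, unit_cost, candidates):
    """Prescriptive analytics: pick the candidate with the highest predicted profit."""
    def predicted_profit(discount):
        price = list_price * (1 - discount / 100)
        return (price - unit_cost) * predict_units(model, discount)

    return max(candidates, key=predicted_profit)


summary = describe(units_sold)             # what has happened
model = fit_line(discounts, units_sold)    # model built on past data
forecast = predict_units(model, 30)        # what is likely to happen at 30% discount
choice = best_discount(model, list_price=50.0, unit_cost=20.0,
                       candidates=range(0, 31, 5))  # what should I do
```

The predictive step only answers "what if I give this discount"; the prescriptive step has to compare the scenarios and choose one, which is why it needs an objective (here, predicted profit) in addition to the model.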

OK, so. So far we have very good control on the descriptive part, the simplest one, but very little on the prescriptive part. OK, so there you require optimization
tools. Yeah? So can we say that prescriptive and inferential are the same? No. Inference is closer to diagnostic. OK. Thank you. Is prescriptive analytics not very
mature, right? It is there, but it requires a lot of skills. OK, you have to know optimization. You can build a predictive model, a linear regression model, simply,
but prescriptive requires you to get into optimization too. OK. So, a little bit,

the awareness is less. Yeah, but technically it is also advanced. OK. OK, so let us see what are the application areas of data mining in real life. In science:
astronomy, bioinformatics, drug discovery. In business: advertising, CRM, investments, manufacturing, sports, entertainment, telecom, e-commerce, targeted
marketing, healthcare and so on. In government: law enforcement, profiling tax cheaters. OK, so you name any field, there will be a lot of applications

of these. OK. Let us see some examples. The first is customer modeling, say a customer attrition prediction task. A company loses customers. Attrition means a
customer is leaving the company and joining some other competitor of that company. OK. In telecommunication, in mobile service, a customer is not happy with the
service and joins Jio: OK, that is an attrition. The attrition rate of mobile phone customers is around 25 to 30% per year, very high. OK. Telecom industry
attrition is maybe the highest compared to other industries. So the task is to predict who is likely to attrite next month, given customer information for the past
ten months. So a list of customers is given to you, and your company may ask you this. Also, estimate the customer value, and what is the cost-effective offer to be
made to this customer? OK, this is the second objective. You found the list of customers who are likely to leave the company. The next question is to estimate the
customer value of each one. OK. If a customer is bringing very little revenue, then their leaving should not matter for the company. OK, so that's why you need to
estimate the customer value. You cannot retain all customers; if some are not bringing revenue, then you are fine with those customers leaving. OK.
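
The idea of combining "likely to leave" with "customer value" can be sketched as follows. This is a minimal sketch under stated assumptions: the churn probabilities are taken as given (in practice they would come from an attrition model), the yearly values and both thresholds are invented, and `value_at_risk` and `retention_targets` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str
    churn_probability: float  # assumed output of some attrition model
    yearly_value: float       # assumed estimate of the revenue this customer brings


def value_at_risk(customer):
    """Revenue the company expects to lose if it does nothing for this customer."""
    return customer.churn_probability * customer.yearly_value


def retention_targets(customers, min_churn_probability, min_value_at_risk):
    """Keep customers who are likely to leave AND worth retaining, largest loss first."""
    likely_to_leave = [c for c in customers if c.churn_probability >= min_churn_probability]
    worth_retaining = [c for c in likely_to_leave if value_at_risk(c) >= min_value_at_risk]
    return sorted(worth_retaining, key=value_at_risk, reverse=True)


customers = [
    Customer("A", 0.80, 6000.0),  # likely to leave, high value
    Customer("B", 0.75, 200.0),   # likely to leave, very little revenue
    Customer("C", 0.10, 9000.0),  # high value, but unlikely to leave
    Customer("D", 0.60, 3000.0),  # likely to leave, moderate value
]

targets = retention_targets(customers, min_churn_probability=0.5, min_value_at_risk=500.0)
target_names = [c.name for c in targets]
```

In this toy data, customer B is likely to leave but brings so little revenue that the company can accept the loss, while customer C is valuable but not at risk, so neither needs a retention offer.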

Customer value is important, and once you have estimated it, what is the cost-effective offer to be made to this customer? Suppose a customer brings good revenue
but is likely to leave the company. So you have to give some offer to retain that customer. Now how much? What is the cost-effective offer? What should be the cost
of that offer? OK, you cannot spend more than the revenue the customer generates. If the customer generates 500 and you spend more than that, it is a net loss. All
these things must be estimated. Then you have some other examples, like targeted

marketing, cross-selling, customer acquisition. What is cross-selling? Selling other products. OK. Like, for example, along with opening an account, offering the
customer credit cards and other facilities. Yeah. So when you open an account with a bank, after a few days they will call you and request you to take a credit card
or insurance or other products. OK, but the company cannot arbitrarily target all types of customers. They have to see which customers are worth targeting, profile
the customer, and then make the offer. If somebody requires insurance and you are offering a credit card, that person will not take it. Hello, Sir. Yeah? In the
attrition prediction task, we have to predict who is likely to attrite

next month. How can I predict that? Based on past data. But there is a long list of customers; how can we predict for each customer who is likely to attrite next
month? OK, so see. Suppose you have a database of past customers, those who have left the company and those who are continuing with you. You have to look at the
features. Suppose they have a prepaid connection: were they recharging or not for the past few months, and what was the amount they were recharging? Maybe they were
recharging for a very small amount, or not recharging for 5-6 months, and then they left. OK. Yes. This is the main task. You have to find the pattern: what was
the pattern of the customers who left the company? Then you check whether the customers who are still with you show the same pattern. Is it clear?
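
Finding "the pattern of the customers who left" can be illustrated with the simplest possible learner: a single threshold on one feature (a decision stump). This is a sketch, not the method used by any telecom company; the records are invented, only the months-without-recharge feature is used, and `learn_threshold` is a hypothetical helper.

```python
# Hypothetical past prepaid customers: (months without recharge, left the company?)
past_customers = [
    (0, False), (1, False), (0, False), (2, False), (1, False),
    (5, True), (6, True), (4, True), (5, True),
]


def learn_threshold(records):
    """Find the cut-off that best separates leavers from stayers.

    The learned rule is: predict 'will leave' when months without recharge >= threshold.
    Returns the threshold and its accuracy on the past data.
    """
    best_threshold, best_correct = None, -1
    for threshold in sorted({months for months, _ in records}):
        correct = sum((months >= threshold) == left for months, left in records)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold, best_correct / len(records)


threshold, accuracy = learn_threshold(past_customers)

# Apply the learned pattern to customers who are still with the company (made-up data).
current_customers = {"X": 0, "Y": 5, "Z": 3}
at_risk = [name for name, months in current_customers.items() if months >= threshold]
```

A real model would use many features together (recharge amount, complaints, usage and so on) and would be checked on customers it has not seen, but the principle is the same: learn the pattern from labeled past data, then apply it to current customers.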

OK, so. The next case is assessing credit risk. The example is like this: a person applies for a loan from a bank. What will the bank check? The documents, whether
everything is proper, the previous credit payment history. So they will profile the person, based on these things, any credit cards and so on. They will check
whether the person is capable of returning the loan. OK. So the task is: should the bank approve the loan? Note that people who have the best credit don't need
loans, and people with the worst credit are not likely to repay. The bank's best customers are in the middle. OK, so very good and very bad persons you can easily
identify, but the difficulty is the middle cases, which have some features which are good and some which are bad, and whether they will return the loan or not.
Banks develop credit models using a variety of machine learning methods. Mortgage and credit card proliferation are the result of being able to successfully predict
if a person is likely to default on the loan

amount. You have seen that loan applications are now very quick. You just have to fill in the form at the bank, and in one or two minutes they will tell you whether
you are eligible or not. What is the reason? Because the profiling is being done by machines and not by humans. All the credit history and the customer's data is
maintained, so the main thing is that the decision comes very quickly. In India there is the CIBIL score. What are the important things they require from the
customer? They will check this CIBIL score. So they have developed a model: say the score is greater than 800 or something, rather than 500; then they would
immediately give the loan. Because machine learning algorithms are run, it has become very fast. OK, so e-commerce: how is machine learning applied? A person is
buying a book at

Amazon.com. When you click on any product on Amazon, what happens? What does it show at the bottom? Reviews? Yeah, reviews, and it also shows related products. So
for Amazon the task is to recommend other books or products this person is likely to buy. Amazon does clustering based on books bought. So if there is a group of
books bought together with a similar book, it will show you those. Suppose you clicked on "Advances in Knowledge Discovery and Data Mining". It will show that
customers who bought this also bought "Data Mining: Practical Machine Learning Tools and Techniques". It uses past sales: it groups similar-topic books and books
that customers have bought together, and then it will give you the recommendation. OK, next. Have you heard about Netflix? Yeah. Yeah.

It is now in India, but it is very old. What it does is recommend movies: once you start watching a few movies, it will understand your taste and it will recommend
other movies to you. OK. There was a competition organized by Netflix to improve their recommendations. How the Netflix movie recommendation system works is very
interesting, and a lot of research has been done on the Netflix movie recommender system. So the recommendation programs of Amazon and Netflix are quite successful.
Now, in the medical field, the genomic microarrays case. Given microarray data for a

number of patients, can we accurately diagnose the disease? OK. Predict the outcome for a given treatment? Recommend the best treatment? Even if two patients have
the same disease, the treatment may not be the same, because there may be some other factors. So predicting the outcome of a treatment and recommending the best
treatment for this patient is again something a model can do. One example is acute lymphoblastic leukemia (ALL) versus acute myeloid leukemia (AML). There are
different variants of leukemia, and the task is predicting the correct variant. So depending on the gene data, machine learning algorithms can tell whether it is
ALL or AML. Some studies are there: 38 training cases and 34 test cases, and data on about 7000 genes, were available. You use the training data to build a
diagnostic model, and the result on the test data was 33 out of 34 correct, so there is only one error. If you search on Google for classifying ALL and AML based on
machine learning, you will get a lot of research papers. OK, so

next is fraud detection. Machine learning has been applied to detection of money laundering cases (US Treasury), to securities fraud, to phone call fraud (AT&T, Bell
Atlantic and others), and to bioterrorism detection at the Salt Lake Olympics in 2002. Then there is disaster management. When a disaster happens, optimization
analytics is used to direct the correct supplies or food items to the areas where they are needed most. So does the village need bottled water, or rice, or wheat?
Whenever any disaster happens anywhere, the first need is food and water. But the disaster and the location may not be the same, so the requirement may be
different: somewhere you need bottled water, somewhere shelter is more important. OK, one example: Hurricane Frances was on its way to hit Florida's Atlantic coast
in 2004. Walmart wants to predict which

approximately important. OK, one example is uh. Here, Hurricane Frances was on way to hit Florida, the Atlantic Coast into 1/4. Go to Walmart wants to predict which

items will be sold most in the path of the hurricane. OK, so this hurricane is going to hit. I thought you. So before this I can. The Walmart. One Monday morning.

Before. Most in the stores near the Canadian. So can you guess? Paper. Drinking water, Toilet paper? OK Drinking water? Anything else? Please press the cities. Like.

At least the label that battery OK What else? Type foods Google for SO. By Premium. Back to food. Typical. Working, working, yeah. So, OK, nice. So what this

company did? Ohh you didn't know bottled water, flashlights or OK bad food battery and all those things but what what kind the shopper you mind the shopper he

mind the shopper is free when? He several weeks earlier had a different location. OK. So then they mind the shopping shopper history, the sales item sold? OK. And

what they found that in the past since the strawberry pop tarts and bear increased 7 times. OK. So you see this is determining you are finding something which is non

trivial, not obvious. If you if you are working in some company and you say that OK, you keep more water, bottled water, flashlights, food will be sold. Once

everybody is common sense, right? Everybody knows this. OK, but what has been found that is not very obvious? OK. Do you agree? Are you guys here? Yeah, yes,

yeah. OK, Yeah. OK, so. Of the determining tasks which are applied in the marketing for customers, for, for detection, for disaster management, so how we can group

them? Depending on the nature of the task, we can group these tasks, and there are many functionalities, the kinds of patterns to be found in data mining tasks. At
the simplest level they are descriptive and predictive. The first is class or concept description. Here we want to characterize and discriminate. OK. Data
characterization is summarization of the general characteristics or features of a target class of data: you want to summarize some features.
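
Characterization as just defined can be sketched in a few lines: select the target class, then summarize its features. Everything here is hypothetical: the customer records, the choice of "yearly spend above a threshold" as the target class, and the `characterize` helper are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean

# Made-up customer records for illustration only.
customers = [
    {"age": 45, "employed": True, "credit": "excellent", "yearly_spend": 7200},
    {"age": 48, "employed": True, "credit": "excellent", "yearly_spend": 5600},
    {"age": 23, "employed": False, "credit": "fair", "yearly_spend": 900},
    {"age": 42, "employed": True, "credit": "good", "yearly_spend": 6100},
    {"age": 67, "employed": False, "credit": "good", "yearly_spend": 1500},
]


def characterize(rows, in_target_class):
    """Summarize the general features of the rows that belong to the target class.

    Assumes at least one row belongs to the target class.
    """
    target = [row for row in rows if in_target_class(row)]
    ages = [row["age"] for row in target]
    return {
        "count": len(target),
        "age_range": (min(ages), max(ages)),
        "share_employed": mean(1 if row["employed"] else 0 for row in target),
        "most_common_credit": Counter(row["credit"] for row in target).most_common(1)[0][0],
    }


# Target class: customers whose yearly spend is above a chosen threshold.
profile = characterize(customers, lambda row: row["yearly_spend"] > 5000)
```

The output is a general profile of the target class (age range, share employed, typical credit rating) rather than a prediction about any individual customer.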

OK. Discrimination is comparison of the general features of the target class data objects against objects from one or multiple contrasting classes. So this is
comparing one class with another. OK, so what is the output of description? The output can be shown as pie charts, bar charts, multidimensional data cubes and
multidimensional tables. OK, so let us consider an example. There is a company, AllElectronics. It sells electronic goods. It is a successful international company
with branches around the world. OK, each branch has its own set of databases. The data is in the following relational tables. The customer table: customer ID, name,
address, age, occupation and so on. Then you have the item table: item ID, brand, category, type, price. The branch table: branch ID, name and address. The
purchases table gives the transactions that happened. Another is employee. OK then,

items sold: what is the quantity of items sold? OK, so a data characterization example is as follows. Summarize the characteristics of customers who spend more than
5000 a year at AllElectronics. This is an example of a characterization task. The company may ask: what is the profile of the customers who are spending more than
5000 a year at this company? The result is a general profile of these customers: they are 40 to 50 years old, employed and have excellent credit ratings. That is
the profile of the customers who spend more than 5000 a year. OK. Any question? So characterization is the simplest class description. Then there is customer
discrimination. Discrimination is comparison. OK, so, as in the definition, you compare two groups of customers.

OK, let's say those who shop for computer products regularly, where regularly means more than twice a month, and those who rarely shop for such products, less than
three times a year. You want to compare these two groups of customers, so this is an example of discrimination. OK, give me one minute. The result is as follows.
80% of the customers who frequently purchase computer products are between 20 and 40 years old and have a university education, whereas 60% of the customers who
infrequently buy such products are either seniors or youths and have no university degree. So this is an example of discrimination, right: there is a comparison
between two groups of customers. OK. So these come under description. OK, the next functionality is mining frequent patterns, associations and correlations.
Frequent patterns are patterns that occur frequently in data. OK, so

here we there's a term called frequent itemset. The set of items that often appear together in a transaction with the. So. You got people. Although the you know this.

This kind of mining came from retail store data. It is also called market basket analysis. It means you want to analyze which items the customer is buying together in the market basket, or the cart: which items are sold together. OK. For example, if customer X buys a computer, he or she will most likely also buy software. This kind of association is there.

Whether the association is strong or weak, there are some numbers to measure this. Support means: out of all transactions, in how many cases are these items present together? Here that is 2%. Confidence means: what is the probability that the customer buys software, given that the customer has bought a computer? Here a 60% chance is there. OK, so when you create this kind of rule, the algorithm will also give you some kind of quality measure. It indicates whether this rule is likely to be valid in the future or not. If the support is very low, say 0.001, and the confidence is also very low, say 2%, then it is likely that this kind of pattern is occurring only in this data set. In the future it may not occur. So the higher the support and the higher the confidence, the more likely it is that this kind of pattern will occur in the future.

OK, then we can have frequent sequential patterns. Sequential means something the customer buys after making an earlier purchase. Not in the same market basket, not together, but subsequently. So a sequential pattern is like this: suppose a person buys a camera. Then, after a few days or a few months, the person will buy a memory card. OK.
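The support and confidence figures described above can be made concrete with a small sketch. The transactions below are invented for illustration, and the helper names `support` and `confidence` are hypothetical, not taken from any library.

```python
# Toy market-basket data: each transaction is the set of items bought together.
# These transactions are made up for illustration only.
transactions = [
    {"computer", "software"},
    {"computer", "software", "printer"},
    {"computer"},
    {"camera", "memory card"},
    {"software"},
]

def support(itemset, transactions):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

# Rule: buys computer => buys software
s = support({"computer", "software"}, transactions)
c = confidence({"computer"}, {"software"}, transactions)
print(f"support = {s:.2f}, confidence = {c:.2f}")
```

In this toy data the rule holds in 2 of 5 transactions, and 2 of the 3 computer buyers also bought software, so the two numbers answer different questions: how common the pattern is overall, and how reliable it is once the first item is in the basket.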

That is the sequential pattern. For the association rules, the rules are like this: if a person buys a computer, he or she will most likely buy software. OK, so this is the second type of functionality.

The next one is classification and prediction. Here we are finding models that describe and distinguish classes or concepts for prediction. Prediction means we want to know what the output will be in the future. So there are two types. The first is supervised: we are deriving models from labeled data. That means that for the historical data we have, we know the outcome; that is labeled data. We have the X variables and we also know the Y of the data. Y is the class label. Take customers applying for a loan: X holds attributes such as age and income, and Y denotes whether the customer is a good or a bad customer, good or bad meaning whether the person repaid the loan or not. That is labeled data: you know the class from the history.

The typical methods are different ones, like naive Bayes classification, neural networks, and logistic regression. Applications are credit card fraud detection and direct marketing, that is, whether this person will come to buy the product from the shop or not.
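As a minimal sketch of the supervised setup described above, the snippet below derives a one-rule model from labeled loan data and then applies it to a new applicant. The data, the threshold search, and the function names are assumptions made for illustration. It is a deliberately tiny stand-in, not one of the methods named in the lecture.

```python
# Labeled historical data (invented): X = monthly income, y = class label.
history = [
    (20_000, "bad"), (35_000, "bad"), (45_000, "bad"),
    (55_000, "good"), (70_000, "good"), (90_000, "good"),
    (48_000, "good"), (60_000, "bad"),
]

def predict(income, threshold):
    """One-rule model: income below the threshold means a 'bad' customer."""
    return "bad" if income < threshold else "good"

def fit_threshold(data):
    """Pick the candidate threshold with the most correct training predictions."""
    candidates = sorted({income for income, _ in data})

    def n_correct(th):
        return sum(predict(x, th) == y for x, y in data)

    return max(candidates, key=n_correct)

# "Training": derive the model from the labeled history.
threshold = fit_threshold(history)
accuracy = sum(predict(x, threshold) == y for x, y in history) / len(history)
print(f"learned rule: IF income < {threshold} THEN bad ELSE good")
print(f"training accuracy = {accuracy}")

# "Prediction": apply the model to a new customer whose label is unknown.
print(predict(30_000, threshold))
```

The point of the sketch is the workflow rather than the model: the labels in `history` are what make this supervised, and the learned rule is only as good as that history. One labeled customer is still misclassified here.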

Other applications are classifying objects such as stars, diseases, and web pages. OK. The output can be presented as an if-then rule. Like: if income is less than 50,000 per month, the customer is not a good customer. Or you can represent it as a tree, which we will see in the classification chapter. OK. So a rule is like: if age is young and income is high, then the customer will buy a computer. This is the pattern. And if you follow it to the end, what will be the outcome? That is the class label.

OK, so you have this model, and from the historical data you know which customers are good and which are bad. You know that there are two types. But when any new customer comes in the future, you don't know whether he or she will be good or bad. So you apply the model, which means you give the new customer's data to the model. [inaudible]

Now we move to unsupervised learning. It means you don't know the Y. You don't know the correct answer. In the case of the loan example, you remove from the data whether the customer is a good or bad customer, so there is no Y. The objective here is different, for example market segmentation: we want to group similar consumers together.

What is the meaning of market segmentation? It is very, very important for any company manufacturing any product. All consumers are not the same. No company can create one product that will satisfy every type of consumer. Let's say Maruti has many cars. Why do they sell so many cars? Very different variants of cars are there, starting from the Alto. They are for different markets, correct? So some are low-cost, some are premium or high-end, and some are meant for families. So this is segmentation. We want to identify the groups of consumers so that each group gets an appropriate product. For that you must apply clustering to find whether any groups are there and which group you want to target. So let's say you cluster on amount spent. Then you design the product for the right group. And if you are designing the correct product for the right group of customers, then you will realize a profit. If you design something for the wrong group of customers, you will not. OK, this functionality is called cluster analysis.

So, any questions so far? How many functionalities have we covered so far? Data characterization. Data discrimination.

Frequent patterns, associations, and correlations. Classification and prediction. Cluster analysis. So yes, let us continue; there are a few more.

So when you are grouping objects, what is the objective? The objective is to maximize intra-class similarity and minimize inter-class similarity. It means the items within a group are similar to each other, but items from one group will not be similar to items from another group. Within the class we maximize similarity, and between classes we minimize similarity.

OK, the next functionality is outlier analysis. What if some items are common between two groups? Yes, there may be common items. OK, so there are some classical algorithms that assume a very clear separation. But when we have some overlap, there is a concept called fuzzy clustering. It means every item has some degree of belonging to each cluster. It is not a strict yes-or-no situation. So if you have an application where the groups overlap, then you have to apply this kind of algorithm. In what I'm showing here, there is a clear yes or no as to whether an item belongs to a cluster. Overlap is not there.

OK, so back to outliers. What is an outlier? Any data point which does not comply with the stated hypothesis. These are exceptional values. It means the data point does not comply with the general trend.

Or: an outlier is data that does not comply with the general behavior of the data. Another definition: an observation that deviates so much as to arouse suspicion that it was generated by a different mechanism. Something is so different that you suspect whether it belongs with the rest of the data or not. Say you have some trend in the data like this, and some data point is over here. That is an exception. It may be noise, or it may be a genuine exception.

OK, so as for the method, how you find outliers is as a by-product of clustering or regression analysis. If you perform clustering, the points that do not fall into any of the clusters are outliers. Or when you are doing regression analysis, you fit a line, and the points which are very far from the line we can call outliers. So useful information may be there, even in exceptions. What often happens when you are doing preprocessing is that you find an outlier and then you throw it away.
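The idea above, that outliers fall out as a by-product of clustering, can be sketched as follows. This is only an illustration under stated assumptions: the data are invented, k-means is my choice of clustering algorithm (the lecture does not name one here), and the starting centroids and the cut-off of three times the median distance are arbitrary.

```python
from statistics import mean, median

# Invented 1-D data: monthly amount spent by nine customers.
amounts = [10, 12, 11, 13, 50, 52, 51, 49, 200]

def kmeans_1d(points, centroids, max_iter=100):
    """Plain k-means on 1-D data, starting from the given initial centroids."""
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

def flag_outliers(centroids, clusters, factor=3.0):
    """Flag points much farther from their centroid than is typical for the cluster."""
    outliers = []
    for center, cluster in zip(centroids, clusters):
        if not cluster:
            continue
        typical = median(abs(p - center) for p in cluster)
        outliers += [p for p in cluster if abs(p - center) > factor * typical]
    return outliers

centroids, clusters = kmeans_1d(amounts, centroids=[10, 50])
outliers = flag_outliers(centroids, clusters)
print("clusters:", clusters)
print("outliers:", outliers)
```

Within each cluster the members sit close to one another, which is the intra-class similarity the clustering step aims for. The value 200 is assigned to the second cluster but lies far from its centroid, so it is the only point flagged. Note that it also drags that centroid upward, which is one reason a distance cut-off like this needs care.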

OK. But you have to be careful; whether to throw it away depends on the application. Suppose you are working on a fraud detection case. The fraudulent transactions are the outliers. 99% of the transactions will be plain, normal ones, but the data points which are of interest for this application are the outliers. So don't blindly throw outliers away during preprocessing. Sometimes the task is to analyze the outliers themselves, as in fraud detection.

Then you have trend and evolution analysis. Something is changing over time. You want to analyze a time series and its deviations, for example the stock market. [inaudible] You can also mine sequential patterns, because this is continuous data based on time. Like somebody opened a savings bank account and then did something else, or bought a digital camera and then bought something else. That is time series analysis. OK.

[inaudible] OK. Any questions? Machine learning, AI, and data analytics are part of [inaudible]. OK.

OK. There are differences between AI and machine learning. Then you have other buzzwords. [inaudible]

Now, the question is this: machine learning algorithms are capable of generating a lot of patterns. Are all of the patterns interesting? A data mining system has the potential to generate millions of patterns, but only a small fraction of them are interesting. So now, having listened to all these slides, you should be able to answer this question: when will you say that something is interesting? A pattern that gives a proper insight, like that Walmart hurricane example. You did not know it before. So it should be easily understood. Then it should be valid on new or test data with some degree of certainty. Then it should be potentially useful, and novel. That is interesting knowledge: something is interesting if it is something new. The measures are as follows: support, confidence, accuracy, coverage. OK, next question.

[inaudible] Accuracy is: how many times are you correct? These are all objective measures. You are quantifying them with numbers, like support 2%, confidence 60%. The others are very subjective. These cannot be measured in numbers: whether something new is there or not, and whether it is actionable. OK, so data mining is not a single discipline. It draws on several technologies.

So that is the beauty, that is the strength of the subject: it takes from different fields, such as machine learning and computer science. [inaudible]

I will end here. Are there any questions? So you will share the presentation, right? Yes. The assignment is this: I will share one data set. Open it in whatever language you are using and see what is in it. Watch the first three videos here, on data frames and all that. OK, we'll meet next week.

Yes, good question. [inaudible] I have shared the basic videos. There is a lot of material available to learn from. I will share the [inaudible]. OK.

Last few questions. [inaudible] Yes, you just have to copy and paste the code and run it on your own, and then post the results. OK. How much Python experience do you have? [inaudible] The first video is on how to install, then the second one, then the third.

It is simple. OK. Any other questions? If we don't have any, we stop here. [inaudible] to the portal.
