Assignment 1 Eti
Technology
TYCO A
ROLL NO 21
ASSIGNMENT 1
1. Explain the History of AI
o Year 1955: Allen Newell and Herbert A. Simon created the "first artificial
intelligence program", which was named "Logic Theorist". This program proved
38 of 52 mathematics theorems and found new, more elegant proofs for some of
them.
o Year 1956: The term "Artificial Intelligence" was first adopted by American computer
scientist John McCarthy at the Dartmouth Conference, and for the first time AI was
coined as an academic field.
At that time, high-level computer languages such as FORTRAN, LISP, and COBOL were
invented, and enthusiasm for AI was very high.
The golden years - Early enthusiasm (1956-1974)
The first AI winter (1974-1980)
o The duration between 1974 and 1980 was the first AI winter. An AI winter refers
to a period when computer scientists dealt with a severe shortage of government
funding for AI research.
o During AI winters, public interest in artificial intelligence decreased.
A boom of AI (1980-1987)
o Year 1980: After the AI winter, AI came back with "Expert Systems". Expert
systems were programs that emulate the decision-making ability of a human
expert.
o In the year 1980, the first national conference of the American Association for
Artificial Intelligence was held at Stanford University.
The second AI winter (1987-1993)
o The duration between 1987 and 1993 was the second AI winter.
o Investors and governments again stopped funding AI research due to the high
cost and inefficient results, even though expert systems such as XCON had been
very cost effective.
o Year 1997: IBM's Deep Blue beat world chess champion Garry Kasparov,
becoming the first computer to beat a world chess champion.
o Year 2002: For the first time, AI entered the home in the form of Roomba, a
robotic vacuum cleaner.
o Year 2006: AI came into the business world. By 2006, companies like
Facebook, Twitter, and Netflix had started using AI.
Now AI has developed to a remarkable level. Concepts such as deep learning, big data,
and data science are now booming. Companies like Google, Facebook, IBM, and
Amazon are working with AI and creating amazing devices. The future of artificial
intelligence is inspiring and will come with high intelligence.
ANS- Data mining, also known as knowledge discovery in data (KDD), is the process
of uncovering patterns and other valuable information from large data sets. Given the
evolution of data warehousing technology and the growth of big data, adoption of
data mining techniques has accelerated rapidly over the last couple of decades,
assisting companies in transforming their raw data into useful knowledge. However,
despite the fact that technology continuously evolves to handle data at large scale,
leaders still face challenges with scalability and automation.
Data mining has improved organizational decision-making through insightful data
analyses. The data mining techniques that underpin these analyses serve two main
purposes: they can either describe the target dataset or predict outcomes through
the use of machine learning algorithms. These methods are used to organize and
filter data, surfacing the most interesting information, from fraud detection to user
behaviors, bottlenecks, and even security breaches.
When combined with data analytics and visualization tools, like Apache Spark,
delving into the world of data mining has never been easier and extracting relevant
insights has never been faster. Advances within artificial intelligence only continue to
expedite adoption across industries.
The data mining process involves a number of steps from data collection to
visualization to extract valuable information from large data sets. As mentioned
above, data mining techniques are used to generate descriptions and predictions
about a target data set. Data scientists describe data through their observations of
patterns, associations, and correlations. They also classify and cluster data through
classification and regression methods, and identify outliers for use cases, like spam
detection.
Data mining usually consists of four main steps: setting objectives, data gathering
and preparation, applying data mining algorithms, and evaluating results.
1. Set the business objectives: This can be the hardest part of the data mining
process, and many organizations spend too little time on this important step. Data
scientists and business stakeholders need to work together to define the business
problem, which helps inform the data questions and parameters for a given project.
Analysts may also need to do additional research to understand the business context
appropriately.
2. Data preparation: Once the scope of the problem is defined, it is easier for data
scientists to identify which set of data will help answer the pertinent business
questions. Once they collect the relevant data, it will be cleaned, removing any
noise, such as duplicates, missing values, and outliers. Depending on the dataset,
an additional step may be taken to reduce the number of dimensions as too many
features can slow down any subsequent computation. Data scientists will look to
retain the most important predictors to ensure optimal accuracy within any models.
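The cleaning described in this step can be sketched in plain Python. This is a minimal illustration, not a production pipeline: the sensor readings and the z-score cutoff below are invented assumptions for the example.

```python
import statistics

def clean(values, z_threshold=2.0):
    """Clean a numeric dataset: drop missing values, drop duplicates
    (keeping the first occurrence), then drop outliers by z-score."""
    # Remove missing values (represented here as None)
    present = [v for v in values if v is not None]
    # Remove duplicates while preserving order
    seen, unique = set(), []
    for v in present:
        if v not in seen:
            seen.add(v)
            unique.append(v)
    # Remove outliers: values more than z_threshold standard
    # deviations away from the mean are discarded as noise
    mean = statistics.mean(unique)
    stdev = statistics.stdev(unique)
    return [v for v in unique if abs(v - mean) <= z_threshold * stdev]

# Hypothetical temperature readings; 98.6 is a sensor glitch
raw = [21.0, 22.5, None, 22.5, 23.1, 20.9, 21.7, 22.0, 23.4, 98.6]
print(clean(raw))   # the None, the duplicate 22.5, and the glitch are removed
```

In a real project this kind of cleaning would typically be done with a library such as pandas; the sketch only shows the logic of each sub-step.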
3. Model building and pattern mining: Depending on the type of analysis, data
scientists may investigate any interesting data relationships, such as sequential
patterns, association rules, or correlations. While high frequency patterns have
broader applications, sometimes the deviations in the data can be more interesting,
highlighting areas of potential fraud.
Deep learning algorithms may also be applied to classify or cluster a data set
depending on the available data. If the input data is labelled (i.e. supervised
learning), a classification model may be used to categorize data, or alternatively, a
regression may be applied to predict the likelihood of a particular assignment. If the
dataset isn’t labelled (i.e. unsupervised learning), the individual data points in the
training set are compared with one another to discover underlying similarities,
clustering them based on those characteristics.
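The labelled vs. unlabelled distinction above can be sketched with two toy algorithms: a nearest-centroid classifier for the supervised case and a greedy distance-based grouping for the unsupervised case. Both are deliberate simplifications invented for illustration, not the algorithms a real data mining project would use.

```python
import math

# Supervised: labelled points -> a nearest-centroid classifier
def train_centroids(points, labels):
    """Average the 2-D points of each label into a class centroid."""
    sums = {}
    for p, y in zip(points, labels):
        sx, sy, n = sums.get(y, (0.0, 0.0, 0))
        sums[y] = (sx + p[0], sy + p[1], n + 1)
    return {y: (sx / n, sy / n) for y, (sx, sy, n) in sums.items()}

def classify(centroids, p):
    """Assign p the label of the closest centroid."""
    return min(centroids, key=lambda y: math.dist(p, centroids[y]))

# Unsupervised: unlabelled points -> group by mutual similarity
def cluster(points, radius):
    """Greedy clustering: each point joins the first cluster whose
    seed point is within `radius`, otherwise it starts a new cluster."""
    clusters = []
    for p in points:
        for c in clusters:
            if math.dist(p, c[0]) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters

points = [(1, 1), (1, 2), (8, 8), (9, 8)]
labels = ["ham", "ham", "spam", "spam"]       # hypothetical spam-detection labels
model = train_centroids(points, labels)
print(classify(model, (2, 1)))                # lands near the "ham" centroid

print([len(c) for c in cluster(points, radius=3.0)])   # two groups of two
```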
4. Evaluation of results and implementation of knowledge: Once the data is
aggregated, the results need to be evaluated and interpreted. When finalizing
results, they should be valid, novel, useful, and understandable. When these criteria
are met, organizations can use this knowledge to implement new strategies, achieving
their intended objectives.
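One simple, concrete validity check in this evaluation step is accuracy on held-out data. The labels below are invented for the sketch; real evaluation would also consider other metrics such as precision and recall.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the held-out actual labels."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

# Hypothetical held-out labels vs. a model's predictions
actual    = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]
print(accuracy(actual, predicted))   # 4 of 5 correct -> 0.8
```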
ANS- A dataset is a set or collection of data, normally presented in a tabular
pattern. Every column describes a particular variable, and each row corresponds to a
given member of the data set. This is a part of data management. Data sets describe
values of variables such as the height, weight, temperature, or volume of an object, or
values of random numbers. Each value in the set is known as a datum. The data set
contains data for one or more members, corresponding to each row. In this answer, let
us learn the definition of a dataset, the different types of datasets, their properties, and
so on, with solved examples.
A data set is an ordered collection of data. As we know, a collection of information obtained
through observations, measurements, study, or analysis is referred to as data. It could include
information such as facts, numbers, figures, names, or even basic descriptions of objects. For
our study, data can be organized in the form of graphs, charts, or tables. Through data mining,
data scientists assist in the analysis of gathered data.
A dataset is a set of numbers or values that pertain to a specific topic. For example, the
test scores of each student in a certain class form a dataset. Datasets can be written as a
list of numbers in random order, as a table, or enclosed in curly brackets. Data sets are
normally labelled so you understand what the data represents; however, while dealing
with data sets, you don't always know what the data stands for, and you don't
necessarily need to know what it represents to solve the problem.
In Statistics, we have different types of data sets available for different types of information. They
are:
Numerical Datasets
A numerical data set is one where the data are expressed in numbers rather than
natural language. Numerical data is sometimes called quantitative data, and the set of
all quantitative/numerical data is called a numerical data set. Numerical data is always
in number form, so we can perform arithmetic operations on it.
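Because numerical data supports arithmetic, summary statistics can be computed on it directly. The test scores below are an invented example:

```python
import statistics

# A numerical dataset: test scores in a hypothetical class
scores = [72, 85, 90, 85, 78, 64, 88]

print(sum(scores))                # total of all scores
print(statistics.mean(scores))    # average score
print(statistics.median(scores))  # middle value of the sorted scores
```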
Bivariate Datasets
A data set that has two variables is called a bivariate data set. It deals with the
relationship between the two variables. A bivariate dataset usually contains two types
of related data.
Examples:
1. The percentage score and age of the students in a class. Score and age are the two
variables.
2. The sales of ice cream versus the temperature on that day. Here the two variables
are ice cream sales and temperature.
(Note: If you have only one set of data, say temperature alone, it is called a univariate
dataset.)
Multivariate Datasets
A multivariate dataset is a data set with multiple variables. When the dataset contains
three or more variables, it is called a multivariate dataset. In other words, a
multivariate dataset consists of individual measurements that are acquired as a
function of three or more variables.
Example: To measure the length, width, height, and volume of a rectangular box, we
have to use multiple variables to distinguish between those entities.
Categorical Datasets
Categorical data sets represent features or characteristics of a person or an object. A
categorical dataset consists of categorical variables, also called qualitative variables. A
categorical variable that can take exactly two values is termed a dichotomous variable,
while a categorical variable with more than two possible values is called a polytomous
variable. Categorical variables are often assumed to be polytomous unless otherwise
specified.
Example: A person's smoking status (smoker/non-smoker) is a dichotomous variable,
while a person's blood group (A, B, AB, O) is a polytomous variable.
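The dichotomous vs. polytomous distinction can be sketched in code; the category values below are invented for illustration:

```python
from collections import Counter

# Dichotomous categorical variable: exactly two possible values
smoker = ["yes", "no", "no", "yes", "no"]

# Polytomous categorical variable: more than two possible values
blood_group = ["A", "B", "O", "O", "AB", "A"]

print(Counter(smoker))            # counts per category
print(len(set(smoker)) == 2)      # True: dichotomous
print(len(set(blood_group)) > 2)  # True: polytomous
```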
Correlation Datasets
Correlation data sets are sets of values that demonstrate some relationship with each
other; the values are found to be dependent on each other.
Generally, correlation is defined as a statistical relationship between two
entities/variables. In some scenarios, you might have to predict the correlation
between things, so it is essential to understand how correlation works. Correlation is
classified into three types:
Positive correlation – the two variables move in the same direction (either both go
up or both go down).
Negative correlation – the two variables move in opposite directions (one variable
goes up while the other goes down, and vice versa).
No or zero correlation – there is no relationship between the two variables.
Example: A tall person is generally heavier than a short person, so here the weight and
height variables are dependent on each other.
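The types of correlation above can be checked numerically with the Pearson correlation coefficient. Below is a pure-Python sketch; the height/weight and temperature/coat-sales figures are invented for illustration (note the formula divides by zero if either variable is constant, i.e. has no variance):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length datasets."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

heights = [150, 160, 170, 180, 190]
weights = [52, 58, 65, 72, 80]        # weight tends to rise with height
print(round(pearson(heights, weights), 3))   # close to +1: positive correlation

temps = [30, 25, 20, 15, 10]
coats = [1, 3, 5, 7, 9]               # coat sales fall as temperature rises
print(round(pearson(temps, coats), 3))       # -1.0: perfect negative correlation
```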