
MGM’s

JAWAHARLAL NEHRU ENGINEERING COLLEGE,


AURANGABAD.

(Affiliated to Dr. B. A. Technological University, Lonere)

INTERNSHIP REPORT
For
Second year
On
“Artificial Intelligence Foundations”

Submitted by

Priyanka Bharat Jadhav (206232)

Ms. C. G. Patil
Internship Mentor

Dr. V. B. Musande
HOD (CSE)

Department of Computer Science and Engineering


Academic Year
2020-21
Index

Sr No  Chapter
1      Summary
2      List of Contents
3      Details of the topic as per the list of contents
       3.1  Introduction to Artificial Intelligence
       3.2  Database Concepts
       3.3  AI Programming Fundamentals: Python
       3.4  AI Statistics – Python
       3.5  Data Visualization with Python
4      Certificate
Chapter 1
Summary

Artificial Intelligence refers to computers with the ability to mimic or duplicate the functions of the human brain. It is the intelligence of machines and the branch of computer science which aims to create smart machines that perform tasks using human-like intelligence.
Examples of AI include Siri, Google Assistant, email spam detection, self-driving cars, and shopping or movie recommendations based on our search history.
The basic foundations required to learn Artificial Intelligence are database concepts, covering the standard language SQL with its data types and data formats, an introduction to databases, and relational databases and their implementation with SQL. The basic programming language required for Artificial Intelligence is Python, so we covered the basics of the Python language and how Python is used in various fields of AI.
In AI Statistics we covered the basic concepts of probability and statistics, including descriptive measures such as the mean, mode, median, standard deviation, correlation, probability, and linear regression in Python.
Python libraries such as Matplotlib, Seaborn, and geoplotlib are used for data visualization with Python. Data can be represented more efficiently and visually with these libraries.

Salient features of the course:

Upon successful completion of this course, the student will be able to:

 Evaluate AI trends and career opportunities

 Describe how AI impacts business

 Demonstrate how to use Python as an AI Tool

 Create data visualizations with Python

 Evaluate the different forms of graphs

 Demonstrate the impact that statistics has on AI research

 Evaluate the fundamentals of SQL and data structures.


Chapter 2

List of contents

There are five courses (topics) in this program.

 1: Introduction to Artificial Intelligence

 2: Database Concepts

 3: AI Programming Fundamentals: Python

 4: AI Statistics – Python

 5: Data Visualization with Python


Chapter 3

Details of the topic as per the list of content

3.1 Introduction to Artificial Intelligence

3.1.1 Evolution and definition of AI

Artificial Intelligence is “the science and engineering of making intelligent machines,” as defined by computer scientist John McCarthy in 1956.

The genesis of AI, as we know it today, begins in the classical age, when the Ancient Greeks told of the god Hephaestus creating robots to assist him in his workshop. In 13th-century Baghdad, an inventor named Al-Jazari built a water-powered automated orchestra that worked by means of a rotating drum with pegs that operated levers to create different musical sounds depending on the position of the pegs. This could be considered the first example of a programmable machine, in other words, a computer.

In the centuries that followed, automata, or moving mechanical devices made to imitate the actions of a living being, were created by inventors and mathematicians: the mechanical lion designed and built by Leonardo da Vinci; the mechanical dolls (The Musician, The Draughtsman, and The Writer) created by Swiss watchmaker and mathematician Pierre Jaquet-Droz; and the Euphonia, a machine that could speak, sing, and laugh, built by Joseph Faber over 25 years.
3.1.2 Key AI concepts and Terminology

 Cognitive Computing

Cognitive Computing is a computer programmed to learn like a human, but faster. It can process pre-programmed information as well as take in new information, independently interpret it, then make decisions and take action accordingly. In this fashion it can “think” by ingesting and synthesizing information, and also re-formulate outcomes by rejecting data that does not fit. It can assess and adapt; however, there is still a steep learning curve around context, which is, so far, the gulf between human-like AI decisions and those of humans.

 Machine Learning

Machine Learning is the part of AI that trains a machine on how to learn based on manually input features and classifiers. This is a method of analysis learned from data patterns or examples, applied to learning experiences, that results in unique data systems with minimal human influence. The life cycle of machine learning is to pose the problem, collect the data, train the algorithm, test the result, collect the feedback, re-calibrate the solution using that feedback, and reapply it to the problem.

Through this cycle, the machine keeps data that is useful and disregards data that is not; the goal is to create an algorithm that will create a better algorithm. The challenge is to ensure that the source data is clean, accurate, well-labeled, and free of bias.
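As a small illustration of this life cycle (a minimal sketch, not part of the course material; it assumes scikit-learn is installed and uses an example dataset and model chosen only for demonstration):

# Machine learning life cycle sketch: pose the problem (classify iris flowers),
# collect the data, train the algorithm, test the result, and measure feedback.
# Illustrative only; assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                       # collect the data
X_train, X_test, y_train, y_test = train_test_split(    # hold out data for testing
    X, y, test_size=0.25, random_state=42)

model = KNeighborsClassifier(n_neighbors=3)              # choose an algorithm
model.fit(X_train, y_train)                              # train the algorithm

predictions = model.predict(X_test)                      # test the result
print("Accuracy:", accuracy_score(y_test, predictions))  # feedback used to re-calibrate

The accuracy score is the "feedback" step: if it is poor, the features, algorithm, or data would be revised and the cycle repeated.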

 Deep Learning

Deep Learning is a specialized form of machine learning that uses many layers of neural networks arranged hierarchically. It is able to learn from data that is unstructured and unlabeled, and it will process and apply structure to make meaning automatically. The “deep” in deep learning refers to the layers upon layers of neural networks used in processing the data. Examples of deep learning are image and speech recognition, and natural language processing. In machine learning, humans act as trainers for the program; in deep learning, neural networks stand in for the human and act as the trainer for the model.

 Neural Networks

Neural Networks attempt to mimic the way the brain works and learns, with multiple nodes between the input and the output and with the program building connections between those nodes. The interrelation and strength of those connections are what influence the output. The more often a specific connection is made, the stronger it becomes. Patterns are built out of the strongest data and can be compared to those of other programs in order to strengthen the network, amassing knowledge the way that human brains do. A neural network will learn from example, so the more examples it sees, the better it learns. And when it has enough examples and non-examples, it is able to make comparisons and learn.
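As a hedged, minimal sketch (not from the course material), a single artificial neuron in pure Python shows how connection weights are strengthened or weakened as more examples are seen; real neural networks use many such nodes arranged in layers:

# Single-neuron (perceptron) sketch learning the logical OR function.
# The weights play the role of connection strengths. Illustrative only.
examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

for _ in range(20):                          # repeat over the examples
    for (x1, x2), target in examples:
        output = 1 if weights[0] * x1 + weights[1] * x2 + bias > 0 else 0
        error = target - output              # compare prediction with the example
        weights[0] += learning_rate * error * x1   # strengthen or weaken connections
        weights[1] += learning_rate * error * x2
        bias += learning_rate * error

print(weights, bias)                         # learned connection strengths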

 Natural Language Processing

Natural Language Processing is the interaction between humans and computers using natural human language. This is quite complex, as language needs to be processed by syntax, semantics, and context. There are also the high-level abstractions of human communication to consider, like sarcasm, inference, and tone.

3.2 Database Concepts

3.2.1 SQL & Relational database models

A relational database is a collection of information (data) that organizes data with defined relationships for easy access. In a relational model, the data is stored as tables.

Some of the more popular relational databases include:

 SQL Server and Access - Microsoft

 Oracle and RDB – Oracle


3.2.2 Types of SQL Relationships

There are two types of relationships in a database:

1. One-to-one (1:1)
Each primary key value relates to only one (or no) record in the related table.
They're like spouses—you may or may not be married, but if you are, both you and
your spouse have only one spouse. Most one-to-one relationships are forced by
business rules and don't flow naturally from the data. In the absence of such a rule,
you can usually combine both tables into one table without breaking any
normalization rules.
2. One-to-many (1:M)
The primary key table contains only one record that relates to none, one, or
many records in the related table. This relationship is similar to the one between you
and a parent. You have only one mother, but your mother may have several children.

The endpoints of the line indicate whether the relationship is one-to-one or one-to-many.

1.      If a relationship has a key at one endpoint and a figure-eight at the other, it is a
one-to-many relationship.

2.  If a relationship has a key at each endpoint, it is a one-to-one relationship.
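To make the one-to-many relationship concrete, the sketch below follows the mother/children analogy above. It is a hedged illustration using SQLite through Python's built-in sqlite3 module; the table and column names are invented for the example, and SQLite only enforces the foreign key when PRAGMA foreign_keys is enabled.

# One-to-many (1:M) sketch: one mother row can relate to many child rows,
# but each child row references exactly one mother. Illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mother (
    mother_id INTEGER PRIMARY KEY,
    name      VARCHAR(50)
);
CREATE TABLE child (
    child_id  INTEGER PRIMARY KEY,
    name      VARCHAR(50),
    mother_id INTEGER REFERENCES mother(mother_id)   -- foreign key to the "one" side
);
INSERT INTO mother VALUES (1, 'Asha');
INSERT INTO child  VALUES (1, 'Priya', 1), (2, 'Rahul', 1);   -- many children, one mother
""")

for row in conn.execute(
        "SELECT m.name, c.name FROM mother m JOIN child c ON c.mother_id = m.mother_id"):
    print(row)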

3.2.3 Data Types and data formats

1. Numeric data types


o int: -2,147,483,648 to 2,147,483,647
o bigint: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807

2. Date and Time


o Date - YYYY-MM-DD 2020-06-01
o Time – HH:MM:SS 23:59:59
o Datetime - YYYY-MM-DD HH:MI:SS  2020-06-01 12:00:01

3. Character and string


 char(n): a fixed-length character string; it can store a maximum of 8,000 characters.
 varchar(n): a variable-length character string; it can store a maximum of 8,000 characters.
 varchar(max): a variable-length character string; it can store a maximum of 2^31-1 characters (up to 2 GB).

3.2.4 NoSQL Databases

NoSQL stands for “not only SQL.” NoSQL is a different approach to database design that provides more flexible schemas for the storage and retrieval of data. NoSQL databases have been around for many years but are now becoming more popular because of their flexibility and ability to deal with big data and high-volume web and mobile applications. They are chosen today for their attributes around scale, performance, and ease of use. The most common types of NoSQL databases are key-value, document, column, and graph databases.

It's important to emphasize that the "No" in "NoSQL" does not mean the actual word "no." This distinction is important not only because many NoSQL databases support SQL-like queries, but because, in a world of microservices and polyglot persistence, NoSQL and relational databases are now commonly used together in a single application.

NoSQL databases do not follow all the rules of a relational database; specifically, they do not use a traditional row/column/table database design and do not use structured query language (SQL) to query data.

3.2.5 Implementation of Database structures

 The standard SQL commands such as “USE”, “SELECT”, “UPDATE”, “DROP”, and “CREATE” can be used to accomplish almost everything that one needs to do with a database.
 Selecting and retrieving data from a SQL database
 Filtering and sorting of data (a short sketch of these operations appears below)
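A minimal sketch of selecting, filtering, and sorting, again using SQLite through Python's sqlite3 module; the employee table and its data are invented for illustration:

# Selecting (SELECT), filtering (WHERE) and sorting (ORDER BY). Illustrative only.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INT, name VARCHAR(50), salary INT, dept VARCHAR(20));
INSERT INTO employee VALUES
    (1, 'Asha', 55000, 'CSE'),
    (2, 'Ravi', 48000, 'IT'),
    (3, 'Meena', 62000, 'CSE');
""")

# Retrieve only CSE employees, highest salary first.
query = "SELECT name, salary FROM employee WHERE dept = 'CSE' ORDER BY salary DESC"
for name, salary in conn.execute(query):
    print(name, salary)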
3.3 AI Programming Fundamentals: Python

3.3.1 Data types

1. String Data type

In the previous exercise, you printed “I am learning Python!” to the console. This is a string. A string is a collection of characters. In Python, it is referred to as the “str” data type.
2. Numeric data type


Python supports two numeric data types: int and float.

int is short for integer. Integers are whole numbers; they don’t have any decimal part.

When a number has a decimal part, as in 3.14 or 2.5, it is of the float data type.

3. Boolean data type

Python also has a bool data type, which stands for Boolean values. It can take only two values: True or False. Note that there are no quotes and the values start with an upper-case letter.

To find the type of any value in Python, we can type "type(<value>)" and it
gives the type.
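A short sketch of these data types and the type() function (the values shown are arbitrary examples):

# Basic Python data types and checking them with type(). Illustrative values only.
message = "I am learning Python!"   # str
count = 10                          # int
pi_value = 3.14                     # float
is_ready = True                     # bool

print(type(message))   # <class 'str'>
print(type(count))     # <class 'int'>
print(type(pi_value))  # <class 'float'>
print(type(is_ready))  # <class 'bool'>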

3.3.2 Data Structures


List:
A list is a collection of objects. The objects can be int, str, float, bool, or even one of the collection objects. Each object can be of a different type, and there can be multiple copies of the same object. A list is created with square brackets around the collection, with each element of the collection separated by a comma.

This is how a list object looks in Python.

[1, "two", 3.5, True, 1, "two"]

Tuple:

A tuple is also a collection of diverse types of objects. The objects can be int, str, float, bool, or even one of the collection objects. Tuples allow more than one copy of the same object.

A tuple is created with rounded brackets around the collection, with each element of the collection separated by a comma.

This is how a tuple looks in Python.

(1, "two", 3.5, True, 1, "two")

Tuples are immutable. You cannot change them from the original form. Tuples are
most useful when you want to create a collection object which you don’t want to be changed
for the life of the application.

Sets:

A set is also a collection of diverse types of objects. The objects can be int, str, float, bool, or a tuple. Sets do not allow more than one copy of the same object. A set is created with curly brackets around the collection, with each element of the collection separated by a comma.

This is how a set looks in Python.

{1, "two", 3.5, True}
Sets are mutable. You can change them from the original form: you can add elements, update elements, and remove elements from the set. But the elements in a set have to be immutable. That is why you cannot have a list or a set as an element inside a set.

Dictionary:

Dict stands for dictionary. A dict is a special collection object which maintains key-value pairs. The key is a unique identifier and the value is an object. The objects can be int, str, bool, float, or any other objects. There can be more than one copy of the same object, but the keys have to be immutable objects and have to be unique. Usually, str objects are used as keys. No two entries can have the same key.

This is how a dict looks in Python.

{"a":"This is first object","b":1,"c":2.5,"d":True,"e":[1,2,3],"f":(4,5,6)}

3.3.3 Loops and control statements

 For & While loop


 If else ,if..elif statement , Nested if statement
 Break , Continue
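A small combined sketch of these constructs (the numbers are arbitrary):

# for loop, if/elif/else, break and continue in one small sketch.
for number in range(1, 10):
    if number == 7:
        break                 # stop the loop entirely at 7
    elif number % 2 == 0:
        continue              # skip even numbers
    else:
        print(number, "is odd")

count = 3
while count > 0:              # while loop counts down
    print("count =", count)
    count -= 1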

3.4 AI Statistics – Python

3.4.1 Basic Statistics Concepts

Statistics is a form of mathematical analysis that uses quantified statistical models and representations for a given set of experimental data or real-life studies. One of the main focuses of statistics is that statistical information can be presented in a visual manner, such as charts and graphs.
Statistics play a central role in AI research and development. Stats allow us to see the "big picture" and the impact on our projects. For example, employees in Big Company, Inc. have been complaining that their offices are always too hot, while workers in the warehouse complain the temperature is too cold. The building heating and cooling is run by HVAC systems that are controlled by micro sensors and an AI-based master controller, in other words, an AI system. Statistical analysis of the temperature variants over a period of time will show where the issues in each area of the building are. By analyzing these statistical variants we can reprogram our AI-based master controller to "learn" the best times of the day to adjust the temperatures according to the ambient temperature both inside and outside of the building. This will help to regulate the specific environmental conditions for individual areas of the building.

Most statistical concepts are concerned with the following analytics:

3.4.2 Descriptive Statistics

Mean:

 The mean is the simple mathematical average of a set of two or more numbers  

 The mean is the most common measure of the central tendency of a set of points.
However, the mean is very sensitive to outliers. 

 Mean can only be used with numeric data.  

Mode:

 The frequency of an attribute value is the number of times the value occurs in the data
set. 

 It is found by collecting and organizing the data in order to count the frequency of
each result.

 The mode is the most frequent number—that is, the number that occurs the highest
number of times.

 The notions of frequency and mode are typically used with categorical data, but they can be used with any data type.
Median:

 The middle number of a set of numbers is called the median. It is found by ordering all data points and picking out the one in the middle (or, if there are two middle numbers, taking the mean of those two numbers).

 It may be thought of as the "middle" value of a data set.
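A minimal sketch of these descriptive statistics using Python's built-in statistics module (the data values are arbitrary):

# Mean, median, mode and standard deviation with the built-in statistics module.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]            # arbitrary sample values

print("Mean:", statistics.mean(data))       # 5.0 - arithmetic average
print("Median:", statistics.median(data))   # 4.5 - middle value of the ordered data
print("Mode:", statistics.mode(data))       # 4   - most frequent value
print("Std dev:", statistics.stdev(data))   # sample standard deviation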

3.4.3 Probabilities

Statistical Probability:

Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur or how likely it is that a proposition is true. Probability is used extensively in AI to develop machine learning models which will predict the outcome based on probability.

Bayes' theorem states that the probability of an event A happening, given that event B has happened, is equal to the probability of event B happening given that A has happened, multiplied by the probability of A happening, divided by the probability of B happening.
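As a hedged worked example (the numbers are invented for illustration), Bayes' theorem P(A|B) = P(B|A) * P(A) / P(B) can be computed directly:

# Worked Bayes' theorem example with invented numbers:
# A = "email is spam", B = "email contains the word 'offer'".
p_a = 0.20            # P(A): 20% of all email is spam
p_b_given_a = 0.60    # P(B|A): 60% of spam contains the word "offer"
p_b = 0.15            # P(B): 15% of all email contains the word "offer"

p_a_given_b = p_b_given_a * p_a / p_b   # Bayes' theorem
print(p_a_given_b)                       # 0.8 - probability an "offer" email is spam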

Categorical Data:

Categorical data is a collection of information that is divided into groups.

Types of Categorical Data:

1. Nominal Data: This is a type of data used to name variables without providing any
numerical value.

2. Ordinal Data: This is a data type with a set order or scale to it. However, this order does
not have a standard scale on which the difference in variables in each scale is measured.

Quantitative data :
Quantitative data are measures of values or counts and are expressed as numbers, both whole
numbers and numbers with decimals. 

Quantitative data are data about numeric variables (e.g. how many; how much; or how often).

Types of Quantitative data:

 Counter: Count equated with entities. For example, the number of people who
download a particular application from the App Store.

 Measurement of physical objects: Calculating the measurement of any physical thing. For example, the HR executive carefully measures the size of each cubicle assigned to the newly joined employees.

 Sensory calculation: A mechanism to naturally “sense” the measured parameters to create a constant source of information. For example, a digital camera converts electromagnetic information to a string of numerical data.

3.4.4 Correlations

Correlation is a statistical measure that expresses the extent to which two variables are
linearly related (meaning they change together at a constant rate). How is the correlation
measured? The sample correlation coefficient, r, quantifies the strength of the relationship.
Correlations are also tested for statistical significance.

Pearson correlation is the one most commonly used in statistics. This measures the
strength and direction of a linear relationship between two variables.

Values always range between -1 (strong negative relationship) and +1 (strong positive
relationship). Values at or close to zero imply weak or no linear relationship.

Correlation coefficient values between -0.8 and +0.8 (that is, with an absolute value less than 0.8) are generally not considered strong.
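A short sketch of computing the Pearson correlation coefficient r (assuming NumPy is available; the data values are arbitrary):

# Pearson correlation with NumPy. The arrays are arbitrary illustrative data.
import numpy as np

hours_studied = np.array([1, 2, 3, 4, 5, 6])
exam_score    = np.array([52, 55, 61, 64, 70, 74])

r = np.corrcoef(hours_studied, exam_score)[0, 1]   # off-diagonal entry is r
print("Pearson r:", round(r, 3))                    # close to +1: strong positive relationship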

3.4.5 Linear Regression


Linear regression is a statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables).

In other words: regression is a mathematical technique used to estimate the cause-and-effect relationship among variables.
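A minimal sketch of fitting a straight line Y = slope * X + intercept (assuming NumPy; the data values are arbitrary and match the correlation example above):

# Simple linear regression: fit Y = slope * X + intercept with NumPy.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])            # independent variable
y = np.array([52, 55, 61, 64, 70, 74])      # dependent variable

slope, intercept = np.polyfit(x, y, 1)       # degree-1 polynomial = straight line
print("slope:", round(slope, 2), "intercept:", round(intercept, 2))

predicted = slope * 7 + intercept            # predict Y for a new X value
print("predicted y at x = 7:", round(predicted, 1))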

3.5 Data Visualization with Python

3.5.1 Visualization Dashboards

Dashboards allow you to develop visualizations without needing to create complicated code. They allow you to take your dataset and create a range of graphics with ease. Three of the most popular visualization dashboards are Microsoft Excel, Tableau Desktop, and Microsoft Power BI.

Basic visualization tools can help you build the most commonly used types of charts, graphs,
and plots.

Specialized visualization tools provide more in-depth visualization controls, so you can develop more specialized visualizations.

3.5.2 Visualization Tools

Specialized visualization tools provide more in-depth visualization controls, so you can develop more specialized visualizations. We’ll discuss the following specialized visualization tools: ggplot and the Tableau Desktop map functionality.

 ggplot from yhat implements code using a high-level but very expressive API. The result is
less time spent creating charts, and more time interpreting what they mean. ggplot is not a
good fit for people trying to make highly customized data visualizations. ggplot has a
symbiotic relationship with pandas. If you're planning on using ggplot, it's best to keep your
data in DataFrames, or a tabular data object. Also, ggplot works by putting visualizations into layers. For example, you could start with data points, then add a line connecting the
points, and then finally add a trendline.

geoplotlib:

geoplotlib is an open-source Python toolbox for visualizing geographical data. According to Andrea Cuttone et al., the creators of the software, “geoplotlib supports the development of hardware-accelerated interactive visualizations in pure python, and provides implementations of dot maps, kernel density estimation, spatial graphs, Voronoi tesselation, shapefiles and many more common spatial visualizations.”

geoplotlib can create a dot map, 2D histogram, heatmap, map with markers, spatial
graph, Voronoi tessellation, Delaunay triangulation, convex hull, shapefiles, and GeoJSON.

3.5.3 Graph types

Building pie charts, line charts, bar charts, column charts, and ring plots.

Pie charts:

A pie chart represents a whole unit, divided into categories represented as percentages of the
whole. When you add up the separate categories, they should add up to 100%.

Line Charts:

A line graph is designed to reveal trends or changes that occur over time. It is best used when
you have a continuous data set, versus one that starts and stops.

Bar Charts:

Bar graphs are similar to column charts, in that you can use them in the same way. However,
column charts limit your label and comparison space.
Column Graph :

Column graphs are one of the most common types of data visualization because they are a simple way to show a comparison among different sets of data.

Ring Plots:

Ring Plots are like pie charts only they have openings in the middle that are for aesthetic
purposes or for layering.

Map Graphs:

Map graphs allow you to plot data onto a geographic area, either using predefined lists (like
country) or custom lists (like company sales regions).
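A hedged sketch of building a few of these chart types with Matplotlib (assuming matplotlib is installed; the data and labels are invented for illustration):

# Pie, bar and line charts with Matplotlib. Data and labels are illustrative only.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

# Pie chart: parts of a whole that add up to 100%.
axes[0].pie([40, 30, 20, 10], labels=["A", "B", "C", "D"], autopct="%1.0f%%")
axes[0].set_title("Pie chart")

# Bar chart: horizontal bars comparing categories (plt.bar gives a column chart).
axes[1].barh(["Q1", "Q2", "Q3", "Q4"], [120, 150, 90, 170])
axes[1].set_title("Bar chart")

# Line chart: a trend over continuous data such as time.
axes[2].plot([1, 2, 3, 4, 5], [10, 12, 9, 15, 18], marker="o")
axes[2].set_title("Line chart")

plt.tight_layout()
plt.show()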
Chapter 4: Certificate
