
UNIT 3

Experimental Skills

Science is built on experiments: a hypothesis can become a law only when it is justified experimentally. Experimentation involves concepts such as precision and accuracy, and all observations, together with the values obtained from them, are approximate.
Experiments are important because they are what actually validate what a theory says.

Experimental Skills
• Experimental skills are the skills we need to ensure that our experiments run correctly, so that we can be confident the data obtained from them are correct. Only then can we say with confidence that, if the theory does not match the experimental data, there is some issue with the theory.

ASST.PROF PALLAVI D.PATIL(CSE DEPARTMENT ,SETI,PANHALA) 1


• If you have run the experiment incorrectly, this obviously does not hold: you cannot confidently say that the theory is incorrect or incomplete.
• So, it is very important to run your experiments correctly; if you are an experimentalist, this is something you will have to deal with all the time.
• You should be ready to show in what ways you have run your experiment correctly, and your experiment should be open to scrutiny: people should be able to ask you many questions about how you ran it, and you should be able to defend your procedure. Learning experimental skills is an important part of that.
• Experiments provide us with knowledge of the physical world and with the evidence that grounds this knowledge. One of the major roles of experimental skills is to test theories and to provide the basis for scientific knowledge, so we have to run our experiments carefully and correctly.

How to do Experiments -
1. Make observations.
2. Form a hypothesis.
3. Make a prediction.
4. Perform an experiment.
5. Analyze the results of the experiment.
6. Draw a conclusion.
7. Report your results.

Experimental skills are important for experimental research.

What is experimental research?
• Let us try to understand this through examples.
• Imagine taking two samples of the same plant and exposing one of them to sunlight while the other is kept away from sunlight. Let the plant exposed to sunlight be called sample A and the other sample B. Suppose that, at the end of the study, sample A has grown and sample B has died, even though both were watered regularly and otherwise given the same treatment. We can then conclude that sunlight aids growth in similar plants.
Another example
• Nowadays the coronavirus has spread across the world. Scientists in laboratories around the world are applying different medicines to coronavirus samples to see the effect of each medicine on the disease.
So, what are the scientists doing here? They are manipulating independent variables; here, the medicines are the independent variables. On what is the effect of the independent variables observed? On the coronavirus samples; here, the response of the virus is the dependent variable.

Meaning of experimental research

Based on the examples just given, experimental research can be defined as follows:
“Experimental research is a scientific and systematic approach to research, where one or more independent variables are manipulated and applied to one or more dependent variables to measure their effect on the latter.”

The effect of the independent variables on the dependent variables is usually observed and recorded over some time, to help researchers draw a reasonable conclusion regarding the relationship between these two types of variable.
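
As a rough illustration of this definition, the sketch below compares a dependent variable (plant height) between a group that did not receive the treatment and a group that did. The heights and the function name `treatment_effect` are made up for illustration; they are not data from any real study.

```python
from statistics import mean


def treatment_effect(control, treated):
    """Difference between the mean outcome of the treated and control groups."""
    return mean(treated) - mean(control)


# Hypothetical plant heights in cm at the end of the study (assumed values).
# Independent variable: sunlight exposure (no / yes).
# Dependent variable: plant height.
height_without_sunlight = [4.0, 5.0, 4.5, 5.5]   # sample B (control)
height_with_sunlight = [9.0, 10.0, 11.0, 10.0]   # sample A (treated)

effect = treatment_effect(height_without_sunlight, height_with_sunlight)
print(f"Mean height difference: {effect:.2f} cm")
```

A difference in means on its own does not establish a relationship; in practice the extraneous variables must be controlled and the experiment replicated.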

Characteristics of experimental research

• Control
Variables that are not of direct interest to the researcher, called extraneous variables, need to be controlled. Control refers to removing or minimising the influence of such variables.
• Manipulation
Manipulation refers to a deliberate operation of the conditions by the researcher. In this process, a predetermined set of conditions, called the independent variable or experimental variable (also called the treatment variable), is imposed on the subjects of the experiment. In specific terms, manipulation is the deliberate application of the independent variable to the subjects of the experimental group by the researcher in order to observe its effect.
• Observation
In experimental research, the experimenter observes the effect of the manipulation of the independent variable on the dependent variable. The dependent variable may be, for example, performance or achievement in a task.
• Replication
Replication means conducting a number of sub-experiments, instead of only one experiment, within the framework of the same experimental design. The researcher may make multiple comparisons between a number of cases from the control group and a number of cases from the experimental group.

Steps in experimental Research

1. Selecting and defining the problem.
After deciding on the topic of interest, the researcher tries to define the research problem. This helps the researcher focus on a narrow research area so that it can be studied appropriately. Defining the research problem also helps in formulating a research hypothesis.
2. Surveying the literature.
In the research process, the literature survey is an important point in all activities. It helps the researcher decide whether the topic is worth studying, and it provides insight into ways in which the researcher can limit the scope to a needed area of inquiry.
3. Stating the hypothesis.
It is almost impossible for a researcher not to have any hypothesis or objectives before proceeding with the work, because the hypothesis or objectives show the researcher the direction to take. That is why, in experimental research, the research design is built around a tentative hypothesis or clearly defined objectives.
4. Constructing an experimental design.
It represents all the elements, conditions, and relations of the consequences:
• Select a sample of subjects

Examples of Experimental Research

Examples of experimental research differ depending on the type of experimental research design being considered. The most basic example is the laboratory experiment, which may differ in nature depending on the subject of research.
1. Administering Exams After The End of Semester
During the semester, students in a class are lectured on particular courses, and an exam is administered at the end of the semester. In this case, the students are the subjects, their exam performance is the dependent variable, and the lectures are the independent variable applied to the subjects.
Only one group of carefully selected subjects is considered in this research, making it an example of a pre-experimental research design. Note also that tests are carried out only at the end of the semester and not at the beginning, which makes it easy to conclude that this is a one-shot case study.
2. Employee Skill Evaluation
Before employing a job seeker, organizations conduct tests that are used to screen out less qualified candidates from the pool of applicants. This way, organizations can determine an employee's skill set at the point of employment.
In the course of employment, organizations also carry out employee training to improve employee productivity and generally grow the organization. Further evaluation is carried out at the end of each training to test the impact of the training on employee skills and to test for improvement.
Here, the subject is the employee, while the treatment is the training conducted. This is a pretest-posttest control group experimental research example.
3. Evaluation of Teaching Method
Let us consider an academic institution that wants to evaluate the teaching methods of two teachers to determine which is better. Imagine a case in which the students assigned to each teacher are carefully selected, perhaps because of personal requests by parents or on the basis of the students' stubbornness and smartness.
This is an example of a nonequivalent group design, because the samples are not equivalent. By evaluating the effectiveness of each teacher's teaching method this way, we may draw a conclusion after a post-test has been carried out.
However, the result may be influenced by factors such as the natural smartness of a student. For example, a very smart student will grasp the material more easily than his or her peers, irrespective of the method of teaching.

MEANING OF DATA ANALYSIS

In any research, the analysis of data is one of the most crucial tasks, requiring proficient knowledge to handle the data collected as per the pre-decided research design of the project.
Analysis of data is defined by Prof. Wilkinson and Bhandarkar as “a number of closely related operations that are performed with the purpose of summarizing the collected data and organizing these in such a manner that they will yield answers to the research questions or suggest hypotheses or questions if no such questions or hypotheses had initiated the study.”
According to Goode, Barr and Scales, “analysis is a process which enters into research in one form or another from the very beginning... It may be fair to say that research consists in general of two larger steps – the gathering of data [and the analysis of these data], but no amount of analysis can validly extract from the data factors which are not present.”
In his book on research methodology, C. R. Kothari explains that the term analysis refers to the computation of certain measures along with searching for patterns of relationship that exist among data groups. He quotes G. B. Giles to further elaborate the concept: “in the process of analysis, relationships or differences supporting or conflicting with original or new hypotheses should be subjected to statistical tests of significance to determine with what validity data can be said to indicate any conclusions.”

Data analysis is defined as a process of cleaning, transforming, and modeling data to discover useful information for business decision-making. The purpose of data analysis is to extract useful information from data and to take decisions based upon that analysis.
A simple example of data analysis: whenever we take a decision in our day-to-day life, we think about what happened last time or what will happen if we choose that particular option. This is nothing but analyzing our past or future and making decisions based on it. For that, we gather memories of our past or expectations of our future. When an analyst does the same thing for business purposes, it is called data analysis.

Data Analysis Tools

Data analysis tools make it easier for users to process and manipulate data, analyze the relationships and correlations between data sets, and identify patterns and trends for interpretation.

Types of data in research

Every kind of data describes something by assigning a specific value to it. For analysis, these values need to be organized, processed, and presented in a given context to make them useful. Data can take different forms; here are the primary data types.

• Qualitative data: When the data presented consists of words and descriptions, we call it qualitative data. Although you can observe this data, it is subjective and harder to analyze in research, especially for comparison. Example: anything describing taste, experience, texture, or an opinion is considered qualitative data. This type of data is usually collected through focus groups, personal interviews, or open-ended questions in surveys.
• Quantitative data: Any data expressed in numbers or numerical figures is called quantitative data. This type of data can be distinguished into categories, grouped, measured, calculated, or ranked. Example: answers to questions about age, rank, cost, length, weight, scores, etc. all come under this type of data. You can present such data in graphical format or charts, or apply statistical analysis methods to it. Outcomes Measurement Systems (OMS) questionnaires in surveys are a significant source of numeric data.
• Categorical data: This is data presented in groups, where an item cannot belong to more than one group. Example: a person responding to a survey by stating his or her lifestyle, marital status, smoking habit, or drinking habit provides categorical data. A chi-square test is a standard method used to analyze this data.
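
To make the chi-square test mentioned above more concrete, here is a minimal sketch that computes the chi-square statistic for a two-way table of counts. The survey counts are hypothetical, and the function name `chi_square_statistic` is our own; a statistics package would normally be used to obtain the p-value as well.

```python
def chi_square_statistic(table):
    """Chi-square statistic for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical survey counts (assumed): rows = smoker / non-smoker,
# columns = married / single.
observed = [[20, 30],
            [30, 20]]

stat = chi_square_statistic(observed)
dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(f"chi-square = {stat:.2f} with {dof} degree(s) of freedom")
```

For one degree of freedom the critical value at the 5% level is about 3.84, so a statistic above that would suggest the two categories are not independent.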

Data Analysis Process


Data Analysis Process consists of the following phases that are
iterative in nature −
1. Data Requirements Specification
2. Data Collection
3. Data Processing
4. Data Cleaning
5. Data Analysis
6. Communication

1. Data Requirements Specification
The data required for analysis is based on a question or an experiment. Based on the requirements of those directing the analysis, the data necessary as inputs to the analysis is identified (e.g., a population of people). Specific variables regarding a population (e.g., age and income) may be specified and obtained. Data may be numerical or categorical.
2. Data Collection
Data collection is the process of gathering information on the targeted variables identified as data requirements. The emphasis is on ensuring accurate and honest collection of data, so that the decisions based on it are valid. Data collection provides both a baseline to measure against and a target to improve on.
Data is collected from various sources, ranging from organizational databases to the information in web pages. The data thus obtained may not be structured and may contain irrelevant information. Hence, the collected data must be subjected to data processing and data cleaning.
3. Data Processing
The data that is collected must be processed or organized for analysis. This includes structuring the data as required by the relevant analysis tools. For example, the data might have to be placed into rows and columns in a table within a spreadsheet or statistical application. A data model might have to be created.
4. Data Cleaning
The processed and organized data may be incomplete, contain duplicates, or contain errors. Data cleaning is the process of preventing and correcting these errors. There are several types of data cleaning, depending on the type of data. For example, while cleaning financial data, certain totals might be compared against reliable published numbers or defined thresholds. Likewise, quantitative methods can be used to detect outliers, which are subsequently excluded from the analysis.
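
The cleaning step can be sketched as follows, assuming the records are simple (age, income) tuples. The data, the function names, and the choice of the 1.5 × IQR rule for outliers are assumptions made for illustration; the notes do not prescribe a particular method.

```python
from statistics import quantiles


def drop_missing_and_duplicates(records):
    """Keep the first copy of each record; skip records with a missing field."""
    seen = set()
    cleaned = []
    for record in records:
        if None in record or record in seen:
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical (age, income) records, made up for illustration.
records = [
    (25, 30000), (31, 32000), (25, 30000),   # third record is a duplicate
    (40, None),                              # missing income
    (38, 31000), (29, 29000), (35, 500000), (33, 30500),
]

cleaned = drop_missing_and_duplicates(records)
incomes = [income for _, income in cleaned]
outliers = iqr_outliers(incomes)
print(len(cleaned), "records kept; outlier incomes:", outliers)
```

Whether a flagged value is really an error or a genuine extreme observation still has to be judged from the context of the data.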
5. Data Analysis
Data that has been processed, organized, and cleaned is ready for analysis. Various data analysis techniques are available to understand, interpret, and derive conclusions based on the requirements. Data visualization may also be used to examine the data in graphical format and to obtain additional insight regarding the messages within the data.
Statistical data models such as correlation and regression analysis can be used to identify the relations among the data variables. These models, which are descriptive of the data, help in simplifying the analysis and communicating results.
The process might require additional data cleaning or additional data collection; hence these activities are iterative in nature.
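
As a small illustration of correlation and regression, the sketch below computes the Pearson correlation coefficient and a least-squares straight line for two variables. The numbers and the function names `pearson_r` and `fit_line` are assumed for the example.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares slope and intercept of the line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


# Hypothetical measurements (assumed values).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

r = pearson_r(x, y)
slope, intercept = fit_line(x, y)
print(f"r = {r:.4f}, y = {slope:.2f} * x + {intercept:.2f}")
```

A correlation close to 1 here only says that the two variables move together in this data set; it does not by itself show which one causes the other.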
6. Communication
The results of the data analysis are to be reported in the format required by the users, to support their decisions and further action. Feedback from the users might result in additional analysis.
Data analysts can choose data visualization techniques, such as tables and charts, which help in communicating the message clearly and efficiently to the users. The analysis tools provide facilities to highlight the required information with color codes and formatting in tables and charts.

MODELLING SKILLS

What a model is
• A model is a mathematical or descriptive (that is, quantitative or qualitative) abstraction of a process.
• It allows us to describe a process in mathematical terms, so that we can emulate or simulate the behavior of the process.
Why would we need a model?
• One of the most popular uses of a model is prediction, which is also called forecasting.
• A model takes certain inputs from the user and then makes a prediction of how the process would respond. Models are therefore heavily used in predicting or inferring unknowns.
• Models are also heavily used in classification.
• A third application of modelling is fault detection.
• For example, I know how a friend of mine talks, sits, and so on when he or she is normal. When something is wrong, perhaps when something is troubling my friend, I can tell by observing his or her behavior. What is happening underneath is that I am comparing my friend's current behavior against the normal behavior that I have observed over a period of time, seeing that there is a large difference, and concluding that something is abnormal.
• This is pretty much the same idea in fault detection. We build models of the process under normal operating conditions, from historical data or through first-principles approaches, and then keep comparing the measurements that come out of the process against what the model predicts. If there is a significant difference between the prediction and the measurement, we raise an alarm, probably conclude that there is a fault, and take it up for further diagnosis.
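
The comparison of measurement against prediction can be sketched in a few lines. The readings, the threshold of 1.0, and the function name `detect_faults` are assumptions chosen for illustration; in practice the threshold would be set from the normal variability of the process.

```python
def detect_faults(measurements, predictions, threshold):
    """Indices where the measurement deviates from the model prediction
    by more than the threshold."""
    return [
        k
        for k, (y, y_hat) in enumerate(zip(measurements, predictions))
        if abs(y - y_hat) > threshold
    ]


# Hypothetical temperature readings (assumed values).
predicted = [50.0, 50.5, 51.0, 51.5, 52.0]   # model of normal operation
measured = [50.2, 50.4, 51.1, 55.0, 52.1]    # what the process actually gave

alarms = detect_faults(measured, predicted, threshold=1.0)
for k in alarms:
    print(f"Alarm at sample {k}: measured {measured[k]}, predicted {predicted[k]}")
```

An alarm only flags that something is abnormal; finding the cause is the job of the diagnosis step that follows.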
• Another prime use of models is in simulation. We have heard of simulators; some of you must have worked with different simulators in chemical engineering, mechanical engineering, aerospace, and so on, and you have heard of aircraft or flight simulators. There, the primary role of the model is again prediction: we give the model the same inputs that we would see when operating the process and ask how the process would respond, and the model makes a prediction. We need high-fidelity models in such applications.
• Models are also heavily used in control, where the model predicts where the process is heading and a controller then takes action to keep the response of the process close to the set point. In control applications we may not need high-fidelity models; we can work with fairly approximate models.
• Finally, models find uses in design, optimization, and so on.

Model Vs Process

• To make the distinction (and the similarity) between a model and a process clear, consider the architecture of each. In the process architecture, there are causal, physical inputs going into the process, and there are disturbances acting on the process. You can take any process and cast it into this architecture. The process then responds, and its responses are called outputs in engineering terminology.
• Models also give you the outputs that are of interest to you. Typically, the output of a model is the same as the output of the process; it is nothing but the variable that you want to predict. The inputs to the model, however, need not be only the physical inputs that go into the process. They are generally the same as, or more than, the inputs going into the process. For example, in a dynamic model the output is modeled as a function of the present and the past inputs, because transients are important to us. So the inputs to the model are not only the present input but also the past inputs, whereas the process operates in real time and receives only the present input at any instant; the past inputs were applied at earlier times. In the model, by contrast, you feed in the past inputs as well and then make a prediction, because a model is, after all, a mathematical abstraction.
• There are also certain user-defined parameters or user-specific inputs that you have to provide to the model, along with the system parameters. So the architectures are different, but the final use of the model is prediction, that is, predicting the variable of interest to you. Do not confuse the inputs that go into the process with the inputs that go into the model.
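
The idea that a dynamic model uses past values to predict the present output can be illustrated with a simple first-order difference equation, y[k] = a·y[k-1] + b·u[k-1]. This particular model form, the parameter values a = 0.5 and b = 1.0, and the function name `simulate_first_order` are assumptions for the sketch, not a model taken from these notes.

```python
def simulate_first_order(inputs, a=0.5, b=1.0, y0=0.0):
    """Simulate y[k] = a * y[k-1] + b * u[k-1] for a sequence of inputs u.

    The parameters a and b play the role of the system parameters that the
    user has to supply to the model.
    """
    outputs = [y0]
    for k in range(1, len(inputs)):
        # The prediction at instant k uses only past values (instant k-1).
        outputs.append(a * outputs[k - 1] + b * inputs[k - 1])
    return outputs


# A unit step input applied from k = 0 onwards.
u = [1.0] * 5
y = simulate_first_order(u)
print(y)
```

With these values the output rises gradually toward b / (1 - a) = 2 rather than jumping there at once; that gradual rise is the transient behaviour the dynamic model is meant to capture.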

Two broad approaches to modelling

• Let us look at how models are developed, because we want to gain some insight into how to build them. There are two broad approaches to modelling. One approach is to start from fundamentals, where you invoke the laws of physics, chemistry, biology, and so on; these are essentially science-based or mechanistic models.
• Here we invoke the laws of conservation, primarily of mass, energy, and momentum, and use a few constitutive relationships, perhaps from thermodynamics, fluid mechanics, and so on, to finally come up with a set of equations, which we call the model. That is one approach.
• The other approach, which is quite contrasting, relies not so much on science as on data. There is a science of data, but we do not rely on the physics to begin with; at some point we may incorporate it, but to begin with we rely on data. This is called the empirical modelling approach, or the data-driven approach, where we use the data to identify the relationship between the variables of interest, typically the input and the output. Sometimes we build a model for the output alone, which we call a time series model.
• Here data is the primary food for identification; without data, there is no empirical approach at all. The models that come out of empirical approaches are typically called black box models, because you do not necessarily incorporate any physics of the process explicitly. You work with a minimal understanding of the process. That does not mean that there is no provision for incorporating the physics of the process, or whatever you know about the process a priori; you can. As you keep incorporating prior knowledge into your empirical model, the black shade turns into gray, some transparency sets in, and such models are known as gray box models.
• In that respect, first-principles models are called white box models, because they are very transparent. If I look at a first-principles model, I will be able to associate every term in it with some physical characteristic of the process, whereas that is not necessarily the case with an empirical model. An empirical model is some mathematical fit between the input and the output. To give a simple example, when we go on a test drive before purchasing a vehicle, the common-sense thing all of us do is sit in the car and apply certain inputs: turn the steering wheel, press the pedals, apply the brakes, supply fuel, and so on. We give all the different kinds of inputs that we want to test the vehicle on, and then observe the response of the vehicle.
• What we are doing there is applying inputs, observing the response of the vehicle, putting it all together in our brain, and building a mental model. We may not be able to write an equation for it, but we are building an empirical model. We are not building a model of the car from first principles. That would be a very daunting approach indeed; we never do it, and we have not seen anyone do it.
• On the other hand, when we sit in courses on automobile engineering, mechanical engineering, or engineering design, we do learn the mechanics of a vehicle through equations and first-principles understanding. So that has its place, and empirical modelling has its place. Increasingly, many people are turning to empirical modelling, primarily because many of the processes we are trying to understand are too complex for us to write a first-principles model. The experimental or empirical approach is therefore a natural recourse, and it is here to stay; it has been there since time immemorial, from the time man started to build models and tried to understand processes from observations.

Building models from data: systematic procedure

• There is a systematic way of identifying a model with an empirical approach. Primarily, there are three stages.
• The first is data acquisition. We do not discuss it much here; we assume that data is available to us.
• The second stage is model development, which is at the heart of this procedure. The third stage is model assessment; it is very important and applies to all models that we develop, whether first-principles or empirical.
• It is important to assess the goodness of the model: is the model capable of predicting the process over a good range of operating conditions? Is it doing well on a fresh data set? Even a first-principles model has to be validated.
• So, please do not be under the impression that this is not required for a fundamental model. In an empirical approach, we may also want to ask whether the parameter estimates obtained for the model have large errors in them.
 That is a something of interest. And the third thing that we
have to watch out for an empirical modelling is over fitting.
Remember I said building a model from data is pretty much
similar to a student learning a subject.
 The student is presented with the text book, and the course
material, and then, the student tries to understand the
concepts of the subject, eventually, through a proper
interaction with the course material and the instructor.
 Now, in the end, you have to ask if the student has
over-learnt. That may seem very strange: what is meant by
over-learning?
 Let us say that I, as a student, am trying to solve an
assignment problem, and the assignment problem is based on
a certain concept.
 If, at the end of the problem-solving exercise, I have
gained mastery of the concept on which the question is
based, then the goal of solving the assignment problem is
more or less achieved.
 But if I start paying attention to the numbers, the very
fine details that are specific to that question but have
very little to do with the subject itself, and I try to
memorize all of that, then I am over-learning. That is a
simple analogy for overfitting in modelling as well.
 Overfitting occurs primarily because of the presence of
noise in the data, and I have an example to show you later
on. So, remember that there are three stages: data
acquisition, model development, and model validation.
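The over-learning analogy can be sketched in code. This is not the lecture's own example; it is an illustration under stated assumptions: a hypothetical process y = 2x + 1, noise with standard deviation 1, and a deliberately extreme "memorizer" model (a nearest-neighbour lookup of the training data) standing in for an over-fitted model. All names are illustrative.

```python
import random

random.seed(1)

def process(x):
    """Hypothetical true process (an assumption for this sketch)."""
    return 2.0 * x + 1.0

# Training data and a fresh data set, both corrupted by noise (std 1.0)
x_train = [0.05 * i for i in range(200)]
y_train = [process(x) + random.gauss(0.0, 1.0) for x in x_train]
x_fresh = [0.05 * i + 0.025 for i in range(200)]
y_fresh = [process(x) + random.gauss(0.0, 1.0) for x in x_fresh]

def fit_line(xs, ys):
    """Least-squares estimates (slope, intercept) for y = a*x + b."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

a, b = fit_line(x_train, y_train)

def line(x):
    """Model that has learnt the concept: a straight line."""
    return a * x + b

def memorizer(x):
    """Over-learnt model: returns the stored (noisy) output of the
    nearest training input, so it reproduces the noise as well."""
    i = min(range(len(x_train)), key=lambda k: abs(x_train[k] - x))
    return y_train[i]

def rmse(model, xs, ys):
    """Root-mean-square prediction error of a model on a data set."""
    return (sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5

for name, model in (("line", line), ("memorizer", memorizer)):
    print(name,
          "train RMSE:", round(rmse(model, x_train, y_train), 2),
          "fresh RMSE:", round(rmse(model, x_fresh, y_fresh), 2))
```

The memorizer reproduces the training data exactly, noise included, so its training error is zero, but it does worse than the straight line on the fresh data set. A perfect fit on the data used for model development is therefore not evidence of a good model.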
Model development part:
 You see that there is pre-processing of the data, which we
talked about in the last lecture. We have to watch out for
missing values, outliers or any other anomalies, get rid of
them, and so on. A big part of that is visualization of the
data.
 And we had an example in the last lecture highlighting the
importance of visualization.
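A minimal pre-processing sketch for missing values and outliers follows. The readings are made-up, and the rule used to flag outliers (distance from the median, measured in scaled median absolute deviations, with a cut-off of 3) is one common choice assumed here for illustration; the lecture does not prescribe a specific rule.

```python
import statistics

# Hypothetical sensor readings: None marks a missing value,
# and 250.0 is a gross outlier
raw = [20.1, 20.4, None, 19.8, 20.0, 250.0, 20.3, None, 19.9, 20.2]

# Step 1: drop the missing values
present = [v for v in raw if v is not None]

# Step 2: flag outliers with a robust rule. The median and the median
# absolute deviation (MAD) are barely affected by the outlier itself.
med = statistics.median(present)
mad = statistics.median(abs(v - med) for v in present)

# 1.4826 scales the MAD so that it is comparable to a standard
# deviation for normally distributed data; 3.0 is an assumed cut-off
threshold = 3.0 * 1.4826 * mad

clean = [v for v in present if abs(v - med) <= threshold]
outliers = [v for v in present if abs(v - med) > threshold]

print("kept:", clean)
print("removed as outliers:", outliers)
```

Whether to delete, replace or keep a flagged value depends on the process; plotting the data before and after this step is the simplest way to check that nothing genuine was thrown away.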
 Once we have understood the data well and it’s ready for
modelling, we should not necessarily jump straight to
building a model unless we know the model structure very
well; that is, unless we know that it is first order, or
that it is a linear or a non-linear model of a certain type,
and so on. So, an intermediate step involves what is known
as non-parametric analysis, where I make minimal assumptions
about the process and try to gather as much information as
possible from the data, so as to make a good guess of the
model structure; that’s called a non-parametric approach.
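One non-parametric tool that fits this description is correlation analysis; the lecture does not name a specific method at this point, so treat the choice as an assumption. When the input is white noise, the cross-correlation between the output and the lagged input, divided by the input variance, estimates the impulse response without assuming any model structure. The simulated process below (a delay of 3 samples followed by first-order dynamics) is hypothetical and is used only to generate data.

```python
import random

random.seed(2)
N = 5000

# White-noise input with unit variance
u = [random.gauss(0.0, 1.0) for _ in range(N)]

# Hypothetical process, treated as unknown by the analysis below:
# y[k] = 0.6*y[k-1] + 0.8*u[k-3] + small random disturbance
y = [0.0] * N
for k in range(N):
    prev = y[k - 1] if k >= 1 else 0.0
    inp = u[k - 3] if k >= 3 else 0.0
    y[k] = 0.6 * prev + 0.8 * inp + random.gauss(0.0, 0.1)

def cross_corr(out, inp, lag):
    """Sample cross-correlation between out[k] and inp[k - lag]."""
    n = len(inp)
    return sum(out[k] * inp[k - lag] for k in range(lag, n)) / (n - lag)

# Impulse response estimate for lags 0..9 (valid for white-noise input)
var_u = sum(v * v for v in u) / N
g = [cross_corr(y, u, lag) / var_u for lag in range(10)]

# First lag with a clearly non-zero coefficient: an estimate of the delay
delay = next(lag for lag, v in enumerate(g) if abs(v) > 0.2)

for lag, v in enumerate(g):
    print(lag, round(v, 2))
print("estimated delay:", delay)
```

The coefficients are close to zero up to the delay and then decay steadily, which suggests a low-order model with a time delay as a first guess for the model structure.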
 And this step can be skipped if I already know the
structure of the model that I am going to fit. So, in many
situations the non-parametric analysis may not even be
present.
 Empirical (black-box) models work with a minimal
understanding of the process, but that does not mean that
there is no provision for incorporating the physics of the
process or whatever you know about the process a priori;
you can. And as you keep incorporating prior knowledge into
your empirical model, the black shade turns into gray, some
transparency sets in, and such models are known as gray-box
models.
 So, in that respect, first-principles models are called
white-box models because they are very transparent. If I
look at a first-principles model, I will be able to
associate every term in that model with some physical
characteristic of the process, whereas that is not
necessarily the case with an empirical model.
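To make the gray-box idea concrete, here is a small sketch under stated assumptions. The physics is assumed to supply the model structure (Newton's law of cooling with a known ambient temperature), while the cooling constant k and the initial temperature are estimated from data. The process, its numerical values and the variable names are hypothetical.

```python
import math
import random

random.seed(3)

T_ENV = 25.0    # ambient temperature in deg C, assumed known a priori
T0_TRUE = 90.0  # true initial temperature, used only to simulate data
K_TRUE = 0.30   # true cooling constant per minute, only for simulation

# Simulated noisy temperature readings every 0.5 minutes
times = [0.5 * i for i in range(20)]
temps = [T_ENV + (T0_TRUE - T_ENV) * math.exp(-K_TRUE * t)
         + random.gauss(0.0, 0.2) for t in times]

# Prior knowledge gives the structure T(t) = T_env + (T0 - T_env)*exp(-k*t),
# so ln(T - T_env) = ln(T0 - T_env) - k*t is a straight line in t
z = [math.log(T - T_ENV) for T in temps]

# Least-squares straight-line fit of z against time
n = len(times)
mt = sum(times) / n
mz = sum(z) / n
slope = (sum((t - mt) * (v - mz) for t, v in zip(times, z))
         / sum((t - mt) ** 2 for t in times))
intercept = mz - slope * mt

k_hat = -slope                         # estimated cooling constant
T0_hat = T_ENV + math.exp(intercept)   # estimated initial temperature

print(f"k_hat={k_hat:.3f} per minute, T0_hat={T0_hat:.1f} deg C")
```

Unlike a purely black-box fit, each estimated quantity here has a physical meaning, which is the transparency that the gray shade refers to.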
 An empirical model is some mathematical fit between the
input and the output. To give you a simple example, when we
go out on a test drive, let us say to purchase a vehicle,
the common-sense thing that all of us do is sit in the car
and apply certain inputs: turn the steering wheel, press the
pedals, apply the brakes, supply fuel, and so on. We give
all the different kinds of inputs that we want to test the
vehicle on, and then collect the response of the vehicle.
So, what we are doing there is applying inputs, observing
the response of the vehicle, putting it all together in our
brain, and building a mental model. We may not be able to
write an equation there, but we are building an empirical
model. We are not really building a model of the car from
first principles. I am sure that would be a very deadly
approach indeed, but we never do it and we have not seen
anyone doing it.
 On the other hand, when we sit in courses on automobile
engineering, mechanical engineering or engineering design,
we do learn the mechanics of a vehicle through equations,
through first-principles understanding, and so on. So, that
has its place, while empirical modelling has its place.
Increasingly, a lot of people are turning to empirical
modelling, primarily because many processes that we are
trying to understand are too complex for us to write a
first-principles model. So, the experimental or empirical
approach is a natural recourse, and it will continue; it is
here to stay. It has been there since time immemorial, from
the time man started to build models and tried to understand
processes from observations.

