LP1 1

LABORATORY PRACTICE-I BE COMPUTER
INDEX
Sr. No
CONTENT
1. High Performance Computing
1.1 Vector and Matrix Operations -

Design parallel algorithm to
1. Add two large vectors
2. Multiply Vector and Matrix
3. Multiply two N × N arrays using n2 processors
1.2 Parallel Sorting Algorithms-

For Bubble Sort and Merger Sort, based on existing sequential algorithms, design
and implement parallel algorithm utilizing all resources available.
1.3 Parallel Search Algorithm-

Design and implement parallel algorithm utilizing all resources available. for
 Binary Search for Sorted Array
 Depth-First Search ( tree or an undirected graph ) OR
 Breadth-First Search ( tree or an undirected graph) OR
 Best-First Search that ( traversal of graph to reach a target in the shortest
possible path)
1.4 Parallel Implementation of the K Nearest Neighbors Classifier
2. Artificial Intelligence
2.1 Solve 8-puzzle problem using A* algorithm. Assume any initial configuration
and define goal configuration clearly.
2.2 Implement any one of the following Expert System ,

 Medical Diagnosis of 10 diseases based on adequate symptoms
 Identifying birds of India based on characteristics
2.3 Use Heuristic Search Techniques to Implement Best first search (Best-Solution
but not always optimal) and A* algorithm (Always gives optimal solution).
SIT, LONAVALA 1
2.4 Constraint Satisfaction Problem:

Implement crypt-arithmetic problem or n-queens or graph coloring problem
( Branch and Bound and Backtracking)
3. Data Analytics
3.1 Download the Iris flower dataset or any other dataset into a DataFrame. (eg
https://archive.ics.uci.edu/ml/datasets/Iris ) Use Python/R and Perform following –
 How many features are there and what are their types (e.g., numeric,
nominal)?
 Compute and display summary statistics for each feature available in the
dataset. (eg. minimum value, maximum value, mean, range, standard
deviation, variance and percentiles
 Data Visualization-Create a histogram for each feature in the dataset to
illustrate the feature distributions. Plot each histogram.
 Create a boxplot for each feature in the dataset. All of the boxplots should be
combined into a single plot. Compare distributions and identify outliers.
3.2 Download Pima Indians Diabetes dataset. Use Naive Bayes‟ Algorithm for
classification
 Load the data from CSV file and split it into training and test datasets.
 Summarize the properties in the training dataset so that we can calculate
probabilities and make predictions.
 Classify samples from a test dataset and a summarized training dataset.
3.3 Trip History Analysis: Use trip history dataset that is from a bike sharing service
in the United States. The data is provided quarter-wise from 2010 (Q4) onwards.
Each file has 7 columns. Predict the class of user. Sample Test data set available
here https://www.capitalbikeshare.com/trip-history-data.
3.4 Twitter Data Analysis: Use Twitter data for sentiment analysis. The dataset is
3MB in size and has 31,962 tweets. Identify the tweets which are hate tweets and
which are not. Sample Test data set available here
https://datahack.analyticsvidhya.com/contest/practice-problem-twitter-sentiment-
analysis/
SIT, LONAVALA 2
HIGH PERFORMANCE COMPUTING
SIT, LONAVALA 3
Assignment No 1
Aim:
Vector and Matrix Operations-
Design parallel algorithm to
1. Add two large vectors
2. Multiply Vector and Matrix
3. Multiply two N × N arrays using n2 processors
Aim: Implement nxn matrix parallel addition, multiplication using

CUDA, use shared memory.
Prerequisites:
- Concept of matrix addition, multiplication.
- Basics of CUDA programming
Objectives:
Student should be able to learn parallel programming, CUDA architecture
and CUDA processing flow
Theory:
A straightforward matrix multiplication example that illustrates the basic features of
memory and thread management in CUDA programs
• Leave shared memory usage until later
• Local, register usage
• Thread ID usage
• P = M * N of size WIDTH x WIDTH
• One thread handles one element of P

• M and N are loaded WIDTH times from global memory
Matrix Multiplication steps

1. Matrix Data Transfers
2. Simple Host Code in C
3. Host-side Main Program Code
4. Device-side Kernel Function
5. Some Loose Ends
SIT, LONAVALA 4
Step 1: Matrix Data Transfers

/ Allocate the device memory where we will
copy M to Matrix Md;
Md.width = WIDTH;
Md.height = WIDTH;
Md.pitch = WIDTH;
int size = WIDTH * WIDTH * sizeof(float);
cudaMalloc((void**)&Md.elements, size);
// Copy M from the host to the device
cudaMemcpy(Md.elements, M.elements, size, cudaMemcpyHostToDevice);
/ Read M from the device to the host into P
cudaMemcpy(P.elements, Md.elements, size,
cudaMemcpyDeviceToHost);
...
/ Free device
memory
cudaFree(Md.ele
ments);
Step 2: Simple Host Code in C

/ Matrix multiplication on the (CPU) host in double precision
/ for simplicity, we will assume that all dimensions are equal
void MatrixMulOnHost(const Matrix M, const Matrix N, Matrix P)

{
for (int i = 0; i < M.height; ++i)
for (int j = 0; j < N.width; ++j) {
double sum = 0;
for (int k = 0; k < M.width; ++k) {
double a = M.elements[i * M.width + k];
double b = N.elements[k * N.width + j];
sum += a * b;
}
P.elements[i * N.width + j] = sum;
}
Multiply Using One Thread Block

• One Block of threads compute matrix P
– Each thread computes one element of P
• Each thread
SIT, LONAVALA 5
– Loads a row of matrix M

– Loads a column of matrix N
– Perform one multiply and addition for each pair of M and N elements
– Compute to off-chip memory access ratio close to 1:1 (not very high)
Step 3: Host-side Main Program Code

int main(void) {
// Allocate and initialize the matrices
Matrix M = AllocateMatrix(WIDTH, WIDTH, 1);
Matrix N = AllocateMatrix(WIDTH, WIDTH, 1);
Matrix P = AllocateMatrix(WIDTH, WIDTH, 0);
/ M * N on the device
MatrixMulOnDevice(M,
N, P);
/ Free matrices
FreeMatrix(M)
;
FreeMatrix(N)
;
FreeMatrix(P);
return 0;
}
Host-side code
// Matrix multiplication on the device
void MatrixMulOnDevice(const Matrix M, const Matrix N, Matrix P)
{
/ Load M and N to the device
Matrix Md =
AllocateDeviceMatrix(M);
CopyToDeviceMatrix(Md, M);
Matrix Nd =
AllocateDeviceMatrix(N);
CopyToDeviceMatrix(Nd, N);
/ Allocate P on the device
/ Setup the execution

configuration dim3
dimBlock(WIDTH, WIDTH);
dim3 dimGrid(1, 1);
SIT, LONAVALA 6
/ Launch the device computation threads!

MatrixMulKernel<<<dimGrid, dimBlock>>>(Md,
Nd, Pd);
/ Read P from the device
CopyFromDeviceMatrix(P, Pd);
/ Free device matrices
FreeDeviceMatrix(Md);
FreeDeviceMatrix(Nd);
FreeDeviceMatrix(Pd);
}
Step 4: Device-side Kernel Function

// Matrix multiplication kernel – thread specification
__global__ void MatrixMulKernel(Matrix M, Matrix N, Matrix P)
{
// 2D Thread ID
int tx = threadIdx.x;
int ty = threadIdx.y;
/ Pvalue is used to store the element of the matrix
/ that is computed by the thread
float Pvalue = 0;
for (int k = 0; k < M.width; ++k)
{
float Melement = M.elements[ty * M.pitch
+ k]; float Nelement = Nd.elements[k *
N.pitch + tx]; Pvalue += Melement *
Nelement;
}
/ Write the matrix to device memory;
/ each thread writes one element
P.elements[ty * P.pitch + tx] =
Pvalue;
Step 5: Some Loose Ends

- Free allocated CUDA memory
SIT, LONAVALA 7
Facilities:
Latest version of 64 Bit Operating Systems, CUDA enabled NVIDIA Graphics card
Input:
Two matrices
Output:
Multiplication of two matrix
Software Engg.:
Mathematical Model:
Conclusion:
We learned parallel programming with the help of CUDA architecture.
Questions:
1. What is CUDA?
2. Explain Processing flow of CUDA programming.
3. Explain advantages and limitations of CUDA.
4. Make the comparison between GPU and CPU.
5. Explain various alternatives to CUDA.
6. Explain CUDA hardware architecture in detail.
Program:
//1)How to add two largest vectors by Parallel execution
#include<stdio.h>
#include<iostream>
#include<cstdlib>
//****important to add following library to allow a programmer to use parallel
paradigms*****
#include<omp.h>
using namespace std;
#define MAX 100
int main()
{
int a[MAX],b[MAX],c[MAX],i;
printf("\n First Vector:\t");
SIT, LONAVALA 8
//Instruct a master thread to fork and generate more threads to process following
loop structure
#pragma omp parallel for
for(i=0;i<MAX;i++)
{
a[i]=rand()%1000;
}
//Discuss issue of this for loop below-if we make it parallel, possibly values that
get printed will not be in sequence as we dont have any control on order of
threads execution
for(i=0;i<MAX;i++)
{
printf("%d\t",a[i]);
}
printf("\n Second Vector:\t");

for(i=0;i<MAX;i++)
{
b[i]=rand()%1000;
}
for(i=0;i<MAX;i++)
{
printf("%d\t",b[i]);
}
printf("\n Parallel-Vector Addition:(a,b,c)\t");

for(i=0;i<MAX;i++)
{
c[i]=a[i]+b[i];
}
for(i=0;i<MAX;i++)
{
SIT, LONAVALA 9
printf("\n%d\t%d\t%d",a[i],b[i],c[i]);
}
}
1)Output:
Output:
guest-bvoaff@C04L0809:~$ g++ par_add_large_vectors.cpp -fopenmp

guest-bvoaff@C04L0809:~$ ./a.out
First Vector: 383 777 67 58 393 919 537 413 980

729 582 814 434 43 87 276 788 403 754
932 676 739 226 94 539 915 335 386 492
649 421 362 27 690 59 763 926 426 736
368 429 530 123 135 929 802 69 198 324
315 167 456 11 42 229 373 421 784 370
526 873 857 545 367 364 750 808 178 584
651 399 60 368 12 586 886 793 540 172
211 567 782 862 22 91 956 862 170 996
281 305 925 84 327 336 505 846 313 124
895
Second Vector: 570 219 528 732 503 270 708 340 796
618 846 555 488 228 841 350 193 500 34
764 124 914 987 856 743 491 227 365 859
936 432 551 437 228 275 407 474 121 858
395 29 237 235 793 818 428 143 11 928
529 795 378 467 601 97 902 317 492 652
756 301 280 771 481 675 709 927 567 856
497 353 586 965 306 683 434 286 441 865
689 444 619 440 729 31 117 97 624 871
829 19 368 715 149 723 245 451 921 379
764
Parallel-Vector Addition:(a,b,c)
383 570 953
777 219 996
67 528 595
58 732 790
393 503 896
919 270 1189
537 708 1245
413 340 753
SIT, LONAVALA 10
guest-bvoaff@C04L0809:~$
2) Multiply vector and matrix

#include<stdio.h>
#include<iostream>
#include<cstdlib>
#include<omp.h>
int main()
{
int m=3,n=2;
int mat[m][n],vec[n],out[m];
//matrix of size 3x2

for(int row=0;row<m;row++)
{
for(int col=0;col<n;col++)
{
mat[row][col]=1;
}
}
//display matrix
cout<<"Input Matrix"<<endl;
{
{
cout<<"\t"<<mat[row][col];
}
cout<<""<<endl;
}
//column vector of size 2x1

for(int row=0;row<n;row++)
{
vec[row]=2;
}
//display vector
cout<<"Input Col-Vector"<<endl;
SIT, LONAVALA 11
for(int row=0;row<n;row++)
{
cout<<vec[row]<<endl;
}
//before multiplication check condition,

no_of_cols(matrix)==no_of_rows(vector)
#pragma omp parallel
{
{
out[row]=0;
{
out[row]+=mat[row][col]*vec[col];
//int count=out[row];
//printf("\n%d\n",count);
}
}
//display resultant vector

cout<<"Resultant Col-Vector"<<endl;
{
cout<<"\nvec["<<row<<"]:"<<out[row]<<endl;
}
return 0;
}
SIT, LONAVALA 12
2) Output
Output:
guest-zbuacn@admin:~$ g++ par_matrix_vect_mul.cpp -fopenmp

guest-zbuacn@admin:~$ ./a.out
Input Matrix
1 1
1 1
1 1
Input Col-Vector
2
2
Resultant Col-Vector
vec[0]:4
vec[1]:4
vec[2]:4
3) Multiply two n*n array using n2 processor
// Matrix-Matrix Multiplication
#include<iostream>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include<omp.h>
#define N 4
float A[N][N], B[N][N], C[N][N]; // declaring matrices of NxN size
int main ()
{
/* DECLARING VARIABLES */
int i, j, m; // indices for matrix multiplication
float t_1; // Execution time measures
clock_t c_1, c_2;
/* FILLING MATRICES WITH RANDOM NUMBERS */
for(i=0;i<N;i++)
{
for(j=0;j<N;j++)
SIT, LONAVALA 13
{
A[i][j]= (rand()%5);
B[i][j]= (rand()%5);
}
}
// Display input matrix A:

printf("Matrix A:\n");
for(i=0;i<N;i++)
{
for(j=0;j<N;j++)
{
printf("%f\t",A[i][j]);
}
printf("\n");
}
// Display input matrix B:

printf("Matrix B:\n");
for(i=0;i<N;i++)
{
for(j=0;j<N;j++)
{
printf("%f\t",B[i][j]);
}
printf("\n");
}
c_1=clock(); // time measure:

/* MATRIX MULTIPLICATION */
printf("Max number of threads: %i \n",omp_get_max_threads());
#pragma omp parallel

#pragma omp single
{
printf("Number of threads: %i \n",omp_get_num_threads());
}
#pragma omp parallel for private(m,j)

// #pragma omp_set_num_threads(8)
for(i=0;i<N;i++)
{
SIT, LONAVALA 14
for(j=0;j<N;j++)
{
C[i][j]=0.; // set initial value of resulting matrix C = 0
for(m=0;m<N;m++)
{
C[i][j]=A[i][m]*B[m][j]+C[i][j];
}
}
}
// Display input matrix B:

printf("Matrix C:\n");
for(i=0;i<N;i++)
{
for(j=0;j<N;j++)
{
printf("%f\t",C[i][j]);
}
printf("\n");
}
/* TIME MEASURE + OUTPUT */

c_2=clock(); // time measure:
t_1 = (float)(c_2-c_1)/CLOCKS_PER_SEC; // in seconds; - time elapsed
for job row-wise
printf("Execution time: %f(in seconds) \n",t_1);
/* TERMINATE PROGRAM */
return 0;
}
3)Output
SIT, LONAVALA 15
Output:
guest-tim1wd@C04L0818:~$ g++ matrix_matrix_multiplication.c -fopenmp

guest-tim1wd@C04L0818:~$ ./a.out
Matrix A:
3.000000 2.000000 3.000000 1.000000
4.000000 2.000000 0.000000 3.000000
0.000000 2.000000 1.000000 2.000000
2.000000 2.000000 2.000000 4.000000
Matrix B:
1.000000 0.000000 0.000000 2.000000
1.000000 2.000000 4.000000 1.000000
1.000000 1.000000 3.000000 4.000000
0.000000 3.000000 0.000000 2.000000
Max number of threads: 4
Number of threads: 4
Matrix C:
8.000000 10.000000 17.000000 22.000000
6.000000 13.000000 8.000000 16.000000
3.000000 11.000000 11.000000 10.000000
6.000000 18.000000 14.000000 22.000000
Execution time: 0.005355(in seconds)
guest-tim1wd@C04L0818:~$
Conclusion:
To Design
parallel algorithm to Add two large vectors , Multiply Vector and Matrix
And Multiply two N × N arrays using n2 processors
Assignment No 2
SIT, LONAVALA 16
Aim:
Parallel Sorting Algorithms-
For Bubble Sort and Merger Sort, based on existing sequential algorithms,
design and implement parallel algorithm utilizing all resources available.
Aim: Understand Parallel Sorting Algorithms like Bubble sort and Merge Sort.
Prerequisites:
Student should know basic concepts of Bubble sort and Merge Sort.
Objective: Study of Parallel Sorting Algorithms like Bubble sort and
Merge Sort
Theory:
i) What is Sorting?
Sorting is a process of arranging elements in a group in a particular order, i.e.,
ascending order, descending order, alphabetic order, etc.
• Arrange elements of a list into certain order

• Make data become easier to access
• Speed up other operations such as searching and merging. Many sorting
algorithms with different time and space complexities
ii) What is Parallel Sorting?

A sequential sorting algorithm may not be efficient enough when we have to
sort a huge volume of data. Therefore, parallel algorithms are used in sorting.
• Based on an existing sequential sort algorithm

– Try to utilize all resources available
– Possible to turn a poor sequential algorithm into a reasonable parallel
algorithm (bubble sort and parallel bubble sort)
• Completely new approach
– New algorithm from scratch
– Harder to develop
– Sometimes yield better solution
Bubble Sort
SIT, LONAVALA 17
The idea of bubble sort is to compare two adjacent elements. If they are not in
the right order,switch them. Do this comparing and switching (if necessary) until the
end of the array is reached. Repeat this process from the beginning of the array n
times.
• One of the straight-forward sorting methods

– Cycles through the list
– Compares consecutive elements and swaps them if necessary
– Stops when no more out of order pair
• Slow & inefficient
2
• Average performance is O(n )
Bubble Sort Example

Here we want to sort an array containing [8, 5, 1]. The following figure shows
how we can sortthis array using bubble sort. The elements in consideration are shown
in bold.
8, 5, 1 Switch 8 and 5
5, 1, 8 Reached end start again.
1, 5, 8 No Switch for 5 and 8
1, 5, 8 Reached end start again.
1, 5, 8 No switch for 1, 5
1, 5, 8 No switch for 5, 8
1, 5, 8 Reached end.
  Implemented as a pipeline.
 Let local_size = n / no_proc. We divide the array in no_proc parts, and each
process executes the bubble sort on its part, including comparing the last
element
 with the first one belonging to the next thread.
 Implement with the loop (instead of j<i)
for (j=0; j<n-1; j++)
 For every iteration of i, each thread needs to wait until the previous thread
 has finished that iteration before starting.
 We'll coordinate using a barrier.
Algorithm for Parallel Bubble Sort

1. For k = 0 to n-2
2. If k is even then
3. for i = 0 to (n/2)-1 do in parallel
4. If A[2i] > A[2i+1] then
SIT, LONAVALA 18
5. Exchange A[2i] ↔ A[2i+1]

6. Else
7. for i = 0 to (n/2)-2 do in parallel
8. If A[2i+1] > A[2i+2] then
9. Exchange A[2i+1] ↔ A[2i+2]
10. Next k
Parallel Bubble Sort Example 1

• Compare all pairs in the list in parallel
• Alternate between odd and even phases
• Shared flag, sorted, initialized to true at beginning of each iteration (2
phases), if any processor perform swap, sorted = false
Parallel Bubble Sort Example 2

• How many steps does it take to sort the following sequence from least to
greatest using the Parallel Bubble Sort? How does the sequence look like after
2 cycles?
• Ex: 4,3,1,2
Merge Sort
• Collects sorted list onto one processor
• Merges elements as they come together
• Simple tree structure
• Parallelism is limited when near the root
Theory:
To sort A[p .. r]:
1. Divide Step
SIT, LONAVALA 19
If a given array A has zero or one element, simply return; it is already sorted.
Otherwise, splitA[p .. r] into two subarraysA[p .. q] and A[q + 1 .. r], each containing
about half of the elements of A[p .. r]. That is, q is the halfway point of A[p .. r].
2. Conquer Step
Conquer by recursively sorting the two subarraysA[p .. q] and A[q + 1 .. r].
3. Combine Step
Combine the elements back in A[p .. r] by merging the two sorted subarraysA[p .. q] and
A[q
+ 1 .. r] into a sorted sequence. To accomplish this step, we will define a procedure
MERGE (A, p, q, r).
Parallel Merge Sort

• Parallelize processing of sub-problems
• Max parallelization achived with one processor per node (at each layer/height)
Parallel Merge Sort Example
• Perform Merge Sort on the following list of elements. Given 2 processors,
P0 & P1, which processor is reponsible for which comparison?
• 4,3,2,1
Algorithm for Parallel Merge Sort
SIT, LONAVALA 20
1. Procedure parallelMergeSort
2. Begin
3. Create processors Pi where i = 1 to n
4. if i > 0 then recieve size and parent from the root
5. recieve the list, size and parent from the root
6. endif
7. midvalue= listsize/2
8. if both children is present in the tree then
9. send midvalue, first child
10. send listsize-mid,second child
11. send list, midvalue, first child
12. send list from midvalue, listsize-midvalue, second child
13. call mergelist(list,0,midvalue,list, midvalue+1,listsize,temp,0,listsize)
14. store temp in another array list2
15. else
16. call parallelMergeSort(list,0,listsize)
17. endif
18. if i >0 then
19. send list, listsize,parent
20. endif
21. end
INPUT:
1. Array of integer numbers.
OUTPUT:
1. Sorted array of numbers
FAQ
1. What is sorting?
2. What is parallel sort?
3. How to sort the element using Bubble Sort?
4. How to sort the element using Parallel Bubble Sort?
5. How to sort the element using Parallel Merge Sort?
6. How to sort the element using Merge Sort?
7. What is searching?
8. Different types of searching methods.
9. Time complexities of sorting and searching methods.
10. How to calculate time complexity?
11. What are space complexity of all sorting and searching methods?
12. Explain what is best, worst and average case for each method of
searching and sorting.
ALGORITHM ANALYSIS
SIT, LONAVALA 21
1. Time Complexity Of parallel Merge Sort and parallel Bubble sort in

best case is( when all data is already in sorted form):O(n)
worst case is: O(n logn)
average case is: O(n logn)
APPLICATIONS
1. Representing Linear data structure & Sequential data organization : structure & files
2. For Sorting sequential data structure
CONCLUSION
Thus, we have studied Parallel Bubble and Parallel Merge sort implementation.
Program:
1)Bubble Sort
#include<iostream>
#include<stdlib.h>
#include<omp.h>
void bubble(int *, int);

void swap(int &, int &);
void bubble(int *a, int n)

{
for( int i = 0; i < n; i++ )
{
//int first = i % 2;
#pragma omp parallel for shared(a,i)
for( int j = i; j < n-1; j += 2 )
{
if( a[ j ] > a[ j+1 ] )
{
swap(a[j],a[j+1]);
}
}
}
SIT, LONAVALA 22
void swap(int &a, int &b)

{
int test;
test=a;
a=b;
b=test;
int main()
{
int *a,n;
cout<<"\n enter total no of elements=>";
cin>>n;
a=new int[n];
cout<<"\n enter elements=>";
for(int i=0;i<n;i++)
{
cin>>a[i];
}
bubble(a,n);
cout<<"\n sorted array is=>";

{
cout<<a[i]<<"\t";
}
return 0;
}
Output:
SIT@SIT-ThinkCentre-E73:~$ g++ bubble1.cpp

SIT@SIT-ThinkCentre-E73:~$ ./a.out
SIT, LONAVALA 23
enter total no of elements=>4
enter elements=>2
6
8
3
sorted array is=>2 3 6 8
2)Merge Sort
#include<iostream>
#include<stdlib.h>
#include<omp.h>
void mergesort(int a[],int i,int j);

void merge(int a[],int i1,int j1,int i2,int j2);
void mergesort(int a[],int i,int j)

{
int mid;
if(i<j)
{
mid=(i+j)/2;
#pragma omp parallel sections

{
#pragma omp section

{
mergesort(a,i,mid);
}
#pragma omp section

{
mergesort(a,mid+1,j);
}
}
merge(a,i,mid,mid+1,j);
SIT, LONAVALA 24
}––––
}
void merge(int a[],int i1,int j1,int i2,int j2)
{
int temp[1000];
int i,j,k;
i=i1;
j=i2;
k=0;
while(i<=j1 && j<=j2)

{
if(a[i]<a[j])
{
temp[k++]=a[i++];
}
else
{
temp[k++]=a[j++];
}
}
while(i<=j1)
{
temp[k++]=a[i++];
}
while(j<=j2)
{
temp[k++]=a[j++];
}
for(i=i1,j=0;i<=j2;i++,j++)
{
a[i]=temp[j];
}
}
int main()
{
int *a,n,i;
cin>>n;
SIT, LONAVALA 25
a= new int[n];

for(i=0;i<n;i++)
{
cin>>a[i];
}
mergesort(a, 0, n-1);
cout<<"\n sorted array is=>";

for(i=0;i<n;i++)
{
cout<<"\n"<<a[i];
}
return 0;
}
Output:
SIT@SIT-ThinkCentre-E73:~$ g++ mergesort.cpp
SIT@SIT-ThinkCentre-E73:~$ ./a.out
enter elements=>2
5
8
1
sorted array is=>

1
2
5
8
SIT@SIT-ThinkCentre-E73:~$
Conclusion:
For Bubble Sort and Merger Sort, based on existing sequential
algorithms, design and implement parallel algorithm utilizing all resources
available.
SIT, LONAVALA 26
Assignment No 3
Aim:
Parallel Search Algorithm-
Design and implement parallel algorithm utilizing all resources available. for
SIT, LONAVALA 27
 Binary Search for Sorted Array

 Depth-First Search ( tree or an undirected graph ) OR
 Breadth-First Search ( tree or an undirected graph) OR
 Best-First Search that ( traversal of graph to reach a target in the
shortest possible path)
Objective: To study and implementation of searching techniques.
Outcome: Students will be understand the implementation of Binary search and BFS,
DFS
Pre-requisites:
64-bit Open source Linux or its derivative
Programming Languages: C++/JAVA/PYTHON/R
Theory:
Binary Search:
In computer science, binary search, also known as half-interval

search,logarithmic search,or binary chop,is a search algorithm that finds the position of a
target value within a sorted array. Binary search compares the target value to the middle
element of the array. If they are not equal, the half in which the target cannot lie is
eliminated and the search continues on the remaining half, again taking the middle
element to compare to the target value, and repeating this until the target value is found.
If the search ends with the remaining half being empty, the target is not in the array.
Even though the idea is simple, implementing binary search correctly requires attention
to some subtleties about its exit conditions and midpoint calculation.
Binary search runs in logarithmic time in the worst case, making O(log n)
comparisons,
where n is the number of elements in the array, the O is Big O notation, and
log is the logarithm. Binary search takes constant (O(1)) space, meaning that
the
space taken by the algorithm is the same for any number of elements in the
array.Binary search is faster than linear search except for small arrays, but the
array must be sorted first. Although specialized data structures designed for fast
SIT, LONAVALA 28
searching, such as hash tables, can be searched more efficiently, binary search
applies to a wider range of problems.
How Binary Search Works?
For a binary search to work, it is mandatory for the target array to be sorted. We
shall learn the process of binary search with a pictorial example. The following
is our sorted array and let us assume that we need to search the location of value
31 using binary search.
First, we shall determine half of the array by using this formula −

mid = low + (high - low) / 2
Here it is, 0 + (9 - 0 ) / 2 = 4 (integer value of 4.5). So, 4 is the mid of the array.
Now we compare the value stored at location 4, with the value being searched,
i.e. 31. We find that the value at location 4 is 27, which is not a match. As the
value is greater than 27 and we have a sorted array, so we also know that the
target value must be in the upper portion of the array.
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our
target value 31.
SIT, LONAVALA 29
The value stored at location 7 is not a match, rather it is more than what we are
looking for. So, the value must be in the lower part from this location.
Hence, we calculate the mid again. This time it is 5.
We compare the value stored at location 5 with our target value. We find
that it is a match.
We conclude that the target value 31 is stored at location 5.

Binary search halves the searchable items and thus reduces the count of
comparisons to be made to very less numbers.
Breadth-First Search :
Graph traversals
Graph traversal means visiting every vertex and edge exactly once in a well-defined
order. While using certain graph algorithms, you must ensure that each vertex of the
graph is visited exactly once. The order in which the vertices are visited are important
and may depend upon the algorithm or question that you are solving.
During a traversal, it is important that you track which vertices

have been visited. The most common way of tracking vertices is
to mark them.
Breadth First Search (BFS)

There are many ways to traverse graphs. BFS is the most commonly used approach.
SIT, LONAVALA 30
BFS is a traversing algorithm where you should start traversing from a selected node
(source or starting node) and traverse the graph layerwise thus exploring the
neighbour nodes (nodes which are directly connected to source node). You must then
move towards the next-level neighbour nodes.
As the name BFS suggests, you are required to traverse the graph breadthwise as follows:
1. First move horizontally and visit all the nodes of the current layer
2. Move to the next layer
Consider the following diagram.
The distance between the nodes in layer 1 is comparitively lesser than the distance
between the nodes in layer 2. Therefore, in BFS, you must traverse all the nodes in
layer 1 before you move to the nodes in layer 2.
Traversing child nodes

A graph can contain cycles, which may bring you to the same node again while
traversing the graph. To avoid processing of same node again, use a boolean array
which marks the node after it is processed. While visiting the nodes in the layer of a
graph, store them in a manner such that you can traverse the corresponding child
nodes in a similar order.
In the earlier diagram, start traversing from 0 and visit its child nodes 1, 2, and 3.
Store them in the order in which they are visited. This will allow you to visit the
child nodes of 1 first (i.e. 4 and 5), then of 2 (i.e. 6 and 7), and then of 3 (i.e. 7) etc.
To make this process easy, use a queue to store the node and mark it as 'visited' until
all its neighbours (vertices that are directly connected to it) are marked. The queue
follows the First In First Out (FIFO) queuing method, and therefore, the neigbors of
the node will be visited in the order in which they were inserted in the node i.e. the
node that was inserted first will be visited first, and so on.
SIT, LONAVALA 31
Program:
Binary Search:
#include<iostream>
#include<stdlib.h>
#include<omp.h>
int binary(int *, int, int, int);
int binary(int *a, int low, int high, int key)

{
int mid;
mid=(low+high)/2;
int low1,low2,high1,high2,mid1,mid2,found=0,loc=-1;
#pragma omp parallel sections

{
#pragma omp section
{
low1=low;
high1=mid;
while(low1<=high1)
{
if(!(key>=a[low1] && key<=a[high1]))

{
low1=low1+high1;
continue;
}
cout<<"here1";
mid1=(low1+high1)/2;
if(key==a[mid1])
{
found=1;
SIT, LONAVALA 32
loc=mid1;
low1=high1+1;
}
else if(key>a[mid1])
{
low1=mid1+1;
}
else if(key<a[mid1])
high1=mid1-1;
}
}
#pragma omp section

{
low2=mid+1;
high2=high;
while(low2<=high2)
{
if(!(key>=a[low2] && key<=a[high2]))

{
low2=low2+high2;
continue;
}
cout<<"here2";
mid2=(low2+high2)/2;
if(key==a[mid2])
{
found=1;
loc=mid2;
low2=high2+1;
}
else if(key>a[mid2])
{
SIT, LONAVALA 33
low2=mid2+1;
}
else if(key<a[mid2])
high2=mid2-1;
}
}
}
return loc;
}
int main()
{
int *a,i,n,key,loc=-1;
cin>>n;
a=new int[n];

for(i=0;i<n;i++)
{
cin>>a[i];
}
cout<<"\n enter key to find=>";

cin>>key;
loc=binary(a,0,n-1,key);
if(loc==-1)
cout<<"\n Key not found.";
else
cout<<"\n Key found at position=>"<<loc+1;
return 0;
}
Output:
SIT, LONAVALA 34
[SIT@localhost ~]$ g++ brfs.cpp -fopevi binarysearch.cpp

[SIT@localhost ~]$ g++ binarysearch.cpp -fopenmp
[SIT@localhost ~]$ ./a.out
enter elements=>10 45 63 78 90 230
enter key to find=>65
2) Breadth-First Search
#include<iostream>
#include<stdlib.h>
#include<queue>
class node
{
public:
node *left, *right;

int data;
};
class Breadthfs
{
public:
node *insert(node *, int);

void bfs(node *);
};
node *insert(node *root, int data)

{
SIT, LONAVALA 35
if(!root)
{
root=new node;
root->left=NULL;
root->right=NULL;
root->data=data;
return root;
}
queue<node *> q;
q.push(root);
while(!q.empty())
{
node *temp=q.front();
q.pop();
if(temp->left==NULL)
{
temp->left=new node;
temp->left->left=NULL;
temp->left->right=NULL;
temp->left->data=data;
return root;
}
else
{
q.push(temp->left);
if(temp->right==NULL)
{
temp->right=new node;
temp->right->left=NULL;
temp->right->right=NULL;
temp->right->data=data;
return root;
SIT, LONAVALA 36
}
else
{
q.push(temp->right);
}
}
}
void bfs(node *head)

{
queue<node*> q;
q.push(head);
int qSize;
while (!q.empty())
{
qSize = q.size();
for (int i = 0; i < qSize; i++)
{
node* currNode;
#pragma omp critical
{
currNode = q.front();
q.pop();
cout<<"\t"<<currNode->data;
}
#pragma omp critical
{
if(currNode->left)
q.push(currNode->left);
if(currNode->right)
q.push(currNode->right);
}
}
}
SIT, LONAVALA 37
int main(){
node *root=NULL;
int data;
char ans;
do
{
cout<<"\n enter data=>";
cin>>data;
root=insert(root,data);
cout<<"do you want insert one more node?";

cin>>ans;
}while(ans=='y'||ans=='Y');
bfs(root);
return 0;
}
[SIT@localhost ~]$ vi brfs.cpp
[SIT@localhost ~]$ g++ brfs.cpp -fopenmp
[SIT@localhost ~]$ ./a.out
enter data=>10
do you want insert one more node?y
enter data=>5
enter data=>15
enter data=>25
enter data=>20
do you want insert one more node?n
10 5 15 25 20
SIT, LONAVALA 38
Conclusion: We have implemented Binary searching and BFS .
Assignment No 4
Aim:
Parallel Implementation of the K Nearest Neighbors Classifier
Objective:
To Implement Parallel of the K Nearest Neighbors Classifier.
Theory:
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method
used for classification and regression.[1] In both cases, the input consists of the k closest
training examples in the feature space. The output depends on whether k-NN is used for
classification or regression:
 In k-NN classification, the output is a class membership. An object is classified

by a majority vote of its neighbors, with the object being assigned to the class
most common among its k nearest neighbors (k is a positive integer, typically
small). If k = 1, then the object is simply assigned to the class of that single
nearest neighbor.
 In k-NN regression, the output is the property value for the object. This value is
the average of the values of its k nearest neighbors.
SIT, LONAVALA 39
k-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. The k-NN
algorithm is among the simplest of all machine learning algorithms.
Both for classification and regression, a useful technique can be used to assign weight
to the contributions of the neighbors, so that the nearer neighbors contribute more to
the average than the more distant ones. For example, a common weighting scheme
consists in giving each neighbor a weight of 1/d, where d is the distance to the
neighbor
Program:
#include <iostream>
#include <vector>
#include <fstream>
#include <string>
#include <sstream>
#include <cmath>
#include <set>
#include <map>
#include <ctime>
#include<mpi.h>
#include<set>
class Instance{
private:
double R;
double G;
double B;
double isSkin;
public:
Instance(double R, double G, double B, int isSkin){
this->R = R;
this->G = G;
this->B = B;
this->isSkin = isSkin;
}
void setR(double R){

this->R = R;
SIT, LONAVALA 40
void setG(double G){

this->G = G;
}
void setB(double B){

this->B = B;
}
double getR(){
return R;
}
double getG(){
return G;
}
double getB(){
return B;
}
int skin(){
return isSkin;
}
//Calculate Euclidean distance
double calculateDistance(double otherR, double otherG, double otherB){
return sqrt((R - otherR) * (R - otherR) + (G - otherG) * (G - otherG) + (B -
otherB) * (B - otherB));
};
class TestInstance{
private:
double R;
double G;
double B;
public:
TestInstance(double R, double G, double B){
SIT, LONAVALA 41
this->R = R;
this->G = G;
this->B = B;
void setR(double R){

this->R = R;
}
void setG(double G){

this->G = G;
}
void setB(double B){

this->B = B;
}
double getR(){
return R;
}
double getG(){
return G;
}
double getB(){
return B;
}
};
vector<string> split(string a,char e){

vector<string> rez;
string cur;
for(int ctr1=0;ctr1<a.size();ctr1++){
if(a[ctr1]!=e)
cur.push_back(a[ctr1]);
else
rez.push_back(cur),cur.clear();
}
if(cur!="")
rez.push_back(cur);
SIT, LONAVALA 42
return rez;
}
vector<Instance> instances;
int k;
//returns the class value

int returnClassForObject(double r, double g, double b, set<double> distances,
map<double, int> distanceToClass){
for(int i = 0; i < instances.size(); i++){

double distance = instances[i].calculateDistance(r, g, b);
distances.insert(distance);
distanceToClass.insert(std::pair<double, int>(distance,
instances[i].skin()));
}
int countFirstClass = 0;
int countSecondClass = 0;
set<double>::iterator it = distances.begin();
while(countFirstClass != k && countSecondClass != k){

if(distanceToClass.find(*it) -> second == 1){
countFirstClass++;
}
else if(distanceToClass.find(*it) -> second == 2){
countSecondClass++;
}
it++;
}
if(countFirstClass == k){
return 1;
}
else if(countSecondClass == k){
return 2;
}
SIT, LONAVALA 43
int main(int argc, char **argv)

{
MPI_Init(NULL, NULL); //initialize MPI execution environment
int world_size;
int rank;
//Determines the size of the group associated with a communicator
MPI_Comm_size(MPI_COMM_WORLD, &world_size); //
//determines the rank of calling process in the communicator

MPI_Comm_rank(MPI_COMM_WORLD, &rank);
//world_size = number of instances (e.g. 20 in this example)

/*Multiplied by 3 because training dataset contains 3 column data + 1 column
class label*/
//MPI_Request and MPI_Status are the data types
MPI_Request requests[(world_size - 1) * 3];
MPI_Status statuses[(world_size - 1) * 3];
string line;
ifstream myfile("training.txt");
//init
if (myfile.is_open())
{
while (getline(myfile,line))
{
vector<string> parts = split(line, ' ');
//string to double conversion, supported by C++11
Instance instance(std::stod(parts[0]), std::stod(parts[1]),

std::stod(parts[2]), std::stod(parts[3]));
//push back constructed object

instances.push_back(instance);
}
myfile.close();
}
SIT, LONAVALA 44
/*The training set: https://archive.ics.uci.edu/ml/datasets/Skin+Segmentation#

This dataset is of the dimension 245057 * 4 where first three columns are B,G,R
(x1,x2, and x3 features) values and fourth column is of the class labels (decision
variable y)*/
//find min and max

double minR = instances[0].getR();
double maxR = instances[0].getR();
double minG = instances[0].getG();

double maxG = instances[0].getG();
double minB = instances[0].getB();

double maxB = instances[0].getB();

if(instances[i].getR() > maxR){
maxR = instances[i].getR();
}
else if(instances[i].getR() < minR){
minR = instances[i].getR();
}
if(instances[i].getG() > maxG){

maxG = instances[i].getG();
}
else if(instances[i].getG() < minG){
minG = instances[i].getG();
}
if(instances[i].getB() > maxB){

maxB = instances[i].getB();
}
else if(instances[i].getB() < minB){
minB = instances[i].getB();
}
}
//standardization or normalization of training dataset

SIT, LONAVALA 45
double curr = instances[i].getR();

double res = (curr - minR) / (maxR - minR);
instances[i].setR(res);
curr = instances[i].getG();
res = (curr - minG) / (maxG - minG);
instances[i].setG(res);
curr = instances[i].getB();
res = (curr - minB) / (maxB - minB);
instances[i].setB(res);
//setting k = sqrt(number of training instances)

k = sqrt(instances.size());
ifstream new_file("test.txt");
string new_line;
vector<TestInstance>test_instances;
double start, end;
//if Process 0
if(rank == 0) {
if (new_file.is_open())
{
while (getline(new_file,new_line))
{
vector<string> parts = split(new_line, ' ');
double r = std::stod(parts[0]);
double g = std::stod(parts[1]);
double b = std::stod(parts[2]);
r = (r - minR) / (maxR - minR);

g = (g - minG) / (maxG - minG);
SIT, LONAVALA 46
b = (b - minB) / (maxB - minB);
TestInstance new_instance(r, g, b);

test_instances.push_back(new_instance);
}
//Get current system time
start = MPI_Wtime();
int index = 1;
for(int i = 1; i < test_instances.size(); i++){
double r = test_instances[i].getR();
MPI_Isend(&r, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, requests +
index);
index ++;
double g = test_instances[i].getG();
MPI_Isend(&g, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, requests
+ index);
index ++;
double b = test_instances[i].getB();
MPI_Isend(&b, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, requests
+ index);
index ++;
}
double r = test_instances[0].getR();
double g = test_instances[0].getG();
double b = test_instances[0].getB();
map<double, int> distanceToClass;
set<double> distances;
int class_predicted = returnClassForObject(r, g, b, distances,
distanceToClass);
printf("Class for %d object is: %d\n", rank + 1, class_predicted);
}
else{
double r;
double g;
double b;
MPI_Irecv(&r, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, requests +
rank + 1);
SIT, LONAVALA 47
MPI_Irecv(&g, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, requests +

rank + 2);
MPI_Irecv(&b, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, requests +
rank + 3);
MPI_Wait(requests + rank + 1, statuses + rank + 1);

map<double, int> distanceToClass;

set<double> distances;
int class_predicted = returnClassForObject(r, g, b, distances,
distanceToClass);
printf("Class for %d object is: %d\n", rank + 1, class_predicted);
}
MPI_Barrier(MPI_COMM_WORLD);
if(rank == 0){
end = MPI_Wtime();
printf("Elapsed time: %.2f seconds.\n", (end - start));
}
MPI_Finalize();
/* ---------------------- OUTPUT ------------------------

apr@C04L0818:~/knn$ mpicxx -o knn-mpi-1 knn-mpi-1.cpp -std=c++11
apr@C04L0818:~/knn$ mpirun -np 20 knn-mpi-1
Class for 2 object is: 2
SIT, LONAVALA 48

Elapsed time: 0.00 seconds.
apr@C04L0818:~/knn$ */
Conclusion:
To Implement Parallel of the K Nearest Neighbors Classifier.
ARTIFICIAL INTELLIGENCE
SIT, LONAVALA 49
Assignment No 1
Aim:
Solve 8-puzzle problem using A* algorithm. Assume any initial configuration

and define goal configuration clearly.
Objective:
Student will learn:
i) The Basic Concepts of A Star :Evaluation function, Path Cost

,Heuristic function, Calculation of heuristic function
ii) General structure of eight puzzle problem.
iii) Logic of A star implementation for eight puzzle problem.
Theory:
Introduction:
In computer science, A* (pronounced as "A star") is a computer algorithm that is widely

used in path finding and graph traversal, the process of plotting an efficiently traversable
SIT, LONAVALA 50
path between multiple points, called nodes. The A* algorithm combines features of
uniform-cost search and pure heuristic search to efficiently compute optimal solutions.
A* algorithm is a best-first search algorithm in which the cost associated with a node is
f(n) = g(n) + h(n), where g(n) is the cost of the path from the initial state to node n and
h(n) is the heuristic estimate or the cost or a path from node n to a goal.
Thus, f(n) estimates the lowest total cost of any solution path going through node n. At
each point a node with lowest f value is chosen for expansion. Ties among nodes of equal
f value should be broken in favor of nodes with lower h values. The algorithm terminates
when a goal is chosen for expansion.
A* algorithm guides an optimal path to a goal if the heuristic function h(n) is admissible,
meaning it never overestimates actual cost. For example, since airline distance never
overestimates actual highway distance, and manhattan distance never overestimates actual
moves in the gliding tile.
For Puzzle, A* algorithm, using these evaluation functions, can find optimal solutions to
these problems. In addition, A* makes the most efficient use of the given heuristic
function in the following sense: among all shortest-path algorithms using the given
heuristic function h(n). A* algorithm expands the fewest number of nodes.
The main drawback of A* algorithm and indeed of any best-first search is its memory
requirement. Since at least the entire open list must be saved, A* algorithm is severely
space-limited in practice, and is no more practical than best-first search algorithm on
current machines. For example, while it can be run successfully on the eight puzzles, it
exhausts available memory in a matter of minutes on the fifteen puzzles.
A star algorithm is very good search method, but with complexity problems
To implement such a graph-search procedure, we will need to use two lists of node:
1) OPEN: nodes that have been generated and have had the heuristic function applied to
them but which have not yet been examined (i.e., had their successors generated). OPEN
is actually a priority queue in which the elements with the highest priority are those with
the most promising value of the heuristic function.
SIT, LONAVALA 51
2) CLOSED: Nodes that have already been examined. We need to keep these nodes in
memory if we want to search a graph rather than a tree, since whether a node is generated,
we need to check whether it has been generated before
A * Algorithm:
1. Put the start nodes on OPEN.
2. If OPEN is empty, exit with failure
3. Remove from OPEN and place on CLOSED a node n having minimum f.
4. If n is a goal node exit successfully with a solution path obtained by tracing

back the pointers from n to s.
5. Otherwise, expand n generating its children and directing pointers from each child
node to n.

 For every child node n’ do

evaluate h(n’) and computef(n’) = g(n’) +h(n’)=
g(n)+c(n,n’)+h(n)

If n’ is already on OPEN or CLOSED compare its new f

with the old f and attach the lowest f to n’.
 
put n’ with its f value in the right order in OPEN
6. Go to step 2.
Example of calculation of heuristic values for 8-puzzle problem:
• h1(n) = number of misplaced tiles
• h2(n) =no. of squares from desired location of each tile
SIT, LONAVALA 52
• h1(S) = 8
• h2(S) = 3+1+2+2+2+3+3+2 = 18
Implementation logic for 8-puzzle problem using A* algorithm
f(n)=g(n)+h(n)
Where, f(n) is evaluation function
g(n) is path cost
h(n) is heuristic function
A* is commonly used for the common path finding problem in applications such as
games, but was originally designed as a general graph traversal algorithm.
Program:
Puzzleboard.java
package ai_practical.assno3;
import java.util.Scanner;
import javax.swing.JOptionPane;
public class PuzzelBoard {

private String board[][];
private int blankX,blankY;
public PuzzelBoard()
{
this.board = new String[3][3];
}
public PuzzelBoard(PuzzelBoard b)
SIT, LONAVALA 53
{
this.board = b.board;
this.blankX = b.blankX;
this.blankY = b.blankY;
}
public void initBoard()

{
Scanner inp = new Scanner(System.in);
System.out.println("\nEnter one tile as '-' ie. Blank tile\n");
for(int i=0; i<3; i++)
{
for(int j=0; j<3; j++)
{
board[i][j] = JOptionPane.showInputDialog("Enter the value of tile
["+i+"]["+(j)+"] : ");
if(board[i][j].equals("-"))
{
blankX=i;
blankY=j;
}
}
}
}
public String[][] getBoard()

{
return board;
}
public void setBoard(String[][] board)

{
{
{
this.board[i][j] = board[i][j];
}
}
}
SIT, LONAVALA 54
public int getBlankX()

{
return blankX;
}
public int getBlankY()

{
return blankY;
}
public void setBlankX(int x)

{
blankX = x;
}
public void setBlankY(int y)

{
blankY = y;
}
public void display()

{
{
{
System.out.print("\t"+board[i][j]);
}
System.out.println();
}
}
public PuzzelBoard nextMove(int gn, PuzzelBoard goal)

{
PuzzelBoard temp = new PuzzelBoard();
PuzzelBoard next = new PuzzelBoard();
int minFn = 999;
System.out.println("\nPossible moves are : ");
if(blankY>0)
{
temp.setBoard(board);
temp.swap(blankX, blankY, blankX, blankY-1);
int fn = (temp.getHn(goal)+gn);
SIT, LONAVALA 55
System.out.println("\nFor Fn = "+fn+" : ");

temp.display();
if(fn < minFn)
{
minFn = fn;
next.setBoard(temp.board);
next.setBlankX(blankX);
next.setBlankY(blankY-1);
}
}
if(blankY<2)
{
temp.swap(blankX, blankY, blankX, blankY+1);
temp.display();
if(fn < minFn)
{
minFn = fn;
next.setBlankX(blankX);
next.setBlankY(blankY+1);
}
}
if(blankX>0)
{
temp.swap(blankX, blankY, blankX-1, blankY);
temp.display();
if(fn < minFn)
{
minFn = fn;
next.setBlankX(blankX-1);
next.setBlankY(blankY);
}
SIT, LONAVALA 56
if(blankX<2)
{
temp.swap(blankX, blankY, blankX+1, blankY);
temp.display();
if(fn < minFn)
{
minFn = fn;
next.setBlankX(blankX+1);
next.setBlankY(blankY);
}
}
return next;
}
public void swap(int i1, int j1, int i2, int j2)
{
String temp = board[i1][j1];
board[i1][j1] = board[i2][j2];
board[i2][j2] = temp;
public boolean equals(PuzzelBoard b)

{
{
{
if(!this.board[i][j].equals(b.board[i][j]))
{
return false;
}
}
}
return true;
SIT, LONAVALA 57
public int getHn(PuzzelBoard goal)

{
int hn = 0;
{
{
if(!this.board[i][j].equals(goal.board[i][j]))
{
hn++;
}
}
}
return hn;
}
}
Output:
run:
Enter start Board :
Enter one tile as '-' ie. Blank tile
The given start board is :

a b c
d - f
g e h
Enter goal Board :
Enter one tile as '-' ie. Blank tile
The given goal board is :

a b c
d e f
g h -
SIT, LONAVALA 58
The board is solved as :
Board after 0 moves :

a b c
d - f
g e h
Possible moves are :
For Fn = 5 :
a b c
- d f
g e h
For Fn = 5 :
a b c
d f -
g e h
For Fn = 5 :
a - c
d b f
g e h
For Fn = 3 :
a b c
d e f
g - h

a b c
d e f
g - h
Possible moves are :
For Fn = 5 :
a b c
d e f
- g h
For Fn = 2 :
SIT, LONAVALA 59
a b c
d e f
g h -
For Fn = 5 :
a b c
d - f
g e h

a b c
d e f
g h -
Goal state achieved.
Conclusion: A star algorithm is implemented for eight puzzle problem
SIT, LONAVALA 60
Assignment No 2
Aim:
Implement any one of the following Expert System ,
 Medical Diagnosis of 10 diseases based on adequate symptoms
 Identifying birds of India based on characteristics
Software Requirements:
SWI-Prolog for Windows, Editor.
Theory:
A system that uses human expertise to make complicated decisions. Simulates reasoning
by applying knowledge and interfaces. Uses expert’s knowledge as rules and data within
the system. Models the problem solving ability of a human expert.
Components of an ES:
1. Knowledge Base
i. Represents all the data and information imputed by experts in the field.
ii. Stores the data as a set of rules that the system must follow to
make decisions.
2. Reasoning or Inference Engine
i. Asks the user questions about what they are looking for.
ii. Applies the knowledge and the rules held in the knowledge base.
iii. Appropriately uses this information to arrive at a decision.
3. User Interface
i. Allows the expert system and the user to communicate.
ii. Finds out what it is that the system needs to answer.
iii. Sends the user questions or answers and receives their response.
4. Explanation Facility
i. Explains the systems reasoning and justifies its conclusions.
SIT, LONAVALA 61
PROGRAM-
go:-
hypothesis(Disease),
write('It is suggested that the patient has '),
write(Disease),
nl,
undo;
write('Sorry, the system is unable to identify the disease'),nl,undo.
hypothesis(cold) :-
symptom(headache),
symptom(runny_nose),
symptom(sneezing),
symptom(sore_throat),
nl,
write('Advices and Sugestions:'),
nl,
write('1: Tylenol'),
nl,
write('2: Panadol'),
nl,
write('3: Nasal spray'),
nl,
write('Please weare warm cloths because'),
nl,!.
hypothesis(influenza) :-
symptom(sore_throat),
symptom(fever),
symptom(headache),
symptom(chills),
symptom(body_ache),
nl,
nl,
write('1: Tamiflu'),
SIT, LONAVALA 62
nl,
write('2: Panadol'),
nl,
write('3: Zanamivir'),
nl,
write('Please take a warm bath and do salt gargling because'),
nl,!.
hypothesis(typhoid) :-
symptom(headache),
symptom(abdominal_pain),
symptom(poor_appetite),
symptom(fever),
nl,
nl,
write('1: Chloramphenicol'),
nl,
write('2: Amoxicillin'),
nl,
write('3: Ciprofloxacin'),
nl,
write('4: Azithromycin'),
nl,
write('Please do complete bed rest and take soft diet because'),
nl,!.
hypothesis(chicken_pox) :-
symptom(rash),
symptom(body_ache),
symptom(fever),
nl,
nl,
write('1: Varicella vaccine'),
nl,
write('2: Immunoglobulin'),
nl,
write('3: Acetomenaphin'),
nl,
SIT, LONAVALA 63
write('4: Acyclovir'),
nl,
write('Please do have oatmeal bath and stay at home because'),
nl,!.
hypothesis(measles) :-
symptom(fever),
symptom(runny_nose),
symptom(rash),
symptom(conjunctivitis),
nl,
nl,
write('1: Tylenol'),
nl,
write('2: Aleve'),
nl,
write('3: Advil'),
nl,
write('4: Vitamin A'),
nl,
write('Please get rest and use more liquid because'),
nl,!.
hypothesis(malaria) :-
symptom(fever),
symptom(sweating),
symptom(headache),
symptom(nausea),
symptom(vomiting),
symptom(diarrhea),
nl,
nl,
write('1: Aralen'),
nl,
write('2: Qualaquin'),
nl,
write('3: Plaquenil'),
nl,
SIT, LONAVALA 64
write('4: Mefloquine'),
nl,
write('Please do not sleep in open air and cover your full skin because'),
nl,!.
ask(Question) :-
write('Does the patient has the symptom '),
write(Question),
write('? : '),
read(Response),
nl,
( (Response == yes ; Response == y)
->
assert(yes(Question)) ;
assert(no(Question)), fail).
:- dynamic yes/1,no/1.
symptom(S) :-
(yes(S)
->
true ;
(no(S)
->
fail ;
ask(S))).
undo :- retract(yes(_)),fail.
undo :- retract(no(_)),fail.
undo.
OUTPUT-
/*
SIT@SIT-ThinkCentre-E73:~$ swipl -s medicalExpert.pl
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free
software.
Please run ?- license. for legal details.
For online help and background, visit http://www.swi-prolog.org

For built-in help, use ?- help(Topic). or ?- apropos(Word).
SIT, LONAVALA 65
?- go.
|: go.
Does the patient has the symptom headache? :
|: yes.
Does the patient has the symptom sore_throat? :
|: n.
Does the patient has the symptom fever? :
|: y.
Does the patient has the symptom rash? :
|: y.
Does the patient has the symptom body_ache? :
Sorry, the system is unable to identify the disease
true.
?- go.
go.
|: yes.
|: yes.
|: yes.
|: yes.
Advices and Sugestions:

1: Varicella vaccine
2: Immunoglobulin
3: Acetomenaphin
4: Acyclovir
Please do have oatmeal bath and stay at home because
It is suggested that the patient has chicken_pox
true .
?- go.
go.
|: yes.
SIT, LONAVALA 66
|: no.
|: no.
true.
?- go.
go
|:
|: go.
ERROR: Stream user_input:56:0 Syntax error: Operator expected
Exception: (9) hypothesis(_2070) ? creep
?- go.
|: go.
|: n.
|: yes.
|: yes.
|: yes.

1: Varicella vaccine
2: Immunoglobulin
3: Acetomenaphin
4: Acyclovir
Please do have oatmeal bath and stay at home because
It is suggested that the patient has chicken_pox
true .
?- go.
|: go.
|: y.
SIT, LONAVALA 67
|: n.
|: y.
|: n.
true.
?- go.
|: go.
|: n.
|: n.
|: y.
|: y.
Does the patient has the symptom runny_nose? :
|: y.
Does the patient has the symptom sweating? :
true.
?-
[1]+ Stopped swipl -s medicalExpert.pl
SIT@SIT-ThinkCentre-E73:~$ swipl -s medicalExpert.pl
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free
software.
Please run ?- license. for legal details.
For online help and background, visit http://www.swi-prolog.org

For built-in help, use ?- help(Topic). or ?- apropos(Word).
?- go.
go.
|: n.
SIT, LONAVALA 68
|: n.
|: y.
|: y.
|: y.
Does the patient has the symptom sweating? :
true.
?-
| go.
|: go.
|: y
|: y.
ERROR: Stream user_input:28:2 Syntax error: Operator expected
Exception: (9) hypothesis(_2070) ? creep
?- go.
|: go.
|: y.
|: n.
|: y.
|: y.
|: y.
Does the patient has the symptom conjunctivitis? :

1: Tylenol
2: Aleve
3: Advil
4: Vitamin A
Please get rest and use more liquid because
SIT, LONAVALA 69
It is suggested that the patient has measles

true .
?-
*/
Conclusion:
To implement any one of the following Expert System , Medical
Diagnosis of 10 diseases based on adequate symptoms and Identifying birds of
India based on characteristics
Assignment No 3
Aim:
Use Heuristic Search Techniques to Implement Best first search (Best-
Solution but not always optimal) and A* algorithm (Always gives optimal
solution).
Program:
BFS:
DistanceComprater.java
packagebfs;
importjava.util.Comparator;
public class DistanceComparator implements Comparator<Node> //

Comparator for priority queue based on Node distance
{
@Override
public int compare(Node o1, Node o2) {
if(o1.getDistance() > o2.getDistance())
return 1;
else if(o1.getDistance() < o2.getDistance())
return -1;
return 0;
}
SIT, LONAVALA 70
Graph.java
packagebfs;
import java.util.ArrayList;
public class Graph {
ArrayList<HeadNode>headNodesList; //arraylist to
hold headNodes
int n;
public Graph(int size) //constructor to

initialize the Graph
{
this.n = size;
headNodesList = new ArrayList<>();
}
public void initGraph() //method to accept

graph nodes
{
Scanner sc = new Scanner(System.in);
{
//System.out.println("Enter the name of node" +(i+1)+" : ");
HeadNode hn = new HeadNode();
hn.setName(JOptionPane.showInputDialog("Enter the name of node" +(i+1)+" :
"));
headNodesList.add(hn); //add the nodes to
headNodeList
}
{
HeadNodetempHeadNode = headNodesList.get(i);
while(true) //adjacent nodes input and

their distances
SIT, LONAVALA 71
{
String name = tempHeadNode.getName();
// sc.skip("\n");
String ans = JOptionPane.showInputDialog("\nDo you want to add
any adjacent node to node "+ name + "? (y/n) : ");
if(ans.equals("n") || ans.equals("N"))
break;
// sc.skip("\n");
String tempName=JOptionPane.showInputDialog("Enter the name of
adjacent node of "+ name + " : ");
//sc.skip("\n");
inttempDistance=Integer.parseInt(JOptionPane.showInputDialog("Enter
distance between nodes " + name + " and " + tempName+ " :"));
tempHeadNode.setNodeInfo(tempName,tempDistance);
headNodesList.set(i, tempHeadNode);
}
}
}
public void displayGraph() //method to display

graph in form of adjacency list
{
{
System.out.print("\n"+ tempHeadNode.getName() + " : ");
tempHeadNode.displayNodeList();
}
}
public int getIndex(String name) //method to get

index by passing name of node
{
{
if(tempHeadNode.getName().equals(name))
return i;
}
return -1; //if node not found return -1
SIT, LONAVALA 72
publicArrayListgetNeighbours(String node) //method to

return neighbours of selected node
{
intheadIndex=getIndex(node);
returnheadNodesList.get(headIndex).getNodeList();
}
}
HeadNode.java
packagebfs;
import java.util.Iterator;
public class HeadNode // Head node in adjacency list of graph

{
private String name; // node name
privateArrayList<Node>adjnodes = new ArrayList<>(); // List of adjacent
nodes
public void setName(String name) {

this.name = name;
}
public String getName() {

return name;
}
public void setNodeInfo(String name,int distance)// Add adjacent node

{
Node n = new Node(name,distance);
adjnodes.add(n);
}
publicArrayListgetNodeList()
{
returnadjnodes;
SIT, LONAVALA 73
public void displayNodeList() // Display adjacent nodes list (name,distance)

{
Iterator i = adjnodes.iterator();
if(i.hasNext())
{
Node temp= (Node)i.next();
System.out.print("("+temp.getName()+","+temp.getDistance()+")");
}
while(i.hasNext())
{
System.out.print(", ("+temp.getName()+","+temp.getDistance()+")");
}
}
Node.java
packagebfs;
public class Node // Class for adjacent nodes to headnode

{
String name; // node name
int distance; // distance between this node and headnode
// constructor and getters, setters for data members
public Node(String name, int dist)

{
this.name = name;
this.distance = dist;
}
public int getDistance() {

return distance;
}
SIT, LONAVALA 74

return name;
}
public void setDistance(int distance) {

this.distance = distance;
}

this.name = name;
}
BFS.java
packagebfs;
importjava.util.PriorityQueue;
public class BFS // Class for BFS Algorithm

{
/**
* @param args the command line arguments
*/
public static void main(String[] args)
{
int n;
n=Integer.parseInt(JOptionPane.showInputDialog("Enter No of nodes"));
// Enter no. of rows
PriorityQueue<Node>pq = new PriorityQueue<>(new DistanceComparator());
// Initilize priority queue
ArrayList<Boolean> visited = new ArrayList<>(n);
ArrayList<String> parent = new ArrayList<>(n); // Store
parent node
{
SIT, LONAVALA 75
visited.add(false); // Set visited list for all

nodes false
parent.add("NIL"); // Set parent of all
nodes NIL
}
Graph graph = new Graph(n); // Create

graph instance
graph.initGraph(); // Initialize graph
graph.displayGraph(); // Display graph as
adjacency list
String start, goal; // Accept start

and goal nodes
start = JOptionPane.showInputDialog("Enter the name of start node : ");
goal = JOptionPane.showInputDialog("Enter the name of goal node : ");
pq.add(new Node(start,0)); // Add start node

to priority queue with distance 0
visited.set(graph.getIndex(start), true); // Set visited
true
parent.set(graph.getIndex(start), "NIL"); // Set parent
of start NIL
System.out.println("\n\nPriority queue contents : \n");
displayQueue(pq);
while(!pq.isEmpty()) // Process untill

queue is not empty
{
Node temp = pq.poll(); // Remove
node with minimum distance
displayQueue(pq);
if(temp.getName().equals(goal)) // Check if goal
node is found
{
//JOptionPane.showMessageDialog(,"\nGoal node found");
System.out.println("\nGoal node '"+temp.getName() + "' found");
break;
}
else
SIT, LONAVALA 76
{
ArrayList<Node>neighbours = graph.getNeighbours(temp.getName()); //
Get the neighbours of the retrieved node that are not visited
for(Node n1:neighbours) // For all adjacent
nodes
{
if(!visited.get(graph.getIndex(n1.getName())))
{
visited.set(graph.getIndex(n1.getName()), Boolean.TRUE); // Mark
visited if not marked
pq.add(n1); // Add them to queue
parent.set(graph.getIndex(n1.getName()), temp.getName()); // Set parent
of neighbour node
}
}
displayQueue(pq); // Display the Queue
}
}
tracePath(parent,graph,goal);
}
private static void displayQueue(PriorityQueue<Node>pq) //

Fuction to display queue
{
for(Node n:pq)
{
System.out.print(n.getName()+"\t");
}
System.out.println("");
}
private static void tracePath(ArrayList<String> parent, Graph graph, String

goal) // Function to trace the path
{
System.out.println("\n\nPath : ");
String path = goal;
String temp = goal;
SIT, LONAVALA 77
while(!parent.get(graph.getIndex(temp)).equals("NIL")) //
Continue path till parent is not NIL
{
temp = parent.get(graph.getIndex(temp));
path = temp + ", " + path;
}
System.out.println(path);
}
}
/*
OUTPUT :
run:
A : (B,3), (C,1)
B : (D,3), (E,2)
C:
D:
E:
Priority queue contents :
C B
B
B
E D
D
D
Goal node 'D' found
Path :
A, B, D
BUILD SUCCESSFUL (total time: 1 minute 8 seconds)
SIT, LONAVALA 78
*/
A* Algorithm:
FixComprator.java
packageastargraph;
importjava.util.Comparator;
public class FxComparator implements Comparator<HeadNode> //

Comparator for priority queue based on fx value
{
@Override
public int compare(HeadNode o1, HeadNode o2) {
if(o1.getFx()> o2.getFx())
return 1;
else if(o1.getFx() < o2.getFx())
return -1;
return 0;
}
Graph.java
packageastargraph;
public class Graph { // Class for graph
ArrayList<HeadNode>headNodesList;
int n;
SIT, LONAVALA 79
public Graph(int size) // Initialize size and head node list

{
this.n = size;
headNodesList = new ArrayList<>();
}
public void initGraph() // Initialize graph nodes and edges

{
Scanner sc = new Scanner(System.in);
for(int i=0;i<n;i++) // Accept node names and their heuristic values
{
HeadNode hn = new HeadNode();

hn.setName(JOptionPane.showInputDialog("Enter the name of node " +
(i+1)+" : "));
hn.setHx(Integer.parseInt(JOptionPane.showInputDialog("Enter the heuristic
value of node " +(i+1)+" : ")));
headNodesList.add(hn);
}
{
while(true) // Accept adjacent nodes and their distances

{
String name = tempHeadNode.getName();
String ans = JOptionPane.showInputDialog("\nDo you want to add
any adjacent node to node "+ name + "? (y/n) : ");
if(ans.equals("n") || ans.equals("N"))
break;
// sc.skip("\n");
String tempName=JOptionPane.showInputDialog("Enter the name of
adjacent node of "+ name + " : ");
//sc.skip("\n");
inttempDistance=Integer.parseInt(JOptionPane.showInputDialog("Enter
distance between nodes " + name + " and " + tempName+ " :"));
tempHeadNode.setNodeInfo(tempName,tempDistance);
headNodesList.set(i, tempHeadNode);
SIT, LONAVALA 80
}
}
publicvoid displayGraph() // Display graph adjacency list

{
{
System.out.print("\n"+ tempHeadNode.getName() + " (hx =
"+tempHeadNode.getHx()+") : ");
tempHeadNode.displayNodeList();
}
}
public int getIndex(String name) // Get index for given name

{
{
if(tempHeadNode.getName().equals(name))
return i;
}
return -1;
}
publicArrayListgetNeighbours(String node) // Get neighbour nodes list

{
intheadIndex=getIndex(node);
returnheadNodesList.get(headIndex).getNodeList();
}
public void setGx(String name,intgx) // Set gx for a node and update adjacency
list
{
int index = getIndex(name);
HeadNode node = headNodesList.get(index);
node.setGx(gx);
headNodesList.set(index, node);
}
publicHeadNodegetHeadNode (String name){ // Get Head node by name
SIT, LONAVALA 81
returnheadNodesList.get(getIndex(name));
}
public void setFx(Node neighbour,HeadNodecurr) // Set fx for neighbour via

current node
{
inttempGx = curr.getGx() + neighbour.getDistance(); // Get distance from
source to neighbour via current node
HeadNodeadj = getHeadNode(neighbour.getName()); // Get adjacent head node
if(tempGx>= adj.getGx()) // Check if calculated distance is less than previous
distance
return;
adj.setGx(tempGx); // Set gx as calculated distance

headNodesList.set(getIndex(adj.getName()), adj);// Update headnode list
}
}
HeadNode.java
packageastargraph;
import java.util.Iterator;
public class HeadNode// Adjacency list head node

{
private String name; // node name
private int gx; // gx value
private int hx; // heuristic value hx
private int fx; // fx = gx+hx value
privateArrayList<Node>adjnodes = new ArrayList<>(); // Adjacent nodes
list
publicHeadNode() // Initialize gx,hx and fx to infinity

{
gx=hx=999;
fx = gx+hx;
}
public int getGx() {
SIT, LONAVALA 82
returngx;
}
public void setGx(int gx) { // Set gx and update fxaccordigly

this.gx = gx;
setFx(this.gx+hx);
System.out.println("\nFx of node "+this.name+" = "+this.fx);
}
public int getHx() {

returnhx;
}
public void setHx(int hx) { // Set hx and update fx accordingly

this.hx = hx;
setFx(this.hx+gx);
}
public int getFx() {

returnfx;
}
public void setFx(int fx) {

this.fx = fx;
}

this.name = name;
}

return name;
}
public void setNodeInfo(String name,int distance) // Set adjacent node name

and distance
{
Node n = new Node(name,distance);
adjnodes.add(n); // Add node to list
}
SIT, LONAVALA 83
publicArrayListgetNodeList()
{
returnadjnodes;
}
public void displayNodeList() // Display adjacent nodes list (name,distance)

{
Iterator i = adjnodes.iterator();
if(i.hasNext())
{
System.out.print("("+temp.getName()+","+temp.getDistance()+")");
}
while(i.hasNext())
{
System.out.print(", ("+temp.getName()+","+temp.getDistance()+")");
}
}
Node.java
packageastargraph;
public class Node // Adjacent node name and distance

{
String name;
int distance;
public Node(String name, int dist)

{
this.name = name;
this.distance = dist;
}
public int getDistance() {
SIT, LONAVALA 84
return distance;
}

return name;
}
public void setDistance(int distance) {

this.distance = distance;
}

this.name = name;
}
AStarGraph.java
packageastargraph;
importjava.util.PriorityQueue;
public class AStarGraph {
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
// TODO code application logic here
int n;
n=Integer.parseInt(JOptionPane.showInputDialog("Enter No of nodes"));
// Enter no. of rows
PriorityQueue<HeadNode> open = new PriorityQueue<>(new

FxComparator()); // Initilize priority queue openlist
ArrayList<HeadNode> closed = new ArrayList<>(n); //
Initialize closed list
SIT, LONAVALA 85
ArrayList<String> parent = new ArrayList<>(n); // Store

parent node
{
parent.add("NIL"); // Set parent of all
nodes NIL
}
Graph graph = new Graph(n); // Create

graph instance
graph.initGraph(); // Initialize graph
graph.displayGraph(); // Display graph as
adjacency list
String start, goal; // Accept start

and goal nodes
start = JOptionPane.showInputDialog("Enter the name of start node : ");
goal = JOptionPane.showInputDialog("Enter the name of goal node : ");
graph.setGx(start, 0); // Set gx=0 for start

node
open.add(graph.getHeadNode(start)); // Add start
node to open list
parent.set(graph.getIndex(start), "NIL"); // Set parent
of start NIL
displayQueue(open);
displayClosed(closed);
while(!open.isEmpty()) // Process until

open list is not empty
{
HeadNode temp = open.poll(); // Remove node
with minimum fx from open list
closed.add(temp); // Add it to closed list
displayQueue(open);
displayClosed(closed);
if(temp.getName().equals(goal)) // Check if goal
node is found
{
SIT, LONAVALA 86
System.out.println("\nGoal node '"+temp.getName() + "' found");

break;
}
else
{
ArrayList<Node>neighbours = temp.getNodeList(); // Get the
neighbours of the retrieved node
for(Node n1:neighbours) // For all adjacent
nodes
{
if(inClosed(n1.getName(), closed)) // If node in closed
list, process next node
continue;
if(!inOpen(n1.getName(), open)) // Check if not in
open list
{
graph.setFx(n1,temp); // Set fx for neighbour node

via current
open.add(graph.getHeadNode(n1.getName())); // Add it toopen
list
parent.set(graph.getIndex(n1.getName()), temp.getName()); // Set parent
of neighbour node
}
}
displayQueue(open);
}
tracePath(parent, graph, goal);

}
private static void displayQueue(PriorityQueue<HeadNode> open)

// Fuction to display queue open list
{
System.out.print("\nOpen List : ");
if(open.isEmpty())
{
SIT, LONAVALA 87
System.out.println("Empty");
return;
}
for(HeadNode n: open)
{
}
}
private static void displayClosed(ArrayList<HeadNode> closed)

// Fuction to display closed list
{
System.out.print("\nClosed List : ");
if(closed.isEmpty())
{
System.out.println("Empty");
return;
}
for(HeadNode n: closed)
{
}
}
private static boolean inClosed(String name, ArrayList<HeadNode> closed)

// Check if node in closed list
{
for(HeadNode n: closed)
{
if(n.getName().equals(name))
return true;
}
return false;
}
private static boolean inOpen(String name, PriorityQueue<HeadNode> open)

// Check if node in closed list
{
for(HeadNode n: open)
{
SIT, LONAVALA 88
if(n.getName().equals(name))
return true;
}
return false;
}
private static void tracePath(ArrayList<String> parent, Graph graph, String

goal) // Function to trace the path
{
System.out.println("\n\nPath : ");
String path = goal;
String temp = goal;
while(!parent.get(graph.getIndex(temp)).equals("NIL")) //
Continue path till parent is not NIL
{
temp = parent.get(graph.getIndex(temp));
path = temp + ", " + path;
}
System.out.println(path);
}
}
/*
OUTPUT:
run:
A (hx = 6) : (B,1), (C,3)

B (hx = 4) : (D,2)
C (hx = 3) : (D,5)
D (hx = 1) :
Fx of node A = 6
Open List : A
Closed List : Empty
Open List : Empty
Closed List : A
SIT, LONAVALA 89
Fx of node B = 5
Fx of node C = 6
Open List : B C
Open List : C
Closed List : A B
Fx of node D = 4
Open List : D C
Open List : C
Closed List : A B D
Path :
A, B, D
Goal node 'D' found

BUILD SUCCESSFUL (total time: 1 minute 37 seconds)
*/
Conclusion:
Thus we have studied to use Heuristic Search Techniques to Implement
Best first search and A* algorithm
SIT, LONAVALA 90
Assignment No 4
Aim:
Constraint Satisfaction Problem:
Implement crypt-arithmetic problem or n-queens or graph coloring problem
( Branch and Bound and Backtracking)
Objective:
Student will learn:
1. The basic concept of constraint satisfaction problem and backtracking.
2. General structure of N Queens problem.
Theory:
The N Queen is the problem of placing N chess queens on an N×N chessboard so that no
two queens attack each other. For example, following is a solution for 4 Queen problem.
The expected output is a binary matrix which has 1s for the blocks where queens are
placed.
For example, following is the output matrix for above 4 queen solution.
{ 0, 1, 0, 0} {
0, 0, 0, 1} {
1, 0, 0, 0} {
0, 0, 1, 0}
Generate all possible configurations of queens on board and print a configuration
that satisfies the given constraints.while there are untried conflagrations {
SIT, LONAVALA 91
generate the next configuration

if queens don't attack in this configuration then
{
print this configuration;
}
}
Backtracking Algorithm
Backtracking is finding the solution of a problem whereby the solution depends on the
previous steps taken.
In backtracking, we first take a step and then we see if this step taken is correct or not i.e.,
whether it will give a correct answer or not. And if it doesn’t, then we just come back and
change our first step. In general, this is accomplished by recursion. Thus, in backtracking,
we first start with a partial sub-solution of the problem (which may or may not lead us to
the solution) and then check if we can proceed further with this sub-solution or not. If not,
then we just come back and change it.
Thus, the general steps of backtracking are:
• start with a sub-solution
• check if this sub-solution will lead to the solution or not
• If not, then come back and change the sub-solution and continue again
The idea is to place queens one by one in different columns, starting from the leftmost
column. When we place a queen in a column, we check for clashes with already placed
queens. In the current column, if we find a row for which there is no clash, we mark this
row and column as part of the solution. If we do not find such a row due to clashes then
we backtrack and return false.
Algorithm:
1) Start in the leftmost column
2) If all queens are placed return true
3) Try all rows in the current column. Do following for every tried row.
SIT, LONAVALA 92
a) If the queen can be placed safely in this row then mark this [row, column]
as part of the solution and recursively check if placing queen here leads
to a solution.
b) If placing the queen in [row, column] leads to a solution then return true.
c) If placing queen doesn't lead to a solution then unmark this [row,

column] (Backtrack) and go to step (a) to try other rows.
3) If all rows have been tried and nothing worked, return false to trigger backtracking.
Program:
package ai_practical.assno12;
public class NQueen {
public static void main(String[] args) {

placeQueens(4);
}
private static void placeQueens(int gridSize) {

if(gridSize<4){
System.out.println("No Solution available");
}else{
int[] board = new int[gridSize];
placeAllQueens(board, 0);
printBoard(board);
}
}
private static boolean placeAllQueens(int[] board, int row) {

if(row == board.length){
return true;
}
boolean isAllQueensPlaced = false;

for (int column = 0; column < board.length; column++) {
board[row] = column;
if(isSafe(board, row)){
isAllQueensPlaced = placeAllQueens(board, row+1);
}
SIT, LONAVALA 93
if(isAllQueensPlaced){
return true;
}
}
return false;
}
private static boolean isSafe(int[] board, int row) {

for (int i = 0; i < row; i++) {
if(board[row] == board[i]){
return false;
}
if(Math.abs(board[row] - board[i]) == Math.abs(row-i)){

return false;
}
}
return true;
}
private static void printBoard(int[] board) {

for (int i = 0; i < board.length; i++) {
for (int j = 0; j < board.length; j++) {
if(j==board[i]){
System.out.print("Q ");
}else{
System.out.print("_ ");
}
}
System.out.println();
}
}
/*
run:
_Q__
___Q
SIT, LONAVALA 94
Q___
__Q_
Conclusion: N-queens problem is implemented using backtracking.
DATA ANALYTICS
SIT, LONAVALA 95
Assignment No 1
Aim:
Download the Iris flower dataset or any other dataset into a DataFrame.
(eg https://archive.ics.uci.edu/ml/datasets/Iris ) Use Python/R and Perform
following –
 How many features are there and what are their types (e.g., numeric,
nominal)?
 Compute and display summary statistics for each feature available in the
dataset. (eg. minimum value, maximum value, mean, range, standard
deviation, variance and percentiles
 Data Visualization-Create a histogram for each feature in the dataset to
illustrate the feature distributions. Plot each histogram.
 Create a boxplot for each feature in the dataset. All of the boxplots should
be combined into a single plot. Compare distributions and identify
outliers.
Theory:
R:
R is a powerful language used widely for data analysis and statistical computing. It
was developed in early 90s. Since then, endless efforts have been made to improve R’s user
interface. The journey of R language from a rudimentary text editor to interactive R Studio
and more recently Jupyter Notebooks has engaged many data science communities across the
world.
This was possible only because of generous contributions by R users globally. Inclusion of
powerful packages in R has made it more and more powerful with time. Packages such as
dplyr, tidyr, readr, data.table, SparkR, ggplot2 have made data manipulation, visualization
and computation much faster.
Advantages of R:
1. The style of coding is quite easy.

2. It’s open source. No need to pay any subscription charges.
SIT, LONAVALA 96
3. Availability of instant access to over 7800 packages customized for various

computation tasks.
4. The community support is overwhelming. There are numerous forums to help you
out.
5. Get high performance computing experience ( require packages)
6. One of highly sought skill by analytics and data science companies.
The interface of R Studio:
1. R Console: This area shows the output of code you run. Also, you can directly write
codes in console. Code entered directly in R console cannot be traced later. This is
where R script comes to use.
2. R Script: As the name suggest, here you get space to write codes. To run those codes,
simply select the line(s) of code and press Ctrl + Enter. Alternatively, you can click
on little ‘Run’ button location at top right corner of R Script.
3. R environment: This space displays the set of external elements added. This includes
data set, variables, vectors, functions etc. To check if data has been loaded properly in
R, always look at this area.
4. Graphical Output: This space display the graphs created during exploratory data
analysis. Not just graphs, you could select packages, seek help with embedded R’s
official documentation.
SIT, LONAVALA 97
Train Data: The predictive model is always built on train data set. An intuitive way to
identify the train data is, that it always has the ‘response variable’ included.
Test Data: Once the model is built, it’s accuracy is ‘tested’ on test data. This data always
contains less number of observations than train data set. Also, it does not include ‘response
variable’.
Data Sets Loads specified data sets, or list the available data sets.
Names The Names Of An Object
Functions to get or set the names of an object.
DIM Dimensions Of An Object
Retrieve or set the dimension of an object.
VIEW Invoke A Data Viewer
Invoke a spreadsheet-style data viewer on a matrix-like R object.
Standard deviation and Variance:
The standard deviation of an observation variable is the square root of its variance.

The variance is a numerical measure of how the data values is dispersed around the mean.
Percentile:
The nth percentile of an observation variable is the value that cuts off the first n percent of the
data values when it is sorted in ascending order.
Histogram:
A histogram represents the frequencies of values of a variable bucketed into ranges.

Histogram is similar to bar chat but the difference is it groups the values into continuous
ranges. Each bar in histogram represents the height of the number of values present in that
range.
R creates histogram using hist() function. This function takes a vector as an input and uses
some more parameters to plot histograms.
Syntax
The basic syntax for creating a histogram using R is –
hist(v,main,xlab,xlim,ylim,breaks,col,border)
Following is the description of the parameters used −
 v is a vector containing numeric values used in histogram.
 main indicates title of the chart.
 col is used to set color of the bars.
SIT, LONAVALA 98
 border is used to set border color of each bar.

 xlab is used to give description of x-axis.
 xlim is used to specify the range of values on the x-axis.
 ylim is used to specify the range of values on the y-axis.
 breaks is used to mention the width of each bar.
Summary
A very useful multipurpose function in R is summary(X), where X can be one of any number
of objects, including datasets, variables, and linear models, just to name a few
Response Variable (a.k.a Dependent Variable): In a data set, the response variable (y) is
one on which we make predictions. In this case, we’ll predict ‘Item_Outlet_Sales’.
Predictor Variable (a.k.a Independent Variable): In a data set, predictor variables (Xi) are
those using which the prediction is made on response variable.
Boxpots
Boxplots are great for comparing a groups of data. Let’s compare the sepal widths to the
species. The key is that the first variable is an ordered vector of quantitative
data Sepal.Width and the second variable is a vector of categorical data Species. We model
the relationship as Sepal.Width~Speciesmeaning that the Sepal.Width depends on the type
of Species.
SIT, LONAVALA 99
Program:
library(datasets)
data("iris")
names(iris)
dim(iris)
#view a dataset
View(iris)
#internal structur
min(iris$Sepal.Length)
max(iris$Sepal.Length)
mean(iris$Sepal.Length)
range(iris$Sepal.Length)
#standard deviation
sd(iris$Sepal.Length)
#variance
var(iris$Sepal.Length)
#percentile
quantile(iris$Sepal.Length)
#to display specific value
quantile(iris$Sepal.Length,c(0.3,0.6))
#histo
h <- hist(iris$Sepal.Length,main="sepal length frequencies-
histogram",xlab="sepal length",xlim=c(3.5,8.5),col="blue")
h
#using breaks
histogram",xlab="sepal
length",xlim=c(3.5,8.5),col="blue",labels=TRUE,breaks=3,border="green",las=
2)
SIT, LONAVALA 100


length",xlim=c(3.5,8.5),col="red",labels=TRUE,breaks=3,border="green",las=3
)
H<-
hist(iris$Sepal.Length,breaks=c(4.3,4.6,4.9,5.2,5.5,5.8,6.1,6.4,6.7,7.0,7.3,7.6,7.
9))
boxplot(iris$Sepal.Length)
summary(iris$Sepal.Length)
myboxplot<-boxplot(iris[,-5])
#outliers
myboxplot$out
Output:
> library(datasets)
> data("iris")
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
> dim(iris)
[1] 150 5
> View(iris)
SIT, LONAVALA 101

> min(iris$Sepal.Length)
[1] 4.3
> max(iris$Sepal.Length)
[1] 7.9
> mean(iris$Sepal.Length)
[1] 5.843333
> range(iris$Sepal.Length)
[1] 4.3 7.9
> sd(iris$Sepal.Length)
[1] 0.8280661
> var(iris$Sepal.Length)
[1] 0.6856935
> quantile(iris$Sepal.Length)
0% 25% 50% 75% 100%
4.3 5.1 5.8 6.4 7.9
> quantile(iris$Sepal.Length,c(0.3,0.6))
30% 60%
5.27 6.10
> h <- hist(iris$Sepal.Length,main="sepal length frequencies-
histogram",xlab="sepal length",xlim=c(3.5,8.5),col="blue")
>h
$breaks
[1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
$counts
[1] 5 27 27 30 31 18 6 6
$density
[1] 0.06666667 0.36000000 0.36000000 0.40000000 0.41333333 0.24000000
0.08000000 0.08000000
$mids
[1] 4.25 4.75 5.25 5.75 6.25 6.75 7.25 7.75
$xname
[1] "iris$Sepal.Length"
$equidist
[1] TRUE
attr(,"class")
[1] "histogram"
SIT, LONAVALA 102


length",xlim=c(3.5,8.5),col="blue",labels=TRUE,breaks=3,border="green",las=
2)

SIT, LONAVALA 103

length",xlim=c(3.5,8.5),col="red",labels=TRUE,breaks=3,border="green",las=3
)
> H<-
hist(iris$Sepal.Length,breaks=c(4.3,4.6,4.9,5.2,5.5,5.8,6.1,6.4,6.7,7.0,7.3,7.6,7.
9))
> boxplot(iris$Sepal.Length)
SIT, LONAVALA 104

> summary(iris$Sepal.Length)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.300 5.100 5.800 5.843 6.400 7.900
> myboxplot<-boxplot(iris[,-5])
> myboxplot$out
[1] 4.4 4.1 4.2 2.0
SIT, LONAVALA 105

Assignment No 2
Aim:
Download Pima Indians Diabetes dataset. Use Naive Bayes‟ Algorithm
for classification
 Summarize the properties in the training dataset so that we can calculate
probabilities and make predictions.
Problem Statement:
Use of Naive Bayes‟ Algorithm for classification Load the data from CSV file and
split it into training and test datasets. Summarize the properties in the training dataset so that
we can calculate probabilities and make predictions. And Classify samples from a test dataset
and a summarized training dataset.
Objective:
 Summarize the properties in the training dataset so that we can calculate probabilities
and make predictions.
Theory:
R:
world.
Advantages of R:

computation tasks.
out.
SIT, LONAVALA 106

Library:
1. caTools: Tools: moving window statistics, GIF, Base64, ROC AUC, etc
Contains several basic utility functions including: moving (rolling, running) window statistic
functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost
classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.
2. e1071: Misc Functions of the Department of Statistics, Probability Theory Group

(Formerly: E1071), TU Wien
SIT, LONAVALA 107

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support
vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...
CSV(comma-separated values )
In computing, a comma-separated values (CSV) file is a delimited text file that uses a

comma to separate values. A CSV file stores tabular data (numbers and text) in plain text.
Each line of the file is a data record. Each record consists of one or more fields, separated
by commas. The use of the comma as a field separator is the source of the name for this file
format.
How to read csv:
mydata<-read.csv(file="F:/SEM7/DA/KomalN/Ass2/diabetes.csv",header=TRUE,sep=",")
The above reads the file diabetes.csv into a data frame that it creates called mydata.
header=TRUE specifies that this data includes a header row and sep=”,” specifies that the
data is separated by commas (though read.csv implies the same I think it’s safer to be
explicit).
Sample.split: Split Data Into Test And Train Set
temp_field <- sample.split(mydata,SplitRatio=0.7)
Split data from vector Y into two sets in predefined ratio while preserving relative ratios of
different labels in Y. Used to split the data used during classification into train and test
subsets.
SplitRatio
Splitting ratio:
I. if (0<=splitratio<1)< code=""> then SplitRatio fraction of points from Y will be et

toTRUE
II. if (SplitRatio==1) then one random point from Y will be set to TRUE
III. if (SplitRatio>1) then SplitRatio number of points from Y will be set to TRUE
train <- subset(mydata, temp_field==TRUE)
variable’.
test <- subset(mydata, temp_field==FALSE)
SIT, LONAVALA 108

Naive Bayes:
Naïve Bayes classification is a kind of simple probabilistic classification methods
based on Bayes’ theorem with the assumption of independence between features. The model
is trained on training dataset to make predictions by predict() function. This article introduces
two functions naiveBayes() and train() for the performance of Naïve Bayes classification.
Predict:
The predict() function to make predictions from that model on new data. The new dataset
must have all of the columns from the training data, but they can be in a different order with
different values.
Table: Cross Tabulation And Table Creation
table(pred1,test$Outcome,dnn = c("predicted","Actual"))
table uses the cross-classifying factors to build a contingency table of the counts at each
combination of factor levels.
Dnn: the names to be given to the dimensions in the result (the dimnames names).
Cbind Combine R Objects By Rows Or Columns
Take a sequence of vector, matrix or data-frame arguments and combine by columns or rows,
respectively. These are generic functions with methods for other R classes
Program:
#library(datasets)
library(caTools)
library(e1071)
mydata<-
read.csv(file="F:/SEM7/DA/KomalN/Ass2/diabetes.csv",header=TRUE,sep=","
)
View(mydata)
head(train)
head(test)
my_model <- naiveBayes(as.factor(train$Outcome)~.,train)
my_model
SIT, LONAVALA 109

pred1<-predict(my_model,test[,-9])
pred1
pred1<-predict(my_model,test[,-9],type="raw")
pred1
pred1<-predict(my_model,test[,-9])
pred1
table(pred1,test$Outcome,dnn = c("predicted","Actual"))
output<- cbind(test,pred1)
View(output)
Output:
#library(datasets)
library(caTools)
library(e1071)
mydata<-
read.csv(file="F:/SEM7/DA/KomalN/Ass2/diabetes.csv",header=TRUE,sep=","
)
View(mydata)
> temp_field <- sample.split(mydata,SplitRatio=0.7)

> train <- subset(mydata, temp_field==TRUE)
> test <- subset(mydata, temp_field==FALSE)
SIT, LONAVALA 110

> head(train)
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI
DiabetesPedigreeFunction Age Outcome
2 1 85 66 29 0 26.6 0.351 31 0
3 8 183 64 0 0 23.3 0.672 32 1
4 1 89 66 23 94 28.1 0.167 21 0
7 3 78 50 32 88 31.0 0.248 26 1
8 10 115 0 0 0 35.3 0.134 29 0
9 2 197 70 45 543 30.5 0.158 53 1
> head(test)
Pregnancies Glucose BloodPressure SkinThickness Insulin BMI
DiabetesPedigreeFunction Age Outcome
1 6 148 72 35 0 33.6 0.627 50 1
5 0 137 40 35 168 43.1 2.288 33 1
6 5 116 74 0 0 25.6 0.201 30 0
10 8 125 96 0 0 0.0 0.232 54 1
14 1 189 60 23 846 30.1 0.398 59 1
15 5 166 72 19 175 25.8 0.587 51 1
> my_model <- naiveBayes(as.factor(train$Outcome)~.,train)

> my_model
Naive Bayes Classifier for Discrete Predictors
Call:
SIT, LONAVALA 111

naiveBayes.default(x = X, y = Y, laplace = laplace)
A-priori probabilities:
Y
0 1
0.6269531 0.3730469
Conditional probabilities:
Pregnancies
Y [,1] [,2]
0 3.264798 3.073319
1 4.712042 3.771892
Glucose
Y [,1] [,2]
0 110.1277 26.59334
1 138.8272 33.08691
BloodPressure
Y [,1] [,2]
0 68.51402 17.91265
1 71.36126 20.30531
SkinThickness
Y [,1] [,2]
0 19.46106 14.81635
1 21.72251 17.42568
Insulin
Y [,1] [,2]
0 65.71963 92.92128
1 99.55497 134.75274
BMI
Y [,1] [,2]
0 30.39564 7.462149
1 35.18325 6.494494
DiabetesPedigreeFunction
Y [,1] [,2]
0 0.4289221 0.3089013
1 0.5271518 0.3344238
SIT, LONAVALA 112

Age
Y [,1] [,2]
0 31.4486 12.09977
1 37.1466 10.94577
> pred1<-predict(my_model,test[,-9])
> pred1
[1] 1 1 0 0 1 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0
0001101011010010001000101000011101100001001
10
[86] 0 0 0 0 1 1 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0
0100111000010000011010010001000100000100110
00
[171] 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 1 1 0 0 0 0 0
0010001001110101000011001010110000000010001
01
[256] 0
Levels: 0 1
> pred1<-predict(my_model,test[,-9],type="raw")
> pred1
0 1
[1,] 3.023617e-01 0.6976383049
[2,] 1.042643e-02 0.9895735696
[3,] 9.363969e-01 0.0636031459
[4,] 9.977055e-01 0.0022944542
[5,] 3.710766e-10 0.9999999996
[6,] 2.908498e-01 0.7091502399
[7,] 7.638351e-01 0.2361648756
[8,] 1.867342e-02 0.9813265778
[9,] 7.268751e-01 0.2731249027
[10,] 9.818539e-01 0.0181461175
[11,] 1.743120e-01 0.8256879958
[12,] 9.867358e-01 0.0132641733
[13,] 2.678128e-01 0.7321871927
[14,] 3.426135e-01 0.6573865416
[15,] 3.484119e-01 0.6515880608
[16,] 1.182130e-02 0.9881787022
[17,] 9.992828e-01 0.0007172357
[18,] 9.901948e-01 0.0098051522
…..
……
[255,] 5.173472e-02 0.9482652816
SIT, LONAVALA 113

[256,] 9.142187e-01 0.0857813186

> pred1<-predict(my_model,test[,-9])
> pred1
[1] 1 1 0 0 1 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0
0001101011010010001000101000011101100001001
10
[86] 0 0 0 0 1 1 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0
0100111000010000011010010001000100000100110
00
[171] 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 1 1 0 0 0 0 0
0010001001110101000011001010110000000010001
01
[256] 0
Levels: 0 1
> table(pred1,test$Outcome,dnn = c("predicted","Actual"))

Actual
predicted 0 1
0 144 22
1 35 55
> output<- cbind(test,pred1)
> View(output)
Assignment No 3
SIT, LONAVALA 114

Aim:
Trip History Analysis: Use trip history dataset that is from a bike sharing
service in the United States. The data is provided quarter-wise from 2010 (Q4)
onwards. Each file has 7 columns. Predict the class of user. Sample Test data set
available here https://www.capitalbikeshare.com/trip-history-data.
Problem Statement:
Analysis Trip History by using trip history dataset that is from a bike sharing service
in the United States. The data is provided quarter-wise from 2010 (Q4) onwards. Each file
has 7 columns. Predict the class of user.
Objective:
Predict the result from previous data.
Theory:
R:
world.
Advantages of R:

computation tasks.
out.
SIT, LONAVALA 115

Library:
3. caTools: Tools: moving window statistics, GIF, Base64, ROC AUC, etc
Contains several basic utility functions including: moving (rolling, running) window statistic
functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost
classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.
4. e1071: Misc Functions of the Department of Statistics, Probability Theory Group

(Formerly: E1071), TU Wien
Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support
vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...
5. rpart: Recursive Partitioning and Regression Trees
Recursive partitioning for classification, regression and survival trees. An implementation of

most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.
Data: an optional data frame in which to interpret the variables named in the formula.
SIT, LONAVALA 116

Method: one of "anova", "poisson", "class" or "exp". If method is missing then the routine

tries to make an intelligent guess. If y is a survival object, then method = "exp" is assumed,
if y has 2 columns then method = "poisson" is assumed, if y is a factor then method =
"class" is assumed, otherwise method = "anova" is assumed. It is wisest to specify the
method directly, especially as more criteria may added to the function in
future.Alternatively, method can be a list of functions named init, split and eval. Examples
are given in the file tests/usersplits.R in the sources, and in the vignettes ‘User Written Split
Functions’.
CSV(comma-separated values )
In computing, a comma-separated values (CSV) file is a delimited text file that uses a

comma to separate values. A CSV file stores tabular data (numbers and text) in plain text.
Each line of the file is a data record. Each record consists of one or more fields, separated
by commas. The use of the comma as a field separator is the source of the name for this file
format.
How to read csv:
mydata<-read.csv(file="F:/SEM7/DA/KomalN/Ass2/tripdata.csv",header=TRUE,sep=",")
The above reads the file tripdata.csv into a data frame that it creates called mydata.
header=TRUE specifies that this data includes a header row and sep=”,” specifies that the
data is separated by commas (though read.csv implies the same I think it’s safer to be
explicit).
Sample.split: Split Data Into Test And Train Set
Split data from vector Y into two sets in predefined ratio while preserving relative ratios of
different labels in Y. Used to split the data used during classification into train and test
subsets.
SplitRatio
Splitting ratio:
IV. if (0<=splitratio<1)< code=""> then SplitRatio fraction of points from Y will be et

toTRUE
V. if (SplitRatio==1) then one random point from Y will be set to TRUE
VI. if (SplitRatio>1) then SplitRatio number of points from Y will be set to TRUE
SIT, LONAVALA 117

variable’.
Summary
A very useful multipurpose function in R is summary(X), where X can be one of any number
of objects, including datasets, variables, and linear models, just to name a few. When used,
the command provides summary data related to the individual object that was fed into it.
Thus, the summary function has different outputs depending on what kind of object it takes as
an argument.
Head: Return The First Or Last Part Of An Object
Returns the first or last parts of a vector, matrix, table, data frame or function.
Since head() and tail() are generic functions, they may also have been extended to other
classes.
Predict: Predicted values based on linear model object
newdata Data frame in which to predict

type Type of prediction (response or model term)
Cbind: Combine R Objects By Rows Or Columns

Take a sequence of vector, matrix or data-frame arguments and combine by columns or rows,
respectively. These are generic functions with methods for other R classes
Printcp: Displays CP Table For Fitted Itree Object

Displays the cp table for fitted itree object. Note that cp is not defined
for method="purit" or "extremes". Otherwise identical to rpart's printcp function.
Prune: Cost-Complexity Pruning Of An Rpart Object
Determines a nested sequence of subtrees of the supplied rpart object by
recursively snipping off the least important splits, based on the complexity parameter (cp).
Cp: Complexity parameter to which the rpart object will be trimmed.
Program:
library(e1071)
library(caTools)
library(rpart)
SIT, LONAVALA 118

mydata<-
read.csv(file="/home/SIT/Desktop/tripdata.csv",header=TRUE,sep=",")
View(mydata)
#consider column1,4,6,9 - output class
subset_mydata <- mydata[,c(1,4,6,9)]
temp_field <- sample.split(subset_mydata,SplitRatio=0.9)
train <- subset(subset_mydata, temp_field==TRUE)
test <- subset(subset_mydata, temp_field==FALSE)
summary(train)
summary(test)
head(train)
head(test)
fit <- rpart(train$Member.type~.,data=train,method="class")
plot(fit)
text(fit)
#test excluding last colm
pred<- predict(fit,newdata=test[,-4],type=("class"))
mean(pred==test$Member.type)
output <- cbind(test,pred)
View(output)
#fit<- rpart(train$Member.type~., data=train)

printcp(fit)
opt<- which.min(fit$cptable[,"xerror"])
cp <- fit$cptable[opt,"CP"]
#prune tree
pruned_model<-prune(fit,cp)
plot(fit)
text(fit)
Output:
> library(e1071)
> library(caTools)
> library(rpart)
> mydata<-
read.csv(file="/home/SIT/Desktop/tripdata.csv",header=TRUE,sep=",")
> View(mydata)
SIT, LONAVALA 119

> subset_mydata <- mydata[,c(1,4,6,9)]

> temp_field <- sample.split(subset_mydata,SplitRatio=0.9)
> train <- subset(subset_mydata, temp_field==TRUE)
> test <- subset(subset_mydata, temp_field==FALSE)
> summary(train)
Duration Start.station.number End.station.number Member.type
Min. : 60 Min. :31000 Min. :31000 Casual: 76741
SIT, LONAVALA 120

1st Qu.: 440 1st Qu.:31208 1st Qu.:31212 Member:203845

Median : 769 Median :31258 Median :31257
Mean : 1263 Mean :31322 Mean :31322
3rd Qu.: 1323 3rd Qu.:31500 3rd Qu.:31408
Max. :85674 Max. :32227 Max. :32227
> summary(test)
Min. : 60 Min. :31000 Min. :31000 Casual:25562
1st Qu.: 443 1st Qu.:31208 1st Qu.:31212 Member:67967
Median : 769 Median :31258 Median :31257
Mean : 1260 Mean :31321 Mean :31322
3rd Qu.: 1320 3rd Qu.:31411 3rd Qu.:31408
Max. :86181 Max. :32227 Max. :32227
> head(train)
2 578 31232 31609 Casual
3 580 31232 31609 Casual
4 606 31104 31509 Member
6 175 31104 31117 Member
7 1605 31264 31641 Casual
8 1591 31264 31641 Casual
> head(test)
1 679 31302 31307 Member
5 582 31129 31118 Member
9 509 31116 31203 Member
13 1226 31609 31230 Casual
17 209 31204 31275 Member
21 5012 31084 31084 Casual
> fit <- rpart(train$Member.type~.,data=train,method="class")
> plot(fit)
> text(fit)
SIT, LONAVALA 121

> pred<- predict(fit,newdata=test[,-4],type=("class"))

> mean(pred==test$Member.type)
[1] 0.8073966
> output <- cbind(test,pred)
> View(output)
> printcp(fit)
SIT, LONAVALA 122

Classification tree:
rpart(formula = train$Member.type ~ ., data = train, method = "class")
Variables actually used in tree construction:

[1] Duration End.station.number Start.station.number
Root node error: 76741/280586 = 0.2735
n= 280586
CP nsplit rel error xerror xstd

1 0.229617 0 1.00000 1.00000 0.0030768
2 0.017259 1 0.77038 0.78160 0.0028298
3 0.013174 4 0.71603 0.70976 0.0027301
4 0.010000 5 0.70286 0.70287 0.0027200
> opt<- which.min(fit$cptable[,"xerror"])
> cp <- fit$cptable[opt,"CP"]
> pruned_model<-prune(fit,cp)
> plot(fit)
> text(fit)
SIT, LONAVALA 123

SIT, LONAVALA 124

Assignment No 4
Aim:
Twitter Data Analysis: Use Twitter data for sentiment analysis. The dataset is 3MB in
size and has 31,962 tweets. Identify the tweets which are hate tweets and which are not.
Sample Test data set available here https://datahack.analyticsvidhya.com/contest/practice-
problem-twitter-sentiment-analysis/
Prerequisites: Fundamentals of Python Programming Languages
Objective: To learn the concept of natural language processing (NLP) tasks such as
part-of-speech tagging, noun phrase extraction, sentiment analysis, and
classification.
Theory: I. Python regular expression Library: Regular expressions are used to
identify whether a pattern exists in a given sequence of characters
(string) or not. They help in manipulating textual data, which is often a
pre-requisite for data science projects that involve text mining. You
must have come across some application of regular expressions: they
are used at the server side to validate the format of email addresses or
password during registration, used for parsing text data files to find,
replace or delete certain string, etc.
II.Python Tweepy library: This library provides access to the entire

twitter RESTful API methods. Each method can accept various
parameters and return responses.
III. Python TextBlob library: TextBlob is a Python (2 and 3) library for

processing textual data. It provides a consistent API for diving into
common natural language processing (NLP) tasks such as part-of-
speech tagging, noun phrase extraction, sentiment analysis, and more.
IV. Authentication: create OAuthHandler object : Tweepy supports oauth

authentication. Authentication is handled by the tweepy AuthHandler
class.
V.Utility Function in Python : This function provides sentiments

analysis of tweets.
SIT, LONAVALA 125

Facilities: Linux Operating Systems, Python editor.
Input:
Structured Dataset : Twitter Dataset
File: Twitter.csv
Output:
1. Sentiment analysis of twitter dataset.
2. Categorization of tweets as positive and negative tweets..
Conclusion: Hence, we have studied sentiment analysis of Twitter dataset to

classify the tweets from dataset.
Questions:
1. What is Sentiment analysis?
2. Which API is required to handle authentication?
3. What is syntax of utility function?
4. What is function of Text Blob library?
5. What is Re library in python?
Program:
library(dplyr)
library(tibble)
library(twitteR)
library(graphics)
library(purrr)
library(stringr)
library(tm)
library(syuzhet)
library(gapminder)
library(httpuv)
library(openssl)
library(RCurl)
library(RInside)
library(Rcpp)
library(textclean)
library(SnowballC)
library(gapminder)
#Connect to Twitter API:+
api_key<- "CRzxTe08UF5Mrl7nFxovwmAhN"
SIT, LONAVALA 126

api_secret <- "1DyGZD5alm0JS5GvsB5E5FB9piWf4Q4GjV4MAC1wDCXk0rTA7T"

access_token <- "1054441427696988160-TYFDtvBmMlwv2VPu0JMs6a1SfVFpfT"
access_token_secret <- "D44rlviM8AFH03v6EjpgZsP2XpPlnIsiKU1n6XWRUxSrW"
setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret)
#Get tweets:
prat_tweets <- userTimeline("prattprattpratt", n = 250)
oprah_tweets <- userTimeline("Oprah", n = 250)
neil_tweets <- userTimeline("neiltyson", n = 250)
mar_tweets <- userTimeline("billmaher", n = 250)
kutch_tweets <- userTimeline("aplusk", n = 250)
tweets<- tbl_df(map_df(c(prat_tweets,oprah_tweets,neil_tweets,
mar_tweets,kutch_tweets),as.data.frame))
write.csv(tweets, file="tweets.csv", row.names=FALSE)
#Read in data:
setwd("C:/Users/mateo/Documents/Repo/text-analysis")
tweets<-read.csv("tweets.csv")
#Clean up data:
twitterCorpus <-Corpus(VectorSource(tweets$text))
inspect(twitterCorpus[1:10])
twitterCorpus<- tm_map(twitterCorpus, content_transformer(tolower))

twitterCorpus<- tm_map(twitterCorpus,removeWords,stopwords("en"))
twitterCorpus<- tm_map( twitterCorpus,removeNumbers)
twitterCorpus<- tm_map( twitterCorpus,removePunctuation)
removeURL<- function(x) gsub("http[[:alnum:]]*", "", x)

twitterCorpus<- tm_map(twitterCorpus,content_transformer(removeURL))
removeURL<- function(x) gsub("edua[[:alnum:]]*", "", x)

twitterCorpus<- tm_map(twitterCorpus,content_transformer(removeURL))
# remove non "American standard code for information interchange (curly quotes and
ellipsis)"
SIT, LONAVALA 127

# using function from package "textclean"
removeNonAscii<-function(x) textclean::replace_non_ascii(x)
twitterCorpus<-tm_map(twitterCorpus,content_transformer(removeNonAscii))
twitterCorpus<- tm_map(twitterCorpus,removeWords,c("amp","ufef",
"ufeft","uufefuufefuufef","uufef","s"))
twitterCorpus<- tm_map(twitterCorpus,stripWhitespace)
inspect(twitterCorpus[1:10])
# stem corpus after sentiment analysis(given my sentiment dictionary choice), but before
cluster analysis
#Sentiment analysis:
# find count of 8 emotional sentiments
+ emotions<-get_nrc_sentiment(twitterCorpus$content)
barplot(colSums(emotions),cex.names = .7,
col = rainbow(10),
main = "Sentiment scores for tweets"
)
# sentiment positiviy rating
get_sentiment(twitterCorpus$content[1:10])
sent<-get_sentiment(twitterCorpus$content)
sentimentTweets<-dplyr::bind_cols(tweets,data.frame(sent))
# mean of sentiment positivity
meanSent<-function(i,n){
mean(sentimentTweets$sent[i:n])
}
(scores<-c(prat=meanSent(1,250),
oprah=meanSent(251,500),
neil=meanSent(501,750),
maher=meanSent(751,849),
astk=meanSent(850,1002)))
SIT, LONAVALA 128

#Cluster analysis:
# convert to stem words

twitterCorpus<-tm_map(twitterCorpus,stemDocument)
# build document term matrix
dtm<-DocumentTermMatrix(twitterCorpus)
dtm
mat<-as.matrix(dtm)
# create distance matrix
d<-dist(mat)
# input distance matrix into hclust function using method "ward.D"
groups<-hclust(d,method="ward.D")
plot(groups,hang=-1)
cut<-cutree(groups,k=6)
newMat<-dplyr::bind_cols(tweets,data.frame(cut))
table(newMat$screenName,newMat$cut)
SIT, LONAVALA 129

LP1 1

Uploaded by

Copyright:

Available Formats

You might also like

LP1 1

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

LP1 1

Uploaded by

Copyright:

Available Formats

LABORATORY PRACTICE-I BE COMPUTER

1. High Performance Computing

1.1 Vector and Matrix Operations -

1.2 Parallel Sorting Algorithms-

1.3 Parallel Search Algorithm-

1.4 Parallel Implementation of the K Nearest Neighbors Classifier

2.2 Implement any one of the following Expert System ,

2.4 Constraint Satisfaction Problem:

HIGH PERFORMANCE COMPUTING

Aim: Implement nxn matrix parallel addition, multiplication using

• P = M * N of size WIDTH x WIDTH

• One thread handles one element of P

Matrix Multiplication steps

Step 1: Matrix Data Transfers

Step 2: Simple Host Code in C

void MatrixMulOnHost(const Matrix M, const Matrix N, Matrix P)

Multiply Using One Thread Block

– Loads a row of matrix M

Step 3: Host-side Main Program Code

/ Setup the execution

/ Launch the device computation threads!

Step 4: Device-side Kernel Function

Step 5: Some Loose Ends

//1)How to add two largest vectors by Parallel execution

printf("\n Second Vector:\t");

#pragma omp parallel for

printf("\n Parallel-Vector Addition:(a,b,c)\t");

#pragma omp parallel for

guest-bvoaff@C04L0809:~$ g++ par_add_large_vectors.cpp -fopenmp

First Vector: 383 777 67 58 393 919 537 413 980

2) Multiply vector and matrix

//matrix of size 3x2

//column vector of size 2x1

//before multiplication check condition,

//display resultant vector

guest-zbuacn@admin:~$ g++ par_matrix_vect_mul.cpp -fopenmp

3) Multiply two n*n array using n2 processor

// Display input matrix A:

// Display input matrix B:

c_1=clock(); // time measure:

#pragma omp parallel

#pragma omp parallel for private(m,j)

// Display input matrix B:

/* TIME MEASURE + OUTPUT */

guest-tim1wd@C04L0818:~$ g++ matrix_matrix_multiplication.c -fopenmp

• Arrange elements of a list into certain order

ii) What is Parallel Sorting?

• Based on an existing sequential sort algorithm

• One of the straight-forward sorting methods

Bubble Sort Example

Algorithm for Parallel Bubble Sort

5. Exchange A[2i] ↔ A[2i+1]

Parallel Bubble Sort Example 1

Parallel Bubble Sort Example 2

Parallel Merge Sort

Algorithm for Parallel Merge Sort

1. Time Complexity Of parallel Merge Sort and parallel Bubble sort in

node left, right;

node insert(node , int);

node insert(node root, int data)