LP1 1


LABORATORY PRACTICE-I BE COMPUTER

INDEX

Sr. No
CONTENT

1. High Performance Computing

1.1 Vector and Matrix Operations -


Design parallel algorithm to
1. Add two large vectors
2. Multiply Vector and Matrix
3. Multiply two N × N arrays using n² processors

1.2 Parallel Sorting Algorithms-


For Bubble Sort and Merge Sort, based on the existing sequential algorithms, design
and implement parallel algorithms utilizing all resources available.

1.3 Parallel Search Algorithm-


Design and implement a parallel algorithm, utilizing all resources available, for
• Binary Search for Sorted Array
• Depth-First Search (tree or an undirected graph) OR
• Breadth-First Search (tree or an undirected graph) OR
• Best-First Search (traversal of a graph to reach a target by the shortest
possible path)

1.4 Parallel Implementation of the K Nearest Neighbors Classifier

2. Artificial Intelligence

2.1 Solve 8-puzzle problem using A* algorithm. Assume any initial configuration
and define goal configuration clearly.

2.2 Implement any one of the following Expert Systems:
• Medical Diagnosis of 10 diseases based on adequate symptoms
• Identifying birds of India based on characteristics

2.3 Use Heuristic Search Techniques to Implement Best first search (Best-Solution
but not always optimal) and A* algorithm (Always gives optimal solution).

SIT, LONAVALA 1

2.4 Constraint Satisfaction Problem:


Implement the crypt-arithmetic problem, n-queens, or the graph coloring problem
(Branch and Bound and Backtracking)

3. Data Analytics

3.1 Download the Iris flower dataset or any other dataset into a DataFrame (e.g.
https://archive.ics.uci.edu/ml/datasets/Iris). Use Python/R and perform the following:
• How many features are there and what are their types (e.g., numeric,
nominal)?
• Compute and display summary statistics for each feature available in the
dataset (e.g. minimum value, maximum value, mean, range, standard
deviation, variance and percentiles).
• Data Visualization: create a histogram for each feature in the dataset to
illustrate the feature distributions. Plot each histogram.
• Create a boxplot for each feature in the dataset. All of the boxplots should be
combined into a single plot. Compare distributions and identify outliers.

3.2 Download the Pima Indians Diabetes dataset. Use the Naive Bayes algorithm for
classification.
• Load the data from a CSV file and split it into training and test datasets.
• Summarize the properties in the training dataset so that we can calculate
probabilities and make predictions.
• Classify samples from a test dataset using the summarized training dataset.

3.3 Trip History Analysis: Use trip history dataset that is from a bike sharing service
in the United States. The data is provided quarter-wise from 2010 (Q4) onwards.
Each file has 7 columns. Predict the class of user. Sample Test data set available
here https://www.capitalbikeshare.com/trip-history-data.

3.4 Twitter Data Analysis: Use Twitter data for sentiment analysis. The dataset is
3MB in size and has 31,962 tweets. Identify the tweets which are hate tweets and
which are not. Sample test data set available here:
https://datahack.analyticsvidhya.com/contest/practice-problem-twitter-sentiment-analysis/


HIGH PERFORMANCE COMPUTING


Assignment No 1

Aim:
Vector and Matrix Operations-
Design parallel algorithm to
1. Add two large vectors
2. Multiply Vector and Matrix
3. Multiply two N × N arrays using n² processors

Aim: Implement parallel n × n matrix addition and multiplication using CUDA, with
shared memory.
Prerequisites:
- Concept of matrix addition, multiplication.
- Basics of CUDA programming
Objectives:
Student should be able to learn parallel programming, CUDA architecture
and CUDA processing flow
Theory:
A straightforward matrix multiplication example that illustrates the basic features of
memory and thread management in CUDA programs:
• Leave shared memory usage until later
• Local, register usage
• Thread ID usage
• P = M * N of size WIDTH x WIDTH
• One thread handles one element of P
• M and N are loaded WIDTH times from global memory
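The one-thread-per-element mapping can be illustrated on the host in plain C++. The sketch below is not CUDA: `computeElement` and `multiply` are hypothetical names, and the double loop in `multiply` only stands in for the grid of threads, assuming square row-major matrices stored in flat arrays.

```cpp
#include <cstddef>
#include <vector>

// Work of one "thread": compute the single element P[row][col] of P = M * N.
// All matrices are width x width and stored row-major in flat arrays.
float computeElement(const std::vector<float>& M, const std::vector<float>& N,
                     std::size_t width, std::size_t row, std::size_t col) {
    float pValue = 0.0f;
    for (std::size_t k = 0; k < width; ++k) {
        // One element of row `row` of M times one element of column `col` of N.
        pValue += M[row * width + k] * N[k * width + col];
    }
    return pValue;
}

// Host-side stand-in for the thread grid: each (row, col) pair plays the
// role of one thread and writes exactly one element of P.
std::vector<float> multiply(const std::vector<float>& M, const std::vector<float>& N,
                            std::size_t width) {
    std::vector<float> P(width * width, 0.0f);
    for (std::size_t row = 0; row < width; ++row) {
        for (std::size_t col = 0; col < width; ++col) {
            P[row * width + col] = computeElement(M, N, width, row, col);
        }
    }
    return P;
}
```

Because every call reads a full row of M and a full column of N, each row and each column is read WIDTH times in total, which is the repeated global-memory loading noted in the last bullet.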

Matrix Multiplication steps


1. Matrix Data Transfers
2. Simple Host Code in C
3. Host-side Main Program Code
4. Device-side Kernel Function
5. Some Loose Ends

Step 1: Matrix Data Transfers

// Allocate the device memory where we will copy M to
Matrix Md;
Md.width = WIDTH;
Md.height = WIDTH;
Md.pitch = WIDTH;
int size = WIDTH * WIDTH * sizeof(float);
cudaMalloc((void**)&Md.elements, size);
// Copy M from the host to the device
cudaMemcpy(Md.elements, M.elements, size, cudaMemcpyHostToDevice);
// Read M from the device to the host into P
cudaMemcpy(P.elements, Md.elements, size, cudaMemcpyDeviceToHost);
...
// Free device memory
cudaFree(Md.elements);

Step 2: Simple Host Code in C


// Matrix multiplication on the (CPU) host in double precision
// for simplicity, we will assume that all dimensions are equal
void MatrixMulOnHost(const Matrix M, const Matrix N, Matrix P)
{
    for (int i = 0; i < M.height; ++i)
        for (int j = 0; j < N.width; ++j) {
            double sum = 0;
            for (int k = 0; k < M.width; ++k) {
                double a = M.elements[i * M.width + k];
                double b = N.elements[k * N.width + j];
                sum += a * b;
            }
            P.elements[i * N.width + j] = sum;
        }
}

Multiply Using One Thread Block


• One block of threads computes matrix P
– Each thread computes one element of P
• Each thread
– Loads a row of matrix M
– Loads a column of matrix N
– Performs one multiply and one addition for each pair of M and N elements
– Has a compute to off-chip memory access ratio close to 1:1 (not very high)
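The 1:1 figure follows from counting the work in the inner loop: each iteration loads one element of M and one element of N from global memory and performs one multiplication and one addition. As a rough per-thread count, ignoring the final store of the result and any caching:

```latex
\frac{\text{floating-point operations}}{\text{global memory loads}}
  = \frac{2 \cdot \mathrm{WIDTH}}{2 \cdot \mathrm{WIDTH}} = 1
```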

Step 3: Host-side Main Program Code


int main(void) {
    // Allocate and initialize the matrices
    Matrix M = AllocateMatrix(WIDTH, WIDTH, 1);
    Matrix N = AllocateMatrix(WIDTH, WIDTH, 1);
    Matrix P = AllocateMatrix(WIDTH, WIDTH, 0);

    // M * N on the device
    MatrixMulOnDevice(M, N, P);

    // Free matrices
    FreeMatrix(M);
    FreeMatrix(N);
    FreeMatrix(P);
    return 0;
}

Host-side code
// Matrix multiplication on the device
void MatrixMulOnDevice(const Matrix M, const Matrix N, Matrix P)
{
    // Load M and N to the device
    Matrix Md = AllocateDeviceMatrix(M);
    CopyToDeviceMatrix(Md, M);
    Matrix Nd = AllocateDeviceMatrix(N);
    CopyToDeviceMatrix(Nd, N);

    // Allocate P on the device
    Matrix Pd = AllocateDeviceMatrix(P);

    // Setup the execution configuration
    dim3 dimBlock(WIDTH, WIDTH);
    dim3 dimGrid(1, 1);

    // Launch the device computation threads!
    MatrixMulKernel<<<dimGrid, dimBlock>>>(Md, Nd, Pd);

    // Read P from the device
    CopyFromDeviceMatrix(P, Pd);

    // Free device matrices
    FreeDeviceMatrix(Md);
    FreeDeviceMatrix(Nd);
    FreeDeviceMatrix(Pd);
}

Step 4: Device-side Kernel Function


// Matrix multiplication kernel – thread specification
__global__ void MatrixMulKernel(Matrix M, Matrix N, Matrix P)
{
    // 2D Thread ID
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    // Pvalue is used to store the element of the matrix
    // that is computed by the thread
    float Pvalue = 0;
    for (int k = 0; k < M.width; ++k)
    {
        float Melement = M.elements[ty * M.pitch + k];
        float Nelement = N.elements[k * N.pitch + tx];
        Pvalue += Melement * Nelement;
    }
    // Write the matrix to device memory;
    // each thread writes one element
    P.elements[ty * P.pitch + tx] = Pvalue;
}

Step 5: Some Loose Ends


- Free allocated CUDA memory


Facilities:
Latest version of 64 Bit Operating Systems, CUDA enabled NVIDIA Graphics card
Input:
Two matrices
Output:
Product of the two matrices

Software Engg.:
Mathematical Model:

Conclusion:
We learned parallel programming with the help of CUDA architecture.
Questions:

1. What is CUDA?
2. Explain Processing flow of CUDA programming.
3. Explain advantages and limitations of CUDA.
4. Make the comparison between GPU and CPU.
5. Explain various alternatives to CUDA.
6. Explain CUDA hardware architecture in detail.

Program:

//1) Add two large vectors using parallel execution

#include<stdio.h>
#include<cstdlib>
// omp.h must be included to use the OpenMP parallel constructs
#include<omp.h>
#define MAX 100

int main()
{
    int a[MAX],b[MAX],c[MAX],i;

    printf("\n First Vector:\t");
    // rand() keeps hidden shared state and is not guaranteed to be thread-safe,
    // so the vectors are filled sequentially
    for(i=0;i<MAX;i++)
    {
        a[i]=rand()%1000;
    }

    // Printing is also kept sequential: in a parallel loop the values could be
    // printed out of sequence, because we have no control over the order in
    // which the threads execute
    for(i=0;i<MAX;i++)
    {
        printf("%d\t",a[i]);
    }

    printf("\n Second Vector:\t");
    for(i=0;i<MAX;i++)
    {
        b[i]=rand()%1000;
    }

    for(i=0;i<MAX;i++)
    {
        printf("%d\t",b[i]);
    }

    printf("\n Parallel-Vector Addition:(a,b,c)\t");
    // The master thread forks a team of threads that share the iterations of
    // this loop; each c[i] is written by exactly one thread
    #pragma omp parallel for
    for(i=0;i<MAX;i++)
    {
        c[i]=a[i]+b[i];
    }

    for(i=0;i<MAX;i++)
    {
        printf("\n%d\t%d\t%d",a[i],b[i],c[i]);
    }
    return 0;
}

1) Output:

guest-bvoaff@C04L0809:~$ g++ par_add_large_vectors.cpp -fopenmp


guest-bvoaff@C04L0809:~$ ./a.out

First Vector: 383 777 67 58 393 919 537 413 980


729 582 814 434 43 87 276 788 403 754
932 676 739 226 94 539 915 335 386 492
649 421 362 27 690 59 763 926 426 736
368 429 530 123 135 929 802 69 198 324
315 167 456 11 42 229 373 421 784 370
526 873 857 545 367 364 750 808 178 584
651 399 60 368 12 586 886 793 540 172
211 567 782 862 22 91 956 862 170 996
281 305 925 84 327 336 505 846 313 124
895
Second Vector: 570 219 528 732 503 270 708 340 796
618 846 555 488 228 841 350 193 500 34
764 124 914 987 856 743 491 227 365 859
936 432 551 437 228 275 407 474 121 858
395 29 237 235 793 818 428 143 11 928
529 795 378 467 601 97 902 317 492 652
756 301 280 771 481 675 709 927 567 856
497 353 586 965 306 683 434 286 441 865
689 444 619 440 729 31 117 97 624 871
829 19 368 715 149 723 245 451 921 379
764
Parallel-Vector Addition:(a,b,c)
383 570 953
777 219 996
67 528 595
58 732 790
393 503 896
919 270 1189
537 708 1245
413 340 753


guest-bvoaff@C04L0809:~$

2) Multiply vector and matrix


#include<iostream>
#include<omp.h>
using namespace std;

int main()
{
    // constant sizes keep the array declarations standard C++
    const int m=3,n=2;
    int mat[m][n],vec[n],out[m];

    //matrix of size 3x2
    for(int row=0;row<m;row++)
    {
        for(int col=0;col<n;col++)
        {
            mat[row][col]=1;
        }
    }

    //display matrix
    cout<<"Input Matrix"<<endl;
    for(int row=0;row<m;row++)
    {
        for(int col=0;col<n;col++)
        {
            cout<<"\t"<<mat[row][col];
        }
        cout<<""<<endl;
    }

    //column vector of size 2x1
    for(int row=0;row<n;row++)
    {
        vec[row]=2;
    }

    //display vector
    cout<<"Input Col-Vector"<<endl;
    for(int row=0;row<n;row++)
    {
        cout<<vec[row]<<endl;
    }

    //before multiplication check condition: no_of_cols(matrix)==no_of_rows(vector)
    //each thread computes whole rows, so no two threads write the same out[row]
    #pragma omp parallel for
    for(int row=0;row<m;row++)
    {
        out[row]=0;
        for(int col=0;col<n;col++)
        {
            out[row]+=mat[row][col]*vec[col];
        }
    }

    //display resultant vector
    cout<<"Resultant Col-Vector"<<endl;
    for(int row=0;row<m;row++)
    {
        cout<<"\nvec["<<row<<"]:"<<out[row]<<endl;
    }

    return 0;
}

2) Output:

guest-zbuacn@admin:~$ g++ par_matrix_vect_mul.cpp -fopenmp


guest-zbuacn@admin:~$ ./a.out
Input Matrix
1 1
1 1
1 1
Input Col-Vector
2
2
Resultant Col-Vector

vec[0]:4

vec[1]:4

vec[2]:4

3) Multiply two n × n arrays using n² processors

// Matrix-Matrix Multiplication
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define N 4
float A[N][N], B[N][N], C[N][N]; // declaring matrices of NxN size

int main ()
{
    /* DECLARING VARIABLES */
    int i, j, m;          // indices for matrix multiplication
    double t_start, t_1;  // execution time measures (wall-clock seconds)

    /* FILLING MATRICES WITH RANDOM NUMBERS */
    for(i=0;i<N;i++)
    {
        for(j=0;j<N;j++)
        {
            A[i][j]= (rand()%5);
            B[i][j]= (rand()%5);
        }
    }

    // Display input matrix A:
    printf("Matrix A:\n");
    for(i=0;i<N;i++)
    {
        for(j=0;j<N;j++)
        {
            printf("%f\t",A[i][j]);
        }
        printf("\n");
    }

    // Display input matrix B:
    printf("Matrix B:\n");
    for(i=0;i<N;i++)
    {
        for(j=0;j<N;j++)
        {
            printf("%f\t",B[i][j]);
        }
        printf("\n");
    }

    // clock() adds up the CPU time of all threads, so wall-clock time is
    // measured with omp_get_wtime() instead
    t_start=omp_get_wtime();

    /* MATRIX MULTIPLICATION */
    printf("Max number of threads: %i \n",omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp single
        printf("Number of threads: %i \n",omp_get_num_threads());
    }

    // collapse(2) distributes all N*N (i,j) pairs over the threads, so with
    // n² threads each one computes a single element of C
    #pragma omp parallel for collapse(2) private(m)
    for(i=0;i<N;i++)
    {
        for(j=0;j<N;j++)
        {
            float sum=0.; // running value of C[i][j], private to the thread
            for(m=0;m<N;m++)
            {
                sum=A[i][m]*B[m][j]+sum;
            }
            C[i][j]=sum;
        }
    }

    /* TIME MEASURE: multiplication only, before the result is printed */
    t_1=omp_get_wtime()-t_start; // in seconds

    // Display result matrix C:
    printf("Matrix C:\n");
    for(i=0;i<N;i++)
    {
        for(j=0;j<N;j++)
        {
            printf("%f\t",C[i][j]);
        }
        printf("\n");
    }

    printf("Execution time: %f(in seconds) \n",t_1);

    /* TERMINATE PROGRAM */
    return 0;
}

3) Output:

guest-tim1wd@C04L0818:~$ g++ matrix_matrix_multiplication.c -fopenmp


guest-tim1wd@C04L0818:~$ ./a.out
Matrix A:
3.000000 2.000000 3.000000 1.000000
4.000000 2.000000 0.000000 3.000000
0.000000 2.000000 1.000000 2.000000
2.000000 2.000000 2.000000 4.000000
Matrix B:
1.000000 0.000000 0.000000 2.000000
1.000000 2.000000 4.000000 1.000000
1.000000 1.000000 3.000000 4.000000
0.000000 3.000000 0.000000 2.000000
Max number of threads: 4
Number of threads: 4
Matrix C:
8.000000 10.000000 17.000000 22.000000
6.000000 13.000000 8.000000 16.000000
3.000000 11.000000 11.000000 10.000000
6.000000 18.000000 14.000000 22.000000
Execution time: 0.005355(in seconds)
guest-tim1wd@C04L0818:~$

Conclusion:
We designed parallel algorithms to add two large vectors, multiply a vector and a
matrix, and multiply two N × N arrays using n² processors.

Assignment No 2


Aim:
Parallel Sorting Algorithms-
For Bubble Sort and Merge Sort, based on the existing sequential algorithms,
design and implement parallel algorithms utilizing all resources available.

Aim: Understand Parallel Sorting Algorithms like Bubble sort and Merge Sort.

Prerequisites:
Student should know basic concepts of Bubble sort and Merge Sort.
Objective: Study of Parallel Sorting Algorithms like Bubble sort and
Merge Sort
Theory:
i) What is Sorting?
Sorting is a process of arranging elements in a group in a particular order, i.e.,
ascending order, descending order, alphabetic order, etc.

• Arrange elements of a list into certain order


• Make data become easier to access
• Speed up other operations such as searching and merging. Many sorting
algorithms with different time and space complexities

ii) What is Parallel Sorting?


A sequential sorting algorithm may not be efficient enough when we have to
sort a huge volume of data. Therefore, parallel algorithms are used in sorting.

• Based on an existing sequential sort algorithm


– Try to utilize all resources available
– Possible to turn a poor sequential algorithm into a reasonable parallel
algorithm (bubble sort and parallel bubble sort)
• Completely new approach
– New algorithm from scratch
– Harder to develop
– Sometimes yield better solution

Bubble Sort

The idea of bubble sort is to compare two adjacent elements. If they are not in
the right order, switch them. Do this comparing and switching (if necessary) until the
end of the array is reached. Repeat this process from the beginning of the array n
times.

• One of the straight-forward sorting methods


– Cycles through the list
– Compares consecutive elements and swaps them if necessary
– Stops when no out-of-order pair remains
• Slow and inefficient
• Average performance is O(n²)

Bubble Sort Example


Here we want to sort an array containing [8, 5, 1]. The following trace shows
how we can sort this array using bubble sort. Each row gives the current array and the
action taken on the pair under consideration.
8, 5, 1 Switch 8 and 5
5, 8, 1 Switch 8 and 1
5, 1, 8 Reached end start again.
5, 1, 8 Switch 5 and 1
1, 5, 8 No Switch for 5 and 8
1, 5, 8 Reached end start again.
1, 5, 8 No switch for 1, 5
1, 5, 8 No switch for 5, 8
1, 5, 8 Reached end.
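
The trace above can be reproduced with a short sequential sketch. `bubbleSort` is a hypothetical helper name; it repeats passes until a pass makes no swap and returns the number of passes, which is 3 for [8, 5, 1], matching the three "Reached end" rows.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v in ascending order and returns the number of passes over the array.
// A pass with no swap means the array is sorted, so the loop stops there.
int bubbleSort(std::vector<int>& v) {
    int passes = 0;
    bool swapped = true;
    while (swapped) {
        swapped = false;
        ++passes;
        for (std::size_t j = 0; j + 1 < v.size(); ++j) {
            if (v[j] > v[j + 1]) {       // adjacent pair out of order
                std::swap(v[j], v[j + 1]);
                swapped = true;
            }
        }
    }
    return passes;
}
```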

• Implemented as a pipeline.
• Let local_size = n / no_proc. We divide the array in no_proc parts, and each
process executes the bubble sort on its part, including comparing the last
element with the first one belonging to the next thread.
• Implement with the loop (instead of j<i)
for (j=0; j<n-1; j++)
• For every iteration of i, each thread needs to wait until the previous thread
has finished that iteration before starting.
• We'll coordinate using a barrier.

Algorithm for Parallel Bubble Sort


1. For k = 0 to n-2
2.   If k is even then
3.     for i = 0 to (n/2)-1 do in parallel
4.       If A[2i] > A[2i+1] then
5.         Exchange A[2i] ↔ A[2i+1]
6.   Else
7.     for i = 0 to (n/2)-2 do in parallel
8.       If A[2i+1] > A[2i+2] then
9.         Exchange A[2i+1] ↔ A[2i+2]
10. Next k

Parallel Bubble Sort Example 1


• Compare all pairs in the list in parallel
• Alternate between odd and even phases
• Shared flag, sorted, initialized to true at beginning of each iteration (2
phases), if any processor perform swap, sorted = false
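
A minimal sketch of the shared-flag idea, assuming OpenMP is available (the pragma is ignored when the code is compiled without `-fopenmp`, and the result is the same). The name `oddEvenSort` and the per-phase flag `swappedInPhase` are illustrative choices, not part of the algorithm description above; the `reduction(||: ...)` clause is one way to combine each thread's "did I swap?" answer without a data race.

```cpp
#include <utility>
#include <vector>

// Odd-even transposition sort with a shared "sorted" flag.
// One iteration = one even phase followed by one odd phase.
// Returns the number of iterations, including the final one that finds no swap.
int oddEvenSort(std::vector<int>& a) {
    const long n = static_cast<long>(a.size());
    int iterations = 0;
    bool sorted = false;
    while (!sorted) {
        sorted = true;                 // reset at the beginning of each iteration
        ++iterations;
        for (int phase = 0; phase < 2; ++phase) {
            bool swappedInPhase = false;
            // Pairs within one phase are disjoint, so they can be compared in parallel.
            #pragma omp parallel for reduction(||: swappedInPhase)
            for (long j = phase; j < n - 1; j += 2) {
                if (a[j] > a[j + 1]) {
                    std::swap(a[j], a[j + 1]);
                    swappedInPhase = true;
                }
            }
            if (swappedInPhase) sorted = false;
        }
    }
    return iterations;
}
```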

Parallel Bubble Sort Example 2


• How many steps does it take to sort the following sequence from least to
greatest using the Parallel Bubble Sort? How does the sequence look like after
2 cycles?
• Ex: 4,3,1,2
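
One way to check an answer to this exercise is to replay the phases one at a time. The sketch below assumes that a "cycle" means one even phase followed by one odd phase, which the text does not define; `afterPhases` is a hypothetical helper that returns the sequence after a given number of phases, starting with the even phase.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Returns the state of `a` after `phases` phases of odd-even transposition,
// where phase 0 compares pairs (0,1),(2,3),... and phase 1 compares (1,2),(3,4),...
std::vector<int> afterPhases(std::vector<int> a, int phases) {
    for (int k = 0; k < phases; ++k) {
        for (std::size_t j = static_cast<std::size_t>(k % 2); j + 1 < a.size(); j += 2) {
            if (a[j] > a[j + 1]) {
                std::swap(a[j], a[j + 1]);
            }
        }
    }
    return a;
}
```

Under that assumption, 4,3,1,2 becomes 3,1,4,2 after the first cycle and is fully sorted after the second cycle, i.e. after four phases.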

Merge Sort
• Collects sorted list onto one processor
• Merges elements as they come together
• Simple tree structure
• Parallelism is limited when near the root
Theory:
To sort A[p .. r]:
1. Divide Step

If a given array A has zero or one element, simply return; it is already sorted.
Otherwise, split A[p .. r] into two subarrays A[p .. q] and A[q + 1 .. r], each containing
about half of the elements of A[p .. r]. That is, q is the halfway point of A[p .. r].
2. Conquer Step
Conquer by recursively sorting the two subarrays A[p .. q] and A[q + 1 .. r].
3. Combine Step
Combine the elements back in A[p .. r] by merging the two sorted subarrays A[p .. q] and
A[q + 1 .. r] into a sorted sequence. To accomplish this step, we will define a procedure
MERGE (A, p, q, r).
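
The three steps can be sketched sequentially as follows. The names `mergeHalves` (playing the role of MERGE) and `mergeSort` are illustrative, and the use of two temporary vectors is one common choice rather than the only way to merge.

```cpp
#include <cstddef>
#include <vector>

// MERGE(A, p, q, r): merges the sorted runs A[p..q] and A[q+1..r]
// (inclusive bounds) into one sorted run A[p..r].
void mergeHalves(std::vector<int>& A, std::size_t p, std::size_t q, std::size_t r) {
    std::vector<int> left(A.begin() + p, A.begin() + q + 1);
    std::vector<int> right(A.begin() + q + 1, A.begin() + r + 1);
    std::size_t i = 0, j = 0, k = p;
    while (i < left.size() && j < right.size()) {
        // "<=" takes from the left run on ties, which keeps the sort stable.
        A[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    }
    while (i < left.size()) A[k++] = left[i++];
    while (j < right.size()) A[k++] = right[j++];
}

// Sorts A[p..r]: divide at the halfway point q, conquer both halves, combine.
void mergeSort(std::vector<int>& A, std::size_t p, std::size_t r) {
    if (p >= r) return;                   // zero or one element: already sorted
    std::size_t q = p + (r - p) / 2;      // divide
    mergeSort(A, p, q);                   // conquer left half
    mergeSort(A, q + 1, r);               // conquer right half
    mergeHalves(A, p, q, r);              // combine
}
```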

Parallel Merge Sort


• Parallelize processing of sub-problems
• Maximum parallelization is achieved with one processor per node (at each layer/height)
Parallel Merge Sort Example
• Perform Merge Sort on the following list of elements. Given 2 processors,
P0 & P1, which processor is responsible for which comparison?
• 4,3,2,1

Algorithm for Parallel Merge Sort

1. Procedure parallelMergeSort
2. Begin
3.   Create processors Pi where i = 1 to n
4.   if i > 0 then receive size and parent from the root
5.     receive the list, size and parent from the root
6.   endif
7.   midvalue = listsize/2
8.   if both children are present in the tree then
9.     send midvalue, first child
10.    send listsize-mid, second child
11.    send list, midvalue, first child
12.    send list from midvalue, listsize-midvalue, second child
13.    call mergelist(list,0,midvalue,list, midvalue+1,listsize,temp,0,listsize)
14.    store temp in another array list2
15.  else
16.    call parallelMergeSort(list,0,listsize)
17.  endif
18.  if i > 0 then
19.    send list, listsize, parent
20.  endif
21. end

INPUT:
1. Array of integer numbers.
OUTPUT:
1. Sorted array of numbers
FAQ
1. What is sorting?
2. What is parallel sort?
3. How to sort the element using Bubble Sort?
4. How to sort the element using Parallel Bubble Sort?
5. How to sort the element using Parallel Merge Sort?
6. How to sort the element using Merge Sort?
7. What is searching?
8. Different types of searching methods.
9. Time complexities of sorting and searching methods.
10. How to calculate time complexity?
11. What are space complexity of all sorting and searching methods?
12. Explain what is best, worst and average case for each method of
searching and sorting.
ALGORITHM ANALYSIS

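1. Parallel bubble sort (odd-even transposition) with about n/2 processors:
each of the n phases takes O(1) time, so the parallel running time is O(n)
in the worst and average cases. With the shared sorted flag, input that is
already sorted is detected after one iteration (two phases).
2. Parallel merge sort with one processor per tree node: the two halves are
sorted concurrently, but the merge at each node is sequential, so the running
time satisfies T(n) = T(n/2) + O(n) = O(n), compared with O(n log n) for the
sequential algorithm.
3. With a fixed number of threads p, the speed-up over the corresponding
sequential algorithm is bounded by roughly a factor of p.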
APPLICATIONS
1. Representing Linear data structure & Sequential data organization : structure & files
2. For Sorting sequential data structure
CONCLUSION
Thus, we have studied Parallel Bubble and Parallel Merge sort implementation.

Program:

1)Bubble Sort

#include<iostream>
#include<omp.h>
using namespace std;

void bubble(int *, int);
void swap(int &, int &);

void bubble(int *a, int n)
{
    for( int i = 0; i < n; i++ )
    {
        // alternate phases: even i compares pairs (0,1),(2,3),...
        // and odd i compares pairs (1,2),(3,4),...
        int first = i % 2;
        // pairs within one phase are disjoint, so they can be handled in parallel
        #pragma omp parallel for shared(a,first)
        for( int j = first; j < n-1; j += 2 )
        {
            if( a[ j ] > a[ j+1 ] )
            {
                swap(a[j],a[j+1]);
            }
        }
    }
}

void swap(int &a, int &b)
{
    int test;
    test=a;
    a=b;
    b=test;
}

int main()
{
    int *a,n;
    cout<<"\n enter total no of elements=>";
    cin>>n;
    a=new int[n];
    cout<<"\n enter elements=>";
    for(int i=0;i<n;i++)
    {
        cin>>a[i];
    }

    bubble(a,n);

    cout<<"\n sorted array is=>";
    for(int i=0;i<n;i++)
    {
        cout<<a[i]<<"\t";
    }

    delete[] a;
    return 0;
}

Output:

SIT@SIT-ThinkCentre-E73:~$ g++ bubble1.cpp


SIT@SIT-ThinkCentre-E73:~$ ./a.out


enter total no of elements=>4

enter elements=>2
6
8
3

sorted array is=>2 3 6 8

2)Merge Sort

#include<iostream>
#include<vector>
#include<omp.h>
using namespace std;

void mergesort(int a[],int i,int j);
void merge(int a[],int i1,int j1,int i2,int j2);

void mergesort(int a[],int i,int j)
{
    int mid;
    if(i<j)
    {
        mid=i+(j-i)/2;

        // the two halves are independent, so each is sorted in its own section;
        // unless nested parallelism is enabled, the inner regions created by the
        // recursive calls run with a single thread
        #pragma omp parallel sections
        {
            #pragma omp section
            {
                mergesort(a,i,mid);
            }

            #pragma omp section
            {
                mergesort(a,mid+1,j);
            }
        }

        merge(a,i,mid,mid+1,j);
    }
}

// merges the sorted runs a[i1..j1] and a[i2..j2] into a[i1..j2]
void merge(int a[],int i1,int j1,int i2,int j2)
{
    // temporary buffer sized to the merged range (a fixed-size array would
    // overflow for large inputs)
    vector<int> temp(j2-i1+1);
    int i,j,k;
    i=i1;
    j=i2;
    k=0;

    while(i<=j1 && j<=j2)
    {
        if(a[i]<=a[j])
        {
            temp[k++]=a[i++];
        }
        else
        {
            temp[k++]=a[j++];
        }
    }

    while(i<=j1)
    {
        temp[k++]=a[i++];
    }

    while(j<=j2)
    {
        temp[k++]=a[j++];
    }

    for(i=i1,j=0;i<=j2;i++,j++)
    {
        a[i]=temp[j];
    }
}

int main()
{
    int *a,n,i;
    cout<<"\n enter total no of elements=>";
    cin>>n;
    a= new int[n];

    cout<<"\n enter elements=>";
    for(i=0;i<n;i++)
    {
        cin>>a[i];
    }

    mergesort(a, 0, n-1);

    cout<<"\n sorted array is=>";
    for(i=0;i<n;i++)
    {
        cout<<"\n"<<a[i];
    }

    delete[] a;
    return 0;
}

Output:
SIT@SIT-ThinkCentre-E73:~$ g++ mergesort.cpp
SIT@SIT-ThinkCentre-E73:~$ ./a.out

enter total no of elements=>4

enter elements=>2
5
8
1

sorted array is=>


1
2
5
8
SIT@SIT-ThinkCentre-E73:~$

Conclusion:
We designed and implemented parallel versions of Bubble Sort and Merge Sort,
based on the existing sequential algorithms, utilizing all resources available.


Assignment No 3

Aim:
Parallel Search Algorithm-
Design and implement a parallel algorithm, utilizing all resources available, for
• Binary Search for Sorted Array
• Depth-First Search (tree or an undirected graph) OR
• Breadth-First Search (tree or an undirected graph) OR
• Best-First Search (traversal of a graph to reach a target by the
shortest possible path)

Objective: To study and implement searching techniques.

Outcome: Students will understand the implementation of Binary Search, BFS, and DFS.
Pre-requisites:

64-bit Open source Linux or its derivative

Programming Languages: C++/JAVA/PYTHON/R

Theory:

Binary Search:

In computer science, binary search, also known as half-interval search, logarithmic
search, or binary chop, is a search algorithm that finds the position of a
target value within a sorted array. Binary search compares the target value to the middle
element of the array. If they are not equal, the half in which the target cannot lie is
eliminated and the search continues on the remaining half, again taking the middle
element to compare to the target value, and repeating this until the target value is found.
If the search ends with the remaining half being empty, the target is not in the array.
Even though the idea is simple, implementing binary search correctly requires attention
to some subtleties about its exit conditions and midpoint calculation.

Binary search runs in logarithmic time in the worst case, making O(log n)
comparisons, where n is the number of elements in the array, the O is Big O notation,
and log is the logarithm. Binary search takes constant (O(1)) space, meaning that
the space taken by the algorithm is the same for any number of elements in the
array. Binary search is faster than linear search except for small arrays, but the
array must be sorted first. Although specialized data structures designed for fast
searching, such as hash tables, can be searched more efficiently, binary search
applies to a wider range of problems.
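
The two subtleties mentioned above, the exit condition and the midpoint calculation, can be seen in a minimal sequential sketch. `binarySearch` is an illustrative name; the loop runs while `low <= high`, and the midpoint is computed as `low + (high - low) / 2` so that the sum of two large indices cannot overflow.

```cpp
#include <vector>

// Returns the index of key in the sorted vector a, or -1 if key is absent.
int binarySearch(const std::vector<int>& a, int key) {
    int low = 0;
    int high = static_cast<int>(a.size()) - 1;
    while (low <= high) {                  // an empty range means "not found"
        int mid = low + (high - low) / 2;  // overflow-safe midpoint
        if (a[mid] == key) return mid;
        if (a[mid] < key) {
            low = mid + 1;                 // key can only be in the upper half
        } else {
            high = mid - 1;                // key can only be in the lower half
        }
    }
    return -1;
}
```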
How Binary Search Works?
For a binary search to work, it is mandatory for the target array to be sorted. We
shall learn the process of binary search with a pictorial example. The following
is our sorted array and let us assume that we need to search the location of value
31 using binary search.

First, we shall determine half of the array by using this formula −


mid = low + (high - low) / 2

Here it is, 0 + (9 - 0 ) / 2 = 4 (integer value of 4.5). So, 4 is the mid of the array.

Now we compare the value stored at location 4 with the value being searched,
i.e. 31. We find that the value at location 4 is 27, which is not a match. As the
target value is greater than 27 and the array is sorted, we know that the
target value must be in the upper portion of the array.

We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2

Our new mid is 7 now. We compare the value stored at location 7 with our
target value 31.


The value stored at location 7 is not a match, rather it is more than what we are
looking for. So, the value must be in the lower part from this location.

Hence, we calculate the mid again. This time it is 5.

We compare the value stored at location 5 with our target value. We find
that it is a match.

We conclude that the target value 31 is stored at location 5.


Binary search halves the number of searchable items at every step and thus reduces
the number of comparisons to be made to a very small count.
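
The sequence of midpoints in this walk-through (4, then 7, then 5) can be reproduced with a short trace. The array used in the check is an assumption: the original figure is not available, so a ten-element sorted array was chosen that agrees with the values quoted in the text (27 at location 4, 31 at location 5, and a larger value at location 7). `probeSequence` is a hypothetical helper name.

```cpp
#include <vector>

// Returns the list of mid positions that binary search inspects
// while looking for key in the sorted vector a.
std::vector<int> probeSequence(const std::vector<int>& a, int key) {
    std::vector<int> mids;
    int low = 0;
    int high = static_cast<int>(a.size()) - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;
        mids.push_back(mid);
        if (a[mid] == key) break;          // match found
        if (a[mid] < key) low = mid + 1;   // continue in the upper portion
        else high = mid - 1;               // continue in the lower portion
    }
    return mids;
}
```

For the assumed array {10, 14, 19, 26, 27, 31, 33, 35, 42, 44} and key 31, the helper returns the positions 4, 7, 5.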

Breadth-First Search :

Graph traversals

Graph traversal means visiting every vertex and edge exactly once in a well-defined
order. While using certain graph algorithms, you must ensure that each vertex of the
graph is visited exactly once. The order in which the vertices are visited is important
and may depend upon the algorithm or question that you are solving.

During a traversal, it is important that you track which vertices have been visited.
The most common way of tracking vertices is to mark them.

Breadth First Search (BFS)


There are many ways to traverse graphs. BFS is the most commonly used approach.


BFS is a traversing algorithm where you should start traversing from a selected node
(source or starting node) and traverse the graph layerwise thus exploring the
neighbour nodes (nodes which are directly connected to source node). You must then
move towards the next-level neighbour nodes.
As the name BFS suggests, you are required to traverse the graph breadthwise as follows:
1. First move horizontally and visit all the nodes of the current layer
2. Move to the next layer
Consider the following diagram.

The nodes in layer 1 are closer to the source node than the nodes in layer 2.
Therefore, in BFS, you must traverse all the nodes in
layer 1 before you move to the nodes in layer 2.

Traversing child nodes


A graph can contain cycles, which may bring you to the same node again while
traversing the graph. To avoid processing of same node again, use a boolean array
which marks the node after it is processed. While visiting the nodes in the layer of a
graph, store them in a manner such that you can traverse the corresponding child
nodes in a similar order.
In the earlier diagram, start traversing from 0 and visit its child nodes 1, 2, and 3.
Store them in the order in which they are visited. This will allow you to visit the
child nodes of 1 first (i.e. 4 and 5), then of 2 (i.e. 6 and 7), and then of 3 (i.e. 7) etc.
To make this process easy, use a queue to store each node and mark it as 'visited'
when it is enqueued, so that no node enters the queue twice. The queue
follows the First In First Out (FIFO) queuing method, and therefore the neighbours of
a node will be visited in the order in which they were inserted in the queue, i.e. the
node that was inserted first will be visited first, and so on.
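
A minimal sequential sketch of this queue-based traversal follows. The function name `bfsOrder` and the adjacency-list representation are illustrative assumptions; the boolean `visited` array is the marking described above.

```cpp
#include <queue>
#include <vector>

// Returns the vertices reachable from source in BFS (layer-by-layer) order.
// adj[u] lists the neighbours of vertex u.
std::vector<int> bfsOrder(const std::vector<std::vector<int>>& adj, int source) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    std::queue<int> q;
    visited[source] = true;     // mark when enqueued so no node is queued twice
    q.push(source);
    while (!q.empty()) {
        int u = q.front();      // FIFO: the node inserted first is visited first
        q.pop();
        order.push_back(u);
        for (int v : adj[u]) {
            if (!visited[v]) {
                visited[v] = true;
                q.push(v);
            }
        }
    }
    return order;
}
```

For a graph with edges taken from the description above (0 joined to 1, 2, 3; 1 to 4 and 5; 2 to 6 and 7; 3 to 7), starting at 0 yields the order 0, 1, 2, 3, 4, 5, 6, 7; node 7 is reached through 2 and is not queued again when 3 is processed.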


Program:
Binary Search:

#include<iostream>
#include<omp.h>
using namespace std;

int search_range(int *, int, int, int);
int binary(int *, int, int, int);

// standard binary search on a[low..high]; returns the index of key or -1
int search_range(int *a, int low, int high, int key)
{
    while(low<=high)
    {
        int mid=low+(high-low)/2;

        if(key==a[mid])
        {
            return mid;
        }
        else if(key>a[mid])
        {
            low=mid+1;
        }
        else
        {
            high=mid-1;
        }
    }
    return -1;
}

// splits a[low..high] into two halves and searches them in parallel sections
int binary(int *a, int low, int high, int key)
{
    int mid=low+(high-low)/2;
    // each section writes its own result variable, so there is no data race
    int loc1=-1,loc2=-1;

    #pragma omp parallel sections
    {
        #pragma omp section
        {
            loc1=search_range(a,low,mid,key);
        }

        #pragma omp section
        {
            loc2=search_range(a,mid+1,high,key);
        }
    }

    return (loc1!=-1)?loc1:loc2;
}

int main()
{
    int *a,i,n,key,loc=-1;
    cout<<"\n enter total no of elements=>";
    cin>>n;
    a=new int[n];

    cout<<"\n enter elements=>";
    for(i=0;i<n;i++)
    {
        cin>>a[i];
    }

    cout<<"\n enter key to find=>";
    cin>>key;

    loc=binary(a,0,n-1,key);

    if(loc==-1)
        cout<<"\n Key not found.";
    else
        cout<<"\n Key found at position=>"<<loc+1;

    delete[] a;
    return 0;
}

Output:


[SIT@localhost ~]$ g++ brfs.cpp -fopevi binarysearch.cpp


[SIT@localhost ~]$ g++ binarysearch.cpp -fopenmp
[SIT@localhost ~]$ ./a.out

enter total no of elements=>6

enter elements=>10 45 63 78 90 230

enter key to find=>65

2) Breadth-First Search

#include<iostream>
#include<stdlib.h>
#include<queue>
using namespace std;

class node
{
public:

node *left, *right;


int data;

};

class Breadthfs
{

public:

node *insert(node *, int);


void bfs(node *);

};

node *insert(node *root, int data)


{

SIT, LONAVALA 35
LABORATORY PRACTICE-I BE COMPUTER

if(!root)
{

root=new node;
root->left=NULL;
root->right=NULL;
root->data=data;
return root;
}

queue<node *> q;
q.push(root);

while(!q.empty())
{

node *temp=q.front();
q.pop();

if(temp->left==NULL)
{

temp->left=new node;
temp->left->left=NULL;
temp->left->right=NULL;
temp->left->data=data;
return root;
}
else
{

q.push(temp->left);

if(temp->right==NULL)
{

temp->right=new node;
temp->right->left=NULL;
temp->right->right=NULL;
temp->right->data=data;
return root;

}
else
{

q.push(temp->right);

}
}
}
return root;
}

void bfs(node *head)


{

queue<node*> q;
q.push(head);

int qSize;

while (!q.empty())
{
qSize = q.size();
#pragma omp parallel for
for (int i = 0; i < qSize; i++)
{
node* currNode;
#pragma omp critical
{
currNode = q.front();
q.pop();
cout<<"\t"<<currNode->data;

}
#pragma omp critical
{
if(currNode->left)
q.push(currNode->left);
if(currNode->right)
q.push(currNode->right);
}

}
}

}

int main(){

node *root=NULL;
int data;
char ans;

do
{
cout<<"\n enter data=>";
cin>>data;

root=insert(root,data);

cout<<"do you want insert one more node?";


cin>>ans;

}while(ans=='y'||ans=='Y');

bfs(root);

return 0;
}
[SIT@localhost ~]$ vi brfs.cpp
[SIT@localhost ~]$ g++ brfs.cpp -fopenmp
[SIT@localhost ~]$ ./a.out

enter data=>10
do you want insert one more node?y

enter data=>5
do you want insert one more node?y

enter data=>15
do you want insert one more node?y

enter data=>25
do you want insert one more node?y

enter data=>20
do you want insert one more node?n
10 5 15 25 20

Conclusion: We have implemented parallel binary search and parallel breadth-first search using OpenMP.

Assignment No 4

Aim:
Parallel Implementation of the K Nearest Neighbors Classifier

Objective:
To implement a parallel version of the K Nearest Neighbors classifier.

Theory:
In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method
used for classification and regression.[1] In both cases, the input consists of the k closest
training examples in the feature space. The output depends on whether k-NN is used for
classification or regression:

• In k-NN classification, the output is a class membership. An object is classified
by a majority vote of its neighbors, with the object being assigned to the class
most common among its k nearest neighbors (k is a positive integer, typically
small). If k = 1, then the object is simply assigned to the class of that single
nearest neighbor.

• In k-NN regression, the output is the property value for the object. This value is
the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only
approximated locally and all computation is deferred until classification. The k-NN
algorithm is among the simplest of all machine learning algorithms.
For both classification and regression, it can be useful to assign weights to the
contributions of the neighbors, so that the nearer neighbors contribute more to
the average than the more distant ones. For example, a common weighting scheme
consists of giving each neighbor a weight of 1/d, where d is the distance to the
neighbor.
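
The voting rule described above can be sketched in a few lines of sequential C++. This is only an illustration under stated assumptions, not the MPI program of this assignment: the names Sample, euclidean and knnClassify are hypothetical, and the small constant added to d is an assumed guard against division by zero.

```cpp
#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

// Hypothetical training sample: a feature vector plus an integer class label.
struct Sample {
    std::vector<double> features;
    int label;
};

// Euclidean distance between two feature vectors of equal length.
double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += (a[i] - b[i]) * (a[i] - b[i]);
    }
    return std::sqrt(sum);
}

// Classify `query` by a vote among its k nearest training samples.
// With weighted == true each neighbor votes with weight 1/d, otherwise with weight 1.
int knnClassify(const std::vector<Sample>& train, const std::vector<double>& query,
                std::size_t k, bool weighted) {
    std::vector<std::pair<double, int>> byDistance;  // (distance, label)
    for (const Sample& s : train) {
        byDistance.push_back({euclidean(s.features, query), s.label});
    }
    k = std::min(k, byDistance.size());
    // Only the k smallest distances need to be in sorted order.
    std::partial_sort(byDistance.begin(), byDistance.begin() + k, byDistance.end());

    std::map<int, double> votes;  // label -> accumulated vote
    for (std::size_t i = 0; i < k; ++i) {
        double d = byDistance[i].first;
        // Assumed epsilon: avoids division by zero when the query equals a training point.
        votes[byDistance[i].second] += weighted ? 1.0 / (d + 1e-9) : 1.0;
    }
    int best = -1;
    double bestVote = -1.0;
    for (const auto& v : votes) {
        if (v.second > bestVote) { bestVote = v.second; best = v.first; }
    }
    return best;
}
```

With one class-1 point at 0.0 and two class-2 points at 1.0 and 1.1, a query at 0.2 with k = 3 is assigned class 2 by the plain majority vote but class 1 by the 1/d weighted vote, because the single near neighbor outweighs the two distant ones.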

Program:

#include <iostream>
#include <vector>
#include <fstream>
#include <string>
#include <sstream>
#include <cmath>
#include <set>
#include <map>
#include <ctime>
#include <mpi.h>

using namespace std;

class Instance{
private:
double R;
double G;
double B;
double isSkin;

public:
Instance(double R, double G, double B, int isSkin){
this->R = R;
this->G = G;
this->B = B;
this->isSkin = isSkin;
}

void setR(double R){
this->R = R;
}

void setG(double G){


this->G = G;
}

void setB(double B){


this->B = B;
}
double getR(){
return R;
}

double getG(){
return G;
}

double getB(){
return B;
}

int skin(){
return isSkin;
}
//Calculate Euclidean distance
double calculateDistance(double otherR, double otherG, double otherB){
return sqrt((R - otherR) * (R - otherR) + (G - otherG) * (G - otherG) +
(B - otherB) * (B - otherB));
}
};

class TestInstance{
private:
double R;
double G;
double B;

public:
TestInstance(double R, double G, double B){
this->R = R;
this->G = G;
this->B = B;
}

void setR(double R){


this->R = R;
}

void setG(double G){


this->G = G;
}

void setB(double B){


this->B = B;
}
double getR(){
return R;
}

double getG(){
return G;
}

double getB(){
return B;
}

};

vector<string> split(string a,char e){


vector<string> rez;
string cur;
for(int ctr1=0;ctr1<a.size();ctr1++){
if(a[ctr1]!=e)
cur.push_back(a[ctr1]);
else
rez.push_back(cur),cur.clear();
}
if(cur!="")
rez.push_back(cur);

return rez;
}

vector<Instance> instances;

int k;

//returns the class value (1 or 2) of the first class that collects k neighbours
//when the training instances are visited in order of increasing distance
int returnClassForObject(double r, double g, double b, set<double> distances,
map<double, int> distanceToClass){

for(size_t i = 0; i < instances.size(); i++){
double distance = instances[i].calculateDistance(r, g, b);
distances.insert(distance);
distanceToClass.insert(std::pair<double, int>(distance,
instances[i].skin()));
}

int countFirstClass = 0;
int countSecondClass = 0;

set<double>::iterator it = distances.begin();

//also stop at the end of the set, so the iterator is never advanced past it
while(it != distances.end() && countFirstClass != k && countSecondClass != k){
if(distanceToClass.find(*it) -> second == 1){
countFirstClass++;
}
else if(distanceToClass.find(*it) -> second == 2){
countSecondClass++;
}
it++;
}

//if neither class reached k (fewer than k distinct distances), the larger count wins
return (countFirstClass >= countSecondClass) ? 1 : 2;
}

int main(int argc, char **argv)


{
MPI_Init(NULL, NULL); //initialize MPI execution environment

int world_size;
int rank;

//Determines the size of the group associated with a communicator

MPI_Comm_size(MPI_COMM_WORLD, &world_size); //

//determines the rank of calling process in the communicator


MPI_Comm_rank(MPI_COMM_WORLD, &rank);

//world_size = number of instances (e.g. 20 in this example)


/*Multiplied by 3 because training dataset contains 3 column data + 1 column
class label*/
//MPI_Request and MPI_Status are the data types
MPI_Request requests[(world_size - 1) * 3];
MPI_Status statuses[(world_size - 1) * 3];

string line;
ifstream myfile("training.txt");

//init
if (myfile.is_open())
{
while (getline(myfile,line))
{
vector<string> parts = split(line, ' ');

//string to double conversion, supported by C++11

Instance instance(std::stod(parts[0]), std::stod(parts[1]),


std::stod(parts[2]), std::stod(parts[3]));

//push back constructed object


instances.push_back(instance);
}
myfile.close();
}

/*The training set: https://archive.ics.uci.edu/ml/datasets/Skin+Segmentation#


This dataset is of the dimension 245057 * 4 where first three columns are B,G,R
(x1,x2, and x3 features) values and fourth column is of the class labels (decision
variable y)*/

//find min and max


double minR = instances[0].getR();
double maxR = instances[0].getR();

double minG = instances[0].getG();


double maxG = instances[0].getG();

double minB = instances[0].getB();


double maxB = instances[0].getB();

for(int i = 0; i < instances.size(); i++){


if(instances[i].getR() > maxR){
maxR = instances[i].getR();
}
else if(instances[i].getR() < minR){
minR = instances[i].getR();
}

if(instances[i].getG() > maxG){


maxG = instances[i].getG();
}
else if(instances[i].getG() < minG){
minG = instances[i].getG();
}

if(instances[i].getB() > maxB){


maxB = instances[i].getB();
}
else if(instances[i].getB() < minB){
minB = instances[i].getB();
}
}

//standardization or normalization of training dataset


for(int i = 0; i < instances.size(); i++){

double curr = instances[i].getR();


double res = (curr - minR) / (maxR - minR);
instances[i].setR(res);

curr = instances[i].getG();
res = (curr - minG) / (maxG - minG);
instances[i].setG(res);

curr = instances[i].getB();
res = (curr - minB) / (maxB - minB);
instances[i].setB(res);
}

//setting k = sqrt(number of training instances)


k = sqrt(instances.size());

ifstream new_file("test.txt");
string new_line;

vector<TestInstance>test_instances;

double start, end;

//if Process 0
if(rank == 0) {
if (new_file.is_open())
{
while (getline(new_file,new_line))
{

vector<string> parts = split(new_line, ' ');

double r = std::stod(parts[0]);
double g = std::stod(parts[1]);
double b = std::stod(parts[2]);

r = (r - minR) / (maxR - minR);


g = (g - minG) / (maxG - minG);

b = (b - minB) / (maxB - minB);

TestInstance new_instance(r, g, b);


test_instances.push_back(new_instance);

}
new_file.close();
}
//Get current system time
start = MPI_Wtime();

//send one test instance to every other rank; the blocking MPI_Send keeps the
//local buffers r, g and b valid until the data has been handed over to MPI
for(int i = 1; i < (int)test_instances.size() && i < world_size; i++){
double r = test_instances[i].getR();
MPI_Send(&r, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
double g = test_instances[i].getG();
MPI_Send(&g, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
double b = test_instances[i].getB();
MPI_Send(&b, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
}

double r = test_instances[0].getR();
double g = test_instances[0].getG();
double b = test_instances[0].getB();
map<double, int> distanceToClass;
set<double> distances;
int class_predicted = returnClassForObject(r, g, b, distances,
distanceToClass);
printf("Class for %d object is: %d\n", rank + 1, class_predicted);
}
else{
double r;
double g;
double b;
MPI_Irecv(&r, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, requests +
rank + 1);

MPI_Irecv(&g, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, requests +


rank + 2);
MPI_Irecv(&b, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, requests +
rank + 3);

MPI_Wait(requests + rank + 1, statuses + rank + 1);


MPI_Wait(requests + rank + 2, statuses + rank + 2);
MPI_Wait(requests + rank + 3, statuses + rank + 3);

map<double, int> distanceToClass;


set<double> distances;
int class_predicted = returnClassForObject(r, g, b, distances,
distanceToClass);
printf("Class for %d object is: %d\n", rank + 1, class_predicted);
}
MPI_Barrier(MPI_COMM_WORLD);
if(rank == 0){
end = MPI_Wtime();
printf("Elapsed time: %.2f seconds.\n", (end - start));
}
MPI_Finalize();
return 0;
}

/* ---------------------- OUTPUT ------------------------


apr@C04L0818:~/knn$ mpicxx -o knn-mpi-1 knn-mpi-1.cpp -std=c++11
apr@C04L0818:~/knn$ mpirun -np 20 knn-mpi-1
Class for 2 object is: 2
Class for 3 object is: 2
Class for 5 object is: 1
Class for 6 object is: 2
Class for 8 object is: 1
Class for 4 object is: 1
Class for 10 object is: 1
Class for 13 object is: 1
Class for 9 object is: 1
Class for 7 object is: 1
Class for 11 object is: 1
Class for 12 object is: 1
Class for 14 object is: 1
Class for 16 object is: 1

Class for 18 object is: 1


Class for 1 object is: 2
Class for 15 object is: 2
Class for 17 object is: 1
Class for 20 object is: 2
Class for 19 object is: 1
Elapsed time: 0.00 seconds.
apr@C04L0818:~/knn$ */

Conclusion:
We have implemented a parallel K Nearest Neighbors classifier using MPI.

ARTIFICIAL INTELLIGENCE

Assignment No 1

Aim:

Solve 8-puzzle problem using A* algorithm. Assume any initial configuration
and define goal configuration clearly.

Objective:

Student will learn:

i) The basic concepts of A*: evaluation function, path cost, heuristic function,
calculation of heuristic function

ii) General structure of eight puzzle problem.

iii) Logic of A star implementation for eight puzzle problem.

Theory:

Introduction:

In computer science, A* (pronounced "A star") is an algorithm that is widely used in
path finding and graph traversal, the process of plotting an efficiently traversable
path between multiple points, called nodes. The A* algorithm combines features of
uniform-cost search and pure heuristic search to efficiently compute optimal solutions.

A* algorithm is a best-first search algorithm in which the cost associated with a node is
f(n) = g(n) + h(n), where g(n) is the cost of the path from the initial state to node n and
h(n) is the heuristic estimate of the cost of a path from node n to a goal.

Thus, f(n) estimates the lowest total cost of any solution path going through node n. At
each point a node with lowest f value is chosen for expansion. Ties among nodes of equal
f value should be broken in favor of nodes with lower h values. The algorithm terminates
when a goal is chosen for expansion.
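
The ordering rule above (lowest f first, ties broken by lower h) can be expressed as a comparator for a standard priority queue. This is a minimal sketch; SearchNode, ExpandLater and OpenList are hypothetical names introduced only for illustration.

```cpp
#include <queue>
#include <vector>

// Hypothetical search node: only the costs needed to order the OPEN list.
struct SearchNode {
    int g;  // cost of the path from the initial state to this node
    int h;  // heuristic estimate of the remaining cost to a goal
    int f() const { return g + h; }
};

// Returns true when `a` should be expanded AFTER `b`.
// std::priority_queue pops the "largest" element, so the comparison is inverted:
// lower f comes out first and, among equal f values, lower h comes out first.
struct ExpandLater {
    bool operator()(const SearchNode& a, const SearchNode& b) const {
        if (a.f() != b.f()) return a.f() > b.f();
        return a.h > b.h;
    }
};

using OpenList = std::priority_queue<SearchNode, std::vector<SearchNode>, ExpandLater>;
```

Pushing nodes with (g, h) = (2, 4), (5, 1) and (1, 3) into an OpenList yields (1, 3) first because its f value of 4 is lowest, then (5, 1), which ties with (2, 4) at f = 6 but has the lower h.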

A* is guaranteed to find an optimal path to a goal if the heuristic function h(n) is
admissible, meaning it never overestimates the actual cost. For example, airline distance
never overestimates actual highway distance, and Manhattan distance never overestimates
the actual number of moves in a sliding-tile puzzle.
Using such evaluation functions, A* can find optimal solutions to these problems. In
addition, A* makes the most efficient use of the given heuristic function in the following
sense: among all shortest-path algorithms using the given heuristic function h(n), A*
expands the fewest nodes.

The main drawback of the A* algorithm, and indeed of any best-first search, is its memory
requirement. Since at least the entire open list must be saved, A* is severely
space-limited in practice, and is no more practical than best-first search on
current machines. For example, while it can be run successfully on the eight puzzle, it
exhausts available memory in a matter of minutes on the fifteen puzzle.

In short, A* is a very good search method, but it suffers from space complexity problems.

To implement such a graph-search procedure, we will need to use two lists of nodes:

1) OPEN: nodes that have been generated and have had the heuristic function applied to
them but which have not yet been examined (i.e., had their successors generated). OPEN
is actually a priority queue in which the elements with the highest priority are those with
the most promising value of the heuristic function.

2) CLOSED: nodes that have already been examined. We need to keep these nodes in
memory if we want to search a graph rather than a tree, since whenever a node is generated,
we need to check whether it has been generated before.

A* Algorithm:

1. Put the start node s on OPEN.

2. If OPEN is empty, exit with failure.

3. Remove from OPEN and place on CLOSED a node n having minimum f.

4. If n is a goal node, exit successfully with a solution path obtained by tracing
back the pointers from n to s.

5. Otherwise, expand n, generating its children and directing pointers from each child
node to n. For every child node n’ do:

• evaluate h(n’) and compute f(n’) = g(n’) + h(n’) = g(n) + c(n,n’) + h(n’)

• if n’ is already on OPEN or CLOSED, compare its new f with the old f and attach
the lowest f to n’

• put n’ with its f value in the right order in OPEN

6. Go to step 2.
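
The six steps can be sketched in C++ for the 8-puzzle as follows. This is an illustrative sketch, not the Java program of this assignment. It assumes boards are encoded as 9-character strings in row-major order with '0' for the blank, uses Manhattan distance as h and a cost of 1 per move, and the names manhattan and aStar are hypothetical.

```cpp
#include <cstdlib>
#include <functional>
#include <queue>
#include <string>
#include <tuple>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Manhattan distance: sum over tiles of |row - goalRow| + |col - goalCol|.
int manhattan(const std::string& s, const std::string& goal) {
    int h = 0;
    for (int i = 0; i < 9; ++i) {
        if (s[i] == '0') continue;  // the blank is not a tile
        int j = static_cast<int>(goal.find(s[i]));
        h += std::abs(i / 3 - j / 3) + std::abs(i % 3 - j % 3);
    }
    return h;
}

// Returns the number of moves in a shortest solution, or -1 if the goal is unreachable.
int aStar(const std::string& start, const std::string& goal) {
    using Entry = std::tuple<int, int, std::string>;  // (f, g, state)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> open;
    std::unordered_map<std::string, int> bestG;  // cheapest g found so far per state
    std::unordered_set<std::string> closed;

    bestG[start] = 0;
    open.push(Entry(manhattan(start, goal), 0, start));      // step 1
    const int dRow[4] = {-1, 1, 0, 0};
    const int dCol[4] = {0, 0, -1, 1};

    while (!open.empty()) {                                  // step 2
        Entry top = open.top();                              // step 3: minimum f
        open.pop();
        int g = std::get<1>(top);
        std::string n = std::get<2>(top);
        if (n == goal) return g;                             // step 4
        if (!closed.insert(n).second) continue;              // outdated duplicate entry
        int blank = static_cast<int>(n.find('0'));
        for (int m = 0; m < 4; ++m) {                        // step 5: expand n
            int r = blank / 3 + dRow[m];
            int c = blank % 3 + dCol[m];
            if (r < 0 || r > 2 || c < 0 || c > 2) continue;
            std::string child = n;
            std::swap(child[blank], child[r * 3 + c]);
            int childG = g + 1;                              // c(n, n') = 1 per move
            auto it = bestG.find(child);
            if (it != bestG.end() && it->second <= childG) continue;  // keep the lower f
            bestG[child] = childG;
            open.push(Entry(childG + manhattan(child, goal), childG, child));
        }
    }                                                        // step 6: loop to step 2
    return -1;                                               // OPEN is empty: failure
}
```

Instead of updating an entry that is already on OPEN, the sketch pushes a second entry with the better f and skips the outdated one when it is popped; this is a common simplification of step 5, chosen here because std::priority_queue has no decrease-key operation.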

Example of calculation of heuristic values for 8-puzzle problem:

• h1(n) = number of misplaced tiles

• h2(n) = sum of the distances (in squares) of the tiles from their desired locations

For the start state S:

• h1(S) = 8

• h2(S) = 3+1+2+2+2+3+3+2 = 18
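
The two heuristics can be written as short C++ functions. The board S itself does not appear in the text, so the usage note below assumes a commonly used textbook start state that reproduces the values 8 and 18; the string encoding ('0' for the blank, row-major order) and the function names are likewise assumptions made for this sketch.

```cpp
#include <cstdlib>
#include <string>

// h1: number of tiles (the blank '0' excluded) that are not on their goal square.
int misplacedTiles(const std::string& state, const std::string& goal) {
    int count = 0;
    for (int i = 0; i < 9; ++i) {
        if (state[i] != '0' && state[i] != goal[i]) ++count;
    }
    return count;
}

// h2: sum over all tiles of the vertical plus horizontal distance to the goal square.
int manhattanDistance(const std::string& state, const std::string& goal) {
    int total = 0;
    for (int i = 0; i < 9; ++i) {
        if (state[i] == '0') continue;
        int j = static_cast<int>(goal.find(state[i]));  // goal position of this tile
        total += std::abs(i / 3 - j / 3) + std::abs(i % 3 - j % 3);
    }
    return total;
}
```

For the assumed start state "724506831" (rows 7 2 4, 5 blank 6, 8 3 1) and goal "012345678", misplacedTiles returns 8 and manhattanDistance returns 18, with tiles 1 to 8 contributing 3, 1, 2, 2, 2, 3, 3 and 2 respectively.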

Implementation logic for 8-puzzle problem using A* algorithm

f(n)=g(n)+h(n)

Where, f(n) is evaluation function

g(n) is path cost

h(n) is heuristic function

A* is commonly used for path finding in applications such as games, but was originally
designed as a general graph traversal algorithm.

Program:

PuzzelBoard.java

package ai_practical.assno3;

import java.util.Scanner;
import javax.swing.JOptionPane;

public class PuzzelBoard {


private String board[][];
private int blankX,blankY;

public PuzzelBoard()
{
this.board = new String[3][3];
}

public PuzzelBoard(PuzzelBoard b)
{
// copy the tiles into a new array so that the two boards do not share state
this.board = new String[3][3];
this.setBoard(b.board);
this.blankX = b.blankX;
this.blankY = b.blankY;
}

public void initBoard()


{
Scanner inp = new Scanner(System.in);
System.out.println("\nEnter one tile as '-' ie. Blank tile\n");
for(int i=0; i<3; i++)
{
for(int j=0; j<3; j++)
{
board[i][j] = JOptionPane.showInputDialog("Enter the value of tile ["+i+"]["+(j)+"] : ");

if(board[i][j].equals("-"))
{
blankX=i;
blankY=j;
}
}
}
}

public String[][] getBoard()


{
return board;
}

public void setBoard(String[][] board)


{
for(int i=0; i<3; i++)
{
for(int j=0; j<3; j++)
{
this.board[i][j] = board[i][j];
}
}
}

public int getBlankX()


{
return blankX;
}

public int getBlankY()


{
return blankY;
}

public void setBlankX(int x)


{
blankX = x;
}

public void setBlankY(int y)


{
blankY = y;
}

public void display()


{
for(int i=0; i<3; i++)
{
for(int j=0; j<3; j++)
{
System.out.print("\t"+board[i][j]);
}
System.out.println();
}
}

public PuzzelBoard nextMove(int gn, PuzzelBoard goal)


{
PuzzelBoard temp = new PuzzelBoard();
PuzzelBoard next = new PuzzelBoard();
int minFn = 999;
System.out.println("\nPossible moves are : ");
if(blankY>0)
{
temp.setBoard(board);
temp.swap(blankX, blankY, blankX, blankY-1);
int fn = (temp.getHn(goal)+gn);

System.out.println("\nFor Fn = "+fn+" : ");


temp.display();
if(fn < minFn)
{
minFn = fn;
next.setBoard(temp.board);
next.setBlankX(blankX);
next.setBlankY(blankY-1);
}

}
if(blankY<2)
{
temp.setBoard(board);
temp.swap(blankX, blankY, blankX, blankY+1);
int fn = (temp.getHn(goal)+gn);
System.out.println("\nFor Fn = "+fn+" : ");
temp.display();
if(fn < minFn)
{
minFn = fn;
next.setBoard(temp.board);
next.setBlankX(blankX);
next.setBlankY(blankY+1);
}

}
if(blankX>0)
{
temp.setBoard(board);
temp.swap(blankX, blankY, blankX-1, blankY);
int fn = (temp.getHn(goal)+gn);
System.out.println("\nFor Fn = "+fn+" : ");
temp.display();
if(fn < minFn)
{
minFn = fn;
next.setBoard(temp.board);
next.setBlankX(blankX-1);
next.setBlankY(blankY);
}

}

if(blankX<2)
{
temp.setBoard(board);
temp.swap(blankX, blankY, blankX+1, blankY);
int fn = (temp.getHn(goal)+gn);
System.out.println("\nFor Fn = "+fn+" : ");
temp.display();
if(fn < minFn)
{
minFn = fn;
next.setBoard(temp.board);
next.setBlankX(blankX+1);
next.setBlankY(blankY);
}

}
return next;
}

public void swap(int i1, int j1, int i2, int j2)
{
String temp = board[i1][j1];
board[i1][j1] = board[i2][j2];
board[i2][j2] = temp;
}

public boolean equals(PuzzelBoard b)


{
for(int i=0; i<3; i++)
{
for(int j=0; j<3; j++)
{
if(!this.board[i][j].equals(b.board[i][j]))
{
return false;
}
}

}
return true;
}

public int getHn(PuzzelBoard goal)


{
int hn = 0;
for(int i=0; i<3; i++)
{
for(int j=0; j<3; j++)
{
if(!this.board[i][j].equals(goal.board[i][j]))
{
hn++;
}
}

}
return hn;
}
}

Output:

run:

Enter start Board :

Enter one tile as '-' ie. Blank tile

The given start board is :


a b c
d - f
g e h

Enter goal Board :

Enter one tile as '-' ie. Blank tile

The given goal board is :


a b c
d e f
g h -

The board is solved as :

Board after 0 moves :


a b c
d - f
g e h

Possible moves are :

For Fn = 5 :
a b c
- d f
g e h

For Fn = 5 :
a b c
d f -
g e h

For Fn = 5 :
a - c
d b f
g e h

For Fn = 3 :
a b c
d e f
g - h

Board after 1 moves :


a b c
d e f
g - h

Possible moves are :

For Fn = 5 :
a b c
d e f
- g h

For Fn = 2 :

a b c
d e f
g h -

For Fn = 5 :
a b c
d - f
g e h

Board after 2 moves :


a b c
d e f
g h -

Goal state achieved.

Conclusion: A star algorithm is implemented for eight puzzle problem

Assignment No 2

Aim:
Implement any one of the following Expert Systems:
• Medical Diagnosis of 10 diseases based on adequate symptoms
• Identifying birds of India based on characteristics

Software Requirements:
SWI-Prolog for Windows, Editor.

Theory:

An expert system is a system that uses human expertise to make complicated decisions. It
simulates reasoning by applying knowledge and inferences, uses the expert’s knowledge as
rules and data within the system, and models the problem-solving ability of a human expert.

Components of an ES:

1. Knowledge Base
i. Represents all the data and information input by experts in the field.
ii. Stores the data as a set of rules that the system must follow to
make decisions.
2. Reasoning or Inference Engine
i. Asks the user questions about what they are looking for.
ii. Applies the knowledge and the rules held in the knowledge base.
iii. Appropriately uses this information to arrive at a decision.
3. User Interface
i. Allows the expert system and the user to communicate.
ii. Finds out what it is that the system needs to answer.
iii. Sends the user questions or answers and receives their response.
4. Explanation Facility
i. Explains the system's reasoning and justifies its conclusions.
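
The split between a knowledge base and an inference engine can be sketched in C++ as follows. This is only an illustration of the idea; the assignment itself is written in Prolog, and the names Rule and infer are hypothetical.

```cpp
#include <set>
#include <string>
#include <vector>

// Knowledge base entry: a conclusion and the facts that must all hold for it.
struct Rule {
    std::string conclusion;
    std::vector<std::string> conditions;
};

// Inference engine: return the conclusion of the first rule whose conditions are
// all present in the reported facts, or an empty string when no rule applies.
std::string infer(const std::vector<Rule>& knowledgeBase,
                  const std::set<std::string>& facts) {
    for (const Rule& rule : knowledgeBase) {
        bool allHold = true;
        for (const std::string& condition : rule.conditions) {
            if (facts.count(condition) == 0) {
                allHold = false;
                break;
            }
        }
        if (allHold) return rule.conclusion;
    }
    return "";
}
```

With a knowledge base holding the rules chicken_pox (rash, body_ache, fever) and measles (fever, runny_nose, rash, conjunctivitis), the facts fever, rash and body_ache lead to chicken_pox, while a set of facts that satisfies no rule gives an empty result, which a user interface would report as "unable to identify".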

PROGRAM-

go:-
hypothesis(Disease),
write('It is suggested that the patient has '),
write(Disease),
nl,
undo;
write('Sorry, the system is unable to identify the disease'),nl,undo.

hypothesis(cold) :-
symptom(headache),
symptom(runny_nose),
symptom(sneezing),
symptom(sore_throat),
nl,
write('Advices and Sugestions:'),
nl,
write('1: Tylenol'),
nl,
write('2: Panadol'),
nl,
write('3: Nasal spray'),
nl,
write('Please wear warm clothes because'),
nl,!.

hypothesis(influenza) :-
symptom(sore_throat),
symptom(fever),
symptom(headache),
symptom(chills),
symptom(body_ache),
nl,
write('Advices and Sugestions:'),
nl,
write('1: Tamiflu'),

nl,
write('2: Panadol'),
nl,
write('3: Zanamivir'),
nl,
write('Please take a warm bath and do salt gargling because'),
nl,!.

hypothesis(typhoid) :-
symptom(headache),
symptom(abdominal_pain),
symptom(poor_appetite),
symptom(fever),
nl,
write('Advices and Sugestions:'),
nl,
write('1: Chloramphenicol'),
nl,
write('2: Amoxicillin'),
nl,
write('3: Ciprofloxacin'),
nl,
write('4: Azithromycin'),
nl,
write('Please do complete bed rest and take soft diet because'),
nl,!.

hypothesis(chicken_pox) :-
symptom(rash),
symptom(body_ache),
symptom(fever),
nl,
write('Advices and Sugestions:'),
nl,
write('1: Varicella vaccine'),
nl,
write('2: Immunoglobulin'),
nl,
write('3: Acetomenaphin'),
nl,

write('4: Acyclovir'),
nl,
write('Please do have oatmeal bath and stay at home because'),
nl,!.

hypothesis(measles) :-
symptom(fever),
symptom(runny_nose),
symptom(rash),
symptom(conjunctivitis),
nl,
write('Advices and Sugestions:'),
nl,
write('1: Tylenol'),
nl,
write('2: Aleve'),
nl,
write('3: Advil'),
nl,
write('4: Vitamin A'),
nl,
write('Please get rest and use more liquid because'),
nl,!.

hypothesis(malaria) :-
symptom(fever),
symptom(sweating),
symptom(headache),
symptom(nausea),
symptom(vomiting),
symptom(diarrhea),
nl,
write('Advices and Sugestions:'),
nl,
write('1: Aralen'),
nl,
write('2: Qualaquin'),
nl,
write('3: Plaquenil'),
nl,

write('4: Mefloquine'),
nl,
write('Please do not sleep in open air and cover your full skin because'),
nl,!.

ask(Question) :-
write('Does the patient has the symptom '),
write(Question),
write('? : '),
read(Response),
nl,
( (Response == yes ; Response == y)
->
assert(yes(Question)) ;
assert(no(Question)), fail).
:- dynamic yes/1,no/1.

symptom(S) :-
(yes(S)
->
true ;
(no(S)
->
fail ;
ask(S))).

undo :- retract(yes(_)),fail.
undo :- retract(no(_)),fail.
undo.

OUTPUT-

/*
SIT@SIT-ThinkCentre-E73:~$ swipl -s medicalExpert.pl
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free
software.
Please run ?- license. for legal details.

For online help and background, visit http://www.swi-prolog.org


For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- go.

|: go.
Does the patient has the symptom headache? :
|: yes.
Does the patient has the symptom sore_throat? :
|: n.
Does the patient has the symptom fever? :
|: y.
Does the patient has the symptom rash? :
|: y.
Does the patient has the symptom body_ache? :
Sorry, the system is unable to identify the disease
true.

?- go.
go.
Does the patient has the symptom headache? :
|: yes.
Does the patient has the symptom sore_throat? :
|: yes.
Does the patient has the symptom fever? :
|: yes.
Does the patient has the symptom rash? :
|: yes.
Does the patient has the symptom body_ache? :

Advices and Sugestions:


1: Varicella vaccine
2: Immunoglobulin
3: Acetomenaphin
4: Acyclovir
Please do have oatmeal bath and stay at home because
It is suggested that the patient has chicken_pox
true .

?- go.
go.
Does the patient has the symptom headache? :
|: yes.
Does the patient has the symptom sore_throat? :

|: no.
Does the patient has the symptom fever? :
|: no.
Does the patient has the symptom rash? :
Sorry, the system is unable to identify the disease
true.

?- go.
go
|:
|: go.
Does the patient has the symptom headache? :
ERROR: Stream user_input:56:0 Syntax error: Operator expected
Exception: (9) hypothesis(_2070) ? creep
?- go.

|: go.
Does the patient has the symptom headache? :
|: n.
Does the patient has the symptom sore_throat? :
|: yes.
Does the patient has the symptom rash? :
|: yes.
Does the patient has the symptom body_ache? :
|: yes.
Does the patient has the symptom fever? :

Advices and Sugestions:


1: Varicella vaccine
2: Immunoglobulin
3: Acetomenaphin
4: Acyclovir
Please do have oatmeal bath and stay at home because
It is suggested that the patient has chicken_pox
true .

?- go.

|: go.
Does the patient has the symptom headache? :
|: y.
Does the patient has the symptom sore_throat? :

|: n.
Does the patient has the symptom fever? :
|: y.
Does the patient has the symptom rash? :
|: n.
Does the patient has the symptom body_ache? :
Sorry, the system is unable to identify the disease
true.

?- go.

|: go.
Does the patient has the symptom headache? :
|: n.
Does the patient has the symptom sore_throat? :
|: n.
Does the patient has the symptom rash? :
|: y.
Does the patient has the symptom fever? :
|: y.
Does the patient has the symptom runny_nose? :
|: y.
Does the patient has the symptom sweating? :
Sorry, the system is unable to identify the disease
true.

?-
[1]+ Stopped swipl -s medicalExpert.pl
SIT@SIT-ThinkCentre-E73:~$ swipl -s medicalExpert.pl
Welcome to SWI-Prolog (threaded, 64 bits, version 7.6.4)
SWI-Prolog comes with ABSOLUTELY NO WARRANTY. This is free
software.
Please run ?- license. for legal details.

For online help and background, visit http://www.swi-prolog.org


For built-in help, use ?- help(Topic). or ?- apropos(Word).

?- go.
go.
Does the patient has the symptom headache? :
|: n.
Does the patient has the symptom sore_throat? :

|: n.
Does the patient has the symptom rash? :
|: y.
Does the patient has the symptom fever? :
|: y.
Does the patient has the symptom runny_nose? :
|: y.
Does the patient has the symptom sweating? :
Sorry, the system is unable to identify the disease
true.

?-
| go.

|: go.
Does the patient has the symptom headache? :
|: y
|: y.
Does the patient has the symptom sore_throat? :
ERROR: Stream user_input:28:2 Syntax error: Operator expected
Exception: (9) hypothesis(_2070) ? creep
?- go.

|: go.
Does the patient has the symptom sore_throat? :
|: y.
Does the patient has the symptom rash? :
|: n.
Does the patient has the symptom body_ache? :
|: y.
Does the patient has the symptom fever? :
|: y.
Does the patient has the symptom runny_nose? :
|: y.
Does the patient has the symptom conjunctivitis? :

Advices and Sugestions:


1: Tylenol
2: Aleve
3: Advil
4: Vitamin A
Please get rest and use more liquid because

It is suggested that the patient has measles


true .

?-
*/
Conclusion:
We have implemented an expert system in SWI-Prolog that suggests a medical diagnosis
from the symptoms reported for the patient.

Assignment No 3

Aim:
Use Heuristic Search Techniques to Implement Best first search (Best-
Solution but not always optimal) and A* algorithm (Always gives optimal
solution).

Program:

BFS:

DistanceComparator.java

package bfs;

import java.util.Comparator;

// Comparator for priority queue based on Node distance
public class DistanceComparator implements Comparator<Node>
{
    @Override
    public int compare(Node o1, Node o2) {
        if(o1.getDistance() > o2.getDistance())
            return 1;
        else if(o1.getDistance() < o2.getDistance())
            return -1;
        return 0;
    }
}

Graph.java

package bfs;

import java.util.ArrayList;
import javax.swing.JOptionPane;

public class Graph {

    ArrayList<HeadNode> headNodesList;   // arraylist to hold headNodes
    int n;

    public Graph(int size)               // constructor to initialize the Graph
    {
        this.n = size;
        headNodesList = new ArrayList<>();
    }

    public void initGraph()              // method to accept graph nodes
    {
        for(int i=0;i<n;i++)
        {
            HeadNode hn = new HeadNode();
            hn.setName(JOptionPane.showInputDialog("Enter the name of node" +(i+1)+" : "));
            headNodesList.add(hn);       // add the nodes to headNodesList
        }
        for(int i=0;i<n;i++)
        {
            HeadNode tempHeadNode = headNodesList.get(i);

            while(true)                  // adjacent nodes input and their distances
            {
                String name = tempHeadNode.getName();
                String ans = JOptionPane.showInputDialog("\nDo you want to add any adjacent node to node "+ name + "? (y/n) : ");
                if(ans.equals("n") || ans.equals("N"))
                    break;
                String tempName = JOptionPane.showInputDialog("Enter the name of adjacent node of "+ name + " : ");
                int tempDistance = Integer.parseInt(JOptionPane.showInputDialog("Enter distance between nodes " + name + " and " + tempName+ " :"));

                tempHeadNode.setNodeInfo(tempName,tempDistance);
                headNodesList.set(i, tempHeadNode);
            }
        }
    }

    public void displayGraph()           // method to display graph in form of adjacency list
    {
        for(int i=0;i<n;i++)
        {
            HeadNode tempHeadNode = headNodesList.get(i);
            System.out.print("\n"+ tempHeadNode.getName() + " : ");
            tempHeadNode.displayNodeList();
        }
    }

    public int getIndex(String name)     // method to get index by passing name of node
    {
        for(int i=0;i<n;i++)
        {
            HeadNode tempHeadNode = headNodesList.get(i);
            if(tempHeadNode.getName().equals(name))
                return i;
        }
        return -1;                       // if node not found return -1
    }

    public ArrayList<Node> getNeighbours(String node)   // method to return neighbours of selected node
    {
        int headIndex = getIndex(node);
        return headNodesList.get(headIndex).getNodeList();
    }
}

HeadNode.java

package bfs;

import java.util.ArrayList;
import java.util.Iterator;

public class HeadNode    // Head node in adjacency list of graph
{
    private String name;                                   // node name
    private ArrayList<Node> adjnodes = new ArrayList<>();  // List of adjacent nodes

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setNodeInfo(String name,int distance)      // Add adjacent node
    {
        Node n = new Node(name,distance);
        adjnodes.add(n);
    }

    public ArrayList<Node> getNodeList()
    {
        return adjnodes;
    }

    public void displayNodeList()    // Display adjacent nodes list (name,distance)
    {
        Iterator<Node> i = adjnodes.iterator();
        if(i.hasNext())
        {
            Node temp = i.next();
            System.out.print("("+temp.getName()+","+temp.getDistance()+")");
        }
        while(i.hasNext())
        {
            Node temp = i.next();
            System.out.print(", ("+temp.getName()+","+temp.getDistance()+")");
        }
    }
}

Node.java

package bfs;

public class Node // Class for adjacent nodes to headnode
{
    String name; // node name
    int distance; // distance between this node and headnode

    // constructor and getters, setters for data members

    public Node(String name, int dist)
    {
        this.name = name;
        this.distance = dist;
    }

    public int getDistance() {
        return distance;
    }

    public String getName() {
        return name;
    }

    public void setDistance(int distance) {
        this.distance = distance;
    }

    public void setName(String name) {
        this.name = name;
    }
}

BFS.java

package bfs;

import java.util.ArrayList;
import java.util.PriorityQueue;
import javax.swing.JOptionPane;

public class BFS // Class for Best-First Search algorithm
{
    /**
     * @param args the command line arguments
     */
    public static void main(String[] args)
    {
        int n;
        n = Integer.parseInt(JOptionPane.showInputDialog("Enter No of nodes")); // Enter no. of nodes
        PriorityQueue<Node> pq = new PriorityQueue<>(new DistanceComparator()); // Initialize priority queue
        ArrayList<Boolean> visited = new ArrayList<>(n);
        ArrayList<String> parent = new ArrayList<>(n); // Store parent node
        for(int i=0;i<n;i++)
        {
            visited.add(false); // Set visited false for all nodes
            parent.add("NIL"); // Set parent of all nodes NIL
        }

        Graph graph = new Graph(n); // Create graph instance
        graph.initGraph(); // Initialize graph
        graph.displayGraph(); // Display graph as adjacency list

        String start, goal; // Accept start and goal nodes
        start = JOptionPane.showInputDialog("Enter the name of start node : ");
        goal = JOptionPane.showInputDialog("Enter the name of goal node : ");

        pq.add(new Node(start,0)); // Add start node to priority queue with distance 0
        visited.set(graph.getIndex(start), true); // Set visited true
        parent.set(graph.getIndex(start), "NIL"); // Set parent of start NIL
        System.out.println("\n\nPriority queue contents : \n");
        displayQueue(pq);

        while(!pq.isEmpty()) // Process until queue is empty
        {
            Node temp = pq.poll(); // Remove node with minimum distance
            displayQueue(pq);
            if(temp.getName().equals(goal)) // Check if goal node is found
            {
                System.out.println("\nGoal node '"+temp.getName() + "' found");
                break;
            }
            else
            {
                // Get the neighbours of the retrieved node
                ArrayList<Node> neighbours = graph.getNeighbours(temp.getName());
                for(Node n1 : neighbours) // For all adjacent nodes
                {
                    if(!visited.get(graph.getIndex(n1.getName())))
                    {
                        visited.set(graph.getIndex(n1.getName()), Boolean.TRUE); // Mark visited if not marked
                        pq.add(n1); // Add them to queue
                        parent.set(graph.getIndex(n1.getName()), temp.getName()); // Set parent of neighbour node
                    }
                }
                displayQueue(pq); // Display the queue
            }
        }
        tracePath(parent,graph,goal);
    }

    private static void displayQueue(PriorityQueue<Node> pq) // Function to display queue
    {
        for(Node n : pq)
        {
            System.out.print(n.getName()+"\t");
        }
        System.out.println("");
    }

    private static void tracePath(ArrayList<String> parent, Graph graph, String goal) // Function to trace the path
    {
        System.out.println("\n\nPath : ");
        String path = goal;
        String temp = goal;
        while(!parent.get(graph.getIndex(temp)).equals("NIL")) // Continue path until parent is NIL
        {
            temp = parent.get(graph.getIndex(temp));
            path = temp + ", " + path;
        }
        System.out.println(path);
    }
}

/*

OUTPUT :

run:

A : (B,3), (C,1)
B : (D,3), (E,2)
C:
D:
E:

Priority queue contents :

C B
B
B

E D
D
D

Goal node 'D' found

Path :
A, B, D
BUILD SUCCESSFUL (total time: 1 minute 8 seconds)

*/

A* Algorithm:

FxComparator.java
package astargraph;

import java.util.Comparator;

public class FxComparator implements Comparator<HeadNode> // Comparator for priority queue based on fx value
{
    @Override
    public int compare(HeadNode o1, HeadNode o2) {
        return Integer.compare(o1.getFx(), o2.getFx()); // smaller fx comes first
    }
}

Graph.java
package astargraph;

import java.util.ArrayList;
import javax.swing.JOptionPane;

public class Graph { // Class for graph

    ArrayList<HeadNode> headNodesList;
    int n;

    public Graph(int size) // Initialize size and head node list
    {
        this.n = size;
        headNodesList = new ArrayList<>();
    }

    public void initGraph() // Initialize graph nodes and edges
    {
        for(int i=0;i<n;i++) // Accept node names and their heuristic values
        {
            HeadNode hn = new HeadNode();
            hn.setName(JOptionPane.showInputDialog("Enter the name of node " + (i+1) + " : "));
            hn.setHx(Integer.parseInt(JOptionPane.showInputDialog("Enter the heuristic value of node " + (i+1) + " : ")));
            headNodesList.add(hn);
        }
        for(int i=0;i<n;i++)
        {
            HeadNode tempHeadNode = headNodesList.get(i);
            while(true) // Accept adjacent nodes and their distances
            {
                String name = tempHeadNode.getName();
                String ans = JOptionPane.showInputDialog("\nDo you want to add any adjacent node to node " + name + "? (y/n) : ");
                if(ans.equals("n") || ans.equals("N"))
                    break;
                String tempName = JOptionPane.showInputDialog("Enter the name of adjacent node of " + name + " : ");
                int tempDistance = Integer.parseInt(JOptionPane.showInputDialog("Enter distance between nodes " + name + " and " + tempName + " :"));

                tempHeadNode.setNodeInfo(tempName,tempDistance);
                headNodesList.set(i, tempHeadNode);
            }
        }
    }

    public void displayGraph() // Display graph adjacency list
    {
        for(int i=0;i<n;i++)
        {
            HeadNode tempHeadNode = headNodesList.get(i);
            System.out.print("\n"+ tempHeadNode.getName() + " (hx = "+tempHeadNode.getHx()+") : ");
            tempHeadNode.displayNodeList();
        }
        System.out.println("");
    }

    public int getIndex(String name) // Get index for given name
    {
        for(int i=0;i<n;i++)
        {
            HeadNode tempHeadNode = headNodesList.get(i);
            if(tempHeadNode.getName().equals(name))
                return i;
        }
        return -1; // node not found
    }

    public ArrayList<Node> getNeighbours(String node) // Get neighbour nodes list
    {
        int headIndex = getIndex(node);
        return headNodesList.get(headIndex).getNodeList();
    }

    public void setGx(String name, int gx) // Set gx for a node and update adjacency list
    {
        int index = getIndex(name);
        HeadNode node = headNodesList.get(index);
        node.setGx(gx);
        headNodesList.set(index, node);
    }

    public HeadNode getHeadNode(String name) { // Get head node by name
        return headNodesList.get(getIndex(name));
    }

    public void setFx(Node neighbour, HeadNode curr) // Set fx for neighbour via current node
    {
        int tempGx = curr.getGx() + neighbour.getDistance(); // Distance from source to neighbour via current node
        HeadNode adj = getHeadNode(neighbour.getName()); // Get adjacent head node
        if(tempGx >= adj.getGx()) // Keep the previous distance unless the new one is shorter
            return;

        adj.setGx(tempGx); // Set gx as calculated distance (fx is updated inside setGx)
        headNodesList.set(getIndex(adj.getName()), adj); // Update headnode list
    }
}

HeadNode.java
package astargraph;

import java.util.ArrayList;
import java.util.Iterator;

public class HeadNode // Adjacency list head node
{
    private String name; // node name
    private int gx; // gx value
    private int hx; // heuristic value hx
    private int fx; // fx = gx+hx value
    private ArrayList<Node> adjnodes = new ArrayList<>(); // Adjacent nodes list

    public HeadNode() // Initialize gx and hx to 999, which stands in for infinity
    {
        gx = hx = 999;
        fx = gx + hx;
    }

    public int getGx() {
        return gx;
    }

    public void setGx(int gx) { // Set gx and update fx accordingly
        this.gx = gx;
        setFx(this.gx + hx);
        System.out.println("\nFx of node "+this.name+" = "+this.fx);
    }

    public int getHx() {
        return hx;
    }

    public void setHx(int hx) { // Set hx and update fx accordingly
        this.hx = hx;
        setFx(this.hx + gx);
    }

    public int getFx() {
        return fx;
    }

    public void setFx(int fx) {
        this.fx = fx;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    public void setNodeInfo(String name, int distance) // Set adjacent node name and distance
    {
        Node n = new Node(name, distance);
        adjnodes.add(n); // Add node to list
    }

    public ArrayList<Node> getNodeList()
    {
        return adjnodes;
    }

    public void displayNodeList() // Display adjacent nodes list (name,distance)
    {
        Iterator<Node> i = adjnodes.iterator();
        if(i.hasNext())
        {
            Node temp = i.next();
            System.out.print("("+temp.getName()+","+temp.getDistance()+")");
        }
        while(i.hasNext())
        {
            Node temp = i.next();
            System.out.print(", ("+temp.getName()+","+temp.getDistance()+")");
        }
    }
}

Node.java
package astargraph;

public class Node // Adjacent node name and distance
{
    String name;
    int distance;

    public Node(String name, int dist)
    {
        this.name = name;
        this.distance = dist;
    }

    public int getDistance() {
        return distance;
    }

    public String getName() {
        return name;
    }

    public void setDistance(int distance) {
        this.distance = distance;
    }

    public void setName(String name) {
        this.name = name;
    }
}

AStarGraph.java
package astargraph;

import java.util.ArrayList;
import java.util.PriorityQueue;
import javax.swing.JOptionPane;

public class AStarGraph {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        int n;
        n = Integer.parseInt(JOptionPane.showInputDialog("Enter No of nodes")); // Enter no. of nodes

        PriorityQueue<HeadNode> open = new PriorityQueue<>(new FxComparator()); // Initialize priority queue open list
        ArrayList<HeadNode> closed = new ArrayList<>(n); // Initialize closed list
        ArrayList<String> parent = new ArrayList<>(n); // Store parent node
        for(int i=0;i<n;i++)
        {
            parent.add("NIL"); // Set parent of all nodes NIL
        }

        Graph graph = new Graph(n); // Create graph instance
        graph.initGraph(); // Initialize graph
        graph.displayGraph(); // Display graph as adjacency list

        String start, goal; // Accept start and goal nodes
        start = JOptionPane.showInputDialog("Enter the name of start node : ");
        goal = JOptionPane.showInputDialog("Enter the name of goal node : ");

        graph.setGx(start, 0); // Set gx=0 for start node
        open.add(graph.getHeadNode(start)); // Add start node to open list
        parent.set(graph.getIndex(start), "NIL"); // Set parent of start NIL

        displayQueue(open);
        displayClosed(closed);

        while(!open.isEmpty()) // Process until open list is empty
        {
            HeadNode temp = open.poll(); // Remove node with minimum fx from open list
            closed.add(temp); // Add it to closed list
            displayQueue(open);
            displayClosed(closed);
            if(temp.getName().equals(goal)) // Check if goal node is found
            {
                System.out.println("\nGoal node '"+temp.getName() + "' found");
                break;
            }
            else
            {
                ArrayList<Node> neighbours = temp.getNodeList(); // Get the neighbours of the retrieved node
                for(Node n1 : neighbours) // For all adjacent nodes
                {
                    if(inClosed(n1.getName(), closed)) // If node in closed list, process next node
                        continue;
                    HeadNode adj = graph.getHeadNode(n1.getName());
                    int tempGx = temp.getGx() + n1.getDistance(); // Distance to neighbour via current node
                    if(inOpen(n1.getName(), open) && tempGx >= adj.getGx())
                        continue; // Already in open list with an equal or shorter path
                    open.remove(adj); // Remove before changing fx so the queue stays ordered
                    graph.setFx(n1,temp); // Set fx for neighbour node via current
                    open.add(adj); // Add it to open list
                    parent.set(graph.getIndex(n1.getName()), temp.getName()); // Set parent of neighbour node
                }
                displayQueue(open);
            }
        }
        tracePath(parent, graph, goal);
    }

    private static void displayQueue(PriorityQueue<HeadNode> open) // Function to display open list
    {
        System.out.print("\nOpen List : ");
        if(open.isEmpty())
        {
            System.out.println("Empty");
            return;
        }
        for(HeadNode n : open)
        {
            System.out.print(n.getName()+"\t");
        }
        System.out.println("");
    }

    private static void displayClosed(ArrayList<HeadNode> closed) // Function to display closed list
    {
        System.out.print("\nClosed List : ");
        if(closed.isEmpty())
        {
            System.out.println("Empty");
            return;
        }
        for(HeadNode n : closed)
        {
            System.out.print(n.getName()+"\t");
        }
        System.out.println("");
    }

    private static boolean inClosed(String name, ArrayList<HeadNode> closed) // Check if node in closed list
    {
        for(HeadNode n : closed)
        {
            if(n.getName().equals(name))
                return true;
        }
        return false;
    }

    private static boolean inOpen(String name, PriorityQueue<HeadNode> open) // Check if node in open list
    {
        for(HeadNode n : open)
        {
            if(n.getName().equals(name))
                return true;
        }
        return false;
    }

    private static void tracePath(ArrayList<String> parent, Graph graph, String goal) // Function to trace the path
    {
        System.out.println("\n\nPath : ");
        String path = goal;
        String temp = goal;
        while(!parent.get(graph.getIndex(temp)).equals("NIL")) // Continue path until parent is NIL
        {
            temp = parent.get(graph.getIndex(temp));
            path = temp + ", " + path;
        }
        System.out.println(path);
    }
}

/*
OUTPUT:
run:

A (hx = 6) : (B,1), (C,3)


B (hx = 4) : (D,2)
C (hx = 3) : (D,5)
D (hx = 1) :

Fx of node A = 6

Open List : A

Closed List : Empty

Open List : Empty

Closed List : A

Fx of node B = 5

Fx of node C = 6

Open List : B C

Open List : C

Closed List : A B

Fx of node D = 4

Open List : D C

Open List : C

Closed List : A B D

Path :
A, B, D

Goal node 'D' found


BUILD SUCCESSFUL (total time: 1 minute 37 seconds)

*/

Conclusion:
Thus we have studied how to use heuristic search techniques to implement
Best-First Search and the A* algorithm.

Assignment No 4

Aim:
Constraint Satisfaction Problem:
Implement crypt-arithmetic problem or n-queens or graph coloring problem
( Branch and Bound and Backtracking)

Objective:
Student will learn:
1. The basic concept of constraint satisfaction problem and backtracking.
2. General structure of N Queens problem.
Theory:
The N Queen problem is the problem of placing N chess queens on an N×N chessboard so that no
two queens attack each other. For example, the following is a solution for the 4 Queen problem.

The expected output is a binary matrix which has 1s for the blocks where queens are
placed.
For example, the following is the output matrix for the above 4 queen solution.
{ 0, 1, 0, 0}
{ 0, 0, 0, 1}
{ 1, 0, 0, 0}
{ 0, 0, 1, 0}

Generate all possible configurations of queens on the board and print a configuration
that satisfies the given constraints.

while there are untried configurations
{
generate the next configuration
if queens don't attack in this configuration then
{
print this configuration;
}
}
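
The generate-and-test loop above can be sketched in Java as follows. This is only an illustrative sketch, not part of the lab program: the class name NQueensGenerateAndTest and its methods are hypothetical, and it assumes the search is limited to configurations with exactly one queen per row, which gives N^N candidates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of the lab program.
class NQueensGenerateAndTest {

    // Returns every safe placement; each int[] maps row -> column of its queen.
    static List<int[]> solve(int n) {
        List<int[]> solutions = new ArrayList<>();
        int[] cols = new int[n];
        long total = 1;
        for (int i = 0; i < n; i++) total *= n;      // n^n candidate configurations
        for (long code = 0; code < total; code++) {
            long c = code;
            for (int row = 0; row < n; row++) {      // decode the next configuration
                cols[row] = (int) (c % n);
                c /= n;
            }
            if (isValid(cols)) solutions.add(cols.clone());
        }
        return solutions;
    }

    // True when no two queens share a column or a diagonal.
    static boolean isValid(int[] cols) {
        for (int r1 = 0; r1 < cols.length; r1++) {
            for (int r2 = r1 + 1; r2 < cols.length; r2++) {
                if (cols[r1] == cols[r2]) return false;
                if (Math.abs(cols[r1] - cols[r2]) == r2 - r1) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int[] s : solve(4)) System.out.println(java.util.Arrays.toString(s));
    }
}
```

Every candidate is tested in full, so the work grows as N^N. Backtracking avoids most of it by abandoning a partial placement as soon as two queens clash.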

Backtracking Algorithm
Backtracking is a technique for finding the solution of a problem in which each step depends on
the steps taken before it.
In backtracking, we first take a step and then check whether this step is correct, i.e.,
whether it can lead to a correct answer. If it cannot, we come back and
change that step. In general, this is accomplished by recursion. Thus, in backtracking,
we start with a partial sub-solution of the problem (which may or may not lead us to
the solution) and then check whether we can proceed further with this sub-solution. If not,
we come back and change it.
Thus, the general steps of backtracking are:
• start with a sub-solution
• check if this sub-solution will lead to the solution or not
• If not, then come back and change the sub-solution and continue again
The idea is to place queens one by one in different columns, starting from the leftmost
column. When we place a queen in a column, we check for clashes with already placed
queens. In the current column, if we find a row for which there is no clash, we mark this
row and column as part of the solution. If we do not find such a row due to clashes then
we backtrack and return false.

Algorithm:

1) Start in the leftmost column

2) If all queens are placed return true

3) Try all rows in the current column. Do the following for every tried row.

a) If the queen can be placed safely in this row then mark this [row, column]
as part of the solution and recursively check if placing the queen here leads
to a solution.

b) If placing the queen in [row, column] leads to a solution then return true.

c) If placing the queen doesn't lead to a solution then unmark this [row,
column] (backtrack) and go to step (a) to try other rows.

4) If all rows have been tried and nothing worked, return false to trigger backtracking.
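
The column-by-column steps above can be sketched directly in Java, producing the binary matrix described in the theory. This is a minimal sketch under stated assumptions: the class NQueensColumnWise and its method names are hypothetical, and it is separate from the row-by-row program given below.

```java
// Hypothetical illustration of the column-wise algorithm; not the lab program.
class NQueensColumnWise {

    // Places queens from column 'col' onwards; board[row][col] == 1 marks a queen.
    static boolean solve(int[][] board, int col) {
        int n = board.length;
        if (col == n) return true;                       // all queens are placed
        for (int row = 0; row < n; row++) {              // try all rows in this column
            if (isSafe(board, row, col)) {
                board[row][col] = 1;                     // mark [row, col]
                if (solve(board, col + 1)) return true;  // placement leads to a solution
                board[row][col] = 0;                     // unmark and try the next row
            }
        }
        return false;                                    // nothing worked: backtrack
    }

    // Only columns to the left of 'col' can contain queens, so only
    // the row and the two left-going diagonals need to be checked.
    static boolean isSafe(int[][] board, int row, int col) {
        int n = board.length;
        for (int c = 0; c < col; c++)
            if (board[row][c] == 1) return false;
        for (int r = row, c = col; r >= 0 && c >= 0; r--, c--)
            if (board[r][c] == 1) return false;
        for (int r = row, c = col; r < n && c >= 0; r++, c--)
            if (board[r][c] == 1) return false;
        return true;
    }

    // Returns the first solution found, or null when none exists.
    static int[][] firstSolution(int n) {
        int[][] board = new int[n][n];
        return solve(board, 0) ? board : null;
    }

    public static void main(String[] args) {
        int[][] board = firstSolution(4);
        for (int[] row : board) System.out.println(java.util.Arrays.toString(row));
    }
}
```

The recursion depth equals the number of columns, and the board is restored on every failed attempt, so no copy of the board is needed.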

Program:

package ai_practical.assno12;

public class NQueen {

    public static void main(String[] args) {
        placeQueens(4);
    }

    private static void placeQueens(int gridSize) {
        int[] board = new int[gridSize]; // board[row] = column of the queen in that row
        if (placeAllQueens(board, 0)) {
            printBoard(board);
        } else {
            System.out.println("No Solution available"); // e.g. gridSize 2 or 3
        }
    }

    // Try every column for the queen in this row, then recurse to the next row
    private static boolean placeAllQueens(int[] board, int row) {
        if (row == board.length) {
            return true; // all queens placed
        }
        for (int column = 0; column < board.length; column++) {
            board[row] = column;
            if (isSafe(board, row) && placeAllQueens(board, row + 1)) {
                return true;
            }
        }
        return false; // no column works in this row: backtrack
    }

    // Check the queen in 'row' against all queens in earlier rows
    private static boolean isSafe(int[] board, int row) {
        for (int i = 0; i < row; i++) {
            if (board[row] == board[i]) {
                return false; // same column
            }
            if (Math.abs(board[row] - board[i]) == Math.abs(row - i)) {
                return false; // same diagonal
            }
        }
        return true;
    }

    private static void printBoard(int[] board) {
        for (int i = 0; i < board.length; i++) {
            for (int j = 0; j < board.length; j++) {
                if (j == board[i]) {
                    System.out.print("Q ");
                } else {
                    System.out.print("_ ");
                }
            }
            System.out.println();
        }
    }
}

/*
OUTPUT:
run:
_ Q _ _
_ _ _ Q
Q _ _ _
_ _ Q _
*/

Conclusion: N-queens problem is implemented using backtracking.

DATA ANALYTICS

Assignment No 1

Aim:
Download the Iris flower dataset or any other dataset into a DataFrame.
(e.g. https://archive.ics.uci.edu/ml/datasets/Iris ) Use Python/R and perform the
following:
• How many features are there and what are their types (e.g., numeric,
nominal)?
• Compute and display summary statistics for each feature available in the
dataset (e.g. minimum value, maximum value, mean, range, standard
deviation, variance and percentiles).
• Data Visualization: Create a histogram for each feature in the dataset to
illustrate the feature distributions. Plot each histogram.
• Create a boxplot for each feature in the dataset. All of the boxplots should
be combined into a single plot. Compare distributions and identify
outliers.

Theory:
R:
R is a powerful language used widely for data analysis and statistical computing. It
was developed in the early 1990s. Since then, continuous efforts have been made to improve R’s user
interface. The journey of the R language from a rudimentary text editor to the interactive RStudio
and, more recently, Jupyter Notebooks has engaged many data science communities across the
world.

This was possible only because of generous contributions by R users globally. The inclusion of
powerful packages has made R more and more capable with time. Packages such as
dplyr, tidyr, readr, data.table, SparkR and ggplot2 have made data manipulation, visualization
and computation much faster.

Advantages of R:

1. The style of coding is quite easy.
2. It’s open source. No need to pay any subscription charges.
3. Availability of instant access to over 7800 packages customized for various
computation tasks.
4. The community support is overwhelming. There are numerous forums to help you
out.
5. High performance computing is possible (requires additional packages).
6. It is one of the most highly sought skills by analytics and data science companies.

The interface of RStudio:

1. R Console: This area shows the output of the code you run. You can also write
code directly in the console. Code entered directly in the R console cannot be traced later. This is
where the R script comes into use.
2. R Script: As the name suggests, here you get space to write code. To run the code,
simply select the line(s) of code and press Ctrl + Enter. Alternatively, you can click
on the little ‘Run’ button located at the top right corner of the R Script pane.
3. R environment: This space displays the set of external elements added. This includes
data sets, variables, vectors, functions etc. To check if data has been loaded properly in
R, always look at this area.
4. Graphical Output: This space displays the graphs created during exploratory data
analysis. Besides graphs, you can select packages and seek help from R’s embedded
official documentation.

Train Data: The predictive model is always built on the train data set. An intuitive way to
identify the train data is that it always has the ‘response variable’ included.

Test Data: Once the model is built, its accuracy is ‘tested’ on the test data. This data usually
contains fewer observations than the train data set. When predictions are to be submitted for
unseen data, it does not include the ‘response variable’.

data(): Data Sets
Loads specified data sets, or lists the available data sets.

names(): The Names Of An Object
Functions to get or set the names of an object.

dim(): Dimensions Of An Object
Retrieve or set the dimension of an object.

View(): Invoke A Data Viewer
Invoke a spreadsheet-style data viewer on a matrix-like R object.

Standard deviation and Variance:

The standard deviation of an observation variable is the square root of its variance.

The variance is a numerical measure of how the data values are dispersed around the mean.

Percentile:

The nth percentile of an observation variable is the value that cuts off the first n percent of the
data values when they are sorted in ascending order.
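
To make these definitions concrete, the following Java sketch computes them for a small made-up sample. The class DescriptiveStats and the sample values are hypothetical. Two assumptions are built in: the variance divides by n - 1 (the sample variance, which is what R's var() reports), and the percentile interpolates linearly between neighbouring sorted values (the rule R's quantile() uses by default).

```java
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class DescriptiveStats {

    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Sample variance: squared deviations from the mean, divided by n - 1.
    static double variance(double[] x) {
        double m = mean(x), ss = 0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss / (x.length - 1);
    }

    // Standard deviation is the square root of the variance.
    static double sd(double[] x) {
        return Math.sqrt(variance(x));
    }

    // p is a fraction in [0, 1], e.g. 0.25 for the 25th percentile.
    static double percentile(double[] x, double p) {
        double[] s = x.clone();
        Arrays.sort(s);                          // ascending order
        double h = (s.length - 1) * p;           // fractional position in the sorted data
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, s.length - 1);
        return s[lo] + (h - lo) * (s[hi] - s[lo]);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9}; // made-up sample values
        System.out.println("variance = " + variance(data));
        System.out.println("sd       = " + sd(data));
        System.out.println("median   = " + percentile(data, 0.5));
    }
}
```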

Histogram:

A histogram represents the frequencies of values of a variable bucketed into ranges.
A histogram is similar to a bar chart, but the difference is that it groups the values into
continuous ranges. The height of each bar represents the number of values present in that
range.
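
The bucketing step behind a histogram can be sketched in a few lines of Java. This is a simplified illustration with a hypothetical class name and made-up values. It assumes equal-width ranges that are closed on the left, with the last range also including its upper edge, which is not the same convention as the default of R's hist().

```java
// Hypothetical helper class for illustration only.
class HistogramBins {

    // Counts how many values fall into each of 'bins' equal-width ranges starting at 'min'.
    static int[] counts(double[] values, double min, double width, int bins) {
        int[] c = new int[bins];
        double max = min + bins * width;
        for (double v : values) {
            int k = (int) Math.floor((v - min) / width);   // index of the range containing v
            if (v == max) k = bins - 1;                    // upper edge goes into the last range
            if (k >= 0 && k < bins) c[k]++;                // values outside all ranges are ignored
        }
        return c;
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 2.5, 3, 4.9, 5};          // made-up sample values
        int[] c = counts(values, 1, 1, 4);                 // ranges [1,2) [2,3) [3,4) [4,5]
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

Each count is the height of one bar of the histogram.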

R creates histogram using hist() function. This function takes a vector as an input and uses
some more parameters to plot histograms.

Syntax
The basic syntax for creating a histogram using R is:
hist(v,main,xlab,xlim,ylim,breaks,col,border)
Following is the description of the parameters used:
• v is a vector containing numeric values used in histogram.
• main indicates title of the chart.
• col is used to set color of the bars.
• border is used to set border color of each bar.
• xlab is used to give description of x-axis.
• xlim is used to specify the range of values on the x-axis.
• ylim is used to specify the range of values on the y-axis.
• breaks is used to specify the breakpoints between the bars, or the suggested number of bars.

Summary

A very useful multipurpose function in R is summary(X), where X can be one of any number
of objects, including datasets, variables, and linear models, to name a few.

Response Variable (a.k.a. Dependent Variable): In a data set, the response variable (y) is
the one on which we make predictions.

Predictor Variable (a.k.a. Independent Variable): In a data set, predictor variables (Xi) are
those from which the prediction of the response variable is made.

Boxplots
Boxplots are great for comparing groups of data. Let’s compare the sepal widths across the
species. The key is that the first variable is a vector of quantitative
data, Sepal.Width, and the second variable is a vector of categorical data, Species. We model
the relationship as Sepal.Width~Species, meaning that the Sepal.Width depends on the type
of Species.

Program:

library(datasets)
data("iris")

names(iris)
dim(iris)
#view a dataset
View(iris)
#summary statistics
min(iris$Sepal.Length)
max(iris$Sepal.Length)
mean(iris$Sepal.Length)
range(iris$Sepal.Length)
#standard deviation
sd(iris$Sepal.Length)
#variance
var(iris$Sepal.Length)
#percentile
quantile(iris$Sepal.Length)
#to display specific value
quantile(iris$Sepal.Length,c(0.3,0.6))
#histogram
h <- hist(iris$Sepal.Length,main="sepal length frequencies-histogram",xlab="sepal length",xlim=c(3.5,8.5),col="blue")
h
#using breaks
h <- hist(iris$Sepal.Length,main="sepal length frequencies-histogram",xlab="sepal length",xlim=c(3.5,8.5),col="blue",labels=TRUE,breaks=3,border="green",las=2)
h <- hist(iris$Sepal.Length,main="sepal length frequencies-histogram",xlab="sepal length",xlim=c(3.5,8.5),col="red",labels=TRUE,breaks=3,border="green",las=3)
H <- hist(iris$Sepal.Length,breaks=c(4.3,4.6,4.9,5.2,5.5,5.8,6.1,6.4,6.7,7.0,7.3,7.6,7.9))
#boxplots
boxplot(iris$Sepal.Length)
summary(iris$Sepal.Length)
myboxplot <- boxplot(iris[,-5])
#outliers
myboxplot$out

Output:

> library(datasets)
> data("iris")
> names(iris)
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
> dim(iris)
[1] 150 5
> View(iris)

> min(iris$Sepal.Length)
[1] 4.3
> max(iris$Sepal.Length)
[1] 7.9
> mean(iris$Sepal.Length)
[1] 5.843333
> range(iris$Sepal.Length)
[1] 4.3 7.9
> sd(iris$Sepal.Length)
[1] 0.8280661
> var(iris$Sepal.Length)
[1] 0.6856935
> quantile(iris$Sepal.Length)
0% 25% 50% 75% 100%
4.3 5.1 5.8 6.4 7.9
> quantile(iris$Sepal.Length,c(0.3,0.6))
30% 60%
5.27 6.10
> h <- hist(iris$Sepal.Length,main="sepal length frequencies-histogram",xlab="sepal length",xlim=c(3.5,8.5),col="blue")
> h
$breaks
[1] 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0

$counts
[1] 5 27 27 30 31 18 6 6

$density
[1] 0.06666667 0.36000000 0.36000000 0.40000000 0.41333333 0.24000000
0.08000000 0.08000000

$mids
[1] 4.25 4.75 5.25 5.75 6.25 6.75 7.25 7.75

$xname
[1] "iris$Sepal.Length"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"

> h <- hist(iris$Sepal.Length,main="sepal length frequencies-histogram",xlab="sepal length",xlim=c(3.5,8.5),col="blue",labels=TRUE,breaks=3,border="green",las=2)

> h <- hist(iris$Sepal.Length,main="sepal length frequencies-histogram",xlab="sepal length",xlim=c(3.5,8.5),col="red",labels=TRUE,breaks=3,border="green",las=3)

> H <- hist(iris$Sepal.Length,breaks=c(4.3,4.6,4.9,5.2,5.5,5.8,6.1,6.4,6.7,7.0,7.3,7.6,7.9))

> boxplot(iris$Sepal.Length)

> summary(iris$Sepal.Length)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.300 5.100 5.800 5.843 6.400 7.900
> myboxplot<-boxplot(iris[,-5])

> myboxplot$out
[1] 4.4 4.1 4.2 2.0


Assignment No 2

Aim:
Download the Pima Indians Diabetes dataset. Use the Naive Bayes algorithm
for classification.
• Load the data from CSV file and split it into training and test datasets.
• Summarize the properties in the training dataset so that we can calculate
probabilities and make predictions.
• Classify samples from a test dataset and a summarized training dataset.

Problem Statement:
Use the Naive Bayes algorithm for classification. Load the data from a CSV file and
split it into training and test datasets. Summarize the properties in the training dataset so that
we can calculate probabilities and make predictions. Then classify samples from the test dataset
using the summarized training dataset.

Objective:
• Load the data from CSV file and split it into training and test datasets.
• Summarize the properties in the training dataset so that we can calculate probabilities
and make predictions.
• Classify samples from a test dataset and a summarized training dataset.

Theory:

Library:

1. caTools: Tools: moving window statistics, GIF, Base64, ROC AUC, etc

Contains several basic utility functions including: moving (rolling, running) window statistic
functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost
classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.

2. e1071: Misc Functions of the Department of Statistics, Probability Theory Group
(Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support
vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...

CSV(comma-separated values )

In computing, a comma-separated values (CSV) file is a delimited text file that uses a
comma to separate values. A CSV file stores tabular data (numbers and text) in plain text.
Each line of the file is a data record. Each record consists of one or more fields, separated
by commas. The use of the comma as a field separator is the source of the name for this file
format.
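
As a rough illustration of this record and field structure, the Java sketch below splits CSV text into lines and then into comma-separated fields. The class SimpleCsv is hypothetical and the sample rows are made up. It assumes that no field contains a comma or a quoted value, which real CSV files may do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only; it does not handle quoted fields.
class SimpleCsv {

    // Each line is one record; each record is split into fields at the commas.
    static List<String[]> parse(String text) {
        List<String[]> rows = new ArrayList<>();
        for (String line : text.split("\\R")) {      // \R matches any line break
            if (line.isEmpty()) continue;            // skip blank lines
            rows.add(line.split(",", -1));           // -1 keeps trailing empty fields
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample: a header row followed by two data records.
        String text = "Glucose,BMI,Outcome\n148,33.6,1\n85,26.6,0\n";
        for (String[] row : parse(text)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```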

How to read csv:

mydata<-read.csv(file="F:/SEM7/DA/KomalN/Ass2/diabetes.csv",header=TRUE,sep=",")

The above reads the file diabetes.csv into a data frame that it creates called mydata.
header=TRUE specifies that this data includes a header row and sep="," specifies that the
data is separated by commas (read.csv already uses both as defaults, but it is safer to be
explicit).

sample.split: Split Data into Test and Train Set

temp_field <- sample.split(mydata$Outcome,SplitRatio=0.7)

Splits the data in vector Y into two sets in a predefined ratio while preserving the relative
ratios of the different labels in Y. It is used to split the data for classification into train
and test subsets. Y should be the vector of class labels (here mydata$Outcome). If a whole
data frame is passed instead, the result has one flag per column rather than one per row, and
subset() then recycles those flags across the rows.

SplitRatio
Splitting ratio:

I. if (0<=SplitRatio<1) then SplitRatio fraction of points from Y will be set to TRUE
II. if (SplitRatio==1) then one random point from Y will be set to TRUE
III. if (SplitRatio>1) then SplitRatio number of points from Y will be set to TRUE
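
The idea of a label-preserving split can be sketched in base R. The helper stratified_split() below is hypothetical and written only for this illustration; it is not the caTools implementation, and it covers only the case of a ratio between 0 and 1. The label vector is made up.

```r
# Hypothetical helper: flag roughly `ratio` of the rows of each class as TRUE.
stratified_split <- function(y, ratio) {
  flags <- logical(length(y))
  for (cls in unique(y)) {
    idx <- which(y == cls)                      # rows belonging to this class
    n_train <- round(ratio * length(idx))       # how many of them go to train
    chosen <- idx[sample.int(length(idx), n_train)]
    flags[chosen] <- TRUE
  }
  flags
}

set.seed(42)
labels <- c(rep(0, 70), rep(1, 30))             # made-up labels: 70 zeros, 30 ones
in_train <- stratified_split(labels, 0.7)

length(in_train)          # one flag per row
table(labels, in_train)   # each class is split in the same ratio
```

Because the flags are drawn class by class, the share of each label is about the same in the train and test parts.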

Train Data: The predictive model is built on the train data set. The train data always
includes the ‘response variable’, because the model learns from it.

train <- subset(mydata, temp_field==TRUE)

Test Data: Once the model is built, its accuracy is ‘tested’ on test data. The test set usually
contains fewer observations than the train data set. The ‘response variable’ is not given to
the model during prediction; it is kept aside and compared with the predictions.

test <- subset(mydata, temp_field==FALSE)

Naive Bayes:
Naïve Bayes classification is a simple probabilistic classification method
based on Bayes’ theorem with the assumption of independence between features. The model
is trained on the training dataset and then used to make predictions with the predict()
function. The program below uses the naiveBayes() function from the e1071 package.
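
To make the independence assumption concrete, the following base-R sketch computes naive Bayes posteriors by hand for numeric features, assuming each feature is normally distributed within a class. It uses the built-in iris data rather than the diabetes file, and posterior() is a hypothetical helper written for this illustration, not a function of e1071.

```r
# Gaussian naive Bayes by hand on the built-in iris data.
features <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
classes <- levels(iris$Species)

# Prior P(class): relative frequency of each class.
prior <- sapply(classes, function(cls) mean(iris$Species == cls))

# Hypothetical helper: posterior class probabilities for one observation x
# (a named numeric vector holding the four features).
posterior <- function(x) {
  score <- sapply(classes, function(cls) {
    rows <- iris[iris$Species == cls, features]
    mu <- sapply(rows, mean)      # per-class mean of every feature
    sigma <- sapply(rows, sd)     # per-class standard deviation
    # Independence assumption: multiply the per-feature normal densities.
    prior[[cls]] * prod(dnorm(x[features], mean = mu, sd = sigma))
  })
  score / sum(score)              # normalise so the probabilities sum to 1
}

x_new <- unlist(iris[1, features])
post <- posterior(x_new)
post
names(which.max(post))            # class with the highest posterior
```

The per-class mean and standard deviation computed here play the same role as the two numbers that naiveBayes() reports for each numeric predictor.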

Predict:
The predict() function makes predictions from the fitted model on new data. The new dataset
must have all of the predictor columns used in training, but they can be in a different order
and hold different values.

Table: Cross Tabulation And Table Creation

table(pred1,test$Outcome,dnn = c("predicted","Actual"))

table uses the cross-classifying factors to build a contingency table of the counts at each
combination of factor levels.
dnn: the names to be given to the dimensions in the result (the dimnames names).

cbind: Combine R Objects by Rows or Columns

Take a sequence of vector, matrix or data-frame arguments and combine by columns or rows,
respectively. These are generic functions with methods for other R classes.

Program:

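library(caTools)  # sample.split()
library(e1071)    # naiveBayes()

# Load the diabetes data; column 9 (Outcome) is the class label.
mydata <- read.csv(file="F:/SEM7/DA/KomalN/Ass2/diabetes.csv",header=TRUE,sep=",")
View(mydata)

# Split on the label vector, not on the whole data frame, so that one
# TRUE/FALSE flag is produced per row and the class ratio is preserved.
temp_field <- sample.split(mydata$Outcome,SplitRatio=0.7)
train <- subset(mydata, temp_field==TRUE)
test <- subset(mydata, temp_field==FALSE)
head(train)
head(test)

# Fit the classifier with Outcome as a factor response.
my_model <- naiveBayes(as.factor(train$Outcome)~.,train)
my_model

# Posterior probabilities for the test rows (label column 9 dropped).
pred1<-predict(my_model,test[,-9],type="raw")
pred1
# Predicted class labels for the same rows.
pred1<-predict(my_model,test[,-9])
pred1

# Confusion matrix of predicted against actual labels.
table(pred1,test$Outcome,dnn = c("predicted","Actual"))
output<- cbind(test,pred1)
View(output)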

Output:

#library(datasets)
library(caTools)
library(e1071)
mydata<-read.csv(file="F:/SEM7/DA/KomalN/Ass2/diabetes.csv",header=TRUE,sep=",")
View(mydata)

> temp_field <- sample.split(mydata,SplitRatio=0.7)


> train <- subset(mydata, temp_field==TRUE)
> test <- subset(mydata, temp_field==FALSE)

> head(train)
  Pregnancies Glucose BloodPressure SkinThickness Insulin  BMI DiabetesPedigreeFunction Age Outcome
2           1      85            66            29       0 26.6                    0.351  31       0
3           8     183            64             0       0 23.3                    0.672  32       1
4           1      89            66            23      94 28.1                    0.167  21       0
7           3      78            50            32      88 31.0                    0.248  26       1
8          10     115             0             0       0 35.3                    0.134  29       0
9           2     197            70            45     543 30.5                    0.158  53       1
> head(test)
   Pregnancies Glucose BloodPressure SkinThickness Insulin  BMI DiabetesPedigreeFunction Age Outcome
1            6     148            72            35       0 33.6                    0.627  50       1
5            0     137            40            35     168 43.1                    2.288  33       1
6            5     116            74             0       0 25.6                    0.201  30       0
10           8     125            96             0       0  0.0                    0.232  54       1
14           1     189            60            23     846 30.1                    0.398  59       1
15           5     166            72            19     175 25.8                    0.587  51       1

> my_model <- naiveBayes(as.factor(train$Outcome)~.,train)


> my_model

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = X, y = Y, laplace = laplace)

A-priori probabilities:
Y
0 1
0.6269531 0.3730469

Conditional probabilities:
Pregnancies
Y [,1] [,2]
0 3.264798 3.073319
1 4.712042 3.771892

Glucose
Y [,1] [,2]
0 110.1277 26.59334
1 138.8272 33.08691

BloodPressure
Y [,1] [,2]
0 68.51402 17.91265
1 71.36126 20.30531

SkinThickness
Y [,1] [,2]
0 19.46106 14.81635
1 21.72251 17.42568

Insulin
Y [,1] [,2]
0 65.71963 92.92128
1 99.55497 134.75274

BMI
Y [,1] [,2]
0 30.39564 7.462149
1 35.18325 6.494494

DiabetesPedigreeFunction
Y [,1] [,2]
0 0.4289221 0.3089013
1 0.5271518 0.3344238

Age
Y [,1] [,2]
0 31.4486 12.09977
1 37.1466 10.94577
> pred1<-predict(my_model,test[,-9])
> pred1
  [1] 1 1 0 0 1 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1 1 0
 [86] 0 0 0 0 1 1 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0
[171] 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1
[256] 0
Levels: 0 1
> pred1<-predict(my_model,test[,-9],type="raw")
> pred1
0 1
[1,] 3.023617e-01 0.6976383049
[2,] 1.042643e-02 0.9895735696
[3,] 9.363969e-01 0.0636031459
[4,] 9.977055e-01 0.0022944542
[5,] 3.710766e-10 0.9999999996
[6,] 2.908498e-01 0.7091502399
[7,] 7.638351e-01 0.2361648756
[8,] 1.867342e-02 0.9813265778
[9,] 7.268751e-01 0.2731249027
[10,] 9.818539e-01 0.0181461175
[11,] 1.743120e-01 0.8256879958
[12,] 9.867358e-01 0.0132641733
[13,] 2.678128e-01 0.7321871927
[14,] 3.426135e-01 0.6573865416
[15,] 3.484119e-01 0.6515880608
[16,] 1.182130e-02 0.9881787022
[17,] 9.992828e-01 0.0007172357
[18,] 9.901948e-01 0.0098051522
…..
……
[255,] 5.173472e-02 0.9482652816
[256,] 9.142187e-01 0.0857813186


> pred1<-predict(my_model,test[,-9])
> pred1
  [1] 1 1 0 0 1 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 0 0 1 0 1 0 0 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 0 0 1 1 1 0 1 1 0 0 0 0 1 0 0 1 1 0
 [86] 0 0 0 0 1 1 1 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 0 0 0 1 0 0 1 1 1 0 0 0 0 1 0 0 0 0 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0 1 1 0 0 0
[171] 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 1 0 1 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1
[256] 0
Levels: 0 1

> table(pred1,test$Outcome,dnn = c("predicted","Actual"))


         Actual
predicted   0   1
        0 144  22
        1  35  55
> output<- cbind(test,pred1)
> View(output)
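
The table() output above can be turned into summary measures. The sketch below copies the four counts from that output into a matrix and derives accuracy, precision and recall for class 1; the variable names are chosen here for illustration.

```r
# Confusion matrix copied from the table() output above
# (rows = predicted class, columns = actual class).
cm <- matrix(c(144, 35, 22, 55), nrow = 2,
             dimnames = list(predicted = c("0", "1"), Actual = c("0", "1")))

accuracy  <- sum(diag(cm)) / sum(cm)         # correct predictions / all test rows
precision <- cm["1", "1"] / sum(cm["1", ])   # of the rows predicted 1, share truly 1
recall    <- cm["1", "1"] / sum(cm[, "1"])   # of the rows truly 1, share predicted 1

round(c(accuracy = accuracy, precision = precision, recall = recall), 4)
```

For this table the accuracy is 199/256, about 0.777, while only 55 of the 77 actual positives are found.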

Assignment No 3

Aim:
Trip History Analysis: Use trip history dataset that is from a bike sharing
service in the United States. The data is provided quarter-wise from 2010 (Q4)
onwards. Each file has 7 columns. Predict the class of user. Sample Test data set
available here https://www.capitalbikeshare.com/trip-history-data.

Problem Statement:
Analyze trip history using the trip history dataset from a bike sharing service
in the United States. The data is provided quarter-wise from 2010 (Q4) onwards. Each file
has 7 columns. Predict the class of user.

Objective:
Predict the result from previous data.

Theory:
R:
R is a powerful language used widely for data analysis and statistical computing. It
was developed in the early 1990s. Since then, continuous efforts have been made to improve
R’s user interface. The journey of the R language from a rudimentary text editor to the
interactive RStudio and, more recently, Jupyter Notebooks has engaged many data science
communities across the world.

This was possible only because of generous contributions by R users globally. The inclusion
of powerful packages has made R more capable over time. Packages such as
dplyr, tidyr, readr, data.table, SparkR and ggplot2 have made data manipulation, visualization
and computation much faster.

Advantages of R:

1. The style of coding is quite easy.
2. It’s open source. No need to pay any subscription charges.
3. Availability of instant access to over 7800 packages customized for various
computation tasks.
4. The community support is overwhelming. There are numerous forums to help you
out.
5. High-performance computing is available (requires additional packages).
6. It is one of the most sought-after skills at analytics and data science companies.

The interface of R Studio:

1. R Console: This area shows the output of the code you run. You can also write
code directly in the console. Code entered directly in the R console cannot be traced
later. This is where the R script comes into use.
2. R Script: As the name suggests, here you get space to write code. To run it,
simply select the line(s) of code and press Ctrl + Enter. Alternatively, you can click
the little ‘Run’ button located at the top right corner of the R Script pane.
3. R environment: This space displays the set of external elements added. This includes
data sets, variables, vectors, functions etc. To check whether data has been loaded properly
in R, always look at this area.
4. Graphical Output: This space displays the graphs created during exploratory data
analysis. Besides graphs, you can select packages and seek help from R’s embedded
official documentation.

Library:

1. caTools: Tools: moving window statistics, GIF, Base64, ROC AUC, etc.

Contains several basic utility functions including: moving (rolling, running) window statistic
functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost
classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.

2. e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien

Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support
vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...

3. rpart: Recursive Partitioning and Regression Trees

Recursive partitioning for classification, regression and survival trees. An implementation of
most of the functionality of the 1984 book by Breiman, Friedman, Olshen and Stone.

data: an optional data frame in which to interpret the variables named in the formula.

method: one of "anova", "poisson", "class" or "exp". If method is missing then the routine
tries to make an intelligent guess. If y is a survival object, then method = "exp" is assumed,
if y has 2 columns then method = "poisson" is assumed, if y is a factor then method =
"class" is assumed, otherwise method = "anova" is assumed. It is wisest to specify the
method directly, especially as more criteria may be added to the function in
future. Alternatively, method can be a list of functions named init, split and eval. Examples
are given in the file tests/usersplits.R in the sources, and in the vignette ‘User Written Split
Functions’.
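
As a minimal sketch of the data and method arguments, the following fits a classification tree on the built-in iris data (not the trip data) with the method stated explicitly.

```r
library(rpart)  # recommended package shipped with standard R installations

# Species is a factor, so method = "class" would also be guessed
# automatically; it is stated explicitly, as the text advises.
fit <- rpart(Species ~ ., data = iris, method = "class")

fit$method                                        # method used by the fit
pred <- predict(fit, newdata = iris, type = "class")
train_accuracy <- mean(pred == iris$Species)      # accuracy on the training rows
train_accuracy
```

The accuracy here is measured on the rows used for fitting, so it is optimistic; a held-out test set gives a fairer estimate.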

CSV (comma-separated values)

In computing, a comma-separated values (CSV) file is a delimited text file that uses a
comma to separate values. A CSV file stores tabular data (numbers and text) in plain text.
Each line of the file is a data record. Each record consists of one or more fields, separated
by commas. The use of the comma as a field separator is the source of the name for this file
format.

How to read csv:

mydata<-read.csv(file="F:/SEM7/DA/KomalN/Ass2/tripdata.csv",header=TRUE,sep=",")

The above reads the file tripdata.csv into a new data frame called mydata.
header=TRUE specifies that the data includes a header row, and sep="," specifies that the
fields are separated by commas. Both are already the defaults of read.csv; they are written
out here only to be explicit.

sample.split: Split Data into Test and Train Set

temp_field <- sample.split(mydata$Member.type,SplitRatio=0.7)

Splits the data in vector Y into two sets in a predefined ratio while preserving the relative
ratios of the different labels in Y. It is used to split the data for classification into train
and test subsets. Y should be the vector of class labels (here mydata$Member.type). If a
whole data frame is passed instead, the result has one flag per column rather than one per
row, and subset() then recycles those flags across the rows.

SplitRatio
Splitting ratio:

I. if (0<=SplitRatio<1) then SplitRatio fraction of points from Y will be set to TRUE
II. if (SplitRatio==1) then one random point from Y will be set to TRUE
III. if (SplitRatio>1) then SplitRatio number of points from Y will be set to TRUE

Train Data: The predictive model is built on the train data set. The train data always
includes the ‘response variable’, because the model learns from it.

train <- subset(mydata, temp_field==TRUE)

Test Data: Once the model is built, its accuracy is ‘tested’ on test data. The test set usually
contains fewer observations than the train data set. The ‘response variable’ is not given to
the model during prediction; it is kept aside and compared with the predictions.

test <- subset(mydata, temp_field==FALSE)

Summary

A very useful multipurpose function in R is summary(X), where X can be one of any number
of objects, including datasets, variables, and linear models, just to name a few. When used,
the command provides summary data related to the individual object that was fed into it.
Thus, the summary function has different outputs depending on what kind of object it takes as
an argument.
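
A short sketch of how the output of summary() changes with the type of its argument. The vectors are made-up illustrative values, and the linear model uses the built-in cars data.

```r
durations <- c(60, 440, 769, 1323)                        # made-up numeric values
member_type <- factor(c("Casual", "Member", "Member"))    # made-up factor

num_summary <- summary(durations)      # numeric vector: min, quartiles, mean, max
fac_summary <- summary(member_type)    # factor: number of rows per level
lm_summary  <- summary(lm(dist ~ speed, data = cars))     # model: coefficients, R-squared

num_summary
fac_summary
lm_summary$r.squared
```

The same call therefore returns six-number statistics, level counts or a model report, depending only on the class of the object passed in.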

Head: Return The First Or Last Part Of An Object

Returns the first or last parts of a vector, matrix, table, data frame or function.
Since head() and tail() are generic functions, they may also have been extended to other
classes.

Predict: Predicted values based on a fitted model object

predict() is a generic function; the program below calls it on the fitted rpart tree.

newdata Data frame containing the observations to predict
type Type of prediction; for an rpart classification tree, "class" returns the predicted class labels

cbind: Combine R Objects by Rows or Columns

Take a sequence of vector, matrix or data-frame arguments and combine by columns or rows,
respectively. These are generic functions with methods for other R classes.

printcp: Displays the CP Table for a Fitted rpart Object

Displays the complexity parameter (cp) table of a fitted rpart object, with the number of
splits, the relative error, the cross-validated error (xerror) and its standard deviation (xstd)
for each value of cp.

prune: Cost-Complexity Pruning of an rpart Object

Determines a nested sequence of subtrees of the supplied rpart object by
recursively snipping off the least important splits, based on the complexity parameter (cp).

cp: Complexity parameter to which the rpart object will be trimmed.
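
The use of the cp table for pruning can be sketched on the built-in iris data. The control settings (cp = 0, minsplit = 2) are an assumption made here only to grow a deliberately large tree, so that pruning has something to remove.

```r
library(rpart)

set.seed(1)  # the cross-validated error (xerror) depends on random folds
fit <- rpart(Species ~ ., data = iris, method = "class",
             control = rpart.control(cp = 0, minsplit = 2))

printcp(fit)                                  # CP, nsplit, rel error, xerror, xstd
best <- which.min(fit$cptable[, "xerror"])    # row with the lowest xerror
best_cp <- fit$cptable[best, "CP"]
pruned <- prune(fit, cp = best_cp)            # cut the tree back to that cp

c(full = nrow(fit$frame), pruned = nrow(pruned$frame))   # node counts
```

Choosing the row with the smallest xerror is one common rule; the pruned tree never has more nodes than the full tree.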

Program:

library(e1071)
library(caTools)  # sample.split()
library(rpart)    # rpart(), printcp(), prune()

mydata<-read.csv(file="/home/SIT/Desktop/tripdata.csv",header=TRUE,sep=",")
View(mydata)

# Keep columns 1, 4, 6 and 9: Duration, Start.station.number,
# End.station.number and Member.type (the output class).
subset_mydata <- mydata[,c(1,4,6,9)]

# Split on the class label, not on the whole data frame, so that one flag is
# produced per row and the Casual/Member ratio is preserved.
temp_field <- sample.split(subset_mydata$Member.type,SplitRatio=0.9)
train <- subset(subset_mydata, temp_field==TRUE)
test <- subset(subset_mydata, temp_field==FALSE)
summary(train)
summary(test)
head(train)
head(test)

# Fit a classification tree for Member.type on the remaining columns.
fit <- rpart(Member.type~.,data=train,method="class")
plot(fit)
text(fit)

# Predict on the test set without its last column (the class label).
pred<- predict(fit,newdata=test[,-4],type="class")
mean(pred==test$Member.type)  # share of correct predictions
output <- cbind(test,pred)
View(output)

# Prune at the cp value with the lowest cross-validated error (xerror).
printcp(fit)
opt<- which.min(fit$cptable[,"xerror"])
cp <- fit$cptable[opt,"CP"]
pruned_model<-prune(fit,cp=cp)

# Plot the pruned tree.
plot(pruned_model)
text(pruned_model)

Output:

> library(e1071)
> library(caTools)
> library(rpart)
> mydata<-read.csv(file="/home/SIT/Desktop/tripdata.csv",header=TRUE,sep=",")
> View(mydata)

> subset_mydata <- mydata[,c(1,4,6,9)]


> temp_field <- sample.split(subset_mydata,SplitRatio=0.9)
> train <- subset(subset_mydata, temp_field==TRUE)
> test <- subset(subset_mydata, temp_field==FALSE)

> summary(train)
    Duration     Start.station.number End.station.number Member.type
 Min.   :   60   Min.   :31000        Min.   :31000      Casual: 76741
 1st Qu.:  440   1st Qu.:31208        1st Qu.:31212      Member:203845
 Median :  769   Median :31258        Median :31257
 Mean   : 1263   Mean   :31322        Mean   :31322
 3rd Qu.: 1323   3rd Qu.:31500        3rd Qu.:31408
 Max.   :85674   Max.   :32227        Max.   :32227
> summary(test)
    Duration     Start.station.number End.station.number Member.type
 Min.   :   60   Min.   :31000        Min.   :31000      Casual:25562
 1st Qu.:  443   1st Qu.:31208        1st Qu.:31212      Member:67967
 Median :  769   Median :31258        Median :31257
 Mean   : 1260   Mean   :31321        Mean   :31322
 3rd Qu.: 1320   3rd Qu.:31411        3rd Qu.:31408
 Max.   :86181   Max.   :32227        Max.   :32227
> head(train)
  Duration Start.station.number End.station.number Member.type
2      578                31232              31609      Casual
3      580                31232              31609      Casual
4      606                31104              31509      Member
6      175                31104              31117      Member
7     1605                31264              31641      Casual
8     1591                31264              31641      Casual
> head(test)
   Duration Start.station.number End.station.number Member.type
1       679                31302              31307      Member
5       582                31129              31118      Member
9       509                31116              31203      Member
13     1226                31609              31230      Casual
17      209                31204              31275      Member
21     5012                31084              31084      Casual
> fit <- rpart(train$Member.type~.,data=train,method="class")
> plot(fit)
> text(fit)

> pred<- predict(fit,newdata=test[,-4],type=("class"))


> mean(pred==test$Member.type)
[1] 0.8073966
> output <- cbind(test,pred)
> View(output)

> printcp(fit)

Classification tree:
rpart(formula = train$Member.type ~ ., data = train, method = "class")

Variables actually used in tree construction:
[1] Duration             End.station.number   Start.station.number

Root node error: 76741/280586 = 0.2735

n= 280586

        CP nsplit rel error  xerror      xstd
1 0.229617      0   1.00000 1.00000 0.0030768
2 0.017259      1   0.77038 0.78160 0.0028298
3 0.013174      4   0.71603 0.70976 0.0027301
4 0.010000      5   0.70286 0.70287 0.0027200
> opt<- which.min(fit$cptable[,"xerror"])
> cp <- fit$cptable[opt,"CP"]
> pruned_model<-prune(fit,cp)
> plot(fit)

> text(fit)

Assignment No 4

Aim:
Twitter Data Analysis: Use Twitter data for sentiment analysis. The dataset is 3MB in
size and has 31,962 tweets. Identify the tweets which are hate tweets and which are not.
Sample Test data set available here
https://datahack.analyticsvidhya.com/contest/practice-problem-twitter-sentiment-analysis/

Prerequisites: Fundamentals of Python Programming Languages

Objective: To learn the concept of natural language processing (NLP) tasks such as
part-of-speech tagging, noun phrase extraction, sentiment analysis, and
classification.
Theory: I. Python regular expression library: Regular expressions are used to
identify whether a pattern exists in a given sequence of characters
(string) or not. They help in manipulating textual data, which is often a
prerequisite for data science projects that involve text mining. Common
applications of regular expressions include validating the format of
e-mail addresses or passwords on the server side during registration, and
parsing text data files to find, replace or delete certain strings.

II. Python Tweepy library: This library provides access to all of the
Twitter RESTful API methods. Each method can accept various
parameters and return responses.

III. Python TextBlob library: TextBlob is a Python (2 and 3) library for
processing textual data. It provides a consistent API for diving into
common natural language processing (NLP) tasks such as part-of-speech
tagging, noun phrase extraction, sentiment analysis, and more.

IV. Authentication: create an OAuthHandler object. Tweepy supports OAuth
authentication. Authentication is handled by the tweepy AuthHandler
class.

V. Utility function in Python: This function provides sentiment
analysis of tweets.
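
The program of this assignment is written in R, where the same regular expression ideas are available through the base functions grepl() and gsub(). The sketch below shows a pattern test and a replacement; the e-mail pattern is a simplified illustration rather than a complete validator, and remove_url() is a hypothetical helper that mirrors the URL clean-up step of the program.

```r
# Pattern test: does the string look like an e-mail address?
email_pattern <- "^[[:alnum:]._-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}$"
is_email <- grepl(email_pattern, c("student@example.com", "not-an-email"))
is_email

# Replacement: drop tokens that start with "http". After punctuation has been
# removed from a tweet, a URL is left as one such token.
remove_url <- function(x) gsub("http[[:alnum:]]*", "", x)
cleaned <- remove_url("great ride today httpstcoabc123")
cleaned
```

grepl() answers whether the pattern occurs, while gsub() rewrites every match, which is the operation the tweet clean-up relies on.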

Facilities: Linux Operating Systems, Python editor.

Input:
Structured Dataset : Twitter Dataset
File: Twitter.csv
Output:
1. Sentiment analysis of twitter dataset.
2. Categorization of tweets as positive and negative tweets.

Conclusion: Hence, we have studied sentiment analysis of the Twitter dataset to
classify the tweets from the dataset.
Questions:
1. What is sentiment analysis?
2. Which API is required to handle authentication?
3. What is the syntax of the utility function?
4. What is the function of the TextBlob library?
5. What is the re library in Python?

Program:

library(dplyr)
library(tibble)
library(twitteR)
library(graphics)
library(purrr)
library(stringr)
library(tm)
library(syuzhet)
library(gapminder)
library(httpuv)
library(openssl)
library(RCurl)
library(RInside)
library(Rcpp)
library(textclean)
library(SnowballC)

#Connect to Twitter API:
# Credentials are read from environment variables instead of being written
# into the script. The variable names below are placeholders chosen here.
api_key <- Sys.getenv("TWITTER_API_KEY")
api_secret <- Sys.getenv("TWITTER_API_SECRET")
access_token <- Sys.getenv("TWITTER_ACCESS_TOKEN")
access_token_secret <- Sys.getenv("TWITTER_ACCESS_TOKEN_SECRET")
setup_twitter_oauth(api_key,api_secret,access_token,access_token_secret)
#Get tweets:
prat_tweets <- userTimeline("prattprattpratt", n = 250)
oprah_tweets <- userTimeline("Oprah", n = 250)
neil_tweets <- userTimeline("neiltyson", n = 250)
mar_tweets <- userTimeline("billmaher", n = 250)
kutch_tweets <- userTimeline("aplusk", n = 250)

# Convert each status object to a one-row data frame and stack the rows.
tweets<- as_tibble(map_df(c(prat_tweets,oprah_tweets,neil_tweets,
mar_tweets,kutch_tweets),as.data.frame))

write.csv(tweets, file="tweets.csv", row.names=FALSE)

#Read in data:
setwd("C:/Users/mateo/Documents/Repo/text-analysis")
tweets<-read.csv("tweets.csv")
#Clean up data:
twitterCorpus <-Corpus(VectorSource(tweets$text))
inspect(twitterCorpus[1:10])

twitterCorpus<- tm_map(twitterCorpus, content_transformer(tolower))
twitterCorpus<- tm_map(twitterCorpus,removeWords,stopwords("en"))
twitterCorpus<- tm_map(twitterCorpus,removeNumbers)
twitterCorpus<- tm_map(twitterCorpus,removePunctuation)

# Punctuation is already removed, so a URL is now one token starting with "http".
removeURL<- function(x) gsub("http[[:alnum:]]*", "", x)
twitterCorpus<- tm_map(twitterCorpus,content_transformer(removeURL))

# Remove leftover tokens starting with "edua".
removeEdua<- function(x) gsub("edua[[:alnum:]]*", "", x)
twitterCorpus<- tm_map(twitterCorpus,content_transformer(removeEdua))

# remove non-ASCII characters (curly quotes and ellipsis)
# using function from package "textclean"
removeNonAscii<-function(x) textclean::replace_non_ascii(x)
twitterCorpus<-tm_map(twitterCorpus,content_transformer(removeNonAscii))

twitterCorpus<- tm_map(twitterCorpus,removeWords,c("amp","ufef",
"ufeft","uufefuufefuufef","uufef","s"))

twitterCorpus<- tm_map(twitterCorpus,stripWhitespace)

inspect(twitterCorpus[1:10])

# stem corpus after sentiment analysis (given my sentiment dictionary choice), but before
# cluster analysis

#Sentiment analysis:

# find count of 8 emotional sentiments
emotions<-get_nrc_sentiment(twitterCorpus$content)

barplot(colSums(emotions),cex.names = .7,
col = rainbow(10),
main = "Sentiment scores for tweets"
)

# sentiment positivity rating
get_sentiment(twitterCorpus$content[1:10])

sent<-get_sentiment(twitterCorpus$content)
sentimentTweets<-dplyr::bind_cols(tweets,data.frame(sent))

# mean of sentiment positivity over rows i to n
meanSent<-function(i,n){
mean(sentimentTweets$sent[i:n])
}

(scores<-c(prat=meanSent(1,250),
oprah=meanSent(251,500),
neil=meanSent(501,750),
maher=meanSent(751,849),
astk=meanSent(850,1002)))

#Cluster analysis:

# convert to stem words
twitterCorpus<-tm_map(twitterCorpus,stemDocument)

# build document term matrix

dtm<-DocumentTermMatrix(twitterCorpus)
dtm
mat<-as.matrix(dtm)

# create distance matrix

d<-dist(mat)

# input distance matrix into hclust function using method "ward.D"

groups<-hclust(d,method="ward.D")
plot(groups,hang=-1)

cut<-cutree(groups,k=6)
newMat<-dplyr::bind_cols(tweets,data.frame(cut))

table(newMat$screenName,newMat$cut)
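
The clustering steps (distance matrix, hclust with method "ward.D", cutree) can be tried on a toy document-term matrix without any Twitter access. The four documents and their term counts below are made up for illustration.

```r
# Toy document-term matrix: rows are documents, columns are term counts.
mat <- rbind(
  doc1 = c(bike = 3, ride = 2, happy = 0, sad = 0),
  doc2 = c(bike = 2, ride = 3, happy = 0, sad = 0),
  doc3 = c(bike = 0, ride = 0, happy = 3, sad = 1),
  doc4 = c(bike = 0, ride = 0, happy = 2, sad = 2)
)

d <- dist(mat)                            # Euclidean distances between documents
groups <- hclust(d, method = "ward.D")    # hierarchical clustering
clusters <- cutree(groups, k = 2)         # cut the dendrogram into 2 clusters
clusters
```

Documents that share terms end up in the same cluster, which is what the final table() call of the program checks per screen name.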
