Recommender System With Sentiment Analysis: Summer Internship Report
On
(Online Internship)
Submitted by
L. J. INSTITUTE OF ENGINEERING AND TECHNOLOGY
CERTIFICATE
This is to certify that the Summer Internship “Recommender System with Sentiment
Analysis” submitted by Bhargav Dobariya, towards the fulfillment of the requirements for
the degree of Bachelor of Engineering in Information & Communication Technology of L.J.
Institute of Engineering and Technology, Ahmedabad, under the Gujarat Technological
University, Ahmedabad is the record of work carried out by him/her under my supervision
and guidance. In my opinion, the submitted work has reached a level required for being
accepted for examination. The results embodied in this project, to the best of my knowledge,
haven’t been submitted to any other university or institution for award of any degree or
diploma.
Student’s Declaration
Internship Information
Joining Letter
Completion Certificate
INDEX
1 Acknowledgment VII
2 Abstract VIII
3 Table of Contents IX
4 List of Figures X
5 List of Tables XI
ACKNOWLEDGEMENT
I would like to express my deepest gratitude to all those who made it possible for me
to complete this internship. I give special thanks to our Assistant Professor,
Prof. Bhautik Trivedi, whose stimulating suggestions and encouragement helped me
to coordinate the internship, especially in drafting this report.
Furthermore, I would like to acknowledge with much appreciation the crucial role of
the Head of Department, who gave permission to use all the required equipment and
the necessary material to complete the task. Last but not least, many thanks go to
the teachers, my friends and my family, who invested their full effort in guiding me
towards achieving the goal.
I also appreciate the guidance given by Mr. Raj, a developer at Brainy Beam, and by
the internship panel, who advised and guided me at every moment of the internship.
ABSTRACT
Data science and analytics play a significant role today across almost every
industry, for example finance, e-commerce, business, education, and government.
Organizations now take a 360-degree view of their customers, analysing their
behaviour and interests in order to make decisions that serve them. This analysis
is commonly done with programming languages such as Python, one of the most
versatile languages for working with data.
Netflix is a well-known example of a company that reached the top largely through
data science, by analysing the interests of each of its customers. Key terms used
in this report are: Data Visualization, Anaconda Jupyter Notebook, Exploratory Data
Analysis, Machine Learning, Data Wrangling, and evaluation using the Surprise
library (scikit-surprise).
Table of Contents
List of Figures
List of Tables
190320132014 CHAPTER 2
Website: www.brainybeam.com
About Us
At Brainy Beam, we see Innovation as a clear differentiator. Innovation, along with focus
on deep, long-lasting client relationships and strong domain expertise, drives every facet
of our day-to-day operations.
Brainy Beam Technologies was founded with a vision to address growing businesses'
need to reduce time to market and the cost of developing and maintaining unique,
customized web and mobile solutions. We are uniquely and strategically positioned to
partner with startups and leading brands to help them expand their business and to offer
the most effective and cost-efficient solutions that bring revenue and value to their
business.
Vision
To become the most trusted and preferred offshore IT solutions partner for Startups,
SMBs, and Enterprises through innovation and technology leadership. Understanding
your ambitious vision, honing in on its essence, creating a design strategy, and knowing
how to technically execute it is what we do best. Our promise? The integrity of your
vision will be maintained and we'll enhance it to best reach your target customers. With
our primary focus on creating amazing user experiences, we'll help you understand the
tradeoffs, prioritize features, and distill valuable functionality. It's an art form we care
about getting right.
LJIET-ICT
12 | P a g e
An information system is an integrated set of components for collecting, storing, and processing
data and for providing information, knowledge, and digital products. Business firms and
other organizations rely on information systems to carry out and manage their operations,
interact with their customers and suppliers, and compete in the marketplace. Information
systems are used to run interorganizational supply chains and electronic markets.
The main components of information systems are computer hardware and software,
telecommunications, databases and data warehouses, human resources, and procedures.
The hardware, software, and telecommunications constitute information technology (IT),
which is now ingrained in the operations and management of organizations.
190320132014 CHAPTER 3
ii. Also explained how to install and run Python, Jupyter Notebook
and other useful tools.
TASK 1:
Python Keywords:
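As a sketch, Python's standard `keyword` module lists the reserved words of the running interpreter; the exact list depends on the Python version installed:

```python
import keyword

# kwlist holds the reserved keywords of the running Python version
print(len(keyword.kwlist), "keywords")
print(keyword.kwlist)

# iskeyword() checks a single string
print(keyword.iskeyword("for"))    # True
print(keyword.iskeyword("print"))  # False: print is a built-in function, not a keyword
```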
Python Operators:
Python divides the operators in the following groups:
Arithmetic operators
Assignment operators
Comparison operators
Logical operators
Identity operators
Membership operators
Bitwise operators
Arithmetic operators are used with numeric values to perform common mathematical
operations:
Operator  Name            Example
+         Addition        x + y
-         Subtraction     x - y
*         Multiplication  x * y
/         Division        x / y
%         Modulus         x % y
**        Exponentiation  x ** y
//        Floor division  x // y
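A short sketch of the arithmetic operators in action, using the arbitrary sample values x = 7 and y = 3:

```python
x, y = 7, 3

# Each arithmetic operator applied to the same pair of integers
print(x + y)   # 10
print(x - y)   # 4
print(x * y)   # 21
print(x / y)   # true division always returns a float (about 2.333)
print(x % y)   # 1: remainder of 7 / 3
print(x ** y)  # 343: 7 to the power 3
print(x // y)  # 2: quotient rounded down
print(-x // 2) # -4: floor division rounds toward negative infinity, not toward zero
```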
Assignment operators:
Operator  Example   Same as
=         x = 5     x = 5
+=        x += 5    x = x + 5
-=        x -= 5    x = x - 5
*=        x *= 5    x = x * 5
/=        x /= 5    x = x / 5
%=        x %= 5    x = x % 5
//=       x //= 5   x = x // 5
**=       x **= 5   x = x ** 5
|=        x |= 5    x = x | 5
^=        x ^= 5    x = x ^ 5
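The augmented assignment operators can be traced step by step on a single variable; the starting value and operands below are arbitrary:

```python
x = 5
x += 5   # x = x + 5  -> 10
x -= 3   # x = x - 3  -> 7
x *= 2   # x = x * 2  -> 14
x //= 4  # x = x // 4 -> 3
x **= 2  # x = x ** 2 -> 9
x %= 5   # x = x % 5  -> 4
x |= 1   # x = x | 1  -> 5  (bitwise OR: 0b100 | 0b001 = 0b101)
x ^= 3   # x = x ^ 3  -> 6  (bitwise XOR: 0b101 ^ 0b011 = 0b110)
print(x)  # 6

y = 10
y /= 4   # /= always produces a float
print(y)  # 2.5
```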
Comparison operators:
Operator  Name       Example
==        Equal      x == y
!=        Not equal  x != y
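Comparison operators always evaluate to a bool. A minimal sketch with arbitrary values, including the remaining standard comparisons and Python's chained form:

```python
x, y = 5, 8

# Every comparison operator returns True or False
print(x == y)  # False
print(x != y)  # True
print(x > y)   # False
print(x < y)   # True
print(x >= 5)  # True
print(x <= 4)  # False

# Comparisons can be chained: this means (1 < x) and (x < 10)
print(1 < x < 10)  # True
```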
Logical operators (and, or, not) are used to combine conditional statements, e.g. not(x < 5 and x < 10).
Identity operators are used to compare the objects, not if they are equal, but if they are
actually the same object, with the same memory location:
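The difference between `==` (equal values) and `is` (same object) shows up clearly with lists; the sketch also covers the membership operators `in` and `not in`. The list contents are arbitrary:

```python
a = ["apple", "banana"]
b = ["apple", "banana"]
c = a

# == compares values; `is` compares object identity
print(a == b)      # True: same contents
print(a is b)      # False: two separate list objects
print(a is c)      # True: c refers to the same object as a
print(a is not b)  # True

# Membership operators test whether a value occurs in a sequence
print("banana" in a)      # True
print("cherry" not in a)  # True
```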
Bitwise operators:
Operator  Name                  Description
<<        Zero fill left shift  Shift left by pushing zeros in from the right
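A sketch of the zero fill left shift next to the other common bitwise operators, on the arbitrary value 5 (binary 101):

```python
x = 5  # binary 0b101

print(bin(x << 1))  # 0b1010: shift left by 1, a zero is pushed in from the right (x * 2)
print(x << 3)       # 40: shifting left by n multiplies a non-negative int by 2**n
print(x >> 1)       # 2: right shift drops the rightmost bit
print(x & 3)        # 1: 0b101 & 0b011 = 0b001
print(x | 2)        # 7: 0b101 | 0b010 = 0b111
print(x ^ 1)        # 4: 0b101 ^ 0b001 = 0b100
print(~x)           # -6: bitwise NOT, equal to -(x + 1)
```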
TASK 2:
Structured, Semi-structured and Unstructured Data:
Structured data
Structured data is information that has been formatted and transformed into a well-defined
data model. The raw data is mapped into predesigned fields that can then be easily extracted
and read with SQL. SQL relational databases, consisting of tables with rows and columns,
are the classic example of structured data.
The relational model of this data format uses storage efficiently because it minimizes data
redundancy. However, this also means that structured data is more interdependent and less
flexible. Now let's look at more examples of structured data.
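To make "predesigned fields read through SQL" concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `customers` table and its rows are hypothetical:

```python
import sqlite3

# In-memory relational database: every row must fit the predefined columns
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Asha", "Ahmedabad"), ("Ravi", "Surat"), ("Meera", "Ahmedabad")],
)

# Because every row has the same fields, SQL can filter and sort on them directly
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Ahmedabad",)
).fetchall()
print(rows)  # [('Asha',), ('Meera',)]
conn.close()
```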
Semi-structured data
Your data sets may not always be either structured or unstructured; semi-structured (partially
structured) data is another category between the two. Semi-structured data has some
consistent and definite characteristics, but it does not conform to a rigid structure such as
that required by relational databases. Organizational properties like metadata or semantic
tags are used with semi-structured data to make it more manageable; however, it still
contains some variability and inconsistency.
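JSON is a common semi-structured format: keys act as self-describing tags, but records from the same source need not share the same fields. A minimal sketch with hypothetical records, using only the standard `json` module:

```python
import json

# Two records from the same source: both have a name, but the other fields vary
raw = '''[
  {"name": "Asha", "email": "asha@example.com", "tags": ["new"]},
  {"name": "Ravi", "phone": "9999999999"}
]'''

records = json.loads(raw)
for rec in records:
    # .get() tolerates missing fields instead of raising KeyError
    print(rec["name"], rec.get("email", "no email"), rec.get("tags", []))

# The set of fields differs between records, unlike rows of a SQL table
print([sorted(rec) for rec in records])
```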
Unstructured data
Unstructured data is data present in its raw form. It is difficult to process because of its
complex arrangement and formatting. Unstructured data management takes data in many
forms, including social media posts, chats, satellite imagery, IoT sensor data, emails, and
presentations, and organizes it in a logical, predefined manner in data storage. In contrast,
structured data follows predefined data models and is easy to analyze; examples include
alphabetically arranged customer names and properly organized credit card numbers. After
understanding the definition of unstructured data, let's look at some examples.
Screenshot:
Output:
TASK 4:
AIM: List out 5 methods each of list and set, and explain them with
examples.
1. List: Lists are built-in data types in Python that are used to store
multiple items in a single variable. The items are written inside square brackets [].
2. Set: Sets are also used to store multiple items in a single variable. A set
has no order and no index. The items are written inside curly braces { }.
3. Dictionary: Stores values as key: value pairs; ordered (Python 3.7+),
changeable (mutable), and does not allow duplicate keys.
LIST:
Example: a = ['Bha', 'r', 'gav']
Lists are built-in data types in Python that are used to store multiple items in a
single variable. Lists are ordered (items keep their position unless the list is
modified), the items in a list are changeable (mutable), and a list allows
duplicate values.
LIST Methods:
- append(x): add an item to the end of the list
- insert(i, x): insert an item at a given position i
- remove(x): remove the first item from the list whose value is equal to x
- copy(): return a (shallow) copy of the list
- count(x): return the number of elements with the specified value
- reverse(): reverse the list in place
CODE:
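a = ['Bha', 'r', 'gav']
b = a[-3:-1]  # negative slicing: from index -3 up to (but not including) index -1
print(b)      # ['Bha', 'r']
c = a[-1:-3]  # start lies after stop with the default step of 1, so the slice is empty
print(c)      # []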
SCREENSHOT:
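The list methods named above can be demonstrated on one small list; the fruit names are arbitrary sample data:

```python
fruits = ["apple", "banana"]

fruits.append("cherry")    # add to the end -> ['apple', 'banana', 'cherry']
fruits.insert(1, "mango")  # insert at index 1 -> ['apple', 'mango', 'banana', 'cherry']
fruits.append("apple")     # lists allow duplicate values
fruits.remove("apple")     # removes only the FIRST 'apple'
print(fruits)              # ['mango', 'banana', 'cherry', 'apple']

backup = fruits.copy()         # shallow copy: a new list object
print(fruits.count("apple"))   # 1
fruits.reverse()               # reverses in place and returns None
print(fruits)                  # ['apple', 'cherry', 'banana', 'mango']
print(backup)                  # unchanged: ['mango', 'banana', 'cherry', 'apple']
```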
SET:
- Sets are also used to store multiple items in a single variable.
- A set has no order and no index.
- Set items cannot be changed in place once the set is created (the items themselves
must be immutable), although items can be added or removed.
- Duplicate values are not allowed in a set.
Set Methods:
a) add(): adds an element to the set
b) discard(): removes an element from the set (no error if it is absent)
c) union(): returns the union of sets
d) update(): adds the elements of another set or iterable to the set
e) clear(): removes all elements from the set
CODE:
# set of vowels
vowels = {'a', 'e', 'i', 'u'}
print(vowels)
# Adding 'o'
vowels.add('o')
print('Vowels are:',vowels)
#Discarding 'o'
vowels.discard('o')
print('Vowels are:',vowels)
#union
A2 = {'a', 'c', 'd'}
B2 = {'c', 'd', 2 }
print('A U B =', A2.union(B2))
#update
A3 = {'a', 'b'}
B3 = {1, 2, 3}
A3.update(B3)  # update() modifies A3 in place and returns None
print('A =', A3)
#clear
vowels.clear()
print('Vowels (after clear):', vowels)
SCREENSHOT:
TASK 5:
AIM: List out 5 methods of dictionary and explain them with
examples.
Dictionaries:
- Store data values as key: value pairs
- Ordered (Python 3.7+), changeable (mutable), and do not allow
duplicate keys
Dictionary Methods:
CODE:
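#get(): returns the value for a key (or None if the key is missing)
person = {'name': 'Jainish', 'age': 21}
print('Name: ', person.get('name'))
print('Age: ', person.get('age'))
#items(): returns a view of the (key, value) pairs
print(person.items())
#keys(): returns a view of the keys
print(person.keys())
#setdefault(): returns the value of a key; if the key is missing, inserts it with a default (None)
age = person.setdefault('age')
print('person = ',person)
print('Age = ',age)
#values(): returns a view of the values
print(person.values())
#clear(): removes all items from the dictionary
person.clear()
print(person)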
SCREENSHOT:
TASK 6:
Code:
import random as r

# random() returns a float in the half-open interval [0.0, 1.0)
print("Random Float: ", r.random())
# randint(a, b) returns an integer N with a <= N <= b (both ends inclusive)
print("Random Integer: ", r.randint(50, 150))
# randrange(start, stop, step) returns a random element of range(start, stop, step)
print("Random Range: ", r.randrange(11, 111, 11))
# choice(seq) returns one random element of a non-empty sequence (here, one character)
print("Random Choice: ", r.choice("element to be selected from here"))
a = ["s", "h", "u", "f", "f", "l", "e"]
r.shuffle(a)  # shuffle() reorders the list in place and returns None
print("Random Shuffle: ", a)
Output
SCREENSHOT:
TASK 7:
AIM: Build a student report card program that takes subjects and marks as
input and returns the sum of each student's marks using functions.
PROGRAM:
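def student():
    # Ask how many students to process, then collect their names and marks
    s = int(input("How many Students:"))
    stu_name(s)

def stu_name(x):
    # Read each student's name and number of subjects, then total their marks
    i = 1
    n = []
    while i <= x:
        name = input("Enter name:")
        sub = int(input("How many Subjects:"))
        total = marks(sub)
        print(name, "Total marks:", total)
        n.append(name)
        i = i + 1
    return n

def marks(z):
    # Read z subject marks and return their sum
    i = 1
    t = 0
    while i <= z:
        m = int(input("Enter marks:"))
        t = t + m
        i = i + 1
    return t

student()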
Output:
SCREENSHOT:
TASK 8:
AIM: Build a program to find the factorial of a number and check whether it
is prime and whether it is odd or even, from user input.
PROGRAM:
i = int(input("Enter Number:"))
a = i  # keep the original value; i is consumed by the factorial loop

# Factorial
fact = 1
if i == 0:
    print("Factorial of 0 is 1")
elif i < 0:
    print("Factorial doesn't exist for negative numbers")
else:
    while i > 0:
        fact = fact * i
        i = i - 1
    print("Factorial =", fact)

# Prime: a number greater than 1 is prime if no n in 2..a-1 divides it
if a <= 1:
    print(a, "is not a Prime number.")
else:
    for n in range(2, a):
        if a % n == 0:
            print(a, "is not a Prime number.")
            break
    else:  # runs only if the loop finished without break
        print(a, "is a Prime number.")

# Odd-Even
if a % 2 == 0:
    print(a, "is Even")
else:
    print(a, "is Odd")
OUTPUT:
SCREENSHOT:
OUTPUT:
SCREENSHOT:
AIM: List out 5 Python libraries and use 3 methods of each.
1. Pandas: An open-source library widely used in data science for the
analysis, manipulation, and cleaning of data.
2. NumPy: Its name stands for 'Numerical Python'. It is used for
numerical and mathematical operations on arrays.
3. Matplotlib: This library is used for plotting numerical data in data
analysis and for publishing high-quality figures such as graphs, pie
charts, scatter plots, histograms, etc.
4. SciPy: Its name stands for 'Scientific Python'. SciPy is an open-source
Python library used for scientific computation, data computation, and
high-performance computation.
5. Beautiful Soup: A library that parses HTML and XML documents and is used to
extract and collect information from websites.
TASK 11:
AIM: Explain the applications of pandas, and list out at least 5
pandas methods and explain them with examples.
1. Economics: Economics is in constant demand for data analysis. Analysing data to
find patterns and understand how the economy is growing in various sectors is
essential for economists. Therefore, many economists have started using Python and
pandas to analyse large datasets. Pandas provides a comprehensive set of tools, such
as DataFrames and file handling, that help greatly in accessing and manipulating data
to get the desired results. Through these applications of pandas, economists around
the world can analyse far larger datasets far more quickly than before.
2. Recommendation Systems: Most of us have used Spotify or Netflix and been amazed
by the recommendations these sites provide. Many such systems are built with machine
learning, including deep learning, and building them is one of the most important
applications of pandas. These models are mostly written in Python, where pandas is
one of the main libraries used to handle their data. Pandas is well suited to managing
large amounts of data, and a recommendation system is only possible by learning from
and handling large masses of data. Functions such as groupby() and map() help
considerably in building these systems.
3. Stock Prediction: The stock market is extremely volatile. However, that does not
mean that it cannot be modelled. With the help of pandas and a few other libraries
like NumPy and Matplotlib, we can build models that attempt to predict how stock
prices will move. This is possible because a large amount of historical stock data
describes how stocks have behaved, and by learning from this data a model can
estimate the next move with some accuracy. People can also automate the buying
and selling of stocks with the help of such prediction models.
4. Statistics: Statistics has benefited greatly from the various applications of pandas.
Since statistics deals with a lot of data, a library like pandas that specializes in data
handling helps in many different ways. Functions for the mean, median and mode are
only the most basic ones used in statistical calculations; pandas also supports many
more complex statistical operations.
5. Analytics: Analytics has become easier than ever with pandas. Whether it is website
analytics or analytics for some other platform, pandas handles it well with its data
manipulation and handling capabilities. The plotting capabilities of pandas (built on
Matplotlib) also play a big role in this field: pandas not only takes in and displays
data but also makes it easy to apply many functions over it.
6. Natural Language Processing: NLP (Natural Language Processing) has taken the
world by storm and is creating a lot of buzz. The main goal is to decipher human
language and the many nuances related to it. This is very difficult, but with the help
of pandas and scikit-learn it is easier to create an NLP model, which can then be
improved continuously with the help of various other libraries and their functions.
Methods of Pandas:
1. df = pd.read_csv('abc.csv'): reads a CSV file into a DataFrame.
2. df.columns: when you have a big dataset it can be hard to see all the columns;
the columns attribute lists all the column labels of the dataset.
3. df.drop(): removes the specified rows or columns.
4. df.insert(): inserts a new column at a given position.
5. len(df): returns the number of rows in the DataFrame.
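The five operations above can be tried on a tiny DataFrame. This sketch assumes pandas is installed and reads hypothetical CSV text through `io.StringIO` instead of a real `abc.csv` file:

```python
import pandas as pd
from io import StringIO

# Hypothetical CSV contents; pd.read_csv() also accepts a file path such as 'abc.csv'
csv_text = "name,marks,city\nAsha,82,Ahmedabad\nRavi,75,Surat\nMeera,91,Rajkot\n"
df = pd.read_csv(StringIO(csv_text))

print(list(df.columns))  # ['name', 'marks', 'city']

# drop() returns a new DataFrame unless inplace=True is passed
df = df.drop(columns=["city"])

# insert(loc, column, value) adds a column in place at position loc
df.insert(1, "grade", ["A", "B", "A"])
print(list(df.columns))  # ['name', 'grade', 'marks']

print(len(df))  # 3 rows
```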
TASK 12:
AIM: List out at least 10 metacharacters and use them in patterns for
email and phone number validation.
PROGRAM:
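import re

# Metacharacters used: ^ $ (anchors), [] (character class), ? (optional), + (one or more),
# {m,n} (repetition), \w (word character), \. (literal dot)
email_val = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'
num_val = r'^[0-9]{10}$'  # exactly ten digits

for i in range(3):  # allow up to three attempts
    # Input prompts assumed: the lines defining ip1 and ip2 were not preserved
    ip1 = input("Enter email: ")
    ip2 = input("Enter phone number: ")
    email_match = re.match(email_val, ip1)
    if email_match:
        print("match")
        num_match = re.match(num_val, ip2)
        if num_match:
            print("number valid")
        else:
            print("number invalid")
        break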
Output:
Screenshot:
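A non-interactive sketch of the same checks, wrapped in two hypothetical helper functions so they can be tried on sample strings. It uses `re.fullmatch()` for the phone number, which rejects strings longer than ten digits, whereas plain `re.match()` only anchors at the start of the string:

```python
import re

# Email pattern as in the program above
EMAIL_PATTERN = r'^[a-z0-9]+[\._]?[a-z0-9]+[@]\w+[.]\w{2,3}$'
PHONE_PATTERN = r'[0-9]{10}'

def is_valid_email(text):
    # Hypothetical helper: True if the whole string fits the email pattern
    return re.match(EMAIL_PATTERN, text) is not None

def is_valid_phone(text):
    # Hypothetical helper: fullmatch() requires the entire string to be ten digits
    return re.fullmatch(PHONE_PATTERN, text) is not None

for sample in ["bhargav.d@example.com", "bad@@mail", "a@b.c"]:
    print(sample, is_valid_email(sample))
for sample in ["9876543210", "98765", "98765432101"]:
    print(sample, is_valid_phone(sample))
```

Note that this email pattern is deliberately simple: for example, it requires at least two characters before the `@`, so it is a teaching example rather than a complete email validator.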
Program:
TASK 14:
AIM: Use an external library and an inbuilt library in one
Python program with user input.
PROGRAM:
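import math
from math import sqrt

A = int(input("Enter a number to find its Square Root: "))
B = int(input("Enter a number to find its sine value: "))
print(sqrt(A))                    # square root of A (A must be non-negative)
print(math.sin(math.radians(B)))  # B is read in degrees and converted to radians for sin()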
OUTPUT:
Screenshot:
TASK 15:
Aim: Build a password generator program containing numbers,
letters and special characters.
PROGRAM:
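import random
import string

# Character pools (assumed: the original definitions of these lists were not preserved)
lsc1 = list(string.ascii_lowercase)  # small letters
lsc2 = list(string.ascii_uppercase)  # capital letters
lss = list("!@#$%^&*()-_=+?")        # special characters (assumed set)
lsnu = list(string.digits)           # numbers

lsn = [1, 2]  # coin flip deciding which half gets the extra character when a length is odd
le = int(input("Enter Length: "))

# Split the length into letters (lec) and other characters (leo)
if random.choice(lsn) == 1:
    lec = le // 2
    leo = le - lec
else:
    leo = le // 2
    lec = le - leo

# Split the letters into small (le1) and capital (le2)
if random.choice(lsn) == 1:
    le1 = lec // 2
    le2 = lec - le1
else:
    le2 = lec // 2
    le1 = lec - le2

# Split the rest into special characters (le3) and digits (le4)
if random.choice(lsn) == 1:
    le3 = leo // 2
    le4 = leo - le3
else:
    le4 = leo // 2
    le3 = leo - le4

# random.sample() picks without repetition, so each count must not exceed its pool size
password = (
    random.sample(lsc1, le1)
    + random.sample(lsc2, le2)
    + random.sample(lss, le3)
    + random.sample(lsnu, le4)
)
random.shuffle(password)
print("Password Generated: " + "".join(password))
OUTPUT: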
SCREENSHOT:
TASK 16:
AIM: Convert multiple Series to a DataFrame, and find its
shape and the data type of each column.
PROGRAM:
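A minimal sketch of this conversion, assuming pandas is installed and using three small hypothetical Series that share the same default index:

```python
import pandas as pd

# Hypothetical Series that share the same index labels (0, 1, 2)
names = pd.Series(["Asha", "Ravi", "Meera"])
marks = pd.Series([82, 75, 91])
passed = pd.Series([True, True, True])

# A dict of Series becomes a DataFrame with one column per Series
df = pd.DataFrame({"name": names, "marks": marks, "passed": passed})

print(df.shape)   # (3, 3): (rows, columns)
print(df.dtypes)  # the data type of each column
```

Series with different index labels would be aligned on their index, with missing positions filled by NaN.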
TASK 17:
TASK 18:
AIM: Download and use the NLTK packages and corpus
data with an example.
PROGRAM:
TASK 19:
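AIM: Explain TF-IDF text vectorization with
equations.
TF-IDF Vectorization
TF-IDF (term frequency-inverse document frequency) helps us deal with the most
frequent words by penalizing words that appear in many documents. A TF-IDF
vectorizer weights each word's count in a document by how rare the word is across
all the documents, so common words get low weights and distinctive words get high weights.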
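Written as equations, using the common textbook definitions with a base-10 logarithm to match the cat example (library implementations differ; scikit-learn's `TfidfVectorizer`, for instance, uses a smoothed natural-log idf by default). Here $f_{t,d}$ is the number of times term $t$ occurs in document $d$, and $N$ is the number of documents in the collection $D$:

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}
\qquad
\mathrm{idf}(t, D) = \log_{10} \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}
\qquad
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
```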
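TF-IDF
Consider a document containing 100 words in which the word cat appears 3 times.
The term frequency (tf) for cat is then 3/100 = 0.03. Now, assume we have 10 million
documents and the word cat appears in one thousand of these. Then the inverse document
frequency (idf) is calculated as log10(10,000,000/1,000) = 4. The tf-idf weight is the
product of these two quantities: 0.03 × 4 = 0.12.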
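A from-scratch sketch of these definitions in plain Python (tf as count divided by document length, idf with a base-10 log, as in the cat example); the three-document corpus is hypothetical:

```python
import math
from collections import Counter

def tf(term, doc_tokens):
    # term frequency: occurrences of the term divided by the number of words in the document
    return Counter(doc_tokens)[term] / len(doc_tokens)

def idf(term, corpus):
    # inverse document frequency with a base-10 log;
    # raises ZeroDivisionError if the term occurs in no document
    containing = sum(1 for doc in corpus if term in doc)
    return math.log10(len(corpus) / containing)

def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)

# Worked example: 3 occurrences in 100 words, term in 1,000 of 10,000,000 documents
print(3 / 100 * math.log10(10_000_000 / 1_000))  # 0.12

# Tiny hypothetical corpus of tokenized documents
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran"],
]
print(round(tf_idf("cat", corpus[0], corpus), 4))  # 0.0587
print(tf_idf("the", corpus[0], corpus))            # 0.0: "the" is in every document
```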
TASK 20:
AIM: Load data from a JSON file and find the total number of words and
sentences in it.
PROGRAM:
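A minimal sketch using only the standard library. It assumes a hypothetical file layout (a JSON list of objects, each with a `"text"` field) and uses a naive regex-based sentence split; `io.StringIO` stands in for an opened file such as a hypothetical `reviews.json`:

```python
import json
import re
from io import StringIO

def count_words_and_sentences(fp, field="text"):
    """Count words and sentences in one text field of a JSON list of objects.

    Assumes a hypothetical layout such as [{"text": "..."}, ...].
    """
    records = json.load(fp)
    text = " ".join(rec.get(field, "") for rec in records)
    # Words: runs of letters, digits or underscores
    words = re.findall(r"\w+", text)
    # Naive sentence split: one or more of . ! ? followed by whitespace or the end of the text
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    return len(words), len(sentences)

# With a real file: with open("reviews.json") as fp: count_words_and_sentences(fp)
sample = StringIO('[{"text": "Great phone. Battery lasts long!"}, {"text": "Would buy again?"}]')
print(count_words_and_sentences(sample))  # (8, 3)
```

For real text, a tokenizer such as NLTK's would handle abbreviations and contractions better than these regular expressions.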
TASK 23:
AIM: Explain the radius and neighbours parameters of KNN.
Radius & neighbours parameters of KNN:
In k-nearest neighbours (KNN), the neighbours parameter (n_neighbors in scikit-learn's
KNeighborsClassifier) fixes how many of the closest training examples vote on the class of
a new example. The Radius Neighbours Classifier is a classification machine learning
algorithm that extends k-nearest neighbours: it makes predictions using all training
examples within a given radius (the radius parameter of scikit-learn's
RadiusNeighborsClassifier) of the new example, rather than the k closest neighbours.
As such, the radius-based approach to selecting neighbours is more appropriate for sparse data,
as it prevents examples that are far away in the feature space from contributing to a prediction.
In this task, I studied the Radius Neighbours Classifier machine learning algorithm.
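The difference between the two parameters can be sketched from scratch in plain Python; this is an illustration of the idea, not scikit-learn's implementation, and the 2-D training points are hypothetical:

```python
import math
from collections import Counter

def knn_predict(train, query, n_neighbors=3):
    # k-NN: the n_neighbors closest training points vote, however far away they are
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:n_neighbors]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def radius_predict(train, query, radius=1.5):
    # Radius neighbours: every training point within `radius` votes;
    # returns None when no point is close enough (the query is an outlier)
    inside = [label for point, label in train if math.dist(point, query) <= radius]
    if not inside:
        return None
    return Counter(inside).most_common(1)[0][0]

# Hypothetical 2-D training data: (features, label)
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((8, 8), "B"), ((8, 9), "B")]

print(knn_predict(train, (2, 2)))     # A
print(radius_predict(train, (2, 2)))  # A
print(knn_predict(train, (6, 6)))     # B: the 3 nearest points vote even though all are far away
print(radius_predict(train, (6, 6)))  # None: no training point lies within the radius
```

For the far-away query (6, 6), k-NN still forces a vote from distant points, while the radius approach declines to predict; scikit-learn's `RadiusNeighborsClassifier` exposes an `outlier_label` parameter for this case.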
TASK 24:
AIM: Convert the recommendations into a DataFrame
containing the products and their distances.
PROGRAM:
CHAPTER 4 SKILLS LEARNED
During these 15 days of the internship, I learned many new things about Python. I already
had intermediate knowledge of Python; during this internship I learned that with Python we
can also build a recommendation system just by using a few libraries.
I learned how to set up Python and use it for the required project, and how to add libraries
and integrate them with our code so that it works the way we want. I also learned about
Anaconda Navigator, which was the most important tool in the project work: from it we
launched Jupyter Notebook to work on the project. I also learned to create different
branches for projects and to merge them.
Overall, it was a great, creative and challenging experience in which I encountered many
errors during the project, learned new things and developed creative ideas that will surely
help me build new projects in the future.
CHAPTER 5 CONCLUSION
I can honestly say that my time interning with Brainy Beam resulted in one of the best
summers of my life. Not only did I gain practical skills, but I also had the opportunity
to meet many fantastic people. The atmosphere was always welcoming, which made me feel
right at home. Additionally, I felt that I was able to contribute to the company by assisting
and working on projects throughout the summer. In addition to these projects, I also helped
many of the CPAs with document organization, trial balance reviews, and many other
day-to-day needs.
While I was able to learn a lot from normal college life, my two most memorable days were
events that Brainy Beam organized outside of work.
Overall, my internship at Brainy Beam has been a success. I was able to gain practical skills,
work in a fantastic environment, and make connections that will last a lifetime. I could not be
more thankful.
CHAPTER 6 REFERENCES
Python - https://www.python.org/downloads/release/python-3912/
Jupyter notebook - https://jupyter.org/install
Anaconda navigator - https://docs.anaconda.com/anaconda/navigator/
Visual studio code - https://code.visualstudio.com/download