Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 5

News Article Recommendation System

Title: News Article Recommendation System


Submitted by: Usama Javeed (Leader)
Hussain Muhammad Ehtasham
Humaira Aqeel
Mujeeb Ur Rehman
Talha Sajid
Muhammad Hamad
Submitted to:
Dr.Muhammad Omar
Abstract

Recommender System is a tool helping users find content and overcome information overload. It
predicts Interests of users and makes recommendation according to the interest model of users.
The original content-based recommender system is the continuation and development of
collaborative filtering, which doesn’t need the user’s evaluation for items. Instead, the similarity
is calculated based on the information of items that are chose by users, and then make the
recommendation accordingly. With the improvement of machine learning, current content-based
recommender system can build profile for users and products respectively. Building or updating
the profile according to the analysis of items that are bought or visited by users. The system can
compare the user and the profile of items and then recommend the most similar products. So this
recommender method that compare user and product directly cannot be brought into
collaborative filtering model. The foundation of content-based algorithm is acquisition and
quantitative analysis of the content. As the research of acquisition and filtering of text
information are mature, many current content-based recommender systems make
recommendation according to the analysis of text information. This paper introduces content-
based recommender system for the news articles. There are a lot of features extracted from the
articles, they are diversity and unique, which is also the difference from other recommender
systems. We use these features to construct news articles model and calculate similarity.
Table of content
Introduction of Project
Problem Statement……………………………………………………………………………..
2.1 Dataset………………………………………………………………………………………
2.2 Preprocessing ……………………………………………………………………………….
2.3 Text to Numeric Representation………………………………………………………..……
Result and Decision ……………………………………………………….….………………….
Conclusion and future work ……………………………………………………………………..
References
……………………………………………………………………………………………………..

Introduction
With the advancement in interactive communication technology, the internet has become a major source
of news due to its 24/7 availability, instant updating and free distribution. One reason for this could be the
lack of prescribed procedures to offer a wide variety of news in a timely manner and the inability of the
system to model user behaviors in a better way. Therefore, there is a need to move towards tools and
techniques such as recommender systems to provide news updates tailored to readers’ information needs.
Many news sources and agencies such as CNN, BBC, New York Times and The Washington Post
provide anytime, anywhere access to news readers so that they browse through latest news using online
portals. To attract higher volume of traffic to their websites, these online portals are increasingly adopting
recommender systems to improve user experience on their sites.

Problem Statement
For building a recommender system, we face several different problems. Currently there are a lot of
recommender systems based on the user information, so what should we do if the website has not gotten
enough users? After that, we will solve the representation of an articles, which is how a system can
understand Articles. That is the precondition for comparing similarity between two articles. But For each
feature of the Articles, there should be different weight for them and each of They plays a different role
for recommendation. So we get these questions:

• How to recommend Articles when there are no user information.


• What kind of articles can be used for the recommender system?
• How to calculate the similarity between two Articles.
Material and Methods
Dataset
The Dataset based on CSV form in which there are 3 features that are used in the project. They are Title,
Content and label.

Preprocessing
Data preprocessing is a data mining technique that involves transforming raw data into an understandable
format. Real-world data is often incomplete, inconsistent, and/or lacking in certain behaviors or trends,
and is likely to contain many errors. Data preprocessing is a proven method of resolving such issues.

 Change content into a single form of list


 We should need to remove all punctuations
 Convert all Dataset to lowercase
Text to Numeric Representation
The process of converting textual data into numerical data is known as the process of vectorization in
machine learning. It is an important task because you cannot use machine learning algorithms directly on
a text as they only support numerical data.  The Scikit-learn library to convert textual data into numerical
data using Python
from sklearn.feature_extraction.text import CountVectorizer
vectorize = CountVectorizer()

Non-Negative Matrix Factorization (NMF)


Similar articles should have similar topics. Strategy of NMF are

 Apply NMF to the word-frequency array


 NMF feature values describe the topics
 ... so similar documents have similar NMF feature values
 Compare NMF feature values?

Result and Decision


In our case, feature to Articles is term to document. We can easily convert articles to vector space model
which can be used to calculate the similarity. Then we use cosine similarity discussed to calculate
similarity for each Articles.
Conclusion and Future Work
For future study, dynamic parameters will be introduced into recommender system, we will use machine
learning to adjust the weight of each feature automatically and find the most suitable weights. Make the
recommender system as an internal service. In the future, the recommender system is no longer an
external website that will be just for testing. We will make it as an internal APIs for developers to invoke.
Some Articles lists in the website will be sorted by recommendation
References
[1] geeksforgeeks

You might also like