
BIG DATA SEARCH USING MACHINE LEARNING

Presented by:
V. Aarsan (210919104001)
H. Ashish Joyson (210919104007)

Guided by:
Dr. G. Bhuvaneswari, Head of the Department (CSE)


CONTENTS
• DOMAIN DESCRIPTION
• AIM AND OBJECTIVES
• EXISTING SYSTEM
• LIMITATIONS
• PROBLEM STATEMENT
• PROPOSED SYSTEM
• SCOPE OF PROPOSED SYSTEM

DOMAIN DESCRIPTION

Machine learning is an application of artificial intelligence (AI) that enables systems to learn and improve from experience without being explicitly programmed.
Machine learning focuses on developing computer programs that can access data and use it to learn for themselves.
The automated, robot-assisted world of the near future will rely heavily on our ability to deploy AI successfully.
AIM AND OBJECTIVES

• The Big Data revolution promises to transform how we live, work, and think by enabling process optimization, empowering insight discovery, and improving decision making.
• Today, the amount of data is exploding at an unprecedented rate as a result of developments in web technologies, social media, and mobile and sensing devices.
Machine learning (ML) is expected to be one of the main drivers of the Big Data revolution.

For example, Twitter processes over 70M tweets per day, thereby generating over
8TB daily.
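These two figures imply sustained average rates, and a quick calculation makes them concrete. The sketch makes three assumptions. It uses decimal units (1 TB = 10^12 bytes). It treats the "over" figures as exact, so the results are lower bounds. Because the slide does not say what the 8 TB contains, the per-tweet value is only the ratio of the two totals, not the size of a tweet.

```python
# Average rates implied by the figures quoted above.
# Assumptions: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes,
# 1 KB = 10**3 bytes) and the "over" figures taken as exact, so the
# results are lower bounds.
TWEETS_PER_DAY = 70_000_000
BYTES_PER_DAY = 8 * 10**12
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_second = TWEETS_PER_DAY / SECONDS_PER_DAY
mb_per_second = BYTES_PER_DAY / SECONDS_PER_DAY / 10**6
kb_per_tweet = BYTES_PER_DAY / TWEETS_PER_DAY / 10**3

print(f"{tweets_per_second:.0f} tweets per second")  # 810 tweets per second
print(f"{mb_per_second:.1f} MB per second")  # 92.6 MB per second
print(f"{kb_per_tweet:.1f} KB per tweet (ratio of the two totals)")
# 114.3 KB per tweet (ratio of the two totals)
```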
EXISTING SYSTEM
Information retrieval (IR) is the task of finding the information resources that are relevant to our needs, or of extracting the specific information we require.
• IR deals with the organization, storage, retrieval, and evaluation of information from documents, particularly textual information.
• However, the existing system cannot rank the retrieved documents by relevance.
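Unranked retrieval of this kind can be illustrated with an inverted index and Boolean AND matching. This is a minimal sketch with assumed details: whitespace tokenization, in-memory dictionaries, and the hypothetical function names `build_inverted_index` and `boolean_and_search`. It does not describe any particular existing system. It returns every matching document but gives no indication of which match is most relevant.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lower-cased term to the set of document IDs containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and_search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return IDs of documents that contain every query term.

    All matches count equally: there is no relevance score, so the
    result is merely sorted by ID rather than ranked.
    """
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


docs = {
    "d1": "big data search engine",
    "d2": "machine learning for big data",
    "d3": "search engine ranking with machine learning",
}
index = build_inverted_index(docs)
print(boolean_and_search(index, "machine learning"))  # ['d2', 'd3']
```

Both `d2` and `d3` match, but the output cannot say which one better fits the query.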
Various sources report that 65-100% of big data analytics projects fail. Gartner, a research and advisory company, predicted that 60% of big data projects would fail to move past preliminary stages in 2017 (Gartner Inc., 2015).
LIMITATIONS

• Storage: datasets can require considerable resources to store.
• Formatting and data cleaning: substantial preprocessing may be required before the data can be analyzed.
• Quality control: can be difficult and often has to be done through small representative samples.
• Security and privacy concerns: often more complex than for traditional datasets.
• Accuracy and consistency of methods: many approaches are relatively new and imperfect, although they may continue to improve.
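Quality control through small representative samples requires a way to draw such a sample from data that is too large to load at once. Reservoir sampling (Algorithm R) is one standard technique for this. The following is a minimal sketch that assumes records arrive as a Python iterable. The function name and the synthetic records are illustrative and not part of the proposed system.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return k items drawn uniformly at random from a stream of unknown length.

    Algorithm R: keep the first k items; after that, the item at 0-based
    index i replaces a random reservoir slot with probability k / (i + 1).
    Only O(k) items are held in memory, however long the stream is.
    If the stream has fewer than k items, all of them are returned.
    """
    if k <= 0:
        return []
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = item
    return reservoir


# Illustrative usage with synthetic records: every 10th record has empty text.
records = (
    {"id": i, "text": "" if i % 10 == 0 else f"doc {i}"} for i in range(100_000)
)
sample = reservoir_sample(records, k=1_000, seed=7)
empty = sum(1 for r in sample if not r["text"])
print(f"{empty / len(sample):.1%} of sampled records have empty text")
```

The printed share should be close to the true 10%, but it will not be exact, because it is estimated from a sample.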
PROBLEM STATEMENT

A unified database software system that can search for text (numeric, alphanumeric, or alphabetic) or words within any form of data or file, such as JPEG, PNG, PDF, Word, Excel, SQL, and other formats and proformas, and that also links the results to predefined keywords stored in our database.
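The problem statement combines two steps: extracting text from many file formats, then matching that text against predefined keywords. The following minimal sketch of that flow assumes a registry that maps file extensions to extractors. Only plain-text, SQL, and CSV files are handled, using the standard library. PDF, Word, Excel, and image files (JPEG/PNG, which would need OCR) require third-party libraries and are deliberately left out. `EXTRACTORS`, `search_file`, and the sample keywords are hypothetical names.

```python
import csv
import io
import re
from pathlib import Path
from typing import Callable, Dict, Set


def extract_text(data: bytes) -> str:
    """Plain-text formats: decode as UTF-8, replacing undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def extract_csv(data: bytes) -> str:
    """CSV (e.g. a spreadsheet exported as CSV): join every cell with spaces."""
    reader = csv.reader(io.StringIO(extract_text(data)))
    return " ".join(cell for row in reader for cell in row)


# Hypothetical registry mapping a file extension to a text extractor.
# PDF, Word, Excel, and image (OCR) extractors would need third-party
# libraries and are intentionally omitted from this sketch.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    ".txt": extract_text,
    ".sql": extract_text,  # SQL dumps are plain text
    ".csv": extract_csv,
}


def search_file(filename: str, data: bytes, keywords: Set[str]) -> Set[str]:
    """Return the predefined single-word keywords that occur in the file."""
    extractor = EXTRACTORS.get(Path(filename).suffix.lower())
    if extractor is None:
        raise ValueError(f"no extractor registered for {filename!r}")
    # \w+ tokens cover numeric, alphabetic, and alphanumeric terms alike.
    tokens = set(re.findall(r"\w+", extractor(data).lower()))
    return {kw for kw in keywords if kw.lower() in tokens}


PREDEFINED_KEYWORDS = {"invoice", "GST", "2023"}
csv_bytes = b"id,item,year\n1,invoice,2023\n"
print(sorted(search_file("orders.csv", csv_bytes, PREDEFINED_KEYWORDS)))
# ['2023', 'invoice']
```

Supporting a new format only requires registering one more extractor. The keyword-matching step does not change.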
PROPOSED SYSTEM
The main focus of our system is to build a search engine that uses machine learning techniques to achieve higher accuracy than available search engines.
• The proposed search engine helps find more relevant URLs for given keywords.
• Users can easily identify the important documents in a collection and retrieve the related data.
• It proposes a novel model.
We design and implement an in-memory index and evaluate it extensively against several representative indexes, including the B+ tree, the skip list, and the Adaptive Radix Tree (ART). In our experiments, the proposed index outperforms these indexes.
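The slides do not describe how the proposed index works. One well-known way to bring machine learning into in-memory indexing is a learned index: a model predicts where a key sits in a sorted array, and a short local search corrects the prediction. The following is a minimal, single-model sketch of that general technique. It is an assumption, not the authors' design. The class name `LinearLearnedIndex` is hypothetical, and practical learned indexes use hierarchies of models and support updates.

```python
import bisect
from typing import List, Optional


class LinearLearnedIndex:
    """Sketch of a learned index over sorted, distinct integer keys.

    A least-squares line predicts a key's position in the sorted array.
    The worst prediction error on the stored keys bounds a small window,
    and a binary search inside that window finds the exact position.
    """

    def __init__(self, keys: List[int]) -> None:
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = sorted(set(keys))
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2  # mean of positions 0..n-1
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Largest |predicted - actual| over all stored keys.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: int) -> int:
        """Model's position estimate, clamped to the valid index range."""
        guess = round(self.slope * key + self.intercept)
        return min(max(guess, 0), len(self.keys) - 1)

    def lookup(self, key: int) -> Optional[int]:
        """Return the position of key in the sorted keys, or None if absent."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        hi = min(len(self.keys), guess + self.max_err + 1)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < hi and self.keys[i] == key:
            return i
        return None


idx = LinearLearnedIndex(list(range(0, 10_000, 7)))
print(idx.lookup(700))  # 100
print(idx.lookup(701))  # None
```

A lookup costs one model evaluation plus a binary search over at most 2 * max_err + 1 slots. When keys are close to linearly distributed, this window is tiny. When they are not, the window grows, which is why real systems split the key space across several models.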
SCOPE OF PROPOSED SYSTEM

• The proposed system aims to develop a search engine that uses machine learning techniques to retrieve the most relevant textual data from a collection of documents based on user queries.
• The proposed search engine can be useful for a variety of applications, including academic research, business intelligence, and general information retrieval.
• The system's novel model could make a significant contribution to the field of information retrieval.
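Ranking documents by their relevance to a user query can be illustrated with TF-IDF weighting and cosine similarity. TF-IDF is a classical scoring scheme, not machine learning, and the slides do not say which model the proposed system uses. This minimal sketch shows the kind of baseline a learned ranker would be compared against. The `\w+` tokenization, the smoothed IDF formula, and all function names are assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"\w+", text.lower())


def build_tfidf(docs: Dict[str, str]) -> Tuple[Dict[str, Vector], Vector]:
    """Return a TF-IDF vector per document and the IDF weight of each term."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for tokens in tokenized.values() for t in set(tokens))
    # Smoothed IDF: rarer terms get larger weights; no weight drops to zero.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = {
        doc_id: {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for doc_id, tokens in tokenized.items()
    }
    return vectors, idf


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query: str, docs: Dict[str, str]) -> List[Tuple[str, float]]:
    """Score every document against the query, most relevant first."""
    vectors, idf = build_tfidf(docs)
    # Query terms never seen in the collection carry no weight and are dropped.
    q_vec = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(doc_id, cosine(q_vec, vec)) for doc_id, vec in vectors.items()]
    # Drop non-matching documents; break score ties by document ID.
    return sorted((s for s in scored if s[1] > 0), key=lambda s: (-s[1], s[0]))


docs = {
    "d1": "big data search engine",
    "d2": "machine learning for big data",
    "d3": "search engine ranking with machine learning",
}
print([doc_id for doc_id, _ in rank("machine learning search", docs)])
# ['d3', 'd2', 'd1']
```

Unlike unranked Boolean matching, every result carries a score. `d3` comes first because it shares all three query terms.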
Thank You
