A Course Based Project Report on

OCCURRENCE OF WORDS
Submitted to the

Department of Information Technology

in partial fulfillment of the requirements for the completion of the course


PYTHON PROGRAMMING LABORATORY (22ES2DS101)

BACHELOR OF TECHNOLOGY

IN

INFORMATION TECHNOLOGY

Submitted by

B.MANIRAKSHITH 23071A12D6
B.CHAITHRIKA 23071A12D7
B.MAHATHI 23071A12D8
B.CHOHAN 23071A12D9

Under the guidance of


Mrs. S Swathi
(Course Instructor)
Assistant Professor, Department of IT, VNRVJIET

DEPARTMENT OF INFORMATION TECHNOLOGY

VALLURUPALLI NAGESWARA RAO VIGNANA JYOTHI
INSTITUTE OF ENGINEERING AND TECHNOLOGY
An Autonomous Institute, NAAC Accredited with ‘A++’ Grade, NBA Accredited for CE, EEE, ME, ECE,
CSE, EIE, IT B. Tech Courses, Approved by AICTE, New Delhi, Affiliated to JNTUH, Recognized as
“College with Potential for Excellence” by UGC, ISO 9001:2015 Certified, QS I-GAUGE Diamond Rated
Vignana Jyothi Nagar, Pragathi Nagar, Nizampet (S.O), Hyderabad-500090, TS, India

SEPTEMBER 2023

CERTIFICATE

This is to certify that the project report entitled “Occurrence Of Words” is a
bonafide work done under our supervision and is being submitted by
Mr. Manirakshith (23071A12D6), Miss. Chaithrika (23071A12D7), Miss. Mahathi
(23071A12D8), and Mr. Chohan (23071A12D9) in partial fulfilment for the award of
the degree of Bachelor of Technology in Information Technology, of the
VNRVJIET, Hyderabad during the academic year 2023-2024.

S SWATHI                        Dr D Srinivasa Rao
Assistant Professor, IT         Associate Professor & HOD, IT

Course based Projects Reviewer


DECLARATION

We declare that the course based project work entitled “OCCURRENCE OF WORDS”,
submitted in the Department of Information Technology, Vallurupalli Nageswara Rao
Vignana Jyothi Institute of Engineering and Technology, Hyderabad, in partial
fulfilment of the requirement for the award of the degree of Bachelor of Technology
in Information Technology, is a bonafide record of our own work carried out under
the supervision of S SWATHI, Assistant Professor, Department of IT, VNRVJIET.
We also declare that the matter embodied in this report has not been submitted by
us, in full or in part, for the award of any degree/diploma of any other institution
or university previously.

Place: Hyderabad.

B. Manirakshith    B. Chaithrika    B. Mahathi      B. Chohan
(23071A12D6)       (23071A12D7)     (23071A12D8)    (23071A12D9)


ACKNOWLEDGEMENT

We express our deep sense of gratitude to our beloved President, Sri. D. Suresh Babu,
VNR Vignana Jyothi Institute of Engineering & Technology, for the valuable
guidance and for permitting us to carry out this project.

With immense pleasure, we record our deep sense of gratitude to our beloved
Principal, Dr. C.D Naidu, for permitting us to carry out this project.

We express our deep sense of gratitude to our beloved Professor Dr. SRINIVASA
RAO DAMMAVALAM, Associate Professor and Head, Department of Information
Technology, VNR Vignana Jyothi Institute of Engineering & Technology,
Hyderabad-500090, for the valuable guidance and suggestions, keen interest, and
thorough encouragement extended throughout the period of project work.

We take immense pleasure in expressing our deep sense of gratitude to our beloved
Guide, S Swathi, Assistant Professor in Information Technology, VNR Vignana
Jyothi Institute of Engineering & Technology, Hyderabad, for her valuable
suggestions and rare insights, and for being a constant source of encouragement
and inspiration throughout our project work.

We express our thanks to all those who contributed to the successful completion of
our project work.

Mr. B. Manirakshith (23071A12D6)
Miss. B. Chaithrika (23071A12D7)
Miss. B. Mahathi (23071A12D8)
Mr. B. Chohan (23071A12D9)

ABSTRACT

This project aims to analyze the occurrence of words within a given text corpus using
Python. The primary objective is to develop a tool that can process text data, count
the frequency of each word, and visualize the results. It draws on Python's ecosystem
of libraries, such as collections for counting, matplotlib and seaborn for
visualization, and nltk for text processing.

Data Preprocessing: The text data is cleaned and prepared for analysis. This involves
converting text to lowercase, removing punctuation, and handling stopwords.

Word Counting: The cleaned text is then processed to count the occurrences of each
word using Python's Counter from the collections module.

Data Visualization: The word frequency data is visualized using bar charts and word
clouds to give a clear view of the most common words in the text corpus.

Advanced Analysis: Further analysis includes n-gram generation, sentiment analysis,
and topic modeling to gain deeper insights into the text data.

Scalability: The project is designed to handle large datasets efficiently. By using
optimized data structures and algorithms, it scales to extensive text corpora without
compromising performance.

Customization: Users can customize the analysis by selecting specific subsets of
text, defining custom stopwords, and setting parameters for visualization, making the
tool adaptable to various text analysis needs.

Language Support: The tool supports multiple languages, allowing for word occurrence
analysis in diverse linguistic contexts. This is achieved through the integration of
language-specific libraries and resources.

User Interface: A simple user interface is provided for non-technical users, enabling
upload of text files, execution of analysis, and viewing of results without requiring
programming knowledge.

Integration Capabilities: The project can be integrated with other data processing
and visualization tools, such as Pandas for data manipulation and Plotly for
interactive visualizations, extending its use in broader data analysis workflows.

This project has broad applications, including text mining, sentiment analysis, and
natural language processing tasks, making it a valuable tool for researchers, data
scientists, and developers working with textual data. Through this project, users can
uncover patterns, trends, and insights from textual datasets, facilitating more
informed decision-making.

TABLE OF CONTENTS

S No    Contents
1.      INTRODUCTION
2.      SOURCE CODE
3.      TEST CASES / OUTPUT
4.      CONCLUSION
5.      REFERENCES
1. INTRODUCTION
1.1 PROBLEM DEFINITION

Write a Python program that prints the number of occurrences of each word in a given
text.

1.2 OBJECTIVE

The objective of this Python project is to develop a versatile and efficient tool for
analyzing the occurrence of words within a given text corpus. The specific goals are:

1. Text Data Preprocessing: Implement robust methods to clean and preprocess text
   data, including tasks such as case normalization, punctuation removal, and
   stopword filtering.

2. Word Frequency Analysis: Accurately count and record the frequency of each word
   in the text corpus using efficient data structures and algorithms.

3. Data Visualization: Create clear visualizations, such as bar charts and word
   clouds, to represent word frequencies and patterns in the text data.

4. Scalability: Ensure the tool can handle large text datasets efficiently,
   maintaining performance and accuracy as the size of the data increases.

5. Educational Resource: Provide clear documentation and examples to serve as an
   educational resource for users interested in learning about text analysis and
   Python programming.
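
The preprocessing steps named in the first objective (case normalization, punctuation removal, stopword filtering) can be sketched as follows. This is a minimal illustration and not the project's own code: the function name preprocess and the small STOPWORDS set are assumptions made for this example, and a fuller stopword list (for instance from nltk) could be substituted.

```python
import string

# Hypothetical stopword list for illustration only; a real project might
# load a fuller list instead.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def preprocess(text, stopwords=STOPWORDS):
    """Lowercase the text, strip punctuation, and drop stopwords."""
    # Case normalization
    lowered = text.lower()
    # Punctuation removal: delete every character listed in string.punctuation
    cleaned = lowered.translate(str.maketrans("", "", string.punctuation))
    # Tokenize on whitespace and filter out stopwords
    return [word for word in cleaned.split() if word not in stopwords]


tokens = preprocess("This is a test. This test is only a test.")
print(tokens)  # ['this', 'test', 'this', 'test', 'only', 'test']
```

The resulting token list can be passed directly to any counting routine, so the cleaning and counting steps stay independent of each other.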

2. SOURCE CODE
def word_occurrences(text):
    # Normalize the text to lower case and split on whitespace.
    # Punctuation stays attached, so "test." and "test" are counted separately.
    words = text.lower().split()

    # Use a set to store unique words
    unique_words = set(words)

    # Create a dictionary to store word counts
    word_count = {word: 0 for word in unique_words}

    # Count occurrences of each word
    for word in words:
        word_count[word] += 1

    # Convert the dictionary to a list of (word, count) tuples
    word_count_tuples = [(word, count) for word, count in word_count.items()]
    return word_count_tuples


# Sample text
text = "This is a test. This test is only a test."

# Get word occurrences
occurrences = word_occurrences(text)

# Print the result
print("Word occurrences:")
for word, count in occurrences:
    print(f"{word}: {count}")
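
The same counting can also be written with a regular-expression tokenizer and collections.Counter, the two standard-library tools this report mentions elsewhere. The sketch below is an alternative offered for comparison, not the listing above: the name word_occurrences_counter and the token pattern (runs of lowercase letters, digits, and apostrophes) are assumptions chosen for this example.

```python
import re
from collections import Counter


def word_occurrences_counter(text):
    """Count words using a regex tokenizer and collections.Counter."""
    # Keep runs of letters, digits, and apostrophes; punctuation such as
    # "." and "," is dropped, so "test." and "test" become the same token.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


counts = word_occurrences_counter("This is a test. This test is only a test.")

# most_common() lists words from most to least frequent
for word, count in counts.most_common():
    print(f"{word}: {count}")
```

Because the tokenizer discards punctuation, "test" is counted three times here, whereas a plain whitespace split keeps "test." and "test" as separate entries.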

3. TEST CASES / OUTPUT
3.1 Test case 1:

Input: text = "This is a test. This test is only a test."

Output:

3.2 Test case 2:

Input: text = "How much wood would a woodchuck chuck, if a woodchuck could chuck wood."

Output:
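
The counts for both test inputs can be regenerated with a short script. The sketch below mirrors the whitespace-split logic of Section 2 under a hypothetical name, count_words, and sorts the words alphabetically so that the listing is the same on every run; the ordering printed by the original program may differ, since it iterates over a set.

```python
def count_words(text):
    """Whitespace-split word count, mirroring the logic in Section 2."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


test_inputs = [
    "This is a test. This test is only a test.",
    "How much wood would a woodchuck chuck, if a woodchuck could chuck wood.",
]

for text in test_inputs:
    print(f"Input: {text}")
    # Sort alphabetically so the output order is deterministic
    for word, count in sorted(count_words(text).items()):
        print(f"  {word}: {count}")
```

With this logic, punctuation remains part of the word: the first input yields "test." twice and "test" once, and the second input counts "wood" and "wood." (and likewise "chuck" and "chuck,") as different words.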

4. CONCLUSION

The word occurrence counter project demonstrates basic text preprocessing and
analysis using Python. The program normalizes the text to lowercase, splits it on
whitespace, and tallies each word in a dictionary, which provides a foundation for
various natural language processing (NLP) tasks. Regular expressions for
tokenization and the collections.Counter class for counting are natural refinements
that would also separate punctuation from words and so make the frequencies more
accurate.

The resulting word frequencies give insight into the text's structure and content,
and the project highlights Python's strength in data manipulation and its
suitability for NLP work. With practical applications in fields like linguistics,
content analysis, and SEO, this project serves as a foundational tool for more
advanced text processing and analysis.

5. REFERENCES

[1] W3Schools: https://www.w3schools.com/python/

[2] Coursera: https://www.coursera.org/courses?query=python

[3] edX: https://www.edx.org/learn/python

[4] Codecademy: https://www.codecademy.com/learn/learn-python-3
