
A REPORT OF SIX WEEKS INDUSTRIAL TRAINING

At

THINK-NEXT PRIVATE LIMITED

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD

OF THE DEGREE OF

BACHELOR OF ENGINEERING

(Computer Science & Engineering)

JUNE-JULY, 2018

SUBMITTED BY:

ABHISHEK JOSHI

UNIVERSITY UID:-

16BCS3171

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

CHANDIGARH UNIVERSITY GHARUAN, MOHALI

CHANDIGARH UNIVERSITY 16BCS3171

CONTENTS

Certificate by Company/Industry/Institute
Candidate’s Declaration
Abstract
Acknowledgement
About the Company/Industry/Institute
List of Figures

CHAPTER 1 INTRODUCTION

1.1 SW/HW REQUIRED
1.2 BACKGROUND DETAILS
1.3 THEORETICAL EXPLANATION

CHAPTER 2 TRAINING WORK UNDERTAKEN

2.1 Designing of web app
2.2 Coding
2.3 Database
2.4 Connectivity

CHAPTER 3 RESULTS AND DISCUSSION

3.1 Result
3.2 Discussion
3.3 Snapshots of results

CHAPTER 4 CONCLUSION AND FUTURE SCOPE

4.1 Conclusion
4.2 Future Scope

REFERENCES


CANDIDATE’S DECLARATION

I, “ABHISHEK JOSHI”, hereby declare that I have undertaken six weeks of industrial training at “THINK-NEXT PRIVATE LIMITED” from 14 MAY 2018 to 29 JUNE 2018 in partial fulfillment of the requirements for the award of the degree of B.E. (COMPUTER SCIENCE & ENGINEERING) at CHANDIGARH UNIVERSITY GHARUAN, MOHALI. The work presented in this training report, submitted to the Department of Computer Science & Engineering at CHANDIGARH UNIVERSITY GHARUAN, MOHALI, is an authentic record of the training work.

Signature of the Student


ABSTRACT
My project is about scraping data from various websites.

Every minute, a vast amount of data is produced at a very fast pace and stored in the databases behind various websites. My project helps to extract the important data from those websites so that it can be used for data analysis or research work.

Since the system is meant to be accessed only by the admin and the users, I have created a login system and a signup system.

The project is developed in Python, which makes it robust. Both C and Python are used in the code.

The project describes how to use the various tools employed for web scraping. Because web scraping is not easy, a separate section called Documentation is provided in the web app.

The app is broadly based on data analytics and covers the related topics, including the various tools used in data analytics.

Web scraping is a technique for extracting large amounts of data from websites; the extracted data is saved to a local file on your computer or to a database in table (spreadsheet) format.

Web scraping software automatically loads and extracts data from multiple pages of a website according to your requirements. It is either custom-built for a specific website or configurable to work with any website. With the click of a button, you can save the data available on a website to a file on your computer.

The problem with most generic web scraping software is that it is difficult to set up and use, with a steep learning curve. My project aims to solve this problem: with an intuitive point-and-click interface, my app lets you start extracting data from any website within minutes.


ACKNOWLEDGEMENT

I would like to express my special thanks and gratitude to my teacher (Mr. Sunil Kumar) as well as my friend (Daksh Agarwal), who gave me the golden opportunity to do this wonderful project on the topic of Web Scraping. The project also led me to do a lot of research, through which I came to know about many new things; I am really thankful to them. Secondly, I would also like to thank my parents and friends, who helped me a lot in finalizing this project within the limited time frame.


ABOUT THE INSTITUTE

Think Next Technologies Pvt. Ltd. is an ISO 9001:2008 certified software, electronics and CAD/CAM trainer that is also approved by the Ministry of Corporate Affairs. It offers training in Web Designing and Development, Mobile Apps Development, Digital Marketing, College/School ERP Software, and University Conferences and Journals Management.

It is an Accredited Training Partner of the National Institute of Electronics and Information Technology, Department of Electronics and Information Technology, Ministry of Communications and Information Technology.

It is approved by the Ministry of Corporate Affairs, Govt. of India (Corporate Identity No. U72200PB2011PTCO35677), affiliated with the Indian Testing Board & ISTQB (International Software Testing Qualifications Board), and a member of CII (Confederation of Indian Industry), Membership No. N5238P.

ThinkNEXT offers various 6 Months/3 Months/6 Weeks Industrial Training programs for B.Tech, MCA, BCA, Diploma, M.Sc (IT), B.Sc (IT) and other related students. It offers Industrial Training to CSE/IT/Electronics (ECE)/Mechanical/Civil/Electrical Engineering students to make them industry-ready.


LIST OF FIGURES

• FIGURE 1 (Page 21): Explanation of the application at a glance.

• FIGURE 2 (Page 24): Snapshots of the results

1. Signup page

2. Login page

3. Main page

4. Application page

5. Documentation section


1. INTRODUCTION

1.1 SOFTWARE/HARDWARE DETAIL

1.1.1 HARDWARE DETAILS

1. 4 GB RAM

2. Laptop / Desktop with supported internet

1.1.2 SOFTWARE REQUIRED

1. Python 3.5.1

2. SQLite

3. Django 1.10

4. Apache Server


1.2 BACKGROUND OF THE PROJECT

(Web Scraping)

My project is about scraping data from various websites.

The project is developed in Python, which makes it robust. Both C and Python are used in the code.

The app is broadly based on data analytics and covers the related topics, including the various tools used in data analytics.

Web scraping is a technique for extracting large amounts of data from websites; the extracted data is saved to a local file on your computer or to a database in table (spreadsheet) format.

Web scraping software automatically loads and extracts data from multiple pages of a website according to your requirements. It is either custom-built for a specific website or configurable to work with any website. With the click of a button, you can save the data available on a website to a file on your computer.

Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. Guido van Rossum began developing it in the late 1980s and first released it in 1991. Python's source code is freely available under the Python Software Foundation License, an open-source license that is compatible with the GNU General Public License (GPL).


1.3 THEORETICAL EXPLANATION

Since the system is meant to be accessed only by the admin and the users, I have created a login system and a signup system.

The project is developed in Python, which makes it robust. Both C and Python are used in the code.

The project describes how to use the various tools employed for web scraping. Because web scraping is not easy, a separate section called Documentation is provided in the web app.

The app is broadly based on data analytics and covers the related topics, including the various tools used in data analytics.

Web scraping is a technique for extracting large amounts of data from websites; the extracted data is saved to a local file on your computer or to a database in table (spreadsheet) format.

Web scraping software automatically loads and extracts data from multiple pages of a website according to your requirements. It is either custom-built for a specific website or configurable to work with any website. With the click of a button, you can save the data available on a website to a file on your computer.

The problem with most generic web scraping software is that it is difficult to set up and use, with a steep learning curve. My project aims to solve this problem: with an intuitive point-and-click interface, my app lets you start extracting data from any website within minutes.
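
The step of saving extracted data in table (spreadsheet) format can be illustrated with a minimal sketch that uses Python's standard csv module. The rows, field names and the function rows_to_csv are hypothetical stand-ins chosen for illustration; the report does not show how the app itself stores its output.

```python
import csv
import io

# Hypothetical rows, standing in for data already extracted from a page.
rows = [
    {"title": "Example Domain", "url": "https://example.com/"},
    {"title": "Another page", "url": "https://example.com/other"},
]

def rows_to_csv(records, fieldnames):
    """Serialize a list of dicts to CSV text: one header row, then one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

csv_text = rows_to_csv(rows, ["title", "url"])
print(csv_text)
```

To write a local file instead of a string, the same writer can be given a file opened with open("scraped.csv", "w", newline=""), after which the result opens directly in a spreadsheet program.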


2. TRAINING WORK UNDERTAKEN

2.1 DESIGNING OF WEBSITE

2.1.1 HTML:-
Every webpage we look at is written in a language called HTML. You can think of HTML as the skeleton that gives every webpage its structure. In this project, HTML is used to add paragraphs, headings, images and links to the web pages.

HTML is the code written with angle brackets (<>). Like any language, it has its own special syntax (rules for communicating).

2.1.2 CSS:-
Cascading Style Sheets, fondly referred to as CSS, is a simple design language intended to simplify the process of making web pages presentable.

CSS handles the look and feel of a web page. Using CSS, you can control the colour of the text, the style of fonts, the spacing between paragraphs, how columns are sized and laid out, what background images or colours are used, layout designs, and variations in display for different devices and screen sizes, as well as a variety of other effects.

CSS is easy to learn and understand, yet it provides powerful control over the presentation of an HTML document. Most commonly, CSS is combined with the markup languages HTML or XHTML.


2.1.3 JAVASCRIPT:-
JavaScript is a dynamic programming language. It is lightweight and most commonly used as part of web pages, where its implementations allow client-side scripts to interact with the user and make pages dynamic. It is an interpreted programming language with object-oriented capabilities.

JavaScript was first known as LiveScript, but Netscape changed its name to JavaScript, possibly because of the excitement being generated by Java. It made its first appearance in Netscape 2.0 in 1995 under the name LiveScript. The general-purpose core of the language has been embedded in Netscape, Internet Explorer, and other web browsers.

2.1.4 PYTHON:-

Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. Guido van Rossum began developing it in the late 1980s and first released it in 1991.

Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, Smalltalk, the Unix shell, and other scripting languages.

Python is copyrighted by the Python Software Foundation. Its source code is freely available under the Python Software Foundation License, an open-source license that is compatible with the GNU General Public License (GPL).

Python is now maintained by a core development team, although Guido van Rossum still holds a vital role in directing its progress.


2.1.5 DJANGO:-

Django is a high-level Python web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of web development, so you can focus on writing your app without needing to reinvent the wheel. It is free and open source.

When a request comes to a web server, it is passed to Django, which tries to figure out what is actually being requested. It first takes the web page address and tries to work out what to do with it. This part is done by Django's urlresolver (a website address is called a URL, short for Uniform Resource Locator, so the name urlresolver makes sense). It is not very smart: it takes a list of patterns and tries to match the URL against them. Django checks the patterns from top to bottom, and if one matches, Django passes the request to the associated function (which is called a view).
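
The top-to-bottom matching described above can be sketched in plain Python. This is a simplified illustration of first-match dispatch, not Django's own code: the view functions, the URL_PATTERNS list and the helper names are all hypothetical.

```python
import re

# Hypothetical views: plain functions that take a path and return a response string.
def home_view(path):
    return "home page"

def tools_view(path):
    return "tools page"

def docs_view(path):
    return "documentation page"

# Ordered list of (regular expression, view) pairs, checked from top to bottom.
URL_PATTERNS = [
    (re.compile(r"^$"), home_view),
    (re.compile(r"^docs/tools/$"), tools_view),
    (re.compile(r"^docs/"), docs_view),  # broader pattern, so it is listed after the specific one
]

def resolve(path):
    """Return the view of the first pattern that matches, or None if none do."""
    for pattern, view in URL_PATTERNS:
        if pattern.match(path):
            return view
    return None

def handle_request(path):
    """Dispatch a path to its view, falling back to a 404 message."""
    view = resolve(path)
    if view is None:
        return "404 not found"
    return view(path)

print(handle_request("docs/tools/"))
print(handle_request("docs/intro/"))
print(handle_request("missing/"))
```

Because the first match wins, the order of the list matters: if the broad docs/ pattern were listed before docs/tools/, the more specific view would never be reached.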


2.2 CODING IN WEBSITE

(USING PYTHON)

Python is a general-purpose, interpreted, interactive, object-oriented, high-level programming language. Guido van Rossum began developing it in the late 1980s and first released it in 1991.

Python is derived from many other languages, including ABC, Modula-3, C, C++, Algol-68, Smalltalk, the Unix shell, and other scripting languages.

The concepts and rules used in Python programming provide these important benefits:

• Interactive

• Interpreted

• Modular

• Dynamic

• Object-oriented

• Portable

• High level

• Extensible in C and C++


2.3 DATABASE (STRUCTURED QUERY LANGUAGE)

Structured Query Language (SQL) is a database query language used for storing and managing data in a relational DBMS. SQL was the first commercial language introduced for E. F. Codd's relational model of databases. Today almost all RDBMSs (MySQL, Oracle, Informix, Sybase, MS Access) use SQL as the standard database query language. SQL is used to perform all types of data operations in an RDBMS.

SQL Commands

SQL defines the following ways to manipulate data stored in an RDBMS.

DDL: Data Definition Language

DDL covers changes to the structure of a table, such as creating, altering, or deleting a table.

In many database systems (for example Oracle and MySQL), DDL commands are auto-committed, which means their changes are saved permanently in the database as soon as they run.


DML: Data Manipulation Language

DML commands are used for manipulating the data stored in a table, not the table itself. Within a transaction, DML changes are not permanent until they are committed, so they can be rolled back.

Command    Description

Insert     Inserts a new row

Update     Updates an existing row

Delete     Deletes a row

Merge      Merges two rows or two tables

DQL: Data Query Language


Data Query Language is used to fetch data from tables based on conditions that we can easily apply.

Command    Description

Select     Retrieves records from one or more tables
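
The three command categories above can be tried out with Python's built-in sqlite3 module, since SQLite is listed among the software used. The following is a minimal sketch under stated assumptions: the table scraped_links and its columns are hypothetical, and an in-memory database is used so that nothing is written to disk.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of a table.
conn.execute("CREATE TABLE scraped_links (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")

# DML: insert a row and make it permanent with an explicit commit.
conn.execute("INSERT INTO scraped_links (url) VALUES (?)", ("https://example.com/",))
conn.commit()

# DML: insert a second row, then undo it before it is committed.
conn.execute("INSERT INTO scraped_links (url) VALUES (?)", ("https://example.com/draft",))
conn.rollback()

# DQL: fetch the rows that satisfy a condition.
urls = [
    row[0]
    for row in conn.execute(
        "SELECT url FROM scraped_links WHERE url LIKE ? ORDER BY id", ("https://%",)
    )
]
print(urls)

conn.close()
```

Only the committed row is returned by the final query, which shows the difference between a committed and a rolled-back DML change.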

2.4 CONNECTIVITY

1. Django determines the root URLconf module to use. Ordinarily, this is the value of the ROOT_URLCONF setting, but if the incoming HttpRequest object has a urlconf attribute (set by middleware), its value is used in place of the ROOT_URLCONF setting.

2. Django loads that Python module and looks for the variable urlpatterns. This should be a Python list of django.urls.path() and/or django.urls.re_path() instances.

3. Django runs through each URL pattern, in order, and stops at the first one that matches the requested URL.

4. Once one of the URL patterns matches, Django imports and calls the given view, which is a simple Python function (or a class-based view). The view gets passed the following arguments:

o An instance of HttpRequest.

o If the matched URL pattern returned no named groups, then the matches from the regular expression are provided as positional arguments.


o The keyword arguments are made up of any named parts matched by the path expression, overridden by any arguments specified in the optional kwargs argument to django.urls.path() or django.urls.re_path().

5. If no URL pattern matches, or if an exception is raised at any point in this process, Django invokes an appropriate error-handling view.

• To capture a value from the URL, use angle brackets.

• Captured values can optionally include a converter type. For example, use <int:name> to capture an integer parameter. If a converter isn’t included, any string, excluding a / character, is matched.
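
How angle-bracket captures and converter types turn part of a URL into keyword arguments for a view can be sketched in plain Python. This is a simplified re-creation for illustration only, not Django's implementation: CONVERTERS, compile_route, match_route and article_view are hypothetical names, and only the int converter and the default string behaviour are modelled.

```python
import re

# Simplified converters: a regular expression plus a function that converts the matched text.
CONVERTERS = {
    "int": (r"[0-9]+", int),
    "str": (r"[^/]+", str),
}

# Matches <name> or <converter:name> inside a route.
PLACEHOLDER = re.compile(r"<(?:(?P<conv>[a-z]+):)?(?P<name>[a-zA-Z_][a-zA-Z0-9_]*)>")

def compile_route(route):
    """Turn a route such as 'articles/<int:year>/' into (compiled regex, {name: converter})."""
    regex_parts = []
    to_python = {}
    position = 0
    for found in PLACEHOLDER.finditer(route):
        # Literal text between placeholders is escaped so that it only matches itself.
        regex_parts.append(re.escape(route[position:found.start()]))
        conv_name = found.group("conv") or "str"  # no converter: any string without '/'
        pattern, func = CONVERTERS[conv_name]
        regex_parts.append("(?P<%s>%s)" % (found.group("name"), pattern))
        to_python[found.group("name")] = func
        position = found.end()
    regex_parts.append(re.escape(route[position:]))
    return re.compile("^" + "".join(regex_parts) + "$"), to_python

def match_route(route, path):
    """Return the captured keyword arguments for path, or None if the route does not match."""
    regex, to_python = compile_route(route)
    found = regex.match(path)
    if found is None:
        return None
    return {name: to_python[name](text) for name, text in found.groupdict().items()}

def article_view(request, year, slug):
    # Hypothetical view: receives the request plus the captured values as keyword arguments.
    return "%s: articles from %d tagged %s" % (request, year, slug)

kwargs = match_route("articles/<int:year>/<slug>/", "articles/2018/python/")
print(kwargs)
print(article_view("GET", **kwargs))
```

The int converter both restricts what the pattern accepts and converts the captured text, so the view receives year as an integer rather than a string, while a path such as articles/abc/python/ does not match at all.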


3. RESULTS AND DISCUSSIONS

3.1 RESULTS

3.1.1 BENEFIT OF PROJECT:-

This project supports data analysis: it can be viewed as a more convenient application that saves users time by fetching the data from a website in minutes.

3.1.2 RESULT ON INDIVIDUAL DEVELOPMENT:-

On an individual level, this project helped me a great deal in understanding the concepts of Python. Through it I was able to explore Django and Apache Server and to strengthen my grasp of hybrid programming, JavaScript, HTML and CSS.

As a result of all this, I am able to create web applications using Django and Python.


3.2 DISCUSSIONS OF PROJECT


3.2.1 USER SECTION:-

Since the system is meant to be accessed by the admin and the users, I have created a login system and a signup system.

Some direct links are provided so that users can visit websites that explain the tools used in the scraping process in full and offer more information about web scraping.
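
The report does not show how the login and signup system was implemented; in a Django project this is commonly handled by the framework's built-in django.contrib.auth application. As a framework-independent illustration of the idea, the following minimal sketch stores salted password hashes in memory. The functions signup and login, the user store and the iteration count are all assumptions made for this example.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory user store: username -> (salt, password hash).
_users = {}

def _hash_password(password, salt):
    # PBKDF2 with SHA-256; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100000)

def signup(username, password):
    """Register a new user; return False if the username is already taken."""
    if username in _users:
        return False
    salt = os.urandom(16)
    _users[username] = (salt, _hash_password(password, salt))
    return True

def login(username, password):
    """Return True only if the user exists and the password matches."""
    record = _users.get(username)
    if record is None:
        return False
    salt, stored_hash = record
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(stored_hash, _hash_password(password, salt))

print(signup("admin", "s3cret"))   # first registration
print(login("admin", "s3cret"))    # correct password
print(login("admin", "wrong"))     # wrong password
```

Storing a salted hash rather than the password itself means that a leaked user table does not directly reveal the passwords.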

3.2.2 DOCUMENTATION SECTION:-

To describe how the web app works, a documentation section is provided that gives a complete overview of the application's functionality.

3.2.3 APPLICATION SECTION:-

The application section has an interface that asks for the URL of the website from which data is to be fetched. It has three parts. The first asks for a URL and copies out the complete HTML code of the document. The second asks for a URL and lists the links associated with the HTML document. The third asks for a URL and extracts all the text written on the website.
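
The link and text extraction described in this section can be sketched with Python's standard html.parser module. This is an assumption about one possible approach, since the report does not name the scraping tools the app uses; PageScraper, scrape and SAMPLE_HTML are hypothetical, and a fixed HTML string stands in for a downloaded page so that the example runs without network access.

```python
from html.parser import HTMLParser

class PageScraper(HTMLParser):
    """Collect link targets and text content from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not part of a script or stylesheet.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())

def scrape(html):
    """Return (links, text) extracted from an HTML string."""
    parser = PageScraper()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text_parts)

# Hypothetical page; in the app the HTML would come from the URL the user enters,
# for example via urllib.request.urlopen(url).read().decode("utf-8").
SAMPLE_HTML = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><h1>Hello</h1><p>Read the <a href="https://example.com/docs">docs</a>.</p>
<script>console.log("ignored");</script></body></html>
"""

links, text = scrape(SAMPLE_HTML)
print(links)
print(text)
```

The first part of the interface, copying out the complete HTML code, corresponds to the raw string passed to scrape, while the returned links and text correspond to the second and third parts.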

3.2.4 LAYOUT SECTION:-

Lastly, the layout pages have been designed with some beautiful quotes, a gallery, and information about web scraping, and are coded with HTML, CSS and Bootstrap.

3.3 SNAPSHOTS OF APPLICATION



4. CONCLUSION AND FUTURE SCOPE



4.1 CONCLUSION

This project supports data analysis: it can be viewed as a more convenient application that saves users time by fetching the data from a website in minutes.

On an individual level, this project helped me a great deal in understanding the concepts of Python. Through it I was able to explore Django and Apache Server and to strengthen my grasp of hybrid programming, JavaScript, HTML and CSS.

As a result of all this, I am able to create web applications using Django and Python.

4.2 FUTURE SCOPE

Scraping methods may change in the future, but scraping will always be in demand, because:

• People and businesses want to gather data instantly; no one likes manual effort.

• There is always a need to compare your business with competitors' businesses, and web scraping helps greatly in gathering data from competitors' websites.

• Web scraping is an important part of marketing, because there is always a need for a database of the targeted audience.

The methods used for web scraping services are changing. At first, PHP scripts were used for web scraping, and they are still popular, but Python is now becoming more popular for this purpose.


REFERENCES

1. www.w3schools.com

2. www.tutorialpoint.com

3. A reference to HTML/CSS/Bootstrap

4. Django Tutorials

5. www.datacamp.com
