
SOFTWARE REQUIREMENTS SPECIFICATION

DATA LEAKAGE DETECTION

Aparna Ramesh
Aparna Raphael
Antarlina Paul
Ishika Shruti
Mahak Faheem
Table of Contents

1. Introduction
1.1 Purpose
1.2 Existing system
1.3 Proposed System
1.4 Key terms, definitions, and abbreviations
1.5 Types of employees that may leak data
1.6 References
1.7 Overview
2. The Overall Description
2.1 Objective
2.2 Plan of execution
2.3 User Characteristics
2.4 Constraints
2.5 Assumptions
3. External Interface Requirements
3.1 User Interfaces
3.2 Hardware Interfaces
3.3 Software Interfaces
4. System Features
4.1 Algorithms
4.1.1 Evaluation of Data Request
4.1.2 Data allocation algorithm
4.1.2.1 Watermarking
4.1.2.2 Fake Object Detection
4.2 Diagrams
4.2.1 Data flow diagrams
A) Context level DFD
B) Level 0 DFD
4.2.2 Sequence Diagram
4.2.3 UML Use-Case Diagrams
4.2.4 UML Class Diagram
4.2.5 UML Activity Diagram
5. Other Non-Functional Requirements
5.1 Performance Requirements
5.2 Reliability
5.3 Accessibility
5.4 Scalability
5.5 Portability
5.6 Maintainability
1. INTRODUCTION
Sometimes it is necessary to give sensitive data to third parties we deem to be trustworthy during
the course of doing business. For example, a hospital may share patients' records with researchers
who will develop new treatments. Similarly, a company may have partnerships with other companies
that require sharing customer data, and an enterprise that outsources its data processing must
provide its data to various other companies. The owner of the data is called the distributor,
and the supposedly trustworthy third parties are called agents.
1.1 Purpose
The purpose of this SRS is to specify a system that identifies data leakages and improves the
probability of detecting them. Our goal is to detect when the distributor’s sensitive data has been
leaked by agents and, if possible, to identify the agent that leaked the data.
1.2 Existing System
Watermarking (embedding a unique code in each copy distributed to an agent) has traditionally
been used to detect leaks. If a copy is later discovered in the hands of an unauthorized party, the
leaker can be identified. Watermarks are very useful in some cases, but they modify the original
data, and a malicious recipient can sometimes destroy them. The existing system can detect the
hackers, but the amount of evidence it gathers is small, so the organization may be unable to take
legal action, and the chances that the hackers escape are high.
1.3 Proposed System
In the proposed system we study unobtrusive techniques for detecting leakage of a set of objects
or records. Specifically, we study the following scenario: After giving a set of objects to agents,
the distributor discovers some of those same objects in an unauthorized place. (For example, the
data may be found on a website, or may be obtained through a legal discovery process.) At this
point, the distributor can assess the likelihood that the leaked data came from one or more agents,
as opposed to having been independently gathered by other means.
● In the proposed approach, we develop a model for assessing the “guilt” of agents. We also
present algorithms for distributing objects to agents, in a way that improves our chances of
identifying a leaker.
● Finally, we also consider the option of adding “fake” objects to the distributed set. Such objects
do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act
as a type of watermark for the entire set, without modifying any individual members. If it turns
out that an agent was given one or more fake objects that were leaked, then the distributor can be
more confident that the agent was guilty.
● In the proposed system, hackers can be traced with a substantial amount of evidence.
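The guilt assessment described above can be made concrete with a simple probabilistic model. The sketch below is an assumption for illustration, not a model fixed by this SRS: it assumes that each leaked object t was either obtained independently by the unauthorized party with probability p, or leaked by one of the agents in Vt (the set of agents that received t), each equally likely, and that objects leak independently of one another. Under these assumptions, the probability that agent Ui is guilty given the leaked set S is

Pr{Gi | S} = 1 - Π over t in (S ∩ Ri) of (1 - (1 - p) / |Vt|)

The class GuiltModel and its method guilt are hypothetical names.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an agent-guilt estimate. The probabilistic model
// (independent objects, guess probability p, leaks shared evenly among holders)
// is an assumption for illustration.
final class GuiltModel {

    private GuiltModel() {}

    /**
     * Estimates Pr{agent is guilty | leaked set}.
     *
     * @param leaked     objects found in the unauthorized place (S)
     * @param received   objects given to the agent being assessed (Ri)
     * @param recipients for each object t, the IDs of agents that received it (Vt)
     * @param p          assumed probability that an object was obtained independently
     */
    static double guilt(Set<String> leaked, Set<String> received,
                        Map<String, Set<String>> recipients, double p) {
        double probInnocent = 1.0;
        for (String t : leaked) {
            if (!received.contains(t)) {
                continue; // only objects this agent actually received count against it
            }
            Set<String> vt = recipients.getOrDefault(t, Collections.<String>emptySet());
            int holders = Math.max(1, vt.size()); // |Vt|, at least this agent
            // Probability that this agent leaked t: (1 - p) split evenly among the holders.
            probInnocent *= 1.0 - (1.0 - p) / holders;
        }
        return 1.0 - probInnocent;
    }
}
```

For example, if U1 received {t1, t2}, U2 received {t1, t3}, the leaked set is {t1, t2}, and p = 0.5, the sketch gives 0.625 for U1 and 0.25 for U2: t2 could only have come from U1, while t1 is shared.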
1.4 Key Terms, Definitions, and Abbreviations

Data leakage: A data breach, i.e., the unintentional release of secure information to an untrusted environment.

Data privacy: Also called information privacy; the branch of data security concerned with the proper handling of data, including consent, notice, and regulatory obligations.

Database: Manages confidential information and keeps access records.

Firewall: Software that controls access to the PC from an external network and stores all access records.

Internet: An interconnected system of networks that connects computers around the world via the TCP/IP protocol.

Server: Maintains the database, which contains user information, third-party server information, and information about semantic

User: Registers on the server side; each user's information is stored there.

OS: Operating system

DFD: Data flow diagram; represents the flow of data through a system or process.

SRS: Software Requirements Specification

GUI: Graphical User Interface

GB: Gigabyte

HDD: Hard disk drive

1.5 Types of Employees That May Leak Data

● The security illiterate: Most employees have little or no knowledge of security and can
put the company at risk through accidental breaches.
● The gadget nerds: Employees who connect a variety of devices to their work PCs and
download all types of software.
● The unlawful residents: Employees who use the company's IT resources for purposes they
shouldn't, such as storing music and movies or playing video games.
● The malicious/disgruntled employees: Typically a minority of employees who gain access
to areas of the IT system they should not access and send corporate data (e.g., customer
lists, R&D data) to third parties.

1.6 References
i. www.google.co.in
ii. www.wikipedia.org
iii. IEEE Std 830-1993, IEEE Recommended Practice for Software Requirements Specifications.
iv. https://www.computer.org/csdl/journal/tk/2011/01/ttk2011010051/13rRUwghd9s

1.7 Overview
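Section 1.0 discusses the purpose and scope of the software.
Section 2.0 describes the overall functionality and constraints of
the software and the user characteristics.
Section 3.0 details the external interface requirements (user, hardware, and software interfaces).
Section 4.0 describes the system features, algorithms, and diagrams.
Section 5.0 lists the other non-functional requirements.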

2. THE OVERALL DESCRIPTION

2.1 Objective
A. A distributor gives sensitive data to a set of supposedly trusted agents (third parties).
Some of the data is later leaked and found in an unauthorized place (e.g., on the web or on
someone's laptop).
B. The distributor must assess the likelihood that the leaked data came from one or more
agents, rather than having been gathered independently by other means.
C. We propose data allocation strategies (across the agents) that improve the probability
of identifying leaks.
D. These strategies do not rely on alterations to the released data (e.g., watermarking). To
further improve our chances of detecting leaks and identifying the guilty party, we can also
inject fake but realistic data records.
E. Our goal is to detect when the distributor's sensitive data has been leaked by agents and,
if possible, to identify which agent was responsible.

2.2 Plan of Execution


A. Identification – Finding and identifying different project ideas, then finalizing one of
them for further implementation.
B. Conceptualization and design – During this stage, the concepts required to build the
project are studied closely, and the design at this level is also completed.
C. Preliminary presentation – A preliminary presentation is given to clarify the project further.
D. Detailed design – During this stage, low-level designing is done. The User Interface is
designed so that the project idea can be more clearly visualized.
E. Coding – The actual implementation of the project begins at this point. Each module will
be coded and tested. This should take between 8 and 10 weeks.
F. Unit Testing – Initially, the backend database will be tested over hundreds of transactions.
The GUI for agent users as well as server users (or distributors) will be tested separately.
G. Integration Testing – All modules will be integrated, and the integrated system will then
be tested as a whole. This will also include an evaluation of the project.
H. System Testing – The entire system will be tested as a complete product. For system
testing, we will use different Linux systems and monitor their performance.
I. Documentation – At this stage, a detailed project document should be prepared.

2.3 User Characteristics


User: All users register themselves, and their data is stored on the server.
Server: Maintains the database containing the user information added during registration.

2.4 Constraints
1. Security: Files containing security-related information must be protected.
2. Fault tolerance: Data shouldn’t be corrupted if the system crashes or the power fails.
3. SQL: SQL commands must be used for the system's queries and applications.
4. At minimum, a centralized database management system should be used.

2.5 Assumptions
Assumption 1:
A unique ID will be generated by our system itself for every agent.
Assumption 2:
Data must be available for each request.
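Assumption 1 can be met in several ways. A minimal sketch, assuming random UUIDs are acceptable as agent IDs (the class AgentRegistry and its methods are hypothetical), retries until it finds an unused ID so that no two agents ever share one:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry that assigns each agent a system-generated unique ID.
// Using UUIDs is an assumption; any collision-free scheme would do.
final class AgentRegistry {
    // agent ID -> agent name; a concurrent map keeps simultaneous registrations safe
    private final Map<String, String> agents = new ConcurrentHashMap<>();

    /** Registers an agent and returns a freshly generated ID that was not in use. */
    String register(String agentName) {
        while (true) {
            String id = UUID.randomUUID().toString();
            // putIfAbsent returns null only when the ID was free, so IDs are never reused
            if (agents.putIfAbsent(id, agentName) == null) {
                return id;
            }
        }
    }

    /** Looks up the agent registered under an ID, or null if the ID is unknown. */
    String nameOf(String id) {
        return agents.get(id);
    }

    int size() {
        return agents.size();
    }
}
```

The retry loop is almost never taken in practice, but it makes the uniqueness guarantee explicit rather than probabilistic.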

3. EXTERNAL INTERFACE REQUIREMENTS

3.1 User Interfaces:


1. Java 1.6
2. Eclipse Helios
3. Tomcat 7.2 (Server)
4. MySQL

3.2 Hardware Interfaces


1. Pentium 4 Processor

2. Minimum 1 GB RAM
3. 40 GB HDD

3.3 Software Interfaces

Operating system: We are using the Windows XP operating system because it provides the best support.

Database: We are using a MySQL database to save data such as the distributor’s data, guilty agent data, and agent records.

Language: We are using the Java language because it provides more interactive support.

4. SYSTEM FEATURES
1. If an authorized agent requests data, the agent is provided with the data that matches the
request.
2. If a data leakage occurs, the leaked dataset is used to find which agent leaked it.
3. The guilty agent module runs on the server side and extracts the fake objects from the leaked
data. If an agent is found guilty, the admin takes action against that agent.

Entities and Agents

The distributor owns a set of data objects T = {t1, t2, …, tm}.
The distributor shares some of the data objects with a set of agents U1, U2, …, Un, but does not
wish the objects to be leaked to any third party. The objects in T can be of any type and size; for
example, they could be tuples in a relation, or relations in a database.
Each agent Ui receives a subset Ri of T.
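The entity model above can be represented directly in code. In this sketch, objects and agents are identified by strings for simplicity, and the class Distribution with its methods allocate and recipientsOf are hypothetical names. allocate enforces that each Ri is a subset of T, and recipientsOf returns Vt, the set of agents that received a given object, which is the information needed later when assessing which agent leaked it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the distributor's data model: objects T, agents Ui, subsets Ri.
final class Distribution {
    private final Set<String> objects;                                   // T
    private final Map<String, Set<String>> allocations = new HashMap<>(); // agent ID -> Ri

    Distribution(Set<String> objects) {
        this.objects = new HashSet<>(objects);
    }

    /** Records that an agent received ri; rejects any object that is not in T. */
    void allocate(String agentId, Set<String> ri) {
        if (!objects.containsAll(ri)) {
            throw new IllegalArgumentException("Ri must be a subset of T");
        }
        allocations.put(agentId, new HashSet<>(ri)); // defensive copy of Ri
    }

    /** Returns Vt: the IDs of all agents that received the given object. */
    Set<String> recipientsOf(String object) {
        Set<String> vt = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : allocations.entrySet()) {
            if (e.getValue().contains(object)) {
                vt.add(e.getKey());
            }
        }
        return vt;
    }
}
```

With T = {t1, t2, t3}, R1 = {t1, t2}, and R2 = {t1, t3}, recipientsOf("t1") returns {U1, U2} and recipientsOf("t2") returns {U1}.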
4.1 Algorithms:

4.1.1 Evaluation of Data Request

The agent sends a request together with the conditions that the requested data must satisfy.
After the request is processed, the agent receives the matching data, to which the system adds
either fake objects or a watermark, depending on the data and the request.

Request Ri = Agent(T, ID): agent Ui, identified by its unique ID, receives all objects in T that
satisfy the condition.
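A minimal sketch of request evaluation, assuming the agent's condition can be expressed as a Java Predicate over objects (RequestEvaluator and evaluate are hypothetical names). It only selects the objects of T that satisfy the condition; adding fake objects or a watermark would happen in a later step.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical sketch of Ri = Agent(T, ID): return every object in T that satisfies
// the agent's condition. The signature is an assumption for illustration.
final class RequestEvaluator {

    private RequestEvaluator() {}

    static <O> List<O> evaluate(Collection<O> t, String agentId, Predicate<? super O> condition) {
        // Every request must carry the agent's unique ID so the result can be tracked.
        Objects.requireNonNull(agentId, "agentId");
        List<O> ri = new ArrayList<>();
        for (O object : t) {
            if (condition.test(object)) {
                ri.add(object); // object satisfies the request condition
            }
        }
        return ri;
    }
}
```

For instance, evaluate(Arrays.asList(1, 2, 3, 4, 5, 6), "U1", x -> x % 2 == 0) returns [2, 4, 6].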

4.1.2 Data allocation algorithm


4.1.2.1 Watermarking
A unique code is embedded into each copy of the data. The code, or watermark, is generated
from the agent's unique ID and cannot be recognized by any outside party. This makes it possible
to trace leaked data back to the agent who received it. The main drawback of this method is that
it alters the data. The recipient may also destroy the watermarks.
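One way to generate a per-agent watermark that outside parties cannot reproduce is a keyed hash of the agent's ID. The sketch below assumes HMAC-SHA256 keyed with a secret known only to the distributor; this algorithm choice is an assumption for illustration, since the SRS does not name one, and the class Watermarker is hypothetical. How the code is physically embedded in each copy of the data is not shown. To trace a leak, the distributor recomputes the code for each known agent and compares it with the code found in the leaked copy.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.NoSuchAlgorithmException;
import java.util.Collection;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch: derive a per-agent watermark code with HMAC-SHA256 keyed by a
// distributor-only secret, so outsiders cannot forge or link codes without the key.
final class Watermarker {
    private final SecretKeySpec key;

    Watermarker(byte[] distributorSecret) {
        this.key = new SecretKeySpec(distributorSecret, "HmacSHA256");
    }

    /** Returns the watermark code for an agent as a 64-character lowercase hex string. */
    String codeFor(String agentId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            byte[] tag = mac.doFinal(agentId.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : tag) {
                hex.append(String.format("%02x", b)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }

    /** Traces a watermark found in leaked data back to an agent, or returns null. */
    String traceAgent(String foundCode, Collection<String> agentIds) {
        for (String id : agentIds) {
            if (codeFor(id).equals(foundCode)) {
                return id;
            }
        }
        return null;
    }
}
```

Because the code depends on the secret key, an agent who sees its own watermark still cannot compute another agent's code, which is what lets the distributor attribute a leaked copy.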

4.1.2.2 Fake Object Detection


In this technique, the system generates fake objects that are not in the set T. The fake objects
are designed to look like real objects and are distributed to agents together with the objects
from T, in order to increase the chances of detecting agents that leak data.
Agents may try to detect fake objects by identifying the data common to several agents as real
objects and the rest as fake objects.
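The fake-object technique can be sketched as follows, assuming for illustration that objects are identified by strings, that each fake object is given to exactly one agent so that finding it in leaked data points to its recipient, and that realistic-looking fakes are produced elsewhere. FakeObjectTracker and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of fake-object allocation and detection.
final class FakeObjectTracker {
    private final Set<String> realObjects;                          // T
    private final Map<String, String> fakeOwner = new HashMap<>();  // fake object -> agent ID

    FakeObjectTracker(Set<String> realObjects) {
        this.realObjects = new HashSet<>(realObjects);
    }

    /** Returns a copy of ri with one fake object added; that fake is given only to agentId. */
    List<String> withFake(String agentId, List<String> ri, String fake) {
        if (realObjects.contains(fake) || fakeOwner.containsKey(fake)) {
            throw new IllegalArgumentException("fake object must be new and not in T");
        }
        fakeOwner.put(fake, agentId);
        List<String> out = new ArrayList<>(ri);
        out.add(fake);
        return out;
    }

    /** Returns the agents whose fake objects appear in the leaked data. */
    Set<String> suspects(Set<String> leaked) {
        Set<String> agents = new HashSet<>();
        for (String object : leaked) {
            String owner = fakeOwner.get(object);
            if (owner != null) {
                agents.add(owner); // a leaked fake object identifies its only recipient
            }
        }
        return agents;
    }
}
```

If U1 receives [r1, r2, f1] and U2 receives [r2, r3, f2], a leaked set {r2, f1} implicates U1 even though r2 was shared, while a leaked set containing only real objects yields no suspect from this check alone.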
4.2 Diagrams

4.2.1 Data Flow Diagram

A. Context level DFD


B. Level 0 DFD
4.2.2 Sequence Diagram
4.2.3 UML Use-Case Diagrams
4.2.4 UML Class Diagram
4.2.5 UML Activity Diagram
5. OTHER NON-FUNCTIONAL REQUIREMENTS

5.1 Performance Requirements:


The data leakage detection software shall allow administrators to set business rules that
classify confidential and sensitive information so that it cannot be disclosed, maliciously or
accidentally, by unauthorized end users. The software shall allow the user to easily discover and
control all sensitive data and to identify the riskiest users within seconds. Whether control must
be applied to source code, engineering drawings, financial data, or sensitive trade secrets, the
solution shall aim to provide granular control over the data that matters without affecting
productivity and progress.

5.2 Reliability:
The model shall safeguard proprietary and sensitive information, from email addresses to product
plans to financial information, against cybercriminals looking to steal or leak a company's data
online.

5.3 Accessibility:
The system should be accessible to as many people as possible on the products and environments
it supports, such as mobile devices, other devices, and services.
The user interface shall be simple and easy to use.

5.4 Scalability:
When resources are added, the system must be able to handle a correspondingly higher load and
throughput. It should continue to work normally under low bandwidth and with a large number of
users.

5.5 Portability:
When something in the project has to change, the software shall be able to reuse existing code
instead of requiring new code; this is a key benefit of high-level programming. It should be easy
to move the software from one environment to another, and configuration changes must be easy to
apply.
• Discover and protect confidential data.
• Provide comprehensive solutions that effectively lower the discovered risk.
• Monitor all data usage and prevent data leakage.
• Maintain high accuracy.
• Automate cloud policy enforcement.
• Lower the risk.
• Provide visibility and control over encrypted data.
• Safeguard the data.
• Delivery time should be approximate.
5.6 Maintainability:
Maintainability is the ability to sustain the product and the data in it. It shall be possible to
modify the project to meet new requirements; if the user wants to add new features, the product
shall be easy to modify. Because the language used is easy and simple, changes to the product
shall be easy to make.
