
ABSTRACT

We propose AccountTrade, a set of accountable protocols for big data trading among dishonest consumers. To secure the big data trading environment, our protocols achieve book-keeping ability and accountability against dishonest consumers who may misbehave throughout dataset transactions. We study the responsibilities of consumers in dataset trading and design AccountTrade to achieve accountability against dishonest consumers who may deviate from those responsibilities. Specifically, we propose the uniqueness index, a new rigorous measurement of data uniqueness, as well as several accountable trading protocols that enable data brokers to blame a dishonest consumer when misbehavior is detected. We formally define, prove, and evaluate the accountability of our protocols with an automatic verification tool as well as extensive evaluation on real-world datasets.

LIST OF FIGURES

5.1 System Architecture
5.2 Dataflow Diagram

TABLE OF CONTENTS

ACKNOWLEDGEMENT
SYNOPSIS
LIST OF ABBREVIATIONS
LIST OF FIGURES
CHAPTER 1 INTRODUCTION
1.1 Advantages
CHAPTER 2 EXISTING SYSTEM
2.1 Introduction
2.2 Literature Survey
A. Data Trading
B. Accountability Systems
C. Copy Detection
D. AccountTrade
2.3 Technique and Disadvantages
2.4 Summary
CHAPTER 3 PROPOSED SYSTEM
3.1 Introduction
3.2 Advantages
3.3 Summary
CHAPTER 4 SYSTEM REQUIREMENTS
4.1 Introduction
4.2 Software Requirements
4.3 Hardware Requirements
4.4 Software Description
4.5 Summary
CHAPTER 5 SYSTEM DESIGN
5.1 Architecture
5.2 Data Flow Diagram
CHAPTER 6 MODULES DESCRIPTION
6.1 Algorithm
6.2 Modules
6.2.1 Data Trading with Brokers
6.2.2 Adversary Model & Channel Assumption
6.2.3 Accountability Model
6.3 Summary
CHAPTER 7 IMPLEMENTATION AND RESULTS
CHAPTER 8 CONCLUSION AND FUTURE ENHANCEMENT
8.1 Conclusion
8.2 Future Enhancements
REFERENCES
APPENDICES
Appendix A Sample Screen Shots
Appendix B Sample Source Code
Appendix C Base Paper
CHAPTER-1
INTRODUCTION
The data trading industry has been growing rapidly (e.g., CitizenMe,
DataExchange, Datacoup, Factual, Qlik, XOR Data Exchange, Terbine).
These data brokers provide B2C or C2C dataset trading services, and they
are the counterparts of physical commodity trading platforms such as eBay
and Amazon. Trading of digital datasets has become a trend, as it has turned
into a promising business in the big data era. Although profits lie in big data,
organizations possessing large-scale datasets (companies or research
institutes) do not participate in data trading because of serious concerns
about user-generated data. One of the major concerns is that we do not have
accountability in digital data trading. The concerns are particularly acute due
to the non-physical nature of digital datasets: replication and delivery are
almost costless when compared to physical commodities. Concerns arise at
the broker side: data owners worry that brokers may illegally disclose or
resell the datasets outsourced to them. On the other hand, concerns arise at
the consumer side as well.
1.1 ADVANTAGES
Dishonest consumers may illegally resell the purchased datasets.
Addressing the first issue is possible because 1) that was one of the FTC's
main concerns, and the FTC has already managed to detect and punish
dishonest data brokers; and 2) achieving accountability in small-size systems
has been shown to be possible. Therefore, it is possible to monitor the
broker's side. The second issue is harder to address. It is reasonable to
monitor the broker's side (recent movements also show that), but it is hardly
acceptable to monitor the consumers. Firstly, it is not lawful to actively
monitor individual consumers' behavior because of the privacy implications.
Secondly, the history of the Internet suggests that any service that requires
installation of a heavy monitoring system cannot survive, because it leads to
a bad user experience.
CHAPTER 2
EXISTING SYSTEM
2.1 INTRODUCTION
One of the major concerns is that we do not have accountability in digital
data trading. The concerns are particularly acute due to the non-physical
nature of digital datasets: replication and delivery are almost costless when
compared to physical commodities. Concerns arise at the broker side: data
owners worry that brokers may illegally disclose or resell the datasets
outsourced to them. On the other hand, concerns arise at the consumer side
as well: dishonest consumers may illegally resell the purchased datasets.
2.2 LITERATURE SURVEY
A. Data Trading - T. Jung and X.-Y. Li(2014)
Data trading has become a popular research topic in recent years. Cao et al.
designed a data trading platform utilizing an iterative auction mechanism to
achieve maximum efficiency and to offset the self-interested behaviors
among the players (data collectors, users, and owners). The platform was
created for multiple players who may compete with each other.
Delgado-Segura et al. proposed an innovative protocol that ensures fairness
between data buyers and sellers, so that neither party is required to trust the
other during the whole transaction, as fairness is enforced by the said
protocol based on the Bitcoin scripting language. The outcome of the fair
protocol is that payment and goods (data) delivery occur at the same time
during each transaction.
B. Accountability Systems - K. Khurana and M. Chandak (2013)
Accountability systems have been studied in many other areas. Logging
mechanisms are used to achieve accountability in wireless sensor networks;
a trusted IP manager is employed for accountability in the Internet protocol;
Byzantine fault detection is studied to account for faults in distributed
systems; a memory attestation protocol is proposed to achieve accountability
against energy-theft attackers in smart grids; and finally, accountability in
virtual machines is studied to secure the cloud environment.
C. Copy Detection - J. Kim, D. Han, Y.-W. Tai, and J. Kim (2016)
Besides the fast variants we presented, alternatives exist for every dataset
type except table/graph datasets. Text: shingling along with MinHashing has
long been used in text copy detection to discover similar text documents.
W-shingling, i.e., shingling by words instead of letters, has also been
proposed to capture word-based similarity. Charikar et al. proposed SimHash
in order to detect near-duplicate text documents, converting documents to
high-dimensional vectors and then to small hash values. Image: prior work
introduces two approaches to perform copy detection: Locality-Sensitive
Hashing on color histograms and MinHash on feature descriptors. In both
approaches, the image is treated as a set of elements. In other work, feature
descriptors are extracted from each image, and MinHash is applied to them,
after which the Jaccard index is approximated by the MinHash values.
Video: video copy detection is deeply related to that of image datasets. The
major approach is to select keyframes and compare the similarities of the
keyframes. JSON/XML: XML similarity comparison has been surveyed; the
survey includes a dynamic-programming method based on tree edit distance
and an information-retrieval method for document-centric XML. For JSON
types, a FOSS JavaScript library enables calculation of similarities using a
point-based mechanism. However, none of them is practical in our scenario,
where a broker needs to handle billions of datasets.
D. AccountTrade - D. C. Lee, Q. Ke, and M. Isard.(2010)
In the conference version, four common types of datasets (text, image,
video, table) in dataset trading were discussed. XML/JSON-like datasets
with (key:value) structures are another common type which was not handled
in the conference version. Because (key:value) pairs in XML or JSON
datasets present a structure similar to a tree or a forest, the data examination
mechanism needs to capture the similarity of the structure rather than the
string values. In this paper, we present a similarity comparison mechanism
for JSON datasets which is scalable enough to handle millions of datasets.
The mechanism still uses the uniqueness index from the conference version.
Therefore, AccountTrade in the extended version is a coherent framework
which can handle the trading of text, image, video, table, and XML/JSON
datasets.
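To make the structural-similarity idea concrete, the following minimal C# sketch (an assumption for illustration, not the paper's mechanism) flattens a JSON document into the set of its key paths, ignoring leaf values, so that the set-based Jaccard machinery shown earlier can be reused; it assumes the System.Text.Json package is available.

using System.Collections.Generic;
using System.Text.Json;

static class JsonStructure
{
    // Flatten a JSON document into the set of its key paths, ignoring leaf values,
    // so that structural similarity can be compared with set-based measures
    // such as the Jaccard index shown earlier.
    public static HashSet<string> KeyPaths(string json)
    {
        var paths = new HashSet<string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
            Walk(doc.RootElement, "", paths);
        return paths;
    }

    private static void Walk(JsonElement element, string prefix, HashSet<string> paths)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            foreach (JsonProperty prop in element.EnumerateObject())
            {
                string path = prefix + "/" + prop.Name;
                paths.Add(path);
                Walk(prop.Value, path, paths);
            }
        }
        else if (element.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement item in element.EnumerateArray())
                Walk(item, prefix + "/[]", paths);
        }
    }
}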
2.3 TECHNIQUE AND DISADVANTAGES
We do not have accountability in digital data trading. The concerns are
particularly acute due to the non-physical nature of digital datasets:
replication and delivery are almost costless when compared to physical
commodities. Data owners worry that brokers may illegally disclose or resell
the datasets outsourced to them, and concerns arise at the consumer side as
well: dishonest consumers may illegally resell the purchased datasets.
2.4 SUMMARY
This paper and its conference version are the first to study consumer
accountability in data trading. We first review recent works in data trading,
accountability systems, and copy detection. Then, we compare this paper
with our own conference version.
CHAPTER 3
PROPOSED SYSTEM
3.1 INTRODUCTION
We propose a suite of accountable protocols, denoted AccountTrade, for big
data trading hosted by data brokers. AccountTrade enables brokers to attain
accountability against dishonest consumers throughout the trading by
detecting their misbehavior. The trading-related misbehavior defined in this
paper includes tax evasion, denial of purchase, and resale of others' datasets.
Note that we do not try to detect idea-wise plagiarism (e.g., novels with
similar plots, images taken at the same scenic spot, videos taken with similar
angles) because originality is a subjective factor that is hardly decidable even
by humans. Instead, we propose to detect blatant copies (not necessarily
exact copies) in the datasets uploaded by owners, by detecting whether the
given datasets are derived from others that have been uploaded before.
3.2 ADVANTAGES
AccountTrade guarantees correct book-keeping and achieves accountability
in big data trading among dishonest consumers. AccountTrade blames
dishonest consumers if they deviate from their responsibilities in data
transactions.
3.3 SUMMARY
Several challenges exist in designing AccountTrade. Firstly, the boundary
for whether a sale is legitimate is hard to define. There are two reasons for
that: (1) dishonest sellers may introduce perturbation into others' datasets
before they try to resell them, and defining how much overlap makes one
dataset a copy of another is challenging; (2) data brokers buy and sell a huge
number of large-scale datasets, but illegal-resale monitoring inherently
involves some form of scanning over all datasets that they possess. Finally,
as mentioned earlier, AccountTrade should impose negligible overhead on
consumers and practically acceptable overhead on brokers. We have
addressed these challenges by defining and proposing the uniqueness index
in AccountTrade, which is a new rigorous measurement of data uniqueness.
This index enables AccountTrade to detect dataset resale efficiently even
when trading big data.
CHAPTER 4
SYSTEM REQUIREMENTS
4.1 INTRODUCTION
The purpose of this document is to collect and analyze the assorted ideas that
have come up to define the system and its requirements with respect to
consumers. We also predict and sort out how we hope this product will be
used in order to gain a better understanding of the project, outline concepts
that may be developed later, and document ideas that are being considered
but may be discarded as the product develops. In short, the purpose of this
SRS document is to provide a detailed overview of our software product, its
parameters, and its goals. The document describes the project's target
audience and its user interface, hardware, and software requirements. It
defines how our client, team, and audience see the product and its
functionality, and it helps designers and developers during the software
development life cycle (SDLC).
4.2 SOFTWARE REQUIREMENTS
• Operating system : Windows 7 Ultimate.
• Front End : C#, .NET
• Back End : SQL Server
4.3 HARDWARE REQUIREMENTS
• System : Dual Core Processor
• Hard Disk : 160 GB
• Ram : 2 GB
4.4 SOFTWARE DESCRIPTION
VISUAL STUDIO .NET
Microsoft Visual Studio is an integrated development environment (IDE)
from Microsoft. It is used to develop computer programs for Microsoft
Windows, as well as web sites, web applications and web services. Visual
Studio uses Microsoft software development platforms such as Windows
API, Windows Forms, Windows Presentation Foundation, Windows Store,
and Microsoft Silverlight. It can produce both native code and managed
code.
Visual Studio includes a code editor supporting IntelliSense (the code
completion component) as well as code refactoring. The
integrated debugger works both as a source-level debugger and a machine-
level debugger. Other built-in tools include a forms designer for
building GUI applications, web designer, class designer, and database
schema designer. It accepts plug-ins that enhance the functionality at almost
every level—including adding support for source-control systems
(like Subversion) and adding new toolsets like editors and visual designers
for domain-specific languages or toolsets for other aspects of the
software (like the Team Foundation Server client: Team Explorer).
Visual Studio supports different programming languages and allows the code
editor and debugger to support (to varying degrees) nearly any programming
language, provided a language-specific service exists. Built-in languages
include C, C++, and C++/CLI (via Visual C++), VB.NET (via Visual Basic
.NET), C# (via Visual C#), and F# (as of Visual Studio 2010). Support for
other languages such as M, Python, and Ruby among others is available via
language services installed separately. It also supports XML/
XSLT, HTML/XHTML, JavaScript and CSS.
Features
Visual Studio features background compilation (also called incremental
compilation). As code is being written, Visual Studio compiles it in the
background in order to provide feedback about syntax and compilation
errors, which are flagged with a red wavy underline. Warnings are marked
with a green underline. Background compilation does not generate
executable code, since it requires a different compiler than the one used to
generate executable code. Background compilation was initially introduced
with Microsoft Visual Basic but has since been expanded to all included
languages.
Debugger
Visual Studio includes a debugger that works both as a source-level
debugger and as a machine-level debugger. It works with both managed code
as well as native code and can be used for debugging applications written in
any language supported by Visual Studio. In addition, it can also attach to
running processes and monitor and debug those processes. If source
code for the running process is available, it displays the code as it is being
run. If source code is not available, it can show the disassembly. The Visual
Studio debugger can also create memory dumps as well as load them later
for debugging. Multi-threaded programs are also supported. The
debugger can be configured to be launched when an application running
outside the Visual Studio environment crashes.
Designer
Visual Studio includes a host of visual designers to aid in the development of
applications. These tools include:
Windows Forms Designer
The Windows Forms designer is used to build GUI applications using
Windows Forms. Layout can be controlled by housing the controls inside
other containers or locking them to the side of the form. Controls that display
data (like textbox, list box, grid view, etc.) can be bound to data sources like
databases or queries. Data-bound controls can be created by dragging items
from the Data Sources window onto a design surface. The UI is linked
with code using an event-driven programming model. The designer
generates either C# or VB.NET code for the application.
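For illustration, the following minimal hand-written C# sketch shows the kind of binding the designer generates: a hypothetical Customer list bound to a DataGridView through a BindingSource, with one event handler wired in the event-driven model. All names are illustrative assumptions, not designer output.

using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical data item; in a designer-generated project this would come from a data source.
public class Customer
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class MainForm : Form
{
    private readonly DataGridView grid = new DataGridView { Dock = DockStyle.Fill };
    private readonly BindingSource binding = new BindingSource();

    public MainForm()
    {
        // Bind the grid to an in-memory list; a database query result could be used instead.
        binding.DataSource = new List<Customer>
        {
            new Customer { Id = 1, Name = "Alice" },
            new Customer { Id = 2, Name = "Bob" }
        };
        grid.DataSource = binding;
        grid.CellDoubleClick += OnCellDoubleClick;   // event-driven UI wiring
        Controls.Add(grid);
    }

    private void OnCellDoubleClick(object sender, DataGridViewCellEventArgs e)
    {
        if (e.RowIndex >= 0)
            MessageBox.Show(((Customer)binding[e.RowIndex]).Name);
    }
}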
WPF Designer
The WPF designer, codenamed Cider, was introduced with Visual
Studio 2008. Like the Windows Forms designer it supports the drag and drop
metaphor. It is used to author user interfaces targeting Windows Presentation
Foundation. It supports all WPF functionality including data binding and
automatic layout management. It generates XAML code for the UI. The
generated XAML file is compatible with Microsoft Expression Design, the
designer-oriented product. The XAML code is linked with code using a
code-behind model.
Class designer
The Class Designer is used to author and edit classes (including their
members and access) using UML modeling. The Class Designer can
generate C# and VB.NET code outlines for the classes and methods. It can
also generate class diagrams from hand-written classes.
Data designer
The data designer can be used to graphically edit database schemas,
including typed tables, primary and foreign keys and constraints. It can also
be used to design queries from the graphical view.
Mapping designer
From Visual Studio 2008 onwards, the mapping designer is used by LINQ to
SQL to design the mapping between database schemas and the classes that
encapsulate the data. The new solution from ORM approach, ADO.NET
Entity Framework, replaces and improves the old technology.
Other tools
The open tabs browser is used to list all open tabs and to switch between
them. It is invoked using CTRL+TAB.
Properties Editor
The Properties Editor tool is used to edit properties in a GUI pane inside
Visual Studio. It lists all available properties (both read-only and those
which can be set) for all objects including classes, forms, web pages and
other items.
Object Browser
The Object Browser is a namespace and class library browser for Microsoft
.NET. It can be used to browse the namespaces (which are arranged
hierarchically) in managed assemblies. The hierarchy may or may not reflect
the organization in the file system.
Solution Explorer
In Visual Studio parlance, a solution is a set of code files and other resources
that are used to build an application. The files in a solution are arranged
hierarchically, which might or might not reflect the organization in the file
system. The Solution Explorer is used to manage and browse the files in a
solution.
Team Explorer
Team Explorer is used to integrate the capabilities of Team Foundation
Server, the revision control system, into the IDE (and the basis for
Microsoft's CodePlex hosting environment for open source projects). In
addition to source control it provides the ability to view and manage
individual work items (including bugs, tasks and other documents) and to
browse TFS statistics. It is included as part of a TFS install and is also
available as a download for Visual Studio separately. Team Explorer
is also available as a stand-alone environment solely to access TFS services.
Data Explorer
Data Explorer is used to manage databases on Microsoft SQL Server
instances. It allows creation and alteration of database tables (either by
issuing T-SQL commands or by using the Data designer). It can also be used
to create queries and stored procedures, with the latter in either T-SQL or in
managed code via SQL CLR. Debugging and IntelliSense support is
available as well.
Server Explorer
The Server Explorer tool is used to manage database connections on an
accessible computer. It is also used to browse running Windows Services,
performance counters, Windows Event Log and message queues and use
them as a data source.
Dotfuscator Software Services Community Edition
Visual Studio includes a free 'light' version of Dotfuscator by PreEmptive
Solutions, which performs code obfuscation and application-size reduction.
Starting with Visual Studio 2010, this version of Dotfuscator includes
runtime intelligence capabilities that allow authors to gather end-user usage,
performance, and stability information from their applications running in
production.
Text Generation Framework
Visual Studio includes a full text generation framework called T4 which
enables Visual Studio to generate text files from templates either in the IDE
or via code.
ASP.NET Web Site Administration Tool
The ASP.NET Web Site Administration Tool allows for the configuration of
ASP.NET websites.
Visual Studio Tools for Office
Visual Studio Tools for Office is an SDK and an add-in for Visual Studio that
includes tools for developing for the Microsoft Office suite. Previously (for
Visual Studio .NET 2003 and Visual Studio 2005) it was a separate SKU
that supported only Visual C# and Visual Basic languages or was included in
the Team Suite. With Visual Studio 2008, it is no longer a separate SKU but
is included with Professional and higher editions. A separate runtime is
required when deploying VSTO solutions.
Extensibility
Visual Studio allows developers to write extensions for Visual
Studio to extend its capabilities. These extensions "plug into" Visual Studio
and extend its functionality. Extensions come in the form of macros, add-ins,
and packages. Macros represent repeatable tasks and actions that developers
can record programmatically for saving, replaying, and distributing. Macros,
however, cannot implement new commands or create tool windows. They
are written using Visual Basic and are not compiled. Add-Ins provide
access to the Visual Studio object model and can interact with the IDE tools.
Add-Ins can be used to implement new functionality and can add new tool
windows. Add-Ins are plugged into the IDE via COM and can be created in
any COM-compliant language. Packages are created using the Visual
Studio SDK and provide the highest level of extensibility. They can create
designers and other tools, as well as integrate other programming languages.
The Visual Studio SDK provides unmanaged APIs as well as a managed API
to accomplish these tasks. However, the managed API isn't as
comprehensive as the unmanaged one. Extensions are supported in the
Standard (and higher) versions of Visual Studio 2005. Express Editions do
not support hosting extensions. Visual Studio 2008 introduced the Visual
Studio Shell that allows for development of a customized version of the IDE.
The Visual Studio Shell defines a set of VSPackages that provide the
functionality required in any IDE. On top of that, other packages can be
added to customize the installation. The Isolated mode of the shell creates a
new AppId where the packages are installed. These are to be started with a
different executable. It is aimed at the development of custom development
environments, either for a specific language or a specific scenario. The
Integrated mode installs the packages into the AppId of the
Professional/Standard/Team System editions, so that the tools integrate into
these editions. The Visual Studio Shell is available as a free download.
After the release of Visual Studio 2008, Microsoft created the Visual Studio
Gallery. It serves as the central location for posting information about
extensions to Visual Studio. Community developers as well as commercial
developers can upload information about their extensions to Visual Studio
.NET 2002 through Visual Studio 2010. Users of the site can rate and review
the extensions to help assess the quality of extensions being posted. RSS
feeds to notify users on updates to the site and tagging features are also
planned.
Supported products
Microsoft Visual C++ is Microsoft's implementation of the C and C++
compiler and associated language services and specific tools for integration
with the Visual Studio IDE. It can compile either in C mode or C++ mode.
For C, it follows the 1990 version of the ISO C standard with parts of C99
specification along with MS-specific additions in the form of libraries.
For C++, it follows the ANSI C++ specification along with a few C++11
features. It also supports the C++/CLI specification to write managed
code, as well as mixed-mode code (a mix of native and managed code).
Microsoft positions Visual C++ for development in native code or in code
that contains both native as well as managed components. Visual C++
supports COM as well as the MFC library. For MFC development, it
provides a set of wizards for creating and customizing MFC boilerplate code,
and creating GUI applications using MFC. Visual C++ can also use the
Visual Studio forms designer to design UI graphically. Visual C++ can also
be used with the Windows API. It also supports the use of intrinsic
functions, which are functions recognized by the compiler itself and not
implemented as a library. Intrinsic functions are used to expose the SSE
instruction set of modern CPUs. Visual C++ also includes the OpenMP
(version 2.0) specification.
Microsoft Visual C#
Microsoft Visual C#, Microsoft's implementation of the C# language, targets
the .NET Framework, along with the language services that let the Visual
Studio IDE support C# projects. While the language services are a part of
Visual Studio, the compiler is available separately as a part of the .NET
Framework. The Visual C# 2008, 2010 and 2012 compilers support versions
3.0, 4.0 and 5.0 of the C# language specifications, respectively. Visual C#
supports the Visual Studio Class designer, Forms designer, and Data
designer among others.
Microsoft Visual Basic
Microsoft Visual Basic is Microsoft's implementation of the VB.NET
language and associated tools and language services. It was introduced with
Visual Studio .NET (2002). Microsoft has positioned Visual Basic for Rapid
Application Development. Visual Basic can be used to author both console
applications as well as GUI applications. Like Visual C#, Visual Basic also
supports the Visual Studio Class designer, Forms designer, and Data
designer among others. Like C#, the VB.NET compiler is also available as a
part of .NET Framework, but the language services that let VB.NET projects
be developed with Visual Studio are available as a part of the latter.
Microsoft Visual Web Developer
Microsoft Visual Web Developer is used to create web sites, web
applications and web services using ASP.NET. Either C# or VB.NET
languages can be used. Visual Web Developer can use the Visual Studio
Web Designer to graphically design web page layouts.
Team Foundation Server
Team Foundation Server is intended for collaborative software development
projects and acts as the server-side backend providing source control, data
collection, reporting, and project-tracking functionality. It also includes the
Team Explorer, the client tool for TFS services, which is integrated inside
Visual Studio Team System.
Previous products
Visual FoxPro is a data-centric object-oriented and procedural programming
language produced by Microsoft. It derives from FoxPro (originally known
as FoxBASE) which was developed by Fox Software beginning in 1984.
Visual FoxPro is tightly integrated with its own relational database engine,
which extends FoxPro's xBase capabilities to support SQL queries and data
manipulation. Visual FoxPro is a full-featured, dynamic programming
language that does not require the use of an additional general-purpose
programming environment. In 2007, Visual FoxPro was discontinued after
version 9 Service Pack 2. It was supported until 2015.
Visual Source Safe
Microsoft Visual SourceSafe is a source control software package oriented
towards small software-development projects. The SourceSafe database is a
multi-user, multi-process file-system database, using the Windows file
system database primitives to provide locking and sharing support. All
versions are multi-user, using SMB (file server) networking.
However, with Visual SourceSafe 2005, other client–server modes were
added, Lan Booster and VSS Internet (which used HTTP/HTTPS). Visual
SourceSafe 6.0 was available as a stand-alone product and was included
with Visual Studio 6.0, and other products such as Office Developer Edition.
Visual SourceSafe 2005 was available as a stand-alone product and included
with the 2005 Team Suite. Team Foundation Server has superseded VSS as
Microsoft's recommended platform for source control.
Microsoft Visual J++/Microsoft Visual J#
STRUCTURED QUERY LANGUAGE (SQL)
The main drive behind a relational database is to increase accuracy by
increasing the efficiency with which data is stored. For example, the names
of each of the millions of people who immigrated to the United States
through Ellis Island at the turn of the 20th century were recorded by hand on
large sheets of paper; people from the city of London had their country of
origin entered as England, or Great Britain, or United Kingdom, or U.K., or
UK, or Engl., etc. Multiple ways of recording the same information lead to
future confusion when there is a need to simply know how many people
came from the country now known as the United Kingdom.
The modern solution to this problem is the database. A single entry is made
for each country, for example, in a reference list that might be called the
Country table. When someone needs to indicate the United Kingdom, he
only has one choice available to him from the list: a single entry called
"United Kingdom". In this example, "United Kingdom" is the unique
representation of a country, and any further information about this country
can use the same term from the same list to refer to the same country. For
example, a list of telephone country codes and a list of European castles both
need to refer to countries; by using the same Country table to provide this
identical information to both of the new lists, we've established new
relationships among different lists that only have one item in common:
country. A relational database, therefore, is simply a collection of lists that
share some common pieces of information.
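As a small illustration of this idea in the project's front-end language, the following C# sketch (hypothetical class names, not part of any existing schema) models a single Country reference list and two other lists that refer to it by key instead of repeating country names.

// Hypothetical C# model of the relational idea: one Country reference list,
// and other lists that refer to it by key instead of repeating country names.
public class Country
{
    public int Id { get; set; }        // primary key
    public string Name { get; set; }   // e.g. "United Kingdom", stored exactly once
}

public class TelephoneCountryCode
{
    public int CountryId { get; set; } // foreign key into the Country list
    public string Code { get; set; }
}

public class EuropeanCastle
{
    public int CountryId { get; set; } // the same Country list serves this list too
    public string Name { get; set; }
}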
Structured Query Language (SQL) is a language used to request data from a
database, to add, update, or remove data within a database, or to manipulate
the metadata of the database.
SQL is a declarative language in which the expected result or operation is
given without the specific details about how to accomplish the task. The
steps required to execute SQL statements are handled transparently by the
SQL database. SQL is sometimes characterized as non-procedural because
the lower-level logical and physical operations are not specified by the
programmer but are determined by the SQL engine or server process that
executes the statement; procedural languages, in contrast, generally require
the details of the operations to be specified, such as opening and closing
tables, loading and searching indexes, or flushing buffers and writing data to
filesystems.
Instructions are given in the form of statements, consisting of a specific SQL
statement and additional parameters and operands that apply to that
statement. SQL statements and their modifiers are based upon official SQL
standards and certain extensions that each database provider implements.
Commonly used statements are grouped into the following categories (a
short usage sketch follows these lists):
Data Query Language (DQL)
 SELECT - Used to retrieve certain records from one or more tables.
Data Manipulation Language (DML)
 INSERT - Used to create a record.
 UPDATE - Used to change certain records.
 DELETE - Used to delete certain records.
Data Definition Language (DDL)
 CREATE - Used to create a new table, a view of a table, or other
object in database.
 ALTER - Used to modify an existing database object, such as a table.
 DROP - Used to delete an entire table, a view of a table or other object
in the database.
Data Control Language (DCL)
 GRANT - Used to give a privilege to someone.
 REVOKE - Used to take back privileges granted to someone.
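The following C# (ADO.NET) sketch shows how a few of these statements might be issued against SQL Server from the front end. The connection string, database name, and Country table schema (an integer identity Id plus a Name column) are placeholder assumptions for illustration only.

using System;
using System.Data.SqlClient;

class CountryQueries
{
    // Placeholder connection string; adjust the server and database names for a real deployment.
    const string ConnStr = @"Server=.\SQLEXPRESS;Database=ProjectDb;Integrated Security=true;";

    static void Main()
    {
        using (var conn = new SqlConnection(ConnStr))
        {
            conn.Open();

            // DML: insert one reference row (parameterized to avoid SQL injection).
            using (var insert = new SqlCommand(
                "INSERT INTO Country (Name) VALUES (@name)", conn))
            {
                insert.Parameters.AddWithValue("@name", "United Kingdom");
                insert.ExecuteNonQuery();
            }

            // DQL: read the rows back.
            using (var select = new SqlCommand("SELECT Id, Name FROM Country", conn))
            using (var reader = select.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}: {1}", reader.GetInt32(0), reader.GetString(1));
            }
        }
    }
}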
4.5 SUMMARY
The software requirements specification document lists sufficient and
necessary requirements for the project development. To derive the
requirements, the developer needs to have a clear and thorough
understanding of the product under development. This is achieved through
detailed and continuous communication with the project team and the
customer throughout the software development process.
CHAPTER - 5
SYSTEM DESIGN
5.1 ARCHITECTURE
5.2 DATA FLOW DIAGRAM
INPUT DESIGN AND OUTPUT DESIGN
INPUT DESIGN
The input design is the link between the information system and the user. It
comprises developing specifications and procedures for data preparation,
and those steps are necessary to put transaction data into a usable form for
processing. This can be achieved by instructing the computer to read data
from a written or printed document, or it can occur by having people key the
data directly into the system. The design of input focuses on controlling the
amount of input required, controlling errors, avoiding delay, avoiding extra
steps, and keeping the process simple. The input is designed in such a way
that it provides security and ease of use while retaining privacy.
Input design considered the following things:
 What data should be given as input?
 How the data should be arranged or coded?
 The dialog to guide the operating personnel in providing input.
 Methods for preparing input validations and steps to follow when
errors occur.
OBJECTIVES
1. Input design is the process of converting a user-oriented description of the
input into a computer-based system. This design is important to avoid errors
in the data input process and to show the correct direction to the
management for getting correct information from the computerized system.
2. It is achieved by creating user-friendly screens for data entry to handle
large volumes of data. The goal of designing input is to make data entry
easier and free from errors. The data entry screen is designed in such a way
that all data manipulations can be performed. It also provides record-viewing
facilities.
3. When data is entered, it is checked for validity. Data can be entered with
the help of screens. Appropriate messages are provided as and when needed
so that the user is not left confused. Thus the objective of input design is to
create an input layout that is easy to follow.
OUTPUT DESIGN
A quality output is one which meets the requirements of the end user and
presents the information clearly. In any system, the results of processing are
communicated to the users and to other systems through outputs. In output
design it is determined how the information is to be displayed for immediate
need, as well as the hard-copy output. It is the most important and direct
source of information to the user. Efficient and intelligent output design
improves the system's relationship with the user and helps in
decision-making.
1. Designing computer output should proceed in an organized,
well-thought-out manner; the right output must be developed while ensuring
that each output element is designed so that people will find the system easy
and effective to use. When analysts design computer output, they should
identify the specific output that is needed to meet the requirements.
2. Select methods for presenting information.
3. Create documents, reports, or other formats that contain information
produced by the system.
The output form of an information system should accomplish one or more of
the following objectives:
 Convey information about past activities, current status, or projections
of the future.
 Signal important events, opportunities, problems, or warnings.
 Trigger an action.
 Confirm an action.
SYSTEM STUDY

FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business
proposal is put forth with a very general plan for the project and some cost
estimates. During system analysis the feasibility study of the proposed
system is to be carried out. This is to ensure that the proposed system is not a
burden to the company. For feasibility analysis, some understanding of the
major requirements for the system is essential.

Three key considerations involved in the feasibility analysis are

 Economical feasibility

 Technical feasibility

 Social feasibility

ECONOMICAL FEASIBILITY:

This study is carried out to check the economic impact that the system
will have on the organization. The amount of funds that the company can
pour into the research and development of the system is limited. The
expenditures must be justified. Thus the developed system is well within the
budget, and this was achieved because most of the technologies used are
freely available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the
technical requirements of the system. Any system developed must not place
a high demand on the available technical resources, as this would lead to
high demands being placed on the client. The developed system must have
modest requirements, as only minimal or no changes are required for
implementing this system.

SOCIAL FEASIBILITY:

This aspect of the study is to check the level of acceptance of the system
by the user. This includes the process of training the user to use the system
efficiently. The user must not feel threatened by the system, but must instead
accept it as a necessity. The level of acceptance by the users solely depends
on the methods that are employed to educate the users about the system and
to make them familiar with it. Their level of confidence must be raised so
that they are also able to make some constructive criticism, which is
welcomed, as they are the final users of the system.
SYSTEM TESTING

The purpose of testing is to discover errors. Testing is the process of
trying to discover every conceivable fault or weakness in a work product. It
provides a way to check the functionality of components, sub-assemblies,
assemblies, and/or a finished product. It is the process of exercising software
with the intent of ensuring that the software system meets its requirements
and user expectations and does not fail in an unacceptable manner. There are
various types of tests. Each test type addresses a specific testing requirement.

TYPES OF TESTS:
Testing is the process of trying to discover every conceivable fault or
weakness in a work product. The different types of testing are given below:

UNIT TESTING:

Unit testing involves the design of test cases that validate that the
internal program logic is functioning properly and that program inputs
produce valid outputs. All decision branches and internal code flow should
be validated. It is the testing of individual software units of the application.
It is done after the completion of an individual unit and before integration.
This is structural testing that relies on knowledge of the unit's construction
and is invasive. Unit tests perform basic tests at the component level and test
a specific business process, application, and/or system configuration. Unit
tests ensure that each unique path of a business process performs accurately
to the documented specifications and contains clearly defined inputs and
expected results.
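As an illustrative example only, the following MSTest-style unit test sketch exercises the Shingling helper shown earlier in the copy-detection discussion; the framework choice and the test values are assumptions, not part of the project's actual test suite.

using Microsoft.VisualStudio.TestTools.UnitTesting;

// Minimal unit test sketch: validates that the shingle-based Jaccard similarity
// behaves as expected for identical and disjoint inputs.
[TestClass]
public class ShinglingTests
{
    [TestMethod]
    public void IdenticalTextsHaveJaccardOne()
    {
        var a = Shingling.WordShingles("the quick brown fox jumps over the lazy dog");
        var b = Shingling.WordShingles("the quick brown fox jumps over the lazy dog");
        Assert.AreEqual(1.0, Shingling.Jaccard(a, b), 1e-9);
    }

    [TestMethod]
    public void DisjointTextsHaveJaccardZero()
    {
        var a = Shingling.WordShingles("alpha beta gamma delta");
        var b = Shingling.WordShingles("one two three four");
        Assert.AreEqual(0.0, Shingling.Jaccard(a, b), 1e-9);
    }
}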

INTEGRATION TESTING:

Integration tests are designed to test integrated software components
to determine if they actually run as one program. Testing is event driven and
is more concerned with the basic outcome of screens or fields. Integration
tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of
components is correct and consistent. Integration testing is specifically
aimed at exposing the problems that arise from the combination of
components.

FUNCTIONAL TEST:
Functional tests provide systematic demonstrations that functions tested
are available as specified by the business and technical requirements, system
documentation, and user manuals.
Functional testing is centered on the following items:
Valid Input : identified classes of valid input must be accepted.
Invalid Input : identified classes of invalid input must be rejected.
Functions : identified functions must be exercised.
Output : identified classes of application outputs must be exercised.
Systems/Procedures : interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements,
key functions, or special test cases. In addition, systematic coverage
pertaining to identified business process flows, data fields, predefined
processes, and successive processes must be considered for testing. Before
functional testing is complete, additional tests are identified and the effective
value of current tests is determined.

SYSTEM TEST:
System testing ensures that the entire integrated software system meets
requirements. It tests a configuration to ensure known and predictable
results. An example of system testing is the configuration oriented system
integration test. System testing is based on process descriptions and flows,
emphasizing pre-driven process links and integration points.
WHITE BOX TESTING:
White box testing is testing in which the software tester has knowledge of
the inner workings, structure, and language of the software, or at least its
purpose. It is used to test areas that cannot be reached from a black box
level.
BLACK BOX TESTING:
Black box testing is testing the software without any knowledge of the inner
workings, structure, or language of the module being tested. Black box tests,
as most other kinds of tests, must be written from a definitive source
document, such as a specification or requirements document. It is testing in
which the software under test is treated as a black box: you cannot "see" into
it. The test provides inputs and responds to outputs without considering how
the software works.
UNIT TESTING:
Unit testing is usually conducted as part of a combined code and unit
test phase of the software lifecycle, although it is not uncommon for coding
and unit testing to be conducted as two distinct phases.
Test strategy and approach
Field testing will be performed manually and functional tests will be
written in detail.
Test objectives
 All field entries must work properly.
 Pages must be activated from the identified link.
 The entry screen, messages and responses must not be delayed.
Features to be tested
 Verify that the entries are of the correct format
 No duplicate entries should be allowed
 All links should take the user to the correct page.
INTEGRATION TESTING:
Software integration testing is the incremental integration testing of
two or more integrated software components on a single platform to produce
failures caused by interface defects.
The task of the integration test is to check that components or software
applications, e.g. components in a software system or – one step up –
software applications at the company level – interact without error.
Test Results: All the test cases mentioned above passed successfully.
No defects encountered.
ACCEPTANCE TESTING:
User Acceptance Testing is a critical phase of any project and requires
significant participation by the end user. It also ensures that the system meets
the functional requirements.
Test Results: All the test cases mentioned above passed successfully. No
defects were encountered.
CHAPTER-6
MODULES DESCRIPTION
6.1 PARALLEL K-MEANS ALGORITHM
6.2. MODULES
6.2.1 Data Trading with Brokers
Three entities exist in our model: brokers, sellers, and buyers, and they have
different trading-related responsibilities.
Broker: Brokers provide general shopping services (product listing,
description, payment, delivery, etc.). Besides, they are in charge of
book-keeping for accounting purposes (i.e., recording trading transactions),
and they also need to define the rules that specify what type of selling is
considered re-selling.
Seller: Sellers are mandated to sell only the datasets that are collected or
generated by themselves, and they should not re-sell others' datasets by
slightly perturbing them. Also, they should honestly file the tax report
regarding the dataset sale, and they are forbidden to disturb the brokers'
book-keeping.
Buyer: Buyers should not disturb the brokers' book-keeping. Some areas are
important but orthogonal to ours: properly describing the quality/utility of
datasets for buyers is complementary, and sophisticated pricing mechanisms
may exist, but we simply let sellers set the prices.
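For illustration only, the following C# sketch shows the kind of book-keeping record a broker might store for each cleared transaction. The field names are assumptions made for this sketch, not the specification from the paper.

using System;

// Roles named in the trading model.
public enum Role { Broker, Seller, Buyer }

// One book-keeping entry recorded by the broker for every cleared dataset sale.
public class TradeRecord
{
    public Guid TransactionId { get; set; }
    public string SellerId { get; set; }
    public string BuyerId { get; set; }
    public string DatasetId { get; set; }
    public decimal Price { get; set; }          // set by the seller in this model
    public DateTime ClearedAtUtc { get; set; }
    public string SellerSignature { get; set; } // evidence consulted when assigning blame
    public string BuyerSignature { get; set; }
}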
6.2.2 Adversary Model & Channel Assumption
Malicious users: We assume users may try to avoid the aforementioned
responsibilities, e.g., by disrupting the brokers' data trading service enabled
by AccountTrade, denying cleared transactions (i.e., paid and sold), or
illegally selling previously purchased datasets. A user is defined as a
dishonest user if he avoids any of the trading-related responsibilities, and
such behavior (either selling or buying) is denoted as misbehavior. Note that,
when illegally selling previously purchased datasets, attackers may try to
perturb the dataset to bypass existing copy-detection approaches.
Trusted brokers: We assume the brokers can be trusted, e.g., the role is
played by organizations that are strictly supervised with great transparency
or by commercial companies with high reputation. Similar assumptions can
be found in the literature, and the assumption that the brokers will be strictly
supervised is also consistent with the FTC's recent actions.
Channel assumption: We assume both buyers and sellers interact with the
broker via secure communication channels. The communication is encrypted
and decrypted with pre-distributed keys to guarantee that the dataset is not
open to the public. This also implies authentication is in place, since the
broker needs to use the correct entity's key for communication.

6.2.3 Accountability Model


We inherit an existing definition to define our own accountability model.
Our accountable protocols are characterized by two properties:
• Fairness: honest entities are never blamed.
• Goal-centered completeness: if accountability is not preserved due to
malicious entities' misbehavior, at least one of them is blamed.
General completeness, which states that all misbehaving entities should be
blamed, is impossible to satisfy because "some misbehavior cannot be
observed by any honest party". AccountTrade also requires individual
accountability, which states that it must be able to correctly blame one or
more parties unambiguously, rather than blaming a group in which some
entities misbehaved without knowing which ones.
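As a rough illustration of how fairness and completeness constrain the blame decision, the following C# sketch (reusing the hypothetical TradeRecord from the earlier sketch) issues blame only when verifiable signed evidence contradicts an entity's claim; it is not the paper's formal protocol.

// Illustrative blame rule only. Fairness: without valid signed evidence, no blame is
// ever issued, so honest entities are never blamed. Goal-centered completeness: a
// cleared, signed transaction that an entity later denies yields a blame.
public static class Judge
{
    public static bool ShouldBlame(TradeRecord record, bool signatureValid, bool entityDeniesTransaction)
    {
        return record != null && signatureValid && entityDeniesTransaction;
    }
}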
6.3 Summary
Examine is executed when sellers upload their datasets, and it does not have
to be performed in real time, since it is acceptable that a hold is placed on an
uploaded dataset when it is uploaded for sale. Therefore, the presented
benchmark results are promising in that most of the data can be examined in
a time that is negligible compared to the uploading time.
CHAPTER-7
IMPLEMENTATION AND RESULTS
The detection mechanism involves copy detection in various types of
datasets, and we use the uniqueness index along with threshold-based
detection. We adopt the current threshold-based decision process because it
is extremely fast to make decisions with it, which makes it suitable for
large-scale platforms where billions of datasets may reside; and our
extensive simulation with real-world datasets shows that the uniqueness
indices are clearly separable into two groups: unique and derivative datasets.
In theory, it is possible to project datasets into another, more complex feature
space so that more sophisticated perturbation can be captured (e.g., using
recent advancements such as deep neural networks), but a separate line of
study needs to be performed to increase the efficiency, and the "copy" would
need to be re-defined accordingly.
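A minimal C# sketch of the threshold-based decision described above is given below; the threshold value is a placeholder assumption, whereas the paper calibrates the separation empirically on real-world datasets.

public static class CopyDetector
{
    // Datasets whose uniqueness index falls below the threshold are flagged as derivatives.
    // 0.5 is a placeholder; the actual split is calibrated empirically on real datasets.
    public const double UniquenessThreshold = 0.5;

    public static bool IsDerivative(double uniquenessIndex)
    {
        return uniquenessIndex < UniquenessThreshold;
    }
}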

Besides the fast variants we presented in §III-C, alternatives exist for every
dataset type except table/graph datasets. Text: shingling along with
MinHashing has long been used in text copy detection to discover similar
text documents. W-shingling, i.e., shingling by words instead of letters, has
also been proposed to capture word-based similarity. Charikar et al.
proposed SimHash in order to detect near-duplicate text documents,
converting documents to high-dimensional vectors and then to small hash
values. Image: prior work introduces two approaches to perform copy
detection: Locality-Sensitive Hashing on color histograms and MinHash on
feature descriptors. In both approaches, the image is treated as a set of
elements. In other work, feature descriptors are extracted from each image,
and MinHash is applied to them, after which the Jaccard index is
approximated by the MinHash values. Video: video copy detection is deeply
related to that of image datasets. The major approach is to select keyframes
and compare the similarities of the keyframes.
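To make the MinHash idea concrete, the following C# sketch computes a MinHash signature over a shingle set using simple seeded hash functions and estimates the Jaccard index as the fraction of matching signature positions. The hash family and signature length are illustrative assumptions, not the implementation used in the paper.

using System.Collections.Generic;
using System.Linq;

static class MinHash
{
    // Compute a MinHash signature of a shingle set using k seeded hash functions.
    public static int[] Signature(IEnumerable<string> shingles, int k = 128)
    {
        int[] sig = Enumerable.Repeat(int.MaxValue, k).ToArray();
        foreach (string s in shingles)
        {
            for (int i = 0; i < k; i++)
            {
                // Simple seeded hash for illustration; a real system would use a stronger hash family.
                int h = (s.GetHashCode() ^ (i * 0x5bd1e995)) & int.MaxValue;
                if (h < sig[i]) sig[i] = h;
            }
        }
        return sig;
    }

    // The fraction of positions where two signatures agree estimates the Jaccard index.
    public static double EstimateJaccard(int[] a, int[] b)
    {
        int matches = 0;
        for (int i = 0; i < a.Length; i++)
            if (a[i] == b[i]) matches++;
        return (double)matches / a.Length;
    }
}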
CHAPTER-8
CONCLUSION AND FUTURE ENHANCEMENT

8.1 CONCLUSION:
This paper presents AccountTrade, which guarantees correct book-keeping
and achieves accountability in big data trading among dishonest consumers.
AccountTrade blames dishonest consumers if they deviate from their
responsibilities in data transactions. To achieve accountability against
dishonest sellers who may resell others' datasets, we presented a novel
rigorous quantification of dataset uniqueness – the uniqueness index – which
is efficiently computable.
8.2 FUTURE ENHANCEMENT
We formally defined two accountability models and proved them with
ProVerif and theoretical analysis, and we also evaluated the performance and
QoS using real-world datasets in our implemented test bed.
REFERENCES:
[1] Data markets compared – a look at data market offerings from four
providers. goo.gl/k3qZsj.
[2] Ftc charges data broker with facilitating the theft of millions of dollars
from consumers’ accounts. goo.gl/7ygm7Q.
[3] Ftc charges data brokers with helping scammer take more than $7
million from consumers’ accounts. goo.gl/kZMmXn.
[4] Ftc complaint offers lessons for data broker industry. goo.gl/csBYA3.
[5] Multimedia computing and computer vision lab. goo.gl/pbKeCj.
[6] R. Araujo, S. Foulle, and J. Traoré. A practical and secure coercion-
resistant scheme for remote elections. In Dagstuhl Seminar Proceedings.
Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2008.
[7] D. Baltieri, R. Vezzani, and R. Cucchiara. Sarc3d: a new 3d body model
for people tracking and re-identification. In ICIAP, pages 197–206.
Springer, 2011.
[8] B. Blanchet. Automatic verification of security protocols in the symbolic
model: the verifier ProVerif. In FOSAD, pages 54–87. Springer, 2014.
[9] B. H. Bloom. Space/time trade-offs in hash coding with allowable
errors. Communications of the ACM, 13(7):422–426, 1970.
[10] S. Brin, J. Davis, and H. Garcia-Molina. Copy detection mechanisms
for digital documents. In SIGMOD, volume 24, pages 398–409. ACM,
1995.
APPENDICES
APPENDIX A
Sample Screen Shots
APPENDIX B
Sample Source Code
APPENDIX C
Base Paper
