1.1 Organizational Profile

You might also like

Download as docx, pdf, or txt
Download as docx, pdf, or txt
You are on page 1of 31

Distributed Metadata Management for Large Cluster-Based Storage Systems

1


CHAPTER I
INTRODUCTION
This chapter gives a detailed description of the organization for which the system was
developed. It defines the objective and scope of the proposed system and describes the
various users of the system. It gives brief introduction about objective, scope, users and
limitations of the project. It describes various technologies and tools used in the development
of the system. The development and deployment environments of the system are also
specified in this chapter.
1.1 ORGANIZATIONAL PROFILE
Infitech is a registered Consultancy, Software Development and Technical Training
Company that combines the definite bottom-line benefits with its recognized expertise in
strategic global models that help it in providing reliable Software Development Solutions.
Infitech shortly known as ITS was established by an Microsoft Certified Professional. The
services offered cover Custom Software Development, Application Development, Web
Application Development Software, and website design, Web Marketing, Website Design
and Software Maintenance.

Portfolio Products are:
Logistics
ERP for Textiles
PRM
ERP for Education
Vision
Infitech has achieved a prominent position of an expert Software Development
Company possessing some of the best analytical brains. Our transparent, efficient and flexible
world class software development process zero down risks of project failures and creates
powerful software solutions that meet present as well as future demands.


Distributed Metadata Management for Large Cluster-Based Storage Systems

2


Mission
Infitech mission is "To provide unique cost effective solutions to the Changing
world". Infitech recognizes the need for adaptability and is geared to be agile in meeting the
changing needs of its clients with changing state of the art tools and techniques


1.2 PROBLEM DEFINITION

OBJ ECTIVE
Rapid advances in general-purpose communication networks have
motivated the employment of inexpensive components to build competitive cluster-based
storage solutions to meet the increasing demand of scalable computing. In the recent years,
the bandwidth of these networks has been increased by two orders of magnitude. , which
greatly narrows the performance gap between them and the dedicated networks used in
commercial storage systems. Since all I/O requests can be classified into two categories, that
is, user data requests and metadata requests, the scalability of accessing both data and
metadata has to be carefully maintained to avoid any potential performance bottleneck along
all data paths. This paper proposes a novel scheme, called Hierarchical Bloom Filter Arrays
(HBA), to evenly distribute the tasks of metadata management to a group of MSs. A Bloom
filter (BF) is a succinct data structure for probabilistic membership query. A straightforward
extension of the BF approach to decentralizing metadata management onto multiple MSs is to
use an array of BFs on each MS. The metadata of each file is stored on some MS, called the
home MS.
SCOPE
Solution is basically aimed to create metadata of all the files in the network and finds
a particular file using a search engine in fast and easy

USERS
Login
Given Input-Login details
Expected Output-Login persons can use the software
Finding Network Computers
Distributed Metadata Management for Large Cluster-Based Storage Systems

3

Given Input- Click on the particular button to know network system. Expected Output-
Shows all the connected nodes in the network.
Meta Data Creation
Given Input- Search all the files and stores necessary information. Expected Output-
Updation of database with created metadata.
Searching Files
Given Input- File name, or File size
Expected Output- shows link to get particular file.


LIMITATIONS
HBA: Distributed Metadata Management for Large Cluster-Based Storage Systems
developed using .NET using C#. is developed to search the file over network it contains its
own policies and terms.

Distributed Metadata Management for Large Cluster-Based Storage Systems

4

1.3 SYSTEM ENVIRONMENT
Development Environment
Hardware Configuration
Processor : Pentium IV
Memory (RAM) : 1 GB
Hard disk : 2 GB

Software Configuration
Operating System : Windows 7
Front End : Microsoft Visual Studio .Net 2005
Back End : SQL 2005
Coding : C# 2.0

Deployment Environment
Hardware Specification
Processor : Dual Core
Processor Speed : 2.93 GHz
Memory (RAM) : 8GB
Hard Disk : 1 TB

Software Specification
Operating System : Windows 7
Front End : Microsoft Visual Studio .Net 2005
Back End : SQL 2005








Distributed Metadata Management for Large Cluster-Based Storage Systems

5

CHAPTER II
SYSTEM ANALYSIS
2.1 SYSTEM DESCRIPTION
EXISTING SYSTEM
File mapping or file lookup is critical in decentralizing metadata management within a group
of metadata servers. Following approaches are used in the Existing system.
Table-Based Mapping : It fails to balance the load.
Hashing-Based Mapping : It has slow directory operations, such as listing the
directory contents and renaming directories.
Static Tree Partitioning : Cannot balance the load and has a medium lookup
time.
Dynamic Tree Partitioning : Small memory overhead, incurs a large
Migration overhead.
PROPOSED SYSTEM
The proposed system we are using the new approaches called HIERARCHICAL
BLOOM FILTER ARRAYS (HBA), efficiently route metadata request within a group of
metadata servers. There are two arrays used here. First array is used to reduce memory
overhead, because it captures only the destination metadata server information of frequently
accessed files to keep high management efficiency. And the second one is used to maintain
the destination metadata information of all files. Both the arrays are mainly used for fast
local lookup.
Advantages

1. Faster search of different files from all the nodes of a network

2. Good results with file size and part of a file name.
Applications
This software can use in every intranet for the fast search of files in the network.
Distributed Metadata Management for Large Cluster-Based Storage Systems

6

2.2 USE CASE MODEL
The use case model provides detailed information about the behavior of the system or
application. Use cases describe the major behavior that are identified in the requirements and
describe the value that the results give the users.


Figure 2.1: Use Case Design for Distributed Metadata Management for Large Cluster-Based
Storage Systems
Use case name: Distributed Metadata Management for Large Cluster-Based Storage
Systems
Actors : user

Figure 2.1 shows the overall use case model for the Automated Monitoring system. It
contains all the actors and their actions.


Distributed Metadata Management for Large Cluster-Based Storage Systems

7

2.3 SOFTWARE REQUIREMENTS SPECIFICATION
FUNCTIONAL REQUIREMENTS
Login
In Login Form module presents site visitors with a form with username and password
fields. If the user enters a valid username/password combination they will be granted access
to additional resources on website. Which additional resources they will have access to can
be configured separately.
Finding Network Computers
In this module we are going to find out the available computers from the network. And we
are going to share some of the folder in some computers. We are going to find out the
computers those having the shared folder. By this way will get all the information about the
file and we will form the Meta data.
Meta Data Creation
In this module we are creating a metadata for all the system files. The module is going to
save all file names in a database. In addition to that, it also saves some information from the
text file. This mechanism is applied to avoid the long run process of the existing system.

Searching Files
In this module the user going to enter the text for searching the required file. The searching
mechanism is differing from the existing system. When ever the user gives their searching
text, It is going to search from the database. At first, the search is based on the file name.
After that, it contains some related file name. Then it collects some of the file text, it makes
another search. Finally it produces a search result for corresponding related text for the user.

Distributed Metadata Management for Large Cluster-Based Storage Systems

8

NON-FUNCTIONAL REQUIREMENTS
Reliability
The system must be running continuously without any interruption. To check the
reliability of the system, it is made to run for three months continuously and tested.

Availability
The system should be available to the right users at right time and provide the
facilities exactly in right time. The availability of the numbers to users and locking of
numbers without indefinite locking at the same time should be maintained.

Security
The system should be secure to the users because it handles more sensitive data. In
The Distributed Metadata Management for Large Cluster-Based Storage Systems role based
login should be provided such that the user interface changes according to the role and the
information is hidden from the roles that are not supposed to use the information.

Portability
The Distributed Metadata Management for Large Cluster-Based Storage Systems is
designed in C# technology and it is able to be portable in multiple platforms without any
problem.



















Distributed Metadata Management for Large Cluster-Based Storage Systems

9

2.4 TEST PLAN
TEST SCOPE
Testing is done using tools to certain units in the Number Management System and
also manual testing is done for certain units. Certain tools used to track the bugs that occur in
the system. All the data entered to the system are validated manually by testing for all
possible type of inputs.

TESTING TECHNIQUES
UNIT TESTING
Unit testing is done for all the class and the methods of those classes. Unit testing tool
is used to test each unit in the system. Annotations of the Unit tool are used to check for the
format of the fields are checked. File format is validated because the system can upload only
the file in the xls format. Unit provide various features to check the functionality for multiple
inputs.
LOAD TESTING
Load testing should be done to test the bulk upload and download of numbers. Load
testing also done for large number of outlet users to enter the system and reserve bulk amount
of numbers at the same time.















Distributed Metadata Management for Large Cluster-Based Storage Systems

10


CHAPTER III
SYSTEM DESIGN
Software Design is a process through which requirements are translated into a
representation of software. It is the place where quality is fostered. It identifies the software
components, specifies relationships among components, defines program structure and
provides a blue-print for implementation. This chapter gives the software design for the
system.
3.1 STRUCTURAL DESIGN
DATA FLOW DI AGRAM
A data flow diagram is a graphical representation of the "flow" of data through an
information system and used for the visualization of data processing (structured design).


Figure 3.1: Context Level Data flow diagram of Distributed Metadata Management for
Large Cluster-Based Storage Systems
The context level DFD shows the entire system as a single process, and gives no clues as to
its internal organization as shown in Figure 3.1.





Figure 3.2: Level 1 Data flow diagram for the role of finding the system the Distributed
Metadata Management for Large Cluster-Based Storage Systems







User
Login

Verify
Login table
New File info
User

Find system
Distributed Metadata Management for Large Cluster-Based Storage Systems

11




Figure 3.3: Level 1 Data flow diagram for the role of create meta data the Distributed
Metadata Management for Large Cluster-Based Storage Systems








Figure 3.4: Level 1 Data flow diagram for the role of searching the files in Distributed
Metadata Management for Large Cluster-Based Storage Systems
















Meta data

User
Create Meta
data
File info
User
File search
Distributed Metadata Management for Large Cluster-Based Storage Systems

12

MODULE DIAGRAM










Figure 3.5: Module diagram for the Distributed Metadata Management for Large
Cluster-Based Storage Systems

The Modules in the Figure 3.9 is explained in the following.
Login
In Login Form module presents site visitors with a form with username and password fields.
If the user enters a valid username/password combination they will be granted access to
additional resources on website. Which additional resources they will have access to can be
configured separately.

Finding Network Computers
In this module we are going to find out the available computers from the network. And we
are going to share some of the folder in some computers. We are going to find out the
computers those having the shared folder. By this way will get all the information about the
file and we will form the Meta data.

Meta Data Creation
In this module we are creating a metadata for all the system files. The module is going to
save all file names in a database. In addition to that, it also saves some information from the
text file. This mechanism is applied to avoid the long run process of the existing system.

Distributed Metadata
Management for
Large Cluster-Based
Storage Systems
Login
Finding Network Computers

Meta Data Creation

Searching Files

Distributed Metadata Management for Large Cluster-Based Storage Systems

13

Searching Files
In this module the user going to enter the text for searching the required file. The searching
mechanism is differing from the existing system. When ever the user gives their searching
text, It is going to search from the database. At first, the search is based on the file name.
After that, it contains some related file name. Then it collects some of the file text, it makes
another search. Finally it produces a search result for corresponding related text for the user.
Distributed Metadata Management for Large Cluster-Based Storage Systems

14

3.2 BEHAVIOURAL DESIGN
ACTIVITY DIAGRAM











Login.
New User
Search System
CreateMetada
ta
Meatadata
Search File
Metadata
Complete
Search
Finds file
Yes
Distributed Metadata Management for Large Cluster-Based Storage Systems

15

3.3 DATA DESIGN
3.3.1 ENTITY RELATIONSHIP DIAGRAM



















3.3.2 TABLE DESIGN
Table name: file info
Table Description: Used to store details about the file information.
Field Name Data Type Size Constraints Description
systemname Varchar 30 Not Null System name
filename Varchar 10 Primary key File name
filetxt Varchar 300 Not Null File text
filesize float Not Null File size
Dateofcreation datetime Not Null Date of creation
filetype Varchar 50 Not Null File type



User
Metadata
HAS
File Name
info
Login
Details
File Size
Info
Distributed Metadata Management for Large Cluster-Based Storage Systems

16

Table name: login
Table Description: Used to store details about the login information.
Field Name Data Type Size Constraints Description
Uname Varchar 20 Not Null User name
pass Varchar 20 Primary key Password
role Varchar 15 Not Null Role
Uid int Not Null User identification













Table name: new file info
Table Description: Used to store details about the file information.
Field Name Data Type Size Constraints Description
filename Varchar 100 Primary key File name
systemname Varchar 50 Not Null System name



Table name: suggest
Table Description: Used to store details about the file information.
Field Name Data Type Size Constraints Description
Keytext Varchar 100 Primary key File name

Distributed Metadata Management for Large Cluster-Based Storage Systems

17

3.4 USER INTERFACE DESIGN



Screen 3.4.1 login form

The screen 3.4.1 shows the home page of the Distributed Metadata Management for Large
Cluster-Based Storage Systems. It contains user to login






Distributed Metadata Management for Large Cluster-Based Storage Systems

18


Screen 3.4.2 finds the network computer form

The screen 3.4.2 shows the home page of the Distributed Metadata Management for Large
Cluster-Based Storage Systems. It helps to find system over networks

Distributed Metadata Management for Large Cluster-Based Storage Systems

19


Screen 3.4.3 create the meta data form

The screen 3.4.3 shows the home page of the Distributed Metadata Management for Large
Cluster-Based Storage Systems. It helps to create the meta data form



Distributed Metadata Management for Large Cluster-Based Storage Systems

20



Screen 3.4.4 searching the file form

The screen 3.4.4 shows the home page of the Distributed Metadata Management for Large
Cluster-Based Storage Systems. It helps to searching the file

Distributed Metadata Management for Large Cluster-Based Storage Systems

21


Screen 3.4.5 searching the file form

The screen 3.4.5 shows the home page of the Distributed Metadata Management for Large
Cluster-Based Storage Systems. It helps to searching the file


Distributed Metadata Management for Large Cluster-Based Storage Systems

22






Screen 3.4.6 searching the file form

The screen 3.4.6 shows the home page of the Distributed Metadata Management for Large
Cluster-Based Storage Systems. It helps to searching the file







Distributed Metadata Management for Large Cluster-Based Storage Systems

23

3.5 DEPLOYMENT DESIGN


























Figure 3.6: Deployment Diagram for the Distributed Metadata
Management for Large Cluster-Based Storage Systems

Deployment diagrams show the allocation of artifacts to nodes according to the deployments
defined between them. The deployment diagram for Distributed Metadata Management for
Large Cluster-Based Storage Systems is shown in figure 3.13.
System
Distributed Metadata
Management for Large
Cluster-Based
Storage Systems
SQL 2005
LOGIN
FINDING NETWORK
COMPUTERS

META DATA CREATION

SEARCHING FILES

Networks
Distributed Metadata Management for Large Cluster-Based Storage Systems

24

3.6 NAVIGATION DESIGN















Figure 3.7: Navigation Diagram for the Distributed Metadata
Management for Large Cluster-Based Storage Systems

The purpose of a navigation diagram is to illustrate the navigation on a home page e.g. The
individual pages are boxes, with the file name. There's an arrow from one box to another if
there's a link from one to the other. Figure 3.14 shows the overall flow that is navigation
diagram of the Distributed Metadata Management for Large Cluster-Based Storage Systems .
Distributed Metadata Management for Large
Cluster-Based Storage Systems
Login Finding
Network
Computers

Meta Data
Creation

Searching
Files

Distributed Metadata Management for Large Cluster-Based Storage Systems

25

CHAPTER IV
SYSTEM TESTING

The system was tested at various stages of the development process to fix bugs at the
earliest stage and produce a bug-free deliverable. The major quality objective was to provide
correct indicators to the users.

4.1 TEST CASES AND REPORTS
A test case is a set of test inputs, execution conditions, and expected results developed
for a particular objective, for instance to exercise a particular program path or verify
compliance with a specific requirement. The purpose of a test case is to identify and
communicate conditions that will be implemented in test. Test case data are created with the
express intent of determining whether the system will process them correctly. Test cases are
necessary to verify successful and acceptable implementation of the product requirements.
Test cases can be classified into positive and negative test cases. In case of positive
test cases, the valid data is given and the corresponding action is expected. In case of negative
test cases, invalid or incomplete data given and the required action are expected. The test
cases and reports are listed below in Table 4.1














Distributed Metadata Management for Large Cluster-Based Storage Systems

26


Table 4.1 Test Cases and Reports


Step Expected Actual

Username_
01
To verify
that User
name on
login page
must be
greater than
3
characters
Enter User
name less than
3 chars and
password and
click Submit
button
an error message
User name not
less than 3
characters must
be displayed
Actual
result is
same as
expected
result
Pass
Enter User
name 3 chars
and password
and click
Submit button
Login success full
or an error
message Invalid
User name or
Password must
be displayed
Actual
result is
same as
expected
result
Pass
Username_
02
To verify
that User
name on
login page
should not
be greater
than 10
characters
Enter User
name greater
than 10 chars
and password
and click
Submit button
an error message
User name not
greater than 10
characters must
be displayed
Actual
result is
same as
expected
result
Pass
Username_
03
To verify
that User
name on
login page
does not
take special
characters
Enter User
name with
special
characters and
enter password
and click
Submit button
an error message
Special chars not
allowed in login
must be displayed
Actual
result is
same as
expected
result
Pass








Distributed Metadata Management for Large Cluster-Based Storage Systems

27











Pwd_01 Validate
Password
To verify
that
Password on
login page
must be
greater than
3 characters
Enter Password
less than 6 chars
and Login Name
and click Submit
button
an error message
Password not less
than 3 characters
must be displayed
Actual
result is
same as
expected
result
Pass
Enter Password
3 chars and
Login Name and
click Submit
button
Login success full
or an error message
Invalid Login or
Password must be
displayed
Actual
result is
same as
expected
result
Pass
Pwd_02 Validate
Password
To verify
that
Password on
login page
must be less
than 10
characters
Enter Password
greater than 10
chars and Login
Name and click
Submit button
an error message
Password not
greater than 10
characters must be
displayed
Actual
result is
same as
expected
result
Pass
PwPwd_03 Validate
Password
To verify
that
Password on
login page
should not
allow special
characters
Enter Password
with special
characters(say
!@hi&*P) Login
Name and click
Submit button

Login success full
or an error message
Invalid Login or
Password must be
displayed

Actual
result is
same as
expected
result
Pass
Distributed Metadata Management for Large Cluster-Based Storage Systems

28










Mobile_01 Validate
the
mobile
number
To verify
that the
mobile
number on
Registration
page is
exactly 10
digits
Enter mobile
Number less
than 10 digits


an error message
Mobile number
must have 10
digits must be
displayed
Actual
result is
same as
expected
result
Pass
Enter mobile
Number greater
than 10 digits
an error message
Mobile number
must have 10
digits must be
displayed
Actual
result is
same as
expected
result
Pass
Mobile_02 Validate
the
mobile
number
To verify
that the
mobile
number
Registration
page does
not have
any special
characters
or
characters

Enter mobile
number by
including
special
characters
an error message
Special chars not
allowed in mobile
number must be
displayed

Actual
result is
same as
expected
result

Pass





Enter mobile
number by
including
special
characters
an error message
Characters not
allowed in mobile
number must be
displayed
Actual
result is
same as
expected
result
Pass

Distributed Metadata Management for Large Cluster-Based Storage Systems

29


CHAPTER V
SYSTEM IMPLEMENTATION

System implementation is the process of making the newly designed system fully
operational and consistent in performance. That is, implementation is the process of having
the personnel check out and use new equipment and train the users to use the new system. If
the implementation is not carefully planned and controlled, it can cause chaos. Thus it can be
considered to be the most crucial stage in achieving a successful new system and in giving
the users confidence that the new system will work and be effective.























Distributed Metadata Management for Large Cluster-Based Storage Systems

30

CHAPTER VI
CONCLUSION

the efficiency of using the HBA scheme to represent the metadata distribution of all files and
accomplish the metadata distribution and management in cluster-based storage systems with
thousands of nodes. Both our theoretic analysis and simulation results indicated that this
approach cannot scale well with the increase in the number of MSs and has very large
memory overhead when the number of files is large. By exploiting the temporal access
locality of file access patterns, this paper has proposed a hierarchical scheme, called HBA,
that maintains two levels of BF arrays, with the one at the top level succinctly representing
the metadata location of most recently visited files on each MS and the one at the lower level
maintaining metadata distribution information
of all files with lower accuracy in favor of memory efficiency. The top-level array is small in
size but has high lookup accuracy. This high accuracy compensates for the relatively low
lookup accuracy and large memory requirement in the lower level array. Our extensive trace-
driven simulations show that the HBA scheme can achieve an efficacy comparable to that of
PBA but at only 50 percent of memory cost and slightly higher network traffic overhead
(multicast). On the other hand, HBA incurs much less network traffic overhead (multicast)
than the pure LRU BF approach. Moreover, simulation results show that the network traffic
overhead introduced by HBA is minute in modern fast networks. We have implemented our
HBA design in Linux and measured its performance in a real cluster. The experimental
results show that the performance of HBA is very promising. Under heavy workloads,
HBAwith 16MSscan reduce the metadata operation time of a single-metadata-server
architecture by a factor of up to 43.9. Compared with other existing solutions to
decentralizing metadata management, HBA retains most of their advantages while avoiding
their disadvantages. It not only reduces the memory overhead but also balances the metadata
management workload, allows a fully associative placement of metadata of files, and does
not require metadata migration during file or directory renaming and node additions or
deletions. We have also designed an efficient and accurate metadata prefetching algorithm to
further improve metadata operation performance [50]. We are incorporating this prefetching
scheme into HBA.



Distributed Metadata Management for Large Cluster-Based Storage Systems

31


There are several limitations in this research work: . Our workload trace does not include
metadata operations of parallel I/Os used in scientific applications. This is mainly due to the
lack of long-term traces that include sufficient amount of metadata operations. . When a new
MS is added, a self-adaptive mechanism is needed to automatically rebalance the metadata
spatial distribution. For the purpose of this project, we assume that such a mechanism is
available.
Future Enhancements
The project has covered almost all the requirements. Further requirements and
improvements can easily be done since the coding is mainly structured or modular in nature.
Improvements can be appended by changing the existing modules or adding new modules.

You might also like