
Splitting and Merging of Video Files in Cloud Environment

Table of Contents

Chapter 1   Introduction
    1.1 Problem Statement
    1.2 Description of System
    1.3 Limitations of Existing System
Chapter 2   Review of Literature
Chapter 3   Project Scheduling
    3.1 Gantt Chart
Chapter 4   System Analysis
    4.1 Requirements
    4.2 Risk Analysis
Chapter 5   System Design
    5.1 Architecture
    5.2 Activity Diagram
    5.3 Use Case Diagram
    5.4 Sequence Diagram
Chapter 6   Implementation
    6.1 System Architecture
    6.2 System Analysis
    6.3 Technology Used
Chapter 7   Testing
    7.1 Testing Approach
    7.2 Types of Testing
Chapter 8   Results
Chapter 9   Conclusion
Chapter 10  Bibliography


CHAPTER 1

INTRODUCTION


Delivering video content to end users requires a large infrastructure investment, and time is often crucial. In some cases the speed of publication is itself a major issue: when short breaking news must be put across, there will be disruption if compression and encoding time is not taken into account. Large storage devices for preprocessing are characterized by high cost and little flexibility with respect to user-specific needs. The problem becomes more critical when the volume of information to be processed is variable, i.e., there is a seasonal variation in the demand for processing. In such situations, the ability to build adaptive systems, capable of using on-demand resources provided by Cloud Computing, is very attractive. Cloud computing is an Internet-based service model in which services such as software, platform, infrastructure, storage and databases are shared with computers and other devices on demand. Services are sold on demand, on a per-minute or hourly basis, are fully managed by the providers, and the consumer needs only a computer and Internet access. In a Cloud computing platform, physical machines are virtualized, and a large number of virtual machines (VMs) form a virtual cluster. Offering virtualized resources on demand is known as Infrastructure as a Service (IaaS). To achieve higher levels of resource utilization, techniques such as workload balancing across physical servers and storage frames can be used. Workload balancing is achieved with VM live migration, which migrates virtualized applications between physical resources within a resource pool in a way that is transparent to users and does not interrupt the service provided by the Cloud platform.

Cloud Computing Platform


1.1 PROBLEM STATEMENT

The phenomenal growth of Internet technologies such as social networking services (SNSs) allows users to disseminate multimedia objects. SNS and media content providers are constantly working toward providing multimedia-rich experiences to end users. Although the ability to share multimedia objects makes the Internet more attractive to consumers, clients and the underlying networks are not always able to keep up with this growing demand. Users access multimedia objects not only from traditional desktops but also from mobile devices, such as smart phones and smart pads, whose resources are constrained in terms of processing, storage and display capabilities. Multimedia processing is characterized by large amounts of data, requiring large amounts of processing, storage and communication resources, thereby imposing a considerable burden on the computing infrastructure. The traditional approach to transcoding multimedia data requires specific and expensive hardware because of the high-capacity and high-definition features of multimedia data. Therefore, general-purpose devices and methods are not cost effective, and they have limitations. To solve these problems, new storage systems and big data processing techniques for processing and storing large amounts of data are required to support the development environments of social networking services. There is also a need to convert this data, especially video data, using a MapReduce development environment, so that programming frameworks that distribute and process large amounts of social media in distributed processing systems can be developed easily.
One way to store a large amount of video data is to use HDFS for distributed parallel processing. The second problem can be solved by processing the data stored in HDFS using the MapReduce framework.
The video file is split into several independent subparts (chunks), distributed among the available nodes, and compressed in parallel. For this purpose, the Hadoop Distributed File System (HDFS) provides a distributed, scalable and portable file system infrastructure, and MapReduce is a programming model designed to simplify parallel data processing on clusters.
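For illustration, a minimal sketch of the storage side (assuming the standard Hadoop Java client and a configured fs.defaultFS; all paths are hypothetical) copies a local video file into HDFS, where it is automatically split into blocks and replicated across the data nodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsIngest {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath, which are
        // assumed to point at the cluster's namenode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copy the source video from local disk into HDFS; HDFS splits it
        // into blocks and replicates them according to the replication factor.
        fs.copyFromLocalFile(new Path("/data/input.avi"),     // hypothetical local path
                             new Path("/videos/input.avi"));  // hypothetical HDFS path
        fs.close();
    }
}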


PROPOSED SYSTEM
FFmpeg and Mencoder are open source tools for video splitting and compression. An initial splitter, Mencoder or Hadoop itself (internal splitting), together with the Hadoop Distributed File System (HDFS), has been experimented with: the video file is split into chunks based on time slices or size and transcoded on a private Cloud. The effect of the initial split method, based on Mencoder and on Hadoop, on the reduction of compression time has been studied experimentally. The performance evaluation was conducted on a private Cloud setup based on Meghdoot (an open source Cloud stack) with 60 nodes divided into two clusters. The architecture is based on the assumption that the VMs have similar storage, processing power, memory and network capacity. They are connected through gigabit Ethernet. All the cloud nodes have a single Intel Core i3 (2.5 GHz to 3.0 GHz) processor and 4 GB RAM. HDFS is designed to run on clusters of commodity machines, and Hadoop implements MapReduce, in which the application's processing is divided into many small fragments of work, each of which may be executed on any node in the cluster, i.e. the actual parallel processing. The basic underlying model used after splitting the frames is the MapReduce model, which coexists with the HDFS system. In the existing system the Hadoop framework has been pre-installed on all the nodes. These nodes behave as virtual machines and, in this case, they all have the same configuration. The task involved in the MapReduce framework is a two-step process. In the first step the initial file is broken down into fragments, which are then submitted to the VMs to work on. In the second step the processed fragments are combined together to obtain the desired output.

The number of splits based on the split variable (split by Mencoder or by Hadoop) is compatible with various implementations of audio and video codecs. At the end of the processing step (reduce), all fragments are reconstructed by merging them together. While splitting the file at regular intervals, the important point to ensure is that no frame coexists in more than one chunk and that no frame is lost.
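To make the requirement that no frame coexists in more than one chunk and no frame is lost concrete, the following sketch (a hypothetical helper, not part of the experimental setup described above) computes half-open frame ranges for chunks of a fixed number of frames, so that every frame belongs to exactly one chunk:

import java.util.ArrayList;
import java.util.List;

public class ChunkPlanner {

    /** Start frame (inclusive) and end frame (exclusive) of one chunk. */
    public static final class Range {
        public final long start;
        public final long end;
        Range(long start, long end) { this.start = start; this.end = end; }
        public String toString() { return "[" + start + ", " + end + ")"; }
    }

    /** Splits totalFrames into chunks of framesPerChunk frames each. */
    public static List<Range> plan(long totalFrames, long framesPerChunk) {
        List<Range> ranges = new ArrayList<Range>();
        for (long start = 0; start < totalFrames; start += framesPerChunk) {
            // Half-open intervals: chunk i ends exactly where chunk i+1 begins,
            // so no frame is duplicated and none is lost.
            ranges.add(new Range(start, Math.min(start + framesPerChunk, totalFrames)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Example: about two minutes of 29.97 fps video in 250-frame chunks.
        long totalFrames = Math.round(120 * 29.97);
        for (Range r : plan(totalFrames, 250)) {
            System.out.println(r);
        }
    }
}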


Experimental State transition diagram for MPEG4 Conversion


1.2 DESCRIPTION OF SYSTEM

1. Problems with small files and HDFS

A small file is one which is significantly smaller than the HDFS block size (default 64 MB). If you are storing small files, then you probably have lots of them (otherwise you would not turn to Hadoop), and the problem is that HDFS cannot efficiently handle a very large number of files, since every file and block is tracked in the namenode's memory.

Furthermore, HDFS is not geared up for efficient access to small files: it is primarily designed for streaming access to large files. Reading through small files normally causes lots of seeks and lots of hopping from datanode to datanode to retrieve each small file, all of which is an inefficient data access pattern.

2. Problems with small files and MapReduce

Map tasks usually process one block of input at a time (using the default FileInputFormat). If the files are very small and there are a lot of them, then each map task processes very little input, and there are many more map tasks, each of which imposes extra bookkeeping overhead. Compare a 1 GB file broken into 16 blocks of 64 MB with 10,000 or so files of 100 KB. The 10,000 files use one map task each, and the job time can be tens or hundreds of times slower than for the equivalent job with a single input file.
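The difference in map task counts behind this comparison can be reproduced with a few lines of code (a sketch only, using the figures quoted above):

public class MapTaskCount {
    public static void main(String[] args) {
        long blockSize  = 64L * 1024 * 1024;    // default HDFS block size used in the text
        long bigFile    = 1024L * 1024 * 1024;  // one 1 GB file
        int  smallFiles = 10000;                // 10,000 files of ~100 KB each

        // One map task per block of the large file...
        long mapsBig = (bigFile + blockSize - 1) / blockSize;   // = 16
        // ...but one map task per small file, since each small file is its own split.
        long mapsSmall = smallFiles;                            // = 10,000

        System.out.println("1 GB file        -> " + mapsBig + " map tasks");
        System.out.println("10,000 x 100 KB  -> " + mapsSmall + " map tasks");
    }
}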

1.3 LIMITATIONS OF EXISTING SYSTEM:

 General-purpose devices and methods are not cost effective, and they have limitations.
 Transcoding based on cloud computing has only recently been investigated in a few studies.
 Increased burden on computing power.
 Transcoding and transmoding techniques using dedicated hardware are required.


CHAPTER 2

LITERATURE REVIEW


The distribution of tasks in a cluster for parallel processing is not a new concept, and there are several techniques that use this idea to optimize the processing of information. The Map-Reduce paradigm, for example, is a framework for processing huge datasets of certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster. It consists of an initial Map stage, where a master node takes the input, chops it into smaller sub-problems, and distributes the parts to worker nodes, which process the information; this is followed by the Reduce stage, where the master node collects the answers to all the sub-problems and combines them to produce the job output. A popular Map-Reduce implementation is Apache's Hadoop, which consists of one Job Tracker, to which client applications submit Map-Reduce jobs. The Job Tracker pushes work out to available Task Tracker nodes in the cluster, which execute the map and reduce tasks.

However, despite being a very appealing and efficient technique for processing large volumes of data, there are a number of challenges associated with the deployment of Map-Reduce architectures. One of them is the required infrastructure: to make the process truly effective, one needs several machines acting as nodes, which often requires a large upfront investment in infrastructure. This point is extremely critical in situations where the processing demand is seasonal. In addition, fault tolerance issues and the need for a shared file system to support mappers and reducers make the deployment of a Map-Reduce architecture [8] complex and costly.


A. DISTRIBUTED VIDEO PROCESSING

This work focuses on a split and merge architecture for processing large amounts of distributed data. The Split and Merge architecture for high performance video processing is a generalization of the MapReduce paradigm that rationalizes the use of resources by exploiting on-demand computing. This approach reduces video encoding times to a fixed duration, independently of the input size of the video file, by using dynamic resource provisioning in the Cloud.

B. DISTRIBUTED CONTENT ANALYSIS USING MAPREDUCE

This gives a scalable solution for performing video analysis on distributed data with the help of a programming model called MapReduce [6]. The Apache Hadoop framework is used for job scheduling and for distribution of the video data. In [3], face detection is taken as a case example. The performance of distributed content-based video analysis is measured and compared with the other alternatives. It is seen from the observations that Hadoop's performance is more efficient for this face detection task, with minimal computational overhead.

D. OPTIMIZATION FOR LARGE SCALE IMAGE PROCESSING

This [4] presents two approaches for processing images at large scale using the Hadoop Distributed File System (HDFS) [7]. A large number of images slows down HDFS, thereby increasing the initialization time and the overhead time. The first approach is to convert the various small images into a single large file by merging them, and the second is to combine several images into a single task without merging them. Performance evaluations show that the proposed approach of combining and then processing is the most effective method for large scale image processing in a distributed environment.


CHAPTER 3

PROJECT SCHEDULING


3.1 GANTT CHART


CHAPTER 4

SYSTEM ANALYSIS


4.1 REQUIREMENTS

4.1.1 Software Requirements:

 Java 1.6.31
 MySQL database
 Eclipse 4.5 (Mars)
 Operating system: Mac OS, Red Hat Linux, Windows Vista, etc.

4.1.2 Hardware Requirements:

 Processor: Pentium IV or higher
 RAM capacity: 1 GB (minimum)
 CPU speed: 2.2 GHz or higher


4.2 RISK ANALYSIS

Although a detailed project plan was devised, allowing sufficient time for the different tasks under normal conditions, it was essential to consider several risks that could hinder the project. The following table lists possible risks as well as suggested solutions that could minimize their effects.

RISK                                  PROBABILITY   CONTINGENCY PLAN

Change of requirements                Medium        The requirements will be reviewed and
                                                    adjusted throughout the project.

Unrealistic prediction of time        Low           The plan will be reviewed throughout the
required for the task                               project and time is allocated to deal
                                                    with small delays.

Hardware failures                     Medium        All work done will be backed up regularly
                                                    in several locations.

Software failures                     Low           Ensure that the required software is
                                                    available in more than one location.

Resources are not available           Medium        Ensure that alternate resources are
                                                    located if required.


CHAPTER 5

SYSTEM DESIGN


5.1 ARCHITECTURE

The proposed Split&Merge architecture instantiated to video encoding


5.2 ACTIVITY DIAGRAM

Activity diagrams are graphical representations of workflows of stepwise activities and actions, with support for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams are intended to model both computational and organizational processes (i.e. workflows). Activity diagrams show the overall flow of control.

Activity diagrams are constructed from a limited number of shapes, connected with arrows. The most important shape types are:

 rounded rectangles represent actions;
 diamonds represent decisions;
 bars represent the start (split) or end (join) of concurrent activities;
 a black circle represents the start (initial state) of the workflow;
 an encircled black circle represents the end (final state).

Arrows run from the start towards the end and represent the order in which activities happen. Hence activity diagrams can be regarded as a form of flowchart. Typical flowchart techniques lack constructs for expressing concurrency. However, the join and split symbols in activity diagrams only resolve this for simple cases; the meaning of the model is not clear when they are arbitrarily combined with decisions or loops.



[Activity diagrams: the Split activity (select the split utility, specify file size and location, re-enter values if invalid, then process the splitting action), the Merge activity (select the merge utility, specify the '.jfs' file and location, re-enter values if invalid, then process the merging action) and the Set Option activity (select the option utility, set the custom or default setting, re-enter values if invalid, then process the setting action).]

ACTIVITY Diagram


5.3 USE CASE DIAGRAM

A use-case diagram is a graph of actors, a set of use cases enclosed by a system boundary, participation associations between the actors and the use cases, and generalization among the use cases. In general, the use-case diagram defines the outside (actors) and inside (use cases) of the system's typical behavior. A use case is shown as an ellipse containing the name of the use case and is initiated by actors.

USE CASE Diagram


5.4 SEQUENCE DIAGRAM

Sequence diagrams are an easy and intuitive way of describing the system's behavior, focusing on the interaction between the system and the environment. This notational diagram shows the interactions arranged in a time sequence. The sequence diagram has two dimensions: the vertical dimension represents time, and the horizontal dimension represents the different objects. The vertical line is also called the object's lifeline.

SEQUENCE Diagram


CHAPTER 6

IMPLEMENTATION


6.1 SYSTEM ARCHITECTURE

In a Cloud computing platform, physical machines are virtualized, and a large number of virtual machines (VMs) form a virtual cluster. Offering virtualized resources on demand is known as Infrastructure as a Service (IaaS). To achieve higher levels of resource utilization, techniques such as workload balancing across physical servers and storage frames can be used. Workload balancing is achieved with VM live migration, which migrates virtualized applications between physical resources within a resource pool in a way that is transparent to users and does not interrupt the service provided by the Cloud.

Cloud Computing Platform

The video file is split into several independent subparts (chunks), distributed among the available nodes, and compressed in parallel. For this purpose, the Hadoop Distributed File System (HDFS) provides a distributed, scalable and portable file system infrastructure, and MapReduce is a programming model designed to simplify parallel data processing on clusters.


The Map-Reduce paradigm, for example, is a framework for processing huge datasets of certain kinds of distributable problems using a large number of computers (nodes), collectively referred to as a cluster. It consists of an initial Map stage, where a master node takes the input, chops it into smaller sub-problems, and distributes the parts to worker nodes, which process the information; this is followed by the Reduce stage, where the master node collects the answers to all the sub-problems and combines them to produce the job output. A popular Map-Reduce implementation is Apache's Hadoop, in which client applications submit Map-Reduce jobs to a Job Tracker. The Job Tracker pushes work out to available Task Tracker nodes in the cluster, which execute the map and reduce tasks.

Map Reduce architecture
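As an illustrative sketch of how the split/transcode/merge pipeline maps onto Hadoop code (class names, record format and paths below are hypothetical, and the actual encoder invocation is delegated to the external tools described in Section 6.3), a map task could process one chunk while a single reduce pass collects the processed chunk paths in index order:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** One input record per video chunk: "<chunkIndex>\t<hdfsChunkPath>" (hypothetical format). */
public class TranscodeJob {

    public static class TranscodeMapper
            extends Mapper<LongWritable, Text, IntWritable, Text> {
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] parts = line.toString().split("\t");
            int index = Integer.parseInt(parts[0]);
            String chunkPath = parts[1];

            // Here the external encoder (ffmpeg/mencoder) would be invoked on the
            // chunk, producing a compressed chunk; the invocation is omitted in this sketch.
            String encodedPath = chunkPath + ".mp4";

            context.write(new IntWritable(index), new Text(encodedPath));
        }
    }

    public static class MergeReducer
            extends Reducer<IntWritable, Text, IntWritable, Text> {
        protected void reduce(IntWritable index, Iterable<Text> paths, Context context)
                throws IOException, InterruptedException {
            // Keys arrive sorted by chunk index, so emitting them in order yields the
            // sequence in which the chunks must be concatenated in the merge step.
            for (Text path : paths) {
                context.write(index, path);
            }
        }
    }
}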


6.2 SYSTEM ANALYSIS

Video applications require some form of data compression to facilitate storage and transmission. Digital video compression is one of the main issues in digital video encoding, enabling efficient distribution and interchange of visual information. The process of high quality video encoding is usually very costly for the encoder and requires a lot of production time. In situations where there are large content volumes this is even more critical, since a single video may occupy the server's processing power for long periods. Moreover, there are cases where the speed of publication is a critical point. Journalism and breaking news are typical applications in which the time-to-market of the video is very short, so that every second spent in video encoding may represent a loss of audience. Figure 2 shows the speed of encoding of a scene, measured in frames per second, with different implementations of the H.264 compression standard [8]. We note that the higher the quality, i.e., the bitrate of the video output, the lower the speed of encoding. In order to speed up encoding times there are basically two solutions. The first is to increase the investment in encoding hardware infrastructure, to be used at full capacity only at peak times; the downside is that the infrastructure will be idle the rest of the time. The second is to optimize the use of the available resources. The ideal scenario is to optimize resources by distributing the tasks among them evenly. In the specific case of video encoding, the intuitive solution is to break a video into several pieces and distribute the encoding of each piece among several servers in a cluster. The challenge of this approach is to split, as well as merge, video fragments without loss of synchronization. Given a video encoded in H.264, for example, a split at a frame other than a key-frame (B or P frames) could be disastrous, because the remaining frames depend on information registered in the key-frame to be regenerated.


Our system has the following flow of data:

Flow chart


6.3 TECHNOLOGY USED

6.3.1 The Split Step

In what follows we describe a technique for reducing video encoding times, based on distributed processing over cluster or cloud environments, implemented using the Split&Merge architecture. The fragmentation of media files and the distribution of encoding tasks in a cluster constitute an advanced solution for increasing encoding performance, and an evolution of the simple distribution of single, complete video encoding tasks in a cluster or cloud. The idea is to break the media files into smaller files so that their multiple parts can be processed simultaneously on different machines, thereby reducing the total encoding time of the video file. The problem in the case of video files is that, unlike a text file, we cannot split them anywhere. If the video to be encoded already provides some form of temporal compression, then it is necessary to first identify its key-frames, so that the cuts are made exactly at their positions. Furthermore, to avoid synchronization problems between audio and video, we must separate the two, so that they can be compressed independently. Situations where the original video does not use temporal compression are special cases in which the video can be split at specific frame numbers or at regular intervals. The important point here is to ensure that no frame coexists in more than one chunk and that no frame is lost. Assuming an input video in high definition at 1080p, with 29.97 frames per second, encapsulated in AVI and compressed with the MJPEG codec (no temporal compression), and a stereo PCM audio stream with a sampling rate of 44100 Hz, the first step in the split task would be to separate the video stream from the audio stream. This is because video encoding is much more complex and requires far more computing resources than audio encoding. Because the overall impact on performance is very small, the audio stream is processed in one piece (no fragmentation). Furthermore, if processed together, chunks containing both audio and video may generate various synchronization problems, since audio frames do not necessarily have the same temporal size as video frames.


The proposed Split&Merge architecture instantiated to video encoding

We thus avoid processing both streams simultaneously, as it may generate audible glitches, delays and other undesirable effects. After splitting the audio from the video, the video stream must be broken at regular intervals. Note that this is only valid in the case where there is no temporal compression at the input. The ideal approach here is to make chunks with a constant number of frames, rather than based on running time. When using a time-based split, it is important to make sure that there is no loss or duplication of frames across chunks. A key point in the fragmentation of the input video is determining the size of the chunks to be generated. This decision is closely related to the output to be generated, that is, the video codec and the compression parameters passed to it in the processing step. This is because, after processing, each chunk will have a key-frame at its beginning and end. Fragmentation into chunks performed indiscriminately will produce an output video, after the merge, with an excess of key-frames, which reduces the efficiency of compression. To give an idea, a key-frame spacing of 250 frames is frequently used for video at 29.97 fps. Thus, if the split step generates chunks with fewer than 250 frames, we will inevitably reduce the efficiency of the temporal compression of the encoder. A good approach is to perform the split so that the number of chunks generated is equal to the number of nodes available for processing. However, when we use an elastic processing structure, we can further optimize this split by analyzing the optimum number of chunks to be generated, which certainly varies according to the duration of the video, the characteristics of the input, and the output to be produced. To obtain this optimized split, it would be necessary to implement a decision-making algorithm that evaluates the characteristics of the input and output and chooses the fragment size that uses resources most efficiently, producing a high quality result within an acceptable response time. The implementation of this algorithm is quite desirable in order to improve the efficiency of the process; however, it is beyond the initial scope of this work. When we split a video file into several chunks, or smaller files, we must usually repair their containers, rewriting the header and trailer. This process can be avoided with a very interesting method. When we refer to splitting the video, we are actually preparing the data to be distributed in a cluster and processed in parallel. If, in the split step, instead of breaking the video file we just identify the points of beginning and end of each chunk, then it is not necessary to rewrite the container, which consequently reduces the encoding time. The disadvantage in this case is that all nodes must have read access to the original file, which could be implemented through a shared file system, such as an NFS mount, or even through a distributed file system with high read throughput.
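As a concrete illustration of the split step described above, the sketch below (a minimal example, assuming ffmpeg is installed and the input is the MJPEG/PCM AVI scenario mentioned earlier; all file names are hypothetical) separates the audio stream from the video stream and then cuts the video-only stream into fixed-duration chunks without re-encoding:

import java.util.Arrays;
import java.util.List;

public class SplitStep {

    /** Runs an external command and fails fast if it returns a non-zero exit code. */
    static void run(List<String> command) throws Exception {
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new RuntimeException("Command failed: " + command);
        }
    }

    public static void main(String[] args) throws Exception {
        String input = "input.avi";  // hypothetical 1080p MJPEG + PCM source

        // 1. Separate the streams: keep only video in one file, only audio in another
        //    (stream copy, no re-encoding; the PCM audio is stored in a WAV container).
        run(Arrays.asList("ffmpeg", "-y", "-i", input,
                "-an", "-c:v", "copy", "video_only.avi"));
        run(Arrays.asList("ffmpeg", "-y", "-i", input,
                "-vn", "-c:a", "copy", "audio_only.wav"));

        // 2. Cut the video-only stream into ~120 second chunks without re-encoding,
        //    so that no frame is lost or duplicated across chunks.
        run(Arrays.asList("ffmpeg", "-y", "-i", "video_only.avi",
                "-c", "copy", "-f", "segment", "-segment_time", "120",
                "chunk_%03d.avi"));
    }
}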

6.3.2 The Process Step

Once the video is fragmented, the chunks generated should be distributed among the nodes to be processed. In the specific case of video compression, this process aims at reducing the size of the video file by eliminating redundancies. In this step, a compression algorithm is applied to each chunk, resulting in a compressed piece of the original video. The process of chunk encoding is exactly the same as would be applied if the video were processed without fragmentation, i.e. it is independent of the split and of the number of chunks generated. However, if the option to simply mark the points of beginning and end of chunks was used during the split, then the processing step must also have read access to the whole original video, seek to the position of the start frame, and stop when the frame that indicates the end of the chunk is reached. There are several open source tools for video compression; among the most popular are ffmpeg[12] and mencoder[13], which are compatible with various implementations of audio and video codecs. It is possible, for example, to use mencoder to implement the processing step, performing the compression of a high-definition video and generating an output that can be viewed on the Internet, or even on mobile devices with UMTS[11] or HSDPA[12] connectivity. In this case, we could use the H.264 Baseline Profile at 280 kbps and a 480x360 resolution, performing, therefore, an aspect ratio adjustment. In addition to processing the video chunks, it is also necessary to process the audio stream, which was separated during the split step. Audio compression is a simple process with low computational cost, and it can be performed with the same tools used for video compression. At the end of the processing step we have all the compressed chunks, as well as the audio stream. To obtain the desired output, we must merge all fragments, thus reconstructing the original content.
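For illustration, the per-chunk encoding command referenced above could be issued as follows (a sketch only, using ffmpeg's H.264 encoder rather than mencoder, with the Baseline Profile, 280 kbps and 480x360 parameters mentioned in the text; chunk file names are hypothetical):

import java.util.Arrays;
import java.util.List;

public class ProcessStep {
    public static void main(String[] args) throws Exception {
        String chunkIn  = "chunk_000.avi";      // hypothetical chunk produced by the split step
        String chunkOut = "chunk_000_enc.mp4";  // hypothetical encoded output

        // H.264 Baseline Profile, 280 kbps, scaled to 480x360, video only
        // (the audio stream is encoded separately, as described above).
        List<String> cmd = Arrays.asList("ffmpeg", "-y", "-i", chunkIn,
                "-c:v", "libx264", "-profile:v", "baseline",
                "-b:v", "280k", "-vf", "scale=480:360",
                "-an", chunkOut);

        Process p = new ProcessBuilder(cmd).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new RuntimeException("Encoding failed for " + chunkIn);
        }
    }
}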

6.3.3 The Merge Step

The merge step presents a very interesting challenge, which consists of reconstructing the original content from its parts so that the fragmentation process is entirely transparent to the end user. This means that not only must the joining of the video fragments be perfect, but also that audio and video must be fully synchronized. Note that the audio stream was separated from the video before the fragmentation process took place. As compression does not affect the length of the content, in theory, after merging the processed chunks, we just need to realign the streams by mixing the content back together. The first phase of the merge step is to join the chunks of processed video, which can be accomplished easily by ordering the fragments and rewriting the container.
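A minimal sketch of this step (assuming ffmpeg, encoded chunks named as in the previous sketches, and a separately encoded audio file; the chunk count, list file and all file names are hypothetical) first concatenates the ordered chunks without re-encoding and then muxes the audio stream back in:

import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

public class MergeStep {

    static void run(List<String> command) throws Exception {
        Process p = new ProcessBuilder(command).inheritIO().start();
        if (p.waitFor() != 0) {
            throw new RuntimeException("Command failed: " + command);
        }
    }

    public static void main(String[] args) throws Exception {
        // 1. Write the ordered chunk list understood by ffmpeg's concat demuxer.
        PrintWriter list = new PrintWriter("chunks.txt", "UTF-8");
        for (int i = 0; i < 40; i++) {   // 40 chunks, as in the worked example of Chapter 8
            list.printf("file 'chunk_%03d_enc.mp4'%n", i);
        }
        list.close();

        // 2. Concatenate the encoded chunks in order, without re-encoding them.
        run(Arrays.asList("ffmpeg", "-y", "-f", "concat", "-safe", "0",
                "-i", "chunks.txt", "-c", "copy", "video_merged.mp4"));

        // 3. Mux the separately encoded audio stream back with the merged video.
        run(Arrays.asList("ffmpeg", "-y", "-i", "video_merged.mp4", "-i", "audio_enc.m4a",
                "-map", "0:v:0", "-map", "1:a:0", "-c", "copy", "output_final.mp4"));
    }
}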

With the split, process and merge steps implemented using the proposed architecture, we obtain a fully parallel and distributed video compression process, where the different pieces of content can be processed simultaneously in a cluster or, alternatively, using resources in the Cloud.

CHAPTER 7

TESTING


7.1 TESTING APPROACH


Software testing is an investigation conducted to provide stakeholders with information about the quality of the product or service under test. Software testing can also provide an objective, independent view of the software, allowing the business to appreciate and understand the risks of software implementation. Test techniques include, but are not limited to, executing a program or application with the intent of finding software bugs.

System Test
System testing of software or hardware is testing conducted on a complete, integrated system to
evaluate the system's compliance with its specified requirements. System testing falls within the
scope of black box testing, and as such, should require no knowledge of the inner design of the
code or logic.

White Box Testing


White-box testing (also known as clear box testing, glass box testing, transparent box testing,
and structural testing) is a method of testing software that tests internal structures or workings of
an application, as opposed to its functionality (i.e. black-box testing). In white-box testing an
internal perspective of the system, as well as programming skills, are used to design test cases.
The tester chooses inputs to exercise paths through the code and determine the appropriate
outputs. This is analogous to testing nodes in a circuit.

Black Box Testing


Black-box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. This method of testing can be applied to virtually every level of software testing: unit, integration, system and acceptance. It typically comprises most, if not all, higher level testing, but can also dominate unit testing.


7.2 TYPES OF TESTING

Unit Testing
In computer programming, unit testing is a software testing method by which individual units of source code, sets of one or more computer program modules together with associated control data, usage procedures, and operating procedures, are tested to determine whether they are fit for use.

Test strategy and approach
Field testing is performed manually, and functional tests are written in detail. A sketch of such a unit test for the split step is given after the feature list below.

Test objectives
Text fields should be validated properly by checking the working of the required-field validators. Pages must be valid and activated from the identified links. Pop-ups, error messages, entry screens, messages and responses must not be delayed.

Features to be tested
The data entered in the fields should be proper and in the defined format.
Redundant entries should not be allowed.
Navigation through the links should not misguide the user.
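As an illustrative unit test for this project (a sketch only; the hypothetical ChunkPlanner helper sketched in Chapter 1 and JUnit 4 are assumed), the test below checks that the planned chunks cover every frame exactly once, i.e. that no frame is lost or duplicated:

import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

public class ChunkPlannerTest {

    @Test
    public void everyFrameBelongsToExactlyOneChunk() {
        long totalFrames = Math.round(120 * 29.97);   // about two minutes at 29.97 fps
        long framesPerChunk = 250;

        List<ChunkPlanner.Range> chunks = ChunkPlanner.plan(totalFrames, framesPerChunk);

        long covered = 0;
        long expectedStart = 0;
        for (ChunkPlanner.Range r : chunks) {
            // Chunks must be contiguous: each one starts where the previous ended.
            assertEquals(expectedStart, r.start);
            expectedStart = r.end;
            covered += r.end - r.start;
        }
        // No frame lost, no frame duplicated.
        assertEquals(totalFrames, covered);
    }
}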

Integration Testing
Integration testing (sometimes called integration and testing, abbreviated I&T) is the phase in
software testing in which individual software modules are combined and tested as a group. It
occurs after unit testing and before validation testing. Integration testing takes as its input
modules that have been unit tested, groups them in larger aggregates, applies tests defined in an
integration test plan to those aggregates, and delivers as its output the integrated system ready
for system testing. Test Results: All the test cases stated above passed successfully. No defects
found.


Acceptance Testing
Acceptance testing is a test conducted to determine if the requirements of a specification or contract are met. It may involve chemical tests, physical tests, or performance tests. Test Results: All the test cases stated above passed successfully. No defects were found.

Functional test
Functional testing is a quality assurance (QA) process and a type of black box testing that bases its test cases on the specifications of the software component under test. Functions are tested by feeding them input and examining the output; the internal program structure is rarely considered (unlike in white-box testing).
Testing is vital to the success of the system. System testing makes a logical assumption that if all parts of the system are correct, the goal will be successfully achieved. In the testing process we test the actual system in an organization, gather errors from the new system and take initiatives to correct them. All the front-end and back-end connectivity is tested to make sure that the new system operates at full efficiency as stated. System testing is the stage of implementation aimed at ensuring that the system works accurately and efficiently.
The main objective of testing is to uncover errors in the system. For the uncovering process we have to give proper input data to the system, so we should be careful about the input data; giving correct inputs is important for efficient testing.
Testing is done for each module. After testing all the modules, the modules are integrated and the final system is tested with test data specially designed to show that the system will operate successfully under all its conditions. Thus system testing is a confirmation that all is correct and an opportunity to show the user that the system works. Inadequate testing or non-testing leads to errors that may appear a few months later.


This will create two problems: a time delay between the cause and the appearance of the problem, and the effect of the system errors on files and records within the system.
The purpose of system testing is to consider all the likely variations to which the system will be subjected and to push it to its limits.
The testing process focuses on the logical internals of the software, ensuring that all statements have been tested, and on the functional internals, i.e. conducting tests to uncover errors and to ensure that defined inputs produce actual results that agree with the required results. Testing is done using the two common steps, unit testing and integration testing. In the project, system testing is carried out as follows: the procedure-level testing is done first; by giving improper inputs, the errors that occur are noted and eliminated. Finally, the tested error-free system is deployed into the real-life environment and the necessary changes are made, after which it runs in an online fashion. System maintenance is then done every month or year, based on company policies, checking for errors such as runtime errors and long-run errors, and performing other maintenance such as table verification and reports.

System Testing
Testing is done for each module. After testing all the modules, the modules are integrated and the final system is tested with test data specially designed to show that the system will operate successfully under all its conditions. The procedure-level testing is done first: by giving improper inputs, the errors that occur are noted and eliminated. Thus system testing is a confirmation that all is correct and an opportunity to show the user that the system works. The final step involves validation testing, which determines whether the software functions as the user expects. The end user, rather than the system developer, conducts this test; most software developers use a process called "alpha and beta testing" to uncover problems that only the end user seems able to find.


This is the final step in the system life cycle. Here we deploy the tested error-free system into the real-life environment and make the necessary changes, after which it runs in an online fashion. System maintenance is done every month or year, based on company policies, and the system is checked for errors such as runtime errors and long-run errors, along with other maintenance such as table verification and reports.

Validations and Verifications Testing


Testing presents an interesting anomaly for the software engineer. During earlier software engineering activities, the engineer attempts to build software from an abstract concept into a tangible product. Now comes testing: the engineer creates a series of test cases that are intended to "demolish" the software that has been built. Software engineers are by nature constructive people, and testing requires that the developer discard preconceived notions of the "correctness" of the software just developed and overcome the conflict of interest that occurs when errors are uncovered.
If testing is conducted successfully (according to the objectives stated previously), it will uncover errors in the software. As a secondary benefit, testing demonstrates that the software functions appear to be working according to specification and that the behavioral and performance requirements appear to have been met. In addition, data collected as testing is conducted provide a good indication of software reliability and some indication of software quality as a whole. But testing cannot show the absence of errors and defects; it can only show that software errors and defects are present. It is important to keep this (rather gloomy) statement in mind as testing is being conducted.
Functional testing typically involves six steps:
1. The identification of functions that the software is expected to perform
2. The creation of input data based on the function's specifications
3. The determination of output based on the function's specifications
4. The execution of the test case
5. The comparison of actual and expected outputs
6. Checking whether the application works as per the customer's needs


CHAPTER 8

RESULTS


Hadoop is used as the framework for performing the mapping along with the merging of the compressed video chunks, with the split video file as input. To obtain a split count that is a multiple of the number of virtual machines available in the private cloud, and to scale the cluster properly so that the exact number of VMs is reserved for the task submitted to it, the following formula has been devised:

Vc = (Vi * Rf) / (Vm * Sf)    ...(1)

where Vi is the size of the input video file, Rf is the replication factor (default 3), a property associated with Hadoop fault tolerance, Vm is the number of virtual machines used by the Hadoop cluster and Sf is the scaling factor. For example, if a video file (Vi) is 2.7 GB (approx. 2760 MB), the number of VMs is 60, the replication factor is 3 and the scaling factor is fixed at 2, then using eq. (1) the size of a video chunk (Vc) evaluates to 69 MB. Hence, we can further calculate the number of video chunks (Vn):

Vn = Vi / Vc    ...(2)

Now, using eq. (2), the number of video chunks (Vn) will be 40. Similarly, if we assume a replication factor of 1, then for the same 2.7 GB input video file the chunk size will be 23 MB and the number of chunks will be 120. In a small cluster the map task creation overhead is considerable, so dfs.block.size should be large in this case, but small enough to utilize all the cluster resources. The block size should be set according to the size of the cluster, the map task complexity, the map task capacity of the cluster and the average size of the input files. The replication factor is set to 1 instead of the default 3. The optimal scaling factor (Sf) can also be calculated, based on the number of VMs in the Cloud cluster, using the formula below:

Sf = (Vi * Rf) / (Vm * Vc)    ...(3)
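The worked example above can be reproduced with a few lines of code (a sketch only, using the values quoted in the text):

public class ChunkSizing {
    public static void main(String[] args) {
        double vi = 2760;   // input video size in MB (approx. 2.7 GB)
        double rf = 3;      // HDFS replication factor
        double vm = 60;     // number of virtual machines in the cluster
        double sf = 2;      // scaling factor

        double vc = (vi * rf) / (vm * sf);   // eq. (1): chunk size -> 69 MB
        double vn = vi / vc;                 // eq. (2): number of chunks -> 40
        System.out.println("Rf = 3: Vc = " + vc + " MB, Vn = " + Math.round(vn));

        // With replication factor 1, the same file yields 23 MB chunks, i.e. 120 chunks.
        double vc1 = (vi * 1) / (vm * sf);
        System.out.println("Rf = 1: Vc = " + vc1 + " MB, Vn = " + Math.round(vi / vc1));
    }
}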


A private Cloud was set up, with the Hadoop MapReduce framework running on top of it, and the complexities of two cases were compared in terms of video file compression time. As a proof of concept, a simple MapReduce test was implemented and run on the Cloud to provide an analysis of the distributed computation of MapReduce. The two cases experimented with are as follows:
Case 1: The file is split with the help of Mencoder (external splitting), which is also used as the transcoding tool, with only the mapping stage. The result is then transferred to the HDFS system for the final desired output.
Case 2: The file is split by Hadoop's default splitting (internal splitting), and the rest is the same as Case 1. The chunks were allocated to the desired number of VMs in the Hadoop map function, and the time taken for the entire job was measured, i.e. submitting the split file to the VMs, transcoding, merging and producing the output as an MPEG4-converted file. The experiments were conducted for the above-mentioned cases using different segments, i.e. split sizes from the default 64 MB up to 128 MB and time segments of 2 min and 5 min.

Video chunk size     Video file: 1.2 GB   2.7 GB    9.0 GB

64 MB                4m 40s               5m 35s    10m 11s
128 MB               10m 46s              9m 57s    9m 09s
2 Min                3m 20s               5m 10s    9m 42s
5 Min                4m 05s               6m 48s    8m 30s

Time evaluation for encoding different video files on Hadoop Private Cloud



Time variations for a 1.2GB video file with the selection of different no. of VMs

Time variations for a 2.7 GB video file using different chunks with the selection of different
no. of VMs


Time variations for a 9.0 GB video file using size & time chunks with the selection of
different no. of VMs

It was observed that when the video chunks were made by Mencoder (external splitter) and the remaining processing was done by Hadoop, the time taken to complete the job was less than when the splits were made internally by Hadoop. Even though the number of splits is the same for the 9.0 GB file, the complete Hadoop processing is slower than the run with external splitting, because the complexity involved in splitting the video file is higher in Hadoop, as depicted in Fig 6. A possible reason is that synchronization and the separation of audio and video in the digital video file are not handled as efficiently by the Hadoop splitting process as they are for other data types. Mencoder splitting, as shown in Figures 4-6, proves more efficient than Hadoop splitting. MapReduce works best with a smaller number of splits/chunks for a particular task, as the complexity reduces. As the file becomes larger, the scaling factor also comes into the picture: when the number of splits is greater than the number of VMs available, results are better when the segments are of bigger size or are made of larger time segments. The scaling factor has a considerable effect, since the number of VMs is not infinite and is restricted in the private Cloud. The devised formula helps us find the optimal number of chunks a particular video file should have in order to be processed in the minimum time. Hence the cost factor can be reduced with the proper selection of the number of splits and the proper mapping to VMs.


CHAPTER 9

CONCLUSION


A distributed computing environment can process large amounts of data in less time. Combining Hadoop's open source framework with video processing can help in completing such tasks faster.

Private Clouds typically do not have enough resources to provide the illusion of infinite capacity. Due to the limited number of VM nodes in the private Cloud, a fully elastic processing structure could not be produced. The splits can be further optimized by analyzing the optimum number of chunks to be generated, which certainly varies for different data types. The performance of Hadoop MapReduce jobs can be improved without increasing hardware costs by tuning several key configuration parameters for the cluster specification, the input data size and the processing complexity.


CHAPTER 10

BIBLIOGRAPHY


[1] Rafael Pereira, Marcello Azambuja, Karin Breitman and Markus Endler, "An Architecture for Distributed High Performance Video Processing in the Cloud".

[2] Venkata Lakshmi K., Venkateswaran S., "The Research of Smart Surveillance System Using Hadoop Based On Craniofacial Identification", IJSET, Volume No. 3, Issue No. 3, pp. 216-221, 2014.

[3] Arto Heikkinen, Jouni Sarvanko, Mika Rautiainen and Mika Ylianttila, "Distributed Multimedia Content Analysis with MapReduce", in PIMRC, pages 3497-3501, 2013.

[4] İ. Demir, A. Sayar, "Hadoop Optimization for Massive Image Processing: Case Study Face Detection", International Journal of Computers Communication and Control, ISSN 1841-9836, 9(6):664-671, December 2014.

[5] Rafael Silva Pereira, Karin K. Breitman, "Video Processing in the Cloud", SpringerBriefs in Computer Science, ISBN: 978-1-4471-2136-7.

[6] http://hadoop.apache.org/docs/r0.18.3/mapred_tutorial.html

[7] http://hadoop.apache.org/docs/r1.2.1/hdfs_design.html

[8] http://hipi.cs.virginia.edu/

[9] Converting video formats with FFMpeg, Linux Journal archive, Issue 146, June 2006, pp. 10.

[10] Mencoder – http://www.mplayerhq.hu

[11] Apache Hadoop – http://hadoop.apache.org/mapreduce/

[12] Daniel Gmach, Ludmila Cherkasova (HP Labs, Palo Alto, CA, USA) and Jerry Rolia (HP Labs, Bristol, UK), "Resource and Virtualization Costs up in the Cloud: Models and Design Choices", 978-1-4244-9233-6/11, IEEE 2011.

[13] Rakesh Kumar Jha, Upena D. Dalal, "On Demand Cloud Computing Performance Analysis With Low Cost For QoS Application", International Conference on Multimedia, IMPACT-2011, 978-1-4577, IEEE 2011.

