
White Paper

PERFORMANCE AND TUNING TIPS FOR
EMC DOCUMENTUM FOUNDATION SERVICES CONTENT TRANSFER

Abstract
This white paper explains the options for transferring content
over Documentum Foundation Services (DFS). Using experimental
results, it shows how different factors affect performance, and
it summarizes tuning tips for improving content transfer
performance.

September 2011

Copyright 2011 EMC Corporation. All Rights Reserved.


EMC believes the information in this publication is accurate as
of its publication date. The information is subject to change
without notice.
The information in this publication is provided as is. EMC
Corporation makes no representations or warranties of any kind
with respect to the information in this publication, and
specifically disclaims implied warranties of merchantability or
fitness for a particular purpose.
Use, copying, and distribution of any EMC software described in
this publication requires an applicable software license.
For the most up-to-date listing of EMC product names, see EMC
Corporation Trademarks on EMC.com.

Performance and Tuning Tips for
EMC Documentum Foundation Services Content Transfer

Table of Contents
Executive summary
    Audience
Introduction
Recommendations
    #1 Use Base64 Transfer Mode Only for Small Files
    #2 Use UCF to Optimize Large File Transmission
    #3 Allocate Sufficient JVM Heap Size
    #4 Re-use ActivityInfo to Avoid Creating New UCF Connections
    #5 Use DataPackage to Transfer Multiple DataObject Instances
    #6 Optimize UCF Server Configuration
Conclusion
References

Executive summary
EMC Documentum Foundation Services (DFS) supports the standard web services transfer
modes (Base64 and MTOM), as well as proprietary technologies (UCF and ACS) that
optimize content transfer in a distributed environment. DFS content transfer
performance depends on many factors, including the Content Server configuration, the
Java Virtual Machine configuration, and how the DFS consumer is used.
This white paper uses the results of a series of tests to show how these configuration
settings affect performance, and it provides tuning tips and recommendations for
improving content transfer performance.

Audience
This white paper is intended for application developers using Documentum
Foundation Services (DFS). It assumes that the readers possess a basic knowledge of
DFS.
The paper focuses on DFS content transfer performance and provides optimization tips
based on experimental results. Readers should refer to the Documentum Foundation
Classes Guides for more information.

Introduction
DFS content transfer performance depends on a combination of many different factors.
The purpose of this document is to provide the reader with basic guidance and
recommendations for improving the performance of content transfer over DFS. A series
of performance tests was executed in our performance laboratory to determine how
different configuration settings affect the performance metrics. The recommendations
in this paper are based on the observed test results.
The results presented in this document were collected in an internal test environment
in which the Oracle database, Content Server and DFS server were deployed on separate
VMware virtual machines, as shown in Figure 1. Each virtual machine was allocated
4 CPUs and 8 GB of memory and was connected to a LAN through a 1 Gbps network
interface. A software network simulator was used on the DFS client to model different
network conditions. Unless otherwise specified, the Java productivity layer was used
to build the DFS consumer program, but the recommendations should also apply to .NET
consumers.
Because each customer environment is different, not all recommendations will have the
same effect as observed in these tests, but they are documented here to provide
options when optimizing a DFS solution.

Figure 1 Test environment diagram

Recommendations
#1 Use Base64 Transfer Mode Only for Small Files
DFS supports the standard WS transfer modes, Base64 and MTOM. In Base64 mode, the
binary data is converted to characters that are embedded in the SOAP envelope.
The experimental results in Figure 2 show that Base64 is efficient for transferring
small files in a LAN environment (1 Gbps bandwidth, 0 ms latency). The left chart
compares the response time of a single-thread upload, and the right chart compares
the throughput of uploading a batch of files with multiple threads.
The performance of Base64 mode is comparable to that of MTOM mode for files smaller
than 100 KB. In fact, in the test environment, Base64 is even slightly more efficient
for transferring 10 KB files, because encoding and decoding MTOM incurs some overhead.

Figure 2 Comparison of the small file uploading performance over LAN

The disadvantage of Base64 is that the content expands to roughly 1.3 times its
original size (4 output characters for every 3 input bytes), which requires more CPU
time, memory and bandwidth for the data transmission. Therefore, performance degrades
significantly with large files. Figure 3 shows that uploading a 100 MB file with
Base64 took nearly twice as long as with MTOM. As expected, when multiple threads
upload a batch of files, MTOM achieves a much higher throughput. In addition,
throughput thrashing occurs with Base64 as the number of threads increases, because
processing Base64 content consumes a large amount of memory on the server side.
Frequent JVM garbage collection then reduces the system throughput under heavy load.
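
The size expansion described above can be checked with a minimal sketch that uses only the standard java.util.Base64 encoder. The class and method names are illustrative and are not part of DFS; the sketch measures the encoded payload only and ignores any SOAP envelope overhead.

```java
import java.util.Base64;

/** Illustrates how much larger a payload becomes once it is Base64-encoded. */
final class Base64Overhead {

    /** Length in characters of the padded Base64 encoding of rawBytes bytes. */
    static long encodedLength(long rawBytes) {
        // Every group of 3 input bytes becomes 4 characters;
        // a partial final group is padded to a full 4 characters.
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        // 100 KB of zero bytes; the content does not affect the encoded length.
        byte[] sample = new byte[100 * 1024];
        String encoded = Base64.getEncoder().encodeToString(sample);

        System.out.printf("raw=%d bytes, encoded=%d chars, ratio=%.3f%n",
                sample.length, encoded.length(),
                (double) encoded.length() / sample.length);
        System.out.println("formula agrees with encoder: "
                + (encodedLength(sample.length) == encoded.length()));
    }
}
```

The ratio approaches 4/3 as the file grows, so the extra bytes are negligible for a 10 KB file but amount to tens of megabytes for a 100 MB file.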

Figure 3 Comparison of the large file uploading performance over LAN

#2 Use UCF to Optimize Large File Transmission


Unified Client Facilities (UCF) is a remote content transfer application. It is
available in the Java productivity layer (remote mode only) and in .NET. UCF provides
a series of performance optimizations for content transmission. By default, UCF
compresses the content and enables direct content transfer between the client machine
and a Content Server host, as opposed to transferring the content between the Content
Server and the DFS server and then on to the client machine, which is required for the
other transfer mechanisms.
Figure 4 compares the upload response time of MTOM mode and UCF mode. Note that, in
order to provide a byte-by-byte comparison, the UCF server was configured not to
compress the file content before transmission, so the number of bytes actually
transferred is the same for both MTOM and UCF. In a normal UCF configuration, the
benefit might be even larger.
With UCF, the DFS client uploads the file data to the Content Server directly rather
than through the DFS server. Therefore, using UCF is effective in reducing the
end-to-end response time for uploading a large file.

Figure 4 Comparison of MTOM and UCF

For small files, we have seen that MTOM has a better response time than UCF, as shown
in Figure 5. Although the content must be transferred between the Content Server and
the DFS server in MTOM mode, this cost is quite small because the servers are usually
deployed in a LAN environment. On the other hand, with UCF the client must exchange
extra messages with the servers to establish the connection before the transmission
starts. This accounts for a large portion of the response time, especially when the
network latency between the client and the server is high.

Figure 5 Importing a 100KB file (MTOM vs. UCF)

#3 Allocate Sufficient JVM Heap Size


In general, the memory usage of the DFS server grows with the number of active
threads. As memory fills up, frequent garbage collection reduces the throughput, and
under heavy load the memory requirements may exceed the amount allocated to the JVM,
resulting in OutOfMemoryError exceptions and/or JVM crashes.
For example, the maximum heap size of the DFS bundled with the Content Server is set
to 256 MB by default, which might be too small to handle many simultaneous requests
effectively. It might be necessary to tune the JVM heap size in these situations.
Memory shortage is most likely to occur when using BASE64 or MTOM in the .NET client
to transfer large files. Because the content data is encoded within a SOAP message in
BASE64, the DFS server allocates several buffers to hold the received and the decoded
content. For MTOM, the .NET client uses the buffered transfer mode provided by WCF,
which means the entire content is buffered in memory before transfer. Both result in
unusually high memory usage, especially when transferring large content payloads.
The optimal JVM memory settings depend on the overall system workload and have to be
tuned case by case. As a rule of thumb, we recommend setting the JVM heap size to
1,024 MB and making sure that both the DFS server and the client JVM run with enough
memory.
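
One simple way to confirm which heap limit a JVM is actually running with is to query it at runtime. The following is a minimal sketch using only the standard Runtime API; the class name and the comparison against 1,024 MB are illustrative. How the limit is set (typically the standard -Xmx option, for example -Xmx1024m) depends on how the application server hosting DFS is started, which is not covered here.

```java
/** Reports the maximum heap size available to the current JVM. */
final class HeapCheck {

    /** Rule-of-thumb heap size from the text, in megabytes. */
    static final long RECOMMENDED_MB = 1024;

    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    static boolean meetsRecommendation(long maxHeapBytes) {
        return toMegabytes(maxHeapBytes) >= RECOMMENDED_MB;
    }

    public static void main(String[] args) {
        // maxMemory() returns the most memory the JVM will try to use for the heap.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Maximum heap: " + toMegabytes(maxHeap) + " MB");
        if (!meetsRecommendation(maxHeap)) {
            System.out.println("Heap is below the " + RECOMMENDED_MB
                    + " MB rule of thumb; consider raising it.");
        }
    }
}
```

The value reported by the JVM can be somewhat lower than the configured limit with some garbage collectors, so the comparison should be treated as a rough check rather than an exact test.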

#4 Re-use ActivityInfo to Avoid Creating New UCF Connections


When using client-orchestrated UCF, the ActivityInfo instance can be cached and
passed in all service operation calls. The following sample code snippet
demonstrates how to import four documents through the same UCF connection.
    /* Create the ActivityInfo once, with auto-close disabled, so it can be re-used */
    ActivityInfo theInfo = new ActivityInfo(false);
    for (int i = 0; i < 4; i++) {
        if (i == 3) {
            /* Close UCF connection after the last transfer */
            theInfo.setAutoCloseConnection(true);
        }
        OperationOptions theOptions = new OperationOptions();
        ContentTransferProfile theTransferProfile = new ContentTransferProfile();
        /* Pass the same cached ActivityInfo in every call */
        theTransferProfile.setActivityInfo(theInfo);
        theOptions.setContentTransferProfile(theTransferProfile);

        /* Create DataPackage */

        theObjectService.create(theDataPackage, theOptions);
    }

Caching the ActivityInfo avoids creating new UCF connections to the server for
subsequent content transfer operations. The achievable performance improvement is
bounded above by the reciprocal of the fraction of the total time that is spent on
operations other than creating UCF connections.
Table 1 lists the measured transmission throughput when re-using UCF connections in
a LAN environment. Each row in the table represents the results for one content size.
The 2nd to 5th columns show the overall throughput measured when one UCF connection
is re-used for 1, 2, 10 and 100 transmissions. The last column is the throughput
upper bound, estimated by regression analysis. This approach is clearly more
effective for small files, because the time spent creating new connections is
relatively large compared with the transfer time.
Table 1 Throughput improvement by re-using UCF connection

Content size   1           2           10          100         Upper Bound
100 KB         3.41 Mbps   4.54 Mbps   6.15 Mbps   6.77 Mbps   6.84 Mbps
1 MB           27.4 Mbps   31.8 Mbps   39.6 Mbps   42.7 Mbps   42.9 Mbps
10 MB          92.9 Mbps   99.5 Mbps   107.5 Mbps  111.6 Mbps  111.8 Mbps
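
The trend in Table 1 can be approximated with a simple amortization model. The sketch below assumes that each transfer costs a fixed transfer time plus a connection setup time that is shared by all transfers on the same connection. This model is an assumption made here for illustration, not the regression used to produce the table, and the class and method names are hypothetical.

```java
/** Amortization model for re-using one UCF connection across n transfers. */
final class UcfReuseModel {

    /**
     * Predicted throughput (Mbps) when one connection serves n transfers,
     * assuming time per transfer = transfer time + (setup time / n).
     */
    static double predictedThroughput(double singleMbps, double upperBoundMbps, int n) {
        // Time per Mbit once the setup cost is fully amortized.
        double transferCost = 1.0 / upperBoundMbps;
        // Extra time per Mbit when every transfer opens its own connection.
        double setupCost = 1.0 / singleMbps - transferCost;
        return 1.0 / (transferCost + setupCost / n);
    }

    /** Largest possible speed-up from connection re-use under this model. */
    static double maxSpeedup(double singleMbps, double upperBoundMbps) {
        return upperBoundMbps / singleMbps;
    }

    public static void main(String[] args) {
        double single = 3.41;      // 100 KB row, 1 transmission per connection
        double upperBound = 6.84;  // 100 KB row, upper bound
        for (int n : new int[] {1, 2, 10, 100}) {
            System.out.printf("n=%d -> %.2f Mbps%n",
                    n, predictedThroughput(single, upperBound, n));
        }
        System.out.printf("max speed-up: %.2fx%n", maxSpeedup(single, upperBound));
    }
}
```

For the 100 KB row the model tracks the measured values closely. It deviates more for the larger files, where the setup time is a small share of the total and other effects matter more.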

#5 Use DataPackage to Transfer Multiple DataObject Instances


A DataPackage is a collection of DataObject instances, which is typically passed to,
and returned by, DFS ObjectService operations. ObjectService operations process all
the DataObject instances in the DataPackage sequentially.
Encapsulating multiple DataObject instances into a single DataPackage helps to
improve the performance by reducing the number of round trips between DFS client
and the server. Figure 6 compares the import throughput (24 threads) with different
package sizes. The experimental results demonstrate that the throughput could be
improved significantly by transferring a DataPackage containing multiple (e.g. >10)
DataObject instances.

Figure 6 Throughput of transferring DataPackages of different sizes over LAN
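
The round-trip saving can be illustrated with a minimal, DFS-independent sketch that splits a list of items into fixed-size batches. In a DFS consumer, each batch would be placed in one DataPackage and sent in a single ObjectService call. The helper class and the batch size of 10 are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits work items into batches so that each batch costs one round trip. */
final class PackageBatcher {

    /** Returns consecutive sublists of at most batchSize items each. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> documents = new ArrayList<>();
        for (int i = 0; i < 95; i++) {
            documents.add("document-" + i);
        }
        List<List<String>> batches = partition(documents, 10);
        System.out.println(documents.size() + " documents sent one by one: "
                + documents.size() + " round trips");
        System.out.println(documents.size() + " documents in batches of 10: "
                + batches.size() + " round trips");
    }
}
```

Larger batches reduce the number of round trips further, but each call then carries more data and holds more memory on both sides, so the batch size is worth tuning against the heap settings discussed in recommendation #3.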

#6 Optimize UCF Server Configuration


The UCF server is deployed as part of the application on the DFS server. The server
configuration file ucf.server.config.xml is located in /APP-INF/classes in DFS
applications.
UCF compresses the content to reduce the number of bytes transmitted. However, for
some file types, such as ZIP, compression achieves almost no size reduction. In this
case, performance is adversely affected by the compression overhead. A list of file
formats is excluded from compression by default, as specified in the
compression.exclusion.formats element. The user can add other file types that can
hardly be compressed any further to optimize the content transfer performance.
Besides the compression ratio, the decision whether to compress the content depends
on the network conditions, especially the bandwidth. Table 2 lists the test results
of importing documents with a 50% compression ratio under different network
conditions.
As expected, uncompressed UCF transmission performs better in the LAN, because a
relatively small portion of the time is spent on the link. In contrast, a large
amount of time can be saved by reducing the content size when transferring documents
over a network with very limited bandwidth (e.g. a WAN).
Comparing the response times between LAN and WAN for the same file shows that the
extra seconds spent in the WAN case approximately equal the content size divided by
the network bandwidth. We have also seen that the compression overhead increases
linearly with the original size of the document (about 0.085 s/MB in the test
environment).
The break-even bandwidth can be roughly estimated from these data. Suppose the
compression ratio (compressed size divided by original size) is $r$ and the
compression overhead is $c$ seconds/MB. Compression pays off when the overhead $c$
is smaller than the transfer time saved per MB, which is $8(1-r)/B$ seconds at a
bandwidth of $B$ Mbps, so the break-even point is $B = 8(1-r)/c$ Mbps. For example,
the number is roughly 50 Mbps for the test scenario in Table 2 ($c = 0.085$ s/MB,
$r = 0.5$), which means that it would be more efficient to disable compression when
the network bandwidth is much larger than 50 Mbps.
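
The break-even estimate can be written as a small helper. This is a minimal sketch under the same simplifications as the text: the compression ratio is taken as compressed size divided by original size, the overhead is linear in the original size, and latency and decompression time are ignored. The class and method names are illustrative.

```java
/** Estimates the bandwidth above which UCF compression stops paying off. */
final class CompressionBreakEven {

    /**
     * Break-even bandwidth in Mbps.
     *
     * @param ratio            compressed size / original size, between 0 and 1
     * @param overheadSecPerMb compression overhead in seconds per MB of original data
     */
    static double breakEvenMbps(double ratio, double overheadSecPerMb) {
        if (ratio < 0.0 || ratio > 1.0 || overheadSecPerMb <= 0.0) {
            throw new IllegalArgumentException("ratio must be in [0,1] and overhead positive");
        }
        // 1 MB is 8 Mbit; compression saves 8 * (1 - ratio) Mbit of transfer per MB.
        return 8.0 * (1.0 - ratio) / overheadSecPerMb;
    }

    /** True when the link is slow enough for compression to reduce total time. */
    static boolean shouldCompress(double bandwidthMbps, double ratio, double overheadSecPerMb) {
        return bandwidthMbps < breakEvenMbps(ratio, overheadSecPerMb);
    }

    public static void main(String[] args) {
        double ratio = 0.5;       // 50% compression ratio from the test scenario
        double overhead = 0.085;  // seconds per MB measured in the test environment
        System.out.printf("break-even: %.1f Mbps%n", breakEvenMbps(ratio, overhead));
        System.out.println("compress on 10 Mbps WAN: " + shouldCompress(10, ratio, overhead));
        System.out.println("compress on 1 Gbps LAN:  " + shouldCompress(1000, ratio, overhead));
    }
}
```

With the values from the test scenario the helper recommends compression on the 10 Mbps links and no compression on the 1 Gbps LAN, which matches the direction of the measurements.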
Table 2 Transmission time under different network conditions

File Size   Compress   LAN (s)   WAN1 (s)   WAN2 (s)
40 MB       Yes        3.1       18.9       20.0
40 MB       No         2.2       34.7       35.2
80 MB       Yes        6.0       38.3       39.4
80 MB       No         4.1       69.0       69.8
120 MB      Yes        8.8       56.1       58.5
120 MB      No         6.2       103.3      104.3

* LAN: 1 Gbps, 0 ms; WAN1: 10 Mbps, 0 ms; WAN2: 10 Mbps, 20 ms

Conclusion
This document provides a list of recommendations that can be used as a reference
guide for obtaining optimum performance for DFS content transfer.

The first step in improving performance is to choose a suitable content transfer
mode. Despite the complexity of real DFS deployments, we propose a simple selection
rule based on the size of the target file: BASE64 works well for small files (e.g.
less than 100 KB); MTOM is suitable for small and medium-sized files; and UCF is most
efficient for large files.
Array processing is a common performance optimization technique that reduces the
number of round trips between the client and the server. The DFS data model provides
the DataPackage type, which allows multiple objects to be processed in a single
server call. This increases the throughput when transferring a large number of
documents, such as in a batch job. For UCF, the ActivityInfo can be cached to re-use
the existing connection.
UCF server tuning is discussed briefly in the last recommendation, with a focus on
content compression. The overall impact of compression is determined by the file
compression ratio, the network bandwidth and the time spent in compression. In
general, disabling compression may improve the end-to-end response time when the
compression ratio is low and/or when the network bandwidth is large.
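
The size-based selection rule can be expressed as a small helper. This is only a sketch: the 100 KB limit for BASE64 comes from the text, but the paper does not give a size at which UCF becomes preferable to MTOM, so that threshold is a caller-supplied parameter here. The enum and class are hypothetical and are not DFS types.

```java
/** Picks a content transfer mode from the file size, following the rule of thumb. */
final class TransferModeChooser {

    /** Illustrative mode names; not the DFS ContentTransferMode type. */
    enum Mode { BASE64, MTOM, UCF }

    /** "Less than 100 KB" limit for BASE64, as suggested in the text. */
    static final long BASE64_LIMIT_BYTES = 100L * 1024L;

    /**
     * @param fileSizeBytes     size of the file to transfer
     * @param ucfThresholdBytes size from which UCF is preferred; an assumption that
     *                          must be measured for each environment
     */
    static Mode choose(long fileSizeBytes, long ucfThresholdBytes) {
        if (fileSizeBytes < 0 || ucfThresholdBytes < BASE64_LIMIT_BYTES) {
            throw new IllegalArgumentException("invalid size or threshold");
        }
        if (fileSizeBytes < BASE64_LIMIT_BYTES) {
            return Mode.BASE64;
        }
        return fileSizeBytes < ucfThresholdBytes ? Mode.MTOM : Mode.UCF;
    }

    public static void main(String[] args) {
        long threshold = 10L * 1024L * 1024L; // hypothetical 10 MB cut-over to UCF
        System.out.println(choose(10L * 1024L, threshold));           // small file
        System.out.println(choose(1024L * 1024L, threshold));         // medium file
        System.out.println(choose(100L * 1024L * 1024L, threshold));  // large file
    }
}
```

Network latency also affects the choice, because the UCF connection setup cost grows with latency, so the threshold is best determined by measurement in the target environment.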

References


Documentum Foundation Services Development Guide
