Dryad and DryadLINQ Installation and Configuration Guide
Abstract
DryadLINQ depends on the Dryad execution engine to run the queries on a Windows®
HPC Server 2008 cluster. This paper describes:
How to install Dryad on a Windows HPC cluster.
How to install DryadLINQ on a workstation and configure it to run
applications on the Windows HPC cluster.
Note:
Most resources discussed in this paper are provided with the DryadLINQ package.
For information about how to obtain all documents referenced in this paper, see
“Resources” at the end of this paper.
For project updates, feedback, and community discussion, see
http://connect.microsoft.com/dryad
Please submit comments on this document at the above community site or to
dryadlnq@microsoft.com.
Contents
Introduction
How to Install Dryad on a Windows HPC Cluster
  Cluster System Requirements
  Dryad Installation
How to Install DryadLINQ on a Workstation
  DryadLINQ Workstation System Requirements
  How to Install and Configure DryadLINQ
  Verify the Installation
Resources
Disclaimer: The information contained in this document represents the current view of Microsoft
Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to
changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and
Microsoft cannot guarantee the accuracy of any information presented after the date of publication.
This White Paper is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS,
IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under
copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or
transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or
for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights
covering subject matter in this document. Except as expressly provided in any written license agreement
from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks,
copyrights, or other intellectual property.
Unless otherwise noted, the example companies, organizations, products, domain names, e-mail
addresses, logos, people, places and events depicted herein are fictitious, and no association with any real
company, organization, product, domain name, email address, logo, person, place or event is intended or
should be inferred.
© 2008-2009 Microsoft Corporation. All rights reserved.
Microsoft, Visual C#, Visual Studio, Windows, and Windows Server are either registered trademarks or
trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective
owners.
Document History
Date Change
Feb 18, 2009 Preview draft v.0.9
June 30, 2009 Version 1.0
November 10, 2009 Version 1.0.1
Introduction
Dryad is a high-performance, general-purpose distributed computing engine that is
designed to simplify the task of implementing distributed applications on clusters of
computers running the Windows® operating system. DryadLINQ allows developers to
implement Dryad applications in managed code by using an extended version of the
LINQ programming model and API.
DryadLINQ depends on the Dryad execution engine to run queries on a Windows HPC
Server 2008 cluster. Figure 1 shows the basic setup for running DryadLINQ applications.
[Figure 1. Client workstations communicate with a Dryad cluster, which consists of a head node and compute nodes.]
In Figure 1:
client workstations
Programmers develop and run DryadLINQ applications on Windows-based
workstations.
Dryad cluster
The DryadLINQ provider on the workstation runs DryadLINQ queries on the Dryad
cluster as a distributed application. A Dryad cluster is a Windows HPC cluster
running Dryad software.
head node
The cluster’s head node manages the cluster.
compute nodes
The cluster’s compute nodes handle the distributed computations. The head
node can also serve as a compute node. Input and output data is also stored on
the compute nodes in shared folders created during installation of Dryad.
Important: Windows HPC Server 2008 supports a set of cluster topologies. Depending
on the cluster topology, a client workstation may or may not be able to communicate
with the compute nodes. However, a DryadLINQ application needs access to the data
stored in files on the compute nodes in between executing distributed LINQ queries.
DryadLINQ developers have the most flexibility when their workstations can access
the compute nodes directly. When compute nodes are isolated on a private network,
the DryadLINQ application should run on the head node or the workstation should be
joined to the private network.
This paper assumes that you already have acquired a Windows HPC cluster and are
familiar with its operation. For details, see “Windows HPC Server 2008,” listed in
“Resources” at the end of this paper.
later. If you install the Microsoft HPC Pack 2008, the installer automatically
installs SQL Server Express, which is usually sufficient.
For more details on retention policy, see “How to Specify the Data Retention
Policy” later in this paper.
Access to the Windows HPC cluster
The Dryad Management Tools programmatically query the Windows HPC cluster to
determine the status of Dryad jobs.
Access to the data shares that are created when you install the Dryad
computation engine on the compute nodes
Dryad Installation
The two Dryad .msi files are used as follows:
DryadHPC.msi is a full installation that installs Dryad Management Tools on
the head node and installs the Dryad binaries on all online compute nodes.
DryadHPCComputeNode.msi installs the Dryad computation engine on
selected compute nodes.
Important: You must be an administrator for the Windows HPC cluster to run either
installer.
To run DryadHPC.msi
Recommended: Install the management service. The head node must have .NET
Framework Version 3.5 SP1 installed before you can install Dryad Management Tools.
1. On the Dryad Management Tools Installation Options wizard page, select Install
Tools, as shown in Figure 2.
2. On the Management Tools Database and HPC Cluster Settings wizard page,
verify the connection information for the data store.
Figure 3 shows the default settings. Your user account must have the rights to
create a database in this data store.
Note: You may choose to install your own database engine. For example, you
may install SQL Server Express 2005, which creates a server instance named
SQLEXPRESS by default. You can select a different name for the instance, such as
SQLDRYAD, by clearing "Hide advanced options" during the SQL Server
Express installation. Then, during the Dryad installation, you would specify
"LOCALHOST\SQLDRYAD" (assuming you named the instance SQLDRYAD)
in the SQL Server field of Figure 3.
3. On the Dryad Installation Options wizard page, specify a drive for the Dryad data
and system shares, as shown in Figure 4.
Dryad applications depend on shared access to input data. To provide this access, the
installer creates the following shares and folders on the cluster’s head and compute
nodes:
A share named DryadData to store the user’s input data files.
A share named XC to store any intermediate data files that might be
generated by Dryad applications.
The intermediate files generated by an application are stored in a folder that is
named with the job identifier.
A folder named “output” under each XC share, to serve as the default folder
for the user’s output data files.
Output data is stored in the folder specified by the DryadOutputDir directive in
the DryadLINQ configuration file, DryadLINQConfig.xml. The default value for this
folder is XC\output.
A folder named XCSetup under the head node’s XC share.
The installer places DryadHPCComputeNode.msi in XCSetup. If you chose to
install Dryad on the compute nodes, XCSetup also contains the cluster installation
logs generated by individual nodes.
The installer creates the shares and folders on all online compute nodes, and then it
copies the Dryad computation engine binaries to the compute nodes.
Important: The installer creates these shares on the drive specified for each compute
node, so make sure that the drive exists on all nodes.
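Putting the shares and folders above together, the layout on a node can be sketched as follows; NodeName is a placeholder, and the per-job folder name is illustrative:

```
\\NodeName\DryadData\       user input data files
\\NodeName\XC\              intermediate files, in per-job folders
\\NodeName\XC\output\       default folder for output data files
\\NodeName\XC\XCSetup\      head node only: DryadHPCComputeNode.msi and
                            compute-node installation logs
```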
After Dryad is installed, DryadLINQ users typically create their own folder under the
DryadData shares. A user’s folder is typically named with the user name. For example,
a user named DryadUser_1 would use the following folders on each node to store her
permanent data:
User’s data would be placed on \\NodeName\DryadData\DryadUser_1.
Intermediate files that the system generates when running a DryadLINQ
query would be placed in \\NodeName\XC\DryadUser_1\Job_ID. Job_ID refers to
the ID of the HPC job corresponding to the execution of the DryadLINQ query on
the cluster.
The pieces of an output partitioned file would be placed under the path
specified by the PartitionUncDir configuration parameter. For example, if
PartitionUncDir is DryadData\DryadUser_1\Output, the pieces of a partitioned file
are saved under \\NodeName\DryadData\DryadUser_1\Output on the
appropriate compute nodes. If the location of a partitioned file is not specified,
the metadata file for the partition is created in the folder specified by
DryadOutputDir. To save partitioned files
under \\NodeName\DryadData\DryadUser_1\Output by default, set
DryadOutputDir to:
file://\\NodeName\DryadData\DryadUser_1\Output
Note: The XC and DryadData shares are not affected when Dryad is uninstalled; they
continue to be shared. On a subsequent installation, existing shares are reused.
However, if the shares were created on one disk (for example, C:\XC and
C:\DryadData) and a subsequent installation uses another disk (for example, D:\),
sharing on the C drive is stopped and new shares are established on the D drive.
The following procedure installs the Dryad computation engine on a single node. For more details on
how to use clusrun, see the clusrun documentation in the Windows HPC Cluster
Manager help file, which is included with the Microsoft HPC Pack 2008 client utilities.
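As an illustrative sketch only, installing the compute-node package on several nodes at once with clusrun might look like the following; the node names and the HEADNODE share path are placeholders, and the unattended-install option passed to msiexec is an assumption:

```
REM Run on the head node as a cluster administrator.
REM NODE01, NODE02, and HEADNODE are placeholders for your cluster.
clusrun /nodes:NODE01,NODE02 msiexec /i \\HEADNODE\XC\XCSetup\DryadHPCComputeNode.msi /quiet
```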
The following procedure shows how to install the binaries on a node. You can install
the binaries on the head node only if it is also a compute node.
1. To open the management console on the head node, from the Start menu, click
All Programs and run Microsoft Research DryadLINQ > Dryad Management
Console.
2. Click View Policies.
Figure 6 shows the System Retention Policy Details tool in the DryadLINQ
Management Console.
3. In the Policy Applicability section, specify the retention time.
Set Older than to the appropriate retention time. The default value is 72 hours,
which removes all Dryad files on the compute node data shares that are older
than 72 hours.
4. Specify the policy enforcement schedule.
The management console performs cleanup on a regular schedule. To specify a
cleanup schedule, set Check policy every to the appropriate value. The default
value is once every hour.
5. Specify the job synchronization schedule.
The management service stores Dryad job data in its own data store. To specify
how frequently the data store should be updated, set Synchronize jobs every to
the appropriate value. The default value is once every 5 minutes.
6. Specify the Dryad data store clean-up policy.
This setting specifies when the details of Dryad jobs that have been removed
from the compute nodes are to be removed from the Dryad data. Set the value of
Older than to the appropriate data-store retention time. The default value is 7
days.
7. Click Save to save the changes.
8. Restart the DryadDLMgmtSvc service.
The changes take effect only after you restart the DryadDLMgmtSvc service. The
display name of this service is Dryad + DryadLINQ Management Services.
Note: Microsoft HPC Pack 2008 also runs on several other versions of Windows,
which will likely run DryadLINQ successfully. However, DryadLINQ has been
tested only on the versions in the preceding list.
These utilities are included with the Windows HPC Server 2008 package, so you
must obtain them from your cluster administrator. They are not included with
Microsoft HPC Pack 2008 SDK.
Microsoft .NET Framework Version 3.5, SP 1
Microsoft Visual Studio® 2008
To Install DryadLINQ
You should be able to use the default values for the other elements and attributes.
The attributes are as follows:
ClusterName Element
Change MyClusterName to the name of your cluster’s head node.
Cluster Element
Set the name attribute to the name of your cluster’s head node. This name and
the value of the ClusterName element must match exactly.
Set the dryadoutputdir attribute to a writeable share on one of the cluster’s
compute nodes by changing the value of NodeName. The values of
partitionuncdir and dryadoutputdir are customizable.
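As a sketch of how these settings might fit together (the exact schema of the global DryadLINQConfig.xml may differ; MyClusterName, NodeName, and DryadUser_1 are placeholders from the text above):

```
<DryadLinqConfig>
  <ClusterName>MyClusterName</ClusterName>
  <Cluster name="MyClusterName"
           dryadoutputdir="file://\\NodeName\XC\output"
           partitionuncdir="DryadData\DryadUser_1\Output" />
</DryadLinqConfig>
```

The name attribute and the ClusterName element must match exactly, as noted above.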
For more discussion of DryadLINQConfig.xml, see “DryadLINQ Programming Guide.”
Most project-level DryadLINQConfig.xml files can simply point to the global file. The
following example shows the contents of a typical project-level DryadLINQConfig.xml
file. The DryadLinqRoot element specifies the DryadLINQ root folder, which contains
the global file. This example sets DryadLinqRoot to the typical root folder,
C:\Program Files\Microsoft Research DryadLINQ. Most DryadLINQ applications don’t
require any additional settings.
<DryadLinqConfig>
<DryadLinqRoot>
C:\Program Files\Microsoft Research DryadLINQ
</DryadLinqRoot>
</DryadLinqConfig>
Resources
The following list provides links to related information.
Dryad – Microsoft Research Project Page
http://research.microsoft.com/en-us/projects/dryad/
DryadLINQ – Microsoft Research Project Page
http://research.microsoft.com/en-us/projects/dryadlinq/
Dryad and DryadLINQ Manuals
Manuals and samples are provided with the DryadLINQ installation in the folder
<install_path>\Docs, including:
“Dryad and DryadLINQ: An Introduction”
“DryadLINQ Programming Guide”
Windows HPC Server 2008
http://www.microsoft.com/hpc/