
Chapter 32. Parallel Processing

The following sections describe the parallel-processing features of FLUENT.

Section 32.1: Introduction to Parallel Processing
Section 32.2: Starting the Parallel Version of the Solver
Section 32.3: Using the Fluent Launcher (Windows only)
Section 32.4: Using a Parallel Network of Workstations
Section 32.5: Partitioning the Grid
Section 32.6: Checking and Improving Parallel Performance
Section 32.7: Running Parallel FLUENT under SGE
Section 32.8: Running Parallel FLUENT under LSF
Section 32.9: Running Parallel FLUENT under Other Resource Management Tools

32.1 Introduction to Parallel Processing

The FLUENT serial solver manages file input and output, data storage, and flow field calculations using a single solver process on a single computer. FLUENT's parallel solver allows you to compute a solution by using multiple processes that may be executing on the same computer, or on different computers in a network. Figures 32.1.1 and 32.1.2 illustrate the serial and parallel FLUENT architectures. Parallel processing in FLUENT involves an interaction between FLUENT, a host process, and a set of compute-node processes. FLUENT interacts with the host process and the collection of compute nodes using a utility called cortex that manages FLUENT's user interface and basic graphical functions. Parallel FLUENT splits up the grid and data into multiple partitions, then assigns each grid partition to a different compute process (or node). The number of partitions is an integral multiple of the number of compute nodes available to you (e.g., 8 partitions for 1, 2, 4, or 8 compute nodes). The compute-node processes can be executed on a massively parallel computer, a multiple-CPU workstation, or a network of workstations using the same or different operating systems.


Figure 32.1.1: Serial FLUENT Architecture (schematic: cortex drives a single solver process, which performs file input/output to disk and holds the cell, face, and node data)

In general, as the number of compute nodes increases, turnaround time for the solution will decrease. However, parallel efficiency decreases as the ratio of communication to computation increases, so you should be careful to choose a large enough problem for the parallel machine.

FLUENT uses a host process that does not contain any grid data. Instead, the host process only interprets commands from FLUENT's graphics-related interface, cortex. The host distributes those commands to the other compute nodes via a socket communicator to a single designated compute node called compute-node-0. This specialized compute node distributes the host commands to the other compute nodes. Each compute node simultaneously executes the same program on its own data set. Communication from the compute nodes to the host is possible only through compute-node-0 and only when all compute nodes have synchronized with each other. Each compute node is virtually connected to every other compute node, and relies on its communicator to perform such functions as sending and receiving arrays, synchronizing, performing global operations (such as summations over all cells), and establishing machine connectivity. A FLUENT communicator is a message-passing library. For example, the message-passing library could be a vendor implementation of the Message Passing Interface (MPI) standard, as depicted in Figure 32.1.2. All of the parallel FLUENT processes (as well as the serial process) are identified by a unique integer ID. The host collects messages from compute-node-0 and performs operations (such as printing, displaying messages, and writing to a file) on all of the data, in the same way as the serial solver.


Figure 32.1.2: Parallel FLUENT Architecture (schematic: cortex drives the host process, which handles file input/output to disk and communicates over a socket with compute node 0; the compute nodes, each holding its own cell, face, and node data, are fully interconnected through MPI)


Recommended Usage of Parallel FLUENT


The recommended procedure for using parallel FLUENT is as follows:

1. Start up the parallel solver and spawn additional compute nodes (if necessary). See Sections 32.2 and 32.4 for details.

2. Read your case file and have FLUENT partition the grid automatically upon loading it. It is best to partition after the problem is set up, since partitioning has some model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and shell-conduction encapsulation). Note that there are other approaches for partitioning, including manual partitioning in either the serial or the parallel solver. See Section 32.5: Partitioning the Grid for details.

3. Review the partitions and perform partitioning again, if necessary. See Section 32.5.5: Checking the Partitions for details on checking your partitions.

4. Calculate a solution. See Section 32.6: Checking and Improving Parallel Performance for information on checking and improving the parallel performance.

32.2 Starting the Parallel Version of the Solver


The way you start the parallel version of FLUENT depends on whether you are using a dedicated parallel machine or a workstation cluster.

32.2.1 Starting the Parallel Solver on a UNIX System

You can run FLUENT on a UNIX dedicated parallel machine or a network of UNIX workstations. The procedures for starting these versions are described in this section.

Running on a Multiprocessor UNIX Machine


To run FLUENT on a dedicated parallel machine (i.e., a multiprocessor workstation or a massively parallel machine), type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 32.2.1) to specify the parallel architecture and version information.

File Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.


Figure 32.2.1: The Select Solver Panel

2. Under Options, select the message-passing library in the Communicator drop-down list. The Default library is recommended, because it selects the library that should provide the best overall parallel performance for your dedicated parallel machine. If you prefer to select a specific library, you can choose either Vendor MPI or Shared Memory MPI (MPICH). Vendor MPI selects the message-passing library optimized by your hardware vendor. If the parallel toolkit supplied by your hardware vendor is installed on your machine, FLUENT will detect it automatically when the Default option is selected. Shared Memory MPI (MPICH) selects the MPICH message-passing library, a public-domain version of MPI.

3. Set the number of CPUs in the Processes field.

4. Click the Run button to start the parallel version. No additional setup is required once the solver starts.

If you prefer to start the parallel version from the command line, you can type

fluent version -tn [-pcomm] [-loadhost] [-pathpath]

where version is 2d, 3d, 2ddp, or 3ddp, and n is replaced by the number of CPUs to be used. The remaining arguments are optional, as indicated by the square brackets around them. (If you enter one or more of these optional arguments, do not include the square brackets.) comm is replaced by the name of the parallel communication library, host is replaced by the hostname of the machine to launch the compute nodes (by default, it is set to the machine you're using when entering this command), and path is replaced by the root path to the Fluent.Inc installation directory.

In general, you will need to specify -pcomm only if you want to override the default communication library (which should provide best overall parallel performance).
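For example, a minimal sketch (the solver version, process count, and installation path shown here are illustrative, not taken from the manual):

fluent 3ddp -t4
fluent 3ddp -t4 -pnet -path/usr/local/Fluent.Inc

The first command starts a double-precision 3D session on 4 CPUs with the default communicator; the second overrides the communicator with the socket library and supplies the installation path explicitly.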

The available communicators for dedicated parallel UNIX machines are listed below (Tables 32.2.1 and 32.2.2), along with their associated communication libraries, the corresponding syntax, and the supported architectures (see Step 2, above, for a description of these libraries):

Table 32.2.1: Available communicators for UNIX platforms (per platform)

Platform   Processor        Architecture       Communicators
Linux      32 bit           lnx86              beo, net, nmpi, scampi, smpi
Linux      64 bit Itanium   lnia64             net, nmpi, smpi
Ultra      32 bit           ultra              net, nmpi, smpi, vmpi
Ultra      64 bit           ultra64            net, smpi, vmpi
SGI        32 bit           irix65_mips4       net, nmpi, smpi, vmpi
SGI        64 bit           irix65_mips4_64    net, nmpi, smpi, vmpi
HP         32 bit           hpux11             net, nmpi, smpi, vmpi
HP         64 bit           hpux11_64          net, nmpi, smpi, vmpi
HP         64 bit Itanium   hpux11_ia64        net, vmpi
DEC        64 bit           alpha              net, nmpi, smpi, vmpi, tmpi
Fujitsu    64 bit           fujitsu_pp         net, nmpi, vmpi
IBM        32 bit           aix51              net, nmpi, smpi, vmpi
IBM        64 bit           aix51_64           net, nmpi, smpi, vmpi


Table 32.2.2: Available communicators for UNIX platforms (per communicator)

Commu-    Syntax     Commun.               Supports   Vendor impl.   Used with   Used with   Platform
nicator   (flag)     Library               spawning   available      DMM ***     SMM **
                                           nodes      (costs)
net       -pnet      socket                yes        no             yes         yes         all platforms
nmpi      -pnmpi     network MPI (MPICH)   no         no             yes         yes         all platforms
smpi      -psmpi     shared MPI (MPICH)    no         no             no          yes         all platforms
vmpi      -pvmpi     Vendor MPI            no         yes            yes         yes         all platforms except Linux
beo *     -pbeo      beowulf               no         no             yes         yes         Linux
scampi *  -pscampi   SCAMPI                no         no             yes         yes         Linux
tmpi      -ptmpi     MPI                   no         no             yes         yes         DEC

* Not formally qualified in FLUENT, but the vendor might support it.
** SMM is Shared Memory Machine, where the memory is shared between the processors on a single machine.
*** DMM is Distributed Memory Machine, where each processor has its own memory associated with it.

nmpi is recommended to be used with DMM if vmpi is not available, and smpi is recommended to be used with SMM if vmpi is not available.


Running on a UNIX Workstation Cluster


To run FLUENT on a network of UNIX workstations, type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 32.2.1) to specify the parallel architecture and version information.

File Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the Socket message-passing library in the Communicator drop-down list.

When you start the parallel network version, you must select Socket or Network MPI (MPICH) in the Communicator drop-down list, unless the vendor MPI library (described earlier in this section) supports clustering. If you keep the Default option, one of the MPI parallel versions will start instead, and you will be unable to spawn additional compute nodes.

3. Set the number of initial compute node processes to spawn on the host machine in the Processes field. You can start with 1 or 0 nodes and spawn the rest later on, as described in Section 32.4.1: Configuring the Network.

4. (optional) Specify the name of a file containing a list of machines, one per line, in the Hosts File field. If the number of Processes is set to 0, FLUENT will spawn a compute node on each machine listed in the file.

5. Click the Run button to start the parallel network version.

If you prefer to start the parallel network version from the command line, you can type

fluent version -t1 -pnet

(to use the socket communicator) or

fluent version -t1 -pnmpi

(to use the network MPI communicator) to start the solver with 1 compute node on the host workstation. You can then spawn additional processes on remote workstations using the Network Configuration panel, as described in Section 32.4.1: Configuring the Network. You can type

fluent version -t0 -pnet [-cnf=hostsfile]

(to use the socket communicator) or

fluent version -t0 -pnmpi [-cnf=hostsfile]


(to use the network MPI communicator) to start a host process that controls compute nodes situated on remote machines. If the optional -cnf=hostsfile is specified, a compute node will be spawned on each machine listed in the file hostsfile. (If you enter this optional argument, do not include the square brackets.) Otherwise, you can spawn the processes as described in Section 32.4.1: Configuring the Network.
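As an illustration (the machine names, file name, and solver version below are hypothetical), a hosts file might simply list one workstation per line:

wk1
wk2
wk3

and the network version could then be started with

fluent 3d -t0 -pnet -cnf=hosts.txt

which spawns one compute node on each of the three listed machines.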

32.2.2 Starting the Parallel Solver on a LINUX System

You can run FLUENT on a LINUX dedicated parallel machine or a network of LINUX workstations. The procedures for starting these versions are described in this section.

Running on a Multiprocessor LINUX Machine


To run FLUENT on a dedicated parallel machine (i.e., a multiprocessor workstation or a massively parallel machine), type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 32.2.1) to specify the parallel architecture and version information.

File Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the message-passing library in the Communicator drop-down list. The Default library is recommended, because it selects the library that should provide the best overall parallel performance for your dedicated parallel machine. If you prefer to select a specific library, you can choose either Vendor MPI or Shared Memory MPI (MPICH). Vendor MPI selects the message-passing library optimized by your hardware vendor. If the parallel toolkit supplied by your hardware vendor is installed on your machine, FLUENT will detect it automatically when the Default option is selected. Shared Memory MPI (MPICH) selects the MPICH message-passing library, a public-domain version of MPI.

3. Set the number of CPUs in the Processes field.

4. Click the Run button to start the parallel version. No additional setup is required once the solver starts.

If you prefer to start the parallel version from the command line, you can type

fluent version -tn [-pcomm] [-loadhost] [-pathpath]

where version is 2d, 3d, 2ddp, or 3ddp, and n is replaced by the number of CPUs to be used. The remaining arguments are optional, as indicated by the square brackets around them. (If you enter one or more of these optional arguments, do not include the square brackets.) comm is replaced by the name of the parallel communication library, host is replaced by the hostname of the machine to launch the compute nodes (by default, it is set to the machine you're using when entering this command), and path is replaced by the root path to the Fluent.Inc installation directory.

In general, you will need to specify -pcomm only if you want to override the default communication library (which should provide best overall parallel performance).

The available communicators for dedicated parallel lnx86 LINUX machines are listed in Tables 32.2.1 and 32.2.2, along with the associated communication libraries and corresponding syntax. FLUENT supplies the necessary components for the ssh, nmpi, smpi, and net communicators. As for the rest, you need to contact the vendor directly. See step 2, above, for a description of these libraries.

Running on a LINUX Workstation Cluster


To run FLUENT on a network of LINUX workstations, type the usual startup command without a version (i.e., fluent), and then use the Select Solver panel (Figure 32.2.1) to specify the parallel architecture and version information.

File Run...

1. Under Versions, specify the 3D or 2D single- or double-precision version by turning the 3D and Double Precision options on or off, and turn on the Parallel option.

2. Under Options, select the Socket message-passing library in the Communicator drop-down list.

When you start the parallel network version, you must select Socket or Network MPI (MPICH) in the Communicator drop-down list, unless the vendor MPI library (described earlier in this section) supports clustering. If you keep the Default option, one of the MPI parallel versions will start instead, and you will be unable to spawn additional compute nodes.

3. Set the number of initial compute node processes to spawn on the host machine in the Processes field. You can start with 1 or 0 nodes and spawn the rest later on, as described in Section 32.4.1: Configuring the Network.

4. (optional) Specify the name of a file containing a list of machines, one per line, in the Hosts File field. If the number of Processes is set to 0, FLUENT will spawn a compute node on each machine listed in the file.

5. Click the Run button to start the parallel network version.


If you prefer to start the parallel network version from the command line, you can type

fluent version -t1 -pnet

(to use the socket communicator) or

fluent version -t1 -pnmpi

(to use the network MPI communicator) to start the solver with 1 compute node on the host workstation. You can then spawn additional processes on remote workstations using the Network Configuration panel, as described in Section 32.4.1: Configuring the Network. You can type

fluent version -t0 -pnet [-cnf=hostsfile]

(to use the socket communicator) or

fluent version -t0 -pnmpi [-cnf=hostsfile]

(to use the network MPI communicator) to start a host process that controls compute nodes situated on remote machines. If the optional -cnf=hostsfile is specified, a compute node will be spawned on each machine listed in the file hostsfile. (If you enter this optional argument, do not include the square brackets.) Otherwise, you can spawn the processes as described in Section 32.4.1: Configuring the Network.

Running With Multiple Network Cards


For Linux machines (lnx86, lnia64, and lnamd64) that have multiple network cards and use either the net or the MPI communicators, you can choose a specific network card for your calculations. When nodes on a cluster have multiple network cards (fast Ethernet and gigabit Ethernet, for example), FLUENT allows you to choose a particular network card for the computation by specifying the appropriate name or IP address in the host file.
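For instance (the addresses and file layout below are hypothetical), if each node's gigabit interface has its own IP address, the host file can list those addresses, or names that resolve to them, instead of the default hostnames:

192.168.10.1
192.168.10.2
192.168.10.3

so that the parallel communication is routed over the gigabit network rather than the slower interface.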

32.2.3 Starting the Parallel Solver on a Windows System

You can run FLUENT on a Windows dedicated parallel machine or a network of Windows machines. The procedures for starting these versions are described in this section.

Running on a Multiprocessor Windows Machine


On a Windows system, you can start the dedicated parallel version of FLUENT from the MS-DOS Command Prompt window. To start the parallel version on x processors, type fluent version -tx at the prompt, replacing version with the solver version (2d, 3d, 2ddp, or 3ddp) and x with the number of processors (e.g., fluent 3d -t3 to run the 3D version on 3 processors). (See Section 1.1.3: Starting FLUENT on a Windows System for information about modifying your user environment if the fluent command is not recognized.)

Running on a Windows Cluster


There are several ways to run FLUENT in parallel on a network of Windows machines: using one of the communicators that is included with the FLUENT distribution, or using either a vendor-supplied or a public-domain message-passing interface. The available communicators for dedicated parallel ntx86 Windows machines, the associated communication libraries for them, and the corresponding syntax are listed below:

Table 32.2.3: Available communicators for Windows platform (per platform)

Platform   Processor   Architecture   Communicators
Windows    32 bit      ntx86          net, nmpi, smpi, vmpi

Table 32.2.4: Available communicators for Windows platform (per communicator)

Commu-    Syntax     Commun.               Supports   Vendor impl.   Used with   Used with
nicator              Library               spawning   available      DMM **      SMM *
                                           nodes      (costs)
net       -pnet      socket                yes        no             yes         yes
nmpi      -pnmpi     network MPI (MPICH)   no         no             yes         yes
smpi      -psmpi     shared MPI (MPICH)    no         no             no          yes
vmpi      -pvmpi     Vendor MPI            no         yes            yes         yes

* SMM is Shared Memory Machine, where the memory is shared between the processors on a single machine.
** DMM is Distributed Memory Machine, where each processor has its own memory associated with it.

nmpi is recommended to be used with DMM if vmpi is not available, and smpi is recommended to be used with SMM if vmpi is not available.

See the installation instructions for Windows parallel for details about obtaining and installing one of these programs. The startup instructions below assume that you have properly set up the necessary software, based on the appropriate installation instructions.


Starting the Socket-Based Parallel Version of FLUENT

If you are using the socket version for network communication, type the following in an MS-DOS Command Prompt window:

fluent version -tnprocs -pnet [-cnf=hostfile] -pathsharename

where version must be replaced by the version of FLUENT you want to run (2d, 3d, 2ddp, or 3ddp).

-pathsharename specifies the shared network name for the Fluent.Inc directory in UNC form. For example, if FLUENT has been installed on computer1, then you should replace sharename by the UNC name for the shared directory, \\computer1\fluent.inc.

-cnf=hostfile (optional) specifies the hostfile, which contains a list of the computers on which you want to run the parallel job. If the hostfile is not located in the directory where you are typing the startup command, you will need to supply the full pathname to the file. (If you include the -cnf option, do not include the square brackets; see the example below.)

You can use a plain text editor like Notepad to create the hostfile. The only restriction on the filename is that there should be no spaces in it. For example, hosts.txt is an acceptable hostfile name, but my hosts.txt is not.

Your hostfile (e.g., hosts.txt) might contain the following entries:

computer1
computer2

The first computer in the list must be the name of the local computer you are working on. The last entry must be followed by a blank line.

If a computer in the network is a multiprocessor, you can list it more than once. For example, if computer1 has 2 CPUs, then, to take advantage of both CPUs, the hosts.txt file should list computer1 twice:

computer1
computer1
computer2

If you do not include the -cnf option, FLUENT will start nprocs (see below) processes on the computer where you type the startup command. You can then use the Network Configuration panel in FLUENT to interactively spawn additional nodes on the cluster. See Section 32.4: Using a Parallel Network of Workstations for details.


-tnprocs specifies the number of processes to use. If the -cnf option is present, the hostfile argument is used to determine which computers to use for the parallel job. For example, if there are 10 computers listed in the hostfile and you want to run a job with 5 processes, set nprocs to 5 (i.e., -t5) and FLUENT will use the first 5 machines listed in the hostfile. You can use the Network Configuration panel to kill processes or spawn additional processes after startup. See Section 32.4: Using a Parallel Network of Workstations for details.

As an example, the full command line to start a 3D socket-based parallel job on the first 3 computers listed in a hostfile called hosts.txt is as follows:
fluent 3d -t3 -pnet -cnf=hosts.txt -path\\computer1\fluent.inc

Starting the MPI-Based Parallel Version of FLUENT

If you are using either vendor-supplied or public-domain MPI software for network communication, type the following in an MS-DOS Command Prompt window:

fluent version -tnprocs -pcomm -cnf=hostfile -pathsharename

where comm can be either nmpi or vmpi and the remaining options have the same meanings as for the socket-based startup described above, with the following differences:

The hostfile specification is required.

You can neither spawn nor kill nodes on the cluster using the Network Configuration panel when MPI software is used.

The first computer listed in the hostfile must be the name of the local computer you are working on.

As an example, the full command line to start a 3D vendor-MPI-based parallel job on the first 3 computers listed in a hostfile called hosts.txt is as follows:
fluent 3d -t3 -pvmpi -cnf=hosts.txt -path\\computer1\fluent.inc


32.3 Using the Fluent Launcher (Windows only)

The Fluent Launcher (Figure 32.3.1) is a stand-alone Windows application that allows you to launch FLUENT jobs from a computer with a Windows operating system to a cluster of computers. The Fluent Launcher takes the options that you specify in the main Fluent Launcher panel and the Fluent Setup panel (see Section 32.3.1: Fluent Launcher Path Setup and Section 32.3.2: Fluent Launcher Machine Setup), and uses those settings to create a FLUENT parallel command. This command is then distributed to your network, where typically another application may manage the session(s). You can create a shortcut on your desktop pointing to the Fluent Launcher executable at

FLUENT_INC\fluent6.x\launcher\bin\launcher.exe

where FLUENT_INC is the root path to where FLUENT is installed (i.e., usually the FLUENT_INC environment variable) and x indicates the release version of FLUENT.

Figure 32.3.1: The Fluent Launcher Panel

The Fluent Launcher allows you to perform the following:

1. Set options for your FLUENT executable, such as specifying an area, indicating a release type, or a version number.

2. Indicate either a serial or parallel execution, along with the number of parallel processes, and a communicator to use for parallel computations.


3. Set additional options such as specifying a working directory, a batch mode, or a journal file.

When you are ready to launch your serial or parallel application, click the Launch button.

For parallel applications, you are required to have the RSH daemon installed on each machine.

Using the Fluent Launcher From Another Machine


If you wish to use the Fluent Launcher from another machine, you can create a shortcut on that machine pointing to the original executable (at FLUENT_INC/fluent6.x/launcher/bin/launcher.exe, where FLUENT_INC is the root path to where FLUENT is installed, i.e., usually the FLUENT_INC environment variable, and x indicates the release version of FLUENT).

Do not copy or move the launcher.exe file from its original directory to any other directory; otherwise the Fluent Launcher application will not work.

Setting Executable Options With the Fluent Launcher


Under Executable Options, you can use the Fluent Launcher to indicate the version of the FLUENT executable that you want to run. You can also specify a release number, and the area from which you are running the code. Under Area, you can choose either release or prototype. The release option represents the final version of the current software (either a FLUENT release or a FLUENT maintenance release). The prototype option represents a FLUENT prototype or pre-release (beta) version of the software. Under Release, you can specify the number associated with a given release, maintenance release, or prototype application. Under Version, you can specify the dimensionality and the precision of the FLUENT product. There are four possible choices: 2d, 2ddp, 3d, or 3ddp. The 2d and 3d options provide single-precision results for two-dimensional or three-dimensional problems, respectively. The 2ddp and 3ddp options provide double-precision results for two-dimensional or three-dimensional problems, respectively.


Setting Parallel Options With the Fluent Launcher


Under Parallel Options, you can use the Fluent Launcher to indicate whether you want to run FLUENT in serial mode or in parallel mode. To run FLUENT in serial mode, make sure the Parallel option is turned off. To run FLUENT in parallel, make sure the Parallel option is turned on. When the Parallel option is turned on, you can indicate the number of parallel processes that you will be running, as well as the type of parallel communicator that you need to use. Use the Processes field to indicate the number of parallel processes. The number of parallel processes can range from 1 to 1024. If Processes is equal to 1, you might want to consider running the FLUENT job in serial mode. Use the Communicator field to indicate the type of parallel communicator that you require. There are several options, based on the operating system of the parallel cluster. See Tables 32.2.1, 32.2.2, and 32.2.3 for more information.

Setting Additional Options With the Fluent Launcher


Under Additional Options, you can use the Fluent Launcher to indicate a working directory, whether you want to run FLUENT in batch mode, whether to list executed commands, and whether or not to use a journal file. In the Directory field, enter the path of your current working directory, or click Browse... to browse through your directory structure. Select the Journal File option to instruct the Fluent Launcher application to use a journal file. Once it is selected, provide the path to the journal file and the name of the journal file. Using the journal file, you can automatically load the case, compile any user-defined functions, iterate until the solution converges, and write results to an output file. Select the Batch Mode option in order to run and quit out of FLUENT jobs without the graphical user interface (GUI).
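As a rough sketch of what such a journal file might contain (the file names and iteration count are placeholders, and the exact text commands can vary between FLUENT versions):

/file/read-case my_case.cas
/solve/iterate 500
/file/write-data my_case.dat
exit
yes

A journal along these lines reads a case, iterates, writes the data file, and exits, which pairs naturally with the Batch Mode option described above.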


32.3.1 Fluent Launcher Path Setup

The Fluent Launcher can be used to set up path information for your FLUENT jobs through the Paths tab in the Fluent Setup panel (Figure 32.3.2). To access the Fluent Setup panel, click the Setup... button in the Fluent Launcher.

Figure 32.3.2: The Paths Tab in the Fluent Setup Panel

The Paths tab in the Fluent Setup panel allows you to use a set of custom path configurations. New setup information is saved for this session and future sessions when you click the Apply button. When you are finished setting up your custom path configuration, click the Close button to dismiss the Fluent Setup panel.

Windows Setup
When you choose to use your own path configuration information, you can then indicate the Release path on the Windows platform. This field holds the path to the release area for Windows executables.

Make sure that the path is a UNC path (i.e., accessible to all nodes).

If you have turned on the Enable Prototype option, then you have the additional option of selecting the path to the prototype area for Windows executables using the Prototype field under Windows Paths.


UNIX Setup
When you choose to use your own path conguration information, you can then indicate the Release path on the UNIX platform. This eld holds the path to the release area for UNIX executables. If you have turned on the Enable Prototype option, then you have the additional option of selecting the path to the prototype area for UNIX executables using the Prototype eld under UNIX Paths.

i
32.3.2

Note that UNIX paths are not veried.

Fluent Launcher Machine Setup

The Fluent Launcher can be used to set up dierent machine congurations for your FLUENT jobs through the Machines tab in the Fluent Setup panel (Figure 32.3.3). To access the Fluent Setup panel, click the Setup... button in the Fluent Launcher. The Machines tab in the Fluent Setup panel allows you to use a dierent machine conguration. New setup conguration is saved for the current session and future sessions when you click the Apply button. Machines listed at the top of the list will be used rst.

Figure 32.3.3: The Machines Tab in the Fluent Setup Panel

Using the Machines tab in the Fluent Setup panel, you can create and edit a listing of machine names that you want involved in the parallel FLUENT job. You can add a machine name to the Current Machines list by entering a name in the Machine Name field and clicking the Add button.


You can remove a machine name from the Current Machines list by selecting the name in the list and clicking the Remove button. You can manipulate how the names are listed in the Current Machines list by selecting a name in the list and using the Up button to move the name one listing closer to the top of the list. Likewise, you can move a name one listing closer to the bottom of the list by selecting the name and clicking the Down button. When you are finished setting up your machine configuration, click the Close button to dismiss the Fluent Setup panel.

32.3.3 Fluent Launcher Example

The Fluent Launcher takes the options that you have specified in the main Fluent Launcher panel and the Fluent Setup panel, and uses those settings to create a FLUENT parallel command. This command is then distributed to your network, where typically another application may manage the session(s).

For example, suppose that in the main Fluent Launcher panel, under Executable Options, you selected release for the Area, 6.1.28 for the Release, and 3d for the Version. Then, under Parallel Options, you selected Parallel, chose 2 for the number of Processes, and selected net for the Communicator. Next, in the Fluent Setup panel (opened by clicking the Setup... button), you specified \\Server\Fluent.Inc for Release under Windows Path, and added my pc to the list of Current Machines in the Machines tab. Finally, you clicked the Apply button in the Fluent Setup panel and then clicked the Launch button. The Fluent Launcher would then generate the following parallel command:

FLUENT_INC\ntbin\ntx86\fluent -r6.1.28 3d -t2 -pnet -path\FLUENT_HOME -cnf="machines_file"

where FLUENT_INC indicates the directory where Fluent.Inc is located and machines_file indicates the location of the machine configuration file that the Fluent Launcher generates. This file contains the names of the machines (e.g., my pc) indicated in the Machines tab in the Fluent Setup panel.


32.4 Using a Parallel Network of Workstations

You can create a virtual parallel machine by spawning (and killing) compute node processes on workstations connected by a network. Multiple compute node processes are allowed to exist on the same workstation, even if the workstation contains only a single CPU.

32.4.1 Configuring the Network

If you want to spawn compute nodes on several different machines, or if you want to make any changes to the current network configuration (e.g., if you accidentally spawned too many compute nodes on the host machine when you started the solver), you can use the Network Configuration panel (Figure 32.4.1).

Parallel Network Configure...

Figure 32.4.1: The Network Configuration Panel

Note that not all communicators allow you to configure a network of spawned compute nodes if you do not start FLUENT using host files. Only -pnet allows you to manually spawn additional compute nodes before reading the case file. Using -pnmpi, for example, does not allow you to configure the network of spawned compute nodes.


Structure of the Network


Compute nodes are labeled sequentially starting at 0. In addition to the compute node processes, there is one host process. The host process is automatically started when FLUENT starts, and it is killed when FLUENT exits. It cannot be killed while running. Compute nodes, however, can be killed at any time, with the exception that compute node 0 can only be killed if it is the last remaining compute node process. The host process always spawns compute node 0. Compute node 0 spawns all other compute nodes.

Steps for Spawning Compute Nodes


The basic steps for spawning compute nodes are as follows:

1. Choose the host machine(s) on which to spawn compute nodes in the Available Hosts list. If the desired machine is not listed, you can use the Host Entry fields to manually add a host (as described below), or you can copy the desired host from the host database (as described in Section 32.4.2: The Hosts Database).

2. Set the number of compute node processes to spawn on each selected host machine in the Spawn Count field.

3. Click the Spawn button and the new node(s) will be spawned and added to the Spawned Compute Nodes list.

Additional functions related to network configuration are described below.

Adding Hosts Manually

To add a host to the Available Hosts list in the Network Configuration panel manually, you can enter the internet name of the remote machine in the Hostname field under Host Entry, enter your login name on that machine in the Username field (unless your accounts all have the same login name, in which case you need not specify a username), and then click the Add button. The specified host will be added to the Available Hosts list.


Deleting Hosts

To delete a host from the Available Hosts list in the Network Configuration panel, select the host and click the Delete button. The host name will be removed from the Available Hosts list (but the hosts database (see Section 32.4.2: The Hosts Database) will not be affected).

Killing Compute Nodes

If you spawn an undesired compute node, you can easily remove it by selecting it in the Spawned Compute Nodes list and clicking on the Kill button.

Remember that compute node 0 can only be killed if it is the last remaining compute node process.

Saving a Hosts File

If you have compiled a group of Available Hosts that you may want to use again in another session, you can save a hosts file containing all entries in the Available Hosts list. Click the Save... button and, in the resulting Select File dialog box, enter the name of the file and save it. In a future session, you can load the contents of this file into the hosts database (see Section 32.4.2: The Hosts Database) and then copy the hosts over to the Network Configuration panel in order to reproduce the current Available Hosts list.

Common Problems Encountered During Node Spawning


The spawning process will try to establish a connection with a new compute node, but if after 50 seconds it receives no response from the new compute node, it will assume the spawn was unsuccessful. The spawn will be unsuccessful, for example, if the remote machine is unable to find the FLUENT executable. To manually test if the spawning machine can start a new compute node, you can type

rsh [-l username] hostname fluent -t0 -v

from a shell prompt on the spawning machine. hostname should be replaced with the internet name of the machine on which you want to spawn a compute node, and username should be replaced with your login name on the remote machine specified by hostname.


If all your accounts have the same login name, you do not need to specify a username. (The square brackets around -l username indicate that it is not always required; if you do enter a login name, do not include the square brackets.) Note that on some systems, the remote shell command is remsh instead of rsh.

The spawn test could fail for several reasons:

Login incorrect. The machine spawning a new compute node must be able to rsh to the machine where the new process will reside, or the spawn will fail. There are several ways to enable this capability. Consult your systems administrator for assistance.

fluent: Command not found. The rsh to the remote machine succeeded, but the path to the FLUENT shell script could not be found on that machine. If you are using csh, then the path to the FLUENT shell script should be added to the path variable in your .cshrc file. If that also fails, you can use the parallel/network/path text command to set the path to the Fluent.Inc installation directory directly before spawning the compute node.

parallel network path
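As a minimal sketch of the .cshrc fix (the installation directory shown is hypothetical; substitute your actual Fluent.Inc location):

# in ~/.cshrc on the remote machine
set path = ($path /usr/local/Fluent.Inc/bin)

After the updated .cshrc takes effect (e.g., on the next login), the rsh test above should be able to find the fluent script.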


32.4.2 The Hosts Database

When you are creating a parallel network of workstations, it is convenient to start with a list of machines that are part of your local network (a hosts file). You can load a file containing these names into the hosts database and then select the hosts that are available for creating a parallel configuration (or network) on a cluster of workstations using the Hosts Database panel (Figure 32.4.2).

Parallel Network Database...

Figure 32.4.2: The Hosts Database Panel

(You can also open this panel by clicking on the Database... button in the Network Configuration panel.) If the hosts file fluent.hosts or .fluent.hosts exists in your home directory, its contents are automatically added to the hosts database at startup. Otherwise, the hosts database will be empty until you read in a host file.


Reading Hosts Files


If you have a hosts file containing a list of machines on your local network, you can load this file into the Hosts Database panel by clicking on the Load... button and specifying the file name in the resulting Select File dialog box. Once the contents of the file have been read, the host names will appear in the Hosts list. (FLUENT will automatically add the IP (Internet Protocol) address for each recognized machine. If a machine is not currently on the local network, it will be labeled unknown.)

Copying Hosts to the Network Configuration Panel

If you want to copy one or more of the Hosts in the Hosts Database panel to the Available Hosts list in the Network Configuration panel, select the desired name(s) in the Hosts list and click the Copy button. The selected hosts will be added to the list of Available Hosts on which you can spawn nodes.

32.4.3 Checking Network Connectivity


For any compute node, you can print network connectivity information that includes the hostname, architecture, process ID, and ID of the selected compute node and all machines connected to it. The ID of the selected compute node is marked with an asterisk. The ID for the FLUENT host process is always host. The compute nodes are numbered sequentially starting from node-0. All compute nodes are completely connected. In addition, compute node 0 is connected to the host process. To obtain connectivity information for a compute node, you can use the Parallel Connectivity panel (Figure 32.4.3). Parallel Show Connectivity...

Figure 32.4.3: The Parallel Connectivity Panel

Indicate the compute node ID for which connectivity information is desired in the Compute Node field, and then click the Print button. Sample output for compute node 0 is shown below:


------------------------------------------------------------------------------
ID     Comm.   Hostname   O.S.       PID     Mach ID   HW ID   Name
------------------------------------------------------------------------------
host   net     balin      Linux-32   17272   0         7       Fluent Host
n3     smpi    balin      Linux-32   17307   1         10      Fluent Node
n2     smpi    filio      Linux-32   17306   0         -1      Fluent Node
n1     smpi    bofur      Linux-32   17305   0         1       Fluent Node
n0*    smpi    balin      Linux-32   17273   2         11      Fluent Node
------------------------------------------------------------------------------

O.S. is the architecture, Comm. is the communicator, PID is the process ID number, Mach ID is the compute node ID, and HW ID is an identifier specific to the communicator used. You can also check connectivity of a compute node in the Network Configuration panel by selecting it in the Spawned Compute Nodes list and clicking on the Connectivity button. If you click the Connectivity button without selecting any of the Spawned Compute Nodes, the Parallel Connectivity panel will open, and you can specify the node there, as described above. If you select more than one of the Spawned Compute Nodes, clicking on the Connectivity button will print connectivity information for each selected node.

32.5 Partitioning the Grid

Information about grid partitioning is provided in the following sections:

Section 32.5.1: Overview of Grid Partitioning
Section 32.5.2: Partitioning the Grid Automatically
Section 32.5.3: Partitioning the Grid Manually
Section 32.5.4: Grid Partitioning Methods
Section 32.5.5: Checking the Partitions
Section 32.5.6: Load Distribution

32.5.1 Overview of Grid Partitioning

When you use the parallel solver in FLUENT, you need to partition or subdivide the grid into groups of cells that can be solved on separate processors (see Figure 32.5.1). You can either use the automatic partitioning algorithms when reading an unpartitioned grid into the parallel solver (the recommended approach, described in Section 32.5.2: Partitioning the Grid Automatically), or perform the partitioning yourself in the serial solver or after reading a mesh into the parallel solver (as described in Section 32.5.3: Partitioning the Grid Manually). In either case, the available partitioning methods are those described in Section 32.5.4: Grid Partitioning Methods. You can partition the grid before or after you set up the problem (by defining models, boundary conditions, etc.), although it is better to partition after the setup, due to some model dependencies (e.g., adaption on non-conformal interfaces, sliding-mesh and shell-conduction encapsulation).

If your case file contains sliding meshes, or non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver. See Sections 32.5.2 and 32.5.3 for more information.

Note that the relative distribution of cells among compute nodes will be maintained during grid adaption, except if non-conformal interfaces are present, so repartitioning after adaption is not required. See Section 32.5.6: Load Distribution for more information. If you use the serial solver to set up the problem before partitioning, the machine on which you perform this task must have enough memory to read in the grid. If your grid is too large to be read into the serial solver, you can read the unpartitioned grid directly into the parallel solver (using the memory available in all the defined hosts) and have it automatically partitioned. In this case you will set up the problem after an initial partition has been made. You will then be able to manually repartition the case if necessary. See Sections 32.5.2 and 32.5.3 for additional details and limitations, and Section 32.5.5: Checking the Partitions for details about checking the partitions.

Figure 32.5.1: Partitioning the Grid (the domain before partitioning, and after partitioning into Partition 0 and Partition 1 separated by an interface boundary)


32.5.2 Partitioning the Grid Automatically

For automatic grid partitioning, you can select the bisection method and other options for creating the grid partitions before reading a case file into the parallel version of the solver. For some of the methods, you can perform pretesting to ensure that the best possible partition is performed. See Section 32.5.4: Grid Partitioning Methods for information about the partitioning methods available in FLUENT. Note that if your case file contains sliding meshes, or non-conformal interfaces on which you plan to perform adaption during the calculation, you will need to partition it in the serial solver, and then read it into the parallel solver, with the Case File option turned on in the Auto Partition Grid panel (the default setting). The procedure for partitioning automatically in the parallel solver is as follows:

1. (optional) Set the partitioning parameters in the Auto Partition Grid panel (Figure 32.5.2).

Parallel Auto Partition...

Figure 32.5.2: The Auto Partition Grid Panel

If you are reading in a mesh file or a case file for which no partition information is available, and you keep the Case File option turned on, FLUENT will partition the grid using the method displayed in the Method drop-down list. If you want to specify the partitioning method and associated options yourself, the procedure is as follows:

(a) Turn off the Case File option. The other options in the panel will become available.

(b) Select the bisection method in the Method drop-down list. The choices are the techniques described in Section 32.5.4: Bisection Methods.

(c) You can choose to independently apply partitioning to each cell zone, or you can allow partitions to cross zone boundaries using the Across Zones check button. It is recommended that you not partition cell zones independently (by turning off the Across Zones check button) unless cells in different zones will require significantly different amounts of computation during the solution phase (e.g., if the domain contains both solid and fluid zones).

(d) If you have chosen the Principal Axes or Cartesian Axes method, you can improve the partitioning by enabling the automatic testing of the different bisection directions before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option. Pretesting is described in Section 32.5.4: Pretesting.

(e) Click OK.

If you have a case file where you have already partitioned the grid, and the number of partitions divides evenly into the number of compute nodes, you can keep the default selection of Case File in the Auto Partition Grid panel. This instructs FLUENT to use the partitions in the case file.

2. Read the case file.

File Read Case...

Reporting During Auto Partitioning


As the grid is automatically partitioned, some information about the partitioning process will be printed in the text (console) window. If you want additional information, you can print a report from the Partition Grid panel after the partitioning is completed.

Parallel Partition...

When you click the Print Active Partitions or Print Stored Partitions button in the Partition Grid panel, FLUENT will print the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each active or stored partition in the console window. In addition, it will print the minimum and maximum cell, face, interface, and face-ratio variations. See Section 32.5.5: Interpreting Partition Statistics for details. You can examine the partitions graphically by following the directions in Section 32.5.5: Checking the Partitions.

32.5.3 Partitioning the Grid Manually

Automatic partitioning in the parallel solver (described in Section 32.5.2: Partitioning the Grid Automatically) is the recommended approach to grid partitioning, but it is also possible to partition the grid manually in either the serial solver or the parallel solver. After automatic or manual partitioning, you will be able to inspect the partitions created (see Section 32.5.5: Checking the Partitions) and optionally repartition the grid, if necessary. Again, you can do so within the serial or the parallel solver, using the Partition Grid panel. A partitioned grid may also be used in the serial solver without any loss in performance.


Guidelines for Partitioning the Grid


The following steps are recommended for partitioning a grid manually:

1. Partition the grid using the default bisection method (Principal Axes) and optimization (Smooth).

2. Examine the partition statistics, which are described in Section 32.5.5: Interpreting Partition Statistics. Your aim is to achieve small values of Interface ratio variation and Global interface ratio while maintaining a balanced load (Cell variation). If the statistics are not acceptable, try one of the other bisection methods.

3. Once you determine the best bisection method for your problem, you can turn on Pre-Test (see Section 32.5.4: Pretesting) to improve it further, if desired.

4. You can also improve the partitioning using the Merge optimization, if desired.

Instructions for manual partitioning are provided below.

Using the Partition Grid Panel


For grid partitioning, you need to select the bisection method for creating the grid partitions, set the number of partitions, select the zones and/or registers, and choose the optimizations to be used. For some methods, you can also perform pretesting to ensure that the best possible bisection is performed. Once you have set all the parameters in the Partition Grid panel to your satisfaction, click the Partition button to subdivide the grid into the selected number of partitions using the prescribed method and optimization(s). See above for recommended partitioning strategies. You can set the relevant inputs in the Partition Grid panel (Figure 32.5.3 in the parallel solver, or Figure 32.5.4 in the serial solver) in the following manner:

Parallel Partition...

1. Select the bisection method in the Method drop-down list. The choices are the techniques described in Section 32.5.4: Bisection Methods.

2. Set the desired number of grid partitions in the Number integer number field. You can use the counter arrows to increase or decrease the value, instead of typing in the box. The number of grid partitions must be an integral multiple of the number of processors available for parallel computing.


Figure 32.5.3: The Partition Grid Panel in the Parallel Solver

Figure 32.5.4: The Partition Grid Panel in the Serial Solver


3. You can choose to independently apply partitioning to each cell zone, or you can allow partitions to cross zone boundaries using the Across Zones check button. It is recommended that you not partition cell zones independently (by turning off the Across Zones check button) unless cells in different zones will require significantly different amounts of computation during the solution phase (e.g., if the domain contains both solid and fluid zones).

4. You can select Encapsulate Grid Interfaces if you would like the cells surrounding all non-conformal grid interfaces in your mesh to reside in a single partition at all times during the calculation. If your case file contains non-conformal interfaces on which you plan to perform adaption during the calculation, you will have to partition it in the serial solver, with the Encapsulate Grid Interfaces and Encapsulate for Adaption options turned on.

5. If you have enabled the Encapsulate Grid Interfaces option in the serial solver, the Encapsulate for Adaption option will also be available. When you select this option, additional layers of cells are encapsulated such that transfer of cells will be unnecessary during parallel adaption.

6. You can activate and control the desired optimization methods (described in Section 32.5.4: Optimizations) using the items under Optimizations. You can activate the Merge and Smooth schemes by turning on the Do check button next to each one. For each scheme, you can also set the number of Iterations. Each optimization scheme will be applied until appropriate criteria are met, or the maximum number of iterations has been executed. If the Iterations counter is set to 0, the optimization scheme will be applied until completion, without limit on the maximum number of iterations.

7. If you have chosen the Principal Axes or Cartesian Axes method, you can improve the partitioning by enabling the automatic testing of the different bisection directions before the actual partitioning occurs. To use pretesting, turn on the Pre-Test option. Pretesting is described in Section 32.5.4: Pretesting.

8. In the Zones and/or Registers lists, select the zone(s) and/or register(s) for which you want to partition. For most cases, you will select all Zones (the default) to partition the entire domain. See below for details.

9. Click the Partition button to partition the grid.

10. If you decide that the new partitions are better than the previous ones (if the grid was already partitioned), click the Use Stored Partitions button to make the newly stored cell partitions the active cell partitions. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file.


11. When using the dynamic mesh model in your parallel simulations, the Partition panel includes an Auto Repartition option and a Repartition Interval setting. These parallel partitioning options are provided because FLUENT migrates cells when local remeshing and smoothing are performed, so the partition interface can become very wrinkled and the load balance may deteriorate. By default, the Auto Repartition option is selected, where a percentage of interface faces and loads are automatically traced. When this option is selected, FLUENT automatically determines the most appropriate repartition interval based on various simulation parameters. Sometimes the Auto Repartition option provides insufficient results; in such cases, the Repartition Interval setting can be used. The Repartition Interval setting lets you specify the interval (in time steps or iterations, respectively) at which a repartition is enforced. When repartitioning is not desired, you can set the Repartition Interval to zero.

Note that when dynamic meshes and local remeshing are utilized, updated meshes may be slightly different in parallel FLUENT (when compared to serial FLUENT, or when compared to a parallel solution created with a different number of compute nodes), resulting in very small differences in the solutions.

Partitioning Within Zones or Registers

The ability to restrict partitioning to cell zones or registers gives you the flexibility to apply different partitioning strategies to subregions of a domain. For example, if your geometry consists of a cylindrical plenum connected to a rectangular duct, you may want to partition the plenum using the Cylindrical Axes method, and the duct using the Cartesian Axes method.

If the plenum and the duct are contained in two different cell zones, you can select one at a time and perform the desired partitioning, as described in Section 32.5.3: Using the Partition Grid Panel. If they are not in two different cell zones, you can create a cell register (basically a list of cells) for each region using the functions that are used to mark cells for adaption. These functions allow you to mark cells based on physical location, cell volume, gradient or isovalue of a particular variable, and other parameters. See Chapter 27: Grid Adaption for information about marking cells for adaption. Section 27.11.1: Manipulating Adaption Registers provides information about manipulating different registers to create new ones. Once you have created a register, you can partition within it as described above.

Note that partitioning within zones or registers is not available when Metis is selected as the partition Method.

For dynamic mesh applications (see item 11 above), FLUENT stores the partition method used to partition the respective zone. Therefore, if repartitioning is done, FLUENT uses the same method that was used to partition the mesh.


Reporting During Partitioning

As the grid is partitioned, information about the partitioning process will be printed in the text (console) window. By default, the solver will print the number of partitions created, the number of bisections performed, the time required for the partitioning, and the minimum and maximum cell, face, interface, and face-ratio variations. (See Section 32.5.5: Interpreting Partition Statistics for details.) If you increase the Verbosity to 2 from the default value of 1, the partition method used, the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each partition will also be printed in the console window. If you decrease the Verbosity to 0, only the number of partitions created and the time required for the partitioning will be reported.

You can request a portion of this report to be printed again after the partitioning is completed. When you click the Print Active Partitions or Print Stored Partitions button in the parallel solver, FLUENT will print the partition ID, number of cells, faces, and interfaces, and the ratio of interfaces to faces for each active or stored partition in the console window. In addition, it will print the minimum and maximum cell, face, interface, and face-ratio variations. In the serial solver, you will obtain the same information about the stored partition when you click Print Partitions. See Section 32.5.5: Interpreting Partition Statistics for details.

Recall that to make the stored cell partitions the active cell partitions you must click the Use Stored Partitions button. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file.

Resetting the Partition Parameters

If you change your mind about your partition parameter settings, you can easily return to the default settings assigned by FLUENT by clicking on the Default button. When you click the Default button, it will become the Reset button. The Reset button allows you to return to the most recently saved settings (i.e., the values that were set before you clicked on Default). After execution, the Reset button will become the Default button again.

32.5.4 Grid Partitioning Methods

Partitioning the grid for parallel processing has three major goals:

Create partitions with equal numbers of cells.
Minimize the number of partition interfaces, i.e., decrease partition boundary surface area.
Minimize the number of partition neighbors.


Balancing the partitions (equalizing the number of cells) ensures that each processor has an equal load and that the partitions will be ready to communicate at about the same time. Since communication between partitions can be a relatively time-consuming process, minimizing the number of interfaces can reduce the time associated with this data interchange. Minimizing the number of partition neighbors reduces the chances for network and routing contentions. In addition, minimizing partition neighbors is important on machines where the cost of initiating message passing is expensive compared to the cost of sending longer messages. This is especially true for workstations connected in a network.

The partitioning schemes in FLUENT use bisection algorithms to create the partitions, but unlike other schemes which require the number of partitions to be a power of two, these schemes have no limitations on the number of partitions. For each available processor, you will create the same number of partitions (i.e., the total number of partitions will be an integral multiple of the number of processors).

Bisection Methods
The grid is partitioned using a bisection algorithm. The selected algorithm is applied to the parent domain, and then recursively applied to the child subdomains. For example, to divide the grid into four partitions, the solver will bisect the entire (parent) domain into two child domains, and then repeat the bisection for each of the child domains, yielding four partitions in total. To divide the grid into three partitions, the solver will bisect the parent domain to create two partitions, one approximately twice as large as the other, and then bisect the larger child domain again to create three partitions in total.

The grid can be partitioned using one of the algorithms listed below. The most efficient choice is problem-dependent, so you can try different methods until you find the one that is best for your problem. See Section 32.5.3: Guidelines for Partitioning the Grid for recommended partitioning strategies.

Cartesian Axes bisects the domain based on the Cartesian coordinates of the cells (see Figure 32.5.5). It bisects the parent domain and all subsequent child subdomains perpendicular to the coordinate direction with the longest extent of the active domain. It is often referred to as coordinate bisection.

Cartesian Strip uses coordinate bisection but restricts all bisections to the Cartesian direction of longest extent of the parent domain (see Figure 32.5.6). You can often minimize the number of partition neighbors using this approach.

Cartesian X-, Y-, Z-Coordinate bisects the domain based on the selected Cartesian coordinate. It bisects the parent domain and all subsequent child subdomains perpendicular to the specified coordinate direction. (See Figure 32.5.6.)


Cartesian R Axes bisects the domain based on the shortest radial distance from the cell centers to that Cartesian axis (x, y, or z) which produces the smallest interface size. This method is available only in 3D.

Cartesian RX-, RY-, RZ-Coordinate bisects the domain based on the shortest radial distance from the cell centers to the selected Cartesian axis (x, y, or z). These methods are available only in 3D.

Cylindrical Axes bisects the domain based on the cylindrical coordinates of the cells. This method is available only in 3D.

Cylindrical R-, Theta-, Z-Coordinate bisects the domain based on the selected cylindrical coordinate. These methods are available only in 3D.

Metis uses the METIS software package for partitioning irregular graphs, developed by Karypis and Kumar at the University of Minnesota and the Army HPC Research Center. It uses a multilevel approach in which the vertices and edges on the fine graph are coalesced to form a coarse graph. The coarse graph is partitioned, and then uncoarsened back to the original graph. During coarsening and uncoarsening, algorithms are applied to permit high-quality partitions. Detailed information about METIS can be found in its manual [161].

Note that when using the socket version (-pnet), the METIS partitioner is not available. In this case, METIS partitioning can be obtained using the partition filter, as described below.

Polar Axes bisects the domain based on the polar coordinates of the cells (see Figure 32.5.9). This method is available only in 2D.

Polar R-Coordinate, Polar Theta-Coordinate bisects the domain based on the selected polar coordinate (see Figure 32.5.9). These methods are available only in 2D.

Principal Axes bisects the domain based on a coordinate frame aligned with the principal axes of the domain (see Figure 32.5.7). This reduces to Cartesian bisection when the principal axes are aligned with the Cartesian axes. The algorithm is also referred to as moment, inertial, or moment-of-inertia partitioning. This is the default bisection method in FLUENT.

Principal Strip uses moment bisection but restricts all bisections to the principal axis of longest extent of the parent domain (see Figure 32.5.8). You can often minimize the number of partition neighbors using this approach.

Principal X-, Y-, Z-Coordinate bisects the domain based on the selected principal coordinate (see Figure 32.5.8).


Spherical Axes bisects the domain based on the spherical coordinates of the cells. This method is available only in 3D.

Spherical Rho-, Theta-, Phi-Coordinate bisects the domain based on the selected spherical coordinate. These methods are available only in 3D.

Figure 32.5.5: Partitions Created with the Cartesian Axes Method

Optimizations
Additional optimizations can be applied to improve the quality of the grid partitions. The heuristic of bisecting perpendicular to the direction of longest domain extent is not always the best choice for creating the smallest interface boundary. A pre-testing operation (see Section 32.5.4: Pretesting) can be applied to automatically choose the best direction before partitioning. In addition, the following iterative optimization schemes exist:

Smooth attempts to minimize the number of partition interfaces by swapping cells between partitions. The scheme traverses the partition boundary and gives cells to the neighboring partition if the interface boundary surface area is decreased. (See Figure 32.5.10.)

Merge attempts to eliminate orphan clusters from each partition. An orphan cluster is a group of cells with the common feature that each cell within the group has at least one face which coincides with an interface boundary. (See Figure 32.5.11.) Orphan clusters can degrade multigrid performance and lead to large communication costs.



Figure 32.5.6: Partitions Created with the Cartesian Strip or Cartesian X-Coordinate Method


Figure 32.5.7: Partitions Created with the Principal Axes Method



Figure 32.5.8: Partitions Created with the Principal Strip or Principal X-Coordinate Method


Figure 32.5.9: Partitions Created with the Polar Axes or Polar Theta-Coordinate Method


Figure 32.5.10: The Smooth Optimization Scheme

Figure 32.5.11: The Merge Optimization Scheme


In general, the Smooth and Merge schemes are relatively inexpensive optimization tools.

Pretesting
If you choose the Principal Axes or Cartesian Axes method, you can improve the bisection by testing different directions before performing the actual bisection. If you choose not to use pretesting (the default), FLUENT will perform the bisection perpendicular to the direction of longest domain extent.

If pretesting is enabled, it will occur automatically when you click the Partition button in the Partition Grid panel, or when you read in the grid if you are using automatic partitioning. The bisection algorithm will test all coordinate directions and choose the one which yields the fewest partition interfaces for the final bisection.

Note that using pretesting will increase the time required for partitioning. For 2D problems partitioning will take 3 times as long as without pretesting, and for 3D problems it will take 4 times as long.

Using the Partition Filter


As noted above, you can use the METIS partitioning method through a filter in addition to within the Auto Partition Grid and Partition Grid panels. To perform METIS partitioning on an unpartitioned grid, use the File/Import/Partition/Metis... menu item.

File Import Partition Metis...

FLUENT will use the METIS partitioner to partition the grid, and then read the partitioned grid into the solver. The number of partitions will be equal to the number of processes. You can then proceed with the model definition and solution.

Direct import to the parallel solver through the partition filter requires that the host machine has enough memory to run the filter for the specified grid. If not, you will need to run the filter on a machine that does have enough memory. You can either start the parallel solver on the machine with enough memory and repeat the process described above, or run the filter manually on the new machine and then read the partitioned grid into the parallel solver on the host machine.

To manually partition a grid using the partition filter, enter the following command:

utility partition input-filename partition-count output-filename

where input-filename is the filename for the grid to be partitioned, partition-count is the number of partitions desired, and output-filename is the filename for the partitioned grid. You can then read the partitioned grid into the solver (using the standard File/Read/Case... menu item) and proceed with the model definition and solution.
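For example (the file names here are purely illustrative), to split an unpartitioned grid into 8 partitions from a system prompt, you might enter:

utility partition duct.cas 8 duct-8part.cas

The resulting duct-8part.cas file can then be read into the parallel solver with the File/Read/Case... menu item, provided the number of compute nodes divides evenly into 8.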


When the File/Import/Partition/Metis... menu item is used to import an unpartitioned grid into the parallel solver, the METIS partitioner partitions the entire grid. You may also partition each cell zone individually, using the File/Import/Partition/Metis Zone... menu item.

File Import Partition Metis Zone...

This method can be useful for balancing the work load. For example, if a case has a fluid zone and a solid zone, the computation in the fluid zone is more expensive than in the solid zone, so partitioning each zone individually will result in a more balanced work load.

32.5.5 Checking the Partitions

After partitioning a grid, you should check the partition information and examine the partitions graphically.

Interpreting Partition Statistics


You can request a report to be printed after partitioning (either automatic or manual) is completed. In the parallel solver, click the Print Active Partitions or Print Stored Partitions button in the Partition Grid panel. In the serial solver, click the Print Partitions button.

FLUENT distinguishes between two cell partition schemes within a parallel problem: the active cell partition and the stored cell partition. Initially, both are set to the cell partition that was established upon reading the case file. If you re-partition the grid using the Partition Grid panel, the new partition will be referred to as the stored cell partition. To make it the active cell partition, you need to click the Use Stored Partitions button in the Partition Grid panel. The active cell partition is used for the current calculation, while the stored cell partition (the last partition performed) is used when you save a case file. This distinction is made mainly to allow you to partition a case on one machine or network of machines and solve it on a different one. Thanks to the two separate partitioning schemes, you could use the parallel solver with a certain number of compute nodes to subdivide a grid into an arbitrary different number of partitions, suitable for a different parallel machine, save the case file, and then load it into the designated machine. When you click Print Partitions in the serial solver, you will obtain information about the stored partition.

The output generated by the partitioning process includes information about the recursive subdivision and iterative optimization processes. This is followed by information about the final partitioned grid, including the partition ID, number of cells, number of faces, number of interface faces, ratio of interface faces to faces for each partition, number of neighboring partitions, and cell, face, interface, neighbor, mean cell, face-ratio, and global face-ratio variations. The variations give the minimum and maximum values of the respective quantities over the present partitions.


For example, in the sample output below, partitions 0 and 3 have the minimum number of interface faces (10), and partitions 1 and 2 have the maximum number of interface faces (19); hence the variation is 10-19. Your aim is to achieve small values of Interface ratio variation and Global interface ratio while maintaining a balanced load (Cell variation).
>> Partitions:
P     Cells I-Cells Cell Ratio   Faces I-Faces Face Ratio Neighbors
0       134      10      0.075     217      10      0.046         1
1       137      19      0.139     222      19      0.086         2
2       134      19      0.142     218      19      0.087         2
3       137      10      0.073     223      10      0.045         1
----------------------------------------------------------------------
Partition count             = 4
Cell variation              = (134 - 137)
Mean cell variation         = ( -1.1% - 1.1%)
Intercell variation         = (10 - 19)
Intercell ratio variation   = ( 7.3% - 14.2%)
Global intercell ratio      = 10.7%
Face variation              = (217 - 223)
Interface variation         = (10 - 19)
Interface ratio variation   = ( 4.5% - 8.7%)
Global interface ratio      = 3.4%
Neighbor variation          = (1 - 2)

Computing connected regions; type ^C to interrupt.
Connected region count = 4
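As a quick consistency check on this report, the Cell Ratio and Face Ratio columns appear to be the ratios of interface cells to cells and interface faces to faces for each partition; for partition 0, for example, 10/134 is approximately 0.075 and 10/217 is approximately 0.046, matching the values printed above.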

Note that partition IDs correspond directly to compute node IDs when a case file is read into the parallel solver. When the number of partitions in a case file is larger than the number of compute nodes, but is evenly divisible by the number of compute nodes, then the distribution is such that partitions with IDs 0 to (M - 1) are mapped onto compute node 0, partitions with IDs M to (2M - 1) onto compute node 1, etc., where M is equal to the ratio of the number of partitions to the number of compute nodes.
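For example, if a case file containing 8 partitions is read into a parallel session with 2 compute nodes, then M = 8/2 = 4, so partitions 0-3 are assigned to compute node 0 and partitions 4-7 to compute node 1.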


Examining Partitions Graphically


To further aid interpretation of the partition information, you can draw contours of the grid partitions, as illustrated in Figures 32.5.5-32.5.9.

Display Contours...

To display the active cell partition or the stored cell partition (which are described above), select Active Cell Partition or Stored Cell Partition in the Cell Info... category of the Contours Of drop-down list, and turn off the display of Node Values. (See Section 29.1.2: Displaying Contours and Profiles for information about displaying contours.)

Note that if you have not already done so in the setup of your problem, you will need to perform a solution initialization in order to use the Contours panel.

32.5.6 Load Distribution

If the speeds of the processors that will be used for a parallel calculation differ significantly, you can specify a load distribution for partitioning, using the load-distribution text command.

parallel partition set load-distribution

For example, if you will be solving on three compute nodes, and one machine is twice as fast as the other two, then you may want to assign twice as many cells to the first machine as to the others (i.e., a load vector of (2 1 1)). During subsequent grid partitioning, partition 0 will end up with twice as many cells as partitions 1 and 2. Note that for this example, you would then need to start up FLUENT such that compute node 0 is the fast machine, since partition 0, with twice as many cells as the others, will be mapped onto compute node 0. Alternatively, in this situation, you could enable the load balancing feature (described in Section 32.6.3: Load Balancing) to have FLUENT automatically attempt to discern any difference in load among the compute nodes.
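A minimal console session for this example might look like the following sketch; the exact prompt wording varies between releases, so treat it as illustrative only:

> parallel partition set load-distribution
load distribution> (2 1 1)

After entering the load vector, repartition the grid so that the new distribution takes effect during subsequent partitioning.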

If you adapt a grid that contains non-conformal interfaces, and you want to rebalance the load on the compute nodes, you will have to save your case and data files after adaption, read the case and data files into the serial solver, repartition using the Encapsulate Grid Interfaces and Encapsulate for Adaption options in the Partition Grid panel, and save the case and data files again. You will then be able to read the manually repartitioned case and data files into the parallel solver, and continue the solution from where you left it.


32.6 Checking and Improving Parallel Performance

To determine how well the parallel solver is working, you can measure computation and communication times, and the overall parallel efficiency, using the performance meter. You can also control the amount of communication between compute nodes in order to optimize the parallel solver, and take advantage of the automatic load balancing feature of FLUENT.

32.6.1 Checking Parallel Performance

The performance meter allows you to report the wall clock time elapsed during a computation, as well as message-passing statistics. Since the performance meter is always activated, you can access the statistics by printing them after the computation is completed. To view the current statistics, use the Parallel/Timer/Usage menu item.

Parallel Timer Usage

Performance statistics will be printed in the text window (console). To clear the performance meter so that you can eliminate past statistics from the future report, use the Parallel/Timer/Reset menu item.

Parallel Timer Reset

32.6.2 Improving Input/Output Speed

By default, FLUENT reads in and automatically distributes the complete domain over the entire network of compute nodes, increasing the speed of your parallel processes. If the host machine has sufficient memory, you can slightly improve the parallel performance using the text command interface (TUI).

parallel set fast-io?

The fast-io? command allows you to still maintain the same benefits of speed. However, the complete domain is read on the host machine first and then distributed, thus requiring the host machine to have sufficient memory.
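A sketch of enabling this option from the console follows; the prompt text and the default shown are illustrative only:

> parallel set fast-io?
fast-io? [no] yes

Remember that this setting only helps if the host machine has enough memory to hold the complete domain, as noted above.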

32.6.3 Optimizing the Parallel Solver

Increasing the Report Interval


In FLUENT, you can reduce communication and improve parallel performance by increasing the report interval for residual printing/plotting or other solution monitoring reports. You can modify the value for Reporting Interval in the Iterate panel.

Solve Iterate...


Note that you will be unable to interrupt iterations until the end of each report interval.

Load Balancing
A dynamic load balancing capability is available in FLUENT. The principal reason for using parallel processing is to reduce the turnaround time of your simulation, ideally by a factor proportional to the collective speed of the computing resources used. If, for example, you were using four CPUs to solve your problem, then you would expect to reduce the turnaround time by a factor of four. This is of course the ideal situation, and assumes that there is very little communication needed among the CPUs, that the CPUs are all of equal speed, and that the CPUs are dedicated to your job. In practice, this is often not the case. For example, CPU speeds can vary if you are solving in parallel on a heterogeneous collection of workstations, other jobs may be competing for use of one or more of the CPUs, and network traffic either from within the parallel solver or generated from external sources may delay some of the necessary communication among the CPUs.

If you enable dynamic load balancing in FLUENT, the load across the computational and networking resources will be monitored periodically. If the load balancer determines that performance can be improved by redistributing the cells among the compute nodes, it will automatically do so. There is a time penalty associated with load balancing itself, and so it is disabled by default. If you will be using a dedicated homogeneous resource, or if you are using a heterogeneous resource but have accounted for differences in CPU speeds during partitioning by specifying a load distribution (see Section 32.5.6: Load Distribution), then you may not need to use load balancing.

Note that when the shell conduction model is used, you will not be able to turn on load balancing.

To enable and control FLUENT's automatic load balancing feature, use the Load Balance panel (Figure 32.6.1). Load balancing will automatically detect and analyze parallel performance, and redistribute cells between the existing compute nodes to optimize it.

Parallel Load Balance...

The procedure for using load balancing is as follows:

1. Turn on the Load Balancing option.

2. Select the bisection method to create new grid partitions in the Partition Method drop-down list. The choices are the techniques described in Section 32.5.4: Bisection Methods. As part of the automatic load balancing procedure, the grid will be repartitioned into several small partitions using the specified method. The resulting partitions will then be distributed among the compute nodes to achieve a more balanced load.


Figure 32.6.1: The Load Balance Panel

3. Specify the desired Balance Interval. When a value of 0 is specified, FLUENT will internally determine the best value to use, initially using an interval of 25 iterations. You can override this behavior by specifying a non-zero value. FLUENT will then attempt to perform load balancing after every N iterations, where N is the specified Balance Interval. You should be careful to select an interval that is large enough to outweigh the cost of performing the load balancing operations.

Note that you can interrupt the calculation at any time, turn the load balancing feature off (or on), and then continue the calculation.

If problems arise in your computations due to adaption, you can turn off the automatic load balancing, which occurs any time that mesh adaption is performed in parallel.

To instruct the solver to skip the load balancing step, issue the following command:

(disable-load-balance-after-adaption)

To return to the default behavior, use the following command:

(enable-load-balance-after-adaption)


32.7 Running Parallel FLUENT under SGE

Sun Grid Engine (SGE) software is a distributed computing resource management tool that you can use with either the serial or the parallel version of FLUENT. FLUENT submits a process to the SGE software, then SGE selects the most suitable machine to process the FLUENT simulation. You can configure SGE and select the criteria by which SGE determines the most suitable machine for the FLUENT simulation. Among many other features, running a FLUENT simulation using SGE allows you to:

Save the current status of the job (i.e., in the case of FLUENT, saving .case and .data files, also known as checkpointing)
Migrate the simulation to another machine
Restart the simulation on the same or another machine

32.7.1 Overview of FLUENT and SGE Integration

Requirements

Sun Grid Engine software version 5.3 or higher, available online at http://www.sun.com/
Fluent 6.x, available from Fluent Inc.

FLUENT and SGE Communication

FLUENT and SGE communicate with each other through checkpointing and migration commands. To checkpoint, or save, FLUENT simulations, SGE uses an executable file called ckpt_command.fluent. To migrate FLUENT simulations to another machine, SGE uses another executable file called migr_command.fluent.

Checkpointing Directories

FLUENT creates a checkpointing subdirectory, identified by the job ID. The checkpointing directory contains files related only to the submitted job.

Checkpointing Trigger Files

When a FLUENT simulation needs to be checkpointed, SGE creates a checkpoint trigger file (.check) in the job subdirectory, which causes FLUENT to checkpoint and continue running. If the job needs to be migrated, because of a machine crash or for some other reason, a different trigger file (.exit) is created, which causes FLUENT to checkpoint and exit.


Default File Location

By default, the following SGE-related FLUENT files are installed in path/Fluent.Inc/addons/sge1.0, where path is the directory in which you have placed the release directory, Fluent.Inc:

ckpt_command.fluent
migr_command.fluent
sge_request
kill-fluent
sample_ckpt_obj
sample_pe

These files are described later in this section.

32.7.2 Configuring SGE for FLUENT

SGE must be installed properly if checkpointing is needed or parallel FLUENT is being run under SGE. The checkpoint queues must be configured first, and they must be configured by someone with manager or root privileges.

General Configuration

Using the SGE graphical interface, the general configuration of SGE and FLUENT requires the following:

1. The Shell Start Mode must be set to unix behavior.

2. Under Type, the checkpointing option must be marked as true.

When running parallel FLUENT simulations, the following options are also important:

1. Under Type, the parallel option must be marked as true.

2. The value of slots should be set to a value greater than 1.
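If you prefer the command line to the graphical interface, the same queue settings can usually be inspected with SGE's qconf utility. The sketch below assumes a queue named fluent.q (a hypothetical name); in a correctly configured queue, the shell start mode entry in the output should read unix_behavior:

qconf -sq fluent.q

Look for the shell_start_mode line in the listing and confirm its value before submitting FLUENT jobs.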

Checkpoint Configuration

Checkpointing is configured using the MinCpuTime field. This field specifies the time interval between checkpoints. The value of MinCpuTime should be a reasonable amount of time: too low a value results in frequent checkpointing operations, and writing .case and .data files can be computationally expensive.


SGE requires checkpointing objects to perform checkpointing operations. FLUENT provides a sample checkpointing object called fluent_ckpt_obj. Checkpoint configuration also requires root or manager privileges. While creating new checkpointing objects for FLUENT, keep the default values as given in the sample/default object provided by FLUENT and change only the following values:

Queue List The queue list should contain the queues that are able to be used as checkpoint objects.

Checkpointing and Migration Command This value should only be changed when the executable files are not in the default location, in which case the full path should be specified. All the files (i.e., ckpt_command.fluent and migr_command.fluent) should be located in a directory that is accessible from all machines where the FLUENT simulation is running. The default place for these files is path/Fluent.Inc/addons/sge1.0, where path is the directory in which you have placed the release directory, Fluent.Inc.

Checkpointing Directory This value dictates where the checkpointing subdirectories are created, and hence users must have the correct permission to this directory. Also, this directory should be visible to all machines where the FLUENT simulation is running. The default value is NONE, in which case FLUENT uses the current working directory as the checkpointing directory.

Configuring Parallel Environments

For submitting parallel jobs, SGE needs a parallel environment interface, or PE interface. FLUENT provides a sample parallel environment interface, fluent_pe. Parallel environment configuration requires root or manager privileges. While creating any new parallel environment for FLUENT, change only the following values:

Queue List This should contain all the queues where qtype has been set to PARALLEL.

User/XUser List This contains the list of users who are allowed or denied access to the parallel environment.

Stop Proc Args This should be changed only if the kill-fluent executable is not in the default directory, in which case the full path to the file should be given, and the path should be accessible from every machine.


Slots This should be set to a large numerical value indicating the maximum number of slots that can be occupied by all the parallel environment jobs that are running.

FLUENT uses fluent_pe as the default parallel environment; therefore, there must be one parallel environment with this name.

Configuring a Default Request File

As an alternative to specifying numerous command line options or arguments when you invoke SGE, you can provide a list of SGE options in a default request file. A default request file should be set up when using SGE with FLUENT. FLUENT provides a sample request file called sge_request.

All default request options and arguments for SGE that are common to all users can be placed in a global default request file, sge_request, located in the <sge_root>/<cell>/common/ directory. Individual users can set their own default arguments and options in a private general request file, .sge_request, located in their $HOME directory. Private general request files override the options set by the global sge_request file in the <sge_root>/<cell>/common/ directory. Any settings found in either the global or private default request file can be overridden by specifying new options on the command line.
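A request file is simply a list of SGE command-line options. For example, a private ~/.sge_request might contain entries along these lines (the queue name is hypothetical):

-cwd
-q fluent.q

With such a file in place, every submission runs from the current working directory and targets the fluent.q queue unless you override those options on the command line.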

32.7.3 Running a FLUENT Simulation under SGE

Running FLUENT under SGE requires additional options for the FLUENT command syntax. The command line syntax is as follows:

fluent <other FLUENT options> -sge [-sgeckpt ckpt_object] [-sgeq queue_name] [-sgepe parallel_env N_MIN-N_MAX]

The additional parameters are explained below.

-sge Tells FLUENT to run under SGE.

-sgeckpt ckpt_object Specifies the checkpointing object and overrides the checkpointing option specified in the default request file. If this option is not specified in the command line, and the default general request file contains no setting, then the FLUENT simulation is unable to use checkpoints.


-sgeq queue_name Specifies the name of the queue.

-sgepe parallel_env N_MIN-N_MAX Specifies the parallel environment to be used when FLUENT is run in parallel. This should be specified only if the -sge option is also specified. N_MIN and N_MAX represent the minimum and maximum number of requested nodes. This parameter is optional. For parallel FLUENT, if this parameter is not specified, but the command line -sge option is given, FLUENT will take fluent_pe as the default parallel environment, 1 as N_MIN, and the number of nodes requested to FLUENT as N_MAX.

The following examples demonstrate various command line syntaxes:

Serial FLUENT simulation running under SGE:
fluent 2d -sge

Serial FLUENT simulation with checkpoints:
fluent 2d -sge -sgeckpt fluent_ckpt

Parallel FLUENT simulation running under SGE:
fluent 2d -t<N> -pnet -sge

Parallel FLUENT under SGE with a different parallel environment:
fluent 2d -t<N> -pnet -sge -sgepe diff_pe <N_MIN>-<N_MAX>

In this example, note that the -t<N> option will be ignored and the <N_MIN>-<N_MAX> option will take precedence.

Parallel FLUENT under SGE with a different parallel environment and a queue:
fluent 2d -t<N> -pnet -sge -sgeq large -sgepe diff_pe <N_MIN>-<N_MAX>


32.8 Running Parallel FLUENT under LSF

Platform Computing Corporation's LSF software is a distributed computing resource management tool that you can use with either the serial or the parallel versions of FLUENT. Using LSF, FLUENT simulations can take full advantage of LSF checkpointing (i.e., saving FLUENT .case and .data files) and migration features. LSF is also integrated with FLUENT's vendor-based vmpi and the socket-based net MPI communication libraries for distributed MPI processing, increasing the efficiency of the software and data processing.

Running FLUENT under LSF is not currently supported on Windows.

Platform's Standard Edition is the foundation for all LSF products; it offers users load sharing and batch scheduling across distributed UNIX, LINUX, and Windows NT computing environments. Platform's LSF Standard Edition provides the following functionality:

Comprehensive Distributed Resource Management
- dynamic load sharing services
- batch scheduling and resource management policies

Flexible Queues and Sharing Control
- priority load conditions for job scheduling
- time windows for job processing
- limits on the number of running jobs and job resource consumption

Fair-share Scheduling of Limited Resources
- manage shares for users and user groups
- ensure fair sharing of limited computing resources

Maximum Fault Tolerance
- provides batch service as long as one computer is active
- ensures that no job is lost when the entire network goes down
- restarts jobs on other compute nodes when a computer goes down


32.8.1 Overview of FLUENT and LSF Integration

Requirements

LSF Batch 3.2+, available from Platform Computing Corporation (http://www.platform.com/)
Fluent 6.x, available from Fluent Inc.

Optional Requirements

The echkpnt.fluent and erestart.fluent shell scripts, available from Platform Computing Corporation or Fluent Inc., which permit FLUENT checkpointing and restarting from within LSF

A hardware vendor supplied MPI environment for network computing. If available, the vmpi version of FLUENT may perform better than the net version.

Obtaining Distribution Files

Distribution files for LSF to be used with FLUENT are available from Platform Computing Corporation. Installation instructions are included. The files are available from your LSF vendor, and from Platform's corporate web site (http://www.platform.com/) and FTP site (ftp.platform.com). The ftp directory location is /lsf/distrib/integra/Fluent. Access to the download area of the Platform web site and the Platform FTP site is controlled by login name and password. If you are unable to access the distribution files, contact Platform at info@platform.com.

32.8.2 Checkpointing and Restarting

LSF provides utilities to save (i.e., checkpoint) and restart an application. The FLUENT and LSF integration allows FLUENT to take advantage of the checkpoint and restart features of LSF. At the end of each iteration, FLUENT looks for the existence of a checkpoint or checkpoint-exit file. If FLUENT detects the checkpoint file, it writes a case and data file, removes the checkpoint file, and continues iterating. If FLUENT detects the checkpoint-exit file, it writes a case file and data file, then exits. LSF's bchkpnt utility can be used to create the checkpoint and checkpoint-exit files, thereby forcing FLUENT to checkpoint itself, or checkpoint and terminate itself.

In addition to writing a case file and data file, FLUENT also creates a simple journal file with instructions to read the checkpointed case file and data file, and continue iterating. FLUENT uses that journal file when restarted with LSF's brestart utility.

The greatest benefit of the checkpoint facilities occurs when they are used on an automatic basis. By starting jobs with a periodic checkpoint, LSF automatically restarts any jobs that are lost due to host failure from the last checkpoint. This facility can dramatically reduce lost compute time, and also avoids the task of manually restarting failed jobs.


FLUENT Checkpoint Files


To checkpoint FLUENT jobs using LSF, LSF supplies special versions of echkpnt and erestart. These FLUENT checkpoint files are called echkpnt.fluent and erestart.fluent.

Checkpoint Directories
When you submit a checkpointing job, you specify a checkpoint directory. Before the job starts running, LSF sets the environment variable LSB_CHKPNT_DIR. The value of LSB_CHKPNT_DIR is a subdirectory of the checkpoint directory specified in the command line. This subdirectory is identified by the job ID and only contains files related to the submitted job.
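For example (the path here is purely illustrative), if a job is submitted with a checkpoint directory of /home/username/chkpnt and LSF assigns it job ID 1234, LSB_CHKPNT_DIR would point to /home/username/chkpnt/1234, and the trigger files and checkpointed case, data, and restart journal files for that job would be written there.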

Checkpoint Trigger Files


When you checkpoint a FLUENT job, LSF creates a checkpoint trigger file (.check) in the job subdirectory, which causes FLUENT to checkpoint and continue running. A special option is used to create a different trigger file (.exit), which causes FLUENT to checkpoint and exit the job.

FLUENT uses the LSB_CHKPNT_DIR environment variable to determine the location of checkpoint trigger files. It checks the job subdirectory periodically while running the job. FLUENT does not perform any checkpointing unless it finds the LSF trigger file in the job subdirectory. FLUENT removes the trigger file after checkpointing the job.

Restart Jobs
If a job is restarted, LSF attempts to restart the job with the -restart option appended to the original fluent command. FLUENT uses the checkpointed data and case files to restart the process from that checkpoint, rather than repeating the entire process. Each time a job is restarted, it is assigned a new job ID, and a new job subdirectory is created in the checkpoint directory. Files in the checkpoint directory are never deleted by LSF, but you may choose to remove old files once the FLUENT job is finished and the job history is no longer required.


32.8.3 Configuring LSF for FLUENT

LSF provides special versions of echkpnt and erestart, called echkpnt.fluent and erestart.fluent, to allow checkpointing with FLUENT. You must make sure LSF uses these files instead of the standard versions.

1. To configure LSF (v4.0 or earlier) for FLUENT, either overwrite the standard versions of echkpnt and erestart with the special FLUENT versions, or complete the following steps:

(a) Leave the standard LSF files in the default location and install the FLUENT versions in a different directory.

(b) In lsf.conf, modify the LSF_ECHKPNTDIR environment variable to point to the FLUENT versions. The LSF_ECHKPNTDIR environment variable specifies the location of the echkpnt and erestart files that LSF will use. If this variable is not defined, LSF uses the files in the default location, identified by the environment variable LSF_SERVERDIR.

(c) Save the changes to lsf.conf.

(d) Reconfigure the cluster with the commands lsadmin reconfig and badmin reconfig. LSF checks for any configuration errors. If no fatal errors are found, you are asked to confirm reconfiguration. If fatal errors are found, reconfiguration is aborted.

2. To configure LSF (v4.1 or later) for FLUENT, copy the echkpnt.fluent and erestart.fluent files to LSF_SERVERDIR for each architecture that is desired, and submit the job with the method=fluent parameter when specifying the checkpoint information (examples are provided below).
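As an illustration of step 1(b), the relevant lsf.conf entry might look like the following; the directory path is hypothetical and should point to wherever you installed the FLUENT versions of the scripts:

LSF_ECHKPNTDIR=/usr/local/lsf/fluent_chkpnt

After saving lsf.conf, reconfigure the cluster with lsadmin reconfig and badmin reconfig as described above.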

Note that LSF includes an email notification utility that sends email notices to users when an LSF job has been completed. If a user submits a batch job to LSF and the email notification utility is enabled, LSF will distribute an email containing the output for the particular LSF job. When a FLUENT job is run under LSF with the -g option, the email will also contain information from the FLUENT console window.


32.8.4 Submitting a FLUENT Job

To submit a FLUENT job using LSF, you need to include certain LSF checkpointing parameters in the standard call to FLUENT. Submitting a batch job requires the bsub command. The syntax for the bsub command to submit a FLUENT job is:

bsub [-k checkpoint_dir | -k "checkpoint_dir [checkpoint_period]"] [bsub_options] FLUENT_command [FLUENT_options] -lsf

The checkpointing feature for FLUENT jobs requires all of the following parameters:

-k checkpoint_dir Regular option to bsub that specifies the name of the checkpoint directory.

FLUENT_command Regular command used with FLUENT software.

-lsf Special option to the FLUENT command. Specifies that FLUENT is running under LSF, and causes FLUENT to check for trigger files in the checkpoint directory if the environment variable LSB_CHKPNT_DIR is set.

32.8.5 Checkpointing FLUENT Jobs

Checkpointing a batch job requires the bchkpnt command. The syntax for the bchkpnt command is:

bchkpnt [bsub_options] [-k] job_ID

-k Regular option to the bchkpnt command; specifies checkpoint and exit. The job will be killed immediately after being checkpointed. When the job is restarted, it does not have to repeat any operations.

job_ID The job ID of the FLUENT job. Used to specify which job to checkpoint.


32.8.6 Restarting FLUENT Jobs

Restarting a batch job requires the brestart command. The syntax for the brestart command is:

brestart [bsub_options] checkpoint_directory job_ID

checkpoint_directory Specifies the checkpoint directory, where the job subdirectory is located.

job_ID The job ID of the FLUENT job; specifies which job to restart.

At this point, the restarted job is assigned a new job ID, and the new job ID is used for checkpointing. The job ID changes each time the job is restarted.

32.8.7 Migrating FLUENT Jobs

Migrating a FLUENT job requires the bmig command. The syntax for the bmig command is:

bmig [bsub_options] job_ID

job_ID The job ID of the FLUENT job; specifies which job to migrate.

The migrated job is restarted and assigned a new job ID, and the new job ID is used for checkpointing. The job ID changes each time the job is restarted.

32.8.8 Using FLUENT and LSF

This section describes various examples of running FLUENT and LSF.

Serial FLUENT interactive job under LSF:
bsub -I fluent 3d -lsf

Serial FLUENT batch job under LSF:
bsub fluent 3d -g -i journal_file -lsf

Parallel FLUENT net version interactive job under LSF, on <N> CPUs:
bsub -I -n <N> fluent 3d -t0 -pnet -lsf


PAM is an extension of LSF that manages parallel processes by choosing the appropriate compute nodes and launching child processes. When using the net version of FLUENT, PAM is not used to launch FLUENT (so the JOB_STARTER argument of the LSF queue should not be set). Instead, LSF will set an environment variable that contains a list of <N> hosts, and FLUENT will use this list to launch itself. The integration between PAM and FLUENT is supported on the following platforms: lnx86, Ultra, HPUX, and IBM SP (AIX). Of these platforms, all except lnx86 can use the vmpi version of FLUENT.

Parallel FLUENT net version batch job under LSF, using <N> processes:
bsub -n <N> fluent 3d -t0 -pnet -g -i journal_file -lsf

The examples below apply to both interactive and batch submissions. For brevity, only batch submissions are described. Usage of the LSF checkpoint and restart capabilities, requiring echkpnt and erestart, is described as follows:

Serial FLUENT batch job under LSF with checkpoint/restart:
bsub -k "/home/username method=fluent 60" fluent 3d -g -i journal_file -lsf
Submits a job that uses /home/username as the checkpoint directory, the LSF 4.1 method= specification for which echkpnt/erestart combination to use, and a 60 minute duration between automatic checkpoints.

bjobs
Returns the <JOB_ID>s of the batch jobs in the LSF system.

bchkpnt <JOB_ID>
Forces FLUENT to write a case file and a data file, as well as a restart journal file, at the end of its current iteration. The files are saved in a directory named [chkpnt_dir]/<JOB_ID>. FLUENT then continues to iterate.

bchkpnt -k <JOB_ID>
Forces FLUENT to write a case file and a data file, as well as a restart journal file, at the end of its current iteration. The files are saved in a directory named [chkpnt_dir]/<JOB_ID>, and then FLUENT exits.


brestart [chkpnt_dir] <JOB_ID>
Starts a FLUENT job using the latest case and data files in the [chkpnt_dir]/<JOB_ID> directory. The restart journal file [chkpnt_dir]/<JOB_ID>/#restart.inp is used to instruct FLUENT to read the latest case and data files in that directory and continue iterating.

Parallel FLUENT vmpi version batch job under LSF with checkpoint/restart, using <N> processes:
bsub -k [chkpnt_dir] -n <N> fluent 3d -t<N> -pvmpi -g -i journal_file -lsf

bjobs
Returns the <JOB_ID>s of the batch jobs in the LSF system.

bchkpnt <JOB_ID>
Forces parallel FLUENT to write a case file and data file, as well as a restart journal file, at the end of its current iteration. The files are saved in a directory named [chkpnt_dir]/<JOB_ID>. Parallel FLUENT then continues to iterate.

bchkpnt -k <JOB_ID>
Forces parallel FLUENT to write a case file and a data file, as well as a restart journal file, at the end of its current iteration. The files are saved in a directory named [chkpnt_dir]/<JOB_ID>. Parallel FLUENT then exits.

brestart [chkpnt_dir] <JOB_ID>
Starts a FLUENT network parallel job using the latest case and data files in the [chkpnt_dir]/<JOB_ID> directory. The restart journal file [chkpnt_dir]/<JOB_ID>/#restart.inp is used to instruct FLUENT to read the latest case and data files in that directory and continue iterating. The parallel job will be restarted using the same number <N> of processes as that used for the original bsub submission.

bmig -m host 0
All jobs on host host will be checkpointed and moved to another host.


32.9 Running Parallel FLUENT under Other Resource Management Tools

Many hardware vendors offer workload balancing software or computing resource management tools, such as IBM's Load Leveler or Altair Engineering Inc.'s PBSPro. These other resource management tools are not supported by FLUENT. If you experience difficulties running FLUENT in conjunction with these or other tools, please contact the appropriate hardware vendor for assistance.
