AA V4 I2 Speed Up Simulation With GPU PDF


Analysis Tools

Speed Up Simulations with a GPU
A new feature in ANSYS Mechanical leverages graphics
processing units to significantly lower solution times for
large analysis problem sizes.
By Jeff Beisheim, Senior Software Developer, ANSYS, Inc.

Engineers looking to improve turnaround times for increasingly complex engineering simulations, especially those involving models with complicated multiple physics and greatly refined meshes, may want to investigate using a graphics processing unit (GPU). These devices have been around for years, working alongside the computer’s CPU to speed up graphics operations. Only recently have GPUs been specifically developed with the computational precision required for finite element analysis (FEA) solutions as well as the computational power to effectively complement the performance of the latest CPUs. With hundreds of low-power cores on a single socket, GPUs have the potential to dramatically increase computing capacity, provided that the compute workload will fit in the available memory of the GPU.

A new feature called the general-purpose GPU accelerator capability is available in a preview version of ANSYS 13.0 to take advantage of these high-end devices when performing structural mechanics simulations. The accelerator works by offloading some of the most numeric-intensive algorithms from the CPU onto the GPU. These algorithms are part of the equation solutions occurring during a simulation. Other computations during simulation remain on the CPU. In this way, the GPU and CPU work in a collaborative fashion to help speed up the time to solution. While limited to simulations using the shared-memory solvers of the prerelease version of ANSYS 13.0, this initial support of GPU computing for structural mechanics simulations is an important first step in leveraging the devices as a significant new resource in high-performance computing (HPC).

Initial GPU Implementation
The GPU accelerator capability accelerates only the shared-memory equation solvers that support the usage of a GPU: the sparse direct and preconditioned conjugate gradient (PCG)/Jacobi conjugate gradient (JCG) iterative solvers. This includes the use of block Lanczos and PCG Lanczos solvers in an eigenvalue buckling or mode frequency analysis. Other equation solvers will continue to run on only the CPU and will not see any performance benefit when using the GPU accelerator capability.

Other limitations when using the GPU accelerator capability include:
• Windows® x64 and Linux® x64 are currently the only platforms supported. Windows users should be aware that use of remote desktop disables the use of the GPU to accelerate structural mechanics simulations.

ANSYS Advantage • © 2010 Ansys, Inc. www.ansys.com


• Only NVIDIA® Tesla™ GPUs are currently supported for accelerating ANSYS structural mechanics simulations. Only the more powerful 20-series (Fermi) GPUs are recommended, as these are the most computationally powerful and, therefore, the most likely to produce faster solution times.
• The GPU accelerator capability is not currently supported when using Distributed ANSYS.

Activating the New Feature
For commercial license users, the GPU accelerator capability is enabled using the ANSYS HPC Pack licensing model. For academic license users, the GPU capability is included within the base ANSYS Academic product (one that provides access to ANSYS Mechanical or higher capability), and no add-on Academic HPC licenses are required. Engineers can use a GPU to accelerate computations on conventional multicore processors without any additional GPU-specific licensing. During structural mechanics simulations, ANSYS Mechanical APDL software makes use of only a single GPU per simulation.

ANSYS Mechanical APDL users can activate the accelerator capability simply by selecting the High-Performance Setup tab in the launcher and then checking the GPU Accelerator Capability box. Alternatively, -acc nvidia can be added to the list of arguments supplied on the ANSYS Mechanical APDL command line. ANSYS Workbench users can choose to activate the GPU accelerator capability during solution by modifying the GPU acceleration option on the Advanced Properties page of the Solve Process Settings.

Once the GPU accelerator capability is activated, ANSYS Mechanical APDL should accelerate the solution when it is launched, when possible, without requiring input from the user. For cases in which it does not apply, this new feature will simply have no effect on the program behavior.

Figure: NVIDIA Tesla 20-series GPUs such as the C2050 and C2070 are the most computationally powerful and, therefore, most likely to produce faster solution times in ANSYS simulations.

Optional Control Settings
A new ACCOPTION command is available for users who want additional control over various settings related to the GPU accelerator capability:
• Activate, to control which analyses use or do not use the GPU accelerator capability
• MinSzThresh, a threshold parameter to determine when the sparse direct solver data size is large enough to justify using the GPU
• SPkey, to control the use of single- or double-precision math operations when running the sparse direct solver on the GPU

In addition, some hardware settings for NVIDIA GPU cards can be useful under certain scenarios:
• Environment variables are available in ANSYS Mechanical APDL to help avoid oversubscribing the GPU hardware for users with multiple GPU cards or users who run in a multi-user environment, such as a server.
• NVIDIA GPU users can consider switching their hardware to exclusive mode, which allows only one process (for example, one ANSYS Mechanical simulation) to be run at a time on the GPU.
• Another hardware setting for NVIDIA Tesla 20-series GPUs disables error-correcting code (ECC) memory to make use of all the memory on the GPU card as well as to increase overall memory bandwidth and GPU performance. To ensure FEA result accuracy, however, it is recommended that users keep the default setting of ECC memory enabled.

When to Use a GPU
The amount of acceleration achievable when using the GPU accelerator capability will vary greatly depending on the hardware being used and the model being simulated. The following guidelines can help determine whether use of the GPU accelerator capability will provide a performance boost. In general, the capability provides the greatest reductions in overall simulation time when the following conditions are met:
• The simulation spends most of its time on the numerical analysis solution rather than other tasks, such as pre- and post-processing. Only the operation of the solver is accelerated with a GPU, including analyses that use the sparse direct or PCG/JCG iterative solvers (including block Lanczos and PCG Lanczos eigensolvers).
• The problem size is in the following ranges:
  – 500K to 5,000K DOFs for the sparse direct solver
  – 500K to 3,000K DOFs for PCG/JCG iterative solvers

The size guidelines listed above represent the general range many users now routinely work within and are based on the NVIDIA Tesla C2050,
which offers 3 GB of memory. For GPUs with more or less memory, these sizes can be expected to shift accordingly. For simulations involving model sizes outside these ranges, some acceleration may be achieved, but generally the code will avoid using the GPU and run the entire simulation using the CPU cores.

Figure: GPU accelerator capability of solver kernel speedups using a prerelease version of ANSYS 13.0. Benchmarks shown: V12cg-1 (JCG, 1100k), V12ln-2 (LANB, 500k), V12sp-1 (Sparse, 430k), V12sp-2 (Sparse, 500k), V12sp-3 (Sparse, 2400k), V12sp-4 (Sparse, 1000k), V12sp-5 (Sparse, 2100k). Legend: 4 CPU Cores + 1 GPU; 4 CPU Cores; 1 CPU Core + 1 GPU; 1 CPU Core.

When using the sparse direct solver, all analysis types are supported except substructuring. Models that create nonsymmetric matrices (such as frictional contact models that use the NROPT, UNSYM command) are supported with the GPU accelerator capability. However, models that require the use of partial pivoting are not supported. Partial pivoting is activated by the solver automatically when certain element types and options are included, such as current-technology elements containing the mixed u-P formulation option and contact elements with the pure Lagrange formulation option. In these cases, the GPU accelerator capability is deactivated and the solution proceeds using the CPU.
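
The size guidelines and solver restrictions in this section can be condensed into a quick pre-check. The following is a minimal Python sketch, not an ANSYS tool: the `Job` class, its field names, and the `gpu_concerns` function are hypothetical helpers introduced for illustration. The DOF ranges are the ones quoted in this article for a 3 GB Tesla C2050, and the MSAVE rule reflects the PCG restriction noted in this article.

```python
from dataclasses import dataclass
from typing import List

# DOF ranges quoted in the article (derived for a 3 GB Tesla C2050).
DOF_RANGES = {
    "sparse": (500_000, 5_000_000),
    "pcg": (500_000, 3_000_000),
    "jcg": (500_000, 3_000_000),
}


@dataclass
class Job:
    """Hypothetical description of one simulation; not an ANSYS data structure."""
    solver: str                     # "sparse", "pcg" or "jcg"
    dofs: int                       # number of degrees of freedom
    distributed: bool = False       # run with Distributed ANSYS
    substructuring: bool = False    # substructuring analysis
    partial_pivoting: bool = False  # e.g. mixed u-P or pure Lagrange contact
    msave: bool = False             # memory-saving option (MSAVE command)


def gpu_concerns(job: Job) -> List[str]:
    """Return reasons the GPU is unlikely to help; an empty list means none found."""
    if job.solver not in DOF_RANGES:
        return ["solver runs on the CPU only"]
    concerns = []
    if job.distributed:
        concerns.append("Distributed ANSYS is not supported")
    if job.solver == "sparse" and job.substructuring:
        concerns.append("substructuring is not supported")
    if job.solver == "sparse" and job.partial_pivoting:
        concerns.append("partial pivoting deactivates the GPU")
    if job.solver == "pcg" and job.msave:
        concerns.append("MSAVE deactivates the GPU")
    low, high = DOF_RANGES[job.solver]
    if not low <= job.dofs <= high:
        concerns.append("problem size is outside the recommended range")
    return concerns


print(gpu_concerns(Job(solver="sparse", dofs=2_100_000)))           # []
print(gpu_concerns(Job(solver="pcg", dofs=1_100_000, msave=True)))  # ['MSAVE deactivates the GPU']
```

An out-of-range size is reported as a concern rather than a hard stop, since some acceleration may still be achieved outside the quoted ranges, and the ranges themselves shift with the amount of GPU memory.
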
When using the PCG/JCG solver, all analysis types supported by these solvers are also supported when using the GPU accelerator capability. The use of the memory-saving option (MSAVE command) for the PCG solver will deactivate the capability.

Figure: GPU accelerator capability of overall speedups using a prerelease version of ANSYS 13.0.

Comparing GPU and CPU Performance
Concerning the amount of speedup obtained when using a GPU, it is important to clarify what is being evaluated. Comparison to a single CPU core, for example, may look impressive for software like the ANSYS Mechanical product, which scales on multiple CPU cores, but it is not an accurate basis for comparison of performance. In the interest of accuracy, ANSYS performance benchmarking was done on an HP® Z800 Workstation with 32 GB of RAM. The process focused on using all four cores of an Intel® Xeon® 5560 series processor (2.8 GHz) as well as double-precision computations. The GPU used for this benchmarking was a Tesla C2050 with ECC memory enabled.

Results are shown in the accompanying charts for seven of the 10 problems contained in the ANSYS 12.0 benchmark set using a prerelease version of ANSYS Mechanical APDL software. Of the three remaining benchmarks, two use the MSAVE option, making them invalid with the GPU accelerator capability, and one was too large to be run using the Tesla C2050. Results demonstrate that using a GPU with one or more CPU cores can lead to substantial speedups in the number-crunching equation solver kernels and, in turn, to shorter overall simulation times. Since the GPU accelerates only the key equation solver kernels and nothing else, the overall speedups are expected to be lower, as other simulation processes are involved in the timings.

Future Directions
As GPU computing trends evolve, ANSYS will continue to enhance its offerings as necessary for a variety of simulation products. Certainly, performance improvements will continue as GPUs become computationally more powerful and extend their functionality to other areas of ANSYS software. ANSYS is investigating the use of AMD/ATI GPU cards to accelerate simulation. The company is also investigating the potential for supporting multiple GPUs and Distributed ANSYS.
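
The point made in the performance comparison, that overall speedups trail solver kernel speedups because only the solver is accelerated, is an instance of Amdahl's law. The sketch below is illustrative only: the function name and the example numbers are assumptions chosen for the arithmetic, not values taken from the ANSYS benchmarks.

```python
def overall_speedup(solver_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-run speedup when only the solver portion is accelerated.

    solver_fraction: share of the original run time spent in the solver (0 to 1).
    kernel_speedup:  speedup of that solver portion alone (greater than 0).
    """
    if not 0.0 <= solver_fraction <= 1.0:
        raise ValueError("solver_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Unaccelerated time stays as is; solver time shrinks by kernel_speedup.
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / kernel_speedup)


# Hypothetical run: 80% of the time in the solver, solver kernels 4x faster.
print(round(overall_speedup(0.8, 4.0), 2))  # 2.5
```

The larger the share of run time spent outside the solver, such as pre- and post-processing, the further the overall figure falls below the kernel figure.
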
