
IBM ISV & Developer Relations Manufacturing

Solution Brief

ANSYS and IBM: optimized structural mechanics simulations
Improving product development with HPC-capable
structural mechanics software driven by innovative
hardware systems, clusters, storage and software

Solution Overview

Highlights:

• Trusted ANSYS structural mechanics family of solutions solves complex structural problems with speed and ease

• High-performance systems, clusters, storage and software from IBM provide the reliable platform required to run ANSYS structural mechanics software with optimal efficiency

• End-to-end solutions with end-to-end support from IBM ensure reliability

All manufacturers face enormous pressures to make products that are stronger and last longer, while simultaneously reducing cost, increasing innovation and shortening the time frame for development. To address these demands, manufacturers need engineering simulation solutions that allow users to design and verify products in a virtual, risk-free environment, minimizing the need for physical prototypes and tests.

ANSYS and IBM are solving this problem with a combination of ANSYS structural mechanics software and a complete portfolio of ultra-reliable, high-performance systems, clusters, storage and cluster management software from IBM.

ANSYS structural mechanics software offers a comprehensive solution for linear, nonlinear and dynamics analyses. It provides a complete set of element behaviors, material models and equation solvers for a wide range of engineering problems. In addition, ANSYS solutions offer thermal analysis and coupled-physics capabilities involving acoustic, piezoelectric, thermal-structural and thermoelectric analysis.

Across all industries, simulations now require increasingly large models. To this end, ANSYS structural solutions incorporate parallel algorithms for faster computation. The entire solution phase runs in parallel, using shared- and distributed-memory processing. ANSYS also offers unique solutions based on Graphics Processing Unit (GPU) technology. The combination of parallel computing and GPU performance can accelerate simulations even further.

As a result, the IT requirements for ANSYS structural mechanics software are considerable. These requirements can be grouped into three categories: systems (computation, via CPU and GPU, and memory); network communication (when problems are solved on multiple nodes); and storage management (I/O, when problems are solved in out-of-core mode, and data and collaboration, when large numbers of users collaborate frequently on a large number of models). Of course, ANSYS users also need an infrastructure that makes optimal use of available IT resources.

Solution benefits
ANSYS software includes highly capable structural mechanics solutions that enable efficient computation of large problems, using models that accurately simulate the materials, structures and behaviors of complex designs. Combined with the power, performance and reliability of IBM solutions for HPC, ANSYS structural mechanics software provides everything manufacturers need to improve the performance and integrity of new designs and bring them to market faster and with lower costs.

Advantages of the IBM and ANSYS solution for structural mechanics include:

• High performance. The Intelligent Cluster is an ideal solution for heavy-duty compute workloads, such as structural models that typically include several million degrees of freedom. With the Intelligent Cluster's broad range of server platforms, processor choices, accelerator options, robust storage solutions, networking/communications fabrics and operating systems, users can configure a solution that delivers speed and performance closely aligned with specific engineering design applications.

• Lower operating costs. Components in the Intelligent Cluster are designed to deliver maximum computing power with the most compact, energy-efficient footprint possible. This means users can right-size the data center for any design lab, minimizing power consumption and cooling demands without compromising the performance of ANSYS software.

• Easier deployment. User-friendly ANSYS software makes it easy for new users to reach top productivity quickly. Behind the scenes, IBM Intelligent Cluster solutions are shipped thoroughly tested, assembled, cabled and prepared for rapid deployment.

GPU acceleration
ANSYS and NVIDIA have collaborated on GPU-based HPC solutions since December 2010. Certain computations, such as the sparse matrix operations found in ANSYS structural mechanics solutions, can be accelerated with GPU technology. This includes the use of the direct sparse solver and the iterative PCG solver.

Figure 1: ANSYS Mechanical 14.0 performance on GPU-based systems. Benchmark: V14sp-5, a 2.1M DOF nonlinear static analysis. System: IBM dx360 M3, Intel Xeon X5650, 2.6 GHz, 6-core, 4X QDR InfiniBand. GPU: NVIDIA M2090, CUDA 4.1. The chart compares speed-up relative to an 8-core (CPU-only) run for a single node with CPU only versus CPU with GPU.

As shown in Figure 1, sharing computational work between the CPU and GPU can produce productivity improvements two times greater than with the CPU only. This acceleration can significantly improve the utilization of existing ANSYS licenses and CPU-based hardware. Please note that these gains are possible only if there is enough work to keep a GPU busy (e.g., solid FE models that exceed 500K DOF). In other words, not all engineering models will benefit from GPU technology.

The GPU acceleration option requires an incremental investment for each compute node. However, this investment is significantly less than the total cost of ownership (TCO) of the additional CPU hardware and ANSYS software required to match the resulting performance gains.

High-performance computing solutions from IBM

Systems
It is important to select the right processor and memory so that ANSYS software can operate efficiently. For computation, Intel Xeon E5-2600 series (or E5-4600 series) processors with 8 cores and clock speeds of 2.6 GHz or faster are recommended. Sufficient memory using the latest Dual In-line Memory Module
(DIMM) technology, which offers speeds up to 1600 MHz,
should be configured so that structural mechanics problems
are solved in-core. This eliminates the risk of bottlenecks
due to slow I/O.
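As a rough illustration of this in-core sizing guidance, the sketch below estimates whether a model fits in a node's memory. The 10 GB-per-million-DOF figure for the direct sparse solver is an assumed rule of thumb, not a number from this brief, and the function name is hypothetical; actual memory use depends heavily on the model, so check the solver's own memory statistics for real sizing.

```python
def fits_in_core(dof, node_memory_gb, gb_per_mdof=10.0, os_reserve_gb=4.0):
    """Estimate whether a sparse-solver job of `dof` degrees of freedom
    can be solved in-core on a node with `node_memory_gb` of RAM.

    `gb_per_mdof` (GB per million DOF) is an assumed rule of thumb,
    not an ANSYS-published figure."""
    needed_gb = (dof / 1e6) * gb_per_mdof
    return needed_gb <= node_memory_gb - os_reserve_gb

# A 2.1M DOF model (the size of the V14sp-5 benchmark) on a 64 GB node:
print(fits_in_core(2_100_000, 64))   # True: roughly 21 GB estimated need
# A 30M DOF job on the same node would need roughly 300 GB:
print(fits_in_core(30_000_000, 64))  # False: solve out-of-core or add memory
```

Under this kind of estimate, a job that does not fit would either be run out-of-core (making fast scratch I/O critical, as discussed later) or moved to a node with more memory.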


Clusters and scalability
When one system (node) is not sufficient to solve a structural mechanics problem, multiple nodes are connected with a communication network so that a single problem can be run in parallel. In this situation, the communication delay (latency) and rate of communication among systems (bandwidth) will affect performance significantly.

IBM server products support industry-leading InfiniBand switch modules, offering an easy way to manage high-performance InfiniBand networking capabilities for IBM server systems. The IBM HPC portfolio also offers switch modules from Mellanox and Intel. As indicated in Figure 2, ANSYS Mechanical scales well up to three nodes connected with a QDR InfiniBand network.

Figure 2: Scalability performance of ANSYS Mechanical 14.0. Benchmark: V14sp-5, a 2.1M DOF nonlinear static analysis. System: IBM x3550 M4, Intel Xeon E5-2670, 2.6 GHz, 8-core, 4X QDR InfiniBand. The chart shows speed-up relative to an 8-core run as the number of cores assigned to a single job grows from 8 cores (1 node) to 16 (1 node), 32 (2 nodes) and 48 (3 nodes).

Storage management
ANSYS structural mechanics solutions rely on efficient storage management. Requirements for efficient storage management arise during the solver phase and during the simulation data management phase.

Solver Phase
During the solver phase (especially when the problem is solved out-of-core), ANSYS structural mechanics solutions require high-speed storage to read and write scratch files. The IBM storage portfolio includes a vast array of nearline SAS drives (with speeds of 10K rpm and 15K rpm) and SSD drives (of varying speeds and capacities) to address the scratch I/O requirements of each task. Some server products are designed to offer more internal disks than others.

Simulation Data Management
I/O requirements during the simulation data management phase are similar to those in traditional database-centric processing, involving popular database products such as the container of simulation data used by ANSYS structural mechanics software. To meet these requirements, there are two general approaches that vary in complexity and capability: an entry-level file server with a large internal storage system, and the IBM Storwize V7000 Unified.

Entry-level file server with internal storage system
This is a simple, economical approach that uses a System x3650 M4 as an NFS file server. The x3650 M4 system contains up to 16 internal 1 TB 7.2K RPM nearline SAS drives in a RAID6 configuration. The file system is mounted over the fastest network (Gigabit Ethernet or InfiniBand) provided in the cluster. Relevant features include:

File server: x3650 M4
Disks: 16 2.5" 1 TB nearline SAS 7.2K RPM = 16 TB with RAID6
Host attachment: 6 Gbps SAS
Network (used by the file system): GbE or IP over InfiniBand
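The scaling behavior illustrated by Figure 2 can be turned into a simple configuration rule: compute parallel efficiency from measured speed-ups and stop adding nodes once efficiency drops below a threshold. The sketch below is illustrative only; the speed-up values and the 70% threshold are hypothetical placeholders, not the measured Figure 2 data.

```python
def parallel_efficiency(cores, speedup, base_cores=8):
    """Efficiency relative to the base run: speedup / (cores / base_cores)."""
    return speedup / (cores / base_cores)

def best_config(measurements, min_efficiency=0.7, base_cores=8):
    """Pick the largest core count whose efficiency stays above the threshold.
    `measurements` maps core count -> speed-up relative to the base run."""
    viable = [c for c, s in measurements.items()
              if parallel_efficiency(c, s, base_cores) >= min_efficiency]
    return max(viable) if viable else base_cores

# Hypothetical speed-ups relative to an 8-core, 1-node run:
measured = {8: 1.0, 16: 1.8, 32: 3.1, 48: 3.9}
print(parallel_efficiency(32, 3.1))  # 0.775
print(best_config(measured))         # 32: at 48 cores, efficiency falls below 70%
```

A helper like this makes the trade-off explicit: beyond a certain node count, interconnect latency and bandwidth limit the gains, and additional cores are better spent on other jobs.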


IBM Storwize V7000 Unified
The IBM Storwize V7000 Unified storage system can combine block and file storage into a single system for simplified management and lower cost. File modules are packaged in a 2U rack-mountable enclosure and provide attachment to 1 Gbps and 10 Gbps NAS environments. For block storage, I/O operations between hosts and Storwize V7000 nodes are performed using Fibre Channel connectivity. Relevant features include:

Number of disk enclosures: Up to 10
Size of each enclosure: 24 2.5" 1 TB nearline SAS 7.2K RPM = 24 TB
Total disk capacity of the system: 240 TB
Host attachment, file storage: 1 Gbps and 10 Gbps Ethernet
Host attachment, block storage: SAN-attached 8 Gbps Fibre Channel (FC)

Resource management
Once the cluster is obtained, it is important to deploy and manage resources efficiently.

IBM Platform HPC is a complete high performance computing (HPC) management solution in a single product. It includes a rich set of out-of-the-box features that empower high performance technical computing users by reducing the complexity of the HPC environment and improving the time-to-solution.

When integrated with the IBM System x solution, IBM Storwize V7000 Unified delivers simplified cluster deployment and management, quicker results and improved productivity. Platform HPC simplifies the process of deploying ANSYS engineering tools with its library of extensible, pre-integrated application templates. Unlike other HPC cluster solutions (which are collections of open-source and third-party tools), Platform HPC is a complete solution that is fully integrated, tested and supported with ANSYS application environments. From cluster provisioning and management to workload management and monitoring, all the functions required to operate and use the cluster are provided and automatically configured. The workload management solution (based on IBM Platform LSF, the industry's powerful policy-driven workload manager) improves productivity, utilization and throughput with minimal administrative effort. Platform HPC makes it easy to take immediate advantage of the exceptional performance provided by GPUs. Platform HPC also includes multiple pre-configured, ready-for-use MPI implementations that are compatible with the choice of interconnect technology.

Sample configurations
There is an IBM solution for each ANSYS workload. The following sample configurations offer increasing levels of complexity and capability:
• 2-socket-based system without GPU
• 2-socket-based system with GPU

Type: 2-socket-based system without GPU
Usage: Several simultaneous single-node (max. 16 cores) jobs, each up to 10M DOF, and/or multi-node (max. 64 cores) jobs, each up to 30M DOF
System: PureFlex 10U chassis with x240 servers
Cluster size: Up to 14 x240 servers
Processor: Intel Xeon E5-2670 2.6 GHz, 8 cores/processor
Server: Each x240 server has two processors (16 cores)
Memory: Up to 256 GB for in-core; 64 GB for out-of-core processing
Local scratch storage: 1 nearline SAS HDD 10K RPM 600 GB for in-core; 1 to 2 SATA SSD drives or 2 nearline SAS HDD 10K RPM 900 GB for out-of-core
Network: GbE for cluster management; 4X QDR InfiniBand when a single job is run over multiple nodes
GPU: No
OS: Red Hat, SUSE, Windows HPC
File system: Optional, with 16 1 TB nearline SAS drives with RAID6 in a separate x3650 M4 server, connected to the compute nodes over GbE
Job management: Platform LSF for Linux; Microsoft HPC Scheduler for Windows
Cluster management: Platform HPC for Linux

Note: An alternative is x3550 M4 rack servers in a 25U rack if more than two drives are required for local storage.
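As a sketch of how a distributed ANSYS run might be handed to Platform LSF, the helper below assembles a `bsub` command line. The solver executable name (`ansys140`), the queue name and the exact flag set are illustrative assumptions; consult your site's ANSYS and LSF documentation for the actual invocation.

```python
def lsf_submit_command(jobname, cores, input_file, output_file,
                       queue="normal", solver_cmd="ansys140"):
    """Assemble a Platform LSF `bsub` command line for a distributed
    ANSYS run. Executable name, queue and flags are illustrative:
    -J names the job, -q picks the queue, -n reserves cores, and the
    solver's -dis/-np flags request a distributed run on those cores."""
    return (f"bsub -J {jobname} -q {queue} -n {cores} "
            f"{solver_cmd} -dis -np {cores} "
            f"-i {input_file} -o {output_file}")

cmd = lsf_submit_command("bracket", 32, "bracket.dat", "bracket.out")
print(cmd)
# bsub -J bracket -q normal -n 32 ansys140 -dis -np 32 -i bracket.dat -o bracket.out
```

In a Platform HPC environment, application templates typically generate submissions like this automatically, so end users rarely need to write the command line by hand.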

Type: 2-socket-based system with GPU
Usage: Several simultaneous single-node (max. 16 cores) jobs, each up to 10M DOF
System: IBM System x 25U rack with dx360 M4 servers
Cluster size: Up to 10 dx360 M4 CPU+GPU servers
Processor: Intel Xeon E5-2670 2.6 GHz, 8 cores/processor
Server: Each dx360 M4 server has two processors (16 cores)
Memory: 128 GB for in-core
Local scratch storage: 1 nearline SAS HDD 10K RPM 600 GB for in-core
Network: GbE for cluster management
GPU: NVIDIA Tesla M2090 GPU in each server
OS: Red Hat, SUSE, Windows HPC
File system: Optional, with 16 1 TB nearline SAS drives with RAID6 in a separate x3650 M4 server, connected to the compute nodes over GbE
Job management: Platform LSF for Linux; Microsoft HPC Scheduler for Windows
Cluster management: Platform HPC for Linux

Best practices
Choosing the right HPC resources requires an understanding of how key technologies determine application performance. In this section, ANSYS and IBM share conclusions about how to optimize the performance of ANSYS structural mechanics software on available hardware technologies. These recommendations are only guidelines. Please contact your IBM sales representative or IBM Authorized Business Partner to evaluate your specific requirements and design a total solution that best meets the needs of your ANSYS application.

Systems
• 2-socket-based systems
  – With GPU support: dx360 M4
  – Without GPU support: PureFlex x240, x3550 M4

Processor
• 2-socket systems: Xeon E5-2670 2.6 GHz, 8-core

Memory
• Allocating sufficient memory to solve in-core improves performance significantly and should be considered first, before adding other resources such as more cores or GPUs
• Use dual-rank memory modules with 1600 MHz speed
• Use the same size DIMM
• Populate all memory channels with equal amounts of memory
  – A 2-socket system has 8 channels
• Populate the memory slots in each channel in this order:
  – First slots in all memory channels
  – Second slots in all memory channels

Recommended memory configurations (total memory per node, 2-socket systems):
64 GB: 8 x 8 GB DIMMs
128 GB: 16 x 8 GB DIMMs
256 GB: 16 x 16 GB DIMMs

GPU Accelerators
• Sparse and PCG solvers with dense, block-like models should benefit from GPU
• For the PCG solver, disable the memory-saving option (MSAV) for GPU use
• For the sparse solver, allocate sufficient memory to solve in-core and avoid inefficient utilization of ANSYS due to I/O, which can affect both CPU-only and CPU-GPU use
• Where appropriate, explore usage of GPUs before adding more servers
• IBM systems that are enabled for GPU usage:
  – IBM System dx360 M4 with up to two NVIDIA M2090 GPUs

Cluster Interconnect
• No high-speed communication network is needed for jobs that run in a single node
  – Gigabit Ethernet is sufficient to manage the cluster
• A high-speed network is needed when a single job is run on multiple nodes
  – 4X QDR InfiniBand is recommended
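The GPU guidance in this brief reduces to a few checks that can be expressed as a small decision helper. This is an illustrative sketch of the stated rules (accelerated solver types, the roughly 500K-DOF threshold, and an in-core solve for the sparse solver), not an official ANSYS sizing tool; the function name and interface are assumptions.

```python
def gpu_likely_helps(solver, dof, in_core=True):
    """Apply the brief's GPU guidelines: only the direct sparse and
    iterative PCG solvers are accelerated, the model must carry enough
    work to keep the GPU busy, and a sparse solve should be in-core."""
    if solver not in ("sparse", "pcg"):
        return False   # other solvers are not GPU-accelerated here
    if dof <= 500_000:
        return False   # not enough work to keep a GPU busy
    if solver == "sparse" and not in_core:
        return False   # out-of-core I/O hurts both CPU-only and CPU+GPU runs
    return True

print(gpu_likely_helps("sparse", 2_100_000))                  # True
print(gpu_likely_helps("pcg", 300_000))                       # False: too small
print(gpu_likely_helps("sparse", 2_100_000, in_core=False))   # False: I/O-bound
```

Encoding the rules this way also reflects the brief's ordering advice: fix memory first (solve in-core), then consider GPUs, and only then add servers.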


Data/Disk Management
• If the model is solved out-of-core due to insufficient memory resulting from cost/configuration restrictions, consider:
  – Adding 2 to 4 SSD disks at RAID0 level to each node
  – Adding 4 to 8 10K RPM SAS disks at RAID0 level to each node
• For better performance, use a local file system for I/O during the solver phase
• For simulation data management, consider:
  – An entry-level x3650 M4-based file server with 16 internal 1 TB SAS drives for small clusters
  – IBM V7000 Unified for more complex and high-volume simulation model data management

Job Management
• For Linux systems: use Platform LSF
• For Windows systems: use Microsoft HPC Job Scheduler

Cluster Management
• The Platform HPC suite is recommended for the overall management of cluster resources, such as node installations, user management and system updates

IBM System x Product Portfolio for ANSYS Customers
Because ANSYS software is IT resource-intensive, it requires innovative HPC platforms to maximize efficiency and deliver the results engineers expect. IBM offers an end-to-end portfolio of HPC systems, storage and cluster management software to satisfy all of these requirements. Plus, IBM provides end-to-end support and services for all of these offerings. By partnering with IBM, ANSYS users can choose a single, reliable source to implement world-class structural mechanics solutions that optimize product integrity.

The centerpiece of the IBM portfolio is IBM Intelligent Cluster™, a factory-integrated and tested solution built on IBM PureFlex®, IBM System x iDataPlex® or IBM System x® rack servers. Each of these products is built using the latest Intel Xeon E5 processor technology and the latest DIMM technology, which offers speeds up to 1600 MHz.

Figure: IBM technical computing portfolio for computer aided engineering (CAE). Powerful. Comprehensive. Intuitive. The diagram shows three layers:
• Solutions: integrated solutions, industry solutions, Intelligent Cluster, HPC cloud
• Software: Platform LSF, Platform HPC, Platform MPI, GPFS, Platform Cluster Manager, Platform Application Center
• Systems and storage: iDataPlex, PureFlex, System x, BladeCenter, DS3500, DCS3700, V7K Unified, SONAS


IBM PureFlex
PureFlex, the most recent addition to the IBM server portfolio, is a 10U chassis complete with integrated servers, storage and networking. The enhanced thermal packaging in PureFlex allows processors to operate at faster clock speeds and the nodes to accommodate a larger memory footprint. In addition, PureFlex accommodates several networking switches to integrate into the data center. Each chassis can accommodate up to 14 2-socket servers. Relevant features include:

x240
Density: Up to 14 servers in a PureFlex chassis
Processor: Intel Xeon E5-2600, 8-core, 2-socket (max. clock of 2.9 GHz)
Max. cores/server: 16
GPU support: No
Max. memory speed: 1600 MHz
Max. memory at max. memory speed: 256 GB
Network: GbE, 10 GbE, 4X QDR InfiniBand
Max. SSD drives: 2 2.5" SATA
Max. hard disk drives: 2 2.5" SAS (10K RPM, 15K RPM)

IBM System x iDataPlex
The IBM System x iDataPlex system offers the ultimate in processor density. Each rack offers 84U of space, accommodating up to 84 dx360 M4 (1U) servers. The dx360 M4 server can also fit standard 25U and 42U racks. The dx360 M4 product offers a choice of NVIDIA Tesla M2075 or M2090 GPUs built around NVIDIA GPU technology. Relevant features include:

dx360 M4
Density: Up to 84 servers in an iDataPlex rack, or 25 servers in a 25U System x standard rack
Processor: Intel Xeon E5-2600, 2-socket (max. clock of 2.7 GHz)
Max. cores/server: 16
GPU: Up to 2 NVIDIA Tesla M2090 cards
Max. memory speed: 1600 MHz
Max. memory at max. memory speed: 128 GB
Network: GbE, 10 GbE, InfiniBand 4X QDR
Max. SSD drives: 2 2.5" SATA (4 drives when a GPU is added)
Max. hard disk drives: 2 2.5" SAS (10K RPM, 15K RPM) (4 drives when a GPU is added)

IBM System x rack servers
IBM System x rack servers include the x3550 M4. Relevant features include:

x3550 M4
Density: Up to 20 servers in a 25U System x standard rack
Processor: Intel Xeon E5-2600, 2-socket (max. clock of 2.9 GHz)
Max. cores/server: 16
GPU: No
Max. memory speed: 1600 MHz
Max. memory at max. memory speed: 256 GB
Network: GbE, 10 GbE, InfiniBand 4X QDR
Max. SSD drives: 8 2.5" SATA
Max. hard disk drives: 8 2.5" SAS

ANSYS and IBM: high confidence
With innovative ANSYS software driven by high-performance IBM hardware platforms, the joint value of this solution for design engineers and analysts is considerable. It brings together a leader in structural mechanics with a proven provider of the solutions required to manage highly specialized computing workloads.

ANSYS
ANSYS is a trusted supplier of engineering simulation software that enables product development organizations to confidently predict how their products will operate in the real world. To add even more value, ANSYS partners with IBM to ensure that users get the coordinated, expert support needed at all phases of HPC deployment. From system specification to installation, tuning, troubleshooting and maintenance, ANSYS and IBM can help minimize risk and increase productivity.

IBM
Companies can rely on IBM as a single, expert supplier for the entire technical computing system, including servers, software, storage and networking. A single point of contact makes it easier for companies to select and purchase a solution that meets specific needs and delivers immediate business benefits. IBM knows technical computing from decades of experience, and offers services, training and financing, along with a large network of IBM Business Partners that are ready to assist.

For more information
To learn more about ANSYS, please visit: www.ansys.com

To learn more about the IBM technical computing portfolio, contact your IBM sales representative or an IBM Authorized Business Partner, or visit: ibm.com/technicalcomputing

To learn more about NVIDIA's GPU computing solutions for ANSYS, please visit: www.nvidia.com/ansys

To learn more about Mellanox InfiniBand solutions for IBM servers, please visit: mellanox.com/content/pages.php?pg=ibm&menu_section=54

To learn more about Intel InfiniBand solutions, please visit: www.intel.com/content/www/us/en/infiniband/truescale-infiniband.html

© Copyright IBM Corporation 2011

IBM
Route 100
Somers, NY 10589
U.S.A.

Produced in the United States of America
October 2012
All Rights Reserved

IBM, the IBM logo, ibm.com, BladeCenter, iDataPlex, Intelligent Cluster, PureFlex and System x are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or ™), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at "Copyright and trademark information" at: ibm.com/legal/copytrade.shtml

Other product, company or service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.

Please Recycle

TSS03116-USEN-00
