JMAG Newsletter March, 2014: Product Report


JMAG Achieves High-Speed Calculation

JSOL continues its daily efforts to develop new technologies that further accelerate the JMAG solvers.
This issue focuses on the performance of the JMAG high parallel solver and the GPU solver, both of which boast exceptional calculation speed. It is a must-read for anyone who needs to calculate large-scale models quickly.

Introduction

The time allowed for electrical equipment design becomes shorter every year. An important countermeasure is the use of Computer Aided Engineering (CAE). Electromagnetic CAE commonly relies on Finite Element Analysis (FEA), which JMAG also employs, so reducing FEA time is essential to achieving shorter design periods.
JMAG therefore provides high-speed computation technologies to reduce analysis time. This issue describes the performance of the JMAG high parallel solver and the GPU solver newly implemented in JMAG-Designer Ver.13, and explains points to note when selecting hardware.

JMAG High Parallel Solver

Up to now, JMAG users have relied on the SMP parallel solver, which effectively utilizes the multiple cores of a CPU (Central Processing Unit) within a single computer.
However, because the SMP solver supports at most 8 parallel processes, many users have been asking for a solver with higher parallelism to further reduce calculation time. To meet this demand, JSOL developed the JMAG High Parallel Solver (hereafter called the MPP solver), which achieves high-speed computation on a cluster system of multiple computers (hereafter called nodes) connected by a high-speed network. This solver can use multiple cores within a CPU as well as multiple CPUs across the cluster, achieving a higher degree of parallelism and faster calculation.

How to Use JMAG MPP Solver

Before using the solver, appropriate hardware must be selected and MPI (Message Passing Interface) must be configured in addition to the JMAG settings. This section describes the settings in JMAG, key factors when selecting hardware, and the supported operating systems and MPI.

Settings in JMAG

When you click the [Solver] tab in the study properties of a JMAG-Designer magnetic field analysis, the solver control appears. Select [Distributed Memory Multiprocessing (DMP)] as the parallel calculation type and set the degree of parallelism (Fig. 1). Then run the analysis.

Note: In JMAG-Designer Ver.13, execute the analysis from your job scheduler or from the command line. A sample shell script is provided for users who run from the command line; a generic sketch of such a launch is shown after Fig. 1.
In JMAG-Designer Ver.13.1 and later versions, the analysis can also be started from the JMAG Scheduler.

Fig. 1 MPP Solver Settings
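For users who run from the command line, the following minimal sketch shows what such an MPI launch can look like in general. It assumes an Intel MPI / MPICH style mpiexec; the solver executable path, hostfile, and project file names are hypothetical placeholders, not the actual JMAG command line (refer to the sample shell script provided by JSOL for the real invocation).

  import subprocess

  # Hypothetical MPI launch of a distributed-memory solver run.
  # All paths and file names below are placeholders for illustration only.
  num_processes = 32                   # degree of parallelism set in the study properties
  hostfile = "hosts.txt"               # cluster nodes, one host name per line
  solver_exe = "/opt/jmag/mpp_solver"  # placeholder path to the MPP solver executable
  project = "motor_model.jproj"        # placeholder JMAG project file

  # Intel MPI / MPICH use "-f" for the hostfile; Open MPI uses "-hostfile" instead.
  cmd = ["mpiexec", "-n", str(num_processes), "-f", hostfile, solver_exe, project]
  subprocess.run(cmd, check=True)      # raises CalledProcessError if the solver fails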

License

To use this solver, a dedicated license (MPS license) is required, separate from the license for the conventional SMP parallel solver. When the degree of parallelism is 16 or less, from 17 to 32, or from 33 to 64, the solver uses 2, 3, or 4 licenses, respectively. If you are interested in greater parallelism, please contact us.
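As a quick reference, these tiers can be expressed as a small helper function. This is only an illustrative sketch of the rule described above, not an official JSOL tool.

  def mps_licenses(parallelism: int) -> int:
      """Number of MPS licenses consumed for a given degree of parallelism,
      following the license tiers described in the text (illustrative only)."""
      if parallelism <= 0:
          raise ValueError("degree of parallelism must be positive")
      if parallelism <= 16:
          return 2
      if parallelism <= 32:
          return 3
      if parallelism <= 64:
          return 4
      raise ValueError("for more than 64 parallels, please contact JSOL")

  print(mps_licenses(16))  # -> 2
  print(mps_licenses(17))  # -> 3
  print(mps_licenses(64))  # -> 4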

Key Factors for Selecting Hardware and Supported OS and MPI

The hardware you select is also vital for obtaining high parallel performance with the JMAG MPP solver.
First, for each node, select a CPU with high memory bus performance, such as the Intel® Xeon® E5 series or later. Filling all memory slots with physical memory also enhances the parallel performance of the hardware (Fig. 2).
It is possible to calculate using multiple cores within a single node, but using multiple nodes is recommended for highly parallel calculation. In that case, use InfiniBand for the network between nodes.
The supported operating systems are shown below. All are 64-bit.

  • Windows
    Microsoft Windows 7
    Microsoft Windows HPC Server 2008 R2
  • Linux
    Red Hat Enterprise Linux 5, 6

Fig. 2 Examples of Hardware Structure
(Top: good example with all memory slots filled; Bottom: bad example with memory slots left empty)

MPP Solver Performance

Calculation speed evaluation

This section describes the speed improvements obtained with the JMAG MPP solver. The specifications of the hardware used in the tests are shown below (Table 1).

Table 1 Hardware Specifications
CPU: Intel® Xeon® E5-2670
Clock frequency (GHz): 2.6
Number of cores / processor: 8
Number of processors / node: 2
Memory (GB): 32
Number of nodes: 16
Network: InfiniBand (QDR)

Transient Response Analysis of Embedded Type PM Motors

We ran a transient response analysis for one period of electrical angle on a large-scale 3D PM synchronous motor model (approx. 2.06 million elements) (Fig. 3). The calculation took only 2.5 hours with 32 parallels and 1 hour 45 minutes with 64 parallels (Fig. 4), which is 13 times and 20 times faster, respectively, than conventional non-parallel computing.
The cogging torque history obtained as a calculation result is shown below (Fig. 5). It confirms that high parallel computing produces the same results as non-parallel computing.

Fig. 3 Embedded Type PM Motor Model

Fig. 4 Analysis Time (Embedded Type PM Motor)

Fig. 5 Cogging Torque

Bus Bar Frequency Response Analysis

A frequency response analysis was run on a large-scale 3D bus bar model (approx. 2.42 million elements) (Fig. 6). Non-parallel processing required approx. 60 minutes of analysis time, whereas 32 and 64 parallels needed only approx. 6.4 minutes and 4.6 minutes, respectively (Fig. 7). The current density distribution obtained as a calculation result is shown below (Fig. 8). High-parallel and non-parallel processing produced the same results. The speedup implied by these timings is worked out in the short sketch after Fig. 8.

Fig. 6 Bus bar Model

Fig. 7 Analysis Time (bus bar)

Fig. 8 Current Density Distribution
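For readers who want to translate these timings into speedup figures, the short sketch below computes the speedup and parallel efficiency implied by the bus bar times reported above. The calculation is a standard one applied to the reported numbers; it is not a JMAG output.

  # Speedup and parallel efficiency derived from the reported bus bar timings (minutes).
  serial_min = 60.0
  parallel_min = {32: 6.4, 64: 4.6}

  for parallels, minutes in parallel_min.items():
      speedup = serial_min / minutes          # how many times faster than non-parallel
      efficiency = speedup / parallels        # fraction of ideal linear scaling
      print(f"{parallels} parallels: speedup = {speedup:.1f}x, efficiency = {efficiency:.0%}")
  # 32 parallels: speedup = 9.4x, efficiency = 29%
  # 64 parallels: speedup = 13.0x, efficiency = 20%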

JMAG GPU Solver

In recent years, the performance of GPUs (Graphics Processing Units) has greatly improved. A GPU has overwhelmingly more cores than a CPU and is highly effective at parallel processing.
Because of this strength in parallel processing, GPUs are now used not only for conventional image processing but also as arithmetic devices in supercomputers. GPUs have attracted a lot of attention in the CAE field, and GPGPU (General-Purpose computing on Graphics Processing Units), which applies GPUs to general-purpose computation such as numerical calculation, has been gaining popularity. We were early to recognize GPGPU and have continued its development since first providing a GPU solver in 2012.

How to Use the JMAG GPU Solver

Settings in JMAG

The JMAG GPU solver is easy to use. Clicking the [Solver] tab in [Study Properties] displays [Solver Control]; simply selecting the [Use GPU] checkbox enables the GPU.

Multi-GPU

The JMAG GPU solver also supports multiple GPUs and accelerates calculation by using them simultaneously. If your machine has multiple GPUs for math calculation, you can specify how many are used at the same time. For instance, to use two GPUs, select [Shared Memory Multiprocessing (SMP)] with the parallel computing radio button and set the degree of parallelism to 2 (Fig. 9). Either 2, 4, or 8 GPUs can be used simultaneously.

Fig. 9 Setting GPU Solver

License

To use the GPU solver, a Parallel Accelerator 2 (hereafter called PA2) license must be installed. The number of required licenses is the number of GPUs plus one. For example, using one GPU requires two PA2 licenses, and using two GPUs requires three PA2 licenses.
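Combined with the supported simultaneous GPU counts described in the previous section, this rule can be captured in a small helper. The function below is only an illustrative sketch of the rule stated above, not an official licensing tool.

  def pa2_licenses(num_gpus: int) -> int:
      """Number of PA2 licenses required for a given GPU count (illustrative only).
      GPU counts other than 1, 2, 4, or 8 are rejected to mirror the supported
      configurations described in the text."""
      if num_gpus not in (1, 2, 4, 8):
          raise ValueError("supported GPU counts are 1, 2, 4, or 8")
      return num_gpus + 1

  print(pa2_licenses(1))  # -> 2 PA2 licenses
  print(pa2_licenses(2))  # -> 3 PA2 licenses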

Hardware Environment

The JMAG GPU solver supports only NVIDIA GPUs intended for math calculation. The following GPUs are currently supported:

  1. Tesla K40
  2. Quadro K6000
  3. Tesla K20
  4. Quadro 6000
  5. Tesla C2075
  6. Tesla C2070
  7. Tesla C2050

We recommend using as recent a GPU as possible to get the most benefit from the GPU solver.
Windows 64-bit is the only platform supported by the JMAG GPU solver.
The supported analysis types are static and transient response magnetic field analyses.

Recommended Calculation Target

In the JMAG GPU solver, the GPU and CPU communicate continuously. Therefore, when the 2D or 3D mesh model being calculated has only several tens of thousands of elements, the calculation time is short to begin with and GPU-CPU communication can become the bottleneck. When using the JMAG GPU solver, we strongly recommend a mesh model that is as large as possible, with more than 1 million elements.
The JMAG GPU solver also uses special numerical solution methods, so convergence may be poor for models that include circuits. Condition settings that do not use circuits, such as driving the model with current conditions only, are recommended. This guidance is condensed into a simple rule of thumb in the sketch below.
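The function below is only an illustrative sketch of the advice above, not a JMAG feature; the 1-million-element threshold is the recommendation stated in the text.

  def gpu_solver_recommended(num_elements: int, uses_circuit: bool) -> bool:
      """Rule-of-thumb check (illustrative only): the GPU solver pays off for
      large meshes and for condition settings without circuits."""
      return num_elements > 1_000_000 and not uses_circuit

  print(gpu_solver_recommended(2_000_000, uses_circuit=False))  # True  -> worth trying the GPU solver
  print(gpu_solver_recommended(50_000, uses_circuit=False))     # False -> communication overhead dominates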

GPU Solver Performance

Calculation speed evaluation

This section describes case studies evaluating the JMAG GPU solver using NVIDIA's Tesla K40, the latest GPU for math calculation.
JMAG spends most of its calculation time on solution-finding, that is, iteratively solving the system of linear equations obtained from the finite element method. Especially for large-scale mesh models with millions of elements, solution-finding accounts for a large proportion of the processing time. The JMAG GPU solver accelerates this processing using the GPU. This section compares the solution-finding times of the GPU solver and the JMAG shared memory multiprocessing parallel solver. The specifications of the CPU and GPU used are shown below (Table 2).

Table 2 Hardware Specifications
Hardware                 CPU: Intel® Xeon® X5670   GPU: NVIDIA® Tesla® K40
Clock frequency (GHz)    2.93                      0.745
Number of cores          12 (2 CPUs)               2880 (1 GPU)
Memory (GB)              24                        12
Memory bandwidth (GB/s)  32                        288
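Iterative sparse solvers of this kind are usually limited by memory bandwidth rather than peak arithmetic rate, so the bandwidth row of Table 2 gives a rough indication of the attainable advantage in the solve phase. The back-of-the-envelope estimate below uses only the Table 2 figures and is illustrative, not a JMAG benchmark; note that the measured speedups reported later are against a single CPU core, which does not saturate the full node bandwidth, so they can exceed this node-level ratio.

  # Rough bandwidth comparison from Table 2 (illustrative only).
  cpu_bandwidth_gb_s = 32.0    # Intel Xeon X5670 node, per Table 2
  gpu_bandwidth_gb_s = 288.0   # NVIDIA Tesla K40, per Table 2

  # For bandwidth-bound sparse iterative solvers, sustained memory bandwidth is a
  # rough upper bound on the achievable node-level speedup of the solve phase.
  print(f"bandwidth ratio: {gpu_bandwidth_gb_s / cpu_bandwidth_gb_s:.0f}x")  # -> 9x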

Static Magnetic Field Analysis of Embedded Type Permanent Magnet Motors

The following figure shows the total processing time spent on solution-finding for each analysis step when running a static magnetic field analysis of an embedded type PM motor model with 4 poles and 24 slots (Fig. 3, 1/8 partial model) (Fig. 10). This model has approx. 2 million elements. The calculation completed in 14.4 minutes with a single GPU and in only 7.7 minutes with two GPUs. One GPU board is approx. 30 times faster than a single CPU, and two GPU boards shorten the calculation time by a further factor of approx. 1.9 compared with one GPU board.

Fig. 10 Analysis Time (Embedded Type PM Motors)

Linear Motor Static Magnetic Field Analysis

The following figure shows the total processing time spent on solution-finding for each analysis step when running a static magnetic field analysis of a linear motor model (Fig. 11, 1/2 partial model) (Fig. 12). This model has approx. 7.5 million elements. The calculation took 7.5 minutes with a single GPU and only 4.2 minutes with two GPUs. One GPU board is approx. 20 times faster than a single CPU core, and two GPU boards shorten the calculation time by a further factor of approx. 1.8 compared with one GPU board.

Fig. 11 Linear Motor Model

Fig. 12 Analysis Time (Linear Motors)

Induction Motor Transient Response Magnetic Field Analysis

Finally, this section shows the total processing time spent on solution-finding for each analysis step when running a transient response magnetic field analysis of an induction motor model with rotor skew (Fig. 13, 1/2 partial model) (Fig. 14). This model has approx. 9 million elements. The GPU memory of the Tesla K40 has been boosted to 12 GB, which makes such a large-scale calculation possible on a single GPU board. A single GPU board required 71.3 minutes, while two GPU boards took 34.5 minutes. Considering that a single CPU core needs 965.3 minutes (approx. 16.1 hours) of calculation time, GPU calculation is extremely fast. A single GPU board is approx. 14 times faster than a single CPU core, and two GPU boards shorten the calculation time by a further factor of approx. 2.0 compared with one GPU board.

Fig. 13 Induction Motor Model with Rotor Skew

Fig. 14 Analysis Time (Induction Motors)

In closing

This issue described the performance and usage of the JMAG MPP solver and GPU solver, both of which greatly reduce calculation time.
For the MPP solver, we will continue to pursue further speed improvements by shortening the processing time of the non-parallelized parts. The GPU solver will support frequency response magnetic field analyses in the near future, covering a wider range of analysis types.
Please give the JMAG MPP solver a try and, at the same time, experience the performance of the GPU solver for yourself.

(Masahiko Miwa, Kazuki Senba)
