JMAG Newsletter March 2014, Product Report

JMAG Achieves High-Speed Calculation

JSOL continues its daily efforts to develop new technologies that further accelerate the JMAG solver. This issue focuses on the performance of the JMAG high parallel solver and the GPU solver, both of which offer exceptional calculation speed. It is a must-see for anyone who wants to calculate large-scale models quickly.

Introduction
The time allowed for electrical equipment design grows shorter every year. One important way to shorten design time is to use Computer Aided Engineering (CAE). Electromagnetic CAE often relies on Finite Element Analysis (FEA), which JMAG also employs. As a result, there is strong demand to cut FEA computation time so that the overall design period can be shortened.
JMAG has also introduced high-speed computation technology to reduce analysis time. This issue describes the performance of the JMAG high parallel solver and the GPU solver, both newly implemented in JMAG-Designer Ver.13. It also explains points to note when selecting hardware.

JMAG High Parallel Solver
Until now, JMAG users have used the SMP parallel solver, which makes effective use of a multicore CPU (Central Processing Unit) within a single computer.
However, the SMP solver supports a maximum of 8 parallels, so many users have been hoping for a more highly parallel solver that would reduce calculation times further. To meet this demand, JSOL developed the JMAG High Parallel Solver (hereafter called the MPP Solver). It achieves high-speed computation on a cluster system in which multiple computers (hereafter called nodes) are connected by a high-speed network. The solver uses the multiple cores of each CPU as well as the multiple CPUs in the cluster, which raises the degree of parallelism and increases calculation speed.

How to Use the JMAG MPP Solver
Before using the solver, you need to select appropriate hardware and configure MPI (Message Passing Interface), in addition to making the JMAG settings. This section describes the settings in JMAG, the key factors in selecting hardware, and the supported OS and MPI.
Settings in JMAG
Clicking the [Solver] tab in the study properties of a JMAG-Designer magnetic field analysis displays the solver control. Select [Distributed Memory Multiprocessing (DMP)] as the parallel calculation type and set the degree of parallelism (Fig. 1). Then run the analysis.
Note: In JMAG-Designer Ver.13, run the analysis from your job scheduler or from the command line. A sample shell script is provided for running from the command line. In JMAG-Designer Ver.13.1 and later, an analysis can also be started from the JMAG Scheduler.
Fig. 1 MPP Solver Settings
License
This solver requires a dedicated license (MPS license) instead of the license used by the conventional SMP parallel solver. The solver consumes 2 licenses for a degree of parallelism of 16 or less, 3 licenses for 17 to 32, and 4 licenses for 33 to 64. If you are interested in greater parallelism, please contact us.
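The license tiers above can be expressed as a small lookup. The following Python sketch is illustrative only. `mps_licenses` is a hypothetical helper, not part of JMAG, and it encodes only the tiers stated here (up to 64 parallels).

```python
def mps_licenses(parallelism: int) -> int:
    """Number of MPS licenses consumed for a given degree of parallelism.

    Hypothetical helper: encodes only the tiers described in this article.
    """
    if parallelism < 1:
        raise ValueError("degree of parallelism must be at least 1")
    if parallelism <= 16:
        return 2
    if parallelism <= 32:
        return 3
    if parallelism <= 64:
        return 4
    # Above 64 parallels the article asks users to contact JSOL.
    raise ValueError("more than 64 parallels: contact JSOL about licensing")


if __name__ == "__main__":
    for p in (8, 16, 17, 32, 33, 64):
        print(f"{p:>2} parallels -> {mps_licenses(p)} MPS licenses")
```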
Key Factors for Selecting Hardware and Supported OS and MPI
The hardware you select is also vital to obtaining high parallel performance with the JMAG MPP solver.
First, select a CPU with high memory bus performance for each node, such as the Intel® Xeon® E5 series or later. Filling all memory slots with physical memory also improves parallel performance (Fig. 2).
Calculation using multiple cores within a single node is possible, but multiple nodes are recommended for high-parallel calculation. In that case, use InfiniBand for the network between nodes.
The supported operating systems are listed below. All are 64-bit.
Windows
- Microsoft Windows 7
- Microsoft Windows HPC Server 2008 R2
Linux
- Red Hat Enterprise Linux 5 and 6
Fig. 2 Examples of Hardware Configuration (Top: good example with all memory slots filled; Bottom: bad example with memory slots left empty)

MPP Solver Performance
Calculation speed evaluation
This section describes the speedup achieved with the JMAG MPP solver. Table 1 shows the specifications of the hardware used in the tests.
Table 1 Hardware Specifications
CPU | Intel® Xeon® E5-2670
Clock frequency (GHz) | 2.6
Number of cores / processor | 8
Number of processors / node | 2
Memory (GB) | 32
Number of nodes | 16
Network | InfiniBand (QDR)
Transient Response Analysis of Embedded Type PM Motors
We ran a transient response analysis over one period of electrical angle for a large-scale 3D PM synchronous motor (Fig. 3, approx. 2.06 million elements). The analysis took only 2.5 hours with 32 parallels and 1 hour 45 min. with 64 parallels (Fig. 4). These times are 13 times and 20 times faster, respectively, than conventional non-parallel computing.
Fig. 5 shows the calculated cogging torque history. High-parallel computing produced the same results as non-parallel computing.
Fig. 3 Embedded Type PM Motor Model
Fig. 4 Analysis Time (Embedded Type PM Motor)
Fig. 5 Cogging Torque
Bus Bar Frequency Response Analysis
A frequency response analysis was run for a large-scale 3D bus bar (approx. 2.42 million elements) (Fig. 6). Non-parallel processing required approx. 60 min. of analysis time. With 32 and 64 parallels, only approx. 6.4 min. and 4.6 min. were needed, respectively (Fig. 7). Fig. 8 shows the calculated current density distribution. High-parallel and non-parallel processing produced the same results.
Fig. 6 Bus Bar Model
Fig. 7 Analysis Time (bus bar)
Fig. 8 Current Density Distribution
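As a rough check on the MPP results above, three quantities can be computed from the reported figures. The speedup is S = T_serial / T_parallel. The parallel efficiency is E = S / p. The Karp-Flatt metric, e = (1/S - 1/p) / (1 - 1/p), estimates the effective serial fraction. This Python sketch uses only numbers quoted in this article. The PM motor speedups are the rounded values given in the text, so the derived figures are approximate; the helper names are hypothetical.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(s: float, p: int) -> float:
    """Parallel efficiency E = S / p."""
    return s / p


def karp_flatt(s: float, p: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / s - 1.0 / p) / (1.0 - 1.0 / p)


# Bus bar: non-parallel approx. 60 min; approx. 6.4 min (32) and 4.6 min (64).
T_SERIAL_BUS_BAR = 60.0
BUS_BAR_TIMES = {32: 6.4, 64: 4.6}

# Embedded type PM motor: speedups stated directly in the text (rounded).
PM_MOTOR_SPEEDUPS = {32: 13.0, 64: 20.0}

if __name__ == "__main__":
    for p, t in BUS_BAR_TIMES.items():
        s = speedup(T_SERIAL_BUS_BAR, t)
        print(f"bus bar  p={p}: S={s:.1f}, E={efficiency(s, p):.2f}, e={karp_flatt(s, p):.3f}")
    for p, s in PM_MOTOR_SPEEDUPS.items():
        print(f"PM motor p={p}: S={s:.1f}, E={efficiency(s, p):.2f}, e={karp_flatt(s, p):.3f}")
```

The derived serial fractions fall roughly between 3 and 8 percent. This suggests that the non-parallel work and parallel overhead, rather than the parallelized part, limit scaling at these degrees of parallelism.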

JMAG GPU Solver
In recent years, GPU (Graphics Processing Unit) performance has improved greatly. GPUs have far more cores than CPUs and are highly effective at parallel processing.
Because of this strength in parallel processing, GPUs are now used not only for conventional image processing but also as arithmetic units in supercomputers. GPUs have also attracted a lot of attention in the CAE field. GPGPU (General-purpose computing on graphics processing units), which uses GPUs for general-purpose tasks including numerical calculation, has been gaining popularity. We recognized the potential of GPGPU early and have continued development since we first released a GPU solver in 2012.

How to Use the JMAG GPU Solver
Settings in JMAG
The JMAG GPU solver is easy to use. Click the [Solver] tab in [Study Properties] to display [Solver Control], then simply select the [Use GPU] checkbox.
Multi-GPU
The JMAG GPU solver also supports multiple GPUs and accelerates calculation by using them simultaneously. If your machine has multiple computing GPUs, you can specify how many to use at the same time. For example, to use two GPUs, select the [Shared Memory Multiprocessing (SMP)] radio button for parallel computing and set the degree of parallelism to "2" (Fig. 9). Two, four, or eight GPUs can be used simultaneously.
Fig. 9 Setting GPU Solver
License
A Parallel Accelerator 2 (hereafter called PA2) license must be installed to use the GPU solver. The number of licenses required is the number of GPUs plus one. For example, using one GPU requires two PA2 licenses, and using two GPUs requires three.
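The license rule and the supported GPU counts can be checked with a short sketch. `pa2_licenses` is a hypothetical helper written only for illustration. It assumes that either one GPU (enabled with the [Use GPU] checkbox alone) or 2, 4, or 8 GPUs in SMP mode are used, as described above.

```python
# One GPU, or 2, 4 or 8 GPUs used simultaneously (assumption based on the text above).
SUPPORTED_GPU_COUNTS = (1, 2, 4, 8)


def pa2_licenses(num_gpus: int) -> int:
    """PA2 licenses required: the number of GPUs plus one (hypothetical helper)."""
    if num_gpus not in SUPPORTED_GPU_COUNTS:
        raise ValueError(
            f"unsupported GPU count {num_gpus}; use one of {SUPPORTED_GPU_COUNTS}"
        )
    return num_gpus + 1


if __name__ == "__main__":
    for n in SUPPORTED_GPU_COUNTS:
        print(f"{n} GPU(s) -> {pa2_licenses(n)} PA2 licenses")
```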
Hardware Environment
The JMAG GPU solver supports only NVIDIA GPUs designed for computation. The following GPUs are currently supported:
- Tesla K40
- Quadro K6000
- Tesla K20
- Quadro 6000
- Tesla C2075
- Tesla C2070
- Tesla C2050
We recommend using the latest GPU possible to get the most out of the GPU solver.
64-bit Windows is the only platform supported by the JMAG GPU solver.
The GPU solver supports static and transient response magnetic field analyses.
Recommended Calculation Target
In the JMAG GPU solver, the GPU and CPU communicate continuously. For a 2D or 3D mesh model with only several tens of thousands of elements, the calculation itself takes little time, so GPU-CPU communication can become the bottleneck. We therefore strongly recommend using the GPU solver with mesh models as large as possible, with more than 1 million elements.
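A toy cost model illustrates why small models gain little. In this model, each step pays a fixed GPU-CPU communication cost on top of the accelerated computation. This Python sketch is not a model of the JMAG solver. The per-element time, the acceleration factor, and the fixed overhead are hypothetical values chosen only to show the trend.

```python
def toy_gpu_speedup(n_elements: int,
                    cpu_seconds_per_element: float = 1e-6,  # hypothetical
                    gpu_acceleration: float = 20.0,         # hypothetical
                    fixed_overhead_seconds: float = 0.01    # hypothetical
                    ) -> float:
    """Speedup of a hypothetical GPU step over a CPU step.

    The GPU divides the compute time by gpu_acceleration but adds a fixed
    GPU-CPU communication cost that does not shrink with model size.
    """
    t_cpu = cpu_seconds_per_element * n_elements
    t_gpu = t_cpu / gpu_acceleration + fixed_overhead_seconds
    return t_cpu / t_gpu


if __name__ == "__main__":
    for n in (30_000, 1_000_000, 10_000_000):
        print(f"{n:>10,d} elements: speedup {toy_gpu_speedup(n):.1f}x")
```

With these assumed values, the speedup rises from about 2.6x for 30,000 elements to about 19.6x for 10 million. It approaches `gpu_acceleration` as the fixed overhead becomes negligible relative to the computation.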
The JMAG GPU solver also uses special numerical solution methods, so models that include circuits may converge poorly. We recommend condition settings that do not use circuits, for example using only current conditions.

GPU Solver Performance
Calculation speed evaluation
This section describes case studies that evaluate the JMAG GPU solver on NVIDIA's Tesla K40, the latest computing GPU at the time of writing.
JMAG spends most of its calculation time on solution-finding, that is, on iteratively solving the linear equations obtained by the finite element method. For large-scale mesh models with millions of elements in particular, solution-finding accounts for a large proportion of the processing time. The JMAG GPU solver uses the GPU to accelerate this processing. This section compares the processing time required for solution-finding with the GPU solver and with the JMAG shared memory multiprocessing parallel solver. Table 2 shows the specifications of the CPU and GPU used.
Table 2 Hardware Specifications
Item | CPU | GPU
Hardware | Intel® Xeon® X5670 | NVIDIA® Tesla® K40
Clock frequency (GHz) | 2.93 | 0.745
Number of cores | 12 (2 CPUs) | 2880 (1 GPU)
Memory (GB) | 24 | 12
Memory bandwidth (GB/s) | 32 | 288
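A short example helps make "iteratively solving the linear equations" concrete. The following Python sketch applies the standard conjugate gradient (CG) method to a small symmetric positive definite system, the 1D stiffness matrix tridiag(-1, 2, -1). CG is a generic textbook method used here only for illustration; it is not necessarily the method JMAG uses. Each CG iteration is dominated by one sparse matrix-vector product and a few vector operations. These operations are highly parallel and typically limited by memory bandwidth, the row in which the GPU in Table 2 differs most from the CPU.

```python
import math


def matvec_1d_stiffness(x):
    """y = K x for the 1D stiffness matrix K = tridiag(-1, 2, -1)."""
    n = len(x)
    y = [0.0] * n
    for i in range(n):
        y[i] = 2.0 * x[i]
        if i > 0:
            y[i] -= x[i - 1]
        if i < n - 1:
            y[i] -= x[i + 1]
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A, given only x -> A x."""
    x = [0.0] * len(b)
    r = b[:]              # residual b - A x, with initial guess x = 0
    p = r[:]              # search direction
    rs_old = dot(r, r)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for k in range(max_iter):
        if math.sqrt(rs_old) <= tol * b_norm:   # relative residual test
            return x, k
        ap = matvec(p)                          # the costly sparse product
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, max_iter


if __name__ == "__main__":
    n = 50
    b = [1.0] * n
    x, iterations = conjugate_gradient(matvec_1d_stiffness, b)
    residual = max(abs(bi - yi) for bi, yi in zip(b, matvec_1d_stiffness(x)))
    print(f"converged in {iterations} iterations, max residual {residual:.2e}")
```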
Static Magnetic Field Analysis of Embedded Type Permanent Magnet Motors
Fig. 10 shows the total processing time spent on solution-finding in each analysis step of a static magnetic field analysis. The model is an embedded type PM motor with 4 poles and 24 slots (Fig. 3, 1/8 partial model) and has approx. 2 million elements. The calculation completed in 14.4 min. with a single GPU and in only 7.7 min. with two GPUs. One GPU board was approx. 30 times faster than a single CPU, and two GPU boards were approx. 1.9 times faster than one.
Fig. 10 Analysis Time (Embedded Type PM Motors)
Linear Motor Static Magnetic Field Analysis
Fig. 12 shows the total processing time spent on solution-finding in each analysis step of a static magnetic field analysis of a linear motor model (Fig. 11, 1/2 partial model). The model has approx. 7.5 million elements. The calculation took 7.5 min. with a single GPU and only 4.2 min. with two GPUs. One GPU board was approx. 20 times faster than a single CPU core, and two GPU boards were approx. 1.8 times faster than one.
Fig. 11 Linear Motor Model
Fig. 12 Analysis Time (Linear Motors)
Induction Motor Transient Response Magnetic Field Analysis
Finally, Fig. 14 shows the total processing time spent on solution-finding in each analysis step of a transient response magnetic field analysis of an induction motor model with rotor skew (Fig. 13, 1/2 partial model). The model has approx. 9 million elements. The GPU memory of the Tesla K40 has been increased to 12 GB, which makes such a large-scale calculation possible on a single GPU board. The calculation required 71.3 min. with a single GPU board and 34.5 min. with two. A single CPU core needs 965.3 min. (approx. 16.1 hours), so the GPU calculation is extremely fast by comparison. A single GPU board was approx. 14 times faster than a single CPU core, and two GPU boards were approx. 2.0 times faster than one.
Fig. 13 Induction Motor Model with Rotor Skew
Fig. 14 Analysis Time (Induction Motors)
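The speedup ratios in the three GPU case studies follow directly from the reported solution-finding times. This Python sketch recomputes them using only the times quoted above. The dictionary layout and the `ratio` helper are illustrative.

```python
# Solution-finding times in minutes, as reported in the GPU case studies above.
CASES = {
    "Embedded type PM motor (approx. 2M elements)": {"1 GPU": 14.4, "2 GPUs": 7.7},
    "Linear motor (approx. 7.5M elements)": {"1 GPU": 7.5, "2 GPUs": 4.2},
    "Induction motor (approx. 9M elements)": {
        "1 GPU": 71.3, "2 GPUs": 34.5, "1 CPU core": 965.3,
    },
}


def ratio(slow_minutes: float, fast_minutes: float) -> float:
    """How many times faster the second configuration is than the first."""
    return slow_minutes / fast_minutes


if __name__ == "__main__":
    for name, t in CASES.items():
        s2 = ratio(t["1 GPU"], t["2 GPUs"])
        line = f"{name}: 2 GPUs vs 1 GPU = {s2:.2f}x (efficiency {s2 / 2:.0%})"
        if "1 CPU core" in t:
            line += f", 1 GPU vs 1 CPU core = {ratio(t['1 CPU core'], t['1 GPU']):.1f}x"
        print(line)
```

The computed ratios for two GPUs versus one are about 1.87, 1.79, and 2.07. The ratio for one GPU versus one CPU core in the induction motor case is about 13.5. All are consistent with the rounded values quoted in the text.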

In closing
This issue described how to use the JMAG MPP solver and GPU solver, and showed how much they reduce calculation time.
For the MPP solver, we will continue to pursue further performance improvements by shortening the processing time of the non-parallelized parts. The GPU solver will support frequency response magnetic field analysis in the near future, covering a wider range of analysis types.
Please give the JMAG MPP solver a try, and see the performance of the GPU solver for yourself.
(Masahiko Miwa, Kazuki Senba)




