Combustion and Flame, Vol.162, No.4, 1375-1394, 2015
A methodology for the integration of stiff chemical kinetics on GPUs
Numerical schemes for reacting flows typically invoke the method of fractional steps in order to isolate the chemical kinetics model from diffusion/convection phenomena. The reaction fractional step then requires the solution of a collection of independent ODE systems, which may be severely stiff. Recently, researchers have begun to exploit the highly parallel structure of graphics processing units (GPUs) in order to accelerate integration schemes for these ODE systems. However, much of the existing work concentrates on explicit integration algorithms, which may fall short in the presence of stiffness. In this light, we have carefully reimplemented in OpenCL C the Fortran 77 program Radau5 by Hairer and Wanner (1991), a 3-stage, 5th-order implicit Runge-Kutta method, and tested it extensively in the context of a transient equilibrium scheme for the flamelet model. Our implementation can easily be integrated with any existing reactive flow software in order to solve the reaction fractional step on an OpenCL-enabled GPU. Moreover, it is suited for any Chemkin-format reaction mechanism with up to roughly 200 species without incurring a loss in occupancy, and it reaches its limit speedup (which is largely independent of the mechanism size) at a small problem size (approximately 500 ODE systems). In view of memory constraints, we include an optimized scheme for splitting the ODE systems across several kernel invocations and overlapping the kernel execution with data transfers. An in-depth evaluation is based upon runtime measurements of the CPU and the GPU implementation on both a user-level and a high-end CPU/GPU for an increasing number of ODE systems, reduced and detailed reaction mechanisms, and a range of time step sizes. © 2014 The Combustion Institute. Published by Elsevier Inc. All rights reserved.
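
The idea of splitting the ODE systems across several kernel invocations and overlapping execution with data transfers can be illustrated with a minimal host-side sketch in Python. It is not the scheme from the paper, whose details the abstract does not give: the function names `plan_chunks` and `pipeline_steps`, the `bytes_per_system` estimate, and the simple three-stage upload/compute/download pipeline are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def plan_chunks(n_systems: int, bytes_per_system: int,
                memory_budget: int) -> List[Tuple[int, int]]:
    """Split n_systems independent ODE systems into contiguous index
    ranges [start, stop) that each fit into memory_budget bytes.

    bytes_per_system is a hypothetical per-system memory estimate
    (state vector, Jacobian, work arrays); it is an input here, not
    something derived from the paper.
    """
    if n_systems < 0:
        raise ValueError("n_systems must be non-negative")
    if bytes_per_system <= 0 or memory_budget < bytes_per_system:
        raise ValueError("budget must hold at least one system")
    max_per_chunk = memory_budget // bytes_per_system
    n_chunks = -(-n_systems // max_per_chunk)  # ceiling division
    chunks = []
    start = 0
    for i in range(n_chunks):
        # Spread the remainder so chunk sizes differ by at most one.
        size = n_systems // n_chunks + (1 if i < n_systems % n_chunks else 0)
        chunks.append((start, start + size))
        start += size
    return chunks


Step = Tuple[Optional[int], Optional[int], Optional[int]]


def pipeline_steps(n_chunks: int) -> List[Step]:
    """Return (upload, compute, download) chunk indices per pipeline step.

    In step k, chunk k is uploaded while chunk k-1 is integrated and the
    results of chunk k-2 are downloaded; None marks an idle stage.
    """
    if n_chunks <= 0:
        return []
    steps = []
    for k in range(n_chunks + 2):
        upload = k if k < n_chunks else None
        compute = k - 1 if 0 <= k - 1 < n_chunks else None
        download = k - 2 if 0 <= k - 2 < n_chunks else None
        steps.append((upload, compute, download))
    return steps


if __name__ == "__main__":
    chunks = plan_chunks(n_systems=10, bytes_per_system=100, memory_budget=400)
    print(chunks)
    for step in pipeline_steps(len(chunks)):
        print(step)
```

In an actual OpenCL host program, each stage would presumably map to a non-blocking buffer write, a kernel launch, and a non-blocking buffer read, ordered by events so that the transfers of neighbouring chunks run concurrently with the kernel.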