CUSF
CUSF is a math library written in CUDA that provides a subset of mathematical special functions.
Python interface for torch
(torch::Tensor
) is supported.
Installation
You must have torch
and scipy
installed in your current (virtual) python environment.
git clone https://gitlab.gbar.dtu.dk/hhbs/cusf.git
cd cusf
make
Running the examples
python3 example_bessel.py
python3 example_erf.py
python3 example_hypergeometric.py
Running all tests
make tests
Tested using HPC system modules
module purge
module load cuda/11.3
module load gcc/9.3.0-binutils-2.34
module load boost/1.73.0-gcc-9.3.0
#module load scipy/1.5.4-python-3.8.9
module load ninja/1.10.1
This can be sourced directly from the cusf/ root folder as:
source source.cusf
Status of the functions
Function | CPU status | GPU status | PyTorch |
---|---|---|---|
Hyp2f1 | Working | Crashing | Not implemented |
Hyp1f1 | Working | Working | Working |
faddeeva w | Working | Working | TBD |
erf | Working | Working | Working |
jy | Working | Working | TBD |
kv | Working | Working | TBD |
iv | Working | Working | TBD |
j0 | Working | Working | TBD |
yv | Working | Working | TBD |
jvp | Working | Working | TBD |
y0 | Working | Working | TBD |
jv | Working | Working | TBD |
k0 | Working | Working | TBD |
kvp | Working | Working | TBD |
i0 | Working | Working | Working |
ivp | Working | Working | TBD |
yvp | Working | Working | TBD |
Helper functions | CPU status | GPU status | PyTorch |
---|---|---|---|
digamma (psi) | TBD | TBD | TBD |
gamma | TBD | TBD | TBD |
tgamma1pmv | TBD | TBD | TBD |
chebyshev | TBD | TBD | TBD |
polynomial | TBD | TBD | TBD |
poly rational | TBD | TBD | TBD |
Hyp2f1
Hypergeometric function
Hyp1f1
Hypergeometric function
erf
Error function taking both complex and real inputs. If the input is real then the erf/erff functions from CUDA's and C++'s math APIs are used.
Run time and accuracy
The boost library is used as reference solution when possible and hence has zero error. In some cases other functions are used as the baseline which will be noted on the specific cases (and visible by the zero errors).
Using parallel processing when using the CPU.
Using an NVIDIA A100. CUDA Version: 11.7
Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz (OMP_NUM_THREADS=16)
The timing does not include transfer times from the CPU to GPU or vice versa.
Bessel i0
Based on 200,000,000 test cases
Sampling domain: [0, 1e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_i | 0.00000e+00 | 1097.861 ms |
std::cyl_bessel_i | 3.82249e-15 | 5597.835 ms |
gsl_sf_bessel_I0 | 9.44863e-16 | 1839.631 ms |
cusf::cpu::bessel::i0() | 6.98190e-16 | 630.067 ms |
cusf::gpu::bessel::i0() | 6.65807e-16 | 9.618 ms |
Bessel iv
Based on 20,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_i | 0.00000e+00 | 1830.353 ms |
std::cyl_bessel_i | 2.53776e-15 | 539.740 ms |
gsl_sf_bessel_Inu | 2.86962e-15 | 897.149 ms |
cusf::cpu::bessel::iv() | 3.07329e-15 | 594.268 ms |
cusf::gpu::bessel::iv() | 2.92510e-15 | 31.860 ms |
Bessel ivp
Mathematica is used to calculate the reference solutions.
Calculates the first derivative in all tests. There are 10,000 test cases sampled randomly in [0,10]x[0,10]. The runtimes have a variance due to the low number of test cases.
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_i_prime | 1.18805e-15 | 4.287 ms |
cusf::cpu::bessel::ivp() | 2.43492e-15 | 1.640 ms |
cusf::gpu::bessel::ivp() | 2.43492e-15 | 3.425 ms |
Using boost as the reference solution on 50,000,000 test cases sampled randomly in [0,10]x[0,10] gives the following.
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_i_prime | 0.00000e+00 | 8845.204 ms |
cusf::cpu::bessel::ivp() | 1.85718e-09 | 2966.189 ms |
cusf::gpu::bessel::ivp() | 1.85718e-09 | 174.116 ms |
Bessel j0
Based on 200,000,000 test cases
Sampling domain: [0, 1e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_j | 0.00000e+00 | 3142.450 ms |
std::cyl_bessel_j | 7.73255e-06 | 12141.822 ms |
gsl_sf_bessel_J0 | 3.84527e-08 | 3765.536 ms |
cusf::cpu::bessel::j0() | 3.88363e-08 | 1202.189 ms |
cusf::gpu::bessel::j0() | 3.88347e-08 | 17.563 ms |
Bessel jv
Based on 100,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_j | 0.00000e+00 | 4448.786 ms |
std::cyl_bessel_j | 2.60624e-06 | 5845.212 ms |
gsl_sf_bessel_Jnu | 1.60455e-07 | 5957.275 ms |
cusf::cpu::bessel::jv() | 1.74503e-08 | 1789.848 ms |
cusf::gpu::bessel::jv() | 1.56622e-08 | 127.943 ms |
Bessel jvp
Based on 100,000 test cases using Mathematica as the reference solution
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
Calculating the first derivative in all tests. The runtimes have a variance due to the low number of test cases.
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_j_prime | 9.50225e-09 | 14.425 ms |
cusf::cpu::bessel::jvp() | 8.96248e-09 | 6.670 ms |
cusf::gpu::bessel::jvp() | 8.96248e-09 | 3.919 ms |
Based on 100,000 test cases using Mathematica as the reference solution
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
Calculating the second derivative in all tests. The runtimes have a variance due to the low number of test cases.
Function | rel. ∞-norm | Time |
---|---|---|
cusf::cpu::bessel::jvp() | 1.38414e-12 | 13.231 ms |
cusf::gpu::bessel::jvp() | 1.38414e-12 | 3.435 ms |
Based on 50,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
Calculating the first derivative in all tests
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_j_prime | 0.00000e+00 | 4277.177 ms |
cusf::cpu::bessel::jvp() | 1.85728e-09 | 2971.017 ms |
cusf::gpu::bessel::jvp() | 1.85728e-09 | 228.466 ms |
Bessel jy
Based on 100,000,000 test cases
Sampling domain: [-5.0, 5.0] x [0.0, 1.0e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_j | 0.00000e+00 | 5522.207 ms |
cusf::cpu::bessel::jy() | 1.28028e-08 | 3021.482 ms |
cusf::gpu::bessel::jy() | 1.39442e-08 | 299.698 ms |
Bessel k0
Based on 400,000,000 test cases
Sampling domain: [0.0, 1.0e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_k | 0.00000e+00 | 2352.512 ms |
std::cyl_bessel_k | 3.16452e-15 | 11465.057 ms |
gsl_sf_bessel_K0 | 2.56304e-15 | 2648.814 ms |
cusf::cpu::bessel::k0() | 6.14828e-16 | 784.413 ms |
cusf::gpu::bessel::k0() | 6.09491e-16 | 19.289 ms |
Bessel kv
Based on 40,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_k | 0.00000e+00 | 2765.835 ms |
std::cyl_bessel_k | 4.01006e-11 | 1114.459 ms |
gsl_sf_bessel_Knu | 7.94593e-15 | 653.502 ms |
cusf::cpu::bessel::kv() | 7.94593e-15 | 490.623 ms |
cusf::gpu::bessel::kv() | 7.72758e-15 | 43.718 ms |
Bessel kvp
Based on 10,000 test cases using Mathematica as the reference solution
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
Calculating the first derivative in all tests. The runtimes have a variance due to the low number of test cases.
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_k_prime | 2.51439e-14 | 3.945 ms |
cusf::cpu::bessel::kvp() | 2.53403e-14 | 1.300 ms |
cusf::gpu::bessel::kvp() | 2.53403e-14 | 3.701 ms |
Based on 40,000,000 test cases using the
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
Calculating the first derivative in all tests
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_k_prime | 0.00000e+00 | 6423.573 ms |
cusf::cpu::bessel::kvp() | 3.28007e-15 | 1925.247 ms |
cusf::gpu::bessel::kvp() | 3.28007e-15 | 138.935 ms |
Bessel y0
Based on 200,000,000 test cases
Sampling domain: [0.0, 1.0e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_neumann | 0.00000e+00 | 3249.236 ms |
std::cyl_neumann | 5.71018e-06 | 11913.487 ms |
gsl_sf_bessel_Y0 | 2.91434e-09 | 3575.293 ms |
cusf::cpu::bessel::y0() | 2.91826e-09 | 1266.289 ms |
cusf::gpu::bessel::y0() | 2.98824e-09 | 37.802 ms |
Bessel yv
Based on 40,000,000 test cases
Sampling domain: [1.0e-8, 1.0e1] x [1.0e-8, 1.0e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_neumann | 0.00000e+00 | 1825.367 ms |
std::cyl_neumann | 6.37233e-05 | 2281.104 ms |
gsl_sf_bessel_Ynu | 3.63250e-06 | 2613.619 ms |
cusf::cpu::bessel::yv() | 2.99352e-07 | 723.291 ms |
cusf::gpu::bessel::yv() | 3.92625e-07 | 61.060 ms |
Bessel yvp
Based on 100,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
Calculating the first derivative in all tests
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::cyl_bessel_y_prime | 0.00000e+00 | 9626.878 ms |
cusf::cpu::bessel::yvp() | 3.25158e-08 | 3914.685 ms |
cusf::gpu::bessel::yvp() | 3.79765e-08 | 698.263 ms |
Faddeeva w
Based on 400,000,000 test cases
Sampling domain: [-1.0e3, 1.0e3] x [-1.0e3i, 1.0e3i]
Function | rel. ∞-norm | Time |
---|---|---|
Faddeeva::w() | 0.00000e+00 | 2411.046 ms |
cusf::cpu::faddeeva::w() | 1.05769e-09 | 1814.849 ms |
cusf::gpu::faddeeva::w() | 1.05769e-09 | 56.146 ms |
Hypergeometric 1f1
Based on 100,000,000 test cases
Sampling domain: [-1.0e1, 1.0e1] x [-1.0e1, 1.0e1] x [0.0, 1.0e2]
Function | rel. ∞-norm | Time |
---|---|---|
boost::math::hypergeometric_1F1 | 0.00000e+00 | 1065.779 ms |
std::tr1::conf_hyperg | 2.21643e-16 | 155.877 ms |
gsl_sf_hyperg_1F1 | 2.21944e-16 | 267.704 ms |
cusf::cpu::hypergeometric::hyp1f1() | 2.21927e-16 | 154.958 ms |
cusf::gpu::hypergeometric::hyp1f1() | 2.22043e-16 | 6.341 ms |
Erf
Faddeeva library is used as the reference solution on 500,000,000 random test cases.
Complex sampling domain: [-1e3, 1e3] x [-1e3i, 1e3i]
Real sampling domain: [0, 1e3]
Function | rel. ∞-norm | Time | Type |
---|---|---|---|
Faddeeva::erf() | 0.00000e+00 | 4653.006 ms | Complex |
Faddeeva::erf() | 0.00000e+00 | 183.772 ms | Real |
cusf::cpu::error::erf() | 1.97458e-08 | 3891.718 ms | Complex |
cusf::cpu::error::erf() | 1.26193e-13 | 161.622 ms | Real |
cusf::gpu::error::erf() | 2.35130e-08 | 71.357 ms | Complex |
cusf::gpu::error::erf() | 1.30277e-13 | 14.786 ms | Real |