Snippets Groups Projects

cusf

Merge of hanging changes

Andreas L. Plesner authored 9 months ago

6477ef27

6477ef27 9 months ago

Name	Last commit	Last update
__pycache__
autograd
cusf
torch
.gitignore
LICENSE
Makefile
README.md
STYLEGUIDE.txt
TODO
example_bessel.py
example_erf.py
example_hypergeometric.py
generate_testdata_hyp2f1.m
requirements.txt
source.cusf

CUSF

CUSF is a math library written in CUDA that provides a subset of mathematical special functions.

Python interface for torch (torch::Tensor) is supported.

Installation

You must have torch and scipy installed in your current (virtual) python environment.

git clone https://gitlab.gbar.dtu.dk/hhbs/cusf.git
cd cusf
make

Running the examples

python3 example_bessel.py
python3 example_erf.py
python3 example_hypergeometric.py

Running all tests

make tests

Tested using HPC system modules

module purge
module load cuda/11.3
module load gcc/9.3.0-binutils-2.34
module load boost/1.73.0-gcc-9.3.0
#module load scipy/1.5.4-python-3.8.9
module load ninja/1.10.1

This can be sourced directly from the cusf/ root folder as:

source source.cusf

Status of the functions

Function	CPU status	GPU status	PyTorch
Hyp2f1	Working	Crashing	Not implemented
Hyp1f1	Working	Working	Working
faddeeva w	Working	Working	TBD
erf	Working	Working	Working
jy	Working	Working	TBD
kv	Working	Working	TBD
iv	Working	Working	TBD
j0	Working	Working	TBD
yv	Working	Working	TBD
jvp	Working	Working	TBD
y0	Working	Working	TBD
jv	Working	Working	TBD
k0	Working	Working	TBD
kvp	Working	Working	TBD
i0	Working	Working	Working
ivp	Working	Working	TBD
yvp	Working	Working	TBD

Helper functions	CPU status	GPU status	PyTorch
digamma (psi)	TBD	TBD	TBD
gamma	TBD	TBD	TBD
tgamma1pmv	TBD	TBD	TBD
chebyshev	TBD	TBD	TBD
polynomial	TBD	TBD	TBD
poly rational	TBD	TBD	TBD

Hyp2f1

Hypergeometric function

_2f_1

(a,b;c;x)

Hyp1f1

Hypergeometric function

_1f_1

(a;b;x)

erf

Error function taking both complex and real inputs. If the input is real then the erf/erff functions from CUDA's and C++'s math APIs are used.

Run time and accuracy

The boost library is used as reference solution when possible and hence has zero error. In some cases other functions are used as the baseline which will be noted on the specific cases (and visible by the zero errors).
Using parallel processing when using the CPU.

Using an NVIDIA A100. CUDA Version: 11.7
Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz (OMP_NUM_THREADS=16)

The timing does not include transfer times from the CPU to GPU or vice versa.

Bessel i0

Based on 200,000,000 test cases
Sampling domain: [0, 1e2]

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_i	0.00000e+00	1097.861 ms
std::cyl_bessel_i	3.82249e-15	5597.835 ms
gsl_sf_bessel_I0	9.44863e-16	1839.631 ms
cusf::cpu::bessel::i0()	6.98190e-16	630.067 ms
cusf::gpu::bessel::i0()	6.65807e-16	9.618 ms

Bessel iv

Based on 20,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_i	0.00000e+00	1830.353 ms
std::cyl_bessel_i	2.53776e-15	539.740 ms
gsl_sf_bessel_Inu	2.86962e-15	897.149 ms
cusf::cpu::bessel::iv()	3.07329e-15	594.268 ms
cusf::gpu::bessel::iv()	2.92510e-15	31.860 ms

Bessel ivp

Mathematica is used to calculate the reference solutions.

Calculates the first derivative in all tests. There are 10,000 test cases sampled randomly in [0,10]x[0,10]. The runtimes have a variance due to the low number of test cases.

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_i_prime	1.18805e-15	4.287 ms
cusf::cpu::bessel::ivp()	2.43492e-15	1.640 ms
cusf::gpu::bessel::ivp()	2.43492e-15	3.425 ms

Using boost as the reference solution on 50,000,000 test cases sampled randomly in [0,10]x[0,10] gives the following.

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_i_prime	0.00000e+00	8845.204 ms
cusf::cpu::bessel::ivp()	1.85718e-09	2966.189 ms
cusf::gpu::bessel::ivp()	1.85718e-09	174.116 ms

Bessel j0

Based on 200,000,000 test cases
Sampling domain: [0, 1e2]

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_j	0.00000e+00	3142.450 ms
std::cyl_bessel_j	7.73255e-06	12141.822 ms
gsl_sf_bessel_J0	3.84527e-08	3765.536 ms
cusf::cpu::bessel::j0()	3.88363e-08	1202.189 ms
cusf::gpu::bessel::j0()	3.88347e-08	17.563 ms

Bessel jv

Based on 100,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_j	0.00000e+00	4448.786 ms
std::cyl_bessel_j	2.60624e-06	5845.212 ms
gsl_sf_bessel_Jnu	1.60455e-07	5957.275 ms
cusf::cpu::bessel::jv()	1.74503e-08	1789.848 ms
cusf::gpu::bessel::jv()	1.56622e-08	127.943 ms

Bessel jvp

Based on 100,000 test cases using Mathematica as the reference solution
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
Calculating the first derivative in all tests. The runtimes have a variance due to the low number of test cases.

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_j_prime	9.50225e-09	14.425 ms
cusf::cpu::bessel::jvp()	8.96248e-09	6.670 ms
cusf::gpu::bessel::jvp()	8.96248e-09	3.919 ms

Based on 100,000 test cases using Mathematica as the reference solution
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
Calculating the second derivative in all tests. The runtimes have a variance due to the low number of test cases.

Function	rel. ∞-norm	Time
cusf::cpu::bessel::jvp()	1.38414e-12	13.231 ms
cusf::gpu::bessel::jvp()	1.38414e-12	3.435 ms

Based on 50,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
Calculating the first derivative in all tests

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_j_prime	0.00000e+00	4277.177 ms
cusf::cpu::bessel::jvp()	1.85728e-09	2971.017 ms
cusf::gpu::bessel::jvp()	1.85728e-09	228.466 ms

Bessel jy

Based on 100,000,000 test cases
Sampling domain: [-5.0, 5.0] x [0.0, 1.0e2]

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_j	0.00000e+00	5522.207 ms
cusf::cpu::bessel::jy()	1.28028e-08	3021.482 ms
cusf::gpu::bessel::jy()	1.39442e-08	299.698 ms

Bessel k0

Based on 400,000,000 test cases
Sampling domain: [0.0, 1.0e2]

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_k	0.00000e+00	2352.512 ms
std::cyl_bessel_k	3.16452e-15	11465.057 ms
gsl_sf_bessel_K0	2.56304e-15	2648.814 ms
cusf::cpu::bessel::k0()	6.14828e-16	784.413 ms
cusf::gpu::bessel::k0()	6.09491e-16	19.289 ms

Bessel kv

Based on 40,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_k	0.00000e+00	2765.835 ms
std::cyl_bessel_k	4.01006e-11	1114.459 ms
gsl_sf_bessel_Knu	7.94593e-15	653.502 ms
cusf::cpu::bessel::kv()	7.94593e-15	490.623 ms
cusf::gpu::bessel::kv()	7.72758e-15	43.718 ms

Bessel kvp

Based on 10,000 test cases using Mathematica as the reference solution
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
Calculating the first derivative in all tests. The runtimes have a variance due to the low number of test cases.

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_k_prime	2.51439e-14	3.945 ms
cusf::cpu::bessel::kvp()	2.53403e-14	1.300 ms
cusf::gpu::bessel::kvp()	2.53403e-14	3.701 ms

Based on 40,000,000 test cases using the
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
Calculating the first derivative in all tests

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_k_prime	0.00000e+00	6423.573 ms
cusf::cpu::bessel::kvp()	3.28007e-15	1925.247 ms
cusf::gpu::bessel::kvp()	3.28007e-15	138.935 ms

Bessel y0

Based on 200,000,000 test cases
Sampling domain: [0.0, 1.0e2]

Function	rel. ∞-norm	Time
boost::math::cyl_neumann	0.00000e+00	3249.236 ms
std::cyl_neumann	5.71018e-06	11913.487 ms
gsl_sf_bessel_Y0	2.91434e-09	3575.293 ms
cusf::cpu::bessel::y0()	2.91826e-09	1266.289 ms
cusf::gpu::bessel::y0()	2.98824e-09	37.802 ms

Bessel yv

Based on 40,000,000 test cases
Sampling domain: [1.0e-8, 1.0e1] x [1.0e-8, 1.0e2]

Function	rel. ∞-norm	Time
boost::math::cyl_neumann	0.00000e+00	1825.367 ms
std::cyl_neumann	6.37233e-05	2281.104 ms
gsl_sf_bessel_Ynu	3.63250e-06	2613.619 ms
cusf::cpu::bessel::yv()	2.99352e-07	723.291 ms
cusf::gpu::bessel::yv()	3.92625e-07	61.060 ms

Bessel yvp

Based on 100,000,000 test cases
Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
Calculating the first derivative in all tests

Function	rel. ∞-norm	Time
boost::math::cyl_bessel_y_prime	0.00000e+00	9626.878 ms
cusf::cpu::bessel::yvp()	3.25158e-08	3914.685 ms
cusf::gpu::bessel::yvp()	3.79765e-08	698.263 ms

Faddeeva w

Based on 400,000,000 test cases
Sampling domain: [-1.0e3, 1.0e3] x [-1.0e3i, 1.0e3i]

Function	rel. ∞-norm	Time
Faddeeva::w()	0.00000e+00	2411.046 ms
cusf::cpu::faddeeva::w()	1.05769e-09	1814.849 ms
cusf::gpu::faddeeva::w()	1.05769e-09	56.146 ms

Hypergeometric 1f1

Based on 100,000,000 test cases
Sampling domain: [-1.0e1, 1.0e1] x [-1.0e1, 1.0e1] x [0.0, 1.0e2]

Function	rel. ∞-norm	Time
boost::math::hypergeometric_1F1	0.00000e+00	1065.779 ms
std::tr1::conf_hyperg	2.21643e-16	155.877 ms
gsl_sf_hyperg_1F1	2.21944e-16	267.704 ms
cusf::cpu::hypergeometric::hyp1f1()	2.21927e-16	154.958 ms
cusf::gpu::hypergeometric::hyp1f1()	2.22043e-16	6.341 ms

Erf

Faddeeva library is used as the reference solution on 500,000,000 random test cases.

Complex sampling domain: [-1e3, 1e3] x [-1e3i, 1e3i]
Real sampling domain: [0, 1e3]

Function	rel. ∞-norm	Time	Type
Faddeeva::erf()	0.00000e+00	4653.006 ms	Complex
Faddeeva::erf()	0.00000e+00	183.772 ms	Real
cusf::cpu::error::erf()	1.97458e-08	3891.718 ms	Complex
cusf::cpu::error::erf()	1.26193e-13	161.622 ms	Real
cusf::gpu::error::erf()	2.35130e-08	71.357 ms	Complex
cusf::gpu::error::erf()	1.30277e-13	14.786 ms	Real