Skip to content
Snippets Groups Projects
Select Git revision
  • master
1 result

cusf

  • Open with
  • Download source code
  • Your workspaces

      A workspace is a virtual sandbox environment for your code in GitLab.

      No agents available to create workspaces. Please consult Workspaces documentation for troubleshooting.

  • user avatar
    Andreas L. Plesner authored
    6477ef27
    History

    CUSF

    CUSF is a math library written in CUDA that provides a subset of mathematical special functions.

    Python interface for torch (torch::Tensor) is supported.

    Installation

    You must have torch and scipy installed in your current (virtual) python environment.

    git clone https://gitlab.gbar.dtu.dk/hhbs/cusf.git
    cd cusf
    make

    Running the examples

    python3 example_bessel.py
    python3 example_erf.py
    python3 example_hypergeometric.py

    Running all tests

    make tests

    Tested using HPC system modules

    module purge
    module load cuda/11.3
    module load gcc/9.3.0-binutils-2.34
    module load boost/1.73.0-gcc-9.3.0
    #module load scipy/1.5.4-python-3.8.9
    module load ninja/1.10.1

    This can be sourced directly from the cusf/ root folder as:

    source source.cusf

    Status of the functions

    Function CPU status GPU status PyTorch
    Hyp2f1 Working Crashing Not implemented
    Hyp1f1 Working Working Working
    faddeeva w Working Working TBD
    erf Working Working Working
    jy Working Working TBD
    kv Working Working TBD
    iv Working Working TBD
    j0 Working Working TBD
    yv Working Working TBD
    jvp Working Working TBD
    y0 Working Working TBD
    jv Working Working TBD
    k0 Working Working TBD
    kvp Working Working TBD
    i0 Working Working Working
    ivp Working Working TBD
    yvp Working Working TBD
    Helper functions CPU status GPU status PyTorch
    digamma (psi) TBD TBD TBD
    gamma TBD TBD TBD
    tgamma1pmv TBD TBD TBD
    chebyshev TBD TBD TBD
    polynomial TBD TBD TBD
    poly rational TBD TBD TBD

    Hyp2f1

    Hypergeometric function

    2f1_2f_1
    (a,b;c;x)

    Hyp1f1

    Hypergeometric function

    1f1_1f_1
    (a;b;x)

    erf

    Error function taking both complex and real inputs. If the input is real then the erf/erff functions from CUDA's and C++'s math APIs are used.

    Run time and accuracy

    The boost library is used as reference solution when possible and hence has zero error. In some cases other functions are used as the baseline which will be noted on the specific cases (and visible by the zero errors).
    Using parallel processing when using the CPU.

    Using an NVIDIA A100. CUDA Version: 11.7
    Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz (OMP_NUM_THREADS=16)

    The timing does not include transfer times from the CPU to GPU or vice versa.

    Bessel i0

    Based on 200,000,000 test cases
    Sampling domain: [0, 1e2]

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_i 0.00000e+00 1097.861 ms
    std::cyl_bessel_i 3.82249e-15 5597.835 ms
    gsl_sf_bessel_I0 9.44863e-16 1839.631 ms
    cusf::cpu::bessel::i0() 6.98190e-16 630.067 ms
    cusf::gpu::bessel::i0() 6.65807e-16 9.618 ms

    Bessel iv

    Based on 20,000,000 test cases
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_i 0.00000e+00 1830.353 ms
    std::cyl_bessel_i 2.53776e-15 539.740 ms
    gsl_sf_bessel_Inu 2.86962e-15 897.149 ms
    cusf::cpu::bessel::iv() 3.07329e-15 594.268 ms
    cusf::gpu::bessel::iv() 2.92510e-15 31.860 ms

    Bessel ivp

    Mathematica is used to calculate the reference solutions.

    Calculates the first derivative in all tests. There are 10,000 test cases sampled randomly in [0,10]x[0,10]. The runtimes have a variance due to the low number of test cases.

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_i_prime 1.18805e-15 4.287 ms
    cusf::cpu::bessel::ivp() 2.43492e-15 1.640 ms
    cusf::gpu::bessel::ivp() 2.43492e-15 3.425 ms

    Using boost as the reference solution on 50,000,000 test cases sampled randomly in [0,10]x[0,10] gives the following.

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_i_prime 0.00000e+00 8845.204 ms
    cusf::cpu::bessel::ivp() 1.85718e-09 2966.189 ms
    cusf::gpu::bessel::ivp() 1.85718e-09 174.116 ms

    Bessel j0

    Based on 200,000,000 test cases
    Sampling domain: [0, 1e2]

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_j 0.00000e+00 3142.450 ms
    std::cyl_bessel_j 7.73255e-06 12141.822 ms
    gsl_sf_bessel_J0 3.84527e-08 3765.536 ms
    cusf::cpu::bessel::j0() 3.88363e-08 1202.189 ms
    cusf::gpu::bessel::j0() 3.88347e-08 17.563 ms

    Bessel jv

    Based on 100,000,000 test cases
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_j 0.00000e+00 4448.786 ms
    std::cyl_bessel_j 2.60624e-06 5845.212 ms
    gsl_sf_bessel_Jnu 1.60455e-07 5957.275 ms
    cusf::cpu::bessel::jv() 1.74503e-08 1789.848 ms
    cusf::gpu::bessel::jv() 1.56622e-08 127.943 ms

    Bessel jvp

    Based on 100,000 test cases using Mathematica as the reference solution
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
    Calculating the first derivative in all tests. The runtimes have a variance due to the low number of test cases.

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_j_prime 9.50225e-09 14.425 ms
    cusf::cpu::bessel::jvp() 8.96248e-09 6.670 ms
    cusf::gpu::bessel::jvp() 8.96248e-09 3.919 ms

    Based on 100,000 test cases using Mathematica as the reference solution
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
    Calculating the second derivative in all tests. The runtimes have a variance due to the low number of test cases.

    Function rel. ∞-norm Time
    cusf::cpu::bessel::jvp() 1.38414e-12 13.231 ms
    cusf::gpu::bessel::jvp() 1.38414e-12 3.435 ms

    Based on 50,000,000 test cases
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
    Calculating the first derivative in all tests

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_j_prime 0.00000e+00 4277.177 ms
    cusf::cpu::bessel::jvp() 1.85728e-09 2971.017 ms
    cusf::gpu::bessel::jvp() 1.85728e-09 228.466 ms

    Bessel jy

    Based on 100,000,000 test cases
    Sampling domain: [-5.0, 5.0] x [0.0, 1.0e2]

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_j 0.00000e+00 5522.207 ms
    cusf::cpu::bessel::jy() 1.28028e-08 3021.482 ms
    cusf::gpu::bessel::jy() 1.39442e-08 299.698 ms

    Bessel k0

    Based on 400,000,000 test cases
    Sampling domain: [0.0, 1.0e2]

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_k 0.00000e+00 2352.512 ms
    std::cyl_bessel_k 3.16452e-15 11465.057 ms
    gsl_sf_bessel_K0 2.56304e-15 2648.814 ms
    cusf::cpu::bessel::k0() 6.14828e-16 784.413 ms
    cusf::gpu::bessel::k0() 6.09491e-16 19.289 ms

    Bessel kv

    Based on 40,000,000 test cases
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_k 0.00000e+00 2765.835 ms
    std::cyl_bessel_k 4.01006e-11 1114.459 ms
    gsl_sf_bessel_Knu 7.94593e-15 653.502 ms
    cusf::cpu::bessel::kv() 7.94593e-15 490.623 ms
    cusf::gpu::bessel::kv() 7.72758e-15 43.718 ms

    Bessel kvp

    Based on 10,000 test cases using Mathematica as the reference solution
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e1]
    Calculating the first derivative in all tests. The runtimes have a variance due to the low number of test cases.

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_k_prime 2.51439e-14 3.945 ms
    cusf::cpu::bessel::kvp() 2.53403e-14 1.300 ms
    cusf::gpu::bessel::kvp() 2.53403e-14 3.701 ms

    Based on 40,000,000 test cases using the
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
    Calculating the first derivative in all tests

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_k_prime 0.00000e+00 6423.573 ms
    cusf::cpu::bessel::kvp() 3.28007e-15 1925.247 ms
    cusf::gpu::bessel::kvp() 3.28007e-15 138.935 ms

    Bessel y0

    Based on 200,000,000 test cases
    Sampling domain: [0.0, 1.0e2]

    Function rel. ∞-norm Time
    boost::math::cyl_neumann 0.00000e+00 3249.236 ms
    std::cyl_neumann 5.71018e-06 11913.487 ms
    gsl_sf_bessel_Y0 2.91434e-09 3575.293 ms
    cusf::cpu::bessel::y0() 2.91826e-09 1266.289 ms
    cusf::gpu::bessel::y0() 2.98824e-09 37.802 ms

    Bessel yv

    Based on 40,000,000 test cases
    Sampling domain: [1.0e-8, 1.0e1] x [1.0e-8, 1.0e2]

    Function rel. ∞-norm Time
    boost::math::cyl_neumann 0.00000e+00 1825.367 ms
    std::cyl_neumann 6.37233e-05 2281.104 ms
    gsl_sf_bessel_Ynu 3.63250e-06 2613.619 ms
    cusf::cpu::bessel::yv() 2.99352e-07 723.291 ms
    cusf::gpu::bessel::yv() 3.92625e-07 61.060 ms

    Bessel yvp

    Based on 100,000,000 test cases
    Sampling domain: [0.0, 1.0e1] x [0.0, 1.0e2]
    Calculating the first derivative in all tests

    Function rel. ∞-norm Time
    boost::math::cyl_bessel_y_prime 0.00000e+00 9626.878 ms
    cusf::cpu::bessel::yvp() 3.25158e-08 3914.685 ms
    cusf::gpu::bessel::yvp() 3.79765e-08 698.263 ms

    Faddeeva w

    Based on 400,000,000 test cases
    Sampling domain: [-1.0e3, 1.0e3] x [-1.0e3i, 1.0e3i]

    Function rel. ∞-norm Time
    Faddeeva::w() 0.00000e+00 2411.046 ms
    cusf::cpu::faddeeva::w() 1.05769e-09 1814.849 ms
    cusf::gpu::faddeeva::w() 1.05769e-09 56.146 ms

    Hypergeometric 1f1

    Based on 100,000,000 test cases
    Sampling domain: [-1.0e1, 1.0e1] x [-1.0e1, 1.0e1] x [0.0, 1.0e2]

    Function rel. ∞-norm Time
    boost::math::hypergeometric_1F1 0.00000e+00 1065.779 ms
    std::tr1::conf_hyperg 2.21643e-16 155.877 ms
    gsl_sf_hyperg_1F1 2.21944e-16 267.704 ms
    cusf::cpu::hypergeometric::hyp1f1() 2.21927e-16 154.958 ms
    cusf::gpu::hypergeometric::hyp1f1() 2.22043e-16 6.341 ms

    Erf

    Faddeeva library is used as the reference solution on 500,000,000 random test cases.

    Complex sampling domain: [-1e3, 1e3] x [-1e3i, 1e3i]
    Real sampling domain: [0, 1e3]

    Function rel. ∞-norm Time Type
    Faddeeva::erf() 0.00000e+00 4653.006 ms Complex
    Faddeeva::erf() 0.00000e+00 183.772 ms Real
    cusf::cpu::error::erf() 1.97458e-08 3891.718 ms Complex
    cusf::cpu::error::erf() 1.26193e-13 161.622 ms Real
    cusf::gpu::error::erf() 2.35130e-08 71.357 ms Complex
    cusf::gpu::error::erf() 1.30277e-13 14.786 ms Real