I have the following minimal example of summing an array (taken from here):
// lib.cpp
#include <cstddef>

template <typename T>
T arr_sum(T *arr, std::size_t size)
{
    T temp = 0;
    for (std::size_t i = 0; i != size; ++i) {
        temp += arr[i];
    }
    return temp;
}
# lib_wrapper.pyx
cimport cython

ctypedef fused float_t:
    cython.float
    cython.double

cdef extern from "lib.cpp" nogil:
    T arr_sum[T](T *arr, size_t size)

def py_arr_sum(float_t[:] arr not None):
    return arr_sum(&arr[0], arr.shape[0])
# setup.py
import numpy as np
from setuptools import setup
from setuptools.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [
    Extension(
        "lib_wrapper",
        ["lib_wrapper.pyx"],
        include_dirs=[np.get_include()],
        extra_compile_args=["-std=c++11", "-O1"],
        language="c++",
    )
]

setup(
    name="Rank Filter 1D Cython",
    cmdclass={"build_ext": build_ext},
    ext_modules=ext_modules,
)
Running python setup.py build_ext --inplace produces a 202K shared object, lib_wrapper.cpython-39-darwin.so. By comparison, compiling the C++ alone with gcc -shared -fPIC -O1 -o lib.so lib.cpp produces an object of only ~4K.
I assume the extra size comes from the C++/Python bridge that Cython generates.
Considering the numerous binding methods available (the NumPy C API, pybind11, etc.), which one would create this bridge without such a large file-size overhead? Please exclude ctypes from the suggestions; it seems to add significant call overhead.
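For reference, here is roughly what a pybind11 binding of the same function could look like (a sketch, assuming pybind11 is installed and lib.cpp is as above; the file name bindings.cpp, module name lib_wrapper_pb, and the choice to instantiate only the double specialization are all illustrative):

```cpp
// bindings.cpp -- illustrative pybind11 wrapper for arr_sum (names assumed)
#include <cstddef>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

#include "lib.cpp"

namespace py = pybind11;

// Instantiate the template for one element type and accept a NumPy array.
double py_arr_sum(
    py::array_t<double, py::array::c_style | py::array::forcecast> arr)
{
    py::buffer_info info = arr.request();
    return arr_sum(static_cast<double *>(info.ptr),
                   static_cast<std::size_t>(info.shape[0]));
}

PYBIND11_MODULE(lib_wrapper_pb, m)
{
    m.def("arr_sum", &py_arr_sum, "Sum a 1-D float64 array");
}
```

This would be built with something like c++ -O1 -shared -fPIC $(python -m pybind11 --includes) bindings.cpp -o lib_wrapper_pb.so. Whether the result is actually smaller than the Cython module depends heavily on optimization flags, symbol visibility, and stripping, since pybind11 is template-heavy; it is worth measuring rather than assuming.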