C.1 Tutorial: Advanced Tutorial on ctypes, ccfi and pybind (C and Fortran Use)
The final state of the files we used in this demo is available on Github here Links to an external site..
Introduction
Python, together with C/C++ and Fortran, is one of the most popular languages that is used by scientists. The easy-to-use interface and a large number of packages make it an attractive choice for prototyping. However, Python suffers from performance issues (such as GIL Links to an external site.). For this reason, (pure) Python applications are rarely used in a performance-critical setting. While Python maybe not be the fastest language, it is possible to extend and offload performance-critical functionalities to extensions. These extensions are written in C/C++/Fortran and can be imported as packages. One of the most commonly used packages that use this technique is Numpy. Another example, TensorFlow Links to an external site., one of the most widely used Deep-Learning frameworks exposes a Python interface while having a high-performance runtime that is written in C++.
The capability of running C code or using libraries developed in C/C++ is not unique to Python. Fortran also allows the execution of C functions in subroutines. This can be done through a binding interface, that translates the C typings into Fortran typings.
To work with this tutorial, we recommend using the Windows Subsystem for Linux 2 (WSL 2) and Anaconda as a Python distribution.
To install Anaconda, refer to https://www.anaconda.com/products/individual Links to an external site..
To prepare for the Python environment after installing Anaconda, create a new conda environment and activate it so that we can isolate our activities.
$ conda create -n dd2358 python==3.8
$ conda activate dd2358
$ conda install numpy
$ conda install cython
$ conda install pybind11
Objective
The goal of this tutorial is to give a broad and non-technical overview of the most common approaches to cross-language development with Fortran, Python, and C/C++.
For Fortran, we will go through the steps of developing a subroutine that calls a C function. We illustrate the use of a binding interface, and how to wrap it as a subroutine, using a small example.
For Python, we will go through a number of methods for developing Python extensions using C/C++ and Fortran, followed by a practical step-by-step example of building a simple math kernel extension that computes AXPY on Numpy arrays.
By the end of this tutorial, you should have a basic understanding of cross-language development. You will also be able to develop a minimal extension that operates on Numpy arrays in C++ or Fortran, as well as running C code in a Fortran program.
Table of content
- Running C code in Fortran using ISO binding
- Calling shared library directly in Python using ctype
- Building Python modules from Fortran modules using f2py
- Calling shared libraries in Python through Cython
- Creating Python modules from C++ code using Pybind11
- A step-by-step example on building a module using Pybind11
- Real world examples
- References
Running C code in Fortran
In this section, we learn how to call C functions in Fortran. While Fortran is a popular language among the scientific community, a large number of third-party libraries are written in C. Fortunately, it is possible to call C functions from Fortran code through an ISO binding, or vice versa. One challenge of using C code in Fortran is the translation of data types (besides memory layout ordering of arrays). Furthermore, the function symbols are handled differently (e.g., Fortran functions end with a _
suffix implicitly). In order to call C functions from Fortran, we need to define an interface, that translates the C types to Fortran types. Additionally, we can further wrap it as a subroutine to expose as little C details as possible.
Example
Consider a C function that computes a product between two double-precision floating-point numbers called product.c
.
double product(double, double);
double product(double a, double b)
{
return a * b;
}
Consider a Fortran program main.f90
that does two things: 1) to add two numbers; 2) to get the product of two numbers, where the product function is written in C.
module math
interface
real(c_double) function c_product(a, b) bind(c, name="product")
use iso_c_binding
implicit none
real(kind=c_double), value :: a
real(kind=c_double), value :: b
end function
end interface
end module
subroutine times(c, a, b)
use, intrinsic :: iso_c_binding, only: c_double
use math
use iso_fortran_env, only : real64
real(kind=real64), intent(in) :: a
real(kind=real64), intent(in) :: b
real(kind=real64), intent(out) :: c
c = c_product(a, b)
end subroutine
subroutine add(c, a, b)
use iso_fortran_env, only : real64
real(kind=real64), intent(in) :: a
real(kind=real64), intent(in) :: b
real(kind=real64), intent(out) :: c
c = a + b
end subroutine
program main
use iso_fortran_env, only : real64
real(kind=real64) :: a
real(kind=real64) :: b
real(kind=real64) :: c
real(kind=real64) :: d
a = 0.5
b = 0.5
call add(c, a, b)
call times(d, a, b)
print *, a, " + ", b, " = ", c
print *, a, " * ", b, " = ", d
end program
In this code, we define a module called math, with an interface containing a function called c_product
, that accepts two arguments and give a return value in double precision. We define that we want to bind this function to a C function called product()
. Next, we define that we want to use the ISO standard C binding. We specify that we have two variables in the signature a, and b, that are passed by value.
Now that we have an interface, we build a subroutine that calls this C function. Inside the subroutine, we define that we use the ISO C binding with the double
data type in C. We also specify that we want to use the MATH
module that we defined. Additionally, we list the arguments of the subroutine and whether they are input or output variables. Finally, we call the C function and update the output variable.
We compile the C function into object code using a C compiler.
$ gcc -c product.c -o product.o
After that, we compile the Fortran program.
$ gfortran -c main.f90 -o main.o
Finally, we use the Fortran linker (that understands Fortran symbols) to link the objects into an executable.
$ gfortran product.o main.o -o main.out
$ ./main.out
0.50000000000000000 + 0.50000000000000000 = 1.0000000000000000
0.50000000000000000 * 0.50000000000000000 = 0.25000000000000000
We leave calling Fortran subroutine from a C code as an exercise.
Developing Python Extension
In this section, we learn the four ways to write Python Extensions in C/C++ and Fortran:
- ctype
- f2py
- Cython
- Pybind11
Finally, we will create an AXPY kernel for Numpy arrays using Pybind11.
ctype
ctype
Links to an external site.is the simplest way to run a foreign function in a shared library through Python. A third party library of functions can be called directly. Consider the following function in a sum_src.c
.
#include <stdio.h>
#include "sum.h"
double sum(double a, double b)
{
printf("hello world from sum()\n");
return a + b;
}
With a header file sum.h
:
#ifndef __SUM_H__
#define __SUM_H__
double sum(double, double);
#endif
Compiled into a shared library libsum.so
:
$ gcc -fPIC -shared -o libsum.so sum_src.c
We can run this function through Python directly:
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import ctypes
>>> _libsum = ctypes.CDLL('libsum.so')
>>> _libsum.sum.argtypes = [ctypes.c_double, ctypes.c_double]
>>> _libsum.sum.restype = ctypes.c_double
>>> _libsum.sum(ctypes.c_double(3.1), ctypes.c_double(3.3))
hello world from sum()
6.4
>>>
The first step when using a shared library through ctypes is to load the shared library. After that, the function signature must be defined such that ctypes can translate the types. Finally, the return type must also be specified. With all information set, the function can be called directly, as if the shared library is a package. One important issue is that all the arguments must be translated to the respective primitive data types. A feature of ctypes is that callback functions can be defined to allow Python functions to be called by a C program. For more information, refer to the documentation.
f2py to run Fortran code in Python
While ctype enables the running of shared libraries in C, f2py
Links to an external site. enables us to compile a list of subroutines into a Python module. Consider the following program, which contains a module math_kernels
with a subroutine called axpy
.
module kernels
implicit none
contains
subroutine axpy(a, x, y)
real(kind=8), intent(in) :: a
real(kind=8), dimension(:), intent(in) :: x
real(kind=8), dimension(:), intent(inout) :: y
y = a * x + y
end subroutine
end module kernels
program main
use, intrinsic :: iso_fortran_env, only : int32, real64
use kernels
implicit none
real(kind=real64) :: a
real(kind=real64), allocatable, dimension(:) :: x
real(kind=real64), allocatable, dimension(:) :: y
integer(kind=int32) :: length = 10
integer(kind=int32) :: i = 10
allocate(x(length))
allocate(y(length))
a = 0.8
do i = 1, length
x(i) = 0.5 * i
y(i) = 0.1 * i
end do
do i = 1, length
print *, a * x(i) + y(i)
end do
call axpy(a, x, y)
print *, y(:)
deallocate(x)
deallocate(y)
end program
We can compile and run this code and run it. What if we want to create an axpy()
as a Python module? We can create a file called math_kernels.f90
, and put only module that we want in our module, in this file.
module kernels
implicit none
contains
subroutine axpy(a, x, y)
real(kind=8), intent(in) :: a
real(kind=8), dimension(:), intent(in) :: x
real(kind=8), dimension(:), intent(inout) :: y
y = a * x + y
end subroutine
end module kernels
We use f2py
to compile this into a module.
$ f2py -c math_kernels.f90 -m math_kernels
After that, we will have a module, called math_kernels.cpython-38-x86_64-linux-gnu.so
. We can use this module in Python directly by importing the module inside. For example:
$ python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from math_kernels import kernels
>>> import numpy as np
>>> x = np.random.random(5)
>>> y = np.random.random(5)
>>> a = 0.2
>>> z = a * x + y
>>> kernels.axpy(a, x, y)
>>> z
array([0.98274619, 0.25840269, 0.87331128, 0.84778364, 0.32389878])
>>> y
array([0.98274619, 0.25840269, 0.87331128, 0.84778364, 0.32389878])
>>> z - y
array([0., 0., 0., 0., 0.])
>>>
One thing to notice is that we used real(kind=8)
instead of real64
. The reason is that f2py only accepts constant integers as an input.
Cython
Cython
Links to an external site. is a source-to-source compiler that Cythonize "Python-like" source code into C code which in turn can be compiled by a C compiler. The major addition of Cython is the addition of C-like static typing. Consider the previous summation example, we can write this program in Cython and call it cython_sum.pyx
:
cdef double cython_sum(double a, double b): # define a callable function within Cython with cdef
return a + b
cpdef double sum(double a, double b): # define a wrapper function callable by Python with cpdef
return cython_sum(a, b)
We can directly Cythonize this program using cython
to generate a C program. However, one convenient way when developing a module as part of a larger Python package is to use setuptools
Links to an external site.. We can create a build system setup.py
using:
from setuptools import setup
from setuptools.extension import Extension
from Cython.Build import cythonize
ext = Extension(name="sum", sources=["cython_sum.pyx"])
setup(ext_modules=cythonize(ext))
We compile this module using:
$ python setup.py build_ext --inplace
Compiling cython_sum.pyx because it changed.
[1/1] Cythonizing cython_sum.pyx
/home/steven/anaconda3/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/cython_sum.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
running build_ext
building 'sum' extension
creating build
creating build/temp.linux-x86_64-3.7
gcc -pthread -B /home/steven/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/steven/anaconda3/include/python3.7m -c cython_sum.c -o build/temp.linux-x86_64-3.7/cython_sum.o
creating build/lib.linux-x86_64-3.7
gcc -pthread -shared -B /home/steven/anaconda3/compiler_compat -L/home/steven/anaconda3/lib -Wl,-rpath=/home/steven/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/cython_sum.o -o build/lib.linux-x86_64-3.7/sum.cpython-37m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.7/sum.cpython-37m-x86_64-linux-gnu.so ->
After that, a module will be generated, called sum.cpython-37m-x86_64-linux-gnu.so
. We can use this module directly:
$ python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sum
>>> sum.sum(10.1, 5.2)
15.3
>>>
However, what is the practical benefit of this apart from generating fast code in C and to perform type checking? One major benefit is that it is possible to generate code that calls other shared libraries, effectively making it a wrapper. Consider a more complicated case where we reuse the libsum.so
library that we created for the ctypes example. We can call that library in Cython, effectively creating a wrapper.
However, we first need to setup a header file sum.h
, and a respective .pxd
file, libsum.pxd
such that function signatures can be resolved. It is noteworthy that Cython comes with a large number of predefined .pxd
files for commonly used libraries, such as libc
, so that one does not need to create one such definition for every library used. Consider libsum.pxd
that uses sum.h
:
cdef extern from "sum.h":
double sum(double, double)
Now that we have the function definitions well defined, we can use the cimport
in the Cython code to import this library, where cython_sum.pyx
will become:
cimport libsum # use cimport to import libsum.so
cpdef double sum(double a, double b): # Now that we depend on an external library, we only need the Python wrapper
return libsum.sum(a, b) # call the sum function
Similarly, we rebuild the Cython module using setup.py
. However, since now we depend on an external library, it has to be linked. This can be specified as the Library
argument. setup.py
becomes:
from setuptools import setup
from setuptools.extension import Extension
from Cython.Build import cythonize
ext = Extension(name="sum", sources=["cython_sum.pyx"], libraries=["sum"])
setup(ext_modules=cythonize(ext))
We rebuild the module with the new configuration:
$ python setup.py build_ext --inplace
Compiling cython_sum.pyx because it changed.
[1/1] Cythonizing cython_sum.pyx
/home/steven/anaconda3/lib/python3.7/site-packages/Cython/Compiler/Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: /tmp/cython_sum.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
running build_ext
building 'sum' extension
gcc -pthread -B /home/steven/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I. -I/home/steven/anaconda3/include/python3.7m -c cython_sum.c -o build/temp.linux-x86_64-3.7/cython_sum.o
gcc -pthread -shared -B /home/steven/anaconda3/compiler_compat -L/home/steven/anaconda3/lib -Wl,-rpath=/home/steven/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/cython_sum.o -lsum -o build/lib.linux-x86_64-3.7/sum.cpython-37m-x86_64-linux-gnu.so
copying build/lib.linux-x86_64-3.7/sum.cpython-37m-x86_64-linux-gnu.so ->
We will get the same Python module as before, and we can run it the same way as before:
$ python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sum
>>> sum.sum(10.1, 5.2)
hello world from sum()
15.3
>>>
The printout "hello world from sum()" is a clear indication that our shared library is being used by the Cython module.
PyBind11
In all the previous examples, we use independent shared libraries written in C and provide Python bindings to turn them into modules. PyBind11
Links to an external site. takes the opposite approach and provides a C++ API for binding C++ Classes and functions and for generating Python modules. Pybind11 is C++ only and uses the latest C++11 features. In fact, TensorFlow also uses Pybind11 to expose functionalities in its Python interface. One added advantage of using Pybind11 is that underlying data and types coming from Python (such as Numpy arrays) can be easily handled. Consider our previous example sum_src.c
, we create a new version called sum_cpp.cc
with exact same content. To generate a Python module using Pybind11, we modify the code to the following:
#include "sum.h"
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h> // Include the Pybind11 header
double sum(double a, double b)
{
printf("hello world from sum()\n");
return a + b;
}
/* name of module- ---a variable with type py::module_ to create the binding */
/* | | */
/* v v */
PYBIND11_MODULE(sum, m) { // Create a module using the PYBIND11_MODULE macro
m.doc() = "pybind11 sum module";
m.def("sum", &sum, "sum two numbers in double"); // calls module::def() to create generate binding code that exposes sum()
}
Compile the code with:
$ g++ -O3 -Wall -shared -std=c++11 -fPIC `python3 -m pybind11 --includes` sum_cpp.cc -o sum`python3-config --extension-suffix`
We will have a module called sum.cpython-37m-x86_64-linux-gnu.so
. This module can be directly used just like before:
$ python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sum
>>> sum.sum(10.6, 5.4)
hello world from sum()
16.0
>>>
That's it! No extra binding is needed.
A Simple Math Kernel Extension for Python
In this section, we will learn how to write a simple AXPY Python extension using PyBind11. AXPY is a widely used math operation in scientific applications where a vector x is scaled by a scalar
α and added to a vector
y. The operation takes the form:
y←αx+y
There are two variants of AXPY, SAXPY, and DAXPY, where S and D refer to Single Precision (32-bit float) and Double Precision (64-bit double) respectively.
Basic Kernel
We are going to implement a simple SAXPY that is able to take in the variables from Python in Numpy arrays. We implement the basic kernel in math_kernels.cc
:
void axpy(size_t n, float a, float *x, float *y)
{
for (size_t i = 0; i < n; i++)
y[i] = a * x[i] + y[i];
}
Nothing fancy here. Now that we have our kernel implemented, we need to create a module. However, instead of directly using PYBIND11_MODULE
to create a module, we need to first create a wrapper to handle the Numpy arrays.
Handling Numpy arrays
Since Numpy arrays are managed objects by Numpy, we cannot access them as if they are C/C++ arrays. Fortunately, Pybind11 already provides the necessary bindings for us to inspect and unpack Numpy arrays. For this, we can include the header #include <pybind11/numpy.h>
and using namespace py = pybind11
. The main class to handle Numpy array is py::array_t<T>
where T
is the datatype. Python supports a very general and convenient way for exchanging data between libraries through a buffer protocol. In this sense, the py::array_t<T>
can be considered a buffer object that is restricted to Numpy arrays. The information of the buffer, such as the dimension and size, are stored in an information object called py::buffer_info
. To create a function that accepts Numpy arrays in C++, we can do the following:
void axpy_wrapper(float a, py::array_t<float> py_x, py::array_t<float> py_y)
{
// Request for buffer information
py::buffer_info x_buffer = py_x.request();
py::buffer_info y_buffer = py_y.request();
// check dim
if (x_buffer.ndim != 1 || y_buffer.ndim != 1) {
throw std::runtime_error("Error: dimension of vector should be 1");
}
// check shape match
if (x_buffer.shape[0] != y_buffer.shape[0]) {
throw std::runtime_error("Error: size of X and Y do not match");
}
// extract raw pointer
float *x = (float*)x_buffer.ptr;
float *y = (float*)y_buffer.ptr;
return axpy(x_buffer.shape[0], a, x, y);
}
In this example, we see how we extract a raw pointer from a Numpy array and get information such as the length of dimension zero. Unless we are certain of how this function will be called, we should perform simple sanity checking, for example on dimensionality and sizes. Furthermore, we should ensure that the array we are handling is a dense array (not irregular). For more information on format checking and performance optimization, refer to here Links to an external site. and here Links to an external site..
Putting them all together
Now that we have a kernel, a wrapper to handle incoming Numpy arrays, we can put them together:
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h> // Include header that supports Numpy arrays
namespace py = pybind11;
void axpy(size_t n, float a, float *x, float *y)
{
for (size_t i = 0; i < n; i++)
y[i] = a * x[i] + y[i];
}
void axpy_wrapper(float a, py::array_t<float> py_x, py::array_t<float> py_y)
{
// Request for buffer information
py::buffer_info x_buffer = py_x.request();
py::buffer_info y_buffer = py_y.request();
// check dim
if (x_buffer.ndim != 1 || y_buffer.ndim != 1) {
throw std::runtime_error("Error: dimension of vector should be 1");
}
// check shape match
if (x_buffer.shape[0] != y_buffer.shape[0]) {
throw std::runtime_error("Error: size of X and Y do not match");
}
// extract raw pointer
float *x = (float*)x_buffer.ptr;
float *y = (float*)y_buffer.ptr;
return axpy(x_buffer.shape[0], a, x, y);
}
PYBIND11_MODULE(math_kernels, m)
{
m.def("axpy", &axpy_wrapper);
}
We can compile with the following:
$ g++ -O3 -Wall -shared -std=c++11 -fPIC `python3 -m pybind11 --includes` math_kernels.cc -o math_kernels`python3-config --extension-suffix`
After that we can use the module in Python:
$ python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import math_kernels
>>> x = np.random.random(5)
>>> y = np.random.random(5)
>>> a = 0.5
>>> z = a * x + y
>>> math_kernels.axpy(a, x, y)
>>> z
array([0.72109651, 0.80642238, 1.3581158 , 0.68677673, 0.47000476])
>>> y
array([0.67480229, 0.67799412, 0.97670182, 0.20975148, 0.35173144])
>>> z - y
array([0.04629421, 0.12842827, 0.38141398, 0.47702525, 0.11827332])
>>>
This is completely wrong, what has happened?
Type Check
Recall that we are implementing SAXPY. However, notice that the default, Numpy arrays have type np.float64, or double precision. The error is due to the implicit conversion of Pybind11 between datatypes. The easiest way to enforce typing is to specify the list of arguments, and whether the arguments should be converted automatically. In this case, we can change your module creation code to:
PYBIND11_MODULE(math_kernels, m)
{
m.def("axpy", &axpy_wrapper, py::arg("a"), py::arg("x").noconvert(true), py::arg("y").noconvert(true));
}
Where we specify that the type of the two vectors must match exactly. Compile again and we now try the following example, where one vector is in single-precision while the other is in double-precision. We compile the code and run again:
$ python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import math_kernels
>>> x = np.random.random(5).astype(np.float32)
>>> y = np.random.random(5)
>>> math_kernels.axpy(0.5, x, y)
Traceback (most recent call last):
File "", line 1, in
TypeError: axpy(): incompatible function arguments. The following argument types are supported:
1. (a: float, x: numpy.ndarray[float32], y: numpy.ndarray[float32]) -> None
Invoked with: 0.5, array([0.583261 , 0.4329178 , 0.1882612 , 0.94220173, 0.56544894],
dtype=float32), array([0.65019744, 0.12314781, 0.04364232, 0.54704202, 0.18679054])
>>>
We can see that Python now throws an error.
Overloading
In the case we want to support both double-precision and single-precision, what can we do? In C++ we can easily support this using templates. In fact, Pybind11 supports this binding. We rewrite our code to the following:
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h> // Include header that supports Numpy arrays
namespace py = pybind11;
template <typename T>
void axpy(size_t n, T a, T *x, T *y)
{
for (size_t i = 0; i < n; i++)
y[i] = a * x[i] + y[i];
}
template <typename T>
void axpy_wrapper(T a, py::array_t<T> py_x, py::array_t<T> py_y)
{
// Request for buffer information
py::buffer_info x_buffer = py_x.request();
py::buffer_info y_buffer = py_y.request();
// check dim
if (x_buffer.ndim != 1 || y_buffer.ndim != 1) {
throw std::runtime_error("Error: dimension of vector should be 1");
}
// check shape match
if (x_buffer.shape[0] != y_buffer.shape[0]) {
throw std::runtime_error("Error: size of X and Y not match");
}
// extract raw pointer
T *x = (T*)x_buffer.ptr;
T *y = (T*)y_buffer.ptr;
return axpy<T>(x_buffer.shape[0], a, x, y);
}
PYBIND11_MODULE(math_kernels, m)
{
m.def("axpy", &axpy_wrapper<float>, py::arg("a"), py::arg("x").noconvert(true), py::arg("y").noconvert(true));
m.def("axpy", &axpy_wrapper<double>, py::arg("a"), py::arg("x").noconvert(true), py::arg("y").noconvert(true));
}
We now have a module that supports both, and only, single-precision and double-precision. We compile the code and run the module in Python:
$ python
Python 3.8.0 (default, Nov 6 2019, 21:49:08)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> import math_kernels
>>> x = np.random.random(5).astype(np.float32)
>>> y = np.random.random(5).astype(np.float32)
>>> a = 0.5
>>> z = a * x + y
>>> math_kernels.axpy(a, x, y)
>>> y - z
array([0., 0., 0., 0., 0.], dtype=float32)
>>> x = np.random.random(5).astype(np.float64)
>>> y = np.random.random(5).astype(np.float64)
>>> z = a * x + y
>>> math_kernels.axpy(a, x, y)
>>> y - z
array([0., 0., 0., 0., 0.])
>>>
It now works with both precisions!
Build system
While we can depend on running the compilation directly, it is common practice to put the wrapper and module creation code in separate locations. Since Pybind11 is a header-only library, it is convenient enough to embed it into the project you are developing. The supported build system of Pybind11 is CMake. Refer to https://pybind11.readthedocs.io/en/stable/compiling.html#building-with-cmake Links to an external site. for more information.
We have created a simple CMakeLists.txt
here
Links to an external site.. Since Pybind11 is a header only library, it is convenient to embed it to a project to avoid extra dependencies. One can obtain PyBind11 using the following. Optionally, one can checkout a specific version.
$ git clone https://github.com/pybind/pybind11.git
$ git checkout tags/v2.6.2
(Optional) More advanced examples
In this section, we take a look at two more advanced real-world examples.
StreamBrain
In this example, we look at one of our research projects StreamBrain Links to an external site., a framework for exploring how emerging neuromorphic learning algorithms can make use of HPC hardware in a parallel environment. StreamBrain is a Python framework that implements the BCPNN algorithm, where important steps in the algorithm (called the kernel) are offloaded to different accelerators, such as GPU, FPGA, Vectorized OpenMP, and MPI for distributed computing.
Take a look at how these kernels are bind together with a Pybind11 binding, wrapper, and concrete implementation: https://github.com/KTH-HPC/StreamBrain/blob/mpi%2Bomp/BCPNN/backend/_kernels/kernels_openmp_internal.cc Links to an external site.
Take a look at how Pybind11's source code is integrated into the build system: https://github.com/KTH-HPC/StreamBrain/blob/mpi%2Bomp/CMakeLists.txt Links to an external site.
TensorFlow
TensorFlow is perhaps one of the most famous projects that makes use of Pybind11 to expose functionalities of its runtime. Take a look at the TensorFlow Python profiler interface and see how Pybind11 is used to expose a TraceMe wrapper (you do not need to understand the source code, but try to understand the binding). There are Pybind11 functions that are not covered in this tutorial, try to look for them in the documentation and over the internet:
- https://github.com/tensorflow/tensorflow/blob/dec8e0b11f4f87693b67e125e67dfbc68d26c205/tensorflow/python/profiler/internal/traceme_wrapper.cc Links to an external site.
- https://github.com/tensorflow/tensorflow/blob/dec8e0b11f4f87693b67e125e67dfbc68d26c205/tensorflow/python/profiler/internal/traceme_wrapper.h Links to an external site.
If you are interested in the TensorFlow profiler (which out of the scope of this course), read this paper.
References
This tutorial is inspired by:
- http://fortranwiki.org/fortran/show/Generating+C+Interfaces Links to an external site.
- https://www.l3harrisgeospatial.com/docs/FORTRANExamples.html Links to an external site.
- https://pages.tacc.utexas.edu/~eijkhout/istc/html/language.html Links to an external site.
- https://docs.python.org/3/library/ctypes.html Links to an external site.
- https://cython.readthedocs.io/en/latest/src/tutorial/cython_tutorial.html Links to an external site.
- https://github.com/pybind/python_example Links to an external site.
- https://coderefinery.github.io/mma/03-pybind11/ Links to an external site.
- https://medium.com/practical-coding/setting-up-a-c-python-project-with-pybind11-and-cmake-8de391494fca Links to an external site.
- https://github.com/chris4540/DD2424_Deep_Learning/blob/master/lib_clsr/ann_f.f90 Links to an external site.