Diffstat (limited to 'doc/source/embedding.rst')
-rw-r--r-- | doc/source/embedding.rst | 524 |
1 files changed, 524 insertions, 0 deletions
diff --git a/doc/source/embedding.rst b/doc/source/embedding.rst new file mode 100644 index 0000000..181249c --- /dev/null +++ b/doc/source/embedding.rst @@ -0,0 +1,524 @@ +================================ +Using CFFI for embedding +================================ + +.. contents:: + +You can use CFFI to generate C code which exports the API of your choice +to any C application that wants to link with this C code. This API, +which you define yourself, ends up as the API of a ``.so/.dll/.dylib`` +library---or you can statically link it within a larger application. + +Possible use cases: + +* Exposing a library written in Python directly to C/C++ programs. + +* Using Python to make a "plug-in" for an existing C/C++ program that is + already written to load them. + +* Using Python to implement part of a larger C/C++ application (with + static linking). + +* Writing a small C/C++ wrapper around Python, hiding the fact that the + application is actually written in Python (to make a custom + command-line interface; for distribution purposes; or simply to make + it a bit harder to reverse-engineer the application). + +The general idea is as follows: + +* You write and execute a Python script, which produces a ``.c`` file + with the API of your choice (and optionally compile it into a + ``.so/.dll/.dylib``). The script also gives some Python code to be + "frozen" inside the ``.so``. + +* At runtime, the C application loads this ``.so/.dll/.dylib`` (or is + statically linked with the ``.c`` source) without having to know that + it was produced from Python and CFFI. + +* The first time a C function is called, Python is initialized and + the frozen Python code is executed. + +* The frozen Python code defines more Python functions that implement the + C functions of your API, which are then used for all subsequent C + function calls. 
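The lazy start-up described in the list above can be modelled in plain Python. This is only an analogy, not CFFI's actual implementation (which is generated C code): the names ``call_exported``, ``_frozen_init_code`` and the stand-in ``def_extern`` are hypothetical and exist only for this sketch.

.. code-block:: python

    import threading

    _init_lock = threading.Lock()
    _initialized = False
    _handlers = {}   # function name -> Python function attached to it


    def def_extern(func):
        """Hypothetical stand-in for @ffi.def_extern(): attach func by name."""
        _handlers[func.__name__] = func
        return func


    def _frozen_init_code():
        """Hypothetical stand-in for the frozen initialization-time code."""
        @def_extern
        def do_stuff(x, y):
            return x + y


    def call_exported(name, *args):
        """Hypothetical stand-in for the stub generated for an exported function."""
        global _initialized
        if not _initialized:
            # Only one thread runs the initialization; the others wait here.
            with _init_lock:
                if not _initialized:
                    _frozen_init_code()
                    _initialized = True
        handler = _handlers.get(name)
        if handler is None:
            # Nothing was attached to this name: return zero.
            return 0
        return handler(*args)


    print(call_exported("do_stuff", 3, 4))

Every call after the first one skips the initialization branch and goes straight to the attached Python function.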
+ +One of the goals of this approach is to be entirely independent from +the CPython C API: no ``Py_Initialize()`` nor ``PyRun_SimpleString()`` +nor even ``PyObject``. It works identically on CPython and PyPy. + +This is entirely *new in version 1.5.* (PyPy contains CFFI 1.5 since +release 5.0.) + + +Usage +----- + +.. __: overview.html#embedding + +See the `paragraph in the overview page`__ for a quick introduction. +In this section, we explain every step in more detail. We will use +this slightly expanded example: + +.. code-block:: c + + /* file plugin.h */ + typedef struct { int x, y; } point_t; + extern int do_stuff(point_t *); + +.. code-block:: c + + /* file plugin.h, Windows-friendly version */ + typedef struct { int x, y; } point_t; + + /* When including this file from ffibuilder.set_source(), the + following macro is defined to '__declspec(dllexport)'. When + including this file directly from your C program, we define + it to 'extern __declspec(dllimport)' instead. + + With non-MSVC compilers we simply define it to 'extern'. + (The 'extern' is needed for sharing global variables; + functions would be fine without it. The macros always + include 'extern': you must not repeat it when using the + macros later.) + */ + #ifndef CFFI_DLLEXPORT + # if defined(_MSC_VER) + # define CFFI_DLLEXPORT extern __declspec(dllimport) + # else + # define CFFI_DLLEXPORT extern + # endif + #endif + + CFFI_DLLEXPORT int do_stuff(point_t *); + +.. 
code-block:: python + + # file plugin_build.py + import cffi + ffibuilder = cffi.FFI() + + with open('plugin.h') as f: + # read plugin.h and pass it to embedding_api(), manually + # removing the '#' directives and the CFFI_DLLEXPORT + data = ''.join([line for line in f if not line.startswith('#')]) + data = data.replace('CFFI_DLLEXPORT', '') + ffibuilder.embedding_api(data) + + ffibuilder.set_source("my_plugin", r''' + #include "plugin.h" + ''') + + ffibuilder.embedding_init_code(""" + from my_plugin import ffi + + @ffi.def_extern() + def do_stuff(p): + print("adding %d and %d" % (p.x, p.y)) + return p.x + p.y + """) + + ffibuilder.compile(target="plugin-1.5.*", verbose=True) + # or: ffibuilder.emit_c_code("my_plugin.c") + +Running the code above produces a *DLL*, i.e. a dynamically-loadable +library. It is a file with the extension ``.dll`` on Windows, +``.dylib`` on Mac OS X, or ``.so`` on other platforms. As usual, it +is produced by generating some intermediate ``.c`` code and then +calling the regular platform-specific C compiler. See below__ for +some pointers to C-level issues with using the produced library. + +.. __: `Issues about using the .so`_ + +Here are some details about the methods used above: + +* **ffibuilder.embedding_api(source):** parses the given C source, which + declares functions that you want to be exported by the DLL. It can + also declare types, constants and global variables that are part of + the C-level API of your DLL. + + The functions that are found in ``source`` will be automatically + defined in the ``.c`` file: they will contain code that initializes + the Python interpreter the first time any of them is called, + followed by code to call the attached Python function (with + ``@ffi.def_extern()``, see next point). + + The global variables, on the other hand, are not automatically + produced. You have to write their definition explicitly in + ``ffibuilder.set_source()``, as regular C code (see the point after next). 
+ +* **ffibuilder.embedding_init_code(python_code):** this gives + initialization-time Python source code. This code is copied + ("frozen") inside the DLL. At runtime, the code is executed when + the DLL is first initialized, just after Python itself is + initialized. This newly initialized Python interpreter has got an + extra "built-in" module that can be loaded magically without + accessing any files, with a line like "``from my_plugin import ffi, + lib``". The name ``my_plugin`` comes from the first argument to + ``ffibuilder.set_source()``. This module represents "the caller's C world" + from the point of view of Python. + + The initialization-time Python code can import other modules or + packages as usual. You may have typical Python issues like needing + to set up ``sys.path`` somehow manually first. + + For every function declared within ``ffibuilder.embedding_api()``, the + initialization-time Python code or one of the modules it imports + should use the decorator ``@ffi.def_extern()`` to attach a + corresponding Python function to it. + + If the initialization-time Python code fails with an exception, then + you get a traceback printed to stderr, along with more information + to help you identify problems like wrong ``sys.path``. If some + function remains unattached at the time where the C code tries to + call it, an error message is also printed to stderr and the function + returns zero/null. + + Note that the CFFI module never calls ``exit()``, but CPython itself + contains code that calls ``exit()``, for example if importing + ``site`` fails. This may be worked around in the future. + +* **ffibuilder.set_source(c_module_name, c_code):** set the name of the + module from Python's point of view. It also gives more C code which + will be included in the generated C code. In trivial examples it + can be an empty string. It is where you would ``#include`` some + other files, define global variables, and so on. 
The macro + ``CFFI_DLLEXPORT`` is available to this C code: it expands to the + platform-specific way of saying "the following declaration should be + exported from the DLL". For example, you would put "``extern int + my_glob;``" in ``ffibuilder.embedding_api()`` and "``CFFI_DLLEXPORT int + my_glob = 42;``" in ``ffibuilder.set_source()``. + + Currently, any *type* declared in ``ffibuilder.embedding_api()`` must also + be present in the ``c_code``. This is automatic if this code + contains a line like ``#include "plugin.h"`` in the example above. + +* **ffibuilder.compile([target=...] [, verbose=True]):** make the C code and + compile it. By default, it produces a file called + ``c_module_name.dll``, ``c_module_name.dylib`` or + ``c_module_name.so``, but the default can be changed with the + optional ``target`` keyword argument. You can use + ``target="foo.*"`` with a literal ``*`` to ask for a file called + ``foo.dll`` on Windows, ``foo.dylib`` on OS/X and ``foo.so`` + elsewhere. One reason for specifying an alternate ``target`` is to + include characters not usually allowed in Python module names, like + "``plugin-1.5.*``". + + For more complicated cases, you can call instead + ``ffibuilder.emit_c_code("foo.c")`` and compile the resulting ``foo.c`` + file using other means. CFFI's compilation logic is based on the + standard library ``distutils`` package, which is really developed + and tested for the purpose of making CPython extension modules; it + might not always be appropriate for making general DLLs. Also, just + getting the C code is what you need if you do not want to make a + stand-alone ``.so/.dll/.dylib`` file: this C file can be compiled + and statically linked as part of a larger application. 
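The header filtering done in ``plugin_build.py`` can be tried on its own, without CFFI installed. The following is a minimal sketch under stated assumptions: the helper name ``header_to_cdef`` and the sample header text are hypothetical, and unlike the build script it also drops indented ``#`` lines.

.. code-block:: python

    def header_to_cdef(header_text):
        """Drop preprocessor lines and the CFFI_DLLEXPORT macro from a header.

        Hypothetical helper: the result is meant to be the kind of string
        that the build script passes to ffibuilder.embedding_api().
        """
        kept = [line for line in header_text.splitlines(keepends=True)
                if not line.lstrip().startswith('#')]
        return ''.join(kept).replace('CFFI_DLLEXPORT', '')


    # Hypothetical, shortened version of the Windows-friendly plugin.h.
    SAMPLE_HEADER = """\
    typedef struct { int x, y; } point_t;
    #ifndef CFFI_DLLEXPORT
    #  define CFFI_DLLEXPORT extern
    #endif
    CFFI_DLLEXPORT int do_stuff(point_t *);
    """

    cleaned = header_to_cdef(SAMPLE_HEADER)
    print(cleaned)

Filtering line by line is crude: it removes every directive unconditionally, so it only suits headers whose ``#`` lines do nothing but define the export macro.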
+ + +More reading +------------ + +If you're reading this page about embedding and you are not familiar +with CFFI already, here are a few pointers to what you could read +next: + +* For the ``@ffi.def_extern()`` functions, integer C types are passed + simply as Python integers; and simple pointers-to-struct and basic + arrays are all straightforward enough. However, sooner or later you + will need to read about this topic in more detail here__. + +* ``@ffi.def_extern()``: see `documentation here,`__ notably on what + happens if the Python function raises an exception. + +* To create Python objects attached to C data, one common solution is + to use ``ffi.new_handle()``. See documentation here__. + +* In embedding mode, the major direction is C code that calls Python + functions. This is the opposite of the regular extending mode of + CFFI, in which the major direction is Python code calling C. That's + why the page `Using the ffi/lib objects`_ talks first about the + latter, and why the direction "C code that calls Python" is + generally referred to as "callbacks" in that page. If you also + need to have your Python code call C code, read more about + `Embedding and Extending`_ below. + +* ``ffibuilder.embedding_api(source)``: follows the same syntax as + ``ffibuilder.cdef()``, `documented here.`__ You can use the "``...``" + syntax as well, although in practice it may be less useful than it + is for ``cdef()``. On the other hand, it is expected that often the + C sources that you need to give to ``ffibuilder.embedding_api()`` would be + exactly the same as the content of some ``.h`` file that you want to + give to users of your DLL. That's why the example above does this:: + + with open('foo.h') as f: + ffibuilder.embedding_api(f.read()) + + Note that a drawback of this approach is that ``ffibuilder.embedding_api()`` + doesn't support ``#ifdef`` directives. 
You may have to use a more + convoluted expression like:: + + with open('foo.h') as f: + lines = [line for line in f if not line.startswith('#')] + ffibuilder.embedding_api(''.join(lines)) + + As in the example above, you can also use the same ``foo.h`` from + ``ffibuilder.set_source()``:: + + ffibuilder.set_source('module_name', r''' + #include "foo.h" + ''') + + +.. __: using.html#working +.. __: using.html#def-extern +.. __: ref.html#ffi-new-handle +.. __: cdef.html#cdef + +.. _`Using the ffi/lib objects`: using.html + + +Troubleshooting +--------------- + +* The error message + + cffi extension module 'c_module_name' has unknown version 0x2701 + + means that the running Python interpreter located a CFFI version older + than 1.5. CFFI 1.5 or newer must be installed in the running Python. + +* On PyPy, the error message + + debug: pypy_setup_home: directories 'lib-python' and 'lib_pypy' not + found in pypy's shared library location or in any parent directory + + means that the ``libpypy-c.so`` file was found, but the standard library + was not found from this location. This occurs at least on some Linux + distributions, because they put ``libpypy-c.so`` inside ``/usr/lib/``, + instead of the way we recommend, which is: keep that file inside + ``/opt/pypy/bin/`` and put a symlink to there from ``/usr/lib/``. + The quickest fix is to do that change manually. + + +Issues about using the .so +-------------------------- + +This paragraph describes issues that are not necessarily specific to +CFFI. It assumes that you have obtained the ``.so/.dylib/.dll`` file as +described above, but that you have troubles using it. (In summary: it +is a mess. This is my own experience, slowly built by using Google and +by listening to reports from various platforms. Please report any +inaccuracies in this paragraph or better ways to do things.) 
+ +* The file produced by CFFI should follow this naming pattern: + ``libmy_plugin.so`` on Linux, ``libmy_plugin.dylib`` on Mac, or + ``my_plugin.dll`` on Windows (no ``lib`` prefix on Windows). + +* First note that this file does not contain the Python interpreter + nor the standard library of Python. You still need it to be + somewhere. There are ways to compact it to a smaller number of files, + but this is outside the scope of CFFI (please report if you used some + of these ways successfully so that I can add some links here). + +* In what we'll call the "main program", the ``.so`` can be either + used dynamically (e.g. by calling ``dlopen()`` or ``LoadLibrary()`` + inside the main program), or at compile-time (e.g. by compiling it + with ``gcc -lmy_plugin``). The former case is always used if you're + building a plugin for a program, and the program itself doesn't need + to be recompiled. The latter case is for making a CFFI library that + is more tightly integrated inside the main program. + +* In the case of compile-time usage: you can add the gcc + option ``-Lsome/path/`` before ``-lmy_plugin`` to describe where the + ``libmy_plugin.so`` is. On some platforms, notably Linux, ``gcc`` + will complain if it can find ``libmy_plugin.so`` but not + ``libpython27.so`` or ``libpypy-c.so``. To fix it, you need to call + ``LD_LIBRARY_PATH=/some/path/to/libpypy gcc``. + +* When actually executing the main program, it needs to find the + ``libmy_plugin.so`` but also ``libpython27.so`` or ``libpypy-c.so``. + For PyPy, unpack a PyPy distribution and you get a full directory + structure with ``libpypy-c.so`` inside a ``bin`` subdirectory, or on + Windows ``pypy-c.dll`` inside the top directory; you must not move + this file around, but just point to it. One way to point to it is by + running the main program with some environment variable: + ``LD_LIBRARY_PATH=/some/path/to/libpypy`` on Linux, + ``DYLD_LIBRARY_PATH=/some/path/to/libpypy`` on OS/X. 
+ +* You can avoid the ``LD_LIBRARY_PATH`` issue if you compile + ``libmy_plugin.so`` with the path hard-coded inside in the first + place. On Linux, this is done by ``gcc -Wl,-rpath=/some/path``. You + would put this option in ``ffibuilder.set_source("my_plugin", ..., + extra_link_args=['-Wl,-rpath=/some/path/to/libpypy'])``. The path can + start with ``$ORIGIN`` to mean "the directory where + ``libmy_plugin.so`` is". You can then specify a path relative to that + place, like ``extra_link_args=['-Wl,-rpath=$ORIGIN/../venv/bin']``. + (Use ``ldd libmy_plugin.so`` to look at what path is currently compiled + in after the expansion of ``$ORIGIN``.) + + After this, you don't need ``LD_LIBRARY_PATH`` any more to locate + ``libpython27.so`` or ``libpypy-c.so`` at runtime. In theory it + should also cover the call to ``gcc`` for the main program. I wasn't + able to make ``gcc`` happy without ``LD_LIBRARY_PATH`` on Linux if + the rpath starts with ``$ORIGIN``, though. + +* The same rpath trick might be used to let the main program find + ``libmy_plugin.so`` in the first place without ``LD_LIBRARY_PATH``. + (This doesn't apply if the main program uses ``dlopen()`` to load it + as a dynamic plugin.) You'd make the main program with ``gcc + -Wl,-rpath=/path/to/libmyplugin``, possibly with ``$ORIGIN``. The + ``$`` in ``$ORIGIN`` causes various shell problems on its own: if + using a common shell you need to say ``gcc + -Wl,-rpath=\$ORIGIN``. From a Makefile, you need to say + something like ``gcc -Wl,-rpath=\$$ORIGIN``. + +* On some Linux distributions, notably Debian, the ``.so`` files of + CPython C extension modules may be compiled without saying that they + depend on ``libpythonX.Y.so``. This makes such Python systems + unsuitable for embedding if the embedder uses ``dlopen(..., + RTLD_LOCAL)``. You get an ``undefined symbol`` error. See + `issue #264`__. 
A workaround is to first call + ``dlopen("libpythonX.Y.so", RTLD_LAZY|RTLD_GLOBAL)``, which will + force ``libpythonX.Y.so`` to be loaded first. + +.. __: https://bitbucket.org/cffi/cffi/issues/264/ + + +Using multiple CFFI-made DLLs +----------------------------- + +Multiple CFFI-made DLLs can be used by the same process. + +Note that all CFFI-made DLLs in a process share a single Python +interpreter. The effect is the same as the one you get by trying to +build a large Python application by assembling a lot of unrelated +packages. Some of these might be libraries that monkey-patch some +functions from the standard library, for example, which might be +unexpected from other parts. + + +Multithreading +-------------- + +Multithreading should work transparently, based on Python's standard +Global Interpreter Lock. + +If two threads both try to call a C function when Python is not yet +initialized, then locking occurs. One thread proceeds with +initialization and blocks the other thread. The other thread will be +allowed to continue only when the execution of the initialization-time +Python code is done. + +If the two threads call two *different* CFFI-made DLLs, the Python +initialization itself will still be serialized, but the two pieces of +initialization-time Python code will not. The idea is that there is a +priori no reason for one DLL to wait for initialization of the other +DLL to be complete. + +After initialization, Python's standard Global Interpreter Lock kicks +in. The end result is that when one CPU progresses on executing +Python code, no other CPU can progress on executing more Python code +from another thread of the same process. At regular intervals, the +lock switches to a different thread, so that no single thread should +appear to block indefinitely. + + +Testing +------- + +For testing purposes, a CFFI-made DLL can be imported in a running +Python interpreter instead of being loaded like a C shared library. 
+ +You might have some issues with the file name: for example, on +Windows, Python expects the file to be called ``c_module_name.pyd``, +but the CFFI-made DLL is called ``target.dll`` instead. The base name +``target`` is the one specified in ``ffibuilder.compile()``, and on Windows +the extension is ``.dll`` instead of ``.pyd``. You have to rename or +copy the file, or on POSIX use a symlink. + +The module then works like a regular CFFI extension module. It is +imported with "``from c_module_name import ffi, lib``" and exposes on +the ``lib`` object all C functions. You can test it by calling these +C functions. The initialization-time Python code frozen inside the +DLL is executed the first time such a call is done. + + +Embedding and Extending +----------------------- + +The embedding mode is not incompatible with the non-embedding mode of +CFFI. + +You can use *both* ``ffibuilder.embedding_api()`` and +``ffibuilder.cdef()`` in the +same build script. You put in the former the declarations you want to +be exported by the DLL; you put in the latter only the C functions and +types that you want to share between C and Python, but not export from +the DLL. + +As an example of that, consider the case where you would like to have +a DLL-exported C function written in C directly, maybe to handle some +cases before calling Python functions. To do that, you must *not* put +the function's signature in ``ffibuilder.embedding_api()``. (Note that this +requires more hacks if you use ``ffibuilder.embedding_api(f.read())``.) +You must only write the custom function definition in +``ffibuilder.set_source()``, and prefix it with the macro CFFI_DLLEXPORT: + +.. code-block:: c + + CFFI_DLLEXPORT int myfunc(int a, int b) + { + /* implementation here */ + } + +This function can, if it wants, invoke Python functions using the +general mechanism of "callbacks"---called this way because it is a +call from C to Python, although in this case it is not calling +anything back: + +.. 
code-block:: python + + ffibuilder.cdef(""" + extern "Python" int mycb(int); + """) + + ffibuilder.set_source("my_plugin", r""" + + static int mycb(int); /* the callback: forward declaration, to make + it accessible from the C code that follows */ + + CFFI_DLLEXPORT int myfunc(int a, int b) + { + int product = a * b; /* some custom C code */ + return mycb(product); + } + """) + +and then the Python initialization code needs to contain the lines: + +.. code-block:: python + + @ffi.def_extern() + def mycb(x): + print("hi, I'm called with x =", x) + return x * 10 + +This ``@ffi.def_extern`` is attaching a Python function to the C +callback ``mycb()``, which in this case is not exported from the DLL. +Nevertheless, the automatic initialization of Python occurs when +``mycb()`` is called, if it happens to be the first function called +from C. More precisely, it does not happen when ``myfunc()`` is +called: this is just a C function, with no extra code magically +inserted around it. It only happens when ``myfunc()`` calls +``mycb()``. + +As the above explanation hints, this is how ``ffibuilder.embedding_api()`` +actually implements function calls that directly invoke Python code; +here, we have merely decomposed it explicitly, in order to add some +custom C code in the middle. + +In case you need to force, from C code, Python to be initialized +before the first ``@ffi.def_extern()`` is called, you can do so by +calling the C function ``cffi_start_python()`` with no argument. It +returns an integer, 0 or -1, to tell if the initialization succeeded +or not. Currently there is no way to prevent a failing initialization +from also dumping a traceback and more information to stderr.