summaryrefslogtreecommitdiff
path: root/doc/source/embedding.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/source/embedding.rst')
-rw-r--r--doc/source/embedding.rst524
1 files changed, 524 insertions, 0 deletions
diff --git a/doc/source/embedding.rst b/doc/source/embedding.rst
new file mode 100644
index 0000000..181249c
--- /dev/null
+++ b/doc/source/embedding.rst
@@ -0,0 +1,524 @@
+================================
+Using CFFI for embedding
+================================
+
+.. contents::
+
+You can use CFFI to generate C code which exports the API of your choice
+to any C application that wants to link with this C code. This API,
+which you define yourself, ends up as the API of a ``.so/.dll/.dylib``
+library---or you can statically link it within a larger application.
+
+Possible use cases:
+
+* Exposing a library written in Python directly to C/C++ programs.
+
+* Using Python to make a "plug-in" for an existing C/C++ program that is
+ already written to load them.
+
+* Using Python to implement part of a larger C/C++ application (with
+ static linking).
+
+* Writing a small C/C++ wrapper around Python, hiding the fact that the
+ application is actually written in Python (to make a custom
+ command-line interface; for distribution purposes; or simply to make
+ it a bit harder to reverse-engineer the application).
+
+The general idea is as follows:
+
+* You write and execute a Python script, which produces a ``.c`` file
+ with the API of your choice (and optionally compile it into a
+ ``.so/.dll/.dylib``). The script also gives some Python code to be
+ "frozen" inside the ``.so``.
+
+* At runtime, the C application loads this ``.so/.dll/.dylib`` (or is
+ statically linked with the ``.c`` source) without having to know that
+ it was produced from Python and CFFI.
+
+* The first time a C function is called, Python is initialized and
+ the frozen Python code is executed.
+
+* The frozen Python code defines more Python functions that implement the
+ C functions of your API, which are then used for all subsequent C
+ function calls.
+
+One of the goals of this approach is to be entirely independent from
+the CPython C API: no ``Py_Initialize()`` nor ``PyRun_SimpleString()``
+nor even ``PyObject``. It works identically on CPython and PyPy.
+
+This is entirely *new in version 1.5.* (PyPy contains CFFI 1.5 since
+release 5.0.)
+
+
+Usage
+-----
+
+.. __: overview.html#embedding
+
+See the `paragraph in the overview page`__ for a quick introduction.
+In this section, we explain every step in more details. We will use
+here this slightly expanded example:
+
+.. code-block:: c
+
+ /* file plugin.h */
+ typedef struct { int x, y; } point_t;
+ extern int do_stuff(point_t *);
+
+.. code-block:: c
+
+ /* file plugin.h, Windows-friendly version */
+ typedef struct { int x, y; } point_t;
+
+ /* When including this file from ffibuilder.set_source(), the
+ following macro is defined to '__declspec(dllexport)'. When
+ including this file directly from your C program, we define
+ it to 'extern __declspec(dllimport)' instead.
+
+ With non-MSVC compilers we simply define it to 'extern'.
+ (The 'extern' is needed for sharing global variables;
+ functions would be fine without it. The macros always
+ include 'extern': you must not repeat it when using the
+ macros later.)
+ */
+ #ifndef CFFI_DLLEXPORT
+ # if defined(_MSC_VER)
+ # define CFFI_DLLEXPORT extern __declspec(dllimport)
+ # else
+ # define CFFI_DLLEXPORT extern
+ # endif
+ #endif
+
+ CFFI_DLLEXPORT int do_stuff(point_t *);
+
+.. code-block:: python
+
+ # file plugin_build.py
+ import cffi
+ ffibuilder = cffi.FFI()
+
+ with open('plugin.h') as f:
+ # read plugin.h and pass it to embedding_api(), manually
+ # removing the '#' directives and the CFFI_DLLEXPORT
+ data = ''.join([line for line in f if not line.startswith('#')])
+ data = data.replace('CFFI_DLLEXPORT', '')
+ ffibuilder.embedding_api(data)
+
+ ffibuilder.set_source("my_plugin", r'''
+ #include "plugin.h"
+ ''')
+
+ ffibuilder.embedding_init_code("""
+ from my_plugin import ffi
+
+ @ffi.def_extern()
+ def do_stuff(p):
+ print("adding %d and %d" % (p.x, p.y))
+ return p.x + p.y
+ """)
+
+ ffibuilder.compile(target="plugin-1.5.*", verbose=True)
+ # or: ffibuilder.emit_c_code("my_plugin.c")
+
+Running the code above produces a *DLL*, i,e, a dynamically-loadable
+library. It is a file with the extension ``.dll`` on Windows,
+``.dylib`` on Mac OS/X, or ``.so`` on other platforms. As usual, it
+is produced by generating some intermediate ``.c`` code and then
+calling the regular platform-specific C compiler. See below__ for
+some pointers to C-level issues with using the produced library.
+
+.. __: `Issues about using the .so`_
+
+Here are some details about the methods used above:
+
+* **ffibuilder.embedding_api(source):** parses the given C source, which
+ declares functions that you want to be exported by the DLL. It can
+ also declare types, constants and global variables that are part of
+ the C-level API of your DLL.
+
+ The functions that are found in ``source`` will be automatically
+ defined in the ``.c`` file: they will contain code that initializes
+ the Python interpreter the first time any of them is called,
+ followed by code to call the attached Python function (with
+ ``@ffi.def_extern()``, see next point).
+
+ The global variables, on the other hand, are not automatically
+ produced. You have to write their definition explicitly in
+ ``ffibuilder.set_source()``, as regular C code (see the point after next).
+
+* **ffibuilder.embedding_init_code(python_code):** this gives
+ initialization-time Python source code. This code is copied
+ ("frozen") inside the DLL. At runtime, the code is executed when
+ the DLL is first initialized, just after Python itself is
+ initialized. This newly initialized Python interpreter has got an
+ extra "built-in" module that can be loaded magically without
+ accessing any files, with a line like "``from my_plugin import ffi,
+ lib``". The name ``my_plugin`` comes from the first argument to
+ ``ffibuilder.set_source()``. This module represents "the caller's C world"
+ from the point of view of Python.
+
+ The initialization-time Python code can import other modules or
+ packages as usual. You may have typical Python issues like needing
+ to set up ``sys.path`` somehow manually first.
+
+ For every function declared within ``ffibuilder.embedding_api()``, the
+ initialization-time Python code or one of the modules it imports
+ should use the decorator ``@ffi.def_extern()`` to attach a
+ corresponding Python function to it.
+
+ If the initialization-time Python code fails with an exception, then
+ you get a traceback printed to stderr, along with more information
+ to help you identify problems like wrong ``sys.path``. If some
+ function remains unattached at the time where the C code tries to
+ call it, an error message is also printed to stderr and the function
+ returns zero/null.
+
+ Note that the CFFI module never calls ``exit()``, but CPython itself
+ contains code that calls ``exit()``, for example if importing
+ ``site`` fails. This may be worked around in the future.
+
+* **ffibuilder.set_source(c_module_name, c_code):** set the name of the
+ module from Python's point of view. It also gives more C code which
+ will be included in the generated C code. In trivial examples it
+ can be an empty string. It is where you would ``#include`` some
+ other files, define global variables, and so on. The macro
+ ``CFFI_DLLEXPORT`` is available to this C code: it expands to the
+ platform-specific way of saying "the following declaration should be
+ exported from the DLL". For example, you would put "``extern int
+ my_glob;``" in ``ffibuilder.embedding_api()`` and "``CFFI_DLLEXPORT int
+ my_glob = 42;``" in ``ffibuilder.set_source()``.
+
+ Currently, any *type* declared in ``ffibuilder.embedding_api()`` must also
+ be present in the ``c_code``. This is automatic if this code
+ contains a line like ``#include "plugin.h"`` in the example above.
+
+* **ffibuilder.compile([target=...] [, verbose=True]):** make the C code and
+ compile it. By default, it produces a file called
+ ``c_module_name.dll``, ``c_module_name.dylib`` or
+ ``c_module_name.so``, but the default can be changed with the
+ optional ``target`` keyword argument. You can use
+ ``target="foo.*"`` with a literal ``*`` to ask for a file called
+ ``foo.dll`` on Windows, ``foo.dylib`` on OS/X and ``foo.so``
+ elsewhere. One reason for specifying an alternate ``target`` is to
+ include characters not usually allowed in Python module names, like
+ "``plugin-1.5.*``".
+
+ For more complicated cases, you can call instead
+ ``ffibuilder.emit_c_code("foo.c")`` and compile the resulting ``foo.c``
+ file using other means. CFFI's compilation logic is based on the
+ standard library ``distutils`` package, which is really developed
+ and tested for the purpose of making CPython extension modules; it
+ might not always be appropriate for making general DLLs. Also, just
+ getting the C code is what you need if you do not want to make a
+ stand-alone ``.so/.dll/.dylib`` file: this C file can be compiled
+ and statically linked as part of a larger application.
+
+
+More reading
+------------
+
+If you're reading this page about embedding and you are not familiar
+with CFFI already, here are a few pointers to what you could read
+next:
+
+* For the ``@ffi.def_extern()`` functions, integer C types are passed
+ simply as Python integers; and simple pointers-to-struct and basic
+ arrays are all straightforward enough. However, sooner or later you
+ will need to read about this topic in more details here__.
+
+* ``@ffi.def_extern()``: see `documentation here,`__ notably on what
+ happens if the Python function raises an exception.
+
+* To create Python objects attached to C data, one common solution is
+ to use ``ffi.new_handle()``. See documentation here__.
+
+* In embedding mode, the major direction is C code that calls Python
+ functions. This is the opposite of the regular extending mode of
+ CFFI, in which the major direction is Python code calling C. That's
+ why the page `Using the ffi/lib objects`_ talks first about the
+ latter, and why the direction "C code that calls Python" is
+ generally referred to as "callbacks" in that page. If you also
+ need to have your Python code call C code, read more about
+ `Embedding and Extending`_ below.
+
+* ``ffibuilder.embedding_api(source)``: follows the same syntax as
+ ``ffibuilder.cdef()``, `documented here.`__ You can use the "``...``"
+ syntax as well, although in practice it may be less useful than it
+ is for ``cdef()``. On the other hand, it is expected that often the
+ C sources that you need to give to ``ffibuilder.embedding_api()`` would be
+ exactly the same as the content of some ``.h`` file that you want to
+ give to users of your DLL. That's why the example above does this::
+
+ with open('foo.h') as f:
+ ffibuilder.embedding_api(f.read())
+
+ Note that a drawback of this approach is that ``ffibuilder.embedding_api()``
+ doesn't support ``#ifdef`` directives. You may have to use a more
+ convoluted expression like::
+
+ with open('foo.h') as f:
+ lines = [line for line in f if not line.startswith('#')]
+ ffibuilder.embedding_api(''.join(lines))
+
+ As in the example above, you can also use the same ``foo.h`` from
+ ``ffibuilder.set_source()``::
+
+ ffibuilder.set_source('module_name', r'''
+ #include "foo.h"
+ ''')
+
+
+.. __: using.html#working
+.. __: using.html#def-extern
+.. __: ref.html#ffi-new-handle
+.. __: cdef.html#cdef
+
+.. _`Using the ffi/lib objects`: using.html
+
+
+Troubleshooting
+---------------
+
+* The error message
+
+ cffi extension module 'c_module_name' has unknown version 0x2701
+
+ means that the running Python interpreter located a CFFI version older
+ than 1.5. CFFI 1.5 or newer must be installed in the running Python.
+
+* On PyPy, the error message
+
+ debug: pypy_setup_home: directories 'lib-python' and 'lib_pypy' not
+ found in pypy's shared library location or in any parent directory
+
+ means that the ``libpypy-c.so`` file was found, but the standard library
+ was not found from this location. This occurs at least on some Linux
+ distributions, because they put ``libpypy-c.so`` inside ``/usr/lib/``,
+ instead of the way we recommend, which is: keep that file inside
+ ``/opt/pypy/bin/`` and put a symlink to there from ``/usr/lib/``.
+ The quickest fix is to do that change manually.
+
+
+Issues about using the .so
+--------------------------
+
+This paragraph describes issues that are not necessarily specific to
+CFFI. It assumes that you have obtained the ``.so/.dylib/.dll`` file as
+described above, but that you have troubles using it. (In summary: it
+is a mess. This is my own experience, slowly built by using Google and
+by listening to reports from various platforms. Please report any
+inaccuracies in this paragraph or better ways to do things.)
+
+* The file produced by CFFI should follow this naming pattern:
+ ``libmy_plugin.so`` on Linux, ``libmy_plugin.dylib`` on Mac, or
+ ``my_plugin.dll`` on Windows (no ``lib`` prefix on Windows).
+
+* First note that this file does not contain the Python interpreter
+ nor the standard library of Python. You still need it to be
+ somewhere. There are ways to compact it to a smaller number of files,
+ but this is outside the scope of CFFI (please report if you used some
+ of these ways successfully so that I can add some links here).
+
+* In what we'll call the "main program", the ``.so`` can be either
+ used dynamically (e.g. by calling ``dlopen()`` or ``LoadLibrary()``
+ inside the main program), or at compile-time (e.g. by compiling it
+ with ``gcc -lmy_plugin``). The former case is always used if you're
+ building a plugin for a program, and the program itself doesn't need
+ to be recompiled. The latter case is for making a CFFI library that
+ is more tightly integrated inside the main program.
+
+* In the case of compile-time usage: you can add the gcc
+ option ``-Lsome/path/`` before ``-lmy_plugin`` to describe where the
+ ``libmy_plugin.so`` is. On some platforms, notably Linux, ``gcc``
+ will complain if it can find ``libmy_plugin.so`` but not
+ ``libpython27.so`` or ``libpypy-c.so``. To fix it, you need to call
+ ``LD_LIBRARY_PATH=/some/path/to/libpypy gcc``.
+
+* When actually executing the main program, it needs to find the
+ ``libmy_plugin.so`` but also ``libpython27.so`` or ``libpypy-c.so``.
+ For PyPy, unpack a PyPy distribution and you get a full directory
+ structure with ``libpypy-c.so`` inside a ``bin`` subdirectory, or on
+ Windows ``pypy-c.dll`` inside the top directory; you must not move
+ this file around, but just point to it. One way to point to it is by
+ running the main program with some environment variable:
+ ``LD_LIBRARY_PATH=/some/path/to/libpypy`` on Linux,
+ ``DYLD_LIBRARY_PATH=/some/path/to/libpypy`` on OS/X.
+
+* You can avoid the ``LD_LIBRARY_PATH`` issue if you compile
+ ``libmy_plugin.so`` with the path hard-coded inside in the first
+ place. On Linux, this is done by ``gcc -Wl,-rpath=/some/path``. You
+ would put this option in ``ffibuilder.set_source("my_plugin", ...,
+ extra_link_args=['-Wl,-rpath=/some/path/to/libpypy'])``. The path can
+ start with ``$ORIGIN`` to mean "the directory where
+ ``libmy_plugin.so`` is". You can then specify a path relative to that
+ place, like ``extra_link_args=['-Wl,-rpath=$ORIGIN/../venv/bin']``.
+ Use ``ldd libmy_plugin.so`` to look at what path is currently compiled
+ in after the expansion of ``$ORIGIN``.)
+
+ After this, you don't need ``LD_LIBRARY_PATH`` any more to locate
+ ``libpython27.so`` or ``libpypy-c.so`` at runtime. In theory it
+ should also cover the call to ``gcc`` for the main program. I wasn't
+ able to make ``gcc`` happy without ``LD_LIBRARY_PATH`` on Linux if
+ the rpath starts with ``$ORIGIN``, though.
+
+* The same rpath trick might be used to let the main program find
+ ``libmy_plugin.so`` in the first place without ``LD_LIBRARY_PATH``.
+ (This doesn't apply if the main program uses ``dlopen()`` to load it
+ as a dynamic plugin.) You'd make the main program with ``gcc
+ -Wl,-rpath=/path/to/libmyplugin``, possibly with ``$ORIGIN``. The
+ ``$`` in ``$ORIGIN`` causes various shell problems on its own: if
+ using a common shell you need to say ``gcc
+ -Wl,-rpath=\$ORIGIN``. From a Makefile, you need to say
+ something like ``gcc -Wl,-rpath=\$$ORIGIN``.
+
+* On some Linux distributions, notably Debian, the ``.so`` files of
+ CPython C extension modules may be compiled without saying that they
+ depend on ``libpythonX.Y.so``. This makes such Python systems
+ unsuitable for embedding if the embedder uses ``dlopen(...,
+ RTLD_LOCAL)``. You get an ``undefined symbol`` error. See
+ `issue #264`__. A workaround is to first call
+ ``dlopen("libpythonX.Y.so", RTLD_LAZY|RTLD_GLOBAL)``, which will
+ force ``libpythonX.Y.so`` to be loaded first.
+
+.. __: https://bitbucket.org/cffi/cffi/issues/264/
+
+
+Using multiple CFFI-made DLLs
+-----------------------------
+
+Multiple CFFI-made DLLs can be used by the same process.
+
+Note that all CFFI-made DLLs in a process share a single Python
+interpreter. The effect is the same as the one you get by trying to
+build a large Python application by assembling a lot of unrelated
+packages. Some of these might be libraries that monkey-patch some
+functions from the standard library, for example, which might be
+unexpected from other parts.
+
+
+Multithreading
+--------------
+
+Multithreading should work transparently, based on Python's standard
+Global Interpreter Lock.
+
+If two threads both try to call a C function when Python is not yet
+initialized, then locking occurs. One thread proceeds with
+initialization and blocks the other thread. The other thread will be
+allowed to continue only when the execution of the initialization-time
+Python code is done.
+
+If the two threads call two *different* CFFI-made DLLs, the Python
+initialization itself will still be serialized, but the two pieces of
+initialization-time Python code will not. The idea is that there is a
+priori no reason for one DLL to wait for initialization of the other
+DLL to be complete.
+
+After initialization, Python's standard Global Interpreter Lock kicks
+in. The end result is that when one CPU progresses on executing
+Python code, no other CPU can progress on executing more Python code
+from another thread of the same process. At regular intervals, the
+lock switches to a different thread, so that no single thread should
+appear to block indefinitely.
+
+
+Testing
+-------
+
+For testing purposes, a CFFI-made DLL can be imported in a running
+Python interpreter instead of being loaded like a C shared library.
+
+You might have some issues with the file name: for example, on
+Windows, Python expects the file to be called ``c_module_name.pyd``,
+but the CFFI-made DLL is called ``target.dll`` instead. The base name
+``target`` is the one specified in ``ffibuilder.compile()``, and on Windows
+the extension is ``.dll`` instead of ``.pyd``. You have to rename or
+copy the file, or on POSIX use a symlink.
+
+The module then works like a regular CFFI extension module. It is
+imported with "``from c_module_name import ffi, lib``" and exposes on
+the ``lib`` object all C functions. You can test it by calling these
+C functions. The initialization-time Python code frozen inside the
+DLL is executed the first time such a call is done.
+
+
+Embedding and Extending
+-----------------------
+
+The embedding mode is not incompatible with the non-embedding mode of
+CFFI.
+
+You can use *both* ``ffibuilder.embedding_api()`` and
+``ffibuilder.cdef()`` in the
+same build script. You put in the former the declarations you want to
+be exported by the DLL; you put in the latter only the C functions and
+types that you want to share between C and Python, but not export from
+the DLL.
+
+As an example of that, consider the case where you would like to have
+a DLL-exported C function written in C directly, maybe to handle some
+cases before calling Python functions. To do that, you must *not* put
+the function's signature in ``ffibuilder.embedding_api()``. (Note that this
+requires more hacks if you use ``ffibuilder.embedding_api(f.read())``.)
+You must only write the custom function definition in
+``ffibuilder.set_source()``, and prefix it with the macro CFFI_DLLEXPORT:
+
+.. code-block:: c
+
+ CFFI_DLLEXPORT int myfunc(int a, int b)
+ {
+ /* implementation here */
+ }
+
+This function can, if it wants, invoke Python functions using the
+general mechanism of "callbacks"---called this way because it is a
+call from C to Python, although in this case it is not calling
+anything back:
+
+.. code-block:: python
+
+ ffibuilder.cdef("""
+ extern "Python" int mycb(int);
+ """)
+
+ ffibuilder.set_source("my_plugin", r"""
+
+ static int mycb(int); /* the callback: forward declaration, to make
+ it accessible from the C code that follows */
+
+ CFFI_DLLEXPORT int myfunc(int a, int b)
+ {
+ int product = a * b; /* some custom C code */
+ return mycb(product);
+ }
+ """)
+
+and then the Python initialization code needs to contain the lines:
+
+.. code-block:: python
+
+ @ffi.def_extern()
+ def mycb(x):
+ print "hi, I'm called with x =", x
+ return x * 10
+
+This ``@ffi.def_extern`` is attaching a Python function to the C
+callback ``mycb()``, which in this case is not exported from the DLL.
+Nevertheless, the automatic initialization of Python occurs when
+``mycb()`` is called, if it happens to be the first function called
+from C. More precisely, it does not happen when ``myfunc()`` is
+called: this is just a C function, with no extra code magically
+inserted around it. It only happens when ``myfunc()`` calls
+``mycb()``.
+
+As the above explanation hints, this is how ``ffibuilder.embedding_api()``
+actually implements function calls that directly invoke Python code;
+here, we have merely decomposed it explicitly, in order to add some
+custom C code in the middle.
+
+In case you need to force, from C code, Python to be initialized
+before the first ``@ffi.def_extern()`` is called, you can do so by
+calling the C function ``cffi_start_python()`` with no argument. It
+returns an integer, 0 or -1, to tell if the initialization succeeded
+or not. Currently there is no way to prevent a failing initialization
+from also dumping a traceback and more information to stderr.