Understanding Cython

by Philip Herron | October 2013 | .NET Open Source

In this article, Philip Herron, author of Learning Cython Programming, starts to get serious with Cython and discusses how to describe C declarations in Cython, along with calling conventions and type conversion.

If you were to create an API for Python, you should write it using Cython to create a more type-safe Python API. Or, you could take the C types from Cython to implement the same algorithms in your Python code, and they will be faster because you're specifying the types and you avoid a lot of the type conversion required.

Suppose you are starting a fresh project in C. A few issues always come up at the start; for example, choosing the logging or configuration system we will use or implement.

With Cython, we can reuse the Python logging system as well as the ConfigParser standard library module from our C code to get a head start. If this doesn't prove to be the correct solution, we can chop and change easily. We can even extend and get Python to handle all usage. Since the Python API is very powerful, we might as well make Python do as much as it can to get us off the ground. Another question is whether we want Python to be our "driver" (main entry function) or whether we want to handle this from our C code.

Cython cdef

In the next two examples, I will demonstrate how we can reuse the Python logging and Python ConfigParser modules directly from C code. But there are a few formalities to get over first, namely the Python initialization API and the linking model to use: a fully embedded Python application or the shared library method.

It's very simple to embed Python within a C/C++ application; you will require the following boilerplate:

#include <Python.h>

int main (int argc, char ** argv)
{
  Py_SetProgramName (argv [0]);
  Py_Initialize ();

  /* Do all your stuff inside here... */

  Py_Finalize ();
  return 0;
}

Make sure you always put the Python.h header at the very beginning of each C file, before any system headers, because Python.h defines a number of preprocessor macros that affect how the system headers behave on your system.

Later, I will introduce some important concepts about the GIL that you should know and the relevant Python API code you will need to use from time to time. But for now, these few calls will be enough for you to get off the ground.

Linking models

Linking models are extremely important when considering how we can extend or embed things in native applications. There are two main linking models for Cython. The first is fully embedded Python, which looks like the following figure:

This demonstrates a fully embedded Python application where the Python runtime is linked into the final binary. This means we already have the Python runtime, whereas before we had to run the Python interpreter to call into our Cython module. There is also a Python shared object module as shown in the following figure:

We have now fully modularized Python. This would be a more Pythonic approach to Cython, and if your code base is mostly Python, this is the approach you should take if you simply want to have a native module to call into some native code, as this lends your code to be more dynamic and reusable.

The public keyword

Moving on from linking models, we should next look at the public keyword, which allows Cython to generate a C/C++ header file that we can include with the prototypes to call directly into Python code from C.

The main caveat when calling Python public declarations directly from C applies if your link model is fully embedded and linked against libpython.so: you need to use the boilerplate code shown in the previous section, and before calling any of the public functions, you need to initialize the Python module. For example, suppose you have a cythonfile.pyx file and compile it with public declarations such as the following:

cdef public void cythonFunction ():
    print "inside cython function!!!"

You will not only get a cythonfile.c file but also cythonfile.h; this declares a function called extern void initcythonfile (void). So, before calling anything to do with the Cython code, use the following:

/* Boiler plate init Python */
Py_SetProgramName (argv [0]);
Py_Initialize ();

/* Init our Cython module into Python memory */
initcythonfile ();
cythonFunction ();

/* cleanup python before exit ... */
Py_Finalize ();

Calling initcythonfile can be considered as the following in Python:

import cythonfile

Just like the previous examples, this only affects you if you're generating a fully embedded Python binary.

Logging into Python

A good example of Cython's abilities, in my opinion, is reusing the Python logging module directly from C. For example, we want a few macros we can rely on, such as info (…), that can handle variadic arguments (__VA_ARGS__) and feel as if we are calling a simple printf function.

I think that after this example, you should start to see how things might work when mixing C and Python now that the cdef and public keywords start to bring things to life:

import logging

cdef public void initLogging (char * logfile):
    logging.basicConfig (filename = logfile,
                         level = logging.DEBUG,
                         format = '%(levelname)s %(asctime)s: %(message)s',
                         datefmt = '%m/%d/%Y %I:%M:%S')

cdef public void pyinfo (char * message):
    logging.info (message)

cdef public void pydebug (char * message):
    logging.debug (message)

cdef public void pyerror (char * message):
    logging.error (message)

This could serve as a simple wrapper for calling directly into the Python logger, but we can make this even more awesome in our C code with C99 __VA_ARGS__ and GCC's printf-style format attribute. This will make our functions look and work just like printf, including compile-time checking of the format string. We can define some headers to wrap our calls to this in C as follows:

#ifndef __MAIN_H__
#define __MAIN_H__

#include <Python.h>
#include <stdio.h>
#include <stdarg.h>

#define printflike \
  __attribute__ ((format (printf, 3, 4)))

extern void printflike cinfo (const char *, unsigned, const char *, ...);
extern void printflike cdebug (const char *, unsigned, const char *, ...);
extern void printflike cerror (const char *, unsigned, const char *, ...);

#define info(...) \
  cinfo (__FILE__, __LINE__, __VA_ARGS__)
#define error(...) \
  cerror (__FILE__, __LINE__, __VA_ARGS__)
#define debug(...) \
  cdebug (__FILE__, __LINE__, __VA_ARGS__)

#include "logger.h" // remember to import our cython public's

#endif //__MAIN_H__

Now we have these macros calling cinfo and the rest, and we can see the file and line number where we call these logging functions:

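void cdebug (const char * file, unsigned line,
             const char * fmt, ...)
{
  char buffer [256];
  char buf [512];
  va_list args;

  va_start (args, fmt);
  /* vsnprintf truncates long messages instead of overflowing buffer */
  vsnprintf (buffer, sizeof (buffer), fmt, args);
  va_end (args);

  /* prefix the message with the calling file and line number */
  snprintf (buf, sizeof (buf), "%s-%u -> %s",
            file, line, buffer);
  pydebug (buf);
}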

On calling debug ("debug message"), we see the following output:

Philips-MacBook:cpy-logging redbrain$ ./example log
Philips-MacBook:cpy-logging redbrain$ cat log
INFO 05/06/2013 12:28:24: main.c-62 -> info message
DEBUG 05/06/2013 12:28:24: main.c-63 -> debug message
ERROR 05/06/2013 12:28:24: main.c-64 -> error message
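
To make the formatting step concrete, here is a minimal, self-contained sketch of what cdebug does before it hands the string to Python. The helper name format_log_line is hypothetical and is not part of the original sources; it writes into a caller-supplied buffer instead of calling pydebug so that it can be compiled without Python.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not in the original sources): builds the
 * "file-line -> message" string that cdebug passes on to pydebug.
 * Returns the value of snprintf, that is, the length the full line
 * would have had without truncation. */
int format_log_line (char * out, size_t outlen,
                     const char * file, unsigned line,
                     const char * fmt, ...)
{
  char message [256];
  va_list args;

  va_start (args, fmt);
  /* vsnprintf never writes more than sizeof (message) bytes */
  vsnprintf (message, sizeof (message), fmt, args);
  va_end (args);

  return snprintf (out, outlen, "%s-%u -> %s", file, line, message);
}
```

Called with "main.c", 63, and the format "%s message" with the argument "debug", this fills the buffer with main.c-63 -> debug message, which is the part of the DEBUG line that follows the timestamp in the log above.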

Also, note that inside the Cython module we can import and do everything we would normally do in Python, so don't be afraid to make lists or classes and use these to help out. Remember that if you had a Cython module with public declarations calling into the logging module, this integrates your applications as if they were one.

More importantly, you only need all of this boilerplate when you fully embed Python, not when you compile your module to a shared library.

Python ConfigParser

Another useful case is to make Python's ConfigParser accessible in some way from C; ideally, all we really want is to have a function to which we pass the path to a config file to receive a STATUS OK/FAIL message and a filled buffer of the configuration that we need:

from ConfigParser import SafeConfigParser, NoSectionError

cdef extern from "main.h":
    struct config:
        char * path
        int number

cdef config myconfig

Here, we've Cythoned our struct and declared a module-level instance, which is easier to manage because it outlives the function call:

cdef public config * parseConfig (char * cfg):
    # module-level reference: keeps the Python string alive so the
    # char * stored in myconfig.path stays valid after we return
    global pypath
    # initialize the module-level variable for our config...
    myconfig.path = NULL
    myconfig.number = 0
    parser = SafeConfigParser ()
    try:
        parser.readfp (open (cfg))
        pynumber = int (parser.get ("example", "number"))
        pypath = parser.get ("example", "path")
    except NoSectionError:
        print "No section named example"
        return NULL
    except IOError:
        print "no such file ", cfg
        return NULL
    # only reached when parsing succeeded: copy Python values into C types
    myconfig.number = pynumber
    myconfig.path = pypath
    return &myconfig

This is a fairly trivial piece of Cython code that returns NULL on error and otherwise a pointer to the struct containing the configuration:

Philips-MacBook:cpy-configparser redbrain$ ./example sample.cfg
cfg->path = some/path/to/something
cfg-number = 15

As you can see, we easily parsed a config file without using any C code. I always found figuring out how I was going to parse config files in C to be a nightmare. I usually ended up writing my own mini domain-specific language using Flex and Bison as a parser as well as my own middle-end, which is just too involved.
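
On the C side, the caller only has to check the returned pointer before using it. The following is a minimal sketch of that check, assuming main.h declares struct config with the two fields shown in the Cython declaration; the helper describe_config is hypothetical and exists only for illustration.

```c
#include <stdio.h>

/* Assumed layout of the struct declared in main.h; the field names
 * are taken from the Cython declaration, the header itself is not
 * shown in the article. */
struct config {
  char * path;
  int number;
};

/* Hypothetical helper: writes a one-line summary of the configuration
 * into out. Returns -1 when parsing failed (cfg is NULL), otherwise
 * the value of snprintf. */
int describe_config (const struct config * cfg, char * out, size_t outlen)
{
  if (cfg == NULL)
    return -1;

  return snprintf (out, outlen, "path=%s number=%d",
                   cfg->path != NULL ? cfg->path : "(unset)",
                   cfg->number);
}
```

In a real program, the pointer passed in would be the result of parseConfig, and a return value of -1 is where the program would report the failure and exit.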

Learning Cython Programming Expand your existing legacy applications in C using Python with this book and ebook
Published: September 2013
eBook Price: £11.99
Book Price: £19.99

Cython cdef syntax and usage reference

So far, we have explored how to set up Cython and how to run "Hello World" modules. Not only that, we have also seen how we can call our own C code from Python. Let's take a look at how we can interface Python into different C declarations such as structs, enums, and typedefs. We will use this to build up a cool project at the end of the article.

Although not that interesting or fun, this small section should serve as a reference for you later on when you're building your next awesome project.

Structs

#ifndef __MYCODE_H__
#define __MYCODE_H__

struct mystruct {
  char * string;
  int integer;
  char ** string_array;
};

extern void printStruct (struct mystruct *);

#endif //__MYCODE_H__

Now we can use Cython to interface and initialize structs and even allocate/free memory. There are a few pointers to make a note of when doing this, so let's create the code. First we need to create the Cython declaration:

cdef extern from "mycode.h":
    struct mystruct:
        char * string
        int integer
        char ** string_array

    void printStruct (mystruct *)

def testStruct ():
    cdef mystruct s
    cdef char *array [2]

    s.string = "Hello World"
    s.integer = 2
    array [0] = "foo"
    array [1] = "bar"
    s.string_array = array
    printStruct (&s)

Let's look at this line by line. First off, we see cdef extern from; this tells Cython that these are external C declarations and that the original C declarations can be included from mycode.h; the code generated by Cython includes this header, which avoids warnings about undeclared symbols. Anything within this cdef extern suite is treated by Cython as a cdef declaration. The struct looks very similar to a normal C struct; just be careful with your indentation. Also be sure, even in cdef functions, that if you want explicit C types, you declare the variables with cdef and a type to make sure they are of the correct type and not just PyObjects.

There are a few subtleties in the testStruct function. We declare our struct and array on the stack with cdef, as this is how we declare C variables. In Cython, we have the address-of operator &; this works just as in C, so with the struct on the stack we can pass a pointer to it via & just like in C. But we don't have a -> operator in Cython, so when accessing struct members (even through a pointer), we simply use the . operator. Cython resolves this at compile time. We also have syntax in Cython to specify fixed-length arrays as shown, and assignment should look very familiar. A simple makefile for this system would be as follows:

all:
	cython -2 -o mycodepy.c mycodepy.pyx
	gcc -g -O2 -fpic -c mycodepy.c -o mycodepy.o `python-config --cflags`
	gcc -g -O2 -fpic -c mycode.c -o mycode.o
	gcc -g -O2 -shared -o mycodepy.so mycode.o mycodepy.o

clean:
	rm -f *.o *.so *~ mycodepy.c

And a simple printStruct function would be as follows:

#include <stdio.h>
#include "mycode.h"

void printStruct (struct mystruct * s)
{
  int i;

  printf (".string = %s\n", s->string);
  printf (".integer = %i\n", s->integer);
  printf (".string_array = \n");
  /* .integer doubles as the length of .string_array here */
  for (i = 0; i < s->integer; ++i)
    printf ("\t[%i] = %s\n", i, s->string_array [i]);
}

A simple run of this in the downloaded code is as follows:

redbrain@blue-sun:~/workspace/cython-book/chapter2/c-decl-reference$ make
cython -2 -o mycodepy.c mycodepy.pyx
gcc -g -O2 -fpic -c mycodepy.c -o mycodepy.o `python-config --cflags`
gcc -g -O2 -fpic -c mycode.c -o mycode.o
gcc -g -O2 -shared -o mycodepy.so mycode.o mycodepy.o
redbrain@blue-sun:~/workspace/cython-book/chapter2/c-decl-reference$ python
Python 2.7.3 (default, Sep 26 2012, 21:51:14)
[GCC 4.7.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from mycodepy import testStruct
>>> testStruct ()
.string = Hello World
.integer = 2
.string_array =
[0] = foo
[1] = bar

This simply demonstrates that Cython can work properly with C structs—it initialized the C struct and assigned it data correctly, as you would expect if it was from C.

Enums

Interfacing with C enums is simple. If you have the following enum in C:

enum cardsuit {
  CLUBS,
  DIAMONDS,
  HEARTS,
  SPADES
};

This can be expressed as the following Cython declaration:

cdef enum cardsuit:
    CLUBS, DIAMONDS, HEARTS, SPADES

Then, use the following as the cdef declaration within our code:

cdef cardsuit card = CLUBS
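
Because the Cython declaration only mirrors the C one, the values involved are the plain integers that C assigns to the enumerators, starting at 0. The sketch below shows this on the C side; the helper cardsuit_name is hypothetical and is there only to give the enum something to do.

```c
/* The C enum from above: enumerators are numbered from 0 upwards,
 * so CLUBS is 0 and SPADES is 3. */
enum cardsuit {
  CLUBS,
  DIAMONDS,
  HEARTS,
  SPADES
};

/* Hypothetical helper: maps a suit to a printable name. */
const char * cardsuit_name (enum cardsuit suit)
{
  switch (suit)
    {
    case CLUBS:    return "clubs";
    case DIAMONDS: return "diamonds";
    case HEARTS:   return "hearts";
    case SPADES:   return "spades";
    }
  return "unknown";
}
```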

Typedef and function pointers

Typedefs are just how you would expect them to be. It's simpler to understand with examples; consider the following C code:

struct foobar {
  int x;
  char * y;
};
typedef struct foobar foobar_t;

In Cython, this can be described by the following:

cdef struct foobar:
    int x
    char * y

ctypedef foobar foobar_t

# You can also typedef pointers too
ctypedef int * int_ptr

We can also typedef function pointers as follows:

typedef void (*cfptr) (int);

In Cython, this will be as follows:

ctypedef void (*cfptr)(int)
# then we use the function pointer:
cdef cfptr myfunctionptr = &myfunc
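
For comparison, this is roughly what the same function pointer usage looks like in plain C. It is a sketch: myfunc is only named in the Cython snippet above, so its body here, and the helper call_through_pointer, are assumptions made for illustration.

```c
typedef void (*cfptr) (int);

/* Records the last value the callback received, so the effect of the
 * call can be observed. */
static int last_value = 0;

/* Assumed body for myfunc; the article does not show one. */
void myfunc (int value)
{
  last_value = value;
}

/* Hypothetical helper: takes the address of myfunc, calls it through
 * the pointer, and returns what the callback recorded. */
int call_through_pointer (int value)
{
  cfptr myfunctionptr = &myfunc;

  myfunctionptr (value);
  return last_value;
}
```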

Overall, this should be the reference you use whenever you are using Cython to understand how your C types map to Cython and to see how you can use them.

Scalable asynchronous servers

Using all the concepts learned in this article, I want to show you how we can use Cython to build something awesome—a complete messaging server that uses C to do all the low-level I/O and libevent to keep everything asynchronous. This means we will be using callbacks to handle the events that we will manage in the Python messaging engine. We can then define a simple protocol for a messaging system and roster. This design can be easily extended to a lot of things. To see if we are on the same page, refer to the following figure:


C sockets with libevent

For those of you who are unfamiliar with libevent, I will now give a brief overview and show the main parts of the code.

What is libevent?

libevent lets us create a socket in C, pass its file descriptor to libevent, and register the events we care about; for example, if a client connects to this socket, we can tell libevent to listen for it and it will call our callback. Other events such as errors (clients going offline) or reads (clients pushing up data) can be handled in the same manner. We use libevent because it is scalable and well defined, and it is a far better choice than writing our own polling event loop.

Once we create a socket, we must make it non-blocking for libevent. This useful snippet of C code may or may not be familiar to you, but it's a useful one to have in your tool-belt:

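#include <fcntl.h>

int setnonblock (int fd)
{
  int flags;

  /* read the current file status flags */
  flags = fcntl (fd, F_GETFL);
  if (flags < 0)
    return flags;

  /* add O_NONBLOCK and write the flags back */
  flags |= O_NONBLOCK;
  if (fcntl (fd, F_SETFL, flags) < 0)
    return -1;
  return 0;
}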
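
To see what the non-blocking flag changes, the following sketch applies setnonblock to the read end of an empty pipe and checks that read returns immediately with EAGAIN instead of waiting for data. A pipe stands in for a socket here so the example needs no network setup; read_would_block is a hypothetical name, and setnonblock is repeated only so the sketch compiles on its own.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Same logic as setnonblock above, repeated so this compiles alone. */
int setnonblock (int fd)
{
  int flags = fcntl (fd, F_GETFL);

  if (flags < 0)
    return flags;
  flags |= O_NONBLOCK;
  if (fcntl (fd, F_SETFL, flags) < 0)
    return -1;
  return 0;
}

/* Hypothetical check: returns 1 when reading from an empty
 * non-blocking pipe fails straight away with EAGAIN/EWOULDBLOCK,
 * 0 when it does not, and -1 when the setup itself failed. */
int read_would_block (void)
{
  int fds [2];
  char byte;
  int result = 0;

  if (pipe (fds) < 0)
    return -1;

  if (setnonblock (fds [0]) < 0)
    result = -1;
  else if (read (fds [0], &byte, 1) < 0
           && (errno == EAGAIN || errno == EWOULDBLOCK))
    result = 1;

  close (fds [0]);
  close (fds [1]);
  return result;
}
```

Without the call to setnonblock, the read would wait until data arrived on the pipe, which is exactly what an event loop has to avoid.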

Once you create a socket, you pass the resulting file descriptor to setnonblock and then create an on-connect event for libevent:

struct event ev_accept;

event_assign (&ev_accept, evbase,
              sockfd,
              EV_READ|EV_PERSIST,
              &callback_client_connect,
              NULL);
event_add (&ev_accept, NULL);

Now we have an event that will call the callback_client_connect function. Test this server with the following:

redbrain@blue-sun:~/workspace/cython-book/chapter2/async-server/server1$ make
gcc -g -O2 -Wall -c server.c -o server.o
gcc -g -O2 -o server server.o -levent
redbrain@blue-sun:~/workspace/cython-book/chapter2/async-server/server1$ ./server

In another shell or multiple shells, run telnet to act as a simple client for now:

$ telnet localhost 9080

You can now type away and see all your data and events. At the moment, this is just a dumb event-driven messaging system, but imagine how you would begin adding a messaging engine to pass messages between clients and how you would set up a protocol in C. It would take some time to map out and, in general, it would be an unpleasant experience. We can use Cython to take control of the server and create our logic in Python using callbacks.

Messaging engine

With these callbacks, we can start making use of Python very easily to make this project awesome.

Cython callbacks

If you look at cython-book/chapter2/async-server/server2, you can see the callbacks in action:

./messagingServer -c config/server.cfg -l server.log

You can also spawn multiple telnet sessions again to see some things being printed out. There is a lot going on here, so I will break it down first. If you look inside this directory, you will see pyserver.pyx and pyserver.pxd. Here, we will introduce the pseudo Cython header files (*.pxd).

Cython PXD

The use of PXD files is very similar to that of header files in C/C++. We can simply use our cdef declarations like extern functions or struct definitions and then use the following within a *.pyx file:

cimport pyserver

Now you can just code your method prototypes like you would in C and the cimport of the PXD file will get all the definitions.

Now that you have seen how *.pxd files work, we will remove the main method from server.c so we can use Python to control the whole system. If you look at pyserver.pyx, you will see the pyinit_server function; it takes a port number. Once we build the shared library, we can import pyserver and pass the configuration of the server from pure Python. We also call into server.c to set the callbacks, which are cdef Cython functions whose addresses we pass to the server:

static callback conncb, discb, readcb;

void setConnect_PyCallback (callback c)
{
  conncb = c;
}

void setDisconnect_PyCallback (callback c)
{
  discb = c;
}

void setRead_PyCallback (callback c)
{
  readcb = c;
}

Now, in each of the events that exist, we can call these callbacks simply with readcb (NULL, NULL) and we will be in Python land. You can look at the Cython functions in depth in the pyserver.pyx file; however, know that they just print out some data:

cdef void pyconnect_callback (client *c, char * args):
    print c.cid, "is online..."

cdef void pydisconnect_callback (client *c, char * args):
    print c.cid, "went offline..."

cdef void pyread_callback (client *c, char * args):
    print c.cid, "said: ", args

These are your basic callbacks into Cython code from the native event-driven system. You can see the basic main method in the messagingServer.py file. It is executable and initializes everything required for our purposes: it simply imports pyserver and calls pyinit_server with a port. I know this may seem a fairly niche example, but I truly believe it demonstrates how cool C/Python can be. With this, you can use Python to control the configuration of system-level C components very easily, which can be fiddly to do well in pure C. We let Python do it.
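
The registration pattern on the C side can be sketched in isolation as follows. The article does not show the callback typedef or the client struct, so both are assumptions here (including the type of cid), and dispatch_read and record_read are hypothetical helpers; record_read is a plain C stand-in for the Cython function pyread_callback.

```c
#include <stddef.h>

/* Assumed definitions: the article does not show them. */
typedef struct client {
  int cid;   /* client identifier; the real type is not shown */
} client;

typedef void (*callback) (client *, char *);

static callback readcb;   /* NULL until a callback is registered */

void setRead_PyCallback (callback c)
{
  readcb = c;
}

/* Hypothetical dispatch helper: calls the registered callback if
 * there is one. Returns 1 when a callback ran, 0 otherwise. */
int dispatch_read (client * c, char * data)
{
  if (readcb == NULL)
    return 0;
  readcb (c, data);
  return 1;
}

/* C stand-in for pyread_callback: remembers which client spoke. */
static int last_cid = -1;

void record_read (client * c, char * args)
{
  (void) args;
  last_cid = c->cid;
}

int last_read_cid (void)
{
  return last_cid;
}
```

Checking for NULL before the call means the server can raise read events before any callback has been registered without crashing.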

Python messaging engine

Now that you've seen how we can have callbacks from this system into Cython, we can start to add some logic to the system so that if you spawn multiple localhost connections, they will run concurrently. It would be good to have some Roster logic, say to just make the client address its identifier, such that there can be only one client per address. We could implement this via a simple dictionary where key is address and value is true or false for online or offline. We can query if it is online; return a yes if it is or no to kill the connection. Currently, messagingEngine.py implements a basic roster class to perform this function.

This roster class will initialize a dictionary of client objects against their name, and handleEvent will, if it's a rosterEvent, handle clients going online and offline via the Cython callbacks. The other case is if the client is already online. We return true if we want to tell the server to disconnect that client by closing the socket connection, else we return false.

A simple way to initialize the roster class is through pyserver.pyx:

from messagingEngine import Roster

roster = None

def pyinit_server (port):
    global roster
    roster = Roster ()
    ….

Now, in each of the callbacks, we can simply call roster.handleEvent (…). On running this, we can see that the same address connections are now closed, as shown in the following screenshot (only one instance is allowed to personify clients logging in to a system):

I think this gives you an idea of how easy it could be to have Python handle message passing. You can easily extend your read callbacks to fully read the buffer and use Google protocol buffers (https://developers.google.com/protocol-buffers/docs/pythontutorial) to implement a full protocol for your system, but that's a whole project of its own.

Integration with build systems

This topic is basically dependent on the linking model you choose. If you choose the shared-library approach, I would recommend using Python distutils. If you are going for embedded Python, you should choose the autotools approach.

Python distutils

I just want to note how you can integrate Cython into your setup.py file; it's very simple:

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

setup(
    scripts = ['messagingServer.py'],
    packages = ['messagingEngine'],
    cmdclass = { 'build_ext' : build_ext },
    ext_modules = [ Extension ("pyserver", ["pyserver.pyx", "server.c"]) ]
)

Just append your module sources and Cython picks up the *.pyx and *.c files. You can use setup.py as you normally would:

$ python setup.py build
$ python setup.py install

Note that to install correctly, you must package and modularize your project so that messagingEngine is now its own module:

$ mkdir messagingEngine
$ cd messagingEngine
$ mv ../messagingEngine.py .
$ touch __init__.py
$ $EDITOR __init__.py
__all__ = ['messagingEngine']

GNU/Autotools

The snippet you need to know for this would simply be as follows:

found_python=no
AC_ARG_ENABLE(
  python,
  AC_HELP_STRING(--enable-python, create python support),
  found_python=yes
)
AM_CONDITIONAL(IS_PYTHON, test "x$found_python" = xyes)
PYLIBS=""
PYINCS=""
if test "x$found_python" = xyes; then
  AC_CHECK_PROG(CYTHON_CHECK,cython,yes)
  if test x"$CYTHON_CHECK" != x"yes" ; then
    AC_MSG_ERROR([Please install cython])
  fi
  AC_CHECK_PROG(PYTHON_CONF_CHECK,python-config,yes)
  PYLIBS=`python-config --libs`
  PYINCS=`python-config --includes`
  if test "x$PYLIBS" = x; then
    AC_MSG_ERROR([python-dev not found])
  fi
fi
AC_SUBST(PYLIBS)
AC_SUBST(PYINCS)

This adds the --enable-python switch to your configure script. You now have the cython command found and the PYLIBS and PYINCS variables for the compilation flags you need to compile. Now you need a snippet to understand how to compile *.pyx in your sources in automake:

bin_PROGRAMS = myprog
ACLOCAL_AMFLAGS = -I etc
# PYINCS comes from python-config --includes and already carries -I
CFLAGS += $(PYINCS)
LIBTOOL_DEPS = @LIBTOOL_DEPS@
libtool: $(LIBTOOL_DEPS)
	$(SHELL) ./config.status libtool

SUFFIXES = .pyx
.pyx.c:
	@echo " CPY " $<
	@cython -2 -o $@ $<

myprog_SOURCES = \
	src/bla.pyx \
	...
myprog_LDADD = \
	$(PYLIBS)

When you're comfortable with understanding where your code is and the linking models, you can choose the build systems. At that point, embedding Python becomes very easy–almost like second nature.

Summary

This whole article dealt with trying to make you more comfortable with Cython and aimed to show you that it is just like writing Python code. If you start using public and cdef regularly, you will see that you can mix C and Python code as if it was all the same language! Better yet, in each language, you get access to everything that language has. So, if you have Twisted installed in Python, you can access Twisted when you're in Python land; and if you're in C land, you can use fcntl or ioctl!


About the Author :


Philip Herron

Philip Herron is an avid software engineer who focuses his passion on compilers and virtual machine implementations. When he was first accepted to Google Summer of Code 2010, he used inspiration from Paul Biggar's PhD on optimization of dynamic languages to develop a proof of concept GCC frontend to compile Python. This project sparked his deep interest in how Python works.

After completing a consecutive year on the same project in 2011, Philip decided to apply for Cython under the Python foundation to gain a deeper appreciation of the standard Python implementation. Through this, he started leveraging the advantages of Python to control the logic in systems or even to add more high-level interfaces such as embedding Twisted web servers for REST calls to a system-level piece of software without writing any C code.

Currently, Philip is employed by NYSE Euronext in Belfast, Northern Ireland, working on multiprocessing systems. But he spends his evenings hacking on GCCPy, Cython, and GCC. In the past, he has worked with WANdisco as an Apache Hadoop developer and as an intern with SAP Research on cloud computing.
