It's been nearly four months since TensorFlow, Google's computation graph machine learning library, was open sourced, and the momentum from its launch is still going strong. Over the time, both Microsoft and Baidu have released their own deep-learning libraries (CNTK and warp-ctc, respectively), and the machine learning arms race has escalated even further with Yahoo open sourcing CaffeOnSpark. Google hasn't been idle, however, and with the recent releases of TensorFlow Serving and the long awaited distributed runtime, now is the time for businesses and individual data scientists to ask: is it time to commit to TensorFlow?
TensorFlow's most appealing features
There are a lot of machine learning libraries available today—what makes TensorFlow stand out in this crowded space?
TensorFlow heavily borrows concepts from the more tenured machine learning library Theano. Many models written for research papers were built in Theano, and its composable, node-by-node writing style translates well when implementing a model whose graph was drawn by hand first. TensorFlow's API is extremely similar. Both Theano and TensorFlow feature a Python API for defining the computation graph, which then hooks into high performance C/C++ implementations of mathematical operations. Both are able to automatically differentiate their graphs with respect to their inputs, which facilitates learning on complicated neural network structures and both integrate tightly with Numpy for defining tensors (n-dimensional arrays).
However, one of the biggest advantages TensorFlow currently has over Theano (at least when comparing features both Theano and TensorFlow have) is its compile time. As of the time of writing this, Theano's compile times can be quite lengthy and although there are options to speed up compilation for experimentation, they come at the cost of a slower output model. TensorFlow's compilation is much faster, which leads to less headaches when trying out slightly different versions of models.
At first, it may sound more like brand recognition than a tangible advantage, but when I say it's 'backed' by Google, what I mean is that Google is seriously pouring tons of resources into making TensorFlow an awesome tool. There is an entire team at Google dedicated on maintaining and improving the software steadily and visibly, while simultaneously running a clinic on how to properly interact with and engage the open source community.
Google proved itself willing to adopt quality submissions from the community as well as flexible enough to adapt to public demands (such as moving the master contribution repository from Google's self-hosted Gerrit server to GitHub). These actions combined with genuinely constructive feedback from Google's team on pull-requests and issues helped make the community feel like this was a project worth supporting. The result? A continuous stream of little improvements and ideas from the community while the core Google team works on releasing larger features. Not only does TensorFlow recieve the benefits of a larger contributor base because of this, it also is more likely to withstand user decay as more people have invested time in making TensorFlow their own.
TensorBoard was the shiny toy that shipped on release with the first open source version of TensorFlow, but it's much more than eye candy. Not only can you use it as a guide to ensure what you've coded matches your reference model, but you can also keep track of data flowing through your model. This is especially useful when debugging subsections of your graph, as you can go in and see where any hiccups may have occurred.
The typical life cycle of machine learning models in the business world is generally as follows:
- Research and develop a model that is more accurate/faster/more descriptive than the previous model
- Write down the exact specifications of the finalized model
- Recreate the model in C++/C/Java/some other fast, compiled language
- Push the new model into deployment, replacing the old model
On release, TensorFlow promised to "connect research and production." However, the community had to wait until just recently for that promise to come to fruition with TensorFlow Serving. This software allows you to run it as a server that can natively run models built in TensorFlow, which makes the new life cycle look like this:
- Research and develop a new model
- Hook the new model into TensorFlow Serving
While there is overhead in learning how to use TensorFlow Serving, the process of hooking up new models stays the same, whereas rewriting new models in a different language is time consuming and difficult.
The distributed runtime is one of the newest features to be pushed to the TensorFlow repository, but it has been, by far, the most eagerly anticipated aspect of TensorFlow. Without having to incorporate any other libraries or software packages, TensorFlow is able to run distributed learning tasks on heterogenous hardware with various CPUs and GPUs. This feature is absolutely brand new (it came out in the middle of writing this post!), so do your research on how to use it and how well it runs.
TensorFlow can't claim to be the best at everything, and there are several sticking points that should be addressed sooner rather than later. Luckily, Google has been making steady improvements to TensorFlow since it was released, and I would be surprised if most of these were not remedied within the next few months.
Although the TensorFlow team promises deployment worthy models from compiled TensorFlow code, at this time, its single machine training speed lags behind most other options. The team has made improvements in speed since its release, but there is still more work to be done. In-place operations, a more efficient node placement algorithm, and better compression techniques could help here. Distributed benchmarks are not available at this time—expect to see them after the next official TensorFlow release.
Libraries such as Caffe, Torch, and Theano have a good selection of pre-trained, state-of-the-art models that are implemented in their library. While Google did release a version of its Inception-v3 model in TensorFlow, it needs more options to provide a starting place for more types of problems.
Yes, TensorFlow did push code for it's distributed runtime, but it still needs better documentation as well as more examples. I'm incredibly excited that it's available to try out right now, but it's going to take some time for most people to put it into production.
Interested in getting up and running with TensorFlow? You'll need a primer on Python. Luckily, our Python Fundamentals course in Mapt gives you an accessible yet comprehensive journey through Python - and this week it's completely free. Click here, login, then get stuck in...
Most people want to use software that is going to last for more than a few months—what does the future look like for TensorFlow? Here are my predictions about the medium-term future of the library.
Just as Hadoop has commercial distributions of its software, I expect to see more and more companies offering supported suites that tie into TensorFlow. Whether they have more pre-trained models built on top of Keras (which already supports a TensorFlow backend), or make TensorFlow work seamlessly with a distributed file system like Hadoop, I forsee a lot of demand for enterprise features and support with TensorFlow.
As mentioned earlier, TensorFlow still lags behind many other libraries out there. However, with the improvements already made; it's clear that Google is determined to make TensorFlow as efficient as possible. That said, I believe most applications of TensorFlow won't desperately need the speed increase. Of course, it's nice to have your models run faster, but most businesses out there don't have petabytes of useful data to work with, which means that model training usually doesn't take the "weeks" that we often see claimed as training time.
While there are definitely going to be many new features in upcoming releases of TensorFlow, I expect to see the learning curve of the software go down as more resources, such as tutorials, examples, and books are made available. The documentation's terminology has already changed in places to be more understandable; navigation within the documentation should improve over time. Finally, while most of the latest features in TensorFlow don't have the friendliest APIs right now, I'd be shocked if more user-friendly versions of TensorFlow Serving and the distributed runtime weren't in the works right now.
TensorFlow appears primed to fulfil the promise that was made back in November: a distributed, flexible data flow graph library that excels at neural network composition. I leave it to you decision makers to figure out whether TensorFlow is the right move for your own machine learning tasks, but here is my overall impression of TensorFlow: no other machine learning framework targeted at production-level tasks is as flexible, powerful, or improving as rapidly as TensorFlow. While other frameworks may carry advantages over TensorFlow now, Google is putting the effort into making consistent improvements, which bodes well for a community that is still in its infancy.
About the author
Sam Abrahams is a freelance data engineer and animator in Los Angeles, CA. He specializes in real-world applications of machine learning and is a contributor to TensorFlow. Sam runs a small tech blog, Memdump, and is an active member of the local hacker scene in West LA.