Integrating Storm and Hadoop

by Quinton Anderson | September 2013 | Cookbooks Open Source

In this article by Quinton Anderson, author of the book Storm Real-time Processing Cookbook, we will cover the following topics:

  • Implementing TF-IDF in Hadoop

  • Persisting documents from Storm

  • Integrating the batch and real-time views


In this article, we will implement the Batch and Service layers to complete the architecture.

There are some key concepts underlying this big data architecture:

  • Immutable state

  • Abstraction and composition

  • Constrain complexity

Immutable state is the key, in that it provides true fault-tolerance for the architecture. If a failure is experienced at any level, we can always rebuild the data from the original immutable data. This is in contrast to many existing data systems, where the paradigm is to act on mutable data. This approach may seem simple and logical; however, it exposes the system to a particular kind of risk in which the state is lost or corrupted. It also constrains the system, in that you can only work with the current view of the data; it isn't possible to derive new views of the data. When the architecture is based on a fundamentally immutable state, it becomes both flexible and fault-tolerant.

Abstractions allow us to remove complexity in some cases, and in others they can introduce complexity. It is important to achieve an appropriate set of abstractions that increase our productivity and remove complexity, but at an appropriate cost. It must be noted that all abstractions leak, meaning that when failures occur at a lower abstraction, they will affect the higher-level abstractions. It is therefore often important to be able to make changes within the various layers and understand more than one layer of abstraction. The designs we choose to implement our abstractions must therefore not prevent us from reasoning about or working at the lower levels of abstraction when required. Open source projects are often good at this, because of the obvious access to the code of the lower level abstractions, but even with source code available, it is easy to convolute the abstraction to the extent that it becomes a risk. In a big data solution, we have to work at higher levels of abstraction in order to be productive and deal with the massive complexity, so we need to choose our abstractions carefully. In the case of Storm, Trident represents an appropriate abstraction for dealing with the data-processing complexity, but the lower level Storm API on which Trident is based isn't hidden from us. We are therefore able to easily reason about Trident based on an understanding of lower-level abstractions within Storm.

Another key issue to consider when dealing with complexity and productivity is composition. Composition within a given layer of abstraction allows us to quickly build out a solution that is well tested and easy to reason about. Composition is fundamentally decoupled, while abstraction contains some inherent coupling to the lower-level abstractions—something that we need to be aware of.

Finally, a big data solution needs to constrain complexity. Complexity always equates to risk and cost in the long run, both from a development perspective and from an operational perspective. Real-time solutions will always be more complex than batch-based systems; they also lack some of the qualities we require in terms of performance. Nathan Marz's Lambda architecture attempts to address this by combining the qualities of each type of system to constrain complexity and deliver a truly fault-tolerant architecture.

We divided this flow into preprocessing and "at time" phases, using streams and DRPC streams respectively. We also introduced time windows that allowed us to segment the preprocessed data. In this article, we complete the entire architecture by implementing the Batch and Service layers.

The Service layer is simply a store of a view of the data. In this case, we will store this view in Cassandra, as it is a convenient place to access the state alongside Trident's state. The preprocessed view is identical to the one created by Trident: the counted elements of the TF-IDF formula (D, DF, and TF). In the batch case, however, the dataset is much larger, as it includes the entire history.
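For reference, these counted elements feed into the standard TF-IDF formulation (the exact logarithm base and any smoothing are implementation details of the expression applied at query time):

\[
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \times \log\frac{D}{\mathrm{df}(t)}
\]

Here, tf(t, d) is the number of occurrences of term t in document d, df(t) is the number of documents containing t, and D is the total number of documents.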

The Batch layer is implemented in Hadoop, using MapReduce to calculate the preprocessed view of the data. MapReduce is extremely powerful but, like the lower-level Storm API, it is potentially too low-level for the problem at hand, for the following reasons:

  • We need to describe the problem as a data pipeline; MapReduce isn't congruent with such a way of thinking

  • Productivity

We would like to think of a data pipeline in terms of streams of data, tuples within the stream, and predicates acting on those tuples. This allows us to easily describe a solution to a data processing problem, and it also promotes composability: predicates are fundamentally composable, and pipelines themselves can be composed to form larger, more complex pipelines. Cascading provides such an abstraction for MapReduce in the same way as Trident does for Storm.

With these tools, approaches, and considerations in place, we can now complete our real-time big data architecture. There are a number of elements that we will update and a number of elements that we will add. The following figure illustrates the final architecture, where the elements in light grey will be updated from the existing recipe, and the elements in dark grey will be added in this article:

Implementing TF-IDF in Hadoop

TF-IDF is a well-known problem in the MapReduce communities; it is well documented and implemented, and it is interesting in that it is sufficiently complex to be both useful and instructive. Cascading has a series of tutorials on TF-IDF at http://www.cascading.org/2012/07/31/cascading-for-the-impatient-part-5/, which documents this implementation well. For this recipe, we shall use a Clojure Domain Specific Language (DSL) called Cascalog that is implemented on top of Cascading. Cascalog has been chosen because it provides a set of abstractions that are semantically very similar to the Trident API and are very terse while still remaining readable and easy to understand.

Getting ready

Before you begin, please ensure that you have installed Hadoop by following the instructions at http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/.

How to do it…

  1. Start by creating the project using the lein command:

    lein new tfidf-cascalog

  2. Next, you need to edit the project.clj file to include the dependencies:

    (defproject tfidf-cascalog "0.1.0-SNAPSHOT"
      :dependencies [[org.clojure/clojure "1.4.0"]
                     [cascalog "1.10.1"]
                     [org.apache.cassandra/cassandra-all "1.1.5"]
                     [clojurewerkz/cassaforte "1.0.0-beta11-SNAPSHOT"]
                     [quintona/cascading-cassandra "0.0.7-SNAPSHOT"]
                     [clj-time "0.5.0"]
                     [cascading.avro/avro-scheme "2.2-SNAPSHOT"]
                     [cascalog-more-taps "0.3.0"]
                     [org.apache.httpcomponents/httpclient "4.2.3"]]
      :profiles {:dev {:dependencies [[org.apache.hadoop/hadoop-core "0.20.2-dev"]
                                      [lein-midje "3.0.1"]
                                      [cascalog/midje-cascalog "1.10.1"]]}})

    It is always a good idea to validate your dependencies; to do this, execute lein deps and review any errors. In this particular case, cascading-cassandra has not been deployed to clojars, and so you will receive an error message. Simply download the source from https://github.com/quintona/cascading-cassandra and install it into your local repository using Maven.

  3. It is also good practice to understand your dependency tree. This is important to not only prevent duplicate classpath issues, but also to understand what licenses you are subject to. To do this, simply run lein pom, followed by mvn dependency:tree. You can then review the tree for conflicts. In this particular case, you will notice that there are two conflicting versions of Avro. You can fix this by adding the appropriate exclusions:

    [org.apache.cassandra/cassandra-all "1.1.5" :exclusions [org.apache.cassandra.deps/avro]]

  4. We then need to create the Clojure-based Cascalog queries that will process the document data. We first need to create the query that will produce the "D" view of the data; that is, the D portion of the TF-IDF function. This is achieved by defining a Cascalog query function, composed of a set of predicates, that outputs a key and a value:

    (defn D [src]
      (let [src (select-fields src ["?doc-id"])]
        (<- [?key ?d-str]
            (src ?doc-id)
            (c/distinct-count ?doc-id :> ?n-docs)
            (str "twitter" :> ?key)
            (str ?n-docs :> ?d-str))))

    You can define this and any of the following functions in the REPL, or add them to core.clj in your project. If you want to use the REPL, simply use lein repl from within the project folder. The required namespace (the use statement), require, and import definitions can be found in the source code bundle.

  5. We then need to add similar functions to calculate the TF and DF values:

    (defn DF [src]
      (<- [?key ?df-count-str]
          (src ?doc-id ?time ?df-word)
          (c/distinct-count ?doc-id ?df-word :> ?df-count)
          (str ?df-word :> ?key)
          (str ?df-count :> ?df-count-str)))

    (defn TF [src]
      (<- [?key ?tf-count-str]
          (src ?doc-id ?time ?tf-word)
          (c/count ?tf-count)
          (str ?doc-id ?tf-word :> ?key)
          (str ?tf-count :> ?tf-count-str)))

  6. This Batch layer is only interested in calculating views for all the data leading up to, but not including, the current hour. This is because the data for the current hour will be provided by Trident when it merges this batch view with the view it has calculated. In order to achieve this, we need to filter out all the records that are within the current hour. The following function makes that possible:

    (deffilterop timing-correct? [doc-time]
      (let [now (local-now)
            interval (in-minutes (interval (from-long doc-time) now))]
        (if (< interval 60) false true)))

  7. Each of the preceding query definitions require a clean stream of words. The text contained in the source documents isn't clean. It still contains stop words. In order to filter these and emit a clean set of words for these queries, we can compose a function that splits the text into words and filters them based on a list of stop words and the time function defined previously:

    (defn etl-docs-gen [rain stop]
      (<- [?doc-id ?time ?word]
          (rain ?doc-id ?time ?line)
          (split ?line :> ?word-dirty)
          ((c/comp s/trim s/lower-case) ?word-dirty :> ?word)
          (stop ?word :> false)
          (timing-correct? ?time)))

  8. We will be storing the outputs from our queries to Cassandra, which requires us to define a set of taps for these views:

    (defn create-tap [rowkey cassandra-ip]
      (let [keyspace storm_keyspace
            column-family "tfidfbatch"
            scheme (CassandraScheme. cassandra-ip
                                     "9160"
                                     keyspace
                                     column-family
                                     rowkey
                                     {"cassandra.inputPartitioner"  "org.apache.cassandra.dht.RandomPartitioner"
                                      "cassandra.outputPartitioner" "org.apache.cassandra.dht.RandomPartitioner"})
            tap (CassandraTap. scheme)]
        tap))

    (defn create-d-tap [cassandra-ip]
      (create-tap "d" cassandra-ip))

    (defn create-df-tap [cassandra-ip]
      (create-tap "df" cassandra-ip))

    (defn create-tf-tap [cassandra-ip]
      (create-tap "tf" cassandra-ip))

    The way this scheme is created means that it will use a static row key and persist name-value pairs from the tuples as column:value entries within that row. This is congruent with the approach used by the Trident Cassandra adaptor, which is convenient, as it will make our lives easier later; a sketch of the resulting row layout follows these steps.

  9. We can complete the implementation by providing a function that ties everything together and executes the queries:

    (defn execute [in stop cassandra-ip]
      (cc/connect! cassandra-ip)
      (sch/set-keyspace storm_keyspace)
      (let [input (tap/hfs-tap (AvroScheme. (load-schema)) in)
            stop (hfs-delimited stop :skip-header? true)
            src (etl-docs-gen input stop)]
        (?- (create-d-tap cassandra-ip) (D src))
        (?- (create-df-tap cassandra-ip) (DF src))
        (?- (create-tf-tap cassandra-ip) (TF src))))

  10. Next, we need to get some data to test with. I have created some test data, which is available at https://bitbucket.org/qanderson/tfidf-cascalog. Simply download the project and copy the contents of src/data to the data folder in your project structure.

  11. We can now test this entire implementation. To do this, we need to insert the data into Hadoop:

    hadoop fs -copyFromLocal ./data/document.avro data/document.avro
    hadoop fs -copyFromLocal ./data/en.stop data/en.stop

  12. Then launch the execution from the REPL:

    => (execute "data/document" "data/en.stop" "127.0.0.1")
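As noted in step 8, each batch view is written under a single static row key, with one column per tuple key-value pair. A purely illustrative sketch (not taken from the book's data) of how the tfidfbatch column family might look after a run is shown here; the actual column names and counts depend on your input documents:

    ;; row key -> columns (name : value)
    ;; "d"   -> "twitter" : "2"                     ; document count, keyed by source
    ;; "df"  -> "rain" : "2", "cloud" : "1", ...    ; per-word distinct-document counts
    ;; "tf"  -> "doc1rain" : "3", ...               ; per doc-id+word term counts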

How it works…

There are many excellent guides on the Cascalog wiki (https://github.com/nathanmarz/cascalog/wiki), but for completeness's sake, the nature of a Cascalog query will be explained here. Before that, however, a revision of Cascading pipelines is required.

The following is quoted from the Cascading documentation (http://docs.cascading.org/cascading/2.1/userguide/htmlsingle/):

Pipe assemblies define what work should be done against tuple streams, which are read from tap sources and written to tap sinks. The work performed on the data stream may include actions such as filtering, transforming, organizing, and calculating. Pipe assemblies may use multiple sources and multiple sinks, and may define splits, merges, and joins to manipulate the tuple streams.

This concept is embodied in Cascalog through the definition of queries. A query takes a set of inputs and applies a list of predicates across the fields in each tuple of the input stream. Queries are composed through the application of many predicates. Queries can also be composed to form larger, more complex queries. In either event, these queries are reduced to a Cascading pipeline. Cascalog therefore provides an extremely terse and powerful abstraction on top of Cascading; moreover, it enables an excellent development workflow through the REPL. Queries can be easily composed and executed against smaller, representative datasets within the REPL, providing the idiomatic API and development workflow that makes Clojure beautiful.
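To make this REPL workflow concrete, the following is a minimal sketch (not part of the recipe) that defines and executes a small word-count query against an in-memory generator; it assumes the cascalog.api namespace and cascalog.ops (aliased as c) are loaded, as they are for this project:

;; an in-memory sequence of [doc-id word] tuples acts as a generator
(def sample-src [["doc1" "storm"] ["doc1" "hadoop"] ["doc2" "storm"]])

;; ??<- defines and executes a query in one step, returning the results
;; as a Clojure sequence instead of writing them to a sink
(??<- [?word ?count]
      (sample-src _ ?word)
      (c/count ?count))
;; => (["hadoop" 1] ["storm" 2])   ; result order may vary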

If we unpack the query we defined for DF, we will find the following code:

(defn DF [src]
  (<- [?key ?df-count-str]
      (src ?doc-id ?time ?df-word)
      (c/distinct-count ?doc-id ?df-word :> ?df-count)
      (str ?df-word :> ?key)
      (str ?df-count :> ?df-count-str)))

The <- macro defines a query, but does not execute it. The initial vector, [?key ?df-count-str], defines the output fields, which is followed by a list of predicate functions. Each predicate can be one of the following three types:

  • Generators: A source of data where the underlying source is either a tap or another query.

  • Operations: Implicit relations that take input variables defined elsewhere and either act as a function that binds new variables or act as a filter. Operations typically act within the scope of a single tuple.

  • Aggregators: Functions that act across tuples to create aggregate representations of data. For example, count and sum.

The :> keyword is used to separate input variables from output variables. If no :> keyword is specified, the variables are considered as input variables for operations and output variables for generators and aggregators.

The (src ?doc-id ?time ?df-word) predicate function names the first three values within each input tuple; these names are then available within the query scope. Therefore, if the tuple ("doc1" 123324 "This") arrives in this query, the variables would effectively bind as follows:

  • ?doc-id: "doc1"

  • ?time: 123324

  • ?df-word: "This"

Each predicate within the scope of the query can use any bound value or add new bound variables to the scope of the query. The final set of bound values that are emitted is defined by the output vector.

We defined three queries, each calculating a portion of the values required for the TF-IDF algorithm. These are fed from two taps, which are files stored in the Hadoop filesystem. The document file is stored using Apache Avro, which provides a high-performance, dynamic serialization layer. Avro takes a record definition and enables serialization/deserialization based on it. The record structure, in this case, is for a document and is defined as follows:

{"namespace": "storm.cookbook", "type": "record", "name": "Document", "fields": [ {"name": "docid", "type": "string"}, {"name": "time", "type": "long"}, {"name": "line", "type": "string"} ] }

Both the stop words and documents are fed through an ETL function that emits a clean set of words that have been filtered. The words are derived by splitting the line field using a regular expression:

(defmapcatop split [line] (s/split line #"[\[\]\\\(\),.)\s]+"))

The ETL function is also a query, which serves as a source for our downstream queries, and defines the [?doc-id ?time ?word] output fields.

The output tap, or sink, is based on the Cassandra scheme. A query defines predicate logic, not the source and destination of data. The sink ensures that the outputs of our queries are sent to Cassandra. The ?- macro executes a query, and it is only at execution time that a query is bound to its source and destination, again allowing for extreme levels of composition. The following, therefore, executes the TF query and outputs to Cassandra:

(?- (create-tf-tap cassandra-ip) (TF src))
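Because the sink is only bound at execution time, the same query can just as easily be pointed somewhere else; for example, during development, the results could be printed using Cascalog's built-in stdout tap instead of being written to Cassandra:

(?- (stdout) (TF src))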

There's more…

The Avro test data was created using the test data from the Cascading tutorial at http://www.cascading.org/2012/07/31/cascading-for-the-impatient-part-5/. Within this tutorial is the rain.txt tab-separated data file. A new column, called time, was added to hold the Unix epoch time in milliseconds. The updated text file was then processed using some basic Java code that leverages Avro:

Schema schema = Schema.parse(SandboxMain.class.getResourceAsStream("/document.avsc"));
File file = new File("document.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
dataFileWriter.create(schema, file);
BufferedReader reader = new BufferedReader(new InputStreamReader(SandboxMain.class.getResourceAsStream("/rain.txt")));
String line = null;
try {
    while ((line = reader.readLine()) != null) {
        String[] tokens = line.split("\t");
        GenericRecord docEntry = new GenericData.Record(schema);
        docEntry.put("docid", tokens[0]);
        docEntry.put("time", Long.parseLong(tokens[1]));
        docEntry.put("line", tokens[2]);
        dataFileWriter.append(docEntry);
    }
} catch (IOException e) {
    e.printStackTrace();
}
dataFileWriter.close();

Persisting documents from Storm

In the previous recipe, we looked at deriving precomputed views of our data taking some immutable data as the source. In that recipe, we used statically created data. In an operational system, we need Storm to store the immutable data into Hadoop so that it can be used in any preprocessing that is required.

How to do it…

As each tuple is processed in Storm, we must generate an Avro record based on the document record definition and append it to the data file within the Hadoop filesystem.

We must create a Trident function that takes each document tuple and stores the associated Avro record.

  1. Within the tfidf-topology project created previously, inside the storm.cookbook.tfidf.function package, create a new class named PersistDocumentFunction that extends BaseFunction. Within the prepare function, initialize the Avro schema and document writer:

    public void prepare(Map conf, TridentOperationContext context) {
        try {
            String path = (String) conf.get("DOCUMENT_PATH");
            schema = Schema.parse(PersistDocumentFunction.class.getResourceAsStream("/document.avsc"));
            File file = new File(path);
            DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
            dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter);
            if (file.exists())
                dataFileWriter.appendTo(file);
            else
                dataFileWriter.create(schema, file);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

  2. As each tuple is received, coerce it into an Avro record and add it to the file:

    public void execute(TridentTuple tuple, TridentCollector collector) {
        GenericRecord docEntry = new GenericData.Record(schema);
        docEntry.put("docid", tuple.getStringByField("documentId"));
        docEntry.put("time", Time.currentTimeMillis());
        docEntry.put("line", tuple.getStringByField("document"));
        try {
            dataFileWriter.append(docEntry);
            dataFileWriter.flush();
        } catch (IOException e) {
            LOG.error("Error writing to document record: " + e);
            throw new RuntimeException(e);
        }
    }

  3. Next, edit the TermTopology.build topology and add the function to the document stream:

    documentStream.each(new Fields("documentId","document"), new PersistDocumentFunction(), new Fields());

  4. Finally, include the document path into the topology configuration:

    conf.put("DOCUMENT_PATH", "document.avro");

How it works…

There are various logical streams within the topology, and the input to the topology, which contains only URLs, is certainly not in the appropriate state for the recipes in this article. We therefore need to select the correct stream from which to consume tuples, coerce these into Avro records, and serialize them into a file.

The previous recipe will then periodically consume this file. Within the context of the topology definition, include the following code:

Stream documentStream = getUrlStream(topology, spout)
    .each(new Fields("url"), new DocumentFetchFunction(mimeTypes), new Fields("document", "documentId", "source"));

documentStream.each(new Fields("documentId", "document"), new PersistDocumentFunction(), new Fields());

The function should consume tuples from the document stream whose tuples are populated with already fetched documents.


Integrating the batch and real-time views

The final step of the big data architecture is largely in place already and is surprisingly simple, as is the case with all good functional-style designs.

How to do it…

We need three new state sources that represent the D, DF, and TF values computed in the Batch layer. We will combine the values from these states with the existing state before performing the final TF-IDF calculation.

  1. Start from the inside out by creating the combination function called BatchCombiner within the storm.cookbook.tfidf.function package and implement the logic to combine two versions of the same state. One version should be from the current hour, and the other from all the data prior to the current hour:

    public void execute(TridentTuple tuple, TridentCollector collector) {
        try {
            double d_rt = (double) tuple.getLongByField("d_rt");
            double df_rt = (double) tuple.getLongByField("df_rt");
            double tf_rt = (double) tuple.getLongByField("tf_rt");
            double d_batch = (double) tuple.getLongByField("d_batch");
            double df_batch = (double) tuple.getLongByField("df_batch");
            double tf_batch = (double) tuple.getLongByField("tf_batch");
            collector.emit(new Values(tf_rt + tf_batch, d_rt + d_batch, df_rt + df_batch));
        } catch (Exception e) {
        }
    }

  2. Add the state to the topology by adding these calls to the addTFIDFQueryStream function:

    TridentState batchDfState = topology.newStaticState(getBatchStateFactory("df"));
    TridentState batchDState = topology.newStaticState(getBatchStateFactory("d"));
    TridentState batchTfState = topology.newStaticState(getBatchStateFactory("tf"));

  3. This is supported by the static utility function:

    private static StateFactory getBatchStateFactory(String rowKey) {
        CassandraState.Options options = new CassandraState.Options();
        options.keyspace = "storm";
        options.columnFamily = "tfidfbatch";
        options.rowKey = rowKey;
        return CassandraState.nonTransactional("localhost", options);
    }

    Within a cluster deployment of Cassandra, simply replace localhost with a list of seed node IP addresses. Seed nodes are Cassandra nodes that, when appropriately configured, know about their peers in the cluster. For more information on Cassandra, please see the online documentation at http://wiki.apache.org/cassandra/GettingStarted.

  4. Finally, edit the existing DRPC query to reflect the added state and combiner function:

    topology.newDRPCStream("tfidfQuery", drpc)
        .each(new Fields("args"), new SplitAndProjectToFields(), new Fields("documentId", "term"))
        .each(new Fields(), new StaticSourceFunction("twitter"), new Fields("source"))
        .stateQuery(tfState, new Fields("documentId", "term"), new MapGet(), new Fields("tf_rt"))
        .stateQuery(dfState, new Fields("term"), new MapGet(), new Fields("df_rt"))
        .stateQuery(dState, new Fields("source"), new MapGet(), new Fields("d_rt"))
        .stateQuery(batchTfState, new Fields("documentId", "term"), new MapGet(), new Fields("tf_batch"))
        .stateQuery(batchDfState, new Fields("term"), new MapGet(), new Fields("df_batch"))
        .stateQuery(batchDState, new Fields("source"), new MapGet(), new Fields("d_batch"))
        .each(new Fields("tf_rt", "df_rt", "d_rt", "tf_batch", "df_batch", "d_batch"), new BatchCombiner(), new Fields("tf", "d", "df"))
        .each(new Fields("term", "documentId", "tf", "d", "df"), new TfidfExpression(), new Fields("tfidf"))
        .each(new Fields("tfidf"), new FilterNull())
        .project(new Fields("documentId", "term", "tfidf"));

How it works…

We have covered a huge amount of ground to get to this point. We have implemented an entire real-time, big data architecture that is fault-tolerant, scalable, and reliable using purely open source technologies. It is therefore useful at this point to recap the journey we have taken, ending back where we are now:

  • We learned how to implement a Trident topology and define a stream data pipeline. This data pipeline defines predicates that not only act on tuples but also on persistent, mutable states.

  • Using this pipeline, we implemented the TF-IDF algorithm.

  • We separated out the preprocessing stage of the data pipeline from the "at time" stage of the pipeline. We achieved this by implementing a portion of the pipeline in a DRPC stream that is only invoked "at time".

  • We then added the concept of time windows to the topology. This allowed us to segment the state into time-window buckets. We chose hours as a convenient segmentation.

  • We learned how to test a time-dependent topology using the Clojure testing API.

  • Then, in this article, we implemented the immutable state and the batch computation.

  • Finally, we combined the batch-computed view with the mutable state to provide a complete solution.

The following flow diagram illustrates the entire process:

With the high-level picture in place, the final DRPC query stream becomes easier to understand. The stream effectively implements the following steps:

  • .each(SplitAndProjectToFields): This splits the input arguments from the query and projects them out into separate fields in the tuple

  • .each(StaticSourceFunction): This adds a static value to the stream, which will be required later

  • .stateQuery(tfState): This queries the state of the tf value for the current hour based on the document ID and term and outputs tf_rt

  • .stateQuery(dState): This queries the state of the d value for the current hour based on the static source value and outputs d_rt

  • .stateQuery(dfState): This queries the state of the df value for the current hour based on the term and outputs df_rt

  • .stateQuery(batchTfState): This queries the state of the tf value for all previous hours based on the document ID and term and outputs tf_batch

  • .stateQuery(batchDState): This queries the state of the d value for all previous hours based on the static source value and outputs d_batch

  • .stateQuery(batchDfState): This queries the state of the df value for all previous hours based on the term and outputs df_batch

  • .each(BatchCombiner): This combines the separate _rt and _batch fields into a single set of values

  • .each(TfidfExpression): This calculates the TF-IDF final value

  • .project: This projects just the fields we require in the output

A key to understanding this is that at each stage in this process the tuple simply receives new values: each function adds new named values to the tuple, and the state queries do the same based on existing fields within the tuple. Finally, we end up with a very "wide" tuple that we trim down before returning the final result.
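As a rough worked example of the final step (assuming TfidfExpression implements the standard formulation shown earlier, with a natural logarithm), a term that occurs three times in a document (tf = 3) and appears in 20 of 1,000 documents (df = 20, d = 1000) would score:

\[
\mathrm{tfidf} = 3 \times \ln\frac{1000}{20} \approx 3 \times 3.91 \approx 11.7
\]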

Summary

This article guided the user through the process of integrating Storm with Hadoop, thus creating a complete Lambda architecture.

About the Author


Quinton Anderson

Quinton Anderson is a software engineer with a background and focus on real-time computational systems. His career has been split between building real-time communication systems for defense systems and building enterprise applications within financial services and banking. Quinton does not align himself with any particular technology or programming language, but rather prefers to focus on sound engineering and polyglot development. He is passionate about open source, and is an active member of the Storm community; he has also enjoyed delivering various Storm-based solutions.

Quinton's next area of focus is machine learning; specifically, Deep Belief networks, as they pertain to robotics. Please follow his blog entries on Computational Theory, general IT concepts, and Deep Belief networks for more information.

You can find more information on Quinton via his LinkedIn profile (http://au.linkedin.com/pub/quinton-anderson/37/422/11b/) or more importantly, view and contribute to the source code available at his GitHub (https://github.com/quintona) and Bitbucket (https://bitbucket.org/qanderson) accounts.
