Learning Cascading


Product type Book
Published in May 2015
Publisher
ISBN-13 9781785288913
Pages 276 pages
Edition 1st Edition
Languages

Chapter 9. Planning for Future Growth

This chapter provides information that will allow you to continue to advance your skills. Here we will help you locate additional sources of information, explore other tools that are available, and provide forward-looking information concerning the future direction that Cascading will take.

Cascading is now in a period of expansion, and new libraries and products that facilitate integration with a variety of big data, machine learning, data management, and advanced analytics systems are emerging very quickly. It is important to get connected with this online world and to stay abreast of these new developments. The world of open source is supported by thousands of developers and testers, and becoming attuned to the places where people post their utilities and libraries can save you a lot of time by sparing you from "reinventing the wheel."

Finding online resources


Online resources exist, but they are sparse. A full set of Javadoc is available online at http://docs.concurrentinc.com/cascading/<version>/javadoc/cascading-core/, and Concurrent maintains a fairly complete set for each version. Currently, 2.6 is the most recent version, but 3.0 is coming soon!

There is also a user's guide online at http://docs.cascading.org/cascading/<version>/userguide/html/. A six-part quick start tutorial is located at http://docs.cascading.org/tutorials/etl-log/, and another six-part tutorial that is also quite good, Cascading for the Impatient, is at http://docs.cascading.org/impatient/. In general, a good location for other Cascading references and documentation is http://www.cascading.org/documentation/.
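The Javadoc and user's guide URLs above contain a `<version>` placeholder. As a minimal sketch, the substitution could be done as follows. The class name `CascadingDocs` and its members are hypothetical helpers written for this illustration, and the check that a version looks like `2.6` is an assumption based only on the version numbers mentioned in the text.

```java
import java.net.URI;

// Hypothetical helper for illustration; it is not part of Cascading.
final class CascadingDocs {
    // URL templates copied from the text; "<version>" is the placeholder to fill in.
    static final String JAVADOC_TEMPLATE =
        "http://docs.concurrentinc.com/cascading/<version>/javadoc/cascading-core/";
    static final String USER_GUIDE_TEMPLATE =
        "http://docs.cascading.org/cascading/<version>/userguide/html/";

    // Replaces the placeholder with a concrete version such as "2.6".
    // Assumption: versions have the form major.minor, as in the text.
    static URI resolve(String template, String version) {
        if (!version.matches("\\d+\\.\\d+")) {
            throw new IllegalArgumentException("Expected major.minor, got: " + version);
        }
        return URI.create(template.replace("<version>", version));
    }

    public static void main(String[] args) {
        System.out.println(resolve(JAVADOC_TEMPLATE, "2.6"));
        System.out.println(resolve(USER_GUIDE_TEMPLATE, "2.6"));
    }
}
```

Keeping the templates as constants makes it easy to switch to the documentation for a newer release by changing a single version string.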

All code is available on GitHub at https://github.com/Cascading. Start here; there is a lot that will help you. In fact, in some cases it is a good idea to simply download the...

Using other Cascading tools


The Cascading ecosystem is expanding rapidly, and there are many toolsets and extensions that can help you with common tasks. Let's take a look at a few of them. Many of these are available for download from the URLs listed in the previous section.

Lingual

Lingual is a Cascading framework tool that allows ANSI-standard SQL queries to be used to create, transform, and move data. Lingual is based on the Apache Calcite framework (formerly known as Optiq). Essentially, Lingual lets you write complex workflows as SQL commands, which are then translated into Cascading flows, taps, and so on. Since Cascading interoperates with Hadoop, this allows ingestion and storage of Hadoop-resident data through this programming paradigm.

This usage pattern is very similar to Hive. However, Lingual is Cascading-specific and can therefore interoperate with Cascalog, Scalding, and Java. One other advantage over Hive is that, with Cascading as an insulating layer, these programs will be able to more quickly...

Custom taps


Cascading can also connect to many data sources and sinks using custom-developed taps. These projects allow Cascading to access a wide variety of other data sources. Most significant are the many interfaces to NoSQL databases, which can be used to access indexed data in a very scalable way. The following screenshot shows Cascading data sources:

Cascading serializers


Many custom serializers have also been written for use with both local and Hadoop mode in Cascading. New serializers are appearing regularly and should be examined for applicability to your particular problem. Some offer exceptional speed, others offer better compression schemes, and still others offer novel forms of managing metadata. The following screenshot shows Cascading custom serializers:

Java open source mock frameworks


The following table shows a list of many Java open source mock frameworks that you can use to test your Cascading applications:
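Whichever framework you choose, the underlying idea is the same: replace a collaborator with a stand-in that returns canned values and records how it was called. The following is a minimal hand-rolled sketch of that idea using only the JDK's `java.lang.reflect.Proxy`. It does not use any particular mock framework, and the `LineSource` interface and `countNonEmpty` method are hypothetical names invented for this illustration; they are not part of the Cascading API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class HandRolledMock {
    // Hypothetical collaborator; NOT a Cascading interface.
    interface LineSource {
        String nextLine(); // returns null when the input is exhausted
    }

    // Hypothetical code under test: counts non-empty lines from the source.
    static int countNonEmpty(LineSource source) {
        int count = 0;
        for (String line = source.nextLine(); line != null; line = source.nextLine()) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Builds a mock that replays the canned lines and logs every method call.
    static LineSource mockSource(List<String> canned, List<String> callLog) {
        final int[] index = {0};
        InvocationHandler handler = (proxy, method, args) -> {
            callLog.add(method.getName());
            if (method.getName().equals("nextLine")) {
                return index[0] < canned.size() ? canned.get(index[0]++) : null;
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return (LineSource) Proxy.newProxyInstance(
            LineSource.class.getClassLoader(),
            new Class<?>[] {LineSource.class},
            handler);
    }

    public static void main(String[] args) {
        List<String> callLog = new ArrayList<>();
        LineSource source = mockSource(Arrays.asList("a", "", "b"), callLog);
        System.out.println("non-empty lines: " + countNonEmpty(source));
        System.out.println("calls recorded: " + callLog.size());
    }
}
```

A mock framework adds conveniences on top of this pattern, such as fluent stubbing and call verification, but the test still exercises your logic without needing real input data or a cluster.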

Summary


In this chapter, you learned how to find other Cascading resources. These include online help, wikis, code sources, and various other sources of in-depth documentation. We also discussed open source and commercial software, special-purpose interface software, and specialized add-on utilities that extend Cascading's capabilities and can also reduce development time.

This is the last chapter of this book. There is so much more, but we think that this book provides a solid foundation that will do more than just get you started; it will give you a great basis for further learning. So we hope that you continue to explore, learn, and most of all, write good, reusable, scalable code using Cascading and the full ecosystem of tools that comes with it. We hope that you monitor the many sites that we have outlined here, because Cascading and nearly all big data solutions are moving, changing, and evolving at a tremendous pace. This is a great time to be alive and to be developing code in this...
