Analytics for the Internet of Things (IoT)


Product type: Book
Published in: Jul 2017
Publisher: Packt
ISBN-13: 9781787120730
Pages: 378
Edition: 1st Edition
Author: Andrew Minteer

Table of Contents (20 chapters)

  • Title Page
  • Credits
  • About the Author
  • About the Reviewer
  • www.PacktPub.com
  • Customer Feedback
  • Preface
  • Defining IoT Analytics and Challenges
  • IoT Devices and Networking Protocols
  • IoT Analytics for the Cloud
  • Creating an AWS Cloud Analytics Environment
  • Collecting All That Data - Strategies and Techniques
  • Getting to Know Your Data - Exploring IoT Data
  • Decorating Your Data - Adding External Datasets to Innovate
  • Communicating with Others - Visualization and Dashboarding
  • Applying Geospatial Analytics to IoT Data
  • Data Science for IoT Analytics
  • Strategies to Organize Data for Analytics
  • The Economics of IoT Analytics
  • Bringing It All Together

Chapter 5. Collecting All That Data - Strategies and Techniques

You stare at your drawing of the IoT device hanging on the wall of your cubicle, lost in thought on the ways you might manipulate the data to squeeze out game-changing insights. You can almost hear your colleagues cheer as you accept the Executive Award for best project of the year and the huge bonus that goes with it.

"Ahem!" someone coughs behind you. You almost jump out of your chair.

Your boss has sidled up to your cubicle. He looks both cheerful and amused. You are a little concerned at the amused part.

"You did an excellent job selling them on using the cloud for analytics, and they are fully on board and want to start immediately," he says with a big grin. You perk up, as this is great news.

"You did so well," he continues with a smirk, "that they want to double the data capture rate on the next generation of devices. They figure the cost will not change much if it is routed through cloud infrastructure. And since capacity...

Designing data processing for analytics


There are some key cloud services that are likely to be employed in your IoT data processing environment. Both AWS and Microsoft Azure have IoT-specific services that we will review. There are also services that support data processing and transformation that are worth a review to increase your familiarity with them.

Amazon Kinesis

Amazon Kinesis is a set of services for loading and analyzing streaming data. It handles all the underlying compute, storage, and messaging services for you.

The services in the Kinesis family are as follows:

  • Amazon Kinesis Firehose: This enables loading of massive volumes of streaming data into AWS.
  • Amazon Kinesis Streams: This service allows you to create custom applications to process and analyze streaming data in real time. There are two ends to each stream; you use the Amazon Kinesis Producer Library (KPL) to build the application that sends data into the stream. The Amazon Kinesis Client Library (KCL) is used in the application...
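To make the producer side of a stream more concrete, here is a simplified model of how a record lands on a shard. Kinesis Streams hashes each record's partition key with MD5 into a 128-bit integer, then routes the record to the shard whose hash key range contains that value. This sketch rests on several assumptions. It splits the hash space evenly across shards, although real streams can have uneven ranges after resharding. It reads the digest as a big-endian integer. The device IDs are hypothetical. It also makes no AWS API calls.

```python
import hashlib

# Simplified model: Kinesis maps each partition key to a 128-bit integer via MD5
# and sends the record to the shard whose hash key range contains that integer.
HASH_SPACE = 2 ** 128


def even_shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the 128-bit hash space into contiguous, roughly equal ranges.

    Assumption: evenly split ranges; real streams may differ after resharding.
    """
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def shard_for_partition_key(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard that would receive this partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")  # assumption: big-endian reading
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash fell outside all shard ranges")


if __name__ == "__main__":
    ranges = even_shard_ranges(4)
    # Hypothetical device IDs used as partition keys.
    for device_id in ["truck-0001", "truck-0002", "sensor-17"]:
        print(device_id, "-> shard", shard_for_partition_key(device_id, ranges))
```

If you use the device ID as the partition key, every message from a given device goes to the same shard. The trade-off is that a few unusually chatty devices can concentrate load on one shard.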

Applying big data technology to storage


With IoT data flooding into your cloud environment, and once it has been processed and transformed, the next problem to solve is how to store it. The solution should support holding large datasets and be easy to interact with for analytics.

Hadoop

Hadoop is an open source effort that falls under the umbrella of the Apache Software Foundation. As the official project documentation puts it, "The Apache Hadoop project develops open source software for reliable, scalable, distributed computing." Hadoop is available for free in its pure open source form.

Unless you have some Hadoop experts on your team, you should opt for one of the managed Hadoop distributions. This will give you a level of troubleshooting support and implementation advice. Cloudera and Hortonworks are two main providers of managed distributions and support. Amazon AWS and Microsoft Azure both have their own managed Hadoop services: EMR and HDInsight, respectively.
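Before you choose a distribution, a rough storage estimate helps you size the cluster. The main factor that catches people out is replication. By default HDFS stores every block three times (`dfs.replication = 3`), so the disks must hold about three times the logical data. The sketch below is a back-of-the-envelope calculation under stated assumptions. The fleet size, message rate, message size, and retention period are hypothetical. It also ignores compression and file-format overhead.

```python
# Hadoop's default replication factor (dfs.replication): each block is stored 3 times.
DEFAULT_REPLICATION = 3


def estimate_storage_bytes(devices, messages_per_device_per_day,
                           bytes_per_message, retention_days,
                           replication=DEFAULT_REPLICATION):
    """Return (logical_bytes, raw_bytes) before any compression.

    logical_bytes is the data written once; raw_bytes is what the cluster's
    disks must hold after HDFS replication.
    """
    logical = devices * messages_per_device_per_day * bytes_per_message * retention_days
    return logical, logical * replication


if __name__ == "__main__":
    # Hypothetical fleet: 10,000 devices, one 500-byte message per minute, kept one year.
    for label, per_day in [("current rate", 1440), ("doubled rate", 2880)]:
        logical, raw = estimate_storage_bytes(10_000, per_day, 500, 365)
        print(f"{label}: {logical / 1e12:.2f} TB logical, {raw / 1e12:.2f} TB raw")
```

At the current rate in this hypothetical scenario, that works out to 2.628 TB of logical data and 7.884 TB on disk. Doubling the capture rate doubles both figures.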

Hadoop is a little difficult to...

Apache Spark for data processing


Apache Spark is a new-ish project (at least in the world of big data, which moves at warp speed) that integrates well with Hadoop but does not necessarily require Hadoop components to operate. It is a "fast and general engine for large-scale data processing," as described on the Spark project team welcome page. The tagline of "lightning fast cluster computing" is a little catchier; we like that one better.


What is Apache Spark?

Good question, glad you asked. Spark was built for distributed cluster computing, so the same code can scale from a single machine to a large cluster with little or no change. The word general in the general engine description is very appropriate for Spark: it refers to the many and varied ways you can use it.

You can use it for ETL data processing, machine learning modeling, graph processing, stream data processing, and SQL and structured data processing. It is a boon for analytics in a distributed computing world.
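To see why Spark code scales without rewrites, it helps to look at the shape of a typical ETL job. Each partition of the data is parsed, cleaned, and pre-aggregated on its own, and the partial results are then merged. Spark's `reduceByKey` works this way. The sketch below uses plain Python, so it runs without a cluster. It imitates partitions by slicing a list. The CSV format, device IDs, and readings are hypothetical. In PySpark, the equivalent job would look roughly like `rdd.map(...).filter(...).reduceByKey(...)`.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical raw readings as CSV lines: device_id,temperature_c
RAW = [
    "dev-1,21.5", "dev-2,19.0", "dev-1,bad", "dev-3,25.0",
    "dev-2,21.0", "dev-1,22.5", "dev-3,", "dev-3,27.0",
]


def parse(line):
    """Map step: turn a CSV line into (device_id, temperature) or None if malformed."""
    device_id, _, value = line.partition(",")
    try:
        return device_id, float(value)
    except ValueError:
        return None


def partition(records, num_partitions):
    """Split the input round-robin, standing in for how a cluster splits data."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def combine_partition(lines):
    """Per-partition work: parse, drop bad rows, pre-aggregate (sum, count) by device."""
    totals = defaultdict(lambda: (0.0, 0))
    for parsed in map(parse, lines):
        if parsed is None:
            continue  # filter step: skip malformed readings
        key, temp = parsed
        total, count = totals[key]
        totals[key] = (total + temp, count + 1)
    return totals


def merge(left, right):
    """Reduce step: merge partial (sum, count) pairs from two partitions."""
    merged = dict(left)
    for key, (total, count) in right.items():
        t, c = merged.get(key, (0.0, 0))
        merged[key] = (t + total, c + count)
    return merged


def average_by_device(lines, num_partitions=3):
    """Average temperature per device; the result does not depend on partitioning."""
    partials = [combine_partition(p) for p in partition(lines, num_partitions)]
    merged = reduce(merge, partials, {})
    return {key: total / count for key, (total, count) in merged.items()}


if __name__ == "__main__":
    print(average_by_device(RAW))
```

`combine_partition` stays the same whether the data sits in one partition or in thousands. That property is what lets a cluster framework spread the same logic across more machines.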

It has APIs for multiple programming languages such...

To stream or not to stream


Streams are datasets that continuously update, with little to no latency, as each new data message arrives. Streaming analytics operate on this continuously updating dataset at much shorter intervals than batch processing. Real-time analytics is a bit of a misnomer when applied to streaming analytics, as intervals are typically measured in minutes rather than being truly continuous. The frequency affects processing and technology requirements, so set intervals to longer time periods where possible in order to save costs.
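Processing at a fixed interval is often done with tumbling windows. These are fixed-size, non-overlapping time buckets, and each one produces a single result when it closes. The sketch below assumes a hypothetical 5-minute window, events given as `(epoch_seconds, device_id)` pairs, and a per-device message count as the metric. It shows how lengthening the interval reduces the number of results you have to compute and store.

```python
from collections import defaultdict

# Hypothetical 5-minute interval; lengthen it where the use case allows.
WINDOW_SECONDS = 5 * 60


def window_start(epoch_seconds, window_seconds=WINDOW_SECONDS):
    """Floor a timestamp to the start of the tumbling window it falls in."""
    return epoch_seconds - (epoch_seconds % window_seconds)


def tumbling_counts(events, window_seconds=WINDOW_SECONDS):
    """Count messages per (window start, device) from (epoch_seconds, device_id) pairs."""
    counts = defaultdict(int)
    for epoch_seconds, device_id in events:
        counts[(window_start(epoch_seconds, window_seconds), device_id)] += 1
    return dict(counts)


if __name__ == "__main__":
    events = [(0, "dev-a"), (10, "dev-a"), (299, "dev-a"), (300, "dev-a"), (610, "dev-b")]
    print(tumbling_counts(events))       # 3 result rows across three 5-minute windows
    print(tumbling_counts(events, 900))  # 2 result rows, all in one 15-minute window
```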

Stream datasets normally keep data for a window of time and then discard it. There are specialized technology and processing options to handle streams, which are, for the most part, in addition to the long-term big data storage technology we have focused on in this chapter. Amazon Kinesis is an example of a specialized data streaming technology service.
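The "keep for a window, then discard" behavior can be modeled as a buffer that drops messages once they are older than a retention period. Kinesis Streams, for example, keeps records for 24 hours by default. The sketch below is a simplified in-memory model, not how Kinesis is implemented. It assumes messages arrive in timestamp order and uses a hypothetical `RetentionWindow` class.

```python
from collections import deque


class RetentionWindow:
    """Hypothetical in-memory model of stream retention.

    Keeps only messages newer than `retention_seconds` relative to the newest
    message. Assumes messages are appended in timestamp order.
    """

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self._messages = deque()  # (epoch_seconds, payload), oldest on the left

    def append(self, epoch_seconds, payload):
        self._messages.append((epoch_seconds, payload))
        self._expire(epoch_seconds)

    def _expire(self, now):
        # Drop everything at or before the cutoff; the deque keeps this O(1) per message.
        cutoff = now - self.retention_seconds
        while self._messages and self._messages[0][0] <= cutoff:
            self._messages.popleft()

    def snapshot(self):
        return list(self._messages)


if __name__ == "__main__":
    window = RetentionWindow(24 * 60 * 60)  # 24-hour retention
    window.append(0, {"device_id": "dev-a", "temp_c": 21.5})
    window.append(90_000, {"device_id": "dev-a", "temp_c": 22.0})
    print(window.snapshot())  # the first message has aged out
```

Anything you want to analyze beyond the retention period has to be copied to the long-term store before it expires.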

The technology and the programming code base needed to support analytics are (usually...

Handling change


Change is constant, as contradictory as that sounds. The architecture, data model, and technology will constantly evolve over time. You will need to decide how to handle change in order to keep flexibility in your data storage and processing architecture. You will want to use tools that are decoupled from each other to allow future analytics to be easily integrated into the data processing stream.

Fortunately, components in the Hadoop ecosystem were intentionally designed to be decoupled from each other. They allow the mixing and matching of components and were built to be extensible for new frameworks not even invented yet. Cloud infrastructures allow for easy testing and incorporation of new technologies and software.
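Decoupling also helps when the data itself changes, for example when a new generation of devices sends a different message format. One way to manage this is a small registry that maps a schema version to a parser. Each parser converts its format into one normalized record, and everything downstream reads only that record. This is a sketch of that idea, not a prescribed design. The `schema_version` field, the field names, and both firmware formats are hypothetical.

```python
import json

# Registry mapping a message schema version to the function that normalizes it.
PARSERS = {}


def parser(version):
    """Decorator that registers a parser for one schema version."""
    def register(func):
        PARSERS[version] = func
        return func
    return register


@parser(1)
def parse_v1(msg):
    # Hypothetical v1 firmware: temperature in Fahrenheit under "temp_f", device under "id".
    return {"device_id": msg["id"], "temp_c": round((msg["temp_f"] - 32) * 5 / 9, 2)}


@parser(2)
def parse_v2(msg):
    # Hypothetical v2 firmware: Celsius and a renamed device field.
    return {"device_id": msg["device_id"], "temp_c": msg["temp_c"]}


def normalize(raw_json):
    """Parse one raw message into the normalized record downstream code expects."""
    msg = json.loads(raw_json)
    version = msg.get("schema_version", 1)  # assumption: unversioned messages are v1
    handler = PARSERS.get(version)
    if handler is None:
        raise ValueError(f"no parser registered for schema version {version}")
    return handler(msg)


if __name__ == "__main__":
    print(normalize('{"id": "d1", "temp_f": 212}'))
    print(normalize('{"schema_version": 2, "device_id": "d2", "temp_c": 21.5}'))
```

To support a third device generation, you would register one more parser and leave the rest of the pipeline as it is.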

You will need to design a process that takes your analytics and data processing code from experimentation, to development, to production. This holds true for your overall infrastructure as well. A common practice is to keep three separate environments: one for...

Summary


In this chapter, we discussed several cloud infrastructure technologies specific to IoT and to processing incoming data from devices in the field. We also reviewed some strategies to collect and store IoT data in order to enable analytics. The Hadoop ecosystem and architecture were introduced, with some detail on the key components for IoT analytics.

You also learned when and why to use Spark for data processing. We discussed the tradeoffs between streaming and batch processing and introduced the Lambda architecture. Finally, we discussed how to handle change, building in flexibility so that future analytics can be integrated with data processing.
