
You're reading from  Analytics for the Internet of Things (IoT)

Product type: Book
Published in: Jul 2017
Reading level: Intermediate
Publisher: Packt
ISBN-13: 9781787120730
Edition: 1st Edition
Author (1)
Andrew Minteer

Andrew Minteer is currently the senior director, data science and research at a leading global retail company. Prior to that, he served as the director, IoT Analytics and Machine Learning at a Fortune 500 manufacturing company. He has an MBA from Indiana University with a background in statistics, software development, database design, cloud architecture, and has led analytics teams for over 10 years. He first taught himself to program on an Atari 800 computer at the age of 11 and fondly remembers the frustration of waiting through 20 minutes of beeps and static to load a 100-line program. He now thoroughly enjoys launching a 1 TB GPU-backed cloud instance in a few minutes and getting right to work. Andrew is a private pilot who looks forward to spending some time in the air sometime soon. He enjoys kayaking, camping, traveling the world, and playing around with his six-year-old son and three-year-old daughter.

Chapter 3. IoT Analytics for the Cloud

Now that you know how your data is transmitted back to the corporate servers, you feel you have a better understanding of it. You also have a reference frame in your head of how it is operating out in the real world.

Your boss stops by again.

"Is that rolling average job done running yet?" he asks impatiently.

It used to run fine and finished in an hour three months ago. It has steadily taken longer and longer and now sometimes does not even finish. Today, it has been going on six hours, and you are crossing your fingers. Yesterday, it crashed twice with what looked like out-of-memory errors.

You have talked to your IT group and finance group about getting a faster server with more memory. The cost would be significant, and the process of going through purchasing, putting it on order, and having it installed would probably take months. Your friend in finance is hesitant to approve it. The money was not budgeted for this fiscal year. You feel bad...

Building elastic analytics


IoT data volumes increase quickly. Analytics for IoT is particularly compute intensive at times that are difficult to predict. Business value is uncertain and requires a lot of experimentation to find the right implementation.

Combine all of this and you need something that scales quickly, is dynamic and responsive to resource needs, and has virtually unlimited capacity at just the right time. And all of this needs to be implemented quickly, at low cost, and with low maintenance needs.

Enter the cloud. IoT analytics and cloud infrastructure fit together like a hand in a glove.

What is cloud infrastructure?

The National Institute of Standards and Technology (NIST) defines five essential characteristics of cloud computing:

  • On-demand self-service: You can provision things such as servers and storage as needed and without interacting with someone.
  • Broad network access: Your cloud resources are accessible over the internet (if enabled) by various methods, such as web browser or mobile phone.
  • Resource...

Elastic analytics concepts


What do we mean by elastic analytics? Let's define it as designing your analytics processes so that scale is not a concern. You want your focus to be on the analytics and not on the underlying technology. You want to avoid constraining your analytics capability so it will fit within some set hardware limitations. Focus instead on the potential value of your analytics versus the limit of what can be done with existing hardware.

You also want your analytics to be able to scale. It should go from supporting 100 IoT devices to 1 million IoT devices without requiring any fundamental changes. All that should happen is that the costs increase as demand increases.
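To make the idea concrete, here is a minimal sketch of a rolling average written so that the logic does not change as the device count grows. The names `rolling_average`, `analyze_device`, and `analyze_fleet` are hypothetical, and the thread pool is only a local stand-in for whatever elastic compute service you would actually use. The assumption is that each device's readings can be processed independently.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def rolling_average(readings, window=3):
    """Yield the mean of the most recent `window` readings.

    Only `window` values are held in memory, no matter how long the stream is.
    """
    recent = deque(maxlen=window)
    total = 0.0
    for value in readings:
        if len(recent) == window:
            total -= recent[0]  # this value is evicted by the append below
        recent.append(value)
        total += value
        yield total / len(recent)


def analyze_device(item):
    """Process one device's readings independently of every other device."""
    device_id, readings = item
    return device_id, list(rolling_average(readings))


def analyze_fleet(readings_by_device, workers=4):
    """Fan the per-device work out to a pool; `workers` is the only scale knob."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(analyze_device, readings_by_device.items()))


# Hypothetical sample data: two devices with a few readings each.
fleet = {"device-1": [1, 2, 3, 4], "device-2": [10, 20]}
results = analyze_fleet(fleet, workers=2)
print(results)
```

Going from 100 devices to 1 million changes the size of `fleet` and the value of `workers`, but not the analytics code itself.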

This reduces complexity and increases maintainability. This translates into lower costs, which enables you to do more analytics. More analytics increases the probability of finding value. Finding more value enables even more analytics. Virtuous circle!

Some core elastic analytics concepts:

  • Separate compute from storage...

Designing for scale


Following some key concepts will help keep changes to your analytics processes to a minimum as your needs scale.

Decouple key components

Decoupling means separating functional groups into components so they are not dependent upon each other to operate. This allows functionality to change or new functionality to be added with minimal impact on other components.
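As a rough illustration, the following sketch decouples an ingestion step from an analysis step by placing a queue between them. The function names, the message fields, and the temperature threshold are all hypothetical. An in-process `queue.Queue` stands in for a managed message queue, which is an assumption made only to keep the example self-contained.

```python
import queue


def ingest(raw_messages, out_queue):
    """Producer: knows nothing about who consumes the messages."""
    for message in raw_messages:
        out_queue.put(message)
    out_queue.put(None)  # sentinel marking the end of the stream


def analyze(in_queue, threshold):
    """Consumer: knows nothing about where the messages came from."""
    alerts = []
    while True:
        message = in_queue.get()
        if message is None:
            break
        if message["temperature"] > threshold:
            alerts.append(message["device_id"])
    return alerts


# Hypothetical device messages.
messages = [
    {"device_id": "a", "temperature": 72},
    {"device_id": "b", "temperature": 95},
]

buffer = queue.Queue()
ingest(messages, buffer)
alerts = analyze(buffer, threshold=80)
print(alerts)
```

Because the two functions share only the queue and the message format, either side can be replaced or scaled out with minimal impact on the other.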

Encapsulate analytics

Encapsulation means grouping together similar functions and activities into distinct units. It is a core principle of object-oriented programming, and you should employ it in analytics as well. The goal is to reduce complexity and simplify future changes.

As your analytics develop, you will have a list of actions, each of which is either transforming the data, running it through a model or algorithm, or reacting to the result. It can get complicated quickly. By encapsulating the analytics, it is easier to know where to make changes when needed down the road. You will also be able to reconfigure parts of the process...
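One way to picture this is to give each of the three kinds of action (transform, model, react) its own unit and compose them. This is a minimal sketch under that assumption. The class `AnalyticsPipeline`, the scoring rule, and the alert threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AnalyticsPipeline:
    """Bundles the three kinds of action into separately replaceable units."""
    transform: Callable[[float], float]
    score: Callable[[float], float]
    react: Callable[[float], str]

    def run(self, raw_values: List[float]) -> List[str]:
        return [self.react(self.score(self.transform(v))) for v in raw_values]


def fahrenheit_to_celsius(value_f: float) -> float:
    """Transform step: convert the raw reading to Celsius."""
    return (value_f - 32) * 5 / 9


def overheating_score(value_c: float) -> float:
    """Model step: a hypothetical score, not a real model."""
    return value_c / 100.0


def alert_if_high(score: float) -> str:
    """React step: the 0.8 threshold is an arbitrary example value."""
    return "ALERT" if score > 0.8 else "ok"


pipeline = AnalyticsPipeline(
    transform=fahrenheit_to_celsius,
    score=overheating_score,
    react=alert_if_high,
)
outcomes = pipeline.run([68.0, 212.0])
print(outcomes)
```

Swapping in a better model later means replacing only the `score` function; the transform and react steps stay untouched.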

Cloud security and analytics


You can build security into analytics using several methods supported by major cloud infrastructure providers.

Public/private keys

Cloud providers use asymmetric cryptography throughout their services. A public and private key pair is generated. You keep the private key, so the service does not have a copy. The service holds the public key. Communication using public/private keys is secure: with adequately long keys and a sound implementation, there is no known practical way to break it.

The cloud provider could publish the public key in tomorrow's newspaper and it would not matter; the encryption cannot be broken with just the public key. It may seem counterintuitive that a public key is used to encrypt data but cannot be used to decrypt it. But it works.
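The arithmetic behind this can be shown with a toy version of RSA, one widely used asymmetric scheme. This is a sketch for intuition only: the primes are deliberately tiny, there is no padding, and nothing here should be used for real security. Cloud providers use vetted libraries and far larger keys. The helper names `encrypt` and `decrypt` are hypothetical.

```python
# Toy RSA with tiny primes: for illustration only, never for real security.
p, q = 61, 53
n = p * q                  # modulus shared by both keys (3233)
phi = (p - 1) * (q - 1)    # 3120; easy to compute only if you know p and q
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)        # safe to publish
private_key = (d, n)       # kept secret by the owner


def encrypt(message: int, key: tuple) -> int:
    """Encrypt an integer smaller than the modulus with the public key."""
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext: int, key: tuple) -> int:
    """Recover the original integer with the private key."""
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)
print(ciphertext, recovered)
```

Computing `d` requires `phi`, which in turn requires the secret primes `p` and `q`. Someone holding only `(e, n)` would have to factor `n` first, which is trivial here but infeasible at real key sizes.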

Every time you visit a website whose address starts with https://, public/private key encryption is being used. It underpins the SSL and TLS protocols, which are employed to secure HTTPS communications.

You will use the public/private keys often for IoT analytics when you build secure processes. Think...

The AWS overview


In the land of cloud infrastructure, AWS is the king. It was the first of its kind, launched in 2006, and is the largest by a wide margin. It is ranked number one in every segment of the Gartner Magic Quadrants on cloud infrastructure providers.

As reported by Computerworld in 2016, it has ten times the compute capacity of its 14 closest rivals combined. Entire companies, such as Netflix and Airbnb, run their operations on it. As can be seen in the following chart, AWS has over 30% of the market share, with the next closest competitor at 9%.

AWS offers a wide range of services from networking to compute to IoT. The following is a listing of the services from the AWS Management Console. The management console is where you launch new services, monitor existing ones, and review billing:

AWS services list. Source: AWS management console

You can reduce these into three categories of services that you need to have configured properly to support your analytics: networking, compute, and storage...

Microsoft Azure overview


Microsoft offers a cloud infrastructure service called Azure that competes directly with AWS. It is generally ranked as number two in the industry in size and capabilities, although it has been closing that gap recently.

The range of services is similar to AWS but with a Microsoft flavor. Microsoft markets the services as easier to integrate with corporate on-premises networks. Integration leans more toward Microsoft technology, such as the Windows operating system, the .NET framework, and the SQL Server database.

We will review some of the services of interest for IoT analytics.

Azure Data Lake Store

Azure Data Lake Store is compatible with Hadoop Distributed File System (HDFS), which we will be discussing in Chapter 4, Creating an AWS Cloud Analytics Environment. It also has a REST interface for applications that is WebHDFS-compatible.
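To give a feel for what a WebHDFS-compatible REST interface looks like, the sketch below only builds a request URL and makes no network call. The `/webhdfs/v1/` path prefix and the `op` query parameter come from the WebHDFS convention. The account host name, the folder path, and the helper `webhdfs_url` are hypothetical, and authentication, which a real request would need, is left out entirely.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, path: str, operation: str, **params: str) -> str:
    """Build a WebHDFS-style URL: https://<host>/webhdfs/v1/<path>?op=<OPERATION>."""
    query = urlencode({"op": operation, **params})
    return f"https://{host}/webhdfs/v1/{quote(path.lstrip('/'))}?{query}"


# Hypothetical account and folder; LISTSTATUS is the WebHDFS directory listing operation.
url = webhdfs_url(
    "exampleaccount.azuredatalakestore.net",
    "/iot/raw/2017",
    "LISTSTATUS",
)
print(url)
```

Because the interface follows the WebHDFS convention, tools written against HDFS over REST can, in principle, point at the store by changing little more than the host name.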

Data stored in Data Lake Store can be analyzed using analytic frameworks within the Hadoop ecosystem, such as MapReduce and...

The ThingWorx overview


The company PTC, which has a long history in creating software for the world of machines talking to machines, developed ThingWorx. It is an application development environment for building IoT solutions. It is a software platform that abstracts IoT devices and related components and services into model-based development objects. You install the software on your own hardware (or on a cloud provider's virtual instances).

The platform makes it easy to model your devices and their data, and to quickly create dashboards through a web-based application. No code is required. ThingWorx is also extensible with third-party components through its marketplace. This makes it easy to add third-party functionality without special configuration. It can also integrate with both the AWS and Azure IoT hub services.

There are multiple components of ThingWorx. ThingWorx Foundation is the center of the platform. It is divided into three areas, as shown in the following image:

  • ThingWorx...

Summary


In this chapter, we reviewed what is meant by elastic analytics and learned the advantages of using cloud infrastructure for IoT analytics. Designing for scale was discussed along with distributed computing.

The two main cloud providers were introduced, AWS and Microsoft Azure. We also reviewed a purpose-built software platform, ThingWorx, made for IoT devices, communications, and analysis.
