CORS in Node.js

Packt
20 Jun 2017
14 min read
In this article by Randall Goya and Rajesh Gunasundaram, the authors of the book CORS Essentials, we look at CORS in Node.js. Node.js is a cross-platform JavaScript runtime environment that executes JavaScript code on the server side. This makes it possible to use a single, unified language across web application development: JavaScript runs on both the client side and the server side. (For more resources related to this topic, see here.)

In this article we will learn about the following:

- Node.js is a JavaScript platform for developing server-side web applications.
- Node.js can provide the web server for other frameworks, including Express.js, AngularJS, Backbone.js, Ember.js and others. Some other JavaScript frameworks, such as ReactJS, Ember.js and Socket.IO, may also use Node.js as the web server.
- Isomorphic JavaScript can add server-side functionality to client-side frameworks.
- JavaScript frameworks are evolving rapidly. This article reviews some of the current techniques and framework-specific syntax. Make sure to check each project's documentation to discover the latest techniques.
- Because JavaScript is a loosely structured language, you may create your own solution once you understand the CORS concepts. All the examples are based on the fundamentals of CORS: allowed origin(s), methods, and headers such as Content-Type, plus preflight where the CORS specification requires it.

JavaScript frameworks are very popular

JavaScript is sometimes called the lingua franca of the Internet, because it is cross-platform and supported by many devices. It is also a loosely structured language, which makes it possible to craft solutions for many types of applications. Sometimes an entire application is built in JavaScript. Frequently, JavaScript provides a client-side front end for applications built with Symfony, content management systems such as Drupal, and other back-end frameworks. Node.js is server-side JavaScript and provides a web server as an alternative to Apache, IIS, Nginx and other traditional web servers.

Introduction to Node.js

Node.js is an open-source, cross-platform library for developing server-side web applications. Applications written in JavaScript for Node.js can run on many operating systems, including OS X, Microsoft Windows, Linux, and many others. Node.js provides non-blocking I/O and an event-driven architecture designed to optimize an application's performance and scalability for real-time web applications.

The biggest difference between PHP and Node.js is that PHP is a blocking language, where commands execute only after the previous command has completed, while Node.js is a non-blocking language, where commands execute in parallel and use callbacks to signal completion. Node.js can move files, payloads from services, and data asynchronously, without waiting for a command to complete, which improves performance.

Most JS frameworks that work with Node.js use the concept of routes to manage pages and other parts of the application. Each route may have its own set of configurations; for example, CORS may be enabled only for a specific page or route.

Node.js loads modules that extend its functionality via the npm package manager. The developer selects which packages to load with npm, which reduces bloat. The developer community has created a large number of npm packages for specific functions.

JXcore is a fork of Node.js targeting mobile devices and IoT (Internet of Things) devices.
JXcore can use both Google V8 and Mozilla SpiderMonkey as its JavaScript engine, and it can run Node applications on iOS devices using Mozilla SpiderMonkey. MEAN is a popular JavaScript software stack with MongoDB (a NoSQL database), Express.js and AngularJS, all of which run on a Node.js server.

JavaScript frameworks that work with Node.js

Node.js provides a server for other popular JS frameworks, including AngularJS, Express.js, Backbone.js, Socket.IO, and Connect.js. ReactJS was designed to run in the client browser, but it is often combined with a Node.js server. As we shall see in the following descriptions, these frameworks are not necessarily exclusive, and they are often combined in applications.

Express.js is a Node.js server framework

Express.js is a Node.js web application server framework, designed for building single-page, multi-page, and hybrid web applications. It is considered the "standard" server framework for Node.js. The package is installed with the command npm install express --save.

AngularJS extends static HTML with dynamic views

HTML was designed for static content, not for dynamic views. AngularJS extends HTML syntax with custom tag attributes. It provides model-view-controller (MVC) and model-view-viewmodel (MVVM) architectures in a front-end, client-side framework. AngularJS is often combined with a Node.js server and other JS frameworks. AngularJS runs client-side and Express.js runs on the server; therefore, Express.js is considered more secure for functions such as validating user input, which can be tampered with client-side. AngularJS applications can use the Express.js framework to connect to databases, for example in the MEAN stack.

Connect.js provides middleware for Node.js requests

Connect.js is a JavaScript framework providing middleware to handle requests in Node.js applications. Connect.js provides middleware to handle Express.js and cookie sessions, to provide parsers for the HTML body and cookies, to create vhosts (virtual hosts) and error handlers, and to override methods.

Backbone.js often uses a Node.js server

Backbone.js is a JavaScript framework with a RESTful JSON interface, based on the model-view-presenter (MVP) application design. It is designed for developing single-page web applications, and for keeping various parts of web applications (for example, multiple clients and the server) synchronized. Backbone depends on Underscore.js, plus jQuery for use of all the available features. Backbone often uses a Node.js server, for example to connect to data storage.

ReactJS handles user interfaces

ReactJS is a JavaScript library for creating user interfaces while addressing challenges encountered in developing single-page applications where data changes over time. React handles the user interface in a model-view-controller (MVC) architecture. ReactJS typically runs client-side and can be combined with AngularJS. Although ReactJS was designed to run client-side, it can also be used server-side in conjunction with Node.js. PayPal and Netflix leverage the server-side rendering of ReactJS, known as Isomorphic ReactJS. There are React-based add-ons that take care of the server-side parts of a web application.

Socket.IO uses WebSockets for realtime event-driven applications

Socket.IO is a JavaScript library for event-driven web applications that uses the WebSocket protocol, with realtime, bi-directional communication between web clients and servers. It has two parts: a client-side library that runs in the browser, and a server-side library for Node.js.
Although it can be used simply as a wrapper for WebSocket, it provides many more features, including broadcasting to multiple sockets, storing data associated with each client, and asynchronous I/O. Socket.IO provides better security than WebSocket alone, since allowed domains must be specified for its server.

Ember.js can use Node.js

Ember is another popular JavaScript framework with routing that uses Mustache templates. It can run on a Node.js server, or with Express.js. Ember can also be combined with Rack, a component of Ruby on Rails (RoR). Ember Data is a library for modeling data in Ember.js applications.

CORS in Express.js

The following code adds the Access-Control-Allow-Origin and Access-Control-Allow-Headers headers globally to all requests on all routes in an Express.js application. A route is a path in the Express.js application, for example /user for a user page. app.all sets the configuration for all routes in the application. Specific HTTP requests such as GET or POST are handled by app.get and app.post.

app.all('*', function(req, res, next) {
  res.header("Access-Control-Allow-Origin", "*");
  res.header("Access-Control-Allow-Headers", "X-Requested-With");
  next();
});
app.get('/', function(req, res, next) {
  // Handle GET for this route
});
app.post('/', function(req, res, next) {
  // Handle the POST for this route
});

For better security, consider limiting the allowed origin to a single domain, or adding some additional code to validate or limit the domain(s) that are allowed. Also, consider sending the headers only for routes that require CORS by replacing app.all with a more specific route and method. The following code sends the CORS headers only on a GET request on the route /user, and only allows the request from http://www.localdomain.com.

app.get('/user', function(req, res, next) {
  res.header("Access-Control-Allow-Origin", "http://www.localdomain.com");
  res.header("Access-Control-Allow-Headers", "X-Requested-With");
  next();
});

Since this is JavaScript code, you may dynamically manage the values of routes, methods, and domains via variables, instead of hard-coding the values, as the sketch below illustrates.
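For example, a minimal sketch of this idea (not from the book: the domain names and the allowedOrigins variable are illustrative assumptions, and the same app object as in the preceding snippets is assumed) checks the request's Origin header against a configurable whitelist before setting the headers:

// Hypothetical whitelist; adjust the domains for your own application
var allowedOrigins = ['http://www.localdomain.com', 'http://app.localdomain.com'];

app.all('*', function(req, res, next) {
  var origin = req.header('Origin');
  // Echo the origin back only when it appears in the whitelist
  if (origin && allowedOrigins.indexOf(origin) !== -1) {
    res.header("Access-Control-Allow-Origin", origin);
    res.header("Access-Control-Allow-Headers", "X-Requested-With");
  }
  next();
});

Because the whitelist is an ordinary array, it can be loaded from configuration or changed at runtime without touching the middleware itself.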
CORS npm for Express.js using Connect.js middleware

Connect.js provides middleware to handle requests in Express.js. You can use the Node Package Manager (npm) to install a package that enables CORS in Express.js with Connect.js:

npm install cors

The package offers flexible options, which should be familiar from the CORS specification, including the use of credentials and preflight. It provides dynamic ways to validate an origin domain using a function or a regular expression, and handler functions to process preflight.

Configuration options for CORS npm

origin: Configures the Access-Control-Allow-Origin CORS header with a string containing the full URL and protocol making the request, for example http://localdomain.com. Possible values for origin:
- The default value TRUE uses req.header('Origin') to determine the origin, and CORS is enabled
- When set to FALSE, CORS is disabled
- It can be set to a function with the request origin as the first parameter and a callback function as the second parameter
- It can be a regular expression, for example /localdomain.com$/, or an array of regular expressions and/or strings to match

methods: Sets the Access-Control-Allow-Methods CORS header. Possible values for methods:
- A comma-delimited string of HTTP methods, for example 'GET, POST'
- An array of HTTP methods, for example ['GET', 'PUT', 'POST']

allowedHeaders: Sets the Access-Control-Allow-Headers CORS header. Possible values for allowedHeaders:
- A comma-delimited string of allowed headers, for example 'Content-Type, Authorization'
- An array of allowed headers, for example ['Content-Type', 'Authorization']
- If unspecified, it defaults to the value specified in the request's Access-Control-Request-Headers header

exposedHeaders: Sets the Access-Control-Expose-Headers header. Possible values for exposedHeaders:
- A comma-delimited string of exposed headers, for example 'Content-Range, X-Content-Range'
- An array of exposed headers, for example ['Content-Range', 'X-Content-Range']
- If unspecified, no custom headers are exposed

credentials: Sets the Access-Control-Allow-Credentials CORS header. Possible values for credentials:
- TRUE: the header is passed, including for preflight
- FALSE or unspecified: the header is omitted and there is no preflight

maxAge: Sets the Access-Control-Max-Age header. Possible values for maxAge:
- An integer value in seconds for the TTL to cache the request
- If unspecified, the request is not cached

preflightContinue: Passes the CORS preflight response to the next handler.

The default configuration without setting any values allows all origins and methods without preflight. Keep in mind that complex CORS requests other than GET, HEAD, POST will fail without preflight, so make sure you enable preflight in the configuration when using them. Without setting any values, the configuration defaults to:

{
  "origin": "*",
  "methods": "GET,HEAD,PUT,PATCH,POST,DELETE",
  "preflightContinue": false
}

Code examples for CORS npm

These examples demonstrate the flexibility of CORS npm for specific configurations. Note that the express and cors packages are always required.

Enable CORS globally for all origins and all routes

The simplest implementation of CORS npm enables CORS for all origins and all requests. The following example enables CORS for an arbitrary route /product/:id for a GET request by telling the entire app to use CORS for all routes:

var express = require('express')
  , cors = require('cors')
  , app = express();
app.use(cors()); // this tells the app to use CORS for all requests and all routes
app.get('/product/:id', function(req, res, next){
  res.json({msg: 'CORS is enabled for all origins'});
});
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});
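Building on the configuration options described above, here is a minimal sketch of a custom global configuration (the specific domain, method list, header list, and maxAge value are illustrative assumptions, not taken from the book):

var express = require('express')
  , cors = require('cors')
  , app = express();

// Illustrative options object combining several of the settings described above
var corsOptions = {
  origin: 'http://localdomain.com',                   // only this origin is allowed
  methods: ['GET', 'PUT', 'POST'],                    // allowed HTTP methods
  allowedHeaders: ['Content-Type', 'Authorization'],  // allowed request headers
  credentials: true,                                  // sets Access-Control-Allow-Credentials
  maxAge: 600                                         // cache the preflight response for 600 seconds
};

app.use(cors(corsOptions)); // apply this CORS policy to every route

app.listen(80, function(){
  console.log('CORS is enabled with custom options on port 80');
});

The same options object can instead be passed to an individual route, for example app.get('/product/:id', cors(corsOptions), ...), as the following examples do.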
Allow CORS for dynamic origins for a specific route

The following example uses corsOptions to check whether the domain making the request is in the whitelist array and passes the result to a callback function. This CORS option is passed to the route /product/:id, which is the only route that has CORS enabled. The allowed origins can be made dynamic by changing the value of the variable whitelist.

var express = require('express')
  , cors = require('cors')
  , app = express();
// define the whitelisted domains and set the CORS options to check them
var whitelist = ['http://localdomain.com', 'http://localdomain-other.com'];
var corsOptions = {
  origin: function(origin, callback){
    var originWhitelisted = whitelist.indexOf(origin) !== -1;
    callback(null, originWhitelisted);
  }
};
// add the CORS options to a specific route /product/:id for a GET request
app.get('/product/:id', cors(corsOptions), function(req, res, next){
  res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'});
});
// log that CORS is enabled on the server
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

You may set different CORS options for specific routes, or sets of routes, by defining the options in variables with unique names, for example corsUserOptions, and passing the specific configuration variable to each route that requires that set of options.

Enabling CORS preflight

CORS requests that use an HTTP method other than GET, HEAD, POST (for example, DELETE), or that use custom headers, are considered complex and require a preflight request before proceeding with the CORS requests. Enable preflight by adding an OPTIONS handler for the route:

var express = require('express')
  , cors = require('cors')
  , app = express();
// add the OPTIONS handler to the route /products/:id
app.options('/products/:id', cors());
// use the OPTIONS handler for the DELETE method on the route /products/:id
app.delete('/products/:id', cors(), function(req, res, next){
  res.json({msg: 'CORS is enabled with preflight on the route /products/:id for the DELETE method for all origins!'});
});
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

You can enable preflight globally on all routes with the wildcard:

app.options('*', cors());

Configuring CORS asynchronously

One of the reasons to use Node.js frameworks is to take advantage of their asynchronous abilities, handling multiple tasks at the same time. Here we use a callback function, corsDelegateOptions, and add it to the cors parameter passed to the route /products/:id. The callback function can handle multiple requests asynchronously.

var express = require('express')
  , cors = require('cors')
  , app = express();
// define the allowed origins stored in a variable
var whitelist = ['http://example1.com', 'http://example2.com'];
// create the callback function
var corsDelegateOptions = function(req, callback){
  var corsOptions;
  if(whitelist.indexOf(req.header('Origin')) !== -1){
    corsOptions = { origin: true }; // the requested origin matches and is allowed
  }else{
    corsOptions = { origin: false }; // the requested origin doesn't match, and CORS is disabled for this request
  }
  callback(null, corsOptions); // callback expects two parameters: error and options
};
// add the callback function to the cors parameter for the route /products/:id for a GET request
app.get('/products/:id', cors(corsDelegateOptions), function(req, res, next){
  res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'});
});
app.listen(80, function(){
  console.log('CORS is enabled on the web server listening on port 80');
});

Summary

We have learned the essentials of applying CORS in Node.js.
Let us have a quick recap of what we have learned:

- Node.js provides a web server built with JavaScript and can be combined with many other JS frameworks as the application server.
- Although some frameworks have specific syntax for implementing CORS, they all follow the CORS specification by specifying allowed origin(s) and method(s). More robust frameworks allow custom headers such as Content-Type, and preflight when required for complex CORS requests.
- JavaScript frameworks may depend on the jQuery XHR object, which must be configured properly to allow cross-origin requests.
- JavaScript frameworks are evolving rapidly. The examples here may become outdated. Always refer to the project documentation for up-to-date information.
- With knowledge of the CORS specification, you may create your own techniques using JavaScript based on these examples, depending on the specific needs of your application.

Reference: https://en.wikipedia.org/wiki/Node.js

Resources for Article:

Further resources on this subject:

- An Introduction to Node.js Design Patterns [article]
- Five common questions for .NET/Java developers learning JavaScript and Node.js [article]
- API with MongoDB and Node.js [article]

Understanding the Basics of RxJava

Packt
20 Jun 2017
15 min read
In this article by Tadas Subonis, the author of the book Reactive Android Programming, we will go through the core basics of RxJava so that we can fully understand what it is, what its core elements are, and how they work. Before that, let's take a step back and briefly discuss how RxJava is different from other approaches.

RxJava is about reacting to results. It might be an item that originated from some source. It can also be an error. RxJava provides a framework to handle these items in a reactive way and to create complicated manipulation and handling schemes in a very easy-to-use interface. Things like waiting for the arrival of an item before transforming it become very easy with RxJava. To achieve all this, RxJava provides some basic primitives:

- Observables: A source of data
- Subscriptions: An activated handle to the Observable that receives data
- Schedulers: A means to define where (on which thread) the data is processed

First of all, we will cover Observables: the source of all the data and the core structure/class that we will be working with. We will explore how they are related to Disposables (Subscriptions). Furthermore, the life cycle and hook points of an Observable will be described, so we will actually know what's happening when an item travels through an Observable and what the different stages are that we can tap into. Finally, we will briefly introduce Flowable, a big brother of Observable that lets you handle big amounts of data with high rates of publishing. To summarize, we will cover these aspects:

- What is an Observable?
- What are Disposables (formerly Subscriptions)?
- How do items travel through the Observable?
- What is backpressure and how can we use it with Flowable?

Let's dive into it! (For more resources related to this topic, see here.)

Observables

Everything starts with an Observable. It's a source of data that you can observe for emitted data (hence the name). In almost all cases, you will be working with the Observable class. It is possible to (and we will!) combine different Observables into one Observable. Basically, it is a universal interface to tap into data streams in a reactive way. There are lots of different ways to create Observables. The simplest way is to use the .just() method like we did before:

Observable.just("First item", "Second item");

It is usually a perfect way to glue non-Rx-like parts of the code to an Rx-compatible flow. When an Observable is created, it is not usually defined when it will start emitting data. If it was created using simple tools such as .just(), it won't start emitting data until there is a subscription to the observable. How do you create a subscription? It's done by calling .subscribe():

Observable.just("First item", "Second item")
  .subscribe();

Usually (but not always), the observable will be activated the moment somebody subscribes to it. So, if a new Observable was just created, it won't magically start sending data "somewhere".

Hot and Cold Observables

Quite often in the literature and documentation, the terms Hot and Cold Observables can be found. A Cold Observable is the most common Observable type. For example, it can be created with the following code:

Observable.just("First item", "Second item")
  .subscribe();

Cold Observable means that the items won't be emitted by the Observable until there is a Subscriber. This means that before .subscribe() is called, no items will be produced, and thus none of the items that are intended to be emitted will be missed; everything will be processed.
A Hot Observable is an Observable that will begin producing (emitting) items internally as soon as it is created. The status updates are produced constantly, and it doesn't matter whether there is something ready to receive them (like a Subscription). If there were no subscriptions to the Observable, the updates will be lost.

Disposables

A Disposable (previously called a Subscription in RxJava 1.0) is a tool that can be used to control the life cycle of an Observable. If the stream of data that the Observable is producing is boundless, it will stay active forever. That might not be a problem for a server-side application, but it can cause some serious trouble on Android. Usually, this is a common source of memory leaks. Obtaining a reference to a Disposable is pretty simple:

Disposable disposable = Observable.just("First item", "Second item")
  .subscribe();

Disposable is a very simple interface. It has only two methods: dispose() and isDisposed(). dispose() can be used to cancel the existing Disposable (Subscription). This will stop the call of .subscribe() from receiving any further items from the Observable, and the Observable itself will be cleaned up. isDisposed() has a pretty straightforward function: it checks whether the subscription is still active. However, it is not used very often in regular code, as subscriptions are usually unsubscribed and forgotten. Disposed subscriptions (Disposables) cannot be re-enabled; they can only be created anew. Finally, Disposables can be grouped using CompositeDisposable like this:

Disposable disposable = new CompositeDisposable(
  Observable.just("First item", "Second item").subscribe(),
  Observable.just("1", "2").subscribe(),
  Observable.just("One", "Two").subscribe()
);

It's useful in cases where there are many Observables that should be canceled at the same time, for example, when an Activity is being destroyed.

Schedulers

As described in the documentation, a Scheduler is something that can schedule a unit of work to be executed now or later. In practice, it means that Schedulers control where the code will actually be executed, and usually that means selecting some kind of specific thread. Most often, Schedulers are used to execute long-running tasks on some background thread so that they don't block the main computation or UI thread. This is especially relevant on Android, where long-running tasks must not be executed on the main thread. Schedulers can be set with a simple .subscribeOn() call:

Observable.just("First item", "Second item")
  .subscribeOn(Schedulers.io())
  .subscribe();

There are only a few main Schedulers that are commonly used:

- Schedulers.io()
- Schedulers.computation()
- Schedulers.newThread()
- AndroidSchedulers.mainThread()

AndroidSchedulers.mainThread() is only used on Android systems.

Scheduling examples

Let's explore how Schedulers work by checking out a few examples.
Let's run the following code:

Observable.just("First item", "Second item")
  .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
  .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

The output will be as follows:

on-next:main:First item
subscribe:main:First item
on-next:main:Second item
subscribe:main:Second item

Now let's try changing the code as shown:

Observable.just("First item", "Second item")
  .subscribeOn(Schedulers.io())
  .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
  .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

Now, the output should look like this:

on-next:RxCachedThreadScheduler-1:First item
subscribe:RxCachedThreadScheduler-1:First item
on-next:RxCachedThreadScheduler-1:Second item
subscribe:RxCachedThreadScheduler-1:Second item

We can see how the code was executed on the main thread in the first case and on a new thread in the second. Android requires that all UI modifications be done on the main thread. So, how can we execute a long-running process in the background but process the result on the main thread? That can be done with the .observeOn() method:

Observable.just("First item", "Second item")
  .subscribeOn(Schedulers.io())
  .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e))
  .observeOn(AndroidSchedulers.mainThread())
  .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e));

The output will be as illustrated:

on-next:RxCachedThreadScheduler-1:First item
on-next:RxCachedThreadScheduler-1:Second item
subscribe:main:First item
subscribe:main:Second item

You will note that the items in the doOnNext block were executed on the "RxThread", and the subscribe block items were executed on the main thread.

Investigating the Flow of an Observable

Logging inside the steps of an Observable is a very powerful tool when one wants to understand how they work. If you are in doubt at any point as to what's happening, add logging and experiment. A few quick iterations with logs will definitely help you understand what's going on under the hood. Let's use this technique to analyze the full flow of an Observable. We will start off with this script:

private void log(String stage, String item) {
  Log.d("APP", stage + ":" + Thread.currentThread().getName() + ":" + item);
}

private void log(String stage) {
  Log.d("APP", stage + ":" + Thread.currentThread().getName());
}

Observable.just("One", "Two")
  .subscribeOn(Schedulers.io())
  .doOnDispose(() -> log("doOnDispose"))
  .doOnComplete(() -> log("doOnComplete"))
  .doOnNext(e -> log("doOnNext", e))
  .doOnEach(e -> log("doOnEach"))
  .doOnSubscribe((e) -> log("doOnSubscribe"))
  .doOnTerminate(() -> log("doOnTerminate"))
  .doFinally(() -> log("doFinally"))
  .observeOn(AndroidSchedulers.mainThread())
  .subscribe(e -> log("subscribe", e));

It can be seen that it has lots of additional and unfamiliar steps (more about this later). They represent different stages during the processing of an Observable. So, what's the output of the preceding script?

doOnSubscribe:main
doOnNext:RxCachedThreadScheduler-1:One
doOnEach:RxCachedThreadScheduler-1
doOnNext:RxCachedThreadScheduler-1:Two
doOnEach:RxCachedThreadScheduler-1
doOnComplete:RxCachedThreadScheduler-1
doOnEach:RxCachedThreadScheduler-1
doOnTerminate:RxCachedThreadScheduler-1
doFinally:RxCachedThreadScheduler-1
subscribe:main:One
subscribe:main:Two
doOnDispose:main

Let's go through some of the steps.
First of all, by calling .subscribe(), the doOnSubscribe block was executed. This started the emission of items from the Observable, as we can see on the doOnNext and doOnEach lines. Finally, the stream finished and the termination life cycle was activated: doOnComplete, doOnTerminate and doFinally. Also, the reader will note that the doOnDispose block was called on the main thread along with the subscribe block. The flow will be a little different if the .subscribeOn() and .observeOn() calls aren't there:

doOnSubscribe:main
doOnNext:main:One
doOnEach:main
subscribe:main:One
doOnNext:main:Two
doOnEach:main
subscribe:main:Two
doOnComplete:main
doOnEach:main
doOnTerminate:main
doOnDispose:main
doFinally:main

You will readily note that now the doFinally block was executed after doOnDispose, while in the former setup doOnDispose was last. This happens due to the way the Android Looper schedules code blocks for execution and the fact that we used two different threads in the first case. The takeaway here is that whenever you are unsure of what is going on, start logging actions (and the threads they are running on) to see what's actually happening.

Flowable

Flowable can be regarded as a special type of Observable (but internally it isn't). It has almost the same method signatures as the Observable as well. The difference is that Flowable allows you to process items that are emitted faster from the source than some of the following steps can handle. It might sound confusing, so let's analyze an example. Assume that you have a source that can emit a million items per second. However, the next step uses those items to do a network request. We know, for sure, that we cannot do more than 50 requests per second. That poses a problem. What will we do after 60 seconds? There will be 60 million items in the queue waiting to be processed. The items are accumulating at a rate of 1 million items per second between the first and the second steps because the second step processes them at a much slower rate. Clearly, the problem here is that the available memory will be exhausted and the program will fail with an OutOfMemory (OOM) exception. For example, this script will cause excessive memory usage because the processing step just won't be able to keep up with the pace at which the items are emitted:

PublishSubject<Integer> observable = PublishSubject.create();
observable
  .observeOn(Schedulers.computation())
  .subscribe(v -> log("s", v.toString()), this::log);
for (int i = 0; i < 1000000; i++) {
  observable.onNext(i);
}

private void log(Throwable throwable) {
  Log.e("APP", "Error", throwable);
}

By converting this to a Flowable, we can start controlling this behavior:

observable.toFlowable(BackpressureStrategy.MISSING)
  .observeOn(Schedulers.computation())
  .subscribe(v -> log("s", v.toString()), this::log);

Since we have chosen not to specify how we want to handle items that cannot be processed (this is called backpressuring), it will throw a MissingBackpressureException. However, if the number of items was 100 instead of a million, it would have been just fine, as it wouldn't hit the internal buffer of Flowable. By default, the size of the Flowable queue (buffer) is 128. There are a few backpressure strategies that define how the excessive amount of items should be handled.

Drop Items

Dropping means that if the downstream processing steps cannot keep up with the pace of the source Observable, just drop the data that cannot be handled.
This can only be used in cases where losing data is okay and you care more about the values that were emitted in the beginning. There are a few ways in which items can be dropped. The first one is just to specify the backpressure strategy, like this:

observable.toFlowable(BackpressureStrategy.DROP)

Alternatively, it can be done like this:

observable.toFlowable(BackpressureStrategy.MISSING)
  .onBackpressureDrop()

A similar way to do that would be to call .sample(). It will emit items only periodically, and it will take only the last value that's available (while BackpressureStrategy.DROP drops an item instantly unless it is free to push it down the stream). All the other values between "ticks" will be dropped:

observable.toFlowable(BackpressureStrategy.MISSING)
  .sample(10, TimeUnit.MILLISECONDS)
  .observeOn(Schedulers.computation())
  .subscribe(v -> log("s", v.toString()), this::log);

Preserve Latest Item

Preserving the last item means that if the downstream cannot cope with the items that are being sent to it, stop emitting values and wait until it becomes available. While waiting, keep dropping all the values except the last one that arrived, and when the downstream becomes available, send the last message that's currently stored. As with dropping, the "latest" strategy can be specified while creating an Observable:

observable.toFlowable(BackpressureStrategy.LATEST)

Alternatively, by calling .onBackpressureLatest():

observable.toFlowable(BackpressureStrategy.MISSING)
  .onBackpressureLatest()

Finally, a method, .debounce(), can periodically take the last value at specific intervals:

observable.toFlowable(BackpressureStrategy.MISSING)
  .debounce(10, TimeUnit.MILLISECONDS)

Buffering

Buffering is usually a poor way to handle different paces of items being emitted and consumed, as it often just delays the problem. However, it can work just fine if there is only a temporary slowdown in one of the consumers. In this case, the items emitted will be stored until later processing and, when the slowdown is over, the consumers will catch up. If the consumers cannot catch up, at some point the buffer will run out and we will see very similar behavior to the original Observable, with memory running out. Enabling buffers is, again, pretty straightforward, by calling the following:

observable.toFlowable(BackpressureStrategy.BUFFER)

or

observable.toFlowable(BackpressureStrategy.MISSING)
  .onBackpressureBuffer()

If there is a need to specify a particular value for the buffer, one can use .buffer():

observable.toFlowable(BackpressureStrategy.MISSING)
  .buffer(10)

Completable, Single, and Maybe Types

Besides the Observable and Flowable types, there are three more types that RxJava provides:

- Completable: It represents an action without a result that will be completed in the future
- Single: It's just like an Observable (or Flowable) that returns a single item instead of a stream
- Maybe: It stands for an action that can complete (or fail) without returning any value (like Completable) but can also return an item (like Single)

However, all these are used quite rarely. Let's take a quick look at the examples.

Completable

Since Completable can basically process just two types of actions, onComplete and onError, we will cover it very briefly. Completable has many static factory methods available to create it but, most often, it will just be found as a return value in some other libraries.
For example, a Completable can be created by calling the following:

Completable completable = Completable.fromAction(() -> {
  log("Let's do something");
});

Then, it can be subscribed to with the following:

completable.subscribe(() -> {
  log("Finished");
}, throwable -> {
  log(throwable);
});

Single

Single provides a way to represent an Observable that will return just a single item (thus the name). You might ask why it is worth having it at all. These types are useful to tell developers about the specific behavior that they should expect. To create a Single, one can use this example:

Single.just("One item")

The Single and the Subscription to it can be created with the following:

Single.just("One item")
  .subscribe((item) -> {
    log(item);
  }, (throwable) -> {
    log(throwable);
  });

Note that this differs from Completable in that the first argument to the .subscribe() action now expects to receive an item as a result.

Maybe

Finally, the Maybe type is very similar to the Single type, but the item might not be returned to the subscriber in the end. The Maybe type can be created in a very similar fashion as before:

Maybe.empty();

or like:

Maybe.just("Item");

However, .subscribe() can be called with arguments dedicated to handling onSuccess (for received items), onError (to handle errors), and onComplete (to do a final action after the item is handled):

Maybe.just("Item")
  .subscribe(
    s -> log("success: " + s),
    throwable -> log("error"),
    () -> log("onComplete")
  );

Summary

In this article, we covered the most essential parts of RxJava.

Resources for Article:

Further resources on this subject:

- The Art of Android Development Using Android Studio [article]
- Drawing and Drawables in Android Canvas [article]
- Optimizing Games for Android [article]

Introduction to NFRs

Packt
20 Jun 2017
14 min read
In this article by Sameer Paradkar, the author of the book Mastering Non-Functional Requirements, we will learn that non-functional requirements (NFRs) are those aspects of the IT system that, while not directly affecting the business functionality of the application, have a profound impact on the efficiency and effectiveness of business systems for end users as well as for the people responsible for supporting the program. The definition of these requirements is an essential factor in developing a total customer solution that delivers business goals. Non-functional requirements are used primarily to drive the operational aspects of the architecture, in other words, to address major operational and technical areas of the system to ensure the robustness and ruggedness of the application. A benchmark or proof of concept can be used to verify whether the implementation meets these requirements or to indicate whether a corrective action is necessary. Ideally, a series of tests should be planned that maps to the development schedule and grows in complexity. The topics that are covered in this article are as follows:

- Definition of NFRs
- NFR KPIs and metrics

(For more resources related to this topic, see here.)

Introducing NFRs

The following pointers state the purpose of NFRs:

- To define requirements and constraints on the IT system
- As a basis for cost estimates and early system sizing
- To assess the viability of the proposed IT system
- NFRs are an important determining factor of the architecture and design of the operational models
- As a guideline for the design phase to meet NFRs such as performance, scalability, and availability

The NFRs for each of the domains, for example scalability, availability, and so on, must be understood to facilitate the design and development of the target operating model. These include the servers, networks, and platforms, including the application runtime environments. They are critical for the execution of benchmark tests. They also affect the design of technical and application components. End users have expectations about the effectiveness of the application. These characteristics include ease of software use, speed, reliability, and recoverability when unexpected conditions arise. The NFRs define these aspects of the IT system. Non-functional requirements should be defined precisely, which involves quantifying them. NFRs should provide measurements the application must meet. For example, the maximum amount of time allowed to execute a process, the number of hours in a day an application must be available, the maximum size of a database on disk, and the number of concurrent users supported are typical NFRs the software must implement.

Figure 1: Key Non-Functional Requirements

There are many kinds of non-functional requirements, including the following:

Performance

Performance is the responsiveness of the application when performing specific actions in a given time span. Performance is measured in terms of throughput or latency. Latency is the time taken by the application to respond to an event. Throughput is the number of events processed in a given time interval. An application's performance can directly impact its scalability, and enhancing an application's performance often enhances scalability by reducing contention for shared resources. Performance attributes specify the timing characteristics of the application. Certain features are more time-sensitive than others; the NFRs should identify those software tasks that have constraints on their performance.
Response time relates to the time needed to complete specific business processes, batch or interactive, within the target business system. The system must be designed to fulfil the agreed response time requirements, while supporting the defined workload mapped against the given static baseline, on a system platform that does not exceed the stated utilization. The key attributes are:

- Throughput: The ability of the system to execute a given number of transactions within a given unit of time
- Response time: The distribution of time the system takes to respond to a request

Scalability

Scalability is the ability to handle an increase in the workload without impacting performance, or the ability to quickly expand the architecture. It is the ability to expand the architecture to accommodate more users, more processes, more transactions, and additional systems and services as the business requirements change and the systems evolve to meet future business demands. This permits existing systems to be extended without replacing them, and it directly affects the architecture and the selection of software components and hardware. The solution must allow the hardware and the deployed software services and components to be scaled horizontally as well as vertically. Horizontal scaling involves replicating the same functionality across additional nodes; vertical scaling involves the same functionality across bigger and more powerful nodes. Scalability definitions measure the volumes of users and data the system should support. There are two key techniques for improving both vertical and horizontal scalability. Vertical scaling, also known as scaling up, includes adding more resources such as memory, CPU, and hard disk to a system. Horizontal scaling, also known as scaling out, includes adding more nodes to a cluster for workload sharing. The key attributes are:

- Throughput: The maximum number of transactions the system needs to handle, for example, a thousand a day or a million
- Storage: The amount of data you are going to need to store
- Growth requirements: Data growth in the next three to five years

Availability

Availability is the time frame in which the system functions normally and without failures. Availability is measured as the percentage of total application downtime over a defined time period. Availability is affected by failures, exceptions, infrastructure issues, malicious attacks, and maintenance and upgrades. It is the uptime, or the amount of time the system is operational and available for use. This is specified because some systems are architected with expected downtime for activities like database upgrades and backups. Availability also conveys the number of hours or days per week, or weeks per year, the application will be available to its end customers, as well as how rapidly it can recover from faults. Since the architecture establishes software, hardware, and networking entities, this requirement extends to all of them. Hardware availability, recoverability, and reliability definitions measure system uptime; for example, availability is often specified in terms of the mean time between failures (MTBF). A simple worked example follows the list of attributes below. The key attributes are:

- Availability: Application availability considering weekends, holidays, maintenance times, and failures
- Locations of operation: Geographic location, connection requirements, and the restrictions of the network prevail
- Offline requirement: Time available for offline operations, including batch processing and system maintenance
- Length of time between failures
- Recoverability: The time required for the system to resume operation in the event of failure
- Resilience: The reliability characteristics of the system and its sub-components
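To make these availability measures concrete, a commonly used industry formula (not specific to this book; the figures below are illustrative) estimates steady-state availability from MTBF and the mean time to repair (MTTR):

Availability = MTBF / (MTBF + MTTR)

For instance, an MTBF of 1,000 hours and an MTTR of 1 hour give an availability of 1000 / 1001, or roughly 99.9 percent, which corresponds to about 8.76 hours of downtime over a year.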
Capacity

This non-functional requirement defines the ways in which the system is expected to scale up by increasing capacity, upgrading hardware, or adding machines, based on business objectives. Capacity is about delivering enough functionality for the end users. A request for a web service to provide 1,000 requests per second when the server is only capable of 100 requests a second may not succeed. While this sounds like an availability issue, it occurs because the server is unable to handle the requisite capacity. A single node may not be able to provide enough capacity, and one needs to deploy multiple nodes with a similar configuration to meet organizational capacity requirements. The capacity to identify a failing node and restart it on another machine or VM is also a non-functional requirement. The key attributes are:

- Throughput: The number of peak transactions the system needs to handle
- Storage: The volume of data the system can persist at run time to disk; relates to memory and disk
- Year-on-year growth requirements (users, processing, and storage)
- E-channel growth projections
- Different types of things (for example, activities or transactions supported, and so on)
- For each type of transaction, volumes on an hourly, daily, weekly, monthly basis, and so on
- Whether volumes are significantly higher during specific times of the day (for example, at lunch), week, month, or year
- Expected transaction volume growth and the additional volumes you will be able to handle

Security

Security is the ability of an application to avoid malicious incidents and events outside of the designed system usage, and to prevent disclosure or loss of information. Improving security increases the reliability of the application by reducing the likelihood of an attack succeeding and impairing operations. Adding security controls protects assets and prevents unauthorized access to and manipulation of critical information. The factors that affect application security are confidentiality and integrity. The key security controls used to secure systems are authorization, authentication, encryption, auditing, and logging. Definition and monitoring of effectiveness in meeting the security requirements of the system, for example, to avoid financial harm in accounting systems, is critical. Integrity requirements restrict access to functionality or data to certain users and protect the privacy of data entered into the software. The key attributes are:

- Authentication: Correct identification of parties attempting to access systems and protection of systems from unauthorized parties
- Authorization: The mechanism required to authorize users to perform different functions within the systems
- Encryption (data at rest or data in flight): All external communications between the data server and clients must be encrypted
- Data confidentiality: All data must be protectively marked, stored, and protected
- Compliance: The process to confirm systems' compliance with the organization's security standards and policies

Maintainability

Maintainability is the ability of any application to go through modifications and updates with a degree of ease. It is the degree of flexibility with which the application can be modified, whether for bug fixes or to update functionality. These changes may impact any of the components, services, functionality, or interfaces in the application landscape when modifying the application to fix errors or to meet changing business requirements. It is also the degree of time it takes to restore the system to its normal state following a failure or fault. Improving maintainability can improve availability and reduce run-time defects. An application's maintainability is dependent on its overall quality attributes. It is critical, as a large chunk of the IT budget is spent on the maintenance of systems; the more maintainable a system is, the lower the total cost of ownership. The key attributes are:

- Conformance to design standards, coding standards, best practices, reference architectures, and frameworks
- Flexibility: The degree to which the system is intended to support change
- Release support: The way in which the system supports the introduction of the initial release, phased rollouts, and future releases

Manageability

Manageability is the ease with which administrators can manage the application, through useful instrumentation exposed for monitoring. It is the ability of the system, or the group of systems, to provide key information to the operations and support team so that they can debug, analyze, and understand the root cause of failures. It also deals with compliance and governance against the relevant frameworks and policies. The key is to design an application that is easy to manage, by exposing useful instrumentation for monitoring systems and for understanding the cause of failures. The key attributes are:

- The system must maintain total traceability of transactions
- Business objects and database fields are part of auditing
- User and transactional timestamps
- File characteristics, including size before, size after, and structure
- Getting events and alerts as thresholds (for example, memory, storage, processor) are breached
- Remotely managing applications and creating new virtual instances at the click of a button
- A rich graphical dashboard for all key application metrics and KPIs

Reliability

Reliability is the ability of the application to maintain its integrity and veracity over a time span, and also in the event of faults or exceptions. It is measured as the probability that the software will not fail and that it will continue functioning for a defined time interval. It also specifies the ability of the system to maintain its performance over a time span. Unreliable software is prone to failures, and a few processes may be more sensitive to failure than others because such processes may not be able to recover from a fault or exception. The key attributes are:

- The characteristic of a system to perform its functions under stated conditions for a specific period of time
- Mean Time To Recovery: The time available to get the system back up online
- Mean Time Between Failures: The acceptable threshold for downtime
- Data integrity, also known as referential integrity, in database tables and interfaces
- Application integrity and information integrity during transactions
- Fault trapping (I/O): Handling failures and recovery

Extensibility

Extensibility is the ability of a system to cater to future changes through a flexible architecture, design, or implementation. Extensible applications have excellent endurance, which prevents the expensive process of procuring large, inflexible applications and retiring them due to changes in business needs. Extensibility enables organizations to take advantage of opportunities and respond to risks.
While there is a significant difference, extensibility is often tangled with the modifiability quality. Modifiability means that it is possible to change the software, whereas extensibility means that change has been planned for and will be effortless. Adaptability is at times erroneously conflated with extensibility; however, adaptability deals with how user interactions with the system are managed and governed. Extensibility allows a system, people, technology, information, and processes, all working together, to achieve the following objectives:

- Handle new information types
- Manage new or changed business entities
- Consume or provide new feeds

Recovery

In the event of a natural calamity, for example a flood or hurricane, the entire facility where the application is hosted may become inoperable or inaccessible. Business-critical applications should have a strategy to recover from such disasters within a reasonable time frame. The solution implementing the various processes must be integrated with the existing enterprise disaster recovery plan. The processes must be analysed to understand the criticality of each process to the business and the impact of its loss in case of non-availability. Based on this analysis, appropriate disaster procedures must be developed and plans should be outlined. As part of disaster recovery, electronic backups of data and procedures must be maintained at the recovery location and be retrievable within the appropriate time frames for system function restoration. In the case of high criticality, real-time mirroring to a mirror site should be deployed. The key attributes are:

- Recovery process: Recovery Time Objectives (RTO) / Recovery Point Objectives (RPO)
- Restore time: The time required to switch to the secondary site when the primary fails
- RPO/Backup time: The time it takes to back up your data
- Backup frequencies: The frequency of backing up the transaction data, configuration data, and code

Interoperability

Interoperability is the ability to exchange information and communicate with internal and external applications and systems. Interoperable systems make it easier to exchange information both internally and externally. Data formats, transport protocols, and interfaces are the key attributes for architecting interoperable systems, and their standardization is the key aspect to be considered when architecting an interoperable system. Interoperability is achieved through:

- Publishing and describing interfaces
- Describing the syntax used to communicate
- Describing the semantics of the information the system produces and consumes
- Leveraging open standards to communicate with external systems
- Loose coupling with external systems

The key attributes are:

- Compatibility with shared applications: Other systems it needs to integrate with
- Compatibility with third-party applications: Other systems it has to live with amicably
- Compatibility with various OS: Different OS compatibility
- Compatibility on different platforms: Hardware platforms it needs to work on

Usability

Usability measures characteristics such as consistency and aesthetics in the user interface. Consistency is the constant use of mechanisms employed in the user interface, while aesthetics refers to the artistic, visual quality of the user interface. It is the ease with which users operate the system and make productive use of it.
Usability is usually discussed in relation to the system interfaces, but it can just as well be applied to any tool, device, or rich system. It addresses the factors that establish the ability of the software to be understood, used, and learned by its intended users. The application interfaces must be designed with end users in mind so that they are intuitive to use, are localized, provide access for differently abled users, and provide an excellent overall user experience. The key attributes are:

- Look and feel standards: Layout and flow, screen element density, keyboard shortcuts, UI metaphors, colors
- Localization/internationalization requirements: Keyboards, paper sizes, languages, spellings, and so on

Summary

This article introduced NFRs and explained why they are critical for building software systems. It also covered the key KPIs for each of the key NFRs, such as scalability, availability, reliability, and so on.

Resources for Article:

Further resources on this subject:

- Software Documentation with Trac [article]
- The Software Task Management Tool - Rake [article]
- Installing Software and Updates [article]

Monitoring, Logging, and Troubleshooting

Packt
20 Jun 2017
6 min read
In this article by Gigi Sayfan, the author of the book Mastering Kubernetes, we will learn how to monitor Kubernetes with Heapster. (For more resources related to this topic, see here.)

Monitoring Kubernetes with Heapster

Heapster is a Kubernetes project that provides a robust monitoring solution for Kubernetes clusters. It runs as a pod (of course), so it can be managed by Kubernetes itself. Heapster supports Kubernetes and CoreOS clusters, and it has a very modular and flexible design. Heapster collects both operational metrics and events from every node in the cluster, stores them in a persistent backend (with a well-defined schema), and allows visualization and programmatic access. Heapster can be configured to use different backends (or sinks, in Heapster's parlance) and their corresponding visualization frontends. The most common combination is InfluxDB as the backend and Grafana as the frontend. The Google Cloud platform integrates Heapster with the Google monitoring service. There are many other, less common backends, such as the following:

- Log
- InfluxDB
- Google Cloud monitoring
- Google Cloud logging
- Hawkular-Metrics (metrics only)
- OpenTSDB
- Monasca (metrics only)
- Kafka (metrics only)
- Riemann (metrics only)
- Elasticsearch

You can use multiple backends by specifying sinks on the command line:

--sink=log --sink=influxdb:http://monitoring-influxdb:80/

cAdvisor

cAdvisor is part of the kubelet, which runs on every node. It collects information about the CPU/core usage, memory, network, and file systems of each container. It provides a basic UI on port 4194 but, most importantly for Heapster, it provides all this information through the kubelet. Heapster records the information collected by cAdvisor on each node and stores it in its backend for analysis and visualization. The cAdvisor UI is useful if you want to quickly verify that a particular node is set up correctly, for example, while creating a new cluster when Heapster is not hooked up yet.

InfluxDB backend

InfluxDB is a modern and robust distributed time-series database. It is very well suited to, and used broadly for, centralized metrics and logging. It is also the preferred Heapster backend (outside the Google Cloud platform). The only caveat is that InfluxDB clustering and high availability are part of the enterprise offering.

The storage schema

The InfluxDB storage schema defines the information that Heapster stores in InfluxDB and that is available for querying and graphing later. The metrics are divided into multiple categories, called measurements. You can treat and query each metric separately, or you can query a whole category as one measurement and receive the individual metrics as fields. The naming convention is <category>/<metrics name> (except for uptime, which has a single metric). If you have a SQL background, you can think of measurements as tables. Each metric is stored per container.
Each metric is labeled with the following information: pod_id – Unique ID of a pod pod_name – User-provided name of a pod pod_namespace – The namespace of a pod container_base_image – Base image for the container container_name – User-provided name of the container or full cgroup name for system containers host_id – Cloud-provider-specified or user-specified identifier of a node hostname – Hostname where the container ran labels – Comma-separated list of user-provided labels; format is key:value namespace_id – UID of the namespace of a pod resource_id – A unique identifier used to differentiate multiple metrics of the same type, for example, FS partitions under filesystem/usage Here are all the metrics grouped by category. As you can see, it is quite extensive. CPU cpu/limit – CPU hard limit in millicores cpu/node_capacity – CPU capacity of a node cpu/node_allocatable – CPU allocatable of a node cpu/node_reservation – Share of CPU that is reserved on the node allocatable cpu/node_utilization – CPU utilization as a share of node allocatable cpu/request – CPU request (the guaranteed amount of resources) in millicores cpu/usage – Cumulative CPU usage on all cores cpu/usage_rate – CPU usage on all cores in millicores File system filesystem/usage – Total number of bytes consumed on a filesystem filesystem/limit – The total size of the filesystem in bytes filesystem/available – The number of available bytes remaining in the filesystem Memory memory/limit – Memory hard limit in bytes memory/major_page_faults – Number of major page faults memory/major_page_faults_rate – Number of major page faults per second memory/node_capacity – Memory capacity of a node memory/node_allocatable – Memory allocatable of a node memory/node_reservation – Share of memory that is reserved on the node allocatable memory/node_utilization – Memory utilization as a share of memory allocatable memory/page_faults – Number of page faults memory/page_faults_rate – Number of page faults per second memory/request – Memory request (the guaranteed amount of resources) in bytes memory/usage – Total memory usage memory/working_set – Total working set usage; working set is the memory being used and not easily dropped by the kernel Network network/rx – Cumulative number of bytes received over the network network/rx_errors – Cumulative number of errors while receiving over the network network/rx_errors_rate – Number of errors per second while receiving over the network network/rx_rate – Number of bytes received over the network per second network/tx – Cumulative number of bytes sent over the network network/tx_errors – Cumulative number of errors while sending over the network network/tx_errors_rate – Number of errors while sending over the network network/tx_rate – Number of bytes sent over the network per second Uptime uptime – Number of milliseconds since the container was started You can work with InfluxDB directly if you're familiar with it. You can either connect to it using its own API or use its web interface. Type the following command to find its port: k describe service monitoring-influxdb --namespace=kube-system | grep NodePort Type: NodePort NodePort: http 32699/TCP NodePort: api 30020/TCP Now you can browse to the InfluxDB web interface using the HTTP port. You'll need to configure it to point to the API port. The username and password are root and root by default: Once you're set up, you can select what database to use (see top-right corner). The Kubernetes database is called k8s.
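As a quick, hedged illustration of what a raw query against that k8s database might look like, here is a small InfluxQL sketch; the cpu/usage_rate measurement, the value field, and the pod_name tag follow the schema described above, but the exact names can vary with your Heapster version and retention policy:
-- Average CPU usage rate (millicores) per pod over the last hour.
-- Assumes Heapster's default InfluxDB schema; adjust names to your deployment.
SELECT MEAN("value") FROM "cpu/usage_rate" WHERE time > now() - 1h GROUP BY "pod_name"
A query along these lines can be pasted into the web interface once the k8s database is selected, or sent through the HTTP API.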
You can now query the metrics using the InfluxDB query language. Grafana visualization Grafana runs in its own container and serves a sophisticated dashboard that works well with InfluxDB as a data source. To locate the port, type the following command: k describe service monitoring-grafana --namespace=kube-system | grep NodePort Type: NodePort NodePort: <unset> 30763/TCP Now you can access the Grafana web interface on that port. The first thing you need to do is set up the data source to point to the InfluxDB backend: Make sure to test the connection and then go explore the various options in the dashboards. There are several default dashboards, but you should be able to customize them to your preferences. Grafana is designed to let you adapt it to your needs. Summary In this article we have learned how to monitor Kubernetes with Heapster.  Resources for Article: Further resources on this subject: The Microsoft Azure Stack Architecture [article] Building A Recommendation System with Azure [article] Setting up a Kubernetes Cluster [article]

Manipulating functions in functional programming

Packt
20 Jun 2017
6 min read
In this article by Wisnu Anggoro, author of the book Learning C++ Functional Programming, you will learn to apply functional programming techniques to C++ to build highly modular, testable, and reusable code. In this article, you will learn the following topics: Applying a first-class function in all functions Passing a function as another function's parameter Assigning a function to a variable Storing a function in the container (For more resources related to this topic, see here.) Applying a first-class function in all functions There's nothing special about the first-class function itself: we can treat the first-class function like any other data type. However, in a language that supports the first-class function, we can perform the following tasks without invoking the compiler recursively: Passing a function as another function's parameter Assigning a function to a variable Storing functions in collections Fortunately, C++ can be used to perform the preceding tasks. We will discuss them in depth in the following topics. Passing a function as another function's parameter Let's start by passing a function as a function parameter. We will choose one of four functions and invoke the function from the main function. The code will look as follows: /* first-class-1.cpp */ #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; int addition(int x, int y) { return x + y; } int subtraction(int x, int y) { return x - y; } int multiplication(int x, int y) { return x * y; } int division(int x, int y) { return x / y; } void PassingFunc(FuncType fn, int x, int y) { cout << "Result = " << fn(x, y) << endl; } int main() { int i, a, b; FuncType func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; } return 0; } From the preceding code, we can see that we have four functions, and we want the user to choose one of them and then run it. In the switch statement, we will invoke one of the four functions based on the choice of the user. We will pass the selected function to PassingFunc(), as we can see in the following code snippet: case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; The result we get on the screen should look like the following screenshot: The preceding screenshot shows that we selected the Subtraction mode and gave 8 to a and 3 to b. As we expected, the code gives us 5 as a result. Assigning a function to a variable We can also assign a function to a variable so that we can call the function by calling the variable. We will refactor first-class-1.cpp, and it will be as follows: /* first-class-2.cpp */ #include <functional> #include <iostream> using namespace std; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4.
Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; } cout << "Result = " << func(a, b) << endl; return 0; } We will now assign the four functions based on the user choice. We will store the selected function in func variable inside the switch statement, as follows: case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; After the func variable is assigned with the user's choice, the code will just call the variable like calling the function as follows: cout << "Result = " << func(a, b) << endl; Moreover, we will obtain the same output on the console if we run the code. Storing a function in the container Now, let's save the function to the container. Here, we will use the vector as the container. The code is as follows: /* first-class-3.cpp */ #include <vector> #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { vector<FuncType> functions; functions.push_back(addition); functions.push_back(subtraction); functions.push_back(multiplication); functions.push_back(division); int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; cout << "Result = " << functions.at(i - 1)(a, b) << endl; return 0; } From the preceding code, we can see that we created a new vector named functions, then stored four different functions to it. Same with our two previous code samples, we ask the user to select the mode as well. However, now the code becomes simpler since we don't need to add the switch statement as we can select the function directly by selecting the vector index, as we can see in the following line of code: cout << "Result = " << functions.at(i - 1)(a, b) << endl; However, since the vector is a zero-based index, we have to adjust the index with the menu choice. The result will be the same with our two previous code samples. Summary In this article, we discussed that there are some techniques to manipulate a function to produce the various purpose on it. Since we can implement the first-class function in C++ language, we can pass a function as other functions parameter. We can treat a function as a data object so we can assign it to a variable and store it in the container. Resources for Article: Further resources on this subject: Introduction to the Functional Programming [article] Functional Programming in C# [article] Putting the Function in Functional Programming [article]

Understanding the Basics of Gulp

Packt
19 Jun 2017
15 min read
In this article written by Travis Maynard, author of the book Getting Started with Gulp - Second Edition, we will take a look at the basics of gulp and how it works. Understanding some of the basic principles and philosophies behind the tool and its plugin system will assist you as you begin writing your own gulpfiles. We'll start by taking a look at the engine behind gulp and then follow up by breaking down the inner workings of gulp itself. By the end of this article, you will be prepared to begin writing your own gulpfile. (For more resources related to this topic, see here.) Installing node.js and npm As you learned in the introduction, node.js and npm are the engines that work behind the scenes to allow us to operate gulp and keep track of any plugins we decide to use. Downloading and installing node.js For Mac and Windows, the installation is quite simple. All you need to do is navigate over to http://nodejs.org and click on the big green install button. Once the installer has finished downloading, run the application and it will install both node.js and npm. For Linux, there are a couple more steps, but don't worry; with your newly acquired command-line skills, it should be relatively simple. To install node.js and npm on Linux, you'll need to run the following three commands in Terminal: sudo add-apt-repository ppa:chris-lea/node.js sudo apt-get update sudo apt-get install nodejs The details of these commands are outside the scope of this book, but just for reference, they add a repository to the list of available packages, update the total list of packages, and then install the application from the repository we added. Verify the installation To confirm that our installation was successful, try the following command in your command line: node -v If node.js is successfully installed, node -v will output a version number on the next line of your command line. Now, let's do the same with npm: npm -v Like before, if your installation was successful, npm -v should output the version number of npm on the next line. The versions displayed in this screenshot reflect the latest Long Term Support (LTS) release currently available as of this writing. This may differ from the version that you have installed depending on when you're reading this. It's always suggested that you use the latest LTS release when possible. The -v flag is a common flag used by most command-line applications to quickly display their version number. This is very useful for debugging version issues while using command-line applications. Creating a package.json file Having npm in our workflow will make installing packages incredibly easy; however, we should look ahead and establish a way to keep track of all the packages (or dependencies) that we use in our projects. Keeping track of dependencies is very important to keep your workflow consistent across development environments. Node.js uses a file named package.json to store information about your project, and npm uses this same file to manage all of the package dependencies your project requires to run properly. In any project using gulp, it is always a great practice to create this file ahead of time so that you can easily populate your dependency list as you are installing packages or plugins.
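For reference, here is a minimal, hypothetical example of what such a package.json might look like once gulp has been added as a development dependency later in this process; the project name, version number, and gulp version shown here are placeholders rather than values taken from this article:
{
  "name": "gulp-book",
  "version": "1.0.0",
  "devDependencies": {
    "gulp": "^3.9.1"
  }
}
You will not need to write this file by hand; the npm init command shown next generates the skeleton for you, and installing packages with the --save-dev flag fills in the devDependencies section automatically.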
To create the package.json file, we will need to run npm's built-in init action using the following command: npm init Now, using the preceding command, the terminal will show the following output: Your command line will prompt you several times asking for basic information about the project, such as the project name, author, and the version number. You can accept the defaults for these fields by simply pressing the Enter key at each prompt. Most of this information is used primarily on the npm website if a developer decides to publish a node.js package. For our purposes, we will just use it to initialize the file so that we can properly add our dependencies as we move forward. The screenshot for the preceding command is as follows: Installing gulp With npm installed and our package.json file created, we are now ready to begin installing node.js packages. The first and most important package we will install is none other than gulp itself. Locating gulp Locating and gathering information about node.js packages is very simple, thanks to the npm registry. The npm registry is a companion website that keeps track of all the published node.js modules, including gulp and gulp plugins. You can find this registry at http://npmjs.org. Take a moment to visit the npm registry and do a quick search for gulp. The listing page for each node.js module will give you detailed information on each project, including the author, version number, and dependencies. Additionally, it also features a small snippet of command-line code that you can use to install the package along with readme information that will outline basic usage of the package and other useful information. Installing gulp locally Before we install gulp, make sure you are in your project's root directory, gulp-book, using the cd and ls commands you learned earlier. If you ever need to brush up on any of the standard commands, feel free to take a moment to step back and review as we progress through the book. To install packages with npm, we will follow a similar pattern to the ones we've used previously. Since we will be covering both versions 3.x and 4.x in this book, we'll demonstrate installing both: For installing gulp 3.x, you can use the following: npm install --save-dev gulp For installing gulp 4.x, you can use the following: npm install --save-dev gulpjs/gulp#4.0 This command is quite different from the 3.x command because this command is installing the latest development release directly from GitHub. Since the 4.x version is still being actively developed, this is the only way to install it at the time of writing this book. Once released, you will be able to run the previous command without installing from GitHub. Upon executing the command, it will result in output similar to the following: To break this down, let's examine each piece of this command to better understand how npm works: npm: This is the application we are running install: This is the action that we want the program to run. In this case, we are instructing npm to install something in our local folder --save-dev: This is a flag that tells npm to add this module to the dev dependencies list in our package.json file gulp: This is the package we would like to install Additionally, npm has a --save flag that saves the module to the list of dependencies instead of devDependencies. These dependency lists are used to separate the modules that a project depends on to run, and the modules a project depends on when in development.
Since we are using gulp to assist us in development, we will always use the --save-dev flag throughout the book. So, this command will use npm to contact the npm registry, and it will install gulp to our local gulp-book directory. After using this command, you will note that a new folder has been created that is named node_modules. This is where node.js and npm store all of the installed packages and dependencies of your project. Take a look at the following screenshot: Installing gulp-cli globally For many of the packages that we install, this will be all that is needed. With gulp, we must install a companion module gulp-cli globally so that we can use the gulp command from anywhere in our filesystem. To install gulp-cli globally, use the following command: npm install -g gulp-cli In this command, not much has changed compared to the original command where we installed the gulp package locally. We've only added a -g flag to the command, which instructs npm to install the package globally. On Windows, your console window should be opened under an administrator account in order to install an npm package globally. At first, this can be a little confusing, and for many packages it won't apply. Similar build systems actually separate these usages into two different packages that must be installed separately: one that is installed globally for command-line use and another that is installed locally in your project. Gulp was created so that both of these usages could be combined into a single package, and, based on where you install it, it could operate in different ways. Anatomy of a gulpfile Before we can begin writing tasks, we should take a look at the anatomy and structure of a gulpfile. Examining the code of a gulpfile will allow us to better understand what is happening as we run our tasks. Gulp started with four main methods: .task(), .src(), .watch(), and .dest(). The release of version 4.x introduced additional methods such as .series() and .parallel(). In addition to the gulp API methods, each task will also make use of the node.js .pipe() method. This small list of methods is all that is needed to understand how to begin writing basic tasks. They each represent a specific purpose and will act as the building blocks of our gulpfile. The task() method The .task() method is the basic wrapper with which we create our tasks. Its syntax is .task(string, function). It takes two arguments—a string value representing the name of the task and a function that will contain the code you wish to execute upon running that task. The src() method The .src() method is our input or how we gain access to the source files that we plan on modifying. It accepts either a single glob string or an array of glob strings as an argument. Globs are a pattern that we can use to make our paths more dynamic. When using globs, we can match an entire set of files with a single string using wildcard characters as opposed to listing them all separately. The syntax for this method is .src(string || array). The watch() method The .watch() method is used to specifically look for changes in our files. This will allow us to keep gulp running as we code so that we don't need to rerun gulp any time we need to process our tasks. This syntax is different between the 3.x and 4.x versions. For version 3.x the syntax is—.watch(string || array, array) with the first argument being our paths/globs to watch and the second argument being the array of task names that need to be run when those files change.
For version 4.x the syntax has changed a bit to allow for two new methods that provide more explicit control of the order in which tasks are executed. When using 4.x instead of passing in an array as the second argument, we will use either the .series() or .parallel() method like so—.watch(string || array, gulp.series() || gulp.parallel()). The dest() method The dest() method is used to set the output destination of your processed file. Most often, this will be used to output our data into a build or dist folder that will be either shared as a library or accessed by your application. The syntax for this method is .dest(string). The pipe() method The .pipe() method will allow us to pipe together smaller single-purpose plugins or applications into a pipechain. This is what gives us full control of the order in which we would need to process our files. The syntax for this method is .pipe(function). The parallel() and series() methods The parallel and series methods were added in version 4.x as a way to easily control whether your tasks are run together all at once or in a sequence one after the other. This is important if one of your tasks requires that other tasks complete before it can be ran successfully. When using these methods the arguments will be the string names of your tasks separated by a comma. The syntax for these methods is—.series(tasks) and .parallel(tasks); Understanding these methods will take you far, as these are the core elements of building your tasks. Next, we will need to put these methods together and explain how they all interact with one another to create a gulp task. Including modules/plugins When writing a gulpfile, you will always start by including the modules or plugins you are going to use in your tasks. These can be both gulp plugins or node.js modules, based on what your needs are. Gulp plugins are small node.js applications built for use inside of gulp to provide a single-purpose action that can be chained together to create complex operations for your data. Node.js modules serve a broader purpose and can be used with gulp or independently. Next, we can open our gulpfile.js file and add the following code: // Load Node Modules/Plugins var gulp = require('gulp'); var concat = require('gulp-concat'); var uglify = require('gulp-uglify'); The gulpfile.js file will look as shown in the following screenshot: In this code, we have included gulp and two gulp plugins: gulp-concat and gulp-uglify. As you can now see, including a plugin into your gulpfile is quite easy. After we install each module or plugin using npm, you simply use node.js' require() function and pass it in the name of the module. You then assign it to a new variable so that you can use it throughout your gulpfile. This is node.js' way of handling modularity, and because a gulpfile is essentially a small node.js application, it adopts this practice as well. Writing a task All tasks in gulp share a common structure. Having reviewed the five methods at the beginning of this section, you will already be familiar with most of it. Some tasks might end up being larger than others, but they still follow the same pattern. To better illustrate how they work, let's examine a bare skeleton of a task. This skeleton is the basic bone structure of each task we will be creating. Studying this structure will make it incredibly simple to understand how parts of gulp work together to create a task. 
An example of a sample task is as follows: gulp.task(name, function() { return gulp.src(path) .pipe(plugin) .pipe(plugin) .pipe(gulp.dest(path)); }); In the first line, we use the new gulp variable that we created a moment ago and access the .task() method. This creates a new task in our gulpfile. As you learned earlier, the task method accepts two arguments: a task name as a string and a callback function that will contain the actions we wish to run when this task is executed. Inside the callback function, we reference the gulp variable once more and then use the .src() method to provide the input to our task. As you learned earlier, the source method accepts a path or an array of paths to the files that we wish to process. Next, we have a series of three .pipe() methods. In each of these pipe methods, we will specify which plugin we would like to use. This grouping of pipes is what we call our pipechain. The data that we have provided gulp with in our source method will flow through our pipechain to be modified by each piped plugin that it passes through. The order of the pipe methods is entirely up to you. This gives you a great deal of control in how and when your data is modified. You may have noticed that the final pipe is a bit different. At the end of our pipechain, we have to tell gulp to move our modified file somewhere. This is where the .dest() method comes into play. As we mentioned earlier, the destination method accepts a path that sets the destination of the processed file as it reaches the end of our pipechain. If .src() is our input, then .dest() is our output. Reflection To wrap up, take a moment to look at a finished gulpfile and reflect on the information that we just covered. This is the completed gulpfile that we will be creating from scratch, so don't worry if you still feel lost. This is just an opportunity to recognize the patterns and syntaxes that we have been studying so far. We will begin creating this file step by step: // Load Node Modules/Plugins var gulp = require('gulp'); var concat = require('gulp-concat'); var uglify = require('gulp-uglify'); // Process Styles gulp.task('styles', function() {     return gulp.src('css/*.css')         .pipe(concat('all.css'))         .pipe(gulp.dest('dist/')); }); // Process Scripts gulp.task('scripts', function() {     return gulp.src('js/*.js')         .pipe(concat('all.js'))         .pipe(uglify())         .pipe(gulp.dest('dist/')); }); // Watch Files For Changes gulp.task('watch', function() {     gulp.watch('css/*.css', 'styles');     gulp.watch('js/*.js', 'scripts'); }); // Default Task gulp.task('default', gulp.parallel('styles', 'scripts', 'watch')); The gulpfile.js file will look as follows: Summary In this article, you installed node.js and learned the basics of how to use npm and understood how and why to install gulp both locally and globally. We also covered some of the core differences between the 3.x and 4.x versions of gulp and how they will affect your gulpfiles as we move forward. To wrap up the article, we took a small glimpse into the anatomy of a gulpfile to prepare us for writing our own gulpfiles from scratch. Resources for Article: Further resources on this subject: Performing Task with Gulp [article] Making a Web Server in Node.js [article] Developing Node.js Web Applications [article]

Overview of Important Concepts of Microsoft Dynamics NAV 2016

Packt
19 Jun 2017
15 min read
In this article by Rabindra Sah, author of the book Mastering Microsoft Dynamics NAV 2016, we will cover the important concepts of Microsoft Dynamics NAV 2016. (For more resources related to this topic, see here.) Version control Version control systems are third-party systems that track changes to the files and folders of a system. In this section, we will be discussing two popular version control systems for Microsoft Dynamics NAV. Let's take a look at the web services architecture in Dynamics NAV. Microsoft Dynamics NAV 2016 uses two types of web services: SOAP web services and OData web services. Pages are exposed through both OData and SOAP, queries are exposed only through OData, and codeunits are exposed only through SOAP. The main difference between the two in Microsoft Dynamics NAV is that with SOAP web services, you can publish and reuse business logic, and with OData web services, you can expose data to external applications. In a dataset, you can make certain changes in order to improve the performance of the report. This is not always applicable for all reports; it depends on the nature of the problem, and you should spend time analyzing the problem before fixing it. The following are some of the basic considerations that you should keep in mind when dealing with datasets in NAV reports: Try to avoid the creation of dataset lines if possible; create variables instead Try to reduce the number of rows and columns Apply filters to the request page For slow reports with higher runtime, use the job queue to generate the report on the server Use text constants for the caption, if needed Try to avoid Include Caption for columns with no need for captions Technical upgrade Technical upgrade is the least used upgrade process, which applies when you are making one version upgrade at a time, that is, from Version 2009 to Version 2013 or Version 2013 to Version 2015. So, when you are planning to jump multiple versions at the same time, a technical upgrade might not be the perfect option to choose. It can be efficient when there are minimal changes in the source database objects, that is, fewer customizations. It can also be considered an efficient choice when the business requirement for the product is still the same or has very few significant changes. Upgrading estimates In this section, we are going to look at the core components that are responsible for the estimates of the upgrade process. The components that are to be considered while estimating for the upgrade process are as follows: Code upgrade Object transformation Data upgrade Testing and implementation Code upgrade The best method to estimate the code upgrade is to use a file compare tool. Such tools help with file and folder comparison, version control, conflict resolution, automatic intelligent merging, in-place editing of files, change tracking, and code analysis. You can also design your own compare tool if you want: for example, take two versions of the same object, say two versions of the Customer table. Open them in Notepad and check line by line whether there is any difference in the line, and then get that line value and show it as a log. You can achieve this via C# or any programming language. This should run for each object in the two versions of the NAV system and provide you with statistics on the amount of changes: This can be really handy when it comes to estimating the code changes. You can also do it manually if the number of objects is small.
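As a rough illustration of the idea just described, here is a hedged C# sketch of such a line-by-line compare over two exported object text files; the file paths and the way differences are logged are purely hypothetical and would need to be adapted to your own export format and object list:
using System;
using System.IO;

class ObjectCompare
{
    static void Main()
    {
        // Hypothetical paths to the same object exported from two NAV versions.
        string[] oldLines = File.ReadAllLines(@"C:\Export\Old\Customer.txt");
        string[] newLines = File.ReadAllLines(@"C:\Export\New\Customer.txt");

        int max = Math.Max(oldLines.Length, newLines.Length);
        int changed = 0;

        for (int i = 0; i < max; i++)
        {
            // Treat a missing line in either file as an empty line.
            string oldLine = i < oldLines.Length ? oldLines[i] : string.Empty;
            string newLine = i < newLines.Length ? newLines[i] : string.Empty;

            if (oldLine != newLine)
            {
                changed++;
                Console.WriteLine($"Line {i + 1} differs: {newLine}");
            }
        }

        Console.WriteLine($"Total changed lines: {changed}");
    }
}
Running a sketch like this over every exported object gives a crude count of changed lines per object, which is usually enough to rank objects by expected upgrade effort.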
It is recommended to use the Microsoft Dynamics Sure Step methodology while carrying out any upgrade project. Dynamics Sure Step is a full life cycle methodology tool designed to provide discipline and best practices for upgrading, migrating, configuring, and deploying a Microsoft Dynamics NAV solution. Object transformation We must take a close look at objects that cannot be upgraded directly. For example, if your source database reports are in the classic version or an early RTC version, it might not be feasible to transform them into the latest reports because of the huge technological gap between the report types. In these cases, you must be very careful while estimating for these upgrades. For example, TransHeader and TransFooter and other categorizations that are present in classic reports are hard to map directly into Dynamics NAV 2016 reports. We might have to develop our own logic to achieve these grouping values, which might take some additional time. So, always treat this section as a customization instead of an upgrade. Mostly, Microsoft partner vendors keep this section separate and, in most cases, separate resources are assigned to do that in parallel work environments. Reports can also have Word layouts, which should also be considered during the estimates. Data upgrade We perform a number of distinct steps while upgrading data. You must consider the time for each section in the data upgrade process in order to correctly estimate the time for it: The first thing that we do is a trial data upgrade. This allows us to analyze different aspects, such as how long it takes and whether the data upgrade process works, and it lets us test the results of this trial data upgrade. We might need to repeat the trial data upgrade a number of times before it works. Then, we can do a preproduction data upgrade: since the moment we started our analysis and development, the production data might have changed, so we also need a preproduction run to have a closer estimate of the time window that we will have available when we do the real implementation. Acceptance testing is also a very important phase. Once you have done the data upgrade, you need the end users or key users to confirm that the data has been converted correctly. And then you are ready to perform the live data upgrade. All of these different phases in the data upgrade will also require some time. The amount of time will also depend on the size of the database and the version that you are starting from. So, this gives you an overview of the different pillars that are important to estimate how much time it might take to prepare and analyze the upgrade project. Software as a Service Software as a Service is a cloud services delivery model, which offers an on-demand online software subscription. The latest SaaS release from Microsoft is Dynamics 365 (previously known as Project Madeira). The following image illustrates the SaaS taxonomy.
Here you can clearly see different services such as Salesforce, NetSuite, and QuickBooks, which are distributed as SaaS: Software as a Service taxonomy Understanding the PowerShell cmdlets We can categorize the PowerShell commands into five major categories of use: Commands for server administrators Commands for implementers for company management Commands for administrators for upgrades Commands for administrators for security Commands for developers Commands for server administrators The first category contains commands that can be used for administrative operations like create, save, remove, get, import, export, set, and the like, as given in the following table: Dismount-NAVTenant New-NAVServerConfiguration Export-NAVApplication New-NAVServerInstance Get-NAVApplication New-NAVWebServerInstance Get-NAVServerConfiguration Remove-NAVApplication Get-NAVServerInstance Remove-NAVServerInstance Get-NAVServerSession Remove-NAVServerSession Get-NAVTenant Save-NAVTenantConfiguration Get-NAVWebServerInstance Set-NAVServerConfiguration Export-NAVServerLicenseInformation Set-NAVServerInstance Import-NAVServerLicense Set-NAVWebServerInstanceConfiguration Mount-NAVApplication Sync-NAVTenant Mount-NAVTenant We can set up web server instances, change configurations, and create a multitenant environment; in fact, a multitenant environment can only be set up through PowerShell. Commands for implementers for company management The second category of commands can be used by implementers, in particular for operations related to installation and configuration of the system. The following are a few examples of this category of commands: Copy-NAVCompany Get-NAVCompany New-NAVCompany Remove-NAVCompany Rename-NAVCompany Commands for administrators for upgrades The third category is a special category for administrators, which is related to upgrade operations: Get-NAVDataUpgrade Resume-NAVDataUpgrade Start-NAVDataUpgrade Stop-NAVDataUpgrade This category of commands can be useful along with the upgrade toolkit. Commands for administrators for security This is one of the most important categories, which is related to the backend of the system. The commands in this category give administrators control over access and permissions. I strongly recommend these make-life-easy commands if you are working on security operations. Commands in this category include the following: Get-NAVServerUser Remove-NAVServerPermission Get-NAVServerUserPermissionSet Remove-NAVServerPermissionSet New-NAVServerPermission Remove-NAVServerUser New-NAVServerPermissionSet Remove-NAVServerUserPermissionSet New-NAVServerUser Set-NAVServerPermission New-NAVServerUserPermissionSet Set-NAVServerPermissionSet Set-NAVServerUser These commands are basically used to add users and to manage permission sets. Commands for developers The last, but not the least, category of commands is dedicated to developers, and it contains some of my most-used commands. It covers a wide range of commands, and should be included in your daily work life.
This set of commands includes the following: Get-NAVWebService Join-NAVApplicationObjectFile Invoke-NAVCodeunit Join-NAVApplicationObjectLanguageFile New-NAVWebService Merge-NAVApplicationObject Remove-NAVWebService Remove-NAVApplicationObjectLanguage Compare-NAVApplicationObject Set-NAVApplicationObjectProperty Export-NAVApplicationObjectLanguage Split-NAVApplicationObjectFile Get-NAVApplicationObjectProperty Split-NAVApplicationObjectLanguageFile Import-NAVApplicationObjectLanguage Test-NAVApplicationObjectLanguage Update-NAVApplicationObject Microsoft Dynamics NAV 2016 Posting preview In Microsoft Dynamics NAV 2016, you can review the entries to be created before you post a document or journal. This is made possible by the introduction of a new feature called Preview Posting, which enables you to preview the impact of a posting against each ledger associated with a document. In every document and journal that can be posted, you can click on Preview Posting to review the different types of entries that will be created when you post the document or journal. Workflow Workflow enables you to model real-life business processes in line with best practices or industry-standard practices. For example, ensuring that a customer credit limit has been independently verified, or that the requirement of two approvers for a payment process has been met. Workflow has these three main capabilities: Approvals Notifications Automation Workflow basically has three components, that is, Event, Condition, and Response. The event defines an occurrence in the system, the On condition specifies the condition under which the event applies, and the Then response is the action that is taken on the basis of that condition. This is shown in the following screenshot: Exception handling Exception handling is a new concept in Microsoft Dynamics NAV. It was imported from .NET, and is now gaining popularity among C/AL programmers because of its effective usage. Like C#, for exception handling, we use the Try function. The Try functions are new additions to the function library, which enable you to handle errors that occur in the application during runtime. Here we are not dealing with compile time issues. For example, the message Error Returned: Divisible by Zero Error is always a critical error, and should be handled in order to be avoided. This also stops the system from entering an unsafe state. Like C# and other rich programming languages, the Try functions in C/AL provide easy-to-understand error messages, which can also be dynamic and directly generated by the system. This feature helps us preplan for those errors and present better, more descriptive errors to the users. You can use the Try functions to catch errors/exceptions that are thrown by Microsoft Dynamics NAV or exceptions that are thrown during .NET Framework interoperability operations. The Try function is in many ways similar to the conditional Codeunit.Run function except for the following points: The database records that are changed because of the Try function cannot be rolled back The Try function calls do not need to be committed to the database Visual Basic programming Visual Basic (VB) is an event-driven programming language. It also comes with an integrated development environment (IDE). If you are familiar with the BASIC programming language, then it will be easy to understand Visual Basic, since it is derived from BASIC.
I will try to provide the basics of this language here, since it is the least discussed topic in the NAV community, but it is essential for all report designers and developers to understand. Here we do not need to understand each and every detail of the VB programming language, but understanding the syntax and structure will help us understand the code that we are going to use in the RDLC report. Example VB code can be written as follows: Public Function BlankZero(ByVal Value As Decimal) If Value = 0 Then Return "" End If Return Value End Function The preceding function, BlankZero, returns an empty string when the value is zero and otherwise just returns the value of the parameter. This is the simplest kind of function that can be found in the code section of an RDLC report. Unlike C/AL, we do not need to end each code line with a semicolon (;): Writing your own Test unit Writing your own Test unit is very important, not just to test your code but also to give you an eagle's-eye view of how your code is actually interacting with the system. It gives your coding meaning, and allows others to understand and relate to your development. Writing a unit test involves basically four steps, as shown in the following diagram: Certificates A certificate is nothing but a token that binds an identity to a cryptographic key. Microsoft Management Console (MMC) is a presentation service for management applications in the Microsoft Windows environment. It is a part of the independent software vendor (ISV) extensible service, that is, it provides a common integrated environment for snap-ins provided by Microsoft and third-party software vendors. Certificate authority A certification authority (CA) is an entity that issues certificates. If all certificates have a common issuer, then the issuer's public key can be distributed out of band: In the preceding diagram, the certificate server is the third party, which has a secure relationship with both the parties that want to communicate. The CA is connected to both parties through a secure channel. User B sends a copy of his public key to the CA. Then the CA encrypts the public key of User B using a different key. Two files are created because of this: the first is an encrypted package, which is nothing but the certificate, and the second is the digital signature of the certificate server. The certificate server returns the certificate to User B. Now User A asks for a certificate from User B. User B sends a copy of its certificate to User A. This is again done using a secure communication channel. User A decrypts the certificate using the key obtained from the certificate server, and extracts the public key of User B. User A also checks the digital signature of the certificate server to ensure that the certificate is authentic. Here, whatever data is encrypted using the public key of User B can only be decrypted using the private key of User B, which is present only with User B and not with any intruder on the Internet. So only User B can decrypt and read the content sent by User A. Once the keys are transferred, User A can communicate with User B. In case User B wants to send data to User A, User B would need the public key of User A, which will again be obtained through the CA. Certificates are issued to a principal. The issuance policy specifies the principals to which the CA will issue certificates. The certification authority does not need to be online to check the validity of the certificate. It can be kept on a server in a locked room. It is only consulted when a principal needs a certificate.
Certificates are a way of associating an identity with a public key and a distinguished name. Authentication policy for CA The authentication policy defines the way principals prove their identities. Each CA has its own requirements, constrained by contractual requirements such as those with a Primary Certification Authority (PCA): A PCA issues certificates to CAs CAs issue certificates to individuals and organizations All rely on non-electronic proofs of identity, such as biometrics (fingerprints), documents (driver's license or passport), or personal knowledge. A specific authentication policy can be determined by checking the policy of the CA that signed the certificate. Kinds of certificates There are at least four kinds of certificates, which are as follows: Site certificates (for example, www.msdn.microsoft.com). Personal certificates (used if the server wants to authenticate the client). You can install a personal certificate in your browser. Software vendor certificates (used when software is installed). Often, when you run a program, a dialog box appears warning that The publisher could not be verified. Are you sure you want to run this software? This is caused either because the software does not have a software vendor certificate, or because you do not trust the CA who signed the software vendor certificate. Anonymous certificates (used, for example, by a whistle-blower to indicate that the same person sent a sequence of messages, without revealing who that person is). Other types of certificates Certificates can also be based on a principal's association with an organization (such as Microsoft (MSDN)), where the principal lives, or the role played in an organization (such as the comptroller). Summary In this article we covered the important concepts in Dynamics NAV 2016, such as version control, dataset considerations, technical upgrade, Software as a Service, certificates, and so on. Resources for Article: Further resources on this subject: Introduction to Microsoft Dynamics NAV 2016 [article] Customization in Microsoft Dynamics CRM [article] Exploring Microsoft Dynamics NAV – An Introduction [article]

Basics of Python for Absolute Beginners

Packt
19 Jun 2017
5 min read
In this article by Bhaskar Das and Mohit Raj, authors of the book Learn Python in 7 days, we will learn the basics of Python. The Python language had a humble beginning in the late 1980s when a Dutchman, Guido van Rossum, started working on a fun project at Centrum Wiskunde and Informatica that would be a successor to the ABC language, with better exception handling and the capability to interface with the Amoeba OS. It first appeared in 1991. Python 2.0 was released in the year 2000 and Python 3.0 was released in the year 2008. The language was named Python after the famous British television comedy show Monty Python's Flying Circus, which was one of the favorite television programmes of Guido. Here, we will see why Python has suddenly influenced our lives, various applications that use Python, and Python's implementations. In this article, you will learn the basic installation steps to perform on different platforms (that is, Windows, Linux, and Mac), about environment variables and setting them up, file formats, the Python interactive shell, basic syntax, and, finally, printing formatted output. (For more resources related to this topic, see here.) Why Python? Now you might be suddenly bogged down with the question, why Python? According to the Institute of Electrical and Electronics Engineers (IEEE) 2016 ranking, Python ranked third after C and Java. As per Indeed.com's data for 2016, Python ranked fifth in job market searches. Clearly, all the data points to the ever-rising demand in the job market for Python. It's a cool language if you want to learn it just for fun. Also, you will adore the language if you want to build your career around Python. At the school level, many schools have started including Python programming for kids. With new technologies taking the market by storm, Python has been playing a dominant role. Whether it's cloud platforms, mobile app development, big data, IoT with Raspberry Pi, or the new blockchain technology, Python is being seen as a niche language platform to develop and deliver scalable and robust applications. Some key features of the language are: Python programs can run on any platform; you can carry code created on a Windows machine and run it on Mac or Linux Python has a large inbuilt library with prebuilt and portable functionality, known as the standard library Python is an expressive language Python is free and open source Python code is about one third of the size of equivalent C++ and Java code. Python is both dynamically and strongly typed Being dynamically typed means that the type of a variable is interpreted at runtime, so there is no need to declare the type (int, float) of a variable in Python Python applications One of the most famous platforms where Python is extensively used is YouTube. Other places where you will find Python being extensively used are special effects in Hollywood movies, drug evolution and discovery, traffic control systems, ERP systems, cloud hosting, e-commerce platforms, CRM systems, and whichever field you can think of. Versions At the time of writing this book, the two main versions of the Python programming language available in the market were Python 2.x and Python 3.x. The stable releases at the time of writing this book were Python 2.7.13 and Python 3.6.0. Implementations of Python Major implementations include CPython, Jython, IronPython, MicroPython and PyPy.
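Before moving on to installation, here is a tiny, hedged example of the dynamic-but-strong typing mentioned above; the variable names are purely illustrative:
# Dynamic typing: the same name can be rebound to values of different types.
value = 42        # value starts as an int
value = "hello"   # now value is a str; no type declaration is needed

# Strong typing: Python refuses to silently mix incompatible types.
try:
    result = value + 3
except TypeError as error:
    print("Strong typing at work:", error)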
Installation Here, we will look at the installation of Python on three different OS platforms, namely Windows, Linux, and Mac OS. Let's begin with the Windows platform. Installation on Windows platform Python 2.x can be downloaded from https://www.python.org/downloads. The installer is simple and easy to use. Follow these steps to run the setup: Once you click on the setup installer, you will get a small window on your desktop screen as shown. Click on Next: Provide a suitable installation folder to install Python. If you don't provide the installation folder, then the installer will automatically create an installation folder for you, as shown in the following screenshot. Click on Next: After the completion of Step 2, you will get a window to customize Python as shown in the following screenshot. Note that the Add python.exe to Path option has been marked with an x. Select this option to add Python to the system path variable. Click on Next: Finally, click Finish to complete the installation: Summary So far, we walked through the beginnings and brief history of Python. We looked at the various implementations and flavors of Python. You also learned about installing on Windows OS. We hope this article has sparked enough interest in Python and serves as your first step into the kingdom of Python, with its enormous possibilities! Resources for Article: Further resources on this subject: Layout Management for Python GUI [article] Putting the Fun in Functional Python [article] Basics of Jupyter Notebook and Python [article]

Lambda Architecture Pattern

Packt
19 Jun 2017
8 min read
In this article by Tomcy John and Pankaj Misra, the authors of the book Data Lake For Enterprises, we will learn how data in the landscape of big data solutions can be processed in near real time, and certain practices that can be adopted for realizing the Lambda Architecture in the context of a data lake. (For more resources related to this topic, see here.) The concept of a data lake in an enterprise was driven by certain challenges that enterprises were facing with the way data was handled, processed, and stored. Initially, all the individual applications in the enterprise, via a natural evolution cycle, started maintaining huge amounts of data within themselves with almost no reuse by other applications in the same enterprise. This created information silos across various applications. As the next step of evolution, these individual applications started exposing this data across the organization as a data mart access layer over a central data warehouse. While data marts solved one part of the problem, other problems still persisted. These problems were more about data governance, data ownership, and data accessibility, which needed to be resolved so as to have better availability of enterprise-relevant data. This is where a need was felt to have data lakes, which could not only make such data available but also store any form of data and process it so that the data is analyzed and kept ready for consumption by consumer applications. In this article, we will look at some of the critical aspects of a data lake and understand why it matters for an enterprise. If we need to define the term data lake, it can be defined as a vast repository of a variety of enterprise-wide raw information that can be acquired, processed, analyzed, and delivered. The information thus handled could be any type of information, ranging from structured and semi-structured data to completely unstructured data. A data lake is expected to be able to derive enterprise-relevant meaning and insights from this information using various analysis and machine learning algorithms. Lambda architecture and data lake Lambda architecture as a pattern provides ways and means to perform highly scalable, performant, distributed computing on large sets of data, and yet provide eventually consistent data, with the required processing done both in batch as well as in near real time. Lambda architecture defines ways and means to enable a scale-out architecture across various data load profiles in an enterprise, with low latency expectations. The architecture pattern became significant with the emergence of big data and enterprises' focus on real-time analytics and digital transformation. The pattern named Lambda (symbol λ) is indicative of a way by which data comes from two places (batch and speed - the curved parts of the lambda symbol), which are then combined and served through the serving layer (the line merging from the curved parts). Figure 01: Lambda Symbol The main layers constituting the Lambda architecture are shown below: Figure 02: Components of Lambda Architecture In the above high level representation, data is fed to both the batch and speed layers. The batch layer keeps producing and re-computing views at every set batch interval. The speed layer also creates the relevant real-time/speed views. The serving layer orchestrates the query by querying both the batch and speed layers, merges the results, and sends them back. A practical realization of such a data lake can be illustrated as shown below.
The figure below shows multiple technologies used for such a realization; however, once the data is acquired from multiple sources and queued in the messaging layer for ingestion, the Lambda architecture pattern, in the form of the ingestion layer, batch layer, and speed layer, springs into action: Figure 03: Layers in Data Lake Data Acquisition Layer: In an organization, data exists in various forms, which can be classified as structured data, semi-structured data, or unstructured data. One of the key roles expected from the acquisition layer is to be able to convert the data into messages that can be further processed in a data lake; hence, the acquisition layer is expected to be flexible enough to accommodate a variety of schema specifications, and at the same time must have a fast connect mechanism to seamlessly push all the translated data messages into the data lake. A typical flow can be represented as shown below. Figure 04: Data Acquisition Layer Messaging Layer: The messaging layer would form the Message Oriented Middleware (MOM) for the data lake architecture and hence would be the primary layer for decoupling the various layers from each other, but with guaranteed delivery of messages. The other aspect of a messaging layer is its ability to enqueue and dequeue messages, as is the case with most messaging frameworks. Most messaging frameworks provide enqueue and dequeue mechanisms to manage publishing and consumption of messages respectively. Every messaging framework provides its own set of libraries to connect to its resources (queues/topics). Figure 05: Message Queue Additionally, the messaging layer can also perform the role of a data stream producer, converting the queued data into continuous streams of data that can be passed on for data ingestion. Data Ingestion Layer: A fast ingestion layer is one of the key layers in the Lambda architecture pattern. This layer determines how fast data can be delivered into the working models of the Lambda architecture. The data ingestion layer is responsible for consuming the messages from the messaging layer and performing the required transformation for ingesting them into the lambda layer (batch and speed layer) such that the transformed output conforms to the expected storage or processing formats. Figure 06: Data Ingestion Layer Batch Processing: The batch processing layer of the lambda architecture is expected to process the ingested data in batches so as to have optimum utilization of system resources; at the same time, long-running operations may be applied to the data to ensure a high quality of data output, which is also known as modelled data. The conversion of raw data to modelled data is the primary responsibility of this layer, wherein the modelled data is the data model that can be served by the serving layer of the lambda architecture. While Hadoop as a framework has multiple components that can process data as a batch, each data processing job in Hadoop is a map reduce process. The map and reduce paradigm of process execution is not a new paradigm; rather, it has been used in many applications ever since mainframe systems came into existence. It is based on divide and rule and stems from the traditional multi-threading model. The primary mechanism here is to divide the batch across multiple processes and then combine/reduce the output of all the processes into a single output. Figure 07: Batch Processing Speed (Near Real Time) Data Processing: This layer is expected to perform near real time processing on data received from the ingestion layer.
Since the processing is expected to be in near real time, such data processing needs to be quick and efficient, designed to support high-concurrency scenarios with an eventually consistent outcome. Real-time processing often depends on look-up data and reference data, hence there is a need for a very fast data layer, so that fetching any look-up or reference data does not adversely impact the real-time nature of the processing. The near real time data processing pattern is not very different from the way processing is done in batch mode; the primary difference is that the data is processed as soon as it is available and does not have to be batched, as shown below. Figure 08: Speed (Near Real Time) Processing Data Storage Layer: The data storage layer is a prominent part of the Lambda architecture pattern, as this layer defines the reactivity of the overall solution to the incoming event/data streams. The storage, in the context of a Lambda-architecture-driven data lake, can be classified broadly into non-indexed and indexed data storage. Typically, batch processing is performed on non-indexed data stored as data blocks for faster batch processing, while speed (near real time) processing is performed on indexed data, which can be accessed randomly and supports complex search patterns by means of inverted indexes. Both of these models are depicted below. Figure 09: Non-Indexed and Indexed Data Storage Examples Lambda in action Once all the layers in the Lambda architecture have performed their respective roles, the data can be exported, exposed via services, and delivered through other protocols from the data lake. This can also include merging the high-quality processed output from batch processing with indexed storage, using suitable technologies and frameworks, so as to provide enriched data for near real time requirements as well as interesting visualizations. Figure 10: Lambda in action Summary In this article we have briefly discussed a practical approach towards implementing a data lake for enterprises by leveraging the Lambda architecture pattern. Resources for Article: Further resources on this subject: The Microsoft Azure Stack Architecture [article] System Architecture and Design of Ansible [article] Microservices and Service Oriented Architecture [article]

article-image-thread-synchronization-and-communication
Packt
19 Jun 2017
20 min read
Save for later

Thread synchronization and communication

In this article by Maya Posch, the author of the book Mastering C++ Multithreading, we will work through and understand a basic multithreaded C++ application. While threads are generally used to work on a task more or less independently from other threads, there are many occasions where one would want to pass data between threads, or even control other threads, such as from a central task scheduler thread. This article looks at how such tasks are accomplished. Topics covered in this article include: Use of mutexes, locks, and similar synchronization structures. The use of condition variables and signals to control threads. Safely passing and sharing data between threads. (For more resources related to this topic, see here.) Safety first The central problem with concurrency is that of ensuring safe access to shared resources, including when communicating between threads. There is also the issue of threads being able to communicate and synchronize themselves. What makes multithreaded programming such a challenge is the need to keep track of each interaction between threads and to ensure that each and every form of access is secured, while not falling into the traps of deadlocks and data races. In this article we will look at a fairly complex example involving a task scheduler. This is a form of high-concurrency, high-throughput situation where many different requirements come together, with many potential traps, as we will see in a moment. The scheduler A good example of multithreading with a significant amount of synchronization and communication between threads is the scheduling of tasks. Here, the goal is to accept incoming tasks and assign them to worker threads as quickly as possible. A number of different approaches are possible. Often one has worker threads running in an active loop, constantly polling a central queue for new tasks. Disadvantages of this approach include the wasting of processor cycles on said polling and the congestion which forms at the synchronization mechanism used, generally a mutex. Furthermore, this active polling approach scales very poorly as the number of worker threads increases. Ideally, each worker thread would idly wait until it is needed again. To accomplish this, we have to approach the problem from the other side: not from the perspective of the worker threads, but from that of the queue. Much like the scheduler of an operating system, it is the scheduler which is aware of both the tasks which require processing and the available worker threads. In this approach, a central scheduler instance would accept new tasks and actively assign them to worker threads. Said scheduler instance may also manage these worker threads, such as their number and priority, depending on the number of incoming tasks and the type of task or other properties. High-level view At its core, our scheduler or dispatcher is quite simple, functioning like a queue with all of the scheduling logic built into it: As one can see from the high-level view, there really isn't much to it. As we'll see in a moment, the actual implementation does, however, have a number of complications.
Implementation As usual, we start off with the main function, contained in main.cpp: #include "dispatcher.h" #include "request.h" #include <iostream> #include <string> #include <csignal> #include <thread> #include <chrono> #include <mutex> using namespace std; sig_atomic_t signal_caught = 0; mutex logMutex; The custom headers we include are those for our dispatcher implementation, as well as the Request class we'll be using; the <mutex> header is included for the global mutex. Globally we define an atomic variable to be used with the signal handler, as well as a mutex which will synchronize the output (on the standard output) from our logging method. void sigint_handler(int sig) { signal_caught = 1; } Our signal handler function (for SIGINT signals) simply sets the global atomic variable we defined earlier. void logFnc(string text) { logMutex.lock(); cout << text << "\n"; logMutex.unlock(); } In our logging function we use the global mutex to ensure writing to the standard output is synchronized. int main() { signal(SIGINT, &sigint_handler); Dispatcher::init(10); In the main function we install the signal handler for SIGINT to allow us to interrupt the execution of the application. We also call the static init() function on the Dispatcher class to initialize it. cout << "Initialised.\n"; int cycles = 0; Request* rq = 0; while (!signal_caught && cycles < 50) { rq = new Request(); rq->setValue(cycles); rq->setOutput(&logFnc); Dispatcher::addRequest(rq); cycles++; } Next we set up the loop in which we will create new requests. In each cycle we create a new Request instance and use its setValue() function to set an integer value (the current cycle number). We also set our logging function on the request instance before adding this new request to the Dispatcher using its static addRequest() function. This loop will continue until the maximum number of cycles has been reached, or SIGINT has been signaled using Ctrl+C or similar. this_thread::sleep_for(chrono::seconds(5)); Dispatcher::stop(); cout << "Clean-up done.\n"; return 0; } Finally we wait for five seconds, using the thread's sleep_for() function and the chrono::seconds() function from the chrono STL header. We also call the stop() function on the Dispatcher before returning. Request class A request for the Dispatcher always derives from the pure virtual AbstractRequest class: #pragma once #ifndef ABSTRACT_REQUEST_H #define ABSTRACT_REQUEST_H class AbstractRequest { public: virtual void setValue(int value) = 0; virtual void process() = 0; virtual void finish() = 0; }; #endif This class defines an API with three functions which a deriving class always has to implement, of which the process() and finish() functions are the most generic and likely to be used in any practical implementation. The setValue() function is specific to this demonstration implementation and would likely be adapted or extended to fit a real-life scenario. The advantage of using an abstract class as the basis for a request is that it allows the Dispatcher class to handle many different types of requests, as long as they all adhere to this same basic API.
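To make that flexibility concrete, here is a hypothetical second request type (a sketch only, not part of the book's example project) which the same dispatcher could handle unchanged, simply because it implements the same three functions:

// Hypothetical example of another request type deriving from AbstractRequest.
// It is not used by the demo application; it only illustrates that any class
// implementing this small API can be handed to the dispatcher.
#include "abstract_request.h"
#include <iostream>

class EchoRequest : public AbstractRequest {
    int value;
public:
    void setValue(int value) { this->value = value; }
    void process() { std::cout << "Echoing value " << value << "\n"; }
    void finish() { std::cout << "Echo request " << value << " finished\n"; }
};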
Using this abstract interface, we implement a basic Request class: #pragma once #ifndef REQUEST_H #define REQUEST_H #include "abstract_request.h" #include <string> using namespace std; typedef void (*logFunction)(string text); class Request : public AbstractRequest { int value; logFunction outFnc; public: void setValue(int value) { this->value = value; } void setOutput(logFunction fnc) { outFnc = fnc; } void process(); void finish(); }; #endif In its header file we first define the logging function pointer's format. After this we implement the request API, adding the setOutput() function to the base API, which accepts a function pointer for logging. Both setter functions merely assign the provided parameter to their respective private class members. Next, the class function implementations: #include "request.h" void Request::process() { outFnc("Starting processing request " + std::to_string(value) + "..."); // } void Request::finish() { outFnc("Finished request " + std::to_string(value)); }Both of these implementations are very basic, merely using the function pointer to output a string indicating the status of the worker thread. In a practical implementation, one would add the business logic to the process()function, with the finish() function containing any functionality to finish up a request, such as writing a map into a string. Worker class Next, the Worker class. This contains the logic which will be called by the dispatcher in order to process a request: #pragma once #ifndef WORKER_H #define WORKER_H #include "abstract_request.h" #include <condition_variable> #include <mutex> using namespace std; class Worker { condition_variable cv; mutex mtx; unique_lock<mutex> ulock; AbstractRequest* request; bool running; bool ready; public: Worker() { running = true; ready = false; ulock = unique_lock<mutex>(mtx); } void run(); void stop() { running = false; } void setRequest(AbstractRequest* request) { this->request = request; ready = true; } void getCondition(condition_variable* &cv); }; #endif Whereas the adding of a request to the dispatcher does not require any special logic, the Worker class does require the use of condition variables to synchronize itself with the dispatcher. For the C++11 threads API, this requires a condition variable, a mutex and a unique lock. The unique lock encapsulates the mutex and will ultimately be used with the condition variable as we will see in a moment. Beyond this we define methods to start and stop the worker, to set a new request for processing and to obtain access to its internal condition variable. Moving on, the rest of the implementation: #include "worker.h" #include "dispatcher.h" #include <chrono> using namespace std; void Worker::getCondition(condition_variable* &cv) { cv = &(this)->cv; } void Worker::run() { while (running) { if (ready) { ready = false; request->process(); request->finish(); } if (Dispatcher::addWorker(this)) { // Use the ready loop to deal with spurious wake-ups. while (!ready && running) { if (cv.wait_for(ulock, chrono::seconds(1)) == cv_status::timeout) { // We timed out, but we keep waiting unless // the worker is // stopped by the dispatcher. } } } } } Beyond the getter function for the condition variable, we define the run() function, which the dispatcher will run for each worker thread upon starting it. Its main loop merely checks that the stop() function hasn't been called yet, which would have set the running boolean value to false and ended the work thread. 
This is used by the dispatcher when shutting down, allowing it to terminate the worker threads. Since boolean values are generally atomic, setting and checking can be done simultaneously without risk and without requiring a mutex. Moving on, the check of the ready variable is to ensure that a request is actually waiting when the thread is first run. On the first run of the worker thread, no request will be waiting and thus attempting to process one would result in a crash. Upon the dispatcher setting a new request, this boolean variable will be set to true. If a request is waiting, the ready variable will be set to false again, after which the request instance will have its process() and finish() functions called. This will run the business logic of the request on the worker thread and finalize it. Finally, the worker thread adds itself to the dispatcher using its static addWorker() function. This function will return false if no new request was available, causing the worker thread to wait until a new request has become available. Otherwise the worker thread will continue with the processing of the new request that the dispatcher will have set on it. If asked to wait, we enter a new loop which will ensure that, upon waking up from waiting for the condition variable to be signaled, we woke up because we got signaled by the dispatcher (ready variable set to true), and not because of a spurious wake-up. Last of all, we enter the actual wait() function of the condition variable, using the unique lock instance we created before, along with a timeout. If a timeout occurs, we can either terminate the thread, or keep waiting. Here we choose to do nothing and just re-enter the waiting loop. Dispatcher As the last item, we have the Dispatcher class itself: #pragma once #ifndef DISPATCHER_H #define DISPATCHER_H #include "abstract_request.h" #include "worker.h" #include <queue> #include <mutex> #include <thread> #include <vector> using namespace std; class Dispatcher { static queue<AbstractRequest*> requests; static queue<Worker*> workers; static mutex requestsMutex; static mutex workersMutex; static vector<Worker*> allWorkers; static vector<thread*> threads; public: static bool init(int workers); static bool stop(); static void addRequest(AbstractRequest* request); static bool addWorker(Worker* worker); }; #endif Most of this should look familiar by now. As one may have surmised, this is a fully static class. Moving on with its implementation: #include "dispatcher.h" #include <iostream> using namespace std; queue<AbstractRequest*> Dispatcher::requests; queue<Worker*> Dispatcher::workers; mutex Dispatcher::requestsMutex; mutex Dispatcher::workersMutex; vector<Worker*> Dispatcher::allWorkers; vector<thread*> Dispatcher::threads; bool Dispatcher::init(int workers) { thread* t = 0; Worker* w = 0; for (int i = 0; i < workers; ++i) { w = new Worker; allWorkers.push_back(w); t = new thread(&Worker::run, w); threads.push_back(t); } return true; } After setting up the static class members, the init() function is defined. It starts the specified number of worker threads, keeping a reference to each worker and thread instance in their respective vector data structures. bool Dispatcher::stop() { for (int i = 0; i < allWorkers.size(); ++i) { allWorkers[i]->stop(); } cout << "Stopped workers.\n"; for (int j = 0; j < threads.size(); ++j) { threads[j]->join(); cout << "Joined threads.\n"; } return true; } In the stop() function each worker instance has its stop() function called.
This will cause each worker thread to terminate, as we saw earlier in the Worker class description. Finally, we wait for each thread to join (that is, finish), prior to returning. void Dispatcher::addRequest(AbstractRequest* request) { workersMutex.lock(); if (!workers.empty()) { Worker* worker = workers.front(); worker->setRequest(request); condition_variable* cv; worker->getCondition(cv); cv->notify_one(); workers.pop(); workersMutex.unlock(); } else { workersMutex.unlock(); requestsMutex.lock(); requests.push(request); requestsMutex.unlock(); } } The addRequest() function is where things get interesting. In this one function a new request is added. What happens next to it depends on whether a worker thread is waiting for a new request or not. If no worker thread is waiting (the worker queue is empty), the request is added to the request queue. The use of mutexes ensures that access to these queues occurs safely, as the worker threads will simultaneously try to access both queues as well. An important gotcha to note here is the possibility of a deadlock. That is, a situation where each of two threads holds the lock on a resource, with each waiting for the other thread to release its lock before releasing its own. Every situation where more than one mutex is used in a single scope holds this potential. In this function the potential for deadlock lies in the releasing of the lock on the workers mutex and the point where the lock on the requests mutex is obtained. In the case that this function holds the workers mutex and tries to obtain the requests lock (when no worker thread is available), there is a chance that another thread holds the requests mutex (looking for new requests to handle), while simultaneously trying to obtain the workers mutex (finding no requests and adding itself to the workers queue). The solution here is simple: release a mutex before obtaining the next one. In the situation where one feels that more than one mutex lock has to be held, it is paramount to examine and test one's code for potential deadlocks. In this particular situation the workers mutex lock is explicitly released when it is no longer needed, or before the requests mutex lock is obtained, preventing a deadlock. Another important aspect of this particular section of code is the way it signals a worker thread. As one can see in the first section of the if/else block, when the workers queue is not empty, a worker is fetched from the queue, has the request set on it and then has its condition variable referenced and signaled, or notified. Internally the condition variable uses the mutex we handed it before in the Worker class definition to guarantee only atomic access to it. When the notify_one() function (generally called signal() in other APIs) is called on the condition variable, it will notify the first thread in the queue of threads waiting for the condition variable to return and continue. In the Worker class's run() function we would be waiting for this notification event. Upon receiving it, the worker thread would continue and process the new request. The thread reference will then be removed from the queue until it adds itself again once it is done processing the request.
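The explicit lock() and unlock() calls make the release-before-acquire ordering visible, but they are also easy to get wrong once a function touches several mutexes. The same addRequest() logic can be expressed with RAII scope guards (std::lock_guard), so that each mutex is held only for the duration of a small block. The following is a sketch of that variant, not the book's implementation, and it is meant to replace, not sit alongside, the version above:

// Sketch only: the same "release one lock before taking the next" idea,
// expressed with RAII scope guards instead of manual lock()/unlock() calls.
// Assumes the same includes and static members as the Dispatcher shown above.
void Dispatcher::addRequest(AbstractRequest* request) {
    Worker* worker = nullptr;
    {   // workersMutex is held only inside this block
        lock_guard<mutex> lock(workersMutex);
        if (!workers.empty()) {
            worker = workers.front();
            workers.pop();
        }
    }   // workersMutex released here, before any other lock is taken
    if (worker) {
        worker->setRequest(request);
        condition_variable* cv;
        worker->getCondition(cv);
        cv->notify_one();
    } else {
        lock_guard<mutex> lock(requestsMutex);
        requests.push(request);
    }
}

With the deadlock considerations in mind, the next function to look at is the addWorker() implementation: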
bool Dispatcher::addWorker(Worker* worker) { bool wait = true; requestsMutex.lock(); if (!requests.empty()) { AbstractRequest* request = requests.front(); worker->setRequest(request); requests.pop(); wait = false; requestsMutex.unlock(); } else { requestsMutex.unlock(); workersMutex.lock(); workers.push(worker); workersMutex.unlock(); } return wait; } With this function a worker thread will add itself to the queue once it is done processing a request. It is similar to the earlier function in that the incoming worker is first actively matched with any request which may be waiting in the request queue. If none are available, the worker is added to the worker queue. Important to note here is that we return a boolean value which indicates whether the calling thread should wait for a new request, or whether it already has received a new request while trying to add itself to the queue. While this code is less complex than that of the previous function, it still holds the same potential deadlock issue due to the handling of two mutexes within the same scope. Here, too, we first release the mutex we hold before obtaining the next one. Makefile The Makefile for this dispatcher example is very basic again, gathering all C++ source files in the current folder and compiling them into a binary using g++: GCC := g++ OUTPUT := dispatcher_demo SOURCES := $(wildcard *.cpp) CCFLAGS := -std=c++11 -g3 all: $(OUTPUT) $(OUTPUT): $(GCC) -o $(OUTPUT) $(CCFLAGS) $(SOURCES) clean: rm $(OUTPUT) .PHONY: all Output After compiling the application, running it produces the following output for the fifty total requests: $ ./dispatcher_demo.exe Initialised. Starting processing request 1... Starting processing request 2... Finished request 1 Starting processing request 3... Finished request 3 Starting processing request 6... Finished request 6 Starting processing request 8... Finished request 8 Starting processing request 9... Finished request 9 Finished request 2 Starting processing request 11... Finished request 11 Starting processing request 12... Finished request 12 Starting processing request 13... Finished request 13 Starting processing request 14... Finished request 14 Starting processing request 7... Starting processing request 10... Starting processing request 15... Finished request 7 Finished request 15 Finished request 10 Starting processing request 16... Finished request 16 Starting processing request 17... Starting processing request 18... Starting processing request 0… At this point we we can already clearly see that even with each request taking almost no time to process, the requests are clearly being executed in parallel. The first request (request 0) only starts being processed after the 16th request, while the second request already finishes after the ninth request, long before this. The factors which determine which thread and thus which request is processed first depends on the OS scheduler and hardware-based scheduling. This clearly shows just how few assumptions one can be made about how a multithreaded application will be executed, even on a single platform. Starting processing request 5... Finished request 5 Starting processing request 20... Finished request 18 Finished request 20 Starting processing request 21... Starting processing request 4... Finished request 21 Finished request 4 Here the fourth and fifth requests also finish in a rather delayed fashion. Starting processing request 23... Starting processing request 24... Starting processing request 22... 
Finished request 24 Finished request 23 Finished request 22 Starting processing request 26... Starting processing request 25... Starting processing request 28... Finished request 26 Starting processing request 27... Finished request 28 Finished request 27 Starting processing request 29... Starting processing request 30... Finished request 30 Finished request 29 Finished request 17 Finished request 25 Starting processing request 19... Finished request 0 At this point the first request finally finishes. This may indicate that the initialization time for the first request will always delay it relative to the successive requests. Running the application multiple times can confirm this. It's important that if the order of processing is relevant, that this randomness does not negatively affect one's application. Starting processing request 33... Starting processing request 35... Finished request 33 Finished request 35 Starting processing request 37... Starting processing request 38... Finished request 37 Finished request 38 Starting processing request 39... Starting processing request 40... Starting processing request 36... Starting processing request 31... Finished request 40 Finished request 39 Starting processing request 32... Starting processing request 41... Finished request 32 Finished request 41 Starting processing request 42... Finished request 31 Starting processing request 44... Finished request 36 Finished request 42 Starting processing request 45... Finished request 44 Starting processing request 47... Starting processing request 48... Finished request 48 Starting processing request 43... Finished request 47 Finished request 43 Finished request 19 Starting processing request 34... Finished request 34 Starting processing request 46... Starting processing request 49... Finished request 46 Finished request 49 Finished request 45 Request 19 also became fairly delayed, showing once again just how unpredictable a multithreaded application can be. If we were processing a large data set in parallel here, with chunks of data in each request, we might have to pause at some points to account for these delays as otherwise our output cache might grow too large. As doing so would negatively affect an application's performance, one might have to look at low-level optimizations, as well as the scheduling of threads on specific processor cores in order to prevent this from happening. Stopped workers. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Joined threads. Clean-up done. All ten worker threads which were launched in the beginning terminate here as we call the stop() function of the Dispatcher. Sharing data In this article's example we saw how to share information between threads in addition to the synchronizing of threads. This in the form of the requests we passed from the main thread into the dispatcher, from which each request gets passed on to a different thread. The essential idea behind the sharing of data between threads is that the data to be shared exists somewhere in a way which is accessible to two threads or more. After this we have to ensure that only one thread can modify the data, and that the data does not get modified while it's being read. Generally we would use mutexes or similar to ensure this. Using R/W-locks Readers-writer locks are a possible optimization here, because they allow multiple threads to read simultaneously from a single data source. 
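The standard library has offered a ready-made readers-writer lock since C++17 in the form of std::shared_mutex (with std::shared_timed_mutex available since C++14). The sketch below is separate from the dispatcher example and simply shows the usual pattern: shared locks for the readers, an exclusive lock for the writer.

// Sketch (not part of the dispatcher example): protecting a shared map with a
// readers-writer lock. Requires C++17 for std::shared_mutex.
#include <shared_mutex>
#include <mutex>
#include <map>
#include <string>

class SharedConfig {
    mutable std::shared_mutex rwMutex;
    std::map<std::string, std::string> settings;
public:
    std::string get(const std::string& key) const {
        std::shared_lock<std::shared_mutex> lock(rwMutex);   // many readers at once
        auto it = settings.find(key);
        return (it != settings.end()) ? it->second : "";
    }
    void set(const std::string& key, const std::string& value) {
        std::unique_lock<std::shared_mutex> lock(rwMutex);   // exclusive for writers
        settings[key] = value;
    }
};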
If one has an application in which multiple worker threads read the same information repeatedly, it would be more efficient to use read-write locks than basic mutexes, because attempts to read the data will not block other threads. A read-write lock can thus be used as a more advanced version of a mutex, namely one which adapts its behavior to the type of access. Internally it builds on mutexes (or semaphores) and condition variables. Using shared pointers First available via the Boost library and introduced natively with C++11, shared pointers are an abstraction of memory management using reference counting for heap-allocated instances. They are partially thread-safe, in that multiple shared pointer instances referring to the same object can be created and destroyed concurrently, but access to the referenced object itself is not thread-safe. Depending on the application this may suffice, however. To make them properly thread-safe one can use atomics. Summary In this article we looked at how to pass data between threads in a safe manner as part of a fairly complex scheduler implementation. We also looked at the resulting asynchronous processing of said scheduler and considered some potential alternatives and optimizations for passing data between threads. At this point one should be able to safely pass data between threads, as well as synchronize the access to other shared resources. In the next article we will be looking at the native C++ threading API and its primitives.  Resources for Article: Further resources on this subject: Multithreading with Qt [article] Understanding the Dependencies of a C++ Application [article] Boost.Asio C++ Network Programming [article]
article-image-introduction-cyber-extortion
Packt
19 Jun 2017
21 min read
Save for later

Introduction to Cyber Extortion

In this article, Dhanya Thakkar, the author of the book Preventing Digital Extortion, explains how often we make the mistake of relying on the past for predicting the future, and nowhere is this more relevant than in the sphere of the Internet and smart technology. People, processes, data, and things are tightly and increasingly connected, creating new, intelligent networks unlike anything else we have seen before. The growth is exponential and the consequences are far-reaching for individuals, and progressively so for businesses. We are creating the Internet of Things and the Internet of Everything. (For more resources related to this topic, see here.) It has become unimaginable to run a business without using the Internet. It is not only an essential tool for current products and services, but an unfathomable well for innovation and fresh commercial breakthroughs. The transformative revolution is spilling into the public sector, affecting companies as vanguards and diffusing to consumers, who are in a feedback loop with suppliers, constantly obtaining and demanding new goods. Advanced technologies that apply not only to machine-to-machine communication but also to smart sensors generate complex networks to which theoretically anything that can carry a sensor can be connected. Cloud computing and cloud-based applications provide immense yet affordable storage capacity for people and organizations and facilitate the spread of data in more ways than one. Keeping in mind the Internet's nature, the physical boundaries of business become blurred, and virtual data protection must incorporate a new characteristic of security: encryption. In the middle of the storm of the IoT, major opportunities arise, and equally so, unprecedented risks lurk. People often think that what they put on the Internet is protected and closed information. It is hardly so. Sending an e-mail is not like sending a letter in a closed envelope. It is more like sending a postcard, where anyone who gets their hands on it can read what's written on it. Along with people who want to utilize the Internet as an open business platform, there are people who want to find ways of circumventing legal practices and misusing the wealth of data on computer networks by unlawfully gaining financial profits, assets, or authority that can be monetized. Being connected is now critical. As cyberspace grows, so do attempts to violate vulnerable information, and they are gaining global scale. This newly discovered business dynamic is under persistent threat from criminals. Cyberspace, cybercrime, and cybersecurity are perceptibly being found in the same sentence. Cybercrime – under-defined and under-regulated A massive problem encouraging the perseverance and evolution of cybercrime is the lack of an adequate, unanimous definition and the under-regulation on a national, regional, and global level. Nothing is criminal unless stipulated by the law. Global law enforcement agencies, academia, and state policies have studied the constant development of the phenomenon since its first appearance in 1989, in the shape of the AIDS Trojan virus transferred from an infected floppy disk. Regardless of the bizarre beginnings, there is nothing entertaining about cybercrime. It is serious. It is dangerous. Significant efforts are made to define cybercrime on a conceptual level in academic research and in national and regional cybersecurity strategies. Still, as the nature of the phenomenon evolves, so must the definition.
Research reports are still at a descriptive level, and underreporting is a major issue. On the other hand, businesses are more exposed due to ignorance of the fact that modern-day criminals increasingly rely on the Internet to enhance their criminal operations. Case in point: Aaushi Shah and Srinidhi Ravi from the Asian School of Cyber Laws have created a cybercrime list by compiling a set of 74 distinctive and creativelynamed actions emerging in the last three decades that can be interpreted as cybercrime. These actions target anything from e-mails to smartphones, personal computers, and business intranets: piggybacking, joe jobs, and easter eggs may sound like cartoons, but their true nature resembles a crime thriller. The concept of cybercrime Cyberspace is a giant community made out of connected computer users and data on a global level. As a concept, cybercrime involves any criminal act dealing withcomputers andnetworks, including traditional crimes in which the illegal activities are committed through the use of a computer and the Internet. As businesses become more open and widespread, the boundary between data freedom and restriction becomes more porous. Countless e-shopping transactions are made, hospitals keep record of patient histories, students pass exams, and around-the-clock payments are increasingly processed online. It is no wonder that criminals are relentlessly invading cyberspace trying to find a slipping crack. There are no recognizable border controls on the Internet, but a business that wants to evade harm needs to understand cybercrime's nature and apply means to restrict access to certain information. Instead of identifying it as a single phenomenon, Majid Jar proposes a common denominator approach for all ICT-related criminal activities. In his book Cybercrime and Society, Jar refers to Thomas and Loader’s working concept of cybercrime as follows: “Computer-mediated activities which are either illegal or considered illicit by certain parties and which can be conducted through global electronic network.” Jar elaborates the important distinction of this definition by emphasizing the difference between crime and deviance. Criminal activities are explicitly prohibited by formal regulations and bear sanctions, while deviances breach informal social norms. This is a key note to keep in mind. It encompasses the evolving definition of cybercrime, which keeps transforming after resourceful criminals who constantly think of new ways to gain illegal advantages. Law enforcement agencies on a global level make an essential distinction between two subcategories of cybercrime: Advanced cybercrime or high-tech crime Cyber-enabled crime The first subcategory, according to Interpol, includes newly emerged sophisticated attacks against computer hardware and software. On the other hand, the second category contains traditional crimes in modern clothes,for example crimes against children, such as exposing children to illegal content; financial crimes, such as payment card frauds, money laundering, and counterfeiting currency and security documents; social engineering frauds; and even terrorism. We are much beyond the limited impact of the 1989 cybercrime embryo. Intricate networks are created daily. They present new criminal opportunities, causing greater damage to businesses and individuals, and require a global response. 
Cybercrime is conceptualized as a service embracing a commercial component.Cybercriminals work as businessmen who look to sell a product or a service to the highest bidder. Critical attributes of cybercrime An abridged version of the cybercrime concept provides answers to three vital questions: Where are criminal activities committed and what technologies are used? What is the reason behind the violation? Who is the perpetrator of the activities? Where and how – realm Cybercrime can be an online, digitally committed, traditional offense. Even if the component of an online, digital, or virtual existence were not included in its nature, it would still have been considered crime in the traditional, real-world sense of the word. In this sense, as the nature of cybercrime advances, so mustthe spearheads of lawenforcement rely on laws written for the non-digital world to solve problems encountered online. Otherwise, the combat becomesstagnant and futile. Why – motivation The prefix "cyber" sometimes creates additional misperception when applied to the digital world. It is critical to differentiate cybercrime from other malevolent acts in the digital world by considering the reasoning behind the action. This is not only imperative for clarification purposes, but also for extending the definition of cybercrime over time to include previously indeterminate activities. Offenders commit a wide range of dishonest acts for selfish motives such as monetary gain, popularity, or gratification. When the intent behind the behavior is misinterpreted, confusion may arise and actions that should not have been classified as cybercrime could be charged with criminal prosecution. Who –the criminal deed component The action must be attributed to a perpetrator. Depending on the source, certain threats can be translated to the criminal domain only or expanded to endanger potential larger targets, representing an attack to national security or a terrorist attack. Undoubtedly, the concept of cybercrime needs additional refinement, and a comprehensive global definition is in progress. Along with global cybercrime initiatives, national regulators are continually working on implementing laws, policies, and strategies to exemplify cybercrime behaviors and thus strengthen combating efforts. Types of common cyber threats In their endeavors to raise cybercrime awareness, the United Kingdom'sNational Crime Agency (NCA) divided common and popular cybercrime activities by affiliating themwith the target under threat. While both individuals and organizations are targets of cyber criminals, it is the business-consumer networks that suffer irreparable damages due to the magnitude of harmful actions. Cybercrime targeting consumers Phishing The term encompasses behavior where illegitimate e-mails are sent to the receiver to collect security information and personal details Webcam manager A webcam manager is an instance of gross violating behavior in which criminals take over a person's webcam File hijacker Criminals hijack files and hold them "hostage" until the victim pays the demanded ransom Keylogging With keylogging, criminals have the means to record what the text behind the keysyou press on your keyboard is Screenshot manager A screenshot manager enables criminals to take screenshots of an individual’s computer screen Ad clicker Annoying but dangerous ad clickers direct victims’ computer to click on a specific harmful link Cybercrime targeting businesses Hacking Hacking is basically unauthorized access to computer data. 
Hackers inject specialist software with which they try to take administrative control of a computerized network or system. If the attack is successful, the stolen data can be sold on the dark web and compromise people’s integrity and safety by intruding and abusing the privacy of products as well as sensitive personal and business information. Hacking is particularly dangerous when it compromises the operation of systems that manage physical infrastructure, for example, public transportation. Distributed denial of service (DDoS) attacks When an online service is targeted by a DDoS attack, the communication links overflow with data from messages sent simultaneously by botnets. Botnets are a bunch of controlled computers that stop legitimate access to online services for users. The system is unable to provide normal access as it cannot handle the huge volume of incoming traffic. Cybercrime in relation to overall computer crime Many moons have passed since 2001, when the first international treatythat targeted Internet and computer crime—the Budapest Convention on Cybercrime—was adopted. The Convention’s intention was to harmonize national laws, improve investigative techniques, and increase cooperation among nations. It was drafted with the active participation of the Council of Europe's observer states Canada, Japan, South Africa, and the United States and drawn up by the Council of Europe in Strasbourg, France. Brazil and Russia, on the other hand, refused to sign the document on the basis of not being involved in the Convention's preparation. In The Understanding Cybercrime: A Guide to Developing Countries(Gercke, 2011), Marco Gercke makes an excellent final point: “Not all computer-related crimes come under the scope of cybercrime. Cybercrime is a narrower notion than all computer-related crime because it has to include a computer network. On the other hand, computer-related crime in general can also affect stand-alone computer systems.” Although progress has been made, consensus over the definition of cybercrime is not final. Keeping history in mind, a fluid and developing approach must be kept in mind when applying working and legal interpretations. In the end, international noncompliance must be overcome to establish a common and safe ground to tackle persistent threats. Cybercrime localized – what is the risk in your region? Europol’s heat map for the period between 2014 and 2015 reports on the geographical distribution of cybercrime on the basis of the United Nations geoscheme. The data in the report encompassed cyber-dependent crime and cyber-enabled fraud, but it did not include investigations into online child sexual abuse. North and South America Due to its overwhelming presence, it is not a great surprise that the North American region occupies several lead positions concerning cybercrime, both in terms of enabling malicious content and providing residency to victims in the regions that participate in the global cybercrime numbers. The United States hosted between 20% and nearly 40% of the total world's command-and-control servers during 2014. Additionally, the US currently hosts over 45% of the world's phishing domains and is in the pack of world-leading spam producers. Between 16% and 20% percent of all global bots are located in the United States, while almost a third of point-of-sale malware and over 40% of all ransomware incidents were detected there. 
Twenty EU member states have initiated criminal procedures in which the parties under suspicion were located in the United States. In addition, over 70 percent of the countries located in the Single European Payment Area have been subject to losses from skimmed payment cards because of the distinct way in which the US, under certain circumstances, processes card payments without chip-and-PIN technology. There are instances of cybercrime in South America, but the scope of participation by the southern continent is way smaller than that of its northern neighbor, both in industry reporting and in criminal investigations. Ecuador, Guatemala, Bolivia, Peru, and Brazil are constantly rated high on the malware infection scale, and the situation is not changing, while Argentina and Colombia remain among the top 10 spammer countries. Brazil has a critical role in point-of-sale malware, ATM malware, and skimming devices. Europe The key aspect making Europe a region with excellent cybercrime potential is the fast, modern, and reliable ICT infrastructure. According to The Internet Organised Crime Threat Assessment (IOCTA) 2015, Cybercriminals abuse Western European countries to host malicious content and launch attacks inside and outside the continent. EU countries host approximately 13 percent of the global malicious URLs, out of which Netherlands is the leading country, while Germany, the U.K., and Portugal come second, third, and fourth respectively. Germany, the U.K., the Netherlands, France, and Russia are important hosts for bot C&C infrastructure and phishing domains, while Italy, Germany, the Netherlands, Russia, and Spain are among the top sources of global spam. Scandinavian countries and Finland are famous for having the lowest malware infection rates. France, Germany, Italy, and to some extent the U.K. have the highest malware infection rates and the highest proportion of bots found within the EU. However, the findings are presumably the result of the high population of the aforementioned EU countries. A half of the EU member states identified criminal infrastructure or suspects in the Netherlands, Germany, Russia, or the United Kingdom. One third of the European law enforcement agencies confirmed connections to Austria, Belgium, Bulgaria, the Czech Republic, France, Hungary, Italy, Latvia, Poland, Romania, Spain, or Ukraine. Asia China is the United States' counterpart in Asia in terms of the top position concerning reported threats to Internet security. Fifty percent of the EU member states' investigations on cybercrime include offenders based in China. Moreover, certain authorities quote China as the source of one third of all global network attacks. In the company of India and South Korea, China is third among the top-10 countries hosting botnet C&C infrastructure, and it has one of the highest global malware infection rates. India, Indonesia, Malaysia, Taiwan, and Japan host serious bot numbers, too. Japan takes on a significant part both as a source country and as a victim of cybercrime. Apart from being an abundant spam source, Japan is included in the top three Asian countries where EU law enforcement agencies have identified cybercriminals. On the other hand, Japan, along with South Korea and the Philippines, is the most popular country in the East and Southeast region of Asia where organized crime groups run sextortion campaigns. Vietnam, India, and China are the top Asian countries featuring spamming sources. 
Alternatively, China and Hong Kong are the most prominent locations for hosting phishing domains. From another point of view, the country code top-level domains (ccTLDs) for Thailand and Pakistan are commonly used in phishing attacks. In this region, most SEPA members reported losses from the use of skimmed cards. In fact, five (Indonesia, Philippines, South Korea, Vietnam, and Malaysia) out of the top six countries are from this region. Africa Africa remains renowned for combined and sophisticated cybercrime practices. Data from the Europol heat map report indicates that the African region holds a ransomware-as-a-service presence equivalent to the one of the European black market. Cybercriminals from Africa make profits from the same products. Nigeria is on the list of the top 10 countries compiled by the EU law enforcement agents featuring identified cybercrime perpetrators and related infrastructure. In addition, four out of the top five top-level domains used for phishing are of African origin: .cf, .za, .ga, and .ml. Australia and Oceania Australia has two critical cybercrime claims on a global level: First, the country is present in several top-10 charts in the cybersecurity industry, including bot populations, ransomware detection, and network attack originators. Second, the country-code top-level domain for the Palau Islands in Micronesia is massively used by Chinese attackers as the TLD with the second highest proportion of domains used for phishing. Cybercrime in numbers Experts agree that the past couple of years have seen digital extortion flourishing. In 2015 and 2016, cybercrime reached epic proportions. Although there is agreement about the serious rise of the threat, putting each ransomware aspect into numbers is a complex issue. Underreporting is not an issue only in academic research but also in practical case scenarios. The threat to businesses around the world is growing, because businesses keep it quiet. The scope of extortion is obscured because companies avoid reporting and pay the ransom in order to settle the issue in a conducive way. As far as this goes for corporations, it is even more relevant for public enterprises or organizations that provide a public service of any kind. Government bodies, hospitals, transportation companies, and educational institutions are increasingly targeted with digital extortion. Cybercriminals estimate that these targets are likely to pay in order to protect drops in reputation and to enable uninterrupted execution of public services. When CEOs and CIOs keep their mouths shut, relying on reported cybercrime numbers can be a tricky question. The real picture is not only what is visible in the media or via professional networking, but also what remains hidden and is dealt with discreetly by the security experts. In the second quarter of 2015, Intel Security reported an increase in ransomware attacks by 58%. Just in the first 3 months of 2016, cybercriminals amassed $209 million from digital extortion. By making businesses and authorities pay the relatively small average ransom amount of $10,000 per incident, extortionists turn out to make smart business moves. Companies are not shaken to the core by this amount. Furthermore, they choose to pay and get back to business as usual, thus eliminating further financial damages that may arise due to being out of business and losing customers. Extortionists understand the nature of ransom payment and what it means for businesses and institutions. As sound entrepreneurs, they know their market. 
Instead of setting unreasonable skyrocketing prices that may cause major panic and draw severe law enforcement action, they keep it low profile. In this way, they maintain the dark business in flow, moving from one victim to the next and evading legal measures. A peculiar perspective – Cybercrime in absolute and normalized numbers “To get an accurate picture of the security of cyberspace, cybercrime statistics need to be expressed as a proportion of the growing size of the Internet similar to the routine practice of expressing crime as a proportion of a population, i.e., 15 murders per 1,000 people per year.” This statement by Eric Jardine from the Global Commission on Internet Governance (Jardine, 2015) launched a new perspective of cybercrime statistics, one that accounts for the changing nature and size of cyberspace. The approach assumes that viewing cybercrime findings isolated from the rest of the changes in cyberspace provides a distorted view of reality. The report aimed at normalizing crime statistics and thus avoiding negative, realistic cybercrime scenarios that emerge when drawing conclusions from the limited reliability of absolute numbers. In general, there are three ways in which absolute numbers can be misinterpreted: Absolute numbers can negatively distort the real picture, while normalized numbers show whether the situation is getting better Both numbers can show that things are getting better, but normalized numbers will show that the situation is improving more quickly Both numbers can indicate that things are deteriorating, but normalized numbers will indicate that the situation is deteriorating at a slower rate than absolute numbers Additionally, the GCIG (Global Commission on Internet Governance) report includes some excellent reasoning about the nature of empirical research undertaken in the age of the Internet. While almost everyone and anything is connected to the network and data can be easily collected, most of the information is fragmented across numerous private parties. Normally, this entangles the clarity of the findings of cybercrime presence in the digital world. When data is borrowed from multiple resources and missing slots are modified with hypothetical numbers, the end result can be skewed. Keeping in mind this observation, it is crucial to emphasize that the GCIG report measured the size of cyberspace by accounting for eight key aspects: The number of active mobile broadband subscriptions The number of smartphones sold to end users The number of domains and websites The volume of total data flow The volume of mobile data flow The annual number of Google searches The Internet’s contribution to GDP It has been illustrated several times during this introduction that as cyberspace grows, so does cybercrime. To fight the menace, businesses and individuals enhance security measures and put more money into their security budgets. A recent CIGI-Ipsos (Centre for International Governance Innovation - Ipsos) survey collected data from 23,376 Internet users in 24 countries, including Australia, Brazil, Canada, China, Egypt, France, Germany, Great Britain, Hong Kong, India, Indonesia, Italy, Japan, Kenya, Mexico, Nigeria, Pakistan, Poland, South Africa, South Korea, Sweden, Tunisia, Turkey, and the United States. Survey results showed that 64% of users were more concerned about their online privacy compared to the previous year, whereas 78% were concerned about having their banking credentials hacked. 
Additionally, 77% of users were worried about cyber criminals stealing private images and messages. These perceptions led to behavioral changes: 43% of users started avoiding certain sites and applications, some 39% regularly updated passwords, while about 10% used the Internet less (CIGI-Ipsos, 2014). The GCIG report results are indicative of a heterogeneous cybersecurity picture. Although many cybersecurity aspects are deteriorating over time, there are some that are staying constant, and a surprising number are actually improving. Jardine compares cyberspace security to trends in crime rates in a specific country, operationalizing cyber attacks via 13 measures presented in the following table, as seen in Table 2 of Summary Statistics for the Security of Cyberspace (E. Jardine, GCIG Report, p. 6):

Measure | Minimum | Maximum | Mean | Standard Deviation
New Vulnerabilities | 4,814 | 6,787 | 5,749 | 781.880
Malicious Web Domains | 29,927 | 74,000 | 53,317 | 13,769.99
Zero-day Vulnerabilities | 8 | 24 | 14.85714 | 6.336
New Browser Vulnerabilities | 232 | 891 | 513 | 240.570
Mobile Vulnerabilities | 115 | 416 | 217.35 | 120.85
Botnets | 1,900,000 | 9,437,536 | 4,485,843 | 2,724,254
Web-based Attacks | 23,680,646 | 1,432,660,467 | 907,597,833 | 702,817,362
Average per Capita Cost | 188 | 214 | 202.5 | 8.893818078
Organizational Cost | 5,403,644 | 7,240,000 | 6,233,941 | 753,057
Detection and Escalation Costs | 264,280 | 455,304 | 372,272 | 83,331
Response Costs | 1,294,702 | 1,738,761 | 1,511,804 | 152,502.2526
Lost Business Costs | 3,010,000 | 4,592,214 | 3,827,732 | 782,084
Victim Notification Costs | 497,758 | 565,020 | 565,020 | 30,342

While reading the table results, an essential argument must be kept in mind. Statistics for cybercrime costs are not available worldwide. The author worked with the assumption that data about US costs of cybercrime indicates costs on a global level. For obvious reasons, however, this assumption may not be true, and many countries will have had significantly lower costs than the US. To mitigate the assumption's flaws, the author provides comparative levels of those measures. The organizational cost of data breaches in 2013 in the United States was a little less than six million US dollars, while the average number on the global level, drawn from the Ponemon Institute's annual Cost of Data Breach Study (from 2011, 2013, and 2014, via Jardine, p. 7), measured the overall cost of data breaches, including the US ones, as US$2,282,095. The conclusion is that US numbers will distort global cost findings by inflating the real costs, and will work against the paper's suggestion, which is that normalized numbers paint a rosier picture than the one provided by absolute numbers. Summary In this article, we have covered the birth and concept of cybercrime and the challenges law enforcement, academia, and security professionals face when combating its threatening behavior. We also explored the impact of cybercrime by numbers on varied geographical regions, industries, and devices. Resources for Article: Further resources on this subject: Interactive Crime Map Using Flask [article] Web Scraping with Python [article]

article-image-article-movie-recommendation
Packt
16 Jun 2017
14 min read
Save for later

Article: Movie Recommendation

This article is by Robert Layton, author of the book Learning Data Mining with Python - Second Edition, which improves upon the first edition with updated examples, more in-depth discussion, and exercises for your future development with data analytics. In this snippet from the book, we look at movie recommendation with a technique known as Affinity Analysis. (For more resources related to this topic, see here.) Affinity Analysis Affinity Analysis is the task of determining when objects are used in similar ways, rather than whether the objects themselves are similar. The data for Affinity Analysis are often described in the form of a transaction. Intuitively, this comes from a transaction at a store: determining when objects are purchased together is a way to recommend products to users that they might purchase. Other use cases for Affinity Analysis include: Fraud detection Customer segmentation Software optimization Product recommendations Affinity Analysis is usually much more exploratory than classification. At the very least, we often simply rank the results and choose the top 5 recommendations (or some other number), rather than expect the algorithm to give us a specific answer. Algorithms for Affinity Analysis A brute force solution, testing all possible combinations, is not efficient enough for real-world use. We could expect even a small store to have hundreds of items for sale, while many online stores would have thousands (or millions!). As we add more items, the time it takes to compute all rules increases significantly faster. Specifically, the total possible number of rules is 2^n - 1. Even the drastic increase in computing power couldn't possibly keep up with the increases in the number of items stored online. Therefore, we need algorithms that work smarter, as opposed to computers that work harder. The Apriori algorithm addresses the exponential problem of creating sets of items that occur frequently within a database, called frequent itemsets. Once these frequent itemsets are discovered, creating association rules is straightforward. The intuition behind Apriori is both simple and clever. First, we ensure that a rule has sufficient support within the dataset. Defining a minimum support level is the key parameter for Apriori. To build a frequent itemset, for an itemset (A, B) to have a support of at least 30, both A and B must occur at least 30 times in the database. This property extends to larger sets as well. For an itemset (A, B, C, D) to be considered frequent, the set (A, B, C) must also be frequent (as must D). Apriori discovers larger frequent itemsets by building off smaller frequent itemsets. The picture below outlines the full process: The Movie Recommendation Problem Product recommendation is big business. Online stores use it to up-sell to customers by recommending other products that they could buy. Making better recommendations leads to better sales. When online shopping sells to millions of customers every year, there is a lot of potential money to be made by selling more items to these customers. Grouplens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. They have released several versions of a movie rating dataset, which have different sizes. There is a version with 100,000 reviews, one with 1 million reviews and one with 10 million reviews.
The movie recommendation problem

Product recommendation is big business. Online stores use it to up-sell to customers by recommending other products that they could buy. Making better recommendations leads to better sales; when online shopping is selling to millions of customers every year, there is a lot of potential money to be made by selling more items to these customers. Grouplens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. They have released several versions of a movie rating dataset, which have different sizes: there is a version with 100,000 reviews, one with 1 million reviews, and one with 10 million reviews.

The datasets are available from http://grouplens.org/datasets/movielens/ and the dataset we are going to use in this article is the MovieLens 100K dataset (with 100,000 reviews). Download this dataset and unzip it in your data folder. Start a new Jupyter Notebook and type the following code:

import os
import pandas as pd

data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
ratings_filename = os.path.join(data_folder, "u.data")

Ensure that ratings_filename points to the u.data file in the unzipped folder.

Loading with pandas

The MovieLens dataset is in good shape; however, there are some changes from the default options of pandas.read_csv that we need to make. When loading the file, we set the delimiter parameter to the tab character, tell pandas not to read the first row as the header (with header=None), and set the column names with given values. Let's look at the following code:

all_ratings = pd.read_csv(ratings_filename, delimiter="\t", header=None,
                          names=["UserID", "MovieID", "Rating", "Datetime"])

While we won't use it in this article, you can properly parse the date timestamp using the following line. Dates for reviews can be an important feature in recommendation prediction, as movies that are rated together often have more similar rankings than movies rated separately. Accounting for this can improve models significantly.

all_ratings["Datetime"] = pd.to_datetime(all_ratings['Datetime'], unit='s')

Understanding the Apriori algorithm and its implementation

The goal of this article is to produce rules of the following form: if a person recommends this set of movies, they will also recommend this movie. We will also discuss extensions where a person who recommends a set of movies is likely to recommend another particular movie. To do this, we first need to determine whether a person recommends a movie. We can do this by creating a new feature, Favorable, which is True if the person gave a favorable review to a movie:

all_ratings["Favorable"] = all_ratings["Rating"] > 3

We will sample our dataset to form training data. This also helps reduce the size of the dataset that will be searched, making the Apriori algorithm run faster. We obtain all reviews from the first 200 users:

ratings = all_ratings[all_ratings['UserID'].isin(range(200))]

Next, we can create a dataset of only the favorable reviews in our sample:

favorable_ratings = ratings[ratings["Favorable"]]

We will be searching the users' favorable reviews for our itemsets. So, the next thing we need is the movies which each user has given a favorable rating. We can compute this by grouping the dataset by UserID and iterating over the movies in each group:

favorable_reviews_by_users = dict((k, frozenset(v.values))
    for k, v in favorable_ratings.groupby("UserID")["MovieID"])

In the preceding code, we stored the values as a frozenset, allowing us to quickly check whether a movie has been rated by a user. Sets are much faster than lists for this type of operation, and we will use them in later code. Finally, we can create a DataFrame that tells us how frequently each movie has been given a favorable review:

num_favorable_by_movie = ratings[["MovieID", "Favorable"]].groupby("MovieID").sum()

We can see the top five movies by running the following code:

num_favorable_by_movie.sort_values(by="Favorable", ascending=False).head()

Implementing the Apriori algorithm

On the first iteration of Apriori, the newly discovered itemsets will have a length of 2, as they will be supersets of the initial itemsets created in the first step.
On the second iteration (after applying the fourth step and going back to step 2), the newly discovered itemsets will have a length of 3. This allows us to quickly identify the newly discovered itemsets, as needed in the second step. We can store our discovered frequent itemsets in a dictionary, where the key is the length of the itemsets. This allows us to quickly access the itemsets of a given length, and therefore the most recently discovered frequent itemsets, with the help of the following code:

frequent_itemsets = {}

We also need to define the minimum support needed for an itemset to be considered frequent. This value is chosen based on the dataset, but try different values to see how that affects the result. I recommend only changing it by 10 percent at a time though, as the time the algorithm takes to run will be significantly different! Let's set a minimum support value:

min_support = 50

To implement the first step of the Apriori algorithm, we create an itemset with each movie individually and test whether the itemset is frequent. We use frozensets, as they allow us to perform faster set-based operations later on, and they can also be used as keys in our counting dictionary (normal sets cannot). Let's look at the following frozenset code:

frequent_itemsets[1] = dict((frozenset((movie_id,)), row["Favorable"])
    for movie_id, row in num_favorable_by_movie.iterrows()
    if row["Favorable"] > min_support)

We implement the second and third steps together for efficiency by creating a function that takes the newly discovered frequent itemsets, creates the supersets, and then tests whether they are frequent. First, we set up the function to perform these steps:

from collections import defaultdict

def find_frequent_itemsets(favorable_reviews_by_users, k_1_itemsets, min_support):
    counts = defaultdict(int)
    for user, reviews in favorable_reviews_by_users.items():
        for itemset in k_1_itemsets:
            if itemset.issubset(reviews):
                for other_reviewed_movie in reviews - itemset:
                    current_superset = itemset | frozenset((other_reviewed_movie,))
                    counts[current_superset] += 1
    return dict([(itemset, frequency) for itemset, frequency in counts.items()
                 if frequency >= min_support])

In keeping with our rule of thumb of reading through the data as little as possible, we iterate over the dataset once per call to this function. While this doesn't matter too much in this implementation (our dataset is relatively small compared to the average computer), a single pass is a good practice to get into for larger applications.

Let's have a look at the core of this function in detail. We iterate through each user and each of the previously discovered itemsets (stored in k_1_itemsets; note that here, k_1 means k-1), and then check whether the itemset is a subset of the current user's set of reviews. This is done by the itemset.issubset(reviews) line. If it is, it means that the user has reviewed each movie in the itemset. We can then go through each individual movie that the user has reviewed (that is not already in the itemset), create a superset by combining the itemset with the new movie, and record that we saw this superset in our counting dictionary. These are the candidate frequent itemsets for this value of k. We end our function by testing which of the candidate itemsets have enough support to be considered frequent, returning only those whose support is at least our min_support value.
This function forms the heart of our Apriori implementation, and we now create a loop that iterates over the steps of the larger algorithm, storing the new itemsets as we increase k from 1 to a maximum value. In this loop, k represents the length of the soon-to-be-discovered frequent itemsets, allowing us to access the previously discovered ones by looking in our frequent_itemsets dictionary using the key k - 1. We create the frequent itemsets and store them in our dictionary by their length. Let's look at the code:

import sys

for k in range(2, 20):
    # Generate candidates of length k, using the frequent itemsets of length k-1
    # Only store the frequent itemsets
    cur_frequent_itemsets = find_frequent_itemsets(favorable_reviews_by_users,
                                                   frequent_itemsets[k-1], min_support)
    if len(cur_frequent_itemsets) == 0:
        print("Did not find any frequent itemsets of length {}".format(k))
        sys.stdout.flush()
        break
    else:
        print("I found {} frequent itemsets of length {}".format(len(cur_frequent_itemsets), k))
        sys.stdout.flush()
        frequent_itemsets[k] = cur_frequent_itemsets

Extracting association rules

After the Apriori algorithm has completed, we have a list of frequent itemsets. These aren't exactly association rules, but they can easily be converted into such rules. For each itemset, we can generate a number of association rules by setting each movie to be the conclusion and the remaining movies as the premise:

candidate_rules = []
for itemset_length, itemset_counts in frequent_itemsets.items():
    for itemset in itemset_counts.keys():
        for conclusion in itemset:
            premise = itemset - set((conclusion,))
            candidate_rules.append((premise, conclusion))

In these rules, the first part is the set of movies forming the premise, while the number after it is the movie given as the conclusion. For example, in the first of these rules, if a reviewer recommends movie 79, they are also likely to recommend movie 258.

The process of computing confidence starts by creating dictionaries to store how many times we see the premise leading to the conclusion (a correct example of the rule) and how many times it doesn't (an incorrect example). We then iterate over all reviews and rules, working out whether the premise of the rule applies and, if it does, whether the conclusion is accurate.

correct_counts = defaultdict(int)
incorrect_counts = defaultdict(int)
for user, reviews in favorable_reviews_by_users.items():
    for candidate_rule in candidate_rules:
        premise, conclusion = candidate_rule
        if premise.issubset(reviews):
            if conclusion in reviews:
                correct_counts[candidate_rule] += 1
            else:
                incorrect_counts[candidate_rule] += 1

We then compute the confidence for each rule by dividing the correct count by the total number of times the rule was seen:

rule_confidence = {candidate_rule:
                   (correct_counts[candidate_rule] /
                    float(correct_counts[candidate_rule] + incorrect_counts[candidate_rule]))
                   for candidate_rule in candidate_rules}

Now we can print the top five rules by sorting this confidence dictionary and printing the results:

from operator import itemgetter

sorted_confidence = sorted(rule_confidence.items(), key=itemgetter(1), reverse=True)
for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    print("Rule: If a person recommends {0} they will also recommend {1}".format(premise, conclusion))
    print(" - Confidence: {0:.3f}".format(rule_confidence[(premise, conclusion)]))
    print("")

The resulting printout shows only the movie IDs, which isn't very helpful without the names of the movies as well.
The dataset came with a file called u.item, which stores the movie names and their corresponding MovieID (as well as other information, such as the genre). We can load the titles from this file using pandas. Additional information about the file and categories is available in the README file that came with the dataset. The data in the file is in CSV format, but with the data separated by the | symbol; it has no header, and the encoding is important to set. The column names were found in the README file.

movie_name_filename = os.path.join(data_folder, "u.item")
movie_name_data = pd.read_csv(movie_name_filename, delimiter="|", header=None,
                              encoding="mac-roman")
movie_name_data.columns = ["MovieID", "Title", "Release Date", "Video Release",
                           "IMDB", "<UNK>", "Action", "Adventure", "Animation",
                           "Children's", "Comedy", "Crime", "Documentary", "Drama",
                           "Fantasy", "Film-Noir", "Horror", "Musical", "Mystery",
                           "Romance", "Sci-Fi", "Thriller", "War", "Western"]

Let's also create a helper function for finding the name of a movie by its ID:

def get_movie_name(movie_id):
    title_object = movie_name_data[movie_name_data["MovieID"] == movie_id]["Title"]
    title = title_object.values[0]
    return title

We can now adjust our previous code for printing out the top rules to also include the titles:

for index in range(5):
    print("Rule #{0}".format(index + 1))
    premise, conclusion = sorted_confidence[index][0]
    premise_names = ", ".join(get_movie_name(idx) for idx in premise)
    conclusion_name = get_movie_name(conclusion)
    print("Rule: If a person recommends {0} they will also recommend {1}".format(premise_names, conclusion_name))
    print(" - Confidence: {0:.3f}".format(rule_confidence[(premise, conclusion)]))
    print("")

The result gives recommendations for movies based on previous movies that a person liked. Give it a shot and see if it matches your expectations!

Learning Data Mining with Python

In this short section of Learning Data Mining with Python, Second Edition, we performed Affinity Analysis in order to recommend movies based on a large set of reviewers. We did this in two stages. First, we found frequent itemsets in the data using the Apriori algorithm. Then, we created association rules from those itemsets. We performed training on a subset of our data in order to find the association rules, and then tested those rules on the rest of the data, a testing set. We could extend this concept to use cross-fold validation to better evaluate the rules, which would lead to a more robust evaluation of the quality of each rule.
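The held-out evaluation mentioned above isn't shown in this excerpt. As a rough sketch of how it might look, assuming the all_ratings, candidate_rules, and defaultdict names defined earlier are still in scope, you could recompute confidence on the users outside the training sample; the variable names below are otherwise invented for illustration.

# A sketch of evaluating the candidate rules on held-out users (those not in
# the first 200 used for training). Assumes all_ratings, candidate_rules and
# defaultdict from the earlier code; the other names here are illustrative.
test_ratings = all_ratings[~all_ratings['UserID'].isin(range(200))]
test_favorable = test_ratings[test_ratings["Favorable"]]
test_reviews_by_users = dict((k, frozenset(v.values))
    for k, v in test_favorable.groupby("UserID")["MovieID"])

test_correct = defaultdict(int)
test_incorrect = defaultdict(int)
for user, reviews in test_reviews_by_users.items():
    for candidate_rule in candidate_rules:
        premise, conclusion = candidate_rule
        if premise.issubset(reviews):
            if conclusion in reviews:
                test_correct[candidate_rule] += 1
            else:
                test_incorrect[candidate_rule] += 1

test_confidence = {rule: test_correct[rule] /
                   float(test_correct[rule] + test_incorrect[rule])
                   for rule in candidate_rules
                   if test_correct[rule] + test_incorrect[rule] > 0}

Rules whose confidence holds up on the held-out users are the ones worth trusting; extending this to several folds is a straightforward loop over different user ranges.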
The full book covers topics such as classification, clustering, text analysis, image recognition, TensorFlow, and big data. Each section comes with a practical real-world example, steps through the code in detail, and provides suggestions for you to continue your (machine) learning.

Summary

In this article, we looked at movie recommendation using Affinity Analysis: we found frequent itemsets with the Apriori algorithm and converted them into association rules ranked by confidence.

Resources for Article:

Further resources on this subject:

Expanding Your Data Mining Toolbox [article]
Data mining [article]
Big Data Analysis [article]

Understanding the Puppet Resources

Packt
16 Jun 2017
15 min read
A little learning is a dangerous thing, but a lot of ignorance is just as bad.
—Bob Edwards

In this article by John Arundel, the author of Puppet 4.10 Beginner's Guide - Second Edition, we'll go into the details of packages, files, and services to see how to exploit their power to the full. Along the way, we'll talk about the following topics:

Managing files, directories, and trees
Ownership and permissions
Symbolic links
Installing and uninstalling packages
Specific and latest versions of packages
Installing Ruby gems
Services: hasstatus and pattern
Services: hasrestart, restart, stop, and start

(For more resources related to this topic, see here.)

Files

Puppet can manage files on the server using the file resource, and the following example sets the contents of a file to a particular string using the content attribute (file_hello.pp):

file { '/tmp/hello.txt':
  content => "hello, world\n",
}

Managing whole files

While it's useful to be able to set the contents of a file to a short text string, most files we're likely to want to manage will be too large to include directly in our Puppet manifests. Ideally, we would put a copy of the file in the Puppet repo, and have Puppet simply copy it to the desired place in the filesystem. The source attribute (file_source.pp) does exactly that:

file { '/etc/motd':
  source => '/vagrant/examples/files/motd.txt',
}

To try this example with your Vagrant box, run the following commands:

sudo puppet apply /vagrant/examples/file_source.pp
cat /etc/motd
The best software in the world only sucks. The worst software is significantly worse than that. -Luke Kanies

To run such examples, just apply them using sudo puppet apply as shown in the preceding example. Why do we have to run sudo puppet apply instead of just puppet apply? Puppet has the permissions of the user who runs it, so if Puppet needs to modify a file owned by root, it must be run with root's permissions (which is what sudo does). You will usually run Puppet as root because it needs those permissions to do things such as installing packages and modifying config files owned by root.

The value of the source attribute can be a path to a file on the server, as here, or an HTTP URL, as shown in the following example (file_http.pp):

file { '/tmp/README.md':
  source => 'https://raw.githubusercontent.com/puppetlabs/puppet/master/README.md',
}

Although this is a handy feature, bear in mind that every time you add an external dependency like this to your Puppet manifest, you're adding a potential point of failure. Wherever you can, use a local copy of such a file instead of having Puppet fetch it remotely every time. This particularly applies to software which needs to be built from a tarball downloaded from a website. If possible, download the tarball and serve it from a local web server or file server. If this isn't practical, using a caching proxy server can help save time and bandwidth when you're building a large number of machines.

Ownership

On Unix-like systems, files are associated with an owner, a group, and a set of permissions to read, write, or execute the file.
Since we normally run Puppet with the permissions of the root user (via sudo), the files Puppet manages will be owned by that user:

ls -l /etc/motd
-rw-r--r-- 1 root root 109 Aug 31 04:03 /etc/motd

Often, this is just fine, but if we need the file to belong to another user (for example, if that user needs to be able to write to the file), we can express this by setting the owner attribute (file_owner.pp):

file { '/etc/owned_by_vagrant':
  ensure => present,
  owner  => 'vagrant',
}

Run the following command:

ls -l /etc/owned_by_vagrant
-rw-r--r-- 1 vagrant root 0 Aug 31 04:48 /etc/owned_by_vagrant

You can see that Puppet has created the file and its owner has been set to vagrant. You can also set the group ownership of the file using the group attribute (file_group.pp):

file { '/etc/owned_by_vagrant':
  ensure => present,
  owner  => 'vagrant',
  group  => 'vagrant',
}

Run the following command:

ls -l /etc/owned_by_vagrant
-rw-r--r-- 1 vagrant vagrant 0 Aug 31 04:48 /etc/owned_by_vagrant

This time, we didn't specify either a content or a source attribute for the file, but simply ensure => present. In this case, Puppet will create a file of zero size (useful, for example, if you want to make sure the file exists and is writeable, but doesn't need to have any contents yet).

Permissions

Files on Unix-like systems have an associated mode, which determines access permissions for the file. It governs read, write, and execute permissions for the file's owner, any user in the file's group, and other users. Puppet supports setting permissions on files using the mode attribute. This takes an octal value, with each digit representing the permissions for owner, group, and other, in that order. In the following example, we use the mode attribute to set a mode of 0644 (read and write for owner, read-only for group, read-only for other) on a file (file_mode.pp):

file { '/etc/owned_by_vagrant':
  ensure => present,
  owner  => 'vagrant',
  mode   => '0644',
}

This will be quite familiar to experienced system administrators, as the octal values for file permissions are exactly the same as those understood by the Unix chmod command. For more information, run the man chmod command.

Directories

Creating or managing permissions on a directory is a common task, and Puppet uses the file resource to do this too. If the value of the ensure attribute is directory, the file will be a directory (file_directory.pp):

file { '/etc/config_dir':
  ensure => directory,
}

As with regular files, you can use the owner, group, and mode attributes to control access to directories.

Trees of files

Puppet can copy a single file to the server, but what about a whole directory of files, possibly including subdirectories (known as a file tree)? The recurse attribute will take care of this (file_tree.pp):

file { '/etc/config_dir':
  source  => '/vagrant/examples/files/config_dir',
  recurse => true,
}

Run the following command:

ls /etc/config_dir/
1 2 3

When the recurse attribute is true, Puppet will copy all the files and directories (and their subdirectories) in the source directory (/vagrant/examples/files/config_dir in this example) to the target directory (/etc/config_dir). If the target directory already exists and has files in it, Puppet will not interfere with them, but you can change this behavior using the purge attribute. If purge is true, Puppet will delete any files and directories in the target directory which are not present in the source directory. Use this attribute with care!
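A purge example along these lines is a sketch of my own rather than one of the book's numbered examples; the resource below reuses the same config_dir paths shown above.

# Hypothetical sketch: with both recurse and purge set, /etc/config_dir becomes
# an exact mirror of the source tree, and any stray files in it are removed.
file { '/etc/config_dir':
  ensure  => directory,
  source  => '/vagrant/examples/files/config_dir',
  recurse => true,
  purge   => true,
}

Adding force => true would also allow Puppet to purge subdirectories, which makes the operation even more destructive, so try it on a scratch directory first.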
Symbolic links

Another common requirement for managing files is to create or modify a symbolic link (known as a symlink for short). You can have Puppet do this by setting ensure => link on the file resource and specifying the target attribute (file_symlink.pp):

file { '/etc/this_is_a_link':
  ensure => link,
  target => '/etc/motd',
}

Run the following command:

ls -l /etc/this_is_a_link
lrwxrwxrwx 1 root root 9 Aug 31 05:05 /etc/this_is_a_link -> /etc/motd

Packages

To install a package, use the package resource, and this is all you need to do with most packages. However, the package resource has a few extra features which may be useful.

Uninstalling packages

The ensure attribute normally takes the value installed in order to install a package, but if you specify absent instead, Puppet will remove the package if it happens to be installed. Otherwise, it will take no action. The following example will remove the apparmor package if it's installed (package_remove.pp):

package { 'apparmor':
  ensure => absent,
}

Installing specific versions

If there are multiple versions of a package available to the system's package manager, specifying ensure => installed will cause Puppet to install the default version (usually the latest). But if you need a specific version, you can specify that version string as the value of ensure, and Puppet will install that version (package_version.pp):

package { 'openssl':
  ensure => '1.0.2g-1ubuntu4.2',
}

It's a good idea to specify an exact version whenever you manage packages with Puppet, so that all servers will get the same version of a given package. Otherwise, if you use ensure => installed, they will just get whatever version was current at the time they were built, leading to a situation where different machines have different package versions. When a newer version of the package is released and you decide it's time to upgrade to it, you can update the version string specified in the Puppet manifest and Puppet will upgrade the package everywhere.

Installing the latest version

On the other hand, if you specify ensure => latest for a package, Puppet will make sure that the latest available version is installed every time it runs. When a new version of the package becomes available, it will be installed automatically on the next Puppet run. This is not generally what you want when using a package repository that's not under your control (for example, the main Ubuntu repository). It means that packages will be upgraded at unexpected times, which may break your application (or at least result in unplanned downtime). A better strategy is to tell Puppet to install a specific version that you know works, and test upgrades in a controlled environment before rolling them out to production. If you maintain your own package repository and control the release of new packages to it, ensure => latest can be a useful feature: Puppet will update a package as soon as you push a new version to the repo. If you are relying on upstream repositories, such as the Ubuntu repositories, it's better to tell Puppet to install a specific version and upgrade that as necessary.
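The book's snippets above only show absent and pinned versions; as a sketch of the latest case, the following resource assumes a hypothetical package named myapp published to an internal repository that you control.

# Hypothetical sketch: 'myapp' is assumed to come from an internal repo you
# control, so tracking the latest published version is a deliberate choice.
package { 'myapp':
  ensure => latest,
}

Because every Puppet run re-checks the available version, pushing a new myapp build to the repository is all it takes to roll it out to the whole fleet on the next run.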
Installing Ruby gems

Although the package resource is most often used to install packages using the normal system package manager (in the case of Ubuntu, that's APT), it can install other kinds of packages as well. Library packages for the Ruby programming language are known as gems. Puppet can install Ruby gems for you using the provider => gem attribute (package_gem.pp):

package { 'ruby':
  ensure => installed,
}

package { 'bundler':
  ensure   => installed,
  provider => gem,
}

In the preceding code, bundler is a Ruby gem, and therefore we have to specify provider => gem for this package so that Puppet doesn't think it's a standard system package and try to install it via APT. Since the gem provider is not available unless Ruby is installed, we install the ruby package first, and then the bundler gem.

Installing gems in Puppet's context

Puppet itself is written at least partly in Ruby, and makes use of several Ruby gems. To avoid any conflicts with the version of Ruby and gems which the server might need for other applications, Puppet packages its own version of Ruby and associated gems under the /opt/puppetlabs directory. This means you can install (or remove) whichever version of Ruby you like, and Puppet will not be affected.

However, if you need to install a gem to extend Puppet's capabilities in some way, then doing it with a package resource and provider => gem won't work. That is, the gem will be installed, but only in the system Ruby context, and it won't be visible to Puppet. Fortunately, the puppet_gem provider is available for exactly this purpose. When you use this provider, the gem will be installed in Puppet's context (and, naturally, won't be visible in the system context). The following example demonstrates how to use this provider (package_puppet_gem.pp):

package { 'hiera-eyaml':
  ensure   => installed,
  provider => puppet_gem,
}

To see the gems installed in Puppet's context, use Puppet's own version of the gem command, with the following path:

/opt/puppetlabs/puppet/bin/gem list

Services

Although services are implemented in a number of varied and complicated ways at the operating system level, Puppet does a good job of abstracting away most of this with the service resource, exposing just the two attributes of services which really matter: whether they're running (ensure) and whether they start at boot time (enable). However, you'll occasionally encounter services that don't play well with Puppet, for a variety of reasons. Sometimes, Puppet is unable to detect that the service is already running, and keeps trying to start it. At other times, Puppet may not be able to properly restart the service when a dependent resource changes. There are a few useful attributes for service resources that can help resolve these problems.

The hasstatus attribute

When a service resource has the ensure => running attribute, Puppet needs to be able to check whether the service is, in fact, running. The way it does this depends on the underlying operating system, but on Ubuntu 16 and later, for example, it runs systemctl is-active SERVICE. If the service is packaged to work with systemd, that should be just fine, but in many cases, particularly with older software, it may not respond properly. If you find that Puppet keeps attempting to start the service on every Puppet run, even though the service is running, it may be that Puppet's default service status detection isn't working. In this case, you can specify the hasstatus => false attribute for the service (service_hasstatus.pp):

service { 'ntp':
  ensure    => running,
  enable    => true,
  hasstatus => false,
}

When hasstatus is false, Puppet knows not to try to check the service status using the default system service management command, and instead will look in the process table for a running process that matches the name of the service.
If it finds one, it will infer that the service is running and take no further action.

The pattern attribute

Sometimes, when using hasstatus => false, the service name as defined in Puppet doesn't actually appear in the process table, because the command that provides the service has a different name. If this is the case, you can tell Puppet exactly what to look for using the pattern attribute (service_pattern.pp):

service { 'ntp':
  ensure    => running,
  enable    => true,
  hasstatus => false,
  pattern   => 'ntpd',
}

If hasstatus is false and pattern is specified, Puppet will search for the value of pattern in the process table to determine whether or not the service is running. To find the pattern you need, you can use the ps command to see the list of running processes:

ps ax

The hasrestart and restart attributes

When a service is notified (for example, if a file resource uses the notify attribute to tell the service its config file has changed), Puppet's default behavior is to stop the service, then start it again. This usually works, but many services implement a restart command in their management scripts. If this is available, it's usually a good idea to use it, as it may be faster or safer than stopping and starting the service. Some services take a while to shut down properly when stopped, for example, and Puppet may not wait long enough before trying to restart them, so you can end up with the service not running at all. If you specify hasrestart => true for a service, then Puppet will try to send a restart command to it, using whatever service management command is appropriate (systemctl, for example). The following example shows the use of hasrestart (service_hasrestart.pp):

service { 'ntp':
  ensure     => running,
  enable     => true,
  hasrestart => true,
}

To further complicate things, the default system service restart command may not work, or you may need to take certain special actions when the service is restarted (disabling monitoring notifications, for example). You can specify any restart command you like for the service using the restart attribute (service_custom_restart.pp):

service { 'ntp':
  ensure  => running,
  enable  => true,
  restart => '/bin/echo Restarting >>/tmp/debug.log && systemctl restart ntp',
}

In this example, the restart command writes a message to a log file before restarting the service in the usual way, but it could, of course, do anything you need it to. In the extremely rare event that the service cannot be stopped or started using the default service management command, Puppet also provides the stop and start attributes so that you can specify custom commands to stop and start the service, in just the same way as with the restart attribute. If you need to use either of these, though, it's probably safe to say that you're having a bad day.

Summary

In this article, we explored Puppet's file resource in detail, covering file sources, ownership, permissions, directories, symbolic links, and file trees. You learned how to manage packages by installing specific versions, or the latest version, and how to uninstall packages. We also covered Ruby gems, both in the system context and in Puppet's internal context. We looked at service resources, including the hasstatus, pattern, hasrestart, restart, stop, and start attributes.

Resources for Article:

Further resources on this subject:

My First Puppet Module [article]
Puppet Language and Style [article]
External Tools and the Puppet Ecosystem [article]

Exploring Functions

Packt
16 Jun 2017
12 min read
In this article by Marius Bancila, author of the book Modern C++ Programming Cookbook, we cover the following recipes:

Defaulted and deleted functions
Using lambdas with standard algorithms

(For more resources related to this topic, see here.)

Defaulted and deleted functions

In C++, classes have special members (constructors, destructor, and operators) that may be either implemented by default by the compiler or supplied by the developer. However, the rules for what can be default-implemented are a bit complicated and can lead to problems. On the other hand, developers sometimes want to prevent objects from being copied, moved, or constructed in a particular way. That is possible by implementing different tricks using these special members. The C++11 standard has simplified many of these by allowing functions to be deleted or defaulted in the manner we will see below.

Getting started

For this recipe, you need to know what special member functions are, and what copyable and movable mean.

How to do it...

Use the following syntax to specify how functions should be handled:

To default a function, use =default instead of the function body. Only special class member functions that have defaults can be defaulted.

struct foo
{
  foo() = default;
};

To delete a function, use =delete instead of the function body. Any function, including non-member functions, can be deleted.

struct foo
{
  foo(foo const &) = delete;
};

void func(int) = delete;

Use defaulted and deleted functions to achieve various design goals, such as the following examples:

To implement a class that is not copyable, and implicitly not movable, declare the copy operations as deleted.

class foo_not_copiable
{
public:
  foo_not_copiable() = default;

  foo_not_copiable(foo_not_copiable const &) = delete;
  foo_not_copiable& operator=(foo_not_copiable const&) = delete;
};

To implement a class that is not copyable, but is movable, declare the copy operations as deleted and explicitly implement the move operations (and provide any additional constructors that are needed).

class data_wrapper
{
  Data* data;
public:
  data_wrapper(Data* d = nullptr) : data(d) {}
  ~data_wrapper() { delete data; }

  data_wrapper(data_wrapper const&) = delete;
  data_wrapper& operator=(data_wrapper const &) = delete;

  data_wrapper(data_wrapper&& o) : data(std::move(o.data))
  {
    o.data = nullptr;
  }

  data_wrapper& operator=(data_wrapper&& o)
  {
    if (this != &o)
    {
      delete data;
      data = std::move(o.data);
      o.data = nullptr;
    }
    return *this;
  }
};

To ensure a function is called only with objects of a specific type, and perhaps prevent type promotion, provide deleted overloads for the function (the example below with free functions can also be applied to any class member functions).

template <typename T>
void run(T val) = delete;

void run(long val) {} // can only be called with long integers

How it works...

A class has several special members that can be implemented by default by the compiler. These are the default constructor, copy constructor, move constructor, copy assignment, move assignment, and destructor. If you don't implement them, then the compiler does it, so that instances of a class can be created, moved, copied, and destructed. However, if you explicitly provide one or more, then the compiler will not generate the others, according to the following rules:

If a user-defined constructor exists, the default constructor is not generated by default.

If a user-defined virtual destructor exists, the default constructor is not generated by default.
If a user-defined move constructor or move assignment operator exists, then the copy constructor and copy assignment operator are not generated by default.

If a user-defined copy constructor, move constructor, copy assignment operator, move assignment operator, or destructor exists, then the move constructor and move assignment operator are not generated by default.

If a user-defined copy constructor or destructor exists, then the copy assignment operator is generated by default.

If a user-defined copy assignment operator or destructor exists, then the copy constructor is generated by default.

Note that the last two are deprecated rules and may no longer be supported by your compiler.

Sometimes developers need to provide empty implementations of these special members, or hide them, in order to prevent instances of the class from being constructed in a specific manner. A typical example is a class that is not supposed to be copyable. The classical pattern for this is to provide a default constructor and hide the copy constructor and copy assignment operators. While this works, the explicitly defined default constructor means the class is no longer considered trivial, and therefore no longer a POD type (one that can be constructed with reinterpret_cast). The modern alternative is to use deleted functions, as shown in the previous section.

When the compiler encounters =default in the definition of a function, it will provide the default implementation. The rules for special member functions mentioned earlier still apply. Functions can be declared =default outside the body of a class if and only if they are inlined:

class foo
{
public:
  foo() = default;

  inline foo& operator=(foo const &);
};

inline foo& foo::operator=(foo const &) = default;

When the compiler encounters =delete in the definition of a function, it will prevent the calling of the function. However, the function is still considered during overload resolution, and only if the deleted function is the best match does the compiler generate an error. For example, given the previously defined overloads for the function run(), only calls with long integers are possible. Calls with arguments of any other type, including int, for which an automatic type promotion to long exists, would determine a deleted overload to be considered the best match, and therefore the compiler will generate an error:

run(42);  // error, matches a deleted overload
run(42L); // OK, long integer arguments are allowed

Note that previously declared functions cannot be deleted, as the =delete definition must be the first declaration in a translation unit:

void forward_declared_function();
// ...
void forward_declared_function() = delete; // error

The rule of thumb (also known as The Rule of Five) for class special member functions is: if you explicitly define any of the copy constructor, move constructor, copy assignment, move assignment, or destructor, then you must either explicitly define or default all of them. A sketch illustrating this rule follows.
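As a quick illustration of the Rule of Five (my own sketch, not a recipe from the book), here is a small resource-owning class, with invented names, that explicitly defines all five special members because it has a user-defined destructor.

#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical example: because ~buffer() is user-defined, the other four
// special members are defined explicitly as well, following the Rule of Five.
class buffer
{
  std::size_t size_ = 0;
  int*        data_ = nullptr;
public:
  explicit buffer(std::size_t size) : size_(size), data_(new int[size]{}) {}
  ~buffer() { delete[] data_; }

  buffer(buffer const& other) : size_(other.size_), data_(new int[other.size_])
  {
    std::copy(other.data_, other.data_ + size_, data_);
  }

  buffer& operator=(buffer const& other)
  {
    if (this != &other)
    {
      buffer tmp(other); // copy-and-swap keeps assignment exception-safe
      std::swap(size_, tmp.size_);
      std::swap(data_, tmp.data_);
    }
    return *this;
  }

  buffer(buffer&& other) noexcept
    : size_(std::exchange(other.size_, 0)),
      data_(std::exchange(other.data_, nullptr)) {}

  buffer& operator=(buffer&& other) noexcept
  {
    if (this != &other)
    {
      delete[] data_;
      size_ = std::exchange(other.size_, 0);
      data_ = std::exchange(other.data_, nullptr);
    }
    return *this;
  }
};

In practice, holding the array in a std::vector or std::unique_ptr would let you default all five members instead, which is usually the better design.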
Using lambdas with standard algorithms

One of the most important modern features of C++ is lambda expressions, also referred to as lambda functions or simply lambdas. Lambda expressions enable us to define anonymous function objects that can capture variables in scope and be invoked or passed as arguments to functions. Lambdas are useful for many purposes, and in this recipe, we will see how to use them with standard algorithms.

Getting ready

In this recipe, we discuss standard algorithms that take an argument that is a function or predicate applied to the elements they iterate through. You need to know what unary and binary functions are, and what predicates and comparison functions are. You also need to be familiar with function objects, because lambda expressions are syntactic sugar for function objects.

How to do it...

Prefer to use lambda expressions to pass callbacks to standard algorithms instead of functions or function objects:

Define anonymous lambda expressions in the place of the call if you only need to use the lambda in a single place.

auto numbers = std::vector<int>{ 0, 2, -3, 5, -1, 6, 8, -4, 9 };
auto positives = std::count_if(
  std::begin(numbers), std::end(numbers),
  [](int const n) {return n > 0; });

Define a named lambda, that is, one assigned to a variable (usually with the auto specifier for the type), if you need to call the lambda in multiple places.

auto ispositive = [](int const n) {return n > 0; };
auto positives = std::count_if(
  std::begin(numbers), std::end(numbers),
  ispositive);

Use generic lambda expressions if you need lambdas that only differ in their argument types (available since C++14).

auto positives = std::count_if(
  std::begin(numbers), std::end(numbers),
  [](auto const n) {return n > 0; });

How it works...

The non-generic lambda expression shown above takes a constant integer and returns true if it is greater than 0, or false otherwise. The compiler defines an unnamed function object with the call operator having the signature of the lambda expression:

struct __lambda_name__
{
  bool operator()(int const n) const { return n > 0; }
};

The way the unnamed function object is defined by the compiler depends on the way we define the lambda expression, which can capture variables, use the mutable specifier or exception specifications, or have a trailing return type. The __lambda_name__ function object shown earlier is actually a simplification of what the compiler generates, because it also defines a default copy and move constructor, a default destructor, and a deleted assignment operator. It must be well understood that the lambda expression is actually a class. In order to call it, the compiler needs to instantiate an object of the class. The object instantiated from a lambda expression is called a lambda closure.

In the next example, we want to count the number of elements in a range that are greater than or equal to 5 and less than or equal to 10. The lambda expression, in this case, will look like this:

auto numbers = std::vector<int>{ 0, 2, -3, 5, -1, 6, 8, -4, 9 };
auto start{ 5 };
auto end{ 10 };
auto inrange = std::count_if(
  std::begin(numbers), std::end(numbers),
  [start,end](int const n) {return start <= n && n <= end;});

This lambda captures two variables, start and end, by copy (that is, by value). The resulting unnamed function object created by the compiler looks very much like the one we defined above.
With the default and deleted special members mentioned earlier, the class looks like this:

class __lambda_name_2__
{
  int start_;
  int end_;
public:
  explicit __lambda_name_2__(int const start, int const end) :
    start_(start), end_(end)
  {}

  __lambda_name_2__(const __lambda_name_2__&) = default;
  __lambda_name_2__(__lambda_name_2__&&) = default;
  __lambda_name_2__& operator=(const __lambda_name_2__&) = delete;
  ~__lambda_name_2__() = default;

  bool operator() (int const n) const
  {
    return start_ <= n && n <= end_;
  }
};

The lambda expression can capture variables by copy (or value) or by reference, and different combinations of the two are possible. However, it is not possible to capture a variable multiple times, and it is only possible to have & or = at the beginning of the capture list. A lambda can only capture variables from an enclosing function scope. It cannot capture variables with static storage duration (that is, variables declared in namespace scope or with the static or extern specifier). The following list shows the various combinations of lambda capture semantics:

[](){}          Does not capture anything
[&](){}         Captures everything by reference
[=](){}         Captures everything by copy
[&x](){}        Captures only x by reference
[x](){}         Captures only x by copy
[&x...](){}     Captures pack expansion x by reference
[x...](){}      Captures pack expansion x by copy
[&, x](){}      Captures everything by reference, except for x, which is captured by copy
[=, &x](){}     Captures everything by copy, except for x, which is captured by reference
[&, this](){}   Captures everything by reference, except for the pointer this, which is captured by copy (this is always captured by copy)
[x, x](){}      Error: x is captured twice
[&, &x](){}     Error: everything is already captured by reference; cannot specify again to capture x by reference
[=, =x](){}     Error: everything is already captured by copy; cannot specify again to capture x by copy
[&this](){}     Error: the pointer this is always captured by copy
[&, =](){}      Error: cannot capture everything both by copy and by reference

The general form of a lambda expression, as of C++17, looks like this:

[capture-list](params) mutable constexpr exception attr -> ret { body }

All parts shown in this syntax are actually optional, except for the capture list (which can, however, be empty) and the body (which can also be empty). The parameter list can be omitted if no parameters are needed. The return type does not need to be specified, as the compiler can infer it from the type of the returned expression. The mutable specifier (which tells the compiler the lambda can actually modify variables captured by copy), the constexpr specifier (which tells the compiler to generate a constexpr call operator), and the exception specifiers and attributes are all optional. The simplest possible lambda expression is []{}, though it is often written as [](){}.

There's more...

There are cases when lambda expressions only differ in the type of their arguments. In this case, the lambdas can be written in a generic way, just like templates, but using the auto specifier for the type parameters (no template syntax is involved).
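To make that concrete, here is a small sketch of my own (not a recipe from the book) that reuses a single generic lambda with containers of different element types; the data values are invented.

#include <algorithm>
#include <iostream>
#include <vector>

int main()
{
  // One generic lambda: the compiler generates a call operator template,
  // so the same closure works for both int and double elements.
  auto is_positive = [](auto const value) { return value > 0; };

  std::vector<int>    ints{ 0, 2, -3, 5, -1 };
  std::vector<double> doubles{ -1.5, 2.5, 3.0, -0.1 };

  auto positive_ints    = std::count_if(std::begin(ints), std::end(ints), is_positive);
  auto positive_doubles = std::count_if(std::begin(doubles), std::end(doubles), is_positive);

  std::cout << positive_ints << ' ' << positive_doubles << '\n'; // prints 2 2
  return 0;
}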
Summary

Functions are a fundamental concept in programming; regardless of the topic we discuss, we end up writing functions. This article contains recipes related to functions, covering modern language features related to functions and callable objects.

Resources for Article:

Further resources on this subject:

Understanding the Dependencies of a C++ Application [article]
Boost.Asio C++ Network Programming [article]
Application Development in Visual C++ - The Tetris Application [article]


Streaming and the Actor Model – Akka Streams!

Packt
16 Jun 2017
9 min read
In this article by Piyush Mishra, author of the Akka Cookbook, we will learn about streaming and the actor model with Akka Streams. (For more resources related to this topic, see here.)

Akka is a popular toolkit designed to ease the pain of dealing with concurrency and distributed systems. It provides easy APIs to create reactive, fault-tolerant, scalable, and concurrent applications, thanks to the actor model. The actor model was introduced by Carl Hewitt in the 70s, and it has been successfully implemented by different programming languages, frameworks, and toolkits, such as Erlang or Akka.

The concepts around the actor model are simple. All actors are created inside an actor system. Every actor has a unique address within the actor system, a mailbox, a state (in the case of a stateful actor), and a behavior. The only way of interacting with an actor is by sending messages to it using its address. Messages will be stored in the mailbox until the actor is ready to process them. Once it is ready, the actor will pick one message at a time and execute its behavior against the message. At this point, the actor might update its state, create new actors, or send messages to other already-created actors. Akka provides all this and many other features, thanks to the vast ecosystem around the core component, such as Akka Cluster, Akka Cluster Sharding, Akka Persistence, Akka HTTP, and Akka Streams. We will dig a bit more into the latter one.

Streaming frameworks and toolkits have been gaining momentum lately. This is motivated by the massive number of connected devices that are constantly generating new data that needs to be consumed, processed, analyzed, and stored. This is basically the idea of the Internet of Things (IoT), or the newer term Internet of Everything. Some time ago, the Akka team decided that they could build a streaming library leveraging all the power of Akka and the actor model: Akka Streams.

Akka Streams uses Akka actors as its foundation to provide a set of easy APIs to create back-pressured streams. Each stream consists of one or more sources, zero or more flows, and one or more sinks. All these different modules are also known as stages in Akka Streams terminology. The best way to understand how a stream works is to think about it as a graph. Each stage (source, flow, or sink) has zero or more input ports and zero or more output ports. For instance, a source has zero input ports and one output port. A flow has one input port and one output port. And finally, a sink has one input port and zero output ports. To have a runnable stream, we need to ensure that all ports of all our stages are connected. Only then can we run our stream to process some elements.

Akka Streams provides a rich set of predefined stages to cover the most common streaming functions. However, if a use case requires a new custom stage, it is also possible to create it from scratch or extend an existing one. The full list of predefined stages can be found at http://doc.akka.io/docs/akka/current/scala/stream/stages-overview.html.

Now that we know about the different components Akka Streams provides, it is a good moment to introduce the actor materializer. As we mentioned earlier, Akka is the foundation of Akka Streams. This means the code you define in the high-level API is eventually run inside an actor. The actor materializer is the entity responsible for creating these low-level actors. By default, all processing stages get created within the same actor. This means only one element at a time can be processed by your stream. It is also possible to indicate that you want a different actor per stage, and therefore have the possibility to process multiple elements at the same time. You can indicate this to the materializer by calling the async method on the relevant stage. There are also asynchronous predefined stages. For performance reasons, Akka Streams batches messages when pushing them to the next stage to reduce overhead.
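As a small sketch of what that looks like in practice (my own illustration, not a recipe from the book), the async call below marks an asynchronous boundary between the capitalizing flow and the sink, so they run in separate actors; the example data is invented.

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}

object AsyncBoundaryExample extends App {
  implicit val actorSystem = ActorSystem()
  implicit val actorMaterializer = ActorMaterializer()

  // .async inserts an asynchronous boundary: the flow and the sink run in
  // separate actors and can work on different elements at the same time.
  val capitalizer = Flow[String].map(_.capitalize)

  Source(List("hello", "from", "akka", "streams!"))
    .via(capitalizer.async)
    .to(Sink.foreach(actorSystem.log.info))
    .run()
}

Ordering is still preserved across an asynchronous boundary; the mapAsync and mapAsyncUnordered examples further down show how to also fan work out to a pool of actors.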
After this quick introduction, let's start putting together some code to create and run a stream. We will use the Scala build tool (famously known as sbt) to retrieve the Akka dependencies and run our code. To begin with, we need a build.sbt file with the following content:

name := "akka-async-streams"

version := "1.0"

scalaVersion := "2.11.7"

libraryDependencies += "com.typesafe.akka" % "akka-actor_2.11" % "2.4.17"
libraryDependencies += "com.typesafe.akka" % "akka-stream_2.11" % "2.4.17"

Once we have the file ready, we need to run sbt update to let sbt fetch the required dependencies. Our first stream will push a list of words, capitalize each of them, and log the resulting values. This can easily be achieved by doing the following:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()

val stream = Source(List("hello","from","akka","streams!"))
  .map(_.capitalize)
  .to(Sink.foreach(actorSystem.log.info))

stream.run()

In this small code snippet, we can see how our stream has one source with a list of strings, one flow that capitalizes each string, and finally one sink logging the result. If we run our code, we should see the following output:

[INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Hello
[INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] From
[INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Akka
[INFO] [default-akka.actor.default-dispatcher-3] [akka.actor.ActorSystemImpl(default)] Streams!

The execution of this stream happens synchronously and in order. In our next example, we will build the same stream; however, we can see how all stages are modular:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()

val source = Source(List("hello","from","akka","streams!"))
val sink = Sink.foreach(actorSystem.log.info)
val capitalizer = Flow[String].map(_.capitalize)

val stream = source.via(capitalizer).to(sink)
stream.run()

In this code snippet, we can see how stages can be treated as immutable modules. We see that we can use the via helper method to provide a flow stage in a stream. This stream still runs synchronously. To run it asynchronously, we can take advantage of the mapAsync flow. For this, let's create a small actor that will do the capitalization for us:

class Capitalizer extends Actor with ActorLogging {
  def receive = {
    case str : String =>
      log.info(s"Capitalizing $str")
      sender ! str.capitalize
  }
}

Once we have our actor defined, we can set up our asynchronous stream. For this, we will create a round-robin pool of capitalizer actors. Then, we will use the ask pattern to send a message to an actor and wait for a response. This happens using the ? operator.
The stream definition will look something like this:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()
implicit val askTimeout = Timeout(5 seconds)

val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10)))

val source = Source(List("hello","from","akka","streams!"))
val sink = Sink.foreach(actorSystem.log.info)
val flow = Flow[String].mapAsync(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String])

val stream = source.via(flow).to(sink)
stream.run()

If we execute this small piece of code, we can see something similar to the following:

[INFO] [default-akka.actor.default-dispatcher-16] [akka://default/user/$a/$a] Capitalizing hello
[INFO] [default-akka.actor.default-dispatcher-15] [akka://default/user/$a/$b] Capitalizing from
[INFO] [default-akka.actor.default-dispatcher-6] [akka://default/user/$a/$c] Capitalizing akka
[INFO] [default-akka.actor.default-dispatcher-14] [akka://default/user/$a/$d] Capitalizing streams!
[INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Hello
[INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] From
[INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Akka
[INFO] [default-akka.actor.default-dispatcher-14] [akka.actor.ActorSystemImpl(default)] Streams!

We can see how each word is processed by a different capitalizer actor ($a/$b/$c/$d) and by different threads (default-dispatcher 16, 15, 6, and 14). Even though these executions happen asynchronously in the pool of actors, the stream still maintains the order of the elements. If we do not need to maintain order and we are looking for a faster approach, where an element can be pushed to the next stage in the stream as soon as it is ready, we can use mapAsyncUnordered:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()
implicit val askTimeout = Timeout(5 seconds)

val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10)))

val source = Source(List("hello","from","akka","streams!"))
val sink = Sink.foreach(actorSystem.log.info)
val flow = Flow[String].mapAsyncUnordered(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String])

val stream = source.via(flow).to(sink)
stream.run()

When running this code, we can see that the order is not preserved and the capitalized words arrive at the sink in a different order every time we execute our code. Consider the following example:

[INFO] [default-akka.actor.default-dispatcher-10] [akka://default/user/$a/$b] Capitalizing from
[INFO] [default-akka.actor.default-dispatcher-4] [akka://default/user/$a/$d] Capitalizing streams!
[INFO] [default-akka.actor.default-dispatcher-13] [akka://default/user/$a/$c] Capitalizing akka
[INFO] [default-akka.actor.default-dispatcher-14] [akka://default/user/$a/$a] Capitalizing hello
[INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Akka
[INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] From
[INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Hello
[INFO] [default-akka.actor.default-dispatcher-12] [akka.actor.ActorSystemImpl(default)] Streams!

Akka Streams also provides a graph DSL to define your stream.
In this DSL, it is possible to connect stages just by using the ~> operator:

implicit val actorSystem = ActorSystem()
implicit val actorMaterializer = ActorMaterializer()
implicit val askTimeout = Timeout(5 seconds)

val capitalizer = actorSystem.actorOf(Props[Capitalizer].withRouter(RoundRobinPool(10)))

val graph = RunnableGraph.fromGraph(GraphDSL.create() { implicit b =>
  import GraphDSL.Implicits._

  val source = Source(List("hello","from","akka","streams!"))
  val sink = Sink.foreach(actorSystem.log.info)
  val flow = Flow[String].mapAsyncUnordered(parallelism = 5)(elem => (capitalizer ? elem).mapTo[String])

  source ~> flow ~> sink

  ClosedShape
})

graph.run()

These code snippets show only a few of the many features available in the Akka Streams framework. Actors can be seamlessly integrated with streams, which brings a whole new set of possibilities for processing things in a streaming fashion. We have seen how we can preserve or relax the ordering of elements, either synchronously or asynchronously. In addition, we saw how to use the graph DSL to define our stream.

Summary

In this article, we covered the concept of the actor model and the core components of Akka. We also described the stages in Akka Streams and created example code for streams. If you want to learn more about Akka, Akka Streams, and all the other modules around them, you can find useful and handy recipes like these in the Akka Cookbook at https://www.packtpub.com/application-development/akka-cookbook.

Resources for Article:

Further resources on this subject:

Creating First Akka Application [article]
Working with Entities in Google Web Toolkit 2 [article]
Skinner's Toolkit for Plone 3 Theming (Part 1) [article]