
How-To Tutorials

Data Extracting, Transforming, and Loading

Packt
01 Aug 2016
15 min read
In this article, Yu-Wei Chiu, author of the book R for Data Science Cookbook, covers the following topics:

- Scraping web data
- Accessing Facebook data

Before using data to answer critical business questions, the most important thing is to prepare it. Data is normally archived in files, and using Excel or text editors allows it to be easily obtained. However, data can be located in a range of different sources, such as databases, websites, and various file formats. Being able to import data from these sources is crucial.

There are four main types of data. Data recorded in a text format is the simplest. As some users require storing data in a structured format, files with a .tab or .csv extension can be used to arrange data in a fixed number of columns. For many years, Excel has held a leading role in the field of data processing, and this software uses the .xls and .xlsx formats. Knowing how to read and manipulate data from databases is another crucial skill. Moreover, as most data is not stored in a database, we must know how to use web scraping techniques to obtain data from the internet. As part of this chapter, we will introduce how to scrape data from the internet using the rvest package.

Many experienced developers have already created packages to allow beginners to obtain data more easily, and we focus on leveraging these packages to perform data extraction, transformation, and loading. In this chapter, we will first learn how to utilize R packages to read data from a text format and scan files line by line. We then move on to reading structured data from databases and Excel. Finally, we will learn how to scrape internet and social network data using the R web scraper.

Scraping web data

In most cases, the majority of data will not exist in your database; instead, it will be published in different forms on the internet. To dig more valuable information from these data sources, we need to know how to access and scrape data from the web. Here, we will illustrate how to use the rvest package to harvest finance data from http://www.bloomberg.com/.

Getting ready

For this recipe, prepare your environment with R installed on a computer with internet access.

How to do it...

Perform the following steps to scrape data from http://www.bloomberg.com/:

First, access the following link to browse the S&P 500 index on the Bloomberg Business website: http://www.bloomberg.com/quote/SPX:IND.

Once the page appears, we can begin installing and loading the rvest package:

    > install.packages("rvest")
    > library(rvest)

Next, you can use the html() function from the rvest package to scrape and parse the HTML page of the S&P 500 index:

    > spx_quote <- html("http://www.bloomberg.com/quote/SPX:IND")

Use the browser's built-in web inspector to inspect the location of the detail quote below the index chart. You can then move the mouse over the detail quote and click on the target element that you wish to scrape.
The <div class="cell"> section holds all the information that we need. Extract the elements with the class cell using the html_nodes function:

    > cell <- spx_quote %>% html_nodes(".cell")

Furthermore, we can parse the labels of the detail quote from the elements with the class cell__label, extract the text from the scraped HTML, and finally clean spaces and newline characters from the extracted text:

    > label <- cell %>%
    +     html_nodes(".cell__label") %>%
    +     html_text() %>%
    +     lapply(function(e) gsub("\n|\\s+", "", e))

We can also extract the values of the detail quote from the elements with the class cell__value, extract the text from the scraped HTML, and clean spaces and newline characters in the same way:

    > value <- cell %>%
    +     html_nodes(".cell__value") %>%
    +     html_text() %>%
    +     lapply(function(e) gsub("\n|\\s+", "", e))

Finally, we can set the extracted labels as the names of the values:

    > names(value) <- label

Next, we can access the energy and oil market index page at http://www.bloomberg.com/energy. We can then use the web inspector to inspect the location of the table element. Finally, we can use html_table to extract the table element with the class data-table:

    > energy <- html("http://www.bloomberg.com/energy")
    > energy.table <- energy %>% html_node(".data-table") %>% html_table()
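Putting the recipe together, the whole extraction can be condensed into a short script. The sketch below only restates the steps above under the same assumptions (the CSS classes .cell, .cell__label, and .cell__value come from the Bloomberg page as it looked at the time of writing, and may have changed since); note that newer rvest versions replace html() with read_html():

    library(rvest)

    # Read and parse the quote page (use read_html() on current rvest versions)
    page  <- html("http://www.bloomberg.com/quote/SPX:IND")
    cells <- page %>% html_nodes(".cell")

    # Helper to strip newlines and whitespace from the extracted text
    clean <- function(x) gsub("\n|\\s+", "", x)

    label <- clean(cells %>% html_nodes(".cell__label") %>% html_text())
    value <- clean(cells %>% html_nodes(".cell__value") %>% html_text())

    # A named character vector of the detail quote, e.g. quote["Open"] if such a label exists
    quote <- setNames(value, label)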
How it works...

The most difficult aspect of scraping data from a website is that web data is published and structured in different formats. We have to fully understand how the data is structured within the HTML tags before continuing. As HTML (Hypertext Markup Language) has a syntax similar to XML, we can use the XML package to read and parse HTML pages. However, the XML package only provides the XPath method, which has two main shortcomings:

- Inconsistent behavior in different browsers
- It is hard to read and maintain

For these reasons, we recommend using CSS selectors over XPath when parsing HTML.

Python users may be familiar with how quickly data can be scraped using the requests and BeautifulSoup packages. The rvest package is the counterpart in R, and it provides the same ability to simply and efficiently harvest data from HTML pages.

In this recipe, our target is to scrape the finance data of the S&P 500 detail quote from http://www.bloomberg.com/. Our first step is to make sure that we can access our target webpage through the internet, which is followed by installing and loading the rvest package. After installation and loading is complete, we can use the html() function to read the source code of the page into spx_quote.

Once we have confirmed that we can read the HTML page, we can start parsing the detail quote from the scraped HTML. However, we first need to inspect the CSS path of the detail quote. There are many ways to inspect the CSS path of a specific element. The most popular method is to use the development tool built into each browser (press F12 or FN + F12). Using Google Chrome as an example, you can open the development tool by pressing F12. A DevTools window will show up somewhere in the visual area (you may refer to https://developer.chrome.com/devtools/docs/dom-and-styles#inspecting-elements). Then, you can move the mouse cursor to the upper left of the DevTools window and select the Inspect Element icon (a magnifier icon). Next, click on the target element, and the DevTools window will highlight the source code of the selected area.

You can then move the mouse cursor to the highlighted area and right-click on it. From the pop-up menu, click on Copy CSS Path to extract the CSS path. Alternatively, you can examine the source code and find that the selected element is structured in HTML code with the class cell.

One highlight of rvest is that it is designed to work with magrittr, so we can use the %>% pipeline operator to chain the output parsed at each stage. Thus, we can first obtain the output source by calling spx_quote and then pipe the output to html_nodes. As the html_nodes function uses CSS selectors to parse elements, the function takes basic selectors with type (for example, div), ID (for example, #header), and class (for example, .cell). As the elements to be extracted have the class cell, you should place a period (.) in front of cell.

Finally, we should extract both label and value from the previously parsed nodes. Here, we first extract the elements with the class cell__label, and we then use html_text to extract the text. We can then use the gsub function to clean spaces and newline characters from the parsed text. Likewise, we apply the same pipeline to extract the elements with the class cell__value. As we have extracted both the labels and values of the detail quote, we can apply the labels as the names of the extracted values. We have now organized data from the web into structured data.

Alternatively, we can also use rvest to harvest tabular data. Similar to the process used to harvest the S&P 500 index, we can first access the energy and oil market index page. We can then use the web element inspector to find the location of the table element. As we have found the element located in the class data-table, we can use the html_table function to read the table content into an R data frame.

There's more...

Instead of using the web inspector built into each browser, we can consider using SelectorGadget (http://selectorgadget.com/) to search for the CSS path. SelectorGadget is a very powerful and simple-to-use extension for Google Chrome, which enables the user to extract the CSS path of the target element with only a few clicks.

To begin using SelectorGadget, access this link: https://chrome.google.com/webstore/detail/selectorgadget/mhjhnkcfbdhnjickkkdbjoemdmbfginb. Then, click on the green button to install the plugin in Chrome. Next, click on the upper-right icon to open SelectorGadget, and then select the area that needs to be scraped. The selected area will be colored green, and the gadget will display the CSS path of the area and the number of elements matched by the path. Finally, you can paste the extracted CSS path into html_nodes as an input argument to parse the data.

Besides rvest, we can connect R to Selenium via RSelenium to scrape web pages. Selenium was originally designed as a web application automation framework that enables the user to command a web browser to automate processes through simple scripts. However, we can also use Selenium to scrape data from the internet.
The following instructions present a sample demo of how to scrape Bloomberg.com using RSelenium:

First, access this link to download the Selenium standalone server: http://www.seleniumhq.org/download/.

Next, start the Selenium standalone server using the following command:

    $ java -jar selenium-server-standalone-2.46.0.jar

If you can successfully launch the standalone server, you should see a startup message, which means that you can connect to the server that binds to port 4444.

At this point, you can begin installing and loading RSelenium with the following commands:

    > install.packages("RSelenium")
    > library(RSelenium)

After RSelenium is installed, register the driver and connect to the Selenium server:

    > remDr <- remoteDriver(remoteServerAddr = "localhost"
    +                       , port = 4444
    +                       , browserName = "firefox"
    + )

Examine the status of the registered driver:

    > remDr$getStatus()

Next, we navigate to Bloomberg.com:

    > remDr$open()
    > remDr$navigate("http://www.bloomberg.com/quote/SPX:IND")

Finally, we can scrape the data using the CSS selector:

    > webElem <- remDr$findElements('css selector', ".cell")
    > webData <- sapply(webElem, function(x){
    +    label <- x$findChildElement('css selector', '.cell__label')
    +    value <- x$findChildElement('css selector', '.cell__value')
    +    cbind(c("label" = label$getElementText(), "value" = value$getElementText()))
    +  }
    + )

Accessing Facebook data

Social network data is another great source for anyone interested in exploring and analyzing social interactions. The main difference between social network data and web data is that social network platforms often provide a semi-structured data format (mostly JSON). Thus, we can easily access the data without needing to inspect how it is structured. In this recipe, we will illustrate how to use rvest and rjson to read and parse data from Facebook.

Getting ready

For this recipe, prepare your environment with R installed on a computer with internet access.

How to do it...

Perform the following steps to access data from Facebook:

First, we need to log in to Facebook and access the developer page (https://developers.facebook.com/). Click on Tools & Support and select Graph API Explorer. Next, click on Get Token and choose Get Access Token. On the User Data Permissions pane, select user_tagged_places and then click on Get Access Token. Copy the generated access token to the clipboard.

Try to access the Facebook API using rvest:

    > access_token <- '<access_token>'
    > fb_data <- html(sprintf("https://graph.facebook.com/me/tagged_places?access_token=%s", access_token))

Install and load the rjson package:

    > install.packages("rjson")
    > library(rjson)

Extract the text from fb_data and then use fromJSON to read the JSON data:

    > fb_json <- fromJSON(fb_data %>% html_text())

Use sapply to extract the name and ID of each place from fb_json:

    > fb_place <- sapply(fb_json$data, function(e){e$place$name})
    > fb_id <- sapply(fb_json$data, function(e){e$place$id})

Last, use data.frame to wrap the data:

    > data.frame(place = fb_place, id = fb_id)
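The Graph API usually returns list data in pages. The sketch below shows one way the next page could be followed when the response contains a paging section; the paging/next field names follow general Graph API conventions and are an assumption, since paging is not covered in the recipe itself:

    # Assumed sketch: follow Graph API paging links if they are present
    get_next <- function(x) if (is.null(x$paging)) NULL else x$paging[["next"]]

    all_places <- fb_json$data
    next_url   <- get_next(fb_json)

    while (!is.null(next_url)) {
      page_json  <- fromJSON(html(next_url) %>% html_text())
      all_places <- c(all_places, page_json$data)
      next_url   <- get_next(page_json)
    }

    fb_place <- sapply(all_places, function(e) e$place$name)
    fb_id    <- sapply(all_places, function(e) e$place$id)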
How it works...

In this recipe, we covered how to retrieve social network data through Facebook's Graph API. Unlike scraping web pages, you need to obtain a Facebook access token before making any request for insight information. There are two ways to retrieve the access token: the first is to use Facebook's Graph API Explorer, and the other is to create a Facebook application. In this recipe, we illustrated how to use the Graph API Explorer to obtain the access token.

Facebook's Graph API Explorer is where you can craft your request URLs to access Facebook data on your behalf. To access the explorer page, we first visit Facebook's developer page (https://developers.facebook.com/). The Graph API Explorer page is under the Tools & Support drop-down menu. After entering the explorer page, we select Get Access Token from the drop-down menu of Get Token. Subsequently, a tabbed window appears, where we can check access permissions at various levels of the application. For example, we can check tagged_places to access the locations that we previously tagged. After we have selected the permissions that we require, we can click on Get Access Token to allow the Graph API Explorer to access our insight data. After completing these steps, you will see an access token, which is a temporary and short-lived token that you can use to access the Facebook API.

With the access token, we can then access the Facebook API with R. First, we need an HTTP request package. Similar to the web scraping recipe, we can use the rvest package to make the request. We craft a request URL with the access_token (copied from the Graph API Explorer) appended to the Facebook API endpoint. From the response, we should receive JSON-formatted data. To read the attributes of the JSON data, we install and load the rjson package. We can then use the fromJSON function to read the JSON string extracted from the response. Finally, we read the place and ID information using the sapply function, and we can then use data.frame to transform the extracted information into a data frame. At the end of this recipe, we should see the data formatted as a data frame.

There's more...

To learn more about the Graph API, you can read the official documentation from Facebook (https://developers.facebook.com/docs/reference/api/field_expansion/).

First, we need to install and load the Rfacebook package:

    > install.packages("Rfacebook")
    > library(Rfacebook)

We can then use the built-in functions to retrieve data about a user, or access similar information, by providing an access token:

    > getUsers("me", "<access_token>")

If you want to scrape public fan pages without logging in to Facebook every time, you can create a Facebook app to access insight information on behalf of the app. To create an authorized app token, log in to the Facebook developer page and click on Add a New App. You can create a new Facebook app with any name, provided that it has not already been registered. Finally, you can copy both the app ID and app secret and craft the access token as <APP ID>|<APP SECRET>. You can now use this token to scrape public fan page information with the Graph API. As with the Rfacebook call above, we can then replace the access_token with <APP ID>|<APP SECRET>:

    > getUsers("me", "<access_token>")

Summary

In this article, we learned how to utilize R packages to read data from a text format and scan files line by line. We also learned how to scrape internet and social network data using the R web scraper.

Resources for Article:

Further resources on this subject:

- Learning Data Analytics with R and Hadoop
- Big Data Analysis (R and Hadoop)
- Using R for Statistics, Research, and Graphics

Rapid Application Development with Django, the Openduty story

Bálint Csergő
01 Aug 2016
5 min read
Openduty is an open source incident escalation tool, something like PagerDuty but free and much simpler. It was born during a hackathon at Ustream back in 2014. The project received a lot of attention in the devops community, and was also featured in Devops Weekly and Pycoders Weekly. It is listed at Full Stack Python as an example Django project. This article covers some design decisions we made during the hackathon and details some of the main components of the Openduty system.

Design

When we started the project, we already knew what we wanted to end up with:

- We had to work quickly—it was a hackathon after all
- An API similar to PagerDuty
- The ability to send notifications asynchronously
- A nice calendar to organize on-call schedules—can't hurt anyone, right?
- Tokens for authorizing notifiers

So we chose the corresponding components to reach our goal.

Get the job done quickly

If you have to develop apps rapidly in Python, Django is the framework you choose. It's a bit heavyweight, but hey, it gives you everything you need and sometimes even more. Don't get me wrong; I'm a big fan of Flask too, but it can be a bit fiddly to assemble everything by hand at the start. Flask may pay off later, and you may win on a lower number of dependencies, but we only had 24 hours, so we went with Django.

An API

When it comes to Django and REST APIs, one of the go-to solutions is the Django REST Framework. It has all the nuts and bolts you'll need when you're assembling an API, like serializers, authentication, and permissions. It can even make all your API calls self-describing. Let me show you how serializers work in the REST Framework:

    class OnCallSerializer(serializers.Serializer):
        person = serializers.CharField()
        email = serializers.EmailField()
        start = serializers.DateTimeField()
        end = serializers.DateTimeField()

The code above represents a person who is on call in the API. As you can see, it is pretty simple; you just have to define the fields. It even does the validation for you, since you have to give a type to every field. But believe me, it's capable of more good things, like generating a serializer from your Django model:

    class SchedulePolicySerializer(serializers.HyperlinkedModelSerializer):
        rules = serializers.RelatedField(many=True, read_only=True)

        class Meta:
            model = SchedulePolicy
            fields = ('name', 'repeat_times', 'rules')

This example shows how you can customize a ModelSerializer, make fields read-only, and only accept given fields from an API call.

Async Task Execution

When you have long-running tasks, such as generating huge reports, resizing images, or even transcoding media, it is common practice to move the actual execution out of your web app into a separate layer. This decreases the load on the web servers, helps avoid long or even timed-out requests, and just makes your app more resilient and scalable. In the Python world, the go-to solution for asynchronous task execution is called Celery. In Openduty, we use Celery heavily to send notifications asynchronously and also to delay the execution of any given notification task by the delay defined in the service settings.
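For context, Celery needs an application object wired to a message broker before any task can be declared or queued. A minimal sketch of such a setup is shown below; the module name, broker URL, and settings module are placeholder assumptions, not taken from the Openduty code base:

    # celery.py -- hypothetical minimal Celery wiring for a Django project
    import os
    from celery import Celery

    # Point Celery at the Django settings module (name is an assumption)
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'openduty.settings')

    # The broker URL is a placeholder; RabbitMQ and Redis are common choices
    app = Celery('openduty', broker='amqp://guest:guest@localhost//')

    # Pick up CELERY_* options from Django settings and auto-discover tasks.py modules
    app.config_from_object('django.conf:settings')
    app.autodiscover_tasks()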
Defining a task is this simple:

    @app.task(ignore_result=True)
    def send_notifications(notification_id):
        try:
            notification = ScheduledNotification.objects.get(id=notification_id)
            if notification.notifier == UserNotificationMethod.METHOD_XMPP:
                notifier = XmppNotifier(settings.XMPP_SETTINGS)
            # choosing the notifier removed from example code snippet
            notifier.notify(notification)
        except Exception:
            # logging the task result removed from example snippet
            raise

And calling an already defined task is almost as simple as calling any regular function:

    send_notifications.apply_async((notification.id,), eta=notification.send_at)

This means exactly what you think: send the notification with the ID notification.id at notification.send_at. But how do these things get executed? Under the hood, Celery wraps your decorated functions so that when you call them, they get enqueued instead of being executed directly. When the Celery worker detects that there is a task to be executed, it simply takes it from the queue and executes it asynchronously.

Calendar

We use django-scheduler for the awesome-looking calendar in Openduty. It is a pretty good project in general: it supports recurring events and provides you with a UI for your calendar, so you won't even have to fiddle with that.

Tokens and Auth

Service token implementation is a simple thing. You want them to be unique, and what else would you choose if not a UUID? There is a nice plugin for Django models used to handle UUID fields, called django-uuidfield. It just does what it says—adding UUIDField support to your models. User authentication is a bit more interesting: we currently support plain Django users, and you can use LDAP as your user provider.

Summary

This was just a short summary of the design decisions made when we coded Openduty, along with some relevant snippets that demonstrate the power of the components. If you are on a short deadline, consider using Django and its extensions. There is a good chance that somebody has already done what you need to do, or something similar, which can always be adapted to your needs thanks to the awesome power of the open source community.

About the author

Bálint Csergő is a software engineer from Budapest, currently working as an infrastructure engineer at Hortonworks. He loves Unix systems, PHP, Python, Ruby, the Oracle database, Arduino, Java, C#, music, and beer.

Memory

Packt
01 Aug 2016
26 min read
In this article by Enrique López Mañas and Diego Grancini, authors of the book Android High Performance Programming, we focus on memory. An application with badly managed memory can affect the behavior of the whole system, or it can affect the other applications installed on our device, in the same way that other applications can affect ours. As we all know, Android has a wide range of devices on the market, with a lot of different configurations and amounts of memory. It's up to developers to understand the strategy to take while dealing with this big amount of fragmentation, the patterns to follow while developing, and the tools to use to profile the code. This is the aim of this article.

In the following sections, we will focus on heap memory. We will take a look at how a device handles memory, what garbage collection is, and how it works, in order to understand how to avoid common development mistakes and to clarify what we will discuss when we define best practices. We will also go through pattern definitions in order to drastically reduce the risk of what we will identify as memory leaks and memory churn. This article ends with an overview of the official tools and APIs that Android provides to profile our code and to find possible causes of memory leaks.

Walkthrough

Before starting the discussion about how to improve and profile our code, it's really important to understand how Android devices handle memory. In the following pages, we will analyze the differences between the runtimes that Android uses, learn more about garbage collection, understand what memory leaks and memory churn are, and see how Java handles object references.

How memory works

Have you ever thought about how a restaurant works during its service? Let's think about it for a while. When a new group of customers gets into the restaurant, there's a waiter ready to find a place to seat them. But the restaurant is a limited space, so there is the need to free tables when possible. That's why, when a group has finished eating, another waiter cleans and prepares the just-freed table for other groups to come. The first waiter has to find a table with the right number of seats for every new group. The second waiter's task should be fast and shouldn't hinder or block the others' tasks. Another important aspect is how many seats are occupied by the group; the restaurant owner wants to have as many free seats as possible to place new clients. So, it's important to make sure that every group fills the right number of seats, without occupying tables that could be freed and used to seat new groups.

This is absolutely similar to what happens in an Android system. Every time we create a new object in our code, it needs to be saved in memory. So, it's allocated as part of our application's private memory, to be accessed whenever needed, and the system keeps allocating memory for us during the whole application lifetime. Nevertheless, the system has a limited amount of memory to use, and it cannot allocate memory indefinitely. So, how is it possible for the system to have enough memory for our application all the time? And why is there no need for an Android developer to free up memory? Let's find out.
Garbage collection

Garbage collection is an old concept that is based on two main steps:

- Find objects that are no longer referenced
- Free the memory referenced by those objects

When an object is no longer referenced, its "table" can be cleaned and freed up. This is what is done to provide memory for future new object allocations. These operations of allocating new objects and deallocating unreferenced objects are executed by the particular runtime in use on the device, and there is no need for the developer to do anything, because they are all managed automatically. In spite of what happens in other languages, such as C or C++, there is no need for the developer to allocate and deallocate memory. In particular, while allocation happens when needed, the garbage collection task is executed when a memory upper limit is reached. These automatic background operations don't exempt developers from being aware of their app's memory management; if memory management is done badly, the application can suffer from lags and malfunctions, and even crash when an OutOfMemoryError exception is thrown.

Shared memory

In Android, every app has its own process, which is completely managed by the runtime, with the aim of reclaiming memory to free resources for other foreground processes if needed. The available amount of memory for our application lies completely in RAM, as Android doesn't use swap memory. The main consequence of this is that there is no other way for our app to get more memory than to unreference objects that are no longer used. But Android uses paging and memory mapping: the first technique defines blocks of memory of the same size, called pages, in secondary storage, while the second one maps memory to correlated files in secondary storage. They are used when the system needs to allocate memory for other processes, so the system creates paged memory-mapped files to save Dalvik code files, app resources, or native code files. In this way, those files can be shared between multiple processes. As a matter of fact, the Android system uses shared memory in order to better handle resources from a lot of different processes. Furthermore, every new process to be created is forked from an already existing one, called Zygote. This particular process contains common framework classes and resources to speed up the first boot of an application. This means that the Zygote process is shared between processes and applications. This large use of shared memory makes it difficult to profile the memory usage of our application, because there are many facets to consider before reaching a correct analysis.

Runtime

Some functions and operations of memory management depend on the runtime used. That's why we are going through some specific features of the two main runtimes used by Android devices:

- Dalvik
- Android runtime (ART)

ART was added later to replace Dalvik and to improve performance from different points of view. It was introduced in Android KitKat (API Level 19) as an option for developers to enable, and it became the main and only runtime from Android Lollipop (API Level 21) onwards. Besides the differences between Dalvik and ART in compiling code, file formats, and internal instructions, what we are focusing on at the moment is memory management and garbage collection.
So, let's understand how the Google team improved garbage collection performance in the runtimes over time, and what to pay attention to while developing our application. Let's step back and return to the restaurant for a bit longer. What would happen if all the employees, such as the other waiters and cooks, and all of the services, such as the dishwashers, stopped their tasks to wait for just one waiter to free a table? That single employee's performance would determine the success or failure of them all. So, it's really important to have a very fast waiter in this case. But what do you do if you cannot afford him? The owner wants him to do his job as fast as possible, maximizing his productivity and seating all the customers in the best way, and this is exactly what we have to do as developers. We have to optimize memory allocations in order to have a fast garbage collection, even if it stops all the other operations.

What is described here is just how the runtime garbage collection works. When the memory upper limit is reached, the garbage collection starts its task, pausing any other method, task, thread, or process execution, and those won't resume until the garbage collection task is completed. So, it's really important that the collection is fast enough not to impede the 16 ms per frame rule, which would result in lag and jank in the UI. The more time the garbage collection takes, the less time the system has to prepare frames to be rendered on the screen.

Keep in mind that automatic garbage collection is not free; bad memory management can lead to bad UI performance and, thus, bad UX. No runtime feature can replace good memory management. That's why we need to be careful about new allocations of objects and, above all, references. Obviously, ART introduced a lot of improvements to this process after the Dalvik era, but the background concept is the same: it reduces the collection steps, it adds a particular memory area for Bitmap objects, it uses new fast algorithms, and it does other cool stuff that will get even better in the future, but there is no escaping the fact that we need to profile our code and memory usage if we want our application to have the best performance.

Android N JIT compiler

The ART runtime uses ahead-of-time compilation which, as the name suggests, performs compilation when the application is first installed. This approach brings advantages to the overall system in different ways, because the system can:

- Reduce battery consumption due to pre-compilation, and therefore improve autonomy
- Execute applications faster than Dalvik
- Improve memory management and garbage collection

However, those advantages have a cost related to installation times: the system needs to compile the application at that point, and this is slower than other types of compilers. For this reason, Google added a just-in-time (JIT) compiler alongside the ahead-of-time compiler of ART in the new Android N. This one acts when needed, during the execution of the application, and thus uses a different approach compared to the ahead-of-time one. It uses code profiling techniques, and it's not a replacement for the ahead-of-time compiler, but an addition to it. It's a good enhancement to the system for the performance advantages it introduces. The profile-guided compilation adds the possibility to precompile, and then to cache and reuse, methods of the application, depending on usage and/or device conditions.
This feature can save compilation time and improve performance in every kind of system, so all devices benefit from this new memory management. The key advantages are:

- Less memory used
- Fewer RAM accesses
- Lower impact on battery

All of these advantages introduced in Android N, however, shouldn't be an excuse to avoid good memory management in our applications. For this, we need to know what pitfalls are lurking behind our code and, more than this, how to behave in particular situations to improve the memory management of the system while our application is active.

Memory leak

The main mistake, from the memory performance perspective, that a developer can make while developing an Android application is called a memory leak, and it refers to an object that is no longer used but is referenced by another object that is, instead, still active. In this situation, the garbage collector skips it, because the reference is enough to leave that object in memory. Actually, we are preventing the garbage collector from freeing memory for other future allocations. So, our heap memory gets smaller because of this, and this leads to the garbage collection being invoked more often, blocking the rest of the application's execution. This can lead to a situation where there is no more memory to allocate a new object and, then, an OutOfMemoryError exception is thrown by the system. Consider the case where a used object references no-longer-used objects, which reference other no-longer-used objects, and so on; none of them can be collected, just because the root object is still in use.

Memory churn

Another anomaly in memory management is called memory churn, and it refers to an amount of allocation that is not sustainable by the runtime because too many new objects are instantiated in a small period of time. In this case, many garbage collection events are triggered, affecting the overall memory and UI performance of the application. The need to avoid allocations in the View.onDraw() method is closely related to memory churn; we know that this method is called every time the view needs to be drawn again, and the screen needs to be refreshed every 16.6667 ms. If we instantiate objects inside that method, we could cause memory churn, because those objects are instantiated in the View.onDraw() method and no longer used afterwards, so they are collected very soon. In some cases, this leads to one or more garbage collection events being executed every time a frame is drawn on the screen, reducing the available time to draw it below 16.6667 ms, depending on the collection event's duration.
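To make the View.onDraw() point concrete, here is a minimal hedged sketch of the allocation-free drawing pattern (the view, field, and color choices are ours, not from the book):

    import android.content.Context;
    import android.graphics.Canvas;
    import android.graphics.Color;
    import android.graphics.Paint;
    import android.util.AttributeSet;
    import android.view.View;

    // Sketch: allocate drawing objects once, never inside onDraw()
    public class BadgeView extends View {

        private final Paint circlePaint = new Paint(Paint.ANTI_ALIAS_FLAG); // created once

        public BadgeView(Context context, AttributeSet attrs) {
            super(context, attrs);
            circlePaint.setColor(Color.RED);
        }

        @Override
        protected void onDraw(Canvas canvas) {
            super.onDraw(canvas);
            // No "new" here: objects created in onDraw() become garbage almost
            // immediately and can trigger the memory churn described above.
            canvas.drawCircle(getWidth() / 2f, getHeight() / 2f, getWidth() / 4f, circlePaint);
        }
    }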
References

Let's have a quick overview of the different kinds of references that Java provides, so that we have an idea of when they can be used and how. Java defines four levels of reference strength:

- Normal: This is the main type of reference. It corresponds to the simple creation of an object, and the object will be collected when it is no longer used and referenced. It's just the classical object instantiation:

      SampleObject sampleObject = new SampleObject();

- Soft: This is a reference that is not strong enough to keep an object in memory when a garbage collection event is triggered. So, it can become null at any time during execution. Using this reference, the garbage collector decides when to free the object's memory based on the memory demand of the system. To use it, just create a SoftReference object, passing the real object as a parameter to the constructor, and call the SoftReference.get() method to get the object:

      SoftReference<SampleObject> sampleObjectSoftRef = new SoftReference<SampleObject>(new SampleObject());
      SampleObject sampleObject = sampleObjectSoftRef.get();

- Weak: This works exactly like a SoftReference, but it is weaker:

      WeakReference<SampleObject> sampleObjectWeakRef = new WeakReference<SampleObject>(new SampleObject());

- Phantom: This is the weakest reference; the object is eligible for finalization. This kind of reference is rarely used, and the PhantomReference.get() method always returns null. It is meant for reference queues, which don't interest us at the moment, but it's worth knowing that this kind of reference is also provided.

These classes may be useful while developing if we know which objects have a lower priority and can be collected without causing problems to the normal execution of our application. We will see how they can help us manage memory in the following pages.

Memory-side projects

During the development of the Android platform, Google has always tried to improve the memory management system of the platform to maintain wide compatibility with devices of increasing performance as well as low-resource ones. This is the main purpose of the projects that Google develops in parallel with the platform; every new Android version released brings new improvements and changes to those projects and their impact on system performance. Each of those side projects focuses on a different matter:

- Project Butter: Introduced in Android Jelly Bean 4.1 (API Level 16) and then improved in Android Jelly Bean 4.2 (API Level 17), it added features related to the graphical aspect of the platform (VSync and buffering are the main additions) in order to improve the responsiveness of the device while in use.
- Project Svelte: Introduced in Android KitKat 4.4 (API Level 19), it deals with memory management improvements in order to support low-RAM devices.
- Project Volta: Introduced in Android Lollipop (API Level 21), it focuses on the battery life of the device. It adds important APIs to deal with batching expensive battery-draining operations, such as the JobScheduler, and new tools such as the Battery Historian.

Project Svelte and Android N

When it was first introduced, Project Svelte reduced the memory footprint and improved memory management in order to support entry-level devices with low memory availability, and thus broadened the supported range of devices, with clear advantages for the platform. With the new release of Android N, Google wants to provide an optimized way to run applications in the background. We know that the process of our application lives on in the background even if it is not visible on the screen, and even if there are no started activities, because a service could be executing some operations. This is a key feature for memory management; the overall system performance could be affected by bad memory management of background processes. But what has changed in the application behavior and the APIs with the new Android N?
The strategy chosen to improve memory management and reduce the impact of background processes is to stop sending applications the broadcasts for the following actions:

- ConnectivityManager.CONNECTIVITY_ACTION: Starting from Android N, the connectivity action will be received only by applications that are in the foreground and that have registered a BroadcastReceiver for this action. Applications with an implicit intent declared inside the manifest file will no longer receive it. Hence, such applications need to change their logic to achieve the same behavior as before.
- Camera.ACTION_NEW_PICTURE: This one is used to notify that a picture has just been taken and added to the media store. This action will no longer be available, either for receiving or for sending, and this applies to every application, not just the ones targeting the new Android N.
- Camera.ACTION_NEW_VIDEO: This is used to notify that a video has just been taken and added to the media store. Like the previous one, this action can no longer be used, and this also applies to every application.

Keep these changes in mind when targeting the new Android N with your application, to avoid unwanted or unexpected behaviors.

All of the preceding actions have been changed by Google to force developers not to use them in applications. As a more general rule, we should not use implicit receivers for the same reason. Hence, we should always check the behavior of our application while it's in the background, because this could lead to unexpected memory usage and battery drain. Implicit receivers can start our application components, while explicit ones are set up for a limited time while the activity is in the foreground, so they cannot affect background processes. It's good practice to avoid the use of implicit broadcasts while developing applications, to reduce their impact on background operations that could otherwise lead to an unwanted waste of memory and, then, battery drain.

Furthermore, Android N introduces a new ADB command to test application behavior with background processes ignored. Use the following command to ignore background services and processes:

    adb shell cmd appops set RUN_IN_BACKGROUND ignore

Use the following one to restore the initial state:

    adb shell cmd appops set RUN_IN_BACKGROUND allow

Best practices

Now that we know what can happen in memory while our application is active, let's have a deeper look at what we can do to avoid memory leaks and memory churn, and to optimize our memory management in order to reach our performance target—not just in memory usage, but also in garbage collection attendance, because, as we know, it stops every other operation while it works. In the following pages, we will go through a lot of hints and tips, using a bottom-up strategy, starting from low-level shrewdness in Java code and moving up to higher-level Android practices.

Data types

We weren't joking; we are really talking about Java primitive types, as they are the foundation of all applications, and it's really important to know how to deal with them, even though it may seem obvious. It's not, and we will see why. Java provides primitive types that need to be saved in memory when used: the system allocates an amount of memory matching what is requested for that particular type.
The following are the Java primitive types, with the number of bits needed to allocate each type:

- byte: 8 bits
- short: 16 bits
- int: 32 bits
- long: 64 bits
- float: 32 bits
- double: 64 bits
- boolean: 8 bits, though it depends on the virtual machine
- char: 16 bits

At first glance, what is clear is that you should be careful to choose the right primitive type every time you use one. Don't use a bigger primitive type if you don't really need it; never use long, float, or double if you can represent the number with an integer data type. Otherwise, it would be a useless waste of memory and of calculations every time the CPU needs to deal with it, and remember that to calculate an expression, the system needs to perform a widening primitive implicit conversion to the largest primitive type involved in the calculation.

Autoboxing

Autoboxing is the term used to indicate the automatic conversion between a primitive type and its corresponding wrapper class object. The primitive type wrapper classes are the following:

- java.lang.Byte
- java.lang.Short
- java.lang.Integer
- java.lang.Long
- java.lang.Float
- java.lang.Double
- java.lang.Boolean
- java.lang.Character

They can be instantiated using the assignment operator, as with primitive types, and they can be used like their primitive types:

    Integer i = 0;

This is exactly the same as the following:

    Integer i = new Integer(0);

But the use of autoboxing is not the way to improve the performance of our applications; it has many costs. First of all, the wrapper object is much bigger than the corresponding primitive type. For instance, an Integer object needs 16 bytes in memory instead of the 32 bits of the primitive int. Hence, more memory is used to handle it. Then, when we declare a variable using the primitive wrapper object, any operation on it implies at least one more object allocation. Take a look at the following snippet:

    Integer integer = 0;
    integer++;

Every Java developer knows what this does, but this simple code needs a step-by-step explanation of what happens.

First of all, the integer value is taken from the Integer object integer and 1 is added to it:

    int temp = integer.intValue() + 1;

Then the result is assigned to integer, but this means that a new autoboxing operation needs to be executed:

    integer = temp;

Undoubtedly, these operations are slower than if we had used the primitive type instead of the wrapper class: no autoboxing is needed, hence no extra allocations. Things get worse in loops, where the mentioned operations are repeated on every cycle; take, for example, the following code:

    Integer sum = 0;
    for (int i = 0; i < 500; i++) {
        sum += i;
    }

In this case, there are a lot of inappropriate allocations caused by autoboxing, and if we compare this with the primitive type for loop, we notice that there are no allocations:

    int sum = 0;
    for (int i = 0; i < 500; i++) {
        sum += i;
    }

Autoboxing should be avoided as much as possible. The more we use primitive wrapper classes instead of the primitive types themselves, the more memory is wasted while executing our application, and this waste is amplified when autoboxing occurs inside loop cycles, affecting not just memory, but CPU timings too.

Sparse array family

So, in all of the cases described in the previous paragraph, we can just use the primitive type instead of its object counterpart. Nevertheless, it's not always so simple. What happens if we are dealing with generics?
For example, let's think about collections; we cannot use a primitive type as the generic type of an object that implements one of the collection interfaces. We have to use the wrapper classes this way:

    List<Integer> list;
    Map<Integer, Object> map;
    Set<Integer> set;

Every time we use one of the Integer objects of a collection, autoboxing occurs at least once, producing the waste outlined above, and we know well how many times we deal with this kind of object in everyday development. But isn't there a solution to avoid autoboxing in these situations? Android provides a useful family of objects created on purpose to replace Map objects and avoid autoboxing, protecting memory from pointlessly large allocations: the Sparse arrays.

The list of Sparse arrays, with the type of Map each can replace, is the following:

- SparseBooleanArray: HashMap<Integer, Boolean>
- SparseLongArray: HashMap<Integer, Long>
- SparseIntArray: HashMap<Integer, Integer>
- SparseArray<E>: HashMap<Integer, E>
- LongSparseArray<E>: HashMap<Long, E>

In the following, we will talk about the SparseArray object specifically, but everything we say holds for all the other objects above as well.

The SparseArray uses two different arrays to store hashes and objects. The first one collects the sorted hashes, while the second one stores the key/value pairs ordered according to the sorting of the key hashes array (Figure 1: SparseArray's hashes structure).

When you need to add a value, you have to specify the integer key and the value to be added in the SparseArray.put() method, just like in the HashMap case. This can create collisions if multiple key hashes are added in the same position. When a value is needed, simply call SparseArray.get(), specifying the related key; internally, the key object is used to binary search the index of the hash, and then the value of the related key (Figure 2: SparseArray's workflow). When the key found at the index resulting from the binary search does not match the original one, a collision has happened, so the search keeps going in both directions to find the same key and to provide the value if it's still inside the array. Thus, the time needed to find a value increases significantly when the array contains a large number of objects.

By contrast, a HashMap contains just a single array to store hashes, keys, and values, and it uses oversized arrays as a technique to avoid collisions. This is not good for memory, because it allocates more memory than is really needed. So HashMap is fast, because it implements a better way to avoid collisions, but it's not memory efficient. Conversely, SparseArray is memory efficient because it uses the right number of object allocations, with an acceptable increase in execution time.

The memory used for these arrays is contiguous, so every time you remove a key/value pair from a SparseArray, they can be compacted or resized (a quick usage sketch follows this list):

- Compaction: The object to remove is shifted to the end and all the other objects are shifted left. The last block, containing the item to be removed, can be reused for future additions to save allocations.
- Resize: All the elements of the arrays are copied to new arrays and the old ones are deleted. On the other hand, the addition of new elements produces the same effect of copying all elements into new arrays. This is the slowest method, but it's completely memory safe because there are no useless memory allocations.
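As a quick, hedged illustration of the API difference discussed above (a sketch with arbitrary keys and values; SparseIntArray lives in android.util):

    // Assumes: import android.util.SparseIntArray; import java.util.HashMap;

    // Boxed keys and values: every put()/get() autoboxes the int key and value
    HashMap<Integer, Integer> boxedCounts = new HashMap<Integer, Integer>();
    boxedCounts.put(42, 1);
    int fromMap = boxedCounts.get(42);       // unboxing on the way out

    // Primitive int keys and values: no wrapper objects are created
    SparseIntArray counts = new SparseIntArray();
    counts.put(42, 1);                       // keys are kept sorted and binary-searched
    int fromSparse = counts.get(42, 0);      // the second argument is a default value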
In general, HashMap is faster at these operations because it contains more blocks than it really needs—hence the memory waste. Whether to use the SparseArray family depends on the memory management strategy and CPU performance patterns being applied, because the calculation cost has to be weighed against the memory saving. Its use is right in some situations. Consider using it when:

- The number of objects you are dealing with is below a thousand, and you are not going to do a lot of additions and deletions.
- You are using collections of Maps with a few items, but lots of iterations.

Another useful feature of these objects is that they let you iterate by index, instead of using the iterator pattern, which is slower and memory inefficient. The following snippet shows how the iteration doesn't involve extra objects:

    // SparseArray
    for (int i = 0; i < map.size(); i++) {
        Object value = map.get(map.keyAt(i));
    }

Conversely, an Iterator object is needed to iterate through a HashMap:

    // HashMap
    for (Iterator iter = map.keySet().iterator(); iter.hasNext(); ) {
        Object value = map.get(iter.next());
    }

Some developers think the HashMap object is the better choice because it can be exported from an Android application to other Java applications, while the SparseArray family objects cannot. But what we have analyzed here as a memory management gain is applicable to any other case. And, as developers, we should strive to reach our performance goals on every platform, instead of reusing the same code on different platforms, because different platforms can be affected differently from a memory perspective. That's why our main suggestion is to always profile the code on every platform we are working on, and then draw our own conclusions about better or worse approaches depending on the results.

ArrayMap

An ArrayMap object is an Android implementation of the Map interface that is more memory efficient than the HashMap one. This class is provided by the Android platform starting from Android KitKat (API Level 19), but there is another implementation of it inside the Support Package v4, because of its main usage on older and lower-end devices. Its implementation and usage are totally similar to the SparseArray objects, with all the implications about memory usage and computational costs, but its main purpose is to let you use objects as the keys of the map, just like the HashMap does. Hence, it provides the best of both worlds.

Summary

We defined a lot of best practices to help maintain good memory management, introducing helpful design patterns and analyzing the best choices while developing things we take for granted that can actually affect memory and performance. Then, we faced the main causes of the worst leaks in the Android platform, those related to main components such as Activities and Services. To conclude the practices, we introduced APIs both to use and not to use, and then others able to define a strategy for events related to the system and, thus, external to the application.

Resources for Article:

Further resources on this subject:

- Hacking Android Apps Using the Xposed Framework
- Speeding up Gradle builds for Android
- Get your Apps Ready for Android N

WebRTC in FreeSWITCH

Packt
25 Jul 2016
16 min read
In this article by Anthony Minessale and Giovanni Maruzzelli, authors of Mastering FreeSWITCH, we will cover the following topics:

- What WebRTC is and how it works
- Encryption and NAT traversal (STUN, TURN, and so on)
- Signaling and media
- Interconnection with PSTN and SIP networks
- FreeSWITCH as a WebRTC server, gateway, and application server
- SIP signaling clients with JavaScript (SIP.js)
- Verto signaling clients with JavaScript (mod_verto, verto.js)

WebRTC

Finally something new! How refreshing it is to be learning and experimenting again, especially if you're an old hand! After at least ten years of linear evolution, here we are with a quantum leap, the black swan that truly disrupts the communication sector.

Browsers are already out there, waiting

With an installed base of hundreds of millions, and soon to be in the billions ballpark, browsers (both on PCs and on smartphones) are now complete communication terminals, audio/video endpoints that do not need any additional software, plugins, or hardware. Browsers now incorporate, by default and in a standard way, all the software needed to interact with loudspeakers, microphones, headsets, cameras, screens, and so on. Browsers are the new endpoints, the CPEs, the phones. They have an API, they're updated automatically, and they are compatible with your system. You don't have to procure, configure, support, or upgrade them. They're ready for your new service; they just work, and are waiting for your business.

Web Real-Time Communication is coming

There are two completely separate flows in communication: signaling and media. Signaling is a flow of information that defines who is calling whom, taking what paths, and which technology is used to transmit which content. Media is the actual digitized content of the communication, for example, audio, video, or screen sharing. Media and signaling often take completely unrelated paths to go from caller to callee; for example, their IP packets traverse different gateways and routers. Also, the two flows are managed by separate software (or by different parts of the same application) using different protocols.

WebRTC defines how a browser accesses its own media capture, how it sends and receives media from a peer through the network, and how it renders the media stream that it receives. It represents this using the same Session Description Protocol (SDP) as SIP does. So, WebRTC is all about media, and doesn't prescribe a signaling system. This is a design decision, embedded in the standard definition. Popular signaling systems include SIP, XMPP, and proprietary or custom protocols.

Also, WebRTC is all about encryption. All WebRTC media streams are mandatorily encrypted. Chrome, Firefox, and Opera (together they account for more than 70 percent of the browsers in use) already implement the standard; Edge is announcing the first steps in supporting WebRTC basic features, while only Safari is still holding its cards (Skype and FaceTime on WebRTC with proprietary signaling? Wink wink).
Under the hood

More or less, WebRTC works like this:

- The browser connects to a web server and loads a webpage with some JavaScript in it.
- The JavaScript in the webpage takes control of the browser's media interfaces (microphone, camera, speakers, and so on), resulting in an API media object.
- The WebRTC API media object will contain the capabilities of all the devices and codecs available (for example, definition, sample rate, and so on), and it will let the user choose capability preferences (for example, use QVGA video to minimize CPU and bandwidth).
- The webpage interfaces with the browser's user, getting input for signing in to the web server's communication service (if any).
- The JavaScript uses whatever signaling method (SIP, XMPP, proprietary, custom) over encrypted secure websockets (wss://) for signing in to the communication service, finding peers, and originating and receiving calls.
- Once signed in to the service, a call can be made and received. Signaling will give the protocol address of the peer (for example, sip:gmaruzz@opentelecom.it).
- Now is the moment to find out actual IP addresses. The JavaScript will generate a WebRTC API object for finding its own IP addresses, transports, and ports (ICE candidates) to be offered to the peer for exchanging media (the JavaScript WebRTC API will use ICE, STUN, and TURN, and will send the peer its own local LAN address, its own public IP address, and maybe the IP address of a TURN server it can use).
- Then, the WebRTC Net API will exchange ICE candidates with the peer, until they both find the most "rational" triplet of IP address, port, and transport (UDP, DTLS, and so on) for each stream (for example, audio, video, screen share, and so on).
- Once they get the best addresses, the signaling will establish the call.
- Once signaling communication with the peer is established, media capabilities are exchanged in SDP format (exactly as in SIP), and the two peers agree on media formats (sample rates, codecs, and so on).
- When the media formats are agreed, the JavaScript WebRTC transport layer will use encrypted transports (SRTP, with keys exchanged via DTLS) for media and data.
- The JavaScript WebRTC Media API will be used to render the media streams received (for example, render video, play sound, capture the microphone, and so on).
- Additionally, or as an alternative to media, the peers can establish one or more data channels, through which they bidirectionally exchange raw or structured data (file transfers, augmented reality, stock tickers, and so on).
- At hangup, signaling tears down the call, and the JavaScript WebRTC Media API is used to shut down streams and rendering.

This is a high-level, but complete, view of how a WebRTC system works.
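To ground the walkthrough above in code, here is a minimal browser-side sketch of the media half only (capture, peer connection, ICE candidates, and offer). The signaling transport is deliberately left as a stub, since, as noted, WebRTC does not prescribe one; sendToPeer() and the STUN server URL are placeholders, not part of any real deployment:

    // Media half of a WebRTC call; sendToPeer() stands in for whatever
    // signaling channel (SIP, XMPP, Verto, custom) carries SDP and candidates.
    const pc = new RTCPeerConnection({
      iceServers: [{ urls: 'stun:stun.example.org:3478' }]   // example STUN server
    });

    // Gathered ICE candidates are handed to the signaling layer
    pc.onicecandidate = (event) => {
      if (event.candidate) sendToPeer({ type: 'candidate', candidate: event.candidate });
    };

    // Render whatever media the remote peer sends us
    pc.ontrack = (event) => {
      document.querySelector('#remoteVideo').srcObject = event.streams[0];
    };

    navigator.mediaDevices.getUserMedia({ audio: true, video: true })
      .then((stream) => {
        stream.getTracks().forEach((track) => pc.addTrack(track, stream)); // local capture
        return pc.createOffer();                                           // SDP offer
      })
      .then((offer) => pc.setLocalDescription(offer))
      .then(() => sendToPeer({ type: 'offer', sdp: pc.localDescription }));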
On the contrary, it is not possible to have the media or data streams to leave the browser in the clear, without encryption. The use of plain RTP to transmit media is explicitly forbidden by the standard. Media is transmitted by SRTP (Secure RTP), where encryption keys are pre-exchanged via DTLS (Datagram Transport Layer Security, a version of TLS for Datagrams), basically a secure version of UDP. Beyond peer to peer – WebRTC to communication networks and services WebRTC is a technique for browsers to send media to each other via Internet, peer to peer, perhaps with the help of a relay server (TURN), if they can't reach each other directly. That's it. No directories, no means to find another person, and also no way to "call" that person if we know "where" to call her. No way to transfer calls, to react to a busy user or to a user that does not pickup, and so on. Let's say WebRTC is a half-built phone: It has the handset, complete with working microphone and speaker, from which it comes out, the wiring left loose. You can cross join that wiring with the wiring of another half-built phone, and they can talk to each other. Then, if you want to talk to another device, you must find it and then join the wires anew. No dial pad, no Telecom Central Office, no interconnection between Local Carriers, and with International Carriers. No PBX. No way to call your grandma, and no possibilities to navigate the IVR at Federal Express' Customer Care. We need to integrate the media capabilities and the ubiquity of WebRTC with the world of telecommunication services that constitute the planet's nervous system. Enter the "WebRTC Gateway" and the "WebRTC Application Server"; in our case both are embodied by FreeSWITCH WebRTC gateways and application servers The problem to be solved is: We can implement some kind of signaling plane, even implement a complete SIP signaling stack in JavaScript (there are some very good ones in open source, we'll see later), but then both at the network and at the media plane, WebRTC is only "kind of" compatible with the existing telecommunication world; it uses techniques and concepts that are "similar", and protocols that are mostly an "evolution " of those implemented in usual Voice over IP. At the network plane, WebRTC uses ICE protocol to traverse NAT via STUN and TURN servers. ICE has been developed as Internet standard to be the ultimate tool to solve all NAT problems, but has not yet been implemented in either telco infrastructure, nor in most VoIP clients. Also, ICE candidates (the various different addresses the browser thinks they would be reachable at) need to be passed in SDP and negotiated between peers, in the same way codecs are negotiated. Being able to pass through corporate firewalls (UDP blocked, TCP open only on ports 80 and 443, and perhaps through protocol-aware proxies) is an absolute necessity for serious WebRTC deployment. At media plane, WebRTC specific codecs (V8 for video and Opus for audio) are incompatible with the telco world, with audio G711 as the only common denominator. Worst yet, all media are encrypted as SRTP with DTLS key exchange, and that's unheard of in today's telco infrastructure. 
So, we need to create the signaling plane, and then convert the network transport, convert the codecs, manage the ICE candidates selection in SDP, and allow access to the wealth of ready-made services (PSTN calls, IVRs, PBXs, conference rooms, etc), and then complement the legacy services with special features and new interconnected services enabled by the unique capabilities of WebRTC endpoints. Yeah, that's a job for FreeSWITCH. Which architecture? Legacy on the Web, or Web on the Telco? Real-time communication via the Web: From the building blocks we just saw, we can implement it in many ways. We have one degree of freedom: Signaling. I mean, media will be anyway agreed about via SDP, transmitted via websockets as SRTP packets, and encrypted via DTLS key exchange. We still have the task to choose how we will find the peer to exchange media with. So, this is an exercise in directory, location, registration, routing, presence, status, etc. You get the idea. So, at the end of the day you need to come out with a JavaScript library to implement your signaling on the browsers, commanding their underlying mechanisms (Comet, Websockets, WebRTC Data Channel) to find your beloved communication peer. Actually it boils down to different possibilities: SIP XMPP (eg: jabber) In-house signaling implementation VERTO (open source) SIP and XMPP make today's world spin around. SIP is mostly known for carrying the majority of telephone and VoIP signaling traffic. The biggest implementations of instant messaging and chatting are based on XMPP. And there is more: Those two signaling protocols are often used together, although each one of them has extensions that provide the other one's functionality. Both SIP and XMPP have been designed to be expandable and modular, and SIP particularly is an abstract protocol, for the management of "sessions" (where a "session" can be whatever has a beginning and an end in time, as a voice or video call, a screen share, a whiteboard, a collaboration platform, a payment, a message, and so on). Both have robust JavaScript implementations available (for SIP check SIP.js, JsSIP, SIPML, while for XMPP check Strophe, stanza.io, jingle.js). If your company has considerable investments and/or expertise in those protocols, then it makes sense to expand their usage on the web too. If you're running Skype, or similar services, you may find it an attractive option to maintain your proprietary, closed-signaling protocol and implement it in JavaScript, so you can expand your service reach to browsers and exploit that common transport and media technologies. VERTO is our open source signaling proposal, designed from the ground up to be familiar to Web application developers, and allowing for a high degree of integration between FreeSWITCH-provided services and browsers. It is implemented on the FreeSWITCH side by a module (mod_verto) that talks JSON with the JavaScript library (verto.js) on the browser side. FreeSWITCH accommodates them ALL FreeSWITCH implements all of WebRTC low-level protocols, codecs and requirements. It's got encryption, SRTP, DTLS, RTP, websocket and secure websocket transports (ws:// and wss://). Having got it all, it is able to serve SIP endpoints over WebRTC via mod_sofia (they'll be just other SIP phones, exactly like the rest of soft and hard SIP phones), and it interacts with XMPP via mod_jingle. Crucially, FreeSWITCH has been designed since its inception to be able to manage and message high-definition media, both audio and video. 
Support for OPUS audio codec (8 up to 48 khz, enough for actual audio-cd quality) started years ago as a pioneering feature, and has evolved over the years to be so robust and self-healing as to sustain a loss of more than 40% (yep, as in FORTY PERCENT) packets and maintain understandability. WebRTC's V8 video codec is routinely carrying our mixed video conferences in FullHD (as in 1920x1080 pixel), and we're looking forward to investing in fiber and in some facial cream to look good in 4K. That's why FreeSWITCH can be the pivot of your next big WebRTC project: its architecture was designed from the start to be a multimedia powerhouse. There is lot of experience out there using FreeSWITCH in expanding the reach of existing SIP services having the browsers acting as SIP phones via JavaScript libraries, without modifying in any way the service logic and implementation. You just add SIP extensions that happen to be browsers. For the remainder of this article we'll write about VERTO, a FreeSWITCH proposal especially dedicated to Web development. What is Verto (module and jslib)? Verto is a FreeSWITCH module (mod_verto) that allows for JSON interaction with FreeSWITCH, via secure websockets (wss). All the power and complexity of FreeSWITCH can be harnessed via Verto: Session management, call control, text messaging, and user data exchange and synchronization. Take a note for yourself: "User data exchange and synchronization". We'll be back to this later. Verto is like Event Socket Layer (ESL) on steroids: Anything you can do in ESL (subscribe, send and receive messages in FS core message pumps/queues) you can do in Verto, but Verto is actually much more and can do much more. Verto is also made for high-level control of WebRTC! Verto has an accompanying JavaScript library, verto.js. Using verto.js a web developer can videoconference and enable a website and/or add a collaboration platform to a CRM system in few lines of code. And in a few lines of a code that he understands, in a logic that's familiar to web developers, without forcing references to foreign knowledge domains like SIP. Also, Verto allows for the simplest way to extend your existing SIP services to WebRTC browsers. The added benefit of "user data exchange and synchronization" (see, I'm back to it) is not to be taken lightly: You can create data structures (for example, in JSON) and have them synchronized on server and all clients, with each modification made by the client or server to be automatically, immediately and transparently reflected on all other clients. Imagine a dynamic list of conference participants, or a chat, or a stock ticker, or a multiuser ping pong game, and so on. Configure mod_verto Mod_verto is installed by default by standard FreeSWITCH implementation. Let's have a look at its configuration file, verto.conf.xml. The most important parameter here, and the only one I had to modify from the stock configuration file, is ext-rtp-ip. If your server is behind a NAT (that is, it sits on a private network and exchanges packets with the public internet via some sort of port forwarding by a router or firewall), you must set this parameter to the public IP address the clients are reaching for. Other very important parameters are the codec strings. Those two parameters determine the absolute string that will be used in SDP media negotiation. The list in the string will represent all the media formats to be proposed and accepted. 
WebRTC has mandatory (so, assured) support for vp8 video codec, while mandatory audio codecs are opus and pcmu/pcma (eg, g711). Pcmu and pcma are much less CPU hungry than opus. So, if you are willing to set for less quality (g711 is "old PSTN" audio quality), you can use "pcmu,pcma,vp8" as your strings, and have both clients and server use far less CPU power for audio processing. This can make a real difference and very much sense in certain setups, for example, if you must cope with low-power devices. Also, if you route/bridge calls to/from PSTN, they will have no use for opus high definition audio; much better to directly offer the original g711 stream than decode/recode it in opus. Test with Communicator Once configured, you want to test your mod_verto install. What better moment than now to get to know the awesomeness of Verto Communicator, a JavaScript videoconference and collaboration advanced client, developed by Italo Rossi, Jonatas Oliveira and Stefan Yohansson from Brazil, Joao Mesquita from Argentina, and our core devs Ken Rice and Brian West from Tennessee and Oklahoma? If it's not already done, copy Verto Communicator distribution directory (/usr/src/freeswitch.git/html5/verto/verto_communicator/dist/) into a directory served by your web server in SSL (be sure you got all the SSL certificates right). To see it in all its splendor, be sure to call from two different clients, one as simple participant, the other as moderator, and you'll be presented with controls to manage the conference layout, for giving floor, for screen sharing, for creating banners with name and title for each participant, for real-time chatting, and much more. It is simply astonishing what can be done with JavaScript and mod_verto. Summary In this article we delved in WerbRTC design, what infrastructure it requires, in what is similar and in what is different from known VoIP. We understood that WebRTC is only about media, and leave the signaling to the implementor. Also, we get the specific of WebRTC, its way to traverse NAT, its omnipresent encryption, its peer to peer nature. We witnessed going beyond peer to peer, connecting with the telecommunication world of services needs gateways that do transport, protocol and media translations. FreeSWITCH is the perfect fit as WebRTC server, WebRTC gateway, and also as application server. And then we saw how to implement Verto, a signaling born on WebRTC, a JSON web protocol designed to exploit the additional features of WerbRTC and of FreeSWITCH, like real time data structure synchronization, session rehydration, event systems, and so on. Resources for Article: Further resources on this subject: Configuring FreeSWITCH for WebRTC [article] Architecture of FreeSWITCH [article] FreeSWITCH 1.0.6: SIP and the User Directory [article]


Visualizing Time Spent Typing in Slack

Bradley Cicenas
25 Jul 2016
4 min read
Slack's massive popularity as a team messaging platform has brought up some age-old questions about productivity in the workplace. Does ease of communication really enable us to get more done day-to-day? Or is it just another distraction in the sea of our notification panel? Using the Slack RTM (Real-Time Messaging) API, we can follow just how much of our day we spend collaborating, making business-critical decisions, and sharing cat GIFs.

A word on the Real-Time Messaging API

Much of Slack's success can be attributed to the plethora of bots, integrations, and apps available for the platform. While many are built on the robust Web API, the Real-Time Messaging API provides a stream comprised of over 65 different events as they happen, making it an ideal choice for analyzing your own messaging habits. Event types include file uploads, emoji usage, user status, joining and leaving a channel, and many more. Since it's difficult to gauge how long we spend reading or thinking about conversations in Slack, we'll use a metric we do know with a bit of certainty: time spent typing. Fortunately, this is also a specific event type broadcast from the RTM API: user_typing. Unlike most web APIs, connections to the RTM API are made over a persistent websocket. We'll use the SlackSocket Python library to listen in on events as they come in.

Recording events

To start, we'll need to gather and record event data across a period of time. Creating a SlackSocket object filtered by event type is fairly straightforward:

from slacksocket import SlackSocket

slack = SlackSocket('<slack-token>', event_filters=['user_typing'])

Since we're only concerned with following a single type of event, an event_filter is added so that we won't have to read and filter every incoming message in our code. According to the documentation, a user_typing event is sent "on every key press in the chat input unless one has been sent in the last three seconds". For the sake of our analysis, we'll assume that each of these events accounts for three seconds of a user's time.

from datetime import datetime

for event in slack.events():
    now = datetime.now().timestamp()  # the current epoch timestamp
    with open('typing.csv', 'a') as of:
        of.write('%s,%s\n' % (now, event.event['user']))

Our typing will be logged in CSV format with a timestamp and the corresponding user that triggered the event.

Plotting with matplotlib

After we've collected a sufficient amount of data (a day in this case) on our typing events, we can plot it out in a separate script using matplotlib. We'll read in all of the data, filtering for our user:

from datetime import datetime
import matplotlib.pyplot as plt

with open('typing.csv') as of:
    data = [l.strip('\n').split(',') for l in of.readlines()]

x = []
y = []
for ts, user in data:
    if user == 'bradley':
        x.append(datetime.fromtimestamp(float(ts)))  # convert epoch timestamp to datetime object
        y.append(3)  # seconds of typing

Epoch timestamps are converted back into datetime objects to ensure that matplotlib can display them correctly along the x-axis. Create the plot and export it as a PNG:

plt.plot(x, y)
plt.gcf().autofmt_xdate()  # make the x-labels nicer for timestamps
plt.savefig('typing.png')

Results: Not a particularly eventful morning (at least until I'd had my coffee), but enough to infer that I'm rarely spending more than five minutes an hour here in active discussion. Another data point missing from our observation is the number of messages in comparison to the time spent typing.
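Before tackling that missing data point, the typing.csv file we already have can put a rough number on the "five minutes an hour" estimate without any plotting. The following is a minimal sketch that is not part of the original walkthrough; it assumes the epoch-timestamp,user format written by the recorder above and the same three-seconds-per-event approximation:

from collections import Counter
from datetime import datetime

SECONDS_PER_EVENT = 3  # same assumption the plotting script makes

per_hour = Counter()
with open('typing.csv') as f:
    for line in f:
        ts, user = line.strip('\n').split(',')
        if user != 'bradley':  # substitute your own username here
            continue
        # bucket each event into the hour it occurred in
        hour = datetime.fromtimestamp(float(ts)).strftime('%Y-%m-%d %H:00')
        per_hour[hour] += SECONDS_PER_EVENT

for hour, seconds in sorted(per_hour.items()):
    print('%s  %.1f minutes typing' % (hour, seconds / 60.0))

Anything much beyond a few minutes in a given hour would point to a noticeably chattier stretch than the morning plotted above.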
If a message was rewritten or partially written and retracted, this could account for quite a bit of typing time without producing much in terms of message content. A playground for analytics There's quite a bit of fun and insight to be had watching just this single user_typing event. Likewise, tracking any number of the 65+ other events broadcast by Slack’s RTM API works well to create an interesting and multi-layered dataset ripe for analysis. The code for SlackSocket is available on GitHub and, as always, we welcome any contributions or feature requests! About the author Bradley Cicenas is a New York City-based infrastructure engineer with an affinity for microservices, systems design, data science, and stoops.


Detecting and Protecting against Your Enemies

Packt
22 Jul 2016
9 min read
In this article by Matthew Poole, the author of the book Raspberry Pi for Secret Agents - Third Edition, we will discuss how Raspberry Pi has lots of ways of connecting things to it, such as plugging things into the USB ports, connecting devices to the onboard camera and display ports and to the various interfaces that make up the GPIO (General Purpose Input/Output) connector. As part of our detection and protection regime we'll be focusing mainly on connecting things to the GPIO connector. (For more resources related to this topic, see here.) Build a laser trip wire You may have seen Wallace and Grommet's short film, The Wrong Trousers, where the penguin uses a contraption to control Wallace in his sleep, making him break into a museum to steal the big shiny diamond. The diamond is surrounded by laser beams but when one of the beams is broken the alarms go off and the diamond is protected with a cage! In this project, I'm going to show you how to set up a laser beam and have our Raspberry Pi alert us when the beam is broken—aka a laser trip wire. For this we're going to need to use a Waveshare Laser Sensor module (www.waveshare.com), which is readily available to buy on Amazon for around £10 / $15. The module comes complete with jumper wires, that allows us to easily connect it to the GPIO connector in the Pi: The Waveshare laser sensor module contains both the transmitter and receiver How it works The module contains both a laser transmitter and receiver. The laser beam is transmitted from the gold tube on the module at a particular modulating frequency. The beam will then be reflected off a surface such as a wall or skirting board and picked up by the light sensor lens at the top of the module. The receiver will only detect light that is modulated at the same frequency as the laser beam, and so does not get affected by visible light. This particular module works best when the reflective surface is between 80 and 120 cm away from the laser transmitter. When the beam is interrupted and prevented from reflecting back to the receiver this is detected and the data pin will be triggered. A script monitoring the data pin on the Pi will then do something when it detects this trigger. Important: Don't ever look directly into the laser beam as will hurt your eyes and may irreversibly damage them. Make sure the unit is facing away from you when you wire it up. Wiring it up This particular device runs from a power supply of between 2.5 V and 5.0 V. Since our GPIO inputs require 3.3 V maximum when a high level is input, we will use the 3.3 V supply from our Raspberry Pi to power the device: Wiring diagram for the laser sensor module Connect the included 3-hole connector to the three pins at the bottom of the laser module with the red wire on the left (the pin marked VCC). Referring to the earlier GPIO pin-out diagram, connect the yellow wire to pin 11 of the GPIO connector (labeled D0/GPIO 17). Connect the black wire to pin 6 of the GPIO connector (labeled GND/0V) Connect the red wire to pin 1 of the GPIO connector (3.3 V). The module should now come alive. The red LED on the left of the module will come on if the beam is interrupted. This is what it should look like in real-life: The laser module connected to the Raspberry Pi Writing the detection script Now that we have connected the laser sensor module to our Raspberry Pi, we need to write a little script that will detect when the beam has been broken. 
In this project we've connected our sensor output to D0, which is GPIO17 (refer to the earlier GPIO pin-out diagram). We need to create file access for the pin by entering the command:

pi@raspberrypi ~ $ sudo echo 17 > /sys/class/gpio/export

And now set its direction to "in":

pi@raspberrypi ~ $ sudo echo in > /sys/class/gpio/gpio17/direction

We're now ready to read its value, and we can do this with the following command:

pi@raspberrypi ~ $ sudo cat /sys/class/gpio/gpio17/value

You'll notice that it will have returned "1" (digital high state) if the beam reflection is detected, or a "0" (digital low state) if the beam is interrupted. We can create a script to poll for the beam state:

#!/bin/bash
sudo echo 17 > /sys/class/gpio/export
sudo echo in > /sys/class/gpio/gpio17/direction
# loop forever
while true
do
    # read the beam state
    BEAM=$(sudo cat /sys/class/gpio/gpio17/value)
    if [ $BEAM == 1 ]; then
        # beam not blocked
        echo "OK"
    else
        # beam was broken
        echo "ALERT"
    fi
done

Code listing for beam-sensor.sh

When you run the script you should see OK scroll up the screen. Now interrupt the beam using your hand and you should see ALERT scroll up the console screen until you remove your hand. Don't forget that once we've finished with the GPIO port, it's tidy to remove its file access:

pi@raspberrypi ~ $ sudo echo 17 > /sys/class/gpio/unexport

We've now seen how to easily read a GPIO input. The same wiring principle and script can be used to read other sensors, such as motion detectors or anything else that has an on and off state, and act upon their status.

Protecting an entire area

Our laser trip wire is great for being able to detect when someone walks through a doorway or down a corridor, but what if we wanted to know if people are in a particular area or a whole room? Well, we can with a basic motion sensor, otherwise known as a passive infrared (PIR) detector. These detectors come in a variety of types, and you may have seen them lurking in the corners of rooms, but fundamentally they all work the same way: by detecting the presence of body heat in relation to the background temperature within a certain area. They are commonly used to trigger alarm systems when somebody (or something, such as the pet cat) has entered a room. For the covert surveillance of our private zone we're going to use a small Parallax PIR Sensor, available from many online Pi-friendly stores such as ModMyPi, Robot Shop or Adafruit for less than £10 / $15. This little device will detect the presence of enemies within a 10 meter range of it. If you can't obtain one of these types then there are other types that will work just as well, but the wiring might be different to that explained in this project.

Parallax passive infrared motion sensor

Wiring it up

As with our laser sensor module, this device also just needs three wires to connect it to the Raspberry Pi. However, they are connected differently on the sensor, as shown below:

Wiring diagram for the Parallax PIR motion sensor module

Referring to the earlier GPIO pin-out diagram, connect the yellow wire to pin 11 of the GPIO connector (labelled D0/GPIO 17), with the other end connecting to the OUT pin on the PIR module.
Connect the black wire to pin 6 of the GPIO connector (labelled GND/0V), with the other end connecting to the GND pin on the PIR module.
Connect the red wire to pin 1 of the GPIO connector (3.3 V), with the other end connecting to the VCC pin on the module.
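With the PIR module using exactly the same data line as the laser module (D0 to GPIO17), this is a convenient point for a short aside: the polling loop does not have to be written in bash. The following is a minimal Python sketch, not taken from the book, that reads the same /sys/class/gpio/gpio17/value file. It assumes the export and direction commands shown earlier have been run for this pin and that the script is run with sudo (or with the GPIO file permissions adjusted), and it works for either sensor; only the meaning of the 1 and 0 states differs between the two:

#!/usr/bin/env python
# gpio17-poll.py - a sketch equivalent to the bash polling scripts
# Prerequisites (run as root beforehand):
#   echo 17 > /sys/class/gpio/export
#   echo in > /sys/class/gpio/gpio17/direction
import time

VALUE_FILE = '/sys/class/gpio/gpio17/value'

def read_state():
    # returns '1' or '0' as reported by the kernel's GPIO sysfs interface
    with open(VALUE_FILE) as f:
        return f.read().strip()

while True:
    state = read_state()
    # Laser module: '1' means the beam is intact, '0' means it was broken.
    # PIR module:   '0' means no motion, '1' means motion was detected.
    print('GPIO17 state: %s' % state)
    time.sleep(0.1)  # small delay so we don't hammer the CPU

From here it is a small step to replace the print call with whatever action you need, such as logging a timestamp or sounding an alert, using the state interpretations given in the bash scripts.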
The module should now come alive, and you'll notice the light switching on and off as it detects your movement around it. This is what it should look like for real:

PIR motion sensor connected to Raspberry Pi

Implementing the detection script

The detection script for the PIR motion sensor is similar to the one we created for the laser sensor module in the previous section. Once again, we've connected our sensor output to D0, which is GPIO17. We create file access for the pin by entering the command:

pi@raspberrypi ~ $ sudo echo 17 > /sys/class/gpio/export

And now set its direction to "in":

pi@raspberrypi ~ $ sudo echo in > /sys/class/gpio/gpio17/direction

We're now ready to read its value, and we can do this with the following command:

pi@raspberrypi ~ $ sudo cat /sys/class/gpio/gpio17/value

You'll notice that this time the PIR module will have returned "1" (digital high state) if motion is detected, or a "0" (digital low state) if there is no motion detected. We can modify our previous script to poll for the motion-detected state:

#!/bin/bash
sudo echo 17 > /sys/class/gpio/export
sudo echo in > /sys/class/gpio/gpio17/direction
# loop forever
while true
do
    # read the sensor state
    BEAM=$(sudo cat /sys/class/gpio/gpio17/value)
    if [ $BEAM == 0 ]; then
        # no motion detected
        echo "OK"
    else
        # motion was detected
        echo "INTRUDER!"
    fi
done

Code listing for motion-sensor.sh

When you run the script you should see OK scroll up the screen if everything is nice and still. Now move in front of the PIR's detection area and you should see INTRUDER! scroll up the console screen until you are still again. Again, don't forget that once we've finished with the GPIO port we should remove its file access:

pi@raspberrypi ~ $ sudo echo 17 > /sys/class/gpio/unexport

Summary

In this article we took a guided tour of the Raspberry Pi's GPIO connector and how to safely connect peripherals to it, connecting a laser sensor module to our Pi to create a rather cool laser trip wire that can alert you when the laser beam is broken.

Resources for Article:

Further resources on this subject: Building Our First Poky Image for the Raspberry Pi [article] Raspberry Pi LED Blueprints [article] Raspberry Pi Gaming Operating Systems [article]

Debugging Your .NET Application

Packt
21 Jul 2016
13 min read
In this article by Jeff Martin, author of the book Visual Studio 2015 Cookbook - Second Edition, we will discuss about how but modern software development still requires developers to identify and correct bugs in their code. The familiar edit-compile-test cycle is as familiar as a text editor, and now the rise of portable devices has added the need to measure for battery consumption and optimization for multiple architectures. Fortunately, our development tools continue to evolve to combat this rise in complexity, and Visual Studio continues to improve its arsenal. (For more resources related to this topic, see here.) Multi-threaded code and asynchronous code are probably the two most difficult areas for most developers to work with, and also the hardest to debug when you have a problem like a race condition. A race condition occurs when multiple threads perform an operation at the same time, and the order in which they execute makes a difference to how the software runs or the output is generated. Race conditions often result in deadlocks, incorrect data being used in other calculations, and random, unrepeatable crashes. The other painful area to debug involves code running on other machines, whether it is running locally on your development machine or running in production. Hooking up a remote debugger in previous versions of Visual Studio has been less than simple, and the experience of debugging code in production was similarly frustrating. In this article, we will cover the following sections: Putting Diagnostic Tools to work Maximizing everyday debugging Putting Diagnostic Tools to work In Visual Studio 2013, Microsoft debuted a new set of tools called the Performance and Diagnostics hub. With VS2015, these tools have revised further, and in the case of Diagnostic Tools, promoted to a central presence on the main IDE window, and is displayed, by default, during debugging sessions. This is great for us as developers, because now it is easier than ever to troubleshoot and improve our code. In this section, we will explore how Diagnostic Tools can be used to explore our code, identify bottlenecks, and analyze memory usage. Getting ready The changes didn't stop when VS2015 was released, and succeeding updates to VS2015 have further refined the capabilities of these tools. So for this section, ensure that Update 2 has been installed on your copy of VS2015. We will be using Visual Studio Community 2015, but of course, you may use one of the premium editions too. How to do it… For this section, we will put together a short program that will generate some activity for us to analyze: Create a new C# Console Application, and give it a name of your choice. In your project's new Program.cs file, add the following method that will generate a large quantity of strings: static List<string> makeStrings() { List<string> stringList = new List<string>(); Random random = new Random(); for (int i = 0; i < 1000000; i++) { string x = "String details: " + (random.Next(1000, 100000)); stringList.Add(x); } return stringList; } Next we will add a second static method that produces an SHA256-calculated hash of each string that we generated. This method reads in each string that was previously generated, creates an SHA256 hash for it, and returns the list of computed hashes in the hex format. 
static List<string> hashStrings(List<string> srcStrings) { List<string> hashedStrings = new List<string>(); SHA256 mySHA256 = SHA256Managed.Create(); StringBuilder hash = new StringBuilder(); foreach (string str in srcStrings) { byte[] srcBytes = mySHA256.ComputeHash(Encoding.UTF8.GetBytes(str), 0, Encoding.UTF8.GetByteCount(str)); foreach (byte theByte in srcBytes) { hash.Append(theByte.ToString("x2")); } hashedStrings.Add(hash.ToString()); hash.Clear(); } mySHA256.Clear(); return hashedStrings; } After adding these methods, you may be prompted to add using statements for System.Text and System.Security.Cryptography. These are definitely needed, so go ahead and take Visual Studio's recommendation to have them added. Now we need to update our Main method to bring this all together. Update your Main method to have the following: static void Main(string[] args) { Console.WriteLine("Ready to create strings"); Console.ReadKey(true); List<string> results = makeStrings(); Console.WriteLine("Ready to Hash " + results.Count() + " strings "); //Console.ReadKey(true); List<string> strings = hashStrings(results); Console.ReadKey(true); } Before proceeding, build your solution to ensure everything is in working order. Now run the application in the Debug mode (F5), and watch how our program operates. By default, the Diagnostic Tools window will only appear while debugging. Feel free to reposition your IDE windows to make their presence more visible or use Ctrl + Alt + F2 to recall it as needed. When you first launch the program, you will see the Diagnostic Tools window appear. Its initial display resembles the following screenshot. Thanks to the first ReadKey method, the program will wait for us to proceed, so we can easily see the initial state. Note that CPU usage is minimal, and memory usage holds constant. Before going any further, click on the Memory Usage tab, and then the Take Snapshot command as indicated in the preceding screenshot. This will record the current state of memory usage by our program, and will be a useful comparison point later on. Once a snapshot is taken, your Memory Usage tab should resemble the following screenshot: Having a forced pause through our ReadKey() method is nice, but when working with real-world programs, we will not always have this luxury. Breakpoints are typically used for situations where it is not always possible to wait for user input, so let's take advantage of the program's current state, and set two of them. We will put one to the second WriteLine method, and one to the last ReadKey method, as shown in the following screenshot: Now return to the open application window, and press a key so that execution continues. The program will stop at the first break point, which is right after it has generated a bunch of strings and added them to our List object. Let's take another snapshot of the memory usage using the same manner given in Step 9. You may also notice that the memory usage displayed in the Process Memory gauge has increased significantly, as shown in this screenshot: Now that we have completed our second snapshot, click on Continue in Visual Studio, and proceed to the next breakpoint. The program will then calculate hashes for all of the generated strings, and when this has finished, it will stop at our last breakpoint. Take another snapshot of the memory usage. Also take notice of how the CPU usage spiked as the hashes were being calculated: Now that we have these three memory snapshots, we will examine how they can help us. 
You may notice how memory usage increases during execution, especially from the initial snapshot to the second. Click on the second snapshot's object delta, as shown in the following screenshot: On clicking, this will open the snapshot details in a new editor window. Click on the Size (Bytes) column to sort by size, and as you may suspect, our List<String> object is indeed the largest object in our program. Of course, given the nature of our sample program, this is fairly obvious, but when dealing with more complex code bases, being able to utilize this type of investigation is very helpful. The following screenshot shows the results of our filter: If you would like to know more about the object itself (perhaps there are multiple objects of the same type), you can use the Referenced Types option as indicated in the preceding screenshot. If you would like to try this out on the sample program, be sure to set a smaller number in the makeStrings() loop, otherwise you will run the risk of overloading your system. Returning to the main Diagnostic Tools window, we will now examine CPU utilization. While the program is executing the hashes (feel free to restart the debugging session if necessary), you can observe where the program spends most of its time: Again, it is probably no surprise that most of the hard work was done in the hashStrings() method. But when dealing with real-world code, it will not always be so obvious where the slowdowns are, and having this type of insight into your program's execution will make it easier to find areas requiring further improvement. When using the CPU profiler in our example, you may find it easier to remove the first breakpoint and simply trigger a profiling by clicking on Break All as shown in this screenshot: How it works... Microsoft wanted more developers to be able to take advantage of their improved technology, so they have increased its availability beyond the Professional and Enterprise editions to also include Community. Running your program within VS2015 with the Diagnostic Tools window open lets you examine your program's performance in great detail. By using memory snapshots and breakpoints, VS2015 provides you with the tools needed to analyze your program's operation, and determine where you should spend your time making optimizations. There's more… Our sample program does not perform a wide variety of tasks, but of course, more complex programs usually perform well. To further assist with analyzing those programs, there is a third option available to you beyond CPU Usage and Memory Usage: the Events tab. As shown in the following screenshot, the Events tab also provides the ability to search events for interesting (or long-running) activities. Different event types include file activity, gestures (for touch-based apps), and program modules being loaded or unloaded. Maximizing everyday debugging Given the frequency of debugging, any refinement to these tools can pay immediate dividends. VS 2015 brings the popular Edit and Continue feature into the 21st century by supporting a 64-bit code. Added to that is the new ability to see the return value of functions in your debugger. The addition of these features combine to make debugging code easier, allowing to solve problems faster. Getting ready For this section, you can use VS 2015 Community or one of the premium editions. Be sure to run your choice on a machine using a 64-bit edition of Windows, as that is what we will be demonstrating in the section. 
Don't worry, you can still use Edit and Continue with 32-bit C# and Visual Basic code. How to do it… Both features are now supported by C#/VB, but we will be using C# for our examples. The features being demonstrated are compiler features, so feel free to use code from one of your own projects if you prefer. To see how Edit and Continue can benefit 64-bit development, perform the following steps: Create a new C# Console Application using the default name. To ensure the demonstration is running with 64-bit code, we need to change the default solution platform. Click on the drop-down arrow next to Any CPU, and select Configuration Manager... When the Configuration Manager dialog opens, we can create a new project platform targeting a 64-bit code. To do this, click on the drop-down menu for Platform, and select <New...>: When <New...> is selected, it will present the New Project Platform dialog box. Select x64 as the new platform type: Once x64 has been selected, you will return to Configuration Manager. Verify that x64 remains active under Platform, and then click on Close to close this dialog. The main IDE window will now indicate that x64 is active: With the project settings out of the face, let's add some code to demonstrate the new behavior. Replace the existing code in your blank class file so that it looks like the following listing: class Program { static void Main(string[] args) { int w = 16; int h = 8; int area = calcArea(w, h); Console.WriteLine("Area: " + area); } private static int calcArea(int width, int height) { return width / height; } } Let's set some breakpoints so that we are able to inspect during execution. First, add a breakpoint to the Main method's Console line. Add a second breakpoint to the calcArea method's return line. You can do this by either clicking on the left side of the editor window's border, or by right-clicking on the line, and selecting Breakpoint | Insert Breakpoint: If you are not sure where to click, use the right-click method, and then practice toggling the breakpoint by left-clicking on the breakpoint marker. Feel free to use whatever method you find most convenient. Once the two breakpoints are added, Visual Studio will mark their location as shown in the following screenshot (the arrow indicates where you may click to toggle the breakpoint): With the breakpoint marker now set, let's debug the program. Begin debugging by either pressing F5, or by clicking on the Start button on the toolbar: Once debugging starts, the program will quickly execute until stopped by the first breakpoint. Let's first take a look at Edit and Continue. Visual Studio will stop at the calcArea method's return line. Astute readers will notice an error (marked by 1 in the following screenshot) present in the calculation, as the area value returned should be width * height. Make the correction. Before continuing, note the variables listed in the Autos window (marked by 2 in the following screenshot). (If you don't see Autos, it can be made visible by pressing Ctrl + D, A, or through Debug | Windows | Autos while debugging.) After correcting the area calculation, advance the debugging step by pressing F10 twice. (Alternatively make the advancement by selecting the menu item Debug | Step Over twice). Visual Studio will advance to the declaration for the area. Note that you were able to edit your code and continue debugging without restarting. 
The Autos window will update to display the function's return value, which is 128 (the value for area has not been assigned yet in the following screenshot—Step Over once more if you would like to see that assigned): There's more… Programmers who write C++ have already had the ability to see the return values of functions—this just brings .NET developers into the fold. The result is that your development experience won't have to suffer based on the language you have chosen to use for your project. The Edit and Continue functionality is also available for ASP.NET projects. New projects created on VS2015 will have Edit and Continue enabled by default. Existing projects imported to VS2015 will usually need this to be enabled if it hasn't been done already. To do so, open the Options dialog via Tools | Options, and look for the Debugging | General section. The following screenshot shows where this option is located on the properties page: Whether you are working with an ASP.NET project or a regular C#/VB .NET application, you can verify Edit and Continue is set via this location. Summary In this article, we examine the improvements to the debugging experience in Visual Studio 2015, and how it can help you diagnose the root cause of a problem faster so that you can fix it properly, and not just patch over the symptoms. Resources for Article:   Further resources on this subject: Creating efficient reports with Visual Studio [article] Creating efficient reports with Visual Studio [article] Connecting to Microsoft SQL Server Compact 3.5 with Visual Studio [article]


Managing EAP in Domain Mode

Packt
19 Jul 2016
7 min read
This article by Francesco Marchioni, author of the book Mastering JBoss Enterprise Application Platform 7, dives deep into application server management using the domain mode and its main components, and discusses how to shift to advanced configurations that resemble real-world projects. The main topics covered are:

Domain mode breakdown
Handy domain properties
Electing the domain controller

(For more resources related to this topic, see here.)

Domain mode break down

Managing the application server in the domain mode means, in a nutshell, controlling multiple servers from a centralized single point of control. The servers that are part of the domain can span across multiple machines (or even across the cloud) and they can be grouped with similar servers of the domain to share a common configuration. To make sense of this, we will break down the domain components into two main categories:

Physical components: These are the domain elements that can be identified with a Java process running on the operating system

Logical components: These are the domain elements which can span across several physical components

Domain physical components

When you start the application server through the domain.sh script, you will be able to identify the following processes:

Host controller: Each domain installation contains a host controller. This is a Java process that is in charge of starting and stopping the servers that are defined within the host.xml file. The host controller is only aware of the items that are specific to the local physical installation, such as the domain controller host and port, the JVM settings of the servers, or their system properties.

Domain controller: One host controller of the domain (and only one) is configured to act as the domain controller. This means basically two things: keeping the domain configuration (in the domain.xml file) and assisting the host controllers in managing the servers of the domain.

Servers: Each host controller can contain any number of servers, which are the actual server instances. These server instances cannot be started autonomously. The host controller is in charge of starting and stopping single servers when the domain controller commands it to.

If you start the default domain configuration on a Linux machine, you will see that the following processes will show in your operating system:

As you can see, the process controller is identified by the [Process Controller] label, while the domain controller corresponds to the [Host Controller] label. Each server shows in the process table with the name defined in the host.xml file. You can use common operating system commands such as grep to further restrict the search to a specific process.

Domain logical components

A domain configuration with only physical elements in it would not add much to a line of standalone servers. The following components can abstract the domain definition, making it dynamic and flexible:

Server Group: A server group is a collection of servers. They are defined in the domain.xml file, hence they don't have any reference to an actual host controller installation. You can use a server group to share configuration and deployments across a group of servers.

Profile: A profile is an EAP configuration. A domain can hold as many profiles as you need. Out of the box the following configurations are provided:

default: This configuration matches the standalone.xml configuration (in standalone mode), hence it does not include JMS, IIOP, or HA.
full: This configuration matches the standalone-full.xml configuration (in standalone mode), hence it adds JMS and OpenJDK IIOP to the default server.

ha: This configuration matches the standalone-ha.xml configuration (in standalone mode), so it enhances the default configuration with clustering (HA).

full-ha: This configuration matches the standalone-full-ha.xml configuration (in standalone mode), hence it includes JMS, IIOP, and HA.

Handy domain properties

So far we have learnt the default configuration files used by JBoss EAP and the location where they are placed. These settings can, however, be varied by means of system properties. The following options customize the domain configuration file names:

--domain-config: The domain configuration file (default domain.xml)
--host-config: The host configuration file (default host.xml)

The following properties adjust the domain directory structure:

jboss.domain.base.dir: The base directory for domain content
jboss.domain.config.dir: The base configuration directory
jboss.domain.data.dir: The directory used for persistent data file storage
jboss.domain.log.dir: The directory containing the host-controller.log and process-controller.log files
jboss.domain.temp.dir: The directory used for temporary file storage
jboss.domain.deployment.dir: The directory used to store deployed content
jboss.domain.servers.dir: The directory containing the managed server instances

For example, you can start EAP 7 in domain mode using the domain configuration file mydomain.xml and the host file named myhost.xml, based on the base directory /home/jboss/eap7domain, with the following command:

$ ./domain.sh --domain-config=mydomain.xml --host-config=myhost.xml -Djboss.domain.base.dir=/home/jboss/eap7domain

Electing the domain controller

Before creating your first domain, we will learn in more detail about the process that connects one or more host controllers to the domain controller, and how a host controller is elected to be the domain controller. The physical topology of the domain is stored in the host.xml file. Within this file, you will find as the first line the host controller name, which makes each host controller unique:

<host name="master">

One of the host controllers will be configured to act as the domain controller.
This is done in the domain-controller section with the following block, which states that the domain controller is the host controller itself (hence, local):

<domain-controller>
    <local/>
</domain-controller>

All other host controllers will connect to the domain controller, using the following example configuration, which uses the jboss.domain.master.address and jboss.domain.master.port properties to specify the domain controller address and port:

<domain-controller>
    <remote protocol="remote" host="${jboss.domain.master.address}" port="${jboss.domain.master.port:9999}" security-realm="ManagementRealm"/>
</domain-controller>

The host controller to domain controller communication happens behind the scenes through a management native port that is also defined in the host.xml file:

<management-interfaces>
    <native-interface security-realm="ManagementRealm">
        <socket interface="management" port="${jboss.management.native.port:9999}"/>
    </native-interface>
    <http-interface security-realm="ManagementRealm" http-upgrade-enabled="true">
        <socket interface="management" port="${jboss.management.http.port:9990}"/>
    </http-interface>
</management-interfaces>

The other highlighted attribute is the management HTTP port, which can be used by the administrator to reach the domain controller. This port is especially relevant if the host controller is the domain controller. Both sockets use the management interface, which is defined in the interfaces section of the host.xml file and exposes the domain controller on a network-available address:

<interfaces>
    <interface name="management">
        <inet-address value="${jboss.bind.address.management:127.0.0.1}"/>
    </interface>
    <interface name="public">
        <inet-address value="${jboss.bind.address:127.0.0.1}"/>
    </interface>
</interfaces>

If you want to run multiple host controllers on the same machine, you need to provide a unique jboss.management.native.port for each host controller, or a different jboss.bind.address.management.

Summary

In this article we covered some essentials of the domain mode: its breakdown into physical and logical components, handy domain properties, and electing the domain controller.

Resources for Article:

Further resources on this subject: Red5: A video-on-demand Flash Server [article] Animating Elements [article] Data Science with R [article]


Reactive Programming with C#

Packt
18 Jul 2016
30 min read
In this article by Antonio Esposito from the book Reactive Programming for .NET Developers , we will see a practical example of what is reactive programming with pure C# coding. The following topics will be discussed here: IObserver interface IObservable interface Subscription life cycle Sourcing events Filtering events Correlating events Sourcing from CLR streams Sourcing from CLR enumerables (For more resources related to this topic, see here.) IObserver interface This core level interface is available within the Base Class Library (BCL) of .NET 4.0 and is available for the older 3.5 as an add-on. The usage is pretty simple and the goal is to provide a standard way of handling the most basic features of any reactive message consumer. Reactive messages flow by a producer and a consumer and subscribe for some messages. The IObserver C# interface is available to construct message receivers that comply with the reactive programming layout by implementing the three main message-oriented events: a message received, an error received, and a task completed message. The IObserver interface has the following sign and description: // Summary: // Provides a mechanism for receiving push-based notifications. // // Type parameters: // T: // The object that provides notification information.This type parameter is // contravariant. That is, you can use either the type you specified or any // type that is less derived. For more information about covariance and contravariance, // see Covariance and Contravariance in Generics. public interface IObserver<in T> { // Summary: // Notifies the observer that the provider has finished sending push-based notifications. void OnCompleted(); // // Summary: // Notifies the observer that the provider has experienced an error condition. // // Parameters: // error: // An object that provides additional information about the error. void OnError(Exception error); // // Summary: // Provides the observer with new data. // // Parameters: // value: // The current notification information. void OnNext(T value); } Any new message to flow to the receiver implementing such an interface will reach the OnNext method. Any error will reach the OnError method, while the task completed acknowledgement message will reach the OnCompleted method. The usage of an interface means that we cannot use generic premade objects from the BCL. We need to implement any receiver from scratch by using such an interface as a service contract. Let's see an example because talking about a code example is always simpler than talking about something theoretic. The following examples show how to read from a console application command from a user in a reactive way: cass Program { static void Main(string[] args) { //creates a new console input consumer var consumer = new ConsoleTextConsumer(); while (true) { Console.WriteLine("Write some text and press ENTER to send a messagernPress ENTER to exit"); //read console input var input = Console.ReadLine(); //check for empty messate to exit if (string.IsNullOrEmpty(input)) { //job completed consumer.OnCompleted(); Console.WriteLine("Task completed. 
Any further message will generate an error"); } else { //route the message to the consumer consumer.OnNext(input); } } } } public class ConsoleTextConsumer : IObserver<string> { private bool finished = false; public void OnCompleted() { if (finished) { OnError(new Exception("This consumer already finished it's lifecycle")); return; } finished = true; Console.WriteLine("<- END"); } public void OnError(Exception error) { Console.WriteLine("<- ERROR"); Console.WriteLine("<- {0}", error.Message); } public void OnNext(string value) { if (finished) { OnError(new Exception("This consumer finished its lifecycle")); return; } //shows the received message Console.WriteLine("-> {0}", value); //do something //ack the caller Console.WriteLine("<- OK"); } } The preceding example shows the IObserver interface usage within the ConsoleTextConsumer class that simply asks a command console (DOS-like) for the user input text to do something. In this implementation, the class simply writes out the input text because we simply want to look at the reactive implementation. The first important concept here is that a message consumer knows nothing about how messages are produced. The consumer simply reacts to one of the tree events (not CLR events). Besides this, some kind of logic and cross-event ability is also available within the consumer itself. In the preceding example, we can see that the consumer simply showed any received message again on the console. However, if a complete message puts the consumer in a finished state (by signaling the finished flag), any other message that comes on the OnNext method will be automatically routed to the error one. Likewise, any other complete message that will reach the consumer will produce another error once the consumer is already in the finished state. IObservable interface The IObservable interface, the opposite of the IObserver interface, has the task of handling message production and the observer subscription. It routes right messages to the OnNext message handler and errors to the OnError message handler. At its life cycle end, it acknowledges all the observers on the OnComplete message handler. To create a valid reactive observable interface, we must write something that is not locking against user input or any other external system input data. The observable object acts as an infinite message generator, something like an infinite enumerable of messages; although in such cases, there is no enumeration. Once a new message is available somehow, observer routes it to all the subscribers. In the following example, we will try creating a console application to ask the user for an integer number and then route such a number to all the subscribers. Otherwise, if the given input is not a number, an error will be routed to all the subscribers. This is observer similar to the one already seen in the previous example. 
Take a look at the following code:

/// <summary> /// Consumes numeric values that divide by a given number without a remainder /// </summary> public class IntegerConsumer : IObserver<int> { readonly int validDivider; //the constructor asks for a divider public IntegerConsumer(int validDivider) { this.validDivider = validDivider; } private bool finished = false; public void OnCompleted() { if (finished) OnError(new Exception("This consumer already finished its lifecycle")); else { finished = true; Console.WriteLine("{0}: END", GetHashCode()); } } public void OnError(Exception error) { Console.WriteLine("{0}: {1}", GetHashCode(), error.Message); } public void OnNext(int value) { if (finished) OnError(new Exception("This consumer finished its lifecycle")); //the simple business logic is made by checking divider result else if (value % validDivider == 0) Console.WriteLine("{0}: {1} divisible by {2}", GetHashCode(), value, validDivider); } }

This observer consumes integer numeric messages, but it requires that the number is divisible by another one without any remainder. This logic, because of the encapsulation principle, lives within the observer object. The observable, instead, only contains the logic for sending valid or error messages. Here, the filtering logic sits within the receiver itself. Although there is nothing wrong with that, in more complex applications, specific filtering features are available in the publish-subscribe communication pipeline. In other words, another object will be available between the observable (publisher) and the observer (subscriber) that will act as a message filter. Back to our numeric example, here is the observable implementation, made using an inner Task that does the main job of parsing input text and sending messages. 
In addition, a cancellation token is available to handle the user cancellation request and an eventual observable dispose: //Observable able to parse strings from the Console //and route numeric messages to all subscribers public class ConsoleIntegerProducer : IObservable<int>, IDisposable { //the subscriber list private readonly List<IObserver<int>> subscriberList = new List<IObserver<int>>(); //the cancellation token source for starting stopping //inner observable working thread private readonly CancellationTokenSource cancellationSource; //the cancellation flag private readonly CancellationToken cancellationToken; //the running task that runs the inner running thread private readonly Task workerTask; public ConsoleIntegerProducer() { cancellationSource = new CancellationTokenSource(); cancellationToken = cancellationSource.Token; workerTask = Task.Factory.StartNew(OnInnerWorker, cancellationToken); } //add another observer to the subscriber list public IDisposable Subscribe(IObserver<int> observer) { if (subscriberList.Contains(observer)) throw new ArgumentException("The observer is already subscribed to this observable"); Console.WriteLine("Subscribing for {0}", observer.GetHashCode()); subscriberList.Add(observer); return null; } //this code executes the observable infinite loop //and routes messages to all observers on the valid //message handler private void OnInnerWorker() { while (!cancellationToken.IsCancellationRequested) { var input = Console.ReadLine(); int value; foreach (var observer in subscriberList) if (string.IsNullOrEmpty(input)) break; else if (input.Equals("EXIT")) { cancellationSource.Cancel(); break; } else if (!int.TryParse(input, out value)) observer.OnError(new FormatException("Unable to parse given value")); else observer.OnNext(value); } cancellationToken.ThrowIfCancellationRequested(); } //cancel main task and ack all observers //by sending the OnCompleted message public void Dispose() { if (!cancellationSource.IsCancellationRequested) { cancellationSource.Cancel(); while (!workerTask.IsCanceled) Thread.Sleep(100); } cancellationSource.Dispose(); workerTask.Dispose(); foreach (var observer in subscriberList) observer.OnCompleted(); } //wait until the main task completes or went cancelled public void Wait() { while (!(workerTask.IsCompleted || workerTask.IsCanceled)) Thread.Sleep(100); } } To complete the example, here there is the program Main: static void Main(string[] args) { //this is the message observable responsible of producing messages using (var observer = new ConsoleIntegerProducer()) //those are the message observer that consume messages using (var consumer1 = observer.Subscribe(new IntegerConsumer(2))) using (var consumer2 = observer.Subscribe(new IntegerConsumer(3))) using (var consumer3 = observer.Subscribe(new IntegerConsumer(5))) observer.Wait(); Console.WriteLine("END"); Console.ReadLine(); } The cancellationToken.ThrowIfCancellationRequested may raise an exception in your Visual Studio when debugging. Simply go next by pressing F5, or test such code example without the attached debugger by starting the test with Ctrl + F5 instead of the F5 alone. The application simply creates an observable variable, which is able to parse user data. Then, register three observers specifying to each observer variables the wanted valid divider value. Then, the observable variable will start reading user data from the console and valid or error messages will flow to all the observers. 
Each observer will apply its internal logic of showing the message when it is divisible by the related divider. Here is the result of executing the application:

Observables and observers in action

Subscription life cycle

What will happen if we want to stop a single observer from receiving messages from the observable event source? If we change the program Main from the preceding example to the following one, we will experience a flawed observer life cycle design. Here's the code:

//this is the message observable responsible for producing messages using (var observer = new ConsoleIntegerProducer()) //these are the message observers that consume messages using (var consumer1 = observer.Subscribe(new IntegerConsumer(2))) using (var consumer2 = observer.Subscribe(new IntegerConsumer(3))) { using (var consumer3 = observer.Subscribe(new IntegerConsumer(5))) { //internal lifecycle } observer.Wait(); } Console.WriteLine("END"); Console.ReadLine();

Here is the result in the output console:

The third observer unable to catch value messages

By using the using construct, we should stop the life cycle of the consumer object. However, we do not, because in the previous example, the Subscribe method of the observable simply returns a NULL object. To create a valid observer, we must handle and design its life cycle management. This means that we must eventually handle the external disposal of the Subscribe method's result by signaling the right observer that its life cycle has reached its end. We have to create a Subscription class that handles object disposal in the proper reactive way by sending the OnCompleted message. Here is a simple Subscription class implementation:

/// <summary> /// Handle observer subscription lifecycle /// </summary> public sealed class Subscription<T> : IDisposable { private readonly IObserver<T> observer; public Subscription(IObserver<T> observer) { this.observer = observer; } //the event signalling that the observer has //completed its lifecycle public event EventHandler<IObserver<T>> OnCompleted; public void Dispose() { if (OnCompleted != null) OnCompleted(this, observer); observer.OnCompleted(); } }

The usage is within the observable's Subscribe method. Here's an example:

//add another observer to the subscriber list public IDisposable Subscribe(IObserver<int> observer) { if (observerList.Contains(observer)) throw new ArgumentException("The observer is already subscribed to this observable"); Console.WriteLine("Subscribing for {0}", observer.GetHashCode()); observerList.Add(observer); //creates a new subscription for the given observer var subscription = new Subscription<int>(observer); //handle the subscription lifecycle end event subscription.OnCompleted += OnObserverLifecycleEnd; return subscription; } void OnObserverLifecycleEnd(object sender, IObserver<int> e) { var subscription = sender as Subscription<int>; //remove the observer from the internal list within the observable observerList.Remove(e); //remove the handler from the subscription event //once already handled subscription.OnCompleted -= OnObserverLifecycleEnd; }

As visible, the preceding example creates a new Subscription<T> object to handle the observer's life cycle with the IDisposable.Dispose method. Here is the result of such code edits against the full example available in the previous paragraph:

The observers will end their life as we dispose their life cycle tokens

This time, an observer ends its life cycle prematurely because we dispose of the subscription object. 
This is visible in the first END message. Later, only two observers remain available at the application's end; when the user asks for EXIT, only those two observers end their life cycle by themselves rather than through the disposal of a Subscription. In real-world applications, observers often subscribe to observables and later unsubscribe by disposing of the Subscription token. This happens because we do not always want a reactive module to handle all the messages. In this case, this means that we have to handle the observer life cycle by ourselves, as we already did in the previous examples, or we need to apply filters to choose which messages flow to which subscriber, as visible in the later section Filtering events. Kindly consider that although filters make things easier, we will always have to handle the observer life cycle.

Sourcing events

Sourcing events is the ability to obtain, from a particular source, events that are usable in reactive programming. Reactive programming is all about event message handling. Any event is a specific occurrence of some handleable behavior of users or external systems, and we can program event reactions in the most pleasant and productive way for reaching our software goals. In the following example, we will see how to react to CLR events. In this specific case, we will handle filesystem events by using events from the System.IO.FileSystemWatcher class, which gives us the ability to react to file changes without making useless and resource-consuming polling queries against the filesystem status. Here's the observer and observable implementation:

public sealed class NewFileSavedMessagePublisher : IObservable<string>, IDisposable { private readonly FileSystemWatcher watcher; public NewFileSavedMessagePublisher(string path) { //creates a new file system event router this.watcher = new FileSystemWatcher(path); //register for handling File Created event this.watcher.Created += OnFileCreated; //enable event routing this.watcher.EnableRaisingEvents = true; } //signal all observers a new file arrived private void OnFileCreated(object sender, FileSystemEventArgs e) { foreach (var observer in subscriberList) observer.OnNext(e.FullPath); } //the subscriber list private readonly List<IObserver<string>> subscriberList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { //register the new observer subscriberList.Add(observer); return null; } public void Dispose() { //disable file system event routing this.watcher.EnableRaisingEvents = false; //deregister from watcher event handler this.watcher.Created -= OnFileCreated; //dispose the watcher this.watcher.Dispose(); //signal all observers that job is done foreach (var observer in subscriberList) observer.OnCompleted(); } } /// <summary> /// A tremendously basic implementation /// </summary> public sealed class NewFileSavedMessageSubscriber : IObserver<string> { public void OnCompleted() { Console.WriteLine("-> END"); } public void OnError(Exception error) { Console.WriteLine("-> {0}", error.Message); } public void OnNext(string value) { Console.WriteLine("-> {0}", value); } }

The observer simply gives us the ability to write text to the console; there is nothing more to say about it. The observable, on the other hand, does most of the job in this implementation. It creates the watcher object and registers the right event handler to catch the wanted reactive events. 
It handles its own life cycle and that of the internal watcher object. Then, it correctly sends the OnCompleted message to all the observers. Here's the program's initialization:

static void Main(string[] args) { Console.WriteLine("Watching for new files"); using (var publisher = new NewFileSavedMessagePublisher(@"[WRITE A PATH HERE]")) using (var subscriber = publisher.Subscribe(new NewFileSavedMessageSubscriber())) { Console.WriteLine("Press RETURN to exit"); //wait for user RETURN Console.ReadLine(); } }

Any new file that appears in the folder will have its full file name routed to the observer. This is the result of copying and pasting the same file three times:

-> [YOUR PATH]out - Copy.png
-> [YOUR PATH]out - Copy (2).png
-> [YOUR PATH]out - Copy (3).png

With a single observable and a single observer, the power of reactive programming is not so evident. Let's begin writing some intermediate objects that alter the message flow within the pipeline of our reactive message pump: filters, message correlators, and dividers.

Filtering events

As said in the previous section, it is time to alter the message flow. The observable has the task of producing messages, while the observer, at the opposite end, consumes them. To create a message filter, we need to create an object that is both a publisher and a subscriber. The implementation must take into consideration the filtering need and the message routing to the underlying observers that subscribe to the filter observable object instead of the main one. Here's an implementation of the filter:

/// <summary> /// The filtering observable/observer /// </summary> public sealed class StringMessageFilter : IObservable<string>, IObserver<string>, IDisposable { private readonly string filter; public StringMessageFilter(string filter) { this.filter = filter; } //the observer collection private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { this.observerList.Add(observer); return null; } //a simple implementation //that disables message routing once //the OnCompleted has been invoked private bool hasCompleted = false; public void OnCompleted() { hasCompleted = true; foreach (var observer in observerList) observer.OnCompleted(); } //routes error messages while not completed public void OnError(Exception error) { if (!hasCompleted) foreach (var observer in observerList) observer.OnError(error); } //routes valid messages while not completed public void OnNext(string value) { Console.WriteLine("Filtering {0}", value); if (!hasCompleted && value.ToLowerInvariant().Contains(filter.ToLowerInvariant())) foreach (var observer in observerList) observer.OnNext(value); } public void Dispose() { OnCompleted(); } }

This filter can be used together with the example from the previous section that routes the FileSystemWatcher events of created files. 
This is the new program initialization:

static void Main(string[] args) { Console.WriteLine("Watching for new files"); using (var publisher = new NewFileSavedMessagePublisher(@"[WRITE A PATH HERE]")) using (var filter = new StringMessageFilter(".txt")) { //subscribe the filter to publisher messages publisher.Subscribe(filter); //subscribe the console subscriber to the filter //instead of directly to the publisher filter.Subscribe(new NewFileSavedMessageSubscriber()); Console.WriteLine("Press RETURN to exit"); Console.ReadLine(); } }

As visible, this new implementation creates a filter object that takes a parameter used to verify which file names may flow to the underlying observers. The filter subscribes to the main observable object, while the observer subscribes to the filter itself. It is like a chain, where each link refers to the next one. This is the output console of the running application:

The filtering observer in action

Although I copied two files (a .png and a .txt file), we can see that only the text file reached the internal observer; the image file reached the filter's OnNext method but, being invalid against the filter argument, it never reached the internal observer.

Correlating events

Sometimes, especially when dealing with integration scenarios, there is a need to correlate multiple events that do not always arrive together. This is the case of a header file that comes together with multiple body files. In reactive programming, correlating events means correlating multiple observable messages into a single message that is the result of two or more original messages. Such messages must somehow be correlated by a value (an ID, serial, or metadata) that defines that the initial messages belong to the same correlation set. Useful features in real-world correlators are the ability to specify a timeout (which may also be infinite) in the correlation waiting logic and the ability to specify a correlation message count (also possibly infinite). Here's a correlator implementation made for the previous example based on the FileSystemWatcher class:

public sealed class FileNameMessageCorrelator : IObservable<string>, IObserver<string>, IDisposable { private readonly Func<string, string> correlationKeyExtractor; public FileNameMessageCorrelator(Func<string, string> correlationKeyExtractor) { this.correlationKeyExtractor = correlationKeyExtractor; } //the observer collection private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { this.observerList.Add(observer); return null; } private bool hasCompleted = false; public void OnCompleted() { hasCompleted = true; foreach (var observer in observerList) observer.OnCompleted(); } //routes error messages while not completed public void OnError(Exception error) { if (!hasCompleted) foreach (var observer in observerList) observer.OnError(error); }

Just a pause: up to this point, we have simply created the reactive structure of the FileNameMessageCorrelator class by implementing the two main interfaces. 
Here is the core implementation that correlates messages:

//the container of correlations able to contain //multiple strings per key private readonly NameValueCollection correlations = new NameValueCollection(); //routes valid messages while not completed public void OnNext(string value) { if (hasCompleted) return; //check if subscriber has completed Console.WriteLine("Parsing message: {0}", value); //try extracting the correlation ID var correlationID = correlationKeyExtractor(value); //check if the correlation is available if (correlationID == null) return; //append the new file name to the correlation state correlations.Add(correlationID, value); //in this example we will consider always //correlations of two items if (correlations.GetValues(correlationID).Count() == 2) { //once the correlation is complete //read the two files and push the //two contents altogether to the //observers var fileData = correlations.GetValues(correlationID) //route messages to the ReadAllText method .Select(File.ReadAllText) //materialize the query .ToArray(); var newValue = string.Join("|", fileData); foreach (var observer in observerList) observer.OnNext(newValue); correlations.Remove(correlationID); } }

This correlator class accepts a correlation function as a constructor parameter. This function is later used to evaluate the correlation ID when a new file name flows into the OnNext method. Once the function returns a valid correlationID, this ID is used as the key for NameValueCollection, a specialized string collection that stores multiple values per key. When there are two values for the same key, the correlation is ready to flow out to the underlying observers by reading the file data and joining such data into a single string message. Here's the application's initialization:

static void Main(string[] args) { using (var publisher = new NewFileSavedMessagePublisher(@"[WRITE A PATH HERE]")) //creates a new correlator by specifying the correlation key //extraction function made with a regular expression that //extracts a file ID similar to FILEID0001 using (var correlator = new FileNameMessageCorrelator(ExtractCorrelationKey)) { //subscribe the correlator to publisher messages publisher.Subscribe(correlator); //subscribe the console subscriber to the correlator //instead of directly to the publisher correlator.Subscribe(new NewFileSavedMessageSubscriber()); //wait for user RETURN Console.ReadLine(); } } private static string ExtractCorrelationKey(string arg) { var match = Regex.Match(arg, @"(FILEID\d{4})"); if (match.Success) return match.Captures[0].Value; else return null; }

The initialization is much the same as in the filtering example seen in the previous section. The biggest difference is that the correlator object, instead of a string filter, accepts a function that analyzes the incoming file name and produces the correlationID value, when available. I prepared two files with the same ID in their file names. Here's the console output of the running example:

Two files correlated by their name

As visible, the correlator did its job by joining the two files' data into a single message regardless of the order in which the two files were stored in the filesystem. These examples regarding the filtering and correlation of messages should give you the idea that we can do anything with received messages. We can put a message on standby until a correlated message comes, we can join multiple messages into one, we can produce the same message multiple times, and so on. 
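To make this flexibility concrete, here is a minimal sketch of one more intermediate object: one that re-publishes every received message a fixed number of times, following the same observable/observer pattern as the filter and correlator above. This class is not part of the original example; the StringMessageRepeater name, the repeat count parameter, and the omitted subscription life cycle are illustrative assumptions, and the code assumes the same using directives (System, System.Collections.Generic) as the earlier listings:

public sealed class StringMessageRepeater : IObservable<string>, IObserver<string>
{
    //how many times each message is re-published (illustrative assumption)
    private readonly int times;
    public StringMessageRepeater(int times) { this.times = times; }

    //the observer collection
    private readonly List<IObserver<string>> observerList = new List<IObserver<string>>();
    public IDisposable Subscribe(IObserver<string> observer)
    {
        observerList.Add(observer);
        //subscription lifecycle missing for readability purpose, as in the previous examples
        return null;
    }

    //acknowledge all observers once the source completes
    public void OnCompleted()
    {
        foreach (var observer in observerList)
            observer.OnCompleted();
    }

    //route error messages to all observers
    public void OnError(Exception error)
    {
        foreach (var observer in observerList)
            observer.OnError(error);
    }

    //produce the same message multiple times
    public void OnNext(string value)
    {
        for (var i = 0; i < times; i++)
            foreach (var observer in observerList)
                observer.OnNext(value);
    }
}

It would chain exactly like the filter does: subscribe the repeater to the publisher with publisher.Subscribe(repeater), then subscribe the console observer to the repeater with repeater.Subscribe(new NewFileSavedMessageSubscriber()), so every created file would be reported more than once.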
This programming style opens the programmer's mind to a lot of new application designs and possibilities.

Sourcing from CLR streams

Any class that extends System.IO.Stream is some kind of cursor-based flow of data. Think of a video stream: a sort of data that is not persisted locally and flows only over the network, with the ability to go forward and backward, stop, pause, resume, play, and so on. The same behavior is available when streaming any kind of data; thus, the Stream class is the base class that exposes such behavior for any need. There are specialized classes that extend Stream, helping work with streams of text data (StreamWriter and StreamReader), binary serialized data (BinaryReader and BinaryWriter), memory-based temporary byte containers (MemoryStream), network-based streams (NetworkStream), and many others. Regarding reactive programming, we are dealing with the ability to source events from any stream regardless of its kind (network, file, memory, and so on). Real-world applications that use reactive programming based on streams are chats, remote binary listeners (socket programming), and any other unpredictable event-oriented applications. On the other hand, it is useless to read a huge file in a reactive way, because there is simply nothing reactive in such cases. It is time to look at an example. Here's a complete example of a reactive application that listens on a TCP port and routes string messages (CR + LF divides multiple messages) to all the available observers. The program Main and the usual ConsoleObserver methods are omitted for better readability:

public sealed class TcpListenerStringObservable : IObservable<string>, IDisposable { private readonly TcpListener listener; public TcpListenerStringObservable(int port, int backlogSize = 64) { //creates a new tcp listener on given port //with given backlog size listener = new TcpListener(IPAddress.Any, port); listener.Start(backlogSize); //start listening asynchronously listener.AcceptTcpClientAsync().ContinueWith(OnTcpClientConnected); } private void OnTcpClientConnected(Task<TcpClient> clientTask) { //if the task has not encountered errors if (clientTask.IsCompleted) //we will handle a single client connection per time //to handle multiple connections, simply put following //code into a Task using (var tcpClient = clientTask.Result) using (var stream = tcpClient.GetStream()) using (var reader = new StreamReader(stream)) while (tcpClient.Connected) { //read the message var line = reader.ReadLine(); //stop listening if nothing available if (string.IsNullOrEmpty(line)) break; else { //construct observer message adding client's remote endpoint address and port var msg = string.Format("{0}: {1}", tcpClient.Client.RemoteEndPoint, line); //route messages foreach (var observer in observerList) observer.OnNext(msg); } } //starts another client listener listener.AcceptTcpClientAsync().ContinueWith(OnTcpClientConnected); } private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { observerList.Add(observer); //subscription lifecycle missing //for readability purpose return null; } public void Dispose() { //stop listener listener.Stop(); } }

The preceding example shows how to create a reactive TCP listener that acts as an observable of string messages. The observable uses an internal TcpListener class that provides mid-level network services across an underlying Socket object. 
The example asks the listener to start listening and starts waiting for a client on another thread by using a Task object. When a remote client becomes available, its communication with the internals of the observable is handled by the OnTcpClientConnected method, which verifies the normal execution of the Task. Then, it takes the TcpClient from the Task, gets the network stream, and attaches a StreamReader to that stream to start reading. Once the reading loop is complete, another asynchronous accept operation repeats the procedure. Although this design handles a backlog of pending connections, it serves only a single client at a time. To change this design to handle multiple connections concurrently, simply encapsulate the OnTcpClientConnected logic in a new task. Here's an example:

private void OnTcpClientConnected(Task<TcpClient> clientTask) { //if the task has not encountered errors if (clientTask.IsCompleted) Task.Factory.StartNew(() => { using (var tcpClient = clientTask.Result) using (var stream = tcpClient.GetStream()) using (var reader = new StreamReader(stream)) while (tcpClient.Connected) { //read the message var line = reader.ReadLine(); //stop listening if nothing available if (string.IsNullOrEmpty(line)) break; else { //construct observer message adding client's remote endpoint address and port var msg = string.Format("{0}: {1}", tcpClient.Client.RemoteEndPoint, line); //route messages foreach (var observer in observerList) observer.OnNext(msg); } } }, TaskCreationOptions.PreferFairness); //starts another client listener listener.AcceptTcpClientAsync().ContinueWith(OnTcpClientConnected); }

This is the output of the reactive application when it receives two different connections by using telnet as a client (C:\>telnet localhost 8081). The program Main and the usual ConsoleObserver methods are omitted for better readability:

The observable routing events from the telnet client

As you can see, each client connects to the listener by using a different remote port. This gives us the ability to differentiate multiple remote connections even when they connect at the same time.

Sourcing from CLR enumerables

Sourcing from a finite collection is of little use with regard to reactive programming. By contrast, specific enumerable collections are perfect for reactive usage. These are the changeable collections that support collection change notifications by implementing the INotifyCollectionChanged interface (System.Collections.Specialized), like the ObservableCollection class (System.Collections.ObjectModel), and any infinite collection that supports the enumerator pattern with the usage of the yield keyword.

Changeable collections

The ObservableCollection<T> class gives us the ability to understand, in an event-based way, any change that occurs against the collection content. Kindly consider that changes regarding collection child properties are outside of the collection scope. This means that we are notified only of collection changes like the ones produced by the Add or Remove methods. Changes within a single item do not produce an alteration of the collection size; thus, they are not notified at all. 
Here's a generic (nonreactive) example:

static void Main(string[] args) { //the observable collection var collection = new ObservableCollection<string>(); //register a handler to catch collection changes collection.CollectionChanged += OnCollectionChanged; collection.Add("ciao"); collection.Add("hahahah"); collection.Insert(0, "new first line"); collection.RemoveAt(0); Console.WriteLine("Press RETURN to EXIT"); Console.ReadLine(); } private static void OnCollectionChanged(object sender, NotifyCollectionChangedEventArgs e) { var collection = sender as ObservableCollection<string>; if (e.NewStartingIndex >= 0) //adding new items Console.WriteLine("-> {0} {1}", e.Action, collection[e.NewStartingIndex]); else //removing items Console.WriteLine("-> {0} at {1}", e.Action, e.OldStartingIndex); }

As visible, the collection notifies all add operations, giving us the ability to catch the new message. The Insert method also signals an Add operation, although with Insert we can specify the index, and the value will be available within the collection. Obviously, the parameter containing the index value (e.NewStartingIndex) contains the new index according to the operation performed. By contrast, the Remove operation, although it notifies the removed element's index, cannot give us the ability to read the original item before the removal, because the event fires after the removal has already occurred. In a real-world reactive application, the most interesting operation against ObservableCollection is the Add operation. Here's an example (console observer omitted for better readability):

class Program { static void Main(string[] args) { //the observable collection var collection = new ObservableCollection<string>(); using (var observable = new NotifiableCollectionObservable(collection)) using (var observer = observable.Subscribe(new ConsoleStringObserver())) { collection.Add("ciao"); collection.Add("hahahah"); collection.Insert(0, "new first line"); collection.RemoveAt(0); Console.WriteLine("Press RETURN to EXIT"); Console.ReadLine(); } } public sealed class NotifiableCollectionObservable : IObservable<string>, IDisposable { private readonly ObservableCollection<string> collection; public NotifiableCollectionObservable(ObservableCollection<string> collection) { this.collection = collection; this.collection.CollectionChanged += collection_CollectionChanged; } //the event handler was omitted in the original listing; as described in the //following text, it routes only Add actions to the subscribed observers private void collection_CollectionChanged(object sender, NotifyCollectionChangedEventArgs e) { if (e.Action == NotifyCollectionChangedAction.Add) foreach (var observer in observerList) observer.OnNext(e.NewItems[0].ToString()); } private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { observerList.Add(observer); //subscription lifecycle missing //for readability purpose return null; } public void Dispose() { this.collection.CollectionChanged -= collection_CollectionChanged; foreach (var observer in observerList) observer.OnCompleted(); } } }

The result is the same as in the previous example about ObservableCollection without the reactive objects. The only difference is that the observable routes messages only when the Action value is Add.

The ObservableCollection signaling its content changes

Infinite collections

Our last example regards sourcing events from an infinite collection. In C#, it is possible to implement the enumerator pattern by signaling each object to enumerate, one at a time, thanks to the yield keyword. 
Here's an example:

static void Main(string[] args) { foreach (var value in EnumerateValuesFromSomewhere()) Console.WriteLine(value); } static IEnumerable<string> EnumerateValuesFromSomewhere() { var random = new Random(DateTime.Now.GetHashCode()); while (true) //forever { //returns a random integer number as string yield return random.Next().ToString(); //some throttling time Thread.Sleep(100); } }

This implementation is powerful because it never materializes all the values in memory. It simply signals to the enumerator, which the foreach construct uses internally, that a new object is available. The result is that numbers are written to the output console forever. This behavior is useful for reactive usage, because it never creates useless state like a temporary array, list, or generic collection. It simply signals that new items are available to whoever enumerates. Here's an example:

public sealed class EnumerableObservable : IObservable<string>, IDisposable { private readonly IEnumerable<string> enumerable; public EnumerableObservable(IEnumerable<string> enumerable) { this.enumerable = enumerable; this.cancellationSource = new CancellationTokenSource(); this.cancellationToken = cancellationSource.Token; this.workerTask = Task.Factory.StartNew(() => { foreach (var value in this.enumerable) { //if task cancellation triggers, raise the proper exception //to stop task execution cancellationToken.ThrowIfCancellationRequested(); foreach (var observer in observerList) observer.OnNext(value); } }, this.cancellationToken); } //the cancellation token source for starting and stopping //the inner observable working thread private readonly CancellationTokenSource cancellationSource; //the cancellation flag private readonly CancellationToken cancellationToken; //the running task that runs the inner running thread private readonly Task workerTask; //the observer list private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { observerList.Add(observer); //subscription lifecycle missing //for readability purpose return null; } public void Dispose() { //trigger task cancellation //and wait for acknowledgement if (!cancellationSource.IsCancellationRequested) { cancellationSource.Cancel(); while (!workerTask.IsCanceled) Thread.Sleep(100); } cancellationSource.Dispose(); workerTask.Dispose(); foreach (var observer in observerList) observer.OnCompleted(); } }

This is the code of the program startup with the infinite enumerable generation:

class Program { static void Main(string[] args) { //we create a variable containing the enumerable //this does not trigger item retrieval //so the enumerator does not begin flowing data var enumerable = EnumerateValuesFromSomewhere(); using (var observable = new EnumerableObservable(enumerable)) using (var observer = observable.Subscribe(new ConsoleStringObserver())) { //wait for 2 seconds then exit Thread.Sleep(2000); } Console.WriteLine("Press RETURN to EXIT"); Console.ReadLine(); } static IEnumerable<string> EnumerateValuesFromSomewhere() { var random = new Random(DateTime.Now.GetHashCode()); while (true) //forever { //returns a random integer number as string yield return random.Next().ToString(); //some throttling time Thread.Sleep(100); } } }

As in some of the previous examples, here we make use of the Task class. 
The observable uses the enumerable within an asynchronous Task to give the programmer the ability to stop the execution of the whole operation by simply exiting the using scope or by manually invoking the Dispose method. This example shows a tremendously powerful feature: the ability to yield values without having to source them from a concrete (finite) array or collection, simply by implementing the enumerator pattern. Although rarely used, the yield operator gives us the ability to create complex applications simply by pushing messages between methods. The more methods we create that send messages to each other, the more complex the business logic the application can handle. Consider the ability to catch all such messages with observables, and you get an idea of how powerful reactive programming can be for a developer.

Summary

In this article, we had the opportunity to test the main features that any reactive application must implement: message sending, error sending, and completion acknowledgement. We focused on plain C# programming to give a first overview of how classic reactive designs can be applied to all main application needs, such as sourcing from streams, from user input, and from changeable and infinite collections.

Resources for Article: Further resources on this subject: Basic Website using Node.js and MySQL database [article] Domain-Driven Design [article] Data Science with R [article]
Getting Started with Packages in R

Joel Carlson
18 Jul 2016
6 min read
R is a powerful programming language for loading, manipulating, transforming, and visualizing data. The language is made more powerful by its extensibility in conjunction with the efforts of a highly active open source community. This community is constantly contributing to the language in the form of packages, which are, at their core, sets of thematically linked functions. By leveraging the work that has been put in to the creation of useful open source packages, an R user can substantially improve both the readability and efficiency of their code. In this post, you will learn how to install new packages to extend the functionality of R and how to load those packages into your session. We will also explore some of the most useful packages that have been contributed by the R community! Installing Packages There are a number of places where R packages can be stored, but the three most popular locations are CRAN, Bioconductor, and GitHub. CRAN The Comprehensive R Archive Network is the home of R. At the time of this writing, there are over 8,000 packages hosted on CRAN, all of which are free to download and use. If you are looking to get started with using R in your field but don't know exactly where to start, the CRAN task view for your field or area of interest is likely a good place to start. There you will find listings of relevant packages, along with short descriptions and links to source code. Let's say you've entered the "Reproducible Research" task view and have decided that the package named knitr sounds useful. To install knitr from CRAN, you type this in your R console: install.packages("knitr") Bioconductor Bioconductor is home to over 1,000 packages for R, with a focus on packages that can be used for bioinformatics research. One of the main differences between Bioconductor and CRAN is that Bioconductor has stricter guidelines for accepting packages than CRAN. After finding a package on Bioconductor, such as EBImage, install it by running these commands: source("https://bioconductor.org/biocLite.R") biocLite("EBImage") It is possible to install from Bioconductor using install.packages, but this is not recommended for reasons discussed here. GitHub GitHub is a space where you can post the source code of your work to keep it under version control and also to encourage and facilitate collaboration. Often, GitHub is where the truly bleeding-edge packages can be found, and where package updates are put first. Many of the packages that can be found on CRAN have a development version on GitHub, occasionally with features absent from the CRAN version. As you browse GitHub, you will likely find some packages that will never be put on CRAN or Bioconductor. For this reason, caution should be exercised when using packages sourced from GitHub. Should you find a package on GitHub and wish to install it, you must first download the package devtools from CRAN. You then have access to the install_github() function, where the argument is the name of the developer, followed by a slash, and then the name of the package: install.packages("devtools") # Install swirl! See: https://github.com/swirldev/swirl devtools::install_github("swirldev/swirl") Where the syntax devtools::xxxx() simply means "Use the xxxx function from the devtools package ". You could just have easily called library(devtools) after installing and then simply typed install_github(). The devtools package also includes a number of different methods for installing packages that are stored locally, on bitbucket, in an SVN repository. 
Try typing ??devtools::install_ to see a full list. Some Popular Packages Now that you know the basic commands for installing packages, let's take a very short look at some of the more popular and useful packages. Visualizing data with ggplot2 ggplot2 is a package that is used to visualize data. It provides a method of chart-building that is intuitive (based on The Grammar of Graphics) and results in aesthetically pleasing graphics. Here is an example of a graphic produced using ggplot2: install.packages("ggplot2") # Install from CRAN library(ggplot2) # Load ggplot2 data(diamonds) # Load diamonds data set # Create plot with carat on x axis, price on y, # and color based on quality of cut ggplot(data=diamonds, aes(x=carat, y=price, col=cut)) + geom_point(alpha=0.5) # Use points (dots) to represent data Manipulating data with dplyr dplyr presents a number of verbs used for manipulating data (select, filter, mutate, arrange, summarize, and so on), each of which are common tasks when working with data. To see how dplyr can simplify your workflow, let's compare the base R versus the dplyr code used to subset the diamonds data into only those gems with Ideal cut type and greater than 2 carats: install.packages("dplyr") # Install dplyr from CRAN library(dplyr) # Load dplyr BaseR <- diamonds[which(diamonds$cut == "Ideal" & diamonds$carat > 2),] # vs: Dplyr <- filter(diamonds, cut == "Ideal" & carat > 2) Clearly the dplyr version is more succinct, more readable, and, most importantly, easier to write. Machine learning with caret The caret package is a collection of functions that unify the syntax used by many of the most popular machine learning packages implemented in R. caret will allow you to quickly prepare your data, create predictive models, tune the model parameters, and interpret the results. Here is a simple working example of training and tuning a k-nearest neighbors model with caret to predict the price of a diamond based on cut, color, and clarity: install.packages("caret") library(caret) # Split data into training and testing sets inTrain <- createDataPartition(diamonds$price, p=0.01, list=FALSE) training <- diamonds[inTrain,] testing <- diamonds[-inTrain,] knn_model <- train(price ~ cut + color + clarity, data=training, method="knn") plot(knn_model) You can see that increasing the number of neighbors in the model increases the accuracy (decreases the RMSE, a method of measuring the average distance between predictions and data). Summary In this post, you learned how to install and load packages from three different major sources: CRAN, Bioconductor, and GitHub. You also took a brief look at three popular packages: ggplot2 for visualization, dplyr for manipulation, and caret for machine learning. About the author Joel Carlson is a recent MSc graduate from Seoul National University, and current Data Science Fellow at Galvanize in San Francisco. He has contributed two R packages to CRAN (radiomics and RImagePalette). You can learn more or contact him at his personal website.
Overview of Certificate Management

Packt
18 Jul 2016
24 min read
In this article by David Steadman and Jeff Ingalls, the authors of Microsoft Identity Manager 2016 Handbook, we will look at certificate management in brief. Microsoft Identity Management (MIM)—certificate management (CM)—is deemed the outcast in many discussions. We are here to tell you that this is not the case. We see many scenarios where CM makes the management of user-based certificates possible and improved. If you are currently using FIM certificate management or considering a new certificate management deployment with MIM, we think you will find that CM is a component to consider. CM is not a requirement for using smart cards, but it adds a lot of functionality and security to the process of managing the complete life cycle of your smart cards and software-based certificates in a single forest or multiforest scenario. In this article, we will look at the following topics: What is CM? Certificate management components Certificate management agents The certificate management permission model (For more resources related to this topic, see here.) What is certificate management? Certificate management extends MIM functionality by adding management policy to a driven workflow that enables the complete life cycle of initial enrollment, duplication, and the revocation of user-based certificates. Some smart card features include offline unblocking, duplicating cards, and recovering a certificate from a lost card. The concept of this policy is driven by a profile template within the CM application. Profile templates are stored in Active Directory, which means the application already has a built-in redundancy. CM is based on the idea that the product will proxy, or be the middle man, to make a request to and get one from CA. CM performs its functions with user agents that encrypt and decrypt its communications. When discussing PKI (Public Key Infrastructure) and smart cards, you usually need to have some discussion about the level of assurance you would like for the identities secured by your PKI. For basic insight on PKI and assurance, take a look at http://bit.ly/CorePKI. In typical scenarios, many PKI designers argue that you should use Hardware Security Module (HSM) to secure your PKI in order to get the assurance level to use smart cards. Our personal opinion is that HSMs are great if you need high assurance on your PKI, but smart cards increase your security even if your PKI has medium or low assurance. Using MIM CM with HSM will not be covered in this article, but if you take a look at http://bit.ly/CMandLunSA, you will find some guidelines on how to use MIM CM and HSM Luna SA. The Financial Company has a low-assurance PKI with only one enterprise root CA issuing the certificates. The Financial Company does not use a HSM with their PKI or their MIM CM. If you are running a medium- or high-assurance PKI within your company, policies on how to issue smart cards may differ from the example. More details on PKI design can be found at http://bit.ly/PKIDesign. Certificate management components Before we talk about certificate management, we need to understand the underlying components and architecture: As depicted before, we have several components at play. We will start from the left to the right. From a high level, we have the Enterprise CA. The Enterprise CA can be multiple CAs in the environment. Communication from the CM application server to the CA is over the DCOM/RPC channel. 
End user communication can be with the CM web page or with a new REST API via a modern client to enable the requesting of smart cards and the management of these cards. From the CM perspective, the two mandatory components are the CM server and the CA modules. Looking at the logical architecture, we have the CA, and underneath this, we have the modules. The policy and exit module, once installed, control the communication and behavior of the CA based on your CM's needs. Moving down the stack, we have Active Directory integration. AD integration is the nuts and bolts of the operation. Integration into AD can be very complex in some environments, so understanding this area and how CM interacts with it is very important. We will cover the permission model later in this article, but it is worth mentioning that most of the configuration is done and stored in AD along with the database. CM uses its own SQL database, and the default name is FIMCertificateManagement. The CM application uses its own dedicated IIS application pool account to gain access to the CM database in order to record transactions on behalf of users. By default, the application pool account is granted the clmApp role during the installation of the database, as shown in the following screenshot:   In CM, we have a concept called the profile template. The profile template is stored in the configuration partition of AD, and the security permissions on this container and its contents determine what a user is authorized to see. As depicted in the following screenshot, CM stores the data in the Public Key Services (1) and the Profile Templates container. CM then reads all the stored templates and the permissions to determine what a user has the right to do (2): Profile templates are at the core of the CM logic. The three components comprising profile templates are certificate templates, profile details, and management policies. The first area of the profile template is certificate templates. Certificate templates define the extensions and data point that can be included in the certificate being requested. The next item is profile details, which determines the type of request (either a smart card or a software user-based certificate), where we will generate the certificates (either on the server or on the client side of the operations), and which certificate templates will be included in the request. The final area of a profile template is known as management policies. Management policies are the workflow engine of the process and contain the manager, the subscriber functions, and any data collection items. The e-mail function is initiated here and commonly referred to as the One Time Password (OTP) activity. Note the word "One". A trigger will only happen once here; therefore, multiple alerts using e-mail would have to be engineered through alternate means, such as using the MIM service and expiration activities. The permission model is a bit complex, but you'll soon see the flexibility it provides. Keep in mind that Service Connection Point (SCP) also has permissions applied to it to determine who can log in to the portal and what rights the user has within the portal. SCP is created upon installation during the wizard configuration. You will want to be aware of the SCP location in case you run into configuration issues with administrators not being able to perform particular functions. 
The SCP location is in the System container, within Microsoft, and within Certificate Lifecycle Manager, as shown here:

Typical location: CN=Certificate Lifecycle Manager,CN=Microsoft,CN=System,DC=THEFINANCIALCOMPANY,DC=NET

Certificate management agents

We covered several key components of the profile templates and where some of the permission model is stored. We now need to understand how the separation of duties is defined within the agent roles. The permission model provides granular control, which promotes the separation of duties. CM uses six agent accounts, and they can be named to fit your organization's requirements. We will walk through the initial setup again later in this article so that you can use our setup or alter it based on your needs. The Financial Company only requires the typical setup. We precreated the following accounts for TFC, but the wizard will create them for you if you do not use them. During the installation and configuration of CM, we will use the following accounts:

Besides the separation of duties, CM offers enrollment by proxy. Proxy enrollment of a request refers to providing a middle man that gives the end user a fluid workflow during enrollment. Most of this proxying is accomplished via the agent accounts in one way or another. The first account is MIM CM Agent (MIMCMAgent), which is used by the CM server to encrypt data, from the smart card admin PINs to the data collection stored in the database. So, the agent account has the important role of protecting data and communication to and from the certificate authorities. The last role the CM agent account has is the capability to revoke certificates. The agent certificate thumbprint is very important, and you need to make sure the correct value is updated in the three areas: CM, web.config, and the certificate policy module under the Signing Certificates tab on the CA. We have identified these areas in the following. For web.config:

<add key="Clm.SigningCertificate.Hash" value
<add key="Clm.Encryption.Certificate.Hash" value
<add key="Clm.SmartCard.ExchangeCertificate.Hash" value

The Signing Certificates tab is as shown in the following screenshot:

Now, when you run through the configuration wizard, these items are already updated, but it is good to know which locations need to be updated if you need to troubleshoot agent issues or even update/renew this certificate. The second account we want to look at is Key Recovery Agent (MIMCMKRAgent); this agent account is needed for CM to recover any archived private key certificates. Now, let's look at Enrollment Agent (MIMCMEnrollAgent); the main purpose of this agent account is to provide the enrollment of smart cards. Enrollment Agent, as we call it, is responsible for signing all smart card requests before they are submitted to the CA. The typical permission for this account on the CA is read and request. Authorization Agent (MIMCMAuthAgent)—or as some folks call it, the authentication agent—is responsible for determining access rights for all objects from a DACL perspective. When you log in to the CM site, it is the authorization account's job to determine what you have the right to do based on all the ACLs applied to the core components. We will go over all the agent accounts and the rights needed later in this article during our setup. CA Manager Agent (MIMCMManagerAgent) is used to perform core CA functions. More importantly, its job is to issue Certificate Revocation Lists (CRLs). This happens when a smart card or certificate is retired or revoked. 
It is up to this account to make sure the CRL is updated with this critical information. We saved the best for last: Web Pool Agent (MIMCMWebAgent). This agent is used to run the CM web application. The agent is the account that contacts the SQL server to record all user and admin transactions. The following is a good depiction of all the accounts together and their high-level functions:

The certificate management permission model

In CM, we think this part is the most complex, because the implementation can be as granular as you like. For this reason, this area is the most difficult to understand. We will uncover the permission model so that we can begin to understand how it works within CM. When looking at CM, you need to formulate the type of management model you will be deploying. What we mean by this is: will you have a centralized or a delegated model? This plays a key part in deployment planning for CM and the permissions you will need to apply. In the centralized model, a specific set of managers is assigned all the rights for the management policy. This includes permissions on the users. Most environments use this method, as it is less complex. Now, within this model, we have manager-initiated permission, and this is where CM permissions are assigned to groups containing the subscribers. Subscribers are the actual users doing the enrollment or participating in the workflow. This is the model that The Financial Company will use in its configuration. The delegated model is created by updating two flags in web.config called clm.RequestSecurity.Flags and clm.RequestSecurity.Groups. These two flags work hand in hand: if you have UseGroups, then it will evaluate all the groups within the forests, including universal/global security groups. Now, if you use UseGroups and define clm.RequestSecurity.Groups, then it will only look for these specific groups and evaluate them via the Authorization Agent. The user setting, by contrast, tells the Authorization Agent to only read the permissions on the user and ignore any group membership permissions:

When we continue to look at permissions, there are five locations where permissions can be applied. The preceding figure outlines these locations; we will go into more depth in the subsections in a bit. The point of the figure is to understand each location and what permissions can be applied there. The following are the areas and the permissions that can be set:

Service Connection Point: Extended Permissions
Users or Groups: Extended Permissions
Profile Template Objects: Container: Read or Write; Template Object: Read/Write or Enroll
Certificate Template: Read or Enroll
CM Management Policy within the Web application: We have multiple options based on the need, such as Initiate Request

Now, let's begin to discuss the core areas to understand what they can do, so that The Financial Company can design the enrollment options they want. In the example, we will use the main scenarios we encounter: the helpdesk, manager, and user (subscriber) based scenarios. For example, certain functions are delegated to the helpdesk to allow them to assist the user base without giving them full control over the environment (delegated model). Remember this as we look at the five core permission areas.

Creating service accounts

So far, in our MIM deployment, we have created quite a few service accounts. MIM CM, however, requires that we create a few more. 
During the configuration wizard, we will get the option of having the wizard create them for us, but we always recommend creating them manually in FIM/MIM CM deployments. One reason is that a few of these need to be assigned certificates. If we use an HSM, we have to create them manually in order to make sure the certificates are indeed using the HSM. The wizard will ask for six different service accounts (agents), but we actually need seven. In The Financial Company, we created the following seven accounts to be used by FIM/MIM CM:

MIMCMAgent
MIMCMAuthAgent
MIMCMCAManagerAgent
MIMCMEnrollAgent
MIMCMKRAgent
MIMCMWebAgent
MIMCMService

The last one, MIMCMService, will not be used during the configuration wizard, but it will be used to run the MIM CM Update service. We also created the following security groups to help us out in the scenarios we will go over:

MIMCM-Helpdesk: This is the next step in OTP for subscribers
MIMCM-Managers: These are the managers of the CM environment
MIMCM-Subscribers: This is a group of users that will enroll

Service Connection Point

The Service Connection Point (SCP) is located under the Systems folder within Active Directory. This location, as discussed in the earlier parts of the article, defines who functions as a user as it relates to logging in to the web application. As an example, if we just wanted every user to only log in, we would give them read rights. Again, authenticated users have this by default, but if you only want a subset of users to have access, you should remove authenticated users and add your group. When you run the configuration wizard, the SCP is decided, but the default is the one shown in the following screenshot:

If a user is assigned any of the MIM CM permissions available on the SCP, the administrative view of the MIM CM portal will be shown. The MIM CM permissions are defined in a Microsoft TechNet article at http://bit.ly/MIMCMPermission. For your convenience, we have copied parts of the information here:

MIM CM Audit: This generates and displays MIM CM policy templates, defines management policies within a profile template, and generates MIM CM reports.
MIM CM Enrollment Agent: This performs certificate requests for the user or group on behalf of another user. The issued certificate's subject contains the target user's name and not the requester's name.
MIM CM Request Enroll: This initiates, executes, or completes an enrollment request.
MIM CM Request Recover: This initiates encryption key recovery from the CA database.
MIM CM Request Renew: This initiates, executes, or completes an enrollment request. The renewal request replaces a user's certificate that is near its expiration date with a new certificate that has a new validity period.
MIM CM Request Revoke: This revokes a certificate before the expiration of the certificate's validity period. This may be necessary, for example, if a user's computer or smart card is stolen.
MIM CM Request Unblock Smart Card: This resets a smart card's user Personal Identification Number (PIN) so that he/she can access the key material on a smart card.

The Active Directory extended permissions

So, even if you have the SCP defined, we still need to set up the permissions on the user or group of users that we want to manage. As in our helpdesk example, if we want to perform certain functions, the most common one is offline unblock. This would require the MIMCM-HelpDesk group. We will create this group later in this article.
It would contain all helpdesk users; then, on the SCP, we would give them CM Request Unblock Smart Card and CM Enrollment Agent. Then, you need to assign the extended permissions on MIMCM-Subscribers, which contains all the users we plan to manage with the helpdesk and offline unblock:

So, as you can see, we are getting into redundant permissions, but depending on the location, they determine what the user can do. So, planning of the model is very important. Also, it is important to document what you have, as with some slight tweaks, things can and will break.

The certificate templates permission

In order for any of this to be possible, we still need to give permission to the manager of the user to enroll or read the certificate template, as this will be added to the profile template. For anyone to manage this certificate, everyone will need read and enroll permissions. This is pretty basic, but that is it, as shown in the following screenshot:

The profile template permission

The profile template determines what a user can read within the template. To get to the profile template, we need to use Active Directory Sites and Services to manage profile templates. We need to activate the Services node, as this is not shown by default; to do this, we will click on View | Show Services Node:

As an example, if you want a user to enroll in the cert, he/she would need CM Enroll on the profile template, as shown in the following screenshot:

Now, this is for users, but let's say you want to delegate the creation of profile templates. For this, all you need to do is give the MIMCM-Managers group the delegated right to create all child items on the profile template container, as follows:

The management policy permission

For the management policy, we will break it down into two sections: a software-based policy and a smart card management policy. As we have different capabilities within CM based on the type, by default, CM comes with two sample policies (take a look at the following screenshot), which we use for duplication to create a new one. When configuring, it is good to know that you cannot combine software and smart card-based certificates in a policy:

The software management policy

The software-based certificate policy has the following policies available through the CM life cycle:

The Duplicate Policy panel creates a duplicate of all the certificates in the current profile. Now, if the first profile is created for the user, all the other profiles created afterwards will be considered duplicates, and the first generated policy will be primary.

The Enroll Policy panel defines the initial enrollment steps for certificates, such as initiating the enroll request and data collection during enroll initiation.

The Online Update Policy panel is part of the automatic policy function when key items in the policy change. This includes certificates about to expire, and when a certificate is added to the existing profile template or even removed.

The Recover Policy panel allows for the recovery of the profile in the event that the user was deleted. This includes the cases where certs are deleted by accident. One thing to point out is that if the certificate was a signing cert, the recovery policy would issue a new replacement cert. However, if the cert was used for encryption, you can recover the original using this policy.

The Recover On Behalf Policy panel allows managers or helpdesk operators to perform recovery on behalf of the user in the event that they need any of the certificates.
The Renew Policy panel is the workflow that defines the renewal settings, such as revocation and who can initiate a request.

The Suspend and Reinstate Policy panel enables a temporary revocation of the profile and puts the certificate in a "certificate hold" status. More information about the CRL status can be found at http://bit.ly/MIMCMCertificateStatus.

The Revoke Policy panel maintains the revocation policy and settings, such as being able to set the revocation reason and delay. Also, it allows the system to push a delta CRL. You can also define the initiators for this policy workflow.

The smart card management policy

The smart card policy has some similarities to the software-based policy, but it also has a few new workflows to manage the full life cycle of the smart card:

The Profile Details panel is by far the most commonly used part in this section of the policy, as it defines all the smart card certificates that will be loaded in the policy along with the type of provider. One key item is creating and destroying virtual smart cards. One final key part is diversifying the admin key. This is best practice, as it secures the admin PIN using diversification. So, before we continue, we want to go over this setting, as we think it is an important topic.

Diversifying the admin key is important because each card or batch of cards comes with a default admin key. Smart cards may have several PINs: an admin PIN, a PIN unlock key (PUK), and a user PIN. This admin key, as CM refers to it, is also known as the administrator PIN. This PIN differs from the user's PIN. When personalizing the smart card, you configure the admin key, the PUK, and the user's PIN. The admin key and the PUK are used to reset the virtual smart card's PIN. However, you cannot configure both. You must use the PUK to unlock the PIN if you assign one during the virtual smart card's creation. It is important to note that you must use the PUK to reset the PIN if you provide both a PUK and an admin key. During the configuration of the profile template, you will be asked to enter this key as follows:

The admin key is typically used by smart card management solutions that enable a challenge-response approach to PIN unlocking. The card provides a set of random data that the user reads (after the verification of identity) to the deployment admin. The admin then encrypts the data with the admin key (obtained as mentioned before) and gives the encrypted data back to the user. If the encrypted data matches that produced by the card during verification, the card will allow PIN resetting. As the admin key is never in the hands of anyone other than the deployment administrator, it cannot be intercepted or recorded by any other party (including the employee) and thus has significant security benefits beyond those of using a PUK—an important consideration during the personalization process.

When enabled, the admin key is set to a card-unique value when the card is assigned to the user. The option to diversify admin keys with the default initialization provider allows MIM CM to use an algorithm to uniquely generate a new key on the card. The key is encrypted and securely transmitted to the client. It is not stored in the database or anywhere else. MIM CM recalculates the key as needed to manage the card:

The CM profile template contains a thumbprint for the certificate to be used in admin key diversification. CM looks in the personal store of the CM agent service account for the private key of the certificate in the profile template.
Once located, the private key is used to calculate the admin key for the smart card. The admin key allows CM to manage the smart card (issuing, revoking, retiring, renewing, and so on). Loss of the private key prevents the management of cards diversified using this certificate. More detail on the control can be found at http://bit.ly/MIMCMDiversifyAdminKey.

Continuing on, the Disable Policy panel defines the termination of the smart card before expiration; you can define the reason if you choose. Once disabled, it cannot be reused in the environment.

The Duplicate Policy panel, similarly to the software-based one, produces a duplicate of all the certificates that will be on the smart card.

The Enroll Policy panel, similarly to the software policy, defines who can initiate the workflow, as well as printing options.

The Online Update Policy panel, similarly to the software-based cert, allows for the updating of certificates if the profile template is updated. The update is triggered when a renewal happens or, similarly to the software policy, when a cert is added or removed.

The Offline Unblock Policy panel is the configuration of a process to allow offline unblocking. This is used when a user is not connected to the network. This process only supports Microsoft-based smart cards, with challenge questions and answers exchanged via, in most cases, the user calling the helpdesk.

The Recovery On Behalf Policy panel allows the recovery of certificates for management or the business, for example, if the cert is needed to decrypt information from a user whose contract was terminated or who left the company.

The Replace Policy panel is utilized to replace a user's certificate in the event of them losing their card. If the card they had contained a signing cert, then a new signing cert would be issued on the new card. As with software certs, if the certificate type is encryption, then it would need to be restored via the replace policy.

The Renew Policy panel will be used when the profile/certificate is in the renewal period; it defines revocation details and options as well as initiate permissions.

The Suspend and Reinstate Policy panel is the same as in the software-based policy, for putting the certificate on hold.

The Retire Policy panel is similar to the disable policy, but a key difference is that this policy allows the card to be reused within the environment.

The Unblock Policy panel defines the users that can perform an actual unblocking of a smart card.

More in-depth detail of these policies can be found at http://bit.ly/MIMCMProfiletempates.

Summary

In this article, we uncovered the basics of certificate management and the management components that are required to successfully deploy a CM solution. Then, we discussed and outlined the agent accounts and the roles they play. Finally, we looked into the management permission model, from the policy template to the permissions and the workflow.

Resources for Article:

Further resources on this subject:
Managing Network Devices [article]
Logging and Monitoring [article]
Creating Horizon Desktop Pools [article]

MicroStrategy 10

Packt
15 Jul 2016
13 min read
In this article by Dmitry Anoshin, Himani Rana, and Ning Ma, the authors of the book Mastering Business Intelligence with MicroStrategy, we are going to talk about MicroStrategy 10, one of the leading platforms on the market: it can handle all data analytics demands and offers a powerful solution. We will be discussing different concepts of MicroStrategy, such as its history, deployment, and so on.

(For more resources related to this topic, see here.)

Meet MicroStrategy 10

MicroStrategy is a market leader in Business Intelligence (BI) products. It has rich functionality in order to meet the requirements of modern businesses. In 2015, MicroStrategy provided a new release of MicroStrategy, version 10. It offers both agility and governance like no other BI product. In addition, it is easy to use and enterprise ready. At the same time, it is great for both IT and business. In other words, MicroStrategy 10 offers an analytics platform that combines an easy and empowering user experience with enterprise-grade performance, management, and security capabilities. It is true bimodal BI and moves seamlessly between styles:

Data discovery and visualization
Enterprise reporting and dashboards
In-memory high performance BI
Scales from departments to enterprises
Administration and security

MicroStrategy 10 consists of three main products: MicroStrategy Desktop, MicroStrategy Mobile, and MicroStrategy Web. MicroStrategy Desktop lets users start discovering and visualizing data instantly. It is available for Mac and PC. It allows users to connect, prepare, discover, and visualize data. In addition, we can easily promote to a MicroStrategy Server. Moreover, MicroStrategy Desktop has a brand new HTML5 interface and includes all connection drivers. It allows us to use data blending, data preparation, and data enrichment. Finally, it has powerful advanced analytics and can be integrated with R.

To cut a long story short, we want to highlight the main changes in the new BI platform. The developer tools keep the same functionality, look, and architecture; the changes are all about the Web interface and the Intelligence Server. Let's look closer at what MicroStrategy 10 can show us.

MicroStrategy 10 expands the analytical ecosystem by using third-party toolkits such as:

Data visualization libraries: We can easily plug in and use any visualization from the expanding range of Java libraries
Statistical toolkits: R, SAS, SPSS, KXEN, and others
Geolocation data visualization: Uses mapping capabilities to visualize and interact with location data

MicroStrategy 10 has more than 25 new data sources that we can connect to quickly and simply. In addition, it allows us to build reports on top of other BI tools, such as SAP Business Objects, Cognos, and Oracle BI. It has a new connector to Hadoop, which uses the native connector. Moreover, it allows us to blend multiple data sources in-memory. MicroStrategy 10 also has rich functionality for working with data, such as:

Streamlined workflows to parse and prepare data
Multi-table in-memory support from different sources
Automatically parse and prepare data with every refresh
100+ inbuilt functions to profile and clean data
Create custom groups on the fly without coding

In terms of connecting to Hadoop, most BI products use Hive or Impala ODBC drivers in order to use SQL to get data from Hadoop. However, this method is bad in terms of performance. MicroStrategy 10 queries directly against Hadoop. As a result, it is up to 50 times faster than via ODBC.
Let's look at some of the main technical changes that have significantly improved MicroStrategy. The platform is now faster than ever before, because it doesn't have a two-billion-row limit on in-memory datasets and allows us to create analytical cubes up to 16 times bigger in size. It publishes cubes dramatically faster. Moreover, MicroStrategy 10 has higher data throughput, and cubes can be loaded in parallel 4 times faster with multi-threaded parallel loading. In addition, the in-memory engine allows us to create cubes 80 times larger than before, and we can access data from cubes 50% faster by using up to 8 parallel threads. Look at the following table, where we compare in-memory cube functionality in version 9 versus version 10:

Feature          Ver. 9        Ver. 10
Data volume      100 GB        ~2 TB
Number of rows   2 billion     200 billion
Load rate        8 GB/hour     ~200 GB/hour
Data model       Star schema   Any schema, tabular or multiple sets

In order to make the administration of MicroStrategy more effective in the new version, MicroStrategy Operations Manager was released. It gives MicroStrategy administrators powerful development tools to monitor, automate, and control systems. Operations Manager gives us:

Centralized management in a web browser
Enterprise Manager Console within the tool
Triggers and 24/7 alerts
System health monitors
Server management
Multiple environment administration

MicroStrategy 10 education and certification

MicroStrategy 10 offers new training courses that can be conducted offline in a training center, or online at http://www.microstrategy.com/us/services/education. We believe that certification is a good thing on your journey. The following certifications now exist for version 10:

MicroStrategy 10 Certified Associate Analyst
MicroStrategy 10 Certified Application Designer
MicroStrategy 10 Certified Application Developer
MicroStrategy 10 Certified Administrator

After passing all of these exams, you will become a MicroStrategy 10 Application Engineer. More details can be found here: http://www.microstrategy.com/Strategy/media/downloads/training-events/MicroStrategy-certification-matrix_v10.pdf.

History of MicroStrategy

Let us briefly look at the history of MicroStrategy, which began in 1991:

1991: Released first BI product, which allowed users to create graphical views and analyses of information data
2000: Released MicroStrategy 7 with a web interface
2003: First to release a fully integrated reporting tool, combining list reports, BI-style dashboards, and interface analyses in a single module
2005: Released MicroStrategy 8, including one-click actions and drag-and-drop dashboard creation
2009: Released MicroStrategy 9, delivering a seamless consolidated path from department to enterprise BI
2010: Unveiled new mobile BI capabilities for iPad and iPhone, and was featured on the iTunes Bestseller List
2011: Released MicroStrategy Cloud, the first SaaS offering from a major BI vendor
2012: Released Visual Data Discovery and groundbreaking new security platform, Usher
2013: Released expanded Analytics Platform and free Analytics Desktop client
2014: Announced availability of MicroStrategy Analytics via Amazon Web Services (AWS)
2015: MicroStrategy 10 was released, the first ever enterprise analytics solution for centralized and decentralized BI

Deploying MicroStrategy 10

We know only one way to master MicroStrategy: through practical exercises. Let's start by downloading and deploying MicroStrategy 10.2.
Overview of training architecture

In order to master MicroStrategy and learn about some BI considerations, we need to download the all-important software, deploy it, and connect to a network. During the preparation of the training environment, we will cover the installation of MicroStrategy on a Linux operating system. This is very good practice, because many people work with Windows and are not familiar with Linux, so this chapter will provide additional knowledge of working with Linux, as well as installing MicroStrategy and a web server. Look at the training architecture: there are three main components:

Red Hat Linux 6.4: Used for deploying the web server and Intelligence Server.
Windows machine: Runs the MicroStrategy Client and an Oracle database.
Virtual machine with Hadoop: A ready-made virtual machine with Hadoop, which will connect to MicroStrategy using a brand new connection.

In the real world, we should use separate machines for every component, and sometimes several machines in order to run one component. This is called clustering. Let's create a virtual machine.

Creating a Red Hat Linux virtual machine

Let's create a virtual machine with Red Hat Linux, which will host our Intelligence Server:

Go to http://www.redhat.com/ and create an account
Go to the software download center: https://access.redhat.com/downloads
Download RHEL: https://access.redhat.com/downloads/content/69/ver=/rhel---7/7.2/x86_64/product-software
Choose Red Hat Enterprise Linux Server
Download Red Hat Enterprise Linux 6.4 x86_64
Choose Binary DVD

Now we can create a virtual machine with RHEL 6.4. We have several options when choosing the software for deploying the virtual machine. In our case, we will use VMware Workstation. Before starting to deploy a new VM, we should adjust the default settings, such as increasing RAM and HDD, and adding one more network card in order to connect the external environment with the MicroStrategy Client and sample database. In addition, we should create a new network.

When the deployment of the RHEL virtual machine is complete, we should activate a subscription in order to install the required packages. Let us do this with one command in the terminal:

# subscription-manager register --username <username> --password <password> --auto-attach

Performing prerequisites for MicroStrategy 10

According to the installation and configuration guide, we should deploy all the necessary packages. In order to install them, we should execute the following commands as root:

# su
# yum install compat-libstdc++-33.i686
# yum install libXp.x86_64
# yum install elfutils-devel.x86_64
# yum install libstdc++-4.4.7-3.el6.i686
# yum install krb5-libs.i686
# yum install nss-pam-ldapd.i686
# yum install ksh.x86_64

The project design process

Project design is not just about creating a project in MicroStrategy Architect; it involves several steps and thorough analysis, such as how data is stored in the data warehouse, what reports the user wants based on the data, and so on. The following are the steps involved in our project design process:

Logical data model design

Once the business requirements are documented, the user must create a fact qualifier matrix to identify the attributes, facts, and hierarchies, which are the building blocks of any logical data model. An example of a fact qualifier is as follows:

A logical data model is created based on the source systems and designed before defining a data warehouse.
So, it's good for seeing which objects the users want and checking whether the objects are in the source systems. It represents the definition, characteristics, and relationships of the data. This graphical representation of information is easily understandable by business users too. A logical data model graphically represents the following concepts:

Attributes: Provide a detailed description of the data
Facts: Provide numerical information about the data
Hierarchies: Provide relationships between data

Data warehouse schema design

Physical data warehouse design is based on the logical data model and represents the storage and retrieval of data from the data warehouse. Here, we determine the optimal schema design, which ensures reporting performance and maintenance. The key components of a physical data warehouse schema are columns and tables:

Columns: These store attribute and fact data. The following are the three types of columns:
ID column: Stores the ID for an attribute
Description column: Stores the text description of the attribute
Fact column: Stores fact data

Tables: Physical groupings of related data. The following are the types of tables:
Lookup tables: Store information about attributes, such as IDs and descriptions
Relationship tables: Store information about the relationship between two or more attributes
Fact tables: Store factual data and the level of aggregation, which is defined based on the attributes of the fact table. They contain base fact columns or derived fact columns:
Base fact: Stores the data at the lowest possible level of detail
Aggregate fact: Stores data at a higher or summarized level of detail

Mobile server installation and configuration

While the mobile client is easy to install, the mobile server is not. Here we provide a step-by-step guide on how to install the mobile server:

Download MicroStrategyMobile.war. The mobile server is packed in a WAR file, just like Operations Manager or Web.
Copy MicroStrategyMobile.war from <Microstrategy Installation folder>/Mobile/MobileServer to /usr/local/tomcat7/webapps. Then restart Tomcat by issuing the ./shutdown.sh and ./startup.sh commands.
Connect to the mobile server. Go to http://192.168.81.134:8080/MicroStrategyMobile/servlet/mstrWebAdmin. Then add the server name localhost.localdomain and click connect.
Configure the mobile server. You can configure (1) authentication settings for the mobile server application; (2) privileges and permissions; (3) SSL encryption; (4) client authentication with a certificate server; (5) the destination folder for the photo uploader widget and signature capture input control.

Performing Pareto analysis

One good thing about data discovery tools is their agile approach to the data. We can connect any data source and easily slice and dice data. Let's try to use the Pareto principle in order to answer the question: how are sales distributed among the different products? The Pareto principle states that, for many events, roughly 80% of results come from 20% of the causes. For example, 80% of profits come from 20% of the products offered. This type of analysis is very popular in product analytics. In MicroStrategy Desktop, we can use shortcut metrics in order to quickly make complex calculations such as running sums or a percent of the total. Let's build a visualization in order to see the 20% of products that bring us 80% of the money:

Choose Combo Chart.
Drag and drop Salesamount to the vertical and Englishproductname to the horizontal.
Add Orderdate to the filters and restrict to 60 days.
Right-click on Sales amount and choose Descending Sort.
Right-click on Salesamount | Shortcut Metrics | Percent Running Total.
Drag and drop Metric Names to the Color By.
Change the color of Salesamount and Percent Running Total.
Change the shape of Percent Running Total.

As a result, we get this chart:

From this chart we can quickly identify the top 20% of products which bring us 80% of revenue.

Splunk and MicroStrategy

MicroStrategy 10 has announced a new connection to Splunk. I suppose that Splunk is not very popular in the world of Business Intelligence. Most people who have heard about Splunk think that it is just a platform for processing logs. The answer is both true and false. Splunk's name was derived from the world of spelunking, because searching for root causes in logs is a kind of spelunking without light, and Splunk solves this problem by indexing machine data from a tremendous number of data sources, such as applications, hardware, sensors, and so on.

What is Splunk

Splunk's goal is making machine data accessible, usable, and valuable for everyone, and turning machine data into business value. It can:

Collect data from anywhere
Search and analyze everything
Gain real-time Operational Intelligence

In the BI world, everyone knows what a data warehouse is.

Creating reports from Splunk

Now we are ready to build reports using MicroStrategy Desktop and Splunk. Let's do it:

Go to MicroStrategy Desktop, click add data, and choose Splunk.
Create a connection using the existing DSN based on the Splunk ODBC driver.
Choose one of the tables (Splunk reports).
Add other tables as new data sources.

Now we can build a dashboard using data from Splunk by dragging and dropping attributes and metrics:

Summary

In this article, we looked at MicroStrategy 10 and its features. We learned about its history and deployment. We also learned about the project design process, Pareto analysis, and the connection between Splunk and MicroStrategy.

Resources for Article:

Further resources on this subject:
Stacked Denoising Autoencoders [article]
Creating external tables in your Oracle 10g/11g Database [article]
Clustering Methods [article]

Building a Line Chart with ggplot2

Joel Carlson
15 Jul 2016
6 min read
In this blog post, you will follow along to produce a line chart using the ggplot2 package for R. The ggplot2 package is highly customizable and extensible, and provides an intuitive plotting syntax that allows for the creation of an incredibly diverse range of plots.

Motivating example

Before getting started, let's examine the advantages of ggplot2 over the base R plotting functions. In general, the base R plotting system is more verbose, harder to understand, and produces plots that are less attractive than their ggplot2 equivalents. To illustrate, let's build a plot using data on the growth of five trees from the "datasets" package. This is just a demonstration, so don't worry too much about the structure of the data or the details of the plotting syntax. Take a look at the following:

library(datasets)
data("Orange")

The goal is to plot the growth of the trees as a line chart where each line corresponds to a different tree over time. Consider the following code to produce this chart using the base R plotting system:

# Adapted from: http://www.statmethods.net/graphs/line.html
ntrees <- length(unique(Orange$Tree))

# Get the range for the x and y axis
xrange <- range(Orange$age)
yrange <- range(Orange$circumference)

# Set up the plot
plot(xrange, yrange, type="n",
     xlab="Age (days)",
     ylab="Circumference (mm)")

colors <- rainbow(ntrees)

# Add lines
for (i in 1:ntrees) {
  tree <- subset(Orange, Tree==i)
  lines(tree$age, tree$circumference, col=colors[i])
}

# Add title
title("Tree Growth (Base R)")

# Add legend
legend(xrange[1], yrange[2], 1:ntrees, cex=0.8, col=colors, lty=1, title="Tree")

The code is verbose, difficult to extend or change (for example, if you want to change the lines to points, you would need to change a number of variables), and the chart produced is not particularly attractive. The following is an equivalent chart using ggplot2:

Using ggplot2, you can produce this plot with fewer lines of code that are both more readable and extensible. You will also avoid the ugly "for" loop used to produce the lines. By the end of this post, you will have built this plot from the ground up using ggplot2!

Installation and preparation

For this post, you will first need to make sure that ggplot2 is installed via the following command:

install.packages("ggplot2")

Once the package is installed, load it into the session using:

library(ggplot2)

Data

The dataset used in this post is already in the "tidy data" format, as described here. If your data is not in the tidy format, consider using the dplyr and/or tidyr packages to shape it into the correct format. You are using a very small dataset called Orange, which, as the preceding plots describe, contains the growth patterns of five trees over several years. The data consist of 35 rows and three columns and are found in the datasets package. The structure of the data is as follows:

str(Orange)
'data.frame': 35 obs. of 3 variables:
 $ Tree         : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 1 1 1 1 1 2 2 2 ...
 $ age          : num 118 484 664 1004 1231 ...
 $ circumference: num 30 58 87 115 120 142 145 33 69 111 ...

Building plots

You will now begin building up the previous plot using principles described in "The Grammar of Graphics", upon which ggplot2 is based.
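As an optional quick check before going further, you can see what a bare ggplot() call produces. This is a minimal sketch that only assumes ggplot2 and the Orange data have been loaded as shown above:

# A bare ggplot() call: the data is attached, but nothing is drawn yet
p <- ggplot(data = Orange)
p   # prints an empty grey panel, since no aesthetics or geometries are defined

Everything that follows adds aesthetic mappings and geometry layers on top of this empty object.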
To build a plot using ggplot, think about it in terms of aesthetic mappings and geometries, which are used to create the layers that make up the plot. Calling ggplot() without any aesthetics or geometries defined provides an empty canvas.

Aesthetics and geometries

Aesthetics are the visual properties (for example, size, shape, color, fill, and so on) of the geometries present in the graph. In this context, a geometry refers to objects that directly represent data points (that is, rows in a data frame), such as dots, lines, or bars. In ggplot2, create aesthetics using the aes() function. Inside aes(), you define which variables will map to aesthetics in the plot. Here, we wish to map the "age" variable to the x-axis aesthetic, the "circumference" variable to the y-axis aesthetic, and the "Tree" factor variable to the color aesthetic, with each factor level being represented by a different color, as follows:

p <- ggplot(data = Orange, aes(x=age, y=circumference, col=Tree))

If you run the code after defining only the aesthetics, you will see that there is nothing on the plot except the axes:

This is because although you have mapped aesthetics to data, you have yet to represent these mappings with geometries (or geoms). To create this representation, you add a layer to the plot using a call to the line geometry, the geom_line() function, as follows:

p <- p + geom_line()
p

Take a look at the full listing of geoms that can be used here.

Polishing the plot

With the structure of the plot in place, polish the plot by:

Editing the axis labels
Adding a title
Moving the legend

Axis labels and the title

You can create/change the axis labels of the plot using labs(), as follows:

p <- p + labs(x="Age (days)", y="Circumference (mm)")

You can also add a title using ggtitle(), as follows:

p <- p + ggtitle("Tree Growth (ggplot2)")
p

Moving the legend

To move the legend, use the theme() function and change the legend.justification and legend.position variables via the following code:

p <- p + theme(legend.justification=c(0,1), legend.position=c(0,1))
p

The justification for the legend is laid out as a grid, where (0,0) is lower-left and (1,1) is upper-right. The legend.position parameter can also take values such as "top", "bottom", "left", "right", or "none" (which removes the legend entirely). The theme() function is very powerful and allows very fine-grained control over the plot. You can find a listing of all the available parameters in the documentation here.

Final words

The plot is now identical to the plot used to motivate the article! The final code is as follows:

ggplot(data=Orange, aes(x=age, y=circumference, col=Tree)) +
  geom_line() +
  labs(x="Age (days)", y="Circumference (mm)") +
  ggtitle("Tree Growth (ggplot2)") +
  theme(legend.justification=c(0,1), legend.position=c(0,1))

Clearly, the code is more readable, and I think you would agree that the plot is more attractive than the equivalent plot using base R. Good luck and happy plotting!

About the author

Joel Carlson is a recent MSc graduate from Seoul National University and a current Data Science Fellow at Galvanize in San Francisco. He has contributed two R packages to CRAN (radiomics and RImagePalette). You can learn more about him or get in touch at his personal website.

Exploring Shaders and Effects

Packt
14 Jul 2016
5 min read
In this article by Jamie Dean, the author of the book Mastering Unity Shaders and Effects, we will use transparent shaders and atmospheric effects to present the volatile conditions of the planet Ridley VI from the surface. In this article, we will cover the following topics:

Exploring the difference between the Cutout, Transparent, and Fade Rendering Modes
Implementing and adjusting Unity's fog effect in the scene

(For more resources related to this topic, see here.)

Creating the dust cloud material

The surface of Ridley VI is made inhospitable by dangerous nitrogen storms. In our game scene, these are represented by dust cloud planes situated near the surface. We need to set up the materials for these clouds with the following steps:

In the Project panel, click on the PACKT_Materials folder to view its contents in the Assets panel.
In the Assets panel, right-click on an empty area and choose Create | Material.
Rename the material dustCloud.
In the Hierarchy panel, click to select the dustcloud object. The object's properties will appear in the Inspector.
Drag the dustCloud material from the Assets panel onto the Materials field in the Mesh Renderer property visible in the Inspector.

Next, we will set the texture map of the material.

Reselect the dustCloud material by clicking on it in the Assets panel.
Lock the Inspector by clicking on the small lock icon on the top-right corner of the panel. Locking the Inspector allows you to maintain the focus on assets while you are hooking up an associated asset in your project.
In the Project panel, click on the PACKT_Textures folder.
Locate the strato texture map and drag it into the dustCloud material's Albedo texture slot in the Inspector.

The texture map contains four atlassed variations of the cloud effect. We need to adjust how much of the whole texture is shown in the material.

In the Inspector, set the Tiling Y value to 0.25. This will ensure that only a quarter of the complete height of the texture will be used in the material.

The texture map also contains opacity data. To use this in our material, we need to adjust the Rendering Mode. The Rendering Mode of the Standard Shader allows us to specify the opaque nature of a surface. Most often, scene objects are Opaque: objects behind them are blocked by them and are not visible through their surface. The next option is Cutout. This is used for surfaces containing areas of full opacity and full transparency, such as leaves on a tree or a chain link fence; the opacity is basically on or off for each pixel in the texture. Fade allows objects to have cutout areas where there are completely transparent and partially transparent pixels. The Transparent option is suitable for truly transparent surfaces such as windows, glass, and some types of plastic. When specular is used with a transparent material, it is applied over the whole surface, making it unsuitable for cutout effects.

Comparison of Standard Shader transparency types

The Fade Rendering Mode is the best option for our dustCloud material, as we want the cloud objects to be cut out so that the edges of the quad that the material is applied to are not visible. We also want the surface to be partially transparent so that other dustcloud quads are visible behind it, blending the effect.

At the top of the material properties in the Inspector, click on the Rendering Mode drop-down menu and set it to Fade:

Transparent dustCloud material applied

The dust clouds should now be visible with their opacity reading correctly, as shown in the preceding figure.
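As an aside, the Rendering Mode switch above is done in the Inspector. If you ever need to make the same change from a script at runtime, the following C# sketch mirrors the blend, keyword, and render queue settings that the Standard Shader's Fade mode is commonly configured with in Unity 5. It is an illustrative assumption rather than part of the book's project, and the public dustCloudMaterial field is a placeholder you would assign yourself in the Inspector:

using UnityEngine;
using UnityEngine.Rendering;

public class FadeModeSwitcher : MonoBehaviour
{
    // Placeholder reference: assign the dustCloud material here in the Inspector.
    public Material dustCloudMaterial;

    void Start()
    {
        // Replicate what choosing "Fade" in the Rendering Mode drop-down sets up.
        dustCloudMaterial.SetFloat("_Mode", 2f); // 2 corresponds to Fade in the Standard Shader UI
        dustCloudMaterial.SetOverrideTag("RenderType", "Transparent");
        dustCloudMaterial.SetInt("_SrcBlend", (int)BlendMode.SrcAlpha);
        dustCloudMaterial.SetInt("_DstBlend", (int)BlendMode.OneMinusSrcAlpha);
        dustCloudMaterial.SetInt("_ZWrite", 0);
        dustCloudMaterial.DisableKeyword("_ALPHATEST_ON");
        dustCloudMaterial.EnableKeyword("_ALPHABLEND_ON");
        dustCloudMaterial.DisableKeyword("_ALPHAPREMULTIPLY_ON");
        dustCloudMaterial.renderQueue = 3000; // render with other transparent geometry
    }
}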
In the next step, we will add some further environmental effects to the scene.

Adding fog to the scene

In this step, we will add fog to the scene. Fog can be set to fade out distant background elements to reduce the amount of scenery that needs to be rendered. It can be colored, allowing us to blend elements together and give our scene some depth.

If the Lighting tab is not already visible in the Unity project, activate it from the menu bar by navigating to Windows | Lighting. Dock the Lighting panel if necessary.
Scroll to the bottom to locate the Fog properties group.
Check the checkbox next to Fog to enable it. You will see that fog is added to the environment in the Scene view, as shown in the following figure. The default values do not quite match what we need in the planet surface environment:

Unity's default fog effect

Click within the color swatch next to Fog Color to define the color value.
When the color picker appears over the main Unity interface, type the hexcode E8BE80FF into the Hex Color field near the bottom, as shown in the following screenshot:

Fog effect color selection

This will define the yellow-orange color that is appropriate for our planet's atmosphere.

Set the Fog Mode to Exponential Squared to give the fog the appearance of becoming thicker in the distance.
Increase the fog by increasing the Density value to 0.05:

Adjusted fog blended with dust cloud transparencies

Our dust cloud objects are now being blended with the fog, as shown in the preceding image.

Summary

In this article, we took a closer look at material Rendering Modes and how transparent effects can be implemented in a scene. We further explored real-time environmental effects by creating dust clouds that fade in and out using atlassed textures. We then set up an environmental fog effect using Unity's built-in tools.

For more information on Unity shaders and effects, you can refer to the following books:

Unity 5.x Animation Cookbook: https://www.packtpub.com/game-development/unity-5x-animation-cookbook
Unity 5.x Shaders and Effects Cookbook: https://www.packtpub.com/game-development/unity-5x-shaders-and-effects-cookbook
Unity Shaders and Effects Cookbook: https://www.packtpub.com/game-development/unity-shaders-and-effects-cookbook

Resources for Article:

Further resources on this subject:
Looking Good – The Graphical Interface [article]
Build a First Person Shooter [article]
The Vertex Function [article]

Basic Website using Node.js and MySQL database

Packt
14 Jul 2016
5 min read
In this article by Fernando Monteiro, author of the book Node.JS 6.x Blueprints, we will understand some basic concepts of a Node.js application using a relational database (MySQL) and also look at some differences between the Object Document Mapper (ODM) used with MongoDB and the Object Relational Mapper (ORM) used by Sequelize and MySQL. For this, we will create a simple application and use the resources we have available, as Sequelize is a powerful middleware for the creation of models and the mapping of the database. We will also use another template engine called Swig and demonstrate how we can add a template engine manually.

(For more resources related to this topic, see here.)

Creating the baseline applications

The first step is to create another directory; I'll use the root folder. Create a folder called chapter-02. Open your terminal/shell in this folder and type the express command:

express --git

Note that we are using only the --git flag this time; we will use another template engine, but we will install it manually.

Installing the Swig template engine

The first step is to change the default express template engine to Swig, a pretty simple template engine that is very flexible and stable, and that offers us a syntax very similar to Angular, denoting expressions just by using double curly brackets {{ variableName }}. More information about Swig can be found on the official website at: http://paularmstrong.github.io/swig/docs/

Open the package.json file and replace the jade line with the following:

"swig": "^1.4.2"

Open your terminal/shell in the project folder and type:

npm install

Before we proceed, let's make some adjustments to app.js; we need to add the swig module. Open app.js and add the following code, right after the var bodyParser = require('body-parser'); line:

var swig = require('swig');

Replace the default jade template engine line with the following code:

var swig = new swig.Swig();
app.engine('html', swig.renderFile);
app.set('view engine', 'html');

Refactoring the views folder

Let's change the views folder to the following new structure:

views
  pages/
  partials/

Remove the default jade files from views.

Create a file called layout.html inside the pages folder and place the following code:

<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
    {% block content %}
    {% endblock %}
  </body>
</html>

Create an index.html inside the views/pages folder and place the following code:

{% extends 'layout.html' %}
{% block title %}{% endblock %}
{% block content %}
<h1>{{ title }}</h1>
Welcome to {{ title }}
{% endblock %}

Create an error.html page inside the views/pages folder and place the following code:

{% extends 'layout.html' %}
{% block title %}{% endblock %}
{% block content %}
<div class="container">
  <h1>{{ message }}</h1>
  <h2>{{ error.status }}</h2>
  <pre>{{ error.stack }}</pre>
</div>
{% endblock %}

We need to adjust the views path in app.js; replace the code on line 14 with the following code:

// view engine setup
app.set('views', path.join(__dirname, 'views/pages'));

At this point, we have completed the first step of our MVC application. In this example, we will use the MVC pattern in its full meaning: Model, View, Controller.

Creating controllers folder

Create a folder called controllers inside the root project folder.
Create an index.js inside the controllers folder and place the following code:

// Index controller
exports.show = function(req, res) {
  // Show index content
  res.render('index', { title: 'Express' });
};

Edit the app.js file and replace the original index route app.use('/', routes); with the following code:

app.get('/', index.show);

Add the controller path to app.js on line 9, replacing the original code with the following code:

// Inject index controller
var index = require('./controllers/index');

Now it's time to check that all goes as expected: we run the application and check the result. Type the following command in your terminal/shell:

npm start

Check with the following URL: http://localhost:3000; you'll see the welcome message of the express framework.

Removing the default routes folder

Remove the routes folder and its content.
Remove the user route from app.js, after the index controller, on line 31.

Adding partials files for head and footer

Inside views/partials, create a new file called head.html and place the following code:

<meta charset="utf-8">
<title>{{ title }}</title>
<link rel='stylesheet' href='https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.0.0-alpha.2/css/bootstrap.min.css'>
<link rel="stylesheet" href="/stylesheets/style.css">

Inside views/partials, create a file called footer.html and place the following code:

<script src='https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.1/jquery.min.js'></script>
<script src='https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.0.0-alpha.2/js/bootstrap.min.js'></script>

Now it's time to add the partials files to the layout.html page using the include tag. Open layout.html and add the following highlighted code:

<!DOCTYPE html>
<html>
  <head>
    {% include "../partials/head.html" %}
  </head>
  <body>
    {% block content %}
    {% endblock %}
    {% include "../partials/footer.html" %}
  </body>
</html>

Finally, we are prepared to continue with our project; at this point, our directory structure looks like the following image:

Folder structure

Summary

In this article, we discussed the basic concepts of a Node.js application with a MySQL database, and we also saw how to refactor the express template engine and use another resource, the Swig template library, to build a basic website.

Resources for Article:

Further resources on this subject:
Exception Handling in MySQL for Python [article]
Python Scripting Essentials [article]
Splunk's Input Methods and Data Feeds [article]