
How-To Tutorials

Data Extracting, Transforming, and Loading

Packt
01 Aug 2016
15 min read
In this article, Yu-Wei Chiu, author of the book R for Data Science Cookbook, covers the following topics:

- Scraping web data
- Accessing Facebook data

Before using data to answer critical business questions, the most important thing is to prepare it. Data is normally archived in files, and using Excel or text editors allows it to be easily obtained. However, data can be located in a range of different sources, such as databases, websites, and various file formats. Being able to import data from these sources is crucial.

There are four main types of data. Data recorded in a text format is the simplest. As some users require storing data in a structured format, files with a .tab or .csv extension can be used to arrange data in a fixed number of columns. For many years, Excel has held a leading role in the field of data processing, and this software uses the .xls and .xlsx formats. Knowing how to read and manipulate data from databases is another crucial skill. Moreover, as most data is not stored in a database, we must know how to use web scraping techniques to obtain data from the internet. As part of this chapter, we will introduce how to scrape data from the internet using the rvest package.

Many experienced developers have already created packages to allow beginners to obtain data more easily, and we focus on leveraging these packages to perform data extraction, transformation, and loading. In this chapter, we will first learn how to utilize R packages to read data from a text format and scan files line by line. We then move on to reading structured data from databases and Excel. Finally, we will learn how to scrape internet and social network data using the R web scraper.

Scraping web data

In most cases, the majority of data will not exist in your database; instead, it will be published in different forms on the internet. To dig more valuable information from these data sources, we need to know how to access and scrape data from the web. Here, we will illustrate how to use the rvest package to harvest finance data from http://www.bloomberg.com/.

Getting ready

For this recipe, prepare your environment with R installed on a computer with internet access.

How to do it...

Perform the following steps to scrape data from http://www.bloomberg.com/:

First, access the following link to browse the S&P 500 index on the Bloomberg Business website: http://www.bloomberg.com/quote/SPX:IND.

Once the page appears, we can begin installing and loading the rvest package:

    > install.packages("rvest")
    > library(rvest)

Next, you can use the html() function from the rvest package to scrape and parse the HTML page of the S&P 500 index:

    > spx_quote <- html("http://www.bloomberg.com/quote/SPX:IND")

Use the browser's built-in web inspector to inspect the location of the detail quote below the index chart. You can then move the mouse over the detail quote and click on the target element that you wish to scrape.
The <div class="cell"> section holds all the information that we need. Extract the elements with the class cell using the html_nodes function:

    > cell <- spx_quote %>% html_nodes(".cell")

Furthermore, we can parse the labels of the detail quote from the elements with the class cell__label, extract the text from the scraped HTML, and finally clean spaces and newline characters from the extracted text:

    > label <- cell %>%
    +     html_nodes(".cell__label") %>%
    +     html_text() %>%
    +     lapply(function(e) gsub("\n|\\s+", "", e))

We can also extract the values of the detail quote from the elements with the class cell__value, extract the text from the scraped HTML, and clean spaces and newline characters in the same way:

    > value <- cell %>%
    +     html_nodes(".cell__value") %>%
    +     html_text() %>%
    +     lapply(function(e) gsub("\n|\\s+", "", e))

Finally, we can set the extracted labels as the names of the values:

    > names(value) <- label

Next, we can access the energy and oil market index page at http://www.bloomberg.com/energy. We can then use the web inspector to inspect the location of the table element. Finally, we can use html_table to extract the table element with the class data-table:

    > energy <- html("http://www.bloomberg.com/energy")
    > energy.table <- energy %>% html_node(".data-table") %>% html_table()
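Putting the recipe together, the whole extraction can be condensed into a short script. The sketch below only restates the steps above under the same assumptions (the CSS classes .cell, .cell__label, and .cell__value come from the Bloomberg page as it looked at the time of writing, and may have changed since); note that newer rvest versions replace html() with read_html():

    library(rvest)

    # Read and parse the quote page (use read_html() on current rvest versions)
    page  <- html("http://www.bloomberg.com/quote/SPX:IND")
    cells <- page %>% html_nodes(".cell")

    # Helper to strip newlines and whitespace from the extracted text
    clean <- function(x) gsub("\n|\\s+", "", x)

    label <- clean(cells %>% html_nodes(".cell__label") %>% html_text())
    value <- clean(cells %>% html_nodes(".cell__value") %>% html_text())

    # A named character vector of the detail quote, e.g. quote["Open"] if such a label exists
    quote <- setNames(value, label)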
How it works...

The most difficult aspect of scraping data from a website is that web data is published and structured in different formats. We have to fully understand how the data is structured within the HTML tags before continuing. As HTML (Hypertext Markup Language) has a syntax similar to XML, we can use the XML package to read and parse HTML pages. However, the XML package only provides the XPath method, which has two main shortcomings:

- Inconsistent behavior in different browsers
- It is hard to read and maintain

For these reasons, we recommend using CSS selectors over XPath when parsing HTML.

Python users may be familiar with how quickly data can be scraped using the requests and BeautifulSoup packages. The rvest package is the counterpart in R, and it provides the same ability to simply and efficiently harvest data from HTML pages.

In this recipe, our target is to scrape the finance data of the S&P 500 detail quote from http://www.bloomberg.com/. Our first step is to make sure that we can access our target webpage through the internet, which is followed by installing and loading the rvest package. After installation and loading is complete, we can use the html() function to read the source code of the page into spx_quote.

Once we have confirmed that we can read the HTML page, we can start parsing the detail quote from the scraped HTML. However, we first need to inspect the CSS path of the detail quote. There are many ways to inspect the CSS path of a specific element. The most popular method is to use the development tool built into each browser (press F12 or FN + F12). Using Google Chrome as an example, you can open the development tool by pressing F12. A DevTools window will show up somewhere in the visual area (you may refer to https://developer.chrome.com/devtools/docs/dom-and-styles#inspecting-elements). Then, you can move the mouse cursor to the upper left of the DevTools window and select the Inspect Element icon (a magnifier icon). Next, click on the target element, and the DevTools window will highlight the source code of the selected area.

You can then move the mouse cursor to the highlighted area and right-click on it. From the pop-up menu, click on Copy CSS Path to extract the CSS path. Alternatively, you can examine the source code and find that the selected element is structured in HTML code with the class cell.

One highlight of rvest is that it is designed to work with magrittr, so we can use the %>% pipeline operator to chain the output parsed at each stage. Thus, we can first obtain the output source by calling spx_quote and then pipe the output to html_nodes. As the html_nodes function uses CSS selectors to parse elements, the function takes basic selectors with type (for example, div), ID (for example, #header), and class (for example, .cell). As the elements to be extracted have the class cell, you should place a period (.) in front of cell.

Finally, we should extract both label and value from the previously parsed nodes. Here, we first extract the elements with the class cell__label, and we then use html_text to extract the text. We can then use the gsub function to clean spaces and newline characters from the parsed text. Likewise, we apply the same pipeline to extract the elements with the class cell__value. As we have extracted both the labels and values of the detail quote, we can apply the labels as the names of the extracted values. We have now organized data from the web into structured data.

Alternatively, we can also use rvest to harvest tabular data. Similar to the process used to harvest the S&P 500 index, we can first access the energy and oil market index page. We can then use the web element inspector to find the location of the table element. As we have found the element located in the class data-table, we can use the html_table function to read the table content into an R data frame.

There's more...

Instead of using the web inspector built into each browser, we can consider using SelectorGadget (http://selectorgadget.com/) to search for the CSS path. SelectorGadget is a very powerful and simple-to-use extension for Google Chrome, which enables the user to extract the CSS path of the target element with only a few clicks.

To begin using SelectorGadget, access this link: https://chrome.google.com/webstore/detail/selectorgadget/mhjhnkcfbdhnjickkkdbjoemdmbfginb. Then, click on the green button to install the plugin in Chrome. Next, click on the upper-right icon to open SelectorGadget, and then select the area that needs to be scraped. The selected area will be colored green, and the gadget will display the CSS path of the area and the number of elements matched by the path. Finally, you can paste the extracted CSS path into html_nodes as an input argument to parse the data.

Besides rvest, we can connect R to Selenium via RSelenium to scrape web pages. Selenium was originally designed as a web application automation framework that enables the user to command a web browser to automate processes through simple scripts. However, we can also use Selenium to scrape data from the internet.
The following instructions present a sample demo of how to scrape Bloomberg.com using RSelenium:

First, access this link to download the Selenium standalone server: http://www.seleniumhq.org/download/.

Next, start the Selenium standalone server using the following command:

    $ java -jar selenium-server-standalone-2.46.0.jar

If you can successfully launch the standalone server, you should see a startup message, which means that you can connect to the server that binds to port 4444.

At this point, you can begin installing and loading RSelenium with the following commands:

    > install.packages("RSelenium")
    > library(RSelenium)

After RSelenium is installed, register the driver and connect to the Selenium server:

    > remDr <- remoteDriver(remoteServerAddr = "localhost"
    +                       , port = 4444
    +                       , browserName = "firefox"
    + )

Examine the status of the registered driver:

    > remDr$getStatus()

Next, we navigate to Bloomberg.com:

    > remDr$open()
    > remDr$navigate("http://www.bloomberg.com/quote/SPX:IND")

Finally, we can scrape the data using the CSS selector:

    > webElem <- remDr$findElements('css selector', ".cell")
    > webData <- sapply(webElem, function(x){
    +    label <- x$findChildElement('css selector', '.cell__label')
    +    value <- x$findChildElement('css selector', '.cell__value')
    +    cbind(c("label" = label$getElementText(), "value" = value$getElementText()))
    +  }
    + )

Accessing Facebook data

Social network data is another great source for anyone interested in exploring and analyzing social interactions. The main difference between social network data and web data is that social network platforms often provide a semi-structured data format (mostly JSON). Thus, we can easily access the data without needing to inspect how it is structured. In this recipe, we will illustrate how to use rvest and rjson to read and parse data from Facebook.

Getting ready

For this recipe, prepare your environment with R installed on a computer with internet access.

How to do it...

Perform the following steps to access data from Facebook:

First, we need to log in to Facebook and access the developer page (https://developers.facebook.com/). Click on Tools & Support and select Graph API Explorer. Next, click on Get Token and choose Get Access Token. On the User Data Permissions pane, select user_tagged_places and then click on Get Access Token. Copy the generated access token to the clipboard.

Try to access the Facebook API using rvest:

    > access_token <- '<access_token>'
    > fb_data <- html(sprintf("https://graph.facebook.com/me/tagged_places?access_token=%s", access_token))

Install and load the rjson package:

    > install.packages("rjson")
    > library(rjson)

Extract the text from fb_data and then use fromJSON to read the JSON data:

    > fb_json <- fromJSON(fb_data %>% html_text())

Use sapply to extract the name and ID of each place from fb_json:

    > fb_place <- sapply(fb_json$data, function(e){e$place$name})
    > fb_id <- sapply(fb_json$data, function(e){e$place$id})

Last, use data.frame to wrap the data:

    > data.frame(place = fb_place, id = fb_id)
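The Graph API usually returns list data in pages. The sketch below shows one way the next page could be followed when the response contains a paging section; the paging/next field names follow general Graph API conventions and are an assumption, since paging is not covered in the recipe itself:

    # Assumed sketch: follow Graph API paging links if they are present
    get_next <- function(x) if (is.null(x$paging)) NULL else x$paging[["next"]]

    all_places <- fb_json$data
    next_url   <- get_next(fb_json)

    while (!is.null(next_url)) {
      page_json  <- fromJSON(html(next_url) %>% html_text())
      all_places <- c(all_places, page_json$data)
      next_url   <- get_next(page_json)
    }

    fb_place <- sapply(all_places, function(e) e$place$name)
    fb_id    <- sapply(all_places, function(e) e$place$id)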
How it works...

In this recipe, we covered how to retrieve social network data through Facebook's Graph API. Unlike scraping web pages, you need to obtain a Facebook access token before making any request for insight information. There are two ways to retrieve the access token: the first is to use Facebook's Graph API Explorer, and the other is to create a Facebook application. In this recipe, we illustrated how to use the Graph API Explorer to obtain the access token.

Facebook's Graph API Explorer is where you can craft your request URLs to access Facebook data on your behalf. To access the explorer page, we first visit Facebook's developer page (https://developers.facebook.com/). The Graph API Explorer page is under the Tools & Support drop-down menu. After entering the explorer page, we select Get Access Token from the drop-down menu of Get Token. Subsequently, a tabbed window appears, where we can check access permissions at various levels of the application. For example, we can check tagged_places to access the locations that we previously tagged. After we have selected the permissions that we require, we can click on Get Access Token to allow the Graph API Explorer to access our insight data. After completing these steps, you will see an access token, which is a temporary and short-lived token that you can use to access the Facebook API.

With the access token, we can then access the Facebook API with R. First, we need an HTTP request package. Similar to the web scraping recipe, we can use the rvest package to make the request. We craft a request URL with the access_token (copied from the Graph API Explorer) appended to the Facebook API endpoint. From the response, we should receive JSON-formatted data. To read the attributes of the JSON data, we install and load the rjson package. We can then use the fromJSON function to read the JSON string extracted from the response. Finally, we read the place and ID information using the sapply function, and we can then use data.frame to transform the extracted information into a data frame. At the end of this recipe, we should see the data formatted as a data frame.

There's more...

To learn more about the Graph API, you can read the official documentation from Facebook (https://developers.facebook.com/docs/reference/api/field_expansion/).

First, we need to install and load the Rfacebook package:

    > install.packages("Rfacebook")
    > library(Rfacebook)

We can then use the built-in functions to retrieve data about a user, or access similar information, by providing an access token:

    > getUsers("me", "<access_token>")

If you want to scrape public fan pages without logging in to Facebook every time, you can create a Facebook app to access insight information on behalf of the app. To create an authorized app token, log in to the Facebook developer page and click on Add a New App. You can create a new Facebook app with any name, provided that it has not already been registered. Finally, you can copy both the app ID and app secret and craft the access token as <APP ID>|<APP SECRET>. You can now use this token to scrape public fan page information with the Graph API. As with the Rfacebook call above, we can then replace the access_token with <APP ID>|<APP SECRET>:

    > getUsers("me", "<access_token>")

Summary

In this article, we learned how to utilize R packages to read data from a text format and scan files line by line. We also learned how to scrape internet and social network data using the R web scraper.

Resources for Article:

Further resources on this subject:

- Learning Data Analytics with R and Hadoop
- Big Data Analysis (R and Hadoop)
- Using R for Statistics, Research, and Graphics

Rapid Application Development with Django, the Openduty story

Bálint Csergő
01 Aug 2016
5 min read
Openduty is an open source incident escalation tool, something like PagerDuty but free and much simpler. It was born during a hackathon at Ustream back in 2014. The project received a lot of attention in the devops community, and was also featured in Devops Weekly and Pycoders Weekly. It is listed at Full Stack Python as an example Django project. This article covers some design decisions we made during the hackathon and details some of the main components of the Openduty system.

Design

When we started the project, we already knew what we wanted to end up with:

- We had to work quickly—it was a hackathon after all
- An API similar to PagerDuty
- The ability to send notifications asynchronously
- A nice calendar to organize on-call schedules—can't hurt anyone, right?
- Tokens for authorizing notifiers

So we chose the corresponding components to reach our goal.

Get the job done quickly

If you have to develop apps rapidly in Python, Django is the framework you choose. It's a bit heavyweight, but hey, it gives you everything you need and sometimes even more. Don't get me wrong; I'm a big fan of Flask too, but it can be a bit fiddly to assemble everything by hand at the start. Flask may pay off later, and you may win on a lower number of dependencies, but we only had 24 hours, so we went with Django.

An API

When it comes to Django and REST APIs, one of the go-to solutions is the Django REST Framework. It has all the nuts and bolts you'll need when you're assembling an API, like serializers, authentication, and permissions. It can even make all your API calls self-describing. Let me show you how serializers work in the REST Framework:

    class OnCallSerializer(serializers.Serializer):
        person = serializers.CharField()
        email = serializers.EmailField()
        start = serializers.DateTimeField()
        end = serializers.DateTimeField()

The code above represents a person who is on call in the API. As you can see, it is pretty simple; you just have to define the fields. It even does the validation for you, since you have to give a type to every field. But believe me, it's capable of more good things, like generating a serializer from your Django model:

    class SchedulePolicySerializer(serializers.HyperlinkedModelSerializer):
        rules = serializers.RelatedField(many=True, read_only=True)

        class Meta:
            model = SchedulePolicy
            fields = ('name', 'repeat_times', 'rules')

This example shows how you can customize a ModelSerializer, make fields read-only, and only accept given fields from an API call.

Async Task Execution

When you have long-running tasks, such as generating huge reports, resizing images, or even transcoding media, it is common practice to move the actual execution out of your web app into a separate layer. This decreases the load on the web servers, helps avoid long or even timed-out requests, and just makes your app more resilient and scalable. In the Python world, the go-to solution for asynchronous task execution is called Celery. In Openduty, we use Celery heavily to send notifications asynchronously and also to delay the execution of any given notification task by the delay defined in the service settings.
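For context, Celery needs an application object wired to a message broker before any task can be declared or queued. A minimal sketch of such a setup is shown below; the module name, broker URL, and settings module are placeholder assumptions, not taken from the Openduty code base:

    # celery.py -- hypothetical minimal Celery wiring for a Django project
    import os
    from celery import Celery

    # Point Celery at the Django settings module (name is an assumption)
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'openduty.settings')

    # The broker URL is a placeholder; RabbitMQ and Redis are common choices
    app = Celery('openduty', broker='amqp://guest:guest@localhost//')

    # Pick up CELERY_* options from Django settings and auto-discover tasks.py modules
    app.config_from_object('django.conf:settings')
    app.autodiscover_tasks()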
Defining a task is this simple:

    @app.task(ignore_result=True)
    def send_notifications(notification_id):
        try:
            notification = ScheduledNotification.objects.get(id=notification_id)
            if notification.notifier == UserNotificationMethod.METHOD_XMPP:
                notifier = XmppNotifier(settings.XMPP_SETTINGS)
            # choosing the notifier removed from example code snippet
            notifier.notify(notification)
        except Exception:
            # logging the task result removed from example snippet
            raise

And calling an already defined task is almost as simple as calling any regular function:

    send_notifications.apply_async((notification.id,), eta=notification.send_at)

This means exactly what you think: send the notification with the ID notification.id at notification.send_at. But how do these things get executed? Under the hood, Celery wraps your decorated functions so that when you call them, they get enqueued instead of being executed directly. When the Celery worker detects that there is a task to be executed, it simply takes it from the queue and executes it asynchronously.

Calendar

We use django-scheduler for the awesome-looking calendar in Openduty. It is a pretty good project in general: it supports recurring events and provides you with a UI for your calendar, so you won't even have to fiddle with that.

Tokens and Auth

Service token implementation is a simple thing. You want them to be unique, and what else would you choose if not a UUID? There is a nice plugin for Django models used to handle UUID fields, called django-uuidfield. It just does what it says—adding UUIDField support to your models. User authentication is a bit more interesting: we currently support plain Django users, and you can use LDAP as your user provider.

Summary

This was just a short summary of the design decisions made when we coded Openduty, along with some relevant snippets that demonstrate the power of the components. If you are on a short deadline, consider using Django and its extensions. There is a good chance that somebody has already done what you need to do, or something similar, which can always be adapted to your needs thanks to the awesome power of the open source community.

About the author

Bálint Csergő is a software engineer from Budapest, currently working as an infrastructure engineer at Hortonworks. He loves Unix systems, PHP, Python, Ruby, the Oracle database, Arduino, Java, C#, music, and beer.

Memory

Packt
01 Aug 2016
26 min read
In this article by Enrique López Mañas and Diego Grancini, authors of the book Android High Performance Programming, we focus on memory. An application with badly managed memory can affect the behavior of the whole system, or it can affect the other applications installed on our device, in the same way that other applications can affect ours. As we all know, Android has a wide range of devices on the market, with a lot of different configurations and amounts of memory. It's up to developers to understand the strategy to take while dealing with this big amount of fragmentation, the patterns to follow while developing, and the tools to use to profile the code. This is the aim of this article.

In the following sections, we will focus on heap memory. We will take a look at how a device handles memory, what garbage collection is, and how it works, in order to understand how to avoid common development mistakes and to clarify what we will discuss when we define best practices. We will also go through pattern definitions in order to drastically reduce the risk of what we will identify as memory leaks and memory churn. This article ends with an overview of the official tools and APIs that Android provides to profile our code and to find possible causes of memory leaks.

Walkthrough

Before starting the discussion about how to improve and profile our code, it's really important to understand how Android devices handle memory. In the following pages, we will analyze the differences between the runtimes that Android uses, learn more about garbage collection, understand what memory leaks and memory churn are, and see how Java handles object references.

How memory works

Have you ever thought about how a restaurant works during its service? Let's think about it for a while. When a new group of customers gets into the restaurant, there's a waiter ready to find a place to seat them. But the restaurant is a limited space, so there is the need to free tables when possible. That's why, when a group has finished eating, another waiter cleans and prepares the just-freed table for other groups to come. The first waiter has to find a table with the right number of seats for every new group. The second waiter's task should be fast and shouldn't hinder or block the others' tasks. Another important aspect is how many seats are occupied by the group; the restaurant owner wants to have as many free seats as possible to place new clients. So, it's important to make sure that every group fills the right number of seats, without occupying tables that could be freed and used to seat new groups.

This is absolutely similar to what happens in an Android system. Every time we create a new object in our code, it needs to be saved in memory. So, it's allocated as part of our application's private memory, to be accessed whenever needed, and the system keeps allocating memory for us during the whole application lifetime. Nevertheless, the system has a limited amount of memory to use, and it cannot allocate memory indefinitely. So, how is it possible for the system to have enough memory for our application all the time? And why is there no need for an Android developer to free up memory? Let's find out.
Garbage collection

Garbage collection is an old concept that is based on two main steps:

- Find objects that are no longer referenced
- Free the memory referenced by those objects

When an object is no longer referenced, its "table" can be cleaned and freed up. This is what is done to provide memory for future new object allocations. These operations of allocating new objects and deallocating unreferenced objects are executed by the particular runtime in use on the device, and there is no need for the developer to do anything, because they are all managed automatically. In spite of what happens in other languages, such as C or C++, there is no need for the developer to allocate and deallocate memory. In particular, while allocation happens when needed, the garbage collection task is executed when a memory upper limit is reached. These automatic background operations don't exempt developers from being aware of their app's memory management; if memory management is done badly, the application can suffer from lags and malfunctions, and even crash when an OutOfMemoryError exception is thrown.

Shared memory

In Android, every app has its own process, which is completely managed by the runtime, with the aim of reclaiming memory to free resources for other foreground processes if needed. The available amount of memory for our application lies completely in RAM, as Android doesn't use swap memory. The main consequence of this is that there is no other way for our app to get more memory than to unreference objects that are no longer used. But Android uses paging and memory mapping: the first technique defines blocks of memory of the same size, called pages, in secondary storage, while the second one maps memory to correlated files in secondary storage. They are used when the system needs to allocate memory for other processes, so the system creates paged memory-mapped files to save Dalvik code files, app resources, or native code files. In this way, those files can be shared between multiple processes. As a matter of fact, the Android system uses shared memory in order to better handle resources from a lot of different processes. Furthermore, every new process to be created is forked from an already existing one, called Zygote. This particular process contains common framework classes and resources to speed up the first boot of an application. This means that the Zygote process is shared between processes and applications. This large use of shared memory makes it difficult to profile the memory usage of our application, because there are many facets to consider before reaching a correct analysis.

Runtime

Some functions and operations of memory management depend on the runtime used. That's why we are going through some specific features of the two main runtimes used by Android devices:

- Dalvik
- Android runtime (ART)

ART was added later to replace Dalvik and to improve performance from different points of view. It was introduced in Android KitKat (API Level 19) as an option for developers to enable, and it became the main and only runtime from Android Lollipop (API Level 21) onwards. Besides the differences between Dalvik and ART in compiling code, file formats, and internal instructions, what we are focusing on at the moment is memory management and garbage collection.
So, let's understand how the Google team improved garbage collection performance in the runtimes over time, and what to pay attention to while developing our application. Let's step back and return to the restaurant for a bit longer. What would happen if all the employees, such as the other waiters and cooks, and all of the services, such as the dishwashers, stopped their tasks to wait for just one waiter to free a table? That single employee's performance would determine the success or failure of them all. So, it's really important to have a very fast waiter in this case. But what do you do if you cannot afford him? The owner wants him to do his job as fast as possible, maximizing his productivity and seating all the customers in the best way, and this is exactly what we have to do as developers. We have to optimize memory allocations in order to have a fast garbage collection, even if it stops all the other operations.

What is described here is just how the runtime garbage collection works. When the memory upper limit is reached, the garbage collection starts its task, pausing any other method, task, thread, or process execution, and those won't resume until the garbage collection task is completed. So, it's really important that the collection is fast enough not to impede the 16 ms per frame rule, which would result in lag and jank in the UI. The more time the garbage collection takes, the less time the system has to prepare frames to be rendered on the screen.

Keep in mind that automatic garbage collection is not free; bad memory management can lead to bad UI performance and, thus, bad UX. No runtime feature can replace good memory management. That's why we need to be careful about new allocations of objects and, above all, references. Obviously, ART introduced a lot of improvements to this process after the Dalvik era, but the background concept is the same: it reduces the collection steps, it adds a particular memory area for Bitmap objects, it uses new fast algorithms, and it does other cool stuff that will get even better in the future, but there is no escaping the fact that we need to profile our code and memory usage if we want our application to have the best performance.

Android N JIT compiler

The ART runtime uses ahead-of-time compilation which, as the name suggests, performs compilation when the application is first installed. This approach brings advantages to the overall system in different ways, because the system can:

- Reduce battery consumption due to pre-compilation, and therefore improve autonomy
- Execute applications faster than Dalvik
- Improve memory management and garbage collection

However, those advantages have a cost related to installation times: the system needs to compile the application at that point, and this is slower than other types of compilers. For this reason, Google added a just-in-time (JIT) compiler alongside the ahead-of-time compiler of ART in the new Android N. This one acts when needed, during the execution of the application, and thus uses a different approach compared to the ahead-of-time one. It uses code profiling techniques, and it's not a replacement for the ahead-of-time compiler, but an addition to it. It's a good enhancement to the system for the performance advantages it introduces. The profile-guided compilation adds the possibility to precompile, and then to cache and reuse, methods of the application, depending on usage and/or device conditions.
This feature can save compilation time and improve performance in every kind of system, so all devices benefit from this new memory management. The key advantages are:

- Less memory used
- Fewer RAM accesses
- Lower impact on battery

All of these advantages introduced in Android N, however, shouldn't be an excuse to avoid good memory management in our applications. For this, we need to know what pitfalls are lurking behind our code and, more than this, how to behave in particular situations to improve the memory management of the system while our application is active.

Memory leak

The main mistake, from the memory performance perspective, that a developer can make while developing an Android application is called a memory leak, and it refers to an object that is no longer used but is referenced by another object that is, instead, still active. In this situation, the garbage collector skips it, because the reference is enough to leave that object in memory. Actually, we are preventing the garbage collector from freeing memory for other future allocations. So, our heap memory gets smaller because of this, and this leads to the garbage collection being invoked more often, blocking the rest of the application's execution. This can lead to a situation where there is no more memory to allocate a new object and, then, an OutOfMemoryError exception is thrown by the system. Consider the case where a used object references no-longer-used objects, which reference other no-longer-used objects, and so on; none of them can be collected, just because the root object is still in use.

Memory churn

Another anomaly in memory management is called memory churn, and it refers to an amount of allocation that is not sustainable by the runtime because too many new objects are instantiated in a small period of time. In this case, many garbage collection events are triggered, affecting the overall memory and UI performance of the application. The need to avoid allocations in the View.onDraw() method is closely related to memory churn; we know that this method is called every time the view needs to be drawn again, and the screen needs to be refreshed every 16.6667 ms. If we instantiate objects inside that method, we could cause memory churn, because those objects are instantiated in the View.onDraw() method and no longer used afterwards, so they are collected very soon. In some cases, this leads to one or more garbage collection events being executed every time a frame is drawn on the screen, reducing the available time to draw it below 16.6667 ms, depending on the collection event's duration.
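To make the View.onDraw() point concrete, here is a minimal hedged sketch of the allocation-free drawing pattern (the view, field, and color choices are ours, not from the book):

    import android.content.Context;
    import android.graphics.Canvas;
    import android.graphics.Color;
    import android.graphics.Paint;
    import android.util.AttributeSet;
    import android.view.View;

    // Sketch: allocate drawing objects once, never inside onDraw()
    public class BadgeView extends View {

        private final Paint circlePaint = new Paint(Paint.ANTI_ALIAS_FLAG); // created once

        public BadgeView(Context context, AttributeSet attrs) {
            super(context, attrs);
            circlePaint.setColor(Color.RED);
        }

        @Override
        protected void onDraw(Canvas canvas) {
            super.onDraw(canvas);
            // No "new" here: objects created in onDraw() become garbage almost
            // immediately and can trigger the memory churn described above.
            canvas.drawCircle(getWidth() / 2f, getHeight() / 2f, getWidth() / 4f, circlePaint);
        }
    }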
References

Let's have a quick overview of the different kinds of references that Java provides, so that we have an idea of when they can be used and how. Java defines four levels of reference strength:

- Normal: This is the main type of reference. It corresponds to the simple creation of an object, and the object will be collected when it is no longer used and referenced. It's just the classical object instantiation:

      SampleObject sampleObject = new SampleObject();

- Soft: This is a reference that is not strong enough to keep an object in memory when a garbage collection event is triggered. So, it can become null at any time during execution. Using this reference, the garbage collector decides when to free the object's memory based on the memory demand of the system. To use it, just create a SoftReference object, passing the real object as a parameter to the constructor, and call the SoftReference.get() method to get the object:

      SoftReference<SampleObject> sampleObjectSoftRef = new SoftReference<SampleObject>(new SampleObject());
      SampleObject sampleObject = sampleObjectSoftRef.get();

- Weak: This works exactly like a SoftReference, but it is weaker:

      WeakReference<SampleObject> sampleObjectWeakRef = new WeakReference<SampleObject>(new SampleObject());

- Phantom: This is the weakest reference; the object is eligible for finalization. This kind of reference is rarely used, and the PhantomReference.get() method always returns null. It is meant for reference queues, which don't interest us at the moment, but it's worth knowing that this kind of reference is also provided.

These classes may be useful while developing if we know which objects have a lower priority and can be collected without causing problems to the normal execution of our application. We will see how they can help us manage memory in the following pages.

Memory-side projects

During the development of the Android platform, Google has always tried to improve the memory management system of the platform to maintain wide compatibility with devices of increasing performance as well as low-resource ones. This is the main purpose of the projects that Google develops in parallel with the platform; every new Android version released brings new improvements and changes to those projects and their impact on system performance. Each of those side projects focuses on a different matter:

- Project Butter: Introduced in Android Jelly Bean 4.1 (API Level 16) and then improved in Android Jelly Bean 4.2 (API Level 17), it added features related to the graphical aspect of the platform (VSync and buffering are the main additions) in order to improve the responsiveness of the device while in use.
- Project Svelte: Introduced in Android KitKat 4.4 (API Level 19), it deals with memory management improvements in order to support low-RAM devices.
- Project Volta: Introduced in Android Lollipop (API Level 21), it focuses on the battery life of the device. It adds important APIs to deal with batching expensive battery-draining operations, such as the JobScheduler, and new tools such as the Battery Historian.

Project Svelte and Android N

When it was first introduced, Project Svelte reduced the memory footprint and improved memory management in order to support entry-level devices with low memory availability, and thus broadened the supported range of devices, with clear advantages for the platform. With the new release of Android N, Google wants to provide an optimized way to run applications in the background. We know that the process of our application lives on in the background even if it is not visible on the screen, and even if there are no started activities, because a service could be executing some operations. This is a key feature for memory management; the overall system performance could be affected by bad memory management of background processes. But what has changed in the application behavior and the APIs with the new Android N?
The strategy chosen to improve memory management and reduce the impact of background processes is to stop sending applications the broadcasts for the following actions:

- ConnectivityManager.CONNECTIVITY_ACTION: Starting from Android N, the connectivity action will be received only by applications that are in the foreground and that have registered a BroadcastReceiver for this action. Applications with an implicit intent declared inside the manifest file will no longer receive it. Hence, such applications need to change their logic to achieve the same behavior as before.
- Camera.ACTION_NEW_PICTURE: This one is used to notify that a picture has just been taken and added to the media store. This action will no longer be available, either for receiving or for sending, and this applies to every application, not just the ones targeting the new Android N.
- Camera.ACTION_NEW_VIDEO: This is used to notify that a video has just been taken and added to the media store. Like the previous one, this action can no longer be used, and this also applies to every application.

Keep these changes in mind when targeting the new Android N with your application, to avoid unwanted or unexpected behaviors.

All of the preceding actions have been changed by Google to force developers not to use them in applications. As a more general rule, we should not use implicit receivers for the same reason. Hence, we should always check the behavior of our application while it's in the background, because this could lead to unexpected memory usage and battery drain. Implicit receivers can start our application components, while explicit ones are set up for a limited time while the activity is in the foreground, so they cannot affect background processes. It's good practice to avoid the use of implicit broadcasts while developing applications, to reduce their impact on background operations that could otherwise lead to an unwanted waste of memory and, then, battery drain.

Furthermore, Android N introduces a new ADB command to test application behavior with background processes ignored. Use the following command to ignore background services and processes:

    adb shell cmd appops set RUN_IN_BACKGROUND ignore

Use the following one to restore the initial state:

    adb shell cmd appops set RUN_IN_BACKGROUND allow

Best practices

Now that we know what can happen in memory while our application is active, let's have a deeper look at what we can do to avoid memory leaks and memory churn, and to optimize our memory management in order to reach our performance target—not just in memory usage, but also in garbage collection attendance, because, as we know, it stops every other operation while it works. In the following pages, we will go through a lot of hints and tips, using a bottom-up strategy, starting from low-level shrewdness in Java code and moving up to higher-level Android practices.

Data types

We weren't joking; we are really talking about Java primitive types, as they are the foundation of all applications, and it's really important to know how to deal with them, even though it may seem obvious. It's not, and we will see why. Java provides primitive types that need to be saved in memory when used: the system allocates an amount of memory matching what is requested for that particular type.
The following are the Java primitive types, with the number of bits needed to allocate each type:

- byte: 8 bits
- short: 16 bits
- int: 32 bits
- long: 64 bits
- float: 32 bits
- double: 64 bits
- boolean: 8 bits, though it depends on the virtual machine
- char: 16 bits

At first glance, what is clear is that you should be careful to choose the right primitive type every time you use one. Don't use a bigger primitive type if you don't really need it; never use long, float, or double if you can represent the number with an integer data type. Otherwise, it would be a useless waste of memory and of calculations every time the CPU needs to deal with it, and remember that to calculate an expression, the system needs to perform a widening primitive implicit conversion to the largest primitive type involved in the calculation.

Autoboxing

Autoboxing is the term used to indicate the automatic conversion between a primitive type and its corresponding wrapper class object. The primitive type wrapper classes are the following:

- java.lang.Byte
- java.lang.Short
- java.lang.Integer
- java.lang.Long
- java.lang.Float
- java.lang.Double
- java.lang.Boolean
- java.lang.Character

They can be instantiated using the assignment operator, as with primitive types, and they can be used like their primitive types:

    Integer i = 0;

This is exactly the same as the following:

    Integer i = new Integer(0);

But the use of autoboxing is not the way to improve the performance of our applications; it has many costs. First of all, the wrapper object is much bigger than the corresponding primitive type. For instance, an Integer object needs 16 bytes in memory instead of the 32 bits of the primitive int. Hence, more memory is used to handle it. Then, when we declare a variable using the primitive wrapper object, any operation on it implies at least one more object allocation. Take a look at the following snippet:

    Integer integer = 0;
    integer++;

Every Java developer knows what this does, but this simple code needs a step-by-step explanation of what happens.

First of all, the integer value is taken from the Integer object integer and 1 is added to it:

    int temp = integer.intValue() + 1;

Then the result is assigned to integer, but this means that a new autoboxing operation needs to be executed:

    integer = temp;

Undoubtedly, these operations are slower than if we had used the primitive type instead of the wrapper class: no autoboxing is needed, hence no extra allocations. Things get worse in loops, where the mentioned operations are repeated on every cycle; take, for example, the following code:

    Integer sum = 0;
    for (int i = 0; i < 500; i++) {
        sum += i;
    }

In this case, there are a lot of inappropriate allocations caused by autoboxing, and if we compare this with the primitive type for loop, we notice that there are no allocations:

    int sum = 0;
    for (int i = 0; i < 500; i++) {
        sum += i;
    }

Autoboxing should be avoided as much as possible. The more we use primitive wrapper classes instead of the primitive types themselves, the more memory is wasted while executing our application, and this waste is amplified when autoboxing occurs inside loop cycles, affecting not just memory, but CPU timings too.

Sparse array family

So, in all of the cases described in the previous paragraph, we can just use the primitive type instead of its object counterpart. Nevertheless, it's not always so simple. What happens if we are dealing with generics?
For example, let's think about collections; we cannot use a primitive type as the generic type of an object that implements one of the collection interfaces. We have to use the wrapper classes this way:

    List<Integer> list;
    Map<Integer, Object> map;
    Set<Integer> set;

Every time we use one of the Integer objects of a collection, autoboxing occurs at least once, producing the waste outlined above, and we know well how many times we deal with this kind of object in everyday development. But isn't there a solution to avoid autoboxing in these situations? Android provides a useful family of objects created on purpose to replace Map objects and avoid autoboxing, protecting memory from pointlessly large allocations: the Sparse arrays.

The list of Sparse arrays, with the type of Map each can replace, is the following:

- SparseBooleanArray: HashMap<Integer, Boolean>
- SparseLongArray: HashMap<Integer, Long>
- SparseIntArray: HashMap<Integer, Integer>
- SparseArray<E>: HashMap<Integer, E>
- LongSparseArray<E>: HashMap<Long, E>

In the following, we will talk about the SparseArray object specifically, but everything we say holds for all the other objects above as well.

The SparseArray uses two different arrays to store hashes and objects. The first one collects the sorted hashes, while the second one stores the key/value pairs ordered according to the sorting of the key hashes array (Figure 1: SparseArray's hashes structure).

When you need to add a value, you have to specify the integer key and the value to be added in the SparseArray.put() method, just like in the HashMap case. This can create collisions if multiple key hashes are added in the same position. When a value is needed, simply call SparseArray.get(), specifying the related key; internally, the key object is used to binary search the index of the hash, and then the value of the related key (Figure 2: SparseArray's workflow). When the key found at the index resulting from the binary search does not match the original one, a collision has happened, so the search keeps going in both directions to find the same key and to provide the value if it's still inside the array. Thus, the time needed to find a value increases significantly when the array contains a large number of objects.

By contrast, a HashMap contains just a single array to store hashes, keys, and values, and it uses oversized arrays as a technique to avoid collisions. This is not good for memory, because it allocates more memory than is really needed. So HashMap is fast, because it implements a better way to avoid collisions, but it's not memory efficient. Conversely, SparseArray is memory efficient because it uses the right number of object allocations, with an acceptable increase in execution time.

The memory used for these arrays is contiguous, so every time you remove a key/value pair from a SparseArray, they can be compacted or resized (a quick usage sketch follows this list):

- Compaction: The object to remove is shifted to the end and all the other objects are shifted left. The last block, containing the item to be removed, can be reused for future additions to save allocations.
- Resize: All the elements of the arrays are copied to new arrays and the old ones are deleted. On the other hand, the addition of new elements produces the same effect of copying all elements into new arrays. This is the slowest method, but it's completely memory safe because there are no useless memory allocations.
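As a quick, hedged illustration of the API difference discussed above (a sketch with arbitrary keys and values; SparseIntArray lives in android.util):

    // Assumes: import android.util.SparseIntArray; import java.util.HashMap;

    // Boxed keys and values: every put()/get() autoboxes the int key and value
    HashMap<Integer, Integer> boxedCounts = new HashMap<Integer, Integer>();
    boxedCounts.put(42, 1);
    int fromMap = boxedCounts.get(42);       // unboxing on the way out

    // Primitive int keys and values: no wrapper objects are created
    SparseIntArray counts = new SparseIntArray();
    counts.put(42, 1);                       // keys are kept sorted and binary-searched
    int fromSparse = counts.get(42, 0);      // the second argument is a default value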
In general, HashMap is faster at these operations because it contains more blocks than it really needs—hence the memory waste. Whether to use the SparseArray family depends on the memory management strategy and CPU performance patterns being applied, because the calculation cost has to be weighed against the memory saving. Its use is right in some situations. Consider using it when:

- The number of objects you are dealing with is below a thousand, and you are not going to do a lot of additions and deletions.
- You are using collections of Maps with a few items, but lots of iterations.

Another useful feature of these objects is that they let you iterate by index, instead of using the iterator pattern, which is slower and memory inefficient. The following snippet shows how the iteration doesn't involve extra objects:

    // SparseArray
    for (int i = 0; i < map.size(); i++) {
        Object value = map.get(map.keyAt(i));
    }

Conversely, an Iterator object is needed to iterate through a HashMap:

    // HashMap
    for (Iterator iter = map.keySet().iterator(); iter.hasNext(); ) {
        Object value = map.get(iter.next());
    }

Some developers think the HashMap object is the better choice because it can be exported from an Android application to other Java applications, while the SparseArray family objects cannot. But what we have analyzed here as a memory management gain is applicable to any other case. And, as developers, we should strive to reach our performance goals on every platform, instead of reusing the same code on different platforms, because different platforms can be affected differently from a memory perspective. That's why our main suggestion is to always profile the code on every platform we are working on, and then draw our own conclusions about better or worse approaches depending on the results.

ArrayMap

An ArrayMap object is an Android implementation of the Map interface that is more memory efficient than the HashMap one. This class is provided by the Android platform starting from Android KitKat (API Level 19), but there is another implementation of it inside the Support Package v4, because of its main usage on older and lower-end devices. Its implementation and usage are totally similar to the SparseArray objects, with all the implications about memory usage and computational costs, but its main purpose is to let you use objects as the keys of the map, just like the HashMap does. Hence, it provides the best of both worlds.

Summary

We defined a lot of best practices to help maintain good memory management, introducing helpful design patterns and analyzing the best choices while developing things we take for granted that can actually affect memory and performance. Then, we faced the main causes of the worst leaks in the Android platform, those related to main components such as Activities and Services. To conclude the practices, we introduced APIs both to use and not to use, and then others able to define a strategy for events related to the system and, thus, external to the application.

Resources for Article:

Further resources on this subject:

- Hacking Android Apps Using the Xposed Framework
- Speeding up Gradle builds for Android
- Get your Apps Ready for Android N

WebRTC in FreeSWITCH

Packt
25 Jul 2016
16 min read
In this article by Anthony Minessale and Giovanni Maruzzelli, authors of Mastering FreeSWITCH, we will cover the following topics:

- What WebRTC is and how it works
- Encryption and NAT traversal (STUN, TURN, and so on)
- Signaling and media
- Interconnection with PSTN and SIP networks
- FreeSWITCH as a WebRTC server, gateway, and application server
- SIP signaling clients with JavaScript (SIP.js)
- Verto signaling clients with JavaScript (mod_verto, verto.js)

WebRTC

Finally something new! How refreshing it is to be learning and experimenting again, especially if you're an old hand! After at least ten years of linear evolution, here we are with a quantum leap, the black swan that truly disrupts the communication sector.

Browsers are already out there, waiting

With an installed base of hundreds of millions, and soon to be in the billions ballpark, browsers (both on PCs and on smartphones) are now complete communication terminals, audio/video endpoints that do not need any additional software, plugins, or hardware. Browsers now incorporate, by default and in a standard way, all the software needed to interact with loudspeakers, microphones, headsets, cameras, screens, and so on. Browsers are the new endpoints, the CPEs, the phones. They have an API, they're updated automatically, and they are compatible with your system. You don't have to procure, configure, support, or upgrade them. They're ready for your new service; they just work, and are waiting for your business.

Web Real-Time Communication is coming

There are two completely separate flows in communication: signaling and media. Signaling is a flow of information that defines who is calling whom, taking what paths, and which technology is used to transmit which content. Media is the actual digitized content of the communication, for example, audio, video, or screen sharing. Media and signaling often take completely unrelated paths to go from caller to callee; for example, their IP packets traverse different gateways and routers. Also, the two flows are managed by separate software (or by different parts of the same application) using different protocols.

WebRTC defines how a browser accesses its own media capture, how it sends and receives media from a peer through the network, and how it renders the media stream that it receives. It represents this using the same Session Description Protocol (SDP) as SIP does. So, WebRTC is all about media, and doesn't prescribe a signaling system. This is a design decision, embedded in the standard definition. Popular signaling systems include SIP, XMPP, and proprietary or custom protocols.

Also, WebRTC is all about encryption. All WebRTC media streams are mandatorily encrypted. Chrome, Firefox, and Opera (together they account for more than 70 percent of the browsers in use) already implement the standard; Edge is announcing the first steps in supporting WebRTC basic features, while only Safari is still holding its cards (Skype and FaceTime on WebRTC with proprietary signaling? Wink wink).
Under the hood

More or less, WebRTC works like this:

- The browser connects to a web server and loads a webpage with some JavaScript in it.
- The JavaScript in the webpage takes control of the browser's media interfaces (microphone, camera, speakers, and so on), resulting in an API media object.
- The WebRTC API media object will contain the capabilities of all the devices and codecs available (for example, definition, sample rate, and so on), and it will let the user choose capability preferences (for example, use QVGA video to minimize CPU and bandwidth).
- The webpage interfaces with the browser's user, getting input for signing in to the web server's communication service (if any).
- The JavaScript uses whatever signaling method (SIP, XMPP, proprietary, custom) over encrypted secure websockets (wss://) for signing in to the communication service, finding peers, and originating and receiving calls.
- Once signed in to the service, a call can be made and received. Signaling will give the protocol address of the peer (for example, sip:gmaruzz@opentelecom.it).
- Now is the moment to find out actual IP addresses. The JavaScript will generate a WebRTC API object for finding its own IP addresses, transports, and ports (ICE candidates) to be offered to the peer for exchanging media (the JavaScript WebRTC API will use ICE, STUN, and TURN, and will send the peer its own local LAN address, its own public IP address, and maybe the IP address of a TURN server it can use).
- Then, the WebRTC Net API will exchange ICE candidates with the peer, until they both find the most "rational" triplet of IP address, port, and transport (UDP, DTLS, and so on) for each stream (for example, audio, video, screen share, and so on).
- Once they get the best addresses, the signaling will establish the call.
- Once signaling communication with the peer is established, media capabilities are exchanged in SDP format (exactly as in SIP), and the two peers agree on media formats (sample rates, codecs, and so on).
- When the media formats are agreed, the JavaScript WebRTC transport layer will use encrypted transports (SRTP, with keys exchanged via DTLS) for media and data.
- The JavaScript WebRTC Media API will be used to render the media streams received (for example, render video, play sound, capture the microphone, and so on).
- Additionally, or as an alternative to media, the peers can establish one or more data channels, through which they bidirectionally exchange raw or structured data (file transfers, augmented reality, stock tickers, and so on).
- At hangup, signaling tears down the call, and the JavaScript WebRTC Media API is used to shut down streams and rendering.

This is a high-level, but complete, view of how a WebRTC system works.
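To ground the walkthrough above in code, here is a minimal browser-side sketch of the media half only (capture, peer connection, ICE candidates, and offer). The signaling transport is deliberately left as a stub, since, as noted, WebRTC does not prescribe one; sendToPeer() and the STUN server URL are placeholders, not part of any real deployment:

    // Media half of a WebRTC call; sendToPeer() stands in for whatever
    // signaling channel (SIP, XMPP, Verto, custom) carries SDP and candidates.
    const pc = new RTCPeerConnection({
      iceServers: [{ urls: 'stun:stun.example.org:3478' }]   // example STUN server
    });

    // Gathered ICE candidates are handed to the signaling layer
    pc.onicecandidate = (event) => {
      if (event.candidate) sendToPeer({ type: 'candidate', candidate: event.candidate });
    };

    // Render whatever media the remote peer sends us
    pc.ontrack = (event) => {
      document.querySelector('#remoteVideo').srcObject = event.streams[0];
    };

    navigator.mediaDevices.getUserMedia({ audio: true, video: true })
      .then((stream) => {
        stream.getTracks().forEach((track) => pc.addTrack(track, stream)); // local capture
        return pc.createOffer();                                           // SDP offer
      })
      .then((offer) => pc.setLocalDescription(offer))
      .then(() => sendToPeer({ type: 'offer', sdp: pc.localDescription }));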
On the contrary, it is not possible to have the media or data streams to leave the browser in the clear, without encryption. The use of plain RTP to transmit media is explicitly forbidden by the standard. Media is transmitted by SRTP (Secure RTP), where encryption keys are pre-exchanged via DTLS (Datagram Transport Layer Security, a version of TLS for Datagrams), basically a secure version of UDP. Beyond peer to peer – WebRTC to communication networks and services WebRTC is a technique for browsers to send media to each other via Internet, peer to peer, perhaps with the help of a relay server (TURN), if they can't reach each other directly. That's it. No directories, no means to find another person, and also no way to "call" that person if we know "where" to call her. No way to transfer calls, to react to a busy user or to a user that does not pickup, and so on. Let's say WebRTC is a half-built phone: It has the handset, complete with working microphone and speaker, from which it comes out, the wiring left loose. You can cross join that wiring with the wiring of another half-built phone, and they can talk to each other. Then, if you want to talk to another device, you must find it and then join the wires anew. No dial pad, no Telecom Central Office, no interconnection between Local Carriers, and with International Carriers. No PBX. No way to call your grandma, and no possibilities to navigate the IVR at Federal Express' Customer Care. We need to integrate the media capabilities and the ubiquity of WebRTC with the world of telecommunication services that constitute the planet's nervous system. Enter the "WebRTC Gateway" and the "WebRTC Application Server"; in our case both are embodied by FreeSWITCH WebRTC gateways and application servers The problem to be solved is: We can implement some kind of signaling plane, even implement a complete SIP signaling stack in JavaScript (there are some very good ones in open source, we'll see later), but then both at the network and at the media plane, WebRTC is only "kind of" compatible with the existing telecommunication world; it uses techniques and concepts that are "similar", and protocols that are mostly an "evolution " of those implemented in usual Voice over IP. At the network plane, WebRTC uses ICE protocol to traverse NAT via STUN and TURN servers. ICE has been developed as Internet standard to be the ultimate tool to solve all NAT problems, but has not yet been implemented in either telco infrastructure, nor in most VoIP clients. Also, ICE candidates (the various different addresses the browser thinks they would be reachable at) need to be passed in SDP and negotiated between peers, in the same way codecs are negotiated. Being able to pass through corporate firewalls (UDP blocked, TCP open only on ports 80 and 443, and perhaps through protocol-aware proxies) is an absolute necessity for serious WebRTC deployment. At media plane, WebRTC specific codecs (V8 for video and Opus for audio) are incompatible with the telco world, with audio G711 as the only common denominator. Worst yet, all media are encrypted as SRTP with DTLS key exchange, and that's unheard of in today's telco infrastructure. 
So, we need to create the signaling plane, and then convert the network transport, convert the codecs, manage the ICE candidates selection in SDP, and allow access to the wealth of ready-made services (PSTN calls, IVRs, PBXs, conference rooms, etc), and then complement the legacy services with special features and new interconnected services enabled by the unique capabilities of WebRTC endpoints. Yeah, that's a job for FreeSWITCH. Which architecture? Legacy on the Web, or Web on the Telco? Real-time communication via the Web: From the building blocks we just saw, we can implement it in many ways. We have one degree of freedom: Signaling. I mean, media will be anyway agreed about via SDP, transmitted via websockets as SRTP packets, and encrypted via DTLS key exchange. We still have the task to choose how we will find the peer to exchange media with. So, this is an exercise in directory, location, registration, routing, presence, status, etc. You get the idea. So, at the end of the day you need to come out with a JavaScript library to implement your signaling on the browsers, commanding their underlying mechanisms (Comet, Websockets, WebRTC Data Channel) to find your beloved communication peer. Actually it boils down to different possibilities: SIP XMPP (eg: jabber) In-house signaling implementation VERTO (open source) SIP and XMPP make today's world spin around. SIP is mostly known for carrying the majority of telephone and VoIP signaling traffic. The biggest implementations of instant messaging and chatting are based on XMPP. And there is more: Those two signaling protocols are often used together, although each one of them has extensions that provide the other one's functionality. Both SIP and XMPP have been designed to be expandable and modular, and SIP particularly is an abstract protocol, for the management of "sessions" (where a "session" can be whatever has a beginning and an end in time, as a voice or video call, a screen share, a whiteboard, a collaboration platform, a payment, a message, and so on). Both have robust JavaScript implementations available (for SIP check SIP.js, JsSIP, SIPML, while for XMPP check Strophe, stanza.io, jingle.js). If your company has considerable investments and/or expertise in those protocols, then it makes sense to expand their usage on the web too. If you're running Skype, or similar services, you may find it an attractive option to maintain your proprietary, closed-signaling protocol and implement it in JavaScript, so you can expand your service reach to browsers and exploit that common transport and media technologies. VERTO is our open source signaling proposal, designed from the ground up to be familiar to Web application developers, and allowing for a high degree of integration between FreeSWITCH-provided services and browsers. It is implemented on the FreeSWITCH side by a module (mod_verto) that talks JSON with the JavaScript library (verto.js) on the browser side. FreeSWITCH accommodates them ALL FreeSWITCH implements all of WebRTC low-level protocols, codecs and requirements. It's got encryption, SRTP, DTLS, RTP, websocket and secure websocket transports (ws:// and wss://). Having got it all, it is able to serve SIP endpoints over WebRTC via mod_sofia (they'll be just other SIP phones, exactly like the rest of soft and hard SIP phones), and it interacts with XMPP via mod_jingle. Crucially, FreeSWITCH has been designed since its inception to be able to manage and message high-definition media, both audio and video. 
Support for OPUS audio codec (8 up to 48 khz, enough for actual audio-cd quality) started years ago as a pioneering feature, and has evolved over the years to be so robust and self-healing as to sustain a loss of more than 40% (yep, as in FORTY PERCENT) packets and maintain understandability. WebRTC's V8 video codec is routinely carrying our mixed video conferences in FullHD (as in 1920x1080 pixel), and we're looking forward to investing in fiber and in some facial cream to look good in 4K. That's why FreeSWITCH can be the pivot of your next big WebRTC project: its architecture was designed from the start to be a multimedia powerhouse. There is lot of experience out there using FreeSWITCH in expanding the reach of existing SIP services having the browsers acting as SIP phones via JavaScript libraries, without modifying in any way the service logic and implementation. You just add SIP extensions that happen to be browsers. For the remainder of this article we'll write about VERTO, a FreeSWITCH proposal especially dedicated to Web development. What is Verto (module and jslib)? Verto is a FreeSWITCH module (mod_verto) that allows for JSON interaction with FreeSWITCH, via secure websockets (wss). All the power and complexity of FreeSWITCH can be harnessed via Verto: Session management, call control, text messaging, and user data exchange and synchronization. Take a note for yourself: "User data exchange and synchronization". We'll be back to this later. Verto is like Event Socket Layer (ESL) on steroids: Anything you can do in ESL (subscribe, send and receive messages in FS core message pumps/queues) you can do in Verto, but Verto is actually much more and can do much more. Verto is also made for high-level control of WebRTC! Verto has an accompanying JavaScript library, verto.js. Using verto.js a web developer can videoconference and enable a website and/or add a collaboration platform to a CRM system in few lines of code. And in a few lines of a code that he understands, in a logic that's familiar to web developers, without forcing references to foreign knowledge domains like SIP. Also, Verto allows for the simplest way to extend your existing SIP services to WebRTC browsers. The added benefit of "user data exchange and synchronization" (see, I'm back to it) is not to be taken lightly: You can create data structures (for example, in JSON) and have them synchronized on server and all clients, with each modification made by the client or server to be automatically, immediately and transparently reflected on all other clients. Imagine a dynamic list of conference participants, or a chat, or a stock ticker, or a multiuser ping pong game, and so on. Configure mod_verto Mod_verto is installed by default by standard FreeSWITCH implementation. Let's have a look at its configuration file, verto.conf.xml. The most important parameter here, and the only one I had to modify from the stock configuration file, is ext-rtp-ip. If your server is behind a NAT (that is, it sits on a private network and exchanges packets with the public internet via some sort of port forwarding by a router or firewall), you must set this parameter to the public IP address the clients are reaching for. Other very important parameters are the codec strings. Those two parameters determine the absolute string that will be used in SDP media negotiation. The list in the string will represent all the media formats to be proposed and accepted. 
WebRTC has mandatory (so, assured) support for vp8 video codec, while mandatory audio codecs are opus and pcmu/pcma (eg, g711). Pcmu and pcma are much less CPU hungry than opus. So, if you are willing to set for less quality (g711 is "old PSTN" audio quality), you can use "pcmu,pcma,vp8" as your strings, and have both clients and server use far less CPU power for audio processing. This can make a real difference and very much sense in certain setups, for example, if you must cope with low-power devices. Also, if you route/bridge calls to/from PSTN, they will have no use for opus high definition audio; much better to directly offer the original g711 stream than decode/recode it in opus. Test with Communicator Once configured, you want to test your mod_verto install. What better moment than now to get to know the awesomeness of Verto Communicator, a JavaScript videoconference and collaboration advanced client, developed by Italo Rossi, Jonatas Oliveira and Stefan Yohansson from Brazil, Joao Mesquita from Argentina, and our core devs Ken Rice and Brian West from Tennessee and Oklahoma? If it's not already done, copy Verto Communicator distribution directory (/usr/src/freeswitch.git/html5/verto/verto_communicator/dist/) into a directory served by your web server in SSL (be sure you got all the SSL certificates right). To see it in all its splendor, be sure to call from two different clients, one as simple participant, the other as moderator, and you'll be presented with controls to manage the conference layout, for giving floor, for screen sharing, for creating banners with name and title for each participant, for real-time chatting, and much more. It is simply astonishing what can be done with JavaScript and mod_verto. Summary In this article we delved in WerbRTC design, what infrastructure it requires, in what is similar and in what is different from known VoIP. We understood that WebRTC is only about media, and leave the signaling to the implementor. Also, we get the specific of WebRTC, its way to traverse NAT, its omnipresent encryption, its peer to peer nature. We witnessed going beyond peer to peer, connecting with the telecommunication world of services needs gateways that do transport, protocol and media translations. FreeSWITCH is the perfect fit as WebRTC server, WebRTC gateway, and also as application server. And then we saw how to implement Verto, a signaling born on WebRTC, a JSON web protocol designed to exploit the additional features of WerbRTC and of FreeSWITCH, like real time data structure synchronization, session rehydration, event systems, and so on. Resources for Article: Further resources on this subject: Configuring FreeSWITCH for WebRTC [article] Architecture of FreeSWITCH [article] FreeSWITCH 1.0.6: SIP and the User Directory [article]


Visualizing Time Spent Typing in Slack

Bradley Cicenas
25 Jul 2016
4 min read
Slack's massive popularity as a team messaging platform has brought up some age-old questions about productivity in the workplace. Does ease of communication really enable us to get more done day-to-day? Or is it just another distraction in the sea of our notification panel? Using the Slack RTM (Real-Time Messaging) API, we can follow just how much of our day we spend collaborating, making business-critical decisions, and sharing cat GIFs.

A word on the Real-Time Messaging API

Much of Slack's success can be attributed to the plethora of bots, integrations, and apps available for the platform. While many are built on the robust Web API, the Real-Time Messaging API provides a stream comprised of over 65 different events as they happen, making it an ideal choice for analyzing your own messaging habits. Event types include file uploads, emoji usage, user status, joining and leaving a channel, and many more. Since it's difficult to gauge how long we spend reading or thinking about conversations in Slack, we'll use a metric we do know with a bit of certainty: time spent typing. Fortunately, this is also a specific event type broadcast from the RTM API: user_typing. Unlike most web APIs, connections to the RTM API are made over a persistent websocket. We'll use the SlackSocket Python library to listen in on events as they come in.

Recording events

To start, we'll need to gather and record event data across a period of time. Creating a SlackSocket object filtered by event type is fairly straightforward:

from slacksocket import SlackSocket

slack = SlackSocket('<slack-token>', event_filters=['user_typing'])

Since we're only concerned with following a single type of event, an event_filter is added so that we won't have to read and filter every incoming message in our code. According to the documentation, a user_typing event is sent "on every key press in the chat input unless one has been sent in the last three seconds". For the sake of our analysis, we'll assume that each of these events accounts for three seconds of a user's time.

from datetime import datetime

for event in slack.events():
    now = datetime.now().timestamp()  # the current epoch timestamp
    with open('typing.csv', 'a') as of:
        of.write('%s,%s\n' % (now, event.event['user']))

Our typing will be logged in CSV format with a timestamp and the corresponding user that triggered the event.

Plotting with matplotlib

After we've collected a sufficient amount of data (a day in this case) on our typing events, we can plot it out in a separate script using matplotlib. We'll read in all of the data, filtering for our user:

from datetime import datetime
import matplotlib.pyplot as plt

with open('typing.csv') as of:
    data = [l.strip('\n').split(',') for l in of.readlines()]

x = []
y = []
for ts, user in data:
    if user == 'bradley':
        x.append(datetime.fromtimestamp(float(ts)))  # convert epoch timestamp to datetime object
        y.append(3)  # seconds of typing

Epoch timestamps are converted back into datetime objects to ensure that matplotlib can display them correctly along the x-axis. Create the plot and export it as a PNG:

plt.plot(x, y)
plt.gcf().autofmt_xdate()  # make the x-labels nicer for timestamps
plt.savefig('typing.png')

Results: Not a particularly eventful morning (at least until I'd had my coffee), but enough to infer that I'm rarely spending more than five minutes an hour here in active discussion. Another data point missing from our observation is the number of messages in comparison to the time spent typing.
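Before tackling that missing data point, the typing.csv file we already have can put a rough number on the "five minutes an hour" estimate without any plotting. The following is a minimal sketch that is not part of the original walkthrough; it assumes the epoch-timestamp,user format written by the recorder above and the same three-seconds-per-event approximation:

from collections import Counter
from datetime import datetime

SECONDS_PER_EVENT = 3  # same assumption the plotting script makes

per_hour = Counter()
with open('typing.csv') as f:
    for line in f:
        ts, user = line.strip('\n').split(',')
        if user != 'bradley':  # substitute your own username here
            continue
        # bucket each event into the hour it occurred in
        hour = datetime.fromtimestamp(float(ts)).strftime('%Y-%m-%d %H:00')
        per_hour[hour] += SECONDS_PER_EVENT

for hour, seconds in sorted(per_hour.items()):
    print('%s  %.1f minutes typing' % (hour, seconds / 60.0))

Anything much beyond a few minutes in a given hour would point to a noticeably chattier stretch than the morning plotted above.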
If a message was rewritten or partially written and retracted, this could account for quite a bit of typing time without producing much in terms of message content. A playground for analytics There's quite a bit of fun and insight to be had watching just this single user_typing event. Likewise, tracking any number of the 65+ other events broadcast by Slack’s RTM API works well to create an interesting and multi-layered dataset ripe for analysis. The code for SlackSocket is available on GitHub and, as always, we welcome any contributions or feature requests! About the author Bradley Cicenas is a New York City-based infrastructure engineer with an affinity for microservices, systems design, data science, and stoops.


Detecting and Protecting against Your Enemies

Packt
22 Jul 2016
9 min read
In this article by Matthew Poole, the author of the book Raspberry Pi for Secret Agents - Third Edition, we will discuss how Raspberry Pi has lots of ways of connecting things to it, such as plugging things into the USB ports, connecting devices to the onboard camera and display ports and to the various interfaces that make up the GPIO (General Purpose Input/Output) connector. As part of our detection and protection regime we'll be focusing mainly on connecting things to the GPIO connector. (For more resources related to this topic, see here.) Build a laser trip wire You may have seen Wallace and Grommet's short film, The Wrong Trousers, where the penguin uses a contraption to control Wallace in his sleep, making him break into a museum to steal the big shiny diamond. The diamond is surrounded by laser beams but when one of the beams is broken the alarms go off and the diamond is protected with a cage! In this project, I'm going to show you how to set up a laser beam and have our Raspberry Pi alert us when the beam is broken—aka a laser trip wire. For this we're going to need to use a Waveshare Laser Sensor module (www.waveshare.com), which is readily available to buy on Amazon for around £10 / $15. The module comes complete with jumper wires, that allows us to easily connect it to the GPIO connector in the Pi: The Waveshare laser sensor module contains both the transmitter and receiver How it works The module contains both a laser transmitter and receiver. The laser beam is transmitted from the gold tube on the module at a particular modulating frequency. The beam will then be reflected off a surface such as a wall or skirting board and picked up by the light sensor lens at the top of the module. The receiver will only detect light that is modulated at the same frequency as the laser beam, and so does not get affected by visible light. This particular module works best when the reflective surface is between 80 and 120 cm away from the laser transmitter. When the beam is interrupted and prevented from reflecting back to the receiver this is detected and the data pin will be triggered. A script monitoring the data pin on the Pi will then do something when it detects this trigger. Important: Don't ever look directly into the laser beam as will hurt your eyes and may irreversibly damage them. Make sure the unit is facing away from you when you wire it up. Wiring it up This particular device runs from a power supply of between 2.5 V and 5.0 V. Since our GPIO inputs require 3.3 V maximum when a high level is input, we will use the 3.3 V supply from our Raspberry Pi to power the device: Wiring diagram for the laser sensor module Connect the included 3-hole connector to the three pins at the bottom of the laser module with the red wire on the left (the pin marked VCC). Referring to the earlier GPIO pin-out diagram, connect the yellow wire to pin 11 of the GPIO connector (labeled D0/GPIO 17). Connect the black wire to pin 6 of the GPIO connector (labeled GND/0V) Connect the red wire to pin 1 of the GPIO connector (3.3 V). The module should now come alive. The red LED on the left of the module will come on if the beam is interrupted. This is what it should look like in real-life: The laser module connected to the Raspberry Pi Writing the detection script Now that we have connected the laser sensor module to our Raspberry Pi, we need to write a little script that will detect when the beam has been broken. 
In this project we've connected our sensor output to D0, which is GPIO17 (refer to the earlier GPIO pin-out diagram). We need to create file access for the pin by entering the command:

pi@raspberrypi ~ $ sudo echo 17 > /sys/class/gpio/export

And now set its direction to "in":

pi@raspberrypi ~ $ sudo echo in > /sys/class/gpio/gpio17/direction

We're now ready to read its value, and we can do this with the following command:

pi@raspberrypi ~ $ sudo cat /sys/class/gpio/gpio17/value

You'll notice that it will have returned "1" (digital high state) if the beam reflection is detected, or a "0" (digital low state) if the beam is interrupted. We can create a script to poll for the beam state:

#!/bin/bash
sudo echo 17 > /sys/class/gpio/export
sudo echo in > /sys/class/gpio/gpio17/direction
# loop forever
while true
do
    # read the beam state
    BEAM=$(sudo cat /sys/class/gpio/gpio17/value)
    if [ $BEAM == 1 ]; then
        # beam not blocked
        echo "OK"
    else
        # beam was broken
        echo "ALERT"
    fi
done

Code listing for beam-sensor.sh

When you run the script you should see OK scroll up the screen. Now interrupt the beam using your hand and you should see ALERT scroll up the console screen until you remove your hand. Don't forget that once we've finished with the GPIO port, it's tidy to remove its file access:

pi@raspberrypi ~ $ sudo echo 17 > /sys/class/gpio/unexport

We've now seen how to easily read a GPIO input. The same wiring principle and script can be used to read other sensors, such as motion detectors or anything else that has an on and off state, and act upon their status.

Protecting an entire area

Our laser trip wire is great for being able to detect when someone walks through a doorway or down a corridor, but what if we wanted to know if people are in a particular area or a whole room? Well, we can with a basic motion sensor, otherwise known as a passive infrared (PIR) detector. These detectors come in a variety of types, and you may have seen them lurking in the corners of rooms, but fundamentally they all work the same way: by detecting the presence of body heat in relation to the background temperature within a certain area. They are commonly used to trigger alarm systems when somebody (or something, such as the pet cat) has entered a room. For the covert surveillance of our private zone we're going to use a small Parallax PIR Sensor, available from many online Pi-friendly stores such as ModMyPi, Robot Shop or Adafruit for less than £10 / $15. This little device will detect the presence of enemies within a 10 meter range of it. If you can't obtain one of these types then there are other types that will work just as well, but the wiring might be different to that explained in this project.

Parallax passive infrared motion sensor

Wiring it up

As with our laser sensor module, this device also just needs three wires to connect it to the Raspberry Pi. However, they are connected differently on the sensor, as shown below:

Wiring diagram for the Parallax PIR motion sensor module

Referring to the earlier GPIO pin-out diagram, connect the yellow wire to pin 11 of the GPIO connector (labelled D0/GPIO 17), with the other end connecting to the OUT pin on the PIR module.
Connect the black wire to pin 6 of the GPIO connector (labelled GND/0V), with the other end connecting to the GND pin on the PIR module.
Connect the red wire to pin 1 of the GPIO connector (3.3 V), with the other end connecting to the VCC pin on the module.
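With the PIR module using exactly the same data line as the laser module (D0 to GPIO17), this is a convenient point for a short aside: the polling loop does not have to be written in bash. The following is a minimal Python sketch, not taken from the book, that reads the same /sys/class/gpio/gpio17/value file. It assumes the export and direction commands shown earlier have been run for this pin and that the script is run with sudo (or with the GPIO file permissions adjusted), and it works for either sensor; only the meaning of the 1 and 0 states differs between the two:

#!/usr/bin/env python
# gpio17-poll.py - a sketch equivalent to the bash polling scripts
# Prerequisites (run as root beforehand):
#   echo 17 > /sys/class/gpio/export
#   echo in > /sys/class/gpio/gpio17/direction
import time

VALUE_FILE = '/sys/class/gpio/gpio17/value'

def read_state():
    # returns '1' or '0' as reported by the kernel's GPIO sysfs interface
    with open(VALUE_FILE) as f:
        return f.read().strip()

while True:
    state = read_state()
    # Laser module: '1' means the beam is intact, '0' means it was broken.
    # PIR module:   '0' means no motion, '1' means motion was detected.
    print('GPIO17 state: %s' % state)
    time.sleep(0.1)  # small delay so we don't hammer the CPU

From here it is a small step to replace the print call with whatever action you need, such as logging a timestamp or sounding an alert, using the state interpretations given in the bash scripts.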
The module should now come alive, and you'll notice the light switching on and off as it detects your movement around it. This is what it should look like for real:

PIR motion sensor connected to Raspberry Pi

Implementing the detection script

The detection script for the PIR motion sensor is similar to the one we created for the laser sensor module in the previous section. Once again, we've connected our sensor output to D0, which is GPIO17. We create file access for the pin by entering the command:

pi@raspberrypi ~ $ sudo echo 17 > /sys/class/gpio/export

And now set its direction to "in":

pi@raspberrypi ~ $ sudo echo in > /sys/class/gpio/gpio17/direction

We're now ready to read its value, and we can do this with the following command:

pi@raspberrypi ~ $ sudo cat /sys/class/gpio/gpio17/value

You'll notice that this time the PIR module will have returned "1" (digital high state) if motion is detected, or a "0" (digital low state) if there is no motion detected. We can modify our previous script to poll for the motion-detected state:

#!/bin/bash
sudo echo 17 > /sys/class/gpio/export
sudo echo in > /sys/class/gpio/gpio17/direction
# loop forever
while true
do
    # read the sensor state
    BEAM=$(sudo cat /sys/class/gpio/gpio17/value)
    if [ $BEAM == 0 ]; then
        # no motion detected
        echo "OK"
    else
        # motion was detected
        echo "INTRUDER!"
    fi
done

Code listing for motion-sensor.sh

When you run the script you should see OK scroll up the screen if everything is nice and still. Now move in front of the PIR's detection area and you should see INTRUDER! scroll up the console screen until you are still again. Again, don't forget that once we've finished with the GPIO port we should remove its file access:

pi@raspberrypi ~ $ sudo echo 17 > /sys/class/gpio/unexport

Summary

In this article we took a guided tour of the Raspberry Pi's GPIO connector and how to safely connect peripherals to it, connecting a laser sensor module to our Pi to create a rather cool laser trip wire that can alert you when the laser beam is broken.

Resources for Article:

Further resources on this subject: Building Our First Poky Image for the Raspberry Pi [article] Raspberry Pi LED Blueprints [article] Raspberry Pi Gaming Operating Systems [article]

Debugging Your .NET Application

Packt
21 Jul 2016
13 min read
In this article by Jeff Martin, author of the book Visual Studio 2015 Cookbook - Second Edition, we will discuss about how but modern software development still requires developers to identify and correct bugs in their code. The familiar edit-compile-test cycle is as familiar as a text editor, and now the rise of portable devices has added the need to measure for battery consumption and optimization for multiple architectures. Fortunately, our development tools continue to evolve to combat this rise in complexity, and Visual Studio continues to improve its arsenal. (For more resources related to this topic, see here.) Multi-threaded code and asynchronous code are probably the two most difficult areas for most developers to work with, and also the hardest to debug when you have a problem like a race condition. A race condition occurs when multiple threads perform an operation at the same time, and the order in which they execute makes a difference to how the software runs or the output is generated. Race conditions often result in deadlocks, incorrect data being used in other calculations, and random, unrepeatable crashes. The other painful area to debug involves code running on other machines, whether it is running locally on your development machine or running in production. Hooking up a remote debugger in previous versions of Visual Studio has been less than simple, and the experience of debugging code in production was similarly frustrating. In this article, we will cover the following sections: Putting Diagnostic Tools to work Maximizing everyday debugging Putting Diagnostic Tools to work In Visual Studio 2013, Microsoft debuted a new set of tools called the Performance and Diagnostics hub. With VS2015, these tools have revised further, and in the case of Diagnostic Tools, promoted to a central presence on the main IDE window, and is displayed, by default, during debugging sessions. This is great for us as developers, because now it is easier than ever to troubleshoot and improve our code. In this section, we will explore how Diagnostic Tools can be used to explore our code, identify bottlenecks, and analyze memory usage. Getting ready The changes didn't stop when VS2015 was released, and succeeding updates to VS2015 have further refined the capabilities of these tools. So for this section, ensure that Update 2 has been installed on your copy of VS2015. We will be using Visual Studio Community 2015, but of course, you may use one of the premium editions too. How to do it… For this section, we will put together a short program that will generate some activity for us to analyze: Create a new C# Console Application, and give it a name of your choice. In your project's new Program.cs file, add the following method that will generate a large quantity of strings: static List<string> makeStrings() { List<string> stringList = new List<string>(); Random random = new Random(); for (int i = 0; i < 1000000; i++) { string x = "String details: " + (random.Next(1000, 100000)); stringList.Add(x); } return stringList; } Next we will add a second static method that produces an SHA256-calculated hash of each string that we generated. This method reads in each string that was previously generated, creates an SHA256 hash for it, and returns the list of computed hashes in the hex format. 
static List<string> hashStrings(List<string> srcStrings) { List<string> hashedStrings = new List<string>(); SHA256 mySHA256 = SHA256Managed.Create(); StringBuilder hash = new StringBuilder(); foreach (string str in srcStrings) { byte[] srcBytes = mySHA256.ComputeHash(Encoding.UTF8.GetBytes(str), 0, Encoding.UTF8.GetByteCount(str)); foreach (byte theByte in srcBytes) { hash.Append(theByte.ToString("x2")); } hashedStrings.Add(hash.ToString()); hash.Clear(); } mySHA256.Clear(); return hashedStrings; } After adding these methods, you may be prompted to add using statements for System.Text and System.Security.Cryptography. These are definitely needed, so go ahead and take Visual Studio's recommendation to have them added. Now we need to update our Main method to bring this all together. Update your Main method to have the following: static void Main(string[] args) { Console.WriteLine("Ready to create strings"); Console.ReadKey(true); List<string> results = makeStrings(); Console.WriteLine("Ready to Hash " + results.Count() + " strings "); //Console.ReadKey(true); List<string> strings = hashStrings(results); Console.ReadKey(true); } Before proceeding, build your solution to ensure everything is in working order. Now run the application in the Debug mode (F5), and watch how our program operates. By default, the Diagnostic Tools window will only appear while debugging. Feel free to reposition your IDE windows to make their presence more visible or use Ctrl + Alt + F2 to recall it as needed. When you first launch the program, you will see the Diagnostic Tools window appear. Its initial display resembles the following screenshot. Thanks to the first ReadKey method, the program will wait for us to proceed, so we can easily see the initial state. Note that CPU usage is minimal, and memory usage holds constant. Before going any further, click on the Memory Usage tab, and then the Take Snapshot command as indicated in the preceding screenshot. This will record the current state of memory usage by our program, and will be a useful comparison point later on. Once a snapshot is taken, your Memory Usage tab should resemble the following screenshot: Having a forced pause through our ReadKey() method is nice, but when working with real-world programs, we will not always have this luxury. Breakpoints are typically used for situations where it is not always possible to wait for user input, so let's take advantage of the program's current state, and set two of them. We will put one to the second WriteLine method, and one to the last ReadKey method, as shown in the following screenshot: Now return to the open application window, and press a key so that execution continues. The program will stop at the first break point, which is right after it has generated a bunch of strings and added them to our List object. Let's take another snapshot of the memory usage using the same manner given in Step 9. You may also notice that the memory usage displayed in the Process Memory gauge has increased significantly, as shown in this screenshot: Now that we have completed our second snapshot, click on Continue in Visual Studio, and proceed to the next breakpoint. The program will then calculate hashes for all of the generated strings, and when this has finished, it will stop at our last breakpoint. Take another snapshot of the memory usage. Also take notice of how the CPU usage spiked as the hashes were being calculated: Now that we have these three memory snapshots, we will examine how they can help us. 
You may notice how memory usage increases during execution, especially from the initial snapshot to the second. Click on the second snapshot's object delta, as shown in the following screenshot: On clicking, this will open the snapshot details in a new editor window. Click on the Size (Bytes) column to sort by size, and as you may suspect, our List<String> object is indeed the largest object in our program. Of course, given the nature of our sample program, this is fairly obvious, but when dealing with more complex code bases, being able to utilize this type of investigation is very helpful. The following screenshot shows the results of our filter: If you would like to know more about the object itself (perhaps there are multiple objects of the same type), you can use the Referenced Types option as indicated in the preceding screenshot. If you would like to try this out on the sample program, be sure to set a smaller number in the makeStrings() loop, otherwise you will run the risk of overloading your system. Returning to the main Diagnostic Tools window, we will now examine CPU utilization. While the program is executing the hashes (feel free to restart the debugging session if necessary), you can observe where the program spends most of its time: Again, it is probably no surprise that most of the hard work was done in the hashStrings() method. But when dealing with real-world code, it will not always be so obvious where the slowdowns are, and having this type of insight into your program's execution will make it easier to find areas requiring further improvement. When using the CPU profiler in our example, you may find it easier to remove the first breakpoint and simply trigger a profiling by clicking on Break All as shown in this screenshot: How it works... Microsoft wanted more developers to be able to take advantage of their improved technology, so they have increased its availability beyond the Professional and Enterprise editions to also include Community. Running your program within VS2015 with the Diagnostic Tools window open lets you examine your program's performance in great detail. By using memory snapshots and breakpoints, VS2015 provides you with the tools needed to analyze your program's operation, and determine where you should spend your time making optimizations. There's more… Our sample program does not perform a wide variety of tasks, but of course, more complex programs usually perform well. To further assist with analyzing those programs, there is a third option available to you beyond CPU Usage and Memory Usage: the Events tab. As shown in the following screenshot, the Events tab also provides the ability to search events for interesting (or long-running) activities. Different event types include file activity, gestures (for touch-based apps), and program modules being loaded or unloaded. Maximizing everyday debugging Given the frequency of debugging, any refinement to these tools can pay immediate dividends. VS 2015 brings the popular Edit and Continue feature into the 21st century by supporting a 64-bit code. Added to that is the new ability to see the return value of functions in your debugger. The addition of these features combine to make debugging code easier, allowing to solve problems faster. Getting ready For this section, you can use VS 2015 Community or one of the premium editions. Be sure to run your choice on a machine using a 64-bit edition of Windows, as that is what we will be demonstrating in the section. 
Don't worry, you can still use Edit and Continue with 32-bit C# and Visual Basic code. How to do it… Both features are now supported by C#/VB, but we will be using C# for our examples. The features being demonstrated are compiler features, so feel free to use code from one of your own projects if you prefer. To see how Edit and Continue can benefit 64-bit development, perform the following steps: Create a new C# Console Application using the default name. To ensure the demonstration is running with 64-bit code, we need to change the default solution platform. Click on the drop-down arrow next to Any CPU, and select Configuration Manager... When the Configuration Manager dialog opens, we can create a new project platform targeting a 64-bit code. To do this, click on the drop-down menu for Platform, and select <New...>: When <New...> is selected, it will present the New Project Platform dialog box. Select x64 as the new platform type: Once x64 has been selected, you will return to Configuration Manager. Verify that x64 remains active under Platform, and then click on Close to close this dialog. The main IDE window will now indicate that x64 is active: With the project settings out of the face, let's add some code to demonstrate the new behavior. Replace the existing code in your blank class file so that it looks like the following listing: class Program { static void Main(string[] args) { int w = 16; int h = 8; int area = calcArea(w, h); Console.WriteLine("Area: " + area); } private static int calcArea(int width, int height) { return width / height; } } Let's set some breakpoints so that we are able to inspect during execution. First, add a breakpoint to the Main method's Console line. Add a second breakpoint to the calcArea method's return line. You can do this by either clicking on the left side of the editor window's border, or by right-clicking on the line, and selecting Breakpoint | Insert Breakpoint: If you are not sure where to click, use the right-click method, and then practice toggling the breakpoint by left-clicking on the breakpoint marker. Feel free to use whatever method you find most convenient. Once the two breakpoints are added, Visual Studio will mark their location as shown in the following screenshot (the arrow indicates where you may click to toggle the breakpoint): With the breakpoint marker now set, let's debug the program. Begin debugging by either pressing F5, or by clicking on the Start button on the toolbar: Once debugging starts, the program will quickly execute until stopped by the first breakpoint. Let's first take a look at Edit and Continue. Visual Studio will stop at the calcArea method's return line. Astute readers will notice an error (marked by 1 in the following screenshot) present in the calculation, as the area value returned should be width * height. Make the correction. Before continuing, note the variables listed in the Autos window (marked by 2 in the following screenshot). (If you don't see Autos, it can be made visible by pressing Ctrl + D, A, or through Debug | Windows | Autos while debugging.) After correcting the area calculation, advance the debugging step by pressing F10 twice. (Alternatively make the advancement by selecting the menu item Debug | Step Over twice). Visual Studio will advance to the declaration for the area. Note that you were able to edit your code and continue debugging without restarting. 
The Autos window will update to display the function's return value, which is 128 (the value for area has not been assigned yet in the following screenshot—Step Over once more if you would like to see that assigned): There's more… Programmers who write C++ have already had the ability to see the return values of functions—this just brings .NET developers into the fold. The result is that your development experience won't have to suffer based on the language you have chosen to use for your project. The Edit and Continue functionality is also available for ASP.NET projects. New projects created on VS2015 will have Edit and Continue enabled by default. Existing projects imported to VS2015 will usually need this to be enabled if it hasn't been done already. To do so, open the Options dialog via Tools | Options, and look for the Debugging | General section. The following screenshot shows where this option is located on the properties page: Whether you are working with an ASP.NET project or a regular C#/VB .NET application, you can verify Edit and Continue is set via this location. Summary In this article, we examine the improvements to the debugging experience in Visual Studio 2015, and how it can help you diagnose the root cause of a problem faster so that you can fix it properly, and not just patch over the symptoms. Resources for Article:   Further resources on this subject: Creating efficient reports with Visual Studio [article] Creating efficient reports with Visual Studio [article] Connecting to Microsoft SQL Server Compact 3.5 with Visual Studio [article]


Managing EAP in Domain Mode

Packt
19 Jul 2016
7 min read
This article by Francesco Marchioni, author of the book Mastering JBoss Enterprise Application Platform 7, dives deep into application server management using the domain mode and its main components, and discusses how to shift to advanced configurations that resemble real-world projects. The main topics covered are:

Domain mode breakdown
Handy domain properties
Electing the domain controller

(For more resources related to this topic, see here.)

Domain mode break down

Managing the application server in the domain mode means, in a nutshell, controlling multiple servers from a centralized single point of control. The servers that are part of the domain can span across multiple machines (or even across the cloud) and they can be grouped with similar servers of the domain to share a common configuration. To make sense of this, we will break down the domain components into two main categories:

Physical components: These are the domain elements that can be identified with a Java process running on the operating system

Logical components: These are the domain elements which can span across several physical components

Domain physical components

When you start the application server through the domain.sh script, you will be able to identify the following processes:

Host controller: Each domain installation contains a host controller. This is a Java process that is in charge of starting and stopping the servers that are defined within the host.xml file. The host controller is only aware of the items that are specific to the local physical installation, such as the domain controller host and port, the JVM settings of the servers, or their system properties.

Domain controller: One host controller of the domain (and only one) is configured to act as the domain controller. This means basically two things: keeping the domain configuration (in the domain.xml file) and assisting the host controllers in managing the servers of the domain.

Servers: Each host controller can contain any number of servers, which are the actual server instances. These server instances cannot be started autonomously. The host controller is in charge of starting and stopping single servers when the domain controller commands it to.

If you start the default domain configuration on a Linux machine, you will see that the following processes will show in your operating system:

As you can see, the process controller is identified by the [Process Controller] label, while the domain controller corresponds to the [Host Controller] label. Each server shows in the process table with the name defined in the host.xml file. You can use common operating system commands such as grep to further restrict the search to a specific process.

Domain logical components

A domain configuration with only physical elements in it would not add much to a line of standalone servers. The following components can abstract the domain definition, making it dynamic and flexible:

Server Group: A server group is a collection of servers. They are defined in the domain.xml file, hence they don't have any reference to an actual host controller installation. You can use a server group to share configuration and deployments across a group of servers.

Profile: A profile is an EAP configuration. A domain can hold as many profiles as you need. Out of the box the following configurations are provided:

default: This configuration matches the standalone.xml configuration (in standalone mode), hence it does not include JMS, IIOP, or HA.
full: This configuration matches the standalone-full.xml configuration (in standalone mode), hence it adds JMS and OpenJDK IIOP to the default server.

ha: This configuration matches the standalone-ha.xml configuration (in standalone mode), so it enhances the default configuration with clustering (HA).

full-ha: This configuration matches the standalone-full-ha.xml configuration (in standalone mode), hence it includes JMS, IIOP, and HA.

Handy domain properties

So far we have learnt the default configuration files used by JBoss EAP and the location where they are placed. These settings can, however, be varied by means of system properties. The following options customize the domain configuration file names:

--domain-config: The domain configuration file (default domain.xml)
--host-config: The host configuration file (default host.xml)

The following properties adjust the domain directory structure:

jboss.domain.base.dir: The base directory for domain content
jboss.domain.config.dir: The base configuration directory
jboss.domain.data.dir: The directory used for persistent data file storage
jboss.domain.log.dir: The directory containing the host-controller.log and process-controller.log files
jboss.domain.temp.dir: The directory used for temporary file storage
jboss.domain.deployment.dir: The directory used to store deployed content
jboss.domain.servers.dir: The directory containing the managed server instances

For example, you can start EAP 7 in domain mode using the domain configuration file mydomain.xml and the host file named myhost.xml, based on the base directory /home/jboss/eap7domain, with the following command:

$ ./domain.sh --domain-config=mydomain.xml --host-config=myhost.xml -Djboss.domain.base.dir=/home/jboss/eap7domain

Electing the domain controller

Before creating your first domain, we will learn in more detail about the process that connects one or more host controllers to the domain controller, and how a host controller is elected to be the domain controller. The physical topology of the domain is stored in the host.xml file. Within this file, you will find as the first line the host controller name, which makes each host controller unique:

<host name="master">

One of the host controllers will be configured to act as the domain controller.
This is done in the domain-controller section with the following block, which states that the domain controller is the host controller itself (hence, local):

<domain-controller>
    <local/>
</domain-controller>

All other host controllers will connect to the domain controller, using the following example configuration, which uses the jboss.domain.master.address and jboss.domain.master.port properties to specify the domain controller address and port:

<domain-controller>
    <remote protocol="remote" host="${jboss.domain.master.address}" port="${jboss.domain.master.port:9999}" security-realm="ManagementRealm"/>
</domain-controller>

The host controller to domain controller communication happens behind the scenes through a management native port that is also defined in the host.xml file:

<management-interfaces>
    <native-interface security-realm="ManagementRealm">
        <socket interface="management" port="${jboss.management.native.port:9999}"/>
    </native-interface>
    <http-interface security-realm="ManagementRealm" http-upgrade-enabled="true">
        <socket interface="management" port="${jboss.management.http.port:9990}"/>
    </http-interface>
</management-interfaces>

The other highlighted attribute is the management HTTP port, which can be used by the administrator to reach the domain controller. This port is especially relevant if the host controller is the domain controller. Both sockets use the management interface, which is defined in the interfaces section of the host.xml file and exposes the domain controller on a network-available address:

<interfaces>
    <interface name="management">
        <inet-address value="${jboss.bind.address.management:127.0.0.1}"/>
    </interface>
    <interface name="public">
        <inet-address value="${jboss.bind.address:127.0.0.1}"/>
    </interface>
</interfaces>

If you want to run multiple host controllers on the same machine, you need to provide a unique jboss.management.native.port for each host controller, or a different jboss.bind.address.management.

Summary

In this article we covered some essentials of the domain mode: its breakdown into physical and logical components, handy domain properties, and electing the domain controller.

Resources for Article:

Further resources on this subject: Red5: A video-on-demand Flash Server [article] Animating Elements [article] Data Science with R [article]


Reactive Programming with C#

Packt
18 Jul 2016
30 min read
In this article by Antonio Esposito from the book Reactive Programming for .NET Developers , we will see a practical example of what is reactive programming with pure C# coding. The following topics will be discussed here: IObserver interface IObservable interface Subscription life cycle Sourcing events Filtering events Correlating events Sourcing from CLR streams Sourcing from CLR enumerables (For more resources related to this topic, see here.) IObserver interface This core level interface is available within the Base Class Library (BCL) of .NET 4.0 and is available for the older 3.5 as an add-on. The usage is pretty simple and the goal is to provide a standard way of handling the most basic features of any reactive message consumer. Reactive messages flow by a producer and a consumer and subscribe for some messages. The IObserver C# interface is available to construct message receivers that comply with the reactive programming layout by implementing the three main message-oriented events: a message received, an error received, and a task completed message. The IObserver interface has the following sign and description: // Summary: // Provides a mechanism for receiving push-based notifications. // // Type parameters: // T: // The object that provides notification information.This type parameter is // contravariant. That is, you can use either the type you specified or any // type that is less derived. For more information about covariance and contravariance, // see Covariance and Contravariance in Generics. public interface IObserver<in T> { // Summary: // Notifies the observer that the provider has finished sending push-based notifications. void OnCompleted(); // // Summary: // Notifies the observer that the provider has experienced an error condition. // // Parameters: // error: // An object that provides additional information about the error. void OnError(Exception error); // // Summary: // Provides the observer with new data. // // Parameters: // value: // The current notification information. void OnNext(T value); } Any new message to flow to the receiver implementing such an interface will reach the OnNext method. Any error will reach the OnError method, while the task completed acknowledgement message will reach the OnCompleted method. The usage of an interface means that we cannot use generic premade objects from the BCL. We need to implement any receiver from scratch by using such an interface as a service contract. Let's see an example because talking about a code example is always simpler than talking about something theoretic. The following examples show how to read from a console application command from a user in a reactive way: cass Program { static void Main(string[] args) { //creates a new console input consumer var consumer = new ConsoleTextConsumer(); while (true) { Console.WriteLine("Write some text and press ENTER to send a messagernPress ENTER to exit"); //read console input var input = Console.ReadLine(); //check for empty messate to exit if (string.IsNullOrEmpty(input)) { //job completed consumer.OnCompleted(); Console.WriteLine("Task completed. 
Any further message will generate an error"); } else { //route the message to the consumer consumer.OnNext(input); } } } } public class ConsoleTextConsumer : IObserver<string> { private bool finished = false; public void OnCompleted() { if (finished) { OnError(new Exception("This consumer already finished it's lifecycle")); return; } finished = true; Console.WriteLine("<- END"); } public void OnError(Exception error) { Console.WriteLine("<- ERROR"); Console.WriteLine("<- {0}", error.Message); } public void OnNext(string value) { if (finished) { OnError(new Exception("This consumer finished its lifecycle")); return; } //shows the received message Console.WriteLine("-> {0}", value); //do something //ack the caller Console.WriteLine("<- OK"); } } The preceding example shows the IObserver interface usage within the ConsoleTextConsumer class that simply asks a command console (DOS-like) for the user input text to do something. In this implementation, the class simply writes out the input text because we simply want to look at the reactive implementation. The first important concept here is that a message consumer knows nothing about how messages are produced. The consumer simply reacts to one of the tree events (not CLR events). Besides this, some kind of logic and cross-event ability is also available within the consumer itself. In the preceding example, we can see that the consumer simply showed any received message again on the console. However, if a complete message puts the consumer in a finished state (by signaling the finished flag), any other message that comes on the OnNext method will be automatically routed to the error one. Likewise, any other complete message that will reach the consumer will produce another error once the consumer is already in the finished state. IObservable interface The IObservable interface, the opposite of the IObserver interface, has the task of handling message production and the observer subscription. It routes right messages to the OnNext message handler and errors to the OnError message handler. At its life cycle end, it acknowledges all the observers on the OnComplete message handler. To create a valid reactive observable interface, we must write something that is not locking against user input or any other external system input data. The observable object acts as an infinite message generator, something like an infinite enumerable of messages; although in such cases, there is no enumeration. Once a new message is available somehow, observer routes it to all the subscribers. In the following example, we will try creating a console application to ask the user for an integer number and then route such a number to all the subscribers. Otherwise, if the given input is not a number, an error will be routed to all the subscribers. This is observer similar to the one already seen in the previous example. 
Take a look at the following code:

/// <summary> /// Consumes numeric values that divide by a given number without a remainder /// </summary> public class IntegerConsumer : IObserver<int> { readonly int validDivider; //the constructor asks for a divider public IntegerConsumer(int validDivider) { this.validDivider = validDivider; } private bool finished = false; public void OnCompleted() { if (finished) OnError(new Exception("This consumer already finished its lifecycle")); else { finished = true; Console.WriteLine("{0}: END", GetHashCode()); } } public void OnError(Exception error) { Console.WriteLine("{0}: {1}", GetHashCode(), error.Message); } public void OnNext(int value) { if (finished) OnError(new Exception("This consumer finished its lifecycle")); //the simple business logic is made by checking divider result else if (value % validDivider == 0) Console.WriteLine("{0}: {1} divisible by {2}", GetHashCode(), value, validDivider); } }

This observer consumes integer numeric messages, but it requires that the number is divisible by another one without any remainder. This logic, because of the encapsulation principle, lives within the observer object. The observable, instead, only contains the logic for sending valid or error messages. Here, the filtering logic sits within the receiver itself. Although there is nothing wrong with that, in more complex applications, specific filtering features are available in the publish-subscribe communication pipeline. In other words, another object will be available between the observable (publisher) and the observer (subscriber) that will act as a message filter. Back to our numeric example, here is the observable implementation, made using an inner Task that does the main job of parsing input text and sending messages. 
In addition, a cancellation token is available to handle the user cancellation request and an eventual observable dispose: //Observable able to parse strings from the Console //and route numeric messages to all subscribers public class ConsoleIntegerProducer : IObservable<int>, IDisposable { //the subscriber list private readonly List<IObserver<int>> subscriberList = new List<IObserver<int>>(); //the cancellation token source for starting stopping //inner observable working thread private readonly CancellationTokenSource cancellationSource; //the cancellation flag private readonly CancellationToken cancellationToken; //the running task that runs the inner running thread private readonly Task workerTask; public ConsoleIntegerProducer() { cancellationSource = new CancellationTokenSource(); cancellationToken = cancellationSource.Token; workerTask = Task.Factory.StartNew(OnInnerWorker, cancellationToken); } //add another observer to the subscriber list public IDisposable Subscribe(IObserver<int> observer) { if (subscriberList.Contains(observer)) throw new ArgumentException("The observer is already subscribed to this observable"); Console.WriteLine("Subscribing for {0}", observer.GetHashCode()); subscriberList.Add(observer); return null; } //this code executes the observable infinite loop //and routes messages to all observers on the valid //message handler private void OnInnerWorker() { while (!cancellationToken.IsCancellationRequested) { var input = Console.ReadLine(); int value; foreach (var observer in subscriberList) if (string.IsNullOrEmpty(input)) break; else if (input.Equals("EXIT")) { cancellationSource.Cancel(); break; } else if (!int.TryParse(input, out value)) observer.OnError(new FormatException("Unable to parse given value")); else observer.OnNext(value); } cancellationToken.ThrowIfCancellationRequested(); } //cancel main task and ack all observers //by sending the OnCompleted message public void Dispose() { if (!cancellationSource.IsCancellationRequested) { cancellationSource.Cancel(); while (!workerTask.IsCanceled) Thread.Sleep(100); } cancellationSource.Dispose(); workerTask.Dispose(); foreach (var observer in subscriberList) observer.OnCompleted(); } //wait until the main task completes or went cancelled public void Wait() { while (!(workerTask.IsCompleted || workerTask.IsCanceled)) Thread.Sleep(100); } } To complete the example, here there is the program Main: static void Main(string[] args) { //this is the message observable responsible of producing messages using (var observer = new ConsoleIntegerProducer()) //those are the message observer that consume messages using (var consumer1 = observer.Subscribe(new IntegerConsumer(2))) using (var consumer2 = observer.Subscribe(new IntegerConsumer(3))) using (var consumer3 = observer.Subscribe(new IntegerConsumer(5))) observer.Wait(); Console.WriteLine("END"); Console.ReadLine(); } The cancellationToken.ThrowIfCancellationRequested may raise an exception in your Visual Studio when debugging. Simply go next by pressing F5, or test such code example without the attached debugger by starting the test with Ctrl + F5 instead of the F5 alone. The application simply creates an observable variable, which is able to parse user data. Then, register three observers specifying to each observer variables the wanted valid divider value. Then, the observable variable will start reading user data from the console and valid or error messages will flow to all the observers. 
Each observer will apply its internal logic of showing the message when it is divisible by the related divider. Here is the result of executing the application:

Observables and observers in action

Subscription life cycle

What will happen if we want to stop a single observer from receiving messages from the observable event source? If we change the program Main from the preceding example to the following one, we will experience a flawed observer life cycle design. Here's the code:

//this is the message observable responsible for producing messages using (var observer = new ConsoleIntegerProducer()) //these are the message observers that consume messages using (var consumer1 = observer.Subscribe(new IntegerConsumer(2))) using (var consumer2 = observer.Subscribe(new IntegerConsumer(3))) { using (var consumer3 = observer.Subscribe(new IntegerConsumer(5))) { //internal lifecycle } observer.Wait(); } Console.WriteLine("END"); Console.ReadLine();

Here is the result in the output console:

The third observer unable to catch value messages

By using the using construct, we should stop the life cycle of the consumer object. However, we do not, because in the previous example, the Subscribe method of the observable simply returns a NULL object. To create a valid observer, we must handle and design its life cycle management. This means that we must eventually handle the external disposal of the Subscribe method's result by signaling the right observer that its life cycle has reached its end. We have to create a Subscription class that handles object disposal in the proper reactive way by sending the OnCompleted message. Here is a simple Subscription class implementation:

/// <summary> /// Handle observer subscription lifecycle /// </summary> public sealed class Subscription<T> : IDisposable { private readonly IObserver<T> observer; public Subscription(IObserver<T> observer) { this.observer = observer; } //the event signalling that the observer has //completed its lifecycle public event EventHandler<IObserver<T>> OnCompleted; public void Dispose() { if (OnCompleted != null) OnCompleted(this, observer); observer.OnCompleted(); } }

The usage is within the observable's Subscribe method. Here's an example:

//add another observer to the subscriber list public IDisposable Subscribe(IObserver<int> observer) { if (observerList.Contains(observer)) throw new ArgumentException("The observer is already subscribed to this observable"); Console.WriteLine("Subscribing for {0}", observer.GetHashCode()); observerList.Add(observer); //creates a new subscription for the given observer var subscription = new Subscription<int>(observer); //handle the subscription lifecycle end event subscription.OnCompleted += OnObserverLifecycleEnd; return subscription; } void OnObserverLifecycleEnd(object sender, IObserver<int> e) { var subscription = sender as Subscription<int>; //remove the observer from the internal list within the observable observerList.Remove(e); //remove the handler from the subscription event //once already handled subscription.OnCompleted -= OnObserverLifecycleEnd; }

As visible, the preceding example creates a new Subscription<T> object to handle the observer's life cycle with the IDisposable.Dispose method. Here is the result of such code edits against the full example available in the previous paragraph:

The observers will end their life as we dispose their life cycle tokens

This time, an observer ends its life cycle prematurely because we dispose of the subscription object. 
This is visible in the first END message. Later, only two observers remain available at the application's end; when the user asks for EXIT, only those two observers end their life cycle by themselves rather than through the disposal of a Subscription. In real-world applications, observers often subscribe to observables and later unsubscribe by disposing of the Subscription token. This happens because we do not always want a reactive module to handle all the messages. In this case, this means that we have to handle the observer life cycle by ourselves, as we already did in the previous examples, or we need to apply filters to choose which messages flow to which subscriber, as visible in the later section Filtering events. Kindly consider that although filters make things easier, we will always have to handle the observer life cycle.

Sourcing events

Sourcing events is the ability to obtain, from a particular source, events that are usable in reactive programming. Reactive programming is all about event message handling. Any event is a specific occurrence of some handleable behavior of users or external systems, and we can program event reactions in the most pleasant and productive way for reaching our software goals. In the following example, we will see how to react to CLR events. In this specific case, we will handle filesystem events by using events from the System.IO.FileSystemWatcher class, which gives us the ability to react to file changes without making useless and resource-consuming polling queries against the filesystem status. Here's the observer and observable implementation:

public sealed class NewFileSavedMessagePublisher : IObservable<string>, IDisposable { private readonly FileSystemWatcher watcher; public NewFileSavedMessagePublisher(string path) { //creates a new file system event router this.watcher = new FileSystemWatcher(path); //register for handling File Created event this.watcher.Created += OnFileCreated; //enable event routing this.watcher.EnableRaisingEvents = true; } //signal all observers a new file arrived private void OnFileCreated(object sender, FileSystemEventArgs e) { foreach (var observer in subscriberList) observer.OnNext(e.FullPath); } //the subscriber list private readonly List<IObserver<string>> subscriberList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { //register the new observer subscriberList.Add(observer); return null; } public void Dispose() { //disable file system event routing this.watcher.EnableRaisingEvents = false; //deregister from watcher event handler this.watcher.Created -= OnFileCreated; //dispose the watcher this.watcher.Dispose(); //signal all observers that job is done foreach (var observer in subscriberList) observer.OnCompleted(); } } /// <summary> /// A tremendously basic implementation /// </summary> public sealed class NewFileSavedMessageSubscriber : IObserver<string> { public void OnCompleted() { Console.WriteLine("-> END"); } public void OnError(Exception error) { Console.WriteLine("-> {0}", error.Message); } public void OnNext(string value) { Console.WriteLine("-> {0}", value); } }

The observer simply gives us the ability to write text to the console; there is nothing more to say about it. The observable, on the other hand, does most of the job in this implementation. It creates the watcher object and registers the right event handler to catch the wanted reactive events. 
It handles its own life cycle and that of the internal watcher object. Then, it correctly sends the OnCompleted message to all the observers. Here's the program's initialization:

static void Main(string[] args) { Console.WriteLine("Watching for new files"); using (var publisher = new NewFileSavedMessagePublisher(@"[WRITE A PATH HERE]")) using (var subscriber = publisher.Subscribe(new NewFileSavedMessageSubscriber())) { Console.WriteLine("Press RETURN to exit"); //wait for user RETURN Console.ReadLine(); } }

Any new file that appears in the folder will have its full file name routed to the observer. This is the result of copying and pasting the same file three times:

-> [YOUR PATH]out - Copy.png
-> [YOUR PATH]out - Copy (2).png
-> [YOUR PATH]out - Copy (3).png

With a single observable and a single observer, the power of reactive programming is not so evident. Let's begin writing some intermediate objects that alter the message flow within the pipeline of our reactive message pump: filters, message correlators, and dividers.

Filtering events

As said in the previous section, it is time to alter the message flow. The observable has the task of producing messages, while the observer, at the opposite end, consumes them. To create a message filter, we need to create an object that is both a publisher and a subscriber. The implementation must take into consideration the filtering need and the message routing to the underlying observers that subscribe to the filter observable object instead of the main one. Here's an implementation of the filter:

/// <summary> /// The filtering observable/observer /// </summary> public sealed class StringMessageFilter : IObservable<string>, IObserver<string>, IDisposable { private readonly string filter; public StringMessageFilter(string filter) { this.filter = filter; } //the observer collection private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { this.observerList.Add(observer); return null; } //a simple implementation //that disables message routing once //the OnCompleted has been invoked private bool hasCompleted = false; public void OnCompleted() { hasCompleted = true; foreach (var observer in observerList) observer.OnCompleted(); } //routes error messages while not completed public void OnError(Exception error) { if (!hasCompleted) foreach (var observer in observerList) observer.OnError(error); } //routes valid messages while not completed public void OnNext(string value) { Console.WriteLine("Filtering {0}", value); if (!hasCompleted && value.ToLowerInvariant().Contains(filter.ToLowerInvariant())) foreach (var observer in observerList) observer.OnNext(value); } public void Dispose() { OnCompleted(); } }

This filter can be used together with the example from the previous section that routes the FileSystemWatcher events of created files. 
This is the new program initialization:

static void Main(string[] args) { Console.WriteLine("Watching for new files"); using (var publisher = new NewFileSavedMessagePublisher(@"[WRITE A PATH HERE]")) using (var filter = new StringMessageFilter(".txt")) { //subscribe the filter to publisher messages publisher.Subscribe(filter); //subscribe the console subscriber to the filter //instead of directly to the publisher filter.Subscribe(new NewFileSavedMessageSubscriber()); Console.WriteLine("Press RETURN to exit"); Console.ReadLine(); } }

As visible, this new implementation creates a filter object that takes a parameter used to verify which file names may flow to the underlying observers. The filter subscribes to the main observable object, while the observer subscribes to the filter itself. It is like a chain, where each link refers to the next one. This is the output console of the running application:

The filtering observer in action

Although I copied two files (a .png and a .txt file), we can see that only the text file reached the internal observer; the image file reached the filter's OnNext method but, being invalid against the filter argument, it never reached the internal observer.

Correlating events

Sometimes, especially when dealing with integration scenarios, there is a need to correlate multiple events that do not always arrive together. This is the case of a header file that comes together with multiple body files. In reactive programming, correlating events means correlating multiple observable messages into a single message that is the result of two or more original messages. Such messages must somehow be correlated by a value (an ID, serial, or metadata) that defines that the initial messages belong to the same correlation set. Useful features in real-world correlators are the ability to specify a timeout (which may also be infinite) in the correlation waiting logic and the ability to specify a correlation message count (also possibly infinite). Here's a correlator implementation made for the previous example based on the FileSystemWatcher class:

public sealed class FileNameMessageCorrelator : IObservable<string>, IObserver<string>, IDisposable { private readonly Func<string, string> correlationKeyExtractor; public FileNameMessageCorrelator(Func<string, string> correlationKeyExtractor) { this.correlationKeyExtractor = correlationKeyExtractor; } //the observer collection private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { this.observerList.Add(observer); return null; } private bool hasCompleted = false; public void OnCompleted() { hasCompleted = true; foreach (var observer in observerList) observer.OnCompleted(); } //routes error messages while not completed public void OnError(Exception error) { if (!hasCompleted) foreach (var observer in observerList) observer.OnError(error); }

Just a pause: up to this point, we have simply created the reactive structure of the FileNameMessageCorrelator class by implementing the two main interfaces. 
Here is the core implementation that correlates messages:

//the container of correlations able to contain //multiple strings per key private readonly NameValueCollection correlations = new NameValueCollection(); //routes valid messages while not completed public void OnNext(string value) { if (hasCompleted) return; //check if subscriber has completed Console.WriteLine("Parsing message: {0}", value); //try extracting the correlation ID var correlationID = correlationKeyExtractor(value); //check if the correlation is available if (correlationID == null) return; //append the new file name to the correlation state correlations.Add(correlationID, value); //in this example we will consider always //correlations of two items if (correlations.GetValues(correlationID).Count() == 2) { //once the correlation is complete //read the two files and push the //two contents altogether to the //observers var fileData = correlations.GetValues(correlationID) //route messages to the ReadAllText method .Select(File.ReadAllText) //materialize the query .ToArray(); var newValue = string.Join("|", fileData); foreach (var observer in observerList) observer.OnNext(newValue); correlations.Remove(correlationID); } }

This correlator class accepts a correlation function as a constructor parameter. This function is later used to evaluate the correlation ID when a new file name flows into the OnNext method. Once the function returns a valid correlationID, this ID is used as the key for NameValueCollection, a specialized string collection that stores multiple values per key. When there are two values for the same key, the correlation is ready to flow out to the underlying observers by reading the file data and joining such data into a single string message. Here's the application's initialization:

static void Main(string[] args) { using (var publisher = new NewFileSavedMessagePublisher(@"[WRITE A PATH HERE]")) //creates a new correlator by specifying the correlation key //extraction function made with a regular expression that //extracts a file ID similar to FILEID0001 using (var correlator = new FileNameMessageCorrelator(ExtractCorrelationKey)) { //subscribe the correlator to publisher messages publisher.Subscribe(correlator); //subscribe the console subscriber to the correlator //instead of directly to the publisher correlator.Subscribe(new NewFileSavedMessageSubscriber()); //wait for user RETURN Console.ReadLine(); } } private static string ExtractCorrelationKey(string arg) { var match = Regex.Match(arg, @"(FILEID\d{4})"); if (match.Success) return match.Captures[0].Value; else return null; }

The initialization is much the same as in the filtering example seen in the previous section. The biggest difference is that the correlator object, instead of a string filter, accepts a function that analyzes the incoming file name and produces the correlationID value, when available. I prepared two files with the same ID in their file names. Here's the console output of the running example:

Two files correlated by their name

As visible, the correlator did its job by joining the two files' data into a single message regardless of the order in which the two files were stored in the filesystem. These examples regarding the filtering and correlation of messages should give you the idea that we can do anything with received messages. We can put a message on standby until a correlated message comes, we can join multiple messages into one, we can produce the same message multiple times, and so on. 
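To make this flexibility concrete, here is a minimal sketch of one more intermediate object: one that re-publishes every received message a fixed number of times, following the same observable/observer pattern as the filter and correlator above. This class is not part of the original example; the StringMessageRepeater name, the repeat count parameter, and the omitted subscription life cycle are illustrative assumptions, and the code assumes the same using directives (System, System.Collections.Generic) as the earlier listings:

public sealed class StringMessageRepeater : IObservable<string>, IObserver<string>
{
    //how many times each message is re-published (illustrative assumption)
    private readonly int times;
    public StringMessageRepeater(int times) { this.times = times; }

    //the observer collection
    private readonly List<IObserver<string>> observerList = new List<IObserver<string>>();
    public IDisposable Subscribe(IObserver<string> observer)
    {
        observerList.Add(observer);
        //subscription lifecycle missing for readability purpose, as in the previous examples
        return null;
    }

    //acknowledge all observers once the source completes
    public void OnCompleted()
    {
        foreach (var observer in observerList)
            observer.OnCompleted();
    }

    //route error messages to all observers
    public void OnError(Exception error)
    {
        foreach (var observer in observerList)
            observer.OnError(error);
    }

    //produce the same message multiple times
    public void OnNext(string value)
    {
        for (var i = 0; i < times; i++)
            foreach (var observer in observerList)
                observer.OnNext(value);
    }
}

It would chain exactly like the filter does: subscribe the repeater to the publisher with publisher.Subscribe(repeater), then subscribe the console observer to the repeater with repeater.Subscribe(new NewFileSavedMessageSubscriber()), so every created file would be reported more than once.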
This programming style opens the programmer's mind to a lot of new application designs and possibilities.

Sourcing from CLR streams

Any class that extends System.IO.Stream is some kind of cursor-based flow of data. Think of a video stream: a sort of data that is not persisted locally and flows only over the network, with the ability to go forward and backward, stop, pause, resume, play, and so on. The same behavior is available when streaming any kind of data; thus, the Stream class is the base class that exposes such behavior for any need. There are specialized classes that extend Stream, helping work with streams of text data (StreamWriter and StreamReader), binary serialized data (BinaryReader and BinaryWriter), memory-based temporary byte containers (MemoryStream), network-based streams (NetworkStream), and many others. Regarding reactive programming, we are dealing with the ability to source events from any stream regardless of its kind (network, file, memory, and so on). Real-world applications that use reactive programming based on streams are chats, remote binary listeners (socket programming), and any other unpredictable event-oriented applications. On the other hand, it is useless to read a huge file in a reactive way, because there is simply nothing reactive in such cases. It is time to look at an example. Here's a complete example of a reactive application that listens on a TCP port and routes string messages (CR + LF divides multiple messages) to all the available observers. The program Main and the usual ConsoleObserver methods are omitted for better readability:

public sealed class TcpListenerStringObservable : IObservable<string>, IDisposable { private readonly TcpListener listener; public TcpListenerStringObservable(int port, int backlogSize = 64) { //creates a new tcp listener on given port //with given backlog size listener = new TcpListener(IPAddress.Any, port); listener.Start(backlogSize); //start listening asynchronously listener.AcceptTcpClientAsync().ContinueWith(OnTcpClientConnected); } private void OnTcpClientConnected(Task<TcpClient> clientTask) { //if the task has not encountered errors if (clientTask.IsCompleted) //we will handle a single client connection per time //to handle multiple connections, simply put following //code into a Task using (var tcpClient = clientTask.Result) using (var stream = tcpClient.GetStream()) using (var reader = new StreamReader(stream)) while (tcpClient.Connected) { //read the message var line = reader.ReadLine(); //stop listening if nothing available if (string.IsNullOrEmpty(line)) break; else { //construct observer message adding client's remote endpoint address and port var msg = string.Format("{0}: {1}", tcpClient.Client.RemoteEndPoint, line); //route messages foreach (var observer in observerList) observer.OnNext(msg); } } //starts another client listener listener.AcceptTcpClientAsync().ContinueWith(OnTcpClientConnected); } private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { observerList.Add(observer); //subscription lifecycle missing //for readability purpose return null; } public void Dispose() { //stop listener listener.Stop(); } }

The preceding example shows how to create a reactive TCP listener that acts as an observable of string messages. The observable uses an internal TcpListener class that provides mid-level network services across an underlying Socket object. 
The example asks the listener to start listening and starts waiting for a client on another thread by using a Task object. When a remote client becomes available, its communication with the internals of the observable is handled by the OnTcpClientConnected method, which verifies the normal execution of the Task. Then, it takes the TcpClient from the Task, gets the network stream, and attaches a StreamReader to that stream to start reading. Once the reading loop is complete, another asynchronous accept operation repeats the procedure. Although this design handles a backlog of pending connections, it serves only a single client at a time. To change this design to handle multiple connections concurrently, simply encapsulate the OnTcpClientConnected logic in a new task. Here's an example:

private void OnTcpClientConnected(Task<TcpClient> clientTask) { //if the task has not encountered errors if (clientTask.IsCompleted) Task.Factory.StartNew(() => { using (var tcpClient = clientTask.Result) using (var stream = tcpClient.GetStream()) using (var reader = new StreamReader(stream)) while (tcpClient.Connected) { //read the message var line = reader.ReadLine(); //stop listening if nothing available if (string.IsNullOrEmpty(line)) break; else { //construct observer message adding client's remote endpoint address and port var msg = string.Format("{0}: {1}", tcpClient.Client.RemoteEndPoint, line); //route messages foreach (var observer in observerList) observer.OnNext(msg); } } }, TaskCreationOptions.PreferFairness); //starts another client listener listener.AcceptTcpClientAsync().ContinueWith(OnTcpClientConnected); }

This is the output of the reactive application when it receives two different connections by using telnet as a client (C:\>telnet localhost 8081). The program Main and the usual ConsoleObserver methods are omitted for better readability:

The observable routing events from the telnet client

As you can see, each client connects to the listener by using a different remote port. This gives us the ability to differentiate multiple remote connections even when they connect at the same time.

Sourcing from CLR enumerables

Sourcing from a finite collection is of little use with regard to reactive programming. By contrast, specific enumerable collections are perfect for reactive usage. These are the changeable collections that support collection change notifications by implementing the INotifyCollectionChanged interface (System.Collections.Specialized), like the ObservableCollection class (System.Collections.ObjectModel), and any infinite collection that supports the enumerator pattern with the usage of the yield keyword.

Changeable collections

The ObservableCollection<T> class gives us the ability to understand, in an event-based way, any change that occurs against the collection content. Kindly consider that changes regarding collection child properties are outside of the collection scope. This means that we are notified only of collection changes like the ones produced by the Add or Remove methods. Changes within a single item do not produce an alteration of the collection size; thus, they are not notified at all. 
Here's a generic (nonreactive) example:

static void Main(string[] args) { //the observable collection var collection = new ObservableCollection<string>(); //register a handler to catch collection changes collection.CollectionChanged += OnCollectionChanged; collection.Add("ciao"); collection.Add("hahahah"); collection.Insert(0, "new first line"); collection.RemoveAt(0); Console.WriteLine("Press RETURN to EXIT"); Console.ReadLine(); } private static void OnCollectionChanged(object sender, NotifyCollectionChangedEventArgs e) { var collection = sender as ObservableCollection<string>; if (e.NewStartingIndex >= 0) //adding new items Console.WriteLine("-> {0} {1}", e.Action, collection[e.NewStartingIndex]); else //removing items Console.WriteLine("-> {0} at {1}", e.Action, e.OldStartingIndex); }

As visible, the collection notifies all add operations, giving us the ability to catch the new message. The Insert method also signals an Add operation, although with Insert we can specify the index, and the value will be available within the collection. Obviously, the parameter containing the index value (e.NewStartingIndex) contains the new index according to the operation performed. By contrast, the Remove operation, although it notifies the removed element's index, cannot give us the ability to read the original item before the removal, because the event fires after the removal has already occurred. In a real-world reactive application, the most interesting operation against ObservableCollection is the Add operation. Here's an example (console observer omitted for better readability):

class Program { static void Main(string[] args) { //the observable collection var collection = new ObservableCollection<string>(); using (var observable = new NotifiableCollectionObservable(collection)) using (var observer = observable.Subscribe(new ConsoleStringObserver())) { collection.Add("ciao"); collection.Add("hahahah"); collection.Insert(0, "new first line"); collection.RemoveAt(0); Console.WriteLine("Press RETURN to EXIT"); Console.ReadLine(); } } public sealed class NotifiableCollectionObservable : IObservable<string>, IDisposable { private readonly ObservableCollection<string> collection; public NotifiableCollectionObservable(ObservableCollection<string> collection) { this.collection = collection; this.collection.CollectionChanged += collection_CollectionChanged; } //the event handler was omitted in the original listing; as described in the //following text, it routes only Add actions to the subscribed observers private void collection_CollectionChanged(object sender, NotifyCollectionChangedEventArgs e) { if (e.Action == NotifyCollectionChangedAction.Add) foreach (var observer in observerList) observer.OnNext(e.NewItems[0].ToString()); } private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { observerList.Add(observer); //subscription lifecycle missing //for readability purpose return null; } public void Dispose() { this.collection.CollectionChanged -= collection_CollectionChanged; foreach (var observer in observerList) observer.OnCompleted(); } } }

The result is the same as in the previous example about ObservableCollection without the reactive objects. The only difference is that the observable routes messages only when the Action value is Add.

The ObservableCollection signaling its content changes

Infinite collections

Our last example regards sourcing events from an infinite collection. In C#, it is possible to implement the enumerator pattern by signaling each object to enumerate, one at a time, thanks to the yield keyword. 
Here's an example:

static void Main(string[] args) { foreach (var value in EnumerateValuesFromSomewhere()) Console.WriteLine(value); } static IEnumerable<string> EnumerateValuesFromSomewhere() { var random = new Random(DateTime.Now.GetHashCode()); while (true) //forever { //returns a random integer number as string yield return random.Next().ToString(); //some throttling time Thread.Sleep(100); } }

This implementation is powerful because it never materializes all the values in memory. It simply signals to the enumerator, which the foreach construct uses internally, that a new object is available. The result is that numbers are written to the output console forever. This behavior is useful for reactive usage, because it never creates useless state like a temporary array, list, or generic collection. It simply signals that new items are available to whoever enumerates. Here's an example:

public sealed class EnumerableObservable : IObservable<string>, IDisposable { private readonly IEnumerable<string> enumerable; public EnumerableObservable(IEnumerable<string> enumerable) { this.enumerable = enumerable; this.cancellationSource = new CancellationTokenSource(); this.cancellationToken = cancellationSource.Token; this.workerTask = Task.Factory.StartNew(() => { foreach (var value in this.enumerable) { //if task cancellation triggers, raise the proper exception //to stop task execution cancellationToken.ThrowIfCancellationRequested(); foreach (var observer in observerList) observer.OnNext(value); } }, this.cancellationToken); } //the cancellation token source for starting and stopping //the inner observable working thread private readonly CancellationTokenSource cancellationSource; //the cancellation flag private readonly CancellationToken cancellationToken; //the running task that runs the inner running thread private readonly Task workerTask; //the observer list private readonly List<IObserver<string>> observerList = new List<IObserver<string>>(); public IDisposable Subscribe(IObserver<string> observer) { observerList.Add(observer); //subscription lifecycle missing //for readability purpose return null; } public void Dispose() { //trigger task cancellation //and wait for acknowledgement if (!cancellationSource.IsCancellationRequested) { cancellationSource.Cancel(); while (!workerTask.IsCanceled) Thread.Sleep(100); } cancellationSource.Dispose(); workerTask.Dispose(); foreach (var observer in observerList) observer.OnCompleted(); } }

This is the code of the program startup with the infinite enumerable generation:

class Program { static void Main(string[] args) { //we create a variable containing the enumerable //this does not trigger item retrieval //so the enumerator does not begin flowing data var enumerable = EnumerateValuesFromSomewhere(); using (var observable = new EnumerableObservable(enumerable)) using (var observer = observable.Subscribe(new ConsoleStringObserver())) { //wait for 2 seconds then exit Thread.Sleep(2000); } Console.WriteLine("Press RETURN to EXIT"); Console.ReadLine(); } static IEnumerable<string> EnumerateValuesFromSomewhere() { var random = new Random(DateTime.Now.GetHashCode()); while (true) //forever { //returns a random integer number as string yield return random.Next().ToString(); //some throttling time Thread.Sleep(100); } } }

As in some of the previous examples, here we make use of the Task class. 
The observable uses the enumerable within an asynchronous Task to give the programmer the ability to stop the execution of the whole operation by simply exiting the using scope or by manually invoking the Dispose method. This example shows a tremendously powerful feature: the ability to yield values without having to source them from a concrete (finite) array or collection, simply by implementing the enumerator pattern. Although rarely used, the yield operator gives us the ability to create complex applications simply by pushing messages between methods. The more methods we create that send messages to each other, the more complex the business logic the application can handle. Consider the ability to catch all such messages with observables, and you get an idea of how powerful reactive programming can be for a developer.

Summary

In this article, we had the opportunity to test the main features that any reactive application must implement: message sending, error sending, and completion acknowledgement. We focused on plain C# programming to give a first overview of how classic reactive designs can be applied to all main application needs, such as sourcing from streams, from user input, and from changeable and infinite collections.

Resources for Article: Further resources on this subject: Basic Website using Node.js and MySQL database [article] Domain-Driven Design [article] Data Science with R [article]
Getting Started with Packages in R

Joel Carlson
18 Jul 2016
6 min read
R is a powerful programming language for loading, manipulating, transforming, and visualizing data. The language is made more powerful by its extensibility in conjunction with the efforts of a highly active open source community. This community is constantly contributing to the language in the form of packages, which are, at their core, sets of thematically linked functions. By leveraging the work that has been put in to the creation of useful open source packages, an R user can substantially improve both the readability and efficiency of their code. In this post, you will learn how to install new packages to extend the functionality of R and how to load those packages into your session. We will also explore some of the most useful packages that have been contributed by the R community! Installing Packages There are a number of places where R packages can be stored, but the three most popular locations are CRAN, Bioconductor, and GitHub. CRAN The Comprehensive R Archive Network is the home of R. At the time of this writing, there are over 8,000 packages hosted on CRAN, all of which are free to download and use. If you are looking to get started with using R in your field but don't know exactly where to start, the CRAN task view for your field or area of interest is likely a good place to start. There you will find listings of relevant packages, along with short descriptions and links to source code. Let's say you've entered the "Reproducible Research" task view and have decided that the package named knitr sounds useful. To install knitr from CRAN, you type this in your R console: install.packages("knitr") Bioconductor Bioconductor is home to over 1,000 packages for R, with a focus on packages that can be used for bioinformatics research. One of the main differences between Bioconductor and CRAN is that Bioconductor has stricter guidelines for accepting packages than CRAN. After finding a package on Bioconductor, such as EBImage, install it by running these commands: source("https://bioconductor.org/biocLite.R") biocLite("EBImage") It is possible to install from Bioconductor using install.packages, but this is not recommended for reasons discussed here. GitHub GitHub is a space where you can post the source code of your work to keep it under version control and also to encourage and facilitate collaboration. Often, GitHub is where the truly bleeding-edge packages can be found, and where package updates are put first. Many of the packages that can be found on CRAN have a development version on GitHub, occasionally with features absent from the CRAN version. As you browse GitHub, you will likely find some packages that will never be put on CRAN or Bioconductor. For this reason, caution should be exercised when using packages sourced from GitHub. Should you find a package on GitHub and wish to install it, you must first download the package devtools from CRAN. You then have access to the install_github() function, where the argument is the name of the developer, followed by a slash, and then the name of the package: install.packages("devtools") # Install swirl! See: https://github.com/swirldev/swirl devtools::install_github("swirldev/swirl") Where the syntax devtools::xxxx() simply means "Use the xxxx function from the devtools package ". You could just have easily called library(devtools) after installing and then simply typed install_github(). The devtools package also includes a number of different methods for installing packages that are stored locally, on bitbucket, in an SVN repository. 
Try typing ??devtools::install_ to see a full list. Some Popular Packages Now that you know the basic commands for installing packages, let's take a very short look at some of the more popular and useful packages. Visualizing data with ggplot2 ggplot2 is a package that is used to visualize data. It provides a method of chart-building that is intuitive (based on The Grammar of Graphics) and results in aesthetically pleasing graphics. Here is an example of a graphic produced using ggplot2: install.packages("ggplot2") # Install from CRAN library(ggplot2) # Load ggplot2 data(diamonds) # Load diamonds data set # Create plot with carat on x axis, price on y, # and color based on quality of cut ggplot(data=diamonds, aes(x=carat, y=price, col=cut)) + geom_point(alpha=0.5) # Use points (dots) to represent data Manipulating data with dplyr dplyr presents a number of verbs used for manipulating data (select, filter, mutate, arrange, summarize, and so on), each of which are common tasks when working with data. To see how dplyr can simplify your workflow, let's compare the base R versus the dplyr code used to subset the diamonds data into only those gems with Ideal cut type and greater than 2 carats: install.packages("dplyr") # Install dplyr from CRAN library(dplyr) # Load dplyr BaseR <- diamonds[which(diamonds$cut == "Ideal" & diamonds$carat > 2),] # vs: Dplyr <- filter(diamonds, cut == "Ideal" & carat > 2) Clearly the dplyr version is more succinct, more readable, and, most importantly, easier to write. Machine learning with caret The caret package is a collection of functions that unify the syntax used by many of the most popular machine learning packages implemented in R. caret will allow you to quickly prepare your data, create predictive models, tune the model parameters, and interpret the results. Here is a simple working example of training and tuning a k-nearest neighbors model with caret to predict the price of a diamond based on cut, color, and clarity: install.packages("caret") library(caret) # Split data into training and testing sets inTrain <- createDataPartition(diamonds$price, p=0.01, list=FALSE) training <- diamonds[inTrain,] testing <- diamonds[-inTrain,] knn_model <- train(price ~ cut + color + clarity, data=training, method="knn") plot(knn_model) You can see that increasing the number of neighbors in the model increases the accuracy (decreases the RMSE, a method of measuring the average distance between predictions and data). Summary In this post, you learned how to install and load packages from three different major sources: CRAN, Bioconductor, and GitHub. You also took a brief look at three popular packages: ggplot2 for visualization, dplyr for manipulation, and caret for machine learning. About the author Joel Carlson is a recent MSc graduate from Seoul National University, and current Data Science Fellow at Galvanize in San Francisco. He has contributed two R packages to CRAN (radiomics and RImagePalette). You can learn more or contact him at his personal website.
Overview of Certificate Management

Packt
18 Jul 2016
24 min read
In this article by David Steadman and Jeff Ingalls, the authors of Microsoft Identity Manager 2016 Handbook, we will look at certificate management in brief. Microsoft Identity Management (MIM)—certificate management (CM)—is deemed the outcast in many discussions. We are here to tell you that this is not the case. We see many scenarios where CM makes the management of user-based certificates possible and improved. If you are currently using FIM certificate management or considering a new certificate management deployment with MIM, we think you will find that CM is a component to consider. CM is not a requirement for using smart cards, but it adds a lot of functionality and security to the process of managing the complete life cycle of your smart cards and software-based certificates in a single forest or multiforest scenario. In this article, we will look at the following topics: What is CM? Certificate management components Certificate management agents The certificate management permission model (For more resources related to this topic, see here.) What is certificate management? Certificate management extends MIM functionality by adding management policy to a driven workflow that enables the complete life cycle of initial enrollment, duplication, and the revocation of user-based certificates. Some smart card features include offline unblocking, duplicating cards, and recovering a certificate from a lost card. The concept of this policy is driven by a profile template within the CM application. Profile templates are stored in Active Directory, which means the application already has a built-in redundancy. CM is based on the idea that the product will proxy, or be the middle man, to make a request to and get one from CA. CM performs its functions with user agents that encrypt and decrypt its communications. When discussing PKI (Public Key Infrastructure) and smart cards, you usually need to have some discussion about the level of assurance you would like for the identities secured by your PKI. For basic insight on PKI and assurance, take a look at http://bit.ly/CorePKI. In typical scenarios, many PKI designers argue that you should use Hardware Security Module (HSM) to secure your PKI in order to get the assurance level to use smart cards. Our personal opinion is that HSMs are great if you need high assurance on your PKI, but smart cards increase your security even if your PKI has medium or low assurance. Using MIM CM with HSM will not be covered in this article, but if you take a look at http://bit.ly/CMandLunSA, you will find some guidelines on how to use MIM CM and HSM Luna SA. The Financial Company has a low-assurance PKI with only one enterprise root CA issuing the certificates. The Financial Company does not use a HSM with their PKI or their MIM CM. If you are running a medium- or high-assurance PKI within your company, policies on how to issue smart cards may differ from the example. More details on PKI design can be found at http://bit.ly/PKIDesign. Certificate management components Before we talk about certificate management, we need to understand the underlying components and architecture: As depicted before, we have several components at play. We will start from the left to the right. From a high level, we have the Enterprise CA. The Enterprise CA can be multiple CAs in the environment. Communication from the CM application server to the CA is over the DCOM/RPC channel. 
End user communication can be with the CM web page or with a new REST API via a modern client to enable the requesting of smart cards and the management of these cards. From the CM perspective, the two mandatory components are the CM server and the CA modules. Looking at the logical architecture, we have the CA, and underneath this, we have the modules. The policy and exit module, once installed, control the communication and behavior of the CA based on your CM's needs. Moving down the stack, we have Active Directory integration. AD integration is the nuts and bolts of the operation. Integration into AD can be very complex in some environments, so understanding this area and how CM interacts with it is very important. We will cover the permission model later in this article, but it is worth mentioning that most of the configuration is done and stored in AD along with the database. CM uses its own SQL database, and the default name is FIMCertificateManagement. The CM application uses its own dedicated IIS application pool account to gain access to the CM database in order to record transactions on behalf of users. By default, the application pool account is granted the clmApp role during the installation of the database, as shown in the following screenshot:   In CM, we have a concept called the profile template. The profile template is stored in the configuration partition of AD, and the security permissions on this container and its contents determine what a user is authorized to see. As depicted in the following screenshot, CM stores the data in the Public Key Services (1) and the Profile Templates container. CM then reads all the stored templates and the permissions to determine what a user has the right to do (2): Profile templates are at the core of the CM logic. The three components comprising profile templates are certificate templates, profile details, and management policies. The first area of the profile template is certificate templates. Certificate templates define the extensions and data point that can be included in the certificate being requested. The next item is profile details, which determines the type of request (either a smart card or a software user-based certificate), where we will generate the certificates (either on the server or on the client side of the operations), and which certificate templates will be included in the request. The final area of a profile template is known as management policies. Management policies are the workflow engine of the process and contain the manager, the subscriber functions, and any data collection items. The e-mail function is initiated here and commonly referred to as the One Time Password (OTP) activity. Note the word "One". A trigger will only happen once here; therefore, multiple alerts using e-mail would have to be engineered through alternate means, such as using the MIM service and expiration activities. The permission model is a bit complex, but you'll soon see the flexibility it provides. Keep in mind that Service Connection Point (SCP) also has permissions applied to it to determine who can log in to the portal and what rights the user has within the portal. SCP is created upon installation during the wizard configuration. You will want to be aware of the SCP location in case you run into configuration issues with administrators not being able to perform particular functions. 
The SCP location is in the System container, within Microsoft, and within Certificate Lifecycle Manager, as shown here:

Typical location: CN=Certificate Lifecycle Manager,CN=Microsoft,CN=System,DC=THEFINANCIALCOMPANY,DC=NET

Certificate management agents

We covered several key components of the profile templates and where some of the permission model is stored. We now need to understand how the separation of duties is defined within the agent roles. The permission model provides granular control, which promotes the separation of duties. CM uses six agent accounts, and they can be named to fit your organization's requirements. We will walk through the initial setup again later in this article so that you can use our setup or alter it based on your needs. The Financial Company only requires the typical setup. We precreated the following accounts for TFC, but the wizard will create them for you if you do not use them. During the installation and configuration of CM, we will use the following accounts:

Besides the separation of duties, CM offers enrollment by proxy. Proxy enrollment of a request refers to providing a middle man that gives the end user a fluid workflow during enrollment. Most of this proxying is accomplished via the agent accounts in one way or another. The first account is MIM CM Agent (MIMCMAgent), which is used by the CM server to encrypt data, from the smart card admin PINs to the data collection stored in the database. So, the agent account has the important role of protecting data and communication to and from the certificate authorities. The last role the CM agent account has is the capability to revoke certificates. The agent certificate thumbprint is very important, and you need to make sure the correct value is updated in the three areas: CM, web.config, and the certificate policy module under the Signing Certificates tab on the CA. We have identified these areas in the following. For web.config:

<add key="Clm.SigningCertificate.Hash" value
<add key="Clm.Encryption.Certificate.Hash" value
<add key="Clm.SmartCard.ExchangeCertificate.Hash" value

The Signing Certificates tab is as shown in the following screenshot:

Now, when you run through the configuration wizard, these items are already updated, but it is good to know which locations need to be updated if you need to troubleshoot agent issues or even update/renew this certificate. The second account we want to look at is Key Recovery Agent (MIMCMKRAgent); this agent account is needed for CM to recover any archived private key certificates. Now, let's look at Enrollment Agent (MIMCMEnrollAgent); the main purpose of this agent account is to provide the enrollment of smart cards. Enrollment Agent, as we call it, is responsible for signing all smart card requests before they are submitted to the CA. The typical permission for this account on the CA is read and request. Authorization Agent (MIMCMAuthAgent)—or as some folks call it, the authentication agent—is responsible for determining access rights for all objects from a DACL perspective. When you log in to the CM site, it is the authorization account's job to determine what you have the right to do based on all the ACLs applied to the core components. We will go over all the agent accounts and the rights needed later in this article during our setup. CA Manager Agent (MIMCMManagerAgent) is used to perform core CA functions. More importantly, its job is to issue Certificate Revocation Lists (CRLs). This happens when a smart card or certificate is retired or revoked. 
It is up to this account to make sure the CRL is updated with this critical information. We saved the best for last: Web Pool Agent (MIMCMWebAgent). This agent is used to run the CM web application. The agent is the account that contacts the SQL server to record all user and admin transactions. The following is a good depiction of all the accounts together and their high-level functions:

The certificate management permission model

In CM, we think this part is the most complex, because the implementation can be as granular as you like. For this reason, this area is the most difficult to understand. We will uncover the permission model so that we can begin to understand how it works within CM. When looking at CM, you need to formulate the type of management model you will be deploying. What we mean by this is: will you have a centralized or a delegated model? This plays a key part in deployment planning for CM and the permissions you will need to apply. In the centralized model, a specific set of managers is assigned all the rights for the management policy. This includes permissions on the users. Most environments use this method, as it is less complex. Now, within this model, we have manager-initiated permission, and this is where CM permissions are assigned to groups containing the subscribers. Subscribers are the actual users doing the enrollment or participating in the workflow. This is the model that The Financial Company will use in its configuration. The delegated model is created by updating two flags in web.config called clm.RequestSecurity.Flags and clm.RequestSecurity.Groups. These two flags work hand in hand: if you have UseGroups, then it will evaluate all the groups within the forests, including universal/global security groups. Now, if you use UseGroups and define clm.RequestSecurity.Groups, then it will only look for these specific groups and evaluate them via the Authorization Agent. The user setting, by contrast, tells the Authorization Agent to only read the permissions on the user and ignore any group membership permissions:

When we continue to look at permissions, there are five locations where permissions can be applied. The preceding figure outlines these locations; we will go into more depth in the subsections in a bit. The point of the figure is to understand each location and what permissions can be applied there. The following are the areas and the permissions that can be set:

Service Connection Point: Extended Permissions
Users or Groups: Extended Permissions
Profile Template Objects: Container: Read or Write; Template Object: Read/Write or Enroll
Certificate Template: Read or Enroll
CM Management Policy within the Web application: We have multiple options based on the need, such as Initiate Request

Now, let's begin to discuss the core areas to understand what they can do, so that The Financial Company can design the enrollment options they want. In the example, we will use the main scenarios we encounter: the helpdesk, manager, and user (subscriber) based scenarios. For example, certain functions are delegated to the helpdesk to allow them to assist the user base without giving them full control over the environment (delegated model). Remember this as we look at the five core permission areas.

Creating service accounts

So far, in our MIM deployment, we have created quite a few service accounts. MIM CM, however, requires that we create a few more. 
During the configuration wizard, we will get the option of having the wizard create them for us, but we always recommend creating them manually in FIM/MIM CM deployments. One reason is that a few of these need to be assigned certificates. If we use an HSM, we have to create them manually in order to make sure the certificates are indeed using the HSM. The wizard will ask for six different service accounts (agents), but we actually need seven. In The Financial Company, we created the following seven accounts to be used by FIM/MIM CM:

MIMCMAgent
MIMCMAuthAgent
MIMCMCAManagerAgent
MIMCMEnrollAgent
MIMCMKRAgent
MIMCMWebAgent
MIMCMService

The last one, MIMCMService, will not be used during the configuration wizard, but it will be used to run the MIM CM Update service. We also created the following security groups to help us out in the scenarios we will go over:

MIMCM-Helpdesk: This is the next step in OTP for subscribers
MIMCM-Managers: These are the managers of the CM environment
MIMCM-Subscribers: This is a group of users that will enroll

Service Connection Point

The Service Connection Point (SCP) is located under the Systems folder within Active Directory. This location, as discussed in the earlier parts of the article, defines who functions as a user as it relates to logging in to the web application. As an example, if we just wanted every user to only log in, we would give them read rights. Again, authenticated users have this by default, but if you only want a subset of users to have access, you should remove authenticated users and add your group. When you run the configuration wizard, the SCP is decided, but the default is the one shown in the following screenshot:

If a user is assigned any of the MIM CM permissions available on the SCP, the administrative view of the MIM CM portal will be shown. The MIM CM permissions are defined in a Microsoft TechNet article at http://bit.ly/MIMCMPermission. For your convenience, we have copied parts of the information here:

MIM CM Audit: This generates and displays MIM CM policy templates, defines management policies within a profile template, and generates MIM CM reports.
MIM CM Enrollment Agent: This performs certificate requests for the user or group on behalf of another user. The issued certificate's subject contains the target user's name and not the requester's name.
MIM CM Request Enroll: This initiates, executes, or completes an enrollment request.
MIM CM Request Recover: This initiates encryption key recovery from the CA database.
MIM CM Request Renew: This initiates, executes, or completes an enrollment request. The renewal request replaces a user's certificate that is near its expiration date with a new certificate that has a new validity period.
MIM CM Request Revoke: This revokes a certificate before the expiration of the certificate's validity period. This may be necessary, for example, if a user's computer or smart card is stolen.
MIM CM Request Unblock Smart Card: This resets a smart card's user Personal Identification Number (PIN) so that he/she can access the key material on a smart card.

The Active Directory extended permissions

So, even if you have the SCP defined, we still need to set up the permissions on the user or group of users that we want to manage. As in our helpdesk example, if we want to perform certain functions, the most common one is offline unblock. This would require the MIMCM-HelpDesk group. We will create this group later in this article.
It would contain all helpdesk users; then, on the SCP, we would give them CM Request Unblock Smart Card and CM Enrollment Agent. Then, you need to assign the extended permissions on MIMCM-Subscribers, which contains all the users we plan to manage with the helpdesk and offline unblock:

So, as you can see, we are getting into redundant permissions, but depending on the location, they determine what the user can do. So, planning of the model is very important. Also, it is important to document what you have, as with some slight tweaks, things can and will break.

The certificate templates permission

In order for any of this to be possible, we still need to give permission to the manager of the user to enroll or read the certificate template, as this will be added to the profile template. For anyone to manage this certificate, everyone will need read and enroll permissions. This is pretty basic, but that is it, as shown in the following screenshot:

The profile template permission

The profile template determines what a user can read within the template. To get to the profile template, we need to use Active Directory Sites and Services to manage profile templates. We need to activate the Services node, as this is not shown by default; to do this, we will click on View | Show Services Node:

As an example, if you want a user to enroll in the cert, he/she would need CM Enroll on the profile template, as shown in the following screenshot:

Now, this is for users, but let's say you want to delegate the creation of profile templates. For this, all you need to do is give the MIMCM-Managers group the delegated right to create all child items on the profile template container, as follows:

The management policy permission

For the management policy, we will break it down into two sections: a software-based policy and a smart card management policy. As we have different capabilities within CM based on the type, by default, CM comes with two sample policies (take a look at the following screenshot), which we use for duplication to create a new one. When configuring, it is good to know that you cannot combine software and smart card-based certificates in a policy:

The software management policy

The software-based certificate policy has the following policies available through the CM life cycle:

The Duplicate Policy panel creates a duplicate of all the certificates in the current profile. Now, if the first profile is created for the user, all the other profiles created afterwards will be considered duplicates, and the first generated policy will be primary.

The Enroll Policy panel defines the initial enrollment steps for certificates, such as initiating the enroll request and data collection during enroll initiation.

The Online Update Policy panel is part of the automatic policy function when key items in the policy change. This includes certificates about to expire, and when a certificate is added to the existing profile template or even removed.

The Recover Policy panel allows for the recovery of the profile in the event that the user was deleted. This includes the cases where certs are deleted by accident. One thing to point out is that if the certificate was a signing cert, the recovery policy would issue a new replacement cert. However, if the cert was used for encryption, you can recover the original using this policy.

The Recover On Behalf Policy panel allows managers or helpdesk operators to perform recovery on behalf of the user in the event that they need any of the certificates.
The Renew Policy panel is the workflow that defines the renewal settings, such as revocation and who can initiate a request.

The Suspend and Reinstate Policy panel enables a temporary revocation of the profile and puts the certificate in a "certificate hold" status. More information about the CRL status can be found at http://bit.ly/MIMCMCertificateStatus.

The Revoke Policy panel maintains the revocation policy and settings, such as being able to set the revocation reason and delay. Also, it allows the system to push a delta CRL. You can also define the initiators for this policy workflow.

The smart card management policy

The smart card policy has some similarities to the software-based policy, but it also has a few new workflows to manage the full life cycle of the smart card:

The Profile Details panel is by far the most commonly used part in this section of the policy, as it defines all the smart card certificates that will be loaded in the policy along with the type of provider. One key item is creating and destroying virtual smart cards. One final key part is diversifying the admin key. This is best practice, as it secures the admin PIN using diversification. So, before we continue, we want to go over this setting, as we think it is an important topic.

Diversifying the admin key is important because each card or batch of cards comes with a default admin key. Smart cards may have several PINs: an admin PIN, a PIN unlock key (PUK), and a user PIN. This admin key, as CM refers to it, is also known as the administrator PIN. This PIN differs from the user's PIN. When personalizing the smart card, you configure the admin key, the PUK, and the user's PIN. The admin key and the PUK are used to reset the virtual smart card's PIN. However, you cannot configure both. You must use the PUK to unlock the PIN if you assign one during the virtual smart card's creation. It is important to note that you must use the PUK to reset the PIN if you provide both a PUK and an admin key. During the configuration of the profile template, you will be asked to enter this key as follows:

The admin key is typically used by smart card management solutions that enable a challenge-response approach to PIN unlocking. The card provides a set of random data that the user reads (after the verification of identity) to the deployment admin. The admin then encrypts the data with the admin key (obtained as mentioned before) and gives the encrypted data back to the user. If the encrypted data matches that produced by the card during verification, the card will allow PIN resetting. As the admin key is never in the hands of anyone other than the deployment administrator, it cannot be intercepted or recorded by any other party (including the employee) and thus has significant security benefits beyond those of using a PUK—an important consideration during the personalization process.

When enabled, the admin key is set to a card-unique value when the card is assigned to the user. The option to diversify admin keys with the default initialization provider allows MIM CM to use an algorithm to uniquely generate a new key on the card. The key is encrypted and securely transmitted to the client. It is not stored in the database or anywhere else. MIM CM recalculates the key as needed to manage the card:

The CM profile template contains a thumbprint for the certificate to be used in admin key diversification. CM looks in the personal store of the CM agent service account for the private key of the certificate in the profile template.
Once located, the private key is used to calculate the admin key for the smart card. The admin key allows CM to manage the smart card (issuing, revoking, retiring, renewing, and so on). Loss of the private key prevents the management of cards diversified using this certificate. More detail on the control can be found at http://bit.ly/MIMCMDiversifyAdminKey.

Continuing on, the Disable Policy panel defines the termination of the smart card before expiration; you can define the reason if you choose. Once disabled, it cannot be reused in the environment.

The Duplicate Policy panel, similarly to the software-based one, produces a duplicate of all the certificates that will be on the smart card.

The Enroll Policy panel, similarly to the software policy, defines who can initiate the workflow, as well as printing options.

The Online Update Policy panel, similarly to the software-based cert, allows for the updating of certificates if the profile template is updated. The update is triggered when a renewal happens or, similarly to the software policy, when a cert is added or removed.

The Offline Unblock Policy panel is the configuration of a process to allow offline unblocking. This is used when a user is not connected to the network. This process only supports Microsoft-based smart cards, with challenge questions and answers exchanged via, in most cases, the user calling the helpdesk.

The Recovery On Behalf Policy panel allows the recovery of certificates for management or the business, for example, if the cert is needed to decrypt information from a user whose contract was terminated or who left the company.

The Replace Policy panel is utilized to replace a user's certificate in the event of them losing their card. If the card they had contained a signing cert, then a new signing cert would be issued on the new card. As with software certs, if the certificate type is encryption, then it would need to be restored via the replace policy.

The Renew Policy panel will be used when the profile/certificate is in the renewal period; it defines revocation details and options as well as initiate permissions.

The Suspend and Reinstate Policy panel is the same as in the software-based policy, for putting the certificate on hold.

The Retire Policy panel is similar to the disable policy, but a key difference is that this policy allows the card to be reused within the environment.

The Unblock Policy panel defines the users that can perform an actual unblocking of a smart card.

More in-depth detail of these policies can be found at http://bit.ly/MIMCMProfiletempates.

Summary

In this article, we uncovered the basics of certificate management and the management components that are required to successfully deploy a CM solution. Then, we discussed and outlined the agent accounts and the roles they play. Finally, we looked into the management permission model, from the policy template to the permissions and the workflow.

Resources for Article:

Further resources on this subject:
Managing Network Devices [article]
Logging and Monitoring [article]
Creating Horizon Desktop Pools [article]

MicroStrategy 10

Packt
15 Jul 2016
13 min read
In this article by Dmitry Anoshin, Himani Rana, and Ning Ma, the authors of the book Mastering Business Intelligence with MicroStrategy, we are going to talk about MicroStrategy 10, one of the leading platforms on the market: it can handle all data analytics demands and offers a powerful solution. We will be discussing different concepts of MicroStrategy, such as its history, deployment, and so on.

(For more resources related to this topic, see here.)

Meet MicroStrategy 10

MicroStrategy is a market leader in Business Intelligence (BI) products. It has rich functionality in order to meet the requirements of modern businesses. In 2015, MicroStrategy provided a new release of MicroStrategy, version 10. It offers both agility and governance like no other BI product. In addition, it is easy to use and enterprise ready. At the same time, it is great for both IT and business. In other words, MicroStrategy 10 offers an analytics platform that combines an easy and empowering user experience with enterprise-grade performance, management, and security capabilities. It is true bimodal BI and moves seamlessly between styles:

Data discovery and visualization
Enterprise reporting and dashboards
In-memory high performance BI
Scales from departments to enterprises
Administration and security

MicroStrategy 10 consists of three main products: MicroStrategy Desktop, MicroStrategy Mobile, and MicroStrategy Web. MicroStrategy Desktop lets users start discovering and visualizing data instantly. It is available for Mac and PC. It allows users to connect, prepare, discover, and visualize data. In addition, we can easily promote to a MicroStrategy Server. Moreover, MicroStrategy Desktop has a brand new HTML5 interface and includes all connection drivers. It allows us to use data blending, data preparation, and data enrichment. Finally, it has powerful advanced analytics and can be integrated with R.

To cut a long story short, we want to highlight the main changes in the new BI platform. The developer tools keep the same functionality, look, and architecture; the changes are all about the Web interface and the Intelligence Server. Let's look closer at what MicroStrategy 10 can show us.

MicroStrategy 10 expands the analytical ecosystem by using third-party toolkits such as:

Data visualization libraries: We can easily plug in and use any visualization from the expanding range of Java libraries
Statistical toolkits: R, SAS, SPSS, KXEN, and others
Geolocation data visualization: Uses mapping capabilities to visualize and interact with location data

MicroStrategy 10 has more than 25 new data sources that we can connect to quickly and simply. In addition, it allows us to build reports on top of other BI tools, such as SAP Business Objects, Cognos, and Oracle BI. It has a new connector to Hadoop, which uses the native connector. Moreover, it allows us to blend multiple data sources in-memory. MicroStrategy 10 also has rich functionality for working with data, such as:

Streamlined workflows to parse and prepare data
Multi-table in-memory support from different sources
Automatically parse and prepare data with every refresh
100+ inbuilt functions to profile and clean data
Create custom groups on the fly without coding

In terms of connecting to Hadoop, most BI products use Hive or Impala ODBC drivers in order to use SQL to get data from Hadoop. However, this method is bad in terms of performance. MicroStrategy 10 queries directly against Hadoop. As a result, it is up to 50 times faster than via ODBC.
Let's look at some of the main technical changes that have significantly improved MicroStrategy. The platform is now faster than ever before, because it doesn't have a two-billion-row limit on in-memory datasets and allows us to create analytical cubes up to 16 times bigger in size. It publishes cubes dramatically faster. Moreover, MicroStrategy 10 has higher data throughput, and cubes can be loaded in parallel 4 times faster with multi-threaded parallel loading. In addition, the in-memory engine allows us to create cubes 80 times larger than before, and we can access data from cubes 50% faster by using up to 8 parallel threads. Look at the following table, where we compare in-memory cube functionality in version 9 versus version 10:

Feature          Ver. 9        Ver. 10
Data volume      100 GB        ~2 TB
Number of rows   2 billion     200 billion
Load rate        8 GB/hour     ~200 GB/hour
Data model       Star schema   Any schema, tabular or multiple sets

In order to make the administration of MicroStrategy more effective in the new version, MicroStrategy Operations Manager was released. It gives MicroStrategy administrators powerful development tools to monitor, automate, and control systems. Operations Manager gives us:

Centralized management in a web browser
Enterprise Manager Console within the tool
Triggers and 24/7 alerts
System health monitors
Server management
Multiple environment administration

MicroStrategy 10 education and certification

MicroStrategy 10 offers new training courses that can be conducted offline in a training center, or online at http://www.microstrategy.com/us/services/education. We believe that certification is a good thing on your journey. The following certifications now exist for version 10:

MicroStrategy 10 Certified Associate Analyst
MicroStrategy 10 Certified Application Designer
MicroStrategy 10 Certified Application Developer
MicroStrategy 10 Certified Administrator

After passing all of these exams, you will become a MicroStrategy 10 Application Engineer. More details can be found here: http://www.microstrategy.com/Strategy/media/downloads/training-events/MicroStrategy-certification-matrix_v10.pdf.

History of MicroStrategy

Let us briefly look at the history of MicroStrategy, which began in 1991:

1991: Released first BI product, which allowed users to create graphical views and analyses of information data
2000: Released MicroStrategy 7 with a web interface
2003: First to release a fully integrated reporting tool, combining list reports, BI-style dashboards, and interface analyses in a single module
2005: Released MicroStrategy 8, including one-click actions and drag-and-drop dashboard creation
2009: Released MicroStrategy 9, delivering a seamless consolidated path from department to enterprise BI
2010: Unveiled new mobile BI capabilities for iPad and iPhone, and was featured on the iTunes Bestseller List
2011: Released MicroStrategy Cloud, the first SaaS offering from a major BI vendor
2012: Released Visual Data Discovery and groundbreaking new security platform, Usher
2013: Released expanded Analytics Platform and free Analytics Desktop client
2014: Announced availability of MicroStrategy Analytics via Amazon Web Services (AWS)
2015: MicroStrategy 10 was released, the first ever enterprise analytics solution for centralized and decentralized BI

Deploying MicroStrategy 10

We know only one way to master MicroStrategy: through practical exercises. Let's start by downloading and deploying MicroStrategy 10.2.
Overview of training architecture

In order to master MicroStrategy and learn about some BI considerations, we need to download the all-important software, deploy it, and connect to a network. During the preparation of the training environment, we will cover the installation of MicroStrategy on a Linux operating system. This is very good practice, because many people work with Windows and are not familiar with Linux, so this chapter will provide additional knowledge of working with Linux, as well as installing MicroStrategy and a web server. Look at the training architecture: there are three main components:

Red Hat Linux 6.4: Used for deploying the web server and Intelligence Server.
Windows machine: Runs the MicroStrategy Client and an Oracle database.
Virtual machine with Hadoop: A ready-made virtual machine with Hadoop, which will connect to MicroStrategy using a brand new connection.

In the real world, we should use separate machines for every component, and sometimes several machines in order to run one component. This is called clustering. Let's create a virtual machine.

Creating a Red Hat Linux virtual machine

Let's create a virtual machine with Red Hat Linux, which will host our Intelligence Server:

Go to http://www.redhat.com/ and create an account
Go to the software download center: https://access.redhat.com/downloads
Download RHEL: https://access.redhat.com/downloads/content/69/ver=/rhel---7/7.2/x86_64/product-software
Choose Red Hat Enterprise Linux Server
Download Red Hat Enterprise Linux 6.4 x86_64
Choose Binary DVD

Now we can create a virtual machine with RHEL 6.4. We have several options when choosing the software for deploying the virtual machine. In our case, we will use VMware Workstation. Before starting to deploy a new VM, we should adjust the default settings, such as increasing RAM and HDD, and adding one more network card in order to connect the external environment with the MicroStrategy Client and sample database. In addition, we should create a new network.

When the deployment of the RHEL virtual machine is complete, we should activate a subscription in order to install the required packages. Let us do this with one command in the terminal:

# subscription-manager register --username <username> --password <password> --auto-attach

Performing prerequisites for MicroStrategy 10

According to the installation and configuration guide, we should deploy all the necessary packages. In order to install them, we should execute the following commands as root:

# su
# yum install compat-libstdc++-33.i686
# yum install libXp.x86_64
# yum install elfutils-devel.x86_64
# yum install libstdc++-4.4.7-3.el6.i686
# yum install krb5-libs.i686
# yum install nss-pam-ldapd.i686
# yum install ksh.x86_64

The project design process

Project design is not just about creating a project in MicroStrategy Architect; it involves several steps and thorough analysis, such as how data is stored in the data warehouse, what reports the user wants based on the data, and so on. The following are the steps involved in our project design process:

Logical data model design

Once the business requirements are documented, the user must create a fact qualifier matrix to identify the attributes, facts, and hierarchies, which are the building blocks of any logical data model. An example of a fact qualifier is as follows:

A logical data model is created based on the source systems and designed before defining a data warehouse.
So, it's good for seeing which objects the users want and checking whether the objects are in the source systems. It represents the definition, characteristics, and relationships of the data. This graphical representation of information is easily understandable by business users too. A logical data model graphically represents the following concepts:

Attributes: Provide a detailed description of the data
Facts: Provide numerical information about the data
Hierarchies: Provide relationships between data

Data warehouse schema design

Physical data warehouse design is based on the logical data model and represents the storage and retrieval of data from the data warehouse. Here, we determine the optimal schema design, which ensures reporting performance and maintenance. The key components of a physical data warehouse schema are columns and tables:

Columns: These store attribute and fact data. The following are the three types of columns:
ID column: Stores the ID for an attribute
Description column: Stores the text description of the attribute
Fact column: Stores fact data

Tables: Physical groupings of related data. The following are the types of tables:
Lookup tables: Store information about attributes, such as IDs and descriptions
Relationship tables: Store information about the relationship between two or more attributes
Fact tables: Store factual data and the level of aggregation, which is defined based on the attributes of the fact table. They contain base fact columns or derived fact columns:
Base fact: Stores the data at the lowest possible level of detail
Aggregate fact: Stores data at a higher or summarized level of detail

Mobile server installation and configuration

While the mobile client is easy to install, the mobile server is not. Here we provide a step-by-step guide on how to install the mobile server:

Download MicroStrategyMobile.war. The mobile server is packed in a WAR file, just like Operations Manager or Web.
Copy MicroStrategyMobile.war from <Microstrategy Installation folder>/Mobile/MobileServer to /usr/local/tomcat7/webapps. Then restart Tomcat by issuing the ./shutdown.sh and ./startup.sh commands.
Connect to the mobile server. Go to http://192.168.81.134:8080/MicroStrategyMobile/servlet/mstrWebAdmin. Then add the server name localhost.localdomain and click connect.
Configure the mobile server. You can configure (1) authentication settings for the mobile server application; (2) privileges and permissions; (3) SSL encryption; (4) client authentication with a certificate server; (5) the destination folder for the photo uploader widget and signature capture input control.

Performing Pareto analysis

One good thing about data discovery tools is their agile approach to the data. We can connect any data source and easily slice and dice data. Let's try to use the Pareto principle in order to answer the question: how are sales distributed among the different products? The Pareto principle states that, for many events, roughly 80% of results come from 20% of the causes. For example, 80% of profits come from 20% of the products offered. This type of analysis is very popular in product analytics. In MicroStrategy Desktop, we can use shortcut metrics in order to quickly make complex calculations such as running sums or a percent of the total. Let's build a visualization in order to see the 20% of products that bring us 80% of the money:

Choose Combo Chart.
Drag and drop Salesamount to the vertical and Englishproductname to the horizontal.
Add Orderdate to the filters and restrict to 60 days.
Right-click on Sales amount and choose Descending Sort.
Right-click on Salesamount | Shortcut Metrics | Percent Running Total.
Drag and drop Metric Names to the Color By.
Change the color of Salesamount and Percent Running Total.
Change the shape of Percent Running Total.

As a result, we get this chart:

From this chart we can quickly identify the top 20% of products which bring us 80% of revenue.

Splunk and MicroStrategy

MicroStrategy 10 has announced a new connection to Splunk. I suppose that Splunk is not very popular in the world of Business Intelligence. Most people who have heard about Splunk think that it is just a platform for processing logs. The answer is both true and false. Splunk's name was derived from the world of spelunking, because searching for root causes in logs is a kind of spelunking without light, and Splunk solves this problem by indexing machine data from a tremendous number of data sources, such as applications, hardware, sensors, and so on.

What is Splunk

Splunk's goal is making machine data accessible, usable, and valuable for everyone, and turning machine data into business value. It can:

Collect data from anywhere
Search and analyze everything
Gain real-time Operational Intelligence

In the BI world, everyone knows what a data warehouse is.

Creating reports from Splunk

Now we are ready to build reports using MicroStrategy Desktop and Splunk. Let's do it:

Go to MicroStrategy Desktop, click add data, and choose Splunk.
Create a connection using the existing DSN based on the Splunk ODBC driver.
Choose one of the tables (Splunk reports).
Add other tables as new data sources.

Now we can build a dashboard using data from Splunk by dragging and dropping attributes and metrics:

Summary

In this article, we looked at MicroStrategy 10 and its features. We learned about its history and deployment. We also learned about the project design process, Pareto analysis, and the connection between Splunk and MicroStrategy.

Resources for Article:

Further resources on this subject:
Stacked Denoising Autoencoders [article]
Creating external tables in your Oracle 10g/11g Database [article]
Clustering Methods [article]

Building a Line Chart with ggplot2

Joel Carlson
15 Jul 2016
6 min read
In this blog post, you will follow along to produce a line chart using the ggplot2 package for R. The ggplot2 package is highly customizable and extensible, and provides an intuitive plotting syntax that allows for the creation of an incredibly diverse range of plots.

Motivating example

Before getting started, let's examine the advantages of ggplot2 over the base R plotting functions. In general, the base R plotting system is more verbose, harder to understand, and produces plots that are less attractive than their ggplot2 equivalents. To illustrate, let's build a plot using data on the growth of five trees from the "datasets" package. This is just a demonstration, so don't worry too much about the structure of the data or the details of the plotting syntax. Take a look at the following:

library(datasets)
data("Orange")

The goal is to plot the growth of the trees as a line chart where each line corresponds to a different tree over time. Consider the following code to produce this chart using the base R plotting system:

# Adapted from: http://www.statmethods.net/graphs/line.html
ntrees <- length(unique(Orange$Tree))

# Get the range for the x and y axis
xrange <- range(Orange$age)
yrange <- range(Orange$circumference)

# Set up the plot
plot(xrange, yrange, type="n",
     xlab="Age (days)",
     ylab="Circumference (mm)")

colors <- rainbow(ntrees)

# Add lines
for (i in 1:ntrees) {
  tree <- subset(Orange, Tree==i)
  lines(tree$age, tree$circumference, col=colors[i])
}

# Add title
title("Tree Growth (Base R)")

# Add legend
legend(xrange[1], yrange[2], 1:ntrees, cex=0.8, col=colors, lty=1, title="Tree")

The code is verbose, difficult to extend or change (for example, if you want to change the lines to points, you would need to change a number of variables), and the chart produced is not particularly attractive. The following is an equivalent chart using ggplot2:

Using ggplot2, you can produce this plot with fewer lines of code that are both more readable and extensible. You will also avoid the ugly "for" loop used to produce the lines. By the end of this post, you will have built this plot from the ground up using ggplot2!

Installation and preparation

For this post, you will first need to make sure that ggplot2 is installed via the following command:

install.packages("ggplot2")

Once the package is installed, load it into the session using:

library(ggplot2)

Data

The dataset used in this post is already in the "tidy data" format, as described here. If your data is not in the tidy format, consider using the dplyr and/or tidyr packages to shape it into the correct format. You are using a very small dataset called Orange, which, as the preceding plots describe, contains the growth patterns of five trees over several years. The data consist of 35 rows and three columns and are found in the datasets package. The structure of the data is as follows:

str(Orange)
'data.frame': 35 obs. of 3 variables:
 $ Tree         : Ord.factor w/ 5 levels "1"<"2"<"3"<"4"<..: 1 1 1 1 1 1 1 2 2 2 ...
 $ age          : num 118 484 664 1004 1231 ...
 $ circumference: num 30 58 87 115 120 142 145 33 69 111 ...

Building plots

You will now begin building up the previous plot using principles described in "The Grammar of Graphics", upon which ggplot2 is based.
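As an optional quick check before going further, you can see what a bare ggplot() call produces. This is a minimal sketch that only assumes ggplot2 and the Orange data have been loaded as shown above:

# A bare ggplot() call: the data is attached, but nothing is drawn yet
p <- ggplot(data = Orange)
p   # prints an empty grey panel, since no aesthetics or geometries are defined

Everything that follows adds aesthetic mappings and geometry layers on top of this empty object.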
To build a plot using ggplot, think about it in terms of aesthetic mappings and geometries, which are used to create the layers that make up the plot. Calling ggplot() without any aesthetics or geometries defined provides an empty canvas.

Aesthetics and geometries

Aesthetics are the visual properties (for example, size, shape, color, fill, and so on) of the geometries present in the graph. In this context, a geometry refers to objects that directly represent data points (that is, rows in a data frame), such as dots, lines, or bars. In ggplot2, create aesthetics using the aes() function. Inside aes(), you define which variables will map to aesthetics in the plot. Here, we wish to map the "age" variable to the x-axis aesthetic, the "circumference" variable to the y-axis aesthetic, and the "Tree" factor variable to the color aesthetic, with each factor level being represented by a different color, as follows:

p <- ggplot(data = Orange, aes(x=age, y=circumference, col=Tree))

If you run the code after defining only the aesthetics, you will see that there is nothing on the plot except the axes:

This is because although you have mapped aesthetics to data, you have yet to represent these mappings with geometries (or geoms). To create this representation, you add a layer to the plot using a call to the line geometry, the geom_line() function, as follows:

p <- p + geom_line()
p

Take a look at the full listing of geoms that can be used here.

Polishing the plot

With the structure of the plot in place, polish the plot by:

Editing the axis labels
Adding a title
Moving the legend

Axis labels and the title

You can create/change the axis labels of the plot using labs(), as follows:

p <- p + labs(x="Age (days)", y="Circumference (mm)")

You can also add a title using ggtitle(), as follows:

p <- p + ggtitle("Tree Growth (ggplot2)")
p

Moving the legend

To move the legend, use the theme() function and change the legend.justification and legend.position variables via the following code:

p <- p + theme(legend.justification=c(0,1), legend.position=c(0,1))
p

The justification for the legend is laid out as a grid, where (0,0) is lower-left and (1,1) is upper-right. The legend.position parameter can also take values such as "top", "bottom", "left", "right", or "none" (which removes the legend entirely). The theme() function is very powerful and allows very fine-grained control over the plot. You can find a listing of all the available parameters in the documentation here.

Final words

The plot is now identical to the plot used to motivate the article! The final code is as follows:

ggplot(data=Orange, aes(x=age, y=circumference, col=Tree)) +
  geom_line() +
  labs(x="Age (days)", y="Circumference (mm)") +
  ggtitle("Tree Growth (ggplot2)") +
  theme(legend.justification=c(0,1), legend.position=c(0,1))

Clearly, the code is more readable, and I think you would agree that the plot is more attractive than the equivalent plot using base R. Good luck and happy plotting!

About the author

Joel Carlson is a recent MSc graduate from Seoul National University and a current Data Science Fellow at Galvanize in San Francisco. He has contributed two R packages to CRAN (radiomics and RImagePalette). You can learn more about him or get in touch at his personal website.

Exploring Shaders and Effects

Packt
14 Jul 2016
5 min read
In this article by Jamie Dean, the author of the book Mastering Unity Shaders and Effects, we will use transparent shaders and atmospheric effects to present the volatile conditions of the planet Ridley VI from the surface. In this article, we will cover the following topics:

Exploring the difference between the Cutout, Transparent, and Fade Rendering Modes
Implementing and adjusting Unity's fog effect in the scene

(For more resources related to this topic, see here.)

Creating the dust cloud material

The surface of Ridley VI is made inhospitable by dangerous nitrogen storms. In our game scene, these are represented by dust cloud planes situated near the surface. We need to set up the materials for these clouds with the following steps:

In the Project panel, click on the PACKT_Materials folder to view its contents in the Assets panel.
In the Assets panel, right-click on an empty area and choose Create | Material.
Rename the material dustCloud.
In the Hierarchy panel, click to select the dustcloud object. The object's properties will appear in the Inspector.
Drag the dustCloud material from the Assets panel onto the Materials field in the Mesh Renderer property visible in the Inspector.

Next, we will set the texture map of the material.

Reselect the dustCloud material by clicking on it in the Assets panel.
Lock the Inspector by clicking on the small lock icon on the top-right corner of the panel. Locking the Inspector allows you to maintain the focus on assets while you are hooking up an associated asset in your project.
In the Project panel, click on the PACKT_Textures folder.
Locate the strato texture map and drag it into the dustCloud material's Albedo texture slot in the Inspector.

The texture map contains four atlassed variations of the cloud effect. We need to adjust how much of the whole texture is shown in the material.

In the Inspector, set the Tiling Y value to 0.25. This will ensure that only a quarter of the complete height of the texture will be used in the material.

The texture map also contains opacity data. To use this in our material, we need to adjust the Rendering Mode. The Rendering Mode of the Standard Shader allows us to specify the opaque nature of a surface. Most often, scene objects are Opaque: objects behind them are blocked by them and are not visible through their surface. The next option is Cutout. This is used for surfaces containing areas of full opacity and full transparency, such as leaves on a tree or a chain link fence; the opacity is basically on or off for each pixel in the texture. Fade allows objects to have cutout areas where there are completely transparent and partially transparent pixels. The Transparent option is suitable for truly transparent surfaces such as windows, glass, and some types of plastic. When specular is used with a transparent material, it is applied over the whole surface, making it unsuitable for cutout effects.

Comparison of Standard Shader transparency types

The Fade Rendering Mode is the best option for our dustCloud material, as we want the cloud objects to be cut out so that the edges of the quad that the material is applied to are not visible. We also want the surface to be partially transparent so that other dustcloud quads are visible behind it, blending the effect.

At the top of the material properties in the Inspector, click on the Rendering Mode drop-down menu and set it to Fade:

Transparent dustCloud material applied

The dust clouds should now be visible with their opacity reading correctly, as shown in the preceding figure.
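As an aside, the Rendering Mode switch above is done in the Inspector. If you ever need to make the same change from a script at runtime, the following C# sketch mirrors the blend, keyword, and render queue settings that the Standard Shader's Fade mode is commonly configured with in Unity 5. It is an illustrative assumption rather than part of the book's project, and the public dustCloudMaterial field is a placeholder you would assign yourself in the Inspector:

using UnityEngine;
using UnityEngine.Rendering;

public class FadeModeSwitcher : MonoBehaviour
{
    // Placeholder reference: assign the dustCloud material here in the Inspector.
    public Material dustCloudMaterial;

    void Start()
    {
        // Replicate what choosing "Fade" in the Rendering Mode drop-down sets up.
        dustCloudMaterial.SetFloat("_Mode", 2f); // 2 corresponds to Fade in the Standard Shader UI
        dustCloudMaterial.SetOverrideTag("RenderType", "Transparent");
        dustCloudMaterial.SetInt("_SrcBlend", (int)BlendMode.SrcAlpha);
        dustCloudMaterial.SetInt("_DstBlend", (int)BlendMode.OneMinusSrcAlpha);
        dustCloudMaterial.SetInt("_ZWrite", 0);
        dustCloudMaterial.DisableKeyword("_ALPHATEST_ON");
        dustCloudMaterial.EnableKeyword("_ALPHABLEND_ON");
        dustCloudMaterial.DisableKeyword("_ALPHAPREMULTIPLY_ON");
        dustCloudMaterial.renderQueue = 3000; // render with other transparent geometry
    }
}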
In the next step, we will add some further environmental effects to the scene.

Adding fog to the scene

In this step, we will add fog to the scene. Fog can be set to fade out distant background elements to reduce the amount of scenery that needs to be rendered. It can be colored, allowing us to blend elements together and give our scene some depth.

If the Lighting tab is not already visible in the Unity project, activate it from the menu bar by navigating to Windows | Lighting. Dock the Lighting panel if necessary.
Scroll to the bottom to locate the Fog properties group.
Check the checkbox next to Fog to enable it. You will see that fog is added to the environment in the Scene view, as shown in the following figure. The default values do not quite match what we need in the planet surface environment:

Unity's default fog effect

Click within the color swatch next to Fog Color to define the color value.
When the color picker appears over the main Unity interface, type the hexcode E8BE80FF into the Hex Color field near the bottom, as shown in the following screenshot:

Fog effect color selection

This will define the yellow-orange color that is appropriate for our planet's atmosphere.

Set the Fog Mode to Exponential Squared to give the fog the appearance of becoming thicker in the distance.
Increase the fog by increasing the Density value to 0.05:

Adjusted fog blended with dust cloud transparencies

Our dust cloud objects are now being blended with the fog, as shown in the preceding image.

Summary

In this article, we took a closer look at material Rendering Modes and how transparent effects can be implemented in a scene. We further explored real-time environmental effects by creating dust clouds that fade in and out using atlassed textures. We then set up an environmental fog effect using Unity's built-in tools.

For more information on Unity shaders and effects, you can refer to the following books:

Unity 5.x Animation Cookbook: https://www.packtpub.com/game-development/unity-5x-animation-cookbook
Unity 5.x Shaders and Effects Cookbook: https://www.packtpub.com/game-development/unity-5x-shaders-and-effects-cookbook
Unity Shaders and Effects Cookbook: https://www.packtpub.com/game-development/unity-shaders-and-effects-cookbook

Resources for Article:

Further resources on this subject:
Looking Good – The Graphical Interface [article]
Build a First Person Shooter [article]
The Vertex Function [article]

Basic Website using Node.js and MySQL database

Packt
14 Jul 2016
5 min read
In this article by Fernando Monteiro, author of the book Node.JS 6.x Blueprints, we will understand some basic concepts of a Node.js application using a relational database (MySQL) and also look at some differences between the Object Document Mapper (ODM) used with MongoDB and the Object Relational Mapper (ORM) used by Sequelize and MySQL. For this, we will create a simple application and use the resources we have available, as Sequelize is a powerful middleware for the creation of models and the mapping of the database. We will also use another template engine called Swig and demonstrate how we can add a template engine manually.

(For more resources related to this topic, see here.)

Creating the baseline applications

The first step is to create another directory; I'll use the root folder. Create a folder called chapter-02. Open your terminal/shell in this folder and type the express command:

express --git

Note that we are using only the --git flag this time; we will use another template engine, but we will install it manually.

Installing the Swig template engine

The first step is to change the default express template engine to Swig, a pretty simple template engine that is very flexible and stable, and that offers us a syntax very similar to Angular, denoting expressions just by using double curly brackets {{ variableName }}. More information about Swig can be found on the official website at: http://paularmstrong.github.io/swig/docs/

Open the package.json file and replace the jade line with the following:

"swig": "^1.4.2"

Open your terminal/shell in the project folder and type:

npm install

Before we proceed, let's make some adjustments to app.js; we need to add the swig module. Open app.js and add the following code, right after the var bodyParser = require('body-parser'); line:

var swig = require('swig');

Replace the default jade template engine line with the following code:

var swig = new swig.Swig();
app.engine('html', swig.renderFile);
app.set('view engine', 'html');

Refactoring the views folder

Let's change the views folder to the following new structure:

views
  pages/
  partials/

Remove the default jade files from views.

Create a file called layout.html inside the pages folder and place the following code:

<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
    {% block content %}
    {% endblock %}
  </body>
</html>

Create an index.html inside the views/pages folder and place the following code:

{% extends 'layout.html' %}
{% block title %}{% endblock %}
{% block content %}
<h1>{{ title }}</h1>
Welcome to {{ title }}
{% endblock %}

Create an error.html page inside the views/pages folder and place the following code:

{% extends 'layout.html' %}
{% block title %}{% endblock %}
{% block content %}
<div class="container">
  <h1>{{ message }}</h1>
  <h2>{{ error.status }}</h2>
  <pre>{{ error.stack }}</pre>
</div>
{% endblock %}

We need to adjust the views path in app.js; replace the code on line 14 with the following code:

// view engine setup
app.set('views', path.join(__dirname, 'views/pages'));

At this point, we have completed the first step of our MVC application. In this example, we will use the MVC pattern in its full meaning: Model, View, Controller.

Creating controllers folder

Create a folder called controllers inside the root project folder.
Create an index.js inside the controllers folder and place the following code:

// Index controller
exports.show = function(req, res) {
  // Show index content
  res.render('index', { title: 'Express' });
};

Edit the app.js file and replace the original index route app.use('/', routes); with the following code:

app.get('/', index.show);

Add the controller path to app.js on line 9, replacing the original code with the following code:

// Inject index controller
var index = require('./controllers/index');

Now it's time to check that all goes as expected: we run the application and check the result. Type the following command in your terminal/shell:

npm start

Check with the following URL: http://localhost:3000; you'll see the welcome message of the express framework.

Removing the default routes folder

Remove the routes folder and its content.
Remove the user route from app.js, after the index controller, on line 31.

Adding partials files for head and footer

Inside views/partials, create a new file called head.html and place the following code:

<meta charset="utf-8">
<title>{{ title }}</title>
<link rel='stylesheet' href='https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.0.0-alpha.2/css/bootstrap.min.css'>
<link rel="stylesheet" href="/stylesheets/style.css">

Inside views/partials, create a file called footer.html and place the following code:

<script src='https://cdnjs.cloudflare.com/ajax/libs/jquery/2.2.1/jquery.min.js'></script>
<script src='https://cdnjs.cloudflare.com/ajax/libs/twitter-bootstrap/4.0.0-alpha.2/js/bootstrap.min.js'></script>

Now it's time to add the partials files to the layout.html page using the include tag. Open layout.html and add the following highlighted code:

<!DOCTYPE html>
<html>
  <head>
    {% include "../partials/head.html" %}
  </head>
  <body>
    {% block content %}
    {% endblock %}
    {% include "../partials/footer.html" %}
  </body>
</html>

Finally, we are prepared to continue with our project; at this point, our directory structure looks like the following image:

Folder structure

Summary

In this article, we discussed the basic concepts of a Node.js application with a MySQL database, and we also saw how to refactor the express template engine and use another resource, the Swig template library, to build a basic website.

Resources for Article:

Further resources on this subject:
Exception Handling in MySQL for Python [article]
Python Scripting Essentials [article]
Splunk's Input Methods and Data Feeds [article]