Fundamental SELinux Concepts

Packt
14 Nov 2016
41 min read
In this article by Sven Vermeulen, the author of the book SELinux System Administration - Second Edition, we will see how Security Enhanced Linux (SELinux) brings additional security measures to your Linux system to further protect its resources. This article explains why SELinux uses labels to identify resources, how SELinux differentiates itself from regular Linux access controls by enforcing security rules, how the access control rules enforced by SELinux are provided through policy files, and how SELinux implementations differ between Linux distributions.

Providing more security to Linux

Seasoned Linux administrators and security engineers already know that they need to put some trust in the users and processes on their system in order for the system to remain secure. This is partially because users can attempt to exploit vulnerabilities found in the software running on the system, but a large part of this trust level is because the secure state of the system depends on the behavior of the users. A Linux user with access to sensitive information could easily leak it to the public, manipulate the behavior of the applications he or she launches, and do many other things that affect the security of the system. The default access controls that are active on a regular Linux system are discretionary: it is up to the owners of the resources how the access controls should behave.

The Linux discretionary access control (DAC) mechanism is based on the user and/or group information of the process, which is matched against the user and/or group information of the file, directory, or other resource being manipulated. Consider the /etc/shadow file, which contains the password and account information of the local Linux accounts:

$ ls -l /etc/shadow
-rw------- 1 root root 1010 Apr 25 22:05 /etc/shadow

Without additional access control mechanisms in place, this file is readable and writable by any process that is owned by the root user, regardless of the purpose of the process on the system. The shadow file is a typical example of a sensitive file that we don't want to see leaked or abused in any other fashion. Yet the moment someone has access to the file, that person can copy it elsewhere, for example to a home directory, or even mail it to a different computer and attempt to attack the password hashes stored within.

Another example of how Linux DAC requires trust from its users is when a database is hosted on the system. Database files themselves are (hopefully) only accessible to the runtime user of the database management system (DBMS) and the Linux root user. Properly secured systems will only grant trusted users access to these files (for instance, through sudo) by allowing them to change their effective user ID from their personal user to the database runtime user or even the root account for a well-defined set of commands. Those users, too, can analyze the database files and gain access to potentially confidential information in the database without going through the DBMS.

However, regular users are not the only reason for securing a system. Lots of software daemons run as the Linux root user or have significant privileges on the system. Errors within those daemons can easily lead to information leakage or even to exploitable remote command execution vulnerabilities.
Backup software, monitoring software, change management software, scheduling software, and so on: they all often run with the highest privileged account possible on a regular Linux system. Even when the administrator does not hand out privileged accounts, the users' interaction with these daemons introduces a potential security risk. As such, the users are still trusted to interact correctly with these applications in order for the system to function properly. Through this, the administrator leaves the security of the system to the discretion of its (many) users.

Enter SELinux, which provides an additional access control layer on top of the standard Linux DAC mechanism. SELinux provides a mandatory access control (MAC) system that, unlike its DAC counterpart, gives the administrator full control over what is allowed on the system and what isn't. It accomplishes this by supporting a policy-driven approach over what processes are and aren't allowed to do and by enforcing this policy through the Linux kernel.

Mandatory means that access control is enforced by the operating system and defined solely by the administrator. Users and processes that do not have permission to change the security rules cannot work around the access controls; security is no longer left to their discretion. The word mandatory here, just like the word discretionary before, was not chosen by accident to describe the capabilities of the access control system: both are well-known terms in the security research field and have been described in many publications, including the Trusted Computer System Evaluation Criteria (TCSEC) standard (http://csrc.nist.gov/publications/history/dod85.pdf), also known as the Orange Book, published by the United States Department of Defense in 1985. This publication led to the Common Criteria standard for computer security certification (ISO/IEC 15408), available at http://www.commoncriteriaportal.org/cc/.

Using Linux security modules

Consider the example of the shadow file again. A MAC system can be configured to allow only a limited number of processes to read and write to the file. On such a system, a user logged on as root cannot directly access the file or even move it around. He can't even change the attributes of the file:

# id
uid=0(root) gid=0(root)
# cat /etc/shadow
cat: /etc/shadow: Permission denied
# chmod a+r /etc/shadow
chmod: changing permissions of '/etc/shadow': Permission denied

This is enforced through rules that describe when the contents of a file can be read. With SELinux, these rules are defined in the SELinux policy and are loaded when the system boots. It is the Linux kernel itself that is responsible for enforcing the rules.

Mandatory access control systems such as SELinux are integrated into the Linux kernel through its support for Linux Security Modules (LSM).

(Figure: High-level overview of how LSM is integrated into the Linux kernel)

LSM has been available in the Linux kernel since version 2.6, released in December 2003. It is a framework that provides hooks inside the Linux kernel at various locations, including the system call entry points, and allows a security implementation such as SELinux to provide functions to be called when a hook is triggered. These functions can then do their magic (for instance, checking the policy and other information) and return a go/no-go verdict that decides whether the call is allowed to go through. LSM by itself does not provide any security functionality; instead, it relies on security implementations that do the heavy lifting.
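On reasonably recent kernels, the security modules that are currently active can be read from the securityfs pseudo filesystem. A quick check, assuming securityfs is mounted at its conventional location (the exact list, and whether the file exists at all, differs per kernel version):

$ cat /sys/kernel/security/lsm
capability,yama,selinux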
SELinux is one of the implementations that use LSM, but there are several others: AppArmor, Smack, TOMOYO Linux, and Yama, to name a few. At the time of writing, only one main security implementation can be active through the LSM hooks. Work is underway to enable stacking of multiple security implementations, allowing system administrators to have more than one implementation active; recent work already allows multiple implementations to be defined, but not simultaneously active. When supported, this will allow administrators to pick the best features of a number of implementations and enforce smaller LSM-implemented security controls on top of the more complete security model implementations such as SELinux, TOMOYO, Smack, or AppArmor.

Extending regular DAC with SELinux

SELinux does not change the Linux DAC implementation, nor can it override denials made by the Linux DAC permissions. If a regular system (without SELinux) prevents a particular access, there is nothing SELinux can do to override this decision. This is because the LSM hooks are triggered after the regular DAC permission checks have been executed.

For instance, if you need to allow an additional user access to a file, you cannot add an SELinux policy to do that for you. Instead, you will need to look into other features of Linux, such as POSIX access control lists. Through the setfacl and getfacl commands (provided by the acl package), the user can set additional permissions on files and directories, opening up the selected resource to additional users or groups. As an example, let's grant user lisa read-write access to a file using setfacl:

$ setfacl -m u:lisa:rw /path/to/file

Similarly, to view the current POSIX ACLs applied to the file, use this command:

$ getfacl /path/to/file
# file: file
# owner: swift
# group: swift
user::rw-
user:lisa:rw-
group::r--
mask::r--
other::r--

Restricting root privileges

The regular Linux DAC allows for an all-powerful user: root. Unlike most other users on the system, the logged-on root user has all the rights needed to fully manage the entire system, ranging from overriding access controls to controlling audits, changing user IDs, managing the network, and much more. This is supported through a security concept called capabilities (for an overview of Linux capabilities, check out the capabilities manual page: man capabilities). SELinux is also able to restrict access to these capabilities in a fine-grained manner.

Due to this fine-grained authorization aspect of SELinux, even the root user can be confined without impacting the operations on the system. The aforementioned example of accessing /etc/shadow is just one restriction that a user as powerful as root still might not be able to bypass when SELinux access controls are in place. When SELinux was added to the mainstream Linux kernel, some security projects even went as far as providing public root shell access to an SELinux-protected system, asking hackers and other security researchers to compromise the box.

The ability to restrict root was welcomed by system administrators who sometimes need to pass on the root password or a root shell to other users (for example, database administrators) who need root privileges when their software goes haywire. Thanks to SELinux, the administrator can now pass on a root shell while resting assured that those users only have the rights they need, and not full system-administration rights.
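To get a feeling for how powerful an unconfined root shell is by default, its effective capability set can be inspected and decoded with the capsh utility from the libcap tools — a quick illustration, assuming the tool is installed; the exact values depend on the kernel version:

# grep CapEff /proc/$$/status
CapEff: 0000003fffffffff
# capsh --decode=0000003fffffffff
0x0000003fffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,...

Every entry in that list is a privilege that an SELinux policy can grant or withhold independently, even for processes running as root.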
Reducing the impact of vulnerabilities

If there is one benefit of SELinux that needs to be stressed, while also often being misunderstood, it is its ability to reduce the impact of vulnerabilities. A properly written SELinux policy confines applications so that their allowed activities are reduced to a minimum set. This least-privilege model ensures that abnormal application behavior is not only detected and audited but also prevented. Many application vulnerabilities can be exploited to execute tasks that an application is not meant to do; when this happens, SELinux will prevent it.

However, there are two misconceptions about SELinux and its ability to thwart exploits, namely the impact of the policy and the nature of the exploit itself.

If the policy is not written with a least-privilege model in mind, then SELinux might consider nonstandard behavior as normal and allow the actions to continue. For policy writers, this means that their policy code has to be very fine-grained. Sadly, that makes writing policies very time-consuming: there are more than 80 classes and over 200 permissions known to SELinux, and policy rules need to take all these classes and permissions into account for each interaction between two objects or resources. As a result, policies tend to become convoluted and harder to maintain. Some policy writers make the policies more permissive than is absolutely necessary, which might result in exploits succeeding even though the action is not expected behavior from an application point of view. Some application policies are explicitly marked as unconfined (which is discussed later in this article), showing that they are very liberal in their allowed permissions. Red Hat Enterprise Linux even ships several application policies in completely permissive mode, and only starts enforcing access controls for those applications after a few releases.

The second misconception concerns the exploit itself. If an application's vulnerability allows an unauthenticated user to use the application services as if he were authorized, SELinux will not play a role in reducing the impact of the vulnerability; it only notices the behavior of the application itself and not of the sessions internal to the application. As long as the application itself behaves as expected (accessing its own files and not poking around in other filesystems), SELinux will happily allow the actions to take place. It is only when the application starts behaving erratically that SELinux stops the exploit from continuing. Exploits such as remote command execution (RCE) against applications that should not be executing random commands (such as database management systems or web servers, excluding CGI-like functionality) will be prevented, whereas session hijacking or SQL injection attacks are not controllable through SELinux policies.

Enabling SELinux support

Enabling SELinux on a Linux system is not just a matter of enabling the SELinux LSM module within the Linux kernel. An SELinux implementation comprises the following:

- The SELinux kernel subsystem, implemented in the Linux kernel through LSM
- Libraries, used by applications that need to interact with SELinux
- Utilities, used by administrators to interact with SELinux
- Policies, which define the access controls themselves

The libraries and utilities are bundled by the SELinux user space project (https://github.com/SELinuxProject/selinux/wiki). Next to the user space applications and libraries, various components on a Linux system are updated with SELinux-specific code, including the init system and several core utilities.

Because SELinux isn't just a switch that needs to be toggled, Linux distributions that support it usually come with SELinux predefined and loaded: Fedora and Red Hat Enterprise Linux (with its derivatives, such as CentOS and Oracle Linux) are well-known examples. Other supporting distributions might not automatically have SELinux enabled but can easily support it through the installation of additional packages (as is the case with Debian and Ubuntu), and others have a well-documented approach for converting a system to SELinux (for example, Gentoo and Arch Linux).
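On a system where SELinux support is already in place, its current state can be verified with the getenforce and sestatus utilities, which are part of the standard SELinux user-space tools. A quick check — the output shown here is just an example:

$ getenforce
Enforcing
$ sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
Loaded policy name:             targeted
Current mode:                   enforcing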
Labeling all resources and objects

When SELinux has to decide whether to allow or deny a particular action, it makes that decision based on the context of both the subject (which is initiating the action) and the object (which is the target of the action). These contexts (or parts of them) are mentioned in the policy rules that SELinux enforces.

The context of a process is what identifies the process to SELinux. SELinux has no notion of Linux process ownership and, frankly, does not care how the process is called, which process ID it has, or what account the process runs as. All it wants to know is what the context of that process is, which is represented to users and administrators as a label. Label and context are often used interchangeably, and although there is a technical distinction (one is a representation of the other), we will not dwell on that much.

Let's look at an example label: the context of the current user (try it out yourself if you are on an SELinux-enabled system):

$ id -Z
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023

The id command, which returns information about the current user, is executed here with the -Z switch (a commonly agreed-upon switch for displaying SELinux information). It shows us the context of the current user (actually the context of the id process itself while it was executing). As we can see, the context has a string representation and looks as if it has five fields (it doesn't; it has four fields—the last field just happens to contain a colon).

The SELinux developers decided to use labels instead of real process and file (or other resource) metadata for its access controls. This is different from MAC systems such as AppArmor, which use the path of the binary (and thus the process name) and the paths of the resources to handle permission checks. The decision to make SELinux a label-based mandatory access control system was taken for various reasons:

- Using paths might be easier to comprehend for administrators, but this doesn't allow us to keep the context information close to the resource. If a file or directory is moved or remounted, or a process has a different namespace view on the files, then the access controls might behave differently. With label-based contexts, this information is retained and the system keeps controlling the resource properly.
- Contexts reveal the purpose of a process very well. The same binary application can be launched in different contexts depending on how it got started. The context value (such as the one shown in the id -Z output earlier) is exactly what the administrator needs: with it, he knows what the rights of each of the running instances are, but he can also deduce from it how the process might have been launched and what its purpose is.
- Contexts also abstract the object itself. We are used to talking about processes and files, but contexts are also applicable to less tangible resources such as pipes (interprocess communication) or database objects. Path-based identification only works as long as you can write a path.

As an example, consider the following two policies:

- Allow the httpd processes to bind to TCP port 80
- Allow the processes labeled with httpd_t to bind to TCP ports labeled with http_port_t

In the first example, we cannot easily reuse this policy when the web server process isn't using the httpd binary (perhaps because it was renamed, or it isn't Apache but another web server) or when we want to have HTTP access on a different port. With the labeled approach, the binary can be called apache2 or MyWebServer.py; as long as the process is labeled httpd_t, the policy applies. The same goes for the port definition: you can label port 8080 with http_port_t and thus allow the web servers to bind to that port as well.
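Files, directories, and other resources carry such labels as well, and most standard tools accept the same -Z switch to display them. A small illustration (the exact context and output layout depend on the loaded policy and coreutils version):

$ ls -Z /etc/shadow
system_u:object_r:shadow_t:s0 /etc/shadow

The shadow_t type shown here is exactly what an SELinux policy uses to restrict which domains may read the file, as in the earlier /etc/shadow example.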
Dissecting the SELinux context

To come to a context, SELinux uses at least three, and sometimes four, values. Let's look at the context of an Apache web server as an example:

$ ps -eZ | grep httpd
system_u:system_r:httpd_t:s0  511 ?  00:00:00 httpd

As we can see, the process is assigned a context that contains the following fields:

- system_u: This represents the SELinux user
- system_r: This represents the SELinux role
- httpd_t: This represents the SELinux type (also known as the domain in the case of a process)
- s0: This represents the sensitivity level

(Figure: The structure of an SELinux context, using the id -Z output as an example)

When we work with SELinux, contexts are all we need. In the majority of cases, it is the third field (called the domain or type) that is most important, since the majority of SELinux policy rules (over 99 percent) consist of rules related to the interaction between two types (without mentioning roles, users, or sensitivity levels).

SELinux contexts are aligned with LSM security attributes and exposed to user space, allowing end users and applications to easily query the contexts. An interesting place where these attributes are presented is the /proc pseudo filesystem. Inside each process's /proc/<pid> location, we find a subdirectory called attr, in which the following files can be found:

$ ls /proc/$$/attr
current  exec  fscreate  keycreate  prev  sockcreate

All these files, if read, display either nothing or an SELinux context. If a file is empty, that means the application has not explicitly set a context for that particular purpose, and the SELinux context will be deduced either from the policy or inherited from its parent. The meanings of the files are as follows:

- The current file displays the current SELinux context of the process.
- The exec file displays the SELinux context that will be assigned by the next application execution done through this application. It is usually empty.
- The fscreate file displays the SELinux context that will be assigned to the next file that is written by the application. It is usually empty.
- The keycreate file displays the SELinux context that will be assigned to the keys cached in the kernel by this application. It is usually empty.
- The prev file displays the previous SELinux context for this particular process. This is usually the context of its parent application.
- The sockcreate file displays the SELinux context that will be assigned to the next socket created by the application. It is usually empty.

If an application has multiple subtasks, then the same information is available in each subtask directory at /proc/<pid>/task/<taskid>/attr.
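Reading one of these files simply returns the corresponding context (or nothing at all). For instance, the current file of the running shell shows the same context that the id -Z command displayed earlier — a quick check on an SELinux-enabled system:

$ cat /proc/$$/attr/current
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023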
Enforcing access through types

The SELinux type (the third part of an SELinux context) of a process, called its domain, is the basis of the fine-grained access controls of that process with respect to itself and other types (which can be processes, files, sockets, network interfaces, and more). In most SELinux literature, this label-based access control mechanism is described more precisely by saying that SELinux is a type enforcement mandatory access control system: when some actions are denied, the fine-grained access controls on the type level are most likely to blame.

With type enforcement, SELinux is able to control what an application is allowed to do based on how it got executed in the first place: a web server that is launched interactively by a user will run with a different type than a web server executed through the init system, even though the process binary and path are the same. The web server launched from the init system is most likely trusted (and thus allowed to do whatever web servers are supposed to do), whereas a manually launched web server is less likely to be considered normal behavior and as such will have different privileges.

The majority of SELinux resources focus on types. Even though the SELinux type is just the third part of an SELinux context, it is the most important one for most administrators. Most documentation will even just talk about a type such as httpd_t rather than a full SELinux context.

Take a look at the following dbus-daemon processes:

# ps -eZ | grep dbus-daemon
system_u:system_r:system_dbusd_t  4531 ?  00:00:00 dbus-daemon
staff_u:staff_r:staff_dbusd_t     5266 ?  00:00:00 dbus-daemon

In this example, one dbus-daemon process is the system D-Bus daemon running with the aptly named system_dbusd_t type, whereas the other one is running with the staff_dbusd_t type assigned to it. Even though their binaries are exactly the same, they serve a different purpose on the system and as such have a different type assigned. SELinux then uses this type to govern the actions allowed by the process towards other types, including how system_dbusd_t can interact with staff_dbusd_t.

SELinux types are by convention suffixed with _t, although this is not mandatory.
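The type enforcement rules that govern such interactions can be queried from the loaded policy with the sesearch utility (part of the SETools suite, packaged as setools-console on RHEL). A sketch — the exact rules and output format depend on the policy and SETools version in use:

# sesearch --allow -s httpd_t -t http_port_t -c tcp_socket
Found 1 semantic av rules:
   allow httpd_t http_port_t : tcp_socket { name_bind name_connect } ;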
Granting domain access through roles

SELinux roles (the second part of an SELinux context) allow SELinux to support role-based access controls. Although type enforcement is the most used (and best-known) part of SELinux, role-based access control is an important method for keeping a system secure, especially from malicious user attempts. SELinux roles define which process types (domains) user processes can be in. As such, they help define what a user can and cannot do. By convention, SELinux roles are defined with an _r suffix. On most SELinux-enabled systems, the following roles are made available to be assigned to users:

- user_r: This role is meant for restricted users: the user_r SELinux role is only allowed to have processes with types specific to end-user applications. Privileged types, including those used to switch to another Linux user, are not allowed for this role.
- staff_r: This role is meant for non-critical operations: the staff_r SELinux role is generally restricted to the same applications as the restricted user, but it has the ability to switch roles. It is the default role for operators (so as to keep those users in the least privileged role as long as possible).
- sysadm_r: This role is meant for system administrators: the sysadm_r SELinux role is very privileged, enabling various system-administration tasks. However, certain end-user application types might not be supported (especially if those types are used for potentially vulnerable or untrusted software) to keep the system free from infections.
- system_r: This role is meant for daemons and background processes: the system_r SELinux role is quite privileged, supporting the various daemon and system process types. However, end-user application types and other administrative types are not allowed in this role.
- unconfined_r: This role is meant for end users: the unconfined_r role is allowed a limited number of types, but those types are very privileged, as this role is meant for running any application launched by a user in a more or less unconfined manner (not restricted by SELinux rules). This role as such is only available if the system administrator wants to protect certain processes (mostly daemons) while keeping the rest of the system operations almost untouched by SELinux.

Other roles might be supported as well, such as guest_r and xguest_r, depending on the distribution. It is wise to consult the distribution documentation for more information about the supported roles. An overview of available roles can be obtained through the seinfo command (part of setools-console in RHEL or app-admin/setools in Gentoo):

# seinfo --role
Roles: 14
   auditadm_r
   dbadm_r
   ...
   unconfined_r

Limiting roles through users

An SELinux user (the first part of an SELinux context) is different from a Linux user. Unlike Linux user information, which can change while the user is working on the system (through tools such as sudo or su), the SELinux policy can (and generally will) enforce that the SELinux user remains the same even when the Linux user itself has changed. Because of the immutable state of the SELinux user, specific access controls can be implemented to ensure that users cannot work around the set of permissions granted to them, even when they get privileged access. An example of such an access control is the user-based access control (UBAC) feature that some Linux distributions (optionally) enable, which prevents users from accessing files of other SELinux users even when those users try to use the Linux DAC controls to open up access to each other's files.

The most important feature of SELinux users, however, is that SELinux user definitions restrict which roles the (Linux) user is allowed to be in. A Linux user is first assigned to an SELinux user—multiple Linux users can be assigned to the same SELinux user. Once set, that user cannot switch to an SELinux role he isn't meant to be in. This is the role-based access control implementation of SELinux.

(Figure: Mapping Linux accounts to SELinux users)

SELinux users are, by convention, defined with a _u suffix, although this is not mandatory. The SELinux users that most distributions have available are named after the role they represent, but instead of ending in _r, they end in _u. For instance, for the sysadm_r role, there is a sysadm_u SELinux user.
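The mapping between Linux accounts and SELinux users can be listed (and changed) with the semanage command, provided through the policycoreutils packages. An illustration — the actual mappings differ per system and policy:

# semanage login -l

Login Name           SELinux User         MLS/MCS Range
__default__          unconfined_u         s0-s0:c0.c1023
root                 unconfined_u         s0-s0:c0.c1023
system_u             system_u             s0-s0:c0.c1023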
Controlling information flow through sensitivities

The fourth part of an SELinux context, the sensitivity, is not always present (some Linux distributions do not enable sensitivity labels by default). If it is present, then this part of the label is needed for the multi-level security (MLS) support within SELinux. Sensitivity labels allow classification of resources and restriction of access to those resources based on a security clearance. These labels consist of two parts: a confidentiality value (prefixed with s) and a category value (prefixed with c).

In many larger organizations and companies, documents are labeled internal, confidential, or strictly confidential. SELinux can assign processes a certain clearance level towards these resources. With MLS, SELinux can be configured to follow the Bell-LaPadula model, a security model that can be characterized as no read up, no write down: based on a process's clearance level, that process cannot read anything with a higher confidentiality level nor write to (or otherwise communicate with) any resource with a lower confidentiality level. SELinux does not use the internal, confidential, and similar labels. Instead, it uses numbers from 0 (lowest confidentiality) to whatever the system administrator has defined as the highest value (this is configurable and set when the SELinux policy is built).

Categories allow resources to be tagged with one or more categories, on which access controls are also possible. The idea behind categories is to support multitenancy (for example, systems hosting applications for multiple customers) within a Linux system, by having the processes and resources belonging to one tenant assigned a particular set of categories, whereas the processes and resources of another tenant get a different set of categories. When a process does not have the proper categories assigned, it cannot do anything with the resources (or other processes) that have other categories assigned.

An unwritten convention in the SELinux world is that (at least) two categories are used to differentiate between tenants. By having services randomly pick two categories for a tenant out of a predefined set of categories, while ensuring each tenant has a unique combination, these services receive proper isolation. The use of two categories is not mandatory but is implemented by services such as sVirt and Docker. In that sense, categories can be seen as tags, allowing access to be granted only when the tags of the process and the target resource match.

As multi-level security is not often used, the benefit of using only categories is preserved in what is called multi-category security (MCS). This is a special MLS case, in which only a single confidentiality level is supported (s0).
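As an illustration of this category-based isolation, the processes of two virtual machines on a host that uses sVirt might run in the same domain but with different category pairs — a hypothetical excerpt, as the category numbers are chosen randomly by libvirt:

# ps -eZ | grep qemu
system_u:system_r:svirt_t:s0:c424,c957  3461 ?  00:04:35 qemu-kvm
system_u:system_r:svirt_t:s0:c129,c883  3642 ?  00:03:18 qemu-kvm

Because the two category sets do not match, neither guest process can touch the other guest's disk images or memory, even though both run in the svirt_t domain.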
Defining and distributing policies

Enabling SELinux does not automatically start the enforcement of access. If SELinux is enabled but cannot find a policy, it will refuse to start. That is because the policy defines the behavior of the system (what SELinux should allow). SELinux policies are generally distributed in a compiled form (just as with software) as policy modules. These modules are then aggregated into a single policy store and loaded in memory to allow SELinux to enforce the policy rules on the system. Gentoo, being a source-based meta-distribution, distributes the SELinux policies as (source) code as well, which is compiled and built at install time, just as it does with other software.

The relationship between policy rules, policy modules, and a policy package (which is often a one-to-one mapping towards a policy store) is as follows:

(Figure: Relationship between policy rules, policy modules, and the policy store)

Writing SELinux policies

An SELinux policy writer can currently write down policy rules in three possible languages:

- The standard SELinux source format—a human-readable and well-established language for writing SELinux policies
- The reference policy style—this extends the standard SELinux source format with M4 macros to facilitate the development of policies
- The SELinux Common Intermediate Language (CIL)—a computer-readable (and, with some effort, human-readable) format for SELinux policies

Most SELinux-supporting distributions base their policy on the reference policy (https://github.com/TresysTechnology/refpolicy/wiki), a fully functional SELinux policy set managed as a free software project. This allows distributions to ship with a functional policy set rather than having to write one themselves. Many project contributors are distribution developers trying to push changes from their distribution to the reference policy project itself, where the changes are peer-reviewed to make sure no rules are brought into the project that might jeopardize the security of any platform. It easily becomes very troublesome to write reusable policy modules without the extensive set of M4 macros offered by the reference policy project. The SELinux CIL format is quite recent (RHEL 7.2 does not support it yet), and although it is very much in use already (the recent SELinux user space converts everything to CIL in the background), it is not yet common for policy writers to use it directly.

As an example, consider the web server rule we discussed earlier, repeated here for your convenience: allow the processes labeled with httpd_t to bind to TCP ports labeled with http_port_t.

In the standard SELinux source format, this is written down as follows:

allow httpd_t http_port_t : tcp_socket { name_bind };

Using reference policy style, this rule is part of the following macro call:

corenet_tcp_bind_http_port(httpd_t)

In CIL, the rule would be expressed as follows:

(allow httpd_t http_port_t (tcp_socket (name_bind)))

In most representations, we can see what the rule is about:

- The subject (who is taking the action): in this case, a process labeled with the httpd_t type.
- The target resource or object (the target of the action): in this case, a TCP socket (tcp_socket) labeled with the http_port_t type. In reference policy style, this is implied by the function name.
- The action or permission: in this case, binding to a port (name_bind). In reference policy style, this is implied by the function name.
- The result that the policy will enforce: in this case, that the action is allowed (allow). In reference policy style, this is implied by the function name.

A policy is generally written for an application or a set of applications, so the preceding example will be part of the policy written for web servers. Policy writers will generally create three files per application or application set:

- A .te file, which contains the type enforcement rules.
- An .if file, which contains interface and template definitions, allowing policy writers to easily use the newly generated policy rules to enhance other policies. You can compare this to header files in other programming languages.
- An .fc file, which contains file context expressions. These are rules that assign labels to resources on the filesystem.

A finished policy will then be packaged into an SELinux policy module.
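To make the structure of these files more tangible, here is a minimal sketch of a module for a hypothetical daemon called myapp, written in reference policy style (the myapp names, paths, and rules are invented for illustration; a real module would need considerably more rules):

myapp.te:

policy_module(myapp, 1.0.0)

# A domain for the daemon and a type for its executable
type myapp_t;
type myapp_exec_t;
init_daemon_domain(myapp_t, myapp_exec_t)

# A type for the daemon's log files
type myapp_log_t;
logging_log_file(myapp_log_t)

# Allow the daemon to create and append to its own logs
allow myapp_t myapp_log_t:file { create open append getattr };

myapp.fc:

/usr/sbin/myapp        --    gen_context(system_u:object_r:myapp_exec_t,s0)
/var/log/myapp(/.*)?         gen_context(system_u:object_r:myapp_log_t,s0)

On distributions that ship the reference policy development files, such a module can then be built and loaded as follows:

# make -f /usr/share/selinux/devel/Makefile myapp.pp
# semodule -i myapp.pp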
Distributing policies through modules

Initially, SELinux used a single, monolithic policy approach: all possible access control rules were maintained in a single policy file. It quickly became clear that this is not manageable in the long term, and the idea of a modular policy approach was born. Within the modular approach, policy developers can write isolated policy sets for a particular application (or set of applications), for roles, and so on. These policies then get built and distributed as policy modules. Platforms that need access controls for a particular application load the SELinux policy module that defines the access rules for that application.

The process of building policy modules also shows where CIL comes into play, even when the policy rules themselves are not written in CIL. For distributions that do not yet support CIL, semodule will go directly from the .pp file to the policy.## file.

(Figure: Build process from policy rule to policy store)

With the recent SELinux user space, the *.pp files (which are the SELinux policy modules) are considered to be written in a high-level language (HLL). Do not assume that this means they are human-readable: these files are binary files. The consideration here is that SELinux wants to support writing SELinux policies in a number of formats, which it calls high-level languages, as long as there is a parser that can convert the files into CIL. Marking the binary module format as high-level allowed the SELinux project to introduce the distinction between high-level languages and CIL in a backward-compatible manner.

When distributing SELinux policy modules, most Linux distributions place the *.pp files inside /usr/share/selinux, usually within a subdirectory named after the policy store (such as targeted). There, these modules are ready for administrators to activate.

When activating a module, the semodule command (part of the policycoreutils package) will copy it into a dedicated directory: /etc/selinux/targeted/modules/active/modules (RHEL) or /var/lib/selinux/mcs/active/modules (Gentoo). This location is defined by the version of the SELinux user space—more recent versions use the /var/lib location. When all modules are aggregated in a single location, the final policy binary is compiled, resulting in /etc/selinux/targeted/policy/policy.30 (or some other number), and loaded in memory.

On RHEL, the SELinux policies are provided by the selinux-policy-targeted (or -minimum or -mls) package. On Gentoo, they are provided by the various sec-policy/selinux-* packages (Gentoo uses separate packages for each module, reducing the number of SELinux policies that are loaded on an average system).

Bundling modules in a policy store

A policy store contains a single comprehensive policy, and only a single policy can be active on a system at any point in time. Administrators can switch policy stores, although this often requires the system to be rebooted and might even require relabeling the entire system (relabeling is the act of resetting the contexts on all files and resources on that system).
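A full relabel can be triggered in a couple of ways; two common approaches are shown below (use with care, as relabeling a large filesystem can take quite some time):

# fixfiles relabel

or, to schedule a relabel during the next boot:

# touch /.autorelabel
# reboot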
The active policy on the system can be queried using sestatus (SELinux status, provided through the policycoreutils package), as follows:

# sestatus | grep "Loaded policy"
Loaded policy name:             targeted

In this example, the currently loaded policy (store) is named targeted. The policy name that SELinux will use upon the next reboot is defined in the /etc/selinux/config configuration file as the SELINUXTYPE parameter.

It is the init system (be it a SysV-compatible init system or systemd) that is generally responsible for loading the SELinux policy, effectively activating SELinux support on the system. The init system reads the configuration, locates the policy store, and loads the policy file in memory. If the init system does not support this (in other words, it is not SELinux-aware), then the policy can be loaded through the load_policy command, part of the policycoreutils package.

Distinguishing between policies

The most common SELinux policy store names are strict, targeted, mcs, and mls. None of the names assigned to policy stores are fixed, though, so it is a matter of convention. Hence, it is recommended to consult the distribution documentation to verify what the proper name of the policy should be. Still, the name often provides some information about the SELinux options that are enabled through the policy.

Supporting MLS

One of the options that can be enabled is MLS support. If it is disabled, the SELinux context will not have a fourth field with sensitivity information in it, making the contexts of processes and files look as follows:

staff_u:sysadm_r:sysadm_t

To check whether MLS is enabled, it is sufficient to see whether the context indeed doesn't contain such a fourth field, but it can also be obtained from the Policy MLS status line in the output of sestatus:

# sestatus | grep MLS
Policy MLS status:              disabled

Another method is to look at the pseudo file /sys/fs/selinux/mls. A value of 0 means disabled, whereas a value of 1 means enabled:

# cat /sys/fs/selinux/mls
0

Policy stores that have MLS enabled are generally targeted, mcs, and mls, whereas strict generally has MLS disabled.

Dealing with unknown permissions

Permissions (such as read, open, and lock) are defined both in the Linux kernel and in the policy itself. However, newer Linux kernels sometimes support permissions that the current policy does not yet understand. Take the block_suspend permission (the ability to block system suspension) as an example. If the Linux kernel supports (and checks) this permission but the loaded SELinux policy does not understand it yet, then SELinux has to decide how to deal with the permission. SELinux can be configured to do one of the following:

- allow: assume everything that is not understood is allowed
- deny: assume no one is allowed to perform this action
- reject: stop and halt the system

This is configured through the deny_unknown value. To see the state for unknown permissions, look for the Policy deny_unknown status line in sestatus:

# sestatus | grep deny_unknown
Policy deny_unknown status:     denied

Administrators can set this for themselves in the /etc/selinux/semanage.conf file through the handle-unknown variable (with allow, deny, or reject). RHEL by default allows unknown permissions, whereas Gentoo by default denies them.
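The decision the running kernel currently applies can also be read directly from the SELinux pseudo filesystem. A quick check, where a value of 1 in deny_unknown means unknown permissions are denied, and a value of 1 in reject_unknown means the policy load would be rejected altogether:

# cat /sys/fs/selinux/deny_unknown
1
# cat /sys/fs/selinux/reject_unknown
0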
Supporting unconfined domains

An SELinux policy can be very strict, limiting applications as closely as possible to their actual behavior, but it can also be very liberal in what applications are allowed to do. One of the concepts available in many SELinux policies is the idea of unconfined domains. When enabled, it means that certain SELinux domains (process contexts) are allowed to do almost anything they want (of course, within the boundaries of the regular Linux DAC permissions, which still hold), and only a select number of domains are truly confined (restricted) in their actions.

Unconfined domains have been brought forward to allow SELinux to be active on desktops and servers where administrators do not want to fully restrict the entire system, but only a few of the applications running on it. Generally, these implementations focus on constraining network-facing services (such as web servers and database management systems) while allowing end users and administrators to roam around unrestricted. With other MAC systems, such as AppArmor, unconfinement is inherently part of the design of the system, as they only restrict actions for well-defined applications or users. However, SELinux was designed to be a full mandatory access control system and thus needs to provide access control rules even for those applications that shouldn't need any. By marking these applications as unconfined, almost no additional restrictions are imposed by SELinux.

We can see whether unconfined domains are enabled on the system through seinfo, which we use to query the policy for the unconfined_t SELinux type. On a system where unconfined domains are supported, this type will be available:

# seinfo -tunconfined_t
   unconfined_t

On a system where unconfined domains are not supported, the type will not be part of the policy:

# seinfo -tunconfined_t
ERROR: could not find datum for type unconfined_t

Most distributions that enable unconfined domains call their policy targeted, but this is just a convention that is not always followed. Hence, it is always best to consult the policy using seinfo. RHEL enables unconfined domains, whereas on Gentoo this is a configurable setting through the unconfined USE flag.

Limiting cross-user sharing

When UBAC is enabled, certain SELinux types will be protected by additional constraints. This ensures that one SELinux user cannot access the files (or other specific resources) of another user, even when those users are sharing their data through the regular Linux permissions. UBAC provides some additional control over information flow between resources, but it is far from perfect. In essence, it is made to isolate SELinux users from one another.

A constraint in SELinux is an access control rule that uses all parts of a context to make its decision. Unlike type enforcement rules, which are purely based on the type, constraints can take the SELinux user, SELinux role, or sensitivity label into account. Constraints are generally developed once and then left untouched—most policy writers will not touch constraints during their development efforts.

Many Linux distributions, including RHEL, disable UBAC. Gentoo allows users to select whether or not they want UBAC through the ubac USE flag (which is enabled by default).

Incrementing policy versions

While checking the output of sestatus, we see that there is also a notion of policy versions:

# sestatus | grep version
Max kernel policy version:      28

This version has nothing to do with the versioning of policy rules, but with the SELinux features that the currently running kernel supports. In the preceding output, 28 is the highest policy version that the kernel supports.
Every time a new feature is added to SELinux, the version number is increased. The policy file itself (which contains all the SELinux rules loaded at boot time by the system) can be found in /etc/selinux/targeted/policy (where targeted refers to the policy store used; if the system uses a policy store named strict, the path would be /etc/selinux/strict/policy). If multiple policy files exist, we can use the output of seinfo to find out which policy file is used:

# seinfo
Statistics for policy file: /etc/selinux/targeted/policy/policy.30
Policy Version & Type: v.30 (binary, mls)
...

The next table provides the current list of policy feature enhancements and the Linux kernel version in which each feature was introduced. Many of the features are only of concern to policy developers, but knowing the evolution of the features gives us a good idea about the evolution of SELinux.

Version  Linux kernel  Description
12       -             The old API for SELinux, now deprecated.
15       2.6.0         Introduced the new API for SELinux.
16       2.6.5         Added support for conditional policy extensions.
17       2.6.6         Added support for IPv6.
18       2.6.8         Added support for fine-grained netlink socket permissions.
19       2.6.12        Added support for MLS.
20       2.6.14        Reduced the size of the access vector table.
21       2.6.19        Added support for MLS range transitions.
22       2.6.25        Introduced policy capabilities.
23       2.6.26        Added support for per-domain permissive mode.
24       2.6.28        Added support for explicit hierarchy (type bounds).
25       2.6.39        Added support for filename-based transitions.
26       3.0           Added support for role transitions for non-process classes and for role attributes.
27       3.5           Added support for flexible inheritance of user and role for newly created objects.
28       3.5           Added support for flexible inheritance of type for newly created objects.
29       3.14          Added support for attributes within SELinux constraints.
30       4.3           Added support for extended permissions, first implemented on IOCTL controls. Enhanced SELinux Xen support.

(Table: History of SELinux feature evolution)

By default, when an SELinux policy is built, the highest supported version as defined by the Linux kernel and libsepol (the library responsible for building the SELinux policy binary) is used. Administrators can force a lower version using the policy-version parameter in /etc/selinux/semanage.conf.

Different policy content

Besides the aforementioned policy capabilities, the main difference between policies (and distributions) is the policy content itself. We already covered that most distributions base their policy on the reference policy project. But although that project is considered the master for most distributions, each distribution deviates from the main policy set in its own way. Many distributions make extensive additions to the policy without directly passing those changes to the upstream reference policy project. There are several possible reasons why this is not done directly:

- The policy enhancements or additions are still immature: Red Hat initially starts with policies being active but permissive, meaning the policies are not enforced. Instead, SELinux logs what it would have prevented and, based on those logs, the policies are enhanced. This ensures that a policy is only considered ready after a few releases.
- The policy enhancements or additions are too specific to the distribution: if a policy set is not reusable by other distributions, some distributions will opt to keep those policies to themselves, as the act of pushing changes to upstream projects takes quite some effort.
- The policy enhancements or additions haven't followed the upstream rules and guidelines: the reference policy has a set of guidelines that policies need to adhere to. If a policy set does not comply with these rules, it will not be accepted.
- The policy enhancements or additions do not implement the same security model as the reference policy project wants: as SELinux is a very extensive mandatory access control system, it is possible to write completely different policies.
- The distribution does not have the time or resources to push changes upstream.

As a result, SELinux policies can, content-wise, be quite different between distributions (and even between releases of the same distribution). Gentoo, for instance, aims to follow the reference policy project closely, with changes being merged within a matter of weeks.

Summary

In this article, we saw that SELinux offers a more fine-grained access control mechanism on top of the regular Linux access controls. SELinux is implemented through Linux Security Modules and uses labels to identify its resources and processes based on ownership (user), role, type, and even the security sensitivity and categorization of the resource. We covered how SELinux policies are handled within an SELinux-enabled system and briefly touched upon how policy writers structure policies.

Linux distributions implement SELinux policies that might differ a bit from each other based on supported features, such as sensitivity labels, the default behavior for unknown permissions, support for confinement levels, or specific constraints put in place such as UBAC. However, most of the policy rules themselves are similar and are even based on the same upstream reference policy project.

Further resources on this subject:

- SELinux - Highly Secured Web Hosting for Python-based Web Applications
- Introduction to Docker
- Booting the System

The Software-defined Data Center

Packt
14 Nov 2016
33 min read
In this article by Valentin Hamburger, author of the book Building VMware Software-Defined Data Centers, we are introduced to the software-defined data center (SDDC), a term coined by VMware to describe the move to a cloud-like IT experience. The term software-defined is the important bit of information: it basically means that every key function in the data center is performed and controlled by software instead of hardware. This opens up a whole new way of operating, maintaining, and also innovating in a modern data center.

But what does such an SDDC look like, and why is a whole industry pushing so hard towards its adoption? This question might also be the reason why you are reading this article, which is meant to provide a deeper understanding of the SDDC and to give practical examples and hints on how to build and run such a data center. It will also cover how to map business challenges to IT solutions, a practice that is becoming more and more important these days.

IT has come a long way from a pure back-office, task-oriented role in the early days to a business-relevant asset that can help organizations compete. There has been a major shift from a pure infrastructure provider role to a business enablement function. Today, most organizations' business is only as good as their internal IT agility and ability to innovate. There are many examples in various markets where a whole business branch was built on IT innovation, such as Netflix, Amazon Web Services, Uber, and Airbnb, just to name a few.

However, it is unfair to compare any startup with a traditional organization. A startup has one application to maintain and has to build up a customer base. A traditional organization has a proven and wide customer base and many applications to maintain. So it needs to adapt its internal IT to become a digital enterprise, with all the flexibility and agility of a startup, while also maintaining the trust in and control over its legacy services.

This article will cover the following points:

- Why there is a demand for SDDC in IT
- What an SDDC is
- Understanding the business challenges and mapping them to SDDC deliverables
- The relation between an SDDC and an internal private cloud
- Identifying new data center opportunities and possibilities
- Becoming a center of innovation to empower your organization's business

The demand for change

Today, organizations face various challenges in the market to stay relevant. The biggest shift was clearly introduced by smartphones and tablets. They were not just computers in a smaller device; they changed the way IT is delivered and consumed by end users. These devices proved that consuming and installing applications can be simple: just search an app store, choose what you like, and use it for as long as you like it. If you do not need it any longer, simply remove it. All with very simple commands and easy-to-use gestures.

More and more people rely on IT services by using a smartphone as their terminal to almost everything. These devices created a demand for fast and easy application and service delivery. So in a way, smartphones have not only transformed the whole mobile market, they have also transformed how modern applications and services are delivered from organizations to their customers.
Although it would be quite unfair to compare a large enterprise data center with an app store, or enterprise service delivery with app installs on a mobile device, there are startups and industries that rely solely on the smartphone as the target for their services, such as Uber or WhatsApp. On the other hand, smartphone apps also introduce a whole new way of delivering IT services, since a company never knows how many people will use an app simultaneously, yet in the backend it still has to run web servers and databases to continuously provide content and data for these apps.

This also introduces a new value model for all other companies. People start to judge a company by the quality of the smartphone apps it offers, and people have started to migrate to companies that offer better smartphone integration than the ones they previously used. This is not bound to a single industry but affects a broad spectrum of industries today, such as the financial industry, car manufacturers, insurance groups, and even food retailers, just to name a few.

A classic data center structure might not be ideal for quick and seamless service delivery. These architectures are created by projects to serve a particular use case for a couple of years. Examples of such bigger application environments are web server farms, traditional SAP environments, or a data warehouse. Traditionally, these were designed with an assumption about their growth and use. Special project teams set them up across the data center pillars, and typically those project teams separate after the application environment has been completed.

All these pillars in the data center are required to work together, but each of them also needs to mind its own business. Usually those different divisions also have their own processes, which may then integrate into a data-center-wide process. There was a good reason to structure a data center in this way: the simple fact that nobody can be an expert in every discipline. Companies started to create groups to operate certain areas of the data center, each building its own expertise for its own subject. This evolved into the most widely applied model for IT operations within organizations. Many, if not all, bigger organizations have adopted this approach, and people have built their careers on these definitions. It served IT well for decades and ensured that each party added its best knowledge to any given project.

However, this setup has one flaw: it has not been designed for massive change and scale. The bigger these divisions get, the slower they can react to requests from other groups in the data center. This introduces a bi-directional issue: since all groups may grow at a similar rate, the overall service delivery time might also increase exponentially. Unfortunately, this also introduces a cost factor when it comes to service deployments across these pillars. Each new service an organization might introduce or develop will require each area of IT to contribute. Traditionally, this is done by human handovers from one department to the other. Each of these handovers delays the overall project time or service delivery time, which is often referred to as time to market: the time interval from the request for a new service to its actual delivery. It is important to mention that this is a level of complexity every modern organization has to deal with when it comes to application deployment today.
The difference between organizations might lie in the size of the separate units, but the principle is always the same. Most organizations try to bring their overall service delivery time down to become quicker and more agile, for business reasons as well as IT cost reasons. In some organizations, the time to deliver a brand new service from request to final rollout may take 90 working days. This means a requestor might wait 18 weeks, or more than four and a half months, from requesting a new business service to its actual delivery. Do not forget that this reflects the complete service delivery, across all groups, until it is ready for production. Also, after these 90 days the requirements of the original request might have changed, which would mean repeating the entire process.

Often, a quicker time to market is driven by the lines of business (LOB) owners to respond to a competitor in the market who might already deliver their services faster. This means that today's IT has changed from a pure internal service provider to a business enabler, supporting its organization in fighting the competition with advanced and innovative services. While this is a great chance for the IT department to enable and support its organization's business, it also introduces a threat at the same time. If the internal IT struggles to deliver what the business is asking for, the business may turn to shadow IT within the organization.

The term shadow IT describes a situation where either the LOBs of an organization or its application developers have grown so disappointed with the internal IT delivery times that they use an external provider for their requirements. This behavior is not sanctioned by IT security and can lead to serious business or legal trouble. It happens more often than one might expect, and it can be as simple as putting some internal files on a public cloud storage provider. These services grant quick results: it is as simple as register, download, use. They are very quick at enrolling new users and sometimes provide limited use for free. The developer or business owner might not even be aware that something non-compliant is going on while using these services.

So besides the business demand for quicker service delivery and the security aspect, an organization's IT department now also faces the pressure of staying relevant. But an SDDC can provide much more value to IT than just staying relevant. The automated data center will be an enabler for innovation and trust and will introduce a new era of IT delivery. It can not only provide faster service delivery to the business, it can also enable new services or offerings that help the whole organization be innovative for its customers or partners.

Business challenges—the use case

Today's business strategies often involve a digital delivery of services of some kind. This implies that the requirements a modern organization has towards its internal IT have changed drastically. Unfortunately, the business owners and the IT department tend to have communication issues in some organizations. Sometimes they even operate completely disconnected from each other, as if each of them were their own small company within the organization. Nevertheless, a lot of data center automation projects are driven by enhanced business requirements. In some of these cases, the IT department has not been made aware of what these business requirements look like, or even what the actual business challenges are.
Sometimes IT gets as little information as: "We are doing cloud now." This is a dangerous simplification, since the use case is key when it comes to designing and identifying the right solution to solve the organization's challenges. It is important to get the requirements from both sides: the IT delivery side as well as the business requirements and expectations. Here is a simple example of how a use case might be identified and mapped to a technical implementation.

The business view

John works as a business owner in an insurance company. He recognizes that their biggest competitor in the market has started to offer a mobile application to its clients. The app is simple, allows online contract management, and tells clients which products they have enrolled in, as well as rich information about contract timelines and possible consolidation options. He asks his manager to start a project to deliver such an application to their customers as well. Since it is only a simple smartphone application, he expects that its development might take a couple of weeks before a beta phase can start. To be competitive, he estimates that they should have something usable for their customers within a maximum of five months. Based on these facts, he gets approval from his manager to request such a product from the internal IT.

The IT view

Tom is the data center manager of this insurance company. He is informed that the business wants a smartphone application that does all kinds of things for new and existing customers. He is responsible for creating a project, bringing all the necessary people on board to support it, and finally delivering the service to the business. The programming of the app will be done by an external consulting company. Tom discusses a couple of questions regarding this request with his team:

How many users do we need to serve?
How much time do we need to create this environment?
What is the expected level of availability?
How much compute power/disk space might be required?

After a round of brainstorming and intense discussion, the team is still quite unsure how to answer these questions. For every question there are a couple of variables the team cannot predict. Will only a few of their thousands of users adopt the app, and what if they undersize the middleware environment? What if user adoption rises within a couple of days? What if it drops and the environment is overpowered, and therefore the cost is too high?

Tom and his team identify that they need a dynamic solution to be able to serve the business request. He creates a mapping to match possible technical capabilities to the use case. After this mapping is completed, he uses it to discuss with his CIO if and how it can be implemented.

Business challenge: an easy-to-use app to win new customers and keep existing ones.

Question: How many users do we need to serve?
IT capability: Dynamic scaling of an environment based on actual performance demand.

Question: How much time do we need to create this environment?
IT capability: To fulfill the expectations, the environment needs to be flexible. Start small – scale big.

Question: What is the expected level of availability?
IT capability: Analytics and monitoring across all layers, including a possible self-healing approach.

Question: How much compute power/disk space might be required?
IT capability: Create compute nodes based on actual performance requirements on demand. Introduce a capacity-on-demand model for required resources.
Given this table, Tom realized that with their current data center structure it would be quite difficult to deliver what the business is asking for. He also received a couple of requirements from other departments that are going in a similar direction. Based on these mappings, he identified that they need to change their way of deploying services and applications. They will need to use a fair amount of automation, and they have to span these functionalities across every data center department as a holistic approach, as shown in the following diagram:

In this example, Tom has actually identified a very strong use case for an SDDC in his company. Based on the actual business requirements of a "simple" application, the whole IT delivery of this company needs to adapt. While this may sound like pure fiction, these are the challenges modern organizations need to face today. It is very important to identify the required capabilities for the entire data center and not just for a single department. You will also have to serve the legacy applications and bring them onto the new model. Therefore, it is important to find a solution that serves the new business case as well as the legacy applications. In the first stage of any SDDC introduction in an organization, it is key to always keep an eye on the big picture.

Tools to enable SDDC

There is a basic and broadly accepted declaration of what an SDDC needs to offer. It can be considered the second evolutionary step after server virtualization. It offers an abstraction layer on top of the infrastructure components such as compute, storage, and network by using automation and tools such as a self-service catalog. In a way, it represents a virtualization of the whole data center with the purpose of simplifying the request and deployment of complex services. Other capabilities of an SDDC are:

Automated infrastructure/service consumption
Policy-based service and application deployment
Changes to services can be made easily and instantly
All infrastructure layers are automated (storage, network, and compute)
No human intervention is needed for infrastructure/service deployment
A high level of standardization is used
Business logic is available for chargeback or showback functionality

All of the preceding points define an SDDC technically. But it is important to understand that an SDDC is meant to solve the business challenges of the organization running it. That means that, based on the actual business requirements, each SDDC will serve a different use case. Of course, there is a main setup you can adopt and roll out, but it is important to understand your organization's business challenges in order to prevent any planning or design shortcomings.

To realize this functionality, an SDDC needs a couple of software tools. These are designed to work together to deliver a seamless environment. The different parts can be seen like gears in a watch, where each gear has an equally important role in making the clockwork function correctly. It is important to remember this when building your SDDC, since missing one part can make another very complex or even impossible to implement afterwards. This is the list of VMware tools building an SDDC:

vRealize Business for Cloud
vRealize Operations Manager
vRealize Log Insight
vRealize Automation
vRealize Orchestrator
vRealize Automation Converged Blueprint
vRealize Code Stream
VMware NSX
VMware vSphere

vRealize Business for Cloud is a chargeback/showback tool. It can be used to track the cost of services as well as the cost of a whole data center.
Since the agility of an SDDC is much higher than that of a traditional data center, it is important to also track and show the cost of adding new services. This is not only important from a financial perspective; it also serves as a control mechanism to ensure users are not deploying uncontrolled services and leaving them running even when they are not required anymore.

vRealize Operations Manager serves basically two functions. One is to help with the troubleshooting and analytics of the whole SDDC platform. It has an analytics engine, which applies machine learning to the behavior of its monitored components. The other important function is capacity management. It is capable of providing what-if analyses and informs you about possible resource shortfalls well before they occur. These functions also use the machine learning algorithms and become more accurate over time. This is very important in a dynamic environment where on-demand provisioning is granted.

vRealize Log Insight is a unified log management solution. It offers rich functionality and can search and profile a large number of log files in seconds. It is recommended to use it as a universal log endpoint for all components in your SDDC. This includes all OSes as well as applications and also your underlying hardware. In the event of an error, it is much simpler to have a central log management system that is easily searchable and delivers an outcome in seconds.

vRealize Automation (vRA) is the base automation tool. It provides the cloud portal used to interact with your SDDC. The portal offers the business logic such as service catalogs, service requests, approvals, and application life cycles. However, it relies strongly on vRealize Orchestrator for its technical automation part. vRA can also tap into external clouds to extend the internal data center. Extending an SDDC is mostly referred to as hybrid cloud. There are a couple of supported cloud offerings vRA can manage.

vRealize Orchestrator (vRO) provides the workflow engine and the technical automation part of the SDDC. It is literally the orchestrator of your new data center. vRO can easily be bound together with vRA to form a very powerful automation suite, where anything with an application programming interface (API) can be integrated. It is also required to integrate third-party solutions into your deployment workflows, such as a configuration management database (CMDB), IP address management (IPAM), or ticketing systems via IT service management (ITSM).

vRealize Automation Converged Blueprint was formerly known as vRealize Automation Application Services and is an add-on functionality to vRA which takes care of application installations. It can be used with pre-existing scripts (like Windows PowerShell or Bash on Linux) but also with variables received from vRA. This makes it very powerful when it comes to on-demand application installations. This tool can also make use of vRO to provide even better capabilities for complex application installations.

vRealize Code Stream is an addition to vRA and serves specific use cases in the DevOps area of the SDDC. It can be used with various development frameworks such as Jenkins. It can also be used as a tool for developers to build and operate their own software test, QA, and deployment environments. Not only can developers build these separate stages, the migration from one stage to another can also be fully automated by scripts.
This makes it a very powerful tool when it comes to staging and deploying modern and traditional applications within the SDDC.

VMware NSX is the network virtualization component. Given the complexity some applications and services might introduce, NSX provides a good and profound solution to help solve it. The challenges it addresses include:

Dynamic network creation
Microsegmentation
Advanced security
Network function virtualization

VMware vSphere is mostly the base infrastructure and is used as the hypervisor for server virtualization. You are probably familiar with vSphere and its functionality. However, since the SDDC introduces a change to your data center architecture, it is recommended to revisit some of the vSphere functionality and configuration. By using the full potential of vSphere, it is possible to save effort on the automation aspects as well as on the service/application deployment part of the SDDC.

This represents the toolbox required to build the platform for an automated data center. All of these tools will bring tremendous value and possibilities, but they will also introduce change. It is important that this change is addressed as part of the overall SDDC design and installation effort. Embrace the change.

The implementation journey

While a big part of this article focuses on building and configuring the SDDC, it is important to mention that there are also non-technical aspects to consider. Creating a new way of operating and running your data center will always involve people, so it is important to briefly touch on this part of the SDDC as well. There are three major topics relevant for every successful SDDC deployment, as shown in the following image. As with the tools, these three disciplines need to work together to enable the change and to make sure that all benefits can be fully leveraged. These three categories are:

People
Process
Technology

The process category

Data center processes are as established and settled as IT itself. From the first operator tasks, like changing tapes or starting procedures, up to highly sophisticated processes that ensure service deployment and management work as expected, they have already come a long way. However, some of these processes might not be fit for purpose anymore once automation is applied to a data center. To build an SDDC, it is very important to revisit data center processes and adapt them to work with the new automation tasks. The tools will offer integration points into processes, but it is equally important to remove bottlenecks from the processes as well. Keep in mind that if you automate a bad process, the process will still be bad – just fully automated. So it is also necessary to revisit those processes so that they become lean and effective as well.

Remember Tom, the data center manager. He successfully identified that they need an SDDC to fulfill the business requirements and also mapped the use case to IT capabilities. While this mapping mainly describes what IT needs to deliver technically, it also implies that the current IT processes need to adapt to this new delivery model.

The process change example in Tom's organization

If the compute department works on a service involving OS deployment, they need to fill out an Excel sheet with IP addresses and server names and send it to the networking department.
The network admins ensure that there is no double booking by reserving the IP address and approving the requested host name. After successfully proving the uniqueness of this data, the name and IP get added to the organization's DNS server. The manual part of this process is no longer feasible once the data center enters the automation era – imagine that every time somebody orders a service involving a VM/OS deployment, the network department gets an e-mail containing the Excel sheet with the IP and host name combination. The whole process would have to stop until this step is manually finished.

To overcome this, the process has to be changed to use an automated solution for IPAM. The new process has to track IPs and host names programmatically to ensure there is no duplication within the entire data center. Also, after successfully checking the uniqueness of the data, it has to be added to the Domain Name System (DNS). A small sketch of what such a programmatic reservation step could look like follows at the end of this section. While this is a simple example of one small process, there is normally a large number of processes involved that need to be reviewed for a fully automated data center. This is a very important task and should not be underestimated, since it can be a differentiator between success and failure of an SDDC. Think about all the other processes in place that are used to control the deploy/enable/install mechanics in your data center. Here is a small example list of questions to ask regarding established processes:

What is our current IPAM/DNS process?
Do we need to consider a CMDB integration?
What is our current ticketing process? (ITSM)
What is our process to get resources from network, storage, and compute?
What OS/VM deployment process is currently in place?
What is our process to deploy an application (handovers, steps, or departments involved)?
What does our current approval process look like?
Do we need a technical approval to deliver a service?
Do we need a business approval to deliver a service?
What integration process do we have for a service/application deployment? DNS, Active Directory (AD), Dynamic Host Configuration Protocol (DHCP), routing, Information Technology Infrastructure Library (ITIL), and so on

The approval questions are normally an exception to the automation part, since approvals are meant to be manual in the first place (either technical or business). If all the other answers to these example questions involve human interaction as well, consider changing those processes to be fully automated by the SDDC. Since human intervention creates waiting times, it has to be avoided during service deployments in any automated data center. Think of the robotic assembly lines today's car manufacturers are using. The processes they have implemented, developed over years of experience, are all designed to stop the line only in case of an emergency. The same holds true for the SDDC – try to enable the automated deployment through your processes, and stop the automation only in case of an emergency.

Identifying processes is the simple part; changing them is the tricky part. However, keep in mind that this is an all-new model of IT delivery, so there is no golden way of doing it. Once you have committed to changing those processes, keep monitoring whether they truly fulfill their requirements. This leads to another process principle in the SDDC: Continual Service Improvement (CSI). Revisit what you have changed from time to time and make sure that those processes are still working as expected; if they don't, change them again.
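Returning to the IPAM example, the following is a minimal sketch of what a programmatic reservation step could look like inside a deployment workflow. Everything here is hypothetical: the ipam.example.local endpoint, the /reservations path, and the response fields are stand-ins for whatever IPAM product and API your organization actually uses, and the runtime is assumed to provide the Fetch API (for example, Node.js 18+).

    // Hypothetical IPAM reservation step for an automated deployment workflow.
    interface IpReservation {
      hostname: string;   // unique host name approved by the IPAM system
      ipAddress: string;  // next free address from the requested subnet
    }

    async function reserveAddress(hostname: string, subnet: string): Promise<IpReservation> {
      const response = await fetch('https://ipam.example.local/api/v1/reservations', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ hostname, subnet })
      });
      if (!response.ok) {
        // Stop the workflow here instead of handing a half-finished request to another team.
        throw new Error(`IPAM reservation failed: ${response.status}`);
      }
      return response.json() as Promise<IpReservation>;
    }

    // Usage inside the VM provisioning workflow: reserve first, then register in DNS.
    // reserveAddress('app01', '10.20.30.0/24').then(r => console.log(r.ipAddress));

The point of such a step is not the particular API; it is that the reservation, uniqueness check, and DNS registration happen without an e-mail or an Excel handover in between.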
The people category

Since every data center is run by people, it is important to consider that a change of technology will also impact those people. There are claims that an SDDC can be run with only half the staff, or save a couple of employees, since everything is automated. The truth is, an SDDC will transform IT roles in a data center. This means that some classic roles might vanish, while others will be added by this change. It is unrealistic to say that you can run an automated data center with half the staff. But it is realistic to say that your staff can concentrate on innovation and development instead of spending 100% of their time keeping the lights on. This is the change an automated data center introduces. It opens up the possibility for current administrators to evolve into more architecture- and design-focused roles.

The people example in Tom's organization

Currently there are two admins in the compute department working for Tom. They manage and maintain the virtual environment, which is largely VMware vSphere. They create VMs manually, deploy an OS through a network install routine (which was a requirement for physical installs – so they kept the process), and then hand the ready VMs over to the next department to finish installing the service they are meant for. Recently they have experienced a lot of demand for VMs, and each of them configures 10 to 12 VMs per day. Given this, they cannot concentrate on other aspects of their job, like improving OS deployments or the handover process.

At first glance it seems like the SDDC might replace these two employees, since the tools will largely automate their work. But that is like saying a jackhammer will replace a construction worker. Actually, their roles will shift to a more architectural focus. They need to come up with a template for OS installations and improvements to further automate the deployment process. They might also need to add new services and components to the SDDC in order to continuously fulfill the business needs. So instead of creating all the VMs manually, they are now focused on designing a blueprint that can be replicated as easily and efficiently as possible. While their tasks have changed, their workforce is still important to operate and run the SDDC. And given that they now focus on design and architectural tasks, they also have the time to introduce innovative functions and additions to the data center.

Keep in mind that an automated data center affects all departments in an IT organization. This means that the tasks of the network and storage teams, as well as the application and database teams, will change too. In fact, in an SDDC it is quite impossible to operate the departments disconnected from each other, since a deployment will affect all of them. This also implies that all of these departments will have admins shifting to higher-level functions in order to make the automation possible. In the industry, this shift is often referred to as Operational Transformation. This basically means that not only do the tools have to be in place, you also have to change the way the staff operates the data center. In most cases, organizations decide to form a so-called center of excellence (CoE) to administer and operate the automated data center. This virtual group of admins is very similar to project groups in traditional data centers. The difference is that these people should be permanently assigned to the CoE for an SDDC.
Typically, you might have one champion from each department taking part in this virtual team. Each person acts as an expert and ambassador for their department. With this principle, it can be ensured that decisions and overlapping processes are well defined and ready to function across the departments. Also, as an ambassador, each participant should advertise the new functionality within their department and enable their colleagues to fully support the new data center approach. Good technological expertise as well as good communication skills are important for each member of the CoE.

The technology category

This is the third aspect of the triangle needed to successfully implement an SDDC in your environment. This is often the part where people spend most of their attention, sometimes ignoring one of the other two parts. However, it is important to note that all three topics need to be considered equally. Think of it like a three-legged chair: if one leg is missing, it can never stand.

The term technology does not necessarily refer only to the new tools required to deploy services. It also refers to already established technology which has to be integrated with the automation toolset (often referred to as third-party integration). This might be your AD, DHCP server, e-mail system, and so on. There might also be technology which does not enable or empower the data center automation, so instead of only thinking about adding tools, there might also be tools to be removed or replaced. This is a normal IT lifecycle task and has gone through many iterations already. Think of things like the fax machine or the telex – you might not use them anymore; they have been replaced by e-mail and messaging.

The technology example in Tom's organization

The team uses some tools to make their daily work easier when it comes to new service deployments. One of the tools is a little graphical user interface to quickly add content to AD. The admins use it to insert the host name and Organizational Unit, as well as to create the computer account. This was meant to save admin time, since they don't have to open all the various menus in the AD configuration to accomplish these tasks. With automated service delivery, this has to be done programmatically. Once a new OS is deployed, it has to be added to AD, including all requirements, by the deployment tool. Since AD offers an API, this can easily be automated and integrated into the deployment automation. Instead of painfully integrating the graphical tool, this is now done directly by interfacing with the organization's AD, ultimately replacing the old graphical tool.

The automated deployment of a service across the entire data center requires a fair amount of communication – not in the traditional way, but machine-to-machine communication leveraging programmable interfaces. Using such APIs is another important aspect of the applied data center technologies. Most of today's data center tools, from backup all the way up to web servers, come with APIs. The better the API is documented, the easier the integration into the automation tool. In some cases you might need the vendors to support you with the integration of their tools. If you have identified a tool in the data center which does not offer any API or even a command-line interface (CLI) option at all, try to find a way around this software or even consider replacing it with a new tool. APIs are the equivalent of handovers in the manual world.
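To make Tom's AD example a little more concrete, here is a hedged sketch of how the old GUI helper could be replaced by a programmatic step in the deployment workflow. The sketch shells out to the ActiveDirectory PowerShell module's New-ADComputer cmdlet; the OU path and host name are placeholders, and the orchestration tool you use may offer a native AD plug-in that makes this even simpler.

    // Hypothetical, automated replacement for the old AD helper GUI.
    import { execFile } from 'child_process';
    import { promisify } from 'util';

    const run = promisify(execFile);

    async function createComputerAccount(hostname: string, ouPath: string): Promise<void> {
      // Invokes the ActiveDirectory module on a Windows jump host; paths are placeholders.
      await run('powershell.exe', [
        '-NoProfile',
        '-Command',
        `New-ADComputer -Name '${hostname}' -Path '${ouPath}' -Enabled $true`
      ]);
      console.log(`Computer account ${hostname} created in ${ouPath}`);
    }

    // createComputerAccount('APP01', 'OU=Servers,DC=corp,DC=example,DC=com');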
The better the communication between tools works, the faster and easier the deployment will be completed. To coordinate and control all this communication, you will need far more than scripts. This is a task for an orchestrator, which can run all the necessary integration workflows from a central point. This orchestrator acts like the conductor of a big orchestra and forms the backbone of your SDDC.

Why are these three topics so important?

The technology aspect closes the triangle and brings the people and process parts together. If the processes are not altered to fit the new deployment methods, automation will be painful and complex to implement. If the deployment stops at some point because the processes require manual intervention, the people will have to fill in this gap. This means that they now have new roles but also need to maintain some of their old tasks to keep the process running. By introducing such an unbalanced implementation of an automated data center, the workload for people can actually increase, while the service delivery times may not dramatically decrease. This may lead to avoidance of the automated tasks, since manual intervention might be seen as faster by individual admins.

So it is very important to accept all three aspects as the main parts of the SDDC implementation journey. They all need to be addressed equally and thoughtfully to unveil the benefits and improvements an automated data center has to offer. However, keep in mind that this truly is a journey. An SDDC is not implemented in days but in months. Given this, the implementation team in the data center also has this time to adapt themselves and their processes to this new way of delivering IT services. All necessary departments and their leads need to be involved in this procedure. An SDDC implementation is always a team effort.

Additional possibilities and opportunities

All the previously mentioned topics serve the sole goal of installing and using the SDDC within your data center. However, once you have the SDDC running, the real fun begins, since you can start to introduce additional functionality impossible for any traditional data center. Let's briefly touch on some of the possibilities from an IT view.

The self-healing data center

This is a concept where the automatic deployment of services is connected to a monitoring system. Once the monitoring system detects that a service or environment may be facing constraints, it can automatically trigger an additional deployment for this service to increase throughput. While this is application dependent, for infrastructure services this can become quite handy. Think of automated ESXi host deployments if compute power is becoming a constraint, or datastore deployments if disk space is running low. If this automation acts too aggressively for your organization, it can be used with an approval function: once the monitoring detects a shortcoming, it will ask for approval to fix it with a deployment action. Instead of getting an e-mail from your monitoring system saying that a constraint has been identified, you get an e-mail with the constraint and the resolving action. All you need to do is approve the action.

The self-scaling data center

A similar principle is to use a capacity management tool to predict the growth of your environment. If it approaches a trigger, the system can automatically generate an order letter containing all the components needed to satisfy the growing capacity demands.
This can then be sent to finance or purchasing management for approval, and before you even run into any capacity constraints, the new gear might be available and ready to run. However, consider the regular turnaround time for ordering hardware, which affects how far in the future you have to set the trigger for such functionality.

Both of these opportunities are more than just nice-to-haves; they enable your data center to be truly flexible and proactive. Because an SDDC offers a high amount of agility, it will also need some self-monitoring to stay flexible and usable and to fulfill unpredicted demand.

Summary

In this article we discussed the main principles and declarations of an SDDC. It provided an overview of the opportunities and possibilities this new data center architecture provides. It also covered the changes that will be introduced by this new approach. Finally, it discussed the implementation journey and how it involves people, processes, and technology.

Resources for Article:

Further resources on this subject:

VM, It Is Not What You Think! [article]
Introducing vSphere vMotion [article]
Creating a VM using VirtualBox - Ubuntu Linux [article]

article-image-building-our-first-app-7-minute-workout
Building Our First App – 7 Minute Workout

Packt
14 Nov 2016
27 min read
In this article by Chandermani Arora and Kevin Hennessy, the authors of the book Angular 2 By Example, we will build a new app in Angular and, in the process, develop a better understanding of the framework. This app will also help us explore some new capabilities of the framework. (For more resources related to this topic, see here.)

The topics that we will cover in this article include the following:

7 Minute Workout problem description: We detail the functionality of the app that we build.
Code organization: For our first real app, we will try to explain how to organize code, specifically Angular code.
Designing the model: One of the building blocks of our app is its model. We design the app model based on the app's requirements.
Understanding the data binding infrastructure: While building the 7 Minute Workout view, we will look at the data binding capabilities of the framework, which include property, attribute, class, style, and event bindings.

Let's get started! The first thing we will do is define the scope of our 7 Minute Workout app.

What is 7 Minute Workout?

We want everyone reading this article to be physically fit. Our purpose is to stimulate your grey matter. What better way to do it than to build an app that targets physical fitness! 7 Minute Workout is an exercise/workout plan that requires us to perform a set of twelve exercises in quick succession within a seven-minute time span. 7 Minute Workout has become quite popular due to its benefits and the short duration of the workout. We cannot confirm or refute the claims, but doing any form of strenuous physical activity is better than doing nothing at all. If you are interested to know more about the workout, then check out http://well.blogs.nytimes.com/2013/05/09/the-scientific-7-minute-workout/.

The technicalities of the app include performing a set of 12 exercises, dedicating 30 seconds to each of the exercises. This is followed by a brief rest period before starting the next exercise. For the app that we are building, we will be taking rest periods of 10 seconds each. So, the total duration comes out to be a little more than 7 minutes. Once the 7 Minute Workout app is ready, it will look something like this:

Downloading the code base

The code for this app can be downloaded from the GitHub site https://github.com/chandermani/angular2byexample dedicated to this article. Since we are building the app incrementally, we have created multiple checkpoints that map to GitHub branches such as checkpoint2.1, checkpoint2.2, and so on. During the narration, we will highlight the branch for reference. These branches will contain the work done on the app up to that point in time. The 7 Minute Workout code is available inside the repository folder named trainer. So let's get started!

Setting up the build

Remember that we are building on a modern platform for which browsers still lack support. Therefore, directly referencing script files in HTML is out of the question (while common, it's a dated approach that we should avoid anyway). The current browsers do not understand TypeScript; as a matter of fact, even ES2015 (also known as ES6) is not supported. This implies that there has to be a process that converts code written in TypeScript into standard JavaScript (ES5), which browsers can work with. Hence, having a build setup for almost any Angular 2 app becomes imperative. Having a build process may seem like overkill for a small application, but it has some other advantages as well.
If you are a frontend developer working on the web stack, you cannot avoid Node.js. This is the most widely used platform for web/JavaScript development. So, no prizes for guessing that the Angular 2 build setup too is supported over Node.js, with tools such as Grunt, Gulp, JSPM, and webpack. Since we are building on the Node.js platform, install Node.js before starting.

While there are quite elaborate build setup options available online, we go for a minimal setup using Gulp. The reason is that there is no one-size-fits-all solution out there. Also, the primary aim here is to learn about Angular 2 and not to worry too much about the intricacies of setting up and running a build. Some of the notable starter sites plus build setups created by the community are as follows:

angular2-webpack-starter: http://bit.ly/ng2webpack
angular2-seed: http://bit.ly/ng2seed
angular-cli (it allows us to generate the initial code setup, including the build configurations, and has good scaffolding capabilities too): http://bit.ly/ng2-cli

A natural question arises if you are very new to Node.js or the overall build process: what does a typical Angular build involve? It depends! To get an idea about this process, it would be beneficial if we look at the build setup defined for our app. Let's set up the app's build locally then. Follow these steps to have the boilerplate Angular 2 app up and running:

Download the base version of this app from http://bit.ly/ng2be-base and unzip it to a location on your machine. If you are familiar with how Git works, you can just check out the branch base:

    git checkout base

This code serves as the starting point for our app.

Navigate to the trainer folder from the command line and execute these commands:

    npm i -g gulp
    npm install

The first command installs Gulp globally so that you can invoke the Gulp command-line tool from anywhere and execute Gulp tasks. A Gulp task is an activity that Gulp performs during the build execution. If we look at the Gulp build script (which we will do shortly), we realize that it is nothing but a sequence of tasks performed whenever a build occurs. The second command installs the app's dependencies (in the form of npm packages). Packages in the Node.js world are third-party libraries that are either used by the app or support the app's building process. For example, Gulp itself is a Node.js package. npm is a command-line tool for pulling these packages from a central repository.

Once Gulp is installed and npm pulls the dependencies from the npm store, we are ready to build and run the application. From the command line, enter the following command:

    gulp play

This compiles and runs the app. If the build process goes fine, the default browser window/tab will open with a rudimentary "Hello World" page (http://localhost:9000/index.html). We are all set to begin developing our app in Angular 2! But before we do that, it would be interesting to know what has happened under the hood.

The build internals

Even if you are new to Gulp, looking at gulpfile.js gives you a fair idea about what the build process is doing. A Gulp build is a set of tasks performed in a predefined order. The end result of such a process is some form of packaged code that is ready to be run. And if we are building our apps using TypeScript/ES2015 or some other similar language that browsers do not understand natively, then we need an additional build step, called transpilation.

Code transpiling

As it stands in 2016, browsers still cannot run ES2015 code.
While we are quick to embrace languages that hide the not-so-good parts of JavaScript (ES5), we are still limited by the browser's capabilities. When it comes to language features, ES5 is still the safest bet, as all browsers support it. Clearly, we need a mechanism to convert our TypeScript code into plain JavaScript (ES5). Microsoft has a TypeScript compiler that does this job. The TypeScript compiler takes the TypeScript code and converts it into ES5-format code that can run in all browsers. This process is commonly referred to as transpiling, and since the TypeScript compiler does it, it's called a transpiler. Interestingly, transpilation can happen at both build/compile time and runtime:

Build-time transpilation: Transpilation as part of the build process takes the script files (in our case, TypeScript .ts files) and compiles them into plain JavaScript. Our build setup uses build-time transpilation.
Runtime transpilation: This happens in the browser at runtime. We include the raw language-specific script files (.ts in our case), and the TypeScript compiler—which is loaded in the browser beforehand—compiles these script files on the fly. While runtime transpilation simplifies the build setup process, as a recommendation, it should be limited to development workflows only, considering the additional performance overhead involved in loading the transpiler and transpiling the code on the fly.

The process of transpiling is not limited to TypeScript. Every language targeted towards the Web, such as CoffeeScript, ES2015, or any other language that is not inherently understood by a browser, needs transpilation. There are transpilers for most languages, and the prominent ones (other than TypeScript) are traceur and babel. To compile TypeScript files, we can install the TypeScript compiler manually from the command line using this:

    npm install -g typescript

Once installed, we can compile any TypeScript file into ES5 format using the compiler (tsc.exe). But for our build setup, this process is automated using the ts2js Gulp task (check out gulpfile.js). And if you are wondering when we installed TypeScript… well, we did it as part of the npm install step, when setting up the code for the first time. The gulp-typescript package downloads the TypeScript compiler as a dependency.

With this basic understanding of transpilation, we can summarize what happens with our build setup:

The gulp play command kicks off the build process. This command tells Gulp to start the build process by invoking the play task.
Since the play task has a dependency on the ts2js task, ts2js is executed first.
ts2js compiles the TypeScript files (.ts) located in the src folder and outputs them to the dist folder at the root.
Post build, a static file server is started that serves all the app files, including static files (images, videos, and HTML) and script files (check the gulp.play task).
Thenceforth, the build process keeps a watch on any script file changes (the gulp.watch task) you make and recompiles the code on the fly.
livereload has also been set up for the app. Any changes to the code refresh the browser running the app automatically. In case the browser refresh fails, we can always do a manual refresh.

This is a rudimentary build setup required to run an Angular app. For complex build requirements, we can always look at the starter/seed projects that have a more complete and robust build setup, or build something of our own.
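As a small, illustrative aside, this is roughly what build-time transpilation does to our code. The file name below is made up; the point is only to show the shape of the transformation the ts2js task performs on every .ts file in src.

    // Hypothetical file src/greeter.ts -- input for build-time transpilation.
    export class Greeter {
      // The 'public' shorthand declares and assigns the class member in one go.
      constructor(public name: string) { }

      greet(): string {
        return `Hello, ${this.name}!`;   // template literal, an ES2015 feature
      }
    }

    // Compiling it with "tsc --target ES5 --module system src/greeter.ts" produces plain ES5:
    // the class becomes a constructor function, the template literal becomes ordinary string
    // concatenation, and all type annotations are erased. That ES5 output, deposited in dist,
    // is what SystemJS ultimately loads in the browser.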
Next, let's look at the boilerplate app code already there and the overall code organization.

Organizing code

This is how we are going to organize our code and other assets for the app: the trainer folder is the root folder for the app, and it has a folder (static) for the static content (such as images, CSS, audio files, and others) and a folder (src) for the app's source code. The organization of the app's source code is heavily influenced by the design of Angular and the Angular style guide (http://bit.ly/ng2-style-guide) released by the Angular team. The components folder hosts all the components that we create. We will be creating subfolders in this folder for every major component of the application. Each component folder will contain artifacts related to that component, which includes its template, its implementation, and other related items. We will also keep adding more top-level folders (inside the src folder) as we build the application.

If we look at the code now, the components/app folder has a root-level component, TrainerAppComponent, and a root-level module, AppModule, defined. bootstrap.ts contains code to bootstrap/load the application module (AppModule).

7 Minute Workout uses Just In Time (JIT) compilation to compile Angular views. This implies that views are compiled just before they are rendered in the browser. Angular has a compiler running in the browser that compiles these views. Angular also supports the Ahead Of Time (AoT) compilation model. With AoT, the views are compiled on the server side using a server version of the Angular compiler. The views returned to the browser are precompiled and ready to be used. For 7 Minute Workout, we stick to the JIT compilation model just because it is easier to set up compared to AoT, which requires server-side tweaks and package installation. We highly recommend that you use AoT compilation for production apps due to the numerous benefits it offers. AoT can improve the application's initial load time and reduce its size too. Look at the AoT platform documentation (cookbook) at http://bit.ly/ng2-aot to understand how AoT compilation can benefit you.

Time to start working on our first focus area, which is the app's model!

The 7 Minute Workout model

Designing the model for this app requires us to first detail the functional aspects of the 7 Minute Workout app, and then derive a model that satisfies those requirements. Based on the problem statement defined earlier, some of the obvious requirements are as follows:

Being able to start the workout.
Providing a visual clue about the current exercise and its progress. This includes providing a visual depiction of the current exercise, providing step-by-step instructions on how to do a specific exercise, and showing the time left for the current exercise.
Notifying the user when the workout ends.

Some other valuable features that we will add to this app are as follows:

The ability to pause the current workout.
Providing information about the next exercise to follow.
Providing audio clues so that the user can perform the workout without constantly looking at the screen. This includes a timer click sound, details about the next exercise, and signaling that the exercise is about to start.
Showing related videos for the exercise in progress and the ability to play them.

As we can see, the central theme for this app is workout and exercise. Here, a workout is a set of exercises performed in a specific order for a particular duration. So, let's go ahead and define the model for our workout and exercise.
Based on the requirements just mentioned, we will need the following details about an exercise:

The name. This should be unique.
The title. This is shown to the user.
The description of the exercise.
Instructions on how to perform the exercise.
Images for the exercise.
The name of the audio clip for the exercise.
Related videos.

With TypeScript, we can define the classes for our model. Create a folder called workout-runner inside the src/components folder and copy the model.ts file from the checkpoint2.1 branch folder workout-runner (http://bit.ly/ng2be-2-1-model-ts) to the corresponding local folder. model.ts contains the model definition for our app. The Exercise class looks like this:

    export class Exercise {
      constructor(
        public name: string,
        public title: string,
        public description: string,
        public image: string,
        public nameSound?: string,
        public procedure?: string,
        public videos?: Array<string>) { }
    }

TypeScript Tips
Passing constructor parameters with public or private is a shorthand for creating and initializing class members in one go. The ? suffix after nameSound, procedure, and videos implies that these are optional parameters.

For the workout, we need to track the following properties:

The name. This should be unique.
The title. This is shown to the user.
The exercises that are part of the workout.
The duration for each exercise.
The rest duration between two exercises.

So, the model class (WorkoutPlan) looks like this:

    export class WorkoutPlan {
      constructor(
        public name: string,
        public title: string,
        public restBetweenExercise: number,
        public exercises: ExercisePlan[],
        public description?: string) { }

      totalWorkoutDuration(): number { … }
    }

The totalWorkoutDuration function returns the total duration of the workout in seconds. WorkoutPlan has a reference to another class in the preceding definition—ExercisePlan. It tracks the exercise and the duration of the exercise in a workout, which is quite apparent once we look at the definition of ExercisePlan:

    export class ExercisePlan {
      constructor(
        public exercise: Exercise,
        public duration: number) { }
    }

These three classes constitute our base model, and we will decide in the future whether or not we need to extend this model as we start implementing the app's functionality. Since we have started with a preconfigured and basic Angular app, you just need to understand how the app bootstrapping occurs.

App bootstrapping

Look at the src folder. There is a bootstrap.ts file with only the execution bit (other than imports):

    platformBrowserDynamic().bootstrapModule(AppModule);

The bootstrapModule function call actually bootstraps the application by loading the root module, AppModule. The process is triggered by this call in index.html:

    System.import('app').catch(console.log.bind(console));

The System.import statement sets off the app bootstrapping process by loading the first module from bootstrap.ts. Modules defined in the context of Angular 2 (using the @NgModule decorator) are different from the modules SystemJS loads. SystemJS modules are JavaScript modules, which can be in different formats adhering to CommonJS, AMD, or ES2015 specifications. Angular modules are constructs used by Angular to segregate and organize its artifacts. Unless the context of discussion is SystemJS, any reference to module implies an Angular module. Now let's look at the details of how SystemJS loads our Angular app.

App loading with SystemJS

SystemJS starts loading the JavaScript module with the call to System.import('app') in index.html.
SystemJS starts by loading bootstrap.ts first. The imports defined inside bootstrap.ts cause SystemJS to then load the imported modules. If these module imports have further import statements, SystemJS loads them too, recursively. Finally, the platformBrowserDynamic().bootstrapModule(AppModule); function gets executed once all the imported modules are loaded.

For the SystemJS import function to work, it needs to know where the module is located. We define this in the file systemjs.config.js and reference it in index.html, before the System.import script:

    <script src="systemjs.config.js"></script>

This configuration file contains all of the necessary configuration for SystemJS to work correctly. Open systemjs.config.js; the app parameter to the import function points to a folder dist, as defined on the map object:

    var map = {
      'app': 'dist',
      ...
    };

The next variable, packages, contains settings that hint to SystemJS how to load a module from a package when no filename/extension is specified. For app, the default module is bootstrap.js:

    var packages = {
      'app': { main: 'bootstrap.js', defaultExtension: 'js' },
      ...
    };

Are you wondering what the dist folder has to do with our application? Well, this is where our transpiled scripts end up. As we build our app in TypeScript, the TypeScript compiler converts the .ts script files in the src folder to JavaScript modules and deposits them into the dist folder. SystemJS then loads these compiled JavaScript modules. The transpiled code location has been configured as part of the build definition in gulpfile.js. Look for this excerpt in gulpfile.js:

    return tsResult.js
      .pipe(sourcemaps.write())
      .pipe(gulp.dest('dist'))

The module specification used by our app can again be verified in gulpfile.js. Take a look at this line:

    noImplicitAny: true, module: 'system', target: 'ES5',

These are TypeScript compiler options, with one being module, that is, the target module definition format. The system module type is a new module format designed to support the exact semantics of ES2015 modules within ES5. Once the scripts are transpiled and the module definitions created (in the target format), SystemJS can load these modules and their dependencies. It's time to get into the thick of the action; let's build our first component.

Our first component – WorkoutRunnerComponent

To implement the WorkoutRunnerComponent, we need to outline the behavior of the application. What we are going to do in the WorkoutRunnerComponent implementation is as follows:

Start the workout.
Show the workout in progress and show the progress indicator.
After the time elapses for an exercise, show the next exercise.
Repeat this process until all the exercises are over.

Let's start with the implementation. The first thing that we will create is the WorkoutRunnerComponent implementation. Open the workout-runner folder in the src/components folder and add a new code file called workout-runner.component.ts to it. Add this chunk of code to the file:

    import {WorkoutPlan, ExercisePlan, Exercise} from './model'

    export class WorkoutRunnerComponent {
    }

The import module declaration allows us to reference the classes defined in the model.ts file in WorkoutRunnerComponent. We first need to set up the workout data.
Let's do that by adding a constructor and related class properties to the WorkoutRunnerComponent class:

    workoutPlan: WorkoutPlan;
    restExercise: ExercisePlan;
    constructor() {
      this.workoutPlan = this.buildWorkout();
      this.restExercise = new ExercisePlan(
        new Exercise("rest", "Relax!", "Relax a bit", "rest.png"),
        this.workoutPlan.restBetweenExercise);
    }

The buildWorkout method on WorkoutRunnerComponent sets up the complete workout, as we will see shortly. We also initialize a restExercise variable to track even the rest periods as exercises (note that restExercise is an object of type ExercisePlan).

The buildWorkout function is a lengthy function, so it's better if we copy the implementation from the workout runner's implementation available in the Git branch checkpoint2.1 (http://bit.ly/ng2be-2-1-workout-runner-component-ts). The buildWorkout code looks like this:

    buildWorkout(): WorkoutPlan {
      let workout = new WorkoutPlan("7MinWorkout", "7 Minute Workout", 10, []);
      workout.exercises.push(
        new ExercisePlan(
          new Exercise(
            "jumpingJacks",
            "Jumping Jacks",
            "A jumping jack or star jump, also called side-straddle hop is a physical jumping exercise.",
            "JumpingJacks.png",
            "jumpingjacks.wav",
            `Assume an erect position, with feet together and arms at your side. …`,
            ["dmYwZH_BNd0", "BABOdJ-2Z6o", "c4DAnQ6DtF8"]),
          30));
      // (TRUNCATED) Other 11 workout exercise data.
      return workout;
    }

This code builds the WorkoutPlan object and pushes the exercise data into the exercises array (an array of ExercisePlan objects), returning the newly built workout. The initialization is complete; now it's time to actually implement starting the workout. Add a start function to the WorkoutRunnerComponent implementation, as follows:

    start() {
      this.workoutTimeRemaining = this.workoutPlan.totalWorkoutDuration();
      this.currentExerciseIndex = 0;
      this.startExercise(this.workoutPlan.exercises[this.currentExerciseIndex]);
    }

Then declare the new variables used in the function at the top, with the other variable declarations:

    workoutTimeRemaining: number;
    currentExerciseIndex: number;

The workoutTimeRemaining variable tracks the total time remaining for the workout, and currentExerciseIndex tracks the currently executing exercise index. The call to startExercise actually starts an exercise. This is how the code for startExercise looks:

    startExercise(exercisePlan: ExercisePlan) {
      this.currentExercise = exercisePlan;
      this.exerciseRunningDuration = 0;
      let intervalId = setInterval(() => {
        if (this.exerciseRunningDuration >= this.currentExercise.duration) {
          clearInterval(intervalId);
        }
        else {
          this.exerciseRunningDuration++;
        }
      }, 1000);
    }

We start by initializing currentExercise and exerciseRunningDuration. The currentExercise variable tracks the exercise in progress and exerciseRunningDuration tracks its duration. These two variables also need to be declared at the top:

    currentExercise: ExercisePlan;
    exerciseRunningDuration: number;

We use the setInterval JavaScript function with a delay of 1 second (1,000 milliseconds) to track the exercise progress by incrementing exerciseRunningDuration. setInterval invokes the callback every second. The clearInterval call stops the timer once the exercise duration lapses.

TypeScript Arrow functions
The callback parameter passed to setInterval (()=>{…}) is a lambda function (or an arrow function in ES2015). Lambda functions are short-form representations of anonymous functions, with added benefits. You can learn more about them at https://basarat.gitbooks.io/typescript/content/docs/arrow-functions.html.
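The benefit that matters most for our startExercise code is that an arrow function does not create its own this; it captures the this of the enclosing component, so this.exerciseRunningDuration inside the callback still refers to the component instance. The following small comparison is purely illustrative and not part of the app's code:

    class Ticker {
      count = 0;

      startWithArrow() {
        // Arrow function: 'this' is the Ticker instance, so this.count updates as expected.
        setInterval(() => this.count++, 1000);
      }

      startWithFunction() {
        // Classic anonymous function: 'this' is NOT the Ticker instance here
        // (it is undefined in strict mode), so this.count++ would fail at runtime.
        setInterval(function () { /* this.count++ would not work */ }, 1000);
      }
    }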
As of now, we have a WorkoutRunnerComponent class. We need to convert it into an Angular component and define the component view. Add the import for Component and a component decorator (highlighted code):

import {WorkoutPlan, ExercisePlan, Exercise} from './model'
import {Component} from '@angular/core';

@Component({
selector: 'workout-runner',
template: `
<pre>Current Exercise: {{currentExercise | json}}</pre>
<pre>Time Left: {{currentExercise.duration - exerciseRunningDuration}}</pre>`
})
export class WorkoutRunnerComponent {

You already know how to create an Angular component. You understand the role of the @Component decorator, what selector does, and how the template is used. The JavaScript generated for the @Component decorator contains enough metadata about the component. This allows the Angular framework to instantiate the correct component at runtime.

Strings enclosed in backticks (` `) are a new addition to ES2015. Also called template literals, such string literals can be multiline and allow expressions to be embedded inside them (not to be confused with Angular expressions). Look at the MDN article at http://bit.ly/template-literals for more details.

The preceding template HTML will render the raw ExercisePlan object and the exercise time remaining. It has an interesting expression inside the first interpolation: currentExercise | json. The currentExercise property is defined in WorkoutRunnerComponent, but what about the | symbol and what follows it (json)? In the Angular 2 world, it is called a pipe. The sole purpose of a pipe is to transform/format template data. The json pipe here does JSON data formatting. To get a general sense of what the json pipe does, we can remove the json pipe plus the | symbol and render the template; we are going to do this next.

As our app currently has only one module (AppModule), we add the WorkoutRunnerComponent declaration to it. Update app.module.ts by adding the highlighted code:

import {WorkoutRunnerComponent} from '../workout-runner/workout-runner.component';
@NgModule({
imports: [BrowserModule],
declarations: [TrainerAppComponent, WorkoutRunnerComponent],

Now WorkoutRunnerComponent can be referenced in the root component so that it can be rendered. Modify src/components/app/app.component.ts as highlighted in the following code:

@Component({
...
template: `
<div class="navbar ...>
...
</div>
<div class="container ...>
<workout-runner></workout-runner>
</div>`
})

We have changed the root component template and added the workout-runner element to it. This will render the WorkoutRunnerComponent inside our root component. While the implementation may look complete, there is a crucial piece missing. Nowhere in the code do we actually start the workout. The workout should start as soon as we load the page. Component life cycle hooks are going to rescue us!

Component life cycle hooks
The life of an Angular component is eventful. Components get created, change state during their lifetime, and finally they are destroyed. Angular provides some life cycle hooks/functions that the framework invokes (on the component) when such an event occurs. Consider these examples:
When a component is initialized, Angular invokes ngOnInit
When a component's input properties change, Angular invokes ngOnChanges
When a component is destroyed, Angular invokes ngOnDestroy
As developers, we can tap into these key moments and perform some custom logic inside the respective component.
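Here is a short standalone sketch (not taken from the checkpoint code; the DemoTimerComponent name and its members are made up for illustration) showing two of these hooks working together. It also hints at why ngOnDestroy can matter for our app: startExercise creates an interval timer, and ngOnDestroy is the natural place to clear such a timer so it does not keep firing after the component is removed. The interfaces used in the sketch are explained right after it.

import { Component, OnInit, OnDestroy } from '@angular/core';

@Component({
  selector: 'demo-timer',
  template: `<span>{{secondsElapsed}}</span>`
})
export class DemoTimerComponent implements OnInit, OnDestroy {
  secondsElapsed: number = 0;
  private intervalId: any;

  ngOnInit() {
    // Invoked once the data-bound properties are initialized; start the timer here.
    this.intervalId = setInterval(() => this.secondsElapsed++, 1000);
  }

  ngOnDestroy() {
    // Invoked just before Angular destroys the component; stop the timer here.
    clearInterval(this.intervalId);
  }
}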
Angular has TypeScript interfaces for each of these hooks that can be applied to the component class to clearly communicate the intent. For example: class WorkoutRunnerComponent implements OnInit { ngOnInit (){ ... } ... The interface name can be derived by removing the prefix ng from the function names. The hook we are going to utilize here is ngOnInit. The ngOnInit function gets fired when the component's data-bound properties are initialized but before the view initialization starts. Add the ngOnInit function to the WorkoutRunnerComponent class with a call to start the workout: ngOnInit() { this.start(); } And implement the OnInit interface on WorkoutRunnerComponent; it defines the ngOnInit method: import {Component,OnInit} from '@angular/core'; … export class WorkoutRunnerComponent implements OnInit { There are a number of other life cycle hooks, including ngOnDestroy, ngOnChanges, and ngAfterViewInit, that components support; but we are not going to dwell into any of them here. Look at the developer guide (http://bit.ly/ng2-lifecycle) on Life cycle Hooks to learn more about other such hooks. Time to run our app! Open the command line, navigate to the trainer folder, and type this line: gulp play If there are no compilation errors and the browser automatically loads the app (http://localhost:9000/index.html), we should see the following output: The model data updates with every passing second! Now you'll understand why interpolations ({{ }}) are a great debugging tool. This will also be a good time to try rendering currentExercise without the json pipe (use {{currentExercise}}), and see what gets rendered. We are not done yet! Wait long enough on the index.html page and you will realize that the timer stops after 30 seconds. The app does not load the next exercise data. Time to fix it! Update the code inside the setInterval if condition: if (this.exerciseRunningDuration >= this.currentExercise.duration) { clearInterval(intervalId); let next: ExercisePlan = this.getNextExercise(); if (next) { if (next !== this.restExercise) { this.currentExerciseIndex++; } this.startExercise(next); } else { console.log("Workout complete!"); } } The if condition if (this.exerciseRunningDuration >= this.currentExercise.duration) is used to transition to the next exercise once the time duration of the current exercise lapses. We use getNextExercise to get the next exercise and call startExercise again to repeat the process. If no exercise is returned by the getNextExercise call, the workout is considered complete. During exercise transitioning, we increment currentExerciseIndex only if the next exercise is not a rest exercise. Remember that the original workout plan does not have a rest exercise. For the sake of consistency, we have created a rest exercise and are now swapping between rest and the standard exercises that are part of the workout plan. Therefore, currentExerciseIndex does not change when the next exercise is rest. Let's quickly add the getNextExercise function too. Add the function to the WorkoutRunnerComponent class: getNextExercise(): ExercisePlan { let nextExercise: ExercisePlan = null; if (this.currentExercise === this.restExercise) { nextExercise = this.workoutPlan.exercises[this.currentExerciseIndex + 1]; } else if (this.currentExerciseIndex < this.workoutPlan.exercises.length - 1) { nextExercise = this.restExercise; } return nextExercise; } The WorkoutRunnerComponent.getNextExercise returns the next exercise that needs to be performed. 
Note that the returned object from getNextExercise is an ExercisePlan object that internally contains the exercise details and the duration for which the exercise runs. The implementation is quite self-explanatory. If the current exercise is rest, take the next exercise from the workoutPlan.exercises array (based on currentExerciseIndex); otherwise, the next exercise is rest, given that we are not on the last exercise (the else if condition check).

With this, we are ready to test our implementation. So go ahead and refresh index.html. Exercises should flip after every 10 or 30 seconds. Great!

The current build setup automatically compiles any changes made to the script files when the files are saved; it also refreshes the browser after these changes. But just in case the UI does not update or things do not work as expected, refresh the browser window. If you are having a problem with running the code, look at the Git branch checkpoint2.1 for a working version of what we have done thus far. Or, if you are not using Git, download the snapshot of checkpoint2.1 (a zip file) from http://bit.ly/ng2be-checkpoint2-1. Refer to the README.md file in the trainer folder when setting up the snapshot for the first time.

We are now done with the component implementation.

Summary
We started this article with the aim of creating a moderately complex Angular app. The 7 Minute Workout app fitted the bill, and you learned a lot about the Angular framework while building this app. We started by defining the functional specifications of the 7 Minute Workout app. We then focused our efforts on defining the code structure for the app. To build the app, we started off by defining the model of the app. Once the model was in place, we started the actual implementation by building an Angular component. Angular components are nothing but classes that are decorated with a framework-specific decorator, @Component. We now have a basic 7 Minute Workout app. For a better user experience, we have added a number of small enhancements to it too, but we are still missing some good-to-have features that would make our app more usable.

Resources for Article:
Further resources on this subject:
Angular.js in a Nutshell [article]
Introduction to JavaScript [article]
Create Your First React Element [article]


The TensorFlow Toolbox

Packt
14 Nov 2016
6 min read
In this article by Saif Ahmed, author of the book Machine Learning with TensorFlow, we learned how most machine learning platforms are focused toward scientists and practitioners in academic or industrial settings. Accordingly, while quite powerful, they are often rough around the edges and have few user-experience features. (For more resources related to this topic, see here.) Quite a bit of effort goes into peeking at the model at various stages and viewing and aggregating performance across models and runs. Even viewing the neural network can involve far more effort than expected. While this was acceptable when neural networks were simple and only a few layers deep, today's networks are far deeper. In 2015, Microsoft won the annual ImageNet competition using a deep network with 152 layers. Visualizing such networks can be difficult, and peeking at weights and biases can be overwhelming. Practitioners started using home-built visualizers and bootstrapped tools to analyze their networks and run performance. TensorFlow changed this by releasing TensorBoard directly alongside their overall platform release. TensorBoard runs out of box with no additional installations or setup. Users just need to instrument their code according to what they wish to capture. It features plotting of events, learning rate and loss over time; histograms, for weights and biases; and images. The Graph Explorer allows interactive reviews of the neural network. A quick preview You can follow along with the code here: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/models/image/cifar10/cifar10_train.py The example uses the CIFAR-10 image set. The CIFAR-10 dataset consists of 60,000 images in ten classes compiled by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton. The dataset has become one of several standard learning tools and benchmarks for machine learning efforts. Let's start with the Graph Explorer. We can immediately see a convolutional network being used. This is not surprising as we're trying to classify images here. This is just one possible view of the graph. You can try the Graph Explorer as well. It allows deep dives into individual components. Our next stop on the quick preview is the EVENTS tab. This tab shows scalar data over time. The different statistics are grouped into individual tabs on the right-hand side. The following screenshot shows a number of popular scalar statistics, such as loss, learning rate, cross entropy, and sparsity across multiple parts of the network. The HISTOGRAMS tab is a close cousin as it shows tensor data over time. Despite the name, as of TensorFlow v0.7, it does not actually display histograms. Rather, it shows summaries of tensor data using percentiles. The summary view is shown in the following figure. Just like with the EVENTS tab, the data is grouped into tabs on the right-hand side. Different runs can be toggled on and off and runs can be shown overlaid, allowing interesting comparisons. It features three runs, which we can see on the left side, and we'll look at just the softmax function and associated parameters. For now, don't worry too much about what these mean, we're just looking at what we can achieve for our own classifiers. However, the summary view does not do justice to the utility of the HISTOGRAMS tab. Instead, we will zoom into a single graph to observe what is going on. This is shown in the following figure: Notice that each histogram chart shows a time series of nine lines. 
The top is the maximum, the middle is the median, and the bottom is the minimum. The three lines directly above and below the median are the one-and-a-half standard deviation, one standard deviation, and half standard deviation marks. Obviously, this does not represent multimodal distributions, as it is not a histogram. However, it does provide a quick gist of what would otherwise be a mountain of data to sift through. A couple of things to note are how data can be collected and segregated by runs, how different data streams can be collected, how we can enlarge the views, and how we can zoom into each of the graphs. Enough of graphics; let's jump into the code so we can run this for ourselves!

Installing TensorBoard
TensorFlow comes prepackaged with TensorBoard, so it will already be installed. It runs as a locally served web application accessible via the browser at http://0.0.0.0:6006. Conveniently, no server-side code or configuration is required. Depending on where your paths are, you may be able to run it directly, as follows:

tensorboard --logdir=/tmp/tensorlogs

If your paths are not correct, you may need to prefix the application accordingly, as shown in the following command line:

tf_install_dir/ tensorflow/tensorboard --logdir=/tmp/tensorlogs

On Linux, you can run it in the background and just let it keep running, as follows:

nohup tensorboard --logdir=/tmp/tensorlogs &

Some thought should be put into the directory structure though. The Runs list on the left side of the dashboard is driven by subdirectories in the logdir location. The following image shows two runs: MNIST_Run1 and MNIST_Run2. Having an organized runs folder will allow plotting successive runs side by side to see differences. When initializing the writer, you will pass in the log_location as the first parameter, as follows:

writer = tf.train.SummaryWriter(log_location, sess.graph_def)

Consider saving a base location and appending run-specific subdirectories for each run. This will help organize outputs without expending more thought on it. We'll discuss this more later.

Incorporating hooks into our code
The best way to get started with TensorBoard is by taking existing working examples and instrumenting them with the code required for TensorBoard. We will do this for several common training scripts.

Summary
In this article, we covered the major areas of TensorBoard—EVENTS, HISTOGRAMS, and viewing GRAPH. We modified popular models to see the exact changes required before TensorBoard could be up and running. This should have demonstrated the fairly minimal effort required to get started with TensorBoard.

Resources for Article:
Further resources on this subject:
Supervised Machine Learning [article]
Implementing Artificial Neural Networks with TensorFlow [article]
Why we need Design Patterns? [article]


Data Visualization with ggplot2

Janu Verma
14 Nov 2016
6 min read
In this post, we will learn about data visualization using ggplot2. ggplot2 is an R package for data exploration and visualization. It produces amazing graphics that are easy to interpret. The main use of ggplot2 is in exploratory analysis, and it is an important element of a data scientist’s toolkit. The ease with which complex graphs can be plotted using ggplot2 is probably its most attractive feature. It also allows you to slice and dice data in many different ways. ggplot2 is an implementation of A Layered Grammar of Graphics by Hadley Wickham, who is certainly the strongest R programmer out there. Installation Installing packages in R is very easy. Just type the following command on the R prompt. install.packages("ggplot2") Import the package in your R code. library(ggplot2) qplot We will start with the function qplot(). qplot is the simplest plotting function in ggplot2. It is very similar to the generic plot() function in basic R. We will learn how to plot basic statistical and exploratory plots using qplot. We will use the Iris dataset that comes with the base R package (and with every other data mining package that I know of). The Iris data consists of observations of phenotypic traits of three species of iris. In R, the iris data is provided as a data frame of 150 rows and 5 columns. The head command will print first 6 rows of the data. head(iris) The general syntax for the qplot function is: qplot(x,y, data=data.frame) We will plot sepal length against petal length: qplot(Sepal.Length, Petal.Length, data = iris) We can color each point by adding a color argument. We will color each point by what species it belongs to. qplot(Sepal.Length, Petal.Length, data = iris, color = Species) An observant reader would notice that this coloring scheme provides a way to visualize clustering. Also, we can change the size of each point by adding a size argument. Let the size correspond to the petal width. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width) Thus we have a visualization for four-dimensional data. The alpha argument can be used to increase the transparency of the circles. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width, alpha = I(0.7))   This reduces the over-plotting of the data. Label the graph using xlab, ylab, and main arguments. qplot(Sepal.Length, Petal.Length, data = iris, color = Species, xlab = "Sepal Length", ylab = "Petal Length", main = "Sepal vs. Petal Length in Fisher's Iris data")   All the above graphs were scatterplots. We can use the geom argument to draw other types of graphs. Histogram qplot(Sepal.Length, data = iris, geom="bar")   Line chart qplot(Sepal.Length, Petal.Length, data = iris, geom = "line", color = Species) ggplot Now we'll move to the ggplot() function, which has a much broader range of graphing techniques. We'll start with the basic plots similar to what we did with qplot(). First things first, load the library: library(ggplot2) As before, we will use the iris dataset. For ggplot(), we generate aesthetic mappings that describe how variables in the data are mapped to visual properties. This is specified by the aes function. Scatterplot ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point() This is exactly what we got for qplot(). The syntax is a bit unintuitive, but is very consistent. The basic structure is: ggplot(data.frame, aes(x=, y=, ...)) + geom_*(.) + .... Add Colors in the scatterplot. 
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point()

The geom argument
We can use other geoms to create different types of graphs, for example, a line chart:

ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_line() + ggtitle("Plot of sepal length vs. petal length")

Histogram
ggplot(iris, aes(x = Sepal.Length)) + geom_histogram(binwidth = .2)

Histogram with color. Use the fill argument:

ggplot(iris, aes(x = Sepal.Length, fill=Species)) + geom_histogram(binwidth = .2)

The position argument can fine-tune positioning to achieve useful effects; for example, we can adjust the position by dodging overlaps to the side.

ggplot(iris, aes(x = Sepal.Length, fill = Species)) + geom_histogram(binwidth = .2, position = "dodge")

Labeling
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_point() + ggtitle("Plot of sepal length vs. petal length")

size and alpha arguments
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species, size=Petal.Width)) + geom_point(alpha=0.7) + ggtitle("Plot of sepal length vs. petal length")

We can also transform variables directly in the ggplot call.

ggplot(iris, aes(x = log(Sepal.Length), y = Petal.Length/Petal.Width, color=Species)) + geom_point()

ggplot allows slicing of data. A way to split up the way we look at data is with the facets argument. These break the plot into multiple plots.

ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color=Species)) + geom_point() + facet_wrap(~Species) + ggtitle("Plot of sepal length vs. petal length")

Themes
We can use a whole range of themes for ggplot using the R package ggthemes.

install.packages('ggthemes', dependencies = TRUE)
library(ggthemes)

Essentially, you add a theme_*() call to the ggplot chain.

The Economist theme: For someone like me who reads The Economist regularly and might work there one day (ask for my resume if you know someone!!!!), it would be fun/useful to try to reproduce some of the graphs they publish. We may not have the data available, but we have the theme.

ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() + theme_economist()

Five Thirty Eight theme: Nate Silver is probably the most famous statistician (data scientist). His company, Five Thirty Eight, is a must for any data scientist. The good folks at ggthemes have created a 538 theme as well.

ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() + theme_fivethirtyeight()

About the author
Janu Verma is a researcher in the IBM T.J. Watson Research Center, New York. His research interests are mathematics, machine learning, information visualization, computational biology, and healthcare analytics. He has held research positions at Cornell University, Kansas State University, Tata Institute of Fundamental Research, Indian Institute of Science, and Indian Statistical Institute. He has written papers for IEEE Vis, KDD, International Conference on HealthCare Informatics, Computer Graphics and Applications, Nature Genetics, IEEE Sensors Journals, and so on. His current focus is on the development of visual analytics systems for prediction and understanding. He advises start-ups and companies on data science and machine learning in the Delhi-NCR area.


Getting Started with Flocker

Packt
11 Nov 2016
11 min read
In this article by Russ McKendrick, the author of the book Docker Data Management with Flocker, we are going to look both at what problems Flocker has been developed to resolve and at jumping in at the deep end to perform our first installation of Flocker. (For more resources related to this topic, see here.) By the end of the article, you will have:
Configured and used the default Docker volume driver
Learned a little about how Flocker came about and who wrote it
Installed and configured a Flocker control and storage node
Integrated your Flocker installation with Docker
Visualized your volumes using the Volume Hub
Interacted with your Flocker installation using the Flocker CLI
Installed and used dvol
However, before we start to look at both Docker and Flocker, we should talk about compute resources.

Compute and Storage Resource
In later articles, we will be looking at running Flocker on public cloud services such as Amazon Web Services and Google Cloud, so we will not be using those services for the practical examples in our early articles; instead, I will be using server instances launched in DigitalOcean. DigitalOcean has quickly become the platform of choice for both developers and system administrators who want to experiment with new technologies and services. While their cloud service is simple to use, their SSD-backed virtual machines offer the performance required to run modern software stacks at a fraction of the price of other public clouds, making them perfect for launching prototyping and sandbox environments. For more information on DigitalOcean, please see their website at: https://www.digitalocean.com/. The main reason I chose to use an external cloud provider rather than local virtual machines is that we will need to launch multiple hosts to run Flocker. You do not have to use DigitalOcean, as long as you are able to launch multiple virtual machines and attach some sort of block storage to your instances as a secondary device. Finally, where we are doing manual installations of Flocker and Docker, I will give instructions which cover both CentOS 7.2 and Ubuntu 16.04.

Docker
If you are reading this article then Docker does not need much of an introduction; it is one of the most talked-about technologies of recent years, quickly gaining support from pretty much all of what I would consider the big players in software. Companies such as Google, Microsoft, Red Hat, VMware, IBM, Amazon Web Services, and Hewlett Packard all offer solutions based on Docker's container technology or have contributed towards its development. Rather than giving you a current state of the union on Docker, I would recommend you watch the opening keynote from the 2016 Docker Conference; it gives a very good rundown of how far Docker has come over the last few years. The video can be found at the following URL: https://www.youtube.com/watch?v=vE1iDPx6-Ok. Now that you are all caught up with what's going on in the world of Docker, let's take a look at what storage options you get out of the box with the core Docker Engine.

Installing Docker Engine
Let's start with a small server instance, install Docker, and then run through a couple of scenarios where we may need storage for our containerized application:
Before installing Docker, it is always best to check that you have the latest updates installed; to do this, run one of the following commands. On CentOS you need to run:

yum -y update

and for Ubuntu run:

apt-get -y update

Both of these commands will check for and automatically install any available updates.
Once your server instance is up-to-date, you can go ahead and install Docker. The simplest way of doing this is to run the installation script provided by Docker by entering the following command:

curl -fsSL https://get.docker.com/ | sh

This method of installation works on both CentOS and Ubuntu; the script checks to see which packages need to be installed and then installs the latest version of the Docker Engine from Docker's own repositories. If you would prefer to check the contents of the script before running it on your server instance, you can go to https://get.docker.com/ in your browser; this will display the full bash script so you can see exactly what the script is doing when executed on your server instance. If everything worked as expected, then you will have the latest stable version of Docker installed. You can check this by running the following command:

docker --version

This will return the version and build of the version of Docker which was installed; in my case this is Docker version 1.12.0, build 8eab29e. On both operating systems you can check that Docker is running by using the following command:

systemctl status docker

If Docker is stopped then you can use the following command to start it:

systemctl start docker

The Docker local volume driver
Now that we have Docker installed and running on our server instance, we can start launching containers. From here, the instructions are the same whether you are running CentOS or Ubuntu. We are going to be launching an application called Moby Counter; this was written by Kai Davenport to demonstrate maintaining state between containers. It is a Node.js application which allows you to draw (using Docker logos) in your browser window. The coordinates for your drawing are stored in a Redis key value store, the idea being that if you stop, start, and even remove the Redis service, your drawing should persist:
Let's start by launching our Redis container; to do this, run the following command:

docker run -itd --name redis redis:alpine redis-server --appendonly yes

Using a backslash: As we will sometimes have a lot of options to pass to the commands we are going to be running, we are going to be using a backslash to split the command over multiple lines so that it's easier to follow what is going on, like in the command above.

The Docker Hub is the registry service provided by Docker. The service allows you to distribute your container images, either by uploading images directly or by building your Dockerfiles, which are stored in either GitHub or Bitbucket. We will be using the Docker Hub throughout the article; for more information on the Docker Hub, see the official documentation at: https://docs.docker.com/docker-hub/.

This will download the official Redis image (https://hub.docker.com/_/redis/) from the Docker Hub and then launch a container called redis by running the redis-server --appendonly yes command. Adding the --appendonly flag to the Redis server command means that each time Redis receives a command such as SET, it will not only execute the command on the dataset which is held in memory, it will also append it to what's called an AOF file. When Redis is restarted, the AOF file is replayed to restore the state of the dataset held in RAM. Note that we are using the Alpine Linux version of the container to keep the download size down. Alpine Linux is a Linux distribution which is designed to be small; in fact, the official Docker image based on Alpine Linux is only 5 MB in size!
Compared to the sizes of other base operating system images, such as the official Debian, Ubuntu, and CentOS images, it is easy to see why the majority of official Docker images are using Alpine Linux as their base image. Now that we have Redis up and running, let's launch our application container by running:

docker run -itd --name moby-counter -p 80:80 --link redis:redis russmckendrick/moby-counter

You can check that your two containers are running using the docker ps command; you should see something like the following terminal session: Now that you have the Moby Counter application container running, open your browser and go to the IP address of your server instance; in my case this was http://159.203.185.102. Once the page loads, you should see a white page which says Click to add logos… in the center. As per the instructions, click around the page to add some Docker logos: Now that we have some data stored in the Redis store, let's do the equivalent of yanking the power cord out of the container by running the following command:

docker rm -f redis

This will cause your browser to look something like the following screen capture: Relaunching the Redis container using the following command:

docker run -itd --name redis redis:alpine redis-server --appendonly yes

and refreshing our browser takes us back to the page which says Click to add logos… So what gives? Why didn't you see your drawing? It's quite simple: we never told Docker to start the container with any persistent storage. Luckily, the official Redis image is smart and assumes that you should really be using persistent storage to store your data in, even if you don't tell it you want to, as no one likes data loss. Using the docker volume ls command, you should be able to see that there are two volumes, both with really long names; in my case the output of the command looked like the following terminal capture: So we now have two Docker volumes, one which contains our original "masterpiece" and a second which is blank. Before we remove our Redis container again, let's check to see which of the two Docker volumes is currently mounted on our Redis container. We can do this by running:

docker inspect redis

There is quite a bit of information shown when running the docker inspect command; the information we are after is in the Mounts section, and we need to know the Name: As you can see from the terminal output above, the currently mounted volume in my setup is called b76101d13afc2b33206f5a2bba9a3e9b9176f43ce57f74d5836c824c22c. Now that we know the name of the blank volume, we know that the second volume has the data for our drawing. Let's terminate the running Redis container by running:

docker rm -f redis

and now launch our Redis container again, but this time telling Docker to use the local volume driver and also to use the volume we know contains the data for our drawing:

docker run -itd --name redis --volume-driver=local -v 37cb395253624782836cc39be1aa815682b70f73371abb6d500a:/data redis:alpine redis-server --appendonly yes

Make sure you replace the name of the volume which immediately follows the -v with the name of the volume you know contains your own data. After a few moments, refresh your browser and you should see your original drawing.
Our volume was automatically created by the Redis container as it launched, which is why it has a Unique Identification Number (UID) as its name. If we wanted to, we could give it a more friendly name by running the following command to launch our Redis container:

docker run -itd --name redis --volume-driver=local -v redis_data:/data redis:alpine redis-server --appendonly yes

As you can see when running the docker volume ls command again, this is a lot more identifiable: As we have seen with the simple example above, it is easy to create volumes to persist your data on using Docker Engine. However, there is one drawback to using the default volume driver, and that is that your volumes are only available on a single Docker host. For most people just setting out on their journey into using containers this is fine; however, for people who want to host their applications in production or would like a more resilient persistent data store, using the local driver quickly becomes a single point of failure. This is where Flocker comes in.

Summary
In this article we have worked through the most basic Flocker installation possible; we have integrated the installation with Docker and the Volume Hub, and used the Flocker command to interact with the volumes created and managed by Flocker. In the next article we are going to look at how Flocker can be used to provide more resilient storage for a cluster of Docker hosts, rather than just the single host we have been using in this article.

Resources for Article:
Further resources on this subject:
Docker Hosts [article]
Hands On with Docker Swarm [article]
Understanding Docker [article]

Manage Security in Excel

Packt
11 Nov 2016
10 min read
In this article by Belinda Allen, author of the book Building Dashboards with Microsoft Dynamics GP 2016 - Second Edition, we would learn how refreshable Excel reports are easy to deploy, easy to update, and easy to work with. That's why, they make a great foundation for an Excel 2016 dashboard. In this article, you will learn how to manage security to Excel reports and run these Excel reports from Dynamics GP 2016 as well as Excel 2016. (For more resources related to this topic, see here.) Security By default, users can view Excel reports and data connections only if they have administrative credentials on the server that is running the SQL Server and if they have access to the network share. Since this isn't a normal setup, users typically need reporting privileges in the SQL Server before they can view the Microsoft Dynamics GP data that is displayed in data connections and Excel reports. There are three areas of security around Excel reports deployed to a network share or local drive: Security to the network share/local folder Security at the database level Security around Excel We'll spend a few minutes on each one. Network share security Realistically, network share security is normally going to be set by a network administrator. To make a shortcut for administrators, the minimum required security on the shared folder is: Change option for the share tab Read option for the security tab Now, for those of you who want the version that is longer than a (as Mark Polino would say) Latvian wiener dog, follow these steps: In Windows Explorer, right-click on the folder where you deployed the Excel reports and then click on Sharing and Security… On the Sharing tab, click on Advanced Sharing… and select Share this folder. Click on Permissions. If the user or group already exists in this window, you can skip to the next step. Otherwise, follow these steps: Click on Add… In the Select Users, Computers, or Groups window, enter the group or users to whom you want to provide access to the shared reports Click on OK Select the user or group to apply permission to in the Group or user names area. Select the Allow checkbox for the Change permission and then click on OK. The Change permission is the minimum required permission. Click on the Security tab. In the Groups or user names area, click on Add. If the user or group already exists in this window, you can skip to the next step. Otherwise, follow these steps: In the Select Users, Computers, or Groups window, enter the group or the users to whom you want to provide access to the shared reports Click on OK In the Groups or user names area, select each group or user, and then click on the permission that you want the group or the user to have. The minimum required permission is Read. Click on OK. These instructions will vary depending on the version of the Windows server used on the network or the user's version of Windows on a local drive. If you are unsure about setting this up, consult your IT department. By default, Dynamics GP 2016 deploys reports related to each company and each functional area in their own network folder. This makes it easy to apply different permission levels to sensitive areas such as payroll. Database-level security Access to information in the Dynamics GP 2016 database is handled a little differently. A set of fixed security roles is created automatically in the SQL Server when Excel reports are deployed. All of these roles start with rpt_. These roles provide access to the underlying tables and views. 
The process to assign security is to add a user or group to the SQL Server and give them access to the appropriate roles. The users that get added are not Dynamics GP users. They are either SQL Server users (different from the GP login IDs) or active directory users and groups. To connect the SQL role with an Excel report to ensure that a user has appropriate access, you really need the spreadsheet from Microsoft that links the two together. You can find it at https://mbs.microsoft.com/fileexchange/?fileID=e4bb6958-0f07-4451-b72c-f02784e484df. This spreadsheet is from version GP 10, but it still works for GP 2016. In our example, we need access to the Account Summary Default Excel sheet. This sheet uses the Account Summary view. On the spreadsheet, we see a number of roles that include the appropriate access: For our example, we'll give a user access to the rpt_accounting manager role. In practice, it's not unusual to add all GP users to a single active directory group and give that group access to all the fixed reporting roles. This is particularly true for companies that don't use payroll and that don't have other sensitive reporting requirements. To grant database permission using the built-in roles, we have to add the user or group to the SQL Server and then assign the appropriate role(s). To add a user to SQL Server, follow these steps: Open SQL Server Management Studio and log in using either Windows Authentication or SQL Server Authentication. Go to Security | Logins. Right-click on Logins and select New Login… Click on Search. Enter the domain and user you want to add or enter the group that you want to add to the SQL Server. For my example, I'm entering my domain and user name—Njevityballen. This could also be a group of users such as GPUSERS, for example: Click on Check Names to validate the entry and click twice on OK to finish. The user has now been added to the SQL Server. Our example used a domain user, but you can also set up a SQL user. In general, a domain user is preferred, because it eliminates the need for the user to manage multiple logins and passwords for reporting. Using a domain login also provides additional control to administrators. If an employee leaves, for example, removing them from the domain removes both their network access and their reporting access in one step. To grant access to the reporting roles, follow these steps: Go to Security | Logins, double-click the user or group that you just created. Select User Mapping on the left-hand side. In the upper-center section labeled Users mapped to this login:, select the box next to the company that you want to grant report access to. For our example, select TWO. In the lower-center section named Database role membership for: TWO, select the box next to rpt_Accounting Manager: Click on OK to continue. The user now has rights to access the TWO AccountSummary default report that we've been working with and any other reports available as part of the rpt_Accounting Manager role. Excel 2016 security As you connect with database connections in Excel, a security bar may pop up with the message SECURITY WARNING External Data Connections have been disabled: This is an Excel security feature designed to prevent malicious code from running without the user's knowledge. In our case, however, we deployed the reports. We are now running them on our network and controlling access. This is about as secure as it's going to get, and the message is really annoying for users. Let's turn it off. 
To disable the Excel security message for these files, follow these steps: Open Microsoft Excel 2016 and go to File | Options | Trust Center. Go to Trusted Center Settings | Trusted Locations. Click on Add new location. Browse to the location where you deployed the Excel reports. In my example, I used C:GP2016XL. Click on OK. Select the box marked Subfolders of this location are also trusted and click on OK: Click on OK twice to exit. Now, when you run the Excel reports in the next section, the reports will open in Excel 2016 without the security warning. Microsoft offers a great knowledge base article on Excel reports and security at http://support.microsoft.com/kb/949524 for GP 10, but this portion of security remains the same. Running Excel reports Our next step is to run an Excel report. These reports can be run from Dynamics GP 2016, or they can be directly opened in Excel 2016. We will look at both these options. From Dynamics GP 2016 To run an Excel report from within Dynamics GP, follow these steps: In the navigation pane on the left-hand side, click on Financial. The list pane above will change to show financial items. In the list pane, click on Excel Reports. In the navigation list in the center, select TWO AccountSummary Default. Make sure that you select the Option column's options that includes Reports: Options that contain the word Reports open Excel reports. Options with Data Connections in the string indicate the data connector to build a new report, not an actual report. You can limit the Excel reports list to just Reports or Data Connections with the Add Filter button just above the Excel reports list. Double-click on the TWO AccountSummary Default item. We disabled the security warning earlier, but just in case, if Excel 2016 opens with a security warning at the top of the worksheet, click on Enable Content. Excel will open with live data from Microsoft Dynamics GP: As a test, highlight rows seven through 10 (7-10) on the left-hand side and press the Delete key. Go to Data | Refresh All on the ribbon. Excel 2016 will reconnect to Dynamics GP and bring back in the latest data. Saving the report with a different name in the same folder as the GP deployed reports will make that report visible in the list of Excel reports in GP. From Excel 2016 To accomplish this same task (run a GP Excel refreshable report) from Excel 2016, follow these steps: Open Windows Explorer and navigate to the location where you deployed the reports at the beginning of this article. In my example, the reports were deployed to C:GP2016XL. Drill down through the folders to Reports | TWO | Financial. This represents the report storage for the sample company's (TWO) financial reports: Double-click on TWO AccountSummary Default.xlsx. Excel 2016 will open with live data from Dynamics GP. Manual versus auto refresh Excel reports are refreshable, but that doesn't mean that they have to refresh automatically. Often accountants ask about saving a static version of the file. They love the idea of refreshing data, but they want it to happen on their terms. Most accountants prefer information that doesn't change once it's been finalized, so this request is perfectly natural. By default, the Dynamics GP 2016 connections are designed to refresh automatically when the file is opened, but you can control this. To understand how to control the refresh options, follow these steps: Start with the TWO AccountSummary Default Excel file that you already have open. 
In Excel, select the Data tab and then go to Connections | Properties: Uncheck the Refresh data when opening the file box and click on OK. Click on Close to return to the worksheet in Excel. To validate that this worked, select rows seven through 10 (7-10) in the Excel sheet and press Delete. Save the Excel sheet to your desktop as TWO AccountSummary Default Manual Refresh and close Excel 2016. To reopen the file, double-click on TWO AccountSummary Default Manual Refresh on the desktop. Excel will open with data, and rows seven through 10 (7-10) will be blank. The sheet did not refresh automatically. To manually refresh the sheet, right-click anywhere in the data area and click Refresh or select Data | Refresh All. Summary We've looked at one of the best methods for getting data for our dashboard. We've deployed, secured, run, and built Excel reports. Now that we've thoroughly explored one of the best ways to get real-time data out of Dynamics GP 2016 and into Microsoft Excel Resources for Article: Further resources on this subject: Planning: Microsoft Dynamics GP System [article] Microsoft Dynamics NAV: OS Integration [article] Microsoft Dynamics NAV 2009 Development Tools [article]


Creating Reusable Generic Modals in React and Redux

Mark Erikson
11 Nov 2016
6 min read
Modal dialogs are a common part of user interface design. As with most other parts of a UI, modals in a given application probably fall into two general categories: modals that are specific to a given feature or task, and modals that are intended to be generic and reusable. However, defining generic reusable modal components in a React/Redux application presents some interesting challenges. Here's one approach you can use to create generic reusable modals that can be used in a variety of contexts throughout a React/Redux application.

First, we need a way to manage modals in general. In a typical object-oriented widget API, we might manually create an instance of a modal class, and pass in some kind of callback function to do something when it's closed. Here's what this might look like for a ColorPicker modal in an OOP API:

const colorPickerInstance = new ColorPicker({
  initialColor : "red",
  onColorPicked(color) {
    // do something useful with the "returned" color value
  }
});
colorPickerInstance.show();

This presents some problems, though. Who really "owns" the ColorPicker? What happens if you want to show multiple modals stacked on each other? What happens with the ColorPicker instance while it's being displayed? In a React/Redux application, we really want our entire UI to be declarative, and to be an output of our current state. Rather than imperatively creating modal instances and calling show(), we'd really like any nested part of our UI to be able to "request" that some modal be shown, and have the state and UI updated appropriately to show the modal.

Dan Abramov describes a wonderful approach to React/Redux modal management on Stack Overflow, in response to a question about displaying modal dialogs in Redux. It's worth reading his answer in full, but here's a summary:

Dispatch an action that indicates you want to show a modal. This includes some string that can be used to identify which modal component should be shown, and includes any arbitrary values we want to be passed along to the rendered modal component:

dispatch({
  type : 'SHOW_MODAL',
  payload : {
    modalType : "SomeModalComponentIdentifier",
    modalProps : {
      // any arbitrary values here that we want to be passed to the modal
    }
  }
});

Have a reducer that simply stores the modalType and modalProps values for 'SHOW_MODAL', and clears them for 'HIDE_MODAL'.

Create a central component that connects to the store, retrieves the details of what modal is open and what its props should be, looks up the correct component type, and renders it:

import React from "react";
import {connect} from "react-redux";
import FirstModal from "./FirstModal";
import SecondModal from "./SecondModal";

// lookup table mapping string identifiers to component classes
const MODAL_COMPONENTS = {
  FirstModal,
  SecondModal
};

const ModalRoot = ({modalType, modalProps}) => {
  if(!modalType) return null;

  const SpecificModal = MODAL_COMPONENTS[modalType];
  return <SpecificModal {...modalProps} />
}

const mapState = state => state.modal;
export default connect(mapState)(ModalRoot);

From there, each modal component class can be connected to the store, retrieve any other needed data, and dispatch specific actions for both internal behavior as well as ultimately dispatching a 'HIDE_MODAL' action when it's ready to close itself. This way, the handling of modal display is centralized, and nested components don't have to "own" the details of showing a modal.

Unfortunately, this pattern runs into a problem when we want to create and use a very generic component, such as a ColorPicker.
We would probably want to use the ColorPicker in a variety of places and features within the UI, each needing to use the "result" color value in a different way, so having it dispatch a generic 'COLOR_SELECTED' action won't really suffice. We could include some kind of a callback function within the action, but that's an anti-pattern with Redux, because using non-serializable values in actions or state can break features like time-travel debugging. What we really need is a way to specify behavior specific to a feature, and use that from within the generic component.

The answer that I came up with is to have the modal component accept a plain Redux action object as a prop. The component that requested the dialog be shown should specify that action as one of the props to be passed to the modal. When the modal is closed successfully, it should copy the action object, attach its "return value" to the action, and dispatch it. This way, different parts of the UI can use the "return value" of the generic modal in whatever specific functionality they need. Here's how the different pieces look:

// In some arbitrary component:
const onColorSelected = {
  type : 'FEATURE_SPECIFIC_ACTION',
  payload : {
    someFeatureSpecificData : 42,
  }
};

this.props.dispatch({
  type : 'SHOW_MODAL',
  payload : {
    modalType : "ColorPicker",
    modalProps : {
      initialColor : "red",
      // Include the pre-configured action object as a prop for the modal
      onColorSelected
    }
  }
});

// In the ColorPicker component:
handleOkClicked() {
  if(this.props.onColorSelected) {
    // If the code that requested this modal included an action object,
    // clone the action, attach our "return value", and dispatch it
    const clonedAction = _.clone(this.props.onColorSelected);
    clonedAction.payload.color = this.state.currentColor;
    this.props.dispatch(clonedAction);
  }

  this.props.hideModal();
}

// In some reducer:
function handleFeatureSpecificAction(state, action) {
  const {payload} = action;
  // Use the data provided by the original requesting code, as well as the
  // "return value" given to us by the generic modal component
  const {color, someFeatureSpecificData} = payload;

  return {
    ...state,
    [someFeatureSpecificData] : {
      ...state[someFeatureSpecificData],
      color
    }
  };
}

This technique satisfies all the constraints for our problem. Any part of our application can request that a specific modal component be shown, without needing a nested component to "own" the modal. The display of the modal is driven by our Redux state. And most importantly, we can specify per-feature behavior and use "return values" from generic modals while keeping both our actions and our Redux state plain and serializable, ensuring that features like time-travel debugging still work correctly.

About the author
Mark Erikson is a software engineer living in southwest Ohio, USA, where he patiently awaits the annual heartbreak from the Reds and the Bengals. Mark is author of the Redux FAQ, maintains the React/Redux Links list and Redux Addons Catalog, and occasionally tweets at @acemarke. He can usually be found in the Reactiflux chat channels, answering questions about React and Redux. He is also slightly disturbed by the number of third-person references he has written in this bio!


DevOps Tools and Technologies

Packt
11 Nov 2016
15 min read
In this article by Ritesh Modi, the author of the book DevOps with Windows Server 2016, we will introduce foundational platforms and technologies instrumental in enabling and implementing DevOps practices. (For more resources related to this topic, see here.) These include: Technology stack for implementing Continuous Integration, Continuous Deployment, Continuous Deliver, Configuration Management, and Continuous Improvement. These form the backbone for DevOps processes and include source code services, build services, and release services through Visual Studio Team Services. Platform and technology used to create and deploy a sample web application. This includes technologies such as Microsoft .NET, ASP.NET and SQL Server databases. Tools and technology for configuration management, testing of code and application, authoring infrastructure as code, and deployment of environments. Examples of these tools and technologies are Pester for environment validation, environment provisioning through Azure Resource Manager (ARM) templates, Desired State Configuration (DSC) and Powershell, application hosting on containers through Windows Containers and Docker, application and database deployment through Web Deploy packages, and SQL Server bacpacs. Cloud technology Cloud is ubiquitous. Cloud is used for our development environment, implementation of DevOps practices, and deployment of applications. Cloud is a relatively new paradigm in infrastructure provisioning, application deployment, and hosting space. The only options prior to the advent of cloud was either self-hosted on-premises deployments or using services from a hosting service provider. However, cloud is changing the way enterprises look at their strategy in relation to infrastructure and application development, deployment, and hosting. In fact, the change is so enormous that it has found its way into every aspect of an organization's software development processes, tools, and practices. Cloud computing refers to the practice of deploying applications and services on the Internet with a cloud provider. A cloud provider provides multiple types of services on cloud. They are divided into three categories based on their level of abstraction and degree of control on services. These categories are as follows: Infrastructure as a Service (IaaS) Platform as a Service (PaaS) Software as a Service (SaaS) These three categories differ based on the level of control a cloud provider exercises compared to the cloud consumer. The services provided by a cloud provider can be divided into layers, with each layer providing a type of service. As we move higher in the stack of layers, the level of abstraction increases in line with the cloud provider's control over services. In other words, the cloud consumer starts to lose control over services as you move higher in each column: Figure 1: Cloud Services – IaaS, PaaS and SaaS Figure 1 shows the three types of service available through cloud providers and the layers that comprise these services. These layers are stacked vertically on each other and show the level of control a cloud provider has compared to a consumer. From Figure 1, it is clear that for IaaS, a cloud provider is responsible for providing, controlling, and managing layers from the network layer up to the virtualization layer. Similarly, for PaaS, a cloud provider controls and manages from the hardware layer up to the runtime layer, while the consumer controls only the application and data layers. 
Infrastructure as a Service (IaaS) As the name suggests, Infrastructure as a Service is an infrastructure service provided by a cloud provider. This service includes the physical hardware and its configuration, network hardware and its configuration, storage hardware and its configuration, load balancers, compute, and virtualization. Any layer above virtualization is the responsibility of the consumer to provision, configure, and manage. The consumer can decide to use the provided underlying infrastructure in whatever way best suits their requirements. Consumers can consume the storage, network, and virtualization to provision their virtual machines on top of. It is the consumer's responsibility to manage and control the virtual machines and the things deployed within it. Platform as a Service (PaaS) Platform as a Service enables consumers to deploy their applications and services on the provided platform, consuming the underlying runtime, middleware, and services. The cloud provider provides the services from infrastructure to runtime. The consumers cannot provision virtual machines as they cannot access and control them. Instead, they can only control and manage their applications. This is a comparatively faster method of development and deployment because now the consumer can focus on application development and deployment. Examples of Platform as a Service include Azure Automation, Azure SQL, and Azure App Services. Software as a Service (SaaS) Software as a Service provides complete control of the service to the cloud provider. The cloud provider provisions, configures, and manages everything from infrastructure to the application. It includes the provisioning of infrastructure, deployment and configuration of applications, and provides application access to the consumer. The consumer does not control and manage the application, and can use and configure only parts of the application. They control only their data and configuration. Generally, multi-tenant applications used by multiple consumers, such as Office 365 and Visual Studio Team Services, are examples of SaaS. Advantages of using cloud computing There are multiple distinct advantages of using cloud technologies. The major among them are as follows: Cost effective: Cloud computing helps organizations to reduce the cost of storage, networks, and physical infrastructure. It also prevents them from having to buy expensive software licenses. The operational cost of managing these infrastructures also reduces due to lesser effort and manpower requirements. Unlimited capacity: Cloud provides unlimited resources to the consumer. This ensures applications will never get throttled due to limited resource availability. Elasticity: Cloud computing provides the notion of unlimited capacity and applications deployed on it can scale up or down on an as-needed basis. When demand for the application increases, cloud can be configured to scale up the infrastructure and application by adding additional resources. At the same time, it can scale down unnecessary resources during periods of low demand. Pay as you go: Using cloud eliminates capital expenditure and organizations pay only for what they use, thereby providing maximum return on investment. Organizations do not need to build additional infrastructure to host their application for times of peak demand. Faster and better: Cloud provides ready-to-use applications and faster provisioning and deployment of environments. 
Moreover, organizations get better-managed services from their cloud provider with higher service-level agreements. We will use Azure as our preferred cloud computing provider for the purpose of demonstrating samples and examples. However, you can use any cloud provider that provides complete end-to-end services for DevOps. We will use multiple features and services provided by Azure across IaaS and PaaS. We will consume Operational Insights and Application Insights to monitor our environment and application, which will help capture relevant telemetry for auditing purposes. We will provision Azure virtual machines running Windows and Docker Containers as a hosting platform. We will use Windows Server 2016 as the target operating system for our applications on the cloud, with environments provisioned through Azure Resource Manager (ARM) templates. We will also use Desired State Configuration and PowerShell as our configuration management platform and tools. We will use Visual Studio Team Services (VSTS), a suite of PaaS services on the cloud provided by Microsoft, to set up and implement our end-to-end DevOps practices. Microsoft also provides the same services as part of Team Foundation Server (TFS) as an on-premises solution. Technologies such as Pester, DSC, and PowerShell can be deployed and configured to run on any platform. These will help both in the validation of our environment and in the configuration of both application and environment, as part of our configuration management process. Windows Server 2016 is a breakthrough operating system from Microsoft, also referred to as a cloud operating system. We will look into Windows Server 2016 in the following section. Windows Server 2016 Windows Server 2016 has come a long way: all the way from Windows NT to Windows 2000 and 2003, then Windows 2008 (R2) and 2012 (R2), and now Windows Server 2016. Windows NT was the first popular Windows server among enterprises. However, the true enterprise servers were Windows 2000 and Windows 2003. The popularity of Windows Server 2003 was unprecedented and it was widely adopted. With Windows Server 2008 and 2008 R2, the idea of the data center took priority, and enterprises with their own data centers adopted them. The Windows Server 2008 series was also quite popular among enterprises. In 2010, the Microsoft cloud, Azure, was launched. The first steps towards a cloud operating system were Windows Server 2012 and 2012 R2. They had the blueprints and technology to be seamlessly provisioned on Azure. Now that Azure and the cloud are gaining enormous popularity, Windows Server 2016 has been released as a true cloud operating system. The evolution of Windows Server is shown in Figure 2: Figure 2: Windows Server evolution Windows Server 2016 is referred to as a cloud operating system. It is built with cloud in mind. It is also referred to as the first operating system that enables DevOps seamlessly by providing relevant tools and technologies. It makes implementing DevOps simpler and easier through its productivity tools. Let us look briefly into these tools and technologies. Multiple choices for application platform Windows Server 2016 comes with many choices of application platform. It provides the following: Windows Server 2016 Nano Server Windows and Docker Containers Hyper-V Containers Nested virtual machines Windows Server as a hosting platform Windows Server 2016 can be used in the ways it has always been used, such as hosting applications and providing server functionality.
It provides the services necessary to make applications secure, scalable, and highly available. It also provides virtualization, directory services, certificate services, web server, databases, and more. These services can be consumed by the enterprise's services and applications. Nano Server Windows Server provides a new option to host applications and services: a new variety of lightweight, scaled-down Windows Server containing only the kernel and the drivers necessary to run as an operating system. These are also known as headless servers. They do not have any graphical user interface, and the only way to interact with and manage them is through remote PowerShell. Out of the box, they do not contain any service or feature; services need to be added to Nano Servers explicitly before use. So far, they are the most secure servers from Microsoft. They are very lightweight, and their resource requirements and consumption are less than 80% of those of a normal Windows Server. The number of services running, the number of open ports, the number of running processes, and the amount of memory and storage required are also less than 80% of those of a normal Windows Server. Even though Nano Server out of the box has just the kernel and drivers, its capabilities can be enhanced by adding features and deploying Windows applications on it. Windows Containers and Docker Containers are one of the most revolutionary features added to Windows Server 2016, after Nano Server. With the popularity and adoption of Docker containers, which primarily run on Linux, Microsoft decided to introduce container services to Windows Server 2016. Containers are operating system virtualization. This means that multiple containers can be deployed on the same operating system and each one of them will share the host operating system kernel. It is the next level of virtualization after server virtualization (virtual machines). Containers create the notion of complete operating system isolation and independence, even though they use the same host operating system underneath. This is possible through the use of namespace isolation and image layering. Containers are created from images. Images are immutable and cannot be modified. Each image has a base operating system and a series of instructions that are executed against it. Each instruction creates a new image on top of the previous image and contains only the modification. Finally, a writable layer is stacked on top of these images. These images are combined into a single image, which can then be used for provisioning containers. A container made up of multiple image layers is shown in Figure 3: Figure 3: Containers made up of multiple image layers Namespace isolation helps provide containers with pristine new environments. The containers cannot see the host resources and the host cannot view the container resources. For the application within the container, a completely new installation of the operating system appears to be available. The containers share the host's memory, CPU, and storage. Containers offer operating system virtualization, which means the containers can host only those operating systems supported by the host operating system. There cannot be a Windows container running on a Linux host, and a Linux container cannot run on a Windows host operating system. Hyper-V containers Another type of container technology Windows Server 2016 provides is Hyper-V Containers. These containers are similar to Windows Containers.
They are managed through the same Docker client and expose the same Docker APIs. However, these containers contain their own scaled-down operating system kernel. They do not share the host operating system but have their own dedicated operating system, and their own dedicated memory and CPU assigned in exactly the same way resources are assigned to virtual machines. Hyper-V Containers bring in a higher level of isolation of containers from the host. While Windows Containers run in full trust on the host operating system, Hyper-V Containers do not have full trust from the host's perspective. It is this isolation that differentiates Hyper-V Containers from Windows Containers. Hyper-V Containers are ideal for hosting applications that might harm the host server, affecting every other container and service on it. Scenarios where users can bring in and execute their own code are examples of such applications. Hyper-V Containers provide adequate isolation and security to ensure that applications cannot access the host resources and change them. Nested virtual machines Another breakthrough innovation of Windows Server 2016 is that virtual machines can now host virtual machines. We can now deploy multiple virtual machines containing all tiers of an application within a single virtual machine. This is made possible through software-defined networks and storage. Enabling Microservices Nano Server and containers help provide advanced, lightweight deployment options through which we can decompose an entire application into multiple smaller, independent services, each with its own scalability and high-availability configuration, and deploy them independently of each other. Microservices help make the entire DevOps lifecycle agile. With microservices, a change to one service does not demand that every other service undergo every test validation. Only the changed service needs to be tested rigorously, along with its integration with other services. Compare this to a monolithic application, where even a single small change results in having to test the entire application. Microservices also help in that they require smaller development teams, the testing of a service can happen independently of other services, and deployment can be done for each service in isolation. Continuous Integration, Continuous Deployment, and Continuous Delivery for each service can be executed in isolation, rather than compiling, testing, and deploying the whole application every time there is a change. Reduced maintenance Because of their intrinsic nature, Windows Nano Servers and containers are lightweight and quick to provision. They help to quickly provision and configure environments, thereby reducing the overall time needed for Continuous Integration and deployment. Also, these resources can be provisioned on Azure on demand without waiting for hours. Because of their small footprint in terms of size, storage, memory, and features, they need less maintenance. These servers are patched less often and with fewer fixes, they are secure by default, and applications are less likely to fail on them, which makes them ideal for operations. The operations team needs to spend fewer hours maintaining these servers compared to normal servers. This reduces the overall cost for the organization and helps DevOps ensure a high-quality delivery. Configuration management tools Windows Server 2016 comes with Windows Management Framework 5.0 installed by default.
Desired State Configuration (DSC) is the new configuration management platform available out of the box in Windows Server 2016. It has a rich, mature set of features that enables configuration management for both environments and applications. With DSC, the desired state and configuration of environments are authored as part of Infrastructure as Code and executed on every server on a scheduled basis. These configurations help compare the current state of servers against the documented desired state and bring them back to the desired state. DSC is available as part of PowerShell, and PowerShell helps with authoring these configuration documents. Windows Server 2016 provides a PowerShell unit testing framework known as Pester. Historically, unit testing for infrastructure environments was always missing as a feature. Pester enables the testing of infrastructure provisioned either manually or through Infrastructure as Code using DSC configurations or ARM templates. These tests help with the operational validation of the entire environment, bringing a high level of cadence and confidence to Continuous Integration and deployment processes. Deployment and packaging Package management and the deployment of utilities and tools through automation is a new concept in the Windows world. Package management has been ubiquitous in the Linux world for a long time. Package management helps search for, save, install, deploy, upgrade, and remove software packages from multiple sources and repositories on demand. There are public repositories such as Chocolatey and PSGallery available for storing readily deployable packages. Tools such as NuGet can connect to these repositories and help with package management. They also help with the versioning of packages. Applications that rely on a specific package version can download it on an as-needed basis. Package management helps with the building of environments and application deployment. Package deployment is much easier and faster with this out-of-the-box Windows feature. Summary We have covered a lot of ground in this article. DevOps concepts were discussed, mapping technology to those concepts, and we saw the impetus DevOps can get from technology. We looked at cloud computing and the different services provided by cloud providers. From there, we went on to look at the benefits Windows Server 2016 brings to DevOps practices and how Windows Server 2016 makes DevOps easier and faster with its native tools and features. Resources for Article: Further resources on this subject: Introducing Dynamics CRM [article] Features of Dynamics GP [article] Creating Your First Plug-in [article]

Algorithm Analysis

Packt
11 Nov 2016
12 min read
In this article by Prakash and Achyutuni Sri Krishna Rao, authors of the book R Data Structures and Algorithms, we will discuss how an algorithm can be defined as a set of step-by-step instructions which govern the outline of a program that needs to be executed using computational resources. The execution can be in any programming language such as R, Python, or Java. Data is an intricate component of any program, and depending on how the data is organized (its data structure), your execution time can vary drastically. That's why data structure is such a critical component of any good algorithm implementation. (For more resources related to this topic, see here.) The sorting algorithm, which acts as a connector between the user-defined input and the user-desired output, can be approached in multiple ways: Bubble sort and Shell sort, which are simple variants of sorting, but are highly inefficient Insertion sort and Selection sort, primarily used for sorting small datasets Merge sort, Heap sort, and Quick sort, which are efficient ways of sorting based on the complexities involved in an average system runtime Distributed sorts such as counting sort, bucket sort, and radix sort, which can handle both runtime and memory usage Each of these options can, in turn, handle a particular set of instances more effectively. This essentially leads to the concept of a "good algorithm". An algorithm can be termed "good" if it possesses attributes such as the following, among many others: Shorter running time Lower memory utilization Simplicity in reading the code Generality in accepting inputs This book will concentrate primarily on running time or time complexity, partly on memory utilization, and on their relationship during program execution. Introduction A problem can be approached using multiple algorithms, and each algorithm can be assessed based on certain parameters such as: System runtime Memory requirement However, these parameters are generally affected by external environmental factors such as: Handling of data structures System software and hardware configurations Style of writing and compiling code Programming language As it is practically impossible to control all external parameters, it becomes difficult to estimate the system runtime of multiple algorithms for performance comparison (ideal scenario analysis). Asymptotic analysis is one such technique which can be used to assess an algorithm's efficiency without actually coding and compiling the entire program. It is a functional form representing a pseudo system runtime based on the size of the input data and the number of operations. It is based on the principle that the system runtime grows as a function of the size of the input data. For example, in the case of insertion sort, the size represents the length of the input vector, and the number of operations represents the complexity of the sort operations. This analysis can only be used to gauge whether implementing an algorithm is worth considering, rather than to compare the merits and demerits of algorithms against each other. The most widely used growth rate functional forms are based on the size of the input data and are used to analyze the performance of algorithms; typical examples are constant, logarithmic, linear, linearithmic (n log n), quadratic, and exponential growth rates. These are also considered pseudo-functional forms for evaluating an algorithm's system runtime.
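As a quick illustration of this idea (a sketch written for this discussion rather than code from the book), the following R snippet times an O(n log n) operation, sort(), and an O(n^2) operation, building an n x n matrix with outer(), for a doubling sequence of input sizes. The absolute timings depend on the hardware and the R version; the growth pattern is what asymptotic analysis describes.
# Sketch: observing how runtime grows with input size
time_it <- function(expr) system.time(expr)["elapsed"]
ns <- c(1000, 2000, 4000)
nlogn_t <- sapply(ns * 1000, function(n) {   # sort() does O(n log n) work
  x <- runif(n)
  time_it(sort(x))
})
quad_t <- sapply(ns, function(n) {           # outer() builds an n x n matrix: O(n^2) work
  x <- runif(n)
  time_it(sum(outer(x, x)))
})
data.frame(n = ns * 1000, sort_elapsed = nlogn_t)
data.frame(n = ns, outer_elapsed = quad_t)
On a typical machine, doubling n roughly doubles the sort() timings, while the outer() timings grow roughly fourfold, which is the behavior the linearithmic and quadratic functional forms predict.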
Memory management in R Memory management primarily deals with the administration of available memory and the prediction of additional memory required for smoother and faster execution of functions. The current section will cover the concept of memory allocation, which deals with the storage of an object in the R environment. Memory Allocation: R allocates memory differently to different objects in its environment. Memory allocation can be determined using the object_size function from the pryr package. The pryr package can be installed from the CRAN repository using install.packages("pryr"). The object_size function in pryr is similar to the object.size function in the base package. However, it is more accurate as it: Takes into account the environment size associated with the current object Takes into account the shared elements within a given object under consideration The following are examples of using the object_size function in R to evaluate memory allocation: > object_size(1) ## Memory allocated for a single numeric vector 48 B > object_size("R") ## Memory allocated for a single character vector 96 B > object_size(TRUE) ## Memory allocated for a single logical vector 48 B > object_size(1i) ## Memory allocated for a single complex vector 56 B The storage required by an object can be attributed to the following parameters: Metadata: The metadata of an object is defined by the type of object used, such as character, integer, logical, and so on. The type can also be helpful during debugging. Node pointer: The node pointer maintains the link between the different nodes, and depending on the number of node pointers used, the memory requirement changes. For example, a doubly linked list requires more memory than a singly linked list, as it uses two node pointers to connect to the previous and next nodes. Attribute pointer: A pointer to keep references to attributes; this helps to reduce the memory allocated, especially for the data stored by a variable. Memory allocation: The length of the vector representing the currently used space. Size: The true allocated space length of the vector. Memory padding: Padding applied to a component; for example, each element begins after an 8-byte boundary. The object_size() command can also be used to see the inherent memory allocated by each data structure/type. Let's simulate scenarios with varying lengths of a vector for different data types such as integer, character, Boolean, and complex. The simulation is performed for vector lengths from 0 to 60 as follows: > vec_length <- 0:60 > num_vec_size <- sapply(vec_length, function(x) object_size(seq(x))) > char_vec_size <- sapply(vec_length, function(x) object_size(rep("a",x))) > log_vec_size <- sapply(vec_length, function(x) object_size(rep(TRUE,x))) > comp_vec_size <- sapply(vec_length, function(x) object_size(rep("2i",x))) num_vec_size computes the memory requirement for each numeric vector from zero to 60 elements. These elements are integers increasing sequentially, as stated in the function. Similarly, incremental memory requirements are calculated for character (char_vec_size), logical (log_vec_size), and complex (comp_vec_size) vectors. The result obtained from the simulation can be plotted using the following code.
> par(mfrow=c(2,2)) > plot(num_vec_size ~ vec_length, xlab = "Numeric seq vector", ylab = "Memory allocated (in bytes)", + type = "n") > abline(h = (c(0,8,16,32,48,64,128)+40), col = "grey") > lines(num_vec_size, type = "S") The result obtained on running the preceding code is shown in the following figure. From the following figure, it can be observed that the memory allocated to a vector is a function of its length and the object type used. However, the relationship does not seem to be linear; rather, it seems to increase in steps. This is due to the fact that, for better and consistent performance, R initially assigns big blocks of memory from RAM and handles them internally. These memory blocks are individually assigned to vectors based on the type and the number of elements within. Initially, the memory blocks seem to be irregular up to a particular level (128 bytes for numeric/logical vectors, and 176 bytes for character/complex vectors), and later become stable with small increments of 8 bytes, as can be seen in the plots: Memory allocation based on length of vector Due to initial memory allocation differences, numeric and logical vectors show similar memory allocation patterns, and complex vectors behave similarly to character vectors. Memory management helps to run an algorithm efficiently. However, before the execution of any program, we should evaluate it based on its runtime. In the next sub-section, we will discuss the basic concepts involved in obtaining the runtime of any function, and its comparison with similar functions. System runtime in R System runtime is essential for benchmarking different algorithms. The process helps us compare different options and pick the best algorithm. The CRAN package microbenchmark is used to evaluate the runtime of any expression/function/code with sub-millisecond accuracy. It is an accurate replacement for the system.time() function. Also, all the evaluations are performed in C code to minimize any overhead. The following methods are used to measure the time elapsed: The QueryPerformanceCounter interface on Windows The clock_gettime API on Linux The mach_absolute_time function on macOS The gethrtime function on Solaris In our current example, we shall be using the mtcars data, which is in the datasets package. This data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). Now, we would like to perform an operation in which a specific numeric attribute (mpg, miles per gallon) needs to be averaged over the corresponding unique values of an integer attribute (carb, the number of carburetors). This can be performed in multiple ways, such as aggregate, group_by, by, split, ddply (plyr), tapply, data.table, dplyr, sqldf, and so on. In our current scenario, we have used the following four ways: aggregate function aggregate(mpg~carb,data=mtcars,mean) ddply from plyr package ddply( mtcars, .(carb),function(x) mean(x$mpg)) data.table format mtcars_tb[,mean(mpg),by=carb] group_by function summarize(group_by(mtcars, carb), mean(mpg)) Then, microbenchmark is used to determine the performance of each of the four ways mentioned in the preceding list. Here, we will be evaluating each expression 1,000 times.
> library(microbenchmark) > MB_res <- microbenchmark( + Aggregate_func=aggregate(mpg~carb,data=mtcars,mean), + Ddply_func=ddply( mtcars, .(carb),function(x) mean(x$mpg)), + Data_table_func = mtcars_tb[,mean(mpg),by=carb], + Group_by_func = summarize(group_by(mtcars, carb), mean(mpg)), + times=1000 + ) The output table is as follows: > MB_res Unit: microseconds expr min lq mean median uq max neval Aggregate_func 851.489 913.8015 1001.9007 944.775 1000.4905 6094.209 1000 Ddply_func 1370.519 1475.1685 1579.6123 1517.322 1575.7855 6598.578 1000 Data_table_func 493.739 552.7540 610.7791 577.495 621.6635 3125.179 1000 Group_by_func 932.129 1008.5540 1095.4193 1033.113 1076.1825 4279.435 1000 The output plot is as follows: > library(ggplot2) > autoplot(MB_res) Distribution of time (microseconds) for 1000 iterations in each type of aggregate operation Among these four expressions and for the given dataset, data.table has performed effectively in the least possible time as compared to the others. However, expressions need to be tested under scenarios with a high number of observations, high number of attributes, and both prior to finalizing the best operator. Best, worst, and average Cases Based on the performance in terms of system runtime, a code can be classified under best, worst or average category for a particular algorithm. Let’s consider a sorting algorithm to understand in detail. A sorting algorithm is used to arrange a numeric vector in an ascending order, wherein the output vector should have the smallest number as its first element and largest number as its last element with intermediate elements in subsequent increasing order. In insertion sorting algorithm, the elements within a vector are arranged based on moving positions. In our scenario, we will be inserting each element at a time into a previously sorted vector, with a smaller set of elements moving towards the end. Now, let’s define best, worst and average-case scenarios for an insertion sorting algorithm. Best Case: A best case is one which requires the least running time. For example: a vector with all elements arranged in increasing order requires least amount of time for sorting. Worst Case: A worst case is one which requires the maximum possible runtime to complete sorting a vector. For example: a vector with all the elements sorted in decreasing order requires most amount of time for sorting. Average Case: An average case is one which requires intermediate time to complete sorting a vector. For example: a vector with half elements sorted in increasing order and the remaining in decreasing order. An average case is assessed using multiple vectors of differently arranged elements. Generally, the best-case scenarios are not considered to benchmark an algorithm, since they evaluate an algorithm most optimistically. However, if the probability of occurrence of best case is high, then algorithms can be compared using the best-case scenarios. Similar to best case, worst-case scenarios evaluate the algorithm most pessimistically. It is only used to benchmark algorithms which are used in real-time applications, such as railway network controls, air traffic controls, and the like. Sometimes, when we are not aware of input data distributions, it is safe to assess the performance of the algorithm based on the worst-case scenario. Most of the times, average-case scenario is used as a representative measure of an algorithm’s performance; however, this is valid only when we are aware of the input data distribution. 
Average-case scenarios may not evaluate the algorithm properly if the distribution of the input data is skewed. In the case of sorting, if most of the input vectors are arranged in descending order, the average-case scenario may not be the best form of evaluation for the algorithm. In a nutshell, real-time application scenarios, along with the input data distribution, are the major criteria for analyzing algorithms based on best, worst, and average cases. Summary This article summarizes the basic concepts and nuances of evaluating algorithms in R. We covered the conceptual theory of memory management and system runtime in R. We discussed the best, worst, and average-case scenarios to evaluate the performance of algorithms. Resources for Article: Further resources on this subject: Reconstructing 3D Scenes [article] Raster Calculations [article] Remote Sensing and Histogram [article]

Introduction to C# and .NET

Packt
11 Nov 2016
17 min read
In this article by Marino Posadas, the author of the book Mastering C# and .NET Programming, we will cover the core concepts of C# and .NET, starting from the initial version and the principal motivations behind its creation, and also covering the new aspects of the language that appeared in versions 2.0 and 3.0. (For more resources related to this topic, see here.) We'll illustrate all the main concepts with small code snippets, short enough to facilitate their understanding and easy reproduction. We will cover the following topics: C# and its role in the Microsoft development ecosystem Differences between strongly typed and weakly typed languages The evolution in versions 2.0 and 3.0 Generics Extension methods C#: what's different in the language I had the chance to chat with Hejlsberg a couple of times about the C# language, what the initial purposes and requirements imposed in its creation were, and which other languages inspired him or contributed to his ideas. The first time we talked, at Tech-Ed 2001 (in Barcelona, Spain), I asked him about the principles of his language and what makes it different from others. He first said that it was not only him who created the language, but also a group of people, especially Scott Wiltamuth, Peter Golde, Peter Sollich, and Eric Gunnerson. One of the first books ever published on the subject was A Programmer's Introduction to C# (Gunnerson, E., Apress, 2000). About the principles, he mentioned this: One of the key differences between C# and these other languages, particularly Java, is that we tried to stay much closer to C++ in our design. C# borrows most of its operators, keywords, and statements directly from C++. But beyond these more traditional language issues, one of our key design goals was to make the C# language component-oriented, to add to the language itself all of the concepts that you need when you write components. Concepts such as properties, methods, events, attributes, and documentation are all first-class language constructs. He also stated this: When you write code in C#, you write everything in one place. There is no need for header files, IDL files (Interface Definition Language), GUIDs and complicated interfaces. This means that you can write code that is self-descriptive, given that you're dealing with a self-contained unit (let's remember the role of the manifest, optionally embedded in assemblies). In this mode, you can also extend existing technologies in a variety of ways, as we'll see in the examples. Languages: strongly typed, weakly typed, dynamic, and static The C# language is a strongly typed language: this means that any attempt to pass the wrong kind of parameter as an argument, or to assign a value to a variable that is not implicitly convertible, will generate a compilation error. This avoids many errors that only happen at runtime in other languages. In addition, by dynamic, we mean those languages whose rules are applied at runtime, while static languages apply their rules at compile time. JavaScript and PHP are good examples of the former case, and C/C++ of the latter. If we make a graphic representation of this situation, we might come up with something like what is shown in the following figure: In the figure, we can see that C# is clearly strongly typed, but it's much more dynamic than C++ or Scala, to mention a few. Of course, there are several criteria by which to catalog languages for their typing (weak versus strong) and for their dynamism (dynamic versus static).
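To make these compile-time checks concrete, here is a minimal sketch (not taken from the book's samples) showing how the C# compiler rejects type mismatches even when type inference with var is used; the commented-out lines would fail to compile.
using System;

class TypingDemo
{
    static void Main()
    {
        int count = 42;              // statically typed as System.Int32
        // count = "forty-two";      // compile-time error: cannot convert string to int

        var total = count * 2;       // var is still static typing: the compiler infers int
        // total.ToUpper();          // compile-time error: int has no ToUpper() method

        double precise = count;      // implicit widening conversion is allowed
        int truncated = (int)precise;   // narrowing requires an explicit cast
        Console.WriteLine("{0} {1}", total, truncated);
    }
}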
Note that this has implications in the IDE as well. Editors can tell us which type is expected in every case, and if you use a dynamic declaration such as var, the right side of the equality (if any) will be evaluated, and we will be shown the calculated value for every declaration: Even outside of the .NET world, Visual Studio's IDE is now able to provide strongly typed and Intellisense experiences when using languages such as TypeScript, a superset of JavaScript that transpiles (converts into) pure JavaScript but can be written using the same coding experience as what we would have in C# or any other .NET language. It's available as a separate type of project, if you're curious about it, and the latest up-to-date version is TypeScript 1.8, and it was recently published (you can take a look at a detailed description of its new capabilities at https://blogs.msdn.microsoft.com/typescript/2016/02/22/announcing-typescript-1-8-2/). The main differences So, going back to the title, what made C# different? I'll point out five core points: Everything is an object. Other languages, such as Smalltalk, Lisp, among others, have done this earlier, but due to different reasons, the performance penalty was pretty hard. As you know, it's enough to take a look at the Object Explorer to be able to check where an object comes from. It's a good practice to check the very basic values, such as int or String, which are nothing but aliases of System.Int32 and System.String, and both come from object, as shown in the following screenshot: Using the Boxing and Unboxing techniques, any value type can be converted into an object, and the value of an object can be converted into a simple value type. These conversions are made by simply casting the type to an object (and vice versa) in this manner: // Boxing and Unboxing int y = 3; // this is declared in the stack // Boxing y in a Heap reference z // If we change z, y remains the same. object z = y; // Unboxing y into h (the value of // z is copied to the stack) int h = (int)z; Using Reflection (the technique that allows you to read a component's metadata), an application can call itself or other applications, creating new instances of their containing classes. As a short demo, this simple code launches another instance of a WPF application (a very simple one with just one button, but that doesn't matter): static short counter = 1; private void btnLaunch_Click(object sender, RoutedEventArgs e) { // Establish a reference to this window Type windowType = this.GetType(); // Creates an instance of the Window object objWindow = Activator.CreateInstance(windowType); // cast to a MainWindow type MainWindow aWindow = (MainWindow)objWindow; aWindow.Title = "Reflected Window No: " + (++counter).ToString(); aWindow.Show(); } Now, every time we click on the button, a new instance of the window is created and launched, indicating its creation order in the title's window: You can have access to other components through a technology called Platform Invoke, which means you can call operating systems' functions by importing the existing DLLs using the DllImport attribute: For instance, you can make an external program's window the child of your own window using the SetParent API, which is part of User32.dll, or you can control operating system events, such as trying to shut down the system while our application is still active. Actually, once the permissions are given, your application can call any function located in any of the system's DLL if you need access to native resources. 
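The following is a rough sketch of the Platform Invoke mechanism just described, not code from the book; it assumes a console application and uses the usual managed declarations for two Win32 APIs, SetParent from User32.dll (mentioned above) and GetConsoleWindow from kernel32.dll.
using System;
using System.Runtime.InteropServices;

class PInvokeDemo
{
    // SetParent re-parents a window; parameters and return value are window handles.
    [DllImport("user32.dll", SetLastError = true)]
    static extern IntPtr SetParent(IntPtr hWndChild, IntPtr hWndNewParent);

    // GetConsoleWindow returns the handle of the console attached to the current process.
    [DllImport("kernel32.dll")]
    static extern IntPtr GetConsoleWindow();

    static void Main()
    {
        IntPtr consoleHandle = GetConsoleWindow();
        Console.WriteLine("Console window handle: {0}", consoleHandle);
        // SetParent(childHandle, consoleHandle) would make another application's
        // window a child of the console; childHandle is a placeholder here.
    }
}
Once the required permissions are in place, the same declaration pattern applies to any other native function you need to call.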
The schema that gives us access to these resources looks like what is shown in the following figure: If you want to try out some of these possibilities, the mandatory resource to keep in mind is http://www.PInvoke.net, where you have most of the useful system APIs, with examples of how to use them in C#. These interoperation capabilities are extended to interactions with applications that admit Automation, such as those in the Microsoft Office Suite, AutoCAD, and so on. Finally, unsafe code allows you to write inline C code with pointers, perform unsafe casts, and even pin down memory in order to avoid accidental garbage collection. However, unsafe does not mean that it is unmanaged. Unsafe code is deeply tied into the security system. There are many situations in which this is very useful. It might be an algorithm that's difficult to implement or a method whose execution is so CPU-intensive that performance penalties become unacceptable. While all this is important, I was surprised by the fact that every event handler in C# (as also in other .NET languages) would have two and only two arguments. So, I asked Anders about it, and his answer was one of the most clear and logical ones that I've ever heard. The evolution in versions 2.0 and 3.0 As we see, even from the very beginning, the Hejlsberg's team started with a complete, flexible, and modern platform, capable to be extended in many ways as technology evolves. This intention became clear since version 2.0. The first actual fundamental change that took place in the language was the incorporation of Generics. Don Syme, who would later on lead the team that created the F# language, was very active and led this team as well, so it was ready for version 2.0 of the .NET Framework (not just in C# but in C++ and VB.NET as well). Generics The purpose of generics was mainly to facilitate the creation of more reusable code (one of the principles of OOP, by the way). The name refers to a set of language features that allow classes, structures, interfaces, methods, and delegates to be declared and defined with unspecified or generic type parameters instead of specific types (see https://msdn.microsoft.com/en-us/library/ms379564(v=vs.80).aspx, for more details). So, you can define members in a sort of abstract definition, and later on, at the time of using it, a real, concrete type will be applied. The basic .NET classes (BCL) were enhanced in the System namespace and a new System.Collections.Generic namespace was created to support this new feature in depth. In addition, new support methods were added to ease the use of this new type, such as Type.IsGenericType (obviously, to check types), Type.GetGenericArguments (self-descriptive), and the very useful Type.MakeGenericType, which can create a generic type of any kind from a previous nonspecified declaration. The following code uses the generic type definition for a Dictionary (Dictionary<,>) and creates an actual (build) type using this technique. The relevant code is the following (the rest, including the output to the console is included in Demo_02_03): // Define a generic Dictionary (the // comma is enough for the compiler to infer number of // parameters, but we didn't decide the types yet. 
Type generic = typeof(Dictionary<,>); ShowTypeData(generic); // We define an array of types for the Dictionary (Key, Value) // Key is of type string, and Value is of -this- type (Program) // Notice that types could be -in this case- of any kind Type[] typeArgs = { typeof(string), typeof(Program) }; // Now we use MakeGenericType to create a Type representing // the actualType generic type. Type actualType = generic.MakeGenericType(typeArgs); ShowTypeData(actualType); As you see, MakeGenericType expects an array of (concrete) types. Later on (not in the preceding code), we use GetGenericTypeDefinition, IsGenericType, and GetGenericArguments in order to introspect the resulting types and present the following output in the console: So, we have different ways to declare generics with identical results as far as the operations in the code are concerned. Obviously, manipulating already constructed generic types is not the only possibility, since one of the main goals of generics is to avoid casting operations by simplifying the work with collections. Up until version 2.0, collections could only hold basic types: integers, longs, strings, and so on, along with emulating different types of data structures, such as stacks, queues, linked lists, and so on. Besides this, Generics have another big advantage: you can write methods that support working with different types of arguments (and return values) as long as you provide a correct way to handle all possible cases. Once again, the notion of contract will be crucial here. Creating custom generic types and methods Other useful feature is the possibility to use custom generic types. Generic types and the support for optional values through the System.Nullable<T> type were, for many developers, two of the most important features included in version 2.0 of the language. Imagine you have a Customer class, which your application manages. So, in different use cases, you will read collections of customers and perform operations with them. Now, what if you need an operation such as Compare_Customers? What would be the criteria to use in this case? Even worse, what if we would like to use the same criteria with different types of entities, such as Customer and Provider? In these cases, some characteristics of generics come in handy. To start with, we can build a class that has an implementation of the IComparer interface, so we establish out of any uncertainty what the criteria to be used is in order to consider customer C1 bigger or smaller than customer C2. For instance, if the criteria is only Balance, we can start with a basic Customer class, to which we add a static method in order to generate a list of random customers: public class Customer { public string Name { get; set; } public string Country { get; set; } public int Balance { get; set; } public static string[] Countries = { "US", "UK", "India", "Canada", "China" }; public static List<Customer> customersList(int number) { List<Customer> list = new List<Customer>(); Random rnd = new Random(System.DateTime.Now.Millisecond); for (int i = 1; i <= number; i++) { Customer c = new Customer(); c.Name = Path.GetRandomFileName().Replace(".", ""); c.Country = Countries[rnd.Next(0, 4)]; c.Balance = rnd.Next(0, 100000); list.Add(c); } return list; } } Then, we build another CustomerComparer class, which implements the IComparer interface. 
The difference is that this comparison method is a generic instantiation customized for the Customer objects, so we have the freedom of implementing this scenario just in the way that seems convenient for our logic. In this case, we're using Balance as an ordering criteria, so that we would have the following: public class CustomerComparer : IComparer<Customer> { public int Compare(Customer x, Customer y) { // Implementation of IComparer returns an int // indicating if object x is less than, equal to or // greater than y. if (x.Balance < y.Balance) { return -1; } else if (x.Balance > y.Balance) return 1; else { return 0; } // they're equal } } We can see that the criteria used to compare is just the one we decided for our business logic. Finally, another class, GenericCustomer, which implements an entry point of the application, uses both classes in this manner: public class GenericCustomers { public static void Main() { List<Customer> theList = Customer.customersList(25); CustomerComparer cc = new CustomerComparer(); // Sort now uses our own definition of comparison theList.Sort(cc); Console.WriteLine(" List of customers ordered by Balance"); Console.WriteLine(" " + string.Concat(Enumerable.Repeat("-", 36))); foreach (var item in theList) { Console.WriteLine(" Name: {0}, Country: {1}, t Balance: {2}", item.Name, item.Country, item.Balance); } Console.ReadKey(); } } This produces an output of random customers order by their balance: This is even better: we can change the method so that it supports both customers and providers indistinctly. To do this, we need to abstract a common property of both entities that we can use for comparison. If our implementation of Provider has different or similar fields (but they're not the same), it doesn't matter as long as we have the common factor: a Balance field. So we begin with a simple definition of this common factor, an interface called IPersonBalance: public interface IPersonBalance { int Balance { get; set; } } As long as our Provider class implements this interface, we can later create a common method that's able to compare both objects, so, let's assume our Provider class looks like this: public class Provider : IPersonBalance { public string ProviderName { get; set; } public string ShipCountry { get; set; } public int Balance { get; set; } public static string[] Countries = { "US", "Spain", "India", "France", "Italy" }; public static List<Provider> providersList(int number) { List<Provider> list = new List<Provider>(); Random rnd = new Random(System.DateTime.Now.Millisecond); for (int i = 1; i <= number; i++) { Provider p = new Provider(); p.ProviderName = Path.GetRandomFileName().Replace(".", ""); p.ShipCountry = Countries[rnd.Next(0, 4)]; p.Balance = rnd.Next(0, 100000); list.Add(p); } return list; } } Now, we rewrite the Comparer method to be a GenericComparer class, capable of dealing with both types of entities: public class GenericComparer : IComparer<IPersonBalance> { public int Compare(IPersonBalance x, IPersonBalance y) { if (x.Balance < y.Balance) { return -1; } else if (x.Balance > y.Balance) return 1; else { return 0; } } } Note that in this implementation, IComparer depends on an interface, not on an actual class, and that this interface simply defines the common factor of these entities. 
Now, our new entry point will put everything together in order to obtain an ordered list of random Provider classes that uses the common comparison method just created: public static void Main() { List<Provider> providerList = Provider.providersList(25); GenericComparer gc = new GenericComparer(); // Sort now uses our own definition of comparison providerList.Sort(gc); Console.WriteLine(" List of providers ordered by Balance"); Console.WriteLine(" " + ("").PadRight(36, '-')); foreach (var item in providerList) { Console.WriteLine(" ProviderName: {0}, S.Country: {1}, t Balance: {2}", item.ProviderName, item.ShipCountry, item.Balance); } Console.ReadKey(); } In this way, we obtain an output like what is shown in the following figure (note that we didn't take much care of formatting in order to focus on the process): The example shows how generics (and interfaces: also generic) come to our rescue in these type of situations, and—as we'll have the opportunity to prove when talking about implementations of design patterns—this is key to facilitating good practices. So far, some of the most critical concepts behind generics have been discussed. However, the real power comes from joining these capabilities with two new features of the language: lambda expressions and the LINQ syntax. Extension methods Finally, we can extend existing classes' functionality. This means extending even the .NET Framework base types, such as int or String. This is a very useful feature, and it's performed in the way it is recommended by the documentation; no violation of basic principles of OOP occur. The procedure is fairly simple. We need to create a new public static top level (not nested) class containing a public static method with an initial argument declaration especially suited for the compiler to assume that the compiled code will be appended to the actual functionality of the type. The procedure can be used with any class, either belonging to the .NET framework or a customized user or class. Once we have the declaration, its usage is fairly simple, as shown in this code: public static class StringExtension { public static string ExtendedString(this string s) { return "{{ " + s + " }}"; } } Note that the first argument, referred with the this keyword, references the string to be used; so, in this example, we will call the method without any extra arguments (although we can pass as many arguments as we need for other extensions). To put it to work, we just have to add something like this: Console.WriteLine("The word " + "evaluate".ExtendedString() + " is extended"); We will get the extended output with the word enclosed in double brackets: Summary So in this article we saw some of the most relevant enhancements made to the C# language in versions 2 and 3. We started by reviewing the main differences between C# and other languages and understanding the meaning of strongly typed, in this case, together with the concepts of static and dynamic. We followed this up with an examination of the generics feature that appeared in version 2.0 of the framework and analyzed some samples to illustrate some typical use cases, including the creation of custom generic methods. Finally, we covered the extension methods. Resources for Article: Further resources on this subject: Debugging Your .NET Application [article] Why we need Design Patterns? [article] Creating a NHibernate session to access database within ASP.NET [article]

Planning and Structuring Your Test-Driven iOS App

Packt
11 Nov 2016
13 min read
In this article written by Dr. Dominik Hauser, author of the book Test–Driven iOS Development with Swift 3.0, you will learn that when starting TDD, writing unit tests would be easy for most people. The hard part is to transfer the knowledge from writing the test to driving the development. What can be assumed? What should be done before one writes the first test? What should be tested to end up with a complete app? (For more resources related to this topic, see here.) As a developer, you are used to thinking in terms of code. When you see a feature on the requirement list for an app, your brain already starts to layout the code for this feature. And for recurring problems in iOS development (such as building table views), you most probably have already developed your own best practices. In TDD, you should not think about the code while working on the test. The tests have to describe what the unit under test should do and not how it should do it. It should be possible to change the implementation without breaking the tests. To practice this approach of development, we will develop a simple to-do list app in the remainder of this book. It is, on purpose, a boring and easy app. We want to concentrate on the TDD workflow, not complex implementations. An interesting app would distract from what is important in this book—how to do TDD. This article introduces the app that we are going to build, and it shows the views that the finished app will have. We will cover the following topics in this article: The task list view The task detail view The task input view The structure of an app Getting started with Xcode Setting up useful Xcode behaviors for testing The task list view When starting the app, the user sees a list of to-do items. The items in the list consist of a title, an optional location, and the due date. New items can be added to the list by an add (+) button, which is shown in the navigation bar of the view. The task list view will look like this: User stories: As a user, I want to see the list of to-do items when I open the app As a user, I want to add to-do items to the list In a to-do list app, the user will obviously need to be able to check items when they are finished. The checked items are shown below the unchecked items, and it is possible to uncheck them again. The app uses the delete button in the UI of UITableView to check and uncheck items. Checked items will be put at the end of the list in a section with the Finished header. The user can also delete all the items from the list by tapping the trash button. The UI for the to-do item list will look like this: User stories: As a user, I want to check a to-do item to mark it as finished As a user, I want to see all the checked items below the unchecked items As a user, I want to uncheck a to-do item As a user, I want to delete all the to-do items When the user taps an entry, the details of this entry is shown in the task detail view. The task detail view The task detail view shows all the information that's stored for a to-do item. The information consists of a title, due date, location (name and address), and a description. If an address is given, a map with an address is shown. The detail view also allows checking the item as finished. 
The detail view looks like this: User stories: As a user, given that I have tapped a to-do item in the list, I want to see its details As a user, I want to check a to-do item from its details view The task input view When the user selects the add (+) button in the list view, the task input view is shown. The user can add information for the task. Only the title is required. The Save button can only be selected when a title is given. It is not possible to add a task that is already in the list. The Cancel button dismisses the view. The task input view will look like this: User stories: As a user, given that I have tapped the add (+) button in the item list, I want to see a form to put in the details (title, optional date, optional location name, optional address, and optional description) of a to-do item As a user, I want to add a to-do item to the list of to-do items by tapping on the Save button We will not implement the editing and deletion of tasks. But when you have worked through this book completely, it will be easy for you to add this feature yourself by writing the tests first. Keep in mind that we will not test the look and design of the app. Unit tests cannot figure out whether an app looks like it was intended. Unit tests can test features, and these are independent of their presentation. In principle, it would be possible to write unit tests for the position and color of UI elements. But such things are very likely to change a lot in the early stages of development. We do not want to have failing tests only because a button has moved 10 points. However, we will test whether the UI elements are present on the view. If your user cannot see the information for the tasks, or if it is not possible to add all the information of a task, then the app does not meet the requirements. The structure of the app The following diagram shows the structure of the app: The Table View Controller, the delegate, and the data source In iOS apps, data is often presented using a table view. Table views are highly optimized for performance; they are easy to use and to implement. We will use a table view for the list of to-do items. A table view is usually represented by UITableViewController, which is also the data source and delegate for the table view. This often leads to a massive Table View Controller, because it is doing too much: presenting the view, navigating to other view controllers, and managing the presentation of the data in the table view. It is a good practice to split up the responsibility into several classes. Therefore, we will use a helper class to act as the data source and delegate for the table view. The communication between the Table View Controller and the helper class will be defined using a protocol. Protocols define what the interface of a class looks like. This has a great benefit: if we need to replace an implementation with a better version (maybe because we have learned how to implement the feature in a better way), we only need to develop against the clear interface. The inner workings of other classes do not matter. Table view cells As you can see in the preceding screenshots, the to-do list items have a title and, optionally, they can have a due date and a location name. The table view cells should only show the set data. We will accomplish this by implementing our own custom table view cell. 
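Before moving on to the model, here is a rough Swift sketch of the protocol-based decoupling described above; the names ItemListDataProviderProtocol and ItemManager are illustrative placeholders rather than the exact types used later in the book.
import UIKit

// Hypothetical protocol for the table view helper; the list view controller
// only depends on this interface, not on a concrete class.
protocol ItemListDataProviderProtocol: UITableViewDataSource, UITableViewDelegate {
    var itemManager: ItemManager? { get set }
}

// Minimal stand-in for the model manager the helper will ask for items.
class ItemManager {
    private var toDoItems: [String] = []
    var toDoCount: Int { return toDoItems.count }
    func add(_ title: String) { toDoItems.append(title) }
    func item(at index: Int) -> String { return toDoItems[index] }
}

// The view controller wires the table view to whatever provider it is given,
// so the concrete provider can be replaced without touching the controller.
class ItemListViewController: UIViewController {
    @IBOutlet var tableView: UITableView!
    var dataProvider: ItemListDataProviderProtocol?

    override func viewDidLoad() {
        super.viewDidLoad()
        if let provider = dataProvider {
            tableView.dataSource = provider
            tableView.delegate = provider
        }
    }
}
Because the controller depends only on the protocol, a better data provider (or a test double) can be swapped in later without breaking the tests written against the controller.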
The model

The model of the application consists of the to-do item, the location, and an item manager, which allows the addition and removal of items and is also responsible for managing the items. Therefore, the controller will ask the item manager for the items to present. The item manager will also be responsible for storing the items on disk.

Beginners often tend to manage the model objects within the controller. Then, the controller has a reference to a collection of items, and the addition and removal of items is done directly by the controller. This is not recommended, because if we decide to change the storage of the items (for example, by using Core Data), their addition and removal would have to be changed within the controller. It is difficult to keep an overview of such a class, and for this reason it becomes a source of bugs. It is much easier to have a clear interface between the controller and the model objects, because if we need to change how the model objects are managed, the controller can stay the same. We could even replace the complete model layer if we just keep the interface the same. Later in the article, we will see that this decoupling also helps to make testing easier.

Other view controllers

The application will have two more view controllers: a task detail View Controller and a View Controller for the input of the task. When the user taps a to-do item in the list, the details of the item are presented in the task detail View Controller. From the Details screen, the user will be able to check an item. New to-do items will be added to the list of items using the view presented by the input View Controller.

The development strategy

In this book, we will build the app from the inside out. We will start with the model, and then build the controllers and networking. At the end of the book, we will put everything together. Of course, this is not the only way to build apps. But by separating on the basis of layers instead of features, it is easier to follow and keep an overview of what is happening. When you later need to refresh your memory, the relevant information you need is easier to find.

Getting started with Xcode

Now, let's start our journey by creating a project that we will implement using TDD. Open Xcode and create a new iOS project using the Single View Application template. In the options window, enter ToDo as the product name, select Swift as the language, choose iPhone in the Devices option, and check the box next to Include Unit Tests. Leave the Use Core Data and Include UI Tests boxes unchecked. Xcode creates a small iOS project with two targets: one for the implementation code and the other for the unit tests. The template contains code that presents a single view on screen. We could have chosen to start with the Master-Detail Application template, because the app will show a master and a detail view. However, we have chosen the Single View Application template because it comes with hardly any code. And in TDD, we want to have all the implementation code demanded by failing tests.

To take a look at how the application target and test target fit together, select the project in Project Navigator, and then select the ToDoTests target. In the General tab, you'll find a setting for the Host Application that the test target should be able to test. It will look like this:

Xcode has already set up the test target correctly to allow the testing of the implementations that we will write in the application target.
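Although the first real tests will be written later, here is a hedged sketch of the shape a first unit test in the ToDoTests target could take. The ToDoItem type and its initializer are assumptions made purely for illustration; they do not exist yet, which is exactly why such a test (and even the build) fails first:

import XCTest
@testable import ToDo

class ToDoItemTests: XCTestCase {

    // Red phase: this fails (it does not even compile) until a minimal
    // ToDoItem type with a title property is added to the app target.
    func test_Init_SetsTitle() {
        let item = ToDoItem(title: "Buy milk")
        XCTAssertEqual(item.title, "Buy milk")
    }
}

Only after seeing a test like this fail would we add the smallest ToDoItem type that makes it pass.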
Xcode has also set up a scheme to build the app and run the tests. Click on the Scheme selector next to the stop button in the toolbar, and select Edit Scheme.... In the Test action, all the test bundles of the project will be listed. In our case, only one test bundle is shown—ToDoTests. On the right-hand side of the window is a column named Test, with a checked checkbox. This means that if we run the tests while this scheme is selected in Xcode, all the tests in the selected test suite will be run.

Setting up useful Xcode behaviors for testing

Xcode has a feature called behaviors. With the use of behaviors and tabs, Xcode can show useful information depending on its state. Open the Behaviors window by going to Xcode | Behaviors | Edit Behaviors. On the left-hand side are the different stages for which you can add behaviors (Build, Testing, Running, and so on). The following behaviors are useful when doing TDD. The behaviors shown here are those that I find useful. Play around with the settings to find the ones most useful for you. Overall, I recommend using behaviors because I think they speed up development.

Useful build behaviors

When building starts, Xcode compiles the files and links them together. To see what is going on, you can activate the build log when building starts. It is recommended that you open the build log in a new tab because this allows switching back to the code editor when no error occurs during the build. Select the Starts stage and check Show tab named. Enter Log as the name of the tab and select in active window. Check the Show navigator setting and select Issue Navigator. At the bottom of the window, check Navigate to and select current log. After you have made these changes, the settings window will look like this:

Build and run to see what the behavior looks like.

Testing behaviors

To write some code, I have an Xcode tab called Coding. Usually, in this tab, the test is open on the left-hand side, and in the Assistant Editor on the right-hand side, there is the code to be tested (or in the case of TDD, the code to be written). It looks like this:

When the test starts, we want to see the code editor again. So, we add a behavior to show the Coding tab. In addition to this, we want to see the Test Navigator and the debugger with the console view. When the tests succeed, Xcode should show a bezel to notify us that all tests have passed. Go to the Testing | Succeeds stage, and check the Notify using bezel or system notification setting. In addition to this, it should hide the navigator and the debugger, because we want to concentrate on refactoring or writing the next test.

In case the testing fails (which happens a lot in TDD), Xcode will show a bezel again. I like to hide the debugger, because usually, it is not the best place to figure out what is going on in the case of a failing test. And in most of the cases in TDD, we already know what the problem is. But we want to see the failing test. Therefore, check Show navigator and select Issue navigator. At the bottom of the window, check Navigate to and select first new issue. You can even make your Mac speak the announcements. Check Speak announcements using and select the voice you like. But be careful not to annoy your coworkers. You might need their help in the future.

Now, the project and Xcode are set up, and we can start our TDD journey.

Summary

In this article, we took a look at the app that we are going to build throughout the course of this book.
We took a look at how the screens of the app will look when we are finished with it. We created the project that we will use later on and learned about Xcode behaviors. Resources for Article: Further resources on this subject: Thinking Functionally [article] Hosting on Google App Engine [article] Cloud and Async Communication [article]


Introduction to JavaScript

Packt
10 Nov 2016
15 min read
In this article by Simon Timms, author of the book, Mastering JavaScript Design Patterns - Second Edition, we will explore the history of JavaScript and how it came to be the important language that it is today (For more resources related to this topic, see here.)

JavaScript is an evolving language that has come a long way from its inception. Possibly more than any other programming language, it has grown and changed with the growth of the World Wide Web. As JavaScript has evolved and grown in importance, the need to apply rigorous methods to its construction has also grown.

The road to JavaScript

We'll never know how language first came into being. Did it slowly evolve from a series of grunts and guttural sounds made during grooming rituals? Perhaps it developed to allow mothers and their offspring to communicate. Both of these are theories, all but impossible to prove. Nobody was around to observe our ancestors during that important period. In fact, the general lack of empirical evidence led the Linguistic Society of Paris to ban further discussions on the topic, seeing it as unsuitable for serious study.

The early days

Fortunately, programming languages have developed in recent history and we've been able to watch them grow and change. JavaScript has one of the more interesting histories of modern programming languages. During what must have been an absolutely frantic 10 days in May of 1995, a programmer at Netscape wrote the foundation for what would grow up to be modern JavaScript.

At the time, Netscape was involved in the first of the browser wars with Microsoft. The vision for Netscape was far grander than simply developing a browser. They wanted to create an entire distributed operating system making use of Sun Microsystems' recently-released Java programming language. Java was a much more modern alternative to the C++ Microsoft was pushing. However, Netscape didn't have an answer to Visual Basic. Visual Basic was an easier-to-use programming language, which was targeted at developers with less experience. It avoided some of the difficulties around memory management that make C and C++ notoriously difficult to program. Visual Basic also avoided strict typing and overall allowed more leeway.

Brendan Eich was tasked with developing Netscape's repartee to VB. The project was initially codenamed Mocha, but was renamed LiveScript before Netscape 2.0 beta was released. By the time the full release was available, Mocha/LiveScript had been renamed JavaScript to tie it into the Java applet integration. Java applets were small applications that ran in the browser. They had a different security model from the browser itself and so were limited in how they could interact with both the browser and the local system. It is quite rare to see applets these days, as much of their functionality has become part of the browser. Java was riding a popular wave at the time and any relationship to it was played up.

The name has caused much confusion over the years. JavaScript is a very different language from Java. JavaScript is an interpreted language with loose typing, which runs primarily on the browser. Java is a language that is compiled to bytecode, which is then executed on the Java Virtual Machine. It has applicability in numerous scenarios, from the browser (through the use of Java applets), to the server (Tomcat, JBoss, and so on), to full desktop applications (Eclipse, OpenOffice, and so on). In most laypersons' minds, the confusion remains.
JavaScript turned out to be really quite useful for interacting with the web browser. It was not long until Microsoft had also adopted JavaScript into their Internet Explorer to complement VBScript. The Microsoft implementation was known as JScript.

By late 1996, it was clear that JavaScript was going to be the winning web language for the near future. In order to limit the amount of language deviation between implementations, Sun and Netscape began working with the European Computer Manufacturers Association (ECMA) to develop a standard to which future versions of JavaScript would need to comply. The standard was released very quickly (very quickly in terms of how rapidly standards organizations move), in July of 1997. On the off chance that you have not seen enough names yet for JavaScript, the standard version was called ECMAScript, a name which still persists in some circles.

Unfortunately, the standard only specified the very core parts of JavaScript. With the browser wars raging, it was apparent that any vendor that stuck with only the basic implementation of JavaScript would quickly be left behind. At the same time, there was much work going on to establish a standard Document Object Model (DOM) for browsers. The DOM was, in effect, an API for a web page that could be manipulated using JavaScript.

For many years, every JavaScript script would start by attempting to determine the browser on which it was running. This would dictate how to address elements in the DOM, as there were dramatic deviations between browsers. The spaghetti of code that was required to perform simple actions was legendary. I remember reading a year-long 20-part series on developing a Dynamic HTML (DHTML) drop-down menu such that it would work on both Internet Explorer and Netscape Navigator. The same functionality can now be achieved with pure CSS without even having to resort to JavaScript.

DHTML was a popular term in the late 1990s and early 2000s. It really referred to any web page that had some sort of dynamic content that was executed on the client side. It has fallen out of use, as the popularity of JavaScript has made almost every page a dynamic one.

Fortunately, the efforts to standardize JavaScript continued behind the scenes. Versions 2 and 3 of ECMAScript were released in 1998 and 1999. It looked like there might finally be some agreement between the various parties interested in JavaScript. Work began in early 2000 on ECMAScript 4, which was to be a major new release.

A pause

Then, disaster struck. The various groups involved in the ECMAScript effort had major disagreements about the direction JavaScript was to take. Microsoft seemed to have lost interest in the standardization effort. It was somewhat understandable, as it was around that time that Netscape self-destructed and Internet Explorer became the de-facto standard. Microsoft implemented parts of ECMAScript 4 but not all of it. Others implemented more fully-featured support, but without the market leader on board, developers didn't bother using them. Years passed without consensus and without a new release of ECMAScript.

However, as frequently happens, the evolution of the Internet could not be stopped by a lack of agreement between major players. Libraries such as jQuery, Prototype, Dojo, and Mootools papered over the major differences in browsers, making cross-browser development far easier. At the same time, the amount of JavaScript used in applications increased dramatically.
The way of GMail

The turning point was, perhaps, the release of Google's GMail application in 2004. Although XMLHTTPRequest, the technology behind Asynchronous JavaScript and XML (AJAX), had been around for about five years when GMail was released, it had not been well used. When GMail was released, I was totally knocked off my feet by how smooth it was. We've grown used to applications that avoid full reloads, but at the time, it was a revolution. To make applications like that work, a great deal of JavaScript is needed.

AJAX is a method by which small chunks of data are retrieved from the server by a client instead of refreshing the entire page. The technology allows for more interactive pages that avoid the jolt of full page reloads.

The popularity of GMail was the trigger for a change that had been brewing for a while. Increasing JavaScript acceptance and standardization pushed us past the tipping point for the acceptance of JavaScript as a proper language. Up until that point, much of the use of JavaScript was for performing minor changes to the page and for validating form input. I joke with people that in the early days of JavaScript, the only function name which was used was Validate().

Applications such as GMail that have a heavy reliance on AJAX and avoid full page reloads are known as Single Page Applications or SPAs. By minimizing the changes to the page contents, users have a more fluid experience. By transferring only a JavaScript Object Notation (JSON) payload instead of HTML, the amount of bandwidth required is also minimized. This makes applications appear snappier.

In recent years, there have been great advances in frameworks that ease the creation of SPAs. AngularJS, backbone.js, and ember are all Model View Controller style frameworks. They have gained great popularity in the past two to three years and provide some interesting use of patterns. These frameworks are the evolution of years of experimentation with JavaScript best practices by some very smart people.

JSON is a human-readable serialization format for JavaScript. It has become very popular in recent years, as it is easier and less cumbersome than previously popular formats such as XML. It lacks many of the companion technologies and strict grammatical rules of XML, but makes up for it in simplicity.

At the same time as the frameworks using JavaScript are evolving, the language is too. 2015 saw the release of a much-vaunted new version of JavaScript that had been under development for some years. Initially called ECMAScript 6, it was finally named ECMAScript-2015. It brought with it some great improvements to the ecosystem. Browser vendors are rushing to adopt the standard. Because of the complexity of adding new language features to the code base, coupled with the fact that not everybody is on the cutting edge of browsers, a number of other languages that transcompile to JavaScript are gaining popularity. CoffeeScript is a Python-like language that strives to improve the readability and brevity of JavaScript. Dart, developed by Google, is being pushed as an eventual replacement for JavaScript. Its construction addresses some of the optimizations that are impossible in traditional JavaScript. Until a Dart runtime is sufficiently popular, Google provides a Dart-to-JavaScript transcompiler. TypeScript is a Microsoft project that adds some ECMAScript-2015 and even some ECMAScript-201X syntax, as well as an interesting typing system, to JavaScript.
It aims to address some of the issues that large JavaScript projects present.

The point of this discussion about the history of JavaScript is twofold: first, it is important to remember that languages do not develop in a vacuum. Both human languages and computer programming languages mutate based on the environments in which they are used. It is a popularly held belief that the Inuit people have a great number of words for "snow", as it was so prevalent in their environment. This may or may not be true, depending on your definition for the word and exactly who makes up the Inuit people. There are, however, a great number of examples of domain-specific lexicons evolving to meet the requirements for exact definitions in narrow fields. One need look no further than a specialty cooking store to see the great number of variants of items which a layperson such as myself would call a pan.

The Sapir–Whorf hypothesis is a hypothesis within the linguistics domain, which suggests that not only is language influenced by the environment in which it is used, but also that language influences its environment. Also known as linguistic relativity, the theory is that one's cognitive processes differ based on how the language is constructed. Cognitive psychologist Keith Chen has proposed a fascinating example of this. In a very highly viewed TED talk, Dr. Chen suggested that there is a strong positive correlation between languages that lack a future tense and those that have high savings rates (https://www.ted.com/talks/keith_chen_could_your_language_affect_your_ability_to_save_money/transcript). The hypothesis at which Dr. Chen arrived is that when your language does not have a strong sense of connection between the present and the future, this leads to more reckless behavior in the present.

Thus, understanding the history of JavaScript puts one in a better position to understand how and where to make use of JavaScript.

The second reason I explored the history of JavaScript is because it is absolutely fascinating to see how quickly such a popular tool has evolved. At the time of writing, it has been about 20 years since JavaScript was first built and its rise to popularity has been explosive. What more exciting thing is there than to work in an ever-evolving language?

JavaScript everywhere

Since the GMail revolution, JavaScript has grown immensely. The renewed browser wars, which pit Internet Explorer and Edge against Chrome and against Firefox, have led to a number of very fast JavaScript interpreters being built. Brand new optimization techniques have been deployed and it is not unusual to see JavaScript compiled to machine-native code for the added performance it gains. However, as the speed of JavaScript has increased, so has the complexity of the applications built using it.

JavaScript is no longer simply a language for manipulating the browser, either. The JavaScript engine behind the popular Chrome browser has been extracted and is now at the heart of a number of interesting projects such as Node.js. Node.js started off as a highly asynchronous method of writing server-side applications. It has grown greatly and has a very active community supporting it. A wide variety of applications have been built using the Node.js runtime. Everything from build tools to editors has been built on the base of Node.js. Recently, the JavaScript engine for Microsoft Edge, ChakraCore, was also open sourced and can be embedded in NodeJS as an alternative to Google's V8.
SpiderMonkey, the Firefox equivalent, is also open source and is making its way into more tools.

JavaScript can even be used to control microcontrollers. The Johnny-Five framework is a programming framework for the very popular Arduino. It brings a much simpler approach to programming devices than the traditional low-level languages used for programming these devices. Using JavaScript and Arduino opens up a world of possibilities, from building robots to interacting with real-world sensors.

All of the major smartphone platforms (iOS, Android, and Windows Phone) have an option to build applications using JavaScript. The tablet space is much the same, with tablets supporting programming using JavaScript. Even the latest version of Windows provides a mechanism for building applications using JavaScript.

JavaScript is becoming one of the most important languages in the world. Although language usage statistics are notoriously difficult to calculate, every single source which attempts to develop a ranking puts JavaScript in the top 10:

Language index        Rank of JavaScript
Langpop.com           4
Statisticbrain.com    4
Codeval.com           6
TIOBE                 8

What is more interesting is that most of these rankings suggest that the usage of JavaScript is on the rise.

The long and short of it is that JavaScript is going to be a major language in the next few years. More and more applications are being written in JavaScript and it is the lingua franca for any sort of web development. Jeff Atwood, developer of the popular Stack Overflow website, created Atwood's Law regarding the wide adoption of JavaScript:

"Any application that can be written in JavaScript, will eventually be written in JavaScript" – Atwood's Law, Jeff Atwood

This insight has been proven to be correct time and time again. There are now compilers, spreadsheets, word processors—you name it—all written in JavaScript.

As the applications which make use of JavaScript increase in complexity, the developer may stumble upon many of the same issues as have been encountered in traditional programming languages: how can we write this application to be adaptable to change? This brings us to the need for properly designing applications. No longer can we simply throw a bunch of JavaScript into a file and hope that it works properly. Nor can we rely on libraries such as jQuery to save ourselves. Libraries can only provide additional functionality and contribute nothing to the structure of an application. At least some attention must now be paid to how to construct the application to be extensible and adaptable. The real world is ever-changing, and any application that is unable to change to suit the changing world is likely to be left in the dust. Design patterns provide some guidance in building adaptable applications, which can shift with changing business needs.

Summary

JavaScript has an interesting history and is really coming of age. With server-side JavaScript taking off and large JavaScript applications becoming common, there is a need for more diligence in building JavaScript applications.
For more information on JavaScript, you can check other books by Packt mentioned as follows:

Mastering JavaScript Promises: https://www.packtpub.com/application-development/mastering-javascript-promises
Mastering JavaScript High Performance: https://www.packtpub.com/web-development/mastering-javascript-high-performance
JavaScript: Functional Programming for JavaScript Developers: https://www.packtpub.com/web-development/javascript-functional-programming-javascript-developers

Resources for Article:

Further resources on this subject:

API with MongoDB and Node.js [article]
Tips & Tricks for Ext JS 3.x [article]
Saying Hello! [article]

Managing Users and Groups

Packt
10 Nov 2016
7 min read
In this article, we will cover the following recipes:

Creating user account
Creating user accounts in batch mode
Creating a group

Introduction

In this article by Uday Sawant, the author of the book Ubuntu Server Cookbook, you will see how to add new users to the Ubuntu server and update existing users. You will get to know the default settings for new users and how to change them. (For more resources related to this topic, see here.)

Creating user account

While installing Ubuntu, we add a primary user account on the server; if you are using the cloud image, it comes preinstalled with the default user. This single user is enough to get all tasks done in Ubuntu. There are times when you need to create more restrictive user accounts. This recipe shows how to add a new user to the Ubuntu server.

Getting ready

You will need super user or root privileges to add a new user to the Ubuntu server.

How to do it…

Follow these steps to create the new user account:

To add a new user in Ubuntu, enter the following command in your shell:
$ sudo adduser bob
Enter your password to complete the command with sudo privileges:
Now enter a password for the new user:
Confirm the password for the new user:
Enter the full name and other information about the new user; you can skip this part by pressing the Enter key.
Enter Y to confirm that the information is correct:
This should have added the new user to the system. You can confirm this by viewing the file /etc/passwd:

How it works…

In Linux systems, the adduser command is a higher-level command to quickly add a new user to the system. Since adduser requires root privileges, we need to use sudo along with the command. adduser completes the following operations:

Adds a new user
Adds a new default group with the same name as the user
Chooses a UID (user ID) and GID (group ID) conforming to the Debian policy
Creates a home directory with skeletal configuration (template) from /etc/skel
Creates a password for the new user
Runs the user script, if any

If you want to skip the password prompt and finger information while adding the new user, use the following command:
$ sudo adduser --disabled-password --gecos "" username

Alternatively, you can use the useradd command as follows:
$ sudo useradd -s <SHELL> -m -d <HomeDir> -g <Group> UserName

Where:

-s specifies the default login shell for the user
-d sets the home directory for the user
-m creates a home directory if one does not already exist
-g specifies the default group name for the user

Creating a user with the useradd command does not set a password for the user account. You can set or change the user password with the following command:
$ sudo passwd bob

This will change the password for the user account bob. Note that if you skip the username part from the preceding command, you will end up changing the password of the root account.

There's more…

With adduser, you can do five different tasks:

Add a normal user
Add a system user with the --system option
Add a user group with the --group option and without the --system option
Add a system group when called with the --system option
Add an existing user to an existing group when called with two non-option arguments

Check out the manual page man adduser to get more details.

You can also configure various default settings for the adduser command. A configuration file, /etc/adduser.conf, can be used to set the default values to be used by the adduser, addgroup, and deluser commands.
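To give an idea of what this file contains, a few typical key-value pairs are shown below. These are illustrative values only; the exact keys and defaults on your system may differ, so check /etc/adduser.conf and its manual page before relying on them:

# /etc/adduser.conf (excerpt, illustrative values)
# Base directory under which new home directories are created
DHOME=/home
# Default login shell given to new users
DSHELL=/bin/bash
# Skeleton directory copied into each new home directory
SKEL=/etc/skel
# Create a per-user group with the same name as the user
USERGROUPS=yes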
Key-value pairs in this file can set various default values, including the home directory location, the directory structure (skel) to be used, default groups for new users, and so on. Check the manual page for more details on adduser.conf with the following command:
$ man adduser.conf

See also

Check out the command useradd, a low-level command to add a new user to the system
Check out the command usermod, a command to modify a user account
See why every user has his own group at: http://unix.stackexchange.com/questions/153390/why-does-every-user-have-his-own-group

Creating user accounts in batch mode

In this recipe, we will see how to create multiple user accounts in batch mode without using any external tool.

Getting ready

You will need a user account with root or sudo privileges.

How to do it...

Follow these steps to create user accounts in batch mode:

Create a new text file users.txt with the following command:
$ touch users.txt
Change the file permissions with the following command:
$ chmod 600 users.txt
Open users.txt with GNU nano and add the user account details:
$ nano users.txt
Press Ctrl + O to save the changes.
Press Ctrl + X to exit GNU nano.
Enter $ sudo newusers users.txt to import all the users listed in the users.txt file.
Check /etc/passwd to confirm that the users are created:

How it works…

We created a database of user details listed in the same format as the passwd file. The default format for each row is as follows:

username:passwd:uid:gid:full name:home_dir:shell

Where:

username: This is the login name of the user. If a user exists, the information for that user will be changed; otherwise, a new user will be created.
password: This is the password of the user.
uid: This is the uid of the user. If empty, a new uid will be assigned to this user.
gid: This is the gid for the default group of the user. If empty, a new group will be created with the same name as the username.
full name: This information will be copied to the gecos field.
home_dir: This defines the home directory of the user. If empty, a new home directory will be created with ownership set to the new or existing user.
shell: This is the default login shell for the user.

The newusers command reads each row and updates the user information if the user already exists, or it creates a new user. We made the users.txt file accessible to the owner only. This is to protect this file, as it contains the users' login names and passwords in unencrypted format.

Creating a group

A group is a way to organize and administer user accounts in Linux. Groups are used to collectively assign rights and permissions to multiple user accounts.

Getting ready

You will need super user or root privileges to add a group to the Ubuntu server.

How to do it...

Enter the following command to add a new group:
$ sudo addgroup guest

Enter your password to complete addgroup with root privileges.

How it works…

Here, we are simply adding a new group guest to the server. As addgroup needs root privileges, we need to use sudo along with the command. After creating a new group, addgroup displays the GID of the new group.
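As a quick, hedged illustration that ties the recipes together (using the bob account and the guest group created above), you could verify the new group and add an existing user to it as follows; the exact GID and output will differ on your system:

$ getent group guest
guest:x:1001:
$ sudo adduser bob guest
$ groups bob
bob : bob guest

Here adduser is called with two non-option arguments, which is the add-an-existing-user-to-an-existing-group mode mentioned earlier in this article.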
There's more…

Similar to adduser, you can use addgroup in different modes:

Add a normal group when used without any options
Add a system group with the --system option
Add an existing user to an existing group when called with two non-option arguments

Check out groupadd, a low-level utility to add a new group to the server

See also

Check out groupadd, a low-level utility to add a new group to the server

Summary

In this article, we have discussed how to create a user account, how to create a group, and how to create user accounts in batch mode.

Resources for Article:

Further resources on this subject:

Directory Services [article]
Getting Started with Ansible [article]
Lync 2013 Hybrid and Lync Online [article]


Using ROS with UAVs

Packt
10 Nov 2016
11 min read
In this article by Carol Fairchild and Dr. Thomas L. Harman, co-authors of the book ROS Robotics by Example, you will discover the field of ROS Unmanned Air Vehicles (UAVs), and quadrotors in particular. The reader is invited to learn about the simulated Hector Quadrotor and take it for a flight. The ROS wiki currently contains a growing list of ROS UAVs. These UAVs are as follows: (For more resources related to this topic, see here.)

AscTec Pelican and Hummingbird quadrotors
Berkeley's STARMAC
Bitcraze Crazyflie
DJI Matrice 100 Onboard SDK ROS support
Erle-copter
ETH sFly
Lily Camera Quadrotor
Parrot AR.Drone
Parrot Bebop
Penn's AscTec Hummingbird Quadrotors
PIXHAWK MAVs
Skybotix CoaX helicopter

Refer to http://wiki.ros.org/Robots#UAVs for future additions to this list and to the website http://www.ros.org/news/robots/uavs/ to get the latest ROS UAV news. The preceding list contains primarily quadrotors except for the Skybotix helicopter. A number of universities have adopted the AscTec Hummingbird as their ROS UAV of choice. For this book, we present a simulator called Hector Quadrotor and two real quadrotors, Crazyflie and Bebop, that use ROS.

Introducing Hector Quadrotor

The hardest part of learning about flying robots is the constant crashing. From first learning flight control to testing new hardware or flight algorithms, the resulting failures can have a huge cost in terms of broken hardware components. To address this difficulty, a simulated air vehicle designed and developed for ROS is ideal.

A simulated quadrotor UAV for the ROS Gazebo environment has been developed by Team Hector Darmstadt of Technische Universität Darmstadt. This quadrotor, called Hector Quadrotor, is enclosed in the hector_quadrotor metapackage. This metapackage contains the URDF description for the quadrotor UAV, its flight controllers, and launch files for running the quadrotor simulation in Gazebo.

Advanced uses of the Hector Quadrotor simulation allow the user to record sensor data such as Lidar and depth camera data. The quadrotor simulation can also be used to test flight algorithms and control approaches in simulation.

The hector_quadrotor metapackage contains the following key packages:

hector_quadrotor_description: This package provides a URDF model of the Hector Quadrotor UAV and the quadrotor configured with various sensors. Several URDF quadrotor models exist in this package, each configured with specific sensors and controllers.
hector_quadrotor_gazebo: This package contains launch files for executing Gazebo and spawning one or more Hector Quadrotors.
hector_quadrotor_gazebo_plugins: This package contains three UAV-specific plugins, which are as follows:
The simple controller gazebo_quadrotor_simple_controller subscribes to a geometry_msgs/Twist topic and calculates the required forces and torques
A gazebo_ros_baro sensor plugin simulates a barometric altimeter
The gazebo_quadrotor_propulsion plugin simulates the propulsion, aerodynamics, and drag from messages containing motor voltages and wind vector input
hector_gazebo_plugins: This package contains generic sensor plugins not specific to UAVs, such as IMU, magnetic field, GPS, and sonar data.
hector_quadrotor_teleop: This package provides a node and launch files for controlling a quadrotor using a joystick or gamepad.
hector_quadrotor_demo: This package provides sample launch files that run the Gazebo quadrotor simulation and hector_slam for indoor and outdoor scenarios.
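As a brief aside on the gazebo_quadrotor_simple_controller plugin listed above: because it accepts geometry_msgs/Twist messages, velocity commands can also be sent from the command line once a simulation is running, without a gamepad. The cmd_vel topic name used here is an assumption based on the teleop setup described later in this article, so treat the following as an illustrative sketch rather than a tested recipe:

$ rostopic pub -r 10 /cmd_vel geometry_msgs/Twist \
  '{linear: {x: 0.0, y: 0.0, z: 0.5}, angular: {x: 0.0, y: 0.0, z: 0.0}}'

A positive z value should command a gentle climb; press Ctrl + C to stop publishing.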
The entire list of packages for the hector_quadrotor metapackage appears in the next section. Loading Hector Quadrotor The repository for the hector_quadrotor software is at the following website: https://github.com/tu-darmstadt-ros-pkg/hector_quadrotor The following commands will install the binary packages of hector_quadrotor into the ROS package repository on your computer. If you wish to install the source files, instructions can be found at the following website: http://wiki.ros.org/hector_quadrotor/Tutorials/Quadrotor%20outdoor%20flight%20demo (It is assumed that ros-indigo-desktop-full has been installed on your computer.) For the binary packages, type the following commands to install the ROS Indigo version of Hector Quadrotor: $ sudo apt-get update $ sudo apt-get install ros-indigo-hector-quadrotor-demo A large number of ROS packages are downloaded and installed in the hector_quadrotor_demo download with the main hector_quadrotor packages providing functionality that should now be somewhat familiar. This installation downloads the following packages: hector_gazebo_worlds hector_geotiff hector_map_tools hector_mapping hector_nav_msgs hector_pose_estimation hector_pose_estimation_core hector_quadrotor_controller hector_quadrotor_controller_gazebo hector_quadrotor_demo hector_quadrotor_description hector_quadrotor_gazebo hector_quadrotor_gazebo_plugins hector_quadrotor_model hector_quadrotor_pose_estimation hector_quadrotor_teleop hector_sensors_description hector_sensors_gazebo hector_trajectory_serve hector_uav_msgs message_to_tf A number of these packages will be discussed as the Hector Quadrotor simulations are described in the next section. Launching Hector Quadrotor in Gazebo Two demonstration tutorials are available to provide the simulated applications of the Hector Quadrotor for both outdoor and indoor environments. These simulations are described in the next sections. Before you begin the Hector Quadrotor simulations, check your ROS master using the following command in your terminal window: $ echo $ROS_MASTER_URI If this variable is set to localhost or the IP address of your computer, no action is needed. If not, type the following command: $ export ROS_MASTER_URI=http://localhost:11311 This command can also be added to your .bashrc file. Be sure to delete or comment out (with a #) any other commands setting the ROS_MASTER_URI variable. Flying Hector outdoors The quadrotor outdoor flight demo software is included as part of the hector_quadrotor metapackage. Start the simulation by typing the following command: $ roslaunch hector_quadrotor_demo outdoor_flight_gazebo.launch This launch file loads a rolling landscape environment into the Gazebo simulation and spawns a model of the Hector Quadrotor configured with a Hokuyo UTM-30LX sensor. An rviz node is also started and configured specifically for the quadrotor outdoor flight. A large number of flight position and control parameters are initialized and loaded into the Parameter Server. Note that the quadrotor propulsion model parameters for quadrotor_propulsion plugin and quadrotor drag model parameters for quadrotor_aerodynamics plugin are displayed. Then look for the following message: Physics dynamic reconfigure ready. The following screenshots show both the Gazebo and rviz display windows when the Hector outdoor flight simulation is launched. The view from the onboard camera can be seen in the lower left corner of the rviz window. 
If you do not see the camera image on your rviz screen, make sure that Camera has been added to your Displays panel on the left and that the checkbox has been checked. If you would like to pilot the quadrotor using the camera, it is best to uncheck the checkboxes for tf and robot_model because the visualizations sometimes block the view:

Hector Quadrotor outdoor Gazebo view

Hector Quadrotor outdoor rviz view

The quadrotor appears on the ground in the simulation, ready for takeoff. Its forward direction is marked by a red mark on its leading motor mount. To be able to fly the quadrotor, you can launch the joystick controller software for the Xbox 360 controller. In a second terminal window, launch the joystick controller software with a launch file from the hector_quadrotor_teleop package:

$ roslaunch hector_quadrotor_teleop xbox_controller.launch

This launch file launches joy_node to process all joystick input from the left stick and right stick on the Xbox 360 controller, as shown in the following figure. The message published by joy_node contains the current state of the joystick axes and buttons. The quadrotor_teleop node subscribes to these messages and publishes messages on the cmd_vel topic. These messages provide the velocity and direction for the quadrotor flight.

Several joystick controllers are currently supported by the ROS joy package, including PS3 and Logitech devices. For this launch, the joystick device is accessed as /dev/input/js0 and is initialized with a deadzone of 0.050000. Parameters to set the joystick axes are as follows:

* /quadrotor_teleop/x_axis: 5
* /quadrotor_teleop/y_axis: 4
* /quadrotor_teleop/yaw_axis: 1
* /quadrotor_teleop/z_axis: 2

These parameters map to the Left Stick and the Right Stick controls on the Xbox 360 controller shown in the following figure. The directions controlled by these sticks are as follows:

Left Stick:
Forward (up) is to ascend
Backward (down) is to descend
Right is to rotate clockwise
Left is to rotate counterclockwise

Right Stick:
Forward (up) is to fly forward
Backward (down) is to fly backward
Right is to fly right
Left is to fly left

Xbox 360 joystick controls for Hector

Now use the joystick to fly around the simulated outdoor environment! The pilot's view can be seen in the Camera image view on the bottom left of the rviz screen. As you fly around in Gazebo, keep an eye on the Gazebo launch terminal window. The screen will display messages as follows, depending on your flying ability:

[ INFO] [1447358765.938240016, 617.860000000]: Engaging motors!
[ WARN] [1447358778.282568898, 629.410000000]: Shutting down motors due to flip over!

When Hector flips over, you will need to relaunch the simulation.

Within ROS, a clearer understanding of the interactions between the active nodes and topics can be obtained by using the rqt_graph tool. The following diagram depicts all currently active nodes (except debug nodes) enclosed in oval shapes. These nodes publish to the topics enclosed in rectangles that are pointed to by arrows. You can use the rqt_graph command in a new terminal window to view the same display:

ROS nodes and topics for Hector Quadrotor outdoor flight demo

The rostopic list command will provide a long list of topics currently being published. Other command-line tools such as rosnode, rosmsg, rosparam, and rosservice will help gather specific information about Hector Quadrotor's operation.

To understand the orientation of the quadrotor on the screen, use the Gazebo GUI to show the vehicle's tf reference frame.
Select quadrotor in the World panel on the left, then select the translation mode on the top environment toolbar (it looks like crossed double-headed arrows). This selection will bring up the red-green-blue axes for the x-y-z axes of the tf frame, respectively. In the following figure, the x axis is pointing to the left, the y axis is pointing to the right (toward the reader), and the z axis is pointing up.

Hector Quadrotor tf reference frame

A YouTube video of the hector_quadrotor outdoor scenario demo shows the hector_quadrotor in Gazebo operated with a gamepad controller: https://www.youtube.com/watch?v=9CGIcc0jeuI

Flying Hector indoors

The quadrotor indoor SLAM demo software is included as part of the hector_quadrotor metapackage. To launch the simulation, type the following command:

$ roslaunch hector_quadrotor_demo indoor_slam_gazebo.launch

The following screenshots show both the rviz and Gazebo display windows when the Hector indoor simulation is launched:

Hector Quadrotor indoor rviz and Gazebo views

If you do not see this image for Gazebo, roll your mouse wheel to zoom out of the image. Then you will need to rotate the scene to a top-down view; to find the quadrotor, press Shift + right mouse button. The environment is the offices at Willow Garage, and Hector starts out on the floor of one of the interior rooms.

Just like in the outdoor demo, the xbox_controller.launch file from the hector_quadrotor_teleop package should be executed:

$ roslaunch hector_quadrotor_teleop xbox_controller.launch

If the quadrotor becomes embedded in the wall, waiting a few seconds should release it and it should (hopefully) end up in an upright position ready to fly again. If you lose sight of it, zoom out from the Gazebo screen and look from a top-down view. Remember that the Gazebo physics engine is applying minor environment conditions as well. This can cause the quadrotor to drift out of position.

The rqt graph of the active nodes and topics during the Hector indoor SLAM demo is shown in the following figure. As Hector is flown around the office environment, the hector_mapping node will be performing SLAM and creating a map of the environment.

ROS nodes and topics for Hector Quadrotor indoor SLAM demo

The following screenshot shows Hector Quadrotor mapping an interior room of Willow Garage:

Hector mapping indoors using SLAM

The 3D robot trajectory is tracked by the hector_trajectory_server node and can be shown in rviz. The map along with the trajectory information can be saved to a GeoTiff file with the following command:

$ rostopic pub syscommand std_msgs/String "savegeotiff"

The savegeotiff map can be found in the hector_geotiff/map directory.

A YouTube video of the hector_quadrotor stack indoor SLAM demo shows hector_quadrotor in Gazebo operated with a gamepad controller: https://www.youtube.com/watch?v=IJbJbcZVY28

Summary

In this article, we learned about Hector Quadrotor, loading Hector Quadrotor, launching Hector Quadrotor in Gazebo, and flying Hector outdoors and indoors.

Resources for Article:

Further resources on this subject:

Working On Your Bot [article]
Building robots that can walk [article]
Detecting and Protecting against Your Enemies [article]