
How-To Tutorials

Packt
08 Feb 2017
5 min read

Deploying and Synchronizing Azure Active Directory

In this article by Jan-henrik Damaschke and Oliver Michalski, authors of the book Implementing Azure Solutions, we will learn about identity management for cloud services. As cloud environments and cloud services grow, identity management, security, and access policies become ever more essential. Microsoft's central instance for all of this is Azure Active Directory: every security policy or identity Microsoft provides for its cloud services is based on Azure Active Directory. In this article you will learn the basics of Azure Active Directory, how to implement Azure AD, and how to build a hybrid Azure Active Directory with a connection to Active Directory Domain Services. We are going to explore the following topics:

- Azure Active Directory overview
- Azure Active Directory subscription options
- Azure Active Directory deployment
- Azure Active Directory user and subscription management
- How to deploy Azure Active Directory hybrid identities with Active Directory Domain Services
- Azure Active Directory hybrid highly available and non-highly available deployments

(For more resources related to this topic, see here.)

Azure Active Directory

Azure Active Directory, or Azure AD, is a multi-tenant, cloud-based directory and identity management service developed by Microsoft. Azure AD includes a full suite of identity management capabilities, including multi-factor authentication, device registration, self-service password management, self-service group management, privileged account management, role-based access control, application usage monitoring, rich auditing, and security monitoring and alerting. Azure AD can be integrated with an existing Windows Server Active Directory, giving organizations the ability to leverage their existing on-premises identities to manage access to cloud-based SaaS applications. After this article you will know how to set up Azure AD and Azure AD Connect. You will also be able to design a highly available infrastructure for identity replication.
The following figure describes the general structure of Azure AD in a hybrid deployment with Active Directory Domain Services (Source: https://azure.microsoft.com/en-us/documentation/articles/active-directory-whatis/).

Customers who are already using Office 365, CRM Online, or Intune are already using Azure AD for their service. You can easily tell that you use Azure AD if you have a username like user@domain.onmicrosoft.com (.de or .cn endings are also possible if you are using Microsoft Cloud Germany or Azure China). Azure AD is a multi-tenant, geo-distributed, highly available service running in 28 or more datacenters around the world. Microsoft has implemented automated failover and keeps at least two copies of your Azure Active Directory service in other regional or global datacenters. Normally your directory runs in your primary datacenter and is replicated into another two datacenters in your region. If you only have two Azure datacenters in your region, as in Europe, the copy is distributed to another region outside yours.

Azure Active Directory options

There are currently three selectable options for Azure Active Directory, each with different features. A fourth option, Azure Active Directory Premium P2, will become available later in 2016.
Azure AD Free supports common features such as:

- Directory objects: up to 500,000 objects
- User/group management (add/update/delete), user-based provisioning, device registration
- Single sign-on for up to 10 applications per user
- Self-service password change for cloud users
- Connect and sync with on-premises Active Directory Domain Services
- Up to 3 basic security and usage reports

Azure AD Basic supports the same common features (with no limit on the number of directory objects), plus basic features such as:

- Group-based access management/provisioning
- Self-service password reset for cloud users
- Company branding (logon pages/access panel customization)
- Application Proxy
- Service level agreement of 99.9%

Azure AD Premium P1 supports the common and basic features above (also with unlimited directory objects), plus premium features such as:

- Self-service group and app management, self-service application additions, dynamic groups
- Self-service password reset/change/unlock with on-premises write-back
- Multi-factor authentication (in the cloud and on-premises (MFA Server))
- MIM CAL with MIM Server
- Cloud App Discovery
- Connect Health
- Automatic password rollover for group accounts

Within Q3/2016, Microsoft will also enable customers to
use the Azure Active Directory Premium P2 plan, which includes all the capabilities of Azure AD Premium P1 as well as the new identity protection and privileged identity management capabilities. That is a necessary step for Microsoft to extend its offering for Windows 10 device management with Azure AD. Currently, Azure AD enables Windows 10 customers to join a device to Azure AD, implement SSO for desktops, use Microsoft Passport for Azure AD, and centrally administer BitLocker recovery. It also adds automatic classification to the Active Directory Rights Management Service in Azure AD. Depending on what you plan to do with your Azure environment, you should choose your Azure Active Directory option accordingly.
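As a small illustration of the default tenant naming convention mentioned earlier (user@domain.onmicrosoft.com, with .de/.cn endings for Microsoft Cloud Germany and Azure China), here is a minimal Python sketch. It is purely illustrative string handling, not an Azure API call, and the sample UPNs are invented:

```python
def is_azure_ad_default_upn(upn):
    """Check whether a user principal name uses the default Azure AD
    tenant domain (domain.onmicrosoft.com / .de / .cn).
    Illustrative string handling only; not an Azure API call."""
    try:
        user, domain = upn.split("@")
    except ValueError:
        return False  # not a well-formed UPN at all
    suffixes = ("onmicrosoft.com", "onmicrosoft.de", "onmicrosoft.cn")
    # require a tenant label before the onmicrosoft suffix
    return bool(user) and domain.endswith(suffixes) and domain.count(".") >= 2

print(is_azure_ad_default_upn("user@contoso.onmicrosoft.com"))  # True
print(is_azure_ad_default_upn("user@contoso.com"))              # False
```

A custom (verified) domain such as contoso.com fails the check, which matches the article's point: the onmicrosoft.com name is the tell-tale default tenant domain.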

Packt
07 Feb 2017
16 min read

Deploying OpenStack – the DevOps Way

In this article by Chandan Dutta Chowdhury and Omar Khedher, authors of the book Mastering OpenStack - Second Edition, we will cover deploying an OpenStack environment based on the profiled design. Although we created our design by taking care of several aspects related to scalability and performance, we still have to make it real, and you should stop looking at OpenStack as a single-block system. (For more resources related to this topic, see here.)

Furthermore, in the introductory section of this article, we covered the role of OpenStack in the next generation of data centers. A large-scale infrastructure used by cloud providers, with a few thousand servers, needs a very different approach to set up. In our case, deploying and operating the OpenStack cloud is not as simple as you might think. Thus, you need to make the operational tasks easier or, in other words, automated. In this article, we will cover new topics about the ways to deploy OpenStack. The next part will cover the following points:

- Learning what the DevOps movement is and how it can be adopted in the cloud
- Knowing how to see your infrastructure as code and how to maintain it
- Getting closer to the DevOps way by including configuration management aspects in your cloud
- Making your OpenStack environment design deployable via automation
- Starting your first OpenStack environment deployment using Ansible

DevOps in a nutshell

The term DevOps is a conjunction of development (software developers) and operations (managing and putting software into production). Many IT organizations have started to adopt the concept, but the question is how and why? Is it a job? Is it a process or a practice? DevOps is development and operations compounded, which basically defines a methodology of software development. It describes practices that streamline the software delivery process. It is about improving communication and integration between developers, operators (including administrators), and quality assurance teams.
The essence of the DevOps movement lies in leveraging the benefits of collaboration. Different disciplines can relate to DevOps in different ways and bring their experiences and skills together under the DevOps banner to gain shared values. So, DevOps is a methodology that integrates the efforts of several disciplines, as shown in the following figure.

This new movement is intended to resolve the conflict between developers and operators. Delivering a new release affects the production systems, and it puts different teams in conflicting positions by setting different goals for them; for example, the development team wants their latest code to go live, while the operations team wants more time to test and stage the changes before they go to production. DevOps fills the gap and streamlines the process of bringing in change through collaboration between the developers and operators. DevOps is neither a toolkit nor a job; it is the synergy that streamlines the process of change. Let's see how DevOps can incubate a cloud project.

DevOps and cloud - everything is code

Let's look at the architecture of cloud computing. While discussing a cloud infrastructure, we must remember that we are talking about a large, scalable environment! The switch to bigger environments requires us to simplify everything as much as possible. System architecture and software design are becoming more and more complicated. Every new release of software brings new functions and new configurations. Administering and deploying a large infrastructure would not be possible without adopting a new philosophy: infrastructure as code. When infrastructure is seen as code, the components of a given infrastructure are modeled as modules of code. What you need to do is abstract the functionality of the infrastructure into discrete reusable components, design the services provided by the infrastructure as modules of code, and finally implement them as blocks of automation.
Furthermore, in such a paradigm, it becomes essential to adhere, as an infrastructure developer, to the same well-established discipline of software development. The essence of DevOps mandates that developers, network engineers, and operators must work alongside each other to deploy, operate, and maintain the cloud infrastructure that will power our next-generation data center.

DevOps and OpenStack

OpenStack is an open source project, and its code is extended, modified, and fixed in every release. It is composed of multiple projects and requires extensive skills to deploy and operate. Of course, it is not our mission to check the code and dive into its different modules and functions. So what can we do with DevOps, then? Deploying complex software on a large-scale infrastructure requires adopting a new strategy. The ever-increasing complexity of software such as OpenStack and the deployment of huge cloud infrastructures must be simplified. Everything in a given infrastructure must be automated! This is where OpenStack meets DevOps.

Breaking down OpenStack into pieces

Let's gather what we covered previously and signal a couple of steps towards our first OpenStack deployment:

- Break down the OpenStack infrastructure into independent and reusable services.
- Integrate the services in such a way that you can provide the expected functionalities in the OpenStack environment.

It is obvious that OpenStack includes many services; what we need to do is see these services as packages of code in our infrastructure-as-code experience. The next step will investigate how to integrate the services and deploy them via automation. Deploying a service as code is similar to writing a software application.
Here are important points you should remember during the entire deployment process:

- Simplify and modularize the OpenStack services.
- Develop OpenStack services as building blocks that integrate with other components to provide a complete system.
- Facilitate the customization and improvement of services without impacting the complete system.
- Use the right tool to build the services.
- Be sure that the services provide the same results with the same input.
- Switch your service vision from how to do it to what we want to do.

Automation is the essence of DevOps. In fact, many system management tools are used intensively nowadays due to their efficiency of deployment. In other words, there is a need for automation! You have probably used some of the automation tools, such as Ansible, Chef, Puppet, and many more. Before we go through them, we need to create a succinct, professional code management step.

Working with the infrastructure deployment code

While dealing with infrastructure as code, the code that abstracts, models, and builds the OpenStack infrastructure must be committed to source code management. This is required for tracking changes in our automation code and for reproducibility of results. Eventually, we must reach a point where we shift our OpenStack infrastructure from a code base to a deployed system while following the latest software development best practices. At this stage, you should be aware that the quality of your OpenStack infrastructure deployment depends largely on the quality of the code that describes it. It is important to highlight a critical point that you should keep in mind during all deployment stages: automated systems are not able to understand human error. You'll have to go through an ensemble of phases and cycles, using agile methodologies, to end up with a release that is largely bug-free and can be promoted to the production environment.
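As a concrete (and deliberately tiny) first step, committing the deployment code to source control can be as small as the following shell session. The directory and file names are placeholders, and it assumes git is installed:

```shell
# Minimal sketch: put infrastructure deployment code under version control.
# Directory and file names are placeholders; assumes git is installed.
mkdir -p openstack-infra
git init -q openstack-infra
printf -- '---\n# top-level playbook (placeholder)\n' > openstack-infra/site.yml
git -C openstack-infra add site.yml
git -C openstack-infra -c user.name=ops -c user.email=ops@example.com \
    commit -q -m "initial infrastructure code"
git -C openstack-infra log --oneline   # shows the single initial commit
```

From here, every change to the automation code gets its own commit, which is what makes the tracking and reproducibility described above possible.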
On the other hand, since mistakes cannot be totally eradicated, you should plan for the continuous development and testing of the code. The code's life cycle management is shown in the following figure. Changes can be scary! To handle changes, it is recommended that you do the following:

- Keep track of and monitor the changes at every stage
- Build flexibility into the code and make it easy to change
- Refactor the code when it becomes difficult to manage
- Test, test, and retest your code

Keep checking every point described previously until you become confident that your OpenStack infrastructure is being managed by code that won't break.

Integrating OpenStack into infrastructure code

To keep the OpenStack environment working with a minimum rate of surprises and to ensure that the code delivers the required functionality, we must continuously track the development of our infrastructure code. We will connect the OpenStack deployment code to a toolchain, where it will be constantly monitored and tested as we continue to develop and refine our code. This toolchain is composed of a pipeline of tracking, monitoring, testing, and reporting phases and is well known as a continuous integration and continuous delivery (CI/CD) process.

Continuous integration and delivery

Let's see how continuous integration (CI) can be applied to OpenStack. The life cycle of our automation code will be managed by the following categories of tools:

- System Management Tool Artifact (SMTA): can be any IT automation tool, such as Juju charms.
- Version Control System (VCS): tracks changes to our infrastructure deployment code. Any version control system that you are familiar with, such as CVS, Subversion, or Bazaar, can be used for this purpose. Git is a good fit for our VCS.
- Jenkins: a tool well suited to monitoring changes in the VCS and doing the continuous integration testing and reporting of results.
Take a look at the model in the following figure. The proposed life cycle for infrastructure as code consists of infrastructure configuration files that are recorded in a version control system and built continuously by means of a CI server (Jenkins, in our case). The infrastructure configuration files can be used to set up a unit test environment (a virtual environment using Vagrant, for example) and make use of any system management tool to provision the infrastructure (Ansible, Chef, Puppet, and so on). The CI server keeps listening for changes in version control, automatically propagates any new version to be tested, and then promotes it to the target environments in production. Vagrant allows you to build a virtual environment very easily; it is based on Oracle VirtualBox (https://www.virtualbox.org/) to run virtual machines, so you will need these before moving on with the installation in your test environment. The proposed life cycle for infrastructure code highlights the importance of a test environment before moving on to production. You should give a lot of importance to, and care a good deal about, the testing stage, although this might be a very time-consuming task. Especially in our case, with infrastructure code for deploying OpenStack that is complicated and has multiple dependencies on other systems, the importance of testing cannot be overemphasized. This makes it imperative to ensure effort is made for automated and consistent testing of the infrastructure code. The best way to do this is to keep testing thoroughly, in a repeated way, until you gain confidence in your code.

Choosing the automation tool

At first sight, you may wonder what the best automation tool is for our OpenStack production day. We have already chosen Git and Jenkins to handle our continuous integration and testing. It is time to choose the right tool for automation. It might be difficult to select the right tool.
Most likely, you'll have to choose between several of them. Therefore, a few succinct hints on the different tools might be helpful in order to pick the best fit for a particular setup. Of course, we are still talking about large infrastructures, a lot of networking, and distributed services. Selecting one or more of these tools as the system management layer can make our deployment effective and fast. We will use Ansible for the next deployment phase.

Introducing Ansible

We have chosen Ansible to automate our cloud infrastructure. Ansible is an infrastructure automation engine. It is simple to get started with, yet flexible enough to handle complex interdependent systems. The architecture of Ansible consists of the deployment system, where Ansible itself is installed, and the target systems that are managed by Ansible. It uses an agentless architecture to push changes to the target systems: it uses the SSH protocol as its transport mechanism, which means that no extra software installation is required on the target system. The agentless architecture makes setting up Ansible very simple. Ansible works by copying modules over SSH to the target systems. It then executes them to change the state of the target systems. Once executed, the Ansible modules are cleaned up, leaving no trail on the target system. Although the default mechanism for making changes to the client system is an SSH-based push model, if you feel that a push-based model for delivering changes is not scalable enough for your infrastructure, Ansible also supports an agent-based pull model. Ansible is developed in Python and comes with a huge collection of core automation modules. The configuration files for Ansible are called playbooks, and they are written in YAML, a human-readable data serialization format. YAML is easy to understand and custom-made for writing configuration files.
This makes learning Ansible automation much easier. Ansible Galaxy is a collection of reusable Ansible roles that can be used for your project.

Modules

Ansible modules are constructs that encapsulate a system resource or action. A module models the resource and its attributes. Ansible comes packaged with a wide range of core modules to represent various system resources; for example, the file module encapsulates a file on the system and has attributes such as owner, group, mode, and so on. These attributes represent the state of a file in the system; by changing the attributes of the resources, we can describe the required final state of the system.

Variables

While modules can represent the resources and actions on a system, variables represent the dynamic part of the change. Variables can be used to modify the behavior of the modules. Variables can be defined from the environment of the host, for example, the hostname, IP address, version of software and hardware installed on a host, and so on. They can also be user-defined or provided as part of a module. User-defined variables can represent the classification of a host resource or its attributes.

Inventory

An inventory is a list of hosts that are managed by Ansible. The inventory list supports classifying hosts into groups. In its simplest form, an inventory can be an INI file. The groups are represented as sections of the INI file. The classification can be based on the role of the hosts or any other system management need. It is possible for a host to appear in multiple groups in an inventory file. The following example shows a simple inventory of hosts:

    logserver1.example.com

    [controllers]
    ctl1.example.com
    ctl2.example.com

    [computes]
    compute1.example.com
    compute2.example.com
    compute3.example.com
    compute[20:30].example.com

The inventory file supports special patterns to represent large groups of hosts.
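The bracketed range in the last inventory line, compute[20:30].example.com, is one of those patterns: it stands for the inclusive run of hosts compute20 through compute30. To make the expansion concrete, here is a minimal Python sketch of the idea; it is an illustration only, not Ansible's actual implementation:

```python
import re

def expand_host_pattern(pattern):
    """Expand an Ansible-style numeric host range such as web[01:03].example.com.

    Returns a list of hostnames; a pattern without brackets is returned as-is.
    Illustrative sketch only, not Ansible's real inventory parser.
    """
    match = re.match(r"^(.*)\[(\d+):(\d+)\](.*)$", pattern)
    if not match:
        return [pattern]
    prefix, start, end, suffix = match.groups()
    width = len(start)  # preserve zero-padding such as 01, 02, ...
    return [f"{prefix}{str(i).zfill(width)}{suffix}"
            for i in range(int(start), int(end) + 1)]

hosts = expand_host_pattern("compute[20:30].example.com")
print(len(hosts))            # 11 hosts, since the range is inclusive
print(hosts[0], hosts[-1])   # compute20.example.com compute30.example.com
```

One line in the inventory thus stands for eleven managed hosts, which is what makes the pattern syntax convenient for large fleets.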
Ansible expects to find the inventory file at /etc/ansible/hosts, but a custom location can be passed directly to the Ansible command line. Ansible also supports dynamic inventories, which can be generated by executing scripts or retrieved from another management system, such as a cloud platform.

Roles

Roles are the building blocks of an Ansible-based deployment. They represent a collection of tasks that must be performed to configure a service on a group of hosts. A role encapsulates the tasks, variables, handlers, and other related functions required to deploy a service on a host. For example, to deploy a multinode web server cluster, the hosts in the infrastructure can be assigned roles such as web server, database server, load balancer, and so on.

Playbooks

Playbooks are the main configuration files in Ansible. They describe the complete system deployment plan. Playbooks are composed of a series of tasks and are executed from top to bottom. The tasks themselves refer to the groups of hosts that must be deployed with roles. Ansible playbooks are written in YAML. The following is an example of a simple Ansible playbook:

    ---
    - hosts: webservers
      vars:
        http_port: 8080
      remote_user: root
      tasks:
      - name: ensure apache is at the latest version
        yum: name=httpd state=latest
      - name: write the apache config file
        template: src=/srv/httpd.j2 dest=/etc/httpd.conf
        notify:
        - restart apache
      handlers:
        - name: restart apache
          service: name=httpd state=restarted

Ansible for OpenStack

OpenStack Ansible (OSA) is an official OpenStack Big Tent project. It focuses on providing roles and playbooks for deploying a scalable, production-ready OpenStack setup. It has a very active community of developers and users collaborating to stabilize and bring new features to OpenStack deployment. One of the unique features of the OSA project is the use of containers to isolate and manage OpenStack services. OSA installs OpenStack services in LXC containers to provide each service with an isolated environment.
LXC is an OS-level container technology; it encompasses a complete OS environment that includes a separate filesystem, a networking stack, and resource isolation using cgroups. OpenStack services are spawned in separate LXC containers and speak to each other using the REST APIs. The microservice-based architecture of OpenStack complements the use of containers to isolate services. It also decouples the services from the physical hardware and provides encapsulation of the service environment, which forms the foundation for providing portability, high availability, and redundancy. The OpenStack Ansible deployment is initiated from a deployment host. The deployment host is installed with Ansible, and it runs the OSA playbooks to orchestrate the installation of OpenStack on the target hosts. The Ansible target hosts are the ones that will run the OpenStack services. The target nodes must be installed with Ubuntu 14.04 LTS and configured with SSH-key-based authentication to allow login from the deployment host.

Summary

In this article, we covered several topics and terminologies on how to develop and maintain a code infrastructure using the DevOps style. Viewing your OpenStack infrastructure deployment as code will not only simplify node configuration but also improve the automation process. You should keep in mind that DevOps is neither a project nor a goal but a methodology that makes your deployment successful, empowered by the synergy of teams from different departments. Despite the existence of numerous system management tools to bring our OpenStack up and running in an automated way, we have chosen Ansible for the automation of our infrastructure. Puppet, Chef, Salt, and others can do the job, but in different ways. You should know that there isn't only one way to perform automation. Both Puppet and Chef have their own OpenStack deployment projects under the OpenStack Big Tent.
Resources for Article:

Further resources on this subject:

- Introduction to Ansible [article]
- OpenStack Networking in a Nutshell [article]
- Introducing OpenStack Trove [article]

Packt
07 Feb 2017
32 min read

Context – Understanding your Data using R

In this article by James D Miller, the author of the book Big Data Visualization, we will explore the idea of adding context to the data you are working with. Specifically, we'll discuss the importance of establishing data context and the practice of profiling your data for context discovery, as well as how big data affects this effort. The article is organized into the following main sections:

- Adding Context
- About R
- R and Big Data
- R Example 1 -- healthcare data
- R Example 2 -- healthcare data

(For more resources related to this topic, see here.)

When writing a book, authors leave context clues for their readers. A context clue is a "source of information" about written content that may be difficult or unique and that helps readers understand. This information offers insight into the content being read or consumed (an example might be: "It was an idyllic day; sunny, warm and perfect…"). With data, context clues should be developed through a process referred to as profiling (we'll discuss profiling in more detail later in this article) so that the data consumer can better understand the data when visualized. (Additionally, having context and perspective on the data you are working with is a vital step in determining what kind of data visualization should be created.) Context or profiling examples might be calculating the average age of "patients" or subjects within the data, or segmenting the data into time periods (years or months, usually). Another motive for adding context to data might be to gain a new perspective on the data. An example of this might be recognizing and examining a comparison present in the data; for example, body fat percentages of urban high school seniors could be compared to those of rural high school seniors. Adding context to your data before creating visualizations can certainly make the data visualization more relevant, but context still can't serve as a substitute for value.
Before you consider any factors such as time of day, geographic location, or average age, first and foremost your data visualization needs to benefit those who are going to consume it, so establishing appropriate context requirements will be critical. For data profiling (adding context), the rule is: Before Context, Think → Value.

Generally speaking, there are several visualization contextual categories that can be used to augment or increase the value and understanding of data for visualization. These include:

- Definitions and explanations
- Comparisons
- Contrasts
- Tendencies
- Dispersion

Definitions and explanations

This is providing additional information or "attributes" about a data point. For example, if the data contains a field named "patient ID" and we come to know that records describe individual patients, we may choose to calculate and add each individual patient's BMI, or body mass index.

Comparisons

This is adding a comparable value to a particular data point. For example, you might compute and add a national ranking to each "total by state".

Contrasts

This is almost like adding an "opposite" to a data point to see if it perhaps determines a different perspective. An example might be reviewing average body weights for patients who consume alcoholic beverages versus those who do not.

Tendencies

These are the "typical" mathematical calculations (or summaries) on the data as a whole, or by another category within the data, such as mean, median, and mode. For example, you might add the median heart rate for the age group that each patient in the data is a member of.

Dispersion

Again, these are mathematical calculations (or summaries), such as range, variance, and standard deviation, but they describe the spread of a data set (or of a group within the data).
For example, you may want to add the "range" for a selected value, such as the minimum and maximum number of hospital stays found in the data for each patient age group. The "art" of profiling data to add context and identify new and interesting perspectives for visualization is still ever-evolving; no doubt there are additional contextual categories that can be investigated as you continue your work with big data visualization projects.

Adding Context

So, how do we add context to data? Is it merely a matter of selecting Insert, then Data Context? No, it's not that easy (but it's not impossible either). Once you have identified (or "pulled together") your big data source (or at least a significant amount of data), how do you go from mountains of raw big data to summarizations that can be used as input to create valuable data visualizations, helping you to further analyze that data and support your conclusions? The answer is: through data profiling. Data profiling involves logically "getting to know" the data you think you may want to visualize, through query, experimentation, and review. Following the profiling process, you can then use the information you have collected to add context (and/or apply new "perspectives") to the data. Adding context to data requires the manipulation of that data, perhaps to reformat it, add calculations, aggregations, or additional columns, re-order it, and so on. Finally, you will be ready to visualize (or "picture") your data. The complete profiling process is shown below:

- Pull together (the data, or enough of the data)
- Profile (the data, through query, experimentation, and review)
- add Perspective(s) (or context), and finally…
- Picture (visualize) the data

About R

R is a language and environment that is easy to learn, very flexible in nature, and very focused on statistical computing, making it great for manipulating, cleaning, summarizing, producing probability statistics, etc.
(as well as actually creating visualizations with your data), so it's a great choice for the exercises required for profiling, establishing context, and identifying additional perspectives. In addition, here are a few more reasons to use R when profiling your big data:

- R is used by a large number of academic statisticians, so it's a tool that is not "going away"
- R is pretty much platform independent; what you develop will run almost anywhere
- R has awesome help resources; just Google it and you'll see!

R and Big Data

Although R is free (open source), super flexible, and feature rich, you must keep in mind that R keeps everything in your machine's memory, and this can become problematic when you are working with big data (even given today's low resource costs). Thankfully, though, there are various options and strategies to work with this limitation, such as employing a sort of "pseudo-sampling" technique, which we will expound on later in this article (as part of some of the examples provided). Additionally, R libraries have been developed that can leverage hard drive space (as a sort of virtual extension to your machine's memory); these, again, are shown in this article's examples.

Example 1

In this article's first example, we'll use data collected from a theoretical hospital where, upon admission, patient medical history information is collected through an online survey. Information is also added to a "patients file" as treatment is provided. The file includes many fields, including:

- Basic descriptive data for the patient, such as sex, date of birth, height, weight, blood type, etc.
- Vital statistics, such as blood pressure, heart rate, etc.
- Medical history, such as number of hospital visits, surgeries, major illnesses or conditions, whether the patient is currently under a doctor's care, etc.
- Demographic statistics, such as occupation, home state, educational background, etc.
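The memory limitation above is worth making concrete. The article's examples use R, but the underlying "work through the data in chunks rather than loading it all" idea is language-agnostic; here is a minimal Python sketch of it, with an invented three-row sample standing in for a large comma-delimited dump:

```python
import csv
import io

# Minimal sketch of chunked (streaming) processing: compute an average
# without ever holding the whole file in memory. The layout and values
# of this tiny sample are invented for illustration.
sample = io.StringIO(
    "patientid,age,weight\n"
    "p1,34,70\n"
    "p2,51,82\n"
    "p3,29,64\n"
)

total_weight = 0.0
count = 0
for row in csv.DictReader(sample):   # streams one row at a time
    total_weight += float(row["weight"])
    count += 1

print(round(total_weight / count, 1))  # prints 72.0
```

Because the reader yields one row at a time, the same loop works whether the source holds three rows or three hundred million; only the running totals live in memory.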
Some additional information is also collected in the file in an attempt to capture patient characteristics and habits, such as the number of times the patient included beef, pork, and fowl in their weekly diet, or whether they typically use a butter replacement product, and so on. Periodically, the data is "dumped" to comma-delimited text files that contain the following fields (in this order):

Patientid, recorddate_month, recorddate_day, recorddate_year, sex, age, weight, height, no_hospital_visits, heartrate, state, relationship, Insured, Bloodtype, blood_pressure, Education, DOBMonth, DOBDay, DOBYear, current_smoker, current_drinker, currently_on_medications, known_allergies, currently_under_doctors_care, ever_operated_on, occupation, Heart_attack, Rheumatic_Fever, Heart_murmur, Diseases_of_the_arteries, Varicose_veins, Arthritis, abnormal_bloodsugar, Phlebitis, Dizziness_fainting, Epilepsy_seizures, Stroke, Diphtheria, Scarlet_Fever, Infectious_mononucleosis, Nervous_emotional_problems, Anemia, hyroid_problems, Pneumonia, Bronchitis, Asthma, Abnormal_chest_Xray, lung_disease, Injuries_back_arms_legs_joints_Broken_bones, Jaundice_gallbladder_problems, Father_alive, Father_current_age, Fathers_general_health, Fathers_reason_poor_health, Fathersdeceased_age_death, mother_alive, Mother_current_age, Mother_general_health, Mothers_reason_poor_health, Mothers_deceased_age_death, No_of_brothers, No_of_sisters, age_range, siblings_health_problems, Heart_attacks_under_50, Strokes_under_50, High_blood_pressure, Elevated_cholesterol, Diabetes, Asthma_hayfever, Congenital_heart_disease, Heart_operations, Glaucoma, ever_smoked_cigs, cigars_or_pipes, no_cigs_day, no_cigars_day, no_pipefuls_day, if_stopped_smoking_when_was_it, if_still_smoke_how_long_ago_start, target_weight, most_ever_weighed, 1_year_ago_weight, age_21_weight, No_of_meals_eatten_per_day, No_of_times_per_week_eat_beef, No_of_times_per_week_eat_pork, No_of_times_per_week_eat_fish, No_of_times_per_week_eat_fowl,
No_of_times_per_week_eat_desserts, No_of_times_per_week_eat_fried_foods, No_servings_per_week_wholemilk, No_servings_per_week_2%_milk, No_servings_per_week_tea, No_servings_per_week_buttermilk, No_servings_per_week_1%_milk, No_servings_per_week_regular_or_diet_soda, No_servings_per_week_skim_milk, No_servings_per_week_coffee, No_servings_per_week_water, beer_intake, wine_intake, liquor_intake, use_butter, use_extra_sugar, use_extra_salt, different_diet_weekends, activity_level, sexually_active, vision_problems, wear_glasses

Following is an image showing a portion of the file (displayed in MS Windows Notepad):

Assuming we have been given no further information about the data, other than the provided field name list and the knowledge that the data is captured by hospital personnel upon patient admission, the next step would be to perform some profiling of the data: investigating it to start understanding it, and then starting to add context and perspectives (so that, ultimately, we can create some visualizations). Initially, we start by looking through the field or column names in our file, and some ideas start to come to mind. For example:

What is the data time frame we are dealing with? Using the record date field, can we establish a period of time (or time frame) for the data? (In other words, over what period of time was this data captured?)
Can we start "grouping the data" using fields such as sex, age, and state?

Eventually, what we should be asking is: what can we learn from visualizing the data? Perhaps:

What is the breakdown of those currently smoking by age group?
What is the ratio of those currently smoking to the number of hospital visits?
Do those patients currently under a doctor's care have, on average, better BMI ratios?

And so on.
Dig-in with R

Using the power of R programming, we can run various queries on the data, noting that the results of those queries may spawn additional questions and queries and, eventually, yield data ready for visualizing. Let's start with a few simple profile queries. I always start my data profiling by "time boxing" the data. The following R script (although, as mentioned earlier, there are many ways to accomplish the same objective) works well for this:

# --- read our file into a temporary R table
tmpRTable4TimeBox<-read.table(file="C:/Big Data Visualization/Chapter 3/sampleHCSurvey02.txt", sep=",")
# --- convert to an R data frame and filter it to just include
# --- the 2nd column or field of data
data.df <- data.frame(tmpRTable4TimeBox)
data.df <- data.df[,2]
# --- provides a sorted list of the years in the file
YearsInData = substr(substr(data.df[],(regexpr('/',data.df[])+1),11),(regexpr('/',substr(data.df[],(regexpr('/',data.df[])+1),11))+1),11)
# --- write a new file named ListofYears
write.csv(sort(unique(YearsInData)),file="C:/Big Data Visualization/Chapter 3/ListofYears.txt", quote = FALSE, row.names = FALSE)

The above simple R script produces a sorted list file (ListofYears.txt, shown below) containing the years found in the data we are profiling:

Now we can see that our file covers patient survey data collected during the years 1999 through 2016, and with this information we start to add context to (or gain a perspective on) our data. We could further time-box the data by perhaps breaking the years into months (we will do this later in this article), but let's move on now to some basic "group profiling". Assuming that each record in our data represents a unique hospital visit, how can we determine the number of hospital visits (the number of records) by sex, age, and state?
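As a quick preview of the approach developed in the next section, grouped counts in R take a single call to the table function. Below is a minimal sketch on a tiny synthetic stand-in for the patient file (the column names here are assumptions for illustration, not the survey's actual fields):

```r
# Sketch: record counts by sex and by state with table(), using a tiny
# synthetic stand-in for the patient file (column names are assumed).
visits <- data.frame(
  sex   = c("Male", "Female", "Female", "Male", "Female"),
  state = c("Ohio", "Maine", "Ohio", "Texas", "Ohio"))
table(visits$sex)     # Female 3, Male 2
table(visits$state)   # Maine 1, Ohio 3, Texas 1
```

Each call tallies every distinct value in the column in one pass, with no explicit loop.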
Here I will point out that it may be worthwhile establishing the size (the number of rows or records; we already know the number of columns or fields) of the file you are working with. This is important since the size of the data file will dictate the programming or scripting approach you will need to use during your profiling. Simple R functions valuable to know are nrow and head. These simple commands can be used to count the total rows in a file:

nrow(mydata)

Or to view the first number of rows of data:

head(mydata, n = 10)

So, using R, one could write a script to load the data into a table, convert it to a data frame, and then read through all the records in the file and "count up" or "tally" the number of hospital visits (the number of records) for males and females. Such logic is a snap to write:

# --- assuming tmpRTable holds the data already
datas.df<-data.frame(tmpRTable)
# --- initialize 2 counter variables
NumberMaleVisits <-0;NumberFemaleVisits <-0
# --- read through the data
for(i in 1:nrow(datas.df))
{
if (datas.df[i,3] == 'Male') {NumberMaleVisits <- NumberMaleVisits + 1}
if (datas.df[i,3] == 'Female') {NumberFemaleVisits <- NumberFemaleVisits + 1}
}
# --- show me the totals
NumberMaleVisits
NumberFemaleVisits

The previous script works but, in a big data scenario, there is a more efficient way, since reading or "looping through" and counting each record will take far too long. Thankfully, R provides the table function, which can be used similarly to the SQL GROUP BY clause. The following script assumes that our data is already in an R data frame (named datas.df); using the sequence number of the field in the file, if we want to see the number of hospital visits for males and for females, we can write:

# --- using R table function as "group by" field number
# --- patient sex is the 3rd field in the file
table(datas.df[,3])

Following is the output generated from running the above script.
Notice that R shows "sex" with a count of 1, since the script included the file's header record as a unique value:

We can also establish the number of hospital visits by state (state is the 9th field in the file):

table(datas.df[,9])

Age (the fourth field in the file) can also be studied using the R functions sort and table:

sort(table(datas.df[,4]))

Note that since there are quite a few more values for age within the file, I've sorted the output using the R sort function. Moving on now, let's see if there is a difference between the number of hospital visits for patients who are current smokers (the field is named current_smoker and is field number 16 in the file) and those indicating that they are not current smokers. We can use the same R scripting logic:

sort(table(datas.df[,16]))

Surprisingly (one might think), it appears from our profiling that those patients who currently do not smoke have had more hospital visits (113,681) than those who currently smoke (12,561):

Another interesting R script to continue profiling our data might be:

table(datas.df[,3],datas.df[,16])

The above script again uses the R table function to group data, but shows how we can "group within a group"; in other words, using this script we can get totals for current and non-current smokers, grouped by sex. In the below image we see that the difference between female smokers and male smokers might be considered marginal:

So we see that by using the above simple R script examples, we've been able to add some context to our healthcare survey data. By reviewing the list of fields provided in the file, we can come up with the R profiling queries shown (and many others) without much effort. We will continue with some more complex profiling in the next section but, for now, let's use R to create a few data visualizations based upon what we've learned so far through our profiling.
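One last profiling trick before we turn to visualizations: a two-way count table like the one above can be converted into within-group percentages with prop.table. Below is a minimal sketch on synthetic data (not the actual survey file):

```r
# Sketch: convert a two-way count table into row-wise percentages
# with prop.table; the patient data here is synthetic.
patients <- data.frame(
  sex    = c("Male", "Female", "Female", "Male", "Female", "Male"),
  smoker = c("Yes",  "No",     "No",     "No",   "Yes",    "No"))
counts <- table(patients$sex, patients$smoker)
round(100 * prop.table(counts, margin = 1), 1)   # percentages within each sex
```

With margin = 1, each row sums to 100, which makes the smoker split directly comparable between the sexes even when the group sizes differ.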
Going back to the number of hospital visits by sex, we can use the R function barplot to create a visualization of visits by sex. But first, a couple of helpful hints for creating the script. First, rather than using the table function, you can use the ftable function, which creates a "flat" version of the original function's output. This makes it easier to exclude the header record count of 1 that comes back from the table function. Next, we can leverage some additional arguments of the barplot function, such as col, border, and names.arg, plus the title function, to make the visualization a little nicer to look at. Below is the script:

# -- use ftable function to drop out the header record
forChart<- ftable(datas.df[,3])
# --- create bar names
barnames<-c("Female","Male")
# -- use barplot to draw bar visual
barplot(forChart[2:3], col = "brown1", border = TRUE, names.arg = barnames)
# --- add a title
title(main = list("Hospital Visits by Sex", font = 4))

The script's output (our visualization) is shown below:

We could follow the same logic for creating a similar visualization of hospital visits by state:

st<-ftable(datas.df[,9])
barplot(st)
title(main = list("Hospital Visits by State", font = 2))

But the visualization generated isn't very clear:

One can always experiment a bit more with this data to make the visualization a little more interesting. Using the R functions substr and regexpr, we can create an R data frame that contains a record for each hospital visit by state within each year in the file. Then we can use the function plot (rather than barplot) to generate the visualization.
Below is the R script:

# --- create a data frame from our original table file
datas.df <- data.frame(tmpRTable)
# --- create a filtered data frame of records from the file
# --- using the record year and state fields from the file
dats.df<-data.frame(substr(substr(datas.df[,2],(regexpr('/',datas.df[,2])+1),11),(regexpr('/',substr(datas.df[,2],(regexpr('/',datas.df[,2])+1),11))+1),11),datas.df[,9])
# --- plot to show a visualization
plot(sort(table(dats.df[2]),decreasing = TRUE),type="o", col="blue")
title(main = list("Hospital Visits by State (Highest to Lowest)", font = 2))

Here is a different (perhaps more interesting) version of the visualization, generated by the previous script:

Another earlier perspective on the data concerned age. We grouped the hospital visits by the age of the patients (using the R table function). Since there are many different patient ages, a common practice is to establish age ranges, such as the following:

21 and under
22 to 34
35 to 44
45 to 54
55 to 64
65 and over

To implement the previous age ranges, we need to organize the data, and could use the following R script:

# --- initialize age range counters
a1 <-0;a2 <-0;a3 <-0;a4 <-0;a5 <-0;a6 <-0
# --- read and count visits by age range
for(i in 2:nrow(datas.df))
{
if (as.numeric(datas.df[i,4]) < 22) {a1 <- a1 + 1}
if (as.numeric(datas.df[i,4]) > 21 & as.numeric(datas.df[i,4]) < 35) {a2 <- a2 + 1}
if (as.numeric(datas.df[i,4]) > 34 & as.numeric(datas.df[i,4]) < 45) {a3 <- a3 + 1}
if (as.numeric(datas.df[i,4]) > 44 & as.numeric(datas.df[i,4]) < 55) {a4 <- a4 + 1}
if (as.numeric(datas.df[i,4]) > 54 & as.numeric(datas.df[i,4]) < 65) {a5 <- a5 + 1}
if (as.numeric(datas.df[i,4]) > 64) {a6 <- a6 + 1}
}

Big Data Note: Looping or reading through each of the records in our file isn't very practical if there are a trillion records. Later in this article we'll use a much better approach but, for now, we will assume a smaller file size for convenience.
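For this particular task, the loop can in fact already be avoided: cut assigns each age to a range and table tallies the bins in one vectorized pass. Below is a minimal sketch on synthetic ages (the breaks mirror the ranges listed above):

```r
# Sketch: bin ages into the six ranges with cut() and tally them with
# table(), replacing the row-by-row counting loop; ages are synthetic.
ages <- c(18, 25, 40, 47, 60, 70, 33, 51, 21, 65)
bins <- cut(ages,
            breaks = c(-Inf, 21, 34, 44, 54, 64, Inf),
            labels = c("21 and under", "22 to 34", "35 to 44",
                       "45 to 54", "55 to 64", "65 and over"))
table(bins)
```

Because cut's intervals are right-closed by default, a break at 21 puts age 21 in the first bin, matching the loop's "< 22" test. The a1 through a6 counters produced by the loop above are still what the pie chart in the next step uses.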
Once the above script is run, we can use the R pie function and the following code to create our pie chart visualization:

# --- create Pie Chart
slices <- c(a1, a2, a3, a4, a5, a6)
lbls <- c("under 21", "22-34","35-44","45-54","55-64", "65 & over")
pie(slices, labels = lbls, main="Hospital Visits by Age Range")

Following is the generated visualization:

Finally, earlier in this section we looked at the values in field 16 of our file, which indicates whether the survey patient is a current smoker. We could build a simple visual showing the totals but (again) the visualization wouldn't be very interesting or all that informative. With some simple R scripts, we can instead create a visualization showing the number of hospital visits, year over year, by those patients that are current smokers. First, we can reformat the data in our R data frame (named datas.df) to store only the year (of the record date), using the R function substr. This makes it a little easier to aggregate the data by year, as shown in the next steps. The R script using the substr function is shown below:

# --- redefine the record date field to hold just the record
# --- year value
datas.df[,2]<-substr(substr(datas.df[,2],(regexpr('/',datas.df[,2])+1),11),(regexpr('/',substr(datas.df[,2],(regexpr('/',datas.df[,2])+1),11))+1),11)

Next, we can create an R table named c to hold the record date year and totals (of non-current and current smokers) for each year. Following is the R script used:

# --- create a table holding record year and total count for
# --- smokers and not smoking
c<-table(datas.df[,2],datas.df[,16])

Finally, we can use the R barplot function to create our visualization.
Again, there is more than likely a cleverer way to set up the objects bars and lbls but, for now, I simply hand-coded the years' data I wanted to see in my visualization:

# --- set up the values to chart and the labels for each bar
# --- in the chart
bars<-c(c[2,3], c[3,3], c[4,3],c[5,3],c[6,3],c[7,3],c[8,3],c[9,3],c[10,3],c[11,3],c[12,3],c[13,3])
lbls<-c("99","00","01","02","03","04","05","06","07","08","09","10")

Now, the R script to actually produce the bar chart visualization is shown below:

# --- create the bar chart
barplot(bars, names.arg=lbls, col="red")
title(main = list("Smoking Patients Year to Year", font = 2))

Below is the generated visualization:

Example 2

In the above examples, we've presented some pretty basic and straightforward data profiling exercises. Typically, once you've become somewhat familiar with your data, having added some context through basic profiling, you would extend the profiling process, trying to look at the data in additional ways using techniques such as those mentioned at the beginning of this article: defining new data points based upon the existing data, performing comparisons, looking at contrasts (between data points), identifying tendencies, and using dispersions to establish the variability of the data. Let's now review some of these options for extended profiling, using simple examples as well as the same source data as was used in the previous section's examples.

Definitions & Explanations

One method of extending your data profiling is to "add to" the existing data by creating additional definition or explanatory attributes (in other words, adding new fields to the file). This means that you use existing data points found in the data to create (hopefully new and interesting) perspectives on the data. With the data used in this article, a thought-provoking example might be to use the existing patient information (such as the patient's weight and height) to calculate a new point of data: the body mass index (BMI).
A generally accepted formula for calculating a patient's body mass index is:

BMI = (weight (lbs.) / (height (in))^2) x 703

For example: (165 lbs.) / (70 in)^2 x 703 = 23.67 BMI.

Using the above formula, we can use the following R script (assuming we've already loaded the R object named tmpRTable with our file data) to generate a new file of BMI values and state names:

j=1
for(i in 2:nrow(tmpRTable))
{
W<-as.numeric(as.character(tmpRTable[i,5]))
H<-as.numeric(as.character(tmpRTable[i,6]))
P<-(W/(H^2)*703)
datas2.df[j,1]<-format(P,digits=3)
datas2.df[j,2]<-tmpRTable[i,9]
j=j+1
}
write.csv(datas2.df[1:j-1,1:2],file="C:/Big Data Visualization/Chapter 3/BMI.txt", quote = FALSE, row.names = FALSE)

Below is a portion of the generated file:

Now we have a new file of BMI values by state (one BMI record for each hospital visit in each state). Earlier in this article we touched on the concept of looping or reading through all of the records in a file or data source and creating counts based on various field or column values. Such logic works fine for medium or smaller files, but a much better approach (especially with big data files) is to use the power of various R commands.

No Looping

Although the above R script does work, it requires looping through each record in our file, which is slow and inefficient, to say the least. So let's consider a better approach. Again assuming we've already loaded the R object named tmpRTable with our data, the below R script can accomplish the same result (create the same file) in just two lines:

PDQ<-paste(format((as.numeric(as.character(tmpRTable[,5]))/(as.numeric(as.character(tmpRTable[,6]))^2)*703),digits=2),',',tmpRTable[,9],sep="")
write.csv(PDQ,file="C:/Big Data Visualization/Chapter 3/BMI.txt", quote = FALSE,row.names = FALSE)

We could now use this file (or one similar) as input to additional profiling exercises or to create a visualization but, for now, let's move on.
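In the same vectorized spirit, the BMI formula from the start of this section can be wrapped in a small helper and checked against the worked example. This is a minimal sketch (the helper name and sample values are mine, not from the original text):

```r
# Sketch: vectorized BMI helper implementing
# BMI = (weight in lbs / (height in inches)^2) * 703.
bmi <- function(weight_lbs, height_in) (weight_lbs / height_in^2) * 703

round(bmi(165, 70), 2)                 # the worked example: 23.67
round(bmi(c(140, 200), c(64, 72)), 1)  # whole columns at once
```

Because the arithmetic operators are vectorized, passing the weight and height columns directly computes every patient's BMI in one call, with no loop.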
Comparisons

Performing comparisons during data profiling can also add new and different perspectives to the data. Beyond simple record counts (like the total number of smoking patients visiting a hospital versus the total number of non-smoking patients), one might compare the total number of hospital visits for each state to the average number of hospital visits per state. This requires calculating the total number of hospital visits by state as well as the total number of hospital visits overall (and then computing the average). The following two lines of code use the R functions table and write.csv to create a list (a file) of the total number of hospital visits found for each state:

# --- calculates the number of hospital visits for each
# --- state (state ID is in field 9 of the file)
StateVisitCount<-table(datas.df[9])
# --- write out a csv file of counts by state
write.csv(StateVisitCount, file="C:/Big Data Visualization/Chapter 3/visitsByStateName.txt", quote = FALSE, row.names = FALSE)

Below is a portion of the file that is generated:

The following R command can then be used to calculate the average number of hospital visits per state, using the nrow function to obtain a count of records in the data source and dividing it by the number of states:

# --- calculate the average
averageVisits<-nrow(datas.df)/50

Going a bit further with this line of thinking, you might consider that the nine states the U.S. Census Bureau designates as the Northeast region are Connecticut, Maine, Massachusetts, New Hampshire, New York, New Jersey, Pennsylvania, Rhode Island, and Vermont. What is the total number of hospital visits recorded in our file for the Northeast region?
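One convenient way to answer this kind of region question is the %in% operator, which matches a column against a whole vector of values at once. Below is a minimal sketch on a synthetic state column (not the actual survey file):

```r
# Sketch: select northeast-region rows with %in% instead of chaining
# many "|" comparisons; the visits data frame is synthetic.
northeast <- c("Connecticut", "Maine", "Massachusetts", "New Hampshire",
               "New York", "New Jersey", "Pennsylvania", "Rhode Island",
               "Vermont")
visits <- data.frame(state = c("Maine", "Texas", "Vermont", "Ohio", "Maine"))
ner_visits <- subset(visits, state %in% northeast)
nrow(ner_visits)        # 3
```

The same %in% test could replace the long chain of "|" comparisons in the subset call shown next.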
R makes it simple with the subset function:

# --- use subset function and the "OR" operator to only keep
# --- northeast region states in our list
NERVisits<-subset(tmpRTable, as.character(V9)=="Connecticut" | as.character(V9)=="Maine" | as.character(V9)=="Massachusetts" | as.character(V9)=="New Hampshire" | as.character(V9)=="New York" | as.character(V9)=="New Jersey" | as.character(V9)=="Pennsylvania" | as.character(V9)=="Rhode Island" | as.character(V9)=="Vermont")

Extending our scripting, we can add some additional queries to calculate the average number of hospital visits for the Northeast region and for the whole country:

AvgNERVisits<-nrow(NERVisits)/9
averageVisits<-nrow(tmpRTable)/50

And let's add a visualization:

# --- the c object is the data for the barplot function to
# --- graph
c<-c(AvgNERVisits, averageVisits)
# --- use R barplot
barplot(c, ylim=c(0,3000), ylab="Average Visits", border="Black", names.arg = c("Northeast","all"))
title("Northeast Region vs Country")

The generated visualization is shown below:

Contrasts

The examination of contrasting data is another form of extended data profiling. For example, using this article's data, one could contrast the average body weight of patients that are under a doctor's care against the average body weight of patients that are not under a doctor's care (after calculating the average body weight for each group).
To accomplish this, we can calculate the average weight of the patients that fall into each category (those currently under a doctor's care and those not), as well as for all patients, using the following R script:

# --- read in our entire file
tmpRTable<-read.table(file="C:/Big Data Visualization/Chapter 3/sampleHCSurvey02.txt",sep=",")
# --- use the subset function to create the 2 groups we are
# --- interested in
UCare.sub<-subset(tmpRTable, V20=="Yes")
NUCare.sub<-subset(tmpRTable, V20=="No")
# --- use the mean function to get the average body weight of all patients
# --- in the file, as well as for each of our separate groups
average_undercare<-mean(as.numeric(as.character(UCare.sub[,5])))
average_notundercare<-mean(as.numeric(as.character(NUCare.sub[,5])))
averageoverall<-mean(as.numeric(as.character(tmpRTable[2:nrow(tmpRTable),5])))
average_undercare;average_notundercare;averageoverall

In short order, we can use R's ability to create subsets of the data (using the subset function) based upon the values in a certain field (or column), and then use the mean function to calculate the average patient weight for each group. The results from running the script (the calculated average weights) are shown below:

And we can use the calculated results to create a simple visualization:

# --- use R barplot to create the bar graph of
# --- average patient weight
barplot(c, ylim=c(0,200), ylab="Patient Weight", border="Black", names.arg = c("under care","not under care", "all"), legend.text= c(format(c[1],digits=5),format(c[2],digits=5),format(c[3],digits=5)))
title("Average Patient Weight")

Tendencies

Identifying tendencies present within your data is also an interesting way of extending data profiling. For example, using this article's sample data, you might determine the number of servings of water consumed per week by each patient age group.
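A per-group average like this can also be computed in a single call with tapply, without building a separate subset for each group. Below is a minimal sketch on synthetic ages and water servings (stand-ins for the survey columns):

```r
# Sketch: average weekly water servings per age range in one tapply
# call; the ages and servings here are synthetic stand-ins.
ages  <- c(18, 25, 40, 47, 60, 70, 33, 51)
water <- c(10, 14,  7, 12,  9, 11, 13,  8)
group <- cut(ages, breaks = c(-Inf, 21, 34, 44, 54, 64, Inf),
             labels = c("21 and under", "22 to 34", "35 to 44",
                        "45 to 54", "55 to 64", "65 and over"))
tapply(water, group, mean)
```

tapply applies the summary function (mean here, but sum works just as well) to the values of its first argument within each level of the grouping factor, returning all six group averages at once.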
Earlier in this section we created a simple R script to count visits by age group; it worked, but would not scale well in a big data scenario. A better approach is to categorize the data into the age groups (age is the fourth field or column in the file) using the following script:

# --- build subsets of each age group
agegroup1<-subset(tmpRTable, as.numeric(V4)<22)
agegroup2<-subset(tmpRTable, as.numeric(V4)>21 & as.numeric(V4)<35)
agegroup3<-subset(tmpRTable, as.numeric(V4)>34 & as.numeric(V4)<45)
agegroup4<-subset(tmpRTable, as.numeric(V4)>44 & as.numeric(V4)<55)
agegroup5<-subset(tmpRTable, as.numeric(V4)>54 & as.numeric(V4)<66)
agegroup6<-subset(tmpRTable, as.numeric(V4)>64)

After we have our grouped data, we can calculate water consumption. For example, to count the total weekly servings of water (which is in field or column 96) for age group 1, we can use:

# --- field 96 in the file is the number of servings of water
# --- below line counts the total number of water servings for
# --- age group 1
sum(as.numeric(agegroup1[,96]))

Or the average number of servings of water for the same age group:

mean(as.numeric(agegroup1[,96]))

Take note that R requires the explicit conversion of the value of field 96 (even though it comes in the file as a number) to a number, using the R function as.numeric. Now let's create the visualization of this perspective of our data.
Below is the R script used to generate the visualization:

# --- group the data into age groups
agegroup1<-subset(tmpRTable, as.numeric(V4)<22)
agegroup2<-subset(tmpRTable, as.numeric(V4)>21 & as.numeric(V4)<35)
agegroup3<-subset(tmpRTable, as.numeric(V4)>34 & as.numeric(V4)<45)
agegroup4<-subset(tmpRTable, as.numeric(V4)>44 & as.numeric(V4)<55)
agegroup5<-subset(tmpRTable, as.numeric(V4)>54 & as.numeric(V4)<66)
agegroup6<-subset(tmpRTable, as.numeric(V4)>64)
# --- calculate the averages by group
g1<-mean(as.numeric(agegroup1[,96]))
g2<-mean(as.numeric(agegroup2[,96]))
g3<-mean(as.numeric(agegroup3[,96]))
g4<-mean(as.numeric(agegroup4[,96]))
g5<-mean(as.numeric(agegroup5[,96]))
g6<-mean(as.numeric(agegroup6[,96]))
# --- create the visualization
barplot(c(g1,g2,g3,g4,g5,g6), axisnames=TRUE, names.arg = c("<21", "22-34", "35-44", "45-54", "55-64", ">65"))
title("Glasses of Water by Age Group")

The generated visualization is shown below:

Dispersion

Finally, dispersion is still another method of extended data profiling. Dispersion measures how various selected elements behave with regard to some sort of central tendency, usually the mean. For example, we might look at the total number of hospital visits for each age group, per calendar month, in relation to the average number of hospital visits per month. For this example, we can use the R function subset to define our age groups and then group the hospital records by those age groups, as we did in our last example.
Below is the script, showing the calculation for each group:

agegroup1<-subset(tmpRTable, as.numeric(V4)<22)
agegroup2<-subset(tmpRTable, as.numeric(V4)>21 & as.numeric(V4)<35)
agegroup3<-subset(tmpRTable, as.numeric(V4)>34 & as.numeric(V4)<45)
agegroup4<-subset(tmpRTable, as.numeric(V4)>44 & as.numeric(V4)<55)
agegroup5<-subset(tmpRTable, as.numeric(V4)>54 & as.numeric(V4)<66)
agegroup6<-subset(tmpRTable, as.numeric(V4)>64)

Remember, the previous scripts create subsets of the entire file (which we loaded into the object tmpRTable), and each subset contains all of the fields of the original file. The agegroup1 group is partially displayed as follows:

Once we have our data categorized by age group (agegroup1 through agegroup6), we can then go on and calculate a count of hospital stays by month for each group (shown in the following R commands). Note that the substr function is used to look at the month code (the first three characters of the record date) in the file, since we (for now) don't care about the year. The table function can then be used to create an array of counts by month:

az1<-table(substr(agegroup1[,2],1,3))
az2<-table(substr(agegroup2[,2],1,3))
az3<-table(substr(agegroup3[,2],1,3))
az4<-table(substr(agegroup4[,2],1,3))
az5<-table(substr(agegroup5[,2],1,3))
az6<-table(substr(agegroup6[,2],1,3))

Using the above month totals, we can then calculate the average number of hospital visits for each month using the R function mean.
This will be the mean of the monthly totals across ALL age groups (note that mean expects a single vector, so the six per-group counts are combined with c):

JanAvg<-mean(c(az1["Jan"], az2["Jan"], az3["Jan"], az4["Jan"], az5["Jan"], az6["Jan"]))

The above code example can be repeated to calculate an average for each month. Next, we can calculate the totals for each month, for each age group:

Janag1<-az1["Jan"];Febag1<-az1["Feb"];Marag1<-az1["Mar"];Aprag1<-az1["Apr"];Mayag1<-az1["May"];Junag1<-az1["Jun"]
Julag1<-az1["Jul"];Augag1<-az1["Aug"];Sepag1<-az1["Sep"];Octag1<-az1["Oct"];Novag1<-az1["Nov"];Decag1<-az1["Dec"]

The following code "stacks" the totals so we can more easily visualize them later (we would have one line for each age group, that is, Group1Visits, Group2Visits, and so on):

Monthly_Visits<-c(JanAvg, FebAvg, MarAvg, AprAvg, MayAvg, JunAvg, JulAvg, AugAvg, SepAvg, OctAvg, NovAvg, DecAvg)
Group1Visits<-c(Janag1,Febag1,Marag1,Aprag1,Mayag1,Junag1,Julag1,Augag1,Sepag1,Octag1,Novag1,Decag1)
Group2Visits<-c(Janag2,Febag2,Marag2,Aprag2,Mayag2,Junag2,Julag2,Augag2,Sepag2,Octag2,Novag2,Decag2)

Finally, we can now create the visualization:

plot(Monthly_Visits, ylim=c(1000,4000))
lines(Group1Visits, type="b", col="red")
lines(Group2Visits, type="b", col="purple")
lines(Group3Visits, type="b", col="green")
lines(Group4Visits, type="b", col="yellow")
lines(Group5Visits, type="b", col="pink")
lines(Group6Visits, type="b", col="blue")
title("Hospital Visits", sub = "Month to Month", cex.main = 2, font.main= 4, col.main= "blue", cex.sub = 0.75, font.sub = 3, col.sub = "red")

and enjoy the generated output:

Summary

In this article we went over the idea and importance of establishing context and identifying perspectives on big data, using data profiling with R. Additionally, we introduced and explored the R programming language as an effective means to profile big data, and used R in numerous illustrative examples.
Once again, R is an extremely flexible and powerful tool that works well for data profiling, and the reader would be well served by researching and experimenting with the language's vast libraries, as we have only scratched the surface of the features currently available.
Packt
07 Feb 2017
20 min read

Getting Started with Cisco UCS and Virtual Networking

Introduction

In this article by Anuj Modi, the author of the book Implementing Cisco UCS Solutions - Second Edition, we are given some insight into Cisco UCS products and their innovative architecture, which abstracts the underlying physical hardware and provides unified management for all devices. We will walk through the installation of some of the UCS hardware components and the deployment of VSM.

(For more resources related to this topic, see here.)

UCS storage servers

To provide a solution to meet storage requirements, Cisco introduced the new C3000 family of storage-optimized UCS rack servers in 2014. A standalone server with a capacity of 360 TB can be used for any kind of data-intensive application, such as big data, Hadoop, object storage, OpenStack, and other enterprise applications requiring higher throughput and more efficient transfer rates. This server family can be integrated with any existing B-Series, C-Series, and M-Series servers to provide them with all their storage and backup requirements. It is an ideal solution for customers who don't want to invest in high-end storage arrays but still want to meet their business requirements. Cisco C3000 storage servers are an optimal solution for medium and large scale-out storage requirements. Cisco UCS C3260 is the latest edition, with better density, throughput, and dual-server support, whereas the earlier UCS C3160 provides the same density with a single server. These servers can be managed through CIMC, like C-Series rack servers.

UCS C3260

Cisco UCS C3260 is a 4U rack server with Intel® Xeon® E5-2600 v2 processor family CPUs and up to 320 TB of local storage, with dual server nodes and 4x40 Gig I/O throughput using the Cisco VIC 1300. The storage can be shared across the two compute nodes to provide an HA solution inside the box.
UCS C3160

Cisco UCS C3160 is a 4U rack server with Intel® Xeon® E5-2600 v2 processor family CPUs and up to 320 TB local storage with a single server node and 4x10 Gig I/O throughput.

UCS M-Series modular servers

Earlier in 2014, Cisco unleashed a new line of M-Series modular servers to meet the high-density, low-power demands of massively parallelized and cloud-scale applications. These modular servers separate the infrastructure components, such as network, storage, power, and cooling, from the compute nodes and deliver compute and memory to scalable applications. These compute nodes disaggregate resources such as processor and memory from the subsystem to provide a dense and power-efficient platform designed to increase compute capacity, with unified management capabilities through UCS Manager called UCS System Link Technology. A modular chassis and compute cartridges are the building blocks of these M-Series servers. The M4308 chassis can hold up to eight compute cartridges. Each cartridge has a single or dual CPU with two memory channels and provides two nodes in a single cartridge to support up to 16 compute nodes in a single chassis. The cartridges are hot-pluggable and can be added to or removed from the system. The Cisco VIC provides connectivity to UCS Fabric Interconnects for network and management. The UCS M-Series modular server includes the following components:

Modular chassis M4308
Modular compute cartridges: M2814, M1414, and M142

M4308

The M4308 modular chassis is a 2U chassis that can accommodate eight compute cartridges with two servers per cartridge, which makes for 16 servers in a single chassis. The chassis can be connected to a pair of Cisco Fabric Interconnects, providing network, storage, and management capabilities.

M2814

The M2814 cartridge has dual-socket Intel® Xeon® E5-2600 v3 and v4 processor family CPUs with 512 GB memory. This cartridge can be used for web-scale and small, in-memory databases.
M1414

The M1414 cartridge has a single-socket Intel® Xeon® E3-1200 v3 and v4 processor family CPU with 64 GB memory. This cartridge can be used for electronic design automation and simulation.

M142

The M142 cartridge has a single-socket Intel® Xeon® E3-1200 v3 and v4 processor family CPU with 128 GB memory. This cartridge can be used for content delivery, dedicated hosting, financial modeling, and business analytics.

Cisco UCS Mini

Cisco brings the power of UCS in a smaller form factor for small businesses, remote offices, and branch offices with the UCS Mini solution, a combination of blade and rack servers with unified management provided by Cisco UCS Manager. UCS Mini is a compact version with the specialized 6324 Fabric Interconnect model embedded into the Cisco UCS blade chassis for a smaller server footprint. It provides server, storage, network, and management capabilities similar to classic UCS. The 6324 Fabric Interconnect combines the Fabric Extender and Fabric Interconnect functions into one plugin module to provide direct connections to upstream switches. A pair of 6324 Fabric Interconnects is directly inserted into the chassis, called the primary chassis, and provides internal connectivity to UCS blades. A Fabric Extender is not required for the primary chassis. The primary chassis can be connected to another chassis, called the secondary chassis, through a pair of 2208XP or 2204XP Fabric Extenders. Cisco UCS Mini supports a wide variety of UCS B-Series servers, such as the B200 M3, B200 M4, B420 M3, and B420 M4. These blades can be mixed within the chassis. UCS Mini also supports connectivity to UCS C-Series servers C220 M3, C220 M4, C240 M3, and C240 M4. A maximum of seven C-Series servers can be connected to a single chassis. It can support a maximum of 20 servers with a combination of two chassis, with eight half-width blades per chassis and four C-Series servers.
Here is an overview of Cisco UCS Mini:

6324

Cisco 6324 is embedded into the Cisco UCS 5108 blade server chassis. The 6324 Fabric Interconnect integrates the functions of a Fabric Interconnect and a Fabric Extender and provides LAN, SAN, and management connectivity to blade servers and rack servers with embedded UCS Manager. The 1 Gig/10 Gig SFP+ ports are used for external network and SAN connectivity, while the 40 Gig QSFP port can only be used for connecting another chassis, a rack server, or storage. One or two Cisco UCS 6324 FIs can be installed in UCS Mini. The following are the features of the 6324:

A maximum of 20 blade/rack servers per UCS domain
Four 1 Gig/10 Gig SFP+ unified ports
One 40 Gig QSFP port
Fabric throughput of 500 Gbps

Installing UCS hardware components

Now that we have a better understanding of the various components of the Cisco UCS platform, we can dive into the physical installation of UCS Fabric Interconnects and chassis, including blade servers, IOM modules, fan units, power supplies, SFP+ modules, and physical cabling. Before the physical installation of the UCS solution, it is also imperative to consider other data center design factors, including:

Building floor load-bearing capacity
Rack requirements for UCS chassis and Fabric Interconnects
Rack airflow, heating, and ventilation (HVAC)

Installing racks for UCS components

Any standard 19-inch rack with a minimum depth of 29 inches and a maximum of 35 inches from front to rear rails, with 42U height, can be an ideal rack solution for Cisco UCS equipment. However, rack size can vary based on different requirements and the data center's landscape. The rack should provide sufficient space for power, cooling, and cabling for all the devices. It should be designed according to data center best practices to optimize its resources. The front and rear doors should provide sufficient space to rack and stack the equipment and adequate space for all types of cabling.
Suitable cable managers can be used to manage the cables coming from all the devices. It is always recommended that heavier devices be placed at the bottom and lighter devices at the top. For example, a Cisco UCS B-Series chassis can be placed at the bottom and the Fabric Interconnects at the top of the rack. The number of UCS or Nexus devices in a rack can vary based on the total power available for it. The Cisco R-Series R42610 rack is certified for Cisco UCS devices and Nexus switches and provides better reliability, space, and structural integrity for data centers.

Installing UCS chassis components

Care must be taken during the installation of all components, as failure to follow installation procedures may result in component malfunction and bodily injury.

UCS chassis Don'ts:

Do not try to lift even an empty chassis alone. At least two people are required to handle a UCS chassis.
Do not handle internal components such as CPUs, RAM, and mezzanine cards without an ESD field kit.

Physical installation is divided into three sections:

Blade server component (CPU, memory, hard drives, and mezzanine card) installation
Chassis component (blade servers, IOMs, fan units, and power supplies) installation
Fabric Interconnect installation and physical cabling

Installing blade servers

Cisco UCS blade servers are designed with industry-standard components with some enhancements. Anyone with prior server installation experience should be comfortable installing internal components using the guidelines provided in the blade server's manual and following standard safety procedures. Transient ESD charges may reach thousands of volts, which can degrade or permanently damage electronic components. The Cisco ESD training course can be found at http://www.cisco.com/web/learning/le31/esd/WelcomeP.html. All Cisco UCS blade servers have a similar cover design, with a button at the front top of the blade, which should be pushed down.
Then, there are slight variations among models in the way the cover is slid off, which can be toward the rear and up or toward yourself and up.

Installing and removing CPUs

The following is the procedure to mount a CPU into a UCS B-Series blade server:

Make sure you are wearing an ESD wrist wrap grounded to the blade server cover.
To release the CPU clasp, slide it down and to the side.
Move the lever up and remove the CPU's blank cover. Keep the blank in a safe place in case you need to remove the CPU later.
Lift the CPU by its plastic edges and align it with the socket. The CPU can fit only one way.
Lower the mounting bracket with the side lever, and secure the CPU into the socket.
Align the heat sink with its fins in a position allowing unobstructed airflow from front to back.
Gently tighten the heat sink screws to the motherboard.

CPU removal is the reverse of the installation process. It is critical to place the blank socket cover back over the CPU socket. Damage could occur to the socket without the blank cover.

Installing and removing RAM

The following is the procedure to install RAM modules into a UCS B-Series blade server:

Make sure you are wearing an ESD wrist wrap grounded to the blade server cover.
Undo the clips on the side of the memory slot.
Hold the memory module by both edges in an upright position and firmly push it straight down, matching the notch of the module to the socket.
Close the side clips to hold the memory module.

Memory removal is the reverse of the preceding process. Memory modules must be inserted in pairs and split equally between each CPU if all the memory slots are not populated. Refer to the server manual for identifying memory slot pairs, the slot-CPU relation, and optimized memory performance.

Installing and removing internal hard disks

UCS supports SFF, serial attached SCSI (SAS), and SATA solid-state disk (SSD) hard drives.
B200 M4 blade servers also support non-volatile memory express (NVMe) SFF 2.5-inch hard drives via PCI Express. The B200 M4 also provides the option of an SD card to deploy small-footprint hypervisors such as ESXi. To insert a hard disk into B200, B260, B420, and B460 blade servers, follow these steps:

Make sure you are wearing an ESD wrist wrap grounded to the blade server cover.
Remove the blank cover.
Press the button on the catch lever on the ejector arm.
Slide the hard disk completely into the slot.
Push the ejector lever until it clicks to lock the hard disk.

To remove a hard disk, press the button, release the catch lever, and slide the hard disk out. Do not leave a hard disk slot empty. If you do not intend to replace the hard disk, cover the slot with a blank plate to ensure proper airflow.

Installing mezzanine cards

The UCS B200 supports a single mezzanine card, the B260/B420 support two cards, and the B460 supports four cards. The procedure for installing these cards is the same for all servers:

Make sure you are wearing an ESD wrist wrap grounded to the blade server cover.
Open the server's top cover.
Grab the card by the edges and align the male molex connector on the card with the female connector on the motherboard.
Press the card gently into the slot.
Once the card is properly seated, secure it by tightening the screw on the top.

Removing a mezzanine card is the reverse of the preceding process.

Installing blade servers into a chassis

The installation and removal of half-width and full-width blade servers is almost identical, the only difference being that there is one ejector arm on a half-width blade server, whereas a full-width blade server has two ejector arms. Only the B460 M4 blade server requires two full-width slots in the chassis, with an additional Scalability Connector to connect two B260 M4 blade servers and allow them to function as a single server. The bottom blade in this pair serves as the master and the top blade as the slave.
The process is as follows:

Make sure you are wearing an ESD wrist wrap grounded to the chassis.
Open the ejector arm for a half-width blade or both ejector arms for a full-width blade.
Push the blade into the slot.
Once it's firmly in, close the ejector arm(s) on the face of the server and hand-tighten the screw.

The removal of a blade server is the reverse of the preceding process. In order to install a full-width blade, it is necessary to remove the central divider. This can be done with a Phillips-head screwdriver by pushing two clips, one downward and the other upward, and sliding the divider out of the chassis.

Installing rack servers

Cisco UCS rack servers are designed as standalone servers; however, they can be managed or unmanaged through UCS Manager. Unmanaged servers can be connected to any standard upstream switch or Nexus switch to provide network access. However, managed servers need to connect to Fabric Interconnects, directly or indirectly through Fabric Extenders, to provide LAN, SAN, and management network functionality. Based on the application requirements, the model of rack server can be selected from the various options discussed in the previous chapter. All the rack servers are designed in a similar way; however, they vary in capacity, size, weight, and dimensions. These rack servers can fit into any standard 19-inch rack with adequate air space for servicing the server. Servers should be installed on the slide rails provided with them. At least two people or a mechanical lift should be used to place the servers in the rack. Always place the heavier rack servers at the bottom of the rack; lighter servers can be placed at the top. Cisco UCS rack servers provide front-to-rear cooling, and the data center should maintain adequate air conditioning to dissipate the heat from these servers. A Cisco rack server's power requirement depends on the model and can be checked in the server data sheet.
Installing Fabric Interconnects

The Cisco UCS Fabric Interconnect is a top-of-rack switch that connects the LAN, SAN, and management networks to the underlying physical compute servers for Ethernet and Fibre Channel access. Normally, a pair of Fabric Interconnects is installed either in a single rack or distributed across two racks to provide rack and power redundancy. Fabric Interconnects can be installed at the bottom in case the network cabling is planned under the floor. All the components in a Fabric Interconnect come installed by default from the factory; only external components such as Ethernet modules, power supplies, and fans need to be connected. It includes two redundant hot-swappable power supply units. It is recommended that the power supplies in the rack come from two different energy sources and be installed with a redundant power distribution unit (PDU). A Fabric Interconnect requires a maximum of 750 Watts of power. Although it can be powered from a single power source, it is recommended that you have redundant power sources. It includes four redundant hot-swappable fans to provide the most efficient front-to-rear cooling design and is designed to work in hot aisle/cold aisle environments. It dissipates approximately 2500 BTU/hour, and adequate cooling should be available in the data center to get better performance. For detailed information on power, cooling, physical, and environmental specifications, check the data sheets:

UCS FI 6300 series data sheet: http://www.cisco.com/c/dam/en/us/products/collateral/servers-unified-computing/ucs-b-series-blade-servers/6332-specsheet.pdf
UCS FI 6200 series data sheet: http://www.cisco.com/c/en/us/products/collateral/servers-unified-computing/ucs-6200-series-fabric-interconnects/data_sheet_c78-675245.pdf

Fabric Interconnects provide various options to connect with networks and servers.
The 6300 Fabric Interconnect series provides 10/40 Gig connectivity, while the FI 6200 series provides 1/10 Gig connectivity to the upstream network. The third-generation FI 6332 has 32x40 Gig fixed ports. Ports 1-12 and 15-26 can be configured as 40 Gig QSFP+ ports or as 4x10 Gig SFP+ breakout ports, while ports 13 and 14 can be configured as 40 Gig, or as 1/10 Gig with QSA adapters, but can't be configured with 10 Gig SFP+ breakout cables. The last six ports, 27-32, are dedicated to 40 Gig upstream switch connectivity. The third-generation FI 6332-16UP can be deployed where native Fibre Channel connectivity is essential. The first 16 unified ports, 1-16, can be configured as 1/10 Gig SFP+ Ethernet ports or 4/8/16 Gig Fibre Channel ports. Ports 17-34 can be configured as 40 Gig QSFP+ ports or 4x10 Gig SFP+ breakout ports. The last six ports, 35-40, are dedicated to 40 Gig upstream switch connectivity. The second-generation FI 6248UP has 32x10 Gig fixed unified ports and one expansion module to provide an additional 16 unified ports. All unified ports can be configured as either 1/10 Gig Ethernet or 1/2/4/8 Gig Fibre Channel. The second-generation FI 6296UP has 48x10 Gig fixed unified ports and three expansion modules to provide an additional 48 unified ports. All unified ports can be configured as either 1/10 Gig Ethernet or 1/2/4/8 Gig Fibre Channel.

Deploying VSM

Cisco has simplified VSM deployment with Virtual Switch Update Manager (VSUM). It is a virtual appliance that can be installed on a VMware ESXi hypervisor and registered as a plugin with VMware vCenter. VSUM enables the SA to install, monitor, migrate, and upgrade the Cisco Nexus 1000V in high availability or standalone mode. With VSUM, you can deploy multiple instances of the Nexus 1000V and manage them from a single appliance. VSUM only provides layer 3 connectivity with ESXi hosts. For layer 2 connectivity with ESXi, you still have to deploy the Nexus 1000V VSM manually and configure L2 mode in the basic configuration.
VSUM architecture

VSUM has two components: a backend virtual appliance and a frontend GUI integrated into VMware vCenter Server. The virtual appliance is configured with an IP address on the same subnet as the VMware vCenter management IP address. Once the virtual appliance is deployed on the ESXi host, it establishes connectivity with VMware vCenter Server to get access to vCenter and the hosts.

VSUM and VSM installation

VSM deployment is divided into two parts. First, we will install VSUM and register it as a plugin with VMware vCenter. The second part is importing the Nexus 1000V package into VSUM and deploying the Nexus 1000V through the VSUM GUI interface. For installing and configuring Cisco VSUM, you require one management port group, which will communicate with vCenter. However, the Nexus 1000V VSM will require two port groups, for control and management. You can use just one VLAN for all of these, but it is preferred to have them configured with separate VLANs. Download the Nexus 1000V package and VSUM from https://software.cisco.com/download/type.html?mdfid=282646785&flowid=42790, and then use the following procedure to deploy VSUM and VSM:

Go to vCenter, click on the File menu, and select Deploy OVF Template.
Browse to the location of the VSUM OVA file, click on Browse… to select the required OVA file, and click on Next.
Click on Next twice.
Click on Accept on the End User License Agreement page, and then click on Next.
Specify a Name for the VSUM, and select the data center where you would like it to be installed. Then, click on Next.
Select the Datastore object to install the VSUM on.
Choose Thin Provision for the virtual disk of VSUM and click on Next.
Under the Destination network page, select the network that you created earlier for Management.
Then, click on Next.
Enter the management IP address, DNS, vCenter IP address, username, and password values.
A summary of all your settings is then presented. If you are happy with these settings, click on Finish.
Once VSUM is deployed in vCenter, power on the VM. The installation and registration as a vCenter plugin will take about 5 minutes. To view the VSUM GUI, you need to log in to the vCenter server again to activate the VSUM plugin.
Click on Cisco Virtual Switch Update Manager, and select the Upload option under Image Tasks.
Upload the Nexus 1000V package downloaded earlier under Upload switch image. This will open the Virtual Switch Image File Uploader; select the appropriate package. Once the image is uploaded, it will show up under Manage Uploaded switch images.
Click on the Nexus 1000V tab under Basic Tasks, click on Install, and select the desired data center.
On the Nexus 1000V installer page, go to install a new control plane VSM, and select the proper options for Nexus1000V Switch Deployment Type, VSM Version, and Choose a Port Group.
Select the host, VSM Domain ID, Switch Name, IP Address, Username, and Password. Then, click on Finish. A window will appear to show the progress.
Once it's deployed, select the Nexus 1000V VM, and click on Power On in the Summary screen.
Go to the VSUM GUI, and click on Dashboard under Nexus 1000V Basic Tasks. Verify that the VC Connection Status is green for your installed VSM.
You need to create a port profile of the Ethernet type for the uplink management. For this, log in to the Nexus 1000V switch using SSH, and type the following commands:

port-profile type ethernet system-uplink
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-1000
  mtu 9000
  no shutdown
  system vlan 1-1000
  state enabled

Move over to vCenter, and add your ESXi host to the Nexus environment.
Open vCenter, click on Inventory, and select Networking.
Expand the Datacenter and select the Nexus switch. Right-click on it and select Add Host.
Select one of the unused VMNICs on the host, then select the uplink port group created earlier (the one carrying the system VLAN data), and click on Next.
On the Network Connectivity page, click on Next.
On the Virtual Machine Networking page, click on Next. Do not select the checkbox to migrate virtual machine networking.
On the Ready to complete page, click on Finish.
Finally, run the vemcmd show card command to confirm whether the opaque data is now being received by the VEM.
Log in to your Nexus 1000V switch and type show module to check whether your host has been added or not.
Log in to your ESXi server and type vem status. As you can see, the VEM is now loaded on the VMNIC.

Summary

In this article, we saw the various available options for all UCS components, such as storage servers, modular servers, and UCS Mini. We also learned about rack server installation and the installation of UCS hardware components such as the chassis, blades, I/O modules, and Fabric Interconnects. We learned about the VSUM architecture and the Nexus 1000V components and installed those components.

Resources for Article:

Further resources on this subject:

EMC Storage connectivity of Cisco UCS B-Series server [article]
Creating Identity and Resource Pools [article]
Securing your Elastix System [article]

Packt
07 Feb 2017
7 min read

Encapsulation of Data with Properties

In this article by Gastón C. Hillar, the author of the book Swift 3 Object Oriented Programming - Second Edition, you will learn about all the elements that might compose a class. We will start organizing data in blueprints that generate instances. We will work with examples to understand how to encapsulate and hide data by working with properties combined with access control. In addition, you will learn about properties, methods, and mutable versus immutable classes.

(For more resources related to this topic, see here.)

Understanding the elements that compose a class

So far, we worked with a very simple class and many instances of this class in the Playground, the Swift REPL, and the web-based Swift Sandbox. Now, it is time to dive deep into the different members of a class. The following list enumerates the most common element types that you can include in a class definition in Swift and their equivalents in other programming languages. We have already worked with a few of these elements:

Initializers: These are equivalent to constructors in other programming languages
Deinitializers: These are equivalent to destructors in other programming languages
Type properties: These are equivalent to class fields or class attributes in other programming languages
Type methods: These are equivalent to class methods in other programming languages
Subscripts: These are also known as shortcuts
Instance properties: These are equivalent to instance fields or instance attributes in other programming languages
Instance methods: These are equivalent to instance functions in other programming languages
Nested types: These are types that only exist within the class in which we define them

We could access the instance property without any kind of restrictions as a variable within an instance. However, as happens sometimes in real-world situations, restrictions are necessary to avoid serious problems.
Sometimes, we want to restrict access or transform specific instance properties into read-only attributes. We can combine these restrictions with computed properties, which can define get and/or set methods, also known as getters and setters. Setters allow us to control how values are set; that is, these methods are used to change the values of related properties. Getters allow us to control the values that we return when computed properties are accessed; getters don't change the values of related properties.

Sometimes, all the members of a class share the same attribute, and we don't need to have a specific value for each instance. For example, the superhero types have some profile values, such as the average strength, average running speed, attack power, and defense power. We can define the following type properties to store the values that are shared by all the instances: averageStrength, averageRunningSpeed, attackPower, and defensePower. All the instances have access to the same type properties and their values. However, it is also possible to apply restrictions to their access.

It is also possible to define methods that don't require an instance of a specific class to be called; therefore, you can invoke them by specifying both the class and method names. These methods are known as type methods; they operate on a class as a whole and have access to type properties, but they don't have access to any instance members, such as instance properties or methods, because there is no instance at all. Type methods are useful when you want to include methods related to a class and don't want to generate an instance to call them. Type methods are also known as static or class methods. However, we have to pay attention to the keyword we use to declare type methods in Swift, because a type method declared with the static keyword has a different behavior from a type method declared with the class keyword.
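The getter/setter idea described above is not specific to Swift. As a quick illustration of the concept, sketched here in Python rather than the article's Swift syntax purely because it is compact, a computed property can validate values in its setter and simply return the stored value in its getter (the class and validation rule are hypothetical, not from the book):

```python
class SuperHero:
    def __init__(self, name, birth_year):
        self._name = name
        self._birth_year = birth_year

    @property
    def birth_year(self):
        # Getter: returns the related stored value without changing it.
        return self._birth_year

    @birth_year.setter
    def birth_year(self, value):
        # Setter: controls how the value is set before storing it.
        if value < 1900:
            raise ValueError("birth_year must be 1900 or later")
        self._birth_year = value
```

A caller then reads and writes `hero.birth_year` as if it were a plain attribute, while the setter silently enforces the restriction; in Swift the equivalent is a computed property with get and set blocks.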
Declaring stored properties

When we design classes, we want to make sure that all the necessary data is available to the methods that will operate on this data; therefore, we encapsulate data. However, we just want relevant information to be visible to the users of our classes that will create instances, change values of accessible properties, and call the available methods. Thus, we want to hide or protect some data that is just needed for internal use. We don't want to make accidental changes to sensitive data. For example, when we create a new instance of any superhero, we can use both its name and birth year as two parameters for the initializer. The initializer sets the values of two properties: name and birthYear. The following lines show sample code that declares the SuperHero class:

class SuperHero {
    var name: String
    var birthYear: Int

    init(name: String, birthYear: Int) {
        self.name = name
        self.birthYear = birthYear
    }
}

The next lines create two instances that initialize the values of the two properties and then use the print function to display their values in the Playground:

var antMan = SuperHero(name: "Ant-Man", birthYear: 1975)
print(antMan.name)
print(antMan.birthYear)
var ironMan = SuperHero(name: "Iron-Man", birthYear: 1982)
print(ironMan.name)
print(ironMan.birthYear)

The Playground shows the results of the declaration of the class and the execution of these lines. Running the same code in the Swift REPL displays details about the two instances we just created: antMan and ironMan.
The details include the values of the name and birthYear properties. The following lines show the output that the Swift REPL displays after we create the two SuperHero instances:

antMan: SuperHero = {
  name = "Ant-Man"
  birthYear = 1975
}
ironMan: SuperHero = {
  name = "Iron-Man"
  birthYear = 1982
}

We can read the two entries as follows: the antMan variable holds an instance of SuperHero with its name set to "Ant-Man" and its birthYear set to 1975. The ironMan variable holds an instance of SuperHero with its name set to "Iron-Man" and its birthYear set to 1982. The web-based IBM Swift Sandbox produces similar results when running the code.

We don't want a user of our SuperHero class to be able to change a superhero's name after an instance is initialized, because the name is not supposed to change. There is a simple way to achieve this goal in our previously declared class. We can use the let keyword to define an immutable name stored property of type String instead of using the var keyword. We can also replace the var keyword with let when we define the birthYear stored property, because the birth year will never change after we initialize a superhero instance. The following lines show the new code that declares the SuperHero class with two stored immutable properties, name and birthYear. Note that the initializer code hasn't changed, and it is possible to initialize both the immutable stored properties with the same code:

class SuperHero {
    let name: String
    let birthYear: Int

    init(name: String, birthYear: Int) {
        self.name = name
        self.birthYear = birthYear
    }
}

Stored immutable properties are also known as stored nonmutating properties. The next lines create an instance that initializes the values of the two immutable stored properties and then use the print function to display their values in the Playground.
Then, the two highlighted lines of code try to assign a new value to both properties and fail to do so because they are immutable properties:

var antMan = SuperHero(name: "Ant-Man", birthYear: 1975)
print(antMan.name)
print(antMan.birthYear)
antMan.name = "Batman"
antMan.birthYear = 1976

The Playground displays the following two error messages for the last two lines; we will see similar error messages in the Swift REPL and in the Swift Sandbox:

Cannot assign to property: 'name' is a 'let' constant
Cannot assign to property: 'birthYear' is a 'let' constant

When we use the let keyword to declare a stored property, we can initialize the property, but it becomes immutable, that is, a constant, after its initialization.

Summary

In this article, you learned about the different members of a class or blueprint. We worked with instance properties, type properties, instance methods, and type methods. We also worked with stored properties, getters, and setters.

Resources for Article:

Further resources on this subject:

Swift's Core Libraries [article]
The Swift Programming Language [article]
Network Development with Swift [article]

Mohammad Pezeshki
07 Feb 2017
5 min read

Classification using Convolutional Neural Networks

In this blog post, we begin with a simple classification task that the reader can readily relate to. The task is a binary classification of 25000 images of cats and dogs, divided into 20000 training, 2500 validation, and 2500 testing images. It seems reasonable to use the most promising model for object recognition, which is the convolutional neural network (CNN). As a result, we use a CNN as the baseline for the experiments, and along with this post, we will try to improve its performance using different techniques. So, in the next sections, we will first introduce the CNN and its architecture, and then we will explore techniques to boost its performance and speed, including Parametric ReLU and Batch Normalization. In this post, we will show the experimental results as we go through each technique. The complete code for the CNN is available online in the author's GitHub repository.

Convolutional Neural Networks

Convolutional neural networks can be seen as feedforward neural networks in which multiple copies of the same neuron are applied at different places. This means applying the same function to different patches of an image. Doing this means that we are explicitly imposing our knowledge about the data (images) onto the model structure. That's because we already know that natural image data is translation invariant, meaning that the probability distribution of pixels is the same across all images. This structure, followed by a non-linearity and a pooling and subsampling layer, makes CNNs powerful models, especially when dealing with images. There is a graphical illustration of CNNs in Prof. Hugo Larochelle's course on neural networks, which is originally from Prof. Yann LeCun's paper on ConvNets. Implementing a CNN in a GPU-based framework such as Theano is straightforward as well.
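The Theano snippets from the original post (a layer definition and the code that stacks layers) are embedded as images and are not reproduced here. As a rough, framework-free sketch of what one such layer computes, a shared kernel convolved over every patch, followed by a non-linearity and max-pooling, consider this NumPy version (the function and variable names are illustrative, not the author's):

```python
import numpy as np

def conv_layer(image, kernels, pool=2):
    """One CNN layer: valid 2-D convolution with each kernel,
    ReLU non-linearity, then non-overlapping max-pooling."""
    kh, kw = kernels.shape[1], kernels.shape[2]
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    feature_maps = np.empty((kernels.shape[0], oh, ow))
    for k, kernel in enumerate(kernels):
        for i in range(oh):
            for j in range(ow):
                # The same kernel (neuron) is applied at every patch:
                # this weight sharing is what makes the model a CNN.
                feature_maps[k, i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    feature_maps = np.maximum(feature_maps, 0.0)  # non-linearity (ReLU)
    # Max-pooling / subsampling over pool x pool blocks.
    ph, pw = oh // pool, ow // pool
    pooled = feature_maps[:, :ph * pool, :pw * pool]
    return pooled.reshape(kernels.shape[0], ph, pool, pw, pool).max(axis=(2, 4))
```

Layers are stacked by feeding one layer's output maps into the next. For multi-channel input this sketch would need an extra sum over channels; real implementations such as Theano's conv2d handle that and run on the GPU.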
In Theano, we can create such a convolutional layer and then stack several of them on top of each other; the full code is in the repository mentioned above.

CNN Experiments

Armed with the CNN, we attacked the task using two baseline models: a relatively big model and a relatively small one. In the figures below, you can see the number of layers, filter size, pooling size, stride, and the number of fully connected layers. We trained both networks with a learning rate of 0.01 and a momentum of 0.9 on a GTX 580 GPU, and we also used early stopping. The small model can be trained in two hours and reaches 81 percent accuracy on the validation set. The big model can be trained in 24 hours and reaches 92 percent accuracy on the validation set.

Parametric ReLU

Parametric ReLU (aka Leaky ReLU) is an extension to the Rectified Linear Unit that allows the neuron to learn the slope of the activation function in the negative region. Unlike the original paper on Parametric ReLU by Microsoft Research, I used a different parameterization that forces the slope to be between 0 and 1. As shown in the figure below, when alpha is 0 the activation function is just linear; on the other hand, if alpha is 1, the activation function is exactly the ReLU. Interestingly, although the number of trainable parameters is increased using Parametric ReLU, it improves the model both in terms of accuracy and in terms of convergence speed: it cuts the training time to 3/4 and increases accuracy by around 1 percent. In Parametric ReLU, to make sure that alpha remains between 0 and 1, we set alpha = Sigmoid(beta) and optimize beta instead. In our experiments, we set the initial value of alpha to 0.5. After training, all alphas were between 0.5 and 0.8, which means that the model enjoys having a small gradient in the negative region.

"Basically, even a small slope in the negative region of the activation function can help training a lot.
Besides, it's important to let the model decide how much nonlinearity it needs."

Batch Normalization

Batch Normalization simply means normalizing the pre-activations for each batch to have zero mean and unit variance. Based on a recent paper by Google, this normalization reduces a problem called Internal Covariate Shift and consequently makes learning much faster. The equations are as follows:

Personally, I found this one of the most interesting and simplest techniques I've ever used. A very important point to keep in mind is to feed the whole validation set as a single batch at test time, to obtain a more accurate (less biased) estimate of the mean and variance.

"Batch Normalization, which means normalizing pre-activations for each batch to have zero mean and unit variance, can boost the results both in terms of accuracy and in terms of convergence speed."

Conclusion

All in all, we conclude this post with two finalized models. One of them can be trained in 10 epochs or, equivalently, 15 minutes, and achieves 80 percent accuracy. The other is a relatively large model; in this model we did not use LDNN, but the two other techniques are used, and we achieved 94.5 percent accuracy.

About the Author

Mohammad Pezeshki is a PhD student in the MILA lab at the University of Montreal. He obtained his bachelor's degree in computer engineering from Amirkabir University of Technology (Tehran Polytechnic) in July 2014, and his master's in June 2016. His research interests lie in the fields of artificial intelligence, machine learning, probabilistic models and, specifically, deep learning.
Packt
07 Feb 2017
6 min read

Hierarchical Clustering

In this article by Atul Tripathi, author of the book Machine Learning Cookbook, we will cover hierarchical clustering with a World Bank sample dataset. (For more resources related to this topic, see here.)

Introduction

Hierarchical clustering is one of the most important methods in unsupervised learning. In hierarchical clustering, for a given set of data points the output is produced in the form of a binary tree (dendrogram). In the binary tree, the leaves represent the data points, while internal nodes represent nested clusters of various sizes.

The procedure is as follows. Each object is assigned to a separate cluster. All clusters are evaluated based on a pairwise distance matrix, which is constructed using the distance values. The pair of clusters with the shortest distance is considered; this pair is then removed from the matrix and merged together. The distance between the merged cluster and the remaining clusters is evaluated and the distance matrix updated. The process is repeated until the distance matrix is reduced to a single element.

Hierarchical clustering - World Bank sample dataset

One of the main goals of establishing the World Bank has been to fight and eliminate poverty. Continuous evolution and fine-tuning of its policies in an ever-evolving world has been helping the institution achieve this goal. The barometer of success in the elimination of poverty is measured in terms of improvement in health, education, sanitation, infrastructure, and the other services needed to improve the lives of the poor. The development gains that ensure these goals must be pursued in an environmentally, socially, and economically sustainable manner.

Getting ready

In order to perform hierarchical clustering, we shall be using a dataset collected from the World Bank.

Step 1 - collecting and describing data

The dataset titled WBClust2013 shall be used.
This is available in CSV format as WBClust2013.csv. The dataset is in standard format, with 80 rows of data and 14 variables. The numeric variables are:

new.forest, Rural, log.CO2, log.GNI, log.Energy.2011, LifeExp, Fertility, InfMort, log.Exports, log.Imports, CellPhone, RuralWater, Pop

The non-numeric variable is: Country

How to do it...

Step 2 - exploring the data

Version info: code for this page was tested in R version 3.2.3 (2015-12-10).

Let's explore the data and understand the relationships among the variables. We'll begin by importing the CSV file named WBClust2013.csv, saving the data to the wbclust data frame:

> wbclust=read.csv("d:/WBClust2013.csv",header=T)

Next, we print the wbclust data frame. The head() function returns the first rows of the data frame passed as its input parameter:

> head(wbclust)

The results are as follows:

Step 3 - transforming the data

Centering variables and creating z-scores are two common data analysis activities used to standardize data. We need to create z-scores for the numeric variables listed above. The scale() function is a generic function whose default method centers and/or scales the columns of a numeric matrix. The wbclust data frame is passed to the scale function, with only the numeric fields considered, and the result is stored in another data frame, wbnorm:

> wbnorm<-scale(wbclust[,2:13])
> wbnorm

The results are as follows:

All data frames have a row names attribute. In order to retrieve or set the row or column names of a matrix-like object, the rownames() function is used. The first column of wbclust is passed to the rownames() function:

> rownames(wbnorm)=wbclust[,1]
> rownames(wbnorm)

The call to rownames(wbnorm) results in the display of the values from the first column. The results are as follows:

Step 4 - training and evaluating the model performance

The next step is about training the model. The first step is to calculate the distance matrix.
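As an aside before continuing with dist() in R, the whole pipeline used in this recipe (z-scores, Euclidean distances, Ward linkage, cutting the tree into five groups) has a close analogue in Python's SciPy; the sketch below runs on a synthetic 80x12 matrix standing in for the World Bank data, so the numbers themselves are not meaningful:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)
# Synthetic stand-in for the 80 x 12 numeric World Bank matrix
data = np.vstack([rng.normal(0, 1, (40, 12)), rng.normal(5, 1, (40, 12))])

# Like scale(): center each column and divide by its standard deviation
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

dist1 = pdist(z, metric="euclidean")    # like dist(wbnorm, method="euclidean")
clust1 = linkage(dist1, method="ward")  # like hclust(dist1, method="ward.D")
cuts = fcluster(clust1, t=5, criterion="maxclust")  # like cutree(clust1, k=5)
print(len(np.unique(cuts)))
```

Here fcluster with criterion="maxclust" plays the role of cutree, assigning each of the 80 rows a group label.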
The dist() function is used. Using the specified distance measure, distances between the rows of a data matrix are computed. The distance measure can be Euclidean, maximum, Manhattan, Canberra, binary, or Minkowski; here, Euclidean is used, which calculates the distance between two vectors as sqrt(sum((x_i - y_i)^2)). The result is stored in a new data frame, dist1:

> dist1<-dist(wbnorm, method="euclidean")

The next step is to perform clustering using Ward's method. In order to perform cluster analysis on a set of dissimilarities of the n objects, the hclust() function is used. At the first stage, each object is assigned to its own cluster. Then, at each stage, the algorithm iterates and joins the two most similar clusters, continuing until there is just a single cluster left. The hclust() function requires the data in the form of a distance matrix, so the dist1 data frame is passed. By default, the complete linkage method is used; other agglomeration methods include ward.D, ward.D2, single, complete, and average:

> clust1<-hclust(dist1,method="ward.D")
> clust1

The call to clust1 results in the display of the agglomeration method used, the manner in which the distance is calculated, and the number of objects. The results are as follows:

Step 5 - plotting the model

The plot() function is a generic function for plotting R objects. Here, it is used to draw the dendrogram:

> plot(clust1,labels= wbclust$Country, cex=0.7, xlab="",ylab="Distance",main="Clustering for 80 Most Populous Countries")

The result is as follows:

The rect.hclust() function highlights the clusters and draws rectangles around the branches of the dendrogram. The dendrogram is first cut at a certain level, followed by drawing a rectangle around the selected branches.
The object clust1 is passed to the function along with the number of clusters to be formed:

> rect.hclust(clust1,k=5)

The result is as follows:

The cutree() function cuts the tree into multiple groups on the basis of the desired number of groups or the cut height. Here, clust1 is passed to the function along with the desired number of groups:

> cuts=cutree(clust1,k=5)
> cuts

The result is as follows:

Getting the list of countries in each group, the result is as follows:

Summary

In this article we covered hierarchical clustering: collecting the data, exploring its contents, and transforming it. We trained and evaluated the model using a distance matrix, and finally plotted the data as a dendrogram.

Resources for Article:

Further resources on this subject:
Supervised Machine Learning [article]
Specialized Machine Learning Topics [article]
Machine Learning Using Spark MLlib [article]

Packt
06 Feb 2017
8 min read

Building Data Driven Application

In this article by Ashish Srivastav, the author of the book ServiceNow Cookbook, we will learn the following recipes:

Starting a new application
Getting into new modules

(For more resources related to this topic, see here.)

Service-Now provides a great platform to developers, although it is a Java-based application.

Starting a new application

Out of the box, Service-Now provides many applications for facilitating business operations and other activities in an IT environment, but if you find that the customer's requirements do not fit into the system applications' boundaries, then you can think of creating new applications. In these recipes, we will build a small application which will include a table, form, business rules, client script, ACL, update sets, deployment, and so on.

Getting ready

To step through this recipe, you must have an active Service-Now instance, valid credentials, and an admin role.

How to do it...

Open any standard web browser and type the instance address. Log in to the Service-Now instance with the credentials.

On the left-hand side, in the search box, type local update and Service-Now will search the module for you:

Local update set for a new application

Now, you need to click on Local Update Sets to create a new update set so that you can capture all your configuration in the update set:

Create new update set

On clicking, you need to give the name as Book registration application and click on the Submit and Make Current button:

Local update set – book registration application

Now you will be able to see the Book registration application update set next to your profile, which means you are ready to create a new application:

Current update set

On the left-hand side, in the search box, type system definition and click on the Application Menus module:

Application menu to create a new application

To understand better, you can click on any application menu as shown here, where you can see the application and its associated modules.
For example, you can click on the self-service application as shown here:

Self-service application

Now, to see the associated modules, you need to scroll down as shown here, and if you want to create a new module then you need to click on the New button:

Self-service application's modules

Now you should have a better understanding of how applications look within the Service-Now environment. So, to make a new application, you need to click on the New button on the Application Menus page:

Applications repository

After clicking on the New button, you will be able to see the configuration page. To understand better, let's consider this example: you are creating a Book Registration application for your customer, with the following configuration:

Title: Book Registration
Role: Leave blank
Application: Global
Active: True
Category: Custom Applications (you can change it as per your requirement)

Book registration configuration

Click on the Submit button. After some time, a new Book Registration application menu will be visible under the application menu:

Book registration

Getting into new modules

A module is a part of an application which contains the actual workable items. As a developer, you will always have the option to add a new module to support business requirements.

Getting ready

To step through this recipe, you should have an active Service-Now instance, valid credentials, and an admin role.

How to do it...

Open any standard web browser and type the instance address. Log in to the Service-Now instance with the credentials.

Service-Now gives you many options to create a new module. You can create a new module from the Application Menu module, or you can go through the Tables & Columns module as well.

If you have chosen to create a new module from the Application Menu module, then in order to create the module, click on the Book Registration application menu and scroll down.
To create a new module, click on the New button:

Creating a New Module

After clicking on the New button, you will be able to see a configuration screen, where you need to configure the following fields:

Title: Author Registration
Application Menu: Book Registration

The Author Registration module registered under the Book Registration menu

Now, in the Link Type section, you will need to configure the new module; rather, I would say, you will need to define the baseline for what your new module will do. Will the new module show a form to create a record, show a list of records from a table, or execute some reports? That's why this is a critical step:

Link type to create a new module and select a table

Link type gives you many options to decide the behavior of the new module.

Link type options

Now, let's take a different approach to create a new module. On the left-hand side, type column and Service-Now will search the Tables & Columns module for you. Click on the Tables & Columns module:

The Tables & Columns module

Now you will be able to see the configuration page as shown here, where you need to click on the Create Table button. Note that by clicking on Create Table, you can create a new table:

Tables & Columns – Create a new table

After clicking on the Create Table button, you will be able to see the configuration page as given here, where you need to configure the following fields:

Label: Author Registration (module name)
Name: Auto populate
Extends table: Task (by extending, your module will incorporate all fields of the base table, in this scenario the Task table)

Module Configuration

Create module: To create a new module through the table, check the Create module check box, and to add the module under your application, select the application name:

Add module in application

Controls is a critical section, as out of the box Service-Now gives an option to auto-create record numbers.
For incidents, INC is the prefix, and for change tickets, CHG is the prefix; here you are also allowed to create your own prefix for new module records:

Configure the new module Controls section

Now you will be able to see the auto-numbering configuration, as shown here; your new records will start with the AUT prefix:

New module auto numbering

Click on Submit. After submission of the form, Service-Now will automatically create a role for the new module, as shown here. Only the u_author_registration_user role holder will be able to view the module. So, whenever a request is generated, you will need to go into that particular user's profile from the user administration module to add the role:

Role created for module

Your module is created, but there are no fields. So, for rapid development, you can directly add a column to the table by clicking on Insert a new row...:

The Insert field in the form

As an output, you will be able to see that a new Author Registrations module is added under the Book Registration application:

Search newly created module

Now, if you click on the Author Registrations module, you will be able to see the following page:

The Author Registration Page

On clicking the New button, you will be able to see the form as shown here. Note that you have not added any fields to the u_author_registration table, but the table extends the TASK table. That's why you are able to see fields on the form: they are coming from the TASK table:

Author registration form without new fields

If you want to add new fields to the form, you can do so by performing the following steps:

Right-click on the banner.
Select Configure.
Click on Form Layout.
Now, in the Create new field section, you can enter the desired field Name and Type, as shown here:

Form Field | Field Type
Author Name | String
Author Primary Email Address | String
Author Secondary Email Address | String
Author Mobile Number | String
Author Permanent Address | String
Author Temporary Address | String
Country | Choice (India, USA, UK, Australia)
Author Experience | String
Book Type | Choice
Book Title | String
Contract Start Date | String
Contract End Date | String

Author registration form fields

Click on Add and Save. After saving the form, the new fields are added to the Author Registration form:

Author registration form

To create a new form section in the form view and section, click on the New option and add the name:

Create a new section

After creating the new Payment Details section, you can add fields under this section:

Form Field | Field Type
Country | String
Preferred Payment Currency | Currency
Bank Name | String
Branch Name | String
IFSC Code | String
Bank Address | String

Payment Details section fields

As an output, you will be able to see the following screen:

Author registration Payment Details form section

Summary

In this article we learned how Service-Now provides a great platform to developers, how to start a new application, and how to get into new modules.

Resources for Article:

Further resources on this subject:
Client and Server Applications [article]
Modules and Templates [article]
My First Puppet Module [article]

Andreas Zeitler
06 Feb 2017
5 min read

Create Your First Augmented Reality Experience: The Tools and Terms You Need to Understand

This post gives a short summary of the different methods and tools used to create AR experiences with today's technology. We also outline the advantages and drawbacks of specific solutions. In a follow-up post, we outline how to create your own AR experience.

Augmented Reality (AR)

Augmented Reality (AR) is the "it" thing of the moment thanks to Pokémon Go and the Microsoft HoloLens. But how can you create your own AR "thing"? The good news is that you don't need to learn a new programming language; today, there are tools available for most languages used for application development. The AR software out there is not packaged in the most user-friendly way, so most of the time it's a bit tricky to even get the samples provided by the software company itself up and running. The really bad news is that AR is one of those "native-only" features. This means that it cannot yet be achieved using web technologies (JavaScript and HTML5) only, or at least not with any real-world, production-grade performance and fidelity. Consequently, in order to run our AR experience on the intended device (a smartphone, tablet, or Windows PC or tablet with a camera), you need to wrap it into an app, which you need to build yourself. I am including instructions on how to run an app with your own code and AR included for each programming language below.

3D Models

First, there's some more bad news: AR is primarily a visual medium, so you will need great content to create great experiences. "Content" in this case means 3D models optimized for real-time rendering on a mobile device. This should not keep you from trying, because you can always get great free or open source models in OBJ, FBX, or Collada file format on TurboSquid or the Unity 3D Asset Store. These are the most common 3D file exchange formats, which can be exported by pretty much any software. This will suffice for now.

One more thing before we dive into the code: Markers. Or Triggers.
Or Target Images. Or Trackables. You will encounter these terms often; they all refer to the same thing. For a long time, in order to do AR at all, you needed to "teach" your app a visual pattern to look for in the real world. Once found, the position of the pattern in the real world provides the frame of reference for positioning the 3D content in the camera picture. To make it easy: this pattern is an image, which you store inside your app either as a plain image or in a special format (depending on the AR framework you use). You can read more about what makes a good AR pattern on the Vuforia Developer portal (be sure to check out "Natural Features and Image Ratings" and "Local Contrast Enhancement" for more theoretical background).

Please note: the augmentation will only work as long as the AR pattern is actually visible to the camera; once it moves outside the field of view, the 3D model vanishes again. That's a drawback.

Using a visual AR pattern to do augmentations can have one benefit, though: in cases where you know the exact dimensions of the image used to set the frame of reference (e.g. a magazine cover), you can configure the AR software so that the scale of the real world and the scale of the virtual frame of reference in your app match. A 3D model of a desk which is 2 feet tall will then also be 2 feet tall when projected into a room using the magazine as the AR pattern. This enables life-size, 1:1 scale AR projections.

Life-size AR projection with fixed-size image pattern

SLAM

The opposite of using a pattern for AR is called SLAM. It's a method of creating a frame of reference for AR without any known patterns within the camera's field of view. This works reasonably well with today's software and hardware and has one major benefit: no printed AR marker is needed. The AR can just start; you don't need to worry about the marker being in the camera's field of view.
So, why is this not the default way of doing it? Firstly, the algorithm is so expensive to run that it drains the battery of a smartphone in minutes (well, half an hour, but still). Secondly, it loses "scale" altogether. SLAM-based AR tracking can detect the environment and its horizon (that is, the plane perpendicular to gravity, based on the phone's gravity sensor), but it cannot detect the environment's scale. If you were to run a SLAM-based augmentation on three different devices in a row, the projected 3D model would likely be a different size every time. The quality of the software you use will make this more or less noticeable. Some frameworks offer hybrid SLAM tracking methods, which are started once a pre-defined pattern is detected. This is a good compromise; it's called "Extended Tracking" or "Instant Tracking." If you encounter an option for something like this, enable it.

Summary

This is pretty much all of the theory you need to understand before you can create your first AR experience. We will tell you how to do just that, in a programming language you already know, in a follow-up post.

About the Author

Andreas is the Founder and CEO of Vuframe. He's been working with augmented and virtual reality on a daily basis for the past 8 years. Vuframe's mission is to democratize AR & VR by removing the tech barrier for everyone.

Packt
02 Feb 2017
9 min read

Introduction to Microsoft Dynamics NAV 2016

In this article by Alexander Drogin, author of the book Extending Microsoft Dynamics NAV 2016 Cookbook, we will see a basic introduction to Microsoft Dynamics NAV 2016 and C/AL programming. (For more resources related to this topic, see here.)

Microsoft Dynamics NAV 2016 has a rich toolset for extending functionality. Besides, a wide number of external tools can be connected to the NAV database to enrich the data processing and analysis experience. But C/AL, the internal NAV application language, is still the primary means of enhancing the user experience when it comes to developing new functionality. C/AL development is framed around objects representing different kinds of functionality, with a designer associated with each object type.

Basic C/AL programming

As an example, let us create a simple function that receives a number as a parameter and verifies whether its checksum satisfies given criteria. It is a typical task for a developer working on an ERP system to implement a verification like the Luhn algorithm, which is widely used to validate identification numbers such as credit card numbers.

How to do it...

In the Object Designer window, create a new codeunit. Assign a number and name (for example, 50000, Luhn Verification).

Click View, then C/AL Globals; in the Globals window, open the Functions tab. In the first line of the table, enter the function name: SplitNumber.

The verification function receives a BigInteger number as an input parameter, but the algorithm works with separate digits. Therefore, before starting the validation, we need to convert the number into an array of digits. Position on the function SplitNumber you just declared and click View, then C/AL Locals. The first tab in the Locals window is Parameters.
Enter the first parameter of the function:

Name: Digits
DataType: Byte
Var: Set the checkmark in this field

Still in the C/AL Locals window, click View, Properties to open the variable properties and set Dimensions = 11.

Close the variable properties window and add the second function parameter, AccountNumber, with type BigInteger. The Parameters window should now look like this:

Next, navigate to the Variables tab. Insert a variable i of Integer type.

Close the local declarations window to return to the code editor and type the function code:

FOR i := 11 DOWNTO 1 DO BEGIN
  Digits[i] := AccountNumber MOD 10;
  AccountNumber := AccountNumber DIV 10;
END;

Open the C/AL Globals again and insert the second function, VerifyCheckSum. This is the main function that implements the verification algorithm.

In the C/AL Locals window, insert a single parameter of this function, AccountNumber, of type BigInteger. Navigate to the Return Value tab and fill in the Return Type field; in this case, the type should be Boolean.

In the C/AL Locals window, declare three local variables as follows:

Name | Data Type
Digits | Byte
CheckSum | Integer
i | Integer

Select the Digits variable, open its properties, and set Dimensions to 11.

Type the function code:

SplitNumber(Digits,AccountNumber);
FOR i := 1 TO 10 DO BEGIN
  IF i MOD 2 = 1 THEN
    CheckSum += (Digits[i] * 2) DIV 10 + (Digits[i] * 2) MOD 10;
END;
CheckSum := 10 - (CheckSum * 9) MOD 10;
EXIT(CheckSum = Digits[11]);

In the OnRun trigger, place the code that will call the verification function:

IF VerifyCheckSum(79927398712) THEN
  MESSAGE(CorrectCheckSumMsg)
ELSE
  MESSAGE(WrongCheckSumMsg);

To present the execution result to the user, OnRun uses two text constants that we have not declared yet. To declare them, open C/AL Globals in the View menu. In the Text Constants tab, enter the values as shown in the following table:

Name | Value
CorrectCheckSumMsg | Account number has correct checksum
WrongCheckSumMsg | Account number has wrong checksum

How it works...
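Before walking through the C/AL, the underlying check-digit logic can be cross-checked with a short Python sketch of standard Luhn validation (double every second digit from the right and require the total to be a multiple of 10); this is an illustrative stand-in, not the book's code, but it agrees with the recipe's two test numbers:

```python
def luhn_valid(number: int) -> bool:
    """Return True if the number ends in a correct Luhn check digit."""
    digits = [int(d) for d in str(number)]
    total = 0
    # Walk from the rightmost digit; double every second digit and add
    # the digit sum of the result (the DIV 10 + MOD 10 trick above).
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            doubled = digit * 2
            total += doubled // 10 + doubled % 10
        else:
            total += digit
    return total % 10 == 0

print(luhn_valid(79927398713))  # True  (correct checksum)
print(luhn_valid(79927398712))  # False (wrong checksum)
```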
The function SplitNumber, described in steps 1 through 8, uses a FOR..DOWNTO loop with a loop control variable to iterate over all digits in the BigInteger number, starting from the last digit. In each step, the number is divided by 10 using the integer division operator DIV; the modulus operator MOD returns the remainder of this division, which is placed in the corresponding element of the array. The Dimensions property of the parameter Digits indicates that this variable is an array of 11 elements (the value of Dimensions is the number of elements; a variable with undefined dimensions is a scalar).

When a function is called, it can receive arguments either by value or by reference. The Var checkmark in the parameter declaration means that the argument will be passed to the function by reference, so all changes made to the Digits array in the function SplitNumber will be reflected in the function VerifyCheckSum that calls SplitNumber. Arrays cannot be function return values in C/AL, so passing an array variable by reference is the only way to pass arrays between functions.

The function VerifyCheckSum, defined in steps 9 to 13, calls the helper function SplitNumber and then uses the same loop type, but iterates from the first digit to the last (FOR i := 1 TO 10). This loop computes the checksum, which is compared to the last digit of the account number; if the two values match, the checksum is correct. Finally, the function returns a Boolean value conveying the verification result, TRUE or FALSE. Based on this result, the OnRun function in the codeunit displays one of the two text constants in a message. In the given example the checksum is incorrect, so the message will look like this:

To see the message for a correct result, replace the last digit in the account number with 3. The correct number is 79927398713.

Accessing the Database in C/AL

Microsoft Dynamics NAV is an information system, and its primary purpose is to collect, store, organize, and present data.
Therefore C/AL has a rich set of functions for data access and manipulation. The next example presents a set of basic functions to read data from the NAV database, filter and search records in a table, and calculate aggregated values based on database records. Suppose we want to calculate the total amount in all open sales orders and invoices for a certain customer.

How to do it...

In the NAV Object Designer, create a new codeunit object.

Open the codeunit object you just created in the code designer, position in the OnRun trigger, and open the local declarations window (C/AL Locals). Declare local variables:

Name | DataType | Subtype
SalesLine | Record | Sales Line
StartingDate | Date |
EndingDate | Date |

Close the local variables window and declare a global text constant in the C/AL Globals window:

Name | ConstValue
SalesAmountMsg | Total amount in sales documents: %1

Return to the code editor and type the function code:

StartingDate := CALCDATE('<-1M>',WORKDATE);
EndingDate := WORKDATE;
SalesLine.SETRANGE("Sell-to Customer No.",'10000');
SalesLine.SETFILTER(
  "Document Type",'%1|%2',
  SalesLine."Document Type"::Order,
  SalesLine."Document Type"::Invoice);
SalesLine.SETRANGE("Posting Date",StartingDate,EndingDate);
SalesLine.CALCSUMS("Line Amount");
MESSAGE(SalesAmountMsg,SalesLine."Line Amount");

Save the changes, then close the code editor and run the codeunit.

How it works...

Record is a complex data type. A variable declared as a record refers to a table in the database; it contains a single table record and can move forward and backward through the record set. A C/AL record resembles an object in object-oriented languages, although they are not exactly the same. You can call record methods and read fields using dot notation. For example, the following are valid statements with a Customer record variable:

Customer.Name := 'New Customer';
IF Customer.Balance <= 0 THEN MESSAGE

The variable we just declared refers to the table Sales Line, which stores all open sales document lines.
Since we want to calculate the sales amount in a certain period, first of all we need to define the date range for the calculation. The first line in the code example finds the starting date of the period. In this calculation we refer to the system-defined global variable WORKDATE. If you are an experienced NAV user, you know what a workdate is – it is the default date for all documents created in the system. It does not always match the calendar date, so in application code we use the workdate as the pivot date. Another system variable, TODAY, stores the actual calendar date, but it is used much less frequently than the workdate. The workdate is the last date of the period we want to analyze. To find the first date, we use the CALCDATE function, which calculates a date based on a formula and a reference date. CALCDATE('<-1M>',WORKDATE) means that the resulting date will be one month earlier than the workdate. In the NAV 9.0 demo database the workdate is 25.01.2017, so the result of this CALCDATE will be 25.12.2016. The next line sets a filter on the SalesLine table. Filtering is used in C/AL to search for records matching given criteria. There are two functions to apply filters to a table: SETFILTER and SETRANGE. Both take the field name to which the filter is applied as the first parameter. SETRANGE can filter all values within a given range or a single value. In the code example we use it to filter sales lines where the customer code is '10000'. Then we apply one more filter, on the Posting Date field, to filter out all dates earlier than StartingDate or later than EndingDate. Another filter is applied on the Document Type field:

SalesLine.SETFILTER(
  "Document Type",'%1|%2',
  SalesLine."Document Type"::Order,
  SalesLine."Document Type"::Invoice);

We want to see only invoices and orders in the final result, and we can combine these two values in a filter with the SETFILTER function. '%1|%2' is a combination of two placeholders that will be replaced with the actual filter values at runtime.
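Outside C/AL, the same filter-then-aggregate pattern can be sketched in plain Python; the record layout and field names below are our own illustrative stand-ins, not NAV's:

```python
from datetime import date

# Illustrative stand-ins for Sales Line records
sales_lines = [
    {"customer": "10000", "doc_type": "Order",   "posting_date": date(2017, 1, 10), "line_amount": 150.0},
    {"customer": "10000", "doc_type": "Invoice", "posting_date": date(2017, 1, 20), "line_amount": 250.0},
    {"customer": "10000", "doc_type": "Quote",   "posting_date": date(2017, 1, 15), "line_amount": 999.0},  # wrong type
    {"customer": "20000", "doc_type": "Order",   "posting_date": date(2017, 1, 12), "line_amount": 500.0},  # wrong customer
    {"customer": "10000", "doc_type": "Order",   "posting_date": date(2016, 11, 1), "line_amount": 75.0},   # outside range
]

starting_date, ending_date = date(2016, 12, 25), date(2017, 1, 25)

# SETRANGE/SETFILTER equivalents, followed by a CALCSUMS-style aggregation
total = sum(
    line["line_amount"]
    for line in sales_lines
    if line["customer"] == "10000"
    and line["doc_type"] in ("Order", "Invoice")
    and starting_date <= line["posting_date"] <= ending_date
)
print(f"Total amount in sales documents: {total}")  # 400.0
```

Only the first two records satisfy all three filters, so the aggregated total is 400.0, just as only matching Sales Line records contribute to the CALCSUMS result.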
The last database statement in this example is the CALCSUMS function. SETRANGE itself does not change the state of the record variable – it only prepares filters for the subsequent record search or calculation. Now CALCSUMS will calculate the result based on the record filters: it finds the sum of the Line Amount field in all records within the filtered range. Only sales lines in which all filtering conditions are satisfied will be taken into account:

Customer No. is '10000'
Document Type is Order or Invoice
Posting Date is between 25.12.2016 and 25.01.2017

Finally, we show the result as a message with the MESSAGE function. The placeholder %1 in the message text is replaced with the second parameter of the function (SalesLine."Line Amount").

Summary

So in this article we covered the introduction of Microsoft Dynamics NAV 2016 and the basics of C/AL programming.

Resources for Article:

Further resources on this subject: Code Analysis and Debugging Tools in Microsoft Dynamics NAV 2009 [article] Introduction to NAV 2017 [article] Exploring Microsoft Dynamics NAV – An Introduction [article]

BLE and the Internet of Things

Packt
02 Feb 2017
11 min read
In this article by Muhammad Usama bin Aftab, the author of the book Building Bluetooth Low Energy (BLE) Systems, we take a practical look at the world of the Internet of Things (IoT), where readers will not only learn the theoretical concepts of the Internet of Things but also get a number of practical examples. The purpose of this article is to bridge the gap between the knowledge base and its interpretation. Much literature is available for understanding this domain, but it is difficult to find something that follows a hands-on approach to the technology. In this article, readers will get an introduction to the Internet of Things with a special focus on Bluetooth Low Energy (BLE). It is easy to argue that the most important technology for the Internet of Things is Bluetooth Low Energy, as it is widely available throughout the world and almost every cell phone user carries this technology in their pocket. The article will then go beyond Bluetooth Low Energy and discuss many other technologies available for the Internet of Things. In this article we'll explore the following topics:

Introduction to the Internet of Things
Current statistics about the IoT and how we are living in a world that is moving towards Machine to Machine (M2M) communication
Technologies in the IoT (Bluetooth Low Energy, Bluetooth beacons, Bluetooth mesh, wireless gateways, and so on)
Typical examples of IoT devices (covering wearables, sports gadgets, autonomous vehicles, and so on)

(For more resources related to this topic, see here.)

Internet of Things

The Internet is a system of interconnected devices that uses a full stack of protocols over a number of layers. In the late 1960s, the first packet-switched network, ARPANET, was introduced by the United States Department of Defense (DoD); it used a variety of protocols. Later, with the invention of the TCP/IP protocols, the possibilities were infinite.
Many standards evolved over time to facilitate communication between devices over a network. Application layer protocols, routing layer protocols, access layer protocols, and physical layer protocols were designed to successfully transfer Internet packets from a source address to a destination address. Security risks were also addressed during this process, and we now live in a world where the Internet is an essential part of our lives. The world has progressed quite far from ARPANET, and the scientific community realized that the need to connect more and more devices was inevitable. Thus came the need for more Internet addresses. The Internet Protocol version 6 (IPv6) was developed to support an almost infinite number of devices. It uses 128-bit addresses, allowing 2^128 (about 3.4 x 10^38) devices to successfully transmit packets over the Internet. With this powerful addressing mechanism, it became possible to think beyond traditional communication over the Internet. The availability of more addresses opened the way to connect more and more devices. Although there are other limitations on expanding the number of connected devices, the addressing scheme removed a significant one.

Modern Day IoT

The idea of the modern day Internet of Things is not significantly old. Around 2013, the perception of the Internet of Things evolved, the reasons being the convergence of wireless technologies, the increased range of wireless communication, and significant advancements in embedded technology. It became possible to connect devices, buildings, light bulbs, and theoretically any device that has a power source and can be connected wirelessly. The combination of electronics, software, and network connectivity has already shown enough marvels in the computer industry in the last century, and the Internet of Things is no different. The Internet of Things is a network of connected devices that are aware of their surroundings.
Those devices are constantly or periodically transferring data to neighboring devices in order to fulfil certain responsibilities. These devices can be automobiles, sensors, lights, solar panels, refrigerators, heart monitoring implants, or any day-to-day device. These things have dedicated software and electronics to support wireless connectivity, implementing the protocol stack and the application-level programming needed to achieve the required functionality:

An illustration of connected devices in the Internet of Things

Real life examples of the Internet of Things

The Internet of Things is fascinatingly widespread in our surroundings, and the best way to see it is to go to a shopping mall and turn on your Bluetooth. The devices you will find are merely a drop in the bucket of the Internet of Things. Cars, watches, printers, jackets, cameras, light bulbs, street lights, and other devices that were once simple are now connected and continuously transferring data. Keep in mind that this wave of progress in the Internet of Things is only three years old, and it is not improbable to expect an adoption rate for this technology unlike anything we have seen before. The last decade tells us that the increase in Internet users was exponential, reaching the first billion in 2005, the second in 2010, and the third in 2014. Currently, there are 3.4 billion Internet users in the world. Although this trend looks unrealistic, the adoption rate of the Internet of Things is expected to be even steeper. Reports say that by 2020, there will be 50 billion connected devices in the world and 90 percent of vehicles will be connected to the Internet. This expansion is projected to bring $19 trillion in profits by the same year. By the end of this year, wearables will become a $6 billion market with 171 million devices sold. As the article suggests, we will discuss the different kinds of IoT devices available in the market today.
The article will not cover them all, but enough for the reader to get an idea of the possibilities ahead. The reader will also be able to define and identify potential candidates for future IoT devices.

Wearables

The most important and widely recognized form of the Internet of Things is wearables. By the traditional definition, wearables can be any item that can be worn, ranging from fashion accessories to smart watches. The Apple Watch is a prime example of wearable technology. It contains fitness tracking and health-oriented sensors/apps that work with iOS and other Apple products. A competitor of the Apple Watch is the Samsung Gear S2, which provides compatibility with Android devices and fitness sensors. Likewise, there are many other manufacturers building smart watches, including Motorola, Pebble, Sony, Huawei, Asus, LG, and Tag Heuer. These devices are more than just watches, as they form part of the Internet of Things: they can now transfer data, talk to your phone, read your heart rate, and connect directly to Wi-Fi. For example, a watch can now keep track of your steps and transfer this information to the cell phone:

Fitbit Blaze and Apple Watch

The fitness tracker

The fitness tracker is another important example of the Internet of Things, where the physical activities of an athlete are monitored and maintained. Fitness wearables are not confined to bands; there are smart shirts that monitor the fitness goals and progress of the athlete. We will discuss two examples of fitness trackers in this article: Fitbit and Athos smart apparel. The Blaze is a new product from Fitbit that resembles a smart watch; it is, however, a fitness-first watch targeted at the fitness market. It provides step tracking, sleep monitoring, and 24/7 heart rate monitoring. Some of Fitbit's competitors, like Garmin's vívoactive watch, provide built-in GPS capability as well.
Athos apparel is another example of a fitness wearable, providing heart rate and EMG sensors. Unlike a watch fitness tracker, its sensors are spread across the apparel. The theoretical definition of wearables may also include augmented and virtual reality headsets and Bluetooth earphones/headphones.

Smart home devices

The evolution of the Internet of Things is transforming the way we live our daily lives as people use wearables and other Internet of Things devices day to day. Another growing technology in the field of the Internet of Things is the smart home. Home automation, sometimes referred to as the smart home, results from extending the home with automated controls for things like heating, ventilation, lighting, air-conditioning, and security. This concept is fully supported by the Internet of Things, which demands the connection of devices in an environment. Although the concept of smart homes has existed since the early 1900s, it remained a niche technology that was either too expensive to deploy or too limited in capability. In the last decade, many smart home devices have been introduced into the market by major technology companies, lowering costs and opening the doors to mass adoption.

Amazon Echo

A significant development in the world of home automation was the launch of the Amazon Echo in late 2014. The Amazon Echo is a voice-enabled device that performs tasks just by recognizing voice commands. The device responds to the name Alexa, a keyword that can be used to wake up the device, followed by a command to perform specific tasks. Some basic commands that can be used for home automation tasks are:

Alexa, play some Adele.
Alexa, play playlist XYZ.
Alexa, turn the bedroom lights on (Bluetooth-enabled light bulbs, for example Philips Hue, must be present in order to fulfil this command).
Alexa, turn the heat up to 80 (a connected thermostat must be present to execute this command).
Alexa, what is the weather?
Alexa, what is my commute?
Alexa, play the audiobook A Game of Thrones.
Alexa, Wikipedia Packt Publishing.
Alexa, how many teaspoons are in one cup?
Alexa, set a timer for 10 minutes.

With these voice commands, Alexa is fully operable:

Amazon Echo, Amazon Tap and Amazon Dot (from left to right)

The Amazon Echo's main connectivity is through Bluetooth and Wi-Fi. Wi-Fi connectivity enables it to connect to the Internet and to other devices present on the network or worldwide. Bluetooth Low Energy, on the other hand, is used to connect to other devices in the home that are Bluetooth Low Energy capable. For example, Philips Hue bulbs and thermostats are controlled through Bluetooth Low Energy. At Google I/O 2016, Google announced a competing smart home device that will use Google as a backbone to perform various tasks, similar to Alexa. Google intends to use this device to further increase its presence in the smart home market, challenging Amazon and Alexa. Amazon has also launched the Amazon Dot and Amazon Tap. The Amazon Dot is a smaller version of the Echo that does not have speakers; external speakers can be connected to the Dot in order to get full access to Alexa. The Amazon Tap is a cheaper, portable, wireless version of the Amazon Echo.

Wireless bulbs

The Philips Hue wireless bulb is another example of a smart home device. It is a Bluetooth Low Energy connected light bulb that gives full control to the user through a smartphone. These bulbs can display millions of colors and can also be controlled remotely through the away-from-home feature.
The lights are also smart enough to sync with music:

Illustration of controlling Philips Hue bulbs with smartphones

Smart refrigerators

Our discussion of home automation would be incomplete without discussing kitchen and other household electronics, as several major vendors such as Samsung have begun offering smart appliances for a smarter home. The Family Hub refrigerator is a smart fridge that lets you access the Internet and run applications. It is also categorized as an Internet of Things device, as it is fully connected to the Internet and provides various controls to the user:

Samsung Family Hub refrigerator with touch controls

Summary

In this article we spoke about Internet of Things technology and how it is taking root in our real lives. The introduction to the Internet of Things discussed wearable devices, autonomous vehicles, smart light bulbs, and portable media streaming devices. Internet of Things technologies like Wireless Local Area Network (WLAN), Mobile Ad-hoc Networks (MANETs), and Zigbee were discussed in order to give a better understanding of the available choices in the IoT.

Resources for Article:

Further resources on this subject: Get Connected – Bluetooth Basics [article] IoT and Decision Science [article] Building Voice Technology on IoT Projects [article]

Introduction to Magento 2

Packt
02 Feb 2017
10 min read
In this article, Gabriel Guarino, the author of the book Magento 2 Beginners Guide, covers the following topics:

Magento as a lifestyle: Magento as a platform and the Magento community
Competitors: hosted and self-hosted e-commerce platforms
New features in Magento 2
What do you need to get started?

(For more resources related to this topic, see here.)

Magento as a lifestyle

Magento is an open source e-commerce platform. That is the short definition, but I would like to define Magento in light of the seven years that I have been part of the Magento ecosystem. In those seven years, Magento has evolved to the point where it is today: a complete solution backed by people with a passion for e-commerce. If you choose Magento as the platform for your e-commerce website, you will receive updates for the platform on a regular basis. Those updates include new features, improvements, and bug fixes to enhance the overall experience of your website. As a Magento specialist, I can confirm that Magento is a platform that can be customized to fit any requirement. This means that you can add new features, include third-party libraries, and customize the default behavior of Magento. As the saying goes, the only limit is your imagination. Whenever I have to talk about Magento, I always take some time to talk about its community. Sherrie Rohde is the Magento Community Manager, and she has shared some really interesting facts about the Magento community in 2016:

Delivered over 725 talks on Magento or at Magento-centric events
Produced over 100 podcast episodes around Magento
Organized and produced conferences and meetup groups in over 34 countries
Written over 1000 blog posts about Magento

Types of e-commerce solutions

There are two types of e-commerce solutions: hosted and self-hosted. We will analyze each type, covering the general information, pros, and cons of each platform in each category.
Self-hosted e-commerce solutions

A self-hosted e-commerce solution is a platform that runs on your server, which means that you can download the code, customize it based on your needs, and then deploy it on the server that you prefer. Magento is a self-hosted e-commerce solution, which means that you have absolute control over the customization and implementation of your Magento store.

WooCommerce

WooCommerce is a free shopping cart plugin for WordPress that can be used to create a full-featured e-commerce website. WooCommerce has been created following the same architecture and standards as WordPress, which means that you can customize it with themes and plugins. The plugin currently has more than 18,000,000 downloads and powers over 39% of all online stores.

Pros:
It can be downloaded for free
Easy setup and configuration
A lot of themes available
Almost 400 extensions in the marketplace
Support through the WooCommerce help desk

Cons:
WooCommerce cannot be used without WordPress
Some essential features are not included out of the box, such as PayPal as a payment method, which means that you need to buy several extensions to add those features
Adding custom features to WooCommerce through extensions can be expensive

PrestaShop

PrestaShop is a free open source e-commerce platform. The platform is currently used by more than 250,000 online stores and is backed by a community of more than 1,000,000 members. The company behind PrestaShop provides a range of paid services, such as technical support, migration, and training to run, manage, and maintain the store.
Pros:
Free and open source
310 integrated features
3,500 modules and templates in the marketplace
Downloaded over 4 million times
63 languages

Cons:
As with WooCommerce, many basic features are not included by default, and adding those features through extensions is expensive
Multiple bugs and complaints from the PrestaShop community

OpenCart

OpenCart is an open source platform for e-commerce, available under the GNU General Public License. OpenCart is a good choice for a basic e-commerce website.

Pros:
Free and open source
Easy learning curve
More than 13,000 extensions available
More than 1,500 themes available

Cons:
Limited features
Not ready for SEO
No cache management page in the admin panel
Hard to customize

Hosted e-commerce solutions

A hosted e-commerce solution is a platform that runs on the servers of the company that provides the service, which means that the solution is easier to set up, but there are limitations and you don't have the freedom to customize the solution according to your needs. The monthly or annual fees increase when the store attracts more traffic and has more customers and orders placed.

Shopify

Shopify is a cloud-based e-commerce platform for small and medium-sized businesses. The platform currently powers over 325,000 online stores in approximately 150 countries.

Pros:
No technical skills required to use the platform
Tool to import products from another platform during the sign-up process
More than 1,500 apps and integrations
24/7 support through phone, chat, and e-mail

Cons:
The source code is not provided
Recurring fee to use the platform
Hard to migrate from Shopify to another platform

BigCommerce

BigCommerce is one of the most popular hosted e-commerce platforms, powering more than 95,000 stores in 150 countries.
Pros:
No technical skills required to use the platform
More than 300 apps and integrations available
More than 75 themes available

Cons:
The source code is not provided
Recurring fee to use the platform
Hard to migrate from BigCommerce to another platform

New features in Magento 2

Magento 2 is the new generation of the platform, with new features, technologies, and improvements that make Magento one of the most robust and complete e-commerce solutions available at the moment. In this section, we will describe the main differences between Magento 1 and Magento 2.

New technologies

Composer: This is a dependency manager for PHP. Dependencies can be declared, and Composer will manage them by installing and updating them. In Magento 2, Composer simplifies the process of installing and upgrading extensions and upgrading Magento itself.
Varnish 4: This is an open source HTTP accelerator. Varnish stores pages and other assets in memory to reduce response time and network bandwidth consumption.
Full Page Caching: In Magento 1, Full Page Caching was only included in the Magento Enterprise Edition. In Magento 2, Full Page Caching is included in all editions, allowing the content of static pages to be cached, increasing performance and reducing server load.
Elasticsearch: This is a search engine that improves search quality in Magento and provides background re-indexing and horizontal scaling.
RequireJS: This is a library to load JavaScript files on the fly, reducing the number of HTTP requests and improving the speed of the Magento store.
jQuery: The frontend in Magento 1 was implemented using Prototype as the JavaScript library. In Magento 2, the JavaScript library is jQuery.
Knockout.js: This is an open source JavaScript library that implements the Model-View-ViewModel (MVVM) pattern, providing a great way of creating interactive frontend components.
LESS: This is an open source CSS preprocessor that allows developers to write styles for the store in a more maintainable and extendable way.
Magento UI Library: This is a modular frontend library that uses a set of mixins for general elements and allows developers to work more efficiently on frontend tasks.

New tools

Magento Performance Toolkit: This is a tool that allows merchants and developers to test the performance of a Magento installation and its customizations.
Magento 2 command-line tool: This is a tool to run a set of commands on a Magento installation to clear the cache, re-index the store, create database backups, enable maintenance mode, and more.
Data Migration Tool: This tool allows developers to migrate existing data from Magento 1.x to Magento 2. The tool includes verification, progress tracking, logging, and testing functions.
Code Migration Toolkit: This allows developers to migrate Magento 1.x extensions and customizations to Magento 2. Manual verification and updates are required in order to make Magento 1.x extensions compatible with Magento 2.
Magento 2 Developer Documentation: One of the complaints from the Magento community was that Magento 1 didn't have enough documentation for developers. To resolve this, the Magento team created the official Magento 2 Developer Documentation, with information for developers, system administrators, designers, and QA specialists.

Admin panel changes

Better UI: The admin panel has a new look and feel, which is more intuitive and easier to use. In addition, the admin panel is now responsive and can be viewed from any device at any resolution.
Inline editing: The admin panel grids allow inline editing to manage data in a more effective way.
Step-by-step product creation: The product add/edit page is one of the most important pages in the admin panel.
The Magento team worked hard to create a different experience when it comes to adding and editing products in the Magento admin panel, and the result is that you can manage products with a step-by-step page that includes the fields and import tools separated into different sections.

Frontend changes

Integrated video on the product page: Magento 2 allows uploading a video for a product, introducing a new way of displaying products in the catalog.
Simplified checkout: The steps on the checkout page have been reduced to allow customers to place orders in less time, increasing the conversion rate of the Magento store.
Register section removed from the checkout page: In Magento 1, the customer had the opportunity to register in step 1 of the checkout page. This required the customer to think about their account and password before completing the order. To make the checkout simpler, Magento 2 allows the customer to register from the order success page without delaying the checkout process.

What do you need to get started?

Magento is a really powerful platform, and there is always something new to learn. Just when you think you know everything about Magento, a new version is released with new features to discover. This makes Magento fun, and this makes Magento unique as an e-commerce platform. That being said, this book will be your guide to discovering everything you need to know to implement, manage, and maintain your first Magento store.
In addition to that, I would like to highlight additional resources that will be useful in your journey of mastering Magento:

Official Magento Blog (https://magento.com/blog): Get the latest news from the Magento team: best practices, customer stories, information related to events, and general Magento news
Magento Resources Library (https://magento.com/resources): Videos, webinars, and publications covering useful information organized by category: order management, marketing and merchandising, international expansion, customer experience, mobile architecture and technology, performance and scalability, security, payments and fraud, retail innovation, and business flexibility
Magento Release Information (http://devdocs.magento.com/guides/v2.1/release-notes/bk-release-notes.html): This is the place where you will get all the information about the latest Magento releases, including the highlights of each release, security enhancements, information about known issues, new features, and instructions for upgrading
Magento Security Center (https://magento.com/security): Information about each of the Magento security patches, as well as best practices and guidelines to keep your Magento store secure
Upcoming Events and Webinars (https://magento.com/events): The official list of upcoming Magento events, including live events and webinars
Official Magento Forums (https://community.magento.com): Get feedback from the Magento community in the official Magento Forums

Summary

In this article, we reviewed Magento 2 and the changes that have been introduced in the new version of the platform. We also analyzed the types of e-commerce solutions and the most important platforms available.

Resources for Article:

Further resources on this subject: Installing Magento [article] Magento: Payment and shipping method [article] Magento 2 – the New E-commerce Era [article]

What is Azure API Management?

Packt
01 Feb 2017
15 min read
In this article by Martin Abbott, Ashish Bhambhani, James Corbould, Gautam Gyanendra, Abhishek Kumar, and Mahindra Morar, authors of the book Robust Cloud Integration with Azure, we learn that it is important to know how to control and manage API assets that exist or are built as part of any enterprise development. (For more resources related to this topic, see here.) Typically, modern APIs are used to achieve one of the following two outcomes:

First, to expose on-premises line-of-business applications, such as Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) solutions, to other applications that need to consume and interact with these enterprise assets, both on-premises and in the cloud
Second, to provide access to the API for commercial purposes, to monetize access to the assets exposed by the API

The latter use case is important as it allows organizations to extend the use of their API investment, and it has led to what has become known as the API economy. The API economy provides a mechanism to gain additional value from data contained within the organizational boundary, whether that data exists in the cloud or on-premises. When providing access to information via an API, two considerations are important:

Compliance: This ensures that access to the API and the use of the API meet requirements around internal or legal policies and procedures, and it provides reporting and auditing information
Governance: This ensures the API is accessed and used only by those authorized to do so, in a way that is controlled and, if necessary, metered, and it provides reporting and auditing information, which can be used, for example, to provide usage information for billing

In order to achieve this at scale in an organization, a tool is required that can be used to apply both compliance and governance structures to an exposed endpoint.
This is required to ensure that the usage of the information behind that endpoint is limited only to those who should be allowed access, and only in a way that meets the requirements and policies of the organization. This is where API Management plays a significant role. There are two main types of tools that fit within this landscape and broadly fall under the banner of API Management:

API Management: These tools provide the compliance and governance control required to ensure that the exposed API is used appropriately and data is presented in the correct format. For example, a message may be received in XML format, but the consuming service may need the data in JSON format. They can also provide monitoring tools and access control that allow organizations to gain insight into the use of the API, perhaps with a view to charging a fee for access.
API Gateway: These tools provide the same or a similar level of management as normal API Management tools, but often include other functionality that allows some message mediation and message orchestration, thereby allowing more complex interactions and business processes to be modeled, exposed, and governed.

Microsoft Azure API Management falls under the first category, whilst Logic Apps provide the capabilities (and more) that API Gateways offer. Another important aspect of managing APIs is creating documentation that consumers can use to learn how to interact with and get the best out of the API. For APIs, it is generally not a case of build it and they will come, so documentation that includes endpoint and operation information, along with sample code, can lead to greater uptake of the API. Azure API Management is currently offered in three tiers: Developer, Standard, and Premium.
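The XML-to-JSON mediation mentioned above can be illustrated with a few lines of Python; this is a standalone sketch of the idea, not Azure API Management's actual policy engine, and the sample order message is invented (in the service itself, this kind of conversion is configured declaratively as a policy rather than written as code):

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical XML message received from a backend service
xml_message = "<order><id>42</id><total>19.99</total></order>"

root = ET.fromstring(xml_message)
# Flatten the child elements into a dict, as a gateway transformation might
as_dict = {child.tag: child.text for child in root}
json_message = json.dumps(as_dict)

print(json_message)  # {"id": "42", "total": "19.99"}
```

The backend keeps speaking XML while the consumer receives JSON, which is exactly the decoupling the management layer provides.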
The details associated with these tiers at the time of writing are shown below:

API Calls (per unit): Developer: 32 K/day (~1 M/month) | Standard: 7 M/day (~217 M/month) | Premium: 32 M/day (~1 B/month)
Data Transfer (per unit): Developer: 161 MB/day (~5 GB/month) | Standard: 32 GB/day (~1 TB/month) | Premium: 161 GB/day (~5 TB/month)
Cache: Developer: 10 MB | Standard: 1 GB | Premium: 5 GB
Scale-out: Developer: N/A | Standard: 4 units | Premium: Unlimited
SLA: Developer: N/A | Standard: 99.9% | Premium: 99.95%
Multi-Region Deployment: Developer: N/A | Standard: N/A | Premium: Yes
Azure Active Directory Integration: Developer: Unlimited user accounts | Standard: N/A | Premium: Unlimited user accounts
VPN: Developer: Yes | Standard: N/A | Premium: Yes

Key items of note in the table are Scale-out, Multi-Region Deployment, and Azure Active Directory Integration:

Scale-out: This defines how many instances, or units, of the API Management instance are possible; this is configured through the Azure Classic Portal
Multi-region deployment: When using the Premium tier, it is possible to deploy the API Management instance to many locations to provide a geographically distributed load
Azure Active Directory Integration: If an organization synchronizes an on-premises Active Directory domain to Azure, access to the API endpoints can be configured to use Azure Active Directory to provide same sign-on capabilities

The main use case for the Premium tier is an organization that has many hundreds or even thousands of APIs to expose to developers, or one where scale and integration with line-of-business APIs is critical.

The anatomy of Azure API Management

To understand how to get the best out of an API, it is important to understand some terms that are used for APIs and within Azure API Management; these are described here.

API and operations

An API provides an abstraction layer through an endpoint that allows interaction with entities or processes that would otherwise be difficult to consume.
Most API developers favor a RESTful approach because it makes it easy to understand how to work with the operations that the API exposes, and it provides scalability, modifiability, reliability, and performance. Representational State Transfer (REST) is an architectural style that was introduced by Roy Fielding in his doctoral thesis in 2000 (http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm). Typically, modern APIs are exposed over HTTP since this makes it easier for different types of clients to interact with them, and this increased interoperability provides the greatest opportunity to offer additional value and gain adoption across different technology stacks. When building an API, a set of methods or operations is exposed that a user can interact with in a predictable way. While RESTful services do not have to use HTTP as a transfer method, nearly all modern APIs do, since the HTTP standard is well known to most developers and is simple and straightforward to use. Since the operations are called via HTTP, a distinct endpoint or Uniform Resource Identifier (URI) is required to ensure sufficient modularity of the API service. When calling an endpoint, which may, for example, represent an entity in a line-of-business system, HTTP verbs (GET, POST, PUT, and DELETE, for example) are used to provide a standard way of interacting with the object.
An example of how these verbs are used by a developer to interact with an entity is given below:

Collection: GET: retrieve a list of entities and their URIs; POST: create a new entity in the collection; PUT: replace (update) a collection; DELETE: delete the entire collection
Entity: GET: retrieve a specific entity and its information, usually in a particular data format; POST: create a new entity in the collection (not generally used); PUT: replace (update) an entity in the collection, or create it if it does not exist; DELETE: delete a specific entity from a collection

When passing data to and receiving data from an API operation, the data needs to be encapsulated in a specific format. When services and entities were exposed through SOAP-based services, this data format was typically XML. For modern APIs, JavaScript Object Notation (JSON) has become the norm. JSON has become the format of choice since it has a smaller payload than XML and a smaller processing overhead, which suits the limited needs of mobile devices (often running on battery power). JavaScript (as the acronym JSON implies) also has good support for processing and generating JSON, and this suits developers, who can leverage existing toolsets and knowledge. API operations should abstract small amounts of work to be efficient, and in order to provide scalability, they should be stateless so that they can be scaled independently. Furthermore, PUT and DELETE operations must ensure a consistent state regardless of how many times a specific operation is performed; this leads to the need for those operations to be idempotent. Idempotency describes an operation that, when performed multiple times, produces the same result on the object that is being operated on. This is an important concept in computing, particularly where you cannot guarantee that an operation will only be performed once, such as with interactions over the Internet.
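The idempotency requirement can be illustrated with a minimal in-memory sketch, where a plain dict stands in for a collection; this is a conceptual illustration, not an HTTP service:

```python
# A minimal in-memory sketch of idempotent PUT and DELETE semantics.
# The collection here is just a dict keyed by entity id.
store = {}

def put(entity_id, body):
    """Replace the entity, creating it if absent (idempotent)."""
    store[entity_id] = body
    return store[entity_id]

def delete(entity_id):
    """Remove the entity; deleting twice leaves the same state."""
    store.pop(entity_id, None)

# Repeating the same PUT any number of times yields the same state.
put("42", {"name": "widget"})
put("42", {"name": "widget"})
assert store == {"42": {"name": "widget"}}

# Repeating DELETE is equally safe.
delete("42")
delete("42")
assert store == {}
```

By contrast, a POST that appends a new entity on each call is not idempotent, which is why it is reserved for creation within a collection.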
Another outcome of using a URI to expose entities is that the operation is easily modified and versioned because any new version can simply be made available on a different URI, and because HTTP is used as a transport mechanism, endpoint calls can be cached to provide better performance and HTTP Headers can be used to provide additional information, for example security. By default, when an instance of API Management is provisioned, it has a single API already available named Echo API. This has the following operations: Creating resource Modifying resource Removing resource Retrieving header only Retrieving resource Retrieving resource (cached) In order to get some understanding of how objects are connected, this API can be used, and some information is given in the next section. Objects within API Management Within Azure API Management, there are a number of key objects that help define a structure and provide the governance, compliance, and security artifacts required to get the best out of a deployed API, as shown in the following diagram: As can be seen, the most important object is a PRODUCT. A product has a title and description and is used to define a set of APIs that are exposed to developers for consumption. They can be Open or Protected, with an Open product being publicly available and a Protected product requiring a subscription once published. Groups provide a mechanism to organize the visibility of and access to the APIs within a product to the development community wishing to consume the exposed APIs. 
By default, a product has three standard groups that cannot be deleted: Administrators: Subscription administrators are included by default, and the members of this group manage API services instances, API creation, API policies, operations, and products Developers: The members of this group have authenticated access to the Developer Portal; they are the developers who have chosen to build applications that consume APIs exposed as a specific product Guests: Guests are able to browse products through the Developer Portal and examine documentation, and they have read-only access to information about the products In addition to these built-in groups, it is possible to create new groups as required, including the use of groups within an Azure Active Directory tenant. When a new instance of API Management is provisioned, it has the following two products already configured: Starter: This product limits subscribers to a maximum of five calls per minute up to a maximum of 100 calls per week Unlimited: This product has no limits on use, but subscribers can only use it with the administrator approval Both of these products are protected, meaning that they need to be subscribed to and published. They can be used to help gain some understanding of how the objects within API Management interact. These products are configured with a number of sample policies that can be used to provide a starting point. Azure API Management policies API Management policies are the mechanism used to provide governance structures around the API. They can define, for instance, the number of call requests allowed within a period, cross-origin resource sharing (CORS), or certificate authentication to a service backend. Policies are defined using XML and can be stored in source control to provide active management. Policies are discussed in greater detail later in the article. 
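As a rough illustration of the quota the Starter product enforces (five calls per minute, up to 100 calls per week), the check a rate-limiting policy performs can be sketched in Python. The class and its interface are invented for this example and are not APIM's implementation; real policies are declared in XML and evaluated by the gateway:

```python
from collections import deque

class RateLimitPolicy:
    """Sketch of a call-rate policy like the Starter product's
    (5 calls per minute, 100 calls per week). Timestamps are
    supplied by the caller so the example is deterministic."""

    def __init__(self, per_minute=5, per_week=100):
        self.per_minute = per_minute
        self.per_week = per_week
        self.calls = deque()  # accepted call timestamps, in seconds

    def allow(self, now):
        # Drop timestamps that have left the weekly window.
        week = 7 * 24 * 3600
        while self.calls and now - self.calls[0] >= week:
            self.calls.popleft()
        in_minute = sum(1 for t in self.calls if now - t < 60)
        if in_minute >= self.per_minute or len(self.calls) >= self.per_week:
            return False  # the caller would receive an HTTP 429 response
        self.calls.append(now)
        return True

policy = RateLimitPolicy()
results = [policy.allow(t) for t in range(6)]  # six calls within one minute
print(results)  # [True, True, True, True, True, False]
```

The sixth call inside the minute is rejected, which is exactly the behavior a subscriber to the Starter product would observe.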
Working with Azure API Management

Azure API Management is the outcome of Microsoft's acquisition of Apiphany, and as such it has its own management interfaces. Therefore, it has a slightly different look and feel to the standard Azure Portal content. The Developer and Publisher Portals are described in detail in this section, but first a new instance of API Management is required. Once an instance has been created (provisioning in the Azure infrastructure can take some time), most interactions take place through the Developer and Publisher Portals.

Policies in Azure API Management

In order to provide control over interactions with Products or APIs in Azure API Management, policies are used. Policies make it possible to change the default behavior of an API in the Product, for example, to meet the governance needs of your company or Product, and they are a series of statements executed sequentially on each request or response of an API. Three demo scenarios will provide a "taster" of this powerful feature of Azure API Management.

How to use Policies in Azure API Management

Policies are created and managed through the Publisher Portal. The first step in policy creation is to determine at what scope the policy should be applied. Policies can be assigned to all Products, individual Products, the individual APIs associated with a Product, and finally the individual operations associated with an API.

Secure your API in Azure API Management

We have previously discussed how it is possible to organize APIs into Products, with those products further refined through the use of Policies. Access to and visibility of products is controlled through the use of Groups and, for those APIs requiring subscriptions, developer subscriptions. In most enterprise scenarios where you are providing access to some line-of-business system on-premises, it is necessary to provide sufficient security on the API endpoint to ensure that the solution remains compliant.
There are a number of ways to achieve this level of security using Azure API Management, such as using certificates, Azure Active Directory, or extending the corporate network into Microsoft Azure using a Virtual Private Network (VPN), and creating a hybrid cloud solution. Securing your API backend with Mutual Certificates Certificate exchange allows Azure API Management and an API to create a trust boundary based on encryption that is well understood and easy to use. In this scenario, because Azure API Management is communicating with an API that has been provided, a self-signed certificate is allowed as the key exchange for the certificate is via a trusted party. For an in-depth discussion on how to configure Mutual Certificate authentication to secure your API, please refer to the Azure API Management documentation (https://azure.microsoft.com/en-us/documentation/articles/api-management-howto-mutual-certificates/). Securing your API backend with Azure Active Directory If an enterprise already uses Azure Active Directory to provide single or same sign-on to cloud-based services, for instance, on-premises Active Directory synchronization via ADConnect or DirSync, then this provides a good opportunity to leverage Azure Active Directory to provide a security and trust boundary to on-premises API solutions. For an in-depth discussion on how to add Azure Active Directory to an API Management instance, please see the Azure API Management documentation (https://azure.microsoft.com/en-us/documentation/articles/api-management-howto-protect-backend-with-aad/). VPN connection in Azure API Management Another way of providing a security boundary between Azure API Management and the API is managing the creation of a virtual private network. A VPN creates a tunnel between the corporate network edge and Azure, essentially creating a hybrid cloud solution. Azure API Management supports site-to-site VPNs, and these are created using the Classic Portal. 
If an organization already has an ExpressRoute circuit provisioned, this can also be used to provide connectivity via private peering. Because a VPN needs to communicate to on-premises assets, a number of firewall port exclusions need to be created to ensure the traffic can flow between the Azure API Management instance and the API endpoint. Monitoring your API Any application tool is only as good as the insight you can gain from the operation of the tool. Azure API Management is no exception and provides a number of ways of getting information about how the APIs are being used and are performing. Summary API Management can be used to provide developer access to key information in your organization, information that could be sensitive, or that needs to be limited in use. Through the use of Products, Policies, and Security, it is possible to ensure that firm control is maintained over the API estate. The developer experience can be tailored to provide a virtual storefront to any APIs along with information and blogs to help drive deeper developer engagement. Although not discussed in this article, it is also possible for developers to publish their own applications to the API Management instance for other developers to use. Resources for Article: Further resources on this subject: Creating Multitenant Applications in Azure [article] Building A Recommendation System with Azure [article] Putting Your Database at the Heart of Azure Solutions [article]
Packt
31 Jan 2017
12 min read

Building A Search Geo Locator with Elasticsearch and Spark

In this article, Alberto Paro, the author of the book Elasticsearch 5.x Cookbook - Third Edition, discusses how to use and manage Elasticsearch, covering topics such as installation/setup, mapping management, indices management, queries, aggregations/analytics, scripting, building custom plugins, and integration with Python, Java, Scala, and some big data tools such as Apache Spark and Apache Pig. (For more resources related to this topic, see here.)

Background

Elasticsearch is a common answer for almost every search need on data, and with its aggregation framework, it can provide analytics in real time. Elasticsearch was one of the first pieces of software able to bring search to the big data world. Its cloud-native design, JSON as the standard format for both data and search, and its HTTP-based approach are only the solid bases of this product. Elasticsearch solves a growing list of search, log analysis, and analytics challenges across virtually every industry. It's used by big companies such as LinkedIn, Wikipedia, Cisco, eBay, Facebook, and many others (source https://www.elastic.co/use-cases). In this article, we will show how to easily build a simple search geolocator with Elasticsearch, using Apache Spark for ingestion.

Objective

In this article, we will develop a search geolocator application using the world GeoNames database. To make this happen, the following steps will be covered:

Data collection
Optimized index creation
Ingestion via Apache Spark
Searching for a location name
Searching for a city given a location position
Executing some analytics on the dataset

All the article code is available on GitHub at https://github.com/aparo/elasticsearch-geonames-locator. All the commands below need to be executed in the code directory on Linux/macOS X. The requirements are a local Elasticsearch server instance, a working local Spark installation, and SBT installed (http://www.scala-sbt.org/).

Data collection

To populate our application, we need a database of geo locations.
One of the most famous and widely used datasets is the GeoNames geographical database, which is available for download free of charge under a Creative Commons Attribution license. It contains over 10 million geographical names and consists of over 9 million unique features, of which 2.8 million are populated places and 5.5 million are alternate names. It can be easily downloaded from http://download.geonames.org/export/dump. The dump directory provides CSV files divided by country, but in our case we'll take the dump with all the countries, the allCountries.zip file. To download it we can use wget via:

wget http://download.geonames.org/export/dump/allCountries.zip

Then we need to unzip it and put it in the downloads folder:

unzip allCountries.zip
mv allCountries.txt downloads

The GeoNames dump has the following fields:

1. geonameid: unique ID for this geoname
2. name: the name of the geoname
3. asciiname: ASCII representation of the name
4. alternatenames: other forms of this name, generally in several languages
5. latitude: latitude in decimal degrees of the geoname
6. longitude: longitude in decimal degrees of the geoname
7. fclass: feature class, see http://www.geonames.org/export/codes.html
8. fcode: feature code, see http://www.geonames.org/export/codes.html
9. country: ISO-3166 2-letter country code
10. cc2: alternate country codes, comma separated, ISO-3166 2-letter country codes
11. admin1: FIPS code (subject to change to ISO code)
12. admin2: code for the second administrative division, a county in the US
13. admin3: code for the third-level administrative division
14. admin4: code for the fourth-level administrative division
15. population: the population of the geoname
16. elevation: the elevation in meters of the geoname
17. gtopo30: digital elevation model
18. timezone: the timezone of the geoname
19. moddate: the date of the last change of this geoname

Table 1: Dataset characteristics

Optimized index creation

Elasticsearch provides automatic schema inference for your data, but the inferred schema is not the best
possible. Often you need to tune it for: Removing not-required fields Managing Geo fields. Optimizing string fields that are index twice in their tokenized and keyword version. Given the Geoname dataset, we will add a new field location that is a GeoPoint that we will use in geo searches. Another important optimization for indexing, it’s define the correct number of shards. In this case we have only 11M records, so using only 2 shards is enough. The settings for creating our optimized index with mapping and shards is the following one: { "mappings": { "geoname": { "properties": { "admin1": { "type": "keyword", "ignore_above": 256 }, "admin2": { "type": "keyword", "ignore_above": 256 }, "admin3": { "type": "keyword", "ignore_above": 256 }, "admin4": { "type": "keyword", "ignore_above": 256 }, "alternatenames": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "asciiname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "cc2": { "type": "keyword", "ignore_above": 256 }, "country": { "type": "keyword", "ignore_above": 256 }, "elevation": { "type": "long" }, "fclass": { "type": "keyword", "ignore_above": 256 }, "fcode": { "type": "keyword", "ignore_above": 256 }, "geonameid": { "type": "long" }, "gtopo30": { "type": "long" }, "latitude": { "type": "float" }, "location": { "type": "geo_point" }, "longitude": { "type": "float" }, "moddate": { "type": "date" }, "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "population": { "type": "long" }, "timezone": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } }, "settings": { "index": { "number_of_shards": "2", "number_of_replicas": "1" } } } We can store the above JSON in a file called settings.json and we can create an index via the curl command: curl -XPUT http://localhost:9200/geonames -d @settings.json Now our index is created and ready to receive our 
documents. Ingestion via Apache Spark Apache Spark is very hardy for processing CSV and manipulate the data before saving it in a storage both disk or NoSQL. Elasticsearch provides easy integration with Apache Spark allowing write Spark RDD with a single command in Elasticsearch. We will build a spark job called GeonameIngester that will execute the following steps: Initialize the Spark Job Parse the CSV Defining our required structures and conversions Populating our classes Writing the RDD in Elasticsearch Executing the Spark Job Initialize the Spark Job We need to import required classes: import org.apache.spark.sql.SparkSession import org.apache.spark.sql.types._ import org.elasticsearch.spark.rdd.EsSpark import scala.util.Try We define the GeonameIngester object and the SparkSession: object GeonameIngester { def main(args: Array[String]) { val sparkSession = SparkSession.builder .master("local") .appName("GeonameIngester") .getOrCreate() To easy serialize complex datatypes, we switch to use the Kryo encoder: import scala.reflect.ClassTag implicit def kryoEncoder[A](implicit ct: ClassTag[A]) = org.apache.spark.sql.Encoders.kryo[A](ct) import sparkSession.implicits._ Parse the CSV For parsing the CSV, we need to define the Geoname schema to be used to read: val geonameSchema = StructType(Array( StructField("geonameid", IntegerType, false), StructField("name", StringType, false), StructField("asciiname", StringType, true), StructField("alternatenames", StringType, true), StructField("latitude", FloatType, true), StructField("longitude", FloatType, true), StructField("fclass", StringType, true), StructField("fcode", StringType, true), StructField("country", StringType, true), StructField("cc2", StringType, true), StructField("admin1", StringType, true), StructField("admin2", StringType, true), StructField("admin3", StringType, true), StructField("admin4", StringType, true), StructField("population", DoubleType, true), // Asia population overflows Integer 
StructField("elevation", IntegerType, true), StructField("gtopo30", IntegerType, true), StructField("timezone", StringType, true), StructField("moddate", DateType, true))) Now we can read all the geonames from CSV via: val GEONAME_PATH = "downloads/allCountries.txt" val geonames = sparkSession.sqlContext.read .option("header", false) .option("quote", "") .option("delimiter", "t") .option("maxColumns", 22) .schema(geonameSchema) .csv(GEONAME_PATH) .cache() Defining our required structures and conversions The plain CSV data is not suitable for our advanced requirements, so we define new classes to store our Geoname data. We define a GeoPoint object to store the Geo Point location of our geoname. case class GeoPoint(lat: Double, lon: Double) We define also our Geoname class with optional and list types: case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String], latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String, cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4: Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate: String) To reduce the boilerplate of the conversion we define an implicit method that convert a String in an Option[String] if it is empty or null. implicit def emptyToOption(value: String): Option[String] = { if (value == null) return None val clean = value.trim if (clean.isEmpty) { None } else { Some(clean) } } During processing, in case of the population value is null we need a function to fix this value and set it to 0: to do this we define a function to fixNullInt: def fixNullInt(value: Any): Int = { if (value == null) 0 else { Try(value.asInstanceOf[Int]).toOption.getOrElse(0) } } Populating our classes We can populate the records that we need to store in Elasticsearch via a map on geonames DataFrame. 
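For readers following along in another language, the two Scala helpers above can be mirrored in Python; these are illustrative equivalents written for this article's explanation, not part of the project's codebase:

```python
def empty_to_none(value):
    """Python analogue of the Scala emptyToOption helper: map
    null/blank strings to None, otherwise return the trimmed value."""
    if value is None:
        return None
    clean = value.strip()
    return clean if clean else None

def fix_null_int(value):
    """Python analogue of fixNullInt: coerce missing or non-integer
    values to 0 so a null population or elevation cannot break ingestion."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0

print(empty_to_none("  "), empty_to_none(" RU "), fix_null_int(None), fix_null_int("144"))
# None RU 0 144
```

The idea is the same in both languages: normalize blank strings to an explicit "missing" value and give numeric fields a safe default.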
val records = geonames.map { row => val id = row.getInt(0) val lat = row.getFloat(4) val lon = row.getFloat(5) Geoname(id, row.getString(1), row.getString(2), Option(row.getString(3)).map(_.split(",").map(_.trim).filterNot(_.isEmpty).toList).getOrElse(Nil), lat, lon, GeoPoint(lat, lon), row.getString(6), row.getString(7), row.getString(8), row.getString(9), row.getString(10), row.getString(11), row.getString(12), row.getString(13), row.getDouble(14), fixNullInt(row.get(15)), row.getInt(16), row.getString(17), row.getDate(18).toString ) } Writing the RDD in Elasticsearch The final step is to store our new build DataFrame records in Elasticsearch via: EsSpark.saveToEs(records.toJavaRDD, "geonames/geoname", Map("es.mapping.id" -> "geonameid")) The value “geonames/geoname” are the index/type to be used for store the records in Elasticsearch. To maintain the same ID of the geonames in both CSV and Elasticsearch we pass an additional parameter es.mapping.id that refers to where find the id to be used in Elasticsearch geonameid in the above example. Executing the Spark Job To execute a Spark job you need to build a Jar with all the required library and than to execute it on spark. The first step is done via sbt assembly command that will generate a fatJar with only the required libraries. To submit the Spark Job in the jar, we can use the spark-submit command: spark-submit --class GeonameIngester target/scala-2.11/elasticsearch-geonames-locator-assembly-1.0.jar Now you need to wait (about 20 minutes on my machine) that Spark will send all the documents to Elasticsearch and that they are indexed. Searching for a location name After having indexed all the geonames, you can search for them. 
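EsSpark.saveToEs takes care of the actual write, but for illustration, the following Python sketch builds the kind of _bulk payload such a write corresponds to, with each document's _id drawn from geonameid just as es.mapping.id specifies. This is a hand-rolled sketch of the Elasticsearch bulk format, not the connector's internal wire protocol:

```python
import json

def to_bulk_ndjson(records, index="geonames", doc_type="geoname", id_field="geonameid"):
    """Build an Elasticsearch _bulk payload where each document's _id
    is taken from id_field, the same effect the es.mapping.id option
    achieves above."""
    lines = []
    for rec in records:
        # Action line naming the target index/type and the explicit id.
        action = {"index": {"_index": index, "_type": doc_type,
                            "_id": rec[id_field]}}
        lines.append(json.dumps(action))
        # Source line with the document body.
        lines.append(json.dumps(rec))
    return "\n".join(lines) + "\n"

payload = to_bulk_ndjson([{"geonameid": 524901, "name": "Moscow"}])
print(payload)
```

Because the id is stable, re-running the ingestion overwrites the same documents instead of duplicating them.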
In case we want search for Moscow, we need a complex query because: City in geonames are entities with fclass=”P” We want skip not populated cities We sort by population descendent to have first the most populated The city name can be in name, alternatenames or asciiname field To achieve this kind of query in Elasticsearch we can use a simple Boolean with several should queries for match the names and some filter to filter out unwanted results. We can execute it via curl via: curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{ "query": { "bool": { "minimum_should_match": 1, "should": [ { "term": { "name": "moscow"}}, { "term": { "alternatenames": "moscow"}}, { "term": { "asciiname": "moscow" }} ], "filter": [ { "term": { "fclass": "P" }}, { "range": { "population": {"gt": 0}}} ] } }, "sort": [ { "population": { "order": "desc"}}] }' We used “moscow” lowercase because it’s the standard token generate for a tokenized string (Elasticsearch text type). The result will be similar to this one: { "took": 14, "timed_out": false, "_shards": { "total": 2, "successful": 2, "failed": 0 }, "hits": { "total": 9, "max_score": null, "hits": [ { "_index": "geonames", "_type": "geoname", "_id": "524901", "_score": null, "_source": { "name": "Moscow", "location": { "lat": 55.752220153808594, "lon": 37.61555862426758 }, "latitude": 55.75222, "population": 10381222, "moddate": "2016-04-13", "timezone": "Europe/Moscow", "alternatenames": [ "Gorad Maskva", "MOW", "Maeskuy", .... ], "country": "RU", "admin1": "48", "longitude": 37.61556, "admin3": null, "gtopo30": 144, "asciiname": "Moscow", "admin4": null, "elevation": 0, "admin2": null, "fcode": "PPLC", "fclass": "P", "geonameid": 524901, "cc2": null }, "sort": [ 10381222 ] }, Searching for cities given a location position We have processed the geoname so that in Elasticsearch, we were able to have a GeoPoint field. Elasticsearch GeoPoint field allows to enable search for a lot of geolocation queries. 
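The same query body can, of course, be built programmatically before being sent to Elasticsearch. Here is a small Python sketch (the function name is ours) that assembles the bool query above for any city name:

```python
def city_search_body(name: str) -> dict:
    """Build the bool query used above for an arbitrary city name:
    match any of the three name fields, keep only populated places
    (fclass P, population > 0), and sort by population descending."""
    term = name.lower()  # tokenized text fields index lowercase terms
    return {
        "query": {
            "bool": {
                "minimum_should_match": 1,
                "should": [
                    {"term": {"name": term}},
                    {"term": {"alternatenames": term}},
                    {"term": {"asciiname": term}},
                ],
                "filter": [
                    {"term": {"fclass": "P"}},
                    {"range": {"population": {"gt": 0}}},
                ],
            }
        },
        "sort": [{"population": {"order": "desc"}}],
    }

body = city_search_body("Moscow")
print(body["query"]["bool"]["should"][0])  # {'term': {'name': 'moscow'}}
```

Serialized with json.dumps, this dict is byte-for-byte equivalent to the body passed to curl above.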
One of the most common search is to find cities near me via a Geo Distance Query. This can be achieved modifying the above search in curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{ "query": { "bool": { "filter": [ { "geo_distance" : { "distance" : "100km", "location" : { "lat" : 55.7522201, "lon" : 36.6155586 } } }, { "term": { "fclass": "P" }}, { "range": { "population": {"gt": 0}}} ] } }, "sort": [ { "population": { "order": "desc"}}] }' Executing an analytic on the dataset. Having indexed all the geonames, we can check the completes of our dataset and executing analytics on them. For example, it’s useful to check how many geonames there are for a single country and the feature class for every single top country to evaluate their distribution. This can be easily achieved using an Elasticsearch aggregation in a single query: curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d ' { "size": 0, "aggs": { "geoname_by_country": { "terms": { "field": "country", "size": 5 }, "aggs": { "feature_by_country": { "terms": { "field": "fclass", "size": 5 } } } } } }’ The result can be will be something similar: { "took": 477, "timed_out": false, "_shards": { "total": 2, "successful": 2, "failed": 0 }, "hits": { "total": 11301974, "max_score": 0, "hits": [ ] }, "aggregations": { "geoname_by_country": { "doc_count_error_upper_bound": 113415, "sum_other_doc_count": 6787106, "buckets": [ { "key": "US", "doc_count": 2229464, "feature_by_country": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 82076, "buckets": [ { "key": "S", "doc_count": 1140332 }, { "key": "H", "doc_count": 506875 }, { "key": "T", "doc_count": 225276 }, { "key": "P", "doc_count": 192697 }, { "key": "L", "doc_count": 79544 } ] } },…truncated… These are simple examples how to easy index and search data with Elasticsearch. Integrating Elasticsearch with Apache Spark it’s very trivial: the core of part is to design your index and your data model to efficiently use it. 
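Conceptually, the geo_distance filter keeps documents whose great-circle distance from the supplied point is within the radius. As a client-side illustration of that check, the haversine formula computes the same measure (6371 km is used here as the Earth's mean radius):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres, the same measure a
    geo_distance filter applies to the location field."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

# Distance between the query point used above and Moscow's location.
d = haversine_km(55.7522201, 36.6155586, 55.75222, 37.61556)
print(round(d, 1), "km")  # well inside the 100 km radius
assert d < 100
```

This is why the query above returns Moscow: its indexed location lies within 100 km of the given point, so the filter keeps the document.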
After having correctly indexed your data to cover your use case, Elasticsearch is able to return your results or analytics in a few milliseconds.

Summary

In this article, we learned how to easily build a simple search geolocator with Elasticsearch, using Apache Spark for ingestion. Resources for Article: Further resources on this subject: Basic Operations of Elasticsearch [article] Extending ElasticSearch with Scripting [article] Integrating Elasticsearch with the Hadoop ecosystem [article]

Packt
30 Jan 2017
12 min read

Working with NAV and Azure App Service

In this article by Stefano Demiliani, the author of the book Building ERP Solutions with Microsoft Dynamics NAV, we will see how to quickly solve a complex technical architectural scenario and create an external application that interacts with Microsoft Dynamics NAV. (For more resources related to this topic, see here.)

The business scenario

Imagine a requirement where many Microsoft Dynamics NAV instances (physically located at different places around the world) have to interact with an external application. A typical scenario could be the headquarters of a big enterprise company that has a business application (called HQAPP) that must collect data about item shipments from the ERP of the subsidiary companies around the world (Microsoft Dynamics NAV).

The cloud could help us handle this scenario efficiently. Why not place the interface layer in the Azure cloud and use the scalability features that Azure can offer? Azure App Service could be the solution. We can implement an architecture like the following schema:

Here, the interface layer is placed on Azure App Service. Every NAV instance has the business logic (in our scenario, a query to retrieve the desired data) exposed as a NAV web service. The NAV instance can have an Azure VPN in place for security. HQAPP performs a request to the interface layer in Azure App Service with the correct parameters. The cloud service then redirects the request to the correct NAV instance and retrieves the data, which in turn is forwarded to HQAPP. Azure App Service can be scaled (manually or automatically) based on the resources required to perform the data retrieval process.

Azure App Service overview

Azure App Service is a PaaS service for building scalable web and mobile apps and enabling interaction with on-premises or on-cloud data.
With Azure App Service, you can deploy your application to the cloud, quickly scale it to handle high traffic loads, and manage traffic and application availability without interacting with the underlying infrastructure. This is the main difference from an Azure VM, where you can run a web application in the cloud but in an IaaS environment (you control the infrastructure: OS, configuration, installed services, and so on).

Some key features of Azure App Service are as follows:

    Support for many languages and frameworks
    Global scale with high availability (scaling up and out, manually or automatically)
    Security
    Visual Studio integration for creating, deploying, and debugging applications
    Application templates and connectors

Azure App Service offers different types of resources for running a workload, which are as follows:

    Web Apps: This hosts websites and web applications
    Mobile Apps: This hosts mobile app backends
    API Apps: This hosts RESTful APIs
    Logic Apps: This automates business processes across the cloud

Azure App Service has different service plans that you can scale between, depending on your requirements in terms of resources:

    Free: This is ideal for testing and development; there is no support for custom domains or SSL, and you can deploy up to 10 applications.
    Shared: This has a fixed per-hour charge. It is ideal for testing and development, supports custom domains and SSL, and you can deploy up to 100 applications.
    Basic: This has a per-hour charge based on the number of instances. It runs on a dedicated instance and is ideal for low traffic requirements; you can deploy an unlimited number of apps. It supports only a single SSL certificate per plan (not ideal if you need to connect to an Azure VPN or use deployment slots).
    Standard: This has a per-hour charge based on the number of instances. This provides full SSL support.
This provides up to 10 instances with auto-scaling, automated backups, and up to five deployment slots, and it is ideal for production environments.

    Premium: This has a per-hour charge based on the number of instances. This provides up to 50 instances with auto-scaling, up to 20 deployment slots, daily backups, and a dedicated App Service Environment. It is ideal for enterprise scale and integration.

Regarding application deployment, Azure App Service supports the concept of deployment slots (only on the Standard and Premium tiers). A deployment slot is a feature that permits you to have a separate instance of an application that runs on the same VM but is isolated from the other deployment slots and the production slot active in the App Service. Always remember that all deployment slots share the same VM instance and the same server resources.

Developing the solution

Our solution is essentially composed of two parts:

    The NAV business logic
    The interface layer (cloud service)

The following steps will help you retrieve the required data from an external application:

In the NAV instances of the subsidiary companies, we need to retrieve the sales shipment data for every item. To do so, we need to create a Query object that reads Sales Shipment Header and Sales Shipment Line and expose it as a web service (OData). The Query object will be designed as follows: for every Sales Shipment Header record, we retrieve the corresponding Sales Shipment Line records (both have Type set to DataItem):

I've renamed the No. field of the Sales Shipment Line DataItem to ItemNo because the default name was already in use in the Sales Shipment Header DataItem. Compile and save the Query object (here, I've used Object ID 50009 and Service Name Item Shipments).
Now, we will publish the Query object as a web service in NAV, so open the Web Services page and create the following entry:

    Object Type: Query
    Object ID: 50009
    Service Name: Item Shipments
    Published: TRUE

When published, NAV returns the OData service URL. This Query object must be published as a web service on every NAV instance in the subsidiary companies.

To develop our interface layer, we first need to download and install (if not present) the Azure SDK for Visual Studio from https://azure.microsoft.com/en-us/downloads/. After that, we can create a new Azure Cloud Service project: open Visual Studio, navigate to File | New | Project, select the Cloud templates, and choose Azure Cloud Service. Enter the project's name (here, it is NAVAzureCloudService) and click on OK.

After clicking on OK, Visual Studio asks you to select a service type. Select WCF Service Web Role, as shown in the following screenshot:

Visual Studio now creates a template for our solution. Now right-click on the NAVAzureCloudService project and select New Web Role Project, and in the Add New .NET Framework Role Project window, select WCF Service Web Role and give it a proper name (here, we have named it WCFServiceWebRoleNAV):

Then, rename Service1.svc to a better name (here, it is NAVService.svc). Our WCF Service Web Role must have a reference to all the NAV web service URLs for the various NAV instances in our scenario and (if we want to use impersonation) the credentials to access each NAV instance. Right-click the WCFServiceWebRoleNAV project, select Properties, and then select the Settings tab. Here you can add the URL for the various NAV instances and the relative web service credentials.

Let's start writing our service code.
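As a point of reference, a NAV OData service URL usually follows the shape http://&lt;server&gt;:&lt;port&gt;/&lt;instance&gt;/OData/Company('&lt;company&gt;')/&lt;service&gt;, with spaces in names percent-encoded; always prefer the exact URL that NAV returns in the Web Services page. The following sketch shows how such a URL is composed (server, port, instance, company, and the underscore form of the service name are all hypothetical placeholders):

```python
from urllib.parse import quote

def nav_odata_url(server, port, instance, company, service_name):
    """Compose a NAV OData service URL in the usual
    http://<server>:<port>/<instance>/OData/Company('<company>')/<service> shape."""
    return (f"http://{server}:{port}/{instance}/OData/"
            f"Company('{quote(company)}')/{service_name}")

# Hypothetical subsidiary server; NAV typically exposes the service name
# with underscores instead of spaces.
url = nav_odata_url("navsrv-it", 7048, "DynamicsNAV", "CRONUS Italia", "Item_Shipments")
print(url)
```

A URL built this way is only a sanity check for what the Web Services page should hand back; the published URL from NAV remains the authoritative value to store in the interface layer's settings.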
We create a class called SalesShipment that defines our data model as follows:

public class SalesShipment
{
    public string No { get; set; }
    public string CustomerNo { get; set; }
    public string ItemNo { get; set; }
    public string Description { get; set; }
    public string Description2 { get; set; }
    public string UoM { get; set; }
    public decimal? Quantity { get; set; }
    public DateTime? ShipmentDate { get; set; }
}

In the next step, we have to define our service contract (interface). Our service will have a single method to retrieve shipments for a NAV instance, with a shipment date filter. The service contract will be defined as follows:

[ServiceContract]
public interface INAVService
{
    [OperationContract]
    [WebInvoke(Method = "GET",
        ResponseFormat = WebMessageFormat.Xml,
        BodyStyle = WebMessageBodyStyle.Wrapped,
        UriTemplate = "getShipments?instance={NAVInstanceName}&date={shipmentDateFilter}")]
    List<SalesShipment> GetShipments(string NAVInstanceName, string shipmentDateFilter);
    // Date format for the shipmentDateFilter parameter: YYYY-MM-DD
}

The WCF service definition will implement the previously defined interface as follows:

public class NAVService : INAVService
{
}

The GetShipments method is implemented as follows:

public List<SalesShipment> GetShipments(string NAVInstanceName, string shipmentDateFilter)
{
    try
    {
        DataAccessLayer.DataAccessLayer DAL = new DataAccessLayer.DataAccessLayer();
        List<SalesShipment> list = DAL.GetNAVShipments(NAVInstanceName, shipmentDateFilter);
        return list;
    }
    catch (Exception)
    {
        // You can handle exceptions here…
        throw;
    }
}

This method creates an instance of a DataAccessLayer class (which we will discuss in detail later) and calls a method called GetNAVShipments, passing the NAV instance name and the shipment date filter.
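Because the contract above uses a GET UriTemplate, a client such as HQAPP only needs to build a plain HTTP request URL. A minimal sketch of how that URL is composed from the UriTemplate (the instance key and date value here are hypothetical):

```python
from urllib.parse import urlencode

def shipments_request_url(service_base, instance, date_filter):
    """Compose the GET request matching the WCF UriTemplate
    getShipments?instance={NAVInstanceName}&date={shipmentDateFilter}."""
    query = urlencode({"instance": instance, "date": date_filter})
    return f"{service_base}/getShipments?{query}"

url = shipments_request_url(
    "http://navazurecloudservice.cloudapp.net/NAVService.svc",
    "NAVItaly",     # hypothetical NAV instance key configured in the service settings
    "2017-01-01")   # date format: YYYY-MM-DD
print(url)
```

Any HTTP client (a browser, curl, or HQAPP's own code) can then issue a GET against this URL and receive the wrapped XML response.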
To call the NAV business logic, we need a reference to the NAV OData web service (only to generate a proxy class; the real service URL will be called dynamically by code), so right-click on your project (WCFServiceWebRoleNAV) and navigate to Add | Service Reference. In the Add Service Reference window, paste the OData URL that comes from NAV and, when the service is discovered, give it a reference name (here, it is NAVODATAWS). Visual Studio automatically adds a service reference to your project.

The DataAccessLayer class will be responsible for handling calls to the NAV OData web service. This class defines a method called GetNAVShipments with the following two parameters:

    NAVInstanceName: This is the name of the NAV instance to call
    shipmentDateFilter: This is the filter date for the NAV shipment lines (greater than or equal to)

According to NAVInstanceName, the method retrieves the correct NAV OData URL and credentials from the web.config file (application settings), calls the NAV query (by passing the filters), and retrieves the data as a list of SalesShipment records (our data model).
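For reference, the values entered in the project's Settings tab are persisted by Visual Studio in web.config. A sketch of the naming convention that GetNAVShipments expects, assuming the default WCFServiceWebRoleNAV.Properties.Settings section name (the instance key NAVItaly and all values are hypothetical):

```xml
<applicationSettings>
  <WCFServiceWebRoleNAV.Properties.Settings>
    <setting name="NAVItaly" serializeAs="String">
      <value>http://navsrv-it:7048/DynamicsNAV/OData/Company('CRONUS%20Italia')/Item_Shipments</value>
    </setting>
    <setting name="NAVItaly_User" serializeAs="String">
      <value>navservice</value>
    </setting>
    <setting name="NAVItaly_Pwd" serializeAs="String">
      <value>…</value>
    </setting>
    <setting name="NAVItaly_Domain" serializeAs="String">
      <value>MYDOMAIN</value>
    </setting>
  </WCFServiceWebRoleNAV.Properties.Settings>
</applicationSettings>
```

One group of four settings (URL, _User, _Pwd, _Domain) is added per NAV instance, so adding a new subsidiary only requires new configuration entries, not code changes.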
The DataAccessLayer class is defined as follows:

public List<SalesShipment> GetNAVShipments(string NAVInstanceName, string shipmentDateFilter)
{
    try
    {
        string URL = Properties.Settings.Default[NAVInstanceName].ToString();
        string WS_User = Properties.Settings.Default[NAVInstanceName + "_User"].ToString();
        string WS_Pwd = Properties.Settings.Default[NAVInstanceName + "_Pwd"].ToString();
        string WS_Domain = Properties.Settings.Default[NAVInstanceName + "_Domain"].ToString();

        NAVODATAWS.NAV NAV = new NAVODATAWS.NAV(new Uri(URL));
        NAV.Credentials = new System.Net.NetworkCredential(WS_User, WS_Pwd, WS_Domain);

        DataServiceQuery<NAVODATAWS.ItemShipments> q =
            NAV.CreateQuery<NAVODATAWS.ItemShipments>("ItemShipments");

        if (shipmentDateFilter != null)
        {
            string FilterValue = string.Format("Shipment_Date ge datetime'{0}'", shipmentDateFilter);
            q = q.AddQueryOption("$filter", FilterValue);
        }

        List<NAVODATAWS.ItemShipments> list = q.Execute().ToList();
        List<SalesShipment> sslist = new List<SalesShipment>();

        foreach (NAVODATAWS.ItemShipments shpt in list)
        {
            SalesShipment ss = new SalesShipment();
            ss.No = shpt.No;
            ss.CustomerNo = shpt.Sell_to_Customer_No;
            ss.ItemNo = shpt.ItemNo;
            ss.Description = shpt.Description;
            ss.Description2 = shpt.Description_2;
            ss.UoM = shpt.Unit_of_Measure;
            ss.Quantity = shpt.Quantity;
            ss.ShipmentDate = shpt.Shipment_Date;
            sslist.Add(ss);
        }

        return sslist;
    }
    catch (Exception)
    {
        throw;
    }
}

The method returns a list of SalesShipment objects. It creates an instance of the NAV OData web service proxy, applies the OData filter to the NAV query, reads the results, and loads the list of SalesShipment objects.
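The $filter expression built in GetNAVShipments follows the OData datetime literal syntax. A quick sketch of that string construction in isolation, just to make the resulting filter explicit:

```python
def shipment_date_filter(date_filter):
    """Build the OData $filter used by the DataAccessLayer:
    greater-than-or-equal ('ge') on the Shipment_Date field."""
    return "Shipment_Date ge datetime'{0}'".format(date_filter)

print(shipment_date_filter("2017-01-01"))
# -> Shipment_Date ge datetime'2017-01-01'
```

This value is what ends up appended to the OData request as $filter=Shipment_Date ge datetime'2017-01-01', so NAV itself performs the filtering server-side instead of returning the whole shipment history.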
Deployment to Azure App Service

Now that your service is ready, you have to deploy it to Azure by performing the following steps:

Right-click on the NAVAzureCloudService project and select Package… as shown in the following screenshot:

In the Package Azure Application window, select Service configuration as Cloud and Build configuration as Release, and then click on Package as shown in the following screenshot:

This operation creates two files in the <YourProjectName>\bin\Release\app.publish folder as shown in the following screenshot:

These are the packages that must be deployed to Azure. To do so, log in to the Azure Portal and navigate to Cloud Services | Add from the hub menu on the left. In the next window, set the following cloud service parameters:

    DNS name: This is the name of your cloud service (yourname.cloudapp.net)
    Subscription: This is the Azure subscription where the cloud service will be added
    Resource group: Create a new resource group for your cloud service or use an existing one
    Location: This is the Azure location where the cloud service is to be created

Finally, click on the Create button to create your cloud service. Now, deploy the previously created cloud packages to the cloud service that was just created.
In the cloud services list, click on NAVAzureCloudService, and in the next window, select the desired slot (for example, the Production slot) and click on Upload as shown in the following screenshot:

In the Upload a package window, provide the following parameters:

    Storage account: This is a previously created storage account in your subscription
    Deployment label: This is the name of your deployment
    Package: Select the .cspkg file previously created for your cloud service
    Configuration: Select the .cscfg file previously created for your cloud service configuration

You can take a look at the preceding parameters in the following screenshot:

Select the Start deployment checkbox and click on the OK button at the bottom to start the deployment process to Azure. Now you can start your cloud service and manage it (swap, scaling, and so on) directly from the Azure Portal.

When running, you can use your deployed service by reaching this URL: http://navazurecloudservice.cloudapp.net/NAVService.svc

This is the URL that the HQAPP in our business scenario has to call for retrieving data from the various NAV instances of the subsidiary companies around the world. In this way, you have deployed a service to the cloud, you can manage its resources in a central way (via the Azure Portal), and you can easily have different environments by using slots.

Summary

In this article, you learned how to enable NAV instances located at different places to interact with an external application through Azure App Service, along with the features that it provides.

Resources for Article:

Further resources on this subject:

Introduction to NAV 2017 [article]
Code Analysis and Debugging Tools in Microsoft Dynamics NAV 2009 [article]
Exploring Microsoft Dynamics NAV – An Introduction [article]