How-To Tutorials


Up and Running with pandas

Packt
18 Jul 2017
15 min read
In this article by Michael Heydt, author of the book Learning pandas - Second Edition, we will cover how to install pandas and start using its basic functionality. The content is provided as IPython and Jupyter notebooks, and hence we will also take a quick look at using both of those tools. We will utilize the Anaconda Scientific Python distribution from Continuum. Anaconda is a popular Python distribution with both free and paid components, and it provides cross-platform support, including Windows, Mac, and Linux. The base distribution of Anaconda installs pandas, IPython, and Jupyter Notebook, making it almost trivial to get started. (For more resources related to this topic, see here.)

IPython and Jupyter Notebook

So far we have executed Python from the command line/terminal. This is the default Read-Eval-Print Loop (REPL) that comes with Python. Let's take a brief look at both IPython and Jupyter Notebook.

IPython

IPython is an alternative shell for working interactively with Python. It provides several enhancements to the default REPL provided by Python. If you want to learn about IPython in more detail, check out the documentation at https://ipython.org/ipython-doc/3/interactive/tutorial.html.

To start IPython, execute the ipython command from the command line/terminal. When it starts, you will see an input prompt that shows In [1]:. Each time you enter a statement in the IPython REPL, the number in the prompt increases. Likewise, output from any particular entry is prefaced with Out [x]:, where x matches the number of the corresponding In [x]:. This numbering of in and out statements is important to the examples, as all examples are prefaced with In [x]: and Out [x]: so that you can follow along. Note that these numberings are purely sequential. If you are following the code in the text and make an error in input or enter additional statements, the numbering may get out of sequence (it can be reset by exiting and restarting IPython). Please use the numbers purely as a reference.

Jupyter Notebook

Jupyter Notebook is the evolution of IPython Notebook. It is an open source web application that allows you to create and share documents that contain live code, equations, visualizations, and markdown. The original IPython Notebook was constrained to Python as the only language; Jupyter Notebook has evolved to allow many programming languages to be used, including Python, R, Julia, Scala, and F#. If you want to take a deeper look at Jupyter Notebook, head to http://jupyter.org/. Jupyter Notebook can be downloaded and used independently of Python, and Anaconda installs it by default.

To start a Jupyter Notebook, issue the following command at the command line/terminal:

$ jupyter notebook

To demonstrate, let's look at how to run the example code that comes with the text. Download the code from the Packt website and unzip the file to a directory of your choosing. In that directory, issue the jupyter notebook command. A browser window will open displaying the Jupyter Notebook home page at http://localhost:8888/tree, which is a directory listing of the notebooks. Clicking on a .ipynb link opens a notebook page. The notebook that is displayed is HTML that was generated by Jupyter and IPython.
A notebook consists of a number of cells that can be one of four types: code, markdown, raw nbconvert, or heading. Jupyter runs an IPython kernel for each notebook. Cells that contain Python code are executed within that kernel and the results are added to the notebook as HTML. Double-clicking on any cell makes the cell editable. When you are done editing the contents of a cell, press Shift + Enter, at which point Jupyter/IPython will evaluate the contents and display the results. If you want to learn more about the notebook format that underlies the pages, see https://ipython.org/ipython-doc/3/notebook/nbformat.html.

The toolbar at the top of a notebook gives you a number of ways to manipulate the notebook. These include adding, removing, and moving cells up and down in the notebook. Also available are commands to run cells, rerun cells, and restart the underlying IPython kernel.

To create a new notebook, go to the File > New Notebook > Python 3 menu item. A new notebook page will be created in a new browser tab; its name will be Untitled. The notebook consists of a single code cell that is ready to have Python entered. Enter 1+1 in the cell and press Shift + Enter to execute it. The cell is executed and the result is shown in the corresponding Out [x]: line, and Jupyter opens a new cell for you to enter more code or markdown. Jupyter Notebook automatically saves your changes every minute, but it is still a good idea to save manually every once in a while.

One final point before we look at a little bit of pandas: code in the text will be in the format of command-line IPython. As an example, the cell we just created in our notebook will be shown as follows:

In [1]: 1+1
Out [1]: 2

Introducing the pandas Series and DataFrame

Let's jump into using pandas with a brief introduction to its two main data structures, the Series and the DataFrame. We will examine the following:

- Importing pandas into your application
- Creating and manipulating a pandas Series
- Creating and manipulating a pandas DataFrame
- Loading data from a file into a DataFrame

The pandas Series

The pandas Series is the base data structure of pandas. A Series is similar to a NumPy array, but it differs by having an index, which allows for much richer lookup of items than just a zero-based array position. The following creates a Series from a Python list:

In [2]: # create a four item Series
        s = Series([1, 2, 3, 4])
        s
Out [2]: 0    1
         1    2
         2    3
         3    4
         dtype: int64

The output consists of two columns of information. The first is the index and the second is the data in the Series. Each row of the output shows the index label (in the first column) and then the value associated with that label. Because this Series was created without specifying an index (something we will do next), pandas automatically creates an integer index with labels starting at 0 and increasing by one for each data item.

The values of a Series object can then be accessed using the [] operator, passing the label for the value you require. The following gets the value for the label 1:

In [3]: s[1]
Out [3]: 2

This looks very much like normal array access in many programming languages. But as we will see, the index does not have to start at 0, nor increment by one, and it can be of many data types other than integers. This ability to associate flexible indexes with data is one of the great superpowers of pandas. Multiple items can be retrieved by specifying their labels in a Python list.
The following retrieves the values at labels 1 and 3:

In [4]: # return a Series with the rows with labels 1 and 3
        s[[1, 3]]
Out [4]: 1    2
         3    4
         dtype: int64

A Series object can be created with a user-defined index by using the index parameter and specifying the index labels. The following creates a Series with the same values but with an index consisting of string values:

In [5]: # create a series using an explicit index
        s = pd.Series([1, 2, 3, 4], index = ['a', 'b', 'c', 'd'])
        s
Out [5]: a    1
         b    2
         c    3
         d    4
         dtype: int64

Data in the Series object can now be accessed by those alphanumeric index labels. The following retrieves the values at index labels 'a' and 'd':

In [6]: # look up items in the series having index 'a' and 'd'
        s[['a', 'd']]
Out [6]: a    1
         d    4
         dtype: int64

It is still possible to refer to the elements of this Series object by their numerical 0-based position:

In [7]: # passing a list of integers to a Series that has
        # non-integer index labels will look up based upon
        # 0-based index like an array
        s[[1, 2]]
Out [7]: b    2
         c    3
         dtype: int64

We can examine the index of a Series using the .index property:

In [8]: # get only the index of the Series
        s.index
Out [8]: Index(['a', 'b', 'c', 'd'], dtype='object')

The index is itself actually a pandas object, and this output shows us the values of the index and the data type used for the index. In this case, note that the type of the data in the index (referred to as the dtype) is object and not string.

A common usage of a Series in pandas is to represent a time series that associates date/time index labels with values. The following demonstrates by creating a date range using the pandas function pd.date_range():

In [9]: # create a Series whose index is a series of dates
        # between the two specified dates (inclusive)
        dates = pd.date_range('2016-04-01', '2016-04-06')
        dates
Out [9]: DatetimeIndex(['2016-04-01', '2016-04-02', '2016-04-03',
                        '2016-04-04', '2016-04-05', '2016-04-06'],
                       dtype='datetime64[ns]', freq='D')

This has created a special index in pandas referred to as a DatetimeIndex, a specialized type of pandas index that is optimized to index data with dates and times.

Now let's create a Series using this index. The data values represent high temperatures on those specific days:

In [10]: # create a Series with values (representing temperatures)
         # for each date in the index
         temps1 = Series([80, 82, 85, 90, 83, 87], index = dates)
         temps1
Out [10]: 2016-04-01    80
          2016-04-02    82
          2016-04-03    85
          2016-04-04    90
          2016-04-05    83
          2016-04-06    87
          Freq: D, dtype: int64

This type of Series with a DatetimeIndex is referred to as a time series. We can look up a temperature on a specific date by using the date as a string:

In [11]: temps1['2016-04-04']
Out [11]: 90

Two Series objects can be combined with an arithmetic operation. The following code creates a second Series and calculates the difference in temperature between the two:

In [12]: # create a second series of values using the same index
         temps2 = Series([70, 75, 69, 83, 79, 77], index = dates)
         # the following aligns the two by their index values
         # and calculates the difference at those matching labels
         temp_diffs = temps1 - temps2
         temp_diffs
Out [12]: 2016-04-01    10
          2016-04-02     7
          2016-04-03    16
          2016-04-04     7
          2016-04-05     4
          2016-04-06    10
          Freq: D, dtype: int64

The result of an arithmetic operation (+, -, /, *, ...) on two Series objects is another Series object, with the operands aligned by their index labels.
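It is worth emphasizing that this alignment is by label, not by position. The following short example is not from the original article; it is an illustrative sketch, assuming the same import of pandas as pd used throughout, showing what happens when the two Series do not share all of their index labels:

import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# labels are aligned before the addition; labels present in only one
# Series ('a' and 'd') have no matching value and so produce NaN
s1 + s2
# a     NaN
# b    12.0
# c    23.0
# d     NaN
# dtype: float64

Note that the result dtype becomes float64, because NaN cannot be represented in an integer Series.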
Since the index labels in temp_diffs are not integers, we can also look up values by their 0-based position:

In [13]: # and also possible by integer position as if the
         # series was an array
         temp_diffs[2]
Out [13]: 16

Finally, pandas provides many descriptive statistical methods. As an example, the following returns the mean of the temperature differences:

In [14]: # calculate the mean of the values in the Series
         temp_diffs.mean()
Out [14]: 9.0

The pandas DataFrame

A pandas Series can only have a single value associated with each index label. To have multiple values per index label we can use a DataFrame. A DataFrame represents one or more Series objects aligned by index label. Each Series is a column in the DataFrame, and each column can have an associated name. In a way, a DataFrame is analogous to a relational database table in that it contains one or more columns of data of heterogeneous types (but a single type for all items in each respective column).

The following creates a DataFrame object with two columns using the temperature Series objects:

In [15]: # create a DataFrame from the two series objects temps1
         # and temps2 and give them column names
         temps_df = DataFrame(
             {'Missoula': temps1,
              'Philadelphia': temps2})
         temps_df
Out [15]:             Missoula  Philadelphia
          2016-04-01        80            70
          2016-04-02        82            75
          2016-04-03        85            69
          2016-04-04        90            83
          2016-04-05        83            79
          2016-04-06        87            77

The resulting DataFrame has two columns named Missoula and Philadelphia. These columns are new Series objects contained within the DataFrame, with the values copied from the original Series objects.

Columns in a DataFrame object can be accessed using an array indexer [] with the name of the column or a list of column names. The following code retrieves the Missoula column:

In [16]: # get the column with the name Missoula
         temps_df['Missoula']
Out [16]: 2016-04-01    80
          2016-04-02    82
          2016-04-03    85
          2016-04-04    90
          2016-04-05    83
          2016-04-06    87
          Freq: D, Name: Missoula, dtype: int64

And the following code retrieves the Philadelphia column:

In [17]: # likewise we can get just the Philadelphia column
         temps_df['Philadelphia']
Out [17]: 2016-04-01    70
          2016-04-02    75
          2016-04-03    69
          2016-04-04    83
          2016-04-05    79
          2016-04-06    77
          Freq: D, Name: Philadelphia, dtype: int64

A Python list of column names can also be used to return multiple columns:

In [18]: # return both columns in a different order
         temps_df[['Philadelphia', 'Missoula']]
Out [18]:             Philadelphia  Missoula
          2016-04-01            70        80
          2016-04-02            75        82
          2016-04-03            69        85
          2016-04-04            83        90
          2016-04-05            79        83
          2016-04-06            77        87

There is a subtle difference here between a DataFrame object and a Series object. Passing a list to the [] operator of a DataFrame retrieves the specified columns, whereas a Series would return rows.

If the name of a column does not have spaces, it can be accessed using property-style syntax:

In [19]: # retrieve the Missoula column through property syntax
         temps_df.Missoula
Out [19]: 2016-04-01    80
          2016-04-02    82
          2016-04-03    85
          2016-04-04    90
          2016-04-05    83
          2016-04-06    87
          Freq: D, Name: Missoula, dtype: int64

Arithmetic operations between columns within a DataFrame are identical in operation to those on multiple Series.
To demonstrate, the following code calculates the difference between the temperatures using property notation:

In [20]: # calculate the temperature difference between the two
         # cities
         temps_df.Missoula - temps_df.Philadelphia
Out [20]: 2016-04-01    10
          2016-04-02     7
          2016-04-03    16
          2016-04-04     7
          2016-04-05     4
          2016-04-06    10
          Freq: D, dtype: int64

A new column can be added to a DataFrame simply by assigning another Series to a column using the array indexer [] notation. The following adds a new column to the DataFrame with the temperature differences:

In [21]: # add a column to temps_df which contains the difference
         # in temps
         temps_df['Difference'] = temp_diffs
         temps_df
Out [21]:             Missoula  Philadelphia  Difference
          2016-04-01        80            70          10
          2016-04-02        82            75           7
          2016-04-03        85            69          16
          2016-04-04        90            83           7
          2016-04-05        83            79           4
          2016-04-06        87            77          10

The names of the columns in a DataFrame are accessible via the .columns property:

In [22]: # get the columns, which is also an Index object
         temps_df.columns
Out [22]: Index(['Missoula', 'Philadelphia', 'Difference'], dtype='object')

DataFrame (and Series) objects can be sliced to retrieve specific rows. The following slices the second through fourth rows of temperature difference values:

In [23]: # slice the temp differences column for the rows at
         # location 1 through 4 (as though it is an array)
         temps_df.Difference[1:4]
Out [23]: 2016-04-02     7
          2016-04-03    16
          2016-04-04     7
          Freq: D, Name: Difference, dtype: int64

Entire rows of a DataFrame can be retrieved using the .loc and .iloc properties. .loc looks up rows by index label, whereas .iloc uses the 0-based position. The following retrieves the second row of the DataFrame:

In [24]: # get the row at array position 1
         temps_df.iloc[1]
Out [24]: Missoula        82
          Philadelphia    75
          Difference       7
          Name: 2016-04-02 00:00:00, dtype: int64

Notice that this result has converted the row into a Series, with the column names of the DataFrame pivoted into the index labels of the resulting Series. The following shows the resulting index of the result:

In [25]: # the names of the columns have become the index
         # they have been 'pivoted'
         temps_df.iloc[1].index
Out [25]: Index(['Missoula', 'Philadelphia', 'Difference'], dtype='object')

Rows can be explicitly accessed via index label using the .loc property. The following code retrieves a row by its index label:

In [26]: # retrieve row by index label using .loc
         temps_df.loc['2016-04-05']
Out [26]: Missoula        83
          Philadelphia    79
          Difference       4
          Name: 2016-04-05 00:00:00, dtype: int64

Specific rows in a DataFrame object can be selected using a list of integer positions. The following selects the values from the Difference column in the rows at integer locations 1, 3, and 5:

In [27]: # get the values in the Difference column in rows 1, 3
         # and 5 using 0-based location
         temps_df.iloc[[1, 3, 5]].Difference
Out [27]: 2016-04-02     7
          2016-04-04     7
          2016-04-06    10
          Freq: 2D, Name: Difference, dtype: int64

Rows of a DataFrame can also be selected based upon a logical expression applied to the data in each row. The following shows which values in the Missoula column are greater than 82 degrees:

In [28]: # which values in the Missoula column are > 82?
         temps_df.Missoula > 82
Out [28]: 2016-04-01    False
          2016-04-02    False
          2016-04-03     True
          2016-04-04     True
          2016-04-05     True
          2016-04-06     True
          Freq: D, Name: Missoula, dtype: bool

The result of such an expression can then be passed to the [] operator of a DataFrame (and of a Series), which returns only the rows where the expression evaluated to True:

In [29]: # return the rows where the temps for Missoula > 82
         temps_df[temps_df.Missoula > 82]
Out [29]:             Missoula  Philadelphia  Difference
          2016-04-03        85            69          16
          2016-04-04        90            83           7
          2016-04-05        83            79           4
          2016-04-06        87            77          10

This technique is referred to as Boolean selection in pandas terminology, and it forms the basis of selecting rows based upon values in specific columns (like a query in SQL using a WHERE clause, but, as we will see, much more powerful).

Visualization

We will dive into visualization in quite some depth in Chapter 14, Visualization, but prior to that we will occasionally perform a quick visualization of data in pandas. Creating a visualization of data is quite simple with pandas: all that needs to be done is to call the .plot() method. The following demonstrates by plotting the Close value of the stock data:

In [40]: df[['Close']].plot();

Summary

In this article, we took an introductory look at the pandas Series and DataFrame objects, demonstrating some of their fundamental capabilities. This exposition showed you how to perform a few basic operations that you can use to get up and running with pandas prior to diving in and learning all the details.

Resources for Article:

Further resources on this subject:

- Using indexes to manipulate pandas objects [article]
- Predicting Sports Winners with Decision Trees and pandas [article]
- The pandas Data Structures [article]

Machine Learning Review

Packt
18 Jul 2017
20 min read
In this article by Uday Kamath and Krishna Choppella, authors of the book Mastering Java Machine Learning, we will discuss how recent years have seen a revival of interest in artificial intelligence (AI), and machine learning in particular, both in academic circles and in industry. In the last decade, AI has seen dramatic successes that eluded practitioners in the intervening years, after the original promise of the field gave way to relative decline until its re-emergence in the last few years. (For more resources related to this topic, see here.)

What made these successes possible, in large part, was the availability of prodigious amounts of data and the inexorable increase in raw computational power. Among the areas of AI leading the resurgence, machine learning has seen spectacular developments and continues to find the widest applicability in an array of domains. The use of machine learning to help in complex decision making at the highest levels of business, and at the same time its enormous success in improving the accuracy of what are now everyday applications, such as search, speech recognition, and personal assistants on mobile phones, has made its effects commonplace in the family room and the boardroom alike. Articles breathlessly extolling the power of "deep learning" can be found today not only in the popular science and technology press, but also in mainstream outlets such as The New York Times and The Huffington Post. Machine learning has indeed become ubiquitous in a relatively short time.

An ordinary user encounters machine learning in many day-to-day activities. Interacting with well-known e-mail providers such as Gmail gives the user automated sorting and categorization of e-mails into categories such as spam, junk, promotions, and so on, which is made possible using text mining, a branch of machine learning. When shopping online for products on e-commerce websites such as https://www.amazon.com/ or watching movies from content providers such as http://netflix.com/, one is offered recommendations for other products and content by so-called recommender systems, another branch of machine learning. Forecasting the weather, estimating real estate prices, predicting voter turnout, and even election results all use some form of machine learning to see into the future, as it were.

The ever-growing availability of data, and the promise of systems that can enrich our lives by learning from that data, place a growing demand on the skills of a limited workforce of professionals in the field of data science. This demand is particularly acute for well-trained experts who know their way around the landscape of machine learning techniques in the more popular languages, including Java, Python, R, and increasingly, Scala. By far, the number and availability of machine learning libraries, tools, APIs, and frameworks in Java outstrip those in other languages. Consequently, mastery of these skills will put any aspiring professional with a desire to enter the field at a distinct advantage in the marketplace. Perhaps you already apply machine learning techniques in your professional work, or maybe you simply have a hobbyist's interest in the subject. Clearly, you can bend Java to your will, but now you feel you're ready to dig deeper and learn how to use the best-of-breed open source Java ML frameworks in your next data science project.
Mastery of a subject, especially one with such obvious applicability as machine learning, requires more than an understanding of its core concepts and familiarity with its mathematical underpinnings. Unlike an introductory treatment of the subject, a project that purports to help you master the subject must be heavily focused on practical aspects, in addition to introducing more advanced topics that would have stretched the scope of the introductory material. To warm up before we embark on sharpening our instrument, we will devote this article to a quick review of what we already know. For the ambitious novice with little or no prior exposure to the subject (who is nevertheless determined to get the fullest benefit from this article), here's our advice: make sure you do not skip the rest of this article; instead, use it as a springboard to explore unfamiliar concepts in more depth. Seek out external resources as necessary. Wikipedia it. Then jump right back in.

For the rest of this article, we will review the following:

- History and definitions
- What is not machine learning?
- Concepts and terminology
- Important branches of machine learning
- Different data types in machine learning
- Applications of machine learning
- Issues faced in machine learning
- The meta-process used in most machine learning projects
- Information on some well-known tools, APIs, and resources that we will employ in this article

Machine learning – history and definition

It is difficult to give an exact history, but the definition of machine learning we use today finds its usage as early as the 1860s. In Rene Descartes' Discourse on the Method, he refers to Automata and says the following:

For we can easily understand a machine's being constituted so that it can utter words, and even emit some responses to action on it of a corporeal kind, which brings about a change in its organs; for instance, if touched in a particular part it may ask what we wish to say to it; if in another part it may exclaim that it is being hurt, and so on

http://www.earlymoderntexts.com/assets/pdfs/descartes1637.pdf
https://www.marxists.org/reference/archive/descartes/1635/discourse-method.htm

Alan Turing, in his famous publication Computing Machinery and Intelligence, gives basic insights into the goals of machine learning by asking the question "Can machines think?"

http://csmt.uchicago.edu/annotations/turing.htm
http://www.csee.umbc.edu/courses/471/papers/turing.pdf

Arthur Samuel, in 1959, wrote: "Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed."

Tom Mitchell, in recent times, gave a more exact definition of machine learning: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." For a spam filter, for example, the task T is classifying e-mails, the experience E is a collection of e-mails already labeled as spam or not spam, and the performance measure P could be the accuracy of the classification.
Machine learning has a relationship with several areas:

- Statistics: This uses the elements of data sampling, estimation, hypothesis testing, learning theory, and statistics-based modeling, to name a few
- Algorithms and computation: This uses the basics of search, traversal, parallelization, distributed computing, and so on from basic computer science
- Database and knowledge discovery: This has the ability to store, retrieve, and access information in various formats
- Pattern recognition: This has the ability to find interesting patterns in the data, either to explore, visualize, or predict
- Artificial intelligence: Though machine learning is considered a branch of artificial intelligence, it also has relationships with other branches, such as heuristics, optimization, evolutionary computing, and so on

What is not machine learning?

It is important to recognize areas that share a connection with machine learning but cannot themselves be considered part of machine learning. Some disciplines may overlap to a smaller or larger extent, yet the principles underlying machine learning are quite distinct:

- Business intelligence and reporting: Reporting Key Performance Indicators (KPIs), querying OLAP for slicing, dicing, and drilling into the data, dashboards, and so on, which form the central components of BI, are not machine learning.
- Storage and ETL: Data storage and ETL are key elements needed in any machine learning process, but by themselves they don't qualify as machine learning.
- Information retrieval, search, and queries: The ability to retrieve data or documents based on search criteria or indexes, which forms the basis of information retrieval, is not really machine learning. Many forms of machine learning, such as semi-supervised learning, can rely on searching for similar data for modeling, but that doesn't qualify search as machine learning.
- Knowledge representation and reasoning: Representing knowledge for performing complex tasks, such as Ontology, Expert Systems, and the Semantic Web, does not qualify as machine learning.

Machine learning – concepts and terminology

In this article, we will describe the different concepts and terms normally used in machine learning:

- Data or dataset: The basics of machine learning rely on understanding the data. The data or dataset normally refers to content, available in structured or unstructured format, for use in machine learning. Structured datasets have specific formats, whereas an unstructured dataset is normally in the form of some free-flowing text. Data can be available in various storage types or formats. In structured data, every element, known as an instance, example, or row, follows a predefined structure. Data can also be categorized by size: small or medium data have a few hundred to thousands of instances, whereas big data refers to a large volume, mostly in the millions or billions, which cannot be stored or accessed using common devices or fit in the memory of such devices.
- Features, attributes, variables, or dimensions: In structured datasets, as mentioned earlier, there are predefined elements with their own semantics and data type, which are known variously as features, attributes, variables, or dimensions.
- Data types: The features defined above need some form of typing in many machine learning algorithms or techniques. The most commonly used data types are as follows:
  - Categorical or nominal: This indicates well-defined categories or values present in the dataset.
    For example, eye color, such as black, blue, brown, green, or grey; or document content type, such as text, image, or video.
  - Continuous or numeric: This indicates the numeric nature of the data field. For example, a person's weight measured by a bathroom scale, the temperature reading from a sensor, or the monthly balance in dollars on a credit card account.
  - Ordinal: This denotes data that can be ordered in some way. For example, garment sizes, such as small, medium, or large; or boxing weight classes, such as heavyweight, light heavyweight, middleweight, lightweight, and bantamweight.
- Target or label: A feature or set of features in the dataset, which is used for learning from the training data and for prediction on unseen data, is known as a target or a label. A label can have any of the forms specified earlier, that is, categorical, continuous, or ordinal.
- Machine learning model: Each machine learning algorithm, based on what it learned from the dataset, maintains the state of its learning for predicting or giving insights into future or unseen data. This is referred to as the machine learning model.
- Sampling: Data sampling is an essential step in machine learning. Sampling means choosing a subset of examples from a population with the intent of treating the behavior seen in the (smaller) sample as being representative of the behavior of the (larger) population. In order for the sample to be representative of the population, care must be taken in the way the sample is chosen. Generally, a population consists of every object sharing the properties of interest in the problem domain, for example, all people eligible to vote in the general election, or all potential automobile owners in the next four years. Since it is usually prohibitive (or impossible) to collect data for all the objects in a population, a well-chosen subset is selected for the purposes of analysis. A crucial consideration in the sampling process is that the sample be unbiased with respect to the population. The following are types of probability-based sampling:
  - Uniform random sampling: A sampling method where the sampling is done over a uniformly distributed population, that is, each object has an equal probability of being chosen.
  - Stratified random sampling: A sampling method used when the data can be categorized into multiple classes. In such cases, in order to ensure all categories are represented in the sample, the population is divided into distinct strata based on these classifications, and each stratum is sampled in proportion to the fraction of its class in the overall population. Stratified sampling is common when the population density varies across categories, and it is important to compare these categories with the same statistical power.
  - Cluster sampling: Sometimes there are natural groups among the population being studied, and each group is representative of the whole population. An example is data that spans many geographical regions. In cluster sampling, you take a random subset of the groups followed by a random sample from within each of those groups to construct the full data sample. This kind of sampling can reduce the cost of data collection without compromising the fidelity of distribution in the population.
  - Systematic sampling: Systematic or interval sampling is used when there is a certain ordering present in the sampling frame (a finite set of objects treated as the population and taken to be the source of data for sampling, for example, the corpus of Wikipedia articles arranged lexicographically by title).
    If the sample is then selected by starting at a random object and skipping a constant k number of objects before selecting the next one, that is called systematic sampling. The interval k is calculated as the ratio of the population size to the sample size.
- Model evaluation metrics: Evaluating models for performance is generally based on different evaluation metrics for different types of learning. In classification, it is generally based on accuracy, receiver operating characteristic (ROC) curves, training speed, memory requirements, false positive ratio, and so on. In clustering, the number of clusters found, cohesion, separation, and so on form the general metrics. In stream-based learning, apart from the standard metrics mentioned earlier, adaptability, speed of learning, and robustness to sudden changes are some of the conventional metrics for evaluating the performance of the learner.

To illustrate these concepts, a concrete example in the form of a well-known weather dataset is given. The data gives a set of weather conditions and a label that indicates whether the subject decided to play a game of tennis on that day or not:

@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no

The dataset is in the ARFF (Attribute-Relation File Format) format. It consists of a header giving information about the features or attributes, with their data types, and the actual comma-separated data following the @data tag. The dataset has five features: outlook, temperature, humidity, windy, and play. The features outlook and windy are categorical features, while humidity and temperature are continuous. The feature play is the target and is categorical.

Machine learning – types and subtypes

We will now explore the different subtypes or branches of machine learning. Though the following list is not comprehensive, it covers the most well-known types:

- Supervised learning: This is the most popular branch of machine learning, which is about learning from labeled data. If the data type of the label is categorical, it becomes a classification problem, and if numeric, it becomes a regression problem. For example, if the target of the dataset is the detection of fraud, which has categorical values of either true or false, we are dealing with a classification problem. If, on the other hand, the target is the best price to list the sale of a home at, which is a numeric dollar value, the problem is one of regression. One figure in the original illustrates labeled data that is conducive to classification techniques suitable for linearly separable data, such as logistic regression (figure: Linearly separable data); a dataset that is not linearly separable calls for classification techniques such as Support Vector Machines.
- Unsupervised learning: Understanding the data and exploring it in order to build machine learning models when the labels are not given is called unsupervised learning. Clustering, manifold learning, and outlier detection are techniques that are covered under this topic.
    Examples of problems that require unsupervised learning are many; grouping customers according to their purchasing behavior is one example. In the case of biological data, tissue samples can be clustered based on similar gene expression values using unsupervised learning techniques. The following diagram represents data with inherent structure that can be revealed as distinct clusters using an unsupervised learning technique such as K-Means:

[Figure: Clusters in data]

    Different techniques are used to detect global outliers (examples that are anomalous with respect to the entire dataset) and local outliers (examples that are misfits in their neighborhood). The following diagram illustrates the notion of local and global outliers for a two-feature dataset:

[Figure: Local and global outliers]

- Semi-supervised learning: When the dataset has some labeled data along with a large amount of unlabeled data, learning from such a dataset is called semi-supervised learning. When dealing with financial data with the goal of detecting fraud, for example, there may be a large amount of unlabeled data and only a small number of known fraud and non-fraud transactions. In such cases, semi-supervised learning may be applied.
- Graph mining: Mining data represented as graph structures is known as graph mining. It is the basis of social network analysis and structure analysis in different bioinformatics, web mining, and community mining applications.
- Probabilistic graph modeling and inferencing: Learning and exploiting the structures present between features to model the data comes under the branch of probabilistic graph modeling.
- Time-series forecasting: This is a form of learning where the data has distinct temporal behavior and the relationship with time is modeled. A common example is in financial forecasting, where the performance of stocks in a certain sector may be the target of the predictive model.
- Association analysis: This is a form of learning where the data is in the form of an item set or market basket, and association rules are modeled to explore and predict the relationships between the items. A common example in association analysis is learning the relationships between the most common items bought by customers when they visit the grocery store.
- Reinforcement learning: This is a form of learning where machines learn to maximize performance based on feedback in the form of rewards or penalties received from the environment. A recent example that famously used reinforcement learning was AlphaGo, the machine developed by Google that beat the world Go champion Lee Sedol decisively in March 2016. Using a reward and penalty scheme, the model first trained on millions of board positions in the supervised learning stage, then played itself in the reinforcement learning stage to ultimately become good enough to triumph over the best human player.
  http://www.theatlantic.com/technology/archive/2016/03/the-invisible-opponent/475611/
  https://gogameguru.com/i/2016/03/deepmind-mastering-go.pdf
- Stream learning or incremental learning: Learning in a supervised, unsupervised, or semi-supervised manner from stream data in real time or pseudo-real time is called stream or incremental learning. Learning the behavior of sensors from different types of industrial systems, in order to categorize them as normal or abnormal, needs a real-time feed and real-time detection.
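Before moving on to datasets, here is a small concrete illustration of the supervised case using the weather data shown earlier. The book itself works with Java frameworks, so the following sketch is purely illustrative and not from the original text; it assumes a Python environment with pandas and scikit-learn available:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# a few rows of the weather data from the ARFF example; 'play' is the target label
data = pd.DataFrame({
    'outlook':     ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast'],
    'temperature': [85, 80, 83, 70, 68, 65, 64],
    'humidity':    [85, 90, 86, 96, 80, 70, 65],
    'windy':       [False, True, False, False, False, True, True],
    'play':        ['no', 'no', 'yes', 'yes', 'yes', 'no', 'yes']})

# categorical features must be encoded as numbers for most learners
X = pd.get_dummies(data[['outlook', 'temperature', 'humidity', 'windy']])
y = data['play']

model = DecisionTreeClassifier().fit(X, y)   # learn from the labeled examples
print(model.predict(X[:3]))                  # predict labels for three (seen) examples

The same data with the play column removed would be the starting point for the unsupervised techniques mentioned above, such as clustering.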
Datasets used in machine learning

To learn from data, we must be able to understand and manage data in all forms. Data originates from many different sources, and consequently, datasets may differ widely in structure or have little or no structure at all. In this section, we present a high-level classification of datasets with commonly occurring examples. Based on structure, datasets may be classified as containing the following:

- Structured or record data: Structured data is the most common form of dataset available for machine learning. The data is in the form of records or rows following a well-known format, with features that are either columns in a table or fields delimited by separators or tokens. There is no explicit relationship between the records or instances. The dataset is available mostly in flat files or relational databases. The records of financial transactions at a bank are an example of structured data (figure: Financial card transactional data with labels of fraud).
- Transaction or market data: This is a special form of structured data where each record corresponds to a collection of items. Examples of market datasets are the list of grocery items purchased by different customers, or the movies viewed by customers (figure: Market dataset for items bought from a grocery store).
- Unstructured data: Unstructured data is normally not available in well-known formats, unlike structured data. Text, image, and video data are different formats of unstructured data. Normally, a transformation of some form is needed to extract features from these forms of data into a structured dataset so that traditional machine learning algorithms can be applied (figure: Sample text data from SMS with labels of spam and ham, by Tiago A. Almeida from the Federal University of Sao Carlos).
- Sequential data: Sequential data has an explicit notion of order. The order can be some relationship between features and a time variable in time-series data, or symbols repeating in some form in genomic datasets. Two examples are weather data and genomic sequence data. One figure in the original shows the relationship between time and the sensor level for weather (figure: Time series from sensor data); another shows three genomic sequences, illustrating the repetition of the sequences CGGGT and TTGAAAGTGGTG in all three (figure: Genomic sequences of DNA as sequences of symbols).
- Graph data: Graph data is characterized by the presence of relationships between entities in the data, forming a graph structure. Graph datasets may be in structured record format or unstructured format. Typically, the graph relationship has to be mined from the dataset. Claims in the insurance domain can be considered structured records containing relevant claim details, with claimants related through addresses, phone numbers, and so on; this can be viewed as a graph structure. Using the World Wide Web as an example, we have web pages available as unstructured data containing links, and graphs of relationships between web pages can be built using those links, producing some of the most mined graph datasets today (figure: Insurance claim data converted into a graph structure with relationships between vehicles, drivers, policies, and addresses).

Machine learning applications

Given the rapidly growing use of machine learning in diverse areas of human endeavor, any attempt to list typical applications in the different industries where some form of machine learning is in use must necessarily be incomplete.
Nevertheless, in this section we list a broad set of machine learning applications by domain, use, and the type of learning used:

Domain/Industry | Applications | Machine Learning Type
Financial | Credit Risk Scoring, Fraud Detection, Anti-Money Laundering | Supervised, Unsupervised, Graph Models, Time Series, and Stream Learning
Web | Online Campaigns, Health Monitoring, Ad Targeting | Supervised, Unsupervised, Semi-Supervised
Healthcare | Evidence-based Medicine, Epidemiological Surveillance, Drug Events Prediction, Claim Fraud Detection | Supervised, Unsupervised, Graph Models, Time Series, and Stream Learning
Internet of Things (IoT) | Cyber Security, Smart Roads, Sensor Health Monitoring | Supervised, Unsupervised, Semi-Supervised, and Stream Learning
Environment | Weather Forecasting, Pollution Modeling, Water Quality Measurement | Time Series, Supervised, Unsupervised, Semi-Supervised, and Stream Learning
Retail | Inventory, Customer Management and Recommendations, Layout and Forecasting | Time Series, Supervised, Unsupervised, Semi-Supervised, and Stream Learning

Summary

A revival of interest is seen in the area of artificial intelligence (AI), and machine learning in particular, both in academic circles and in industry. Machine learning is used to help in complex decision making at the highest levels of business, and it has also achieved enormous success in improving the accuracy of everyday applications, such as search, speech recognition, and personal assistants on mobile phones. The basics of machine learning rely on an understanding of data: structured datasets have specific formats, whereas an unstructured dataset is normally in the form of some free-flowing text. The two principal branches of machine learning covered here are supervised learning, which is about learning from labeled data, and unsupervised learning, which is about understanding and exploring the data in order to build machine learning models when labels are not given.

Resources for Article:

Further resources on this subject:

- Specialized Machine Learning Topics [article]
- Machine learning in practice [article]
- Introduction to Machine Learning with R [article]

Introduction to Moodle 3

Packt
17 Jul 2017
13 min read
In this article, Ian Wild, the author of the book Moodle 3.x Developer's Guide, will be introducing you to Moodle 3. For any organization considering implementing an online learning environment, Moodle is often the number one choice. Key to its success is the free, open source ethos that underpins it. Not only is the Moodle source code fully available to developers, but Moodle itself has been developed to allow for the inclusion of third-party plugins. Everything from how users access the platform and the kinds of teaching interactions that are available, through to how attendance and success can be reported - in fact, all the key Moodle functionality - can be adapted and enhanced through plugins. (For more resources related to this topic, see here.)

What is Moodle?

There are three reasons why Moodle has become so important, and much talked about, in the world of online learning: one technical, one philosophical, and the third educational. From a technical standpoint, Moodle - an acronym for Modular Object-Oriented Dynamic Learning Environment - is highly customizable. As a Moodle developer, always remember that the 'M' in Moodle stands for modular. If you are faced with a client request that demands a feature Moodle doesn't support, then don't panic. The answer is simple: we create a new custom plugin to implement it. Check out the Moodle Plugins Directory (https://moodle.org/plugins/) for a comprehensive library of supported third-party plugins that Moodle developers have created and given back to the community. And this leads to the philosophical reason why Moodle dominates.

Free open source software for education

Moodle is grounded firmly in a community-based, open source philosophy (see https://en.wikipedia.org/wiki/Open-source_model). But what does this mean for developers? Fundamentally, it means that we have complete access to the source code and, within reason, unfettered access to the people who develop it. Access to the application itself is free - you don't need to pay to download it and you don't need to pay to run it. But be aware of what 'free' means in this context. Hosting and administration, for example, take time and resources and are very unlikely to be free.

As an educational tool, Moodle was developed to support social constructionism (see https://docs.moodle.org/31/en/Pedagogy). If you are not familiar with this concept, it essentially suggests that building an understanding of a concept or idea can best be achieved by interacting with a broad community. The impact on us as Moodle plugin developers is that there is a highly active group of users and developers. Before you begin developing any Moodle plugins, come and join us at https://moodle.org.

Plugin Development – Authentication

In this article, we will be developing a novel plugin that seamlessly integrates Moodle and the WordPress content management system. Our plugin will authorize users via WordPress when they click on a link to Moodle from a WordPress page. The plugin discussed in this article has already been released to the Moodle community; check out the Moodle Plugins Directory at https://moodle.org/plugins/auth_wordpress for details. Let us start by learning what Moodle authentication is and how new user accounts are created.

Authentication

Moodle supports a range of different authentication methods out of the box, each one supported by its own plugin.
To see the list of available plugins, from the Administration block click on Site administration, click Plugins, then click Authentication, and finally click on Manage authentication. The list of currently installed authentication plugins is displayed. Each plugin interfaces with an internal Application Programming Interface (API), the Access API; see the Moodle developer documentation for details: https://docs.moodle.org/dev/Access_API

Getting logged in

There are two ways of prompting the Moodle authentication process:

- Attempting to log in from the log in page.
- Clicking on a link to a protected resource (that is, a page or file that you can't view or download without logging in).

For an overview of the process, take a look at the developer documentation at https://docs.moodle.org/dev/Authentication_plugins#Overview_of_Moodle_authentication_process. After checks to determine whether an upgrade is required (or whether we are partway through the upgrade process), there is a short fragment of code that loads the configured authentication plugins and, for each one, calls a special method named loginpage_hook():

$authsequence = get_enabled_auth_plugins(true); // auths, in sequence
foreach($authsequence as $authname) {
    $authplugin = get_auth_plugin($authname);
    $authplugin->loginpage_hook();
}

The loginpage_hook() function gives each authentication plugin the chance to intercept the login. Assuming that the login has not been intercepted, the process then continues with a check to ensure the supplied username conforms to the configured standard before calling authenticate_user_login(), which, if successful, returns a $user object.

OAuth Overview

The OAuth authentication mechanism provides secure delegated access. OAuth supports a number of scenarios, including:

- A client requests access from a server and the server responds with either a 'confirm' or a 'deny'. This is called two-legged authentication.
- A client requests access from a server; the server pops up a confirmation dialog so that the user can authorize the access, and then finally responds with either a 'confirm' or a 'deny'. This is called three-legged authentication.

In this article we will be implementing the latter mechanism, which means, in practice, that:

- The authentication server will only talk to configured clients
- No passwords are exchanged between server and client - only tokens are exchanged, which are meaningless on their own
- By default, users need to give permission before resources are accessed

Having given an overview, here is the process again, described in a little more detail:

1. A new client is configured in the authentication server. The client is allocated a unique client key, along with a secret token (referred to as the client secret).
2. The client POSTs an HTTP request to the server (identifying itself using the client key and client secret) and the server responds with a temporary access token. This token is used to request authorization to access protected resources from the server. In this case, 'protected resources' means the WordPress API; access to the WordPress API will allow us to determine the details of the currently logged in user.
3. The server responds not with an HTTP response but by POSTing new permanent tokens back to the client via a callback URI (that is, the server talks to the client directly in order to ensure security).
4. The process ends with the client possessing permanent authorization tokens that can be used to access WP-API functions.
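Before diving into the PHP implementation, it can help to see the three legs end to end in one place. The following short sketch is not part of the plugin (which is written in PHP); it is an illustrative example in Python using the requests_oauthlib package, with placeholder URLs and credentials standing in for the values WordPress would issue:

from requests_oauthlib import OAuth1Session

# placeholder credentials and endpoints; the real values come from the
# WordPress OAuth 1.0a server plugin configured later in this article
client = OAuth1Session('CLIENT_KEY', client_secret='CLIENT_SECRET',
                       callback_uri='https://moodle.example.com/auth/wordpress/callback.php')

# leg one: obtain a temporary request token from the server
client.fetch_request_token('https://wordpress.example.com/oauth1/request')

# leg two: send the user to the authorization page; once the user grants
# access, the server calls back with a verifier
print(client.authorization_url('https://wordpress.example.com/oauth1/authorize'))

# leg three: exchange the temporary token (plus verifier) for permanent tokens
tokens = client.fetch_access_token('https://wordpress.example.com/oauth1/access',
                                   verifier='VERIFIER_FROM_CALLBACK')
print(tokens)  # permanent oauth_token and oauth_token_secret for calling the WP-API

The Moodle plugin below performs the same exchanges, only in PHP and with the verifier arriving via callback.php.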
Obviously, the most effective way of learning about this process is to implement it, so let's go ahead and do that now.

Installing the WordPress OAuth 1.0a server

The first step is to add the OAuth 1.0a server plugin to WordPress. Why not the more recent OAuth 2.0 server plugin? This is because 2.0 only supports https:// and not http://. Also, internally (at least at the time of writing) WordPress will only authenticate internally using either OAuth 1.0a or cookies.

Log into WordPress as an administrator and, from the Dashboard, hover the mouse over the Plugins menu item and click on Installed Plugins. The Plugins page is displayed. At the top of the page, press the Add New button. As described previously, ensure that you install version 1.0a and not 2.0.

Once installed, we need to configure a new client. From the Dashboard menu, hover the mouse over Users and you will see that a new Applications menu item has been added. Click on this to display the Registered Applications page, then click the Add New button to display the Add Application page.

The Consumer Name is the title for our client that will appear on the Applications page, and Description is a brief explanation of that client to aid identification. The Callback is the URI that WordPress will talk to (refer to the outline of the OAuth authentication steps). As we have not yet developed the Moodle/OAuth client end, you can specify oob in Callback (this stands for 'out of band'). Once configured, WordPress will generate new OAuth credentials: a Client Key and a Client Secret.

Having installed and configured the server end, it is now time to develop the client.

Creating a new Moodle auth plugin

Before we begin, download the finished plugin from https://github.com/iandavidwild/moodle-auth_wordpress and install it in your local development Moodle instance. The development of a new authentication plugin is described in the developer documentation at https://docs.moodle.org/dev/Authentication_plugins. As described there, let us start by copying the none plugin (the no login authentication method) and using it as a template for our new plugin. In Eclipse, I'm going to copy the none plugin to a new authentication method called wordpress.

That done, we need to update the occurrences of auth_none to auth_wordpress. Firstly, rename /auth/wordpress/lang/en/auth_none.php to auth_wordpress.php. Then, in auth.php, rename the class auth_plugin_none to auth_plugin_wordpress. The Eclipse Find/Replace function is great for updating scripts. Next, we need to update the version information in version.php: update all the relevant names, descriptions, and dates. Finally, we can check that Moodle recognises our new plugin by navigating to the Site administration menu and clicking on Notifications. If installation is successful, our new plugin will be listed on the Available authentication plugins page.

Configuration

Let us start by considering the plugin configuration. We will need to allow a Moodle administrator to configure the following:

- The URL of the WordPress installation
- The client key and client secret provided by WordPress

There is very little flexibility in the design of an authentication plugin configuration page, so at this stage, rather than creating a wireframe drawing and having it agreed with the client, we can simply go ahead and write the code. The configuration page is defined in /config.html. Remember to start by declaring the relevant language strings in /lang/en/auth_wordpress.php.
Configuration settings themselves will be managed by the Moodle framework by calling our plugin's process_config() method. Here is the declaration:

/**
 * Processes and stores configuration data for this authentication plugin.
 *
 * @return bool
 */
function process_config($config) {
    // Set to defaults if undefined
    if (!isset($config->wordpress_host)) {
        $config->wordpress_host = '';
    }
    if (!isset($config->client_key)) {
        $config->client_key = '';
    }
    if (!isset($config->client_secret)) {
        $config->client_secret = '';
    }

    set_config('wordpress_host', trim($config->wordpress_host), 'auth/wordpress');
    set_config('client_key', trim($config->client_key), 'auth/wordpress');
    set_config('client_secret', trim($config->client_secret), 'auth/wordpress');

    return true;
}

Having dealt with configuration, let us now start managing the actual OAuth process.

Handling OAuth calls

Rather than go into the details of how we can send HTTP requests to WordPress, let's use a third-party library to do this work. The code I'm going to use is based on Abraham Williams' twitteroauth library (see https://github.com/abraham/twitteroauth). In Eclipse, take a look at the files OAuth.php and BasicOAuth.php for details. To use the library, you will need to add the following lines to the top of /wordpress/auth.php:

require_once($CFG->dirroot . '/auth/wordpress/OAuth.php');
require_once($CFG->dirroot . '/auth/wordpress/BasicOAuth.php');

use OAuth1\BasicOauth;

Let's now start work on handling the Moodle login event.

Handling the Moodle login event

When a user clicks on a link to a protected resource, Moodle calls loginpage_hook() in each enabled authentication plugin. To handle this, let us first implement loginpage_hook(). We need to add the following lines to auth.php:

/**
 * Will get called before the login page is shown.
 *
 */
function loginpage_hook() {
    $client_key = $this->config->client_key;
    $client_secret = $this->config->client_secret;
    $wordpress_host = $this->config->wordpress_host;

    if( (strlen($wordpress_host) > 0) && (strlen($client_key) > 0) && (strlen($client_secret) > 0) ) {
        // kick off the authentication process
        $connection = new BasicOAuth($client_key, $client_secret);

        // strip the trailing slashes from the end of the host URL to avoid
        // any confusion (and to make the code easier to read)
        $wordpress_host = rtrim($wordpress_host, '/');

        $connection->host = $wordpress_host . "/wp-json";
        $connection->requestTokenURL = $wordpress_host . "/oauth1/request";

        $callback = $CFG->wwwroot . '/auth/wordpress/callback.php';
        $tempCredentials = $connection->getRequestToken($callback);

        // Store temporary credentials in the $_SESSION
    }// if
}

This implements the first leg of the authentication process, and the variable $tempCredentials will now contain a temporary access token. We will need to store these temporary credentials and then call on the server to ask the user to authorize the connection (leg two). Add the following lines immediately after the // Store temporary credentials in the $_SESSION comment:

$_SESSION['oauth_token'] = $tempCredentials['oauth_token'];
$_SESSION['oauth_token_secret'] = $tempCredentials['oauth_token_secret'];

$connection->authorizeURL = $wordpress_host . "/oauth1/authorize";

$redirect_url = $connection->getAuthorizeURL($tempCredentials);

header('Location: ' . $redirect_url);
die;

Next, we need to implement the OAuth callback.
Now create the actual script, callback.php. The callback.php script will need to:

Sanity check the data being passed back from WordPress and fail gracefully if there is an issue
Get the wordpress authentication plugin instance (an instance of auth_plugin_wordpress)
Call on a handler method that will perform the authentication (which we will then need to implement)

The script is simple, short and available here: https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/callback.php

Now, in the auth.php script, we need to implement the callback_handler() method in auth_plugin_wordpress. You can check out the code on GitHub: visit https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/auth.php and scroll down to the callback_handler() method.

Lastly, let us add a fragment of code to the loginpage_hook() method that allows us to turn off WordPress authentication in config.php. Add the following to the very beginning of the loginpage_hook() function:

global $CFG;

if(isset($CFG->disablewordpressauth) && ($CFG->disablewordpressauth == true)) {
    return;
}

Summary

In this article, we introduced the Moodle learning platform, investigated the open source philosophy that underpins it, and saw how Moodle's functionality can be extended and enhanced through plugins. Starting from a pre-existing plugin, we developed a new WordPress authentication module that allows a user logged into WordPress to automatically log into Moodle. To do so, we implemented three-legged OAuth 1.0a authentication from WordPress to Moodle. Check out the complete code at https://github.com/iandavidwild/moodle-auth_wordpress/blob/MOODLE_31_STABLE/callback.php. More information on the plugin described in this article is available from the main Moodle website at https://moodle.org/plugins/auth_wordpress.

Resources for Article: Further resources on this subject: Introduction to Moodle [article] Moodle for Online Communities [article] An Introduction to Moodle 3 and MoodleCloud [article]


Initial Configuration of SCO 2016

Packt
17 Jul 2017
13 min read
In this article by Michael Seidl, author of the book Microsoft System Center 2016 Orchestrator Cookbook - Second Edition, will show you how to setup Orchestrator Environment and how to deploy and configure Orchestrator Integration Packs. (For more resources related to this topic, see here.) Deploying an additional Runbook designer Runbook designer is the key feature to build your Runbooks. After the initial installation, Runbook designer is installed on the server. For your daily work with orchestrator and Runbooks, you would like to install the Runbook designer on your client or on admin server. We will go through these steps in this recipe. Getting ready You must review the planning the Orchestrator deployment recipe before performing the steps in this recipe. There are a number of dependencies in the planning recipe you must perform in order to successfully complete the tasks in this recipe. You must install a management server before you can install the additional Runbook Designers. The user account performing the installation has administrative privileges on the server nominated for the SCO deployment and must also be a member of OrchestratorUsersGroup or equivalent rights. The example deployment in this recipe is based on the following configuration details: Management server called TLSCO01 with a remote database is already installed System Center 2016 Orchestrator How to do it... The Runbook designer is used to build Runbooks using standard activities and or integration pack activities. The designer can be installed on either a server class operating system or a client class operating system. Follow these steps to deploy an additional Runbook Designer using the deployment manager: Install a supported operating system and join the active directory domain in scope of the SCO deployment. In this recipe the operating system is Windows 10. Ensure you configure the allowed ports and services if the local firewall is enabled for the domain profile. See the following link for details: https://technet.microsoft.com/en-us/library/hh420382(v=sc.12).aspx. Log in to the SCO Management server with a user account with SCO administrative rights. Launch System Center 2016 Orchestrator Deployment Manager: Right-click on Runbook designers, and select Deploy new Runbook Designer: Click on Next on the welcome page. Type the computer name in the Computer field and click on Add. Click on Next. On the Deploy Integration Packs or Hotfixes page check all the integration packs required by the user of the Runbook designer (for this example we will select the AD IP). Click on Next. Click on Finish to begin the installation using the Deployment Manager. How it works... The Deployment Manager is a great option for scaling out your Runbook Servers and also for distributing the Runbook Designer without the need for the installation media. In both cases the Deployment Manager connects to the Management Server and the database server to configure the necessary settings. On the target system the deployment manager installs the required binaries and optionally deploys the integration packs selected. Using the Deployment Manager provides a consistent and coordinated approach to scaling out the components of a SCO deployment. 
See also The following official web link is a great source of the most up to date information on SCO: https://docs.microsoft.com/en-us/system-center/orchestrator/ Registering an SCO Integration Pack Microsoft System Center 2016 Orchestrator (SCO) automation is driven by process automation components. These process automation components are similar in concept to a physical toolbox. In a toolbox you typically have different types of tools which enable you to build what you desire. In the context of SCO these tools are known as Activities. Activities fall into two main categories: Built-in Standard Activities: These are the default activity categories available to you in the Runbook Designer. The standard activities on their own provide you with a set of components to create very powerful Runbooks. Integration Pack Activities: Integration Pack Activities are provided either by Microsoft, the community, solution integration organizations, or are custom created by using the Orchestrator Integration Pack Toolkit. These activities provide you with the Runbook components to interface with the target environment of the IP. For example, the Active Directory IP has the activities you can perform in the target Active Directory environment. This recipe provides the steps to find and register the second type of activities into your default implementation of SCO. Getting ready You must download the Integration Pack(s) you plan to deploy from the provider of the IP. In this example we will be deploying the Active Directory IP, which can be found at the following link: https://www.microsoft.com/en-us/download/details.aspx?id=54098. You must have deployed a System Center 2016 Orchestrator environment and have full administrative rights in the environment. How to do it... The following diagram provides a visual summary and order of the tasks you need to perform to complete this recipe: We will deploy the Microsoft Active Directory (AD) integration pack (IP). Integration pack organization A good practice is to create a folder structure for your integration packs. The folders should reflect versions of the IPs for logical grouping and management. The version of the IP will be visible in the console and as such you must perform this step after you have performed the step to load the IP(s). This approach will aid in change management when updating IPs in multiple environments. Follow these steps to deploy the Active Directory integration pack. Identify the source location for the Integration Pack in scope (for example, the AD IP for SCO2016). Download the IP to a local directory on the Management Server or UNC share. Log in to the SCO Management server. Launch the Deployment Manager: Under Orchestrator Management Server, right-click on Integration Packs. Select Register IP with the Orchestrator Management Server: Click on Next on the welcome page. Click on Add on the Select Integration Packs or Hotfixes page. Navigate to the directory where the target IP is located, click on Open, and then click on Next. Click on Finish . Click on Accept on End-User License Agreement to complete the registration. Click on Refresh to validate if the IP has successfully been registered. How it works... The process of loading an integration pack is simple. The prerequisite for successfully registering the IP (loading) is ensuring you have downloaded a supported IP to a location accessible to the SCO management server. Additionally the person performing the registration must be a SCO administrator. 
At this point we have registered the Integration Pack with the Orchestrator Management Server. Two more steps are still necessary before we can use the Integration Pack; see the following recipes for these.

There's more...

Registering the IP is the first part of the process of making the IP activities available to Runbook Designers and Runbook Servers. The next step is the deployment of Integration Packs to the Runbook Designer; see the next recipe for that. Orchestrator Integration Packs are provided not only by Microsoft; third-party companies such as Cisco and NetApp also provide OIPs for their products. Additionally, there is a large community providing Orchestrator Integration Packs. There are several sources for downloading Integration Packs; here are some useful links:

http://www.techguy.at/liste-mit-integration-packs-fuer-system-center-orchestrator/
http://scorch.codeplex.com/
https://www.microsoft.com/en-us/download/details.aspx?id=54098

Deploying the IP to designers and Runbook servers

Registering the Orchestrator Integration Pack is only the first step; you also need to deploy the OIP to your Runbook Designer or Runbook Server.

Getting Ready

You have to follow the steps described in the Registering an SCO Integration Pack recipe before you can start with the next steps to deploy an OIP.

How to do it

In our example we will deploy the Active Directory Integration Pack to our Runbook Designer. Once the IP in scope (the AD IP in our example) has successfully been registered, follow these steps to deploy it to the Runbook Designers and Runbook Servers:

Log in to the SCO Management server and launch Deployment Manager:
Under Orchestrator Management Server, right-click on the Integration Pack in scope and select Deploy IP to Runbook Server or Runbook Designer:
Click on Next on the welcome page, select the IP you would like to deploy (in our example, System Center Integration Pack for Active Directory), and then click on Next.
On the Computer Selection page, type the name of the Runbook Server or Designer in scope and click on Add (repeat for all servers in scope).
On the Installation Options page you have the following three options:
Schedule the Installation: Select this option if you want to schedule the deployment for a specific time. You still have to select one of the next two options.
Stop all running Runbooks before installing the Integration Packs or Hotfixes: This option will, as described, stop all currently running Runbooks in the environment.
Install the Integration Packs or Hotfixes without stopping the running Runbooks: This is the preferred option if you want a controlled deployment without impacting current jobs:
Click on Next after making your installation option selection.
Click on Finish.

The integration pack will be deployed to all selected designers and Runbook servers. You must close all Runbook Designer consoles and re-launch them to see the newly deployed Integration Pack.

How it works…

The process of deploying an integration pack is simple. The prerequisite for successfully deploying the IP is ensuring you have registered a supported IP on the SCO management server. Now we have successfully deployed an Orchestrator Integration Pack. If you have deployed it to a Runbook Designer, make sure you close and reopen the designer to be able to use the activities in this Integration Pack.
Now you are able to use these activities to build your Runbooks; the only thing left to do is to follow the next recipe and configure this Integration Pack. These steps can be used for each individual Integration Pack, and you can also deploy multiple OIPs in a single deployment.

There's more…

You have to deploy an OIP to every Runbook Designer and Runbook Server on which you want to work with its activities. It doesn't matter whether you want to edit a Runbook in the Designer or run a Runbook on a particular Runbook Server; the OIP has to be deployed to both. With the Orchestrator Deployment Manager, this is an easy task.

Initial Integration Pack configuration

This recipe provides the steps required to configure an integration pack for use once it has been successfully deployed to a Runbook Designer.

Getting ready

You must deploy an Orchestrator environment and also deploy the IP you plan to configure to a Runbook Designer before following the steps in this recipe. The authors assume the user account performing the installation has administrative privileges on the server nominated for the SCO Runbook Designer.

How to do it...

Each integration pack serves as an interface to the actions SCO can perform in the target environment. In our example we will be focusing on the Active Directory connector. We will have two accounts under two categories of AD tasks in our scenario:

IP name: Active Directory | Category of actions: Domain Account Management | Account name: SCOAD_ACCMGT
IP name: Active Directory | Category of actions: Domain Administrator Management | Account name: SCOAD_DOMADMIN

The following diagram provides a visual summary and order of the tasks you need to perform to complete this recipe:

Follow these steps to complete the configuration of the Active Directory IP options in the Runbook Designer:

Create or identify an existing account for the IP tasks. In our example we are using two accounts to represent two personas of a typical Active Directory delegation model: SCOAD_ACCMGT is an account with the rights to perform account management tasks only, and SCOAD_DOMADMIN is a domain admin account for elevated tasks in Active Directory.
Launch the Runbook Designer as a SCO administrator, select Options from the menu bar, and select the IP to configure (in our example, Active Directory).
Click on Add, type AD Account Management in the Name: field, and select Microsoft Active Directory Domain Configuration in the Type field.
In the Properties section, type the following:
Configuration User Name: SCOAD_ACCMGT
Configuration Password: Enter the password for SCOAD_ACCMGT
Configuration Domain Controller Name (FQDN): The FQDN of an accessible domain controller in the target AD (in this example, TLDC01.TRUSTLAB.LOCAL)
Configuration Default Parent Container: This is an optional field. Leave it blank:
Click on OK.
Repeat steps 3 and 4 for the Domain Admin account and click on Finish to complete the configuration.

How it works...

The IP configuration is unique for each system environment that SCO interfaces with for the tasks in scope of automation. The Active Directory IP configuration grants SCO the rights to perform the actions specified in the Runbook using the activities of the IP. Typical Active Directory activities include, but are not limited to, creating user and computer accounts, moving user and computer accounts into organizational units, or deleting user and computer accounts. In our example we created two connection account configurations for the following reasons:

Follow the guidance of scoping automation to the rights of the manual processes.
If we use the example of a Runbook for creating user accounts, we do not need domain admin access. A service desk user performing the same action manually would typically be granted only account management rights in AD.
We have more flexibility with delegating management of, and access to, Runbooks. Runbooks with elevated rights through the connection configuration can be separated from Runbooks with lower rights using folder security.

The configuration requires planning and an understanding of its implications before implementing it. Each IP has its own unique options which you must specify before you create Runbooks using that IP. The default IPs that you can download from Microsoft include documentation on the properties you must set.

There's more…

As you have seen in this recipe, we need to configure each additional Integration Pack with a connection string, user, and password. The built-in activities in SCO use the service account's rights to perform their actions, although you can configure a different user for most of the built-in activities.

See also

The official online documentation for Microsoft Integration Packs is updated regularly and should be a point of reference at https://www.microsoft.com/en-us/download/details.aspx?id=54098
The Creating and maintaining a security model for Orchestrator recipe expands further on the delegation model in SCO.

Summary

In this article, we have covered the following: Deploying an Additional Runbook Designer Registering an SCO Integration Pack Deploying an SCO Integration Pack to Runbook Designer and Server Initial Integration Pack Configuration

Resources for Article: Further resources on this subject: Deploying the Orchestrator Appliance [article] Unpacking System Center 2012 Orchestrator [article] Planning a Compliance Program in Microsoft System Center 2012 [article]


Introduction to IoT

Packt
17 Jul 2017
11 min read
In this article by Kallol Bosu Roy Choudhuri, the author of the book Learn Arduino Prototyping in 10 days, we will learn about IoT. (For more resources related to this topic, see here.) As per Gartner, the number of connected devices around the world is going to reach 50 billion by the year 2020. Just imagine the magnitude and scale of the hyper-connectedness that is being forged every moment, as we read through this exciting article. Figure 1: A typical IoT scenario (automobile example) As we can see in the preceding figure, a typical IoT-based scenario is composed of the following fundamental building blocks: IoT edge device IoT cloud platform An IoT device is used to serve as a bridge between existing machines on the ground and an IoT cloud platform. The IoT cloud platform provides a cloud-based infrastructure backbone for data acquisition, data storage, and computing power for data analytics and reporting. The Arduino platform can be effectively used for prototyping IoT devices for almost any IoT solution very rapidly. Building the edge device In this section, we will learn how to use the ESP8266 Wi-Fi chip with the Arduino Uno for connecting to the Internet and posting data to an IoT cloud. There are numerous IoT cloud players in the market today, including Microsoft Azure and Amazon IoT. In this article, we will use the ThingSpeak IoT platform that is very simple to use with the Arduino platform. The following parts will be required for this prototype: 1 Arduino Uno R3 1 USB Cable 1 ESP8266-01 Wi-Fi Transceiver module 1 Breadboard 1 pc. 1K Ohms resistor 1 pc. 2k ohms resistor Some jumper wires Once all the parts have been assembled, follow the breadboard circuit shown in the following figure and build the edge device: Figure 2: ESP8266 with Arduino Uno Wiring The important facts to remember in the preceding setup are: The RXD pin of the ESP8266 chip should receive a 3.3V input signal. We have ensured this by employing the voltage division method. For test purposes, the preceding setup should work fine. However, the ESP8266 chip is demanding when it comes to power (read current) consumption, especially during transmission cycles. Just in case the ESP8266 chip does not respond to the Arduino sketch or AT commands properly, then the power supply may not be enough. Try using a separate battery for the setup. When using a separate battery, remember to use a voltage regulator that steps down the voltage to 3.3 volts before supplying the ESP8266 chip. For prolonged usage, a separate battery based power supply is recommended. Smart retail project inspiration In the previous sections, we looked at the basics of achieving IoT prototyping with the Arduino platform and an IoT cloud platform. With this basic knowledge, you are encouraged to start exploring more advanced IoT scenarios. As a future inspiration, the following smart retail project idea is being provided for you to try by applying the basic principles that you have learned in this article. After all, the goal of this article has been to show you the light and make you self-reliant with the Arduino platform. Imagine a large retail store where products are displayed in hundreds of aisles and thousands of racks. Such large warehouse-type retail layouts are common in some countries, usually with furniture sellers. One of the time-consuming tasks that these retail shops face is to keep the price of their displayed inventory matched with the ever changing competitive market rates. 
For example, the price of a sofa set could be marked at 350 dollars on aisle number 47 rack number 1. Now let's think from a customer’s perspective. Imagine being a potential customer, standing in front of the sofa set we would naturally search for the prices of that sofa set on the Internet. It would not be very surprising to find a similar sofa set that is priced a little lower, maybe at 325 dollars at another store. That is exactly how and when a potential customer would change her mind. The story after this is simple. The customer leaves store A, goes to store B and purchases the sofa set at 325 dollars. In order to address these types of lost sale opportunities, the furniture company management decides to lower the prices of the sofa set to 325 dollars, in order to match the competition. Thereafter, all that needs to be done is for someone to change the price label for the sofa set in aisle number 47 rack number 1, which is a 5–10 minute walk (considering the size of the shop floor) from the shop back office. In a localized store, it is still achievable without further loss of customers. Now, let's appreciate the problem by thinking hyperscale. The furniture seller’s management is located centrally, say in Sweden and they want to dynamically change the product prices for specific price-sensitive products that are displayed across more than 350 stores, in more than 40 countries. The price change should be automatic, near real time, and simultaneous across all company stores. Given the preceding problem statement, it seems a daunting task that could leave hundreds for shop floor staff scampering all day long, just to keep changing price tags for hundreds of products. An elegant solution to this problem lies in the Internet-of-Things. Figure 3: Smart retail with IoT Referring to the preceding figure, it depicts a basic IoT solution for addressing the furniture seller’s problem to matching product prices dynamically on the fly. Remember, since this is an Arduino prototyping article, the stress is on edge device prototyping. However, IoT as a whole; encompasses many areas that include edge devices and cloud platform capabilities. Our focus in this section is to be able to comprehend and build the edge device prototype that supports this smart retail IoT solution. In the preceding solution, the cloud platform takes care of running intelligent program batches to continuously analyze market rates for price-sensitive products. Once a new price has been determined by the cloud job/program, the new product prices are updated in the cloud-hosted database. Immediately after a new price change, all the IoT edge devices attached to the price tag of specific products in the company stores are notified of the new price. We can build a smart LCD panel for displaying product prices. For Internet connectivity, we can reuse the ESP8266 Wi-Fi chip that we learned in this article. Standalone Devices We are already familiar with the basic parts required for building a prototype. The two new aspects to consider when building a standalone project are an independent power source and a project container. Figure 4 - Typical parts of a Standalone Prototype As shown above, typically a standalone device prototype will contain the following parts: The device prototype (Arduino board + Peripherals + all the required Connections) An independent power source A project container/box After the basic prototype has been built the next consideration is to make it operable on its own, like an island. 
This is because in real world situations, you will often have to make a device that is not directly be connected to and powered from a computer. Therefore we will need to understand the various options that are available for powering our device prototypes and also understand when to choose which option. The second aspect to consider is an appropriate physical container to house the various parts of a project. A container is a physical device container will ensure that all parts of a project are nicely packaged in a safe and aesthetic manner. A distance measurement device Let's build an exciting project by combining an Ultrasonic Sensor with a 16x2 LCD character display to build an electronic distance measurement device. We will use one of the most easily available 9-volt batteries for powering this standalone device prototype. For building the distance measurement device, the following parts will be required. Arduino Uno R3 USB connector 1 pc. 9 volt battery 1 full sized bread board 1 HC-SR04 ultrasonic sensor 1 pc. 16x2 LCD character display 1 pc. 10K potentiometer 2 pcs. 220 ohms resistor 1 pc. 150 ohms resistor Some jumper wires Before we start building the device, let's understand what the device will do and the various parts involved in the device. The purpose of the device will be to be able to measure the distance of an object from the device. The following diagram depicts the overview of the device: Figure 5 - A standalone distance measurement device overview First, let's quickly understand each of the components involved in the preceding setup. Then, we will jump into hands-on prototyping and coding. The ultrasonic sensor model used in this example is known as HC-SR04. HC-SR04 is a standard commercially available ultrasonic transceiver. A transceiver is a device that is capable of transmitting as well as receiving signals. The HC-SR04 transceiver transmits ultrasonic signals. Once the signals hit an object/obstacle, the signals echo back to the HC-SR04 transceiver. The HC-SR04 ultrasonic module is shown below for reference. Figure 6 - The HC-SR04 Ultrasonic module The HC-SR-04 has four pins. The usage of the pins is explained below for easy understanding: Vcc: This pin is connected to a 5 volt power supply. Trig: This pin receives digital signals from the attached microcontroller unit in order to send out an ultrasonic burst. Echo: This pin sends the measured time duration proportional to the distance travelled by the ultrasonic burst. Gnd: This pin is connected to the ground terminal. The total time taken for the ultrasonic signals to echo back from an obstacle can be divided by 2 and then based on the speed of sound in air, the distance between the object and the HC-SR04 can be calculated. We will see how to calculate the distance in the sketch for this device prototype. As per the HC-SR04 data sheet, it is a 5-volt tolerant device, operating at 15 mA, and has a measurement range starting from 2 centimeters to a maximum of 4 meters. The HC-SR04 can be directly connected to the Arduino board pins. The 16x2 LCD character display is also a standard commercially available device, it has 16 columns and 2 rows. The LCD is controlled by its 4 data pins/lines. We will also see how to send string outputs to the LCD from the Arduino sketch. The power supply being used in today's example is a standard 9-volt battery plugged in Arduino's DC IN Jack. 
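To make the time-to-distance conversion described above concrete, here is a minimal, hypothetical Arduino fragment. It assumes the Trig and Echo pins are wired to D8 and D7, as in the circuit described shortly; the pin numbers and the conversion constant are illustrative rather than taken from the book's finished sketch.

// Hypothetical fragment: trigger the HC-SR04 and convert the echo time to centimetres.
const int trigPin = 8;   // Trig wired to digital pin 8
const int echoPin = 7;   // Echo wired to digital pin 7

void setup() {
  pinMode(trigPin, OUTPUT);
  pinMode(echoPin, INPUT);
  Serial.begin(9600);
}

void loop() {
  // A short LOW followed by a 10 microsecond HIGH pulse fires one ultrasonic burst.
  digitalWrite(trigPin, LOW);
  delayMicroseconds(2);
  digitalWrite(trigPin, HIGH);
  delayMicroseconds(10);
  digitalWrite(trigPin, LOW);

  // pulseIn() returns the round-trip time of the echo in microseconds.
  long duration = pulseIn(echoPin, HIGH);

  // Halve the round trip and apply the speed of sound (roughly 0.0343 cm per microsecond).
  float distanceCm = (duration / 2.0) * 0.0343;

  Serial.println(distanceCm);
  delay(500);
}

The constant comes from the speed of sound in air, roughly 343 metres per second, and the division by two accounts for the fact that the measured time covers the trip to the obstacle and back.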
Another option is to use six AA-sized or AAA-sized batteries in series and plug them into the VIN pin of the Arduino board.

Distance measurement device circuit

Follow the breadboard diagram shown next to build the distance measurement device. The diagram shown on the next page is quite complex, so take your time as you work through it. All the components (including the Arduino board) in this prototype are powered from the 9-volt battery. Sometimes the LCD procured online might not ship with soldered header pins. In such a case, you will have to solder 16 header pins. It is very important to note that unless the header pins are soldered properly into the LCD board, the LCD screen will not work correctly. This is a challenging prototype to get working in one go. Make sure there are no loose jumper wires. Notice how the positive and negative terminals of the power source are plugged into the VIN and GND pins of the Arduino board respectively. The 10K potentiometer has three legs. If you look straight at the breadboard diagram, the left-hand leg of the potentiometer is connected to the 5V power supply rail of the breadboard. Figure 7 - Typical potentiometers The right-hand leg is connected to the common ground rail of the breadboard. The leg in the middle is the regulated (via the potentiometer's 10K resistance dial) output that controls the LCD's V0/VEE pin. Basically, this pin controls the contrast of the display. You will also have to adjust the 10K potentiometer dial (a simple screwdriver may be used) to around halfway, at 5K, to make the characters visible on the LCD screen. Initially, you may not see anything on the LCD until the potentiometer is adjusted properly. Figure 8 - Distance measurement device prototype When the Trig pin receives a signal (via pin D8 in this example), the sensor sends out ultrasonic waves to the surroundings. As soon as the ultrasonic waves collide with an obstacle, they get reflected. The reflected ultrasonic waves are received by the HC-SR04 sensor. The Echo pin is used to read the output of the ultrasonic sensor (via pin D7 in this example). The output read from the Echo pin is processed by the Arduino sketch to calculate the distance.

Summary

Thus, an IoT edge device serves as a bridge between existing machines on the ground and an IoT cloud platform.

Resources for Article: Further resources on this subject: Introducing IoT with Particle's Photon and Electron [article] IoT and Decision Science [article] IoT Analytics for the Cloud [article]


Massive Graphs on Big Data

Packt
11 Jul 2017
19 min read
In this article by Rajat Mehta, author of the book Big Data Analytics with Java, we will learn about graphs. Graphs theory is one of the most important and interesting concepts of computer science. Graphs have been implemented in real life in a lot of use cases. If you use a GPS on your phone or a GPS device and it shows you a driving direction to a place, behind the scene there is an efficient graph that is working for you to give you the best possible direction. In a social network you are connected to your friends and your friends are connected to other friends and so on. This is a massive graph running in production in all the social networks that you use. You can send messages to your friends or follow them or get followed all in this graph. Social networks or a database storing driving directions all involve massive amounts of data and this is not data that can be stored on a single machine, instead this is distributed across a cluster of thousands of nodes or machines. This massive data is nothing but big data and in this article we will learn how data can be represented in the form of a graph so that we make analysis or deductions on top of these massive graphs. In this article, we will cover: A small refresher on the graphs and its basic concepts A small introduction on graph analytics, its advantages and how Apache Spark fits in Introduction on the GraphFrames library that is used on top of Apache Spark Before we dive deeply into each individual section, let's look at the basic graph concepts in brief. (For more resources related to this topic, see here.) Refresher on graphs In this section, we will cover some of the basic concepts of graphs, and this is supposed to be a refresher section on graphs. This is a basic section, hence if some users already know this information they can skip this section. Graphs are used in many important concepts in our day-to-day lives. Before we dive into the ways of representing a graph, let's look at some of the popular use cases of graphs (though this is not a complete list) Graphs are used heavily in social networks In finding driving directions via GPS In many recommendation engines In fraud detection in many financial companies In search engines and in network traffic flows In biological analysis As you must have noted earlier, graphs are used in many applications that we might be using on a daily basis. Graphs is a form of a data structure in computer science that helps in depicting entities and the connection between them. So, if there are two entities such as Airport A and Airport B and they are connected by a flight that takes, for example, say a few hours then Airport A and Airport B are the two entities and the flight connecting between them that takes those specific hours depict the weightage between them or their connection. In formal terms, these entities are called as vertexes and the relationship between them are called as edges. So, in mathematical terms, graph G = {V, E},that is, graph is a function of vertexes and edges. Let's look at the following diagram for the simple example of a graph: As you can see, the preceding graph is a set of six vertexes and eight edges,as shown next: Vertexes = {A, B, C, D, E, F} Edges = { AB, AF, BC, CD, CF, DF, DE, EF} These vertexes can represent any entities, for example, they can be places with the edges being 'distances' between the places or they could be people in a social network with the edges being the type of relationship, for example, friends or followers. 
Thus, graphs can represent real-world entities like this. The preceding graph is also called a bidirected graph because in this graph the edges go in either direction that is the edge from A to B can be traversed both ways from A to B as well as from B to A. Thus, the edges in the preceding diagrams that is AB can be BA or AF can be FA too. There are other types of graphs called as directed graphs and in these graphs the direction of the edges go in one way only and does not retrace back. A simple example of a directed graph is shown as follows:. As seen in the preceding graph, the edge A to B goes only in one direction as well as the edge B to C. Hence, this is a directed graph. A simple linked list data structure or a tree datastructure are also forms of graph only. In a tree, nodes can have children only and there are no loops, while there is no such rule in a general graph. Representing graphs Visualizing a graph makes it easily comprehensible but depicting it using a program requires two different approaches Adjacency matrix: Representing a graph as a matrix is easy and it has its own advantages and disadvantages. Let's look at the bidirected graph that we showed in the preceding diagram. If you would represent this graph as a matrix, it would like this: The preceding diagram is a simple representation of our graph in matrix form. The concept of matrix representation of graph is simple—if there is an edge to a node we mark the value as 1, else, if the edge is not present, we mark it as 0. As this is a bi-directed graph, it has edges flowing in one direction only. Thus, from the matrix, the rows and columns depict the vertices. There if you look at the vertex A, it has an edge to vertex B and the corresponding matrix value is 1. As you can see, it takes just one step or O[1]to figure out an edge between two nodes. We just need the index (rows and columns) in the matrix and we can extract that value from it. Also, if you would have looked at the matrix closely, you would have seen that most of the entries are zero, hence this is a sparse matrix. Thus, this approach eats a lot of space in computer memory in marking even those elements that do not have an edge to each other, and this is its main disadvantage. Adjacency list: Adjacency list solves the problem of space wastage of adjacency matrix. To solve this problem, it stores the node and its neighbors in a list (linked list) as shown in the following diagram: For maintaining brevity, we have not shown all the vertices but you can make out from the diagram that each vertex is storing its neighbors in a linked list. So when you want to figure out the neighbors of a particular vertex, you can directly iterate over the list. Of course this has the disadvantage of iterating when you have to figure out whether an edge exists between two nodes or not. This approach is also widely used in many algorithms in computer science. We have briefly seen how graphs can be represented, let's now see some important terms that are used heavily on graphs. Common terminology on graphs We will now introduce you to some common terms and concepts in graphs that you can use in your analytics on top of graphs: Vertices:As we mentioned earlier, vertices are the mathematical terms for the nodes in a graph. For analytic purposes, thevertices count shows the number of nodes in the system, for example, the number of people involved in a social graph. Edges: As we mentioned earlier, edges are the connection between vertices and edges can carry weights. 
The number of edges represents the number of relations in a graph. The weight on an edge represents the intensity of the relationship between the nodes involved; for example, in a social network, a 'friend' relationship is stronger than a 'followed' relationship between nodes.

Degrees: This represents the total number of connections flowing into as well as out of a node. For example, in the previous diagram the degree of node F is four. The degree count is useful; for example, in a social network graph a very high degree count can indicate how well a person is connected.

Indegrees: This represents the number of connections flowing into a node. For example, in the previous diagram, for node F the indegree value is three. In a social network graph, this might represent how many people can send messages to this person or node.

Outdegrees: This represents the number of connections flowing out of a node. For example, in the previous diagram, for node F the outdegree value is one. In a social network graph, this might represent how many people this person or node can send messages to.

Common algorithms on graphs

Let's look at some common algorithms that are frequently run on graphs and some of their uses:

Breadth first search: Breadth first search is an algorithm for graph traversal or searching. As the name suggests, the traversal occurs across the breadth of the graph, that is to say, the neighbors of the node from where the traversal starts are searched first before exploring further in the same manner. We will refer to the same graph we used earlier: If we start at vertex A, then according to breadth first search we next visit the neighbors of A, that is, B and F. After that, we visit the neighbor of B, which is C. Next we go to the neighbours of F, which are E and D. We only go through each node once, and this mimics real-life travel as well, as to reach from one point to another we seldom cover the same road or path again. Thus, our breadth first traversal starting from A will be {A, B, F, C, D, E}. Breadth first search is very useful in graph analytics and can tell us things such as the friends that are not your immediate friends but just at the next level after your immediate friends in a social network, or, in the case of a graph of a flight network, it can show flights with just a single stop or two stops to the destination.

Depth first search: This is another way of searching where we start from the source vertex and keep on searching until we reach the end node or the leaf node, and then we backtrack. This approach is often less convenient than breadth first search for nearest-neighbor questions as it can require lots of traversals. So if you want to know whether a node A is connected to a node B, you might end up searching along a lot of wasteful nodes that do not have anything to do with the original nodes A and B before arriving at the appropriate solution.

Dijkstra's shortest path: This is a greedy algorithm to find the shortest path in a graph network. So in a weighted graph, if you need to find the shortest path between two nodes, you can start from the starting node and keep on greedily picking the next node in the path to be the one with the least weight (in the case of weights being distances between nodes, as in city graphs depicting interconnecting cities and roads). So in a road network, you can find the shortest path between two cities using this algorithm.
PageRank algorithm: This is a very popular algorithm that came out from Google and it essentially is used to find the importance of a web page by figuring out how connected it is to other important websites. It gives a page rank score to each of the websites based on this approach and finally the search results are built based on this score. The best part about this algorithm is it can be applied to other areas in life too, for example, in figuring out the important airports in a flight graph, or figuring out the most important people in a social network group. So much for the basics and refresher on graphs, in the next section, we will now see how graphs can be used in real world in massive datasets such as social network data or in data used in the field of biology. We will also study how graph analytics can be used on top of these graphs to derive exclusive deductions. Plotting graphs There is a handy open source Java library called GraphStream, which can be used to plot graphs and this is very useful specially if you want to view the structure of your graphs. While viewing, you can also figure out if some of the vertices are very close to each other (clustered) or in general how they are placed. Using the GraphStream library is easy. Just download the jar from http://graphstream-project.org and put it in the classpath of your project. Next, we will show a simple example demonstrating how easy it is to plot a graph using this library. Just create an instance of a graph. For our example, we will create a simple DefaultGraph and name it SimpleGraph. Next, we will add the nodes or vertices of the graph. We will also add the attribute of the label that is displayed on the vertice. Graph graph = newDefaultGraph("SimpleGraph"); graph.addNode("A" ).setAttribute("ui.label", "A"); graph.addNode("B" ).setAttribute("ui.label", "B"); graph.addNode("C" ).setAttribute("ui.label", "C"); After building the nodes, it's now time to connect these nodes using the edges. The API is simple to use and on the graph instance we can define the edges, provided an ID is given to them and the starting and ending nodes are also given.   graph.addEdge("AB", "A", "B"); graph.addEdge("BC", "B", "C"); graph.addEdge("CA", "C", "A"); All the information of nodes and edges is present on the graph instance. It's now time to plot this graph on the UI and we can just invoke the display method on the graph instance as shown next and display it on the UI. graph.display(); This would plot the graph on the UI as follows: This library is extensive and it will be good learning experience to explore this library further and we would urge the readers to further explore this library on their own. Massive graphs on big data Big data comprises huge amount of data distributed across a cluster of thousands (if not more) of machines. Building graphs based on this massive data has different challenges shown as follows: Due to the vast amount of data involved, the data for the graph is distributed across a cluster of machines. Hence, in actuality, it's not a single node graph and we have to build a graph that spans across a cluster of machines. A graph that spans across a cluster of machines would have vertices and edges spread across different machines and this data in a graph won't fit into the memory of one single machine. Consider your friend's list on Facebook; some of your friend's data in your Facebook friend list graph might lie on different machines and this data might be just tremendous in size. 
Look at an example diagram of a graph of 10 Facebook friends and their network shown as follows: As you can see in the preceding diagram, when for just 10 friends the data can be huge, and here since the graph is drawn by hand we have not even shown a lot of connections to make the image comprehensible, but in real life each person can have say more than thousands of connections. So imagine what will happen to a graph with say thousands if not more people on the list. As shown in the reasons we just saw, building massive graphs on big data is a different ball game altogether and there are few main approaches for building this massive graphs. From the perspective of big data building the massive graphs involve running and storing data parallely on many nodes. The two main approaches are bulk synchronous parallely and the pregel approach. Apache Spark follows the pregel approach. Covering these approaches in detail is out of scope of this book and if the users are interested more on these topics they should refer to other books and the Wikipedia for the same. Graph analytics The biggest advantage to using graphs is you can analyze these graphs and use them for analyzing complex datasets. You might ask what is so special about graph analytics that we can't do by relational databases. Let's try to understand this using an example, suppose we want to analyze your friends network on Facebook and pull information about your friends such as their name, their birth date, their recent likes, and so on. If Facebook had a relational database, then this would mean firing a query on some table using the foreign key of the user requesting this info. From the perspective of relational database, this first level query is easy. But what if we now ask you to go to the friends at level four in your network and fetch their data (as shown in the following diagram). The query to get this becomes more and more complicated from a relational database perspective but this is a trivial task on a graph or graphical database (such as Neo4j). Graphs are extremely good on operations where you want to pull information from one end of the node to another, where the other node lies after a lot of joins and hops. As such, graph analytics is good for certain use cases (but not for all use cases, relation database are still good on many other use cases). As you can see, the preceding diagram depicts a huge social network (though the preceding diagram might just be depicting a network of a few friends only). The dots represent actual people in a social network. So if somebody asks to pick one user on the left-most side of the diagram and see and follow host connections to the right-most side and pull the friends at the say 10th level or more, this is something very difficult to do in a normal relational database and doing it and maintaining it could easily go out of hand. There are four particular use cases where graph analytics is extremely useful and used frequently (though there are plenty more use cases too) Path analytics: As the name suggests, this analytics approach is used to figure out the paths as you traverse along the nodes of a graph. There are many fields where this can be used—simplest being road networks and figuring out details such as shortest path between cities, or in flight analytics to figure out the shortest time taking flight or direct flights. Connectivity analytics: As the name suggests, this approach outlines how the nodes within a graph are connected to each other. 
So using this you can figure out how many edges are flowing into a node and how many are flowing out of the node. This kind of information is very useful in analysis. For example, in a social network if there is a person who receives just one message but gives out say ten messages within his network then this person can be used to market his favorite products as he is very good in responding to messages. Community Analytics: Some graphs on big data are huge. But within these huge graphs there might be nodes that are very close to each other and are almost stacked in a cluster of their own. This is useful information as based on this you can extract out communities from your data. For example, in a social network if there are people who are part of some community, say marathon runners, then they can be clubbed into a single community and further tracked. Centrality Analytics: This kind of analytical approach is useful in finding central nodes in a network or graph. This is useful in figuring out sources that are single handedly connected to many other sources. It is helpful in figuring out influential people in a social network, or a central computer in a computer network. From the perspective of this article, we will be covering some of these use cases in our sample case studies and for this we will be using a library on Apache Spark called GraphFrames. GraphFrames GraphX library is advanced and performs well on massive graphs, but, unfortunately, it's currently only implemented in Scala and does not have any direct Java API. GraphFrames is a relatively new library that is built on top of Apache Spark and provides support for dataframe (now dataset) based graphs.It contains a lot of methods that are direct wrappers over the underlying sparkx methods. As such it provides similar functionality as GraphX except that GraphX acts on the Spark SRDD and GraphFrame works on the dataframe so GraphFrame is more user friendly (as dataframes are simpler to use). All the advantages of firing Spark SQL queries, joining datasets, filtering queries are all supported on this. To understand GraphFrames and representing massive big data graphs, we will take small baby steps first by building some simple programs using GraphFrames before building full-fledged case studies. Summary In this article, we learned about graph analytics. We saw how graphs can be built even on top of massive big datasets. We learned how Apache Spark can be used to build these massive graphs and in the process we learned about the new library GraphFrames that helps us in building these graphs. Resources for Article: Further resources on this subject: Saying Hello to Java EE [article] Object-Oriented JavaScript [article] Introduction to JavaScript [article]

Introduction to HoloLens

Packt
11 Jul 2017
10 min read
In this article, Abhijit Jana, Manish Sharma, and Mallikarjuna Rao, the authors of the book, HoloLens Blueprints, we will be covering the following points to introduce you to using HoloLens for exploratory data analysis. Digital Reality - Under the Hood Holograms in reality Sketching the scenarios 3D Modeling workflow Adding Air Tap on speaker Real-time visualization through HoloLens (For more resources related to this topic, see here.) Digital Reality - Under the Hood Welcome to the world of Digital Reality. The purpose of Digital Reality is to bring immersive experiences, such as taking or transporting you to different world or places, make you interact within those immersive, mix digital experiences with reality, and ultimately open new horizons to make you more productive. Applications of Digital Reality are advancing day by day; some of them are in the field of gaming, education, defense, tourism, aerospace, corporate productivity, enterprise applications, and so on. The spectrum and scenarios of Digital Reality are huge. In order to understand them better, they are broken down into three different categories:  Virtual Reality (VR): It is where you are disconnected from the real world and experience the virtual world. Devices available on the market for VR are Oculus Rift, Google VR, and so on. VR is the common abbreviation of Virtual Reality. Augmented Reality (AR): It is where digital data is overlaid over the real world. Pokémon GO, one of the very famous games, is an example of the this globally. A device available on the market, which falls under this category, is Google Glass. Augmented Reality is abbreviated to AR. Mixed Reality (MR): It spreads across the boundary of the real environment and VR. Using MR, you can have a seamless and immersive integration of the virtual and the real world. Mixed Reality is abbreviated to MR. This topic is mainly focused on developing MR applications using Microsoft HoloLens devices. Although these technologies look similar in the way they are used, and sometimes the difference is confusing to understand, there is a very clear boundary that distinguishes these technologies from each other. As you can see in the following diagram, there is a very clear distinction between AR and VR. However, MR has a spectrum, that overlaps across all three boundaries of real world, AR, and MR. Digital Reality Spectrum The following table describes the differences between the three: Holograms in reality Till now, we have mentioned Hologram several times. It is evident that these are crucial for HoloLens and Holographic apps, but what is a Hologram? Virtual Reality Complete Virtual World User is completely isolated from the Real World Device examples: Oculus Rift and Google VR Augmented Reality Overlays Data over the real world Often used for mobile devices Device example: Google Glass Application example: Pokémon GO Mixed Reality Seamless integration of the real and virtual world Virtual world interacts with Real world Natural interactions Device examples: HoloLens and Meta Holograms are the virtual objects which will be made up with light and sound and blend with the real world to give us an immersive MR experience with both real and virtual worlds. In other words, a Hologram is an object like any other real-world object; the only difference is that it is made up of light rather than matter. The technology behind making holograms is known as Holography. 
The following figure represent two holographic objects placed on the top of a real-size table and gives the experience of placing a real object on a real surface: Holograms objects in real environment Interacting with holograms There are basically five ways that you can interact with holograms and HoloLens. Using your Gaze, Gesture, and Voice and with spatial audio and spatial mapping. Spatial mapping provides a detailed illustration of the real-world surfaces in the environment around HoloLens. This allows developers to understand the digitalized real environments and mix holograms into the world around you. Gaze is the most usual and common one, and we start the interaction with it. At any time, HoloLens would know what you are looking at using Gaze. Based on that, the device can take further decisions on the gesture and voice that should be targeted. Spatial audio is the sound coming out from HoloLens and we use spatial audio to inflate the MR experience beyond the visual. HoloLens Interaction Model Sketching the scenarios The next step after elaborating scenario details is to come up with sketches for this scenario. There is a twofold purpose for sketching; first, it will be input to the next phase of asset development for the 3D Artist, as well as helping to validate requirements from the customer, so there are no surprises at the time of delivery. For sketching, either the designer can take it up on their own and build sketches, or they can take help from the 3D Artist. Let's start with the sketch for the primary view of the scenario, where the user is viewing the HoloLens's hologram: Roam around the hologram to view it from different angles Gaze at different interactive components Sketch for user viewing hologram for the HoloLens Sketching - interaction with speakers While viewing the hologram, a user can gaze at different interactive components. One such component, identified earlier, is the speaker. At the time of gazing at the speaker, it should be highlighted and the user can then Air Tap at it. The Air Tap action should expand the speaker hologram and the user should be able to view the speaker component in detail. Sketch for expanded speakers After the speakers are expanded, the user should be able to visualize the speaker components in detail. Now, if the user Air Taps on the expanded speakers, the application should do the following: Open the textual detail component about the speakers; the user can read the content and learn about the speakers in detail Start voice narration, detailing speaker details The user can also Air Tap on the expanded speaker component, and this action should close the expanded speaker Textual and voice narration for speaker details  As you did sketching for the speakers, apply a similar approach and do sketching for other components, such as lenses, buttons, and so on. 3D Modeling workflow Before jumping to 3D Modeling, let's understand the 3D Modeling workflow across different tools that we are going to use during the course of this topic. The following diagram explains the flow of the 3D Modeling workflow: Flow of 3D Modeling workflow Adding Air Tap on speaker In this project, we will be using the left-side speaker for applying Air Tap on speaker. However, you can apply the same for the right-side speaker as well. Similar to Lenses, we have two objects here which we need to identify from the object explorer. 
Navigate to Left_speaker_geo and left_speaker_details_geo in the Object Hierarchy window
Tag them as leftspeaker and speakerDetails respectively

By default, when you are just viewing the holograms, we will be hiding the speaker details section. This section only becomes visible when we do the Air Tap, and hides again when we Air Tap once more:

Speaker with Box Collider

Add a new script inside the Scripts folder, and name it ShowHideBehaviour. This script will handle the show and hide behaviour of the speakerDetails game object. Use the following script inside the ShowHideBehaviour.cs file. We can reuse this script for any other object we want to show or hide.

public class ShowHideBehaviour : MonoBehaviour
{
    public GameObject showHideObject;
    public bool showhide = false;

    private void Start()
    {
        try
        {
            MeshRenderer render = showHideObject.GetComponentInChildren<MeshRenderer>();
            if (render != null)
            {
                render.enabled = showhide;
            }
        }
        catch (System.Exception)
        {
        }
    }
}

The script finds the MeshRenderer component from the gameObject and enables or disables it based on the showhide property. In this script, showhide is exposed as a public property, so that you can provide the reference of the object from the Unity scene itself.

Attach ShowHideBehaviour.cs as a component of the object tagged speakerDetails. Then drag and drop the object into the showHideObject property section. This just takes the reference of the current speaker details object and will hide the object in the first instance.

Attach show-hide script to the object

By default, showhide is unchecked (set to false), so the details are hidden from view. At this point in time, you must check left_speaker_details_geo back on, as we are now handling its visibility using code.

Now, in the Air Tapped event handler, we can enable the renderer to make the object visible. Add a new script by navigating the context menu Create | C# Script, and name it SpeakerGestureHandler. Open the script file in Visual Studio. Similar to ShowHideBehaviour, by default the SpeakerGestureHandler class inherits from MonoBehaviour. In the next step, implement the IInputClickHandler interface in the SpeakerGestureHandler class. This interface defines the method OnInputClicked() that is invoked on click input. So, whenever you do an Air Tap gesture, this method is invoked.

RaycastHit hit;
bool isTapped = false;

public void OnInputClicked(InputEventData eventData)
{
    hit = GazeManager.Instance.HitInfo;
    if (hit.transform.gameObject != null)
    {
        isTapped = !isTapped;
        var lftSpeaker = GameObject.FindWithTag("leftspeaker");
        var lftSpeakerDetails = GameObject.FindWithTag("speakerDetails");
        MeshRenderer render = lftSpeakerDetails.GetComponentInChildren<MeshRenderer>();
        if (isTapped)
        {
            lftSpeaker.transform.Translate(0.0f, -1.0f * Time.deltaTime, 0.0f);
            render.enabled = true;
        }
        else
        {
            lftSpeaker.transform.Translate(0.0f, 1.0f * Time.deltaTime, 0.0f);
            render.enabled = false;
        }
    }
}

When the speaker is gazed at and tapped, we find the game objects for both the left speaker and the speaker details by their tag names. For the left speaker object, we apply a transformation based on whether it is tapped or not, which works like what we did for the lenses. In the case of the speaker details object, we also take the reference of its MeshRenderer and set its visibility to true or false based on the Air Tap.

Attach the SpeakerGestureHandler class to the leftSpeaker game object.

Air Tap on speaker – see it in action

The Air Tap action for the speaker is also done. Save the scene, and build and run the solution in the emulator once again.
When you can see the cursor on the speaker, perform the Air Tap.

Default View and Air Tapped View

Real-time visualization through HoloLens
We have learned about the data ingress flow, where devices connect with the IoT Hub, and Stream Analytics processes the stream of data and pushes it to storage. Now, in this section, let's discuss how this stored data will be consumed for data visualization within the holographic application.

Solution to consume data through services

Summary
In this article, we used HoloLens to explore Digital Reality - Under the Hood, Holograms in reality, Sketching the scenarios, the 3D Modeling workflow, Adding Air Tap on speaker, and Real-time visualization through HoloLens.

Resources for Article:
Further resources on this subject:
Creating Controllers with Blueprints [article]
Raspberry Pi LED Blueprints [article]
Exploring and Interacting with Materials using Blueprints [article]

HPC Cluster Computing

Packt
10 Jul 2017
8 min read
In this article by Raja Malleswara Rao Pattamsetti, the author of the book Distributed Computing in Java 9, we look at how the processing capabilities required by organizational applications have grown beyond what a single sequential computer configuration can provide, something that can be addressed to an extent by increasing processor capacity and other resource allocation. While this can improve performance for a while, future computational requirements are restricted by how many more powerful processors can be added and by the cost incurred in producing such powerful systems. There is also a need for efficient algorithms and practices to get the best results out of them. A practical and economic substitute for these single high-power computers is to establish multiple lower-powered processors that work collectively and coordinate their processing capabilities. The result is a powerful system: parallel computers that permit the processing activities to be distributed among multiple low-capacity computers, which together obtain the expected result.

In this article, we will cover the following topics:

Era of Computing
Commanding parallel system architectures
MPP: Massively Parallel Processors
SMP: Symmetric Multiprocessors
CC-NUMA: Cache Coherent Nonuniform Memory Access
Distributed systems
Clusters
Java support for High-Performance Computing

(For more resources related to this topic, see here.)

Era of Computing
Technological developments in the computing industry are rapid, helped by advancements in system software and hardware. The hardware advancements are centered around processors and their manufacturing techniques, and high-performance processors are being produced at amazingly low cost. The hardware advancements are further boosted by high-bandwidth and low-latency network systems. Very Large Scale Integration (VLSI) as a concept brought several phenomenal advancements in producing commanding sequential and parallel processing computers. Simultaneously, system software advancements have improved the abilities of operating systems and advanced software programming development techniques. Two commanding computing eras have been observed, namely the Sequential and Parallel Eras of Computing. The following diagram shows the advancements in both eras of computing from the last few decades along with the forecast for the next two decades:

In each of these eras, it is observed that hardware architecture growth is trailed by system software advancements, which means that as more powerful hardware evolved, correspondingly advanced software programs and operating system capabilities doubled in strength. Applications and problem-solving environments are added to this with the advent of parallel computing, mini to microprocessor development, and clustered computers. Let's now review some of the commanding parallel system architectures from the last few years.

Commanding parallel system architectures
Over the last few years, multiple varieties of computing models providing great processing performance have been developed. They are classified depending on how memory is allocated to their processors and how their alignment is designed.
Some of the important parallel systems are as follows:

MPP: Massively Parallel Processors
SMP: Symmetric Multiprocessors
CC-NUMA: Cache Coherent Nonuniform Memory Access
Distributed systems
Clusters

Let's now understand each of these system architectures in a little more detail and review some of their important performance characteristics.

Massively Parallel Processors (MPP)
Massively Parallel Processors (MPP), as the name suggests, are huge parallel processing systems developed in a shared-nothing architecture. Such systems usually contain a large number of processing nodes that are integrated with a high-speed interconnect network switch. A node is nothing but an element of a computer with a diverse combination of hardware components, usually containing a memory unit and more than one processor. Some special-purpose nodes are designed to have backup disks or additional storage capacity. The following diagram represents the massively parallel processors architecture:

Symmetric Multiprocessors (SMP)
Symmetric Multiprocessors (SMP), as the name suggests, contain a limited set of processors, ranging from 2 to 64, and share most of the resources among those processors. One instance of the operating system operates together on all of these connected processors while they commonly share the I/O, memory, and the network bus. A similar set of processors connected and acting together under one operating system is the essential behavior of symmetric multiprocessors. The following diagram depicts the symmetric multiprocessors representation:

Cache Coherent Nonuniform Memory Access (CC-NUMA)
Cache Coherent Nonuniform Memory Access (CC-NUMA) is a special type of multiprocessor system with scalable additional processor capability. In CC-NUMA, the SMP system and the other remote nodes communicate through a remote interconnection link. Each of the remote nodes contains local memory and processors of its own. The following diagram represents the cache coherent nonuniform memory access architecture:

The nature of memory access is nonuniform. Just like symmetric multiprocessors, a CC-NUMA system has a comprehensive view of the entire system memory, and, as the name suggests, it takes nonuniform time to access either close or distant memory locations.

Distributed systems
Distributed systems, as we have been discussing in previous articles, are traditionally individual sets of computers interconnected through an Internet/intranet, each running its own operating system. Any diverse set of computer systems can be part of a distributed system, including combinations of Massively Parallel Processors, Symmetric Multiprocessors, distinct computers, and clusters.

Clusters
A cluster is an assembly of terminals or computers connected through an internetwork to address a large processing requirement through parallel processing. Hence, clusters are usually configured with terminals or computers having higher processing capabilities connected over a high-speed internetwork. Usually, a cluster is considered a single image that integrates a number of nodes with a shared pool of resources.
The following diagram is a sample clustered Tomcat server for a typical J2EE web application deployment:

The following table shows some of the important performance characteristics of the different parallel architecture systems discussed so far:

Network of workstations
A network of workstations is a group of connected resources, such as system processors, network interfaces, storage, and data disks, that opens the space for newer combinations such as:

Parallel computing: As discussed in the previous sections, the group of system processors can be connected as MPP or DSM, which provides parallel processing ability.
RAM network: As a number of systems are connected, each system's memory can collectively work as a DRAM cache, which greatly expands the virtual memory available to the entire system and improves its processing ability.
Software Redundant Array of Inexpensive Disks (RAID): A group of systems connected in an array over the local area network improves the system's stability, availability, and storage capacity with the help of multiple low-cost machines. This also provides simultaneous I/O support.
Multipath communication: Multipath communication is a technique of using more than one network connection between the workstations in the network to allow simultaneous information exchange among system nodes.

Java support for High-Performance Computing
Java provides numerous advantages for HPC (High-Performance Computing) as a programming language, particularly with Grid computing. Some of the key benefits of using Java in HPC include the following (a small illustrative sketch follows this list):

Portability: The capability to write a program on one platform and port it to run on any other operating platform has been the biggest strength of the Java language. This continues to be an advantage when porting Java applications to HPC systems. In Grid computing, this is an essential feature, as the execution environment is decided during the execution of the program. This is possible since Java byte code executes in the Java Virtual Machine (JVM), which itself acts as an abstract operating environment.
Network centricity: As discussed in previous articles, Java provides great support for distributed systems with its network-centric features of remote procedure calls (RPC) through RMI and CORBA services. Along with these, Java's support for socket programming is another great asset for Grid computing.
Software engineering: Another great feature of Java is the way it can be used for engineering solutions. We can produce more object-oriented, loosely coupled software that can be independently added as an API through JAR files in multiple other systems. Encapsulation and interface-based programming are great advantages in such application development.
Security: Java is a highly secure programming language, with features like byte code verification to limit resource utilization by untrusted intruder software or programs. This is a great advantage when running applications in a distributed environment with remote communication established over a shared network.
GUI development: Java's support for platform-independent GUI development is a great advantage when developing and deploying enterprise web applications that interact with an HPC environment.
Availability: Java has great support from multiple operating systems, including Windows, Linux, SGI, and Compaq. It is easily available to consume and develop with, together with a great set of open source frameworks developed on top of it.
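The book's own examples are not part of this excerpt, so the following is only a rough, illustrative sketch (not taken from the book) of the kind of shared-memory parallelism these features enable. It splits a simple summation across the available processor cores using the standard java.util.concurrent utilities; the class name and the workload are arbitrary assumptions chosen for illustration:

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {

    public static void main(String[] args) throws Exception {
        int processors = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(processors);

        // Some data to work on; in a real HPC job this would come from disk or the network.
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }

        // Split the array into one chunk per core and submit each chunk as a task.
        int chunk = data.length / processors;
        List<Future<Long>> partials = new ArrayList<>();
        for (int p = 0; p < processors; p++) {
            final int from = p * chunk;
            final int to = (p == processors - 1) ? data.length : from + chunk;
            partials.add(pool.submit(new Callable<Long>() {   // requires Java 8+ for capturing 'data'
                @Override
                public Long call() {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }
            }));
        }

        // Combine the partial results; get() blocks until each task has finished.
        long total = 0;
        for (Future<Long> f : partials) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("Total: " + total);
    }
}

Scaling the same idea across machines rather than cores is where the RMI, socket, and Grid facilities mentioned in the list come into play.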
While the previous points list the advantages of using Java for HPC environments, some concerns need to be reviewed while designing Java applications, including its numerics (complex numbers, fastfp, multidimensional arrays), performance, and parallel computing designs.

Summary
In this article, you have learned about the eras of computing, commanding parallel system architectures, MPP: Massively Parallel Processors, SMP: Symmetric Multiprocessors, CC-NUMA: Cache Coherent Nonuniform Memory Access, distributed systems, clusters, and Java support for High-Performance Computing.

Resources for Article:
Further resources on this subject:
The Amazon S3 SDK for Java [article]
Gradle with the Java Plugin [article]
Getting Started with Sorting Algorithms in Java [article]

Queues and topics

Packt
10 Jul 2017
8 min read
In this article by Luca Stancapiano, the author of the book Mastering Java EE Development with WildFly, we will see how to implement Java Message Service (JMS) in a queue channel using the WildFly console.

(For more resources related to this topic, see here.)

JMS works with channels of messages that manage the messages asynchronously. These channels contain messages that they will collect or remove according to the configuration and the type of channel. The channels are of two types, queues and topics, and they are highly configurable through the WildFly console. As with all components in WildFly, they can be installed through the console, the command line, or directly with the Maven plugins of the project. In the next two paragraphs we will show what they mean and all the possible configurations.

Queues
Queues collect the sent messages that are waiting to be read. The messages are delivered in the order they are sent, and they are removed from the queue when they are read.

Create the queue from the web console
Let's see the steps to create a new queue through the web console:

Connect to http://localhost:9990/.
Go to Configuration | Subsystems/Messaging - ActiveMQ/default and click on Queues/Topics.
Now select the Queues menu and click on the Add button. You will see this screen:

The parameters to insert are as follows:

Name: The name of the queue.
JNDI Names: The JNDI names the queue will be bound to.
Durable?: Whether the queue is durable or not.
Selector: The queue selector.

As with all enterprise components, JMS components are callable through the Java Naming and Directory Interface (JNDI).

Durable queues keep messages around persistently for any suitable consumer to consume them. Durable queues do not need to concern themselves with which consumer is going to consume the messages at some point in the future. There is just one copy of a message that any consumer in the future can consume.

Message selectors allow filtering of the messages that a MessageConsumer will receive. The filter is a relatively complex language similar to the syntax of an SQL WHERE clause. The selector can use all the message headers and properties for filtering operations, but cannot use the message content. Selectors are mostly useful for channels that broadcast a very large number of messages to their subscribers. On queues, only messages that match the selector will be returned. Others stay in the queue (and thus can be read by a MessageConsumer with a different selector).

The following SQL elements are allowed in our filters and we can put them in the Selector field of the form:

AND, OR, NOT: Logical operators. Example: (releaseYear < 1986) AND NOT (title = 'Bad')
String literals: String literals in single quotes; duplicate the quote to escape it. Example: title = 'Tom''s'
Number literals: Numbers in Java syntax; they can be double or integer. Example: releaseYear = 1982
Properties: Message properties that follow Java identifier naming. Example: releaseYear = 1983
Boolean literals: TRUE and FALSE. Example: isAvailable = FALSE
( ): Round brackets. Example: (releaseYear < 1981) OR (releaseYear > 1990)
BETWEEN: Checks whether a number is in a range (both numbers inclusive). Example: releaseYear BETWEEN 1980 AND 1989
Header fields: Any headers except JMSDestination, JMSExpiration, and JMSReplyTo. Example: JMSPriority = 10
=, <>, <, <=, >, >=: Comparison operators. Example: (releaseYear < 1986) AND (title <> 'Bad')
LIKE: String comparison with wildcards '_' and '%'. Example: title LIKE 'Mirror%'
IN: Finds a value in a set of strings. Example: title IN ('Piece of mind', 'Somewhere in time', 'Powerslave')
IS NULL, IS NOT NULL: Checks whether a value is null or not null. Example: releaseYear IS NULL
*, +, -, /: Arithmetic operators. Example: releaseYear * 2 > 2000 - 18

A consumer-side sketch showing how such a selector can also be applied from code follows.
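This article configures the selector from the admin console; as a complementary, illustrative sketch (not taken from the book), a JMS 2.0 consumer can also pass a selector when it is created. The bean name is an assumption, the queue reuses the GPS queue that the rest of this article creates, and the priority-based filter is just an assumed example:

import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.inject.Inject;
import javax.jms.JMSConsumer;
import javax.jms.JMSContext;
import javax.jms.Queue;

@Stateless
public class FilteredCoordinatesReader {

    @Inject
    private JMSContext context;

    @Resource(mappedName = "java:/jms/queue/GPS")
    private Queue queue;

    public String readHighPriorityCoordinate() {
        // Only messages whose JMSPriority header is greater than 7 are returned;
        // the others stay in the queue for consumers with different selectors.
        JMSConsumer consumer = context.createConsumer(queue, "JMSPriority > 7");
        try {
            return consumer.receiveBody(String.class, 1000); // wait at most one second
        } finally {
            consumer.close();
        }
    }
}

The selector string itself follows exactly the grammar described in the preceding list, so the same expression could be pasted into the Selector field of the console form instead.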
Fill in the form now:

In this article we will implement a messaging service to send the coordinates of buses. The queue is created and shown in the queues list:

Create the queue using the CLI and the Maven WildFly plugin
The same thing can be done with the Command Line Interface (CLI). Start a WildFly instance, go to the bin directory of WildFly, and execute the following script:

bash-3.2$ ./jboss-cli.sh
You are disconnected at the moment. Type 'connect' to connect to the server or 'help' for the list of supported commands.
[disconnected /] connect
[standalone@localhost:9990 /] /subsystem=messaging-activemq/server=default/jms-queue=gps_coordinates:add(entries=["java:/jms/queue/GPS"])
{"outcome" => "success"}

The same thing can be done through Maven. Simply add this snippet to your pom.xml:

<plugin>
  <groupId>org.wildfly.plugins</groupId>
  <artifactId>wildfly-maven-plugin</artifactId>
  <version>1.0.2.Final</version>
  <executions>
    <execution>
      <id>add-resources</id>
      <phase>install</phase>
      <goals>
        <goal>add-resource</goal>
      </goals>
      <configuration>
        <resources>
          <resource>
            <address>subsystem=messaging-activemq,server=default,jms-queue=gps_coordinates</address>
            <properties>
              <durable>true</durable>
              <entries>!!["gps_coordinates", "java:/jms/queue/GPS"]</entries>
            </properties>
          </resource>
        </resources>
      </configuration>
    </execution>
    <execution>
      <id>del-resources</id>
      <phase>clean</phase>
      <goals>
        <goal>undeploy</goal>
      </goals>
      <configuration>
        <afterDeployment>
          <commands>
            <command>/subsystem=messaging-activemq/server=default/jms-queue=gps_coordinates:remove</command>
          </commands>
        </afterDeployment>
      </configuration>
    </execution>
  </executions>
</plugin>

The Maven WildFly plugin lets you perform admin operations in WildFly using the same custom protocol used by the command line. Two executions are configured:

add-resources: It hooks the install Maven phase and adds the queue, passing the name, JNDI, and durable parameters seen in the previous paragraph.
del-resources: It hooks the clean Maven phase and removes the chosen queue by name.

Create the queue through an Arquillian test case
Alternatively, we can add and remove the queue through an Arquillian test case:

@RunWith(Arquillian.class)
@ServerSetup(MessagingResourcesSetupTask.class)
public class MessageTestCase {
    ...
    private static final String QUEUE_NAME = "gps_coordinates";
    private static final String QUEUE_LOOKUP = "java:/jms/queue/GPS";

    static class MessagingResourcesSetupTask implements ServerSetupTask {
        @Override
        public void setup(ManagementClient managementClient, String containerId) throws Exception {
            getInstance(managementClient.getControllerClient()).createJmsQueue(QUEUE_NAME, QUEUE_LOOKUP);
        }

        @Override
        public void tearDown(ManagementClient managementClient, String containerId) throws Exception {
            getInstance(managementClient.getControllerClient()).removeJmsQueue(QUEUE_NAME);
        }
    }
    ...
}

The Arquillian org.jboss.as.arquillian.api.ServerSetup annotation lets you use an external setup manager to install or remove components inside WildFly. In this case we are installing the queue declared with the two variables QUEUE_NAME and QUEUE_LOOKUP. When the test ends, the tearDown method is started automatically and it will remove the installed queue.

To use Arquillian it's important to add the WildFly test suite dependency to your pom.xml project:

...
<dependencies>
  <dependency>
    <groupId>org.wildfly</groupId>
    <artifactId>wildfly-testsuite-shared</artifactId>
    <version>10.1.0.Final</version>
    <scope>test</scope>
  </dependency>
</dependencies>
...

Going into standalone-full.xml we will find the created queue as:

<subsystem >
  <server name="default">
    ...
    <jms-queue name="gps_coordinates" entries="java:/jms/queue/GPS"/>
    ...
  </server>
</subsystem>

JMS is available in the standalone-full configuration. By default WildFly supports 4 standalone configurations. They can be found in the standalone/configuration directory:

standalone.xml: It supports all components except messaging and corba/iiop
standalone-full.xml: It supports all components
standalone-ha.xml: It supports all components except messaging and corba/iiop, with clustering enabled
standalone-full-ha.xml: It supports all components, with clustering enabled

To start WildFly with the chosen configuration, simply add -c with the configuration to the standalone.sh script. Here is a sample to start the standalone full configuration:

./standalone.sh -c standalone-full.xml

Create the Java client for the queue
Let's now see how to create a client to send a message to the queue. JMS 2.0 greatly simplifies the creation of clients. Here is a sample of a client inside a stateless Enterprise Java Bean (EJB):

@Stateless
public class MessageQueueSender {

    @Inject
    private JMSContext context;

    @Resource(mappedName = "java:/jms/queue/GPS")
    private Queue queue;

    public void sendMessage(String message) {
        context.createProducer().send(queue, message);
    }
}

The javax.jms.JMSContext is injectable from any EE component. We will see the JMS context in detail in the next paragraph, The JMS Context. The queue is represented in JMS by the javax.jms.Queue class. It can be injected as a JNDI resource through the @Resource annotation. The JMS context, through the createProducer method, creates a producer, represented by the javax.jms.JMSProducer class, which is used to send the messages.

We can now create a client injecting the stateless bean and sending the string message hello!:

...
@EJB
private MessageQueueSender messageQueueSender;
...
messageQueueSender.sendMessage("hello!");
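The excerpt only shows the sending side. Purely as an illustrative sketch (the listener class name and the log output are assumptions, not part of the article), a message-driven bean is the usual way to consume such messages asynchronously from the same queue:

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationLookup",
                              propertyValue = "java:/jms/queue/GPS"),
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue")
})
public class GpsCoordinatesListener implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            // The producer in this article sends plain text, so read the body as a String.
            String coordinates = message.getBody(String.class);
            System.out.println("Received GPS coordinates: " + coordinates);
        } catch (JMSException e) {
            throw new RuntimeException(e);
        }
    }
}

The container creates and pools instances of the bean and calls onMessage for every message that arrives on java:/jms/queue/GPS, so no explicit receive loop is needed.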
Summary
In this article we have seen how to implement Java Message Service in a queue channel using the web console, the Command Line Interface and Maven WildFly plugins, and Arquillian test cases, and how to create Java clients for the queue.

Resources for Article:
Further resources on this subject:
WildFly – the Basics [article]
WebSockets in Wildfly [article]
Creating Java EE Applications [article]

Thread of Execution

Packt
10 Jul 2017
6 min read
In this article by Anton Polukhin Alekseevic, the author of the book Boost C++ Application Development Cookbook - Second Edition, we will look at the multithreading concept.

Multithreading means that multiple threads of execution exist within a single process. Threads may share process resources and also have their own resources. Those threads of execution may run independently on different CPUs, leading to faster and more responsive programs. Let's see how to create a thread of execution.

(For more resources related to this topic, see here.)

Creating a thread of execution
On modern multicore computers, to achieve maximal performance (or just to provide a good user experience), programs usually must use multiple threads of execution. Here is a motivating example in which we need to create and fill a big file in a thread that draws the user interface:

#include <algorithm>
#include <fstream>
#include <iterator>

bool is_first_run();

// Function that executes for a long time.
void fill_file(char fill_char, std::size_t size, const char* filename);

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    // This will be executing for a long time during which
    // the user interface freezes.
    fill_file(0, 8 * 1024 * 1024, "save_file.txt");
}

Getting ready
This recipe requires knowledge of boost::bind or std::bind.

How to do it...
Starting a thread of execution was never so easy:

#include <boost/thread.hpp>

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    boost::thread(boost::bind(
        &fill_file,
        0,
        8 * 1024 * 1024,
        "save_file.txt"
    )).detach();
}

How it works...
The boost::thread variable accepts a functional object that can be called without parameters (we provided one using boost::bind) and creates a separate thread of execution. That functional object is copied into the constructed thread of execution and run there.

We are using version 4 of Boost.Thread in all recipes (BOOST_THREAD_VERSION is defined to 4). Important differences between Boost.Thread versions are highlighted.

After that, we call the detach() function, which does the following:

The thread of execution is detached from the boost::thread variable but continues its execution
The boost::thread variable starts to hold a Not-A-Thread state

Note that without a call to detach(), the destructor of boost::thread will notice that it still holds an OS thread and will call std::terminate. std::terminate terminates our program without calling destructors, freeing up resources, or doing any other cleanup.

Default constructed threads also have a Not-A-Thread state, and they do not create a separate thread of execution.

There's more...
What if we want to make sure that the file was created and written before doing some other job? In that case we need to join the thread in the following way:

// ...
// Somewhere in thread that draws a user interface:
if (is_first_run()) {
    boost::thread t(boost::bind(
        &fill_file,
        0,
        8 * 1024 * 1024,
        "save_file.txt"
    ));

    // Do some work.
    // ...

    // Waiting for thread to finish.
    t.join();
}

After the thread is joined, the boost::thread variable holds a Not-A-Thread state and its destructor does not call std::terminate.

Remember that the thread must be joined or detached before its destructor is called. Otherwise, your program will terminate!

With BOOST_THREAD_VERSION=2 defined, the destructor of boost::thread calls detach(), which does not lead to std::terminate.
However, doing so breaks compatibility with std::thread, and some day, when your project moves to the C++ standard library threads or when BOOST_THREAD_VERSION=2 is no longer supported, this will give you a lot of surprises. Version 4 of Boost.Thread is more explicit and strict, which is usually preferable in the C++ language.

Beware that std::terminate() is called when any exception that is not of type boost::thread_interrupted leaves the boundary of the functional object that was passed to the boost::thread constructor.

There is a very helpful wrapper that works as a RAII wrapper around the thread and allows you to emulate the BOOST_THREAD_VERSION=2 behavior; it is called boost::scoped_thread<T>, where T can be one of the following classes:

boost::interrupt_and_join_if_joinable: To interrupt and join the thread at destruction
boost::join_if_joinable: To join the thread at destruction
boost::detach: To detach the thread at destruction

Here is a small example:

#include <boost/thread/scoped_thread.hpp>

void some_func();

void example_with_raii() {
    boost::scoped_thread<boost::join_if_joinable> t(
        boost::thread(&some_func)
    );
    // 't' will be joined at scope exit.
}

The boost::thread class was accepted as a part of the C++11 standard and you can find it in the <thread> header in the std:: namespace. There is no big difference between Boost's version 4 and the C++11 standard library version of the thread class. However, boost::thread is available on C++03 compilers, so its usage is more versatile.

There is a very good reason for calling std::terminate instead of joining by default! The C and C++ languages are often used in life-critical software. Such software is controlled by other software, called a watchdog. A watchdog can easily detect that an application has terminated, but cannot always detect deadlocks, or detects them only with bigger delays. For example, for defibrillator software it's safer to terminate than to hang on join() for a few seconds waiting for a watchdog reaction. Keep that in mind while designing such applications.

See also
All the recipes in this chapter use Boost.Thread. You may continue reading to get more information about the library. The official documentation has a full list of the boost::thread methods and remarks about their availability in the C++11 standard library. The official documentation can be found at http://boost.org/libs/thread. The Interrupting a thread recipe will give you an idea of what the boost::interrupt_and_join_if_joinable class does.

Summary
We saw how to create a thread of execution using some easy techniques.

Resources for Article:
Further resources on this subject:
Introducing the Boost C++ Libraries [article]
Boost.Asio C++ Network Programming [article]
Application Development in Visual C++ - The Tetris Application [article]

When do we use R over Python?

Packt
10 Jul 2017
16 min read
In this article by Prabhanjan Tattar, author of the book Practical Data Science Cookbook - Second Edition, we see that Python is an interpreted language (sometimes referred to as a scripting language), much like R. It requires no special IDE or software compilation tools and is therefore as fast as R to develop with and prototype. Like R, it also makes use of C shared objects to improve computational performance. Additionally, Python is a default system tool on Linux, Unix, and Mac OS X machines and is available on Windows. Python comes with batteries included, which means that the standard library is widely inclusive of many modules, from multiprocessing to compression toolsets. Python is a flexible computing powerhouse that can tackle any problem domain. If you find yourself in need of libraries that are outside of the standard library, Python also comes with a package manager (like R) that allows the download and installation of other code bases.

(For more resources related to this topic, see here.)

Python's computational flexibility means that some analytical tasks take more lines of code than their counterpart in R. However, Python does have the tools that allow it to perform the same statistical computing. This leads to an obvious question: when do we use R over Python and vice versa? This article attempts to answer this question by taking an application-oriented approach to statistical analyses.

From books to movies to people to follow on Twitter, recommender systems carve the deluge of information on the Internet into a more personalized flow, thus improving the performance of e-commerce, web, and social applications. It is no great surprise, given the success of Amazon-monetizing recommendations and the Netflix Prize, that any discussion of personalization or data-theoretic prediction would involve a recommender. What is surprising is how simple recommenders are to implement, yet how susceptible they are to the vagaries of sparse data and overfitting.

Consider a non-algorithmic approach to eliciting recommendations; one of the easiest ways to garner a recommendation is to look at the preferences of someone we trust. We are implicitly comparing our preferences to theirs, and the more similarities you share, the more likely you are to discover novel, shared preferences. However, everyone is unique, and our preferences exist across a variety of categories and domains. What if you could leverage the preferences of a great number of people and not just those you trust? In the aggregate, you would be able to see patterns, not just of people like you, but also anti-recommendations: things to stay away from, cautioned by the people not like you. You would, hopefully, also see subtle delineations across the shared preference space of groups of people who share parts of your own unique experience.

Understanding the data
Understanding your data is critical to all data-related work. In this recipe, we acquire and take a first look at the data that we will be using to build our recommendation engine.

Getting ready
To prepare for this recipe, and the rest of the article, download the MovieLens data from the GroupLens website of the University of Minnesota. You can find the data at http://grouplens.org/datasets/movielens/. In this recipe, we will use the smaller MovieLens 100k dataset (4.7 MB in size) in order to load the entire model into memory with ease.
How to do it…
Perform the following steps to better understand the data that we will be working with throughout:

Download the data from http://grouplens.org/datasets/movielens/. The 100K dataset is the one that you want (ml-100k.zip).
Unzip the downloaded data into the directory of your choice.
The two files that we are mainly concerned with are u.data, which contains the user movie ratings, and u.item, which contains movie information and details. To get a sense of each file, use the head command at the command prompt for Mac and Linux or the more command for Windows:

head -n 5 u.item

Note that if you are working on a computer running the Microsoft Windows operating system and not using a virtual machine (not recommended), you do not have access to the head command; instead, use the following command:

more u.item 2 n

The preceding command gives you the following output:

1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0

The following command will produce the given output:

head -n 5 u.data

For Windows, you can use the following command:

more u.data 2 n

196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244 51 2 880606923
166 346 1 886397596

How it works…
The two main files that we will be using are as follows:

u.data: This contains the user movie ratings
u.item: This contains the movie information and other details

Both are character-delimited files; u.data, which is the main file, is tab delimited, and u.item is pipe delimited. For u.data, the first column is the user ID, the second column is the movie ID, the third is the star rating, and the last is the timestamp. The u.item file contains much more information, including the ID, title, release date, and even a URL to IMDb. Interestingly, this file also has a Boolean array indicating the genre(s) of each movie, including (in order) action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film-noir, horror, musical, mystery, romance, sci-fi, thriller, war, and western.

There's more…
Free, web-scale datasets that are appropriate for building recommendation engines are few and far between. As a result, the MovieLens dataset is a very popular choice for such a task, but there are others as well. The well-known Netflix Prize dataset has been pulled down by Netflix. However, there is a dump of all user-contributed content from the Stack Exchange network (including Stack Overflow) available via the Internet Archive (https://archive.org/details/stackexchange). Additionally, there is a book-crossing dataset that contains over a million ratings of about a quarter million different books (http://www2.informatik.uni-freiburg.de/~cziegler/BX/).

Ingesting the movie review data
Recommendation engines require large amounts of training data in order to do a good job, which is why they're often relegated to big data projects.
However, to build a recommendation engine, we must first get the required data into memory and, due to the size of the data, must do so in a memory-safe and efficient way. Luckily, Python has all of the tools to get the job done, and this recipe shows you how.

Getting ready
You will need to have the appropriate MovieLens dataset downloaded, as specified in the preceding recipe. If you skipped the setup, you will need to go back and ensure that you have NumPy correctly installed.

How to do it…
The following steps guide you through the creation of the functions that we will need in order to load the datasets into memory:

Open your favorite Python editor or IDE. There is a lot of code, so it should be far simpler to enter it directly into a text file than into the Read-Eval-Print Loop (REPL).

We create a function to import the movie reviews:

In [1]:
import csv
from datetime import datetime

In [2]:
def load_reviews(path, **kwargs):
    """
    Loads MovieLens reviews
    """
    options = {
        'fieldnames': ('userid', 'movieid', 'rating', 'timestamp'),
        'delimiter': '\t',
    }
    options.update(kwargs)

    parse_date = lambda r, k: datetime.fromtimestamp(float(r[k]))
    parse_int = lambda r, k: int(r[k])

    with open(path, 'rb') as reviews:
        reader = csv.DictReader(reviews, **options)
        for row in reader:
            row['movieid'] = parse_int(row, 'movieid')
            row['userid'] = parse_int(row, 'userid')
            row['rating'] = parse_int(row, 'rating')
            row['timestamp'] = parse_date(row, 'timestamp')
            yield row

We create a helper function to help import the data:

In [3]:
import os

def relative_path(path):
    """
    Returns a path relative from this code file
    """
    dirname = os.path.dirname(os.path.realpath('__file__'))
    path = os.path.join(dirname, path)
    return os.path.normpath(path)

We create another function to load the movie information:

In [4]:
def load_movies(path, **kwargs):
    options = {
        'fieldnames': ('movieid', 'title', 'release', 'video', 'url'),
        'delimiter': '|',
        'restkey': 'genre',
    }
    options.update(kwargs)

    parse_int = lambda r, k: int(r[k])
    parse_date = lambda r, k: datetime.strptime(r[k], '%d-%b-%Y') if r[k] else None

    with open(path, 'rb') as movies:
        reader = csv.DictReader(movies, **options)
        for row in reader:
            row['movieid'] = parse_int(row, 'movieid')
            row['release'] = parse_date(row, 'release')
            row['video'] = parse_date(row, 'video')
            yield row

Finally, we start creating a MovieLens class that will be augmented later:

In [5]:
from collections import defaultdict

In [6]:
class MovieLens(object):
    """
    Data structure to build our recommender model on.
    """

    def __init__(self, udata, uitem):
        """
        Instantiate with a path to u.data and u.item
        """
        self.udata = udata
        self.uitem = uitem
        self.movies = {}
        self.reviews = defaultdict(dict)
        self.load_dataset()

    def load_dataset(self):
        """
        Loads the two datasets into memory, indexed on the ID.
        """
        for movie in load_movies(self.uitem):
            self.movies[movie['movieid']] = movie

        for review in load_reviews(self.udata):
            self.reviews[review['userid']][review['movieid']] = review

Ensure that the functions have been imported into your REPL or the IPython workspace, and type the following, making sure that the path to the data files is appropriate for your system:

In [7]:
data = relative_path('../data/ml-100k/u.data')
item = relative_path('../data/ml-100k/u.item')
model = MovieLens(data, item)

How it works…
The methodology that we use for the two data-loading functions (load_reviews and load_movies) is simple, but it takes care of the details of parsing the data from the disk. We created a function that takes a path to our dataset and then any optional keywords. We know that we have specific ways in which we need to interact with the csv module, so we create default options, passing in the field names of the rows along with the delimiter, which is a tab. The options.update(kwargs) line means that we'll accept whatever users pass to this function.

We then created internal parsing functions using lambda functions in Python. These simple parsers take a row and a key as input and return the converted input. This is an example of using lambdas as internal, reusable code blocks and is a common technique in Python. Finally, we open our file and create a csv.DictReader function with our options. Iterating through the rows in the reader, we parse the fields that we want to be int and datetime, respectively, and then yield the row.

Note that as we are unsure about the actual size of the input file, we are doing this in a memory-safe manner using Python generators. Using yield instead of return ensures that Python creates a generator under the hood and does not load the entire dataset into memory. We'll use each of these methodologies to load the datasets at various times through our computation that uses this dataset. We'll need to know where these files are at all times, which can be a pain, especially in larger code bases; in the There's more… section, we'll discuss a Python pro-tip to alleviate this concern.

Finally, we created a data structure, which is the MovieLens class, with which we can hold our reviews' data. This structure takes the udata and uitem paths, and then it loads the movies and reviews into two Python dictionaries that are indexed by movieid and userid, respectively. To instantiate this object, you will execute something as follows:

In [7]:
data = relative_path('../data/ml-100k/u.data')
item = relative_path('../data/ml-100k/u.item')
model = MovieLens(data, item)

Note that the preceding commands assume that you have your data in a folder called data. We can now load the whole dataset into memory, indexed on the various IDs specified in the dataset.

Did you notice the use of the relative_path function? When dealing with fixtures such as these to build models, the data is often included with the code. When you specify a path in Python, such as data/ml-100k/u.data, it is looked up relative to the current working directory where you ran the script.
To help ease this trouble, you can specify paths that are relative to the code itself:

import os

def relative_path(path):
    """
    Returns a path relative from this code file
    """
    dirname = os.path.dirname(os.path.realpath('__file__'))
    path = os.path.join(dirname, path)
    return os.path.normpath(path)

Keep in mind that this holds the entire data structure in memory; in the case of the 100k dataset, this will require 54.1 MB, which isn't too bad for modern machines. However, we should also keep in mind that we'll generally build recommenders using far more than just 100,000 reviews. This is why we have configured the data structure the way we have, very similar to a database. To grow the system, you will replace the reviews and movies properties with database access functions or properties, which will yield data types expected by our methods.

Finding the highest-scoring movies
If you're looking for a good movie, you'll often want to see the most popular or best-rated movies overall. Initially, we'll take a naïve approach to computing a movie's aggregate rating by averaging the user reviews for each movie. This technique will also demonstrate how to access the data in our MovieLens class.

Getting ready
These recipes are sequential in nature. Thus, you should have completed the previous recipes in the article before starting with this one.

How to do it…
Follow these steps to output numeric scores for all movies in the dataset and compute a top-10 list:

Augment the MovieLens class with a new method to get all reviews for a particular movie:

In [8]:
class MovieLens(object):
    ...
    def reviews_for_movie(self, movieid):
        """
        Yields the reviews for a given movie
        """
        for review in self.reviews.values():
            if movieid in review:
                yield review[movieid]

Then, add an additional method to compute the top 10 movies reviewed by users:

In [9]:
import heapq
from operator import itemgetter

class MovieLens(object):
    ...
    def average_reviews(self):
        """
        Averages the star rating for all movies. Yields a tuple of movieid,
        the average rating, and the number of reviews.
        """
        for movieid in self.movies:
            reviews = list(r['rating'] for r in self.reviews_for_movie(movieid))
            average = sum(reviews) / float(len(reviews))
            yield (movieid, average, len(reviews))

    def top_rated(self, n=10):
        """
        Yields the n top rated movies
        """
        return heapq.nlargest(n, self.average_reviews(), key=itemgetter(1))

Note that the ... notation just below class MovieLens(object): signifies that we will be appending the average_reviews method to the existing MovieLens class.

Now, let's print the top-rated results:

In [10]:
for mid, avg, num in model.top_rated(10):
    title = model.movies[mid]['title']
    print "[%0.3f average rating (%i reviews)] %s" % (avg, num, title)

Executing the preceding commands in your REPL should produce the following output:

Out [10]:
[5.000 average rating (1 reviews)] Entertaining Angels: The Dorothy Day Story (1996)
[5.000 average rating (2 reviews)] Santa with Muscles (1996)
[5.000 average rating (1 reviews)] Great Day in Harlem, A (1994)
[5.000 average rating (1 reviews)] They Made Me a Criminal (1939)
[5.000 average rating (1 reviews)] Aiqingwansui (1994)
[5.000 average rating (1 reviews)] Someone Else's America (1995)
[5.000 average rating (2 reviews)] Saint of Fort Washington, The (1993)
[5.000 average rating (3 reviews)] Prefontaine (1997)
[5.000 average rating (3 reviews)] Star Kid (1997)
[5.000 average rating (1 reviews)] Marlene Dietrich: Shadow and Light (1996)

How it works…
The new reviews_for_movie() method that is added to the MovieLens class iterates through our review dictionary values (which are indexed by the userid parameter), checks whether the movieid value has been reviewed by that user, and then yields that review dictionary. We will need such functionality for the next method.

With the average_reviews() method, we have created another generator function that goes through all of our movies and all of their reviews and yields the movie ID, the average rating, and the number of reviews. The top_rated function uses the heapq module to quickly sort the reviews based on the average.

The heapq data structure, also known as the priority queue algorithm, is the Python implementation of an abstract data structure with interesting and useful properties. Heaps are binary trees that are built so that every parent node has a value that is either less than or equal to any of its child nodes. Thus, the smallest element is the root of the tree, which can be accessed in constant time, which is a very desirable property. With heapq, Python developers have an efficient means to insert new values into an ordered data structure and also return sorted values.

There's more…
Here, we run into our first problem: some of the top-rated movies only have one review (and conversely, so do the worst-rated movies). How do you compare Casablanca, which has a 4.457 average rating (243 reviews), with Santa with Muscles, which has a 5.000 average rating (2 reviews)? We are sure that those two reviewers really liked Santa with Muscles, but the high rating for Casablanca is probably more meaningful because more people liked it. Most recommenders with star ratings will simply output the average rating along with the number of reviewers, allowing the user to determine their quality; however, as data scientists, we can do better in the next recipe.

See also
The heapq documentation available at https://docs.python.org/2/library/heapq.html

We have thus pointed out that companies such as Amazon track purchases and page views to make recommendations, Goodreads and Yelp use 5-star ratings and text reviews, and sites such as Reddit or Stack Overflow use simple up/down voting. You can see that preference can be expressed in the data in different ways, from Boolean flags to voting to ratings. However, these preferences are expressed by attempting to find groups of similarities in preference expressions in which you are leveraging the core assumption of collaborative filtering.
More formally, we understand that two people, Bob and Alice, share a preference for a specific item or widget. If Alice also has a preference for a different item, say, a sprocket, then Bob has a better than random chance of also sharing a preference for the sprocket. We believe that Bob and Alice's taste similarities can be expressed in aggregate via a large number of preferences, and by leveraging the collaborative nature of groups, we can filter the world of products.

Summary
In these recipes we learned various ways of understanding data and finding the highest-scoring reviews using IPython.

Resources for Article:
Further resources on this subject:
The Data Science Venn Diagram [article]
Python Data Science Up and Running [article]
Data Science with R [article]

Android UIs with Custom Views

Packt
10 Jul 2017
11 min read
In this article by Raimon Rafols Montane, the author of the book Building Android UIs with Custom Views, we will see the very basic steps we'll need to get started building Android custom views, where we should use them, and where we should simply rely on the standard Android widgets.

(For more resources related to this topic, see here.)

Why do we need custom views
There are lovely Android applications on Google Play and in other markets, such as Amazon, built only using the standard Android UI widgets and layouts. There are also many other applications that have that small additional feature that makes our interaction with them easier or simply more pleasing. There is no magic formula, but maybe by just adding something different, something that makes the user feel "hey, it's not just another app", we might increase our user retention. It might not be the deal breaker, but it can definitely make the difference sometimes.

A few years ago, the author was working in the pre-sales department of a mobile app development agency, and when the app Path was launched to the market, we got many, many requests from our customers to build menus like the Path app menu. It wasn't a critical feature for the applications they were building, but at that point it was a cool thing and many other applications and products wanted to have something similar. Even if it was a simple detail, it made the Path application special at that time. It wasn't the case at that time, but nowadays there are many implementations of that menu published on GitHub as open source, although they're not really maintained anymore:

https://github.com/daCapricorn/ArcMenu
https://github.com/oguzbilgener/CircularFloatingActionMenu

One of the main reasons to create our own custom views for our mobile application is, precisely, to have something special. It might be a menu, a component, a screen, something that might be really needed or even the main functionality of our application, or just an additional feature. In addition, by creating our own custom view we can actually optimize the performance of our application. We can create a specific way of laying out widgets that would otherwise need many hierarchy layers when using only standard Android layouts, or a custom view that simplifies rendering or user interaction.

On the other hand, we can easily fall into the error of trying to build everything custom. Android provides an awesome set of widget and layout components that manage a lot of things for us. If we ignore the basic Android framework and try to build everything by ourselves, it would be a lot of work. We would potentially struggle with a lot of issues and errors that the Android OS developers already faced, or at least very similar ones; to put it in one sentence, we would be reinventing the wheel.

Examples in the market
We all probably use great apps that are built only using the standard Android UI widgets and layouts, but there are many others that have some custom views that we don't know about or haven't really noticed. The custom views or layouts can sometimes be very subtle and hard to spot. We would not be the first to have a custom view or layout in our application. In fact, many popular apps have some custom elements in them. For example, the Etsy application had a custom layout called StaggeredGridView. It was even published as open source on GitHub. It has been deprecated since 2015 in favor of Google's own StaggeredGridLayoutManager used together with RecyclerView.
For more information refer to:

https://github.com/etsy/AndroidStaggeredGrid
https://developer.android.com/reference/android/support/v7/widget/StaggeredGridLayoutManager.html

Parameterizing our custom view
We have a custom view that adapts to multiple sizes now; that's good, but what happens if we need another custom view that paints the background blue instead of red? And yellow? We shouldn't have to copy the custom view class for each customization. Luckily, we can set parameters on the XML layout and read them from our custom view.

First, we need to define the type of parameters we will use on our custom view. We have to create a file called attrs.xml:

<?xml version="1.0" encoding="utf-8"?>
<resources>
    <declare-styleable name="OwnCustomView">
        <attr name="fillColor" format="color"/>
    </declare-styleable>
</resources>

Then, we have to add a new namespace to the layout file where we want to use the new parameter:

<?xml version="1.0" encoding="utf-8"?>
<ScrollView
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    android:orientation="vertical"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <LinearLayout
        android:layout_width="match_parent"
        android:layout_height="wrap_content"
        android:orientation="vertical"
        android:padding="@dimen/activity_vertical_margin">

        <com.packt.rrafols.customview.OwnCustomView
            android:layout_width="match_parent"
            android:layout_height="wrap_content"
            app:fillColor="@android:color/holo_blue_dark"/>
    </LinearLayout>
</ScrollView>

Now that we have this defined, let's see how we can read it from our custom view class:

int fillColor;
TypedArray ta = context.getTheme().obtainStyledAttributes(attributeSet, R.styleable.OwnCustomView, 0, 0);
try {
    fillColor = ta.getColor(R.styleable.OwnCustomView_fillColor, DEFAULT_FILL_COLOR);
} finally {
    ta.recycle();
}

By getting a TypedArray using the styled attribute ID that the Android tools created for us after saving the attrs.xml file, we'll be able to query for the values of those parameters set in the XML layout file.

More information about how to obtain a TypedArray can be found in the Android documentation: https://developer.android.com/reference/android/content/res/Resources.Theme.html#obtainStyledAttributes(android.util.AttributeSet, int[], int, int). For more information about TypedArray refer to: https://developer.android.com/reference/android/content/res/TypedArray.html.

In this example, we created an attribute named fillColor, which will be formatted as a color. This format, or basically the type of the attribute, is very important both to limit the kind of values we can set and to determine how these values can be retrieved afterwards from our custom view. Also, for each parameter we define, we'll get an R.styleable.<name>_<parameter_name> index in the TypedArray. In the code above, we're querying for the fillColor using the R.styleable.OwnCustomView_fillColor index. We shouldn't forget to recycle the TypedArray after using it so it can be reused, but once recycled, we can't use it again.
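The attribute only covers values set from XML. As a rough sketch of how the same color could also be changed at runtime (this class is a simplified stand-in, not code from the book, and the names FillableView and setFillColor are assumptions), the important point is that any change to the Paint must be followed by invalidate() so that onDraw() runs again with the new value:

import android.content.Context;
import android.graphics.Canvas;
import android.graphics.Paint;
import android.util.AttributeSet;
import android.view.View;

// Hypothetical minimal view; the attrs.xml handling shown above is omitted for brevity.
public class FillableView extends View {

    private final Paint backgroundPaint = new Paint();

    public FillableView(Context context, AttributeSet attrs) {
        super(context, attrs);
        backgroundPaint.setStyle(Paint.Style.FILL);
        backgroundPaint.setColor(0xffff0000); // default: red
    }

    // Changing the color at runtime: update the Paint and request a redraw.
    public void setFillColor(int color) {
        backgroundPaint.setColor(color);
        invalidate();
    }

    @Override
    protected void onDraw(Canvas canvas) {
        canvas.drawRect(0, 0, getWidth(), getHeight(), backgroundPaint);
    }
}

From an activity or fragment, calling something like fillableView.setFillColor(Color.YELLOW) would then repaint the view without touching the layout XML.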
Many custom views will only need to draw something in a special way; that's the reason we created them as custom views. Many others, however, will need to react to user events. For example, how will our custom view behave when the user clicks or drags on top of it?

Basic event handling

Let's start by adding some basic event handling to our custom views.

Reacting to touches

One of the first things we'd like to implement is reacting to touch events. Android provides us with the onTouchEvent() method that we can override in our custom view. By overriding this method, we'll get any touch event happening on top of it. To see how it works, let's add it to the custom view:

@Override
public boolean onTouchEvent(MotionEvent event) {
    Log.d(TAG, "touch: " + event);
    return super.onTouchEvent(event);
}

Let's also add a log call to see what events we receive. If we run this code and touch on top of our view, we'll get the following:

D/com.packt.rrafols.customview.CircularActivityIndicator: touch: MotionEvent { action=ACTION_DOWN, actionButton=0, id[0]=0, x[0]=644.3645, y[0]=596.55804, toolType[0]=TOOL_TYPE_FINGER, buttonState=0, metaState=0, flags=0x0, edgeFlags=0x0, pointerCount=1, historySize=0, eventTime=30656461, downTime=30656461, deviceId=9, source=0x1002 }

As we can see, there is a lot of information in the event: the coordinates, the action type, the time. But even if we perform more actions on the view, we'll only get ACTION_DOWN events. That's because the default implementation of View is not clickable. If we don't set the clickable flag on the view, onTouchEvent() will return false and ignore further events.

The onTouchEvent() method has to return true if the event has been processed, or false if it hasn't. If we receive an event in our custom view and we don't know what to do with it, or we're not interested in that kind of event, we should return false so it can be processed by our view's parent, by any other component, or by the system.

To receive more types of events, we can do two things:

Set the view as clickable by using setClickable(true)
Implement our own logic and process the events ourselves in our custom class

As we'll implement more complex event handling, we'll go for the second option. Let's do a quick test: change the method to simply return true instead of calling the parent method:

@Override
public boolean onTouchEvent(MotionEvent event) {
    Log.d(TAG, "touch: " + event);
    return true;
}

Now we should receive many other types of events, as follows:

...CircularActivityIndicator: touch: MotionEvent { action=ACTION_DOWN,
...CircularActivityIndicator: touch: MotionEvent { action=ACTION_UP,
...CircularActivityIndicator: touch: MotionEvent { action=ACTION_DOWN,
...CircularActivityIndicator: touch: MotionEvent { action=ACTION_MOVE,
...CircularActivityIndicator: touch: MotionEvent { action=ACTION_MOVE,
...CircularActivityIndicator: touch: MotionEvent { action=ACTION_MOVE,
...CircularActivityIndicator: touch: MotionEvent { action=ACTION_UP,
...CircularActivityIndicator: touch: MotionEvent { action=ACTION_DOWN,

Using the Paint class

We've been drawing some primitives until now, but Canvas provides us with many more primitive rendering methods. We'll briefly cover some of them, but before that, let's first talk about the Paint class, as we haven't introduced it properly. According to the official definition, the Paint class holds the style and color information about how to draw primitives, text, and bitmaps.

If we check the examples we've been building, we created a Paint object in our class constructor, or in the onCreate method, and we used it later to draw primitives in our onDraw method. For instance, as we set the style of our background Paint instance to Paint.Style.FILL, it will fill the primitive; we can change it to Paint.Style.STROKE if we only want to draw the border or the strokes of the silhouette, or draw both using Paint.Style.FILL_AND_STROKE.
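To see the difference between the fill and stroke styles at a glance, here is a small sketch; it assumes we are inside the onDraw() method of a custom view and that fillPaint and strokePaint are two Paint objects created beforehand (these names are used for illustration only and are not part of the example project):

// The same rectangle drawn twice: once filled, once as an outline only.
fillPaint.setStyle(Paint.Style.FILL);
fillPaint.setColor(Color.RED);
canvas.drawRect(50, 50, 250, 150, fillPaint);

strokePaint.setStyle(Paint.Style.STROKE);
strokePaint.setStrokeWidth(8f);
strokePaint.setColor(Color.BLACK);
// Only the 8 pixel border of this rectangle is painted.
canvas.drawRect(300, 50, 500, 150, strokePaint);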
To see Paint.Style.STROKE in action, we'll draw a black border on top of the selected colored bar in our custom view. Let's start by defining a new Paint object called indicatorBorderPaint and initializing it in our class constructor:

indicatorBorderPaint = new Paint();
indicatorBorderPaint.setAntiAlias(false);
indicatorBorderPaint.setColor(BLACK_COLOR);
indicatorBorderPaint.setStyle(Paint.Style.STROKE);
indicatorBorderPaint.setStrokeWidth(BORDER_SIZE);
indicatorBorderPaint.setStrokeCap(Paint.Cap.BUTT);

We also defined a constant with the size of the border line and set the stroke width to this size. If we set the width to 0, Android guarantees it will use a single pixel to draw the line. As we want to draw a thick black border, this is not our case right now.

In addition, we set the stroke cap to Paint.Cap.BUTT to avoid the stroke overflowing its path. There are two more caps we can use, Paint.Cap.ROUND and Paint.Cap.SQUARE. These will end the stroke with, respectively, a circle that rounds the stroke, or a square.

Let's see the differences between the three caps and also introduce the drawLine primitive. First of all, we create an array with all three caps, so we can easily iterate over them and write more compact code:

private Paint.Cap[] CAPS = new Paint.Cap[] {
    Paint.Cap.BUTT,
    Paint.Cap.ROUND,
    Paint.Cap.SQUARE
};

Now, in our onDraw method, let's draw a line using each of the caps with the drawLine(float startX, float startY, float stopX, float stopY, Paint paint) method:

int xPos = (getWidth() - 100) / 2;
int yPos = getHeight() / 2 - BORDER_SIZE * CAPS.length / 2;
for (int i = 0; i < CAPS.length; i++) {
    indicatorBorderPaint.setStrokeCap(CAPS[i]);
    canvas.drawLine(xPos, yPos, xPos + 100, yPos, indicatorBorderPaint);

    yPos += BORDER_SIZE * 2;
}
indicatorBorderPaint.setStrokeCap(Paint.Cap.BUTT);

Summary

In this article, we have seen the reasoning behind why we might want to build custom views and layouts. Android provides a great basic framework for creating UIs, and not using it would be a mistake. Not every component, button, or widget has to be developed completely custom but, by doing so in the right spot, we can add an extra feature that might make our application memorable. We have also seen the basics of event handling.

Resources for Article:

Further resources on this subject:

Offloading work from the UI Thread on Android [article]
Getting started with Android Development [article]
Practical How-To Recipes for Android [article]

article-image-spark-streaming
Packt
06 Jul 2017
11 min read
Save for later

Spark Streaming

Packt
06 Jul 2017
11 min read
In this article by Romeo Kienzler, the author of the book Mastering Apache Spark 2.x - Second Edition, we will see that Spark Streaming is a stream processing-based module within Apache Spark. It uses the Spark cluster to offer the ability to scale to a high degree. Being based on Spark, it is also highly fault tolerant, having the ability to rerun failed tasks by checkpointing the data stream that is being processed.

The following areas will be covered in this article after an initial section, which will provide a practical overview of how Apache Spark processes stream-based data:

Error recovery and checkpointing
TCP-based stream processing
File streams
Kafka stream source

For each topic, we will provide a worked example in Scala, and will show how the stream-based architecture can be set up and tested. (For more resources related to this topic, see here.)

Overview

The following diagram shows potential data sources for Apache Spark Streaming, such as Kafka, Flume, and HDFS:

These feed into the Spark Streaming module and are processed as Discrete Streams. The diagram also shows that other Spark module functionality, such as machine learning, can be used to process the stream-based data. The fully processed data can then be an output for HDFS, databases, or dashboards. This diagram is based on the one at the Spark Streaming website, but we wanted to extend it to express the Spark module functionality:

When discussing Spark Discrete Streams, the previous figure, again taken from the Spark website at http://spark.apache.org/, is the diagram we like to use.

The green boxes in the previous figure show the continuous data stream sent to Spark being broken down into a Discrete Stream (DStream). The size of each element in the stream is then based on a batch time, which might be two seconds. It is also possible to create a window, expressed as the previous red box, over the DStream. For instance, when carrying out trend analysis in real time, it might be necessary to determine the top ten Twitter-based hashtags over a ten-minute window.

So, given that Spark can be used for stream processing, how is a stream created? The following Scala-based code shows how a Twitter stream can be created. This example is simplified because Twitter authorization has not been included, but you get the idea. The Spark Streaming Context (SSC) is created using the Spark context sc. A batch time is specified when it is created; in this case, 5 seconds. A Twitter-based DStream, called stream, is then created from the StreamingContext using a window of 60 seconds:

val ssc = new StreamingContext(sc, Seconds(5))
val stream = TwitterUtils.createStream(ssc, None).window(Seconds(60))

The stream processing can be started with the stream context start method (shown next), and the awaitTermination method indicates that it should process until stopped. So, if this code is embedded in a library-based application, it will run until the session is terminated, perhaps with a Ctrl + C:

ssc.start()
ssc.awaitTermination()
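Before calling start, we would normally attach some processing to the stream; the hashtag trend mentioned earlier is a natural candidate. The following is only a hedged sketch of what that could look like, assuming the twitter4j Status objects returned by TwitterUtils expose the tweet text through getText, and counting over the 60-second window just defined rather than ten minutes:

// Keep only the hashtags from each tweet in the current window
val hashTags = stream.flatMap(status => status.getText.split(" "))
                     .filter(word => word.startsWith("#"))

// Count each hashtag within the window
val counts = hashTags.map(tag => (tag, 1)).reduceByKey(_ + _)

// For each batch, sort by count and print the current top ten
counts.foreachRDD { rdd =>
  rdd.map(_.swap).sortByKey(ascending = false).take(10).foreach(println)
}

This is the same map, reduce, swap, and sort pattern that the TCP-based word count example later in this article uses.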
This explains what Spark Streaming is and what it does, but it does not explain error handling, or what to do if your stream-based application fails. The next section will examine Spark Streaming error management and recovery.

Errors and recovery

Generally, the question that needs to be asked for your application is: is it critical that you receive and process all the data? If not, then on failure you might just be able to restart the application and discard the missing or lost data. If this is not the case, then you will need to use checkpointing, which will be described in the next section.

It is also worth noting that your application's error management should be robust and self-sufficient. What we mean by this is that if an exception is non-critical, then manage the exception, perhaps log it, and continue processing. For instance, when a task reaches the maximum number of failures (specified by spark.task.maxFailures), it will terminate processing.

Checkpointing

It is possible to set up an HDFS-based checkpoint directory to store Apache Spark-based streaming information. In this Scala example, data will be stored in HDFS under /data/spark/checkpoint. The following HDFS filesystem ls command shows that, before starting, the directory does not exist:

[hadoop@hc2nn stream]$ hdfs dfs -ls /data/spark/checkpoint
ls: `/data/spark/checkpoint': No such file or directory

The Twitter-based Scala code sample given next starts by defining a package name for the application, and by importing Spark, the streaming context, and Twitter-based functionality. It then defines an application object named stream1:

package nz.co.semtechsolutions

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.twitter._
import org.apache.spark.streaming.StreamingContext._

object stream1 {

Next, a method called createContext is defined, which will be used to create both the Spark and streaming contexts. It will also checkpoint the stream to the HDFS-based directory using the streaming context checkpoint method, which takes a directory path as a parameter. The directory path is the value (cpDir) that was passed into the createContext method:

  def createContext(cpDir : String) : StreamingContext = {

    val appName = "Stream example 1"
    val conf = new SparkConf()
    conf.setAppName(appName)

    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(5))

    ssc.checkpoint(cpDir)

    ssc
  }

Now, the main method is defined, as is the HDFS directory, as well as the Twitter access authority and parameters. The Spark streaming context ssc is either retrieved or created using the HDFS checkpoint directory via the StreamingContext method getOrCreate. If the directory doesn't exist, then the previous method called createContext is called, which will create the context and checkpoint. Obviously, we have truncated our own Twitter auth keys in this example for security reasons:

  def main(args: Array[String]) {

    val hdfsDir = "/data/spark/checkpoint"

    val consumerKey = "QQpxx"
    val consumerSecret = "0HFzxx"
    val accessToken = "323xx"
    val accessTokenSecret = "IlQxx"

    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)

    val ssc = StreamingContext.getOrCreate(hdfsDir,
      () => { createContext(hdfsDir) })

    val stream = TwitterUtils.createStream(ssc, None).window(Seconds(60))

    // do some processing

    ssc.start()
    ssc.awaitTermination()

  } // end main

Having run this code, which has no actual processing, the HDFS checkpoint directory can be checked again.
This time it is apparent that the checkpoint directory has been created and the data has been stored:

[hadoop@hc2nn stream]$ hdfs dfs -ls /data/spark/checkpoint
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2015-07-02 13:41 /data/spark/checkpoint/0fc3d94e-6f53-40fb-910d-1eef044b12e9

This example, taken from the Apache Spark website, shows how checkpoint storage can be set up and used. But how often is checkpointing carried out? The metadata is stored during each stream batch. The actual data is stored with a period which is the maximum of the batch interval or ten seconds. This might not be ideal for you, so you can reset the value using the following method:

DStream.checkpoint(newRequiredInterval)

Here, newRequiredInterval is the new checkpoint interval value that you require; generally, you should aim for a value that is five to ten times your batch interval.

Checkpointing saves both the stream batch and the metadata (data about the data). If the application fails, then when it restarts, the checkpointed data is used when processing is started. The batch data that was being processed at the time of failure is reprocessed, along with the batched data since the failure. Remember to monitor the HDFS disk space being used for checkpointing.

In the next section, we will begin to examine the streaming sources, and will provide some examples of each type.

Streaming sources

We will not be able to cover all the stream types with practical examples in this section, but where this article is too small to include code, we will at least provide a description. In this article, we will cover the TCP and file streams, and the Flume, Kafka, and Twitter streams. We will start with a practical TCP-based example.

This article examines stream processing architecture. For instance, what happens in cases where the stream data delivery rate exceeds the potential data processing rate? Systems like Kafka provide the possibility of solving this issue by providing the ability to use multiple data topics and consumers.

TCP stream

It is possible to use the Spark Streaming Context method called socketTextStream to stream data via TCP/IP by specifying a hostname and a port number. The Scala-based code example in this section will receive data on port 10777 that was supplied using the Netcat Linux command.

The code sample starts by defining the package name and importing Spark, the context, and the streaming classes. The object class named stream2 is defined, as is the main method with arguments:

package nz.co.semtechsolutions

import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._

object stream2 {

  def main(args: Array[String]) {

The number of arguments passed to the class is checked to ensure that it is the hostname and the port number. A Spark configuration object is created with an application name defined. The Spark and streaming contexts are then created. Then, a streaming batch time of 10 seconds is set:

    if (args.length < 2) {
      System.err.println("Usage: stream2 <host> <port>")
      System.exit(1)
    }

    val hostname = args(0).trim
    val portnum = args(1).toInt

    val appName = "Stream example 2"
    val conf = new SparkConf()
    conf.setAppName(appName)

    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))

A DStream called rawDstream is created by calling the socketTextStream method of the streaming context using the hostname and port number parameters:
    val rawDstream = ssc.socketTextStream(hostname, portnum)

A top-ten word count is created from the raw stream data by splitting words on spaces. Then a (key, value) pair is created as (word, 1), which is reduced by the key value, this being the word. So now there is a list of words and their associated counts. Next, the key and value are swapped, so the list becomes (count, word). Then, a sort is done on the key, which is now the count. Finally, the top ten items in the RDD, within the DStream, are taken and printed out:

    val wordCount = rawDstream
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .map(item => item.swap)
      .transform(rdd => rdd.sortByKey(false))
      .foreachRDD(rdd => {
        rdd.take(10).foreach(x => println("List : " + x))
      })

The code closes with the Spark Streaming start and awaitTermination methods being called to start the stream processing and await process termination:

    ssc.start()
    ssc.awaitTermination()

  } // end main

} // end stream2

The data for this application is provided, as we stated previously, by the Linux Netcat (nc) command. The Linux cat command dumps the contents of a log file, which is piped to nc. The -lk options force Netcat to listen for connections and keep on listening if the connection is lost. This example shows that the port being used is 10777:

[root@hc2nn log]# pwd
/var/log
[root@hc2nn log]# cat ./anaconda.storage.log | nc -lk 10777

The output from this TCP-based stream processing is shown here. The actual output is not as important as the method demonstrated. However, the data shows, as expected, a list of ten log file words in descending count order. Note that the top word is empty because the stream was not filtered for empty words:

List : (17104,)
List : (2333,=)
List : (1656,:)
List : (1603,;)
List : (1557,DEBUG)
List : (564,True)
List : (495,False)
List : (411,None)
List : (356,at)
List : (335,object)

This is interesting if you want to stream data using Apache Spark Streaming, based upon TCP/IP from a host and port. But what about more exotic methods? What if you wish to stream data from a messaging system, or via memory-based channels? What if you want to use some of the big data tools available today, like Flume and Kafka? The next sections will examine these options, but first I will demonstrate how streams can be based upon files.

Summary

We could have provided streaming examples for systems like Kinesis, as well as queuing systems, but there was not room in this article. This article has provided practical examples of data recovery via checkpointing in Spark Streaming. It has also touched on the performance limitations of checkpointing and shown that the checkpointing interval should be set at five to ten times the Spark stream batch interval.

Resources for Article:

Further resources on this subject:

Understanding Spark RDD [article]
Spark for Beginners [article]
Setting up Spark [article]
article-image-ruby-strings
Packt
06 Jul 2017
9 min read
Save for later

Ruby Strings

Packt
06 Jul 2017
9 min read
In this article by Jordan Hudgens, the author of the book Comprehensive Ruby Programming, you'll learn about the Ruby String data type and walk through how to integrate string data into a Ruby program. Working with words, sentences, and paragraphs is a common requirement in many applications. Additionally, you will learn how to:

Employ string manipulation techniques using core Ruby methods
Demonstrate how to work with the string data type in Ruby

(For more resources related to this topic, see here.)

Using strings in Ruby

A string is a data type in Ruby and contains a set of characters, typically normal English text (or whatever natural language you're building your program for), that you would write. A key point for the syntax of strings is that they have to be enclosed in single or double quotes if you want to use them in a program. The program will throw an error if they are not wrapped inside quotation marks. Let's walk through three scenarios.

Missing quotation marks

In this code I tried to simply declare a string without wrapping it in quotation marks. As you can see, this results in an error. This error occurs because Ruby thinks that the values are classes and methods.

Printing strings

In this code snippet we're printing out a string that we have properly wrapped in quotation marks. Please note that both single and double quotation marks work properly. It's also important that you do not mix the quotation mark types. For example, if you attempted to run the code:

puts "Name an animal'

You would get an error, because you need to ensure that every quotation mark is matched with a closing (and matching) quotation mark. If you start a string with double quotation marks, the Ruby parser requires that you end the string with the matching double quotation marks.

Storing strings in variables

Lastly, in this code snippet we're storing a string inside of a variable and then printing the value out to the console. We'll talk more about strings and string interpolation in subsequent sections.

String interpolation guide for Ruby

In this section, we are going to talk about string interpolation in Ruby.

What is string interpolation?

So what exactly is string interpolation? Good question. String interpolation is the process of being able to seamlessly integrate dynamic values into a string.

Let's assume we want to slip dynamic words into a string. We can get input from the console and store that input into variables. From there we can call the variables inside of a pre-existing string. For example, let's give a sentence the ability to change based on a user's input:

puts "Name an animal"
animal = gets.chomp
puts "Name a noun"
noun = gets.chomp

p "The quick brown #{animal} jumped over the lazy #{noun}"

Note the way I insert variables inside the string? They are enclosed in curly brackets and are preceded by a # sign. If I run this code, this is what my output will look like:

So, this is how you insert values dynamically into your sentences. If you look at sites like Twitter, they sometimes display personalized messages such as Good morning Jordan or Good evening Tiffany. This type of behavior is made possible by inserting a dynamic value into a fixed part of a string, and it leverages string interpolation.

Now, let's use single quotes instead of double quotes to see what happens. As you'll see, the string is printed as it is, without inserting the values for animal and noun. This is exactly what happens when you try using single quotes: the entire string is printed as it is, without any interpolation.
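To see the two behaviors side by side, here is a small illustrative snippet (the name used is just a placeholder):

name = "Jordan"

puts "Good morning #{name}"   # double quotes interpolate => Good morning Jordan
puts 'Good morning #{name}'   # single quotes do not      => Good morning #{name}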
Therefore, it's important to remember the difference. Another interesting aspect is that anything inside the curly brackets can be a Ruby script. So, technically, you can type your entire algorithm inside these curly brackets, and Ruby will run it perfectly for you. However, this is not recommended for practical programming purposes. For example, I can insert a math equation, and as you'll see, it prints the value out.

String manipulation guide

In this section we are going to learn about string manipulation, along with a number of examples of how to integrate string manipulation methods into a Ruby program.

What is string manipulation?

So what exactly is string manipulation? It's the process of altering the format or value of a string, usually by leveraging string methods.

String manipulation code examples

Let's start with an example. Let's say I want my application to always display the word Astros in capital letters. To do that, I simply write:

"Astros".upcase

Now if I always want a string to be in lowercase letters, I can use the downcase method, like so:

"Astros".downcase

Those are both methods I use quite often. However, there are other string methods that we also have at our disposal. For the rare times when you want to literally swap the case of the letters, you can leverage the swapcase method:

"Astros".swapcase

And lastly, if you want to reverse the order of the letters in the string, we can call the reverse method:

"Astros".reverse

These methods are built into the String data class and we can call them on any string values in Ruby.

Method chaining

Another neat thing we can do is join different methods together to get custom output. For example, I can run:

"Astros".reverse.upcase

The preceding code displays the value SORTSA. This practice of combining different methods with a dot is called method chaining.

Split, strip, and join guides for strings

In this section, we are going to walk through how to use the split and strip methods in Ruby. These methods will help us clean up strings and convert a string to an array so we can access each word as its own value.

Using the strip method

Let's start off by analyzing the strip method. Imagine that the input you get from the user or from the database is poorly formatted and contains white space before and after the value. To clean the data up we can use the strip method. For example:

str = " The quick brown fox jumped over the quick dog "
p str.strip

When you run this code, the output is just the sentence without the white space before and after the words.

Using the split method

Now let's walk through the split method. The split method is a powerful tool that allows you to split a sentence into an array of words or characters. For example, when you type the following code:

str = "The quick brown fox jumped over the quick dog"
p str.split

You'll see that it converts the sentence into an array of words. This method can be particularly useful for long paragraphs, especially when you want to know the number of words in the paragraph. Since the split method converts the string into an array, you can use all the array methods, like size, to see how many words were in the string. We can leverage method chaining to find out how many words are in the string, like so:

str = "The quick brown fox jumped over the quick dog"
p str.split.size

This should return a value of 9, which is the number of words in the sentence.
To know the number of letters, we can pass an optional argument to the split method and use the following format:

str = "The quick brown fox jumped over the quick dog"
p str.split(//).size

And if you want to see all of the individual letters, we can remove the size method call, like this:

p str.split(//)

And your output should look like this:

Notice that it also included spaces as individual characters, which may or may not be what you want a program to return. This method can be quite handy while developing real-world applications. A good practical example of this method is Twitter. Since this social media site restricts users to 140 characters, this method is sure to be a part of the validation code that counts the number of characters in a Tweet.

Using the join method

We've walked through the split method, which allows you to convert a string into a collection of characters. Thankfully, Ruby also has a method that does the opposite, which is to allow you to convert an array of characters into a single string, and that method is called join.

Let's imagine a situation where we're asked to reverse the words in a string. This is a common Ruby coding interview question, so it's an important concept to understand, since it tests your knowledge of how strings work in Ruby. Let's imagine that we have a string, such as:

str = "backwards am I"

And we're asked to reverse the words in the string. The pseudocode for the algorithm would be:

Split the string into words
Reverse the order of the words
Merge all of the split words back into a single string

We can actually accomplish each of these requirements in a single line of Ruby code. The following code snippet will perform the task:

str.split.reverse.join(' ')

This code will convert the single string into an array of strings; for the example, it will equal ["backwards", "am", "I"]. From there it will reverse the order of the array elements, so the array will equal ["I", "am", "backwards"]. With the words reversed, we now simply need to merge the words into a single string, which is where the join method comes in. Running the join method will convert all of the words in the array into one string.

Summary

In this article, we were introduced to the string data type and how it can be utilized in Ruby. We analyzed how to pass strings into Ruby processes by leveraging string interpolation. We also learned the methods of basic string manipulation and how to find and replace string data. We analyzed how to break strings into smaller components, along with how to clean up string-based data. We even introduced the Array class in this article.

Resources for Article:

Further resources on this subject:

Ruby and Metasploit Modules [article]
Find closest mashup plugin with Ruby on Rails [article]
Building tiny Web-applications in Ruby using Sinatra [article]

article-image-command-line-tools
Packt
06 Jul 2017
9 min read
Save for later

Command-Line Tools

Packt
06 Jul 2017
9 min read
In this article by Aaron Torres, author of the book Go Cookbook, we will cover the following recipes:

Using command-line arguments
Working with Unix pipes
An ANSI coloring application

(For more resources related to this topic, see here.)

Using command-line arguments

This article will expand on other uses for these arguments by constructing a command that supports nested subcommands. This will demonstrate Flagsets and also using positional arguments passed into your application. This recipe requires a main function to run. There are a number of third-party packages for dealing with complex nested arguments and flags, but we'll again investigate doing so using only the standard library.

Getting ready

You need to perform the following steps for the installation:

Download and install Go on your operating system at https://golang.org/doc/install and configure your GOPATH.
Open a terminal/console application.
Navigate to your GOPATH/src and create a project directory, for example, $GOPATH/src/github.com/yourusername/customrepo. All code will be run and modified from this directory.
Optionally, install the latest tested version of the code using the go get github.com/agtorre/go-cookbook/ command.

How to do it...

From your terminal/console application, create and navigate to the chapter2/cmdargs directory.
Copy tests from https://github.com/agtorre/go-cookbook/tree/master/chapter2/cmdargs or use this as an exercise to write some of your own.
Create a file called cmdargs.go with the following content:

package main

import (
    "flag"
    "fmt"
    "os"
)

const version = "1.0.0"
const usage = `Usage: %s [command]

Commands:
  Greet
  Version
`
const greetUsage = `Usage: %s greet name [flag]

Positional Arguments:
  name
        the name to greet

Flags:
`

// MenuConf holds all the levels
// for a nested cmd line argument
type MenuConf struct {
    Goodbye bool
}

// SetupMenu initializes the base flags
func (m *MenuConf) SetupMenu() *flag.FlagSet {
    menu := flag.NewFlagSet("menu", flag.ExitOnError)
    menu.Usage = func() {
        fmt.Printf(usage, os.Args[0])
        menu.PrintDefaults()
    }
    return menu
}

// GetSubMenu returns a flag set for a submenu
func (m *MenuConf) GetSubMenu() *flag.FlagSet {
    submenu := flag.NewFlagSet("submenu", flag.ExitOnError)
    submenu.BoolVar(&m.Goodbye, "goodbye", false, "Say goodbye instead of hello")
    submenu.Usage = func() {
        fmt.Printf(greetUsage, os.Args[0])
        submenu.PrintDefaults()
    }
    return submenu
}

// Greet will be invoked by the greet command
func (m *MenuConf) Greet(name string) {
    if m.Goodbye {
        fmt.Println("Goodbye " + name + "!")
    } else {
        fmt.Println("Hello " + name + "!")
    }
}

// Version prints the current version that is
// stored as a const
func (m *MenuConf) Version() {
    fmt.Println("Version: " + version)
}

Create a file called main.go with the following content:

package main

import (
    "fmt"
    "os"
    "strings"
)

func main() {
    c := MenuConf{}
    menu := c.SetupMenu()
    menu.Parse(os.Args[1:])

    // we use arguments to switch between commands
    // flags are also an argument
    if len(os.Args) > 1 {
        // we don't care about case
        switch strings.ToLower(os.Args[1]) {
        case "version":
            c.Version()
        case "greet":
            f := c.GetSubMenu()
            if len(os.Args) < 3 {
                f.Usage()
                return
            }
            if len(os.Args) > 3 {
                f.Parse(os.Args[3:])
            }
            c.Greet(os.Args[2])
        default:
            fmt.Println("Invalid command")
            menu.Usage()
            return
        }
    } else {
        menu.Usage()
        return
    }
}

Run the go build command.
Run the following commands and try a few other combinations of arguments:

$ ./cmdargs -h
Usage: ./cmdargs [command]

Commands:
  Greet
  Version

$ ./cmdargs version
Version: 1.0.0

$ ./cmdargs greet
Usage: ./cmdargs greet name [flag]

Positional Arguments:
  name
        the name to greet

Flags:
  -goodbye
        Say goodbye instead of hello

$ ./cmdargs greet reader
Hello reader!

$ ./cmdargs greet reader -goodbye
Goodbye reader!

If you copied or wrote your own tests, go up one directory and run go test, and ensure all tests pass.

How it works...

Flagsets can be used to set up independent lists of expected arguments, usage strings, and more. The developer is required to do validation on a number of arguments, parse the right subset of arguments into commands, and define usage strings. This can be error prone and requires a lot of iteration to get it completely correct. The flag package makes parsing arguments much easier and includes convenience methods to get the number of flags, arguments, and more. This recipe demonstrates basic ways to construct a complex command-line application using arguments, including a package-level config, required positional arguments, multi-leveled command usage, and how to split these things into multiple files or packages if needed.

Working with Unix pipes

Unix pipes are useful when passing the output of one program to the input of another. Consider the following example:

$ echo "test case" | wc -l
1

In a Go application, the left-hand side of the pipe can be read in using os.Stdin, and it acts like a file descriptor. To demonstrate this, this recipe will take an input on the left-hand side of a pipe and return a list of words and their number of occurrences. These words will be tokenized on white space.

Getting ready

Refer to the Getting ready section of the Using command-line arguments recipe.

How to do it...

From your terminal/console application, create a new directory, chapter2/pipes.
Navigate to that directory and copy tests from https://github.com/agtorre/go-cookbook/tree/master/chapter2/pipes or use this as an exercise to write some of your own.
Create a file called pipes.go with the following content:

package main

import (
    "bufio"
    "fmt"
    "os"
)

// WordCount takes a file and returns a map
// with each word as a key and its number of
// appearances as a value
func WordCount(f *os.File) map[string]int {
    result := make(map[string]int)

    // make a scanner to work on the file
    // io.Reader interface
    scanner := bufio.NewScanner(f)
    scanner.Split(bufio.ScanWords)

    for scanner.Scan() {
        result[scanner.Text()]++
    }

    if err := scanner.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "reading input:", err)
    }

    return result
}

func main() {
    fmt.Printf("string: number_of_occurrences\n\n")
    for key, value := range WordCount(os.Stdin) {
        fmt.Printf("%s: %d\n", key, value)
    }
}

Run echo "some string" | go run pipes.go. You may also run:

go build
echo "some string" | ./pipes

You should see the following output:

$ echo "test case" | go run pipes.go
string: number_of_occurrences

test: 1
case: 1

$ echo "test case test" | go run pipes.go
string: number_of_occurrences

test: 2
case: 1

If you copied or wrote your own tests, go up one directory and run go test, and ensure that all tests pass.

How it works...

Working with pipes in Go is pretty simple, especially if you're familiar with working with files. This recipe uses a scanner to tokenize the io.Reader interface of the os.Stdin file object. You can see how you must check for errors after completing all of the reads.
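One detail worth mentioning: when the program is run without anything piped into it, os.Stdin is attached to the terminal and the scanner will simply block waiting for input. If we wanted to detect that case and print a usage hint instead, a small sketch (not part of the recipe's code) could look like this:

package main

import (
    "fmt"
    "os"
)

func main() {
    // Stat the standard input to see how it is connected.
    info, err := os.Stdin.Stat()
    if err != nil {
        fmt.Fprintln(os.Stderr, "stat stdin:", err)
        os.Exit(1)
    }

    // A character device means an interactive terminal rather than a pipe.
    if info.Mode()&os.ModeCharDevice != 0 {
        fmt.Fprintln(os.Stderr, "usage: pipe some text into this program")
        os.Exit(1)
    }

    fmt.Println("reading from a pipe or redirected file")
}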
An ANSI coloring application

Coloring text in an ANSI terminal application is handled by a variety of codes emitted before and after the section of text that you want colored. This recipe will explore a basic coloring mechanism to color the text red or keep it plain. For a more complete application, take a look at https://github.com/agtorre/gocolorize, which supports many more colors and text types, and implements the fmt.Formatter interface for ease of printing.

Getting ready

Refer to the Getting ready section of the Using command-line arguments recipe.

How to do it...

From your terminal/console application, create and navigate to the chapter2/ansicolor directory.
Copy tests from https://github.com/agtorre/go-cookbook/tree/master/chapter2/ansicolor or use this as an exercise to write some of your own.
Create a file called color.go with the following content:

package ansicolor

import "fmt"

//Color of text
type Color int

const (
    // ColorNone is default
    ColorNone = iota
    // Red colored text
    Red
    // Green colored text
    Green
    // Yellow colored text
    Yellow
    // Blue colored text
    Blue
    // Magenta colored text
    Magenta
    // Cyan colored text
    Cyan
    // White colored text
    White
    // Black colored text
    Black Color = -1
)

// ColorText holds a string and its color
type ColorText struct {
    TextColor Color
    Text      string
}

func (r *ColorText) String() string {
    if r.TextColor == ColorNone {
        return r.Text
    }

    value := 30
    if r.TextColor != Black {
        value += int(r.TextColor)
    }

    return fmt.Sprintf("\033[0;%dm%s\033[0m", value, r.Text)
}

Create a new directory named example.
Navigate to example and then create a file named main.go with the following content. Ensure that you modify the ansicolor import to use the path you set up in step 1:

package main

import (
    "fmt"

    "github.com/agtorre/go-cookbook/chapter2/ansicolor"
)

func main() {
    r := ansicolor.ColorText{ansicolor.Red, "I'm red!"}
    fmt.Println(r.String())

    r.TextColor = ansicolor.Green
    r.Text = "Now I'm green!"
    fmt.Println(r.String())

    r.TextColor = ansicolor.ColorNone
    r.Text = "Back to normal..."
    fmt.Println(r.String())
}

Run go run main.go. Alternatively, you may also run the following:

go build
./example

You should see the following, with the text colored if your terminal supports the ANSI coloring format:

$ go run main.go
I'm red!
Now I'm green!
Back to normal...

If you copied or wrote your own tests, go up one directory and run go test, and ensure that all the tests pass.

How it works...

This application makes use of a struct to maintain the state of the colored text. In this case, it stores the color of the text and the value of the text. The final string is rendered when you call the String() method, which will either return colored text or plain text, depending on the values stored in the struct. By default, the text will be plain.

Summary

In this article, we demonstrated basic ways to construct a complex command-line application using arguments, including a package-level config, required positional arguments, multi-leveled command usage, and how to split these things into multiple files or packages if needed. We saw how to work with Unix pipes and explored a basic coloring mechanism to color text red or keep it plain.

Resources for Article:

Further resources on this subject:

Building a Command-line Tool [article]
A Command-line Companion Called Artisan [article]
Scaffolding with the command-line tool [article]