Creating a graph application with Python, Neo4j, Gephi & Linkurious.js

Greg Roberts
12 Oct 2015
13 min read
I love Python, and to celebrate Packt Python week, I've spent some time developing an app using some of my favorite tools. The app is a graph visualization of Python and related topics, as well as showing where all our content fits in. The topics are all StackOverflow tags, related by their co-occurrence in questions on the site. The app is available to view at http://gregroberts.github.io/ and in this blog, I'm going to discuss some of the techniques I used to construct the underlying dataset, and how I turned it into an online application.

Graphs, not charts

Graphs are an incredibly powerful tool for analyzing and visualizing complex data. In recent years, many different graph database engines have been developed to make use of this novel manner of representing data. These databases offer many benefits over traditional, relational databases because of how the data is stored and accessed.

Here at Packt, I use a Neo4j graph to store and analyze data about our business. Using the Cypher query language, it's easy to express complicated relations between different nodes succinctly. It's not just the technical aspect of graphs which makes them appealing to work with. Seeing the connections between bits of data visualized explicitly, as in a graph, helps you to see the data in a different light and make connections that you might not have spotted otherwise. This graph has many uses at Packt, from customer segmentation to product recommendations. In the next section, I describe the process I use to generate recommendations from the database.

Make the connection

For product recommendations, I use what's known as a hybrid filter. This considers both content-based filtering (products x and y are about the same topic) and collaborative filtering (people who bought x also bought y). Each of these methods has strengths and weaknesses, so combining them into one algorithm provides a more accurate signal.

The collaborative aspect is straightforward to implement in Cypher. For a particular product, we want to find out which other product is most frequently bought alongside it. We have all our products and customers stored as nodes, and purchases are stored as edges. Thus, the Cypher query we want looks like this:

```cypher
MATCH (n:Product {title: 'Learning Cypher'})-[r:purchased*2]-(m:Product)
WHERE m <> n
WITH m.title AS suggestion,
     count(DISTINCT r) / (n.purchased + m.purchased) AS alsoBought
RETURN *
ORDER BY alsoBought DESC
```

and it will very efficiently return the most commonly also-purchased products. When calculating the weight, we divide by the total units sold of both titles, so we get a proportion returned. We do this so we don't just get the titles with the most units; we're effectively calculating the size of the intersection of the two titles' audiences relative to their overall audience size.

The content side of the algorithm looks very similar:

```cypher
MATCH (n:Product {title: 'Learning Cypher'})-[r:is_about*2]-(m:Product)
WHERE m <> n
WITH m.title AS suggestion,
     count(DISTINCT r) / (length(n.topics) + length(m.topics)) AS alsoAbout
RETURN *
ORDER BY alsoAbout DESC
```

Implicit in this algorithm is the knowledge that a title is_about a topic of some kind. This tagging could be done manually, but where's the fun in that? In Packt's domain there already exists a huge, well-moderated corpus of technology concepts and their usage: StackOverflow.
The tagging system on StackOverflow not only tells us about all the topics developers across the world are using, it also tells us how those topics are related, via the co-occurrence of tags in questions. So in our graph, StackOverflow tags are nodes in their own right, representing topics. These nodes are connected via edges, which are weighted to reflect their co-occurrence on StackOverflow (this is the Jaccard index of the two tags' question sets):

```
edge_weight(n, m) = (number of questions tagged with both n and m)
                    / (number of questions tagged with n or m)
```

So, to find topics related to a given topic, we can execute a query like this:

```cypher
MATCH (n:StackOverflowTag {name: 'Matplotlib'})-[r:related_to]-(m:StackOverflowTag)
RETURN n.name, r.weight, m.name
ORDER BY r.weight DESC LIMIT 10
```

which returns the following:

```
   | n.name     | r.weight | m.name
---+------------+----------+--------------------
 1 | Matplotlib | 0.065699 | Plot
 2 | Matplotlib | 0.045678 | Numpy
 3 | Matplotlib | 0.029667 | Pandas
 4 | Matplotlib | 0.023623 | Python
 5 | Matplotlib | 0.023051 | Scipy
 6 | Matplotlib | 0.017413 | Histogram
 7 | Matplotlib | 0.015618 | Ipython
 8 | Matplotlib | 0.013761 | Matplotlib Basemap
 9 | Matplotlib | 0.013207 | Python 2.7
10 | Matplotlib | 0.012982 | Legend
```

There are many more complex relationships you can define between topics like this, too. You can infer directionality in the relationship by looking at the local network, or you could start constructing hypergraphs using the extensive StackExchange API.

So we have our topics, but we still need to connect our content to them. To do this, I've used a two-stage process.

Step 1 – Parsing out the topics

We take all the copy (words) pertaining to a particular product as a document representing that product. This includes the title, chapter headings, and all the copy on the website. We use this because it's already been optimized for search, and should thus carry a fair representation of what the title is about. We then parse this document and keep all the words which match the topics we've previously imported:

```python
import re

# ...code for fetching all the copy for all the products

key_re = r'\W(%s)\W' % '|'.join(re.escape(i) for i in topic_keywords)
for i in documents:
    tags = re.findall(key_re, i['copy'])
    i['tags'] = map(lambda x: tag_lookup[x], tags)
```

Having done this for each product, we have a bag of words representing each product, where each word is a recognized topic.

Step 2 – Finding the information

From each of these documents, we want to know the topics which are most important for that document. To do this, we use the tf-idf algorithm. tf-idf stands for term frequency, inverse document frequency. The algorithm takes the number of times a term appears in a particular document and weights it by the inverse of the proportion of documents that term appears in. The term frequency factor boosts terms which appear often in a document, whilst the inverse document frequency factor suppresses terms which are overly common across the entire corpus (for example, the term 'programming' is common in our product copy, and whilst most of the documents ARE about programming, it doesn't provide much discriminating information about each document).

To do all of this, I use Python (obviously) and the excellent scikit-learn library. tf-idf is implemented in the class sklearn.feature_extraction.text.TfidfVectorizer. This class has lots of options you can fiddle with to get more informative results.
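To make the weighting concrete before looking at the scikit-learn version, here is the core idea in a few lines of plain Python. This is a minimal sketch for intuition only; it uses the raw logarithmic formulation and skips the smoothing and normalization options the real class provides:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term per document: high if frequent locally, rare globally.

    docs: list of bags of words (lists of topic strings).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        scores.append({term: count * math.log(n_docs / df[term])
                       for term, count in tf.items()})
    return scores

# 'programming' appears in every toy document, so its weight is zero.
docs = [['python', 'numpy', 'programming'],
        ['cypher', 'neo4j', 'programming'],
        ['python', 'pandas', 'programming']]
print(tf_idf(docs)[0])  # numpy scores highest; programming scores 0.0
```

The actual pipeline uses the scikit-learn class directly: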
```python
import sklearn.feature_extraction.text as skt

tagger = skt.TfidfVectorizer(input='content',
                             encoding='utf-8',
                             decode_error='replace',
                             strip_accents=None,
                             analyzer=lambda x: x,
                             ngram_range=(1, 1),
                             max_df=0.8,
                             min_df=0.0,
                             norm='l2',
                             sublinear_tf=False)
```

It's a good idea to use the min_df and max_df arguments of the constructor to cut out the most common and the most obscure words, to get a more informative weighting. The analyzer argument tells it how to get the words from each document; in our case, the documents are already lists of normalized words, so we don't need anything additional done.

```python
# Create vectors of all the documents
vectors = tagger.fit_transform(map(lambda x: x['tags'], documents)).toarray()

# Get back the topic names to map to the graph
t_map = tagger.get_feature_names()

jobs = []
for ind, vec in enumerate(vectors):
    features = filter(lambda x: x[1] > 0, zip(t_map, vec))
    doc = documents[ind]
    for topic, weight in features:
        job = '''MERGE (n:StackOverflowTag {name:'%s'})
                 MERGE (m:Product {id:'%s'})
                 CREATE UNIQUE (m)-[:is_about {source:'tf_idf', weight:%f}]-(n)
              ''' % (topic, doc['id'], weight)
        jobs.append(job)
```

We then execute all of the jobs using py2neo's batch functionality.

Having done all of this, we can now relate products to each other in terms of what topics they have in common:

```cypher
MATCH (n:Product {isbn10: '1783988363'})-[r:is_about]-(a)-[q:is_about]-(m:Product {isbn10: '1783289007'})
WITH a.name AS topic, r.weight + q.weight AS weight
RETURN topic ORDER BY weight DESC LIMIT 6
```

which returns:

```
   | topic
---+------------------
 1 | Machine Learning
 2 | Image
 3 | Models
 4 | Algorithm
 5 | Data
 6 | Python
```

Huzzah! I now have a graph into which I can throw any piece of content about programming or software, and it will fit nicely into the network of topics we've developed.

Take a breath

So, that's how the graph came to be. To communicate with Neo4j from Python, I use the excellent py2neo module, developed by Nigel Small. This module has all sorts of handy abstractions to allow you to work with nodes and edges as native Python objects, and then update your Neo instance with any changes you've made.

The graph I've spoken about is used for many purposes across the business, and has grown in size and scope significantly over the last year. For this project, I've taken from this graph everything relevant to Python. I started by getting all of our content which is_about Python, or about a topic related to Python:

```python
titles = [i.n for i in graph.cypher.execute(
    '''MATCH (n)-[r:is_about]-(m:StackOverflowTag {name:'Python'})
       RETURN DISTINCT n''')]
t2 = [i.n for i in graph.cypher.execute(
    '''MATCH (n)-[r:is_about]-(m:StackOverflowTag)-[:related_to]-(o:StackOverflowTag {name:'Python'})
       WHERE has(n.name) RETURN DISTINCT n''')]
titles.extend(t2)
```

I then hydrated this further by going one or two hops down each path in various directions, to get a large set of topics and content related to Python.

Visualising the graph

Since I started working with graphs, the two visualisation tools I've always used are Gephi and Sigma.js. Gephi is a great solution for analysing and exploring graphical data, allowing you to apply a plethora of different layout options, find out more about the statistics of the network, and filter and change how the graph is displayed. Sigma.js is a lightweight JavaScript library which allows you to publish beautiful graph visualizations in a browser, and it copes very well with even very large graphs.
Gephi has a great plugin which allows you to export your graph straight into a web page which you can host, share and adapt. More recently, Linkurious have made it their mission to bring graph visualization to the masses. I highly advise trying the demo of their product; it really shows how much value it's possible to get out of graph-based data. Imagine if your customer relations team were able to run a single query to view the entire history of a case or customer, laid out as a beautiful graph, full of glyphs and annotations.

Linkurious have built their product on top of Sigma.js, and they've made much of their work available as the open source Linkurious.js. This is essentially Sigma.js, with a few changes to the API and an even greater variety of plugins. On GitHub, each plugin has an API page in the wiki and a downloadable demo. It's worth cloning the repository just to see the things it's capable of!

Publish it!

So here's the workflow I used to get the Python topic graph out of Neo4j and onto the web:

1. Use py2neo to grab the subgraph of content and topics pertinent to Python, as described above.
2. Add to this some other topics linked to the same books, to give a fuller picture of the Python "world".
3. Add in topic-topic edges and product-product edges to show the full breadth of connections observed in the data.
4. Export all the nodes and edges to CSV files.
5. Import the node and edge tables into Gephi. The reason I'm using Gephi as a middle step is so that I can fiddle with the visualisation in Gephi until it looks perfect. The layout plugin in Sigma is good, but this way the graph is presentable as soon as the page loads, the communities are much clearer, and I'm not putting undue strain on browsers across the world! The layout of the graph has been achieved using a number of plugins. Instead of using the pre-installed ForceAtlas layouts, I've used the OpenOrd layout, which I feel really shows off the communities of a large graph. There's a really interesting and technical presentation about how this layout works here.
6. Export the graph into gexf format, having applied some partition and ranking functions to make it more clear and appealing.

Now it's all down to Linkurious and its various plugins! You can explore the source code of the final page to see all the details, but here I'll give an overview of the different plugins I've used for the different parts of the visualisation.

First, instantiate the graph object, pointing it at a container. Note the CSS of the container; without this, the graph won't display properly:

```html
<style type="text/css">
  #container {
    max-width: 1500px;
    height: 850px;
    margin: auto;
    background-color: #E5E5E5;
  }
</style>
…
<div id="container"></div>
…
<script>
  s = new sigma({
    container: 'container',
    renderer: {
      container: document.getElementById('container'),
      type: 'canvas'
    },
    settings: {
      …
    }
  });
```

sigma.parsers.gexf - used for (trivially!) importing a gexf file into a sigma instance:

```javascript
sigma.parsers.gexf(
  'static/data/Graph1.gexf',
  s,
  function(s) {
    // Callback executed once the data is loaded; use this to set up any
    // aspects of the app which depend on the data.
  }
);
```

sigma.plugins.filter - adds the ability to very simply hide nodes/edges based on a callback function which returns a boolean. This powers the filtering widgets on the page.
```html
<input class="form-control" id="min-degree" type="range" min="0" max="0" value="0">
```
…
```javascript
function applyMinDegreeFilter(e) {
  var v = e.target.value;
  $('#min-degree-val').text(v);
  filter
    .undo('min-degree')
    .nodesBy(
      function(n, options) {
        return this.graph.degree(n.id) >= options.minDegreeVal;
      }, {
        minDegreeVal: +v
      },
      'min-degree'
    )
    .apply();
}

$('#min-degree').change(applyMinDegreeFilter);
```

sigma.plugins.locate - adds the ability to zoom in on a single node or collection of nodes. Very useful if you're filtering a very large initial graph:

```javascript
function locateNode(nid) {
  if (nid == '') {
    locate.center(1);
  } else {
    locate.nodes(nid);
  }
}
```

sigma.renderers.glyphs - allows you to add custom glyphs to each node. Useful if you have many types of node.

Outro

This application has been a very fun little project to build. The improvements to Sigma wrought by Linkurious have resulted in an incredibly powerful toolkit for rapidly generating graph-based applications with a great degree of flexibility and interaction potential. None of this would have been possible were it not for Python. Python is my right (left, I'm left-handed) hand, which I use for almost everything. Its versatility and expressiveness make it an incredibly robust Swiss army knife in any data analyst's toolkit.

Microsoft launches Open Application Model (OAM) and Dapr to ease developments in Kubernetes and microservices

Vincy Davis
17 Oct 2019
5 min read
Yesterday, Microsoft announced the launch of two new open source projects: the Open Application Model (OAM) and Dapr. OAM, developed by Microsoft and Alibaba Cloud under the Open Web Foundation, is a specification that enables developers to define a coherent model to represent an application. The Dapr project, on the other hand, will allow developers to build portable microservice applications using any language and framework, for new or existing code.

Open Application Model (OAM)

In OAM, an application is made up of many components, like a MySQL database or a replicated PHP server with a corresponding load balancer. These components are used to build an application, enabling platform architects to utilize reusable components for the easy building of reliable applications. OAM will also empower application developers to separate the application description from the application deployment details, allowing them to focus on the key elements of their application instead of its operational details.

Microsoft also asserted that OAM has unique characteristics, such as being platform agnostic. The official blog states, "While our initial open implementation of OAM, named Rudr, is built on top of Kubernetes, the Open Application Model itself is not tightly bound to Kubernetes. It is possible to develop implementations for numerous other environments including small-device form factors, like edge deployments and elsewhere, where Kubernetes may not be the right choice. Or serverless environments where users don't want or need the complexity of Kubernetes."

Another important feature of OAM is its design extensibility. OAM also enables platform providers to expose the unique characteristics of their platform through the trait system, which will help them build cross-platform apps wherever the necessary traits are supported.

In an interview with TechCrunch, Microsoft Azure CTO Mark Russinovich said that currently Kubernetes is "infrastructure-focused" and does not provide any resource to build a relationship between the objects of an application. Russinovich believes that OAM will solve a problem that many developers and ops teams are facing today. Commenting on the cooperation with Alibaba Cloud on this specification, Russinovich observed that both companies encountered the same problems when they talked to their customers and internal teams. He further said that over time, Alibaba Cloud will launch a managed service based on OAM, and chances are that Microsoft will do the same.

The Dapr project for building microservice applications

This is an alpha release of Dapr, an event-driven runtime to help developers build resilient, stateless and stateful microservice applications for the cloud and edge. It also allows applications to be built using any programming language and developer framework. "In addition, through the open source project, we welcome the community to add new building blocks and contribute new components into existing ones. Dapr is completely platform agnostic, meaning you can run your applications locally, on any Kubernetes cluster, and other hosting environments that Dapr integrates with. This enables developers to build microservice applications that can run on both the cloud and edge with no code changes," stated the official blog.

APIs in Dapr are exposed through a sidecar architecture (either as a container or as a process), so the application code does not need to include any Dapr runtime code.
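In practice, "exposed as a sidecar" means the application just makes plain HTTP (or gRPC) calls to a local Dapr process. Here is a minimal sketch of storing and reading state through the sidecar from Python; it assumes Dapr's documented default HTTP port (3500) and a configured state store component named "statestore", and it reflects the later v1.0-style routes rather than the exact alpha API described in this announcement:

```python
import requests

# The sidecar listens locally; the app itself links no Dapr code.
DAPR_STATE_URL = "http://localhost:3500/v1.0/state/statestore"

# Persist a key/value pair through the sidecar.
requests.post(DAPR_STATE_URL,
              json=[{"key": "order-42", "value": {"status": "shipped"}}])

# Read the value back by key.
resp = requests.get(f"{DAPR_STATE_URL}/order-42")
print(resp.json())  # {'status': 'shipped'}
```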
This sidecar model simplifies Dapr integration from any runtime and keeps the application logic separate, improving supportability.

Building blocks of Dapr

- Resilient service-to-service invocation: enables method calls, including retries, on remote services, wherever they are running in the supported hosting environment.
- State management for key/value pairs: allows long-running, highly available, stateful services to be written easily, alongside stateless services in the same application.
- Publish and subscribe messaging between services: enables event-driven architectures that simplify horizontal scalability and are resilient to failure (see the sketch at the end of this piece).
- Event-driven resource bindings: helps in building event-driven architectures for scale and resiliency by receiving and sending events to and from any external resource, such as databases, queues, file systems, blob stores, webhooks, etc.
- Virtual actors: a pattern for stateless and stateful objects that makes concurrency simple with method and state encapsulation. Dapr also provides state and life-cycle management for actor activation/deactivation, plus timers and reminders to wake up actors.
- Distributed tracing between services: enables easy diagnosis of inter-service calls in production using the W3C Trace Context standard, and can push events to tracing and monitoring systems.

Users have liked both open source projects, especially Dapr. A user on Hacker News comments, "I'm excited by Dapr! If I understand it correctly, it will make it easier for me to build applications by separating the 'plumbing' (stateful & handled by Dapr) from my business logic (stateless, speaks to Dapr over gRPC). If I build using event-driven patterns, my business logic can be called in response to state changes in the system as a whole. I think an example of stateful 'plumbing' is a non-functional concern such as retrying a service call or a write to a queue if the initial attempt fails. Since Dapr runs next to my application as a sidecar, it's unlikely that communication failures will occur within the local node."

https://twitter.com/stroker/status/1184810311263629315
https://twitter.com/ThorstenHans/status/1184513427265523712
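Following up the pub/sub building block mentioned above: publishing an event is the same kind of local sidecar call as the state example earlier. A minimal sketch under the same assumptions; the pub/sub component name "pubsub" and the topic "orders" are illustrative, and the route again reflects later stable releases:

```python
import requests

# Publish an event through the sidecar's pub/sub API.
# Subscribers receive it without the publisher knowing who they are.
requests.post("http://localhost:3500/v1.0/publish/pubsub/orders",
              json={"orderId": 42, "status": "shipped"})
```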

Finding Your Way

Packt
21 Sep 2015
19 min read
This article by Ray Barrera, the author of Unity AI Game Programming Second Edition, covers the following topics:

- The A* Pathfinding algorithm
- A custom A* Pathfinding implementation

A* Pathfinding

We'll implement the A* algorithm in a Unity environment using C#. The A* Pathfinding algorithm is widely used in games and interactive applications, even though other algorithms exist, such as Dijkstra's algorithm, because of its simplicity and effectiveness.

Revisiting the A* algorithm

Let's review the A* algorithm again before we proceed to implement it in the next section. First, we'll need to represent the map in a traversable data structure. While many structures are possible, for this example we will use a 2D grid array. We'll implement the GridManager class later to handle this map information. Our GridManager class will keep a list of the Node objects that are basically the tiles in the 2D grid. So, we need to implement that Node class to handle things such as node type (whether it's a traversable node or an obstacle), the cost to pass through it, the cost to reach the goal node, and so on.

We'll have two variables to store the nodes that have been processed and the nodes that we have yet to process. We'll call them the closed list and the open list, respectively. We'll implement that list type in the PriorityQueue class. And then finally, the following A* algorithm will be implemented in the AStar class. Let's take a look at it:

1. We begin at the starting node and put it in the open list.
2. As long as the open list has some nodes in it, we perform the following processes:
   1. Pick the first node from the open list and keep it as the current node. (This assumes that we've sorted the open list, so the first node has the least cost value, as mentioned at the end of this list.)
   2. Get the neighboring nodes of this current node that are not obstacle types, such as a wall or canyon that can't be passed through.
   3. For each neighbor node, check if it is already in the closed list. If not, calculate its total cost (F) using the following formula:

      F = G + H

      Here, G is the total cost from the previous node to this node, and H is the estimated cost from this node to the final target node.
   4. Store this cost data in the neighbor node object. Also, store the current node as its parent node. Later, we'll use this parent node data to trace back the actual path.
   5. Put this neighbor node in the open list.
   6. Sort the open list in ascending order, ordered by the total cost to reach the target node.
   7. If there are no more neighbor nodes to process, put the current node in the closed list and remove it from the open list.
   8. Go back to step 2.

Once you have completed this process, your current node should be in the target goal node position, but only if there's an obstacle-free path from the start node to the goal node. If the current node is not at the goal node, there's no available path to the target node from the current node's position. If there is a valid path, all we have to do now is trace back from the current node's parent node until we reach the start node again. This gives us a path list of all the nodes that we chose during our pathfinding process, ordered from the target node to the start node. We then just reverse this path list, since we want to know the path from the start node to the target goal node.

This is a general overview of the algorithm we're going to implement in Unity using C#. So let's get started.
Implementation

We'll implement the preliminary classes that were mentioned before: the Node, GridManager, and PriorityQueue classes. Then, we'll use them in our main AStar class.

Implementing the Node class

The Node class will handle each tile object in our 2D grid, representing the map, as shown in the Node.cs file:

```csharp
using UnityEngine;
using System.Collections;
using System;

public class Node : IComparable
{
    public float nodeTotalCost;
    public float estimatedCost;
    public bool bObstacle;
    public Node parent;
    public Vector3 position;

    public Node()
    {
        this.estimatedCost = 0.0f;
        this.nodeTotalCost = 1.0f;
        this.bObstacle = false;
        this.parent = null;
    }

    public Node(Vector3 pos)
    {
        this.estimatedCost = 0.0f;
        this.nodeTotalCost = 1.0f;
        this.bObstacle = false;
        this.parent = null;
        this.position = pos;
    }

    public void MarkAsObstacle()
    {
        this.bObstacle = true;
    }
```

The Node class has properties such as the cost values (G and H), a flag to mark whether it is an obstacle, its position, and its parent node. nodeTotalCost is G, the movement cost from the starting node to this node so far, and estimatedCost is H, the total estimated cost from this node to the target goal node. We also have two simple constructors and a wrapper method to set whether this node is an obstacle. Then, we implement the CompareTo method, as shown in the following code:

```csharp
    public int CompareTo(object obj)
    {
        Node node = (Node)obj;
        // Negative value means this object comes before obj in the sort order.
        if (this.estimatedCost < node.estimatedCost)
            return -1;
        // Positive value means this object comes after obj in the sort order.
        if (this.estimatedCost > node.estimatedCost)
            return 1;
        return 0;
    }
}
```

This method is important. Our Node class implements IComparable because we want to override this CompareTo method. If you recall what we discussed in the previous algorithm section, you'll notice that we need to sort our list of nodes based on the total estimated cost. The ArrayList type has a method called Sort. This method looks for the CompareTo method implemented inside the objects in the list (in this case, our Node objects). So, we implement this method to sort the node objects by our estimatedCost value. The IComparable.CompareTo method, a .NET framework feature, is documented at http://msdn.microsoft.com/en-us/library/system.icomparable.compareto.aspx.

Establishing the priority queue

The PriorityQueue class is a short and simple class that makes handling the nodes' ArrayList easier, as shown in the following PriorityQueue.cs file:

```csharp
using UnityEngine;
using System.Collections;

public class PriorityQueue
{
    private ArrayList nodes = new ArrayList();

    public int Length
    {
        get { return this.nodes.Count; }
    }

    public bool Contains(object node)
    {
        return this.nodes.Contains(node);
    }

    public Node First()
    {
        if (this.nodes.Count > 0)
        {
            return (Node)this.nodes[0];
        }
        return null;
    }

    public void Push(Node node)
    {
        this.nodes.Add(node);
        this.nodes.Sort();
    }

    public void Remove(Node node)
    {
        this.nodes.Remove(node);
        // Ensure the list is sorted
        this.nodes.Sort();
    }
}
```

The preceding code listing should be easy to understand. One thing to notice is that after adding or removing a node from the nodes' ArrayList, we call the Sort method. This invokes the Node object's CompareTo method and sorts the nodes accordingly by the estimatedCost value.

Setting up our grid manager

The GridManager class handles all the properties of the grid representing the map.
We'll keep a singleton instance of the GridManager class, as we need only one object to represent the map, as shown in the following GridManager.cs file:

```csharp
using UnityEngine;
using System.Collections;

public class GridManager : MonoBehaviour
{
    private static GridManager s_Instance = null;

    public static GridManager instance
    {
        get
        {
            if (s_Instance == null)
            {
                s_Instance = FindObjectOfType(typeof(GridManager)) as GridManager;
                if (s_Instance == null)
                    Debug.Log("Could not locate a GridManager object.\n" +
                              "You have to have exactly one GridManager in the scene.");
            }
            return s_Instance;
        }
    }
```

We look for the GridManager object in our scene and, if found, keep it in our s_Instance static variable:

```csharp
    public int numOfRows;
    public int numOfColumns;
    public float gridCellSize;
    public bool showGrid = true;
    public bool showObstacleBlocks = true;
    private Vector3 origin = new Vector3();
    private GameObject[] obstacleList;
    public Node[,] nodes { get; set; }

    public Vector3 Origin
    {
        get { return origin; }
    }
```

Next, we declare all the variables we'll need to represent our map, such as the number of rows and columns, the size of each grid tile, some Boolean variables to visualize the grid and obstacles, and a 2D array to store all the nodes present in the grid, as shown in the following code:

```csharp
    void Awake()
    {
        obstacleList = GameObject.FindGameObjectsWithTag("Obstacle");
        CalculateObstacles();
    }

    // Find all the obstacles on the map
    void CalculateObstacles()
    {
        nodes = new Node[numOfColumns, numOfRows];
        int index = 0;
        for (int i = 0; i < numOfColumns; i++)
        {
            for (int j = 0; j < numOfRows; j++)
            {
                Vector3 cellPos = GetGridCellCenter(index);
                Node node = new Node(cellPos);
                nodes[i, j] = node;
                index++;
            }
        }
        if (obstacleList != null && obstacleList.Length > 0)
        {
            // For each obstacle found on the map, record it in our list
            foreach (GameObject data in obstacleList)
            {
                int indexCell = GetGridIndex(data.transform.position);
                int col = GetColumn(indexCell);
                int row = GetRow(indexCell);
                nodes[row, col].MarkAsObstacle();
            }
        }
    }
```

We look for all the game objects with an Obstacle tag and put them in our obstacleList property. Then we set up our nodes' 2D array in the CalculateObstacles method. First, we create the node objects with default properties. Just after that, we examine obstacleList, convert each obstacle's position into row-column data, and mark the node at that index as an obstacle.

The GridManager class has a couple of helper methods to traverse the grid and get grid cell data. The following are some of them, with a brief description of what they do. The implementation is simple, so we won't go into the details.
The GetGridCellCenter method returns the position of the grid cell in world coordinates from the cell index, as shown in the following code:

```csharp
    public Vector3 GetGridCellCenter(int index)
    {
        Vector3 cellPosition = GetGridCellPosition(index);
        cellPosition.x += (gridCellSize / 2.0f);
        cellPosition.z += (gridCellSize / 2.0f);
        return cellPosition;
    }

    public Vector3 GetGridCellPosition(int index)
    {
        int row = GetRow(index);
        int col = GetColumn(index);
        float xPosInGrid = col * gridCellSize;
        float zPosInGrid = row * gridCellSize;
        return Origin + new Vector3(xPosInGrid, 0.0f, zPosInGrid);
    }
```

The GetGridIndex method returns the grid cell index for a given position:

```csharp
    public int GetGridIndex(Vector3 pos)
    {
        if (!IsInBounds(pos))
        {
            return -1;
        }
        pos -= Origin;
        int col = (int)(pos.x / gridCellSize);
        int row = (int)(pos.z / gridCellSize);
        return (row * numOfColumns + col);
    }

    public bool IsInBounds(Vector3 pos)
    {
        float width = numOfColumns * gridCellSize;
        float height = numOfRows * gridCellSize;
        return (pos.x >= Origin.x && pos.x <= Origin.x + width &&
                pos.z <= Origin.z + height && pos.z >= Origin.z);
    }
```

The GetRow and GetColumn methods return the row and column of the grid cell for a given index:

```csharp
    public int GetRow(int index)
    {
        int row = index / numOfColumns;
        return row;
    }

    public int GetColumn(int index)
    {
        int col = index % numOfColumns;
        return col;
    }
```

Another important method is GetNeighbours, which the AStar class uses to retrieve the neighboring nodes of a particular node:

```csharp
    public void GetNeighbours(Node node, ArrayList neighbors)
    {
        Vector3 neighborPos = node.position;
        int neighborIndex = GetGridIndex(neighborPos);
        int row = GetRow(neighborIndex);
        int column = GetColumn(neighborIndex);

        // Bottom
        int leftNodeRow = row - 1;
        int leftNodeColumn = column;
        AssignNeighbour(leftNodeRow, leftNodeColumn, neighbors);

        // Top
        leftNodeRow = row + 1;
        leftNodeColumn = column;
        AssignNeighbour(leftNodeRow, leftNodeColumn, neighbors);

        // Right
        leftNodeRow = row;
        leftNodeColumn = column + 1;
        AssignNeighbour(leftNodeRow, leftNodeColumn, neighbors);

        // Left
        leftNodeRow = row;
        leftNodeColumn = column - 1;
        AssignNeighbour(leftNodeRow, leftNodeColumn, neighbors);
    }

    void AssignNeighbour(int row, int column, ArrayList neighbors)
    {
        if (row != -1 && column != -1 &&
            row < numOfRows && column < numOfColumns)
        {
            Node nodeToAdd = nodes[row, column];
            if (!nodeToAdd.bObstacle)
            {
                neighbors.Add(nodeToAdd);
            }
        }
    }
```

First, we retrieve the neighboring nodes of the current node in all four directions: left, right, top, and bottom. Then, inside the AssignNeighbour method, we check whether the node is an obstacle. If it's not, we push that neighbor node onto the referenced array list, neighbors.
The next method is a debug aid to visualize the grid and obstacle blocks:

```csharp
    void OnDrawGizmos()
    {
        if (showGrid)
        {
            DebugDrawGrid(transform.position, numOfRows, numOfColumns,
                          gridCellSize, Color.blue);
        }
        Gizmos.DrawSphere(transform.position, 0.5f);

        if (showObstacleBlocks)
        {
            Vector3 cellSize = new Vector3(gridCellSize, 1.0f, gridCellSize);
            if (obstacleList != null && obstacleList.Length > 0)
            {
                foreach (GameObject data in obstacleList)
                {
                    Gizmos.DrawCube(GetGridCellCenter(
                        GetGridIndex(data.transform.position)), cellSize);
                }
            }
        }
    }

    public void DebugDrawGrid(Vector3 origin, int numRows, int numCols,
                              float cellSize, Color color)
    {
        float width = (numCols * cellSize);
        float height = (numRows * cellSize);

        // Draw the horizontal grid lines
        for (int i = 0; i < numRows + 1; i++)
        {
            Vector3 startPos = origin + i * cellSize * new Vector3(0.0f, 0.0f, 1.0f);
            Vector3 endPos = startPos + width * new Vector3(1.0f, 0.0f, 0.0f);
            Debug.DrawLine(startPos, endPos, color);
        }

        // Draw the vertical grid lines
        for (int i = 0; i < numCols + 1; i++)
        {
            Vector3 startPos = origin + i * cellSize * new Vector3(1.0f, 0.0f, 0.0f);
            Vector3 endPos = startPos + height * new Vector3(0.0f, 0.0f, 1.0f);
            Debug.DrawLine(startPos, endPos, color);
        }
    }
}
```

Gizmos can be used to draw visual debugging and setup aids inside the editor scene view. The OnDrawGizmos method is called every frame by the engine. So, if the debug flags showGrid and showObstacleBlocks are checked, we just draw the grid with lines and the obstacles with cubes. We won't go through the DebugDrawGrid method, which is quite simple. You can learn more about gizmos in the Unity reference documentation at http://docs.unity3d.com/Documentation/ScriptReference/Gizmos.html.

Diving into our A* implementation

The AStar class is the main class that will utilize the classes we have implemented so far. You can go back to the algorithm section if you want to review it. We start with our openList and closedList declarations, which are of the PriorityQueue type, as shown in the AStar.cs file:

```csharp
using UnityEngine;
using System.Collections;

public class AStar
{
    public static PriorityQueue closedList, openList;
```

Next, we implement a method called HeuristicEstimateCost to calculate the cost between two nodes. The calculation is simple: we find the direction vector between the two by subtracting one position vector from another. The magnitude of the resultant vector gives the direct distance from the current node to the goal node:

```csharp
    private static float HeuristicEstimateCost(Node curNode, Node goalNode)
    {
        Vector3 vecCost = curNode.position - goalNode.position;
        return vecCost.magnitude;
    }
```

Next, we have our main FindPath method:

```csharp
    public static ArrayList FindPath(Node start, Node goal)
    {
        openList = new PriorityQueue();
        openList.Push(start);
        start.nodeTotalCost = 0.0f;
        start.estimatedCost = HeuristicEstimateCost(start, goal);

        closedList = new PriorityQueue();
        Node node = null;
```

We initialize our open and closed lists. Starting with the start node, we put it in our open list.
Then we start processing our open list:

```csharp
        while (openList.Length != 0)
        {
            node = openList.First();
            // Check if the current node is the goal node
            if (node.position == goal.position)
            {
                return CalculatePath(node);
            }

            // Create an ArrayList to store the neighboring nodes
            ArrayList neighbours = new ArrayList();
            GridManager.instance.GetNeighbours(node, neighbours);

            for (int i = 0; i < neighbours.Count; i++)
            {
                Node neighbourNode = (Node)neighbours[i];
                if (!closedList.Contains(neighbourNode))
                {
                    float cost = HeuristicEstimateCost(node, neighbourNode);
                    float totalCost = node.nodeTotalCost + cost;
                    float neighbourNodeEstCost =
                        HeuristicEstimateCost(neighbourNode, goal);

                    neighbourNode.nodeTotalCost = totalCost;
                    neighbourNode.parent = node;
                    neighbourNode.estimatedCost = totalCost + neighbourNodeEstCost;

                    if (!openList.Contains(neighbourNode))
                    {
                        openList.Push(neighbourNode);
                    }
                }
            }

            // Push the current node to the closed list
            closedList.Push(node);
            // and remove it from openList
            openList.Remove(node);
        }

        if (node.position != goal.position)
        {
            Debug.LogError("Goal Not Found");
            return null;
        }
        return CalculatePath(node);
    }
```

This code implementation resembles the algorithm that we previously discussed, so you can refer back to it if anything is unclear:

1. Get the first node from openList. Remember, openList is re-sorted every time a new node is added, so the first node is always the one with the least estimated cost to the goal node.
2. Check whether the current node is already at the goal node. If so, exit the while loop and build the path array.
3. Create an array list to store the neighboring nodes of the current node being processed. Use the GetNeighbours method to retrieve the neighbors from the grid.
4. For every node in the neighbors array, check whether it's already in closedList. If not, calculate the cost values, update the node's properties with the new cost values and the parent node data, and put it in openList.
5. Push the current node to closedList and remove it from openList. Go back to step 1.

If there are no more nodes in openList, our current node should be at the target node, provided there's a valid path available. Then, we just call the CalculatePath method with the current node as a parameter:

```csharp
    private static ArrayList CalculatePath(Node node)
    {
        ArrayList list = new ArrayList();
        while (node != null)
        {
            list.Add(node);
            node = node.parent;
        }
        list.Reverse();
        return list;
    }
}
```

The CalculatePath method traces through each node's parent node and builds an array list, giving a list of nodes from the target node to the start node. Since we want a path array from the start node to the target node, we just call the Reverse method.

So, this is our AStar class. We'll write a test script in the following code to test all of this, and then set up a scene to use it in.

Implementing a TestCode class

This class will use the AStar class to find the path from the start node to the goal node, as shown in the following TestCode.cs file:

```csharp
using UnityEngine;
using System.Collections;

public class TestCode : MonoBehaviour
{
    private Transform startPos, endPos;
    public Node startNode { get; set; }
    public Node goalNode { get; set; }
    public ArrayList pathArray;

    GameObject objStartCube, objEndCube;

    private float elapsedTime = 0.0f;
    // Interval time between pathfinding
    public float intervalTime = 1.0f;
```

First, we set up the variables that we'll need to reference.
The pathArray stores the node array returned from the AStar FindPath method:

```csharp
    void Start()
    {
        objStartCube = GameObject.FindGameObjectWithTag("Start");
        objEndCube = GameObject.FindGameObjectWithTag("End");

        pathArray = new ArrayList();
        FindPath();
    }

    void Update()
    {
        elapsedTime += Time.deltaTime;
        if (elapsedTime >= intervalTime)
        {
            elapsedTime = 0.0f;
            FindPath();
        }
    }
```

In the Start method, we look for objects with the Start and End tags and initialize our pathArray. We try to find a new path at each interval set in our intervalTime property, in case the positions of the start and end nodes have changed. Then, we call the FindPath method:

```csharp
    void FindPath()
    {
        startPos = objStartCube.transform;
        endPos = objEndCube.transform;

        startNode = new Node(GridManager.instance.GetGridCellCenter(
            GridManager.instance.GetGridIndex(startPos.position)));
        goalNode = new Node(GridManager.instance.GetGridCellCenter(
            GridManager.instance.GetGridIndex(endPos.position)));

        pathArray = AStar.FindPath(startNode, goalNode);
    }
```

Since we implemented our pathfinding algorithm in the AStar class, finding a path has now become a lot simpler. First, we take the positions of our start and end game objects. Then, we create new Node objects using the GridManager helper methods GetGridIndex and GetGridCellCenter to locate their respective cells in the grid. Once we have these, we just call AStar.FindPath with the start node and goal node, and store the returned array list in the local pathArray property.

Next, we implement the OnDrawGizmos method to draw and visualize the path found:

```csharp
    void OnDrawGizmos()
    {
        if (pathArray == null)
            return;

        if (pathArray.Count > 0)
        {
            int index = 1;
            foreach (Node node in pathArray)
            {
                if (index < pathArray.Count)
                {
                    Node nextNode = (Node)pathArray[index];
                    Debug.DrawLine(node.position, nextNode.position, Color.green);
                    index++;
                }
            }
        }
    }
}
```

We loop through pathArray and use the Debug.DrawLine method to draw lines connecting the nodes. With this, we'll be able to see a green line connecting the nodes from start to end, forming the path, when we run and test our program.

Setting up our sample scene

We are going to set up a sample test scene with a directional light, the start and end game objects, a few obstacle objects, a plane entity to be used as the ground, and two empty game objects in which we put our GridManager and TestAStar scripts. To build the scene hierarchy:

- Create a bunch of cube entities and tag them as Obstacle. We'll be looking for objects with this tag when running our pathfinding algorithm.
- Create a cube entity and tag it as Start.
- Then, create another cube entity and tag it as End.
- Now, create an empty game object and attach the GridManager script. Set its name to GridManager, because we use this name to look for the GridManager object from our script. Here, we can set up the number of rows and columns for our grid, as well as the size of each tile.

Testing all the components

Let's hit the play button and see our A* Pathfinding algorithm in action. By default, once you play the scene, Unity will switch to the Game view. Since our pathfinding visualization code is written for the debug draw in the editor view, you'll need to switch back to the Scene view or enable Gizmos to see the path found.
Now, try to move the start or end node around in the scene using the editor's movement gizmo (in the Scene view, not the Game view). If there's a valid path from the start node to the target goal node, you should see the path update accordingly, dynamically in real time. You'll get an error message in the console window if there's no path available.

Summary

In this article, we learned how to implement our own simple A* Pathfinding system. To attain this, we first implemented the Node class and established the priority queue. Then, we moved on to setting up the grid manager. After that, we dived in deeper by implementing a TestCode class and setting up our sample scene. Finally, we tested all the components.

npm at Node+JS Interactive 2018: npm 6, the rise and fall of JavaScript frameworks, and more

Bhagyashree R
16 Oct 2018
7 min read
Last week, Laurie Voss, the co-founder and COO of npm, spoke at the Node+JS Interactive 2018 event about npm and the future of JavaScript. He discussed several development tools used within the npm community, best practices, frameworks that are on the rise, and frameworks that are dying. He drew these answers from 1.5 billion log events per day and the JavaScript Ecosystem Survey 2017, which polled 16,000 JavaScript developers about what they are doing and where the community is going. Let's look at some of the key highlights of this talk.

npm is secure, popular, and fast

With more than 10 million users and 6 billion package downloads every week, npm has become ridiculously popular. According to GitHub Octoverse 2017, JavaScript is the top language on GitHub by opened pull requests. 85% of the developers who write JavaScript are using npm, a share that is rising rapidly towards 100%. These developers write JavaScript applications that run on 73% of browsers, 70% of servers, 44% of mobile devices, and 6% of IoT/robotics devices. The stats highlight that npm is mainly the package manager of web developers: 97% of the code in a modern web app is downloaded from npm.

The current version of npm, npm 6, was released in April this year. This release comes with improved performance and addresses the major concern highlighted by the JavaScript Ecosystem Survey: security. Here are the major improvements in npm 6.

npm 6 is super fast

npm is now 20% faster than npm 4, so it is time for you to upgrade! You can do that using this command: npm install npm -g

According to Laurie, npm is fast, and so is yarn; in fact, all of the package managers are nearly at the same speed now. This is the result of the makers of all the package managers coming together in a community called package.community, a group of package manager maintainers and users focused on making package managers better, working on compatibility, and supporting each other.

npm 6 locks by default

This was one of the biggest changes in npm 6: it makes sure that what you have in the development environment is exactly what you put in production. This functionality is facilitated by a new file called package-lock.json. This so-called "lock file" saves information about your node_modules/ tree since the time you last edited your dependencies. This new feature comes with a number of benefits, including increased reproducibility across teams, reduced network overhead when installing, and easier debugging of issues with dependencies.

npm ci

This command is similar to npm install, but is meant to be used in automated environments such as test platforms, continuous integration, and deployment. It can be about 2-3x faster than a regular npm install by skipping certain user-oriented features. It is also stricter than a regular install, which helps catch errors or inconsistencies caused by the incrementally-installed local environments of most npm users.

Advances in npm security

Two-factor authentication: To provide stronger digital security, npm now supports two-factor authentication (2FA). 2FA confirms your identity using two methods: something you know, such as your username and password, and something you have, such as a phone or tablet.

Quick audits: Quick audits tell you whether the packages you are installing are secure or not.
These security warnings are more detailed and useful in npm 6 than in previous versions. The talk highlighted that quick audits already happen a lot (about 3.5 million scans per week!), and these scans show that 11% of the packages installed by developers have a critical vulnerability.

To know which vulnerabilities exist in your app and how critical they are, run the following commands:

- npm audit: This runs automatically when you install a package with npm install. It submits a description of the dependencies configured in your package to your default registry and asks for a report of known vulnerabilities.
- npm audit fix: Run this subcommand to automatically install compatible updates to vulnerable dependencies.

The rise and fall of frameworks

After speaking about the current status of npm, Laurie moved on to what npm users are doing: which frameworks they are using, and which frameworks they are no longer interested in using.

Not all npm users develop JavaScript applications

An interesting point here: though most JavaScript users use npm, they are not the only npm users. Developers writing applications in other languages also use npm, including Java, PHP, Python, and C#, among others.

Comparing various frameworks

Next, he discussed the tools developers are currently opting for. This comparison was done on the basis of a metric called share of registry, which shows the relative popularity of a package against all other packages in the registry.

Frontend frameworks

"No framework dies, they only fade away with time."

Backbone is the best example of this. Compared to 2013, Backbone's downloads have declined rapidly, and few developers are picking it up now; most are maintaining old applications written in Backbone rather than writing new ones with it.

So, what frameworks are they using? 60% of the survey's respondents are using React, though despite a huge fraction gravitating towards it, React's growth is slowing a little. Angular is also an extremely popular framework. Ember is making a comeback after a rough patch during 2016-2017. Then there is Vue, which seems to be just taking off and is probably the reason behind React's slowing growth.

Backend frameworks

Comparing backend frameworks was fairly easy, as Express is the clear winner. Once Express is taken out of the picture, Koa emerges as the second most popular framework. With the growing use of server-side JavaScript, there is a rapid decline in the use of Sails. Hapi, another backend framework, is doing very well in absolute terms but is not growing much in relative terms. Next.js is growing, but at a very low pace.
Some predictions based on the survey

- It would be unwise to bet against React, as it has tons of users and tons of modules.
- Angular is a safer but less interesting choice.
- Keep an eye on Next.js.
- If you are looking for something new to learn, go for GraphQL.
- With 46% of npm users using TypeScript for transpiling, it is surely worth your attention.
- WASM seems promising.
- No matter what happens to JavaScript, npm is here to stay.

To conclude, Laurie rightly said that no framework is here forever: "Nothing lasts forever!.. Any framework that we see today will have its heyday and then it will have an after-life where it will slowly, slowly degrade."

To watch the full talk, check out the YouTube video: npm and the future of JavaScript.
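The share-of-registry figures in the talk come from npm's internal logs, but you can approximate the same comparison yourself with npm's public downloads API at api.npmjs.org. A rough sketch follows; raw downloads are only a proxy for real usage, and the package basket here is illustrative:

```python
import requests

PACKAGES = ["react", "vue", "ember-source", "backbone"]

def weekly_downloads(pkg):
    # npm's public API: total downloads for a package over the last week.
    url = f"https://api.npmjs.org/downloads/point/last-week/{pkg}"
    return requests.get(url).json()["downloads"]

counts = {pkg: weekly_downloads(pkg) for pkg in PACKAGES}
total = sum(counts.values())
for pkg, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    # Share is relative to this basket only, not the whole registry.
    print(f"{pkg:15s} {n:>12,}  ({n / total:.1%} of this basket)")
```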


The OpenJDK Transition: Things to know and do

Rich Sharples
28 Feb 2019
5 min read
At this point, it should not be a surprise that a number of major changes have been announced in the Java ecosystem, some of which have the potential to force a reassessment of Java roadmaps and even vendor selection for enterprise Java users. Some of the biggest changes are taking place in the upstream OpenJDK (Open Java Development Kit), which means that users will need to have a backup plan in place for the transition. Read on to better understand exactly what changes Oracle has made to Oracle JDK, why you should care about them, and how to choose the best next steps for your organization.

For some quick background: OpenJDK began as a free and open source implementation of the Java Platform, Standard Edition, through a 2006 Sun Microsystems initiative. Sun announced at the 2006 JavaOne event that Java and the core Java platform would be open sourced, and over the next few years major components of the Java Platform were released as free, open source software under the GPL. Other vendors, like Red Hat, got involved with the project about a year later, through a broad contributor agreement which covers the participation of non-Sun engineers in the project.

This piece is by Rich Sharples, Senior Director of Product Management at Red Hat, on what Oracle ending free lifecycle support means, and what steps users can take now to prepare for the change.

So what is new in the world of Java and JDK? Why should you care?

Okay, enough about the history. What are we anticipating for the future of Java and OpenJDK? At the end of January 2019, Oracle officially ended free public updates to Oracle JDK for commercial users who are not Oracle customers. These users will no longer be able to get updates without an Oracle support contract. Additionally, Oracle has changed the Oracle JDK license so that commercial use of JDK 11 and beyond will require an Oracle subscription. They do also have a non-proprietary (GPLv2+CPE) distribution, but the support policy around that is unclear at this time.

This is a big deal because it means that users need strategies and plans in place to continue to get the support that Oracle had offered. You may be wondering whether it really is critical to run OpenJDK with support, or whether you can get away with running it without. The truth is that while the open source license and nature of OpenJDK mean that you can technically run the software with no commercial support, it doesn't mean that you necessarily should. There are too many risks associated with running OpenJDK unsupported, including numerous cases of critical vulnerabilities and security flaws. While there is nothing inherently insecure about open source software, when it is deployed without support or even a maintenance plan, it can open up your organization to threats that you may not be prepared to handle and resolve.

Additionally, and probably the biggest reason why commercial OpenJDK support matters: without a third party to help, you have to be entirely self-sufficient, meaning you would need engineering effort permanently allocated to monitoring the upstream project and maintaining your installations of OpenJDK. Not all organizations have the bandwidth or resources to do this consistently.

How can vendors help and what are other key technology considerations?
Software vendors, including Red Hat, offer OpenJDK distributions and support so that those who had been reliant on Oracle JDK can make a seamless transition and avoid the risks of running OpenJDK unsupported. This also allows customers and partners to focus on their business software, rather than spending valuable engineering effort on underlying Java runtimes.

Additionally, Java SE 11 has introduced some significant new features, as well as deprecated others, and is the first update to the platform significant enough that users need to think seriously about the impact of moving to it. Here, it is important to separate upgrading from Java SE 8 to Java SE 11 from moving from one vendor's Java SE distribution to another. In fact, moving between Java SE distributions that are based on OpenJDK, without changing versions, should be a fairly simple task; it is not recommended to change both the vendor and the version in a single step. Luckily, the differences between Oracle JDK and OpenJDK are fairly minor and are slowly aligning as the two become more similar.

Technology vendors can help in the transition that will inevitably come for OpenJDK users, if it has not happened already. Having a proper regression test plan in place, to ensure that applications run as they previously did, with a particular focus on performance and scalability, is key, and is something that vendors can help set up. Auditing and understanding whether you need to make any code changes to your applications is also a major undertaking that a third party can help guide you on, likely including rulesets for the migrations. Finally, third-party vendors can help you deploy to the new OpenJDK solution and provide patches and security guidance as issues come up, helping keep the environment secure, updated and safe.

While there are changes to Oracle JDK, once you find the alternative solution best suited to your organization, the transition should be fairly straightforward and cost-effective, and third-party vendors have the expertise, knowledge, and experience to guide users through it.
Ansible role patterns and anti-patterns by Lee Garrett, its Debian maintainer

Vincy Davis
16 Dec 2019
6 min read
At DebConf held last year, Lee Garrett, the Debian maintainer for Ansible, talked about some of the best practices in the open-source configuration management tool. Ansible runs on Unix-like systems and can configure both Unix-like and Microsoft Windows machines. It uses a simple syntax written in YAML, a human-readable data serialization language, and uses SSH to connect to the node machines. Ansible is a helpful tool for managing a group of machines and describing their configuration and actions. Ansible is used to implement software provisioning, application deployment, security compliance, and orchestration solutions.

When compared to other configuration management tools like Puppet, Chef, SaltStack, etc., Ansible is very easy to set up. Garrett says that due to its agentless nature, users can easily control any machine with an SSH daemon using Ansible. This will assist users in controlling any machine with Debian installed. It also supports the configuration of many other things, like networking equipment and Windows machines.

Interested in more of Ansible? Get an insightful understanding of the design and development of Ansible from our book 'Mastering Ansible', written by James Freeman and Jesse Keating. This book will help you grasp the true power of the Ansible automation engine by tackling complex, real-world actions with ease. The book also presents fully automated Ansible playbook executions with encrypted data.

What are Ansible role patterns?

Ansible uses a playbook as an entry point for provisioning and defines automation through the YAML format. A playbook requires a predefined pattern to organize it and also needs other files to facilitate the sharing and reusing of provisioning. This is where a 'role' comes into the picture. An Ansible role, which is an independent component, allows the reuse of common configuration steps. It contains a set of tasks that can be used to configure a host so that it will serve a certain function, like configuring a service. Roles are defined using YAML files with a predefined directory structure. A role directory structure contains directories like defaults, vars, tasks, files, templates, meta, and handlers.

Some tips for creating good Ansible role patterns

An ideal role keeps its task list in the 'roles/<role>/tasks/main.yml' layout, thus specifying the name of the role, its tasks, and main.yml. At the beginning of each role, users are advised to check necessary preconditions, for example with 'assert' tasks that inspect whether the required variables are defined. Another prerequisite involves installing packages, for example with yum (the default package manager on CentOS) or via a git checkout.

Templating of files with abstraction is another important pattern, where variables are defined and substituted into templates to create the actual config file. Garrett also points out that the template module has a validate parameter, which lets the user check the config file for syntax errors. A syntax error can then fail the playbook even before the broken config file is deployed. For example, he says, "use Apache with the right parameters to do a config check on the syntax of the file. That way you never end up in a state where there's a broken config sitting there."

Garrett also recommends putting sensible defaults in the roles/<role>/defaults/main.yml layout, so that the defaults can be overridden by other variables in specific cases. He further adds that a role should ideally run in check mode; a minimal sketch of these patterns follows below.
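The following is a minimal, hedged sketch of what a tasks/main.yml built along these recommendations might look like. The role name, variable names, file names, and handler are invented for illustration and are not taken from Garrett's talk; the validate command assumes an Apache-style syntax checker is installed on the target host.

# roles/webserver/tasks/main.yml -- illustrative sketch only
- name: Assert that required variables are defined
  assert:
    that:
      - listen_port is defined
      - server_name is defined

- name: Install the web server package
  apt:
    name: apache2
    state: present

- name: Template the config, validating syntax before it is deployed
  template:
    src: vhost.conf.j2
    dest: /etc/apache2/sites-available/vhost.conf
    # validate runs against the temporary file (%s) and aborts on syntax errors
    validate: apachectl -t -f %s
  notify: reload apache

Sensible defaults for listen_port and server_name would then live in roles/webserver/defaults/main.yml, where they are easy to override per host or group.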
ansible-playbook has a --check flag, which performs "just a dry run" of a user's complete playbook, and --diff, which displays file or file-mode changes made by the playbook. Further, he adds that a variable can be defined both in the defaults and in the vars folder. However, the latter is hard to override and should be avoided, warns Garrett.

What are some typical anti-patterns in Ansible?

The shell and command modules are used in Ansible for executing commands on remote servers. Both modules take a command name followed by a list of arguments. The shell module is used when a command is to be executed on the remote servers within a particular shell. Garrett says that new Ansible users often end up using the shell or command module the way they would use wget on the command line. According to him, this practice is wrong, since "there are currently, I think, thousands of different modules in Ansible, so there's likely a big chance that whatever you want to do, there's already a module for that."

He also asserts that these two modules have several problems. Strings passed to the shell module are interpreted by the actual shell, so special characters in the command string can cause surprises, and if the playbook is running in check mode, the shell and command modules won't run at all. Another drawback of these modules is that they always report 'changed' when a command exits with a zero status, which means the user will probably have to capture the output and then check whether any standard error is present in it.

Next, Garrett explored some examples to show the alternatives to the shell/command modules, such as the 'slurp' module. The slurp module will slurp the whole file and return it base64-encoded, enabling access to the actual content via the registered result's content field. The best thing about this module is that it never reports a change and works great in check mode. In another example, Garrett showed that when fetching a URL with the shell module, the file ends up getting downloaded every time the playbook runs, reporting a change each time. This can again be avoided by using the 'uri' module instead of the shell module. The uri module lets the user declare the URL to retrieve and its parameters declaratively, so the task behaves predictably. At the end of the talk, Garrett also threw light on the problems with using the set_fact module and shared his templates. Watch the full video on Youtube.

You can also learn all about custom modules, plugins, and dynamic inventory sources in our book 'Mastering Ansible', written by James Freeman and Jesse Keating.

Read More

Ansible 2 for automating networking tasks on Google Cloud Platform [Tutorial]
Automating OpenStack Networking and Security with Ansible 2 [Tutorial]
Why choose Ansible for your automation and configuration management needs?
Ten tips to successfully migrate from on-premise to Microsoft Azure
Why should you consider becoming 'AWS Developer Associate' certified?
Exploring HDFS

Packt
10 Mar 2016
17 min read
In this article by Tanmay Deshpande, the author of the book Hadoop Real World Solutions Cookbook - Second Edition, we'll cover the following recipes:

- Loading data from a local machine to HDFS
- Exporting HDFS data to a local machine
- Changing the replication factor of an existing file in HDFS
- Setting the HDFS block size for all the files in a cluster
- Setting the HDFS block size for a specific file in a cluster
- Enabling transparent encryption for HDFS
- Importing data from another Hadoop cluster
- Recycling deleted data from trash to HDFS
- Saving compressed data in HDFS

Hadoop has two important components: storage, which is handled by HDFS, and processing, which is handled by MapReduce. HDFS takes care of the storage part of Hadoop, so let's explore the internals of HDFS through various recipes.

Loading data from a local machine to HDFS

In this recipe, we are going to load data from a local machine's disk to HDFS.

Getting ready

To perform this recipe, you should have an already running Hadoop cluster.

How to do it...

Performing this recipe is as simple as copying data from one folder to another. There are a couple of ways to copy data from the local machine to HDFS.

Using the copyFromLocal command: To copy a file to HDFS, let's first create a directory on HDFS and then copy the file. Here are the commands to do this:

hadoop fs -mkdir /mydir1
hadoop fs -copyFromLocal /usr/local/hadoop/LICENSE.txt /mydir1

Using the put command: We will first create the directory, and then put the local file in HDFS:

hadoop fs -mkdir /mydir2
hadoop fs -put /usr/local/hadoop/LICENSE.txt /mydir2

You can validate that the files have been copied to the correct folders by listing the files:

hadoop fs -ls /mydir1
hadoop fs -ls /mydir2

How it works...

When you use HDFS copyFromLocal or the put command, the following things occur. First of all, the HDFS client (the command prompt, in this case) contacts NameNode because it needs to copy the file to HDFS. NameNode then asks the client to break the file into chunks of the cluster block size; in Hadoop 2.X, the default block size is 128MB. Based on the capacity and availability of space on the DataNodes, NameNode decides where these blocks should be copied. Then, the client starts copying data to the specified DataNodes for a specific block. The blocks are copied sequentially, one after another. When a single block is copied, the block is sent to the DataNode in packets that are 4MB in size. A checksum is sent with each packet; once the packet copying is done, it is verified against the checksum to check whether it matches. The packets are then sent to the next DataNode, where the block is replicated. The HDFS client's responsibility is to copy the data to only the first node; the replication is taken care of by the respective DataNodes. Thus, the data block is pipelined from one DataNode to the next. While the block copying and replication are taking place, metadata on the file is updated in NameNode by the DataNodes.

Exporting HDFS data to a local machine

In this recipe, we are going to export/copy data from HDFS to the local machine.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

Performing this recipe is as simple as copying data from one folder to the other. There are a couple of ways in which you can export data from HDFS to the local machine.
Using the copyToLocal command, you'll get this code:

hadoop fs -copyToLocal /mydir1/LICENSE.txt /home/ubuntu

Using the get command, you'll get this code:

hadoop fs -get /mydir1/LICENSE.txt /home/ubuntu

How it works...

When you use HDFS copyToLocal or the get command, the following things occur. First of all, the client contacts NameNode because it needs a specific file in HDFS. NameNode then checks whether such a file exists in its FSImage. If the file is not present, an error code is returned to the client. If the file exists, NameNode checks the metadata for its blocks and their replica placements on the DataNodes. NameNode then points the client directly to the DataNodes from which the blocks can be read one by one. The data is copied directly from the DataNodes to the client machine, and it never goes through NameNode, which avoids bottlenecks. Thus, the file is exported to the local machine from HDFS.

Changing the replication factor of an existing file in HDFS

In this recipe, we are going to take a look at how to change the replication factor of a file in HDFS. The default replication factor is 3.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

Sometimes, there might be a need to increase or decrease the replication factor of a specific file in HDFS. In this case, we'll use the setrep command. This is how you can use the command:

hadoop fs -setrep [-R] [-w] <noOfReplicas> <path> ...

In this command, a path can either be a file or a directory; if it's a directory, the command recursively sets the replication factor for everything under it. The -w option makes the command wait until the replication is complete. The -R option is accepted for backward compatibility. First, let's check the replication factor of the file we copied to HDFS in the previous recipe:

hadoop fs -ls /mydir1/LICENSE.txt
-rw-r--r-- 3 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt

Once you list the file, it will show you the read/write permissions on the file, and the very next parameter is the replication factor. We have the replication factor set to 3 for our cluster; hence, the number shown is 3. Let's change it to 2 using this command:

hadoop fs -setrep -w 2 /mydir1/LICENSE.txt

It will wait till the replication is adjusted. Once done, you can verify this again by running the ls command:

hadoop fs -ls /mydir1/LICENSE.txt
-rw-r--r-- 2 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt

How it works...

Once the setrep command is executed, NameNode is notified, and then NameNode decides whether replicas need to be added to or removed from certain DataNodes. When you use the -w option, this process may sometimes take too long if the file size is big.

Setting the HDFS block size for all the files in a cluster

In this recipe, we are going to take a look at how to set the block size at the cluster level.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

The HDFS block size is configurable for all files in the cluster or for a single file as well. To change the block size at the cluster level, we need to modify the hdfs-site.xml file. By default, the HDFS block size is 128MB. In case we want to modify this, we need to update this property, as shown in the following code.
This property changes the default block size to 64MB:

<property>
<name>dfs.block.size</name>
<value>67108864</value>
<description>HDFS Block size</description>
</property>

If you have a multi-node Hadoop cluster, you should update this file on all the nodes, that is, on NameNode and every DataNode. Make sure you save these changes and restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

This sets the block size for files that are added to the HDFS cluster from now on. Note that this does not change the block size of files that are already present in HDFS; there is no way to change the block size of existing files.

How it works...

By default, the HDFS block size is 128MB for Hadoop 2.X. Sometimes, we may want to change this default block size for optimization purposes. When this configuration is successfully updated, all new files will be saved in blocks of this size. These changes do not affect the files that are already present in HDFS; their block size was fixed at the time they were copied.

Setting the HDFS block size for a specific file in a cluster

In this recipe, we are going to take a look at how to set the block size for a specific file only.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

In the previous recipe, we learned how to change the block size at the cluster level. But this is not always required. HDFS provides us with the facility to set the block size for a single file as well. The following command copies a file called myfile to HDFS, setting the block size to 1MB:

hadoop fs -Ddfs.block.size=1048576 -put /home/ubuntu/myfile /

Once the file is copied, you can verify whether the block size is set to 1MB and the file has been broken into exact chunks:

hdfs fsck -blocks /myfile
Connecting to namenode via http://localhost:50070/fsck?ugi=ubuntu&blocks=1&path=%2Fmyfile
FSCK started by ubuntu (auth:SIMPLE) from /127.0.0.1 for path /myfile at Thu Oct 29 14:58:00 UTC 2015
.Status: HEALTHY
Total size: 17276808 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 17 (avg. block size 1016282 B)
Minimally replicated blocks: 17 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Oct 29 14:58:00 UTC 2015 in 2 milliseconds

The filesystem under path '/myfile' is HEALTHY

How it works...

When we specify the block size at the time of copying a file, it overrides the default block size and copies the file to HDFS in chunks of the given size. Generally, these modifications are made in order to perform other optimizations. Make sure you understand the consequences before making such changes. If the block size is too small, it will increase parallelization, but it will also increase the load on NameNode, as it will have more entries in FSImage. On the other hand, if the block size is too big, it will reduce parallelization and degrade processing performance.

Enabling transparent encryption for HDFS

When handling sensitive data, it is always important to consider security measures. Hadoop allows us to encrypt sensitive data that's present in HDFS. In this recipe, we are going to see how to encrypt data in HDFS.
Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

For many applications that hold sensitive data, it is very important to adhere to standards such as PCI, HIPAA, FISMA, and so on. To enable this, HDFS provides a utility called encryption zones: directories in which data is encrypted on write and decrypted on read. To use this encryption facility, we first need to enable the Hadoop Key Management Server (KMS):

/usr/local/hadoop/sbin/kms.sh start

This starts KMS in the Tomcat web server. Next, we need to append the following properties to core-site.xml and hdfs-site.xml. In core-site.xml, add the following property:

<property>
<name>hadoop.security.key.provider.path</name>
<value>kms://http@localhost:16000/kms</value>
</property>

In hdfs-site.xml, add the following property:

<property>
<name>dfs.encryption.key.provider.uri</name>
<value>kms://http@localhost:16000/kms</value>
</property>

Restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

Now, we are all set to use KMS. Next, we need to create a key that will be used for the encryption:

hadoop key create mykey

This creates a key and saves it in KMS. Next, we have to create an encryption zone, which is a directory in HDFS where all the encrypted data is saved:

hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName mykey -path /zone

We will change the ownership to the current user:

hadoop fs -chown ubuntu:ubuntu /zone

If we put any file into this directory, it will be encrypted on write and decrypted at the time of reading:

hadoop fs -put myfile /zone
hadoop fs -cat /zone/myfile

How it works...

There are various types of encryption one can use in order to comply with security standards, for example, application-level, database-level, file-level, and disk-level encryption. HDFS transparent encryption sits between the database-level and file-level encryptions. KMS acts as a proxy between HDFS clients and HDFS's encryption provider via HTTP REST APIs. There are two types of keys used for encryption: the Encryption Zone Key (EZK) and the Data Encryption Key (DEK). The EZK is used to encrypt the DEK; the encrypted result is called the Encrypted Data Encryption Key (EDEK), which is saved on NameNode. When a file needs to be written to the HDFS encryption zone, the client gets the EDEK from NameNode and the EZK from KMS to form the DEK, which is used to encrypt data and store it in HDFS (the encrypted zone). When an encrypted file needs to be read, the client needs the DEK, which is formed by combining the EZK and the EDEK. These are obtained from KMS and NameNode, respectively. Thus, encryption and decryption are automatically handled by HDFS, and the end user does not need to worry about executing this on their own. You can read more on this topic at http://blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs/.

Importing data from another Hadoop cluster

Sometimes, we may want to copy data from one HDFS to another, either for development, testing, or production migration. In this recipe, we will learn how to copy data from one HDFS cluster to another.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

Hadoop provides a utility called DistCp, which helps us copy data from one cluster to another.
Using this utility is as simple as copying from one folder to another:

hadoop distcp hdfs://hadoopCluster1:9000/source hdfs://hadoopCluster2:9000/target

This uses a MapReduce job to copy data from one cluster to another. You can also specify multiple source files to be copied to the target. There are a couple of other options that we can use:

-update: When we use DistCp with the update option, it copies only those files from the source that are not present in the target or differ from it.
-overwrite: When we use DistCp with the overwrite option, it overwrites the target directory with the source.

How it works...

When DistCp is executed, it uses MapReduce to copy the data, and it also assists in error handling and reporting. It expands the list of source files and directories and inputs them to map tasks. When copying from multiple sources, collisions are resolved in the destination based on the option (update/overwrite) that's provided. By default, a file is skipped if it is already present at the target. Once the copying is complete, the count of skipped files is reported. You can read more on DistCp at https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html.

Recycling deleted data from trash to HDFS

In this recipe, we are going to see how to recover deleted data from the trash in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

To recover accidentally deleted data in HDFS, we first need to enable the trash folder, which is not enabled by default. This can be achieved by adding the following property to core-site.xml:

<property>
<name>fs.trash.interval</name>
<value>120</value>
</property>

Then, restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

This sets the deleted file retention to 120 minutes. Now, let's try to delete a file from HDFS:

hadoop fs -rmr /LICENSE.txt
15/10/30 10:26:26 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 120 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://localhost:9000/LICENSE.txt' to trash at: hdfs://localhost:9000/user/ubuntu/.Trash/Current

We have 120 minutes to recover this file before it is permanently deleted from HDFS. To restore the file to its original location, we can execute the following commands. First, let's confirm that the file exists:

hadoop fs -ls /user/ubuntu/.Trash/Current
Found 1 items
-rw-r--r-- 1 ubuntu supergroup 15429 2015-10-30 10:26 /user/ubuntu/.Trash/Current/LICENSE.txt

Now, restore the deleted file or folder; it's better to use the distcp command instead of copying each file one by one:

hadoop distcp hdfs://localhost:9000/user/ubuntu/.Trash/Current/LICENSE.txt hdfs://localhost:9000/

This starts a MapReduce job to restore data from the trash to the original HDFS folder. Check the HDFS path; the deleted file should be back in its original form.

How it works...

Enabling trash enforces a file retention policy for the specified amount of time. So, when trash is enabled, HDFS does not execute any block deletions or movements immediately; it only updates the metadata of the file and its location. This way, files deleted by accident can still be recovered; make sure that trash is enabled before experimenting with this recipe.

Saving compressed data in HDFS

In this recipe, we are going to take a look at how to store and process compressed data in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...
It's always good to use compression while storing data in HDFS. HDFS supports various types of compression algorithms such as LZO, BZip2, Snappy, GZIP, and so on. Every algorithm has its own pros and cons when you consider the time taken to compress and decompress and the space efficiency. These days, people prefer Snappy compression as it aims to achieve very high speed with a reasonable amount of compression. We can easily store and process any number of compressed files in HDFS. To store compressed data, we don't need to make any specific changes to the Hadoop cluster; you can simply copy the compressed data into HDFS as-is. Here is an example of this:

hadoop fs -mkdir /compressed
hadoop fs -put file.bz2 /compressed

Now, we'll run a sample program to take a look at how Hadoop automatically uncompresses the file and processes it:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /compressed /compressed_out

Once the job is complete, you can verify the output.

How it works...

Hadoop uses native libraries to find the support needed for various codecs and their implementations. Native libraries are specific to the platform that you run Hadoop on. You don't need to make any configuration changes to enable the compression algorithms. As mentioned earlier, Hadoop supports various compression algorithms that are already familiar in the computing world. Based on your needs and requirements (more space or more time), you can choose your compression algorithm. Take a look at http://comphadoop.weebly.com/ for more information on this.

Summary

We covered the major factors with respect to HDFS in this article, which comprises recipes that help us load, extract, import, export, and save data in HDFS. It also covers enabling transparent encryption for HDFS, as well as adjusting the block size of an HDFS cluster.

Resources for Article: Further resources on this subject: Hadoop and MapReduce [article] Advanced Hadoop MapReduce Administration [article] Integration with Hadoop [article]

FAT* 2018 Conference Session 2 Summary: Interpretability and Explainability

Savia Lobo
22 Feb 2018
5 min read
This session of FAT* 2018 is about interpretability and explainability in machine learning models. With the advances in deep learning, machine learning models have become more accurate. However, with accuracy and advancement, it is a tough task to keep the models highly explainable. This means these models may appear as black boxes to business users, who utilize them without knowing what lies within. Thus, it is equally important to make ML models interpretable and explainable, which can be beneficial and essential for understanding ML models and having 'behind the scenes' knowledge of what's happening within them. This understanding can be highly essential for heavily regulated industries like finance, medicine, defence, and so on.

The Conference on Fairness, Accountability, and Transparency (FAT), which will be held on the 23rd and 24th of February, 2018, is a multi-disciplinary conference that brings together researchers and practitioners interested in fairness, accountability, and transparency in socio-technical systems. The FAT 2018 conference will feature 17 research papers, 6 tutorials, and 2 keynote presentations from leading experts in the field. This article covers the research papers of the second session, which is dedicated to the interpretability and explainability of machine-learned decisions. If you've missed our summary of the first session, on online discrimination and privacy, visit the article link for a catch-up.

Paper 1: Meaningful Information and the Right to Explanation

This paper addresses an active debate in policy, industry, academia, and the media about whether and to what extent Europe's new General Data Protection Regulation (GDPR) grants individuals a "right to explanation" of automated decisions. The paper explores two major papers:

European Union Regulations on Algorithmic Decision Making and a "Right to Explanation" by Goodman and Flaxman (2017)
Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation by Wachter et al. (2017)

This paper demonstrates that the specified framework is built on incorrect legal and technical assumptions. In addition to responding to the existing scholarly contributions, the article articulates a positive conception of the right to explanation, located in the text and purpose of the GDPR. The authors take the position that the right should be interpreted functionally and flexibly, and that it should, at a minimum, enable a data subject to exercise his or her rights under the GDPR and human rights law.

Key takeaways:

The first paper, by Goodman and Flaxman, states that the GDPR creates a "right to explanation", but without any supporting argument. The second paper is a response to the first, where Wachter et al. published an extensive critique arguing against the existence of such a right. The current paper, in turn, is partially concerned with responding to the arguments of Wachter et al.

Paper 2: Interpretable Active Learning

The paper highlights how, with complex and opaque ML models, the process of active learning has also become opaque. Not much is known about what specific trends and patterns an active learning strategy may be exploring. The paper builds on LIME (the Local Interpretable Model-agnostic Explanations framework) to provide explanations for active learning recommendations. The authors, Richard Phillips, Kyu Hyun Chang, and Sorelle A.
Friedler, demonstrate uses of LIME in generating locally faithful explanations for an active learning strategy. Further, the paper shows how these explanations can be used to understand how different models and datasets explore a problem space over time.

Key takeaways:

The paper demonstrates how active learning choices can be made more interpretable to non-experts. It also discusses techniques that make active learning interpretable to expert labelers, so that queries and query batches can be explained and uncertainty bias can be tracked via interpretable clusters. It showcases per-query explanations of uncertainty, to develop a system that allows experts to choose whether to label a query. This allows them to incorporate domain knowledge and their own interests into the labeling process. It introduces a quantified notion of uncertainty bias, the idea that an algorithm may be less certain about its decisions on some data clusters than on others.

Paper 3: Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment

Actuarial risk assessments might be unduly perceived as a neutral way to counteract implicit bias and increase the fairness of decisions made within the criminal justice system, from pretrial release to sentencing, parole, and probation. Recently, however, these assessments have come under increased scrutiny, as critics claim that the statistical techniques underlying them might reproduce existing patterns of discrimination and historical biases that are reflected in the data. The paper proposes that machine learning should not be used for prediction, but rather to surface covariates that are fed into a causal model for understanding the social, structural, and psychological drivers of crime. The authors, Chelsea Barabas, Madars Virza, Karthik Dinakar, Joichi Ito (MIT), and Jonathan Zittrain (Harvard), propose an alternative application of machine learning and causal inference, away from predicting risk scores and toward risk mitigation.

Key takeaways:

The paper gives a brief overview of how risk assessments have evolved from a tool used solely for prediction to one that is diagnostic at its core. The paper places the debate around risk assessment in a broader context, giving a fuller understanding of the way these actuarial tools have evolved to serve a varied set of social and institutional agendas. It argues for a shift away from predictive technologies, toward diagnostic methods that help in understanding the criminogenic effects of the criminal justice system itself, as well as evaluating the effectiveness of interventions designed to interrupt cycles of crime. It proposes that risk assessments, when viewed as a diagnostic tool, can be used to understand the underlying social, economic, and psychological drivers of crime. The authors also posit that causal inference offers the best framework for pursuing these goals and achieving a fair and ethical risk assessment tool.
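For readers who want to experiment with the LIME framework discussed in Paper 2, here is a minimal, hedged sketch using the open-source lime package for a tabular classifier. The model, dataset, and feature names are placeholders and are not taken from the paper.

# pip install lime scikit-learn
from lime.lime_tabular import LimeTabularExplainer

# X_train, X_test, feature_names, class_names, and clf (a fitted
# classifier exposing predict_proba) are assumed to already exist
explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=class_names,
)
explanation = explainer.explain_instance(
    X_test[0], clf.predict_proba, num_features=5
)
print(explanation.as_list())  # (feature, weight) pairs for this one prediction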

Getting Started with Spring Security

Packt
14 Mar 2013
14 min read
Hello Spring Security

Although Spring Security can be extremely difficult to configure, the creators of the product have been thoughtful and have provided us with a very simple mechanism to enable much of the software's functionality with a strong baseline. From this baseline, additional configuration will allow a fine level of detailed control over the security behavior of our application. We'll start with an unsecured calendar application and turn it into a site that's secured with rudimentary username and password authentication. This authentication serves merely to illustrate the steps involved in enabling Spring Security for our web application; you'll see that there are some obvious flaws in this approach that will lead us to make further configuration refinements.

Updating your dependencies

The first step is to update the project's dependencies to include the necessary Spring Security .jar files. Update the Maven pom.xml file from the sample application you imported previously to include the Spring Security .jar files that we will use in the following few sections. Remember that Maven will download the transitive dependencies for each listed dependency. So, if you are using another mechanism to manage dependencies, ensure that you also include the transitive dependencies. When managing the dependencies manually, it is useful to know that the Spring Security reference includes a list of its transitive dependencies.

pom.xml

<dependency>
<groupId>org.springframework.security</groupId>
<artifactId>spring-security-config</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.security</groupId>
<artifactId>spring-security-core</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.security</groupId>
<artifactId>spring-security-web</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at https://www.packtpub.com. If you purchased this book elsewhere, you can visit https://www.packtpub.com/books/content/support and register to have the files e-mailed directly to you.

Using Spring 3.1 and Spring Security 3.1

It is important to ensure that all of the Spring dependency versions match and all the Spring Security versions match; this includes transitive versions. Since Spring Security 3.1 builds with Spring 3.0, Maven will attempt to bring in Spring 3.0 dependencies. This means that, in order to use Spring 3.1, you must ensure that you explicitly list the Spring 3.1 dependencies or use Maven's dependency management features to ensure that Spring 3.1 is used consistently. Our sample applications provide an example of the former option, which means that no additional work is required by you. In the following code, we present an example fragment of what is added to the Maven pom.xml file to utilize Maven's dependency management feature, to ensure that Spring 3.1 is used throughout the entire application:

<project ...>
...
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-aop</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
… list all Spring dependencies (a list can be found in our sample application's pom.xml) ...
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-web</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
</dependencies>
</dependencyManagement>
</project>

If you are using Spring Tool Suite, any time you update the pom.xml file, ensure you right-click on the project and navigate to Maven | Update Project…, and select OK to update all the dependencies.

Implementing a Spring Security XML configuration file

The next step in the configuration process is to create an XML configuration file representing all the Spring Security components required to cover standard web requests. Create a new XML file in the src/main/webapp/WEB-INF/spring/ directory with the name security.xml and the following contents. Among other things, the following file demonstrates how to require a user to log in for every page in our application, provide a login page, authenticate the user, and require the logged-in user to be associated with ROLE_USER for every URL:

src/main/webapp/WEB-INF/spring/security.xml

<?xml version="1.0" encoding="UTF-8"?>
<bean:beans xmlns="http://www.springframework.org/schema/security"
xmlns:bean="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
http://www.springframework.org/schema/security
http://www.springframework.org/schema/security/spring-security-3.1.xsd">
<http auto-config="true">
<intercept-url pattern="/**" access="ROLE_USER"/>
</http>
<authentication-manager>
<authentication-provider>
<user-service>
<user name="user1@example.com" password="user1" authorities="ROLE_USER"/>
</user-service>
</authentication-provider>
</authentication-manager>
</bean:beans>

If you are using Spring Tool Suite, you can easily create Spring configuration files by using File | New Spring Bean Configuration File. This wizard allows you to select the XML namespaces you wish to use, making configuration easier by not requiring the developer to remember the namespace locations, and helping prevent typographical errors. You will need to manually change the schema definitions as illustrated in the preceding code. This is the only Spring Security configuration required to get our web application secured with a minimal standard configuration. This style of configuration, using a Spring Security-specific XML dialect, is known as the security namespace style, named after the XML namespace (http://www.springframework.org/schema/security) associated with the XML configuration elements.

Let's take a minute to break this configuration apart, so we can get a high-level idea of what is happening. The <http> element creates a servlet filter, which ensures that the currently logged-in user is associated with the appropriate role. In this instance, the filter will ensure that the user is associated with ROLE_USER. It is important to understand that the name of the role is arbitrary. Later, we will create a user with ROLE_ADMIN and will allow this user to have access to additional URLs that our current user does not have access to.

The <authentication-manager> element is how Spring Security authenticates the user. In this instance, we utilize an in-memory data store to compare a username and password. Our example and explanation of what is happening are a bit contrived. An in-memory authentication store would not work for a production environment. However, it allows us to get up and running quickly. We will incrementally improve our understanding of Spring Security as we update our application to use production-quality security.
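Before moving on, it may help to see how this user-service could grow. The following is a small, hedged sketch (the admin username and password are placeholders, not necessarily the credentials used later in the book) showing how a second user carrying ROLE_ADMIN might be declared alongside the first:

<user-service>
<user name="user1@example.com" password="user1" authorities="ROLE_USER"/>
<user name="admin1@example.com" password="admin1" authorities="ROLE_USER,ROLE_ADMIN"/>
</user-service>

Combined with additional <intercept-url> entries, this is the mechanism by which different roles can be granted access to different URL patterns.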
Users who dislike Spring's XML configuration will be disappointed to learn that there isn't an alternative annotation-based or Java-based configuration mechanism for Spring Security, as there is with Spring Framework. There is an experimental approach that uses Scala to configure Spring Security, but at the time of this writing, there are no known plans to release it. If you like, you can learn more about it at https://github.com/tekul/scalasec/. Still, perhaps in the future, we'll see the ability to easily configure Spring Security in other ways. Although annotations are not prevalent in Spring Security, certain aspects of Spring Security that apply security elements to classes or methods are, as you'd expect, available via annotations.

Updating your web.xml file

The next steps involve a series of updates to the web.xml file. Some of the steps have already been performed because the application was already using Spring MVC. However, we will go over these requirements to ensure that these more fundamental Spring requirements are understood, in the event that you are using Spring Security in an application that is not Spring-enabled.

ContextLoaderListener

The first step of updating the web.xml file is to ensure that it contains the o.s.w.context.ContextLoaderListener listener, which is in charge of starting and stopping the Spring root ApplicationContext interface. ContextLoaderListener determines which configurations are to be used by looking at the <context-param> tag for contextConfigLocation. It is also important to specify where to read the Spring configurations from. Our application already has ContextLoaderListener added, so we only need to add the newly created security.xml configuration file, as shown in the following code snippet:

src/main/webapp/WEB-INF/web.xml

<context-param>
<param-name>contextConfigLocation</param-name>
<param-value>
/WEB-INF/spring/services.xml
/WEB-INF/spring/i18n.xml
/WEB-INF/spring/security.xml
</param-value>
</context-param>
<listener>
<listener-class>
org.springframework.web.context.ContextLoaderListener
</listener-class>
</listener>

The updated configuration will now load the security.xml file from the /WEB-INF/spring/ directory of the WAR. As an alternative, we could have used /WEB-INF/spring/*.xml to load all the XML files found in /WEB-INF/spring/. We chose not to use the *.xml notation to have more control over which files are loaded.

ContextLoaderListener versus DispatcherServlet

You may have noticed that o.s.web.servlet.DispatcherServlet specifies a contextConfigLocation component of its own:

src/main/webapp/WEB-INF/web.xml

<servlet>
<servlet-name>Spring MVC Dispatcher Servlet</servlet-name>
<servlet-class>
org.springframework.web.servlet.DispatcherServlet
</servlet-class>
<init-param>
<param-name>contextConfigLocation</param-name>
<param-value>
/WEB-INF/mvc-config.xml
</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>

DispatcherServlet creates o.s.context.ApplicationContext, which is a child of the root ApplicationContext interface. Typically, Spring MVC-specific components are initialized in the ApplicationContext interface of DispatcherServlet, while the rest are loaded by ContextLoaderListener. It is important to know that beans in a child ApplicationContext (such as those created by DispatcherServlet) can reference beans of its parent ApplicationContext (such as those created by ContextLoaderListener). However, the parent ApplicationContext cannot refer to beans of the child ApplicationContext.
This is illustrated in the following diagram, where childBean can refer to rootBean, but rootBean cannot refer to childBean. As with most usage of Spring Security, we do not need Spring Security to refer to any of the MVC-declared beans. Therefore, we have decided to have ContextLoaderListener initialize all of Spring Security's configuration.

springSecurityFilterChain

The next step is to configure springSecurityFilterChain to intercept all requests by updating web.xml. Servlet <filter-mapping> elements are considered in the order that they are declared. Therefore, it is critical for springSecurityFilterChain to be declared first, to ensure the request is secured prior to any other logic being invoked. Update your web.xml file with the following configuration:

src/main/webapp/WEB-INF/web.xml

</listener>
<filter>
<filter-name>springSecurityFilterChain</filter-name>
<filter-class>
org.springframework.web.filter.DelegatingFilterProxy
</filter-class>
</filter>
<filter-mapping>
<filter-name>springSecurityFilterChain</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
<servlet>

Not only is it important for Spring Security to be declared as the first <filter-mapping> element, but we should also be aware that, with the example configuration, Spring Security will not intercept forwards, includes, or errors. Often, it is not necessary to intercept other types of requests, but if you need to do this, the dispatcher element for each type of request should be included in <filter-mapping>. We will not perform these steps for our application, but you can see an example, as shown in the following code snippet:

src/main/webapp/WEB-INF/web.xml

<filter-mapping>
<filter-name>springSecurityFilterChain</filter-name>
<url-pattern>/*</url-pattern>
<dispatcher>REQUEST</dispatcher>
<dispatcher>ERROR</dispatcher>
...
</filter-mapping>

DelegatingFilterProxy

The o.s.web.filter.DelegatingFilterProxy class is a servlet filter provided by Spring Web that delegates all work to a Spring bean from the root ApplicationContext, which must implement javax.servlet.Filter. Since, by default, the bean is looked up by name using the value of <filter-name>, we must ensure we use springSecurityFilterChain as the value of <filter-name>. Pseudo-code for how o.s.web.filter.DelegatingFilterProxy works for our web.xml file can be found in the following code snippet:

public class DelegatingFilterProxy implements Filter {
  void doFilter(request, response, filterChain) {
    Filter delegate = applicationContext.getBean("springSecurityFilterChain");
    delegate.doFilter(request, response, filterChain);
  }
}

FilterChainProxy

When working in conjunction with Spring Security, o.s.web.filter.DelegatingFilterProxy will delegate to Spring Security's o.s.s.web.FilterChainProxy, which was created in our minimal security.xml file. FilterChainProxy allows Spring Security to conditionally apply any number of servlet filters to the servlet request. We will learn more about each of the Spring Security filters and their role in ensuring that our application is properly secured throughout the rest of the book.
The pseudo-code for how FilterChainProxy works is as follows:

public class FilterChainProxy implements Filter {
  void doFilter(request, response, filterChain) {
    // look up all the Filters for this request
    List<Filter> delegates = lookupDelegates(request, response)
    // invoke each filter unless a delegate decided to stop
    for delegate in delegates {
      if continue processing
        delegate.doFilter(request, response, filterChain)
    }
    // if all the filters decide it is ok, allow the
    // rest of the application to run
    if continue processing
      filterChain.doFilter(request, response)
  }
}

Due to the fact that both DelegatingFilterProxy and FilterChainProxy are the front door to Spring Security when used in a web application, it is here that you would add a debug point when trying to figure out what is happening.

Running a secured application

If you have not already done so, restart the application and visit http://localhost:8080/calendar/, and you will be presented with the application's login screen. Great job! We've implemented a basic layer of security in our application using Spring Security. At this point, you should be able to log in using user1@example.com as the User and user1 as the Password (user1@example.com/user1). You'll see the calendar welcome page, which describes at a high level what to expect from the application in terms of security.

Common problems

Many users have trouble with the initial implementation of Spring Security in their application. A few common issues and suggestions are listed next. We want to ensure that you can run the example application and follow along!

Make sure you can build and deploy the application before putting Spring Security in place. Review some introductory samples and documentation on your servlet container if needed. It's usually easiest to use an IDE, such as Eclipse, to run your servlet container. Not only is deployment typically seamless, but the console log is also readily available to review for errors. You can also set breakpoints at strategic locations, to be triggered on exceptions, to better diagnose errors.

If your XML configuration file is incorrect, you will get this (or something similar to this): org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element 'beans'. It's quite common for users to get confused with the various XML namespace references required to properly configure Spring Security. Review the samples again, paying attention to avoid line wrapping in the schema declarations, and use an XML validator to verify that you don't have any malformed XML.

If you get an error stating "BeanDefinitionParsingException: Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace [http://www.springframework.org/schema/security] ...", ensure that the spring-security-config-3.1.0.RELEASE.jar file is on your classpath. Also ensure the version matches the other Spring Security JARs and the XML declaration in your Spring configuration file.

Make sure the versions of Spring and Spring Security that you're using match and that there aren't any unexpected Spring JARs remaining as part of your application. As previously mentioned, when using Maven, it can be a good idea to declare the Spring dependencies in the dependency management section.

Adding PostGIS layers using QGIS [Tutorial]

Pravin Dhandre
26 Jul 2018
5 min read
Viewing tables as layers is great for creating maps or for simply working on a copy of the database outside the database. In this tutorial, we will establish a connection to our PostGIS database in order to add a table as a layer in QGIS (formerly known as Quantum GIS). Navigate to the QGIS site to install the latest LTR version. On the download page, click on Download Now and you will be able to choose a suitable operating system and the relevant settings. QGIS is available for Android, Linux, macOS X, and Windows. You might also be inclined to click on Discover QGIS to get an overview of basic information about the program, along with features, screenshots, and case studies.

This QGIS tutorial is an excerpt from a book written by Mayra Zurbaran, Pedro Wightman, Paolo Corti, Stephen Mather, Thomas Kraft and Bborie Park, titled PostGIS Cookbook - Second Edition.

Loading the data...

To begin, create the schema for this tutorial, then download data from the U.S. Census Bureau's FTP site: the shapefile is All Lines for Cuyahoga county in Ohio, which consists of roads and streams, among other line features. Extract the ZIP file to your working directory and then load it into your database using shp2pgsql. Be sure to specify the spatial reference system, EPSG/SRID: 4269. When in doubt about using projections, use the service provided by the folks at OpenGeo at the following website: http://prj2epsg.org/search

Use the following command to generate the SQL to load the shapefile:

shp2pgsql -s 4269 -W LATIN1 -g the_geom -I tl_2012_39035_edges.shp chp11.tl_2012_39035_edges > tl_2012_39035_edges.sql

How to do it...

Now it's time to give the data we downloaded a look using QGIS. We must first create a connection to the database in order to access the table. Get connected and add the table as a layer by following these steps:

Click on the Add PostGIS Layers icon. Click on the New button below the Connections drop-down menu to create a new PostGIS connection. After the Add PostGIS Table(s) window opens, create a name for the connection and fill in a few parameters for your database, including Host, Port, Database, Username, and Password. Once you have entered all of the pertinent information for your database, click on the Test Connection button to verify that the connection is successful. If the connection is not successful, double-check for typos and errors; additionally, make sure you are attempting to connect to a PostGIS-enabled database. If the connection is successful, go ahead and check the Save Username and Save Password checkboxes; this will prevent you from having to enter your login information multiple times throughout the exercise. Click on OK at the bottom of the menu to apply the connection settings. Now you can connect! Make sure the name of your PostGIS connection appears in the drop-down menu and then click on the Connect button. If you choose not to store your username and password, you will be asked to submit this information every time you try to access the database. Once connected, all schemas within the database will be shown, and the tables will be made visible by expanding the target schema. Select the table(s) to be added as a layer by simply clicking on the table name or anywhere along its row; selections will be highlighted in blue. To deselect a table, click on it a second time and it will no longer be highlighted.
Select the tl_2012_39035_edges table that was downloaded at the beginning of the tutorial and click on the Add button, as shown in the following screenshot. A subset of the table can also be added as a layer. This is accomplished by double-clicking on the desired table name. The Query Builder window will open, which aids in creating simple SQL WHERE clause statements. Add the roads by selecting the records where roadflg = Y. This can be done by typing a query or using the buttons within Query Builder. Click on the OK button followed by the Add button. A subset of the table is now loaded into QGIS as a layer.

The layer is strictly a static, temporary copy of your database. You can make whatever changes you like to the layer without affecting the database table. The same holds true the other way around: changes to the table in the database will have no effect on the layer in QGIS. If needed, you can save the temporary layer in a variety of formats, such as DXF, GeoJSON, KML, or SHP. Simply right-click on the layer name in the Layers panel and click on Save As. This will create a file, which you can recall at a later time or share with others. The following screenshot shows the Cuyahoga county road network.

You may also use the QGIS Browser Panel to navigate through the now-connected PostGIS database and list the schemas and tables. This panel allows you to double-click to add spatial layers to the current project, providing a better user experience, not only of connected databases but of any directory on your machine.

How it works...

You have added a PostGIS layer into QGIS using the built-in Add PostGIS Table GUI. This was achieved by creating a new connection and entering your database parameters. Any number of database connections can be set up simultaneously. If working with multiple databases is more common for your workflows, saving all of the connections into one XML file would save time and energy when returning to these projects in QGIS.

To explore more of the 3D capabilities of PostGIS, including LiDAR point clouds, read PostGIS Cookbook - Second Edition.

Using R to implement Kriging – A Spatial Interpolation technique for Geostatistics data
Top 7 libraries for geospatial analysis
Learning R for Geospatial Analysis
GQL (Graph Query Language) joins SQL as a Global Standards Project and will be the international standard declarative query language for graphs

Amrata Joshi
19 Sep 2019
6 min read
On Tuesday, the team at Neo4j, the graph database management system, announced that the international committees behind the development of the SQL standard have voted to initiate GQL (Graph Query Language) as a new database query language. GQL is now going to be the international standard declarative query language for property graphs, and it is also a Global Standards Project. GQL is developed and maintained by the same international group that maintains the SQL standard.

How did the proposal for GQL pass?

In May last year, the GQL initiative was first put forward in the GQL Manifesto. In June this year, national standards bodies across the world from ISO/IEC's Joint Technical Committee 1 (responsible for IT standards) started voting on the GQL project proposal. The ballot closed earlier this week and the proposal passed: ten countries, including Germany, Korea, the United States, the UK, and China, voted in favor, and seven countries agreed to put forward their experts to work on the project. Japan was the only country to vote against, on the grounds that existing languages already do the job, and that SQL/Property Graph Query extensions, along with the rest of the SQL standard, can do the same work.

According to the Neo4j team, the GQL project will initiate the development of next-generation technology standards for accessing data. Its charter mandates building on the core foundations established by SQL and ongoing collaboration to ensure SQL and GQL interoperability and compatibility. GQL reflects the rapid growth of the graph database market and the increasing adoption of the Cypher language.

Stefan Plantikow, GQL project lead and editor of the planned GQL specification, said, "I believe now is the perfect time for the industry to come together and define the next generation graph query language standard."

Plantikow further added, "It's great to receive formal recognition of the need for a standard language. Building upon a decade of experience with property graph querying, GQL will support native graph data types and structures, its own graph schema, a pattern-based approach to data querying, insertion and manipulation, and the ability to create new graphs, and graph views, as well as generate tabular and nested data. Our intent is to respect, evolve, and integrate key concepts from several existing languages including graph extensions to SQL."

Keith Hare, who has served as the chair of the international SQL standards committee for database languages since 2005, charted the progress toward GQL: "We have reached a balance of initiating GQL, the database query language of the future, whilst preserving the value and ubiquity of SQL." Hare further added, "Our committee has been heartened to see strong international community participation to usher in the GQL project. Such support is the mark of an emerging de jure and de facto standard."

The need for a graph-specific query language

Researchers and vendors needed a graph-specific query language because of the following limitations of SQL/PGQ:

The SQL/PGQ language is restricted to read-only queries.
SQL/PGQ cannot project new graphs.
SQL/PGQ can only access graphs that are defined as graph views over SQL tables.

Researchers and vendors needed a language like Cypher that would cover the insertion and maintenance of data, and not just data querying. But SQL wasn't the apt model for a graph-centric language that takes graphs as query inputs and outputs a graph as a result.
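To make the contrast concrete, here is a small, hedged sketch of the kind of Cypher-style pattern matching that a graph-native language enables; the labels, relationship types, and property names are invented for illustration:

MATCH (me:Person {name: 'Alice'})-[:KNOWS]->(:Person)-[:KNOWS]->(fof:Person)
WHERE (fof)-[:LIVES_IN]->(:City {name: 'Berlin'}) AND fof <> me
RETURN DISTINCT fof.name

The whole query is a declarative graph pattern rather than a chain of table joins, which is the style of querying GQL sets out to standardize.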
SQL, however, wasn't the apt model for a graph-centric language that takes graphs as query inputs and outputs a graph as a result. GQL, on the other hand, builds on openCypher, a project that brings Cypher to Apache Spark and gives users a composable graph query language.

SQL and GQL can work together

According to most of the companies and national standards bodies supporting the GQL initiative, GQL and SQL are not competitors. Instead, the two languages can complement each other via interoperation and shared foundations. Alastair Green, Query Languages Standards & Research Lead at Neo4j, writes, "A SQL/PGQ query is in fact a SQL sub-query wrapped around a chunk of proto-GQL." SQL is a language built around tables, whereas GQL is built around graphs. Users can use GQL to find and project a graph from a graph.

Green further writes, "I think that the SQL standards community has made the right decision here: allow SQL, a language built around tables, to quote GQL when the SQL user wants to find and project a table from a graph, but use GQL when the user wants to find and project a graph from a graph. Which means that we can produce and catalog graphs which are not just views over tables, but discrete complex data objects."

It is still not clear when the first implementable version of GQL will be out. The official page reads, "The work of the GQL project starts in earnest at the next meeting of the SQL/GQL standards committee, ISO/IEC JTC 1 SC 32/WG3, in Arusha, Tanzania, later this month. It is impossible at this stage to say when the first implementable version of GQL will become available, but it is highly likely that some reasonably complete draft will have been created by the second half of 2020."

Developer community welcomes the new addition

Users are excited to see how GQL will incorporate Cypher. One user commented on Hacker News, "It's been years since I've worked with the product and while I don't miss Neo4j, I do miss the query language. It's a little unclear to me how GQL will incorporate Cypher but I hope the initiative is successful if for no other reason than a selfish one: I'd love Cypher to be around if I ever wind up using a GraphDB again."

A few others mistook GQL for Facebook's GraphQL and are sceptical about the name. One comment on Hacker News reads, "Also, the name is of course justified, but it will be a mess to search for due to (Facebook) GraphQL." A user commented, "I read the entire article and came away mistakenly thinking this was the same thing as GraphQL." Another user commented, "That's quiet an unfortunate name clash with the existing GraphQL language in a similar domain."

Other interesting news in Data

- Media manipulation by Deepfakes and cheap fakes require both AI and social fixes, finds a Data & Society report
- Percona announces Percona Distribution for PostgreSQL to support open source databases
- Keras 2.3.0, the first release of multi-backend Keras with TensorFlow 2.0 support, is now out


Introducing Weld, a runtime written in Rust and LLVM for cross-library optimizations

Bhagyashree R
24 Sep 2019
5 min read
Weld is an open-source Rust project for improving the performance of data-intensive applications. It is an interface and runtime that can be integrated into existing frameworks, including Spark, TensorFlow, Pandas, and NumPy, without changing their user-facing APIs.

The motivation behind Weld

Data analytics applications today often require developers to combine functions from different libraries and frameworks to accomplish a particular task. For instance, a typical Python application might select some data using Spark SQL, transform it using NumPy and Pandas, and train a model with TensorFlow. This improves developer productivity, since each step leans on a high-quality library, but these functions are optimized in isolation, which is not enough to achieve the best application performance.

Weld aims to solve this problem by providing an interface and runtime that can optimize across data-intensive libraries and frameworks while preserving their user-facing APIs. In an interview with Federico Carrone, a Tech Lead at LambdaClass, Shoumik Palkar, Weld's main contributor, shared, "The motivation behind Weld is to provide bare-metal performance for applications that rely on existing high-level APIs such as NumPy and Pandas. The main problem it solves is enabling cross-function and cross-library optimizations that other libraries today don't provide."

How Weld works

Weld serves as a common runtime that allows libraries from different domains, such as SQL and machine learning, to represent their computations in a common functional intermediate representation (IR). This IR is then optimized by a compiler optimizer and JIT'd to efficient machine code for diverse parallel hardware. Weld performs a wide range of optimizations on the IR, including loop fusion, loop tiling, and vectorization. "Weld's IR is natively parallel, so programs expressed in it can always be trivially parallelized," said Palkar.

When Weld was first introduced, it was mainly used for cross-library optimizations. Over time, however, people have started to use it for other applications as well: building JITs or new physical execution engines for databases and analytics frameworks, speeding up individual libraries, targeting new kinds of parallel hardware using the IR, and more.

To evaluate Weld's performance, the team integrated it with popular data analytics frameworks, including Spark, NumPy, and TensorFlow. This prototype showed up to 30x improvements over the native framework implementations, and cross-library optimizations between Pandas and NumPy improved performance by up to two orders of magnitude.

(Performance comparison chart. Source: Weld)
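To make the motivation above concrete, here is a plain NumPy/Pandas sketch of the kind of pipeline Weld targets. This is written with the standard libraries, not Weld's own APIs, and the data is invented for illustration; the point is that each call is optimized in isolation and materializes an intermediate result, so nothing is fused across library boundaries:

import numpy as np
import pandas as pd

# A made-up million-row frame standing in for data selected via Spark SQL
df = pd.DataFrame({"price": np.random.rand(1_000_000),
                   "qty": np.random.randint(1, 10, 1_000_000)})

revenue = df["price"] * df["qty"]     # pass 1: materializes an intermediate array
discounted = revenue * 0.9            # pass 2: materializes another
total = np.log1p(discounted).sum()    # passes 3 and 4 over the data
print(total)

Expressed in Weld's IR, the same four logical steps could in principle be fused into a single parallel loop over the data.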
Why Rust and LLVM were chosen for its implementation

The first iteration of Weld was implemented in Scala because of its algebraic data types, powerful pattern matching, and large ecosystem. However, Scala had some shortcomings. "We moved away from Scala because it was too difficult to embed a JVM-based language into other runtimes and languages," Palkar shared in the interview. Scala has a managed runtime and a clunky build system, and its JIT compilations were quite slow for larger programs. Because of these shortcomings, the team decided to redesign the JIT compiler, core API, and runtime from the ground up. They were searching for a language that was fast and safe, had no managed runtime, and provided a rich standard library, functional paradigms, a good package manager, and a strong community. Rust met all of these requirements.

Rust provides a very minimal, no-setup-required runtime. It can be easily embedded into other languages such as Java and Python. To make development easier, it has high-quality packages, known as crates, and functional paradigms such as pattern matching. Lastly, it is backed by a great community.

Read also: "Rust is the future of systems programming, C is the new Assembly": Intel principal engineer, Josh Triplett

Explaining why they chose LLVM, Palkar said in the interview, "We chose LLVM because its an open-source compiler framework that has wide use and support; we generate LLVM directly instead of C/C++ so we don't need to rely on the existence of a C compiler, and because it improves compilation times (we don't need to parse C/C++ code)."

In a discussion on Hacker News, many users listed other Weld-like projects that developers may find useful. A user commented, "Also worth checking out OmniSci (formerly MapD), which features an LLVM query compiler to gain large speedups executing SQL on both CPU and GPU."

Users also talked about Numba, an open-source JIT compiler that translates Python functions to optimized machine code at runtime with the help of the LLVM compiler library. "Very bizarre there is no discussion of numba here, which has been around and used widely for many years, achieves faster speedups than this, and also emits an LLVM IR that is likely a much better starting point for developing a "universal" scientific computing IR than doing yet another thing that further complicates it with fairly needless involvement of Rust," a user added.

To know more about Weld, check out the full interview on Medium. Also, watch Shoumik Palkar's RustConf 2019 talk: https://www.youtube.com/watch?v=AZsgdCEQjFo&t

Other news in Programming

- Darklang available in private beta
- GNU community announces 'Parallel GCC' for parallelism in real-world compilers
- TextMate 2.0, the text editor for macOS, releases


Transformers 2.0: NLP library with deep interoperability between TensorFlow 2.0 and PyTorch, and 32+ pretrained models in 100+ languages

Fatema Patrawala
30 Sep 2019
3 min read
Last week, Hugging Face, a startup specializing in natural language processing, released a landmark update to their popular Transformers library, offering unprecedented compatibility between two major deep learning frameworks, PyTorch and TensorFlow 2.0.

Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.

Transformers 2.0 embraces the "best of both worlds", combining PyTorch's ease of use with TensorFlow's production-grade ecosystem. The new library makes it easier for scientists and practitioners to select different frameworks for the training, evaluation, and production phases of developing the same language model.

"This is a lot deeper than what people usually think when they talk about compatibility," said Thomas Wolf, who leads Hugging Face's data science team. "It's not only about being able to use the library separately in PyTorch and TensorFlow. We're talking about being able to seamlessly move from one framework to the other dynamically during the life of the model."

https://twitter.com/Thom_Wolf/status/1177193003678601216

"It's the number one feature that companies asked for since the launch of the library last year," said Clement Delangue, CEO of Hugging Face.

Notable features in Transformers 2.0

- 8 architectures with over 30 pretrained models, in more than 100 languages
- Load a model and pre-process a dataset in less than 10 lines of code
- Train a state-of-the-art language model in a single line with the tf.keras fit function
- Share pretrained models, reducing compute costs and carbon footprint
- Deep interoperability between TensorFlow 2.0 and PyTorch models (a minimal sketch follows at the end of this piece)
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, and production
- As powerful and concise as Keras

About Hugging Face Transformers

With half a million installs since January 2019, Transformers is the most popular open-source NLP library. More than 1,000 companies, including Bing, Apple, and Stitch Fix, are using it in production for text classification, question answering, intent detection, text generation, and conversational applications. Hugging Face, the creators of Transformers, have raised US$5M so far from investors including Betaworks, Salesforce, Amazon, and Apple.

On Hacker News, users are praising the company and noting how Transformers has become the most important library in NLP.

Other interesting news in data

- Baidu open sources ERNIE 2.0, a continual pre-training NLP model that outperforms BERT and XLNet on 16 NLP tasks
- Dr Joshua Eckroth on performing Sentiment Analysis on social media platforms using CoreNLP
- Facebook open-sources PyText, a PyTorch based NLP modeling framework
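As promised above, here is a minimal sketch of the cross-framework hand-off that Transformers 2.0 advertises. It assumes the transformers, torch, and tensorflow packages are installed and that the pretrained weights download succeeds; the fine-tuned model directory in the last line is a hypothetical example:

from transformers import BertTokenizer, BertModel, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The same pretrained weights, loaded once as a PyTorch model and once
# as a TensorFlow 2.0 (tf.keras) model
pt_model = BertModel.from_pretrained("bert-base-uncased")
tf_model = TFBertModel.from_pretrained("bert-base-uncased")

# from_pt=True converts a PyTorch checkpoint on the fly, e.g. to
# fine-tune in PyTorch and then serve with TensorFlow
tf_from_pt = TFBertModel.from_pretrained("./my-finetuned-model/", from_pt=True)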

The Microsoft Azure Stack Architecture

Packt
07 Jun 2017
13 min read
In this article by Markus Klein and Susan Roesner, authors of the book Azure Stack for Datacenters, we will help you to plan, build, run, and develop your own Azure-based datacenter running Azure Stack technology. The goal is a datacenter that is 100 percent consistent with Azure, which brings flexibility and elasticity to your IT infrastructure. We will learn about:

- Cloud basics
- The Microsoft Azure Stack
- Core management services
- Using Azure Stack
- Migrating services to Azure Stack

(For more resources related to this topic, see here.)

Cloud as the new IT infrastructure

Regarding the technical requirements of today's IT, the cloud is always a part of the general IT strategy. This holds regardless of the region a company works in or the sector of the economy: 99.9 percent of all companies already have cloud technology in their environment. The real question for a CIO is, "To what extent do we allow cloud services, and what does that mean for our infrastructure?" It is a matter of compliance, allowance, and willingness. The top 10 most important questions for a CIO to prepare for the cloud are as follows:

- Are we allowed to save our data in the cloud?
- What classification of data can be saved in the cloud?
- How flexible are we regarding the cloud?
- Do we have the knowledge to work with cloud technology?
- How does our current IT setup and infrastructure fit the cloud's requirements?
- Is our current infrastructure already prepared for the cloud?
- Are we already working with a cloud-ready infrastructure?
- Is our Internet bandwidth good enough?
- What does the cloud mean for my employees?
- Which technology should we choose?

Cloud terminology

The definition of the term "cloud" is not simple, but we need to differentiate between the following:

- Private cloud: A highly dynamic IT infrastructure based on virtualization technology that is flexible and scalable. The resources live in a privately owned datacenter, either in your company or at a service provider of your choice.
- Public cloud: A shared offering of IT infrastructure services that are provided via the Internet.
- Hybrid cloud: A mixture of private and public cloud. Depending on compliance or other security regulations, services that may run in a public datacenter are deployed there, while services that need to stay inside the company run on-premises. The goal is to run these services on the same technology to provide the agility, flexibility, and scalability to move services between public and private datacenters.

In general, there are some big players in the cloud market (for example, Amazon Web Services, Google, Azure, and even Alibaba). A company with a Microsoft-minded infrastructure should take a look at Microsoft Azure datacenters. Microsoft opened its first datacenter in 2008, and today it invests a billion dollars every month in Azure. As of today, there are about 34 official datacenters around the world that form Microsoft Azure, besides some that Microsoft does not talk about (for example, US Government Azure). There are some dedicated datacenters, such as the German Azure cloud, that do not have connectivity to Azure worldwide. Due to compliance requirements, these frontiers need to exist, but the technology of each Azure datacenter is the same, although the services offered may vary.
The following map gives an overview of the locations (so-called regions) in Azure as of today and provides an idea of which ones will be coming soon:

The Microsoft cloud story

When Microsoft started their public cloud, they decided that there must be a private cloud stack too, especially to prepare their infrastructure to run in Azure sometime in the future. The first private cloud solution was the System Center suite, with System Center Orchestrator, Service Provider Foundation (SPF), and Service Manager as the self-service portal solution. Later on, Microsoft launched Windows Azure Pack for Windows Server. Today, Windows Azure Pack is available as a product focused on the private cloud; it provides a self-service portal (the well-known old Azure Portal, code name "red dog frontend") and uses the System Center suite as its underlying technology.

Microsoft Azure Stack

In May 2015, Microsoft formally announced a new solution that brings Azure to your datacenter. This solution was named Microsoft Azure Stack. To put it in one sentence: Azure Stack is the same technology, with the same APIs and portal as public Azure, but you can run it in your own datacenter or in that of your service provider. With Azure Stack, System Center is completely gone, because everything works the way it does in Azure, and in Azure there is no System Center at all. This is the primary focus of this article. The following diagram gives a current overview of the technical design of Azure Stack compared with Azure:

The one and only difference between Azure Stack and Azure is the cloud infrastructure. In Azure, there are thousands of servers that are part of the solution; with Azure Stack, the number is slightly smaller. That's why the underlying technology stack is a cloud-inspired infrastructure based on Windows Server, Hyper-V, and Azure technologies. There is no System Center product in this stack anymore. This does not mean that it cannot be there (for example, SCOM for on-premise monitoring), but Azure Stack itself provides all functionality within the solution itself.

For stability and functionality, Microsoft decided to provide Azure Stack as a so-called integrated system, so it comes to your door with the hardware stack included; the customer buys Azure Stack as a complete technology stack. At general availability (GA), the hardware OEMs are HPE, Dell EMC, and Lenovo. In addition, there is a one-host PoC deployment available for download that can be run as a proof of concept on any hardware that meets the requirements.

Technical design

Looking at the technical design a bit more in depth, there are some components that we need to dive deeper into. The general basis of Azure Stack is Windows Server 2016 technology, which builds the cloud-inspired infrastructure:

- Storage Spaces Direct (S2D)
- VXLAN
- Nano Server
- Azure Resource Manager (ARM)

Storage Spaces Direct (S2D)

Storage Spaces and Scale-Out File Server were technologies that came with Windows Server 2012. The lack of stability in the initial versions and issues with the underlying hardware made for a rough start.
The general concept was a shared storage setup using JBODs controlled by Windows Server 2012 Storage Spaces servers, with a Scale-Out File Server cluster acting as the single point of contact for storage. With Windows Server 2016, the design is quite different: the concept relies on a shared-nothing model, even with locally attached storage. This is the storage design that Azure Stack builds on as one of its main pillars.

VXLAN networking technology

With Windows Server 2012, Microsoft introduced Software-Defined Networking (SDN) and the NVGRE technology. Hyper-V Network Virtualization supports Network Virtualization using Generic Routing Encapsulation (NVGRE) as the mechanism to virtualize IP addresses; in NVGRE, the virtual machine's packet is encapsulated inside another packet.

VXLAN comes as the new SDNv2 protocol, is RFC compliant, and is supported by most network hardware vendors by default. The Virtual eXtensible Local Area Network (VXLAN) RFC 7348 protocol has been widely adopted in the marketplace, with support from vendors such as Cisco, Brocade, Arista, Dell, and HP. The VXLAN protocol uses UDP as the transport.

Nano Server

Nano Server offers a minimal-footprint, headless version of Windows Server 2016. It completely excludes the graphical user interface, which means it is quite small and easy to handle with regard to updates and security fixes, but it doesn't provide the GUI that customers of Windows Server expect.

Azure Resource Manager (ARM)

The "magical" Azure Resource Manager is a 1:1 bit share with ARM from Azure, so it has the same update frequency and features that are available in Azure. ARM is a consistent management layer that saves resources, dependencies, inputs, and outputs as an idempotent deployment in a JSON file called an ARM template. This template defines the shape of a deployment, whether it be VMs, databases, websites, or anything else. The goal is that once a template is designed, it can run on any Azure-based cloud platform, including Azure Stack. ARM provides cloud consistency at the finest granularity; the only difference between the clouds is the region the template is deployed to and the corresponding REST endpoints.

ARM not only provides a template for a logical combination of resources within Azure; it also manages subscriptions and role-based access control (RBAC) and defines the gallery, metric, and usage data. Quite simply, everything that needs to be done with Azure resources should be done with ARM. Azure Resource Manager does not just describe a single virtual machine; it is responsible for setting up anything from one resource to a bundle of resources that together form a specific service. ARM templates can even be nested, which means they can depend on each other.
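To make this concrete, here is a minimal sketch of an ARM template that deploys a single storage account. The parameter name is invented for illustration, and the apiVersion shown is simply one that was current for this resource type around the time of writing:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": { "type": "string" }
  },
  "variables": {},
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('storageAccountName')]",
      "apiVersion": "2016-01-01",
      "location": "[resourceGroup().location]",
      "sku": { "name": "Standard_LRS" },
      "kind": "Storage",
      "properties": {}
    }
  ],
  "outputs": {}
}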
When working with ARM, you should know the following vocabulary:

- Resource: A manageable item available in Azure
- Resource group: The container of resources that belong together within a service
- Resource provider: A service that can be consumed within Azure
- Resource manager template: The definition of a specific service
- Declarative syntax: The template does not define the steps to set up a resource; it just defines the desired result, and the resource itself knows how to set up and configure itself to fulfill that result

To create your own ARM templates, you need to fulfill the following minimum requirements:

- A text editor of your choice
- Visual Studio Community Edition
- Azure SDK

Visual Studio Community Edition is available for free from the Internet. After setting these things up, you can start it and define your own templates. A simple blank template is just the skeleton shown above, with the parameters, variables, resources, and outputs sections left empty.

There are different ways to get a template so that you can work on it and modify it to fit your needs:

- Visual Studio templates
- Quick-start templates on GitHub
- Azure ARM templates

You can also export the ARM template directly from the Azure Portal once the resource has been deployed, by clicking on View template. For further reading on ARM basics, the Getting started with Azure Resource Manager document is a good place to begin: http://aka.ms/GettingStartedWithARM.

PowerShell Desired State Configuration

We talked about ARM and ARM templates, which define resources but cannot design the way a VM looks inside, specify which software needs to be installed, or say how that installation should be done. This is why we need to have a look at VM extensions. VM extensions define what should be done after the ARM deployment has finished. In general, an extension can be any kind of script, but the best practice is to use PowerShell and its add-on called Desired State Configuration (DSC). DSC defines, quite similarly to ARM, how the software needs to be installed and configured. The great thing about this concept is that it also monitors whether the desired state of a virtual machine changes (for example, because an administrator uninstalls or reconfigures something). If it does, DSC ensures within minutes that the original state is fulfilled again, rolling the machine back to the desired state.
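As a minimal sketch of what such a configuration looks like (the configuration name is invented for this example; WindowsFeature is a built-in DSC resource), the following keeps IIS installed on a node and is re-applied if someone removes it:

Configuration WebServer {
    Node 'localhost' {
        # Declare the desired state: the Web-Server (IIS) feature is present.
        # DSC periodically checks this state and reinstalls the feature if it drifts.
        WindowsFeature IIS {
            Ensure = 'Present'
            Name   = 'Web-Server'
        }
    }
}

# Compiling the configuration produces a MOF file that the Local
# Configuration Manager on the node then enforces
WebServer -OutputPath 'C:\DSC'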
Migrating services to Azure Stack

If you are running virtual machines today, you are already using a cloud-based technology, even if we don't call it a cloud; this is basically the idea of a private cloud. If you are running Azure Pack today, you are quite near Azure Stack from the process point of view, but not the technology part. A solution called connectors for Azure Pack gives you one portal UI for both cloud solutions, so the customer can manage everything from the Azure Stack Portal, although services still run in Azure Pack as a legacy solution.

Basically, there is no built-in migration path within Azure Stack, but the way to solve this is quite easy: you can use every tool that you would use to migrate services to Azure.

Azure website migration assistant

The Azure website migration assistant provides a high-level readiness assessment for existing websites. The report outlines sites that are ready to move and elements that may need changes, and it highlights unsupported features. If everything is prepared properly, the tool creates the website and associated database automatically and synchronizes the content. You can learn more about it at https://azure.microsoft.com/en-us/downloads/migration-assistant/.

For virtual machines, there are two tools available:

- Virtual Machines Readiness Assessment
- Virtual Machines Optimization Assessment

Virtual Machines Readiness Assessment

The Virtual Machines Readiness Assessment tool automatically inspects your environment and provides a checklist and detailed report on the steps for migrating the environment to the cloud. The download location is https://azure.microsoft.com/en-us/downloads/vm-readiness-assessment/.

Virtual Machines Optimization Assessment

The Virtual Machine Optimization Assessment tool starts with a questionnaire that asks several questions about your deployment. It then runs an automated data collection and analysis of your Azure VMs and generates a custom report with ten prioritized recommendations across six focus areas: security and compliance, performance and scalability, and availability and business continuity. The download location is https://azure.microsoft.com/en-us/downloads/vm-optimization-assessment/.

Summary

Azure Stack provides a real Azure experience in your datacenter. The UI, administrative tools, and even third-party solutions should work properly. The design of Azure Stack is a very small instance of Azure with some technical design modifications, especially regarding the compute, storage, and network resource providers. These modifications give you a means to start small, think big, and deploy large when migrating services directly to public Azure sometime in the future, if needed.

The most important tool for planning, describing, defining, and deploying Azure Stack services is Azure Resource Manager, just like in Azure. This lets you create your services once but deploy them many times. From the business perspective, this means better TCO and lower administrative costs.

Resources for Article:

Further resources on this subject:

- Deploying and Synchronizing Azure Active Directory [article]
- What is Azure API Management? [article]
- Installing and Configuring Windows Azure Pack [article]


Responsive Web Design with WordPress

Packt
09 Jul 2015
13 min read
Welcome to the world of Responsive Web Design! This article is written by Dejan Markovic, author of the book WordPress Responsive Theme Design, and it will introduce you to Responsive Web Design, its concepts, and its techniques. It also presents crisp notes from WordPress Responsive Theme Design. (For more resources related to this topic, see here.)

"Responsive web design (RWD) is a web design approach aimed at crafting sites to provide an optimal viewing experience—easy reading and navigation with a minimum of resizing, panning, and scrolling—across a wide range of devices (from mobile phones to desktop computer monitors)." Reference: http://en.wikipedia.org/wiki/Responsive_web_design.

To put it simply, responsive web design (RWD) means that a responsive website should adapt to the screen size of the device it is being viewed on.

When I began my web development journey in 2002, we didn't have to consider as many factors as we do today. We just had to create the website for a 17-inch screen (the standard at that time), and that was it. Yes, we also had to consider 15, 19, and 21-inch monitors, but since the 17-inch screen was the standard, that was our target screen size. In pixels, these sizes were usually 800 or 1024. We also had to consider fewer browsers (Internet Explorer, Netscape, and Opera) and the styling for print, and that was it.

Since then, a lot has changed, and today, in 2015, a website design has to consider multiple factors, such as:

- A lot of different web browsers (Internet Explorer, Firefox, Opera, Chrome, and Safari)
- A number of different operating systems (Windows (XP, 7, and 8), Mac OS X, Linux, Unix, iOS, Android, and Windows phones)
- Device screen sizes (desktop, mobile, and tablet)
- Is the content accessible and readable with screen readers?
- How will the content look when it's printed?

Today, creating a different design for every one of these factors and devices would take years. This is where responsive web design comes to the rescue.

The concepts of RWD

I have to point out that the mobile environment is becoming a more important factor than the desktop environment. Mobile browsing is becoming bigger than desktop-based access, which makes the mobile environment a very important factor to consider when developing a website. Simply put, the main point of RWD is that the layout changes based on the size and capabilities of the device it is being viewed on. The concepts of RWD that we will learn next are: Viewport, scaling, and screen density.

Controlling the Viewport

On the desktop, the Viewport is the screen size of the window in a browser. For example, when we resize the browser window, we are actually changing the Viewport size. On mobile devices, the Viewport size is independent of the device screen size. For example, the Viewport is 850 px for mobile Opera and 980 px for mobile Safari, while the screen size of an iPhone is 320 px. Comparing the Viewport size of 980 px with the iPhone screen size of 320 px, we can see that the Viewport is bigger than the screen size. This is because mobile browsers function differently: they first load the page into the Viewport and then resize it to the device's screen size. This is why we are able to see the whole page on the mobile device. If mobile browsers had a Viewport the same as the screen size (320 px), we would only be able to see a part of the page on the mobile device.
The original article includes a table of Viewport sizes and screen densities for several iPhone models here; the two models discussed later in this section are:

Model      Screen size  Viewport  Density  Resolution
iPhone 3G  3.5 inches   320 px    163 dpi  320x480
iPhone 4S  3.5 inches   320 px    326 dpi  640x960

We can control the Viewport with CSS:

@viewport { width: device-width; }

Or we can control it with the meta tag:

<meta name="viewport" content="width=device-width">

In the preceding code, we are matching the Viewport width to the device width. Because the Viewport meta tag approach is more widely adopted (it was first used on iOS, and the @viewport approach is not supported by some browsers), we will use the meta tag approach. We set the Viewport width in order to match our web content with our mobile content, as we want to make sure that our web content looks good on a mobile device as well. We could set a Viewport in the code for each device separately, for example, 320 px for the iPhone, but the better approach is to use content="width=device-width".

Scaling

Scaling is extremely important, as the initial scale controls the zoom aspect of the content for the initial look of the page. For example, if the initial scale is set to 3, the content will be loaded at 3 times the Viewport size, which means 3 times zoom. Here is how the page looks at initial-scale=1 and initial-scale=3:

As we can see from the preceding screenshots, at an initial scale of 3 (three times zoom), the logo image takes up a bigger part of the screen. It is important to note that this is just the initial scale, which means that the user can still zoom in and out later if they want to. Here is an example of the code with the initial scale:

<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">

In this example, we have used the maximum-scale=1 option, which means that the user will not be able to zoom here. We should avoid using the maximum-scale property because of accessibility issues: if we forbid zooming on our pages, users with visual problems will not be able to see the content properly.

The screen density

As screen technology moves forward every year, or even faster than that, we have to consider the screen density aspect as well. Screen density is the number of pixels contained within a screen area: if the screen density is higher, we can have more details, in this case pixels, in the same area. There are two measurements usually used for this, dots per inch (DPI) and pixels per inch (PPI). DPI is how many drops a printer can place in an inch of space; PPI is the number of pixels we can have in one inch of the screen.

If we go back to the preceding table of Viewports and densities and compare the values of the iPhone 3G and the iPhone 4S, we will see that the screen size stayed the same at 3.5 inches and the Viewport stayed the same at 320 px, but the screen density doubled, from 163 dpi to 326 dpi, which means that the screen resolution also doubled, from 320x480 to 640x960. Screen density is very relevant to RWD, as newer devices have bigger densities, and we should do our best to cover as many densities as we can in order to provide a better experience for end users. Pixel density matters more than resolution or screen size, because more pixels means a sharper display. There are further topics to take into consideration here, such as hardware and reference pixels and the device-pixel-ratio.

Problems and solutions with the screen density

Scalable vector graphics and CSS graphics will scale to the resolution.
This is why I recommend using Font Awesome icons in your projects. Font Awesome icons are available for download at http://fortawesome.github.io/Font-Awesome/icons/. A font icon is a font made up of symbols, icons, or pictograms (whatever you prefer to call them) that you can use in a webpage just like a font. They can be instantly customized with properties such as size and drop shadow; anything you want can be done with the power of CSS.

The real problem triggered by the change in screen density is images, as for high-density screens we should provide higher-resolution images. There are several ways to approach this problem:

- By targeting high-density screens (providing high-resolution images to all screens)
- By providing high-resolution images where appropriate (loading high-resolution images only on devices with high-resolution screens)
- By not using high-resolution images

For beginner developers, I recommend the second approach: providing high-resolution images where appropriate.

Techniques in RWD

RWD consists of three coding techniques:

- Media queries (adapt content to specific screen sizes)
- Fluid grids (for flexible layouts)
- Flexible images and media (that respond to changes in screen sizes)

More detailed information about RWD techniques by Ethan Marcotte, who coined the term Responsive Web Design, is available at http://alistapart.com/article/responsive-web-design.

Media queries

Media queries are CSS modules, or, as some people like to say, just conditional statements, which tell browsers to use a specific type of style depending on the size of the screen and other factors, such as print (specific styles for print). They have been around for a long time; I was already using different styles for print in 2002. If you wish to know more about media queries, refer to the W3C Candidate Recommendation of 8 July 2002 at http://www.w3.org/TR/2002/CR-css3-mediaqueries-20020708/.

Here is an example of a media query declaration:

@media only screen and (min-width: 500px) {
    body { font-family: sans-serif; }
}

Let's explain the preceding code:

- The @media keyword starts a media type declaration.
- The screen part of the query is an expression or condition (in this case, it means only screens and not print).
- The conditional statement (min-width: 500px) means that the rule inside applies only when the screen is wider than 500 px, giving the page a sans-serif font family.

Here is another example of a media query declaration:

@media only screen and (min-width: 500px), screen and (orientation: portrait) {
    body { font-family: sans-serif; }
}

In this case, we have two conditions, and if either of them is true, the entire declaration is applied (either everything above 500 px or the portrait orientation gets the style). The only keyword hides the styles from older browsers. As some older browsers don't support media queries, I recommend using the respond.js script, which will "patch" support for them. A polyfill (or polyfiller) is code that provides features that are not built into or supported by some web browsers. For example, a number of HTML5 features are not supported by older versions of IE (older than 8 or 9), but they can be used if a polyfill is installed on the web page. This means that if developers want to use these features, they can just include the polyfill library, and the features will work in older browsers.
Breakpoints

A breakpoint is the moment when the layout switches from one layout to another because some condition has been fulfilled, for example, the screen has been resized. Almost all responsive designs cover the changes of the screen between desktops, tablets, and smartphones. Here is an example with comments inside:

@media only screen and (max-width: 480px) {
    /* mobile styles: up to 480 px */
}

The media query in the preceding code will only be used if the width of the screen is 480 px or less.

@media only screen and (min-width: 481px) and (max-width: 768px) {
    /* tablet styles: between 481 px and 768 px */
}

The media query in the preceding code will only be used if the width of the screen is between 481 px and 768 px.

@media only screen and (min-width: 769px) {
    /* desktop styles: from 769 px and up */
}

The media query in the preceding code will only be used when the width of the screen is 769 px or more.

The minimum width value in the desktop styles is 1 pixel over the maximum width value in the tablet styles, and the same difference exists between the tablet and mobile values. We do this in order to avoid overlapping, as that could cause problems with our styles.

There is also an approach of setting the maximum and minimum widths with em values. Setting em values for the screen means that the width of the screen is set relative to the device's font size. If the font size for the device is 16 px (which is the usual size), the maximum width for mobile styles would be 480/16 = 30 em.

Why do we use em values? With pixel sizes, everything is fixed; for example, an h1 set to 24 px (1.5 em of the default size of 16 px) stays 24 px, and that's it. With em sizes, everything is relative, so if we change the default value in the browser from, for example, 16 px to 18 px, everything relative to that will change: the same 1.5 em h1 becomes 27 px, making our layout "zoomable". Here is the example with sizes changed to em:

@media only screen and (max-width: 30em) {
    /* mobile styles: up to 480 px */
}

@media only screen and (min-width: 30em) and (max-width: 48em) {
    /* tablet styles: between 481 px and 768 px */
}

@media only screen and (min-width: 48em) {
    /* desktop styles: from 769 px and up */
}

Fluid grids

The major point in RWD is that the content should adapt to any screen it's viewed on. One of the best solutions is to use fluid layouts, where our content can be resized at each breakpoint.

"In fluid grids, we define a maximum layout size for the design. The grid is divided into a specific number of columns to keep the layout clean and easy to handle. Then we design each element with proportional widths and heights instead of pixel based dimensions. So whenever the device or screen size is changed, elements will adjust their widths and heights by the specified proportions to its parent container." Reference: http://www.1stwebdesigner.com/tutorials/fluid-grids-in-responsive-design/.

To make the grid flexible (or elastic), we can use percentage values, or we can use em values, whichever suits us better. We can build our own fluid grids, or we can use grid frameworks. As there are so many frameworks available, I recommend that you use an existing framework rather than building your own. Grid frameworks can use a single grid that covers various screen sizes, or multiple grids for each of the breakpoints or screen-size categories, such as mobiles, tablets, and desktops. Some of the notable frameworks are Twitter's Bootstrap, Foundation, and Semantic UI.
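Before reaching for a framework, it helps to see how little is needed for a hand-rolled fluid grid. Here is a minimal sketch (the class names are invented for this example): column widths are percentages of their parent, so they flex with the Viewport, and below the mobile breakpoint the columns simply stack:

.container { max-width: 60em; margin: 0 auto; }

/* clearfix so the row wraps its floated columns */
.row:after { content: ""; display: table; clear: both; }

.col-half  { float: left; width: 50%; }
.col-third { float: left; width: 33.333%; }

@media only screen and (max-width: 30em) {
    /* below the mobile breakpoint, stack the columns full-width */
    .col-half, .col-third { float: none; width: 100%; }
}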
I prefer Twitter's Bootstrap, as it really helps me speed up the process, and it is currently the most used framework.

Flexible images and media

Last, but not least important, are images and media (videos). The problem with them is that they are elements that come with fixed sizes. There are several approaches to fix this:

- Replacing dimensions with percentage values
- Using maximum widths
- Using background images, though only in some cases, as these are not good for accessibility
- Using libraries such as Scott Jehl's picturefill (https://github.com/scottjehl/picturefill)
- Taking the width and height parameters out of the image tag and dealing with dimensions in CSS

Summary

In this article, you learned about the RWD concepts: Viewport, scaling, and screen density. We also covered the RWD techniques: media queries, fluid grids, and flexible media.

Resources for Article:

Further resources on this subject:

- Deployment Preparations [article]
- Why Meteor Rocks! [article]
- Clustering and Other Unsupervised Learning Methods [article]