Creating a graph application with Python, Neo4j, Gephi & Linkurious.js

Greg Roberts
12 Oct 2015
13 min read
I love Python, and to celebrate Packt Python week, I've spent some time developing an app using some of my favorite tools. The app is a graph visualization of Python and related topics, as well as showing where all our content fits in. The topics are all StackOverflow tags, related by their co-occurrence in questions on the site. The app is available to view at http://gregroberts.github.io/ and in this blog, I'm going to discuss some of the techniques I used to construct the underlying dataset, and how I turned it into an online application.

Graphs, not charts

Graphs are an incredibly powerful tool for analyzing and visualizing complex data. In recent years, many different graph database engines have been developed to make use of this novel manner of representing data. These databases offer many benefits over traditional, relational databases because of how the data is stored and accessed.

Here at Packt, I use a Neo4j graph to store and analyze data about our business. Using the Cypher query language, it's easy to express complicated relations between different nodes succinctly. It's not just the technical aspect of graphs which makes them appealing to work with. Seeing the connections between bits of data visualized explicitly, as in a graph, helps you to see the data in a different light and make connections that you might not have spotted otherwise. This graph has many uses at Packt, from customer segmentation to product recommendations. In the next section, I describe the process I use to generate recommendations from the database.

Make the connection

For product recommendations, I use what's known as a hybrid filter. This considers both content-based filtering (products x and y are about the same topic) and collaborative filtering (people who bought x also bought y). Each of these methods has strengths and weaknesses, so combining them into one algorithm provides a more accurate signal.

The collaborative aspect is straightforward to implement in Cypher. For a particular product, we want to find out which other product is most frequently bought alongside it. We have all our products and customers stored as nodes, and purchases are stored as edges. Thus, the Cypher query we want looks like this:

```cypher
MATCH (n:Product {title: 'Learning Cypher'})-[r:purchased*2]-(m:Product)
WHERE m <> n
WITH m.title AS suggestion,
     count(DISTINCT r) / (n.purchased + m.purchased) AS alsoBought
RETURN *
ORDER BY alsoBought DESC
```

and it will very efficiently return the most commonly also-purchased products. When calculating the weight, we divide by the total units sold of both titles, so we get a proportion returned. We do this so we don't just get the titles with the most units; we're effectively calculating the size of the intersection of the two titles' audiences relative to their overall audience size.

The content side of the algorithm looks very similar:

```cypher
MATCH (n:Product {title: 'Learning Cypher'})-[r:is_about*2]-(m:Product)
WHERE m <> n
WITH m.title AS suggestion,
     count(DISTINCT r) / (length(n.topics) + length(m.topics)) AS alsoAbout
RETURN *
ORDER BY alsoAbout DESC
```

Implicit in this algorithm is the knowledge that a title is_about a topic of some kind. This tagging could be done manually, but where's the fun in that? In Packt's domain there already exists a huge, well-moderated corpus of technology concepts and their usage: StackOverflow.
The tagging system on StackOverflow not only tells us about all the topics developers across the world are using, it also tells us how those topics are related, via the co-occurrence of tags in questions. So in our graph, StackOverflow tags are nodes in their own right, representing topics. These nodes are connected via edges, which are weighted to reflect their co-occurrence on StackOverflow (this is the Jaccard index of the two tags' question sets):

```
edge_weight(n, m) = (number of questions tagged with both n and m)
                    / (number of questions tagged with n or m)
```

So, to find topics related to a given topic, we can execute a query like this:

```cypher
MATCH (n:StackOverflowTag {name: 'Matplotlib'})-[r:related_to]-(m:StackOverflowTag)
RETURN n.name, r.weight, m.name
ORDER BY r.weight DESC LIMIT 10
```

which returns the following:

```
   | n.name     | r.weight | m.name
---+------------+----------+--------------------
 1 | Matplotlib | 0.065699 | Plot
 2 | Matplotlib | 0.045678 | Numpy
 3 | Matplotlib | 0.029667 | Pandas
 4 | Matplotlib | 0.023623 | Python
 5 | Matplotlib | 0.023051 | Scipy
 6 | Matplotlib | 0.017413 | Histogram
 7 | Matplotlib | 0.015618 | Ipython
 8 | Matplotlib | 0.013761 | Matplotlib Basemap
 9 | Matplotlib | 0.013207 | Python 2.7
10 | Matplotlib | 0.012982 | Legend
```

There are many more complex relationships you can define between topics like this, too. You can infer directionality in the relationship by looking at the local network, or you could start constructing hypergraphs using the extensive StackExchange API.

So we have our topics, but we still need to connect our content to them. To do this, I've used a two-stage process.

Step 1 – Parsing out the topics

We take all the copy (words) pertaining to a particular product as a document representing that product. This includes the title, chapter headings, and all the copy on the website. We use this because it's already been optimized for search, and should thus carry a fair representation of what the title is about. We then parse this document and keep all the words which match the topics we've previously imported:

```python
import re

# ...code for fetching all the copy for all the products

key_re = r'\W(%s)\W' % '|'.join(re.escape(i) for i in topic_keywords)
for i in documents:
    tags = re.findall(key_re, i['copy'])
    i['tags'] = map(lambda x: tag_lookup[x], tags)
```

Having done this for each product, we have a bag of words representing each product, where each word is a recognized topic.

Step 2 – Finding the information

From each of these documents, we want to know the topics which are most important for that document. To do this, we use the tf-idf algorithm. tf-idf stands for term frequency, inverse document frequency. The algorithm takes the number of times a term appears in a particular document and weights it by the inverse of the proportion of documents that term appears in. The term frequency factor boosts terms which appear often in a document, whilst the inverse document frequency factor suppresses terms which are overly common across the entire corpus (for example, the term 'programming' is common in our product copy, and whilst most of the documents ARE about programming, it doesn't provide much discriminating information about each document).

To do all of this, I use Python (obviously) and the excellent scikit-learn library. tf-idf is implemented in the class sklearn.feature_extraction.text.TfidfVectorizer. This class has lots of options you can fiddle with to get more informative results.
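To make the weighting concrete before looking at the scikit-learn version, here is the core idea in a few lines of plain Python. This is a minimal sketch for intuition only; it uses the raw logarithmic formulation and skips the smoothing and normalization options the real class provides:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term per document: high if frequent locally, rare globally.

    docs: list of bags of words (lists of topic strings).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)  # term frequency within this document
        scores.append({term: count * math.log(n_docs / df[term])
                       for term, count in tf.items()})
    return scores

# 'programming' appears in every toy document, so its weight is zero.
docs = [['python', 'numpy', 'programming'],
        ['cypher', 'neo4j', 'programming'],
        ['python', 'pandas', 'programming']]
print(tf_idf(docs)[0])  # numpy scores highest; programming scores 0.0
```

The actual pipeline uses the scikit-learn class directly: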
```python
import sklearn.feature_extraction.text as skt

tagger = skt.TfidfVectorizer(input='content',
                             encoding='utf-8',
                             decode_error='replace',
                             strip_accents=None,
                             analyzer=lambda x: x,
                             ngram_range=(1, 1),
                             max_df=0.8,
                             min_df=0.0,
                             norm='l2',
                             sublinear_tf=False)
```

It's a good idea to use the min_df and max_df arguments of the constructor to cut out the most common and the most obscure words, to get a more informative weighting. The analyzer argument tells it how to get the words from each document; in our case, the documents are already lists of normalized words, so we don't need anything additional done.

```python
# Create vectors of all the documents
vectors = tagger.fit_transform(map(lambda x: x['tags'], documents)).toarray()

# Get back the topic names to map to the graph
t_map = tagger.get_feature_names()

jobs = []
for ind, vec in enumerate(vectors):
    features = filter(lambda x: x[1] > 0, zip(t_map, vec))
    doc = documents[ind]
    for topic, weight in features:
        job = '''MERGE (n:StackOverflowTag {name:'%s'})
                 MERGE (m:Product {id:'%s'})
                 CREATE UNIQUE (m)-[:is_about {source:'tf_idf', weight:%f}]-(n)
              ''' % (topic, doc['id'], weight)
        jobs.append(job)
```

We then execute all of the jobs using py2neo's batch functionality.

Having done all of this, we can now relate products to each other in terms of what topics they have in common:

```cypher
MATCH (n:Product {isbn10: '1783988363'})-[r:is_about]-(a)-[q:is_about]-(m:Product {isbn10: '1783289007'})
WITH a.name AS topic, r.weight + q.weight AS weight
RETURN topic ORDER BY weight DESC LIMIT 6
```

which returns:

```
   | topic
---+------------------
 1 | Machine Learning
 2 | Image
 3 | Models
 4 | Algorithm
 5 | Data
 6 | Python
```

Huzzah! I now have a graph into which I can throw any piece of content about programming or software, and it will fit nicely into the network of topics we've developed.

Take a breath

So, that's how the graph came to be. To communicate with Neo4j from Python, I use the excellent py2neo module, developed by Nigel Small. This module has all sorts of handy abstractions to allow you to work with nodes and edges as native Python objects, and then update your Neo instance with any changes you've made.

The graph I've spoken about is used for many purposes across the business, and has grown in size and scope significantly over the last year. For this project, I've taken from this graph everything relevant to Python. I started by getting all of our content which is_about Python, or about a topic related to Python:

```python
titles = [i.n for i in graph.cypher.execute(
    '''MATCH (n)-[r:is_about]-(m:StackOverflowTag {name:'Python'})
       RETURN DISTINCT n''')]
t2 = [i.n for i in graph.cypher.execute(
    '''MATCH (n)-[r:is_about]-(m:StackOverflowTag)-[:related_to]-(o:StackOverflowTag {name:'Python'})
       WHERE has(n.name) RETURN DISTINCT n''')]
titles.extend(t2)
```

I then hydrated this further by going one or two hops down each path in various directions, to get a large set of topics and content related to Python.

Visualising the graph

Since I started working with graphs, the two visualisation tools I've always used are Gephi and Sigma.js. Gephi is a great solution for analysing and exploring graphical data, allowing you to apply a plethora of different layout options, find out more about the statistics of the network, and filter and change how the graph is displayed. Sigma.js is a lightweight JavaScript library which allows you to publish beautiful graph visualizations in a browser, and it copes very well with even very large graphs.
Gephi has a great plugin which allows you to export your graph straight into a web page which you can host, share and adapt. More recently, Linkurious have made it their mission to bring graph visualization to the masses. I highly advise trying the demo of their product; it really shows how much value it's possible to get out of graph-based data. Imagine if your customer relations team were able to run a single query to view the entire history of a case or customer, laid out as a beautiful graph, full of glyphs and annotations.

Linkurious have built their product on top of Sigma.js, and they've made much of their work available as the open source Linkurious.js. This is essentially Sigma.js, with a few changes to the API and an even greater variety of plugins. On GitHub, each plugin has an API page in the wiki and a downloadable demo. It's worth cloning the repository just to see the things it's capable of!

Publish it!

So here's the workflow I used to get the Python topic graph out of Neo4j and onto the web:

1. Use py2neo to grab the subgraph of content and topics pertinent to Python, as described above.
2. Add to this some other topics linked to the same books, to give a fuller picture of the Python "world".
3. Add in topic-topic edges and product-product edges to show the full breadth of connections observed in the data.
4. Export all the nodes and edges to CSV files.
5. Import the node and edge tables into Gephi. The reason I'm using Gephi as a middle step is so that I can fiddle with the visualisation in Gephi until it looks perfect. The layout plugin in Sigma is good, but this way the graph is presentable as soon as the page loads, the communities are much clearer, and I'm not putting undue strain on browsers across the world! The layout of the graph has been achieved using a number of plugins. Instead of using the pre-installed ForceAtlas layouts, I've used the OpenOrd layout, which I feel really shows off the communities of a large graph. There's a really interesting and technical presentation about how this layout works here.
6. Export the graph into gexf format, having applied some partition and ranking functions to make it more clear and appealing.

Now it's all down to Linkurious and its various plugins! You can explore the source code of the final page to see all the details, but here I'll give an overview of the different plugins I've used for the different parts of the visualisation.

First, instantiate the graph object, pointing it at a container. Note the CSS of the container; without this, the graph won't display properly:

```html
<style type="text/css">
  #container {
    max-width: 1500px;
    height: 850px;
    margin: auto;
    background-color: #E5E5E5;
  }
</style>
…
<div id="container"></div>
…
<script>
  s = new sigma({
    container: 'container',
    renderer: {
      container: document.getElementById('container'),
      type: 'canvas'
    },
    settings: {
      …
    }
  });
```

sigma.parsers.gexf - used for (trivially!) importing a gexf file into a sigma instance:

```javascript
sigma.parsers.gexf(
  'static/data/Graph1.gexf',
  s,
  function(s) {
    // Callback executed once the data is loaded; use this to set up any
    // aspects of the app which depend on the data.
  }
);
```

sigma.plugins.filter - adds the ability to very simply hide nodes/edges based on a callback function which returns a boolean. This powers the filtering widgets on the page.
```html
<input class="form-control" id="min-degree" type="range" min="0" max="0" value="0">
```
…
```javascript
function applyMinDegreeFilter(e) {
  var v = e.target.value;
  $('#min-degree-val').text(v);
  filter
    .undo('min-degree')
    .nodesBy(
      function(n, options) {
        return this.graph.degree(n.id) >= options.minDegreeVal;
      }, {
        minDegreeVal: +v
      },
      'min-degree'
    )
    .apply();
}

$('#min-degree').change(applyMinDegreeFilter);
```

sigma.plugins.locate - adds the ability to zoom in on a single node or collection of nodes. Very useful if you're filtering a very large initial graph:

```javascript
function locateNode(nid) {
  if (nid == '') {
    locate.center(1);
  } else {
    locate.nodes(nid);
  }
}
```

sigma.renderers.glyphs - allows you to add custom glyphs to each node. Useful if you have many types of node.

Outro

This application has been a very fun little project to build. The improvements to Sigma wrought by Linkurious have resulted in an incredibly powerful toolkit for rapidly generating graph-based applications with a great degree of flexibility and interaction potential. None of this would have been possible were it not for Python. Python is my right (left, I'm left-handed) hand, which I use for almost everything. Its versatility and expressiveness make it an incredibly robust Swiss army knife in any data analyst's toolkit.

Microsoft launches Open Application Model (OAM) and Dapr to ease developments in Kubernetes and microservices

Vincy Davis
17 Oct 2019
5 min read
Yesterday, Microsoft announced the launch of two new open source projects: the Open Application Model (OAM) and Dapr. OAM, developed by Microsoft and Alibaba Cloud under the Open Web Foundation, is a specification that enables developers to define a coherent model to represent an application. The Dapr project, on the other hand, will allow developers to build portable microservice applications using any language and framework, for new or existing code.

Open Application Model (OAM)

In OAM, an application is made up of many components, like a MySQL database or a replicated PHP server with a corresponding load balancer. These components are used to build an application, enabling platform architects to utilize reusable components for the easy building of reliable applications. OAM will also empower application developers to separate the application description from the application deployment details, allowing them to focus on the key elements of their application instead of its operational details.

Microsoft also asserted that OAM has unique characteristics, such as being platform agnostic. The official blog states, "While our initial open implementation of OAM, named Rudr, is built on top of Kubernetes, the Open Application Model itself is not tightly bound to Kubernetes. It is possible to develop implementations for numerous other environments including small-device form factors, like edge deployments and elsewhere, where Kubernetes may not be the right choice. Or serverless environments where users don't want or need the complexity of Kubernetes."

Another important feature of OAM is its design extensibility. OAM also enables platform providers to expose the unique characteristics of their platform through the trait system, which will help them build cross-platform apps wherever the necessary traits are supported.

In an interview with TechCrunch, Microsoft Azure CTO Mark Russinovich said that currently Kubernetes is "infrastructure-focused" and does not provide any resource to build a relationship between the objects of an application. Russinovich believes that OAM will solve a problem that many developers and ops teams are facing today. Commenting on the cooperation with Alibaba Cloud on this specification, Russinovich observed that both companies encountered the same problems when they talked to their customers and internal teams. He further said that over time, Alibaba Cloud will launch a managed service based on OAM, and chances are that Microsoft will do the same.

The Dapr project for building microservice applications

This is an alpha release of Dapr, an event-driven runtime to help developers build resilient, stateless and stateful microservice applications for the cloud and edge. It also allows applications to be built using any programming language and developer framework. "In addition, through the open source project, we welcome the community to add new building blocks and contribute new components into existing ones. Dapr is completely platform agnostic, meaning you can run your applications locally, on any Kubernetes cluster, and other hosting environments that Dapr integrates with. This enables developers to build microservice applications that can run on both the cloud and edge with no code changes," stated the official blog.

APIs in Dapr are exposed through a sidecar architecture (either as a container or as a process), so the application code does not need to include any Dapr runtime code.
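In practice, "exposed as a sidecar" means the application just makes plain HTTP (or gRPC) calls to a local Dapr process. Here is a minimal sketch of storing and reading state through the sidecar from Python; it assumes Dapr's documented default HTTP port (3500) and a configured state store component named "statestore", and it reflects the later v1.0-style routes rather than the exact alpha API described in this announcement:

```python
import requests

# The sidecar listens locally; the app itself links no Dapr code.
DAPR_STATE_URL = "http://localhost:3500/v1.0/state/statestore"

# Persist a key/value pair through the sidecar.
requests.post(DAPR_STATE_URL,
              json=[{"key": "order-42", "value": {"status": "shipped"}}])

# Read the value back by key.
resp = requests.get(f"{DAPR_STATE_URL}/order-42")
print(resp.json())  # {'status': 'shipped'}
```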
This sidecar model simplifies Dapr integration from any runtime and keeps the application logic separate, improving supportability.

Building blocks of Dapr

- Resilient service-to-service invocation: enables method calls, including retries, on remote services, wherever they are running in the supported hosting environment.
- State management for key/value pairs: allows long-running, highly available, stateful services to be written easily, alongside stateless services in the same application.
- Publish and subscribe messaging between services: enables event-driven architectures that simplify horizontal scalability and are resilient to failure (see the sketch at the end of this piece).
- Event-driven resource bindings: helps in building event-driven architectures for scale and resiliency by receiving and sending events to and from any external resource, such as databases, queues, file systems, blob stores, webhooks, etc.
- Virtual actors: a pattern for stateless and stateful objects that makes concurrency simple with method and state encapsulation. Dapr also provides state and life-cycle management for actor activation/deactivation, plus timers and reminders to wake up actors.
- Distributed tracing between services: enables easy diagnosis of inter-service calls in production using the W3C Trace Context standard, and can push events to tracing and monitoring systems.

Users have liked both open source projects, especially Dapr. A user on Hacker News comments, "I'm excited by Dapr! If I understand it correctly, it will make it easier for me to build applications by separating the 'plumbing' (stateful & handled by Dapr) from my business logic (stateless, speaks to Dapr over gRPC). If I build using event-driven patterns, my business logic can be called in response to state changes in the system as a whole. I think an example of stateful 'plumbing' is a non-functional concern such as retrying a service call or a write to a queue if the initial attempt fails. Since Dapr runs next to my application as a sidecar, it's unlikely that communication failures will occur within the local node."

https://twitter.com/stroker/status/1184810311263629315
https://twitter.com/ThorstenHans/status/1184513427265523712
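Following up the pub/sub building block mentioned above: publishing an event is the same kind of local sidecar call as the state example earlier. A minimal sketch under the same assumptions; the pub/sub component name "pubsub" and the topic "orders" are illustrative, and the route again reflects later stable releases:

```python
import requests

# Publish an event through the sidecar's pub/sub API.
# Subscribers receive it without the publisher knowing who they are.
requests.post("http://localhost:3500/v1.0/publish/pubsub/orders",
              json={"orderId": 42, "status": "shipped"})
```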

Finding Your Way

Packt
21 Sep 2015
19 min read
This article by Ray Barrera, the author of Unity AI Game Programming Second Edition, covers the following topics:

- The A* Pathfinding algorithm
- A custom A* Pathfinding implementation

A* Pathfinding

We'll implement the A* algorithm in a Unity environment using C#. The A* Pathfinding algorithm is widely used in games and interactive applications, even though other algorithms exist, such as Dijkstra's algorithm, because of its simplicity and effectiveness.

Revisiting the A* algorithm

Let's review the A* algorithm again before we proceed to implement it in the next section. First, we'll need to represent the map in a traversable data structure. While many structures are possible, for this example we will use a 2D grid array. We'll implement the GridManager class later to handle this map information. Our GridManager class will keep a list of the Node objects that are basically the tiles in the 2D grid. So, we need to implement that Node class to handle things such as node type (whether it's a traversable node or an obstacle), the cost to pass through it, the cost to reach the goal node, and so on.

We'll have two variables to store the nodes that have been processed and the nodes that we have yet to process. We'll call them the closed list and the open list, respectively. We'll implement that list type in the PriorityQueue class. And then finally, the following A* algorithm will be implemented in the AStar class. Let's take a look at it:

1. We begin at the starting node and put it in the open list.
2. As long as the open list has some nodes in it, we perform the following processes:
   1. Pick the first node from the open list and keep it as the current node. (This assumes that we've sorted the open list, so the first node has the least cost value, as mentioned at the end of this list.)
   2. Get the neighboring nodes of this current node that are not obstacle types, such as a wall or canyon that can't be passed through.
   3. For each neighbor node, check if it is already in the closed list. If not, calculate its total cost (F) using the following formula:

      F = G + H

      Here, G is the total cost from the previous node to this node, and H is the estimated cost from this node to the final target node.
   4. Store this cost data in the neighbor node object. Also, store the current node as its parent node. Later, we'll use this parent node data to trace back the actual path.
   5. Put this neighbor node in the open list.
   6. Sort the open list in ascending order, ordered by the total cost to reach the target node.
   7. If there are no more neighbor nodes to process, put the current node in the closed list and remove it from the open list.
   8. Go back to step 2.

Once you have completed this process, your current node should be in the target goal node position, but only if there's an obstacle-free path from the start node to the goal node. If the current node is not at the goal node, there's no available path to the target node from the current node's position. If there is a valid path, all we have to do now is trace back from the current node's parent node until we reach the start node again. This gives us a path list of all the nodes that we chose during our pathfinding process, ordered from the target node to the start node. We then just reverse this path list, since we want to know the path from the start node to the target goal node.

This is a general overview of the algorithm we're going to implement in Unity using C#. So let's get started.
Implementation

We'll implement the preliminary classes that were mentioned before: the Node, GridManager, and PriorityQueue classes. Then, we'll use them in our main AStar class.

Implementing the Node class

The Node class will handle each tile object in our 2D grid, representing the map, as shown in the Node.cs file:

```csharp
using UnityEngine;
using System.Collections;
using System;

public class Node : IComparable
{
    public float nodeTotalCost;
    public float estimatedCost;
    public bool bObstacle;
    public Node parent;
    public Vector3 position;

    public Node()
    {
        this.estimatedCost = 0.0f;
        this.nodeTotalCost = 1.0f;
        this.bObstacle = false;
        this.parent = null;
    }

    public Node(Vector3 pos)
    {
        this.estimatedCost = 0.0f;
        this.nodeTotalCost = 1.0f;
        this.bObstacle = false;
        this.parent = null;
        this.position = pos;
    }

    public void MarkAsObstacle()
    {
        this.bObstacle = true;
    }
```

The Node class has properties such as the cost values (G and H), a flag to mark whether it is an obstacle, its position, and its parent node. nodeTotalCost is G, the movement cost from the starting node to this node so far, and estimatedCost is H, the total estimated cost from this node to the target goal node. We also have two simple constructors and a wrapper method to set whether this node is an obstacle. Then, we implement the CompareTo method, as shown in the following code:

```csharp
    public int CompareTo(object obj)
    {
        Node node = (Node)obj;
        // Negative value means this object comes before obj in the sort order.
        if (this.estimatedCost < node.estimatedCost)
            return -1;
        // Positive value means this object comes after obj in the sort order.
        if (this.estimatedCost > node.estimatedCost)
            return 1;
        return 0;
    }
}
```

This method is important. Our Node class implements IComparable because we want to override this CompareTo method. If you recall what we discussed in the previous algorithm section, you'll notice that we need to sort our list of nodes based on the total estimated cost. The ArrayList type has a method called Sort. This method looks for the CompareTo method implemented inside the objects in the list (in this case, our Node objects). So, we implement this method to sort the node objects by our estimatedCost value. The IComparable.CompareTo method, a .NET framework feature, is documented at http://msdn.microsoft.com/en-us/library/system.icomparable.compareto.aspx.

Establishing the priority queue

The PriorityQueue class is a short and simple class that makes handling the nodes' ArrayList easier, as shown in the following PriorityQueue.cs file:

```csharp
using UnityEngine;
using System.Collections;

public class PriorityQueue
{
    private ArrayList nodes = new ArrayList();

    public int Length
    {
        get { return this.nodes.Count; }
    }

    public bool Contains(object node)
    {
        return this.nodes.Contains(node);
    }

    public Node First()
    {
        if (this.nodes.Count > 0)
        {
            return (Node)this.nodes[0];
        }
        return null;
    }

    public void Push(Node node)
    {
        this.nodes.Add(node);
        this.nodes.Sort();
    }

    public void Remove(Node node)
    {
        this.nodes.Remove(node);
        // Ensure the list is sorted
        this.nodes.Sort();
    }
}
```

The preceding code listing should be easy to understand. One thing to notice is that after adding or removing a node from the nodes' ArrayList, we call the Sort method. This invokes the Node object's CompareTo method and sorts the nodes accordingly by the estimatedCost value.

Setting up our grid manager

The GridManager class handles all the properties of the grid representing the map.
We'll keep a singleton instance of the GridManager class, as we need only one object to represent the map, as shown in the following GridManager.cs file:

```csharp
using UnityEngine;
using System.Collections;

public class GridManager : MonoBehaviour
{
    private static GridManager s_Instance = null;

    public static GridManager instance
    {
        get
        {
            if (s_Instance == null)
            {
                s_Instance = FindObjectOfType(typeof(GridManager)) as GridManager;
                if (s_Instance == null)
                    Debug.Log("Could not locate a GridManager object.\n" +
                              "You have to have exactly one GridManager in the scene.");
            }
            return s_Instance;
        }
    }
```

We look for the GridManager object in our scene and, if found, keep it in our s_Instance static variable:

```csharp
    public int numOfRows;
    public int numOfColumns;
    public float gridCellSize;
    public bool showGrid = true;
    public bool showObstacleBlocks = true;
    private Vector3 origin = new Vector3();
    private GameObject[] obstacleList;
    public Node[,] nodes { get; set; }

    public Vector3 Origin
    {
        get { return origin; }
    }
```

Next, we declare all the variables we'll need to represent our map, such as the number of rows and columns, the size of each grid tile, some Boolean variables to visualize the grid and obstacles, and a 2D array to store all the nodes present in the grid, as shown in the following code:

```csharp
    void Awake()
    {
        obstacleList = GameObject.FindGameObjectsWithTag("Obstacle");
        CalculateObstacles();
    }

    // Find all the obstacles on the map
    void CalculateObstacles()
    {
        nodes = new Node[numOfColumns, numOfRows];
        int index = 0;
        for (int i = 0; i < numOfColumns; i++)
        {
            for (int j = 0; j < numOfRows; j++)
            {
                Vector3 cellPos = GetGridCellCenter(index);
                Node node = new Node(cellPos);
                nodes[i, j] = node;
                index++;
            }
        }
        if (obstacleList != null && obstacleList.Length > 0)
        {
            // For each obstacle found on the map, record it in our list
            foreach (GameObject data in obstacleList)
            {
                int indexCell = GetGridIndex(data.transform.position);
                int col = GetColumn(indexCell);
                int row = GetRow(indexCell);
                nodes[row, col].MarkAsObstacle();
            }
        }
    }
```

We look for all the game objects with an Obstacle tag and put them in our obstacleList property. Then we set up our nodes' 2D array in the CalculateObstacles method. First, we create the node objects with default properties. Just after that, we examine obstacleList, convert each obstacle's position into row-column data, and mark the node at that index as an obstacle.

The GridManager class has a couple of helper methods to traverse the grid and get grid cell data. The following are some of them, with a brief description of what they do. The implementation is simple, so we won't go into the details.
The GetGridCellCenter method returns the position of the grid cell in world coordinates from the cell index, as shown in the following code:

```csharp
    public Vector3 GetGridCellCenter(int index)
    {
        Vector3 cellPosition = GetGridCellPosition(index);
        cellPosition.x += (gridCellSize / 2.0f);
        cellPosition.z += (gridCellSize / 2.0f);
        return cellPosition;
    }

    public Vector3 GetGridCellPosition(int index)
    {
        int row = GetRow(index);
        int col = GetColumn(index);
        float xPosInGrid = col * gridCellSize;
        float zPosInGrid = row * gridCellSize;
        return Origin + new Vector3(xPosInGrid, 0.0f, zPosInGrid);
    }
```

The GetGridIndex method returns the grid cell index for a given position:

```csharp
    public int GetGridIndex(Vector3 pos)
    {
        if (!IsInBounds(pos))
        {
            return -1;
        }
        pos -= Origin;
        int col = (int)(pos.x / gridCellSize);
        int row = (int)(pos.z / gridCellSize);
        return (row * numOfColumns + col);
    }

    public bool IsInBounds(Vector3 pos)
    {
        float width = numOfColumns * gridCellSize;
        float height = numOfRows * gridCellSize;
        return (pos.x >= Origin.x && pos.x <= Origin.x + width &&
                pos.z <= Origin.z + height && pos.z >= Origin.z);
    }
```

The GetRow and GetColumn methods return the row and column of the grid cell for a given index:

```csharp
    public int GetRow(int index)
    {
        int row = index / numOfColumns;
        return row;
    }

    public int GetColumn(int index)
    {
        int col = index % numOfColumns;
        return col;
    }
```

Another important method is GetNeighbours, which the AStar class uses to retrieve the neighboring nodes of a particular node:

```csharp
    public void GetNeighbours(Node node, ArrayList neighbors)
    {
        Vector3 neighborPos = node.position;
        int neighborIndex = GetGridIndex(neighborPos);
        int row = GetRow(neighborIndex);
        int column = GetColumn(neighborIndex);

        // Bottom
        int leftNodeRow = row - 1;
        int leftNodeColumn = column;
        AssignNeighbour(leftNodeRow, leftNodeColumn, neighbors);

        // Top
        leftNodeRow = row + 1;
        leftNodeColumn = column;
        AssignNeighbour(leftNodeRow, leftNodeColumn, neighbors);

        // Right
        leftNodeRow = row;
        leftNodeColumn = column + 1;
        AssignNeighbour(leftNodeRow, leftNodeColumn, neighbors);

        // Left
        leftNodeRow = row;
        leftNodeColumn = column - 1;
        AssignNeighbour(leftNodeRow, leftNodeColumn, neighbors);
    }

    void AssignNeighbour(int row, int column, ArrayList neighbors)
    {
        if (row != -1 && column != -1 &&
            row < numOfRows && column < numOfColumns)
        {
            Node nodeToAdd = nodes[row, column];
            if (!nodeToAdd.bObstacle)
            {
                neighbors.Add(nodeToAdd);
            }
        }
    }
```

First, we retrieve the neighboring nodes of the current node in all four directions: left, right, top, and bottom. Then, inside the AssignNeighbour method, we check whether the node is an obstacle. If it's not, we push that neighbor node onto the referenced array list, neighbors.
The next method is a debug aid to visualize the grid and obstacle blocks:

```csharp
    void OnDrawGizmos()
    {
        if (showGrid)
        {
            DebugDrawGrid(transform.position, numOfRows, numOfColumns,
                          gridCellSize, Color.blue);
        }
        Gizmos.DrawSphere(transform.position, 0.5f);

        if (showObstacleBlocks)
        {
            Vector3 cellSize = new Vector3(gridCellSize, 1.0f, gridCellSize);
            if (obstacleList != null && obstacleList.Length > 0)
            {
                foreach (GameObject data in obstacleList)
                {
                    Gizmos.DrawCube(GetGridCellCenter(
                        GetGridIndex(data.transform.position)), cellSize);
                }
            }
        }
    }

    public void DebugDrawGrid(Vector3 origin, int numRows, int numCols,
                              float cellSize, Color color)
    {
        float width = (numCols * cellSize);
        float height = (numRows * cellSize);

        // Draw the horizontal grid lines
        for (int i = 0; i < numRows + 1; i++)
        {
            Vector3 startPos = origin + i * cellSize * new Vector3(0.0f, 0.0f, 1.0f);
            Vector3 endPos = startPos + width * new Vector3(1.0f, 0.0f, 0.0f);
            Debug.DrawLine(startPos, endPos, color);
        }

        // Draw the vertical grid lines
        for (int i = 0; i < numCols + 1; i++)
        {
            Vector3 startPos = origin + i * cellSize * new Vector3(1.0f, 0.0f, 0.0f);
            Vector3 endPos = startPos + height * new Vector3(0.0f, 0.0f, 1.0f);
            Debug.DrawLine(startPos, endPos, color);
        }
    }
}
```

Gizmos can be used to draw visual debugging and setup aids inside the editor scene view. The OnDrawGizmos method is called every frame by the engine. So, if the debug flags showGrid and showObstacleBlocks are checked, we just draw the grid with lines and the obstacles with cubes. We won't go through the DebugDrawGrid method, which is quite simple. You can learn more about gizmos in the Unity reference documentation at http://docs.unity3d.com/Documentation/ScriptReference/Gizmos.html.

Diving into our A* implementation

The AStar class is the main class that will utilize the classes we have implemented so far. You can go back to the algorithm section if you want to review it. We start with our openList and closedList declarations, which are of the PriorityQueue type, as shown in the AStar.cs file:

```csharp
using UnityEngine;
using System.Collections;

public class AStar
{
    public static PriorityQueue closedList, openList;
```

Next, we implement a method called HeuristicEstimateCost to calculate the cost between two nodes. The calculation is simple: we find the direction vector between the two by subtracting one position vector from another. The magnitude of the resultant vector gives the direct distance from the current node to the goal node:

```csharp
    private static float HeuristicEstimateCost(Node curNode, Node goalNode)
    {
        Vector3 vecCost = curNode.position - goalNode.position;
        return vecCost.magnitude;
    }
```

Next, we have our main FindPath method:

```csharp
    public static ArrayList FindPath(Node start, Node goal)
    {
        openList = new PriorityQueue();
        openList.Push(start);
        start.nodeTotalCost = 0.0f;
        start.estimatedCost = HeuristicEstimateCost(start, goal);

        closedList = new PriorityQueue();
        Node node = null;
```

We initialize our open and closed lists. Starting with the start node, we put it in our open list.
Then we start processing our open list:

```csharp
        while (openList.Length != 0)
        {
            node = openList.First();
            // Check if the current node is the goal node
            if (node.position == goal.position)
            {
                return CalculatePath(node);
            }

            // Create an ArrayList to store the neighboring nodes
            ArrayList neighbours = new ArrayList();
            GridManager.instance.GetNeighbours(node, neighbours);

            for (int i = 0; i < neighbours.Count; i++)
            {
                Node neighbourNode = (Node)neighbours[i];
                if (!closedList.Contains(neighbourNode))
                {
                    float cost = HeuristicEstimateCost(node, neighbourNode);
                    float totalCost = node.nodeTotalCost + cost;
                    float neighbourNodeEstCost =
                        HeuristicEstimateCost(neighbourNode, goal);

                    neighbourNode.nodeTotalCost = totalCost;
                    neighbourNode.parent = node;
                    neighbourNode.estimatedCost = totalCost + neighbourNodeEstCost;

                    if (!openList.Contains(neighbourNode))
                    {
                        openList.Push(neighbourNode);
                    }
                }
            }

            // Push the current node to the closed list
            closedList.Push(node);
            // and remove it from openList
            openList.Remove(node);
        }

        if (node.position != goal.position)
        {
            Debug.LogError("Goal Not Found");
            return null;
        }
        return CalculatePath(node);
    }
```

This code implementation resembles the algorithm that we previously discussed, so you can refer back to it if anything is unclear:

1. Get the first node from openList. Remember, openList is re-sorted every time a new node is added, so the first node is always the one with the least estimated cost to the goal node.
2. Check whether the current node is already at the goal node. If so, exit the while loop and build the path array.
3. Create an array list to store the neighboring nodes of the current node being processed. Use the GetNeighbours method to retrieve the neighbors from the grid.
4. For every node in the neighbors array, check whether it's already in closedList. If not, calculate the cost values, update the node's properties with the new cost values and the parent node data, and put it in openList.
5. Push the current node to closedList and remove it from openList. Go back to step 1.

If there are no more nodes in openList, our current node should be at the target node, provided there's a valid path available. Then, we just call the CalculatePath method with the current node as a parameter:

```csharp
    private static ArrayList CalculatePath(Node node)
    {
        ArrayList list = new ArrayList();
        while (node != null)
        {
            list.Add(node);
            node = node.parent;
        }
        list.Reverse();
        return list;
    }
}
```

The CalculatePath method traces through each node's parent node and builds an array list, giving a list of nodes from the target node to the start node. Since we want a path array from the start node to the target node, we just call the Reverse method.

So, this is our AStar class. We'll write a test script in the following code to test all of this, and then set up a scene to use it in.

Implementing a TestCode class

This class will use the AStar class to find the path from the start node to the goal node, as shown in the following TestCode.cs file:

```csharp
using UnityEngine;
using System.Collections;

public class TestCode : MonoBehaviour
{
    private Transform startPos, endPos;
    public Node startNode { get; set; }
    public Node goalNode { get; set; }
    public ArrayList pathArray;

    GameObject objStartCube, objEndCube;

    private float elapsedTime = 0.0f;
    // Interval time between pathfinding
    public float intervalTime = 1.0f;
```

First, we set up the variables that we'll need to reference.
The pathArray stores the node array returned from the AStar FindPath method:

```csharp
    void Start()
    {
        objStartCube = GameObject.FindGameObjectWithTag("Start");
        objEndCube = GameObject.FindGameObjectWithTag("End");

        pathArray = new ArrayList();
        FindPath();
    }

    void Update()
    {
        elapsedTime += Time.deltaTime;
        if (elapsedTime >= intervalTime)
        {
            elapsedTime = 0.0f;
            FindPath();
        }
    }
```

In the Start method, we look for objects with the Start and End tags and initialize our pathArray. We try to find a new path at each interval set in our intervalTime property, in case the positions of the start and end nodes have changed. Then, we call the FindPath method:

```csharp
    void FindPath()
    {
        startPos = objStartCube.transform;
        endPos = objEndCube.transform;

        startNode = new Node(GridManager.instance.GetGridCellCenter(
            GridManager.instance.GetGridIndex(startPos.position)));
        goalNode = new Node(GridManager.instance.GetGridCellCenter(
            GridManager.instance.GetGridIndex(endPos.position)));

        pathArray = AStar.FindPath(startNode, goalNode);
    }
```

Since we implemented our pathfinding algorithm in the AStar class, finding a path has now become a lot simpler. First, we take the positions of our start and end game objects. Then, we create new Node objects using the GridManager helper methods GetGridIndex and GetGridCellCenter to locate their respective cells in the grid. Once we have these, we just call AStar.FindPath with the start node and goal node, and store the returned array list in the local pathArray property.

Next, we implement the OnDrawGizmos method to draw and visualize the path found:

```csharp
    void OnDrawGizmos()
    {
        if (pathArray == null)
            return;

        if (pathArray.Count > 0)
        {
            int index = 1;
            foreach (Node node in pathArray)
            {
                if (index < pathArray.Count)
                {
                    Node nextNode = (Node)pathArray[index];
                    Debug.DrawLine(node.position, nextNode.position, Color.green);
                    index++;
                }
            }
        }
    }
}
```

We loop through pathArray and use the Debug.DrawLine method to draw lines connecting the nodes. With this, we'll be able to see a green line connecting the nodes from start to end, forming the path, when we run and test our program.

Setting up our sample scene

We are going to set up a sample test scene with a directional light, the start and end game objects, a few obstacle objects, a plane entity to be used as the ground, and two empty game objects in which we put our GridManager and TestAStar scripts. To build the scene hierarchy:

- Create a bunch of cube entities and tag them as Obstacle. We'll be looking for objects with this tag when running our pathfinding algorithm.
- Create a cube entity and tag it as Start.
- Then, create another cube entity and tag it as End.
- Now, create an empty game object and attach the GridManager script. Set its name to GridManager, because we use this name to look for the GridManager object from our script. Here, we can set up the number of rows and columns for our grid, as well as the size of each tile.

Testing all the components

Let's hit the play button and see our A* Pathfinding algorithm in action. By default, once you play the scene, Unity will switch to the Game view. Since our pathfinding visualization code is written for the debug draw in the editor view, you'll need to switch back to the Scene view or enable Gizmos to see the path found.
Now, try to move the start or end node around in the scene using the editor's movement gizmo (in the Scene view, not the Game view). If there's a valid path from the start node to the target goal node, you should see the path update accordingly, dynamically in real time. You'll get an error message in the console window if there's no path available.

Summary

In this article, we learned how to implement our own simple A* Pathfinding system. To attain this, we first implemented the Node class and established the priority queue. Then, we moved on to setting up the grid manager. After that, we dived in deeper by implementing a TestCode class and setting up our sample scene. Finally, we tested all the components.

npm at Node+JS Interactive 2018: npm 6, the rise and fall of JavaScript frameworks, and more

Bhagyashree R
16 Oct 2018
7 min read
Last week, Laurie Voss, the co-founder and COO of npm, spoke at the Node+JS Interactive 2018 event about npm and the future of JavaScript. He discussed several development tools used within the npm community, best practices, frameworks that are on the rise, and frameworks that are dying. He drew these answers from 1.5 billion log events per day and the JavaScript Ecosystem Survey 2017, which polled 16,000 JavaScript developers about what they are doing and where the community is going. Let's look at some of the key highlights of this talk.

npm is secure, popular, and fast

With more than 10 million users and 6 billion package downloads every week, npm has become ridiculously popular. According to GitHub Octoverse 2017, JavaScript is the top language on GitHub by opened pull requests. 85% of the developers who write JavaScript are using npm, a share that is rising rapidly towards 100%. These developers write JavaScript applications that run on 73% of browsers, 70% of servers, 44% of mobile devices, and 6% of IoT/robotics devices. The stats highlight that npm is mainly the package manager of web developers: 97% of the code in a modern web app is downloaded from npm.

The current version of npm, npm 6, was released in April this year. This release comes with improved performance and addresses the major concern highlighted by the JavaScript Ecosystem Survey: security. Here are the major improvements in npm 6.

npm 6 is super fast

npm is now 20% faster than npm 4, so it is time for you to upgrade! You can do that using this command: npm install npm -g

According to Laurie, npm is fast, and so is yarn; in fact, all of the package managers are nearly at the same speed now. This is the result of the makers of all the package managers coming together in a community called package.community, a group of package manager maintainers and users focused on making package managers better, working on compatibility, and supporting each other.

npm 6 locks by default

This was one of the biggest changes in npm 6: it makes sure that what you have in the development environment is exactly what you put in production. This functionality is facilitated by a new file called package-lock.json. This so-called "lock file" saves information about your node_modules/ tree since the time you last edited your dependencies. This new feature comes with a number of benefits, including increased reproducibility across teams, reduced network overhead when installing, and easier debugging of issues with dependencies.

npm ci

This command is similar to npm install, but is meant to be used in automated environments such as test platforms, continuous integration, and deployment. It can be about 2-3x faster than a regular npm install by skipping certain user-oriented features. It is also stricter than a regular install, which helps catch errors or inconsistencies caused by the incrementally-installed local environments of most npm users.

Advances in npm security

Two-factor authentication: To provide stronger digital security, npm now supports two-factor authentication (2FA). 2FA confirms your identity using two methods: something you know, such as your username and password, and something you have, such as a phone or tablet.

Quick audits: Quick audits tell you whether the packages you are installing are secure or not.
These security warnings are more detailed and useful in npm 6 than in previous versions. The talk highlighted that quick audits already happen a lot (about 3.5 million scans per week!), and these scans show that 11% of the packages installed by developers have a critical vulnerability.

To know which vulnerabilities exist in your app and how critical they are, run the following commands:

- npm audit: This runs automatically when you install a package with npm install. It submits a description of the dependencies configured in your package to your default registry and asks for a report of known vulnerabilities.
- npm audit fix: Run this subcommand to automatically install compatible updates to vulnerable dependencies.

The rise and fall of frameworks

After speaking about the current status of npm, Laurie moved on to what npm users are doing: which frameworks they are using, and which frameworks they are no longer interested in using.

Not all npm users develop JavaScript applications

An interesting point here: though most JavaScript users use npm, they are not the only npm users. Developers writing applications in other languages also use npm, including Java, PHP, Python, and C#, among others.

Comparing various frameworks

Next, he discussed the tools developers are currently opting for. This comparison was done on the basis of a metric called share of registry, which shows the relative popularity of a package against all other packages in the registry.

Frontend frameworks

"No framework dies, they only fade away with time."

Backbone is the best example of this. Compared to 2013, Backbone's downloads have declined rapidly, and few developers are picking it up now; most are maintaining old applications written in Backbone rather than writing new ones with it.

So, what frameworks are they using? 60% of the survey's respondents are using React, though despite a huge fraction gravitating towards it, React's growth is slowing a little. Angular is also an extremely popular framework. Ember is making a comeback after a rough patch during 2016-2017. Then there is Vue, which seems to be just taking off and is probably the reason behind React's slowing growth.

Backend frameworks

Comparing backend frameworks was fairly easy, as Express is the clear winner. Once Express is taken out of the picture, Koa emerges as the second most popular framework. With the growing use of server-side JavaScript, there is a rapid decline in the use of Sails. Hapi, another backend framework, is doing very well in absolute terms but is not growing much in relative terms. Next.js is growing, but at a very low pace.
Some predictions based on the survey

- It would be unwise to bet against React, as it has tons of users and tons of modules.
- Angular is a safer but less interesting choice.
- Keep an eye on Next.js.
- If you are looking for something new to learn, go for GraphQL.
- With 46% of npm users using TypeScript for transpiling, it is surely worth your attention.
- WASM seems promising.
- No matter what happens to JavaScript, npm is here to stay.

To conclude, Laurie rightly said that no framework is here forever: "Nothing lasts forever!.. Any framework that we see today will have its heyday and then it will have an after-life where it will slowly, slowly degrade."

To watch the full talk, check out the YouTube video: npm and the future of JavaScript.
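The share-of-registry figures in the talk come from npm's internal logs, but you can approximate the same comparison yourself with npm's public downloads API at api.npmjs.org. A rough sketch follows; raw downloads are only a proxy for real usage, and the package basket here is illustrative:

```python
import requests

PACKAGES = ["react", "vue", "ember-source", "backbone"]

def weekly_downloads(pkg):
    # npm's public API: total downloads for a package over the last week.
    url = f"https://api.npmjs.org/downloads/point/last-week/{pkg}"
    return requests.get(url).json()["downloads"]

counts = {pkg: weekly_downloads(pkg) for pkg in PACKAGES}
total = sum(counts.values())
for pkg, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    # Share is relative to this basket only, not the whole registry.
    print(f"{pkg:15s} {n:>12,}  ({n / total:.1%} of this basket)")
```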


The OpenJDK Transition: Things to know and do

Rich Sharples
28 Feb 2019
5 min read
At this point, it should not be a surprise that a number of major changes have been announced in the Java ecosystem, some of which have the potential to force a reassessment of Java roadmaps and even vendor selection for enterprise Java users. Some of the biggest changes are taking place in the upstream OpenJDK (Open Java Development Kit), which means that users will need to have a backup plan in place for the transition. Read on to better understand exactly what changes Oracle has made to Oracle JDK, why you should care about them, and how to choose the best next steps for your organization.

For some quick background: OpenJDK began as a free and open source implementation of the Java Platform, Standard Edition, through a 2006 Sun Microsystems initiative. Sun announced at the 2006 JavaOne event that Java and the core Java platform would be open sourced, and over the next few years major components of the Java Platform were released as free, open source software under the GPL. Other vendors, like Red Hat, got involved with the project about a year later, through a broad contributor agreement which covers the participation of non-Sun engineers in the project.

This piece is by Rich Sharples, Senior Director of Product Management at Red Hat, on what Oracle ending free lifecycle support means, and what steps users can take now to prepare for the change.

So what is new in the world of Java and JDK? Why should you care?

Okay, enough about the history. What are we anticipating for the future of Java and OpenJDK? At the end of January 2019, Oracle officially ended free public updates to Oracle JDK for commercial users who are not Oracle customers. These users will no longer be able to get updates without an Oracle support contract. Additionally, Oracle has changed the Oracle JDK license so that commercial use of JDK 11 and beyond will require an Oracle subscription. They do also have a non-proprietary (GPLv2+CPE) distribution, but the support policy around that is unclear at this time.

This is a big deal because it means that users need strategies and plans in place to continue to get the support that Oracle had offered. You may be wondering whether it really is critical to run OpenJDK with support, or whether you can get away with running it without. The truth is that while the open source license and nature of OpenJDK mean that you can technically run the software with no commercial support, it doesn't mean that you necessarily should. There are too many risks associated with running OpenJDK unsupported, including numerous cases of critical vulnerabilities and security flaws. While there is nothing inherently insecure about open source software, when it is deployed without support or even a maintenance plan, it can open up your organization to threats that you may not be prepared to handle and resolve.

Additionally, and probably the biggest reason why commercial OpenJDK support matters: without a third party to help, you have to be entirely self-sufficient, meaning you would need engineering effort permanently allocated to monitoring the upstream project and maintaining your installations of OpenJDK. Not all organizations have the bandwidth or resources to do this consistently.

How can vendors help and what are other key technology considerations?
Software vendors, including Red Hat, offer OpenJDK distributions and support so that those who had been reliant on Oracle JDK can make a seamless transition and avoid the risks of running OpenJDK unsupported. This also allows customers and partners to focus on their business software, rather than spending valuable engineering effort on underlying Java runtimes.

Additionally, Java SE 11 has introduced some significant new features, as well as deprecated others, and is the first update to the platform significant enough that users need to think seriously about the impact of moving to it. Here, it is important to separate upgrading from Java SE 8 to Java SE 11 from moving from one vendor's Java SE distribution to another. In fact, moving between Java SE distributions that are based on OpenJDK, without changing versions, should be a fairly simple task; it is not recommended to change both the vendor and the version in a single step. Luckily, the differences between Oracle JDK and OpenJDK are fairly minor and are slowly aligning as the two become more similar.

Technology vendors can help in the transition that will inevitably come for OpenJDK users, if it has not happened already. Having a proper regression test plan in place, to ensure that applications run as they previously did, with a particular focus on performance and scalability, is key, and is something that vendors can help set up. Auditing and understanding whether you need to make any code changes to your applications is also a major undertaking that a third party can help guide you on, likely including rulesets for the migrations. Finally, third-party vendors can help you deploy to the new OpenJDK solution and provide patches and security guidance as issues come up, helping keep the environment secure, updated and safe.

While there are changes to Oracle JDK, once you find the alternative solution best suited to your organization, the transition should be fairly straightforward and cost-effective, and third-party vendors have the expertise, knowledge, and experience to guide users through it.
Ansible role patterns and anti-patterns by Lee Garrett, its Debian maintainer

Vincy Davis
16 Dec 2019
6 min read
At DebConf held last year, Lee Garrett, the Debian maintainer for Ansible, talked about some of the best practices in the open-source configuration management tool. Ansible runs on Unix-like systems and can configure both Unix-like and Microsoft Windows machines. It uses a simple syntax written in YAML, a human-readable data serialization language, and uses SSH to connect to the node machines. Ansible is a helpful tool for managing a group of machines and describing their configuration and actions. Ansible is used to implement software provisioning, application deployment, security compliance, and orchestration solutions.

When compared to other configuration management tools like Puppet, Chef, SaltStack, etc., Ansible is very easy to set up. Garrett says that due to its agentless nature, users can easily control any machine with an SSH daemon using Ansible. This will assist users in controlling any machine with Debian installed. It also supports the configuration of many other things, like networking equipment and Windows machines.

Interested in more of Ansible? Get an insightful understanding of the design and development of Ansible from our book 'Mastering Ansible', written by James Freeman and Jesse Keating. This book will help you grasp the true power of the Ansible automation engine by tackling complex, real-world actions with ease. The book also presents fully automated Ansible playbook executions with encrypted data.

What are Ansible role patterns?

Ansible uses a playbook as an entry point for provisioning and defines automation through the YAML format. A playbook requires a predefined pattern to organize it and also needs other files to facilitate the sharing and reusing of provisioning. This is where a 'role' comes into the picture. An Ansible role, which is an independent component, allows the reuse of common configuration steps. It contains a set of tasks that can be used to configure a host so that it will serve a certain function, like configuring a service. Roles are defined using YAML files with a predefined directory structure. A role directory structure contains directories like defaults, vars, tasks, files, templates, meta, and handlers.

Some tips for creating good Ansible role patterns

An ideal role keeps its task list in the 'roles/<role>/tasks/main.yml' layout, thus specifying the name of the role, its tasks, and main.yml. At the beginning of each role, users are advised to check necessary preconditions, for example with 'assert' tasks that inspect whether the required variables are defined. Another prerequisite involves installing packages, for example with yum (the default package manager on CentOS) or via a git checkout.

Templating of files with abstraction is another important pattern, where variables are defined and substituted into templates to create the actual config file. Garrett also points out that the template module has a validate parameter, which lets the user check the config file for syntax errors. A syntax error can then fail the playbook even before the broken config file is deployed. For example, he says, "use Apache with the right parameters to do a config check on the syntax of the file. That way you never end up in a state where there's a broken config sitting there."

Garrett also recommends putting sensible defaults in the roles/<role>/defaults/main.yml layout, so that the defaults can be overridden by other variables in specific cases. He further adds that a role should ideally run in check mode; a minimal sketch of these patterns follows below.
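The following is a minimal, hedged sketch of what a tasks/main.yml built along these recommendations might look like. The role name, variable names, file names, and handler are invented for illustration and are not taken from Garrett's talk; the validate command assumes an Apache-style syntax checker is installed on the target host.

# roles/webserver/tasks/main.yml -- illustrative sketch only
- name: Assert that required variables are defined
  assert:
    that:
      - listen_port is defined
      - server_name is defined

- name: Install the web server package
  apt:
    name: apache2
    state: present

- name: Template the config, validating syntax before it is deployed
  template:
    src: vhost.conf.j2
    dest: /etc/apache2/sites-available/vhost.conf
    # validate runs against the temporary file (%s) and aborts on syntax errors
    validate: apachectl -t -f %s
  notify: reload apache

Sensible defaults for listen_port and server_name would then live in roles/webserver/defaults/main.yml, where they are easy to override per host or group.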
ansible-playbook has a --check flag, which performs "just a dry run" of a user's complete playbook, and --diff, which displays file or file-mode changes made by the playbook. Further, he adds that a variable can be defined both in the defaults and in the vars folder. However, the latter is hard to override and should be avoided, warns Garrett.

What are some typical anti-patterns in Ansible?

The shell and command modules are used in Ansible for executing commands on remote servers. Both modules take a command name followed by a list of arguments. The shell module is used when a command is to be executed on the remote servers within a particular shell. Garrett says that new Ansible users often end up using the shell or command module the way they would use wget on the command line. According to him, this practice is wrong, since "there are currently, I think, thousands of different modules in Ansible, so there's likely a big chance that whatever you want to do, there's already a module for that."

He also asserts that these two modules have several problems. Strings passed to the shell module are interpreted by the actual shell, so special characters in the command string can cause surprises, and if the playbook is running in check mode, the shell and command modules won't run at all. Another drawback of these modules is that they always report 'changed' when a command exits with a zero status, which means the user will probably have to capture the output and then check whether any standard error is present in it.

Next, Garrett explored some examples to show the alternatives to the shell/command modules, such as the 'slurp' module. The slurp module will slurp the whole file and return it base64-encoded, enabling access to the actual content via the registered result's content field. The best thing about this module is that it never reports a change and works great in check mode. In another example, Garrett showed that when fetching a URL with the shell module, the file ends up getting downloaded every time the playbook runs, reporting a change each time. This can again be avoided by using the 'uri' module instead of the shell module. The uri module lets the user declare the URL to retrieve and its parameters declaratively, so the task behaves predictably. At the end of the talk, Garrett also threw light on the problems with using the set_fact module and shared his templates. Watch the full video on Youtube.

You can also learn all about custom modules, plugins, and dynamic inventory sources in our book 'Mastering Ansible', written by James Freeman and Jesse Keating.

Read More

Ansible 2 for automating networking tasks on Google Cloud Platform [Tutorial]
Automating OpenStack Networking and Security with Ansible 2 [Tutorial]
Why choose Ansible for your automation and configuration management needs?
Ten tips to successfully migrate from on-premise to Microsoft Azure
Why should you consider becoming 'AWS Developer Associate' certified?
Exploring HDFS

Packt
10 Mar 2016
17 min read
In this article by Tanmay Deshpande, the author of the book Hadoop Real World Solutions Cookbook - Second Edition, we'll cover the following recipes:

- Loading data from a local machine to HDFS
- Exporting HDFS data to a local machine
- Changing the replication factor of an existing file in HDFS
- Setting the HDFS block size for all the files in a cluster
- Setting the HDFS block size for a specific file in a cluster
- Enabling transparent encryption for HDFS
- Importing data from another Hadoop cluster
- Recycling deleted data from trash to HDFS
- Saving compressed data in HDFS

Hadoop has two important components: storage, which is handled by HDFS, and processing, which is handled by MapReduce. HDFS takes care of the storage part of Hadoop, so let's explore the internals of HDFS through various recipes.

Loading data from a local machine to HDFS

In this recipe, we are going to load data from a local machine's disk to HDFS.

Getting ready

To perform this recipe, you should have an already running Hadoop cluster.

How to do it...

Performing this recipe is as simple as copying data from one folder to another. There are a couple of ways to copy data from the local machine to HDFS.

Using the copyFromLocal command: To copy a file to HDFS, let's first create a directory on HDFS and then copy the file. Here are the commands to do this:

hadoop fs -mkdir /mydir1
hadoop fs -copyFromLocal /usr/local/hadoop/LICENSE.txt /mydir1

Using the put command: We will first create the directory, and then put the local file in HDFS:

hadoop fs -mkdir /mydir2
hadoop fs -put /usr/local/hadoop/LICENSE.txt /mydir2

You can validate that the files have been copied to the correct folders by listing the files:

hadoop fs -ls /mydir1
hadoop fs -ls /mydir2

How it works...

When you use HDFS copyFromLocal or the put command, the following things occur. First of all, the HDFS client (the command prompt, in this case) contacts NameNode because it needs to copy the file to HDFS. NameNode then asks the client to break the file into chunks of the cluster block size; in Hadoop 2.X, the default block size is 128MB. Based on the capacity and availability of space on the DataNodes, NameNode decides where these blocks should be copied. Then, the client starts copying data to the specified DataNodes for a specific block. The blocks are copied sequentially, one after another. When a single block is copied, the block is sent to the DataNode in packets that are 4MB in size. A checksum is sent with each packet; once the packet copying is done, it is verified against the checksum to check whether it matches. The packets are then sent to the next DataNode, where the block is replicated. The HDFS client's responsibility is to copy the data to only the first node; the replication is taken care of by the respective DataNodes. Thus, the data block is pipelined from one DataNode to the next. While the block copying and replication are taking place, metadata on the file is updated in NameNode by the DataNodes.

Exporting HDFS data to a local machine

In this recipe, we are going to export/copy data from HDFS to the local machine.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

Performing this recipe is as simple as copying data from one folder to the other. There are a couple of ways in which you can export data from HDFS to the local machine.
Using the copyToLocal command, you'll get this code:

hadoop fs -copyToLocal /mydir1/LICENSE.txt /home/ubuntu

Using the get command, you'll get this code:

hadoop fs -get /mydir1/LICENSE.txt /home/ubuntu

How it works...

When you use HDFS copyToLocal or the get command, the following things occur. First of all, the client contacts NameNode because it needs a specific file in HDFS. NameNode then checks whether such a file exists in its FSImage. If the file is not present, an error code is returned to the client. If the file exists, NameNode checks the metadata for its blocks and their replica placements on the DataNodes. NameNode then points the client directly to the DataNodes from which the blocks can be read one by one. The data is copied directly from the DataNodes to the client machine, and it never goes through NameNode, which avoids bottlenecks. Thus, the file is exported to the local machine from HDFS.

Changing the replication factor of an existing file in HDFS

In this recipe, we are going to take a look at how to change the replication factor of a file in HDFS. The default replication factor is 3.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

Sometimes, there might be a need to increase or decrease the replication factor of a specific file in HDFS. In this case, we'll use the setrep command. This is how you can use the command:

hadoop fs -setrep [-R] [-w] <noOfReplicas> <path> ...

In this command, a path can either be a file or a directory; if it's a directory, the command recursively sets the replication factor for everything under it. The -w option makes the command wait until the replication is complete. The -R option is accepted for backward compatibility. First, let's check the replication factor of the file we copied to HDFS in the previous recipe:

hadoop fs -ls /mydir1/LICENSE.txt
-rw-r--r-- 3 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt

Once you list the file, it will show you the read/write permissions on the file, and the very next parameter is the replication factor. We have the replication factor set to 3 for our cluster; hence, the number shown is 3. Let's change it to 2 using this command:

hadoop fs -setrep -w 2 /mydir1/LICENSE.txt

It will wait till the replication is adjusted. Once done, you can verify this again by running the ls command:

hadoop fs -ls /mydir1/LICENSE.txt
-rw-r--r-- 2 ubuntu supergroup 15429 2015-10-29 03:04 /mydir1/LICENSE.txt

How it works...

Once the setrep command is executed, NameNode is notified, and then NameNode decides whether replicas need to be added to or removed from certain DataNodes. When you use the -w option, this process may sometimes take too long if the file size is big.

Setting the HDFS block size for all the files in a cluster

In this recipe, we are going to take a look at how to set the block size at the cluster level.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

The HDFS block size is configurable for all files in the cluster or for a single file as well. To change the block size at the cluster level, we need to modify the hdfs-site.xml file. By default, the HDFS block size is 128MB. In case we want to modify this, we need to update this property, as shown in the following code.
This property changes the default block size to 64MB:

<property>
<name>dfs.block.size</name>
<value>67108864</value>
<description>HDFS Block size</description>
</property>

If you have a multi-node Hadoop cluster, you should update this file on all the nodes, that is, on NameNode and every DataNode. Make sure you save these changes and restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

This sets the block size for files that are added to the HDFS cluster from now on. Note that this does not change the block size of files that are already present in HDFS; there is no way to change the block size of existing files.

How it works...

By default, the HDFS block size is 128MB for Hadoop 2.X. Sometimes, we may want to change this default block size for optimization purposes. When this configuration is successfully updated, all new files will be saved in blocks of this size. These changes do not affect the files that are already present in HDFS; their block size was fixed at the time they were copied.

Setting the HDFS block size for a specific file in a cluster

In this recipe, we are going to take a look at how to set the block size for a specific file only.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

In the previous recipe, we learned how to change the block size at the cluster level. But this is not always required. HDFS provides us with the facility to set the block size for a single file as well. The following command copies a file called myfile to HDFS, setting the block size to 1MB:

hadoop fs -Ddfs.block.size=1048576 -put /home/ubuntu/myfile /

Once the file is copied, you can verify whether the block size is set to 1MB and the file has been broken into exact chunks:

hdfs fsck -blocks /myfile
Connecting to namenode via http://localhost:50070/fsck?ugi=ubuntu&blocks=1&path=%2Fmyfile
FSCK started by ubuntu (auth:SIMPLE) from /127.0.0.1 for path /myfile at Thu Oct 29 14:58:00 UTC 2015
.Status: HEALTHY
Total size: 17276808 B
Total dirs: 0
Total files: 1
Total symlinks: 0
Total blocks (validated): 17 (avg. block size 1016282 B)
Minimally replicated blocks: 17 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 1.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
FSCK ended at Thu Oct 29 14:58:00 UTC 2015 in 2 milliseconds

The filesystem under path '/myfile' is HEALTHY

How it works...

When we specify the block size at the time of copying a file, it overrides the default block size and copies the file to HDFS in chunks of the given size. Generally, these modifications are made in order to perform other optimizations. Make sure you understand the consequences before making such changes. If the block size is too small, it will increase parallelization, but it will also increase the load on NameNode, as it will have more entries in FSImage. On the other hand, if the block size is too big, it will reduce parallelization and degrade processing performance.

Enabling transparent encryption for HDFS

When handling sensitive data, it is always important to consider security measures. Hadoop allows us to encrypt sensitive data that's present in HDFS. In this recipe, we are going to see how to encrypt data in HDFS.
Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

For many applications that hold sensitive data, it is very important to adhere to standards such as PCI, HIPAA, FISMA, and so on. To enable this, HDFS provides a utility called encryption zones: directories in which data is encrypted on write and decrypted on read. To use this encryption facility, we first need to enable the Hadoop Key Management Server (KMS):

/usr/local/hadoop/sbin/kms.sh start

This starts KMS in the Tomcat web server. Next, we need to append the following properties to core-site.xml and hdfs-site.xml. In core-site.xml, add the following property:

<property>
<name>hadoop.security.key.provider.path</name>
<value>kms://http@localhost:16000/kms</value>
</property>

In hdfs-site.xml, add the following property:

<property>
<name>dfs.encryption.key.provider.uri</name>
<value>kms://http@localhost:16000/kms</value>
</property>

Restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

Now, we are all set to use KMS. Next, we need to create a key that will be used for the encryption:

hadoop key create mykey

This creates a key and saves it in KMS. Next, we have to create an encryption zone, which is a directory in HDFS where all the encrypted data is saved:

hadoop fs -mkdir /zone
hdfs crypto -createZone -keyName mykey -path /zone

We will change the ownership to the current user:

hadoop fs -chown ubuntu:ubuntu /zone

If we put any file into this directory, it will be encrypted on write and decrypted at the time of reading:

hadoop fs -put myfile /zone
hadoop fs -cat /zone/myfile

How it works...

There are various types of encryption one can use in order to comply with security standards, for example, application-level, database-level, file-level, and disk-level encryption. HDFS transparent encryption sits between the database-level and file-level encryptions. KMS acts as a proxy between HDFS clients and HDFS's encryption provider via HTTP REST APIs. There are two types of keys used for encryption: the Encryption Zone Key (EZK) and the Data Encryption Key (DEK). The EZK is used to encrypt the DEK; the encrypted result is called the Encrypted Data Encryption Key (EDEK), which is saved on NameNode. When a file needs to be written to the HDFS encryption zone, the client gets the EDEK from NameNode and the EZK from KMS to form the DEK, which is used to encrypt data and store it in HDFS (the encrypted zone). When an encrypted file needs to be read, the client needs the DEK, which is formed by combining the EZK and the EDEK. These are obtained from KMS and NameNode, respectively. Thus, encryption and decryption are automatically handled by HDFS, and the end user does not need to worry about executing this on their own. You can read more on this topic at http://blog.cloudera.com/blog/2015/01/new-in-cdh-5-3-transparent-encryption-in-hdfs/.

Importing data from another Hadoop cluster

Sometimes, we may want to copy data from one HDFS to another, either for development, testing, or production migration. In this recipe, we will learn how to copy data from one HDFS cluster to another.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

Hadoop provides a utility called DistCp, which helps us copy data from one cluster to another.
Using this utility is as simple as copying from one folder to another:

hadoop distcp hdfs://hadoopCluster1:9000/source hdfs://hadoopCluster2:9000/target

This uses a MapReduce job to copy data from one cluster to another. You can also specify multiple source files to be copied to the target. There are a couple of other options that we can use:

-update: When we use DistCp with the update option, it copies only those files from the source that are not present in the target or differ from it.
-overwrite: When we use DistCp with the overwrite option, it overwrites the target directory with the source.

How it works...

When DistCp is executed, it uses MapReduce to copy the data, and it also assists in error handling and reporting. It expands the list of source files and directories and inputs them to map tasks. When copying from multiple sources, collisions are resolved in the destination based on the option (update/overwrite) that's provided. By default, a file is skipped if it is already present at the target. Once the copying is complete, the count of skipped files is reported. You can read more on DistCp at https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html.

Recycling deleted data from trash to HDFS

In this recipe, we are going to see how to recover deleted data from the trash in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

To recover accidentally deleted data in HDFS, we first need to enable the trash folder, which is not enabled by default. This can be achieved by adding the following property to core-site.xml:

<property>
<name>fs.trash.interval</name>
<value>120</value>
</property>

Then, restart the HDFS daemons:

/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh

This sets the deleted file retention to 120 minutes. Now, let's try to delete a file from HDFS:

hadoop fs -rmr /LICENSE.txt
15/10/30 10:26:26 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 120 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://localhost:9000/LICENSE.txt' to trash at: hdfs://localhost:9000/user/ubuntu/.Trash/Current

We have 120 minutes to recover this file before it is permanently deleted from HDFS. To restore the file to its original location, we can execute the following commands. First, let's confirm that the file exists:

hadoop fs -ls /user/ubuntu/.Trash/Current
Found 1 items
-rw-r--r-- 1 ubuntu supergroup 15429 2015-10-30 10:26 /user/ubuntu/.Trash/Current/LICENSE.txt

Now, restore the deleted file or folder; it's better to use the distcp command instead of copying each file one by one:

hadoop distcp hdfs://localhost:9000/user/ubuntu/.Trash/Current/LICENSE.txt hdfs://localhost:9000/

This starts a MapReduce job to restore data from the trash to the original HDFS folder. Check the HDFS path; the deleted file should be back in its original form.

How it works...

Enabling trash enforces a file retention policy for the specified amount of time. So, when trash is enabled, HDFS does not execute any block deletions or movements immediately; it only updates the metadata of the file and its location. This way, files deleted by accident can still be recovered; make sure that trash is enabled before experimenting with this recipe.

Saving compressed data in HDFS

In this recipe, we are going to take a look at how to store and process compressed data in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...
It's always good to use compression while storing data in HDFS. HDFS supports various types of compression algorithms such as LZO, BZip2, Snappy, GZIP, and so on. Every algorithm has its own pros and cons when you consider the time taken to compress and decompress and the space efficiency. These days, people prefer Snappy compression as it aims to achieve very high speed with a reasonable amount of compression. We can easily store and process any number of compressed files in HDFS. To store compressed data, we don't need to make any specific changes to the Hadoop cluster; you can simply copy the compressed data into HDFS as-is. Here is an example of this:

hadoop fs -mkdir /compressed
hadoop fs -put file.bz2 /compressed

Now, we'll run a sample program to take a look at how Hadoop automatically uncompresses the file and processes it:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /compressed /compressed_out

Once the job is complete, you can verify the output.

How it works...

Hadoop uses native libraries to find the support needed for various codecs and their implementations. Native libraries are specific to the platform that you run Hadoop on. You don't need to make any configuration changes to enable the compression algorithms. As mentioned earlier, Hadoop supports various compression algorithms that are already familiar in the computing world. Based on your needs and requirements (more space or more time), you can choose your compression algorithm. Take a look at http://comphadoop.weebly.com/ for more information on this.

Summary

We covered the major factors with respect to HDFS in this article, which comprises recipes that help us load, extract, import, export, and save data in HDFS. It also covers enabling transparent encryption for HDFS, as well as adjusting the block size of an HDFS cluster.

Resources for Article: Further resources on this subject: Hadoop and MapReduce [article] Advanced Hadoop MapReduce Administration [article] Integration with Hadoop [article]

FAT* 2018 Conference Session 2 Summary: Interpretability and Explainability

Savia Lobo
22 Feb 2018
5 min read
This session of FAT* 2018 is about interpretability and explainability in machine learning models. With the advances in deep learning, machine learning models have become more accurate. However, with accuracy and advancement, it is a tough task to keep the models highly explainable. This means these models may appear as black boxes to business users, who utilize them without knowing what lies within. Thus, it is equally important to make ML models interpretable and explainable, which can be beneficial and essential for understanding ML models and having 'behind the scenes' knowledge of what's happening within them. This understanding can be highly essential for heavily regulated industries like finance, medicine, defence, and so on.

The Conference on Fairness, Accountability, and Transparency (FAT), which will be held on the 23rd and 24th of February, 2018, is a multi-disciplinary conference that brings together researchers and practitioners interested in fairness, accountability, and transparency in socio-technical systems. The FAT 2018 conference will feature 17 research papers, 6 tutorials, and 2 keynote presentations from leading experts in the field. This article covers the research papers of the second session, which is dedicated to the interpretability and explainability of machine-learned decisions. If you've missed our summary of the first session, on online discrimination and privacy, visit the article link for a catch-up.

Paper 1: Meaningful Information and the Right to Explanation

This paper addresses an active debate in policy, industry, academia, and the media about whether and to what extent Europe's new General Data Protection Regulation (GDPR) grants individuals a "right to explanation" of automated decisions. The paper explores two major papers:

European Union Regulations on Algorithmic Decision Making and a "Right to Explanation" by Goodman and Flaxman (2017)
Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation by Wachter et al. (2017)

This paper demonstrates that the specified framework is built on incorrect legal and technical assumptions. In addition to responding to the existing scholarly contributions, the article articulates a positive conception of the right to explanation, located in the text and purpose of the GDPR. The authors take the position that the right should be interpreted functionally and flexibly, and that it should, at a minimum, enable a data subject to exercise his or her rights under the GDPR and human rights law.

Key takeaways:

The first paper, by Goodman and Flaxman, states that the GDPR creates a "right to explanation", but without any supporting argument. The second paper is a response to the first, where Wachter et al. published an extensive critique arguing against the existence of such a right. The current paper, in turn, is partially concerned with responding to the arguments of Wachter et al.

Paper 2: Interpretable Active Learning

The paper highlights how, with complex and opaque ML models, the process of active learning has also become opaque. Not much is known about what specific trends and patterns an active learning strategy may be exploring. The paper builds on LIME (the Local Interpretable Model-agnostic Explanations framework) to provide explanations for active learning recommendations. The authors, Richard Phillips, Kyu Hyun Chang, and Sorelle A.
Friedler, demonstrate uses of LIME in generating locally faithful explanations for an active learning strategy. Further, the paper shows how these explanations can be used to understand how different models and datasets explore a problem space over time.

Key takeaways:

The paper demonstrates how active learning choices can be made more interpretable to non-experts. It also discusses techniques that make active learning interpretable to expert labelers, so that queries and query batches can be explained and uncertainty bias can be tracked via interpretable clusters. It showcases per-query explanations of uncertainty, to develop a system that allows experts to choose whether to label a query. This allows them to incorporate domain knowledge and their own interests into the labeling process. It introduces a quantified notion of uncertainty bias, the idea that an algorithm may be less certain about its decisions on some data clusters than on others.

Paper 3: Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment

Actuarial risk assessments might be unduly perceived as a neutral way to counteract implicit bias and increase the fairness of decisions made within the criminal justice system, from pretrial release to sentencing, parole, and probation. Recently, however, these assessments have come under increased scrutiny, as critics claim that the statistical techniques underlying them might reproduce existing patterns of discrimination and historical biases that are reflected in the data. The paper proposes that machine learning should not be used for prediction, but rather to surface covariates that are fed into a causal model for understanding the social, structural, and psychological drivers of crime. The authors, Chelsea Barabas, Madars Virza, Karthik Dinakar, Joichi Ito (MIT), and Jonathan Zittrain (Harvard), propose an alternative application of machine learning and causal inference, away from predicting risk scores and toward risk mitigation.

Key takeaways:

The paper gives a brief overview of how risk assessments have evolved from a tool used solely for prediction to one that is diagnostic at its core. The paper places the debate around risk assessment in a broader context, giving a fuller understanding of the way these actuarial tools have evolved to serve a varied set of social and institutional agendas. It argues for a shift away from predictive technologies, toward diagnostic methods that help in understanding the criminogenic effects of the criminal justice system itself, as well as evaluating the effectiveness of interventions designed to interrupt cycles of crime. It proposes that risk assessments, when viewed as a diagnostic tool, can be used to understand the underlying social, economic, and psychological drivers of crime. The authors also posit that causal inference offers the best framework for pursuing these goals and achieving a fair and ethical risk assessment tool.
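For readers who want to experiment with the LIME framework discussed in Paper 2, here is a minimal, hedged sketch using the open-source lime package for a tabular classifier. The model, dataset, and feature names are placeholders and are not taken from the paper.

# pip install lime scikit-learn
from lime.lime_tabular import LimeTabularExplainer

# X_train, X_test, feature_names, class_names, and clf (a fitted
# classifier exposing predict_proba) are assumed to already exist
explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=class_names,
)
explanation = explainer.explain_instance(
    X_test[0], clf.predict_proba, num_features=5
)
print(explanation.as_list())  # (feature, weight) pairs for this one prediction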

Getting Started with Spring Security

Packt
14 Mar 2013
14 min read
Hello Spring Security

Although Spring Security can be extremely difficult to configure, the creators of the product have been thoughtful and have provided us with a very simple mechanism to enable much of the software's functionality with a strong baseline. From this baseline, additional configuration will allow a fine level of detailed control over the security behavior of our application. We'll start with an unsecured calendar application and turn it into a site that's secured with rudimentary username and password authentication. This authentication serves merely to illustrate the steps involved in enabling Spring Security for our web application; you'll see that there are some obvious flaws in this approach that will lead us to make further configuration refinements.

Updating your dependencies

The first step is to update the project's dependencies to include the necessary Spring Security .jar files. Update the Maven pom.xml file from the sample application you imported previously to include the Spring Security .jar files that we will use in the following few sections. Remember that Maven will download the transitive dependencies for each listed dependency. So, if you are using another mechanism to manage dependencies, ensure that you also include the transitive dependencies. When managing the dependencies manually, it is useful to know that the Spring Security reference includes a list of its transitive dependencies.

pom.xml

<dependency>
<groupId>org.springframework.security</groupId>
<artifactId>spring-security-config</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.security</groupId>
<artifactId>spring-security-core</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.security</groupId>
<artifactId>spring-security-web</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>

Downloading the example code

You can download the example code files for all Packt books you have purchased from your account at https://www.packtpub.com. If you purchased this book elsewhere, you can visit https://www.packtpub.com/books/content/support and register to have the files e-mailed directly to you.

Using Spring 3.1 and Spring Security 3.1

It is important to ensure that all of the Spring dependency versions match and all the Spring Security versions match; this includes transitive versions. Since Spring Security 3.1 builds with Spring 3.0, Maven will attempt to bring in Spring 3.0 dependencies. This means that, in order to use Spring 3.1, you must ensure that you explicitly list the Spring 3.1 dependencies or use Maven's dependency management features to ensure that Spring 3.1 is used consistently. Our sample applications provide an example of the former option, which means that no additional work is required by you. In the following code, we present an example fragment of what is added to the Maven pom.xml file to utilize Maven's dependency management feature, to ensure that Spring 3.1 is used throughout the entire application:

<project ...>
...
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-aop</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
… list all Spring dependencies (a list can be found in our sample application's pom.xml) ...
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-web</artifactId>
<version>3.1.0.RELEASE</version>
</dependency>
</dependencies>
</dependencyManagement>
</project>

If you are using Spring Tool Suite, any time you update the pom.xml file, ensure you right-click on the project and navigate to Maven | Update Project…, and select OK to update all the dependencies.

Implementing a Spring Security XML configuration file

The next step in the configuration process is to create an XML configuration file representing all the Spring Security components required to cover standard web requests. Create a new XML file in the src/main/webapp/WEB-INF/spring/ directory with the name security.xml and the following contents. Among other things, the following file demonstrates how to require a user to log in for every page in our application, provide a login page, authenticate the user, and require the logged-in user to be associated with ROLE_USER for every URL:

src/main/webapp/WEB-INF/spring/security.xml

<?xml version="1.0" encoding="UTF-8"?>
<bean:beans xmlns="http://www.springframework.org/schema/security"
xmlns:bean="http://www.springframework.org/schema/beans"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-3.1.xsd
http://www.springframework.org/schema/security
http://www.springframework.org/schema/security/spring-security-3.1.xsd">
<http auto-config="true">
<intercept-url pattern="/**" access="ROLE_USER"/>
</http>
<authentication-manager>
<authentication-provider>
<user-service>
<user name="user1@example.com" password="user1" authorities="ROLE_USER"/>
</user-service>
</authentication-provider>
</authentication-manager>
</bean:beans>

If you are using Spring Tool Suite, you can easily create Spring configuration files by using File | New Spring Bean Configuration File. This wizard allows you to select the XML namespaces you wish to use, making configuration easier by not requiring the developer to remember the namespace locations, and helping prevent typographical errors. You will need to manually change the schema definitions as illustrated in the preceding code. This is the only Spring Security configuration required to get our web application secured with a minimal standard configuration. This style of configuration, using a Spring Security-specific XML dialect, is known as the security namespace style, named after the XML namespace (http://www.springframework.org/schema/security) associated with the XML configuration elements.

Let's take a minute to break this configuration apart, so we can get a high-level idea of what is happening. The <http> element creates a servlet filter, which ensures that the currently logged-in user is associated with the appropriate role. In this instance, the filter will ensure that the user is associated with ROLE_USER. It is important to understand that the name of the role is arbitrary. Later, we will create a user with ROLE_ADMIN and will allow this user to have access to additional URLs that our current user does not have access to.

The <authentication-manager> element is how Spring Security authenticates the user. In this instance, we utilize an in-memory data store to compare a username and password. Our example and explanation of what is happening are a bit contrived. An in-memory authentication store would not work for a production environment. However, it allows us to get up and running quickly. We will incrementally improve our understanding of Spring Security as we update our application to use production-quality security.
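Before moving on, it may help to see how this user-service could grow. The following is a small, hedged sketch (the admin username and password are placeholders, not necessarily the credentials used later in the book) showing how a second user carrying ROLE_ADMIN might be declared alongside the first:

<user-service>
<user name="user1@example.com" password="user1" authorities="ROLE_USER"/>
<user name="admin1@example.com" password="admin1" authorities="ROLE_USER,ROLE_ADMIN"/>
</user-service>

Combined with additional <intercept-url> entries, this is the mechanism by which different roles can be granted access to different URL patterns.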
Users who dislike Spring's XML configuration will be disappointed to learn that there isn't an alternative annotation-based or Java-based configuration mechanism for Spring Security, as there is with Spring Framework. There is an experimental approach that uses Scala to configure Spring Security, but at the time of this writing, there are no known plans to release it. If you like, you can learn more about it at https://github.com/tekul/scalasec/. Still, perhaps in the future, we'll see the ability to easily configure Spring Security in other ways. Although annotations are not prevalent in Spring Security, certain aspects of Spring Security that apply security elements to classes or methods are, as you'd expect, available via annotations.

Updating your web.xml file

The next steps involve a series of updates to the web.xml file. Some of the steps have already been performed because the application was already using Spring MVC. However, we will go over these requirements to ensure that these more fundamental Spring requirements are understood, in the event that you are using Spring Security in an application that is not Spring-enabled.

ContextLoaderListener

The first step of updating the web.xml file is to ensure that it contains the o.s.w.context.ContextLoaderListener listener, which is in charge of starting and stopping the Spring root ApplicationContext interface. ContextLoaderListener determines which configurations are to be used by looking at the <context-param> tag for contextConfigLocation. It is also important to specify where to read the Spring configurations from. Our application already has ContextLoaderListener added, so we only need to add the newly created security.xml configuration file, as shown in the following code snippet:

src/main/webapp/WEB-INF/web.xml

<context-param>
<param-name>contextConfigLocation</param-name>
<param-value>
/WEB-INF/spring/services.xml
/WEB-INF/spring/i18n.xml
/WEB-INF/spring/security.xml
</param-value>
</context-param>
<listener>
<listener-class>
org.springframework.web.context.ContextLoaderListener
</listener-class>
</listener>

The updated configuration will now load the security.xml file from the /WEB-INF/spring/ directory of the WAR. As an alternative, we could have used /WEB-INF/spring/*.xml to load all the XML files found in /WEB-INF/spring/. We chose not to use the *.xml notation to have more control over which files are loaded.

ContextLoaderListener versus DispatcherServlet

You may have noticed that o.s.web.servlet.DispatcherServlet specifies a contextConfigLocation component of its own:

src/main/webapp/WEB-INF/web.xml

<servlet>
<servlet-name>Spring MVC Dispatcher Servlet</servlet-name>
<servlet-class>
org.springframework.web.servlet.DispatcherServlet
</servlet-class>
<init-param>
<param-name>contextConfigLocation</param-name>
<param-value>
/WEB-INF/mvc-config.xml
</param-value>
</init-param>
<load-on-startup>1</load-on-startup>
</servlet>

DispatcherServlet creates o.s.context.ApplicationContext, which is a child of the root ApplicationContext interface. Typically, Spring MVC-specific components are initialized in the ApplicationContext interface of DispatcherServlet, while the rest are loaded by ContextLoaderListener. It is important to know that beans in a child ApplicationContext (such as those created by DispatcherServlet) can reference beans of its parent ApplicationContext (such as those created by ContextLoaderListener). However, the parent ApplicationContext cannot refer to beans of the child ApplicationContext.
This is illustrated in the following diagram, where childBean can refer to rootBean, but rootBean cannot refer to childBean. As with most usage of Spring Security, we do not need Spring Security to refer to any of the MVC-declared beans. Therefore, we have decided to have ContextLoaderListener initialize all of Spring Security's configuration.

springSecurityFilterChain

The next step is to configure springSecurityFilterChain to intercept all requests by updating web.xml. Servlet <filter-mapping> elements are considered in the order that they are declared. Therefore, it is critical for springSecurityFilterChain to be declared first, to ensure the request is secured prior to any other logic being invoked. Update your web.xml file with the following configuration:

src/main/webapp/WEB-INF/web.xml

</listener>
<filter>
<filter-name>springSecurityFilterChain</filter-name>
<filter-class>
org.springframework.web.filter.DelegatingFilterProxy
</filter-class>
</filter>
<filter-mapping>
<filter-name>springSecurityFilterChain</filter-name>
<url-pattern>/*</url-pattern>
</filter-mapping>
<servlet>

Not only is it important for Spring Security to be declared as the first <filter-mapping> element, but we should also be aware that, with the example configuration, Spring Security will not intercept forwards, includes, or errors. Often, it is not necessary to intercept other types of requests, but if you need to do this, the dispatcher element for each type of request should be included in <filter-mapping>. We will not perform these steps for our application, but you can see an example, as shown in the following code snippet:

src/main/webapp/WEB-INF/web.xml

<filter-mapping>
<filter-name>springSecurityFilterChain</filter-name>
<url-pattern>/*</url-pattern>
<dispatcher>REQUEST</dispatcher>
<dispatcher>ERROR</dispatcher>
...
</filter-mapping>

DelegatingFilterProxy

The o.s.web.filter.DelegatingFilterProxy class is a servlet filter provided by Spring Web that delegates all work to a Spring bean from the root ApplicationContext, which must implement javax.servlet.Filter. Since, by default, the bean is looked up by name using the value of <filter-name>, we must ensure we use springSecurityFilterChain as the value of <filter-name>. Pseudo-code for how o.s.web.filter.DelegatingFilterProxy works for our web.xml file can be found in the following code snippet:

public class DelegatingFilterProxy implements Filter {
  void doFilter(request, response, filterChain) {
    Filter delegate = applicationContext.getBean("springSecurityFilterChain");
    delegate.doFilter(request, response, filterChain);
  }
}

FilterChainProxy

When working in conjunction with Spring Security, o.s.web.filter.DelegatingFilterProxy will delegate to Spring Security's o.s.s.web.FilterChainProxy, which was created in our minimal security.xml file. FilterChainProxy allows Spring Security to conditionally apply any number of servlet filters to the servlet request. We will learn more about each of the Spring Security filters and their role in ensuring that our application is properly secured throughout the rest of the book.
The pseudo-code for how FilterChainProxy works is as follows:

public class FilterChainProxy implements Filter {
  void doFilter(request, response, filterChain) {
    // look up all the Filters for this request
    List<Filter> delegates = lookupDelegates(request, response)
    // invoke each filter unless a delegate decided to stop
    for delegate in delegates {
      if continue processing
        delegate.doFilter(request, response, filterChain)
    }
    // if all the filters decide it is ok, allow the
    // rest of the application to run
    if continue processing
      filterChain.doFilter(request, response)
  }
}

Due to the fact that both DelegatingFilterProxy and FilterChainProxy are the front door to Spring Security when used in a web application, it is here that you would add a debug point when trying to figure out what is happening.

Running a secured application

If you have not already done so, restart the application and visit http://localhost:8080/calendar/, and you will be presented with the application's login screen. Great job! We've implemented a basic layer of security in our application using Spring Security. At this point, you should be able to log in using user1@example.com as the User and user1 as the Password (user1@example.com/user1). You'll see the calendar welcome page, which describes at a high level what to expect from the application in terms of security.

Common problems

Many users have trouble with the initial implementation of Spring Security in their application. A few common issues and suggestions are listed next. We want to ensure that you can run the example application and follow along!

Make sure you can build and deploy the application before putting Spring Security in place. Review some introductory samples and documentation on your servlet container if needed. It's usually easiest to use an IDE, such as Eclipse, to run your servlet container. Not only is deployment typically seamless, but the console log is also readily available to review for errors. You can also set breakpoints at strategic locations, to be triggered on exceptions, to better diagnose errors.

If your XML configuration file is incorrect, you will get this (or something similar to this): org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element 'beans'. It's quite common for users to get confused with the various XML namespace references required to properly configure Spring Security. Review the samples again, paying attention to avoid line wrapping in the schema declarations, and use an XML validator to verify that you don't have any malformed XML.

If you get an error stating "BeanDefinitionParsingException: Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace [http://www.springframework.org/schema/security] ...", ensure that the spring-security-config-3.1.0.RELEASE.jar file is on your classpath. Also ensure the version matches the other Spring Security JARs and the XML declaration in your Spring configuration file.

Make sure the versions of Spring and Spring Security that you're using match and that there aren't any unexpected Spring JARs remaining as part of your application. As previously mentioned, when using Maven, it can be a good idea to declare the Spring dependencies in the dependency management section.

Adding PostGIS layers using QGIS [Tutorial]

Pravin Dhandre
26 Jul 2018
5 min read
Viewing tables as layers is great for creating maps or for simply working on a copy of the database outside the database. In this tutorial, we will establish a connection to our PostGIS database in order to add a table as a layer in QGIS (formerly known as Quantum GIS). Navigate to the QGIS site to install the latest LTR version. On the download page, click on Download Now and you will be able to choose a suitable operating system and the relevant settings. QGIS is available for Android, Linux, macOS X, and Windows. You might also be inclined to click on Discover QGIS to get an overview of basic information about the program, along with features, screenshots, and case studies.

This QGIS tutorial is an excerpt from a book written by Mayra Zurbaran, Pedro Wightman, Paolo Corti, Stephen Mather, Thomas Kraft and Bborie Park, titled PostGIS Cookbook - Second Edition.

Loading the data...

To begin, create the schema for this tutorial, then download data from the U.S. Census Bureau's FTP site: the shapefile is All Lines for Cuyahoga county in Ohio, which consists of roads and streams, among other line features. Extract the ZIP file to your working directory and then load it into your database using shp2pgsql. Be sure to specify the spatial reference system, EPSG/SRID: 4269. When in doubt about using projections, use the service provided by the folks at OpenGeo at the following website: http://prj2epsg.org/search

Use the following command to generate the SQL to load the shapefile:

shp2pgsql -s 4269 -W LATIN1 -g the_geom -I tl_2012_39035_edges.shp chp11.tl_2012_39035_edges > tl_2012_39035_edges.sql

How to do it...

Now it's time to give the data we downloaded a look using QGIS. We must first create a connection to the database in order to access the table. Get connected and add the table as a layer by following these steps:

Click on the Add PostGIS Layers icon. Click on the New button below the Connections drop-down menu to create a new PostGIS connection. After the Add PostGIS Table(s) window opens, create a name for the connection and fill in a few parameters for your database, including Host, Port, Database, Username, and Password. Once you have entered all of the pertinent information for your database, click on the Test Connection button to verify that the connection is successful. If the connection is not successful, double-check for typos and errors; additionally, make sure you are attempting to connect to a PostGIS-enabled database. If the connection is successful, go ahead and check the Save Username and Save Password checkboxes; this will prevent you from having to enter your login information multiple times throughout the exercise. Click on OK at the bottom of the menu to apply the connection settings. Now you can connect! Make sure the name of your PostGIS connection appears in the drop-down menu and then click on the Connect button. If you choose not to store your username and password, you will be asked to submit this information every time you try to access the database. Once connected, all schemas within the database will be shown, and the tables will be made visible by expanding the target schema. Select the table(s) to be added as a layer by simply clicking on the table name or anywhere along its row; selections will be highlighted in blue. To deselect a table, click on it a second time and it will no longer be highlighted.
Select the tl_2012_39035_edges table that was downloaded at the beginning of the tutorial and click on the Add button, as shown in the following screenshot. A subset of the table can also be added as a layer. This is accomplished by double-clicking on the desired table name. The Query Builder window will open, which aids in creating simple SQL WHERE clause statements. Add the roads by selecting the records where roadflg = Y. This can be done by typing a query or using the buttons within Query Builder. Click on the OK button followed by the Add button. A subset of the table is now loaded into QGIS as a layer.

The layer is strictly a static, temporary copy of your database. You can make whatever changes you like to the layer without affecting the database table. The same holds true the other way around: changes to the table in the database will have no effect on the layer in QGIS. If needed, you can save the temporary layer in a variety of formats, such as DXF, GeoJSON, KML, or SHP. Simply right-click on the layer name in the Layers panel and click on Save As. This will create a file, which you can recall at a later time or share with others. The following screenshot shows the Cuyahoga county road network.

You may also use the QGIS Browser Panel to navigate through the now-connected PostGIS database and list the schemas and tables. This panel allows you to double-click to add spatial layers to the current project, providing a better user experience, not only of connected databases but of any directory on your machine.

How it works...

You have added a PostGIS layer into QGIS using the built-in Add PostGIS Table GUI. This was achieved by creating a new connection and entering your database parameters. Any number of database connections can be set up simultaneously. If working with multiple databases is more common for your workflows, saving all of the connections into one XML file would save time and energy when returning to these projects in QGIS.

To explore more of the 3D capabilities of PostGIS, including LiDAR point clouds, read PostGIS Cookbook - Second Edition.

Using R to implement Kriging – A Spatial Interpolation technique for Geostatistics data
Top 7 libraries for geospatial analysis
Learning R for Geospatial Analysis
GQL (Graph Query Language) joins SQL as a Global Standards Project and will be the international standard declarative query language for graphs

Amrata Joshi
19 Sep 2019
6 min read
On Tuesday, the team at Neo4j, the graph database management system, announced that the international committees behind the development of the SQL standard have voted to initiate GQL (Graph Query Language) as a new database query language. GQL is now going to be the international standard declarative query language for property graphs, and it is also a Global Standards Project. GQL is developed and maintained by the same international group that maintains the SQL standard.

How did the proposal for GQL pass?

In May last year, the GQL initiative was first put forward in the GQL Manifesto. In June this year, national standards bodies across the world from ISO/IEC's Joint Technical Committee 1 (responsible for IT standards) started voting on the GQL project proposal. The ballot closed earlier this week and the proposal passed: ten countries, including Germany, Korea, the United States, the UK, and China, voted in favor, and seven countries agreed to put forward their experts to work on the project. Japan was the only country to vote against, on the grounds that existing languages already do the job, and that SQL/Property Graph Query extensions, along with the rest of the SQL standard, can do the same work.

According to the Neo4j team, the GQL project will initiate the development of next-generation technology standards for accessing data. Its charter mandates building on the core foundations established by SQL and ongoing collaboration to ensure SQL and GQL interoperability and compatibility. GQL reflects the rapid growth of the graph database market and the increasing adoption of the Cypher language.

Stefan Plantikow, GQL project lead and editor of the planned GQL specification, said, "I believe now is the perfect time for the industry to come together and define the next generation graph query language standard."

Plantikow further added, "It's great to receive formal recognition of the need for a standard language. Building upon a decade of experience with property graph querying, GQL will support native graph data types and structures, its own graph schema, a pattern-based approach to data querying, insertion and manipulation, and the ability to create new graphs, and graph views, as well as generate tabular and nested data. Our intent is to respect, evolve, and integrate key concepts from several existing languages including graph extensions to SQL."

Keith Hare, who has served as the chair of the international SQL standards committee for database languages since 2005, charted the progress toward GQL: "We have reached a balance of initiating GQL, the database query language of the future, whilst preserving the value and ubiquity of SQL." Hare further added, "Our committee has been heartened to see strong international community participation to usher in the GQL project. Such support is the mark of an emerging de jure and de facto standard."

The need for a graph-specific query language

Researchers and vendors needed a graph-specific query language because of the following limitations of SQL/PGQ:

The SQL/PGQ language is restricted to read-only queries.
SQL/PGQ cannot project new graphs.
SQL/PGQ can only access graphs that are defined as graph views over SQL tables.

Researchers and vendors needed a language like Cypher that would cover the insertion and maintenance of data, and not just data querying. But SQL wasn't the apt model for a graph-centric language that takes graphs as query inputs and outputs a graph as a result.
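To make the contrast concrete, here is a small, hedged sketch of the kind of Cypher-style pattern matching that a graph-native language enables; the labels, relationship types, and property names are invented for illustration:

MATCH (me:Person {name: 'Alice'})-[:KNOWS]->(:Person)-[:KNOWS]->(fof:Person)
WHERE (fof)-[:LIVES_IN]->(:City {name: 'Berlin'}) AND fof <> me
RETURN DISTINCT fof.name

The whole query is a declarative graph pattern rather than a chain of table joins, which is the style of querying GQL sets out to standardize.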
SQL, however, wasn't the apt model for a graph-centric language that takes graphs as query inputs and outputs a graph as a result. GQL, on the other hand, builds on openCypher, a project that brings Cypher to Apache Spark and gives users a composable graph query language.

SQL and GQL can work together

According to most of the companies and national standards bodies supporting the GQL initiative, GQL and SQL are not competitors. Instead, the two languages can complement each other via interoperation and shared foundations. Alastair Green, Query Languages Standards & Research Lead at Neo4j, writes, "A SQL/PGQ query is in fact a SQL sub-query wrapped around a chunk of proto-GQL." SQL is a language built around tables, whereas GQL is built around graphs. Users can use GQL to find and project a graph from a graph.

Green further writes, "I think that the SQL standards community has made the right decision here: allow SQL, a language built around tables, to quote GQL when the SQL user wants to find and project a table from a graph, but use GQL when the user wants to find and project a graph from a graph. Which means that we can produce and catalog graphs which are not just views over tables, but discrete complex data objects."

It is still not clear when the first implementable version of GQL will be out. The official page reads, "The work of the GQL project starts in earnest at the next meeting of the SQL/GQL standards committee, ISO/IEC JTC 1 SC 32/WG3, in Arusha, Tanzania, later this month. It is impossible at this stage to say when the first implementable version of GQL will become available, but it is highly likely that some reasonably complete draft will have been created by the second half of 2020."

Developer community welcomes the new addition

Users are excited to see how GQL will incorporate Cypher. One user commented on Hacker News, "It's been years since I've worked with the product and while I don't miss Neo4j, I do miss the query language. It's a little unclear to me how GQL will incorporate Cypher but I hope the initiative is successful if for no other reason than a selfish one: I'd love Cypher to be around if I ever wind up using a GraphDB again."

A few others mistook GQL for Facebook's GraphQL and are sceptical about the name. One comment on Hacker News reads, "Also, the name is of course justified, but it will be a mess to search for due to (Facebook) GraphQL." A user commented, "I read the entire article and came away mistakenly thinking this was the same thing as GraphQL." Another user commented, "That's quiet an unfortunate name clash with the existing GraphQL language in a similar domain."

Other interesting news in Data

- Media manipulation by Deepfakes and cheap fakes require both AI and social fixes, finds a Data & Society report
- Percona announces Percona Distribution for PostgreSQL to support open source databases
- Keras 2.3.0, the first release of multi-backend Keras with TensorFlow 2.0 support, is now out


Introducing Weld, a runtime written in Rust and LLVM for cross-library optimizations

Bhagyashree R
24 Sep 2019
5 min read
Weld is an open-source Rust project for improving the performance of data-intensive applications. It is an interface and runtime that can be integrated into existing frameworks, including Spark, TensorFlow, Pandas, and NumPy, without changing their user-facing APIs.

The motivation behind Weld

Data analytics applications today often require developers to combine functions from different libraries and frameworks to accomplish a particular task. For instance, a typical Python application might select some data using Spark SQL, transform it using NumPy and Pandas, and train a model with TensorFlow. This improves developer productivity, since each step leans on a high-quality library, but these functions are optimized in isolation, which is not enough to achieve the best application performance.

Weld aims to solve this problem by providing an interface and runtime that can optimize across data-intensive libraries and frameworks while preserving their user-facing APIs. In an interview with Federico Carrone, a Tech Lead at LambdaClass, Shoumik Palkar, Weld's main contributor, shared, "The motivation behind Weld is to provide bare-metal performance for applications that rely on existing high-level APIs such as NumPy and Pandas. The main problem it solves is enabling cross-function and cross-library optimizations that other libraries today don't provide."

How Weld works

Weld serves as a common runtime that allows libraries from different domains, such as SQL and machine learning, to represent their computations in a common functional intermediate representation (IR). This IR is then optimized by a compiler optimizer and JIT'd to efficient machine code for diverse parallel hardware. Weld performs a wide range of optimizations on the IR, including loop fusion, loop tiling, and vectorization. "Weld's IR is natively parallel, so programs expressed in it can always be trivially parallelized," said Palkar.

When Weld was first introduced, it was mainly used for cross-library optimizations. Over time, however, people have started to use it for other applications as well: building JITs or new physical execution engines for databases and analytics frameworks, speeding up individual libraries, targeting new kinds of parallel hardware using the IR, and more.

To evaluate Weld's performance, the team integrated it with popular data analytics frameworks, including Spark, NumPy, and TensorFlow. This prototype showed up to 30x improvements over the native framework implementations, and cross-library optimizations between Pandas and NumPy improved performance by up to two orders of magnitude.

(Performance comparison chart. Source: Weld)
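To make the motivation above concrete, here is a plain NumPy/Pandas sketch of the kind of pipeline Weld targets. This is written with the standard libraries, not Weld's own APIs, and the data is invented for illustration; the point is that each call is optimized in isolation and materializes an intermediate result, so nothing is fused across library boundaries:

import numpy as np
import pandas as pd

# A made-up million-row frame standing in for data selected via Spark SQL
df = pd.DataFrame({"price": np.random.rand(1_000_000),
                   "qty": np.random.randint(1, 10, 1_000_000)})

revenue = df["price"] * df["qty"]     # pass 1: materializes an intermediate array
discounted = revenue * 0.9            # pass 2: materializes another
total = np.log1p(discounted).sum()    # passes 3 and 4 over the data
print(total)

Expressed in Weld's IR, the same four logical steps could in principle be fused into a single parallel loop over the data.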
Why Rust and LLVM were chosen for its implementation

The first iteration of Weld was implemented in Scala because of its algebraic data types, powerful pattern matching, and large ecosystem. However, Scala had some shortcomings. "We moved away from Scala because it was too difficult to embed a JVM-based language into other runtimes and languages," Palkar shared in the interview. Scala has a managed runtime and a clunky build system, and its JIT compilations were quite slow for larger programs. Because of these shortcomings, the team decided to redesign the JIT compiler, core API, and runtime from the ground up. They were searching for a language that was fast and safe, had no managed runtime, and provided a rich standard library, functional paradigms, a good package manager, and a strong community. Rust met all of these requirements.

Rust provides a very minimal, no-setup-required runtime. It can be easily embedded into other languages such as Java and Python. To make development easier, it has high-quality packages, known as crates, and functional paradigms such as pattern matching. Lastly, it is backed by a great community.

Read also: "Rust is the future of systems programming, C is the new Assembly": Intel principal engineer, Josh Triplett

Explaining why they chose LLVM, Palkar said in the interview, "We chose LLVM because its an open-source compiler framework that has wide use and support; we generate LLVM directly instead of C/C++ so we don't need to rely on the existence of a C compiler, and because it improves compilation times (we don't need to parse C/C++ code)."

In a discussion on Hacker News, many users listed other Weld-like projects that developers may find useful. A user commented, "Also worth checking out OmniSci (formerly MapD), which features an LLVM query compiler to gain large speedups executing SQL on both CPU and GPU."

Users also talked about Numba, an open-source JIT compiler that translates Python functions to optimized machine code at runtime with the help of the LLVM compiler library. "Very bizarre there is no discussion of numba here, which has been around and used widely for many years, achieves faster speedups than this, and also emits an LLVM IR that is likely a much better starting point for developing a "universal" scientific computing IR than doing yet another thing that further complicates it with fairly needless involvement of Rust," a user added.

To know more about Weld, check out the full interview on Medium. Also, watch Shoumik Palkar's RustConf 2019 talk: https://www.youtube.com/watch?v=AZsgdCEQjFo&t

Other news in Programming

- Darklang available in private beta
- GNU community announces 'Parallel GCC' for parallelism in real-world compilers
- TextMate 2.0, the text editor for macOS, releases


Transformers 2.0: NLP library with deep interoperability between TensorFlow 2.0 and PyTorch, and 32+ pretrained models in 100+ languages

Fatema Patrawala
30 Sep 2019
3 min read
Last week, Hugging Face, a startup specializing in natural language processing, released a landmark update to their popular Transformers library, offering unprecedented compatibility between two major deep learning frameworks, PyTorch and TensorFlow 2.0.

Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with 32+ pretrained models in 100+ languages and deep interoperability between TensorFlow 2.0 and PyTorch.

Transformers 2.0 embraces the "best of both worlds", combining PyTorch's ease of use with TensorFlow's production-grade ecosystem. The new library makes it easier for scientists and practitioners to select different frameworks for the training, evaluation, and production phases of developing the same language model.

"This is a lot deeper than what people usually think when they talk about compatibility," said Thomas Wolf, who leads Hugging Face's data science team. "It's not only about being able to use the library separately in PyTorch and TensorFlow. We're talking about being able to seamlessly move from one framework to the other dynamically during the life of the model."

https://twitter.com/Thom_Wolf/status/1177193003678601216

"It's the number one feature that companies asked for since the launch of the library last year," said Clement Delangue, CEO of Hugging Face.

Notable features in Transformers 2.0

- 8 architectures with over 30 pretrained models, in more than 100 languages
- Load a model and pre-process a dataset in less than 10 lines of code
- Train a state-of-the-art language model in a single line with the tf.keras fit function
- Share pretrained models, reducing compute costs and carbon footprint
- Deep interoperability between TensorFlow 2.0 and PyTorch models (a minimal sketch follows at the end of this piece)
- Move a single model between TF2.0/PyTorch frameworks at will
- Seamlessly pick the right framework for training, evaluation, and production
- As powerful and concise as Keras

About Hugging Face Transformers

With half a million installs since January 2019, Transformers is the most popular open-source NLP library. More than 1,000 companies, including Bing, Apple, and Stitch Fix, are using it in production for text classification, question answering, intent detection, text generation, and conversational applications. Hugging Face, the creators of Transformers, have raised US$5M so far from investors including Betaworks, Salesforce, Amazon, and Apple.

On Hacker News, users are praising the company and noting how Transformers has become the most important library in NLP.

Other interesting news in data

- Baidu open sources ERNIE 2.0, a continual pre-training NLP model that outperforms BERT and XLNet on 16 NLP tasks
- Dr Joshua Eckroth on performing Sentiment Analysis on social media platforms using CoreNLP
- Facebook open-sources PyText, a PyTorch based NLP modeling framework
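As promised above, here is a minimal sketch of the cross-framework hand-off that Transformers 2.0 advertises. It assumes the transformers, torch, and tensorflow packages are installed and that the pretrained weights download succeeds; the fine-tuned model directory in the last line is a hypothetical example:

from transformers import BertTokenizer, BertModel, TFBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# The same pretrained weights, loaded once as a PyTorch model and once
# as a TensorFlow 2.0 (tf.keras) model
pt_model = BertModel.from_pretrained("bert-base-uncased")
tf_model = TFBertModel.from_pretrained("bert-base-uncased")

# from_pt=True converts a PyTorch checkpoint on the fly, e.g. to
# fine-tune in PyTorch and then serve with TensorFlow
tf_from_pt = TFBertModel.from_pretrained("./my-finetuned-model/", from_pt=True)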

The Microsoft Azure Stack Architecture

Packt
07 Jun 2017
13 min read
In this article by Markus Klein and Susan Roesner, authors of the book Azure Stack for Datacenters, we will help you to plan, build, run, and develop your own Azure-based datacenter running Azure Stack technology. The goal is a datacenter that is 100 percent consistent with Azure, which brings flexibility and elasticity to your IT infrastructure. We will learn about:

- Cloud basics
- The Microsoft Azure Stack
- Core management services
- Using Azure Stack
- Migrating services to Azure Stack

(For more resources related to this topic, see here.)

Cloud as the new IT infrastructure

Regarding the technical requirements of today's IT, the cloud is always a part of the general IT strategy. This holds regardless of the region a company works in or the sector of the economy: 99.9 percent of all companies already have cloud technology in their environment. The real question for a CIO is, "To what extent do we allow cloud services, and what does that mean for our infrastructure?" It is a matter of compliance, allowance, and willingness. The top 10 most important questions for a CIO to prepare for the cloud are as follows:

- Are we allowed to save our data in the cloud?
- What classification of data can be saved in the cloud?
- How flexible are we regarding the cloud?
- Do we have the knowledge to work with cloud technology?
- How does our current IT setup and infrastructure fit the cloud's requirements?
- Is our current infrastructure already prepared for the cloud?
- Are we already working with a cloud-ready infrastructure?
- Is our Internet bandwidth good enough?
- What does the cloud mean for my employees?
- Which technology should we choose?

Cloud terminology

The definition of the term "cloud" is not simple, but we need to differentiate between the following:

- Private cloud: A highly dynamic IT infrastructure based on virtualization technology that is flexible and scalable. The resources live in a privately owned datacenter, either in your company or at a service provider of your choice.
- Public cloud: A shared offering of IT infrastructure services that are provided via the Internet.
- Hybrid cloud: A mixture of private and public cloud. Depending on compliance or other security regulations, services that may run in a public datacenter are deployed there, while services that need to stay inside the company run on-premises. The goal is to run these services on the same technology to provide the agility, flexibility, and scalability to move services between public and private datacenters.

In general, there are some big players in the cloud market (for example, Amazon Web Services, Google, Azure, and even Alibaba). A company with a Microsoft-minded infrastructure should take a look at Microsoft Azure datacenters. Microsoft opened its first datacenter in 2008, and today it invests a billion dollars every month in Azure. As of today, there are about 34 official datacenters around the world that form Microsoft Azure, besides some that Microsoft does not talk about (for example, US Government Azure). There are some dedicated datacenters, such as the German Azure cloud, that do not have connectivity to Azure worldwide. Due to compliance requirements, these frontiers need to exist, but the technology of each Azure datacenter is the same, although the services offered may vary.
The following map gives an overview of the locations (so-called regions) in Azure as of today and provides an idea of which ones will be coming soon:

The Microsoft cloud story

When Microsoft started their public cloud, they decided that there must be a private cloud stack too, especially to prepare their infrastructure to run in Azure sometime in the future. The first private cloud solution was the System Center suite, with System Center Orchestrator, Service Provider Foundation (SPF), and Service Manager as the self-service portal solution. Later on, Microsoft launched Windows Azure Pack for Windows Server. Today, Windows Azure Pack is available as a product focused on the private cloud; it provides a self-service portal (the well-known old Azure Portal, code name "red dog frontend") and uses the System Center suite as its underlying technology.

Microsoft Azure Stack

In May 2015, Microsoft formally announced a new solution that brings Azure to your datacenter. This solution was named Microsoft Azure Stack. To put it in one sentence: Azure Stack is the same technology, with the same APIs and portal as public Azure, but you can run it in your own datacenter or in that of your service provider. With Azure Stack, System Center is completely gone, because everything works the way it does in Azure, and in Azure there is no System Center at all. This is the primary focus of this article. The following diagram gives a current overview of the technical design of Azure Stack compared with Azure:

The one and only difference between Azure Stack and Azure is the cloud infrastructure. In Azure, there are thousands of servers that are part of the solution; with Azure Stack, the number is slightly smaller. That's why the underlying technology stack is a cloud-inspired infrastructure based on Windows Server, Hyper-V, and Azure technologies. There is no System Center product in this stack anymore. This does not mean that it cannot be there (for example, SCOM for on-premise monitoring), but Azure Stack itself provides all functionality within the solution itself.

For stability and functionality, Microsoft decided to provide Azure Stack as a so-called integrated system, so it comes to your door with the hardware stack included; the customer buys Azure Stack as a complete technology stack. At general availability (GA), the hardware OEMs are HPE, Dell EMC, and Lenovo. In addition, there is a one-host PoC deployment available for download that can be run as a proof of concept on any hardware that meets the requirements.

Technical design

Looking at the technical design a bit more in depth, there are some components that we need to dive deeper into. The general basis of Azure Stack is Windows Server 2016 technology, which builds the cloud-inspired infrastructure:

- Storage Spaces Direct (S2D)
- VXLAN
- Nano Server
- Azure Resource Manager (ARM)

Storage Spaces Direct (S2D)

Storage Spaces and Scale-Out File Server were technologies that came with Windows Server 2012. The lack of stability in the initial versions and issues with the underlying hardware made for a rough start.
The general concept was a shared storage setup using JBODs controlled by Windows Server 2012 Storage Spaces servers, with a Scale-Out File Server cluster acting as the single point of contact for storage. With Windows Server 2016, the design is quite different: the concept relies on a shared-nothing model, even with locally attached storage. This is the storage design that Azure Stack builds on as one of its main pillars.

VXLAN networking technology

With Windows Server 2012, Microsoft introduced Software-Defined Networking (SDN) and the NVGRE technology. Hyper-V Network Virtualization supports Network Virtualization using Generic Routing Encapsulation (NVGRE) as the mechanism to virtualize IP addresses; in NVGRE, the virtual machine's packet is encapsulated inside another packet.

VXLAN comes as the new SDNv2 protocol, is RFC compliant, and is supported by most network hardware vendors by default. The Virtual eXtensible Local Area Network (VXLAN) RFC 7348 protocol has been widely adopted in the marketplace, with support from vendors such as Cisco, Brocade, Arista, Dell, and HP. The VXLAN protocol uses UDP as the transport.

Nano Server

Nano Server offers a minimal-footprint, headless version of Windows Server 2016. It completely excludes the graphical user interface, which means it is quite small and easy to handle with regard to updates and security fixes, but it doesn't provide the GUI that customers of Windows Server expect.

Azure Resource Manager (ARM)

The "magical" Azure Resource Manager is a 1:1 bit share with ARM from Azure, so it has the same update frequency and features that are available in Azure. ARM is a consistent management layer that saves resources, dependencies, inputs, and outputs as an idempotent deployment in a JSON file called an ARM template. This template defines the shape of a deployment, whether it be VMs, databases, websites, or anything else. The goal is that once a template is designed, it can run on any Azure-based cloud platform, including Azure Stack. ARM provides cloud consistency at the finest granularity; the only difference between the clouds is the region the template is deployed to and the corresponding REST endpoints.

ARM not only provides a template for a logical combination of resources within Azure; it also manages subscriptions and role-based access control (RBAC) and defines the gallery, metric, and usage data. Quite simply, everything that needs to be done with Azure resources should be done with ARM. Azure Resource Manager does not just describe a single virtual machine; it is responsible for setting up anything from one resource to a bundle of resources that together form a specific service. ARM templates can even be nested, which means they can depend on each other.
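To make this concrete, here is a minimal sketch of an ARM template that deploys a single storage account. The parameter name is invented for illustration, and the apiVersion shown is simply one that was current for this resource type around the time of writing:

{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountName": { "type": "string" }
  },
  "variables": {},
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[parameters('storageAccountName')]",
      "apiVersion": "2016-01-01",
      "location": "[resourceGroup().location]",
      "sku": { "name": "Standard_LRS" },
      "kind": "Storage",
      "properties": {}
    }
  ],
  "outputs": {}
}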
When working with ARM, you should know the following vocabulary:

- Resource: A manageable item available in Azure
- Resource group: The container of resources that belong together within a service
- Resource provider: A service that can be consumed within Azure
- Resource manager template: The definition of a specific service
- Declarative syntax: The template does not define the steps to set up a resource; it just defines the desired result, and the resource itself knows how to set up and configure itself to fulfill that result

To create your own ARM templates, you need to fulfill the following minimum requirements:

- A text editor of your choice
- Visual Studio Community Edition
- Azure SDK

Visual Studio Community Edition is available for free from the Internet. After setting these things up, you can start it and define your own templates. A simple blank template is just the skeleton shown above, with the parameters, variables, resources, and outputs sections left empty.

There are different ways to get a template so that you can work on it and modify it to fit your needs:

- Visual Studio templates
- Quick-start templates on GitHub
- Azure ARM templates

You can also export the ARM template directly from the Azure Portal once the resource has been deployed, by clicking on View template. For further reading on ARM basics, the Getting started with Azure Resource Manager document is a good place to begin: http://aka.ms/GettingStartedWithARM.

PowerShell Desired State Configuration

We talked about ARM and ARM templates, which define resources but cannot design the way a VM looks inside, specify which software needs to be installed, or say how that installation should be done. This is why we need to have a look at VM extensions. VM extensions define what should be done after the ARM deployment has finished. In general, an extension can be any kind of script, but the best practice is to use PowerShell and its add-on called Desired State Configuration (DSC). DSC defines, quite similarly to ARM, how the software needs to be installed and configured. The great thing about this concept is that it also monitors whether the desired state of a virtual machine changes (for example, because an administrator uninstalls or reconfigures something). If it does, DSC ensures within minutes that the original state is fulfilled again, rolling the machine back to the desired state.
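As a minimal sketch of what such a configuration looks like (the configuration name is invented for this example; WindowsFeature is a built-in DSC resource), the following keeps IIS installed on a node and is re-applied if someone removes it:

Configuration WebServer {
    Node 'localhost' {
        # Declare the desired state: the Web-Server (IIS) feature is present.
        # DSC periodically checks this state and reinstalls the feature if it drifts.
        WindowsFeature IIS {
            Ensure = 'Present'
            Name   = 'Web-Server'
        }
    }
}

# Compiling the configuration produces a MOF file that the Local
# Configuration Manager on the node then enforces
WebServer -OutputPath 'C:\DSC'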
Migrating services to Azure Stack

If you are running virtual machines today, you are already using a cloud-based technology, even if we don't call it a cloud; this is basically the idea of a private cloud. If you are running Azure Pack today, you are quite near Azure Stack from the process point of view, but not the technology part. A solution called connectors for Azure Pack gives you one portal UI for both cloud solutions, so the customer can manage everything from the Azure Stack Portal, although services still run in Azure Pack as a legacy solution.

Basically, there is no built-in migration path within Azure Stack, but the way to solve this is quite easy: you can use every tool that you would use to migrate services to Azure.

Azure website migration assistant

The Azure website migration assistant provides a high-level readiness assessment for existing websites. The report outlines sites that are ready to move and elements that may need changes, and it highlights unsupported features. If everything is prepared properly, the tool creates the website and associated database automatically and synchronizes the content. You can learn more about it at https://azure.microsoft.com/en-us/downloads/migration-assistant/.

For virtual machines, there are two tools available:

- Virtual Machines Readiness Assessment
- Virtual Machines Optimization Assessment

Virtual Machines Readiness Assessment

The Virtual Machines Readiness Assessment tool automatically inspects your environment and provides a checklist and detailed report on the steps for migrating the environment to the cloud. The download location is https://azure.microsoft.com/en-us/downloads/vm-readiness-assessment/.

Virtual Machines Optimization Assessment

The Virtual Machine Optimization Assessment tool starts with a questionnaire that asks several questions about your deployment. It then runs an automated data collection and analysis of your Azure VMs and generates a custom report with ten prioritized recommendations across six focus areas: security and compliance, performance and scalability, and availability and business continuity. The download location is https://azure.microsoft.com/en-us/downloads/vm-optimization-assessment/.

Summary

Azure Stack provides a real Azure experience in your datacenter. The UI, administrative tools, and even third-party solutions should work properly. The design of Azure Stack is a very small instance of Azure with some technical design modifications, especially regarding the compute, storage, and network resource providers. These modifications give you a means to start small, think big, and deploy large when migrating services directly to public Azure sometime in the future, if needed.

The most important tool for planning, describing, defining, and deploying Azure Stack services is Azure Resource Manager, just like in Azure. This lets you create your services once but deploy them many times. From the business perspective, this means better TCO and lower administrative costs.

Resources for Article:

Further resources on this subject:

- Deploying and Synchronizing Azure Active Directory [article]
- What is Azure API Management? [article]
- Installing and Configuring Windows Azure Pack [article]


Responsive Web Design with WordPress

Packt
09 Jul 2015
13 min read
Welcome to the world of Responsive Web Design! This article is written by Dejan Markovic, author of the book WordPress Responsive Theme Design, and it will introduce you to Responsive Web Design, its concepts, and its techniques. It also presents crisp notes from WordPress Responsive Theme Design. (For more resources related to this topic, see here.)

"Responsive web design (RWD) is a web design approach aimed at crafting sites to provide an optimal viewing experience—easy reading and navigation with a minimum of resizing, panning, and scrolling—across a wide range of devices (from mobile phones to desktop computer monitors)." Reference: http://en.wikipedia.org/wiki/Responsive_web_design.

To put it simply, responsive web design (RWD) means that a responsive website should adapt to the screen size of the device it is being viewed on.

When I began my web development journey in 2002, we didn't have to consider as many factors as we do today. We just had to create the website for a 17-inch screen (the standard at that time), and that was it. Yes, we also had to consider 15, 19, and 21-inch monitors, but since the 17-inch screen was the standard, that was our target screen size. In pixels, these sizes were usually 800 or 1024. We also had to consider fewer browsers (Internet Explorer, Netscape, and Opera) and the styling for print, and that was it.

Since then, a lot has changed, and today, in 2015, a website design has to consider multiple factors, such as:

- A lot of different web browsers (Internet Explorer, Firefox, Opera, Chrome, and Safari)
- A number of different operating systems (Windows (XP, 7, and 8), Mac OS X, Linux, Unix, iOS, Android, and Windows phones)
- Device screen sizes (desktop, mobile, and tablet)
- Is the content accessible and readable with screen readers?
- How will the content look when it's printed?

Today, creating a different design for every one of these factors and devices would take years. This is where responsive web design comes to the rescue.

The concepts of RWD

I have to point out that the mobile environment is becoming a more important factor than the desktop environment. Mobile browsing is becoming bigger than desktop-based access, which makes the mobile environment a very important factor to consider when developing a website. Simply put, the main point of RWD is that the layout changes based on the size and capabilities of the device it is being viewed on. The concepts of RWD that we will learn next are: Viewport, scaling, and screen density.

Controlling the Viewport

On the desktop, the Viewport is the screen size of the window in a browser. For example, when we resize the browser window, we are actually changing the Viewport size. On mobile devices, the Viewport size is independent of the device screen size. For example, the Viewport is 850 px for mobile Opera and 980 px for mobile Safari, while the screen size of an iPhone is 320 px. Comparing the Viewport size of 980 px with the iPhone screen size of 320 px, we can see that the Viewport is bigger than the screen size. This is because mobile browsers function differently: they first load the page into the Viewport and then resize it to the device's screen size. This is why we are able to see the whole page on the mobile device. If mobile browsers had a Viewport the same as the screen size (320 px), we would only be able to see a part of the page on the mobile device.
The original article includes a table of Viewport sizes and screen densities for several iPhone models here; the two models discussed later in this section are:

Model      Screen size  Viewport  Density  Resolution
iPhone 3G  3.5 inches   320 px    163 dpi  320x480
iPhone 4S  3.5 inches   320 px    326 dpi  640x960

We can control the Viewport with CSS:

@viewport { width: device-width; }

Or we can control it with the meta tag:

<meta name="viewport" content="width=device-width">

In the preceding code, we are matching the Viewport width to the device width. Because the Viewport meta tag approach is more widely adopted (it was first used on iOS, and the @viewport approach is not supported by some browsers), we will use the meta tag approach. We set the Viewport width in order to match our web content with our mobile content, as we want to make sure that our web content looks good on a mobile device as well. We could set a Viewport in the code for each device separately, for example, 320 px for the iPhone, but the better approach is to use content="width=device-width".

Scaling

Scaling is extremely important, as the initial scale controls the zoom aspect of the content for the initial look of the page. For example, if the initial scale is set to 3, the content will be loaded at 3 times the Viewport size, which means 3 times zoom. Here is how the page looks at initial-scale=1 and initial-scale=3:

As we can see from the preceding screenshots, at an initial scale of 3 (three times zoom), the logo image takes up a bigger part of the screen. It is important to note that this is just the initial scale, which means that the user can still zoom in and out later if they want to. Here is an example of the code with the initial scale:

<meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1">

In this example, we have used the maximum-scale=1 option, which means that the user will not be able to zoom here. We should avoid using the maximum-scale property because of accessibility issues: if we forbid zooming on our pages, users with visual problems will not be able to see the content properly.

The screen density

As screen technology moves forward every year, or even faster than that, we have to consider the screen density aspect as well. Screen density is the number of pixels contained within a screen area: if the screen density is higher, we can have more details, in this case pixels, in the same area. There are two measurements usually used for this, dots per inch (DPI) and pixels per inch (PPI). DPI is how many drops a printer can place in an inch of space; PPI is the number of pixels we can have in one inch of the screen.

If we go back to the preceding table of Viewports and densities and compare the values of the iPhone 3G and the iPhone 4S, we will see that the screen size stayed the same at 3.5 inches and the Viewport stayed the same at 320 px, but the screen density doubled, from 163 dpi to 326 dpi, which means that the screen resolution also doubled, from 320x480 to 640x960. Screen density is very relevant to RWD, as newer devices have bigger densities, and we should do our best to cover as many densities as we can in order to provide a better experience for end users. Pixel density matters more than resolution or screen size, because more pixels means a sharper display. There are further topics to take into consideration here, such as hardware and reference pixels and the device-pixel-ratio.

Problems and solutions with the screen density

Scalable vector graphics and CSS graphics will scale to the resolution.
This is why I recommend using Font Awesome icons in your projects. Font Awesome icons are available for download at http://fortawesome.github.io/Font-Awesome/icons/. A font icon is a font made up of symbols, icons, or pictograms (whatever you prefer to call them) that you can use in a webpage just like a font. They can be instantly customized with properties such as size and drop shadow; anything you want can be done with the power of CSS.

The real problem triggered by the change in screen density is images, as for high-density screens we should provide higher-resolution images. There are several ways to approach this problem:

- By targeting high-density screens (providing high-resolution images to all screens)
- By providing high-resolution images where appropriate (loading high-resolution images only on devices with high-resolution screens)
- By not using high-resolution images

For beginner developers, I recommend the second approach: providing high-resolution images where appropriate.

Techniques in RWD

RWD consists of three coding techniques:

- Media queries (adapt content to specific screen sizes)
- Fluid grids (for flexible layouts)
- Flexible images and media (that respond to changes in screen sizes)

More detailed information about RWD techniques by Ethan Marcotte, who coined the term Responsive Web Design, is available at http://alistapart.com/article/responsive-web-design.

Media queries

Media queries are CSS modules, or, as some people like to say, just conditional statements, which tell browsers to use a specific type of style depending on the size of the screen and other factors, such as print (specific styles for print). They have been around for a long time; I was already using different styles for print in 2002. If you wish to know more about media queries, refer to the W3C Candidate Recommendation of 8 July 2002 at http://www.w3.org/TR/2002/CR-css3-mediaqueries-20020708/.

Here is an example of a media query declaration:

@media only screen and (min-width: 500px) {
    body { font-family: sans-serif; }
}

Let's explain the preceding code:

- The @media keyword starts a media type declaration.
- The screen part of the query is an expression or condition (in this case, it means only screens and not print).
- The conditional statement (min-width: 500px) means that the rule inside applies only when the screen is wider than 500 px, giving the page a sans-serif font family.

Here is another example of a media query declaration:

@media only screen and (min-width: 500px), screen and (orientation: portrait) {
    body { font-family: sans-serif; }
}

In this case, we have two conditions, and if either of them is true, the entire declaration is applied (either everything above 500 px or the portrait orientation gets the style). The only keyword hides the styles from older browsers. As some older browsers don't support media queries, I recommend using the respond.js script, which will "patch" support for them. A polyfill (or polyfiller) is code that provides features that are not built into or supported by some web browsers. For example, a number of HTML5 features are not supported by older versions of IE (older than 8 or 9), but they can be used if a polyfill is installed on the web page. This means that if developers want to use these features, they can just include the polyfill library, and the features will work in older browsers.
Breakpoints

A breakpoint is the moment when the layout switches from one layout to another because some condition has been fulfilled, for example, the screen has been resized. Almost all responsive designs cover the changes of the screen between desktops, tablets, and smartphones. Here is an example with comments inside:

@media only screen and (max-width: 480px) {
    /* mobile styles: up to 480 px */
}

The media query in the preceding code will only be used if the width of the screen is 480 px or less.

@media only screen and (min-width: 481px) and (max-width: 768px) {
    /* tablet styles: between 481 px and 768 px */
}

The media query in the preceding code will only be used if the width of the screen is between 481 px and 768 px.

@media only screen and (min-width: 769px) {
    /* desktop styles: from 769 px and up */
}

The media query in the preceding code will only be used when the width of the screen is 769 px or more.

The minimum width value in the desktop styles is 1 pixel over the maximum width value in the tablet styles, and the same difference exists between the tablet and mobile values. We do this in order to avoid overlapping, as that could cause problems with our styles.

There is also an approach of setting the maximum and minimum widths with em values. Setting em values for the screen means that the width of the screen is set relative to the device's font size. If the font size for the device is 16 px (which is the usual size), the maximum width for mobile styles would be 480/16 = 30 em.

Why do we use em values? With pixel sizes, everything is fixed; for example, an h1 set to 24 px (1.5 em of the default size of 16 px) stays 24 px, and that's it. With em sizes, everything is relative, so if we change the default value in the browser from, for example, 16 px to 18 px, everything relative to that will change: the same 1.5 em h1 becomes 27 px, making our layout "zoomable". Here is the example with sizes changed to em:

@media only screen and (max-width: 30em) {
    /* mobile styles: up to 480 px */
}

@media only screen and (min-width: 30em) and (max-width: 48em) {
    /* tablet styles: between 481 px and 768 px */
}

@media only screen and (min-width: 48em) {
    /* desktop styles: from 769 px and up */
}

Fluid grids

The major point in RWD is that the content should adapt to any screen it's viewed on. One of the best solutions is to use fluid layouts, where our content can be resized at each breakpoint.

"In fluid grids, we define a maximum layout size for the design. The grid is divided into a specific number of columns to keep the layout clean and easy to handle. Then we design each element with proportional widths and heights instead of pixel based dimensions. So whenever the device or screen size is changed, elements will adjust their widths and heights by the specified proportions to its parent container." Reference: http://www.1stwebdesigner.com/tutorials/fluid-grids-in-responsive-design/.

To make the grid flexible (or elastic), we can use percentage values, or we can use em values, whichever suits us better. We can build our own fluid grids, or we can use grid frameworks. As there are so many frameworks available, I recommend that you use an existing framework rather than building your own. Grid frameworks can use a single grid that covers various screen sizes, or multiple grids for each of the breakpoints or screen-size categories, such as mobiles, tablets, and desktops. Some of the notable frameworks are Twitter's Bootstrap, Foundation, and Semantic UI.
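Before reaching for a framework, it helps to see how little is needed for a hand-rolled fluid grid. Here is a minimal sketch (the class names are invented for this example): column widths are percentages of their parent, so they flex with the Viewport, and below the mobile breakpoint the columns simply stack:

.container { max-width: 60em; margin: 0 auto; }

/* clearfix so the row wraps its floated columns */
.row:after { content: ""; display: table; clear: both; }

.col-half  { float: left; width: 50%; }
.col-third { float: left; width: 33.333%; }

@media only screen and (max-width: 30em) {
    /* below the mobile breakpoint, stack the columns full-width */
    .col-half, .col-third { float: none; width: 100%; }
}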
I prefer Twitter's Bootstrap, as it really helps me speed up the process, and it is currently the most used framework.

Flexible images and media

Last, but not least important, are images and media (videos). The problem with them is that they are elements that come with fixed sizes. There are several approaches to fix this:

- Replacing dimensions with percentage values
- Using maximum widths
- Using background images, though only in some cases, as these are not good for accessibility
- Using libraries such as Scott Jehl's picturefill (https://github.com/scottjehl/picturefill)
- Taking the width and height parameters out of the image tag and dealing with dimensions in CSS

Summary

In this article, you learned about the RWD concepts: Viewport, scaling, and screen density. We also covered the RWD techniques: media queries, fluid grids, and flexible media.

Resources for Article:

Further resources on this subject:

- Deployment Preparations [article]
- Why Meteor Rocks! [article]
- Clustering and Other Unsupervised Learning Methods [article]