How-To Tutorials

08 Mar 2017

8 min read

Toy Bin

08 Mar 2017

In this article by Steffen Damtoft Sommer and Jim Campagno, the author of the book Swift 3 Programming for Kids, we will walk you through what an array is. These are considered collection types in Swift and are very powerful. (For more resources related to this topic, see here.) Array An array stores values of the same type in an ordered list. The following is an example of an array: let list = ["Legos", "Dungeons and Dragons", "Gameboy", "Monopoly", "Rubix Cube"] This is an array (which you can think of as a list). Arrays are an ordered collections of values. We've created a constant called list of the [String] type and assigned it a value that represents our list of toys that we want to take with us. When describing arrays, you surround the type of the values that are being stored in the array by square brackets, [String]. Following is another array called numbers which contains four values being 5, 2, 9 and 22: let numbers = [5, 2, 9, 22] You would describe numbers as being an array which contains Int values which can be written as [Int]. We can confirm this by holding Alt and selecting the numbers constant to see what its type is in a playground file: What if we were to go back and confirm that list is an array of String values. Let's Alt click that constant to make sure: Similar to how we created instances of String and Int without providing any type information, we're doing the same thing here when we create list and numbers. Both list and numbers are created taking advantage of type inference. In creating our two arrays, we weren't explicit in providing any type information, we just created the array and Swift was able to figure out the type of the array for us. If we want to, though, we can provide type information, as follows: let colors: [String] = ["Red", "Orange", "Yellow"] colors is a constant of the [String] type. Now that we know how to create an array in swift, which can be compared to a list in real life, how can we actually use it? Can we access various items from the array? If so, how? Also, can we add new items to the list in case we forgot to include any items? Yes to all of these questions. Every element (or item) in an array is indexed. What does that mean? Well, you can think of being indexed as being numbered. Except that there's one big difference between how we humans number things and how arrays number things. Humans start from 1 when they create a list (just like we did when we created our preceding list). An array starts from 0. So, the first element in an array is considered to be at index 0: Always remember that the first item in any array begins at 0. If we want to grab the first item from an array, we will do so as shown using what is referred to as subscript syntax: That 0 enclosed in two square brackets is what is known as subscript syntax. We are looking to access a certain element in the array at a certain index. In order to do that, we need to use subscript index, including the index of the item we want within square brackets. In doing so, it will return the value at the index. The value at the index in our preceding example is Legos. The = sign is also referred to as the assignment operator. So, we are assigning the Legos value to a new constant, called firstItem. If we were to print out firstItem, Legos should print to the console: print(firstItem) // Prints "Legos" If we want to grab the last item in this array, how do we do it? Well, there are five items in the array, so the last item should be at index 5, right? Wrong! What if we wrote the following code (which would be incorrect!): let lastItem = list[5] This would crash our application, which would be bad. When working with arrays, you need to ensure that you don't attempt to grab an item at a certain index which doesn't exist. There is no item in our array at index 5, which would make our application crash. When you run your app, you will receive the fatal error: Index out of range error. This is shown in the screenshot below: Let's correctly grab the last item in the array: let lastItem = list[4] print("I'm not as good as my sister, but I love solving the (lastItem)") // Prints "I'm not as good as my sister, but I love solving the Rubix Cube" Comments in code are made by writing text after //. None of this text will be considered code and will not be executed; it's a way for you to leave notes in your code. All of a sudden, you've now decided that you don't want to take the rubix cube as it's too difficult to play with. You were never able to solve it on Earth, so you start wondering why bringing it to the moon would help solve that problem. Bringing crayons is a much better idea. Let's swap out the rubix cube for crayons, but how do we do that? Using subscript syntax, we should be able to assign a new value to the array. Let's give it a shot: list[4] = "Crayons" This will not work! But why, can you take a guess? It's telling us that we cannot assign through subscript because list is a constant (we declared it using the let keyword). Ah! That's exactly how String and Int work. We decide whether or not we can change (mutate) the array based upon the let or var keyword just like every other type in Swift. Let's change the list array to a variable using the var keyword: var list = ["Legos", "Dungeons and Dragons", "Gameboy", "Monopoly", "Rubix Cube"] After doing so, we should be able to run this code without any problem: list[4] = "Crayons" If we decide to print the entire array, we will see the following print to console: ["Legos", "Dungeons and Dragons", "Gameboy", "Monopoly", "Crayons"] Note how Rubix Cube is no longer the value at index 4 (our last index); it has been changed to Crayons. That's how we can mutate (or change) elements at certain indexes in our array. What if we want to add a new item to the array, how do we do that? We've just saw that trying to use subscript syntax with an index that doesn't exist in our array crashes our application, so we know we can't use that to add new items to our array. Apple (having created Swift) has created hundreds, if not thousands, of functions that are available in all the different types (like String, Int, and array). You can consider yourself an instance of a person (person being the name of the type). Being an instance of a person, you can run, eat, sleep, study, and exercise (among other things). These things are considered functions (or methods) that are available to you. Your pet rock doesn't have these functions available to it, why? This is because it's an instance of a rock and not an instance of a person. An instance of a rock doesn't have the same functions available to it that an instance of a person has. All that being said, an array can do things that a String and Int can't do. No, arrays can't run or eat, but they can append (or add) new items to themselves. An array can do this by calling the append(_:) method available to it. This method can be called on an instance of an array (like the preceding list) using what is known as dot syntax. In dot syntax, you write the name of the method immediately after the instance name, separated by a period (.), without any space: list.append("Play-Doh") Just as if we were to tell a person to run, we are telling the list to append. However, we can't just tell it to append, we have to pass an argument to the append function so that it can add it to the list. Our list array now looks like this: ["Legos", "Dungeons and Dragons", "Gameboy", "Monopoly", "Crayons", "Play-Doh"] Summary We have covered a lot of material important to understanding Swift and writing iOS apps here. Feel free to reread what you've read so far as well as write code in a playground file. Create your own arrays, add whatever items you want to it, and change values at certain indexes. Get used to the syntax of working with creating an arrays as well as appending new items. If you can feel comfortable up to this point with how arrays work, that's awesome, keep up the great work! Resources for Article: Further resources on this subject: Introducing the Swift Programming Language [article] The Swift Programming Language [article] Functions in Swift [article]

0
0
25411

article-image-building-voice-technology-iot-projects

Packt

08 Nov 2016

10 min read

Building Voice Technology on IoT Projects

Packt

08 Nov 2016

10 min read

0
0
25394

How-To Tutorials

Packt

12 Aug 2016

9 min read

Laravel 5.0 Essentials

Packt

12 Aug 2016

9 min read

In this article by Alfred Nutile from the book, Laravel 5.x Cookbook, we will learn the following topics: Setting up Travis to Auto Deploy when all is Passing Working with Your .env File Testing Your App on Production with Behat (For more resources related to this topic, see here.) Setting up Travis to Auto Deploy when all is Passing Level 0 of any work should be getting a deployment workflow setup. What that means in this case is that a push to GitHub will trigger our Continuous Integration (CI). And then from the CI, if the tests are passing, we trigger the deployment. In this example I am not going to hit the URL Forge gives you but I am going to send an Artifact to S3 and then have call CodeDeploy to deploy this Artifact. Getting ready… You really need to see the section before this, otherwise continue knowing this will make no sense. How to do it… Install the travis command line tool in Homestead as noted in their docs https://github.com/travis-ci/travis.rb#installation. Make sure to use Ruby 2.x: sudo apt-get install ruby2.0-dev sudo gem install travis -v 1.8.2 --no-rdoc --no-ri Then in the recipe folder I run the command > travis setup codedeploy I answer all the questions keeping in mind: The KEY and SECRET are the ones we made of the I AM User in the Section before this The S3 KEY is the filename not the KEY we used for a above. So in my case I just use the name again of the file latest.zip since it sits inside the recipe-artifact bucket. Finally I open the .travis.yml file, which the above modifies and I update the before-deploy area so the zip command ignores my .env file otherwise it would overwrite the file on the server. How it works… Well if you did the CodeDeploy section before this one you will know this is not as easy as it looks. After all the previous work we are able to, with the one command travis setup codedeploy punch in securely all the needed info to get this passing build to deploy. So after phpunit reports things are passing we are ready. With that said we had to have a lot of things in place, S3 bucket to put the artifact, permission with the KEY and SECRET to access the Artifact and CodeDeploy, and a CodeDeploy Group and Application to deploy to. All of this covered in the previous section. After that it is just the magic of Travis and CodeDeploy working together to make this look so easy. See also… Travis Docs: https://docs.travis-ci.com/user/deployment/codedeploy https://github.com/travis-ci/travis.rb https://github.com/travis-ci/travis.rb#installation Working with Your .env File The workflow around this can be tricky. Going from Local, to TravisCI, to CodeDeploy and then to AWS without storing your info in .env on GitHub can be a challenge. What I will show here are some tools and techniques to do this well. Getting ready…. A base install is fine I will use the existing install to show some tricks around this. How to do it… Minimize using Conventions as much as possible config/queue.php I can do this to have one or more Queues config/filesystems.php Use the Config file as much as possible. For example this is in my .env If I add config/marvel.php and then make it look like this My .env can be trimmed down by KEY=VALUES later on I can call to those: Config::get('marvel.MARVEL_API_VERSION') Config::get('marvel.MARVEL_API_BASE_URL') Now to easily send to Staging or Production using the EnvDeployer library >composer require alfred-nutile-inc/env-deployer:dev-master Follow the readme.md for that library. Then as it says in the docs setup your config file so that it matches the destination IP/URL and username and path for those services. I end up with this config file config/envdeployer.php Now the trick to this library is you start to enter KEY=VALUES into your .env file stacked on top of each other. For example, my database settings might look like this. so now I can type: >php artisan envdeployer:push production Then this will push over SSH your .env to production and swap out the related @production values for each KEY they are placed above. How it works… The first mindset to follow is conventions before you put a new KEY=VALUE into the .env file set back and figure out defaults and conventions around what you already must have in this file. For example must haves, APP_ENV, and then I always have APP_NAME so those two together do a lot to make databases, queues, buckets and so on. all around those existing KEYs. It really does add up, whether you are working alone or on a team focus on these conventions and then using the config/some.php file workflow to setup defaults. Then libraries like the one I use above that push this info around with ease. Kind of like Heroku you can command line these settings up to the servers as needed. See also… Laravel Validator for the .env file: https://packagist.org/packages/mathiasgrimm/laravel-env-validator Laravel 5 Fundamentals: Environments and Configuration: https://laracasts.com/series/laravel-5-fundamentals/episodes/6 Testing Your App on Production with Behat So your app is now on Production! Start clicking away at hundreds of little and big features so you can make sure everything went okay or better yet run Behat! Behat on production? Sounds crazy but I will cover some tips on how to do this including how to setup some remote conditions and clean up when you are done. Getting ready… Any app will do. In my case I am going to hit production with some tests I made earlier. How to do it… Tag a Behat test @smoke or just a Scenario that you know it is safe to run on Production for example features/home/search.feature. Update behat.yml adding a profile call production. Then run > vendor/bin/behat -shome_ui --tags=@smoke --profile=production I run an Artisan command to run all these Then you will see it hit the production url and only the Scenarios you feel are safe for Behat. Another method is to login as a demo user. And after logging in as that user you can see data that is related to that user only so you can test authenticated level of data and interactions. For example database/seeds/UserTableSeeder.php add the demo user to the run method Then update your .env. Now push that .env setting up to Production. >php artisan envdeploy:push production Then we update our behat.yml file to run this test even on Production features/auth/login.feature. Now we need to commit our work and push to GitHub so TravisCI can deploy and changes: Since this is a seed and not a migration I need to rerun seeds on production. Since this is a new site, and no one has used it this is fine BUT of course this would have been a migration if I had to do this later in the applications life. Now let's run this test, from our vagrant box > vendor/bin/behat -slogin_ui --profile=production But it fails because I am setting up the start of this test for my local database not the remote database features/bootstrap/LoginPageUIContext.php. So I can basically begin to create a way to setup the state of the world on the remote server. > php artisan make:controller SetupBehatController And update that controller to do the setup. And make the route app/Http/routes.php Then update the behat test features/bootstrap/LoginPageUIContext.php And we should do some cleanup! First add a new method to features/bootstrap/LoginPageUIContext.php. Then add that tag to the Scenarios this is related to features/auth/login.feature Then add the controller like before and route app/Http/Controllers/CleanupBehatController.php Then Push and we are ready test this user with fresh state and then clean up when they are done! In this case I could test editing the Profile from one state to another. How it works… Not to hard! Now we have a workflow that can save us a ton of clicking around Production after every deployment. To begin with I add the tag @smoke to tests I considered safe for production. What does safe mean? Basically read only tests that I know will not effect that site's data. Using the @smoke tag I have a consistent way to make Suites or Scenarios as safe to run on Production. But then I take it a step further and create a way to test authenticated related state. Like make a Favorite or updating a Profile! By using some simple routes and a user I can begin to tests many other things on my long list of features I need to consider after every deploy. All of this happens with the configurability of Behat and how it allows me to manage different Profiles and Suites in the behat.yml file! Lastly I tie into the fact that Behat has hooks. I this case I tie in to the @AfterScenario by adding that to my Annotation. And I add another hooks @profile so it only runs if the Scenario has that Tag. That is it, thanks to Behat, Hooks and how easy it is to make Routes in Laravel I can easily take care of a large percentage of what otherwise would be a tedious process after every deployment! See also… Behat Docus on Hooks—http://docs.behat.org/en/v3.0/guides/3.hooks.html Saucelabs—on behat.yml setting later and you can test your site on numerous devices: https://saucelabs.com/. Summary This article gives a summary of Setting up Travis, working with .env files and Behat. Resources for Article: Further resources on this subject: CRUD Applications using Laravel 4 [article] Laravel Tech Page [article] Eloquent… without Laravel! [article]

0
0
25383

How-To Tutorials

Packt

11 Jul 2017

19 min read

Massive Graphs on Big Data

Packt

11 Jul 2017

19 min read

In this article by Rajat Mehta, author of the book Big Data Analytics with Java, we will learn about graphs. Graphs theory is one of the most important and interesting concepts of computer science. Graphs have been implemented in real life in a lot of use cases. If you use a GPS on your phone or a GPS device and it shows you a driving direction to a place, behind the scene there is an efficient graph that is working for you to give you the best possible direction. In a social network you are connected to your friends and your friends are connected to other friends and so on. This is a massive graph running in production in all the social networks that you use. You can send messages to your friends or follow them or get followed all in this graph. Social networks or a database storing driving directions all involve massive amounts of data and this is not data that can be stored on a single machine, instead this is distributed across a cluster of thousands of nodes or machines. This massive data is nothing but big data and in this article we will learn how data can be represented in the form of a graph so that we make analysis or deductions on top of these massive graphs. In this article, we will cover: A small refresher on the graphs and its basic concepts A small introduction on graph analytics, its advantages and how Apache Spark fits in Introduction on the GraphFrames library that is used on top of Apache Spark Before we dive deeply into each individual section, let's look at the basic graph concepts in brief. (For more resources related to this topic, see here.) Refresher on graphs In this section, we will cover some of the basic concepts of graphs, and this is supposed to be a refresher section on graphs. This is a basic section, hence if some users already know this information they can skip this section. Graphs are used in many important concepts in our day-to-day lives. Before we dive into the ways of representing a graph, let's look at some of the popular use cases of graphs (though this is not a complete list) Graphs are used heavily in social networks In finding driving directions via GPS In many recommendation engines In fraud detection in many financial companies In search engines and in network traffic flows In biological analysis As you must have noted earlier, graphs are used in many applications that we might be using on a daily basis. Graphs is a form of a data structure in computer science that helps in depicting entities and the connection between them. So, if there are two entities such as Airport A and Airport B and they are connected by a flight that takes, for example, say a few hours then Airport A and Airport B are the two entities and the flight connecting between them that takes those specific hours depict the weightage between them or their connection. In formal terms, these entities are called as vertexes and the relationship between them are called as edges. So, in mathematical terms, graph G = {V, E},that is, graph is a function of vertexes and edges. Let's look at the following diagram for the simple example of a graph: As you can see, the preceding graph is a set of six vertexes and eight edges,as shown next: Vertexes = {A, B, C, D, E, F} Edges = { AB, AF, BC, CD, CF, DF, DE, EF} These vertexes can represent any entities, for example, they can be places with the edges being 'distances' between the places or they could be people in a social network with the edges being the type of relationship, for example, friends or followers. Thus, graphs can represent real-world entities like this. The preceding graph is also called a bidirected graph because in this graph the edges go in either direction that is the edge from A to B can be traversed both ways from A to B as well as from B to A. Thus, the edges in the preceding diagrams that is AB can be BA or AF can be FA too. There are other types of graphs called as directed graphs and in these graphs the direction of the edges go in one way only and does not retrace back. A simple example of a directed graph is shown as follows:. As seen in the preceding graph, the edge A to B goes only in one direction as well as the edge B to C. Hence, this is a directed graph. A simple linked list data structure or a tree datastructure are also forms of graph only. In a tree, nodes can have children only and there are no loops, while there is no such rule in a general graph. Representing graphs Visualizing a graph makes it easily comprehensible but depicting it using a program requires two different approaches Adjacency matrix: Representing a graph as a matrix is easy and it has its own advantages and disadvantages. Let's look at the bidirected graph that we showed in the preceding diagram. If you would represent this graph as a matrix, it would like this: The preceding diagram is a simple representation of our graph in matrix form. The concept of matrix representation of graph is simple—if there is an edge to a node we mark the value as 1, else, if the edge is not present, we mark it as 0. As this is a bi-directed graph, it has edges flowing in one direction only. Thus, from the matrix, the rows and columns depict the vertices. There if you look at the vertex A, it has an edge to vertex B and the corresponding matrix value is 1. As you can see, it takes just one step or O[1]to figure out an edge between two nodes. We just need the index (rows and columns) in the matrix and we can extract that value from it. Also, if you would have looked at the matrix closely, you would have seen that most of the entries are zero, hence this is a sparse matrix. Thus, this approach eats a lot of space in computer memory in marking even those elements that do not have an edge to each other, and this is its main disadvantage. Adjacency list: Adjacency list solves the problem of space wastage of adjacency matrix. To solve this problem, it stores the node and its neighbors in a list (linked list) as shown in the following diagram: For maintaining brevity, we have not shown all the vertices but you can make out from the diagram that each vertex is storing its neighbors in a linked list. So when you want to figure out the neighbors of a particular vertex, you can directly iterate over the list. Of course this has the disadvantage of iterating when you have to figure out whether an edge exists between two nodes or not. This approach is also widely used in many algorithms in computer science. We have briefly seen how graphs can be represented, let's now see some important terms that are used heavily on graphs. Common terminology on graphs We will now introduce you to some common terms and concepts in graphs that you can use in your analytics on top of graphs: Vertices:As we mentioned earlier, vertices are the mathematical terms for the nodes in a graph. For analytic purposes, thevertices count shows the number of nodes in the system, for example, the number of people involved in a social graph. Edges: As we mentioned earlier, edges are the connection between vertices and edges can carry weights. The number of edges represent the number of relations in a system of graph. The weight on a graph represents the intensity of the relationship between the nodes involved; for example, in a social network, the relationship of friends is a stronger relationship than followed between nodes. Degrees: Represent the total number of connections flowing into as well as out of a node. For example, in the previous diagram the degree of node F is four. The degree count is useful, for example, in a social network graph it can represent how well a person is connected if his degree count is very high. Indegrees: This represents the number of connections flowing into a node. For example, in the previous diagram, for node F the indegree value is three. In a social network graph, this might represent how many people can send messages to this person or node. Oudegrees: This represents the number of connections flowing out of a node. For example, in the previous diagram, for node F the outdegree value is one. In a social network graph, this might represent how many people can send messages to this person or node. Common algorithms on graphs Let's look at the three common algorithms that are run on graphs frequently and some of their uses: Breadth first search: Breadth first search is an algorithm for graph traversal or searching. As the name suggests, the traversal occurs across the breadth of the graphs that is to say the neighbors of the node form where traversal starts are searched first before exploring further in the same manner. We will refer to the same graph we used earlier: If we start at vertex A, then according to breadth first search next we search or go at the neighbors of A that's B and F. After that, we will go at the neighbors of B and that will be C. Next we will go to the neighbours of F and those will be E and D. We only go through each node once and this mimics real-life travel as well, as to reach from a point to another point we seldom cover the same road or path again. Thus, our breadth first traversal starting from A will be {A , B , F , C , D , E }. Breadth first search is very useful in graph analytics and can tell us things such as the friends that are not your immediate friends but just at the next level after your immediate friends in a social network or in the case of a graph of a flights network it can show flights with just a single stop or two stops to the destination. Depth first search: This is another way of searching where we start from the source vertice and keep on searching until we reach the end node or the leaf node and then we backtrack. This algorithm is not as performant as the bread first search as it requires lots of traversals. So if you want to know if a node A is connected to node B, you might end up searching along a lot of wasteful nodes that do not have anything to do with the original nodes A and B before coming at the appropriate solution. Dijkstra's shortest path: This is a greedy algorithm to find the shortest path in a graph network. So in a weighted graph, if you need to find the shortest path between two nodes, you can start from the starting node and keep on picking the next node in path greedily to be the one with the least weight (in the case of weights being distances between nodes like in city graphs depicting interconnecting cities and roads).So in a road network, you can find the shortest path between two cities using this algorithm. PageRank algorithm: This is a very popular algorithm that came out from Google and it essentially is used to find the importance of a web page by figuring out how connected it is to other important websites. It gives a page rank score to each of the websites based on this approach and finally the search results are built based on this score. The best part about this algorithm is it can be applied to other areas in life too, for example, in figuring out the important airports in a flight graph, or figuring out the most important people in a social network group. So much for the basics and refresher on graphs, in the next section, we will now see how graphs can be used in real world in massive datasets such as social network data or in data used in the field of biology. We will also study how graph analytics can be used on top of these graphs to derive exclusive deductions. Plotting graphs There is a handy open source Java library called GraphStream, which can be used to plot graphs and this is very useful specially if you want to view the structure of your graphs. While viewing, you can also figure out if some of the vertices are very close to each other (clustered) or in general how they are placed. Using the GraphStream library is easy. Just download the jar from http://graphstream-project.org and put it in the classpath of your project. Next, we will show a simple example demonstrating how easy it is to plot a graph using this library. Just create an instance of a graph. For our example, we will create a simple DefaultGraph and name it SimpleGraph. Next, we will add the nodes or vertices of the graph. We will also add the attribute of the label that is displayed on the vertice. Graph graph = newDefaultGraph("SimpleGraph"); graph.addNode("A" ).setAttribute("ui.label", "A"); graph.addNode("B" ).setAttribute("ui.label", "B"); graph.addNode("C" ).setAttribute("ui.label", "C"); After building the nodes, it's now time to connect these nodes using the edges. The API is simple to use and on the graph instance we can define the edges, provided an ID is given to them and the starting and ending nodes are also given. graph.addEdge("AB", "A", "B"); graph.addEdge("BC", "B", "C"); graph.addEdge("CA", "C", "A"); All the information of nodes and edges is present on the graph instance. It's now time to plot this graph on the UI and we can just invoke the display method on the graph instance as shown next and display it on the UI. graph.display(); This would plot the graph on the UI as follows: This library is extensive and it will be good learning experience to explore this library further and we would urge the readers to further explore this library on their own. Massive graphs on big data Big data comprises huge amount of data distributed across a cluster of thousands (if not more) of machines. Building graphs based on this massive data has different challenges shown as follows: Due to the vast amount of data involved, the data for the graph is distributed across a cluster of machines. Hence, in actuality, it's not a single node graph and we have to build a graph that spans across a cluster of machines. A graph that spans across a cluster of machines would have vertices and edges spread across different machines and this data in a graph won't fit into the memory of one single machine. Consider your friend's list on Facebook; some of your friend's data in your Facebook friend list graph might lie on different machines and this data might be just tremendous in size. Look at an example diagram of a graph of 10 Facebook friends and their network shown as follows: As you can see in the preceding diagram, when for just 10 friends the data can be huge, and here since the graph is drawn by hand we have not even shown a lot of connections to make the image comprehensible, but in real life each person can have say more than thousands of connections. So imagine what will happen to a graph with say thousands if not more people on the list. As shown in the reasons we just saw, building massive graphs on big data is a different ball game altogether and there are few main approaches for building this massive graphs. From the perspective of big data building the massive graphs involve running and storing data parallely on many nodes. The two main approaches are bulk synchronous parallely and the pregel approach. Apache Spark follows the pregel approach. Covering these approaches in detail is out of scope of this book and if the users are interested more on these topics they should refer to other books and the Wikipedia for the same. Graph analytics The biggest advantage to using graphs is you can analyze these graphs and use them for analyzing complex datasets. You might ask what is so special about graph analytics that we can't do by relational databases. Let's try to understand this using an example, suppose we want to analyze your friends network on Facebook and pull information about your friends such as their name, their birth date, their recent likes, and so on. If Facebook had a relational database, then this would mean firing a query on some table using the foreign key of the user requesting this info. From the perspective of relational database, this first level query is easy. But what if we now ask you to go to the friends at level four in your network and fetch their data (as shown in the following diagram). The query to get this becomes more and more complicated from a relational database perspective but this is a trivial task on a graph or graphical database (such as Neo4j). Graphs are extremely good on operations where you want to pull information from one end of the node to another, where the other node lies after a lot of joins and hops. As such, graph analytics is good for certain use cases (but not for all use cases, relation database are still good on many other use cases). As you can see, the preceding diagram depicts a huge social network (though the preceding diagram might just be depicting a network of a few friends only). The dots represent actual people in a social network. So if somebody asks to pick one user on the left-most side of the diagram and see and follow host connections to the right-most side and pull the friends at the say 10th level or more, this is something very difficult to do in a normal relational database and doing it and maintaining it could easily go out of hand. There are four particular use cases where graph analytics is extremely useful and used frequently (though there are plenty more use cases too) Path analytics: As the name suggests, this analytics approach is used to figure out the paths as you traverse along the nodes of a graph. There are many fields where this can be used—simplest being road networks and figuring out details such as shortest path between cities, or in flight analytics to figure out the shortest time taking flight or direct flights. Connectivity analytics: As the name suggests, this approach outlines how the nodes within a graph are connected to each other. So using this you can figure out how many edges are flowing into a node and how many are flowing out of the node. This kind of information is very useful in analysis. For example, in a social network if there is a person who receives just one message but gives out say ten messages within his network then this person can be used to market his favorite products as he is very good in responding to messages. Community Analytics: Some graphs on big data are huge. But within these huge graphs there might be nodes that are very close to each other and are almost stacked in a cluster of their own. This is useful information as based on this you can extract out communities from your data. For example, in a social network if there are people who are part of some community, say marathon runners, then they can be clubbed into a single community and further tracked. Centrality Analytics: This kind of analytical approach is useful in finding central nodes in a network or graph. This is useful in figuring out sources that are single handedly connected to many other sources. It is helpful in figuring out influential people in a social network, or a central computer in a computer network. From the perspective of this article, we will be covering some of these use cases in our sample case studies and for this we will be using a library on Apache Spark called GraphFrames. GraphFrames GraphX library is advanced and performs well on massive graphs, but, unfortunately, it's currently only implemented in Scala and does not have any direct Java API. GraphFrames is a relatively new library that is built on top of Apache Spark and provides support for dataframe (now dataset) based graphs.It contains a lot of methods that are direct wrappers over the underlying sparkx methods. As such it provides similar functionality as GraphX except that GraphX acts on the Spark SRDD and GraphFrame works on the dataframe so GraphFrame is more user friendly (as dataframes are simpler to use). All the advantages of firing Spark SQL queries, joining datasets, filtering queries are all supported on this. To understand GraphFrames and representing massive big data graphs, we will take small baby steps first by building some simple programs using GraphFrames before building full-fledged case studies. Summary In this article, we learned about graph analytics. We saw how graphs can be built even on top of massive big datasets. We learned how Apache Spark can be used to build these massive graphs and in the process we learned about the new library GraphFrames that helps us in building these graphs. Resources for Article: Further resources on this subject: Saying Hello to Java EE [article] Object-Oriented JavaScript [article] Introduction to JavaScript [article]

0
0
25378

article-image-key-takeaways-from-sundar-pichais-congress-hearing-over-user-data-political-bias-and-project-dragonfly

Natasha Mathur

14 Dec 2018

12 min read

Key Takeaways from Sundar Pichai’s Congress hearing over user data, political bias, and Project Dragonfly

Natasha Mathur

14 Dec 2018

12 min read

Google CEO, Sundar Pichai testified before the House Judiciary Committee earlier this week. The hearing titled “Transparency & Accountability: Examining Google and its Data Collection, Use, and Filtering Practices” was a three-and-a-half-hour question-answer session that centered mainly around user data collection at Google, allegations of political bias in its search algorithms, and Google’s controversial plans with China. “All of these topics, competition, censorship, bias, and others..point to one fundamental question that demands the nation’s attention. Are America’s technology companies serving as instruments of freedom or instruments of control?,” said Representative Kevin McCarthy of California, the House Republican leader. The committee members could have engaged with Pichai on more important topics had they not been busy focussing on opposing each other’s opinions over whether Google search and its other products are biased against conservatives. Also, most of Pichai’s responses were unsatisfactory as he cleverly dodged questions regarding its Project Dragonfly and user data. Here are the key highlights from the testimony. Allegations of Political Bias One common theme throughout the long hearing session was Republicans asking questions based around alleged bias against conservatives on Google's platforms. Google search Bias Rep. Lamar Smith asked questions regarding the alleged political bias that is “imbibed” in Google’s search algorithms and its culture. Smith presented an example of a study by Robert Epstein, a Harvard trained psychologist. As per the study’s results, Google’s search bias likely swung 2.6 million votes to Hillary Clinton during the 2016 elections. To this Pichai’s reply was that Google has investigated some of the studies including the one by Dr. Epstein, and found that there were issues with the methodology and its sample size. He also mentioned how Google evaluates their search results for accuracy by using a “robust methodology” that it has been using for the past 20 years. Pichai also added that “providing users with high quality, accurate, and trusted information is sacrosanct to us. It’s what our principles are and our business interests and our natural long-term incentives are aligned with that. We need to serve users everywhere and we need to earn their trust in order to do so.” Google employees’ bias, the reason for biased search algorithms, say Republicans Smith also presented examples of pro-Trump content and immigration laws being tagged as hate speech on Google search results posing threat to the democratic form of government. He also alleged that people at Google were biased and intentionally transferred their biases into these search algorithms to get the results they want and management allows it. Pichai clarified that Google doesn't manually intervene on any particular search result. “Google doesn’t choose conservative voices over liberal voices. There’s no political bias and Google operates in a neutral way,” added Pichai. Would Google allow an independent third party to study its search results to determine the degree of political bias? Pichai responded to this question saying that they already have third parties that are completely independent and haven’t been appointed by Google in place for evaluating its search algorithms. “We’re transparent as to how we evaluate our search. We publish our rater guidelines. We publish it externally and raters evaluate it, we’re trying hard to understand what users want and this is what we think is right. It’s not possible for an employee or a group of employees to manipulate our search algorithm”. Political advertising bias The Committee Chairman Bob Goodlatte, a Republican from Virginia also asked Pichai about political advertising bias on Google’s ad platforms that offer different rates for different political candidates to reach prospective voters. This is largely different than how other competitive media platforms like TV and radio operate - offering the lowest rate to all political candidates. He asked if Google should charge the same effective ad rates to political candidates. Pichai explained that their advertising products are built without any bias and the rates are competitive and set by a live auction process. The prices are calculated automatically based on the keywords that you’re bidding for, and on the demand in the auction. There won’t be a difference in rates based on any political reasons unless there are keywords that are of particular interest. He referred the whole situation to a demand-supply equilibrium, where the rates can differ but that will vary from time to time. There could be occasions when there is a substantial difference in rates based on the time of the day, location, how keywords are chosen etc, and it’s a process that Google has been using for over 20 years. Pichai further added that “anything to do with the civic process, we make sure to do it in a non-partisan way and it's really important for us”. User data collection and security Another highlight of the hearing was Google’s practices around user data collection and security. “Google is able to collect an amount of information about its users that would even make the NSA blush. Americans have no idea the sheer volume of information that is collected”, said Goodlatte. Location tracking data related privacy concerns During Mr. Pichai’s testimony, the first question from Rep. Goodlatte was about whether consumers understand the frequency and amount of location data that Google collects from its Android operating system. Goodlatte asked Pichai about the collection of location data and apps running on Android. To this Pichai replied that Google offers users controls for limiting location data collection. “We go to great lengths to protect their privacy, we give them transparency, choice, and control,” says Pichai. Pichai highlighted that Android is a powerful smartphone that offers services to over 2 billion people. User data that is collected via Android depends on the applications that users choose to use. He also pointed out that Google makes it very clear to its users about what information is collected. He pointed out that there are terms of service and also a “privacy checkup”. Going to “my account” settings on Gmail gives you a clear picture of what user data they have. He also says that users can take that data to other platforms if they choose to stop using Google. On Google+ data breach Another Rep. Jerrold Nadler talked about the recent Google plus data breach that affected some 52.5 million users. He asked Pichai about the legal obligations that the company is under to publicly expose the security issues. Pichai responded to this saying that Google “takes privacy seriously,” and that Google needs to alert the users and the necessary authorities of any kind of data breach or bugs within 72 hours. He also mentions "building software inevitably has bugs associated as part of the process”. Google undertakes a lot of efforts to find bugs and the root cause of it, and make sure to take care of it. He also says how they have advanced protection in Gmail to offer a stronger layer of security to its users. Google’s commitment to protecting U.S. elections from foreign interference It was last year when Google discovered that Russian operatives spent tens of thousands of dollars on ads on its YouTube, Gmail and Google Search products in an effort to meddle in the 2016 US presidential election. “Does Google now know the full extent to which its online platforms were exploited by Russian actors in the election 2 years ago?” asked Nadler. Pichai responded that Google conducted a thorough investigation in 2016. It found out that there were two main ads accounts linked to Russia which advertised on google for about 4700 dollars in advertising. “We found a limited activity, improper activity, we learned from that and have increased the protections dramatically we have around our elections offering”, says Pichai. He also added that to protect the US elections, Google will do a significant review of how ads are bought, it will look for the origin of these accounts, share and collaborate with law enforcement, and other tech companies. “Protecting our elections is foundational to our democracy and you have my full commitment that we will do that,” said Pichai. Google’s plans with China Rep. Sheila Jackson Lee was the first person to directly ask Pichai about the company’s Project Dragonfly i.e. its plans of building a censored search engine with China. “We applauded you in 2010 when Google took a very powerful stand principle and democratic values over profits and came out of China,” said Jackson. Other who asked Pichai regarding Google's China plans were Rep. Tom Marino and Rep. David Cicilline. Google left China in 2010 because of concerns regarding hacking, attacks, censorship, and how the Chinese government was gaining access to its data. How is working with the Chinese govt to censor search results a part of Google’s core values? Pichai repeatedly said that Google has no plans currently to launch in China. “We don't have a search product there. Our core mission is to provide users with access to information and getting access to information is an important right (of users) so we try hard to provide that information”, says Pichai. He added that Google always has evidence based on every country that it has operated in. “Us reaching out and giving users more information has a very positive impact and we feel that calling but right now there are no plans to launch in China,” says Pichai. He also mentioned that if Google ever approaches a decision like that he’ll be fully transparent with US policymakers and “engage in consult widely”. He further added that Google only provides Android services in China for which it has partners and manufacturers all around the world. “We don't have any special agreements on user data with the Chinese government”, said Pichai. On being asked by Rep. Marino about a report from The Intercept that said Google created a prototype for a search engine to censor content in China, Pichai replied, “we designed what a search could look like if it were to be launched in a country like China and that’s what we explored”. Rep. Cicilline asked Pichai whether any employees within Google are currently attending product meetings on Dragonfly. Pichai replied evasively saying that Google has “undertaken an internal effort, but right now there are no plans to launch a search service in China necessarily”. Cicilline shot another question at Pichai asking if Google employees are talking to members of the Chinese government, which Pichai dodged by responding with "Currently we are not in discussions around launching a search product in China," instead. Lastly, when Pichai was asked if he would rule out "launching a tool for surveillance and censorship in China”, he replied that Google’s mission is providing users with information, and that “we always think it’s in our duty to explore possibilities to give users access to information. I have a commitment, but as I’ve said earlier we’ll be very thoughtful and we’ll engage widely as we make progress”. On ending forced arbitration for all forms of discrimination Last month 20,000 Google employees along with Temps, Vendors, and Contractors walked out of their respective Google offices to protest discrimination and sexual harassment in the workplace. As part of the walkout, Google employees laid out five demands urging Google to bring about structural changes within the workplace. One of the demands was ending forced arbitration meaning that Google should no longer require people to waive their right to sue. Also, that every co-worker should have the right to bring a representative, or supporter of their choice when meeting with HR for filing a harassment claim. Rep. Pramila Jayapal asked Pichai if he can commit to expanding the policy of ending forced arbitration for any violation of an employee’s (also contractors) right not just sexual harassment. To this Pichai replied that Google is currently definitely looking into this further. “It’s an area where I’ve gotten feedback personally from our employees so we’re currently reviewing what we could do and I’m looking forward to consulting, and I’m happy to think about more changes here. I’m happy to have my office follow up to get your thoughts on it and we are definitely committed to looking into this more and making changes”, said Pichai. Managing misinformation and hate speech During the hearing, Pichai was questioned about how Google is handling misinformation and hate speech. Rep. Jamie Raskin asked why videos promoting conspiracy theory known as “Frazzledrip,” ( Hillary Clinton kills young women and drinks their blood) are still allowed on YouTube. To this Pichai responded with, “We would need to validate whether that specific video violates our policies”. Rep. Jerry Nadler also asked Pichai about Google’s actions to "combat white supremacy and right-wing extremism." Pichai said Google has defined policies against hate speech and that if Google finds violations, it takes down the content. “We feel a tremendous sense of responsibility to moderate hate speech, define hate speech clearly inciting violence or hatred towards a group of people. It's absolutely something we need to take a strict line on. We’ve stated our policies strictly and we’re working hard to make our enforcement better and we’ve gotten a lot better but it's not enough so yeah we’re committed to doing a lot more here”, said Pichai. Our Take Hearings between tech companies and legislators, in the current form, are an utter failure. In addition to making tech reforms, there is an urgent need to also make reforms in how policy hearings are conducted. It is high time we upgraded ourselves to the 21st century. These were the key highlights of the hearing held on 11th December 2018. We recommend you watch the complete hearing for a more comprehensive context. As Pichai defends Google’s “integrity” ahead of today’s Congress hearing, over 60 NGOs ask him to defend human rights by dropping Drag Google bypassed its own security and privacy teams for Project Dragonfly reveals Intercept Google employees join hands with Amnesty International urging Google to drop Project Dragonfly

0
0
25368

Packt Editorial Staff

03 Apr 2018

18 min read

What is domain driven design?

Packt Editorial Staff

03 Apr 2018

18 min read

Domain driven design exists because all software exists for a purpose. It does something. For example, you can't provide a software solution for a financial system such as online stock trading if you don't understand the stock exchanges and their functioning. Having domain knowledge is essential to solving problems with software. Domain driven design is simply designing software with the specific domain - whether that's finance, medicine, law, eCommerce - in mind. This has been taken from Mastering Microservices with Java 9 - Second Edition. Central to Domain Driven Design is the concept of a model. A model is an abstraction, or a blueprint, of the domain. Domain driven design is a collaborative activity Designing this model is not rocket science, but it does take a lot of effort, refining, and input from domain experts. It is the collective job of software designers, domain experts, and developers. They organize information, divide it into smaller parts, group them logically, and create modules. Each module can be taken up individually, and can be divided using a similar approach. This process can be followed until we reach the unit level, or when we cannot divide it any further. A complex project may have more of such iterations; similarly, a simple project could have just a single iteration of it. Once a model is defined and well documented, it can move onto the next stage - code design. So, here we have a software design—a domain model and code design, and code implementation of the domain model. The domain model provides a high level of the architecture of a solution (software/application), and the code implementation gives the domain model a life, as a working model. Domain Driven Design makes design and development work together. It provides the ability to develop software continuously, while keeping the design up to date based on feedback received from the development. It solves one of the limitations offered by Agile and Waterfall methodologies, making software maintainable, including design and code, as well as keeping application minimum viable. It gives developers the right platform to understand the domain, and provides the opportunity to share early feedback of the domain model implementation. It removes the bottleneck that appears in later stages when stockholders wait for deliverables. The fundamental components of Domain Driven Design To understand domain driven design, you can break it down into 3 fundamental concepts: Ubiquitous language and unified model language (UML) Multilayer architecture Artifacts (components) Ubiquitous language Ubiquitous language is a common language to communicate within a project. It's because designing a model is a collaborative effort of software designers, domain experts, and developers that it requires a common language to communicate with. It removes misunderstandings, misinterpretations. Communication gaps so often lead to bad software - ubiquitous language minimizes these gaps. It does, however, need to be used everywhere on a project. Unified Modeling Language (UML) is widely used and very popular when creating models. It also has a few limitations; for example, when you have thousands of classes drawn from a paper, it's difficult to represent class relationships and simultaneously understand their abstraction while taking a meaning from it. Also, UML diagrams do not represent the concepts of a model and what objects are supposed to do. Therefore, UML should always be used with other documents, code, or any other reference for effective communication. Multilayered architecture Multilayered architecture is a common solution for Domain Driven Design. It contains four layers: Presentation layer or (UI) Application layer - responsible for application logic. It maintains and coordinates the overall flow of the product/service. It does not contain business logic or UI. It may hold the state of application objects, like tasks in progress. Domain layer - contains the domain information and business logic. It holds the state of the business object. Infrastructure layer - provides support to all the other layers and is responsible for communication between them. To understand the interaction of the different layers, take the example of table booking at a restaurant. The end user places a request for a table booking using UI. The UI passes the request to the application layer. The application layer fetches the domain objects, such as the restaurant, the table, a date, and so on, from the domain layer. The domain layer fetches these existing persisted objects from the infrastructure, and invokes relevant methods to make the booking and persist them back to the infrastructure layer. Once domain objects are persisted, the application layer shows the booking confirmation to the end user. Artifacts used in Domain Driven Design There are seven different artifacts used in Domain Driven Design to express, create, and retrieve domain models: Entities Value objects Services Aggregates Repository Factory Module Entities are certain types of objects that are identifiable and remain the same throughout the states of the products/services. These objects are not identified by their attributes, but by their identity and thread of continuity. These type of objects are known as entities. It sounds pretty simple, but it carries complexity. You need to understand how we can define the entities. Let's take an example of a table booking system, where we have a restaurant class with attributes such as restaurant name, address, phone number, establishment data, and so on. We can take two instances of the restaurant class that are not identifiable using the restaurant name, as there could be other restaurants with the same name. Similarly, if we go by any other single attribute, we will not find any attributes that can singularly identify a unique restaurant. If two restaurants have all the same attribute values, they are therefore the same and are interchangeable with each other. Still, they are not the same entities, as both have different references (memory addresses). Conversely, let's take a class of U.S. citizens. Every U.S. citizen has his or her own social security number. This number is not only unique, but remains unchanged throughout the life of the citizen and assures continuity. This citizen object would exist in the memory, would be serialized, and would be removed from the memory and stored in the database. It even exists after the person is deceased. It will be kept in the system for as long as the system exists. A citizen's social security number remains the same irrespective of its representation. Therefore, creating entities in a product means creating an identity. So, now give an identity to any restaurant in the previous example, then either use a combination of attributes such as restaurant name, establishment date, and street, or add an identifier such as restaurant_id to identify it. The basic rule is that two identifiers cannot be the same. Therefore, when we introduce an identifier for an entity, we need to be sure of it. There are different ways to create a unique identity for objects, described as follows: Using the primary key in a table. Using an automated generated ID by a domain module. A domain program generates the identifier and assigns it to objects that are being persisted among different layers. A few real-life objects carry user-defined identifiers themselves. For example, each country has its own country codes for dialing ISD calls. Composite key. This is a combination of attributes that can also be used for creating an identifier, as explained for the preceding restaurant object. Value objects Value objects (VOs) simplify the design. In contrast to entities, value objects have only attributes and no conceptual identity. A best practice is to keep value objects as immutable objects. If possible, you should even keep entity objects immutable too. You might want to keep all objects as entities, but you're likely to run into problems if you do this; there has to be one instance for each object. Let's say you are creating customers as entity objects. Each customer object would represent the restaurant guest; this cannot be used for booking orders for other guests. This may create millions of customer entity objects in the memory if millions of customers are using the system. Not only are there millions of uniquely identifiable objects that exist in the system, but each object is being tracked. Tracking as well as creating an identity is complex. A highly credible system is required to create and track these objects, which is not only very complex, but also resource heavy. It may result in system performance degradation. Therefore, it is important to use value objects instead of using entities. The reasons are explained in the next few paragraphs. Applications don't always need to have to be trackable and have an identifiable customer object. There are cases when you just need to have some or all attributes of the domain element. These are the cases when value objects can be used by the application. It makes things simple and improves the performance. Value objects can easily be created and destroyed, owing to the absence of identity. This simplifies the design—it makes value objects available for garbage collection if no other object has referenced them. Value objects should be designed and coded as immutable. Once they are created, they should never be modified during their life-cycle. If you need a different value of the VO, or any of its objects, then simply create a new value object, but don't modify the original value object. Here, immutability carries all the significance from object-oriented programming (OOP). A value object can be shared and used without impacting on its integrity if, and only if, it is immutable. Services While creating the domain model, you may come across situations where behavior may not be related to any object. These behaviors can be accommodated in service objects. Service objects are part of the domain layer and do not have any internal state. The sole purpose of service objects is to provide behavior to the domain that does not belong to a single entity or value object. Ubiquitous language helps you to identify different objects, identities, or value objects with different attributes and behaviors during the process of domain driven design and domain modelling. During the course of creating the domain model, you may find different behaviors or methods that do not belong to any specific object. Such behaviors are important, and so cannot be neglected. Neither can you add them to entities or value objects. It would spoil the object to add behavior that does not belong to it. Keep in mind, that behavior may impact on various objects. The use of object-oriented programming makes it possible to attach to some objects; this is known as a service. Services are common in technical frameworks. These are also used in domain layers in domain driven design. A service object does not have any internal state; its only purpose is to provide a behavior to the domain. Service objects provide behaviors that cannot be related to specific entities or value objects. Service objects may provide one or more related behaviors to one or more entities or value objects. It is a practice to define the services explicitly in the domain model. While creating the services, you need to tick all of the following points: Service objects' behavior performs on entities and value objects, but it does not belong to entities or value objects Service objects' behavior state is not maintained, and hence, they are stateless Services are part of the domain model Services may also exist in other layers. It is very important to keep domain-layer services isolated. It removes the complexities and keeps the design decoupled. Let's take an example where a restaurant owner wants to see the report of his monthly table bookings. In this case, he will log in as an admin and click the Display Report button after providing the required input fields, such as duration. Application layers pass the request to the domain layer that owns the report and templates objects, with some parameters such as report ID, and so on. Reports get created using the template, and data is fetched from either the database or other sources. Then the application layer passes through all the parameters, including the report ID to the business layer. Here, a template needs to be fetched from the database or another source to generate the report based on the ID. This operation does not belong to either the report object or the template object. Therefore, a service object is used that performs this operation to retrieve the required template from the database. Aggregates Aggregate domain pattern is related to the object's life cycle. It defines ownership and boundaries which is crucial in Domain Driven Design When you reserve a table at your favorite restaurant online using an application, you don't need to worry about the internal system and process that takes place to book your reservation, including searching for available restaurants, then for available tables on the given date, time, and so on and so forth. Therefore, you can say that a reservation application is an aggregate of several other objects, and works as a root for all the other objects for a table reservation system. This root should be an entity that binds collections of objects together. It is also called the aggregate root. This root object does not pass any reference of inside objects to external worlds, and protects the changes performed within internal objects. We need to understand why aggregators are required. A domain model can contain large numbers of domain objects. The bigger the application functionalities and size and the more complex its design, the greater number of objects present. A relationship exists between these objects. Some may have a many-to-many relationship, a few may have a one-to-many relationship, and others may have a one-to-one relationship. These relationships are enforced by the model implementation in the code, or in the database that ensures that these relationships among the objects are kept intact. Relationships are not just unidirectional; they can also be bidirectional. They can also increase in complexity. The designer's job is to simplify these relationships in the model. Some relationships may exist in a real domain, but may not be required in the domain model. Designers need to ensure that such relationships do not exist in the domain model. Similarly, multiplicity can be reduced by these constraints. One constraint may do the job where many objects satisfy the relationship. It is also possible that a bidirectional relationship could be converted into a unidirectional relationship. No matter how much simplification you input, you may still end up with relationships in the model. These relationships need to be maintained in the code. When one object is removed, the code should remove all the references to this object from other places. For example, a record removal from one table needs to be addressed wherever it has references in the form of foreign keys and such, to keep the data consistent and maintain its integrity. Also, invariants (rules) need to be forced and maintained whenever data changes. Relationships, constraints, and invariants bring a complexity that requires an efficient handling in code. We find the solution by using the aggregate represented by the single entity known as the root, which is associated with the group of objects that maintains consistency with regards to data changes. This root is the only object that is accessible from outside, so this root element works as a boundary gate that separates the internal objects from the external world. Roots can refer to one or more inside objects, and these inside objects can have references to other inside objects that may or may not have relationships with the root. However, outside objects can also refer to the root, and not to any inside objects. An aggregate ensures data integrity and enforces the invariant. Outside objects cannot make any change to inside objects; they can only change the root. However, they can use the root to make a change inside the object by calling exposed operations. The root should pass the value of inside objects to outside objects if required. If an aggregate object is stored in the database, then the query should only return the aggregate object. Traversal associations should be used to return the object when it is internally linked to the aggregate root. These internal objects may also have references to other aggregates. An aggregate root entity holds its global identity, and holds local identities inside their entities. A simple example of an aggregate in the table booking system is the customer. Customers can be exposed to external objects, and their root object contains their internal object address and contact information. When requested, the value object of internal objects, such as address, can be passed to external objects: Repository In a domain model, at a given point in time, many domain objects may exist. Each object may have its own life-cycle, from the creation of objects to their removal or persistence. Whenever any domain operation needs a domain object, it should retrieve the reference of the requested object efficiently. It would be very difficult if you didn't maintain all of the available domain objects in a central object. A central object carries the references of all the objects, and is responsible for returning the requested object reference. This central object is known as the repository. The repository is a point that interacts with infrastructures such as the database or file system. A repository object is the part of the domain model that interacts with storage such as the database, external sources, and so on, to retrieve the persisted objects. When a request is received by the repository for an object's reference, it returns the existing object's reference. If the requested object does not exist in the repository, then it retrieves the object from storage. For example, if you need a customer, you would query the repository object to provide the customer with ID 31. The repository would provide the requested customer object if it is already available in the repository, and if not, it would query the persisted stores such as the database, fetch it, and provide its reference. The main advantage of using the repository is having a consistent way to retrieve objects where the requestor does not need to interact directly with the storage such as the database. A repository may query objects from various storage types, such as one or more databases, filesystems, or factory repositories, and so on. In such cases, a repository may have strategies that also point to different sources for different object types As you can see in the repository object flow diagram on the right, the repository interacts with the infrastructure layer, and this interface is part of the domain layer. The requestor may belong to a domain layer, or an application layer. The repository helps the system to manage the life cycle of domain objects. Factory A factory is required when a simple constructor is not enough to create the object. It helps to create complex objects, or an aggregate that involves the creation of other related objects. A factory is also a part of the life cycle of domain objects, as it is responsible for creating them. Factories and repositories are in some way related to each other, as both refer to domain objects. The factory refers to newly created objects, whereas the repository returns the already existing objects either from the memory, or from external storage. Let's see how control flows, by using a user creation process application. Let's say that a user signs up with a username user1. This user creation first interacts with the factory, which creates the name user1 and then caches it in the domain using the repository, which also stores it in the storage for persistence. When the same user logs in again, the call moves to the repository for a reference. This uses the storage to load the reference and pass it to the requestor. The requestor may then use this user1 object to book the table in a specified restaurant, and at a specified time. These values are passed as parameters, and a table booking record is created in storage using the repository: The factory may use one of the object-oriented programming patterns, such as the factory or abstract factory pattern, for object creation. Modules Modules are the best way to separate related business objects. These are best suited to large projects where the size of domain objects is bigger. For the end user, it makes sense to divide the domain model into modules and set the relationship between these modules. Once you understand the modules and their relationship, you start to see the bigger picture of the domain model, thus it's easier to drill down further and understand the model. Modules also help you to write code that is highly cohesive, or maintains low coupling. Ubiquitous language can be used to name these modules. For the table booking system, we could have different modules, such as user-management, restaurants and tables, analytics and reports, and reviews, and so on. This introduction to domain driven design should give you a strong foundation for using it when you build software. It's principles are useful - in particular, making sure you collaborate and use the same language as different stakeholders is one of domain driven design's most valuable contributions to the way we approach software development.

0
0
25354

article-image-ansible-2-automate-networking-tasks-on-google-cloud

Vijin Boricha

31 Jul 2018

8 min read

Ansible 2 for automating networking tasks on Google Cloud Platform [Tutorial]

Vijin Boricha

31 Jul 2018

8 min read

0
0
25295

article-image-introduction-nmap-scripting-engine

Packt

02 Jun 2015

5 min read

Introduction to the Nmap Scripting Engine

Packt

02 Jun 2015

5 min read

In this article by David Shaw, author of the book Nmap Essentials, we will see that although being able to conduct port scans is an integral part of using the Nmap suite of tools, the developers of Nmap created a very powerful engine that's built into the tool: the Nmap Scripting Engine (NSE). This article introduces the NSE, and covers all the topics needed to use reliably-written scripts in the Nmap script repository, in order to conduct reconnaissance scans that include much more than just what ports are open and which services are listening. In this article, we will cover: The history of the NSE How the NSE works (For more resources related to this topic, see here.) The history of the NSE By the mid-2000s, Nmap had established itself as the clear leader in port scanning tools—and security tools in general—whether open source or not. Although it's a constant battle to continually innovate and optimize, Nmap can only be considered as an extremely successful project. Due to its popularity, and the fact that it's an open source project with a relatively high profile, Nmap was selected to participate in Google Summer of Code several times. Google Summer of Code is a software development internship/association project, during which students are selected and put on open source software teams to build new features into existing projects. In May 2006—when the currently released version of Nmap was only 4.0—Nmap was selected for its second Summer of Code season. The previous year, in 2005, several improvements had been made through the students' coding for the Nmap project: the students had written a contemporary implementation of Netcat (called Ncat), upgraded the OS detection for Nmap to its second (and much better) generation, and created a small, simplified GUI that would later become Zenmap. For this second run through, after an extremely successful first summer, the participant developers were even more ambitious. Since Nmap clearly had an excellent set of features, why not make those features extendable by the greater community? New vulnerabilities and scanning techniques were being pioneered on a very frequent basis, and full Nmap releases couldn't keep up with the things that security professionals needed to assess. Every time a new vulnerability came out, security professionals (and malicious hackers!) would scan for vulnerable services with Nmap, but could only test whether software versions were vulnerable by using manual analysis: clearly, not a very efficient use of time. Because of the new resources granted by Google Summer of Code developers, an arbitrary scripting framework was created that allows users to trigger additional checks based on certain open ports or services. This means, for example, that if you're looking for a specific file on all web servers—robots.txt, for example—you can easily create a script that can check for it on all HTTP and HTTPS services. The NSE (and the inclusion of Nmap scripts in default installations of Nmap) truly revolutionized the versatility of the tool suite. After months of hard work, the NSE was released in December 2006, packaged with Nmap release 4.21ALPHA1. The scripts that come packaged with the NSE have continued to grow in complexity and usability, and are excellent resources to turn Nmap into a fully-featured security tool suite. The inner working of the NSE The NSE is a framework that runs code written in the programming language Lua with specific flags that the engine can parse. Lua is a lightweight, fast, and interpreted programming language—one that has the most fame for scripting user interfaces for computer games such as World of Warcraft—that has a similar syntax to other contemporary interpreted languages. If you've ever seen code written in Python or Ruby, Lua won't seem too alien to you. The preceding screenshot shows an Nmap script that identifies information about Bitcoins (written by Patrik Karlsson). Don't worry if you don't understand it yet but you can see that the code used to generate a relatively complex Nmap script looks very simple. This is the whole point of the NSE! Where security engineers and system administrators used to have to export Nmap results, find the information they are looking for and then use third-party tools to assist them; they are now able to either find a script that serves their purposes, or write a simple one themselves. Many penetration testers can leverage the Nmap scripting language to even weaponize the tool for security exploits. Summary This article introduced the NSE, which can be one of the most useful, versatile, and engaging features of the Nmap tool suite. We should now be able to launch scans that do more than just port and service versions—Nmap scripts can actually interact with the services listening, and in some cases can even exploit vulnerabilities! In this article, we covered the history of the NSE, and how NSE works. Resources for Article: Further resources on this subject: Target Exploitation [article] Enabling and configuring SNMP on Windows [article] Gathering all rejects prior to killing a job [article]

0
0
25292

Packt

20 Nov 2013

7 min read

Limits of Game Data Analysis

Packt

20 Nov 2013

7 min read

(For more resources related to this topic, see here.) Which game analytics should be used This section will focus on the role data that should take in your production process. As a studio, the first step is to identify your needs and to choose the goals you will attribute to game analytics. Game analytics as a tool Firstly, it is important to understand that game analytics are a tool, which means they can serve several purposes. You can use them for marketing, science, sociological studies, and so on. Following this statement, you will need different tools and different approaches to reach your goal. As this article has tried to highlight it, tools are chosen according to problems, regardless if the choice is technique or analysis. You must not choose a tool because it is said to be the best performing tool ever made, or because it is fashionable. Instead, you must choose a tool because it is said to be the most efficient tool for your needs. Try to answer the following questions: What are the long-term uses I plan to do with game analytics? Is it simply reporting the Key Performance Indicators or is it the building a user-centric framework for deep analysis? What are the types and the level of skills of the people who will work on it?Do I have all of the skills, from data scientists to game analysts, or do I need to choose a solution which will offset some lacks in a particular field? How much data will be collected? How do I plan to deal with possible peaks of frequentation? How do I adapt temporalities of reporting and analysis with the rhythmof production I have on my project? Do I split them weekly or monthly? What are the main goals of my process? Do I want to build a predictive model (for example, based on correlations) in order to define the next acquisition campaign I will run? Do I want to increase the monetization rate on the current player base? Do I want to perform A/B testing? And the list goes on. Game analytics must serve your team Secondly, it is important to ensure that the use of game analytics must serve your team as a whole. They should not have any disagreements about the long-term objectives that you have chosen. They must accompany it and especially improve it, but the general objective should remain the same. Given the current state of the field, withdrawing the "human touch" from the design process entirely and listening only to data would be a mistake. That's why the game analytics process should be thought through the prism of your own team; and therefore, should be presented as a new tool. This will help them to make good decisions for the game. The best example for the democratization of "game analytics way of thinking" inside your team is certainly the A/B testing aspect. If you experience debates about particular features in the game, instead of taking part you can propose to use A/B tests for some of those features. Following this, there are no particular limits to the use of the tool. A game designer can test different balancing on the virtual economy of a game and an artist can experience different graphic styles. When starting, focus your attention on simple practices If you are new to the field, the following list may help you to start defining your first objectives. It contains most of the typical use for online games, especially free-to-play games: Producing KPIs on a weekly or monthly basis, according to your needs. These KPIs will help you to orient the upcoming development of your game and to anticipate the return on investment of your acquisition campaigns. Identifying if some of the steps of your tutorial phase are poorly designed; for example, if you have a sudden player loss at a particular step of your tutorial. On the same idea, having the loss of players at each level is also very useful to improve the general balancing of your game, especially the progress curve and the difficulty. This topic is more important if you have a part of your business model based on purchasable goods, which can increase the progression rate of the player. You can evaluate which area and which purchasable goods of your game are generating the best income. You can perform A/B testing on particular key features of your game in order to see which ones are the most efficient. What game analytics should not be used for On the other hand, there are a few limits that you need to know before using methods and processes from game analytics. Keep away from numbers You must always be careful about the fact that numbers are used to represent a given situation during a "T" instant. From this statement, the predictive models must always be revised and improved. they should never be considered as the perfect truth. In order for the process to be efficient, it is quite important to keep research on the data inside the structure defined by the initial goals. Otherwise, you might split your efforts and no actionable insights would be identified. In other words, numbers must remain at their place. They are a tool in the hands of a human subject, and they should not become an obsession. Try to reason if they make any sense and if you are asking the right question. Practices that need to be avoided As mentioned in the the previous section, if you are new to this field, be aware of the following situations: Data cannot dictate the full content of your next update. If it is the case, you may first re-evaluate the general intention behind your product and talk with the game designer. When starting, try to avoid complex questions that involve external factors in the game, even if they seem crucial for you. For example, trying to understand why people stopped playing your game over a long period of time is usually impossible. Old players might stop playing because another game came out or they just got bored. Data cannot make miracles at this point of the engagement. Data must not take too much ampleness in the creative process. There are some human intentions and ideas, and only then the data comes in order to verify and improve the potential success of those intentions. Data must not slow down the performances of the game. One of the common methods to avoid this is to send the data when the player logs in or logs out and not at each click or each action. Summary This is the end of this article, and the most important thing you need to remember about game analytics in general is the importance of the definition of your objectives. The reason why you choose this tool instead of another (and this article has tried to list a maximum of them, from data mining to pure analysis) is because it fits your needs as much as possible. This statement is true at every stage of the refiection process which surrounds game analytics, from the choice of the storage solution to the type of analysis you want to perform. The rising of a fully-connected state in the video game industry offers developers the opportunity to change the way they create games, but there is no doubt that the level of maturation related to this tool has not reached its maximum yet. Therefore, even if the benefits of game analytics are great, be prepared to make mistakes as well; and keep your own process open to various criticisms from your team. Resources for Article: Further resources on this subject: Flash 10 Multiplayer Game: Game Interface Design [Article] GNU Octave: Data Analysis Examples [Article] HTML5 Games Development: Using Local Storage to Store Game Data [Article]

0
0
25291

article-image-using-front-controllers-create-new-page

Packt

28 Nov 2014

22 min read

Using front controllers to create a new page

Packt

28 Nov 2014

22 min read

In this article, by Fabien Serny, author of PrestaShop Module Development, you will learn about controllers and object models. Controllers handle display on front and permit us to create new page type. Object models handle all required database requests. We will also see that, sometimes, hooks are not enough and can't change the way PrestaShop works. In these cases, we will use overrides, which permit us to alter the default process of PrestaShop without making changes in the core code. If you need to create a complex module, you will need to use front controllers. First of all, using front controllers will permit to split the code in several classes (and files) instead of coding all your module actions in the same class. Also, unlike hooks (that handle some of the display in the existing PrestaShop pages), it will allow you to create new pages. (For more resources related to this topic, see here.) Creating the front controller To make this section easier to understand, we will make an improvement on our current module. Instead of displaying all of the comments (there can be many), we will only display the last three comments and a link that redirects to a page containing all the comments of the product. First of all, we will add a limit to the Db request in the assignProductTabContent method of your module class that retrieves the comments on the product page: $comments = Db::getInstance()->executeS('SELECT * FROM `'._DB_PREFIX_.'mymod_comment`WHERE `id_product` = '.(int)$id_product.'ORDER BY `date_add` DESCLIMIT 3'); Now, if you go to a product, you should only see the last three comments. We will now create a controller that will display all comments concerning a specific product. Go to your module's root directory and create the following directory path: /controllers/front/ Create the file that will contain the controller. You have to choose a simple and explicit name since the filename will be used in the URL; let's name it comments.php. In this file, create a class and name it, ensuring that you follow the [ModuleName][ControllerFilename]ModuleFrontController convention, which extends the ModuleFrontController class. So in our case, the file will be as follows: <?phpclass MyModCommentsCommentsModuleFrontController extendsModuleFrontController{} The naming convention has been defined by PrestaShop and must be respected. The class names are a bit long, but they enable us to avoid having two identical class names in different modules. Now you just to have to set the template file you want to display with the following lines: class MyModCommentsCommentsModuleFrontController extendsModuleFrontController{public function initContent(){parent::initContent();$this->setTemplate('list.tpl');}} Next, create a template named list.tpl and place it in views/templates/front/ of your module's directory: <h1>{l s='Comments' mod='mymodcomments'}</h1> Now, you can check the result by loading this link on your shop: /index.php?fc=module&module=mymodcomments&controller=comments You should see the Comments title displayed. The fc parameter defines the front controller type, the module parameter defines in which module directory the front controller is, and, at last, the controller parameter defines which controller file to load. Maintaining compatibility with the Friendly URL option In order to let the visitor access the controller page we created in the preceding section, we will just add a link between the last three comments displayed and the comment form in the displayProductTabContent.tpl template. To maintain compatibility with the Friendly URL option of PrestaShop, we will use the getModuleLink method. This will generate a URL according to the URL settings (defined in Preferences | SEO & URLs). If the Friendly URL option is enabled, then it will generate a friendly URL (for example, /en/5-tshirts-doctor-who); if not, it will generate a classic URL (for example, /index.php?id_category=5&controller=category&id_lang=1). This function takes three parameters: the name of the module, the controller filename you want to call, and an array of parameters. The array of parameters must contain all of the data that's needed, which will be used by the controller. In our case, we will need at least the product identifier, id_product, to display only the comments related to the product. We can also add a module_action parameter just in case our controller contains several possible actions. Here is an example. As you will notice, I created the parameters array directly in the template using the assign Smarty method. From my point of view, it is easier to have the content of the parameters close to the link. However, if you want, you can create this array in your module class and assign it to your template in order to have cleaner code: <div class="rte">{assign var=params value=['module_action' => 'list','id_product'=> $smarty.get.id_product]}<a href="{$link->getModuleLink('mymodcomments', 'comments',$params)}">{l s='See all comments' mod='mymodcomments'}</a></div> Now, go to your product page and click on the link; the URL displayed should look something like this: /index.php?module_action=list&id_product=1&fc=module&module=mymodcomments&controller=comments&id_lang=1 Creating a small action dispatcher In our case, we won't need to have several possible actions in the comments controller. However, it would be great to create a small dispatcher in our front controller just in case we want to add other actions later. To do so, in controllers/front/comments.php, we will create new methods corresponding to each action. I propose to use the init[Action] naming convention (but this is not mandatory). So in our case, it will be a method named initList: protected function initList(){$this->setTemplate('list.tpl');} Now in the initContent method, we will create a $actions_list array containing all possible actions and associated callbacks: $actions_list = array('list' => 'initList'); Now, we will retrieve the id_product and module_action parameters in variables. Once complete, we will check whether the id_product parameter is valid and if the action exists by checking in the $actions_list array. If the method exists, we will dynamically call it: if ($id_product > 0 && isset($actions_list[$module_action]))$this->$actions_list[$module_action](); Here's what your code should look like: public function initContent(){parent::initContent();$id_product = (int)Tools::getValue('id_product');$module_action = Tools::getValue('module_action');$actions_list = array('list' => 'initList');if ($id_product > 0 && isset($actions_list[$module_action]))$this->$actions_list[$module_action]();} If you did this correctly nothing should have changed when you refreshed the page on your browser, and the Comments title should still be displayed. Displaying the product name and comments We will now display the product name (to let the visitor know he or she is on the right page) and associated comments. First of all, create a public variable, $product, in your controller class, and insert it in the initContent method with an instance of the selected product. This way, the product object will be available in every action method: $this->product = new Product((int)$id_product, false,$this->context->cookie->id_lang); In the initList method, just before setTemplate, we will make a DB request to get all comments associated with the product and then assign the product object and the comments list to Smarty: // Get comments$comments = Db::getInstance()->executeS('SELECT * FROM `'._DB_PREFIX_.'mymod_comment`WHERE `id_product` = '.(int)$this->product->id.'ORDER BY `date_add` DESC');// Assign comments and product object$this->context->smarty->assign('comments', $comments);$this->context->smarty->assign('product', $this->product); Once complete, we will display the product name by changing the h1 title: <h1>{l s='Comments on product' mod='mymodcomments'}"{$product->name}"</h1> If you refresh your page, you should now see the product name displayed. I won't explain this part since it's exactly the same HTML code we used in the displayProductTabContent.tpl template. At this point, the comments should appear without the CSS style; do not panic, just go to the next section of this article. Including CSS and JS media in the controller As you can see, the comments are now displayed. However, you are probably asking yourself why the CSS style hasn't been applied properly. If you look back at your module class, you will see that it is the hookDisplayProductTab hook in the product page that includes the CSS and JS files. The problem is that we are not on a product page here. So we have to include them on this page. To do so, we will create a method named setMedia in our controller and add CS and JS files (as we did in the hookDisplayProductTab hook). It will override the default setMedia method contained in the FrontController class. Since this method includes general CSS and JS files used by PrestaShop, it is very important to call the setMedia parent method in our override: public function setMedia(){// We call the parent methodparent::setMedia();// Save the module path in a variable$this->path = __PS_BASE_URI__.'modules/mymodcomments/';// Include the module CSS and JS files needed$this->context->controller->addCSS($this->path.'views/css/starrating.css', 'all');$this->context->controller->addJS($this->path.'views/js/starrating.js');$this->context->controller->addCSS($this->path.'views/css/mymodcomments.css', 'all');$this->context->controller->addJS($this->path.'views/js/mymodcomments.js');} If you refresh your browser, the comments should now appear well formatted. In an attempt to improve the display, we will just add the date of the comment beside the author's name. Just replace <p>{$comment.firstname} {$comment.lastname|substr:0:1}.</p> in your list.tpl template with this line: <div>{$comment.firstname} {$comment.lastname|substr:0:1}. <small>{$comment.date_add|substr:0:10}</small></div> You can also replace the same line in the displayProductTabContent.tpl template if you want. If you want more information on how the Smarty method works, such as substr that I used for the date, you can check the official Smarty documentation. Adding a pagination system Your controller page is now fully working. However, if one of your products has thousands of comments, the display won't be quick. We will add a pagination system to handle this case. First of all, in the initList method, we need to set a number of comments per page and know how many comments are associated with the product: // Get number of comments$nb_comments = Db::getInstance()->getValue('SELECT COUNT(`id_product`)FROM `'._DB_PREFIX_.'mymod_comment`WHERE `id_product` = '.(int)$this->product->id);// Init$nb_per_page = 10; By default, I have set the number per page to 10, but you can set the number you want. The value is stored in a variable to easily change the number, if needed. Now we just have to calculate how many pages there will be : $nb_pages = ceil($nb_comments / $nb_per_page); Also, set the page the visitor is on: $page = 1;if (Tools::getValue('page') != '')$page = (int)$_GET['page']; Now that we have this data, we can generate the SQL limit and use it in the comment's DB request in such a way so as to display the 10 comments corresponding to the page the visitor is on: $limit_start = ($page - 1) * $nb_per_page;$limit_end = $nb_per_page;$comments = Db::getInstance()->executeS('SELECT * FROM `'._DB_PREFIX_.'mymod_comment`WHERE `id_product` = '.(int)$this->product->id.'ORDER BY `date_add` DESCLIMIT '.(int)$limit_start.','.(int)$limit_end); If you refresh your browser, you should only see the last 10 comments displayed. To conclude, we just need to add links to the different pages for navigation. First, assign the page the visitor is on and the total number of pages to Smarty: $this->context->smarty->assign('page', $page);$this->context->smarty->assign('nb_pages', $nb_pages); Then in the list.tpl template, we will display numbers in a list from 1 to the total number of pages. On each number, we will add a link with the getModuleLink method we saw earlier, with an additional parameter page: <ul class="pagination">{for $count=1 to $nb_pages}{assign var=params value=['module_action' => 'list','id_product' => $smarty.get.id_product,'page' => $count]}<li><a href="{$link->getModuleLink('mymodcomments', 'comments',$params)}"><span>{$count}</span></a></li>{/for}</ul> To make the pagination clearer for the visitor, we can use the native CSS class to indicate the page the visitor is on: {if $page ne $count}<li><a href="{$link->getModuleLink('mymodcomments', 'comments',$params)}"><span>{$count}</span></a></li>{else}<li class="active current"><span><span>{$count}</span></span></li>{/if} Your pagination should now be fully working. Creating routes for a module's controller At the beginning of this article, we chose to use the getModuleLink method to keep compatibility with the Friendly URL option of PrestaShop. Let's enable this option in the SEO & URLs section under Preferences. Now go to your product page and look at the target of the See all comments link; it should have changed from /index.php?module_action=list&id_product=1&fc=module&module=mymodcomments&controller=comments&id_lang=1 to /en/module/mymodcomments/comments?module_action=list&id_product=1. The result is nice, but it is not really a Friendly URL yet. ISO code at the beginning of URLs appears only if you enabled several languages; so if you have only one language enabled, the ISO code will not appear in the URL in your case. Since PrestaShop 1.5.3, you can create specific routes for your module's controllers. To do so, you have to attach your module to the ModuleRoutes hook. In your module's install method in mymodcomments.php, add the registerHook method for ModuleRoutes: // Register hooksif (!$this->registerHook('displayProductTabContent') ||!$this->registerHook('displayBackOfficeHeader') ||!$this->registerHook('ModuleRoutes'))return false; Don't forget; you will have to uninstall/install your module if you want it to be attached to this hook. If you don't want to uninstall your module (because you don't want to lose all the comments you filled in), you can go to the Positions section under the Modules section of your back office and hook it manually. Now we have to create the corresponding hook method in the module's class. This method will return an array with all the routes we want to add. The array is a bit complex to explain, so let me write an example first: public function hookModuleRoutes(){return array('module-mymodcomments-comments' => array('controller' => 'comments','rule' => 'product-comments{/:module_action}{/:id_product}/page{/:page}','keywords' => array('id_product' => array('regexp' => '[d]+','param' => 'id_product'),'page' => array('regexp' => '[d]+','param' => 'page'),'module_action' => array('regexp' => '[w]+','param' => 'module_action'),),'params' => array('fc' => 'module','module' => 'mymodcomments','controller' => 'comments')));} The array can contain several routes. The naming convention for the array key of a route is module-[ModuleName]-[ModuleControllerName]. So in our case, the key will be module-mymodcomments-comments. In the array, you have to set the following: The controller; in our case, it is comments. The construction of the route (the rule parameter). You can use all the parameters you passed in the getModuleLink method by using the {/:YourParameter} syntax. PrestaShop will automatically add / before each dynamic parameter. In our case, I chose to construct the route this way (but you can change it if you want): product-comments{/:module_action}{/:id_product}/page{/:page} The keywords array corresponding to the dynamic parameters. For each dynamic parameter, you have to set Regexp, which will permit to retrieve it from the URL (basically, [d]+ for the integer values and '[w]+' for string values) and the parameter name. The parameters associated with the route. In the case of a module's front controller, it will always be the same three parameters: the fc parameter set with the fix value module, the module parameter set with the module name, and the controller parameter set with the filename of the module's controller. Very importantNow PrestaShop is waiting for a page parameter to build the link. To avoid fatal errors, you will have to set the page parameter to 1 in your getModuleLink parameters in the displayProductTabContent.tpl template: {assign var=params value=[ 'module_action' => 'list', 'id_product' => $smarty.get.id_product, 'page' => 1 ]} Once complete, if you go to a product page, the target of the See all comments link should now be: /en/product-comments/list/1/page/1 It's really better, but we can improve it a little more by setting the name of the product in the URL. In the assignProductTabContent method of your module, we will load the product object and assign it to Smarty: $product = new Product((int)$id_product, false,$this->context->cookie->id_lang);$this->context->smarty->assign('product', $product); This way, in the displayProductTabContent.tpl template, we will be able to add the product's rewritten link to the parameters of the getModuleLink method: (do not forget to add it in the list.tpl template too!): {assign var=params value=['module_action' => 'list','product_rewrite' => $product->link_rewrite,'id_product' => $smarty.get.id_product,'page' => 1]} We can now update the rule of the route with the product's link_rewrite variable: 'product-comments{/:module_action}{/:product_rewrite} {/:id_product}/page{/:page}' Do not forget to add the product_rewrite string in the keywords array of the route: 'product_rewrite' => array('regexp' => '[w-_]+','param' => 'product_rewrite'), If you refresh your browser, the link should look like this now: /en/product-comments/list/tshirt-doctor-who/1/page/1 Nice, isn't it? Installing overrides with modules As we saw in the introduction of this article, sometimes hooks are not sufficient to meet the needs of developers; hooks can't alter the default process of PrestaShop. We could add code to core classes; however, it is not recommended, as all those core changes will be erased when PrestaShop is updated using the autoupgrade module (even a manual upgrade would be difficult). That's where overrides take the stage. Creating the override class Installing new object models and controller overrides on PrestaShop is very easy. To do so, you have to create an override directory in the root of your module's directory. Then, you just have to place your override files respecting the path of the original file that you want to override. When you install the module, PrestaShop will automatically move the override to the overrides directory of PrestaShop. In our case, we will override the getProducts method of the /classes/Search.php class to display the grade and the number of comments on the product list. So we just have to create the Search.php file in /modules/mymodcomments/override/classes/Search.php, and fill it with: <?phpclass Search extends SearchCore{public static function find($id_lang, $expr, $page_number = 1,$page_size = 1, $order_by = 'position', $order_way = 'desc',$ajax = false, $use_cookie = true, Context $context = null){}} In this method, first of all, we will call the parent method to get the products list and return it: // Call parent method$find = parent::find($id_lang, $expr, $page_number, $page_size,$order_by, $order_way, $ajax, $use_cookie, $context);// Return productsreturn $find; We want to display the information (grade and number of comments) to the products list. So, between the find method call and the return statement, we will add some lines of code. First, we will check whether $find contains products. The find method can return an empty array when no products match the search. In this case, we don't have to change the way this method works. We also have to check whether the mymodcomments module has been installed (if the override is being used, the module is most likely to be installed, but as I said, it's just for security): if (isset($find['result']) && !empty($find['result']) &&Module::isInstalled('mymodcomments')){} If we enter these conditions, we will list the product identifier returned by the find parent method: // List id product$products = $find['result'];$id_product_list = array();foreach ($products as $p)$id_product_list[] = (int)$p['id_product']; Next, we will retrieve the grade average and number of comments for the products in the list: // Get grade average and nb comments for products in list$grades_comments = Db::getInstance()->executeS('SELECT `id_product`, AVG(`grade`) as grade_avg,count(`id_mymod_comment`) as nb_commentsFROM `'._DB_PREFIX_.'mymod_comment`WHERE `id_product` IN ('.implode(',', $id_product_list).')GROUP BY `id_product`'); Finally, fill in the $products array with the data (grades and comments) corresponding to each product: // Associate grade and nb comments with productforeach ($products as $kp => $p)foreach ($grades_comments as $gc)if ($gc['id_product'] == $p['id_product']){$products[$kp]['mymodcomments']['grade_avg'] =round($gc['grade_avg']);$products[$kp]['mymodcomments']['nb_comments'] =$gc['nb_comments'];}$find['result'] = $products; Now, as we saw at the beginning of this section, the overrides of the module are installed when you install the module. So you will have to uninstall/install your module. Once this is done, you can check the override contained in your module; the content of /modules/mymodcomments/override/classes/Search.php should be copied in /override/classes/Search.php. If an override of the class already exists, PrestaShop will try to merge it by adding the methods you want to override to the existing override class. Once the override is added by your module, PrestaShop should have regenerated the cache/class_index.php file (which contains the path of every core class and controller), and the path of the Category class should have changed. Open the cache/class_index.php file and search for 'Search'; the content of this array should now be: 'Search' =>array ( 'path' => 'override/classes /Search.php','type' => 'class',), If it's not the case, it probably means the permissions of this file are wrong and PrestaShop could not regenerate it. To fix this, just delete this file manually and refresh any page of your PrestaShop. The file will be regenerated and the new path will appear. Since you uninstalled/installed the module, all your comments should have been deleted. So take 2 minutes to fill in one or two comments on a product. Then search for this product. As you must have noticed, nothing has changed. Data is assigned to Smarty, but not used by the template yet. To avoid deletion of comments each time you uninstall the module, you should comment the loadSQLFile call in the uninstall method of mymodcomments.php. We will uncomment it once we have finished working with the module. Editing the template file to display grades on products list In a perfect world, you should avoid using overrides. In this case, we could have used the displayProductListReviews hook, but I just wanted to show you a simple example with an override. Moreover, this hook exists only since PrestaShop 1.6, so it would not work on PrestaShop 1.5. Now, we will have to edit the product-list.tpl template of the active theme (by default, it is /themes/default-bootstrap/), so the module won't be a turnkey module anymore. A merchant who will install this module will have to manually edit this template if he wants to have this feature. In the product-list.tpl template, just after the short description, check if the $product.mymodcomments variable exists (to test if there are comments on the product), and then display the grade average and the number of comments: {if isset($product.mymodcomments)}<p><b>{l s='Grade:'}</b> {$product.mymodcomments.grade_avg}/5<br/><b>{l s='Number of comments:'}</b>{$product.mymodcomments.nb_comments}</p>{/if} Here is what the products list should look like now: Creating a new method in a native class In our case, we have overridden an existing method of a PrestaShop class. But we could have added a method to an existing class. For example, we could have added a method named getComments to the Product class: <?phpclass Product extends ProductCore{public function getComments($limit_start, $limit_end = false){$limit = (int)$limit_start;if ($limit_end)$limit = (int)$limit_start.','.(int)$limit_end;$comments = Db::getInstance()->executeS('SELECT * FROM `'._DB_PREFIX_.'mymod_comment`WHERE `id_product` = '.(int)$this->id.'ORDER BY `date_add` DESCLIMIT '.$limit);return $comments;}} This way, you could easily access the product comments everywhere in the code just with an instance of a Product class. Summary This article taught us about the main design patterns of PrestaShop and explained how to use them to construct a well-organized application. Resources for Article: Further resources on this subject: Django 1.2 E-commerce: Generating PDF Reports from Python using ReportLab [Article] Customizing PrestaShop Theme Part 2 [Article] Django 1.2 E-commerce: Data Integration [Article]

0
0
25240

article-image-5-nation-joint-activity-alert-report-finds-most-threat-actors-use-publicly-available-tools-for-cyber-attacks

Melisha Dsouza

12 Oct 2018

4 min read

5 nation joint Activity Alert Report finds most threat actors use publicly available tools for cyber attacks

Melisha Dsouza

12 Oct 2018

4 min read

NCCIC, in collaboration with cybersecurity authorities of Australia, Canada, New Zealand, the United Kingdom, and the United States has released a joint ‘Activity Alert Report’. This report highlights five publicly available tools frequently observed in cyber attacks worldwide. Today, malicious tools are available free for use and can be misused by cybercriminals to endanger public security and privacy. There are numerous cyber incidents encountered on a daily basis that challenge even the most secure network and exploit confidential information across finance, government, health sectors. What’s surprising is that a majority of these exploits are caused by freely available tools that find loopholes in security systems to achieve an attacker’s objectives. The report highlights the five most frequently used tools that are used by cybercriminals all over the globe to perform cyber crimes. These fall into five categories: #1 Remote Access Trojan: JBiFrost Once the RAT program is installed on a victim’s machine, it allows remote administrative control of the system. It can then be used to exploit the system as per the hacker’s objectives. For example, installing malicious backdoors to obtain confidential data. These are often difficult to detect because they are designed to not appear in lists of running programs and to mimic the behavior of legitimate applications. RATs also disable network analysis tools (e.g., Wireshark) on the victim’s system. Operating systems Windows, Linux, MAC OS X, and Android are susceptible to this threat. Hackers spammed companies with emails to infiltrate their systems with the Adwind RAT into their systems. The entire story can be found on Symantec’s blog. #2 Webshell: China Chopper The China Chopper is being used widely since 2012. These webshells are malicious scripts which are uploaded to a target system to grant the hacker remote access to administrative capabilities on the system. The hackers can then pivot to additional hosts within a network. China Chopper consists of the client-side, which is run by the attacker, and the server, which is installed on the victim server and is also attacker-controlled. The client can issue terminal commands and manage files on the victim server. It can then upload and download files to and from the victim using wget. They can then either modify or delete the existing files. #3 Credential Stealer: Mimikatz Mimikatz is mainly used by attackers to access the memory within a targeted Windows system and collect the credentials of logged in users. These credentials can be then used to give access to other machines on a network. Besides obtaining credentials, the tool can obtain Local Area Network Manager and NT LAN Manager hashes, certificates, and long-term keys on Windows XP (2003) through Windows 8.1 (2012r2). When the "Invoke-Mimikatz" PowerShell script is used to operate Mimikatz, its activity is difficult to isolate and identify. In 2017, this tool was used in combination with NotPetya infected hundreds of computers in Russia and Ukraine. The attack paralysed systems and disabled the subway payment systems. The good news is that Mimikatz can be detected by most up-to-date antivirus tools. That being said, hackers can modify Mimikatz code to go undetected by antivirus. # 4 Lateral Movement Framework: PowerShell Empire PowerShell Empire is a post-exploitation or lateral movement tool. It allows an attacker to move around a network after gaining initial access. This tool can be used to generate executables for social engineering access to networks. The tool consists of a a threat actor that can escalate privileges, harvest credentials, exfiltrate information, and move laterally across a network. Traditional antivirus tools fail to detect PowerShell Empire. In 2018, the tool was used by hackers sending out Winter Olympics-themed socially engineered emails and malicious attachments in a spear-phishing campaign targeting several South Korean organizations. # 5 C2 Obfuscation and Exfiltration: HUC Packet Transmitter HUC Packet Transmitter (HTran) is a proxy tool used by attackers to obfuscate their location. The tool intercepts and redirects the Transmission Control Protocol (TCP) connections from the local host to a remote host. This makes it possible to detect an attacker’s communications with victim networks. HTran uses a threat actor to facilitate TCP connections between the victim and a hop point. Threat actors can then redirect their packets through multiple compromised hosts running HTran to gain greater access to hosts in a network. The research encourages everyone to use the report to stay informed about the potential network threats due to these malicious tools. They also provide a complete list of detection and prevention measures for each tool in detail. You can head over to the official site of the US-CERT for more information on this research. 6 artificial intelligence cybersecurity tools you need to know How will AI impact job roles in Cybersecurity New cybersecurity threats posed by artificial intelligence

0
0
25220

article-image-virtualizing-hosts-and-applications

Packt

20 May 2016

19 min read

Virtualizing Hosts and Applications

Packt

20 May 2016

19 min read

In this article by Jay LaCroix, the author of the book Mastering Ubuntu Server, you will learn how there have been a great number of advancements in the IT space in the last several decades, and a few technologies have come along that have truly revolutionized the technology industry. The author is sure few would argue that the Internet itself is by far the most revolutionary technology to come around, but another technology that has created a paradigm shift in IT is virtualization. It evolved the way we maintain our data centers, allowing us to segregate workloads into many smaller machines being run from a single server or hypervisor. Since Ubuntu features the latest advancements of the Linux kernel, virtualization is actually built right in. After installing just a few packages, we can create virtual machines on our Ubuntu Server installation without the need for a pricey license agreement or support contract. In this article, Jay will walk you through creating, running, and managing Docker containers. (For more resources related to this topic, see here.) Creating, running, and managing Docker containers Docker is a technology that seemed to come from nowhere and took the IT world by storm just a few years ago. The concept of containerization is not new, but Docker took this concept and made it very popular. The idea behind a container is that you can segregate an application you'd like to run from the rest of your system, keeping it sandboxed from the host operating system, while still being able to use the host's CPU and memory resources. Unlike a virtual machine, a container doesn't have a virtual CPU and memory of its own, as it shares resources with the host. This means that you will likely be able to run more containers on a server than virtual machines, since the resource utilization would be lower. In addition, you can store a container on a server and allow others within your organization to download a copy of it and run it locally. This is very useful for developers developing a new solution and would like others to test or run it. Since the Docker container contains everything the application needs to run, it's very unlikely that a systematic difference between one machine or another will cause the application to behave differently. The Docker server, also known as Hub, can be used remotely or locally. Normally, you'd pull down a container from the central Docker Hub instance, which will make various containers available, which are usually based on a Linux distribution or operating system. When you download it locally, you'll be able to install packages within the container or make changes to its files, just as if it were a virtual machine. When you finish setting up your application within the container, you can upload it back to Docker Hub for others to benefit from or your own local Hub instance for your local staff members to use. In some cases, some developers even opt to make their software available to others in the form of containers rather than creating distribution-specific packages. Perhaps they find it easier to develop a container that can be used on every distribution than build separate packages for individual distributions. Let's go ahead and get started. To set up your server to run or manage Docker containers, simply install the docker.io package: # apt-get install docker.io Yes, that's all there is to it. Installing Docker has definitely been the easiest thing we've done during this entire article. Ubuntu includes Docker in its default repositories, so it's only a matter of installing this one package. You'll now have a new service running on your machine, simply titled docker. You can inspect it with the systemctl command, as you would any other: # systemctl status docker Now that Docker is installed and running, let's take it for a test drive. Having Docker installed gives us the docker command, which has various subcommands to perform different functions. Let's try out docker search: # docker search ubuntu What we're doing with this command is searching Docker Hub for available containers based on Ubuntu. You could search for containers based on other distributions, such as Fedora or CentOS, if you wanted. The command will return a list of Docker images available that meet your search criteria. The search command was run as root. This is required, unless you make your own user account a member of the docker group. I recommend you do that and then log out and log in again. That way, you won't need to use root anymore. From this point on, I won't suggest using root for the remaining Docker examples. It's up to you whether you want to set up your user account with the docker group or continue to run docker commands as root. To pull down a docker image for our use, we can use the docker pull command, along with one of the image names we saw in the output of our search command: docker pull ubuntu With this command, we're pulling down the latest Ubuntu container image available on Docker Hub. The image will now be stored locally, and we'll be able to create new containers from it. To create a new container from our downloaded image, this command will do the trick: docker run -it ubuntu:latest /bin/bash Once you run this command, you'll notice that your shell prompt immediately changes. You're now within a shell prompt from your container. From here, you can run commands you would normally run within a real Ubuntu machine, such as installing new packages, changing configuration files, and so on. Go ahead and play around with the container, and then we'll continue on with a bit more theory on how it actually works. There are some potentially confusing aspects of Docker we should get out of the way first before we continue with additional examples. The most likely thing to confuse newcomers to Docker is how containers are created and destroyed. When you execute the docker run command against an image you've downloaded, you're actually creating a container. Each time you use the docker run command, you're not resuming the last container, but creating a new one. To see this in action, run a container with the docker run command provided earlier, and then type exit. Run it again, and then type exit again. You'll notice that the prompt is different each time you run the command. After the root@ portion of the bash prompt within the container is a portion of a container ID. It'll be different each time you execute the docker run command, since you're creating a new container with a new ID each time. To see the number of containers on your server, execute the docker info command. The first line of the output will tell you how many containers you have on your system, which should be the number of times you've run the docker run command. To see a list of all of these containers, execute the docker ps -a command: docker ps -a The output will give you the container ID of each container, the image it was created from, the command being run, when the container was created, its status, and any ports you may have forwarded. The output will also display a randomly generated name for each container, and these names are usually quite wacky. As I was going through the process of creating containers while writing this section, the codenames for my containers were tender_cori, serene_mcnulty, and high_goldwasser. This is just one of the many quirks of Docker, and some of these can be quite hilarious. The important output of the docker ps -a command is the container ID, the command, and the status. The ID allows you to reference a specific container. The command lets you know what command was run. In our example, we executed /bin/bash when we started our containers. Using the ID, we can resume a container. Simply execute the docker start command with the container ID right after. Your command will end up looking similar to the following: docker start 353c6fe0be4d The output will simply return the ID of the container and then drop you back to your shell prompt. Not the shell prompt of your container, but that of your server. You might be wondering at this point, then, how you get back to the shell prompt for the container. We can use docker attach for that: docker attach 353c6fe0be4d You should now be within a shell prompt inside your container. If you remember from earlier, when you type exit to disconnect from your container, the container stops. If you'd like to exit the container without stopping it, press CTRL + P and then CTRL + Q on your keyboard. You'll return to your main shell prompt, but the container will still be running. You can see this for yourself by checking the status of your containers with the docker ps -a command. However, while these keyboard shortcuts work to get you out of the container, it's important to understand what a container is and what it isn't. A container is not a service running in the background, at least not inherently. A container is a collection of namespaces, such as a namespace for its filesystem or users. When you disconnect without a process running within the container, there's no reason for it to run, since its namespace is empty. Thus, it stops. If you'd like to run a container in a way that is similar to a service (it keeps running in the background), you would want to run the container in detached mode. Basically, this is a way of telling your container, "run this process, and don't stop running it until I tell you to." Here's an example of creating a container and running it in detached mode: docker run -dit ubuntu /bin/bash Normally, we use the -it options to create a container. This is what we used a few pages back. The -i option triggers interactive mode, while the -t option gives us a psuedo-TTY. At the end of the command, we tell the container to run the Bash shell. The -d option runs the container in the background. It may seem relatively useless to have another Bash shell running in the background that isn't actually performing a task. But these are just simple examples to help you get the hang of Docker. A more common use case may be to run a specific application. In fact, you can even run a website from a Docker container by installing and configuring Apache within the container, including a virtual host. The question then becomes this: how do you access the container's instance of Apache within a web browser? The answer is port redirection, which Docker also supports. Let's give this a try. First, let's create a new container in detached mode. Let's also redirect port 80 within the container to port 8080 on the host: docker run -dit -p 8080:80 ubuntu /bin/bash The command will output a container ID. This ID will be much longer than you're accustomed to seeing, because when we run docker ps -a, it only shows shortened container IDs. You don't need to use the entire container ID when you attach; you can simply use part of it, so long as it's long enough to be different from other IDs—like this: docker attach dfb3e Here, I've attached to a container with an ID that begins with dfb3e. I'm now attached to a Bash shell within the container. Let's install Apache. We've done this before, but to keep it simple, just install the apache2 package within your container, we don't need to worry about configuring the default sample web page or making it look nice. We just want to verify that it works. Apache should now be installed within the container. In my tests, the apache2 daemon wasn't automatically started as it would've been on a real server instance. Since the latest container available on Docker Hub for Ubuntu hasn't yet been upgraded to 16.04 at the time of writing this (it's currently 14.04), the systemctl command won't work, so we'll need to use the legacy start command for Apache: # /etc/init.d/apache2 start We can similarly check the status, to make sure it's running: # /etc/init.d/apache2 status Apache should be running within the container. Now, press CTRL + P and then CTRL + Q to exit the container, but allow it to keep running in the background. You should be able to visit the sample Apache web page for the container by navigating to localhost:8080 in your web browser. You should see the default "It works!" page that comes with Apache. Congratulations, you're officially running an application within a container! Before we continue, think for a moment of all the use cases you can use Docker for. It may seem like a very simple concept (and it is), but it allows you to do some very powerful things. I'll give you a personal example. At a previous job, I worked with some embedded Linux software engineers, who each had their preferred Linux distribution to run on their workstation computers. Some preferred Ubuntu, others preferred Debian, and a few even ran Gentoo. For developers, this poses a problem—the build tools are different in each distribution, because they all ship different versions of all development packages. The application they developed was only known to compile in Debian, and newer versions of the GNU Compiler Collection (GCC) compiler posed a problem for the application. My solution was to provide each developer a Docker container based on Debian, with all the build tools baked in that they needed to perform their job. At this point, it no longer mattered which distribution they ran on their workstations. The container was the same no matter what they were running. I'm sure there are some clever use cases you can come up with. Anyway, back to our Apache container: it's now running happily in the background, responding to HTTP requests over port 8080 on the host. But, what should we do with it at this point? One thing we can do is create our own image from it. Before we do, we should configure Apache to automatically start when the container is started. We'll do this a bit differently inside the container than we would on an actual Ubuntu server. Attach to the container, and open the /etc/bash.bashrc file in a text editor within the container. Add the following to the very end of the file: /etc/init.d/apache2 start Save the file, and exit your editor. Exit the container with the CTRL + P and CTRL + Q key combinations. We can now create a new image of the container with the docker commit command: docker commit <Container ID> ubuntu:apache-server This command will return to us the ID of our new image. To view all the Docker images available on our machine, we can run the docker images command to have Docker return a list. You should see the original Ubuntu image we downloaded, along with the one we just created. We'll first see a column for the repository the image came from. In our case, it's Ubuntu. Next, we can see the tag. Our original Ubuntu image (the one we used docker pull command to download) has a tag of latest. We didn't specify that when we first downloaded it, it just defaulted to latest. In addition, we see an image ID for both, as well as the size. To create a new container from our new image, we just need to use docker run but specify the tag and name of our new image. Note that we may already have a container listening on port 8080, so this command may fail if that container hasn't been stopped: docker run -dit -p 8080:80 ubuntu:apache-server /bin/bash Speaking of stopping a container, I should probably show you how to do that as well. As you could probably guess, the command is docker stop followed by a container ID. This will send the SIGTERM signal to the container, followed by SIGKILL if it doesn't stop on its own after a delay: docker stop <Container ID> To remove a container, issue the docker rm command followed by a container ID. Normally, this will not remove a running container, but it will if you add the -f option. You can remove more than one docker container at a time by adding additional container IDs to the command, with a space separating each. Keep in mind that you'll lose any unsaved changes within your container if you haven't committed the container to an image yet: docker rm <Container ID> The docker rm command will not remove images. If you want to remove a docker image, use the docker rmi command followed by an image ID. You can run the docker image command to view images stored on your server, so you can easily fetch the ID of the image you want to remove. You can also use the repository and tag name, such as ubuntu:apache-server, instead of the image ID. If the image is in use, you can force its removal with the -f option: docker rmi <Image ID> Before we conclude our look into Docker, there's another related concept you'll definitely want to check out: Dockerfiles. A Dockerfile is a neat way of automating the building of docker images, by creating a text file with a set of instructions for their creation. The easiest way to set up a Dockerfile is to create a directory, preferably with a descriptive name for the image you'd like to create (you can name it whatever you wish, though) and inside it create a file named Dockerfile. Following is a sample—copy this text into your Dockerfile and we'll look at how it works: FROM ubuntu MAINTAINER Jay <jay@somewhere.net> # Update the container's packages RUN apt-get update; apt-get dist-upgrade # Install apache2 and vim RUN apt-get install -y apache2 vim # Make Apache automatically start-up` RUN echo "/etc/init.d/apache2 start" >> /etc/bash.bashrc Let's go through this Dockerfile line by line to get a better understanding of what it's doing: FROM ubuntu We need an image to base our new image on, so we're using Ubuntu as a base. This will cause Docker to download the ubuntu:latest image from Docker Hub if we don't already have it downloaded: MAINTAINER Jay <myemail@somewhere.net> Here, we're setting the maintainer of the image. Basically, we're declaring its author: # Update the container's packages Lines beginning with a hash symbol (#) are ignored, so we are able to create comments within the Dockerfile. This is recommended to give others a good idea of what your Dockerfile does: RUN apt-get update; apt-get dist-upgrade -y With the RUN command, we're telling Docker to run a specific command while the image is being created. In this case, we're updating the image's repository index and performing a full package update to ensure the resulting image is as fresh as can be. The -y option is provided to suppress any requests for confirmation while the command runs: RUN apt-get install -y apache2 vim Next, we're installing both apache2 and vim. The vim package isn't required, but I personally like to make sure all of my servers and containers have it installed. I mainly included it here to show you that you can install multiple packages in one line: RUN echo "/etc/init.d/apache2 start" >> /etc/bash.bashrc Earlier, we copied the startup command for the apache2 daemon into the /etc/bash.bashrc file. We're including that here so that we won't have to do this ourselves when containers are crated from the image. To build the image, we can use the docker build command, which can be executed from within the directory that contains the Dockerfile. What follows is an example of using the docker build command to create an image tagged packt:apache-server: docker build -t packt:apache-server Once you run this command, you'll see Docker create the image for you, running each of the commands you asked it to. The image will be set up just the way you like. Basically, we just automated the entire creation of the Apache container we used as an example in this section. Once this is complete, we can create a container from our new image: docker run -dit -p 8080:80 packt:apache-server /bin/bash Almost immediately after running the container, the sample Apache site will be available on the host. With a Dockerfile, you'll be able to automate the creation of your Docker images. There's much more you can do with Dockerfiles though; feel free to peruse Docker's official documentation to learn more. Summary In this article, we took a look at virtualization as well as containerization. We began by walking through the installation of KVM as well as all the configuration required to get our virtualization server up and running. We also took a look at Docker, which is a great way of virtualizing individual applications rather than entire servers. We installed Docker on our server, and we walked through managing containers by pulling down an image from Docker Hub, customizing our own images, and creating Dockerfiles to automate the deployment of Docker images. We also went over many of the popular Docker commands to manage our containers. Resources for Article: Further resources on this subject: Configuring and Administering the Ubuntu Server[article] Network Based Ubuntu Installations[article] Making the most of Ubuntu through Windows Proxies[article]

0
0
25196

article-image-scientific-analysis-of-donald-trumps-tweets-on-covid-19-with-transformers

Expert Network

19 May 2021

7 min read

Scientific Analysis of Donald Trump’s Tweets on COVID-19 with Transformers

Expert Network

19 May 2021

7 min read

It takes time and effort to figure out what is fake news and what isn't. Like children, we have to work our way through something we perceive as fake news. This article is an excerpt from the book Transformers for Natural Language Processing by Denis Rothman – A comprehensive guide for deep learning & NLP practitioners, data analysts and data scientists who want an introduction to AI language understanding to process the increasing amounts of language-driven functions. In this article, we will focus on the logic of fake news. We will run the BERT model on SRL and visualize the results on AllenNLP.org. Now, let's go through some presidential tweets on COVID-19. Our goal is certainly not to judge anybody or anything. Fake news involves both opinion and facts. News often depends on the perception of facts by local culture. We will provide ideas and tools to help others gather more information on a topic and find their way in the jungle of information we receive every day. Semantic Role Labeling (SRL) SRL is an excellent educational tool for all of us. We tend just to read Tweets passively and listen to what others say about them. Breaking messages down with SRL is a good way to develop social media analytical skills to distinguish fake from accurate information. I recommend using SRL transformers for educational purposes in class. A young student can enter a Tweet and analyze each verb and its arguments. It could help younger generations become active readers on social media. We will first analyze a relatively undivided Tweet and then a conflictual Tweet. Analyzing the undivided Tweet Let's analyze the latest Tweet found on July 4 while writing the book, Transformers for Natural Language Processing. I took the name of the person who is referred to as a "Black American" out and paraphrased some of the former President's text: "X is a great American, is hospitalized with coronavirus, and has requested prayer. Would you join me in praying for him today, as well as all those who are suffering from COVID-19?" Let's go to AllenNLP.org, visualize our SRL using https://demo.allennlp.org/semantic-role-labeling, run the sentence, and look at the result. The verb "hospitalized" shows the member is staying close to the facts: Figure: SRL arguments of the verb "hospitalized" The message is simple: "X" + "hospitalized" + "coronavirus." The verb "requested" shows that the message is becoming political: Figure: SRL arguments of the verb "requested" We don't know if the person requested the former President to pray or he decided he would be the center of the request. A good exercise would be to display an HTML page and ask the users what they think. For example, the users could be asked to look at the results of the SRL task and answer the two following questions: "Was former President Trump asked to pray, or did he deviate a request made to others for political reasons?" "Is the fact that former President Trump states that he was indirectly asked to pray for X fake news or not?" You can think about it and decide for yourself! Analyzing the Banned Tweet Let's have a look at one that was banned from Twitter. I took the names out and paraphrased it and toned it down. Still, when we run it on AllenNLP.org and visualize the results, we get some surprising SRL outputs. Here is the toned-down and paraphrased Tweet: These thugs are dishonoring the memory of X. When the looting starts, actions must be taken. Although I suppressed the main part of the original Tweet, we can see that the SRL task shows the bad associations made in the Tweet: Figure: SRL arguments of the verb "dishonoring" An educational approach to this would be to explain that we should not associate the arguments "thugs" and "memory" and "looting." They do not fit together at all. An important exercise would be to ask a user why the SRL arguments do not fit together. I recommend many such exercises so that the transformer model users develop SRL skills to have a critical view of any topic presented to them. Critical thinking is the best way to stop the propagation of the fake news pandemic! We have gone through rational approaches to fake news with transformers, heuristics, and instructive websites. However, in the end, a lot of the heat in fake news debates boils down to emotional and irrational reactions. In a world of opinion, you will never find an entirely objective transformer model that detects fake news since opposing sides never agree on what the truth is in the first place! One side will agree with the transformer model's output. Another will say that the model is biased and built by enemies of their opinion! The best approach is to listen to others and try to keep the heat down! Looking for the silver bullet Looking for a silver bullet transformer model can be time-consuming or rewarding, depending on how much time and money you want to spend on continually changing models. For example, a new approach to transformers can be found through disentanglement. Disentanglement in AI allows you to separate the features of a representation to make the training process more flexible. Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen designed DeBERTa, a disentangled version of a transformer, and described the model in an interesting article: DeBERTa: Decoding-enhanced BERT with Disentangled Attention, https://arxiv.org/ abs/2006.03654 The two main ideas implemented in DeBERTa are: Disentangle the content and position in the transformer model to train the two vectors separately. Use an absolute position in thedecoderto predict masked tokens in the pretraining process. The authors provide the code on GitHub: https://github.com/microsoft/DeBERTa DeBERTa exceeds the human baseline on the SuperGLUE leaderboard in December 2020 using 1.5B parameters. Should you stop everything you are doing on transformers and rush to this model, integrate your data, train the model, test it, and implement it? It is very probable that by the end of 2021, another model will beat this one and so on. Should you change models all of the time in production? That will be your decision. You can also choose to design better training methods. Looking for reliable training methods Looking for reliable training methods with smaller models such as the PET designed by Timo Schick can also be a solution. Why? Being in a good position on the SuperGLUE leaderboard does not mean that the model will provide a high quality of decision-making for medical, legal, and other critical areas for sequence predications. Looking for customized training solutions for a specific topic could be more productive than trying all the best transformers on the SuperGLUE leaderboard. Take your time to think about implementing transformers to find the best approach for your project. We will now conclude the article. Summary Fake news begins deep inside our emotional history as humans. When an event occurs, emotions take over to help us react quickly to a situation. We are hardwired to react strongly when we are threatened. We went through raging conflicts over COVID-19, former President Trump, and climate change. In each case, we saw that emotional reactions are the fastest ones to build up into conflicts. We then designed a roadmap to take the emotional perception of fake news to a rational level. We showed that it is possible to find key information in Tweets, Facebook messages, and other media. The news used in this article is perceived by some as real news and others as fake news to create a rationale for teachers, parents, friends, co-workers, or just people talking. About the Author Denis Rothman graduated from Sorbonne University and Paris-Diderot University, patenting one of the very first word2matrix embedding solutions. Denis Rothman is the author of three cutting-edge AI solutions: one of the first AI cognitive chatbots more than 30 years ago; a profit-orientated AI resource optimizing system; and an AI APS (Advanced Planning and Scheduling) solution based on cognitive patterns used worldwide in aerospace, rail, energy, apparel, and many other fields. Designed initially as a cognitive AI bot for IBM, it then went on to become a robust APS solution used to this day.

0
0
25196

article-image-hands-on-tutorial-on-how-to-use-pinecone-with-langchain

Alan Bernardo Palacio

21 Aug 2023

17 min read

Hands-On tutorial on how to use Pinecone with LangChain

Alan Bernardo Palacio

21 Aug 2023

17 min read

A vector database stores high-dimensional vectors and mathematical representations of attributes. Each vector holds dimensions ranging from tens to thousands, enhancing data richness. It operationalizes embedding models, aiding application development with resource management, security, scalability, and query efficiency. Pinecone, a vector database, enables a quick semantic search of vectors. Integrating OpenAI’s LLMs with Pinecone merges deep learning-based embedding generation with efficient storage and retrieval, facilitating real-time recommendation and search systems. Pinecone acts as long-term memory for large language models like OpenAI’s GPT-4.IntroductionThis tutorial will guide you through the process of integrating Pinecone, a high-performance vector database, with LangChain, a framework for building applications powered by large language models (LLMs). Pinecone enables developers to build scalable, real-time recommendation and search systems based on vector similarity search.PrerequisitesBefore you begin this tutorial, you should have the following:A Pinecone accountA LangChain accountA basic understanding of PythonPinecone basicsAs a starter, we will get familiarized with the use of Pinecone by exploring its basic functionalities of it. Remember to get the Pinecone access key.Here is a step-by-step guide on how to set up and use Pinecone, a cloud-native vector database that provides long-term memory for AI applications, especially those involving large language models, generative AI, and semantic search.Initialize Pinecone clientWe will use the Pinecone client, so this step is only necessary if you don’t have it installed already.pip install pinecone-clientTo use Pinecone, you must have an API key. You can find your API key in the Pinecone console under the "API Keys" section. Note both your API key and your environment. To verify that your Pinecone API key works, use the following command:import pinecone pinecone.init(api_key="YOUR_API_KEY", environment="YOUR_ENVIRONMENT")If you don't receive an error message, then your API key is valid. This will also initialize the Pinecone session.Creating and retrieving indexesThe commands below create an index named "quickstart" that performs an approximate nearest-neighbor search using the Euclidean distance metric for 8-dimensional vectors.pinecone.create_index("quickstart", dimension=8, metric="euclidean")The Index creation takes roughly a minute.Once your index is created, its name appears in the index list. Use the following command to return a list of your indexes.pinecone.list_indexes()Before you can query your index, you must connect to the index.index = pinecone.Index("quickstart")Now that you have created your index, you can start to insert data into it.Insert the dataTo ingest vectors into your index, use the upsert operation, which inserts a new vector into the index or updates the vector if a vector with the same ID is already present. The following commands upsert 5 8-dimensional vectors into your index.index.upsert([ ("A", [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]), ("B", [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]), ("C", [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3]), ("D", [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]), ("E", [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]) ])You can get statistics about your index, like the dimensions, the usage, and the vector count. To do this, you can use the following command to return statistics about the contents of your index.index.describe_index_stats()This will return a dictionary with information about your index:Now that you have created an index and inserted data into it, we can query the database to retrieve vectors based on their similarity.Query the index and get similar vectorsThe following example queries the index for the three vectors that are most similar to an example 8-dimensional vector using the Euclidean distance metric specified above.index.query( vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3], top_k=3, include_values=True )This command will return the first 3 vectors stored in this index that have the lowest Euclidian distance:Once you no longer need the index, use the delete_index operation to delete it.pinecone.delete_index("quickstart")By following these steps, you can set up a Pinecone vector database in just a few minutes. This will help you provide long-term memory for your high-performance AI applications without any infrastructure hassles.Now, let’s take a look at a bit more complex example, in which we embed text data and insert it into Pinecone.Preparing and Processing the DataIn this section, we will create a context for large language models (LLMs) using the OpenAI API. We will walk through the different parts of a Python script, understanding the purpose and function of each code block. The ultimate aim is to transform data into larger chunks of around 500 tokens, ensuring that the dataset is ordered sequentially.SetupFirst, we install the necessary libraries for our script. We're going to use OpenAI for AI models, pandas for data manipulation, and transformers for tokenization.!pip install openai pandas transformersAfter the installations, we import the necessary modules for our script.import pandas as pd import openaiBefore you can interact with OpenAI, you need to provide your API key. Make sure to replace <<YOUR_API_KEY>> with your actual API key.openai.api_key = ('<<YOUR_API_KEY>>')Now we are ready to start processing the data to be embedded and stored in Pinecone.Data transformationWe use pandas to load JSON data files related to different technologies (HuggingFace, PyTorch, TensorFlow, Streamlit). These files seem to contain questions and answers related to their respective topics and are based on the data in the Pinecone documentation. First, we will concatenate these data frames into one for easier manipulation.hf = pd.read_json('data/huggingface-qa.jsonl', lines=True) pt = pd.read_json('data/pytorch-qa.jsonl', lines=True) tf = pd.read_json('data/tensorflow-qa.jsonl', lines=True) sl = pd.read_json('data/streamlit-qa.jsonl', lines=True) df = pd.concat([hf, pt, tf, sl], ignore_index=True) df.head()We can see the data here:Next, we define a function to remove new lines and unnecessary spaces in our text data. The function remove_newlines takes a pandas Series object and performs several replace operations to clean the text.def remove_newlines(serie): serie = serie.str.replace('\\\\n', ' ', regex=False) serie = serie.str.replace('\\\\\\\\n', ' ', regex=False) serie = serie.str.replace(' ',' ', regex=False) serie = serie.str.replace(' ',' ', regex=False) return serieWe transform the text in our dataframe into a single string format combining the 'docs', 'category', 'thread', 'question', and 'context' columns.df['text'] = "Topic: " + df.docs + " - " + df.category + "; Question: " + df.thread + " - " + df.question + "; Answer: " + df.context df['text'] = remove_newlines(df.text)TokenizationWe use the HuggingFace transformers library to tokenize our text. The GPT2 tokenizer is used, and the number of tokens for each text string is stored in a new column 'n_tokens'.from transformers import GPT2TokenizerFast tokenizer = GPT2TokenizerFast.from_pretrained("gpt2") df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))We filter out rows in our data frame where the number of tokens exceeds 2000.df = df[df.n_tokens < 2000]Now we can finally embed the data using the OpenAI API.from openai.embeddings_utils import get_embedding size = 'curie' df['embeddings'] = df.text.apply(lambda x: get_embedding(x, engine=f'text-search-{size}-doc-001')) df.head()We will be using the text-search-curie-doc-001' Open AI engine to create the embeddings, which is very capable, faster, and lower cost than Davinci:So far, we've prepared our data for subsequent processing. In the next parts of the tutorial, we will cover obtaining embeddings from the OpenAI API and using them with the Pinecone vector database.Next, we will initialize the Pinecone index, create text embeddings using the OpenAI API and insert them into Pinecone.Initializing the Index and Uploading Data to PineconeThe second part of the tutorial aims to take the data that was prepared previously and upload them to the Pinecone vector database. This would allow these embeddings to be queried for similarity, providing a means to use contextual information from a larger set of data than what an LLM can handle at once.Checking for Large Text DataThe maximum size limit for metadata in Pinecone is 5KB, so we check if any 'text' field items are larger than this.from sys import getsizeof too_big = [] for text in df['text'].tolist(): if getsizeof(text) > 5000: too_big.append((text, getsizeof(text))) print(f"{len(too_big)} / {len(df)} records are too big")This will filter out the entries whose metadata is larger than the one Pinecone can manage. The next step is to create a unique identifier for the records.There are several records with text data larger than the Pinecone limit, so we assign a unique ID to each record in the DataFrame.df['id'] = [str(i) for i in range(len(df))] df.head()This ID can be used to retrieve the original text later:Now we can start with the initialization of the index in Pinecone and insert the data.Pinecone Initialization and Index CreationNext, Pinecone is initialized with the API key, and an index is created if it doesn't already exist. The name of the index is 'beyond-search-openai', and its dimension matches the length of the embeddings. The metric used for similarity search is cosine.import pinecone pinecone.init( api_key='PINECONE_API_KEY', environment="YOUR_ENV" ) index_name = 'beyond-search-openai' if not index_name in pinecone.list_indexes(): pinecone.create_index( index_name, dimension=len(df['embeddings'].tolist()[0]), metric='cosine' ) index = pinecone.Index(index_name)Now that we have created the index, we can proceed to insert the data. The index will be populated in batches of 32. Relevant metadata (like 'docs', 'category', 'thread', and 'href') is also included with each item. We will use tqdm to create a progress bar for the progress of the insertion.from tqdm.auto import tqdm batch_size = 32 for i in tqdm(range(0, len(df), batch_size)): i_end = min(i+batch_size, len(df)) df_slice = df.iloc[i:i_end] to_upsert = [ ( row['id'], row['embeddings'], { 'docs': row['docs'], 'category': row['category'], 'thread': row['thread'], 'href': row['href'], 'n_tokens': row['n_tokens'] } ) for _, row in df_slice.iterrows() ] index.upsert(vectors=to_upsert)This will insert the records into the database to be used later on in the process:Finally, the ID-to-text mappings are saved into a JSON file. This would allow us to retrieve the original text associated with an ID later on.mappings = {row['id']: row['text'] for _, row in df[['id', 'text']].iterrows()} import json with open('data/mapping.json', 'w') as fp: json.dump(mappings, fp)Now the Pinecone vector database should now be populated and ready for querying. Next, we will use this information to provide context to a question answering LLM.Querying and Answering QuestionsThe final part of the tutorial involves querying the Pinecone vector database with questions, retrieving the most relevant context embeddings, and using OpenAI's API to generate an answer to the question based on the retrieved contexts.OpenAI Embedding GenerationThe OpenAI API is used to create embeddings for the question.from openai.embeddings_utils import get_embedding q_embeddings = get_embedding( 'how to use gradient tape in tensorflow', engine=f'text-search-curie-query-001' )A function create_context is defined to use the OpenAI API to create a query embedding, retrieve the most relevant context embeddings from Pinecone, and append these contexts into a larger string ready for feeding into OpenAI's next generation step.from openai.embeddings_utils import get_embedding def create_context(question, index, max_len=3750, size="curie"): q_embed = get_embedding(question, engine=f'text-search-{size}-query-001') res = index.query(q_embed, top_k=5, include_metadata=True) cur_len = 0 contexts = [] for row in res['matches']: text = mappings[row['id']] cur_len += row['metadata']['n_tokens'] + 4 if cur_len < max_len: contexts.append(text) else: cur_len -= row['metadata']['n_tokens'] + 4 if max_len - cur_len < 200: break return "\\\\n\\\\n###\\\\n\\\\n".join(contexts) We can now use this function to retrieve the context necessary based on a given question, as the question is embedded and the relevant context is retrieved from the Pinecone database:Now we are ready to start passing the context to a question-answering model.Querying and AnsweringWe start by defining the parameters that will take during the query, specifically the model we will be using, the maximum token length and other parameters. We can also define given instructions to the model which will be used to constrain the results we can get..fine_tuned_qa_model="text-davinci-002" instruction=""" Answer the question based on the context below, and if the question can't be answered based on the context, say \\"I don't know\\"\\n\\nContext:\\n{0}\\n\\n---\\n\\nQuestion: {1}\\nAnswer:""" max_len=3550 size="curie" max_tokens=400 stop_sequence=None domains=["huggingface", "tensorflow", "streamlit", "pytorch"]Different instruction formats can be defined. We will start now making some simple questions and seeing what the results look like.question="What is Tensorflow" context = create_context( question, index, max_len=max_len, size=size, ) try: # fine-tuned models requires model parameter, whereas other models require engine parameter model_param = ( {"model": fine_tuned_qa_model} if ":" in fine_tuned_qa_model and fine_tuned_qa_model.split(":")[1].startswith("ft") else {"engine": fine_tuned_qa_model} ) #print(instruction.format(context, question)) response = openai.Completion.create( prompt=instruction.format(context, question), temperature=0, max_tokens=max_tokens, top_p=1, frequency_penalty=0, presence_penalty=0, stop=stop_sequence, **model_param, ) print( response["choices"][0]["text"].strip()) except Exception as e: print(e)We can see that it's giving us the proper results using the context that it's retrieving from Pinecone:We can also inquire about Pytorch:question="What is Pytorch" context = create_context( question, index, max_len=max_len, size=size, ) try: # fine-tuned models requires model parameter, whereas other models require engine parameter model_param = ( {"model": fine_tuned_qa_model} if ":" in fine_tuned_qa_model and fine_tuned_qa_model.split(":")[1].startswith("ft") else {"engine": fine_tuned_qa_model} ) #print(instruction.format(context, question)) response = openai.Completion.create( prompt=instruction.format(context, question), temperature=0, max_tokens=max_tokens, top_p=1, frequency_penalty=0, presence_penalty=0, stop=stop_sequence, **model_param, ) print( response["choices"][0]["text"].strip()) except Exception as e: print(e)The results keep being consistent with the context provided:Now we can try to go beyond the capabilities of the context by pushing the boundaries a bit more.question="Am I allowed to publish model outputs to Twitter, without a human review?" context = create_context( question, index, max_len=max_len, size=size, ) try: # fine-tuned models requires model parameter, whereas other models require engine parameter model_param = ( {"model": fine_tuned_qa_model} if ":" in fine_tuned_qa_model and fine_tuned_qa_model.split(":")[1].startswith("ft") else {"engine": fine_tuned_qa_model} ) #print(instruction.format(context, question)) response = openai.Completion.create( prompt=instruction.format(context, question), temperature=0, max_tokens=max_tokens, top_p=1, frequency_penalty=0, presence_penalty=0, stop=stop_sequence, **model_param, ) print( response["choices"][0]["text"].strip()) except Exception as e: print(e)We can see in the results that the model is working according to the instructions provided as we don’t have any context on Twitter:Lastly, the Pinecone index is deleted to free up resources.pinecone.delete_index(index_name)ConclusionThis tutorial provided a comprehensive guide to harnessing Pinecone, OpenAI's language models, and HuggingFace's library for advanced question-answering. We introduced Pinecone's vector search engine, explored data preparation, embedding generation, and data uploading. Creating a question-answering model using OpenAI's API concluded the process. The tutorial showcased how the synergy of vector search engines, language models, and text processing can revolutionize information retrieval. This holistic approach holds potential for developing AI-powered applications in various domains, from customer service chatbots to research assistants and beyond.Author Bio:Alan Bernardo Palacio is a data scientist and an engineer with vast experience in different engineering fields. His focus has been the development and application of state-of-the-art data products and algorithms in several industries. He has worked for companies such as Ernst and Young, Globant, and now holds a data engineer position at Ebiquity Media helping the company to create a scalable data pipeline. Alan graduated with a Mechanical Engineering degree from the National University of Tucuman in 2015, participated as the founder in startups, and later on earned a Master's degree from the faculty of Mathematics in the Autonomous University of Barcelona in 2017. Originally from Argentina, he now works and resides in the Netherlands.LinkedIn

0
0
25167

article-image-time-facebook-twitter-other-social-media-take-responsibility-or-face-regulation

Sugandha Lahoti

01 Aug 2018

9 min read

Time for Facebook, Twitter and other social media to take responsibility or face regulation

Sugandha Lahoti

01 Aug 2018

9 min read

Of late, the world has been shaken over the rising number of data related scandals and attacks that have overshadowed social media platforms. This shakedown was experienced in Wall Street last week when tech stocks came crashing down after Facebook’s Q2 earnings call on 25th July and then further down after Twitter’s earnings call on 27th July. Social media regulation is now at the heart of discussions across the tech sector. The social butterfly effect is real 2018 began with the Cambridge Analytica scandal where the data analytics company was alleged to have not only been influencing the outcome of UK and US Presidential elections but also of harvesting copious amounts of data from Facebook (illegally). Then Facebook fell down the rabbit hole with Muller’s indictment report that highlighted the role social media played in election interference in 2016. ‘Fake news’ on Whatsapp triggered mob violence in India while Twitter has been plagued with fake accounts and tweets that never seem to go away. Fake news and friends crash the tech stock party Last week, social media stocks fell in double digits (Facebook by 20% and Twitter by 21%) bringing down the entire tech sector; a fall that continues to keep tech stocks in a bearish market and haunt tech shareholders even today. Wall Street has been a nervous wreck this week hoping for the bad news to stop spirally downwards with good news from Apple to undo last week’s nightmare. Amidst these reports, lawmakers, regulators and organizations alike are facing greater pressure for regulation of social media platforms. How are lawmakers proposing to regulate social media? Even though lawmakers have started paying increased attention to social networks over the past year, there has been little progress made in terms of how much they actually understand them. This could soon change as Axios’ David McCabe published a policy paper from the office of Senator Mark Warner. This paper describes a comprehensive regulatory policy covering almost every aspect of social networks. The paper-proposal is designed to address three broad categories: combating misinformation, privacy and data protection, and promoting competition in tech space. Misinformation, disinformation, and the exploitation of technology covers ideas such as: Networks are to label automated bots. Platforms are to verify identities, Platforms are to make regular disclosures about how many fake accounts they’ve deleted. Platforms are to create APIs for academic research. Privacy and data protection include policies such as: Create a US version of the GDPR. Designate platforms as information fiduciaries with the legal responsibility of protecting user’s data. Empowering the Federal Trade Commission to make rules around data privacy. Create a legislative ban on dark patterns that trick users into accepting terms and conditions without reading them. Allow the government to audit corporate algorithms. Promoting competition in tech space that requires: Tech companies to continuously disclose to consumers how their data is being used. Social network data to be made portable. Social networks to be interoperable. Designate certain products as essential facilities and demand that third parties get fair access to them. Although these proposals and more of them (British parliamentary committee recommended imposing much stricter guidelines on social networks) remain far from becoming the law, they are an assurance that legal firms and lawmakers are serious about taking steps to ensure that social media platforms don’t go out of hand. Taking measures to ensure data regulations by lawmakers and legal authorities is only effective if the platforms themselves care about the issues themselves and are motivated to behave in the right way. Losing a significant chunk of their user base in EU lately seems to have provided that very incentive. Social network platforms, themselves have now started seeking ways to protecting user data and improve their platforms in general to alleviate some of the problems they helped create or amplify. How is Facebook planning to course correct it’s social media Frankenstein? Last week, Mark Zuckerberg started the fated earnings call by saying, “I want to start by talking about all the investments we've made over the last six months to improve safety, security, and privacy across our services. This has been a lot of hard work, and it's starting to pay off.” He then goes on to elaborate key areas of focus for Facebook in the coming months, the next 1.5 years to be more specific. Ad transparency tools: All ads can be viewed by anyone, even if they are not targeted at them. Facebook is also developing an archive of ads with political or issue content which will be labeled to show who paid for them, what the budget was and how many people viewed the ads, and will also allow one to search ads by an advertiser for the past 7 years. Disallow and report known election interference attempts: Facebook will proactively look for and eliminate fake accounts, pages, and groups that violated their policies. This could minimize election interference, says Zuckerberg. Fight against misinformation: Remove the financial incentives for spammers to create fake news. Stop pages that repeatedly spread false information from buying ads. Shift from reactive to proactive detection with AI: Use AI to prevent fake accounts that generate a lot of the problematic content from ever being created in the first place. They can now remove more bad content quickly because we don't have to wait until after it's reported. In Q1, for example, almost 90% of graphic violence content that Facebook removed or added a warning label to was identified using AI. Invest heavily in security and privacy. No further elaboration on this aspect was given on the call. This week, Facebook reported that they’d detected and removed 32 pages and fake accounts that had engaged in a coordinated inauthentic behavior. These accounts and pages were of a political influence campaign that was potentially built to disrupt the midterm elections. According to Facebook’s Head of Cybersecurity, Nathaniel Gleicher, “So far, the activity encompasses eight Facebook Pages, 17 profiles and seven accounts on Instagram.” Facebook’s action is a change from last year when it was widely criticized for failing to detect Russian interference in the 2016 presidential election. Although the current campaign hasn’t been linked to Russia (yet), Facebook officials pointed out that some of the tools and techniques used by the accounts were similar to those used by the Russian government-linked Internet Research Agency. How Twitter plans to make its platform a better place for real and civilized conversation “We want people to feel safe freely expressing themselves and have launched new tools to address problem behaviors that distort and distract from the public conversation. We’re also continuing to make it easier for people to find and follow breaking news and events…” said Jack Dorsey, Twitter's CEO, at Q2 2018 Earnings call. The letter to Twitter shareholders further elaborates on this point: We continue to invest in improving the health of the public conversation on Twitter, making the service better by integrating new behavioral signals to remove spammy and suspicious accounts and continuing to prioritize the long-term health of the platform over near-term metrics. We also acquired Smyte, a company that specializes in spam prevention, safety, and security. Unlike Facebook’s explanatory anecdotal support for the claims made, Twitter provided quantitative evidence to show the seriousness of their endeavor. Here are some key metrics from the shareholders’ letter this quarter. Results from early experiments on using new tools to address behaviors that distort and distract from the public conversation show a 4% drop in abuse reports from search and 8% fewer abuse reports from conversations More than 9 million potentially spammy or automated accounts identified and challenged per week 8k fewer average spam reports per day Removing more than 2x the number of accounts for violating Twitter’s spam policies than they did last year It is clear that Twitter has been quite active when it comes to looking for ways to eliminate toxicity from the website’s network. CEO Jack Dorsey in a series of tweets stated that the company did not always meet users’ expectations. “We aren’t proud of how people have taken advantage of our service, or our inability to address it fast enough, with the company needing a “systemic framework.” Back in March 2018, Twitter invited external experts, to measure the health of the company in order to encourage a more healthy conversation, debate, and critical thinking. Twitter asked them to create proposals taking inspiration from the concept of measuring conversation health defined by a non-profit firm Cortico. As of yesterday, they now have their dream team of researchers finalized and ready to take up the challenge of identifying echo chambers on Twitter for unhealthy behavior and then translating their findings into practical algorithms down the line. [dropcap]W[/dropcap]ith social media here to stay, both lawmakers and social media platforms are looking for new ways to regulate. Any misstep by these social media sites will have solid repercussions which include not only closer scrutiny by the government and private watchdogs but also losing out on stock value, a bad reputation, as well as being linked to other forms of data misuse and accusations of political bias. Lastly, let’s not forget the responsibility that lies with the ‘social’ side of these platforms. Individuals need to play their part in being proactive in reporting fake news and stories, and they also need to be more selective about the content they share on social. Why Wall Street unfriended Facebook: Stocks fell $120 billion in market value after Q2 2018 earnings call Facebook must stop discriminatory advertising in the US, declares Washington AG, Ferguson Facebook is investigating data analytics firm Crimson Hexagon over misuse of data

0
0
25163

Toy Bin

Building Voice Technology on IoT Projects

Laravel 5.0 Essentials

Massive Graphs on Big Data

Key Takeaways from Sundar Pichai’s Congress hearing over user data, political bias, and Project Dragonfly

What is domain driven design?

Ansible 2 for automating networking tasks on Google Cloud Platform [Tutorial]

Introduction to the Nmap Scripting Engine

Limits of Game Data Analysis

Using front controllers to create a new page

Trending Topics

5 nation joint Activity Alert Report finds most threat actors use publicly available tools for cyber attacks

Virtualizing Hosts and Applications

Scientific Analysis of Donald Trump’s Tweets on COVID-19 with Transformers

Hands-On tutorial on how to use Pinecone with LangChain

Time for Facebook, Twitter and other social media to take responsibility or face regulation

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access