How-To Tutorials - Game Development

370 Articles
Packt
28 Aug 2013
30 min read

Introduction to AI

Artificial Intelligence (AI)

Living organisms such as animals and humans have some sort of intelligence that helps us make decisions and act on them. Computers, on the other hand, are just electronic devices that can accept data, perform logical and mathematical operations at high speeds, and output the results. So, Artificial Intelligence (AI) is essentially the subject of making computers able to think and decide like living organisms to perform specific operations.

This is clearly a huge subject, but it is really important to understand the basics of AI being used in different domains. AI is just a general term; its implementations and applications are different for different purposes, solving different sets of problems.

Before we move on to game-specific techniques, we'll take a look at the following research areas in AI applications:

  • Computer vision: This is the ability to take visual input from sources such as videos and cameras, and analyze it to perform particular operations such as facial recognition, object recognition, and optical character recognition.
  • Natural language processing (NLP): This is the ability that allows a machine to read and understand languages as we normally write and speak them. The problem is that the languages we use today are difficult for machines to understand. There are many different ways to say the same thing, and the same sentence can have different meanings according to the context. NLP is an important step for machines, since they need to understand the languages and expressions we use before they can process them and respond accordingly. Fortunately, there's an enormous number of data sets available on the Web that can help researchers do automatic analysis of a language.
  • Common sense reasoning: This is a technique that our brains can easily use to draw answers even from the domains we don't fully understand.
Common sense knowledge is a usual and common way for us to attempt certain questions, since our brains can mix and interplay between the context, background knowledge, and language proficiency. But making machines apply such knowledge is very complex, and still a major challenge for researchers.

AI in games

Game AI needs to complement the quality of a game. For that we need to understand the fundamental requirement that every game must have. The answer should be easy. It is the fun factor. So, what makes a game fun to play? This is the subject of game design, and a good reference is The Art of Game Design by Jesse Schell. Let's attempt to tackle this question without going deep into game design topics. We'll find that a challenging game is indeed fun to play. Let me repeat: it's about making a game challenging. This means the game should be neither so difficult that it's impossible for the player to beat the opponent, nor too easy to win. Finding the right challenge level is the key to making a game fun to play.

And that's where the AI kicks in. The role of AI in games is to make them fun by providing challenging opponents to compete against, and interesting non-player characters (NPCs) that behave realistically inside the game world. So, the objective here is not to replicate the whole thought process of humans or animals, but to make the NPCs seem intelligent by reacting to the changing situations inside the game world in a way that makes sense to the player.

The reason that we don't want to make the AI system in games too computationally expensive is that the processing power required for AI calculations needs to be shared with other operations such as graphics rendering and physics simulation. Also, don't forget that they are all happening in real time, and it's really important to achieve a steady framerate throughout the game. There were even attempts to create a dedicated processor for AI calculations (AI Seek's Intia Processor).
With ever-increasing processing power, we now have more and more room for AI calculations. However, like all the other disciplines in game development, optimizing AI calculations remains a huge challenge for AI developers.

AI techniques

In this section, we'll walk through some of the AI techniques being used in different types of games. So, let's just take it as a crash course, before actually going into implementation. If you want to learn more about AI for games, there are some really great books out there, such as Programming Game AI by Example by Mat Buckland and Artificial Intelligence for Games by Ian Millington and John Funge. The AI Game Programming Wisdom series also contains a lot of useful resources and articles on the latest AI techniques.

Finite State Machines (FSM)

Finite State Machines (FSMs) can be considered one of the simplest AI models, and are commonly used in the majority of games. A state machine basically consists of a finite number of states that are connected in a graph by the transitions between them. A game entity starts with an initial state, and then looks out for the events and rules that will trigger a transition to another state. A game entity can be in exactly one state at any given time. For example, let's take a look at an AI guard character in a typical shooting game. Its states could be as simple as patrolling, chasing, and shooting.
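To make the guard example concrete, here is a minimal, engine-agnostic sketch in Python (not Unity script) of a table-driven state machine with the three states named above. The event names and the exact transitions are assumptions made for illustration; a real game would derive them from its own sensing code.

```python
from enum import Enum, auto


class GuardState(Enum):
    PATROL = auto()
    CHASE = auto()
    SHOOT = auto()


# (current state, event) -> next state.
# The event names are hypothetical labels chosen for this sketch.
TRANSITIONS = {
    (GuardState.PATROL, "player_in_sight"): GuardState.CHASE,
    (GuardState.CHASE, "close_enough_to_attack"): GuardState.SHOOT,
    (GuardState.CHASE, "lost_or_killed_player"): GuardState.PATROL,
    (GuardState.SHOOT, "player_out_of_range"): GuardState.CHASE,
    (GuardState.SHOOT, "lost_or_killed_player"): GuardState.PATROL,
}


class Guard:
    """A guard that is in exactly one state at any given time."""

    def __init__(self):
        self.state = GuardState.PATROL  # initial state

    def handle(self, event):
        # Events with no matching transition leave the state unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Keeping the transitions in a table rather than in nested if/else blocks is only one possible design; it tends to stay readable a little longer as states are added.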
[Figure: Simple FSM of an AI guard character]

There are basically four components in a simple FSM:

  • States: This component defines a set of states that a game entity or an NPC can choose from (patrol, chase, and shoot)
  • Transitions: This component defines relations between different states
  • Rules: This component is used to trigger a state transition (player on sight, close enough to attack, and lost/killed player)
  • Events: This component triggers a check of the rules (guard's visible area, distance to the player, and so on)

So, a monster in Quake 2 might have the following states: standing, walking, running, dodging, attacking, idle, and searching.

FSMs are widely used in game AI, especially because they are really easy to implement and more than enough for both simple and somewhat complex games. Using simple if/else statements or switch statements, we can easily implement an FSM. It can get messy, though, as we start to have more states and more transitions.

Random and probability in AI

Imagine an enemy bot in an FPS game that can always kill the player with a headshot, or an opponent in a racing game that always chooses the best route and overtakes without colliding with any obstacle. Such a level of intelligence will make the game so difficult that it becomes almost impossible to win. On the other hand, imagine an AI enemy that always chooses the same route to follow, or always tries to escape from the player. AI-controlled entities behaving the same way every time the player encounters them make the game predictable and easy to win.

Both of the previous situations obviously affect the fun aspect of the game, and make the player feel like the game is not challenging or fair enough anymore. One way to fix this sort of perfect AI and stupid AI is to introduce some errors in their intelligence. In games, randomness and probabilities are applied in the decision making process of AI calculations.
The following are the main situations in which we would want to let our AI entities make a random decision:

  • Non-intentional: Sometimes a game agent, or perhaps an NPC, might need to make a decision randomly, just because it doesn't have enough information to make a perfect decision, and/or it doesn't really matter what decision it makes. Simply making a decision randomly and hoping for the best result is the way to go in such a situation.
  • Intentional: This situation applies to perfect AI and stupid AI. As we discussed in the previous examples, we will need to add some randomness on purpose, just to make them more realistic, and also to match the difficulty level that the player is comfortable with. Such randomness and probability could be used for things such as hit probabilities, or plus or minus random damage on top of base damage.

Using randomness and probability we can add a sense of realistic uncertainty to our game and make our AI system somewhat unpredictable.

We can also use probability to define different classes of AI characters. Let's look at the hero characters from Defense of the Ancients (DotA), which is a popular action real-time strategy (RTS) game mode of Warcraft III. There are three categories of heroes based on the three main attributes: strength, intelligence, and agility. Strength is the measure of the physical power of the hero, while intelligence relates to how well the hero can control spells and magic. Agility defines a hero's ability to avoid attacks and attack quickly. An AI hero from the strength category will have the ability to do more damage during close combat, while an intelligence hero will have more chance of scoring higher damage using spells and magic. Carefully balancing the randomness and probability between different classes and heroes makes the game a lot more challenging, and makes DotA a lot of fun to play.
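The hit probability and "plus or minus random damage on top of base damage" ideas can be sketched in a few lines of Python. The function name, its parameters, and the numbers in the usage note are assumptions for illustration, not values from any particular game.

```python
import random


def roll_attack(base_damage, hit_chance, damage_spread, rng=random):
    """Return the damage dealt by one attack, or 0 if the attack misses.

    hit_chance is a probability in [0, 1]; damage_spread is the maximum
    amount added to or subtracted from base_damage on a hit.
    rng can be the random module or a seeded random.Random instance.
    """
    if rng.random() >= hit_chance:
        return 0  # intentional error: even a strong AI misses sometimes
    return base_damage + rng.randint(-damage_spread, damage_spread)
```

For example, `roll_attack(50, 0.8, 5, random.Random(7))` rolls an attack that hits roughly 80 percent of the time for 45 to 55 damage. Passing a seeded `random.Random` makes the rolls repeatable, which can help when tuning difficulty; different hero classes could simply call the function with different hit chances and spreads.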
The sensor system

Our AI characters need to know about their surroundings, and the world they are interacting with, in order to make a particular decision. Such information could be as follows:

  • Position of the player: This information is used to decide whether to attack or chase, or keep patrolling
  • Buildings and objects nearby: This information is used to hide or take cover
  • Player's health and its own health: This information is used to decide whether to retreat or advance
  • Location of resources on the map in an RTS game: This information is used to occupy and collect resources, required for constructing and producing other units

As you can see, it could vary a lot depending on the type of game we are trying to build. So, how do we collect that information?

Polling

One method to collect such information is polling. We can simply do if/else or switch checks in the FixedUpdate method of our AI character. The AI character just polls the information it is interested in from the game world, does the checks, and takes action accordingly. Polling works great if there aren't too many things to check. However, some characters might not need to poll the world states every frame. Different characters might require different polling rates. So, usually in larger games with more complex AI systems, we need to deploy an event-driven method using a global messaging system.

The messaging system

AI does decision making in response to the events in the world. The events are communicated between the AI entity and the player, the world, or the other AI entities through a messaging system. For example, when the player attacks an enemy unit from a group of patrol guards, the other AI units need to know about this incident as well, so that they can start searching for and attacking the player. If we were using the polling method, our AI entities would need to check the state of all the other AI entities in order to know about this incident.
But with an event-driven messaging system, we can implement this in a more manageable and scalable way. The AI characters interested in a particular event can be registered as listeners, and if that event happens, our messaging system will broadcast to all listeners. The AI entities can then proceed to take appropriate actions, or perform further checks.

The event-driven system does not necessarily provide a faster mechanism than polling. But it provides a convenient, central checking system that senses the world and informs the interested AI agents, rather than each individual agent having to check the same event in every frame. In reality, both polling and messaging systems are used together most of the time. For example, AI might poll for more detailed information when it receives an event from the messaging system.

Flocking, swarming, and herding

Many living beings such as birds, fish, insects, and land animals perform certain operations such as moving, hunting, and foraging in groups. They stay and hunt in groups, because it makes them stronger and safer from predators than pursuing goals individually. So, let's say you want a group of birds flocking, swarming around in the sky; it'll cost too much time and effort for animators to design the movement and animations of each bird. But if we apply some simple rules for each bird to follow, we can achieve emergent intelligence of the whole group with complex, global behavior.

One pioneer of this concept is Craig Reynolds, who presented such a flocking algorithm in his 1987 SIGGRAPH paper, Flocks, Herds, and Schools: A Distributed Behavioral Model. He coined the term "boid", which sounds like "bird" but refers to a "bird-like" object.
He proposed three simple rules to apply to each unit, which are as follows:

  • Separation: This rule is used to maintain a minimum distance from neighboring boids to avoid hitting them
  • Alignment: This rule is used to align itself with the average direction of its neighbors, and then move at the same velocity as them as a flock
  • Cohesion: This rule is used to steer toward the group's center of mass, so that the boid stays close to the rest of the flock

These three simple rules are all that we need to implement a realistic and fairly complex flocking behavior for birds. They can also be applied to group behaviors of any other entity type with little or no modification.

Path following and steering

Sometimes we want our AI characters to roam around in the game world, following a roughly guided or thoroughly defined path. For example, in a racing game, the AI opponents need to navigate on the road. Decision-making algorithms, such as the flocking boid algorithm discussed already, are only good at making decisions. In the end, it all comes down to dealing with actual movements and steering behaviors. Steering behaviors for AI characters have been a research topic for a couple of decades now. One notable paper in this field is Steering Behaviors for Autonomous Characters, again by Craig Reynolds, presented in 1999 at the Game Developers Conference (GDC). He categorized steering behaviors into the following three layers:

[Figure: Hierarchy of motion behaviors]

Let me quote the original example from his paper to understand these three layers:

"Consider, for example, some cowboys tending a herd of cattle out on the range. A cow wanders away from the herd. The trail boss tells a cowboy to fetch the stray. The cowboy says "giddy-up" to his horse, and guides it to the cow, possibly avoiding obstacles along the way. In this example, the trail boss represents action selection, noticing that the state of the world has changed (a cow left the herd), and setting a goal (retrieve the stray).
The steering level is represented by the cowboy who decomposes the goal into a series of simple sub goals (approach the cow, avoid obstacles, and retrieve the cow). A sub goal corresponds to a steering behavior for the cowboy-and-horse team. Using various control signals (vocal commands, spurs, and reins), the cowboy steers his horse towards the target. In general terms, these signals express concepts like go faster, go slower, turn right, turn left, and so on. The horse implements the locomotion level. Taking the cowboy's control signals as input, the horse moves in the indicated direction. This motion is the result of a complex interaction of the horse's visual perception, its sense of balance, and its muscles applying torques to the joints of its skeleton."

Then he presented how to design and implement some common and simple steering behaviors for individual AI characters and pairs. Such behaviors include seek and flee, pursue and evade, wander, arrival, obstacle avoidance, wall following, and path following.

A* pathfinding

There are many games where you can find monsters or enemies that follow the player, or go to a particular point while avoiding obstacles. For example, let's take a look at a typical RTS game. You can select a group of units and click a location where you want them to move or click on the enemy units to attack them. Your units then need to find a way to reach the goal without colliding with the obstacles. The enemy units also need to be able to do the same. Obstacles could be different for different units. For example, an air force unit might be able to pass over a mountain, while the ground or artillery units need to find a way around it.

A* (pronounced "A star") is a pathfinding algorithm widely used in games, because of its performance and accuracy. Let's take a look at an example to see how it works. Let's say we want our unit to move from point A to point B, but there's a wall in the way, and it can't go straight towards the target.
So, it needs to find a way to point B while avoiding the wall.

[Figure: Top-down view of our map]

We are looking at a simple 2D example, but the same idea can be applied to 3D environments. In order to find the path from point A to point B, we need to know more about the map, such as the position of obstacles. For that we can split our whole map into small tiles, representing the whole map in a grid format, as shown in the following figure:

[Figure: Map represented in a 2D grid]

The tiles can also be of other shapes such as hexagons and triangles. But we'll just use square tiles here, as that's quite simple and enough for our scenario. Representing the whole map in a grid simplifies the search area, and this is an important step in pathfinding. We can now reference our map in a small 2D array.

Our map is now represented by a 5 x 5 grid of square tiles with a total of 25 tiles. We can start searching for the best path to reach the target. How do we do this? By calculating the movement score of each valid tile adjacent to the starting tile (a valid tile is one not occupied by an obstacle), and then choosing the tile with the lowest score.

There are four possible adjacent tiles to the player, if we don't consider diagonal movements. Now, we need to know two numbers to calculate the movement score for each of those tiles. Let's call them G and H, where G is the cost of movement from the starting tile to the current tile, and H is the estimated cost to reach the target tile from the current tile. By adding G and H, we can get the final score of that tile; let's call it F. So we'll be using this formula: F = G + H.

[Figure: Valid adjacent tiles]

In this example, we'll estimate H using a simple method called Manhattan distance (also known as taxicab distance), in which we just count the number of tiles, horizontally and vertically, between a given tile and the target tile to know the distance between them.

[Figure: Calculating G]

The preceding figure shows the calculations of G with two different paths.
We just add one (which is the cost to move one tile) to the previous tile's G score to get the G score of the current tile. We can give different costs to different tiles. For example, we might want to give a higher movement cost for diagonal movements (if we are considering them), or to specific tiles occupied by, let's say, a pond or a muddy road. Now we know how to get G. Let's look at the calculation of H. The following figure shows different H values from different starting tiles to the target tile. You can try counting the squares between them to understand how we get those values.

[Figure: Calculating H]

So, now we know how to get G and H. Let's go back to our original example to figure out the shortest path from A to B. We first choose the starting tile, and then determine the valid adjacent tiles, as shown in the following figure. Then we calculate the G and H scores of each tile, shown in the lower-left and lower-right corners of the tile respectively. The final score F, which is G + H, is shown at the top-left corner. Obviously, the tile to the immediate right of the start tile has got the lowest F score. So, we choose this tile as our next movement, and store the previous tile as its parent. This parent reference will be useful later, when we trace back our final path.

[Figure: Starting position]

From the current tile, we repeat a similar process, determining valid adjacent tiles. This time there are only two valid adjacent tiles, at the top and bottom. The left tile is the starting tile, which we've already examined, and the obstacle occupies the right tile. We calculate the G, the H, and then the F score of those new adjacent tiles. This time we have four tiles on our map that all have the same score, six. So, which one do we choose? We can choose any of them. It doesn't really matter in this example, because we'll eventually find the shortest path with whichever tile we choose, if they have the same score.
Usually, we just choose the tile added most recently to our adjacent list. This is because later we'll be using some sort of data structure, such as a list, to store those tiles that are being considered for the next move. So, accessing the tile most recently added to that list could be faster than searching through the list to reach a particular tile that was added previously. In this demo, we'll just randomly choose the tile for our next test, just to prove that it can actually find the shortest path.

[Figure: Second step]

So, we choose this tile, which is highlighted with a red border. Again we examine the adjacent tiles. In this step, there's only one new adjacent tile, with a calculated F score of 8. So, the lowest score right now is still 6. We can choose any tile with the score 6.

[Figure: Third step]

So, we choose a tile randomly from all the tiles with the score 6. If we repeat this process until we reach our target tile, we'll end up with a board complete with all the scores for each valid tile.

[Figure: Reach target]

Now all we have to do is to trace back starting from the target tile using its parent tile. This will give a path that looks something like the following figure:

[Figure: Path traced back]

So this is the concept of A* pathfinding in a nutshell, without displaying any code. A* is an important concept in the AI pathfinding area, but since Unity 3.5, there are a couple of new features such as automatic navigation mesh generation and the Nav Mesh Agent, which we'll see roughly in the next section and then in more detail later. These features make implementing pathfinding in your games much easier. In fact, you may not even need to know about A* to implement pathfinding for your AI characters. Nonetheless, knowing how the system is actually working behind the scenes will help you to become a solid AI programmer. Unfortunately, those advanced navigation features in Unity are only available in the Pro version at this moment.
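For readers who would like to see the walkthrough as code, the following is a minimal Python sketch of grid-based A* with four-way movement, a movement cost of one per tile, the Manhattan distance for H, and parent links for tracing the path back. The 5 x 5 grid, the wall position, and all names are assumptions made for this sketch; they are not taken from the figures, and this is not Unity's implementation.

```python
import heapq


def manhattan(a, b):
    """H: number of horizontal plus vertical tiles between a and b."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def a_star(grid, start, goal):
    """Return a list of (row, col) tiles from start to goal, or None.

    grid[row][col] == 1 marks an obstacle; 0 marks a walkable tile.
    """
    rows, cols = len(grid), len(grid[0])
    g_score = {start: 0}
    parent = {start: None}
    open_heap = [(manhattan(start, goal), 0, start)]  # (F, G, tile)
    closed = set()
    while open_heap:
        _, g, current = heapq.heappop(open_heap)  # lowest F first
        if current in closed:
            continue  # an older, worse entry for an examined tile
        if current == goal:
            path = []
            while current is not None:  # trace back through the parents
                path.append(current)
                current = parent[current]
            return path[::-1]
        closed.add(current)
        r, c = current
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < rows and 0 <= nc < cols) or grid[nr][nc] == 1:
                continue  # off the map or occupied by an obstacle
            new_g = g + 1  # cost to move one tile
            if new_g < g_score.get((nr, nc), float("inf")):
                g_score[(nr, nc)] = new_g
                parent[(nr, nc)] = current
                f = new_g + manhattan((nr, nc), goal)  # F = G + H
                heapq.heappush(open_heap, (f, new_g, (nr, nc)))
    return None  # the goal cannot be reached


# Hypothetical map: a vertical wall between the start and the goal.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
START, GOAL = (2, 0), (2, 4)
```

Calling `a_star(GRID, START, GOAL)` returns one of the shortest routes around the wall. The sketch uses a binary heap (`heapq`) for the open list instead of the plain list mentioned in the text, so ties between equal F scores are broken by the heap ordering rather than at random.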
A navigation mesh

Now we have some idea of A* pathfinding techniques. One thing that you might notice is that using a simple grid in A* requires quite a number of computations to get a path which is the shortest to the target, and at the same time avoids the obstacles. So, to make it cheaper and easier for AI characters to find a path, people came up with the idea of using waypoints as a guide to move AI characters from the start point to the target point. Let's say we want to move our AI character from point A to point B, and we've set up three waypoints as shown in the following figure:

[Figure: Waypoints]

All we have to do now is to pick the nearest waypoint, and then follow its connected nodes leading to the target waypoint. Most games use waypoints for pathfinding, because they are simple and quite effective in using less computation resources. However, they do have some issues. What if we want to update the obstacles in our map? We'll also have to place waypoints for the updated map again, as shown in the following figure:

[Figure: New waypoints]

Following each node to the target can mean the AI character moves in zigzag directions. Look at the preceding figures; it's quite likely that the AI character will collide with the wall where the path is close to the wall. If that happens, our AI will keep trying to go through the wall to reach the next target, but it won't be able to and it will get stuck there. Even though we can smooth out the zigzag path by transforming it to a spline and making some adjustments to avoid such obstacles, the problem is that the waypoints don't give any information about the environment, other than the spline connecting two nodes. What if our smoothed and adjusted path passes the edge of a cliff or a bridge? The new path might not be a safe path anymore. So, for our AI entities to be able to effectively traverse the whole level, we're going to need a tremendous number of waypoints, which will be really hard to implement and manage.
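The basic waypoint lookup described above (pick the nearest waypoint, then follow connected nodes to the target waypoint) can be sketched as follows. This is a Python illustration only: the waypoint names, coordinates, and links are made up for the example, and a breadth-first search stands in for whatever graph search a real game would use.

```python
import math
from collections import deque


def nearest_waypoint(position, waypoints):
    """Return the name of the waypoint closest to position."""
    return min(waypoints, key=lambda name: math.dist(position, waypoints[name]))


def waypoint_route(links, start, goal):
    """Follow waypoint links from start to goal; return names or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            route = []
            while node is not None:
                route.append(node)
                node = parent[node]
            return route[::-1]
        for nxt in links.get(node, ()):
            if nxt not in parent:  # not reached yet
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical layout: three waypoints placed around a wall.
WAYPOINTS = {"W1": (2.0, 0.0), "W2": (2.0, 4.0), "W3": (6.0, 4.0)}
LINKS = {"W1": ["W2"], "W2": ["W1", "W3"], "W3": ["W2"]}
```

With point A at `(0, 0)` and point B at `(7, 5)`, `nearest_waypoint` picks `W1` for A and `W3` for B, and `waypoint_route(LINKS, "W1", "W3")` walks through `W2`. Note that nothing in this data says where the walls or cliffs are, which is exactly the limitation discussed above.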
Let's look at a better solution, the navigation mesh. A navigation mesh is another graph structure that can be used to represent our world, similar to the way we did with our square tile-based grid or waypoints graph.

[Figure: Navigation mesh]

A navigation mesh uses convex polygons to represent the areas in the map that an AI entity can travel. The most important benefit of using a navigation mesh is that it gives a lot more information about the environment than a waypoint system. Now we can adjust our path safely, because we know the safe region in which our AI entities can travel. Another advantage of using a navigation mesh is that we can use the same mesh for different types of AI entities. Different AI entities can have different properties such as size, speed, and movement abilities. A set of waypoints tailored for human AI characters may not work nicely for flying creatures or AI-controlled vehicles. Those might need different sets of waypoints. Using a navigation mesh can save a lot of time in such cases.

But generating a navigation mesh programmatically based on a scene is a somewhat complicated process. Fortunately, Unity 3.5 introduced a built-in navigation mesh generator (a Pro-only feature). So, rather than generating one ourselves, we'll learn how to use Unity's navigation mesh generation features to easily implement our AI pathfinding.

The behavior trees

Behavior trees are another technique used to represent and control the logic behind AI characters. They have become popular through their use in AAA games such as Halo and Spore. Previously, we briefly covered FSMs. FSMs provide a very simple way to define the logic of an AI character, based on the different states and transitions between them. However, FSMs are considered difficult to scale, and it is hard to reuse existing logic in them. We need to add many states and hard-wire many transitions in order to support all the scenarios that we want our AI character to consider. So, we need a more scalable approach when dealing with large problems.
Behavior trees are a better way to implement AI game characters that could potentially become more and more complex.

The basic elements of behavior trees are tasks, whereas states are the main elements of FSMs. There are a few different tasks such as Sequence, Selector, Parallel, and Decorator. This is quite confusing. The best way to understand this is to look at an example. Let's try to translate our example from the FSM section using a behavior tree. We can break all the transitions and states into tasks.

[Figure: Tasks]

Let's look at a Selector task for this behavior tree. Selector tasks are represented with a circle and a question mark inside. First it'll choose to attack the player. If the Attack task returns success, the Selector task is done and will go back to the parent node, if there is one. If the Attack task fails, it'll try the Chase task. If the Chase task fails, it'll try the Patrol task.

[Figure: Selector task]

What about the tests? They are also one of the tasks in the behavior trees. The following diagram shows the use of Sequence tasks, denoted by a rectangle with an arrow inside it. The root selector may choose the first Sequence action. This Sequence action's first task is to check whether the player character is close enough to attack. If this task succeeds, it'll proceed with the next task, which is to attack the player. If the Attack task also returns success, the whole sequence will return success, and the selector is done with this behavior, and will not continue with other Sequence tasks. If the Close enough to attack? task fails, then the Sequence action will not proceed to the Attack task, and will return a failed status to the parent selector task. Then the selector will choose its next child task, Lost or Killed Player?.

[Figure: Sequence tasks]

The other two common components are Parallel and Decorator. A Parallel task will execute all of its child tasks at the same time, while the Sequence and Selector tasks only execute their child tasks one by one.
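The Selector and Sequence rules described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: tasks return only success or failure (there is no "running" status), the attack range of 5 and the context keys are invented for the example, and the tree shape is a guess at the guard behavior rather than a copy of the diagrams.

```python
SUCCESS, FAILURE = "success", "failure"


class Condition:
    """Leaf task: succeeds when its test on the shared context is true."""

    def __init__(self, test):
        self.test = test

    def run(self, context):
        return SUCCESS if self.test(context) else FAILURE


class Action:
    """Leaf task: records its name as performed; always succeeds here."""

    def __init__(self, name):
        self.name = name

    def run(self, context):
        context["actions"].append(self.name)
        return SUCCESS


class Sequence:
    """Runs children in order; fails as soon as one child fails."""

    def __init__(self, *children):
        self.children = children

    def run(self, context):
        for child in self.children:
            if child.run(context) == FAILURE:
                return FAILURE
        return SUCCESS


class Selector:
    """Runs children in order; succeeds as soon as one child succeeds."""

    def __init__(self, *children):
        self.children = children

    def run(self, context):
        for child in self.children:
            if child.run(context) == SUCCESS:
                return SUCCESS
        return FAILURE


# Hypothetical guard tree: attack if close, else chase if visible, else patrol.
guard_tree = Selector(
    Sequence(
        Condition(lambda c: c["player_visible"] and c["distance"] <= 5),
        Action("attack"),
    ),
    Sequence(Condition(lambda c: c["player_visible"]), Action("chase")),
    Action("patrol"),
)


def tick(tree, player_visible, distance):
    """Run the tree once and return the list of actions it performed."""
    context = {"player_visible": player_visible, "distance": distance, "actions": []}
    tree.run(context)
    return context["actions"]
```

Each call to `tick` evaluates the tree from the root, so `tick(guard_tree, True, 3)` performs the attack action, while a distant or hidden player falls through to chase or patrol. Adding a new behavior means adding a branch, not rewiring transitions between every pair of states.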
Decorator is another type of task that has only one child. It can change the behavior of its child task, which includes whether to run the child task or not, how many times it should run, and so on.

We'll study how to implement a basic behavior tree system later. There's a free add-on for Unity called Behave in the Unity Asset Store. Behave is a useful, free GUI editor to set up behavior trees of AI characters, and we'll look at it in more detail later as well.

Locomotion

Animals (including humans) have a very complex musculoskeletal system (the locomotor system) that gives them the ability to move their bodies around using the muscular and skeletal systems. We know where to put our steps when climbing a ladder, stairs, or on uneven terrain, and we know how to balance our body to stabilize all the fancy poses we want to make. We can do all this using our bones, muscles, joints, and other tissues, collectively described as our locomotor system.

Now put that into our game development perspective. Let's say we have a human character who needs to walk on both even and uneven surfaces, or on small slopes, and we have only one animation for a "walk" cycle. With the lack of a locomotor system in our virtual character, this is how it would look:

[Figure: Climbing stair without locomotion]

First we play the walk animation and advance the player forward. Now the character knows it's penetrating the surface. So, the collision detection system will pull the character up above the surface to prevent this penetration. This is how we usually set up the movement on an uneven surface. Even though it doesn't give a realistic look and feel, it does the job and is cheap to implement.

Let's take a look at how we really walk up stairs. We put our step firmly on the staircase, and using this force we pull up the rest of our body for the next step. This is how we do it in real life with our advanced locomotor system. However, it's not so simple to implement this level of realism inside games.
We'll need a lot of animations for different scenarios, which include climbing ladders, walking/running up stairs, and so on. So, only the large studios with a lot of animators could pull this off in the past, until we came up with an automated system.

[Figure: With a locomotion system]

Fortunately, Unity 3D has an extension that can do just that, which is a locomotion system.

[Figure: Locomotion system Unity extension]

This system can automatically blend our animated walk/run cycles, and adjust the movements of the bones in the legs to ensure that the feet step correctly on the ground. It can also adjust the original animations made for a specific speed and direction on any surface, arbitrary steps, and slopes. We'll see how to use this locomotion system to apply realistic movement to our AI characters.

Dijkstra's algorithm

Dijkstra's algorithm, named after Professor Edsger Dijkstra, who devised it, is one of the most famous algorithms for finding the shortest paths in a graph with non-negative edge costs. The algorithm was originally designed to solve the shortest path problem in the context of mathematical graph theory, and it's designed to find the shortest paths from a starting node to all the other nodes in the graph. Since most games only need the shortest path between one starting point and one target point, all the other paths generated or found by this algorithm are not really useful. We can stop the algorithm once we find the shortest path from a single starting point to a target point. But it will still have computed the shortest paths to all the points it has visited along the way. So, this algorithm is not efficient enough to be used in most games, and we won't be doing a Unity demo of Dijkstra's algorithm in this article either.

However, Dijkstra's algorithm is an important algorithm for the games that require strategic AI that needs as much information as possible about the map to make tactical decisions.
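As a small illustration of the behavior described above (this is a plain Python sketch, not a Unity demo), the following function computes the cheapest known cost from a source node to the other nodes of a graph, with an optional early stop once a target has been settled. The graph, its node names, and its edge costs are made up for the example.

```python
import heapq


def dijkstra(graph, source, target=None):
    """Return a dict of cheapest known costs from source.

    graph maps each node to a dict of {neighbor: non-negative edge cost}.
    If target is given, the search stops once the target is settled; the
    target's cost is then final, but costs of nodes that were not yet
    settled may still be tentative.
    """
    dist = {source: 0}
    settled = set()
    heap = [(0, source)]  # (cost so far, node)
    while heap:
        d, node = heapq.heappop(heap)
        if node in settled:
            continue  # a cheaper entry for this node was already handled
        settled.add(node)
        if node == target:
            break  # early exit: the shortest path to the target is known
        for neighbor, cost in graph.get(node, {}).items():
            new_d = d + cost
            if new_d < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_d
                heapq.heappush(heap, (new_d, neighbor))
    return dist


# Hypothetical graph with four nodes.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
```

Calling `dijkstra(GRAPH, "A")` gives the cost to every reachable node, which is the "as much information as possible" mode a strategic AI might want. Unlike A*, there is no heuristic steering the search toward the target, so even with the early exit it expands nodes in every direction.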
Dijkstra's algorithm also has many applications other than games, such as finding the shortest path in network routing protocols.

Summary

Game AI and academic AI have different objectives. Academic AI research tries to solve real-world problems and prove theories without being much limited by resources. Game AI focuses on building NPCs, within limited resources, that seem intelligent to the player. The objective of AI in games is to provide a challenging opponent that makes the game more fun to play. We also learned briefly about the different AI techniques that are widely used in games, such as finite state machines (FSMs), randomness and probability, sensor and input systems, flocking and group behaviors, path following and steering behaviors, AI pathfinding, navigation mesh generation, and behavior trees.

Packt
14 Jan 2016
10 min read

Techniques and Practices of Game AI

In this article by Peter L Newton, author of the book Learning Unreal AI Programming, we will look at the fundamental techniques and practices of game AI. These are the building blocks for developing an amazing and interesting game AI.

Navigation

While not all of the following components are necessary to achieve AI navigation, they all contribute critical feedback that can affect navigation. Navigating within a world is limited only by the pathways within the game. Navigation for AI is built up of the following things:

Path following (path nodes): A solution similar to NavMesh; path nodes can designate the space that the AI traverses.

Navigation mesh: Using tools such as Navigation Mesh, also known as NavMesh, you can designate areas that the AI can traverse. NavMesh generates a plot of grids that is used to calculate the path and cost during navigation. It's important to know that this is only one of several pathfinding techniques available; we use it because it works well in this demonstration.

Behavior trees: Using behavior trees to influence your AI's next destination can create a more interesting player experience. The tree not only calculates the requested destination, but also decides whether the AI should enter the screen with a no-hands cartwheel double backflip or try the triple somersault to jazz hands.

Steering behaviors: Steering behaviors affect the way the AI moves while navigating to avoid obstacles. Steering can also be used to create formations, for example with the fleets that you have set to attack the king's wall. Steering can be used in many ways to influence the movement of the character.

Sensory systems: Sensory systems can provide critical details, such as players nearby, sound levels, cover nearby, and many other variables of the environment that can alter movement.
It's critical that your AI understands the changing environment so that it doesn't break the illusion of being a real opponent.

Achieving realistic movement with steering

When you think of what steering does for a car, you would be right to imagine that the same idea is applied to game AI navigation. Steering influences the movement of AI elements as they traverse to their next destination. The influences can be supplied as necessary, but we will go over the most commonly used ones. Avoidance is used essentially to avoid colliding with oncoming AI. Flocking is another key factor in steering; a school of fish is a common example of it. Flocking is useful for simulating interesting group movement, such as a crowd in complete panic or a school of fish. The goal of steering behaviors is to achieve realistic movement behavior within the player's world.

Creating character with randomness and probability

Randomness and probability are what add character to the bot's decision making. If a bot attacked you the same way, always entered the scene the same way, and annoyed you with its laugh after every successful hit, it wouldn't make for a unique experience, because the AI always does the same thing. By using randomness and probability, you can instead make the AI laugh based on probability or introduce randomness to the AI's choice of skill. Another great by-product of applying randomness and probability is that it allows you to introduce levels of difficulty. You can lower the chance of missing the skill cast or even allow the bots to aim more precisely. If you have bots who wander around looking for enemies, their next destination can be randomly chosen.

Creating complex decision making with behavior trees

Finite State Machines (FSMs) allow your bot to perform transitions between states. This allows it to go from wandering to hunting and then to killing. Behavior trees are similar but allow more flexibility.
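As a minimal sketch of the wandering, hunting, and killing transitions just mentioned, a finite state machine can be written as a lookup table from (state, event) to the next state. This is plain Python for illustration, not UE4 code; the event names, the Bot class, and the TRANSITIONS table are all hypothetical.

```python
from enum import Enum, auto


class BotState(Enum):
    WANDERING = auto()
    HUNTING = auto()
    KILLING = auto()


# (current state, event) -> next state. Event names are made up for this sketch.
TRANSITIONS = {
    (BotState.WANDERING, "enemy_spotted"): BotState.HUNTING,
    (BotState.HUNTING, "enemy_in_range"): BotState.KILLING,
    (BotState.HUNTING, "enemy_lost"): BotState.WANDERING,
    (BotState.KILLING, "enemy_dead"): BotState.WANDERING,
}


class Bot:
    def __init__(self):
        self.state = BotState.WANDERING

    def handle(self, event):
        """Follow the transition for this event, or stay put if none applies."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


bot = Bot()
bot.handle("enemy_spotted")    # WANDERING -> HUNTING
bot.handle("enemy_in_range")   # HUNTING -> KILLING
```

Every state and every legal transition has to be listed by hand in the table, which is the rigidity that behavior trees are meant to relax.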
Behavior trees allow hierarchical FSMs, which introduce another layer of decisions. So, the bot decides between branches of behaviors that define the state it is in. There is a tool provided by UE4 called Behavior Tree. Its editor allows you to modify AI behavior quickly and with ease. The following sections show the components found within UE4's Behavior Tree.

Root

This node is the starting node that sends the signal to the next node in the tree. It connects to a composite that begins your first tree. You may notice that you are required to use a composite first to define a tree and then create the tasks for that tree. This is because a hierarchical FSM creates branches of states. These states will be populated with other states or tasks. This allows easy transitions between multiple states.

Decorators

This node creates another task, which you can add on top of the node as a "decoration". This could be, for example, a Force Success decorator when using a sequence composite, or a loop that repeats a node's actions a number of times. I used a decorator in the AI we will make that tells it to update to the next available route. In the accompanying screenshot, you see the Attack & Destroy decorator at the top of the composite, which defines the state. This state includes two tasks, Attack Enemy and Move To Enemy, the latter of which also has a decorator telling it to execute only when the bot state is searching.

Composites

These are the starting points of the states. They define how the state will behave with returns and execution flow. The Selector in our example executes its children from left to right and returns success as soon as one of its children returns success. A failing child does not make the Selector fail; it simply moves on to the next child, and it fails only if every child fails. Therefore, this is good for a state that doesn't need every node to execute successfully.
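The return rules of a Selector, and of the Sequence composite that usually accompanies it, can be sketched in a few lines. This is a generic behavior-tree illustration in Python, not UE4's Behavior Tree API: the Status, Task, Selector, and Sequence classes are hypothetical, the in-progress status that real engines also track is left out, and the task names merely echo the Attack Enemy and Move To Enemy tasks from the example.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Task:
    """Leaf node wrapping a callable that returns a Status."""

    def __init__(self, name, action):
        self.name = name
        self.action = action

    def tick(self, log):
        log.append(self.name)   # record execution order for inspection
        return self.action()


class Selector:
    """Runs children left to right; succeeds on the first child that succeeds."""

    def __init__(self, *children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) is Status.SUCCESS:
                return Status.SUCCESS
        return Status.FAILURE    # only reached when every child failed


class Sequence:
    """Runs children left to right; fails on the first child that fails."""

    def __init__(self, *children):
        self.children = children

    def tick(self, log):
        for child in self.children:
            if child.tick(log) is Status.FAILURE:
                return Status.FAILURE
        return Status.SUCCESS    # only reached when every child succeeded


def make_children():
    return (
        Task("Attack Enemy", lambda: Status.FAILURE),
        Task("Move To Enemy", lambda: Status.SUCCESS),
        Task("Idle", lambda: Status.SUCCESS),
    )


selector_log, sequence_log = [], []
selector_result = Selector(*make_children()).tick(selector_log)
sequence_result = Sequence(*make_children()).tick(sequence_log)
```

With the same three children, the Selector skips past the failed attack and stops at the first success, while the Sequence gives up as soon as the attack fails.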
The Sequence executes its children in a similar fashion to the Selector, but returns a fail message when one of its children returns fail. This means that every node must return a success message for the sequence to complete. Last but not least is Simple Parallel. This allows you to execute a task and a tree at essentially the same time. This is great for creating a state that requires another task to always be called. To set it up, you first connect it to a task that it will execute. The second task or state that is connected continues to be called alongside the first task until the first task returns a success message.

Services

Services run as long as the composite that they are added to stays activated. They tick at the intervals that you set within the properties. They have another float property that allows you to create deviations in the tick intervals. Services are used to modify the state of the AI in most cases, because they are always called. For example, in the bot that we will create, we add a service to the first branch of the tree so that it's called without interruption, and so can maintain the state that the bot should be in at any given moment. This service, called Detect Enemy, runs a deviating cycle that updates Blackboard variables, such as State and EnemyActor.

Tasks

Tasks do the dirty work and report a success or failure message if necessary. There are two nodes that you'll use most often when working with a task: Event Receive Execute, which receives the signal to execute the connected scripts, and Finish Execute, which sends the signal back, returning true or false to indicate whether the task succeeded. This is important when making a task meant for the Sequence composite.

Blackboards

Blackboards are used to store variables within the behavior tree of the AI.
In our example, we store an enumeration variable, State, to hold the state, TargetPoint to hold the currently targeted enemy, and Route, which stores the current route position the AI has been requested to travel to, just to name a few. Blackboards work just by setting a public variable of a node to one of the available Blackboard variables in the drop-down menu. The naming convention shown in the accompanying screenshot streamlines this process.

Sensory system

Creating a sensory system depends heavily on the environment where the AI will be fighting the player. The AI will need to be able to find cover, evade the enemy, get ammo, and use other features that you feel will create an immersive AI for your game. Games with AI that challenges the player create a unique individual experience. A good sensory system contributes critical information, which makes for reactive AI. In this project, we use the sensory system to detect pawns that the AI can see. We also use functions to check for the line of sight of the enemy. We check whether there is another pawn in our path. We can check for cover and other resources within the area.

Machine learning

Machine learning is a branch of its own. This technique allows AI to learn from situations and simulations. The inputs come from the environment, including the context the bot is in, and allow it to take decisive actions. In machine learning, the inputs are fed into a classifier, which can predict a set of outputs with a certain level of certainty. Classifiers can be combined into ensembles to increase the accuracy of the probabilistic prediction. We don't dig heavily into this subject, but I will provide some material for those interested.

Tracing

Tracing allows an actor within the world to detect objects by ray tracing. A single line trace is sent out, and if it collides with an actor, the actor is returned, including the information about the impact. Tracing is used for many reasons.
One way it is used in FPS games is to detect hits. Are you familiar with the hit box? When your player shoots in a game, a trace is shot out that collides with the opponent's hit box, determining the damage to your opponent and, if you're skillful enough, resulting in their death. There are other shapes available for traces, such as spheres, capsules, and boxes, which allow tracing for different situations. Recently, I used the box trace for my car in order to detect objects near it.

Influence mapping

Influence mapping isn't a finite approach; it's the idea that specific locations on the map contribute information that directly influences the player or AI. An example of using influence mapping with AI is presence falloff. Say we have enemy AI in a group. Their presence map would create a circle around the group with an intensity based on the size of the group. This way, other AI elements know that on entering this area, they're entering a zone occupied by enemy AI. Practical information isn't the only thing people use this for, so just understand that it's meant to provide another level of input to help your bot make additional decisions.

Summary

In this article, we saw the fundamental techniques and practices of game AI. We saw how to implement navigation, achieve realistic movement of AI elements, and create characters with randomness in order to achieve a sense of realism. We also looked at behavior trees and all their constituent elements. Further, we touched upon some aspects related to AI, such as machine learning and tracing.

Packt
22 Feb 2016
64 min read

Dynamic Graphics

There is no question that the rendering system of modern graphics devices is complicated. Even rendering a single triangle to the screen engages many of these components, since GPUs are designed for large amounts of parallelism, as opposed to CPUs, which are designed to handle virtually any computational scenario. Modern graphics rendering is a high-speed dance of processing and memory management that spans software, hardware, multiple memory spaces, multiple languages, multiple processors, multiple processor types, and a large number of special-case features that can be thrown into the mix. To make matters worse, every graphics situation we will come across is different in its own way. Running the same application against a different device, even by the same manufacturer, often results in an apples-versus-oranges comparison due to the different capabilities and functionality they provide. It can be difficult to determine where a bottleneck resides within such a complex chain of devices and systems, and it can take a lifetime of industry work in 3D graphics to have a strong intuition about the source of performance issues in modern graphics systems. Thankfully, Profiling comes to the rescue once again. If we can gather data about each component, use multiple performance metrics for comparison, and tweak our Scenes to see how different graphics features affect their behavior, then we should have sufficient evidence to find the root cause of the issue and make appropriate changes. So in this article, you will learn how to gather the right data, dig just deep enough into the graphics system to find the true source of the problem, and explore various solutions to work around a given problem. There are many more topics to cover when it comes to improving rendering performance, so in this article we will begin with some general techniques on how to determine whether our rendering is limited by the CPU or by the GPU, and what we can do about either case. 
We will discuss optimization techniques such as Occlusion Culling and Level of Detail (LOD) and provide some useful advice on Shader optimization, as well as large-scale rendering features such as lighting and shadows. Finally, since mobile devices are a common target for Unity projects, we will also cover some techniques that may help improve performance on limited hardware.

Profiling rendering issues

Poor rendering performance can manifest itself in a number of ways, depending on whether the device is CPU-bound or GPU-bound; in the latter case, the root cause could originate from a number of places within the graphics pipeline. This can make the investigatory stage rather involved, but once the source of the bottleneck is discovered and the problem is resolved, we can expect significant improvements, as small fixes tend to reap big rewards when it comes to the rendering subsystem.

The CPU sends rendering instructions through the graphics API. These funnel through the hardware driver to the GPU device, which results in commands entering the GPU's Command Buffer. These commands are processed by the massively parallel GPU system one by one until the buffer is empty. But there are a lot more nuances involved in this process. The following shows a (greatly simplified) diagram of a typical GPU pipeline (which can vary based on technology and various optimizations), and the broad rendering steps that take place during each stage:

The top row represents the work that takes place on the CPU: the act of calling into the graphics API, through the hardware driver, and pushing commands into the GPU. Ergo, a CPU-bound application will be primarily limited by the complexity, or sheer number, of graphics API calls. Meanwhile, a GPU-bound application will be limited by the GPU's ability to process those calls and empty the Command Buffer in a reasonable timeframe to allow for the intended frame rate.
This is represented in the next two rows, showing the steps taking place in the GPU. But, because of the device's complexity, they are often simplified into two different sections: the front end and the back end. The front end refers to the part of the rendering process where the GPU has received mesh data, a draw call has been issued, and all of the information that was fed into the GPU is used to transform vertices and run through Vertex Shaders. Finally, the rasterizer generates a batch of fragments to be processed in the back end. The back end refers to the remainder of the GPU's processing stages, where fragments have been generated, and now they must be tested, manipulated, and drawn via Fragment Shaders onto the frame buffer in the form of pixels.

Note that "Fragment Shader" is the more technically accurate term for Pixel Shaders. Fragments are generated by the rasterization stage, and only technically become pixels once they've been processed by the Shader and drawn to the Frame Buffer.

There are a number of different approaches we can use to determine where the root cause of a graphics rendering issue lies:

Profiling the GPU with the Profiler
Examining individual frames with the Frame Debugger
Brute Force Culling

GPU profiling

Because graphics rendering involves both the CPU and GPU, we must examine the problem using both the CPU Usage and GPU Usage areas of the Profiler as this can tell us which component is working hardest. For example, the following screenshot shows the Profiler data for a CPU-bound application. The test involved creating thousands of simple objects, with no batching techniques taking place.
This resulted in an extremely large Draw Call count (around 15,000) for the CPU to process, but giving the GPU relatively little work to do due to the simplicity of the objects being rendered: This example shows that the CPU's "rendering" task is consuming a large amount of cycles (around 30 ms per frame), while the GPU is only processing for less than 16 ms, indicating that the bottleneck resides in the CPU. Meanwhile, Profiling a GPU-bound application via the Profiler is a little trickier. This time, the test involves creating a small number of high polycount objects (for a low Draw Call per vertex ratio), with dozens of real-time point lights and an excessively complex Shader with a texture, normal texture, heightmap, emission map, occlusion map, and so on, (for a high workload per pixel ratio). The following screenshot shows Profiler data for the example Scene when it is run in a standalone application: As we can see, the rendering task of the CPU Usage area matches closely with the total rendering costs of the GPU Usage area. We can also see that the CPU and GPU time costs at the bottom of the image are relatively similar (41.48 ms versus 38.95 ms). This is very unintuitive as we would expect the GPU to be working much harder than the CPU. Be aware that the CPU/GPU millisecond cost values are not calculated or revealed unless the appropriate Usage Area has been added to the Profiler window. However, let's see what happens when we test the same exact Scene through the Editor: This is a better representation of what we would expect to see in a GPU-bound application. We can see how the CPU and GPU time costs at the bottom are closer to what we would expect to see (2.74 ms vs 64.82 ms). However, this data is highly polluted. The spikes in the CPU and GPU Usage areas are the result of the Profiler Window UI updating during testing, and the overhead cost of running through the Editor is also artificially increasing the total GPU time cost. 
It is unclear what causes the data to be treated this way, and this could certainly change in the future if enhancements are made to the Profiler in future versions of Unity, but it is useful to know this drawback. Trying to determine whether our application is truly GPU-bound is perhaps the only good excuse to perform a Profiler test through the Editor. The Frame Debugger A new feature in Unity 5 is the Frame Debugger, a debugging tool that can reveal how the Scene is rendered and pieced together, one Draw Call at a time. We can click through the list of Draw Calls and observe how the Scene is rendered up to that point in time. It also provides a lot of useful details for the selected Draw Call, such as the current render target (for example, the shadow map, the camera depth texture, the main camera, or other custom render targets), what the Draw Call did (drawing a mesh, drawing a static batch, drawing depth shadows, and so on), and what settings were used (texture data, vertex colors, baked lightmaps, directional lighting, and so on). The following screenshot shows a Scene that is only being partially rendered due to the currently selected Draw Call within the Frame Debugger. Note the shadows that are visible from baked lightmaps that were rendered during an earlier pass before the object itself is rendered: If we are bound by Draw Calls, then this tool can be effective in helping us figure out what the Draw Calls are being spent on, and determine whether there are any unnecessary Draw Calls that are not having an effect on the scene. This can help us come up with ways to reduce them, such as removing unnecessary objects or batching them somehow. We can also use this tool to observe how many additional Draw Calls are consumed by rendering features, such as shadows, transparent objects, and many more. 
This could help us, when we're creating multiple quality levels for our game, to decide what features to enable/disable under the low, medium, and high quality settings. Brute force testing If we're poring over our Profiling data, and we're still not sure we can determine the source of the problem, we can always try the brute force method: cull a specific activity from the Scene and see if it results in greatly increased performance. If a small change results in a big speed improvement, then we have a strong clue about where the bottleneck lies. There's no harm in this approach if we eliminate enough unknown variables to be sure the data is leading us in the right direction. We will cover different ways to brute force test a particular issue in each of the upcoming sections. CPU-bound If our application is CPU-bound, then we will observe a generally poor FPS value within the CPU Usage area of the Profiler window due to the rendering task. However, if VSync is enabled the data will often get muddied up with large spikes representing pauses as the CPU waits for the screen refresh rate to come around before pushing the current frame buffer. So, we should make sure to disable the VSync block in the CPU Usage area before deciding the CPU is the problem. Brute-forcing a test for CPU-bounding can be achieved by reducing Draw Calls. This is a little unintuitive since, presumably, we've already been reducing our Draw Calls to a minimum through techniques such as Static and Dynamic Batching, Atlasing, and so forth. This would mean we have very limited scope for reducing them further. What we can do, however, is disable the Draw-Call-saving features such as batching and observe if the situation gets significantly worse than it already is. If so, then we have evidence that we're either already, or very close to being, CPU-bound. 
At this point, we should see whether we can re-enable these features and disable rendering for a few choice objects (preferably those with low complexity to reduce Draw Calls without over-simplifying the rendering of our scene). If this results in a significant performance improvement then, unless we can find further opportunities for batching and mesh combining, we may be faced with the unfortunate option of removing objects from our scene as the only means of becoming performant again. There are some additional opportunities for Draw Call reduction, including Occlusion Culling, tweaking our Lighting and Shadowing, and modifying our Shaders. These will be explained in the following sections. However, Unity's rendering system can be multithreaded, depending on the targeted platform, which version of Unity we're running, and various settings, and this can affect how the graphics subsystem is being bottlenecked by the CPU, and slightly changes the definition of what being CPU-bound means. Multithreaded rendering Multithreaded rendering was first introduced in Unity v3.5 in February 2012, and enabled by default on multicore systems that could handle the workload; at the time, this was only PC, Mac, and Xbox 360. Gradually, more devices were added to this list, and since Unity v5.0, all major platforms now enable multithreaded rendering by default (and possibly some builds of Unity 4). Mobile devices were also starting to feature more powerful CPUs that could support this feature. Android multithreaded rendering (introduced in Unity v4.3) can be enabled through a checkbox under Platform Settings | Other Settings | Multithreaded Rendering. Multithreaded rendering on iOS can be enabled by configuring the application to make use of the Apple Metal API (introduced in Unity v4.6.3), under Player Settings | Other Settings | Graphics API. 
When multithreaded rendering is enabled, tasks that must go through the rendering API (OpenGL, DirectX, or Metal), are handed over from the main thread to a "worker thread". The worker thread's purpose is to undertake the heavy workload that it takes to push rendering commands through the graphics API and driver, to get the rendering instructions into the GPU's Command Buffer. This can save an enormous number of CPU cycles for the main thread, where the overwhelming majority of other CPU tasks take place. This means that we free up extra cycles for the majority of the engine to process physics, script code, and so on. Incidentally, the mechanism by which the main thread notifies the worker thread of tasks operates in a very similar way to the Command Buffer that exists on the GPU, except that the commands are much more high-level, with instructions like "render this object, with this Material, using this Shader", or "draw N instances of this piece of procedural geometry", and so on. This feature has been exposed in Unity 5 to allow developers to take direct control of the rendering subsystem from C# code. This customization is not as powerful as having direct API access, but it is a step in the right direction for Unity developers to implement unique graphical effects. Confusingly, the Unity API name for this feature is called "CommandBuffer", so be sure not to confuse it with the GPU's Command Buffer. Check the Unity documentation on CommandBuffer to make use of this feature: http://docs.unity3d.com/ScriptReference/Rendering.CommandBuffer.html. Getting back to the task at hand, when we discuss the topic of being CPU-bound in graphics rendering, we need to keep in mind whether or not the multithreaded renderer is being used, since the actual root cause of the problem will be slightly different depending on whether this feature is enabled or not. 
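The hand-off from the main thread to the worker thread described above can be pictured as a producer/consumer queue. The sketch below is a generic Python illustration of that idea, not Unity's internal implementation or its CommandBuffer API; the command strings and the render_worker function are made up for the example.

```python
import queue
import threading


def render_worker(commands, submitted):
    """Drain high-level commands; stands in for the thread that talks to the graphics API."""
    while True:
        command = commands.get()
        if command is None:       # sentinel: the main thread has nothing more to send
            break
        submitted.append(f"submitted: {command}")


commands = queue.Queue()
submitted = []
worker = threading.Thread(target=render_worker, args=(commands, submitted))
worker.start()

# Main thread: enqueue high-level instructions, then carry on with other work
# (physics, script code, and so on) while the worker pushes them through.
for command in ("draw crate with WoodMaterial", "draw 100 instances of grass"):
    commands.put(command)

commands.put(None)   # tell the worker to stop
worker.join()        # wait until every queued command has been handled
```

Enqueuing is cheap for the main thread, while the expensive submission work happens on the worker, which mirrors the split of responsibilities the text describes.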
In single-threaded rendering, where all graphics API calls are handled by the main thread, and in an ideal world where both components are running at maximum capacity, our application would become bottlenecked on the CPU when 50 percent or more of the time per frame is spent handling graphics API calls. However, resolving these bottlenecks can be accomplished by freeing up work from the main thread. For example, we might find that greatly reducing the amount of work taking place in our AI subsystem will improve our rendering significantly because we've freed up more CPU cycles to handle the graphics API calls. But, when multithreaded rendering is taking place, this task is pushed onto the worker thread, which means the same thread isn't being asked to manage both engine work and graphics API calls at the same time. These processes are mostly independent, and even though additional work must still take place in the main thread to send instructions to the worker thread in the first place (via the internal CommandBuffer system), it is mostly negligible. This means that reducing the workload in the main thread will have little-to-no effect on rendering performance. Note that being GPU-bound is the same regardless of whether multithreaded rendering is taking place. GPU Skinning While we're on the subject of CPU-bounding, one task that can help reduce CPU workload, at the expense of additional GPU workload, is GPU Skinning. Skinning is the process where mesh vertices are transformed based on the current location of their animated bones. The animation system, working on the CPU, only transforms the bones, but another step in the rendering process must take care of the vertex transformations to place the vertices around those bones, performing a weighted average over the bones connected to those vertices. This vertex processing task can either take place on the CPU or within the front end of the GPU, depending on whether the GPU Skinning option is enabled. 
This feature can be toggled under Edit | Project Settings | Player Settings | Other Settings | GPU Skinning. Front end bottlenecks It is not uncommon to use a mesh that contains a lot of unnecessary UV and Normal vector data, so our meshes should be double-checked for this kind of superfluous fluff. We should also let Unity optimize the structure for us, which minimizes cache misses as vertex data is read within the front end. We will also learn some useful Shader optimization techniques shortly, when we begin to discuss back end optimizations, since many optimization techniques apply to both Fragment and Vertex Shaders. The only attack vector left to cover is finding ways to reduce actual vertex counts. The obvious solutions are simplification and culling; either have the art team replace problematic meshes with lower polycount versions, and/or remove some objects from the scene to reduce the overall polygon count. If these approaches have already been explored, then the last approach we can take is to find some kind of middle ground between the two. Level Of Detail Since it can be difficult to tell the difference between a high quality distance object and a low quality one, there is very little reason to render the high quality version. So, why not dynamically replace distant objects with something more simplified? Level Of Detail (LOD), is a broad term referring to the dynamic replacement of features based on their distance or form factor relative to the camera. The most common implementation is mesh-based LOD: dynamically replacing a mesh with lower and lower detailed versions as the camera gets farther and farther away. Another example might be replacing animated characters with versions featuring fewer bones, or less sampling for distant objects, in order to reduce animation workload. The built-in LOD feature is available in the Unity 4 Pro Edition and all editions of Unity 5. 
However, it is entirely possible to implement it via Script code in Unity 4 Free Edition if desired. Making use of LOD can be achieved by placing multiple objects in the Scene and making them children of a GameObject with an attached LODGroup component. The LODGroup's purpose is to generate a bounding box from these objects, and decide which object should be rendered based on the size of the bounding box within the camera's field of view. If the object's bounding box consumes a large area of the current view, then it will enable the mesh(es) assigned to lower LOD groups, and if the bounding box is very small, it will replace the mesh(es) with those from higher LOD groups. If the mesh is too far away, it can be configured to hide all child objects. So, with the proper setup, we can have Unity replace meshes with simpler alternatives, or cull them entirely, which eases the burden on the rendering process. Check the Unity documentation for more detailed information on the LOD feature: http://docs.unity3d.com/Manual/LevelOfDetail.html. This feature can cost us a large amount of development time to fully implement; artists must generate lower polygon count versions of the same object, and level designers must generate LOD groups, configure them, and test them to ensure they don't cause jarring transitions as the camera moves closer or farther away. It also costs us in memory and runtime CPU; the alternative meshes need to be kept in memory, and the LODGroup component must routinely test whether the camera has moved to a new position that warrants a change in LOD level. In this era of graphics card capabilities, vertex processing is often the least of our concerns. Combined with the additional sacrifices needed for LOD to function, developers should avoid preoptimizing by automatically assuming LOD will help them. 
Excessive use of the feature will lead to burdening other parts of our application's performance, and chew up precious development time, all for the sake of paranoia. If it hasn't been proven to be a problem, then it's probably not a problem! Scenes that feature large, expansive views of the world, and lots of camera movement, should consider implementing this technique very early, as the added distance and massive number of visible objects will exacerbate the vertex count enormously. Scenes that are always indoors, or feature a camera with a viewpoint looking down at the world (real-time strategy and MOBA games, for example) should probably steer clear of implementing LOD from the beginning. Games somewhere between the two should avoid it until necessary. It all depends on how many vertices are expected to be visible at any given time and how much variability in camera distance there will be. Note that some game development middleware companies offer third-party tools for automated LOD mesh generation. These might be worth investigating to compare their ease of use versus quality loss versus cost effectiveness. Disable GPU Skinning As previously mentioned, we could enable GPU Skinning to reduce the burden on a CPU-bound application, but enabling this feature will push the same workload into the front end of the GPU. Since Skinning is one of those "embarrassingly parallel" processes that fits well with the GPU's parallel architecture, it is often a good idea to perform the task on the GPU. But this task can chew up precious time in the front end preparing the vertices for fragment generation, so disabling it is another option we can explore if we're bottlenecked in this area. Again, this feature can be toggled under Edit | Project Settings | Player Settings | Other Settings | GPU Skinning. GPU Skinning is available in Unity 4 Pro Edition, and all editions of Unity 5. 
Reduce tessellation

There is one last task that takes place in the front end process and that we need to consider: tessellation. Tessellation through Geometry Shaders can be a lot of fun, as it is a relatively underused technique that can really make our graphical effects stand out from the crowd of games that only use the most common effects. But, it can contribute enormously to the amount of processing work taking place in the front end.

There are no simple tricks we can exploit to improve tessellation, besides improving our tessellation algorithms, or easing the burden caused by other front end tasks to give our tessellation tasks more room to breathe. Either way, if we have a bottleneck in the front end and are making use of tessellation techniques, we should double-check that they are not consuming the lion's share of the front end's budget.

Back end bottlenecks

The back end is the more interesting part of the GPU pipeline, as many more graphical effects take place during this stage. Consequently, it is the stage that is significantly more likely to suffer from bottlenecks.

There are two brute force tests we can attempt:

Reduce resolution
Reduce texture quality

These changes will ease the workload during two important stages at the back end of the pipeline: fill rate and memory bandwidth, respectively. Fill rate tends to be the most common source of bottlenecks in the modern era of graphics rendering, so we will cover it first.

Fill rate

By reducing screen resolution, we have asked the rasterization system to generate significantly fewer fragments and transpose them over a smaller canvas of pixels. This will reduce the fill rate consumption of the application, giving a key part of the rendering pipeline some additional breathing room. Ergo, if performance suddenly improves with a screen resolution reduction, then fill rate should be our primary concern.

Fill rate is a very broad term referring to the speed at which the GPU can draw fragments.
But, this only includes fragments that have survived all of the various conditional tests we might have enabled within the given Shader. A fragment is merely a "potential pixel," and if it fails any of the enabled tests, then it is immediately discarded. This can be an enormous performance-saver as the pipeline can skip the costly drawing step and begin work on the next fragment instead.

One such example is Z-testing, which checks whether a fragment from a closer object has already been drawn to the same pixel. If so, then the current fragment is discarded. If not, then the fragment is pushed through the Fragment Shader and drawn over the target pixel, which consumes exactly one draw from our fill rate. Now imagine multiplying this process by thousands of overlapping objects, each generating hundreds or thousands of possible fragments, at high screen resolutions, causing millions, or billions, of fragments to be generated each and every frame. It should be fairly obvious that skipping as many of these draws as we can will result in big rendering cost savings.

Graphics card manufacturers typically advertise a particular fill rate as a feature of the card, usually in the form of gigapixels per second, but this is a bit of a misnomer, as it would be more accurate to call it gigafragments per second; however, this argument is mostly academic. Either way, larger values tell us that the device can potentially push more fragments through the pipeline, so with a budget of 30 GPix/s and a target frame rate of 60 Hz, we can afford to process 30,000,000,000/60 = 500 million fragments per frame before being bottlenecked on fill rate. With a resolution of 2560x1440 (about 3.7 million pixels), and a best-case scenario where each pixel is only drawn over once, we could theoretically draw the entire scene about 135 times without any noticeable problems.
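The budget arithmetic above is easy to reproduce for other cards and resolutions. The following is a minimal Python sketch, not Unity code; the function names are made up for illustration, and the result is a best-case ceiling that ignores overdraw and Shader cost.

```python
def fragments_per_frame(fill_rate_gpix_per_s: float, target_fps: float) -> float:
    """Fragments the GPU could draw per frame at its advertised fill rate."""
    return fill_rate_gpix_per_s * 1e9 / target_fps


def full_screen_draws(fragment_budget: float, width: int, height: int) -> float:
    """How many times every pixel on screen could be drawn within the budget."""
    return fragment_budget / (width * height)


# Values taken from the text: 30 GPix/s, 60 Hz, 2560x1440.
budget = fragments_per_frame(30, 60)
draws = full_screen_draws(budget, 2560, 1440)
print(f"{budget:,.0f} fragments per frame, about {draws:.1f} full-screen draws")
```

With these inputs, the script reports 500,000,000 fragments per frame and about 135.6 full-screen draws.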
Sadly, this is not a perfect world, and unless we take significant steps to avoid it, we will always end up with some amount of redraw over the same pixels due to the order in which objects are rendered. This is known as overdraw, and it can be very costly if we're not careful. The reason that resolution is a good attack vector to check for fill rate bounding is that it is a multiplier. A reduction from a resolution of 2560x1440 to 800x600 is an improvement factor of about eight, which could reduce fill rate costs enough to make the application perform well again. Overdraw Determining how much overdraw we have can be represented visually by rendering all objects with additive alpha blending and a very transparent flat color. Areas of high overdraw will show up more brightly as the same pixel is drawn over with additive blending multiple times. This is precisely how the Scene view's Overdraw shading mode reveals how much overdraw our scene is suffering. The following screenshot shows a scene with several thousand boxes drawn normally, and drawn using the Scene view's Overdraw shading mode: At the end of the day, fill rate is provided as a means of gauging the best-case behavior. In other words, it's primarily a marketing term and mostly theoretical. But, the technical side of the industry has adopted the term as a way of describing the back end of the pipeline: the stage where fragment data is funneled through our Shaders and drawn to the screen. If every fragment required an absolute minimum level of processing (such as a Shader that returned a constant color), then we might get close to that theoretical maximum. The GPU is a complex beast, however, and things are never so simple. The nature of the device means it works best when given many small tasks to perform. But, if the tasks get too large, then fill rate is lost due to the back end not being able to push through enough fragments in time and the rest of the pipeline is left waiting for tasks to do. 
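The overdraw visualization described above boils down to counting how many times each pixel is written. As a rough illustration, the Python sketch below does this on the CPU for a handful of screen-space rectangles; the rectangles and function names are hypothetical, and it assumes no fragment is rejected by depth testing.

```python
def overdraw_map(width, height, rects):
    """Count how many times each pixel is drawn when every rectangle is
    rendered with no depth rejection (the worst case for overdraw)."""
    counts = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in rects:  # half-open pixel bounds
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                counts[y][x] += 1
    return counts


def average_overdraw(counts):
    """Fragments drawn divided by the number of pixels touched at least once."""
    drawn = sum(sum(row) for row in counts)
    touched = sum(1 for row in counts for c in row if c > 0)
    return drawn / touched if touched else 0.0


# Three hypothetical overlapping boxes on an 8x8 "screen".
rects = [(0, 0, 4, 4), (2, 2, 6, 6), (3, 3, 5, 5)]
counts = overdraw_map(8, 8, rects)
print(max(max(row) for row in counts), average_overdraw(counts))
```

An average of 1.0 would mean every covered pixel was drawn exactly once; anything higher is fill rate spent on pixels that were drawn over.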
There are several more features that can potentially consume our theoretical fill rate maximum, including but not limited to alpha testing, alpha blending, texture sampling, the amount of fragment data being pulled through our Shaders, and even the color format of the target render texture (the final Frame Buffer in most cases). The bad news is that this gives us a lot of subsections to cover, and a lot of ways to break the process, but the good news is it gives us a lot of avenues to explore to improve our fill rate usage. Occlusion Culling One of the best ways to reduce overdraw is to make use of Unity's Occlusion Culling system. The system works by partitioning Scene space into a series of cells and flying through the world with a virtual camera making note of which cells are invisible from other cells (are occluded) based on the size and position of the objects present. Note that this is different to the technique of Frustum Culling, which culls objects not visible from the current camera view. This feature is always active in all versions, and objects culled by this process are automatically ignored by the Occlusion Culling system. Occlusion Culling is available in the Unity 4 Pro Edition and all editions of Unity 5. Occlusion Culling data can only be generated for objects properly labeled Occluder Static and Occludee Static under the StaticFlags dropdown. Occluder Static is the general setting for static objects where we want it to hide other objects, and be hidden by large objects in its way. Occludee Static is a special case for transparent objects that allows objects behind them to be rendered, but we want them to be hidden if something large blocks their visibility. Naturally, because one of the static flags must be enabled for Occlusion Culling, this feature will not work for dynamic objects. 
The following screenshot shows how effective Occlusion Culling can be at reducing the number of visible objects in our Scene: This feature will cost us in both application footprint and incur some runtime costs. It will cost RAM to keep the Occlusion Culling data structure in memory, and there will be a CPU processing cost to determine which objects are being occluded in each frame. The Occlusion Culling data structure must be properly configured to create cells of the appropriate size for our Scene, and the smaller the cells, the longer it takes to generate the data structure. But, if it is configured correctly for the Scene, Occlusion Culling can provide both fill rate savings through reduced overdraw, and Draw Call savings by culling non-visible objects. Shader optimization Shaders can be a significant fill rate consumer, depending on their complexity, how much texture sampling takes place, how many mathematical functions are used, and so on. Shaders do not directly consume fill rate, but do so indirectly because the GPU must calculate or fetch data from memory during Shader processing. The GPU's parallel nature means any bottleneck in a thread will limit how many fragments can be pushed into the thread at a later date, but parallelizing the task (sharing small pieces of the job between several agents) provides a net gain over serial processing (one agent handling each task one after another). The classic example is a vehicle assembly line. A complete vehicle requires multiple stages of manufacture to complete. The critical path to completion might involve five steps: stamping, welding, painting, assembly, and inspection, and each step is completed by a single team. For any given vehicle, no stage can begin before the previous one is finished, but whatever team handled the stamping for the last vehicle can begin stamping for the next vehicle as soon as it has finished. 
This organization allows each team to become masters of their particular domain, rather than trying to spread their knowledge too thin, which would likely result in less consistent quality in the batch of vehicles. We can double the overall output by doubling the number of teams, but if any team gets blocked, then precious time is lost for any given vehicle, as well as all future vehicles that would pass through the same team. If these delays are rare, then they can be negligible in the grand scheme, but if not, and one stage takes several minutes longer than normal each and every time it must complete the task, then it can become a bottleneck that threatens the release of the entire batch. The GPU parallel processors work in a similar way: each processor thread is an assembly line, each processing stage is a team, and each fragment is a vehicle. If the thread spends a long time processing a single stage, then time is lost on each fragment. This delay will multiply such that all future fragments coming through the same thread will be delayed. This is a bit of an oversimplification, but it often helps to paint a picture of how poorly optimized Shader code can chew up our fill rate, and how small improvements in Shader optimization provide big benefits in back end performance. Shader programming and optimization have become a very niche area of game development. Their abstract and highly-specialized nature requires a very different kind of thinking to generate Shader code compared to gameplay and engine code. They often feature mathematical tricks and back-door mechanisms for pulling data into the Shader, such as precomputing values in texture files. Because of this, and the importance of optimization, Shaders tend to be very difficult to read and reverse-engineer. Consequently, many developers rely on prewritten Shaders, or visual Shader creation tools from the Asset Store such as Shader Forge or Shader Sandwich. 
This simplifies the act of initial Shader code generation, but might not result in the most efficient form of Shaders. If we're relying on pre-written Shaders or tools, we might find it worthwhile to perform some optimization passes over them using some tried-and-true techniques. So, let's focus on some easily reachable ways of optimizing our Shaders. Consider using Shaders intended for mobile platforms The built-in mobile Shaders in Unity do not have any specific restrictions that force them to only be used on mobile devices. They are simply optimized for minimum resource usage (and tend to feature some of the other optimizations listed in this section). Desktop applications are perfectly capable of using these Shaders, but they tend to feature a loss of graphical quality. It only becomes a question of whether the loss of graphical quality is acceptable. So, consider doing some testing with the mobile equivalents of common Shaders to see whether they are a good fit for your game. Use small data types GPUs can calculate with smaller data types more quickly than larger types (particularly on mobile platforms!), so the first tweak we can attempt is replacing our float data types (32-bit, floating point) with smaller versions such as half (16-bit, floating point), or even fixed (12-bit, fixed point). The size of the data types listed above will vary depending on what floating point formats the target platform prefers. The sizes listed are the most common. The importance for optimization is in the relative size between formats. Color values are good candidates for precision reduction, as we can often get away with less precise color values without any noticeable loss in coloration. However, the effects of reducing precision can be very unpredictable for graphical calculations. So, changes such as these can require some testing to verify whether the reduced precision is costing too much graphical fidelity. 
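To get a feel for why color values tolerate reduced precision while other quantities may not, the following Python sketch rounds values to IEEE 754 half precision using the standard struct module. This is only an approximation of the behavior described above: it assumes the GPU's half type is a 16-bit IEEE float, which, as noted, depends on the target platform, and the to_half helper is made up for illustration.

```python
import struct


def to_half(value: float) -> float:
    """Round a Python float to IEEE 754 half precision (16-bit) and back."""
    return struct.unpack("e", struct.pack("e", value))[0]


# Every 8-bit color level (0..255 mapped to 0.0..1.0) survives the round trip.
color_levels_preserved = all(
    round(to_half(level / 255) * 255) == level for level in range(256)
)

# A large coordinate does not: half precision has a spacing of 1.0 in this range.
position = 1234.567
position_as_half = to_half(position)

print(color_levels_preserved, position_as_half)
```

Under this assumption, normalized colors lose nothing visible, whereas a value such as a world-space position loses its entire fractional part, which is the kind of effect that testing should catch.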
Note that the effects of these tweaks can vary enormously between one GPU architecture and another (for example, AMD versus Nvidia versus Intel), and even GPU brands from the same manufacturer. In some cases, we can make some decent performance gains for a trivial amount of effort. In other cases, we might see no benefit at all.

Avoid changing precision while swizzling

Swizzling is the Shader programming technique of creating a new vector (an array of values) from an existing vector by listing the components in the order in which we wish to copy them into the new structure. Here are some examples of swizzling:

    float4 input = float4(1.0, 2.0, 3.0, 4.0); // initial test value
    float2 val1 = input.yz;   // swizzle two components
    float3 val2 = input.zyx;  // swizzle three components in a different order
    float4 val3 = input.yyyy; // swizzle the same component multiple times
    float sclr = input.w;
    float3 val4 = sclr.xxx;   // swizzle a scalar multiple times

We can use both the xyzw and rgba representations to refer to the same components, sequentially. It does not matter whether it is a color or vector; they just make the Shader code easier to read. We can also list components in any order we like to fill in the desired data, repeating them if necessary.

Converting from one precision type to another in a Shader can be a costly operation, but converting the precision type while simultaneously swizzling can be particularly painful. If we have mathematical operations that rely on being swizzled into different precision types, it would be wiser if we simply absorbed the high-precision cost from the very beginning, or reduced precision across the board to avoid the need for changes in precision.
Use GPU-optimized helper functions

The Shader compiler often performs a good job of reducing mathematical calculations down to an optimized version for the GPU, but compiled custom code is unlikely to be as effective as both the Cg library's built-in helper functions and the additional helpers provided by the Unity Cg included files. If we are using Shaders that include custom function code, perhaps we can find an equivalent helper function within the Cg or Unity libraries that can do a better job than our custom code can.

These extra include files can be added to our Shader within the CGPROGRAM block like so:

    CGPROGRAM
    // other includes
    #include "UnityCG.cginc"
    // Shader code here
    ENDCG

Example Cg library functions to use are abs() for absolute values, lerp() for linear interpolation, mul() for multiplying matrices, and step() for step functionality. Useful UnityCG.cginc functions include WorldSpaceViewDir() for calculating the direction towards the camera, and Luminance() for converting a color to grayscale.

Check the following URL for a full list of Cg standard library functions: http://http.developer.nvidia.com/CgTutorial/cg_tutorial_appendix_e.html. Check the Unity documentation for a complete and up-to-date list of possible include files and their accompanying helper functions: http://docs.unity3d.com/Manual/SL-BuiltinIncludes.html.

Disable unnecessary features

Perhaps we can make savings by simply disabling Shader features that aren't vital. Does the Shader really need multiple passes, transparency, Z-writing, alpha-testing, and/or alpha blending? Will tweaking these settings or removing these features give us a good approximation of our desired effect without losing too much graphical fidelity? Making such changes is a good way of making fill rate cost savings.

Remove unnecessary input data

Sometimes the process of writing a Shader involves a lot of back and forth experimentation in editing code and viewing it in the Scene.
The typical result of this is that input data that was needed when the Shader was going through early development is now surplus fluff once the desired effect has been obtained, and it's easy to forget what changes were made when/if the process drags on for a long time. But, these redundant data values can cost the GPU valuable time as they must be fetched from memory even if they are not explicitly used by the Shader. So, we should double check our Shaders to ensure all of their input geometry, vertex, and fragment data is actually being used. Only expose necessary variables Exposing unnecessary variables from our Shader to the accompanying Material(s) can be costly as the GPU can't assume these values are constant. This means the Shader code cannot be compiled into a more optimized form. This data must be pushed from the CPU with every pass since they can be modified at any time through the Material's methods such as SetColor(), SetFloat(), and so on. If we find that, towards the end of the project, we always use the same value for these variables, then they can be replaced with a constant in the Shader to remove such excess runtime workload. The only cost is obfuscating what could be critical graphical effect parameters, so this should be done very late in the process. Reduce mathematical complexity Complicated mathematics can severely bottleneck the rendering process, so we should do whatever we can to limit the damage. Complex mathematical functions could be replaced with a texture that is fed into the Shader and provides a pre-generated table for runtime lookup. We may not see any improvement with functions such as sin and cos, since they've been heavily optimized to make use of GPU architecture, but complex methods such as pow, exp, log, and other custom mathematical processes can only be optimized so much, and would be good candidates for simplification. 
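The lookup-table idea can be prototyped outside the engine before any art assets are changed. The Python sketch below bakes a function into a 256-entry table and reads it back with linear interpolation, roughly what a filtered fetch from a one-dimensional texture would do. The function names, the table size, and the choice of x^2.2 as the "expensive" function are assumptions for illustration, and the sketch ignores the quantization a real 8-bit texture channel would add.

```python
def build_lut(func, size=256):
    """Precompute func over [0, 1] at evenly spaced points, the way a
    script could bake values into a one-dimensional texture."""
    return [func(i / (size - 1)) for i in range(size)]


def sample_lut(lut, x):
    """Look up x in [0, 1] with linear interpolation between neighboring
    entries, mimicking a filtered texture fetch."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(lut) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(lut) - 1)
    t = pos - lo
    return lut[lo] * (1.0 - t) + lut[hi] * t


# Hypothetical expensive function: a gamma-style power curve.
gamma_lut = build_lut(lambda v: v ** 2.2)
worst_error = max(
    abs(sample_lut(gamma_lut, i / 1000) - (i / 1000) ** 2.2) for i in range(1001)
)
print(f"worst absolute error: {worst_error:.6f}")
```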
This is assuming we only need one or two input values, which are represented through the X and Y coordinates of the texture, and mathematical accuracy isn't of paramount importance. This will cost us additional graphics memory to store the texture at runtime (more on this later), but if the Shader is already receiving a texture (which they are in most cases) and the alpha channel is not being used, then we could sneak the data in through the texture's alpha channel, costing us literally no performance, and the rest of the Shader code and graphics system would be none-the-wiser. This will involve the customization of art assets to include such data in any unused color channel(s), requiring coordination between programmers and artists, but is a very good way of saving Shader processing costs with no runtime sacrifices. In fact, Material properties and textures are both excellent entry points for pushing work from the Shader (the GPU) onto the CPU. If a complex calculation does not need to vary on a per pixel basis, then we could expose the value as a property in the Material, and modify it as needed (accepting the overhead cost of doing so from the previous section Only expose necessary variables). Alternatively, if the result varies per pixel, and does not need to change often, then we could generate a texture file from script code, containing the results of the calculations in the RGBA values, and pulling the texture into the Shader. Lots of opportunities arise when we ignore the conventional application of such systems, and remember to think of them as just raw data being transferred around. Reduce texture lookups While we're on the subject of texture lookups, they are not trivial tasks for the GPU to process and they have their own overhead costs. 
They are the most common cause of memory access problems within the GPU, especially if a Shader is performing samples across multiple textures, or even multiple samples across a single texture, as they will likely inflict cache misses in memory. Such situations should be simplified as much as possible to avoid severe GPU memory bottlenecking. Even worse, sampling a texture in a random order would likely result in some very costly cache misses for the GPU to suffer through, so if this is being done, then the texture should be reordered so that it can be sampled in a more sequential order. Avoid conditional statements In modern day CPU architecture, conditional statements undergo a lot of clever predictive techniques to make use of instruction-level parallelism. This is a feature where the CPU attempts to predict which direction a conditional statement will go in before it has actually been resolved, and speculatively begins processing the most likely result of the conditional using any free components that aren't being used to resolve the conditional (fetching some data from memory, copying some floats into unused registers, and so on). If it turns out that the decision is wrong, then the current result is discarded and the proper path is taken instead. So long as the cost of speculative processing and discarding false results is less than the time spent waiting to decide the correct path, and it is right more often than it is wrong, then this is a net gain for the CPU's speed. However, this feature is not possible on GPU architecture because of its parallel nature. The GPU's cores are typically managed by some higher-level construct that instructs all cores under its command to perform the same machine-code-level instruction simultaneously. So, if the Fragment Shader requires a float to be multiplied by 2, then the process will begin by having all cores copy data into the appropriate registers in one coordinated step. 
Only when all cores have finished copying to the registers will the cores be instructed to begin the second step: multiplying all registers by 2. Thus, when this system stumbles into a conditional statement, it cannot resolve the two statements independently. It must determine how many of its child cores will go down each path of the conditional, grab the list of required machine code instructions for one path, resolve them for all cores taking that path, and repeat for each path until all possible paths have been processed. So, for an if-else statement (two possibilities), it will tell one group of cores to process the "true" path, then ask the remaining cores to process the "false" path. Unless every core takes the same path, it must process both paths every time. So, we should avoid branching and conditional statements in our Shader code. Of course, this depends on how essential the conditional is to achieving the graphical effect we desire. But, if the conditional is not dependent on per pixel behavior, then we would often be better off absorbing the cost of unnecessary mathematics than inflicting a branching cost on the GPU. For example, we might be checking whether a value is non-zero before using it in a calculation, or comparing against some global flag in the Material before taking one action or another. Both of these cases would be good candidates for optimization by removing the conditional check. Reduce data dependencies The compiler will try its best to optimize our Shader code into the more GPU-friendly low-level language so that it is not waiting on data to be fetched when it could be processing some other task. 
For example, the following poorly-optimized code could be written in our Shader:

    float sum = input.color1.r;
    sum = sum + input.color2.g;
    sum = sum + input.color3.b;
    sum = sum + input.color4.a;
    float result = calculateSomething(sum);

If we were able to force the Shader compiler to compile this code into machine code instructions as it is written, then this code has a data dependency such that each calculation cannot begin until the last finishes due to the dependency on the sum variable. But, such situations are often detected by the Shader compiler and optimized into a version that uses instruction-level parallelism (the code shown next is the high-level code equivalent of the resulting machine code):

    float sum1, sum2, sum3, sum4;
    sum1 = input.color1.r;
    sum2 = input.color2.g;
    sum3 = input.color3.b;
    sum4 = input.color4.a;
    float sum = sum1 + sum2 + sum3 + sum4;
    float result = calculateSomething(sum);

In this case, the compiler would recognize that it can fetch the four values from memory in parallel and complete the summation once all four have been fetched independently via instruction-level parallelism. This can save a lot of time, relative to performing the four fetches one after another.

However, long chains of data dependency can absolutely murder Shader performance. If we create a strong data dependency in our Shader's source code, then the compiler has been given no freedom to make such optimizations. For example, the following data dependency would be painful on performance, as one step cannot be completed without waiting on another to fetch data and performing the appropriate calculation.

    float4 val1 = tex2D(_tex1, input.texcoord.xy);
    float4 val2 = tex2D(_tex2, val1.yz);
    float4 val3 = tex2D(_tex3, val2.zw);

Strong data dependencies such as these should be avoided whenever possible.
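A deliberately idealized cost model shows why the chained texture fetches above hurt. The Python sketch below assumes independent fetches overlap perfectly and dependent fetches cannot overlap at all; real hardware sits somewhere in between, and the latency figures are invented for illustration.

```python
def independent_fetch_time(latencies):
    """Fetches that do not depend on each other can be issued together,
    so the wait is governed by the slowest one."""
    return max(latencies)


def dependent_fetch_time(latencies):
    """Each fetch needs the previous result for its coordinates,
    so the waits add up."""
    return sum(latencies)


fetch_latency = [100, 100, 100]  # hypothetical cycles per texture fetch
print(independent_fetch_time(fetch_latency))  # 100
print(dependent_fetch_time(fetch_latency))    # 300
```

Under this model, three dependent fetches cost three times as much as three independent ones, and the gap grows with every link added to the chain.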
Surface Shaders If we're using Unity's Surface Shaders, which are a way for Unity developers to get to grips with Shader programming in a more simplified fashion, then the Unity Engine takes care of converting our Surface Shader code for us, abstracting away some of the optimization opportunities we have just covered. However, it does provide some miscellaneous values that can be used as replacements, which reduce accuracy but simplify the mathematics in the resulting code. Surface Shaders are designed to handle the general case fairly efficiently, but optimization is best achieved with a personal touch. The approxview attribute will approximate the view direction, saving costly operations. halfasview will reduce the precision of the view vector, but beware of its effect on mathematical operations involving multiple precision types. noforwardadd will limit the Shader to only considering a single directional light, reducing Draw Calls since the Shader will render in only a single pass, but reducing lighting complexity. Finally, noambient will disable ambient lighting in the Shader, removing some extra mathematical operations that we may not need. Use Shader-based LOD We can force Unity to render distant objects using simpler Shaders, which can be an effective way of saving fill rate, particularly if we're deploying our game onto multiple platforms or supporting a wide range of hardware capability. The LOD keyword can be used in the Shader to set the onscreen size factor that the Shader supports. If the current LOD level does not match this value, it will drop to the next fallback Shader and so on until it finds the Shader that supports the given size factor. We can also change a given Shader object's LOD value at runtime using the maximumLOD property. This feature is similar to the mesh-based LOD covered earlier, and uses the same LOD values for determining object form factor, so it should be configured as such. 
Memory bandwidth Another major component of back end processing and a potential source of bottlenecks is memory bandwidth. Memory bandwidth is consumed whenever a texture must be pulled from a section of the GPU's main video memory (also known as VRAM). The GPU contains multiple cores that each have access to the same area of VRAM, but they also each contain a much smaller, local Texture Cache that stores the current texture(s) the GPU has been most recently working with. This is similar in design to the multitude of CPU cache levels that allow memory transfer up and down the chain, as a workaround for the fact that faster memory will, invariably, be more expensive to produce, and hence smaller in capacity compared to slower memory. Whenever a Fragment Shader requests a sample from a texture that is already within the core's local Texture Cache, then it is lightning fast and barely perceivable. But, if a texture sample request is made, that does not yet exist within the Texture Cache, then it must be pulled in from VRAM before it can be sampled. This fetch request risks cache misses within VRAM as it tries to find the relevant texture. The transfer itself consumes a certain amount of memory bandwidth, specifically an amount equal to the total size of the texture file stored within VRAM (which may not be the exact size of the original file, nor the size in RAM, due to GPU-level compression). It's for this reason that, if we're bottlenecked on memory bandwidth, then performing a brute force test by reducing texture quality would suddenly result in a performance improvement. We've shrunk the size of our textures, easing the burden on the GPU's memory bandwidth, allowing it to fetch the necessary textures much quicker. Globally reducing texture quality can be achieved by going to Edit | Project Settings | Quality | Texture Quality and setting the value to Half Res, Quarter Res, or Eighth Res. 
In the event that memory bandwidth is bottlenecked, then the GPU will keep fetching the necessary texture files, but the entire process will be throttled as the Texture Cache waits for the data to appear before processing the fragment. The GPU won't be able to push data back to the Frame Buffer in time to be rendered onto the screen, blocking the whole process and culminating in a poor frame rate. Ultimately, proper usage of memory bandwidth is a budgeting concern. For example, with a memory bandwidth of 96 GB/sec per core and a target frame rate of 60 frames per second, then the GPU can afford to pull 96/60 = 1.6 GB worth of texture data every frame before being bottlenecked on memory bandwidth. Memory bandwidth is often listed on a per core basis, but some GPU manufacturers may try to mislead you by multiplying memory bandwidth by the number of cores in order to list a bigger, but less practical number. Because of this, research may be necessary to confirm the memory bandwidth limit we have for the target GPU hardware is given on a per core basis. Note that this value is not the maximum limit on the texture data that our game can contain in the project, nor in CPU RAM, not even in VRAM. It is a metric that limits how much texture swapping can occur during one frame. The same texture could be pulled back and forth multiple times in a single frame depending on how many Shaders need to use them, the order that the objects are rendered, and how often texture sampling must occur, so rendering just a few objects could consume whole gigabytes of memory bandwidth if they all require the same high quality, massive textures, require multiple secondary texture maps (normal maps, emission maps, and so on), and are not batched together, because there simply isn't enough Texture Cache space available to keep a single texture file long enough to exploit it during the next rendering pass. There are several approaches we can take to solve bottlenecks in memory bandwidth. 
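Before looking at those approaches, the budgeting arithmetic above can be turned into a quick estimate. This is a minimal Python sketch with made-up function names; it assumes uncompressed textures without Mip Maps and treats 1 GB as 1,024 MB, so real figures will differ once GPU-level compression is involved.

```python
def bandwidth_budget_per_frame_gb(bandwidth_gb_per_s: float, target_fps: float) -> float:
    """Texture data that can be pulled from VRAM in one frame."""
    return bandwidth_gb_per_s / target_fps


def texture_size_mb(width: int, height: int, bits_per_pixel: int) -> float:
    """Uncompressed texture size in megabytes, without Mip Maps."""
    return width * height * bits_per_pixel / 8 / (1024 * 1024)


budget_gb = bandwidth_budget_per_frame_gb(96, 60)   # figures from the text
size_mb = texture_size_mb(2048, 2048, 32)           # hypothetical 2048x2048 RGBA texture
fetches_per_frame = budget_gb * 1024 / size_mb

print(f"{budget_gb:.1f} GB per frame, {size_mb:.0f} MB per texture, "
      f"about {fetches_per_frame:.0f} fetches per frame")
```

With these inputs, the budget is 1.6 GB per frame, which is exhausted after roughly a hundred pulls of a single 16 MB texture.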
Use less texture data
This approach is simple, straightforward, and always a good idea to consider. Reducing texture quality, either through resolution or bit rate, is not ideal for graphical quality, but we can sometimes get away with using 16-bit textures without any noticeable degradation. Mip Maps are another excellent way of reducing the amount of texture data being pushed back and forth between VRAM and the Texture Cache. Note that the Scene View has a Mipmaps Shading Mode, which will highlight textures in our scene blue or red depending on whether the current texture scale is appropriate for the current Scene View's camera position and orientation. This will help identify which textures are good candidates for further optimization. Mip Maps should almost always be used in 3D Scenes, unless the camera moves very little.

Test different GPU Texture Compression formats
Texture Compression techniques help reduce our application's footprint (executable file size) and runtime CPU memory usage, that is, the storage area where all texture resource data is kept until it is needed by the GPU. However, once the data reaches the GPU, it uses a different form of compression to keep texture data small. The common formats are DXT, PVRTC, ETC, and ASTC. To make matters more confusing, each platform and GPU hardware supports different compression formats, and if the device does not support the given compression format, then it will be handled at the software level. In other words, the CPU will need to stop and recompress the texture to the format the GPU wants, as opposed to the GPU taking care of it with a specialized hardware chip. The compression options are only available if a texture resource has its Texture Type field set to Advanced.
Using any of the other texture type settings will simplify the choices, and Unity will make a best guess when deciding which format to use for the target platform, which may not be ideal for a given piece of hardware and thus will consume more memory bandwidth than necessary. The best approach to determining the correct format is to simply test a bunch of different devices and Texture Compression techniques and find one that fits. For example, common wisdom says that ETC is the best choice for Android since more devices support it, but some developers have found their game works better with the DXT and PVRTC formats on certain devices. Beware that, if we're at the point where individually tweaking Texture Compression techniques is necessary, then hopefully we have exhausted all other options for reducing memory bandwidth. By going down this road, we could be committing to supporting many different devices each in their own specific way. Many of us would prefer to keep things simple with a general solution instead of personal customization and time-consuming handiwork to work around problems like this. Minimize texture sampling Can we modify our Shaders to remove some texture sampling overhead? Did we add some extra texture lookup files to give ourselves some fill rate savings on mathematical functions? If so, we might want to consider lowering the resolution of such textures or reverting the changes and solving our fill rate problems in other ways. Essentially, the less texture sampling we do, the less often we need to use memory bandwidth and the closer we get to resolving the bottleneck. Organize assets to reduce texture swaps This approach basically comes back to Batching and Atlasing again. Are there opportunities to batch some of our biggest texture files together? If so, then we could save the GPU from having to pull in the same texture files over and over again during the same frame. 
As a last resort, we could look for ways to remove some textures from the entire project and reuse similar files. For instance, if we have fill rate budget to spare, then we may be able to use some Fragment Shaders to make a handful of texture files appear in our game with different color variations.

VRAM limits
One last consideration related to textures is how much VRAM we have available. Most texture transfer from CPU to GPU occurs during initialization, but can also occur when a texture that is not yet in VRAM is first required by the current view. This process is asynchronous and will result in a blank texture being used until the full texture is ready for rendering. As such, we should avoid too much texture variation across our Scenes.

Texture preloading
Even though it doesn't strictly relate to graphics performance, it is worth mentioning that the blank texture that is used during asynchronous texture loading can be jarring when it comes to game quality. We would like a way to control and force the texture to be loaded from disk to the main memory and then to VRAM before it is actually needed. A common workaround is to create a hidden GameObject that features the texture and place it somewhere in the Scene on the route that the player will take towards the area where it is actually needed. As soon as the textured object becomes a candidate for the rendering system (even if it's technically hidden), it will begin the process of copying the data towards VRAM. This is a little clunky, but is easy to implement and works sufficiently well in most cases. We can also control such behavior via Script code by changing a hidden Material's texture:
GetComponent<Renderer>().material.mainTexture = textureToPreload;

Texture thrashing
In the rare event that too much texture data is loaded into VRAM, and the required texture is not present, the GPU will need to request it from the main memory and overwrite the existing texture data to make room.
This is likely to worsen over time as the memory becomes fragmented, and it introduces a risk that the texture just flushed from VRAM needs to be pulled again within the same frame. This will result in a serious case of memory "thrashing", and should be avoided at all costs. This is less of a concern on modern consoles such as the PS4, Xbox One, and WiiU, since they share a common memory space for both CPU and GPU. This design is a hardware-level optimization given the fact that the device is always running a single application, and almost always rendering 3D graphics. But, all other platforms must share time and space with multiple applications and be capable of running without a GPU. They therefore feature separate CPU and GPU memory, and we must ensure that the total texture usage at any given moment remains below the available VRAM of the target hardware. Note that this "thrashing" is not precisely the same as hard disk thrashing, where memory is copied back and forth between main memory and virtual memory (the swap file), but it is analogous. In either case, data is being unnecessarily copied back and forth between two regions of memory because too much data is being requested in too short a time period for the smaller of the two memory regions to hold it all. Thrashing such as this can be a common cause of dreadful graphics performance when games are ported from modern consoles to the desktop and should be treated with care. Avoiding this behavior may require customizing texture quality and file sizes on a per-platform and per-device basis. Be warned that some players are likely to notice these inconsistencies if we're dealing with hardware from the same console or desktop GPU generation. As many of us will know, even small differences in hardware can lead to a lot of apples-versus-oranges comparisons, but hardcore gamers will expect a similar level of quality across the board. 
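One way to approach the per-device customization described above is to pick a texture quality level at startup from the amount of video memory the device reports. This is a minimal sketch rather than a recommended policy: it assumes `SystemInfo.graphicsMemorySize` (reported in megabytes) and the Unity 5 `QualitySettings.masterTextureLimit` property, and the `VramBasedTextureQuality` class name and the thresholds are placeholders chosen for illustration.

```csharp
using UnityEngine;

// Hypothetical startup script: picks a texture quality level from reported VRAM.
public class VramBasedTextureQuality : MonoBehaviour
{
    void Awake()
    {
        // Approximate video memory in megabytes, as reported by the driver.
        int vramMB = SystemInfo.graphicsMemorySize;

        // Placeholder thresholds; tune these against the project's real texture usage.
        int mipLimit;
        if (vramMB >= 2048)
        {
            mipLimit = 0; // full resolution
        }
        else if (vramMB >= 1024)
        {
            mipLimit = 1; // half resolution
        }
        else
        {
            mipLimit = 2; // quarter resolution
        }

        QualitySettings.masterTextureLimit = mipLimit;
    }
}
```

On platforms with shared CPU and GPU memory, the reported value may not map cleanly onto a texture budget, so any such thresholds would need testing on real hardware.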
Lighting and Shadowing Lighting and Shadowing can affect all parts of the graphics pipeline, and so they will be treated separately. This is perhaps one of the most important parts of game art and design to get right. Good Lighting and Shadowing can turn a mundane scene into something spectacular as there is something magical about professional coloring that makes it visually appealing. Even the low-poly art style (think Monument Valley) relies heavily on a good lighting and shadowing profile in order to allow the player to distinguish one object from another. But, this isn't an art book, so we will focus on the performance characteristics of various Lighting and Shadowing features. Unity offers two styles of dynamic light rendering, as well as baked lighting effects through lightmaps. It also provides multiple ways of generating shadows with varying levels of complexity and runtime processing cost. Between the two, there are a lot of options to explore, and a lot of things that can trip us up if we're not careful. The Unity documentation covers all of these features in an excellent amount of detail (start with this page and work through them: http://docs.unity3d.com/Manual/Lighting.html), so we'll examine these features from a performance standpoint. Let's tackle the two main light rendering modes first. This setting can be found under Edit | Project Settings | Player | Other Settings | Rendering, and can be configured on a per-platform basis. Forward Rendering Forward Rendering is the classical form of rendering lights in our scene. Each object is likely to be rendered in multiple passes through the same Shader. How many passes are required will be based on the number, distance, and brightness of light sources. Unity will try to prioritize which directional light is affecting the object the most and render the object in a "base pass" as a starting point. 
It will then take up to four of the most powerful point lights nearby and re-render the same object multiple times through the same Fragment Shader. The next four point lights will then be processed on a per-vertex basis. All remaining lights are treated as a giant blob by means of a technique called spherical harmonics. Some of this behavior can be simplified by setting a light's Render Mode to values such as Not Important, and changing the value of Edit | Project Settings | Quality | Pixel Light Count. This value limits how many lights will be treated on a per pixel basis, but is overridden by any lights with a Render Mode set to Important. It is therefore up to us to use this combination of settings responsibly. As you can imagine, the design of Forward Rendering can utterly explode our Draw Call count very quickly in scenes with a lot of point lights present, due to the number of render states being configured and Shader passes being reprocessed. CPU-bound applications should avoid this rendering mode if possible. More information on Forward Rendering can be found in the Unity documentation: http://docs.unity3d.com/Manual/RenderTech-ForwardRendering.html. Deferred Shading Deferred Shading or Deferred Rendering as it is sometimes known, is only available on GPUs running at least Shader Model 3.0. In other words, any desktop graphics card made after around 2004. The technique has been around for a while, but it has not resulted in a complete replacement of the Forward Rendering method due to the caveats involved and limited support on mobile devices. Anti-aliasing, transparency, and animated characters receiving shadows are all features that cannot be managed through Deferred Shading alone and we must use the Forward Rendering technique as a fallback. Deferred Shading is so named because actual shading does not occur until much later in the process; that is, it is deferred until later. 
From a performance perspective, the results are quite impressive as it can generate very good per pixel lighting with surprisingly little Draw Call effort. The advantage is that a huge amount of lighting can be accomplished using only a single pass through the lighting Shader. The main disadvantages include the additional costs if we wish to pile on advanced lighting features such as Shadowing and any steps that must pass through Forward Rendering in order to complete, such as transparency. The Unity documentation contains an excellent source of information on the Deferred Shading technique, its advantages, and its pitfalls: http://docs.unity3d.com/Manual/RenderTech-DeferredShading.html Vertex Lit Shading (legacy) Technically, there are more than two lighting methods. Unity allows us to use a couple of legacy lighting systems, only one of which may see actual use in the field: Vertex Lit Shading. This is a massive simplification of lighting, as lighting is only considered per vertex, and not per pixel. In other words, entire faces are colored based on the incoming light color, and not individual pixels. It is not expected that many, or really any, 3D games will make use of this legacy technique, as a lack of shadows and proper lighting make visualizations of depth very difficult. It is mostly relegated to 2D games that don't intend to make use of shadows, normal maps, and various other lighting features, but it is there if we need it. Real-time Shadows Soft Shadows are expensive, Hard Shadows are cheap, and No Shadows are free. Shadow Resolution, Shadow Projection, Shadow Distance, and Shadow Cascades are all settings we can find under Edit | Project Settings | Quality | Shadows that we can use to modify the behavior and complexity of our shadowing passes. That summarizes almost everything we need to know about Unity's real-time shadowing techniques from a high-level performance standpoint. 
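The shadow settings listed above can also be adjusted at runtime through `QualitySettings`. The sketch below assumes the `shadowDistance`, `shadowCascades`, and `shadowProjection` properties; the `ShadowSettingsExample` class, the method name, and the specific values are illustrative assumptions and not recommendations.

```csharp
using UnityEngine;

// Hypothetical example of trading shadow quality for performance at runtime.
public class ShadowSettingsExample : MonoBehaviour
{
    public void ApplyCheaperShadows()
    {
        // Stop rendering shadows beyond 40 world units from the camera.
        QualitySettings.shadowDistance = 40f;

        // Use two cascades instead of four for directional light shadows.
        QualitySettings.shadowCascades = 2;

        // Stable Fit trades some sharpness for less shimmering as the camera moves.
        QualitySettings.shadowProjection = ShadowProjection.StableFit;
    }
}
```

A method like this could be wired to an in-game options menu so that shadow cost is adjustable per user.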
We will cover shadows more in the following section on optimizing our lighting effects. Lighting optimization With a cursory glance at all of the relevant lighting techniques, let's run through some techniques we can use to improve lighting costs. Use the appropriate Shading Mode It is worth testing both of the main rendering modes to see which one best suits our game. Deferred Shading is often used as a backup in the event that Forward Rendering is becoming a burden on performance, but it really depends on where else we're finding bottlenecks as it is sometimes difficult to tell the difference between them. Use Culling Masks A Light Component's Culling Mask property is a layer-based mask that can be used to limit which objects will be affected by the given Light. This is an effective way of reducing lighting overhead, assuming that the layer interactions also make sense with how we are using layers for physics optimization. Objects can only be a part of a single layer, and reducing physics overhead probably trumps lighting overhead in most cases; thus, if there is a conflict, then this may not be the ideal approach. Note that there is limited support for Culling Masks when using Deferred Shading. Because of the way it treats lighting in a very global fashion, only four layers can be disabled from the mask, limiting our ability to optimize its behavior through this method. Use Baked Lightmaps Baking Lighting and Shadowing into a Scene is significantly less processor-intensive than generating them at runtime. The downside is the added application footprint, memory consumption, and potential for memory bandwidth abuse. Ultimately, unless a game's lighting effects are being handled exclusively through Legacy Vertex Lighting or a single Directional Light, then it should probably include Lightmapping to make some huge budget savings on lighting calculations. 
Relying entirely on real-time lighting and shadows is a recipe for disaster unless the game is trying to win an award for the smallest application file size of all time. Optimize Shadows Shadowing passes mostly consume our Draw Calls and fill rate, but the amount of vertex position data we feed into the process and our selection for the Shadow Projection setting will affect the front end's ability to generate the required shadow casters and shadow receivers. We should already be attempting to reduce vertex counts to solve front end bottlenecking in the first place, and making this change will be an added multiplier towards that effort. Draw Calls are consumed during shadowing by rendering visible objects into a separate buffer (known as the shadow map) as either a shadow caster, a shadow receiver, or both. Each object that is rendered into this map will consume another Draw Call, which makes shadows a huge performance cost multiplier, so it is often a setting that games will expose to users via quality settings, allowing users with weaker hardware to reduce the effect or even disable it entirely. Shadow Distance is a global multiplier for runtime shadow rendering. The fewer shadows we need to draw, the happier the entire rendering process will be. There is little point in rendering shadows at a great distance from the camera, so this setting should be configured specific to our game and how much shadowing we expect to witness during gameplay. It is also a common setting that is exposed to the user to reduce the burden of rendering shadows. Higher values of Shadow Resolution and Shadow Cascades will increase our memory bandwidth and fill rate consumption. Both of these settings can help curb the effects of artefacts in shadow rendering, but at the cost of a much larger shadow map size that must be moved around and of the canvas size to draw to. 
The Unity documentation contains an excellent summary on the topic of the aliasing effect of shadow maps and how the Shadow Cascades feature helps to solve the problem: http://docs.unity3d.com/Manual/DirLightShadows.html. It's worth noting that Soft Shadows do not consume any more memory or CPU overhead relative to Hard Shadows, as the only difference is a more complex Shader. This means that applications with enough fill rate to spare can enjoy the improved graphical fidelity of Soft Shadows. Optimizing graphics for mobile Unity's ability to deploy to mobile devices has contributed greatly to its popularity among hobbyist, small, and mid-size development teams. As such, it would be prudent to cover some approaches that are more beneficial for mobile platforms than for desktop and other devices. Note that any, and all, of the following approaches may become obsolete soon, if they aren't already. The mobile device market is moving blazingly fast, and the following techniques as they apply to mobile devices merely reflect conventional wisdom from the last half decade. We should occasionally test the assumptions behind these approaches from time-to-time to see whether the limitations of mobile devices still fit the mobile marketplace. Minimize Draw Calls Mobile applications are more often bottlenecked on Draw Calls than on fill rate. Not that fill rate concerns should be ignored (nothing should, ever!), but this makes it almost necessary for any mobile application of reasonable quality to implement Mesh Combining, Batching, and Atlasing techniques from the very beginning. Deferred Rendering is also the preferred technique as it fits well with other mobile-specific concerns, such as avoiding transparency and having too many animated characters. Minimize the Material count This concern goes hand in hand with the concepts of Batching and Atlasing. The fewer Materials we use, the fewer Draw Calls will be necessary. 
This strategy will also help with concerns relating to VRAM and memory bandwidth, which tend to be very limited on mobile devices.

Minimize texture size and Material count
Most mobile devices feature a very small Texture Cache relative to desktop GPUs. For instance, the iPhone 3G can only support a total texture size of 1024x1024 due to running OpenGL ES 1.1 with simple vertex rendering techniques. Meanwhile the iPhone 3GS, iPhone 4, and iPad generation run OpenGL ES 2.0, which only supports textures up to 2048x2048. Later generations can support textures up to 4096x4096. Double check the device hardware we are targeting to be sure it supports the texture file sizes we wish to use (there are too many Android devices to list here).
However, later-generation devices are never the most common devices in the mobile marketplace. If we wish our game to reach a wide audience (increasing its chances of success), then we must be willing to support weaker hardware. Note that textures that are too large for the GPU will be downscaled by the CPU during initialization, wasting valuable loading time and leaving us with graphical fidelity we did not intend. This makes texture reuse of paramount importance for mobile devices due to the limited VRAM and Texture Cache sizes available.

Make textures square and power-of-2
The GPU will find it difficult, or simply be unable, to compress the texture if it is not in a square format, so make sure you stick to the common development convention and keep things square and sized to a power of 2.

Use the lowest possible precision formats in Shaders
Mobile GPUs are particularly sensitive to precision formats in their Shaders, so the smallest formats should be used. On a related note, format conversion should be avoided for the same reason.

Avoid Alpha Testing
Mobile GPUs haven't quite reached the same levels of chip optimization as desktop GPUs, and Alpha Testing remains a particularly costly task on mobile devices. In most cases it should simply be avoided in favor of Alpha Blending.

Summary
If you've made it this far without skipping ahead, then congratulations are in order. That was a lot of information to absorb for just one component of the Unity Engine, but then it is clearly the most complicated of them all, requiring a matching depth of explanation. Hopefully, you've learned a lot of approaches to help you improve your rendering performance and enough about the rendering pipeline to know how to use them responsibly!
To learn more about Unity 5, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended:
  • Unity 5 Game Optimization (https://www.packtpub.com/game-development/unity-5-game-optimization)
  • Unity 5.x By Example (https://www.packtpub.com/game-development/unity-5x-example)
  • Unity 5.x Cookbook (https://www.packtpub.com/game-development/unity-5x-cookbook)
  • Unity 5 for Android Essentials (https://www.packtpub.com/game-development/unity-5-android-essentials)

Guest Contributor
04 Dec 2019
7 min read

What is Unity’s new Data-Oriented Technology Stack (DOTS)

If we look at the evolution of computing and gaming over the last decade, we can see how different things are compared to ten years ago. One of the most significant changes was the move from a world where 90% of the code ran on a single thread on a single core to a world where we all carry hundreds of GPU cores in our pockets and must design efficient code that can run in parallel. Given this change, we can see why Unity feels the urge to adapt to the new paradigm. Unity’s original design was born in a different era, and now it is time for it to adjust to the future. The Data-Oriented Technology Stack (DOTS) is the collective name for Unity's attempt at reshaping its internal architecture in a way that is faster, lighter, and, more importantly, optimized for today's massively multithreaded world. In this article, we will take a look at the three main components of DOTS and how they can help you develop next-generation games.

Want to learn more optimization techniques in Unity? The Unity engine comes with a great set of features to help you build high-performance games. If you want to know the techniques for writing better game scripts and learn how to optimize a game using Unity technologies such as ECS and the Burst compiler, read the book Unity Game Optimization - Third Edition written by Chris Dickinson and Dr. Davide Aversa. This book will help you get up to speed with a series of performance-enhancing coding techniques and methods that will help you improve the performance of your Unity applications.

The Data-Oriented Technology Stack
Three components compose the Data-Oriented Technology Stack:
  • The Entity Component System (ECS)
  • The C# Job System
  • The Burst compiler
Let's look at each of them.

The Entity Component System (ECS)
If you know Unity, you know that two basic structures represent every part of a game: the GameObject and the MonoBehaviour.
Every GameObject contains one or more MonoBehaviour instances, which in turn describe the data (what the object knows) and the behavior (what the object does) of each element in a scene. GameObject and MonoBehaviour worked well during Unity’s initial years; however, with the rise of multithreaded programming, many issues with the GameObject architecture became more evident. First of all, a GameObject is a fat, heavy data structure. In theory, it should only be a container of MonoBehaviour instances. In practice, it carries a significant amount of baggage. To name a few issues:
  • Every GameObject has a name and an ID.
  • Every GameObject has a C# wrapping object pointing to the native C++ code.
  • Creating and deleting a GameObject requires locking and editing a global list (that is, these operations cannot run in parallel).
Moreover, both GameObject and MonoBehaviour are dynamic objects, and they are stored everywhere in memory. It would be much better if we could keep all the MonoBehaviour instances of a GameObject close to each other so that finding and running them would be more efficient. To solve all these issues, Unity introduced the Entity Component System (ECS), a new paradigm that serves as an alternative to the traditional GameObject/MonoBehaviour one. As the name suggests, there are three elements in ECS:
  • Components: They are conceptually similar to a MonoBehaviour, but they contain only data. For instance, a Position component will contain only a 3D vector representing the entity position in space; a LinearVelocity component would contain only the velocity of the object, and so on. They are just plain data.
  • Entities: They are just a “collection” of components. For example, if I have a particle in space, I can represent it with just a list of components, e.g., the Position and LinearVelocity components.
  • Systems: A system is where the behavior is. Each system takes a list of components (an archetype) and executes a function over all the entities composed of those components.
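The three elements above can be illustrated with a few lines of plain C#. This sketch deliberately does not use Unity's actual ECS API; the `Position`, `LinearVelocity`, and `MovementSystem` types are hypothetical stand-ins that only show the idea of data-only components stored in contiguous arrays, with one function processing every entity.

```csharp
using System;

// Components: plain data only, no behavior.
public struct Position { public float X, Y, Z; }
public struct LinearVelocity { public float X, Y, Z; }

// System: one function applied to every entity that has both components.
public static class MovementSystem
{
    // Entity i is represented by index i into both contiguous arrays.
    public static void Update(Position[] positions, LinearVelocity[] velocities, float deltaTime)
    {
        for (int i = 0; i < positions.Length; i++)
        {
            positions[i].X += velocities[i].X * deltaTime;
            positions[i].Y += velocities[i].Y * deltaTime;
            positions[i].Z += velocities[i].Z * deltaTime;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // A thousand "particle" entities, each with a Position and a LinearVelocity.
        var positions = new Position[1000];
        var velocities = new LinearVelocity[1000];
        for (int i = 0; i < velocities.Length; i++)
        {
            velocities[i].X = 1f;
        }

        // Advance every entity by half a second in a single pass.
        MovementSystem.Update(positions, velocities, 0.5f);
        Console.WriteLine(positions[0].X);
    }
}
```

Because the loop walks two flat arrays in order, the data for neighboring entities sits next to each other in memory, which is the property the GameObject layout lacks.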
Note: To be technically correct, an entity is not a collection data structure. Instead, it is a pointer to a location in memory where the entity’s components are stored. The actual storage, though, is handled by Unity.

With this system, we can store components in contiguous arrays, and an entity is just a pointer to the archetype instance. A single function for each system can define the behavior of thousands of similar entities. This is more efficient than running an Update on every MonoBehaviour in every GameObject. For this reason, ECS lets us use entities without slowdown or system overhead in places where GameObject instances would have been impractical, for instance, having one entity for each particle of a particle system. For more technical info on ECS there is a very detailed blog post on Unity’s official website.

The C# Job System
If ECS is how we describe the scene, we need a way to run the systems efficiently. As we said in the introduction, the modern approach to efficiency is to exploit every core in our system, and this means running code in parallel on massively multithreaded hardware. Sadly, multithreading is hard. Extremely hard. As any experienced developer can tell you, moving from single-threaded to multithreaded programming introduces a large class of new issues and bugs, such as race conditions. Moreover, for true multithreading, we should get as close as possible to the metal, avoiding all the dynamic allocations and deallocations of C# and the Garbage Collector, and code part of our game in C++. Luckily for us, Unity introduced a component in the Data-Oriented Technology Stack with the specific purpose of simplifying multithreaded programming in Unity using only C#: the Job System. You can imagine a Job as a piece of code that you want to run in parallel over as many cores as possible. The Unity C# Job System helps you design this code in a way that avoids all the common multithreading pitfalls using only C#.
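As a rough illustration of what a Job looks like, the sketch below doubles every value in an array across worker threads. It assumes the `IJobParallelFor` interface and `NativeArray<T>` container from Unity's `Unity.Jobs` and `Unity.Collections` namespaces; the `ScaleJob` and `ScaleJobRunner` names, the array length, and the batch size of 64 are arbitrary choices made for this example.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// A job is a struct holding the data it works on plus an Execute method.
public struct ScaleJob : IJobParallelFor
{
    public NativeArray<float> Values;
    public float Factor;

    // Called once per index, potentially on different worker threads.
    public void Execute(int index)
    {
        Values[index] = Values[index] * Factor;
    }
}

// Hypothetical MonoBehaviour that schedules the job once at startup.
public class ScaleJobRunner : MonoBehaviour
{
    void Start()
    {
        // Native (unmanaged) memory, so the Garbage Collector is not involved.
        var values = new NativeArray<float>(1024, Allocator.TempJob);
        for (int i = 0; i < values.Length; i++)
        {
            values[i] = i;
        }

        var job = new ScaleJob { Values = values, Factor = 2f };

        // Split the 1024 iterations into batches of 64 for the worker threads.
        JobHandle handle = job.Schedule(values.Length, 64);

        // Block until all batches have finished before reading the results.
        handle.Complete();

        Debug.Log(values[10]);
        values.Dispose();
    }
}
```

Each call to `Execute` touches only its own index, which is what lets the iterations run in parallel without stepping on each other.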
You can finally unleash all the power of your machine without writing a single line of C++ code.

The Burst Compiler
What if I told you that it is possible to obtain higher performance by writing C# code instead of C++? You would think I am crazy. However, I am not, and this is the goal of the last component of the Data-Oriented Technology Stack (DOTS): the Burst compiler. The Burst compiler is a specialized code generator that compiles a subset of C# (often called High-Performance C# or HPC#) into machine code that is, most of the time, smaller and faster than the code generated from equivalent C++. The Burst compiler is still in preview, but you can already try it through Unity's Package Manager. Of course, you get the most from it when it is combined with the other two DOTS components. For more technical info on the Burst compiler, you can refer to Unity’s blog post.

Learn More About Unity Optimization
In this article, we only scratched the surface of the Data-Oriented Technology Stack (DOTS). If you want to learn more about how to use the DOTS technologies and other optimization techniques for Unity, you can read more in my book Unity Game Optimization - Third Edition. This Unity book is your guide to optimizing various aspects of your game development, from game characters and scripts, right through to animations. You will also explore techniques for solving performance issues with your VR projects and learn best practices for project organization to save time through an improved workflow.

Author Bio
Dr. Davide Aversa holds a PhD in artificial intelligence and an MSc in artificial intelligence and robotics from the University of Rome La Sapienza in Italy. He has a strong interest in artificial intelligence for the development of interactive virtual agents and procedural content generation.
He served as a Program Committee member of video game-related conferences such as the IEEE Conference on Computational Intelligence and Games, and he also regularly participates in game-jam contests. He also writes a blog on game design and game development. You can find him on Twitter, GitHub, and LinkedIn.

Sugandha Lahoti
07 May 2019
10 min read

Microsoft Build 2019: Microsoft showcases new updates to MS 365 platform with focus on AI and developer productivity

At the ongoing Microsoft Build 2019 conference, Microsoft has announced a ton of new features and tool releases with a focus on innovation using AI and mixed reality with the intelligent cloud and the intelligent edge. In his opening keynote, Microsoft CEO Satya Nadella outlined the company’s vision and developer opportunity across Microsoft Azure, Microsoft Dynamics 365 and IoT Platform, Microsoft 365, and Microsoft Gaming.

“As computing becomes embedded in every aspect of our lives, the choices developers make will define the world we live in,” said Satya Nadella, CEO, Microsoft. “Microsoft is committed to providing developers with trusted tools and platforms spanning every layer of the modern technology stack to build magical experiences that create new opportunity for everyone.”

https://youtu.be/rIJRFHDr1QE

Increasing developer productivity in Microsoft 365 platform

Microsoft Graph data connect
Microsoft Graph now comes with data connect, a service that combines analytics data from the Microsoft Graph with customers’ business data. Microsoft Graph data connect will provide Office 365 data and Microsoft Azure resources to users via a toolset. The migration pipelines are deployed and managed through Azure Data Factory. Microsoft Graph data connect can be used to create new apps shared within enterprises or externally in the Microsoft Azure Marketplace. It is generally available as a feature in Workplace Analytics and also as a standalone SKU for ISVs. More information here.

Microsoft Search
Microsoft Search works as a unified search experience across all Microsoft apps: Office, Outlook, SharePoint, OneDrive, Bing, and Windows. It applies AI technology from Bing and deep personalized insights surfaced by the Microsoft Graph to personalize searches.
Other features included in Microsoft Search are: Search box displacement Zero query typing and key-phrase suggestion feature Query history feature, and personal search query history Administrator access to the history of popular searches for their organizations, but not to search history for individual users Files/people/site/bookmark suggestions Microsoft Search will begin publicly rolling out to all Microsoft 365 and Office 365 commercial subscriptions worldwide at the end of May. Read more on MS Search here. Fluid Framework As the name suggests Microsoft's newly launched Fluid framework allows seamless editing and collaboration between different applications. Essentially, it is a web-based platform and componentized document model that allows users to, for example, edit a document in an application like Word and then share a table from that document in Microsoft Teams (or even a third-party application) with real-time syncing. Microsoft says Fluid can translate text, fetch content, suggest edits, perform compliance checks, and more. The company will launch the software developer kit and the first experiences powered by the Fluid Framework later this year on Microsoft Word, Teams, and Outlook. Read more about Fluid framework here. Microsoft Edge new features Microsoft Build 2019 paved way for a bundle of new features to Microsoft’s flagship web browser, Microsoft Edge. New features include: Internet Explorer mode: This mode integrates Internet Explorer directly into the new Microsoft Edge via a new tab. This allows businesses to run legacy Internet Explorer-based apps in a modern browser. Privacy Tools: Additional privacy controls which allow customers to choose from 3 levels of privacy in Microsoft Edge—Unrestricted, Balanced, and Strict. These options limit third parties to track users across the web.  “Unrestricted” allows all third-party trackers to work on the browser. “Balanced” prevents third-party trackers from sites the user has not visited before. 
And “Strict” blocks all third-party trackers. Collections: Collections allows users to collect, organize, share and export content more efficiently and with Office integration. Microsoft is also migrating Edge as a whole over to Chromium. This will make Edge easier to develop for by third parties. For more details, visit Microsoft’s developer blog. New toolkit enhancements in Microsoft 365 Platform Windows Terminal Windows Terminal is Microsoft’s new application for Windows command-line users. Top features include: User interface with emoji-rich fonts and graphics-processing-unit-accelerated text rendering Multiple tab support and theming and customization features Powerful command-line user experience for users of PowerShell, Cmd, Windows Subsystem for Linux (WSL) and all forms of command-line application Windows Terminal will arrive in mid-June and will be delivered via the Microsoft Store in Windows 10. Read more here. React Native for Windows Microsoft announced a new open-source project for React Native developers at Microsoft Build 2019. Developers who prefer to use the React/web ecosystem to write user-experience components can now leverage those skills and components on Windows by using “React Native for Windows” implementation. React for Windows is under the MIT License and will allow developers to target any Windows 10 device, including PCs, tablets, Xbox, mixed reality devices and more. The project is being developed on GitHub and is available for developers to test. More mature releases will follow soon. Windows Subsystem for Linux 2 Microsoft rolled out a new architecture for Windows Subsystem for Linux: WSL 2 at the MSBuild 2019. Microsoft will also be shipping a fully open-source Linux kernel with Windows specially tuned for WSL 2. New features include massive file system performance increases (twice as much speed for file-system heavy operations, such as Node Package Manager install). WSL also supports running Linux Docker containers. 
The next generation of WSL arrives for Insiders in mid-June. More information here. New releases in multiple Developer Tools .NET 5 arrives in 2020 .NET 5 is the next major version of the .NET Platform which will be available in 2020. .NET 5 will have all .NET Core features as well as more additions: One Base Class Library containing APIs for building any type of application More choice on runtime experiences Java interoperability will be available on all platforms. Objective-C and Swift interoperability will be supported on multiple operating systems .NET 5 will provide both Just-in-Time (JIT) and Ahead-of-Time (AOT) compilation models to support multiple compute and device scenarios. .NET 5 also will offer one unified toolchain supported by new SDK project types as well as a flexible deployment model (side-by-side and self-contained EXEs) Detailed information here. ML.NET 1.0 ML.NET is Microsoft’s open-source and cross-platform framework that runs on Windows, Linux, and macOS and makes machine learning accessible for .NET developers. Its new version, ML.NET 1.0, was released at the Microsoft Build Conference 2019 yesterday. Some new features in this release are: Automated Machine Learning Preview: Transforms input data by selecting the best performing ML algorithm with the right settings. AutoML support in ML.NET is in preview and currently supports Regression and Classification ML tasks. ML.NET Model Builder Preview: Model Builder is a simple UI tool for developers which uses AutoML to build ML models. It also generates model training and model consumption code for the best performing model. ML.NET CLI Preview: ML.NET CLI is a dotnet tool which generates ML.NET Models using AutoML and ML.NET. The ML.NET CLI quickly iterates through a dataset for a specific ML Task and produces the best model. Visual Studio IntelliCode, Microsoft’s tool for AI-assisted coding Visual Studio IntelliCode, Microsoft’s AI-assisted coding is now generally available. 
It is essentially an enhanced IntelliSense, Microsoft’s extremely popular code completion tool. Intellicode is trained by using the code of thousands of open-source projects from GitHub that have at least 100 stars. It is available for C# and XAML for Visual Studio and Java, JavaScript, TypeScript, and Python for Visual Studio Code. IntelliCode also is included by default in Visual Studio 2019, starting in version 16.1 Preview 2. Additional capabilities, such as custom models, remain in public preview. Visual Studio 2019 version 16.1 Preview 2 Visual Studio 2019 version 16.1 Preview 2 release includes IntelliCode and the GitHub extensions by default. It also brings out of preview the Time Travel Debugging feature introduced with version 16.0. Also includes multiple performances and productivity improvements for .NET and C++ developers. Gaming and Mixed Reality Minecraft AR game for mobile devices At the end of Microsoft’s Build 2019 keynote yesterday, Microsoft teased a new Minecraft game in augmented reality, running on a phone. The teaser notes that more information will be coming on May 17th, the 10-year anniversary of Minecraft. https://www.youtube.com/watch?v=UiX0dVXiGa8 HoloLens 2 Development Edition and unreal engine support The HoloLens 2 Development Edition includes a HoloLens 2 device, $500 in Azure credits and three-months free trials of Unity Pro and Unity PiXYZ Plugin for CAD data, starting at $3,500 or as low as $99 per month. The HoloLens 2 Development Edition will be available for preorder soon and will ship later this year. Unreal Engine support for streaming and native platform integration will be available for HoloLens 2 by the end of May. Intelligent Edge and IoT Azure IoT Central new features Microsoft Build 2019 also featured new additions to Azure IoT Central, an IoT software-as-a-service solution. 
Better rules processing and customs rules with services like Azure Functions or Azure Stream Analytics Multiple dashboards and data visualization options for different types of users Inbound and outbound data connectors, so that operators can integrate with   systems Ability to add custom branding and operator resources to an IoT Central application with new white labeling options New Azure IoT Central features are available for customer trials. IoT Plug and Play IoT Plug and Play is a new, open modeling language to connect IoT devices to the cloud seamlessly without developers having to write a single line of embedded code. IoT Plug and Play also enable device manufacturers to build smarter IoT devices that just work with the cloud. Cloud developers will be able to find IoT Plug and Play enabled devices in Microsoft’s Azure IoT Device Catalog. The first device partners include Compal, Kyocera, and STMicroelectronics, among others. Azure Maps Mobility Service Azure Maps Mobility Service is a new API which provides real-time public transit information, including nearby stops, routes and trip intelligence. This API also will provide transit services to help with city planning, logistics, and transportation. Azure Maps Mobility Service will be in public preview in June. Read more about Azure Maps Mobility Service here. KEDA: Kubernetes-based event-driven autoscaling Microsoft and Red Hat collaborated to create KEDA, which is an open-sourced project that supports the deployment of serverless, event-driven containers on Kubernetes. It can be used in any Kubernetes environment — in any public/private cloud or on-premises such as Azure Kubernetes Service (AKS) and Red Hat OpenShift. KEDA has support for built-in triggers to respond to events happening in other services or components. This allows the container to consume events directly from the source, instead of routing through HTTP. 
KEDA also presents a new hosting option for Azure Functions that can be deployed as a container in Kubernetes clusters. Securing elections and political campaigns ElectionGuard SDK and Microsoft 365 for Campaigns ElectionGuard, is a free open-source software development kit (SDK) as an extension of Microsoft’s Defending Democracy Program to enable end-to-end verifiability and improved risk-limiting audit capabilities for elections in voting systems. Microsoft365 for Campaigns provides security capabilities of Microsoft 365 Business to political parties and individual candidates. More details here. Microsoft Build is in its 6th year and will continue till 8th May. The conference hosts over 6,000 attendees with early 500 student-age developers and over 2,600 customers and partners in attendance. Watch it live here! Microsoft introduces Remote Development extensions to make remote development easier on VS Code Docker announces a collaboration with Microsoft’s .NET at DockerCon 2019 How Visual Studio Code can help bridge the gap between full-stack development and DevOps [Sponsered by Microsoft]

Packt
04 Jan 2017
5 min read

Game objective

In this article by Alan Thorn, author of the book Mastering Unity 5.x, we will look at the game objective and at asset preparation.

Every game (except for experimental and experiential games) needs an objective for the player: something they must strive to do, not just within specific levels, but across the game overall. This objective is important not just for the player (to make the game fun), but also for the developer, for deciding how challenge, diversity and interest can be added to the mix. Before starting development, have a clearly stated and identified objective in mind. Challenges are introduced primarily as obstacles to the objective, and bonuses are 'things' that facilitate the objective and make it possible and easier to achieve. For Dead Keys, the primary objective is to survive and reach the level end. Zombies threaten that objective by attacking and damaging the player, and bonuses exist along the way to make things more interesting.

I highly recommend using project management and team collaboration tools to chart, document and time-track tasks within your project. And you can do this for free too. Some online tools for this include Trello (https://trello.com), Bitrix 24 (https://www.bitrix24.com), BaseCamp (https://basecamp.com), FreedCamp (https://freedcamp.com), UnFuddle (https://unfuddle.com), BitBucket (https://bitbucket.org), Microsoft Visual Studio Team Services (https://www.visualstudio.com/en-us/products/visual-studio-team-services-vs.aspx), and Concord Contract Management (http://www.concordnow.com).

Asset preparation

When you've reached a clear decision on initial concept and design, you're ready to prototype! This means building a Unity project demonstrating the core mechanic and game rules in action, as a playable sample. After this, you typically refine the design more, and repeat prototyping until arriving at an artefact you want to pursue. From here, the art team must produce assets (meshes and textures) based on concept art, the game design, and photographic references. When producing meshes and textures for Unity, some important guidelines should be followed to achieve optimal graphical performance in-game. This is about structuring and building assets in a smart way, so they export cleanly and easily from their originating software, and can then be imported with minimal fuss, performing as well as they can at run-time. Let's see some of these guidelines for meshes and textures.

Meshes - work only with good topology

Good mesh topology means that every polygon in the model has only three or four sides (not more). Additionally, Edge Loops should flow in an ordered, regular way along the contours of the model, defining its shape and form.

Clean Topology

On import, Unity automatically converts any NGons (polygons with more than four sides) in the mesh into triangles. But it's better to build meshes without NGons than to rely on Unity's automated methods. Not only does this cultivate good habits at the modelling phase, but it avoids any automatic and unpredictable retopology of the mesh, which affects how it's shaded and animated.

Meshes - minimize polygon count

Every polygon in a mesh entails a rendering performance hit, insofar as a GPU needs time to process and render each polygon. Consequently, it's sensible to minimize the number of polygons in a mesh, even though modern graphics hardware is adept at working with many polygons. It's good practice to minimize polygons where possible and to the degree that it doesn't detract from your central artistic vision and style.

High-Poly Meshes! (Try reducing polygons where possible)

There are many techniques available for reducing polygon counts. Most 3D applications (like 3DS Max, Maya and Blender) offer automated tools that decimate polygons in a mesh while retaining its basic shape and outline. However, these methods frequently make a mess of topology, leaving you with faces and edge loops leading in all directions. Even so, this can still be useful for reducing polygons in static meshes (meshes that never animate), like statues or houses or chairs. It's typically bad for animated meshes, though, where topology is especially important.

Reducing Mesh Polygons with Automated Methods can produce messy topology!

If you want to know the total vertex and face count of a mesh, you can use your 3D software's statistics. Blender, Maya, 3DS Max, and most other 3D software let you see vertex and face counts of selected meshes directly from the viewport. However, this information should only be considered a rough guide! This is because, after importing a mesh into Unity, the vertex count frequently turns out higher than expected! There are many reasons for this, explained in more depth online, here: http://docs.unity3d.com/Manual/OptimizingGraphicsPerformance.html

In short, use the Unity Vertex Count as the final word on the actual Vertex Count of your mesh. To view the vertex count for an imported mesh in Unity, click the right-arrow on the mesh thumbnail in the Project Panel. This shows the Internal Mesh asset. Select this asset, and then view the Vertex Count from the Preview Pane in the Object Inspector.

Viewing the Vertex and Face Count for meshes in Unity

Summary

In this article, we've learned what game objectives are and how to prepare assets.
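As a rough illustration of the NGon point made under Clean Topology, the sketch below estimates how many triangles a mesh contains once every face has been triangulated. It relies only on the general geometric fact that a simple polygon with n sides splits into n - 2 triangles; it says nothing about which triangles Unity's importer actually produces. The class name, method names and sample face list are hypothetical.

```java
import java.util.List;

// Hypothetical helper: estimates the triangle count of a mesh after triangulation.
class TriangleCount {

    // A simple polygon with n sides splits into n - 2 triangles.
    static int trianglesFor(int sides) {
        if (sides < 3) {
            throw new IllegalArgumentException("a polygon needs at least 3 sides");
        }
        return sides - 2;
    }

    // Sums the triangle count over every face, given the side count of each face.
    static int totalTriangles(List<Integer> faceSides) {
        int total = 0;
        for (int sides : faceSides) {
            total += trianglesFor(sides);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up mesh: two triangles, three quads and one six-sided NGon.
        List<Integer> faces = List.of(3, 3, 4, 4, 4, 6);
        System.out.println("Faces in the modelling tool: " + faces.size());
        System.out.println("Triangles after triangulation: " + totalTriangles(faces));
    }
}
```

For the made-up mesh above this reports 6 faces but 12 triangles, which is one reason a face count read in the modelling tool is only a rough guide to rendering cost.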
Natasha Mathur
25 Jan 2019
5 min read

Google DeepMind’s AI AlphaStar beats StarCraft II pros TLO and MaNa; wins 10-1 against the gamers

Two days ago, the Blizzard team announced a demo of the progress made by Google’s DeepMind AI at StarCraft II, a real-time strategy video game. The demo was presented yesterday over a live stream, where it showed AlphaStar, DeepMind’s StarCraft II AI program, beating two top professional StarCraft II players, TLO and MaNa. The demo presented a series of five separate test matches that were held earlier on 19 December, against Team Liquid’s Grzegorz "MaNa" Komincz and Dario “TLO” Wünsch. AlphaStar beat the two professional players 10-0 in total (5-0 against each). After the 10 straight wins, AlphaStar was finally beaten by MaNa in a live match streamed by Blizzard and DeepMind.

https://twitter.com/LiquidTLO/status/1088524496246657030
https://twitter.com/Liquid_MaNa/status/1088534975044087808

How does AlphaStar learn?

AlphaStar learns by imitating the basic micro and macro-strategies used by players on the StarCraft ladder. A neural network was trained initially using supervised learning from anonymised human games released by Blizzard. This initial AI agent managed to defeat the “Elite” level AI in 95% of games. Once the agents are trained from human game replays, they’re then trained against other competitors in the “AlphaStar league”. This is where a multi-agent reinforcement learning process starts. New competitors are added to the league (branched from existing competitors). Each of these agents then learns from games against other competitors. This ensures that each competitor performs well against the strongest strategies, and does not forget how to defeat earlier ones.

As the league continues to progress, new counter-strategies emerge that can defeat the earlier strategies. Also, each agent has its own learning objective, which is adapted during the training. One agent might have an objective to beat one specific competitor, while another one might want to beat a whole distribution of competitors. So, the neural network weights of each agent are updated using reinforcement learning, from its games against competitors. This helps optimise its personal learning objective.

How does AlphaStar play the game?

TLO and MaNa, professional StarCraft players, can issue hundreds of actions per minute (APM) on average. AlphaStar had an average APM of around 280 in its games against TLO and MaNa, which is significantly lower than that of the professional players. This is because AlphaStar starts its learning using replays and thereby mimics the way humans play the game. AlphaStar also showed an average delay of 350 ms between observation and action.

AlphaStar might have had a slight advantage over the human players, as it interacted with the StarCraft game engine directly via its raw interface. What this means is that it could observe the attributes of its own as well as its opponent’s visible units on the map directly, basically getting a zoomed-out view of the game. Human players, however, have to split their time and attention to decide where to focus the camera on the map. But the analysis of the games showed that the AI agents “switched context” about 30 times per minute, akin to MaNa or TLO. This suggests that AlphaStar’s success against MaNa and TLO is due to its superior macro and micro-strategic decision-making, rather than a superior click-rate, faster reaction times, or the raw interface.

MaNa managed to beat AlphaStar in one match

DeepMind also developed a second version of AlphaStar, which played like human players, meaning that it had to choose when and where to move the camera. Two new agents were trained against the AlphaStar league, one that used the raw interface and the other that learned to control the camera.

“The version of AlphaStar using the camera interface was almost as strong as the raw interface, exceeding 7000 MMR on our internal leaderboard”, states the DeepMind team. But the team didn’t get the chance to test the AI against a human pro prior to the live stream. In a live exhibition match, MaNa managed to defeat the new version of AlphaStar using the camera interface, which was trained for only 7 days. “We hope to evaluate a fully trained instance of the camera interface in the near future”, says the team.

The DeepMind team states that AlphaStar’s performance was initially tested against TLO, where it won the match. “I was surprised by how strong the agent was..(it) takes well-known strategies..I hadn’t thought of before, which means there may still be new ways of playing the game that we haven’t fully explored yet,” said TLO. The agents were then trained for an extra week, after which they played against MaNa. AlphaStar again won the game. “I was impressed to see AlphaStar pull off advanced moves and different strategies across almost every game, using a very human style of gameplay I wouldn’t have expected..this has put the game in a whole new light for me. We’re all excited to see what comes next,” said MaNa.

Public reaction to the news is very positive, with people congratulating the DeepMind team on AlphaStar’s win:

https://twitter.com/SebastienBubeck/status/1088524371285557248
https://twitter.com/KaiLashArul/status/1088534443718045696
https://twitter.com/fhuszar/status/1088534423786668042
https://twitter.com/panicsw1tched/status/1088524675540549635
https://twitter.com/Denver_sc2/status/1088525423229759489

To learn about the strategies developed by AlphaStar, check out the complete set of replays of AlphaStar's matches against TLO and MaNa on DeepMind's website.
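To put the APM figure quoted earlier in perspective, the small sketch below converts actions per minute into the average gap between actions and prints it next to the reported observation-to-action delay. It is plain arithmetic on the two numbers given in the article (about 280 APM and 350 ms); the class and method names are hypothetical and nothing here reflects how DeepMind measured either value.

```java
// Hypothetical helper: converts an actions-per-minute rate into a per-action interval.
class ApmTiming {

    // Average gap between actions, in milliseconds, for a given APM.
    static double millisPerAction(double apm) {
        if (apm <= 0) {
            throw new IllegalArgumentException("APM must be positive");
        }
        return 60_000.0 / apm;
    }

    public static void main(String[] args) {
        double alphaStarApm = 280.0;     // average APM reported in the article
        double reactionDelayMs = 350.0;  // observation-to-action delay reported in the article

        System.out.printf("Average gap between actions: %.0f ms%n", millisPerAction(alphaStarApm));
        System.out.printf("Reported reaction delay: %.0f ms%n", reactionDelayMs);
    }
}
```

At 280 APM the average gap works out to roughly 214 ms per action, which is shorter than the 350 ms delay; the two are different measurements, since one is a rate of actions and the other is the lag between seeing something and responding to it.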
Best game engines for Artificial Intelligence game development
Deepmind’s AlphaZero shows unprecedented growth in AI, masters 3 different games
Deepmind’s AlphaFold is successful in predicting the 3D structure of a protein making major inroads for AI use in healthcare

Vincy Davis
09 Sep 2019
5 min read

Unreal Engine 4.23 releases with major new features like Chaos, Virtual Production, improvement in real-time ray tracing and more

Last week, Epic released the stable version of Unreal Engine 4.23 with a whopping 192 improvements. The major features, many of them in beta, include Chaos - Destruction, Multi-Bounce Reflection fallback in Real-Time Ray Tracing, Virtual Texturing, Unreal Insights, HoloLens 2 native support, Niagara improvements and many more. Unreal Engine 4.23 will no longer support iOS 10, as iOS 11 is now the minimum required version.

What’s new in Unreal Engine 4.23?

Chaos - Destruction

Labelled as “Unreal Engine's new high-performance physics and destruction system”, Chaos is available in beta for users to attain cinematic-quality visuals in real-time scenes. It also supports high-level artist control over content creation and destruction.

https://youtu.be/fnuWG2I2QCY

Chaos supports several distinct features:

  • Geometry Collections: A new type of asset in Unreal for short-lived objects. Geometry Collection assets can be built using one or more Static Meshes. They give the artist flexibility in choosing what to simulate and how to organize and author the destruction.
  • Fracturing: A Geometry Collection can be broken into pieces either individually, or by applying one pattern across multiple pieces using the Fracturing tools.
  • Clustering: Artists use sub-fracturing to increase optimization. Every sub-fracture is an extra level added to the Geometry Collection. The Chaos system keeps track of the extra levels and stores the information in a Cluster, to be controlled by the artist.
  • Fields: Fields can be used to control simulation and other attributes of the Geometry Collection. They enable users to vary the mass, make something static, make the corner more breakable than the middle, and so on.

Unreal Insights

Currently in beta, Unreal Insights enables developers to collect and analyze data about Unreal Engine's behavior in a fixed way. The Trace System API is one of its components and is used to collect information from runtime systems consistently. Another component of Unreal Insights is the Unreal Insights Tool. It supplies interactive visualization of data through the Analysis API. For in-depth details about Unreal Insights and other features, you can also check out the first preview release of Unreal Engine 4.23.

Virtual Production Pipeline Improvements

Unreal Engine 4.23 advances the virtual production pipeline with improved tools to virtually scout environments and compose shots, connect live broadcast elements with digital representations, and more.

  • In-Camera VFX: With improvements to In-Camera VFX, users can achieve final shots live on set by combining real-world actors and props with Unreal Engine environment backgrounds.
  • VR Scouting for Filmmakers: The new VR Scouting tools can be used by filmmakers to navigate and interact with the virtual world in VR. Controllers and settings can also be customized in Blueprints, rather than rebuilding the engine in C++.
  • Live Link Datatypes and UX Improvements: The Live Link Plugin can be used to drive character animation, camera, lights, and basic 3D transforms dynamically from other applications and data sources in the production pipeline. Other improvements include save and load presets for Live Link setups, better status indicators to show the current Live Link sources, and more.
  • Remote Control over HTTP: Unreal Engine 4.23 users can send commands to Unreal Engine and Unreal Editor remotely over HTTP. This makes it possible for users to create customized web user interfaces to trigger changes in the project's content.

Read Also: Epic releases Unreal Engine 4.22, focuses on adding “photorealism in real-time environments”

Real-Time Ray Tracing Improvements

  • Performance and Stability
      • Expanded DirectX 12 Support
      • Improved Denoiser quality
      • Increased Ray Traced Global Illumination (RTGI) quality
  • Additional Geometry and Material Support
      • Landscape Terrain
      • Hierarchical Instanced Static Meshes (HISM) and Instanced Static Meshes (ISM)
      • Procedural Meshes
      • Transmission with SubSurface Materials
      • World Position Offset (WPO) support for Landscape and Skeletal Mesh geometries

Multi-Bounce Reflection Fallback

Unreal Engine 4.23 provides improved support for multi-bounce Ray Traced Reflections (RTR) by using Reflection Captures. This will increase the performance of all types of intra-reflections.

Virtual Texturing

The beta version of Virtual Texturing in Unreal Engine 4.23 enables users to create and use large textures with a lower and more constant memory footprint at runtime.

  • Streaming Virtual Texturing: Streaming Virtual Texturing uses Virtual Texture assets to offer an option to stream textures from disk rather than using the existing Mip-based streaming. It minimizes the texture memory overhead and increases performance when using very large textures.
  • Runtime Virtual Texturing: Runtime Virtual Texturing uses a Runtime Virtual Texture asset. It can be used to supply shading data over large areas, making it suitable for Landscape shading.

Unreal Engine 4.23 also presents new features like Skin Weight Profiles, Animation Streaming, Dynamic Animation Graphs, Open Sound Control, Sequencer Curve Editor Improvements, and more. As expected, users love the new features in Unreal Engine 4.23, especially Chaos.

https://twitter.com/rista__m/status/1170608746692673537
https://twitter.com/jayakri59101140/status/1169553133518782464
https://twitter.com/NoisestormMusic/status/1169303013149806595

For the full list of updates in Unreal Engine 4.23, users can head over to the Unreal Engine blog.

Other news in Game Development

Japanese Anime studio Khara is switching its primary 3D CG tools to Blender
Following Epic Games, Ubisoft joins Blender Development fund; adopts Blender as its main DCC tool
Epic Games grants Blender $1.2 million in cash to improve the quality of their software development projects

Packt
19 Nov 2014
18 min read

Using the Leap Motion Controller with Arduino

This article by Brandon Sanders, the author of the book Mastering Leap Motion, focuses on what he specializes in: hardware. While normal applications are all fine and good, he finds it much more gratifying if a program he writes has an impact in the physical world.

One of the most popular hobbyist hardware solutions, as I'm sure you know, is the Arduino. This cute little blue board from Italy brought the power of microcontrollers to the masses. Throughout this article, we're going to work on integrating a Leap Motion Controller with an Arduino board via a simple program; the end goal is to make the built-in LED on an Arduino board blink either slower or faster depending on how far a user's hand is from the Leap. While this is a relatively simple task, it's a great way to demonstrate how you can connect something like the Leap to an external piece of hardware. From there, it's only a hop, skip, and jump to controlling robots and other cool things with the Leap!

This project will follow the client-server model of programming: we'll be writing a simple Java server, which will be run from a computer, and a C++ client, which will run on an Arduino board connected to the computer. The server will be responsible for retrieving Leap Motion input and sending it to the client, while the client will be responsible for making an LED blink based on data received from the server.

Before we begin, I'd like to note that you can download the completed (and working) project from GitHub at https://github.com/Mizumi/Mastering-Leap-Motion-Chapter-9-Project-Leapduino.

A few things you'll need

Before you begin working on this tutorial, there are a few things you're going to need:

  • A computer (for obvious reasons).
  • A Leap Motion Controller.
  • An Arduino of some kind. This tutorial is based around the Uno model, but other similar models like the Mega should work just as well.
  • A USB cable to connect your Arduino to your computer.
  • Optionally, the Eclipse IDE (this tutorial will assume you're using Eclipse for the sake of readability and instruction).

Setting up the environment

First off, you're going to need a copy of the Leap Motion SDK so that you can add the requisite library JAR files and DLLs to the project. If you don't already have it, you can get a copy of the SDK from https://www.developer.leapmotion.com/downloads/.

Next, you're going to need the Java Simple Serial Connector (JSSC) library and the Arduino IDE. You can download the library JAR file for JSSC from GitHub at https://github.com/scream3r/java-simple-serial-connector/releases. Once the download completes, extract the JAR file from the downloaded ZIP folder and store it somewhere safe; you'll need it later on in this tutorial.

You can then proceed to download the Arduino IDE from the official website at http://arduino.cc/en/Main/Software. If you're on Windows, you will be able to download a Windows installer file which will automagically install the entire IDE on your computer. Mac and Linux users, on the other hand, will need to download .zip or .tgz files and extract them manually, then run the executable binary from the extracted folder contents.

Setting up the project

To set up our project, perform the following steps:

1. The first thing we're going to do is create a new Java project. This can be easily achieved by opening up Eclipse (to reiterate for the third time, this tutorial will assume you're using Eclipse) and heading over to File | New | Java Project. You will then be greeted by a project creation wizard, where you'll be prompted to choose a name for the project (I used Leapduino). Click on the Finish button when you're done.

   My current development environment is based around the Eclipse IDE for Java Developers, which can be found at http://www.eclipse.org/downloads. The instructions that follow will use Eclipse nomenclature and jargon, but they will still be usable if you're using something else (like NetBeans).

2. Once the project is created, navigate to it in the Package Explorer window. You'll want to go ahead and perform the following actions:

     • Create a new package for the project by right-clicking on the src folder for your project in the Package Explorer and then navigating to New | Package in the resulting context menu. You can name it whatever you like; I personally called mine com.mechakana.tutorials.
     • Add three files to the newly created package: Leapduino.java, LeapduinoListener.java, and RS232Protocol.java. To create a new file, simply right-click on the package and then navigate to New | Class.
     • Create a new folder in your project by right-clicking on the project name in the Package Explorer and then navigating to New | Folder in the resulting context menu. For the purposes of this tutorial, please name it Leapduino.
     • Add one file to your newly created folder: Leapduino.ino. This file will contain all of the code that we're going to upload to the Arduino.

3. With all of our files created, we need to add the libraries to the project. Go ahead and create a new folder at the root directory of your project, called lib. Within the lib folder, place the jssc.jar file that you downloaded earlier, along with the LeapJava.jar file from the Leap Motion SDK. Then, add the appropriate Leap.dll and LeapJava.dll files for your platform to the root of your project.

4. Finally, you'll need to modify your Java build path to link the LeapJava.jar and jssc.jar files to your project. This can be achieved by right-clicking on your project in the Package Explorer (within Eclipse) and navigating to Build Path… | Configure Build Path…. From there, go to the Libraries tab and click on Add JARs…, selecting the two aforementioned JAR files (LeapJava.jar and jssc.jar).

When you're done, your project should look similar to the following screenshot:

And you're done; now to write some code!

Writing the Java side of things

With everything set up and ready to go, we can start writing some code. First off, we're going to write the RS232Protocol class, which will allow our application to communicate with any Arduino board connected to the computer via a serial (RS-232) connection. This is where the JSSC library will come into play, allowing us to quickly and easily write code that would otherwise be quite lengthy (and not fun).

Fun fact: RS-232 is a standard for serial communications and transmission of data. There was a time when it was a common feature on a personal computer, used for modems, printers, mice, hard drives, and so on. With time, though, the Universal Serial Bus (USB) technology replaced RS-232 for many of those roles. Despite this, today's industrial machines, scientific equipment and (of course) robots still make heavy use of this protocol due to its light weight and ease of use; the Arduino is no exception!

Go ahead and open up the RS232Protocol.java file which we created earlier, and enter the following:

package com.mechakana.tutorials;

import jssc.SerialPort;
import jssc.SerialPortEvent;
import jssc.SerialPortEventListener;
import jssc.SerialPortException;

public class RS232Protocol
{
  //Serial port we're manipulating.
  private SerialPort port;

  //Class: RS232Listener
  public class RS232Listener implements SerialPortEventListener
  {
    public void serialEvent(SerialPortEvent event)
    {
      //Check if data is available.
      if (event.isRXCHAR() && event.getEventValue() > 0)
      {
        try
        {
          int bytesCount = event.getEventValue();
          System.out.print(port.readString(bytesCount));
        }
        catch (SerialPortException e) { e.printStackTrace(); }
      }
    }
  }

  //Member Function: connect
  public void connect(String newAddress)
  {
    try
    {
      //Set up a connection.
port = new SerialPort(newAddress);         //Open the new port and set its parameters.      port.openPort();      port.setParams(38400, 8, 1, 0);               //Attach our event listener.      port.addEventListener(new RS232Listener());    }     catch (SerialPortException e) { e.printStackTrace(); } } //Member Function: disconnect public void disconnect() {    try { port.closePort(); }     catch (SerialPortException e) { e.printStackTrace(); } } //Member Function: write public void write(String text) {    try { port.writeBytes(text.getBytes()); }     catch (SerialPortException e) { e.printStackTrace(); } } } All in all, RS232Protocol is a simple class—there really isn't a whole lot to talk about here! However, I'd love to point your attention to one interesting part of the class: public class RS232Listener implements SerialPortEventListener { public void serialEvent(SerialPortEvent event) { /*code*/ } } You might have found it rather odd that we didn't create a function for reading from the serial port—we only created a function for writing to it. This is because we've opted to utilize an event listener, the nested RS232Listener class. Under normal operating conditions, this class's serialEvent function will be called and executed every single time new information is received from the port. When this happens, the function will print all of the incoming data out to the user's screen. Isn't that nifty? Moving on, our next class is a familiar one—LeapduinoListener, a simple Listener implementation. This class represents the meat of our program, receiving Leap Motion tracking data and then sending it over our serial port to the connected Arduino. Go ahead and open up LeapduinoListener.java and enter the following code: package com.mechakana.tutorials; import com.leapmotion.leap.*; public class LeapduinoListener extends Listener {   //Serial port that we'll be using to communicate with the Arduino. 
        private RS232Protocol serial;

        //Constructor
        public LeapduinoListener(RS232Protocol serial)
        {
            this.serial = serial;
        }

        //Member Function: onInit
        public void onInit(Controller controller)
        {
            System.out.println("Initialized");
        }

        //Member Function: onConnect
        public void onConnect(Controller controller)
        {
            System.out.println("Connected");
        }

        //Member Function: onDisconnect
        public void onDisconnect(Controller controller)
        {
            System.out.println("Disconnected");
        }

        //Member Function: onExit
        public void onExit(Controller controller)
        {
            System.out.println("Exited");
        }

        //Member Function: onFrame
        public void onFrame(Controller controller)
        {
            //Get the most recent frame.
            Frame frame = controller.frame();

            //Verify a hand is in view.
            if (frame.hands().count() > 0)
            {
                //Get the palm height (y coordinate) of the frontmost hand.
                int hand = (int) (frame.hands().frontmost().palmPosition().getY());

                //Send the palm height to the Arduino.
                serial.write(String.valueOf(hand));

                //Give the Arduino some time to process our data.
                try { Thread.sleep(30); }
                catch (InterruptedException e) { e.printStackTrace(); }
            }
        }
    }

In this class, we've got the basic Leap Motion API onInit, onConnect, onDisconnect, onExit, and onFrame functions. Our onFrame function is fairly straightforward: we get the most recent frame, verify a hand is within view, retrieve the y-axis coordinate of its palm (its height above the Leap Motion Controller, in millimeters), and then send it off to the Arduino via our instance of the RS232Protocol class (which gets assigned in the constructor). The remaining functions simply print text out to the console telling us when the Leap has initialized, connected, disconnected, and exited (respectively).

And now, for our final class on the Java side of things: Leapduino! This class is a super basic main class that simply initializes the RS232Protocol class and the LeapduinoListener, and that's it!
Without further ado, go on ahead and open up Leapduino.java and enter the following code:

    package com.mechakana.tutorials;

    import com.leapmotion.leap.Controller;

    public class Leapduino
    {
        //Main
        public static final void main(String args[])
        {
            //Initialize serial communications.
            RS232Protocol serial = new RS232Protocol();
            serial.connect("COM4");

            //Initialize the Leapduino listener.
            LeapduinoListener leap = new LeapduinoListener(serial);
            Controller controller = new Controller();
            controller.addListener(leap);
        }
    }

Like all of the classes so far, there isn't a whole lot to say here. That said, there is one line that you must absolutely be aware of, since it can change depending on how your Arduino is connected:

    serial.connect("COM4");

Depending on which port Windows chose for your Arduino when it connected to your computer (more on that next), you will need to modify the COM4 value in the above line to match the port your Arduino is on. Examples of values you'll probably use are COM3, COM4, and COM5.

And with that, the Java side of things is complete. If you run this project right now, most likely all you'll see will be two lines of output: Initialized and Connected. If you want to see anything else happen, you'll need to move on to the next section and get the Arduino side of things working.

Writing the Arduino side of things

With our Java coding done, it's time to write some good-old C++ for the Arduino. If you were able to use the Windows installer for Arduino, simply navigate to the Leapduino.ino file in your Eclipse project explorer and double-click on it. If you had to extract the entire Arduino IDE and store it somewhere instead of running a simple Windows installer, navigate to the extracted folder and launch the Arduino executable (Arduino.exe on Windows). From there, select File | Open, navigate to the Leapduino.ino file on your computer, and double-click on it.
You will now be presented with a screen similar to the one here:

This is the wonderful Arduino IDE, a minimalistic and straightforward text editor and compiler for the Arduino microcontrollers. On the top left of the IDE, you'll find two circular buttons: the check mark verifies (compiles) your code to make sure it builds, and the arrow deploys your code to the Arduino board connected to your computer. On the bottom of the IDE, you'll find the compiler output console (the black box), and on the very bottom right you'll see a line of text telling you which Arduino model is connected to your computer, and on what port (I have an Arduino Uno on COM4 in the preceding screenshot). As is typical for many IDEs and text editors, the big white area in the middle is where your code will go.

So without further ado, let's get started with writing some code! Input all of the text shown here into the Arduino IDE:

    //Most Arduino boards have an LED pre-wired to pin 13.
    int led = 13;

    //Current LED state. LOW is off and HIGH is on.
    int ledState = LOW;

    //Blink rate in milliseconds.
    long blinkRate = 500;

    //Last time the LED was updated.
    long previousTime = 0;

    //Function: setup
    void setup()
    {
        //Initialize the built-in LED (assuming the Arduino board has one)
        pinMode(led, OUTPUT);

        //Start a serial connection at a baud rate of 38,400.
        Serial.begin(38400);
    }

    //Function: loop
    void loop()
    {
        //Get the current system time in milliseconds.
        unsigned long currentTime = millis();

        //Check if it's time to toggle the LED on or off.
        if (currentTime - previousTime >= blinkRate)
        {
            previousTime = currentTime;

            if (ledState == LOW) ledState = HIGH;
            else ledState = LOW;

            digitalWrite(led, ledState);
        }

        //Check if there is serial data available.
        if (Serial.available())
        {
            //Wait for all data to arrive.
            delay(20);

            //Our data.
            String data = "";

            //Iterate over all of the available data and compound it into a string.
            while (Serial.available())
                data += (char) (Serial.read());

            //Set the blink rate based on our newly-read data.
            blinkRate = abs(data.toInt() * 2);

            //A blink rate lower than 30 milliseconds won't really be perceptible by a human.
            if (blinkRate < 30) blinkRate = 30;

            //Echo the data.
            Serial.println("Leapduino Client Received:");
            Serial.println("Raw Leap Data: " + data + " | Blink Rate (MS): " + blinkRate);
        }
    }

Now, let's go over the contents. The first few lines are basic global variables, which we'll be using throughout the program (the comments do a good job of describing them, so we won't go into much detail here).

The first function, setup, is an Arduino's equivalent of a constructor; it's called only once, each time the Arduino powers up or resets. Within the setup function, we initialize the built-in LED (most Arduino boards have an LED pre-wired to pin 13) on the board. We then initialize serial communications at a baud rate of 38,400 bits per second; this will allow our board to communicate with the computer later on.

Fun fact

The baud rate is the symbol rate or modulation rate of a link, in symbols or pulses per second (its unit, the baud, is abbreviated as Bd in some diagrams). Simply put, on serial ports, the baud rate controls how many bits a serial port can send per second: the higher the number, the faster a serial port can communicate. The question is, why don't we set a ridiculously high rate? Well, the higher you go with the baud rate, the more likely it is for there to be data loss, and we all know data loss just isn't good. For many applications, though, a baud rate of 9,600 to 38,400 bits per second is sufficient.

Moving on to the second function, loop is the main function in any Arduino program, which is repeatedly called while the Arduino is turned on. Due to this functionality, many programs will treat any code within this function as if it were inside a while (true) loop.
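To put the baud-rate note above in concrete terms, here is a small host-side C++ sketch (not part of the Arduino program) that estimates throughput at a given rate. It assumes 8N1 framing (one start bit, eight data bits, no parity, one stop bit), which is what setParams(38400, 8, 1, 0) on the Java side and the default Serial.begin(38400) configuration on the Arduino side request. The names kBitsPerFrame8N1, bytesPerSecond, and transmitMicros are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Bits on the wire per payload byte with 8N1 framing:
// 1 start bit + 8 data bits + 1 stop bit.
constexpr int kBitsPerFrame8N1 = 10;

// Payload bytes per second at the given baud rate, assuming 8N1 framing.
constexpr long long bytesPerSecond(long long baud) {
    return baud / kBitsPerFrame8N1;
}

// Whole microseconds needed to transmit byteCount bytes at the given
// baud rate, assuming 8N1 framing (the result is truncated).
constexpr long long transmitMicros(long long baud, std::size_t byteCount) {
    return static_cast<long long>(byteCount) * kBitsPerFrame8N1 * 1000000LL / baud;
}
```

Under these assumptions, 38,400 baud carries 3,840 payload bytes per second, so a three-character reading such as "215" occupies the line for well under a millisecond. Since the Java listener sleeps for 30 milliseconds after each write, the link is nowhere near saturated at this rate.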
In loop, we start off by getting the current system time (in milliseconds) and then comparing it to our ideal blink rate for the LED. If the time elapsed since our last blink exceeds the ideal blink rate, we'll go ahead and toggle the LED on or off accordingly. We then proceed to check if any data has been received over the serial port. If it has, we'll proceed to wait for a brief period of time, 20 milliseconds, to make sure all data has been received. At that point, our code will proceed to read in all of the data, parse it for an integer (which will be our new blink rate), and then echo the data back out to the serial port for diagnostics purposes. As you can see, an Arduino program (or sketch, as they are formally known) is quite simple. Why don't we test it out? Deploying and testing the application With all of the code written, it's time to deploy the Arduino side of things to the, well, Arduino. The first step is to simply open up your Leapduino.ino file in the Arduino IDE. Once that's done, navigate to Tools | Board and select the appropriate option for your Arduino board. In my case, it's an Arduino Uno. At this point, you'll want to verify that you have an Arduino connected to your computer via a USB cable—after all, we can't deploy to thin air! At this point, once everything is ready, simply hit the Deploy button in the top-left of the IDE, as seen here: If all goes well, you'll see the following output in the console after 15 or so seconds: And with that, your Arduino is ready to go! How about we test it out? Keeping your Arduino plugged into your computer, go on over to Eclipse and run the project we just made. Once it's running, try moving your hand up and down over your Leap Motion controller; if all goes well, you'll see the following output from within the console in Eclipse: All of that data is coming directly from the Arduino, not your Java program; isn't that cool? 
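The relationship between the reported hand height and the blink interval can also be checked away from the board. The following host-side C++ sketch mirrors the two pieces of logic in the Arduino loop function: the mapping from the received value to a blink interval, and the elapsed-time test that decides when to toggle the LED. The function names blinkRateFromHeight and shouldToggle are hypothetical helpers for this illustration; they do not appear in the sketch itself.

```cpp
#include <cassert>
#include <cstdlib>

// Mirrors the sketch's mapping: the interval is twice the absolute value
// of the received height, with a floor of 30 ms.
long blinkRateFromHeight(long heightMm) {
    long rate = std::labs(heightMm * 2);
    return rate < 30 ? 30 : rate;
}

// True once at least blinkRate milliseconds have passed since the last
// toggle. Unsigned subtraction keeps the comparison correct even when the
// millisecond counter wraps around to zero.
bool shouldToggle(unsigned long now, unsigned long previous, unsigned long blinkRate) {
    return now - previous >= blinkRate;
}
```

With this mapping, a palm held 200 mm above the controller yields a 400 ms interval, while anything closer than 15 mm is clamped to the 30 ms floor.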
Now, take a look at your Arduino while you're doing this; you should notice that the built-in LED (circled in the following image, labelled L on the board itself) will begin to blink slower or faster depending on how close your hand gets to the Leap. Circled in red: the built-in L LED on an Arduino Uno, wired to pin 13 by default. With this, you've created a simple Leap Motion application for use with an Arduino. From here, you could go on to make an Arduino-controlled robotic arm driven by coordinates from the Leap, or maybe an interactive light show. The possibilities are endless, and this is just the (albeit extremely, extremely simple) tip of the iceberg. Summary In this article, you had a lengthy look at some things you can do with the Leap Motion Controller and hardware such as Arduino. If you have any questions, I encourage you to contact me directly at brandon@mechakana.com. You can also visit my website, http://www.mechakana.com, for more technological goodies and tutorials. Resources for Article: Further resources on this subject: Major SDK components [Article] 2D Twin-stick Shooter [Article] What's Your Input? [Article]

Packt
23 Nov 2016
16 min read

Debugging in Vulkan

In this article by Parminder Singh, author of Learning Vulkan, we learn Vulkan debugging in order to avoid unpleasant mistakes. Vulkan allows you to perform debugging through validation layers. These validation layer checks are optional and can be injected into the system at runtime. Traditional graphics APIs perform validation right up front using some sort of error-checking mechanism, which is a mandatory part of the pipeline. This is indeed useful in the development phase, but actually, it is an overhead during the release stage because the validation bugs might have already been fixed at the development phase itself. Such compulsory checks cause the CPU to spend a significant amount of time in error checking. On the other hand, Vulkan is designed to offer maximum performance, where the optional validation process and debugging model play a vital role. Vulkan assumes the application has done its homework using the validation and debugging capabilities available at the development stage, and it can be trusted flawlessly at the release stage. In this article, we will learn the validation and debugging process of a Vulkan application. We will cover the following topics: Peeking into Vulkan debugging Understanding LunarG validation layers and their features Implementing debugging in Vulkan (For more resources related to this topic, see here.) Peeking into Vulkan debugging Vulkan debugging validates the application implementation. It not only surfaces the errors, but also other validations, such as proper API usage. It does so by verifying each parameter passed to it, warning about the potentially incorrect and dangerous API practices in use and reporting any performance-related warnings when the API is not used optimally. By default, debugging is disabled, and it's the application's responsibility to enable it. Debugging works only for those layers that are explicitly enabled at the instance level at the time of the instance creation (VkInstance). 
When debugging is enabled, it inserts itself into the call chain for the Vulkan commands the layer is interested in. For each command, the debugging visits all the enabled layers and validates them for any potential error, warning, debugging information, and so on. Debugging in Vulkan is simple. The following is an overview that describes the steps required to enable it in an application: Enable the debugging capabilities by adding the VK_EXT_DEBUG_REPORT_EXTENSION_NAME extension at the instance level. Define the set of the validation layers that are intended for debugging. For example, we are interested in the following layers at the instance and device level. For more information about these layer functionalities, refer to the next section: VK_LAYER_GOOGLE_unique_objects VK_LAYER_LUNARG_api_dump VK_LAYER_LUNARG_core_validation VK_LAYER_LUNARG_image VK_LAYER_LUNARG_object_tracker VK_LAYER_LUNARG_parameter_validation VK_LAYER_LUNARG_swapchain VK_LAYER_GOOGLE_threading The Vulkan debugging APIs are not part of the core command, which can be statically loaded by the loader. These are available in the form of extension APIs that can be retrieved at runtime and dynamically linked to the predefined function pointers. So, as the next step, the debug extension APIs vkCreateDebugReportCallbackEXT and vkDestroyDebugReportCallbackEXT are queried and linked dynamically. These are used for the creation and destruction of the debug report. Once the function pointers for the debug report are retrieved successfully, the former API (vkCreateDebugReportCallbackEXT) creates the debug report object. Vulkan returns the debug reports in a user-defined callback, which has to be linked to this API. Destroy the debug report object when debugging is no more required. Understanding LunarG validation layers and their features The LunarG Vulkan SDK supports the following layers for debugging and validation purposes. 
In the following points, we have described some of the layers that will help you understand the offered functionalities: VK_LAYER_GOOGLE_unique_objects: Non-dispatchable handles are not required to be unique; a driver may return the same handle for multiple objects that it considers equivalent. This behavior makes the tracking of the object difficult because it is not clear which object to reference at the time of deletion. This layer packs the Vulkan objects into a unique identifier at the time of creation and unpacks them when the application uses it. This ensures there is proper object lifetime tracking at the time of validation. As per LunarG's recommendation, this layer must be last in the chain of the validation layer, making it closer to the display driver. VK_LAYER_LUNARG_api_dump: This layer is helpful in knowing the parameter values passed to the Vulkan APIs. It prints all the data structure parameters along with their values. VK_LAYER_LUNARG_core_validation: This is used for validating and printing important pieces of information from the descriptor set, pipeline state, dynamic state, and so on. This layer tracks and validates the GPU memory, object binding, and command buffers. Also, it validates the graphics and compute pipelines. VK_LAYER_LUNARG_image: This layer can be used for validating texture formats, rendering target formats, and so on. For example, it verifies whether the requested format is supported on the device. It validates whether the image view creation parameters are reasonable for the image that the view is being created for. VK_LAYER_LUNARG_object_tracker: This keeps track of object creation along with its use and destruction, which is helpful in avoiding memory leaks. It also validates that the referenced object is properly created and is presently valid. 
VK_LAYER_LUNARG_parameter_validation: This validation layer checks that all the parameters passed to the API are correct as per the specification. It checks whether the value of a parameter is consistent and within the valid usage criteria defined in the Vulkan specification. Also, it checks whether the sType field of a Vulkan control structure contains the value that is expected for a structure of that type.

VK_LAYER_LUNARG_swapchain: This layer validates the use of the WSI swapchain extensions. For example, it checks whether the WSI extension is available before its functions are used. Also, it validates that an image index is within the number of images in a swapchain.

VK_LAYER_GOOGLE_threading: This is helpful in the context of thread safety. It checks the validity of multithreaded API usage by detecting the simultaneous use of objects by calls running on multiple threads. It reports threading rule violations and enforces a mutex for such calls. Also, it allows an application to continue running without actually crashing, despite the reported threading problem.

VK_LAYER_LUNARG_standard_validation: This enables all the standard layers in the correct order.

For more information on validation layers, visit LunarG's official website. Check out https://vulkan.lunarg.com/doc/sdk and specifically refer to the Validation layer details section for more details.

Implementing debugging in Vulkan

Since debugging is exposed by validation layers, most of the core implementation of the debugging will be done under the VulkanLayerAndExtension class (VulkanLED.h/.cpp). In this section, we will learn about the implementation that will help us enable the debugging process in Vulkan:

The Vulkan debug facility is not part of the default core functionalities.
Therefore, in order to enable debugging and access the report callback, we need to add the necessary extensions and layers:

Extension: Add the VK_EXT_DEBUG_REPORT_EXTENSION_NAME extension to the instance level. This will help in exposing the Vulkan debug APIs to the application:

    vector<const char *> instanceExtensionNames = {
        . . . . // other extensions
        VK_EXT_DEBUG_REPORT_EXTENSION_NAME,
    };

Layer: Define the following layers at the instance level to allow debugging at these layers:

    vector<const char *> layerNames = {
        "VK_LAYER_GOOGLE_threading",
        "VK_LAYER_LUNARG_parameter_validation",
        "VK_LAYER_LUNARG_device_limits",
        "VK_LAYER_LUNARG_object_tracker",
        "VK_LAYER_LUNARG_image",
        "VK_LAYER_LUNARG_core_validation",
        "VK_LAYER_LUNARG_swapchain",
        "VK_LAYER_GOOGLE_unique_objects"
    };

In addition to the individual validation layers, the LunarG SDK provides a special layer called VK_LAYER_LUNARG_standard_validation. This built-in metadata layer loads a standard set of validation layers in the optimal order listed here. It is a good choice if you do not need a specific selection of layers.

a) VK_LAYER_GOOGLE_threading
b) VK_LAYER_LUNARG_parameter_validation
c) VK_LAYER_LUNARG_object_tracker
d) VK_LAYER_LUNARG_image
e) VK_LAYER_LUNARG_core_validation
f) VK_LAYER_LUNARG_swapchain
g) VK_LAYER_GOOGLE_unique_objects

The layer names are then supplied to the vkCreateInstance() API to enable them:

    VulkanApplication* appObj = VulkanApplication::GetInstance();
    appObj->createVulkanInstance(layerNames, instanceExtensionNames, title);

    // VulkanInstance::createInstance()
    VkResult VulkanInstance::createInstance(vector<const char *>& layers,
        std::vector<const char *>& extensionNames, char const*const appName)
    {
        . . .
        VkInstanceCreateInfo instInfo = {};

        // Specify the list of layer names to be enabled.
        instInfo.enabledLayerCount = layers.size();
        instInfo.ppEnabledLayerNames = layers.data();

        // Specify the list of extensions to
        // be used in the application.
        instInfo.enabledExtensionCount = extensionNames.size();
        instInfo.ppEnabledExtensionNames = extensionNames.data();
        . . .
        vkCreateInstance(&instInfo, NULL, &instance);
    }

The available validation layers are specific to the vendor and the SDK version. Therefore, it is advisable to first check whether the layers are supported by the underlying implementation before passing them to the vkCreateInstance() API. This way, the application remains portable when run against another driver implementation. The areLayersSupported() function is a user-defined utility that checks the incoming layer names against the system-supported layers. Unsupported layers are reported to the application and removed from the list of layer names before it is fed into the system:

    // VulkanLED.cpp
    VkBool32 VulkanLayerAndExtension::areLayersSupported
        (vector<const char *> &layerNames)
    {
        uint32_t checkCount = layerNames.size();
        uint32_t layerCount = layerPropertyList.size();
        std::vector<const char*> unsupportLayerNames;

        for (uint32_t i = 0; i < checkCount; i++) {
            VkBool32 isSupported = 0;
            for (uint32_t j = 0; j < layerCount; j++) {
                if (!strcmp(layerNames[i],
                        layerPropertyList[j].properties.layerName)) {
                    isSupported = 1;
                }
            }

            if (!isSupported) {
                std::cout << "No Layer support found, removed from layer: "
                          << layerNames[i] << endl;
                unsupportLayerNames.push_back(layerNames[i]);
            }
            else {
                cout << "Layer supported: " << layerNames[i] << endl;
            }
        }

        // Remove every unsupported layer name from the caller's list.
        for (auto i : unsupportLayerNames) {
            auto it = std::find(layerNames.begin(), layerNames.end(), i);
            if (it != layerNames.end()) layerNames.erase(it);
        }

        return true;
    }

The debug report is created using the vkCreateDebugReportCallbackEXT API. This API is not a part of Vulkan's core commands; therefore, the loader is unable to link it statically.
If you try to access it in the following manner, you will get an undefined symbol reference error:

    vkCreateDebugReportCallbackEXT(instance, NULL, NULL, NULL);

All the debug-related APIs need to be queried using the vkGetInstanceProcAddr() API and linked dynamically. The retrieved API reference is stored in a corresponding function pointer of type PFN_vkCreateDebugReportCallbackEXT. The VulkanLayerAndExtension::createDebugReportCallback() function retrieves the create and destroy debug APIs, as shown in the following implementation:

    /********* VulkanLED.h *********/
    // Declaration of the create and destroy function pointers
    PFN_vkCreateDebugReportCallbackEXT dbgCreateDebugReportCallback;
    PFN_vkDestroyDebugReportCallbackEXT dbgDestroyDebugReportCallback;

    /********* VulkanLED.cpp *********/
    VkResult VulkanLayerAndExtension::createDebugReportCallback(){
        . . .
        // Get vkCreateDebugReportCallbackEXT API
        dbgCreateDebugReportCallback = (PFN_vkCreateDebugReportCallbackEXT)
            vkGetInstanceProcAddr(*instance, "vkCreateDebugReportCallbackEXT");

        if (!dbgCreateDebugReportCallback) {
            std::cout << "Error: GetInstanceProcAddr unable to locate "
                         "vkCreateDebugReportCallbackEXT function.\n";
            return VK_ERROR_INITIALIZATION_FAILED;
        }

        // Get vkDestroyDebugReportCallbackEXT API
        dbgDestroyDebugReportCallback = (PFN_vkDestroyDebugReportCallbackEXT)
            vkGetInstanceProcAddr(*instance, "vkDestroyDebugReportCallbackEXT");

        if (!dbgDestroyDebugReportCallback) {
            std::cout << "Error: GetInstanceProcAddr unable to locate "
                         "vkDestroyDebugReportCallbackEXT function.\n";
            return VK_ERROR_INITIALIZATION_FAILED;
        }
        . . .
    }

The vkGetInstanceProcAddr() API obtains the instance-level extension functions dynamically; these functions are not exposed statically on a platform and need to be linked through this API dynamically.
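The lookup-then-cast pattern described above can be tried out without a Vulkan installation. The following self-contained C++ sketch is only an analogy: getProcAddr, createReport, PFN_createReport, and loadCreateReport are hypothetical stand-ins invented for this illustration, not Vulkan functions. It shows the three steps that matter: query a generic function pointer by name, cast it back to its real signature, and check for a null result before calling it.

```cpp
#include <cassert>
#include <cstring>

// Generic function-pointer type, standing in for PFN_vkVoidFunction.
using VoidFunction = void (*)();

// Hypothetical extension entry point and the typed pointer alias for it.
int createReport(int flags) { return flags | 0x1; }
using PFN_createReport = int (*)(int);

// Hypothetical loader: returns a generic pointer for a known name,
// or nullptr when the name is not available.
VoidFunction getProcAddr(const char* name) {
    if (std::strcmp(name, "createReport") == 0) {
        return reinterpret_cast<VoidFunction>(&createReport);
    }
    return nullptr;
}

// Resolve an entry point by name and cast it back to its real signature.
// The caller must check the result for nullptr before calling through it.
PFN_createReport loadCreateReport(const char* name) {
    return reinterpret_cast<PFN_createReport>(getProcAddr(name));
}
```

A caller would store the result of loadCreateReport("createReport") in a function pointer and bail out if it is null, just as createDebugReportCallback() does with dbgCreateDebugReportCallback. In the real application, the role of the hypothetical getProcAddr is played by vkGetInstanceProcAddr().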
The following is the signature of vkGetInstanceProcAddr():

    PFN_vkVoidFunction vkGetInstanceProcAddr(
        VkInstance instance,
        const char* pName);

The following table describes the API fields:

Parameters | Description
instance | This is a VkInstance variable. If this variable is NULL, then the name must be one of these: vkEnumerateInstanceExtensionProperties, vkEnumerateInstanceLayerProperties, or vkCreateInstance.
pName | This is the name of the API that needs to be queried for dynamic linking.

Using the dbgCreateDebugReportCallback() function pointer, create the debugging report object and store the handle in debugReportCallback. The second parameter of the API accepts a VkDebugReportCallbackCreateInfoEXT control structure. This data structure defines the behavior of the debugging, such as what the debug information should include: errors, general warnings, information, performance-related warnings, debug information, and so on. In addition, it also takes the reference of a user-defined function (debugFunction); this helps filter and print the debugging information once it is retrieved from the system. Here's the syntax of the control structure used for creating the debugging report:

    struct VkDebugReportCallbackCreateInfoEXT {
        VkStructureType sType;
        const void* pNext;
        VkDebugReportFlagsEXT flags;
        PFN_vkDebugReportCallbackEXT pfnCallback;
        void* pUserData;
    };

The following table describes the purpose of the mentioned API fields:

Parameters | Description
sType | This is the type information of this control structure. It must be specified as VK_STRUCTURE_TYPE_DEBUG_REPORT_CREATE_INFO_EXT.
flags | This is to define the kind of debugging information to be retrieved when debugging is on; the next table defines these flags.
pfnCallback | This field refers to the function that filters and displays the debug messages.

The flags field can hold a bitwise combination of the following VkDebugReportFlagBitsEXT values:

Flag | Description
VK_DEBUG_REPORT_INFORMATION_BIT_EXT | Informational messages.
VK_DEBUG_REPORT_WARNING_BIT_EXT | Warnings about potentially incorrect or dangerous API usage.
VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT | Performance-related warnings when the API is not used optimally.
VK_DEBUG_REPORT_ERROR_BIT_EXT | Errors.
VK_DEBUG_REPORT_DEBUG_BIT_EXT | Debugging information.

The createDebugReportCallback function implements the creation of the debug report.
First, it fills the VkDebugReportCallbackCreateInfoEXT control structure object with the relevant information. This primarily includes two things: first, assigning a user-defined function (pfnCallback) that will print the debug information received from the system (see the next point), and second, assigning the debugging flags (flags) in which the programmer is interested:

    /********* VulkanLED.h *********/
    // Handle of the debug report callback
    VkDebugReportCallbackEXT debugReportCallback;

    // Debug report callback create information control structure
    VkDebugReportCallbackCreateInfoEXT dbgReportCreateInfo = {};

    /********* VulkanLED.cpp *********/
    VkResult VulkanLayerAndExtension::createDebugReportCallback(){
        . . .
        // Define the debug report control structure,
        // provide the reference of 'debugFunction',
        // this function prints the debug information on the console.
        dbgReportCreateInfo.sType = VK_STRUCTURE_TYPE_DEBUG_REPORT_CREATE_INFO_EXT;
        dbgReportCreateInfo.pfnCallback = debugFunction;
        dbgReportCreateInfo.pUserData = NULL;
        dbgReportCreateInfo.pNext = NULL;
        dbgReportCreateInfo.flags = VK_DEBUG_REPORT_WARNING_BIT_EXT |
                                    VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT |
                                    VK_DEBUG_REPORT_ERROR_BIT_EXT |
                                    VK_DEBUG_REPORT_DEBUG_BIT_EXT;

        // Create the debug report callback and store the handle
        // into 'debugReportCallback'
        result = dbgCreateDebugReportCallback
            (*instance, &dbgReportCreateInfo, NULL, &debugReportCallback);

        if (result == VK_SUCCESS) {
            cout << "Debug report callback object created successfully\n";
        }
        return result;
    }

Define the debugFunction() function that prints the retrieved debug information in a user-friendly way.
It describes the type of debug information along with the reported message: VKAPI_ATTR VkBool32 VKAPI_CALL VulkanLayerAndExtension::debugFunction( VkFlags msgFlags, VkDebugReportObjectTypeEXT objType, uint64_t srcObject, size_t location, int32_t msgCode, const char *pLayerPrefix, const char *pMsg, void *pUserData){ if (msgFlags & VK_DEBUG_REPORT_ERROR_BIT_EXT) { std::cout << "[VK_DEBUG_REPORT] ERROR: [" <<layerPrefix<<"] Code" << msgCode << ":" << msg << std::endl; } else if (msgFlags & VK_DEBUG_REPORT_WARNING_BIT_EXT) { std::cout << "[VK_DEBUG_REPORT] WARNING: ["<<layerPrefix<<"] Code" << msgCode << ":" << msg << std::endl; } else if (msgFlags & VK_DEBUG_REPORT_INFORMATION_BIT_EXT) { std::cout<<"[VK_DEBUG_REPORT] INFORMATION:[" <<layerPrefix<<"] Code" << msgCode << ":" << msg << std::endl; } else if(msgFlags& VK_DEBUG_REPORT_PERFORMANCE_WARNING_BIT_EXT){ cout <<"[VK_DEBUG_REPORT] PERFORMANCE: ["<<layerPrefix<<"] Code" << msgCode << ":" << msg << std::endl; } else if (msgFlags & VK_DEBUG_REPORT_DEBUG_BIT_EXT) { cout << "[VK_DEBUG_REPORT] DEBUG: ["<<layerPrefix<<"] Code" << msgCode << ":" << msg << std::endl; } else { return VK_FALSE; } return VK_SUCCESS; } The following table describes the various fields from the debugFunction()callback: Parameters Description msgFlags This specifies the type of debugging event that has triggered the call, for example, an error, warning, performance warning, and so on. objType This is the type object that is manipulated by the triggering call. srcObject This is the handle of the object that's being created or manipulated by the triggered call. location This refers to the place of the code describing the event. msgCode This refers to the message code. layerPrefix This is the layer responsible for triggering the debug event. msg This field contains the debug message text. userData Any application-specific user data is specified to the callback using this field.  The debugFunction callback has a Boolean return value. 
Returning VK_TRUE asks the validation layer to abort the Vulkan call that triggered the message, whereas returning VK_FALSE lets the call continue down the chain to the subsequent validation layers and the driver, even after an error has occurred. The callback shown above always returns VK_FALSE, so execution continues. While debugging, it is advisable to stop the execution at the very first error. Having an error itself indicates that something has occurred unexpectedly; letting the system run in these circumstances may lead to undefined results or further errors, which could be completely senseless sometimes. When the execution is stopped at the first error, the developer has a better chance to concentrate on the reported error and fix it. In contrast, letting the system continue may be cumbersome: it can throw a bunch of follow-up errors and leave the developer confused.

In order to enable debugging at vkCreateInstance, provide dbgReportCreateInfo to the VkInstanceCreateInfo's pNext field:

    VkInstanceCreateInfo instInfo = {};
    . . .
    instInfo.pNext = &layerExtension.dbgReportCreateInfo;
    vkCreateInstance(&instInfo, NULL, &instance);

Finally, once the debug report is no longer in use, destroy the debug callback object:

    void VulkanLayerAndExtension::destroyDebugReportCallback(){
        VulkanApplication* appObj = VulkanApplication::GetInstance();
        // 'instance' is the application's VkInstance handle, retrieved
        // through appObj (the lookup is not shown in this listing)
        dbgDestroyDebugReportCallback(instance, debugReportCallback, NULL);
    }

The following is the output from the implemented debug report. Your output may differ from this based on the GPU vendor and SDK provider. Also, the explanations of the reported errors or warnings are very specific to the SDK itself. But at a higher level, the specification will hold; this means you can expect to see a debug report with warnings, information, debugging help, and so on, based on the debugging flags you have turned on.

Summary

This article was short, precise, and full of practical implementations. Working on Vulkan without debugging capabilities is like shooting in the dark. We know very well that Vulkan demands an appreciable amount of programming, and developers make mistakes for obvious reasons; they are humans after all. We learn from our mistakes, and debugging allows us to find and correct these errors. It also provides insightful information to build quality products.

Let's do a quick recap. We learned the Vulkan debugging process. We looked at the various LunarG validation layers and understood the roles and responsibilities of each one of them. Next, we added a few selected validation layers that we wanted to use for debugging. We also added the debug extension that exposes the debugging capabilities; without this, the API's definition could not be dynamically linked to the application. Then, we implemented the Vulkan create debug report callback and linked it to our debug reporting callback; this callback presents the captured debug report in a user-friendly fashion. Finally, we implemented the API to destroy the debugging report callback object.
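The severity dispatch in debugFunction() is plain bit testing on msgFlags. As a minimal, language-neutral sketch (written in Python rather than C++, and not part of the original sample), the same logic can be modeled as follows. The numeric bit values are the ones the VK_EXT_debug_report header assigns to VkDebugReportFlagBitsEXT; the names classify and format_report are hypothetical helpers introduced only for illustration.

```python
# Bit values of VkDebugReportFlagBitsEXT (VK_EXT_debug_report).
INFORMATION = 0x01
WARNING = 0x02
PERFORMANCE_WARNING = 0x04
ERROR = 0x08
DEBUG = 0x10

# Checked in the same priority order as debugFunction() in the article.
_PRIORITY = [
    (ERROR, "ERROR"),
    (WARNING, "WARNING"),
    (INFORMATION, "INFORMATION"),
    (PERFORMANCE_WARNING, "PERFORMANCE"),
    (DEBUG, "DEBUG"),
]


def classify(msg_flags):
    """Return the label of the first matching severity bit, or None."""
    for bit, label in _PRIORITY:
        if msg_flags & bit:
            return label
    return None


def format_report(msg_flags, layer_prefix, msg_code, message):
    """Build the console line for a report (hypothetical helper)."""
    label = classify(msg_flags)
    if label is None:
        return None
    return f"[VK_DEBUG_REPORT] {label}: [{layer_prefix}] Code {msg_code}: {message}"


# The mask requested in createDebugReportCallback(): everything except
# informational messages.
requested = WARNING | PERFORMANCE_WARNING | ERROR | DEBUG
```

Because the checks are ordered, a message that carries both the error and the warning bit is reported once, as an error. Leaving the information bit out of the requested mask is what keeps purely informational messages from reaching the callback.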
Packt
10 Aug 2016
13 min read

The Third Dimension

In this article, Sebastián Di Giuseppe, author of the book Building a 3D game with LibGDX, describes how to work in three dimensions, which requires new camera techniques. The third dimension adds a new axis to the familiar x and y grid, brings a slightly different workflow, and requires new render methods to draw our game. We'll learn the very basics of this workflow in this article to give you a sense of what's coming, such as moving, scaling, materials, and the environment, and we will move systematically through them one step at a time.

The following topics will be covered in this article:

  • Camera techniques
  • Workflow
  • LibGDX's 3D rendering API
  • Math

Camera techniques

The goal of this article is to learn how to work with 3D. To achieve this, we will start with the basics and make a simple first person camera, making use of the functions and math that LibGDX contains. Since you have probably used LibGDX more than once, you should be familiar with the concept of the camera in 2D. The way 3D works is more or less the same, except that there is now a z axis for the depth. However, instead of an OrthographicCamera class, a PerspectiveCamera class is used to set up the 3D environment. Creating a 3D camera is just as easy as creating a 2D camera. The constructor of the PerspectiveCamera class requires three arguments: the field of vision, the camera width, and the camera height. The camera width and height are known from 2D cameras; the field of vision is new. Initialization of a PerspectiveCamera class looks like this:

    float FoV = 67;
    PerspectiveCamera camera = new PerspectiveCamera(FoV, Gdx.graphics.getWidth(), Gdx.graphics.getHeight());

The first argument, the field of vision, describes the angle (in degrees) that the first person camera can see. The image above gives a good idea of what the field of view is. For first person shooters, values up to 100 degrees are used. Values higher than 100 confuse the player, and with a lower field of vision the player is bound to see less.

Displaying a texture

We will start by doing something exciting: drawing a cube on the screen!

Drawing a cube

First things first! Let's create a camera. Earlier, we showed the difference between the 2D camera and the 3D camera, so let's put this to use. Start by creating a new class in your main package (ours is com.deeep.spaceglad) and name it as you like. The following imports are used in our test:

    import com.badlogic.gdx.ApplicationAdapter;
    import com.badlogic.gdx.Gdx;
    import com.badlogic.gdx.Input;
    import com.badlogic.gdx.graphics.Color;
    import com.badlogic.gdx.graphics.GL20;
    import com.badlogic.gdx.graphics.PerspectiveCamera;
    import com.badlogic.gdx.graphics.VertexAttributes;
    import com.badlogic.gdx.graphics.g3d.*;
    import com.badlogic.gdx.graphics.g3d.attributes.ColorAttribute;
    import com.badlogic.gdx.graphics.g3d.environment.DirectionalLight;
    import com.badlogic.gdx.graphics.g3d.utils.ModelBuilder;
    import com.badlogic.gdx.math.Vector3;

Create a class member called cam of type PerspectiveCamera:

    public PerspectiveCamera cam;

Now this camera needs to be initialized and configured. This will be done in the create method, as shown below:

    public void create() {
        cam = new PerspectiveCamera(67, Gdx.graphics.getWidth(), Gdx.graphics.getHeight());
        cam.position.set(10f, 10f, 10f);
        cam.lookAt(0, 0, 0);
        cam.near = 1f;
        cam.far = 300f;
        cam.update();
    }

In the above code snippet, we set the position of the camera and point it towards the origin (0, 0, 0). Next up is getting a cube ready to draw. In 2D it was possible to draw textures, but textures are flat. In 3D, models are used. Later on we will import those models, but we will start with generated models. LibGDX offers a convenient class to build simple models such as spheres, cubes, cylinders, and many more. Let's add two more class members, a Model and a ModelInstance.
The Model class contains all the information on what to draw, and the resources that go along with it. The ModelInstance class has information on the whereabouts of the model, such as its location, rotation, and scale.

    public Model model;
    public ModelInstance instance;

We use the overridden create function to initialize our new class members:

    public void create() {
        …
        ModelBuilder modelBuilder = new ModelBuilder();
        Material mat = new Material(ColorAttribute.createDiffuse(Color.BLUE));
        model = modelBuilder.createBox(5, 5, 5, mat,
                VertexAttributes.Usage.Position | VertexAttributes.Usage.Normal);
        instance = new ModelInstance(model);
    }

We use a ModelBuilder class to create a box. The box will need a material, which here is a color. A material is an object that holds different attributes, and you can add as many as you would like. The attributes passed on to the material change the way models are perceived and shown on the screen. We could, for example, add FloatAttribute.createShininess(8f) after the ColorAttribute, which will make the box shine when there are lights around. More complex configurations are possible, but we will leave them out of scope for now. With the ModelBuilder class, we create a box with a size of (5, 5, 5). We pass the material as the fourth argument, and the fifth argument holds the vertex attributes for the specific box we are creating. We use a bitwise OR operator to combine a position attribute and a normal attribute. We tell the model that it has a position, because every cube needs a position, and the normal makes sure that the lighting works and the cube is drawn as we want it to be drawn. These attributes are passed down to OpenGL, on which LibGDX is built.

Now we are almost ready to draw our first cube. Two things are missing. First of all, we need a batch to draw to. When designing 2D games in LibGDX, a SpriteBatch class is used. However, since we are not using sprites anymore, but rather models, we will use a ModelBatch class, which is the equivalent for models. Lastly, we will have to create an environment and add lights to it. For that we will need two more class members:

    public ModelBatch modelBatch;
    public Environment environment;

They are initialized just like the other class members:

    public void create() {
        ....
        modelBatch = new ModelBatch();
        environment = new Environment();
        environment.set(new ColorAttribute(ColorAttribute.AmbientLight, 0.4f, 0.4f, 0.4f, 1f));
        environment.add(new DirectionalLight().set(0.8f, 0.8f, 0.8f, -1f, -0.8f, -0.2f));
    }

Here we add two lights: an ambient light, which lights up everything that is being drawn (a general light source for the whole environment), and a directional light, which has a direction (most similar to a "sun" type of source). In general, you can experiment with the directions, colors, and types of lights. Another type of light is PointLight, which emits light in all directions from a single position, much like a light bulb. Both lights start with the color components (red, green, and blue), which won't make a difference yet as we don't have any textures. In the directional light, the color is followed by a direction. This direction can be seen as a vector. Now we are all set to draw our environment and the model in it:

    @Override
    public void render() {
        Gdx.gl.glViewport(0, 0, Gdx.graphics.getWidth(), Gdx.graphics.getHeight());
        Gdx.gl.glClear(GL20.GL_COLOR_BUFFER_BIT | GL20.GL_DEPTH_BUFFER_BIT);
        modelBatch.begin(cam);
        modelBatch.render(instance, environment);
        modelBatch.end();
    }

This directly renders our cube. The ModelBatch class behaves just like a SpriteBatch, as can be seen when we run it: it has to be started (begin), then asked to render with the given parameters (the model instance and the environment in our case), and then stopped (end). We should not forget to release any resources that our game allocated. The model and the batch we created allocate memory that should be disposed of:

    @Override
    public void dispose() {
        modelBatch.dispose();
        model.dispose();
    }

Now we can look at our beautiful cube!
It's only very static and empty. We will add some movement to it in our next subsection!

Translation

Translating, rotating, and scaling are a bit different from how they work in a 2D game; they are slightly more mathematical. The easier part is vectors: instead of a Vector2, we can now use a Vector3, which is essentially the same, except that it adds another dimension. Let's look at some basic operations on 3D models. We will use the cube that we previously created. With translation, we are able to move the model along all three axes. Let's create a function that moves our cube along the x axis. For now, we add a member variable of type Vector3 to our class to store the position:

    Vector3 position = new Vector3();

    private void movement() {
        instance.transform.getTranslation(position);
        position.x += Gdx.graphics.getDeltaTime();
        instance.transform.setTranslation(position);
    }

The above code snippet retrieves the translation and adds the delta time to its x component. Then we set the translation of the ModelInstance. The 3D library returns the translation a little differently than usual: we pass a vector, and that vector gets set to the current state of the object. We have to call this function every time the game updates, so we put it in our render loop before we start drawing:

    @Override
    public void render() {
        movement();
        ...
    }

It might seem like the cube is moving diagonally, but that's because of the angle of our camera. In fact, it is moving along the x axis, toward one of its faces. That was easy! It's only slightly annoying that it moves out of bounds after a short while. Therefore, we will change the movement function to contain some user input handling:

    private void movement() {
        instance.transform.getTranslation(position);
        if (Gdx.input.isKeyPressed(Input.Keys.W)) {
            position.x += Gdx.graphics.getDeltaTime();
        }
        if (Gdx.input.isKeyPressed(Input.Keys.D)) {
            position.z += Gdx.graphics.getDeltaTime();
        }
        if (Gdx.input.isKeyPressed(Input.Keys.A)) {
            position.z -= Gdx.graphics.getDeltaTime();
        }
        if (Gdx.input.isKeyPressed(Input.Keys.S)) {
            position.x -= Gdx.graphics.getDeltaTime();
        }
        instance.transform.setTranslation(position);
    }

The rewritten movement function retrieves our position, updates it based on the keys that are pressed, and sets the translation of our model instance.

Rotation

Rotation is slightly different from 2D, since there are multiple axes on which we can rotate, namely the x, y, and z axes. First off, let us create a function in which we can rotate an object on all axes:

    private void rotate() {
        if (Gdx.input.isKeyPressed(Input.Keys.NUM_1))
            instance.transform.rotate(Vector3.X, Gdx.graphics.getDeltaTime() * 100);
        if (Gdx.input.isKeyPressed(Input.Keys.NUM_2))
            instance.transform.rotate(Vector3.Y, Gdx.graphics.getDeltaTime() * 100);
        if (Gdx.input.isKeyPressed(Input.Keys.NUM_3))
            instance.transform.rotate(Vector3.Z, Gdx.graphics.getDeltaTime() * 100);
    }

Let's not forget to call this function from the render loop, after we call the movement function:

    @Override
    public void render() {
        ...
        rotate();
    }

If we press the number keys 1, 2, or 3, we can rotate our model. The first argument of the rotate function is the axis to rotate on. The second argument is the amount to rotate by, in degrees. These calls add a rotation to the current one.
We can also set the rotation around an axis, instead of adding to it, with the following function:

    instance.transform.setToRotation(Vector3.Z, Gdx.graphics.getDeltaTime() * 100);

However, say we want to set the rotation on all three axes at the same time. We can't simply call the setToRotation function three times in a row, once for each axis, as each call eliminates any rotation done before it. Luckily, LibGDX has us covered with a function that is able to take all three axes:

    float rotation;

    private void rotate() {
        rotation = (rotation + Gdx.graphics.getDeltaTime() * 100) % 360;
        instance.transform.setFromEulerAngles(0, 0, rotation);
    }

The above function will continuously rotate our cube. We face one last problem: we can't seem to move the cube! The setFromEulerAngles function clears all the translation and rotation properties. Lucky for us, setFromEulerAngles returns a Matrix4, so we can chain another call onto it, for example, one that translates the matrix. For that we use the trn(x, y, z) function, which is short for translate. Now we can update our rotation function so that it also translates:

    instance.transform.setFromEulerAngles(0, 0, rotation).trn(position.x, position.y, position.z);

Now we can set our cube to a rotation and translate it! These are the most basic operations, which we will use a lot throughout the book. As you can see, this call does both the rotation and the translation, so we can remove the last line of our movement function:

    instance.transform.setTranslation(position);

Our latest rotate function looks like the following:

    private void rotate() {
        rotation = (rotation + Gdx.graphics.getDeltaTime() * 100) % 360;
        instance.transform.setFromEulerAngles(0, 0, rotation).trn(position.x, position.y, position.z);
    }

The setFromEulerAngles call will be extracted into a function of its own, as it serves multiple purposes now and is not solely bound to our rotate function; rotate() is then left to update only the rotation variable:

    private void updateTransformation() {
        instance.transform.setFromEulerAngles(0, 0, rotation).trn(position.x, position.y, position.z);
    }

This function should be called after we've calculated our rotation and translation:

    public void render() {
        rotate();
        movement();
        updateTransformation();
        ...
    }

Scaling

We've covered almost all of the transformations we can apply to models. The last one described in this book is the scaling of a model. Luckily, LibGDX contains all the required functions and methods for this. Let's extend our previous example and make our box grow and shrink over time. We first create a function that increases and decreases a scale variable:

    boolean increment;
    float scale = 1;

    void scale() {
        if (increment) {
            scale = (scale + Gdx.graphics.getDeltaTime() / 5);
            if (scale >= 1.5f) {
                increment = false;
            }
        } else {
            scale = (scale - Gdx.graphics.getDeltaTime() / 5);
            if (scale <= 0.5f)
                increment = true;
        }
    }

Now, to apply this scaling, we adjust our updateTransformation function to include it:

    private void updateTransformation() {
        instance.transform.setFromEulerAngles(0, 0, rotation).trn(position.x, position.y, position.z).scale(scale, scale, scale);
    }

Our render method should now include the scaling function as well:

    public void render() {
        rotate();
        movement();
        scale();
        updateTransformation();
        ...
    }

And there you go, we can now successfully move, rotate, and scale our cube!

Summary

In this article, we learned about the workflow of the LibGDX 3D API. We are now able to apply multiple kinds of transformations to a model, and we understand the differences between 2D and 3D. We also learned how to apply materials to models, which changes the appearance of the model and lets us create cool effects. Note that there's plenty more to learn about 3D, and it takes a lot of practice to fully understand it. There are also subjects not covered here, such as how to create your own materials and how to make and use shaders. There's plenty of room for learning and experimenting. In the next article, we will start applying the theory learned in this article and start working towards an actual game! We will also go more in depth on the environment and lights, as well as collision detection. So there is plenty to look forward to.
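The transformation chain built in updateTransformation() can be checked by hand. The following is a minimal sketch in plain Python (not LibGDX code), assuming that the chain behaves like a uniform scale, followed by a rotation about the z axis, followed by a translation, that positive angles rotate counterclockwise, and that the generated box is centered on the origin. The name transform_point is a hypothetical helper introduced for illustration, not a LibGDX function.

```python
import math


def transform_point(p, roll_degrees, translation, scale):
    """Apply a uniform scale, then a rotation about z, then a translation."""
    # Scale in the model's local space first.
    x, y, z = (c * scale for c in p)
    # Rotate about the z axis (assumed counterclockwise for positive angles).
    a = math.radians(roll_degrees)
    rx = x * math.cos(a) - y * math.sin(a)
    ry = x * math.sin(a) + y * math.cos(a)
    # Translate last, so the offset is not affected by rotation or scale.
    tx, ty, tz = translation
    return (rx + tx, ry + ty, z + tz)


# One corner of a 5 x 5 x 5 box that is centered on the origin.
corner = (2.5, 2.5, 2.5)

# Quarter turn about z, then move 10 units along x.
moved = transform_point(corner, 90.0, (10.0, 0.0, 0.0), 1.0)

# Doubling the scale without rotating or moving.
grown = transform_point(corner, 0.0, (0.0, 0.0, 0.0), 2.0)
```

Because the translation is applied last, the cube always ends up at the stored position regardless of its current rotation or scale, which is why the movement code can keep working on the position vector alone.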

Packt
01 Feb 2016
18 min read

The Vertex Functions

In this article by Alan Zucconi, author of the book Unity 5.x Shaders and Effects Cookbook, we will see that the term shader originates from the fact that Cg has been mainly used to simulate realistic lighting conditions (shadows) on three-dimensional models. Despite this, shaders are now much more than that. They not only define the way objects are going to look, but can also redefine their shapes entirely. If you want to learn how to manipulate the geometry of a three-dimensional object only via shaders, this article is for you.

In this article, you will learn the following:

  • Extruding your models
  • Implementing a snow shader
  • Implementing a volumetric explosion

In this article, we will explain that 3D models are not just a collection of triangles. Each vertex can contain data, which is essential for correctly rendering the model itself. This article will explore how to access this information in order to use it in a shader. We will also explore how the geometry of an object can be deformed simply by using Cg code.

Extruding your models

One of the biggest problems in games is repetition. Creating new content is a time-consuming task, and when you have to face a thousand enemies, the chances are that they will all look the same. A relatively cheap technique to add variations to your models is using a shader that alters their basic geometry. This recipe will show a technique called normal extrusion, which can be used to create a chubbier or skinnier version of a model, as shown in the following image with the soldier from the Unity camp (Demo Gameplay):

Getting ready

For this recipe, we need to have access to the shader used by the model that you want to alter. Once you have it, we will duplicate it so that we can edit it safely. It can be done as follows:

1. Find the shader that your model is using and, once selected, duplicate it by pressing Ctrl+D.
2. Duplicate the original material of the model and assign the cloned shader to it.
3. Assign the new material to your model and start editing it.

For this effect to work, your model should have normals.

How to do it…

To create this effect, start by modifying the duplicated shader as shown in the following:

1. Let's start by adding a property to our shader, which will be used to modulate its extrusion. The range that is presented here goes from -1 to +1; however, you might have to adjust that according to your own needs, as follows:

    _Amount ("Extrusion Amount", Range(-1,+1)) = 0

2. Couple the property with its respective variable, as shown in the following:

    float _Amount;

3. Change the pragma directive so that it now uses a vertex modifier. You can do this by adding vertex:function_name at the end of it. In our case, we have called the function vert, as follows:

    #pragma surface surf Lambert vertex:vert

4. Add the following vertex modifier:

    void vert (inout appdata_full v) {
        v.vertex.xyz += v.normal * _Amount;
    }

The shader is now ready; you can use the Extrusion Amount slider in the material's Inspector to make your model skinnier or chubbier.

How it works…

Surface shaders work in two steps: the surface function and the vertex modifier. The vertex modifier takes the data structure of a vertex (which is usually called appdata_full) and applies a transformation to it. This gives us the freedom to do virtually everything with the geometry of our model. We signal to the graphics processing unit (GPU) that such a function exists by adding vertex:vert to the pragma directive of the surface shader.

One of the simplest yet most effective techniques that can be used to alter the geometry of a model is called normal extrusion. It works by projecting a vertex along its normal direction. This is done by the following line of code:

    v.vertex.xyz += v.normal * _Amount;

The position of a vertex is displaced by _Amount units along the vertex normal. If _Amount gets too high, the results can be quite unpleasant.
However, you can add a lot of variation to your models with smaller values.

There's more…

If you have multiple enemies and you want each one to have their own weight, you have to create a different material for each one of them. This is necessary as materials are normally shared between models, and changing one will change all of them. There are several ways in which you can do this; the quickest one is to create a script that automatically does it for you. The following script, once attached to an object with a Renderer, will duplicate its first material and set the _Amount property automatically, as follows:

    using UnityEngine;

    public class NormalExtruder : MonoBehaviour {
        [Range(-0.0001f, 0.0001f)]
        public float amount = 0;

        // Use this for initialization
        void Start () {
            Material material = GetComponent<Renderer>().sharedMaterial;
            Material newMaterial = new Material(material);
            newMaterial.SetFloat("_Amount", amount);
            GetComponent<Renderer>().material = newMaterial;
        }
    }

Adding extrusion maps

This technique can actually be improved even further. We can add an extra texture (or use the alpha channel of the main one) to indicate the amount of the extrusion. This allows better control over which parts are raised or lowered. The following code shows how it is possible to achieve such an effect:

    sampler2D _ExtrusionTex;

    void vert(inout appdata_full v) {
        float4 tex = tex2Dlod (_ExtrusionTex, float4(v.texcoord.xy,0,0));
        float extrusion = tex.r * 2 - 1;
        v.vertex.xyz += v.normal * _Amount * extrusion;
    }

The red channel of _ExtrusionTex is used as a multiplying coefficient for normal extrusion. A value of 0.5 leaves the model unaffected; darker or lighter shades are used to extrude vertices inward or outward, respectively. You should notice that to sample a texture in a vertex modifier, tex2Dlod should be used instead of tex2D.

In shaders, colour channels go from 0 to 1. Sometimes, however, you need to represent negative values as well (such as inward extrusion). When this is the case, treat 0.5 as zero, with smaller values being negative and higher values being positive. This is exactly what happens with normals, which are usually encoded in RGB textures. The UnpackNormal() function is used to map a value in the (0,1) range onto the (-1,+1) range. Mathematically speaking, this is equivalent to tex.r * 2 - 1.

Extrusion maps are perfect for zombifying characters by shrinking the skin in order to highlight the shape of the bones underneath. The following image shows how a "healthy" soldier can be transformed into a corpse using a shader and an extrusion map. Compared to the previous example, you can notice how the clothing is unaffected. The shader used in the following image also darkens the extruded regions in order to give an even more emaciated look to the soldier:

Implementing a snow shader

The simulation of snow has always been a challenge in games. The vast majority of games simply bake snow directly into the models' textures so that their tops look white. However, what if one of these objects starts rotating? Snow is not just a lick of paint on a surface; it is a proper accumulation of material and it should be treated as such. This recipe will show how to give a snowy look to your models using just a shader.

This effect is achieved in two steps. First, a white colour is used for all the triangles facing the sky. Second, their vertices are extruded to simulate the effect of snow accumulation. You can see the result in the following image:

Keep in mind that this recipe does not aim to create a photorealistic snow effect. It provides a good starting point; however, it is up to an artist to create the right textures and find the right parameters to make it fit your game.

Getting ready

This effect is purely based on shaders. We will need to do the following:

1. Create a new shader for the snow effect.
2. Create a new material for the shader.
3. Assign the newly created material to the object that you want to be snowy.
How to do it…

To create a snowy effect, open your shader and make the following changes:

1. Replace the properties of the shader with the following ones:

    _MainColor("Main Color", Color) = (1.0,1.0,1.0,1.0)
    _MainTex("Base (RGB)", 2D) = "white" {}
    _Bump("Bump", 2D) = "bump" {}
    _Snow("Level of snow", Range(1, -1)) = 1
    _SnowColor("Color of snow", Color) = (1.0,1.0,1.0,1.0)
    _SnowDirection("Direction of snow", Vector) = (0,1,0)
    _SnowDepth("Depth of snow", Range(0,1)) = 0

2. Complete them with their relative variables, as follows:

    sampler2D _MainTex;
    sampler2D _Bump;
    float _Snow;
    float4 _SnowColor;
    float4 _MainColor;
    float4 _SnowDirection;
    float _SnowDepth;

3. Replace the Input structure with the following:

    struct Input {
        float2 uv_MainTex;
        float2 uv_Bump;
        float3 worldNormal;
        INTERNAL_DATA
    };

4. Replace the surface function with the following one. It will color the snowy parts of the model white:

    void surf(Input IN, inout SurfaceOutputStandard o) {
        half4 c = tex2D(_MainTex, IN.uv_MainTex);
        o.Normal = UnpackNormal(tex2D(_Bump, IN.uv_Bump));
        if (dot(WorldNormalVector(IN, o.Normal), _SnowDirection.xyz) >= _Snow)
            o.Albedo = _SnowColor.rgb;
        else
            o.Albedo = c.rgb * _MainColor.rgb;
        o.Alpha = 1;
    }

5. Configure the pragma directive so that it uses a vertex modifier, as follows:

    #pragma surface surf Standard vertex:vert

6. Add the following vertex modifier, which extrudes the vertices covered in snow:

    void vert(inout appdata_full v) {
        float4 sn = mul(UNITY_MATRIX_IT_MV, _SnowDirection);
        if (dot(v.normal, sn.xyz) >= _Snow)
            v.vertex.xyz += (sn.xyz + v.normal) * _SnowDepth * _Snow;
    }

You can now use the material's Inspector to select how much of your model is going to be covered and how thick the snow should be.

How it works…

This shader works in two steps.

Coloring the surface

The first step alters the color of the triangles that are facing the sky. It affects all the triangles with a normal direction similar to _SnowDirection. Comparing unit vectors can be done using the dot product. When two vectors are orthogonal, their dot product is zero; it is one (or minus one) when they are parallel to each other. The _Snow property is used to decide how aligned they should be in order to be considered facing the sky.

If you look closely at the surface function, you can see that we are not directly dotting the normal and the snow direction. This is because they are usually defined in different spaces. The snow direction is expressed in world coordinates, while the object normals are usually relative to the model itself. If we rotate the model, its normals will not change, which is not what we want. To fix this, we need to convert the normals from their object coordinates to world coordinates. This is done with the WorldNormalVector() function, as follows:

    if (dot(WorldNormalVector(IN, o.Normal), _SnowDirection.xyz) >= _Snow)
        o.Albedo = _SnowColor.rgb;
    else
        o.Albedo = c.rgb * _MainColor.rgb;

This shader simply colors the model white; a more advanced one should initialize the SurfaceOutputStandard structure with textures and parameters from a realistic snow material.

Altering the geometry

The second effect of this shader alters the geometry to simulate the accumulation of snow. Firstly, we identify the triangles that have been coloured white by testing the same condition used in the surface function. This time, unfortunately, we cannot rely on WorldNormalVector(), as the SurfaceOutputStandard structure is not yet initialized in the vertex modifier. We will use this other method instead, which converts _SnowDirection into object coordinates, as follows:

    float4 sn = mul(UNITY_MATRIX_IT_MV, _SnowDirection);

Then, we can extrude the geometry to simulate the accumulation of snow, as shown in the following:

    if (dot(v.normal, sn.xyz) >= _Snow)
        v.vertex.xyz += (sn.xyz + v.normal) * _SnowDepth * _Snow;

Once again, this is a very basic effect. One could use a texture map to control the accumulation of snow more precisely or to give it a peculiar, uneven look.
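The arithmetic of the vertex modifier can be followed outside the shader. The following is a minimal sketch in plain Python (not Cg, and not part of the recipe), assuming that the normal and the snow direction are unit vectors that have already been brought into the same coordinate space, which is the job of the matrix multiplication in the shader. The names dot and snow_vertex are hypothetical helpers introduced for illustration.

```python
def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def snow_vertex(position, normal, snow_dir, snow_level, snow_depth):
    """Displace a vertex when its normal is aligned enough with the snow.

    Mirrors the shader's test and offset:
    if dot(normal, snow_dir) >= snow_level, move the vertex by
    (snow_dir + normal) * snow_depth * snow_level.
    """
    if dot(normal, snow_dir) >= snow_level:
        offset = tuple((s + n) * snow_depth * snow_level
                       for s, n in zip(snow_dir, normal))
        return tuple(p + o for p, o in zip(position, offset))
    return position


up = (0.0, 1.0, 0.0)  # snow falls from straight above

# A vertex on top of the model: its normal points up, so it is covered.
top = snow_vertex((0.0, 1.0, 0.0), (0.0, 1.0, 0.0), up, 0.5, 0.1)

# A vertex on the side: its normal is orthogonal to the snow direction.
side = snow_vertex((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), up, 0.5, 0.1)
```

With a snow level of 0.5 and a depth of 0.1, the top vertex is raised by (1 + 1) * 0.1 * 0.5 = 0.1 units, while the side vertex fails the dot product test and stays where it is.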
See also

If you need high-quality snow effects and props for your game, you can also check the following resources in the Unity Asset Store:

Winter Suite ($30): A much more sophisticated version of the snow shader presented in this recipe can be found at: https://www.assetstore.unity3d.com/en/#!/content/13927
Winter Pack ($60): A very realistic set of props and materials for snowy environments can be found at: https://www.assetstore.unity3d.com/en/#!/content/13316

Implementing a volumetric explosion

The art of game development is a clever trade-off between realism and efficiency. This is particularly true for explosions; they are at the heart of many games, yet the physics behind them is often beyond the computational power of modern machines. Explosions are essentially nothing more than hot balls of gas; hence, the only way to correctly simulate them is by integrating a fluid simulation in your game. As you can imagine, this is infeasible for runtime applications and many games simply simulate them with particles. When an object explodes, it is common to simply instantiate many fire, smoke, and debris particles that together can produce a believable result. This approach, unfortunately, is not very realistic and is easy to spot. There is an intermediate technique that can be used to achieve a much more realistic effect: volumetric explosions. The idea behind this concept is that the explosions are not treated like a bunch of particles anymore; they are evolving three-dimensional objects and not just flat two-dimensional textures.

Getting ready

Start this recipe with the following steps:

Create a new shader for this effect.
Create a new material to host the shader.
Attach the material to a sphere. You can create one directly from the editor by navigating to GameObject | 3D Object | Sphere.

This recipe works well with the standard Unity Sphere; however, if you need big explosions, you might need to use a higher-poly sphere.
In fact, a vertex function can only modify the vertices of a mesh. All the other points will be interpolated using the positions of the nearby vertices. Fewer vertices mean lower resolution for your explosions.

For this recipe, you will also need a ramp texture that has, in a gradient, all the colors that your explosions will have. You can create such a texture using GIMP or Photoshop. Once you have the picture, import it to Unity. Then, from its Inspector, make sure the Filter Mode is set to Bilinear and the Wrap Mode to Clamp. These two settings make sure that the ramp texture is sampled smoothly.

Lastly, you will need a noise texture. You can find many of them on the Internet as freely available noise textures. The most commonly used ones are generated using Perlin noise.

How to do it…

This effect works in two steps: a vertex function to change the geometry and a surface function to give it the right color. The steps are as follows:

Add the following properties for the shader:

    _RampTex("Color Ramp", 2D) = "white" {}
    _RampOffset("Ramp offset", Range(-0.5,0.5))= 0
    _NoiseTex("Noise tex", 2D) = "gray" {}
    _Period("Period", Range(0,1)) = 0.5
    _Amount("_Amount", Range(0, 1.0)) = 0.1
    _ClipRange("ClipRange", Range(0,1)) = 1

Add their corresponding variables so that the Cg code of the shader can actually access them, as follows:

    sampler2D _RampTex;
    float _RampOffset;
    sampler2D _NoiseTex;
    float _Period;
    float _Amount;
    float _ClipRange;

Change the Input structure so that it receives the UV data of the noise texture, as shown in the following:

    struct Input {
        float2 uv_NoiseTex;
    };

Add the following vertex function:

    void vert(inout appdata_full v) {
        float3 disp = tex2Dlod(_NoiseTex, float4(v.texcoord.xy,0,0));
        float time = sin(_Time[3] *_Period + disp.r*10);
        v.vertex.xyz += v.normal * disp.r * _Amount * time;
    }

Add the following surface function:

    void surf(Input IN, inout SurfaceOutput o) {
        float3 noise = tex2D(_NoiseTex, IN.uv_NoiseTex);
        float n = saturate(noise.r + _RampOffset);
        clip(_ClipRange - n);
        half4 c = tex2D(_RampTex, float2(n,0.5));
        o.Albedo = c.rgb;
        o.Emission = c.rgb*c.a;
    }

Specify the vertex function in the pragma directive, adding the nolightmap parameter to prevent Unity from adding realistic lighting to our explosion, as follows:

    #pragma surface surf Lambert vertex:vert nolightmap

The last step is to select the material and attach the two textures to the corresponding slots in its Inspector.

This is an animated material, meaning that it evolves over time. You can watch the material changing in the editor by clicking on Animated Materials from the Scene window.

How it works…

If you are reading this recipe, you should already be familiar with how surface shaders and vertex modifiers work. The main idea behind this effect is to alter the geometry of the sphere in a seemingly chaotic way, exactly like it happens in a real explosion. The image accompanying this recipe shows how such an explosion looks in the editor; the original mesh has been heavily deformed.

The vertex function is a variant of the technique called normal extrusion. The difference here is that the amount of the extrusion is determined by both the time and the noise texture. When you need a random number in Unity, you can rely on the Random.Range() function. There is no standard way to get random numbers within a shader, so the easiest approach is to sample a noise texture. There is no single right way to do this either, so take the following only as an example:

    float time = sin(_Time[3] *_Period + disp.r*10);

The built-in _Time[3] variable is used to get the current time from the shader, and the red channel of the noise texture, disp.r, is used to make sure that each vertex moves independently.
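To make the time term above concrete, here is a small CPU-side sketch in Python rather than Cg. It is only an illustration under stated assumptions: the function names (displacement, displace) are hypothetical, the noise value is passed in as a plain number instead of being sampled from a texture, and shader_time stands for whatever value `_Time[3]` holds at that frame.

```python
import math

def displacement(noise_r, shader_time, period, amount):
    """Signed distance a vertex travels along its normal.

    Follows the recipe's arithmetic:
    sin(_Time[3] * _Period + disp.r * 10) * disp.r * _Amount
    """
    wave = math.sin(shader_time * period + noise_r * 10.0)
    return noise_r * amount * wave

def displace(vertex, normal, noise_r, shader_time, period=0.5, amount=0.1):
    """Normal extrusion: move the vertex along its normal by the displacement."""
    offset = displacement(noise_r, shader_time, period, amount)
    return tuple(p + n * offset for p, n in zip(vertex, normal))

# Two vertices sampled at the same instant but with different noise values
# end up at different phases of the sine wave, so they move independently.
for noise_r in (0.2, 0.8):
    print(noise_r, displace((0.0, 1.0, 0.0), (0.0, 1.0, 0.0), noise_r, shader_time=1.5))
```

The noise value plays two roles: multiplied by 10 it shifts the phase of the sine wave, and on its own it scales how far the vertex can travel, so the result never exceeds the noise value times the amount. As far as I recall, Unity documents `_Time` as (t/20, t, 2t, 3t), which would make `_Time[3]` advance three times faster than the elapsed seconds; the sketch sidesteps this by taking the value as an input.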
The sin() function makes the vertices go up and down, simulating the chaotic behavior of an explosion. Then, the normal extrusion takes place as shown in the following:

    v.vertex.xyz += v.normal * disp.r * _Amount * time;

You should play with these numbers and variables until you find a pattern of movement that you are happy with.

The last part of the effect is achieved by the surface function. Here, the noise texture is used to sample a random color from the ramp texture. However, there are two more aspects that are worth noticing. The first one is the introduction of _RampOffset. Its usage forces the explosion to sample colors from the left or right side of the texture. With positive values, the surface of the explosion tends to show more gray tones, which is exactly what happens when it is dissolving. You can use _RampOffset to determine how much fire or smoke should be there in your explosion. The second aspect introduced in the surface function is the use of clip(). The clip() function clips (removes) pixels from the rendering pipeline. When invoked with a negative value, the current pixel is not drawn. This effect is controlled by _ClipRange, which determines the pixels of the volumetric explosions that are going to be transparent. By controlling both _RampOffset and _ClipRange, you have full control to determine how the explosion behaves and dissolves.

There's more…

The shader presented in this recipe makes a sphere look like an explosion. If you really want to use it, you should couple it with some scripts in order to get the most out of it. The best thing to do is to create an explosion object and turn it into a prefab so that you can reuse it every time you need. You can do this by dragging the sphere back into the Project window. Once it is done, you can create as many explosions as you want using the Instantiate() function. However, it is worth noticing that all the objects with the same material share the same look.
If you have multiple explosions at the same time, they should not use the same material. When you are instantiating a new explosion, you should also duplicate its material. You can do this easily with the following piece of code:

    GameObject explosion = Instantiate(explosionPrefab) as GameObject;
    Renderer renderer = explosion.GetComponent<Renderer>();
    Material material = new Material(renderer.sharedMaterial);
    renderer.material = material;

Lastly, if you are going to use this shader in a realistic way, you should attach a script to it that changes its size, _RampOffset, or _ClipRange according to the type of explosion you want to recreate.

See also

A lot more can be done to make explosions realistic. The approach presented in this recipe only creates an empty shell; the explosion in it is actually empty. An easy trick to improve it is to create particles in it. However, you can only go so far with this. The short movie, The Butterfly Effect (http://unity3d.com/pages/butterfly), created by Unity Technologies in collaboration with Passion Pictures and Nvidia, is the perfect example. It is based on the same concept of altering the geometry of a sphere; however, it renders it with a technique called volume ray casting. In a nutshell, it renders the geometry as if it's complete.

If you are looking for high-quality explosions, refer to Pyro Technix (https://www.assetstore.unity3d.com/en/#!/content/16925) on the Asset Store. It includes volumetric explosions and couples them with realistic shockwaves.

Summary

In this article, we saw the recipes to extrude models and implement a snow shader and a volumetric explosion.

Resources for Article:

Further resources on this subject:
Lights and Effects [article]
Looking Back, Looking Forward [article]
Animation features in Unity 5 [article]

Packt
18 Sep 2015
2 min read

Overview of Unreal Engine 4

In this article by Katax Emperor and Devin Sherry, authors of the book Unreal Engine Physics Essentials, we will discuss and evaluate the basic 3D physics and mathematics concepts in an effort to gain a basic understanding of Unreal Engine 4 physics and real-world physics. To start with, we will discuss the units of measurement, what they are, and how they are used in Unreal Engine 4. In addition, we will cover the following topics:

The scientific notation
2D and 3D coordinate systems
Scalars and vectors
Newton's laws or Newtonian physics concepts
Forces and energy

For the purpose of this chapter, we will want to open Unreal Engine 4 and create a simple project using the First Person template by following these steps.

Launching Unreal Engine 4

When we first open Unreal Engine 4, we will see the Unreal Engine Launcher, which contains a News tab, a Learn tab, a Marketplace tab, and a Library tab. As its title suggests, the News tab provides you with the latest news from Epic Games, ranging from Marketplace Content releases to Unreal Dev Grant winners, Twitch Stream Recaps, and so on. The Learn tab provides you with numerous resources to learn more about Unreal Engine 4, such as Written Documentation, Video Tutorials, Community Wikis, Sample Game Projects, and Community Contributions. The Marketplace tab allows you to purchase content, such as FX, Weapons Packs, Blueprint Scripts, Environmental Assets, and so on, from the community and Epic Games. Lastly, the Library tab is where you can download the newest versions of Unreal Engine 4, open previously created projects, and manage your project files.

Let's start by launching the Unreal Engine Launcher and choosing Launch from the Library tab. For the sake of consistency, we will use the latest version of the editor. At the time of writing this book, the version is 4.7.6.
Next, we will select the New Project tab that appears at the top of the window, select the First Person project template with Starter Content, and name the project Unreal_PhyProject.

Summary

In this article, we had an overview of Unreal Engine 4 and saw how to launch it.

Resources for Article:

Further resources on this subject:
Exploring and Interacting with Materials using Blueprints [article]
Unreal Development Toolkit: Level Design HQ [article]
Configuration and Handy Tweaks for UDK [article]
Fatema Patrawala
21 Aug 2019
6 min read

Bitbucket to no longer support Mercurial, users must migrate to Git by May 2020

Yesterday marked the end of an era for Mercurial users, as Bitbucket announced that it will no longer support Mercurial repositories after May 2020. Bitbucket, owned by Atlassian, is a web-based version control repository hosting service for source code and development projects. It has supported Mercurial since its beginning in 2008 and Git since October 2011. Now, after almost ten years of sharing its journey with Mercurial, the Bitbucket team has decided to remove Mercurial support from Bitbucket Cloud and its API. The official announcement reads, “Mercurial features and repositories will be officially removed from Bitbucket and its API on June 1, 2020.”

The Bitbucket team also communicated the timeline for sunsetting the Mercurial functionality. After February 1, 2020, users will no longer be able to create new Mercurial repositories. From June 1, 2020, users will not be able to use Mercurial features in Bitbucket or via its API, and all Mercurial repositories will be removed. All current Mercurial functionality in Bitbucket will remain available through May 31, 2020.

The team said the decision was not an easy one for them and that Mercurial held a special place in their heart. But according to a Stack Overflow Developer Survey, almost 90% of developers use Git, while Mercurial is the least popular version control system with only about 3% developer adoption. In addition, Mercurial usage on Bitbucket saw a steady decline, and the percentage of new Bitbucket users choosing Mercurial fell to less than 1%. Hence, they decided to remove the Mercurial repos.

How can users migrate and export their Mercurial repos

The Bitbucket team recommends that users migrate their existing Mercurial repos to Git. They have also extended support for migration, and kept the available options open for discussion in their dedicated Community thread. There, users can discuss conversion tools, migration, and tips, and also offer troubleshooting help.
If users prefer to continue using Mercurial, there are a number of free and paid Mercurial hosting services available to them. The Bitbucket team has also created a Git tutorial that covers everything from the basics of creating pull requests to rebasing and Git hooks.

Community shows anger and sadness over decision to discontinue Mercurial support

Mercurial users are outraged, unhappy, and saddened by Bitbucket's decision, and they have expressed their anger across multiple forums and community discussions. Users feel that Bitbucket's decision to stop offering Mercurial support is bad, but that the decision to also delete the repos is evil.

On Hacker News, users speculated that this decision was driven by market potential rather than by technically superior architecture and ease of use. They feel that GitHub has successfully marketed Git, and that this is how the two have become synonymous in the developer community. One of them comments, “It's very sad to see bitbucket dropping mercurial support. Now only Facebook and volunteers are keeping mercurial alive. Sometimes technically better architecture and user interface lose to a non user friendly hard solutions due to inertia of mass adoption. So a lesson in Software development is similar to betamax and VHS, so marketing is still a winner over technically superior architecture and ease of use. GitHub successfully marketed git, so git and GitHub are synonymous for most developers. Now majority of open source projects are reliant on a single proprietary solution Github by Microsoft, for managing code and project. Can understand the difficulty of bitbucket, when Python language itself moved out of mercurial due to the same inertia. Hopefully gitlab can come out with mercurial support to migrate projects using it from bitbucket.”

Another user comments that Mercurial support was the only reason for them to use Bitbucket, given that GitHub is miles ahead of it.
Now that it is dropping Mercurial support too, this user expects Bitbucket to die. The comment reads, “Mercurial support was the one reason for me to still use Bitbucket: there is no other Bitbucket feature I can think of that Github doesn't already have, while Github's community is miles ahead since everyone and their dog is already there. More importantly, Bitbucket leaves the migration to you (if I read the article correctly). Once I download my repo and convert it to git, why would I stay with the company that just made me go through an annoying (and often painful) process, when I can migrate to Github with the exact same command? And why isn't there a "migrate this repo to git" button right there? I want to believe that Bitbucket has smart people and that this choice is a good one. But I'm with you there - to me, this definitely looks like Bitbucket will die.”

On Reddit, programmers see this as a big change, since Bitbucket is the major Mercurial hosting provider. They also feel that Bitbucket announced it at fairly short notice and that they need more time to migrate.

Beyond the developer community forums, users have also expressed displeasure on the Atlassian community blog. A team of scientists commented, “Let's get this straight : Bitbucket (offering hosting support for Mercurial projects) was acquired by Atlassian in September 2010. Nine years later Atlassian decides to drop Mercurial support and delete all Mercurial repositories. Atlassian, I hate you :-) The image you have for me is that of a harmful predator. We are a team of scientists working in a university. We don't have computer scientists, we managed to use a version control simple as Mercurial, and it was a hard work to make all scientists in our team to use a version control system (even as simple as Mercurial). We don't have the time nor the energy to switch to another version control system. But we will, forced and obliged.
I really don't want to check out Github or something else to migrate our projects there, but we will, forced and obliged.”

Atlassian Bitbucket, GitHub, and GitLab take collective steps against the Git ransomware attack
Attackers wiped many GitHub, GitLab, and Bitbucket repos with ‘compromised’ valid credentials leaving behind a ransom note
BitBucket goes down for over an hour

Packt
04 Jun 2015
33 min read

Getting Started with Multiplayer Game Programming

This article is by Rodrigo Silveira, author of the book Multiplayer gaming with HTML5 game development. If you're reading this, chances are pretty good that you are already a game developer. That being the case, you already know just how exciting it is to program your own games, either professionally or as a highly gratifying, if very time-consuming, hobby. Now you're ready to take your game programming skills to the next level; that is, you're ready to implement multiplayer functionality in your JavaScript-based games.

If you have already set out to create multiplayer games for the Open Web Platform using HTML5 and JavaScript, then you may have already come to realize that a personal desktop computer, laptop, or mobile device is not the most appropriate device to share with another human player for games in which two or more players share the same game world at the same time. Therefore, what is needed in order to create exciting multiplayer games with JavaScript is some form of networking technology.

We will be discussing the following principles and concepts:

The basics of networking and network programming paradigms
Socket programming with HTML5
Programming a game server and game clients
Turn-based multiplayer games

Understanding the basics of networking

It is said that one cannot program games that make use of networking without first understanding all about the discipline of computer networking and network programming. Although having a deep understanding of any topic can only be beneficial to the person working on that topic, I don't believe that you must know everything there is to know about game networking in order to program some pretty fun and engaging multiplayer games. Saying that is the case is like saying that one needs to be a scholar of the Spanish language in order to cook a simple burrito.
Thus, let us take a look at the most basic and fundamental concepts of networking. After you finish reading this article, you will know enough about computer networking to get started, and you will feel comfortable adding multiplayer aspects to your games. One thing to keep in mind is that, even though networked games are not nearly as old as single-player games, computer networking is actually a very old and well-studied subject. Some of the earliest computer network systems date back to the 1950s. Though some of the techniques have improved over the years, the basic idea remains the same: two or more computers are connected together to establish communication between the machines. By communication, I mean data exchange, such as sending messages back and forth between the machines, or one of the machines only sends the data and the other only receives it. With this brief introduction to the concept of networking, you are now grounded in the subject of networking, enough to know what is required to network your games—two or more computers that talk to each other as close to real time as possible. By now, it should be clear how this simple concept makes it possible for us to connect multiple players into the same game world. In essence, we need a way to share the global game data among all the players who are connected to the game session, then continue to update each player about every other player. There are several different techniques that are commonly used to achieve this, but the two most common approaches are peer-to-peer and client-server. Both techniques present different opportunities, including advantages and disadvantages. In general, neither is particularly better than the other, but different situations and use cases may be better suited for one or the other technique. Peer-to-peer networking A simple way to connect players into the same virtual game world is through the peer-to-peer architecture. 
Although the name might suggest that only two peers ("nodes") are involved, by definition a peer-to-peer network system is one in which two or more nodes are connected directly to each other without a centralized system orchestrating the connection or information exchange. On a typical peer-to-peer setup, each peer serves the same function as every other one—that is, they all consume the same data and share whatever data they produce so that others can stay synchronized. In the case of a peer-to-peer game, we can illustrate this architecture with a simple game of Tic-tac-toe. Once both the players have established a connection between themselves, whoever is starting the game makes a move by marking a cell on the game board. This information is relayed across the wire to the other peer, who is now aware of the decision made by his or her opponent, and can thus update their own game world. Once the second player receives the game's latest state that results from the first player's latest move, the second player is able to make a move of their own by checking some available space on the board. This information is then copied over to the first player who can update their own world and continue the process by making the next desired move. The process goes on until one of the peers disconnects or the game ends as some condition that is based on the game's own business logic is met. In the case of the game of Tic-tac-toe, the game would end once one of the players has marked three spaces on the board forming a straight line or if all nine cells are filled, but neither player managed to connect three cells in a straight path. Some of the benefits of peer-to-peer networked games are as follows: Fast data transmission: Here, the data goes directly to its intended target. In other architectures, the data could go to some centralized node first, then the central node (or the "server") contacts the other peer, sending the necessary updates. 
Simpler setup: You would only need to think about one instance of your game that, generally speaking, handles its own input, sends its input to other connected peers, and handles their output as input for its own system. This can be especially handy in turn-based games, for example, most board games such as Tic-tac-toe. More reliability: Here one peer that goes offline typically won't affect any of the other peers. However, in the simple case of a two-player game, if one of the players is unable to continue, the game will likely cease to be playable. Imagine, though, that the game in question has dozens or hundreds of connected peers. If a handful of them suddenly lose their Internet connection, the others can continue to play. However, if there is a server that is connecting all the nodes and the server goes down, then none of the other players will know how to talk to each other, and nobody will know what is going on. On the other hand, some of the more obvious drawbacks of peer-to-peer architecture are as follows: Incoming data cannot be trusted: Here, you don't know for sure whether or not the sender modified the data. The data that is input into a game server will also suffer from the same challenge, but once the data is validated and broadcasted to all the other peers, you can be more confident that the data received by each peer from the server will have at least been sanitized and verified, and will be more credible. Fault tolerance can be very low: If enough players share the game world, one or more crashes won't make the game unplayable to the rest of the peers. Now, if we consider the many cases where any of the players that suddenly crash out of the game negatively affect the rest of the players, we can see how a server could easily recover from the crash. Data duplication when broadcasting to other peers: Imagine that your game is a simple 2D side scroller, and many other players are sharing that game world with you. 
Every time one of the players moves to the right, you receive the new (x, y) coordinates from that player, and you're able to update your own game world. Now, imagine that you move your player to the right by just a few pixels; you would have to send that data out to all of the other nodes in the system.

Overall, peer-to-peer is a very powerful networking architecture and is still widely used by many games in the industry. Since current peer-to-peer web technologies are still in their infancy, most JavaScript-powered games today do not make use of peer-to-peer networking. For this and other reasons that should become apparent soon, we will focus almost exclusively on the other popular networking paradigm, namely, the client-server architecture.

Client-server networking

The idea behind the client-server networking architecture is very simple. If you squint your eyes hard enough, you can almost see a peer-to-peer graph. The most obvious difference between them is that, instead of every node being an equal peer, one of the nodes is special. That is, instead of every node connecting to every other node, every node (client) connects to a main centralized node called the server.

While the concept of a client-server network seems clear enough, perhaps a simple metaphor might make it easier for you to understand the role of each type of node in this network format as well as differentiate it from peer-to-peer. In a peer-to-peer network, you can think of it as a group of friends (peers) having a conversation at a party. They all have access to all the other peers involved in the conversation and can talk to them directly. On the other hand, a client-server network can be viewed as a group of friends having dinner at a restaurant.
If a client of the restaurant wishes to order a certain item from the menu, he or she must talk to the waiter, who is the only person in that group of people with access to the desired products and the ability to serve the products to the clients. In short, the server is in charge of providing data and services to one or more clients. In the context of game development, the most common scenario is when two or more clients connect to the same server; the server will keep track of the game as well as the distributed players. Thus, if two players are to exchange information that is only pertinent to the two of them, the communication will go from the first player to and through the server and will end up at the other end with the second player. Following the example of the two players involved in a game of Tic-tac-toe, we can see how similar the flow of events is on a client-server model. Again, the main difference is that players are unaware of each other and only know what the server tells them. While you can very easily mimic a peer-to-peer model by using a server to merely connect the two players, most often the server is used much more actively than that. There are two ways to engage the server in a networked game, namely in an authoritative and a non-authoritative way. That is to say, you can have the enforcement of the game's logic strictly in the server, or you can have the clients handle the game logic, input validation, and so on. Today, most games using the client-server architecture actually use a hybrid of the two (authoritative and non-authoritative servers). For all intents and purposes, however, the server's purpose in life is to receive input from each of the clients and distribute that input throughout the pool of connected clients. 
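The flow described above can be sketched without any real networking. The following Python snippet is a minimal in-memory stand-in, assuming a hypothetical TicTacToeServer class whose "inboxes" play the role of the messages each client would receive over the wire; none of these names come from the article or from any library. With authoritative=True the server enforces the rules itself, and with authoritative=False it merely relays whatever it is sent.

```python
class TicTacToeServer:
    """In-memory stand-in for a game server; no sockets are involved."""

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),
                 (0, 4, 8), (2, 4, 6)]

    def __init__(self, authoritative=True):
        self.authoritative = authoritative
        self.board = [None] * 9
        self.turn = "X"
        self.inboxes = {"X": [], "O": []}  # what each client has been told

    def receive(self, player, cell):
        """Handle a move sent by one client, then relay it to every client."""
        if self.authoritative:
            if not self._is_legal(player, cell):
                # Only the sender hears about the rejection.
                self.inboxes[player].append(("rejected", player, cell))
                return False
            self.board[cell] = player
            self.turn = "O" if player == "X" else "X"
        # Clients never talk to each other; they only know what the server says.
        for inbox in self.inboxes.values():
            inbox.append(("move", player, cell))
        return True

    def _is_legal(self, player, cell):
        return (player == self.turn and 0 <= cell < 9
                and self.board[cell] is None and self.winner() is None)

    def winner(self):
        """Return "X" or "O" if a line is complete, otherwise None."""
        for a, b, c in self.WIN_LINES:
            if self.board[a] is not None and self.board[a] == self.board[b] == self.board[c]:
                return self.board[a]
        return None

server = TicTacToeServer(authoritative=True)
server.receive("X", 4)  # accepted and relayed to both clients
server.receive("X", 0)  # rejected: it is O's turn
server.receive("O", 4)  # rejected: the cell is already taken
server.receive("O", 0)  # accepted
print(server.inboxes["O"])
```

In the non-authoritative mode the same class forwards every message untouched, which leaves input validation to the clients; that is the trade-off between the two server styles mentioned above.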
Now, regardless of whether you decide to go with an authoritative server instead of a non-authoritative one, you will notice that one of the challenges with a client-server game is that you will need to program both ends of the stack. You will have to do this even if your clients do nothing more than take input from the user, forward it to the server, and render whatever data they receive from the server; if your game server does nothing more than forward the input that it receives from each client to every other client, you will still need to write a game client and a game server. We will discuss game clients and servers later. For now, all we really need to know is that these two components are what set this networking model apart from peer-to-peer.

Some of the benefits of client-server networked games are as follows:

Separation of concerns: If you know anything about software development, you know that this is something you should always aim for. That is, good, maintainable software is written as discrete components where each does one "thing", and does it well. Writing individual specialized components lets you focus on performing one individual task at a time, making your game easier to design, code, test, reason about, and maintain.

Centralization: While this can be argued against as well as in favor of, having one central place through which all communication must flow makes it easier to manage such communication, enforce any required rules, control access, and so forth.

Less work for the client: Instead of having a client (peer) in charge of taking input from the user as well as other peers, validating all the input, sharing data among other peers, rendering the game, and so on, the client can focus on only doing a few of these things, offloading some of this work to the server. This is particularly handy when we talk about mobile gaming, and how much subtle divisions of labor can impact the overall player experience.
For example, imagine a game where 10 players are engaged in the same game world. In a peer-to-peer setup, every time one player takes an action, he or she would need to send that action to nine other players (in other words, there would need to be nine network calls, boiling down to more mobile data usage). On the other hand, on a client-server configuration, one player would only need to send his or her action to one of the peers, that is, the server, who would then be responsible for sending that data to the remaining nine players. Common drawbacks of client-server architectures, whether or not the server is authoritative, are as follows: Communication takes longer to propagate: In the very best possible scenario imaginable, every message sent from the first player to the second player would take twice as long to be delivered as compared to a peer-to-peer connection. That is, the message would be first sent from the first player to the server and then from the server to the second player. There are many techniques that are used today to solve the latency problem faced in this scenario, some of which we will discuss in much more depth later. However, the underlying dilemma will always be there. More complexity due to more moving parts: It doesn't really matter how you slice the pizza; the more code you need to write (and trust me, when you build two separate modules for a game, you will write more code), the greater your mental model will have to be. While much of your code can be reused between the client and the server (especially if you use well-established programming techniques, such as object-oriented programming), at the end of the day, you need to manage a greater level of complexity. Single point of failure and network congestion: Up until now, we have mostly discussed the case where only a handful of players participates in the same game. However, the more common case is that a handful of groups of players play different games at the same time. 
Using the same example of the two-player game of Tic-tac-toe, imagine that there are thousands of players facing each other in single games. In a peer-to-peer setup, once a couple of players have directly paired off, it is as though there are no other players enjoying that game. The only thing to keep these two players from continuing their game is their own connection with each other. On the other hand, if the same thousands of players are connected to each other through a server sitting between the two, then two singled out players might notice severe delays between messages because the server is so busy handling all of the messages from and to all of the other people playing isolated games. Worse yet, these two players now need to worry about maintaining their own connection with each other through the server, but they also hope that the server's connection between them and their opponent will remain active. All in all, many of the challenges involved in client-server networking are well studied and understood, and many of the problems you're likely to face during your multiplayer game development will already have been solved by someone else. Client-server is a very popular and powerful game networking model, and the required technology for it, which is available to us through HTML5 and JavaScript, is well developed and widely supported. Networking protocols – UDP and TCP By discussing some of the ways in which your players can talk to each other across some form of network, we have yet only skimmed over how that communication is actually done. Let us then describe what protocols are and how they apply to networking and, more importantly, multiplayer game development. The word protocol can be defined as a set of conventions or a detailed plan of a procedure [Citation [Def. 3,4]. (n.d.). In Merriam Webster Online, Retrieved February 12, 2015, from http://www.merriam-webster.com/dictionary/protocol]. 
In computer networking, a protocol describes to the receiver of a message how the data is organized so that it can be decoded. For example, imagine that you have a multiplayer beat 'em up game, and you want to tell the game server that your player just issued a kick command and moved 3 units to the left. What exactly do you send to the server? Do you send a string with a value of "kick", followed by the number 3? Otherwise, do you send the number first, followed by a capitalized letter "K", indicating that the action taken was a kick? The point I'm trying to make is that, without a well-understood and agreed-upon protocol, it is impossible to successfully and predictably communicate with another computer. The two networking protocols that we'll discuss in the section, and that are also the two most widely used protocols in multiplayer networked games, are the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP). Both protocols provide communication services between clients in a network system. In simple terms, they are protocols that allow us to send and receive packets of data in such a way that the data can be identified and interpreted in a predictable way. When data is sent through TCP, the application running in the source machine first establishes a connection with the destination machine. Once a connection has been established, data is transmitted in packets in such a way that the receiving application can then put the data back together in the appropriate order. TCP also provides built-in error checking mechanisms so that, if a packet is lost, the target application can notify the sender application, and any missing packets are sent again until the entire message is received. In short, TCP is a connection-based protocol that guarantees the delivery of the full data in the correct order. Use cases where this behavior is desirable are all around us. 
When you download a game from a web server, for example, you want to make sure that the data comes in correctly. You want to be sure that your game assets will be properly and completely downloaded before your users start playing your game. While this guarantee of delivery may sound very reassuring, it also makes for a comparatively slow process, and, as we'll see shortly, speed may sometimes be more important than knowing that the data will arrive in full. In contrast, UDP transmits packets of data (called datagrams) without the use of a pre-established connection. The main goal of the protocol is to be a very fast and frictionless way of sending data towards some target application. In essence, you can think of UDP as the brave employees who dress up as their company's mascot and stand outside their store waving a large banner in the hope that at least some of the people driving by will see them and give them their business. While at first UDP may seem like a reckless protocol, the use cases that make UDP so desirable and effective include the many situations in which you care more about speed than about occasionally missing packets, getting duplicate packets, or getting them out of order. You may also want to choose UDP over TCP when you don't care about the reply from the receiver. With TCP, whether or not you need some form of confirmation or reply from the receiver of your message, the receiver will still take the time to reply to you, at least acknowledging that the message was received. Sometimes, you may not care whether or not the server received the data. A more concrete example of a scenario where UDP is a far better choice than TCP is when you need a heartbeat from the client letting the server know whether the player is still there. If you need to let your server know that the session is still active every so often, and you don't care if one of the heartbeats gets lost every now and again, then it would be wise to use UDP.
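To see why a lost heartbeat is harmless, here is a minimal sketch of the bookkeeping a server might do. The transport is deliberately left out: the names createHeartbeatTracker, beat, and isAlive are hypothetical, and the assumption is that whatever code receives a datagram on the server calls beat with the player's ID and the current time in milliseconds.

```javascript
// Hypothetical server-side bookkeeping for heartbeats. It only remembers the
// time of the most recent heartbeat per player, so a lost one leaves no gap to fill.
function createHeartbeatTracker(timeoutMs) {
  const lastSeen = new Map();
  return {
    // Called whenever a heartbeat arrives for a player.
    beat(playerId, now) {
      lastSeen.set(playerId, now);
    },
    // A player counts as present if a heartbeat arrived within the timeout.
    isAlive(playerId, now) {
      return lastSeen.has(playerId) && now - lastSeen.get(playerId) <= timeoutMs;
    }
  };
}

const tracker = createHeartbeatTracker(3000);
tracker.beat('juju', 0);
// Imagine the heartbeat sent at 1000 ms never arrives.
tracker.beat('juju', 2000);

const aliveAfterOneLoss = tracker.isAlive('juju', 4000); // last beat was 2000 ms ago
const aliveAfterSilence = tracker.isAlive('juju', 6000); // last beat was 4000 ms ago
```

Because only the latest arrival matters, nothing needs to be resent when a heartbeat goes missing, which is exactly the kind of traffic the text above suggests sending over UDP.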
In short, for any data that is not mission-critical and you can afford to lose, UDP might be the best option. In closing, keep in mind that, just as peer-to-peer and client-server models can be built side by side, and in the same way your game server can be a hybrid of authoritative and non-authoritative, there is absolutely no reason why your multiplayer games should only use TCP or UDP. Use whichever protocol a particular situation calls for. Network sockets There is one other protocol that we'll cover very briefly, but only so that you can see the need for network sockets in game development. As a JavaScript programmer, you are doubtlessly familiar with Hypertext Transfer Protocol (HTTP). This is the protocol in the application layer that web browsers use to fetch your games from a Web server. While HTTP is a great protocol to reliably retrieve documents from web servers, it was not designed to be used in real-time games; therefore, it is not ideal for this purpose. The way HTTP works is very simple: a client sends a request to a server, which then returns a response back to the client. The response includes a completion status code, indicating to the client that the request is either in process, needs to be forwarded to another address, or is finished successfully or erroneously. There are a handful of things to note about HTTP that will make it clear that a better protocol is needed for real-time communication between the client and server. Firstly, after each response is received by the requester, the connection is closed. Thus, before making each and every request, a new connection must be established with the server. Most of the time, an HTTP request will be sent through TCP, which, as we've seen, can be slow, relatively speaking. Secondly, HTTP is by design a stateless protocol. This means that, every time you request a resource from a server, the server has no idea who you are and what is the context of the request. 
(It doesn't know whether this is your first request ever or if you're a frequent requester.) A common solution to this problem is to include a unique string with every HTTP request that the server keeps track of, and can thus provide information about each individual client on an ongoing basis. You may recognize this as a standard session. The major downside with this solution, at least with regard to real-time gaming, is that mapping a session cookie to the user's session takes additional time. Finally, the major factor that makes HTTP unsuitable for multiplayer game programming is that the communication is one way—only the client can connect to the server, and the server replies back through the same connection. In other words, the game client can tell the game server that a punch command has been entered by the user, but the game server cannot pass that information along to other clients. Think of it like a vending machine. As a client of the machine, we can request specific items that we wish to buy. We formalize this request by inserting money into the vending machine, and then we press the appropriate button. Under no circumstance will a vending machine issue commands to a person standing nearby. That would be like waiting for a vending machine to dispense food, expecting people to deposit the money inside it afterwards. The answer to this lack of functionality in HTTP is pretty straightforward. A network socket is an endpoint in a connection that allows for two-way communication between the client and the server. Think of it more like a telephone call, rather than a vending machine. During a telephone call, either party can say whatever they want at any given time. Most importantly, the connection between both parties remains open throughout the duration of the conversation, making the communication process highly efficient. WebSocket is a protocol built on top of TCP, allowing web-based applications to have two-way communication with a server. 
The way a WebSocket is created consists of several steps, including a protocol upgrade from HTTP to WebSocket. Thankfully, all of the heavy lifting is done behind the scenes by the browser and JavaScript. For now, the key takeaway here is that with a TCP socket (yes, there are other types of socket, including UDP sockets), we can reliably communicate with a server, and the server can talk back to us as needed.

Socket programming in JavaScript

Let's now bring the conversation about network connections, protocols, and sockets to a close by talking about the tools (JavaScript and WebSockets) that bring everything together, allowing us to program awesome multiplayer games in the language of the open Web.

The WebSocket protocol

Modern browsers and other JavaScript runtime environments expose the WebSocket protocol to JavaScript. Don't make the mistake of thinking that just because we can create WebSocket objects in JavaScript, WebSockets are part of JavaScript. The standard that defines the WebSocket protocol is language-agnostic and can be implemented in any programming language. Thus, before you start to deploy your JavaScript games that make use of WebSockets, ensure that the environment that will run your game provides a WebSocket implementation alongside its JavaScript engine. In other words, not all browsers will know what to do when you ask for a WebSocket connection. For the most part, though, the latest versions, as of this writing, of the most popular browsers today (namely, Google Chrome, Safari, Mozilla Firefox, Opera, and Internet Explorer) implement the current latest revision of RFC 6455. Previous versions of WebSockets (such as protocol versions 76, 7, or 10) are slowly being deprecated and have been removed by some of the previously mentioned browsers. Probably the most confusing thing about the WebSocket protocol is the way each version of the protocol is named.
The very first draft (which dates back to 2010) was named draft-hixie-thewebsocketprotocol-75. The next version was named draft-hixie-thewebsocketprotocol-76. Some people refer to these versions as 75 and 76, which can be quite confusing, especially since the fourth version of the protocol is named draft-ietf-hybi-thewebsocketprotocol-07, which is named in the draft as WebSocket Version 7. The current version of the protocol (RFC 6455) is 13. Let us take a quick look at the programming interface (API) that we'll use within our JavaScript code to interact with a WebSocket server. Keep in mind that we'll need to write both the JavaScript clients that use WebSockets to consume data as well as the WebSocket server, which uses WebSockets but plays the role of the server. The difference between the two will become apparent as we go over some examples.

Creating a client-side WebSocket

The following code snippet creates a new object of type WebSocket that connects the client to some backend server. The constructor takes two parameters; the first is required and represents the URL where the WebSocket server is running and expecting connections. The second is an optional list of sub-protocols that the server may implement.

    var socket = new WebSocket('ws://www.game-domain.com');

Although this one line of code may seem simple and harmless enough, here are a few things to keep in mind: We are no longer in HTTP territory. The address to your WebSocket server now starts with ws:// instead of http://. Similarly, when we work with secure (encrypted) sockets, we would specify the server's URL as wss://, just like in https://. It may seem obvious to you, but a common pitfall for those getting started with WebSockets is forgetting that, before you can establish a connection with the previous code, you need a WebSocket server running at that domain. WebSockets use an origin-based security model.
As you may have already seen with other HTML5 features, the same-origin policy states that you can only access a resource through JavaScript if both the client and the server are in the same domain. For those who are not familiar with the same-domain (also known as the same-origin) policy, the three things that constitute a domain, in this context, are the protocol, host, and port of the resource being accessed. In the previous example, the protocol, host, and port number were, respectively, ws (and not wss, http, or ssh), www.game-domain.com (a different host, such as game-domain.com or beta.game-domain.com, would count as a different origin), and 80 (by default, WebSocket connects to port 80, and to port 443 when it uses wss). Keep in mind that, unlike with XMLHttpRequest, the browser does not itself block a WebSocket connection to a different origin; instead, it sends the page's origin to the server in the Origin header during the handshake, and it is up to the server to accept or reject the connection. Since the server in the previous example binds to port 80, we don't need to explicitly specify the port number. However, had the server been configured to run on a different port, say 2667, then the URL string would need to include a colon followed by the port number at the end of the host name, such as ws://www.game-domain.com:2667. As with most I/O in JavaScript, WebSocket instances attempt to connect to the backend server asynchronously. Thus, you should not attempt to send anything through your newly created socket until you're sure that the connection has been established; otherwise, the call will throw an error that may crash your entire game. This can be done by registering a callback function on the socket's onopen event as follows:

    var socket = new WebSocket('ws://www.game-domain.com');

    socket.onopen = function(event) {
        // socket ready to send and receive data
    };

Once the socket is ready to send and receive data, you can send messages to the server by calling the socket object's send method, which takes the message to be sent, most commonly a string.
    // Assuming a connection was previously established
    socket.send('Hello, WebSocket world!');

Most often, however, you will want to send more meaningful data to the server, such as objects, arrays, and other data structures that have more meaning on their own. In these cases, we can simply serialize our data as JSON strings.

    var player = {
        nickname: 'Juju',
        team: 'Blue'
    };

    socket.send(JSON.stringify(player));

Now, the server can receive that message and work with it as the same object structure that the client sent, by running it through the parse method of the JSON object.

    var player = JSON.parse(event.data);
    player.nickname === 'Juju'; // true
    player.team === 'Blue'; // true
    player.id === undefined; // true

If you look at the previous example closely, you will notice that we extract the message that is sent through the socket from the data attribute of some event object. Where did that event object come from, you ask? Good question! The way we receive messages from the socket is the same on both the client and server sides of the socket. We must simply register a callback function on the socket's onmessage event, and the callback will be invoked whenever a new message is received. The argument passed into the callback function will contain an attribute named data, which will contain the raw string with the message that was sent.

    socket.onmessage = function(event) {
        event instanceof MessageEvent; // true

        var msg = JSON.parse(event.data);
    };

Other events on the socket object on which you can register callbacks include onerror, which is triggered whenever an error related to the socket occurs, and onclose, which is triggered whenever the state of the socket changes to CLOSED; in other words, whenever the server closes the connection with the client for any reason or the connected client closes its connection.
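Since every message arrives as a raw string, it helps to agree on a small envelope and to guard against input that is not valid JSON. The following is a sketch under assumptions: the type/payload envelope and the names encodeMessage and handleMessage are hypothetical conventions chosen for this example, not part of the WebSocket API.

```javascript
// Hypothetical envelope: every message carries a type and a payload.
function encodeMessage(type, payload) {
  return JSON.stringify({ type: type, payload: payload });
}

// Parses a raw message and passes its payload to the matching handler.
// Returns true if a handler ran, false if the message was unusable.
function handleMessage(raw, handlers) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch (err) {
    return false; // not valid JSON
  }
  if (msg === null || typeof msg !== 'object') {
    return false; // valid JSON, but not an envelope
  }
  // Only accept handlers defined directly on the handlers object.
  if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) {
    return false;
  }
  handlers[msg.type](msg.payload);
  return true;
}

const joined = [];
const handlers = {
  join(player) {
    joined.push(player.nickname);
  }
};

const handledJoin = handleMessage(encodeMessage('join', { nickname: 'Juju', team: 'Blue' }), handlers);
const handledGarbage = handleMessage('not json at all', handlers);
```

In a browser, the same function could be wired up from the onmessage callback by passing it event.data.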
As mentioned previously, the socket object will also have a property called readyState, which behaves in a similar manner to the equally-named attribute in AJAX objects (or more appropriately, XMLHttpRequest objects). This attribute represents the current state of the connection and can have one of four values at any point in time. This value is an unsigned integer between 0 and 3, inclusive of both the numbers. For clarity, there are four accompanying constants on the WebSocket class that map to the four numerical values of the instance's readyState attribute. The constants are as follows: WebSocket.CONNECTING: This has a value of 0 and means that the connection between the client and the server has not yet been established. WebSocket.OPEN: This has a value of 1 and means that the connection between the client and the server is open and ready for use. Whenever the object's readyState attribute changes from CONNECTING to OPEN, which will only happen once in the object's life cycle, the onopen callback will be invoked. WebSocket.CLOSING: This has a value of 2 and means that the connection is being closed. WebSocket.CLOSED: This has a value of 3 and means that the connection is now closed (or could not be opened to begin with). Once the readyState has changed to a new value, it will never return to a previous state in the same instance of the socket object. Thus, if a socket object is CLOSING or has already become CLOSED, it will never OPEN again. In this case, you would need a new instance of WebSocket if you would like to continue to communicate with the server. 
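The one-way life cycle described above can be illustrated with a small sketch. The helper names describeReadyState and ensureUsableSocket are hypothetical, and the createSocket argument is assumed to be a function that returns a fresh socket (in a browser, it would wrap a call to the WebSocket constructor). Plain objects stand in for sockets here so that the example runs anywhere.

```javascript
// Names of the four states, indexed by the numeric readyState value (0 to 3).
const READY_STATE_NAMES = ['CONNECTING', 'OPEN', 'CLOSING', 'CLOSED'];

function describeReadyState(socket) {
  return READY_STATE_NAMES[socket.readyState] || 'UNKNOWN';
}

// Returns the given socket if it is connecting or open. A socket that is
// CLOSING or CLOSED never reopens, so in that case a new one is created.
function ensureUsableSocket(socket, createSocket) {
  const state = describeReadyState(socket);
  if (state === 'CONNECTING' || state === 'OPEN') {
    return socket;
  }
  return createSocket();
}

// Plain objects standing in for real WebSocket instances.
const openSocket = { readyState: 1 };
const closedSocket = { readyState: 3 };
const createSocket = function () {
  return { readyState: 0 }; // a new socket starts out CONNECTING
};

const kept = ensureUsableSocket(openSocket, createSocket);
const replaced = ensureUsableSocket(closedSocket, createSocket);
```

Here kept is the original open socket, while replaced is a brand new object in the CONNECTING state.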
To summarize, let us bring together the simple WebSocket API features that we discussed previously and create a convenient function that simplifies data serialization, error checking, and error handling when communicating with the game server:

    function sendMsg(socket, data) {
        // Only send if the connection is open; report success to the caller
        if (socket.readyState === WebSocket.OPEN) {
            socket.send(JSON.stringify(data));
            return true;
        }

        return false;
    }

Game clients

Earlier, we talked about the architecture of a multiplayer game that was based on the client-server pattern. Since this is the approach we will take for the games that we'll be developing, let us define some of the main roles that the game client will fulfill. From a higher level, a game client will be the interface between the human player and the rest of the game universe (which includes the game server and other human players who are connected to it). Thus, the game client will be in charge of taking input from the player, communicating this to the server, receiving any further instructions and information from the server, and then rendering the final output to the human player again. Depending on the type of game server used, the client can be more sophisticated than just an input application that renders static data received from the server. For example, the client could very well simulate what the game server will do and present the result of this simulation to the user while the server performs the real calculations and sends the results to the client. The biggest selling point of this technique is that the game would seem a lot more dynamic and real-time to the user since the client responds to input almost instantly.

Game servers

The game server is primarily responsible for connecting all the players to the same game world and keeping the communication going between them. However, as you will soon realize, there may be cases where you will want the server to be more sophisticated than a routing application.
For example, just because one of the players is telling the server to inform the other participants that the game is over, and the player sending the message is the winner, we may still want to confirm the information before deciding that the game is in fact over. With this idea in mind, we can label the game server as being of one of the two kinds: authoritative or non-authoritative. In an authoritative game server, the game's logic is actually running in memory (although it normally doesn't render any graphical output like the game clients certainly will) all the time. As each client reports the information to the server by sending messages through its corresponding socket, the server updates the current game state and sends the updates back to all of the players, including the original sender. This way we can be more certain that any data coming from the server has been verified and is accurate. In a non-authoritative server, the clients take on a much more involved part in the game logic enforcement, which gives the client a lot more trust. As suggested previously, what we can do is take the best of both worlds and create a mix of both the techniques. What we will do is, have a strictly authoritative server, but clients that are smart and can do some of the work on their own. Since the server has the ultimate say in the game, however, any messages received by clients from the server are considered as the ultimate truth and supersede any conclusions it came to on its own. Summary Overall, we discussed the basics of networking and network programming paradigms. We saw how WebSockets makes it possible to develop real-time, multiplayer games in HTML5. Finally, we implemented a simple game client and game server using widely supported web technologies and built a fun game of Tic-tac-toe. 