
How-To Tutorials


Push your data to the Web

Packt
22 Feb 2016
27 min read
This article covers the following topics: An introduction to the Shiny app framework Creating your first Shiny app The connection between the server file and the user interface The concept of reactive programming Different types of interface layouts, widgets, and Shiny tags How to create a dynamic user interface Ways to share your Shiny applications with others How to deploy Shiny apps to the web (For more resources related to this topic, see here.) Introducing Shiny – the app framework The Shiny package delivers a powerful framework to build fully featured interactive Web applications just with R and RStudio. Basic Shiny applications typically consist of two components: ~/shinyapp |-- ui.R |-- server.R While the ui.R function represents the appearance of the user interface, the server.R function contains all the code for the execution of the app. The look of the user interface is based on the famous Twitter bootstrap framework, which makes the look and layout highly customizable and fully responsive. In fact, you only need to know R and how to use the shiny package to build a pretty web application. Also, a little knowledge of HTML, CSS, and JavaScript may help. If you want to check the general possibilities and what is possible with the Shiny package, it is advisable to take a look at the inbuilt examples. Just load the library and enter the example name: library(shiny) runExample("01_hello") As you can see, running the first example opens the Shiny app in a new window. This app creates a simple histogram plot where you can interactively change the number of bins. Further, this example allows you to inspect the corresponding ui.R and server.R code files. There are currently eleven inbuilt example apps: 01_hello 02_text 03_reactivity 04_mpg 05_sliders 06_tabsets 07_widgets 08_html 09_upload 10_download 11_timer These examples focus mainly on the user interface possibilities and elements that you can create with Shiny. Creating a new Shiny web app with RStudio RStudio offers a fast and easy way to create the basis of every new Shiny app. Just click on New Project and select the New Directory option in the newly opened window: After that, click on the Shiny Web Application field: Give your new app a name in the next step, and click on Create Project: RStudio will then open a ready-to-use Shiny app by opening a prefilled ui.R and server.R file: You can click on the now visible Run App button in the right corner of the file pane to display the prefilled example application. Creating your first Shiny application In your effort to create your first Shiny application, you should first create or consider rough sketches for your app. Questions that you might ask in this context are, What do I want to show? How do I want it to show?, and so on. Let's say we want to create an application that allows users to explore some of the variables of the mtcars dataset. The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). Sketching the final app We want the user of the app to be able to select one out of the three variables of the dataset that gets displayed in a histogram. Furthermore, we want users to get a summary of the dataset under the main plot. So, the following figure could be a rough project sketch: Constructing the user interface for your app We will reuse the already opened ui.R file from the RStudio example, and adapt it to our needs. 
The layout of the ui.R file for your first app is controlled by nested Shiny functions and looks like the following lines: library(shiny) shinyUI(pageWithSidebar( headerPanel("My First Shiny App"), sidebarPanel( selectInput(inputId = "variable", label = "Variable:", choices = c ("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), mainPanel( plotOutput("carsPlot"), verbatimTextOutput ("carsSummary") ) )) Creating the server file The server file holds all the code for the execution of the application: library(shiny) library(datasets) shinyServer(function(input, output) { output$carsPlot <- renderPlot({ hist(mtcars[,input$variable], main = "Histogram of mtcars variables", xlab = input$variable) }) output$carsSummary <- renderPrint({ summary(mtcars[,input$variable]) }) }) The final application After changing the ui.R and the server.R files according to our needs, just hit the Run App button and the final app opens in a new window: As planned in the app sketch, the app offers the user a drop-down menu to choose the desired variable on the left side, and shows a histogram and data summary of the selected variable on the right side. Deconstructing the final app into its components For a better understanding of the Shiny application logic and the interplay of the two main files, ui.R and server.R, we will disassemble your first app again into its individual parts. The components of the user interface We have divided the user interface into three parts: After loading the Shiny library, the complete look of the app gets defined by the shinyUI() function. In our app sketch, we chose a sidebar look; therefore, the shinyUI function holds the argument, pageWithSidebar(): library(shiny) shinyUI(pageWithSidebar( ... The headerPanel() argument is certainly the simplest component, since usually only the title of the app will be stored in it. In our ui.R file, it is just a single line of code: library(shiny) shinyUI(pageWithSidebar( titlePanel("My First Shiny App"), ... The sidebarPanel() function defines the look of the sidebar, and most importantly, handles the input of the variables of the chosen mtcars dataset: library(shiny) shinyUI(pageWithSidebar( titlePanel("My First Shiny App"), sidebarPanel( selectInput(inputId = "variable", label = "Variable:", choices = c ("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), ... Finally, the mainPanel() function ensures that the output is displayed. In our case, this is the histogram and the data summary for the selected variables: library(shiny) shinyUI(pageWithSidebar( titlePanel("My First Shiny App"), sidebarPanel( selectInput(inputId = "variable", label = "Variable:", choices = c ("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), mainPanel( plotOutput("carsPlot"), verbatimTextOutput ("carsSummary") ) )) The server file in detail While the ui.R file defines the look of the app, the server.R file holds instructions for the execution of the R code. Again, we use our first app to deconstruct the related server.R file into its main important parts. After loading the needed libraries, datasets, and further scripts, the function, shinyServer(function(input, output) {} ), defines the server logic: library(shiny) library(datasets) shinyServer(function(input, output) { The marked lines of code that follow translate the inputs of the ui.R file into matching outputs. 
In our case, the server-side output$ object is assigned to carsPlot, which in turn was called in the mainPanel() function of the ui.R file as plotOutput(). Moreover, the render* function, renderPlot() in our example, reflects the type of output; here, it is the histogram plot. Within the renderPlot() function, you can recognize the input$ object assigned to the variables that were defined in the user interface file: library(shiny) library(datasets) shinyServer(function(input, output) { output$carsPlot <- renderPlot({ hist(mtcars[,input$variable], main = "Histogram of mtcars variables", xlab = input$variable) }) ... In the following lines, you will see another type of render function, renderPrint(), and within the curly braces, the actual R function, summary(), with the defined input variable: library(shiny) library(datasets) shinyServer(function(input, output) { output$carsPlot <- renderPlot({ hist(mtcars[,input$variable], main = "Histogram of mtcars variables", xlab = input$variable) }) output$carsSummary <- renderPrint({ summary(mtcars[,input$variable]) }) }) There are plenty of different render functions. The most used are as follows: renderPlot: This creates normal plots renderPrint: This gives printed output types renderUI: This gives HTML or Shiny tag objects renderTable: This gives tables, data frames, and matrices renderText: This creates character strings All code outside the shinyServer() function runs only once, on the first launch of the app, while all the code in between the brackets and before the output functions runs as often as a user visits or refreshes the application. The code within the output functions runs every time a user changes the widget that belongs to the corresponding output. The connection between the server and the ui file As already inspected in our decomposed Shiny app, the input functions of the ui.R file are linked with the output functions of the server file. The following figure illustrates this again: The concept of reactivity Shiny uses a reactive programming model, and this is a big deal. By applying reactive programming, the framework is able to be fast, efficient, and robust. Briefly, whenever the input changes in the user interface, Shiny rebuilds the related output. Shiny uses three reactive objects: Reactive source Reactive conductor Reactive endpoint For simplicity, we use the formal signs of the RStudio documentation: The implementation of a reactive source is the reactive value; that of a reactive conductor is a reactive expression; and the reactive endpoint is also called the observer. The source and endpoint structure As shown in the previous section, the input defined in the ui.R file is linked to the output of the server.R file. For simplicity, we use the code from our first Shiny app again, along with the introduced formal signs: ... output$carsPlot <- renderPlot({ hist(mtcars[,input$variable], main = "Histogram of mtcars variables", xlab = input$variable) }) ... The input variable, in our app the Horsepower, Miles per Gallon, and Number of Carburetors choices, represents the reactive source. The histogram called carsPlot stands for the reactive endpoint. In fact, it is possible to link one reactive source to numerous reactive endpoints, and vice versa. In our Shiny app, we also connected the input variable to our first and second output, carsSummary: ... 
output$carsPlot <- renderPlot({ hist(mtcars[,input$variable], main = "Histogram of mtcars variables", xlab = input$variable) }) output$carsSummary <- renderPrint({ summary(mtcars[,input$variable]) }) ... To sum it up, this structure ensures that every time a user changes the input, the output refreshes automatically and accordingly. The purpose of the reactive conductor The reactive conductor differs from the reactive source and the reactive endpoint in that it can both be dependent and have dependents. Therefore, it can be placed between the source, which can only have dependents, and the endpoint, which in turn can only be dependent. The primary function of a reactive conductor is the encapsulation of heavy and difficult computations. In fact, reactive expressions cache the results of these computations. The following graph displays a possible connection of the three reactive types: In general, reactivity gives the impression of a simple directional system: an input arrives and an output occurs. You get the feeling that an input pushes information to an output. But this isn't the case. In reality, it works the other way around. The output pulls the information from the input. And this all works due to sophisticated server logic. The input sends a callback to the server, which in turn informs the output; the output then pulls the needed value from the input and shows the result to the user. But of course, for a user, this all feels like an instant update on any input change, and overall, like a responsive app's behavior. Of course, we have just touched upon the main aspects of reactivity, but now you know what's really going on under the hood of Shiny. Discovering the scope of the Shiny user interface After you know how to build a simple Shiny application, as well as how reactivity works, let us take a look at the next step: the various resources to create a custom user interface. Furthermore, there are nearly endless possibilities to shape the look and feel of the layout. As already mentioned, the entire HTML, CSS, and JavaScript logic and functions of the layout options are based on the highly flexible bootstrap framework. And, of course, everything is responsive by default, which makes it possible for the final application layout to adapt to the screen of any device. Exploring the Shiny interface layouts Currently, there are four common shinyUI() page layouts: pageWithSidebar() fluidPage() navbarPage() fixedPage() These page layouts can, in turn, be structured with different functions for a custom inner arrangement of the page. In the following sections, we introduce the most useful inner layout functions. As an example, we will use our first Shiny application again. The sidebar layout In the sidebar layout, the sidebarPanel() function is used as the input area and the mainPanel() function as the output area, just like in our first Shiny app. The sidebar layout uses the pageWithSidebar() function: library(shiny) shinyUI(pageWithSidebar( headerPanel("The Sidebar Layout"), sidebarPanel( selectInput(inputId = "variable", label = "This is the sidebarPanel", choices = c("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), mainPanel( tags$h2("This is the mainPanel"), plotOutput("carsPlot"), verbatimTextOutput("carsSummary") ) )) By changing only the first three functions, you can create exactly the same look with the fluidPage() layout. 
This is the sidebar layout with the fluidPage() function: library(shiny) shinyUI(fluidPage( titlePanel("The Sidebar Layout"), sidebarLayout( sidebarPanel( selectInput(inputId = "variable", label = "This is the sidebarPanel", choices = c("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), mainPanel( tags$h2("This is the mainPanel"), plotOutput("carsPlot"), verbatimTextOutput("carsSummary") ) ) ))   The grid layout The grid layout is where rows are created with the fluidRow() function. The input and output are made within free customizable columns. Naturally, a maximum of 12 columns from the bootstrap grid system must be respected. This is the grid layout with the fluidPage () function and a 4-8 grid: library(shiny) shinyUI(fluidPage( titlePanel("The Grid Layout"), fluidRow( column(4, selectInput(inputId = "variable", label = "Four-column input area", choices = c("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), column(8, tags$h3("Eight-column output area"), plotOutput("carsPlot"), verbatimTextOutput("carsSummary") ) ) )) As you can see from inspecting the previous ui.R file, the width of the columns is defined within the fluidRow() function, and the sum of these two columns adds up to 12. Since the allocation of the columns is completely flexible, you can also create something like the grid layout with the fluidPage() function and a 4-4-4 grid: library(shiny) shinyUI(fluidPage( titlePanel("The Grid Layout"), fluidRow( column(4, selectInput(inputId = "variable", label = "Four-column input area", choices = c("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), column(4, tags$h5("Four-column output area"), plotOutput("carsPlot") ), column(4, tags$h5("Another four-column output area"), verbatimTextOutput("carsSummary") ) ) )) The tabset panel layout The tabsetPanel() function can be built into the mainPanel() function of the aforementioned sidebar layout page. By applying this function, you can integrate several tabbed outputs into one view. This is the tabset layout with the fluidPage() function and three tab panels: library(shiny) shinyUI(fluidPage( titlePanel("The Tabset Layout"), sidebarLayout( sidebarPanel( selectInput(inputId = "variable", label = "Select a variable", choices = c("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), mainPanel( tabsetPanel( tabPanel("Plot", plotOutput("carsPlot")), tabPanel("Summary", verbatimTextOutput("carsSummary")), tabPanel("Raw Data", dataTableOutput("tableData")) ) ) ) )) After changing the code to include the tabsetPanel() function, the three tabs with the tabPanel() function display the respective output. With the help of this layout, you are no longer dependent on representing several outputs among themselves. Instead, you can display each output in its own tab, while the sidebar does not change. The position of the tabs is flexible and can be assigned to be above, below, right, and left. For example, in the following code file detail, the position of the tabsetPanel() function was assigned as follows: ... mainPanel( tabsetPanel(position = "below", tabPanel("Plot", plotOutput("carsPlot")), tabPanel("Summary", verbatimTextOutput("carsSummary")), tabPanel("Raw Data", tableOutput("tableData")) ) ) ... 
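The tabset examples above reference a tableData output that our original server.R does not define. As a hedged sketch (not part of the original article's code), the server file could be extended as follows; renderTable() pairs with tableOutput(), while the dataTableOutput() element used in the first tabset example would need renderDataTable() instead:

library(shiny)
library(datasets)

shinyServer(function(input, output) {
  output$carsPlot <- renderPlot({
    # Histogram of the variable selected in the sidebar
    hist(mtcars[, input$variable],
         main = "Histogram of mtcars variables",
         xlab = input$variable)
  })
  output$carsSummary <- renderPrint({
    summary(mtcars[, input$variable])
  })
  # Assumed addition for the "Raw Data" tab
  output$tableData <- renderTable({
    mtcars
  })
})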
The navlist panel layout The navlistPanel() function is similar to the tabsetPanel() function, and can be seen as an alternative if you need to integrate a large number of tabs. The navlistPanel() function also uses the tabPanel() function to include outputs: library(shiny) shinyUI(fluidPage( titlePanel("The Navlist Layout"), navlistPanel( "Discovering The Dataset", tabPanel("Plot", plotOutput("carsPlot")), tabPanel("Summary", verbatimTextOutput("carsSummary")), tabPanel("Another Plot", plotOutput("barPlot")), tabPanel("Even A Third Plot", plotOutput("thirdPlot"), "More Information", tabPanel("Raw Data", tableOutput("tableData")), tabPanel("More Datatables", tableOutput("moreData")) ) ))   The navbar page as the page layout In the previous examples, we have used the page layouts, fluidPage() and pageWithSidebar(), in the first line. But, especially when you want to create an application with a variety of tabs, sidebars, and various input and output areas, it is recommended that you use the navbarPage() layout. This function makes use of the standard top navigation of the bootstrap framework: library(shiny) shinyUI(navbarPage("The Navbar Page Layout", tabPanel("Data Analysis", sidebarPanel( selectInput(inputId = "variable", label = "Select a variable", choices = c("Horsepower" = "hp", "Miles per Gallon" = "mpg", "Number of Carburetors" = "carb"), selected = "hp") ), mainPanel( plotOutput("carsPlot"), verbatimTextOutput("carsSummary") ) ), tabPanel("Calculations" … ), tabPanel("Some Notes" … ) )) Adding widgets to your application After inspecting the most important page layouts in detail, we now look at the different interface input and output elements. By adding these widgets, panels, and other interface elements to an application, we can further customize each page layout. Shiny input elements Already, in our first Shiny application, we got to know a typical Shiny input element: the selection box widget. But, of course, there are a lot more widgets with different types of uses. All widgets can have several arguments; the minimum setup is to assign an inputId, which instructs the input slot to communicate with the server file, and a label to communicate with a widget. Each widget can also have its own specific arguments. As an example, we are looking at the code of a slider widget. In the previous screenshot are two versions of a slider; we took the slider range for inspection: sliderInput(inputId = "sliderExample", label = "Slider range", min = 0, max = 100, value = c(25, 75)) Besides the mandatory arguments, inputId and label, three more values have been added to the slider widget. The min and max arguments specify the minimum and maximum values that can be selected. In our example, these are 0 and 100. A numeric vector was assigned to the value argument, and this creates a double-ended range slider. This vector must logically be within the set minimum and maximum values. Currently, there are more than twenty different input widgets, which in turn are all individually configurable by assigning to them their own set of arguments. A brief overview of the output elements As we have seen, the output elements in the ui.R file are connected to the rendering functions in the server file. The mainly used output elements are: htmlOutput imageOutput plotOutput tableOutput textOutput verbatimTextOutput downloadButton Due to their unambiguous naming, the purpose of these elements should be clear. 
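To illustrate how the input widgets and output elements come together in a ui.R file, here is a small sketch; the IDs, labels, and values are made up for demonstration, and the matching server.R (with renderText(), renderPlot(), and renderPrint() calls) is not shown:

library(shiny)

shinyUI(fluidPage(
  titlePanel("Widgets and outputs"),
  sidebarLayout(
    sidebarPanel(
      numericInput(inputId = "obs", label = "Number of observations:",
                   value = 50, min = 10, max = 200),
      checkboxInput(inputId = "showRug", label = "Add a rug plot", value = FALSE)
    ),
    mainPanel(
      textOutput("obsText"),            # filled by renderText() on the server side
      plotOutput("obsPlot"),            # filled by renderPlot()
      verbatimTextOutput("obsSummary")  # filled by renderPrint()
    )
  )
))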
Individualizing your app even further with Shiny tags Although you don't need to know HTML to create stunning Shiny applications, you have the option to create highly customized apps by using raw HTML or so-called Shiny tags. To add raw HTML, you can use the HTML() function. We will focus on Shiny tags in the following list. Currently, there are over 100 different Shiny tag objects, which can be used to add text styling, colors, different headers, video and audio elements, lists, and many more things. You can use these tags by writing tags$tagname. The following is a brief list of useful tags: tags$h1: This is a first-level header; of course, you can also use the known h1-h6 headers tags$hr: This makes a horizontal line, also known as a thematic break tags$br: This makes a line break, a popular way to add some space tags$strong: This makes the text bold tags$div: This makes a division of text with a uniform style tags$a: This links to a webpage tags$iframe: This makes an inline frame for embedding possibilities The following ui.R file and corresponding screenshot show the usage of Shiny tags with an example: shinyUI(fluidPage( fluidRow( column(6, tags$h3("Customize your app with Shiny tags!"), tags$hr(), tags$a(href = "http://www.rstudio.com", "Click me"), tags$hr() ), column(6, tags$br(), tags$em("Look - the R project logo"), tags$br(), tags$img(src = "http://www.r-project.org/Rlogo.png") ) ), fluidRow( column(6, tags$strong("We can even add a video"), tags$video(src = "video.mp4", type = "video/mp4", autoplay = NA, controls = NA) ), column(6, tags$br(), tags$ol( tags$li("One"), tags$li("Two"), tags$li("Three")) ) ) )) Creating dynamic user interface elements We know how to build completely custom user interfaces with all the bells and whistles. But all the introduced types of interface elements are fixed and static. However, if you need to create dynamic interface elements, Shiny offers three ways to achieve this: The conditionalPanel() function The renderUI() function The use of directly injected JavaScript code In the following section, we only show how to use the first two ways because, firstly, they are built into the Shiny package and, secondly, the JavaScript method is indicated as experimental. Using conditionalPanel The conditionalPanel() function allows you to show or hide interface elements dynamically, and it is set in the ui.R file. The dynamic behavior of this function is achieved with JavaScript expressions, but as usual in the Shiny package, all you need to know is R programming. The following example application shows how this function works in the ui.R file: library(shiny) shinyUI(fluidPage( titlePanel("Dynamic Interface With Conditional Panels"), column(4, wellPanel( sliderInput(inputId = "n", label = "Number of points:", min = 10, max = 200, value = 50, step = 10) )), column(5, "The plot below will not be displayed when the slider value", "is less than 50.", conditionalPanel("input.n >= 50", plotOutput("scatterPlot", height = 300) ) ) )) The following example shows how this function works in the related server.R file: library(shiny) shinyServer(function(input, output) { output$scatterPlot <- renderPlot({ x <- rnorm(input$n) y <- rnorm(input$n) plot(x, y) }) }) The code for this example application was taken from the Shiny gallery of RStudio (http://shiny.rstudio.com/gallery/conditionalpanel-demo.html). As you can read in both code files, the defined condition, input.n, is the linchpin for the dynamic behavior of the example app. 
In the conditionalPanel() function, it is defined that inputId="n" must have a value of 50 or higher, while the input and output of the plot work as already defined. Taking advantage of the renderUI function In contrast to the previous approach, the renderUI() function is hooked into the server file to create a dynamic user interface. We have already introduced different render output functions in this article. The following example code shows the basic functionality using the ui.R file: # Partial example taken from the Shiny documentation numericInput("lat", "Latitude"), numericInput("long", "Longitude"), uiOutput("cityControls") The following example code shows the basic functionality of the related server.R file: # Partial example output$cityControls <- renderUI({ cities <- getNearestCities(input$lat, input$long) checkboxGroupInput("cities", "Choose Cities", cities) }) As described, the dynamic part of this method is defined in the renderUI() function as an output, which then gets displayed through the uiOutput() function in the ui.R file. Sharing your Shiny application with others Typically, you create a Shiny application not only for yourself, but also for other users. There are two main ways to distribute your app: either you let users download your application, or you deploy it on the web. Offering a download of your Shiny app By offering the option to download your final Shiny application, other users can run your app locally. Actually, there are four ways to deliver your app this way. No matter which way you choose, it is important that the user has R and the Shiny package installed on his/her computer. Gist Gist is a public code-sharing pasteboard from GitHub. To share your app this way, it is important that both the ui.R file and the server.R file are in the same Gist and have been named correctly. Take a look at the following screenshot: There are two options to run apps via Gist. First, just enter runGist("Gist_URL") in the console of RStudio; or second, just use the Gist ID and place it in the shiny::runGist("Gist_ID") function. Gist is a very easy way to share your application, but you need to keep in mind that your code is published on a third-party server. GitHub The next way to enable users to download your app is through a GitHub repository. To run an application from GitHub, you need to enter the command, shiny::runGitHub("Repository_Name", "GitHub_Account_Name"), in the console. Zip file There are two ways to share a Shiny application by zip file. You can either let the user download the zip file over the web, or you can share it via email, USB stick, memory card, or any other such device. To download a zip file via the web, you need to type runUrl("Zip_File_URL") in the console. Package Certainly, a much more labor-intensive but also publicly effective way is to create a complete R package for your Shiny application. This especially makes sense if you have built an extensive application that may help many other users. Another advantage is the fact that you can also publish your application on CRAN. Later in the book, we will show you how to create an R package. Deploying your app to the web After showing you the ways users can download your app and run it on their local machines, we will now check the options to deploy Shiny apps to the web. Shinyapps.io http://www.shinyapps.io/ is a Shiny app-hosting service by RStudio. 
There is a free-to-use account plan, but it is limited to a maximum of five applications and 25 so-called active hours, and the apps are branded with the RStudio logo. Nevertheless, this service is a great way to publish one's own applications quickly and easily to the web. To use http://www.shinyapps.io/ with RStudio, a few R packages and some additional operating system software need to be installed: RTools (if you use Windows) GCC (if you use Linux) Xcode Command Line Tools (if you use Mac OS X) The devtools R package The shinyapps package Since the shinyapps package is not on CRAN, you need to install it via GitHub by using the devtools package: if (!require("devtools")) install.packages("devtools") devtools::install_github("rstudio/shinyapps") library(shinyapps) When everything that is needed is installed, you are ready to publish your Shiny apps directly from the RStudio IDE. Just click on the Publish icon, and in the new window you will need to log in to your http://www.shinyapps.io/ account once, if you are using it for the first time. All other times, you can directly create a new Shiny app or update an existing app. After clicking on Publish, a new tab called Deploy opens in the console pane, showing you the progress of the deployment process. If something is set incorrectly, you can use the deployment log to find the error. When the deployment is successful, your app will be publicly reachable with its own web address on http://www.shinyapps.io/. Setting up a self-hosted Shiny server There are two editions of the Shiny Server software: an open source edition and a professional edition. The open source edition can be downloaded for free and you can use it on your own server. The Professional edition offers a lot more features and support from RStudio, but is also priced accordingly. Diving into the Shiny ecosystem Since the Shiny framework is such an awesome and powerful tool, a lot of people, and of course the creators of RStudio and Shiny, have built several packages around it that enormously extend its existing functionality. Covering the almost infinite possibilities for technical and visual individualization offered by the Shiny ecosystem would certainly go beyond the scope of this article. Therefore, we present only a few important directions to give a first impression. Creating apps with more files In this article, you have learned how to build a Shiny app consisting of two files: server.R and ui.R. For completeness, we first want to point out that it is also possible to create a single-file Shiny app. To do so, create a file called app.R. In this file, you can include both the server.R and the ui.R code. Furthermore, you can include global variables, data, and more. If you build larger Shiny apps with multiple functions, datasets, options, and more, it could be very confusing if you do all of it in just one file. Therefore, single-file Shiny apps are a good idea for simple and small exhibition apps with a minimal setup. Especially for large Shiny apps, it is recommended that you outsource extensive custom functions, datasets, images, and more into their own files, but put them into the same directory as the app. An example file setup could look like this: ~/shinyapp |-- ui.R |-- server.R |-- helpers.R |-- data |-- www |-- js |-- etc To access the helper file, you just need to add source("helpers.R") into the code of your server.R file. The same logic applies to any other R files.
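To make the single-file option mentioned above concrete, here is a minimal app.R sketch that mirrors our first application; it is only an illustration and assumes a reasonably recent shiny version that provides shinyApp():

library(shiny)

ui <- pageWithSidebar(
  headerPanel("My First Shiny App"),
  sidebarPanel(
    selectInput(inputId = "variable", label = "Variable:",
                choices = c("Horsepower" = "hp",
                            "Miles per Gallon" = "mpg",
                            "Number of Carburetors" = "carb"),
                selected = "hp")
  ),
  mainPanel(
    plotOutput("carsPlot"),
    verbatimTextOutput("carsSummary")
  )
)

server <- function(input, output) {
  output$carsPlot <- renderPlot({
    hist(mtcars[, input$variable],
         main = "Histogram of mtcars variables",
         xlab = input$variable)
  })
  output$carsSummary <- renderPrint({
    summary(mtcars[, input$variable])
  })
}

# Combines the user interface and the server logic in one file
shinyApp(ui = ui, server = server)

For anything beyond a small demonstration app, the multi-file layout described above remains easier to maintain.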
If you want to read in some data from your data folder, you store it in a variable that is also defined at the top of your server.R file, like this: myData <- readRDS("data/myDataset.rds") Expanding the Shiny package As said earlier, you can expand the functionalities of Shiny with several add-on packages. There are currently ten packages available on CRAN with different inbuilt functions to add some extra magic to your Shiny app. shinyAce: This package makes Ace editor bindings available to enable a rich text-editing environment within Shiny. shinybootstrap2: The latest Shiny package uses bootstrap 3; so, if you built your app with bootstrap 2 features, you need to use this package. shinyBS: This package adds the additional features of the original Twitter Bootstrap theme, such as tooltips, modals, and others, to Shiny. shinydashboard: This package comes from the folks at RStudio and enables the user to create stunning and multifunctional dashboards on top of Shiny. shinyFiles: This provides functionality for client-side navigation of the server-side file system in Shiny apps. shinyjs: By using this package, you can perform common JavaScript operations in Shiny applications without having to know any JavaScript. shinyRGL: This package provides Shiny wrappers for the RGL package. It exposes RGL's ability to export WebGL visualizations in a Shiny-friendly format. shinystan: This package is, in fact, not a real add-on. Shinystan is a fantastic full-blown Shiny application that gives users a graphical interface for Markov chain Monte Carlo simulations. shinythemes: This package gives you the option of changing the whole look and feel of your application by using different inbuilt bootstrap themes. shinyTree: This exposes bindings to jsTree, a JavaScript library that supports interactive trees, to enable rich, editable trees in Shiny. Of course, you can find a bunch of other packages with similar or even more functionalities, extensions, and also comprehensive Shiny apps on GitHub. Summary To learn more about Shiny, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Learning Shiny (https://www.packtpub.com/application-development/learning-shiny) Mastering Machine Learning with R (https://www.packtpub.com/big-data-and-business-intelligence/mastering-machine-learning-r) Mastering Data Analysis with R (https://www.packtpub.com/big-data-and-business-intelligence/mastering-data-analysis-r)


Training neural networks efficiently using Keras

Packt
22 Feb 2016
9 min read
In this article, we will take a look at Keras, one of the most recently developed libraries to facilitate neural network training. The development of Keras started in the early months of 2015; as of today, it has evolved into one of the most popular and widely used libraries built on top of Theano, and it allows us to utilize our GPU to accelerate neural network training. One of its prominent features is its very intuitive API, which allows us to implement neural networks in only a few lines of code. Once you have Theano installed, you can install Keras from PyPI by executing the following command from your terminal command line: pip install Keras For more information about Keras, please visit the official website at http://keras.io. To see what neural network training via Keras looks like, let's implement a multilayer perceptron to classify the handwritten digits from the MNIST dataset. The MNIST dataset can be downloaded from http://yann.lecun.com/exdb/mnist/ in four parts as listed here: train-images-idx3-ubyte.gz: These are training set images (9912422 bytes) train-labels-idx1-ubyte.gz: These are training set labels (28881 bytes) t10k-images-idx3-ubyte.gz: These are test set images (1648877 bytes) t10k-labels-idx1-ubyte.gz: These are test set labels (4542 bytes) After downloading and unzipping the archives, we place the files into a directory mnist in our current working directory, so that we can load the training as well as the test dataset using the following function: import os import struct import numpy as np def load_mnist(path, kind='train'): """Load MNIST data from `path`""" labels_path = os.path.join(path, '%s-labels-idx1-ubyte' % kind) images_path = os.path.join(path, '%s-images-idx3-ubyte' % kind) with open(labels_path, 'rb') as lbpath: magic, n = struct.unpack('>II', lbpath.read(8)) labels = np.fromfile(lbpath, dtype=np.uint8) with open(images_path, 'rb') as imgpath: magic, num, rows, cols = struct.unpack(">IIII", imgpath.read(16)) images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), 784) return images, labels X_train, y_train = load_mnist('mnist', kind='train') print('Rows: %d, columns: %d' % (X_train.shape[0], X_train.shape[1])) Rows: 60000, columns: 784 X_test, y_test = load_mnist('mnist', kind='t10k') print('Rows: %d, columns: %d' % (X_test.shape[0], X_test.shape[1])) Rows: 10000, columns: 784 In the following sections, we will walk through the code examples for using Keras step by step, which you can directly execute from your Python interpreter. However, if you are interested in training the neural network on your GPU, you can either put the code into a Python script, or download the respective code from the Packt Publishing website. In order to run the Python script on your GPU, execute the following command from the directory where the mnist_keras_mlp.py file is located: THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python mnist_keras_mlp.py To continue with the preparation of the training data, let's cast the MNIST image array into 32-bit format: >>> import theano >>> theano.config.floatX = 'float32' >>> X_train = X_train.astype(theano.config.floatX) >>> X_test = X_test.astype(theano.config.floatX) Next, we need to convert the class labels (integers 0-9) into the one-hot format. 
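To make the one-hot format concrete before using the Keras helper, here is a small NumPy-only sketch; it is purely illustrative and not part of the training code:

import numpy as np

labels = np.array([5, 0, 4])               # original class labels
onehot = np.zeros((labels.shape[0], 10))   # one column per class 0-9
onehot[np.arange(labels.shape[0]), labels] = 1.0
print(onehot)   # row 0 has its 1 in column 5, row 1 in column 0, row 2 in column 4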
Fortunately, Keras provides a convenient tool for this: >>> from keras.utils import np_utils >>> print('First 3 labels: ', y_train[:3]) First 3 labels: [5 0 4] >>> y_train_ohe = np_utils.to_categorical(y_train) >>> print('\nFirst 3 labels (one-hot):\n', y_train_ohe[:3]) First 3 labels (one-hot): [[ 0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] [ 1. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [ 0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]] Now, we can get to the interesting part and implement a neural network. Compared to a plain MLP with logistic units, we will replace the logistic units in the hidden layer with hyperbolic tangent activation functions, replace the logistic function in the output layer with softmax, and add an additional hidden layer. Keras makes these tasks very simple, as you can see in the following code implementation: >>> from keras.models import Sequential >>> from keras.layers.core import Dense >>> from keras.optimizers import SGD >>> np.random.seed(1) >>> model = Sequential() >>> model.add(Dense(input_dim=X_train.shape[1], ... output_dim=50, ... init='uniform', ... activation='tanh')) >>> model.add(Dense(input_dim=50, ... output_dim=50, ... init='uniform', ... activation='tanh')) >>> model.add(Dense(input_dim=50, ... output_dim=y_train_ohe.shape[1], ... init='uniform', ... activation='softmax')) >>> sgd = SGD(lr=0.001, decay=1e-7, momentum=.9) >>> model.compile(loss='categorical_crossentropy', optimizer=sgd) First, we initialize a new model using the Sequential class to implement a feedforward neural network. Then, we can add as many layers to it as we like. However, since the first layer that we add is the input layer, we have to make sure that the input_dim attribute matches the number of features (columns) in the training set (here, 784). Also, we have to make sure that the number of output units (output_dim) and input units (input_dim) of two consecutive layers match. In the preceding example, we added two hidden layers with 50 hidden units plus 1 bias unit each. Note that bias units are initialized to 0 in fully connected networks in Keras. This is in contrast to the MLP implementation, where we initialized the bias units to 1, which is a more common (not necessarily better) convention. Finally, the number of units in the output layer should be equal to the number of unique class labels, that is, the number of columns in the one-hot encoded class label array. Before we can compile our model, we also have to define an optimizer. In the preceding example, we chose stochastic gradient descent optimization. Furthermore, we can set a decay constant to shrink the learning rate over the course of training, and a momentum term to smooth the weight updates. Lastly, we set the cost (or loss) function to categorical_crossentropy. The (binary) cross-entropy is just the technical term for the cost function in logistic regression, and the categorical cross-entropy is its generalization for multi-class predictions via softmax. After compiling the model, we can now train it by calling the fit method. Here, we are using mini-batch stochastic gradient descent with a batch size of 300 training samples per batch. We train the MLP over 50 epochs, and we can follow the optimization of the cost function during training by setting verbose=1. The validation_split parameter is especially handy, since it will reserve 10 percent of the training data (here, 6,000 samples) for validation after each epoch, so that we can check if the model is overfitting during training. >>> model.fit(X_train, ... y_train_ohe, ... nb_epoch=50, ... batch_size=300, ... verbose=1, ... validation_split=0.1, ... 
show_accuracy=True) Train on 54000 samples, validate on 6000 samples Epoch 0 54000/54000 [==============================] - 1s - loss: 2.2290 - acc: 0.3592 - val_loss: 2.1094 - val_acc: 0.5342 Epoch 1 54000/54000 [==============================] - 1s - loss: 1.8850 - acc: 0.5279 - val_loss: 1.6098 - val_acc: 0.5617 Epoch 2 54000/54000 [==============================] - 1s - loss: 1.3903 - acc: 0.5884 - val_loss: 1.1666 - val_acc: 0.6707 Epoch 3 54000/54000 [==============================] - 1s - loss: 1.0592 - acc: 0.6936 - val_loss: 0.8961 - val_acc: 0.7615 […] Epoch 49 54000/54000 [==============================] - 1s - loss: 0.1907 - acc: 0.9432 - val_loss: 0.1749 - val_acc: 0.9482 Printing the value of the cost function is extremely useful during training, since we can quickly spot whether the cost is decreasing and, if it is not, stop the algorithm early to tune the hyperparameter values. To predict the class labels, we can then use the predict_classes method to return the class labels directly as integers: >>> y_train_pred = model.predict_classes(X_train, verbose=0) >>> print('First 3 predictions: ', y_train_pred[:3]) >>> First 3 predictions: [5 0 4] Finally, let's print the model accuracy on the training and test sets: >>> train_acc = np.sum( ... y_train == y_train_pred, axis=0) / X_train.shape[0] >>> print('Training accuracy: %.2f%%' % (train_acc * 100)) Training accuracy: 94.51% >>> y_test_pred = model.predict_classes(X_test, verbose=0) >>> test_acc = np.sum(y_test == y_test_pred, ... axis=0) / X_test.shape[0] print('Test accuracy: %.2f%%' % (test_acc * 100)) Test accuracy: 94.39% Note that this is just a very simple neural network without optimized tuning parameters. If you are interested in playing more with Keras, please feel free to further tweak the learning rate, momentum, weight decay, and number of hidden units. Although Keras is a great library for implementing and experimenting with neural networks, there are many other Theano wrapper libraries that are worth mentioning. A prominent example is Pylearn2 (http://deeplearning.net/software/pylearn2/), which has been developed in the LISA lab in Montreal. Also, Lasagne (https://github.com/Lasagne/Lasagne) may be of interest to you if you prefer a more minimalistic but extensible library that offers more control over the underlying Theano code. Summary We caught a glimpse of one of the most beautiful and exciting families of algorithms in the whole machine learning field: artificial neural networks. I recommend following the work of the leading experts in this field, such as Geoff Hinton (http://www.cs.toronto.edu/~hinton/), Andrew Ng (http://www.andrewng.org), Yann LeCun (http://yann.lecun.com), Juergen Schmidhuber (http://people.idsia.ch/~juergen/), and Yoshua Bengio (http://www.iro.umontreal.ca/~bengioy), just to name a few. To learn more about machine learning, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Building Machine Learning Systems with Python (https://www.packtpub.com/big-data-and-business-intelligence/building-machine-learning-systems-python) Neural Network Programming with Java (https://www.packtpub.com/networking-and-servers/neural-network-programming-java) Resources for Article: Further resources on this subject: Python Data Analysis Utilities [article] Machine learning and Python – the Dream Team [article] Adding a Spark to R [article]


Social Media Insight Using Naive Bayes

Packt
22 Feb 2016
48 min read
Text-based datasets contain a lot of information, whether they are books, historical documents, social media, e-mail, or any of the other ways we communicate via writing. Extracting features from text-based datasets and using them for classification is a difficult problem. There are, however, some common patterns for text mining. (For more resources related to this topic, see here.) We look at disambiguating terms in social media using the Naive Bayes algorithm, which is a powerful and surprisingly simple algorithm. Naive Bayes takes a few shortcuts to properly compute the probabilities for classification, hence the term naive in the name. It can also be extended to other types of datasets quite easily and doesn't rely on numerical features. The model in this article is a baseline for text mining studies, as the process can work reasonably well for a variety of datasets. We will cover the following topics in this article: Downloading data from social network APIs Transformers for text Naive Bayes classifier Using JSON for saving and loading datasets The NLTK library for extracting features from text The F-measure for evaluation Disambiguation Text is often called an unstructured format. There is a lot of information there, but it is just there; no headings, no required format, loose syntax and other problems prohibit the easy extraction of information from text. The data is also highly connected, with lots of mentions and cross-references—just not in a format that allows us to easily extract it! We can compare the information stored in a book with that stored in a large database to see the difference. In the book, there are characters, themes, places, and lots of information. However, the book needs to be read and, more importantly, interpreted to gain this information. The database sits on your server with column names and data types. All the information is there and the level of interpretation needed is quite low. Information about the data, such as its type or meaning is called metadata, and text lacks it. A book also contains some metadata in the form of a table of contents and index but the degree is significantly lower than that of a database. One of the problems is the term disambiguation. When a person uses the word bank, is this a financial message or an environmental message (such as river bank)? This type of disambiguation is quite easy in many circumstances for humans (although there are still troubles), but much harder for computers to do. In this article, we will look at disambiguating the use of the term Python on Twitter's stream. A message on Twitter is called a tweet and is limited to 140 characters. This means there is little room for context. There isn't much metadata available although hashtags are often used to denote the topic of the tweet. When people talk about Python, they could be talking about the following things: The programming language Python Monty Python, the classic comedy group The snake Python A make of shoe called Python There can be many other things called Python. The aim of our experiment is to take a tweet mentioning Python and determine whether it is talking about the programming language, based only on the content of the tweet. Downloading data from a social network We are going to download a corpus of data from Twitter and use it to sort out spam from useful content. Twitter provides a robust API for collecting information from its servers and this API is free for small-scale usage. 
It is, however, subject to some conditions that you'll need to be aware of if you start using Twitter's data in a commercial setting. First, you'll need to sign up for a Twitter account (which is free). Go to http://twitter.com and register an account if you do not already have one. Next, you'll need to ensure that you only make a certain number of requests in a given time window. For the search API, this limit is currently 180 requests per 15-minute window. It can be tricky to ensure that you don't breach this limit, so it is highly recommended that you use a library to talk to Twitter's API. You will need a key to access Twitter's data. Go to http://twitter.com and sign in to your account. When you are logged in, go to https://apps.twitter.com/ and click on Create New App. Create a name and description for your app, along with a website address. If you don't have a website to use, insert a placeholder. Leave the Callback URL field blank for this app; we won't need it. Agree to the terms of use (if you do) and click on Create your Twitter application. Keep the resulting website open; you'll need the access keys that are on this page. Next, we need a library to talk to Twitter. There are many options; the one I like is simply called twitter, a well-established Python wrapper for Twitter's API. You can install twitter using pip3 install twitter if you are using pip to install your packages. If you are using another system, check the documentation at https://github.com/sixohsix/twitter. Create a new IPython Notebook to download the data. We will create several notebooks in this article for various different purposes, so it might be a good idea to also create a folder to keep track of them. This first notebook, ch6_get_twitter, is specifically for downloading new Twitter data. First, we import the twitter library and set our authorization tokens. The consumer key and consumer secret will be available on the Keys and Access Tokens tab on your Twitter app's page. To get the access tokens, you'll need to click on the Create my access token button, which is on the same page. Enter the keys into the appropriate places in the following code: import twitter consumer_key = "<Your Consumer Key Here>" consumer_secret = "<Your Consumer Secret Here>" access_token = "<Your Access Token Here>" access_token_secret = "<Your Access Token Secret Here>" authorization = twitter.OAuth(access_token, access_token_secret, consumer_key, consumer_secret) We are going to get our tweets from Twitter's search function. We will create a reader that connects to Twitter using our authorization, and then use that reader to perform searches. In the Notebook, we set the filename where the tweets will be stored: import os output_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "python_tweets.json") We also need the json library for saving our tweets: import json Next, create an object that can read from Twitter. We create this object with the authorization object that we set up earlier: t = twitter.Twitter(auth=authorization) We then open our output file for writing. We open it for appending; this allows us to rerun the script to obtain more tweets. We then use our Twitter connection to perform a search for the word Python. We only want the statuses that are returned for our dataset. This code takes the tweet, uses the json library to create a string representation using the dumps function, and then writes it to the file. 
It then creates a blank line under the tweet so that we can easily distinguish where one tweet starts and ends in our file: with open(output_filename, 'a') as output_file: search_results = t.search.tweets(q="python", count=100)['statuses'] for tweet in search_results: if 'text' in tweet: output_file.write(json.dumps(tweet)) output_file.write("\n\n") In the preceding loop, we also perform a check to see whether there is text in the tweet or not. Not all of the objects returned by Twitter will be actual tweets (some will be actions to delete tweets and others). The key difference is the inclusion of text as a key, which we test for. Running this for a few minutes will result in 100 tweets being added to the output file. You can keep rerunning this script to add more tweets to your dataset, keeping in mind that you may get some duplicates in the output file if you rerun it too fast (that is, before Twitter gets new tweets to return!). Loading and classifying the dataset After we have collected a set of tweets (our dataset), we need labels to perform classification. We are going to label the dataset by setting up a form in an IPython Notebook to allow us to enter the labels. The dataset we have stored is nearly in a JSON format. JSON is a format for data that doesn't impose much structure and is directly readable in JavaScript (hence the name, JavaScript Object Notation). JSON defines basic objects such as numbers, strings, lists, and dictionaries, making it a good format for storing datasets if they contain data that isn't numerical. If your dataset is fully numerical, you would save space and time using a matrix-based format like in NumPy. A key difference between our dataset and real JSON is that we included newlines between tweets. The reason for this was to allow us to easily append new tweets (the actual JSON format doesn't allow this easily). Our format is a JSON representation of a tweet, followed by a newline, followed by the next tweet, and so on. To parse it, we can use the json library, but we will have to first split the file by newlines to get the actual tweet objects themselves. Set up a new IPython Notebook (I called mine ch6_label_twitter) and enter the dataset's filename. This is the same filename in which we saved the data in the previous section. We also define the filename that we will use to save the labels to. The code is as follows: import os input_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "python_tweets.json") labels_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "python_classes.json") As stated, we will use the json library, so import that too: import json We create a list that will store the tweets we received from the file: tweets = [] We then iterate over each line in the file. We aren't interested in lines with no information (they separate the tweets for us), so check if the length of the line (minus any whitespace characters) is zero. If it is, ignore it and move to the next line. Otherwise, load the tweet using json.loads (which loads a JSON object from a string) and add it to our list of tweets. The code is as follows: with open(input_filename) as inf: for line in inf: if len(line.strip()) == 0: continue tweets.append(json.loads(line)) We are now interested in classifying whether an item is relevant to us or not (in this case, relevant means that it refers to the programming language Python). 
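To see where this labeling will lead, the following is a minimal sketch of Naive Bayes text classification using scikit-learn on a few made-up example texts; it only illustrates the algorithm and the F-measure named in the introduction and is not the pipeline developed in this article:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

# Made-up examples: 1 = about the programming language, 0 = not
texts = ["I love coding in python every day",
         "my python script finally works",
         "saw a huge python at the zoo today",
         "monty python sketches are hilarious"]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()               # simple bag-of-words features
X = vectorizer.fit_transform(texts)
model = MultinomialNB().fit(X, labels)

test = vectorizer.transform(["new python release for programmers"])
print(model.predict(test))                   # predicted class for the new text
print(f1_score(labels, model.predict(X)))    # F-measure on the toy training data

The classifier here is fit on toy data only; the real dataset first needs the labels we are about to collect.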
We will use the IPython Notebook's ability to embed HTML and talk between JavaScript and Python to create a viewer of tweets to allow us to easily and quickly classify the tweets as spam or not. The code will present a new tweet to the user (you) and ask for a label: is it relevant or not? It will then store the input and present the next tweet to be labeled. First, we create a list for storing the labels. These labels will be stored whether or not the given tweet refers to the programming language Python, and it will allow our classifier to learn how to differentiate between meanings. We also check if we have any labels already and load them. This helps if you need to close the notebook down midway through labeling. This code will load the labels from where you left off. It is generally a good idea to consider how to save at midpoints for tasks like this. Nothing hurts quite like losing an hour of work because your computer crashed before you saved the labels! The code is as follows: labels = [] if os.path.exists(labels_filename): with open(labels_filename) as inf: labels = json.load(inf) Next, we create a simple function that will return the next tweet that needs to be labeled. We can work out which is the next tweet by finding the first one that hasn't yet been labeled. The code is as follows: def get_next_tweet(): return tweet_sample[len(labels)]['text'] The next step in our experiment is to collect information from the user (you!) on which tweets are referring to Python (the programming language) and which are not. As of yet, there is not a good, straightforward way to get interactive feedback with pure Python in IPython Notebooks. For this reason, we will use some JavaScript and HTML to get this input from the user. Next we create some JavaScript in the IPython Notebook to run our input. Notebooks allow us to use magic functions to embed HTML and JavaScript (among other things) directly into the Notebook itself. Start a new cell with the following line at the top: %%javascript The code in here will be in JavaScript, hence the curly braces that are coming up. Don't worry, we will get back to Python soon. Keep in mind here that the following code must be in the same cell as the %%javascript magic function. The first function we will define in JavaScript shows how easy it is to talk to your Python code from JavaScript in IPython Notebooks. This function, if called, will add a label to the labels array (which is in python code). To do this, we load the IPython kernel as a JavaScript object and give it a Python command to execute. The code is as follows: function set_label(label){ var kernel = IPython.notebook.kernel; kernel.execute("labels.append(" + label + ")"); load_next_tweet(); } At the end of that function, we call the load_next_tweet function. This function loads the next tweet to be labeled. It runs on the same principle; we load the IPython kernel and give it a command to execute (calling the get_next_tweet function we defined earlier). However, in this case we want to get the result. This is a little more difficult. We need to define a callback, which is a function that is called when the data is returned. The format for defining callback is outside the scope of this book. If you are interested in more advanced JavaScript/Python integration, consult the IPython documentation. 
The code is as follows:

function load_next_tweet(){
    var code_input = "get_next_tweet()";
    var kernel = IPython.notebook.kernel;
    var callbacks = { 'iopub' : {'output' : handle_output}};
    kernel.execute(code_input, callbacks, {silent:false});
}

The callback function is called handle_output, which we will define now. This function gets called when the Python function that kernel.execute calls returns a value. As before, the full format of this is outside the scope of this book. However, for our purposes the result is returned as data of the type text/plain, which we extract and show in the #tweet_text div of the form we are going to create in the next cell. The code is as follows:

function handle_output(out){
    var res = out.content.data["text/plain"];
    $("div#tweet_text").html(res);
}

Our form will have a div that shows the next tweet to be labeled, which we will give the ID #tweet_text. We also create a textbox to enable us to capture key presses (otherwise, the Notebook will capture them and JavaScript won't do anything). This allows us to use the keyboard to set labels of 1 or 0, which is faster than using the mouse to click buttons, given that we will need to label at least 100 tweets.
Run the previous cell to embed some JavaScript into the page, although nothing will be shown to you in the results section. We are going to use a different magic function now, %%html. Unsurprisingly, this magic function allows us to directly embed HTML into our Notebook. In a new cell, start with this line:

%%html

For this cell, we will be coding in HTML and a little JavaScript. First, define a div element to store the current tweet to be labeled. I've also added some instructions for using this form. Then, create the #tweet_text div that will store the text of the next tweet to be labeled. As stated before, we also need to create a textbox to be able to capture key presses. The code is as follows:

<div name="tweetbox">
    Instructions: Click in the textbox. Enter a 1 if the tweet is relevant, enter 0 otherwise.<br>
    Tweet: <div id="tweet_text" value="text"></div><br>
    <input type="text" id="capture"></input><br>
</div>

Don't run the cell just yet!
Next, we create the JavaScript for capturing the key presses. This has to be defined after creating the form, as the #tweet_text div doesn't exist until the preceding code runs. We use the jQuery library (which IPython is already using, so we don't need to include the JavaScript file) to add a function that is called when key presses are made on the #capture textbox we defined. However, keep in mind that this is a %%html cell and not a JavaScript cell, so we need to enclose this JavaScript in <script> tags.
We are only interested in key presses if the user presses 0 or 1, in which case the relevant label is added. We can determine which key was pressed by the ASCII value stored in e.which. If the user presses 0 or 1, we append the label and clear out the textbox. The code is as follows:

<script>
$("input#capture").keypress(function(e) {
    if(e.which == 48) {
        set_label(0);
        $("input#capture").val("");
    } else if (e.which == 49){
        set_label(1);
        $("input#capture").val("");
    }
});

All other key presses are ignored. As a last bit of JavaScript for this article (I promise), we call the load_next_tweet() function. This will set the first tweet to be labeled and then close off the JavaScript. The code is as follows:

load_next_tweet();
</script>

After you run this cell, you will get an HTML textbox alongside the first tweet's text.
Click in the textbox and enter 1 if the tweet is relevant to our goal (in this case, that means the tweet is related to the programming language Python) and 0 if it is not. After you do this, the next tweet will load. Enter the label and the next one will load. This continues until the tweets run out.
When you finish all of this, simply save the labels to the output filename we defined earlier for the class values:

with open(labels_filename, 'w') as outf:
    json.dump(labels, outf)

You can call the preceding code even if you haven't finished. Any labeling you have done to that point will be saved, and running this Notebook again will pick up where you left off so that you can keep labeling your tweets. This might take a while! If you have a lot of tweets in your dataset, you'll need to classify all of them. If you are pushed for time, you can download the same dataset I used, which contains classifications.
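Before moving on, a quick sanity check on the labels can be helpful. The following short sketch (assuming the tweets and labels lists from this notebook are still in memory) reports how far through the labeling you are and how balanced the classes currently look:

from collections import Counter

# How many tweets have been labeled so far, and how many of each class?
print("Labeled {} of {} tweets".format(len(labels), len(tweets)))
print("Class balance:", Counter(labels))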
Creating a replicable dataset from Twitter
In data mining, there are lots of variables. These aren't just in the data mining algorithms; they also appear in the data collection, the environment, and many other factors. Being able to replicate your results is important, as it enables you to verify or improve upon them. Getting 80 percent accuracy on one dataset with algorithm X and 90 percent accuracy on another dataset with algorithm Y doesn't mean that Y is better. We need to be able to test on the same dataset in the same conditions to be able to compare them properly.
On running the preceding code, you will get a different dataset to the one I created and used. The main reason is that Twitter will return different search results for you than for me, based on when you performed the search. Even after that, your labeling of tweets might differ from mine. While there are obvious examples where a given tweet relates to the Python programming language, there will always be gray areas where the labeling isn't obvious. One tough gray area I ran into was tweets in non-English languages that I couldn't read. In this specific instance, there are options in Twitter's API for setting the language, but even these aren't going to be perfect.
Due to these factors, it is difficult to replicate experiments on datasets that are extracted from social media, and Twitter is no exception. Twitter explicitly disallows sharing datasets directly. One solution to this is to share only tweet IDs, which you can share freely. In this section, we will first create a tweet ID dataset that we can freely share. Then, we will see how to download the original tweets from this file to recreate the original dataset.
First, we save the replicable dataset of tweet IDs. Create another new IPython Notebook and set up the filenames. This is done in the same way we did for labeling, but there is a new filename where we can store the replicable dataset. The code is as follows:

import os
input_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "python_tweets.json")
labels_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "python_classes.json")
replicable_dataset = os.path.join(os.path.expanduser("~"), "Data", "twitter", "replicable_dataset.json")

We load the tweets and labels as we did in the previous notebook:

import json
tweets = []
with open(input_filename) as inf:
    for line in inf:
        if len(line.strip()) == 0:
            continue
        tweets.append(json.loads(line))
if os.path.exists(labels_filename):
    with open(labels_filename) as inf:
        labels = json.load(inf)

Now we create a dataset by looping over both the tweets and labels at the same time and saving them in a list:

dataset = [(tweet['id'], label) for tweet, label in zip(tweets, labels)]

Finally, we save the results in our file:

with open(replicable_dataset, 'w') as outf:
    json.dump(dataset, outf)

Now that we have the tweet IDs and labels saved, we can recreate the original dataset. If you are looking to recreate the dataset I used for this article, it can be found in the code bundle that comes with this book.
Loading the preceding dataset is not difficult, but it can take some time. Start a new IPython Notebook and set the dataset, label, and tweet ID filenames as before. I've adjusted the filenames here to ensure that you don't overwrite your previously collected dataset, but feel free to change these if you want. The code is as follows:

import os
tweet_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "replicable_python_tweets.json")
labels_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "replicable_python_classes.json")
replicable_dataset = os.path.join(os.path.expanduser("~"), "Data", "twitter", "replicable_dataset.json")

Then load the tweet IDs from the file using JSON:

import json
with open(replicable_dataset) as inf:
    tweet_ids = json.load(inf)

Extracting the labels from this dataset would be very easy: we could just iterate through it and pull them out with two lines of code (open the file and save the labels). However, we can't guarantee that we will get all the tweets we are after (for example, some may have been made private since collecting the dataset), and in that case the labels would be incorrectly indexed against the data. As an example, I tried to recreate the dataset just one day after collecting it, and already two of the tweets were missing (they might have been deleted or made private by their users). For this reason, it is important to keep only the labels for tweets that we actually recover. To do this, we first create an empty actual_labels list to store those labels, and then create a dictionary mapping the tweet IDs to the labels. The code is as follows:

actual_labels = []
label_mapping = dict(tweet_ids)

Next, we are going to create a Twitter connection to collect all of these tweets. This is going to take a little longer.
Import the twitter library that we used before, create an authorization token, and use it to create the twitter object:

import twitter
consumer_key = "<Your Consumer Key Here>"
consumer_secret = "<Your Consumer Secret Here>"
access_token = "<Your Access Token Here>"
access_token_secret = "<Your Access Token Secret Here>"
authorization = twitter.OAuth(access_token, access_token_secret, consumer_key, consumer_secret)
t = twitter.Twitter(auth=authorization)

Extract the tweet IDs into a list using the following command:

all_ids = [tweet_id for tweet_id, label in tweet_ids]

Then, we open our output file to save the tweets:

with open(tweet_filename, 'a') as output_file:

The Twitter API allows us to get 100 tweets at a time. Therefore, we iterate over each batch of 100 tweets:

    for start_index in range(0, len(tweet_ids), 100):

To search by ID, we first create a string that joins all of the IDs in this batch together:

        id_string = ",".join(str(i) for i in all_ids[start_index:start_index+100])

Next, we perform a statuses/lookup API call, which is defined by Twitter. We pass our list of IDs (which we turned into a string) into the API call in order to have those tweets returned to us:

        search_results = t.statuses.lookup(_id=id_string)

Then, for each tweet in the search results, we save it to our file in the same way we did when we were collecting the dataset originally:

        for tweet in search_results:
            if 'text' in tweet:
                output_file.write(json.dumps(tweet))
                output_file.write("\n\n")

As a final step here (and still under the preceding if block), we want to store the labeling of this tweet. We can do this using the label_mapping dictionary we created before, looking up the tweet ID. The code is as follows:

                actual_labels.append(label_mapping[tweet['id']])

Run the previous cell and the code will collect all of the tweets for you. If you created a really big dataset, this may take a while, as Twitter does rate-limit requests. As a final step, save the actual_labels to our classes file:

with open(labels_filename, 'w') as outf:
    json.dump(actual_labels, outf)
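Twitter's rate limits also apply to the statuses/lookup endpoint, so with a large ID list the loop above may be throttled. A hedged workaround is to pause briefly between batches; the pause length below is only a placeholder to tune against your own rate limit, and the loop body is otherwise the same as the one shown above:

import time

for start_index in range(0, len(all_ids), 100):
    id_string = ",".join(str(i) for i in all_ids[start_index:start_index+100])
    search_results = t.statuses.lookup(_id=id_string)
    # ... write the tweets and append the labels exactly as in the loop above ...
    time.sleep(5)  # placeholder delay between batches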
Text transformers
Now that we have our dataset, how are we going to perform data mining on it? Text-based datasets include books, essays, websites, manuscripts, programming code, and other forms of written expression. All of the algorithms we have seen so far deal with numerical or categorical features, so how do we convert our text into a format that the algorithms can deal with? There are a number of measurements that could be taken. For instance, average word and average sentence length are used to predict the readability of a document. However, there are many other feature types, such as word occurrence, which we will now investigate.
Bag-of-words
One of the simplest but highly effective models is to simply count each word in the dataset. We create a matrix where each row represents a document in our dataset and each column represents a word. The value of a cell is the frequency of that word in the document. Here's an excerpt from The Lord of the Rings by J.R.R. Tolkien:

Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in halls of stone,
Nine for Mortal Men, doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them.
In the Land of Mordor where the Shadows lie.
                        - J.R.R. Tolkien's epigraph to The Lord of The Rings

The word the appears nine times in this quote, while the words in, for, to, and one each appear four times. The word ring appears three times, as does the word of. We can create a dataset from this, choosing a subset of words and counting the frequency:

Word      | the | one | ring | to
Frequency |  9  |  4  |  3   |  4

We can use the Counter class to do a simple count for a given string. When counting words, it is normal to convert all letters to lowercase, which we do when creating the string. The code is as follows:

s = """Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in halls of stone,
Nine for Mortal Men, doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them.
In the Land of Mordor where the Shadows lie. """.lower()
words = s.split()
from collections import Counter
c = Counter(words)

Printing c.most_common(5) gives the list of the top five most frequently occurring words. Ties are not handled well, as only five entries are given and a very large number of words all share a tie for fifth place.
The bag-of-words model has three major types. The first is to use the raw frequencies, as shown in the preceding example. This has a drawback when documents vary in size from few words to many words, as the overall values will be very different. The second type is to use the normalized frequency, where each document's counts sum to 1. This is a much better solution, as the length of the document doesn't matter as much. The third type is to simply use binary features: a value of 1 if the word occurs at all and 0 if it doesn't. We will use the binary representation in this article.
Another popular (arguably more popular) method for performing normalization is called term frequency - inverse document frequency, or tf-idf. In this weighting scheme, term counts are first normalized to frequencies and then each term is down-weighted according to the number of documents in the corpus in which it appears.
There are a number of libraries for working with text data in Python. We will use a major one, called the Natural Language Toolkit (NLTK). The scikit-learn library also has the CountVectorizer class that performs a similar action, and it is recommended that you take a look at it. However, the NLTK version has more options for word tokenization. If you are doing natural language processing in Python, NLTK is a great library to use.
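To make the three bag-of-words variants concrete, the following small sketch derives raw counts, normalized frequencies, and binary features from the Counter built above (this is purely illustrative; later in the article NLTK and scikit-learn do this work for us):

total = sum(c.values())
raw_counts = dict(c)                   # e.g. raw_counts['the'] is 9 for the quote above
normalized = {word: count / total for word, count in c.items()}   # frequencies summing to 1
binary = {word: True for word in c}    # occurrence only, ignoring how often a word appears
print(raw_counts['the'], round(normalized['the'], 3), binary['the'])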
N-grams
A step up from single bag-of-words features is that of n-grams. An n-gram is a subsequence of n consecutive tokens. In this context, a word n-gram is a set of n words that appear in a row. They are counted the same way, with each n-gram treated as a "word" that is put in the bag. The value of a cell in this dataset is the frequency with which a particular n-gram appears in the given document.
The value of n is a parameter. For English, setting it to between 2 and 5 is a good start, although some applications call for higher values. As an example, for n=3, we extract the first few n-grams in the following quote:

Always look on the bright side of life.

The first n-gram (of size 3) is Always look on, the second is look on the, and the third is on the bright. As you can see, the n-grams overlap and each covers three words.
Word n-grams have advantages over using single words. This simple concept introduces some context to word use by considering its local environment, without a large overhead of understanding the language computationally.
A disadvantage of using n-grams is that the matrix becomes even sparser; a given word n-gram is unlikely to appear twice. Especially in social media and other short documents, a word n-gram is unlikely to appear across many different tweets, unless it is a retweet. In larger documents, however, word n-grams are quite effective for many applications.
Another form of n-gram for text documents is the character n-gram. Rather than using sets of words, we simply use sets of characters (although character n-grams have lots of options for how they are computed!). This type of dataset can pick up on words that are misspelled, as well as providing other benefits. We will test character n-grams in this article.
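As a concrete illustration, the following sketch extracts word and character n-grams using a plain sliding window (NLTK also ships helpers for this, but the underlying idea is just this window):

def extract_ngrams(tokens, n):
    # Slide a window of size n across the token sequence
    return [tuple(tokens[i:i+n]) for i in range(len(tokens) - n + 1)]

sentence = "Always look on the bright side of life"
print(extract_ngrams(sentence.split(), 3)[:3])
# [('Always', 'look', 'on'), ('look', 'on', 'the'), ('on', 'the', 'bright')]
print(extract_ngrams(list(sentence.lower()), 3)[:3])
# [('a', 'l', 'w'), ('l', 'w', 'a'), ('w', 'a', 'y')]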
Other features
There are other features that can be extracted too. These include syntactic features, such as the usage of particular words in sentences. Part-of-speech tags are also popular for data mining applications that need to understand meaning in text. Such feature types won't be covered in this book. If you are interested in learning more, I recommend Python 3 Text Processing with NLTK 3 Cookbook, Jacob Perkins, Packt Publishing.
Naive Bayes
Naive Bayes is a probabilistic model that is, unsurprisingly, built upon a naive interpretation of Bayesian statistics. Despite the naive aspect, the method performs very well in a large number of contexts. It can be used for classification of many different feature types and formats, but we will focus on one in this article: binary features in the bag-of-words model.
Bayes' theorem
For most of us, when we were taught statistics, we started from a frequentist approach. In this approach, we assume the data comes from some distribution and we aim to determine what the parameters are for that distribution. However, those parameters are (perhaps incorrectly) assumed to be fixed. We use our model to describe the data, even testing to ensure the data fits our model.
Bayesian statistics instead models how people (non-statisticians) actually reason: we have some data and we use that data to update our model of how likely something is to occur. In Bayesian statistics, we use the data to describe the model rather than using a model and confirming it with data (as per the frequentist approach).
Bayes' theorem computes the value of P(A|B), that is, knowing that B has occurred, what is the probability of A. In most cases, B is an observed event, such as it rained yesterday, and A is a prediction, such as it will rain today. For data mining, B is usually we observed this sample and A is it belongs to this class. We will see how to use Bayes' theorem for data mining in the next section. The equation for Bayes' theorem is:

P(A|B) = P(B|A) P(A) / P(B)

As an example, we want to determine the probability that an e-mail containing the word drugs is spam (as we believe that such a message may be pharmaceutical spam).
A, in this context, is the probability that this e-mail is spam. We can compute P(A), called the prior belief, directly from a training dataset by computing the percentage of e-mails in our dataset that are spam. If our dataset contains 30 spam messages for every 100 e-mails, P(A) is 30/100 or 0.3.
B, in this context, is this e-mail contains the word 'drugs'. Likewise, we can compute P(B) by computing the percentage of e-mails in our dataset containing the word drugs. If 10 e-mails in every 100 of our training dataset contain the word drugs, P(B) is 10/100 or 0.1. Note that we don't care whether the e-mail is spam or not when computing this value.
P(B|A) is the probability that an e-mail contains the word drugs if it is spam. It is also easy to compute from our training dataset: we look through our training set for spam e-mails and compute the percentage of them that contain the word drugs. Of our 30 spam e-mails, if 6 contain the word drugs, then P(B|A) is calculated as 6/30 or 0.2.
From here, we use Bayes' theorem to compute P(A|B), which is the probability that an e-mail containing the word drugs is spam. Using the previous equation, we see that the result is (0.2 x 0.3) / 0.1 = 0.6. This indicates that if an e-mail has the word drugs in it, there is a 60 percent chance that it is spam.
Note the empirical nature of the preceding example: we use evidence directly from our training dataset, not from some preconceived distribution. In contrast, a frequentist view would rely on us creating a distribution of the probability of words in e-mails to compute similar equations.
Naive Bayes algorithm
Looking back at our Bayes' theorem equation, we can use it to compute the probability that a given sample belongs to a given class. This allows the equation to be used as a classification algorithm.
With C as a given class and D as a sample in our dataset, we create the elements necessary for Bayes' theorem, and subsequently Naive Bayes. Naive Bayes is a classification algorithm that utilizes Bayes' theorem to compute the probability that a new data sample belongs to a particular class.
P(C) is the probability of a class, which is computed from the training dataset itself (as we did with the spam example). We simply compute the percentage of samples in our training dataset that belong to the given class.
P(D) is the probability of a given data sample. It can be difficult to compute this, as the sample is a complex interaction between different features, but luckily it is constant across all classes. Therefore, we don't need to compute it at all. We will see later how to get around this issue.
P(D|C) is the probability of the data point belonging to the class. This could also be difficult to compute due to the different features. However, this is where we introduce the naive part of the Naive Bayes algorithm. We naively assume that each feature is independent of the others. Rather than computing the full probability of P(D|C), we compute the probability of each feature D1, D2, D3, and so on, and then multiply them together:

P(D|C) = P(D1|C) x P(D2|C) x ... x P(Dn|C)

Each of these values is relatively easy to compute with binary features; we simply compute the percentage of times the feature takes that value in our training dataset. In contrast, if we were to perform a non-naive Bayes version of this part, we would need to compute the correlations between the different features for each class. Such computation is infeasible at best, and nearly impossible without vast amounts of data or adequate language analysis models.
From here, the algorithm is straightforward. We compute P(C|D) for each possible class, ignoring the P(D) term, and then choose the class with the highest probability. As the P(D) term is consistent across each of the classes, ignoring it has no impact on the final prediction.
How it works
As an example, suppose we have the following (binary) feature values from a sample in our dataset: [1, 0, 0, 1]. Our training dataset contains two classes, with 75 percent of samples belonging to class 0 and 25 percent belonging to class 1.
The likelihood of each feature having the value 1, for each class, is as follows:

For class 0: [0.3, 0.4, 0.4, 0.7]
For class 1: [0.7, 0.3, 0.4, 0.9]

These values are to be interpreted as: for feature 1, it is a 1 in 30 percent of cases for class 0.
We can now compute the probability that this sample should belong to class 0. P(C=0) = 0.75, which is the probability that the class is 0. P(D) isn't needed for the Naive Bayes algorithm. Let's take a look at the calculation:

P(D|C=0) = P(D1|C=0) x P(D2|C=0) x P(D3|C=0) x P(D4|C=0)
         = 0.3 x 0.6 x 0.6 x 0.7
         = 0.0756

The second and third values are 0.6, because the value of that feature in the sample was 0. The listed probabilities are for values of 1 for each feature. Therefore, the probability of a 0 is its inverse: P(0) = 1 - P(1).
Now, we can compute the probability of the data point belonging to this class. An important point to note is that we haven't computed P(D), so this isn't a real probability. However, it is good enough to compare against the same value for class 1. Let's take a look at the calculation:

P(C=0|D) = P(C=0) P(D|C=0)
         = 0.75 x 0.0756
         = 0.0567

Now, we compute the same values for class 1. P(C=1) = 0.25, and again P(D) isn't needed for Naive Bayes. Let's take a look at the calculation:

P(D|C=1) = P(D1|C=1) x P(D2|C=1) x P(D3|C=1) x P(D4|C=1)
         = 0.7 x 0.7 x 0.6 x 0.9
         = 0.2646

P(C=1|D) = P(C=1) P(D|C=1)
         = 0.25 x 0.2646
         = 0.06615

Normally, P(C=0|D) and P(C=1|D) should sum to 1. After all, those are the only two possible options! However, the values here do not sum to 1 because we haven't included the computation of P(D) in our equations.
The data point should be classified as belonging to class 1. You may have guessed this while going through the equations anyway; however, you may have been a bit surprised that the final decision was so close. After all, the probabilities in computing P(D|C) were much, much higher for class 1. This is because we introduced a prior belief that most samples generally belong to class 0. If the classes had been of equal size, the resulting probabilities would be much different. Try it yourself by changing both P(C=0) and P(C=1) to 0.5 for equal class sizes and computing the result again.
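To double-check the arithmetic, here is a short sketch that reproduces the worked example in code; the numbers are the ones from the example above, not values learned from data:

priors = {0: 0.75, 1: 0.25}
# Probability that each feature equals 1, per class (from the example above)
likelihood_of_one = {0: [0.3, 0.4, 0.4, 0.7],
                     1: [0.7, 0.3, 0.4, 0.9]}
sample = [1, 0, 0, 1]

def class_score(c):
    score = priors[c]
    for value, p_one in zip(sample, likelihood_of_one[c]):
        # Use P(1) when the feature is 1, and 1 - P(1) when it is 0
        score *= p_one if value == 1 else (1 - p_one)
    return score

print(class_score(0))  # approximately 0.0567
print(class_score(1))  # approximately 0.06615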
Application
We will now create a pipeline that takes a tweet and determines whether it is relevant or not, based only on the content of that tweet.
To perform the word extraction, we will be using NLTK, a library that contains a large number of tools for performing analysis on natural language. We will use NLTK in future articles as well. To get NLTK on your computer, use pip to install the package:

pip3 install nltk

If that doesn't work, see the NLTK installation instructions at www.nltk.org/install.html.
We are going to create a pipeline to extract the word features and classify the tweets using Naive Bayes. Our pipeline has the following steps:

1. Transform the original text documents into a dictionary of counts using NLTK's word_tokenize function.
2. Transform those dictionaries into a vector matrix using the DictVectorizer transformer in scikit-learn. This is necessary to enable the Naive Bayes classifier to read the feature values extracted in the first step.
3. Train the Naive Bayes classifier, as we have seen in previous articles.

We will need to create another Notebook (the last one for this article!) called ch6_classify_twitter for performing the classification.
Extracting word counts
We are going to use NLTK to extract our word counts. We still want to use it in a pipeline, but NLTK doesn't conform to our transformer interface. We will therefore need to create a basic transformer with both fit and transform methods, enabling us to use it in a pipeline.
First, set up the transformer class. We don't need to fit anything in this class, as this transformer simply extracts the words in the document. Therefore, our fit is an empty function, except that it returns self, which is necessary for transformer objects. Our transform is a little more complicated. We want to extract each word from each document and record True if it was discovered. We are only using binary features here: True if the word is in the document, False otherwise. If we wanted to use the frequency, we would set up counting dictionaries instead. Let's take a look at the code (the word_tokenize import is included here so that the class runs on its own):

from sklearn.base import TransformerMixin
from nltk import word_tokenize

class NLTKBOW(TransformerMixin):
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return [{word: True for word in word_tokenize(document)}
                for document in X]

The result is a list of dictionaries, where the first dictionary contains the words in the first tweet, and so on. Each dictionary has a word as the key and the value True to indicate that the word was discovered. Any word not in a dictionary is assumed not to have occurred in that tweet. Explicitly stating that a word's occurrence is False would also work, but it would take up needless space to store.
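A quick way to check that the transformer behaves as expected is to run it on a couple of short example strings (this assumes NLTK's tokenizer models have been downloaded, for example via nltk.download('punkt')):

bow = NLTKBOW()
sample_docs = ["I love the python programming language",
               "my python ate a mouse"]
for features in bow.transform(sample_docs):
    print(features)
# {'I': True, 'love': True, 'the': True, 'python': True, ...}
# {'my': True, 'python': True, 'ate': True, 'a': True, 'mouse': True}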
Converting dictionaries to a matrix
This step converts the dictionaries built in the previous step into a matrix that can be used with a classifier. This step is made quite simple by the DictVectorizer transformer.
The DictVectorizer class simply takes a list of dictionaries and converts them into a matrix. The features in this matrix are the keys in each of the dictionaries, and the values correspond to the occurrence of those features in each sample. Dictionaries are easy to create in code, but many data algorithm implementations prefer matrices. This makes DictVectorizer a very useful class.
In our dataset, each dictionary has words as keys, and a word only appears as a key if it actually occurs in the tweet. Therefore, our matrix will have each word as a feature and a value of True in the cell if the word occurred in the tweet. To use DictVectorizer, simply import it using the following command:

from sklearn.feature_extraction import DictVectorizer

Training the Naive Bayes classifier
Finally, we need to set up a classifier, and we are using Naive Bayes for this article. As our dataset contains only binary features, we use the BernoulliNB classifier, which is designed for binary features. As a classifier, it is very easy to use. As with DictVectorizer, we simply import it and add it to our pipeline:

from sklearn.naive_bayes import BernoulliNB

Putting it all together
Now comes the moment to put all of these pieces together. In our IPython Notebook, set the filenames and load the dataset and classes as we have done before. Set the filenames for both the tweets themselves (not the IDs!) and the labels that we assigned to them. The code is as follows:

import os
import json
input_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "python_tweets.json")
labels_filename = os.path.join(os.path.expanduser("~"), "Data", "twitter", "python_classes.json")

Load the tweets themselves. We are only interested in the content of the tweets, so we extract the text value and store only that. The code is as follows:

tweets = []
with open(input_filename) as inf:
    for line in inf:
        if len(line.strip()) == 0:
            continue
        tweets.append(json.loads(line)['text'])

Load the labels for each of the tweets:

with open(labels_filename) as inf:
    labels = json.load(inf)

Now, create a pipeline putting together the components from before. Our pipeline has three parts:

The NLTKBOW transformer we created
A DictVectorizer transformer
A BernoulliNB classifier

The code is as follows:

from sklearn.pipeline import Pipeline
pipeline = Pipeline([('bag-of-words', NLTKBOW()),
                     ('vectorizer', DictVectorizer()),
                     ('naive-bayes', BernoulliNB())
                     ])

We can nearly run our pipeline now, which we will do with cross_val_score as we have done many times before. Before that, though, we will introduce a better evaluation metric than the accuracy metric we used before. As we will see, accuracy is not adequate for datasets where the number of samples in each class differs.
Evaluation using the F1-score
When choosing an evaluation metric, it is always important to consider cases where that evaluation metric is not useful. Accuracy is a good evaluation metric in many cases, as it is easy to understand and simple to compute. However, it can be easily faked. In other words, in many cases you can create algorithms that have a high accuracy but poor utility.
While our dataset of tweets contains roughly 50 percent programming-related and 50 percent nonprogramming tweets (your results may vary), many datasets aren't as balanced as this. As an example, an e-mail spam filter may expect more than 80 percent of incoming e-mails to be spam. A spam filter that simply labels everything as spam is quite useless; however, it will obtain an accuracy of 80 percent!
To get around this problem, we can use other evaluation metrics. One of the most commonly employed is called the f1-score (also called the f-score, f-measure, or one of many other variations on this term). The f1-score is defined on a per-class basis and is based on two concepts: precision and recall. The precision is the percentage of the samples that were predicted as belonging to a specific class that actually belong to that class. The recall is the percentage of samples in the dataset that belong to a class and were actually labeled as belonging to that class.
In the case of our application, we could compute the value for both classes (relevant and not relevant). However, we are really interested in the relevant tweets. Therefore, our precision computation becomes the question: of all the tweets that were predicted as being relevant, what percentage were actually relevant? Likewise, the recall becomes the question: of all the relevant tweets in the dataset, how many were predicted as being relevant?
After you compute both the precision and recall, the f1-score is their harmonic mean:

F1 = 2 x (precision x recall) / (precision + recall)
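For example (with made-up numbers, purely for illustration): if a classifier predicts 50 tweets as relevant, 40 of which really are relevant, and the dataset contains 80 relevant tweets in total, then the precision is 40/50 = 0.8, the recall is 40/80 = 0.5, and the f1-score is about 0.615. The same calculation in code:

true_positives = 40       # predicted relevant and actually relevant
predicted_positive = 50   # everything predicted as relevant
actual_positive = 80      # everything actually relevant in the dataset

precision = true_positives / predicted_positive   # 0.8
recall = true_positives / actual_positive         # 0.5
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.615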
To use the f1-score in scikit-learn methods, simply set the scoring parameter to 'f1'. By default, this will return the f1-score of the class with the label 1. Running the code on our dataset, we simply use the following lines (the cross_val_score import is shown for completeness; in newer versions of scikit-learn it lives in sklearn.model_selection rather than sklearn.cross_validation):

from sklearn.cross_validation import cross_val_score
scores = cross_val_score(pipeline, tweets, labels, scoring='f1')

We then print out the average of the scores:

import numpy as np
print("Score: {:.3f}".format(np.mean(scores)))

The result is 0.798, which means we can accurately determine whether a tweet mentioning python relates to the programming language nearly 80 percent of the time. This is using a dataset with only 200 tweets in it. Go back and collect more data and you will find that the results improve! More data usually means better accuracy, but it is not guaranteed!
Getting useful features from models
One question you may ask is: what are the best features for determining whether a tweet is relevant or not? We can extract this information from our Naive Bayes model and find out which features are the best individually, according to Naive Bayes.
First, we fit a new model. While cross_val_score gives us a score across different folds of cross-validated testing data, it doesn't easily give us the trained models themselves. To do this, we simply fit our pipeline with the tweets, creating a new model. The code is as follows:

model = pipeline.fit(tweets, labels)

Note that we aren't really evaluating the model here, so we don't need to be as careful with the training/testing split. However, before you put these features into practice, you should evaluate them on a separate test split. We skip over that here for the sake of clarity.
A pipeline gives you access to the individual steps through the named_steps attribute and the name of the step (we defined these names ourselves when we created the pipeline object). For instance, we can get the Naive Bayes model:

nb = model.named_steps['naive-bayes']

From this model, we can extract the probabilities for each word. These are stored as log probabilities, which is simply log(P(A|f)), where f is a given feature. For BernoulliNB they are available through the feature_log_prob_ attribute, so we store them first (this line is needed for the sorting code below to run):

feature_probabilities = nb.feature_log_prob_

The reason these are stored as log probabilities is that the actual values are very low. For instance, the first value is -3.486, which corresponds to a probability of about 0.03 (around 3 percent). Logarithms of probabilities are used in computations involving small probabilities like this, as they prevent underflow errors where very small values are simply rounded to zero. Given that all of the probabilities are multiplied together, a single value of 0 would result in the whole answer always being 0! Regardless, the relationship between values is still the same: the higher the value, the more useful that feature is.
We can get the most useful features by sorting the array of log probabilities. We want descending order, so we simply negate the values first. The code is as follows:

top_features = np.argsort(-feature_probabilities[1])[:50]

The preceding code gives us just the indices, not the actual feature names, which isn't very useful on its own, so we map the feature indices to the actual values. The key is the DictVectorizer step of the pipeline, which created the matrices for us. Luckily, it also records the mapping, allowing us to find the feature names that correspond to the different columns. We can extract that part of the pipeline as follows:

dv = model.named_steps['vectorizer']

From here, we can print out the names of the top features by looking them up in the feature_names_ attribute of DictVectorizer. Enter the following lines into a new cell and run it to print out a list of the top features:

for i, feature_index in enumerate(top_features):
    print(i, dv.feature_names_[feature_index], np.exp(feature_probabilities[1][feature_index]))

The first few features include :, http, # and @. These are likely to be noise (although the use of a colon is not very common outside programming), based on the data we collected. Collecting more data is critical to smoothing out these issues.
Looking through the list, though, we get a number of more obvious programming features:

7 for 0.188679245283
11 with 0.141509433962
28 installing 0.0660377358491
29 Top 0.0660377358491
34 Developer 0.0566037735849
35 library 0.0566037735849
36 ] 0.0566037735849
37 [ 0.0566037735849
41 version 0.0471698113208
43 error 0.0471698113208

There are some others too that refer to Python in a work context, and therefore might be referring to the programming language (although freelance snake handlers may also use similar terms, they are less common on Twitter):

22 jobs 0.0660377358491
30 looking 0.0566037735849
31 Job 0.0566037735849
34 Developer 0.0566037735849
38 Freelancer 0.0471698113208
40 projects 0.0471698113208
47 We're 0.0471698113208

That last one usually appears in phrases of the form: We're looking for a candidate for this job.
Looking through these features gives us quite a few benefits. We could train people to recognize these tweets, look for commonalities (which give insight into a topic), or even get rid of features that make no sense. For example, the word RT appears quite high in this list; however, this is a common Twitter term for retweet (that is, forwarding on someone else's tweet). An expert could decide to remove this word from the list, making the classifier less prone to the noise we introduced by having a small dataset.
Summary
In this article, we looked at text mining: how to extract features from text, how to use those features, and ways of extending those features. In doing this, we looked at putting a tweet in context: was this tweet mentioning python referring to the programming language? We downloaded data from a web-based API, getting tweets from the popular microblogging website Twitter. This gave us a dataset that we labeled using a form we built directly in the IPython Notebook.
We also looked at reproducibility of experiments. While Twitter doesn't allow you to send copies of your data to others, it does allow you to send the tweets' IDs. Using this, we created code that saved the IDs and recreated most of the original dataset. Not all tweets were returned; some had been deleted in the time between when the ID list was created and when the dataset was reproduced.
We used a Naive Bayes classifier to perform our text classification. This is built upon Bayes' theorem, which uses data to update the model, unlike the frequentist method that often starts with the model first. This allows the model to incorporate new data as it arrives and to include a prior belief. In addition, the naive part allows us to easily compute the probabilities without dealing with complex correlations between features.
The features we extracted were word occurrences: did this word occur in this tweet? This model is called bag-of-words. While it discards information about where a word was used, it still achieves a high accuracy on many datasets. The entire pipeline of using the bag-of-words model with Naive Bayes is quite robust. You will find that it can achieve quite good scores on most text-based tasks, and it is a great baseline to have before you try more advanced models. As another advantage, the Naive Bayes classifier doesn't have any parameters that need to be set (although there are some if you wish to do some tinkering).
In the next article, we will look at extracting features from another type of data, graphs, in order to make recommendations on who to follow on social media.
Dynamic Graphics

Packt
22 Feb 2016
64 min read
There is no question that the rendering system of modern graphics devices is complicated. Even rendering a single triangle to the screen engages many of these components, since GPUs are designed for large amounts of parallelism, as opposed to CPUs, which are designed to handle virtually any computational scenario. Modern graphics rendering is a high-speed dance of processing and memory management that spans software, hardware, multiple memory spaces, multiple languages, multiple processors, multiple processor types, and a large number of special-case features that can be thrown into the mix. To make matters worse, every graphics situation we will come across is different in its own way. Running the same application against a different device, even by the same manufacturer, often results in an apples-versus-oranges comparison due to the different capabilities and functionality they provide. It can be difficult to determine where a bottleneck resides within such a complex chain of devices and systems, and it can take a lifetime of industry work in 3D graphics to have a strong intuition about the source of performance issues in modern graphics systems. Thankfully, Profiling comes to the rescue once again. If we can gather data about each component, use multiple performance metrics for comparison, and tweak our Scenes to see how different graphics features affect their behavior, then we should have sufficient evidence to find the root cause of the issue and make appropriate changes. So in this article, you will learn how to gather the right data, dig just deep enough into the graphics system to find the true source of the problem, and explore various solutions to work around a given problem. There are many more topics to cover when it comes to improving rendering performance, so in this article we will begin with some general techniques on how to determine whether our rendering is limited by the CPU or by the GPU, and what we can do about either case. We will discuss optimization techniques such as Occlusion Culling and Level of Detail (LOD) and provide some useful advice on Shader optimization, as well as large-scale rendering features such as lighting and shadows. Finally, since mobile devices are a common target for Unity projects, we will also cover some techniques that may help improve performance on limited hardware. (For more resources related to this topic, see here.) Profiling rendering issues Poor rendering performance can manifest itself in a number of ways, depending on whether the device is CPU-bound, or GPU-bound; in the latter case, the root cause could originate from a number of places within the graphics pipeline. This can make the investigatory stage rather involved, but once the source of the bottleneck is discovered and the problem is resolved, we can expect significant improvements as small fixes tend to reap big rewards when it comes to the rendering subsystem. The CPU sends rendering instructions through the graphics API, that funnel through the hardware driver to the GPU device, which results in commands entering the GPU's Command Buffer. These commands are processed by the massively parallel GPU system one by one until the buffer is empty. But there are a lot more nuances involved in this process. 
The following shows a (greatly simplified) diagram of a typical GPU pipeline (which can vary based on technology and various optimizations), and the broad rendering steps that take place during each stage: The top row represents the work that takes place on the CPU, the act of calling into the graphics API, through the hardware driver, and pushing commands into the GPU. Ergo, a CPU-bound application will be primarily limited by the complexity, or sheer number, of graphics API calls. Meanwhile, a GPU-bound application will be limited by the GPU's ability to process those calls, and empty the Command Buffer in a reasonable timeframe to allow for the intended frame rate. This is represented in the next two rows, showing the steps taking place in the GPU. But, because of the device's complexity, they are often simplified into two different sections: the front end and the back end. The front end refers to the part of the rendering process where the GPU has received mesh data, a draw call has been issued, and all of the information that was fed into the GPU is used to transform vertices and run through Vertex Shaders. Finally, the rasterizer generates a batch of fragments to be processed in the back end. The back end refers to the remainder of the GPU's processing stages, where fragments have been generated, and now they must be tested, manipulated, and drawn via Fragment Shaders onto the frame buffer in the form of pixels. Note that "Fragment Shader" is the more technically accurate term for Pixel Shaders. Fragments are generated by the rasterization stage, and only technically become pixels once they've been processed by the Shader and drawn to the Frame Buffer. There are a number of different approaches we can use to determine where the root cause of a graphics rendering issue lies: Profiling the GPU with the Profiler Examining individual frames with the Frame Debugger Brute Force Culling GPU profiling Because graphics rendering involves both the CPU and GPU, we must examine the problem using both the CPU Usage and GPU Usage areas of the Profiler as this can tell us which component is working hardest. For example, the following screenshot shows the Profiler data for a CPU-bound application. The test involved creating thousands of simple objects, with no batching techniques taking place. This resulted in an extremely large Draw Call count (around 15,000) for the CPU to process, but giving the GPU relatively little work to do due to the simplicity of the objects being rendered: This example shows that the CPU's "rendering" task is consuming a large amount of cycles (around 30 ms per frame), while the GPU is only processing for less than 16 ms, indicating that the bottleneck resides in the CPU. Meanwhile, Profiling a GPU-bound application via the Profiler is a little trickier. This time, the test involves creating a small number of high polycount objects (for a low Draw Call per vertex ratio), with dozens of real-time point lights and an excessively complex Shader with a texture, normal texture, heightmap, emission map, occlusion map, and so on, (for a high workload per pixel ratio). The following screenshot shows Profiler data for the example Scene when it is run in a standalone application: As we can see, the rendering task of the CPU Usage area matches closely with the total rendering costs of the GPU Usage area. We can also see that the CPU and GPU time costs at the bottom of the image are relatively similar (41.48 ms versus 38.95 ms). 
This is very unintuitive as we would expect the GPU to be working much harder than the CPU. Be aware that the CPU/GPU millisecond cost values are not calculated or revealed unless the appropriate Usage Area has been added to the Profiler window. However, let's see what happens when we test the same exact Scene through the Editor: This is a better representation of what we would expect to see in a GPU-bound application. We can see how the CPU and GPU time costs at the bottom are closer to what we would expect to see (2.74 ms vs 64.82 ms). However, this data is highly polluted. The spikes in the CPU and GPU Usage areas are the result of the Profiler Window UI updating during testing, and the overhead cost of running through the Editor is also artificially increasing the total GPU time cost. It is unclear what causes the data to be treated this way, and this could certainly change in the future if enhancements are made to the Profiler in future versions of Unity, but it is useful to know this drawback. Trying to determine whether our application is truly GPU-bound is perhaps the only good excuse to perform a Profiler test through the Editor. The Frame Debugger A new feature in Unity 5 is the Frame Debugger, a debugging tool that can reveal how the Scene is rendered and pieced together, one Draw Call at a time. We can click through the list of Draw Calls and observe how the Scene is rendered up to that point in time. It also provides a lot of useful details for the selected Draw Call, such as the current render target (for example, the shadow map, the camera depth texture, the main camera, or other custom render targets), what the Draw Call did (drawing a mesh, drawing a static batch, drawing depth shadows, and so on), and what settings were used (texture data, vertex colors, baked lightmaps, directional lighting, and so on). The following screenshot shows a Scene that is only being partially rendered due to the currently selected Draw Call within the Frame Debugger. Note the shadows that are visible from baked lightmaps that were rendered during an earlier pass before the object itself is rendered: If we are bound by Draw Calls, then this tool can be effective in helping us figure out what the Draw Calls are being spent on, and determine whether there are any unnecessary Draw Calls that are not having an effect on the scene. This can help us come up with ways to reduce them, such as removing unnecessary objects or batching them somehow. We can also use this tool to observe how many additional Draw Calls are consumed by rendering features, such as shadows, transparent objects, and many more. This could help us, when we're creating multiple quality levels for our game, to decide what features to enable/disable under the low, medium, and high quality settings. Brute force testing If we're poring over our Profiling data, and we're still not sure we can determine the source of the problem, we can always try the brute force method: cull a specific activity from the Scene and see if it results in greatly increased performance. If a small change results in a big speed improvement, then we have a strong clue about where the bottleneck lies. There's no harm in this approach if we eliminate enough unknown variables to be sure the data is leading us in the right direction. We will cover different ways to brute force test a particular issue in each of the upcoming sections. 
CPU-bound If our application is CPU-bound, then we will observe a generally poor FPS value within the CPU Usage area of the Profiler window due to the rendering task. However, if VSync is enabled the data will often get muddied up with large spikes representing pauses as the CPU waits for the screen refresh rate to come around before pushing the current frame buffer. So, we should make sure to disable the VSync block in the CPU Usage area before deciding the CPU is the problem. Brute-forcing a test for CPU-bounding can be achieved by reducing Draw Calls. This is a little unintuitive since, presumably, we've already been reducing our Draw Calls to a minimum through techniques such as Static and Dynamic Batching, Atlasing, and so forth. This would mean we have very limited scope for reducing them further. What we can do, however, is disable the Draw-Call-saving features such as batching and observe if the situation gets significantly worse than it already is. If so, then we have evidence that we're either already, or very close to being, CPU-bound. At this point, we should see whether we can re-enable these features and disable rendering for a few choice objects (preferably those with low complexity to reduce Draw Calls without over-simplifying the rendering of our scene). If this results in a significant performance improvement then, unless we can find further opportunities for batching and mesh combining, we may be faced with the unfortunate option of removing objects from our scene as the only means of becoming performant again. There are some additional opportunities for Draw Call reduction, including Occlusion Culling, tweaking our Lighting and Shadowing, and modifying our Shaders. These will be explained in the following sections. However, Unity's rendering system can be multithreaded, depending on the targeted platform, which version of Unity we're running, and various settings, and this can affect how the graphics subsystem is being bottlenecked by the CPU, and slightly changes the definition of what being CPU-bound means. Multithreaded rendering Multithreaded rendering was first introduced in Unity v3.5 in February 2012, and enabled by default on multicore systems that could handle the workload; at the time, this was only PC, Mac, and Xbox 360. Gradually, more devices were added to this list, and since Unity v5.0, all major platforms now enable multithreaded rendering by default (and possibly some builds of Unity 4). Mobile devices were also starting to feature more powerful CPUs that could support this feature. Android multithreaded rendering (introduced in Unity v4.3) can be enabled through a checkbox under Platform Settings | Other Settings | Multithreaded Rendering. Multithreaded rendering on iOS can be enabled by configuring the application to make use of the Apple Metal API (introduced in Unity v4.6.3), under Player Settings | Other Settings | Graphics API. When multithreaded rendering is enabled, tasks that must go through the rendering API (OpenGL, DirectX, or Metal), are handed over from the main thread to a "worker thread". The worker thread's purpose is to undertake the heavy workload that it takes to push rendering commands through the graphics API and driver, to get the rendering instructions into the GPU's Command Buffer. This can save an enormous number of CPU cycles for the main thread, where the overwhelming majority of other CPU tasks take place. This means that we free up extra cycles for the majority of the engine to process physics, script code, and so on. 
Incidentally, the mechanism by which the main thread notifies the worker thread of tasks operates in a very similar way to the Command Buffer that exists on the GPU, except that the commands are much more high-level, with instructions like "render this object, with this Material, using this Shader", or "draw N instances of this piece of procedural geometry", and so on. This feature has been exposed in Unity 5 to allow developers to take direct control of the rendering subsystem from C# code. This customization is not as powerful as having direct API access, but it is a step in the right direction for Unity developers to implement unique graphical effects. Confusingly, the Unity API name for this feature is called "CommandBuffer", so be sure not to confuse it with the GPU's Command Buffer. Check the Unity documentation on CommandBuffer to make use of this feature: http://docs.unity3d.com/ScriptReference/Rendering.CommandBuffer.html. Getting back to the task at hand, when we discuss the topic of being CPU-bound in graphics rendering, we need to keep in mind whether or not the multithreaded renderer is being used, since the actual root cause of the problem will be slightly different depending on whether this feature is enabled or not. In single-threaded rendering, where all graphics API calls are handled by the main thread, and in an ideal world where both components are running at maximum capacity, our application would become bottlenecked on the CPU when 50 percent or more of the time per frame is spent handling graphics API calls. However, resolving these bottlenecks can be accomplished by freeing up work from the main thread. For example, we might find that greatly reducing the amount of work taking place in our AI subsystem will improve our rendering significantly because we've freed up more CPU cycles to handle the graphics API calls. But, when multithreaded rendering is taking place, this task is pushed onto the worker thread, which means the same thread isn't being asked to manage both engine work and graphics API calls at the same time. These processes are mostly independent, and even though additional work must still take place in the main thread to send instructions to the worker thread in the first place (via the internal CommandBuffer system), it is mostly negligible. This means that reducing the workload in the main thread will have little-to-no effect on rendering performance. Note that being GPU-bound is the same regardless of whether multithreaded rendering is taking place. GPU Skinning While we're on the subject of CPU-bounding, one task that can help reduce CPU workload, at the expense of additional GPU workload, is GPU Skinning. Skinning is the process where mesh vertices are transformed based on the current location of their animated bones. The animation system, working on the CPU, only transforms the bones, but another step in the rendering process must take care of the vertex transformations to place the vertices around those bones, performing a weighted average over the bones connected to those vertices. This vertex processing task can either take place on the CPU or within the front end of the GPU, depending on whether the GPU Skinning option is enabled. This feature can be toggled under Edit | Project Settings | Player Settings | Other Settings | GPU Skinning. Front end bottlenecks It is not uncommon to use a mesh that contains a lot of unnecessary UV and Normal vector data, so our meshes should be double-checked for this kind of superfluous fluff. 
We should also let Unity optimize the structure for us, which minimizes cache misses as vertex data is read within the front end. We will also learn some useful Shader optimization techniques shortly, when we begin to discuss back end optimizations, since many optimization techniques apply to both Fragment and Vertex Shaders. The only attack vector left to cover is finding ways to reduce actual vertex counts. The obvious solutions are simplification and culling; either have the art team replace problematic meshes with lower polycount versions, and/or remove some objects from the scene to reduce the overall polygon count. If these approaches have already been explored, then the last approach we can take is to find some kind of middle ground between the two. Level Of Detail Since it can be difficult to tell the difference between a high quality distant object and a low quality one, there is very little reason to render the high quality version. So, why not dynamically replace distant objects with something more simplified? Level Of Detail (LOD) is a broad term referring to the dynamic replacement of features based on their distance or form factor relative to the camera. The most common implementation is mesh-based LOD: dynamically replacing a mesh with lower and lower detailed versions as the camera gets farther and farther away. Another example might be replacing animated characters with versions featuring fewer bones, or less sampling for distant objects, in order to reduce animation workload. The built-in LOD feature is available in the Unity 4 Pro Edition and all editions of Unity 5. However, it is entirely possible to implement it via Script code in Unity 4 Free Edition if desired. Making use of LOD can be achieved by placing multiple objects in the Scene and making them children of a GameObject with an attached LODGroup component. The LODGroup's purpose is to generate a bounding box from these objects, and decide which object should be rendered based on the size of the bounding box within the camera's field of view. If the object's bounding box consumes a large area of the current view, then it will enable the mesh(es) assigned to lower LOD groups, and if the bounding box is very small, it will replace the mesh(es) with those from higher LOD groups. If the mesh is too far away, it can be configured to hide all child objects. So, with the proper setup, we can have Unity replace meshes with simpler alternatives, or cull them entirely, which eases the burden on the rendering process. Check the Unity documentation for more detailed information on the LOD feature: http://docs.unity3d.com/Manual/LevelOfDetail.html. This feature can cost us a large amount of development time to fully implement; artists must generate lower polygon count versions of the same object, and level designers must generate LOD groups, configure them, and test them to ensure they don't cause jarring transitions as the camera moves closer or farther away. It also costs us in memory and runtime CPU; the alternative meshes need to be kept in memory, and the LODGroup component must routinely test whether the camera has moved to a new position that warrants a change in LOD level. In this era of graphics card capabilities, vertex processing is often the least of our concerns. Combined with the additional sacrifices needed for LOD to function, developers should avoid preoptimizing by automatically assuming LOD will help them.
Excessive use of the feature will lead to burdening other parts of our application's performance, and chew up precious development time, all for the sake of paranoia. If it hasn't been proven to be a problem, then it's probably not a problem! Scenes that feature large, expansive views of the world, and lots of camera movement, should consider implementing this technique very early, as the added distance and massive number of visible objects will exacerbate the vertex count enormously. Scenes that are always indoors, or feature a camera with a viewpoint looking down at the world (real-time strategy and MOBA games, for example) should probably steer clear of implementing LOD from the beginning. Games somewhere between the two should avoid it until necessary. It all depends on how many vertices are expected to be visible at any given time and how much variability in camera distance there will be. Note that some game development middleware companies offer third-party tools for automated LOD mesh generation. These might be worth investigating to compare their ease of use versus quality loss versus cost effectiveness. Disable GPU Skinning As previously mentioned, we could enable GPU Skinning to reduce the burden on a CPU-bound application, but enabling this feature will push the same workload into the front end of the GPU. Since Skinning is one of those "embarrassingly parallel" processes that fits well with the GPU's parallel architecture, it is often a good idea to perform the task on the GPU. But this task can chew up precious time in the front end preparing the vertices for fragment generation, so disabling it is another option we can explore if we're bottlenecked in this area. Again, this feature can be toggled under Edit | Project Settings | Player Settings | Other Settings | GPU Skinning. GPU Skinning is available in Unity 4 Pro Edition, and all editions of Unity 5. Reduce tessellation There is one last task that takes place in the front end process and that we need to consider: tessellation. Tessellation through Geometry Shaders can be a lot of fun, as it is a relatively underused technique that can really make our graphical effects stand out from the crowd of games that only use the most common effects. But, it can contribute enormously to the amount of processing work taking place in the front end. There are no simple tricks we can exploit to improve tessellation, besides improving our tessellation algorithms, or easing the burden caused by other front end tasks to give our tessellation tasks more room to breathe. Either way, if we have a bottleneck in the front end and are making use of tessellation techniques, we should double-check that they are not consuming the lion's share of the front end's budget. Back end bottlenecks The back end is the more interesting part of the GPU pipeline, as many more graphical effects take place during this stage. Consequently, it is the stage that is significantly more likely to suffer from bottlenecks. There are two brute force tests we can attempt: Reduce resolution Reduce texture quality These changes will ease the workload during two important stages at the back end of the pipeline: fill rate and memory bandwidth, respectively. Fill rate tends to be the most common source of bottlenecks in the modern era of graphics rendering, so we will cover it first. Fill rate By reducing screen resolution, we have asked the rasterization system to generate significantly fewer fragments and transpose them over a smaller canvas of pixels. 
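This brute-force test can also be wired up from script so that we can toggle it mid-game. The following is a minimal sketch; the key bindings and the target resolutions are hypothetical:

using UnityEngine;

// Hypothetical brute-force fill rate test: drop the output resolution at runtime.
// If the frame rate improves significantly, fill rate is a likely suspect.
public class FillRateTest : MonoBehaviour
{
    void Update()
    {
        if (Input.GetKeyDown(KeyCode.F1))
        {
            Screen.SetResolution(800, 600, Screen.fullScreen);    // low resolution
        }
        if (Input.GetKeyDown(KeyCode.F2))
        {
            Screen.SetResolution(2560, 1440, Screen.fullScreen);  // restore high resolution
        }
    }
}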
This will reduce the fill rate consumption of the application, giving a key part of the rendering pipeline some additional breathing room. Ergo, if performance suddenly improves with a screen resolution reduction, then fill rate should be our primary concern. Fill rate is a very broad term referring to the speed at which the GPU can draw fragments. But, this only includes fragments that have survived all of the various conditional tests we might have enabled within the given Shader. A fragment is merely a "potential pixel," and if it fails any of the enabled tests, then it is immediately discarded. This can be an enormous performance-saver as the pipeline can skip the costly drawing step and begin work on the next fragment instead. One such example is Z-testing, which checks whether the fragment from a closer object has already been drawn to the same pixel. If so, then the current fragment is discarded. If not, then the fragment is pushed through the Fragment Shader and drawn over the target pixel, which consumes exactly one draw from our fill rate. Now imagine multiplying this process across thousands of overlapping objects, each generating hundreds or thousands of potential fragments at high screen resolutions, causing millions, or even billions, of fragments to be generated each and every frame. It should be fairly obvious that skipping as many of these draws as we can will result in big rendering cost savings. Graphics card manufacturers typically advertise a particular fill rate as a feature of the card, usually in the form of gigapixels per second, but this is a bit of a misnomer, as it would be more accurate to call it gigafragments per second; however, this argument is mostly academic. Either way, larger values tell us that the device can potentially push more fragments through the pipeline, so with a budget of 30 GPix/s and a target frame rate of 60 Hz, we can afford to process 30,000,000,000/60 = 500 million fragments per frame before being bottlenecked on fill rate. With a resolution of 2560x1440, and a best-case scenario where each pixel is only drawn over once, then we could theoretically draw the entire scene about 135 times without any noticeable problems. Sadly, this is not a perfect world, and unless we take significant steps to avoid it, we will always end up with some amount of redraw over the same pixels due to the order in which objects are rendered. This is known as overdraw, and it can be very costly if we're not careful. The reason that resolution is a good attack vector to check for fill rate bounding is that it is a multiplier. A reduction from a resolution of 2560x1440 to 800x600 is an improvement factor of about eight, which could reduce fill rate costs enough to make the application perform well again. Overdraw Determining how much overdraw we have can be represented visually by rendering all objects with additive alpha blending and a very transparent flat color. Areas of high overdraw will show up more brightly as the same pixel is drawn over with additive blending multiple times. This is precisely how the Scene view's Overdraw shading mode reveals how much overdraw our scene is suffering. The following screenshot shows a scene with several thousand boxes drawn normally, and drawn using the Scene view's Overdraw shading mode: At the end of the day, fill rate is provided as a means of gauging the best-case behavior. In other words, it's primarily a marketing term and mostly theoretical.
But, the technical side of the industry has adopted the term as a way of describing the back end of the pipeline: the stage where fragment data is funneled through our Shaders and drawn to the screen. If every fragment required an absolute minimum level of processing (such as a Shader that returned a constant color), then we might get close to that theoretical maximum. The GPU is a complex beast, however, and things are never so simple. The nature of the device means it works best when given many small tasks to perform. But, if the tasks get too large, then fill rate is lost due to the back end not being able to push through enough fragments in time and the rest of the pipeline is left waiting for tasks to do. There are several more features that can potentially consume our theoretical fill rate maximum, including but not limited to alpha testing, alpha blending, texture sampling, the amount of fragment data being pulled through our Shaders, and even the color format of the target render texture (the final Frame Buffer in most cases). The bad news is that this gives us a lot of subsections to cover, and a lot of ways to break the process, but the good news is it gives us a lot of avenues to explore to improve our fill rate usage. Occlusion Culling One of the best ways to reduce overdraw is to make use of Unity's Occlusion Culling system. The system works by partitioning Scene space into a series of cells and flying through the world with a virtual camera making note of which cells are invisible from other cells (are occluded) based on the size and position of the objects present. Note that this is different to the technique of Frustum Culling, which culls objects not visible from the current camera view. This feature is always active in all versions, and objects culled by this process are automatically ignored by the Occlusion Culling system. Occlusion Culling is available in the Unity 4 Pro Edition and all editions of Unity 5. Occlusion Culling data can only be generated for objects properly labeled Occluder Static and Occludee Static under the StaticFlags dropdown. Occluder Static is the general setting for static objects where we want it to hide other objects, and be hidden by large objects in its way. Occludee Static is a special case for transparent objects that allows objects behind them to be rendered, but we want them to be hidden if something large blocks their visibility. Naturally, because one of the static flags must be enabled for Occlusion Culling, this feature will not work for dynamic objects. The following screenshot shows how effective Occlusion Culling can be at reducing the number of visible objects in our Scene: This feature will cost us in both application footprint and incur some runtime costs. It will cost RAM to keep the Occlusion Culling data structure in memory, and there will be a CPU processing cost to determine which objects are being occluded in each frame. The Occlusion Culling data structure must be properly configured to create cells of the appropriate size for our Scene, and the smaller the cells, the longer it takes to generate the data structure. But, if it is configured correctly for the Scene, Occlusion Culling can provide both fill rate savings through reduced overdraw, and Draw Call savings by culling non-visible objects. Shader optimization Shaders can be a significant fill rate consumer, depending on their complexity, how much texture sampling takes place, how many mathematical functions are used, and so on. 
Shaders do not directly consume fill rate, but do so indirectly because the GPU must calculate or fetch data from memory during Shader processing. The GPU's parallel nature means any bottleneck in a thread will limit how many fragments can be pushed into the thread at a later date, but parallelizing the task (sharing small pieces of the job between several agents) provides a net gain over serial processing (one agent handling each task one after another). The classic example is a vehicle assembly line. A complete vehicle requires multiple stages of manufacture to complete. The critical path to completion might involve five steps: stamping, welding, painting, assembly, and inspection, and each step is completed by a single team. For any given vehicle, no stage can begin before the previous one is finished, but whatever team handled the stamping for the last vehicle can begin stamping for the next vehicle as soon as it has finished. This organization allows each team to become masters of their particular domain, rather than trying to spread their knowledge too thin, which would likely result in less consistent quality in the batch of vehicles. We can double the overall output by doubling the number of teams, but if any team gets blocked, then precious time is lost for any given vehicle, as well as all future vehicles that would pass through the same team. If these delays are rare, then they can be negligible in the grand scheme, but if not, and one stage takes several minutes longer than normal each and every time it must complete the task, then it can become a bottleneck that threatens the release of the entire batch. The GPU parallel processors work in a similar way: each processor thread is an assembly line, each processing stage is a team, and each fragment is a vehicle. If the thread spends a long time processing a single stage, then time is lost on each fragment. This delay will multiply such that all future fragments coming through the same thread will be delayed. This is a bit of an oversimplification, but it often helps to paint a picture of how poorly optimized Shader code can chew up our fill rate, and how small improvements in Shader optimization provide big benefits in back end performance. Shader programming and optimization have become a very niche area of game development. Their abstract and highly-specialized nature requires a very different kind of thinking to generate Shader code compared to gameplay and engine code. They often feature mathematical tricks and back-door mechanisms for pulling data into the Shader, such as precomputing values in texture files. Because of this, and the importance of optimization, Shaders tend to be very difficult to read and reverse-engineer. Consequently, many developers rely on prewritten Shaders, or visual Shader creation tools from the Asset Store such as Shader Forge or Shader Sandwich. This simplifies the act of initial Shader code generation, but might not result in the most efficient form of Shaders. If we're relying on pre-written Shaders or tools, we might find it worthwhile to perform some optimization passes over them using some tried-and-true techniques. So, let's focus on some easily reachable ways of optimizing our Shaders. Consider using Shaders intended for mobile platforms The built-in mobile Shaders in Unity do not have any specific restrictions that force them to only be used on mobile devices. They are simply optimized for minimum resource usage (and tend to feature some of the other optimizations listed in this section). 
Desktop applications are perfectly capable of using these Shaders, but they tend to feature a loss of graphical quality. It only becomes a question of whether the loss of graphical quality is acceptable. So, consider doing some testing with the mobile equivalents of common Shaders to see whether they are a good fit for your game. Use small data types GPUs can calculate with smaller data types more quickly than larger types (particularly on mobile platforms!), so the first tweak we can attempt is replacing our float data types (32-bit, floating point) with smaller versions such as half (16-bit, floating point), or even fixed (12-bit, fixed point). The size of the data types listed above will vary depending on what floating point formats the target platform prefers. The sizes listed are the most common. The importance for optimization is in the relative size between formats. Color values are good candidates for precision reduction, as we can often get away with less precise color values without any noticeable loss in coloration. However, the effects of reducing precision can be very unpredictable for graphical calculations. So, changes such as these can require some testing to verify whether the reduced precision is costing too much graphical fidelity. Note that the effects of these tweaks can vary enormously between one GPU architecture and another (for example, AMD versus Nvidia versus Intel), and even GPU brands from the same manufacturer. In some cases, we can make some decent performance gains for a trivial amount of effort. In other cases, we might see no benefit at all. Avoid changing precision while swizzling Swizzling is the Shader programming technique of creating a new vector (an array of values) from an existing vector by listing the components in the order in which we wish to copy them into the new structure. Here are some examples of swizzling:

float4 input = float4(1.0, 2.0, 3.0, 4.0); // initial test value
float2 val1 = input.yz;   // swizzle two components
float3 val2 = input.zyx;  // swizzle three components in a different order
float3 val3 = input.yyy;  // swizzle the same component multiple times
float sclr = input.w;
float3 val4 = sclr.xxx;   // swizzle a scalar multiple times

We can use both the xyzw and rgba representations to refer to the same components, sequentially. It does not matter whether it is a color or vector; they just make the Shader code easier to read. We can also list components in any order we like to fill in the desired data, repeating them if necessary. Converting from one precision type to another in a Shader can be a costly operation, but converting the precision type while simultaneously swizzling can be particularly painful. If we have mathematical operations that rely on being swizzled into different precision types, it would be wiser if we simply absorbed the high-precision cost from the very beginning, or reduced precision across the board to avoid the need for changes in precision. Use GPU-optimized helper functions The Shader compiler often performs a good job of reducing mathematical calculations down to an optimized version for the GPU, but compiled custom code is unlikely to be as effective as both the Cg library's built-in helper functions and the additional helpers provided by the Unity Cg include files. If we are using Shaders that include custom function code, perhaps we can find an equivalent helper function within the Cg or Unity libraries that can do a better job than our custom code can.
These extra include files can be added to our Shader within the CGPROGRAM block like so:

CGPROGRAM
// other includes
#include "UnityCG.cginc"
// Shader code here
ENDCG

Example Cg library functions to use are abs() for absolute values, lerp() for linear interpolation, mul() for multiplying matrices, and step() for step functionality. Useful UnityCG.cginc functions include WorldSpaceViewDir() for calculating the direction towards the camera, and Luminance() for converting a color to grayscale. Check the following URL for a full list of Cg standard library functions: http://http.developer.nvidia.com/CgTutorial/cg_tutorial_appendix_e.html. Check the Unity documentation for a complete and up-to-date list of possible include files and their accompanying helper functions: http://docs.unity3d.com/Manual/SL-BuiltinIncludes.html. Disable unnecessary features Perhaps we can make savings by simply disabling Shader features that aren't vital. Does the Shader really need multiple passes, transparency, Z-writing, alpha-testing, and/or alpha blending? Will tweaking these settings or removing these features give us a good approximation of our desired effect without losing too much graphical fidelity? Making such changes is a good way of making fill rate cost savings. Remove unnecessary input data Sometimes the process of writing a Shader involves a lot of back and forth experimentation in editing code and viewing it in the Scene. The typical result of this is that input data that was needed when the Shader was going through early development is now surplus fluff once the desired effect has been obtained, and it's easy to forget what changes were made when/if the process drags on for a long time. But, these redundant data values can cost the GPU valuable time as they must be fetched from memory even if they are not explicitly used by the Shader. So, we should double check our Shaders to ensure all of their input geometry, vertex, and fragment data is actually being used. Only expose necessary variables Exposing unnecessary variables from our Shader to the accompanying Material(s) can be costly as the GPU can't assume these values are constant. This means the Shader code cannot be compiled into a more optimized form. This data must be pushed from the CPU with every pass since they can be modified at any time through the Material's methods such as SetColor(), SetFloat(), and so on. If we find that, towards the end of the project, we always use the same value for these variables, then they can be replaced with a constant in the Shader to remove such excess runtime workload. The only cost is obfuscating what could be critical graphical effect parameters, so this should be done very late in the process. Reduce mathematical complexity Complicated mathematics can severely bottleneck the rendering process, so we should do whatever we can to limit the damage. Complex mathematical functions could be replaced with a texture that is fed into the Shader and provides a pre-generated table for runtime lookup. We may not see any improvement with functions such as sin and cos, since they've been heavily optimized to make use of GPU architecture, but complex methods such as pow, exp, log, and other custom mathematical processes can only be optimized so much, and would be good candidates for simplification. This is assuming we only need one or two input values, which are represented through the X and Y coordinates of the texture, and mathematical accuracy isn't of paramount importance.
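For illustration, here is a hedged sketch of how such a lookup table might be baked into a texture from script and handed to a Material; the _LookupTex property name, the exponent, and the table size are all hypothetical and would need to match whatever the Shader expects:

using UnityEngine;

// Hypothetical example: precompute pow(x, 2.2) into a 256x1 lookup texture so the
// Shader can replace the pow() call with a single texture sample.
public class LookupTextureBaker : MonoBehaviour
{
    void Start()
    {
        Texture2D lut = new Texture2D(256, 1, TextureFormat.RGBA32, false);
        lut.wrapMode = TextureWrapMode.Clamp;

        for (int x = 0; x < 256; ++x)
        {
            float t = x / 255f;
            float value = Mathf.Pow(t, 2.2f);   // the "expensive" function, computed once up front
            lut.SetPixel(x, 0, new Color(value, value, value, 1f));
        }
        lut.Apply();

        // "_LookupTex" must match a sampler declared in the target Shader.
        GetComponent<Renderer>().material.SetTexture("_LookupTex", lut);
    }
}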
This will cost us additional graphics memory to store the texture at runtime (more on this later), but if the Shader is already receiving a texture (which they are in most cases) and the alpha channel is not being used, then we could sneak the data in through the texture's alpha channel, costing us literally no performance, and the rest of the Shader code and graphics system would be none-the-wiser. This will involve the customization of art assets to include such data in any unused color channel(s), requiring coordination between programmers and artists, but is a very good way of saving Shader processing costs with no runtime sacrifices. In fact, Material properties and textures are both excellent entry points for pushing work from the Shader (the GPU) onto the CPU. If a complex calculation does not need to vary on a per pixel basis, then we could expose the value as a property in the Material, and modify it as needed (accepting the overhead cost of doing so from the previous section Only expose necessary variables). Alternatively, if the result varies per pixel, and does not need to change often, then we could generate a texture file from script code, containing the results of the calculations in the RGBA values, and pulling the texture into the Shader. Lots of opportunities arise when we ignore the conventional application of such systems, and remember to think of them as just raw data being transferred around. Reduce texture lookups While we're on the subject of texture lookups, they are not trivial tasks for the GPU to process and they have their own overhead costs. They are the most common cause of memory access problems within the GPU, especially if a Shader is performing samples across multiple textures, or even multiple samples across a single texture, as they will likely inflict cache misses in memory. Such situations should be simplified as much as possible to avoid severe GPU memory bottlenecking. Even worse, sampling a texture in a random order would likely result in some very costly cache misses for the GPU to suffer through, so if this is being done, then the texture should be reordered so that it can be sampled in a more sequential order. Avoid conditional statements In modern day CPU architecture, conditional statements undergo a lot of clever predictive techniques to make use of instruction-level parallelism. This is a feature where the CPU attempts to predict which direction a conditional statement will go in before it has actually been resolved, and speculatively begins processing the most likely result of the conditional using any free components that aren't being used to resolve the conditional (fetching some data from memory, copying some floats into unused registers, and so on). If it turns out that the decision is wrong, then the current result is discarded and the proper path is taken instead. So long as the cost of speculative processing and discarding false results is less than the time spent waiting to decide the correct path, and it is right more often than it is wrong, then this is a net gain for the CPU's speed. However, this feature is not possible on GPU architecture because of its parallel nature. The GPU's cores are typically managed by some higher-level construct that instructs all cores under its command to perform the same machine-code-level instruction simultaneously. So, if the Fragment Shader requires a float to be multiplied by 2, then the process will begin by having all cores copy data into the appropriate registers in one coordinated step. 
Only when all cores have finished copying to the registers will the cores be instructed to begin the second step: multiplying all registers by 2. Thus, when this system stumbles into a conditional statement, it cannot resolve the two statements independently. It must determine how many of its child cores will go down each path of the conditional, grab the list of required machine code instructions for one path, resolve them for all cores taking that path, and repeat for each path until all possible paths have been processed. So, for an if-else statement (two possibilities), it will tell one group of cores to process the "true" path, then ask the remaining cores to process the "false" path. Unless every core takes the same path, it must process both paths every time. So, we should avoid branching and conditional statements in our Shader code. Of course, this depends on how essential the conditional is to achieving the graphical effect we desire. But, if the conditional is not dependent on per pixel behavior, then we would often be better off absorbing the cost of unnecessary mathematics than inflicting a branching cost on the GPU. For example, we might be checking whether a value is non-zero before using it in a calculation, or comparing against some global flag in the Material before taking one action or another. Both of these cases would be good candidates for optimization by removing the conditional check. Reduce data dependencies The compiler will try its best to optimize our Shader code into the more GPU-friendly low-level language so that it is not waiting on data to be fetched when it could be processing some other task. For example, the following poorly-optimized code could be written in our Shader:

float sum = input.color1.r;
sum = sum + input.color2.g;
sum = sum + input.color3.b;
sum = sum + input.color4.a;
float result = CalculateSomething(sum);

If we were able to force the Shader compiler to compile this code into machine code instructions as it is written, then this code has a data dependency such that each calculation cannot begin until the last finishes due to the dependency on the sum variable. But, such situations are often detected by the Shader compiler and optimized into a version that uses instruction-level parallelism (the code shown next is the high-level code equivalent of the resulting machine code):

float sum1, sum2, sum3, sum4;
sum1 = input.color1.r;
sum2 = input.color2.g;
sum3 = input.color3.b;
sum4 = input.color4.a;
float sum = sum1 + sum2 + sum3 + sum4;
float result = CalculateSomething(sum);

In this case, the compiler would recognize that it can fetch the four values from memory in parallel and complete the summation once all four have been fetched independently via thread-level parallelism. This can save a lot of time, relative to performing the four fetches one after another. However, long chains of data dependency can absolutely murder Shader performance. If we create a strong data dependency in our Shader's source code, then it has been given no freedom to make such optimizations. For example, the following data dependency would be painful on performance, as one step cannot be completed without waiting on another to fetch data and perform the appropriate calculation:

float4 val1 = tex2D(_tex1, input.texcoord.xy);
float4 val2 = tex2D(_tex2, val1.yz);
float4 val3 = tex2D(_tex3, val2.zw);

Strong data dependencies such as these should be avoided whenever possible.
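Before moving on, note that a few of the suggestions above can be tested very cheaply from script; for example, the mobile Shader swap mentioned earlier can be trialed on a single Renderer with something like the following minimal sketch (the component name is hypothetical, and Shader.Find only locates Shaders that are actually included in the build):

using UnityEngine;

// Hypothetical test helper: swap a Renderer over to one of Unity's built-in
// mobile Shaders to compare its fill rate cost against the original Shader.
public class MobileShaderSwap : MonoBehaviour
{
    void Start()
    {
        Shader mobileDiffuse = Shader.Find("Mobile/Diffuse");
        if (mobileDiffuse != null && mobileDiffuse.isSupported)
        {
            GetComponent<Renderer>().material.shader = mobileDiffuse;
        }
    }
}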
Surface Shaders If we're using Unity's Surface Shaders, which are a way for Unity developers to get to grips with Shader programming in a more simplified fashion, then the Unity Engine takes care of converting our Surface Shader code for us, abstracting away some of the optimization opportunities we have just covered. However, it does provide some miscellaneous values that can be used as replacements, which reduce accuracy but simplify the mathematics in the resulting code. Surface Shaders are designed to handle the general case fairly efficiently, but optimization is best achieved with a personal touch. The approxview attribute will approximate the view direction, saving costly operations. halfasview will reduce the precision of the view vector, but beware of its effect on mathematical operations involving multiple precision types. noforwardadd will limit the Shader to only considering a single directional light, reducing Draw Calls since the Shader will render in only a single pass, but reducing lighting complexity. Finally, noambient will disable ambient lighting in the Shader, removing some extra mathematical operations that we may not need. Use Shader-based LOD We can force Unity to render distant objects using simpler Shaders, which can be an effective way of saving fill rate, particularly if we're deploying our game onto multiple platforms or supporting a wide range of hardware capability. The LOD keyword can be used in the Shader to set the onscreen size factor that the Shader supports. If the current LOD level does not match this value, it will drop to the next fallback Shader and so on until it finds the Shader that supports the given size factor. We can also change a given Shader object's LOD value at runtime using the maximumLOD property. This feature is similar to the mesh-based LOD covered earlier, and uses the same LOD values for determining object form factor, so it should be configured as such. Memory bandwidth Another major component of back end processing and a potential source of bottlenecks is memory bandwidth. Memory bandwidth is consumed whenever a texture must be pulled from a section of the GPU's main video memory (also known as VRAM). The GPU contains multiple cores that each have access to the same area of VRAM, but they also each contain a much smaller, local Texture Cache that stores the current texture(s) the GPU has been most recently working with. This is similar in design to the multitude of CPU cache levels that allow memory transfer up and down the chain, as a workaround for the fact that faster memory will, invariably, be more expensive to produce, and hence smaller in capacity compared to slower memory. Whenever a Fragment Shader requests a sample from a texture that is already within the core's local Texture Cache, then it is lightning fast and barely perceivable. But, if a texture sample request is made, that does not yet exist within the Texture Cache, then it must be pulled in from VRAM before it can be sampled. This fetch request risks cache misses within VRAM as it tries to find the relevant texture. The transfer itself consumes a certain amount of memory bandwidth, specifically an amount equal to the total size of the texture file stored within VRAM (which may not be the exact size of the original file, nor the size in RAM, due to GPU-level compression). It's for this reason that, if we're bottlenecked on memory bandwidth, then performing a brute force test by reducing texture quality would suddenly result in a performance improvement. 
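This brute-force test can also be triggered from script via the global texture quality setting; a minimal sketch follows, with hypothetical key bindings (0 means full resolution, 1 half, 2 quarter, and so on):

using UnityEngine;

// Hypothetical brute-force memory bandwidth test: globally drop texture
// resolution at runtime and watch for a sudden frame rate improvement.
public class TextureQualityTest : MonoBehaviour
{
    void Update()
    {
        if (Input.GetKeyDown(KeyCode.F3))
        {
            QualitySettings.masterTextureLimit = 2;   // quarter resolution textures
        }
        if (Input.GetKeyDown(KeyCode.F4))
        {
            QualitySettings.masterTextureLimit = 0;   // restore full resolution
        }
    }
}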
We've shrunk the size of our textures, easing the burden on the GPU's memory bandwidth, allowing it to fetch the necessary textures much quicker. Globally reducing texture quality can be achieved by going to Edit | Project Settings | Quality | Texture Quality and setting the value to Half Res, Quarter Res, or Eighth Res. In the event that memory bandwidth is bottlenecked, the GPU will keep fetching the necessary texture files, but the entire process will be throttled as the Texture Cache waits for the data to appear before processing the fragment. The GPU won't be able to push data back to the Frame Buffer in time to be rendered onto the screen, blocking the whole process and culminating in a poor frame rate. Ultimately, proper usage of memory bandwidth is a budgeting concern. For example, with a memory bandwidth of 96 GB/sec per core and a target frame rate of 60 frames per second, the GPU can afford to pull 96/60 = 1.6 GB worth of texture data every frame before being bottlenecked on memory bandwidth. Memory bandwidth is often listed on a per core basis, but some GPU manufacturers may try to mislead you by multiplying memory bandwidth by the number of cores in order to list a bigger, but less practical number. Because of this, research may be necessary to confirm the memory bandwidth limit we have for the target GPU hardware is given on a per core basis. Note that this value is not the maximum limit on the texture data that our game can contain in the project, nor in CPU RAM, not even in VRAM. It is a metric that limits how much texture swapping can occur during one frame. The same texture could be pulled back and forth multiple times in a single frame depending on how many Shaders need to use it, the order that the objects are rendered, and how often texture sampling must occur, so rendering just a few objects could consume whole gigabytes of memory bandwidth if they all require the same high quality, massive textures, require multiple secondary texture maps (normal maps, emission maps, and so on), and are not batched together, because there simply isn't enough Texture Cache space available to keep a single texture file long enough to exploit it during the next rendering pass. There are several approaches we can take to solve bottlenecks in memory bandwidth. Use less texture data This approach is simple, straightforward, and always a good idea to consider. Reducing texture quality, either through resolution or bit rate, is not ideal for graphical quality, but we can sometimes get away with using 16-bit textures without any noticeable degradation. Mip Maps are another excellent way of reducing the amount of texture data being pushed back and forth between VRAM and the Texture Cache. Note that the Scene View has a Mipmaps Shading Mode, which will highlight textures in our scene blue or red depending on whether the current texture scale is appropriate for the current Scene View's camera position and orientation. This will help identify what textures are good candidates for further optimization. Mip Maps should almost always be used in 3D Scenes, unless the camera moves very little. Test different GPU Texture Compression formats Texture Compression techniques help reduce our application's footprint (executable file size) and runtime CPU memory usage, that is, the storage area where all texture resource data is kept until it is needed by the GPU. However, once the data reaches the GPU, it uses a different form of compression to keep texture data small.
The common formats are DXT, PVRTC, ETC, and ASTC. To make matters more confusing, each platform and GPU hardware supports different compression formats, and if the device does not support the given compression format, then it will be handled at the software level. In other words, the CPU will need to stop and recompress the texture to the desired format the GPU wants, as opposed to the GPU taking care of it with a specialized hardware chip. The compression options are only available if a texture resource has its Texture Type field set to Advanced. Using any of the other texture type settings will simplify the choices, and Unity will make a best guess when deciding which format to use for the target platform, which may not be ideal for a given piece of hardware and thus will consume more memory bandwidth than necessary. The best approach to determining the correct format is to simply test a bunch of different devices and Texture Compression techniques and find one that fits. For example, common wisdom says that ETC is the best choice for Android since more devices support it, but some developers have found their game works better with the DXT and PVRTC formats on certain devices. Beware that, if we're at the point where individually tweaking Texture Compression techniques is necessary, then hopefully we have exhausted all other options for reducing memory bandwidth. By going down this road, we could be committing to supporting many different devices each in their own specific way. Many of us would prefer to keep things simple with a general solution instead of personal customization and time-consuming handiwork to work around problems like this. Minimize texture sampling Can we modify our Shaders to remove some texture sampling overhead? Did we add some extra texture lookup files to give ourselves some fill rate savings on mathematical functions? If so, we might want to consider lowering the resolution of such textures or reverting the changes and solving our fill rate problems in other ways. Essentially, the less texture sampling we do, the less often we need to use memory bandwidth and the closer we get to resolving the bottleneck. Organize assets to reduce texture swaps This approach basically comes back to Batching and Atlasing again. Are there opportunities to batch some of our biggest texture files together? If so, then we could save the GPU from having to pull in the same texture files over and over again during the same frame. As a last resort, we could look for ways to remove some textures from the entire project and reuse similar files. For instance, if we have fill rate budget to spare, then we may be able to use some Fragment Shaders to make a handful of texture files appear in our game with different color variations. VRAM limits One last consideration related to textures is how much VRAM we have available. Most texture transfer from CPU to GPU occurs during initialization, but can also occur when a non-existent texture is first required by the current view. This process is asynchronous and will result in a blank texture being used until the full texture is ready for rendering. As such, we should avoid too much texture variation across our Scenes. Texture preloading Even though it doesn't strictly relate to graphics performance, it is worth mentioning that the blank texture that is used during asynchronous texture loading can be jarring when it comes to game quality.
We would like a way to control and force the texture to be loaded from disk to the main memory and then to VRAM before it is actually needed. A common workaround is to create a hidden GameObject that features the texture and place it somewhere in the Scene on the route that the player will take towards the area where it is actually needed. As soon as the textured object becomes a candidate for the rendering system (even if it's technically hidden), it will begin the process of copying the data towards VRAM. This is a little clunky, but is easy to implement and works sufficiently well in most cases. We can also control such behavior via Script code by changing a hidden Material's texture:

GetComponent<Renderer>().material.mainTexture = textureToPreload;

Texture thrashing In the rare event that too much texture data is loaded into VRAM, and the required texture is not present, the GPU will need to request it from the main memory and overwrite the existing texture data to make room. This is likely to worsen over time as the memory becomes fragmented, and it introduces a risk that the texture just flushed from VRAM needs to be pulled again within the same frame. This will result in a serious case of memory "thrashing", and should be avoided at all costs. This is less of a concern on modern consoles such as the PS4, Xbox One, and WiiU, since they share a common memory space for both CPU and GPU. This design is a hardware-level optimization given the fact that the device is always running a single application, and almost always rendering 3D graphics. But, all other platforms must share time and space with multiple applications and be capable of running without a GPU. They therefore feature separate CPU and GPU memory, and we must ensure that the total texture usage at any given moment remains below the available VRAM of the target hardware. Note that this "thrashing" is not precisely the same as hard disk thrashing, where memory is copied back and forth between main memory and virtual memory (the swap file), but it is analogous. In either case, data is being unnecessarily copied back and forth between two regions of memory because too much data is being requested in too short a time period for the smaller of the two memory regions to hold it all. Thrashing such as this can be a common cause of dreadful graphics performance when games are ported from modern consoles to the desktop and should be treated with care. Avoiding this behavior may require customizing texture quality and file sizes on a per-platform and per-device basis. Be warned that some players are likely to notice these inconsistencies if we're dealing with hardware from the same console or desktop GPU generation. As many of us will know, even small differences in hardware can lead to a lot of apples-versus-oranges comparisons, but hardcore gamers will expect a similar level of quality across the board. Lighting and Shadowing Lighting and Shadowing can affect all parts of the graphics pipeline, and so they will be treated separately. This is perhaps one of the most important parts of game art and design to get right. Good Lighting and Shadowing can turn a mundane scene into something spectacular as there is something magical about professional coloring that makes it visually appealing. Even the low-poly art style (think Monument Valley) relies heavily on a good lighting and shadowing profile in order to allow the player to distinguish one object from another.
But, this isn't an art book, so we will focus on the performance characteristics of various Lighting and Shadowing features. Unity offers two styles of dynamic light rendering, as well as baked lighting effects through lightmaps. It also provides multiple ways of generating shadows with varying levels of complexity and runtime processing cost. Between the two, there are a lot of options to explore, and a lot of things that can trip us up if we're not careful. The Unity documentation covers all of these features in an excellent amount of detail (start with this page and work through them: http://docs.unity3d.com/Manual/Lighting.html), so we'll examine these features from a performance standpoint. Let's tackle the two main light rendering modes first. This setting can be found under Edit | Project Settings | Player | Other Settings | Rendering, and can be configured on a per-platform basis. Forward Rendering Forward Rendering is the classical form of rendering lights in our scene. Each object is likely to be rendered in multiple passes through the same Shader. How many passes are required will be based on the number, distance, and brightness of light sources. Unity will try to prioritize which directional light is affecting the object the most and render the object in a "base pass" as a starting point. It will then take up to four of the most powerful point lights nearby and re-render the same object multiple times through the same Fragment Shader. The next four point lights will then be processed on a per-vertex basis. All remaining lights are treated as a giant blob by means of a technique called spherical harmonics. Some of this behavior can be simplified by setting a light's Render Mode to values such as Not Important, and changing the value of Edit | Project Settings | Quality | Pixel Light Count. This value limits how many lights will be treated on a per pixel basis, but is overridden by any lights with a Render Mode set to Important. It is therefore up to us to use this combination of settings responsibly. As you can imagine, the design of Forward Rendering can utterly explode our Draw Call count very quickly in scenes with a lot of point lights present, due to the number of render states being configured and Shader passes being reprocessed. CPU-bound applications should avoid this rendering mode if possible. More information on Forward Rendering can be found in the Unity documentation: http://docs.unity3d.com/Manual/RenderTech-ForwardRendering.html. Deferred Shading Deferred Shading or Deferred Rendering as it is sometimes known, is only available on GPUs running at least Shader Model 3.0. In other words, any desktop graphics card made after around 2004. The technique has been around for a while, but it has not resulted in a complete replacement of the Forward Rendering method due to the caveats involved and limited support on mobile devices. Anti-aliasing, transparency, and animated characters receiving shadows are all features that cannot be managed through Deferred Shading alone and we must use the Forward Rendering technique as a fallback. Deferred Shading is so named because actual shading does not occur until much later in the process; that is, it is deferred until later. From a performance perspective, the results are quite impressive as it can generate very good per pixel lighting with surprisingly little Draw Call effort. The advantage is that a huge amount of lighting can be accomplished using only a single pass through the lighting Shader. 
The main disadvantages include the additional costs if we wish to pile on advanced lighting features such as Shadowing and any steps that must pass through Forward Rendering in order to complete, such as transparency. The Unity documentation contains an excellent source of information on the Deferred Shading technique, its advantages, and its pitfalls: http://docs.unity3d.com/Manual/RenderTech-DeferredShading.html Vertex Lit Shading (legacy) Technically, there are more than two lighting methods. Unity allows us to use a couple of legacy lighting systems, only one of which may see actual use in the field: Vertex Lit Shading. This is a massive simplification of lighting, as lighting is only considered per vertex, and not per pixel. In other words, entire faces are colored based on the incoming light color, and not individual pixels. It is not expected that many, or really any, 3D games will make use of this legacy technique, as a lack of shadows and proper lighting make visualizations of depth very difficult. It is mostly relegated to 2D games that don't intend to make use of shadows, normal maps, and various other lighting features, but it is there if we need it. Real-time Shadows Soft Shadows are expensive, Hard Shadows are cheap, and No Shadows are free. Shadow Resolution, Shadow Projection, Shadow Distance, and Shadow Cascades are all settings we can find under Edit | Project Settings | Quality | Shadows that we can use to modify the behavior and complexity of our shadowing passes. That summarizes almost everything we need to know about Unity's real-time shadowing techniques from a high-level performance standpoint. We will cover shadows more in the following section on optimizing our lighting effects. Lighting optimization With a cursory glance at all of the relevant lighting techniques, let's run through some techniques we can use to improve lighting costs. Use the appropriate Shading Mode It is worth testing both of the main rendering modes to see which one best suits our game. Deferred Shading is often used as a backup in the event that Forward Rendering is becoming a burden on performance, but it really depends on where else we're finding bottlenecks as it is sometimes difficult to tell the difference between them. Use Culling Masks A Light Component's Culling Mask property is a layer-based mask that can be used to limit which objects will be affected by the given Light. This is an effective way of reducing lighting overhead, assuming that the layer interactions also make sense with how we are using layers for physics optimization. Objects can only be a part of a single layer, and reducing physics overhead probably trumps lighting overhead in most cases; thus, if there is a conflict, then this may not be the ideal approach. Note that there is limited support for Culling Masks when using Deferred Shading. Because of the way it treats lighting in a very global fashion, only four layers can be disabled from the mask, limiting our ability to optimize its behavior through this method. Use Baked Lightmaps Baking Lighting and Shadowing into a Scene is significantly less processor-intensive than generating them at runtime. The downside is the added application footprint, memory consumption, and potential for memory bandwidth abuse. Ultimately, unless a game's lighting effects are being handled exclusively through Legacy Vertex Lighting or a single Directional Light, then it should probably include Lightmapping to make some huge budget savings on lighting calculations. 
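Baking is normally configured and started through the Lighting window, but purely for illustration, here is a hedged sketch of an Editor helper that kicks off an asynchronous bake from script; the menu path is hypothetical and the file must live in an Editor folder:

using UnityEditor;
using UnityEngine;

// Hypothetical Editor helper: starts an asynchronous lightmap bake for the
// currently open Scene from a custom menu item.
public static class LightmapBakeMenu
{
    [MenuItem("Tools/Bake Lightmaps")]
    public static void Bake()
    {
        if (!Lightmapping.BakeAsync())
        {
            Debug.LogWarning("Lightmap bake could not be started.");
        }
    }
}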
Relying entirely on real-time lighting and shadows is a recipe for disaster unless the game is trying to win an award for the smallest application file size of all time. Optimize Shadows Shadowing passes mostly consume our Draw Calls and fill rate, but the amount of vertex position data we feed into the process and our selection for the Shadow Projection setting will affect the front end's ability to generate the required shadow casters and shadow receivers. We should already be attempting to reduce vertex counts to solve front end bottlenecking in the first place, and making this change will be an added multiplier towards that effort. Draw Calls are consumed during shadowing by rendering visible objects into a separate buffer (known as the shadow map) as either a shadow caster, a shadow receiver, or both. Each object that is rendered into this map will consume another Draw Call, which makes shadows a huge performance cost multiplier, so it is often a setting that games will expose to users via quality settings, allowing users with weaker hardware to reduce the effect or even disable it entirely. Shadow Distance is a global multiplier for runtime shadow rendering. The fewer shadows we need to draw, the happier the entire rendering process will be. There is little point in rendering shadows at a great distance from the camera, so this setting should be configured specifically for our game and how much shadowing we expect to witness during gameplay. It is also a common setting that is exposed to the user to reduce the burden of rendering shadows. Higher values of Shadow Resolution and Shadow Cascades will increase our memory bandwidth and fill rate consumption. Both of these settings can help curb the effects of artefacts in shadow rendering, but at the cost of a much larger shadow map that must be moved around and a larger canvas to draw into. The Unity documentation contains an excellent summary on the topic of the aliasing effect of shadow maps and how the Shadow Cascades feature helps to solve the problem: http://docs.unity3d.com/Manual/DirLightShadows.html. It's worth noting that Soft Shadows do not consume any more memory or CPU overhead relative to Hard Shadows, as the only difference is a more complex Shader. This means that applications with enough fill rate to spare can enjoy the improved graphical fidelity of Soft Shadows. Optimizing graphics for mobile Unity's ability to deploy to mobile devices has contributed greatly to its popularity among hobbyist, small, and mid-size development teams. As such, it would be prudent to cover some approaches that are more beneficial for mobile platforms than for desktop and other devices. Note that any, and all, of the following approaches may become obsolete soon, if they aren't already. The mobile device market is moving blazingly fast, and the following techniques as they apply to mobile devices merely reflect conventional wisdom from the last half decade. We should test the assumptions behind these approaches from time to time to see whether the limitations of mobile devices still fit the mobile marketplace. Minimize Draw Calls Mobile applications are more often bottlenecked on Draw Calls than on fill rate. Not that fill rate concerns should be ignored (nothing should, ever!), but this makes it almost necessary for any mobile application of reasonable quality to implement Mesh Combining, Batching, and Atlasing techniques from the very beginning.
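As one illustration of the batching side of this advice, non-moving scenery can be combined into static batches at runtime; a minimal sketch follows, assuming a root object whose children never move, rotate, or scale after the call (the field name is hypothetical):

using UnityEngine;

// Hypothetical example: combine all meshes under a root object into static
// batches at runtime. Draws are still grouped by Material, so fewer Materials
// means fewer resulting batches.
public class RuntimeStaticBatcher : MonoBehaviour
{
    public GameObject environmentRoot;   // parent of the static scenery

    void Start()
    {
        // The children of environmentRoot must remain stationary afterwards,
        // or the combined batches will render incorrectly.
        StaticBatchingUtility.Combine(environmentRoot);
    }
}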
Deferred Rendering is also the preferred technique as it fits well with other mobile-specific concerns, such as avoiding transparency and large numbers of animated characters. Minimize the Material count This concern goes hand in hand with the concepts of Batching and Atlasing. The fewer Materials we use, the fewer Draw Calls will be necessary. This strategy will also help with concerns relating to VRAM and memory bandwidth, which tend to be very limited on mobile devices. Minimize texture size Most mobile devices feature a very small Texture Cache relative to desktop GPUs. For instance, the iPhone 3G can only support a total texture size of 1024x1024 due to running OpenGLES1.1 with simple vertex rendering techniques. Meanwhile the iPhone 3GS, iPhone 4, and iPad generation run OpenGLES 2.0, which only supports textures up to 2048x2048. Later generations can support textures up to 4096x4096. Double check the device hardware we are targeting to be sure it supports the texture file sizes we wish to use (there are too many Android devices to list here).
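A quick runtime check can at least log the limit for the current device during testing; a minimal sketch (the component name is hypothetical):

using UnityEngine;

// Hypothetical diagnostic: logs the largest texture dimension the current
// device's GPU supports, so oversized assets can be flagged during device testing.
public class TextureSizeCheck : MonoBehaviour
{
    void Start()
    {
        Debug.Log("Max supported texture size: " + SystemInfo.maxTextureSize);
    }
}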
However, later-generation devices are never the most common devices in the mobile marketplace. If we wish our game to reach a wide audience (increasing its chances of success), then we must be willing to support weaker hardware. Note that textures that are too large for the GPU will be downscaled by the CPU during initialization, wasting valuable loading time, and leaving us with unintended graphical fidelity. This makes texture reuse of paramount importance for mobile devices due to the limited VRAM and Texture Cache sizes available. Make textures square and power-of-2 The GPU will find it difficult, or simply be unable to compress the texture if it is not in a square format, so make sure you stick to the common development convention and keep things square and sized to a power of 2. Use the lowest possible precision formats in Shaders Mobile GPUs are particularly sensitive to precision formats in its Shaders, so the smallest formats should be used. On a related note, format conversion should be avoided for the same reason. Avoid Alpha Testing Mobile GPUs haven't quite reached the same levels of chip optimization as desktop GPUs, and Alpha Testing remains a particularly costly task on mobile devices. In most cases it should simply be avoided in favor of Alpha Blending. Summary If you've made it this far without skipping ahead, then congratulations are in order. That was a lot of information to absorb for just one component of the Unity Engine, but then it is clearly the most complicated of them all, requiring a matching depth of explanation. Hopefully, you've learned a lot of approaches to help you improve your rendering performance and enough about the rendering pipeline to know how to use them responsibly! To learn more about Unity 5, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Unity 5 Game Optimization (https://www.packtpub.com/game-development/unity-5-game-optimization) Unity 5.x By Example (https://www.packtpub.com/game-development/unity-5x-example) Unity 5.x Cookbook (https://www.packtpub.com/game-development/unity-5x-cookbook) Unity 5 for Android Essentials (https://www.packtpub.com/game-development/unity-5-android-essentials) Resources for Article: Further resources on this subject: The Vertex Functions [article] UI elements and their implementation [article] Routing for Yii Demystified [article]

Concurrency and Parallelism with Swift 2

Packt
22 Feb 2016
35 min read
When I first started learning Objective-C, I already had a good understanding of concurrency and multitasking with my background in other languages such as C and Java. This background made it very easy for me to create multithreaded applications using threads in Objective-C. Then, Apple changed everything for me when they released Grand Central Dispatch (GCD) with OS X 10.6 and iOS 4. At first, I went into denial; there was no way GCD could manage my application's threads better than I could. Then I entered the anger phase, GCD was hard to use and understand. Next was the bargaining phase, maybe I can use GCD with my threading code, so I could still control how the threading worked. Then there was the depression phase, maybe GCD does handle the threading better than I can. Finally, I entered the wow phase; this GCD thing is really easy to use and works amazingly well. After using Grand Central Dispatch and Operation Queues with Objective-C, I do not see a reason for using manual threads with Swift. In this artcle, we will learn the following topics: Basics of concurrency and parallelism How to use GCD to create and manage concurrent dispatch queues How to use GCD to create and manage serial dispatch queues How to use various GCD functions to add tasks to the dispatch queues How to use NSOperation and NSOperationQueues to add concurrency to our applications (For more resources related to this topic, see here.) Concurrency and parallelism Concurrency is the concept of multiple tasks starting, running, and completing within the same time period. This does not necessarily mean that the tasks are executing simultaneously. In order for tasks to be run simultaneously, our application needs to be running on a multicore or multiprocessor system. Concurrency allows us to share the processor or cores with multiple tasks; however, a single core can only execute one task at a given time. Parallelism is the concept of two or more tasks running simultaneously. Since each core of our processor can only execute one task at a time, the number of tasks executing simultaneously is limited to the number of cores within our processors. Therefore, if we have, for example, a four-core processor, then we are limited to only four tasks running simultaneously. Today's processors can execute tasks so quickly that it may appear that larger tasks are executing simultaneously. However, within the system, the larger tasks are actually taking turns executing subtasks on the cores. In order to understand the difference between concurrency and parallelism, let's look at how a juggler juggles balls. If you watch a juggler, it seems they are catching and throwing multiple balls at any given time; however, a closer look reveals that they are, in fact, only catching and throwing one ball at a time. The other balls are in the air waiting to be caught and thrown. If we want to be able to catch and throw multiple balls simultaneously, we need to add multiple jugglers. This example is really good because we can think of jugglers as the cores of a processer. A system with a single core processor (one juggler), regardless of how it seems, can only execute one task (catch and throw one ball) at a time. If we want to execute more than one task at a time, we need to use a multicore processor (more than one juggler). Back in the old days when all the processors were single core, the only way to have a system that executed tasks simultaneously was to have multiple processors in the system. 
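Since the degree of parallelism is capped by the number of cores, it can be useful to ask the system how many cores are actually available at runtime. The following is a minimal sketch, not taken from the original text, that uses the NSProcessInfo API; the printed wording is our own:

import Foundation

// Ask the operating system how many cores we can use.
// activeProcessorCount may be lower than processorCount if the
// OS has temporarily taken cores offline (for example, to save power).
let info = NSProcessInfo.processInfo()
print("Total cores: \(info.processorCount)")
print("Active cores: \(info.activeProcessorCount)")
print("At most \(info.activeProcessorCount) tasks can run truly in parallel right now.")

Whatever that number is, adding more processors was historically the only way for single-core systems to execute tasks simultaneously.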
This also required specialized software to take advantage of the multiple processors. In today's world, just about every device has a processor that has multiple cores, and both the iOS and OS X operating systems are designed to take advantage of the multiple cores to run tasks simultaneously. Traditionally, the way applications added concurrency was to create multiple threads; however, this model does not scale well to an arbitrary number of cores. The biggest problem with using threads was that our applications ran on a variety of systems (and processors), and in order to optimize our code, we needed to know how many cores/processors could be efficiently used at a given time, which is sometimes not known at the time of development. In order to solve this problem, many operating systems, including iOS and OS X, started relying on asynchronous functions. These functions are often used to initiate tasks that could possibly take a long time to complete, such as making an HTTP request or writing data to disk. An asynchronous function typically starts the long running task and then returns prior to the task completion. Usually, this task runs in the background and uses a callback function (such as closure in Swift) when the task completes. These asynchronous functions work great for the tasks that the OS provides them for, but what if we needed to create our own asynchronous functions and do not want to manage the threads ourselves? For this, Apple provides a couple of technologies. In this artcle, we will be covering two of these technologies. These are GCD and operation queues. GCD is a low-level C-based API that allows specific tasks to be queued up for execution and schedules the execution on any of the available processor cores. Operation queues are similar to GCD; however, they are Cocoa objects and are internally implemented using GCD. Let's begin by looking at GCD. Grand Central Dispatch Grand Central Dispatch provides what is known as dispatch queues to manage submitted tasks. The queues manage these submitted tasks and execute them in a first-in, first- out (FIFO) order. This ensures that the tasks are started in the order they were submitted. A task is simply some work that our application needs to perform. As examples, we can create tasks that perform simple calculations, read/write data to disk, make an HTTP request, or anything else that our application needs to do. We define these tasks by placing the code inside either a function or a closure and adding it to a dispatch queue. GCD provides three types of queues: Serial queues: Tasks in a serial queue (also known as a private queue) are executed one at a time in the order they were submitted. Each task is started only after the preceding task is completed. Serial queues are often used to synchronize access to specific resources because we are guaranteed that no two tasks in a serial queue will ever run simultaneously. Therefore, if the only way to access the specific resource is through the tasks in the serial queue, then no two tasks will attempt to access the resource at the same time or be out of order. Concurrent queues: Tasks in a concurrent queue (also known as a global dispatch queue) execute concurrently; however, the tasks are still started in the order that they were added to the queue. The exact number of tasks that can be executing at any given instance is variable and is dependent on the system's current conditions and resources. 
The decision on when to start a task is up to GCD and is not something that we can control within our application. Main dispatch queue: The main dispatch queue is a globally available serial queue that executes tasks on the application's main thread. Since tasks put into the main dispatch queue run on the main thread, it is usually called from a background queue when some background processing has finished and the user interface needs to be updated. Dispatch queues offer a number of advantages over traditional threads. The first and foremost advantage is, with dispatch queues, the system handles the creation and management of threads rather than the application itself. The system can scale the number of threads, dynamically based on the overall available resources of the system and the current system conditions. This means that dispatch queues can manage the threads with greater efficiency than we could. Another advantage of dispatch queues is we are able to control the order that our tasks are started. With serial queues, not only do we control the order in which tasks are started, but also ensure that one task does not start before the preceding one is complete. With traditional threads, this can be very cumbersome and brittle to implement, but with dispatch queues, as we will see later in this artcle, it is quite easy. Creating and managing dispatch queues Let's look at how to create and use a dispatch queue. The following three functions are used to create or retrieve queues. These functions are as follows: dispatch_queue_create: This creates a dispatch queue of either the concurrent or serial type dispatch_get_global_queue: This returns a system-defined global concurrent queue with a specified quality of service dispatch_get_main_queue: This returns the serial dispatch queue associated with the application's main thread We will also be looking at several functions that submit tasks to a queue for execution. These functions are as follows: dispatch_async: This submits a task for asynchronous execution and returns immediately. dispatch_sync: This submits a task for synchronous execution and waits until it is complete before it returns. dispatch_after: This submits a task for execution at a specified time. dispatch_once: This submits a task to be executed once and only once while this application is running. It will execute the task again if the application restarts. Before we look at how to use the dispatch queues, we need to create a class that will help us demonstrate how the various types of queues work. This class will contain two basic functions. The first function will simply perform some basic calculations and then return a value. Here is the code for this function, which is named doCalc(): func doCalc() { var x=100 var y = x*x _ = y/x } The other function, which is named performCalculation(), accepts two parameters. One is an integer named iterations, and the other is a string named tag. The performCalculation () function calls the doCalc() function repeatedly until it calls the function the same number of times as defined by the iterations parameter. We also use the CFAbsoluteTimeGetCurrent() function to calculate the elapsed time it took to perform all of the iterations and then print the elapse time with the tag string to the console. This will let us know when the function completes and how long it took to complete it. 
The code for this function looks similar to this:

func performCalculation(iterations: Int, tag: String) {
    let start = CFAbsoluteTimeGetCurrent()
    for var i = 0; i < iterations; i++ {
        self.doCalc()
    }
    let end = CFAbsoluteTimeGetCurrent()
    print("time for \(tag): \(end - start)")
}

These functions will be used together to keep our queues busy, so we can see how they work. Let's begin by looking at the GCD functions, using the dispatch_queue_create() function to create both concurrent and serial queues.

Creating queues with the dispatch_queue_create() function

The dispatch_queue_create() function is used to create both concurrent and serial queues. The syntax of the dispatch_queue_create() function looks similar to this:

func dispatch_queue_create(label: UnsafePointer<Int8>, attr: dispatch_queue_attr_t!) -> dispatch_queue_t!

It takes the following parameters:

label: This is a string label that is attached to the queue to uniquely identify it in debugging tools, such as Instruments and crash reports. It is recommended that we use a reverse DNS naming convention. This parameter is optional and can be nil.
attr: This specifies the type of queue to make. This can be DISPATCH_QUEUE_SERIAL, DISPATCH_QUEUE_CONCURRENT, or nil. If this parameter is nil, a serial queue is created.

The return value for this function is the newly created dispatch queue. Let's see how to use the dispatch_queue_create() function by creating a concurrent queue and seeing how it works.

Some programming languages use the reverse DNS naming convention to name certain components. This convention is based on a registered domain name that is reversed. As an example, if we worked for a company that had a domain name mycompany.com with a product called widget, the reverse DNS name will be com.mycompany.widget.

Creating concurrent dispatch queues with the dispatch_queue_create() function

The following line creates a concurrent dispatch queue with the label of cqueue.hoffman.jon:

let queue = dispatch_queue_create("cqueue.hoffman.jon", DISPATCH_QUEUE_CONCURRENT)

As we saw in the beginning of this section, there are several functions that we can use to submit tasks to a dispatch queue. When we work with queues, we generally want to use the dispatch_async() function to submit tasks because when we submit a task to a queue, we usually do not want to wait for a response. The dispatch_async() function has the following signature:

func dispatch_async(queue: dispatch_queue_t!, block: dispatch_block_t!)

The following example shows how to use the dispatch_async() function with the concurrent queue we just created:

let c = {
    calculation.performCalculation(1000, tag: "async0")
}
dispatch_async(queue, c)

In the preceding code, we created a closure, which represents our task, that simply calls the performCalculation() function of the DoCalculation instance, requesting that it runs through 1000 iterations of the doCalc() function. Finally, we use the dispatch_async() function to submit the task to the concurrent dispatch queue. This code will execute the task in a concurrent dispatch queue, which is separate from the main thread.

While the preceding example works perfectly, we can actually shorten the code a little bit. The next example shows that we do not need to create a separate closure as we did in the preceding example; we can also submit the task to execute like this:

dispatch_async(queue) {
    calculation.performCalculation(10000000, tag: "async1")
}

This shorthand version is how we usually submit small code blocks to our queues.
If we have larger tasks, or tasks that we need to submit multiple times, we will generally want to create a closure and submit the closure to the queue as we showed originally.

Let's see how the concurrent queue actually works by adding several items to the queue and looking at the order and time in which they return. The following code will add three tasks to the queue. Each task will call the performCalculation() function with various iteration counts. Remember that the performCalculation() function will execute the calculation routine continuously until it has been executed the number of times defined by the iteration count passed in. Therefore, the larger the iteration count we pass into the performCalculation() function, the longer it should take to execute. Let's take a look at the following code:

dispatch_async(queue) {
    calculation.performCalculation(10000000, tag: "async1")
}
dispatch_async(queue) {
    calculation.performCalculation(1000, tag: "async2")
}
dispatch_async(queue) {
    calculation.performCalculation(100000, tag: "async3")
}

Notice that each of the functions is called with a different value in the tag parameter. Since the performCalculation() function prints out the tag variable with the elapsed time, we can see the order in which the tasks complete and the time it took to execute. If we execute the preceding code, we should see the following results:

time for async2: 0.000200986862182617
time for async3: 0.00800204277038574
time for async1: 0.461670994758606

The elapsed time will vary from one run to the next and from system to system.

Since the queues function in a FIFO order, the task that had the tag of async1 was executed first. However, as we can see from the results, it was the last task to finish. Since this is a concurrent queue, if it is possible (if the system has available resources), the blocks of code will execute concurrently. This is why the tasks with the tags of async2 and async3 completed prior to the task that had the async1 tag, even though the execution of the async1 task began before the other two. Now, let's see how a serial queue executes tasks.

Creating a serial dispatch queue with the dispatch_queue_create() function

A serial queue functions a little differently than a concurrent queue. A serial queue will only execute one task at a time and will wait for one task to complete before starting the next task. This queue, like the concurrent dispatch queue, follows a first-in, first-out order. The following line of code will create a serial queue with the label of squeue.hoffman.jon:

let queue2 = dispatch_queue_create("squeue.hoffman.jon", DISPATCH_QUEUE_SERIAL)

Notice that we create the serial queue with the DISPATCH_QUEUE_SERIAL attribute. If you recall, when we created the concurrent queue, we created it with the DISPATCH_QUEUE_CONCURRENT attribute. We can also set this attribute to nil, which will create a serial queue by default. However, it is recommended to always set the attribute to either DISPATCH_QUEUE_SERIAL or DISPATCH_QUEUE_CONCURRENT to make it easier to identify which type of queue we are creating.

As we saw with the concurrent dispatch queues, we generally want to use the dispatch_async() function to submit tasks because when we submit a task to a queue, we usually do not want to wait for a response. If, however, we did want to wait for a response, we would use the dispatch_sync() function.
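As a quick aside, here is a minimal, hedged sketch of what waiting with dispatch_sync() might look like; it is not part of the original example code, and the queue label and messages are our own:

let waitQueue = dispatch_queue_create("wqueue.hoffman.jon", DISPATCH_QUEUE_SERIAL)

// dispatch_sync() blocks the calling thread until the submitted
// task has finished executing on the target queue.
dispatch_sync(waitQueue) {
    print("running on the background serial queue")
}
print("this line only runs after the task above has completed")

// Caution: never call dispatch_sync() targeting the queue that the
// current code is already running on (for example, the main queue
// from the main thread); the queue would wait on itself and deadlock.

Most of the time, though, we will stick with dispatch_async(), as the following examples do.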
var calculation = DoCalculations()
let c = {
    calculation.performCalculation(1000, tag: "sync0")
}
dispatch_async(queue2, c)

Just like with the concurrent queues, we do not need to create a closure to submit a task to the queue. We can also submit the task like this:

dispatch_async(queue2) {
    calculation.performCalculation(100000, tag: "sync1")
}

Let's see how the serial queue works by adding several items to the queue and looking at the order and time in which they complete. The following code will add three tasks, which will call the performCalculation() function with various iteration counts, to the queue:

dispatch_async(queue2) {
    calculation.performCalculation(100000, tag: "sync1")
}
dispatch_async(queue2) {
    calculation.performCalculation(1000, tag: "sync2")
}
dispatch_async(queue2) {
    calculation.performCalculation(100000, tag: "sync3")
}

Just like with the concurrent queue example, we call the performCalculation() function with various iteration counts and different values in the tag parameter. Since the performCalculation() function prints out the tag string with the elapsed time, we can see the order in which the tasks complete and the time it takes to execute. If we execute this code, we should see the following results:

time for sync1: 0.00648999214172363
time for sync2: 0.00009602308273315
time for sync3: 0.00515800714492798

The elapsed time will vary from one run to the next and from system to system.

Unlike the concurrent queues, we can see that the tasks completed in the same order that they were submitted, even though the sync2 and sync3 tasks took considerably less time to complete. This demonstrates that a serial queue only executes one task at a time and that the queue waits for each task to complete before starting the next one.

Now that we have seen how to use the dispatch_queue_create() function to create both concurrent and serial queues, let's look at how we can get one of the four system-defined, global concurrent queues using the dispatch_get_global_queue() function.

Requesting concurrent queues with the dispatch_get_global_queue() function

The system provides each application with four concurrent global dispatch queues of different priority levels. The different priority levels are what distinguish these queues. The four priorities are:

DISPATCH_QUEUE_PRIORITY_HIGH: The items in this queue run with the highest priority and are scheduled before items in the default and low priority queues
DISPATCH_QUEUE_PRIORITY_DEFAULT: The items in this queue run at the default priority and are scheduled before items in the low priority queue but after items in the high priority queue
DISPATCH_QUEUE_PRIORITY_LOW: The items in this queue run with a low priority and are scheduled only after items in the high and default queues
DISPATCH_QUEUE_PRIORITY_BACKGROUND: The items in this queue run with a background priority, which has the lowest priority

Since these are global queues, we do not need to actually create them; instead, we ask for a reference to the queue with the priority level needed. To request a global queue, we use the dispatch_get_global_queue() function. This function has the following syntax:

func dispatch_get_global_queue(identifier: Int, flags: UInt) -> dispatch_queue_t!
Here, the following parameters are defined:

identifier: This is the priority of the queue we are requesting
flags: This is reserved for future expansion and should be set to zero at this time

We request a queue using the dispatch_get_global_queue() function, as shown in the following example:

let queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0)

In this example, we are requesting the global queue with the default priority. We can then use this queue exactly as we used the concurrent queues that we created with the dispatch_queue_create() function. The difference between the queues returned with the dispatch_get_global_queue() function and the ones created with the dispatch_queue_create() function is that with the dispatch_queue_create() function, we are actually creating a new queue. The queues that are returned with the dispatch_get_global_queue() function are global queues that are created when our application first starts; therefore, we are requesting a queue rather than creating a new one. When we use the dispatch_get_global_queue() function, we avoid the overhead of creating the queue; therefore, I recommend using the dispatch_get_global_queue() function unless you have a specific reason to create a queue.

Requesting the main queue with the dispatch_get_main_queue() function

The dispatch_get_main_queue() function returns the main queue for our application. The main queue is automatically created for the main thread when the application starts. This main queue is a serial queue; therefore, items in this queue are executed one at a time, in the order that they were submitted. We will generally want to avoid using this queue unless we need to update the user interface from a background thread. The dispatch_get_main_queue() function has the following syntax:

func dispatch_get_main_queue() -> dispatch_queue_t!

The following code example shows how to request the main queue:

let mainQueue = dispatch_get_main_queue()

We will then submit tasks to the main queue exactly as we would any other serial queue. Just remember that anything submitted to this queue will run on the main thread, which is the thread that all the user interface updates run on; therefore, if we submit a long-running task, the user interface will freeze until that task is completed.

In the previous sections, we saw how the dispatch_async() function submits tasks to concurrent and serial queues. Now, let's look at two additional functions that we can use to submit tasks to our queues. The first function we will look at is the dispatch_after() function.

Using the dispatch_after() function

There will be times that we need to execute tasks after a delay. If we were using a threading model, we would need to create a new thread, perform some sort of delay or sleep function, and execute our task. With GCD, we can use the dispatch_after() function. The dispatch_after() function takes the following syntax:

func dispatch_after(when: dispatch_time_t, queue: dispatch_queue_t, block: dispatch_block_t)

Here, the dispatch_after() function takes the following parameters:

when: This is the time at which we wish the queue to execute our task
queue: This is the queue that we want to execute our task in
block: This is the task to execute

As with the dispatch_async() and dispatch_sync() functions, we do not need to include our task as a parameter. We can include the task to execute between two curly brackets, exactly as we did previously with the dispatch_async() and dispatch_sync() functions.
As we can see from the dispatch_after() function, we use the dispatch_time_t type to define the time to execute the task. We use the dispatch_time() function to create the dispatch_time_t type. The dispatch_time() function has the following syntax:

func dispatch_time(when: dispatch_time_t, delta: Int64) -> dispatch_time_t

Here, the dispatch_time() function takes the following parameters:

when: This value is used as the basis for the time to execute the task. We generally pass the DISPATCH_TIME_NOW value to create the time, based on the current time.
delta: This is the number of nanoseconds to add to the when parameter to get our time.

We will use the dispatch_time() and dispatch_after() functions like this:

let delayInSeconds = 2.0
let eTime = dispatch_time(DISPATCH_TIME_NOW, Int64(delayInSeconds * Double(NSEC_PER_SEC)))
dispatch_after(eTime, queue2) {
    print("Times Up")
}

The preceding code will execute the task after a two-second delay. In the dispatch_time() function, we create a dispatch_time_t value that is two seconds in the future. The NSEC_PER_SEC constant is used to convert seconds into nanoseconds. After the two-second delay, we print the message, Times Up, to the console.

There is one thing to watch out for with the dispatch_after() function. Let's take a look at the following code:

let queue2 = dispatch_queue_create("squeue.hoffman.jon", DISPATCH_QUEUE_SERIAL)
let delayInSeconds = 2.0
let pTime = dispatch_time(DISPATCH_TIME_NOW, Int64(delayInSeconds * Double(NSEC_PER_SEC)))
dispatch_after(pTime, queue2) {
    print("Times Up")
}
dispatch_sync(queue2) {
    calculation.performCalculation(100000, tag: "sync1")
}

In this code, we begin by creating a serial queue and then adding two tasks to the queue. The first task uses the dispatch_after() function, and the second task uses the dispatch_sync() function. Our initial thought would be that when we executed this code within the serial queue, the first task would execute after a two-second delay and then the second task would execute; however, this would not be correct. The dispatch_after() call submits the first task and returns immediately, which lets the queue execute the next task while it waits for the correct time to run the first one. Therefore, even though we are running the tasks in a serial queue, the second task completes before the first task. The following is an example of the output if we run the preceding code:

time for sync1: 0.00407701730728149
Times Up

The final GCD function that we are going to look at is dispatch_once().

Using the dispatch_once() function

The dispatch_once() function will execute a task once, and only once, for the lifetime of the application. What this means is that the task will be executed and marked as executed, and then that task will not be executed again unless the application restarts. While the dispatch_once() function can be and has been used to implement the singleton pattern, there are other easier ways to do this. The dispatch_once() function is great for executing initialization tasks that need to run when our application initially starts. These initialization tasks can consist of initializing our data store or our variables and objects. The following code shows the syntax for the dispatch_once() function:

func dispatch_once(predicate: UnsafeMutablePointer<dispatch_once_t>, block: dispatch_block_t!)
Let's look at how to use the dispatch_once() function: var token: dispatch_once_t = 0 func example() { dispatch_once(&token) { print("Printed only on the first call") } print("Printed for each call") } In this example, the line that prints the message, Printed only on the first call, will be executed only once, no matter how many times the function is called. However, the line that prints the Printed for each call message will be executed each time the function is called. Let's see this in action by calling this function four times, like this: for i in 0..<4 { example() } If we execute this example, we should see the following output: Printed only on the first call Printed for each call Printed for each call Printed for each call Printed for each call Notice, in this example, that we only see the Printed only on the first call message once whereas we see the Printed for each call message all the four times that we call the function. Now that we have looked at GCD, let's take a look at operation queues. Using NSOperation and NSOperationQueue types The NSOperation and NSOperationQueues types, working together, provide us with an alternative to GCD for adding concurrency to our applications. Operation queues are Cocoa objects that function like dispatch queues and internally, operation queues are implemented using GCD. We define the tasks (NSOperations) that we wish to execute and then add the task to the operation queue (NSOperationQueue). The operation queue will then handle the scheduling and execution of tasks. Operation queues are instances of the NSOperationQueue class and operations are instances of the NSOperation class. The operation represents a single unit of work or task. The NSOperation type is an abstract class that provides a thread-safe structure for modeling the state, priority, and dependencies. This class must be subclassed in order to perform any useful work. Apple does provide two concrete implementations of the NSOperation type that we can use as-is for situations where it does not make sense to build a custom subclass. These subclasses are NSBlockOperation and NSInvocationOperation. More than one operation queue can exist at the same time, and actually, there is always at least one operation queue running. This operation queue is known as the main queue. The main queue is automatically created for the main thread when the application starts and is where all the UI operations are performed. There are several ways that we can use the NSOperation and NSOperationQueues classes to add concurrency to our application. In this artcle, we will look at three different ways. The first one we will look at is using the NSBlockOperation implementation of the NSOperation abstract class. Using the NSBlockOperation implementation of NSOperation In this section, we will be using the same DoCalculation class that we used in the Grand Central Dispatch section to keep our queues busy with work so that we can see how the NSOpererationQueues class work. The NSBlockOperation class is a concrete implementation of the NSOperation type that can manage the execution of one or more blocks. This class can be used to execute several tasks at once without the need to create separate operations for each task. Let's see how to use the NSBlockOperation class to add concurrency to our application. 
The following code shows how to add three tasks to an operation queue using a single NSBlockOperation instance:

let calculation = DoCalculations()
let operationQueue = NSOperationQueue()
let blockOperation1: NSBlockOperation = NSBlockOperation.init(block: {
    calculation.performCalculation(10000000, tag: "Operation 1")
})
blockOperation1.addExecutionBlock({
    calculation.performCalculation(10000, tag: "Operation 2")
})
blockOperation1.addExecutionBlock({
    calculation.performCalculation(1000000, tag: "Operation 3")
})
operationQueue.addOperation(blockOperation1)

In this code, we begin by creating an instance of the DoCalculations class and an instance of the NSOperationQueue class. Next, we create an instance of the NSBlockOperation class using the init constructor. This constructor takes a single parameter, which is a block of code that represents one of the tasks we want to execute in the queue. Next, we add two additional tasks to the NSBlockOperation instance using the addExecutionBlock() method.

This is one of the differences between dispatch queues and operations. With dispatch queues, if resources are available, the tasks are executed as they are added to the queue. With operations, the individual tasks are not executed until the operation itself is submitted to an operation queue.

Once we add all of the tasks to the NSBlockOperation instance, we then add the operation to the NSOperationQueue instance that we created at the beginning of the code. At this point, the individual tasks within the operation start to execute.

This example shows how to use NSBlockOperation to queue up multiple tasks and then pass the tasks to the operation queue. The tasks are started in a FIFO order; therefore, the first task that is added to the NSBlockOperation instance will be the first task executed. However, since the tasks can be executed concurrently if we have the available resources, the output from this code should look similar to this:

time for Operation 2: 0.00546294450759888
time for Operation 3: 0.0800899863243103
time for Operation 1: 0.484337985515594

What if we do not want our tasks to run concurrently? What if we wanted them to run serially, like the serial dispatch queue? We can set a property in our operation queue that defines the number of tasks that can be run concurrently in the queue. The property is called maxConcurrentOperationCount and is used like this:

operationQueue.maxConcurrentOperationCount = 1

However, if we added this line to our previous example, it would not work as expected. To see why this is, we need to understand what the property actually defines. If we look at Apple's NSOperationQueue class reference, the definition of the property says, "The maximum number of queued operations that can execute at the same time." What this tells us is that the maxConcurrentOperationCount property defines the number of operations (this is the key word) that can be executed at the same time. The NSBlockOperation instance, which we added all of our tasks to, represents a single operation; therefore, no other NSBlockOperation added to the queue will execute until the first one is complete, but the individual tasks within the operation will execute concurrently. To run the tasks serially, we would need to create a separate NSBlockOperation instance for each task (a sketch of this follows below). Using an instance of the NSBlockOperation class is a good approach if we have a number of tasks that we want to execute concurrently, but they will not start executing until we add the operation to an operation queue.
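To make the one-operation-per-task idea concrete, here is a minimal, hedged sketch that is not part of the original text; it reuses the DoCalculations class from earlier, and the tag strings are our own:

import Foundation

let calculation = DoCalculations()
let serialOperationQueue = NSOperationQueue()

// Allow only one operation to execute at a time. Because each task
// below is wrapped in its own NSBlockOperation, the tasks now run
// one after another instead of concurrently.
serialOperationQueue.maxConcurrentOperationCount = 1

serialOperationQueue.addOperation(NSBlockOperation(block: {
    calculation.performCalculation(10000000, tag: "Serial 1")
}))
serialOperationQueue.addOperation(NSBlockOperation(block: {
    calculation.performCalculation(10000, tag: "Serial 2")
}))
serialOperationQueue.addOperation(NSBlockOperation(block: {
    calculation.performCalculation(1000000, tag: "Serial 3")
}))

With this arrangement, the operation tagged Serial 2 should not start until Serial 1 has finished, and Serial 3 should not start until Serial 2 has finished.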
Let's look at a simpler way of adding tasks to an operation queue using the queues addOperationWithBlock() methods. Using the addOperationWithBlock() method of the operation queue The NSOperationQueue class has a method named addOperationWithBlock() that makes it easy to add a block of code to the queue. This method automatically wraps the block of code in an operation object and then passes that operation to the queue itself. Let's see how to use this method to add tasks to a queue: let operationQueue = NSOperationQueue() let calculation = DoCalculations() operationQueue.addOperationWithBlock() { calculation.performCalculation(10000000, tag: "Operation1") } operationQueue.addOperationWithBlock() { calculation.performCalculation(10000, tag: "Operation2") } operationQueue.addOperationWithBlock() { calculation.performCalculation(1000000, tag: "Operation3") } In the NSBlockOperation example, earlier in this artcle, we added the tasks that we wished to execute into an NSBlockOperation instance. In this example, we are adding the tasks directly to the operation queue, and each task represents one complete operation. Once we create the instance of the operation queue, we then use the addOperationWithBlock() method to add the tasks to the queue. Also, in the NSBlockOperation example, the individual tasks did not execute until all of the tasks were added to the NSBlockOperation object and then that operation was added to the queue. This addOperationWithBlock() example is similar to the GCD example where the tasks begin executing as soon as they are added to the operation queue. If we run the preceding code, the output should be similar to this: time for Operation2: 0.0115870237350464 time for Operation3: 0.0790849924087524 time for Operation1: 0.520610988140106 You will notice that the operations are executed concurrently. With this example, we can execute the tasks serially by using the maxConcurrentOperationCount property that we mentioned earlier. Let's try this by initializing the NSOperationQueue instance like this: var operationQueue = NSOperationQueue() operationQueue.maxConcurrentOperationCount = 1 Now, if we run the example, the output should be similar to this: time for Operation1: 0.418763995170593 time for Operation2: 0.000427007675170898 time for Operation3: 0.0441589951515198 In this example, we can see that each task waited for the previous task to complete prior to starting. Using the addOperationWithBlock() method to add tasks, the operation queue is generally easier than using the NSBlockOperation method; however, the tasks will begin as soon as they are added to the queue, which is usually the desired behavior. Now, let's look at how we can subclass the NSOperation class to create an operation that we can add directly to an operation queue. Subclassing the NSOperation class The previous two examples showed how to add small blocks of code to our operation queues. In these examples, we called the performCalculations method in the DoCalculation class to perform our tasks. These examples illustrate two really good ways to add concurrency for functionally that is already written, but what if, at design time, we want to design our DoCalculation class for concurrency? For this, we can subclass the NSOperation class. The NSOperation abstract class provides a significant amount of infrastructure. This allows us to very easily create a subclass without a lot of work. We should at least provide an initialization method and a main method. 
The main method will be called when the queue begins executing the operation: Let's see how to implement the DoCalculation class as a subclass of the NSOperation class; we will call this new class MyOperation: class MyOperation: NSOperation { let iterations: Int let tag: String init(iterations: Int, tag: String) { self.iterations = iterations self.tag = tag } override func main() { performCalculation() } func performCalculation() { let start = CFAbsoluteTimeGetCurrent() for var i=0; i<iterations; i++ { self.doCalc() } let end = CFAbsoluteTimeGetCurrent() print("time for (tag): (end-start)") } func doCalc() { let x=100 let y = x*x _ = y/x } } We begin by defining that the MyOperation class is a subclass of the NSOperation class. Within the implementation of the class, we define two class constants, which represent the iteration count and the tag that the performCalculations() method uses. Keep in mind that when the operation queue begins executing the operation, it will call the main() method with no parameters; therefore, any parameters that we need to pass in must be passed in through the initializer. In this example, our initializer takes two parameters that are used to set the iterations and tag classes constants. Then the main() method, that the operation queue is going to call to begin execution of the operation, simply calls the performCalculation() method. We can now very easily add instances of our MyOperation class to an operation queue, like this: var operationQueue = NSOperationQueue() operationQueue.addOperation(MyOperation (iterations: 10000000, tag: "Operation 1")) operationQueue.addOperation(MyOperation (iterations: 10000, tag: "Operation 2")) operationQueue.addOperation(MyOperation (iterations: 1000000, tag: "Operation 3")) If we run this code, we will see the following results: time for Operation 2: 0.00187397003173828 time for Operation 3: 0.104826986789703 time for Operation 1: 0.866684019565582 As we saw earlier, we can also execute the tasks serially by adding the following line, which sets the maxConcurrentOperationCount property of the operation queue: operationQueue.maxConcurrentOperationCount = 1 If we know that we need to execute some functionality concurrently prior to writing the code, I will recommend subclassing the NSOperation class, as shown in this example, rather than using the previous examples. This gives us the cleanest implementation; however, there is nothing wrong with using the NSBlockOperation class or the addOperationWithBlock() methods described earlier in this section. Summary Before we consider adding concurrency to our application, we should make sure that we understand why we are adding it and ask ourselves whether it is necessary. While concurrency can make our application more responsive by offloading work from our main application thread to a background thread, it also adds extra complexity to our code and overhead to our application. I have even seen numerous applications, in various languages, which actually run better after we pulled out some of the concurrency code. This is because the concurrency was not well thought out or planned. With this in mind, it is always a good idea to think and talk about concurrency while we are discussing the application's expected behavior. At the start of this artcle, we had a discussion about running tasks concurrently compared to running tasks in parallel. We also discussed the hardware limitation that limits how many tasks can run in parallel on a given device. 
Having a good understanding of those concepts is very important to understanding how and when to add concurrency to our projects. While GCD is not limited to system-level applications, before we use it in our application, we should consider whether operation queues would be easier and more appropriate for our needs. In general, we should use the highest level of abstraction that meets our needs. This will usually point us to using operation queues; however, there really is nothing preventing us from using GCD, and it may be more appropriate for our needs. One thing to keep in mind with operation queues is that they do add additional overhead because they are Cocoa objects. For the large majority of applications, this little extra overhead should not be an issue or even noticed; however, for some projects, such as games that need every last resource that they can get, this extra overhead might very well be an issue. Resources for Article: Further resources on this subject: Swift for Open Source Developers [article] Your First Swift 2 Project [article] Exploring Swift [article]

Component Composition

Packt
22 Feb 2016
38 min read
In this article, we understand how large-scale JavaScript applications amount to a series of communicating components. Composition is a big topic, and one that's relevant to scalable JavaScript code. When we start thinking about the composition of our components, we start to notice certain flaws in our design; limitations that prevent us from scaling in response to influencers. (For more resources related to this topic, see here.) The composition of a component isn't random—there's a handful of prevalent patterns for JavaScript components. We'll begin the article with a look at some of these generic component types that encapsulate common patterns found in every web application. Understanding that components implement patterns is crucial for extending these generic components in a way that scales. It's one thing to get our component composition right from a purely technical standpoint, it's another to easily map these components to features. The same challenge holds true for components we've already implemented. The way we compose our code needs to provide a level of transparency, so that it's feasible to decompose our components and understand what they're doing, both at runtime and at design time. Finally, we'll take a look at the idea of decoupling business logic from our components. This is nothing new, the idea of separation-of-concerns has been around for a long time. The challenge with JavaScript applications is that it touches so many things—it's difficult to clearly separate business logic from other implementation concerns. The way in which we organize our source code (relative to the components that use them) can have a dramatic effect on our ability to scale. Generic component types It's exceedingly unlikely that anyone, in this day and age, would set out to build a large scale JavaScript application without the help of libraries, a framework, or both. Let's refer to these collectively as tools, since we're more interested in using the tools that help us scale, and not necessarily which tools are better than other tools. At the end of the day, it's up to the development team to decide which tool is best for the application we're building, personal preferences aside. Guiding factors in choosing the tools we use are the type of components they provide, and what these are capable of. For example, a larger web framework may have all the generic components we need. On the other hand, a functional programming utility library might provide a lot of the low-level functionality we need. How these things are composed into a cohesive feature that scales, is for us to figure out. The idea is to find tools that expose generic implementations of the components we need. Often, we'll extend these components, building specific functionality that's unique to our application. This section walks through the most typical components we'd want in a large-scale JavaScript application. Modules Modules exist, in one form or another, in almost every programming language. Except in JavaScript. That's almost untrue though—ECMAScript 6, in it's final draft status at the time of this writing, introduces the notion of modules. However, there're tools out there today that allow us to modularize our code, without relying on the script tag. Large-scale JavaScript code is still a relatively new thing. Things like the script tag weren't meant to address issues like modular code and dependency management. RequireJS is probably the most popular module loader and dependency resolver. 
The fact that we need a library just to load modules into our front-end application speaks of the complexities involved. For example, module dependencies aren't a trivial matter when there's network latency and race conditions to consider. Another option is to use a transpiler like Browserify. This approach is gaining traction because it lets us declare our modules using the CommonJS format. This format is used by NodeJS, and the upcoming ECMAScript module specification is a lot closer to CommonJS than to AMD. The advantage is that the code we write today has better compatibility with back-end JavaScript code, and with the future. Some frameworks, like Angular or Marionette, have their own ideas of what modules are, albeit, more abstract ideas. These modules are more about organizing our code, than they are about tactfully delivering code from the server to the browser. These types of modules might even map better to other features of the framework. For example, if there's a centralized application instance that's used to manage our modules, the framework might provide a mean to manage modules from the application. Take a look at the following diagram: A global application component using modules as it's building blocks. Modules can be small, containing only one feature, or large, containing several features This lets us perform higher-level tasks at the module level (things like disabling modules or configuring them with arguments). Essentially, modules speak for features. They're a packaging mechanism that allows us to encapsulate things about a given feature that the rest of the application doesn't care about. Modules help us scale our application by adding high-level operations to our features, by treating our features as the building blocks. Without modules, we'd have no meaningful way to do this. The composition of modules look different depending on the mechanism used to declare the module. A module could be straightforward, providing a namespace from which objects can be exported. Or if we're using a specific framework module flavor, there could be much more to it. Like automatic event life cycles, or methods for performing boilerplate setup tasks. However we slice it, modules in the context of scalable JavaScript are a means to create larger building blocks, and a means to handle complex dependencies: // main.js // Imports a log() function from the util.js model. import log from 'util.js'; log('Initializing...'); // util.js // Exports a basic console.log() wrapper function. 'use strict'; export default function log(message) { if (console) { console.log(message); } } While it's easier to build large-scale applications with module-sized building blocks, it's also easier to tear a module out of an application and work with it in isolation. If our application is monolithic or our modules are too plentiful and fine-grained, it's very difficult for us to excise problem-spots from our code, or to test work in progress. Our component may function perfectly well on its own. It could have negative side-effects somewhere else in the system, however. If we can remove pieces of the puzzle, one at a time and without too much effort, we can scale the trouble-shooting process. Routers Any large-scale JavaScript application has a significant number of possible URIs. The URI is the address of the page that the user is looking at. They can navigate to this resource by clicking on links, or they may be taken to a new URI automatically by our code, perhaps in response to some user action. 
The web has always relied on URIs, long before the advent of large-scale JavaScript applications. URIs point to resources, and resources can be just about anything. The larger the application, the more resources, and the more potential URIs. Router components are tools we use in the front-end, to listen for these URI change events and respond to them accordingly. There's less reliance on the back-end web servers parsing the URI, and returning the new content. Most web sites still do this, but there're several disadvantages with this approach when it comes to building applications: The browser triggers events when the URI changes, and the router component responds to these changes. The URI changes can be triggered from the history API, or from location.hash The main problem is that we want the UI to be portable, as in, we want to be able to deploy it against any back-end and things should work. Since we're not assembling markup for the URI in the back-end, it doesn't make sense to parse the URI in the back-end either. We declaratively specify all the URI patterns in our router components. We generally refer to these as, routes. Think of a route as a blueprint, and a URI as an instance of that blueprint. This means that when the router receives a URI, it can correlate it to a route. That, in essence, is the responsibility of router components. Which is easy with smaller applications, but when we're talking about scale, further deliberation on router design is in order. As a starting point, we have to consider the URI mechanism we want to use. The two choices are basically listening to hash change events, or utilizing the history API. Using hash-bang URIs is probably the simplest approach. The history API available in every modern browser, on the other hand, lets us format URI's without the hash-bang—they look like real URIs. The router component in the framework we're using may support only one or the other, thus simplifying the decision. Some support both URI approaches, in which case we need to decide which one works best for our application. The next thing to consider about routing in our architecture is how to react to route changes. There're generally two approaches to this. The first is to declaratively bind a route to a callback function. This is ideal when the router doesn't have a lot of routes. The second approach is to trigger events when routes are activated. This means that there's nothing directly bound to the router. Instead, some other component listens for such an event. This approach is beneficial when there are lots of routes, because the router has no knowledge of the components, just the routes. Here's an example that shows a router component listening to route events: // router.js import Events from 'events.js' // A router is a type of event broker, it // can trigger routes, and listen to route // changes. export default class Router extends Events { // If a route configuration object is passed, // then we iterate over it, calling listen() // on each route name. This is translating from // route specs to event listeners. constructor(routes) { super(); if (routes != null) { for (let key of Object.keys(routes)) { this.listen(key, routes[key]); } } } // This is called when the caller is ready to start // responding to route events. We listen to the // "onhashchange" window event. We manually call // our handler here to process the current route. 
start() { window.addEventListener('hashchange', this.onHashChange.bind(this)); this.onHashChange(); } // When there's a route change, we translate this into // a triggered event. Remember, this router is also an // event broker. The event name is the current URI. onHashChange() { this.trigger(location.hash, location.hash); } }; // Creates a router instance, and uses two different // approaches to listening to routes. // // The first is by passing configuration to the Router. // The key is the actual route, and the value is the // callback function. // // The second uses the listen() method of the router, // where the event name is the actual route, and the // callback function is called when the route is activated. // // Nothing is triggered until the start() method is called, // which gives us an opportunity to set everything up. For // example, the callback functions that respond to routes // might require something to be configured before they can // run. import Router from 'router.js' function logRoute(route) { console.log(`${route} activated`); } var router = new Router({ '#route1': logRoute }); router.listen('#route2', logRoute); router.start(); Some of the code required to run these examples is omitted from the listings. For example, the events.js module is included in the code bundle that comes with this book, it's just not that relevant to the example. Also in the interest of space, the code examples avoid using specific frameworks and libraries. In practice, we're not going to write our own router or events API—our frameworks do that already. We're instead using vanillaES6 JavaScript, to illustrate points pertinent to scaling our applications Another architectural consideration we'll want to make when it comes to routing is whether we want a global, monolithic router, or a router per module, or some other component. The downside to having a monolithic router is that it becomes difficult to scale when it grows sufficiently large, as we keep adding features and routes. The advantage is that the routes are all declared in one place. Monolithic routers can still trigger events that all our components can listen to. The per-module approach to routing involves multiple router instances. For example, if our application has five components, each would have their own router. The advantage here is that the module is completely self-contained. Anyone working with this module doesn't need to look elsewhere to figure out which routes it responds to. Using this approach, we can also have a tighter coupling between the route definitions and the functions that respond to them, which could mean simpler code. The downside to this approach is that we lose the consolidated aspect of having all our routes declared in a central place. Take a look at the following diagram: The router to the left is global—all modules use the same instance to respond to URI events. The modules to the right have their own routers. These instances contain configuration specific to the module, not the entire application Depending on the capabilities of the framework we're using, the router components may or may not support multiple router instances. It may only be possible to have one callback function per route. There may be subtle nuances to the router events we're not yet aware of. Models/Collections The API our application interacts with exposes entities. Once these entities have been transferred to the browser, we will store a model of those entities. 
Collections are a bunch of related entities, usually of the same type. The tools we're using may or may not provide a generic model and/or collection components, or they may have something similar but named differently. The goal of modeling API data is a rough approximation of the API entity. This could be as simple as storing models as plain JavaScript objects and collections as arrays. The challenge with simply storing our API entities as plain objects in arrays is that some other component is then responsible for talking to the API, triggering events when the data changes, and for performing data transformations. We want other components to be able to transform collections and models where needed, in order to fulfill their duties. But we don't want repetitive code, and it's best if we're able to encapsulate the common things like transformations, API calls, and event life cycles. Take a look at the next diagram: Models encapsulate interaction with APIs, parsing data, and triggering events when data changes. This leads to simpler code outside of the models Hiding the details of how the API data is loaded into the browser, or how we issue commands, helps us scale our application as we grow. As we add more entities to the API, the complexity of our code grows too. We can throttle this complexity by constraining the API interactions to our model and collection components. Another scalability issue we'll face with our models and collections is where they fit in the big picture. That is, our application is really just one big component, composed of smaller components. Our models and collections map well to our API, but not necessarily to features. API entities are more generic than specific features, and are often used by several features. Which leaves us with an open question—where do our models and collections fit into components? Here's an example that shows specific views extending generic views. The same model can be passed to both: // A super simple model class. class Model { constructor(first, last, age) { this.first = first; this.last = last; this.age = age; } } // The base view, with a name method that // generates some output. class BaseView { name() { return `${this.model.first} ${this.model.last}`; } } // Extends BaseView with a constructor that accepts // a model and stores a reference to it. class GenericModelView extends BaseView { constructor(model) { super(); this.model = model; } } // Extends GenericModelView with specific constructor // arguments. class SpecificModelView extends BaseView { constructor(first, last, age) { super(); this.model = new Model(...arguments); } } var properties = [ 'Terri', 'Hodges', 41 ]; // Make sure the data is the same in both views. // The name() method should return the same result... console.log('generic view', new GenericModelView(new Model(...properties)).name()); console.log('specific view', new SpecificModelView(...properties).name()); On one hand, components can be completely generic with regard to the models and collections they use. On the other hand, some components are specific with their requirements—they can directly instantiate their collections. Configuring generic components with specific models and collections at runtime only benefits us when the component truly is generic, and is used in several places. Otherwise, we might as well encapsulate the models within the components that use them. Choosing the right approach helps us scale. Because, not all our components will be entirely generic or entirely specific. 
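Before moving on to controllers and views, here's one way the encapsulation described above might look in code. This is only a rough sketch: it assumes the same events.js broker used in the earlier listings (with its listen() and trigger() methods), and the /api/users endpoint, the "sync" event name, and the sortBy() helper are illustrative assumptions rather than parts of any particular framework.

import Events from 'events.js';

// A collection that hides how its entities are loaded. It owns the
// API call, triggers a "sync" event when data arrives, and keeps a
// common transformation in one place.
class UserCollection extends Events {
  constructor(baseUrl = '/api/users') {
    super();
    this.baseUrl = baseUrl;
    this.items = [];
  }

  // Loads entities from the API and triggers a "sync" event.
  // Components that render this collection listen for "sync";
  // they never talk to the API themselves.
  fetch() {
    return window.fetch(this.baseUrl)
      .then((response) => response.json())
      .then((items) => {
        this.items = items;
        this.trigger('sync', this);
        return this;
      });
  }

  // A common transformation, kept here so that views don't have to
  // repeat it. Returns a new array and leaves the collection untouched.
  sortBy(key) {
    return [...this.items].sort(
      (a, b) => (a[key] < b[key] ? -1 : a[key] > b[key] ? 1 : 0));
  }
}

// Views listen for "sync" instead of performing the XHR work themselves.
var users = new UserCollection();
users.listen('sync', (collection) => {
  console.log('loaded', collection.items.length, 'users');
});
users.fetch();

Because the API interaction lives in one place, swapping the endpoint, or stubbing it out entirely during testing, doesn't touch the components that render the data.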
Controllers/Views Depending on the framework we're using, and the design patterns our team is following, controllers and views can represent different things. There are simply too many MV* pattern and style variations to provide a meaningful distinction in terms of scale. The minute differences have trade-offs relative to similar but different MV* approaches. For our purpose of discussing large-scale JavaScript code, we'll treat them as the same type of component. If we decide to separate the two concepts in our implementation, the ideas in this section will be relevant to both types. Let's stick with the term views for now, knowing that we're covering both views and controllers, conceptually. These components interact with several other component types, including routers, models or collections, and templates, which are discussed in the next section. When something happens, the user needs to be notified about it. The view's job is to update the DOM. This could be as simple as changing an attribute on a DOM element, or as involved as rendering a new template:

A view component updating the DOM in response to router and model events

A view can update the DOM in response to several types of events. A route could have changed. A model could have been updated. Or something more direct, like a method call on the view component. Updating the DOM is not as straightforward as one might think. There's the performance to think about: what happens when our view is flooded with events? There's the latency to think about: how long will this JavaScript call stack run before stopping and actually allowing the DOM to render? Another responsibility of our views is responding to DOM events. These are usually triggered by the user interacting with our UI. The interaction may start and end with our view. For example, depending on the state of something like user input or one of our models, we might update the DOM with a message. Or we might do nothing, if the event handler is debounced, for instance. A debounced function groups multiple calls into one. For example, calling foo() 20 times in 10 milliseconds may only result in the implementation of foo() being called once. For a more detailed explanation, look at: http://drupalmotion.com/article/debounce-and-throttle-visual-explanation. Most of the time, the DOM events get translated into something else, either a method call or another event. For example, we might call a method on a model, or transform a collection. The end result, most of the time, is that we provide feedback by updating the DOM. This can be done either directly, or indirectly. In the case of direct DOM updates, it's simple to scale. In the case of indirect updates, or updates through side effects, scaling becomes more of a challenge. This is because, as the application acquires more moving parts, it becomes more difficult to form a mental map of cause and effect. Here's an example that shows a view listening to DOM events and model events. import Events from 'events.js'; // A basic model. It extends "Events" so it // can listen to events triggered by other components. class Model extends Events { constructor(enabled) { super(); this.enabled = !!enabled; } // Setters and getters for the "enabled" property. // Setting it also triggers an event. So other components // can listen to the "enabled" event.
set enabled(enabled) { this._enabled = enabled; this.trigger('enabled', enabled); } get enabled() { return this._enabled; } } // A view component that takes a model and a DOM element // as arguments. class View { constructor(element, model) { // When the model triggers the "enabled" event, // we adjust the DOM. model.listen('enabled', (enabled) => { element.setAttribute('disabled', !enabled); }); // Set the state of the model when the element is // clicked. This will trigger the listener above. element.addEventListener('click', () => { model.enabled = false; }); } } new View(document.getElementById('set'), new Model()); On the plus side to all this complexity, we actually get some reusable code. The view is agnostic as to how the model or router it's listening to is updated. All it cares about is specific events on specific components. This is actually helpful to us because it reduces the amount of special-case handling we need to implement. The DOM structure that's generated at runtime, as a result of rendering all our views, needs to be taken into consideration as well. For example, if we look at some of the top-level DOM nodes, they have nested structure within them. It's these top-level nodes that form the skeleton of our layout. Perhaps this is rendered by the main application view, and each of our views has a child-relationship to it. Or perhaps the hierarchy extends further down than that. The tools we're using most likely have mechanisms for dealing with these parent-child relationships. However, bear in mind that vast view hierarchies are difficult to scale. Templates Template engines used to reside mostly in the back-end framework. That's less true today, thanks in a large part to the sophisticated template rendering libraries available in the front-end. With large-scale JavaScript applications, we rarely talk to back-end services about UI-specific things. We don't say, "here's a URL, render the HTML for me". The trend is to give our JavaScript components a certain level autonomy—letting them render their own markup. Having component markup coupled with the components that render them is a good thing. It means that we can easily discern where the markup in the DOM is being generated. We can then diagnose issues and tweak the design of a large scale application. Templates help establish a separation of concerns with each of our components. The markup that's rendered in the browser mostly comes from the template. This keeps markup-specific code out of our JavaScript. Front-end template engines aren't just tools for string replacement; they often have other tools to help reduce the amount of boilerplate JavaScript code to write. For example, we can embed things like conditionals and for-each loops in our markup, where they're suited. Application-specific components The component types we've discussed so far are very useful for implementing scalable JavaScript code, but they're also very generic. Inevitably, during implementation we're going to hit a road block—the component composition patterns we've been following will not work for certain features. This is when it's time to step back and think about possibly adding a new type of component to our architecture. For example, consider the idea of widgets. These are generic components that are mainly focused on presentation and user interactions. Let's say that many of our views are using the exact same DOM elements, and the exact same event handlers. There's no point in repeating them in every view throughout our application. 
Might it be better if we were to factor it into a common component? A view might be overkill, perhaps we need a new type of widget component? Sometimes we'll create components for the sole purpose of composition. For example, we might have a component that glues together router, view, model/collection, and template components together to form a cohesive unit. Modules partially solve this problem but they aren't always enough. Sometimes we're missing that added bit of orchestration that our components need in order to communicate. Extending generic components We often discover, late in the development process, that the components we rely on are lacking something we need. If the base component we're using is designed well, then we can extend it, plugging in the new properties or functionality we need. In this section, we'll walk through some scenarios where we might need to extend the common generic components used throughout our application. If we're going to scale our code, we need to leverage these base components where we can. We'll probably want to start extending our own base components at some point too. Some tools are better than others at facilitating the extension mechanism through which we implement this specialized behavior. Identifying common data and functionality Before we look at extending the specific component types, it's worthwhile to consider the common properties and functionality that's common across all component types. Some of these things will be obvious up-front, while others are less pronounced. Our ability to scale depends, in part, on our ability to identify commonality across our components. If we have a global application instance, quite common in large JavaScript applications, global values and functionality can live there. This can grow unruly down the line though, as more common things are discovered. Another approach might be to have several global modules, as shown in the following diagram, instead of just a single application instance. Or both. But this doesn't scale from an understandability perspective: The ideal component hierarchy doesn't extend beyond three levels. The top level is usually found in a framework our application depends on As a rule-of-thumb, we should, for any given component, avoid extending it more than three levels down. For example, a generic view component from the tools we're using could be extended by our generic version of it. This would include properties and functionality that every view instance in our application requires. This is only a two-level hierarchy, and easy to manage. This means that if any given component needs to extend our generic view, it can do so without complicating things. Three-levels should be the maximum extension hierarchy depth for any given type. This is just enough to avoid unnecessary global data, going beyond this presents scaling issues because the hierarchy isn't easily grasped. Extending router components Our application may only require a single router instance. Even in this case, we may still need to override certain extension points of the generic router. In case of multiple router instances, there's bound to be common properties and functionality that we don't want to repeat. For example, if every route in our application follows the same pattern, with only subtle differences, we can implement the tools in our base router to avoid repetitious code. In addition to declaring routes, events take place when a given route is activated. 
Depending on the architecture of our application, different things need to happen. Maybe certain things always need to happen, no matter which route has been activated. This is where extending the router to provide our own functionality comes in handy. For example, we might have to validate permissions for a given route. It wouldn't make much sense for us to handle this through individual components, as this would not scale well with complex access control rules and a lot of routes. Extending models/collections Our models and collections, no matter what their specific implementation looks like, will share some common properties with one another, especially if they're targeting the same API, which is the common case. The specifics of a given model or collection revolve around the API endpoint, the data returned, and the possible actions taken. It's likely that we'll target the same base API path for all entities, and that all entities have a handful of shared properties. Rather than repeat ourselves in every model or collection instance, it's better to abstract the common data. In addition to sharing properties among our models and collections, we can share common behavior. For instance, it's quite likely that a given model isn't going to have sufficient data for a given feature. Perhaps that data can be derived by transforming the model. These types of transformations can be common, and abstracted in a base model or collection. It really depends on the types of features we're implementing and how consistent they are with one another. If we're growing fast and getting lots of requests for "outside-the-box" features, then we're more likely to implement data transformations inside the views that require these one-off changes to the models or collections they're using. Most frameworks take care of the nuances of performing XHR requests to fetch our data or perform actions. That's not the whole story, unfortunately, because our features will rarely map one-to-one with a single API entity. More likely, we will have a feature that requires several collections that are related to one another somehow, and a transformed collection. This type of operation can grow complex quickly, because we have to work with multiple XHR requests. We'll likely use promises to synchronize the fetching of these requests, and then perform the data transformation once we have all the necessary sources. Here's an example that shows a specific model extending a generic model, to provide new fetching behavior: // The base fetch() implementation of a model sets // some property values, and resolves the promise. class BaseModel { fetch() { return new Promise((resolve, reject) => { this.id = 1; this.name = 'foo'; resolve(this); }); } } // Extends BaseModel with a specific implementation // of fetch(). class SpecificModel extends BaseModel { // Overrides the base fetch() method. Returns // a promise that combines the original // implementation and the result of calling fetchSettings(). fetch() { return Promise.all([ super.fetch(), this.fetchSettings() ]); } // Returns a new Promise instance. Also sets a new // model property. fetchSettings() { return new Promise((resolve, reject) => { this.enabled = true; resolve(this); }); } } // Make sure the properties are all in place, as expected, // after the fetch() call completes.
new SpecificModel().fetch().then((result) => { var [ model ] = result; console.assert(model.id === 1, 'id'); console.assert(model.name === 'foo', 'name'); console.assert(model.enabled, 'enabled'); console.log('fetched'); }); Extending controllers/views When we have a base model or base collection, there are often properties shared between our controllers or views. That's because the job of a controller or a view is to render model or collection data. For example, if the same view is rendering the same model properties over and over, we can probably move that bit to a base view, and extend from that. Perhaps the repetitive parts are in the templates themselves. This means that we might want to consider having a base template inside a base view, as shown in the following diagram. Views that extend this base view inherit this base template. Depending on the library or framework at our disposal, extending templates like this may not be feasible. Or the nature of our features may make this difficult to achieve. For example, there might not be a common base template, but there might be a lot of smaller views and templates that can plug into larger components:

A view that extends a base view can populate the template of the base view, as well as inherit other base view functionalities

Our views also need to respond to user interactions. They may respond directly, or forward the events up the component hierarchy. In either case, if our features are at all consistent, there will be some common DOM event handling that we'll want to abstract into a common base view. This is a huge help in scaling our application, because as we add more features, the amount of new DOM event handling code is minimized. Mapping features to components Now that we have a handle on the most common JavaScript components, and the ways we'll want to extend them for use in our application, it's time to think about how to glue those components together. A router on its own isn't very useful. Nor is a standalone model, template, or controller. Instead, we want these things to work together, to form a cohesive unit that realizes a feature in our application. To do that, we have to map our features to components. We can't do this haphazardly either; we need to think about what's generic about our feature, and about what makes it unique. These feature properties will guide our design decisions on producing something that scales. Generic features Perhaps the most important aspects of component composition are consistency and reusability. While considering the scaling influences our application faces, we'll come up with a list of traits that all our components must carry: things like user management, access control, and other traits unique to our application. These, along with the other architectural perspectives (explored in more depth throughout the remainder of the book), form the core of our generic features:

A generic component, composed of other generic components from our framework.

The generic aspects of every feature in our application serve as a blueprint. They inform us when composing larger building blocks. These generic features account for the architectural factors that help us scale. And if we can encode these factors as parts of an aggregate component, we'll have an easier time scaling our application. What makes this design task challenging is that we have to look at these generic components not only from a scalable architecture perspective, but also from a feature-complete perspective.
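To make the blueprint idea concrete, here's a rough sketch of an aggregate feature component that glues the generic pieces together. It reuses the Router from the earlier listing; the UserModel, UserView, and the #users route are stand-ins invented for this example rather than components of a real framework.

import Router from 'router.js';

// Stand-ins for our generic base components. In a real application
// these would be the extended base model and base view discussed above.
class UserModel {
  fetch() {
    return Promise.resolve({ name: 'Terri' });
  }
}

class UserView {
  render(data) {
    console.log('rendering', data.name);
  }
}

// An aggregate feature component. Its only job is to wire the generic
// parts together: when its route activates, it fetches the model and
// renders the view.
class UserFeature {
  constructor(router) {
    this.model = new UserModel();
    this.view = new UserView();
    router.listen('#users', () => {
      this.model.fetch().then((data) => this.view.render(data));
    });
  }
}

var router = new Router();
new UserFeature(router);
router.start();

Every feature that follows this shape can be added or removed without touching the others, which is the kind of consistency the generic blueprint is meant to buy us.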
We'd like to think that if every feature behaved the same way, we'd be all set. If every feature followed an identical pattern, the sky would be the limit when it comes time to scale. But 100% consistent feature functionality is an illusion, more visible to JavaScript programmers than to users. The pattern breaks out of necessity. It's responding to this breakage in a scalable way that matters. This is why successful JavaScript applications continuously revisit the generic aspects of their features to ensure they reflect reality. Specific features When it's time to implement something that doesn't fit the pattern, we're faced with a scaling challenge. We have to pivot, and consider the consequences of introducing such a feature into our architecture. When patterns are broken, our architecture needs to change. This isn't a bad thing; it's a necessity. The limiting factor in our ability to scale in response to these new features lies with the generic aspects of our existing features. This means that we can't be too rigid with our generic feature components. If we're too demanding, we're setting ourselves up for failure. Before making any brash architectural decisions stemming from offbeat features, think about the specific scaling consequences. For example, does it really matter that the new feature uses a different layout and requires a template that's different from all other feature components? The state of the JavaScript scaling art revolves around finding the handful of essential blueprints to follow for our component composition. Everything else is up for discussion on how to proceed. Decomposing components Component composition is an activity that creates order: larger behavior out of smaller parts. We often need to move in the opposite direction during development. Even after development, we can learn how a component works by tearing the code apart and watching it run in different contexts. Component decomposition means that we're able to take the system apart and examine individual parts in a somewhat structured approach. Maintaining and debugging components Over the course of application development, our components accumulate abstractions. We do this to support a feature's requirements better, while simultaneously supporting some architectural property that helps us scale. The problem is that as the abstractions accumulate, we lose transparency into the functioning of our components. This transparency is not only essential for diagnosing and fixing issues, but also for how easy the code is to learn. For example, if there's a lot of indirection, it takes longer for a programmer to trace cause to effect. Time wasted on tracing code reduces our ability to scale from a developmental point of view. We're faced with two opposing problems. First, we need abstractions to address real-world feature requirements and architectural constraints. Second, the same abstractions erode our ability to master our own code due to a lack of transparency. The following is an example that shows a renderer component and a feature component. Renderers used by the feature are easily substitutable: // A Renderer instance takes a renderer function // as an argument. The render() method returns the // result of calling the function. class Renderer { constructor(renderer) { this.renderer = renderer; } render() { return this.renderer ? this.renderer(this) : ''; } } // A feature defines an output pattern. It accepts // header, content, and footer arguments. These are // Renderer instances.
class Feature { constructor(header, content, footer) { this.header = header; this.content = content; this.footer = footer; } // Renders the sections of the view. Each section // either has a renderer, or it doesn't. Either way, // content is returned. render() { var header = this.header ? `${this.header.render()}n` : '', content = this.content ? `${this.content.render()}n` : '', footer = this.footer ? this.footer.render() : ''; return `${header}${content}${footer}`; } } // Constructs a new feature with renderers for three sections. var feature = new Feature( new Renderer(() => { return 'Header'; }), new Renderer(() => { return 'Content'; }), new Renderer(() => { return 'Footer'; }) ); console.log(feature.render()); // Remove the header section completely, replace the footer // section with a new renderer, and check the result. delete feature.header; feature.footer = new Renderer(() => { return 'Test Footer'; }); console.log(feature.render()); A tactic that can help us cope with these two opposing scaling influencers is substitutability. In particular, the ease with which one of our components, or sub-components, can be replaced with something else. This should be really easy to do. So before we go introducing layers of abstraction, we need to consider how easy it's going to be to replace a complex component with a simple one. This can help programmers learn the code, and also help with debugging. For example, if we're able to take a complex component out of the system and replace it with a dummy component, we can simplify the debugging process. If the error goes away after the component is replaced, we have found the problematic component. Otherwise, we can rule out a component and keep digging elsewhere. Re-factoring complex components It's of course easier said than done to implement substitutability with our components, especially in the face of deadlines. Once it becomes impractical to easily replace components with others, it's time to consider re-factoring our code. Or at least the parts that make substitutability infeasible. It's a balancing act, getting the right level of encapsulation, and the right level of transparency. Substitution can also be helpful at a more granular level. For example, let's say a view method is long and complex. If there are several stages during the execution of that method, where we would like to run something custom, we can't. It's better to re-factor the single method into a handful of methods, each of which can be overridden. Pluggable business logic Not all of our business logic needs to live inside our components, encapsulated from the outside world. Instead, it would be ideal if we could write our business logic as a set of functions. In theory, this provides us with a clear separation of concerns. The components are there to deal with the specific architectural concerns that help us scale, and the business logic can be plugged into any component. In practice, excising business logic from components isn't trivial. Extending versus configuring There're two approaches we can take when it comes to building our components. As a starting point, we have the tools provided by our libraries and frameworks. From there, we can keep extending these tools, getting more specific as we drill deeper and deeper into our features. Alternatively, we can provide our component instances with configuration values. These instruct the component on how to behave. 
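Here's a small sketch of the same kind of specialization expressed both ways; the ListView class, its options, and the sample data are invented for illustration and don't come from any particular library.

// A generic list view that can be configured at construction time.
class ListView {
  constructor(options = {}) {
    this.sortKey = options.sortKey || 'name';
    // Only install a renderer if the caller configured one; otherwise
    // the prototype method below is used.
    if (options.renderItem) {
      this.renderItem = options.renderItem;
    }
  }

  // The default item renderer.
  renderItem(item) {
    return `<li>${item[this.sortKey]}</li>`;
  }

  render(items) {
    return [...items]
      .sort((a, b) => (a[this.sortKey] < b[this.sortKey] ? -1 : 1))
      .map((item) => this.renderItem(item))
      .join('');
  }
}

// The same specialization, expressed by extending rather than
// configuring. The caller no longer supplies anything.
class UserListView extends ListView {
  constructor() {
    super({ sortKey: 'last' });
  }

  renderItem(item) {
    return `<li>${item.last}, ${item.first}</li>`;
  }
}

var users = [
  { first: 'Terri', last: 'Hodges' },
  { first: 'Pat', last: 'Jones' }
];

// Configured at the call site...
console.log(new ListView({ sortKey: 'first' }).render(users));
// ...versus encapsulated in a subclass.
console.log(new UserListView().render(users));

Both produce the same kind of output; the difference is where the knowledge about the specialization lives.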
The advantage of extending things that would otherwise need to be configured is that the caller doesn't need to worry about them. And if we can get by, using this approach, all the better, because it leads to simpler code. Especially the code that's using the component. On the other hand, we could have generic feature components that can be used for a specific purpose, if only they support this configuration or that configuration option. This approach has the advantage of simpler component hierarchies, and less overall components. Sometimes it's better to keep components as generic as possible, within the realm of understandability. That way, when we need a generic component for a specific feature, we can use it without having to re-define our hierarchy. Of course, there's more complexity involved for the caller of that component, because they need to supply it with the configuration values. It's a trade-off that's up to us, the JavaScript architects of our application. Do we want to encapsulate everything, configure everything, or do we want to strike a balance between the two? Stateless business logic With functional programming, functions don't have side effects. In some languages, this property is enforced, in JavaScript it isn't. However, we can still implement side-effect-free functions in JavaScript. If a function takes arguments, and always returns the same output based on those arguments, then the function can be said to be stateless. It doesn't depend on the state of a component, and it doesn't change the state of a component. It just computes a value. If we can establish a library of business logic that's implemented this way, we can design some super flexible components. Rather than implement this logic directly in a component, we pass the behavior into the component. That way, different components can utilize the same stateless business logic functions. The tricky part is finding the right functions that can be implemented this way. It's not a good idea to implement these up-front. Instead, as the iterations of our application development progress, we can use this strategy to re-factor code into generic stateless functions that are shared by any component capable of using them. This leads to business logic that's implemented in a focused way, and components that are small, generic, and reusable in a variety of contexts. Organizing component code In addition to composing our components in such a way that helps our application scale, we need to consider the structure of our source code modules too. When we first start off with a given project, our source code files tend to map well to what's running in the client's browser. Over time, as we accumulate more features and components, earlier decisions on how to organize our source tree can dilute this strong mapping. When tracing runtime behavior to our source code, the less mental effort involved, the better. We can scale to more stable features this way because our efforts are focused more on the design problems of the day—the things that directly provide customer value: The diagram shows the mapping component parts to their implementation artifacts There's another dimension to code organization in the context of our architecture, and that's our ability to isolate specific code. We should treat our code just like our runtime components, which are self-sustained units that we can turn on or off. That is, we should be able to find all the source code files required for a given component, without having to hunt them down. 
If a component requires, say, 10 source code files—JavaScript, HTML, and CSS—then ideally these should all be found in the same directory. The exception, of course, is generic base functionality that's shared by all components. These should be as close to the surface as possible. Then it's easy to trace our component dependencies; they will all point to the top of the hierarchy. It's a challenge to scale the dependency graph when our component dependencies are all over the place. Summary This article introduced us to the concept of component composition. Components are the building blocks of a scalable JavaScript application. The common components we're likely to encounter include things like modules, models/collections, controllers/views, and templates. While these patterns help us achieve a level of consistency, they're not enough on their own to make our code work well under various scaling influencers. This is why we need to extend these components, providing our own generic implementations that specific features of our application can further extend and use. Depending on the various scaling factors our application encounters, different approaches may be taken in getting generic functionality into our components. One approach is to keep extending the component hierarchy, and keep everything encapsulated and hidden away from the outside world. Another approach is to plug logic and properties into components when they're created. The cost is more complexity for the code that's using the components. We ended the article with a look at how we might go about organizing our source code; so that it's structure better reflects that of our logical component design. This helps us scale our development effort, and helps isolate one component's code from others'. It's one thing to have well crafted components that stand by themselves. It's quite another to implement scalable component communication. For more information, refer to: https://www.packtpub.com/web-development/javascript-and-json-essentials https://www.packtpub.com/application-development/learning-javascript-data-structures-and-algorithms Resources for Article: Further resources on this subject: Welcome to JavaScript in the full stack [Article] Components of PrimeFaces Extensions [Article] Unlocking the JavaScript Core [Article]

Publication of Apps

Packt
22 Feb 2016
10 min read
Ever wondered if you could prepare and publish an app on Google Play and you needed a short article on how you could get this done quickly? Here it is! Go ahead, read this piece of article, and you'll be able to get your app running on Google Play. (For more resources related to this topic, see here.) Preparing to publish You probably don't want to upload any of the apps from this book, so the first step is to develop an app that you want to publish. Head over to https://play.google.com/apps/publish/ and follow the instructions to get a Google Play developer account. This was $25 at the time of writing and is a one-time charge with no limit on the number of apps you can publish. Creating an app icon Exactly how to design an icon is beyond the remit of this book. But, simply put, you need to create a nice image for each of the Android screen density categorizations. This is easier than it sounds. Design one nice app icon in your favorite drawing program and save it as a .png file. Then, visit http://romannurik.github.io/AndroidAssetStudio/icons-launcher.html. This will turn your single icon into a complete set of icons for every single screen density. Warning! The trade-off for using this service is that the website will collect your e-mail address for their own marketing purposes. There are many sites that offer a similar free service. Once you have downloaded your .zip file from the preceding site, you can simply copy the res folder from the download into the main folder within the project explorer. All icons at all densities have now been updated. Preparing the required resources When we log into Google Play to create a new listing in the store, there is nothing technical to handle, but we do need to prepare quite a few images that we will need to upload. Prepare upto 8 screenshots for each device type (a phone/tablet/TV/watch) that your app is compatible with. Don't crop or pad these images. Create a 512 x 512 pixel image that will be used to show off your app icon on the Google Play store. You can prepare your own icon, or the process of creating app icons that we just discussed will have already autogenerated icons for you. You also need to create three banner graphics, which are as follows: 1024 x 500 180 x 120 320 x 180 These can be screenshots, but it is usually worth taking a little time to create something a bit more special. If you are not artistically minded, you can place a screenshot inside some quite cool device art and then simply add a background image. You can generate some device art at https://developer.android.com/distribute/tools/promote/device-art.html. Then, just add the title or feature of your app to the background. The following banner was created with no skill at all, just with a pretty background purchased for $10 and the device art tool I just mentioned: Also, consider creating a video of your app. Recording video of your Android device is nearly impossible unless your device is rooted. I cannot recommend you to root your device; however, there is a tool called ARC (App Runtime for Chrome) that enables you to run APK files on your desktop. There is no debugging output, but it can run a demanding app a lot more smoothly than the emulator. It will then be quite simple to use a free, open source desktop capture program such as OBS (Open Broadcaster Software) to record your app running within ARC. You can learn more about ARC at https://developer.chrome.com/apps/getstarted_arc and about OBS at https://obsproject.com/. 
Building the publishable APK file What we are doing in this section is preparing the file that we will upload to Google Play. The format of the file we will create is .apk. This type of file is often referred to as an APK. The actual contents of this file are the compiled class files, all the resources that we've added, and the files and resources that Android Studio has autogenerated. We don't need to concern ourselves with the details, as we just need to follow these steps. The steps not only create the APK, but they also create a key and sign your app with the key. This process is required and it also protects the ownership of your app:   Note that this is not the same thing as copy protection/digital rights management. In Android Studio, open the project that you want to publish and navigate to Build | Generate Signed APK and a pop-up window will open, as shown: In the Generate Signed APK window, click on the Create new button. After this, you will see the New Key Store window, as shown in the following screenshot: In the Key store path field, browse to a location on your hard disk where you would like to keep your new key, and enter a name for your key store. If you don't have a preference, simply enter keys and click on OK. Add a password and then retype it to confirm it. Next, you need to choose an alias and type it into the Alias field. You can treat this like a name for your key. It can be any word that you like. Now, enter another password for the key itself and type it again to confirm. Leave Validity (years) at its default value of 25. Now, all you need to do is fill out your personal/business details. This doesn't need to be 100% complete as the only mandatory field is First and Last Name. Click on the OK button to continue. You will be taken back to the Generate Signed APK window with all the fields completed and ready to proceed, as shown in the following window: Now, click on Next to move to the next screen: Choose where you would like to export your new APK file and select release for the Build Type field. Click on Finish and Android Studio will build the shiny new APK into the location you've specified, ready to be uploaded to the App Store. Taking a backup of your key store in multiple safe places! The key store is extremely valuable. If you lose it, you will effectively lose control over your app. For example, if you try to update an app that you have on Google Play, it will need to be signed by the same key. Without it, you would not be able to update it. Think of the chaos if you had lots of users and your app needed a database update, but you had to issue a whole new app because of a lost key store. As we will need it quite soon, locate the file that has been built and ends in the .apk extension. Publishing the app Log in to your developer account at https://play.google.com/apps/publish/. From the left-hand side of your developer console, make sure that the All applications tab is selected, as shown: On the top right-hand side corner, click on the Add new application button, as shown in the next screenshot: Now, we have a bit of form filling to do, and you will need all the images from the Preparing to publish section that is near the start of the chapter. In the ADD NEW APPLICATION window shown next, choose a default language and type the title of your application: Now, click on the Upload APK button and then the Upload your first APK button and browse to the APK file that you built and signed in. 
Wait for the file to finish uploading: Now, from the inner left-hand side menu, click on Store Listing: We are faced with a fair bit of form filling here. If, however, you have all your images to hand, you can get through this in about 10 minutes. Almost all the fields are self-explanatory, and the ones that aren't have helpful tips next to the field entry box. Here are a few hints and tips to make the process smooth and produce a good end result: In the Full description and Short description fields, you enter the text that will be shown to potential users/buyers of your app. Be sure to make the description as enticing and exciting as you can. Mention all the best features in a clear list, but start the description with one sentence that sums up your app and what it does. Don't worry about the New content rating field as we will cover that in a minute. If you haven't built your app for tablet/phone devices, then don't add images in these tabs. If you have, however, make sure that you add a full range of images for each because these are the only images that the users of this type of device will see. When you have completed the form, click on the Save draft button at the top-right corner of the web page. Now, click on the Content rating tab and you can answer questions about your app to get a content rating that is valid (and sometimes varied) across multiple countries. The last tab you need to complete is the Pricing and Distribution tab. Click on this tab and choose the Paid or Free distribution button. Then, enter a price if you've chosen Paid. Note that if you choose Free, you can never change this. You can, however, unpublish it. If you chose Paid, you can click on Auto-convert prices now to set up equivalent pricing for all currencies around the world. In the DISTRIBUTE IN THESE COUNTRIES section, you can select countries individually or check the SELECT ALL COUNTRIES checkbox, as shown in the next screenshot:   The next six options under the Device categories and User programs sections in the context of what you have learned in this book should all be left unchecked. Do read the tips to find out more about Android Wear, Android TV, Android Auto, Designed for families, Google Play for work, and Google Play for education, however. Finally, you must check two boxes to agree with the Google consent guidelines and US export laws. Click on the Publish App button in the top-right corner of the web page and your app will soon be live on Google Play. Congratulations. Summary You can now start building Android apps. Don't run off and build the next Evernote, Runtatstic, or Angry Birds just yet. Head over to our book, Android Programming for Beginners: https://www.packtpub.com/application-development/android-programming-beginners. Here are a few more books that you can check out to learn more about Android: Android Studio Cookbook (https://www.packtpub.com/application-development/android-studio-cookbook) Learning Android Google Maps (https://www.packtpub.com/application-development/learning-android-google-maps) Android 6 Essentials (https://www.packtpub.com/application-development/android-6-essentials) Android Sensor Programming By Example (https://www.packtpub.com/application-development/android-sensor-programming-example) Resources for Article: Further resources on this subject: Saying Hello to Unity and Android[article] Android and iOS Apps Testing at a Glance[article] Testing with the Android SDK[article]

Adding a Spark to R

Packt
22 Feb 2016
3 min read
Spark is written in a language called Scala. It has interfaces for use from Java and Python, and from the recent version 1.4.0 it also supports R. This is called SparkR, which we will describe in the next section. The four classes of libraries available in Spark are SQL and DataFrames, Spark Streaming, MLlib (machine learning), and GraphX (graph algorithms). Currently, SparkR supports only SQL and DataFrames; the others are on the roadmap. Spark can be downloaded from the Apache project page at http://spark.apache.org/downloads.html. Starting from version 1.4.0, SparkR is included in Spark and no separate download is required. (For more resources related to this topic, see here.) SparkR Similar to RHadoop, SparkR is an R package that allows R users to use Spark APIs through the RDD class. For example, using SparkR, users can run jobs on Spark from RStudio. SparkR can be invoked from RStudio. To enable this, include the following lines in your .Rprofile file that R uses at startup to initialize the environment:

Sys.setenv(SPARK_HOME = "/.../spark-1.5.0-bin-hadoop2.6") # provide the correct path to the folder where Spark was downloaded
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))

Once this is done, start RStudio and enter the following commands to start using SparkR:

> library(SparkR)
> sc <- sparkR.init(master="local")

As mentioned, as of the latest version, 1.5, at the time of writing, SparkR supports limited functionalities of R. This mainly includes data slicing and dicing and summary stat functions. The current version does not support the use of contributed R packages; however, this is planned for a future release. For machine learning, SparkR currently supports the glm() function. We will do an example in the next section. Linear regression using SparkR In the following example, we will illustrate how to use SparkR for machine learning.

> library(SparkR)
> sc <- sparkR.init(master="local")
> sqlContext <- sparkRSQL.init(sc)
> # Importing data
> df <- read.csv("/Users/harikoduvely/Projects/Book/Data/ENB2012_data.csv", header = T)
> # Excluding variables Y2, X6, X8 and removing records beyond 768, which contain mainly null values
> df <- df[1:768, c(1,2,3,4,5,7,9)]
> # Converting to a SparkR DataFrame
> dfsr <- createDataFrame(sqlContext, df)
> model <- glm(Y1 ~ X1 + X2 + X3 + X4 + X5 + X7, data = dfsr, family = "gaussian")
> summary(model)

Summary In this article we have seen examples of SparkR and linear regression using SparkR. For more information on Spark you can refer to: https://www.packtpub.com/big-data-and-business-intelligence/spark-python-developers https://www.packtpub.com/big-data-and-business-intelligence/spark-beginners Resources for Article: Further resources on this subject: Data Analysis Using R[article] Introducing Bayesian Inference[article] Bayesian Network Fundamentals[article]

VR Build and Run

Packt
22 Feb 2016
25 min read
Yeah well, this is cool and everything, but where's my VR? I WANT MY VR! Hold on kid, we're getting there. In this article, we are going to set up a project that can be built and run with a virtual reality head-mounted display (HMD) and then talk more in depth about how the VR hardware technology really works. We will be discussing the following topics: The spectrum of the VR device integration software Installing and building a project for your VR device The details and defining terms for how the VR technology really works (For more resources related to this topic, see here.) VR device integration software Before jumping in, let's understand the possible ways to integrate our Unity project with virtual reality devices. In general, your Unity project must include a camera object that can render stereographic views, one for each eye on the VR headset. Software for the integration of applications with the VR hardware spans a spectrum, from built-in support and device-specific interfaces to the device-independent and platform-independent ones. Unity's built-in VR support Since Unity 5.1, support for the VR headsets is built right into Unity. At the time of writing this article, there is direct support for Oculus Rift and Samsung Gear VR (which is driven by the Oculus software). Support for other devices has been announced, including Sony PlayStation Morpheus. You can use a standard camera component, like the one attached to Main Camera and the standard character asset prefabs. When your project is built with Virtual Reality Supported enabled in Player Settings, it renders stereographic camera views and runs on an HMD. The device-specific SDK If a device is not directly supported in Unity, the device manufacturer will probably publish the Unity plugin package. An advantage of using the device-specific interface is that it can directly take advantage of the features of the underlying hardware. For example, Steam Valve and Google have device-specific SDK and Unity packages for the Vive and Cardboard respectively. If you're using one of these devices, you'll probably want to use such SDK and Unity packages. (At the time of writing this article, these devices are not a part of Unity's built-in VR support.) Even Oculus, supported directly in Unity 5.1, provides SDK utilities to augment that interface (see, https://developer.oculus.com/documentation/game-engines/latest/concepts/unity-intro/). Device-specific software locks each build into the specific device. If that's a problem, you'll either need to do some clever coding, or take one of the following approaches instead. The OSVR project In January 2015, Razer Inc. led a group of industry leaders to announce the Open Source Virtual Reality (OSVR) platform (for more information on this, visit http://www.osvr.com/) with plans to develop open source hardware and software, including an SDK that works with multiple devices from multiple vendors. The open source middleware project provides device-independent SDKs (and Unity packages) so that you can write your code to a single interface without having to know which devices your users are using. With OSVR, you can build your Unity game for a specific operating system (such as Windows, Mac, and Linux) and then let the user configure the app (after they download it) for whatever hardware they're going to use. At the time of writing this article, the project is still in its early stage, is rapidly evolving, and is not ready for this article. However, I encourage you to follow its development. 
WebVR WebVR (for more information, visit http://webvr.info/) is a JavaScript API that is being built directly into major web browsers. It's like WebGL (2D and 3D graphics API for the web) with VR rendering and hardware support. Now that Unity 5 has introduced the WebGL builds, I expect WebVR to surely follow, if not in Unity then from a third-party developer. As we know, browsers run on just about any platform. So, if you target your game to WebVR, you don't even need to know the user's operating system, let alone which VR hardware they're using! That's the idea anyway. New technologies, such as the upcoming WebAssembly, which is a new binary format for the Web, will help to squeeze the best performance out of your hardware and make web-based VR viable. For WebVR libraries, check out the following: WebVR boilerplate: https://github.com/borismus/webvr-boilerplate GLAM: http://tparisi.github.io/glam/ glTF: http://gltf.gl/ MozVR (the Mozilla Firefox Nightly builds with VR): http://mozvr.com/downloads/ WebAssembly: https://github.com/WebAssembly/design/blob/master/FAQ.md 3D worlds There are a number of third-party 3D world platforms that provide multiuser social experiences in shared virtual spaces. You can chat with other players, move between rooms through portals, and even build complex interactions and games without having to be an expert. For examples of 3D virtual worlds, check out the following: VRChat: http://vrchat.net/ JanusVR: http://janusvr.com/ AltspaceVR: http://altvr.com/ High Fidelity: https://highfidelity.com/ For example, VRChat lets you develop 3D spaces and avatars in Unity, export them using their SDK, and load them into VRChat for you and others to share over the Internet in a real-time social VR experience. Creating the MeMyselfEye prefab To begin, we will create an object that will be a proxy for the user in the virtual environment. Let's create the object using the following steps: Open Unity and the project from the last article. Then, open the Diorama scene by navigating to File | Open Scene (or double-click on the scene object in Project panel, under Assets). From the main menu bar, navigate to GameObject | Create Empty. Rename the object MeMyselfEye (hey, this is VR!). Set its position up close into the scene, at Position (0, 1.4, -1.5). In the Hierarchy panel, drag the Main Camera object into MeMyselfEye so that it's a child object. With the Main Camera object selected, reset its transform values (in the Transform panel, in the upper right section, click on the gear icon and select Reset). The Game view should show that we're inside the scene. If you recall the Ethan experiment that we did earlier, I picked a Y-position of 1.4 so that we'll be at about the eye level with Ethan. Now, let's save this as a reusable prefabricated object, or prefab, in the Project panel, under Assets: In Project panel, under Assets, select the top-level Assets folder, right-click and navigate to Create | Folder. Rename the folder Prefabs. Drag the MeMyselfEye prefab into the Project panel, under Assets/Prefabs folder to create a prefab. Now, let's configure the project for your specific VR headset. Build for the Oculus Rift If you have a Rift, you've probably already downloaded Oculus Runtime, demo apps, and tons of awesome games. To develop for the Rift, you'll want to be sure that the Rift runs fine on the same machine on which you're using Unity. Unity has built-in support for the Oculus Rift. 
You just need to configure your Build Settings..., as follows: From main menu bar, navigate to File | Build Settings.... If the current scene is not listed under Scenes In Build, click on Add Current. Choose PC, Mac, & Linux Standalone from the Platform list on the left and click on Switch Platform. Choose your Target Platform OS from the Select list on the right (for example, Windows). Then, click on Player Settings... and go to the Inspector panel. Under Other Settings, check off the Virtual Reality Supported checkbox and click on Apply if the Changing editor vr device dialog box pops up. To test it out, make sure that the Rift is properly connected and turned on. Click on the game Play button at the top of the application in the center. Put on the headset, and IT SHOULD BE AWESOME! Within the Rift, you can look all around—left, right, up, down, and behind you. You can lean over and lean in. Using the keyboard, you can make Ethan walk, run, and jump just like we did earlier. Now, you can build your game as a separate executable app using the following steps. Most likely, you've done this before, at least for non-VR apps. It's pretty much the same: From the main menu bar, navigate to File | Build Settings.... Click on Build and set its name. I like to keep my builds in a subdirectory named Builds; create one if you want to. Click on Save. An executable will be created in your Builds folder. If you're on Windows, there may also be a rift_Data folder with built data. Run Diorama as you would do for any executable application—double-click on it. Choose the Windowed checkbox option so that when you're ready to quit, close the window with the standard Close icon in the upper right of your screen. Build for Google Cardboard Read this section if you are targeting Google Cardboard on Android and/or iOS. A good starting point is the Google Cardboard for Unity, Get Started guide (for more information, visit https://developers.google.com/cardboard/unity/get-started). The Android setup If you've never built for Android, you'll first need to download and install the Android SDK. Take a look at Unity manual for Android SDK Setup (http://docs.unity3d.com/Manual/android-sdksetup.html). You'll need to install the Android Developer Studio (or at least, the smaller SDK Tools) and other related tools, such as Java (JVM) and the USB drivers. It might be a good idea to first build, install, and run another Unity project without the Cardboard SDK to ensure that you have all the pieces in place. (A scene with just a cube would be fine.) Make sure that you know how to install and run it on your Android phone. The iOS setup A good starting point is Unity manual, Getting Started with iOS Development guide (http://docs.unity3d.com/Manual/iphone-GettingStarted.html). You can only perform iOS development from a Mac. You must have an Apple Developer Account approved (and paid for the standard annual membership fee) and set up. Also, you'll need to download and install a copy of the Xcode development tools (via the Apple Store). It might be a good idea to first build, install, and run another Unity project without the Cardboard SDK to ensure that you have all the pieces in place. (A scene with just a cube would be fine). Make sure that you know how to install and run it on your iPhone. Installing the Cardboard Unity package To set up our project to run on Google Cardboard, download the SDK from https://developers.google.com/cardboard/unity/download. 
Within your Unity project, import the CardboardSDKForUnity.unitypackage assets package, as follows: From the Assets main menu bar, navigate to Import Package | Custom Package.... Find and select the CardboardSDKForUnity.unitypackage file. Ensure that all the assets are checked, and click on Import. Explore the imported assets. In the Project panel, the Assets/Cardboard folder includes a bunch of useful stuff, including the CardboardMain prefab (which, in turn, contains a copy of CardboardHead, which contains the camera). There is also a set of useful scripts in the Cardboard/Scripts/ folder. Go check them out. Adding the camera Now, we'll put the Cardboard camera into MeMyselfEye, as follows: In the Project panel, find CardboardMain in the Assets/Cardboard/Prefabs folder. Drag it onto the MeMyselfEye object in the Hierarchy panel so that it's a child object. With CardboardMain selected in the Hierarchy, look at the Inspector panel and ensure that the Tap Is Trigger checkbox is checked. Select the Main Camera in the Hierarchy panel (inside MeMyselfEye) and disable it by unchecking the Enable checkbox in the upper left of its Inspector panel. Finally, apply these changes back onto the prefab, as follows: In the Hierarchy panel, select the MeMyselfEye object. Then, in its Inspector panel, next to Prefab, click on the Apply button. Save the scene. We have now replaced the default Main Camera with the VR one. The build settings If you know how to build and install from Unity to your mobile phone, doing it for Cardboard is pretty much the same: From the main menu bar, navigate to File | Build Settings.... If the current scene is not listed under Scenes In Build, click on Add Current. Choose Android or iOS from the Platform list on the left and click on Switch Platform. Then, click on Player Settings… in the Inspector panel. For Android, ensure that Other Settings | Virtual Reality Supported is unchecked, as that would be for GearVR (via the Oculus drivers), not Cardboard on Android! Under Other Settings, set Bundle Identifier to a valid string, such as com.YourName.VRisAwesome. Under Resolution and Presentation, set Default Orientation to Landscape Left. Play Mode To test it out, you do not need your phone connected. Just press the game's Play button at the top of the application in the center to enter Play Mode. You will see the split-screen stereographic views in the Game view panel. While in Play Mode, you can simulate the head movement as if you were viewing it with the Cardboard headset. Use Alt + mouse-move to pan and tilt forward or backwards. Use Ctrl + mouse-move to tilt your head from side to side. You can also simulate magnetic clicks (we'll talk more about user input in a later article) with mouse clicks. Note that since this emulates running on a phone, without a keyboard, the keyboard keys that we used to move Ethan do not work now. Building and running in Android To build your game as a separate executable app, perform the following steps: From the main menu bar, navigate to File | Build & Run. Set the name of the build. I like to keep my builds in a subdirectory named Build; you can create one if you want. Click on Save. This will generate an Android executable .apk file and then install the app onto your phone. The following screenshot shows the Diorama scene running on an Android phone with Cardboard (and the Unity development monitor in the background).
Building and running in iOS To build your game and run it on the iPhone, perform the following steps: Plug your phone into the computer via a USB cable/port. From the main menu bar, navigate to File | Build & Run. This allows you to create an Xcode project, launch Xcode, build your app inside Xcode, and then install the app onto your phone. Antique Stereograph (source https://www.pinterest.com/pin/493073859173951630/) The device-independent clicker At the time of writing this article, VR input has not yet been settled across all platforms. Input devices may or may not fit under Unity's own Input Manager and APIs. In fact, input for VR is a huge topic and deserves its own book. So here, we will keep it simple. As a tribute to the late Steve Jobs and a throwback to the origins of Apple Macintosh, I am going to limit these projects to mostly one-click inputs! Let's write a script for it, which checks for any click on the keyboard, mouse, or other managed device: In the Project panel, select the top-level Assets folder. Right-click and navigate to Create | Folder. Name it Scripts. With the Scripts folder selected, right-click and navigate to Create | C# Script. Name it Clicker. Double-click on the Clicker.cs file in the Projects panel to open it in the MonoDevelop editor. Now, edit the Script file, as follows: using UnityEngine; using System.Collections; public class Clicker { public bool clicked() { return Input.anyKeyDown; } } Save the file. If you are developing for Google Cardboard, you can add a check for the Cardboard's integrated trigger when building for mobile devices, as follows: using UnityEngine; using System.Collections; public class Clicker { public bool clicked() { #if (UNITY_ANDROID || UNITY_IPHONE) return Cardboard.SDK.CardboardTriggered; #else return Input.anyKeyDown; #endif } } Any scripts that we write that require user clicks will use this Clicker file. The idea is that we've isolated the definition of a user click to a single script, and if we change or refine it, we only need to change this file. How virtual reality really works So, with your headset on, you experienced the diorama! It appeared 3D, it felt 3D, and maybe you even had a sense of actually being there inside the synthetic scene. I suspect that this isn't the first time you've experienced VR, but now that we've done it together, let's take a few minutes to talk about how it works. The strikingly obvious thing is, VR looks and feels really cool! But why? Immersion and presence are the two words used to describe the quality of a VR experience. The Holy Grail is to increase both to the point where it seems so real, you forget you're in a virtual world. Immersion is the result of emulating the sensory inputs that your body receives (visual, auditory, motor, and so on). This can be explained technically. Presence is the visceral feeling that you get being transported there—a deep emotional or intuitive feeling. You can say that immersion is the science of VR, and presence is the art. And that, my friend, is cool. A number of different technologies and techniques come together to make the VR experience work, which can be separated into two basic areas: 3D viewing Head-pose tracking In other words, displays and sensors, like those built into today's mobile devices, are a big reason why VR is possible and affordable today. Suppose the VR system knows exactly where your head is positioned at any given moment in time. 
Suppose that it can immediately render and display the 3D scene for this precise viewpoint stereoscopically. Then, wherever and whenever you moved, you'd see the virtual scene exactly as you should. You would have a nearly perfect visual VR experience. That's basically it. Ta-dah! Well, not so fast. Literally. Stereoscopic 3D viewing Split-screen stereography was discovered not long after the invention of photography, like the popular stereograph viewer from 1876 shown in the following picture (B.W. Kilborn & Co, Littleton, New Hampshire, see http://en.wikipedia.org/wiki/Benjamin_W._Kilburn). A stereo photograph has separate views for the left and right eyes, which are slightly offset to create parallax. This fools the brain into thinking that it's a truly three-dimensional view. The device contains separate lenses for each eye, which let you easily focus on the photo close up. Similarly, rendering these side-by-side stereo views is the first job of the VR-enabled camera in Unity. Let's say that you're wearing a VR headset and you're holding your head very still so that the image looks frozen. It still appears better than a simple stereograph. Why? The old-fashioned stereograph has twin relatively small images rectangularly bound. When your eye is focused on the center of the view, the 3D effect is convincing, but you will see the boundaries of the view. Move your eyeballs around (even with the head still), and any remaining sense of immersion is totally lost. You're just an observer on the outside peering into a diorama. Now, consider what an Oculus Rift screen looks like without the headset (see the following screenshot): The first thing that you will notice is that each eye has a barrel shaped view. Why is that? The headset lens is a very wide-angle lens. So, when you look through it you have a nice wide field of view. In fact, it is so wide (and tall), it distorts the image (pincushion effect). The graphics software (SDK) does an inverse of that distortion (barrel distortion) so that it looks correct to us through the lenses. This is referred to as an ocular distortion correction. The result is an apparent field of view (FOV), that is wide enough to include a lot more of your peripheral vision. For example, the Oculus Rift DK2 has a FOV of about 100 degrees. Also of course, the view angle from each eye is slightly offset, comparable to the distance between your eyes, or the Inter Pupillary Distance (IPD). IPD is used to calculate the parallax and can vary from one person to the next. (Oculus Configuration Utility comes with a utility to measure and configure your IPD. Alternatively, you can ask your eye doctor for an accurate measurement.) It might be less obvious, but if you look closer at the VR screen, you see color separations, like you'd get from a color printer whose print head is not aligned properly. This is intentional. Light passing through a lens is refracted at different angles based on the wavelength of the light. Again, the rendering software does an inverse of the color separation so that it looks correct to us. This is referred to as a chromatic aberration correction. It helps make the image look really crisp. Resolution of the screen is also important to get a convincing view. If it's too low-res, you'll see the pixels, or what some refer to as a screen door effect. The pixel width and height of the display is an oft-quoted specification when comparing the HMD's, but the pixels per inch (ppi) value may be more important. 
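To put some rough numbers on the ideas above, here is a small illustrative C# sketch. It is not part of the project's scripts, and the IPD and distortion coefficients in it are assumed example values rather than real headset specifications; it simply shows the per-eye parallax offset and the polynomial form commonly used to model barrel distortion.

using UnityEngine;

// Illustrative only: the per-eye offset and radial distortion math that
// the VR SDK applies for us. The coefficient values are made up.
public class StereoMathSketch : MonoBehaviour
{
    public float ipd = 0.064f;   // assumed Inter Pupillary Distance in meters (~64 mm)
    public float k1 = 0.22f;     // assumed radial distortion coefficients
    public float k2 = 0.24f;

    void Start()
    {
        // Each eye camera sits half the IPD to either side of the head center,
        // which is what produces the parallax between the two rendered views.
        Vector3 leftEyeOffset = new Vector3(-ipd / 2f, 0f, 0f);
        Vector3 rightEyeOffset = new Vector3(ipd / 2f, 0f, 0f);
        Debug.Log("Eye offsets: " + leftEyeOffset + " / " + rightEyeOffset);

        // Barrel distortion pushes a point outward more the farther it is from
        // the lens center, using a polynomial in r squared.
        Vector2 uv = new Vector2(0.3f, 0.4f);   // a point in lens-centered coordinates
        float r2 = uv.sqrMagnitude;
        Vector2 distorted = uv * (1f + k1 * r2 + k2 * r2 * r2);
        Debug.Log("Distorted point: " + distorted);
    }
}

The SDK applies this kind of warp as the inverse of the lens's pincushion distortion, and it applies the chromatic aberration correction per color channel, so you never have to write this yourself.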
Other innovations in display technology such as pixel smearing and foveated rendering (showing a higher-resolution detail exactly where the eyeball is looking) will also help reduce the screen door effect. When experiencing a 3D scene in VR, you must also consider the frames per second (FPS). If FPS is too slow, the animation will look choppy. Things that affect FPS include the graphics processor (GPU) performance and complexity of the Unity scene (number of polygons and lighting calculations), among other factors. This is compounded in VR because you need to draw the scene twice, once for each eye. Technology innovations, such as GPUs optimized for VR, frame interpolation and other techniques, will improve the frame rates. For us developers, performance-tuning techniques in Unity, such as those used by mobile game developers, can be applied in VR. These techniques and optics help make the 3D scene appear realistic. Sound is also very important—more important than many people realize. VR should be experienced while wearing stereo headphones. In fact, when the audio is done well but the graphics are pretty crappy, you can still have a great experience. We see this a lot in TV and cinema. The same holds true in VR. Binaural audio gives each ear its own stereo view of a sound source in such a way that your brain imagines its location in 3D space. No special listening devices are needed. Regular headphones will work (speakers will not). For example, put on your headphones and visit the Virtual Barber Shop at https://www.youtube.com/watch?v=IUDTlvagjJA. True 3D audio, such as VisiSonics (licensed by Oculus), provides an even more realistic spatial audio rendering, where sounds bounce off nearby walls and can be occluded by obstacles in the scene to enhance the first-person experience and realism. Lastly, the VR headset should fit your head and face comfortably so that it's easy to forget that you're wearing it and should block out light from the real environment around you. Head tracking So, we have a nice 3D picture that is viewable in a comfortable VR headset with a wide field of view. If this was it and you moved your head, it'd feel like you have a diorama box stuck to your face. Move your head and the box moves along with it, and this is much like holding the antique stereograph device or the childhood View Master. Fortunately, VR is so much better. The VR headset has a motion sensor (IMU) inside that detects spatial acceleration and rotation rate on all three axes, providing what's called the six degrees of freedom. This is the same technology that is commonly found in mobile phones and some console game controllers. Mounted on your headset, when you move your head, the current viewpoint is calculated and used when the next frame's image is drawn. This is referred to as motion detection. Current motion sensors may be good if you wish to play mobile games on a phone, but for VR, it's not accurate enough. These inaccuracies (rounding errors) accumulate over time, as the sensor is sampled thousands of times per second, one may eventually lose track of where you are in the real world. This drift is a major shortfall of phone-based VR headsets such as Google Cardboard. It can sense your head motion, but it loses track of your head position. High-end HMDs account for drift with a separate positional tracking mechanism. 
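To see why those rounding errors matter, here is a toy C# simulation I've added for illustration (the bias and sample-rate figures are invented): a tiny constant error in the measured rotation rate, integrated thousands of times per second, grows into a very visible heading error.

using System;

// Toy gyro-drift simulation: naive integration of a slightly biased
// rotation rate accumulates error without bound.
class DriftSketch
{
    static void Main()
    {
        double trueRate = 0.0;       // the head is actually holding still (degrees/second)
        double bias = 0.01;          // assumed sensor bias of 0.01 degrees/second
        double dt = 1.0 / 1000.0;    // 1000 samples per second
        double estimatedYaw = 0.0;

        // Integrate ten minutes' worth of samples.
        for (int i = 0; i < 1000 * 600; i++)
        {
            double measuredRate = trueRate + bias;
            estimatedYaw += measuredRate * dt;
        }

        // 0.01 deg/s * 600 s = 6 degrees of drift, despite a motionless head.
        Console.WriteLine("Accumulated yaw error: " + estimatedYaw + " degrees");
    }
}

That slow, unbounded wander is exactly the drift described above, and it is why the positional tracking systems discussed next rely on an external reference rather than the IMU alone.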
The Oculus Rift provides that separate positional tracking with an outside-in approach, where an array of (invisible) infrared LEDs on the HMD is read by an external optical sensor (an infrared camera) to determine your position. You need to remain within the view of the camera for the head tracking to work. The SteamVR Vive Lighthouse technology reverses the arrangement: two or more simple laser-emitting base stations are placed in the room (much like the lasers in a barcode reader at the grocery checkout), and optical sensors on the headset read the sweeping rays to determine your position. Either way, the primary purpose is to accurately find the position of your head (and other similarly equipped devices, such as handheld controllers). Together, the position, tilt, and the forward direction of your head—or the head pose—is used by the graphics software to redraw the 3D scene from this vantage point. Graphics engines such as Unity are really good at this. Now, let's say that the screen is getting updated at 90 FPS, and you're moving your head. The software determines the head pose, renders the 3D view, and draws it on the HMD screen. However, you're still moving your head. So, by the time it's displayed, the image is a little out of date with respect to your then current position. This is called latency, and it can make you feel nauseous. Motion sickness caused by latency in VR occurs when you're moving your head and your brain expects the world around you to change exactly in sync. Any perceptible delay can make you uncomfortable, to say the least. Latency can be measured as the time from reading a motion sensor to rendering the corresponding image, or the sensor-to-pixel delay. According to Oculus' John Carmack: "A total latency of 50 milliseconds will feel responsive, but still noticeably laggy. 20 milliseconds or less will provide the minimum level of latency deemed acceptable." There are a number of very clever strategies that can be used to implement latency compensation. The details are outside the scope of this article and inevitably will change as device manufacturers improve on the technology. One of these strategies is what Oculus calls timewarp, which tries to guess where your head will be by the time the rendering is done, and uses that future head pose instead of the actual, detected one. All of this is handled in the SDK, so as a Unity developer, you do not have to deal with it directly. Meanwhile, as VR developers, we need to be aware of latency as well as the other causes of motion sickness. Latency can be reduced by rendering each frame faster (keeping to the recommended FPS); its effects can also be softened by discouraging the user from moving their head too quickly and by using other techniques to make them feel grounded and comfortable. Another thing that the Rift does to improve head tracking and realism is to use a skeletal representation of the neck so that the rotations it receives are mapped more accurately to the head rotation. For example, looking down at your lap produces a small forward translation of the view, since it's impossible to rotate one's head straight downward on the spot. Other than head tracking, stereography, and 3D audio, virtual reality experiences can be enhanced with body tracking, hand tracking (and gesture recognition), locomotion tracking (for example, VR treadmills), and controllers with haptic feedback. The goal of all of this is to increase your sense of immersion and presence in the virtual world.
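The prediction idea behind timewarp can be sketched in a few lines of C#. This is a deliberate simplification I'm adding for illustration, not Oculus' actual implementation: extrapolate the latest pose forward by the expected sensor-to-pixel latency and render from there.

using UnityEngine;

// Simplified head-pose prediction: rotate the current orientation forward
// by (angular velocity * expected latency). Real SDKs do this, and more,
// internally.
public static class PosePredictionSketch
{
    // latencySeconds is the assumed sensor-to-pixel delay, e.g. 0.02f for 20 ms.
    public static Quaternion PredictPose(Quaternion current,
                                         Vector3 angularVelocityDegPerSec,
                                         float latencySeconds)
    {
        Vector3 deltaAngles = angularVelocityDegPerSec * latencySeconds;
        return current * Quaternion.Euler(deltaAngles);
    }
}

At 20 milliseconds of latency and a 200-degrees-per-second head turn, the correction is 4 degrees, which is easily noticeable if it is skipped.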
Summary In this article, we discussed the different levels of device integration software and then installed the software that is appropriate for your target VR device. We also discussed what happens inside the hardware and software SDK that makes virtual reality work and how it matters to us VR developers. For more information on VR development and Unity refer to the following Packt books: Unity UI Cookbook, by Francesco Sapio: https://www.packtpub.com/game-development/unity-ui-cookbook Building a Game with Unity and Blender, by Lee Zhi Eng: https://www.packtpub.com/game-development/building-game-unity-and-blender Augmented Reality with Kinect, by Rui Wang: https://www.packtpub.com/application-development/augmented-reality-kinect Resources for Article: Further resources on this subject: Virtually Everything for Everyone [article] Unity Networking – The Pong Game [article] Getting Started with Mudbox 2013 [article]
A cross-platform solution with Xamarin.Forms and MVVM architecture

Packt
22 Feb 2016
9 min read
In this article by George Taskos, the author of the book, Xamarin Cross Platform Development Cookbook, we will discuss a cross-platform solution with Xamarin.Forms and MVVM architecture. Creating a cross-platform solution correctly requires a lot of things to be taken into consideration. In this article, we will quickly provide you with a starter MVVM architecture showing data retrieved over the network in a ListView control. (For more resources related to this topic, see here.) How to do it... In Xamarin Studio, click on File | New | Xamarin.Forms App. Provide the name XamFormsMVVM. Add the NuGet dependencies by right-clicking on each project in the solution and choosing Add | Add NuGet Packages…. Search for the packages XLabs.Forms and modernhttpclient, and install them. Repeat step 2 for the XamFormsMVVM portable class library and add the packages Microsoft.Net.Http and Newtonsoft.Json. In the XamFormsMVVM portable class library, create the following folders: Models, ViewModels, and Views. To create a folder, right-click on the project and select Add | New Folder. Right-click on the Models folder and select Add | New File…, choose the General | Empty Interface template, name it IDataService, and click on New, and add the following code: public interface IDataService { Task<IEnumerable<OrderModel>> GetOrdersAsync (); } Right-click on the Models folder again and select Add | New File…, choose the General | Empty Class template, name it DataService, and click on New, and add the following code: [assembly: Xamarin.Forms.Dependency (typeof (DataService))] namespace XamFormsMVVM { public class DataService : IDataService { protected const string BaseUrlAddress = @"https://api.parse.com/1/classes"; protected virtual HttpClient GetHttpClient() { HttpClient httpClient = new HttpClient(new NativeMessageHandler()); httpClient.BaseAddress = new Uri(BaseUrlAddress); httpClient.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue ("application/json")); return httpClient; } public async Task<IEnumerable<OrderModel>> GetOrdersAsync () { using (HttpClient client = GetHttpClient ()) { HttpRequestMessage requestMessage = new HttpRequestMessage(HttpMethod.Get, client.BaseAddress + "/Order"); requestMessage.Headers.Add("X-Parse-Application-Id", "fwpMhK1Ot1hM9ZA4iVRj49VFzDePwILBPjY7wVFy"); requestMessage.Headers.Add("X-Parse-REST-API-Key", "egeLQVTC7IsQJGd8GtRj3ttJVRECIZgFgR2uvmsr"); HttpResponseMessage response = await client.SendAsync(requestMessage); response.EnsureSuccessStatusCode (); string ordersJson = await response.Content.ReadAsStringAsync(); JObject jsonObj = JObject.Parse (ordersJson); JArray ordersResults = (JArray)jsonObj ["results"]; return JsonConvert.DeserializeObject <List<OrderModel>> (ordersResults.ToString ()); } } } } Right-click on the Models folder and select Add | New File…, choose the General | Empty Interface template, name it IDataRepository, and click on New, and add the following code: public interface IDataRepository { Task<IEnumerable<OrderViewModel>> GetOrdersAsync (); } Right-click on the Models folder and select Add | New File…, choose the General | Empty Class template, name it DataRepository, and click on New, and add the following code in that file: [assembly: Xamarin.Forms.Dependency (typeof (DataRepository))] namespace XamFormsMVVM { public class DataRepository : IDataRepository { private IDataService DataService { get; set; } public DataRepository () : this(DependencyService.Get<IDataService> ()) { } public DataRepository (IDataService dataService)
{ DataService = dataService; } public async Task<IEnumerable<OrderViewModel>> GetOrdersAsync () { IEnumerable<OrderModel> orders = await DataService.GetOrdersAsync ().ConfigureAwait (false); return orders.Select (o => new OrderViewModel (o)); } } } In the ViewModels folder, right-click on Add | New File… and name it OrderViewModel. Add the following code in that file: public class OrderViewModel : XLabs.Forms.Mvvm.ViewModel { string _orderNumber; public string OrderNumber { get { return _orderNumber; } set { SetProperty (ref _orderNumber, value); } } public OrderViewModel (OrderModel order) { OrderNumber = order.OrderNumber; } public override string ToString () { return string.Format ("[{0}]", OrderNumber); } } Repeat step 5 and create a class named OrderListViewModel.cs: public class OrderListViewModel : XLabs.Forms.Mvvm.ViewModel { protected IDataRepository DataRepository { get; set; } ObservableCollection<OrderViewModel> _orders; public ObservableCollection<OrderViewModel> Orders { get { return _orders; } set { SetProperty (ref _orders, value); } } public OrderListViewModel () : this(DependencyService.Get<IDataRepository> ()) { } public OrderListViewModel (IDataRepository dataRepository) { DataRepository = dataRepository; DataRepository.GetOrdersAsync ().ContinueWith (antecedent => { if (antecedent.Status == TaskStatus.RanToCompletion) { Orders = new ObservableCollection<OrderViewModel> (antecedent.Result); } }, TaskScheduler.FromCurrentSynchronizationContext ()); } } Right-click on the Views folder and choose Add | New File…, select the Forms | Forms Content Page Xaml, name it OrderListView, and click on New: <?xml version="1.0" encoding="UTF-8"?> <ContentPage xmlns="http://xamarin.com/schemas/2014/forms" xmlns:x="http://schemas.microsoft.com/winfx/2009/xaml" x:Class="XamFormsMVVM.OrderListView" Title="Orders"> <ContentPage.Content> <ListView ItemsSource="{Binding Orders}"/> </ContentPage.Content> </ContentPage> Go to XamFormsMVVM.cs and replace the contents with the following code: public App() { if (!Resolver.IsSet) { SetIoc (); } RegisterViews(); MainPage = new NavigationPage((Page)ViewFactory.CreatePage<OrderListViewModel, OrderListView>()); } private void SetIoc() { var resolverContainer = new SimpleContainer(); Resolver.SetResolver (resolverContainer.GetResolver()); } private void RegisterViews() { ViewFactory.Register<OrderListView, OrderListViewModel>(); } Run the application, and you will get results like the following screenshots: For Android: For iOS: How it works… A cross-platform solution should share as much logic and common operations as possible, such as retrieving and/or updating data in a local database or over the network, having your logic centralized, and coordinating components. With Xamarin.Forms, you even have a cross-platform UI, but this shouldn't stop you from separating the concerns correctly; the more abstracted you are from the user interface and programming against interfaces, the easier it is to adapt to changes and remove or add components. Starting with the models, we create a DataService implementation class with its equivalent interface, IDataService; it retrieves raw JSON data over the network from the Parse API and converts it to a list of OrderModel objects, which are POCO classes with just one property. Every time you invoke the GetOrdersAsync method, you get the same 100 orders from the server. Notice how we used the Dependency attribute declaration above the namespace to instruct DependencyService that we want to register this implementation class for the interface.
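One piece the recipe relies on but never lists is the OrderModel POCO that DataService deserializes the JSON into. Judging from OrderViewModel above, it only needs an OrderNumber property, so a minimal version could look like the following sketch; the JSON attribute and its field name are my assumptions, not code from the book.

using Newtonsoft.Json;

namespace XamFormsMVVM
{
    // Plain data-transfer object deserialized from the Parse "Order" class.
    public class OrderModel
    {
        // Assumed JSON field name; adjust it to match the actual payload.
        [JsonProperty("orderNumber")]
        public string OrderNumber { get; set; }
    }
}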
We took a step to improve the performance of the REST client API; although we do use the HTTPClient package, we pass a delegate handler, NativeMessageHandler, when constructing in the GetClient() method. This handler is part of the modernhttpclient NuGet package and it manages undercover to use a native REST API for each platform: NSURLSession in iOS and OkHttp in Android. The IDataService interface is used by the DataRepository implementation, which acts as a simple intermediate repository layer converting the POCO OrderModel received from the server in OrderViewModel instances. Any model that is meant to be used on a view is a ViewModel, the view's model, and also, when retrieving and updating data, you don't carry business logic. Only data logic that is known should be included as data transfer objects. Dependencies, such as in our case, where we have a dependency of IDataService for the DataRepository to work, should be clear to classes that will use the component, which is why we create a default empty constructor required from the XLabs ViewFactory class, but in reality, we always invoke the constructor that accepts an IDataService instance; this way, when we unit test this unit, we can pass our mock IDataService class and test the functionality of the methods. We are using the DependencyService class to register the implementation to its equivalent IDataRepository interface here as well. OrderViewModel inherits XLabs.Forms.ViewModel; it is a simple ViewModel class with one property raising property change notifications and accepting an OrderModel instance as a dependency in the default constructor. We override the ToString() method too for a default string representation of the object, which simplifies the ListView control without requiring us, in our example, to use a custom cell with DataTemplate. The second ViewModel in our architecture is the OrderListViewModel, which inherits XLabs.Forms.ViewModel too and has a dependency of IDataRepository, following the same pattern with a default constructor and a constructor with the dependency argument. This ViewModel is responsible for retrieving a list of OrderViewModel and holding it to an ObservableCollection<OrderViewModel> instance that raises collection change notifications. In the constructor, we invoke the GetOrdersAsync() method and register an action delegate handler to be invoked on the main thread when the task has finished passing the orders received in a new ObservableCollection<OrderViewModel> instance set to the Orders property. The view of this recipe is super simple: in XAML, we set the title property which is used in the navigation bar for each platform and we leverage the built-in data-binding mechanism of Xamarin.Forms to bind the Orders property in the ListView ItemsSource property. This is how we abstract the ViewModel from the view. But we need to provide a BindingContext class to the view while still not coupling the ViewModel to the view, and Xamarin Forms Labs is a great framework for filling the gap. XLabs has a ViewFactory class; with this API, we can register the mapping between a view and a ViewModel, and the framework will take care of injecting our ViewModel into the BindingContext class of the view. When a page is required in our application, we use the ViewFactory.CreatePage class, which will construct and provide us with the desired instance. 
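As noted above, the constructor overloads that accept an interface exist so that the data layer can be unit tested with a fake dependency. A rough sketch of such a test follows; the FakeDataService class, the NUnit attributes, and the sample order number are mine, not part of the recipe.

using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using NUnit.Framework;

namespace XamFormsMVVM.Tests
{
    // Hand-rolled fake so DataRepository can be exercised without the network.
    class FakeDataService : IDataService
    {
        public Task<IEnumerable<OrderModel>> GetOrdersAsync()
        {
            IEnumerable<OrderModel> orders = new List<OrderModel>
            {
                new OrderModel { OrderNumber = "A-001" }
            };
            return Task.FromResult(orders);
        }
    }

    [TestFixture]
    public class DataRepositoryTests
    {
        [Test]
        public async Task GetOrdersAsync_WrapsModelsInViewModels()
        {
            var repository = new DataRepository(new FakeDataService());

            var viewModels = await repository.GetOrdersAsync();

            Assert.That(viewModels.Single().OrderNumber, Is.EqualTo("A-001"));
        }
    }
}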
Xamarin Forms Labs uses a dependency resolver internally; this has to be set up early in the application startup entry point, so it is handled in the App.cs constructor. Run the iOS application in the simulator or device and in your preferred Android emulator or device; the result is the same with the equivalent native themes for each platform. Summary Xamarin.Forms is a great cross-platform UI framework that you can use to describe your user interface code declaratives in XAML, and it will be translated into the equivalent native views and pages with the ability of customizing each native application layer. Xamarin.Forms and MVVM are made for each other; the pattern fits naturally into the design of native cross-platform mobile applications and abstracts the view from the data easy using the built-in data-binding mechanism. Resources for Article: Further resources on this subject: Code Sharing Between iOS and Android [Article] Working with Xamarin.Android [Article] Sharing with MvvmCross [Article]
Working with Commands and Plugins

Packt
22 Feb 2016
26 min read
In this article written by Tom Ryder, author of the book Nagios Core Administration Cookbook, Second Edition, we will cover the following topics: Installing a plugin Removing a plugin Creating a new command Customizing an existing command (For more resources related to this topic, see here.) Introduction Nagios Core is perhaps best thought of as a monitoring framework and less as a monitoring tool.Its modular design allows any kind of program that returns appropriate values based on some kind of check as a check_command option for a host or service. This is where the concepts of commands and pluginscome into play. For Nagios Core, a plugin is any program that can be used to gather information about a host or service. To ensure that a host is responding to ping requests, we'd use a plugin, such as check_ping,which when run against a hostname or address—whether by Nagios Core or not—returns a status code to whatever called it, based on whether a response was received to the pingrequest within a certain period of time. This status code and any accompanying message is what Nagios Core uses to establish the state that a host or service is in. Plugins are generally just like any other program on a Unix-like system; they can be run from the command line, are subject to permissions and owner restrictions, can be written in any language, can read variables from their environment, and can take parameters and options to modify how they work. Most importantly, they are entirely separate from Nagios Core itself (even if programmed by the same people), and the way that they're used by the application can be changed. To allow for additional flexibility in how plugins are used, Nagios Core uses these programs according to the terms of a command definition. A command for a specific plugin defines the way in which that plugin is used, including its location in the filesystem, any parameters that should be passed to it, and any other options. In particular, parameters and options often include thresholds for the WARNINGand CRITICAL states. Nagios Core is usually downloaded and installed alongside a set of plugins called Nagios Plugins, available at https://nagios-plugins.org/, which this article assumes you have installed. These plugins were chosen because they cover the most common needs for a monitoring infrastructure quite well as a set, including checks for common services, such as web services, mail services, DNS services, and others as well as more generic checks, such as whether a TCP or UDP port is accessible and open on a server. It's possible that for most, if not all, of our monitoring needs, we won't need any other plugins—but if we do, Nagios Core makes it possible to use existing plugins in novel ways using custom command definitions, adding third-party plugins written by contributors on the Nagios Exchange website or even writing custom plugins ourselves from scratch in some special cases. Installing a plugin In this recipe, we'll install a custom plugin that we retrieved from Nagios Exchange onto a Nagios Core server so that we can use it in a Nagios Core command, and hence check a service with it. Getting ready You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already, and you should have found an appropriate plugin to install to solve some particular monitoring needs. Your Nagios Core server should have Internet connectivity to allow you to download the plugin directly from the website. 
In this example, we'll use check_rsync, which is available on the Web at https://exchange.nagios.org/directory/Plugins/Network-Protocols/Rsync/check_rsync/details. This particular plugin is quite simple, consisting of a single Perl script with only very basic dependencies. If you want to install this script as an example, the server will also need to have a Perl interpreter installed, for example, in /usr/bin/perl. This example will also include directly testing a server running an rsync(1) daemon called troy.example.net. How to do it... We can download and install a new plugin using the following steps: Copy the URL for the download link for the most recent version of the check_rsync plugin. Navigate to the plugins directory for the Nagios Core server. The default location is /usr/local/nagios/libexec: # cd /usr/local/nagios/libexec Download the plugin using the wget command into a file called check_rsync. It's important to enclose the URL in quotes: # wget 'https://exchange.nagios.org/components/com_mtree/attachment.php?link_id=307&cf_id=29' -O check_rsync Make the plugin executable using the chmod(1) and chown(1) commands: # chown root.nagios check_rsync # chmod 0770 check_rsync Run the plugin directly with no arguments to check that it runs and to get usage instructions. It's a good idea to test it as the nagios user using the su(8) or sudo(8) commands: # sudo -s -u nagios $ ./check_rsync Usage: check_rsync -H <host> [-p <port>] [-m <module>[,<user>,<password>] [-m <module>[,<user>,<password>]...]] Try running the plugin directly against a host running rsync(1) to check whether it works and reports a status: $ ./check_rsync -H troy.example.net The output normally starts with the status determined, with any extra information after a colon: OK: Rsync is up If all of this works, then the plugin is now installed and working correctly. How it works... Because Nagios Core plugins are programs in themselves, all that installing a plugin really amounts to is saving a program or script into an appropriate directory, in this case, /usr/local/nagios/libexec, where all the other plugins live. It's then available to be used the same way as any other plugin. The next step once the plugin is working is defining a command for it in the Nagios Core configuration so that it can be used to monitor hosts and/or services. This can be done with the Creating a new command recipe in this article. There's more... If we inspect the Perl script, we can see a little bit of how it works. It works like any other Perl script except perhaps for the fact that its return values are defined in a hash called %ERRORS, and the return values it chooses depend on what happens when it tries to check the rsync(1) process. This is the most important part of implementing a plugin for Nagios Core. Installation procedures for different plugins vary. In particular, many plugins are written in languages like C, and hence, they need to be compiled. One such plugin is the popular check_nrpe plugin. Rather than simply being saved into a directory and made executable, these sorts of plugins often follow the usual pattern of configuration, compilation, and installation: $ ./configure $ make # make install For many plugins that are built in this style, the final step of make install will often install the compiled plugin into the appropriate directory for us. In general, if instructions are included with the plugin, it pays to read them to see how best to install it.
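Under the hood, the %ERRORS hash simply maps status names to the exit codes that every Nagios Core plugin is expected to use: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN, with a single line of status text on standard output. A bare-bones shell sketch of that convention (a throwaway example of mine, not a plugin you would actually deploy) looks like this:

#!/bin/sh
# check_example: the smallest useful illustration of the plugin protocol.
# It expects a numeric value as its first argument, prints one line of
# status text, and exits with the matching status code.

STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

if [ -z "$1" ]; then
    echo "EXAMPLE UNKNOWN: no value supplied"
    exit $STATE_UNKNOWN
fi

if [ "$1" -ge 90 ]; then
    echo "EXAMPLE CRITICAL: value is $1"
    exit $STATE_CRITICAL
elif [ "$1" -ge 75 ]; then
    echo "EXAMPLE WARNING: value is $1"
    exit $STATE_WARNING
fi

echo "EXAMPLE OK: value is $1"
exit $STATE_OK

Whatever language a plugin is written in, Nagios Core only ever sees that exit code and that line of output, which is why installing a plugin amounts to little more than dropping an executable into /usr/local/nagios/libexec.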
See also The Removing a plugin recipe The Creating a new command recipe Removing a plugin In this recipe, we'll remove a plugin that we no longer need as part of our Nagios Core installation. Perhaps it's not working correctly, the service it monitors is no longer available, or there are security or licensing concerns with its usage. Getting ready You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already and have a plugin that you would like to remove from the server. In this instance, we'll remove the now unneeded check_rsync plugin from our Nagios Core server. How to do it... We can remove a plugin from our Nagios Core instance using the following steps: Remove any part of the configuration that uses the plugin, including the hosts or services that use it for check_command and command definitions that refer to the program. As an example, the following definition for a command would no longer work after we remove the check_rsync plugin: define command { command_name check_rsync command_line $USER1$/check_rsync -H $HOSTADDRESS$ } Using a tool, such as grep(1), can be a good way to find mentions of the command and plugin: # grep -R check_rsync /usr/local/nagios/etc Change the directory on the Nagios Core server to wherever the plugins are kept. The default location is /usr/local/nagios/libexec: # cd /usr/local/nagios/libexec Delete the plugin with the rm(1) command: # rm check_rsync Validate the configuration and restart the Nagios Core server: # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg # /etc/init.d/nagios reload How it works... Nagios Core plugins are simply external programs that the server uses to perform checks of hosts and services. If a plugin is no longer needed, all that we need to do is remove references to it in our configuration, if any, and delete it from /usr/local/nagios/libexec. There's more... There's not usually any harm in leaving the plugin's program on the server even if Nagios Core isn't using it. It doesn't slow anything down or cause any other problems, and it may be needed later. Nagios Core plugins are generally quite small programs and should not really cause disk space concerns on a modern server. See also The Installing a plugin recipe The Creating a new command recipe Creating a new command In this recipe, we'll create a new command for a plugin that was just installed into the /usr/local/nagios/libexecdirectory in the Nagios Core server. This will define the way in which Nagios Core should use the plugin, and thereby allow it to be used as part of a service definition. Getting ready You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already and have a plugin installed for which you'd like to define a new command so that you can use it as part of a service definition. In this instance, we'll define a command for an installed check_rsyncplugin. How to do it... We can define a new command in our configuration as follows: Change to the directory containing the objects configuration for Nagios Core. 
The default location is /usr/local/nagios/etc/objects: # cd /usr/local/nagios/etc/objects Edit the commands.cfg file: # vi commands.cfg At the bottom of the file, add the following command definition: define command {     command_name  check_rsync     command_line  $USER1$/check_rsync -H $HOSTADDRESS$ } Validate the configuration and restart the Nagios Core server: # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg # /etc/init.d/nagios reload If the validation passed and the server restarted successfully, we should now be able to use the check_rsync command in a service definition. How it works... The configuration we added to the commands.cfgfile in the preceding steps defines a new command called check_rsync,which specifies a method for using the plugin of the same name to monitor a service. This enables us to use check_rsyncas a value for the check_commanddirective in a service declaration, which might look like this: define service {     use                  generic-service     host_name            troy.example.net     service_description  RSYNC     check_command        check_rsync } Only two directives are required for command definitions, and we've defined both: command_name: This defines the unique name with which we can reference the command when we use it in host or service definitions command_line: This defines the command line that should be executed by Nagios Core to make the appropriate check This particular command line also uses the following two macros: $USER1$: This expands to /usr/local/nagios/libexec, the location of the plugin binaries, including check_rsync. This is defined in the sample configuration in the /usr/local/nagios/etc/resource.cfg file. $HOSTADDRESS$: This expands to the address of any host for which this command is used as a host or service definition. So, if we used the command in a service, checking the rsync(1) server on troy.example.net, the completed command might look like this: $ /usr/local/nagios/libexec/check_rsync -H troy.example.net We can run this straight from the command line ourselves as the nagios userto see what kind of results it returns: $ /usr/local/nagios/libexec/check_rsync -H troy.example.net OK: Rsync is up There's more... A plugin can be used for more than one command. If we had a particular rsync(1) module, which we wanted to check named backup, we could write another command called check_rsync_backupas follows: define command {     command_name  check_rsync_backup     command_line  $USER1$/check_rsync -H $HOSTADDRESS$ -m backup } Alternatively, if one or more of our rsync(1) servers were running on an alternate port, say, port 5873, we could define a separate command check_rsync_altport for that: define command {     command_name  check_rsync_altport     command_line  $USER1$/check_rsync -H $HOSTADDRESS$ -p 5873 } Commands can thus be defined as precisely as we need them to be. We explore this in more detail in the Customizing an existing commandrecipe in this article. See also The Installing a plugin recipe The Customizing an existing command recipe Customizing an existing command In this recipe, we'll customize an existing command definition. There are a number of reasons why you might want to do this, but a common one is if a check is overzealous, sending notifications for the WARNING orCRITICALstates, which aren't actually terribly worrisome, or on the other hand, if a check is too "forgiving" and doesn't flag hosts or services as having problems when it would actually be appropriate to do so. 
Another reason is to account for peculiarities in your own network. For example, if you run HTTP daemons on a large number of hosts in your network on the alternative port 8080 that you need to check, it would be convenient to have a check_http_altport command available. We can do this by copying and altering the definition for the vanilla check_http command. Getting ready You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already. You should also already be familiar with the relationship between services, commands, and plugins. How to do it... We can customize an existing command definition as follows: Change to the directory containing the objects configuration for Nagios Core. The default location is /usr/local/nagios/etc/objects: # cd /usr/local/nagios/etc/objects Edit the commands.cfg or whichever file is an appropriate location for the check_http command: # vi commands.cfg Find the definition for the check_http command. In a default Nagios Core configuration, it should look something like this: # 'check_http' command definition define command {     command_name  check_http     command_line  $USER1$/check_http -I $HOSTADDRESS$ $ARG1$ } Copy this definition into a new definition directly under it and alter it to look like the following, renaming the command and adding a new option to its command line: # 'check_http_altport' command definition define command {     command_name  check_http_altport     command_line  $USER1$/check_http -I $HOSTADDRESS$ -p 8080 $ARG1$ } Validate the configuration and restart the Nagios Core server: # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg # /etc/init.d/nagios reload If the validation passed and the server restarted successfully, we should now be able to use the check_http_altport command, which is based on the original check_http command, in a service definition. How it works... The configuration we added to the commands.cfg file in the preceding steps reproduces the command definition for check_http, but changes it in two ways: It renames the command from check_http to check_http_altport, which is necessary to distinguish the commands from one another. Command names in Nagios Core, like host names, must be unique. It adds the -p 8080 option to the command line call, specifying that when the call to check_http is made, the check will be made using TCP port 8080 rather than the default of TCP port 80. The check_http_altport command can now be used as a check command in the same way a check_http command can be used. For example, a service definition that checks whether the sparta.example.net host is running an HTTP daemon on port 8080 might look something like this: define service {     use                  generic-service     host_name            sparta.example.net     service_description  HTTP_8080     check_command        check_http_altport } There's more... This recipe's title implies that we should customize the existing commands by editing them in-place, and indeed, this works fine if we really do want to do things this way. Instead of copying the command definition, we can just add -p 8080 or any other customization to the command line and change the original command. However, this is bad practice in most cases, mostly because it can break existing monitoring and can be potentially confusing to other administrators of the Nagios Core server.
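Note the trailing $ARG1$ macro that both definitions carry over from the stock command: it lets individual service definitions pass extra options to the plugin at check time, separated from the command name by a ! character. For instance, a one-off check of a specific URI could reuse the ordinary check_http command without any new definition at all (the /status URI here is just an example of mine):

define service {
    use                  generic-service
    host_name            sparta.example.net
    service_description  HTTP_STATUS_PAGE
    check_command        check_http!-u /status
}

Here, $ARG1$ expands to -u /status, so the executed command line becomes $USER1$/check_http -I sparta.example.net -u /status. For a variation that many services share, such as the alternative port above, a dedicated command is still the clearer approach.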
If we have a special case for monitoring, in this case, checking a nonstandard port for HTTP, then it's wise to create a whole new command based on the existing one with the customisations we need. Particularly if you share monitoring configuration duties with someone else on your team, changing the command can break the monitoring for anyone who had set up the services using the check_http command beforeyou changed it, meaning that their checks would all start failing because port 8080 would be checked instead. There is no limit to the number of commands you can define, so you can be very liberal in defining as many alternative commands as you need. It's a good idea to give them instructive names that say something about what they do as well as to add explanatory comments to the configuration file. You can add a comment to the file by prefixing it with a # character: # # 'check_http_altport' command_definition. This is to keep track of the # servers that have administrative panels running on an alternative port # to confer special privileges to a separate instance of Apache HTTPD # that we don't want to confer to the one for running public-facing # websites. # define command {     command_name  check_http_altport     command_line  $USER1$/check_http -H $HOSTADDRESS$ -p 8080 $ARG1$ } See also The Creating a new command recipe Writing a new plugin from scratch Even given the very useful standard plugins in the Nagios Plugins set and the large number of custom plugins available on Nagios Exchange, occasionally, as our monitoring setup becomes more refined, we may well find that there is some service or property of a host that we would like to check, but for which there doesn't seem to be any suitable plugin available. Every network is different, and sometimes, the plugins that others have generously donated their time to make for the community don't quite cover all your bases. Generally, the more specific your monitoring requirements get, the less likely it is for there to be a plugin available that does exactly what you need. In this example, we'll deal with a very particular problem that we'll assume can't be dealt with effectively by any known Nagios Core plugins, and we'll write one ourselves using Perl. Here's the example problem. Our Linux security team wants to be able to automatically check whether any of our servers are running kernels that have known exploits. However, they're not worried about every vulnerable kernel, only certain ones. They have provided us with the version numbers of three kernels that have small vulnerabilities that they're not particularly worried about but that do need patching, and one they're extremely worried about. Let's say the minor vulnerabilities are in the kernels with version numbers 2.6.19, 2.6.24, and 3.0.1. The serious vulnerability is in the kernel with version number 2.6.39. Note that these version numbers in this case are arbitrary and don't necessarily reflect any real kernel vulnerabilities! The team could log in to all of the servers individually to check them, but the servers are of varying ages and access methods, and they are managed by different people. They would also have to check manually more than once because it's possible that a naive administrator could upgrade to a kernel that's known to be vulnerable in an older release, and they also might want to add other vulnerable kernel numbers for checking later on. 
So, the team have asked us to solve the problem with Nagios Core monitoring, and we've decided that the best way to do it is to write our own plugin, check_vuln_kernel, thatchecks the output of uname(1)for a kernel version string, and then does the following: If it's one of the slightly vulnerable kernels, it will return a WARNING state so that we can let the security team know that they should address it when they're next able to. If it's the highly vulnerable kernel version, it will return a CRITICAL state so that the security team knows that a patched kernel needs to be installed immediately. If uname(1) gives an error or output we don't understand, it will return an UNKNOWN state, alerting the team to a bug in the plugin or possibly more serious problems with the server. Otherwise, it returns an OK state, confirming that the kernel is not known to be a vulnerable one. Finally, in the Nagios Core monitoring, they want to be able to see at a glance what the kernel version is and whether it's vulnerable or not. For the purposes of this example, we'll only monitor the Nagios Core server; however, via NRPE, we'd be able to install this plugin on the other servers that require this monitoring, they'll work just fine here as well. While this problem is very specific, we'll approach it in a very general way, which you'll be able to adapt to any solution where it's required for a Nagios plugin to: Run a command and pull its output into a variable. Check the output for the presence or absence of certain patterns. Return an appropriate status based on those tests. All that this means is that if you're able to do this, you'll be able to monitor anything effectively from Nagios Core! Getting ready You should have a Nagios Core 4.0 or newer server running with a few hosts and services configured already. You should also already be familiar with the relationship between services, commands, and plugins. You should have Perl installed, at least version 5.10. This will include the required POSIX module. You should also have the Perl modules Nagios::Plugin(or Monitoring::Plugin) andReadonly installed. On Debian-like systems, you can install this with the following: # apt-get install libnagios-plugin-perl libreadonly-perl On RPM-based systems, such as CentOS or Fedora Core, the following command should work: # yum install perl-Nagios-Plugin perl-Readonly This will be a rather long recipe that ties in a lot of Nagios Core concepts. You should be familiar with all the following concepts: Defining new hosts and services and how they relate to one another Defining new commands and how they relate to the plugins they call Installing, testing, and using Nagios Core plugins Some familiarity with Perl would also be helpful, but it is not required. We'll include comments to explain what each block of code is doing in the plugin. How to do it... We can write, test, and implement our example plugin as follows: Change to the directory containing the plugin binaries for Nagios Core. The default location is /usr/local/nagios/libexec: # cd /usr/local/nagios/libexec Start editing a new file called check_vuln_kernel: # vi check_vuln_kernel Include the following code in it. Take note of the comments, which explain what each block of code is doing. 
#!/usr/bin/env perl   # Use strict Perl style use strict; use warnings; use utf8;   # Require at least Perl v5.10 use 5.010;   # Require a few modules, including Nagios::Plugin use Nagios::Plugin; use POSIX; use Readonly;   # Declare some constants with patterns that match bad kernels Readonly::Scalar my $CRITICAL_REGEX => qr/^2[.]6[.]39[^\d]/msx; Readonly::Scalar my $WARNING_REGEX =>   qr/^(?:2[.]6[.](?:19|24)|3[.]0[.]1)[^\d]/msx;   # Run POSIX::uname() to get the kernel version string my @uname   = uname(); my $version = $uname[2];   # Create a new Nagios::Plugin object my $np = Nagios::Plugin->new();   # If we couldn't get the version, bail out with UNKNOWN if ( !$version ) {     $np->nagios_die('Could not read kernel version string'); }   # Exit with CRITICAL if the version string matches the critical pattern if ( $version =~ $CRITICAL_REGEX ) {     $np->nagios_exit( CRITICAL, $version ); }   # Exit with WARNING if the version string matches the warning pattern if ( $version =~ $WARNING_REGEX ) {     $np->nagios_exit( WARNING, $version ); }   # Exit with OK if neither of the patterns matched $np->nagios_exit( OK, $version ); Make the plugin owned by the nagios group and executable with chmod(1): # chown root.nagios check_vuln_kernel # chmod 0770 check_vuln_kernel Run the plugin directly to test it: # sudo -s -u nagios $ ./check_vuln_kernel VULN_KERNEL OK: 3.16.0-4-amd64 We should now be able to use the plugin in a command, and hence in a service check just like any other command. How it works... The code we added in the new plugin file, check_vuln_kernel, earlier is actually quite simple: It runs Perl's POSIX uname implementation to get the version number of the kernel If that didn't work, it exits with the UNKNOWN status If the version number matches anything in a pattern containing critical version numbers, it exits with the CRITICAL status If the version number matches anything in a pattern containing warning version numbers, it exits with the WARNING status Otherwise, it exits with the OK status It also prints the status as a string along with the kernel version number, if it was able to retrieve one. We might set up a command definition for this plugin, as follows: define command {     command_name  check_vuln_kernel     command_line  $USER1$/check_vuln_kernel } In turn, we might set up a service definition for that command, as follows: define service {     use                  local-service     host_name            localhost     service_description  VULN_KERNEL     check_command        check_vuln_kernel } If the kernel was not vulnerable, the service's appearance in the web interface might be something like this: However, if the monitoring server itself happened to be running a vulnerable kernel, it might look more like this (and send consequent notifications, if configured to do so): There's more... This may be a simple plugin, but its structure can be generalized to all sorts of monitoring tasks. If we can figure out the correct logic to return the status we want in an appropriate programming language, then we can write a plugin to do basically anything. A plugin like this could just as effectively be written in C for improved performance, but we'll assume for simplicity's sake that high performance for the plugin is not required, and we can instead use a language that's better suited for quick ad hoc scripts like this one, in this case, Perl. The utils.sh file, also in /usr/local/nagios/libexec, allows us to write in shell script if we'd prefer that.
If you prefer Python, the nagiosplugin library should meet your needs for both Python 2 and Python 3. Ruby users may like the nagiosplugin gem. If you write a plugin that you think could be generally useful for the Nagios community at large, consider putting it under a free software license and submitting it to the Nagios Exchange so that others can benefit from your work. Community contribution and support is what has made Nagios Core such a great monitoring platform in such wide use. Any plugin you publish in this way should conform to the Nagios Plugin Development Guidelines. At the time of writing, these are available at https://nagios-plugins.org/doc/guidelines.html. You may find older Nagios Core plugins written in Perl using the utils.pm file instead of Nagios::Plugin or Monitoring::Plugin. This will work fine, but Nagios::Plugin is recommended, as it includes more functionality out of the box and tends to be easier to use.

See also

The Creating a new command recipe
The Customizing an existing command recipe

Summary

In this article, we learned how to install a custom plugin retrieved from Nagios Exchange onto a Nagios Core server so that we can use it in a Nagios Core command, how to remove a plugin that we no longer need from our Nagios Core installation, and how to create, write, and customize new commands.

Resources for Article: Further resources on this subject: An Introduction To NODE.JS Design Patterns [article] Developing A Basic Site With NODE.JS And EXPRESS [article] Creating Our First App With IONIC [article]

Customizing heat maps (Intermediate)

Packt
22 Feb 2016
11 min read
This article will help you explore more advanced functions to customize the layout of heat maps. The main focus lies on the usage of different color palettes, but we will also cover other useful features, such as cell notes, which will be used in this recipe. (For more resources related to this topic, see here.) To ensure that our heat maps look good in any situation, we will make use of different color palettes in this recipe, and we will even learn how to create our own. Further, we will add some more extras to our heat maps, including visual aids such as cell note labels, which will make them even more useful and accessible as a tool for visual data analysis. The following image shows a heat map with cell notes and an alternative color palette created from the arabidopsis_genes.csv data set:

Getting ready

Download the 5644OS_03_01.r script and the Arabidopsis_genes.csv data set from your account at http://www.packtpub.com and save them to your hard drive. I recommend that you save the script and data file to the same folder on your hard drive. If you execute the script from a different location to the data file, you will have to change the current R working directory accordingly. The script will check automatically if any additional packages need to be installed in R.

How to do it...

Execute the following code in R via the 5644OS_03_01.r script and take a look at the PDF file custom_heatmaps.pdf that will be created in the current working directory:

### loading packages
if (!require("gplots")) {
  install.packages("gplots", dependencies = TRUE)
  library(gplots)
}
if (!require("RColorBrewer")) {
  install.packages("RColorBrewer", dependencies = TRUE)
  library(RColorBrewer)
}

### reading in data
gene_data <- read.csv("arabidopsis_genes.csv")
row_names <- gene_data[,1]
gene_data <- data.matrix(gene_data[,2:ncol(gene_data)])
rownames(gene_data) <- row_names

### setting heatmap.2() default parameters
heat2 <- function(...) heatmap.2(gene_data,
                                 tracecol = "black",
                                 dendrogram = "column",
                                 Rowv = NA,
                                 trace = "none",
                                 margins = c(8,10),
                                 density.info = "density",
                                 ...)

pdf("custom_heatmaps.pdf")

### 1) customizing colors

# 1.1) in-built color palettes
heat2(col = terrain.colors(n = 1000), main = "1.1) Terrain Colors")

# 1.2) RColorBrewer palettes
heat2(col = brewer.pal(n = 9, "YlOrRd"), main = "1.2) Brewer Palette")

# 1.3) creating own color palettes
my_colors <- c(y1 = "#F7F7D0", y2 = "#FCFC3A", y3 = "#D4D40D",
               b1 = "#40EDEA", b2 = "#18B3F0", b3 = "#186BF0",
               r1 = "#FA8E8E", r2 = "#F26666", r1 = "#C70404")
heat2(col = my_colors, main = "1.3) Own Color Palette")

my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000)
heat2(col = my_palette, main = "1.3) ColorRampPalette")

# 1.4) gray scale
heat2(col = gray(level = (0:100)/100), main = "1.4) Gray Scale")

### 2) adding cell notes
fold_change <- 2^gene_data
rounded_fold_changes <- round(fold_change, 2)
heat2(cellnote = rounded_fold_changes, notecex = 0.5, notecol = "black",
      col = my_palette, main = "2) Cell Notes")

### 3) adding column side colors
heat2(ColSideColors = c("red", "gray", "red", rep("green",13)),
      main = "3) ColSideColors")

dev.off()

How it works...

Primarily, we will be using read.csv() and heatmap.2() to read data into R and construct our heat maps.
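For orientation, this is roughly what a bare-bones call looks like before any of the recipe's customizations are applied; a minimal sketch, assuming the gplots package is installed and gene_data is the numeric matrix prepared by the script above:

library(gplots)

# Plain heatmap.2() call: clustered rows and columns, the default
# heat.colors palette, and no trace lines drawn inside the cells
heatmap.2(gene_data, trace = "none", col = heat.colors(1000))

Everything that follows refines this call, which is why wrapping the shared arguments in the heat2() helper defined in the script saves so much repetition.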
In this recipe, however, we will focus on advanced features to enhance our heat maps, such as customizing color and other visual elements: Inspecting the arabidopsis_genes.csv data set: The arabidopsis_genes.csv file contains a compilation of gene expression data from the model plant Arabidopsis thaliana. I obtained the freely available data of 16 different genes as log 2 ratios of target and reference gene from the Arabidopsis eFP Browser (http://bar.utoronto.ca/efp_arabidopsis/). For each gene, expression data of 47 different areas of the plant is available in this data file. Reading the data and converting it into a numeric matrix: We have to convert the data table into a numeric matrix first before we can construct our heat maps: gene_data <- read.csv("arabidopsis_genes.csv") row_names <- gene_data[,1] gene_data <- data.matrix(gene_data[,2:ncol(gene_data)]) rownames(gene_data) <- row_names Creating a customized heatmap.2() function: To reduce typing efforts, we are defining our own version of the heatmap.2() function now, where we will include some arguments that we are planning to keep using throughout this recipe: heat2 <- function(...) heatmap.2(gene_data, tracecol = "black", dendrogram = "column", Rowv = NA, trace = "none", margins = c(8,10), density.info = "density", ...) So, each time we call our newly defined heat2() function, it will behave similar to the heatmap.2() function, except for the additional arguments that we will pass along. We also include a new argument, black, for the tracecol parameter, to better distinguish the density plot in the color key from the background. The built-in color palettes: There are four more color palettes available in the base R that we could use instead of the heat.colors palette: rainbow, terrain.colors, topo.colors, and cm.colors. So let us make use of the terrain.colors color palette now, which will give us a nice color transition from green over yellow to rose: heat2(col = terrain.colors(n = 1000), main = "1.1) Terrain Colors") Every number for the parameter n that is larger than the default value 12 will add additional colors, which will make the transition smoother. A value of 1000 for the n parameter should be more than sufficient to make the transition between the individual colors indistinguishable to the human eye. The following image shows a side-by-side comparison of the heat.colors and terrain.colors color palettes using a different number of color shades: Further, it is also possible to reverse the direction of the color transition. For example, if we want to have a heat.color transition from yellow to red instead of red to yellow in our heat map, we could simply define a reverse function: rev_heat.colors <- function(x) rev(heat.colors(x)) heat2(col = rev_heat.colors(500)) RColorBrewer palettes: A lot of color palettes are available from the RColorBrewer package. To see how they look like, you can type display.brewer.all() into the R command-line after loading the RColorBrewer package. However, in contrast to the dynamic range color palettes that we have seen previously, the RColorBrewer palettes have a distinct number of different colors. 
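If you would rather have a textual overview than the display.brewer.all() plot, the package also ships a small lookup table; a quick sketch, assuming RColorBrewer is loaded as in the script above:

library(RColorBrewer)

# Data frame listing every palette with its maximum number of colors,
# its category (sequential, diverging, or qualitative), and colorblind safety
brewer.pal.info

# For example, check how many colors the YlOrRd palette provides
brewer.pal.info["YlOrRd", "maxcolors"]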
So to select all nine colors from the YlOrRd palette, a gradient from yellow to red, we use the following command: heat2(col = brewer.pal(n = 9, "YlOrRd"), main = "1.2) Brewer Palette") The following image gives you a good overview of all the different color palettes that are available from the RColorBrewer package: Creating our own color palettes: Next, we will see how we can create our own color palettes. A whole bunch of different colors are already defined in R. An overview of those colors can be seen by typing colors() into the command line of R. The most convenient way to assign new colors to a color palette is using hex colors (hexadecimal colors). Many different online tools are freely available that allow us to obtain the necessary hex codes. A great example is color picker (http://www.colorpicker.com), which allows us to choose from a rich color table and provides us with the corresponding hex codes. Once we gather all the hexadecimal codes for the colors that we want to use for our color palette, we can assign them to a variable as we have done before with the explicit color names: my_colors <- c(y1 = "#F7F7D0", y2 = "#FCFC3A", y3 = "#D4D40D", b1 = "#40EDEA", b2 = "#18B3F0", b3 = "#186BF0", r1 = "#FA8E8E", r2 = "#F26666", r1 = "#C70404") heat2(col = my_colors, main = "1.3) Own Color Palette") This is a very handy approach for creating a color key with very distinct colors. However, the downside of this method is that we have to provide a lot of different colors if we want to create a smooth color gradient; we have used 1000 different colors for the terrain.color() palette to get a smooth transition in the color key! Using colorRampPalette for smoother color gradients: A convenient approach to create a smoother color gradient is to use the colorRampPalette() function, so we don't have to insert all the different colors manually. The function takes a vector of different colors as an argument. Here, we provide three colors: blue for the lower end of the color key, yellow for the middle range, and red for the higher end. As we did it for the in-built color palettes, such as heat.color, we assign the value 1000 to the n parameter: my_palette <- colorRampPalette(c("blue", "yellow", "red"))(n = 1000) heat2(col = my_palette, main = "1.3) ColorRampPalette") In this case, it is more convenient to use discrete color names over hex colors, since we are using the colorRampPalette() function to create a gradient and do not need all the different shades of a particular color. Grayscales: It might happen that the medium or device that we use to display our heat maps does not support colors. Under these circumstances, we can use the gray palette to create a heat map that is optimized for those conditions. The level parameter of the gray() function takes a vector with values between 0 and 1 as an argument, where 0 represents black and 1 represents white, respectively. For a smooth gradient, we use a vector with 100 equally spaced shades of gray ranging from 0 to 1. heat2(col = gray(level = (0:200)/200), main ="1.4) Gray Scale") We can make use of the same color palettes for the levelplot() function too. It works in a similar way as it did for the heatmap.2() function that we are using in this recipe. However, inside the levelplot() function call, we must use col.regions instead of the simple col, so that we can include a color palette argument. Adding cell notes to our heat map: Sometimes, we want to show a data set along with our heat map. 
A neat way is to use so-called cell notes to display data values inside the individual heat map cells. The underlying data matrix for the cell notes does not necessarily have to be the same numeric matrix we used to construct our heat map, as long as it has the same number of rows and columns. As we recall, the data we read from arabidopsis_genes.csv resembles log 2 ratios of sample and reference gene expression levels. Let us calculate the fold changes of the gene expression levels now and display them—rounded to two digits after the decimal point—as cell notes on our heat map: fold_change <- 2^gene_data rounded_fold_changes <- round(fold_change, 2) heat2(cellnote = rounded_fold_changes, notecex = 0.5, notecol = "black", col = rev_heat.colors, main = "Cell Notes") The notecex parameter controls the size of the cell notes. Its default size is 1, and every argument between 0 and 1 will make the font smaller, whereas values larger than 1 will make the font larger. Here, we decreased the font size of the cell notes by 50 percent to fit it into the cell boundaries. Also, we want to display the cell notes in black to have a nice contrast to the colored background; this is controlled by the notecol parameter. Row and column side colors: Another approach to pronounce certain regions, that is, rows or columns on the heat map is to make use of row and column side colors. The ColSideColors argument will place a colored box between the dendrogram and heat map that can be used to annotate certain columns. We pass our vector with colors to ColSideColors, where its length must be equal to the number of columns of the heat map. Here, we want to color the first and third column red, the second one gray, and all the remaining 13 columns green: heat2(ColSideColors = c("red", "gray", "red", rep("green", 13)), main = "ColSideColors") You can see in the following image how the column side colors look like when we include the ColSideColors argument as shown previously: Attentive readers may have noticed that the order of colors in the column color box slightly differs from the order of colors we passed as a vector to ColSideColors. We see red two times next to each other, followed by a green and a gray box. This is due to the fact that the columns of our heat map have been reordered by the hierarchical clustering algorithm. Summary To learn more about the similar technology, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Instant R Starter (https://www.packtpub.com/big-data-and-business-intelligence/instant-r-starter-instant) Machine Learning with R - Second Edition (https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-r-second-edition) Mastering RStudio – Develop, Communicate, and Collaborate with R (https://www.packtpub.com/application-development/mastering-rstudio-%E2%80%93-develop-communicate-and-collaborate-r) Resources for Article: Further resources on this subject: Data Analysis Using R[article] Big Data Analysis[article] Big Data Analysis (R and Hadoop)[article]

Architectural and Feature Overview

Packt
22 Feb 2016
12 min read
 In this article by Giordano Scalzo, the author of Learning VMware App Volumes, we are going to look a little deeper into the different component parts that make up an App Volumes solution. Then, once you are familiar with these different components, we will discuss how they fit and work together. (For more resources related to this topic, see here.) App Volumes Components We are going to start by covering an overview of the different core components that make up the complete App Volumes solution, a glossary if you like. These are either the component parts of the actual App Volumes solution or additional components that are required to build your complete environment. App Volumes Manager The App Volumes Manager is the heart of the solution. Installed on a Windows Server operating system, the App Volumes Manager controls the application delivery engine and also provides you the access to a web-based dashboard and console from where you can manage your entire App Volumes environment. You will get your first glimpse of the App Volumes Manager when you complete the installation process and start the post-installation tasks, where you will configure details about your virtual host servers, storage, Active Directory, and other environment variables. Once you have completed the installation tasks, you will use the App Volumes Manager to perform tasks, such as creating new and updating existing AppStacks, creating Writable Volumes as well as then assigning both AppStacks and Writable Volumes to end users or virtual desktop machines. The App Volumes Manager also manages the virtual desktop machine that has the App Volumes Agent installed. Once virtual desktop machine has the agent installed, then it will then appear within the App Volumes Manager inventory so that you are able to configure assignments. In summary the App Volumes Manager performs the following functions: It provides the following functionality: Orchestrates the key infrastructure components such as, Active Directory, AppStack or Writable Volumes attachments, virtual hosting infrastructure (ESXi hosts and vCenter Servers) Manages assignments of AppStack or Writable Volumes to users, groups, and virtual desktop machines Collates AppStacks and Writable Volumes usage Provides a history of administrative actions Acts as a broker for the App Volumes agents for automated assignment of AppStacks and Writable Volumes as virtual desktop machines boot up and the end user logs in Provides a web-based graphical interface from which to manage the entire environment Throughout this article you will see the following icon used in any drawings or schematics to denote the App Volumes Manager. App Volumes Agent The App Volumes Agent is installed onto a virtual desktop machine on which you want to be able to attach AppStacks or Writable Volumes, and runs as a service on that machine. As such it is invisible to the end user. When you attach AppStack or Writable Volume to a virtual machine, then the agent acts as a filter driver and takes care of any application calls and file system redirects between the operating system and the App Stack or Writable Volume. Rather than seeing your AppStack, which appears as an additional hard drive within the operating system, the agent makes the applications appear as if they were natively installed. So, for example, the icons for your applications will automatically appear on your desktop/taskbar. The App Volumes Agent is also responsible for registering the virtual machine with the App Volumes Manager. 
Throughout this article, you will see the following icon used in any drawings or schematics to denote the App Volumes Agent. The App Volumes Agent can also be installed onto an RDSH host server to allow the attaching of AppStacks within a hosted applications environment. AppStacks An AppStack is a read-only volume that contains your applications, which is mounted as a Virtual Machine Disk file (VMDK) for VMware environments, or as a Virtual Hard Disk file (VHD) for Citrix and Microsoft environments on your virtual desktop machine, or RDSH host server. An AppStack is created using a provisioning machine, which has the App Volumes Agent installed on it. Then, as a part of the provisioning process, you mount an empty container (VMDK or VHD file) and then install the application(s) as you would do normally. The App Volumes Agent redirects the installation files, file system, and registry settings to the AppStack. Once completed, AppStack is set to read-only, which then allows one AppStack to be used for multiple users. This not only helps you reduce the storage requirements (an App Stack is also thin provisioned) but also allows any application that is delivered via AppStack to be centrally managed and updated. AppStacks are then delivered to the end users either as individual user assignments or via group membership using Active Directory. Throughout this article, you will see the following icon used in any drawings or schematics to denote AppStack. Writable Volumes One of the many use cases that was not best suited to a virtual desktop environment was that of developers, where they would need to install various different applications and other software. To cater for this use case, you would need to deploy a dedicated, persistent desktop to meet their requirements. This method of deployment is not necessarily the most cost-effective method, which potentially requires additional infrastructure resources, and management. With App Volumes, this all changes with the Writable Volumes feature. In the same way as you assign AppStack containing preinstalled and configured applications to an end user, with Writable Volumes, you attach an empty container as a VMDK file to their virtual desktop machine into which they can install their own applications. This virtual desktop machine will be running the App Volumes Agent, which provides the filter between any applications that the end user installs into the Writable Volume and the native operating system of the virtual desktop machine. The user then has their own drive onto which they can install applications. Now you can deploy nonpersistent, floating desktops for these users and attach not only their corporate applications via AppStacks, but also their own user-installed applications via a Writable Volume. Throughout this article, you will see the following icon used in any drawings or schematics to denote a Writable Volume. Provisioning virtual machine Although not an actual part of the App Volumes software, a key component is to have a clean virtual desktop machine to use as reference point from which to create your AppStacks from. This is known as the provisioning machine. Once you have your provisioning virtual desktop machine, you first install the App Volumes Agent onto it. Then, from the App Volumes Manager, you initiate the provisioning process, which attaches an empty VMDK file to the provisioning virtual desktop machine, and then prompts you, as the IT admin, to install the application. 
Before you start the installation of the application(s) that you are going to create as AppStack, it’s a good practice to take a snapshot before you start. in this way, you can roll back to your clean virtual desktop machine state before installation, ready to create the next AppStack. Throughout this article, you will see the following icon used in any drawings or schematics to denote a provisioning machine. A Broker Integration service The Broker Integration service is installed on a VMware Horizon View Connection Server, and it provides faster log on times for the end users who have access to a Horizon View virtual desktop machine. Throughout this article, you will see the following icon used in any drawings or schematics to denote the Broker Integration Service. Storage Groups Again, although not a specific component of App Volumes, you have the ability to define Storage Groups to store your AppStacks and Writable Volumes. Storage Groups are primarily used to provide replication of AppStacks and distribute Writable Volumes across multiple data stores. With AppStack storage groups, you can define a group of data stores that will be used to store the same AppStacks, enabling replication to be automatically deployed on those data stores. With Writable Volumes, only some of the storage group settings will apply attributes for the storage group, for example, the template location and distribution strategy. The distribution strategy allows you to define how writable volumes are distributed across the storage group. There are two settings for this as described: Spread: This will distribute files evenly across all the storage locations. When a file is created, the storage with the most available space is used. Round-Robin: This works by distributing the Writable Volume files sequentially, using the storage location that was used the longest amount of time ago. In this article, you will see the following icon used in any drawings or schematics to denote storage groups. We have introduced you to the core components that make up the App Volumes deployment. App Volumes Architecture Now that you understand what each of the individual components is used for, the next step is to look at how they all fit together to form the complete solution. We are going to break the architecture down into two parts. The first part will be focused on the application delivery and virtual desktop machines from an end user’s perspective. In the second part, we will look more at the supporting and underlying infrastructure, so the view from an IT administrator’s point of view. Finally, in the infrastructure section, we will look at the infrastructure with a networking hat on and illustrate the various network ports we are going to require to be available to us. So let's go back and look at our first part, what the end user will see. In this example, we have a virtual desktop machine to run a Windows operating system as the starting point of our solution. Onto that virtual desktop machine, we have installed the App Volumes Agent. We also have some core applications already installed onto this virtual desktop machine as a part of the core/parent image. These would be applications that are delivered to every user, such as Adobe Reader for example. This is exactly the same best practice as we would normally follow in any other virtual desktop environment. The updates here would be taken care of by updating the parent image and then using the recompose feature of linked clones in Horizon View. 
With the agent installed, the virtual desktop machine will appear in the App Volumes Manager console from where we can start to assign AppStacks to our Active Directory users and groups. When a user who has been assigned AppStack or Writable Volume logs in to a virtual desktop machine, AppStack that has been assigned to them will be attached to that virtual desktop machine, and the applications within that AppStack will seamlessly appear on the desktop. Users will also have access to their Writable Volume. The following diagram illustrates an example deployment from the virtual desktop machines perspective, as we have just described. Moving on to the second part of our focus on the architecture, we are now going to look at the underlying/supporting infrastructure. As a starting point, all of our infrastructure components have been deployed as virtual machines. They are hosted on the VMware vSphere platform. The following diagram illustrates the infrastructure components and how they fit together to deliver the applications to the virtual desktop machines. In the top section of the diagram, we have the virtual desktop machine running our Windows operating system with the App Volumes Agent installed. Along with acting as the filter driver, the agent talks to the App Volumes Manager (1) to read user assignment information for who can access which AppStacks and Writable Volumes. The App Volumes Manager also communicates with Active Directory (2) to read user, group, and machine information to assign AppStacks and Writable Volumes. The virtual desktop machine also talks to Active Directory to authenticate user logins (3). The App Volumes Manager also needs access to a SQL database (4), which stores the information about the assignments, AppStacks, Writable Volumes, and so on. A SQL database is also a requirement for vCenter Server (5), and if you are using the linked clone function of Horizon View, then a database is required for the View Composer. The final part of this diagram shows the App Volumes storage groups that are used to store the AppStacks and the Writable Volumes. These get mounted to the virtual desktop machines as virtual disks or VMDK files (6). Following on from the architecture and the how the different components fit together and communicate, later we are going to cover which ports need to be open to allow the communication between the various services and components. Network ports Now, we are going to cover the firewall ports that are required to be open in order for the App Volumes components to communicate with the other infrastructure components. The diagram here shows the port numbers (highlighted in the boxes) that are required to be open for each component to communicate. It's worth ensuring that these ports are configured before you start the deployment of App Volumes. Summary In this article, we introduced you to the individual components that make up the App Volumes solution and what task each of them performs. We then went on to look at how those components fit into the overall solution architecture, as well as how the architecture works. Resources for Article:   Further resources on this subject: Elastic Load Balancing [article] Working With CEPH Block Device [article] Android and IOs Apps Testing At A Glance [article]

What is Naïve Bayes classifier?

Packt
22 Feb 2016
9 min read
The name Naïve Bayes comes from the basic assumption in the model that the probability of a particular feature $X_i$ is independent of any other feature $X_j$, given the class label $C_K$. This implies the following:

$$P(X_1, X_2, \ldots, X_n \mid C_K) = \prod_{i=1}^{n} P(X_i \mid C_K)$$

Using this assumption and the Bayes rule, one can show that the probability of class $C_K$, given features $\{X_1, X_2, X_3, \ldots, X_n\}$, is given by:

$$P(C_K \mid X_1, X_2, \ldots, X_n) = \frac{P(C_K)\prod_{i=1}^{n} P(X_i \mid C_K)}{P(X_1, X_2, \ldots, X_n)}$$

Here, $P(X_1, X_2, X_3, \ldots, X_n)$ is the normalization term obtained by summing the numerator over all the values of $K$. It is also called the Bayesian evidence or partition function $Z$. The classifier selects as the target class the class label that maximizes the posterior class probability $P(C_K \mid \{X_1, X_2, X_3, \ldots, X_n\})$:

$$\hat{C} = \underset{K}{\arg\max}\ P(C_K)\prod_{i=1}^{n} P(X_i \mid C_K)$$

The Naïve Bayes classifier is a baseline classifier for document classification. One reason for this is that the underlying assumption that each feature (words or m-grams) is independent of the others, given the class label, typically holds good for text. Another reason is that the Naïve Bayes classifier scales well when there is a large number of documents. There are two implementations of Naïve Bayes. In Bernoulli Naïve Bayes, features are binary variables that encode whether a feature (m-gram) is present or absent in a document. In multinomial Naïve Bayes, the features are frequencies of m-grams in a document. To avoid issues when a frequency is zero, Laplace smoothing is applied to the feature vectors by adding a 1 to each count. Let's look at multinomial Naïve Bayes in some detail. Let $n_i$ be the number of times the feature $X_i$ occurred in the class $C_K$ in the training data. Then, the likelihood function of observing a feature vector $X = \{X_1, X_2, X_3, \ldots, X_n\}$, given a class label $C_K$, is given by:

$$P(X \mid C_K) \propto \prod_{i=1}^{n} \theta_{Ki}^{\,X_i}$$

Here, $\theta_{Ki}$ is the probability of observing the feature $X_i$ in the class $C_K$. Using the Bayes rule, the posterior probability of observing the class $C_K$, given a feature vector $X$, is given by:

$$P(C_K \mid X) = \frac{P(C_K)\prod_{i=1}^{n} \theta_{Ki}^{\,X_i}}{Z}$$

Taking the logarithm on both sides and ignoring the constant term $Z$, we get the following:

$$\log P(C_K \mid X) = \log P(C_K) + \sum_{i=1}^{n} X_i \log \theta_{Ki}$$

So, by taking the logarithm of the posterior distribution, we have converted the problem into a linear model with $\log \theta_{Ki}$ as the coefficients to be determined from the data. This can be easily solved. Generally, instead of raw term frequencies, one uses TF-IDF (term frequency multiplied by inverse document frequency) with the document length normalized to improve the performance of the model. The R package e1071 (Miscellaneous Functions of the Department of Statistics) by T.U. Wien contains an R implementation of Naïve Bayes. For this article, we will use the SMS spam dataset from the UCI Machine Learning repository. The dataset consists of 425 SMS spam messages collected from the UK forum Grumbletext, where consumers can submit spam SMS messages. The dataset also contains 3375 normal (ham) SMS messages from the NUS SMS corpus maintained by the National University of Singapore. The dataset can be downloaded from the UCI Machine Learning repository (https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection). Let's say that we have saved this as the file SMSSpamCollection.txt in the working directory of R (actually, you need to open it in Excel and save it as a tab-delimited file for it to be read into R properly).
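Before we turn to the real dataset, here is a small hand-rolled sketch of the multinomial computation above on a toy term-count matrix; it is purely illustrative, all the variable names are invented, and the actual model fitting later in this article is done by the e1071 package:

# Toy term-count matrix: two documents per class, three features (terms)
toy_counts <- rbind(spam1 = c(3, 0, 1),
                    spam2 = c(2, 1, 0),
                    ham1  = c(0, 2, 2),
                    ham2  = c(1, 3, 1))
toy_class <- c("spam", "spam", "ham", "ham")

# Laplace-smoothed estimates of theta_Ki = P(feature i | class K):
# add 1 to every count, then normalize within each class
theta <- t(sapply(split(as.data.frame(toy_counts), toy_class), function(d) {
  n_i <- colSums(d) + 1
  n_i / sum(n_i)
}))

# Class priors and the log-posterior score of a new document x
prior  <- table(toy_class) / length(toy_class)
x      <- c(2, 0, 1)
scores <- log(prior[rownames(theta)]) + log(theta) %*% x

# The predicted class is the one with the highest score
rownames(theta)[which.max(scores)]

The scores computed here are exactly the $\log P(C_K) + \sum_i X_i \log \theta_{Ki}$ expression derived above, which is all the classifier needs in order to pick a class.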
Then, the command to read the file into the tm (text mining) package would be the following: >spamdata ←read.table("SMSSpamCollection.txt",sep="\t",stringsAsFactors = default.stringsAsFactors()) We will first separate the dependent variable y and independent variables x and split the dataset into training and testing sets in the ratio 80:20, using the following R commands: >samp←sample.int(nrow(spamdata),as.integer(nrow(spamdata)*0.2),replace=F) >spamTest ←spamdata[samp,] >spamTrain ←spamdata[-samp,] >ytrain←as.factor(spamTrain[,1]) >ytest←as.factor(spamTest[,1]) >xtrain←as.vector(spamTrain[,2]) >xtest←as.vector(spamTest[,2]) Since we are dealing with text documents, we need to do some standard preprocessing before we can use the data for any machine learning models. We can use the tm package in R for this purpose. In the next section, we will describe this in some detail. Text processing using the tm package The tm package has methods for data import, corpus handling, preprocessing, metadata management, and creation of term-document matrices. Data can be imported into the tm package either from a directory, a vector with each component a document, or a data frame. The fundamental data structure in tm is an abstract collection of text documents called Corpus. It has two implementations; one is where data is stored in memory and is called VCorpus (volatile corpus) and the second is where data is stored in the hard disk and is called PCorpus (permanent corpus). We can create a corpus of our SMS spam dataset by using the following R commands; prior to this, you need to install the tm package and SnowballC package by using the install.packages("packagename") command in R: >library(tm) >library(SnowballC) >xtrain ← VCorpus(VectorSource(xtrain)) First, we need to do some basic text processing, such as removing extra white space, changing all words to lowercase, removing stop words, and stemming the words. This can be achieved by using the following functions in the tm package: >#remove extra white space >xtrain ← tm_map(xtrain,stripWhitespace) >#remove punctuation >xtrain ← tm_map(xtrain,removePunctuation) >#remove numbers >xtrain ← tm_map(xtrain,removeNumbers) >#changing to lower case >xtrain ← tm_map(xtrain,content_transformer(tolower)) >#removing stop words >xtrain ← tm_map(xtrain,removeWords,stopwords("english")) >#stemming the document >xtrain ← tm_map(xtrain,stemDocument) Finally, the data is transformed into a form that can be consumed by machine learning models. This is the so called document-term matrix form where each document (SMS in this case) is a row, the terms appearing in all documents are the columns, and the entry in each cell denotes how many times each word occurs in one document: >#creating Document-Term Matrix >xtrain ← as.data.frame.matrix(DocumentTermMatrix(xtrain)) The same set of processes is done on the xtest dataset as well. The reason we converted y to factors and xtrain to a data frame is to match the input format for the Naïve Bayes classifier in the e1071 package. Model training and prediction You need to first install the e1071 package from CRAN. The naiveBayes() function can be used to train the Naïve Bayes model. The function can be called using two methods. 
The following is the first method: >naiveBayes(formula,data,laplace=0, ,subset,na.action=na.pass) Here formula stands for the linear combination of independent variables to predict the following class: >class ~ x1+x2+… Also, data stands for either a data frame or contingency table consisting of categorical and numerical variables. If we have the class labels as a vector y and dependent variables as a data frame x, then we can use the second method of calling the function, as follows: >naiveBayes(x,y,laplace=0,…) We will use the second method of calling in our example. Once we have a trained model, which is an R object of class naiveBayes, we can predict the classes of new instances as follows: >predict(object,newdata,type=c(class,raw),threshold=0.001,eps=0,…) So, we can train the Naïve Bayes model on our training dataset and score on the test dataset by using the following commands: >#Training the Naive Bayes Model >nbmodel ← naiveBayes(xtrain,ytrain,laplace=3) >#Prediction using trained model >ypred.nb ← predict(nbmodel,xtest,type = "class",threshold = 0.075) >#Converting classes to 0 and 1 for plotting ROC >fconvert ← function(x){ if(x == "spam"){ y ← 1} else {y ← 0} y } >ytest1 ← sapply(ytest,fconvert,simplify = "array") >ypred1 ← sapply(ypred.nb,fconvert,simplify = "array") >roc(ytest1,ypred1,plot = T)  Here, the ROC curve for this model and dataset is shown. This is generated using the pROC package in CRAN: >#Confusion matrix >confmat ← table(ytest,ypred.nb) >confmat pred.nb ytest ham spam ham 143 139 spam 9 35 From the ROC curve and confusion matrix, one can choose the best threshold for the classifier, and the precision and recall metrics. Note that the example shown here is for illustration purposes only. The model needs be to tuned further to improve accuracy. We can also print some of the most frequent words (model features) occurring in the two classes and their posterior probabilities generated by the model. This will give a more intuitive feeling for the model exercise. The following R code does this job: >tab ← nbmodel$tables >fham ← function(x){ y ← x[1,1] y } >hamvec ← sapply(tab,fham,simplify = "array") >hamvec ← sort(hamvec,decreasing = T) >fspam ← function(x){ y ← x[2,1] y } >spamvec ← sapply(tab,fspam,simplify = "array") >spamvec ← sort(spamvec,decreasing = T) >prb ← cbind(spamvec,hamvec) >print.table(prb)  The output table is as follows: word Prob(word|spam) Prob(word|ham) call 0.6994 0.4084 free 0.4294 0.3996 now 0.3865 0.3120 repli 0.2761 0.3094 text 0.2638 0.2840 spam 0.2270 0.2726 txt 0.2270 0.2594 get 0.2209 0.2182 stop 0.2086 0.2025 The table shows, for example, that given a document is spam, the probability of the word call appearing in it is 0.6994, whereas the probability of the same word appearing in a normal document is only 0.4084. Summary In this article, we learned a basic and popular method for classification, Naïve Bayes, implemented using the Bayesian approach. For further information on Bayesian models, you can refer to: https://www.packtpub.com/big-data-and-business-intelligence/data-analysis-r https://www.packtpub.com/big-data-and-business-intelligence/building-probabilistic-graphical-models-python Resources for Article: Further resources on this subject: Introducing Bayesian Inference [article] Practical Applications of Deep Learning [article] Machine learning in practice [article]

Building A Recommendation System with Azure

Packt
19 Feb 2016
7 min read
Recommender systems are common these days. You may not have noticed, but you might already be a user or receiver of such a system somewhere. Most of the well-performing e-commerce platforms use recommendation systems to recommend items to their users. When you see on the Amazon website that a book is recommended to you based on your earlier preferences, purchases, and browse history, Amazon is actually using such a recommendation system. Similarly, Netflix uses its recommendation system to suggest movies for you. (For more resources related to this topic, see here.) A recommender or recommendation system is used to recommend a product or information often based on user characteristics, preferences, history, and so on. So, a recommendation is always personalized. Until recently, it was not so easy or straightforward to build a recommender, but Azure ML makes it really easy to build one as long as you have your data ready. This article introduces you to the concept of recommendation systems and also the model available in ML Studio for you to build your own recommender system. It then walks you through the process of building a recommendation system with a simple example. The Matchbox recommender Microsoft has developed a large-scale recommender system based on a probabilistic model (Bayesian) called Matchbox. This model can learn about a user's preferences through observations made on how they rate items, such as movies, content, or other products. Based on those observations, it recommends new items to the users when requested. Matchbox uses the available data for each user in the most efficient way possible. The learning algorithm it uses is designed specifically for big data. However, its main feature is that Matchbox takes advantage of metadata available for both users and items. This means that the things it learns about one user or item can be transferred across to other users or items. You can find more information about the Matchbox model at the Microsoft Research project link. Kinds of recommendations The Matchbox recommender supports the building of four kinds of recommenders, which will include most of the scenarios. Let's take a look at the following list: Rating Prediction: This predicts ratings for a given user and item, for example, if a new movie is released, the system will predict what will be your rating for that movie out of 1-5. Item Recommendation: This recommends items to a given user, for example, Amazon suggests you books or YouTube suggests you videos to watch on its home page (especially when you are logged in). Related Users: This finds users that are related to a given user, for example, LinkedIn suggests people that you can get connected to or Facebook suggests friends to you. Related Items: This finds the items related to a given item, for example, a blog site suggests you related posts when you are reading a blog post. Understanding the recommender modules The Matchbox recommender comes with three components; as you might have guessed, a module each to train, score, and evaluate the data. The modules are described as follows. The train Matchbox recommender This module contains the algorithm and generates the trained algorithm, as shown in the following screenshot: This module takes the values for the following two parameters. The number of traits This value decides how many implicit features (traits) the algorithm will learn about that are related to every user and item. The higher this value, the precise it would be as it would lead to better prediction. 
Typically, it takes a value in the range of 2 to 20. The number of recommendation algorithm iterations It is the number of times the algorithm iterates over the data. The higher this value, the better would the predictions be. Typically, it takes a value in the range of 1 to 10. The score matchbox recommender This module lets you specify the kind of recommendation and corresponding parameters you want: Rating Prediction Item Prediction Related Users Related Items Let's take a look at the following screenshot: The ML Studio help page for the module provides details of all the corresponding parameters. The evaluate recommender This module takes a test and a scored dataset and generates evaluation metrics, as shown in the following screenshot: It also lets you specify the kind of recommendation, such as the score module and corresponding parameters. Building a recommendation system Now, it would be worthwhile that you learn to build one by yourself. We will build a simple recommender system to recommend restaurants to a given user. ML Studio includes three sample datasets, described as follows: Restaurant customer data: This is a set of metadata about customers, including demographics and preferences, for example, latitude, longitude, interest, and personality. Restaurant feature data: This is a set of metadata about restaurants and their features, such as food type, dining style, and location, for example, placeID, latitude, longitude, price. Restaurant ratings: This contains the ratings given by users to restaurants on a scale of 0 to 2. It contains the columns: userID, placeID, and rating. Now, we will build a recommender that will recommend a given number of restaurants to a user (userID). To build a recommender perform the following steps: Create a new experiment. In the Search box in the modules palette, type Restaurant. The preceding three datasets get listed. Drag them all to the canvas one after another. Drag a Split module and connect it to the output port of the Restaurant ratings module. On the properties section to the right, choose Splitting mode as Recommender Split. Leave the other parameters at their default values. Drag a Project Columns module to the canvas and select the columns: userID, latitude, longitude, interest, and personality. Similarly, drag another Project Columns module and connect it to the Restaurant feature data module and select the columns: placeID, latitude, longitude, price, the_geom_meter, and address, zip. Drag a Train Matchbox Recommender module to the canvas and make connections to the three input ports, as shown in the following screenshot: Drag a Score Matchbox Recommender module to the canvas and make connections to the three input ports and set the property's values, as shown in the following screenshot: Run the experiment and when it gets completed, right-click on the output of the Score Matchbox Recommender module and click on Visualize to explore the scored data. You can note the different restaurants (IDs) recommended as items for a user from the test dataset. The next step is to evaluate the scored prediction. Drag the Evaluate Recommender module to the canvas and connect the second output of the Split module to its first input port and connect the output of the Score Matchbox Recommender module to its second input. Leave the module at its default properties. Run the experiment again and when finished, right-click on the output port of the Evaluate Recommender module and click on Visualize to find the evaluation metric. 
The evaluation metric Normalized Discounted Cumulative Gain (NDCG) is estimated from the ground truth ratings given in the test set. Its value ranges from 0.0 to 1.0, where 1.0 represents the most ideal ranking of the entities. Summary You started with gaining the basic knowledge about a recommender system. You then understood the Matchbox recommender that comes with ML Studio along with its components. You also explored different kinds of recommendations that you can make with it. Finally, you ended up building a simple recommendation system to recommend restaurants to a given user. For more information on Azure, take a look at the following books also by Packt Publishing: Learning Microsoft Azure (https://www.packtpub.com/networking-and-servers/learning-microsoft-azure) Microsoft Windows Azure Development Cookbook (https://www.packtpub.com/application-development/microsoft-windows-azure-development-cookbook) Resources for Article: Further resources on this subject: Introduction to Microsoft Azure Cloud Services[article] Microsoft Azure – Developing Web API for Mobile Apps[article] Security in Microsoft Azure[article]